
Merge tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These add support for 'artificial' Energy Models in which power
numbers for different entities may be in different scales, add support
for some new hardware, fix bugs and clean up code in multiple places.

Specifics:

- Update the Energy Model support code to allow the Energy Model to
be artificial, which means that the power values may not be on a
uniform scale with other devices providing power information, and
update the cpufreq_cooling and devfreq_cooling thermal drivers to
support artificial Energy Models (Lukasz Luba).

- Make DTPM check the Energy Model type (Lukasz Luba).

- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).

- Add CPU-based scaling support to passive devfreq governor (Saravana
Kannan, Chanwoo Choi).

- Update the rk3399_dmc devfreq driver (Brian Norris).

- Export dev_pm_ops instead of suspend() and resume() in the IIO
chemical scd30 driver (Jonathan Cameron).

- Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
PM-runtime counterparts (Jonathan Cameron).

- Move symbol exports in the IIO chemical scd30 driver into the
IIO_SCD30 namespace (Jonathan Cameron).

- Avoid device PM-runtime usage count underflows (Rafael Wysocki).

- Allow dynamic debug to control printing of PM messages (David
Cohen).

- Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
Bai).

- Preserve ACPI-table override during hibernation (Amadeusz
Sławiński).

- Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).

- Make Intel RAPL power capping driver support the RaptorLake and
AlderLake N processors (Zhang Rui, Sumeet Pawnikar).

- Remove redundant store to value after multiply in the RAPL power
capping driver (Colin Ian King).

- Add AlderLake processor support to the intel_idle driver (Zhang
Rui).

- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).

- Fix cpufreq governor clean up code to avoid using kfree() directly
to free kobject-based items (Kevin Hao).

- Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe
Leroy).

- Make intel_pstate notify frequency invariance code when no_turbo is
turned on and off (Chen Yu).

- Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
Pandruvada).

- Make cpufreq avoid unnecessary frequency updates due to mismatch
between hardware and the frequency table (Viresh Kumar).

- Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
code (Viresh Kumar).

- Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
calling convention for some driver callbacks consistent (Rafael
Wysocki).

- Avoid accessing half-initialized cpufreq policies from the show()
and store() sysfs functions (Schspa Shi).

- Rearrange cpufreq_offline() to make the calling convention for some
driver callbacks consistent (Schspa Shi).

- Update CPPC handling in cpufreq (Pierre Gondois).

- Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).

- Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
Hansson).

- Improve the way genpd deals with its governors (Ulf Hansson).

- Update the turbostat utility to version 2022.04.16 (Len Brown, Dan
Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)"

* tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits)
PM: domains: Trust domain-idle-states from DT to be correct by genpd
PM: domains: Measure power-on/off latencies in genpd based on a governor
PM: domains: Allocate governor data dynamically based on a genpd governor
PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
PM: domains: Fix initialization of genpd's next_wakeup
PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
PM: domains: Measure suspend/resume latencies in genpd based on governor
PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
PM: domains: Allocate gpd_timing_data dynamically based on governor
PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
PM: domains: Drop redundant code for genpd always-on governor
PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
powercap: intel_rapl: remove redundant store to value after multiply
cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
cpufreq: CPPC: Enable fast_switch
ACPI: CPPC: Assume no transition latency if no PCCT
ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
ACPI: CPPC: Check _OSC for flexible address space
...

+2469 -1011
-212
Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
··· 1 - * Rockchip rk3399 DMC (Dynamic Memory Controller) device 2 - 3 - Required properties: 4 - - compatible: Must be "rockchip,rk3399-dmc". 5 - - devfreq-events: Node to get DDR loading, Refer to 6 - Documentation/devicetree/bindings/devfreq/event/ 7 - rockchip-dfi.txt 8 - - clocks: Phandles for clock specified in "clock-names" property 9 - - clock-names : The name of clock used by the DFI, must be 10 - "pclk_ddr_mon"; 11 - - operating-points-v2: Refer to Documentation/devicetree/bindings/opp/opp-v2.yaml 12 - for details. 13 - - center-supply: DMC supply node. 14 - - status: Marks the node enabled/disabled. 15 - - rockchip,pmu: Phandle to the syscon managing the "PMU general register 16 - files". 17 - 18 - Optional properties: 19 - - interrupts: The CPU interrupt number. The interrupt specifier 20 - format depends on the interrupt controller. 21 - It should be a DCF interrupt. When DDR DVFS finishes 22 - a DCF interrupt is triggered. 23 - - rockchip,pmu: Phandle to the syscon managing the "PMU general register 24 - files". 25 - 26 - Following properties relate to DDR timing: 27 - 28 - - rockchip,dram_speed_bin : Value reference include/dt-bindings/clock/rk3399-ddr.h, 29 - it selects the DDR3 cl-trp-trcd type. It must be 30 - set according to "Speed Bin" in DDR3 datasheet, 31 - DO NOT use a smaller "Speed Bin" than specified 32 - for the DDR3 being used. 33 - 34 - - rockchip,pd_idle : Configure the PD_IDLE value. Defines the 35 - power-down idle period in which memories are 36 - placed into power-down mode if bus is idle 37 - for PD_IDLE DFI clock cycles. 38 - 39 - - rockchip,sr_idle : Configure the SR_IDLE value. Defines the 40 - self-refresh idle period in which memories are 41 - placed into self-refresh mode if bus is idle 42 - for SR_IDLE * 1024 DFI clock cycles (DFI 43 - clocks freq is half of DRAM clock), default 44 - value is "0". 45 - 46 - - rockchip,sr_mc_gate_idle : Defines the memory self-refresh and controller 47 - clock gating idle period. Memories are placed 48 - into self-refresh mode and memory controller 49 - clock arg gating started if bus is idle for 50 - sr_mc_gate_idle*1024 DFI clock cycles. 51 - 52 - - rockchip,srpd_lite_idle : Defines the self-refresh power down idle 53 - period in which memories are placed into 54 - self-refresh power down mode if bus is idle 55 - for srpd_lite_idle * 1024 DFI clock cycles. 56 - This parameter is for LPDDR4 only. 57 - 58 - - rockchip,standby_idle : Defines the standby idle period in which 59 - memories are placed into self-refresh mode. 60 - The controller, pi, PHY and DRAM clock will 61 - be gated if bus is idle for standby_idle * DFI 62 - clock cycles. 63 - 64 - - rockchip,dram_dll_dis_freq : Defines the DDR3 DLL bypass frequency in MHz. 65 - When DDR frequency is less than DRAM_DLL_DISB_FREQ, 66 - DDR3 DLL will be bypassed. Note: if DLL was bypassed, 67 - the odt will also stop working. 68 - 69 - - rockchip,phy_dll_dis_freq : Defines the PHY dll bypass frequency in 70 - MHz (Mega Hz). When DDR frequency is less than 71 - DRAM_DLL_DISB_FREQ, PHY DLL will be bypassed. 72 - Note: PHY DLL and PHY ODT are independent. 73 - 74 - - rockchip,ddr3_odt_dis_freq : When the DRAM type is DDR3, this parameter defines 75 - the ODT disable frequency in MHz (Mega Hz). 76 - when the DDR frequency is less then ddr3_odt_dis_freq, 77 - the ODT on the DRAM side and controller side are 78 - both disabled. 79 - 80 - - rockchip,ddr3_drv : When the DRAM type is DDR3, this parameter defines 81 - the DRAM side driver strength in ohms. 
Default 82 - value is 40. 83 - 84 - - rockchip,ddr3_odt : When the DRAM type is DDR3, this parameter defines 85 - the DRAM side ODT strength in ohms. Default value 86 - is 120. 87 - 88 - - rockchip,phy_ddr3_ca_drv : When the DRAM type is DDR3, this parameter defines 89 - the phy side CA line (incluing command line, 90 - address line and clock line) driver strength. 91 - Default value is 40. 92 - 93 - - rockchip,phy_ddr3_dq_drv : When the DRAM type is DDR3, this parameter defines 94 - the PHY side DQ line (including DQS/DQ/DM line) 95 - driver strength. Default value is 40. 96 - 97 - - rockchip,phy_ddr3_odt : When the DRAM type is DDR3, this parameter defines 98 - the PHY side ODT strength. Default value is 240. 99 - 100 - - rockchip,lpddr3_odt_dis_freq : When the DRAM type is LPDDR3, this parameter defines 101 - then ODT disable frequency in MHz (Mega Hz). 102 - When DDR frequency is less then ddr3_odt_dis_freq, 103 - the ODT on the DRAM side and controller side are 104 - both disabled. 105 - 106 - - rockchip,lpddr3_drv : When the DRAM type is LPDDR3, this parameter defines 107 - the DRAM side driver strength in ohms. Default 108 - value is 34. 109 - 110 - - rockchip,lpddr3_odt : When the DRAM type is LPDDR3, this parameter defines 111 - the DRAM side ODT strength in ohms. Default value 112 - is 240. 113 - 114 - - rockchip,phy_lpddr3_ca_drv : When the DRAM type is LPDDR3, this parameter defines 115 - the PHY side CA line (including command line, 116 - address line and clock line) driver strength. 117 - Default value is 40. 118 - 119 - - rockchip,phy_lpddr3_dq_drv : When the DRAM type is LPDDR3, this parameter defines 120 - the PHY side DQ line (including DQS/DQ/DM line) 121 - driver strength. Default value is 40. 122 - 123 - - rockchip,phy_lpddr3_odt : When dram type is LPDDR3, this parameter define 124 - the phy side odt strength, default value is 240. 125 - 126 - - rockchip,lpddr4_odt_dis_freq : When the DRAM type is LPDDR4, this parameter 127 - defines the ODT disable frequency in 128 - MHz (Mega Hz). When the DDR frequency is less then 129 - ddr3_odt_dis_freq, the ODT on the DRAM side and 130 - controller side are both disabled. 131 - 132 - - rockchip,lpddr4_drv : When the DRAM type is LPDDR4, this parameter defines 133 - the DRAM side driver strength in ohms. Default 134 - value is 60. 135 - 136 - - rockchip,lpddr4_dq_odt : When the DRAM type is LPDDR4, this parameter defines 137 - the DRAM side ODT on DQS/DQ line strength in ohms. 138 - Default value is 40. 139 - 140 - - rockchip,lpddr4_ca_odt : When the DRAM type is LPDDR4, this parameter defines 141 - the DRAM side ODT on CA line strength in ohms. 142 - Default value is 40. 143 - 144 - - rockchip,phy_lpddr4_ca_drv : When the DRAM type is LPDDR4, this parameter defines 145 - the PHY side CA line (including command address 146 - line) driver strength. Default value is 40. 147 - 148 - - rockchip,phy_lpddr4_ck_cs_drv : When the DRAM type is LPDDR4, this parameter defines 149 - the PHY side clock line and CS line driver 150 - strength. Default value is 80. 151 - 152 - - rockchip,phy_lpddr4_dq_drv : When the DRAM type is LPDDR4, this parameter defines 153 - the PHY side DQ line (including DQS/DQ/DM line) 154 - driver strength. Default value is 80. 155 - 156 - - rockchip,phy_lpddr4_odt : When the DRAM type is LPDDR4, this parameter defines 157 - the PHY side ODT strength. Default value is 60. 
158 - 159 - Example: 160 - dmc_opp_table: dmc_opp_table { 161 - compatible = "operating-points-v2"; 162 - 163 - opp00 { 164 - opp-hz = /bits/ 64 <300000000>; 165 - opp-microvolt = <900000>; 166 - }; 167 - opp01 { 168 - opp-hz = /bits/ 64 <666000000>; 169 - opp-microvolt = <900000>; 170 - }; 171 - }; 172 - 173 - dmc: dmc { 174 - compatible = "rockchip,rk3399-dmc"; 175 - devfreq-events = <&dfi>; 176 - interrupts = <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>; 177 - clocks = <&cru SCLK_DDRC>; 178 - clock-names = "dmc_clk"; 179 - operating-points-v2 = <&dmc_opp_table>; 180 - center-supply = <&ppvar_centerlogic>; 181 - upthreshold = <15>; 182 - downdifferential = <10>; 183 - rockchip,ddr3_speed_bin = <21>; 184 - rockchip,pd_idle = <0x40>; 185 - rockchip,sr_idle = <0x2>; 186 - rockchip,sr_mc_gate_idle = <0x3>; 187 - rockchip,srpd_lite_idle = <0x4>; 188 - rockchip,standby_idle = <0x2000>; 189 - rockchip,dram_dll_dis_freq = <300>; 190 - rockchip,phy_dll_dis_freq = <125>; 191 - rockchip,auto_pd_dis_freq = <666>; 192 - rockchip,ddr3_odt_dis_freq = <333>; 193 - rockchip,ddr3_drv = <40>; 194 - rockchip,ddr3_odt = <120>; 195 - rockchip,phy_ddr3_ca_drv = <40>; 196 - rockchip,phy_ddr3_dq_drv = <40>; 197 - rockchip,phy_ddr3_odt = <240>; 198 - rockchip,lpddr3_odt_dis_freq = <333>; 199 - rockchip,lpddr3_drv = <34>; 200 - rockchip,lpddr3_odt = <240>; 201 - rockchip,phy_lpddr3_ca_drv = <40>; 202 - rockchip,phy_lpddr3_dq_drv = <40>; 203 - rockchip,phy_lpddr3_odt = <240>; 204 - rockchip,lpddr4_odt_dis_freq = <333>; 205 - rockchip,lpddr4_drv = <60>; 206 - rockchip,lpddr4_dq_odt = <40>; 207 - rockchip,lpddr4_ca_odt = <40>; 208 - rockchip,phy_lpddr4_ca_drv = <40>; 209 - rockchip,phy_lpddr4_ck_cs_drv = <80>; 210 - rockchip,phy_lpddr4_dq_drv = <80>; 211 - rockchip,phy_lpddr4_odt = <60>; 212 - };
+384
Documentation/devicetree/bindings/memory-controllers/rockchip,rk3399-dmc.yaml
··· 1 + # SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + # %YAML 1.2 3 + --- 4 + $id: http://devicetree.org/schemas/memory-controllers/rockchip,rk3399-dmc.yaml# 5 + $schema: http://devicetree.org/meta-schemas/core.yaml# 6 + 7 + title: Rockchip rk3399 DMC (Dynamic Memory Controller) device 8 + 9 + maintainers: 10 + - Brian Norris <briannorris@chromium.org> 11 + 12 + properties: 13 + compatible: 14 + enum: 15 + - rockchip,rk3399-dmc 16 + 17 + devfreq-events: 18 + $ref: /schemas/types.yaml#/definitions/phandle 19 + description: 20 + Node to get DDR loading. Refer to 21 + Documentation/devicetree/bindings/devfreq/event/rockchip-dfi.txt. 22 + 23 + clocks: 24 + maxItems: 1 25 + 26 + clock-names: 27 + items: 28 + - const: dmc_clk 29 + 30 + operating-points-v2: true 31 + 32 + center-supply: 33 + description: 34 + DMC regulator supply. 35 + 36 + rockchip,pmu: 37 + $ref: /schemas/types.yaml#/definitions/phandle 38 + description: 39 + Phandle to the syscon managing the "PMU general register files". 40 + 41 + interrupts: 42 + maxItems: 1 43 + description: 44 + The CPU interrupt number. It should be a DCF interrupt. When DDR DVFS 45 + finishes, a DCF interrupt is triggered. 46 + 47 + rockchip,ddr3_speed_bin: 48 + deprecated: true 49 + $ref: /schemas/types.yaml#/definitions/uint32 50 + description: 51 + For values, reference include/dt-bindings/clock/rk3399-ddr.h. Selects the 52 + DDR3 cl-trp-trcd type. It must be set according to "Speed Bin" in DDR3 53 + datasheet; DO NOT use a smaller "Speed Bin" than specified for the DDR3 54 + being used. 55 + 56 + rockchip,pd_idle: 57 + deprecated: true 58 + $ref: /schemas/types.yaml#/definitions/uint32 59 + description: 60 + Configure the PD_IDLE value. Defines the power-down idle period in which 61 + memories are placed into power-down mode if bus is idle for PD_IDLE DFI 62 + clock cycles. 63 + See also rockchip,pd-idle-ns. 64 + 65 + rockchip,sr_idle: 66 + deprecated: true 67 + $ref: /schemas/types.yaml#/definitions/uint32 68 + description: 69 + Configure the SR_IDLE value. Defines the self-refresh idle period in 70 + which memories are placed into self-refresh mode if bus is idle for 71 + SR_IDLE * 1024 DFI clock cycles (DFI clocks freq is half of DRAM clock). 72 + See also rockchip,sr-idle-ns. 73 + default: 0 74 + 75 + rockchip,sr_mc_gate_idle: 76 + deprecated: true 77 + $ref: /schemas/types.yaml#/definitions/uint32 78 + description: 79 + Defines the memory self-refresh and controller clock gating idle period. 80 + Memories are placed into self-refresh mode and memory controller clock 81 + arg gating started if bus is idle for sr_mc_gate_idle*1024 DFI clock 82 + cycles. 83 + See also rockchip,sr-mc-gate-idle-ns. 84 + 85 + rockchip,srpd_lite_idle: 86 + deprecated: true 87 + $ref: /schemas/types.yaml#/definitions/uint32 88 + description: 89 + Defines the self-refresh power down idle period in which memories are 90 + placed into self-refresh power down mode if bus is idle for 91 + srpd_lite_idle * 1024 DFI clock cycles. This parameter is for LPDDR4 92 + only. 93 + See also rockchip,srpd-lite-idle-ns. 94 + 95 + rockchip,standby_idle: 96 + deprecated: true 97 + $ref: /schemas/types.yaml#/definitions/uint32 98 + description: 99 + Defines the standby idle period in which memories are placed into 100 + self-refresh mode. The controller, pi, PHY and DRAM clock will be gated 101 + if bus is idle for standby_idle * DFI clock cycles. 102 + See also rockchip,standby-idle-ns. 
103 + 104 + rockchip,dram_dll_dis_freq: 105 + deprecated: true 106 + $ref: /schemas/types.yaml#/definitions/uint32 107 + description: | 108 + Defines the DDR3 DLL bypass frequency in MHz. When DDR frequency is less 109 + than DRAM_DLL_DISB_FREQ, DDR3 DLL will be bypassed. 110 + Note: if DLL was bypassed, the odt will also stop working. 111 + 112 + rockchip,phy_dll_dis_freq: 113 + deprecated: true 114 + $ref: /schemas/types.yaml#/definitions/uint32 115 + description: | 116 + Defines the PHY dll bypass frequency in MHz (Mega Hz). When DDR frequency 117 + is less than DRAM_DLL_DISB_FREQ, PHY DLL will be bypassed. 118 + Note: PHY DLL and PHY ODT are independent. 119 + 120 + rockchip,auto_pd_dis_freq: 121 + deprecated: true 122 + $ref: /schemas/types.yaml#/definitions/uint32 123 + description: 124 + Defines the auto PD disable frequency in MHz. 125 + 126 + rockchip,ddr3_odt_dis_freq: 127 + $ref: /schemas/types.yaml#/definitions/uint32 128 + minimum: 1000000 # In case anyone thought this was MHz. 129 + description: 130 + When the DRAM type is DDR3, this parameter defines the ODT disable 131 + frequency in Hz. When the DDR frequency is less then ddr3_odt_dis_freq, 132 + the ODT on the DRAM side and controller side are both disabled. 133 + 134 + rockchip,ddr3_drv: 135 + deprecated: true 136 + $ref: /schemas/types.yaml#/definitions/uint32 137 + description: 138 + When the DRAM type is DDR3, this parameter defines the DRAM side drive 139 + strength in ohms. 140 + default: 40 141 + 142 + rockchip,ddr3_odt: 143 + deprecated: true 144 + $ref: /schemas/types.yaml#/definitions/uint32 145 + description: 146 + When the DRAM type is DDR3, this parameter defines the DRAM side ODT 147 + strength in ohms. 148 + default: 120 149 + 150 + rockchip,phy_ddr3_ca_drv: 151 + deprecated: true 152 + $ref: /schemas/types.yaml#/definitions/uint32 153 + description: 154 + When the DRAM type is DDR3, this parameter defines the phy side CA line 155 + (incluing command line, address line and clock line) drive strength. 156 + default: 40 157 + 158 + rockchip,phy_ddr3_dq_drv: 159 + deprecated: true 160 + $ref: /schemas/types.yaml#/definitions/uint32 161 + description: 162 + When the DRAM type is DDR3, this parameter defines the PHY side DQ line 163 + (including DQS/DQ/DM line) drive strength. 164 + default: 40 165 + 166 + rockchip,phy_ddr3_odt: 167 + deprecated: true 168 + $ref: /schemas/types.yaml#/definitions/uint32 169 + description: 170 + When the DRAM type is DDR3, this parameter defines the PHY side ODT 171 + strength. 172 + default: 240 173 + 174 + rockchip,lpddr3_odt_dis_freq: 175 + $ref: /schemas/types.yaml#/definitions/uint32 176 + minimum: 1000000 # In case anyone thought this was MHz. 177 + description: 178 + When the DRAM type is LPDDR3, this parameter defines then ODT disable 179 + frequency in Hz. When DDR frequency is less then ddr3_odt_dis_freq, the 180 + ODT on the DRAM side and controller side are both disabled. 181 + 182 + rockchip,lpddr3_drv: 183 + deprecated: true 184 + $ref: /schemas/types.yaml#/definitions/uint32 185 + description: 186 + When the DRAM type is LPDDR3, this parameter defines the DRAM side drive 187 + strength in ohms. 188 + default: 34 189 + 190 + rockchip,lpddr3_odt: 191 + deprecated: true 192 + $ref: /schemas/types.yaml#/definitions/uint32 193 + description: 194 + When the DRAM type is LPDDR3, this parameter defines the DRAM side ODT 195 + strength in ohms. 
196 + default: 240 197 + 198 + rockchip,phy_lpddr3_ca_drv: 199 + deprecated: true 200 + $ref: /schemas/types.yaml#/definitions/uint32 201 + description: 202 + When the DRAM type is LPDDR3, this parameter defines the PHY side CA line 203 + (including command line, address line and clock line) drive strength. 204 + default: 40 205 + 206 + rockchip,phy_lpddr3_dq_drv: 207 + deprecated: true 208 + $ref: /schemas/types.yaml#/definitions/uint32 209 + description: 210 + When the DRAM type is LPDDR3, this parameter defines the PHY side DQ line 211 + (including DQS/DQ/DM line) drive strength. 212 + default: 40 213 + 214 + rockchip,phy_lpddr3_odt: 215 + deprecated: true 216 + $ref: /schemas/types.yaml#/definitions/uint32 217 + description: 218 + When dram type is LPDDR3, this parameter define the phy side odt 219 + strength, default value is 240. 220 + 221 + rockchip,lpddr4_odt_dis_freq: 222 + $ref: /schemas/types.yaml#/definitions/uint32 223 + minimum: 1000000 # In case anyone thought this was MHz. 224 + description: 225 + When the DRAM type is LPDDR4, this parameter defines the ODT disable 226 + frequency in Hz. When the DDR frequency is less then ddr3_odt_dis_freq, 227 + the ODT on the DRAM side and controller side are both disabled. 228 + 229 + rockchip,lpddr4_drv: 230 + deprecated: true 231 + $ref: /schemas/types.yaml#/definitions/uint32 232 + description: 233 + When the DRAM type is LPDDR4, this parameter defines the DRAM side drive 234 + strength in ohms. 235 + default: 60 236 + 237 + rockchip,lpddr4_dq_odt: 238 + deprecated: true 239 + $ref: /schemas/types.yaml#/definitions/uint32 240 + description: 241 + When the DRAM type is LPDDR4, this parameter defines the DRAM side ODT on 242 + DQS/DQ line strength in ohms. 243 + default: 40 244 + 245 + rockchip,lpddr4_ca_odt: 246 + deprecated: true 247 + $ref: /schemas/types.yaml#/definitions/uint32 248 + description: 249 + When the DRAM type is LPDDR4, this parameter defines the DRAM side ODT on 250 + CA line strength in ohms. 251 + default: 40 252 + 253 + rockchip,phy_lpddr4_ca_drv: 254 + deprecated: true 255 + $ref: /schemas/types.yaml#/definitions/uint32 256 + description: 257 + When the DRAM type is LPDDR4, this parameter defines the PHY side CA line 258 + (including command address line) drive strength. 259 + default: 40 260 + 261 + rockchip,phy_lpddr4_ck_cs_drv: 262 + deprecated: true 263 + $ref: /schemas/types.yaml#/definitions/uint32 264 + description: 265 + When the DRAM type is LPDDR4, this parameter defines the PHY side clock 266 + line and CS line drive strength. 267 + default: 80 268 + 269 + rockchip,phy_lpddr4_dq_drv: 270 + deprecated: true 271 + $ref: /schemas/types.yaml#/definitions/uint32 272 + description: 273 + When the DRAM type is LPDDR4, this parameter defines the PHY side DQ line 274 + (including DQS/DQ/DM line) drive strength. 275 + default: 80 276 + 277 + rockchip,phy_lpddr4_odt: 278 + deprecated: true 279 + $ref: /schemas/types.yaml#/definitions/uint32 280 + description: 281 + When the DRAM type is LPDDR4, this parameter defines the PHY side ODT 282 + strength. 283 + default: 60 284 + 285 + rockchip,pd-idle-ns: 286 + description: 287 + Configure the PD_IDLE value in nanoseconds. Defines the power-down idle 288 + period in which memories are placed into power-down mode if bus is idle 289 + for PD_IDLE nanoseconds. 290 + 291 + rockchip,sr-idle-ns: 292 + description: 293 + Configure the SR_IDLE value in nanoseconds. 
Defines the self-refresh idle 294 + period in which memories are placed into self-refresh mode if bus is idle 295 + for SR_IDLE nanoseconds. 296 + default: 0 297 + 298 + rockchip,sr-mc-gate-idle-ns: 299 + description: 300 + Defines the memory self-refresh and controller clock gating idle period in nanoseconds. 301 + Memories are placed into self-refresh mode and memory controller clock 302 + arg gating started if bus is idle for sr_mc_gate_idle nanoseconds. 303 + 304 + rockchip,srpd-lite-idle-ns: 305 + description: 306 + Defines the self-refresh power down idle period in which memories are 307 + placed into self-refresh power down mode if bus is idle for 308 + srpd_lite_idle nanoseonds. This parameter is for LPDDR4 only. 309 + 310 + rockchip,standby-idle-ns: 311 + description: 312 + Defines the standby idle period in which memories are placed into 313 + self-refresh mode. The controller, pi, PHY and DRAM clock will be gated 314 + if bus is idle for standby_idle nanoseconds. 315 + 316 + rockchip,pd-idle-dis-freq-hz: 317 + description: 318 + Defines the power-down idle disable frequency in Hz. When the DDR 319 + frequency is greater than pd-idle-dis-freq, power-down idle is disabled. 320 + See also rockchip,pd-idle-ns. 321 + 322 + rockchip,sr-idle-dis-freq-hz: 323 + description: 324 + Defines the self-refresh idle disable frequency in Hz. When the DDR 325 + frequency is greater than sr-idle-dis-freq, self-refresh idle is 326 + disabled. See also rockchip,sr-idle-ns. 327 + 328 + rockchip,sr-mc-gate-idle-dis-freq-hz: 329 + description: 330 + Defines the self-refresh and memory-controller clock gating disable 331 + frequency in Hz. When the DDR frequency is greater than 332 + sr-mc-gate-idle-dis-freq, the clock will not be gated when idle. See also 333 + rockchip,sr-mc-gate-idle-ns. 334 + 335 + rockchip,srpd-lite-idle-dis-freq-hz: 336 + description: 337 + Defines the self-refresh power down idle disable frequency in Hz. When 338 + the DDR frequency is greater than srpd-lite-idle-dis-freq, memory will 339 + not be placed into self-refresh power down mode when idle. See also 340 + rockchip,srpd-lite-idle-ns. 341 + 342 + rockchip,standby-idle-dis-freq-hz: 343 + description: 344 + Defines the standby idle disable frequency in Hz. When the DDR frequency 345 + is greater than standby-idle-dis-freq, standby idle is disabled. See also 346 + rockchip,standby-idle-ns. 
347 + 348 + required: 349 + - compatible 350 + - devfreq-events 351 + - clocks 352 + - clock-names 353 + - operating-points-v2 354 + - center-supply 355 + 356 + additionalProperties: false 357 + 358 + examples: 359 + - | 360 + #include <dt-bindings/clock/rk3399-cru.h> 361 + #include <dt-bindings/interrupt-controller/arm-gic.h> 362 + memory-controller { 363 + compatible = "rockchip,rk3399-dmc"; 364 + devfreq-events = <&dfi>; 365 + rockchip,pmu = <&pmu>; 366 + interrupts = <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>; 367 + clocks = <&cru SCLK_DDRC>; 368 + clock-names = "dmc_clk"; 369 + operating-points-v2 = <&dmc_opp_table>; 370 + center-supply = <&ppvar_centerlogic>; 371 + rockchip,pd-idle-ns = <160>; 372 + rockchip,sr-idle-ns = <10240>; 373 + rockchip,sr-mc-gate-idle-ns = <40960>; 374 + rockchip,srpd-lite-idle-ns = <61440>; 375 + rockchip,standby-idle-ns = <81920>; 376 + rockchip,ddr3_odt_dis_freq = <333000000>; 377 + rockchip,lpddr3_odt_dis_freq = <333000000>; 378 + rockchip,lpddr4_odt_dis_freq = <333000000>; 379 + rockchip,pd-idle-dis-freq-hz = <1000000000>; 380 + rockchip,sr-idle-dis-freq-hz = <1000000000>; 381 + rockchip,sr-mc-gate-idle-dis-freq-hz = <1000000000>; 382 + rockchip,srpd-lite-idle-dis-freq-hz = <0>; 383 + rockchip,standby-idle-dis-freq-hz = <928000000>; 384 + };
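
The schema above supersedes the deprecated cycle-count properties (rockchip,pd_idle and friends) with nanosecond-based ones, so the idle thresholds no longer depend on a DFI clock rate baked into the DT. Below is a minimal sketch of the consuming side, assuming a driver converts the nanosecond values back into DFI clock cycles at the current rate; the helper names are illustrative, not the actual rk3399_dmc driver code:

#include <linux/kernel.h>
#include <linux/of.h>
#include <linux/time64.h>

static u32 ns_to_dfi_cycles(u32 ns, unsigned long dfi_rate_hz)
{
	/* cycles = ns * rate / 1e9, rounded up so we never idle too early */
	return DIV_ROUND_UP_ULL((u64)ns * dfi_rate_hz, NSEC_PER_SEC);
}

static void parse_idle_ns(struct device_node *np, unsigned long dfi_rate_hz)
{
	u32 ns, cycles;

	if (!of_property_read_u32(np, "rockchip,pd-idle-ns", &ns)) {
		cycles = ns_to_dfi_cycles(ns, dfi_rate_hz);
		/* ... program PD_IDLE with 'cycles' ... */
	}
}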
+22 -2
Documentation/power/energy-model.rst
··· 123 123 (static + dynamic). These power values might be coming directly from 124 124 experiments and measurements. 125 125 126 + Registration of 'artificial' EM 127 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 128 + 129 + There is an option to provide a custom callback for drivers missing detailed 130 + knowledge about power value for each performance state. The callback 131 + .get_cost() is optional and provides the 'cost' values used by the EAS. 132 + This is useful for platforms that only provide information on relative 133 + efficiency between CPU types, where one could use the information to 134 + create an abstract power model. But even an abstract power model can 135 + sometimes be hard to fit in, given the input power value size restrictions. 136 + The .get_cost() allows to provide the 'cost' values which reflect the 137 + efficiency of the CPUs. This would allow to provide EAS information which 138 + has different relation than what would be forced by the EM internal 139 + formulas calculating 'cost' values. To register an EM for such platform, the 140 + driver must set the flag 'milliwatts' to 0, provide .get_power() callback 141 + and provide .get_cost() callback. The EM framework would handle such platform 142 + properly during registration. A flag EM_PERF_DOMAIN_ARTIFICIAL is set for such 143 + platform. Special care should be taken by other frameworks which are using EM 144 + to test and treat this flag properly. 145 + 126 146 Registration of 'simple' EM 127 147 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 128 148 ··· 201 181 202 182 -> drivers/cpufreq/foo_cpufreq.c 203 183 204 - 01 static int est_power(unsigned long *mW, unsigned long *KHz, 205 - 02 struct device *dev) 184 + 01 static int est_power(struct device *dev, unsigned long *mW, 185 + 02 unsigned long *KHz) 206 186 03 { 207 187 04 long freq, power; 208 188 05
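
The documentation hunk above both describes 'artificial' Energy Models (a .get_cost() callback supplied and the 'milliwatts' flag set to 0) and fixes the est_power() prototype to take the device pointer first. Here is a minimal registration sketch under those v5.19 signatures; the driver name, state count and power/cost numbers are made up for illustration:

#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/energy_model.h>

static int est_power(struct device *dev, unsigned long *mW, unsigned long *KHz)
{
	/*
	 * Illustrative: report a single 1 GHz state with an abstract power
	 * number. A real driver would walk its OPPs here.
	 */
	*KHz = 1000000;
	*mW = 100;	/* abstract scale: the milliwatts flag is 0 below */
	return 0;
}

static int est_cost(struct device *dev, unsigned long KHz, unsigned long *cost)
{
	/* Illustrative relative-efficiency value consumed by EAS. */
	*cost = 1024;
	return 0;
}

static struct em_data_callback em_cb = {
	.active_power = est_power,
	.get_cost = est_cost,
};

static int foo_cpufreq_init(struct cpufreq_policy *policy)
{
	struct device *cpu_dev = get_cpu_device(policy->cpu);

	/*
	 * milliwatts = false marks the EM as artificial; the framework then
	 * sets EM_PERF_DOMAIN_ARTIFICIAL as described above.
	 */
	return em_dev_register_perf_domain(cpu_dev, 4, &em_cb,
					   policy->cpus, false);
}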
+1
arch/arm64/kernel/smp.c
··· 512 512 { 513 513 return &cpu_madt_gicc[cpu]; 514 514 } 515 + EXPORT_SYMBOL_GPL(acpi_cpu_get_madt_gicc); 515 516 516 517 /* 517 518 * acpi_map_gic_cpu_interface - parse processor MADT entry
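
acpi_cpu_get_madt_gicc() simply returns the cached MADT GICC entry for a CPU; the new export makes it callable from modular code. A hedged usage sketch (the foo_ wrapper name is illustrative):

#include <linux/acpi.h>
#include <asm/acpi.h>

static void foo_show_mpidr(int cpu)
{
	struct acpi_madt_generic_interrupt *gicc = acpi_cpu_get_madt_gicc(cpu);

	if (gicc)
		pr_info("CPU%d MPIDR from MADT: 0x%llx\n", cpu,
			(unsigned long long)gicc->arm_mpidr);
}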
+1
arch/x86/include/asm/msr-index.h
··· 319 319 320 320 /* Run Time Average Power Limiting (RAPL) Interface */ 321 321 322 + #define MSR_VR_CURRENT_CONFIG 0x00000601 322 323 #define MSR_RAPL_POWER_UNIT 0x00000606 323 324 324 325 #define MSR_PKG_POWER_LIMIT 0x00000610
+1 -1
arch/x86/kernel/acpi/boot.c
··· 1862 1862 1863 1863 void __init arch_reserve_mem_area(acpi_physical_address addr, size_t size) 1864 1864 { 1865 - e820__range_add(addr, size, E820_TYPE_ACPI); 1865 + e820__range_add(addr, size, E820_TYPE_NVS); 1866 1866 e820__update_table_print(); 1867 1867 } 1868 1868
+26 -8
drivers/acpi/bus.c
··· 279 279 EXPORT_SYMBOL_GPL(osc_pc_lpi_support_confirmed); 280 280 281 281 /* 282 + * ACPI 6.2 Section 6.2.11.2 'Platform-Wide OSPM Capabilities': 283 + * Starting with ACPI Specification 6.2, all _CPC registers can be in 284 + * PCC, System Memory, System IO, or Functional Fixed Hardware address 285 + * spaces. OSPM support for this more flexible register space scheme is 286 + * indicated by the “Flexible Address Space for CPPC Registers” _OSC bit. 287 + * 288 + * Otherwise (cf ACPI 6.1, s8.4.7.1.1.X), _CPC registers must be in: 289 + * - PCC or Functional Fixed Hardware address space if defined 290 + * - SystemMemory address space (NULL register) if not defined 291 + */ 292 + bool osc_cpc_flexible_adr_space_confirmed; 293 + EXPORT_SYMBOL_GPL(osc_cpc_flexible_adr_space_confirmed); 294 + 295 + /* 282 296 * ACPI 6.4 Operating System Capabilities for USB. 283 297 */ 284 298 bool osc_sb_native_usb4_support_confirmed; ··· 329 315 #endif 330 316 #ifdef CONFIG_X86 331 317 capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_GENERIC_INITIATOR_SUPPORT; 332 - if (boot_cpu_has(X86_FEATURE_HWP)) { 333 - capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT; 334 - capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT; 335 - } 336 318 #endif 319 + 320 + #ifdef CONFIG_ACPI_CPPC_LIB 321 + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT; 322 + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT; 323 + #endif 324 + 325 + capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_FLEXIBLE_ADR_SPACE; 337 326 338 327 if (IS_ENABLED(CONFIG_SCHED_MC_PRIO)) 339 328 capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT; ··· 358 341 return; 359 342 } 360 343 361 - #ifdef CONFIG_X86 362 - if (boot_cpu_has(X86_FEATURE_HWP)) 363 - osc_sb_cppc_not_supported = !(capbuf_ret[OSC_SUPPORT_DWORD] & 364 - (OSC_SB_CPC_SUPPORT | OSC_SB_CPCV2_SUPPORT)); 344 + #ifdef CONFIG_ACPI_CPPC_LIB 345 + osc_sb_cppc_not_supported = !(capbuf_ret[OSC_SUPPORT_DWORD] & 346 + (OSC_SB_CPC_SUPPORT | OSC_SB_CPCV2_SUPPORT)); 365 347 #endif 366 348 367 349 /* ··· 382 366 capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT; 383 367 osc_sb_native_usb4_support_confirmed = 384 368 capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT; 369 + osc_cpc_flexible_adr_space_confirmed = 370 + capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_CPC_FLEXIBLE_ADR_SPACE; 385 371 } 386 372 387 373 kfree(context.ret.pointer);
+43 -1
drivers/acpi/cppc_acpi.c
··· 100 100 (cpc)->cpc_entry.reg.space_id == \ 101 101 ACPI_ADR_SPACE_PLATFORM_COMM) 102 102 103 + /* Check if a CPC register is in SystemMemory */ 104 + #define CPC_IN_SYSTEM_MEMORY(cpc) ((cpc)->type == ACPI_TYPE_BUFFER && \ 105 + (cpc)->cpc_entry.reg.space_id == \ 106 + ACPI_ADR_SPACE_SYSTEM_MEMORY) 107 + 108 + /* Check if a CPC register is in SystemIo */ 109 + #define CPC_IN_SYSTEM_IO(cpc) ((cpc)->type == ACPI_TYPE_BUFFER && \ 110 + (cpc)->cpc_entry.reg.space_id == \ 111 + ACPI_ADR_SPACE_SYSTEM_IO) 112 + 103 113 /* Evaluates to True if reg is a NULL register descriptor */ 104 114 #define IS_NULL_REG(reg) ((reg)->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY && \ 105 115 (reg)->address == 0 && \ ··· 434 424 } 435 425 EXPORT_SYMBOL_GPL(acpi_cpc_valid); 436 426 427 + bool cppc_allow_fast_switch(void) 428 + { 429 + struct cpc_register_resource *desired_reg; 430 + struct cpc_desc *cpc_ptr; 431 + int cpu; 432 + 433 + for_each_possible_cpu(cpu) { 434 + cpc_ptr = per_cpu(cpc_desc_ptr, cpu); 435 + desired_reg = &cpc_ptr->cpc_regs[DESIRED_PERF]; 436 + if (!CPC_IN_SYSTEM_MEMORY(desired_reg) && 437 + !CPC_IN_SYSTEM_IO(desired_reg)) 438 + return false; 439 + } 440 + 441 + return true; 442 + } 443 + EXPORT_SYMBOL_GPL(cppc_allow_fast_switch); 444 + 437 445 /** 438 446 * acpi_get_psd_map - Map the CPUs in the freq domain of a given cpu 439 447 * @cpu: Find all CPUs that share a domain with cpu. ··· 764 736 if (gas_t->address) { 765 737 void __iomem *addr; 766 738 739 + if (!osc_cpc_flexible_adr_space_confirmed) { 740 + pr_debug("Flexible address space capability not supported\n"); 741 + goto out_free; 742 + } 743 + 767 744 addr = ioremap(gas_t->address, gas_t->bit_width/8); 768 745 if (!addr) 769 746 goto out_free; ··· 789 756 /* SystemIO registers use 16-bit integer addresses */ 790 757 pr_debug("Invalid IO port %llu for SystemIO register in _CPC\n", 791 758 gas_t->address); 759 + goto out_free; 760 + } 761 + if (!osc_cpc_flexible_adr_space_confirmed) { 762 + pr_debug("Flexible address space capability not supported\n"); 792 763 goto out_free; 793 764 } 794 765 } else { ··· 1484 1447 * transition latency for performance change requests. The closest we have 1485 1448 * is the timing information from the PCCT tables which provides the info 1486 1449 * on the number and frequency of PCC commands the platform can handle. 1450 + * 1451 + * If desired_reg is in the SystemMemory or SystemIo ACPI address space, 1452 + * then assume there is no latency. 1487 1453 */ 1488 1454 unsigned int cppc_get_transition_latency(int cpu_num) 1489 1455 { ··· 1512 1472 return CPUFREQ_ETERNAL; 1513 1473 1514 1474 desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF]; 1515 - if (!CPC_IN_PCC(desired_reg)) 1475 + if (CPC_IN_SYSTEM_MEMORY(desired_reg) || CPC_IN_SYSTEM_IO(desired_reg)) 1476 + return 0; 1477 + else if (!CPC_IN_PCC(desired_reg)) 1516 1478 return CPUFREQ_ETERNAL; 1517 1479 1518 1480 if (pcc_ss_id < 0)
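
cppc_allow_fast_switch() returns true only when every CPU's desired-performance register lives in SystemMemory or SystemIo, i.e. when writing it cannot sleep on a PCC mailbox, and cppc_get_transition_latency() now reports zero latency for the same register placements. A sketch of the driver side, in the spirit of the 'cpufreq: CPPC: Enable fast_switch' commit in the shortlog (the foo_ driver name is illustrative):

#include <acpi/cppc_acpi.h>
#include <linux/cpufreq.h>

static int foo_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
	/*
	 * Fast switching is only safe when writing the desired performance
	 * register cannot sleep: exactly what cppc_allow_fast_switch()
	 * checks above.
	 */
	policy->fast_switch_possible = cppc_allow_fast_switch();

	/* 0 for SystemMemory/SystemIo registers, CPUFREQ_ETERNAL w/o PCCT. */
	policy->cpuinfo.transition_latency =
		cppc_get_transition_latency(policy->cpu);

	return 0;
}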
+4 -4
drivers/base/power/common.c
··· 172 172 * @dev: Device to detach. 173 173 * @power_off: Used to indicate whether we should power off the device. 174 174 * 175 - * This functions will reverse the actions from dev_pm_domain_attach() and 176 - * dev_pm_domain_attach_by_id(), thus it detaches @dev from its PM domain. 177 - * Typically it should be invoked during the remove phase, either from 178 - * subsystem level code or from drivers. 175 + * This functions will reverse the actions from dev_pm_domain_attach(), 176 + * dev_pm_domain_attach_by_id() and dev_pm_domain_attach_by_name(), thus it 177 + * detaches @dev from its PM domain. Typically it should be invoked during the 178 + * remove phase, either from subsystem level code or from drivers. 179 179 * 180 180 * Callers must ensure proper synchronization of this function with power 181 181 * management callbacks.
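
For completeness, a minimal probe/remove pairing that matches the updated kerneldoc, assuming a DT node with a power-domain named "perf" (the name and foo_ identifiers are illustrative):

#include <linux/platform_device.h>
#include <linux/pm_domain.h>

static struct device *pd_dev;

static int foo_probe(struct platform_device *pdev)
{
	/* Attach to the PM domain named "perf"; NULL means none needed. */
	pd_dev = dev_pm_domain_attach_by_name(&pdev->dev, "perf");
	if (IS_ERR(pd_dev))
		return PTR_ERR(pd_dev);
	return 0;
}

static int foo_remove(struct platform_device *pdev)
{
	/* Reverse of the attach, per the updated kerneldoc above. */
	if (pd_dev)
		dev_pm_domain_detach(pd_dev, true);
	return 0;
}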
+170 -108
drivers/base/power/domain.c
··· 131 131 #define genpd_is_cpu_domain(genpd) (genpd->flags & GENPD_FLAG_CPU_DOMAIN) 132 132 #define genpd_is_rpm_always_on(genpd) (genpd->flags & GENPD_FLAG_RPM_ALWAYS_ON) 133 133 134 - static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev, 134 + static inline bool irq_safe_dev_in_sleep_domain(struct device *dev, 135 135 const struct generic_pm_domain *genpd) 136 136 { 137 137 bool ret; ··· 139 139 ret = pm_runtime_is_irq_safe(dev) && !genpd_is_irq_safe(genpd); 140 140 141 141 /* 142 - * Warn once if an IRQ safe device is attached to a no sleep domain, as 143 - * to indicate a suboptimal configuration for PM. For an always on 144 - * domain this isn't case, thus don't warn. 142 + * Warn once if an IRQ safe device is attached to a domain, which 143 + * callbacks are allowed to sleep. This indicates a suboptimal 144 + * configuration for PM, but it doesn't matter for an always on domain. 145 145 */ 146 - if (ret && !genpd_is_always_on(genpd)) 146 + if (genpd_is_always_on(genpd) || genpd_is_rpm_always_on(genpd)) 147 + return ret; 148 + 149 + if (ret) 147 150 dev_warn_once(dev, "PM domain %s will not be powered off\n", 148 151 genpd->name); 149 152 ··· 228 225 229 226 static void genpd_update_accounting(struct generic_pm_domain *genpd) 230 227 { 231 - ktime_t delta, now; 228 + u64 delta, now; 232 229 233 - now = ktime_get(); 234 - delta = ktime_sub(now, genpd->accounting_time); 230 + now = ktime_get_mono_fast_ns(); 231 + if (now <= genpd->accounting_time) 232 + return; 233 + 234 + delta = now - genpd->accounting_time; 235 235 236 236 /* 237 237 * If genpd->status is active, it means we are just 238 238 * out of off and so update the idle time and vice 239 239 * versa. 240 240 */ 241 - if (genpd->status == GENPD_STATE_ON) { 242 - int state_idx = genpd->state_idx; 243 - 244 - genpd->states[state_idx].idle_time = 245 - ktime_add(genpd->states[state_idx].idle_time, delta); 246 - } else { 247 - genpd->on_time = ktime_add(genpd->on_time, delta); 248 - } 241 + if (genpd->status == GENPD_STATE_ON) 242 + genpd->states[genpd->state_idx].idle_time += delta; 243 + else 244 + genpd->on_time += delta; 249 245 250 246 genpd->accounting_time = now; 251 247 } ··· 478 476 */ 479 477 void dev_pm_genpd_set_next_wakeup(struct device *dev, ktime_t next) 480 478 { 481 - struct generic_pm_domain_data *gpd_data; 482 479 struct generic_pm_domain *genpd; 480 + struct gpd_timing_data *td; 483 481 484 482 genpd = dev_to_genpd_safe(dev); 485 483 if (!genpd) 486 484 return; 487 485 488 - gpd_data = to_gpd_data(dev->power.subsys_data->domain_data); 489 - gpd_data->next_wakeup = next; 486 + td = to_gpd_data(dev->power.subsys_data->domain_data)->td; 487 + if (td) 488 + td->next_wakeup = next; 490 489 } 491 490 EXPORT_SYMBOL_GPL(dev_pm_genpd_set_next_wakeup); 492 491 ··· 509 506 if (!genpd->power_on) 510 507 goto out; 511 508 509 + timed = timed && genpd->gd && !genpd->states[state_idx].fwnode; 512 510 if (!timed) { 513 511 ret = genpd->power_on(genpd); 514 512 if (ret) ··· 528 524 goto out; 529 525 530 526 genpd->states[state_idx].power_on_latency_ns = elapsed_ns; 531 - genpd->max_off_time_changed = true; 527 + genpd->gd->max_off_time_changed = true; 532 528 pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n", 533 529 genpd->name, "on", elapsed_ns); 534 530 ··· 559 555 if (!genpd->power_off) 560 556 goto out; 561 557 558 + timed = timed && genpd->gd && !genpd->states[state_idx].fwnode; 562 559 if (!timed) { 563 560 ret = genpd->power_off(genpd); 564 561 if (ret) ··· 578 573 goto out; 579 574 580 575 
genpd->states[state_idx].power_off_latency_ns = elapsed_ns; 581 - genpd->max_off_time_changed = true; 576 + genpd->gd->max_off_time_changed = true; 582 577 pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n", 583 578 genpd->name, "off", elapsed_ns); 584 579 ··· 654 649 } 655 650 656 651 list_for_each_entry(pdd, &genpd->dev_list, list_node) { 657 - enum pm_qos_flags_status stat; 658 - 659 - stat = dev_pm_qos_flags(pdd->dev, PM_QOS_FLAG_NO_POWER_OFF); 660 - if (stat > PM_QOS_FLAGS_NONE) 661 - return -EBUSY; 662 - 663 652 /* 664 653 * Do not allow PM domain to be powered off, when an IRQ safe 665 654 * device is part of a non-IRQ safe domain. 666 655 */ 667 656 if (!pm_runtime_suspended(pdd->dev) || 668 - irq_safe_dev_in_no_sleep_domain(pdd->dev, genpd)) 657 + irq_safe_dev_in_sleep_domain(pdd->dev, genpd)) 669 658 not_suspended++; 670 659 } 671 660 ··· 774 775 dev = gpd_data->base.dev; 775 776 776 777 for (;;) { 777 - struct generic_pm_domain *genpd; 778 + struct generic_pm_domain *genpd = ERR_PTR(-ENODATA); 778 779 struct pm_domain_data *pdd; 780 + struct gpd_timing_data *td; 779 781 780 782 spin_lock_irq(&dev->power.lock); 781 783 782 784 pdd = dev->power.subsys_data ? 783 785 dev->power.subsys_data->domain_data : NULL; 784 786 if (pdd) { 785 - to_gpd_data(pdd)->td.constraint_changed = true; 786 - genpd = dev_to_genpd(dev); 787 - } else { 788 - genpd = ERR_PTR(-ENODATA); 787 + td = to_gpd_data(pdd)->td; 788 + if (td) { 789 + td->constraint_changed = true; 790 + genpd = dev_to_genpd(dev); 791 + } 789 792 } 790 793 791 794 spin_unlock_irq(&dev->power.lock); 792 795 793 796 if (!IS_ERR(genpd)) { 794 797 genpd_lock(genpd); 795 - genpd->max_off_time_changed = true; 798 + genpd->gd->max_off_time_changed = true; 796 799 genpd_unlock(genpd); 797 800 } 798 801 ··· 880 879 struct generic_pm_domain *genpd; 881 880 bool (*suspend_ok)(struct device *__dev); 882 881 struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev); 883 - struct gpd_timing_data *td = &gpd_data->td; 882 + struct gpd_timing_data *td = gpd_data->td; 884 883 bool runtime_pm = pm_runtime_enabled(dev); 885 - ktime_t time_start; 884 + ktime_t time_start = 0; 886 885 s64 elapsed_ns; 887 886 int ret; 888 887 ··· 903 902 return -EBUSY; 904 903 905 904 /* Measure suspend latency. */ 906 - time_start = 0; 907 - if (runtime_pm) 905 + if (td && runtime_pm) 908 906 time_start = ktime_get(); 909 907 910 908 ret = __genpd_runtime_suspend(dev); ··· 917 917 } 918 918 919 919 /* Update suspend latency value if the measured time exceeds it. */ 920 - if (runtime_pm) { 920 + if (td && runtime_pm) { 921 921 elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)); 922 922 if (elapsed_ns > td->suspend_latency_ns) { 923 923 td->suspend_latency_ns = elapsed_ns; 924 924 dev_dbg(dev, "suspend latency exceeded, %lld ns\n", 925 925 elapsed_ns); 926 - genpd->max_off_time_changed = true; 926 + genpd->gd->max_off_time_changed = true; 927 927 td->constraint_changed = true; 928 928 } 929 929 } ··· 932 932 * If power.irq_safe is set, this routine may be run with 933 933 * IRQs disabled, so suspend only if the PM domain also is irq_safe. 
934 934 */ 935 - if (irq_safe_dev_in_no_sleep_domain(dev, genpd)) 935 + if (irq_safe_dev_in_sleep_domain(dev, genpd)) 936 936 return 0; 937 937 938 938 genpd_lock(genpd); ··· 955 955 { 956 956 struct generic_pm_domain *genpd; 957 957 struct generic_pm_domain_data *gpd_data = dev_gpd_data(dev); 958 - struct gpd_timing_data *td = &gpd_data->td; 959 - bool runtime_pm = pm_runtime_enabled(dev); 960 - ktime_t time_start; 958 + struct gpd_timing_data *td = gpd_data->td; 959 + bool timed = td && pm_runtime_enabled(dev); 960 + ktime_t time_start = 0; 961 961 s64 elapsed_ns; 962 962 int ret; 963 - bool timed = true; 964 963 965 964 dev_dbg(dev, "%s()\n", __func__); 966 965 ··· 971 972 * As we don't power off a non IRQ safe domain, which holds 972 973 * an IRQ safe device, we don't need to restore power to it. 973 974 */ 974 - if (irq_safe_dev_in_no_sleep_domain(dev, genpd)) { 975 - timed = false; 975 + if (irq_safe_dev_in_sleep_domain(dev, genpd)) 976 976 goto out; 977 - } 978 977 979 978 genpd_lock(genpd); 980 979 ret = genpd_power_on(genpd, 0); ··· 985 988 986 989 out: 987 990 /* Measure resume latency. */ 988 - time_start = 0; 989 - if (timed && runtime_pm) 991 + if (timed) 990 992 time_start = ktime_get(); 991 993 992 994 ret = genpd_start_dev(genpd, dev); ··· 997 1001 goto err_stop; 998 1002 999 1003 /* Update resume latency value if the measured time exceeds it. */ 1000 - if (timed && runtime_pm) { 1004 + if (timed) { 1001 1005 elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)); 1002 1006 if (elapsed_ns > td->resume_latency_ns) { 1003 1007 td->resume_latency_ns = elapsed_ns; 1004 1008 dev_dbg(dev, "resume latency exceeded, %lld ns\n", 1005 1009 elapsed_ns); 1006 - genpd->max_off_time_changed = true; 1010 + genpd->gd->max_off_time_changed = true; 1007 1011 td->constraint_changed = true; 1008 1012 } 1009 1013 } ··· 1496 1500 1497 1501 #endif /* CONFIG_PM_SLEEP */ 1498 1502 1499 - static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev) 1503 + static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, 1504 + bool has_governor) 1500 1505 { 1501 1506 struct generic_pm_domain_data *gpd_data; 1507 + struct gpd_timing_data *td; 1502 1508 int ret; 1503 1509 1504 1510 ret = dev_pm_get_subsys_data(dev); ··· 1514 1516 } 1515 1517 1516 1518 gpd_data->base.dev = dev; 1517 - gpd_data->td.constraint_changed = true; 1518 - gpd_data->td.effective_constraint_ns = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS; 1519 1519 gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; 1520 - gpd_data->next_wakeup = KTIME_MAX; 1520 + 1521 + /* Allocate data used by a governor. 
*/ 1522 + if (has_governor) { 1523 + td = kzalloc(sizeof(*td), GFP_KERNEL); 1524 + if (!td) { 1525 + ret = -ENOMEM; 1526 + goto err_free; 1527 + } 1528 + 1529 + td->constraint_changed = true; 1530 + td->effective_constraint_ns = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS; 1531 + td->next_wakeup = KTIME_MAX; 1532 + gpd_data->td = td; 1533 + } 1521 1534 1522 1535 spin_lock_irq(&dev->power.lock); 1523 1536 1524 - if (dev->power.subsys_data->domain_data) { 1537 + if (dev->power.subsys_data->domain_data) 1525 1538 ret = -EINVAL; 1526 - goto err_free; 1527 - } 1528 - 1529 - dev->power.subsys_data->domain_data = &gpd_data->base; 1539 + else 1540 + dev->power.subsys_data->domain_data = &gpd_data->base; 1530 1541 1531 1542 spin_unlock_irq(&dev->power.lock); 1543 + 1544 + if (ret) 1545 + goto err_free; 1532 1546 1533 1547 return gpd_data; 1534 1548 1535 1549 err_free: 1536 - spin_unlock_irq(&dev->power.lock); 1550 + kfree(gpd_data->td); 1537 1551 kfree(gpd_data); 1538 1552 err_put: 1539 1553 dev_pm_put_subsys_data(dev); ··· 1561 1551 1562 1552 spin_unlock_irq(&dev->power.lock); 1563 1553 1554 + kfree(gpd_data->td); 1564 1555 kfree(gpd_data); 1565 1556 dev_pm_put_subsys_data(dev); 1566 1557 } ··· 1618 1607 static int genpd_add_device(struct generic_pm_domain *genpd, struct device *dev, 1619 1608 struct device *base_dev) 1620 1609 { 1610 + struct genpd_governor_data *gd = genpd->gd; 1621 1611 struct generic_pm_domain_data *gpd_data; 1622 1612 int ret; 1623 1613 ··· 1627 1615 if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev)) 1628 1616 return -EINVAL; 1629 1617 1630 - gpd_data = genpd_alloc_dev_data(dev); 1618 + gpd_data = genpd_alloc_dev_data(dev, gd); 1631 1619 if (IS_ERR(gpd_data)) 1632 1620 return PTR_ERR(gpd_data); 1633 1621 ··· 1643 1631 dev_pm_domain_set(dev, &genpd->domain); 1644 1632 1645 1633 genpd->device_count++; 1646 - genpd->max_off_time_changed = true; 1634 + if (gd) 1635 + gd->max_off_time_changed = true; 1647 1636 1648 1637 list_add_tail(&gpd_data->base.list_node, &genpd->dev_list); 1649 1638 ··· 1698 1685 } 1699 1686 1700 1687 genpd->device_count--; 1701 - genpd->max_off_time_changed = true; 1688 + if (genpd->gd) 1689 + genpd->gd->max_off_time_changed = true; 1702 1690 1703 1691 genpd_clear_cpumask(genpd, gpd_data->cpu); 1704 1692 dev_pm_domain_set(dev, NULL); ··· 1972 1958 return 0; 1973 1959 } 1974 1960 1961 + static int genpd_alloc_data(struct generic_pm_domain *genpd) 1962 + { 1963 + struct genpd_governor_data *gd = NULL; 1964 + int ret; 1965 + 1966 + if (genpd_is_cpu_domain(genpd) && 1967 + !zalloc_cpumask_var(&genpd->cpus, GFP_KERNEL)) 1968 + return -ENOMEM; 1969 + 1970 + if (genpd->gov) { 1971 + gd = kzalloc(sizeof(*gd), GFP_KERNEL); 1972 + if (!gd) { 1973 + ret = -ENOMEM; 1974 + goto free; 1975 + } 1976 + 1977 + gd->max_off_time_ns = -1; 1978 + gd->max_off_time_changed = true; 1979 + gd->next_wakeup = KTIME_MAX; 1980 + } 1981 + 1982 + /* Use only one "off" state if there were no states declared */ 1983 + if (genpd->state_count == 0) { 1984 + ret = genpd_set_default_power_state(genpd); 1985 + if (ret) 1986 + goto free; 1987 + } 1988 + 1989 + genpd->gd = gd; 1990 + return 0; 1991 + 1992 + free: 1993 + if (genpd_is_cpu_domain(genpd)) 1994 + free_cpumask_var(genpd->cpus); 1995 + kfree(gd); 1996 + return ret; 1997 + } 1998 + 1999 + static void genpd_free_data(struct generic_pm_domain *genpd) 2000 + { 2001 + if (genpd_is_cpu_domain(genpd)) 2002 + free_cpumask_var(genpd->cpus); 2003 + if (genpd->free_states) 2004 + genpd->free_states(genpd->states, genpd->state_count); 2005 + 
kfree(genpd->gd); 2006 + } 2007 + 1975 2008 static void genpd_lock_init(struct generic_pm_domain *genpd) 1976 2009 { 1977 2010 if (genpd->flags & GENPD_FLAG_IRQ_SAFE) { ··· 2056 1995 atomic_set(&genpd->sd_count, 0); 2057 1996 genpd->status = is_off ? GENPD_STATE_OFF : GENPD_STATE_ON; 2058 1997 genpd->device_count = 0; 2059 - genpd->max_off_time_ns = -1; 2060 - genpd->max_off_time_changed = true; 2061 1998 genpd->provider = NULL; 2062 1999 genpd->has_provider = false; 2063 - genpd->accounting_time = ktime_get(); 2000 + genpd->accounting_time = ktime_get_mono_fast_ns(); 2064 2001 genpd->domain.ops.runtime_suspend = genpd_runtime_suspend; 2065 2002 genpd->domain.ops.runtime_resume = genpd_runtime_resume; 2066 2003 genpd->domain.ops.prepare = genpd_prepare; ··· 2076 2017 genpd->dev_ops.start = pm_clk_resume; 2077 2018 } 2078 2019 2020 + /* The always-on governor works better with the corresponding flag. */ 2021 + if (gov == &pm_domain_always_on_gov) 2022 + genpd->flags |= GENPD_FLAG_RPM_ALWAYS_ON; 2023 + 2079 2024 /* Always-on domains must be powered on at initialization. */ 2080 2025 if ((genpd_is_always_on(genpd) || genpd_is_rpm_always_on(genpd)) && 2081 2026 !genpd_status_on(genpd)) 2082 2027 return -EINVAL; 2083 2028 2084 - if (genpd_is_cpu_domain(genpd) && 2085 - !zalloc_cpumask_var(&genpd->cpus, GFP_KERNEL)) 2086 - return -ENOMEM; 2087 - 2088 - /* Use only one "off" state if there were no states declared */ 2089 - if (genpd->state_count == 0) { 2090 - ret = genpd_set_default_power_state(genpd); 2091 - if (ret) { 2092 - if (genpd_is_cpu_domain(genpd)) 2093 - free_cpumask_var(genpd->cpus); 2094 - return ret; 2095 - } 2096 - } else if (!gov && genpd->state_count > 1) { 2029 + /* Multiple states but no governor doesn't make sense. */ 2030 + if (!gov && genpd->state_count > 1) 2097 2031 pr_warn("%s: no governor for states\n", genpd->name); 2098 - } 2032 + 2033 + ret = genpd_alloc_data(genpd); 2034 + if (ret) 2035 + return ret; 2099 2036 2100 2037 device_initialize(&genpd->dev); 2101 2038 dev_set_name(&genpd->dev, "%s", genpd->name); ··· 2136 2081 genpd_unlock(genpd); 2137 2082 genpd_debug_remove(genpd); 2138 2083 cancel_work_sync(&genpd->power_off_work); 2139 - if (genpd_is_cpu_domain(genpd)) 2140 - free_cpumask_var(genpd->cpus); 2141 - if (genpd->free_states) 2142 - genpd->free_states(genpd->states, genpd->state_count); 2084 + genpd_free_data(genpd); 2143 2085 2144 2086 pr_debug("%s: removed %s\n", __func__, genpd->name); 2145 2087 ··· 3215 3163 static int idle_states_show(struct seq_file *s, void *data) 3216 3164 { 3217 3165 struct generic_pm_domain *genpd = s->private; 3166 + u64 now, delta, idle_time = 0; 3218 3167 unsigned int i; 3219 3168 int ret = 0; 3220 3169 ··· 3226 3173 seq_puts(s, "State Time Spent(ms) Usage Rejected\n"); 3227 3174 3228 3175 for (i = 0; i < genpd->state_count; i++) { 3229 - ktime_t delta = 0; 3230 - s64 msecs; 3176 + idle_time += genpd->states[i].idle_time; 3231 3177 3232 - if ((genpd->status == GENPD_STATE_OFF) && 3233 - (genpd->state_idx == i)) 3234 - delta = ktime_sub(ktime_get(), genpd->accounting_time); 3178 + if (genpd->status == GENPD_STATE_OFF && genpd->state_idx == i) { 3179 + now = ktime_get_mono_fast_ns(); 3180 + if (now > genpd->accounting_time) { 3181 + delta = now - genpd->accounting_time; 3182 + idle_time += delta; 3183 + } 3184 + } 3235 3185 3236 - msecs = ktime_to_ms( 3237 - ktime_add(genpd->states[i].idle_time, delta)); 3238 - seq_printf(s, "S%-13i %-14lld %-14llu %llu\n", i, msecs, 3239 - genpd->states[i].usage, genpd->states[i].rejected); 3186 
+ do_div(idle_time, NSEC_PER_MSEC); 3187 + seq_printf(s, "S%-13i %-14llu %-14llu %llu\n", i, idle_time, 3188 + genpd->states[i].usage, genpd->states[i].rejected); 3240 3189 } 3241 3190 3242 3191 genpd_unlock(genpd); ··· 3248 3193 static int active_time_show(struct seq_file *s, void *data) 3249 3194 { 3250 3195 struct generic_pm_domain *genpd = s->private; 3251 - ktime_t delta = 0; 3196 + u64 now, on_time, delta = 0; 3252 3197 int ret = 0; 3253 3198 3254 3199 ret = genpd_lock_interruptible(genpd); 3255 3200 if (ret) 3256 3201 return -ERESTARTSYS; 3257 3202 3258 - if (genpd->status == GENPD_STATE_ON) 3259 - delta = ktime_sub(ktime_get(), genpd->accounting_time); 3203 + if (genpd->status == GENPD_STATE_ON) { 3204 + now = ktime_get_mono_fast_ns(); 3205 + if (now > genpd->accounting_time) 3206 + delta = now - genpd->accounting_time; 3207 + } 3260 3208 3261 - seq_printf(s, "%lld ms\n", ktime_to_ms( 3262 - ktime_add(genpd->on_time, delta))); 3209 + on_time = genpd->on_time + delta; 3210 + do_div(on_time, NSEC_PER_MSEC); 3211 + seq_printf(s, "%llu ms\n", on_time); 3263 3212 3264 3213 genpd_unlock(genpd); 3265 3214 return ret; ··· 3272 3213 static int total_idle_time_show(struct seq_file *s, void *data) 3273 3214 { 3274 3215 struct generic_pm_domain *genpd = s->private; 3275 - ktime_t delta = 0, total = 0; 3216 + u64 now, delta, total = 0; 3276 3217 unsigned int i; 3277 3218 int ret = 0; 3278 3219 ··· 3281 3222 return -ERESTARTSYS; 3282 3223 3283 3224 for (i = 0; i < genpd->state_count; i++) { 3225 + total += genpd->states[i].idle_time; 3284 3226 3285 - if ((genpd->status == GENPD_STATE_OFF) && 3286 - (genpd->state_idx == i)) 3287 - delta = ktime_sub(ktime_get(), genpd->accounting_time); 3288 - 3289 - total = ktime_add(total, genpd->states[i].idle_time); 3227 + if (genpd->status == GENPD_STATE_OFF && genpd->state_idx == i) { 3228 + now = ktime_get_mono_fast_ns(); 3229 + if (now > genpd->accounting_time) { 3230 + delta = now - genpd->accounting_time; 3231 + total += delta; 3232 + } 3233 + } 3290 3234 } 3291 - total = ktime_add(total, delta); 3292 3235 3293 - seq_printf(s, "%lld ms\n", ktime_to_ms(total)); 3236 + do_div(total, NSEC_PER_MSEC); 3237 + seq_printf(s, "%llu ms\n", total); 3294 3238 3295 3239 genpd_unlock(genpd); 3296 3240 return ret;
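
The accounting rework above swaps ktime_t arithmetic for plain u64 nanoseconds sampled with ktime_get_mono_fast_ns(), which is NMI-safe but lockless and so may not have advanced between two samples; hence the now <= accounting_time guards. The pattern in isolation, as a hedged sketch:

#include <linux/ktime.h>
#include <linux/math64.h>

struct acct {
	u64 last_ns;
	u64 total_ns;
};

static void acct_update(struct acct *a)
{
	u64 now = ktime_get_mono_fast_ns();

	if (now <= a->last_ns)	/* the fast clock may not have advanced */
		return;

	a->total_ns += now - a->last_ns;
	a->last_ns = now;
}

static u64 acct_ms(struct acct *a)
{
	u64 ms = a->total_ns;

	do_div(ms, NSEC_PER_MSEC);	/* 64-by-32 divide, 32-bit safe */
	return ms;
}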
+35 -30
drivers/base/power/domain_governor.c
··· 18 18 s64 constraint_ns; 19 19 20 20 if (dev->power.subsys_data && dev->power.subsys_data->domain_data) { 21 + struct gpd_timing_data *td = dev_gpd_data(dev)->td; 22 + 21 23 /* 22 24 * Only take suspend-time QoS constraints of devices into 23 25 * account, because constraints updated after the device has ··· 27 25 * anyway. In order for them to take effect, the device has to 28 26 * be resumed and suspended again. 29 27 */ 30 - constraint_ns = dev_gpd_data(dev)->td.effective_constraint_ns; 28 + constraint_ns = td ? td->effective_constraint_ns : 29 + PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS; 31 30 } else { 32 31 /* 33 32 * The child is not in a domain and there's no info on its ··· 52 49 */ 53 50 static bool default_suspend_ok(struct device *dev) 54 51 { 55 - struct gpd_timing_data *td = &dev_gpd_data(dev)->td; 52 + struct gpd_timing_data *td = dev_gpd_data(dev)->td; 56 53 unsigned long flags; 57 54 s64 constraint_ns; 58 55 ··· 139 136 * is able to enter its optimal idle state. 140 137 */ 141 138 list_for_each_entry(pdd, &genpd->dev_list, list_node) { 142 - next_wakeup = to_gpd_data(pdd)->next_wakeup; 139 + next_wakeup = to_gpd_data(pdd)->td->next_wakeup; 143 140 if (next_wakeup != KTIME_MAX && !ktime_before(next_wakeup, now)) 144 141 if (ktime_before(next_wakeup, domain_wakeup)) 145 142 domain_wakeup = next_wakeup; 146 143 } 147 144 148 145 list_for_each_entry(link, &genpd->parent_links, parent_node) { 149 - next_wakeup = link->child->next_wakeup; 146 + struct genpd_governor_data *cgd = link->child->gd; 147 + 148 + next_wakeup = cgd ? cgd->next_wakeup : KTIME_MAX; 150 149 if (next_wakeup != KTIME_MAX && !ktime_before(next_wakeup, now)) 151 150 if (ktime_before(next_wakeup, domain_wakeup)) 152 151 domain_wakeup = next_wakeup; 153 152 } 154 153 155 - genpd->next_wakeup = domain_wakeup; 154 + genpd->gd->next_wakeup = domain_wakeup; 156 155 } 157 156 158 157 static bool next_wakeup_allows_state(struct generic_pm_domain *genpd, 159 158 unsigned int state, ktime_t now) 160 159 { 161 - ktime_t domain_wakeup = genpd->next_wakeup; 160 + ktime_t domain_wakeup = genpd->gd->next_wakeup; 162 161 s64 idle_time_ns, min_sleep_ns; 163 162 164 163 min_sleep_ns = genpd->states[state].power_off_latency_ns + ··· 190 185 * All subdomains have been powered off already at this point. 191 186 */ 192 187 list_for_each_entry(link, &genpd->parent_links, parent_node) { 193 - struct generic_pm_domain *sd = link->child; 194 - s64 sd_max_off_ns = sd->max_off_time_ns; 188 + struct genpd_governor_data *cgd = link->child->gd; 189 + 190 + s64 sd_max_off_ns = cgd ? cgd->max_off_time_ns : -1; 195 191 196 192 if (sd_max_off_ns < 0) 197 193 continue; ··· 221 215 * domain to turn off and on (that's how much time it will 222 216 * have to wait worst case). 223 217 */ 224 - td = &to_gpd_data(pdd)->td; 218 + td = to_gpd_data(pdd)->td; 225 219 constraint_ns = td->effective_constraint_ns; 226 220 /* 227 221 * Zero means "no suspend at all" and this runs only when all ··· 250 244 * time and the time needed to turn the domain on is the maximum 251 245 * theoretical time this domain can spend in the "off" state. 
252 246 */ 253 - genpd->max_off_time_ns = min_off_time_ns - 247 + genpd->gd->max_off_time_ns = min_off_time_ns - 254 248 genpd->states[state].power_on_latency_ns; 255 249 return true; 256 250 } ··· 265 259 static bool _default_power_down_ok(struct dev_pm_domain *pd, ktime_t now) 266 260 { 267 261 struct generic_pm_domain *genpd = pd_to_genpd(pd); 262 + struct genpd_governor_data *gd = genpd->gd; 268 263 int state_idx = genpd->state_count - 1; 269 264 struct gpd_link *link; 270 265 ··· 276 269 * cannot be met. 277 270 */ 278 271 update_domain_next_wakeup(genpd, now); 279 - if ((genpd->flags & GENPD_FLAG_MIN_RESIDENCY) && (genpd->next_wakeup != KTIME_MAX)) { 272 + if ((genpd->flags & GENPD_FLAG_MIN_RESIDENCY) && (gd->next_wakeup != KTIME_MAX)) { 280 273 /* Let's find out the deepest domain idle state, the devices prefer */ 281 274 while (state_idx >= 0) { 282 275 if (next_wakeup_allows_state(genpd, state_idx, now)) { 283 - genpd->max_off_time_changed = true; 276 + gd->max_off_time_changed = true; 284 277 break; 285 278 } 286 279 state_idx--; ··· 288 281 289 282 if (state_idx < 0) { 290 283 state_idx = 0; 291 - genpd->cached_power_down_ok = false; 284 + gd->cached_power_down_ok = false; 292 285 goto done; 293 286 } 294 287 } 295 288 296 - if (!genpd->max_off_time_changed) { 297 - genpd->state_idx = genpd->cached_power_down_state_idx; 298 - return genpd->cached_power_down_ok; 289 + if (!gd->max_off_time_changed) { 290 + genpd->state_idx = gd->cached_power_down_state_idx; 291 + return gd->cached_power_down_ok; 299 292 } 300 293 301 294 /* ··· 304 297 * going to be called for any parent until this instance 305 298 * returns. 306 299 */ 307 - list_for_each_entry(link, &genpd->child_links, child_node) 308 - link->parent->max_off_time_changed = true; 300 + list_for_each_entry(link, &genpd->child_links, child_node) { 301 + struct genpd_governor_data *pgd = link->parent->gd; 309 302 310 - genpd->max_off_time_ns = -1; 311 - genpd->max_off_time_changed = false; 312 - genpd->cached_power_down_ok = true; 303 + if (pgd) 304 + pgd->max_off_time_changed = true; 305 + } 306 + 307 + gd->max_off_time_ns = -1; 308 + gd->max_off_time_changed = false; 309 + gd->cached_power_down_ok = true; 313 310 314 311 /* 315 312 * Find a state to power down to, starting from the state ··· 321 310 */ 322 311 while (!__default_power_down_ok(pd, state_idx)) { 323 312 if (state_idx == 0) { 324 - genpd->cached_power_down_ok = false; 313 + gd->cached_power_down_ok = false; 325 314 break; 326 315 } 327 316 state_idx--; ··· 329 318 330 319 done: 331 320 genpd->state_idx = state_idx; 332 - genpd->cached_power_down_state_idx = genpd->state_idx; 333 - return genpd->cached_power_down_ok; 321 + gd->cached_power_down_state_idx = genpd->state_idx; 322 + return gd->cached_power_down_ok; 334 323 } 335 324 336 325 static bool default_power_down_ok(struct dev_pm_domain *pd) 337 326 { 338 327 return _default_power_down_ok(pd, ktime_get()); 339 - } 340 - 341 - static bool always_on_power_down_ok(struct dev_pm_domain *domain) 342 - { 343 - return false; 344 328 } 345 329 346 330 #ifdef CONFIG_CPU_IDLE ··· 407 401 * pm_genpd_gov_always_on - A governor implementing an always-on policy 408 402 */ 409 403 struct dev_power_governor pm_domain_always_on_gov = { 410 - .power_down_ok = always_on_power_down_ok, 411 404 .suspend_ok = default_suspend_ok, 412 405 };
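The governor rework above moves all governor-only fields behind optional pointers (genpd->gd for the domain, the per-device td for timing data) that exist only while a governor is attached, so every access now needs a NULL check with a conservative fallback. An illustrative pattern mirroring the change (the helper name is made up):

	/* Fall back to "no constraint" when no timing data was allocated. */
	static s64 dev_constraint_ns_example(struct gpd_timing_data *td)
	{
		return td ? td->effective_constraint_ns :
			    PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS;
	}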
+42 -11
drivers/base/power/runtime.c
··· 263 263 retval = -EINVAL; 264 264 else if (dev->power.disable_depth > 0) 265 265 retval = -EACCES; 266 - else if (atomic_read(&dev->power.usage_count) > 0) 266 + else if (atomic_read(&dev->power.usage_count)) 267 267 retval = -EAGAIN; 268 268 else if (!dev->power.ignore_children && 269 269 atomic_read(&dev->power.child_count)) ··· 1039 1039 } 1040 1040 EXPORT_SYMBOL_GPL(pm_schedule_suspend); 1041 1041 1042 + static int rpm_drop_usage_count(struct device *dev) 1043 + { 1044 + int ret; 1045 + 1046 + ret = atomic_sub_return(1, &dev->power.usage_count); 1047 + if (ret >= 0) 1048 + return ret; 1049 + 1050 + /* 1051 + * Because rpm_resume() does not check the usage counter, it will resume 1052 + * the device even if the usage counter is 0 or negative, so it is 1053 + * sufficient to increment the usage counter here to reverse the change 1054 + * made above. 1055 + */ 1056 + atomic_inc(&dev->power.usage_count); 1057 + dev_warn(dev, "Runtime PM usage count underflow!\n"); 1058 + return -EINVAL; 1059 + } 1060 + 1042 1061 /** 1043 1062 * __pm_runtime_idle - Entry point for runtime idle operations. 1044 1063 * @dev: Device to send idle notification for. 1045 1064 * @rpmflags: Flag bits. 1046 1065 * 1047 1066 * If the RPM_GET_PUT flag is set, decrement the device's usage count and 1048 - * return immediately if it is larger than zero. Then carry out an idle 1067 + * return immediately if it is larger than zero (if it becomes negative, log a 1068 + * warning, increment it, and return an error). Then carry out an idle 1049 1069 * notification, either synchronous or asynchronous. 1050 1070 * 1051 1071 * This routine may be called in atomic context if the RPM_ASYNC flag is set, ··· 1077 1057 int retval; 1078 1058 1079 1059 if (rpmflags & RPM_GET_PUT) { 1080 - if (!atomic_dec_and_test(&dev->power.usage_count)) { 1060 + retval = rpm_drop_usage_count(dev); 1061 + if (retval < 0) { 1062 + return retval; 1063 + } else if (retval > 0) { 1081 1064 trace_rpm_usage_rcuidle(dev, rpmflags); 1082 1065 return 0; 1083 1066 } ··· 1102 1079 * @rpmflags: Flag bits. 1103 1080 * 1104 1081 * If the RPM_GET_PUT flag is set, decrement the device's usage count and 1105 - * return immediately if it is larger than zero. Then carry out a suspend, 1082 + * return immediately if it is larger than zero (if it becomes negative, log a 1083 + * warning, increment it, and return an error). Then carry out a suspend, 1106 1084 * either synchronous or asynchronous. 
1107 1085 * 1108 1086 * This routine may be called in atomic context if the RPM_ASYNC flag is set, ··· 1115 1091 int retval; 1116 1092 1117 1093 if (rpmflags & RPM_GET_PUT) { 1118 - if (!atomic_dec_and_test(&dev->power.usage_count)) { 1094 + retval = rpm_drop_usage_count(dev); 1095 + if (retval < 0) { 1096 + return retval; 1097 + } else if (retval > 0) { 1119 1098 trace_rpm_usage_rcuidle(dev, rpmflags); 1120 1099 return 0; 1121 1100 } ··· 1237 1210 { 1238 1211 struct device *parent = dev->parent; 1239 1212 bool notify_parent = false; 1213 + unsigned long flags; 1240 1214 int error = 0; 1241 1215 1242 1216 if (status != RPM_ACTIVE && status != RPM_SUSPENDED) 1243 1217 return -EINVAL; 1244 1218 1245 - spin_lock_irq(&dev->power.lock); 1219 + spin_lock_irqsave(&dev->power.lock, flags); 1246 1220 1247 1221 /* 1248 1222 * Prevent PM-runtime from being enabled for the device or return an ··· 1254 1226 else 1255 1227 error = -EAGAIN; 1256 1228 1257 - spin_unlock_irq(&dev->power.lock); 1229 + spin_unlock_irqrestore(&dev->power.lock, flags); 1258 1230 1259 1231 if (error) 1260 1232 return error; ··· 1275 1247 device_links_read_unlock(idx); 1276 1248 } 1277 1249 1278 - spin_lock_irq(&dev->power.lock); 1250 + spin_lock_irqsave(&dev->power.lock, flags); 1279 1251 1280 1252 if (dev->power.runtime_status == status || !parent) 1281 1253 goto out_set; ··· 1316 1288 dev->power.runtime_error = 0; 1317 1289 1318 1290 out: 1319 - spin_unlock_irq(&dev->power.lock); 1291 + spin_unlock_irqrestore(&dev->power.lock, flags); 1320 1292 1321 1293 if (notify_parent) 1322 1294 pm_request_idle(parent); ··· 1555 1527 */ 1556 1528 void pm_runtime_allow(struct device *dev) 1557 1529 { 1530 + int ret; 1531 + 1558 1532 spin_lock_irq(&dev->power.lock); 1559 1533 if (dev->power.runtime_auto) 1560 1534 goto out; 1561 1535 1562 1536 dev->power.runtime_auto = true; 1563 - if (atomic_dec_and_test(&dev->power.usage_count)) 1537 + ret = rpm_drop_usage_count(dev); 1538 + if (ret == 0) 1564 1539 rpm_idle(dev, RPM_AUTO | RPM_ASYNC); 1565 - else 1540 + else if (ret > 0) 1566 1541 trace_rpm_usage_rcuidle(dev, RPM_AUTO | RPM_ASYNC); 1567 1542 1568 1543 out:
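The key to the underflow fix above is that atomic_dec_and_test() only reports whether the counter hit zero, so an unbalanced pm_runtime_put() that drove the usage count negative used to go unnoticed. atomic_sub_return() exposes the new value, which lets the error be detected, reverted and reported. A minimal sketch of the idea (the helper name is made up):

	static int drop_usage_count_example(atomic_t *count)
	{
		int ret = atomic_sub_return(1, count);

		if (ret >= 0)
			return ret;	/* 0 means the last reference is gone */

		atomic_inc(count);	/* undo the underflow */
		return -EINVAL;
	}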
+211
drivers/cpufreq/cppc_cpufreq.c
··· 389 389 return ret; 390 390 } 391 391 392 + static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy, 393 + unsigned int target_freq) 394 + { 395 + struct cppc_cpudata *cpu_data = policy->driver_data; 396 + unsigned int cpu = policy->cpu; 397 + u32 desired_perf; 398 + int ret; 399 + 400 + desired_perf = cppc_cpufreq_khz_to_perf(cpu_data, target_freq); 401 + cpu_data->perf_ctrls.desired_perf = desired_perf; 402 + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); 403 + 404 + if (ret) { 405 + pr_debug("Failed to set target on CPU:%d. ret:%d\n", 406 + cpu, ret); 407 + return 0; 408 + } 409 + 410 + return target_freq; 411 + } 412 + 392 413 static int cppc_verify_policy(struct cpufreq_policy_data *policy) 393 414 { 394 415 cpufreq_verify_within_cpu_limits(policy); ··· 441 420 return cppc_get_transition_latency(cpu) / NSEC_PER_USEC; 442 421 } 443 422 423 + static DEFINE_PER_CPU(unsigned int, efficiency_class); 424 + static void cppc_cpufreq_register_em(struct cpufreq_policy *policy); 425 + 426 + /* Create an artificial performance state every CPPC_EM_CAP_STEP capacity unit. */ 427 + #define CPPC_EM_CAP_STEP (20) 428 + /* Increase the cost value by CPPC_EM_COST_STEP every performance state. */ 429 + #define CPPC_EM_COST_STEP (1) 430 + /* Add a cost gap corresponding to the energy of 4 CPUs. */ 431 + #define CPPC_EM_COST_GAP (4 * SCHED_CAPACITY_SCALE * CPPC_EM_COST_STEP \ 432 + / CPPC_EM_CAP_STEP) 433 + 434 + static unsigned int get_perf_level_count(struct cpufreq_policy *policy) 435 + { 436 + struct cppc_perf_caps *perf_caps; 437 + unsigned int min_cap, max_cap; 438 + struct cppc_cpudata *cpu_data; 439 + int cpu = policy->cpu; 440 + 441 + cpu_data = policy->driver_data; 442 + perf_caps = &cpu_data->perf_caps; 443 + max_cap = arch_scale_cpu_capacity(cpu); 444 + min_cap = div_u64(max_cap * perf_caps->lowest_perf, perf_caps->highest_perf); 445 + if ((min_cap == 0) || (max_cap < min_cap)) 446 + return 0; 447 + return 1 + max_cap / CPPC_EM_CAP_STEP - min_cap / CPPC_EM_CAP_STEP; 448 + } 449 + 450 + /* 451 + * The cost is defined as: 452 + * cost = power * max_frequency / frequency 453 + */ 454 + static inline unsigned long compute_cost(int cpu, int step) 455 + { 456 + return CPPC_EM_COST_GAP * per_cpu(efficiency_class, cpu) + 457 + step * CPPC_EM_COST_STEP; 458 + } 459 + 460 + static int cppc_get_cpu_power(struct device *cpu_dev, 461 + unsigned long *power, unsigned long *KHz) 462 + { 463 + unsigned long perf_step, perf_prev, perf, perf_check; 464 + unsigned int min_step, max_step, step, step_check; 465 + unsigned long prev_freq = *KHz; 466 + unsigned int min_cap, max_cap; 467 + struct cpufreq_policy *policy; 468 + 469 + struct cppc_perf_caps *perf_caps; 470 + struct cppc_cpudata *cpu_data; 471 + 472 + policy = cpufreq_cpu_get_raw(cpu_dev->id); 473 + cpu_data = policy->driver_data; 474 + perf_caps = &cpu_data->perf_caps; 475 + max_cap = arch_scale_cpu_capacity(cpu_dev->id); 476 + min_cap = div_u64(max_cap * perf_caps->lowest_perf, 477 + perf_caps->highest_perf); 478 + 479 + perf_step = CPPC_EM_CAP_STEP * perf_caps->highest_perf / max_cap; 480 + min_step = min_cap / CPPC_EM_CAP_STEP; 481 + max_step = max_cap / CPPC_EM_CAP_STEP; 482 + 483 + perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, *KHz); 484 + step = perf_prev / perf_step; 485 + 486 + if (step > max_step) 487 + return -EINVAL; 488 + 489 + if (min_step == max_step) { 490 + step = max_step; 491 + perf = perf_caps->highest_perf; 492 + } else if (step < min_step) { 493 + step = min_step; 494 + perf = perf_caps->lowest_perf; 495 + } else {
496 + step++; 497 + if (step == max_step) 498 + perf = perf_caps->highest_perf; 499 + else 500 + perf = step * perf_step; 501 + } 502 + 503 + *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf); 504 + perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz); 505 + step_check = perf_check / perf_step; 506 + 507 + /* 508 + * To avoid bad integer approximation, check that the new frequency value 509 + * increased and that the new frequency will be converted to the 510 + * desired step value. 511 + */ 512 + while ((*KHz == prev_freq) || (step_check != step)) { 513 + perf++; 514 + *KHz = cppc_cpufreq_perf_to_khz(cpu_data, perf); 515 + perf_check = cppc_cpufreq_khz_to_perf(cpu_data, *KHz); 516 + step_check = perf_check / perf_step; 517 + } 518 + 519 + /* 520 + * With an artificial EM, only the cost value is used. Still the power 521 + * is populated such that 0 < power < EM_MAX_POWER. This allows us to add 522 + * more sense to the artificial performance states. 523 + */ 524 + *power = compute_cost(cpu_dev->id, step); 525 + 526 + return 0; 527 + } 528 + 529 + static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz, 530 + unsigned long *cost) 531 + { 532 + unsigned long perf_step, perf_prev; 533 + struct cppc_perf_caps *perf_caps; 534 + struct cpufreq_policy *policy; 535 + struct cppc_cpudata *cpu_data; 536 + unsigned int max_cap; 537 + int step; 538 + 539 + policy = cpufreq_cpu_get_raw(cpu_dev->id); 540 + cpu_data = policy->driver_data; 541 + perf_caps = &cpu_data->perf_caps; 542 + max_cap = arch_scale_cpu_capacity(cpu_dev->id); 543 + 544 + perf_prev = cppc_cpufreq_khz_to_perf(cpu_data, KHz); 545 + perf_step = CPPC_EM_CAP_STEP * perf_caps->highest_perf / max_cap; 546 + step = perf_prev / perf_step; 547 + 548 + *cost = compute_cost(cpu_dev->id, step); 549 + 550 + return 0; 551 + } 552 + 553 + static int populate_efficiency_class(void) 554 + { 555 + struct acpi_madt_generic_interrupt *gicc; 556 + DECLARE_BITMAP(used_classes, 256) = {}; 557 + int class, cpu, index; 558 + 559 + for_each_possible_cpu(cpu) { 560 + gicc = acpi_cpu_get_madt_gicc(cpu); 561 + class = gicc->efficiency_class; 562 + bitmap_set(used_classes, class, 1); 563 + } 564 + 565 + if (bitmap_weight(used_classes, 256) <= 1) { 566 + pr_debug("Efficiency classes are all equal (=%d). " 567 + "No EM registered", class); 568 + return -EINVAL; 569 + } 570 + 571 + /* 572 + * Squeeze efficiency class values into [0:#efficiency_class-1]. 573 + * Values are per spec in [0:255]. 
574 + */ 575 + index = 0; 576 + for_each_set_bit(class, used_classes, 256) { 577 + for_each_possible_cpu(cpu) { 578 + gicc = acpi_cpu_get_madt_gicc(cpu); 579 + if (gicc->efficiency_class == class) 580 + per_cpu(efficiency_class, cpu) = index; 581 + } 582 + index++; 583 + } 584 + cppc_cpufreq_driver.register_em = cppc_cpufreq_register_em; 585 + 586 + return 0; 587 + } 588 + 589 + static void cppc_cpufreq_register_em(struct cpufreq_policy *policy) 590 + { 591 + struct cppc_cpudata *cpu_data; 592 + struct em_data_callback em_cb = 593 + EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost); 594 + 595 + cpu_data = policy->driver_data; 596 + em_dev_register_perf_domain(get_cpu_device(policy->cpu), 597 + get_perf_level_count(policy), &em_cb, 598 + cpu_data->shared_cpu_map, 0); 599 + } 600 + 444 601 #else 445 602 446 603 static unsigned int cppc_cpufreq_get_transition_delay_us(unsigned int cpu) 447 604 { 448 605 return cppc_get_transition_latency(cpu) / NSEC_PER_USEC; 606 + } 607 + static int populate_efficiency_class(void) 608 + { 609 + return 0; 610 + } 611 + static void cppc_cpufreq_register_em(struct cpufreq_policy *policy) 612 + { 449 613 } 450 614 #endif 451 615 ··· 741 535 ret = -EFAULT; 742 536 goto out; 743 537 } 538 + 539 + policy->fast_switch_possible = cppc_allow_fast_switch(); 540 + policy->dvfs_possible_from_any_cpu = true; 744 541 745 542 /* 746 543 * If 'highest_perf' is greater than 'nominal_perf', we assume CPU Boost ··· 890 681 .verify = cppc_verify_policy, 891 682 .target = cppc_cpufreq_set_target, 892 683 .get = cppc_cpufreq_get_rate, 684 + .fast_switch = cppc_cpufreq_fast_switch, 893 685 .init = cppc_cpufreq_cpu_init, 894 686 .exit = cppc_cpufreq_cpu_exit, 895 687 .set_boost = cppc_cpufreq_set_boost, ··· 952 742 953 743 cppc_check_hisi_workaround(); 954 744 cppc_freq_invariance_init(); 745 + populate_efficiency_class(); 955 746 956 747 ret = cpufreq_register_driver(&cppc_cpufreq_driver); 957 748 if (ret)
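The artificial Energy Model registered above orders CPUs purely by efficiency class plus performance step. Plugging in the constants from the patch and the usual SCHED_CAPACITY_SCALE of 1024 (a worked example, not code from the series):

	CPPC_EM_COST_GAP = 4 * 1024 * 1 / 20 = 204
	max step of a capacity-1024 CPU = 1024 / 20 = 51

	class-0 CPU at its top step:  cost = 204 * 0 + 51 * 1 = 51
	class-1 CPU at step 10:       cost = 204 * 1 + 10 * 1 = 214

So even the most expensive state of an efficient (class-0) CPU is cheaper than any state of a class-1 CPU, which is exactly the ordering the scheduler needs from an artificial EM where only relative cost matters.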
+69 -43
drivers/cpufreq/cpufreq.c
··· 28 28 #include <linux/suspend.h> 29 29 #include <linux/syscore_ops.h> 30 30 #include <linux/tick.h> 31 + #include <linux/units.h> 31 32 #include <trace/events/power.h> 32 33 33 34 static LIST_HEAD(cpufreq_policy_list); ··· 948 947 { 949 948 struct cpufreq_policy *policy = to_policy(kobj); 950 949 struct freq_attr *fattr = to_attr(attr); 951 - ssize_t ret; 950 + ssize_t ret = -EBUSY; 952 951 953 952 if (!fattr->show) 954 953 return -EIO; 955 954 956 955 down_read(&policy->rwsem); 957 - ret = fattr->show(policy, buf); 956 + if (likely(!policy_is_inactive(policy))) 957 + ret = fattr->show(policy, buf); 958 958 up_read(&policy->rwsem); 959 959 960 960 return ret; ··· 966 964 { 967 965 struct cpufreq_policy *policy = to_policy(kobj); 968 966 struct freq_attr *fattr = to_attr(attr); 969 - ssize_t ret = -EINVAL; 967 + ssize_t ret = -EBUSY; 970 968 971 969 if (!fattr->store) 972 970 return -EIO; ··· 980 978 981 979 if (cpu_online(policy->cpu)) { 982 980 down_write(&policy->rwsem); 983 - ret = fattr->store(policy, buf, count); 981 + if (likely(!policy_is_inactive(policy))) 982 + ret = fattr->store(policy, buf, count); 984 983 up_write(&policy->rwsem); 985 984 } 986 985 ··· 1022 1019 dev_err(dev, "cpufreq symlink creation failed\n"); 1023 1020 } 1024 1021 1025 - static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, 1022 + static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu, 1026 1023 struct device *dev) 1027 1024 { 1028 1025 dev_dbg(dev, "%s: Removing symlink\n", __func__); 1029 1026 sysfs_remove_link(&dev->kobj, "cpufreq"); 1027 + cpumask_clear_cpu(cpu, policy->real_cpus); 1030 1028 } 1031 1029 1032 1030 static int cpufreq_add_dev_interface(struct cpufreq_policy *policy) ··· 1341 1337 down_write(&policy->rwsem); 1342 1338 policy->cpu = cpu; 1343 1339 policy->governor = NULL; 1344 - up_write(&policy->rwsem); 1345 1340 } else { 1346 1341 new_policy = true; 1347 1342 policy = cpufreq_policy_alloc(cpu); 1348 1343 if (!policy) 1349 1344 return -ENOMEM; 1345 + down_write(&policy->rwsem); 1350 1346 } 1351 1347 1352 1348 if (!new_policy && cpufreq_driver->online) { ··· 1386 1382 cpumask_copy(policy->related_cpus, policy->cpus); 1387 1383 } 1388 1384 1389 - down_write(&policy->rwsem); 1390 1385 /* 1391 1386 * affected cpus must always be the one, which are online. We aren't 1392 1387 * managing offline cpus here. 
··· 1534 1531 1535 1532 out_destroy_policy: 1536 1533 for_each_cpu(j, policy->real_cpus) 1537 - remove_cpu_dev_symlink(policy, get_cpu_device(j)); 1534 + remove_cpu_dev_symlink(policy, j, get_cpu_device(j)); 1538 1535 1539 - up_write(&policy->rwsem); 1536 + cpumask_clear(policy->cpus); 1540 1537 1541 1538 out_offline_policy: 1542 1539 if (cpufreq_driver->offline) ··· 1547 1544 cpufreq_driver->exit(policy); 1548 1545 1549 1546 out_free_policy: 1547 + up_write(&policy->rwsem); 1548 + 1550 1549 cpufreq_policy_free(policy); 1551 1550 return ret; 1552 1551 } ··· 1580 1575 return 0; 1581 1576 } 1582 1577 1583 - static int cpufreq_offline(unsigned int cpu) 1578 + static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy) 1584 1579 { 1585 - struct cpufreq_policy *policy; 1586 1580 int ret; 1587 1581 1588 - pr_debug("%s: unregistering CPU %u\n", __func__, cpu); 1589 - 1590 - policy = cpufreq_cpu_get_raw(cpu); 1591 - if (!policy) { 1592 - pr_debug("%s: No cpu_data found\n", __func__); 1593 - return 0; 1594 - } 1595 - 1596 - down_write(&policy->rwsem); 1597 1582 if (has_target()) 1598 1583 cpufreq_stop_governor(policy); 1599 1584 1600 1585 cpumask_clear_cpu(cpu, policy->cpus); 1601 1586 1602 - if (policy_is_inactive(policy)) { 1603 - if (has_target()) 1604 - strncpy(policy->last_governor, policy->governor->name, 1605 - CPUFREQ_NAME_LEN); 1606 - else 1607 - policy->last_policy = policy->policy; 1608 - } else if (cpu == policy->cpu) { 1609 - /* Nominate new CPU */ 1610 - policy->cpu = cpumask_any(policy->cpus); 1611 - } 1612 - 1613 - /* Start governor again for active policy */ 1614 1587 if (!policy_is_inactive(policy)) { 1588 + /* Nominate a new CPU if necessary. */ 1589 + if (cpu == policy->cpu) 1590 + policy->cpu = cpumask_any(policy->cpus); 1591 + 1592 + /* Start the governor again for the active policy. 
*/ 1615 1593 if (has_target()) { 1616 1594 ret = cpufreq_start_governor(policy); 1617 1595 if (ret) 1618 1596 pr_err("%s: Failed to start governor\n", __func__); 1619 1597 } 1620 1598 1621 - goto unlock; 1599 + return; 1622 1600 } 1601 + 1602 + if (has_target()) 1603 + strncpy(policy->last_governor, policy->governor->name, 1604 + CPUFREQ_NAME_LEN); 1605 + else 1606 + policy->last_policy = policy->policy; 1623 1607 1624 1608 if (cpufreq_thermal_control_enabled(cpufreq_driver)) { 1625 1609 cpufreq_cooling_unregister(policy->cdev); ··· 1628 1634 cpufreq_driver->exit(policy); 1629 1635 policy->freq_table = NULL; 1630 1636 } 1637 + } 1631 1638 1632 - unlock: 1639 + static int cpufreq_offline(unsigned int cpu) 1640 + { 1641 + struct cpufreq_policy *policy; 1642 + 1643 + pr_debug("%s: unregistering CPU %u\n", __func__, cpu); 1644 + 1645 + policy = cpufreq_cpu_get_raw(cpu); 1646 + if (!policy) { 1647 + pr_debug("%s: No cpu_data found\n", __func__); 1648 + return 0; 1649 + } 1650 + 1651 + down_write(&policy->rwsem); 1652 + 1653 + __cpufreq_offline(cpu, policy); 1654 + 1633 1655 up_write(&policy->rwsem); 1634 1656 return 0; 1635 1657 } ··· 1663 1653 if (!policy) 1664 1654 return; 1665 1655 1656 + down_write(&policy->rwsem); 1657 + 1666 1658 if (cpu_online(cpu)) 1667 - cpufreq_offline(cpu); 1659 + __cpufreq_offline(cpu, policy); 1668 1660 1669 - cpumask_clear_cpu(cpu, policy->real_cpus); 1670 - remove_cpu_dev_symlink(policy, dev); 1661 + remove_cpu_dev_symlink(policy, cpu, dev); 1671 1662 1672 - if (cpumask_empty(policy->real_cpus)) { 1673 - /* We did light-weight exit earlier, do full tear down now */ 1674 - if (cpufreq_driver->offline) 1675 - cpufreq_driver->exit(policy); 1676 - 1677 - cpufreq_policy_free(policy); 1663 + if (!cpumask_empty(policy->real_cpus)) { 1664 + up_write(&policy->rwsem); 1665 + return; 1678 1666 } 1667 + 1668 + /* We did light-weight exit earlier, do full tear down now */ 1669 + if (cpufreq_driver->offline) 1670 + cpufreq_driver->exit(policy); 1671 + 1672 + up_write(&policy->rwsem); 1673 + 1674 + cpufreq_policy_free(policy); 1679 1675 } 1680 1676 1681 1677 /** ··· 1723 1707 return new_freq; 1724 1708 1725 1709 if (policy->cur != new_freq) { 1710 + /* 1711 + * For some platforms, the frequency returned by hardware may be 1712 + * slightly different from what is provided in the frequency 1713 + * table, for example hardware may return 499 MHz instead of 500 1714 + * MHz. In such cases it is better to avoid getting into 1715 + * unnecessary frequency updates. 1716 + */ 1717 + if (abs(policy->cur - new_freq) < HZ_PER_MHZ) 1718 + return policy->cur; 1719 + 1726 1720 cpufreq_out_of_sync(policy, new_freq); 1727 1721 if (update) 1728 1722 schedule_work(&policy->update);
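cpufreq stores frequencies in kHz, so the new "close enough" check above is meant to swallow sub-MHz rounding by the hardware (the 499 vs 500 MHz case from the comment). A minimal sketch of the comparison, assuming kHz values and a 1 MHz tolerance (the patch expresses the bound with a constant from <linux/units.h>); the helper name is made up:

	/* Treat hardware-reported rates within 1 MHz of the table entry
	 * as matching, to avoid spurious cpufreq_out_of_sync() updates.
	 */
	static bool freq_close_enough(unsigned int cur_khz, unsigned int new_khz)
	{
		return abs((int)cur_khz - (int)new_khz) < 1000;
	}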
+13 -7
drivers/cpufreq/cpufreq_governor.c
··· 388 388 gov->free(policy_dbs); 389 389 } 390 390 391 + static void cpufreq_dbs_data_release(struct kobject *kobj) 392 + { 393 + struct dbs_data *dbs_data = to_dbs_data(to_gov_attr_set(kobj)); 394 + struct dbs_governor *gov = dbs_data->gov; 395 + 396 + gov->exit(dbs_data); 397 + kfree(dbs_data); 398 + } 399 + 391 400 int cpufreq_dbs_governor_init(struct cpufreq_policy *policy) 392 401 { 393 402 struct dbs_governor *gov = dbs_governor_of(policy); ··· 434 425 goto free_policy_dbs_info; 435 426 } 436 427 428 + dbs_data->gov = gov; 437 429 gov_attr_set_init(&dbs_data->attr_set, &policy_dbs->list); 438 430 439 431 ret = gov->init(dbs_data); ··· 457 447 policy->governor_data = policy_dbs; 458 448 459 449 gov->kobj_type.sysfs_ops = &governor_sysfs_ops; 450 + gov->kobj_type.release = cpufreq_dbs_data_release; 460 451 ret = kobject_init_and_add(&dbs_data->attr_set.kobj, &gov->kobj_type, 461 452 get_governor_parent_kobj(policy), 462 453 "%s", gov->gov.name); ··· 499 488 500 489 policy->governor_data = NULL; 501 490 502 - if (!count) { 503 - if (!have_governor_per_policy()) 504 - gov->gdbs_data = NULL; 505 - 506 - gov->exit(dbs_data); 507 - kfree(dbs_data); 508 - } 491 + if (!count && !have_governor_per_policy()) 492 + gov->gdbs_data = NULL; 509 493 510 494 free_policy_dbs_info(policy_dbs, gov); 511 495
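The governor clean-up fix follows the standard kobject lifetime rule: an object that embeds a kobject must be freed from the kobject's release() callback once the last reference is dropped, never with a bare kfree() from some other path. That is also why dbs_data now records its dbs_governor, so that release() can call gov->exit() before freeing. A generic sketch of the pattern (the types are hypothetical):

	struct my_data {
		struct kobject kobj;
		/* ... payload ... */
	};

	static void my_data_release(struct kobject *kobj)
	{
		/* Runs only after the final kobject_put(). */
		kfree(container_of(kobj, struct my_data, kobj));
	}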
+1
drivers/cpufreq/cpufreq_governor.h
··· 37 37 /* Governor demand based switching data (per-policy or global). */ 38 38 struct dbs_data { 39 39 struct gov_attr_set attr_set; 40 + struct dbs_governor *gov; 40 41 void *tuners; 41 42 unsigned int ignore_nice_load; 42 43 unsigned int sampling_rate;
+2
drivers/cpufreq/intel_pstate.c
··· 1322 1322 mutex_unlock(&intel_pstate_limits_lock); 1323 1323 1324 1324 intel_pstate_update_policies(); 1325 + arch_set_max_freq_ratio(global.no_turbo); 1325 1326 1326 1327 mutex_unlock(&intel_pstate_driver_lock); 1327 1328 ··· 2425 2424 X86_MATCH(BROADWELL_X, core_funcs), 2426 2425 X86_MATCH(SKYLAKE_X, core_funcs), 2427 2426 X86_MATCH(ICELAKE_X, core_funcs), 2427 + X86_MATCH(SAPPHIRERAPIDS_X, core_funcs), 2428 2428 {} 2429 2429 }; 2430 2430
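Frequency-invariant load tracking scales CPU utilization by roughly cur_freq / max_freq, and toggling no_turbo changes the correct denominator; the new arch_set_max_freq_ratio() call lets the x86 topology code rebase that ratio. A worked example with illustrative numbers only:

	scale ~= (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq

	turbo on,  max 3800 MHz, cur 1900 MHz: scale ~= 512/1024
	turbo off, max 2400 MHz, cur 1900 MHz: scale ~= 810/1024

Without the rebase, utilization would keep being scaled against the stale turbo maximum after no_turbo is set.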
+2 -2
drivers/cpufreq/mediatek-cpufreq-hw.c
··· 51 51 }; 52 52 53 53 static int __maybe_unused 54 - mtk_cpufreq_get_cpu_power(unsigned long *mW, 55 - unsigned long *KHz, struct device *cpu_dev) 54 + mtk_cpufreq_get_cpu_power(struct device *cpu_dev, unsigned long *mW, 55 + unsigned long *KHz) 56 56 { 57 57 struct mtk_cpufreq_data *data; 58 58 struct cpufreq_policy *policy;
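This mediatek change (and the matching scmi-cpufreq one below) is purely a parameter reorder: after this series the Energy Model active-power callback takes the device first. A sketch of the expected shape (the function name is made up):

	/* Fill *mW and round *KHz up to a supported OPP; return 0 on success. */
	static int example_get_power(struct device *cpu_dev, unsigned long *mW,
				     unsigned long *KHz)
	{
		return 0;
	}
	/* hooked up via EM_DATA_CB(example_get_power) */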
-1
drivers/cpufreq/pasemi-cpufreq.c
··· 18 18 19 19 #include <asm/hw_irq.h> 20 20 #include <asm/io.h> 21 - #include <asm/prom.h> 22 21 #include <asm/time.h> 23 22 #include <asm/smp.h> 24 23
+1 -1
drivers/cpufreq/pmac32-cpufreq.c
··· 24 24 #include <linux/device.h> 25 25 #include <linux/hardirq.h> 26 26 #include <linux/of_device.h> 27 - #include <asm/prom.h> 27 + 28 28 #include <asm/machdep.h> 29 29 #include <asm/irq.h> 30 30 #include <asm/pmac_feature.h>
+1 -1
drivers/cpufreq/pmac64-cpufreq.c
··· 22 22 #include <linux/completion.h> 23 23 #include <linux/mutex.h> 24 24 #include <linux/of_device.h> 25 - #include <asm/prom.h> 25 + 26 26 #include <asm/machdep.h> 27 27 #include <asm/irq.h> 28 28 #include <asm/sections.h>
-1
drivers/cpufreq/ppc_cbe_cpufreq.c
··· 12 12 #include <linux/of_platform.h> 13 13 14 14 #include <asm/machdep.h> 15 - #include <asm/prom.h> 16 15 #include <asm/cell-regs.h> 17 16 18 17 #include "ppc_cbe_cpufreq.h"
+1 -1
drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
··· 13 13 #include <linux/init.h> 14 14 #include <linux/of_platform.h> 15 15 #include <linux/pm_qos.h> 16 + #include <linux/slab.h> 16 17 17 18 #include <asm/processor.h> 18 - #include <asm/prom.h> 19 19 #include <asm/pmi.h> 20 20 #include <asm/cell-regs.h> 21 21
+2 -2
drivers/cpufreq/scmi-cpufreq.c
··· 96 96 } 97 97 98 98 static int __maybe_unused 99 - scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, 100 - struct device *cpu_dev) 99 + scmi_get_cpu_power(struct device *cpu_dev, unsigned long *power, 100 + unsigned long *KHz) 101 101 { 102 102 unsigned long Hz; 103 103 int ret, domain;
+2 -2
drivers/cpuidle/cpuidle-psci-domain.c
··· 52 52 struct generic_pm_domain *pd; 53 53 struct psci_pd_provider *pd_provider; 54 54 struct dev_power_governor *pd_gov; 55 - int ret = -ENOMEM, state_count = 0; 55 + int ret = -ENOMEM; 56 56 57 57 pd = dt_idle_pd_alloc(np, psci_dt_parse_state_node); 58 58 if (!pd) ··· 71 71 pd->flags |= GENPD_FLAG_ALWAYS_ON; 72 72 73 73 /* Use governor for CPU PM domains if it has some states to manage. */ 74 - pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL; 74 + pd_gov = pd->states ? &pm_domain_cpu_gov : NULL; 75 75 76 76 ret = pm_genpd_init(pd, pd_gov, false); 77 77 if (ret)
+46
drivers/cpuidle/cpuidle-psci.c
··· 23 23 #include <linux/pm_runtime.h> 24 24 #include <linux/slab.h> 25 25 #include <linux/string.h> 26 + #include <linux/syscore_ops.h> 26 27 27 28 #include <asm/cpuidle.h> 28 29 ··· 132 131 return 0; 133 132 } 134 133 134 + static void psci_idle_syscore_switch(bool suspend) 135 + { 136 + bool cleared = false; 137 + struct device *dev; 138 + int cpu; 139 + 140 + for_each_possible_cpu(cpu) { 141 + dev = per_cpu_ptr(&psci_cpuidle_data, cpu)->dev; 142 + 143 + if (dev && suspend) { 144 + dev_pm_genpd_suspend(dev); 145 + } else if (dev) { 146 + dev_pm_genpd_resume(dev); 147 + 148 + /* Account for userspace having offlined a CPU. */ 149 + if (pm_runtime_status_suspended(dev)) 150 + pm_runtime_set_active(dev); 151 + 152 + /* Clear domain state to re-start fresh. */ 153 + if (!cleared) { 154 + psci_set_domain_state(0); 155 + cleared = true; 156 + } 157 + } 158 + } 159 + } 160 + 161 + static int psci_idle_syscore_suspend(void) 162 + { 163 + psci_idle_syscore_switch(true); 164 + return 0; 165 + } 166 + 167 + static void psci_idle_syscore_resume(void) 168 + { 169 + psci_idle_syscore_switch(false); 170 + } 171 + 172 + static struct syscore_ops psci_idle_syscore_ops = { 173 + .suspend = psci_idle_syscore_suspend, 174 + .resume = psci_idle_syscore_resume, 175 + }; 176 + 135 177 static void psci_idle_init_cpuhp(void) 136 178 { 137 179 int err; 138 180 139 181 if (!psci_cpuidle_use_cpuhp) 140 182 return; 183 + 184 + register_syscore_ops(&psci_idle_syscore_ops); 141 185 142 186 err = cpuhp_setup_state_nocalls(CPUHP_AP_CPU_PM_STARTING, 143 187 "cpuidle/psci:online",
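syscore ops run at the very end of system suspend, with a single CPU online and interrupts disabled, so they must not sleep; that is what makes them the right place to save and restore genpd state for CPU PM domains once the secondary CPUs are gone. A bare-bones registration pattern (the names are hypothetical):

	static int example_syscore_suspend(void)
	{
		/* One CPU online, IRQs off: no sleeping allowed here. */
		return 0;
	}

	static void example_syscore_resume(void)
	{
	}

	static struct syscore_ops example_syscore_ops = {
		.suspend = example_syscore_suspend,
		.resume = example_syscore_resume,
	};
	/* register_syscore_ops(&example_syscore_ops); */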
+2 -2
drivers/cpuidle/cpuidle-riscv-sbi.c
··· 414 414 struct generic_pm_domain *pd; 415 415 struct sbi_pd_provider *pd_provider; 416 416 struct dev_power_governor *pd_gov; 417 - int ret = -ENOMEM, state_count = 0; 417 + int ret = -ENOMEM; 418 418 419 419 pd = dt_idle_pd_alloc(np, sbi_dt_parse_state_node); 420 420 if (!pd) ··· 433 433 pd->flags |= GENPD_FLAG_ALWAYS_ON; 434 434 435 435 /* Use governor for CPU PM domains if it has some states to manage. */ 436 - pd_gov = state_count > 0 ? &pm_domain_cpu_gov : NULL; 436 + pd_gov = pd->states ? &pm_domain_cpu_gov : NULL; 437 437 438 438 ret = pm_genpd_init(pd, pd_gov, false); 439 439 if (ret)
+12 -8
drivers/devfreq/devfreq.c
··· 112 112 } 113 113 114 114 /** 115 - * get_freq_range() - Get the current freq range 115 + * devfreq_get_freq_range() - Get the current freq range 116 116 * @devfreq: the devfreq instance 117 117 * @min_freq: the min frequency 118 118 * @max_freq: the max frequency 119 119 * 120 120 * This takes into consideration all constraints. 121 121 */ 122 - static void get_freq_range(struct devfreq *devfreq, 123 - unsigned long *min_freq, 124 - unsigned long *max_freq) 122 + void devfreq_get_freq_range(struct devfreq *devfreq, 123 + unsigned long *min_freq, 124 + unsigned long *max_freq) 125 125 { 126 126 unsigned long *freq_table = devfreq->profile->freq_table; 127 127 s32 qos_min_freq, qos_max_freq; ··· 158 158 if (*min_freq > *max_freq) 159 159 *min_freq = *max_freq; 160 160 } 161 + EXPORT_SYMBOL(devfreq_get_freq_range); 161 162 162 163 /** 163 164 * devfreq_get_freq_level() - Lookup freq_table for the frequency ··· 419 418 err = devfreq->governor->get_target_freq(devfreq, &freq); 420 419 if (err) 421 420 return err; 422 - get_freq_range(devfreq, &min_freq, &max_freq); 421 + devfreq_get_freq_range(devfreq, &min_freq, &max_freq); 423 422 424 423 if (freq < min_freq) { 425 424 freq = min_freq; ··· 786 785 { 787 786 struct devfreq *devfreq; 788 787 struct devfreq_governor *governor; 788 + unsigned long min_freq, max_freq; 789 789 int err = 0; 790 790 791 791 if (!dev || !profile || !governor_name) { ··· 850 848 err = -EINVAL; 851 849 goto err_dev; 852 850 } 851 + 852 + devfreq_get_freq_range(devfreq, &min_freq, &max_freq); 853 853 854 854 devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev); 855 855 devfreq->opp_table = dev_pm_opp_get_opp_table(dev); ··· 1591 1587 unsigned long min_freq, max_freq; 1592 1588 1593 1589 mutex_lock(&df->lock); 1594 - get_freq_range(df, &min_freq, &max_freq); 1590 + devfreq_get_freq_range(df, &min_freq, &max_freq); 1595 1591 mutex_unlock(&df->lock); 1596 1592 1597 1593 return sprintf(buf, "%lu\n", min_freq); ··· 1645 1641 unsigned long min_freq, max_freq; 1646 1642 1647 1643 mutex_lock(&df->lock); 1648 - get_freq_range(df, &min_freq, &max_freq); 1644 + devfreq_get_freq_range(df, &min_freq, &max_freq); 1649 1645 mutex_unlock(&df->lock); 1650 1646 1651 1647 return sprintf(buf, "%lu\n", max_freq); ··· 1959 1955 1960 1956 mutex_lock(&devfreq->lock); 1961 1957 cur_freq = devfreq->previous_freq; 1962 - get_freq_range(devfreq, &min_freq, &max_freq); 1958 + devfreq_get_freq_range(devfreq, &min_freq, &max_freq); 1963 1959 timer = devfreq->profile->timer; 1964 1960 1965 1961 if (IS_SUPPORTED_ATTR(devfreq->governor->attrs, POLLING_INTERVAL))
+27
drivers/devfreq/governor.h
··· 48 48 #define DEVFREQ_GOV_ATTR_TIMER BIT(1) 49 49 50 50 /** 51 + * struct devfreq_cpu_data - Hold the per-cpu data 52 + * @node: list node 53 + * @dev: reference to cpu device. 54 + * @first_cpu: the cpumask of the first cpu of a policy. 55 + * @opp_table: reference to cpu opp table. 56 + * @cur_freq: the current frequency of the cpu. 57 + * @min_freq: the min frequency of the cpu. 58 + * @max_freq: the max frequency of the cpu. 59 + * 60 + * This structure stores the required cpu_data of a cpu. 61 + * This is auto-populated by the governor. 62 + */ 63 + struct devfreq_cpu_data { 64 + struct list_head node; 65 + 66 + struct device *dev; 67 + unsigned int first_cpu; 68 + 69 + struct opp_table *opp_table; 70 + unsigned int cur_freq; 71 + unsigned int min_freq; 72 + unsigned int max_freq; 73 + }; 74 + 75 + /** 51 76 * struct devfreq_governor - Devfreq policy governor 52 77 * @node: list node - contains registered devfreq governors 53 78 * @name: Governor's name ··· 114 89 115 90 int devfreq_update_status(struct devfreq *devfreq, unsigned long freq); 116 91 int devfreq_update_target(struct devfreq *devfreq, unsigned long freq); 92 + void devfreq_get_freq_range(struct devfreq *devfreq, unsigned long *min_freq, 93 + unsigned long *max_freq); 117 94 118 95 static inline int devfreq_update_stats(struct devfreq *df) 119 96 {
+341 -74
drivers/devfreq/governor_passive.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-only 1 + // SPDX-License-Identifier: GPL-2.0-only 2 2 /* 3 3 * linux/drivers/devfreq/governor_passive.c 4 4 * ··· 8 8 */ 9 9 10 10 #include <linux/module.h> 11 + #include <linux/cpu.h> 12 + #include <linux/cpufreq.h> 13 + #include <linux/cpumask.h> 14 + #include <linux/slab.h> 11 15 #include <linux/device.h> 12 16 #include <linux/devfreq.h> 13 17 #include "governor.h" 14 18 15 - static int devfreq_passive_get_target_freq(struct devfreq *devfreq, 19 + #define HZ_PER_KHZ 1000 20 + 21 + static struct devfreq_cpu_data * 22 + get_parent_cpu_data(struct devfreq_passive_data *p_data, 23 + struct cpufreq_policy *policy) 24 + { 25 + struct devfreq_cpu_data *parent_cpu_data; 26 + 27 + if (!p_data || !policy) 28 + return NULL; 29 + 30 + list_for_each_entry(parent_cpu_data, &p_data->cpu_data_list, node) 31 + if (parent_cpu_data->first_cpu == cpumask_first(policy->related_cpus)) 32 + return parent_cpu_data; 33 + 34 + return NULL; 35 + } 36 + 37 + static unsigned long get_target_freq_by_required_opp(struct device *p_dev, 38 + struct opp_table *p_opp_table, 39 + struct opp_table *opp_table, 40 + unsigned long *freq) 41 + { 42 + struct dev_pm_opp *opp = NULL, *p_opp = NULL; 43 + unsigned long target_freq; 44 + 45 + if (!p_dev || !p_opp_table || !opp_table || !freq) 46 + return 0; 47 + 48 + p_opp = devfreq_recommended_opp(p_dev, freq, 0); 49 + if (IS_ERR(p_opp)) 50 + return 0; 51 + 52 + opp = dev_pm_opp_xlate_required_opp(p_opp_table, opp_table, p_opp); 53 + dev_pm_opp_put(p_opp); 54 + 55 + if (IS_ERR(opp)) 56 + return 0; 57 + 58 + target_freq = dev_pm_opp_get_freq(opp); 59 + dev_pm_opp_put(opp); 60 + 61 + return target_freq; 62 + } 63 + 64 + static int get_target_freq_with_cpufreq(struct devfreq *devfreq, 65 + unsigned long *target_freq) 66 + { 67 + struct devfreq_passive_data *p_data = 68 + (struct devfreq_passive_data *)devfreq->data; 69 + struct devfreq_cpu_data *parent_cpu_data; 70 + struct cpufreq_policy *policy; 71 + unsigned long cpu, cpu_cur, cpu_min, cpu_max, cpu_percent; 72 + unsigned long dev_min, dev_max; 73 + unsigned long freq = 0; 74 + int ret = 0; 75 + 76 + for_each_online_cpu(cpu) { 77 + policy = cpufreq_cpu_get(cpu); 78 + if (!policy) { 79 + ret = -EINVAL; 80 + continue; 81 + } 82 + 83 + parent_cpu_data = get_parent_cpu_data(p_data, policy); 84 + if (!parent_cpu_data) { 85 + cpufreq_cpu_put(policy); 86 + continue; 87 + } 88 + 89 + /* Get target freq via required opps */ 90 + cpu_cur = parent_cpu_data->cur_freq * HZ_PER_KHZ; 91 + freq = get_target_freq_by_required_opp(parent_cpu_data->dev, 92 + parent_cpu_data->opp_table, 93 + devfreq->opp_table, &cpu_cur); 94 + if (freq) { 95 + *target_freq = max(freq, *target_freq); 96 + cpufreq_cpu_put(policy); 97 + continue; 98 + } 99 + 100 + /* Use interpolation if required opps is not available */ 101 + devfreq_get_freq_range(devfreq, &dev_min, &dev_max); 102 + 103 + cpu_min = parent_cpu_data->min_freq; 104 + cpu_max = parent_cpu_data->max_freq; 105 + cpu_cur = parent_cpu_data->cur_freq; 106 + 107 + cpu_percent = ((cpu_cur - cpu_min) * 100) / (cpu_max - cpu_min); 108 + freq = dev_min + mult_frac(dev_max - dev_min, cpu_percent, 100); 109 + 110 + *target_freq = max(freq, *target_freq); 111 + cpufreq_cpu_put(policy); 112 + } 113 + 114 + return ret; 115 + } 116 + 117 + static int get_target_freq_with_devfreq(struct devfreq *devfreq, 16 118 unsigned long *freq) 17 119 { 18 120 struct devfreq_passive_data *p_data 19 121 = (struct devfreq_passive_data *)devfreq->data; 20 122 struct devfreq *parent_devfreq = 
(struct devfreq *)p_data->parent; 21 123 unsigned long child_freq = ULONG_MAX; 22 - struct dev_pm_opp *opp, *p_opp; 23 124 int i, count; 125 + 126 + /* Get target freq via required opps */ 127 + child_freq = get_target_freq_by_required_opp(parent_devfreq->dev.parent, 128 + parent_devfreq->opp_table, 129 + devfreq->opp_table, freq); 130 + if (child_freq) 131 + goto out; 132 + 133 + /* Use interpolation if required opps is not available */ 134 + for (i = 0; i < parent_devfreq->profile->max_state; i++) 135 + if (parent_devfreq->profile->freq_table[i] == *freq) 136 + break; 137 + 138 + if (i == parent_devfreq->profile->max_state) 139 + return -EINVAL; 140 + 141 + if (i < devfreq->profile->max_state) { 142 + child_freq = devfreq->profile->freq_table[i]; 143 + } else { 144 + count = devfreq->profile->max_state; 145 + child_freq = devfreq->profile->freq_table[count - 1]; 146 + } 147 + 148 + out: 149 + *freq = child_freq; 150 + 151 + return 0; 152 + } 153 + 154 + static int devfreq_passive_get_target_freq(struct devfreq *devfreq, 155 + unsigned long *freq) 156 + { 157 + struct devfreq_passive_data *p_data = 158 + (struct devfreq_passive_data *)devfreq->data; 159 + int ret; 160 + 161 + if (!p_data) 162 + return -EINVAL; 24 163 25 164 /* 26 165 * If the devfreq device with passive governor has the specific method ··· 169 30 if (p_data->get_target_freq) 170 31 return p_data->get_target_freq(devfreq, freq); 171 32 172 - /* 173 - * If the parent and passive devfreq device uses the OPP table, 174 - * get the next frequency by using the OPP table. 175 - */ 33 + switch (p_data->parent_type) { 34 + case DEVFREQ_PARENT_DEV: 35 + ret = get_target_freq_with_devfreq(devfreq, freq); 36 + break; 37 + case CPUFREQ_PARENT_DEV: 38 + ret = get_target_freq_with_cpufreq(devfreq, freq); 39 + break; 40 + default: 41 + ret = -EINVAL; 42 + dev_err(&devfreq->dev, "Invalid parent type\n"); 43 + break; 44 + } 176 45 177 - /* 178 - * - parent devfreq device uses the governors except for passive. 179 - * - passive devfreq device uses the passive governor. 180 - * 181 - * Each devfreq has the OPP table. After deciding the new frequency 182 - * from the governor of parent devfreq device, the passive governor 183 - * need to get the index of new frequency on OPP table of parent 184 - * device. And then the index is used for getting the suitable 185 - * new frequency for passive devfreq device. 186 - */ 187 - if (!devfreq->profile || !devfreq->profile->freq_table 188 - || devfreq->profile->max_state <= 0) 189 - return -EINVAL; 46 + return ret; 47 + } 190 48 191 - /* 192 - * The passive governor have to get the correct frequency from OPP 193 - * list of parent device. Because in this case, *freq is temporary 194 - * value which is decided by ondemand governor. 
195 - */ 196 - if (devfreq->opp_table && parent_devfreq->opp_table) { 197 - p_opp = devfreq_recommended_opp(parent_devfreq->dev.parent, 198 - freq, 0); 199 - if (IS_ERR(p_opp)) 200 - return PTR_ERR(p_opp); 49 + static int cpufreq_passive_notifier_call(struct notifier_block *nb, 50 + unsigned long event, void *ptr) 51 + { 52 + struct devfreq_passive_data *p_data = 53 + container_of(nb, struct devfreq_passive_data, nb); 54 + struct devfreq *devfreq = (struct devfreq *)p_data->this; 55 + struct devfreq_cpu_data *parent_cpu_data; 56 + struct cpufreq_freqs *freqs = ptr; 57 + unsigned int cur_freq; 58 + int ret; 201 59 202 - opp = dev_pm_opp_xlate_required_opp(parent_devfreq->opp_table, 203 - devfreq->opp_table, p_opp); 204 - dev_pm_opp_put(p_opp); 205 - 206 - if (IS_ERR(opp)) 207 - goto no_required_opp; 208 - 209 - *freq = dev_pm_opp_get_freq(opp); 210 - dev_pm_opp_put(opp); 211 - 60 + if (event != CPUFREQ_POSTCHANGE || !freqs) 212 61 return 0; 62 + 63 + parent_cpu_data = get_parent_cpu_data(p_data, freqs->policy); 64 + if (!parent_cpu_data || parent_cpu_data->cur_freq == freqs->new) 65 + return 0; 66 + 67 + cur_freq = parent_cpu_data->cur_freq; 68 + parent_cpu_data->cur_freq = freqs->new; 69 + 70 + mutex_lock(&devfreq->lock); 71 + ret = devfreq_update_target(devfreq, freqs->new); 72 + mutex_unlock(&devfreq->lock); 73 + if (ret) { 74 + parent_cpu_data->cur_freq = cur_freq; 75 + dev_err(&devfreq->dev, "failed to update the frequency.\n"); 76 + return ret; 213 77 } 214 - 215 - no_required_opp: 216 - /* 217 - * Get the OPP table's index of decided frequency by governor 218 - * of parent device. 219 - */ 220 - for (i = 0; i < parent_devfreq->profile->max_state; i++) 221 - if (parent_devfreq->profile->freq_table[i] == *freq) 222 - break; 223 - 224 - if (i == parent_devfreq->profile->max_state) 225 - return -EINVAL; 226 - 227 - /* Get the suitable frequency by using index of parent device. */ 228 - if (i < devfreq->profile->max_state) { 229 - child_freq = devfreq->profile->freq_table[i]; 230 - } else { 231 - count = devfreq->profile->max_state; 232 - child_freq = devfreq->profile->freq_table[count - 1]; 233 - } 234 - 235 - /* Return the suitable frequency for passive device. 
*/ 236 - *freq = child_freq; 237 78 238 79 return 0; 80 + } 81 + 82 + static int cpufreq_passive_unregister_notifier(struct devfreq *devfreq) 83 + { 84 + struct devfreq_passive_data *p_data 85 + = (struct devfreq_passive_data *)devfreq->data; 86 + struct devfreq_cpu_data *parent_cpu_data; 87 + int cpu, ret = 0; 88 + 89 + if (p_data->nb.notifier_call) { 90 + ret = cpufreq_unregister_notifier(&p_data->nb, 91 + CPUFREQ_TRANSITION_NOTIFIER); 92 + if (ret < 0) 93 + return ret; 94 + } 95 + 96 + for_each_possible_cpu(cpu) { 97 + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); 98 + if (!policy) { 99 + ret = -EINVAL; 100 + continue; 101 + } 102 + 103 + parent_cpu_data = get_parent_cpu_data(p_data, policy); 104 + if (!parent_cpu_data) { 105 + cpufreq_cpu_put(policy); 106 + continue; 107 + } 108 + 109 + list_del(&parent_cpu_data->node); 110 + if (parent_cpu_data->opp_table) 111 + dev_pm_opp_put_opp_table(parent_cpu_data->opp_table); 112 + kfree(parent_cpu_data); 113 + cpufreq_cpu_put(policy); 114 + } 115 + 116 + return ret; 117 + } 118 + 119 + static int cpufreq_passive_register_notifier(struct devfreq *devfreq) 120 + { 121 + struct devfreq_passive_data *p_data 122 + = (struct devfreq_passive_data *)devfreq->data; 123 + struct device *dev = devfreq->dev.parent; 124 + struct opp_table *opp_table = NULL; 125 + struct devfreq_cpu_data *parent_cpu_data; 126 + struct cpufreq_policy *policy; 127 + struct device *cpu_dev; 128 + unsigned int cpu; 129 + int ret; 130 + 131 + p_data->cpu_data_list 132 + = (struct list_head)LIST_HEAD_INIT(p_data->cpu_data_list); 133 + 134 + p_data->nb.notifier_call = cpufreq_passive_notifier_call; 135 + ret = cpufreq_register_notifier(&p_data->nb, CPUFREQ_TRANSITION_NOTIFIER); 136 + if (ret) { 137 + dev_err(dev, "failed to register cpufreq notifier\n"); 138 + p_data->nb.notifier_call = NULL; 139 + goto err; 140 + } 141 + 142 + for_each_possible_cpu(cpu) { 143 + policy = cpufreq_cpu_get(cpu); 144 + if (!policy) { 145 + ret = -EPROBE_DEFER; 146 + goto err; 147 + } 148 + 149 + parent_cpu_data = get_parent_cpu_data(p_data, policy); 150 + if (parent_cpu_data) { 151 + cpufreq_cpu_put(policy); 152 + continue; 153 + } 154 + 155 + parent_cpu_data = kzalloc(sizeof(*parent_cpu_data), 156 + GFP_KERNEL); 157 + if (!parent_cpu_data) { 158 + ret = -ENOMEM; 159 + goto err_put_policy; 160 + } 161 + 162 + cpu_dev = get_cpu_device(cpu); 163 + if (!cpu_dev) { 164 + dev_err(dev, "failed to get cpu device\n"); 165 + ret = -ENODEV; 166 + goto err_free_cpu_data; 167 + } 168 + 169 + opp_table = dev_pm_opp_get_opp_table(cpu_dev); 170 + if (IS_ERR(opp_table)) { 171 + dev_err(dev, "failed to get opp_table of cpu%d\n", cpu); 172 + ret = PTR_ERR(opp_table); 173 + goto err_free_cpu_data; 174 + } 175 + 176 + parent_cpu_data->dev = cpu_dev; 177 + parent_cpu_data->opp_table = opp_table; 178 + parent_cpu_data->first_cpu = cpumask_first(policy->related_cpus); 179 + parent_cpu_data->cur_freq = policy->cur; 180 + parent_cpu_data->min_freq = policy->cpuinfo.min_freq; 181 + parent_cpu_data->max_freq = policy->cpuinfo.max_freq; 182 + 183 + list_add_tail(&parent_cpu_data->node, &p_data->cpu_data_list); 184 + cpufreq_cpu_put(policy); 185 + } 186 + 187 + mutex_lock(&devfreq->lock); 188 + ret = devfreq_update_target(devfreq, 0L); 189 + mutex_unlock(&devfreq->lock); 190 + if (ret) 191 + dev_err(dev, "failed to update the frequency\n"); 192 + 193 + return ret; 194 + 195 + err_free_cpu_data: 196 + kfree(parent_cpu_data); 197 + err_put_policy: 198 + cpufreq_cpu_put(policy); 199 + err: 200 + 
WARN_ON(cpufreq_passive_unregister_notifier(devfreq)); 201 + 202 + return ret; 239 203 } 240 204 241 205 static int devfreq_passive_notifier_call(struct notifier_block *nb, ··· 373 131 return NOTIFY_DONE; 374 132 } 375 133 376 - static int devfreq_passive_event_handler(struct devfreq *devfreq, 377 - unsigned int event, void *data) 134 + static int devfreq_passive_unregister_notifier(struct devfreq *devfreq) 378 135 { 379 136 struct devfreq_passive_data *p_data 380 137 = (struct devfreq_passive_data *)devfreq->data; 381 138 struct devfreq *parent = (struct devfreq *)p_data->parent; 382 139 struct notifier_block *nb = &p_data->nb; 383 - int ret = 0; 140 + 141 + return devfreq_unregister_notifier(parent, nb, DEVFREQ_TRANSITION_NOTIFIER); 142 + } 143 + 144 + static int devfreq_passive_register_notifier(struct devfreq *devfreq) 145 + { 146 + struct devfreq_passive_data *p_data 147 + = (struct devfreq_passive_data *)devfreq->data; 148 + struct devfreq *parent = (struct devfreq *)p_data->parent; 149 + struct notifier_block *nb = &p_data->nb; 384 150 385 151 if (!parent) 386 152 return -EPROBE_DEFER; 387 153 154 + nb->notifier_call = devfreq_passive_notifier_call; 155 + return devfreq_register_notifier(parent, nb, DEVFREQ_TRANSITION_NOTIFIER); 156 + } 157 + 158 + static int devfreq_passive_event_handler(struct devfreq *devfreq, 159 + unsigned int event, void *data) 160 + { 161 + struct devfreq_passive_data *p_data 162 + = (struct devfreq_passive_data *)devfreq->data; 163 + int ret = 0; 164 + 165 + if (!p_data) 166 + return -EINVAL; 167 + 168 + if (!p_data->this) 169 + p_data->this = devfreq; 170 + 388 171 switch (event) { 389 172 case DEVFREQ_GOV_START: 390 - if (!p_data->this) 391 - p_data->this = devfreq; 392 - 393 - nb->notifier_call = devfreq_passive_notifier_call; 394 - ret = devfreq_register_notifier(parent, nb, 395 - DEVFREQ_TRANSITION_NOTIFIER); 173 + if (p_data->parent_type == DEVFREQ_PARENT_DEV) 174 + ret = devfreq_passive_register_notifier(devfreq); 175 + else if (p_data->parent_type == CPUFREQ_PARENT_DEV) 176 + ret = cpufreq_passive_register_notifier(devfreq); 396 177 break; 397 178 case DEVFREQ_GOV_STOP: 398 - WARN_ON(devfreq_unregister_notifier(parent, nb, 399 - DEVFREQ_TRANSITION_NOTIFIER)); 179 + if (p_data->parent_type == DEVFREQ_PARENT_DEV) 180 + WARN_ON(devfreq_passive_unregister_notifier(devfreq)); 181 + else if (p_data->parent_type == CPUFREQ_PARENT_DEV) 182 + WARN_ON(cpufreq_passive_unregister_notifier(devfreq)); 400 183 break; 401 184 default: 402 185 break;
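When the CPU OPP tables carry no required-opps link to the passive device, get_target_freq_with_cpufreq() above falls back to linear interpolation between the two frequency ranges. A worked example with made-up numbers:

	CPU:    min 600000 kHz, max 1800000 kHz, current 1200000 kHz
	device: min 200 MHz, max 800 MHz

	cpu_percent = (1200000 - 600000) * 100 / (1800000 - 600000) = 50
	freq        = 200 MHz + mult_frac(800 MHz - 200 MHz, 50, 100) = 500 MHz

Each online CPU's policy contributes one such candidate and the governor keeps the maximum across them, so the device never runs slower than the busiest CPU cluster implies.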
+147 -165
drivers/devfreq/rk3399_dmc.c
··· 5 5 */ 6 6 7 7 #include <linux/arm-smccc.h> 8 + #include <linux/bitfield.h> 8 9 #include <linux/clk.h> 9 10 #include <linux/delay.h> 10 11 #include <linux/devfreq.h> ··· 21 20 #include <linux/rwsem.h> 22 21 #include <linux/suspend.h> 23 22 23 + #include <soc/rockchip/pm_domains.h> 24 24 #include <soc/rockchip/rk3399_grf.h> 25 25 #include <soc/rockchip/rockchip_sip.h> 26 26 27 - struct dram_timing { 28 - unsigned int ddr3_speed_bin; 29 - unsigned int pd_idle; 30 - unsigned int sr_idle; 31 - unsigned int sr_mc_gate_idle; 32 - unsigned int srpd_lite_idle; 33 - unsigned int standby_idle; 34 - unsigned int auto_pd_dis_freq; 35 - unsigned int dram_dll_dis_freq; 36 - unsigned int phy_dll_dis_freq; 37 - unsigned int ddr3_odt_dis_freq; 38 - unsigned int ddr3_drv; 39 - unsigned int ddr3_odt; 40 - unsigned int phy_ddr3_ca_drv; 41 - unsigned int phy_ddr3_dq_drv; 42 - unsigned int phy_ddr3_odt; 43 - unsigned int lpddr3_odt_dis_freq; 44 - unsigned int lpddr3_drv; 45 - unsigned int lpddr3_odt; 46 - unsigned int phy_lpddr3_ca_drv; 47 - unsigned int phy_lpddr3_dq_drv; 48 - unsigned int phy_lpddr3_odt; 49 - unsigned int lpddr4_odt_dis_freq; 50 - unsigned int lpddr4_drv; 51 - unsigned int lpddr4_dq_odt; 52 - unsigned int lpddr4_ca_odt; 53 - unsigned int phy_lpddr4_ca_drv; 54 - unsigned int phy_lpddr4_ck_cs_drv; 55 - unsigned int phy_lpddr4_dq_drv; 56 - unsigned int phy_lpddr4_odt; 57 - }; 27 + #define NS_TO_CYCLE(NS, MHz) (((NS) * (MHz)) / NSEC_PER_USEC) 28 + 29 + #define RK3399_SET_ODT_PD_0_SR_IDLE GENMASK(7, 0) 30 + #define RK3399_SET_ODT_PD_0_SR_MC_GATE_IDLE GENMASK(15, 8) 31 + #define RK3399_SET_ODT_PD_0_STANDBY_IDLE GENMASK(31, 16) 32 + 33 + #define RK3399_SET_ODT_PD_1_PD_IDLE GENMASK(11, 0) 34 + #define RK3399_SET_ODT_PD_1_SRPD_LITE_IDLE GENMASK(27, 16) 35 + 36 + #define RK3399_SET_ODT_PD_2_ODT_ENABLE BIT(0) 58 37 59 38 struct rk3399_dmcfreq { 60 39 struct device *dev; 61 40 struct devfreq *devfreq; 41 + struct devfreq_dev_profile profile; 62 42 struct devfreq_simple_ondemand_data ondemand_data; 63 43 struct clk *dmc_clk; 64 44 struct devfreq_event_dev *edev; 65 45 struct mutex lock; 66 - struct dram_timing timing; 67 46 struct regulator *vdd_center; 68 47 struct regmap *regmap_pmu; 69 48 unsigned long rate, target_rate; 70 49 unsigned long volt, target_volt; 71 50 unsigned int odt_dis_freq; 72 - int odt_pd_arg0, odt_pd_arg1; 51 + 52 + unsigned int pd_idle_ns; 53 + unsigned int sr_idle_ns; 54 + unsigned int sr_mc_gate_idle_ns; 55 + unsigned int srpd_lite_idle_ns; 56 + unsigned int standby_idle_ns; 57 + unsigned int ddr3_odt_dis_freq; 58 + unsigned int lpddr3_odt_dis_freq; 59 + unsigned int lpddr4_odt_dis_freq; 60 + 61 + unsigned int pd_idle_dis_freq; 62 + unsigned int sr_idle_dis_freq; 63 + unsigned int sr_mc_gate_idle_dis_freq; 64 + unsigned int srpd_lite_idle_dis_freq; 65 + unsigned int standby_idle_dis_freq; 73 66 }; 74 67 75 68 static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq, ··· 73 78 struct dev_pm_opp *opp; 74 79 unsigned long old_clk_rate = dmcfreq->rate; 75 80 unsigned long target_volt, target_rate; 81 + unsigned int ddrcon_mhz; 76 82 struct arm_smccc_res res; 77 - bool odt_enable = false; 78 83 int err; 84 + 85 + u32 odt_pd_arg0 = 0; 86 + u32 odt_pd_arg1 = 0; 87 + u32 odt_pd_arg2 = 0; 79 88 80 89 opp = devfreq_recommended_opp(dev, freq, flags); 81 90 if (IS_ERR(opp)) ··· 94 95 95 96 mutex_lock(&dmcfreq->lock); 96 97 98 + /* 99 + * Ensure power-domain transitions don't interfere with ARM Trusted 100 + * Firmware power-domain idling. 
101 + */ 102 + err = rockchip_pmu_block(); 103 + if (err) { 104 + dev_err(dev, "Failed to block PMU: %d\n", err); 105 + goto out_unlock; 106 + } 107 + 108 + /* 109 + * Some idle parameters may be based on the DDR controller clock, which 110 + * is half of the DDR frequency. 111 + * pd_idle and standby_idle are based on the controller clock cycle. 112 + * sr_idle_cycle, sr_mc_gate_idle_cycle, and srpd_lite_idle_cycle 113 + * are based on the 1024 controller clock cycle 114 + */ 115 + ddrcon_mhz = target_rate / USEC_PER_SEC / 2; 116 + 117 + u32p_replace_bits(&odt_pd_arg1, 118 + NS_TO_CYCLE(dmcfreq->pd_idle_ns, ddrcon_mhz), 119 + RK3399_SET_ODT_PD_1_PD_IDLE); 120 + u32p_replace_bits(&odt_pd_arg0, 121 + NS_TO_CYCLE(dmcfreq->standby_idle_ns, ddrcon_mhz), 122 + RK3399_SET_ODT_PD_0_STANDBY_IDLE); 123 + u32p_replace_bits(&odt_pd_arg0, 124 + DIV_ROUND_UP(NS_TO_CYCLE(dmcfreq->sr_idle_ns, 125 + ddrcon_mhz), 1024), 126 + RK3399_SET_ODT_PD_0_SR_IDLE); 127 + u32p_replace_bits(&odt_pd_arg0, 128 + DIV_ROUND_UP(NS_TO_CYCLE(dmcfreq->sr_mc_gate_idle_ns, 129 + ddrcon_mhz), 1024), 130 + RK3399_SET_ODT_PD_0_SR_MC_GATE_IDLE); 131 + u32p_replace_bits(&odt_pd_arg1, 132 + DIV_ROUND_UP(NS_TO_CYCLE(dmcfreq->srpd_lite_idle_ns, 133 + ddrcon_mhz), 1024), 134 + RK3399_SET_ODT_PD_1_SRPD_LITE_IDLE); 135 + 97 136 if (dmcfreq->regmap_pmu) { 137 + if (target_rate >= dmcfreq->sr_idle_dis_freq) 138 + odt_pd_arg0 &= ~RK3399_SET_ODT_PD_0_SR_IDLE; 139 + 140 + if (target_rate >= dmcfreq->sr_mc_gate_idle_dis_freq) 141 + odt_pd_arg0 &= ~RK3399_SET_ODT_PD_0_SR_MC_GATE_IDLE; 142 + 143 + if (target_rate >= dmcfreq->standby_idle_dis_freq) 144 + odt_pd_arg0 &= ~RK3399_SET_ODT_PD_0_STANDBY_IDLE; 145 + 146 + if (target_rate >= dmcfreq->pd_idle_dis_freq) 147 + odt_pd_arg1 &= ~RK3399_SET_ODT_PD_1_PD_IDLE; 148 + 149 + if (target_rate >= dmcfreq->srpd_lite_idle_dis_freq) 150 + odt_pd_arg1 &= ~RK3399_SET_ODT_PD_1_SRPD_LITE_IDLE; 151 + 98 152 if (target_rate >= dmcfreq->odt_dis_freq) 99 - odt_enable = true; 153 + odt_pd_arg2 |= RK3399_SET_ODT_PD_2_ODT_ENABLE; 100 154 101 155 /* 102 156 * This makes a SMC call to the TF-A to set the DDR PD 103 157 * (power-down) timings and to enable or disable the 104 158 * ODT (on-die termination) resistors. 
105 159 */ 106 - arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0, 107 - dmcfreq->odt_pd_arg1, 108 - ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD, 109 - odt_enable, 0, 0, 0, &res); 160 + arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, odt_pd_arg0, odt_pd_arg1, 161 + ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD, odt_pd_arg2, 162 + 0, 0, 0, &res); 110 163 } 111 164 112 165 /* ··· 209 158 dmcfreq->volt = target_volt; 210 159 211 160 out: 161 + rockchip_pmu_unblock(); 162 + out_unlock: 212 163 mutex_unlock(&dmcfreq->lock); 213 164 return err; 214 165 } ··· 241 188 242 189 return 0; 243 190 } 244 - 245 - static struct devfreq_dev_profile rk3399_devfreq_dmc_profile = { 246 - .polling_ms = 200, 247 - .target = rk3399_dmcfreq_target, 248 - .get_dev_status = rk3399_dmcfreq_get_dev_status, 249 - .get_cur_freq = rk3399_dmcfreq_get_cur_freq, 250 - }; 251 191 252 192 static __maybe_unused int rk3399_dmcfreq_suspend(struct device *dev) 253 193 { ··· 284 238 static SIMPLE_DEV_PM_OPS(rk3399_dmcfreq_pm, rk3399_dmcfreq_suspend, 285 239 rk3399_dmcfreq_resume); 286 240 287 - static int of_get_ddr_timings(struct dram_timing *timing, 288 - struct device_node *np) 241 + static int rk3399_dmcfreq_of_props(struct rk3399_dmcfreq *data, 242 + struct device_node *np) 289 243 { 290 244 int ret = 0; 291 245 292 - ret = of_property_read_u32(np, "rockchip,ddr3_speed_bin", 293 - &timing->ddr3_speed_bin); 294 - ret |= of_property_read_u32(np, "rockchip,pd_idle", 295 - &timing->pd_idle); 296 - ret |= of_property_read_u32(np, "rockchip,sr_idle", 297 - &timing->sr_idle); 298 - ret |= of_property_read_u32(np, "rockchip,sr_mc_gate_idle", 299 - &timing->sr_mc_gate_idle); 300 - ret |= of_property_read_u32(np, "rockchip,srpd_lite_idle", 301 - &timing->srpd_lite_idle); 302 - ret |= of_property_read_u32(np, "rockchip,standby_idle", 303 - &timing->standby_idle); 304 - ret |= of_property_read_u32(np, "rockchip,auto_pd_dis_freq", 305 - &timing->auto_pd_dis_freq); 306 - ret |= of_property_read_u32(np, "rockchip,dram_dll_dis_freq", 307 - &timing->dram_dll_dis_freq); 308 - ret |= of_property_read_u32(np, "rockchip,phy_dll_dis_freq", 309 - &timing->phy_dll_dis_freq); 246 + /* 247 + * These are all optional, and serve as minimum bounds. Give them large 248 + * (i.e., never "disabled") values if the DT doesn't specify one. 
249 + */ 250 + data->pd_idle_dis_freq = 251 + data->sr_idle_dis_freq = 252 + data->sr_mc_gate_idle_dis_freq = 253 + data->srpd_lite_idle_dis_freq = 254 + data->standby_idle_dis_freq = UINT_MAX; 255 + 256 + ret |= of_property_read_u32(np, "rockchip,pd-idle-ns", 257 + &data->pd_idle_ns); 258 + ret |= of_property_read_u32(np, "rockchip,sr-idle-ns", 259 + &data->sr_idle_ns); 260 + ret |= of_property_read_u32(np, "rockchip,sr-mc-gate-idle-ns", 261 + &data->sr_mc_gate_idle_ns); 262 + ret |= of_property_read_u32(np, "rockchip,srpd-lite-idle-ns", 263 + &data->srpd_lite_idle_ns); 264 + ret |= of_property_read_u32(np, "rockchip,standby-idle-ns", 265 + &data->standby_idle_ns); 310 266 ret |= of_property_read_u32(np, "rockchip,ddr3_odt_dis_freq", 311 - &timing->ddr3_odt_dis_freq); 312 - ret |= of_property_read_u32(np, "rockchip,ddr3_drv", 313 - &timing->ddr3_drv); 314 - ret |= of_property_read_u32(np, "rockchip,ddr3_odt", 315 - &timing->ddr3_odt); 316 - ret |= of_property_read_u32(np, "rockchip,phy_ddr3_ca_drv", 317 - &timing->phy_ddr3_ca_drv); 318 - ret |= of_property_read_u32(np, "rockchip,phy_ddr3_dq_drv", 319 - &timing->phy_ddr3_dq_drv); 320 - ret |= of_property_read_u32(np, "rockchip,phy_ddr3_odt", 321 - &timing->phy_ddr3_odt); 267 + &data->ddr3_odt_dis_freq); 322 268 ret |= of_property_read_u32(np, "rockchip,lpddr3_odt_dis_freq", 323 - &timing->lpddr3_odt_dis_freq); 324 - ret |= of_property_read_u32(np, "rockchip,lpddr3_drv", 325 - &timing->lpddr3_drv); 326 - ret |= of_property_read_u32(np, "rockchip,lpddr3_odt", 327 - &timing->lpddr3_odt); 328 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr3_ca_drv", 329 - &timing->phy_lpddr3_ca_drv); 330 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr3_dq_drv", 331 - &timing->phy_lpddr3_dq_drv); 332 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr3_odt", 333 - &timing->phy_lpddr3_odt); 269 + &data->lpddr3_odt_dis_freq); 334 270 ret |= of_property_read_u32(np, "rockchip,lpddr4_odt_dis_freq", 335 - &timing->lpddr4_odt_dis_freq); 336 - ret |= of_property_read_u32(np, "rockchip,lpddr4_drv", 337 - &timing->lpddr4_drv); 338 - ret |= of_property_read_u32(np, "rockchip,lpddr4_dq_odt", 339 - &timing->lpddr4_dq_odt); 340 - ret |= of_property_read_u32(np, "rockchip,lpddr4_ca_odt", 341 - &timing->lpddr4_ca_odt); 342 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr4_ca_drv", 343 - &timing->phy_lpddr4_ca_drv); 344 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr4_ck_cs_drv", 345 - &timing->phy_lpddr4_ck_cs_drv); 346 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr4_dq_drv", 347 - &timing->phy_lpddr4_dq_drv); 348 - ret |= of_property_read_u32(np, "rockchip,phy_lpddr4_odt", 349 - &timing->phy_lpddr4_odt); 271 + &data->lpddr4_odt_dis_freq); 272 + 273 + ret |= of_property_read_u32(np, "rockchip,pd-idle-dis-freq-hz", 274 + &data->pd_idle_dis_freq); 275 + ret |= of_property_read_u32(np, "rockchip,sr-idle-dis-freq-hz", 276 + &data->sr_idle_dis_freq); 277 + ret |= of_property_read_u32(np, "rockchip,sr-mc-gate-idle-dis-freq-hz", 278 + &data->sr_mc_gate_idle_dis_freq); 279 + ret |= of_property_read_u32(np, "rockchip,srpd-lite-idle-dis-freq-hz", 280 + &data->srpd_lite_idle_dis_freq); 281 + ret |= of_property_read_u32(np, "rockchip,standby-idle-dis-freq-hz", 282 + &data->standby_idle_dis_freq); 350 283 351 284 return ret; 352 285 } ··· 336 311 struct device *dev = &pdev->dev; 337 312 struct device_node *np = pdev->dev.of_node, *node; 338 313 struct rk3399_dmcfreq *data; 339 - int ret, index, size; 340 - uint32_t *timing; 314 + int ret; 341 315 struct 
dev_pm_opp *opp; 342 316 u32 ddr_type; 343 317 u32 val; ··· 367 343 return ret; 368 344 } 369 345 370 - /* 371 - * Get dram timing and pass it to arm trust firmware, 372 - * the dram driver in arm trust firmware will get these 373 - * timing and to do dram initial. 374 - */ 375 - if (!of_get_ddr_timings(&data->timing, np)) { 376 - timing = &data->timing.ddr3_speed_bin; 377 - size = sizeof(struct dram_timing) / 4; 378 - for (index = 0; index < size; index++) { 379 - arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, *timing++, index, 380 - ROCKCHIP_SIP_CONFIG_DRAM_SET_PARAM, 381 - 0, 0, 0, 0, &res); 382 - if (res.a0) { 383 - dev_err(dev, "Failed to set dram param: %ld\n", 384 - res.a0); 385 - ret = -EINVAL; 386 - goto err_edev; 387 - } 388 - } 389 - } 346 + rk3399_dmcfreq_of_props(data, np); 390 347 391 348 node = of_parse_phandle(np, "rockchip,pmu", 0); 392 349 if (!node) ··· 386 381 387 382 switch (ddr_type) { 388 383 case RK3399_PMUGRF_DDRTYPE_DDR3: 389 - data->odt_dis_freq = data->timing.ddr3_odt_dis_freq; 384 + data->odt_dis_freq = data->ddr3_odt_dis_freq; 390 385 break; 391 386 case RK3399_PMUGRF_DDRTYPE_LPDDR3: 392 - data->odt_dis_freq = data->timing.lpddr3_odt_dis_freq; 387 + data->odt_dis_freq = data->lpddr3_odt_dis_freq; 393 388 break; 394 389 case RK3399_PMUGRF_DDRTYPE_LPDDR4: 395 - data->odt_dis_freq = data->timing.lpddr4_odt_dis_freq; 390 + data->odt_dis_freq = data->lpddr4_odt_dis_freq; 396 391 break; 397 392 default: 398 393 ret = -EINVAL; ··· 405 400 0, 0, 0, 0, &res); 406 401 407 402 /* 408 - * In TF-A there is a platform SIP call to set the PD (power-down) 409 - * timings and to enable or disable the ODT (on-die termination). 410 - * This call needs three arguments as follows: 411 - * 412 - * arg0: 413 - * bit[0-7] : sr_idle 414 - * bit[8-15] : sr_mc_gate_idle 415 - * bit[16-31] : standby idle 416 - * arg1: 417 - * bit[0-11] : pd_idle 418 - * bit[16-27] : srpd_lite_idle 419 - * arg2: 420 - * bit[0] : odt enable 421 - */ 422 - data->odt_pd_arg0 = (data->timing.sr_idle & 0xff) | 423 - ((data->timing.sr_mc_gate_idle & 0xff) << 8) | 424 - ((data->timing.standby_idle & 0xffff) << 16); 425 - data->odt_pd_arg1 = (data->timing.pd_idle & 0xfff) | 426 - ((data->timing.srpd_lite_idle & 0xfff) << 16); 427 - 428 - /* 429 403 * We add a devfreq driver to our parent since it has a device tree node 430 404 * with operating points. 
431 405 */ 432 - if (dev_pm_opp_of_add_table(dev)) { 406 + if (devm_pm_opp_of_add_table(dev)) { 433 407 dev_err(dev, "Invalid operating-points in device tree.\n"); 434 408 ret = -EINVAL; 435 409 goto err_edev; 436 410 } 437 411 438 - of_property_read_u32(np, "upthreshold", 439 - &data->ondemand_data.upthreshold); 440 - of_property_read_u32(np, "downdifferential", 441 - &data->ondemand_data.downdifferential); 412 + data->ondemand_data.upthreshold = 25; 413 + data->ondemand_data.downdifferential = 15; 442 414 443 415 data->rate = clk_get_rate(data->dmc_clk); 444 416 445 417 opp = devfreq_recommended_opp(dev, &data->rate, 0); 446 418 if (IS_ERR(opp)) { 447 419 ret = PTR_ERR(opp); 448 - goto err_free_opp; 420 + goto err_edev; 449 421 } 450 422 451 423 data->rate = dev_pm_opp_get_freq(opp); 452 424 data->volt = dev_pm_opp_get_voltage(opp); 453 425 dev_pm_opp_put(opp); 454 426 455 - rk3399_devfreq_dmc_profile.initial_freq = data->rate; 427 + data->profile = (struct devfreq_dev_profile) { 428 + .polling_ms = 200, 429 + .target = rk3399_dmcfreq_target, 430 + .get_dev_status = rk3399_dmcfreq_get_dev_status, 431 + .get_cur_freq = rk3399_dmcfreq_get_cur_freq, 432 + .initial_freq = data->rate, 433 + }; 456 434 457 435 data->devfreq = devm_devfreq_add_device(dev, 458 - &rk3399_devfreq_dmc_profile, 436 + &data->profile, 459 437 DEVFREQ_GOV_SIMPLE_ONDEMAND, 460 438 &data->ondemand_data); 461 439 if (IS_ERR(data->devfreq)) { 462 440 ret = PTR_ERR(data->devfreq); 463 - goto err_free_opp; 441 + goto err_edev; 464 442 } 465 443 466 444 devm_devfreq_register_opp_notifier(dev, data->devfreq); ··· 453 465 454 466 return 0; 455 467 456 - err_free_opp: 457 - dev_pm_opp_of_remove_table(&pdev->dev); 458 468 err_edev: 459 469 devfreq_event_disable_edev(data->edev); 460 470 ··· 463 477 { 464 478 struct rk3399_dmcfreq *dmcfreq = dev_get_drvdata(&pdev->dev); 465 479 466 - /* 467 - * Before remove the opp table we need to unregister the opp notifier. 468 - */ 469 - devm_devfreq_unregister_opp_notifier(dmcfreq->dev, dmcfreq->devfreq); 470 - dev_pm_opp_of_remove_table(dmcfreq->dev); 480 + devfreq_event_disable_edev(dmcfreq->edev); 471 481 472 482 return 0; 473 483 }
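For illustration, the rk3399_dmc changes above boil down to two patterns: a devm-managed OPP table (so the manual err_free_opp unwind disappears) and a per-instance devfreq_dev_profile instead of a shared static one. A minimal sketch of the same shape for a hypothetical "foo" driver (all foo_* names and struct foo_priv are illustrative, not from the patch):

static int foo_probe(struct platform_device *pdev)
{
        struct device *dev = &pdev->dev;
        struct foo_priv *priv;  /* hypothetical driver state; priv->clk assumed obtained via devm_clk_get() */
        struct dev_pm_opp *opp;
        int ret;

        priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;

        /* devm variant: the OPP table is released automatically on unbind. */
        ret = devm_pm_opp_of_add_table(dev);
        if (ret)
                return ret;

        priv->rate = clk_get_rate(priv->clk);
        opp = devfreq_recommended_opp(dev, &priv->rate, 0);
        if (IS_ERR(opp))
                return PTR_ERR(opp);
        dev_pm_opp_put(opp);

        /* Per-instance profile: no shared mutable global. */
        priv->profile = (struct devfreq_dev_profile) {
                .polling_ms = 200,
                .target = foo_target,
                .get_dev_status = foo_get_dev_status,
                .get_cur_freq = foo_get_cur_freq,
                .initial_freq = priv->rate,
        };

        priv->devfreq = devm_devfreq_add_device(dev, &priv->profile,
                                                DEVFREQ_GOV_SIMPLE_ONDEMAND,
                                                &priv->ondemand);
        return PTR_ERR_OR_ZERO(priv->devfreq);
}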
+133
drivers/idle/intel_idle.c
··· 765 765 }; 766 766 767 767 /* 768 + * On AlderLake C1 has to be disabled if C1E is enabled, and vice versa. 769 + * C1E is enabled only if "C1E promotion" bit is set in MSR_IA32_POWER_CTL. 770 + * But in this case there is effectively no C1, because C1 requests are 771 + * promoted to C1E. If the "C1E promotion" bit is cleared, then both C1 772 + * and C1E requests end up with C1, so there is effectively no C1E. 773 + * 774 + * By default we enable C1E and disable C1 by marking it with 775 + * 'CPUIDLE_FLAG_UNUSABLE'. 776 + */ 777 + static struct cpuidle_state adl_cstates[] __initdata = { 778 + { 779 + .name = "C1", 780 + .desc = "MWAIT 0x00", 781 + .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_UNUSABLE, 782 + .exit_latency = 1, 783 + .target_residency = 1, 784 + .enter = &intel_idle, 785 + .enter_s2idle = intel_idle_s2idle, }, 786 + { 787 + .name = "C1E", 788 + .desc = "MWAIT 0x01", 789 + .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE, 790 + .exit_latency = 2, 791 + .target_residency = 4, 792 + .enter = &intel_idle, 793 + .enter_s2idle = intel_idle_s2idle, }, 794 + { 795 + .name = "C6", 796 + .desc = "MWAIT 0x20", 797 + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, 798 + .exit_latency = 220, 799 + .target_residency = 600, 800 + .enter = &intel_idle, 801 + .enter_s2idle = intel_idle_s2idle, }, 802 + { 803 + .name = "C8", 804 + .desc = "MWAIT 0x40", 805 + .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED, 806 + .exit_latency = 280, 807 + .target_residency = 800, 808 + .enter = &intel_idle, 809 + .enter_s2idle = intel_idle_s2idle, }, 810 + { 811 + .name = "C10", 812 + .desc = "MWAIT 0x60", 813 + .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED, 814 + .exit_latency = 680, 815 + .target_residency = 2000, 816 + .enter = &intel_idle, 817 + .enter_s2idle = intel_idle_s2idle, }, 818 + { 819 + .enter = NULL } 820 + }; 821 + 822 + static struct cpuidle_state adl_l_cstates[] __initdata = { 823 + { 824 + .name = "C1", 825 + .desc = "MWAIT 0x00", 826 + .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_UNUSABLE, 827 + .exit_latency = 1, 828 + .target_residency = 1, 829 + .enter = &intel_idle, 830 + .enter_s2idle = intel_idle_s2idle, }, 831 + { 832 + .name = "C1E", 833 + .desc = "MWAIT 0x01", 834 + .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE, 835 + .exit_latency = 2, 836 + .target_residency = 4, 837 + .enter = &intel_idle, 838 + .enter_s2idle = intel_idle_s2idle, }, 839 + { 840 + .name = "C6", 841 + .desc = "MWAIT 0x20", 842 + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, 843 + .exit_latency = 170, 844 + .target_residency = 500, 845 + .enter = &intel_idle, 846 + .enter_s2idle = intel_idle_s2idle, }, 847 + { 848 + .name = "C8", 849 + .desc = "MWAIT 0x40", 850 + .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED, 851 + .exit_latency = 200, 852 + .target_residency = 600, 853 + .enter = &intel_idle, 854 + .enter_s2idle = intel_idle_s2idle, }, 855 + { 856 + .name = "C10", 857 + .desc = "MWAIT 0x60", 858 + .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED, 859 + .exit_latency = 230, 860 + .target_residency = 700, 861 + .enter = &intel_idle, 862 + .enter_s2idle = intel_idle_s2idle, }, 863 + { 864 + .enter = NULL } 865 + }; 866 + 867 + /* 768 868 * On Sapphire Rapids Xeon C1 has to be disabled if C1E is enabled, and vice 769 869 * versa. On SPR C1E is enabled only if "C1E promotion" bit is set in 770 870 * MSR_IA32_POWER_CTL. 
But in this case there is effectively no C1 ··· 1247 1147 .use_acpi = true, 1248 1148 }; 1249 1149 1150 + static const struct idle_cpu idle_cpu_adl __initconst = { 1151 + .state_table = adl_cstates, 1152 + }; 1153 + 1154 + static const struct idle_cpu idle_cpu_adl_l __initconst = { 1155 + .state_table = adl_l_cstates, 1156 + }; 1157 + 1250 1158 static const struct idle_cpu idle_cpu_spr __initconst = { 1251 1159 .state_table = spr_cstates, 1252 1160 .disable_promotion_to_c1e = true, ··· 1323 1215 X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx), 1324 1216 X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx), 1325 1217 X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx), 1218 + X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &idle_cpu_adl), 1219 + X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &idle_cpu_adl_l), 1326 1220 X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr), 1327 1221 X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl), 1328 1222 X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl), ··· 1684 1574 } 1685 1575 1686 1576 /** 1577 + * adl_idle_state_table_update - Adjust AlderLake idle states table. 1578 + */ 1579 + static void __init adl_idle_state_table_update(void) 1580 + { 1581 + /* Check if user prefers C1 over C1E. */ 1582 + if (preferred_states_mask & BIT(1) && !(preferred_states_mask & BIT(2))) { 1583 + cpuidle_state_table[0].flags &= ~CPUIDLE_FLAG_UNUSABLE; 1584 + cpuidle_state_table[1].flags |= CPUIDLE_FLAG_UNUSABLE; 1585 + 1586 + /* Disable C1E by clearing the "C1E promotion" bit. */ 1587 + c1e_promotion = C1E_PROMOTION_DISABLE; 1588 + return; 1589 + } 1590 + 1591 + /* Make sure C1E is enabled by default */ 1592 + c1e_promotion = C1E_PROMOTION_ENABLE; 1593 + } 1594 + 1595 + /** 1687 1596 * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table. 1688 1597 */ 1689 1598 static void __init spr_idle_state_table_update(void) ··· 1770 1641 break; 1771 1642 case INTEL_FAM6_SAPPHIRERAPIDS_X: 1772 1643 spr_idle_state_table_update(); 1644 + break; 1645 + case INTEL_FAM6_ALDERLAKE: 1646 + case INTEL_FAM6_ALDERLAKE_L: 1647 + adl_idle_state_table_update(); 1773 1648 break; 1774 1649 } 1775 1650
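The new adl_idle_state_table_update() reuses the driver's existing preferred_cstates module parameter, where BIT(n) corresponds to the n-th state in the table, so BIT(1) selects C1 and BIT(2) selects C1E here. The mutually exclusive C1/C1E choice distils to a predicate like this (illustrative helper, not in the patch; BIT() is from linux/bits.h):

/* True when the user asked for C1 and not C1E, e.g. booting with
 * "intel_idle.preferred_cstates=2" (BIT(1)). The update function then
 * clears CPUIDLE_FLAG_UNUSABLE on C1, marks C1E unusable instead, and
 * clears the "C1E promotion" bit in MSR_IA32_POWER_CTL.
 */
static bool __init adl_prefer_c1(unsigned int mask)
{
        return (mask & BIT(1)) && !(mask & BIT(2));
}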
+1 -4
drivers/iio/chemical/scd30.h
··· 68 68 scd30_command_t command; 69 69 }; 70 70 71 - int scd30_suspend(struct device *dev); 72 - int scd30_resume(struct device *dev); 73 - 74 - static __maybe_unused SIMPLE_DEV_PM_OPS(scd30_pm_ops, scd30_suspend, scd30_resume); 71 + extern const struct dev_pm_ops scd30_pm_ops; 75 72 76 73 int scd30_probe(struct device *dev, int irq, const char *name, void *priv, scd30_command_t command); 77 74
+5 -5
drivers/iio/chemical/scd30_core.c
··· 517 517 IIO_CHAN_SOFT_TIMESTAMP(3), 518 518 }; 519 519 520 - int __maybe_unused scd30_suspend(struct device *dev) 520 + static int scd30_suspend(struct device *dev) 521 521 { 522 522 struct iio_dev *indio_dev = dev_get_drvdata(dev); 523 523 struct scd30_state *state = iio_priv(indio_dev); ··· 529 529 530 530 return regulator_disable(state->vdd); 531 531 } 532 - EXPORT_SYMBOL(scd30_suspend); 533 532 534 - int __maybe_unused scd30_resume(struct device *dev) 533 + static int scd30_resume(struct device *dev) 535 534 { 536 535 struct iio_dev *indio_dev = dev_get_drvdata(dev); 537 536 struct scd30_state *state = iio_priv(indio_dev); ··· 542 543 543 544 return scd30_command_write(state, CMD_START_MEAS, state->pressure_comp); 544 545 } 545 - EXPORT_SYMBOL(scd30_resume); 546 + 547 + EXPORT_NS_SIMPLE_DEV_PM_OPS(scd30_pm_ops, scd30_suspend, scd30_resume, IIO_SCD30); 546 548 547 549 static void scd30_stop_meas(void *data) 548 550 { ··· 759 759 760 760 return devm_iio_device_register(dev, indio_dev); 761 761 } 762 - EXPORT_SYMBOL(scd30_probe); 762 + EXPORT_SYMBOL_NS(scd30_probe, IIO_SCD30); 763 763 764 764 MODULE_AUTHOR("Tomasz Duszynski <tomasz.duszynski@octakon.com>"); 765 765 MODULE_DESCRIPTION("Sensirion SCD30 carbon dioxide sensor core driver");
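The scd30 conversion above is the model user of the new namespaced PM-ops export macros: the core module exports a const dev_pm_ops into the IIO_SCD30 namespace and each bus-glue module imports it. A minimal sketch of the pattern with hypothetical names (mydrv / MY_NS):

/* core module */
static int mydrv_suspend(struct device *dev) { /* power down */ return 0; }
static int mydrv_resume(struct device *dev) { /* power up */ return 0; }
EXPORT_NS_SIMPLE_DEV_PM_OPS(mydrv_pm_ops, mydrv_suspend, mydrv_resume, MY_NS);

/* shared header */
extern const struct dev_pm_ops mydrv_pm_ops;

/* bus-glue module (probe/remove elided) */
static struct i2c_driver mydrv_i2c = {
        .driver = {
                .name = "mydrv",
                /* pm_sleep_ptr() compiles the reference out when !CONFIG_PM_SLEEP */
                .pm = pm_sleep_ptr(&mydrv_pm_ops),
        },
};
MODULE_IMPORT_NS(MY_NS);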
+2 -1
drivers/iio/chemical/scd30_i2c.c
··· 128 128 .driver = { 129 129 .name = KBUILD_MODNAME, 130 130 .of_match_table = scd30_i2c_of_match, 131 - .pm = &scd30_pm_ops, 131 + .pm = pm_sleep_ptr(&scd30_pm_ops), 132 132 }, 133 133 .probe_new = scd30_i2c_probe, 134 134 }; ··· 137 137 MODULE_AUTHOR("Tomasz Duszynski <tomasz.duszynski@octakon.com>"); 138 138 MODULE_DESCRIPTION("Sensirion SCD30 carbon dioxide sensor i2c driver"); 139 139 MODULE_LICENSE("GPL v2"); 140 + MODULE_IMPORT_NS(IIO_SCD30);
+2 -1
drivers/iio/chemical/scd30_serial.c
··· 252 252 .driver = { 253 253 .name = KBUILD_MODNAME, 254 254 .of_match_table = scd30_serdev_of_match, 255 - .pm = &scd30_pm_ops, 255 + .pm = pm_sleep_ptr(&scd30_pm_ops), 256 256 }, 257 257 .probe = scd30_serdev_probe, 258 258 }; ··· 261 261 MODULE_AUTHOR("Tomasz Duszynski <tomasz.duszynski@octakon.com>"); 262 262 MODULE_DESCRIPTION("Sensirion SCD30 carbon dioxide sensor serial driver"); 263 263 MODULE_LICENSE("GPL v2"); 264 + MODULE_IMPORT_NS(IIO_SCD30);
+3 -3
drivers/opp/of.c
··· 1448 1448 * Returns 0 on success or a proper -EINVAL value in case of error. 1449 1449 */ 1450 1450 static int __maybe_unused 1451 - _get_dt_power(unsigned long *mW, unsigned long *kHz, struct device *dev) 1451 + _get_dt_power(struct device *dev, unsigned long *mW, unsigned long *kHz) 1452 1452 { 1453 1453 struct dev_pm_opp *opp; 1454 1454 unsigned long opp_freq, opp_power; ··· 1482 1482 * Returns -EINVAL if the power calculation failed because of missing 1483 1483 * parameters, 0 otherwise. 1484 1484 */ 1485 - static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz, 1486 - struct device *dev) 1485 + static int __maybe_unused _get_power(struct device *dev, unsigned long *mW, 1486 + unsigned long *kHz) 1487 1487 { 1488 1488 struct dev_pm_opp *opp; 1489 1489 struct device_node *np;
+1 -1
drivers/powercap/dtpm_cpu.c
··· 211 211 return 0; 212 212 213 213 pd = em_cpu_get(cpu); 214 - if (!pd) 214 + if (!pd || em_is_artificial(pd)) 215 215 return -EINVAL; 216 216 217 217 dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
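em_is_artificial() lets power-aggregating frameworks like DTPM reject Energy Models whose numbers are abstract and therefore cannot be summed or compared across devices. The guard, sketched for a hypothetical consumer:

static int hypothetical_power_consumer_setup(int cpu)
{
        struct em_perf_domain *pd = em_cpu_get(cpu);

        /* Bail out: no EM at all, or its units are not real milliwatts. */
        if (!pd || em_is_artificial(pd))
                return -EINVAL;

        /* ... pd->table[i].power is now safe to treat as real power ... */
        return 0;
}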
+3 -1
drivers/powercap/intel_rapl_common.c
··· 1010 1010 * where time_unit is default to 1 sec. Never 0. 1011 1011 */ 1012 1012 if (!to_raw) 1013 - return (value) ? value *= rp->time_unit : rp->time_unit; 1013 + return (value) ? value * rp->time_unit : rp->time_unit; 1014 1014 1015 1015 value = div64_u64(value, rp->time_unit); 1016 1016 ··· 1107 1107 X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE, &rapl_defaults_core), 1108 1108 X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &rapl_defaults_core), 1109 1109 X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &rapl_defaults_core), 1110 + X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_N, &rapl_defaults_core), 1111 + X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, &rapl_defaults_core), 1110 1112 X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &rapl_defaults_spr_server), 1111 1113 X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &rapl_defaults_core), 1112 1114
+1
drivers/powercap/intel_rapl_msr.c
··· 140 140 { X86_VENDOR_INTEL, 6, INTEL_FAM6_TIGERLAKE_L, X86_FEATURE_ANY }, 141 141 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE, X86_FEATURE_ANY }, 142 142 { X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_L, X86_FEATURE_ANY }, 143 + { X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE, X86_FEATURE_ANY }, 143 144 {} 144 145 }; 145 146
+118
drivers/soc/rockchip/pm_domains.c
··· 8 8 #include <linux/io.h> 9 9 #include <linux/iopoll.h> 10 10 #include <linux/err.h> 11 + #include <linux/mutex.h> 11 12 #include <linux/pm_clock.h> 12 13 #include <linux/pm_domain.h> 13 14 #include <linux/of_address.h> ··· 17 16 #include <linux/clk.h> 18 17 #include <linux/regmap.h> 19 18 #include <linux/mfd/syscon.h> 19 + #include <soc/rockchip/pm_domains.h> 20 20 #include <dt-bindings/power/px30-power.h> 21 21 #include <dt-bindings/power/rk3036-power.h> 22 22 #include <dt-bindings/power/rk3066-power.h> ··· 140 138 141 139 #define DOMAIN_RK3568(name, pwr, req, wakeup) \ 142 140 DOMAIN_M(name, pwr, pwr, req, req, req, wakeup) 141 + 142 + /* 143 + * Dynamic Memory Controller may need to coordinate with us -- see 144 + * rockchip_pmu_block(). 145 + * 146 + * dmc_pmu_mutex protects registration-time races, so DMC driver doesn't try to 147 + * block() while we're initializing the PMU. 148 + */ 149 + static DEFINE_MUTEX(dmc_pmu_mutex); 150 + static struct rockchip_pmu *dmc_pmu; 151 + 152 + /* 153 + * Block PMU transitions and make sure they don't interfere with ARM Trusted 154 + * Firmware operations. There are two conflicts, noted in the comments below. 155 + * 156 + * Caller must unblock PMU transitions via rockchip_pmu_unblock(). 157 + */ 158 + int rockchip_pmu_block(void) 159 + { 160 + struct rockchip_pmu *pmu; 161 + struct generic_pm_domain *genpd; 162 + struct rockchip_pm_domain *pd; 163 + int i, ret; 164 + 165 + mutex_lock(&dmc_pmu_mutex); 166 + 167 + /* No PMU (yet)? Then we just block rockchip_pmu_probe(). */ 168 + if (!dmc_pmu) 169 + return 0; 170 + pmu = dmc_pmu; 171 + 172 + /* 173 + * mutex blocks all idle transitions: we can't touch the 174 + * PMU_BUS_IDLE_REQ (our ".idle_offset") register while ARM Trusted 175 + * Firmware might be using it. 176 + */ 177 + mutex_lock(&pmu->mutex); 178 + 179 + /* 180 + * Power domain clocks: Per Rockchip, we *must* keep certain clocks 181 + * enabled for the duration of power-domain transitions. Most 182 + * transitions are handled by this driver, but some cases (in 183 + * particular, DRAM DVFS / memory-controller idle) must be handled by 184 + * firmware. Firmware can handle most clock management via a special 185 + * "ungate" register (PMU_CRU_GATEDIS_CON0), but unfortunately, this 186 + * doesn't handle PLLs. We can assist this transition by doing the 187 + * clock management on behalf of firmware. 188 + */ 189 + for (i = 0; i < pmu->genpd_data.num_domains; i++) { 190 + genpd = pmu->genpd_data.domains[i]; 191 + if (genpd) { 192 + pd = to_rockchip_pd(genpd); 193 + ret = clk_bulk_enable(pd->num_clks, pd->clks); 194 + if (ret < 0) { 195 + dev_err(pmu->dev, 196 + "failed to enable clks for domain '%s': %d\n", 197 + genpd->name, ret); 198 + goto err; 199 + } 200 + } 201 + } 202 + 203 + return 0; 204 + 205 + err: 206 + for (i = i - 1; i >= 0; i--) { 207 + genpd = pmu->genpd_data.domains[i]; 208 + if (genpd) { 209 + pd = to_rockchip_pd(genpd); 210 + clk_bulk_disable(pd->num_clks, pd->clks); 211 + } 212 + } 213 + mutex_unlock(&pmu->mutex); 214 + mutex_unlock(&dmc_pmu_mutex); 215 + 216 + return ret; 217 + } 218 + EXPORT_SYMBOL_GPL(rockchip_pmu_block); 219 + 220 + /* Unblock PMU transitions. 
*/ 221 + void rockchip_pmu_unblock(void) 222 + { 223 + struct rockchip_pmu *pmu; 224 + struct generic_pm_domain *genpd; 225 + struct rockchip_pm_domain *pd; 226 + int i; 227 + 228 + if (dmc_pmu) { 229 + pmu = dmc_pmu; 230 + for (i = 0; i < pmu->genpd_data.num_domains; i++) { 231 + genpd = pmu->genpd_data.domains[i]; 232 + if (genpd) { 233 + pd = to_rockchip_pd(genpd); 234 + clk_bulk_disable(pd->num_clks, pd->clks); 235 + } 236 + } 237 + 238 + mutex_unlock(&pmu->mutex); 239 + } 240 + 241 + mutex_unlock(&dmc_pmu_mutex); 242 + } 243 + EXPORT_SYMBOL_GPL(rockchip_pmu_unblock); 143 244 144 245 static bool rockchip_pmu_domain_is_idle(struct rockchip_pm_domain *pd) 145 246 { ··· 795 690 796 691 error = -ENODEV; 797 692 693 + /* 694 + * Prevent any rockchip_pmu_block() from racing with the remainder of 695 + * setup (clocks, register initialization). 696 + */ 697 + mutex_lock(&dmc_pmu_mutex); 698 + 798 699 for_each_available_child_of_node(np, node) { 799 700 error = rockchip_pm_add_one_domain(pmu, node); 800 701 if (error) { ··· 830 719 goto err_out; 831 720 } 832 721 722 + /* We only expect one PMU. */ 723 + if (!WARN_ON_ONCE(dmc_pmu)) 724 + dmc_pmu = pmu; 725 + 726 + mutex_unlock(&dmc_pmu_mutex); 727 + 833 728 return 0; 834 729 835 730 err_out: 836 731 rockchip_pm_domain_cleanup(pmu); 732 + mutex_unlock(&dmc_pmu_mutex); 837 733 return error; 838 734 } 839 735
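A plausible caller-side shape for the new block/unblock API (helper names below are hypothetical; the real consumer is the rk3399_dmc target path earlier in this series):

static int dmc_set_rate_via_firmware(unsigned long rate)
{
        int ret;

        ret = rockchip_pmu_block();     /* also pre-enables the domain clocks */
        if (ret)
                return ret;

        /* Firmware may now idle the memory controller without racing a
         * concurrent power-domain transition. */
        ret = dmc_firmware_dvfs_smc(rate);      /* hypothetical SMC wrapper */

        rockchip_pmu_unblock();
        return ret;
}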
+1 -1
drivers/thermal/cpufreq_cooling.c
··· 328 328 struct cpufreq_policy *policy; 329 329 unsigned int nr_levels; 330 330 331 - if (!em) 331 + if (!em || em_is_artificial(em)) 332 332 return false; 333 333 334 334 policy = cpufreq_cdev->policy;
+5 -3
drivers/thermal/devfreq_cooling.c
··· 358 358 struct thermal_cooling_device *cdev; 359 359 struct device *dev = df->dev.parent; 360 360 struct devfreq_cooling_device *dfc; 361 + struct em_perf_domain *em; 361 362 char *name; 362 363 int err, num_opps; 363 364 ··· 368 367 369 368 dfc->devfreq = df; 370 369 371 - dfc->em_pd = em_pd_get(dev); 372 - if (dfc->em_pd) { 370 + em = em_pd_get(dev); 371 + if (em && !em_is_artificial(em)) { 372 + dfc->em_pd = em; 373 373 devfreq_cooling_ops.get_requested_power = 374 374 devfreq_cooling_get_requested_power; 375 375 devfreq_cooling_ops.state2power = devfreq_cooling_state2power; ··· 381 379 num_opps = em_pd_nr_perf_states(dfc->em_pd); 382 380 } else { 383 381 /* Backward compatibility for drivers which do not use IPA */ 384 - dev_dbg(dev, "missing EM for cooling device\n"); 382 + dev_dbg(dev, "missing proper EM for cooling device\n"); 385 383 386 384 num_opps = dev_pm_opp_get_opp_count(dev); 387 385
+5
include/acpi/cppc_acpi.h
··· 141 141 extern int cppc_set_enable(int cpu, bool enable); 142 142 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps); 143 143 extern bool acpi_cpc_valid(void); 144 + extern bool cppc_allow_fast_switch(void); 144 145 extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data); 145 146 extern unsigned int cppc_get_transition_latency(int cpu); 146 147 extern bool cpc_ffh_supported(void); ··· 173 172 return -ENOTSUPP; 174 173 } 175 174 static inline bool acpi_cpc_valid(void) 175 + { 176 + return false; 177 + } 178 + static inline bool cppc_allow_fast_switch(void) 176 179 { 177 180 return false; 178 181 }
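The new helper follows this header's existing pattern of a real implementation plus a "return false" stub when CPPC is compiled out. A cpufreq driver might gate fast switching on it roughly like this (sketch, not taken from this diff):

static int my_cppc_cpufreq_init(struct cpufreq_policy *policy)
{
        /* Fast switching is only safe when the desired-perf register
         * lives in an address space that can be written directly. */
        policy->fast_switch_possible = cppc_allow_fast_switch();
        return 0;
}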
+2
include/linux/acpi.h
··· 574 574 #define OSC_SB_OSLPI_SUPPORT 0x00000100 575 575 #define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT 0x00001000 576 576 #define OSC_SB_GENERIC_INITIATOR_SUPPORT 0x00002000 577 + #define OSC_SB_CPC_FLEXIBLE_ADR_SPACE 0x00004000 577 578 #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000 578 579 #define OSC_SB_PRM_SUPPORT 0x00200000 579 580 ··· 582 581 extern bool osc_pc_lpi_support_confirmed; 583 582 extern bool osc_sb_native_usb4_support_confirmed; 584 583 extern bool osc_sb_cppc_not_supported; 584 + extern bool osc_cpc_flexible_adr_space_confirmed; 585 585 586 586 /* USB4 Capabilities */ 587 587 #define OSC_USB_USB3_TUNNELING 0x00000001
+15 -2
include/linux/devfreq.h
··· 38 38 39 39 struct devfreq; 40 40 struct devfreq_governor; 41 + struct devfreq_cpu_data; 41 42 struct thermal_cooling_device; 42 43 43 44 /** ··· 289 288 #endif 290 289 291 290 #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE) 291 + enum devfreq_parent_dev_type { 292 + DEVFREQ_PARENT_DEV, 293 + CPUFREQ_PARENT_DEV, 294 + }; 295 + 292 296 /** 293 297 * struct devfreq_passive_data - ``void *data`` fed to struct devfreq 294 298 * and devfreq_add_device ··· 305 299 * using governors except for passive governor. 306 300 * If the devfreq device has the specific method to decide 307 301 * the next frequency, should use this callback. 308 - * @this: the devfreq instance of own device. 309 - * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER list 302 + * @parent_type: the parent type of the device. 303 + * @this: the devfreq instance of own device. 304 + * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER or 305 + * CPUFREQ_TRANSITION_NOTIFIER list. 306 + * @cpu_data_list: the list of cpu frequency data for all cpufreq_policy. 310 307 * 311 308 * The devfreq_passive_data have to set the devfreq instance of parent 312 309 * device with governors except for the passive governor. But, don't need to ··· 323 314 /* Optional callback to decide the next frequency of passvice device */ 324 315 int (*get_target_freq)(struct devfreq *this, unsigned long *freq); 325 316 317 + /* Should set the type of parent device */ 318 + enum devfreq_parent_dev_type parent_type; 319 + 326 320 /* For passive governor's internal use. Don't need to set them */ 327 321 struct devfreq *this; 328 322 struct notifier_block nb; 323 + struct list_head cpu_data_list; 329 324 }; 330 325 #endif 331 326
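With the new parent_type field, a passive devfreq device can track CPU frequency instead of another devfreq device. Illustrative registration (my_* names are hypothetical; my_profile is an ordinary devfreq_dev_profile defined elsewhere):

static struct devfreq_passive_data my_passive_data = {
        /* Follow cpufreq transitions rather than a parent devfreq. */
        .parent_type = CPUFREQ_PARENT_DEV,
};

static int my_probe(struct device *dev)
{
        struct devfreq *df;

        df = devm_devfreq_add_device(dev, &my_profile, DEVFREQ_GOV_PASSIVE,
                                     &my_passive_data);
        return PTR_ERR_OR_ZERO(df);
}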
+31 -4
include/linux/energy_model.h
··· 67 67 * 68 68 * EM_PERF_DOMAIN_SKIP_INEFFICIENCIES: Skip inefficient states when estimating 69 69 * energy consumption. 70 + * 71 + * EM_PERF_DOMAIN_ARTIFICIAL: The power values are artificial and might be 72 + * created by platform missing real power information 70 73 */ 71 74 #define EM_PERF_DOMAIN_MILLIWATTS BIT(0) 72 75 #define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES BIT(1) 76 + #define EM_PERF_DOMAIN_ARTIFICIAL BIT(2) 73 77 74 78 #define em_span_cpus(em) (to_cpumask((em)->cpus)) 79 + #define em_is_artificial(em) ((em)->flags & EM_PERF_DOMAIN_ARTIFICIAL) 75 80 76 81 #ifdef CONFIG_ENERGY_MODEL 77 82 #define EM_MAX_POWER 0xFFFF ··· 101 96 /** 102 97 * active_power() - Provide power at the next performance state of 103 98 * a device 99 + * @dev : Device for which we do this operation (can be a CPU) 104 100 * @power : Active power at the performance state 105 101 * (modified) 106 102 * @freq : Frequency at the performance state in kHz 107 103 * (modified) 108 - * @dev : Device for which we do this operation (can be a CPU) 109 104 * 110 105 * active_power() must find the lowest performance state of 'dev' above 111 106 * 'freq' and update 'power' and 'freq' to the matching active power ··· 117 112 * 118 113 * Return 0 on success. 119 114 */ 120 - int (*active_power)(unsigned long *power, unsigned long *freq, 121 - struct device *dev); 115 + int (*active_power)(struct device *dev, unsigned long *power, 116 + unsigned long *freq); 117 + 118 + /** 119 + * get_cost() - Provide the cost at the given performance state of 120 + * a device 121 + * @dev : Device for which we do this operation (can be a CPU) 122 + * @freq : Frequency at the performance state in kHz 123 + * @cost : The cost value for the performance state 124 + * (modified) 125 + * 126 + * In case of CPUs, the cost is the one of a single CPU in the domain. 127 + * It is expected to fit in the [0, EM_MAX_POWER] range due to internal 128 + * usage in EAS calculation. 129 + * 130 + * Return 0 on success, or appropriate error value in case of failure. 131 + */ 132 + int (*get_cost)(struct device *dev, unsigned long freq, 133 + unsigned long *cost); 122 134 }; 123 - #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb } 124 135 #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb) 136 + #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \ 137 + { .active_power = _active_power_cb, \ 138 + .get_cost = _cost_cb } 139 + #define EM_DATA_CB(_active_power_cb) \ 140 + EM_ADV_DATA_CB(_active_power_cb, NULL) 125 141 126 142 struct em_perf_domain *em_cpu_get(int cpu); 127 143 struct em_perf_domain *em_pd_get(struct device *dev); ··· 290 264 291 265 #else 292 266 struct em_data_callback {}; 267 + #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { } 293 268 #define EM_DATA_CB(_active_power_cb) { } 294 269 #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0) 295 270
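Taken together, the reordered active_power() arguments and the new get_cost() hook let a driver register an artificial EM. A sketch (my_* names hypothetical):

static int my_active_power(struct device *dev, unsigned long *mW,
                           unsigned long *kHz)
{
        /* Find the lowest OPP at or above *kHz; fill in *mW and *kHz. */
        return 0;
}

static int my_get_cost(struct device *dev, unsigned long freq,
                       unsigned long *cost)
{
        /* Abstract, platform-defined value in [0, EM_MAX_POWER]. */
        *cost = freq / 1000;    /* placeholder heuristic */
        return 0;
}

static struct em_data_callback my_em_cb =
        EM_ADV_DATA_CB(my_active_power, my_get_cost);

/* Registered via em_dev_register_perf_domain(dev, nr_opps, &my_em_cb,
 * cpus, false): milliwatts == false plus a get_cost() callback marks
 * the EM EM_PERF_DOMAIN_ARTIFICIAL, per the kernel/power/energy_model.c
 * hunk further down. */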
+9 -5
include/linux/pm.h
··· 368 368 369 369 #ifdef CONFIG_PM 370 370 #define _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ 371 - runtime_resume_fn, idle_fn, sec) \ 371 + runtime_resume_fn, idle_fn, sec, ns) \ 372 372 _DEFINE_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ 373 373 runtime_resume_fn, idle_fn); \ 374 - _EXPORT_SYMBOL(name, sec) 374 + __EXPORT_SYMBOL(name, sec, ns) 375 375 #else 376 376 #define _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ 377 - runtime_resume_fn, idle_fn, sec) \ 377 + runtime_resume_fn, idle_fn, sec, ns) \ 378 378 static __maybe_unused _DEFINE_DEV_PM_OPS(__static_##name, suspend_fn, \ 379 379 resume_fn, runtime_suspend_fn, \ 380 380 runtime_resume_fn, idle_fn) ··· 391 391 _DEFINE_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL) 392 392 393 393 #define EXPORT_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ 394 - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "") 394 + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "", "") 395 395 #define EXPORT_GPL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ 396 - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl") 396 + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl", "") 397 + #define EXPORT_NS_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn, ns) \ 398 + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "", #ns) 399 + #define EXPORT_NS_GPL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn, ns) \ 400 + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl", #ns) 397 401 398 402 /* Deprecated. Use DEFINE_SIMPLE_DEV_PM_OPS() instead. */ 399 403 #define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
+14 -10
include/linux/pm_domain.h
··· 91 91 int (*stop)(struct device *dev); 92 92 }; 93 93 94 + struct genpd_governor_data { 95 + s64 max_off_time_ns; 96 + bool max_off_time_changed; 97 + ktime_t next_wakeup; 98 + bool cached_power_down_ok; 99 + bool cached_power_down_state_idx; 100 + }; 101 + 94 102 struct genpd_power_state { 95 103 s64 power_off_latency_ns; 96 104 s64 power_on_latency_ns; ··· 106 98 u64 usage; 107 99 u64 rejected; 108 100 struct fwnode_handle *fwnode; 109 - ktime_t idle_time; 101 + u64 idle_time; 110 102 void *data; 111 103 }; 112 104 ··· 122 114 struct list_head child_links; /* Links with PM domain as a child */ 123 115 struct list_head dev_list; /* List of devices */ 124 116 struct dev_power_governor *gov; 117 + struct genpd_governor_data *gd; /* Data used by a genpd governor. */ 125 118 struct work_struct power_off_work; 126 119 struct fwnode_handle *provider; /* Identity of the domain provider */ 127 120 bool has_provider; ··· 143 134 int (*set_performance_state)(struct generic_pm_domain *genpd, 144 135 unsigned int state); 145 136 struct gpd_dev_ops dev_ops; 146 - s64 max_off_time_ns; /* Maximum allowed "suspended" time. */ 147 - ktime_t next_wakeup; /* Maintained by the domain governor */ 148 - bool max_off_time_changed; 149 - bool cached_power_down_ok; 150 - bool cached_power_down_state_idx; 151 137 int (*attach_dev)(struct generic_pm_domain *domain, 152 138 struct device *dev); 153 139 void (*detach_dev)(struct generic_pm_domain *domain, ··· 153 149 unsigned int state_count); 154 150 unsigned int state_count; /* number of states */ 155 151 unsigned int state_idx; /* state that genpd will go to when off */ 156 - ktime_t on_time; 157 - ktime_t accounting_time; 152 + u64 on_time; 153 + u64 accounting_time; 158 154 const struct genpd_lock_ops *lock_ops; 159 155 union { 160 156 struct mutex mlock; ··· 186 182 s64 suspend_latency_ns; 187 183 s64 resume_latency_ns; 188 184 s64 effective_constraint_ns; 185 + ktime_t next_wakeup; 189 186 bool constraint_changed; 190 187 bool cached_suspend_ok; 191 188 }; ··· 198 193 199 194 struct generic_pm_domain_data { 200 195 struct pm_domain_data base; 201 - struct gpd_timing_data td; 196 + struct gpd_timing_data *td; 202 197 struct notifier_block nb; 203 198 struct notifier_block *power_nb; 204 199 int cpu; 205 200 unsigned int performance_state; 206 201 unsigned int default_pstate; 207 202 unsigned int rpm_pstate; 208 - ktime_t next_wakeup; 209 203 void *data; 210 204 }; 211 205
+8 -2
include/linux/pm_runtime.h
··· 41 41 42 42 #define EXPORT_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \ 43 43 _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ 44 - suspend_fn, resume_fn, idle_fn, "") 44 + suspend_fn, resume_fn, idle_fn, "", "") 45 45 #define EXPORT_GPL_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \ 46 46 _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ 47 - suspend_fn, resume_fn, idle_fn, "_gpl") 47 + suspend_fn, resume_fn, idle_fn, "_gpl", "") 48 + #define EXPORT_NS_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn, ns) \ 49 + _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ 50 + suspend_fn, resume_fn, idle_fn, "", #ns) 51 + #define EXPORT_NS_GPL_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn, ns) \ 52 + _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ 53 + suspend_fn, resume_fn, idle_fn, "_gpl", #ns) 48 54 49 55 #ifdef CONFIG_PM 50 56 extern struct workqueue_struct *pm_wq;
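The runtime-PM flavour mirrors the sleep-ops one; a namespaced export looks like this (hypothetical names):

static int mydrv_rt_suspend(struct device *dev) { return 0; }
static int mydrv_rt_resume(struct device *dev) { return 0; }
/* System sleep is wired to pm_runtime_force_suspend()/resume() by the
 * macro; the last argument is the export namespace. */
EXPORT_NS_GPL_RUNTIME_DEV_PM_OPS(mydrv_rt_pm_ops, mydrv_rt_suspend,
                                 mydrv_rt_resume, NULL, MY_NS);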
+39 -5
include/linux/suspend.h
··· 542 542 #ifdef CONFIG_PM_SLEEP_DEBUG 543 543 extern bool pm_print_times_enabled; 544 544 extern bool pm_debug_messages_on; 545 - extern __printf(2, 3) void __pm_pr_dbg(bool defer, const char *fmt, ...); 545 + static inline int pm_dyn_debug_messages_on(void) 546 + { 547 + #ifdef CONFIG_DYNAMIC_DEBUG 548 + return 1; 549 + #else 550 + return 0; 551 + #endif 552 + } 553 + #ifndef pr_fmt 554 + #define pr_fmt(fmt) "PM: " fmt 555 + #endif 556 + #define __pm_pr_dbg(fmt, ...) \ 557 + do { \ 558 + if (pm_debug_messages_on) \ 559 + printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \ 560 + else if (pm_dyn_debug_messages_on()) \ 561 + pr_debug(fmt, ##__VA_ARGS__); \ 562 + } while (0) 563 + #define __pm_deferred_pr_dbg(fmt, ...) \ 564 + do { \ 565 + if (pm_debug_messages_on) \ 566 + printk_deferred(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \ 567 + } while (0) 546 568 #else 547 569 #define pm_print_times_enabled (false) 548 570 #define pm_debug_messages_on (false) 549 571 550 572 #include <linux/printk.h> 551 573 552 - #define __pm_pr_dbg(defer, fmt, ...) \ 553 - no_printk(KERN_DEBUG fmt, ##__VA_ARGS__) 574 + #define __pm_pr_dbg(fmt, ...) \ 575 + no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) 576 + #define __pm_deferred_pr_dbg(fmt, ...) \ 577 + no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) 554 578 #endif 555 579 580 + /** 581 + * pm_pr_dbg - print pm sleep debug messages 582 + * 583 + * If pm_debug_messages_on is enabled, print message. 584 + * If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is enabled, 585 + * print message only from instances explicitly enabled on dynamic debug's 586 + * control. 587 + * If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is disabled, 588 + * don't print message. 589 + */ 556 590 #define pm_pr_dbg(fmt, ...) \ 557 - __pm_pr_dbg(false, fmt, ##__VA_ARGS__) 591 + __pm_pr_dbg(fmt, ##__VA_ARGS__) 558 592 559 593 #define pm_deferred_pr_dbg(fmt, ...) \ 560 - __pm_pr_dbg(true, fmt, ##__VA_ARGS__) 594 + __pm_deferred_pr_dbg(fmt, ##__VA_ARGS__) 561 595 562 596 #ifdef CONFIG_PM_AUTOSLEEP 563 597
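Call sites are unchanged by this rework; what changes is when the message is emitted. Illustrative fragment (state_label is a placeholder, not a real kernel symbol):

/* Printed when the pm_debug_messages sysfs switch is on; otherwise,
 * with CONFIG_DYNAMIC_DEBUG, only if this callsite has been enabled
 * through the dynamic debug control file. With CONFIG_PM_SLEEP_DEBUG
 * unset the whole thing degenerates to no_printk(). */
pm_pr_dbg("suspend entry (%s)\n", state_label);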
+25
include/soc/rockchip/pm_domains.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * Copyright 2022, The Chromium OS Authors. All rights reserved. 4 + */ 5 + 6 + #ifndef __SOC_ROCKCHIP_PM_DOMAINS_H__ 7 + #define __SOC_ROCKCHIP_PM_DOMAINS_H__ 8 + 9 + #ifdef CONFIG_ROCKCHIP_PM_DOMAINS 10 + 11 + int rockchip_pmu_block(void); 12 + void rockchip_pmu_unblock(void); 13 + 14 + #else /* CONFIG_ROCKCHIP_PM_DOMAINS */ 15 + 16 + static inline int rockchip_pmu_block(void) 17 + { 18 + return 0; 19 + } 20 + 21 + static inline void rockchip_pmu_unblock(void) { } 22 + 23 + #endif /* CONFIG_ROCKCHIP_PM_DOMAINS */ 24 + 25 + #endif /* __SOC_ROCKCHIP_PM_DOMAINS_H__ */
+5 -1
kernel/power/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 3 - ccflags-$(CONFIG_PM_DEBUG) := -DDEBUG 3 + ifeq ($(CONFIG_DYNAMIC_DEBUG), y) 4 + CFLAGS_swap.o := -DDEBUG 5 + CFLAGS_snapshot.o := -DDEBUG 6 + CFLAGS_energy_model.o := -DDEBUG 7 + endif 4 8 5 9 KASAN_SANITIZE_snapshot.o := n 6 10
+36 -29
kernel/power/energy_model.c
··· 54 54 } 55 55 DEFINE_SHOW_ATTRIBUTE(em_debug_cpus); 56 56 57 - static int em_debug_units_show(struct seq_file *s, void *unused) 57 + static int em_debug_flags_show(struct seq_file *s, void *unused) 58 58 { 59 59 struct em_perf_domain *pd = s->private; 60 - char *units = (pd->flags & EM_PERF_DOMAIN_MILLIWATTS) ? 61 - "milliWatts" : "bogoWatts"; 62 60 63 - seq_printf(s, "%s\n", units); 61 + seq_printf(s, "%#lx\n", pd->flags); 64 62 65 63 return 0; 66 64 } 67 - DEFINE_SHOW_ATTRIBUTE(em_debug_units); 68 - 69 - static int em_debug_skip_inefficiencies_show(struct seq_file *s, void *unused) 70 - { 71 - struct em_perf_domain *pd = s->private; 72 - int enabled = (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES) ? 1 : 0; 73 - 74 - seq_printf(s, "%d\n", enabled); 75 - 76 - return 0; 77 - } 78 - DEFINE_SHOW_ATTRIBUTE(em_debug_skip_inefficiencies); 65 + DEFINE_SHOW_ATTRIBUTE(em_debug_flags); 79 66 80 67 static void em_debug_create_pd(struct device *dev) 81 68 { ··· 76 89 debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus, 77 90 &em_debug_cpus_fops); 78 91 79 - debugfs_create_file("units", 0444, d, dev->em_pd, &em_debug_units_fops); 80 - debugfs_create_file("skip-inefficiencies", 0444, d, dev->em_pd, 81 - &em_debug_skip_inefficiencies_fops); 92 + debugfs_create_file("flags", 0444, d, dev->em_pd, 93 + &em_debug_flags_fops); 82 94 83 95 /* Create a sub-directory for each performance state */ 84 96 for (i = 0; i < dev->em_pd->nr_perf_states; i++) ··· 107 121 #endif 108 122 109 123 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, 110 - int nr_states, struct em_data_callback *cb) 124 + int nr_states, struct em_data_callback *cb, 125 + unsigned long flags) 111 126 { 112 127 unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX; 113 128 struct em_perf_state *table; ··· 126 139 * lowest performance state of 'dev' above 'freq' and updates 127 140 * 'power' and 'freq' accordingly. 128 141 */ 129 - ret = cb->active_power(&power, &freq, dev); 142 + ret = cb->active_power(dev, &power, &freq); 130 143 if (ret) { 131 144 dev_err(dev, "EM: invalid perf. state: %d\n", 132 145 ret); ··· 160 173 /* Compute the cost of each performance state. 
*/ 161 174 fmax = (u64) table[nr_states - 1].frequency; 162 175 for (i = nr_states - 1; i >= 0; i--) { 163 - unsigned long power_res = em_scale_power(table[i].power); 176 + unsigned long power_res, cost; 164 177 165 - table[i].cost = div64_u64(fmax * power_res, 166 - table[i].frequency); 178 + if (flags & EM_PERF_DOMAIN_ARTIFICIAL) { 179 + ret = cb->get_cost(dev, table[i].frequency, &cost); 180 + if (ret || !cost || cost > EM_MAX_POWER) { 181 + dev_err(dev, "EM: invalid cost %lu %d\n", 182 + cost, ret); 183 + goto free_ps_table; 184 + } 185 + } else { 186 + power_res = em_scale_power(table[i].power); 187 + cost = div64_u64(fmax * power_res, table[i].frequency); 188 + } 189 + 190 + table[i].cost = cost; 191 + 167 192 if (table[i].cost >= prev_cost) { 168 193 table[i].flags = EM_PERF_STATE_INEFFICIENT; 169 194 dev_dbg(dev, "EM: OPP:%lu is inefficient\n", ··· 196 197 } 197 198 198 199 static int em_create_pd(struct device *dev, int nr_states, 199 - struct em_data_callback *cb, cpumask_t *cpus) 200 + struct em_data_callback *cb, cpumask_t *cpus, 201 + unsigned long flags) 200 202 { 201 203 struct em_perf_domain *pd; 202 204 struct device *cpu_dev; ··· 215 215 return -ENOMEM; 216 216 } 217 217 218 - ret = em_create_perf_table(dev, pd, nr_states, cb); 218 + ret = em_create_perf_table(dev, pd, nr_states, cb, flags); 219 219 if (ret) { 220 220 kfree(pd); 221 221 return ret; ··· 258 258 if (!cpufreq_table_set_inefficient(policy, table[i].frequency)) 259 259 found++; 260 260 } 261 + 262 + cpufreq_cpu_put(policy); 261 263 262 264 if (!found) 263 265 return; ··· 334 332 bool milliwatts) 335 333 { 336 334 unsigned long cap, prev_cap = 0; 335 + unsigned long flags = 0; 337 336 int cpu, ret; 338 337 339 338 if (!dev || !nr_states || !cb) ··· 381 378 } 382 379 } 383 380 384 - ret = em_create_pd(dev, nr_states, cb, cpus); 381 + if (milliwatts) 382 + flags |= EM_PERF_DOMAIN_MILLIWATTS; 383 + else if (cb->get_cost) 384 + flags |= EM_PERF_DOMAIN_ARTIFICIAL; 385 + 386 + ret = em_create_pd(dev, nr_states, cb, cpus, flags); 385 387 if (ret) 386 388 goto unlock; 387 389 388 - if (milliwatts) 389 - dev->em_pd->flags |= EM_PERF_DOMAIN_MILLIWATTS; 390 + dev->em_pd->flags |= flags; 390 391 391 392 em_cpufreq_update_efficiencies(dev); 392 393
-29
kernel/power/main.c
··· 545 545 } 546 546 __setup("pm_debug_messages", pm_debug_messages_setup); 547 547 548 - /** 549 - * __pm_pr_dbg - Print a suspend debug message to the kernel log. 550 - * @defer: Whether or not to use printk_deferred() to print the message. 551 - * @fmt: Message format. 552 - * 553 - * The message will be emitted if enabled through the pm_debug_messages 554 - * sysfs attribute. 555 - */ 556 - void __pm_pr_dbg(bool defer, const char *fmt, ...) 557 - { 558 - struct va_format vaf; 559 - va_list args; 560 - 561 - if (!pm_debug_messages_on) 562 - return; 563 - 564 - va_start(args, fmt); 565 - 566 - vaf.fmt = fmt; 567 - vaf.va = &args; 568 - 569 - if (defer) 570 - printk_deferred(KERN_DEBUG "PM: %pV", &vaf); 571 - else 572 - printk(KERN_DEBUG "PM: %pV", &vaf); 573 - 574 - va_end(args); 575 - } 576 - 577 548 #else /* !CONFIG_PM_SLEEP_DEBUG */ 578 549 static inline void pm_print_times_init(void) {} 579 550 #endif /* CONFIG_PM_SLEEP_DEBUG */
-3
kernel/power/process.c
··· 6 6 * Originally from swsusp. 7 7 */ 8 8 9 - 10 - #undef DEBUG 11 - 12 9 #include <linux/interrupt.h> 13 10 #include <linux/oom.h> 14 11 #include <linux/suspend.h>
+8 -4
kernel/power/snapshot.c
··· 326 326 return ret; 327 327 } 328 328 329 - /** 329 + /* 330 330 * Data types related to memory bitmaps. 331 331 * 332 332 * Memory bitmap is a structure consisting of many linked lists of ··· 427 427 428 428 /** 429 429 * alloc_rtree_node - Allocate a new node and add it to the radix tree. 430 + * @gfp_mask: GFP mask for the allocation. 431 + * @safe_needed: Get pages not used before hibernation (restore only) 432 + * @ca: Pointer to a linked list of pages ("a chain") to allocate from 433 + * @list: Radix Tree node to add. 430 434 * 431 435 * This function is used to allocate inner nodes as well as the 432 436 * leave nodes of the radix tree. It also adds the node to the ··· 906 902 } 907 903 908 904 /** 909 - * memory_bm_rtree_next_pfn - Find the next set bit in a memory bitmap. 905 + * memory_bm_next_pfn - Find the next set bit in a memory bitmap. 910 906 * @bm: Memory bitmap. 911 907 * 912 908 * Starting from the last returned position this function searches for the next ··· 1941 1937 } 1942 1938 1943 1939 /** 1944 - * alloc_highmem_image_pages - Allocate some highmem pages for the image. 1940 + * alloc_highmem_pages - Allocate some highmem pages for the image. 1945 1941 * 1946 1942 * Try to allocate as many pages as needed, but if the number of free highmem 1947 1943 * pages is less than that, allocate them all. ··· 2228 2224 } 2229 2225 2230 2226 /** 2231 - * load header - Check the image header and copy the data from it. 2227 + * load_header - Check the image header and copy the data from it. 2232 2228 */ 2233 2229 static int load_header(struct swsusp_info *info) 2234 2230 {
+1 -1
tools/power/x86/turbostat/Makefile
··· 9 9 endif 10 10 11 11 turbostat : turbostat.c 12 - override CFLAGS += -O2 -Wall -I../../../include 12 + override CFLAGS += -O2 -Wall -Wextra -I../../../include 13 13 override CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"' 14 14 override CFLAGS += -DINTEL_FAMILY_HEADER='"../../../../arch/x86/include/asm/intel-family.h"' 15 15 override CFLAGS += -D_FILE_OFFSET_BITS=64
+1 -1
tools/power/x86/turbostat/turbostat.8
··· 292 292 must be run as root. 293 293 Alternatively, non-root users can be enabled to run turbostat this way: 294 294 295 - # setcap cap_sys_rawio=ep ./turbostat 295 + # setcap cap_sys_admin,cap_sys_rawio,cap_sys_nice=+ep ./turbostat 296 296 297 297 # chmod +r /dev/cpu/*/msr 298 298
+386 -208
tools/power/x86/turbostat/turbostat.c
··· 3 3 * turbostat -- show CPU frequency and C-state residency 4 4 * on modern Intel and AMD processors. 5 5 * 6 - * Copyright (c) 2021 Intel Corporation. 6 + * Copyright (c) 2022 Intel Corporation. 7 7 * Len Brown <len.brown@intel.com> 8 8 */ 9 9 ··· 37 37 #include <asm/unistd.h> 38 38 #include <stdbool.h> 39 39 40 + #define UNUSED(x) (void)(x) 41 + 42 + /* 43 + * This list matches the column headers, except 44 + * 1. built-in only, the sysfs counters are not here -- we learn of those at run-time 45 + * 2. Core and CPU are moved to the end, we can't have strings that contain them 46 + * matching on them for --show and --hide. 47 + */ 48 + 49 + /* 50 + * buffer size used by sscanf() for added column names 51 + * Usually truncated to 7 characters, but also handles 18 columns for raw 64-bit counters 52 + */ 53 + #define NAME_BYTES 20 54 + #define PATH_BYTES 128 55 + 56 + enum counter_scope { SCOPE_CPU, SCOPE_CORE, SCOPE_PACKAGE }; 57 + enum counter_type { COUNTER_ITEMS, COUNTER_CYCLES, COUNTER_SECONDS, COUNTER_USEC }; 58 + enum counter_format { FORMAT_RAW, FORMAT_DELTA, FORMAT_PERCENT }; 59 + 60 + struct msr_counter { 61 + unsigned int msr_num; 62 + char name[NAME_BYTES]; 63 + char path[PATH_BYTES]; 64 + unsigned int width; 65 + enum counter_type type; 66 + enum counter_format format; 67 + struct msr_counter *next; 68 + unsigned int flags; 69 + #define FLAGS_HIDE (1 << 0) 70 + #define FLAGS_SHOW (1 << 1) 71 + #define SYSFS_PERCPU (1 << 1) 72 + }; 73 + 74 + struct msr_counter bic[] = { 75 + { 0x0, "usec", "", 0, 0, 0, NULL, 0 }, 76 + { 0x0, "Time_Of_Day_Seconds", "", 0, 0, 0, NULL, 0 }, 77 + { 0x0, "Package", "", 0, 0, 0, NULL, 0 }, 78 + { 0x0, "Node", "", 0, 0, 0, NULL, 0 }, 79 + { 0x0, "Avg_MHz", "", 0, 0, 0, NULL, 0 }, 80 + { 0x0, "Busy%", "", 0, 0, 0, NULL, 0 }, 81 + { 0x0, "Bzy_MHz", "", 0, 0, 0, NULL, 0 }, 82 + { 0x0, "TSC_MHz", "", 0, 0, 0, NULL, 0 }, 83 + { 0x0, "IRQ", "", 0, 0, 0, NULL, 0 }, 84 + { 0x0, "SMI", "", 32, 0, FORMAT_DELTA, NULL, 0 }, 85 + { 0x0, "sysfs", "", 0, 0, 0, NULL, 0 }, 86 + { 0x0, "CPU%c1", "", 0, 0, 0, NULL, 0 }, 87 + { 0x0, "CPU%c3", "", 0, 0, 0, NULL, 0 }, 88 + { 0x0, "CPU%c6", "", 0, 0, 0, NULL, 0 }, 89 + { 0x0, "CPU%c7", "", 0, 0, 0, NULL, 0 }, 90 + { 0x0, "ThreadC", "", 0, 0, 0, NULL, 0 }, 91 + { 0x0, "CoreTmp", "", 0, 0, 0, NULL, 0 }, 92 + { 0x0, "CoreCnt", "", 0, 0, 0, NULL, 0 }, 93 + { 0x0, "PkgTmp", "", 0, 0, 0, NULL, 0 }, 94 + { 0x0, "GFX%rc6", "", 0, 0, 0, NULL, 0 }, 95 + { 0x0, "GFXMHz", "", 0, 0, 0, NULL, 0 }, 96 + { 0x0, "Pkg%pc2", "", 0, 0, 0, NULL, 0 }, 97 + { 0x0, "Pkg%pc3", "", 0, 0, 0, NULL, 0 }, 98 + { 0x0, "Pkg%pc6", "", 0, 0, 0, NULL, 0 }, 99 + { 0x0, "Pkg%pc7", "", 0, 0, 0, NULL, 0 }, 100 + { 0x0, "Pkg%pc8", "", 0, 0, 0, NULL, 0 }, 101 + { 0x0, "Pkg%pc9", "", 0, 0, 0, NULL, 0 }, 102 + { 0x0, "Pk%pc10", "", 0, 0, 0, NULL, 0 }, 103 + { 0x0, "CPU%LPI", "", 0, 0, 0, NULL, 0 }, 104 + { 0x0, "SYS%LPI", "", 0, 0, 0, NULL, 0 }, 105 + { 0x0, "PkgWatt", "", 0, 0, 0, NULL, 0 }, 106 + { 0x0, "CorWatt", "", 0, 0, 0, NULL, 0 }, 107 + { 0x0, "GFXWatt", "", 0, 0, 0, NULL, 0 }, 108 + { 0x0, "PkgCnt", "", 0, 0, 0, NULL, 0 }, 109 + { 0x0, "RAMWatt", "", 0, 0, 0, NULL, 0 }, 110 + { 0x0, "PKG_%", "", 0, 0, 0, NULL, 0 }, 111 + { 0x0, "RAM_%", "", 0, 0, 0, NULL, 0 }, 112 + { 0x0, "Pkg_J", "", 0, 0, 0, NULL, 0 }, 113 + { 0x0, "Cor_J", "", 0, 0, 0, NULL, 0 }, 114 + { 0x0, "GFX_J", "", 0, 0, 0, NULL, 0 }, 115 + { 0x0, "RAM_J", "", 0, 0, 0, NULL, 0 }, 116 + { 0x0, "Mod%c6", "", 0, 0, 0, NULL, 0 }, 117 + { 0x0, "Totl%C0", "", 0, 0, 0, NULL, 0 }, 118 + { 0x0, 
"Any%C0", "", 0, 0, 0, NULL, 0 }, 119 + { 0x0, "GFX%C0", "", 0, 0, 0, NULL, 0 }, 120 + { 0x0, "CPUGFX%", "", 0, 0, 0, NULL, 0 }, 121 + { 0x0, "Core", "", 0, 0, 0, NULL, 0 }, 122 + { 0x0, "CPU", "", 0, 0, 0, NULL, 0 }, 123 + { 0x0, "APIC", "", 0, 0, 0, NULL, 0 }, 124 + { 0x0, "X2APIC", "", 0, 0, 0, NULL, 0 }, 125 + { 0x0, "Die", "", 0, 0, 0, NULL, 0 }, 126 + { 0x0, "GFXAMHz", "", 0, 0, 0, NULL, 0 }, 127 + { 0x0, "IPC", "", 0, 0, 0, NULL, 0 }, 128 + { 0x0, "CoreThr", "", 0, 0, 0, NULL, 0 }, 129 + }; 130 + 131 + #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter)) 132 + #define BIC_USEC (1ULL << 0) 133 + #define BIC_TOD (1ULL << 1) 134 + #define BIC_Package (1ULL << 2) 135 + #define BIC_Node (1ULL << 3) 136 + #define BIC_Avg_MHz (1ULL << 4) 137 + #define BIC_Busy (1ULL << 5) 138 + #define BIC_Bzy_MHz (1ULL << 6) 139 + #define BIC_TSC_MHz (1ULL << 7) 140 + #define BIC_IRQ (1ULL << 8) 141 + #define BIC_SMI (1ULL << 9) 142 + #define BIC_sysfs (1ULL << 10) 143 + #define BIC_CPU_c1 (1ULL << 11) 144 + #define BIC_CPU_c3 (1ULL << 12) 145 + #define BIC_CPU_c6 (1ULL << 13) 146 + #define BIC_CPU_c7 (1ULL << 14) 147 + #define BIC_ThreadC (1ULL << 15) 148 + #define BIC_CoreTmp (1ULL << 16) 149 + #define BIC_CoreCnt (1ULL << 17) 150 + #define BIC_PkgTmp (1ULL << 18) 151 + #define BIC_GFX_rc6 (1ULL << 19) 152 + #define BIC_GFXMHz (1ULL << 20) 153 + #define BIC_Pkgpc2 (1ULL << 21) 154 + #define BIC_Pkgpc3 (1ULL << 22) 155 + #define BIC_Pkgpc6 (1ULL << 23) 156 + #define BIC_Pkgpc7 (1ULL << 24) 157 + #define BIC_Pkgpc8 (1ULL << 25) 158 + #define BIC_Pkgpc9 (1ULL << 26) 159 + #define BIC_Pkgpc10 (1ULL << 27) 160 + #define BIC_CPU_LPI (1ULL << 28) 161 + #define BIC_SYS_LPI (1ULL << 29) 162 + #define BIC_PkgWatt (1ULL << 30) 163 + #define BIC_CorWatt (1ULL << 31) 164 + #define BIC_GFXWatt (1ULL << 32) 165 + #define BIC_PkgCnt (1ULL << 33) 166 + #define BIC_RAMWatt (1ULL << 34) 167 + #define BIC_PKG__ (1ULL << 35) 168 + #define BIC_RAM__ (1ULL << 36) 169 + #define BIC_Pkg_J (1ULL << 37) 170 + #define BIC_Cor_J (1ULL << 38) 171 + #define BIC_GFX_J (1ULL << 39) 172 + #define BIC_RAM_J (1ULL << 40) 173 + #define BIC_Mod_c6 (1ULL << 41) 174 + #define BIC_Totl_c0 (1ULL << 42) 175 + #define BIC_Any_c0 (1ULL << 43) 176 + #define BIC_GFX_c0 (1ULL << 44) 177 + #define BIC_CPUGFX (1ULL << 45) 178 + #define BIC_Core (1ULL << 46) 179 + #define BIC_CPU (1ULL << 47) 180 + #define BIC_APIC (1ULL << 48) 181 + #define BIC_X2APIC (1ULL << 49) 182 + #define BIC_Die (1ULL << 50) 183 + #define BIC_GFXACTMHz (1ULL << 51) 184 + #define BIC_IPC (1ULL << 52) 185 + #define BIC_CORE_THROT_CNT (1ULL << 53) 186 + 187 + #define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die ) 188 + #define BIC_THERMAL_PWR ( BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__) 189 + #define BIC_FREQUENCY ( BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz ) 190 + #define BIC_IDLE ( BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX) 191 + #define BIC_OTHER ( BIC_IRQ | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC) 192 + 193 + #define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC) 194 + 195 + unsigned long long bic_enabled = (0xFFFFFFFFFFFFFFFFULL & ~BIC_DISABLED_BY_DEFAULT); 196 + unsigned long 
long bic_present = BIC_USEC | BIC_TOD | BIC_sysfs | BIC_APIC | BIC_X2APIC; 197 + 198 + #define DO_BIC(COUNTER_NAME) (bic_enabled & bic_present & COUNTER_NAME) 199 + #define DO_BIC_READ(COUNTER_NAME) (bic_present & COUNTER_NAME) 200 + #define ENABLE_BIC(COUNTER_NAME) (bic_enabled |= COUNTER_NAME) 201 + #define BIC_PRESENT(COUNTER_BIT) (bic_present |= COUNTER_BIT) 202 + #define BIC_NOT_PRESENT(COUNTER_BIT) (bic_present &= ~COUNTER_BIT) 203 + #define BIC_IS_ENABLED(COUNTER_BIT) (bic_enabled & COUNTER_BIT) 204 + 40 205 char *proc_stat = "/proc/stat"; 41 206 FILE *outf; 42 207 int *fd_percpu; ··· 213 48 unsigned int model_orig; 214 49 215 50 unsigned int num_iterations; 51 + unsigned int header_iterations; 216 52 unsigned int debug; 217 53 unsigned int quiet; 218 54 unsigned int shown; ··· 325 159 326 160 #define MAX(a, b) ((a) > (b) ? (a) : (b)) 327 161 328 - /* 329 - * buffer size used by sscanf() for added column names 330 - * Usually truncated to 7 characters, but also handles 18 columns for raw 64-bit counters 331 - */ 332 - #define NAME_BYTES 20 333 - #define PATH_BYTES 128 334 - 335 162 int backwards_count; 336 163 char *progname; 337 164 ··· 364 205 unsigned int core_temp_c; 365 206 unsigned int core_energy; /* MSR_CORE_ENERGY_STAT */ 366 207 unsigned int core_id; 208 + unsigned long long core_throt_cnt; 367 209 unsigned long long counter[MAX_ADDED_COUNTERS]; 368 210 } *core_even, *core_odd; 369 211 ··· 414 254 (core_no)) 415 255 416 256 #define GET_PKG(pkg_base, pkg_no) (pkg_base + pkg_no) 417 - 418 - enum counter_scope { SCOPE_CPU, SCOPE_CORE, SCOPE_PACKAGE }; 419 - enum counter_type { COUNTER_ITEMS, COUNTER_CYCLES, COUNTER_SECONDS, COUNTER_USEC }; 420 - enum counter_format { FORMAT_RAW, FORMAT_DELTA, FORMAT_PERCENT }; 421 - 422 - struct msr_counter { 423 - unsigned int msr_num; 424 - char name[NAME_BYTES]; 425 - char path[PATH_BYTES]; 426 - unsigned int width; 427 - enum counter_type type; 428 - enum counter_format format; 429 - struct msr_counter *next; 430 - unsigned int flags; 431 - #define FLAGS_HIDE (1 << 0) 432 - #define FLAGS_SHOW (1 << 1) 433 - #define SYSFS_PERCPU (1 << 1) 434 - }; 435 257 436 258 /* 437 259 * The accumulated sum of MSR is defined as a monotonic ··· 664 522 665 523 /* counter for cpu_num, including user + kernel and all processes */ 666 524 fd = perf_event_open(&pea, -1, cpu_num, -1, 0); 667 - if (fd == -1) 668 - err(-1, "cpu%d: perf instruction counter\n", cpu_num); 525 + if (fd == -1) { 526 + warn("cpu%d: perf instruction counter", cpu_num); 527 + BIC_NOT_PRESENT(BIC_IPC); 528 + } 669 529 670 530 return fd; 671 531 } ··· 694 550 return 0; 695 551 } 696 552 697 - /* 698 - * This list matches the column headers, except 699 - * 1. built-in only, the sysfs counters are not here -- we learn of those at run-time 700 - * 2. Core and CPU are moved to the end, we can't have strings that contain them 701 - * matching on them for --show and --hide. 
702 - */ 703 - struct msr_counter bic[] = { 704 - { 0x0, "usec" }, 705 - { 0x0, "Time_Of_Day_Seconds" }, 706 - { 0x0, "Package" }, 707 - { 0x0, "Node" }, 708 - { 0x0, "Avg_MHz" }, 709 - { 0x0, "Busy%" }, 710 - { 0x0, "Bzy_MHz" }, 711 - { 0x0, "TSC_MHz" }, 712 - { 0x0, "IRQ" }, 713 - { 0x0, "SMI", "", 32, 0, FORMAT_DELTA, NULL }, 714 - { 0x0, "sysfs" }, 715 - { 0x0, "CPU%c1" }, 716 - { 0x0, "CPU%c3" }, 717 - { 0x0, "CPU%c6" }, 718 - { 0x0, "CPU%c7" }, 719 - { 0x0, "ThreadC" }, 720 - { 0x0, "CoreTmp" }, 721 - { 0x0, "CoreCnt" }, 722 - { 0x0, "PkgTmp" }, 723 - { 0x0, "GFX%rc6" }, 724 - { 0x0, "GFXMHz" }, 725 - { 0x0, "Pkg%pc2" }, 726 - { 0x0, "Pkg%pc3" }, 727 - { 0x0, "Pkg%pc6" }, 728 - { 0x0, "Pkg%pc7" }, 729 - { 0x0, "Pkg%pc8" }, 730 - { 0x0, "Pkg%pc9" }, 731 - { 0x0, "Pk%pc10" }, 732 - { 0x0, "CPU%LPI" }, 733 - { 0x0, "SYS%LPI" }, 734 - { 0x0, "PkgWatt" }, 735 - { 0x0, "CorWatt" }, 736 - { 0x0, "GFXWatt" }, 737 - { 0x0, "PkgCnt" }, 738 - { 0x0, "RAMWatt" }, 739 - { 0x0, "PKG_%" }, 740 - { 0x0, "RAM_%" }, 741 - { 0x0, "Pkg_J" }, 742 - { 0x0, "Cor_J" }, 743 - { 0x0, "GFX_J" }, 744 - { 0x0, "RAM_J" }, 745 - { 0x0, "Mod%c6" }, 746 - { 0x0, "Totl%C0" }, 747 - { 0x0, "Any%C0" }, 748 - { 0x0, "GFX%C0" }, 749 - { 0x0, "CPUGFX%" }, 750 - { 0x0, "Core" }, 751 - { 0x0, "CPU" }, 752 - { 0x0, "APIC" }, 753 - { 0x0, "X2APIC" }, 754 - { 0x0, "Die" }, 755 - { 0x0, "GFXAMHz" }, 756 - { 0x0, "IPC" }, 757 - }; 758 - 759 - #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter)) 760 - #define BIC_USEC (1ULL << 0) 761 - #define BIC_TOD (1ULL << 1) 762 - #define BIC_Package (1ULL << 2) 763 - #define BIC_Node (1ULL << 3) 764 - #define BIC_Avg_MHz (1ULL << 4) 765 - #define BIC_Busy (1ULL << 5) 766 - #define BIC_Bzy_MHz (1ULL << 6) 767 - #define BIC_TSC_MHz (1ULL << 7) 768 - #define BIC_IRQ (1ULL << 8) 769 - #define BIC_SMI (1ULL << 9) 770 - #define BIC_sysfs (1ULL << 10) 771 - #define BIC_CPU_c1 (1ULL << 11) 772 - #define BIC_CPU_c3 (1ULL << 12) 773 - #define BIC_CPU_c6 (1ULL << 13) 774 - #define BIC_CPU_c7 (1ULL << 14) 775 - #define BIC_ThreadC (1ULL << 15) 776 - #define BIC_CoreTmp (1ULL << 16) 777 - #define BIC_CoreCnt (1ULL << 17) 778 - #define BIC_PkgTmp (1ULL << 18) 779 - #define BIC_GFX_rc6 (1ULL << 19) 780 - #define BIC_GFXMHz (1ULL << 20) 781 - #define BIC_Pkgpc2 (1ULL << 21) 782 - #define BIC_Pkgpc3 (1ULL << 22) 783 - #define BIC_Pkgpc6 (1ULL << 23) 784 - #define BIC_Pkgpc7 (1ULL << 24) 785 - #define BIC_Pkgpc8 (1ULL << 25) 786 - #define BIC_Pkgpc9 (1ULL << 26) 787 - #define BIC_Pkgpc10 (1ULL << 27) 788 - #define BIC_CPU_LPI (1ULL << 28) 789 - #define BIC_SYS_LPI (1ULL << 29) 790 - #define BIC_PkgWatt (1ULL << 30) 791 - #define BIC_CorWatt (1ULL << 31) 792 - #define BIC_GFXWatt (1ULL << 32) 793 - #define BIC_PkgCnt (1ULL << 33) 794 - #define BIC_RAMWatt (1ULL << 34) 795 - #define BIC_PKG__ (1ULL << 35) 796 - #define BIC_RAM__ (1ULL << 36) 797 - #define BIC_Pkg_J (1ULL << 37) 798 - #define BIC_Cor_J (1ULL << 38) 799 - #define BIC_GFX_J (1ULL << 39) 800 - #define BIC_RAM_J (1ULL << 40) 801 - #define BIC_Mod_c6 (1ULL << 41) 802 - #define BIC_Totl_c0 (1ULL << 42) 803 - #define BIC_Any_c0 (1ULL << 43) 804 - #define BIC_GFX_c0 (1ULL << 44) 805 - #define BIC_CPUGFX (1ULL << 45) 806 - #define BIC_Core (1ULL << 46) 807 - #define BIC_CPU (1ULL << 47) 808 - #define BIC_APIC (1ULL << 48) 809 - #define BIC_X2APIC (1ULL << 49) 810 - #define BIC_Die (1ULL << 50) 811 - #define BIC_GFXACTMHz (1ULL << 51) 812 - #define BIC_IPC (1ULL << 52) 813 - 814 - #define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | 
-#define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die )
-#define BIC_THERMAL_PWR ( BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__)
-#define BIC_FREQUENCY ( BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz )
-#define BIC_IDLE ( BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX)
-#define BIC_OTHER ( BIC_IRQ | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC)
-
-#define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC)
-
-unsigned long long bic_enabled = (0xFFFFFFFFFFFFFFFFULL & ~BIC_DISABLED_BY_DEFAULT);
-unsigned long long bic_present = BIC_USEC | BIC_TOD | BIC_sysfs | BIC_APIC | BIC_X2APIC;
-
-#define DO_BIC(COUNTER_NAME) (bic_enabled & bic_present & COUNTER_NAME)
-#define DO_BIC_READ(COUNTER_NAME) (bic_present & COUNTER_NAME)
-#define ENABLE_BIC(COUNTER_NAME) (bic_enabled |= COUNTER_NAME)
-#define BIC_PRESENT(COUNTER_BIT) (bic_present |= COUNTER_BIT)
-#define BIC_NOT_PRESENT(COUNTER_BIT) (bic_present &= ~COUNTER_BIT)
-#define BIC_IS_ENABLED(COUNTER_BIT) (bic_enabled & COUNTER_BIT)
-
#define MAX_DEFERRED 16
+char *deferred_add_names[MAX_DEFERRED];
char *deferred_skip_names[MAX_DEFERRED];
+int deferred_add_index;
int deferred_skip_index;

/*
···
"	-l, --list	list column headers only\n"
"	-n, --num_iterations num\n"
"		number of the measurement iterations\n"
+"	-N, --header_iterations num\n"
+"		print header every num iterations\n"
"	-o, --out file\n"
"		create or truncate \"file\" for all output\n"
"	-q, --quiet	skip decoding system configuration header\n"
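The bic_enabled/bic_present pair deleted here is the same one re-added near the top of the file: one mask records which columns the user asked for, the other which columns the platform actually provides, and DO_BIC() emits a column only when both bits are set. A self-contained toy with made-up column bits shows the gating; none of this is turbostat code:

#include <stdio.h>

#define BIC_IRQ (1ULL << 8)
#define BIC_SMI (1ULL << 9)

unsigned long long bic_enabled = ~0ULL;		/* user wants everything */
unsigned long long bic_present = BIC_IRQ;	/* SMI counter not detected */

#define DO_BIC(BIT) (bic_enabled & bic_present & (BIT))

int main(void)
{
	printf("IRQ column: %s\n", DO_BIC(BIC_IRQ) ? "shown" : "hidden");
	printf("SMI column: %s\n", DO_BIC(BIC_SMI) ? "shown" : "hidden");
	return 0;
}
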
···
 */
unsigned long long bic_lookup(char *name_list, enum show_hide_mode mode)
{
-	int i;
+	unsigned int i;
	unsigned long long retval = 0;

	while (name_list) {
···
		if (comma)
			*comma = '\0';

-		if (!strcmp(name_list, "all"))
-			return ~0;
-		if (!strcmp(name_list, "topology"))
-			return BIC_TOPOLOGY;
-		if (!strcmp(name_list, "power"))
-			return BIC_THERMAL_PWR;
-		if (!strcmp(name_list, "idle"))
-			return BIC_IDLE;
-		if (!strcmp(name_list, "frequency"))
-			return BIC_FREQUENCY;
-		if (!strcmp(name_list, "other"))
-			return BIC_OTHER;
-		if (!strcmp(name_list, "all"))
-			return 0;
-
		for (i = 0; i < MAX_BIC; ++i) {
			if (!strcmp(name_list, bic[i].name)) {
				retval |= (1ULL << i);
				break;
			}
+			if (!strcmp(name_list, "all")) {
+				retval |= ~0;
+				break;
+			} else if (!strcmp(name_list, "topology")) {
+				retval |= BIC_TOPOLOGY;
+				break;
+			} else if (!strcmp(name_list, "power")) {
+				retval |= BIC_THERMAL_PWR;
+				break;
+			} else if (!strcmp(name_list, "idle")) {
+				retval |= BIC_IDLE;
+				break;
+			} else if (!strcmp(name_list, "frequency")) {
+				retval |= BIC_FREQUENCY;
+				break;
+			} else if (!strcmp(name_list, "other")) {
+				retval |= BIC_OTHER;
+				break;
+			}
+
		}
		if (i == MAX_BIC) {
			if (mode == SHOW_LIST) {
-				fprintf(stderr, "Invalid counter name: %s\n", name_list);
-				exit(-1);
-			}
-
-			deferred_skip_names[deferred_skip_index++] = name_list;
-			if (debug)
-				fprintf(stderr, "deferred \"%s\"\n", name_list);
-			if (deferred_skip_index >= MAX_DEFERRED) {
-				fprintf(stderr, "More than max %d un-recognized --skip options '%s'\n",
-					MAX_DEFERRED, name_list);
-				help();
-				exit(1);
+				deferred_add_names[deferred_add_index++] = name_list;
+				if (deferred_add_index >= MAX_DEFERRED) {
+					fprintf(stderr, "More than max %d un-recognized --add options '%s'\n",
+						MAX_DEFERRED, name_list);
+					help();
+					exit(1);
+				}
+			} else {
+				deferred_skip_names[deferred_skip_index++] = name_list;
+				if (debug)
+					fprintf(stderr, "deferred \"%s\"\n", name_list);
+				if (deferred_skip_index >= MAX_DEFERRED) {
+					fprintf(stderr, "More than max %d un-recognized --skip options '%s'\n",
+						MAX_DEFERRED, name_list);
+					help();
+					exit(1);
+				}
			}
		}
···

	if (DO_BIC(BIC_CoreTmp))
		outp += sprintf(outp, "%sCoreTmp", (printed++ ? delim : ""));
+
+	if (DO_BIC(BIC_CORE_THROT_CNT))
+		outp += sprintf(outp, "%sCoreThr", (printed++ ? delim : ""));

	if (do_rapl && !rapl_joules) {
		if (DO_BIC(BIC_CorWatt) && (do_rapl & RAPL_PER_CORE_ENERGY))
···
	outp += sprintf(outp, "c6: %016llX\n", c->c6);
	outp += sprintf(outp, "c7: %016llX\n", c->c7);
	outp += sprintf(outp, "DTS: %dC\n", c->core_temp_c);
+	outp += sprintf(outp, "cpu_throt_count: %016llX\n", c->core_throt_cnt);
	outp += sprintf(outp, "Joules: %0X\n", c->core_energy);

	for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
···
	if (DO_BIC(BIC_CoreTmp))
		outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), c->core_temp_c);

+	/* Core throttle count */
+	if (DO_BIC(BIC_CORE_THROT_CNT))
+		outp += sprintf(outp, "%s%lld", (printed++ ? delim : ""), c->core_throt_cnt);
+
	for (i = 0, mp = sys.cp; mp; i++, mp = mp->next) {
		if (mp->format == FORMAT_RAW) {
			if (mp->width == 32)
···
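bic_lookup() above now folds the "all"/"topology"/"power"/"idle"/"frequency"/"other" group aliases into the per-name loop, and instead of rejecting unknown names it parks them on deferred_add_names or deferred_skip_names for a later match against run-time sysfs counters. A compressed sketch of that parse-and-defer shape — a hypothetical helper, not the real bic_lookup():

#include <stdio.h>
#include <string.h>

#define MAX_DEFER 16

static const char *columns[] = { "usec", "Busy%", "IRQ" };	/* stand-in for bic[] */
static char *deferred[MAX_DEFER];
static int n_deferred;

static unsigned long long lookup(char *list)
{
	unsigned long long mask = 0;
	char *tok;

	for (tok = strtok(list, ","); tok; tok = strtok(NULL, ",")) {
		unsigned int i, n = sizeof(columns) / sizeof(columns[0]);

		for (i = 0; i < n; i++)
			if (!strcmp(tok, columns[i])) {
				mask |= 1ULL << i;
				break;
			}
		if (i == n && n_deferred < MAX_DEFER)
			deferred[n_deferred++] = tok;	/* resolve against sysfs later */
	}
	return mask;
}

int main(void)
{
	char list[] = "Busy%,CPU%c1";	/* second name is unknown to this toy */

	printf("mask 0x%llx, deferred %d name(s)\n", lookup(list), n_deferred);
	return 0;
}
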
	if (DO_BIC(BIC_PkgWatt))
		outp +=
		    sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_pkg * rapl_energy_units / interval_float);
+
	if (DO_BIC(BIC_CorWatt) && !(do_rapl & RAPL_PER_CORE_ENERGY))
		outp +=
		    sprintf(outp, fmt8, (printed++ ? delim : ""), p->energy_cores * rapl_energy_units / interval_float);
···

void format_all_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
{
-	static int printed;
+	static int count;

-	if (!printed || !summary_only)
+	if ((!count || (header_iterations && !(count % header_iterations))) || !summary_only)
		print_header("\t");

	format_counters(&average.threads, &average.cores, &average.packages);

-	printed = 1;
+	count++;

	if (summary_only)
		return;
···
	old->c6 = new->c6 - old->c6;
	old->c7 = new->c7 - old->c7;
	old->core_temp_c = new->core_temp_c;
+	old->core_throt_cnt = new->core_throt_cnt;
	old->mc6_us = new->mc6_us - old->mc6_us;

	DELTA_WRAP32(new->core_energy, old->core_energy);
···
	c->mc6_us = 0;
	c->core_temp_c = 0;
	c->core_energy = 0;
+	c->core_throt_cnt = 0;

	p->pkg_wtd_core_c0 = 0;
	p->pkg_any_core_c0 = 0;
···
	average.cores.mc6_us += c->mc6_us;

	average.cores.core_temp_c = MAX(average.cores.core_temp_c, c->core_temp_c);
+	average.cores.core_throt_cnt = MAX(average.cores.core_throt_cnt, c->core_throt_cnt);

	average.cores.core_energy += c->core_energy;
···
		fprintf(outf, "cpu%d: BIOS BUG: apic 0x%x x2apic 0x%x\n", t->cpu_id, t->apic_id, t->x2apic_id);
}

+int get_core_throt_cnt(int cpu, unsigned long long *cnt)
+{
+	char path[128 + PATH_BYTES];
+	unsigned long long tmp;
+	FILE *fp;
+	int ret;
+
+	sprintf(path, "/sys/devices/system/cpu/cpu%d/thermal_throttle/core_throttle_count", cpu);
+	fp = fopen(path, "r");
+	if (!fp)
+		return -1;
+	ret = fscanf(fp, "%lld", &tmp);
+	if (ret != 1)
+		return -1;
+	fclose(fp);
+	*cnt = tmp;
+
+	return 0;
+}
+
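One detail worth flagging in get_core_throt_cnt() above: on the fscanf() failure path the function returns before fclose(), so the stream leaks each time the sysfs read fails. A variant that closes the file on every path — a sketch, not the patch's code — could look like:

#include <stdio.h>

int read_throttle_count(int cpu, unsigned long long *cnt)
{
	char path[128];
	FILE *fp;
	int ok;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/thermal_throttle/core_throttle_count", cpu);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	ok = (fscanf(fp, "%llu", cnt) == 1);
	fclose(fp);	/* closed whether or not the parse succeeded */
	return ok ? 0 : -1;
}
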
/*
 * get_counters(...)
 * migrate to cpu
···
			return -9;
		c->core_temp_c = tj_max - ((msr >> 16) & 0x7F);
	}
+
+	if (DO_BIC(BIC_CORE_THROT_CNT))
+		get_core_throt_cnt(cpu, &c->core_throt_cnt);

	if (do_rapl & RAPL_AMD_F17H) {
		if (get_msr(cpu, MSR_CORE_ENERGY_STAT, &msr))
···
	if (!genuine_intel)
		return 0;

+	if (family != 6)
+		return 0;
+
	switch (model) {
	case INTEL_FAM6_ATOM_GOLDMONT:
	case INTEL_FAM6_SKYLAKE_X:
···
	case INTEL_FAM6_ATOM_GOLDMONT_D:
	case INTEL_FAM6_ATOM_TREMONT_D:
		return 1;
+	default:
+		return 0;
	}
-	return 0;
}

static void dump_turbo_ratio_limits(int family, int model)
···
 */
int count_cpus(int cpu)
{
+	UNUSED(cpu);
+
	topo.num_cpus++;
	return 0;
}
···
	int i, ret;
	int cpu = t->cpu_id;

+	UNUSED(c);
+	UNUSED(p);
+
	for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) {
		unsigned long long msr_cur, msr_last;
		off_t offset;
···

static void msr_record_handler(union sigval v)
{
+	UNUSED(v);
+
	for_all_cpus(update_msr_sum, EVEN_COUNTERS);
}
···
/*
 * set_my_sched_priority(pri)
 * return previous
+ *
+ * if non-root, do this:
+ * # /sbin/setcap cap_sys_rawio,cap_sys_nice=+ep /usr/bin/turbostat
 */
int set_my_sched_priority(int priority)
{
···
	errno = 0;
	retval = getpriority(PRIO_PROCESS, 0);
	if (retval != priority)
-		err(-1, "getpriority(%d) != setpriority(%d)", retval, priority);
+		err(retval, "getpriority(%d) != setpriority(%d)", retval, priority);

	return original_priority;
}
···
{
	int retval;
	int restarted = 0;
-	int done_iters = 0;
+	unsigned int done_iters = 0;

	setup_signal_handler();
···
		break;
	case INTEL_FAM6_ATOM_SILVERMONT:	/* BYT */
		no_MSR_MISC_PWR_MGMT = 1;
+		/* FALLTHRU */
	case INTEL_FAM6_ATOM_SILVERMONT_D:	/* AVN */
		pkg_cstate_limits = slv_pkg_cstate_limits;
		break;
···
	if (!genuine_intel)
		return 0;

+	if (family != 6)
+		return 0;
+
	switch (model) {
	case INTEL_FAM6_ATOM_SILVERMONT:
	case INTEL_FAM6_ATOM_SILVERMONT_MID:
···
	if (!genuine_intel)
		return 0;

+	if (family != 6)
+		return 0;
+
	switch (model) {
	case INTEL_FAM6_ATOM_GOLDMONT_D:
		return 1;
···
{

	if (!genuine_intel)
+		return 0;
+
+	if (family != 6)
		return 0;

	switch (model) {
···
	if (!genuine_intel)
		return 0;

+	if (family != 6)
+		return 0;
+
	switch (model) {
	case INTEL_FAM6_SKYLAKE_X:
		return 1;
···
{

	if (!genuine_intel)
+		return 0;
+
+	if (family != 6)
		return 0;

	switch (model) {
···
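Background for the many "+ if (family != 6)" hunks in this region: the INTEL_FAM6_* model numbers that these switches test are only meaningful within CPUID DisplayFamily 6, so any other family must bail out before the model switch. A standalone sketch (not turbostat code) of deriving the family/model pair per the SDM composition rules:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx, family, model;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 1;

	family = (eax >> 8) & 0xf;
	model = (eax >> 4) & 0xf;
	if (family == 0xf)
		family += (eax >> 20) & 0xff;		/* extended family */
	if (family == 6 || family == 0xf)
		model += ((eax >> 16) & 0xf) << 4;	/* extended model */

	printf("family 0x%x model 0x%x\n", family, model);
	return 0;	/* model switches only make sense when family == 6 */
}
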
	if (!genuine_intel)
		return 0;

+	if (family != 6)
+		return 0;
+
	switch (model) {
	case INTEL_FAM6_ATOM_TREMONT:
		return 1;
···
	if (!genuine_intel)
		return 0;

+	if (family != 6)
+		return 0;
+
	switch (model) {
	case INTEL_FAM6_ATOM_TREMONT_D:
		return 1;
···
int has_turbo_ratio_limit(unsigned int family, unsigned int model)
{
	if (has_slv_msrs(family, model))
+		return 0;
+
+	if (family != 6)
		return 0;

	switch (model) {
···
	char *epb_string;
	int cpu, epb;

+	UNUSED(c);
+	UNUSED(p);
+
	if (!has_epb)
		return 0;
···
{
	unsigned long long msr;
	int cpu;
+
+	UNUSED(c);
+	UNUSED(p);

	if (!has_hwp)
		return 0;
···
{
	unsigned long long msr;
	int cpu;
+
+	UNUSED(c);
+	UNUSED(p);

	cpu = t->cpu_id;
···

double get_tdp_amd(unsigned int family)
{
+	UNUSED(family);
+
	/* This is the max stock TDP of HEDT/Server Fam17h+ chips */
	return 280.0;
}
···
	case INTEL_FAM6_BROADWELL_X:	/* BDX */
	case INTEL_FAM6_SKYLAKE_X:	/* SKX */
	case INTEL_FAM6_XEON_PHI_KNL:	/* KNL */
+	case INTEL_FAM6_ICELAKE_X:	/* ICX */
		return (rapl_dram_energy_units = 15.3 / 1000000);
	default:
		return (rapl_energy_units);
···
	unsigned int has_rapl = 0;
	double tdp;

+	UNUSED(model);
+
	if (max_extended_level >= 0x80000007) {
		__cpuid(0x80000007, eax, ebx, ecx, edx);
		/* RAPL (Fam 17h+) */
···
	case INTEL_FAM6_HASWELL_L:	/* HSW */
	case INTEL_FAM6_HASWELL_G:	/* HSW */
		do_gfx_perf_limit_reasons = 1;
+		/* FALLTHRU */
	case INTEL_FAM6_HASWELL_X:	/* HSX */
		do_core_perf_limit_reasons = 1;
		do_ring_perf_limit_reasons = 1;
···
	unsigned long long msr;
	unsigned int dts, dts2;
	int cpu;
+
+	UNUSED(c);
+	UNUSED(p);

	if (!(do_dts || do_ptm))
		return 0;
···
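The RAPL printing that follows decodes bit fields straight out of the limit MSRs: a 15-bit power field scaled by rapl_power_units, an enable bit, and a (1 + Y/4) * 2^Z time window scaled by rapl_time_units. A worked example with made-up unit values and a register value constructed for the demonstration (nothing here is read from real hardware):

#include <stdio.h>

int main(void)
{
	double power_units = 1.0 / 8;		/* watts, as if from MSR_RAPL_POWER_UNIT */
	double time_units = 1.0 / 1024;		/* seconds, same caveat */

	/* construct PL1 fields: enabled, 320 * 1/8 = 40 W, window (1+0/4)*2^10*1/1024 = 1 s */
	unsigned long long msr = (1ULL << 15) | 320 | (10ULL << 17) | (0ULL << 22);

	double watts = ((msr >> 0) & 0x7FFF) * power_units;
	int enabled = (msr >> 15) & 1;
	double window = (1.0 + (((msr >> 22) & 0x3) / 4.0)) *
			(1 << ((msr >> 17) & 0x1F)) * time_units;

	printf("PL1: %sabled, %0.3f Watts, %f sec\n",
	       enabled ? "EN" : "DIS", watts, window);
	return 0;
}
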
"EN" : "DIS"); 4771 + 4772 + if (get_msr(cpu, MSR_VR_CURRENT_CONFIG, &msr)) 4773 + return -9; 4774 + 4775 + fprintf(outf, "cpu%d: MSR_VR_CURRENT_CONFIG: 0x%08llx\n", cpu, msr); 4776 + fprintf(outf, "cpu%d: PKG Limit #4: %f Watts (%slocked)\n", 4777 + cpu, ((msr >> 0) & 0x1FFF) * rapl_power_units, (msr >> 31) & 1 ? "" : "UN"); 4890 4778 } 4891 4779 4892 4780 if (do_rapl & RAPL_DRAM_POWER_INFO) { ··· 4956 4830 if (!genuine_intel) 4957 4831 return 0; 4958 4832 4833 + if (family != 6) 4834 + return 0; 4835 + 4959 4836 switch (model) { 4960 4837 case INTEL_FAM6_SANDYBRIDGE: 4961 4838 case INTEL_FAM6_SANDYBRIDGE_X: ··· 5002 4873 if (!genuine_intel) 5003 4874 return 0; 5004 4875 4876 + if (family != 6) 4877 + return 0; 4878 + 5005 4879 switch (model) { 5006 4880 case INTEL_FAM6_HASWELL_L: /* HSW */ 5007 4881 case INTEL_FAM6_BROADWELL: /* BDW */ ··· 5031 4899 if (!genuine_intel) 5032 4900 return 0; 5033 4901 4902 + if (family != 6) 4903 + return 0; 4904 + 5034 4905 switch (model) { 5035 4906 case INTEL_FAM6_SKYLAKE_L: /* SKL */ 5036 4907 case INTEL_FAM6_CANNONLAKE_L: /* CNL */ ··· 5046 4911 { 5047 4912 if (!genuine_intel) 5048 4913 return 0; 4914 + 4915 + if (family != 6) 4916 + return 0; 4917 + 5049 4918 switch (model) { 5050 4919 case INTEL_FAM6_ATOM_SILVERMONT: /* BYT */ 5051 4920 case INTEL_FAM6_ATOM_SILVERMONT_D: /* AVN */ ··· 5062 4923 { 5063 4924 if (!genuine_intel) 5064 4925 return 0; 4926 + 4927 + if (family != 6) 4928 + return 0; 4929 + 5065 4930 switch (model) { 5066 4931 case INTEL_FAM6_XEON_PHI_KNL: /* KNL */ 5067 4932 return 1; ··· 5076 4933 int is_cnl(unsigned int family, unsigned int model) 5077 4934 { 5078 4935 if (!genuine_intel) 4936 + return 0; 4937 + 4938 + if (family != 6) 5079 4939 return 0; 5080 4940 5081 4941 switch (model) { ··· 5135 4989 { 5136 4990 unsigned int eax, ebx, ecx, edx; 5137 4991 4992 + UNUSED(c); 4993 + UNUSED(p); 4994 + 5138 4995 if (!genuine_intel) 5139 4996 return 0; 5140 4997 ··· 5173 5024 unsigned long long msr; 5174 5025 unsigned int tcc_default, tcc_offset; 5175 5026 int cpu; 5027 + 5028 + UNUSED(c); 5029 + UNUSED(p); 5176 5030 5177 5031 /* tj_max is used only for dts or ptm */ 5178 5032 if (!(do_dts || do_ptm)) ··· 5724 5572 else 5725 5573 BIC_NOT_PRESENT(BIC_CPU_LPI); 5726 5574 5575 + if (!access("/sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count", R_OK)) 5576 + BIC_PRESENT(BIC_CORE_THROT_CNT); 5577 + else 5578 + BIC_NOT_PRESENT(BIC_CORE_THROT_CNT); 5579 + 5727 5580 if (!access(sys_lpi_file_sysfs, R_OK)) { 5728 5581 sys_lpi_file = sys_lpi_file_sysfs; 5729 5582 BIC_PRESENT(BIC_SYS_LPI); ··· 5756 5599 return 1; 5757 5600 else 5758 5601 return 0; 5759 - } 5760 - 5761 - int open_dev_cpu_msr(int dummy1) 5762 - { 5763 - return 0; 5764 5602 } 5765 5603 5766 5604 void topology_probe() ··· 6048 5896 6049 5897 if (!quiet && do_irtl_snb) 6050 5898 print_irtl(); 5899 + 5900 + if (DO_BIC(BIC_IPC)) 5901 + (void)get_instr_count_fd(base_cpu); 6051 5902 } 6052 5903 6053 5904 int fork_it(char **argv) ··· 6128 5973 6129 5974 void print_version() 6130 5975 { 6131 - fprintf(outf, "turbostat version 21.05.04" " - Len Brown <lenb@kernel.org>\n"); 5976 + fprintf(outf, "turbostat version 2022.04.16 - Len Brown <lenb@kernel.org>\n"); 6132 5977 } 6133 5978 6134 5979 int add_counter(unsigned int msr_num, char *path, char *name, ··· 6293 6138 } 6294 6139 } 6295 6140 6141 + int is_deferred_add(char *name) 6142 + { 6143 + int i; 6144 + 6145 + for (i = 0; i < deferred_add_index; ++i) 6146 + if (!strcmp(name, deferred_add_names[i])) 6147 + return 1; 6148 + return 0; 
6149 + } 6150 + 6296 6151 int is_deferred_skip(char *name) 6297 6152 { 6298 6153 int i; ··· 6320 6155 FILE *input; 6321 6156 int state; 6322 6157 char *sp; 6323 - 6324 - if (!DO_BIC(BIC_sysfs)) 6325 - return; 6326 6158 6327 6159 for (state = 10; state >= 0; --state) { 6328 6160 ··· 6342 6180 fclose(input); 6343 6181 6344 6182 sprintf(path, "cpuidle/state%d/time", state); 6183 + 6184 + if (!DO_BIC(BIC_sysfs) && !is_deferred_add(name_buf)) 6185 + continue; 6345 6186 6346 6187 if (is_deferred_skip(name_buf)) 6347 6188 continue; ··· 6370 6205 remove_underbar(name_buf); 6371 6206 6372 6207 sprintf(path, "cpuidle/state%d/usage", state); 6208 + 6209 + if (!DO_BIC(BIC_sysfs) && !is_deferred_add(name_buf)) 6210 + continue; 6373 6211 6374 6212 if (is_deferred_skip(name_buf)) 6375 6213 continue; ··· 6481 6313 { "interval", required_argument, 0, 'i' }, 6482 6314 { "IPC", no_argument, 0, 'I' }, 6483 6315 { "num_iterations", required_argument, 0, 'n' }, 6316 + { "header_iterations", required_argument, 0, 'N' }, 6484 6317 { "help", no_argument, 0, 'h' }, 6485 6318 { "hide", required_argument, 0, 'H' }, // meh, -h taken by --help 6486 6319 { "Joules", no_argument, 0, 'J' }, ··· 6563 6394 exit(2); 6564 6395 } 6565 6396 break; 6397 + case 'N': 6398 + header_iterations = strtod(optarg, NULL); 6399 + 6400 + if (header_iterations <= 0) { 6401 + fprintf(outf, "iterations %d should be positive number\n", header_iterations); 6402 + exit(2); 6403 + } 6404 + break; 6566 6405 case 's': 6567 6406 /* 6568 6407 * --show: show only those specified ··· 6609 6432 6610 6433 turbostat_init(); 6611 6434 6435 + msr_sum_record(); 6436 + 6612 6437 /* dump counters and exit */ 6613 6438 if (dump_only) 6614 6439 return get_and_dump_counters(); ··· 6622 6443 return 0; 6623 6444 } 6624 6445 6625 - msr_sum_record(); 6626 6446 /* 6627 6447 * if any params left, it must be a command to fork 6628 6448 */