Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management and ACPI updates from Rafael Wysocki:
"The second batch of power management and ACPI updates for v4.6.

Included are fixups on top of the previous PM/ACPI pull request and
other material that didn't make it into that request but still should go into 4.6.

Among other things, there's a fix for an intel_pstate driver issue
uncovered by recent cpufreq changes, a workaround for a boot hang on
Skylake-H related to the platform's handling of deep C-states, and a
PCI/ACPI fix for the handling of IO port resources on non-x86
architectures, plus some new device IDs and similar.

Specifics:

- Fix for an intel_pstate driver issue related to the handling of MSR
updates uncovered by the recent cpufreq rework (Rafael Wysocki).

- cpufreq core cleanups related to starting governors and frequency
synchronization during resume from system suspend and a locking fix
for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).

- acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
Michael Neuling, Richard Cochran, Shilpasri Bhat).

- intel_idle driver update preventing some Skylake-H systems from
hanging during initialization by disabling deep C-states mishandled
by the platform in the problematic configurations (Len Brown).

- Intel Xeon Phi Processor x200 support for intel_idle
(Dasaratharaman Chandramouli).

- cpuidle menu governor updates to make it always honor PM QoS
latency constraints (and prevent C1 from being used as the fallback
C-state on x86 when they are set below its exit latency), and to
restore the previous behavior of falling back to C1 when the next
timer event is far enough in the future, which had been changed in
4.4 and led to an energy consumption regression (Rik van Riel,
Rafael Wysocki).

- New device ID for a future AMD UART controller in the ACPI driver
for AMD SoCs (Wang Hongcheng).

- Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
scaling (AVS) driver (David Wu).

- ACPI PCI resources management fix for the handling of IO space
resources on architectures where the IO space is memory mapped
(IA64 and ARM64) broken by the introduction of common ACPI
resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).

- Fix for the ACPI backend of the generic device properties API to
make it parse non-device (data node only) children of an ACPI
device correctly (Irina Tirdea).

- Fixes for the handling of global suspend flags (introduced in 4.4)
during hibernation and resume from it (Lukas Wunner).

- Support for obtaining configuration information from Device Trees
in the PM clocks framework (Jon Hunter).

- ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
King, Geert Uytterhoeven)"

* tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
PM / AVS: rockchip-io: add io selectors and supplies for rk3399
intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
ACPI / PM: Runtime resume devices when waking from hibernate
PM / sleep: Clear pm_suspend_global_flags upon hibernate
cpufreq: governor: Always schedule work on the CPU running update
cpufreq: Always update current frequency before startig governor
cpufreq: Introduce cpufreq_update_current_freq()
cpufreq: Introduce cpufreq_start_governor()
cpufreq: powernv: Add sysfs attributes to show throttle stats
cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
PCI: ACPI: IA64: fix IO port generic range check
ACPI / util: cast data to u64 before shifting to fix sign extension
cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
cpuidle: menu: Fall back to polling if next timer event is near
cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
cpufreq: Make cpufreq_quick_get() safe to call
ACPI / property: fix data node parsing in acpi_get_next_subnode()
ACPI / APD: Add device HID for future AMD UART controller
...

+598 -168
+69
Documentation/ABI/testing/sysfs-devices-system-cpu
···
 		- WriteBack: data is written only to the cache line and
 			the modified cache line is written to main
 			memory only when it is replaced
+
+What:		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent
+		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset
+Date:		March 2016
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+		Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	POWERNV CPUFreq driver's frequency throttle stats directory and
+		attributes
+
+		'cpuX/cpufreq/throttle_stats' directory contains the CPU frequency
+		throttle stat attributes for the chip. The throttle stats of a cpu
+		is common across all the cpus belonging to a chip. Below are the
+		throttle attributes exported in the 'throttle_stats' directory:
+
+		- turbo_stat : This file gives the total number of times the max
+		frequency is throttled to lower frequency in turbo (at and above
+		nominal frequency) range of frequencies.
+
+		- sub_turbo_stat : This file gives the total number of times the
+		max frequency is throttled to lower frequency in sub-turbo (below
+		nominal frequency) range of frequencies.
+
+		- unthrottle : This file gives the total number of times the max
+		frequency is unthrottled after being throttled.
+
+		- powercap : This file gives the total number of times the max
+		frequency is throttled due to 'Power Capping'.
+
+		- overtemp : This file gives the total number of times the max
+		frequency is throttled due to 'CPU Over Temperature'.
+
+		- supply_fault : This file gives the total number of times the
+		max frequency is throttled due to 'Power Supply Failure'.
+
+		- overcurrent : This file gives the total number of times the
+		max frequency is throttled due to 'Overcurrent'.
+
+		- occ_reset : This file gives the total number of times the max
+		frequency is throttled due to 'OCC Reset'.
+
+		The sysfs attributes representing different throttle reasons like
+		powercap, overtemp, supply_fault, overcurrent and occ_reset map to
+		the reasons provided by OCC firmware for throttling the frequency.
+
+What:		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/turbo_stat
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/sub_turbo_stat
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/unthrottle
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/powercap
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/overtemp
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/supply_fault
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/overcurrent
+		/sys/devices/system/cpu/cpufreq/policyX/throttle_stats/occ_reset
+Date:		March 2016
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+		Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description:	POWERNV CPUFreq driver's frequency throttle stats directory and
+		attributes
+
+		'policyX/throttle_stats' directory and all the attributes are same as
+		the /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory and
+		attributes which give the frequency throttle information of the chip.
+11
Documentation/devicetree/bindings/power/rockchip-io-domain.txt
···
 	- "rockchip,rk3288-io-voltage-domain" for rk3288
 	- "rockchip,rk3368-io-voltage-domain" for rk3368
 	- "rockchip,rk3368-pmu-io-voltage-domain" for rk3368 pmu-domains
+	- "rockchip,rk3399-io-voltage-domain" for rk3399
+	- "rockchip,rk3399-pmu-io-voltage-domain" for rk3399 pmu-domains
 - rockchip,grf: phandle to the syscon managing the "general register files"
 
···
 Possible supplies for rk3368 pmu-domains:
 - pmu-supply: The supply connected to PMUIO_VDD.
 - vop-supply: The supply connected to LCDC_VDD.
+
+Possible supplies for rk3399:
+- bt656-supply: The supply connected to APIO2_VDD.
+- audio-supply: The supply connected to APIO5_VDD.
+- sdmmc-supply: The supply connected to SDMMC0_VDD.
+- gpio1830-supply: The supply connected to APIO4_VDD.
+
+Possible supplies for rk3399 pmu-domains:
+- pmu1830-supply: The supply connected to PMUIO2_VDD.
 
 Example:
 
+1
drivers/acpi/acpi_apd.c
···
 	{ "AMD0010", APD_ADDR(cz_i2c_desc) },
 	{ "AMDI0010", APD_ADDR(cz_i2c_desc) },
 	{ "AMD0020", APD_ADDR(cz_uart_desc) },
+	{ "AMDI0020", APD_ADDR(cz_uart_desc) },
 	{ "AMD0030", },
 #endif
 #ifdef CONFIG_ARM64
+1
drivers/acpi/property.c
···
 		next = adev->node.next;
 		if (next == head) {
 			child = NULL;
+			adev = ACPI_COMPANION(dev);
 			goto nondev;
 		}
 		adev = list_entry(next, struct acpi_device, node);
+13 -1
drivers/acpi/resource.c
···
 
 #ifdef CONFIG_X86
 #define valid_IRQ(i) (((i) != 0) && ((i) != 2))
+static inline bool acpi_iospace_resource_valid(struct resource *res)
+{
+	/* On X86 IO space is limited to the [0 - 64K] IO port range */
+	return res->end < 0x10003;
+}
 #else
 #define valid_IRQ(i) (true)
+/*
+ * ACPI IO descriptors on arches other than X86 contain MMIO CPU physical
+ * addresses mapping IO space in CPU physical address space, IO space
+ * resources can be placed anywhere in the 64-bit physical address space.
+ */
+static inline bool
+acpi_iospace_resource_valid(struct resource *res) { return true; }
 #endif
 
 static bool acpi_dev_resource_len_valid(u64 start, u64 end, u64 len, bool io)
···
 	if (!acpi_dev_resource_len_valid(res->start, res->end, len, true))
 		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;
 
-	if (res->end >= 0x10003)
+	if (!acpi_iospace_resource_valid(res))
 		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;
 
 	if (io_decode == ACPI_DECODE_16)
+1
drivers/acpi/sleep.c
···
 
 static void acpi_hibernation_leave(void)
 {
+	pm_set_resume_via_firmware();
 	/*
 	 * If ACPI is not enabled by the BIOS and the boot kernel, we need to
 	 * enable it here.
+1 -1
drivers/acpi/utils.c
···
 		mask = obj->integer.value;
 	else if (obj->type == ACPI_TYPE_BUFFER)
 		for (i = 0; i < obj->buffer.length && i < 8; i++)
-			mask |= (((u8)obj->buffer.pointer[i]) << (i * 8));
+			mask |= (((u64)obj->buffer.pointer[i]) << (i * 8));
 	ACPI_FREE(obj);
 
 	/*
+89
drivers/base/power/clock_ops.c
···
 	return __pm_clk_add(dev, NULL, clk);
 }
 
+
+/**
+ * of_pm_clk_add_clks - Start using device clock(s) for power management.
+ * @dev: Device whose clock(s) is going to be used for power management.
+ *
+ * Add a series of clocks described in the 'clocks' device-tree node for
+ * a device to the list of clocks used for the power management of @dev.
+ * On success, returns the number of clocks added. Returns a negative
+ * error code if there are no clocks in the device node for the device
+ * or if adding a clock fails.
+ */
+int of_pm_clk_add_clks(struct device *dev)
+{
+	struct clk **clks;
+	unsigned int i, count;
+	int ret;
+
+	if (!dev || !dev->of_node)
+		return -EINVAL;
+
+	count = of_count_phandle_with_args(dev->of_node, "clocks",
+					   "#clock-cells");
+	if (count == 0)
+		return -ENODEV;
+
+	clks = kcalloc(count, sizeof(*clks), GFP_KERNEL);
+	if (!clks)
+		return -ENOMEM;
+
+	for (i = 0; i < count; i++) {
+		clks[i] = of_clk_get(dev->of_node, i);
+		if (IS_ERR(clks[i])) {
+			ret = PTR_ERR(clks[i]);
+			goto error;
+		}
+
+		ret = pm_clk_add_clk(dev, clks[i]);
+		if (ret) {
+			clk_put(clks[i]);
+			goto error;
+		}
+	}
+
+	kfree(clks);
+
+	return i;
+
+error:
+	while (i--)
+		pm_clk_remove_clk(dev, clks[i]);
+
+	kfree(clks);
+
+	return ret;
+}
+
 /**
  * __pm_clk_remove - Destroy PM clock entry.
  * @ce: PM clock entry to destroy.
···
 		else if (!con_id || !ce->con_id)
 			continue;
 		else if (!strcmp(con_id, ce->con_id))
+			goto remove;
+	}
+
+	spin_unlock_irq(&psd->lock);
+	return;
+
+ remove:
+	list_del(&ce->node);
+	spin_unlock_irq(&psd->lock);
+
+	__pm_clk_remove(ce);
+}
+
+/**
+ * pm_clk_remove_clk - Stop using a device clock for power management.
+ * @dev: Device whose clock should not be used for PM any more.
+ * @clk: Clock pointer
+ *
+ * Remove the clock pointed to by @clk from the list of clocks used for
+ * the power management of @dev.
+ */
+void pm_clk_remove_clk(struct device *dev, struct clk *clk)
+{
+	struct pm_subsys_data *psd = dev_to_psd(dev);
+	struct pm_clock_entry *ce;
+
+	if (!psd || !clk)
+		return;
+
+	spin_lock_irq(&psd->lock);
+
+	list_for_each_entry(ce, &psd->clock_list, node) {
+		if (clk == ce->clk)
 			goto remove;
 	}
 
+10 -8
drivers/cpufreq/acpi-cpufreq.c
···
 	}
 }
 
-u32 cpu_freq_read_intel(struct acpi_pct_register *not_used)
+static u32 cpu_freq_read_intel(struct acpi_pct_register *not_used)
 {
 	u32 val, dummy;
···
 	return val;
 }
 
-void cpu_freq_write_intel(struct acpi_pct_register *not_used, u32 val)
+static void cpu_freq_write_intel(struct acpi_pct_register *not_used, u32 val)
 {
 	u32 lo, hi;
···
 	wrmsr(MSR_IA32_PERF_CTL, lo, hi);
 }
 
-u32 cpu_freq_read_amd(struct acpi_pct_register *not_used)
+static u32 cpu_freq_read_amd(struct acpi_pct_register *not_used)
 {
 	u32 val, dummy;
···
 	return val;
 }
 
-void cpu_freq_write_amd(struct acpi_pct_register *not_used, u32 val)
+static void cpu_freq_write_amd(struct acpi_pct_register *not_used, u32 val)
 {
 	wrmsr(MSR_AMD_PERF_CTL, val, 0);
 }
 
-u32 cpu_freq_read_io(struct acpi_pct_register *reg)
+static u32 cpu_freq_read_io(struct acpi_pct_register *reg)
 {
 	u32 val;
···
 	return val;
 }
 
-void cpu_freq_write_io(struct acpi_pct_register *reg, u32 val)
+static void cpu_freq_write_io(struct acpi_pct_register *reg, u32 val)
 {
 	acpi_os_write_port(reg->address, val, reg->bit_width);
 }
···
 	 */
 
 	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
 		boost_set_msrs(acpi_cpufreq_driver.boost_enabled, cpumask);
 		break;
+54 -44
drivers/cpufreq/cpufreq.c
···
 /* internal prototypes */
 static int cpufreq_governor(struct cpufreq_policy *policy, unsigned int event);
 static unsigned int __cpufreq_get(struct cpufreq_policy *policy);
+static int cpufreq_start_governor(struct cpufreq_policy *policy);
 
 /**
  * Two notifier lists: the "policy" list is involved in the
···
 	cpumask_set_cpu(cpu, policy->cpus);
 
 	if (has_target()) {
-		ret = cpufreq_governor(policy, CPUFREQ_GOV_START);
-		if (!ret)
-			ret = cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
-
+		ret = cpufreq_start_governor(policy);
 		if (ret)
 			pr_err("%s: Failed to start governor\n", __func__);
 	}
···
 	/* Start governor again for active policy */
 	if (!policy_is_inactive(policy)) {
 		if (has_target()) {
-			ret = cpufreq_governor(policy, CPUFREQ_GOV_START);
-			if (!ret)
-				ret = cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
-
+			ret = cpufreq_start_governor(policy);
 			if (ret)
 				pr_err("%s: Failed to start governor\n", __func__);
 		}
···
 {
 	struct cpufreq_policy *policy;
 	unsigned int ret_freq = 0;
+	unsigned long flags;
 
-	if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get)
-		return cpufreq_driver->get(cpu);
+	read_lock_irqsave(&cpufreq_driver_lock, flags);
+
+	if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get) {
+		ret_freq = cpufreq_driver->get(cpu);
+		read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+		return ret_freq;
+	}
+
+	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	policy = cpufreq_cpu_get(cpu);
 	if (policy) {
···
 	return ret_freq;
 }
 EXPORT_SYMBOL(cpufreq_get);
+
+static unsigned int cpufreq_update_current_freq(struct cpufreq_policy *policy)
+{
+	unsigned int new_freq;
+
+	new_freq = cpufreq_driver->get(policy->cpu);
+	if (!new_freq)
+		return 0;
+
+	if (!policy->cur) {
+		pr_debug("cpufreq: Driver did not initialize current freq\n");
+		policy->cur = new_freq;
+	} else if (policy->cur != new_freq && has_target()) {
+		cpufreq_out_of_sync(policy, new_freq);
+	}
+
+	return new_freq;
+}
 
 static struct subsys_interface cpufreq_interface = {
 	.name		= "cpufreq",
···
 			policy);
 	} else {
 		down_write(&policy->rwsem);
-		ret = cpufreq_governor(policy, CPUFREQ_GOV_START);
-		if (!ret)
-			cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
+		ret = cpufreq_start_governor(policy);
 		up_write(&policy->rwsem);
 
 		if (ret)
···
 				__func__, policy);
 		}
 	}
-
-	/*
-	 * schedule call cpufreq_update_policy() for first-online CPU, as that
-	 * wouldn't be hotplugged-out on suspend. It will verify that the
-	 * current freq is in sync with what we believe it to be.
-	 */
-	policy = cpufreq_cpu_get_raw(cpumask_first(cpu_online_mask));
-	if (WARN_ON(!policy))
-		return;
-
-	schedule_work(&policy->update);
 }
 
 /**
···
 	return ret;
 }
 
+static int cpufreq_start_governor(struct cpufreq_policy *policy)
+{
+	int ret;
+
+	if (cpufreq_driver->get && !cpufreq_driver->setpolicy)
+		cpufreq_update_current_freq(policy);
+
+	ret = cpufreq_governor(policy, CPUFREQ_GOV_START);
+	return ret ? ret : cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
+}
+
 int cpufreq_register_governor(struct cpufreq_governor *governor)
 {
 	int err;
···
 		return cpufreq_driver->setpolicy(new_policy);
 	}
 
-	if (new_policy->governor == policy->governor)
-		goto out;
+	if (new_policy->governor == policy->governor) {
+		pr_debug("cpufreq: governor limits update\n");
+		return cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
+	}
 
 	pr_debug("governor switch\n");
 
···
 	policy->governor = new_policy->governor;
 	ret = cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
 	if (!ret) {
-		ret = cpufreq_governor(policy, CPUFREQ_GOV_START);
-		if (!ret)
-			goto out;
-
+		ret = cpufreq_start_governor(policy);
+		if (!ret) {
+			pr_debug("cpufreq: governor change\n");
+			return 0;
+		}
 		cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 	}
 
···
 		if (cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT))
 			policy->governor = NULL;
 		else
-			cpufreq_governor(policy, CPUFREQ_GOV_START);
+			cpufreq_start_governor(policy);
 	}
 
 	return ret;
-
- out:
-	pr_debug("governor: change or update limits\n");
-	return cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
 }
 
 /**
···
 	 * -> ask driver for current freq and notify governors about a change
 	 */
 	if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
-		new_policy.cur = cpufreq_driver->get(cpu);
+		new_policy.cur = cpufreq_update_current_freq(policy);
 		if (WARN_ON(!new_policy.cur)) {
 			ret = -EIO;
 			goto unlock;
-		}
-
-		if (!policy->cur) {
-			pr_debug("Driver did not initialize current freq\n");
-			policy->cur = new_policy.cur;
-		} else {
-			if (policy->cur != new_policy.cur && has_target())
-				cpufreq_out_of_sync(policy, new_policy.cur);
 		}
 	}
 
+1 -1
drivers/cpufreq/cpufreq_governor.c
···
 	struct policy_dbs_info *policy_dbs;
 
 	policy_dbs = container_of(irq_work, struct policy_dbs_info, irq_work);
-	schedule_work(&policy_dbs->work);
+	schedule_work_on(smp_processor_id(), &policy_dbs->work);
 }
 
 static void dbs_update_util_handler(struct update_util_data *data, u64 time,
+43 -30
drivers/cpufreq/intel_pstate.c
···
 	int (*get_min)(void);
 	int (*get_turbo)(void);
 	int (*get_scaling)(void);
-	void (*set)(struct cpudata*, int pstate);
+	u64 (*get_val)(struct cpudata*, int pstate);
 	void (*get_vid)(struct cpudata *);
 	int32_t (*get_target_pstate)(struct cpudata *);
 };
···
 	return value & 0x7F;
 }
 
-static void atom_set_pstate(struct cpudata *cpudata, int pstate)
+static u64 atom_get_val(struct cpudata *cpudata, int pstate)
 {
 	u64 val;
 	int32_t vid_fp;
···
 	if (pstate > cpudata->pstate.max_pstate)
 		vid = cpudata->vid.turbo;
 
-	val |= vid;
-
-	wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
+	return val | vid;
 }
 
 static int silvermont_get_scaling(void)
···
 	return 100000;
 }
 
-static void core_set_pstate(struct cpudata *cpudata, int pstate)
+static u64 core_get_val(struct cpudata *cpudata, int pstate)
 {
 	u64 val;
 
···
 	if (limits->no_turbo && !limits->turbo_disabled)
 		val |= (u64)1 << 32;
 
-	wrmsrl(MSR_IA32_PERF_CTL, val);
+	return val;
 }
 
 static int knl_get_turbo_pstate(void)
···
 	.get_min = core_get_min_pstate,
 	.get_turbo = core_get_turbo_pstate,
 	.get_scaling = core_get_scaling,
-	.set = core_set_pstate,
+	.get_val = core_get_val,
 	.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
···
 	.get_max_physical = atom_get_max_pstate,
 	.get_min = atom_get_min_pstate,
 	.get_turbo = atom_get_turbo_pstate,
-	.set = atom_set_pstate,
+	.get_val = atom_get_val,
 	.get_scaling = silvermont_get_scaling,
 	.get_vid = atom_get_vid,
 	.get_target_pstate = get_target_pstate_use_cpu_load,
···
 	.get_max_physical = atom_get_max_pstate,
 	.get_min = atom_get_min_pstate,
 	.get_turbo = atom_get_turbo_pstate,
-	.set = atom_set_pstate,
+	.get_val = atom_get_val,
 	.get_scaling = airmont_get_scaling,
 	.get_vid = atom_get_vid,
 	.get_target_pstate = get_target_pstate_use_cpu_load,
···
 	.get_min = core_get_min_pstate,
 	.get_turbo = knl_get_turbo_pstate,
 	.get_scaling = core_get_scaling,
-	.set = core_set_pstate,
+	.get_val = core_get_val,
 	.get_target_pstate = get_target_pstate_use_performance,
 	},
 };
···
 	*min = clamp_t(int, min_perf, cpu->pstate.min_pstate, max_perf);
 }
 
-static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate, bool force)
+static inline void intel_pstate_record_pstate(struct cpudata *cpu, int pstate)
 {
-	int max_perf, min_perf;
-
-	if (force) {
-		update_turbo_state();
-
-		intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
-
-		pstate = clamp_t(int, pstate, min_perf, max_perf);
-
-		if (pstate == cpu->pstate.current_pstate)
-			return;
-	}
 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
-
 	cpu->pstate.current_pstate = pstate;
+}
 
-	pstate_funcs.set(cpu, pstate);
+static void intel_pstate_set_min_pstate(struct cpudata *cpu)
+{
+	int pstate = cpu->pstate.min_pstate;
+
+	intel_pstate_record_pstate(cpu, pstate);
+	/*
+	 * Generally, there is no guarantee that this code will always run on
+	 * the CPU being updated, so force the register update to run on the
+	 * right CPU.
+	 */
+	wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
+		      pstate_funcs.get_val(cpu, pstate));
 }
 
 static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
···
 
 	if (pstate_funcs.get_vid)
 		pstate_funcs.get_vid(cpu);
-	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
+
+	intel_pstate_set_min_pstate(cpu);
 }
 
 static inline void intel_pstate_calc_busy(struct cpudata *cpu)
···
 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, core_busy);
 }
 
+static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
+{
+	int max_perf, min_perf;
+
+	update_turbo_state();
+
+	intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
+	pstate = clamp_t(int, pstate, min_perf, max_perf);
+	if (pstate == cpu->pstate.current_pstate)
+		return;
+
+	intel_pstate_record_pstate(cpu, pstate);
+	wrmsrl(MSR_IA32_PERF_CTL, pstate_funcs.get_val(cpu, pstate));
+}
+
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
 	int from, target_pstate;
···
 
 	target_pstate = pstate_funcs.get_target_pstate(cpu);
 
-	intel_pstate_set_pstate(cpu, target_pstate, true);
+	intel_pstate_update_pstate(cpu, target_pstate);
 
 	sample = &cpu->sample;
 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
···
 	if (hwp_active)
 		return;
 
-	intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate, false);
+	intel_pstate_set_min_pstate(cpu);
 }
 
 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
···
 	pstate_funcs.get_min = funcs->get_min;
 	pstate_funcs.get_turbo = funcs->get_turbo;
 	pstate_funcs.get_scaling = funcs->get_scaling;
-	pstate_funcs.set = funcs->set;
+	pstate_funcs.get_val = funcs->get_val;
 	pstate_funcs.get_vid = funcs->get_vid;
 	pstate_funcs.get_target_pstate = funcs->get_target_pstate;
 
+89 -35
drivers/cpufreq/powernv-cpufreq.c
··· 44 44 45 45 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1]; 46 46 static bool rebooting, throttled, occ_reset; 47 - static unsigned int *core_to_chip_map; 48 47 49 48 static const char * const throttle_reason[] = { 50 49 "No throttling", ··· 54 55 "OCC Reset" 55 56 }; 56 57 58 + enum throttle_reason_type { 59 + NO_THROTTLE = 0, 60 + POWERCAP, 61 + CPU_OVERTEMP, 62 + POWER_SUPPLY_FAILURE, 63 + OVERCURRENT, 64 + OCC_RESET_THROTTLE, 65 + OCC_MAX_REASON 66 + }; 67 + 57 68 static struct chip { 58 69 unsigned int id; 59 70 bool throttled; ··· 71 62 u8 throttle_reason; 72 63 cpumask_t mask; 73 64 struct work_struct throttle; 65 + int throttle_turbo; 66 + int throttle_sub_turbo; 67 + int reason[OCC_MAX_REASON]; 74 68 } *chips; 75 69 76 70 static int nr_chips; 71 + static DEFINE_PER_CPU(struct chip *, chip_info); 77 72 78 73 /* 79 74 * Note: The set of pstates consists of contiguous integers, the ··· 209 196 NULL, 210 197 }; 211 198 199 + #define throttle_attr(name, member) \ 200 + static ssize_t name##_show(struct cpufreq_policy *policy, char *buf) \ 201 + { \ 202 + struct chip *chip = per_cpu(chip_info, policy->cpu); \ 203 + \ 204 + return sprintf(buf, "%u\n", chip->member); \ 205 + } \ 206 + \ 207 + static struct freq_attr throttle_attr_##name = __ATTR_RO(name) \ 208 + 209 + throttle_attr(unthrottle, reason[NO_THROTTLE]); 210 + throttle_attr(powercap, reason[POWERCAP]); 211 + throttle_attr(overtemp, reason[CPU_OVERTEMP]); 212 + throttle_attr(supply_fault, reason[POWER_SUPPLY_FAILURE]); 213 + throttle_attr(overcurrent, reason[OVERCURRENT]); 214 + throttle_attr(occ_reset, reason[OCC_RESET_THROTTLE]); 215 + throttle_attr(turbo_stat, throttle_turbo); 216 + throttle_attr(sub_turbo_stat, throttle_sub_turbo); 217 + 218 + static struct attribute *throttle_attrs[] = { 219 + &throttle_attr_unthrottle.attr, 220 + &throttle_attr_powercap.attr, 221 + &throttle_attr_overtemp.attr, 222 + &throttle_attr_supply_fault.attr, 223 + 
 	&throttle_attr_overcurrent.attr,
+	&throttle_attr_occ_reset.attr,
+	&throttle_attr_turbo_stat.attr,
+	&throttle_attr_sub_turbo_stat.attr,
+	NULL,
+};
+
+static const struct attribute_group throttle_attr_grp = {
+	.name = "throttle_stats",
+	.attrs = throttle_attrs,
+};
+
 /* Helper routines */
 
 /* Access helpers to power mgt SPR */
···
 static void powernv_cpufreq_throttle_check(void *data)
 {
+	struct chip *chip;
 	unsigned int cpu = smp_processor_id();
-	unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
 	unsigned long pmsr;
-	int pmsr_pmax, i;
+	int pmsr_pmax;
 
 	pmsr = get_pmspr(SPRN_PMSR);
-
-	for (i = 0; i < nr_chips; i++)
-		if (chips[i].id == chip_id)
-			break;
+	chip = this_cpu_read(chip_info);
 
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
 	if (pmsr_pmax != powernv_pstate_info.max) {
-		if (chips[i].throttled)
+		if (chip->throttled)
 			goto next;
-		chips[i].throttled = true;
-		if (pmsr_pmax < powernv_pstate_info.nominal)
+		chip->throttled = true;
+		if (pmsr_pmax < powernv_pstate_info.nominal) {
 			pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
-				     cpu, chips[i].id, pmsr_pmax,
+				     cpu, chip->id, pmsr_pmax,
 				     powernv_pstate_info.nominal);
-		trace_powernv_throttle(chips[i].id,
-				      throttle_reason[chips[i].throttle_reason],
+			chip->throttle_sub_turbo++;
+		} else {
+			chip->throttle_turbo++;
+		}
+		trace_powernv_throttle(chip->id,
+				      throttle_reason[chip->throttle_reason],
 				      pmsr_pmax);
-	} else if (chips[i].throttled) {
-		chips[i].throttled = false;
-		trace_powernv_throttle(chips[i].id,
-				       throttle_reason[chips[i].throttle_reason],
+	} else if (chip->throttled) {
+		chip->throttled = false;
+		trace_powernv_throttle(chip->id,
+				       throttle_reason[chip->throttle_reason],
 				       pmsr_pmax);
 	}
 
···
 	for (i = 0; i < threads_per_core; i++)
 		cpumask_set_cpu(base + i, policy->cpus);
 
+	if (!policy->driver_data) {
+		int ret;
+
+		ret = sysfs_create_group(&policy->kobj, &throttle_attr_grp);
+		if (ret) {
+			pr_info("Failed to create throttle stats directory for cpu %d\n",
+				policy->cpu);
+			return ret;
+		}
+		/*
+		 * policy->driver_data is used as a flag for one-time
+		 * creation of throttle sysfs files.
+		 */
+		policy->driver_data = policy;
+	}
 	return cpufreq_table_validate_and_show(policy, powernv_freqs);
 }
···
 		break;
 
 	if (omsg.throttle_status >= 0 &&
-	    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS)
+	    omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS) {
 		chips[i].throttle_reason = omsg.throttle_status;
+		chips[i].reason[omsg.throttle_status]++;
+	}
 
 	if (!omsg.throttle_status)
 		chips[i].restore = true;
···
 	unsigned int chip[256];
 	unsigned int cpu, i;
 	unsigned int prev_chip_id = UINT_MAX;
-	cpumask_t cpu_mask;
-	int ret = -ENOMEM;
 
-	core_to_chip_map = kcalloc(cpu_nr_cores(), sizeof(unsigned int),
-				   GFP_KERNEL);
-	if (!core_to_chip_map)
-		goto out;
-
-	cpumask_copy(&cpu_mask, cpu_possible_mask);
-	for_each_cpu(cpu, &cpu_mask) {
+	for_each_possible_cpu(cpu) {
 		unsigned int id = cpu_to_chip_id(cpu);
 
 		if (prev_chip_id != id) {
 			prev_chip_id = id;
 			chip[nr_chips++] = id;
 		}
-		core_to_chip_map[cpu_core_index_of_thread(cpu)] = id;
-		cpumask_andnot(&cpu_mask, &cpu_mask, cpu_sibling_mask(cpu));
 	}
 
 	chips = kcalloc(nr_chips, sizeof(struct chip), GFP_KERNEL);
 	if (!chips)
-		goto free_chip_map;
+		return -ENOMEM;
 
 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
 		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
 		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
+		for_each_cpu(cpu, &chips[i].mask)
+			per_cpu(chip_info, cpu) = &chips[i];
 	}
 
 	return 0;
-free_chip_map:
-	kfree(core_to_chip_map);
-out:
-	return ret;
 }
 
 static inline void clean_chip_info(void)
 {
 	kfree(chips);
-	kfree(core_to_chip_map);
 }
 
 static inline void unregister_all_notifiers(void)
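The rework above replaces the per-call linear search over `chips[]` with a per-CPU `chip_info` pointer filled in once at init time, and the init path collects the distinct chip ids in a single pass, relying on CPUs of the same chip being enumerated consecutively. A minimal userspace sketch of that one-pass collection (the helper name and array types are illustrative, not the kernel's):

```c
#include <assert.h>
#include <limits.h>

#define MAX_CHIPS 256

/*
 * Collect distinct chip ids from a cpu -> chip-id map, assuming CPUs
 * belonging to one chip appear consecutively (as in init_chip_info()).
 * Returns the number of distinct chips; fills chip_ids[] in first-seen
 * order.
 */
static int collect_chip_ids(const unsigned int *cpu_to_chip, int ncpus,
			    unsigned int *chip_ids)
{
	int nr_chips = 0;
	unsigned int prev = UINT_MAX;	/* sentinel, as in the patch */
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (cpu_to_chip[cpu] != prev) {
			prev = cpu_to_chip[cpu];
			chip_ids[nr_chips++] = prev;
		}
	}
	return nr_chips;
}
```

With the per-chip array built this way, each CPU can then be pointed straight at its `struct chip`, which is what makes the hot-path lookup in `powernv_cpufreq_throttle_check()` O(1).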
+33 -23
drivers/cpuidle/governors/menu.c
···
  * of points is below a threshold. If it is... then use the
  * average of these 8 points as the estimated value.
  */
-static void get_typical_interval(struct menu_device *data)
+static unsigned int get_typical_interval(struct menu_device *data)
 {
 	int i, divisor;
 	unsigned int max, thresh, avg;
···
 	if (likely(variance <= U64_MAX/36)) {
 		if ((((u64)avg*avg > variance*36) && (divisor * 4 >= INTERVALS * 3))
 							|| variance <= 400) {
-			if (data->next_timer_us > avg)
-				data->predicted_us = avg;
-			return;
+			return avg;
 		}
 	}
···
 	 * with sporadic activity with a bunch of short pauses.
 	 */
 	if ((divisor * 4) <= INTERVALS * 3)
-		return;
+		return UINT_MAX;
 
 	thresh = max - 1;
 	goto again;
···
 	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
 	unsigned int interactivity_req;
+	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
 
 	if (data->needs_update) {
···
 					 data->correction_factor[data->bucket],
 					 RESOLUTION * DECAY);
 
-	get_typical_interval(data);
+	expected_interval = get_typical_interval(data);
+	expected_interval = min(expected_interval, data->next_timer_us);
+
+	if (CPUIDLE_DRIVER_STATE_START > 0) {
+		struct cpuidle_state *s = &drv->states[CPUIDLE_DRIVER_STATE_START];
+		unsigned int polling_threshold;
+
+		/*
+		 * We want to default to C1 (hlt), not to busy polling
+		 * unless the timer is happening really really soon, or
+		 * C1's exit latency exceeds the user configured limit.
+		 */
+		polling_threshold = max_t(unsigned int, 20, s->target_residency);
+		if (data->next_timer_us > polling_threshold &&
+		    latency_req > s->exit_latency && !s->disabled &&
+		    !dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable)
+			data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
+		else
+			data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;
+	} else {
+		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
+	}
 
 	/*
-	 * Performance multiplier defines a minimum predicted idle
-	 * duration / latency ratio. Adjust the latency limit if
-	 * necessary.
+	 * Use the lowest expected idle interval to pick the idle state.
+	 */
+	data->predicted_us = min(data->predicted_us, expected_interval);
+
+	/*
+	 * Use the performance multiplier and the user-configurable
+	 * latency_req to determine the maximum exit latency.
 	 */
 	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
 	if (latency_req > interactivity_req)
 		latency_req = interactivity_req;
-
-	if (CPUIDLE_DRIVER_STATE_START > 0) {
-		data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;
-		/*
-		 * We want to default to C1 (hlt), not to busy polling
-		 * unless the timer is happening really really soon.
-		 */
-		if (interactivity_req > 20 &&
-		    !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
-		    dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
-			data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
-	} else {
-		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
-	}
 
 	/*
 	 * Find the idle state with the lowest power while satisfying
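The net effect of the menu-governor change is that the C1-vs-polling fallback now depends on the next timer event and the PM QoS latency limit, not on `interactivity_req`. A small userspace model of the new predicate (struct and function names here are illustrative; state 1 stands for C1, state 0 for polling):

```c
#include <assert.h>

/* Illustrative stand-in for the relevant cpuidle_state fields. */
struct state {
	unsigned int target_residency;
	unsigned int exit_latency;
	int disabled;
};

/*
 * Model of the reworked fallback: pick C1 (state 1) over polling (state 0)
 * only when the next timer is beyond max(20us, C1 target residency) AND the
 * PM QoS latency request allows C1's exit latency AND C1 is not disabled.
 */
static int pick_fallback_state(const struct state *c1,
			       unsigned int next_timer_us,
			       unsigned int latency_req)
{
	unsigned int polling_threshold = c1->target_residency > 20 ?
					 c1->target_residency : 20;

	if (next_timer_us > polling_threshold &&
	    latency_req > c1->exit_latency && !c1->disabled)
		return 1;	/* C1 */
	return 0;		/* busy polling */
}
```

This is the behavior the pull request describes: C1 is no longer used as the fallback when a PM QoS constraint is set below its exit latency, while the far-future-timer fallback to C1 (lost in 4.4) is restored.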
+1 -1
drivers/devfreq/Kconfig
···
 	  Sets the frequency at the user specified one.
 	  This governor returns the user configured frequency if there
 	  has been an input to /sys/devices/.../power/devfreq_set_freq.
-	  Otherwise, the governor does not change the frequnecy
+	  Otherwise, the governor does not change the frequency
 	  given at the initialization.
 
 comment "DEVFREQ Drivers"
+113 -24
drivers/idle/intel_idle.c
···
 #include <asm/mwait.h>
 #include <asm/msr.h>
 
-#define INTEL_IDLE_VERSION "0.4"
+#define INTEL_IDLE_VERSION "0.4.1"
 #define PREFIX "intel_idle: "
 
 static struct cpuidle_driver intel_idle_driver = {
···
 	{
 		.enter = NULL }
 };
+static struct cpuidle_state knl_cstates[] = {
+	{
+		.name = "C1-KNL",
+		.desc = "MWAIT 0x00",
+		.flags = MWAIT2flg(0x00),
+		.exit_latency = 1,
+		.target_residency = 2,
+		.enter = &intel_idle,
+		.enter_freeze = intel_idle_freeze },
+	{
+		.name = "C6-KNL",
+		.desc = "MWAIT 0x10",
+		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.exit_latency = 120,
+		.target_residency = 500,
+		.enter = &intel_idle,
+		.enter_freeze = intel_idle_freeze },
+	{
+		.enter = NULL }
+};
 
 /**
  * intel_idle
···
 	.disable_promotion_to_c1e = true,
 };
 
+static const struct idle_cpu idle_cpu_knl = {
+	.state_table = knl_cstates,
+};
+
 #define ICPU(model, cpu) \
 	{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_MWAIT, (unsigned long)&cpu }
 
···
 	ICPU(0x56, idle_cpu_bdw),
 	ICPU(0x4e, idle_cpu_skl),
 	ICPU(0x5e, idle_cpu_skl),
+	ICPU(0x57, idle_cpu_knl),
 	{}
 };
 MODULE_DEVICE_TABLE(x86cpu, intel_idle_ids);
···
 }
 
 /*
+ * ivt_idle_state_table_update(void)
+ *
+ * Tune IVT multi-socket targets
+ * Assumption: num_sockets == (max_package_num + 1)
+ */
+static void ivt_idle_state_table_update(void)
+{
+	/* IVT uses a different table for 1-2, 3-4, and > 4 sockets */
+	int cpu, package_num, num_sockets = 1;
+
+	for_each_online_cpu(cpu) {
+		package_num = topology_physical_package_id(cpu);
+		if (package_num + 1 > num_sockets) {
+			num_sockets = package_num + 1;
+
+			if (num_sockets > 4) {
+				cpuidle_state_table = ivt_cstates_8s;
+				return;
+			}
+		}
+	}
+
+	if (num_sockets > 2)
+		cpuidle_state_table = ivt_cstates_4s;
+
+	/* else, 1 and 2 socket systems use default ivt_cstates */
+}
+/*
+ * sklh_idle_state_table_update(void)
+ *
+ * On SKL-H (model 0x5e) disable C8 and C9 if:
+ * C10 is enabled and SGX disabled
+ */
+static void sklh_idle_state_table_update(void)
+{
+	unsigned long long msr;
+	unsigned int eax, ebx, ecx, edx;
+
+
+	/* if PC10 disabled via cmdline intel_idle.max_cstate=7 or shallower */
+	if (max_cstate <= 7)
+		return;
+
+	/* if PC10 not present in CPUID.MWAIT.EDX */
+	if ((mwait_substates & (0xF << 28)) == 0)
+		return;
+
+	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr);
+
+	/* PC10 is not enabled in PKG C-state limit */
+	if ((msr & 0xF) != 8)
+		return;
+
+	ecx = 0;
+	cpuid(7, &eax, &ebx, &ecx, &edx);
+
+	/* if SGX is present */
+	if (ebx & (1 << 2)) {
+
+		rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
+
+		/* if SGX is enabled */
+		if (msr & (1 << 18))
+			return;
+	}
+
+	skl_cstates[5].disabled = 1;	/* C8-SKL */
+	skl_cstates[6].disabled = 1;	/* C9-SKL */
+}
+/*
  * intel_idle_state_table_update()
  *
  * Update the default state_table for this CPU-id
- *
- * Currently used to access tuned IVT multi-socket targets
- * Assumption: num_sockets == (max_package_num + 1)
  */
-void intel_idle_state_table_update(void)
+
+static void intel_idle_state_table_update(void)
 {
-	/* IVT uses a different table for 1-2, 3-4, and > 4 sockets */
-	if (boot_cpu_data.x86_model == 0x3e) { /* IVT */
-		int cpu, package_num, num_sockets = 1;
+	switch (boot_cpu_data.x86_model) {
 
-		for_each_online_cpu(cpu) {
-			package_num = topology_physical_package_id(cpu);
-			if (package_num + 1 > num_sockets) {
-				num_sockets = package_num + 1;
-
-				if (num_sockets > 4) {
-					cpuidle_state_table = ivt_cstates_8s;
-					return;
-				}
-			}
-		}
-
-		if (num_sockets > 2)
-			cpuidle_state_table = ivt_cstates_4s;
-		/* else, 1 and 2 socket systems use default ivt_cstates */
+	case 0x3e: /* IVT */
+		ivt_idle_state_table_update();
+		break;
+	case 0x5e: /* SKL-H */
+		sklh_idle_state_table_update();
+		break;
 	}
-	return;
 }
 
 /*
···
 		/* if NO sub-states for this state in CPUID, skip it */
 		if (num_substates == 0)
 			continue;
+
+		/* if state marked as disabled, skip it */
+		if (cpuidle_state_table[cstate].disabled != 0) {
+			pr_debug(PREFIX "state %s is disabled",
+				cpuidle_state_table[cstate].name);
+			continue;
+		}
+
 
 		if (((mwait_cstate + 1) > 2) &&
 			!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
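The Skylake-H workaround disables C8/C9 only when every gate on the path to the problematic PC10 state is open. The chain of checks can be modeled as a pure predicate; the function below is an illustrative sketch with the same bit layouts as the patch (MSR and CPUID values are passed in rather than read from hardware):

```c
#include <assert.h>

/*
 * Model of sklh_idle_state_table_update()'s decision: disable C8/C9 only if
 *  - PC10 is not capped by intel_idle.max_cstate on the command line,
 *  - PC10 is advertised in CPUID.MWAIT.EDX bits 31:28,
 *  - the package C-state limit MSR allows PC10 (low nibble == 8),
 *  - and SGX is not both present and enabled (SGX keeps C8/C9 usable).
 */
static int should_disable_c8_c9(int max_cstate,
				unsigned int mwait_substates,
				unsigned long long pkg_cst_cfg_ctl,
				int sgx_present, int sgx_enabled)
{
	if (max_cstate <= 7)				/* PC10 capped */
		return 0;
	if ((mwait_substates & (0xFu << 28)) == 0)	/* no PC10 in CPUID */
		return 0;
	if ((pkg_cst_cfg_ctl & 0xF) != 8)		/* PKG limit below PC10 */
		return 0;
	if (sgx_present && sgx_enabled)
		return 0;
	return 1;
}
```

Only when all four conditions hold does the driver mark `skl_cstates[5]` (C8) and `skl_cstates[6]` (C9) as disabled, which the new skip-if-disabled check in the cstate enumeration loop then honors.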
+58
drivers/power/avs/rockchip-io-domain.c
···
 #define RK3368_SOC_CON15_FLASH0	BIT(14)
 #define RK3368_SOC_FLASH_SUPPLY_NUM	2
 
+#define RK3399_PMUGRF_CON0		0x180
+#define RK3399_PMUGRF_CON0_VSEL		BIT(8)
+#define RK3399_PMUGRF_VSEL_SUPPLY_NUM	9
+
 struct rockchip_iodomain;
 
 /**
···
 		dev_warn(iod->dev, "couldn't update flash0 ctrl\n");
 }
 
+static void rk3399_pmu_iodomain_init(struct rockchip_iodomain *iod)
+{
+	int ret;
+	u32 val;
+
+	/* if no pmu io supply we should leave things alone */
+	if (!iod->supplies[RK3399_PMUGRF_VSEL_SUPPLY_NUM].reg)
+		return;
+
+	/*
+	 * set pmu io iodomain to also use this framework
+	 * instead of a special gpio.
+	 */
+	val = RK3399_PMUGRF_CON0_VSEL | (RK3399_PMUGRF_CON0_VSEL << 16);
+	ret = regmap_write(iod->grf, RK3399_PMUGRF_CON0, val);
+	if (ret < 0)
+		dev_warn(iod->dev, "couldn't update pmu io iodomain ctrl\n");
+}
+
 /*
  * On the rk3188 the io-domains are handled by a shared register with the
  * lower 8 bits being still being continuing drive-strength settings.
···
 	},
 };
 
+static const struct rockchip_iodomain_soc_data soc_data_rk3399 = {
+	.grf_offset = 0xe640,
+	.supply_names = {
+		"bt656",		/* APIO2_VDD */
+		"audio",		/* APIO5_VDD */
+		"sdmmc",		/* SDMMC0_VDD */
+		"gpio1830",		/* APIO4_VDD */
+	},
+};
+
+static const struct rockchip_iodomain_soc_data soc_data_rk3399_pmu = {
+	.grf_offset = 0x180,
+	.supply_names = {
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		"pmu1830",		/* PMUIO2_VDD */
+	},
+	.init = rk3399_pmu_iodomain_init,
+};
+
 static const struct of_device_id rockchip_iodomain_match[] = {
 	{
 		.compatible = "rockchip,rk3188-io-voltage-domain",
···
 	{
 		.compatible = "rockchip,rk3368-pmu-io-voltage-domain",
 		.data = (void *)&soc_data_rk3368_pmu
+	},
+	{
+		.compatible = "rockchip,rk3399-io-voltage-domain",
+		.data = (void *)&soc_data_rk3399
+	},
+	{
+		.compatible = "rockchip,rk3399-pmu-io-voltage-domain",
+		.data = (void *)&soc_data_rk3399_pmu
 	},
 	{ /* sentinel */ },
 };
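The `val = VSEL | (VSEL << 16)` expression in the new init hook reflects the Rockchip GRF register convention: the upper 16 bits of a write are a per-bit write-enable mask, so only bits whose mask counterpart is set actually change. A one-line model of the value being composed (the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose a Rockchip GRF masked write: data bits in the low half, the same
 * bits replicated into the high half as the write-enable mask, leaving all
 * other register bits untouched.
 */
static uint32_t grf_masked_write(uint32_t bits)
{
	return bits | (bits << 16);
}
```

For `RK3399_PMUGRF_CON0_VSEL` (bit 8) this yields `0x01000100`, i.e. "set bit 8, and only bit 8".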
+9
include/linux/pm_clock.h
···
 extern void pm_clk_destroy(struct device *dev);
 extern int pm_clk_add(struct device *dev, const char *con_id);
 extern int pm_clk_add_clk(struct device *dev, struct clk *clk);
+extern int of_pm_clk_add_clks(struct device *dev);
 extern void pm_clk_remove(struct device *dev, const char *con_id);
+extern void pm_clk_remove_clk(struct device *dev, struct clk *clk);
 extern int pm_clk_suspend(struct device *dev);
 extern int pm_clk_resume(struct device *dev);
 #else
···
 {
 	return -EINVAL;
 }
+static inline int of_pm_clk_add_clks(struct device *dev)
+{
+	return -EINVAL;
+}
 static inline void pm_clk_remove(struct device *dev, const char *con_id)
 {
 }
 #define pm_clk_suspend	NULL
 #define pm_clk_resume	NULL
+static inline void pm_clk_remove_clk(struct device *dev, struct clk *clk)
+{
+}
 #endif
 
 #ifdef CONFIG_HAVE_CLK
+1
kernel/power/hibernate.c
···
 	pm_message_t msg;
 	int error;
 
+	pm_suspend_clear_flags();
 	error = platform_begin(platform_mode);
 	if (error)
 		goto Close;