Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge back earlier cpuidle material for 6.15

+128 -89
+16 -11
Documentation/admin-guide/pm/cpuidle.rst
···
 and variance of them. If the variance is small (smaller than 400 square
 milliseconds) or it is small relative to the average (the average is greater
 that 6 times the standard deviation), the average is regarded as the "typical
-interval" value. Otherwise, the longest of the saved observed idle duration
+interval" value. Otherwise, either the longest or the shortest (depending on
+which one is farther from the average) of the saved observed idle duration
 values is discarded and the computation is repeated for the remaining ones.
+
 Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
-interval" is determined or too many data points are disregarded, in which case
-the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value).
+interval" is determined or too many data points are disregarded. In the latter
+case, if the size of the set of data points still under consideration is
+sufficiently large, the next idle duration is not likely to be above the largest
+idle duration value still in that set, so that value is taken as the predicted
+next idle duration. Finally, if the set of data points still under
+consideration is too small, no prediction is made.

-If the "typical interval" computed this way is long enough, the governor obtains
-the time until the closest timer event with the assumption that the scheduler
-tick will be stopped. That time, referred to as the *sleep length* in what follows,
-is the upper bound on the time before the next CPU wakeup. It is used to determine
-the sleep length range, which in turn is needed to get the sleep length correction
-factor.
+If the preliminary prediction of the next idle duration computed this way is
+long enough, the governor obtains the time until the closest timer event with
+the assumption that the scheduler tick will be stopped. That time, referred to
+as the *sleep length* in what follows, is the upper bound on the time before the
+next CPU wakeup. It is used to determine the sleep length range, which in turn
+is needed to get the sleep length correction factor.

 The ``menu`` governor maintains an array containing several correction factor
 values that correspond to different sleep length ranges organized so that each
···
 The sleep length is multiplied by the correction factor for the range that it
 falls into to obtain an approximation of the predicted idle duration that is
 compared to the "typical interval" determined previously and the minimum of
-the two is taken as the idle duration prediction.
+the two is taken as the final idle duration prediction.

 If the "typical interval" value is small, which means that the CPU is likely
 to be woken up soon enough, the sleep length computation is skipped as it may
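The outlier-discarding estimator described in the updated documentation can be sketched as a standalone user-space program. This is a simplified model, not the kernel code: the function name `typical_interval`, the use of plain 64-bit arithmetic, and the assumption that all samples are nonzero are illustration choices; the kernel version (shown later in this commit) uses `s64`/`u64` types and `do_div()`.

```c
#include <assert.h>
#include <stdint.h>

#define INTERVALS 8

/*
 * Sketch of the "typical interval" estimation: compute the average and
 * variance of the saved idle durations; while the spread is too large,
 * discard whichever extreme (min or max) lies farther from the average.
 * If too many points get dropped but at least half remain, predict the
 * largest remaining value; otherwise make no prediction (UINT64_MAX).
 */
static uint64_t typical_interval(const uint64_t *intervals)
{
	uint64_t min_thresh = 0, max_thresh = UINT64_MAX;

	for (;;) {
		uint64_t avg = 0, var = 0, min = UINT64_MAX, max = 0;
		int divisor = 0;

		for (int i = 0; i < INTERVALS; i++) {
			uint64_t v = intervals[i];

			/* Skip samples outside the current thresholds. */
			if (v <= min_thresh || v >= max_thresh)
				continue;
			divisor++;
			avg += v;
			var += v * v;
			if (v > max)
				max = v;
			if (v < min)
				min = v;
		}
		if (!divisor)
			return UINT64_MAX;
		avg /= divisor;
		var = var / divisor - avg * avg;	/* E[x^2] - E[x]^2 */

		/* Small spread: stddev below avg/6, or variance <= 400. */
		if ((avg * avg > 36 * var && divisor * 4 >= INTERVALS * 3) ||
		    var <= 400)
			return avg;

		/* Too many points dropped: predict max, or give up. */
		if (divisor * 4 <= INTERVALS * 3)
			return divisor >= INTERVALS / 2 ? max : UINT64_MAX;

		/* Drop the extreme farther from the average. */
		if (avg - min > max - avg)
			min_thresh = min;
		else
			max_thresh = max;
	}
}
```

With seven samples of 100 us and one 5000 us outlier, the first pass rejects the spread, the second pass (outlier excluded) returns 100.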
+13 -5
Documentation/admin-guide/pm/intel_idle.rst
···
 Documentation/admin-guide/pm/cpuidle.rst).
 Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.

-The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
-if the kernel has been configured with ACPI support) can be set to make the
-driver ignore the system's ACPI tables entirely or use them for all of the
-recognized processor models, respectively (they both are unset by default and
-``use_acpi`` has no effect if ``no_acpi`` is set).
+The ``no_acpi``, ``use_acpi`` and ``no_native`` module parameters are
+recognized by ``intel_idle`` if the kernel has been configured with ACPI
+support. In the case that ACPI is not configured these flags have no impact
+on functionality.
+
+``no_acpi`` - Do not use ACPI at all. Only native mode is available, no
+ACPI mode.
+
+``use_acpi`` - No-op in ACPI mode, the driver will consult ACPI tables for
+C-states on/off status in native mode.
+
+``no_native`` - Work only in ACPI mode, no native mode available (ignore
+all custom tables).

 The value of the ``states_off`` module parameter (0 by default) represents a
 list of idle states to be disabled by default in the form of a bitmask.
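Since ``intel_idle`` is built into the kernel rather than loaded as a module, these parameters are passed on the kernel command line using the standard ``module.param=value`` convention; the lines below are a sketch following that convention, not taken from this commit:

```
# ACPI-only operation: skip the driver's custom (native) C-state tables
intel_idle.no_native=1

# Native-only operation: never consult the ACPI tables
intel_idle.no_acpi=1

# The 0444 permissions make the parameters readable at run time, e.g.:
#   cat /sys/module/intel_idle/parameters/no_native
```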
+5 -3
MAINTAINERS
···
 F:	drivers/crypto/intel/iaa/*

 INTEL IDLE DRIVER
-M:	Jacob Pan <jacob.jun.pan@linux.intel.com>
-M:	Len Brown <lenb@kernel.org>
+M:	Rafael J. Wysocki <rafael@kernel.org>
+M:	Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
+M:	Artem Bityutskiy <dedekind1@gmail.com>
+R:	Len Brown <lenb@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
-T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 F:	drivers/idle/intel_idle.c

 INTEL IDXD DRIVER
+67 -62
drivers/cpuidle/governors/menu.c
···
  * the C state is required to actually break even on this cost. CPUIDLE
  * provides us this duration in the "target_residency" field. So all that we
  * need is a good prediction of how long we'll be idle. Like the traditional
- * menu governor, we start with the actual known "next timer event" time.
+ * menu governor, we take the actual known "next timer event" time.
  *
  * Since there are other source of wakeups (interrupts for example) than
  * the next timer event, this estimation is rather optimistic. To get a
···
  * duration always was 50% of the next timer tick, the correction factor will
  * be 0.5.
  *
- * menu uses a running average for this correction factor, however it uses a
- * set of factors, not just a single factor. This stems from the realization
- * that the ratio is dependent on the order of magnitude of the expected
- * duration; if we expect 500 milliseconds of idle time the likelihood of
- * getting an interrupt very early is much higher than if we expect 50 micro
- * seconds of idle time. A second independent factor that has big impact on
- * the actual factor is if there is (disk) IO outstanding or not.
- * (as a special twist, we consider every sleep longer than 50 milliseconds
- * as perfect; there are no power gains for sleeping longer than this)
- *
- * For these two reasons we keep an array of 12 independent factors, that gets
- * indexed based on the magnitude of the expected duration as well as the
- * "is IO outstanding" property.
+ * menu uses a running average for this correction factor, but it uses a set of
+ * factors, not just a single factor. This stems from the realization that the
+ * ratio is dependent on the order of magnitude of the expected duration; if we
+ * expect 500 milliseconds of idle time the likelihood of getting an interrupt
+ * very early is much higher than if we expect 50 micro seconds of idle time.
+ * For this reason, menu keeps an array of 6 independent factors, that gets
+ * indexed based on the magnitude of the expected duration.
  *
  * Repeatable-interval-detector
  * ----------------------------
  * There are some cases where "next timer" is a completely unusable predictor:
  * Those cases where the interval is fixed, for example due to hardware
- * interrupt mitigation, but also due to fixed transfer rate devices such as
- * mice.
+ * interrupt mitigation, but also due to fixed transfer rate devices like mice.
  * For this, we use a different predictor: We track the duration of the last 8
- * intervals and if the stand deviation of these 8 intervals is below a
- * threshold value, we use the average of these intervals as prediction.
- *
+ * intervals and use them to estimate the duration of the next one.
  */

 struct menu_device {
···
  */
 static unsigned int get_typical_interval(struct menu_device *data)
 {
-	int i, divisor;
-	unsigned int min, max, thresh, avg;
-	uint64_t sum, variance;
-
-	thresh = INT_MAX; /* Discard outliers above this value */
+	s64 value, min_thresh = -1, max_thresh = UINT_MAX;
+	unsigned int max, min, divisor;
+	u64 avg, variance, avg_sq;
+	int i;

 again:
-
-	/* First calculate the average of past intervals */
-	min = UINT_MAX;
+	/* Compute the average and variance of past intervals. */
 	max = 0;
-	sum = 0;
+	min = UINT_MAX;
+	avg = 0;
+	variance = 0;
 	divisor = 0;
 	for (i = 0; i < INTERVALS; i++) {
-		unsigned int value = data->intervals[i];
-		if (value <= thresh) {
-			sum += value;
-			divisor++;
-			if (value > max)
-				max = value;
+		value = data->intervals[i];
+		/*
+		 * Discard the samples outside the interval between the min and
+		 * max thresholds.
+		 */
+		if (value <= min_thresh || value >= max_thresh)
+			continue;

-			if (value < min)
-				min = value;
-		}
+		divisor++;
+
+		avg += value;
+		variance += value * value;
+
+		if (value > max)
+			max = value;
+
+		if (value < min)
+			min = value;
 	}

 	if (!max)
 		return UINT_MAX;

-	if (divisor == INTERVALS)
-		avg = sum >> INTERVAL_SHIFT;
-	else
-		avg = div_u64(sum, divisor);
-
-	/* Then try to determine variance */
-	variance = 0;
-	for (i = 0; i < INTERVALS; i++) {
-		unsigned int value = data->intervals[i];
-		if (value <= thresh) {
-			int64_t diff = (int64_t)value - avg;
-			variance += diff * diff;
-		}
-	}
-	if (divisor == INTERVALS)
+	if (divisor == INTERVALS) {
+		avg >>= INTERVAL_SHIFT;
 		variance >>= INTERVAL_SHIFT;
-	else
+	} else {
+		do_div(avg, divisor);
 		do_div(variance, divisor);
+	}
+
+	avg_sq = avg * avg;
+	variance -= avg_sq;

 	/*
 	 * The typical interval is obtained when standard deviation is
···
	 * Use this result only if there is no timer to wake us up sooner.
 	 */
 	if (likely(variance <= U64_MAX/36)) {
-		if ((((u64)avg*avg > variance*36) && (divisor * 4 >= INTERVALS * 3))
-							|| variance <= 400) {
+		if ((avg_sq > variance * 36 && divisor * 4 >= INTERVALS * 3) ||
+		    variance <= 400)
 			return avg;
-		}
 	}

 	/*
-	 * If we have outliers to the upside in our distribution, discard
-	 * those by setting the threshold to exclude these outliers, then
+	 * If there are outliers, discard them by setting thresholds to exclude
+	 * data points at a large enough distance from the average, then
 	 * calculate the average and standard deviation again. Once we get
-	 * down to the bottom 3/4 of our samples, stop excluding samples.
+	 * down to the last 3/4 of our samples, stop excluding samples.
 	 *
 	 * This can deal with workloads that have long pauses interspersed
 	 * with sporadic activity with a bunch of short pauses.
 	 */
-	if ((divisor * 4) <= INTERVALS * 3)
-		return UINT_MAX;
+	if (divisor * 4 <= INTERVALS * 3) {
+		/*
+		 * If there are sufficiently many data points still under
+		 * consideration after the outliers have been eliminated,
+		 * returning without a prediction would be a mistake because it
+		 * is likely that the next interval will not exceed the current
+		 * maximum, so return the latter in that case.
+		 */
+		if (divisor >= INTERVALS / 2)
+			return max;

-	thresh = max - 1;
+		return UINT_MAX;
+	}
+
+	/* Update the thresholds for the next round. */
+	if (avg - min > max - avg)
+		min_thresh = min;
+	else
+		max_thresh = max;
+
 	goto again;
 }
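One detail of the menu.c rewrite worth spelling out: the old code computed the variance with a second pass over the samples (sum of squared differences from the average), while the new code accumulates the sum of squares in the same loop and uses the identity Var(x) = E[x²] − E[x]². A minimal user-space demonstration of the equivalence (with samples chosen so that the integer divisions are exact; with truncating division the two forms can differ slightly, which the governor tolerates):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-pass form, as in the old code: mean first, then squared diffs. */
static uint64_t variance_two_pass(const uint64_t *v, size_t n)
{
	uint64_t sum = 0, var = 0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	uint64_t avg = sum / n;

	for (size_t i = 0; i < n; i++) {
		int64_t d = (int64_t)v[i] - (int64_t)avg;
		var += (uint64_t)(d * d);
	}
	return var / n;
}

/* One-pass form, as in the new code: E[x^2] - E[x]^2. */
static uint64_t variance_one_pass(const uint64_t *v, size_t n)
{
	uint64_t avg = 0, var = 0;

	for (size_t i = 0; i < n; i++) {
		avg += v[i];
		var += v[i] * v[i];
	}
	avg /= n;
	var /= n;
	return var - avg * avg;
}
```

For the classic sample set {2, 4, 4, 4, 5, 5, 7, 9} both forms yield a variance of 4 (mean 5).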
+27 -8
drivers/idle/intel_idle.c
···
 	 * Indicate which enable bits to clear here.
 	 */
 	unsigned long auto_demotion_disable_flags;
-	bool byt_auto_demotion_disable_flag;
 	bool disable_promotion_to_c1e;
 	bool use_acpi;
 };
···
 static const struct idle_cpu idle_cpu_byt __initconst = {
 	.state_table = byt_cstates,
 	.disable_promotion_to_c1e = true,
-	.byt_auto_demotion_disable_flag = true,
 };

 static const struct idle_cpu idle_cpu_cht __initconst = {
 	.state_table = cht_cstates,
 	.disable_promotion_to_c1e = true,
-	.byt_auto_demotion_disable_flag = true,
 };

 static const struct idle_cpu idle_cpu_ivb __initconst = {
···
 module_param_named(use_acpi, force_use_acpi, bool, 0444);
 MODULE_PARM_DESC(use_acpi, "Use ACPI _CST for building the idle states list");

+static bool no_native __read_mostly; /* No effect if no_acpi is set. */
+module_param_named(no_native, no_native, bool, 0444);
+MODULE_PARM_DESC(no_native, "Ignore cpu specific (native) idle states in lieu of ACPI idle states");
+
 static struct acpi_processor_power acpi_state_table __initdata;

 /**
···
 	}
 	return true;
 }
+
+static inline bool ignore_native(void)
+{
+	return no_native && !no_acpi;
+}
 #else /* !CONFIG_ACPI_PROCESSOR_CSTATE */
 #define force_use_acpi	(false)
···
 {
 	return false;
 }
+static inline bool ignore_native(void) { return false; }
 #endif /* !CONFIG_ACPI_PROCESSOR_CSTATE */

 /**
···
 	}
 }

+/**
+ * byt_cht_auto_demotion_disable - Disable Bay/Cherry Trail auto-demotion.
+ */
+static void __init byt_cht_auto_demotion_disable(void)
+{
+	wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
+	wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
+}
+
 static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 {
 	unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) &
···
 	case INTEL_ATOM_GRACEMONT:
 		adl_idle_state_table_update();
 		break;
+	case INTEL_ATOM_SILVERMONT:
+	case INTEL_ATOM_AIRMONT:
+		byt_cht_auto_demotion_disable();
+		break;
 	}

 	for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
···
 		state->flags |= CPUIDLE_FLAG_TIMER_STOP;

 		drv->state_count++;
-	}
-
-	if (icpu->byt_auto_demotion_disable_flag) {
-		wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
-		wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
 	}
 }
···
 	pr_debug("MWAIT substates: 0x%x\n", mwait_substates);

 	icpu = (const struct idle_cpu *)id->driver_data;
+	if (icpu && ignore_native()) {
+		pr_debug("ignoring native CPU idle states\n");
+		icpu = NULL;
+	}
 	if (icpu) {
 		if (icpu->state_table)
 			cpuidle_state_table = icpu->state_table;
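The interaction between the two ACPI-related flags in the intel_idle diff is easy to get backwards, so here it is as a tiny user-space truth table. This mirrors the new `ignore_native()` helper, but with the flags passed in explicitly instead of being read from module parameters (a standalone sketch, not driver code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * no_native switches the driver to ACPI-only operation, but it is
 * ignored when no_acpi is also set, since there would then be no ACPI
 * mode to fall back to.
 */
static bool ignore_native(bool no_native, bool no_acpi)
{
	return no_native && !no_acpi;
}
```

So native tables are skipped only in the single case where `no_native` is set and `no_acpi` is not.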