···
 and variance of them. If the variance is small (smaller than 400 square
 milliseconds) or it is small relative to the average (the average is greater
 that 6 times the standard deviation), the average is regarded as the "typical
-interval" value. Otherwise, the longest of the saved observed idle duration
+interval" value. Otherwise, either the longest or the shortest (depending on
+which one is farther from the average) of the saved observed idle duration
 values is discarded and the computation is repeated for the remaining ones.
+
 Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
-interval" is determined or too many data points are disregarded, in which case
-the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value).
+interval" is determined or too many data points are disregarded. In the latter
+case, if the size of the set of data points still under consideration is
+sufficiently large, the next idle duration is not likely to be above the largest
+idle duration value still in that set, so that value is taken as the predicted
+next idle duration. Finally, if the set of data points still under
+consideration is too small, no prediction is made.

-If the "typical interval" computed this way is long enough, the governor obtains
-the time until the closest timer event with the assumption that the scheduler
-tick will be stopped. That time, referred to as the *sleep length* in what follows,
-is the upper bound on the time before the next CPU wakeup. It is used to determine
-the sleep length range, which in turn is needed to get the sleep length correction
-factor.
+If the preliminary prediction of the next idle duration computed this way is
+long enough, the governor obtains the time until the closest timer event with
+the assumption that the scheduler tick will be stopped. That time, referred to
+as the *sleep length* in what follows, is the upper bound on the time before the
+next CPU wakeup. It is used to determine the sleep length range, which in turn
+is needed to get the sleep length correction factor.

 The ``menu`` governor maintains an array containing several correction factor
 values that correspond to different sleep length ranges organized so that each
···
 The sleep length is multiplied by the correction factor for the range that it
 falls into to obtain an approximation of the predicted idle duration that is
 compared to the "typical interval" determined previously and the minimum of
-the two is taken as the idle duration prediction.
+the two is taken as the final idle duration prediction.

 If the "typical interval" value is small, which means that the CPU is likely
 to be woken up soon enough, the sleep length computation is skipped as it may
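
The procedure described in the documentation text above can be sketched in user space roughly as follows. This is a minimal illustration and not the kernel function: `typical_interval()` and `NO_PREDICTION` are hypothetical names, the thresholds (400, the factor of 36, the 3/4 and 1/2 limits) are taken from the `menu.c` hunk in this change, and the sketch assumes the samples are small enough that the sum of their squares fits in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTERVALS 8                 /* saved idle durations, as in menu.c */
#define NO_PREDICTION UINT32_MAX    /* hypothetical name for "no prediction" */

/*
 * Hypothetical user-space rendering of the documented procedure.
 * Assumes the sum of squared samples fits in 64 bits.
 */
static uint32_t typical_interval(const uint32_t *samples)
{
	int64_t min_thresh = -1, max_thresh = UINT32_MAX;

	assert(samples != NULL);

	for (;;) {
		uint64_t sum = 0, sum_sq = 0, avg, avg_sq, variance;
		uint32_t min = UINT32_MAX, max = 0;
		unsigned int n = 0;

		for (int i = 0; i < INTERVALS; i++) {
			uint64_t v = samples[i];

			/* Skip samples discarded as outliers in earlier rounds. */
			if ((int64_t)v <= min_thresh || (int64_t)v >= max_thresh)
				continue;

			n++;
			sum += v;
			sum_sq += v * v;
			if (v > max)
				max = (uint32_t)v;
			if (v < min)
				min = (uint32_t)v;
		}

		if (!max)
			return NO_PREDICTION;

		/* Single-pass variance: mean of squares minus squared mean. */
		avg = sum / n;
		avg_sq = avg * avg;
		variance = sum_sq / n - avg_sq;

		/* Small variance, or average above 6 standard deviations. */
		if (variance <= UINT64_MAX / 36 &&
		    ((avg_sq > variance * 36 && n * 4 >= INTERVALS * 3) ||
		     variance <= 400))
			return (uint32_t)avg;

		/* Too many points discarded: fall back to max or give up. */
		if (n * 4 <= INTERVALS * 3)
			return n >= INTERVALS / 2 ? max : NO_PREDICTION;

		/* Discard whichever extreme lies farther from the average. */
		if (avg - min > max - avg)
			min_thresh = min;
		else
			max_thresh = max;
	}
}
```

With seven samples of 100 and one of 10000, the first round rejects the large sample and the second round returns 100. With `{1, 1, 1, 1, 1, 1000, 5000, 5000}` the two largest samples are dropped, six points remain with a large variance, and the largest remaining value (1000) is returned instead of "no prediction".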
+13-5
Documentation/admin-guide/pm/intel_idle.rst
···
 Documentation/admin-guide/pm/cpuidle.rst).
 Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.

-The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
-if the kernel has been configured with ACPI support) can be set to make the
-driver ignore the system's ACPI tables entirely or use them for all of the
-recognized processor models, respectively (they both are unset by default and
-``use_acpi`` has no effect if ``no_acpi`` is set).
+The ``no_acpi``, ``use_acpi`` and ``no_native`` module parameters are
+recognized by ``intel_idle`` if the kernel has been configured with ACPI
+support. In the case that ACPI is not configured these flags have no impact
+on functionality.
+
+``no_acpi`` - Do not use ACPI at all. Only native mode is available, no
+ACPI mode.
+
+``use_acpi`` - No-op in ACPI mode, the driver will consult ACPI tables for
+C-states on/off status in native mode.
+
+``no_native`` - Work only in ACPI mode, no native mode available (ignore
+all custom tables).

 The value of the ``states_off`` module parameter (0 by default) represents a
 list of idle states to be disabled by default in the form of a bitmask.
+5-3
MAINTAINERS
···
 F: drivers/crypto/intel/iaa/*

 INTEL IDLE DRIVER
-M: Jacob Pan <jacob.jun.pan@linux.intel.com>
-M: Len Brown <lenb@kernel.org>
+M: Rafael J. Wysocki <rafael@kernel.org>
+M: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
+M: Artem Bityutskiy <dedekind1@gmail.com>
+R: Len Brown <lenb@kernel.org>
 L: linux-pm@vger.kernel.org
 S: Supported
 B: https://bugzilla.kernel.org
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 F: drivers/idle/intel_idle.c

 INTEL IDXD DRIVER
+67-62
drivers/cpuidle/governors/menu.c
···
  * the C state is required to actually break even on this cost. CPUIDLE
  * provides us this duration in the "target_residency" field. So all that we
  * need is a good prediction of how long we'll be idle. Like the traditional
- * menu governor, we start with the actual known "next timer event" time.
+ * menu governor, we take the actual known "next timer event" time.
  *
  * Since there are other source of wakeups (interrupts for example) than
  * the next timer event, this estimation is rather optimistic. To get a
···
  * duration always was 50% of the next timer tick, the correction factor will
  * be 0.5.
  *
- * menu uses a running average for this correction factor, however it uses a
- * set of factors, not just a single factor. This stems from the realization
- * that the ratio is dependent on the order of magnitude of the expected
- * duration; if we expect 500 milliseconds of idle time the likelihood of
- * getting an interrupt very early is much higher than if we expect 50 micro
- * seconds of idle time. A second independent factor that has big impact on
- * the actual factor is if there is (disk) IO outstanding or not.
- * (as a special twist, we consider every sleep longer than 50 milliseconds
- * as perfect; there are no power gains for sleeping longer than this)
- *
- * For these two reasons we keep an array of 12 independent factors, that gets
- * indexed based on the magnitude of the expected duration as well as the
- * "is IO outstanding" property.
+ * menu uses a running average for this correction factor, but it uses a set of
+ * factors, not just a single factor. This stems from the realization that the
+ * ratio is dependent on the order of magnitude of the expected duration; if we
+ * expect 500 milliseconds of idle time the likelihood of getting an interrupt
+ * very early is much higher than if we expect 50 micro seconds of idle time.
+ * For this reason, menu keeps an array of 6 independent factors, that gets
+ * indexed based on the magnitude of the expected duration.
  *
  * Repeatable-interval-detector
  * ----------------------------
  * There are some cases where "next timer" is a completely unusable predictor:
  * Those cases where the interval is fixed, for example due to hardware
- * interrupt mitigation, but also due to fixed transfer rate devices such as
- * mice.
+ * interrupt mitigation, but also due to fixed transfer rate devices like mice.
  * For this, we use a different predictor: We track the duration of the last 8
- * intervals and if the stand deviation of these 8 intervals is below a
- * threshold value, we use the average of these intervals as prediction.
- *
+ * intervals and use them to estimate the duration of the next one.
  */

 struct menu_device {
···
  */
 static unsigned int get_typical_interval(struct menu_device *data)
 {
-	int i, divisor;
-	unsigned int min, max, thresh, avg;
-	uint64_t sum, variance;
-
-	thresh = INT_MAX; /* Discard outliers above this value */
+	s64 value, min_thresh = -1, max_thresh = UINT_MAX;
+	unsigned int max, min, divisor;
+	u64 avg, variance, avg_sq;
+	int i;

 again:
-
-	/* First calculate the average of past intervals */
-	min = UINT_MAX;
+	/* Compute the average and variance of past intervals. */
 	max = 0;
-	sum = 0;
+	min = UINT_MAX;
+	avg = 0;
+	variance = 0;
 	divisor = 0;
 	for (i = 0; i < INTERVALS; i++) {
-		unsigned int value = data->intervals[i];
-		if (value <= thresh) {
-			sum += value;
-			divisor++;
-			if (value > max)
-				max = value;
+		value = data->intervals[i];
+		/*
+		 * Discard the samples outside the interval between the min and
+		 * max thresholds.
+		 */
+		if (value <= min_thresh || value >= max_thresh)
+			continue;

-			if (value < min)
-				min = value;
-		}
+		divisor++;
+
+		avg += value;
+		variance += value * value;
+
+		if (value > max)
+			max = value;
+
+		if (value < min)
+			min = value;
 	}

 	if (!max)
 		return UINT_MAX;

-	if (divisor == INTERVALS)
-		avg = sum >> INTERVAL_SHIFT;
-	else
-		avg = div_u64(sum, divisor);
-
-	/* Then try to determine variance */
-	variance = 0;
-	for (i = 0; i < INTERVALS; i++) {
-		unsigned int value = data->intervals[i];
-		if (value <= thresh) {
-			int64_t diff = (int64_t)value - avg;
-			variance += diff * diff;
-		}
-	}
-	if (divisor == INTERVALS)
+	if (divisor == INTERVALS) {
+		avg >>= INTERVAL_SHIFT;
 		variance >>= INTERVAL_SHIFT;
-	else
+	} else {
+		do_div(avg, divisor);
 		do_div(variance, divisor);
+	}
+
+	avg_sq = avg * avg;
+	variance -= avg_sq;

 	/*
 	 * The typical interval is obtained when standard deviation is
···
 	 * Use this result only if there is no timer to wake us up sooner.
 	 */
 	if (likely(variance <= U64_MAX/36)) {
-		if ((((u64)avg*avg > variance*36) && (divisor * 4 >= INTERVALS * 3))
-							|| variance <= 400) {
+		if ((avg_sq > variance * 36 && divisor * 4 >= INTERVALS * 3) ||
+		    variance <= 400)
 			return avg;
-		}
 	}

 	/*
-	 * If we have outliers to the upside in our distribution, discard
-	 * those by setting the threshold to exclude these outliers, then
+	 * If there are outliers, discard them by setting thresholds to exclude
+	 * data points at a large enough distance from the average, then
 	 * calculate the average and standard deviation again. Once we get
-	 * down to the bottom 3/4 of our samples, stop excluding samples.
+	 * down to the last 3/4 of our samples, stop excluding samples.
 	 *
 	 * This can deal with workloads that have long pauses interspersed
 	 * with sporadic activity with a bunch of short pauses.
 	 */
-	if ((divisor * 4) <= INTERVALS * 3)
-		return UINT_MAX;
+	if (divisor * 4 <= INTERVALS * 3) {
+		/*
+		 * If there are sufficiently many data points still under
+		 * consideration after the outliers have been eliminated,
+		 * returning without a prediction would be a mistake because it
+		 * is likely that the next interval will not exceed the current
+		 * maximum, so return the latter in that case.
+		 */
+		if (divisor >= INTERVALS / 2)
+			return max;

-	thresh = max - 1;
+		return UINT_MAX;
+	}
+
+	/* Update the thresholds for the next round. */
+	if (avg - min > max - avg)
+		min_thresh = min;
+	else
+		max_thresh = max;
+
 	goto again;
 }

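
The new code in this hunk accumulates the sum and the sum of squares in one loop and subtracts the squared average afterwards, where the old code made a second pass over the samples. A short derivation of why the two forms agree, as a sketch that ignores the integer truncation of the average (an assumption made here, not a statement from the patch), with $n$ the number of samples still under consideration and $x_i$ the saved intervals:

```latex
\begin{align*}
\sigma^2
  &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
   = \frac{1}{n}\sum_{i=1}^{n} x_i^2
     - 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i + \bar{x}^2
   = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2, \\
\bar{x} > 6\sigma
  &\iff \bar{x}^2 > 36\,\sigma^2
  \qquad (\bar{x} \ge 0,\ \sigma \ge 0).
\end{align*}
```

The first line corresponds to `variance -= avg_sq` after both accumulators have been divided by `divisor`. The second line is the comparison `avg_sq > variance * 36`, which tests "average greater than 6 standard deviations" without computing a square root.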
+27-8
drivers/idle/intel_idle.c
···
 	 * Indicate which enable bits to clear here.
 	 */
 	unsigned long auto_demotion_disable_flags;
-	bool byt_auto_demotion_disable_flag;
 	bool disable_promotion_to_c1e;
 	bool use_acpi;
 };
···
 static const struct idle_cpu idle_cpu_byt __initconst = {
 	.state_table = byt_cstates,
 	.disable_promotion_to_c1e = true,
-	.byt_auto_demotion_disable_flag = true,
 };

 static const struct idle_cpu idle_cpu_cht __initconst = {
 	.state_table = cht_cstates,
 	.disable_promotion_to_c1e = true,
-	.byt_auto_demotion_disable_flag = true,
 };

 static const struct idle_cpu idle_cpu_ivb __initconst = {
···
 module_param_named(use_acpi, force_use_acpi, bool, 0444);
 MODULE_PARM_DESC(use_acpi, "Use ACPI _CST for building the idle states list");

+static bool no_native __read_mostly; /* No effect if no_acpi is set. */
+module_param_named(no_native, no_native, bool, 0444);
+MODULE_PARM_DESC(no_native, "Ignore cpu specific (native) idle states in lieu of ACPI idle states");
+
 static struct acpi_processor_power acpi_state_table __initdata;

 /**
···
 	}
 	return true;
 }
+
+static inline bool ignore_native(void)
+{
+	return no_native && !no_acpi;
+}
 #else /* !CONFIG_ACPI_PROCESSOR_CSTATE */
 #define force_use_acpi	(false)

···
 {
 	return false;
 }
+static inline bool ignore_native(void) { return false; }
 #endif /* !CONFIG_ACPI_PROCESSOR_CSTATE */

 /**
···
 	}
 }

+/**
+ * byt_cht_auto_demotion_disable - Disable Bay/Cherry Trail auto-demotion.
+ */
+static void __init byt_cht_auto_demotion_disable(void)
+{
+	wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
+	wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
+}
+
 static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 {
 	unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) &
···
 	case INTEL_ATOM_GRACEMONT:
 		adl_idle_state_table_update();
 		break;
+	case INTEL_ATOM_SILVERMONT:
+	case INTEL_ATOM_AIRMONT:
+		byt_cht_auto_demotion_disable();
+		break;
 	}

 	for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
···
 			state->flags |= CPUIDLE_FLAG_TIMER_STOP;

 		drv->state_count++;
-	}
-
-	if (icpu->byt_auto_demotion_disable_flag) {
-		wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
-		wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
 	}
 }

···
 	pr_debug("MWAIT substates: 0x%x\n", mwait_substates);

 	icpu = (const struct idle_cpu *)id->driver_data;
+	if (icpu && ignore_native()) {
+		pr_debug("ignoring native CPU idle states\n");
+		icpu = NULL;
+	}
 	if (icpu) {
 		if (icpu->state_table)
 			cpuidle_state_table = icpu->state_table;
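
The way `no_native` interacts with `no_acpi` in the hunks above can be illustrated with a small user-space sketch. `struct idle_params` and `params_ignore_native()` are hypothetical names introduced only for this illustration (the driver keeps the parameters in separate static variables); the boolean expression itself is the one from `ignore_native()` in the ACPI-enabled build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the module parameters discussed above. */
struct idle_params {
	bool no_acpi;
	bool use_acpi;
	bool no_native;
};

/*
 * Same expression as ignore_native() in the patch: the native (per-CPU
 * model) tables are ignored only when no_native is set and ACPI has not
 * been disabled with no_acpi.
 */
static bool params_ignore_native(const struct idle_params *p)
{
	assert(p != NULL);
	return p->no_native && !p->no_acpi;
}
```

Under this reading, setting both `no_acpi` and `no_native` leaves the native tables in use, which matches the "No effect if no_acpi is set" comment next to the `no_native` declaration.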