Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cpuidle: menu: Handle stopped tick more aggressively

Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case in which the target residencies of
deep idle states are above the tick boundary, which may cause the CPU
to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
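
The scenario above can be reproduced with a small user-space model. This is only a sketch, not the governor's code: the two-state table, the residency values, and TICK_USEC = 4000 (which assumes HZ=250) are hypothetical numbers chosen for illustration, and old_select() is a made-up name for the pre-patch behaviour with the tick stopped.

```c
#include <stddef.h>

/* Hypothetical value: tick period in microseconds, assuming HZ=250. */
#define TICK_USEC 4000u

struct idle_state {
	unsigned int target_residency;	/* microseconds */
};

/* Hypothetical platform: one shallow state and one deep state. */
static const struct idle_state states[] = {
	{ .target_residency = 100 },	/* shallow, far below the tick */
	{ .target_residency = 10000 },	/* deep, above the tick */
};
#define NUM_STATES (sizeof(states) / sizeof(states[0]))

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Pre-patch rule with the tick stopped: cap a short prediction at TICK_USEC. */
static unsigned int old_predicted_us(unsigned int predicted_us,
				     unsigned int delta_next_us)
{
	if (predicted_us < TICK_USEC)
		return min_uint(TICK_USEC, delta_next_us);
	return predicted_us;
}

/* Pick the deepest state whose target residency fits the prediction. */
static int old_select(unsigned int predicted_us, unsigned int delta_next_us)
{
	unsigned int us = old_predicted_us(predicted_us, delta_next_us);
	int idx = 0;

	for (size_t i = 0; i < NUM_STATES; i++) {
		if (states[i].target_residency > us)
			break;
		idx = (int)i;
	}
	return idx;
}
```

With a next timer event one second away, old_select(50, 1000000) works with an idle duration of TICK_USEC and returns index 0, the shallow state, although the deep state's target residency is far below the time that is actually available.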

To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already. Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.
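
A simplified model of the modified selection might look as follows. It is a sketch under stated assumptions: the state table and TICK_USEC = 4000 are hypothetical, new_select() is a made-up name, and the exit latency check, disabled states, and the performance multiplier used when the tick is running are all left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value: tick period in microseconds, assuming HZ=250. */
#define TICK_USEC 4000u

struct idle_state {
	unsigned int target_residency;	/* microseconds */
};

/* Hypothetical platform: one shallow state and one deep state. */
static const struct idle_state states[] = {
	{ .target_residency = 100 },
	{ .target_residency = 10000 },
};
#define NUM_STATES (sizeof(states) / sizeof(states[0]))

static int new_select(bool tick_stopped, unsigned int predicted_us,
		      unsigned int delta_next_us)
{
	int idx = -1;

	/* With the tick stopped, trust the next timer over a short prediction. */
	if (tick_stopped && predicted_us < TICK_USEC)
		predicted_us = delta_next_us;

	for (size_t i = 0; i < NUM_STATES; i++) {
		const struct idle_state *s = &states[i];

		if (idx == -1)
			idx = (int)i;	/* first state */

		if (s->target_residency > predicted_us) {
			if (!tick_stopped)
				break;

			/*
			 * Extended search: the state chosen so far is shallow
			 * and this one fits before the next timer event.
			 */
			if (states[idx].target_residency < TICK_USEC &&
			    s->target_residency <= delta_next_us)
				idx = (int)i;

			break;	/* the patch jumps to the "out" label here */
		}
		idx = (int)i;
	}
	return idx;
}
```

With the tick stopped, new_select(true, 5000, 20000) returns index 1 because the deep state fits before the next timer event, while new_select(false, 5000, 20000) stays at index 0 since the tick will wake the CPU anyway.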

In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.
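
The resulting tick decision can be written as a small predicate. This is a sketch only: should_stop_tick() is a hypothetical helper, TICK_USEC = 4000 assumes HZ=250, and it assumes the caller starts from "stop the tick" as the default, so the governor only ever has to clear that indication.

```c
#include <stdbool.h>

/* Hypothetical value: tick period in microseconds, assuming HZ=250. */
#define TICK_USEC 4000u

/*
 * Returns true if the governor should indicate that the tick is to be
 * stopped. A polling state or a short expected idle interval keeps the
 * tick running only if the tick has not been stopped already.
 */
static bool should_stop_tick(bool polling_state,
			     unsigned int expected_interval_us,
			     bool tick_stopped)
{
	if ((polling_state || expected_interval_us < TICK_USEC) && !tick_stopped)
		return false;

	return true;
}
```

For example, should_stop_tick(false, 100, false) is false, but the same short interval with the tick already stopped, should_stop_tick(false, 100, true), yields true.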

Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

+24 -12
drivers/cpuidle/governors/menu.c
···
 		 * If the tick is already stopped, the cost of possible short
 		 * idle duration misprediction is much higher, because the CPU
 		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it. In that case say we might mispredict and try
-		 * to force the CPU into a state for which we would have stopped
-		 * the tick, unless a timer is going to expire really soon
-		 * anyway.
+		 * result of it. In that case say we might mispredict and use
+		 * the known time till the closest timer event for the idle
+		 * state selection.
 		 */
 		if (data->predicted_us < TICK_USEC)
-			data->predicted_us = min_t(unsigned int, TICK_USEC,
-						   ktime_to_us(delta_next));
+			data->predicted_us = ktime_to_us(delta_next);
 	} else {
 		/*
 		 * Use the performance multiplier and the user-configurable
···
 			continue;
 		if (idx == -1)
 			idx = i; /* first enabled state */
-		if (s->target_residency > data->predicted_us)
-			break;
+		if (s->target_residency > data->predicted_us) {
+			if (!tick_nohz_tick_stopped())
+				break;
+
+			/*
+			 * If the state selected so far is shallow and this
+			 * state's target residency matches the time till the
+			 * closest timer event, select this one to avoid getting
+			 * stuck in the shallow one for too long.
+			 */
+			if (drv->states[idx].target_residency < TICK_USEC &&
+			    s->target_residency <= ktime_to_us(delta_next))
+				idx = i;
+
+			goto out;
+		}
 		if (s->exit_latency > latency_req) {
 			/*
 			 * If we break out of the loop for latency reasons, use
···
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	     expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
 
-		if (!tick_nohz_tick_stopped() && idx > 0 &&
-		    drv->states[idx].target_residency > delta_next_us) {
+		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
 			/*
 			 * The tick is not going to be stopped and the target
 			 * residency of the state to be returned is not within
···
 		}
 	}
 
+out:
 	data->last_state_idx = idx;
 
 	return data->last_state_idx;