Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Honor state disabling in the cpuidle ladder governor

There are two cpuidle governors, ladder and menu. The ladder governor
is always available when CONFIG_CPU_IDLE is selected, whereas the
menu governor additionally requires CONFIG_NO_HZ.

A particular C state can be disabled by writing to the sysfs file
/sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
is only implemented in the menu governor. Thus, in a system where
CONFIG_NO_HZ is not selected, the ladder governor becomes the default
and always walks through all sleep states, irrespective of whether a
C state was disabled via sysfs. The only way to select a specific
C state was to write the related latency to /dev/cpu_dma_latency and
keep the file open for as long as this setting was required, which is
not very practical and not suitable for configuring a single core in an
SMP system.
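
The two user-space interfaces mentioned above can be sketched as follows. This is a minimal illustration, not part of the patch: the helper names (cpuidle_disable_path, cpuidle_set_disable, hold_cpu_dma_latency) are hypothetical, and the code assumes the usual behavior that a binary 32-bit value written to /dev/cpu_dma_latency stays in effect only while the descriptor is open.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Build the sysfs path of the "disable" attribute for one CPU and one
 * idle state. Returns the snprintf() result; a value >= len means the
 * path did not fit into buf. */
int cpuidle_disable_path(char *buf, size_t len, unsigned cpu, unsigned state)
{
	return snprintf(buf, len,
			"/sys/devices/system/cpu/cpu%u/cpuidle/state%u/disable",
			cpu, state);
}

/* Write "1" (disable) or "0" (enable) to the attribute.
 * Returns 0 on success, -1 on any error (usually requires root). */
int cpuidle_set_disable(unsigned cpu, unsigned state, int disable)
{
	char path[128];
	int n = cpuidle_disable_path(path, sizeof(path), cpu, state);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ssize_t written = write(fd, disable ? "1" : "0", 1);
	close(fd);
	return written == 1 ? 0 : -1;
}

/* The older workaround: request a maximum wakeup latency. The request
 * applies to all CPUs and is dropped as soon as the returned descriptor
 * is closed, so the caller has to keep it open. Returns the descriptor
 * or -1 on error. */
int hold_cpu_dma_latency(int32_t max_latency_us)
{
	int fd = open("/dev/cpu_dma_latency", O_WRONLY);
	if (fd < 0)
		return -1;

	if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
	    (ssize_t)sizeof(max_latency_us)) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The contrast is the point: the sysfs attribute is per CPU and per state and persists after the writer exits, while the latency request is global and tied to the lifetime of an open file.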

With this patch, the ladder governor promotes to the next C state only
if that state has not been disabled, and it demotes if the current
C state has been disabled.
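
The decision described above can be modeled in user space roughly as follows. This is a simplified sketch, not the kernel code: struct idle_state, ladder_step and FIRST_STATE are hypothetical names, and the promotion/demotion counters and the residency-based demotion of the real governor are left out on purpose.

```c
#include <stdbool.h>

#define FIRST_STATE 0	/* stand-in for CPUIDLE_DRIVER_STATE_START */

struct idle_state {
	unsigned exit_latency_us;	/* cost of waking up from this state */
	unsigned promotion_time_us;	/* residency needed before going deeper */
	bool disabled;			/* mirrors the sysfs "disable" flag */
};

/* One selection step: given the state used last time, how long the CPU
 * stayed in it, and the current latency limit, return the index of the
 * state to enter next. */
int ladder_step(const struct idle_state *s, int count, int last_idx,
		unsigned last_residency_us, unsigned latency_req_us)
{
	/* consider promotion: the next deeper state must not be disabled */
	if (last_idx < count - 1 &&
	    !s[last_idx + 1].disabled &&
	    last_residency_us > s[last_idx].promotion_time_us &&
	    s[last_idx + 1].exit_latency_us <= latency_req_us)
		return last_idx + 1;

	/* consider demotion: leave the current state if it was disabled
	 * or if its exit latency exceeds the limit */
	if (last_idx > FIRST_STATE &&
	    (s[last_idx].disabled ||
	     s[last_idx].exit_latency_us > latency_req_us)) {
		int i;

		for (i = last_idx - 1; i > FIRST_STATE; i--) {
			if (s[i].exit_latency_us <= latency_req_us)
				break;
		}
		return i;
	}

	return last_idx;
}
```

In this model the first state is never left by demotion, so its disable flag has no influence; only the flags of deeper states change the outcome.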

Note that the patch does not make the setting of the sysfs variable
"disable" coherent: if a light state is disabled, then all deeper
states are effectively disabled as well, but their "disable" variables
do not reflect it. Likewise, enabling a deep state has no effect while
a lighter state is still disabled. A related section has been added to
the documentation.
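
The incoherence follows from the ladder climbing one state at a time. A small sketch, under the assumption that the governor starts at index first and only ever promotes into the next deeper state (deepest_reachable is a hypothetical helper, not a kernel function):

```c
#include <stdbool.h>

/* Return the deepest state index that a strictly step-by-step governor
 * can reach when it starts at "first" and refuses to promote into a
 * disabled state. disabled[i] mirrors the sysfs flag of state i. */
int deepest_reachable(const bool *disabled, int count, int first)
{
	int idx = first;

	/* Stop in front of the first disabled state; everything deeper
	 * is unreachable regardless of its own flag. */
	while (idx + 1 < count && !disabled[idx + 1])
		idx++;

	return idx;
}
```

With four states and only state 1 disabled, the result is 0 even though states 2 and 3 still read as enabled in sysfs, which is the situation the note describes.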

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Authored by Carsten Emde and committed by Rafael J. Wysocki
62d6ae88 4cbe5a55

+12 -2
+9 -1
Documentation/cpuidle/sysfs.txt
···
 
 
 * desc : Small description about the idle state (string)
-* disable : Option to disable this idle state (bool)
+* disable : Option to disable this idle state (bool) -> see note below
 * latency : Latency to exit out of this idle state (in microseconds)
 * name : Name of the idle state (string)
 * power : Power consumed while in this idle state (in milliwatts)
 * time : Total time spent in this idle state (in microseconds)
 * usage : Number of times this state was entered (count)
+
+Note:
+The behavior and the effect of the disable variable depends on the
+implementation of a particular governor. In the ladder governor, for
+example, it is not coherent, i.e. if one is disabling a light state,
+then all deeper states are disabled as well, but the disable variable
+does not reflect it. Likewise, if one enables a deep state but a lighter
+state still is disabled, then this has no effect.
+3 -1
drivers/cpuidle/governors/ladder.c
···
 
 	/* consider promotion */
 	if (last_idx < drv->state_count - 1 &&
+	    !dev->states_usage[last_idx + 1].disable &&
 	    last_residency > last_state->threshold.promotion_time &&
 	    drv->states[last_idx + 1].exit_latency <= latency_req) {
 		last_state->stats.promotion_count++;
···
 
 	/* consider demotion */
 	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
-	    drv->states[last_idx].exit_latency > latency_req) {
+	    (dev->states_usage[last_idx].disable ||
+	    drv->states[last_idx].exit_latency > latency_req)) {
 		int i;
 
 		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {