Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

intel_idle: Introduce 'states_off' module parameter

In certain system configurations it may not be desirable to use some
C-states assumed to be available by intel_idle and the driver needs
to be prevented from using them even before the cpuidle sysfs
interface becomes accessible to user space. Currently, the only way
to achieve that is by setting the 'max_cstate' module parameter to a
value lower than the index of the shallowest of the C-states in
question, but that may be overly intrusive, because it effectively
makes all of the idle states deeper than the 'max_cstate' one go
away (and the C-state to avoid may be in the middle of the range
normally regarded as available).

To allow that limitation to be overcome, introduce a new module
parameter called 'states_off' to represent a list of idle states to
be disabled by default in the form of a bitmask and update the
documentation to cover it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2 files changed, 38 insertions(+), 4 deletions(-)

Documentation/admin-guide/pm/intel_idle.rst (+18 -1)
···
 ``MWAIT`` instruction is not allowed to be used, so the initialization of
 ``intel_idle`` will fail.
 
-Apart from that there are three module parameters recognized by ``intel_idle``
+Apart from that there are four module parameters recognized by ``intel_idle``
 itself that can be set via the kernel command line (they cannot be updated via
 sysfs, so that is the only way to change their values).
 
···
 driver ignore the system's ACPI tables entirely or use them for all of the
 recognized processor models, respectively (they both are unset by default and
 ``use_acpi`` has no effect if ``no_acpi`` is set).
+
+The value of the ``states_off`` module parameter (0 by default) represents a
+list of idle states to be disabled by default in the form of a bitmask.
+
+Namely, the positions of the bits that are set in the ``states_off`` value are
+the indices of idle states to be disabled by default (as reflected by the names
+of the corresponding idle state directories in ``sysfs``, :file:`state0`,
+:file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the index of the given
+idle state; see :ref:`idle-states-representation` in :doc:`cpuidle`).
+
+For example, if ``states_off`` is equal to 3, the driver will disable idle
+states 0 and 1 by default, and if it is equal to 8, idle state 3 will be
+disabled by default and so on (bit positions beyond the maximum idle state index
+are ignored).
+
+The idle states disabled this way can be enabled (on a per-CPU basis) from user
+space via ``sysfs``.
 
 
 .. _intel-idle-core-and-package-idle-states:
drivers/idle/intel_idle.c (+20 -3)
···
 };
 /* intel_idle.max_cstate=0 disables driver */
 static int max_cstate = CPUIDLE_STATE_MAX - 1;
+static unsigned int disabled_states_mask;
 
 static unsigned int mwait_substates;
 
···
 		if (cx->type > ACPI_STATE_C2)
 			state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
 
+		if (disabled_states_mask & BIT(cstate))
+			state->flags |= CPUIDLE_FLAG_OFF;
+
 		state->enter = intel_idle;
 		state->enter_s2idle = intel_idle_s2idle;
 	}
···
 		/* Structure copy. */
 		drv->states[drv->state_count] = cpuidle_state_table[cstate];
 
-		if ((icpu->use_acpi || force_use_acpi) &&
-		    intel_idle_off_by_default(mwait_hint) &&
-		    !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE))
+		if ((disabled_states_mask & BIT(drv->state_count)) ||
+		    ((icpu->use_acpi || force_use_acpi) &&
+		     intel_idle_off_by_default(mwait_hint) &&
+		     !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE)))
 			drv->states[drv->state_count].flags |= CPUIDLE_FLAG_OFF;
 
 		drv->state_count++;
···
 static void __init intel_idle_cpuidle_driver_init(struct cpuidle_driver *drv)
 {
 	cpuidle_poll_state_init(drv);
+
+	if (disabled_states_mask & BIT(0))
+		drv->states[0].flags |= CPUIDLE_FLAG_OFF;
+
 	drv->state_count = 1;
 
 	if (icpu)
···
  * is the easiest way (currently) to continue doing that.
  */
 module_param(max_cstate, int, 0444);
+/*
+ * The positions of the bits that are set in this number are the indices of the
+ * idle states to be disabled by default (as reflected by the names of the
+ * corresponding idle state directories in sysfs, "state0", "state1" ...
+ * "state<i>" ..., where <i> is the index of the given state).
+ */
+module_param_named(states_off, disabled_states_mask, uint, 0444);
+MODULE_PARM_DESC(states_off, "Mask of disabled idle states");
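
Note: the bitmask semantics described in the documentation hunk above can be sketched in user-space C as follows. This is only an illustration, not code from the patch: the helper name states_off_indices() and its state_count argument are hypothetical, and the driver itself simply tests disabled_states_mask & BIT(index) while building its state table.

```c
#include <limits.h>

/*
 * Hypothetical helper (not part of the patch): list the idle state
 * indices that a given states_off value would disable by default.
 *
 * mask        - the states_off value, e.g. 3 or 8
 * state_count - number of idle states assumed to exist on this system
 * out         - caller-provided array with room for state_count entries
 *
 * Bits at positions >= state_count are ignored, mirroring the
 * documented behaviour. Returns the number of indices stored in out.
 */
static int states_off_indices(unsigned int mask, int state_count, int *out)
{
	const int mask_bits = (int)(sizeof(mask) * CHAR_BIT);
	int n = 0;

	for (int i = 0; i < state_count && i < mask_bits; i++) {
		if (mask & (1U << i))
			out[n++] = i;
	}

	return n;
}
```

With four idle states assumed, a mask of 3 yields indices 0 and 1, a mask of 8 yields index 3, and a mask of 0x30 yields nothing because both set bits lie beyond the last state index.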