Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
fork

Configure Feed

Select the types of activity you want to include in your feed.

timer/migration: Fix quick check reporting late expiry

When a CPU is the last active in the hierarchy and it tries to enter
into idle, the quick check looking up the next event towards cpuidle
heuristics may report a too late expiry, such as in the following
scenario:

[GRP1:0]
migrator = NONE
active = NONE
nextevt = T0:0, T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = NONE migrator = NONE
active = NONE active = NONE
nextevt = T0, T1 nextevt = T2
/ \ / \
0 1 2 3
idle idle idle idle

0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has
a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The
expire order is T0 < T1 < T2.

[GRP1:0]
migrator = GRP0:0
active = GRP0:0
nextevt = T0:0(i), T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = CPU0 migrator = NONE
active = CPU0 active = NONE
nextevt = T0(i), T1 nextevt = T2
/ \ / \
0 1 2 3
active idle idle idle

1) CPU 0 becomes active. The (i) means a now ignored timer.

[GRP1:0]
migrator = GRP0:0
active = GRP0:0
nextevt = T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = CPU0 migrator = NONE
active = CPU0 active = NONE
nextevt = T1 nextevt = T2
/ \ / \
0 1 2 3
active idle idle idle

2) CPU 0 handles remote. No timer actually expired but ignored timers
have been cleaned out and their sibling's timers haven't been
propagated. As a result the top level's next event is T2 and not T1.

3) CPU 0 tries to enter idle without any global timer enqueued and calls
tmigr_quick_check(). The expiry of T2 is returned instead of the
expiry of T1.

When the quick check returns an expiry that is too late, the cpuidle
governor may pick up a C-state that is too deep. This may be result into
undesired CPU wake up latency if the next timer is actually close enough.

Fix this with assuming that expiries aren't sorted top-down while
performing the quick check. Pick up instead the earliest encountered one
while walking up the hierarchy.

7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org

authored by

Frederic Weisbecker and committed by
Thomas Gleixner
8ca18367 a184d983

+15 -9
+15 -9
kernel/time/timer_migration.c
··· 1385 1385 * single group active on the way to top level) 1386 1386 * * nextevt - when CPU is offline and has to handle timer on his own 1387 1387 * or when on the way to top in every group only a single 1388 - * child is active and but @nextevt is before next_expiry 1389 - * of top level group 1390 - * * next_expiry (top) - value of top level group, when on the way to top in 1391 - * every group only a single child is active and @nextevt 1392 - * is after this value active child. 1388 + * child is active but @nextevt is before the lowest 1389 + * next_expiry encountered while walking up to top level. 1390 + * * next_expiry - value of lowest expiry encountered while walking groups 1391 + * if only a single child is active on each and @nextevt 1392 + * is after this lowest expiry. 1393 1393 */ 1394 1394 u64 tmigr_quick_check(u64 nextevt) 1395 1395 { ··· 1408 1408 do { 1409 1409 if (!tmigr_check_lonely(group)) { 1410 1410 return KTIME_MAX; 1411 - } else if (!group->parent) { 1412 - u64 first_global = READ_ONCE(group->next_expiry); 1413 - 1414 - return min_t(u64, nextevt, first_global); 1411 + } else { 1412 + /* 1413 + * Since current CPU is active, events may not be sorted 1414 + * from bottom to the top because the CPU's event is ignored 1415 + * up to the top and its sibling's events not propagated upwards. 1416 + * Thus keep track of the lowest observed expiry. 1417 + */ 1418 + nextevt = min_t(u64, nextevt, READ_ONCE(group->next_expiry)); 1419 + if (!group->parent) 1420 + return nextevt; 1415 1421 } 1416 1422 group = group->parent; 1417 1423 } while (group);