Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/core: Fix RQCF_ACT_SKIP leak

Igor Raits and Bagas Sanjaya reported an RQCF_ACT_SKIP leak warning.

This warning may be triggered in the following situation:

    CPU0                                      CPU1

__schedule()
  *rq->clock_update_flags <<= 1;*     unregister_fair_sched_group()
  pick_next_task_fair+0x4a/0x410        destroy_cfs_bandwidth()
    newidle_balance+0x115/0x3e0           for_each_possible_cpu(i) *i=0*
      rq_unpin_lock(this_rq, rf)            __cfsb_csd_unthrottle()
      raw_spin_rq_unlock(this_rq)
                                              rq_lock(*CPU0_rq*, &rf)
                                              rq_clock_start_loop_update()
                                              rq->clock_update_flags & RQCF_ACT_SKIP <--
      raw_spin_rq_lock(this_rq)

The purpose of RQCF_ACT_SKIP is to skip the rq clock update.
That update happens very early in __schedule(), but RQCF_*_SKIP
is cleared very late, so the flag stays set across the gap shown
above and triggers this warning.
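
The gap can be replayed in user space. The sketch below is a simplified
model, not kernel code: struct rq_model and all helper names are
hypothetical, the flag values mirror those in kernel/sched/sched.h, and
the starting state (RQCF_REQ_SKIP already requested on CPU0's rq) is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01u
#define RQCF_ACT_SKIP 0x02u
#define RQCF_UPDATED  0x04u

/* Hypothetical stand-in for the single struct rq field that matters here. */
struct rq_model {
	unsigned int clock_update_flags;
};

/* Pre-fix entry of __schedule(): promote REQ to ACT and leave it set. */
static inline void schedule_entry_before_fix(struct rq_model *rq)
{
	rq->clock_update_flags <<= 1;
}

/* Models the check CPU1 reaches via rq_clock_start_loop_update(). */
static inline bool loop_update_would_warn(const struct rq_model *rq)
{
	return (rq->clock_update_flags & RQCF_ACT_SKIP) != 0;
}

/* Replays the diagram: CPU0 enters __schedule(), drops the lock, CPU1 checks. */
static inline bool replay_race_before_fix(void)
{
	/* Assumption: a skip was requested before __schedule() ran. */
	struct rq_model cpu0_rq = { .clock_update_flags = RQCF_REQ_SKIP };

	schedule_entry_before_fix(&cpu0_rq);     /* CPU0: flags <<= 1 */
	/* CPU0: newidle_balance() unlocks this_rq at this point. */
	return loop_update_would_warn(&cpu0_rq); /* CPU1: holds CPU0's rq lock */
}
```

Calling assert(replay_race_before_fix()) from a test program passes,
because the promoted RQCF_ACT_SKIP bit is still set when the modelled
CPU1 inspects the flags.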

In __schedule(), clear the RQCF_*_SKIP flags immediately after
update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning.
Do so by setting rq->clock_update_flags to RQCF_UPDATED, which also
avoids the rq->clock_update_flags < RQCF_ACT_SKIP warning that might
otherwise be triggered later.
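
As a rough illustration of why the flags end up as RQCF_UPDATED, the
user-space sketch below models the two checks involved. It is not kernel
code: the function names are hypothetical, the flag values mirror
kernel/sched/sched.h, and the second predicate simply restates the
rq->clock_update_flags < RQCF_ACT_SKIP condition quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01u
#define RQCF_ACT_SKIP 0x02u
#define RQCF_UPDATED  0x04u

/* Post-fix entry of __schedule(), reduced to its effect on the flags. */
static inline unsigned int schedule_entry_after_fix(unsigned int flags)
{
	flags <<= 1;          /* Promote REQ to ACT */
	/* update_rq_clock(rq) runs at this point in the kernel. */
	flags = RQCF_UPDATED; /* drop both SKIP bits, record the update */
	return flags;
}

/* Leak check: is a SKIP still active when another CPU looks at the rq? */
static inline bool act_skip_leaked(unsigned int flags)
{
	return (flags & RQCF_ACT_SKIP) != 0;
}

/* "Clock not updated" check: true when flags < RQCF_ACT_SKIP. */
static inline bool clock_looks_stale(unsigned int flags)
{
	return flags < RQCF_ACT_SKIP;
}
```

In this model, the value returned by schedule_entry_after_fix() trips
neither predicate, whereas a plain zero would satisfy
clock_looks_stale().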

Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com
Reported-by: Igor Raits <igor.raits@gmail.com>
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com

Authored by Hao Jia and committed by Peter Zijlstra
5ebde09d 4e5b65a2

+1 -4
kernel/sched/core.c
···
 	/* switch_mm_cid() requires the memory barriers above. */
 	switch_mm_cid(rq, prev, next);
 
-	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
-
 	prepare_lock_switch(rq, next, rf);
 
 	/* Here we just switch the register state and the stack. */
···
 	/* Promote REQ to ACT */
 	rq->clock_update_flags <<= 1;
 	update_rq_clock(rq);
+	rq->clock_update_flags = RQCF_UPDATED;
 
 	switch_count = &prev->nivcsw;
 
···
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
-		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
-
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);