Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/deadline: Fix race in push_dl_task()

When a CPU calls push_dl_task() and picks a task to push to another
CPU's runqueue, it calls find_lock_later_rq(), which takes a double
lock on both CPUs' runqueues. If either lock is not readily available,
it may drop the current runqueue lock and reacquire both locks at
once. During this window it is possible that the task has already been
migrated and is running on some other CPU. These cases are already
handled. However, if the task has been migrated out, has finished
running there, and another CPU is now waking it up (ttwu) so that it
is queued on a runqueue again (on_rq is 1), and that runqueue happens
to be the original one, then the current checks will pass even though
the task was migrated out and is no longer in the pushable tasks list.
See the original rt change for more details on the issue.

To fix this, after the lock is obtained inside find_lock_later_rq(),
ensure that the task is still at the head of the pushable tasks list.
Also remove some checks that are no longer needed once this new check
is in place.
Note that the new pushable tasks list check only applies when
find_lock_later_rq() is called by push_dl_task(). For the other
caller, dl_task_offline_migration(), the existing checks are kept.

Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250408045021.3283624-1-harshit@nutanix.com

Authored by Harshit Agarwal, committed by Peter Zijlstra
8fd5485f b320789d

+49 -24
kernel/sched/deadline.c
···
 	return -1;
 }
 
+static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_dl_tasks(rq))
+		return NULL;
+
+	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
+
+	WARN_ON_ONCE(rq->cpu != task_cpu(p));
+	WARN_ON_ONCE(task_current(rq, p));
+	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+
+	WARN_ON_ONCE(!task_on_rq_queued(p));
+	WARN_ON_ONCE(!dl_task(p));
+
+	return p;
+}
+
 /* Locks the rq it finds */
 static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 {
···
 		/* Retry if something changed. */
 		if (double_lock_balance(rq, later_rq)) {
-			if (unlikely(task_rq(task) != rq ||
+			/*
+			 * double_lock_balance had to release rq->lock, in the
+			 * meantime, task may no longer be fit to be migrated.
+			 * Check the following to ensure that the task is
+			 * still suitable for migration:
+			 * 1. It is possible the task was scheduled,
+			 *    migrate_disabled was set and then got preempted,
+			 *    so we must check the task migration disable
+			 *    flag.
+			 * 2. The CPU picked is in the task's affinity.
+			 * 3. For throttled task (dl_task_offline_migration),
+			 *    check the following:
+			 *    - the task is not on the rq anymore (it was
+			 *      migrated)
+			 *    - the task is not on CPU anymore
+			 *    - the task is still a dl task
+			 *    - the task is not queued on the rq anymore
+			 * 4. For the non-throttled task (push_dl_task), the
+			 *    check to ensure that this task is still at the
+			 *    head of the pushable tasks list is enough.
+			 */
+			if (unlikely(is_migration_disabled(task) ||
 				     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
-				     task_on_cpu(rq, task) ||
-				     !dl_task(task) ||
-				     is_migration_disabled(task) ||
-				     !task_on_rq_queued(task))) {
+				     (task->dl.dl_throttled &&
+				      (task_rq(task) != rq ||
+				       task_on_cpu(rq, task) ||
+				       !dl_task(task) ||
+				       !task_on_rq_queued(task))) ||
+				     (!task->dl.dl_throttled &&
+				      task != pick_next_pushable_dl_task(rq)))) {
+
 				double_unlock_balance(rq, later_rq);
 				later_rq = NULL;
 				break;
···
 	}
 
 	return later_rq;
-}
-
-static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_dl_tasks(rq))
-		return NULL;
-
-	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
-
-	WARN_ON_ONCE(rq->cpu != task_cpu(p));
-	WARN_ON_ONCE(task_current(rq, p));
-	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
-
-	WARN_ON_ONCE(!task_on_rq_queued(p));
-	WARN_ON_ONCE(!dl_task(p));
-
-	return p;
 }
 
 /*