Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (61 commits)
sched: refine negative nice level granularity
sched: fix update_stats_enqueue() reniced codepath
sched: round a bit better
sched: make the multiplication table more accurate
sched: optimize update_rq_clock() calls in the load-balancer
sched: optimize activate_task()
sched: clean up set_curr_task_fair()
sched: remove __update_rq_clock() call from entity_tick()
sched: move the __update_rq_clock() call to scheduler_tick()
sched debug: remove the 'u64 now' parameter from print_task()/_rq()
sched: remove the 'u64 now' local variables
sched: remove the 'u64 now' parameter from deactivate_task()
sched: remove the 'u64 now' parameter from dequeue_task()
sched: remove the 'u64 now' parameter from enqueue_task()
sched: remove the 'u64 now' parameter from dec_nr_running()
sched: remove the 'u64 now' parameter from inc_nr_running()
sched: remove the 'u64 now' parameter from dec_load()
sched: remove the 'u64 now' parameter from inc_load()
sched: remove the 'u64 now' parameter from update_curr_load()
sched: remove the 'u64 now' parameter from ->task_new()
...

+422 -335
+1 -1
Documentation/sched-design-CFS.txt
··· 83 83 CFS uses nanosecond granularity accounting and does not rely on any 84 84 jiffies or other HZ detail. Thus the CFS scheduler has no notion of 85 85 'timeslices' and has no heuristics whatsoever. There is only one 86 - central tunable: 86 + central tunable (you have to switch on CONFIG_SCHED_DEBUG): 87 87 88 88 /proc/sys/kernel/sched_granularity_ns 89 89
+108
Documentation/sched-nice-design.txt
··· 1 + This document explains the thinking about the revamped and streamlined 2 + nice-levels implementation in the new Linux scheduler. 3 + 4 + Nice levels were always pretty weak under Linux and people continuously 5 + pestered us to make nice +19 tasks use up much less CPU time. 6 + 7 + Unfortunately that was not that easy to implement under the old 8 + scheduler (otherwise we'd have done it long ago), because nice level 9 + support was historically coupled to timeslice length, and timeslice 10 + units were driven by the HZ tick, so the smallest timeslice was 1/HZ. 11 + 12 + In the O(1) scheduler (in 2003) we changed negative nice levels to be 13 + much stronger than they were before in 2.4 (and people were happy about 14 + that change), and we also intentionally calibrated the linear timeslice 15 + rule so that nice +19 level would be _exactly_ 1 jiffy. To better 16 + understand it, the timeslice graph went like this (cheesy ASCII art 17 + alert!): 18 + 19 + 20 + A 21 + \ | [timeslice length] 22 + \ | 23 + \ | 24 + \ | 25 + \ | 26 + \|___100msecs 27 + |^ . _ 28 + | ^ . _ 29 + | ^ . _ 30 + -*----------------------------------*-----> [nice level] 31 + -20 | +19 32 + | 33 + | 34 + 35 + So that if someone wanted to really renice tasks, +19 would give a much 36 + bigger hit than the normal linear rule would do. (The solution of 37 + changing the ABI to extend priorities was discarded early on.) 38 + 39 + This approach worked to some degree for some time, but later on with 40 + HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage, which 41 + we felt to be a bit excessive. Excessive _not_ because it's too small 42 + a CPU utilization, but because it causes too frequent (once per 43 + millisec) rescheduling. (It would thus thrash the cache, etc. Remember, 44 + this was long ago when hardware was weaker and caches were smaller, and 45 + people were running number crunching apps at nice +19.) 
46 + 47 + So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the 48 + right minimal granularity - and this translates to 5% CPU utilization. 49 + But the fundamental HZ-sensitive property for nice+19 still remained, 50 + and we never got a single complaint about nice +19 being too _weak_ in 51 + terms of CPU utilization, we only got complaints about it (still) being 52 + too _strong_ :-) 53 + 54 + To sum it up: we always wanted to make nice levels more consistent, but 55 + within the constraints of HZ and jiffies and their nasty design level 56 + coupling to timeslices and granularity it was not really viable. 57 + 58 + The second (less frequent but still periodically occurring) complaint 59 + about Linux's nice level support was its asymmetry around the origin 60 + (which you can see demonstrated in the picture above), or more 61 + accurately: the fact that nice level behavior depended on the _absolute_ 62 + nice level as well, while the nice API itself is fundamentally 63 + "relative": 64 + 65 + int nice(int inc); 66 + 67 + asmlinkage long sys_nice(int increment) 68 + 69 + (the first one is the glibc API, the second one is the syscall API.) 70 + Note that the 'inc' is relative to the current nice level. Tools like 71 + bash's "nice" command mirror this relative API. 72 + 73 + With the old scheduler, if you for example started a niced task with +1 74 + and another task with +2, the CPU split between the two tasks would 75 + depend on the nice level of the parent shell - if it was at nice -10 the 76 + CPU split was different than if it was at +5 or +10. 77 + 78 + A third complaint against Linux's nice level support was that negative 79 + nice levels were not 'punchy enough', so lots of people had to resort to 80 + running audio (and other multimedia) apps under RT priorities such as 81 + SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation 82 + proof, and a buggy SCHED_FIFO app can also lock up the system for good. 
83 + 84 + The new scheduler in v2.6.23 addresses all three types of complaints: 85 + 86 + To address the first complaint (of nice +19 tasks using up too much CPU 87 + time), the scheduler was decoupled from 'time slice' and HZ concepts 88 + (and granularity was made a separate concept from nice levels) and thus 89 + it was possible to implement better and more consistent nice +19 90 + support: with the new scheduler nice +19 tasks get a HZ-independent 91 + 1.5%, instead of the variable 3%-5%-9% range they got in the old 92 + scheduler. 93 + 94 + To address the second complaint (of nice levels not being consistent), 95 + the new scheduler makes nice(1) have the same CPU utilization effect on 96 + tasks, regardless of their absolute nice levels. So on the new 97 + scheduler, running a nice +10 and a nice +11 task has the same CPU 98 + utilization "split" between them as running a nice -5 and a nice -4 99 + task. (one will get 55% of the CPU, the other 45%.) That is why nice 100 + levels were changed to be "multiplicative" (or exponential) - that way 101 + it does not matter which nice level you start out from, the 'relative 102 + result' will always be the same. 103 + 104 + The third complaint (of negative nice levels not being "punchy" enough 105 + and forcing audio apps to run under the more dangerous SCHED_FIFO 106 + scheduling policy) is addressed by the new scheduler almost 107 + automatically: stronger negative nice levels are an automatic 108 + side-effect of the recalibrated dynamic range of nice levels.
+9 -11
include/linux/sched.h
··· 139 139 extern void proc_sched_show_task(struct task_struct *p, struct seq_file *m); 140 140 extern void proc_sched_set_task(struct task_struct *p); 141 141 extern void 142 - print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now); 142 + print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq); 143 143 #else 144 144 static inline void 145 145 proc_sched_show_task(struct task_struct *p, struct seq_file *m) ··· 149 149 { 150 150 } 151 151 static inline void 152 - print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now) 152 + print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) 153 153 { 154 154 } 155 155 #endif ··· 855 855 struct sched_class { 856 856 struct sched_class *next; 857 857 858 - void (*enqueue_task) (struct rq *rq, struct task_struct *p, 859 - int wakeup, u64 now); 860 - void (*dequeue_task) (struct rq *rq, struct task_struct *p, 861 - int sleep, u64 now); 858 + void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup); 859 + void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep); 862 860 void (*yield_task) (struct rq *rq, struct task_struct *p); 863 861 864 862 void (*check_preempt_curr) (struct rq *rq, struct task_struct *p); 865 863 866 - struct task_struct * (*pick_next_task) (struct rq *rq, u64 now); 867 - void (*put_prev_task) (struct rq *rq, struct task_struct *p, u64 now); 864 + struct task_struct * (*pick_next_task) (struct rq *rq); 865 + void (*put_prev_task) (struct rq *rq, struct task_struct *p); 868 866 869 - int (*load_balance) (struct rq *this_rq, int this_cpu, 867 + unsigned long (*load_balance) (struct rq *this_rq, int this_cpu, 870 868 struct rq *busiest, 871 869 unsigned long max_nr_move, unsigned long max_load_move, 872 870 struct sched_domain *sd, enum cpu_idle_type idle, 873 - int *all_pinned, unsigned long *total_load_moved); 871 + int *all_pinned, int *this_best_prio); 874 872 875 873 void (*set_curr_task) (struct rq *rq); 876 874 void 
(*task_tick) (struct rq *rq, struct task_struct *p); 877 - void (*task_new) (struct rq *rq, struct task_struct *p, u64 now); 875 + void (*task_new) (struct rq *rq, struct task_struct *p); 878 876 }; 879 877 880 878 struct load_weight {
+178 -161
kernel/sched.c
··· 318 318 } 319 319 320 320 /* 321 - * Per-runqueue clock, as finegrained as the platform can give us: 321 + * Update the per-runqueue clock, as finegrained as the platform can give 322 + * us, but without assuming monotonicity, etc.: 322 323 */ 323 - static unsigned long long __rq_clock(struct rq *rq) 324 + static void __update_rq_clock(struct rq *rq) 324 325 { 325 326 u64 prev_raw = rq->prev_clock_raw; 326 327 u64 now = sched_clock(); 327 328 s64 delta = now - prev_raw; 328 329 u64 clock = rq->clock; 329 330 331 + #ifdef CONFIG_SCHED_DEBUG 332 + WARN_ON_ONCE(cpu_of(rq) != smp_processor_id()); 333 + #endif 330 334 /* 331 335 * Protect against sched_clock() occasionally going backwards: 332 336 */ ··· 353 349 354 350 rq->prev_clock_raw = now; 355 351 rq->clock = clock; 356 - 357 - return clock; 358 352 } 359 353 360 - static inline unsigned long long rq_clock(struct rq *rq) 354 + static void update_rq_clock(struct rq *rq) 361 355 { 362 - int this_cpu = smp_processor_id(); 363 - 364 - if (this_cpu == cpu_of(rq)) 365 - return __rq_clock(rq); 366 - 367 - return rq->clock; 356 + if (likely(smp_processor_id() == cpu_of(rq))) 357 + __update_rq_clock(rq); 368 358 } 369 359 370 360 /* ··· 384 386 { 385 387 unsigned long long now; 386 388 unsigned long flags; 389 + struct rq *rq; 387 390 388 391 local_irq_save(flags); 389 - now = rq_clock(cpu_rq(cpu)); 392 + rq = cpu_rq(cpu); 393 + update_rq_clock(rq); 394 + now = rq->clock; 390 395 local_irq_restore(flags); 391 396 392 397 return now; ··· 638 637 639 638 #define WMULT_SHIFT 32 640 639 640 + /* 641 + * Shift right and round: 642 + */ 643 + #define RSR(x, y) (((x) + (1UL << ((y) - 1))) >> (y)) 644 + 641 645 static unsigned long 642 646 calc_delta_mine(unsigned long delta_exec, unsigned long weight, 643 647 struct load_weight *lw) ··· 650 644 u64 tmp; 651 645 652 646 if (unlikely(!lw->inv_weight)) 653 - lw->inv_weight = WMULT_CONST / lw->weight; 647 + lw->inv_weight = (WMULT_CONST - lw->weight/2) / lw->weight + 1; 654 648 
655 649 tmp = (u64)delta_exec * weight; 656 650 /* 657 651 * Check whether we'd overflow the 64-bit multiplication: 658 652 */ 659 - if (unlikely(tmp > WMULT_CONST)) { 660 - tmp = ((tmp >> WMULT_SHIFT/2) * lw->inv_weight) 661 - >> (WMULT_SHIFT/2); 662 - } else { 663 - tmp = (tmp * lw->inv_weight) >> WMULT_SHIFT; 664 - } 653 + if (unlikely(tmp > WMULT_CONST)) 654 + tmp = RSR(RSR(tmp, WMULT_SHIFT/2) * lw->inv_weight, 655 + WMULT_SHIFT/2); 656 + else 657 + tmp = RSR(tmp * lw->inv_weight, WMULT_SHIFT); 665 658 666 659 return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX); 667 660 } ··· 708 703 * the relative distance between them is ~25%.) 709 704 */ 710 705 static const int prio_to_weight[40] = { 711 - /* -20 */ 88818, 71054, 56843, 45475, 36380, 29104, 23283, 18626, 14901, 11921, 712 - /* -10 */ 9537, 7629, 6103, 4883, 3906, 3125, 2500, 2000, 1600, 1280, 713 - /* 0 */ NICE_0_LOAD /* 1024 */, 714 - /* 1 */ 819, 655, 524, 419, 336, 268, 215, 172, 137, 715 - /* 10 */ 110, 87, 70, 56, 45, 36, 29, 23, 18, 15, 706 + /* -20 */ 88761, 71755, 56483, 46273, 36291, 707 + /* -15 */ 29154, 23254, 18705, 14949, 11916, 708 + /* -10 */ 9548, 7620, 6100, 4904, 3906, 709 + /* -5 */ 3121, 2501, 1991, 1586, 1277, 710 + /* 0 */ 1024, 820, 655, 526, 423, 711 + /* 5 */ 335, 272, 215, 172, 137, 712 + /* 10 */ 110, 87, 70, 56, 45, 713 + /* 15 */ 36, 29, 23, 18, 15, 716 714 }; 717 715 718 716 /* ··· 726 718 * into multiplications: 727 719 */ 728 720 static const u32 prio_to_wmult[40] = { 729 - /* -20 */ 48356, 60446, 75558, 94446, 118058, 730 - /* -15 */ 147573, 184467, 230589, 288233, 360285, 731 - /* -10 */ 450347, 562979, 703746, 879575, 1099582, 732 - /* -5 */ 1374389, 1717986, 2147483, 2684354, 3355443, 733 - /* 0 */ 4194304, 5244160, 6557201, 8196502, 10250518, 734 - /* 5 */ 12782640, 16025997, 19976592, 24970740, 31350126, 735 - /* 10 */ 39045157, 49367440, 61356675, 76695844, 95443717, 736 - /* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153, 721 + /* -20 */ 48388, 
59856, 76040, 92818, 118348, 722 + /* -15 */ 147320, 184698, 229616, 287308, 360437, 723 + /* -10 */ 449829, 563644, 704093, 875809, 1099582, 724 + /* -5 */ 1376151, 1717300, 2157191, 2708050, 3363326, 725 + /* 0 */ 4194304, 5237765, 6557202, 8165337, 10153587, 726 + /* 5 */ 12820798, 15790321, 19976592, 24970740, 31350126, 727 + /* 10 */ 39045157, 49367440, 61356676, 76695844, 95443717, 728 + /* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153, 737 729 }; 738 730 739 731 static void activate_task(struct rq *rq, struct task_struct *p, int wakeup); ··· 753 745 unsigned long max_nr_move, unsigned long max_load_move, 754 746 struct sched_domain *sd, enum cpu_idle_type idle, 755 747 int *all_pinned, unsigned long *load_moved, 756 - int this_best_prio, int best_prio, int best_prio_seen, 757 - struct rq_iterator *iterator); 748 + int *this_best_prio, struct rq_iterator *iterator); 758 749 759 750 #include "sched_stats.h" 760 751 #include "sched_rt.c" ··· 789 782 * This function is called /before/ updating rq->ls.load 790 783 * and when switching tasks. 791 784 */ 792 - static void update_curr_load(struct rq *rq, u64 now) 785 + static void update_curr_load(struct rq *rq) 793 786 { 794 787 struct load_stat *ls = &rq->ls; 795 788 u64 start; 796 789 797 790 start = ls->load_update_start; 798 - ls->load_update_start = now; 799 - ls->delta_stat += now - start; 791 + ls->load_update_start = rq->clock; 792 + ls->delta_stat += rq->clock - start; 800 793 /* 801 794 * Stagger updates to ls->delta_fair. Very frequent updates 802 795 * can be expensive. 
··· 805 798 __update_curr_load(rq, ls); 806 799 } 807 800 808 - static inline void 809 - inc_load(struct rq *rq, const struct task_struct *p, u64 now) 801 + static inline void inc_load(struct rq *rq, const struct task_struct *p) 810 802 { 811 - update_curr_load(rq, now); 803 + update_curr_load(rq); 812 804 update_load_add(&rq->ls.load, p->se.load.weight); 813 805 } 814 806 815 - static inline void 816 - dec_load(struct rq *rq, const struct task_struct *p, u64 now) 807 + static inline void dec_load(struct rq *rq, const struct task_struct *p) 817 808 { 818 - update_curr_load(rq, now); 809 + update_curr_load(rq); 819 810 update_load_sub(&rq->ls.load, p->se.load.weight); 820 811 } 821 812 822 - static void inc_nr_running(struct task_struct *p, struct rq *rq, u64 now) 813 + static void inc_nr_running(struct task_struct *p, struct rq *rq) 823 814 { 824 815 rq->nr_running++; 825 - inc_load(rq, p, now); 816 + inc_load(rq, p); 826 817 } 827 818 828 - static void dec_nr_running(struct task_struct *p, struct rq *rq, u64 now) 819 + static void dec_nr_running(struct task_struct *p, struct rq *rq) 829 820 { 830 821 rq->nr_running--; 831 - dec_load(rq, p, now); 822 + dec_load(rq, p); 832 823 } 833 824 834 825 static void set_load_weight(struct task_struct *p) ··· 853 848 p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO]; 854 849 } 855 850 856 - static void 857 - enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, u64 now) 851 + static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup) 858 852 { 859 853 sched_info_queued(p); 860 - p->sched_class->enqueue_task(rq, p, wakeup, now); 854 + p->sched_class->enqueue_task(rq, p, wakeup); 861 855 p->se.on_rq = 1; 862 856 } 863 857 864 - static void 865 - dequeue_task(struct rq *rq, struct task_struct *p, int sleep, u64 now) 858 + static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep) 866 859 { 867 - p->sched_class->dequeue_task(rq, p, sleep, now); 860 + 
p->sched_class->dequeue_task(rq, p, sleep); 868 861 p->se.on_rq = 0; 869 862 } 870 863 ··· 917 914 */ 918 915 static void activate_task(struct rq *rq, struct task_struct *p, int wakeup) 919 916 { 920 - u64 now = rq_clock(rq); 921 - 922 917 if (p->state == TASK_UNINTERRUPTIBLE) 923 918 rq->nr_uninterruptible--; 924 919 925 - enqueue_task(rq, p, wakeup, now); 926 - inc_nr_running(p, rq, now); 920 + enqueue_task(rq, p, wakeup); 921 + inc_nr_running(p, rq); 927 922 } 928 923 929 924 /* ··· 929 928 */ 930 929 static inline void activate_idle_task(struct task_struct *p, struct rq *rq) 931 930 { 932 - u64 now = rq_clock(rq); 931 + update_rq_clock(rq); 933 932 934 933 if (p->state == TASK_UNINTERRUPTIBLE) 935 934 rq->nr_uninterruptible--; 936 935 937 - enqueue_task(rq, p, 0, now); 938 - inc_nr_running(p, rq, now); 936 + enqueue_task(rq, p, 0); 937 + inc_nr_running(p, rq); 939 938 } 940 939 941 940 /* ··· 943 942 */ 944 943 static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep) 945 944 { 946 - u64 now = rq_clock(rq); 947 - 948 945 if (p->state == TASK_UNINTERRUPTIBLE) 949 946 rq->nr_uninterruptible++; 950 947 951 - dequeue_task(rq, p, sleep, now); 952 - dec_nr_running(p, rq, now); 948 + dequeue_task(rq, p, sleep); 949 + dec_nr_running(p, rq); 953 950 } 954 951 955 952 /** ··· 1515 1516 1516 1517 out_activate: 1517 1518 #endif /* CONFIG_SMP */ 1519 + update_rq_clock(rq); 1518 1520 activate_task(rq, p, 1); 1519 1521 /* 1520 1522 * Sync wakeups (i.e. 
those types of wakeups where the waker ··· 1647 1647 unsigned long flags; 1648 1648 struct rq *rq; 1649 1649 int this_cpu; 1650 - u64 now; 1651 1650 1652 1651 rq = task_rq_lock(p, &flags); 1653 1652 BUG_ON(p->state != TASK_RUNNING); 1654 1653 this_cpu = smp_processor_id(); /* parent's CPU */ 1655 - now = rq_clock(rq); 1654 + update_rq_clock(rq); 1656 1655 1657 1656 p->prio = effective_prio(p); 1658 1657 ··· 1665 1666 * Let the scheduling class do new task startup 1666 1667 * management (if any): 1667 1668 */ 1668 - p->sched_class->task_new(rq, p, now); 1669 - inc_nr_running(p, rq, now); 1669 + p->sched_class->task_new(rq, p); 1670 + inc_nr_running(p, rq); 1670 1671 } 1671 1672 check_preempt_curr(rq, p); 1672 1673 task_rq_unlock(rq, &flags); ··· 1953 1954 unsigned long total_load = this_rq->ls.load.weight; 1954 1955 unsigned long this_load = total_load; 1955 1956 struct load_stat *ls = &this_rq->ls; 1956 - u64 now = __rq_clock(this_rq); 1957 1957 int i, scale; 1958 1958 1959 1959 this_rq->nr_load_updates++; ··· 1960 1962 goto do_avg; 1961 1963 1962 1964 /* Update delta_fair/delta_exec fields first */ 1963 - update_curr_load(this_rq, now); 1965 + update_curr_load(this_rq); 1964 1966 1965 1967 fair_delta64 = ls->delta_fair + 1; 1966 1968 ls->delta_fair = 0; ··· 1968 1970 exec_delta64 = ls->delta_exec + 1; 1969 1971 ls->delta_exec = 0; 1970 1972 1971 - sample_interval64 = now - ls->load_update_last; 1972 - ls->load_update_last = now; 1973 + sample_interval64 = this_rq->clock - ls->load_update_last; 1974 + ls->load_update_last = this_rq->clock; 1973 1975 1974 1976 if ((s64)sample_interval64 < (s64)TICK_NSEC) 1975 1977 sample_interval64 = TICK_NSEC; ··· 2024 2026 spin_lock(&rq1->lock); 2025 2027 } 2026 2028 } 2029 + update_rq_clock(rq1); 2030 + update_rq_clock(rq2); 2027 2031 } 2028 2032 2029 2033 /* ··· 2166 2166 unsigned long max_nr_move, unsigned long max_load_move, 2167 2167 struct sched_domain *sd, enum cpu_idle_type idle, 2168 2168 int *all_pinned, unsigned long 
*load_moved, 2169 - int this_best_prio, int best_prio, int best_prio_seen, 2170 - struct rq_iterator *iterator) 2169 + int *this_best_prio, struct rq_iterator *iterator) 2171 2170 { 2172 2171 int pulled = 0, pinned = 0, skip_for_load; 2173 2172 struct task_struct *p; ··· 2191 2192 */ 2192 2193 skip_for_load = (p->se.load.weight >> 1) > rem_load_move + 2193 2194 SCHED_LOAD_SCALE_FUZZ; 2194 - if (skip_for_load && p->prio < this_best_prio) 2195 - skip_for_load = !best_prio_seen && p->prio == best_prio; 2196 - if (skip_for_load || 2195 + if ((skip_for_load && p->prio >= *this_best_prio) || 2197 2196 !can_migrate_task(p, busiest, this_cpu, sd, idle, &pinned)) { 2198 - 2199 - best_prio_seen |= p->prio == best_prio; 2200 2197 p = iterator->next(iterator->arg); 2201 2198 goto next; 2202 2199 } ··· 2206 2211 * and the prescribed amount of weighted load. 2207 2212 */ 2208 2213 if (pulled < max_nr_move && rem_load_move > 0) { 2209 - if (p->prio < this_best_prio) 2210 - this_best_prio = p->prio; 2214 + if (p->prio < *this_best_prio) 2215 + *this_best_prio = p->prio; 2211 2216 p = iterator->next(iterator->arg); 2212 2217 goto next; 2213 2218 } ··· 2226 2231 } 2227 2232 2228 2233 /* 2229 - * move_tasks tries to move up to max_nr_move tasks and max_load_move weighted 2230 - * load from busiest to this_rq, as part of a balancing operation within 2231 - * "domain". Returns the number of tasks moved. 2234 + * move_tasks tries to move up to max_load_move weighted load from busiest to 2235 + * this_rq, as part of a balancing operation within domain "sd". 2236 + * Returns 1 if successful and 0 otherwise. 2232 2237 * 2233 2238 * Called with both runqueues locked. 
2234 2239 */ 2235 2240 static int move_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest, 2236 - unsigned long max_nr_move, unsigned long max_load_move, 2241 + unsigned long max_load_move, 2237 2242 struct sched_domain *sd, enum cpu_idle_type idle, 2238 2243 int *all_pinned) 2239 2244 { 2240 2245 struct sched_class *class = sched_class_highest; 2241 - unsigned long load_moved, total_nr_moved = 0, nr_moved; 2242 - long rem_load_move = max_load_move; 2246 + unsigned long total_load_moved = 0; 2247 + int this_best_prio = this_rq->curr->prio; 2243 2248 2244 2249 do { 2245 - nr_moved = class->load_balance(this_rq, this_cpu, busiest, 2246 - max_nr_move, (unsigned long)rem_load_move, 2247 - sd, idle, all_pinned, &load_moved); 2248 - total_nr_moved += nr_moved; 2249 - max_nr_move -= nr_moved; 2250 - rem_load_move -= load_moved; 2250 + total_load_moved += 2251 + class->load_balance(this_rq, this_cpu, busiest, 2252 + ULONG_MAX, max_load_move - total_load_moved, 2253 + sd, idle, all_pinned, &this_best_prio); 2251 2254 class = class->next; 2252 - } while (class && max_nr_move && rem_load_move > 0); 2255 + } while (class && max_load_move > total_load_moved); 2253 2256 2254 - return total_nr_moved; 2257 + return total_load_moved > 0; 2258 + } 2259 + 2260 + /* 2261 + * move_one_task tries to move exactly one task from busiest to this_rq, as 2262 + * part of active balancing operations within "domain". 2263 + * Returns 1 if successful and 0 otherwise. 2264 + * 2265 + * Called with both runqueues locked. 
2266 + */ 2267 + static int move_one_task(struct rq *this_rq, int this_cpu, struct rq *busiest, 2268 + struct sched_domain *sd, enum cpu_idle_type idle) 2269 + { 2270 + struct sched_class *class; 2271 + int this_best_prio = MAX_PRIO; 2272 + 2273 + for (class = sched_class_highest; class; class = class->next) 2274 + if (class->load_balance(this_rq, this_cpu, busiest, 2275 + 1, ULONG_MAX, sd, idle, NULL, 2276 + &this_best_prio)) 2277 + return 1; 2278 + 2279 + return 0; 2255 2280 } 2256 2281 2257 2282 /* ··· 2603 2588 */ 2604 2589 #define MAX_PINNED_INTERVAL 512 2605 2590 2606 - static inline unsigned long minus_1_or_zero(unsigned long n) 2607 - { 2608 - return n > 0 ? n - 1 : 0; 2609 - } 2610 - 2611 2591 /* 2612 2592 * Check this_cpu to ensure it is balanced within domain. Attempt to move 2613 2593 * tasks if there is an imbalance. ··· 2611 2601 struct sched_domain *sd, enum cpu_idle_type idle, 2612 2602 int *balance) 2613 2603 { 2614 - int nr_moved, all_pinned = 0, active_balance = 0, sd_idle = 0; 2604 + int ld_moved, all_pinned = 0, active_balance = 0, sd_idle = 0; 2615 2605 struct sched_group *group; 2616 2606 unsigned long imbalance; 2617 2607 struct rq *busiest; ··· 2652 2642 2653 2643 schedstat_add(sd, lb_imbalance[idle], imbalance); 2654 2644 2655 - nr_moved = 0; 2645 + ld_moved = 0; 2656 2646 if (busiest->nr_running > 1) { 2657 2647 /* 2658 2648 * Attempt to move tasks. If find_busiest_group has found 2659 2649 * an imbalance but busiest->nr_running <= 1, the group is 2660 - * still unbalanced. nr_moved simply stays zero, so it is 2650 + * still unbalanced. ld_moved simply stays zero, so it is 2661 2651 * correctly treated as an imbalance. 
2662 2652 */ 2663 2653 local_irq_save(flags); 2664 2654 double_rq_lock(this_rq, busiest); 2665 - nr_moved = move_tasks(this_rq, this_cpu, busiest, 2666 - minus_1_or_zero(busiest->nr_running), 2655 + ld_moved = move_tasks(this_rq, this_cpu, busiest, 2667 2656 imbalance, sd, idle, &all_pinned); 2668 2657 double_rq_unlock(this_rq, busiest); 2669 2658 local_irq_restore(flags); ··· 2670 2661 /* 2671 2662 * some other cpu did the load balance for us. 2672 2663 */ 2673 - if (nr_moved && this_cpu != smp_processor_id()) 2664 + if (ld_moved && this_cpu != smp_processor_id()) 2674 2665 resched_cpu(this_cpu); 2675 2666 2676 2667 /* All tasks on this runqueue were pinned by CPU affinity */ ··· 2682 2673 } 2683 2674 } 2684 2675 2685 - if (!nr_moved) { 2676 + if (!ld_moved) { 2686 2677 schedstat_inc(sd, lb_failed[idle]); 2687 2678 sd->nr_balance_failed++; 2688 2679 ··· 2731 2722 sd->balance_interval *= 2; 2732 2723 } 2733 2724 2734 - if (!nr_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER && 2725 + if (!ld_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER && 2735 2726 !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE)) 2736 2727 return -1; 2737 - return nr_moved; 2728 + return ld_moved; 2738 2729 2739 2730 out_balanced: 2740 2731 schedstat_inc(sd, lb_balanced[idle]); ··· 2766 2757 struct sched_group *group; 2767 2758 struct rq *busiest = NULL; 2768 2759 unsigned long imbalance; 2769 - int nr_moved = 0; 2760 + int ld_moved = 0; 2770 2761 int sd_idle = 0; 2771 2762 int all_pinned = 0; 2772 2763 cpumask_t cpus = CPU_MASK_ALL; ··· 2801 2792 2802 2793 schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance); 2803 2794 2804 - nr_moved = 0; 2795 + ld_moved = 0; 2805 2796 if (busiest->nr_running > 1) { 2806 2797 /* Attempt to move tasks */ 2807 2798 double_lock_balance(this_rq, busiest); 2808 - nr_moved = move_tasks(this_rq, this_cpu, busiest, 2809 - minus_1_or_zero(busiest->nr_running), 2799 + /* this_rq->clock is already updated */ 2800 + update_rq_clock(busiest); 2801 + ld_moved = 
move_tasks(this_rq, this_cpu, busiest, 2810 2802 imbalance, sd, CPU_NEWLY_IDLE, 2811 2803 &all_pinned); 2812 2804 spin_unlock(&busiest->lock); ··· 2819 2809 } 2820 2810 } 2821 2811 2822 - if (!nr_moved) { 2812 + if (!ld_moved) { 2823 2813 schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]); 2824 2814 if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER && 2825 2815 !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE)) ··· 2827 2817 } else 2828 2818 sd->nr_balance_failed = 0; 2829 2819 2830 - return nr_moved; 2820 + return ld_moved; 2831 2821 2832 2822 out_balanced: 2833 2823 schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]); ··· 2904 2894 2905 2895 /* move a task from busiest_rq to target_rq */ 2906 2896 double_lock_balance(busiest_rq, target_rq); 2897 + update_rq_clock(busiest_rq); 2898 + update_rq_clock(target_rq); 2907 2899 2908 2900 /* Search for an sd spanning us and the target CPU. */ 2909 2901 for_each_domain(target_cpu, sd) { ··· 2917 2905 if (likely(sd)) { 2918 2906 schedstat_inc(sd, alb_cnt); 2919 2907 2920 - if (move_tasks(target_rq, target_cpu, busiest_rq, 1, 2921 - ULONG_MAX, sd, CPU_IDLE, NULL)) 2908 + if (move_one_task(target_rq, target_cpu, busiest_rq, 2909 + sd, CPU_IDLE)) 2922 2910 schedstat_inc(sd, alb_pushed); 2923 2911 else 2924 2912 schedstat_inc(sd, alb_failed); ··· 3187 3175 unsigned long max_nr_move, unsigned long max_load_move, 3188 3176 struct sched_domain *sd, enum cpu_idle_type idle, 3189 3177 int *all_pinned, unsigned long *load_moved, 3190 - int this_best_prio, int best_prio, int best_prio_seen, 3191 - struct rq_iterator *iterator) 3178 + int *this_best_prio, struct rq_iterator *iterator) 3192 3179 { 3193 3180 *load_moved = 0; 3194 3181 ··· 3213 3202 rq = task_rq_lock(p, &flags); 3214 3203 ns = p->se.sum_exec_runtime; 3215 3204 if (rq->curr == p) { 3216 - delta_exec = rq_clock(rq) - p->se.exec_start; 3205 + update_rq_clock(rq); 3206 + delta_exec = rq->clock - p->se.exec_start; 3217 3207 if ((s64)delta_exec > 0) 3218 3208 ns += delta_exec; 3219 3209 } ··· 3310 
3298 struct task_struct *curr = rq->curr; 3311 3299 3312 3300 spin_lock(&rq->lock); 3301 + __update_rq_clock(rq); 3302 + update_cpu_load(rq); 3313 3303 if (curr != rq->idle) /* FIXME: needed? */ 3314 3304 curr->sched_class->task_tick(rq, curr); 3315 - update_cpu_load(rq); 3316 3305 spin_unlock(&rq->lock); 3317 3306 3318 3307 #ifdef CONFIG_SMP ··· 3395 3382 * Pick up the highest-prio task: 3396 3383 */ 3397 3384 static inline struct task_struct * 3398 - pick_next_task(struct rq *rq, struct task_struct *prev, u64 now) 3385 + pick_next_task(struct rq *rq, struct task_struct *prev) 3399 3386 { 3400 3387 struct sched_class *class; 3401 3388 struct task_struct *p; ··· 3405 3392 * the fair class we can call that function directly: 3406 3393 */ 3407 3394 if (likely(rq->nr_running == rq->cfs.nr_running)) { 3408 - p = fair_sched_class.pick_next_task(rq, now); 3395 + p = fair_sched_class.pick_next_task(rq); 3409 3396 if (likely(p)) 3410 3397 return p; 3411 3398 } 3412 3399 3413 3400 class = sched_class_highest; 3414 3401 for ( ; ; ) { 3415 - p = class->pick_next_task(rq, now); 3402 + p = class->pick_next_task(rq); 3416 3403 if (p) 3417 3404 return p; 3418 3405 /* ··· 3431 3418 struct task_struct *prev, *next; 3432 3419 long *switch_count; 3433 3420 struct rq *rq; 3434 - u64 now; 3435 3421 int cpu; 3436 3422 3437 3423 need_resched: ··· 3448 3436 3449 3437 spin_lock_irq(&rq->lock); 3450 3438 clear_tsk_need_resched(prev); 3439 + __update_rq_clock(rq); 3451 3440 3452 3441 if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) { 3453 3442 if (unlikely((prev->state & TASK_INTERRUPTIBLE) && ··· 3463 3450 if (unlikely(!rq->nr_running)) 3464 3451 idle_balance(cpu, rq); 3465 3452 3466 - now = __rq_clock(rq); 3467 - prev->sched_class->put_prev_task(rq, prev, now); 3468 - next = pick_next_task(rq, prev, now); 3453 + prev->sched_class->put_prev_task(rq, prev); 3454 + next = pick_next_task(rq, prev); 3469 3455 3470 3456 sched_info_switch(prev, next); 3471 3457 ··· 3907 3895 unsigned long 
flags; 3908 3896 int oldprio, on_rq; 3909 3897 struct rq *rq; 3910 - u64 now; 3911 3898 3912 3899 BUG_ON(prio < 0 || prio > MAX_PRIO); 3913 3900 3914 3901 rq = task_rq_lock(p, &flags); 3915 - now = rq_clock(rq); 3902 + update_rq_clock(rq); 3916 3903 3917 3904 oldprio = p->prio; 3918 3905 on_rq = p->se.on_rq; 3919 3906 if (on_rq) 3920 - dequeue_task(rq, p, 0, now); 3907 + dequeue_task(rq, p, 0); 3921 3908 3922 3909 if (rt_prio(prio)) 3923 3910 p->sched_class = &rt_sched_class; ··· 3926 3915 p->prio = prio; 3927 3916 3928 3917 if (on_rq) { 3929 - enqueue_task(rq, p, 0, now); 3918 + enqueue_task(rq, p, 0); 3930 3919 /* 3931 3920 * Reschedule if we are currently running on this runqueue and 3932 3921 * our priority decreased, or if we are not currently running on ··· 3949 3938 int old_prio, delta, on_rq; 3950 3939 unsigned long flags; 3951 3940 struct rq *rq; 3952 - u64 now; 3953 3941 3954 3942 if (TASK_NICE(p) == nice || nice < -20 || nice > 19) 3955 3943 return; ··· 3957 3947 * the task might be in the middle of scheduling on another CPU. 
3958 3948 */ 3959 3949 rq = task_rq_lock(p, &flags); 3960 - now = rq_clock(rq); 3950 + update_rq_clock(rq); 3961 3951 /* 3962 3952 * The RT priorities are set via sched_setscheduler(), but we still 3963 3953 * allow the 'normal' nice value to be set - but as expected ··· 3970 3960 } 3971 3961 on_rq = p->se.on_rq; 3972 3962 if (on_rq) { 3973 - dequeue_task(rq, p, 0, now); 3974 - dec_load(rq, p, now); 3963 + dequeue_task(rq, p, 0); 3964 + dec_load(rq, p); 3975 3965 } 3976 3966 3977 3967 p->static_prio = NICE_TO_PRIO(nice); ··· 3981 3971 delta = p->prio - old_prio; 3982 3972 3983 3973 if (on_rq) { 3984 - enqueue_task(rq, p, 0, now); 3985 - inc_load(rq, p, now); 3974 + enqueue_task(rq, p, 0); 3975 + inc_load(rq, p); 3986 3976 /* 3987 3977 * If the task increased its priority or is running and 3988 3978 * lowered its priority, then reschedule its CPU: ··· 4218 4208 spin_unlock_irqrestore(&p->pi_lock, flags); 4219 4209 goto recheck; 4220 4210 } 4211 + update_rq_clock(rq); 4221 4212 on_rq = p->se.on_rq; 4222 4213 if (on_rq) 4223 4214 deactivate_task(rq, p, 0); ··· 4474 4463 out_unlock: 4475 4464 read_unlock(&tasklist_lock); 4476 4465 mutex_unlock(&sched_hotcpu_mutex); 4477 - if (retval) 4478 - return retval; 4479 4466 4480 - return 0; 4467 + return retval; 4481 4468 } 4482 4469 4483 4470 /** ··· 4975 4966 on_rq = p->se.on_rq; 4976 4967 if (on_rq) 4977 4968 deactivate_task(rq_src, p, 0); 4969 + 4978 4970 set_task_cpu(p, dest_cpu); 4979 4971 if (on_rq) { 4980 4972 activate_task(rq_dest, p, 0); ··· 5208 5198 for ( ; ; ) { 5209 5199 if (!rq->nr_running) 5210 5200 break; 5211 - next = pick_next_task(rq, rq->curr, rq_clock(rq)); 5201 + update_rq_clock(rq); 5202 + next = pick_next_task(rq, rq->curr); 5212 5203 if (!next) 5213 5204 break; 5214 5205 migrate_dead(dead_cpu, next); ··· 5221 5210 #if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL) 5222 5211 5223 5212 static struct ctl_table sd_ctl_dir[] = { 5224 - {CTL_UNNUMBERED, "sched_domain", NULL, 0, 0755, NULL, }, 5213 + { 
5214 + .procname = "sched_domain", 5215 + .mode = 0755, 5216 + }, 5225 5217 {0,}, 5226 5218 }; 5227 5219 5228 5220 static struct ctl_table sd_ctl_root[] = { 5229 - {CTL_UNNUMBERED, "kernel", NULL, 0, 0755, sd_ctl_dir, }, 5221 + { 5222 + .procname = "kernel", 5223 + .mode = 0755, 5224 + .child = sd_ctl_dir, 5225 + }, 5230 5226 {0,}, 5231 5227 }; 5232 5228 ··· 5249 5231 } 5250 5232 5251 5233 static void 5252 - set_table_entry(struct ctl_table *entry, int ctl_name, 5234 + set_table_entry(struct ctl_table *entry, 5253 5235 const char *procname, void *data, int maxlen, 5254 5236 mode_t mode, proc_handler *proc_handler) 5255 5237 { 5256 - entry->ctl_name = ctl_name; 5257 5238 entry->procname = procname; 5258 5239 entry->data = data; 5259 5240 entry->maxlen = maxlen; ··· 5265 5248 { 5266 5249 struct ctl_table *table = sd_alloc_ctl_entry(14); 5267 5250 5268 - set_table_entry(&table[0], 1, "min_interval", &sd->min_interval, 5251 + set_table_entry(&table[0], "min_interval", &sd->min_interval, 5269 5252 sizeof(long), 0644, proc_doulongvec_minmax); 5270 - set_table_entry(&table[1], 2, "max_interval", &sd->max_interval, 5253 + set_table_entry(&table[1], "max_interval", &sd->max_interval, 5271 5254 sizeof(long), 0644, proc_doulongvec_minmax); 5272 - set_table_entry(&table[2], 3, "busy_idx", &sd->busy_idx, 5255 + set_table_entry(&table[2], "busy_idx", &sd->busy_idx, 5273 5256 sizeof(int), 0644, proc_dointvec_minmax); 5274 - set_table_entry(&table[3], 4, "idle_idx", &sd->idle_idx, 5257 + set_table_entry(&table[3], "idle_idx", &sd->idle_idx, 5275 5258 sizeof(int), 0644, proc_dointvec_minmax); 5276 - set_table_entry(&table[4], 5, "newidle_idx", &sd->newidle_idx, 5259 + set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx, 5277 5260 sizeof(int), 0644, proc_dointvec_minmax); 5278 - set_table_entry(&table[5], 6, "wake_idx", &sd->wake_idx, 5261 + set_table_entry(&table[5], "wake_idx", &sd->wake_idx, 5279 5262 sizeof(int), 0644, proc_dointvec_minmax); 5280 - 
set_table_entry(&table[6], 7, "forkexec_idx", &sd->forkexec_idx, 5263 + set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx, 5281 5264 sizeof(int), 0644, proc_dointvec_minmax); 5282 - set_table_entry(&table[7], 8, "busy_factor", &sd->busy_factor, 5265 + set_table_entry(&table[7], "busy_factor", &sd->busy_factor, 5283 5266 sizeof(int), 0644, proc_dointvec_minmax); 5284 - set_table_entry(&table[8], 9, "imbalance_pct", &sd->imbalance_pct, 5267 + set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct, 5285 5268 sizeof(int), 0644, proc_dointvec_minmax); 5286 - set_table_entry(&table[10], 11, "cache_nice_tries", 5269 + set_table_entry(&table[10], "cache_nice_tries", 5287 5270 &sd->cache_nice_tries, 5288 5271 sizeof(int), 0644, proc_dointvec_minmax); 5289 - set_table_entry(&table[12], 13, "flags", &sd->flags, 5272 + set_table_entry(&table[12], "flags", &sd->flags, 5290 5273 sizeof(int), 0644, proc_dointvec_minmax); 5291 5274 5292 5275 return table; ··· 5306 5289 i = 0; 5307 5290 for_each_domain(cpu, sd) { 5308 5291 snprintf(buf, 32, "domain%d", i); 5309 - entry->ctl_name = i + 1; 5310 5292 entry->procname = kstrdup(buf, GFP_KERNEL); 5311 5293 entry->mode = 0755; 5312 5294 entry->child = sd_alloc_ctl_domain_table(sd); ··· 5326 5310 5327 5311 for (i = 0; i < cpu_num; i++, entry++) { 5328 5312 snprintf(buf, 32, "cpu%d", i); 5329 - entry->ctl_name = i + 1; 5330 5313 entry->procname = kstrdup(buf, GFP_KERNEL); 5331 5314 entry->mode = 0755; 5332 5315 entry->child = sd_alloc_ctl_cpu_table(i); ··· 5394 5379 rq->migration_thread = NULL; 5395 5380 /* Idle task back to normal (off runqueue, low prio) */ 5396 5381 rq = task_rq_lock(rq->idle, &flags); 5382 + update_rq_clock(rq); 5397 5383 deactivate_task(rq, rq->idle, 0); 5398 5384 rq->idle->static_prio = MAX_PRIO; 5399 5385 __setscheduler(rq, rq->idle, SCHED_NORMAL, 0); ··· 6632 6616 goto out_unlock; 6633 6617 #endif 6634 6618 6619 + update_rq_clock(rq); 6635 6620 on_rq = p->se.on_rq; 6636 6621 if (on_rq) 6637 - 
deactivate_task(task_rq(p), p, 0); 6622 + deactivate_task(rq, p, 0); 6638 6623 __setscheduler(rq, p, SCHED_NORMAL, 0); 6639 6624 if (on_rq) { 6640 - activate_task(task_rq(p), p, 0); 6625 + activate_task(rq, p, 0); 6641 6626 resched_task(rq->curr); 6642 6627 } 6643 6628 #ifdef CONFIG_SMP
+8 -8
kernel/sched_debug.c
··· 29 29 } while (0) 30 30 31 31 static void 32 - print_task(struct seq_file *m, struct rq *rq, struct task_struct *p, u64 now) 32 + print_task(struct seq_file *m, struct rq *rq, struct task_struct *p) 33 33 { 34 34 if (rq->curr == p) 35 35 SEQ_printf(m, "R"); ··· 56 56 #endif 57 57 } 58 58 59 - static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu, u64 now) 59 + static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu) 60 60 { 61 61 struct task_struct *g, *p; 62 62 ··· 77 77 if (!p->se.on_rq || task_cpu(p) != rq_cpu) 78 78 continue; 79 79 80 - print_task(m, rq, p, now); 80 + print_task(m, rq, p); 81 81 } while_each_thread(g, p); 82 82 83 83 read_unlock_irq(&tasklist_lock); ··· 106 106 (long long)wait_runtime_rq_sum); 107 107 } 108 108 109 - void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now) 109 + void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) 110 110 { 111 111 SEQ_printf(m, "\ncfs_rq %p\n", cfs_rq); 112 112 ··· 124 124 print_cfs_rq_runtime_sum(m, cpu, cfs_rq); 125 125 } 126 126 127 - static void print_cpu(struct seq_file *m, int cpu, u64 now) 127 + static void print_cpu(struct seq_file *m, int cpu) 128 128 { 129 129 struct rq *rq = &per_cpu(runqueues, cpu); 130 130 ··· 166 166 P(cpu_load[4]); 167 167 #undef P 168 168 169 - print_cfs_stats(m, cpu, now); 169 + print_cfs_stats(m, cpu); 170 170 171 - print_rq(m, rq, cpu, now); 171 + print_rq(m, rq, cpu); 172 172 } 173 173 174 174 static int sched_debug_show(struct seq_file *m, void *v) ··· 184 184 SEQ_printf(m, "now at %Lu nsecs\n", (unsigned long long)now); 185 185 186 186 for_each_online_cpu(cpu) 187 - print_cpu(m, cpu, now); 187 + print_cpu(m, cpu); 188 188 189 189 SEQ_printf(m, "\n"); 190 190
+97 -117
kernel/sched_fair.c
··· 222 222 { 223 223 u64 tmp; 224 224 225 - /* 226 - * Negative nice levels get the same granularity as nice-0: 227 - */ 228 - if (likely(curr->load.weight >= NICE_0_LOAD)) 225 + if (likely(curr->load.weight == NICE_0_LOAD)) 229 226 return granularity; 230 227 /* 231 - * Positive nice level tasks get linearly finer 228 + * Positive nice levels get the same granularity as nice-0: 229 + */ 230 + if (likely(curr->load.weight < NICE_0_LOAD)) { 231 + tmp = curr->load.weight * (u64)granularity; 232 + return (long) (tmp >> NICE_0_SHIFT); 233 + } 234 + /* 235 + * Negative nice level tasks get linearly finer 232 236 * granularity: 233 237 */ 234 - tmp = curr->load.weight * (u64)granularity; 238 + tmp = curr->load.inv_weight * (u64)granularity; 235 239 236 240 /* 237 241 * It will always fit into 'long': 238 242 */ 239 - return (long) (tmp >> NICE_0_SHIFT); 243 + return (long) (tmp >> WMULT_SHIFT); 240 244 } 241 245 242 246 static inline void ··· 285 281 * are not in our scheduling class. 286 282 */ 287 283 static inline void 288 - __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, u64 now) 284 + __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr) 289 285 { 290 - unsigned long delta, delta_exec, delta_fair; 291 - long delta_mine; 286 + unsigned long delta, delta_exec, delta_fair, delta_mine; 292 287 struct load_weight *lw = &cfs_rq->load; 293 288 unsigned long load = lw->weight; 294 - 295 - if (unlikely(!load)) 296 - return; 297 289 298 290 delta_exec = curr->delta_exec; 299 291 schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max)); ··· 297 297 curr->sum_exec_runtime += delta_exec; 298 298 cfs_rq->exec_clock += delta_exec; 299 299 300 + if (unlikely(!load)) 301 + return; 302 + 300 303 delta_fair = calc_delta_fair(delta_exec, lw); 301 304 delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw); 302 305 303 - if (cfs_rq->sleeper_bonus > sysctl_sched_stat_granularity) { 306 + if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) { 
304 307 delta = calc_delta_mine(cfs_rq->sleeper_bonus, 305 308 curr->load.weight, lw); 306 309 if (unlikely(delta > cfs_rq->sleeper_bonus)) ··· 324 321 add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec); 325 322 } 326 323 327 - static void update_curr(struct cfs_rq *cfs_rq, u64 now) 324 + static void update_curr(struct cfs_rq *cfs_rq) 328 325 { 329 326 struct sched_entity *curr = cfs_rq_curr(cfs_rq); 330 327 unsigned long delta_exec; ··· 337 334 * since the last time we changed load (this cannot 338 335 * overflow on 32 bits): 339 336 */ 340 - delta_exec = (unsigned long)(now - curr->exec_start); 337 + delta_exec = (unsigned long)(rq_of(cfs_rq)->clock - curr->exec_start); 341 338 342 339 curr->delta_exec += delta_exec; 343 340 344 341 if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity)) { 345 - __update_curr(cfs_rq, curr, now); 342 + __update_curr(cfs_rq, curr); 346 343 curr->delta_exec = 0; 347 344 } 348 - curr->exec_start = now; 345 + curr->exec_start = rq_of(cfs_rq)->clock; 349 346 } 350 347 351 348 static inline void 352 - update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 349 + update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se) 353 350 { 354 351 se->wait_start_fair = cfs_rq->fair_clock; 355 - schedstat_set(se->wait_start, now); 352 + schedstat_set(se->wait_start, rq_of(cfs_rq)->clock); 356 353 } 357 354 358 355 /* ··· 380 377 /* 381 378 * Task is being enqueued - update stats: 382 379 */ 383 - static void 384 - update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 380 + static void update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se) 385 381 { 386 382 s64 key; 387 383 ··· 389 387 * a dequeue/enqueue event is a NOP) 390 388 */ 391 389 if (se != cfs_rq_curr(cfs_rq)) 392 - update_stats_wait_start(cfs_rq, se, now); 390 + update_stats_wait_start(cfs_rq, se); 393 391 /* 394 392 * Update the key: 395 393 */ ··· 409 407 (WMULT_SHIFT - NICE_0_SHIFT); 410 408 } 
else { 411 409 tmp = se->wait_runtime; 412 - key -= (tmp * se->load.weight) >> NICE_0_SHIFT; 410 + key -= (tmp * se->load.inv_weight) >> 411 + (WMULT_SHIFT - NICE_0_SHIFT); 413 412 } 414 413 } 415 414 ··· 421 418 * Note: must be called with a freshly updated rq->fair_clock. 422 419 */ 423 420 static inline void 424 - __update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 421 + __update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se) 425 422 { 426 423 unsigned long delta_fair = se->delta_fair_run; 427 424 428 - schedstat_set(se->wait_max, max(se->wait_max, now - se->wait_start)); 425 + schedstat_set(se->wait_max, max(se->wait_max, 426 + rq_of(cfs_rq)->clock - se->wait_start)); 429 427 430 428 if (unlikely(se->load.weight != NICE_0_LOAD)) 431 429 delta_fair = calc_weighted(delta_fair, se->load.weight, ··· 436 432 } 437 433 438 434 static void 439 - update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 435 + update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se) 440 436 { 441 437 unsigned long delta_fair; 442 438 ··· 446 442 se->delta_fair_run += delta_fair; 447 443 if (unlikely(abs(se->delta_fair_run) >= 448 444 sysctl_sched_stat_granularity)) { 449 - __update_stats_wait_end(cfs_rq, se, now); 445 + __update_stats_wait_end(cfs_rq, se); 450 446 se->delta_fair_run = 0; 451 447 } 452 448 ··· 455 451 } 456 452 457 453 static inline void 458 - update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 454 + update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se) 459 455 { 460 - update_curr(cfs_rq, now); 456 + update_curr(cfs_rq); 461 457 /* 462 458 * Mark the end of the wait period if dequeueing a 463 459 * waiting task: 464 460 */ 465 461 if (se != cfs_rq_curr(cfs_rq)) 466 - update_stats_wait_end(cfs_rq, se, now); 462 + update_stats_wait_end(cfs_rq, se); 467 463 } 468 464 469 465 /* 470 466 * We are picking a new current task - update its stats: 471 467 */ 472 
468 static inline void 473 - update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 469 + update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se) 474 470 { 475 471 /* 476 472 * We are starting a new run period: 477 473 */ 478 - se->exec_start = now; 474 + se->exec_start = rq_of(cfs_rq)->clock; 479 475 } 480 476 481 477 /* 482 478 * We are descheduling a task - update its stats: 483 479 */ 484 480 static inline void 485 - update_stats_curr_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 481 + update_stats_curr_end(struct cfs_rq *cfs_rq, struct sched_entity *se) 486 482 { 487 483 se->exec_start = 0; 488 484 } ··· 491 487 * Scheduling class queueing methods: 492 488 */ 493 489 494 - static void 495 - __enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 490 + static void __enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se) 496 491 { 497 492 unsigned long load = cfs_rq->load.weight, delta_fair; 498 493 long prev_runtime; ··· 525 522 schedstat_add(cfs_rq, wait_runtime, se->wait_runtime); 526 523 } 527 524 528 - static void 529 - enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 525 + static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se) 530 526 { 531 527 struct task_struct *tsk = task_of(se); 532 528 unsigned long delta_fair; ··· 540 538 se->delta_fair_sleep += delta_fair; 541 539 if (unlikely(abs(se->delta_fair_sleep) >= 542 540 sysctl_sched_stat_granularity)) { 543 - __enqueue_sleeper(cfs_rq, se, now); 541 + __enqueue_sleeper(cfs_rq, se); 544 542 se->delta_fair_sleep = 0; 545 543 } 546 544 ··· 548 546 549 547 #ifdef CONFIG_SCHEDSTATS 550 548 if (se->sleep_start) { 551 - u64 delta = now - se->sleep_start; 549 + u64 delta = rq_of(cfs_rq)->clock - se->sleep_start; 552 550 553 551 if ((s64)delta < 0) 554 552 delta = 0; ··· 560 558 se->sum_sleep_runtime += delta; 561 559 } 562 560 if (se->block_start) { 563 - u64 delta = now - 
se->block_start; 561 + u64 delta = rq_of(cfs_rq)->clock - se->block_start; 564 562 565 563 if ((s64)delta < 0) 566 564 delta = 0; ··· 575 573 } 576 574 577 575 static void 578 - enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, 579 - int wakeup, u64 now) 576 + enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup) 580 577 { 581 578 /* 582 579 * Update the fair clock. 583 580 */ 584 - update_curr(cfs_rq, now); 581 + update_curr(cfs_rq); 585 582 586 583 if (wakeup) 587 - enqueue_sleeper(cfs_rq, se, now); 584 + enqueue_sleeper(cfs_rq, se); 588 585 589 - update_stats_enqueue(cfs_rq, se, now); 586 + update_stats_enqueue(cfs_rq, se); 590 587 __enqueue_entity(cfs_rq, se); 591 588 } 592 589 593 590 static void 594 - dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, 595 - int sleep, u64 now) 591 + dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep) 596 592 { 597 - update_stats_dequeue(cfs_rq, se, now); 593 + update_stats_dequeue(cfs_rq, se); 598 594 if (sleep) { 599 595 se->sleep_start_fair = cfs_rq->fair_clock; 600 596 #ifdef CONFIG_SCHEDSTATS ··· 600 600 struct task_struct *tsk = task_of(se); 601 601 602 602 if (tsk->state & TASK_INTERRUPTIBLE) 603 - se->sleep_start = now; 603 + se->sleep_start = rq_of(cfs_rq)->clock; 604 604 if (tsk->state & TASK_UNINTERRUPTIBLE) 605 - se->block_start = now; 605 + se->block_start = rq_of(cfs_rq)->clock; 606 606 } 607 607 cfs_rq->wait_runtime -= se->wait_runtime; 608 608 #endif ··· 629 629 } 630 630 631 631 static inline void 632 - set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) 632 + set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) 633 633 { 634 634 /* 635 635 * Any task has to be enqueued before it get to execute on ··· 638 638 * done a put_prev_task_fair() shortly before this, which 639 639 * updated rq->fair_clock - used by update_stats_wait_end()) 640 640 */ 641 - update_stats_wait_end(cfs_rq, se, now); 642 - 
update_stats_curr_start(cfs_rq, se, now); 641 + update_stats_wait_end(cfs_rq, se); 642 + update_stats_curr_start(cfs_rq, se); 643 643 set_cfs_rq_curr(cfs_rq, se); 644 644 } 645 645 646 - static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq, u64 now) 646 + static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq) 647 647 { 648 648 struct sched_entity *se = __pick_next_entity(cfs_rq); 649 649 650 - set_next_entity(cfs_rq, se, now); 650 + set_next_entity(cfs_rq, se); 651 651 652 652 return se; 653 653 } 654 654 655 - static void 656 - put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev, u64 now) 655 + static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev) 657 656 { 658 657 /* 659 658 * If still on the runqueue then deactivate_task() 660 659 * was not called and update_curr() has to be done: 661 660 */ 662 661 if (prev->on_rq) 663 - update_curr(cfs_rq, now); 662 + update_curr(cfs_rq); 664 663 665 - update_stats_curr_end(cfs_rq, prev, now); 664 + update_stats_curr_end(cfs_rq, prev); 666 665 667 666 if (prev->on_rq) 668 - update_stats_wait_start(cfs_rq, prev, now); 667 + update_stats_wait_start(cfs_rq, prev); 669 668 set_cfs_rq_curr(cfs_rq, NULL); 670 669 } 671 670 672 671 static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr) 673 672 { 674 - struct rq *rq = rq_of(cfs_rq); 675 673 struct sched_entity *next; 676 - u64 now = __rq_clock(rq); 677 674 678 675 /* 679 676 * Dequeue and enqueue the task to update its 680 677 * position within the tree: 681 678 */ 682 - dequeue_entity(cfs_rq, curr, 0, now); 683 - enqueue_entity(cfs_rq, curr, 0, now); 679 + dequeue_entity(cfs_rq, curr, 0); 680 + enqueue_entity(cfs_rq, curr, 0); 684 681 685 682 /* 686 683 * Reschedule if another task tops the current one. ··· 782 785 * increased. 
Here we update the fair scheduling stats and 783 786 * then put the task into the rbtree: 784 787 */ 785 - static void 786 - enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup, u64 now) 788 + static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup) 787 789 { 788 790 struct cfs_rq *cfs_rq; 789 791 struct sched_entity *se = &p->se; ··· 791 795 if (se->on_rq) 792 796 break; 793 797 cfs_rq = cfs_rq_of(se); 794 - enqueue_entity(cfs_rq, se, wakeup, now); 798 + enqueue_entity(cfs_rq, se, wakeup); 795 799 } 796 800 } 797 801 ··· 800 804 * decreased. We remove the task from the rbtree and 801 805 * update the fair scheduling stats: 802 806 */ 803 - static void 804 - dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep, u64 now) 807 + static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep) 805 808 { 806 809 struct cfs_rq *cfs_rq; 807 810 struct sched_entity *se = &p->se; 808 811 809 812 for_each_sched_entity(se) { 810 813 cfs_rq = cfs_rq_of(se); 811 - dequeue_entity(cfs_rq, se, sleep, now); 814 + dequeue_entity(cfs_rq, se, sleep); 812 815 /* Don't dequeue parent if it has other entities besides us */ 813 816 if (cfs_rq->load.weight) 814 817 break; ··· 820 825 static void yield_task_fair(struct rq *rq, struct task_struct *p) 821 826 { 822 827 struct cfs_rq *cfs_rq = task_cfs_rq(p); 823 - u64 now = __rq_clock(rq); 824 828 829 + __update_rq_clock(rq); 825 830 /* 826 831 * Dequeue and enqueue the task to update its 827 832 * position within the tree: 828 833 */ 829 - dequeue_entity(cfs_rq, &p->se, 0, now); 830 - enqueue_entity(cfs_rq, &p->se, 0, now); 834 + dequeue_entity(cfs_rq, &p->se, 0); 835 + enqueue_entity(cfs_rq, &p->se, 0); 831 836 } 832 837 833 838 /* ··· 840 845 unsigned long gran; 841 846 842 847 if (unlikely(rt_prio(p->prio))) { 843 - update_curr(cfs_rq, rq_clock(rq)); 848 + update_rq_clock(rq); 849 + update_curr(cfs_rq); 844 850 resched_task(curr); 845 851 return; 846 852 } ··· 857 861 
__check_preempt_curr_fair(cfs_rq, &p->se, &curr->se, gran); 858 862 } 859 863 860 - static struct task_struct *pick_next_task_fair(struct rq *rq, u64 now) 864 + static struct task_struct *pick_next_task_fair(struct rq *rq) 861 865 { 862 866 struct cfs_rq *cfs_rq = &rq->cfs; 863 867 struct sched_entity *se; ··· 866 870 return NULL; 867 871 868 872 do { 869 - se = pick_next_entity(cfs_rq, now); 873 + se = pick_next_entity(cfs_rq); 870 874 cfs_rq = group_cfs_rq(se); 871 875 } while (cfs_rq); 872 876 ··· 876 880 /* 877 881 * Account for a descheduled task: 878 882 */ 879 - static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, u64 now) 883 + static void put_prev_task_fair(struct rq *rq, struct task_struct *prev) 880 884 { 881 885 struct sched_entity *se = &prev->se; 882 886 struct cfs_rq *cfs_rq; 883 887 884 888 for_each_sched_entity(se) { 885 889 cfs_rq = cfs_rq_of(se); 886 - put_prev_entity(cfs_rq, se, now); 890 + put_prev_entity(cfs_rq, se); 887 891 } 888 892 } 889 893 ··· 926 930 return __load_balance_iterator(cfs_rq, cfs_rq->rb_load_balance_curr); 927 931 } 928 932 933 + #ifdef CONFIG_FAIR_GROUP_SCHED 929 934 static int cfs_rq_best_prio(struct cfs_rq *cfs_rq) 930 935 { 931 936 struct sched_entity *curr; ··· 940 943 941 944 return p->prio; 942 945 } 946 + #endif 943 947 944 - static int 948 + static unsigned long 945 949 load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest, 946 - unsigned long max_nr_move, unsigned long max_load_move, 947 - struct sched_domain *sd, enum cpu_idle_type idle, 948 - int *all_pinned, unsigned long *total_load_moved) 950 + unsigned long max_nr_move, unsigned long max_load_move, 951 + struct sched_domain *sd, enum cpu_idle_type idle, 952 + int *all_pinned, int *this_best_prio) 949 953 { 950 954 struct cfs_rq *busy_cfs_rq; 951 955 unsigned long load_moved, total_nr_moved = 0, nr_moved; ··· 957 959 cfs_rq_iterator.next = load_balance_next_fair; 958 960 959 961 for_each_leaf_cfs_rq(busiest, busy_cfs_rq) { 
962 + #ifdef CONFIG_FAIR_GROUP_SCHED 960 963 struct cfs_rq *this_cfs_rq; 961 964 long imbalance; 962 965 unsigned long maxload; 963 - int this_best_prio, best_prio, best_prio_seen = 0; 964 966 965 967 this_cfs_rq = cpu_cfs_rq(busy_cfs_rq, this_cpu); 966 968 ··· 974 976 imbalance /= 2; 975 977 maxload = min(rem_load_move, imbalance); 976 978 977 - this_best_prio = cfs_rq_best_prio(this_cfs_rq); 978 - best_prio = cfs_rq_best_prio(busy_cfs_rq); 979 - 980 - /* 981 - * Enable handling of the case where there is more than one task 982 - * with the best priority. If the current running task is one 983 - * of those with prio==best_prio we know it won't be moved 984 - * and therefore it's safe to override the skip (based on load) 985 - * of any task we find with that prio. 986 - */ 987 - if (cfs_rq_curr(busy_cfs_rq) == &busiest->curr->se) 988 - best_prio_seen = 1; 989 - 979 + *this_best_prio = cfs_rq_best_prio(this_cfs_rq); 980 + #else 981 + #define maxload rem_load_move 982 + #endif 990 983 /* pass busy_cfs_rq argument into 991 984 * load_balance_[start|next]_fair iterators 992 985 */ 993 986 cfs_rq_iterator.arg = busy_cfs_rq; 994 987 nr_moved = balance_tasks(this_rq, this_cpu, busiest, 995 988 max_nr_move, maxload, sd, idle, all_pinned, 996 - &load_moved, this_best_prio, best_prio, 997 - best_prio_seen, &cfs_rq_iterator); 989 + &load_moved, this_best_prio, &cfs_rq_iterator); 998 990 999 991 total_nr_moved += nr_moved; 1000 992 max_nr_move -= nr_moved; ··· 994 1006 break; 995 1007 } 996 1008 997 - *total_load_moved = max_load_move - rem_load_move; 998 - 999 - return total_nr_moved; 1009 + return max_load_move - rem_load_move; 1000 1010 1001 1011 } 1002 1012 1003 1013 /* ··· 1018 1032 * monopolize the CPU. Note: the parent runqueue is locked, 1019 1033 * the child is not running yet. 
1020 1034 */ 1021 - static void task_new_fair(struct rq *rq, struct task_struct *p, u64 now) 1035 + static void task_new_fair(struct rq *rq, struct task_struct *p) 1022 1036 { 1023 1037 struct cfs_rq *cfs_rq = task_cfs_rq(p); 1024 1038 struct sched_entity *se = &p->se; 1025 1039 1026 1040 sched_info_queued(p); 1027 1041 1028 - update_stats_enqueue(cfs_rq, se, now); 1042 + update_stats_enqueue(cfs_rq, se); 1029 1043 /* 1030 1044 * Child runs first: we let it run before the parent 1031 1045 * until it reschedules once. We set up the key so that ··· 1058 1072 */ 1059 1073 static void set_curr_task_fair(struct rq *rq) 1060 1074 { 1061 - struct task_struct *curr = rq->curr; 1062 - struct sched_entity *se = &curr->se; 1063 - u64 now = rq_clock(rq); 1064 - struct cfs_rq *cfs_rq; 1075 + struct sched_entity *se = &rq->curr->se; 1065 1076 1066 - for_each_sched_entity(se) { 1067 - cfs_rq = cfs_rq_of(se); 1068 - set_next_entity(cfs_rq, se, now); 1069 - } 1077 + for_each_sched_entity(se) 1078 + set_next_entity(cfs_rq_of(se), se); 1070 1079 } 1071 1080 #else 1072 1081 static void set_curr_task_fair(struct rq *rq) ··· 1090 1109 }; 1091 1110 1092 1111 #ifdef CONFIG_SCHED_DEBUG 1093 - void print_cfs_stats(struct seq_file *m, int cpu, u64 now) 1112 + static void print_cfs_stats(struct seq_file *m, int cpu) 1094 1113 { 1095 - struct rq *rq = cpu_rq(cpu); 1096 1114 struct cfs_rq *cfs_rq; 1097 1115 1098 - for_each_leaf_cfs_rq(rq, cfs_rq) 1099 - print_cfs_rq(m, cpu, cfs_rq, now); 1116 + for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq) 1117 + print_cfs_rq(m, cpu, cfs_rq); 1100 1118 } 1101 1119 #endif
+5 -5
kernel/sched_idletask.c
··· 13 13 resched_task(rq->idle); 14 14 } 15 15 16 - static struct task_struct *pick_next_task_idle(struct rq *rq, u64 now) 16 + static struct task_struct *pick_next_task_idle(struct rq *rq) 17 17 { 18 18 schedstat_inc(rq, sched_goidle); 19 19 ··· 25 25 * message if some code attempts to do it: 26 26 */ 27 27 static void 28 - dequeue_task_idle(struct rq *rq, struct task_struct *p, int sleep, u64 now) 28 + dequeue_task_idle(struct rq *rq, struct task_struct *p, int sleep) 29 29 { 30 30 spin_unlock_irq(&rq->lock); 31 31 printk(KERN_ERR "bad: scheduling from the idle thread!\n"); ··· 33 33 spin_lock_irq(&rq->lock); 34 34 } 35 35 36 - static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, u64 now) 36 + static void put_prev_task_idle(struct rq *rq, struct task_struct *prev) 37 37 { 38 38 } 39 39 40 - static int 40 + static unsigned long 41 41 load_balance_idle(struct rq *this_rq, int this_cpu, struct rq *busiest, 42 42 unsigned long max_nr_move, unsigned long max_load_move, 43 43 struct sched_domain *sd, enum cpu_idle_type idle, 44 - int *all_pinned, unsigned long *total_load_moved) 44 + int *all_pinned, int *this_best_prio) 45 45 { 46 46 return 0; 47 47 }
+16 -32
kernel/sched_rt.c
··· 7 7 * Update the current task's runtime statistics. Skip current tasks that 8 8 * are not in our scheduling class. 9 9 */ 10 - static inline void update_curr_rt(struct rq *rq, u64 now) 10 + static inline void update_curr_rt(struct rq *rq) 11 11 { 12 12 struct task_struct *curr = rq->curr; 13 13 u64 delta_exec; ··· 15 15 if (!task_has_rt_policy(curr)) 16 16 return; 17 17 18 - delta_exec = now - curr->se.exec_start; 18 + delta_exec = rq->clock - curr->se.exec_start; 19 19 if (unlikely((s64)delta_exec < 0)) 20 20 delta_exec = 0; 21 21 22 22 schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec)); 23 23 24 24 curr->se.sum_exec_runtime += delta_exec; 25 - curr->se.exec_start = now; 25 + curr->se.exec_start = rq->clock; 26 26 } 27 27 28 - static void 29 - enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup, u64 now) 28 + static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup) 30 29 { 31 30 struct rt_prio_array *array = &rq->rt.active; 32 31 ··· 36 37 /* 37 38 * Adding/removing a task to/from a priority array: 38 39 */ 39 - static void 40 - dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep, u64 now) 40 + static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep) 41 41 { 42 42 struct rt_prio_array *array = &rq->rt.active; 43 43 44 - update_curr_rt(rq, now); 44 + update_curr_rt(rq); 45 45 46 46 list_del(&p->run_list); 47 47 if (list_empty(array->queue + p->prio)) ··· 73 75 resched_task(rq->curr); 74 76 } 75 77 76 - static struct task_struct *pick_next_task_rt(struct rq *rq, u64 now) 78 + static struct task_struct *pick_next_task_rt(struct rq *rq) 77 79 { 78 80 struct rt_prio_array *array = &rq->rt.active; 79 81 struct task_struct *next; ··· 87 89 queue = array->queue + idx; 88 90 next = list_entry(queue->next, struct task_struct, run_list); 89 91 90 - next->se.exec_start = now; 92 + next->se.exec_start = rq->clock; 91 93 92 94 return next; 93 95 } 94 96 95 - static void 
put_prev_task_rt(struct rq *rq, struct task_struct *p, u64 now) 97 + static void put_prev_task_rt(struct rq *rq, struct task_struct *p) 96 98 { 97 - update_curr_rt(rq, now); 99 + update_curr_rt(rq); 98 100 p->se.exec_start = 0; 99 101 } 100 102 ··· 170 172 return p; 171 173 } 172 174 173 - static int 175 + static unsigned long 174 176 load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest, 175 177 unsigned long max_nr_move, unsigned long max_load_move, 176 178 struct sched_domain *sd, enum cpu_idle_type idle, 177 - int *all_pinned, unsigned long *load_moved) 179 + int *all_pinned, int *this_best_prio) 178 180 { 179 - int this_best_prio, best_prio, best_prio_seen = 0; 180 181 int nr_moved; 181 182 struct rq_iterator rt_rq_iterator; 182 - 183 - best_prio = sched_find_first_bit(busiest->rt.active.bitmap); 184 - this_best_prio = sched_find_first_bit(this_rq->rt.active.bitmap); 185 - 186 - /* 187 - * Enable handling of the case where there is more than one task 188 - * with the best priority. If the current running task is one 189 - * of those with prio==best_prio we know it won't be moved 190 - * and therefore it's safe to override the skip (based on load) 191 - * of any task we find with that prio. 192 - */ 193 - if (busiest->curr->prio == best_prio) 194 - best_prio_seen = 1; 183 + unsigned long load_moved; 195 184 196 185 rt_rq_iterator.start = load_balance_start_rt; 197 186 rt_rq_iterator.next = load_balance_next_rt; ··· 188 203 rt_rq_iterator.arg = busiest; 189 204 190 205 nr_moved = balance_tasks(this_rq, this_cpu, busiest, max_nr_move, 191 - max_load_move, sd, idle, all_pinned, load_moved, 192 - this_best_prio, best_prio, best_prio_seen, 193 - &rt_rq_iterator); 206 + max_load_move, sd, idle, all_pinned, &load_moved, 207 + this_best_prio, &rt_rq_iterator); 194 208 195 - return nr_moved; 209 + return load_moved; 196 210 } 197 211 198 212 static void task_tick_rt(struct rq *rq, struct task_struct *p)