
[PATCH] sched: resched and cpu_idle rework

Make some changes to the NEED_RESCHED and POLLING_NRFLAG handling to reduce
confusion and make their semantics rigid. This improves the efficiency of
resched_task and of several cpu_idle routines.

* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and since we hold it during resched_task, there is no need for an
atomic test-and-set there. The only other place the flag is set is
when the task's quantum expires, in the timer interrupt - that path is
protected because the rq lock is irq-safe.

- If TIF_NEED_RESCHED is already set, then we don't need to do anything.
It won't be cleared until the task gets schedule()d off.

- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.

- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.

Using these rules, we are able to remove the test-and-set operation in
resched_task, and make the previously vague semantics of POLLING_NRFLAG
explicit.

* In idle routines:
- Enter cpu_idle with preemption disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), though not all architectures have been updated yet.

- Many idle routines do a test-and-clear of TIF_NEED_RESCHED for some reason.
According to the resched_task rules, this isn't needed (and it actually
breaks the assumption that TIF_NEED_RESCHED is only cleared with the
runqueue lock held). So remove that. This generally saves one locked memory
operation when switching to the idle thread.

- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the
innermost polling idle loops. The above resched_task semantics allow it to
remain set up until the final need_resched() check before entering a halt
that requires an interrupt wakeup.

Many idle routines simply never enter such a halt, so POLLING_NRFLAG can
always be left set, completely eliminating resched IPIs when rescheduling
the idle task.

The window in which POLLING_NRFLAG is left set can be widened, further
reducing the chance of resched IPIs.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Authored by Nick Piggin, committed by Linus Torvalds
commit 64c7c8f8 (parent 5bfb5d69)

+297 -221
+89
Documentation/sched-arch.txt
···
+CPU Scheduler implementation hints for architecture specific code
+
+	Nick Piggin, 2005
+
+Context switch
+==============
+1. Runqueue locking
+By default, the switch_to arch function is called with the runqueue
+locked. This is usually not a problem unless switch_to may need to
+take the runqueue lock. This is usually due to a wake up operation in
+the context switch. See include/asm-ia64/system.h for an example.
+
+To request the scheduler call switch_to with the runqueue unlocked,
+you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file
+(typically the one where switch_to is defined).
+
+Unlocked context switches introduce only a very minor performance
+penalty to the core scheduler implementation in the CONFIG_SMP case.
+
+2. Interrupt status
+By default, the switch_to arch function is called with interrupts
+disabled. Interrupts may be enabled over the call if it is likely to
+introduce a significant interrupt latency by adding the line
+`#define __ARCH_WANT_INTERRUPTS_ON_CTXSW` in the same place as for
+unlocked context switches. This define also implies
+`__ARCH_WANT_UNLOCKED_CTXSW`. See include/asm-arm/system.h for an
+example.
+
+
+CPU idle
+========
+Your cpu_idle routines need to obey the following rules:
+
+1. Preempt should now be disabled over idle routines. It should only
+   be enabled to call schedule(), then disabled again.
+
+2. need_resched/TIF_NEED_RESCHED is only ever set, and will never
+   be cleared until the running task has called schedule(). Idle
+   threads need only ever query need_resched, and may never set or
+   clear it.
+
+3. When cpu_idle finds (need_resched() == 'true'), it should call
+   schedule(). It should not call schedule() otherwise.
+
+4. The only time interrupts need to be disabled when checking
+   need_resched is if we are about to sleep the processor until
+   the next interrupt (this doesn't provide any protection of
+   need_resched, it prevents losing an interrupt).
+
+4a. A common problem with this type of sleep is:
+
+	local_irq_disable();
+	if (!need_resched()) {
+		local_irq_enable();
+		*** resched interrupt arrives here ***
+		__asm__("sleep until next interrupt");
+	}
+
+5. TIF_POLLING_NRFLAG can be set by idle routines that do not
+   need an interrupt to wake them up when need_resched goes high.
+   In other words, they must be periodically polling need_resched,
+   although it may be reasonable to do some background work or enter
+   a low CPU priority.
+
+5a. If TIF_POLLING_NRFLAG is set, and we do decide to enter
+    an interrupt sleep, it needs to be cleared then a memory
+    barrier issued (followed by a test of need_resched with
+    interrupts disabled, as explained in 4).
+
+arch/i386/kernel/process.c has examples of both polling and
+sleeping idle functions.
+
+
+Possible arch/ problems
+=======================
+
+Possible arch problems I found (and either tried to fix or didn't):
+
+h8300 - Is such sleeping racy vs interrupts? (See #4a).
+	The H8/300 manual I found indicates yes, however disabling IRQs
+	over the sleep means only NMIs can wake it up, so it can't be
+	fixed easily without doing spin waiting.
+
+ia64 - is safe_halt call racy vs interrupts? (does it sleep?) (See #4a)
+
+sh64 - Is sleeping racy vs interrupts? (See #4a)
+
+sparc - IRQs on at this point(?), change local_irq_save to _disable.
+      - TODO: needs secondary CPUs to disable preempt (See #1)
+3 -7
arch/alpha/kernel/process.c
···
 #include "proto.h"
 #include "pci_impl.h"
 
-void default_idle(void)
-{
-	barrier();
-}
-
 void
 cpu_idle(void)
 {
+	set_thread_flag(TIF_POLLING_NRFLAG);
+
 	while (1) {
-		void (*idle)(void) = default_idle;
 		/* FIXME -- EV6 and LCA45 know how to power down
 		   the CPU. */
 
 		while (!need_resched())
-			idle();
+			cpu_relax();
 		schedule();
 	}
 }
+9 -5
arch/arm/kernel/process.c
···
  */
 void default_idle(void)
 {
-	local_irq_disable();
-	if (!need_resched() && !hlt_counter) {
-		timer_dyn_reprogram();
-		arch_idle();
+	if (hlt_counter)
+		cpu_relax();
+	else {
+		local_irq_disable();
+		if (!need_resched()) {
+			timer_dyn_reprogram();
+			arch_idle();
+		}
+		local_irq_enable();
 	}
-	local_irq_enable();
 }
 
 /*
+19 -1
arch/i386/kernel/apm.c
···
 static int apm_do_idle(void)
 {
 	u32 eax;
+	u8 ret = 0;
+	int idled = 0;
+	int polling;
 
-	if (apm_bios_call_simple(APM_FUNC_IDLE, 0, 0, &eax)) {
+	polling = test_thread_flag(TIF_POLLING_NRFLAG);
+	if (polling) {
+		clear_thread_flag(TIF_POLLING_NRFLAG);
+		smp_mb__after_clear_bit();
+	}
+	if (!need_resched()) {
+		idled = 1;
+		ret = apm_bios_call_simple(APM_FUNC_IDLE, 0, 0, &eax);
+	}
+	if (polling)
+		set_thread_flag(TIF_POLLING_NRFLAG);
+
+	if (!idled)
+		return 0;
+
+	if (ret) {
 		static unsigned long t;
 
 		/* This always fails on some SMP boards running UP kernels.
+28 -36
arch/i386/kernel/process.c
···
  */
 void default_idle(void)
 {
+	local_irq_enable();
+
 	if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
-		local_irq_disable();
-		if (!need_resched())
-			safe_halt();
-		else
-			local_irq_enable();
+		clear_thread_flag(TIF_POLLING_NRFLAG);
+		smp_mb__after_clear_bit();
+		while (!need_resched()) {
+			local_irq_disable();
+			if (!need_resched())
+				safe_halt();
+			else
+				local_irq_enable();
+		}
+		set_thread_flag(TIF_POLLING_NRFLAG);
 	} else {
-		cpu_relax();
+		while (!need_resched())
+			cpu_relax();
 	}
 }
 #ifdef CONFIG_APM_MODULE
···
  */
 static void poll_idle (void)
 {
-	int oldval;
-
 	local_irq_enable();
 
-	/*
-	 * Deal with another CPU just having chosen a thread to
-	 * run here:
-	 */
-	oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-
-	if (!oldval) {
-		set_thread_flag(TIF_POLLING_NRFLAG);
-		asm volatile(
-			"2:"
-			"testl %0, %1;"
-			"rep; nop;"
-			"je 2b;"
-			: : "i"(_TIF_NEED_RESCHED), "m" (current_thread_info()->flags));
-
-		clear_thread_flag(TIF_POLLING_NRFLAG);
-	} else {
-		set_need_resched();
-	}
+	asm volatile(
+		"2:"
+		"testl %0, %1;"
+		"rep; nop;"
+		"je 2b;"
+		: : "i"(_TIF_NEED_RESCHED), "m" (current_thread_info()->flags));
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
···
 void cpu_idle(void)
 {
 	int cpu = smp_processor_id();
+
+	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	/* endless idle loop with no priority at all */
 	while (1) {
···
 {
 	local_irq_enable();
 
-	if (!need_resched()) {
-		set_thread_flag(TIF_POLLING_NRFLAG);
-		do {
-			__monitor((void *)&current_thread_info()->flags, 0, 0);
-			if (need_resched())
-				break;
-			__mwait(0, 0);
-		} while (!need_resched());
-		clear_thread_flag(TIF_POLLING_NRFLAG);
+	while (!need_resched()) {
+		__monitor((void *)&current_thread_info()->flags, 0, 0);
+		smp_mb();
+		if (need_resched())
+			break;
+		__mwait(0, 0);
 	}
 }
+17 -15
arch/ia64/kernel/process.c
···
 default_idle (void)
 {
 	local_irq_enable();
-	while (!need_resched())
-		if (can_do_pal_halt)
-			safe_halt();
-		else
+	while (!need_resched()) {
+		if (can_do_pal_halt) {
+			local_irq_disable();
+			if (!need_resched())
+				safe_halt();
+			local_irq_enable();
+		} else
 			cpu_relax();
+	}
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
···
 cpu_idle (void)
 {
 	void (*mark_idle)(int) = ia64_mark_idle;
+	int cpu = smp_processor_id();
+	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	/* endless idle loop with no priority at all */
 	while (1) {
+		if (!need_resched()) {
+			void (*idle)(void);
 #ifdef CONFIG_SMP
-		if (!need_resched())
 			min_xtp();
 #endif
-		while (!need_resched()) {
-			void (*idle)(void);
-
 			if (__get_cpu_var(cpu_idle_state))
 				__get_cpu_var(cpu_idle_state) = 0;
 
···
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
-		}
-
-		if (mark_idle)
-			(*mark_idle)(0);
-
+			if (mark_idle)
+				(*mark_idle)(0);
 #ifdef CONFIG_SMP
-		normal_xtp();
+			normal_xtp();
 #endif
+		}
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
 		check_pgt_cache();
-		if (cpu_is_offline(smp_processor_id()))
+		if (cpu_is_offline(cpu))
 			play_dead();
 	}
 }
+2
arch/parisc/kernel/process.c
···
  */
 void cpu_idle(void)
 {
+	set_thread_flag(TIF_POLLING_NRFLAG);
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched())
+2 -8
arch/powerpc/platforms/iseries/setup.c
···
 static void iseries_dedicated_idle(void)
 {
 	long oldval;
+	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-
-		if (!oldval) {
-			set_thread_flag(TIF_POLLING_NRFLAG);
-
+		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
 				HMT_low();
···
 			}
 
 			HMT_medium();
-			clear_thread_flag(TIF_POLLING_NRFLAG);
-		} else {
-			set_need_resched();
 		}
 
 		ppc64_runlatch_on();
+4 -8
arch/powerpc/platforms/pseries/setup.c
···
 	 * more.
 	 */
 	clear_thread_flag(TIF_POLLING_NRFLAG);
+	smp_mb__after_clear_bit();
 
 	/*
 	 * SMT dynamic mode. Cede will result in this thread going
···
 		cede_processor();
 	else
 		local_irq_enable();
+	set_thread_flag(TIF_POLLING_NRFLAG);
 } else {
 	/*
 	 * Give the HV an opportunity at the processor, since we are
···
 
 static void pseries_dedicated_idle(void)
 {
-	long oldval;
 	struct paca_struct *lpaca = get_paca();
 	unsigned int cpu = smp_processor_id();
 	unsigned long start_snooze;
 	unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay);
+	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
 		/*
···
 		 */
 		lpaca->lppaca.idle = 1;
 
-		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-		if (!oldval) {
-			set_thread_flag(TIF_POLLING_NRFLAG);
-
+		if (!need_resched()) {
 			start_snooze = __get_tb() +
 				*smt_snooze_delay * tb_ticks_per_usec;
 
···
 		}
 
 		HMT_medium();
-		clear_thread_flag(TIF_POLLING_NRFLAG);
-	} else {
-		set_need_resched();
 	}
 
 	lpaca->lppaca.idle = 0;
+10 -10
arch/ppc/kernel/idle.c
···
 	int cpu = smp_processor_id();
 
 	for (;;) {
-		if (ppc_md.idle != NULL)
-			ppc_md.idle();
-		else
-			default_idle();
-		if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING)
-			cpu_die();
-		if (need_resched()) {
-			preempt_enable_no_resched();
-			schedule();
-			preempt_disable();
+		while (!need_resched()) {
+			if (ppc_md.idle != NULL)
+				ppc_md.idle();
+			else
+				default_idle();
 		}
 
+		if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING)
+			cpu_die();
+		preempt_enable_no_resched();
+		schedule();
+		preempt_disable();
 	}
 }
+2 -9
arch/ppc64/kernel/idle.c
···
 
 void default_idle(void)
 {
-	long oldval;
 	unsigned int cpu = smp_processor_id();
+	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-
-		if (!oldval) {
-			set_thread_flag(TIF_POLLING_NRFLAG);
-
+		if (!need_resched()) {
 			while (!need_resched() && !cpu_is_offline(cpu)) {
 				ppc64_runlatch_off();
 
···
 			}
 
 			HMT_medium();
-			clear_thread_flag(TIF_POLLING_NRFLAG);
-		} else {
-			set_need_resched();
 		}
 
 		ppc64_runlatch_on();
+8 -7
arch/s390/kernel/process.c
···
 {
 	int cpu, rc;
 
-	local_irq_disable();
-	if (need_resched()) {
-		local_irq_enable();
-		return;
-	}
-
 	/* CPU is going idle. */
 	cpu = smp_processor_id();
+
+	local_irq_disable();
+	if (need_resched()) {
+		local_irq_enable();
+		return;
+	}
+
 	rc = notifier_call_chain(&idle_chain, CPU_IDLE, (void *)(long) cpu);
 	if (rc != NOTIFY_OK && rc != NOTIFY_DONE)
 		BUG();
···
 	__ctl_set_bit(8, 15);
 
 #ifdef CONFIG_HOTPLUG_CPU
-	if (cpu_is_offline(smp_processor_id()))
+	if (cpu_is_offline(cpu))
 		cpu_die();
 #endif
 
+3 -9
arch/sh/kernel/process.c
···
 
 EXPORT_SYMBOL(enable_hlt);
 
-void default_idle(void)
+void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
 		if (hlt_counter) {
-			while (1)
-				if (need_resched())
-					break;
+			while (!need_resched())
+				cpu_relax();
 		} else {
 			while (!need_resched())
 				cpu_sleep();
···
 		schedule();
 		preempt_disable();
 	}
-}
-
-void cpu_idle(void)
-{
-	default_idle();
 }
 
 void machine_restart(char * __unused)
+3 -11
arch/sh64/kernel/process.c
···
 
 static inline void hlt(void)
 {
-	if (hlt_counter)
-		return;
-
 	__asm__ __volatile__ ("sleep" : : : "memory");
 }
 
 /*
  * The idle loop on a uniprocessor SH..
  */
-void default_idle(void)
+void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
 		if (hlt_counter) {
-			while (1)
-				if (need_resched())
-					break;
+			while (!need_resched())
+				cpu_relax();
 		} else {
 			local_irq_disable();
 			while (!need_resched()) {
···
 		schedule();
 		preempt_disable();
 	}
-}
 
-void cpu_idle(void)
-{
-	default_idle();
 }
 
 void machine_restart(char * __unused)
+15 -20
arch/sparc/kernel/process.c
···
 struct task_struct *last_task_used_math = NULL;
 struct thread_info *current_set[NR_CPUS];
 
-/*
- * default_idle is new in 2.5. XXX Review, currently stolen from sparc64.
- */
-void default_idle(void)
-{
-}
-
 #ifndef CONFIG_SMP
 
 #define SUN4C_FAULT_HIGH 100
···
 	static unsigned long fps;
 	unsigned long now;
 	unsigned long faults;
-	unsigned long flags;
 
 	extern unsigned long sun4c_kernel_faults;
 	extern void sun4c_grow_kernel_ring(void);
 
-	local_irq_save(flags);
+	local_irq_disable();
 	now = jiffies;
 	count -= (now - last_jiffies);
 	last_jiffies = now;
···
 			sun4c_grow_kernel_ring();
 		}
 	}
-	local_irq_restore(flags);
+	local_irq_enable();
 	}
 
-	while((!need_resched()) && pm_idle) {
-		(*pm_idle)();
+	if (pm_idle) {
+		while (!need_resched())
+			(*pm_idle)();
+	} else {
+		while (!need_resched())
+			cpu_relax();
 	}
-
 	preempt_enable_no_resched();
 	schedule();
 	preempt_disable();
···
 /* This is being executed in task 0 'user space'. */
 void cpu_idle(void)
 {
+	set_thread_flag(TIF_POLLING_NRFLAG);
 	/* endless idle loop with no priority at all */
 	while(1) {
-		if(need_resched()) {
-			preempt_enable_no_resched();
-			schedule();
-			preempt_disable();
-			check_pgt_cache();
-		}
-		barrier(); /* or else gcc optimizes... */
+		while (!need_resched())
+			cpu_relax();
+		preempt_enable_no_resched();
+		schedule();
+		preempt_disable();
+		check_pgt_cache();
 	}
 }
 
+14 -6
arch/sparc64/kernel/process.c
···
 
 /*
  * the idle loop on a UltraMultiPenguin...
+ *
+ * TIF_POLLING_NRFLAG is set because we do not sleep the cpu
+ * inside of the idler task, so an interrupt is not needed
+ * to get a clean fast response.
+ *
+ * XXX Reverify this assumption... -DaveM
+ *
+ * Addendum: We do want it to do something for the signal
+ *           delivery case, we detect that by just seeing
+ *           if we are trying to send this to an idler or not.
  */
-#define idle_me_harder()	(cpu_data(smp_processor_id()).idle_volume += 1)
-#define unidle_me()		(cpu_data(smp_processor_id()).idle_volume = 0)
 void cpu_idle(void)
 {
+	cpuinfo_sparc *cpuinfo = &local_cpu_data();
 	set_thread_flag(TIF_POLLING_NRFLAG);
+
 	while(1) {
 		if (need_resched()) {
-			unidle_me();
-			clear_thread_flag(TIF_POLLING_NRFLAG);
+			cpuinfo->idle_volume = 0;
 			preempt_enable_no_resched();
 			schedule();
 			preempt_disable();
-			set_thread_flag(TIF_POLLING_NRFLAG);
 			check_pgt_cache();
 		}
-		idle_me_harder();
+		cpuinfo->idle_volume++;
 
 		/* The store ordering is so that IRQ handlers on
 		 * other cpus see our increasing idleness for the buddy
+1 -12
arch/sparc64/kernel/smp.c
···
 	       (bogosum/(5000/HZ))%100);
 }
 
-/* This needn't do anything as we do not sleep the cpu
- * inside of the idler task, so an interrupt is not needed
- * to get a clean fast response.
- *
- * XXX Reverify this assumption... -DaveM
- *
- * Addendum: We do want it to do something for the signal
- *           delivery case, we detect that by just seeing
- *           if we are trying to send this to an idler or not.
- */
 void smp_send_reschedule(int cpu)
 {
-	if (cpu_data(cpu).idle_volume == 0)
-		smp_receive_signal(cpu);
+	smp_receive_signal(cpu);
 }
 
 /* This is a nop because we capture all other cpus
+31 -36
arch/x86_64/kernel/process.c
···
  */
 void default_idle(void)
 {
+	local_irq_enable();
+
 	if (!atomic_read(&hlt_counter)) {
-		local_irq_disable();
-		if (!need_resched())
-			safe_halt();
-		else
-			local_irq_enable();
+		clear_thread_flag(TIF_POLLING_NRFLAG);
+		smp_mb__after_clear_bit();
+		while (!need_resched()) {
+			local_irq_disable();
+			if (!need_resched())
+				safe_halt();
+			else
+				local_irq_enable();
+		}
+		set_thread_flag(TIF_POLLING_NRFLAG);
+	} else {
+		while (!need_resched())
+			cpu_relax();
 	}
 }
 
···
  */
 static void poll_idle (void)
 {
-	int oldval;
-
 	local_irq_enable();
 
-	/*
-	 * Deal with another CPU just having chosen a thread to
-	 * run here:
-	 */
-	oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-
-	if (!oldval) {
-		set_thread_flag(TIF_POLLING_NRFLAG);
-		asm volatile(
-			"2:"
-			"testl %0,%1;"
-			"rep; nop;"
-			"je 2b;"
-			: :
-			"i" (_TIF_NEED_RESCHED),
-			"m" (current_thread_info()->flags));
-		clear_thread_flag(TIF_POLLING_NRFLAG);
-	} else {
-		set_need_resched();
-	}
+	asm volatile(
+		"2:"
+		"testl %0,%1;"
+		"rep; nop;"
+		"je 2b;"
+		: :
+		"i" (_TIF_NEED_RESCHED),
+		"m" (current_thread_info()->flags));
 }
 
 void cpu_idle_wait(void)
···
  */
 void cpu_idle (void)
 {
+	set_thread_flag(TIF_POLLING_NRFLAG);
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		while (!need_resched()) {
···
 {
 	local_irq_enable();
 
-	if (!need_resched()) {
-		set_thread_flag(TIF_POLLING_NRFLAG);
-		do {
-			__monitor((void *)&current_thread_info()->flags, 0, 0);
-			if (need_resched())
-				break;
-			__mwait(0, 0);
-		} while (!need_resched());
-		clear_thread_flag(TIF_POLLING_NRFLAG);
+	while (!need_resched()) {
+		__monitor((void *)&current_thread_info()->flags, 0, 0);
+		smp_mb();
+		if (need_resched())
+			break;
+		__mwait(0, 0);
 	}
 }
 
+23 -14
drivers/acpi/processor_idle.c
···
 		return;
 	}
 
+static void acpi_safe_halt(void)
+{
+	int polling = test_thread_flag(TIF_POLLING_NRFLAG);
+	if (polling) {
+		clear_thread_flag(TIF_POLLING_NRFLAG);
+		smp_mb__after_clear_bit();
+	}
+	if (!need_resched())
+		safe_halt();
+	if (polling)
+		set_thread_flag(TIF_POLLING_NRFLAG);
+}
+
 static atomic_t c3_cpu_count;
 
 static void acpi_processor_idle(void)
···
 	int sleep_ticks = 0;
 	u32 t1, t2 = 0;
 
-	pr = processors[raw_smp_processor_id()];
+	pr = processors[smp_processor_id()];
 	if (!pr)
 		return;
 
···
 	}
 
 	cx = pr->power.state;
-	if (!cx)
-		goto easy_out;
+	if (!cx) {
+		if (pm_idle_save)
+			pm_idle_save();
+		else
+			acpi_safe_halt();
+		return;
+	}
 
 	/*
 	 * Check BM Activity
···
 		if (pm_idle_save)
 			pm_idle_save();
 		else
-			safe_halt();
+			acpi_safe_halt();
+
 		/*
 		 * TBD: Can't get time duration while in C1, as resumes
 		 * go to an ISR rather than here. Need to instrument
···
 	 */
 	if (next_state != pr->power.state)
 		acpi_processor_power_activate(pr, next_state);
-
-	return;
-
-easy_out:
-	/* do C1 instead of busy loop */
-	if (pm_idle_save)
-		pm_idle_save();
-	else
-		safe_halt();
-	return;
 }
 
 static int acpi_processor_set_power_policy(struct acpi_processor *pr)
+14 -7
kernel/sched.c
···
 #ifdef CONFIG_SMP
 static void resched_task(task_t *p)
 {
-	int need_resched, nrpolling;
+	int cpu;
 
 	assert_spin_locked(&task_rq(p)->lock);
 
-	/* minimise the chance of sending an interrupt to poll_idle() */
-	nrpolling = test_tsk_thread_flag(p,TIF_POLLING_NRFLAG);
-	need_resched = test_and_set_tsk_thread_flag(p,TIF_NEED_RESCHED);
-	nrpolling |= test_tsk_thread_flag(p,TIF_POLLING_NRFLAG);
+	if (unlikely(test_tsk_thread_flag(p, TIF_NEED_RESCHED)))
+		return;
 
-	if (!need_resched && !nrpolling && (task_cpu(p) != smp_processor_id()))
-		smp_send_reschedule(task_cpu(p));
+	set_tsk_thread_flag(p, TIF_NEED_RESCHED);
+
+	cpu = task_cpu(p);
+	if (cpu == smp_processor_id())
+		return;
+
+	/* NEED_RESCHED must be visible before we test POLLING_NRFLAG */
+	smp_mb();
+	if (!test_tsk_thread_flag(p, TIF_POLLING_NRFLAG))
+		smp_send_reschedule(cpu);
 }
 #else
 static inline void resched_task(task_t *p)
 {
+	assert_spin_locked(&task_rq(p)->lock);
 	set_tsk_need_resched(p);
 }
 #endif