
sched, net: Fixup busy_loop_us_clock()

The only valid use of preempt_enable_no_resched() is if the very next
line is schedule(), or if we know preemption cannot actually be enabled
by that statement because other preempt_count 'refs' are still held.
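
As an illustration, a minimal sketch of the only legitimate pattern, in
generic kernel style (this is not code from the patch below):

	preempt_disable();
	/* ... work that must not race with involuntary preemption ... */
	preempt_enable_no_resched();	/* skipping the resched check is fine ... */
	schedule();			/* ... because we reschedule right here anyway */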

This busy_poll stuff looks to be completely and utterly broken:
sched_clock() can return utter garbage with interrupts enabled (rare,
but still), and it can drift unbounded between CPUs.

This means that if you get preempted/migrated and your new CPU's clock
is years behind that of the previous CPU, we get to busy spin for a
_very_ long time.
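
As a hypothetical sketch of that failure mode (packet_arrived() and
busy_poll_usecs are invented names for illustration, not the real
busy-poll code):

	u64 end = busy_loop_us_clock() + busy_poll_usecs;	/* clock read on CPU A */

	while (!packet_arrived(sk)) {
		/* preemption/migration to CPU B can happen here */
		if (busy_loop_us_clock() > end)	/* CPU B's sched_clock() may be far behind CPU A's */
			break;
		cpu_relax();
	}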

There is a _REASON_ sched_clock() warns about preemptability: papering
over it with a preempt_disable()/preempt_enable_no_resched() pair is
just terminal brain damage on so many levels.

Replace sched_clock() usage with local_clock() which has a bounded
drift between CPUs (<2 jiffies).

There is a further problem with the entire busy-wait poll scheme in
that the spin time is additive to the syscall timeout, not inclusive of it.
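
Sketched with hypothetical helpers (do_busy_poll() and wait_for_events()
are invented names, not the real code paths): the time burnt spinning is
not charged against the caller's timeout, so the worst-case blocking time
is the busy-poll time plus the timeout rather than just the timeout.

	do_busy_poll(sk);			/* spins for up to sysctl_net_busy_poll usecs */
	ret = wait_for_events(sk, timeout);	/* the full timeout is still applied here;
						 * the spin above is added to it rather
						 * than deducted from it */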

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Peter Zijlstra, committed by Ingo Molnar
37089834 1774e9f3

+1 -18
include/net/busy_poll.h
@@ -42,27 +42,10 @@
 	return sysctl_net_busy_poll;
 }
 
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
 static inline u64 busy_loop_us_clock(void)
 {
-	u64 rc;
-
-	preempt_disable_notrace();
-	rc = sched_clock();
-	preempt_enable_no_resched_notrace();
-
-	return rc >> 10;
+	return local_clock() >> 10;
 }
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
-	return sched_clock() >> 10;
-}
-#endif /* CONFIG_DEBUG_PREEMPT */
 
 static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
 {