 	RCU rather than SRCU, because RCU is almost always faster and
 	easier to use than is SRCU.
 
+	If you need to enter your read-side critical section in a
+	hardirq or exception handler, and then exit that same read-side
+	critical section in the task that was interrupted, then you need
+	to use srcu_read_lock_raw() and srcu_read_unlock_raw(), which
+	avoid the lockdep checking that would otherwise make this
+	practice illegal.
+
 	Also unlike other forms of RCU, explicit initialization
 	and cleanup is required via init_srcu_struct() and
 	cleanup_srcu_struct().  These are passed a "struct srcu_struct"
Documentation/RCU/rcu.txt (+5, -5)
 
 	Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the
 	same effect, but require that the readers manipulate CPU-local
-	counters.  These counters allow limited types of blocking
-	within RCU read-side critical sections.  SRCU also uses
-	CPU-local counters, and permits general blocking within
-	RCU read-side critical sections.  These two variants of
-	RCU detect grace periods by sampling these counters.
+	counters.  These counters allow limited types of blocking within
+	RCU read-side critical sections.  SRCU also uses CPU-local
+	counters, and permits general blocking within RCU read-side
+	critical sections.  These variants of RCU detect grace periods
+	by sampling these counters.
 
 o	If I am running on a uniprocessor kernel, which can only do one
 	thing at a time, why should I wait for a grace period?
Documentation/RCU/stallwarn.txt (+10, -6)
 	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
 	messages.
 
+o	A hardware or software issue shuts off the scheduler-clock
+	interrupt on a CPU that is not in dyntick-idle mode.  This
+	problem really has happened, and seems to be most likely to
+	result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+
 o	A bug in the RCU implementation.
 
 o	A hardware failure.  This is quite unlikely, but has occurred
···
 	This resulted in a series of RCU CPU stall warnings, eventually
 	leading to the realization that the CPU had failed.
 
-The RCU, RCU-sched, and RCU-bh implementations have CPU stall
-warnings.  SRCU does not have its own CPU stall warnings, but its
-calls to synchronize_sched() will result in RCU-sched detecting
-RCU-sched-related CPU stalls.  Please note that RCU only detects
-CPU stalls when there is a grace period in progress.  No grace period,
-no CPU stall warnings.
+The RCU, RCU-sched, and RCU-bh implementations have CPU stall warnings.
+SRCU does not have its own CPU stall warnings, but its calls to
+synchronize_sched() will result in RCU-sched detecting RCU-sched-related
+CPU stalls.  Please note that RCU only detects CPU stalls when there is
+a grace period in progress.  No grace period, no CPU stall warnings.
 
 To diagnose the cause of the stall, inspect the stack traces.
 The offending function will usually be near the top of the stack.
Documentation/RCU/torture.txt (+13)
 	To properly exercise RCU implementations with preemptible
 	read-side critical sections.
 
+onoff_interval
+		The number of seconds between each attempt to execute a
+		randomly selected CPU-hotplug operation.  Defaults to
+		zero, which disables CPU hotplugging.  In HOTPLUG_CPU=n
+		kernels, rcutorture will silently refuse to do any
+		CPU-hotplug operations regardless of what value is
+		specified for onoff_interval.
+
 shuffle_interval
 		The number of seconds to keep the test threads affinitied
 		to a particular subset of the CPUs, defaults to 3 seconds.
 		Used in conjunction with test_no_idle_hz.
+
+shutdown_secs	The number of seconds to run the test before terminating
+		the test and powering off the system.  The default is
+		zero, which disables test termination and system shutdown.
+		This capability is useful for automated testing.
 
 stat_interval	The number of seconds between output of torture
 		statistics (via printk()).  Regardless of the interval,
Documentation/RCU/trace.txt (-4)
 	or one greater than the interrupt-nesting depth otherwise.
 	The number after the second "/" is the NMI nesting depth.
 
-	This field is displayed only for CONFIG_NO_HZ kernels.
-
 o	"df" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being in
 	dynticks-idle state.
-
-	This field is displayed only for CONFIG_NO_HZ kernels.
 
 o	"of" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being
Documentation/RCU/whatisRCU.txt (+14, -5)
 1.	What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
 2.	What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
 3.	RCU part 3: the RCU API      http://lwn.net/Articles/264090/
+4.	The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
 
 
 What is RCU?
···
 
 	srcu_read_lock		synchronize_srcu	N/A
 	srcu_read_unlock	synchronize_srcu_expedited
+	srcu_read_lock_raw
+	srcu_read_unlock_raw
 	srcu_dereference
 
 SRCU:	Initialization/cleanup
···
 
 a.	Will readers need to block?  If so, you need SRCU.
 
-b.	What about the -rt patchset?  If readers would need to block
+b.	Is it necessary to start a read-side critical section in a
+	hardirq handler or exception handler, and then to complete
+	this read-side critical section in the task that was
+	interrupted?  If so, you need SRCU's srcu_read_lock_raw() and
+	srcu_read_unlock_raw() primitives.
+
+c.	What about the -rt patchset?  If readers would need to block
 	in a non-rt kernel, you need SRCU.  If readers would block
 	in a -rt kernel, but not in a non-rt kernel, SRCU is not
 	necessary.
 
-c.	Do you need to treat NMI handlers, hardirq handlers,
+d.	Do you need to treat NMI handlers, hardirq handlers,
 	and code segments with preemption disabled (whether
 	via preempt_disable(), local_irq_save(), local_bh_disable(),
 	or some other mechanism) as if they were explicit RCU readers?
 	If so, you need RCU-sched.
 
-d.	Do you need RCU grace periods to complete even in the face
+e.	Do you need RCU grace periods to complete even in the face
 	of softirq monopolization of one or more of the CPUs?  For
 	example, is your code subject to network-based denial-of-service
 	attacks?  If so, you need RCU-bh.
 
-e.	Is your workload too update-intensive for normal use of
+f.	Is your workload too update-intensive for normal use of
 	RCU, but inappropriate for other synchronization mechanisms?
 	If so, consider SLAB_DESTROY_BY_RCU.  But please be careful!
 
-f.	Otherwise, use RCU.
+g.	Otherwise, use RCU.
 
 Of course, this all assumes that you have determined that RCU is in fact
 the right tool for your job.
Documentation/atomic_ops.txt (+87)
 
 *** YOU HAVE BEEN WARNED! ***
 
+Properly aligned pointers, longs, ints, and chars (and unsigned
+equivalents) may be atomically loaded from and stored to in the same
+sense as described for atomic_read() and atomic_set().  The ACCESS_ONCE()
+macro should be used to prevent the compiler from using optimizations
+that might otherwise optimize accesses out of existence on the one hand,
+or that might create unsolicited accesses on the other.
+
+For example, consider the following code:
+
+	while (a > 0)
+		do_something();
+
+If the compiler can prove that do_something() does not store to the
+variable a, then the compiler is within its rights to transform this to
+the following:
+
+	tmp = a;
+	if (tmp > 0)
+		for (;;)
+			do_something();
+
+If you don't want the compiler to do this (and you probably don't), then
+you should use something like the following:
+
+	while (ACCESS_ONCE(a) > 0)
+		do_something();
+
+Alternatively, you could place a barrier() call in the loop.
+
+For another example, consider the following code:
+
+	tmp_a = a;
+	do_something_with(tmp_a);
+	do_something_else_with(tmp_a);
+
+If the compiler can prove that do_something_with() does not store to the
+variable a, then the compiler is within its rights to manufacture an
+additional load as follows:
+
+	tmp_a = a;
+	do_something_with(tmp_a);
+	tmp_a = a;
+	do_something_else_with(tmp_a);
+
+This could fatally confuse your code if it expected the same value
+to be passed to do_something_with() and do_something_else_with().
+
+The compiler would be likely to manufacture this additional load if
+do_something_with() was an inline function that made very heavy use
+of registers: reloading from variable a could save a flush to the
+stack and later reload.  To prevent the compiler from attacking your
+code in this manner, write the following:
+
+	tmp_a = ACCESS_ONCE(a);
+	do_something_with(tmp_a);
+	do_something_else_with(tmp_a);
+
+For a final example, consider the following code, assuming that the
+variable a is set at boot time before the second CPU is brought online
+and never changed later, so that memory barriers are not needed:
+
+	if (a)
+		b = 9;
+	else
+		b = 42;
+
+The compiler is within its rights to manufacture an additional store
+by transforming the above code into the following:
+
+	b = 42;
+	if (a)
+		b = 9;
+
+This could come as a fatal surprise to other code running concurrently
+that expected b to never have the value 42 if a was zero.  To prevent
+the compiler from doing this, write something like:
+
+	if (a)
+		ACCESS_ONCE(b) = 9;
+	else
+		ACCESS_ONCE(b) = 42;
+
+Don't even -think- about doing this without proper use of memory barriers,
+locks, or atomic operations if variable a can change at runtime!
+
+*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
+
 Now, we move onto the atomic operation interfaces typically implemented with
 the help of assembly code.
Documentation/lockdep-design.txt (+63)
 table, which hash-table can be checked in a lockfree manner.  If the
 locking chain occurs again later on, the hash table tells us that we
 don't have to validate the chain again.
+
+Troubleshooting:
+----------------
+
+The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
+Exceeding this number will trigger the following lockdep warning:
+
+	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+
+By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
+desktop systems have less than 1,000 lock classes, so this warning
+normally results from lock-class leakage or failure to properly
+initialize locks.  These two problems are illustrated below:
+
+1.	Repeated module loading and unloading while running the validator
+	will result in lock-class leakage.  The issue here is that each
+	load of the module will create a new set of lock classes for
+	that module's locks, but module unloading does not remove old
+	classes (see below discussion of reuse of lock classes for why).
+	Therefore, if that module is loaded and unloaded repeatedly,
+	the number of lock classes will eventually reach the maximum.
+
+2.	Using structures such as arrays that have large numbers of
+	locks that are not explicitly initialized.  For example,
+	a hash table with 8192 buckets where each bucket has its own
+	spinlock_t will consume 8192 lock classes -unless- each spinlock
+	is explicitly initialized at runtime, for example, using the
+	run-time spin_lock_init() as opposed to compile-time initializers
+	such as __SPIN_LOCK_UNLOCKED().  Failure to properly initialize
+	the per-bucket spinlocks would guarantee lock-class overflow.
+	In contrast, a loop that called spin_lock_init() on each lock
+	would place all 8192 locks into a single lock class.
+
+	The moral of this story is that you should always explicitly
+	initialize your locks.
+
+One might argue that the validator should be modified to allow
+lock classes to be reused.  However, if you are tempted to make this
+argument, first review the code and think through the changes that would
+be required, keeping in mind that the lock classes to be removed are
+likely to be linked into the lock-dependency graph.  This turns out to
+be harder to do than to say.
+
+Of course, if you do run out of lock classes, the next thing to do is
+to find the offending lock classes.  First, the following command gives
+you the number of lock classes currently in use along with the maximum:
+
+	grep "lock-classes" /proc/lockdep_stats
+
+This command produces the following output on a modest system:
+
+	lock-classes:                          748 [max: 8191]
+
+If the number allocated (748 above) increases continually over time,
+then there is likely a leak.  The following command can be used to
+identify the leaking lock classes:
+
+	grep "BD" /proc/lockdep
+
+Run the command and save the output, then compare against the output from
+a later run of this command to identify the leakers.  This same output
+can also help you find situations where runtime lock initialization has
+been omitted.
arch/arm/kernel/process.c (+4, -2)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
···
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/avr32/kernel/process.c (+4, -2)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/blackfin/kernel/process.c (+4, -2)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/microblaze/kernel/process.c (+4, -2)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 
 		preempt_enable_no_resched();
 		schedule();
arch/mips/kernel/process.c (+4, -2)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
···
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/openrisc/kernel/idle.c (+4, -2)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 
 		while (!need_resched()) {
 			check_pgt_cache();
···
 			set_thread_flag(TIF_POLLING_NRFLAG);
 		}
 
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/powerpc/kernel/idle.c (+13, -2)
 }
 __setup("powersave=off", powersave_off);
 
+#if defined(CONFIG_PPC_PSERIES) && defined(CONFIG_TRACEPOINTS)
+static const bool idle_uses_rcu = 1;
+#else
+static const bool idle_uses_rcu;
+#endif
+
 /*
  * The body of the idle task.
  */
···
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		if (!idle_uses_rcu)
+			rcu_idle_enter();
+
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
···
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		if (!idle_uses_rcu)
+			rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
arch/powerpc/platforms/iseries/setup.c (+8, -4)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
···
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
···
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
···
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched())
 			default_idle();
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/sh/kernel/idle.c (+4, -2)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 
 		while (!need_resched()) {
 			check_pgt_cache();
···
 			start_critical_timings();
 		}
 
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();	/* no HOTPLUG_CPU */
···
 			local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/tile/mm/fault.c (+2, -2)
 	if (unlikely(tsk->pid < 2)) {
 		panic("Signal %d (code %d) at %#lx sent to %s!",
 		      si_signo, si_code & 0xffff, address,
-		      tsk->pid ? "init" : "the idle task");
+		      is_idle_task(tsk) ? "the idle task" : "init");
 	}
 
 	info.si_signo = si_signo;
···
 
 	if (unlikely(tsk->pid < 2)) {
 		panic("Kernel page fault running %s!",
-		      tsk->pid ? "init" : "the idle task");
+		      is_idle_task(tsk) ? "the idle task" : "init");
 	}
 
 	/*
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
···
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/x86/kernel/apic/apic.c (+3, -3)
 	 * Besides, if we don't timer interrupts ignore the global
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	local_apic_timer_interrupt();
 	irq_exit();
···
 {
 	u32 v;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
 	 * if it is a vectored one.  Just in case...
···
 		"Illegal register address",	/* APIC Error Bit 7 */
 	};
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v0 = apic_read(APIC_ESR);
 	apic_write(APIC_ESR, 0);
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
+		rcu_idle_enter();
 		while (!need_resched()) {
 
 			check_pgt_cache();
···
 				pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
arch/x86/kernel/process_64.c (+8, -2)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 
 			rmb();
···
 			enter_idle();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
+
+			/* enter_idle() needs rcu for notifiers */
+			rcu_idle_enter();
+
 			if (cpuidle_idle_call())
 				pm_idle();
+
+			rcu_idle_exit();
 			start_critical_timings();
 
 			/* In many cases the interrupt that ended idle
···
 			__exit_idle();
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
drivers/base/cpu.c (+7)
 }
 EXPORT_SYMBOL_GPL(get_cpu_sysdev);
 
+bool cpu_is_hotpluggable(unsigned cpu)
+{
+	struct sys_device *dev = get_cpu_sysdev(cpu);
+	return dev && container_of(dev, struct cpu, sysdev)->hotpluggable;
+}
+EXPORT_SYMBOL_GPL(cpu_is_hotpluggable);
+
 int __init cpu_dev_init(void)
 {
 	int err;
include/linux/cpu.h (+1)
 
 extern int register_cpu(struct cpu *cpu, int num);
 extern struct sys_device *get_cpu_sysdev(unsigned cpu);
+extern bool cpu_is_hotpluggable(unsigned cpu);
 
 extern int cpu_add_sysdev_attr(struct sysdev_attribute *attr);
 extern void cpu_remove_sysdev_attr(struct sysdev_attribute *attr);
include/linux/hardirq.h (-21)
 extern void account_system_vtime(struct task_struct *tsk);
 #endif
 
-#if defined(CONFIG_NO_HZ)
 #if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-static inline void rcu_irq_enter(void)
-{
-	rcu_exit_nohz();
-}
-
-static inline void rcu_irq_exit(void)
-{
-	rcu_enter_nohz();
-}
 
 static inline void rcu_nmi_enter(void)
 {
···
 }
 
 #else
-extern void rcu_irq_enter(void);
-extern void rcu_irq_exit(void);
 extern void rcu_nmi_enter(void);
 extern void rcu_nmi_exit(void);
 #endif
-#else
-# define rcu_irq_enter() do { } while (0)
-# define rcu_irq_exit() do { } while (0)
-# define rcu_nmi_enter() do { } while (0)
-# define rcu_nmi_exit() do { } while (0)
-#endif /* #if defined(CONFIG_NO_HZ) */
 
 /*
  * It is safe to do non-atomic ops on ->hardirq_context,
include/linux/rcupdate.h (+73, -42)
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
 extern void rcutorture_record_test_transition(void);
 extern void rcutorture_record_progress(unsigned long vernum);
+extern void do_trace_rcu_torture_read(char *rcutorturename,
+				      struct rcu_head *rhp);
 #else
 static inline void rcutorture_record_test_transition(void)
 {
···
 static inline void rcutorture_record_progress(unsigned long vernum)
 {
 }
+#ifdef CONFIG_RCU_TRACE
+extern void do_trace_rcu_torture_read(char *rcutorturename,
+				      struct rcu_head *rhp);
+#else
+#define do_trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)
+#endif
 #endif
 
 #define UINT_CMP_GE(a, b)	(UINT_MAX / 2 >= (a) - (b))
···
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
 struct notifier_block;
-
-#ifdef CONFIG_NO_HZ
-
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-#else /* #ifdef CONFIG_NO_HZ */
-
-static inline void rcu_enter_nohz(void)
-{
-}
-
-static inline void rcu_exit_nohz(void)
-{
-}
-
-#endif /* #else #ifdef CONFIG_NO_HZ */
+extern void rcu_idle_enter(void);
+extern void rcu_idle_exit(void);
+extern void rcu_irq_enter(void);
+extern void rcu_irq_exit(void);
 
 /*
  * Infrastructure to implement the synchronize_() primitives in
···
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
+#ifdef CONFIG_PROVE_RCU
+extern int rcu_is_cpu_idle(void);
+#else /* !CONFIG_PROVE_RCU */
+static inline int rcu_is_cpu_idle(void)
+{
+	return 0;
+}
+#endif /* else !CONFIG_PROVE_RCU */
+
+static inline void rcu_lock_acquire(struct lockdep_map *map)
+{
+	WARN_ON_ONCE(rcu_is_cpu_idle());
+	lock_acquire(map, 0, 0, 2, 1, NULL, _THIS_IP_);
+}
+
+static inline void rcu_lock_release(struct lockdep_map *map)
+{
+	WARN_ON_ONCE(rcu_is_cpu_idle());
+	lock_release(map, 1, _THIS_IP_);
+}
+
 extern struct lockdep_map rcu_lock_map;
-# define rcu_read_acquire() \
-		lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release()	lock_release(&rcu_lock_map, 1, _THIS_IP_)
-
 extern struct lockdep_map rcu_bh_lock_map;
-# define rcu_read_acquire_bh() \
-		lock_acquire(&rcu_bh_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release_bh()	lock_release(&rcu_bh_lock_map, 1, _THIS_IP_)
-
 extern struct lockdep_map rcu_sched_lock_map;
-# define rcu_read_acquire_sched() \
-		lock_acquire(&rcu_sched_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release_sched() \
-		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
-
 extern int debug_lockdep_rcu_enabled(void);
 
 /**
···
  *
  * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot
  * and while lockdep is disabled.
+ *
+ * Note that rcu_read_lock() and the matching rcu_read_unlock() must
+ * occur in the same context, for example, it is illegal to invoke
+ * rcu_read_unlock() in process context if the matching rcu_read_lock()
+ * was invoked from within an irq handler.
  */
 static inline int rcu_read_lock_held(void)
 {
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
+	if (rcu_is_cpu_idle())
+		return 0;
 	return lock_is_held(&rcu_lock_map);
 }
···
  *
  * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
  * and while lockdep is disabled.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of
+ * view (ie: that we are in the section between rcu_idle_enter() and
+ * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
+ * did an rcu_read_lock().  The reason for this is that RCU ignores CPUs
+ * that are in such a section, considering these as in extended quiescent
+ * state, so such a CPU is effectively never in an RCU read-side critical
+ * section regardless of what RCU primitives it invokes.  This state of
+ * affairs is required --- we need to keep an RCU-free window in idle
+ * where the CPU may possibly enter into low power mode.  This way we can
+ * notice an extended quiescent state to other CPUs that started a grace
+ * period.  Otherwise we would delay any grace period as long as we run in
+ * the idle task.
  */
 #ifdef CONFIG_PREEMPT_COUNT
 static inline int rcu_read_lock_sched_held(void)
···
 
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
+	if (rcu_is_cpu_idle())
+		return 0;
 	if (debug_locks)
 		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
 	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
···
 
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
-# define rcu_read_acquire()		do { } while (0)
-# define rcu_read_release()		do { } while (0)
-# define rcu_read_acquire_bh()		do { } while (0)
-# define rcu_read_release_bh()		do { } while (0)
-# define rcu_read_acquire_sched()	do { } while (0)
-# define rcu_read_release_sched()	do { } while (0)
+# define rcu_lock_acquire(a)		do { } while (0)
+# define rcu_lock_release(a)		do { } while (0)
 
 static inline int rcu_read_lock_held(void)
 {
···
 {
 	__rcu_read_lock();
 	__acquire(RCU);
-	rcu_read_acquire();
+	rcu_lock_acquire(&rcu_lock_map);
 }
 
 /*
···
  */
 static inline void rcu_read_unlock(void)
 {
-	rcu_read_release();
+	rcu_lock_release(&rcu_lock_map);
 	__release(RCU);
 	__rcu_read_unlock();
 }
···
  * critical sections in interrupt context can use just rcu_read_lock(),
  * though this should at least be commented to avoid confusing people
  * reading the code.
+ *
+ * Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh()
+ * must occur in the same context, for example, it is illegal to invoke
+ * rcu_read_unlock_bh() from one task if the matching rcu_read_lock_bh()
+ * was invoked from some other task.
  */
 static inline void rcu_read_lock_bh(void)
 {
 	local_bh_disable();
 	__acquire(RCU_BH);
-	rcu_read_acquire_bh();
+	rcu_lock_acquire(&rcu_bh_lock_map);
 }
 
 /*
···
  */
 static inline void rcu_read_unlock_bh(void)
 {
-	rcu_read_release_bh();
+	rcu_lock_release(&rcu_bh_lock_map);
 	__release(RCU_BH);
 	local_bh_enable();
 }
···
  * are being done using call_rcu_sched() or synchronize_rcu_sched().
  * Read-side critical sections can also be introduced by anything that
  * disables preemption, including local_irq_disable() and friends.
+ *
+ * Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched()
+ * must occur in the same context, for example, it is illegal to invoke
+ * rcu_read_unlock_sched() from process context if the matching
+ * rcu_read_lock_sched() was invoked from an NMI handler.
  */
 static inline void rcu_read_lock_sched(void)
 {
 	preempt_disable();
 	__acquire(RCU_SCHED);
-	rcu_read_acquire_sched();
+	rcu_lock_acquire(&rcu_sched_lock_map);
 }
 
 /* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
···
  */
 static inline void rcu_read_unlock_sched(void)
 {
-	rcu_read_release_sched();
+	rcu_lock_release(&rcu_sched_lock_map);
 	__release(RCU_SCHED);
 	preempt_enable();
 }
+8
include/linux/sched.h
···20702070extern int sched_setscheduler_nocheck(struct task_struct *, int,20712071 const struct sched_param *);20722072extern struct task_struct *idle_task(int cpu);20732073+/**20742074+ * is_idle_task - is the specified task an idle task?20752075+ * @p: the task in question.20762076+ */20772077+static inline bool is_idle_task(struct task_struct *p)20782078+{20792079+ return p->pid == 0;20802080+}20732081extern struct task_struct *curr_task(int cpu);20742082extern void set_curr_task(int cpu, struct task_struct *p);20752083
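The new helper relies on the long-standing convention that the per-CPU idle ("swapper") tasks are the only tasks with PID 0. A minimal userspace sketch with a stubbed-down task_struct (only the one field the helper inspects):

```c
#include <assert.h>
#include <stdbool.h>

/* Stub: the real struct task_struct has many fields; only pid matters here. */
struct task_struct { int pid; };

/* Idle ("swapper") tasks are the only tasks with PID 0. */
static inline bool is_idle_task(struct task_struct *p)
{
	return p->pid == 0;
}
```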
+74-13
include/linux/srcu.h
···2828#define _LINUX_SRCU_H29293030#include <linux/mutex.h>3131+#include <linux/rcupdate.h>31323233struct srcu_struct_array {3334 int c[2];···6160 __init_srcu_struct((sp), #sp, &__srcu_key); \6261})63626464-# define srcu_read_acquire(sp) \6565- lock_acquire(&(sp)->dep_map, 0, 0, 2, 1, NULL, _THIS_IP_)6666-# define srcu_read_release(sp) \6767- lock_release(&(sp)->dep_map, 1, _THIS_IP_)6868-6963#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */70647165int init_srcu_struct(struct srcu_struct *sp);7272-7373-# define srcu_read_acquire(sp) do { } while (0)7474-# define srcu_read_release(sp) do { } while (0)75667667#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */7768···8390 * read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC,8491 * this assumes we are in an SRCU read-side critical section unless it can8592 * prove otherwise.9393+ *9494+ * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot9595+ * and while lockdep is disabled.9696+ *9797+ * Note that if the CPU is in the idle loop from an RCU point of view9898+ * (ie: that we are in the section between rcu_idle_enter() and9999+ * rcu_idle_exit()) then srcu_read_lock_held() returns false even if100100+ * the CPU did an srcu_read_lock(). The reason for this is that RCU101101+ * ignores CPUs that are in such a section, considering these as in102102+ * extended quiescent state, so such a CPU is effectively never in an103103+ * RCU read-side critical section regardless of what RCU primitives it104104+ * invokes. This state of affairs is required --- we need to keep an105105+ * RCU-free window in idle where the CPU may possibly enter into low106106+ * power mode. This way we can notice an extended quiescent state to107107+ * other CPUs that started a grace period. 
Otherwise we would delay any108108+ * grace period as long as we run in the idle task.86109 */87110static inline int srcu_read_lock_held(struct srcu_struct *sp)88111{8989- if (debug_locks)9090- return lock_is_held(&sp->dep_map);9191- return 1;112112+ if (rcu_is_cpu_idle())113113+ return 0;114114+115115+ if (!debug_lockdep_rcu_enabled())116116+ return 1;117117+118118+ return lock_is_held(&sp->dep_map);92119}9312094121#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */···158145 * one way to indirectly wait on an SRCU grace period is to acquire159146 * a mutex that is held elsewhere while calling synchronize_srcu() or160147 * synchronize_srcu_expedited().148148+ *149149+ * Note that srcu_read_lock() and the matching srcu_read_unlock() must150150+ * occur in the same context, for example, it is illegal to invoke151151+ * srcu_read_unlock() in an irq handler if the matching srcu_read_lock()152152+ * was invoked in process context.161153 */162154static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)163155{164156 int retval = __srcu_read_lock(sp);165157166166- srcu_read_acquire(sp);158158+ rcu_lock_acquire(&(sp)->dep_map);167159 return retval;168160}169161···182164static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)183165 __releases(sp)184166{185185- srcu_read_release(sp);167167+ rcu_lock_release(&(sp)->dep_map);186168 __srcu_read_unlock(sp, idx);169169+}170170+171171+/**172172+ * srcu_read_lock_raw - register a new reader for an SRCU-protected structure.173173+ * @sp: srcu_struct in which to register the new reader.174174+ *175175+ * Enter an SRCU read-side critical section. Similar to srcu_read_lock(),176176+ * but avoids the RCU-lockdep checking. 
This means that it is legal to177177+ * use srcu_read_lock_raw() in one context, for example, in an exception178178+ * handler, and then have the matching srcu_read_unlock_raw() in another179179+ * context, for example in the task that took the exception.180180+ *181181+ * However, the entire SRCU read-side critical section must reside within a182182+ * single task. For example, beware of using srcu_read_lock_raw() in183183+ * a device interrupt handler and srcu_read_unlock_raw() in the interrupted184184+ * task: This will not work if interrupts are threaded.185185+ */186186+static inline int srcu_read_lock_raw(struct srcu_struct *sp)187187+{188188+ unsigned long flags;189189+ int ret;190190+191191+ local_irq_save(flags);192192+ ret = __srcu_read_lock(sp);193193+ local_irq_restore(flags);194194+ return ret;195195+}196196+197197+/**198198+ * srcu_read_unlock_raw - unregister reader from an SRCU-protected structure.199199+ * @sp: srcu_struct in which to unregister the old reader.200200+ * @idx: return value from corresponding srcu_read_lock_raw().201201+ *202202+ * Exit an SRCU read-side critical section without lockdep-RCU checking.203203+ * See srcu_read_lock_raw() for more details.204204+ */205205+static inline void srcu_read_unlock_raw(struct srcu_struct *sp, int idx)206206+{207207+ unsigned long flags;208208+209209+ local_irq_save(flags);210210+ __srcu_read_unlock(sp, idx);211211+ local_irq_restore(flags);187212}188213189214#endif
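The raw variants only disable interrupts around __srcu_read_lock()/__srcu_read_unlock() and skip the lockdep hooks, which is what lets the lock and unlock happen in different contexts of the same task. A rough single-CPU userspace sketch of the underlying two-counter scheme (an assumption about __srcu_read_lock() internals for illustration, not a copy of them): readers bump the counter selected by the low bit of the completed-grace-period count and must hand the same index back at unlock time:

```c
#include <assert.h>

/* Single-"CPU" sketch of SRCU's two per-generation reader counters. */
static int srcu_completed;	/* low bit selects the active counter */
static int srcu_c[2];

/* Readers bump the counter for the current grace-period generation... */
static int __srcu_read_lock(void)
{
	int idx = srcu_completed & 0x1;

	srcu_c[idx]++;
	return idx;
}

/* ...and must pass the same index back, even from a different context. */
static void __srcu_read_unlock(int idx)
{
	srcu_c[idx]--;
}
```

The returned index is why srcu_read_unlock_raw() needs the value from the matching srcu_read_lock_raw(): a grace period may have flipped the active counter in between.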
···241241242242/*243243 * Tracepoint for dyntick-idle entry/exit events. These take a string244244- * as argument: "Start" for entering dyntick-idle mode and "End" for245245- * leaving it.244244+ * as argument: "Start" for entering dyntick-idle mode, "End" for245245+ * leaving it, "--=" for events moving towards idle, and "++=" for events246246+ * moving away from idle. "Error on entry: not idle task" and "Error on247247+ * exit: not idle task" indicate that a non-idle task is erroneously248248+ * toying with the idle loop.249249+ *250250+ * These events also take a pair of numbers, which indicate the nesting251251+ * depth before and after the event of interest. Note that task-related252252+ * events use the upper bits of each number, while interrupt-related253253+ * events use the lower bits.246254 */247255TRACE_EVENT(rcu_dyntick,248256249249- TP_PROTO(char *polarity),257257+ TP_PROTO(char *polarity, long long oldnesting, long long newnesting),250258251251- TP_ARGS(polarity),259259+ TP_ARGS(polarity, oldnesting, newnesting),252260253261 TP_STRUCT__entry(254262 __field(char *, polarity)263263+ __field(long long, oldnesting)264264+ __field(long long, newnesting)255265 ),256266257267 TP_fast_assign(258268 __entry->polarity = polarity;269269+ __entry->oldnesting = oldnesting;270270+ __entry->newnesting = newnesting;259271 ),260272261261- TP_printk("%s", __entry->polarity)273273+ TP_printk("%s %llx %llx", __entry->polarity,274274+ __entry->oldnesting, __entry->newnesting)275275+);276276+277277+/*278278+ * Tracepoint for RCU preparation for idle, the goal being to get RCU279279+ * processing done so that the current CPU can shut off its scheduling280280+ * clock and enter dyntick-idle mode. One way to accomplish this is281281+ * to drain all RCU callbacks from this CPU, and the other is to have282282+ * done everything RCU requires for the current grace period. 
In this283283+ * latter case, the CPU will be awakened at the end of the current grace284284+ * period in order to process the remainder of its callbacks.285285+ *286286+ * These tracepoints take a string as argument:287287+ *288288+ * "No callbacks": Nothing to do, no callbacks on this CPU.289289+ * "In holdoff": Nothing to do, holding off after unsuccessful attempt.290290+ * "Begin holdoff": Attempt failed, don't retry until next jiffy.291291+ * "Dyntick with callbacks": Entering dyntick-idle despite callbacks.292292+ * "More callbacks": Still more callbacks, try again to clear them out.293293+ * "Callbacks drained": All callbacks processed, off to dyntick idle!294294+ * "Timer": Timer fired to cause CPU to continue processing callbacks.295295+ */296296+TRACE_EVENT(rcu_prep_idle,297297+298298+ TP_PROTO(char *reason),299299+300300+ TP_ARGS(reason),301301+302302+ TP_STRUCT__entry(303303+ __field(char *, reason)304304+ ),305305+306306+ TP_fast_assign(307307+ __entry->reason = reason;308308+ ),309309+310310+ TP_printk("%s", __entry->reason)262311);263312264313/*···461412462413/*463414 * Tracepoint for exiting rcu_do_batch after RCU callbacks have been464464- * invoked. The first argument is the name of the RCU flavor and465465- * the second argument is number of callbacks actually invoked.415415+ * invoked. 
The first argument is the name of the RCU flavor,416416+ * the second argument is number of callbacks actually invoked,417417+ * the third argument (cb) is whether or not any of the callbacks that418418+ * were ready to invoke at the beginning of this batch are still419419+ * queued, the fourth argument (nr) is the return value of need_resched(),420420+ * the fifth argument (iit) is 1 if the current task is the idle task,421421+ * and the sixth argument (risk) is the return value from422422+ * rcu_is_callbacks_kthread().466423 */467424TRACE_EVENT(rcu_batch_end,468425469469- TP_PROTO(char *rcuname, int callbacks_invoked),426426+ TP_PROTO(char *rcuname, int callbacks_invoked,427427+ bool cb, bool nr, bool iit, bool risk),470428471471- TP_ARGS(rcuname, callbacks_invoked),429429+ TP_ARGS(rcuname, callbacks_invoked, cb, nr, iit, risk),472430473431 TP_STRUCT__entry(474432 __field(char *, rcuname)475433 __field(int, callbacks_invoked)434434+ __field(bool, cb)435435+ __field(bool, nr)436436+ __field(bool, iit)437437+ __field(bool, risk)476438 ),477439478440 TP_fast_assign(479441 __entry->rcuname = rcuname;480442 __entry->callbacks_invoked = callbacks_invoked;443443+ __entry->cb = cb;444444+ __entry->nr = nr;445445+ __entry->iit = iit;446446+ __entry->risk = risk;481447 ),482448483483- TP_printk("%s CBs-invoked=%d",484484- __entry->rcuname, __entry->callbacks_invoked)449449+ TP_printk("%s CBs-invoked=%d idle=%c%c%c%c",450450+ __entry->rcuname, __entry->callbacks_invoked,451451+ __entry->cb ? 'C' : '.',452452+ __entry->nr ? 'S' : '.',453453+ __entry->iit ? 'I' : '.',454454+ __entry->risk ? 'R' : '.')455455+);456456+457457+/*458458+ * Tracepoint for rcutorture readers. 
The first argument is the name459459+ * of the RCU flavor from rcutorture's viewpoint and the second argument460460+ * is the callback address.461461+ */462462+TRACE_EVENT(rcu_torture_read,463463+464464+ TP_PROTO(char *rcutorturename, struct rcu_head *rhp),465465+466466+ TP_ARGS(rcutorturename, rhp),467467+468468+ TP_STRUCT__entry(469469+ __field(char *, rcutorturename)470470+ __field(struct rcu_head *, rhp)471471+ ),472472+473473+ TP_fast_assign(474474+ __entry->rcutorturename = rcutorturename;475475+ __entry->rhp = rhp;476476+ ),477477+478478+ TP_printk("%s torture read %p",479479+ __entry->rcutorturename, __entry->rhp)485480);486481487482#else /* #ifdef CONFIG_RCU_TRACE */···536443#define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0)537444#define trace_rcu_quiescent_state_report(rcuname, gpnum, mask, qsmask, level, grplo, grphi, gp_tasks) do { } while (0)538445#define trace_rcu_fqs(rcuname, gpnum, cpu, qsevent) do { } while (0)539539-#define trace_rcu_dyntick(polarity) do { } while (0)446446+#define trace_rcu_dyntick(polarity, oldnesting, newnesting) do { } while (0)447447+#define trace_rcu_prep_idle(reason) do { } while (0)540448#define trace_rcu_callback(rcuname, rhp, qlen) do { } while (0)541449#define trace_rcu_kfree_callback(rcuname, rhp, offset, qlen) do { } while (0)542450#define trace_rcu_batch_start(rcuname, qlen, blimit) do { } while (0)543451#define trace_rcu_invoke_callback(rcuname, rhp) do { } while (0)544452#define trace_rcu_invoke_kfree_callback(rcuname, rhp, offset) do { } while (0)545545-#define trace_rcu_batch_end(rcuname, callbacks_invoked) do { } while (0)453453+#define trace_rcu_batch_end(rcuname, callbacks_invoked, cb, nr, iit, risk) \454454+ do { } while (0)455455+#define trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)546456547457#endif /* #else #ifdef CONFIG_RCU_TRACE */548458
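The extended rcu_batch_end tracepoint packs its four booleans into a fixed-position flag string: 'C', 'S', 'I', 'R' when set, '.' when clear. A small sketch of that formatting (the helper name is invented; the format mirrors the TP_printk() above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the four rcu_batch_end flags the way the tracepoint prints them:
 * one fixed position per flag, '.' when the flag is clear. */
static const char *batch_end_flags(bool cb, bool nr, bool iit, bool risk)
{
	static char buf[5];

	snprintf(buf, sizeof(buf), "%c%c%c%c",
		 cb   ? 'C' : '.',	/* callbacks still queued */
		 nr   ? 'S' : '.',	/* need_resched() was set */
		 iit  ? 'I' : '.',	/* running in the idle task */
		 risk ? 'R' : '.');	/* running in the callbacks kthread */
	return buf;
}
```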
+5-5
init/Kconfig
···469469470470config RCU_FAST_NO_HZ471471 bool "Accelerate last non-dyntick-idle CPU's grace periods"472472- depends on TREE_RCU && NO_HZ && SMP472472+ depends on NO_HZ && SMP473473 default n474474 help475475 This option causes RCU to attempt to accelerate grace periods476476- in order to allow the final CPU to enter dynticks-idle state477477- more quickly. On the other hand, this option increases the478478- overhead of the dynticks-idle checking, particularly on systems479479- with large numbers of CPUs.476476+ in order to allow CPUs to enter dynticks-idle state more477477+ quickly. On the other hand, this option increases the overhead478478+ of the dynticks-idle checking, particularly on systems with479479+ large numbers of CPUs.480480481481 Say Y if energy efficiency is critically important, particularly482482 if you have relatively few CPUs.
···636636 (p->exit_state & EXIT_ZOMBIE) ? 'Z' :637637 (p->exit_state & EXIT_DEAD) ? 'E' :638638 (p->state & TASK_INTERRUPTIBLE) ? 'S' : '?';639639- if (p->pid == 0) {639639+ if (is_idle_task(p)) {640640 /* Idle task. Is it really idle, apart from the kdb641641 * interrupt? */642642 if (!kdb_task_has_cpu(p) || kgdb_info[cpu].irq_depth == 1) {
+1-1
kernel/events/core.c
···53625362 regs = get_irq_regs();5363536353645364 if (regs && !perf_exclude_event(event, regs)) {53655365- if (!(event->attr.exclude_idle && current->pid == 0))53655365+ if (!(event->attr.exclude_idle && is_idle_task(current)))53665366 if (perf_event_overflow(event, &data, regs))53675367 ret = HRTIMER_NORESTART;53685368 }
+22
kernel/lockdep.c
···41704170 printk("%s:%d %s!\n", file, line, s);41714171 printk("\nother info that might help us debug this:\n\n");41724172 printk("\nrcu_scheduler_active = %d, debug_locks = %d\n", rcu_scheduler_active, debug_locks);41734173+41744174+ /*41754175+ * If a CPU is in the RCU-free window in idle (ie: in the section41764176+ * between rcu_idle_enter() and rcu_idle_exit()), then RCU41774177+ * considers that CPU to be in an "extended quiescent state",41784178+ * which means that RCU will be completely ignoring that CPU.41794179+ * Therefore, rcu_read_lock() and friends have absolutely no41804180+ * effect on a CPU running in that state. In other words, even if41814181+ * such an RCU-idle CPU has called rcu_read_lock(), RCU might well41824182+ * delete data structures out from under it. RCU really has no41834183+ * choice here: we need to keep an RCU-free window in idle where41844184+ * the CPU may possibly enter into low power mode. This way we can41854185+ * report an extended quiescent state to other CPUs that started a grace41864186+ * period. Otherwise we would delay any grace period as long as we run41874187+ * in the idle task.41884188+ *41894189+ * So complain bitterly if someone does call rcu_read_lock(),41904190+ * rcu_read_lock_bh() and so on from extended quiescent states.41914191+ */41924192+ if (rcu_is_cpu_idle())41934193+ printk("RCU used illegally from extended quiescent state!\n");41944194+41734195 lockdep_print_held_locks(curr);41744196 printk("\nstack backtrace:\n");41754197 dump_stack();
+7
kernel/rcu.h
···3030#endif /* #else #ifdef CONFIG_RCU_TRACE */31313232/*3333+ * Process-level increment to ->dynticks_nesting field. This allows for3434+ * architectures that use half-interrupts and half-exceptions from3535+ * process context.3636+ */3737+#define DYNTICK_TASK_NESTING (LLONG_MAX / 2 - 1)3838+3939+/*3340 * debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally3441 * by call_rcu() and rcu callback execution, and are therefore not part of the3542 * RCU API. Leaving in rcupdate.h because they are used by all RCU flavors.
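DYNTICK_TASK_NESTING is a huge process-level bias: interrupt entry/exit only adjusts the nesting count by one, so from a non-idle task the count can never reach zero, while rcu_idle_enter() drops it to zero directly. A userspace sketch of the counter discipline (simplified from the kernel/rcutiny.c hunks in this patch; tracing, warnings, and irq disabling omitted):

```c
#include <assert.h>
#include <limits.h>

/* Process-level bias from the patch: huge, so irq-level +/-1 adjustments
 * can never drive a non-idle task's nesting count down to zero. */
#define DYNTICK_TASK_NESTING (LLONG_MAX / 2 - 1)

static long long dynticks_nesting = DYNTICK_TASK_NESTING;

static void rcu_idle_enter(void) { dynticks_nesting = 0; }
static void rcu_idle_exit(void)  { dynticks_nesting = DYNTICK_TASK_NESTING; }
static void rcu_irq_enter(void)  { dynticks_nesting++; }
static void rcu_irq_exit(void)   { dynticks_nesting--; }

/* RCU treats the CPU as idle only when the count is exactly zero. */
static int rcu_is_cpu_idle(void)
{
	return !dynticks_nesting;
}
```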
+12
kernel/rcupdate.c
···9393{9494 if (!debug_lockdep_rcu_enabled())9595 return 1;9696+ if (rcu_is_cpu_idle())9797+ return 0;9698 return in_softirq() || irqs_disabled();9799}98100EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);···318316};319317EXPORT_SYMBOL_GPL(rcuhead_debug_descr);320318#endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */319319+320320+#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU) || defined(CONFIG_RCU_TRACE)321321+void do_trace_rcu_torture_read(char *rcutorturename, struct rcu_head *rhp)322322+{323323+ trace_rcu_torture_read(rcutorturename, rhp);324324+}325325+EXPORT_SYMBOL_GPL(do_trace_rcu_torture_read);326326+#else327327+#define do_trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)328328+#endif
+133-22
kernel/rcutiny.c
···53535454#include "rcutiny_plugin.h"55555656-#ifdef CONFIG_NO_HZ5656+static long long rcu_dynticks_nesting = DYNTICK_TASK_NESTING;57575858-static long rcu_dynticks_nesting = 1;5959-6060-/*6161- * Enter dynticks-idle mode, which is an extended quiescent state6262- * if we have fully entered that mode (i.e., if the new value of6363- * dynticks_nesting is zero).6464- */6565-void rcu_enter_nohz(void)5858+/* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */5959+static void rcu_idle_enter_common(long long oldval)6660{6767- if (--rcu_dynticks_nesting == 0)6868- rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */6161+ if (rcu_dynticks_nesting) {6262+ RCU_TRACE(trace_rcu_dyntick("--=",6363+ oldval, rcu_dynticks_nesting));6464+ return;6565+ }6666+ RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));6767+ if (!is_idle_task(current)) {6868+ struct task_struct *idle = idle_task(smp_processor_id());6969+7070+ RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",7171+ oldval, rcu_dynticks_nesting));7272+ ftrace_dump(DUMP_ALL);7373+ WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",7474+ current->pid, current->comm,7575+ idle->pid, idle->comm); /* must be idle task! 
*/7676+ }7777+ rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */6978}70797180/*7272- * Exit dynticks-idle mode, so that we are no longer in an extended7373- * quiescent state.8181+ * Enter idle, which is an extended quiescent state if we have fully8282+ * entered that mode (i.e., if the new value of dynticks_nesting is zero).7483 */7575-void rcu_exit_nohz(void)8484+void rcu_idle_enter(void)7685{8686+ unsigned long flags;8787+ long long oldval;8888+8989+ local_irq_save(flags);9090+ oldval = rcu_dynticks_nesting;9191+ rcu_dynticks_nesting = 0;9292+ rcu_idle_enter_common(oldval);9393+ local_irq_restore(flags);9494+}9595+9696+/*9797+ * Exit an interrupt handler towards idle.9898+ */9999+void rcu_irq_exit(void)100100+{101101+ unsigned long flags;102102+ long long oldval;103103+104104+ local_irq_save(flags);105105+ oldval = rcu_dynticks_nesting;106106+ rcu_dynticks_nesting--;107107+ WARN_ON_ONCE(rcu_dynticks_nesting < 0);108108+ rcu_idle_enter_common(oldval);109109+ local_irq_restore(flags);110110+}111111+112112+/* Common code for rcu_idle_exit() and rcu_irq_enter(), see kernel/rcutree.c. */113113+static void rcu_idle_exit_common(long long oldval)114114+{115115+ if (oldval) {116116+ RCU_TRACE(trace_rcu_dyntick("++=",117117+ oldval, rcu_dynticks_nesting));118118+ return;119119+ }120120+ RCU_TRACE(trace_rcu_dyntick("End", oldval, rcu_dynticks_nesting));121121+ if (!is_idle_task(current)) {122122+ struct task_struct *idle = idle_task(smp_processor_id());123123+124124+ RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task",125125+ oldval, rcu_dynticks_nesting));126126+ ftrace_dump(DUMP_ALL);127127+ WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",128128+ current->pid, current->comm,129129+ idle->pid, idle->comm); /* must be idle task! 
*/130130+ }131131+}132132+133133+/*134134+ * Exit idle, so that we are no longer in an extended quiescent state.135135+ */136136+void rcu_idle_exit(void)137137+{138138+ unsigned long flags;139139+ long long oldval;140140+141141+ local_irq_save(flags);142142+ oldval = rcu_dynticks_nesting;143143+ WARN_ON_ONCE(oldval != 0);144144+ rcu_dynticks_nesting = DYNTICK_TASK_NESTING;145145+ rcu_idle_exit_common(oldval);146146+ local_irq_restore(flags);147147+}148148+149149+/*150150+ * Enter an interrupt handler, moving away from idle.151151+ */152152+void rcu_irq_enter(void)153153+{154154+ unsigned long flags;155155+ long long oldval;156156+157157+ local_irq_save(flags);158158+ oldval = rcu_dynticks_nesting;77159 rcu_dynticks_nesting++;160160+ WARN_ON_ONCE(rcu_dynticks_nesting == 0);161161+ rcu_idle_exit_common(oldval);162162+ local_irq_restore(flags);78163}791648080-#endif /* #ifdef CONFIG_NO_HZ */165165+#ifdef CONFIG_PROVE_RCU166166+167167+/*168168+ * Test whether RCU thinks that the current CPU is idle.169169+ */170170+int rcu_is_cpu_idle(void)171171+{172172+ return !rcu_dynticks_nesting;173173+}174174+EXPORT_SYMBOL(rcu_is_cpu_idle);175175+176176+#endif /* #ifdef CONFIG_PROVE_RCU */177177+178178+/*179179+ * Test whether the current CPU was interrupted from idle. Nested180180+ * interrupts don't count, we must be running at the first interrupt181181+ * level.182182+ */183183+int rcu_is_cpu_rrupt_from_idle(void)184184+{185185+ return rcu_dynticks_nesting <= 0;186186+}8118782188/*83189 * Helper function for rcu_sched_qs() and rcu_bh_qs().···232126233127/*234128 * Check to see if the scheduling-clock interrupt came from an extended235235- * quiescent state, and, if so, tell RCU about it.129129+ * quiescent state, and, if so, tell RCU about it. This function must130130+ * be called from hardirq context. 
It is normally called from the131131+ * scheduling-clock interrupt.236132 */237133void rcu_check_callbacks(int cpu, int user)238134{239239- if (user ||240240- (idle_cpu(cpu) &&241241- !in_softirq() &&242242- hardirq_count() <= (1 << HARDIRQ_SHIFT)))135135+ if (user || rcu_is_cpu_rrupt_from_idle())243136 rcu_sched_qs(cpu);244137 else if (!in_softirq())245138 rcu_bh_qs(cpu);···259154 /* If no RCU callbacks ready to invoke, just return. */260155 if (&rcp->rcucblist == rcp->donetail) {261156 RCU_TRACE(trace_rcu_batch_start(rcp->name, 0, -1));262262- RCU_TRACE(trace_rcu_batch_end(rcp->name, 0));157157+ RCU_TRACE(trace_rcu_batch_end(rcp->name, 0,158158+ ACCESS_ONCE(rcp->rcucblist),159159+ need_resched(),160160+ is_idle_task(current),161161+ rcu_is_callbacks_kthread()));263162 return;264163 }265164···292183 RCU_TRACE(cb_count++);293184 }294185 RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count));295295- RCU_TRACE(trace_rcu_batch_end(rcp->name, cb_count));186186+ RCU_TRACE(trace_rcu_batch_end(rcp->name, cb_count, 0, need_resched(),187187+ is_idle_task(current),188188+ rcu_is_callbacks_kthread()));296189}297190298191static void rcu_process_callbacks(struct softirq_action *unused)
+27-2
kernel/rcutiny_plugin.h
···312312 rt_mutex_lock(&mtx);313313 rt_mutex_unlock(&mtx); /* Keep lockdep happy. */314314315315- return rcu_preempt_ctrlblk.boost_tasks != NULL ||316316- rcu_preempt_ctrlblk.exp_tasks != NULL;315315+ return ACCESS_ONCE(rcu_preempt_ctrlblk.boost_tasks) != NULL ||316316+ ACCESS_ONCE(rcu_preempt_ctrlblk.exp_tasks) != NULL;317317}318318319319/*···885885 wake_up(&rcu_kthread_wq);886886}887887888888+#ifdef CONFIG_RCU_TRACE889889+890890+/*891891+ * Is the current CPU running the RCU-callbacks kthread?892892+ * Caller must have preemption disabled.893893+ */894894+static bool rcu_is_callbacks_kthread(void)895895+{896896+ return rcu_kthread_task == current;897897+}898898+899899+#endif /* #ifdef CONFIG_RCU_TRACE */900900+888901/*889902 * This kthread invokes RCU callbacks whose grace periods have890903 * elapsed. It is awakened as needed, and takes the place of the···950937{951938 raise_softirq(RCU_SOFTIRQ);952939}940940+941941+#ifdef CONFIG_RCU_TRACE942942+943943+/*944944+ * There is no callback kthread, so this thread is never it.945945+ */946946+static bool rcu_is_callbacks_kthread(void)947947+{948948+ return false;949949+}950950+951951+#endif /* #ifdef CONFIG_RCU_TRACE */953952954953void rcu_init(void)955954{
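The boost-test hunk above wraps the racy ->boost_tasks/->exp_tasks reads in ACCESS_ONCE(). In kernels of this era that macro is simply a volatile cast, which forces the compiler to emit exactly one load and forbids refetching or tearing. A userspace rendition (variable names invented):

```c
#include <assert.h>

/* ACCESS_ONCE() as defined in kernels of this era: a volatile cast that
 * forces the compiler to emit exactly one access to the lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int *boost_tasks;	/* shared with other threads in the kernel */

static int boost_pending(void)
{
	/* Read the shared pointer exactly once; no compiler refetch. */
	return ACCESS_ONCE(boost_tasks) != (int *)0;
}
```

Without ACCESS_ONCE(), the compiler would be free to reload the pointer between uses, so two tests of the "same" value could disagree.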
+218-7
kernel/rcutorture.c
···6161static int shuffle_interval = 3; /* Interval between shuffles (in sec)*/6262static int stutter = 5; /* Start/stop testing interval (in sec) */6363static int irqreader = 1; /* RCU readers from irq (timers). */6464-static int fqs_duration = 0; /* Duration of bursts (us), 0 to disable. */6565-static int fqs_holdoff = 0; /* Hold time within burst (us). */6464+static int fqs_duration; /* Duration of bursts (us), 0 to disable. */6565+static int fqs_holdoff; /* Hold time within burst (us). */6666static int fqs_stutter = 3; /* Wait time between bursts (s). */6767+static int onoff_interval; /* Wait time between CPU hotplugs, 0=disable. */6868+static int shutdown_secs; /* Shutdown time (s). <=0 for no shutdown. */6769static int test_boost = 1; /* Test RCU prio boost: 0=no, 1=maybe, 2=yes. */6870static int test_boost_interval = 7; /* Interval between boost tests, seconds. */6971static int test_boost_duration = 4; /* Duration of each boost test, seconds. */···9391MODULE_PARM_DESC(fqs_holdoff, "Holdoff time within fqs bursts (us)");9492module_param(fqs_stutter, int, 0444);9593MODULE_PARM_DESC(fqs_stutter, "Wait time between fqs bursts (s)");9494+module_param(onoff_interval, int, 0444);9595+MODULE_PARM_DESC(onoff_interval, "Time between CPU hotplugs (s), 0=disable");9696+module_param(shutdown_secs, int, 0444);9797+MODULE_PARM_DESC(shutdown_secs, "Shutdown time (s), zero to disable.");9698module_param(test_boost, int, 0444);9799MODULE_PARM_DESC(test_boost, "Test RCU prio boost: 0=no, 1=maybe, 2=yes.");98100module_param(test_boost_interval, int, 0444);···125119static struct task_struct *stutter_task;126120static struct task_struct *fqs_task;127121static struct task_struct *boost_tasks[NR_CPUS];122122+static struct task_struct *shutdown_task;123123+#ifdef CONFIG_HOTPLUG_CPU124124+static struct task_struct *onoff_task;125125+#endif /* #ifdef CONFIG_HOTPLUG_CPU */128126129127#define RCU_TORTURE_PIPE_LEN 10130128···159149static long n_rcu_torture_boost_failure;160150static long 
n_rcu_torture_boosts;161151static long n_rcu_torture_timers;152152+static long n_offline_attempts;153153+static long n_offline_successes;154154+static long n_online_attempts;155155+static long n_online_successes;162156static struct list_head rcu_torture_removed;163157static cpumask_var_t shuffle_tmp_mask;164158···174160#define RCUTORTURE_RUNNABLE_INIT 0175161#endif176162int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;163163+module_param(rcutorture_runnable, int, 0444);164164+MODULE_PARM_DESC(rcutorture_runnable, "Start rcutorture at boot");177165178166#if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU)179167#define rcu_can_boost() 1···183167#define rcu_can_boost() 0184168#endif /* #else #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) */185169170170+static unsigned long shutdown_time; /* jiffies to system shutdown. */186171static unsigned long boost_starttime; /* jiffies of next boost test start. */187172DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */188173 /* and boost task create/destroy. */···198181 * Protect fullstop transitions and spawning of kthreads.199182 */200183static DEFINE_MUTEX(fullstop_mutex);184184+185185+/* Forward reference. 
*/186186+static void rcu_torture_cleanup(void);201187202188/*203189 * Detect and respond to a system shutdown.···632612 .name = "srcu"633613};634614615615+static int srcu_torture_read_lock_raw(void) __acquires(&srcu_ctl)616616+{617617+ return srcu_read_lock_raw(&srcu_ctl);618618+}619619+620620+static void srcu_torture_read_unlock_raw(int idx) __releases(&srcu_ctl)621621+{622622+ srcu_read_unlock_raw(&srcu_ctl, idx);623623+}624624+625625+static struct rcu_torture_ops srcu_raw_ops = {626626+ .init = srcu_torture_init,627627+ .cleanup = srcu_torture_cleanup,628628+ .readlock = srcu_torture_read_lock_raw,629629+ .read_delay = srcu_read_delay,630630+ .readunlock = srcu_torture_read_unlock_raw,631631+ .completed = srcu_torture_completed,632632+ .deferred_free = rcu_sync_torture_deferred_free,633633+ .sync = srcu_torture_synchronize,634634+ .cb_barrier = NULL,635635+ .stats = srcu_torture_stats,636636+ .name = "srcu_raw"637637+};638638+635639static void srcu_torture_synchronize_expedited(void)636640{637641 synchronize_srcu_expedited(&srcu_ctl);···957913 return 0;958914}959915916916+void rcutorture_trace_dump(void)917917+{918918+ static atomic_t beenhere = ATOMIC_INIT(0);919919+920920+ if (atomic_read(&beenhere))921921+ return;922922+ if (atomic_xchg(&beenhere, 1) != 0)923923+ return;924924+ do_trace_rcu_torture_read(cur_ops->name, (struct rcu_head *)~0UL);925925+ ftrace_dump(DUMP_ALL);926926+}927927+960928/*961929 * RCU torture reader from timer handler. Dereferences rcu_torture_current,962930 * incrementing the corresponding element of the pipeline array. The···990934 rcu_read_lock_bh_held() ||991935 rcu_read_lock_sched_held() ||992936 srcu_read_lock_held(&srcu_ctl));937937+ do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu);993938 if (p == NULL) {994939 /* Leave because rcu_torture_writer is not yet underway */995940 cur_ops->readunlock(idx);···1008951 /* Should not happen, but... 
*/1009952 pipe_count = RCU_TORTURE_PIPE_LEN;1010953 }954954+ if (pipe_count > 1)955955+ rcutorture_trace_dump();1011956 __this_cpu_inc(rcu_torture_count[pipe_count]);1012957 completed = cur_ops->completed() - completed;1013958 if (completed > RCU_TORTURE_PIPE_LEN) {···1053994 rcu_read_lock_bh_held() ||1054995 rcu_read_lock_sched_held() ||1055996 srcu_read_lock_held(&srcu_ctl));997997+ do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu);1056998 if (p == NULL) {1057999 /* Wait for rcu_torture_writer to get underway */10581000 cur_ops->readunlock(idx);···10691009 /* Should not happen, but... */10701010 pipe_count = RCU_TORTURE_PIPE_LEN;10711011 }10121012+ if (pipe_count > 1)10131013+ rcutorture_trace_dump();10721014 __this_cpu_inc(rcu_torture_count[pipe_count]);10731015 completed = cur_ops->completed() - completed;10741016 if (completed > RCU_TORTURE_PIPE_LEN) {···11181056 cnt += sprintf(&page[cnt],11191057 "rtc: %p ver: %lu tfle: %d rta: %d rtaf: %d rtf: %d "11201058 "rtmbe: %d rtbke: %ld rtbre: %ld "11211121- "rtbf: %ld rtb: %ld nt: %ld",10591059+ "rtbf: %ld rtb: %ld nt: %ld "10601060+ "onoff: %ld/%ld:%ld/%ld",11221061 rcu_torture_current,11231062 rcu_torture_current_version,11241063 list_empty(&rcu_torture_freelist),···11311068 n_rcu_torture_boost_rterror,11321069 n_rcu_torture_boost_failure,11331070 n_rcu_torture_boosts,11341134- n_rcu_torture_timers);10711071+ n_rcu_torture_timers,10721072+ n_online_successes,10731073+ n_online_attempts,10741074+ n_offline_successes,10751075+ n_offline_attempts);11351076 if (atomic_read(&n_rcu_torture_mberror) != 0 ||11361077 n_rcu_torture_boost_ktrerror != 0 ||11371078 n_rcu_torture_boost_rterror != 0 ||···12991232 "shuffle_interval=%d stutter=%d irqreader=%d "13001233 "fqs_duration=%d fqs_holdoff=%d fqs_stutter=%d "13011234 "test_boost=%d/%d test_boost_interval=%d "13021302- "test_boost_duration=%d\n",12351235+ "test_boost_duration=%d shutdown_secs=%d "12361236+ "onoff_interval=%d\n",13031237 torture_type, tag, 
nrealreaders, nfakewriters,13041238 stat_interval, verbose, test_no_idle_hz, shuffle_interval,13051239 stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter,13061240 test_boost, cur_ops->can_boost,13071307- test_boost_interval, test_boost_duration);12411241+ test_boost_interval, test_boost_duration, shutdown_secs,12421242+ onoff_interval);13081243}1309124413101245static struct notifier_block rcutorture_shutdown_nb = {···13551286 mutex_unlock(&boost_mutex);13561287 return 0;13571288}12891289+12901290+/*12911291+ * Cause the rcutorture test to shutdown the system after the test has12921292+ * run for the time specified by the shutdown_secs module parameter.12931293+ */12941294+static int12951295+rcu_torture_shutdown(void *arg)12961296+{12971297+ long delta;12981298+ unsigned long jiffies_snap;12991299+13001300+ VERBOSE_PRINTK_STRING("rcu_torture_shutdown task started");13011301+ jiffies_snap = ACCESS_ONCE(jiffies);13021302+ while (ULONG_CMP_LT(jiffies_snap, shutdown_time) &&13031303+ !kthread_should_stop()) {13041304+ delta = shutdown_time - jiffies_snap;13051305+ if (verbose)13061306+ printk(KERN_ALERT "%s" TORTURE_FLAG13071307+ "rcu_torture_shutdown task: %lu "13081308+ "jiffies remaining\n",13091309+ torture_type, delta);13101310+ schedule_timeout_interruptible(delta);13111311+ jiffies_snap = ACCESS_ONCE(jiffies);13121312+ }13131313+ if (kthread_should_stop()) {13141314+ VERBOSE_PRINTK_STRING("rcu_torture_shutdown task stopping");13151315+ return 0;13161316+ }13171317+13181318+ /* OK, shut down the system. */13191319+13201320+ VERBOSE_PRINTK_STRING("rcu_torture_shutdown task shutting down system");13211321+ shutdown_task = NULL; /* Avoid self-kill deadlock. */13221322+ rcu_torture_cleanup(); /* Get the success/failure message. */13231323+ kernel_power_off(); /* Shut down the system. 
*/13241324+ return 0;13251325+}13261326+13271327+#ifdef CONFIG_HOTPLUG_CPU13281328+13291329+/*13301330+ * Execute random CPU-hotplug operations at the interval specified13311331+ * by the onoff_interval.13321332+ */13331333+static int13341334+rcu_torture_onoff(void *arg)13351335+{13361336+ int cpu;13371337+ int maxcpu = -1;13381338+ DEFINE_RCU_RANDOM(rand);13391339+13401340+ VERBOSE_PRINTK_STRING("rcu_torture_onoff task started");13411341+ for_each_online_cpu(cpu)13421342+ maxcpu = cpu;13431343+ WARN_ON(maxcpu < 0);13441344+ while (!kthread_should_stop()) {13451345+ cpu = (rcu_random(&rand) >> 4) % (maxcpu + 1);13461346+ if (cpu_online(cpu) && cpu_is_hotpluggable(cpu)) {13471347+ if (verbose)13481348+ printk(KERN_ALERT "%s" TORTURE_FLAG13491349+ "rcu_torture_onoff task: offlining %d\n",13501350+ torture_type, cpu);13511351+ n_offline_attempts++;13521352+ if (cpu_down(cpu) == 0) {13531353+ if (verbose)13541354+ printk(KERN_ALERT "%s" TORTURE_FLAG13551355+ "rcu_torture_onoff task: "13561356+ "offlined %d\n",13571357+ torture_type, cpu);13581358+ n_offline_successes++;13591359+ }13601360+ } else if (cpu_is_hotpluggable(cpu)) {13611361+ if (verbose)13621362+ printk(KERN_ALERT "%s" TORTURE_FLAG13631363+ "rcu_torture_onoff task: onlining %d\n",13641364+ torture_type, cpu);13651365+ n_online_attempts++;13661366+ if (cpu_up(cpu) == 0) {13671367+ if (verbose)13681368+ printk(KERN_ALERT "%s" TORTURE_FLAG13691369+ "rcu_torture_onoff task: "13701370+ "onlined %d\n",13711371+ torture_type, cpu);13721372+ n_online_successes++;13731373+ }13741374+ }13751375+ schedule_timeout_interruptible(onoff_interval * HZ);13761376+ }13771377+ VERBOSE_PRINTK_STRING("rcu_torture_onoff task stopping");13781378+ return 0;13791379+}13801380+13811381+static int13821382+rcu_torture_onoff_init(void)13831383+{13841384+ if (onoff_interval <= 0)13851385+ return 0;13861386+ onoff_task = kthread_run(rcu_torture_onoff, NULL, "rcu_torture_onoff");13871387+ if (IS_ERR(onoff_task)) {13881388+ onoff_task = 
NULL;13891389+ return PTR_ERR(onoff_task);13901390+ }13911391+ return 0;13921392+}13931393+13941394+static void rcu_torture_onoff_cleanup(void)13951395+{13961396+ if (onoff_task == NULL)13971397+ return;13981398+ VERBOSE_PRINTK_STRING("Stopping rcu_torture_onoff task");13991399+ kthread_stop(onoff_task);14001400+}14011401+14021402+#else /* #ifdef CONFIG_HOTPLUG_CPU */14031403+14041404+static void14051405+rcu_torture_onoff_init(void)14061406+{14071407+}14081408+14091409+static void rcu_torture_onoff_cleanup(void)14101410+{14111411+}14121412+14131413+#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */1358141413591415static int rcutorture_cpu_notify(struct notifier_block *self,13601416 unsigned long action, void *hcpu)···15851391 for_each_possible_cpu(i)15861392 rcutorture_booster_cleanup(i);15871393 }13941394+ if (shutdown_task != NULL) {13951395+ VERBOSE_PRINTK_STRING("Stopping rcu_torture_shutdown task");13961396+ kthread_stop(shutdown_task);13971397+ }13981398+ rcu_torture_onoff_cleanup();1588139915891400 /* Wait for all RCU callbacks to fire. 
*/15901401···16151416 static struct rcu_torture_ops *torture_ops[] =16161417 { &rcu_ops, &rcu_sync_ops, &rcu_expedited_ops,16171418 &rcu_bh_ops, &rcu_bh_sync_ops, &rcu_bh_expedited_ops,16181618- &srcu_ops, &srcu_expedited_ops,14191419+ &srcu_ops, &srcu_raw_ops, &srcu_expedited_ops,16191420 &sched_ops, &sched_sync_ops, &sched_expedited_ops, };1620142116211422 mutex_lock(&fullstop_mutex);···18061607 }18071608 }18081609 }16101610+ if (shutdown_secs > 0) {16111611+ shutdown_time = jiffies + shutdown_secs * HZ;16121612+ shutdown_task = kthread_run(rcu_torture_shutdown, NULL,16131613+ "rcu_torture_shutdown");16141614+ if (IS_ERR(shutdown_task)) {16151615+ firsterr = PTR_ERR(shutdown_task);16161616+ VERBOSE_PRINTK_ERRSTRING("Failed to create shutdown");16171617+ shutdown_task = NULL;16181618+ goto unwind;16191619+ }16201620+ }16211621+ rcu_torture_onoff_init();18091622 register_reboot_notifier(&rcutorture_shutdown_nb);18101623 rcutorture_record_test_transition();18111624 mutex_unlock(&fullstop_mutex);
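As an aside, the CPU-hotplug torture loop added above boils down to: pick a random CPU, take it offline if it is online and hotpluggable, otherwise bring it online, counting attempts and successes. The following stand-alone C sketch models that selection logic in user space; the mask-based cpu_is_online()/cpu_is_hotpluggable() stand-ins and the always-succeeding cpu_down()/cpu_up() transitions are assumptions for illustration, not kernel code.

```c
/*
 * User-space model of the rcu_torture_onoff() selection logic above.
 * The online/hotpluggable predicates and the always-successful
 * "cpu_down"/"cpu_up" transitions are illustrative stand-ins.
 */
#include <assert.h>

#define NCPUS 4

static unsigned long online_mask = (1UL << NCPUS) - 1;	/* all online */
static long n_offline_attempts, n_offline_successes;
static long n_online_attempts, n_online_successes;

static int cpu_is_online(int cpu) { return (online_mask >> cpu) & 1; }

/* Assume the boot CPU cannot be hotplugged, as is common. */
static int cpu_is_hotpluggable(int cpu) { return cpu != 0; }

static void torture_onoff_once(unsigned int rnd)
{
	int cpu = (rnd >> 4) % NCPUS;	/* same shift as the rcu_random() use */

	if (cpu_is_online(cpu) && cpu_is_hotpluggable(cpu)) {
		n_offline_attempts++;
		online_mask &= ~(1UL << cpu);	/* model cpu_down() returning 0 */
		n_offline_successes++;
	} else if (cpu_is_hotpluggable(cpu)) {
		n_online_attempts++;
		online_mask |= 1UL << cpu;	/* model cpu_up() returning 0 */
		n_online_successes++;
	}
}
```

Note how an offline-but-hotpluggable CPU falls through to the "else if" branch, which is why the patch needs no separate !cpu_online() test before onlining.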
+212-92
kernel/rcutree.c
···6969 NUM_RCU_LVL_3, \7070 NUM_RCU_LVL_4, /* == MAX_RCU_LVLS */ \7171 }, \7272- .signaled = RCU_GP_IDLE, \7272+ .fqs_state = RCU_GP_IDLE, \7373 .gpnum = -300, \7474 .completed = -300, \7575 .onofflock = __RAW_SPIN_LOCK_UNLOCKED(&structname##_state.onofflock), \···195195}196196EXPORT_SYMBOL_GPL(rcu_note_context_switch);197197198198-#ifdef CONFIG_NO_HZ199198DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {200200- .dynticks_nesting = 1,199199+ .dynticks_nesting = DYNTICK_TASK_NESTING,201200 .dynticks = ATOMIC_INIT(1),202201};203203-#endif /* #ifdef CONFIG_NO_HZ */204202205203static int blimit = 10; /* Maximum callbacks per rcu_do_batch. */206204static int qhimark = 10000; /* If this many pending, ignore blimit. */···326328 return 1;327329 }328330329329- /* If preemptible RCU, no point in sending reschedule IPI. */330330- if (rdp->preemptible)331331- return 0;332332-333333- /* The CPU is online, so send it a reschedule IPI. */331331+ /*332332+ * The CPU is online, so send it a reschedule IPI. This forces333333+ * it through the scheduler, and (inefficiently) also handles cases334334+ * where idle loops fail to inform RCU about the CPU being idle.335335+ */334336 if (rdp->cpu != smp_processor_id())335337 smp_send_reschedule(rdp->cpu);336338 else···341343342344#endif /* #ifdef CONFIG_SMP */343345344344-#ifdef CONFIG_NO_HZ345345-346346-/**347347- * rcu_enter_nohz - inform RCU that current CPU is entering nohz346346+/*347347+ * rcu_idle_enter_common - inform RCU that current CPU is moving towards idle348348 *349349- * Enter nohz mode, in other words, -leave- the mode in which RCU350350- * read-side critical sections can occur. 
(Though RCU read-side351351- * critical sections can occur in irq handlers in nohz mode, a possibility352352- * handled by rcu_irq_enter() and rcu_irq_exit()).349349+ * If the new value of the ->dynticks_nesting counter now is zero,350350+ * we really have entered idle, and must do the appropriate accounting.351351+ * The caller must have disabled interrupts.353352 */354354-void rcu_enter_nohz(void)353353+static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)355354{356356- unsigned long flags;357357- struct rcu_dynticks *rdtp;355355+ trace_rcu_dyntick("Start", oldval, 0);356356+ if (!is_idle_task(current)) {357357+ struct task_struct *idle = idle_task(smp_processor_id());358358359359- local_irq_save(flags);360360- rdtp = &__get_cpu_var(rcu_dynticks);361361- if (--rdtp->dynticks_nesting) {362362- local_irq_restore(flags);363363- return;359359+ trace_rcu_dyntick("Error on entry: not idle task", oldval, 0);360360+ ftrace_dump(DUMP_ALL);361361+ WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",362362+ current->pid, current->comm,363363+ idle->pid, idle->comm); /* must be idle task! */364364 }365365- trace_rcu_dyntick("Start");365365+ rcu_prepare_for_idle(smp_processor_id());366366 /* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */367367 smp_mb__before_atomic_inc(); /* See above. */368368 atomic_inc(&rdtp->dynticks);369369 smp_mb__after_atomic_inc(); /* Force ordering with next sojourn. */370370 WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);371371- local_irq_restore(flags);372371}373372374374-/*375375- * rcu_exit_nohz - inform RCU that current CPU is leaving nohz373373+/**374374+ * rcu_idle_enter - inform RCU that current CPU is entering idle376375 *377377- * Exit nohz mode, in other words, -enter- the mode in which RCU378378- * read-side critical sections normally occur.376376+ * Enter idle mode, in other words, -leave- the mode in which RCU377377+ * read-side critical sections can occur. 
(Though RCU read-side378378+ * critical sections can occur in irq handlers in idle, a possibility379379+ * handled by irq_enter() and irq_exit().)380380+ *381381+ * We crowbar the ->dynticks_nesting field to zero to allow for382382+ * the possibility of usermode upcalls having messed up our count383383+ * of interrupt nesting level during the prior busy period.379384 */380380-void rcu_exit_nohz(void)385385+void rcu_idle_enter(void)381386{382387 unsigned long flags;388388+ long long oldval;383389 struct rcu_dynticks *rdtp;384390385391 local_irq_save(flags);386392 rdtp = &__get_cpu_var(rcu_dynticks);387387- if (rdtp->dynticks_nesting++) {388388- local_irq_restore(flags);389389- return;390390- }393393+ oldval = rdtp->dynticks_nesting;394394+ rdtp->dynticks_nesting = 0;395395+ rcu_idle_enter_common(rdtp, oldval);396396+ local_irq_restore(flags);397397+}398398+399399+/**400400+ * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle401401+ *402402+ * Exit from an interrupt handler, which might possibly result in entering403403+ * idle mode, in other words, leaving the mode in which read-side critical404404+ * sections can occur.405405+ *406406+ * This code assumes that the idle loop never does anything that might407407+ * result in unbalanced calls to irq_enter() and irq_exit(). If your408408+ * architecture violates this assumption, RCU will give you what you409409+ * deserve, good and hard. 
But very infrequently and irreproducibly.410410+ *411411+ * Use things like work queues to work around this limitation.412412+ *413413+ * You have been warned.414414+ */415415+void rcu_irq_exit(void)416416+{417417+ unsigned long flags;418418+ long long oldval;419419+ struct rcu_dynticks *rdtp;420420+421421+ local_irq_save(flags);422422+ rdtp = &__get_cpu_var(rcu_dynticks);423423+ oldval = rdtp->dynticks_nesting;424424+ rdtp->dynticks_nesting--;425425+ WARN_ON_ONCE(rdtp->dynticks_nesting < 0);426426+ if (rdtp->dynticks_nesting)427427+ trace_rcu_dyntick("--=", oldval, rdtp->dynticks_nesting);428428+ else429429+ rcu_idle_enter_common(rdtp, oldval);430430+ local_irq_restore(flags);431431+}432432+433433+/*434434+ * rcu_idle_exit_common - inform RCU that current CPU is moving away from idle435435+ *436436+ * If the new value of the ->dynticks_nesting counter was previously zero,437437+ * we really have exited idle, and must do the appropriate accounting.438438+ * The caller must have disabled interrupts.439439+ */440440+static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)441441+{391442 smp_mb__before_atomic_inc(); /* Force ordering w/previous sojourn. */392443 atomic_inc(&rdtp->dynticks);393444 /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */394445 smp_mb__after_atomic_inc(); /* See above. */395446 WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));396396- trace_rcu_dyntick("End");447447+ rcu_cleanup_after_idle(smp_processor_id());448448+ trace_rcu_dyntick("End", oldval, rdtp->dynticks_nesting);449449+ if (!is_idle_task(current)) {450450+ struct task_struct *idle = idle_task(smp_processor_id());451451+452452+ trace_rcu_dyntick("Error on exit: not idle task",453453+ oldval, rdtp->dynticks_nesting);454454+ ftrace_dump(DUMP_ALL);455455+ WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",456456+ current->pid, current->comm,457457+ idle->pid, idle->comm); /* must be idle task! 
*/458458+ }459459+}460460+461461+/**462462+ * rcu_idle_exit - inform RCU that current CPU is leaving idle463463+ *464464+ * Exit idle mode, in other words, -enter- the mode in which RCU465465+ * read-side critical sections can occur.466466+ *467467+ * We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NESTING to468468+ * allow for the possibility of usermode upcalls messing up our count469469+ * of interrupt nesting level during the busy period that is just470470+ * now starting.471471+ */472472+void rcu_idle_exit(void)473473+{474474+ unsigned long flags;475475+ struct rcu_dynticks *rdtp;476476+ long long oldval;477477+478478+ local_irq_save(flags);479479+ rdtp = &__get_cpu_var(rcu_dynticks);480480+ oldval = rdtp->dynticks_nesting;481481+ WARN_ON_ONCE(oldval != 0);482482+ rdtp->dynticks_nesting = DYNTICK_TASK_NESTING;483483+ rcu_idle_exit_common(rdtp, oldval);484484+ local_irq_restore(flags);485485+}486486+487487+/**488488+ * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle489489+ *490490+ * Enter an interrupt handler, which might possibly result in exiting491491+ * idle mode, in other words, entering the mode in which read-side critical492492+ * sections can occur.493493+ *494494+ * Note that the Linux kernel is fully capable of entering an interrupt495495+ * handler that it never exits, for example when doing upcalls to496496+ * user mode! This code assumes that the idle loop never does upcalls to497497+ * user mode. If your architecture does do upcalls from the idle loop (or498498+ * does anything else that results in unbalanced calls to the irq_enter()499499+ * and irq_exit() functions), RCU will give you what you deserve, good500500+ * and hard. 
But very infrequently and irreproducibly.501501+ *502502+ * Use things like work queues to work around this limitation.503503+ *504504+ * You have been warned.505505+ */506506+void rcu_irq_enter(void)507507+{508508+ unsigned long flags;509509+ struct rcu_dynticks *rdtp;510510+ long long oldval;511511+512512+ local_irq_save(flags);513513+ rdtp = &__get_cpu_var(rcu_dynticks);514514+ oldval = rdtp->dynticks_nesting;515515+ rdtp->dynticks_nesting++;516516+ WARN_ON_ONCE(rdtp->dynticks_nesting == 0);517517+ if (oldval)518518+ trace_rcu_dyntick("++=", oldval, rdtp->dynticks_nesting);519519+ else520520+ rcu_idle_exit_common(rdtp, oldval);397521 local_irq_restore(flags);398522}399523···562442 WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);563443}564444565565-/**566566- * rcu_irq_enter - inform RCU of entry to hard irq context567567- *568568- * If the CPU was idle with dynamic ticks active, this updates the569569- * rdtp->dynticks to let the RCU handling know that the CPU is active.570570- */571571-void rcu_irq_enter(void)572572-{573573- rcu_exit_nohz();574574-}445445+#ifdef CONFIG_PROVE_RCU575446576447/**577577- * rcu_irq_exit - inform RCU of exit from hard irq context448448+ * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle578449 *579579- * If the CPU was idle with dynamic ticks active, update the rdp->dynticks580580- * to put let the RCU handling be aware that the CPU is going back to idle581581- * with no ticks.450450+ * If the current CPU is in its idle loop and is neither in an interrupt451451+ * or NMI handler, return true.582452 */583583-void rcu_irq_exit(void)453453+int rcu_is_cpu_idle(void)584454{585585- rcu_enter_nohz();455455+ int ret;456456+457457+ preempt_disable();458458+ ret = (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;459459+ preempt_enable();460460+ return ret;461461+}462462+EXPORT_SYMBOL(rcu_is_cpu_idle);463463+464464+#endif /* #ifdef CONFIG_PROVE_RCU */465465+466466+/**467467+ * rcu_is_cpu_rrupt_from_idle - see if 
idle or immediately interrupted from idle468468+ *469469+ * If the current CPU is idle or running at a first-level (not nested)470470+ * interrupt from idle, return true. The caller must have at least471471+ * disabled preemption.472472+ */473473+int rcu_is_cpu_rrupt_from_idle(void)474474+{475475+ return __get_cpu_var(rcu_dynticks).dynticks_nesting <= 1;586476}587477588478#ifdef CONFIG_SMP···605475static int dyntick_save_progress_counter(struct rcu_data *rdp)606476{607477 rdp->dynticks_snap = atomic_add_return(0, &rdp->dynticks->dynticks);608608- return 0;478478+ return (rdp->dynticks_snap & 0x1) == 0;609479}610480611481/*···641511}642512643513#endif /* #ifdef CONFIG_SMP */644644-645645-#else /* #ifdef CONFIG_NO_HZ */646646-647647-#ifdef CONFIG_SMP648648-649649-static int dyntick_save_progress_counter(struct rcu_data *rdp)650650-{651651- return 0;652652-}653653-654654-static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)655655-{656656- return rcu_implicit_offline_qs(rdp);657657-}658658-659659-#endif /* #ifdef CONFIG_SMP */660660-661661-#endif /* #else #ifdef CONFIG_NO_HZ */662662-663663-int rcu_cpu_stall_suppress __read_mostly;664514665515static void record_gp_stall_check_time(struct rcu_state *rsp)666516{···976866 /* Advance to a new grace period and initialize state. */977867 rsp->gpnum++;978868 trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");979979- WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);980980- rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */869869+ WARN_ON_ONCE(rsp->fqs_state == RCU_GP_INIT);870870+ rsp->fqs_state = RCU_GP_INIT; /* Hold off force_quiescent_state. */981871 rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;982872 record_gp_stall_check_time(rsp);983873···987877 rnp->qsmask = rnp->qsmaskinit;988878 rnp->gpnum = rsp->gpnum;989879 rnp->completed = rsp->completed;990990- rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. 
*/880880+ rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state OK */991881 rcu_start_gp_per_cpu(rsp, rnp, rdp);992882 rcu_preempt_boost_start_gp(rnp);993883 trace_rcu_grace_period_init(rsp->name, rnp->gpnum,···10379271038928 rnp = rcu_get_root(rsp);1039929 raw_spin_lock(&rnp->lock); /* irqs already disabled. */10401040- rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */930930+ rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */1041931 raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */1042932 raw_spin_unlock_irqrestore(&rsp->onofflock, flags);1043933}···11019911102992 rsp->completed = rsp->gpnum; /* Declare the grace period complete. */1103993 trace_rcu_grace_period(rsp->name, rsp->completed, "end");11041104- rsp->signaled = RCU_GP_IDLE;994994+ rsp->fqs_state = RCU_GP_IDLE;1105995 rcu_start_gp(rsp, flags); /* releases root node's rnp->lock. */1106996}1107997···13311221 else13321222 raw_spin_unlock_irqrestore(&rnp->lock, flags);13331223 if (need_report & RCU_OFL_TASKS_EXP_GP)13341334- rcu_report_exp_rnp(rsp, rnp);12241224+ rcu_report_exp_rnp(rsp, rnp, true);13351225 rcu_node_kthread_setaffinity(rnp, -1);13361226}13371227···13731263 /* If no callbacks are ready, just return.*/13741264 if (!cpu_has_callbacks_ready_to_invoke(rdp)) {13751265 trace_rcu_batch_start(rsp->name, 0, 0);13761376- trace_rcu_batch_end(rsp->name, 0);12661266+ trace_rcu_batch_end(rsp->name, 0, !!ACCESS_ONCE(rdp->nxtlist),12671267+ need_resched(), is_idle_task(current),12681268+ rcu_is_callbacks_kthread());13771269 return;13781270 }13791271···14031291 debug_rcu_head_unqueue(list);14041292 __rcu_reclaim(rsp->name, list);14051293 list = next;14061406- if (++count >= bl)12941294+ /* Stop only if limit reached and CPU has something to do. 
*/12951295+ if (++count >= bl &&12961296+ (need_resched() ||12971297+ (!is_idle_task(current) && !rcu_is_callbacks_kthread())))14071298 break;14081299 }1409130014101301 local_irq_save(flags);14111411- trace_rcu_batch_end(rsp->name, count);13021302+ trace_rcu_batch_end(rsp->name, count, !!list, need_resched(),13031303+ is_idle_task(current),13041304+ rcu_is_callbacks_kthread());1412130514131306 /* Update count, and requeue any remaining callbacks. */14141307 rdp->qlen -= count;···14511334 * (user mode or idle loop for rcu, non-softirq execution for rcu_bh).14521335 * Also schedule RCU core processing.14531336 *14541454- * This function must be called with hardirqs disabled. It is normally13371337+ * This function must be called from hardirq context. It is normally14551338 * invoked from the scheduling-clock interrupt. If rcu_pending returns14561339 * false, there is no point in invoking rcu_check_callbacks().14571340 */14581341void rcu_check_callbacks(int cpu, int user)14591342{14601343 trace_rcu_utilization("Start scheduler-tick");14611461- if (user ||14621462- (idle_cpu(cpu) && rcu_scheduler_active &&14631463- !in_softirq() && hardirq_count() <= (1 << HARDIRQ_SHIFT))) {13441344+ if (user || rcu_is_cpu_rrupt_from_idle()) {1464134514651346 /*14661347 * Get here if this CPU took its interrupt from user···15721457 goto unlock_fqs_ret; /* no GP in progress, time updated. 
*/15731458 }15741459 rsp->fqs_active = 1;15751575- switch (rsp->signaled) {14601460+ switch (rsp->fqs_state) {15761461 case RCU_GP_IDLE:15771462 case RCU_GP_INIT:15781463···15881473 force_qs_rnp(rsp, dyntick_save_progress_counter);15891474 raw_spin_lock(&rnp->lock); /* irqs already disabled */15901475 if (rcu_gp_in_progress(rsp))15911591- rsp->signaled = RCU_FORCE_QS;14761476+ rsp->fqs_state = RCU_FORCE_QS;15921477 break;1593147815941479 case RCU_FORCE_QS:···19271812 * by the current CPU, even if none need be done immediately, returning19281813 * 1 if so.19291814 */19301930-static int rcu_needs_cpu_quick_check(int cpu)18151815+static int rcu_cpu_has_callbacks(int cpu)19311816{19321817 /* RCU callbacks either ready or pending? */19331818 return per_cpu(rcu_sched_data, cpu).nxtlist ||···20281913 for (i = 0; i < RCU_NEXT_SIZE; i++)20291914 rdp->nxttail[i] = &rdp->nxtlist;20301915 rdp->qlen = 0;20312031-#ifdef CONFIG_NO_HZ20321916 rdp->dynticks = &per_cpu(rcu_dynticks, cpu);20332033-#endif /* #ifdef CONFIG_NO_HZ */19171917+ WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_NESTING);19181918+ WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);20341919 rdp->cpu = cpu;20351920 rdp->rsp = rsp;20361921 raw_spin_unlock_irqrestore(&rnp->lock, flags);···20571942 rdp->qlen_last_fqs_check = 0;20581943 rdp->n_force_qs_snap = rsp->n_force_qs;20591944 rdp->blimit = blimit;19451945+ rdp->dynticks->dynticks_nesting = DYNTICK_TASK_NESTING;19461946+ atomic_set(&rdp->dynticks->dynticks,19471947+ (atomic_read(&rdp->dynticks->dynticks) & ~0x1) + 1);19481948+ rcu_prepare_for_idle_init(cpu);20601949 raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */2061195020621951 /*···21422023 rcu_send_cbs_to_online(&rcu_bh_state);21432024 rcu_send_cbs_to_online(&rcu_sched_state);21442025 rcu_preempt_send_cbs_to_online();20262026+ rcu_cleanup_after_idle(cpu);21452027 break;21462028 case CPU_DEAD:21472029 case CPU_DEAD_FROZEN:
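The dyntick accounting introduced by the rcutree.c changes above can be summarized: ->dynticks is even exactly when the CPU is idle from RCU's perspective, and ->dynticks_nesting counts process/irq nesting, with the counter flipped only when the nesting level crosses zero. A minimal user-space model follows; the DYNTICK_TASK_NESTING value is a stand-in (the hunks only note that process level is worth LLONG_MAX/2), and the WARN_ON/trace hooks are omitted.

```c
/*
 * User-space model of the ->dynticks/->dynticks_nesting accounting
 * above.  DYNTICK_TASK_NESTING's exact kernel value is not shown in
 * this patch, so a stand-in is used here.
 */
#include <assert.h>

#define DYNTICK_TASK_NESTING (1LL << 30)	/* stand-in value */

static long long dynticks_nesting = DYNTICK_TASK_NESTING;
static int dynticks = 1;	/* odd: CPU is not idle */

static int is_rcu_idle(void) { return (dynticks & 1) == 0; }

/* rcu_idle_enter(): crowbar nesting to zero, counter becomes even. */
static void idle_enter(void)
{
	dynticks_nesting = 0;
	dynticks++;
}

/* rcu_irq_enter(): only the first irq from idle flips the counter. */
static void irq_enter(void)
{
	if (dynticks_nesting++ == 0)
		dynticks++;
}

/* rcu_irq_exit(): only the last irq exit back to idle flips it. */
static void irq_exit(void)
{
	if (--dynticks_nesting == 0)
		dynticks++;
}

/* rcu_idle_exit(): restore task-level nesting, counter becomes odd. */
static void idle_exit(void)
{
	dynticks_nesting = DYNTICK_TASK_NESTING;
	dynticks++;
}
```

A nested interrupt taken while already in an interrupt from idle leaves the counter untouched, which is exactly why rcu_irq_exit() only calls rcu_idle_enter_common() when the nesting count reaches zero.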
+12-14
kernel/rcutree.h
···8484 * Dynticks per-CPU state.8585 */8686struct rcu_dynticks {8787- int dynticks_nesting; /* Track irq/process nesting level. */8888- int dynticks_nmi_nesting; /* Track NMI nesting level. */8989- atomic_t dynticks; /* Even value for dynticks-idle, else odd. */8787+ long long dynticks_nesting; /* Track irq/process nesting level. */8888+ /* Process level is worth LLONG_MAX/2. */8989+ int dynticks_nmi_nesting; /* Track NMI nesting level. */9090+ atomic_t dynticks; /* Even value for idle, else odd. */9091};91929293/* RCU's kthread states for tracing. */···275274 /* did other CPU force QS recently? */276275 long blimit; /* Upper limit on a processed batch */277276278278-#ifdef CONFIG_NO_HZ279277 /* 3) dynticks interface. */280278 struct rcu_dynticks *dynticks; /* Shared per-CPU dynticks state. */281279 int dynticks_snap; /* Per-GP tracking for dynticks. */282282-#endif /* #ifdef CONFIG_NO_HZ */283280284281 /* 4) reasons this CPU needed to be kicked by force_quiescent_state */285285-#ifdef CONFIG_NO_HZ286282 unsigned long dynticks_fqs; /* Kicked due to dynticks idle. */287287-#endif /* #ifdef CONFIG_NO_HZ */288283 unsigned long offline_fqs; /* Kicked due to being offline. */289284 unsigned long resched_ipi; /* Sent a resched IPI. */290285···299302 struct rcu_state *rsp;300303};301304302302-/* Values for signaled field in struct rcu_state. */305305+/* Values for fqs_state field in struct rcu_state. */303306#define RCU_GP_IDLE 0 /* No grace period in progress. */304307#define RCU_GP_INIT 1 /* Grace period being initialized. */305308#define RCU_SAVE_DYNTICK 2 /* Need to scan dyntick state. */306309#define RCU_FORCE_QS 3 /* Need to force quiescent state. 
*/307307-#ifdef CONFIG_NO_HZ308310#define RCU_SIGNAL_INIT RCU_SAVE_DYNTICK309309-#else /* #ifdef CONFIG_NO_HZ */310310-#define RCU_SIGNAL_INIT RCU_FORCE_QS311311-#endif /* #else #ifdef CONFIG_NO_HZ */312311313312#define RCU_JIFFIES_TILL_FORCE_QS 3 /* for rsp->jiffies_force_qs */314313···354361355362 /* The following fields are guarded by the root rcu_node's lock. */356363357357- u8 signaled ____cacheline_internodealigned_in_smp;364364+ u8 fqs_state ____cacheline_internodealigned_in_smp;358365 /* Force QS state. */359366 u8 fqs_active; /* force_quiescent_state() */360367 /* is running. */···444451static void rcu_preempt_process_callbacks(void);445452void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));446453#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_TREE_PREEMPT_RCU)447447-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp);454454+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,455455+ bool wake);448456#endif /* #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_TREE_PREEMPT_RCU) */449457static int rcu_preempt_pending(int cpu);450458static int rcu_preempt_needs_cpu(int cpu);···455461static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);456462static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);457463static void invoke_rcu_callbacks_kthread(void);464464+static bool rcu_is_callbacks_kthread(void);458465#ifdef CONFIG_RCU_BOOST459466static void rcu_preempt_do_callbacks(void);460467static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp,···468473#endif /* #ifdef CONFIG_RCU_BOOST */469474static void rcu_cpu_kthread_setrt(int cpu, int to_rt);470475static void __cpuinit rcu_prepare_kthreads(int cpu);476476+static void rcu_prepare_for_idle_init(int cpu);477477+static void rcu_cleanup_after_idle(int cpu);478478+static void rcu_prepare_for_idle(int cpu);471479472480#endif /* #ifndef RCU_TREE_NONCORE */
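The ->dynticks_snap field retained above supports the dyntick-based quiescent-state checks in rcutree.c, where dyntick_save_progress_counter() now returns the snapshot's parity. A simplified user-space sketch of that two-pass detection, with scalar arguments standing in for the rcu_data fields:

```c
/*
 * Sketch of dyntick-based quiescent-state detection: the first
 * force_quiescent_state() pass snapshots each CPU's ->dynticks
 * counter, and a later pass declares a quiescent state if the CPU
 * is idle now (even counter) or has passed through idle since the
 * snapshot (counter advanced).  Scalar stand-ins, not kernel code.
 */
#include <assert.h>

/* First FQS pass: save the counter; even means idle right now. */
static int dyntick_save_progress_counter(int dynticks, int *snap)
{
	*snap = dynticks;
	return (*snap & 0x1) == 0;
}

/* Later FQS pass: quiescent if idle now or counter moved since. */
static int rcu_implicit_dynticks_qs(int dynticks, int snap)
{
	return (dynticks & 0x1) == 0 || dynticks != snap;
}
```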
+229-56
kernel/rcutree_plugin.h
···312312{313313 int empty;314314 int empty_exp;315315+ int empty_exp_now;315316 unsigned long flags;316317 struct list_head *np;317318#ifdef CONFIG_RCU_BOOST···383382 /*384383 * If this was the last task on the current list, and if385384 * we aren't waiting on any CPUs, report the quiescent state.386386- * Note that rcu_report_unblock_qs_rnp() releases rnp->lock.385385+ * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,386386+ * so we must take a snapshot of the expedited state.387387 */388388+ empty_exp_now = !rcu_preempted_readers_exp(rnp);388389 if (!empty && !rcu_preempt_blocked_readers_cgp(rnp)) {389390 trace_rcu_quiescent_state_report("preempt_rcu",390391 rnp->gpnum,···409406 * If this was the last task on the expedited lists,410407 * then we need to report up the rcu_node hierarchy.411408 */412412- if (!empty_exp && !rcu_preempted_readers_exp(rnp))413413- rcu_report_exp_rnp(&rcu_preempt_state, rnp);409409+ if (!empty_exp && empty_exp_now)410410+ rcu_report_exp_rnp(&rcu_preempt_state, rnp, true);414411 } else {415412 local_irq_restore(flags);416413 }···732729 * recursively up the tree. 
(Calm down, calm down, we do the recursion733730 * iteratively!)734731 *732732+ * Most callers will set the "wake" flag, but the task initiating the733733+ * expedited grace period need not wake itself.734734+ *735735 * Caller must hold sync_rcu_preempt_exp_mutex.736736 */737737-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)737737+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,738738+ bool wake)738739{739740 unsigned long flags;740741 unsigned long mask;···751744 }752745 if (rnp->parent == NULL) {753746 raw_spin_unlock_irqrestore(&rnp->lock, flags);754754- wake_up(&sync_rcu_preempt_exp_wq);747747+ if (wake)748748+ wake_up(&sync_rcu_preempt_exp_wq);755749 break;756750 }757751 mask = rnp->grpmask;···785777 must_wait = 1;786778 }787779 if (!must_wait)788788- rcu_report_exp_rnp(rsp, rnp);780780+ rcu_report_exp_rnp(rsp, rnp, false); /* Don't wake self. */789781}790782791783/*···10771069 * report on tasks preempted in RCU read-side critical sections during10781070 * expedited RCU grace periods.10791071 */10801080-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)10721072+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,10731073+ bool wake)10811074{10821082- return;10831075}1084107610851077#endif /* #ifdef CONFIG_HOTPLUG_CPU */···1165115711661158#endif /* #else #ifdef CONFIG_RCU_TRACE */1167115911681168-static struct lock_class_key rcu_boost_class;11691169-11701160/*11711161 * Carry out RCU priority boosting on the task indicated by ->exp_tasks11721162 * or ->boost_tasks, advancing the pointer to the next task in the···12271221 */12281222 t = container_of(tb, struct task_struct, rcu_node_entry);12291223 rt_mutex_init_proxy_locked(&mtx, t);12301230- /* Avoid lockdep false positives. This rt_mutex is its own thing. 
*/12311231- lockdep_set_class_and_name(&mtx.wait_lock, &rcu_boost_class,12321232- "rcu_boost_mutex");12331224 t->rcu_boost_mutex = &mtx;12341225 raw_spin_unlock_irqrestore(&rnp->lock, flags);12351226 rt_mutex_lock(&mtx); /* Side effect: boosts task t's priority. */12361227 rt_mutex_unlock(&mtx); /* Keep lockdep happy. */1237122812381238- return rnp->exp_tasks != NULL || rnp->boost_tasks != NULL;12291229+ return ACCESS_ONCE(rnp->exp_tasks) != NULL ||12301230+ ACCESS_ONCE(rnp->boost_tasks) != NULL;12391231}1240123212411233/*···13301326 current != __this_cpu_read(rcu_cpu_kthread_task))13311327 wake_up_process(__this_cpu_read(rcu_cpu_kthread_task));13321328 local_irq_restore(flags);13291329+}13301330+13311331+/*13321332+ * Is the current CPU running the RCU-callbacks kthread?13331333+ * Caller must have preemption disabled.13341334+ */13351335+static bool rcu_is_callbacks_kthread(void)13361336+{13371337+ return __get_cpu_var(rcu_cpu_kthread_task) == current;13331338}1334133913351340/*···17851772 WARN_ON_ONCE(1);17861773}1787177417751775+static bool rcu_is_callbacks_kthread(void)17761776+{17771777+ return false;17781778+}17791779+17881780static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)17891781{17901782}···19251907 * grace period works for us.19261908 */19271909 get_online_cpus();19281928- snap = atomic_read(&sync_sched_expedited_started) - 1;19101910+ snap = atomic_read(&sync_sched_expedited_started);19291911 smp_mb(); /* ensure read is before try_stop_cpus(). */19301912 }19311913···19571939 * 1 if so. This function is part of the RCU implementation; it is -not-19581940 * an exported member of the RCU API.19591941 *19601960- * Because we have preemptible RCU, just check whether this CPU needs19611961- * any flavor of RCU. 
Do not chew up lots of CPU cycles with preemption19621962- disabled in a most-likely vain attempt to cause RCU not to need this CPU.19421942+ * Because we do not have RCU_FAST_NO_HZ, just check whether this CPU needs19431943+ * any flavor of RCU.19631944 */19641945int rcu_needs_cpu(int cpu)19651946{19661966- return rcu_needs_cpu_quick_check(cpu);19471947+ return rcu_cpu_has_callbacks(cpu);19481948+}19491949+19501950+/*19511951+ * Because we do not have RCU_FAST_NO_HZ, don't bother initializing for it.19521952+ */19531953+static void rcu_prepare_for_idle_init(int cpu)19541954+{19551955+}19561956+19571957+/*19581958+ * Because we do not have RCU_FAST_NO_HZ, don't bother cleaning up19591959+ * after it.19601960+ */19611961+static void rcu_cleanup_after_idle(int cpu)19621962+{19631963+}19641964+19651965+/*19661966+ * Do the idle-entry grace-period work, which, because CONFIG_RCU_FAST_NO_HZ=n,19671967+ * is nothing.19681968+ */19691969+static void rcu_prepare_for_idle(int cpu)19701970+{19671971}1968197219691973#else /* #if !defined(CONFIG_RCU_FAST_NO_HZ) */1970197419711971-#define RCU_NEEDS_CPU_FLUSHES 519751975+/*19761976+ * This code is invoked when a CPU goes idle, at which point we want19771977+ * to have the CPU do everything required for RCU so that it can enter19781978+ * the energy-efficient dyntick-idle mode. This is handled by a19791979+ * state machine implemented by rcu_prepare_for_idle() below.19801980+ *19811981+ * The following three preprocessor symbols control this state machine:19821982+ *19831983+ * RCU_IDLE_FLUSHES gives the maximum number of times that we will attempt19841984+ * to satisfy RCU. 
Beyond this point, it is better to incur a periodic19851985+ * scheduling-clock interrupt than to loop through the state machine19861986+ * at full power.19871987+ * RCU_IDLE_OPT_FLUSHES gives the number of RCU_IDLE_FLUSHES that are19881988+ * optional if RCU does not need anything immediately from this19891989+ * CPU, even if this CPU still has RCU callbacks queued. The first19901990+ * times through the state machine are mandatory: we need to give19911991+ * the state machine a chance to communicate a quiescent state19921992+ * to the RCU core.19931993+ * RCU_IDLE_GP_DELAY gives the number of jiffies that a CPU is permitted19941994+ * to sleep in dyntick-idle mode with RCU callbacks pending. This19951995+ * is sized to be roughly one RCU grace period. Those energy-efficiency19961996+ * benchmarkers who might otherwise be tempted to set this to a large19971997+ * number, be warned: Setting RCU_IDLE_GP_DELAY too high can hang your19981998+ * system. And if you are -that- concerned about energy efficiency,19991999+ * just power the system down and be done with it!20002000+ *20012001+ * The values below work well in practice. If future workloads require20022002+ * adjustment, they can be converted into kernel config parameters, though20032003+ * making the state machine smarter might be a better option.20042004+ */20052005+#define RCU_IDLE_FLUSHES 5 /* Number of dyntick-idle tries. */20062006+#define RCU_IDLE_OPT_FLUSHES 3 /* Optional dyntick-idle tries. */20072007+#define RCU_IDLE_GP_DELAY 6 /* Roughly one grace period. */20082008+19722009static DEFINE_PER_CPU(int, rcu_dyntick_drain);19732010static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);20112011+static DEFINE_PER_CPU(struct hrtimer, rcu_idle_gp_timer);20122012+static ktime_t rcu_idle_gp_wait;1974201319752014/*19761976- * Check to see if any future RCU-related work will need to be done19771977- * by the current CPU, even if none need be done immediately, returning19781978- * 1 if so. 
This function is part of the RCU implementation; it is -not-19791979- * an exported member of the RCU API.20152015+ * Allow the CPU to enter dyntick-idle mode if any of the following holds: (1) There are no20162016+ * callbacks on this CPU, (2) this CPU has not yet attempted to enter20172017+ * dyntick-idle mode, or (3) this CPU is in the process of attempting to20182018+ * enter dyntick-idle mode. Otherwise, if we have recently tried and failed20192019+ * to enter dyntick-idle mode, we refuse to try to enter it. After all,20202020+ * it is better to incur scheduling-clock interrupts than to spin20212021+ * continuously for the same time duration!20222022+ */20232023+int rcu_needs_cpu(int cpu)20242024+{20252025+ /* If no callbacks, RCU doesn't need the CPU. */20262026+ if (!rcu_cpu_has_callbacks(cpu))20272027+ return 0;20282028+ /* Otherwise, RCU needs the CPU only if it recently tried and failed. */20292029+ return per_cpu(rcu_dyntick_holdoff, cpu) == jiffies;20302030+}20312031+20322032+/*20332033+ * Timer handler used to force CPU to start pushing its remaining RCU20342034+ * callbacks in the case where it entered dyntick-idle mode with callbacks20352035+ * pending. 
The handler doesn't really need to do anything because the20362036+ * real work is done upon re-entry to idle, or by the next scheduling-clock20372037+ * interrupt should idle not be re-entered.20382038+ */20392039+static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)20402040+{20412041+ trace_rcu_prep_idle("Timer");20422042+ return HRTIMER_NORESTART;20432043+}20442044+20452045+/*20462046+ * Initialize the timer used to pull CPUs out of dyntick-idle mode.20472047+ */20482048+static void rcu_prepare_for_idle_init(int cpu)20492049+{20502050+ static int firsttime = 1;20512051+ struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);20522052+20532053+ hrtimer_init(hrtp, CLOCK_MONOTONIC, HRTIMER_MODE_REL);20542054+ hrtp->function = rcu_idle_gp_timer_func;20552055+ if (firsttime) {20562056+ unsigned int upj = jiffies_to_usecs(RCU_IDLE_GP_DELAY);20572057+20582058+ rcu_idle_gp_wait = ns_to_ktime(upj * (u64)1000);20592059+ firsttime = 0;20602060+ }20612061+}20622062+20632063+/*20642064+ * Clean up for exit from idle. Because we are exiting from idle, there20652065+ * is no longer any point to rcu_idle_gp_timer, so cancel it. This will20662066+ * do nothing if this timer is not active, so just cancel it unconditionally.20672067+ */20682068+static void rcu_cleanup_after_idle(int cpu)20692069+{20702070+ hrtimer_cancel(&per_cpu(rcu_idle_gp_timer, cpu));20712071+}20722072+20732073+/*20742074+ * Check to see if any RCU-related work can be done by the current CPU,20752075+ * and if so, schedule a softirq to get it done. This function is part20762076+ * of the RCU implementation; it is -not- an exported member of the RCU API.19802077 *19811981- * Because we are not supporting preemptible RCU, attempt to accelerate19821982- any current grace periods so that RCU no longer needs this CPU, but19831983- only if all other CPUs are already in dynticks-idle mode. 
This will19841984- * allow the CPU cores to be powered down immediately, as opposed to after19851985- * waiting many milliseconds for grace periods to elapse.20782078+ * The idea is for the current CPU to clear out all work required by the20792079+ * RCU core for the current grace period, so that this CPU can be permitted20802080+ * to enter dyntick-idle mode. In some cases, it will need to be awakened20812081+ * at the end of the grace period by whatever CPU ends the grace period.20822082+ * This allows CPUs to go dyntick-idle more quickly, and to reduce the20832083+ * number of wakeups by a modest integer factor.19862084 *19872085 * Because it is not legal to invoke rcu_process_callbacks() with irqs19882086 * disabled, we do one pass of force_quiescent_state(), then do an19892087 * invoke_rcu_core() to cause rcu_process_callbacks() to be invoked19902088 * later. The per-cpu rcu_dyntick_drain variable controls the sequencing.20892089+ *20902090+ * The caller must have disabled interrupts.19912091 */19921992-int rcu_needs_cpu(int cpu)20922092+static void rcu_prepare_for_idle(int cpu)19932093{19941994- int c = 0;19951995- int snap;19961996- int thatcpu;20942094+ unsigned long flags;1997209519981998- /* Check for being in the holdoff period. */19991999- if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies)20002000- return rcu_needs_cpu_quick_check(cpu);20962096+ local_irq_save(flags);2001209720022002- /* Don't bother unless we are the last non-dyntick-idle CPU. */20032003- for_each_online_cpu(thatcpu) {20042004- if (thatcpu == cpu)20052005- continue;20062006- snap = atomic_add_return(0, &per_cpu(rcu_dynticks,20072007- thatcpu).dynticks);20082008- smp_mb(); /* Order sampling of snap with end of grace period. 
*/20092009- if ((snap & 0x1) != 0) {20102010- per_cpu(rcu_dyntick_drain, cpu) = 0;20112011- per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;20122012- return rcu_needs_cpu_quick_check(cpu);20132013- }20982098+ /*20992099+ * If there are no callbacks on this CPU, enter dyntick-idle mode.21002100+ * Also reset state to avoid prejudicing later attempts.21012101+ */21022102+ if (!rcu_cpu_has_callbacks(cpu)) {21032103+ per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;21042104+ per_cpu(rcu_dyntick_drain, cpu) = 0;21052105+ local_irq_restore(flags);21062106+ trace_rcu_prep_idle("No callbacks");21072107+ return;21082108+ }21092109+21102110+ /*21112111+ * If in holdoff mode, just return. We will presumably have21122112+ * refrained from disabling the scheduling-clock tick.21132113+ */21142114+ if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies) {21152115+ local_irq_restore(flags);21162116+ trace_rcu_prep_idle("In holdoff");21172117+ return;20142118 }2015211920162120 /* Check and update the rcu_dyntick_drain sequencing. */20172121 if (per_cpu(rcu_dyntick_drain, cpu) <= 0) {20182122 /* First time through, initialize the counter. */20192019- per_cpu(rcu_dyntick_drain, cpu) = RCU_NEEDS_CPU_FLUSHES;21232123+ per_cpu(rcu_dyntick_drain, cpu) = RCU_IDLE_FLUSHES;21242124+ } else if (per_cpu(rcu_dyntick_drain, cpu) <= RCU_IDLE_OPT_FLUSHES &&21252125+ !rcu_pending(cpu)) {21262126+ /* Can we go dyntick-idle despite still having callbacks? */21272127+ trace_rcu_prep_idle("Dyntick with callbacks");21282128+ per_cpu(rcu_dyntick_drain, cpu) = 0;21292129+ per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;21302130+ hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),21312131+ rcu_idle_gp_wait, HRTIMER_MODE_REL);21322132+ return; /* Nothing more to do immediately. */20202133 } else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {20212134 /* We have hit the limit, so time to give up. 
*/20222135 per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;20232023- return rcu_needs_cpu_quick_check(cpu);21362136+ local_irq_restore(flags);21372137+ trace_rcu_prep_idle("Begin holdoff");21382138+ invoke_rcu_core(); /* Force the CPU out of dyntick-idle. */21392139+ return;20242140 }2025214120262026- /* Do one step pushing remaining RCU callbacks through. */21422142+ /*21432143+ * Do one step of pushing the remaining RCU callbacks through21442144+ * the RCU core state machine.21452145+ */21462146+#ifdef CONFIG_TREE_PREEMPT_RCU21472147+ if (per_cpu(rcu_preempt_data, cpu).nxtlist) {21482148+ local_irq_restore(flags);21492149+ rcu_preempt_qs(cpu);21502150+ force_quiescent_state(&rcu_preempt_state, 0);21512151+ local_irq_save(flags);21522152+ }21532153+#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */20272154 if (per_cpu(rcu_sched_data, cpu).nxtlist) {21552155+ local_irq_restore(flags);20282156 rcu_sched_qs(cpu);20292157 force_quiescent_state(&rcu_sched_state, 0);20302030- c = c || per_cpu(rcu_sched_data, cpu).nxtlist;21582158+ local_irq_save(flags);20312159 }20322160 if (per_cpu(rcu_bh_data, cpu).nxtlist) {21612161+ local_irq_restore(flags);20332162 rcu_bh_qs(cpu);20342163 force_quiescent_state(&rcu_bh_state, 0);20352035- c = c || per_cpu(rcu_bh_data, cpu).nxtlist;21642164+ local_irq_save(flags);20362165 }2037216620382038- /* If RCU callbacks are still pending, RCU still needs this CPU. */20392039- if (c)21672167+ /*21682168+ * If RCU callbacks are still pending, RCU still needs this CPU.21692169+ * So try forcing the callbacks through the grace period.21702170+ */21712171+ if (rcu_cpu_has_callbacks(cpu)) {21722172+ local_irq_restore(flags);21732173+ trace_rcu_prep_idle("More callbacks");20402174 invoke_rcu_core();20412041- return c;21752175+ } else {21762176+ local_irq_restore(flags);21772177+ trace_rcu_prep_idle("Callbacks drained");21782178+ }20422179}2043218020442181#endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
···579579 struct rt_mutex_waiter *waiter)580580{581581 int ret = 0;582582- int was_disabled;583582584583 for (;;) {585584 /* Try to acquire the lock: */···601602602603 raw_spin_unlock(&lock->wait_lock);603604604604- was_disabled = irqs_disabled();605605- if (was_disabled)606606- local_irq_enable();607607-608605 debug_rt_mutex_print_deadlock(waiter);609606610607 schedule_rt_mutex(lock);611611-612612- if (was_disabled)613613- local_irq_disable();614608615609 raw_spin_lock(&lock->wait_lock);616610 set_current_state(state);
+2-2
kernel/softirq.c
···347347 if (!in_interrupt() && local_softirq_pending())348348 invoke_softirq();349349350350- rcu_irq_exit();351350#ifdef CONFIG_NO_HZ352351 /* Make sure that timer wheel updates are propagated */353352 if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())354354- tick_nohz_stop_sched_tick(0);353353+ tick_nohz_irq_exit();355354#endif355355+ rcu_irq_exit();356356 preempt_enable_no_resched();357357}358358
+60-37
kernel/time/tick-sched.c
···275275}276276EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);277277278278-/**279279- * tick_nohz_stop_sched_tick - stop the idle tick from the idle task280280- *281281- * When the next event is more than a tick into the future, stop the idle tick282282- * Called either from the idle loop or from irq_exit() when an idle period was283283- * just interrupted by an interrupt which did not cause a reschedule.284284- */285285-void tick_nohz_stop_sched_tick(int inidle)278278+static void tick_nohz_stop_sched_tick(struct tick_sched *ts)286279{287287- unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;288288- struct tick_sched *ts;280280+ unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;289281 ktime_t last_update, expires, now;290282 struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;291283 u64 time_delta;292284 int cpu;293285294294- local_irq_save(flags);295295-296286 cpu = smp_processor_id();297287 ts = &per_cpu(tick_cpu_sched, cpu);298298-299299- /*300300- * Call to tick_nohz_start_idle stops the last_update_time from being301301- * updated. Thus, it must not be called in the event we are called from302302- * irq_exit() with the prior state different than idle.303303- */304304- if (!inidle && !ts->inidle)305305- goto end;306306-307307- /*308308- * Set ts->inidle unconditionally. 
Even if the system did not309309- * switch to NOHZ mode the cpu frequency governers rely on the310310- * update of the idle time accounting in tick_nohz_start_idle().311311- */312312- ts->inidle = 1;313288314289 now = tick_nohz_start_idle(cpu, ts);315290···301326 }302327303328 if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))304304- goto end;329329+ return;305330306331 if (need_resched())307307- goto end;332332+ return;308333309334 if (unlikely(local_softirq_pending() && cpu_online(cpu))) {310335 static int ratelimit;···314339 (unsigned int) local_softirq_pending());315340 ratelimit++;316341 }317317- goto end;342342+ return;318343 }319344320345 ts->idle_calls++;···409434 ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);410435 ts->tick_stopped = 1;411436 ts->idle_jiffies = last_jiffies;412412- rcu_enter_nohz();413437 }414438415439 ts->idle_sleeps++;···446472 ts->next_jiffies = next_jiffies;447473 ts->last_jiffies = last_jiffies;448474 ts->sleep_length = ktime_sub(dev->next_event, now);449449-end:450450- local_irq_restore(flags);475475+}476476+477477+/**478478+ * tick_nohz_idle_enter - stop the idle tick from the idle task479479+ *480480+ * When the next event is more than a tick into the future, stop the idle tick.481481+ * Called when we start the idle loop.482482+ *483483+ * The arch is responsible for calling:484484+ *485485+ * - rcu_idle_enter() after its last use of RCU before the CPU is put486486+ * to sleep.487487+ * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.488488+ */489489+void tick_nohz_idle_enter(void)490490+{491491+ struct tick_sched *ts;492492+493493+ WARN_ON_ONCE(irqs_disabled());494494+495495+ local_irq_disable();496496+497497+ ts = &__get_cpu_var(tick_cpu_sched);498498+ /*499499+ * set ts->inidle unconditionally. 
even if the system did not500500+ * switch to nohz mode the cpu frequency governors rely on the501501+ * update of the idle time accounting in tick_nohz_start_idle().502502+ */503503+ ts->inidle = 1;504504+ tick_nohz_stop_sched_tick(ts);505505+506506+ local_irq_enable();507507+}508508+509509+/**510510+ * tick_nohz_irq_exit - update next tick event from interrupt exit511511+ *512512+ * When an interrupt fires while we are idle and it doesn't cause513513+ * a reschedule, it may still add, modify or delete a timer, enqueue514514+ * an RCU callback, etc...515515+ * So we need to re-calculate and reprogram the next tick event.516516+ */517517+void tick_nohz_irq_exit(void)518518+{519519+ struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);520520+521521+ if (!ts->inidle)522522+ return;523523+524524+ tick_nohz_stop_sched_tick(ts);451525}452526453527/**···537515}538516539517/**540540- * tick_nohz_restart_sched_tick - restart the idle tick from the idle task518518+ * tick_nohz_idle_exit - restart the idle tick from the idle task541519 *542520 * Restart the idle tick when the CPU is woken up from idle521521+ * This also exits the RCU extended quiescent state. The CPU522522+ * can use RCU again after this function is called.543523 */544544-void tick_nohz_restart_sched_tick(void)524524+void tick_nohz_idle_exit(void)545525{546526 int cpu = smp_processor_id();547527 struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);···553529 ktime_t now;554530555531 local_irq_disable();532532+556533 if (ts->idle_active || (ts->inidle && ts->tick_stopped))557534 now = ktime_get();558535···567542 }568543569544 ts->inidle = 0;570570-571571- rcu_exit_nohz();572545573546 /* Update jiffies first */574547 select_nohz_load_balancer(0);