···
 CONFIG_RCU_TRACE debugfs Files and Formats


-The rcutree implementation of RCU provides debugfs trace output that
-summarizes counters and state.  This information is useful for debugging
-RCU itself, and can sometimes also help to debug abuses of RCU.
-The following sections describe the debugfs files and formats.
+The rcutree and rcutiny implementations of RCU provide debugfs trace
+output that summarizes counters and state.  This information is useful for
+debugging RCU itself, and can sometimes also help to debug abuses of RCU.
+The following sections describe the debugfs files and formats, first
+for rcutree and next for rcutiny.


-Hierarchical RCU debugfs Files and Formats
+CONFIG_TREE_RCU and CONFIG_TREE_PREEMPT_RCU debugfs Files and Formats

-This implementation of RCU provides three debugfs files under the
+These implementations of RCU provide five debugfs files under the
 top-level directory RCU: rcu/rcudata (which displays fields in struct
-rcu_data), rcu/rcugp (which displays grace-period counters), and
-rcu/rcuhier (which displays the struct rcu_node hierarchy).
+rcu_data), rcu/rcudata.csv (which is a .csv spreadsheet version of
+rcu/rcudata), rcu/rcugp (which displays grace-period counters),
+rcu/rcuhier (which displays the struct rcu_node hierarchy), and
+rcu/rcu_pending (which displays counts of the reasons that the
+rcu_pending() function decided that there was core RCU work to do).

 The output of "cat rcu/rcudata" looks as follows:
···
	been registered in absence of CPU-hotplug activity.

 o	"co" is the number of RCU callbacks that have been orphaned due to
-	this CPU going offline.
+	this CPU going offline.  These orphaned callbacks have been moved
+	to an arbitrarily chosen online CPU.

 o	"ca" is the number of RCU callbacks that have been adopted due to
	other CPUs going offline.
Note that ci+co-ca+ql is the number of···173168174169The output of "cat rcu/rcuhier" looks as follows, with very long lines:175170176176-c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0171171+c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=61771721/1 .>. 0:127 ^0 1781733/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 1791743/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3 180175rcu_bh:181181-c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0176176+c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=01821770/1 .>. 0:127 ^0 1831780/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3 1841790/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3···216211o "fqlh" is the number of calls to force_quiescent_state() that217212 exited immediately (without even being counted in nfqs above)218213 due to contention on ->fqslock.219219-220220-o "oqlen" is the number of callbacks on the "orphan" callback221221- list. RCU callbacks are placed on this list by CPUs going222222- offline, and are "adopted" either by the CPU helping the outgoing223223- CPU or by the next rcu_barrier*() call, whichever comes first.224214225215o Each element of the form "1/1 0:127 ^0" represents one struct226216 rcu_node. 
Each line represents one level of the hierarchy, from
···
 readers will note that the rcu "nn" number for a given CPU very
 closely matches the rcu_bh "np" number for that same CPU.  This
 is due to short-circuit evaluation in rcu_pending().
+
+
+CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats
+
+These implementations of RCU provide a single debugfs file under the
+top-level directory RCU, namely rcu/rcudata, which displays fields in
+rcu_bh_ctrlblk, rcu_sched_ctrlblk and, for CONFIG_TINY_PREEMPT_RCU,
+rcu_preempt_ctrlblk.
+
+The output of "cat rcu/rcudata" is as follows:
+
+rcu_preempt: qlen=24 gp=1097669 g197/p197/c197 tasks=...
+             ttb=. btg=no ntb=184 neb=0 nnb=183 j=01f7 bt=0274
+             normal balk: nt=1097669 gt=0 bt=371 b=0 ny=25073378 nos=0
+             exp balk: bt=0 nos=0
+rcu_sched: qlen: 0
+rcu_bh: qlen: 0
+
+This is split into rcu_preempt, rcu_sched, and rcu_bh sections, with the
+rcu_preempt section appearing only in CONFIG_TINY_PREEMPT_RCU builds.
+The last three lines of the rcu_preempt section appear only in
+CONFIG_RCU_BOOST kernel builds.  The fields are as follows:
+
+o	"qlen" is the number of RCU callbacks currently waiting either
+	for an RCU grace period or waiting to be invoked.  This is the
+	only field present for rcu_sched and rcu_bh, due to the
+	short-circuiting of grace periods in those two cases.
+
+o	"gp" is the number of grace periods that have completed.
+
+o	"g197/p197/c197" displays the grace-period state, with the
+	"g" number being the number of grace periods that have started
+	(mod 256), the "p" number being the number of grace periods
+	that the CPU has responded to (also mod 256), and the "c"
+	number being the number of grace periods that have completed
+	(once again mod 256).
+
+	Why have both "gp" and "g"?
Because the data flowing into
+	"gp" is only present in a CONFIG_RCU_TRACE kernel.
+
+o	"tasks" is a set of bits.  The first bit is "T" if there are
+	currently tasks that have recently blocked within an RCU
+	read-side critical section, the second bit is "N" if any of the
+	aforementioned tasks are blocking the current RCU grace period,
+	and the third bit is "E" if any of the aforementioned tasks are
+	blocking the current expedited grace period.  Each bit is "."
+	if the corresponding condition does not hold.
+
+o	"ttb" is a single bit.  It is "B" if any of the blocked tasks
+	need to be priority boosted and "." otherwise.
+
+o	"btg" indicates whether boosting has been carried out during
+	the current grace period, with "exp" indicating that boosting
+	is in progress for an expedited grace period, "no" indicating
+	that boosting has not yet started for a normal grace period,
+	"begun" indicating that boosting has begun for a normal grace
+	period, and "done" indicating that boosting has completed for
+	a normal grace period.
+
+o	"ntb" is the total number of tasks subjected to RCU priority
+	boosting since boot.
+
+o	"neb" is the number of expedited grace periods that have had
+	to resort to RCU priority boosting since boot.
+
+o	"nnb" is the number of normal grace periods that have had
+	to resort to RCU priority boosting since boot.
+
+o	"j" is the low-order 12 bits of the jiffies counter in hexadecimal.
+
+o	"bt" is the low-order 12 bits of the value that the jiffies counter
+	will have at the next time that boosting is scheduled to begin.
+
+o	In the line beginning with "normal balk", the fields are as follows:
+
+	o	"nt" is the number of times that the system balked from
+		boosting because there were no blocked tasks to boost.
+		Note that the system will
balk from boosting even if the
+		grace period is overdue when the currently running task
+		is looping within an RCU read-side critical section.
+		There is no point in boosting in this case, because
+		boosting a running task won't make it run any faster.
+
+	o	"gt" is the number of times that the system balked
+		from boosting because, although there were blocked tasks,
+		none of them were preventing the current grace period
+		from completing.
+
+	o	"bt" is the number of times that the system balked
+		from boosting because boosting was already in progress.
+
+	o	"b" is the number of times that the system balked from
+		boosting because boosting had already completed for
+		the grace period in question.
+
+	o	"ny" is the number of times that the system balked from
+		boosting because it was not yet time to start boosting
+		the grace period in question.
+
+	o	"nos" is the number of times that the system balked from
+		boosting for inexplicable ("not otherwise specified")
+		reasons.  This can actually happen due to races involving
+		increments of the jiffies counter.
+
+o	In the line beginning with "exp balk", the fields are as follows:
+
+	o	"bt" is the number of times that the system balked from
+		boosting because there were no blocked tasks to boost.
+
+	o	"nos" is the number of times that the system balked from
+		boosting for inexplicable ("not otherwise specified")
+		reasons.
···
 #define list_first_entry_rcu(ptr, type, member) \
	list_entry_rcu((ptr)->next, type, member)

-#define __list_for_each_rcu(pos, head) \
-	for (pos = rcu_dereference_raw(list_next_rcu(head)); \
-		pos != (head); \
-		pos = rcu_dereference_raw(list_next_rcu((pos)))
-
 /**
  * list_for_each_entry_rcu	-	iterate over rcu list of given type
  * @pos:	the type * to use as a loop cursor.
+2-2
include/linux/rcupdate.h
···
 extern int rcutorture_runnable; /* for sysctl */
 #endif /* #ifdef CONFIG_RCU_TORTURE_TEST */

+#define UINT_CMP_GE(a, b)	(UINT_MAX / 2 >= (a) - (b))
+#define UINT_CMP_LT(a, b)	(UINT_MAX / 2 < (a) - (b))
 #define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
 #define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))
···
 extern void synchronize_sched(void);
 extern void rcu_barrier_bh(void);
 extern void rcu_barrier_sched(void);
-extern void synchronize_sched_expedited(void);
 extern int sched_expedited_torture_stats(char *page);

 static inline void __rcu_read_lock_bh(void)
···
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */

 /* Internal to kernel */
-extern void rcu_init(void);
 extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
···

 config RCU_TRACE
	bool "Enable tracing for RCU"
-	depends on TREE_RCU || TREE_PREEMPT_RCU
	help
	  This option provides tracing in RCU which presents stats
	  in debugfs for debugging RCU implementation.
···
	  This option provides tracing for the TREE_RCU and
	  TREE_PREEMPT_RCU implementations, permitting Makefile to
	  trivially select kernel/rcutree_trace.c.
+
+config RCU_BOOST
+	bool "Enable RCU priority boosting"
+	depends on RT_MUTEXES && TINY_PREEMPT_RCU
+	default n
+	help
+	  This option boosts the priority of preempted RCU readers that
+	  block the current preemptible RCU grace period for too long.
+	  This option also prevents heavy loads from blocking RCU
+	  callback invocation for all flavors of RCU.
+
+	  Say Y here if you are working with real-time apps or heavy loads.
+	  Say N here if you are unsure.
+
+config RCU_BOOST_PRIO
+	int "Real-time priority to boost RCU readers to"
+	range 1 99
+	depends on RCU_BOOST
+	default 1
+	help
+	  This option specifies the real-time priority to which preempted
+	  RCU readers are to be boosted.  If you are working with CPU-bound
+	  real-time applications, you should specify a priority higher than
+	  the highest-priority CPU-bound application.
+
+	  Specify the real-time priority, or take the default if unsure.
+
+config RCU_BOOST_DELAY
+	int "Milliseconds to delay boosting after RCU grace-period start"
+	range 0 3000
+	depends on RCU_BOOST
+	default 500
+	help
+	  This option specifies the time to wait after the beginning of
+	  a given grace period before priority-boosting preempted RCU
+	  readers blocking that grace period.
Note that any RCU reader
+	  blocking an expedited RCU grace period is boosted immediately.
+
+	  Accept the default if unsure.
+
+config SRCU_SYNCHRONIZE_DELAY
+	int "Microseconds to delay before waiting for readers"
+	range 0 20
+	default 10
+	help
+	  This option controls how long SRCU delays before entering its
+	  loop waiting on SRCU readers.  The purpose of this loop is
+	  to avoid the unconditional context-switch penalty that would
+	  otherwise be incurred if there was an active SRCU reader,
+	  in a manner similar to adaptive locking schemes.  This should
+	  be set to be a bit longer than the common-case SRCU read-side
+	  critical-section overhead.
+
+	  Accept the default if unsure.

 endmenu # "RCU Subsystem"
+70-35
kernel/rcutiny.c
···3636#include <linux/time.h>3737#include <linux/cpu.h>38383939-/* Global control variables for rcupdate callback mechanism. */4040-struct rcu_ctrlblk {4141- struct rcu_head *rcucblist; /* List of pending callbacks (CBs). */4242- struct rcu_head **donetail; /* ->next pointer of last "done" CB. */4343- struct rcu_head **curtail; /* ->next pointer of last CB. */4444-};4545-4646-/* Definition for rcupdate control block. */4747-static struct rcu_ctrlblk rcu_sched_ctrlblk = {4848- .donetail = &rcu_sched_ctrlblk.rcucblist,4949- .curtail = &rcu_sched_ctrlblk.rcucblist,5050-};5151-5252-static struct rcu_ctrlblk rcu_bh_ctrlblk = {5353- .donetail = &rcu_bh_ctrlblk.rcucblist,5454- .curtail = &rcu_bh_ctrlblk.rcucblist,5555-};5656-5757-#ifdef CONFIG_DEBUG_LOCK_ALLOC5858-int rcu_scheduler_active __read_mostly;5959-EXPORT_SYMBOL_GPL(rcu_scheduler_active);6060-#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */3939+/* Controls for rcu_kthread() kthread, replacing RCU_SOFTIRQ used previously. */4040+static struct task_struct *rcu_kthread_task;4141+static DECLARE_WAIT_QUEUE_HEAD(rcu_kthread_wq);4242+static unsigned long have_rcu_kthread_work;4343+static void invoke_rcu_kthread(void);61446245/* Forward declarations for rcutiny_plugin.h. 
 */
-static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp);
+struct rcu_ctrlblk;
+static void rcu_process_callbacks(struct rcu_ctrlblk *rcp);
+static int rcu_kthread(void *arg);
 static void __call_rcu(struct rcu_head *head,
		       void (*func)(struct rcu_head *rcu),
		       struct rcu_ctrlblk *rcp);
···
 {
	if (rcu_qsctr_help(&rcu_sched_ctrlblk) +
	    rcu_qsctr_help(&rcu_bh_ctrlblk))
-		raise_softirq(RCU_SOFTIRQ);
+		invoke_rcu_kthread();
 }

 /*
···
 void rcu_bh_qs(int cpu)
 {
	if (rcu_qsctr_help(&rcu_bh_ctrlblk))
-		raise_softirq(RCU_SOFTIRQ);
+		invoke_rcu_kthread();
 }

 /*
···
 }

 /*
- * Helper function for rcu_process_callbacks() that operates on the
- * specified rcu_ctrlkblk structure.
+ * Invoke the RCU callbacks on the specified rcu_ctrlblk structure
+ * whose grace period has elapsed.
  */
-static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
+static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
 {
	struct rcu_head *next, *list;
	unsigned long flags;
+	RCU_TRACE(int cb_count = 0);

	/* If no RCU callbacks ready to invoke, just return. */
	if (&rcp->rcucblist == rcp->donetail)
···
		next = list->next;
		prefetch(next);
		debug_rcu_head_unqueue(list);
+		local_bh_disable();
		list->func(list);
+		local_bh_enable();
		list = next;
+		RCU_TRACE(cb_count++);
	}
+	RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count));
 }

 /*
- * Invoke any callbacks whose grace period has completed.
+ * This kthread invokes RCU callbacks whose grace periods have
+ * elapsed.
It is awakened as needed, and takes the place of the195195+ * RCU_SOFTIRQ that was used previously for this purpose.196196+ * This is a kthread, but it is never stopped, at least not until197197+ * the system goes down.176198 */177177-static void rcu_process_callbacks(struct softirq_action *unused)199199+static int rcu_kthread(void *arg)178200{179179- __rcu_process_callbacks(&rcu_sched_ctrlblk);180180- __rcu_process_callbacks(&rcu_bh_ctrlblk);181181- rcu_preempt_process_callbacks();201201+ unsigned long work;202202+ unsigned long morework;203203+ unsigned long flags;204204+205205+ for (;;) {206206+ wait_event(rcu_kthread_wq, have_rcu_kthread_work != 0);207207+ morework = rcu_boost();208208+ local_irq_save(flags);209209+ work = have_rcu_kthread_work;210210+ have_rcu_kthread_work = morework;211211+ local_irq_restore(flags);212212+ if (work) {213213+ rcu_process_callbacks(&rcu_sched_ctrlblk);214214+ rcu_process_callbacks(&rcu_bh_ctrlblk);215215+ rcu_preempt_process_callbacks();216216+ }217217+ schedule_timeout_interruptible(1); /* Leave CPU for others. */218218+ }219219+220220+ return 0; /* Not reached, but needed to shut gcc up. 
*/221221+}222222+223223+/*224224+ * Wake up rcu_kthread() to process callbacks now eligible for invocation225225+ * or to boost readers.226226+ */227227+static void invoke_rcu_kthread(void)228228+{229229+ unsigned long flags;230230+231231+ local_irq_save(flags);232232+ have_rcu_kthread_work = 1;233233+ wake_up(&rcu_kthread_wq);234234+ local_irq_restore(flags);182235}183236184237/*···255230 local_irq_save(flags);256231 *rcp->curtail = head;257232 rcp->curtail = &head->next;233233+ RCU_TRACE(rcp->qlen++);258234 local_irq_restore(flags);259235}260236···308282}309283EXPORT_SYMBOL_GPL(rcu_barrier_sched);310284311311-void __init rcu_init(void)285285+/*286286+ * Spawn the kthread that invokes RCU callbacks.287287+ */288288+static int __init rcu_spawn_kthreads(void)312289{313313- open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);290290+ struct sched_param sp;291291+292292+ rcu_kthread_task = kthread_run(rcu_kthread, NULL, "rcu_kthread");293293+ sp.sched_priority = RCU_BOOST_PRIO;294294+ sched_setscheduler_nocheck(rcu_kthread_task, SCHED_FIFO, &sp);295295+ return 0;314296}297297+early_initcall(rcu_spawn_kthreads);
+419-14
kernel/rcutiny_plugin.h
···2222 * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>2323 */24242525+#include <linux/kthread.h>2626+#include <linux/debugfs.h>2727+#include <linux/seq_file.h>2828+2929+#ifdef CONFIG_RCU_TRACE3030+#define RCU_TRACE(stmt) stmt3131+#else /* #ifdef CONFIG_RCU_TRACE */3232+#define RCU_TRACE(stmt)3333+#endif /* #else #ifdef CONFIG_RCU_TRACE */3434+3535+/* Global control variables for rcupdate callback mechanism. */3636+struct rcu_ctrlblk {3737+ struct rcu_head *rcucblist; /* List of pending callbacks (CBs). */3838+ struct rcu_head **donetail; /* ->next pointer of last "done" CB. */3939+ struct rcu_head **curtail; /* ->next pointer of last CB. */4040+ RCU_TRACE(long qlen); /* Number of pending CBs. */4141+};4242+4343+/* Definition for rcupdate control block. */4444+static struct rcu_ctrlblk rcu_sched_ctrlblk = {4545+ .donetail = &rcu_sched_ctrlblk.rcucblist,4646+ .curtail = &rcu_sched_ctrlblk.rcucblist,4747+};4848+4949+static struct rcu_ctrlblk rcu_bh_ctrlblk = {5050+ .donetail = &rcu_bh_ctrlblk.rcucblist,5151+ .curtail = &rcu_bh_ctrlblk.rcucblist,5252+};5353+5454+#ifdef CONFIG_DEBUG_LOCK_ALLOC5555+int rcu_scheduler_active __read_mostly;5656+EXPORT_SYMBOL_GPL(rcu_scheduler_active);5757+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */5858+2559#ifdef CONFIG_TINY_PREEMPT_RCU26602761#include <linux/delay.h>···8046 struct list_head *gp_tasks;8147 /* Pointer to the first task blocking the */8248 /* current grace period, or NULL if there */8383- /* is not such task. */4949+ /* is no such task. */8450 struct list_head *exp_tasks;8551 /* Pointer to first task blocking the */8652 /* current expedited grace period, or NULL */8753 /* if there is no such task. If there */8854 /* is no current expedited grace period, */8955 /* then there cannot be any such task. */5656+#ifdef CONFIG_RCU_BOOST5757+ struct list_head *boost_tasks;5858+ /* Pointer to first task that needs to be */5959+ /* priority-boosted, or NULL if no priority */6060+ /* boosting is needed. 
If there is no */
+					/*  current or expedited grace period, there */
+					/*  can be no such task. */
+#endif /* #ifdef CONFIG_RCU_BOOST */
	u8 gpnum;	/* Current grace period. */
	u8 gpcpu;	/* Last grace period blocked by the CPU. */
	u8 completed;	/* Last grace period completed. */
			/*  If all three are equal, RCU is idle. */
+#ifdef CONFIG_RCU_BOOST
+	s8 boosted_this_gp;		/* Has boosting already happened? */
+	unsigned long boost_time;	/* When to start boosting (jiffies) */
+#endif /* #ifdef CONFIG_RCU_BOOST */
+#ifdef CONFIG_RCU_TRACE
+	unsigned long n_grace_periods;
+#ifdef CONFIG_RCU_BOOST
+	unsigned long n_tasks_boosted;
+	unsigned long n_exp_boosts;
+	unsigned long n_normal_boosts;
+	unsigned long n_normal_balk_blkd_tasks;
+	unsigned long n_normal_balk_gp_tasks;
+	unsigned long n_normal_balk_boost_tasks;
+	unsigned long n_normal_balk_boosted;
+	unsigned long n_normal_balk_notyet;
+	unsigned long n_normal_balk_nos;
+	unsigned long n_exp_balk_blkd_tasks;
+	unsigned long n_exp_balk_nos;
+#endif /* #ifdef CONFIG_RCU_BOOST */
+#endif /* #ifdef CONFIG_RCU_TRACE */
 };

 static struct rcu_preempt_ctrlblk rcu_preempt_ctrlblk = {
···
 }

 /*
+ * Advance a ->blkd_tasks-list pointer to the next entry, instead
+ * returning NULL if at the end of the list.
+ */
+static struct list_head *rcu_next_node_entry(struct task_struct *t)
+{
+	struct list_head *np;
+
+	np = t->rcu_node_entry.next;
+	if (np == &rcu_preempt_ctrlblk.blkd_tasks)
+		np = NULL;
+	return np;
+}
+
+#ifdef CONFIG_RCU_TRACE
+
+#ifdef CONFIG_RCU_BOOST
+static void rcu_initiate_boost_trace(void);
+static void rcu_initiate_exp_boost_trace(void);
+#endif /* #ifdef CONFIG_RCU_BOOST */
+
+/*
+ * Dump additional statistics for TINY_PREEMPT_RCU.
+ */
+static void show_tiny_preempt_stats(struct seq_file
*m)149149+{150150+ seq_printf(m, "rcu_preempt: qlen=%ld gp=%lu g%u/p%u/c%u tasks=%c%c%c\n",151151+ rcu_preempt_ctrlblk.rcb.qlen,152152+ rcu_preempt_ctrlblk.n_grace_periods,153153+ rcu_preempt_ctrlblk.gpnum,154154+ rcu_preempt_ctrlblk.gpcpu,155155+ rcu_preempt_ctrlblk.completed,156156+ "T."[list_empty(&rcu_preempt_ctrlblk.blkd_tasks)],157157+ "N."[!rcu_preempt_ctrlblk.gp_tasks],158158+ "E."[!rcu_preempt_ctrlblk.exp_tasks]);159159+#ifdef CONFIG_RCU_BOOST160160+ seq_printf(m, " ttb=%c btg=",161161+ "B."[!rcu_preempt_ctrlblk.boost_tasks]);162162+ switch (rcu_preempt_ctrlblk.boosted_this_gp) {163163+ case -1:164164+ seq_puts(m, "exp");165165+ break;166166+ case 0:167167+ seq_puts(m, "no");168168+ break;169169+ case 1:170170+ seq_puts(m, "begun");171171+ break;172172+ case 2:173173+ seq_puts(m, "done");174174+ break;175175+ default:176176+ seq_printf(m, "?%d?", rcu_preempt_ctrlblk.boosted_this_gp);177177+ }178178+ seq_printf(m, " ntb=%lu neb=%lu nnb=%lu j=%04x bt=%04x\n",179179+ rcu_preempt_ctrlblk.n_tasks_boosted,180180+ rcu_preempt_ctrlblk.n_exp_boosts,181181+ rcu_preempt_ctrlblk.n_normal_boosts,182182+ (int)(jiffies & 0xffff),183183+ (int)(rcu_preempt_ctrlblk.boost_time & 0xffff));184184+ seq_printf(m, " %s: nt=%lu gt=%lu bt=%lu b=%lu ny=%lu nos=%lu\n",185185+ "normal balk",186186+ rcu_preempt_ctrlblk.n_normal_balk_blkd_tasks,187187+ rcu_preempt_ctrlblk.n_normal_balk_gp_tasks,188188+ rcu_preempt_ctrlblk.n_normal_balk_boost_tasks,189189+ rcu_preempt_ctrlblk.n_normal_balk_boosted,190190+ rcu_preempt_ctrlblk.n_normal_balk_notyet,191191+ rcu_preempt_ctrlblk.n_normal_balk_nos);192192+ seq_printf(m, " exp balk: bt=%lu nos=%lu\n",193193+ rcu_preempt_ctrlblk.n_exp_balk_blkd_tasks,194194+ rcu_preempt_ctrlblk.n_exp_balk_nos);195195+#endif /* #ifdef CONFIG_RCU_BOOST */196196+}197197+198198+#endif /* #ifdef CONFIG_RCU_TRACE */199199+200200+#ifdef CONFIG_RCU_BOOST201201+202202+#include "rtmutex_common.h"203203+204204+/*205205+ * Carry out RCU priority boosting on the task 
indicated by ->boost_tasks,206206+ * and advance ->boost_tasks to the next task in the ->blkd_tasks list.207207+ */208208+static int rcu_boost(void)209209+{210210+ unsigned long flags;211211+ struct rt_mutex mtx;212212+ struct list_head *np;213213+ struct task_struct *t;214214+215215+ if (rcu_preempt_ctrlblk.boost_tasks == NULL)216216+ return 0; /* Nothing to boost. */217217+ raw_local_irq_save(flags);218218+ rcu_preempt_ctrlblk.boosted_this_gp++;219219+ t = container_of(rcu_preempt_ctrlblk.boost_tasks, struct task_struct,220220+ rcu_node_entry);221221+ np = rcu_next_node_entry(t);222222+ rt_mutex_init_proxy_locked(&mtx, t);223223+ t->rcu_boost_mutex = &mtx;224224+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BOOSTED;225225+ raw_local_irq_restore(flags);226226+ rt_mutex_lock(&mtx);227227+ RCU_TRACE(rcu_preempt_ctrlblk.n_tasks_boosted++);228228+ rcu_preempt_ctrlblk.boosted_this_gp++;229229+ rt_mutex_unlock(&mtx);230230+ return rcu_preempt_ctrlblk.boost_tasks != NULL;231231+}232232+233233+/*234234+ * Check to see if it is now time to start boosting RCU readers blocking235235+ * the current grace period, and, if so, tell the rcu_kthread_task to236236+ * start boosting them. If there is an expedited boost in progress,237237+ * we wait for it to complete.238238+ *239239+ * If there are no blocked readers blocking the current grace period,240240+ * return 0 to let the caller know, otherwise return 1. 
Note that this241241+ * return value is independent of whether or not boosting was done.242242+ */243243+static int rcu_initiate_boost(void)244244+{245245+ if (!rcu_preempt_blocked_readers_cgp()) {246246+ RCU_TRACE(rcu_preempt_ctrlblk.n_normal_balk_blkd_tasks++);247247+ return 0;248248+ }249249+ if (rcu_preempt_ctrlblk.gp_tasks != NULL &&250250+ rcu_preempt_ctrlblk.boost_tasks == NULL &&251251+ rcu_preempt_ctrlblk.boosted_this_gp == 0 &&252252+ ULONG_CMP_GE(jiffies, rcu_preempt_ctrlblk.boost_time)) {253253+ rcu_preempt_ctrlblk.boost_tasks = rcu_preempt_ctrlblk.gp_tasks;254254+ invoke_rcu_kthread();255255+ RCU_TRACE(rcu_preempt_ctrlblk.n_normal_boosts++);256256+ } else257257+ RCU_TRACE(rcu_initiate_boost_trace());258258+ return 1;259259+}260260+261261+/*262262+ * Initiate boosting for an expedited grace period.263263+ */264264+static void rcu_initiate_expedited_boost(void)265265+{266266+ unsigned long flags;267267+268268+ raw_local_irq_save(flags);269269+ if (!list_empty(&rcu_preempt_ctrlblk.blkd_tasks)) {270270+ rcu_preempt_ctrlblk.boost_tasks =271271+ rcu_preempt_ctrlblk.blkd_tasks.next;272272+ rcu_preempt_ctrlblk.boosted_this_gp = -1;273273+ invoke_rcu_kthread();274274+ RCU_TRACE(rcu_preempt_ctrlblk.n_exp_boosts++);275275+ } else276276+ RCU_TRACE(rcu_initiate_exp_boost_trace());277277+ raw_local_irq_restore(flags);278278+}279279+280280+#define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000);281281+282282+/*283283+ * Do priority-boost accounting for the start of a new grace period.284284+ */285285+static void rcu_preempt_boost_start_gp(void)286286+{287287+ rcu_preempt_ctrlblk.boost_time = jiffies + RCU_BOOST_DELAY_JIFFIES;288288+ if (rcu_preempt_ctrlblk.boosted_this_gp > 0)289289+ rcu_preempt_ctrlblk.boosted_this_gp = 0;290290+}291291+292292+#else /* #ifdef CONFIG_RCU_BOOST */293293+294294+/*295295+ * If there is no RCU priority boosting, we don't boost.296296+ */297297+static int rcu_boost(void)298298+{299299+ return 
0;300300+}301301+302302+/*303303+ * If there is no RCU priority boosting, we don't initiate boosting,304304+ * but we do indicate whether there are blocked readers blocking the305305+ * current grace period.306306+ */307307+static int rcu_initiate_boost(void)308308+{309309+ return rcu_preempt_blocked_readers_cgp();310310+}311311+312312+/*313313+ * If there is no RCU priority boosting, we don't initiate expedited boosting.314314+ */315315+static void rcu_initiate_expedited_boost(void)316316+{317317+}318318+319319+/*320320+ * If there is no RCU priority boosting, nothing to do at grace-period start.321321+ */322322+static void rcu_preempt_boost_start_gp(void)323323+{324324+}325325+326326+#endif /* else #ifdef CONFIG_RCU_BOOST */327327+328328+/*187329 * Record a preemptible-RCU quiescent state for the specified CPU. Note188330 * that this just means that the task currently running on the CPU is189331 * in a quiescent state. There might be any number of tasks blocked···414148 rcu_preempt_ctrlblk.gpcpu = rcu_preempt_ctrlblk.gpnum;415149 current->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;416150151151+ /* If there is no GP then there is nothing more to do. */152152+ if (!rcu_preempt_gp_in_progress())153153+ return;417154 /*418418- * If there is no GP, or if blocked readers are still blocking GP,419419- * then there is nothing more to do.155155+ * Check up on boosting. If there are no readers blocking the156156+ * current grace period, leave.420157 */421421- if (!rcu_preempt_gp_in_progress() || rcu_preempt_blocked_readers_cgp())158158+ if (rcu_initiate_boost())422159 return;423160424161 /* Advance callbacks. */···433164 if (!rcu_preempt_blocked_readers_any())434165 rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.nexttail;435166436436- /* If there are done callbacks, make RCU_SOFTIRQ process them. */167167+ /* If there are done callbacks, cause them to be invoked. 
  */
 	if (*rcu_preempt_ctrlblk.rcb.donetail != NULL)
-		raise_softirq(RCU_SOFTIRQ);
+		invoke_rcu_kthread();
 }
 
 /*
···
 
 	/* Official start of GP. */
 	rcu_preempt_ctrlblk.gpnum++;
+	RCU_TRACE(rcu_preempt_ctrlblk.n_grace_periods++);
 
 	/* Any blocked RCU readers block new GP. */
 	if (rcu_preempt_blocked_readers_any())
 		rcu_preempt_ctrlblk.gp_tasks =
 			rcu_preempt_ctrlblk.blkd_tasks.next;
+
+	/* Set up for RCU priority boosting. */
+	rcu_preempt_boost_start_gp();
 
 	/* If there is no running reader, CPU is done with GP. */
 	if (!rcu_preempt_running_reader())
···
 	 */
 	empty = !rcu_preempt_blocked_readers_cgp();
 	empty_exp = rcu_preempt_ctrlblk.exp_tasks == NULL;
-	np = t->rcu_node_entry.next;
-	if (np == &rcu_preempt_ctrlblk.blkd_tasks)
-		np = NULL;
+	np = rcu_next_node_entry(t);
 	list_del(&t->rcu_node_entry);
 	if (&t->rcu_node_entry == rcu_preempt_ctrlblk.gp_tasks)
 		rcu_preempt_ctrlblk.gp_tasks = np;
 	if (&t->rcu_node_entry == rcu_preempt_ctrlblk.exp_tasks)
 		rcu_preempt_ctrlblk.exp_tasks = np;
+#ifdef CONFIG_RCU_BOOST
+	if (&t->rcu_node_entry == rcu_preempt_ctrlblk.boost_tasks)
+		rcu_preempt_ctrlblk.boost_tasks = np;
+#endif /* #ifdef CONFIG_RCU_BOOST */
 	INIT_LIST_HEAD(&t->rcu_node_entry);
 
 	/*
···
 		if (!empty_exp && rcu_preempt_ctrlblk.exp_tasks == NULL)
 			rcu_report_exp_done();
 	}
+#ifdef CONFIG_RCU_BOOST
+	/* Unboost self if was boosted. */
+	if (special & RCU_READ_UNLOCK_BOOSTED) {
+		t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BOOSTED;
+		rt_mutex_unlock(t->rcu_boost_mutex);
+		t->rcu_boost_mutex = NULL;
+	}
+#endif /* #ifdef CONFIG_RCU_BOOST */
 	local_irq_restore(flags);
 }
···
 	rcu_preempt_cpu_qs();
 	if (&rcu_preempt_ctrlblk.rcb.rcucblist !=
 	    rcu_preempt_ctrlblk.rcb.donetail)
-		raise_softirq(RCU_SOFTIRQ);
+		invoke_rcu_kthread();
 	if (rcu_preempt_gp_in_progress() &&
 	    rcu_cpu_blocking_cur_gp() &&
 	    rcu_preempt_running_reader())
···
 
 /*
  * TINY_PREEMPT_RCU has an extra callback-list tail pointer to
- * update, so this is invoked from __rcu_process_callbacks() to
+ * update, so this is invoked from rcu_process_callbacks() to
  * handle that case.  Of course, it is invoked for all flavors of
  * RCU, but RCU callbacks can appear only on one of the lists, and
  * neither ->nexttail nor ->donetail can possibly be NULL, so there
···
  */
 static void rcu_preempt_process_callbacks(void)
 {
-	__rcu_process_callbacks(&rcu_preempt_ctrlblk.rcb);
+	rcu_process_callbacks(&rcu_preempt_ctrlblk.rcb);
 }
 
 /*
···
 	local_irq_save(flags);
 	*rcu_preempt_ctrlblk.nexttail = head;
 	rcu_preempt_ctrlblk.nexttail = &head->next;
+	RCU_TRACE(rcu_preempt_ctrlblk.rcb.qlen++);
 	rcu_preempt_start_gp();  /* checks to see if GP needed. */
 	local_irq_restore(flags);
 }
···
 
 	/* Wait for tail of ->blkd_tasks list to drain. */
 	if (rcu_preempted_readers_exp())
+		rcu_initiate_expedited_boost();
 	wait_event(sync_rcu_preempt_exp_wq,
 		   !rcu_preempted_readers_exp());
···
 
 #else /* #ifdef CONFIG_TINY_PREEMPT_RCU */
 
+#ifdef CONFIG_RCU_TRACE
+
+/*
+ * Because preemptible RCU does not exist, it is not necessary to
+ * dump out its statistics.
+ */
+static void show_tiny_preempt_stats(struct seq_file *m)
+{
+}
+
+#endif /* #ifdef CONFIG_RCU_TRACE */
+
+/*
+ * Because preemptible RCU does not exist, it is never necessary to
+ * boost preempted RCU readers.
+ */
+static int rcu_boost(void)
+{
+	return 0;
+}
+
 /*
  * Because preemptible RCU does not exist, it never has any callbacks
  * to check.
···
 #endif /* #else #ifdef CONFIG_TINY_PREEMPT_RCU */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-
 #include <linux/kernel_stat.h>
 
 /*
  * During boot, we forgive RCU lockdep issues.  After this function is
  * invoked, we start taking RCU lockdep issues seriously.
  */
-void rcu_scheduler_starting(void)
+void __init rcu_scheduler_starting(void)
 {
 	WARN_ON(nr_context_switches() > 0);
 	rcu_scheduler_active = 1;
 }
 
 #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
+#ifdef CONFIG_RCU_BOOST
+#define RCU_BOOST_PRIO CONFIG_RCU_BOOST_PRIO
+#else /* #ifdef CONFIG_RCU_BOOST */
+#define RCU_BOOST_PRIO 1
+#endif /* #else #ifdef CONFIG_RCU_BOOST */
+
+#ifdef CONFIG_RCU_TRACE
+
+#ifdef CONFIG_RCU_BOOST
+
+static void rcu_initiate_boost_trace(void)
+{
+	if (rcu_preempt_ctrlblk.gp_tasks == NULL)
+		rcu_preempt_ctrlblk.n_normal_balk_gp_tasks++;
+	else if (rcu_preempt_ctrlblk.boost_tasks != NULL)
+		rcu_preempt_ctrlblk.n_normal_balk_boost_tasks++;
+	else if (rcu_preempt_ctrlblk.boosted_this_gp != 0)
+		rcu_preempt_ctrlblk.n_normal_balk_boosted++;
+	else if (!ULONG_CMP_GE(jiffies, rcu_preempt_ctrlblk.boost_time))
+		rcu_preempt_ctrlblk.n_normal_balk_notyet++;
+	else
+		rcu_preempt_ctrlblk.n_normal_balk_nos++;
+}
+
+static void rcu_initiate_exp_boost_trace(void)
+{
+	if (list_empty(&rcu_preempt_ctrlblk.blkd_tasks))
+		rcu_preempt_ctrlblk.n_exp_balk_blkd_tasks++;
+	else
+		rcu_preempt_ctrlblk.n_exp_balk_nos++;
+}
+
+#endif /* #ifdef CONFIG_RCU_BOOST */
+
+static void rcu_trace_sub_qlen(struct rcu_ctrlblk *rcp, int n)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	rcp->qlen -= n;
+	raw_local_irq_restore(flags);
+}
+
+/*
+ * Dump statistics for TINY_RCU, such as they are.
+ */
+static int show_tiny_stats(struct seq_file *m, void *unused)
+{
+	show_tiny_preempt_stats(m);
+	seq_printf(m, "rcu_sched: qlen: %ld\n", rcu_sched_ctrlblk.qlen);
+	seq_printf(m, "rcu_bh: qlen: %ld\n", rcu_bh_ctrlblk.qlen);
+	return 0;
+}
+
+static int show_tiny_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, show_tiny_stats, NULL);
+}
+
+static const struct file_operations show_tiny_stats_fops = {
+	.owner = THIS_MODULE,
+	.open = show_tiny_stats_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static struct dentry *rcudir;
+
+static int __init rcutiny_trace_init(void)
+{
+	struct dentry *retval;
+
+	rcudir = debugfs_create_dir("rcu", NULL);
+	if (!rcudir)
+		goto free_out;
+	retval = debugfs_create_file("rcudata", 0444, rcudir,
+				     NULL, &show_tiny_stats_fops);
+	if (!retval)
+		goto free_out;
+	return 0;
+free_out:
+	debugfs_remove_recursive(rcudir);
+	return 1;
+}
+
+static void __exit rcutiny_trace_cleanup(void)
+{
+	debugfs_remove_recursive(rcudir);
+}
+
+module_init(rcutiny_trace_init);
+module_exit(rcutiny_trace_cleanup);
+
+MODULE_AUTHOR("Paul E. McKenney");
+MODULE_DESCRIPTION("Read-Copy Update tracing for tiny implementation");
+MODULE_LICENSE("GPL");
+
+#endif /* #ifdef CONFIG_RCU_TRACE */
kernel/rcutorture.c (+259, -11)
···
 #include <linux/srcu.h>
 #include <linux/slab.h>
 #include <asm/byteorder.h>
+#include <linux/sched.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and "
···
 static int fqs_duration = 0;	/* Duration of bursts (us), 0 to disable. */
 static int fqs_holdoff = 0;	/* Hold time within burst (us). */
 static int fqs_stutter = 3;	/* Wait time between bursts (s). */
+static int test_boost = 1;	/* Test RCU prio boost: 0=no, 1=maybe, 2=yes. */
+static int test_boost_interval = 7; /* Interval between boost tests, seconds. */
+static int test_boost_duration = 4; /* Duration of each boost test, seconds. */
 static char *torture_type = "rcu"; /* What RCU implementation to torture. */
 
 module_param(nreaders, int, 0444);
···
 MODULE_PARM_DESC(fqs_holdoff, "Holdoff time within fqs bursts (us)");
 module_param(fqs_stutter, int, 0444);
 MODULE_PARM_DESC(fqs_stutter, "Wait time between fqs bursts (s)");
+module_param(test_boost, int, 0444);
+MODULE_PARM_DESC(test_boost, "Test RCU prio boost: 0=no, 1=maybe, 2=yes.");
+module_param(test_boost_interval, int, 0444);
+MODULE_PARM_DESC(test_boost_interval, "Interval between boost tests, seconds.");
+module_param(test_boost_duration, int, 0444);
+MODULE_PARM_DESC(test_boost_duration, "Duration of each boost test, seconds.");
 module_param(torture_type, charp, 0444);
 MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, rcu_bh, srcu)");
···
 static struct task_struct *shuffler_task;
 static struct task_struct *stutter_task;
 static struct task_struct *fqs_task;
+static struct task_struct *boost_tasks[NR_CPUS];
 
 #define RCU_TORTURE_PIPE_LEN 10
···
 static atomic_t n_rcu_torture_free;
 static atomic_t n_rcu_torture_mberror;
 static atomic_t n_rcu_torture_error;
+static long n_rcu_torture_boost_ktrerror;
+static long n_rcu_torture_boost_rterror;
+static long n_rcu_torture_boost_allocerror;
+static long n_rcu_torture_boost_afferror;
+static long n_rcu_torture_boost_failure;
+static long n_rcu_torture_boosts;
 static long n_rcu_torture_timers;
 static struct list_head rcu_torture_removed;
 static cpumask_var_t shuffle_tmp_mask;
···
 #define RCUTORTURE_RUNNABLE_INIT 0
 #endif
 int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;
+
+#ifdef CONFIG_RCU_BOOST
+#define rcu_can_boost() 1
+#else /* #ifdef CONFIG_RCU_BOOST */
+#define rcu_can_boost() 0
+#endif /* #else #ifdef CONFIG_RCU_BOOST */
+
+static unsigned long boost_starttime;	/* jiffies of next boost test start. */
+DEFINE_MUTEX(boost_mutex);		/* protect setting boost_starttime */
+					/*  and boost task create/destroy. */
 
 /* Mediate rmmod and system shutdown.  Concurrent rmmod & shutdown illegal! */
···
 	void (*fqs)(void);
 	int (*stats)(char *page);
 	int irq_capable;
+	int can_boost;
 	char *name;
 };
···
 	.fqs = rcu_force_quiescent_state,
 	.stats = NULL,
 	.irq_capable = 1,
+	.can_boost = rcu_can_boost(),
 	.name = "rcu"
 };
···
 	.fqs = rcu_force_quiescent_state,
 	.stats = NULL,
 	.irq_capable = 1,
+	.can_boost = rcu_can_boost(),
 	.name = "rcu_sync"
 };
···
 	.fqs = rcu_force_quiescent_state,
 	.stats = NULL,
 	.irq_capable = 1,
+	.can_boost = rcu_can_boost(),
 	.name = "rcu_expedited"
 };
···
 };
 
 /*
+ * RCU torture priority-boost testing.  Runs one real-time thread per
+ * CPU for moderate bursts, repeatedly registering RCU callbacks and
+ * spinning waiting for them to be invoked.  If a given callback takes
+ * too long to be invoked, we assume that priority inversion has occurred.
+ */
+
+struct rcu_boost_inflight {
+	struct rcu_head rcu;
+	int inflight;
+};
+
+static void rcu_torture_boost_cb(struct rcu_head *head)
+{
+	struct rcu_boost_inflight *rbip =
+		container_of(head, struct rcu_boost_inflight, rcu);
+
+	smp_mb(); /* Ensure RCU-core accesses precede clearing ->inflight */
+	rbip->inflight = 0;
+}
+
+static int rcu_torture_boost(void *arg)
+{
+	unsigned long call_rcu_time;
+	unsigned long endtime;
+	unsigned long oldstarttime;
+	struct rcu_boost_inflight rbi = { .inflight = 0 };
+	struct sched_param sp;
+
+	VERBOSE_PRINTK_STRING("rcu_torture_boost started");
+
+	/* Set real-time priority. */
+	sp.sched_priority = 1;
+	if (sched_setscheduler(current, SCHED_FIFO, &sp) < 0) {
+		VERBOSE_PRINTK_STRING("rcu_torture_boost RT prio failed!");
+		n_rcu_torture_boost_rterror++;
+	}
+
+	/* Each pass through the following loop does one boost-test cycle. */
+	do {
+		/* Wait for the next test interval. */
+		oldstarttime = boost_starttime;
+		while (jiffies - oldstarttime > ULONG_MAX / 2) {
+			schedule_timeout_uninterruptible(1);
+			rcu_stutter_wait("rcu_torture_boost");
+			if (kthread_should_stop() ||
+			    fullstop != FULLSTOP_DONTSTOP)
+				goto checkwait;
+		}
+
+		/* Do one boost-test interval. */
+		endtime = oldstarttime + test_boost_duration * HZ;
+		call_rcu_time = jiffies;
+		while (jiffies - endtime > ULONG_MAX / 2) {
+			/* If we don't have a callback in flight, post one. */
+			if (!rbi.inflight) {
+				smp_mb(); /* RCU core before ->inflight = 1. */
+				rbi.inflight = 1;
+				call_rcu(&rbi.rcu, rcu_torture_boost_cb);
+				if (jiffies - call_rcu_time >
+				    test_boost_duration * HZ - HZ / 2) {
+					VERBOSE_PRINTK_STRING("rcu_torture_boost boosting failed");
+					n_rcu_torture_boost_failure++;
+				}
+				call_rcu_time = jiffies;
+			}
+			cond_resched();
+			rcu_stutter_wait("rcu_torture_boost");
+			if (kthread_should_stop() ||
+			    fullstop != FULLSTOP_DONTSTOP)
+				goto checkwait;
+		}
+
+		/*
+		 * Set the start time of the next test interval.
+		 * Yes, this is vulnerable to long delays, but such
+		 * delays simply cause a false negative for the next
+		 * interval.  Besides, we are running at RT priority,
+		 * so delays should be relatively rare.
+		 */
+		while (oldstarttime == boost_starttime) {
+			if (mutex_trylock(&boost_mutex)) {
+				boost_starttime = jiffies +
+						  test_boost_interval * HZ;
+				n_rcu_torture_boosts++;
+				mutex_unlock(&boost_mutex);
+				break;
+			}
+			schedule_timeout_uninterruptible(1);
+		}
+
+		/* Go do the stutter. */
+checkwait:	rcu_stutter_wait("rcu_torture_boost");
+	} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
+
+	/* Clean up and exit. */
+	VERBOSE_PRINTK_STRING("rcu_torture_boost task stopping");
+	rcutorture_shutdown_absorb("rcu_torture_boost");
+	while (!kthread_should_stop() || rbi.inflight)
+		schedule_timeout_uninterruptible(1);
+	smp_mb(); /* order accesses to ->inflight before stack-frame death. */
+	return 0;
+}
+
+/*
  * RCU torture force-quiescent-state kthread.  Repeatedly induces
  * bursts of calls to force_quiescent_state(), increasing the probability
  * of occurrence of some important types of race conditions.
···
 	cnt += sprintf(&page[cnt], "%s%s ", torture_type, TORTURE_FLAG);
 	cnt += sprintf(&page[cnt],
 		       "rtc: %p ver: %ld tfle: %d rta: %d rtaf: %d rtf: %d "
-		       "rtmbe: %d nt: %ld",
+		       "rtmbe: %d rtbke: %ld rtbre: %ld rtbae: %ld rtbafe: %ld "
+		       "rtbf: %ld rtb: %ld nt: %ld",
 		       rcu_torture_current,
 		       rcu_torture_current_version,
 		       list_empty(&rcu_torture_freelist),
···
 		       atomic_read(&n_rcu_torture_alloc_fail),
 		       atomic_read(&n_rcu_torture_free),
 		       atomic_read(&n_rcu_torture_mberror),
+		       n_rcu_torture_boost_ktrerror,
+		       n_rcu_torture_boost_rterror,
+		       n_rcu_torture_boost_allocerror,
+		       n_rcu_torture_boost_afferror,
+		       n_rcu_torture_boost_failure,
+		       n_rcu_torture_boosts,
 		       n_rcu_torture_timers);
-	if (atomic_read(&n_rcu_torture_mberror) != 0)
+	if (atomic_read(&n_rcu_torture_mberror) != 0 ||
+	    n_rcu_torture_boost_ktrerror != 0 ||
+	    n_rcu_torture_boost_rterror != 0 ||
+	    n_rcu_torture_boost_allocerror != 0 ||
+	    n_rcu_torture_boost_afferror != 0 ||
+	    n_rcu_torture_boost_failure != 0)
 		cnt += sprintf(&page[cnt], " !!!");
 	cnt += sprintf(&page[cnt], "\n%s%s ", torture_type, TORTURE_FLAG);
 	if (i > 1) {
···
 }
 
 static inline void
-rcu_torture_print_module_parms(char *tag)
+rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, char *tag)
 {
 	printk(KERN_ALERT "%s" TORTURE_FLAG
 	       "--- %s: nreaders=%d nfakewriters=%d "
 	       "stat_interval=%d verbose=%d test_no_idle_hz=%d "
 	       "shuffle_interval=%d stutter=%d irqreader=%d "
-	       "fqs_duration=%d fqs_holdoff=%d fqs_stutter=%d\n",
+	       "fqs_duration=%d fqs_holdoff=%d fqs_stutter=%d "
+	       "test_boost=%d/%d test_boost_interval=%d "
+	       "test_boost_duration=%d\n",
 	       torture_type, tag, nrealreaders, nfakewriters,
 	       stat_interval, verbose, test_no_idle_hz, shuffle_interval,
-	       stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter);
+	       stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter,
+	       test_boost, cur_ops->can_boost,
+	       test_boost_interval, test_boost_duration);
 }
 
-static struct notifier_block rcutorture_nb = {
+static struct notifier_block rcutorture_shutdown_nb = {
 	.notifier_call = rcutorture_shutdown_notify,
+};
+
+static void rcutorture_booster_cleanup(int cpu)
+{
+	struct task_struct *t;
+
+	if (boost_tasks[cpu] == NULL)
+		return;
+	mutex_lock(&boost_mutex);
+	VERBOSE_PRINTK_STRING("Stopping rcu_torture_boost task");
+	t = boost_tasks[cpu];
+	boost_tasks[cpu] = NULL;
+	mutex_unlock(&boost_mutex);
+
+	/* This must be outside of the mutex, otherwise deadlock! */
+	kthread_stop(t);
+}
+
+static int rcutorture_booster_init(int cpu)
+{
+	int retval;
+
+	if (boost_tasks[cpu] != NULL)
+		return 0;  /* Already created, nothing more to do. */
+
+	/* Don't allow time recalculation while creating a new task. */
+	mutex_lock(&boost_mutex);
+	VERBOSE_PRINTK_STRING("Creating rcu_torture_boost task");
+	boost_tasks[cpu] = kthread_create(rcu_torture_boost, NULL,
+					  "rcu_torture_boost");
+	if (IS_ERR(boost_tasks[cpu])) {
+		retval = PTR_ERR(boost_tasks[cpu]);
+		VERBOSE_PRINTK_STRING("rcu_torture_boost task create failed");
+		n_rcu_torture_boost_ktrerror++;
+		boost_tasks[cpu] = NULL;
+		mutex_unlock(&boost_mutex);
+		return retval;
+	}
+	kthread_bind(boost_tasks[cpu], cpu);
+	wake_up_process(boost_tasks[cpu]);
+	mutex_unlock(&boost_mutex);
+	return 0;
+}
+
+static int rcutorture_cpu_notify(struct notifier_block *self,
+				 unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+
+	switch (action) {
+	case CPU_ONLINE:
+	case CPU_DOWN_FAILED:
+		(void)rcutorture_booster_init(cpu);
+		break;
+	case CPU_DOWN_PREPARE:
+		rcutorture_booster_cleanup(cpu);
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block rcutorture_cpu_nb = {
+	.notifier_call = rcutorture_cpu_notify,
 };
 
 static void
···
 	}
 	fullstop = FULLSTOP_RMMOD;
 	mutex_unlock(&fullstop_mutex);
-	unregister_reboot_notifier(&rcutorture_nb);
+	unregister_reboot_notifier(&rcutorture_shutdown_nb);
 	if (stutter_task) {
 		VERBOSE_PRINTK_STRING("Stopping rcu_torture_stutter task");
 		kthread_stop(stutter_task);
···
 		kthread_stop(fqs_task);
 	}
 	fqs_task = NULL;
+	if ((test_boost == 1 && cur_ops->can_boost) ||
+	    test_boost == 2) {
+		unregister_cpu_notifier(&rcutorture_cpu_nb);
+		for_each_possible_cpu(i)
+			rcutorture_booster_cleanup(i);
+	}
 
 	/* Wait for all RCU callbacks to fire. */
···
 	if (cur_ops->cleanup)
 		cur_ops->cleanup();
 	if (atomic_read(&n_rcu_torture_error))
-		rcu_torture_print_module_parms("End of test: FAILURE");
+		rcu_torture_print_module_parms(cur_ops, "End of test: FAILURE");
 	else
-		rcu_torture_print_module_parms("End of test: SUCCESS");
+		rcu_torture_print_module_parms(cur_ops, "End of test: SUCCESS");
 }
 
 static int __init
···
 		nrealreaders = nreaders;
 	else
 		nrealreaders = 2 * num_online_cpus();
-	rcu_torture_print_module_parms("Start of test");
+	rcu_torture_print_module_parms(cur_ops, "Start of test");
 	fullstop = FULLSTOP_DONTSTOP;
 
 	/* Set up the freelist. */
···
 	atomic_set(&n_rcu_torture_free, 0);
 	atomic_set(&n_rcu_torture_mberror, 0);
 	atomic_set(&n_rcu_torture_error, 0);
+	n_rcu_torture_boost_ktrerror = 0;
+	n_rcu_torture_boost_rterror = 0;
+	n_rcu_torture_boost_allocerror = 0;
+	n_rcu_torture_boost_afferror = 0;
+	n_rcu_torture_boost_failure = 0;
+	n_rcu_torture_boosts = 0;
 	for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++)
 		atomic_set(&rcu_torture_wcount[i], 0);
 	for_each_possible_cpu(cpu) {
···
 			goto unwind;
 		}
 	}
-	register_reboot_notifier(&rcutorture_nb);
+	if (test_boost_interval < 1)
+		test_boost_interval = 1;
+	if (test_boost_duration < 2)
+		test_boost_duration = 2;
+	if ((test_boost == 1 && cur_ops->can_boost) ||
+	    test_boost == 2) {
+		int retval;
+
+		boost_starttime = jiffies + test_boost_interval * HZ;
+		register_cpu_notifier(&rcutorture_cpu_nb);
+		for_each_possible_cpu(i) {
+			if (cpu_is_offline(i))
+				continue;  /* Heuristic: CPU can go offline. */
+			retval = rcutorture_booster_init(i);
+			if (retval < 0) {
+				firsterr = retval;
+				goto unwind;
+			}
+		}
+	}
+	register_reboot_notifier(&rcutorture_shutdown_nb);
 	mutex_unlock(&fullstop_mutex);
 	return 0;
kernel/rcutree.c (+75, -81)
···
 	.gpnum = -300, \
 	.completed = -300, \
 	.onofflock = __RAW_SPIN_LOCK_UNLOCKED(&structname.onofflock), \
-	.orphan_cbs_list = NULL, \
-	.orphan_cbs_tail = &structname.orphan_cbs_list, \
-	.orphan_qlen = 0, \
 	.fqslock = __RAW_SPIN_LOCK_UNLOCKED(&structname.fqslock), \
 	.n_force_qs = 0, \
 	.n_force_qs_ngp = 0, \
···
 static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	if (rdp->gpnum != rnp->gpnum) {
-		rdp->qs_pending = 1;
-		rdp->passed_quiesc = 0;
+		/*
+		 * If the current grace period is waiting for this CPU,
+		 * set up to detect a quiescent state, otherwise don't
+		 * go looking for one.
+		 */
 		rdp->gpnum = rnp->gpnum;
+		if (rnp->qsmask & rdp->grpmask) {
+			rdp->qs_pending = 1;
+			rdp->passed_quiesc = 0;
+		} else
+			rdp->qs_pending = 0;
 	}
 }
···
 
 		/* Remember that we saw this grace-period completion. */
 		rdp->completed = rnp->completed;
+
+		/*
+		 * If we were in an extended quiescent state, we may have
+		 * missed some grace periods that others CPUs handled on
+		 * our behalf. Catch up with this state to avoid noting
+		 * spurious new grace periods.  If another grace period
+		 * has started, then rnp->gpnum will have advanced, so
+		 * we will detect this later on.
+		 */
+		if (ULONG_CMP_LT(rdp->gpnum, rdp->completed))
+			rdp->gpnum = rdp->completed;
+
+		/*
+		 * If RCU does not need a quiescent state from this CPU,
+		 * then make sure that this CPU doesn't go looking for one.
+		 */
+		if ((rnp->qsmask & rdp->grpmask) == 0)
+			rdp->qs_pending = 0;
 	}
 }
···
 #ifdef CONFIG_HOTPLUG_CPU
 
 /*
- * Move a dying CPU's RCU callbacks to the ->orphan_cbs_list for the
- * specified flavor of RCU.  The callbacks will be adopted by the next
- * _rcu_barrier() invocation or by the CPU_DEAD notifier, whichever
- * comes first.  Because this is invoked from the CPU_DYING notifier,
- * irqs are already disabled.
+ * Move a dying CPU's RCU callbacks to online CPU's callback list.
+ * Synchronization is not required because this function executes
+ * in stop_machine() context.
  */
-static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
+static void rcu_send_cbs_to_online(struct rcu_state *rsp)
 {
 	int i;
+	/* current DYING CPU is cleared in the cpu_online_mask */
+	int receive_cpu = cpumask_any(cpu_online_mask);
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_data *receive_rdp = per_cpu_ptr(rsp->rda, receive_cpu);
 
 	if (rdp->nxtlist == NULL)
 		return;  /* irqs disabled, so comparison is stable. */
-	raw_spin_lock(&rsp->onofflock);  /* irqs already disabled. */
-	*rsp->orphan_cbs_tail = rdp->nxtlist;
-	rsp->orphan_cbs_tail = rdp->nxttail[RCU_NEXT_TAIL];
+
+	*receive_rdp->nxttail[RCU_NEXT_TAIL] = rdp->nxtlist;
+	receive_rdp->nxttail[RCU_NEXT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
+	receive_rdp->qlen += rdp->qlen;
+	receive_rdp->n_cbs_adopted += rdp->qlen;
+	rdp->n_cbs_orphaned += rdp->qlen;
+
 	rdp->nxtlist = NULL;
 	for (i = 0; i < RCU_NEXT_SIZE; i++)
 		rdp->nxttail[i] = &rdp->nxtlist;
-	rsp->orphan_qlen += rdp->qlen;
-	rdp->n_cbs_orphaned += rdp->qlen;
 	rdp->qlen = 0;
-	raw_spin_unlock(&rsp->onofflock);  /* irqs remain disabled. */
-}
-
-/*
- * Adopt previously orphaned RCU callbacks.
- */
-static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
-{
-	unsigned long flags;
-	struct rcu_data *rdp;
-
-	raw_spin_lock_irqsave(&rsp->onofflock, flags);
-	rdp = this_cpu_ptr(rsp->rda);
-	if (rsp->orphan_cbs_list == NULL) {
-		raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
-		return;
-	}
-	*rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_cbs_list;
-	rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_cbs_tail;
-	rdp->qlen += rsp->orphan_qlen;
-	rdp->n_cbs_adopted += rsp->orphan_qlen;
-	rsp->orphan_cbs_list = NULL;
-	rsp->orphan_cbs_tail = &rsp->orphan_cbs_list;
-	rsp->orphan_qlen = 0;
-	raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
 }
 
 /*
···
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	if (need_report & RCU_OFL_TASKS_EXP_GP)
 		rcu_report_exp_rnp(rsp, rnp);
-
-	rcu_adopt_orphan_cbs(rsp);
 }
 
 /*
···
 
 #else /* #ifdef CONFIG_HOTPLUG_CPU */
 
-static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
-{
-}
-
-static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
+static void rcu_send_cbs_to_online(struct rcu_state *rsp)
 {
 }
···
 	 */
 	local_irq_save(flags);
 	rdp = this_cpu_ptr(rsp->rda);
-	rcu_process_gp_end(rsp, rdp);
-	check_for_new_grace_period(rsp, rdp);
 
 	/* Add the callback to our list. */
 	*rdp->nxttail[RCU_NEXT_TAIL] = head;
 	rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
-
-	/* Start a new grace period if one not already started. */
-	if (!rcu_gp_in_progress(rsp)) {
-		unsigned long nestflag;
-		struct rcu_node *rnp_root = rcu_get_root(rsp);
-
-		raw_spin_lock_irqsave(&rnp_root->lock, nestflag);
-		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
-	}
 
 	/*
 	 * Force the grace period if too many callbacks or too long waiting.
···
 	 * is the only one waiting for a grace period to complete.
 	 */
 	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
-		rdp->blimit = LONG_MAX;
-		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
-		    *rdp->nxttail[RCU_DONE_TAIL] != head)
-			force_quiescent_state(rsp, 0);
-		rdp->n_force_qs_snap = rsp->n_force_qs;
-		rdp->qlen_last_fqs_check = rdp->qlen;
+
+		/* Are we ignoring a completed grace period? */
+		rcu_process_gp_end(rsp, rdp);
+		check_for_new_grace_period(rsp, rdp);
+
+		/* Start a new grace period if one not already started. */
+		if (!rcu_gp_in_progress(rsp)) {
+			unsigned long nestflag;
+			struct rcu_node *rnp_root = rcu_get_root(rsp);
+
+			raw_spin_lock_irqsave(&rnp_root->lock, nestflag);
+			rcu_start_gp(rsp, nestflag);  /* rlses rnp_root->lock */
+		} else {
+			/* Give the grace period a kick. */
+			rdp->blimit = LONG_MAX;
+			if (rsp->n_force_qs == rdp->n_force_qs_snap &&
+			    *rdp->nxttail[RCU_DONE_TAIL] != head)
+				force_quiescent_state(rsp, 0);
+			rdp->n_force_qs_snap = rsp->n_force_qs;
+			rdp->qlen_last_fqs_check = rdp->qlen;
+		}
 	} else if (ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs), jiffies))
 		force_quiescent_state(rsp, 1);
 	local_irq_restore(flags);
···
 	 * decrement rcu_barrier_cpu_count -- otherwise the first CPU
 	 * might complete its grace period before all of the other CPUs
 	 * did their increment, causing this function to return too
-	 * early.
+	 * early.  Note that on_each_cpu() disables irqs, which prevents
+	 * any CPUs from coming online or going offline until each online
+	 * CPU has queued its RCU-barrier callback.
 	 */
 	atomic_set(&rcu_barrier_cpu_count, 1);
-	preempt_disable(); /* stop CPU_DYING from filling orphan_cbs_list */
-	rcu_adopt_orphan_cbs(rsp);
 	on_each_cpu(rcu_barrier_func, (void *)call_rcu_func, 1);
-	preempt_enable(); /* CPU_DYING can again fill orphan_cbs_list */
 	if (atomic_dec_and_test(&rcu_barrier_cpu_count))
 		complete(&rcu_barrier_completion);
 	wait_for_completion(&rcu_barrier_completion);
···
 	case CPU_DYING:
 	case CPU_DYING_FROZEN:
 		/*
-		 * preempt_disable() in _rcu_barrier() prevents stop_machine(),
-		 * so when "on_each_cpu(rcu_barrier_func, (void *)type, 1);"
-		 * returns, all online cpus have queued rcu_barrier_func().
-		 * The dying CPU clears its cpu_online_mask bit and
-		 * moves all of its RCU callbacks to ->orphan_cbs_list
-		 * in the context of stop_machine(), so subsequent calls
-		 * to _rcu_barrier() will adopt these callbacks and only
-		 * then queue rcu_barrier_func() on all remaining CPUs.
+		 * The whole machine is "stopped" except this CPU, so we can
+		 * touch any data without introducing corruption.  We send the
+		 * dying CPU's callbacks to an arbitrarily chosen online CPU.
 		 */
-		rcu_send_cbs_to_orphanage(&rcu_bh_state);
-		rcu_send_cbs_to_orphanage(&rcu_sched_state);
-		rcu_preempt_send_cbs_to_orphanage();
+		rcu_send_cbs_to_online(&rcu_bh_state);
+		rcu_send_cbs_to_online(&rcu_sched_state);
+		rcu_preempt_send_cbs_to_online();
 		break;
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
···
 {
 	int i;
 
-	for (i = NUM_RCU_LVLS - 1; i >= 0; i--)
+	for (i = NUM_RCU_LVLS - 1; i > 0; i--)
 		rsp->levelspread[i] = CONFIG_RCU_FANOUT;
+	rsp->levelspread[0] = RCU_FANOUT_LEAF;
 }
 #else /* #ifdef CONFIG_RCU_FANOUT_EXACT */
 static void __init rcu_init_levelspread(struct rcu_state *rsp)
kernel/rcutree.h (+28, -31)
···
 /*
  * Define shape of hierarchy based on NR_CPUS and CONFIG_RCU_FANOUT.
  * In theory, it should be possible to add more levels straightforwardly.
- * In practice, this has not been tested, so there is probably some
- * bug somewhere.
+ * In practice, this did work well going from three levels to four.
+ * Of course, your mileage may vary.
  */
 #define MAX_RCU_LVLS 4
-#define RCU_FANOUT	  (CONFIG_RCU_FANOUT)
-#define RCU_FANOUT_SQ	  (RCU_FANOUT * RCU_FANOUT)
-#define RCU_FANOUT_CUBE	  (RCU_FANOUT_SQ * RCU_FANOUT)
-#define RCU_FANOUT_FOURTH (RCU_FANOUT_CUBE * RCU_FANOUT)
+#if CONFIG_RCU_FANOUT > 16
+#define RCU_FANOUT_LEAF	  16
+#else /* #if CONFIG_RCU_FANOUT > 16 */
+#define RCU_FANOUT_LEAF	  (CONFIG_RCU_FANOUT)
+#endif /* #else #if CONFIG_RCU_FANOUT > 16 */
+#define RCU_FANOUT_1	  (RCU_FANOUT_LEAF)
+#define RCU_FANOUT_2	  (RCU_FANOUT_1 * CONFIG_RCU_FANOUT)
+#define RCU_FANOUT_3	  (RCU_FANOUT_2 * CONFIG_RCU_FANOUT)
+#define RCU_FANOUT_4	  (RCU_FANOUT_3 * CONFIG_RCU_FANOUT)
 
-#if NR_CPUS <= RCU_FANOUT
+#if NR_CPUS <= RCU_FANOUT_1
 #  define NUM_RCU_LVLS	  1
 #  define NUM_RCU_LVL_0	  1
 #  define NUM_RCU_LVL_1	  (NR_CPUS)
 #  define NUM_RCU_LVL_2	  0
 #  define NUM_RCU_LVL_3	  0
 #  define NUM_RCU_LVL_4	  0
-#elif NR_CPUS <= RCU_FANOUT_SQ
+#elif NR_CPUS <= RCU_FANOUT_2
 #  define NUM_RCU_LVLS	  2
 #  define NUM_RCU_LVL_0	  1
-#  define NUM_RCU_LVL_1	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT)
+#  define NUM_RCU_LVL_1	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
 #  define NUM_RCU_LVL_2	  (NR_CPUS)
 #  define NUM_RCU_LVL_3	  0
 #  define NUM_RCU_LVL_4	  0
-#elif NR_CPUS <= RCU_FANOUT_CUBE
+#elif NR_CPUS <= RCU_FANOUT_3
 #  define NUM_RCU_LVLS	  3
 #  define NUM_RCU_LVL_0	  1
-#  define NUM_RCU_LVL_1	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_SQ)
-#  define NUM_RCU_LVL_2	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT)
-#  define NUM_RCU_LVL_3	  NR_CPUS
+#  define NUM_RCU_LVL_1	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
+#  define NUM_RCU_LVL_2	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
+#  define NUM_RCU_LVL_3	  (NR_CPUS)
 #  define NUM_RCU_LVL_4	  0
-#elif NR_CPUS <= RCU_FANOUT_FOURTH
+#elif NR_CPUS <= RCU_FANOUT_4
 #  define NUM_RCU_LVLS	  4
 #  define NUM_RCU_LVL_0	  1
-#  define NUM_RCU_LVL_1	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_CUBE)
-#  define NUM_RCU_LVL_2	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_SQ)
-#  define NUM_RCU_LVL_3	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT)
-#  define NUM_RCU_LVL_4	  NR_CPUS
+#  define NUM_RCU_LVL_1	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3)
+#  define NUM_RCU_LVL_2	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
+#  define NUM_RCU_LVL_3	  DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
+#  define NUM_RCU_LVL_4	  (NR_CPUS)
 #else
 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
-#endif /* #if (NR_CPUS) <= RCU_FANOUT */
+#endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
 
 #define RCU_SUM (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3 + NUM_RCU_LVL_4)
 #define NUM_RCU_NODES (RCU_SUM - NR_CPUS)
···
 	long		qlen_last_fqs_check;
 					/* qlen at last check for QS forcing */
 	unsigned long	n_cbs_invoked;	/* count of RCU cbs invoked. */
-	unsigned long	n_cbs_orphaned;	/* RCU cbs sent to orphanage. */
-	unsigned long	n_cbs_adopted;	/* RCU cbs adopted from orphanage. */
+	unsigned long	n_cbs_orphaned;	/* RCU cbs orphaned by dying CPU */
+	unsigned long	n_cbs_adopted;	/* RCU cbs adopted from dying CPU */
 	unsigned long	n_force_qs_snap;
 					/* did other CPU force QS recently? */
 	long		blimit;		/* Upper limit on a processed batch */
···
 	/* End of fields guarded by root rcu_node's lock. */
 
 	raw_spinlock_t	onofflock;	/* exclude on/offline and */
-					/*  starting new GP.  Also */
-					/*  protects the following */
-					/*  orphan_cbs fields. */
-	struct rcu_head *orphan_cbs_list; /* list of rcu_head structs */
-					/*  orphaned by all CPUs in */
-					/*  a given leaf rcu_node */
-					/*  going offline. */
-	struct rcu_head **orphan_cbs_tail; /* And tail pointer. */
-	long		orphan_qlen;	/* Number of orphaned cbs. */
+					/*  starting new GP. */
 	raw_spinlock_t	fqslock;	/* Only one task forcing */
 					/*  quiescent states. */
 	unsigned long	jiffies_force_qs; /* Time at which to invoke */
···
 static int rcu_preempt_pending(int cpu);
 static int rcu_preempt_needs_cpu(int cpu);
 static void __cpuinit rcu_preempt_init_percpu_data(int cpu);
-static void rcu_preempt_send_cbs_to_orphanage(void);
+static void rcu_preempt_send_cbs_to_online(void);
 static void __init __rcu_init_preempt(void);
 static void rcu_needs_cpu_flush(void);
kernel/rcutree_plugin.h (+131, -4)
···
  */
 
 #include <linux/delay.h>
+#include <linux/stop_machine.h>
 
 /*
  * Check the RCU kernel configuration parameters and print informative
···
 }
 
 /*
- * Move preemptable RCU's callbacks to ->orphan_cbs_list.
+ * Move preemptable RCU's callbacks from dying CPU to other online CPU.
  */
-static void rcu_preempt_send_cbs_to_orphanage(void)
+static void rcu_preempt_send_cbs_to_online(void)
 {
-	rcu_send_cbs_to_orphanage(&rcu_preempt_state);
+	rcu_send_cbs_to_online(&rcu_preempt_state);
 }
 
 /*
···
 /*
  * Because there is no preemptable RCU, there are no callbacks to move.
  */
-static void rcu_preempt_send_cbs_to_orphanage(void)
+static void rcu_preempt_send_cbs_to_online(void)
 {
 }
 
···
 }
 
 #endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
+
+#ifndef CONFIG_SMP
+
+void synchronize_sched_expedited(void)
+{
+	cond_resched();
+}
+EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
+
+#else /* #ifndef CONFIG_SMP */
+
+static atomic_t sync_sched_expedited_started = ATOMIC_INIT(0);
+static atomic_t sync_sched_expedited_done = ATOMIC_INIT(0);
+
+static int synchronize_sched_expedited_cpu_stop(void *data)
+{
+	/*
+	 * There must be a full memory barrier on each affected CPU
+	 * between the time that try_stop_cpus() is called and the
+	 * time that it returns.
+	 *
+	 * In the current initial implementation of cpu_stop, the
+	 * above condition is already met when the control reaches
+	 * this point and the following smp_mb() is not strictly
+	 * necessary.  Do smp_mb() anyway for documentation and
+	 * robustness against future implementation changes.
+	 */
+	smp_mb(); /* See above comment block. */
+	return 0;
+}
+
+/*
+ * Wait for an rcu-sched grace period to elapse, but use "big hammer"
+ * approach to force grace period to end quickly.  This consumes
+ * significant time on all CPUs, and is thus not recommended for
+ * any sort of common-case code.
+ *
+ * Note that it is illegal to call this function while holding any
+ * lock that is acquired by a CPU-hotplug notifier.  Failing to
+ * observe this restriction will result in deadlock.
+ *
+ * This implementation can be thought of as an application of ticket
+ * locking to RCU, with sync_sched_expedited_started and
+ * sync_sched_expedited_done taking on the roles of the halves
+ * of the ticket-lock word.  Each task atomically increments
+ * sync_sched_expedited_started upon entry, snapshotting the old value,
+ * then attempts to stop all the CPUs.  If this succeeds, then each
+ * CPU will have executed a context switch, resulting in an RCU-sched
+ * grace period.  We are then done, so we use atomic_cmpxchg() to
+ * update sync_sched_expedited_done to match our snapshot -- but
+ * only if someone else has not already advanced past our snapshot.
+ *
+ * On the other hand, if try_stop_cpus() fails, we check the value
+ * of sync_sched_expedited_done.  If it has advanced past our
+ * initial snapshot, then someone else must have forced a grace period
+ * some time after we took our snapshot.  In this case, our work is
+ * done for us, and we can simply return.  Otherwise, we try again,
+ * but keep our initial snapshot for purposes of checking for someone
+ * doing our work for us.
+ *
+ * If we fail too many times in a row, we fall back to synchronize_sched().
+ */
+void synchronize_sched_expedited(void)
+{
+	int firstsnap, s, snap, trycount = 0;
+
+	/* Note that atomic_inc_return() implies full memory barrier. */
+	firstsnap = snap = atomic_inc_return(&sync_sched_expedited_started);
+	get_online_cpus();
+
+	/*
+	 * Each pass through the following loop attempts to force a
+	 * context switch on each CPU.
+	 */
+	while (try_stop_cpus(cpu_online_mask,
+			     synchronize_sched_expedited_cpu_stop,
+			     NULL) == -EAGAIN) {
+		put_online_cpus();
+
+		/* No joy, try again later.  Or just synchronize_sched(). */
+		if (trycount++ < 10)
+			udelay(trycount * num_online_cpus());
+		else {
+			synchronize_sched();
+			return;
+		}
+
+		/* Check to see if someone else did our work for us. */
+		s = atomic_read(&sync_sched_expedited_done);
+		if (UINT_CMP_GE((unsigned)s, (unsigned)firstsnap)) {
+			smp_mb(); /* ensure test happens before caller kfree */
+			return;
+		}
+
+		/*
+		 * Refetching sync_sched_expedited_started allows later
+		 * callers to piggyback on our grace period.  We subtract
+		 * 1 to get the same token that the last incrementer got.
+		 * We retry after they started, so our grace period works
+		 * for them, and they started after our first try, so their
+		 * grace period works for us.
+		 */
+		get_online_cpus();
+		snap = atomic_read(&sync_sched_expedited_started) - 1;
+		smp_mb(); /* ensure read is before try_stop_cpus(). */
+	}
+
+	/*
+	 * Everyone up to our most recent fetch is covered by our grace
+	 * period.  Update the counter, but only if our work is still
+	 * relevant -- which it won't be if someone who started later
+	 * than we did beat us to the punch.
+	 */
+	do {
+		s = atomic_read(&sync_sched_expedited_done);
+		if (UINT_CMP_GE((unsigned)s, (unsigned)snap)) {
+			smp_mb(); /* ensure test happens before caller kfree */
+			break;
+		}
+	} while (atomic_cmpxchg(&sync_sched_expedited_done, s, snap) != s);
+
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
+
+#endif /* #else #ifndef CONFIG_SMP */
 
 #if !defined(CONFIG_RCU_FAST_NO_HZ)
···
 };
 #endif	/* CONFIG_CGROUP_CPUACCT */
 
-#ifndef CONFIG_SMP
-
-void synchronize_sched_expedited(void)
-{
-	barrier();
-}
-EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
-
-#else /* #ifndef CONFIG_SMP */
-
-static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
-
-static int synchronize_sched_expedited_cpu_stop(void *data)
-{
-	/*
-	 * There must be a full memory barrier on each affected CPU
-	 * between the time that try_stop_cpus() is called and the
-	 * time that it returns.
-	 *
-	 * In the current initial implementation of cpu_stop, the
-	 * above condition is already met when the control reaches
-	 * this point and the following smp_mb() is not strictly
-	 * necessary.  Do smp_mb() anyway for documentation and
-	 * robustness against future implementation changes.
-	 */
-	smp_mb(); /* See above comment block. */
-	return 0;
-}
-
-/*
- * Wait for an rcu-sched grace period to elapse, but use "big hammer"
- * approach to force grace period to end quickly.  This consumes
- * significant time on all CPUs, and is thus not recommended for
- * any sort of common-case code.
- *
- * Note that it is illegal to call this function while holding any
- * lock that is acquired by a CPU-hotplug notifier.  Failing to
- * observe this restriction will result in deadlock.
- */
-void synchronize_sched_expedited(void)
-{
-	int snap, trycount = 0;
-
-	smp_mb();  /* ensure prior mod happens before capturing snap. */
-	snap = atomic_read(&synchronize_sched_expedited_count) + 1;
-	get_online_cpus();
-	while (try_stop_cpus(cpu_online_mask,
-			     synchronize_sched_expedited_cpu_stop,
-			     NULL) == -EAGAIN) {
-		put_online_cpus();
-		if (trycount++ < 10)
-			udelay(trycount * num_online_cpus());
-		else {
-			synchronize_sched();
-			return;
-		}
-		if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
-			smp_mb(); /* ensure test happens before caller kfree */
-			return;
-		}
-		get_online_cpus();
-	}
-	atomic_inc(&synchronize_sched_expedited_count);
-	smp_mb__after_atomic_inc();  /* ensure post-GP actions seen after GP. */
-	put_online_cpus();
-}
-EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
-
-#endif /* #else #ifndef CONFIG_SMP */
kernel/srcu.c (+7, -1)
···
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
 #include <linux/smp.h>
+#include <linux/delay.h>
 #include <linux/srcu.h>
 
 static int init_srcu_struct_fields(struct srcu_struct *sp)
···
 	 * all srcu_read_lock() calls using the old counters have completed.
 	 * Their corresponding critical sections might well be still
 	 * executing, but the srcu_read_lock() primitives themselves
-	 * will have finished executing.
+	 * will have finished executing.  We initially give readers
+	 * an arbitrarily chosen 10 microseconds to get out of their
+	 * SRCU read-side critical sections, then loop waiting 1/HZ
+	 * seconds per iteration.
 	 */
 
+	if (srcu_readers_active_idx(sp, idx))
+		udelay(CONFIG_SRCU_SYNCHRONIZE_DELAY);
 	while (srcu_readers_active_idx(sp, idx))
 		schedule_timeout_interruptible(1);