.. _whatisrcu_doc:

What is RCU? -- "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1. What is RCU, Fundamentally? https://lwn.net/Articles/262464/
| 2. What is RCU? Part 2: Usage https://lwn.net/Articles/263130/
| 3. RCU part 3: the RCU API https://lwn.net/Articles/264090/
| 4. The RCU API, 2010 Edition https://lwn.net/Articles/418853/
|    2010 Big API Table https://lwn.net/Articles/419086/
| 5. The RCU API, 2014 Edition https://lwn.net/Articles/609904/
|    2014 Big API Table https://lwn.net/Articles/609973/
| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
|    2019 Big API Table https://lwn.net/Articles/777165/
| 7. The RCU API, 2024 Edition https://lwn.net/Articles/988638/
|    2024 Background Information https://lwn.net/Articles/988641/
|    2024 Big API Table https://lwn.net/Articles/988666/

For those preferring video:

| 1. Unraveling RCU Mysteries: Fundamentals https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
| 2. Unraveling RCU Mysteries: Additional Use Cases https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases


What is RCU?

RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations. Although RCU is actually quite simple, making effective use
of it requires you to think differently about your code. Another part
of the problem is the mistaken assumption that there is "one true way" to
describe and to use RCU. Instead, the experience has been that different
people must take different paths to arrive at an understanding of RCU,
depending on their experiences and use cases. This document provides
several different paths, as follows:

:ref:`1. RCU OVERVIEW <1_whatisRCU>`

:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7. ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`

:ref:`8. FULL LIST OF RCU APIs <8_whatisRCU>`

:ref:`9. ANSWERS TO QUICK QUIZZES <9_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point. People who prefer to start with an API that they can then
experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
focus on Sections 6 and 7. Section 8 serves as an index to the docbook
API documentation, and Section 9 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway. ;-)

.. _1_whatisRCU:

1. RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases. The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than
a partially updated reference. The reclamation phase does the work of
reclaiming (e.g., freeing) the data items removed from the data structure
during the removal phase. Because reclaiming data items can disrupt any
readers concurrently referencing those data items, the reclamation phase
must not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish. Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a. Remove pointers to a data structure, so that subsequent
   readers cannot gain a reference to it.

b. Wait for all previous readers to complete their RCU read-side
   critical sections.

c. At this point, there cannot be any readers who hold references
   to the data structure, so it now may safely be reclaimed
   (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all. In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers. In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers. Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache). Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2. WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a. rcu_read_lock()
b. rcu_read_unlock()
c. synchronize_rcu() / call_rcu()
d. rcu_assign_pointer()
e. rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the other 18 will be
enumerated later. See the kernel docbook documentation for more info,
or look directly at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
	void rcu_read_lock(void);

	This temporal primitive is used by a reader to inform the
	reclaimer that the reader is entering an RCU read-side critical
	section. It is illegal to block while in an RCU read-side
	critical section, though kernels built with CONFIG_PREEMPT_RCU
	can preempt RCU read-side critical sections. Any RCU-protected
	data structure accessed during an RCU read-side critical section
	is guaranteed to remain unreclaimed for the full duration of that
	critical section. Reference counts may be used in conjunction
	with RCU to maintain longer-term references to data structures.

	Note that anything that disables bottom halves, preemption,
	or interrupts also enters an RCU read-side critical section.
	Acquiring a spinlock also enters an RCU read-side critical
	section, even for spinlocks that do not disable preemption,
	as is the case in kernels built with CONFIG_PREEMPT_RT=y.
	Sleeplocks do *not* enter RCU read-side critical sections.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
	void rcu_read_unlock(void);

	This temporal primitive is used by a reader to inform the
	reclaimer that the reader is exiting an RCU read-side critical
	section. Anything that enables bottom halves, preemption,
	or interrupts also exits an RCU read-side critical section.
	Releasing a spinlock also exits an RCU read-side critical section.

	Note that RCU read-side critical sections may be nested and/or
	overlapping.

synchronize_rcu()
^^^^^^^^^^^^^^^^^
	void synchronize_rcu(void);

	This temporal primitive marks the end of updater code and the
	beginning of reclaimer code. It does this by blocking until
	all pre-existing RCU read-side critical sections on all CPUs
	have completed. Note that synchronize_rcu() will **not**
	necessarily wait for any subsequent RCU read-side critical
	sections to complete. For example, consider the following
	sequence of events::

	         CPU 0                 CPU 1                     CPU 2
	     -----------------     -------------------------  ---------------
	     1.  rcu_read_lock()
	     2.                    enters synchronize_rcu()
	     3.                                                rcu_read_lock()
	     4.  rcu_read_unlock()
	     5.                    exits synchronize_rcu()
	     6.                                                rcu_read_unlock()

	To reiterate, synchronize_rcu() waits only for ongoing RCU
	read-side critical sections to complete, not necessarily for
	any that begin after synchronize_rcu() is invoked.

	Of course, synchronize_rcu() does not necessarily return
	**immediately** after the last pre-existing RCU read-side critical
	section completes. For one thing, there might well be scheduling
	delays. For another thing, many RCU implementations process
	requests in batches in order to improve efficiencies, which can
	further delay synchronize_rcu().

	Since synchronize_rcu() is the API that must figure out when
	readers are done, its implementation is key to RCU. For RCU
	to be useful in all but the most read-intensive situations,
	synchronize_rcu()'s overhead must also be quite small.

	The call_rcu() API is an asynchronous callback form of
	synchronize_rcu(), and is described in more detail in a later
	section. Instead of blocking, it registers a function and
	argument which are invoked after all ongoing RCU read-side
	critical sections have completed. This callback variant is
	particularly useful in situations where it is illegal to block
	or where update-side performance is critically important.

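The relationship runs in the other direction as well: a blocking
grace-period wait can be layered on top of the callback API, which is
roughly how implementations express synchronize_rcu() in terms of
call_rcu(). The userspace sketch below shows only that layering; every
name carrying a _toy suffix is a hypothetical stand-in, and
call_rcu_toy() invokes its callback immediately, which is valid here
only because this sketch has no readers to wait for:

```c
/* Pared-down stand-in for the kernel's struct rcu_head. */
struct rcu_head_toy {
	void (*func)(struct rcu_head_toy *head);
};

static int toy_grace_periods;

/* A real call_rcu() queues the callback and invokes it only after a
 * grace period elapses; this stand-in invokes it immediately, since
 * there are no readers here to wait for. */
static void call_rcu_toy(struct rcu_head_toy *head,
			 void (*func)(struct rcu_head_toy *head))
{
	head->func = func;
	head->func(head);	/* pretend the grace period has elapsed */
}

/* Blocking wait layered on the callback API: register a callback whose
 * only job is to signal completion, then wait for that signal. */
struct rcu_synchronize_toy {
	struct rcu_head_toy head;	/* must be first: see wakeme_toy() */
	volatile int done;
};

static void wakeme_toy(struct rcu_head_toy *head)
{
	struct rcu_synchronize_toy *s = (struct rcu_synchronize_toy *)head;

	s->done = 1;
	toy_grace_periods++;
}

static void synchronize_rcu_toy(void)
{
	struct rcu_synchronize_toy s = { .done = 0 };

	call_rcu_toy(&s.head, wakeme_toy);
	while (!s.done)
		;	/* a real version would sleep on a completion */
}
```

The cast in wakeme_toy() is legal because the rcu_head_toy is the first
member of the enclosing structure; a less fragile version would use
container_of(), described in Section 4.
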
	However, the call_rcu() API should not be used lightly, as use
	of the synchronize_rcu() API generally results in simpler code.
	In addition, the synchronize_rcu() API has the nice property
	of automatically limiting update rate should grace periods
	be delayed. This property results in system resilience in the
	face of denial-of-service attacks. Code using call_rcu() should
	limit update rate in order to gain this same sort of resilience.
	See checklist.rst for some approaches to limiting the update rate.

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
	void rcu_assign_pointer(p, typeof(p) v);

	Yes, rcu_assign_pointer() **is** implemented as a macro, though
	it would be cool to be able to declare a function in this manner.
	(And there has been some discussion of adding overloaded functions
	to the C language, so who knows?)

	The updater uses this spatial macro to assign a new value to an
	RCU-protected pointer, in order to safely communicate the change
	in value from the updater to the reader. This is a spatial (as
	opposed to temporal) macro. It does not evaluate to an rvalue,
	but it does provide any compiler directives and memory-barrier
	instructions required for a given compiler or CPU architecture.
	Its ordering properties are that of a store-release operation,
	that is, any prior loads and stores required to initialize the
	structure are ordered before the store that publishes the pointer
	to that structure.

	Perhaps just as important, rcu_assign_pointer() serves to document
	(1) which pointers are protected by RCU and (2) the point at which
	a given structure becomes accessible to other CPUs. That said,
	rcu_assign_pointer() is most frequently used indirectly, via
	the _rcu list-manipulation primitives such as list_add_rcu().

rcu_dereference()
^^^^^^^^^^^^^^^^^
	typeof(p) rcu_dereference(p);

	Like rcu_assign_pointer(), rcu_dereference() must be implemented
	as a macro.

	The reader uses the spatial rcu_dereference() macro to fetch
	an RCU-protected pointer, which returns a value that may
	then be safely dereferenced. Note that rcu_dereference()
	does not actually dereference the pointer; instead, it
	protects the pointer for later dereferencing. It also
	executes any needed memory-barrier instructions for a given
	CPU architecture. Currently, only Alpha needs memory barriers
	within rcu_dereference() -- on other CPUs, it compiles to a
	volatile load. However, no mainstream C compilers respect
	address dependencies, so rcu_dereference() uses volatile casts,
	which, in combination with the coding guidelines listed in
	rcu_dereference.rst, prevent current compilers from breaking
	these dependencies.

	Common coding practice uses rcu_dereference() to copy an
	RCU-protected pointer to a local variable, then dereferences
	this local variable, for example as follows::

		p = rcu_dereference(head.next);
		return p->data;

	However, in this case, one could just as easily combine these
	into one statement::

		return rcu_dereference(head.next)->data;

	If you are going to be fetching multiple fields from the
	RCU-protected structure, using the local variable is of
	course preferred. Repeated rcu_dereference() calls look
	ugly, do not guarantee that the same pointer will be returned
	if an update happened while in the critical section, and incur
	unnecessary overhead on Alpha CPUs.

	Note that the value returned by rcu_dereference() is valid
	only within the enclosing RCU read-side critical section [1]_.
	For example, the following is **not** legal::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		rcu_read_unlock();
		x = p->address;	/* BUG!!! */
		rcu_read_lock();
		y = p->data;	/* BUG!!! */
		rcu_read_unlock();

	Holding a reference from one RCU read-side critical section
	to another is just as illegal as holding a reference from
	one lock-based critical section to another! Similarly,
	using a reference outside of the critical section in which
	it was acquired is just as illegal as doing so with normal
	locking.

	As with rcu_assign_pointer(), an important function of
	rcu_dereference() is to document which pointers are protected by
	RCU, in particular, flagging a pointer that is subject to changing
	at any time, including immediately after the rcu_dereference().
	And, again like rcu_assign_pointer(), rcu_dereference() is
	typically used indirectly, via the _rcu list-manipulation
	primitives, such as list_for_each_entry_rcu() [2]_.

.. [1] The variant rcu_dereference_protected() can be used outside
       of an RCU read-side critical section as long as the usage is
       protected by locks acquired by the update-side code. This variant
       avoids the lockdep warning that would happen when using (for
       example) rcu_dereference() without rcu_read_lock() protection.
       Using rcu_dereference_protected() also has the advantage
       of permitting compiler optimizations that rcu_dereference()
       must prohibit. The rcu_dereference_protected() variant takes
       a lockdep expression to indicate which locks must be acquired
       by the caller. If the indicated protection is not provided,
       a lockdep splat is emitted. See Design/Requirements/Requirements.rst
       and the API's code comments for more details and example usage.

.. [2] If the list_for_each_entry_rcu() instance might be used by
       update-side code as well as by RCU readers, then an additional
       lockdep expression can be added to its list of arguments.
       For example, given an additional "lockdep_is_held(&mylock)"
       argument, the RCU lockdep code would complain only if this
       instance was invoked outside of an RCU read-side critical
       section and without the protection of mylock.

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


	     rcu_assign_pointer()
	                             +--------+
	     +---------------------->| reader |---------+
	     |                       +--------+         |
	     |                            |             |
	     |                            |             | Protect:
	     |                            |             |   rcu_read_lock()
	     |                            |             |   rcu_read_unlock()
	     |      rcu_dereference()     |             |
	+---------+                       |             |
	| updater |<----------------------+             |
	+---------+                                     V
	     |                                    +-----------+
	     +----------------------------------->| reclaimer |
	                                          +-----------+
	                                            Defer:
	                                            synchronize_rcu() & call_rcu()


The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked. Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
The rcu_assign_pointer() and rcu_dereference() invocations communicate
spatial changes via stores to and loads from the RCU-protected pointer in
question.

There are at least three flavors of RCU usage in the Linux kernel. The
diagram above shows the most common one. On the updater side, the
rcu_assign_pointer(), synchronize_rcu() and call_rcu() primitives used
are the same for all three flavors. However, for protection (on the
reader side), the primitives used vary depending on the flavor:

a. rcu_read_lock() / rcu_read_unlock()
   rcu_dereference()

b. rcu_read_lock_bh() / rcu_read_unlock_bh()
   local_bh_disable() / local_bh_enable()
   rcu_dereference_bh()

c. rcu_read_lock_sched() / rcu_read_unlock_sched()
   preempt_disable() / preempt_enable()
   local_irq_save() / local_irq_restore()
   hardirq enter / hardirq exit
   NMI enter / NMI exit
   rcu_dereference_sched()

These three flavors are used as follows:

a. RCU applied to normal data structures.

b. RCU applied to networking data structures that may be subjected
   to remote denial-of-service attacks.

c. RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a). The (b) and (c) cases are important
for specialized uses, but are relatively uncommon. The SRCU, RCU-Tasks,
RCU-Tasks-Rude, and RCU-Tasks-Trace flavors have similar relationships
among their assorted primitives.

.. _3_whatisRCU:

3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure. More-typical
uses of RCU may be found in listRCU.rst and NMI-RCU.rst.
::

	struct foo {
		int a;
		char b;
		long c;
	};
	DEFINE_SPINLOCK(foo_mutex);

	struct foo __rcu *gbl_foo;

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a". Points gbl_foo to the new structure, and
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses synchronize_rcu() to ensure that any readers that might
	 * have references to the old structure complete before freeing
	 * the old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		synchronize_rcu();
		kfree(old_fp);
	}

	/*
	 * Return the value of field "a" of the current gbl_foo
	 * structure. Use rcu_read_lock() and rcu_read_unlock()
	 * to ensure that the structure does not get deleted out
	 * from under us, and use rcu_dereference() to ensure that
	 * we see the initialized version of the structure (important
	 * for DEC Alpha and for people reading the code).
	 */
	int foo_get_a(void)
	{
		int retval;

		rcu_read_lock();
		retval = rcu_dereference(gbl_foo)->a;
		rcu_read_unlock();
		return retval;
	}

So, to sum up:

- Use rcu_read_lock() and rcu_read_unlock() to guard RCU
  read-side critical sections.

- Within an RCU read-side critical section, use rcu_dereference()
  to dereference RCU-protected pointers.

- Use some solid design (such as locks or semaphores) to
  keep concurrent updates from interfering with each other.

- Use rcu_assign_pointer() to update an RCU-protected pointer.
  This primitive protects concurrent readers from the updater,
  **not** concurrent updates from each other! You therefore still
  need to use locking (or something similar) to keep concurrent
  rcu_assign_pointer() primitives from interfering with each other.

- Use synchronize_rcu() **after** removing a data element from an
  RCU-protected data structure, but **before** reclaiming/freeing
  the data element, in order to wait for the completion of all
  RCU read-side critical sections that might be referencing that
  data item.

See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst
and NMI-RCU.rst.

.. _4_whatisRCU:

4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------

In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

	void call_rcu(struct rcu_head *head, rcu_callback_t func);

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block. The foo struct needs to
have an rcu_head structure added, perhaps as follows::

	struct foo {
		int a;
		char b;
		long c;
		struct rcu_head rcu;
	};

The foo_update_a() function might then be written as follows::

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a". Points gbl_foo to the new structure, and
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses call_rcu() to ensure that any readers that might have
	 * references to the old structure complete before freeing the
	 * old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		call_rcu(&old_fp->rcu, foo_reclaim);
	}

The foo_reclaim() function might appear as follows::

	void foo_reclaim(struct rcu_head *rp)
	{
		struct foo *fp = container_of(rp, struct foo, rcu);

		foo_cleanup(fp->a);

		kfree(fp);
	}

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.
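
A userspace rendition makes this pointer arithmetic concrete. The
container_of() below is a pared-down version of the kernel macro (the
kernel's adds type checking), and struct rcu_head here is a local
stand-in defined only so the example compiles on its own:

```c
#include <stddef.h>

/* Pared-down userspace rendition of the kernel's container_of():
 * step back from a pointer to a member to the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Local stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	void (*func)(struct rcu_head *head);
};

struct foo {
	int a;
	char b;
	long c;
	struct rcu_head rcu;
};

/* Given only &fp->rcu, recover fp itself, as foo_reclaim() does. */
static struct foo *foo_from_rcu(struct rcu_head *rp)
{
	return container_of(rp, struct foo, rcu);
}
```

Subtracting offsetof(struct foo, rcu) from the rcu_head pointer lands on
the start of the enclosing struct foo, which is why the callback can be
handed an embedded rcu_head yet still kfree() the whole structure.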

The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element. It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

- Use call_rcu() **after** removing a data element from an
  RCU-protected data structure in order to register a callback
  function that will be invoked after the completion of all RCU
  read-side critical sections that might be referencing that
  data item.

If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

	kfree_rcu(old_fp, rcu);

If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo::

	kfree_rcu_mightsleep(old_fp);

This variant almost never blocks, but might do so by invoking
synchronize_rcu() in response to memory-allocation failure.

Again, see checklist.rst for additional rules governing the use of RCU.
.. _5_whatisRCU:

5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------

One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the
production-quality implementations in the Linux kernel. This section
presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU. Both are way too simple for real-world use,
lacking both functionality and performance. However, they are useful
in getting a feel for how RCU works. See kernel/rcu/update.c for a
production-quality implementation, and see:

	https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit

for papers describing the Linux kernel RCU implementation. The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation as of early 2004.


5A. "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
familiar locking primitives. Its overhead makes it a non-starter for
real-life use, as does its lack of scalability. It is also unsuitable
for realtime use, since it allows scheduling latency to "bleed" from
one read-side critical section to another. It also assumes recursive
reader-writer locks: If you try this with non-recursive locks, and
you allow nested rcu_read_lock() calls, you can deadlock.

However, it is probably the easiest implementation to relate to, so it
is a good starting point.

It is extremely simple::

	static DEFINE_RWLOCK(rcu_gp_mutex);

	void rcu_read_lock(void)
	{
		read_lock(&rcu_gp_mutex);
	}

	void rcu_read_unlock(void)
	{
		read_unlock(&rcu_gp_mutex);
	}

	void synchronize_rcu(void)
	{
		write_lock(&rcu_gp_mutex);
		smp_mb__after_spinlock();
		write_unlock(&rcu_gp_mutex);
	}

[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much. But here are simplified versions anyway. And whatever you do,
don't forget about them when submitting patches making use of RCU!]::

	#define rcu_assign_pointer(p, v) \
	({ \
		smp_store_release(&(p), (v)); \
	})

	#define rcu_dereference(p) \
	({ \
		typeof(p) _________p1 = READ_ONCE(p); \
		(_________p1); \
	})


The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
and release a global reader-writer lock. The synchronize_rcu()
primitive write-acquires this same lock, then releases it. This means
that once synchronize_rcu() exits, all RCU read-side critical sections
that were in progress before synchronize_rcu() was called are guaranteed
to have completed -- there is no way that synchronize_rcu() would have
been able to write-acquire the lock otherwise. The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:

	Design/Requirements/Requirements.rst

It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired. Note also that rcu_read_lock() is immune
from deadlock (an important property of RCU). The reason for this is
that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle.

.. _quiz_1:

Quick Quiz #1:
	Why is this argument naive? How could a deadlock
	occur when using this algorithm in a real-world Linux
	kernel? How could this deadlock be avoided?

:ref:`Answers to Quick Quiz <9_whatisRCU>`
728
7295B. "TOY" EXAMPLE #2: CLASSIC RCU
730^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
731This section presents a "toy" RCU implementation that is based on
732"classic RCU". It is also short on performance (but only for updates) and
733on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
734kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
735are the same as those shown in the preceding section, so they are omitted.
736::
737
738 void rcu_read_lock(void) { }
739
740 void rcu_read_unlock(void) { }
741
742 void synchronize_rcu(void)
743 {
744 int cpu;
745
746 for_each_possible_cpu(cpu)
747 run_on(cpu);
748 }
749
750Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
751This is the great strength of classic RCU in a non-preemptive kernel:
752read-side overhead is precisely zero, at least on non-Alpha CPUs.
753And there is absolutely no way that rcu_read_lock() can possibly
754participate in a deadlock cycle!
755
756The implementation of synchronize_rcu() simply schedules itself on each
757CPU in turn. The run_on() primitive can be implemented straightforwardly
758in terms of the sched_setaffinity() primitive. Of course, a somewhat less
759"toy" implementation would restore the affinity upon completion rather
760than just leaving all tasks running on the last CPU, but when I said
761"toy", I meant **toy**!

So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section. Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
	Give an example where Classic RCU's read-side
	overhead is **negative**.

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
	If it is illegal to block in an RCU read-side
	critical section, what the heck do you do in
	CONFIG_PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _6_whatisRCU:

6. ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking. The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

	@@ -5,5 +5,5 @@ struct el {
		int data;
		/* Other data fields */
	};
	-rwlock_t listmutex;
	+spinlock_t listmutex;
	struct el head;

	@@ -13,15 +14,15 @@
		struct list_head *lp;
		struct el *p;

	-	read_lock(&listmutex);
	-	list_for_each_entry(p, head, lp) {
	+	rcu_read_lock();
	+	list_for_each_entry_rcu(p, head, lp) {
			if (p->key == key) {
				*result = p->data;
	-			read_unlock(&listmutex);
	+			rcu_read_unlock();
				return 1;
			}
		}
	-	read_unlock(&listmutex);
	+	rcu_read_unlock();
		return 0;
	}

	@@ -29,15 +30,16 @@
	{
		struct el *p;

	-	write_lock(&listmutex);
	+	spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
	-			list_del(&p->list);
	-			write_unlock(&listmutex);
	+			list_del_rcu(&p->list);
	+			spin_unlock(&listmutex);
	+			synchronize_rcu();
				kfree(p);
				return 1;
			}
		}
	-	write_unlock(&listmutex);
	+	spin_unlock(&listmutex);
		return 0;
	}

Or, for those who prefer a side-by-side listing::

	 1 struct el {                          1 struct el {
	 2   struct list_head list;             2   struct list_head list;
	 3   long key;                          3   long key;
	 4   spinlock_t mutex;                  4   spinlock_t mutex;
	 5   int data;                          5   int data;
	 6   /* Other data fields */            6   /* Other data fields */
	 7 };                                   7 };
	 8 rwlock_t listmutex;                  8 spinlock_t listmutex;
	 9 struct el head;                      9 struct el head;

::

	 1 int search(long key, int *result)    1 int search(long key, int *result)
	 2 {                                    2 {
	 3   struct list_head *lp;              3   struct list_head *lp;
	 4   struct el *p;                      4   struct el *p;
	 5                                      5
	 6   read_lock(&listmutex);             6   rcu_read_lock();
	 7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
	 8     if (p->key == key) {             8     if (p->key == key) {
	 9       *result = p->data;             9       *result = p->data;
	10       read_unlock(&listmutex);      10       rcu_read_unlock();
	11       return 1;                     11       return 1;
	12     }                               12     }
	13   }                                 13   }
	14   read_unlock(&listmutex);          14   rcu_read_unlock();
	15   return 0;                         15   return 0;
	16 }                                   16 }

::

	 1 int delete(long key)                 1 int delete(long key)
	 2 {                                    2 {
	 3   struct el *p;                      3   struct el *p;
	 4                                      4
	 5   write_lock(&listmutex);            5   spin_lock(&listmutex);
	 6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
	 7     if (p->key == key) {             7     if (p->key == key) {
	 8       list_del(&p->list);            8       list_del_rcu(&p->list);
	 9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
	                                       10       synchronize_rcu();
	10       kfree(p);                     11       kfree(p);
	11       return 1;                     12       return 1;
	12     }                               13     }
	13   }                                 14   }
	14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
	15   return 0;                         16   return 0;
	16 }                                   17 }

Either way, the differences are quite small. Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently. In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as
a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block. If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), which can
be used in place of synchronize_rcu().

.. _7_whatisRCU:

7. ANALOGY WITH REFERENCE COUNTING
-----------------------------------

The reader-writer analogy (illustrated by the previous section) is not
always the best way to think about using RCU. Another helpful analogy
considers RCU an effective reference count on everything which is
protected by RCU.

A reference count typically does not prevent the referenced object's
values from changing, but does prevent changes to type -- particularly the
gross change of type that happens when that object's memory is freed and
re-allocated for some other purpose. Once a type-safe reference to the
object is obtained, some other mechanism is needed to ensure consistent
access to the data in the object. This could involve taking a spinlock,
but with RCU the typical approach is to perform reads with SMP-aware
operations such as smp_load_acquire(), to perform updates with atomic
read-modify-write operations, and to provide the necessary ordering.
RCU provides a number of support functions that embed the required
operations and ordering, such as the list_for_each_entry_rcu() macro
used in the previous section.

A more focused view of the reference counting behavior is that,
between rcu_read_lock() and rcu_read_unlock(), any reference taken with
rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
though a reference-count on that object has been temporarily increased.
This prevents the object from changing type. Exactly what this means
will depend on normal expectations of objects of that type, but it
typically includes that spinlocks can still be safely locked, normal
reference counters can be safely manipulated, and ``__rcu`` pointers
can be safely dereferenced.

Some operations that one might expect to see on an object for
which an RCU reference is held include:

 - Copying out data that is guaranteed to be stable by the object's type.
 - Using kref_get_unless_zero() or similar to get a longer-term
   reference. This may fail of course.
 - Acquiring a spinlock in the object, and checking if the object still
   is the expected object and if so, manipulating it freely.

The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type. In this case RCU doesn't even protect the identity of the
object from changing, only its type. So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquire a spinlock), allowing subsequent code
to check whether the identity matches expectations. It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe. Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
and refcount_set_release() APIs should be used to ensure correct operation
ordering when verifying object identity and when initializing newly
allocated objects. The acquire fence in refcount_{add|inc}_not_zero_acquire()
ensures that identity checks happen *after* the reference count is taken.
refcount_set_release() should be called after a newly allocated object is
fully initialized, and its release fence ensures that the new values are
visible *before* the refcount can be successfully taken by other users.
Once refcount_set_release() is called, the object should be considered
visible to other tasks.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
reference to an object is dropped. With kref, this is the function
passed to kref_put(). When RCU is being used, such finalization code
must not be run until all ``__rcu`` pointers referencing the object have
been updated, and then a grace period has passed. Every remaining
globally visible pointer to the object must be considered to be a
potential counted reference, and the finalization code is typically run
using call_rcu() only after all those pointers have been changed.

To see how to choose between these two analogies -- of RCU as a
reader-writer lock and RCU as a reference counting system -- it is useful
to reflect on the scale of the thing being protected. The reader-writer
lock analogy looks at larger multi-part objects such as a linked list
and shows how RCU can facilitate concurrency while elements are added
to, and removed from, the list. The reference-count analogy considers
the individual objects and how they can be accessed safely
within whatever whole they are a part of.

.. _8_whatisRCU:

8. FULL LIST OF RCU APIs
-------------------------

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook. Here is the list, by category.

RCU list traversal::

	list_entry_rcu
	list_entry_lockless
	list_first_entry_rcu
	list_first_or_null_rcu
	list_tail_rcu
	list_next_rcu
	list_next_or_null_rcu
	list_for_each_entry_rcu
	list_for_each_entry_continue_rcu
	list_for_each_entry_from_rcu
	list_for_each_entry_lockless
	hlist_first_rcu
	hlist_next_rcu
	hlist_pprev_rcu
	hlist_for_each_entry_rcu
	hlist_for_each_entry_rcu_notrace
	hlist_for_each_entry_rcu_bh
	hlist_for_each_entry_from_rcu
	hlist_for_each_entry_continue_rcu
	hlist_for_each_entry_continue_rcu_bh
	hlist_nulls_first_rcu
	hlist_nulls_next_rcu
	hlist_nulls_for_each_entry_rcu
	hlist_nulls_for_each_entry_safe
	hlist_bl_first_rcu
	hlist_bl_for_each_entry_rcu

RCU pointer/list update::

	rcu_assign_pointer
	rcu_replace_pointer
	INIT_LIST_HEAD_RCU
	list_add_rcu
	list_add_tail_rcu
	list_del_rcu
	list_replace_rcu
	list_splice_init_rcu
	list_splice_tail_init_rcu
	hlist_add_behind_rcu
	hlist_add_before_rcu
	hlist_add_head_rcu
	hlist_add_tail_rcu
	hlist_del_rcu
	hlist_del_init_rcu
	hlist_replace_rcu
	hlist_nulls_del_init_rcu
	hlist_nulls_del_rcu
	hlist_nulls_add_head_rcu
	hlist_nulls_add_tail_rcu
	hlist_nulls_add_fake
	hlists_swap_heads_rcu
	hlist_bl_add_head_rcu
	hlist_bl_del_rcu
	hlist_bl_set_first_rcu

RCU::

	Critical sections             Grace period                               Barrier

	rcu_read_lock                 synchronize_net                            rcu_barrier
	rcu_read_unlock               synchronize_rcu
	guard(rcu)()                  synchronize_rcu_expedited
	scoped_guard(rcu)             synchronize_rcu_mult
	rcu_dereference               call_rcu
	rcu_dereference_check         call_rcu_hurry
	rcu_dereference_protected     kfree_rcu
	rcu_read_lock_held            kvfree_rcu
	rcu_read_lock_any_held        kfree_rcu_mightsleep
	rcu_pointer_handoff           cond_synchronize_rcu
	unrcu_pointer                 cond_synchronize_rcu_full
	                              cond_synchronize_rcu_expedited
	                              cond_synchronize_rcu_expedited_full
	                              get_completed_synchronize_rcu
	                              get_completed_synchronize_rcu_full
	                              get_state_synchronize_rcu
	                              get_state_synchronize_rcu_full
	                              poll_state_synchronize_rcu
	                              poll_state_synchronize_rcu_full
	                              same_state_synchronize_rcu
	                              same_state_synchronize_rcu_full
	                              start_poll_synchronize_rcu
	                              start_poll_synchronize_rcu_full
	                              start_poll_synchronize_rcu_expedited
	                              start_poll_synchronize_rcu_expedited_full

bh::

	Critical sections                Grace period     Barrier

	rcu_read_lock_bh                 [Same as RCU]    [Same as RCU]
	rcu_read_unlock_bh
	[local_bh_disable]
	[and friends]
	rcu_dereference_bh
	rcu_dereference_bh_check
	rcu_dereference_bh_protected
	rcu_read_lock_bh_held

sched::

	Critical sections                Grace period     Barrier

	rcu_read_lock_sched              [Same as RCU]    [Same as RCU]
	rcu_read_unlock_sched
	[preempt_disable]
	[and friends]
	rcu_read_lock_sched_notrace
	rcu_read_unlock_sched_notrace
	rcu_dereference_sched
	rcu_dereference_sched_check
	rcu_dereference_sched_protected
	rcu_read_lock_sched_held


RCU: Initialization/cleanup/ordering::

	RCU_INIT_POINTER
	RCU_INITIALIZER
	RCU_POINTER_INITIALIZER
	init_rcu_head
	destroy_rcu_head
	init_rcu_head_on_stack
	destroy_rcu_head_on_stack
	SLAB_TYPESAFE_BY_RCU


RCU: Quiescent states and control::

	cond_resched_tasks_rcu_qs
	rcu_all_qs
	rcu_softirq_qs_periodic
	rcu_end_inkernel_boot
	rcu_expedite_gp
	rcu_gp_is_expedited
	rcu_unexpedite_gp
	rcu_cpu_stall_reset
	rcu_head_after_call_rcu
	rcu_is_watching


RCU-sync primitive::

	rcu_sync_is_idle
	rcu_sync_init
	rcu_sync_enter
	rcu_sync_exit
	rcu_sync_dtor


RCU-Tasks::

	Critical sections    Grace period             Barrier

	N/A                  call_rcu_tasks           rcu_barrier_tasks
	                     synchronize_rcu_tasks


RCU-Tasks-Rude::

	Critical sections    Grace period                  Barrier

	N/A                  synchronize_rcu_tasks_rude    rcu_barrier_tasks_rude
	                     call_rcu_tasks_rude


RCU-Tasks-Trace::

	Critical sections                Grace period                   Barrier

	rcu_read_lock_trace              call_rcu_tasks_trace           rcu_barrier_tasks_trace
	rcu_read_unlock_trace            synchronize_rcu_tasks_trace
	guard(rcu_tasks_trace)()
	scoped_guard(rcu_tasks_trace)


SRCU list traversal::

	list_for_each_entry_srcu
	hlist_for_each_entry_srcu


SRCU::

	Critical sections             Grace period                             Barrier

	srcu_read_lock                call_srcu                                srcu_barrier
	srcu_read_unlock              synchronize_srcu
	srcu_read_lock_fast           synchronize_srcu_expedited
	srcu_read_unlock_fast         get_state_synchronize_srcu
	srcu_read_lock_nmisafe        start_poll_synchronize_srcu
	srcu_read_unlock_nmisafe      start_poll_synchronize_srcu_expedited
	srcu_read_lock_notrace        poll_state_synchronize_srcu
	srcu_read_unlock_notrace
	srcu_down_read
	srcu_up_read
	srcu_down_read_fast
	srcu_up_read_fast
	guard(srcu)()
	scoped_guard(srcu)
	srcu_read_lock_held
	srcu_dereference
	srcu_dereference_check
	srcu_dereference_notrace


SRCU: Initialization/cleanup/ordering::

	DEFINE_SRCU
	DEFINE_STATIC_SRCU
	init_srcu_struct
	cleanup_srcu_struct
	smp_mb__after_srcu_read_unlock

All: lockdep-checked RCU utility APIs::

	RCU_LOCKDEP_WARN
	rcu_sleep_check

All: Unchecked RCU-protected pointer access::

	rcu_dereference_raw

All: Unchecked RCU-protected pointer access with dereferencing prohibited::

	rcu_access_pointer

See the comment headers in the source code (or the docbook generated
from them) for more information.

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use? The following
list can be helpful:

a. Will readers need to block? If so, you need SRCU.

b. Will readers need to block and are you doing tracing, for
   example, ftrace or BPF? If so, you need RCU-tasks,
   RCU-tasks-rude, and/or RCU-tasks-trace.

c. What about the -rt patchset? If readers would need to block in
   a non-rt kernel, you need SRCU. If readers would block when
   acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
   SRCU is not necessary. (The -rt patchset turns spinlocks into
   sleeplocks, hence this distinction.)

d. Do you need to treat NMI handlers, hardirq handlers,
   and code segments with preemption disabled (whether
   via preempt_disable(), local_irq_save(), local_bh_disable(),
   or some other mechanism) as if they were explicit RCU readers?
   If so, RCU-sched readers are the only choice that will work
   for you, but since about v4.20 you can use the vanilla RCU
   update primitives.

e. Do you need RCU grace periods to complete even in the face of
   softirq monopolization of one or more of the CPUs? For example,
   is your code subject to network-based denial-of-service attacks?
   If so, you should disable softirq across your readers, for
   example, by using rcu_read_lock_bh(). Since about v4.20 you
   can use the vanilla RCU update primitives.

f. Is your workload too update-intensive for normal use of
   RCU, but inappropriate for other synchronization mechanisms?
   If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
   named SLAB_DESTROY_BY_RCU). But please be careful!

g. Do you need read-side critical sections that are respected even
   on CPUs that are deep in the idle loop, during entry to or exit
   from user-mode execution, or on an offlined CPU? If so, SRCU
   and RCU Tasks Trace are the only choices that will work for you,
   with SRCU being strongly preferred in almost all cases.

h. Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.

.. _9_whatisRCU:

9. ANSWERS TO QUICK QUIZZES
----------------------------

Quick Quiz #1:
	Why is this argument naive? How could a deadlock
	occur when using this algorithm in a real-world Linux
	kernel? [Referring to the lock-based "toy" RCU
	algorithm.]

Answer:
	Consider the following sequence of events:

	1. CPU 0 acquires some unrelated lock, call it
	   "problematic_lock", disabling irq via
	   spin_lock_irqsave().

	2. CPU 1 enters synchronize_rcu(), write-acquiring
	   rcu_gp_mutex.

	3. CPU 0 enters rcu_read_lock(), but must wait
	   because CPU 1 holds rcu_gp_mutex.

	4. CPU 1 is interrupted, and the irq handler
	   attempts to acquire problematic_lock.

	The system is now deadlocked.

	One way to avoid this deadlock is to use an approach like
	that of CONFIG_PREEMPT_RT, where all normal spinlocks
	become blocking locks, and all irq handlers execute in
	the context of special tasks. In this case, in step 4
	above, the irq handler would block, allowing CPU 1 to
	release rcu_gp_mutex, avoiding the deadlock.

	Even in the absence of deadlock, this RCU implementation
	allows latency to "bleed" from readers to other
	readers through synchronize_rcu(). To see this,
	consider task A in an RCU read-side critical section
	(thus read-holding rcu_gp_mutex), task B blocked
	attempting to write-acquire rcu_gp_mutex, and
	task C blocked in rcu_read_lock() attempting to
	read-acquire rcu_gp_mutex. Task A's RCU read-side
	latency is holding up task C, albeit indirectly via
	task B.

	Realtime RCU implementations therefore use a counter-based
	approach where tasks in RCU read-side critical sections
	cannot be blocked by tasks executing synchronize_rcu().

:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
	Give an example where Classic RCU's read-side
	overhead is **negative**.

Answer:
	Imagine a single-CPU system with a non-CONFIG_PREEMPTION
	kernel where a routing table is used by process-context
	code, but can be updated by irq-context code (for example,
	by an "ICMP REDIRECT" packet). The usual way of handling
	this would be to have the process-context code disable
	interrupts while searching the routing table. Use of
	RCU allows such interrupt-disabling to be dispensed with.
	Thus, without RCU, you pay the cost of disabling interrupts,
	and with RCU you don't.

	One can argue that the overhead of RCU in this
	case is negative with respect to the single-CPU
	interrupt-disabling approach. Others might argue that
	the overhead of RCU is merely zero, and that replacing
	the positive overhead of the interrupt-disabling scheme
	with the zero-overhead RCU scheme does not constitute
	negative overhead.

	In real life, of course, things are more complex. But
	even the theoretical possibility of negative overhead for
	a synchronization primitive is a bit unexpected. ;-)

:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
	If it is illegal to block in an RCU read-side
	critical section, what the heck do you do in
	CONFIG_PREEMPT_RT, where normal spinlocks can block???

Answer:
	Just as CONFIG_PREEMPT_RT permits preemption of spinlock
	critical sections, it permits preemption of RCU
	read-side critical sections. It also permits
	spinlocks blocking while in RCU read-side critical
	sections.

	Why the apparent inconsistency? Because it is
	possible to use priority boosting to keep the RCU
	grace periods short if need be (for example, if running
	short of memory). In contrast, if blocking waiting
	for (say) network reception, there is no way to know
	what should be boosted. Especially given that the
	process we need to boost might well be a human being
	who just went out for a pizza or something. And although
	a computer-operated cattle prod might arouse serious
	interest, it might also provoke serious objections.
	Besides, how does the computer know what pizza parlor
	the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.