Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
"The RCU changes in this cycle were:
- Expedited grace-period updates
- kfree_rcu() updates
- RCU list updates
- Preemptible RCU updates
- Torture-test updates
- Miscellaneous fixes
- Documentation updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
rcu: Remove unused stop-machine #include
powerpc: Remove comment about read_barrier_depends()
.mailmap: Add entries for old paulmck@kernel.org addresses
srcu: Apply *_ONCE() to ->srcu_last_gp_end
rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
rcu: Remove the declaration of call_rcu() in tree.h
rcu: Fix tracepoint tracking RCU CPU kthread utilization
rcu: Fix harmless omission of "CONFIG_" from #if condition
rcu: Avoid tick_dep_set_cpu() misordering
rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
rcu: Clear ->rcu_read_unlock_special only once
rcu: Clear .exp_hint only when deferred quiescent state has been reported
rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
rcu: Remove kfree_call_rcu_nobatch()
rcu: Remove kfree_rcu() special casing and lazy-callback handling
rcu: Add support for debug_objects debugging for kfree_rcu()
rcu: Add multiple in-flight batches of kfree_rcu() work
...

+1475 -850
+4
.mailmap
···
 Patrick Mochel <mochel@digitalimplant.org>
 Paul Burton <paulburton@kernel.org> <paul.burton@imgtec.com>
 Paul Burton <paulburton@kernel.org> <paul.burton@mips.com>
+Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.ibm.com>
+Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.vnet.ibm.com>
+Paul E. McKenney <paulmck@kernel.org> <paul.mckenney@linaro.org>
+Paul E. McKenney <paulmck@kernel.org> <paulmck@us.ibm.com>
 Peter A Jonsson <pj@ludd.ltu.se>
 Peter Oruba <peter@oruba.de>
 Peter Oruba <peter.oruba@amd.com>
+28 -25
Documentation/RCU/NMI-RCU.txt → Documentation/RCU/NMI-RCU.rst
···
+.. _NMI_rcu_doc:
+
 Using RCU to Protect Dynamic NMI Handlers
+=========================================
 
 
 Although RCU is usually used to protect read-mostly data structures,
···
 "arch/x86/kernel/traps.c".
 
 The relevant pieces of code are listed below, each followed by a
-brief explanation.
+brief explanation::
 
 	static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
 	{
···
 
 The dummy_nmi_callback() function is a "dummy" NMI handler that does
 nothing, but returns zero, thus saying that it did nothing, allowing
-the NMI handler to take the default machine-specific action.
+the NMI handler to take the default machine-specific action::
 
 	static nmi_callback_t nmi_callback = dummy_nmi_callback;
 
 This nmi_callback variable is a global function pointer to the current
-NMI handler.
+NMI handler::
 
 	void do_nmi(struct pt_regs * regs, long error_code)
 	{
···
 for anyone attempting to do something similar on Alpha or on systems
 with aggressive optimizing compilers.
 
-Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
-given that the code referenced by the pointer is read-only?
+Quick Quiz:
+	Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
 
+:ref:`Answer to Quick Quiz <answer_quick_quiz_NMI>`
 
-Back to the discussion of NMI and RCU...
+Back to the discussion of NMI and RCU::
 
 	void set_nmi_callback(nmi_callback_t callback)
 	{
···
 data that is to be used by the callback must be initialized up -before-
 the call to set_nmi_callback(). On architectures that do not order
 writes, the rcu_assign_pointer() ensures that the NMI handler sees the
-initialized values.
+initialized values::
 
 	void unset_nmi_callback(void)
 	{
···
 of it completes on all other CPUs.
 
 One way to accomplish this is via synchronize_rcu(), perhaps as
-follows:
+follows::
 
 	unset_nmi_callback();
 	synchronize_rcu();
···
 Important note: for this to work, the architecture in question must
 invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.
 
+.. _answer_quick_quiz_NMI:
 
-Answer to Quick Quiz
+Answer to Quick Quiz:
+	Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
 
-	Why might the rcu_dereference_sched() be necessary on Alpha, given
-	that the code referenced by the pointer is read-only?
+	The caller to set_nmi_callback() might well have
+	initialized some data that is to be used by the new NMI
+	handler. In this case, the rcu_dereference_sched() would
+	be needed, because otherwise a CPU that received an NMI
+	just after the new handler was set might see the pointer
+	to the new NMI handler, but the old pre-initialized
+	version of the handler's data.
 
-Answer: The caller to set_nmi_callback() might well have
-	initialized some data that is to be used by the new NMI
-	handler. In this case, the rcu_dereference_sched() would
-	be needed, because otherwise a CPU that received an NMI
-	just after the new handler was set might see the pointer
-	to the new NMI handler, but the old pre-initialized
-	version of the handler's data.
+	This same sad story can happen on other CPUs when using
+	a compiler with aggressive pointer-value speculation
+	optimizations.
 
-	This same sad story can happen on other CPUs when using
-	a compiler with aggressive pointer-value speculation
-	optimizations.
-
-	More important, the rcu_dereference_sched() makes it
-	clear to someone reading the code that the pointer is
-	being protected by RCU-sched.
+	More important, the rcu_dereference_sched() makes it
+	clear to someone reading the code that the pointer is
+	being protected by RCU-sched.
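The set_nmi_callback()/do_nmi() pairing in this document hinges on rcu_assign_pointer() publishing the handler only after its data is initialized, and rcu_dereference_sched() ordering the reader's subsequent accesses. A minimal userspace sketch of that ordering, using C11 release/acquire atomics in place of the kernel primitives (all names below are hypothetical illustrations, not code from this patch):

```c
#include <stdatomic.h>

/* Userspace sketch of the rcu_assign_pointer()/rcu_dereference_sched()
 * pairing: the release store publishes the handler only after its data
 * has been written, and the acquire load guarantees that a reader which
 * sees the new pointer also sees that data. */

struct handler {
	int (*fn)(int);
	int config;		/* data the handler depends on */
};

static int double_it(int x) { return x * 2; }

static struct handler default_handler = { double_it, 0 };
static _Atomic(struct handler *) cur_handler = &default_handler;

/* Analogue of set_nmi_callback(): initialize first, then publish. */
static void set_handler(struct handler *h)
{
	atomic_store_explicit(&cur_handler, h, memory_order_release);
}

/* Analogue of the rcu_dereference_sched() in do_nmi():
 * load the pointer once, then use only the local copy. */
static int run_handler(int x)
{
	struct handler *h =
		atomic_load_explicit(&cur_handler, memory_order_acquire);
	return h->fn(x) + h->config;
}
```

As in the quiz answer, dropping the acquire/release pairing could let a reader see the new function pointer but the old, uninitialized `config`.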
+23 -11
Documentation/RCU/arrayRCU.txt → Documentation/RCU/arrayRCU.rst
···
-Using RCU to Protect Read-Mostly Arrays
+.. _array_rcu_doc:
 
+Using RCU to Protect Read-Mostly Arrays
+=======================================
 
 Although RCU is more commonly used to protect linked lists, it can
 also be used to protect arrays. Three situations are as follows:
 
-1. Hash Tables
+1. :ref:`Hash Tables <hash_tables>`
 
-2. Static Arrays
+2. :ref:`Static Arrays <static_arrays>`
 
-3. Resizeable Arrays
+3. :ref:`Resizable Arrays <resizable_arrays>`
 
 Each of these three situations involves an RCU-protected pointer to an
 array that is separately indexed. It might be tempting to consider use
 of RCU to instead protect the index into an array, however, this use
-case is -not- supported. The problem with RCU-protected indexes into
+case is **not** supported. The problem with RCU-protected indexes into
 arrays is that compilers can play way too many optimization games with
 integers, which means that the rules governing handling of these indexes
 are far more trouble than they are worth. If RCU-protected indexes into
···
 That aside, each of the three RCU-protected pointer situations are
 described in the following sections.
 
+.. _hash_tables:
 
 Situation 1: Hash Tables
+------------------------
 
 Hash tables are often implemented as an array, where each array entry
 has a linked-list hash chain. Each hash chain can be protected by RCU
 as described in the listRCU.txt document. This approach also applies
 to other array-of-list situations, such as radix trees.
 
+.. _static_arrays:
 
 Situation 2: Static Arrays
+--------------------------
 
 Static arrays, where the data (rather than a pointer to the data) is
 located in each array element, and where the array is never resized,
···
 this situation, which would also have minimal read-side overhead as long
 as updates are rare.
 
-Quick Quiz: Why is it so important that updates be rare when
-using seqlock?
+Quick Quiz:
+	Why is it so important that updates be rare when using seqlock?
 
+:ref:`Answer to Quick Quiz <answer_quick_quiz_seqlock>`
 
-Situation 3: Resizeable Arrays
+.. _resizable_arrays:
 
-Use of RCU for resizeable arrays is demonstrated by the grow_ary()
+Situation 3: Resizable Arrays
+------------------------------
+
+Use of RCU for resizable arrays is demonstrated by the grow_ary()
 function formerly used by the System V IPC code. The array is used
 to map from semaphore, message-queue, and shared-memory IDs to the data
 structure that represents the corresponding IPC construct. The grow_ary()
···
 the new array, and invokes ipc_rcu_putref() to free up the old array.
 Note that rcu_assign_pointer() is used to update the ids->entries pointer,
 which includes any memory barriers required on whatever architecture
-you are running on.
+you are running on::
 
 	static int grow_ary(struct ipc_ids* ids, int newsize)
 	{
···
 to the desired IPC object is placed in "out", with NULL indicating
 a non-existent entry. After acquiring "out->lock", the "out->deleted"
 flag indicates whether the IPC object is in the process of being
-deleted, and, if not, the pointer is returned.
+deleted, and, if not, the pointer is returned::
 
 	struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
 	{
···
 		return out;
 	}
 
+.. _answer_quick_quiz_seqlock:
 
 Answer to Quick Quiz:
+	Why is it so important that updates be rare when using seqlock?
 
 	The reason that it is important that updates be rare when
 	using seqlock is that frequent updates can livelock readers.
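The static-arrays section above suggests seqlock, and its quiz answer explains why updates must be rare: readers retry whenever they observe a write in progress. A minimal single-writer seqlock sketch in userspace C11 (hypothetical code; the kernel's seqlock implementation is considerably more elaborate): the writer bumps the sequence to an odd value while updating and back to even when done, and readers retry on an odd or changed sequence.

```c
#include <stdatomic.h>

/* Minimal single-writer seqlock sketch guarding a two-element array. */

static atomic_uint seq;		/* odd while a write is in progress */
static int pair[2];

static void write_pair(int a, int b)
{
	atomic_fetch_add_explicit(&seq, 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
	pair[0] = a;
	pair[1] = b;
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&seq, 1, memory_order_relaxed); /* even */
}

static void read_pair(int *a, int *b)
{
	unsigned int s0, s1;

	do {
		do {
			s0 = atomic_load_explicit(&seq, memory_order_acquire);
		} while (s0 & 1);	/* write in progress: retry */
		*a = pair[0];
		*b = pair[1];
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&seq, memory_order_relaxed);
	} while (s0 != s1);		/* raced with a writer: retry */
}
```

The retry loops make the livelock risk concrete: if writers bump `seq` constantly, `read_pair()` may never observe two equal, even sequence values.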
+5
Documentation/RCU/index.rst
···
 .. toctree::
    :maxdepth: 3
 
+   arrayRCU
+   rcubarrier
+   rcu_dereference
+   whatisRCU
    rcu
    listRCU
+   NMI-RCU
    UP
 
    Design/Memory-Ordering/Tree-RCU-Memory-Ordering
+1 -1
Documentation/RCU/lockdep-splat.txt
···
 read-side critical section, which again would have suppressed the
 above lockdep-RCU splat.
 
-But in this particular case, we don't actually deference the pointer
+But in this particular case, we don't actually dereference the pointer
 returned from rcu_dereference(). Instead, that pointer is just compared
 to the cic pointer, which means that the rcu_dereference() can be replaced
 by rcu_access_pointer() as follows:
+41 -34
Documentation/RCU/rcu_dereference.txt → Documentation/RCU/rcu_dereference.rst
···
+.. _rcu_dereference_doc:
+
 PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
+===============================================================
 
 Most of the time, you can use values from rcu_dereference() or one of
 the similar primitives without worries. Dereferencing (prefix "*"),
···
 It is nevertheless possible to get into trouble with other operations.
 Follow these rules to keep your RCU code working properly:
 
-o	You must use one of the rcu_dereference() family of primitives
+-	You must use one of the rcu_dereference() family of primitives
 	to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
 	will complain. Worse yet, your code can see random memory-corruption
 	bugs due to games that compilers and DEC Alpha can play.
···
 	for an example where the compiler can in fact deduce the exact
 	value of the pointer, and thus cause misordering.
 
-o	You are only permitted to use rcu_dereference on pointer values.
+-	You are only permitted to use rcu_dereference on pointer values.
 	The compiler simply knows too much about integral values to
 	trust it to carry dependencies through integer operations.
 	There are a very few exceptions, namely that you can temporarily
 	cast the pointer to uintptr_t in order to:
 
-	o	Set bits and clear bits down in the must-be-zero low-order
+	-	Set bits and clear bits down in the must-be-zero low-order
 		bits of that pointer. This clearly means that the pointer
 		must have alignment constraints, for example, this does
 		-not- work in general for char* pointers.
 
-	o	XOR bits to translate pointers, as is done in some
+	-	XOR bits to translate pointers, as is done in some
 		classic buddy-allocator algorithms.
 
 	It is important to cast the value back to pointer before
 	doing much of anything else with it.
 
-o	Avoid cancellation when using the "+" and "-" infix arithmetic
+-	Avoid cancellation when using the "+" and "-" infix arithmetic
 	operators. For example, for a given variable "x", avoid
 	"(x-(uintptr_t)x)" for char* pointers. The compiler is within its
 	rights to substitute zero for this sort of expression, so that
···
 	"p+a-b" is safe because its value still necessarily depends on
 	the rcu_dereference(), thus maintaining proper ordering.
 
-o	If you are using RCU to protect JITed functions, so that the
+-	If you are using RCU to protect JITed functions, so that the
 	"()" function-invocation operator is applied to a value obtained
 	(directly or indirectly) from rcu_dereference(), you may need to
 	interact directly with the hardware to flush instruction caches.
 	This issue arises on some systems when a newly JITed function is
 	using the same memory that was used by an earlier JITed function.
 
-o	Do not use the results from relational operators ("==", "!=",
+-	Do not use the results from relational operators ("==", "!=",
 	">", ">=", "<", or "<=") when dereferencing. For example,
-	the following (quite strange) code is buggy:
+	the following (quite strange) code is buggy::
 
 		int *p;
 		int *q;
···
 	after such branches, but can speculate loads, which can again
 	result in misordering bugs.
 
-o	Be very careful about comparing pointers obtained from
+-	Be very careful about comparing pointers obtained from
 	rcu_dereference() against non-NULL values. As Linus Torvalds
 	explained, if the two pointers are equal, the compiler could
 	substitute the pointer you are comparing against for the pointer
-	obtained from rcu_dereference(). For example:
+	obtained from rcu_dereference(). For example::
 
 		p = rcu_dereference(gp);
 		if (p == &default_struct)
···
 	Because the compiler now knows that the value of "p" is exactly
 	the address of the variable "default_struct", it is free to
-	transform this code into the following:
+	transform this code into the following::
 
 		p = rcu_dereference(gp);
 		if (p == &default_struct)
···
 	However, comparisons are OK in the following cases:
 
-	o	The comparison was against the NULL pointer. If the
+	-	The comparison was against the NULL pointer. If the
 		compiler knows that the pointer is NULL, you had better
 		not be dereferencing it anyway. If the comparison is
 		non-equal, the compiler is none the wiser. Therefore,
 		it is safe to compare pointers from rcu_dereference()
 		against NULL pointers.
 
-	o	The pointer is never dereferenced after being compared.
+	-	The pointer is never dereferenced after being compared.
 		Since there are no subsequent dereferences, the compiler
 		cannot use anything it learned from the comparison
 		to reorder the non-existent subsequent dereferences.
···
 		dereferenced, rcu_access_pointer() should be used in place
 		of rcu_dereference().
 
-	o	The comparison is against a pointer that references memory
+	-	The comparison is against a pointer that references memory
 		that was initialized "a long time ago." The reason
 		this is safe is that even if misordering occurs, the
 		misordering will not affect the accesses that follow
 		the comparison. So exactly how long ago is "a long
 		time ago"? Here are some possibilities:
 
-		o	Compile time.
+		-	Compile time.
 
-		o	Boot time.
+		-	Boot time.
 
-		o	Module-init time for module code.
+		-	Module-init time for module code.
 
-		o	Prior to kthread creation for kthread code.
+		-	Prior to kthread creation for kthread code.
 
-		o	During some prior acquisition of the lock that
+		-	During some prior acquisition of the lock that
 			we now hold.
 
-		o	Before mod_timer() time for a timer handler.
+		-	Before mod_timer() time for a timer handler.
 
 		There are many other possibilities involving the Linux
 		kernel's wide array of primitives that cause code to
 		be invoked at a later time.
 
-	o	The pointer being compared against also came from
+	-	The pointer being compared against also came from
 		rcu_dereference(). In this case, both pointers depend
 		on one rcu_dereference() or another, so you get proper
 		ordering either way.
···
 		of such an RCU usage bug is shown in the section titled
 		"EXAMPLE OF AMPLIFIED RCU-USAGE BUG".
 
-	o	All of the accesses following the comparison are stores,
+	-	All of the accesses following the comparison are stores,
 		so that a control dependency preserves the needed ordering.
 		That said, it is easy to get control dependencies wrong.
 		Please see the "CONTROL DEPENDENCIES" section of
 		Documentation/memory-barriers.txt for more details.
 
-	o	The pointers are not equal -and- the compiler does
+	-	The pointers are not equal -and- the compiler does
 		not have enough information to deduce the value of the
 		pointer. Note that the volatile cast in rcu_dereference()
 		will normally prevent the compiler from knowing too much.
···
 		comparison will provide exactly the information that the
 		compiler needs to deduce the value of the pointer.
 
-o	Disable any value-speculation optimizations that your compiler
+-	Disable any value-speculation optimizations that your compiler
 	might provide, especially if you are making use of feedback-based
 	optimizations that take data collected from prior runs. Such
 	value-speculation optimizations reorder operations by design.
···
 
 EXAMPLE OF AMPLIFIED RCU-USAGE BUG
+----------------------------------
 
 Because updaters can run concurrently with RCU readers, RCU readers can
 see stale and/or inconsistent values. If RCU readers need fresh or
 consistent values, which they sometimes do, they need to take proper
-precautions. To see this, consider the following code fragment:
+precautions. To see this, consider the following code fragment::
 
 	struct foo {
 		int a;
···
 But suppose that the reader needs a consistent view?
 
-Then one approach is to use locking, for example, as follows:
+Then one approach is to use locking, for example, as follows::
 
 	struct foo {
 		int a;
···
 
 EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
+-----------------------------------------
 
 If a pointer obtained from rcu_dereference() compares not-equal to some
 other pointer, the compiler normally has no clue what the value of the
···
 should prevent the compiler from guessing the value.
 
 But without rcu_dereference(), the compiler knows more than you might
-expect. Consider the following code fragment:
+expect. Consider the following code fragment::
 
 	struct foo {
 		int a;
···
 
 WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
+------------------------------------------------------------
 
 First, please avoid using rcu_dereference_raw() and also please avoid
 using rcu_dereference_check() and rcu_dereference_protected() with a
···
 2.	If the access might be within an RCU read-side critical section
 	on the one hand, or protected by (say) my_lock on the other,
-	use rcu_dereference_check(), for example:
+	use rcu_dereference_check(), for example::
 
 		p1 = rcu_dereference_check(p->rcu_protected_pointer,
 					   lockdep_is_held(&my_lock));
 
 3.	If the access might be within an RCU read-side critical section
 	on the one hand, or protected by either my_lock or your_lock on
-	the other, again use rcu_dereference_check(), for example:
+	the other, again use rcu_dereference_check(), for example::
 
 		p1 = rcu_dereference_check(p->rcu_protected_pointer,
 					   lockdep_is_held(&my_lock) ||
 					   lockdep_is_held(&your_lock));
 
 4.	If the access is on the update side, so that it is always protected
-	by my_lock, use rcu_dereference_protected():
+	by my_lock, use rcu_dereference_protected()::
 
 		p1 = rcu_dereference_protected(p->rcu_protected_pointer,
 					       lockdep_is_held(&my_lock));
···
 
 SPARSE CHECKING OF RCU-PROTECTED POINTERS
+-----------------------------------------
 
 The sparse static-analysis tool checks for direct access to RCU-protected
 pointers, which can result in "interesting" bugs due to compiler
 optimizations involving invented loads and perhaps also load tearing.
-For example, suppose someone mistakenly does something like this:
+For example, suppose someone mistakenly does something like this::
 
 	p = q->rcu_protected_pointer;
 	do_something_with(p->a);
 	do_something_else_with(p->b);
 
 If register pressure is high, the compiler might optimize "p" out
-of existence, transforming the code to something like this:
+of existence, transforming the code to something like this::
 
 	do_something_with(q->rcu_protected_pointer->a);
 	do_something_else_with(q->rcu_protected_pointer->b);
···
 of pointers, which also might fatally disappoint your code.
 
 These problems could have been avoided simply by making the code instead
-read as follows:
+read as follows::
 
 	p = rcu_dereference(q->rcu_protected_pointer);
 	do_something_with(p->a);
···
 this pointer is accessed directly. It will also cause sparse to complain
 if a pointer not marked with "__rcu" is accessed using rcu_dereference()
 and friends. For example, ->rcu_protected_pointer might be declared as
-follows:
+follows::
 
 	struct foo __rcu *rcu_protected_pointer;
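The sparse section above boils down to a discipline: load the RCU-protected pointer exactly once, into a local, and make every field access go through that local so the compiler has no second load to invent. A userspace analogue using a single C11 acquire load in place of rcu_dereference() (names below are hypothetical, not the kernel macros):

```c
#include <stdatomic.h>

/* "Load the pointer once" discipline: one acquire load stands in for
 * rcu_dereference(), and all field accesses go through the local "p",
 * so the compiler cannot reload the pointer and mix fields from two
 * different versions of the structure. */

struct foo {
	int a;
	int b;
};

static _Atomic(struct foo *) protected_ptr;

/* Publisher side: analogue of rcu_assign_pointer(). */
static void publish_foo(struct foo *f)
{
	atomic_store_explicit(&protected_ptr, f, memory_order_release);
}

/* Reader side: one load, then only "p" is used. */
static int sum_fields(void)
{
	struct foo *p =
		atomic_load_explicit(&protected_ptr, memory_order_acquire);
	return p->a + p->b;
}
```

Because the load is atomic and happens once, the buggy transformation shown in the document (re-reading `q->rcu_protected_pointer` for each field) is not available to the compiler.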
+125 -97
Documentation/RCU/rcubarrier.txt → Documentation/RCU/rcubarrier.rst
··· 1 + .. _rcu_barrier: 2 + 1 3 RCU and Unloadable Modules 4 + ========================== 2 5 3 6 [Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/] 4 7 ··· 24 21 presence? There is a synchronize_rcu() primitive that blocks until all 25 22 pre-existing readers have completed. An updater wishing to delete an 26 23 element p from a linked list might do the following, while holding an 27 - appropriate lock, of course: 24 + appropriate lock, of course:: 28 25 29 26 list_del_rcu(p); 30 27 synchronize_rcu(); ··· 35 32 rcu_head struct placed within the RCU-protected data structure and 36 33 another pointer to a function that may be invoked later to free that 37 34 structure. Code to delete an element p from the linked list from IRQ 38 - context might then be as follows: 35 + context might then be as follows:: 39 36 40 37 list_del_rcu(p); 41 38 call_rcu(&p->rcu, p_callback); 42 39 43 40 Since call_rcu() never blocks, this code can safely be used from within 44 - IRQ context. The function p_callback() might be defined as follows: 41 + IRQ context. The function p_callback() might be defined as follows:: 45 42 46 43 static void p_callback(struct rcu_head *rp) 47 44 { ··· 52 49 53 50 54 51 Unloading Modules That Use call_rcu() 52 + ------------------------------------- 55 53 56 54 But what if p_callback is defined in an unloadable module? 57 55 ··· 73 69 74 70 75 71 rcu_barrier() 72 + ------------- 76 73 77 74 We instead need the rcu_barrier() primitive. Rather than waiting for 78 75 a grace period to elapse, rcu_barrier() waits for all outstanding RCU 79 - callbacks to complete. Please note that rcu_barrier() does -not- imply 76 + callbacks to complete. Please note that rcu_barrier() does **not** imply 80 77 synchronize_rcu(), in particular, if there are no RCU callbacks queued 81 78 anywhere, rcu_barrier() is within its rights to return immediately, 82 79 without waiting for a grace period to elapse. 
··· 93 88 module uses multiple flavors of call_rcu(), then it must also use multiple 94 89 flavors of rcu_barrier() when unloading that module. For example, if 95 90 it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on 96 - srcu_struct_2(), then the following three lines of code will be required 97 - when unloading: 91 + srcu_struct_2, then the following three lines of code will be required 92 + when unloading:: 98 93 99 94 1 rcu_barrier(); 100 95 2 srcu_barrier(&srcu_struct_1); 101 96 3 srcu_barrier(&srcu_struct_2); 102 97 103 98 The rcutorture module makes use of rcu_barrier() in its exit function 104 - as follows: 99 + as follows:: 105 100 106 - 1 static void 107 - 2 rcu_torture_cleanup(void) 108 - 3 { 109 - 4 int i; 101 + 1 static void 102 + 2 rcu_torture_cleanup(void) 103 + 3 { 104 + 4 int i; 110 105 5 111 - 6 fullstop = 1; 112 - 7 if (shuffler_task != NULL) { 106 + 6 fullstop = 1; 107 + 7 if (shuffler_task != NULL) { 113 108 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task"); 114 109 9 kthread_stop(shuffler_task); 115 - 10 } 116 - 11 shuffler_task = NULL; 117 - 12 118 - 13 if (writer_task != NULL) { 119 - 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task"); 120 - 15 kthread_stop(writer_task); 121 - 16 } 122 - 17 writer_task = NULL; 123 - 18 124 - 19 if (reader_tasks != NULL) { 125 - 20 for (i = 0; i < nrealreaders; i++) { 126 - 21 if (reader_tasks[i] != NULL) { 127 - 22 VERBOSE_PRINTK_STRING( 128 - 23 "Stopping rcu_torture_reader task"); 129 - 24 kthread_stop(reader_tasks[i]); 130 - 25 } 131 - 26 reader_tasks[i] = NULL; 132 - 27 } 133 - 28 kfree(reader_tasks); 134 - 29 reader_tasks = NULL; 135 - 30 } 136 - 31 rcu_torture_current = NULL; 137 - 32 138 - 33 if (fakewriter_tasks != NULL) { 139 - 34 for (i = 0; i < nfakewriters; i++) { 140 - 35 if (fakewriter_tasks[i] != NULL) { 141 - 36 VERBOSE_PRINTK_STRING( 142 - 37 "Stopping rcu_torture_fakewriter task"); 143 - 38 kthread_stop(fakewriter_tasks[i]); 144 - 39 } 145 - 40 
fakewriter_tasks[i] = NULL; 146 - 41 } 147 - 42 kfree(fakewriter_tasks); 148 - 43 fakewriter_tasks = NULL; 149 - 44 } 150 - 45 151 - 46 if (stats_task != NULL) { 152 - 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); 153 - 48 kthread_stop(stats_task); 154 - 49 } 155 - 50 stats_task = NULL; 156 - 51 157 - 52 /* Wait for all RCU callbacks to fire. */ 158 - 53 rcu_barrier(); 159 - 54 160 - 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ 161 - 56 162 - 57 if (cur_ops->cleanup != NULL) 163 - 58 cur_ops->cleanup(); 164 - 59 if (atomic_read(&n_rcu_torture_error)) 165 - 60 rcu_torture_print_module_parms("End of test: FAILURE"); 166 - 61 else 167 - 62 rcu_torture_print_module_parms("End of test: SUCCESS"); 168 - 63 } 110 + 10 } 111 + 11 shuffler_task = NULL; 112 + 12 113 + 13 if (writer_task != NULL) { 114 + 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task"); 115 + 15 kthread_stop(writer_task); 116 + 16 } 117 + 17 writer_task = NULL; 118 + 18 119 + 19 if (reader_tasks != NULL) { 120 + 20 for (i = 0; i < nrealreaders; i++) { 121 + 21 if (reader_tasks[i] != NULL) { 122 + 22 VERBOSE_PRINTK_STRING( 123 + 23 "Stopping rcu_torture_reader task"); 124 + 24 kthread_stop(reader_tasks[i]); 125 + 25 } 126 + 26 reader_tasks[i] = NULL; 127 + 27 } 128 + 28 kfree(reader_tasks); 129 + 29 reader_tasks = NULL; 130 + 30 } 131 + 31 rcu_torture_current = NULL; 132 + 32 133 + 33 if (fakewriter_tasks != NULL) { 134 + 34 for (i = 0; i < nfakewriters; i++) { 135 + 35 if (fakewriter_tasks[i] != NULL) { 136 + 36 VERBOSE_PRINTK_STRING( 137 + 37 "Stopping rcu_torture_fakewriter task"); 138 + 38 kthread_stop(fakewriter_tasks[i]); 139 + 39 } 140 + 40 fakewriter_tasks[i] = NULL; 141 + 41 } 142 + 42 kfree(fakewriter_tasks); 143 + 43 fakewriter_tasks = NULL; 144 + 44 } 145 + 45 146 + 46 if (stats_task != NULL) { 147 + 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); 148 + 48 kthread_stop(stats_task); 149 + 49 } 150 + 50 stats_task = NULL; 151 + 51 
152 + 52 /* Wait for all RCU callbacks to fire. */ 153 + 53 rcu_barrier(); 154 + 54 155 + 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ 156 + 56 157 + 57 if (cur_ops->cleanup != NULL) 158 + 58 cur_ops->cleanup(); 159 + 59 if (atomic_read(&n_rcu_torture_error)) 160 + 60 rcu_torture_print_module_parms("End of test: FAILURE"); 161 + 61 else 162 + 62 rcu_torture_print_module_parms("End of test: SUCCESS"); 163 + 63 } 169 164 170 165 Line 6 sets a global variable that prevents any RCU callbacks from 171 166 re-posting themselves. This will not be necessary in most cases, since ··· 181 176 Then lines 55-62 print status and do operation-specific cleanup, and 182 177 then return, permitting the module-unload operation to be completed. 183 178 184 - Quick Quiz #1: Is there any other situation where rcu_barrier() might 179 + .. _rcubarrier_quiz_1: 180 + 181 + Quick Quiz #1: 182 + Is there any other situation where rcu_barrier() might 185 183 be required? 184 + 185 + :ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>` 186 186 187 187 Your module might have additional complications. For example, if your 188 188 module invokes call_rcu() from timers, you will need to first cancel all ··· 198 188 rcu_barrier() before unloading. Similarly, if your module uses 199 189 call_srcu(), you will need to invoke srcu_barrier() before unloading, 200 190 and on the same srcu_struct structure. If your module uses call_rcu() 201 - -and- call_srcu(), then you will need to invoke rcu_barrier() -and- 191 + **and** call_srcu(), then you will need to invoke rcu_barrier() **and** 202 192 srcu_barrier(). 
203 193 204 194 205 195 Implementing rcu_barrier() 196 + -------------------------- 206 197 207 198 Dipankar Sarma's implementation of rcu_barrier() makes use of the fact 208 199 that RCU callbacks are never reordered once queued on one of the per-CPU ··· 211 200 callback queues, and then waits until they have all started executing, at 212 201 which point, all earlier RCU callbacks are guaranteed to have completed. 213 202 214 - The original code for rcu_barrier() was as follows: 203 + The original code for rcu_barrier() was as follows:: 215 204 216 - 1 void rcu_barrier(void) 217 - 2 { 218 - 3 BUG_ON(in_interrupt()); 219 - 4 /* Take cpucontrol mutex to protect against CPU hotplug */ 220 - 5 mutex_lock(&rcu_barrier_mutex); 221 - 6 init_completion(&rcu_barrier_completion); 222 - 7 atomic_set(&rcu_barrier_cpu_count, 0); 223 - 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1); 224 - 9 wait_for_completion(&rcu_barrier_completion); 225 - 10 mutex_unlock(&rcu_barrier_mutex); 226 - 11 } 205 + 1 void rcu_barrier(void) 206 + 2 { 207 + 3 BUG_ON(in_interrupt()); 208 + 4 /* Take cpucontrol mutex to protect against CPU hotplug */ 209 + 5 mutex_lock(&rcu_barrier_mutex); 210 + 6 init_completion(&rcu_barrier_completion); 211 + 7 atomic_set(&rcu_barrier_cpu_count, 0); 212 + 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1); 213 + 9 wait_for_completion(&rcu_barrier_completion); 214 + 10 mutex_unlock(&rcu_barrier_mutex); 215 + 11 } 227 216 228 217 Line 3 verifies that the caller is in process context, and lines 5 and 10 229 218 use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the ··· 237 226 still gives the general idea. 
238 227 239 228 The rcu_barrier_func() runs on each CPU, where it invokes call_rcu() 240 - to post an RCU callback, as follows: 229 + to post an RCU callback, as follows:: 241 230 242 - 1 static void rcu_barrier_func(void *notused) 243 - 2 { 244 - 3 int cpu = smp_processor_id(); 245 - 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu); 246 - 5 struct rcu_head *head; 231 + 1 static void rcu_barrier_func(void *notused) 232 + 2 { 233 + 3 int cpu = smp_processor_id(); 234 + 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu); 235 + 5 struct rcu_head *head; 247 236 6 248 - 7 head = &rdp->barrier; 249 - 8 atomic_inc(&rcu_barrier_cpu_count); 250 - 9 call_rcu(head, rcu_barrier_callback); 251 - 10 } 237 + 7 head = &rdp->barrier; 238 + 8 atomic_inc(&rcu_barrier_cpu_count); 239 + 9 call_rcu(head, rcu_barrier_callback); 240 + 10 } 252 241 253 242 Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure, 254 243 which contains the struct rcu_head that needed for the later call to ··· 259 248 260 249 The rcu_barrier_callback() function simply atomically decrements the 261 250 rcu_barrier_cpu_count variable and finalizes the completion when it 262 - reaches zero, as follows: 251 + reaches zero, as follows:: 263 252 264 253 1 static void rcu_barrier_callback(struct rcu_head *notused) 265 254 2 { 266 - 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count)) 267 - 4 complete(&rcu_barrier_completion); 255 + 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count)) 256 + 4 complete(&rcu_barrier_completion); 268 257 5 } 269 258 270 - Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes 259 + .. _rcubarrier_quiz_2: 260 + 261 + Quick Quiz #2: 262 + What happens if CPU 0's rcu_barrier_func() executes 271 263 immediately (thus incrementing rcu_barrier_cpu_count to the 272 264 value one), but the other CPU's rcu_barrier_func() invocations 273 265 are delayed for a full grace period? Couldn't this result in 274 266 rcu_barrier() returning prematurely? 
267 + 268 + :ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>` 275 269 276 270 The current rcu_barrier() implementation is more complex, due to the need 277 271 to avoid disturbing idle CPUs (especially on battery-powered systems) ··· 285 269 286 270 287 271 rcu_barrier() Summary 272 + --------------------- 288 273 289 274 The rcu_barrier() primitive has seen relatively little use, since most 290 275 code using RCU is in the core kernel rather than in modules. However, if ··· 294 277 295 278 296 279 Answers to Quick Quizzes 280 + ------------------------ 297 281 298 - Quick Quiz #1: Is there any other situation where rcu_barrier() might 282 + .. _answer_rcubarrier_quiz_1: 283 + 284 + Quick Quiz #1: 285 + Is there any other situation where rcu_barrier() might 299 286 be required? 300 287 301 288 Answer: Interestingly enough, rcu_barrier() was not originally ··· 313 292 implementing rcutorture, and found that rcu_barrier() solves 314 293 this problem as well. 315 294 316 - Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes 295 + :ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>` 296 + 297 + .. _answer_rcubarrier_quiz_2: 298 + 299 + Quick Quiz #2: 300 + What happens if CPU 0's rcu_barrier_func() executes 317 301 immediately (thus incrementing rcu_barrier_cpu_count to the 318 302 value one), but the other CPU's rcu_barrier_func() invocations 319 303 are delayed for a full grace period? Couldn't this result in ··· 349 323 is to add an rcu_read_lock() before line 8 of rcu_barrier() 350 324 and an rcu_read_unlock() after line 8 of this same function. If 351 325 you can think of a better change, please let me know! 326 + 327 + :ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`
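The countdown-to-completion half of the scheme above can also be modeled in ordinary C. Again a hedged userspace sketch rather than kernel code: the `toy_*` names are invented, pthreads stand in for CPUs, and — unlike the original code — the count is preset to the number of CPUs before any callback can fire, which sidesteps the premature-return race raised in Quick Quiz #2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NCPUS 4

static atomic_int barrier_cpu_count;
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static int done;

static void toy_barrier_callback(void)
{
	/* atomic_fetch_sub() returns the old value: 1 means we are last. */
	if (atomic_fetch_sub(&barrier_cpu_count, 1) == 1) {
		/* Mimic complete(&rcu_barrier_completion). */
		pthread_mutex_lock(&done_lock);
		done = 1;
		pthread_cond_signal(&done_cv);
		pthread_mutex_unlock(&done_lock);
	}
}

static void *toy_cpu(void *unused)
{
	toy_barrier_callback();	/* stands in for the queued callback firing */
	return NULL;
}

int toy_rcu_barrier(void)
{
	pthread_t tid[NCPUS];
	int i;

	atomic_store(&barrier_cpu_count, NCPUS);	/* preset, see above */
	for (i = 0; i < NCPUS; i++)
		pthread_create(&tid[i], NULL, toy_cpu, NULL);

	pthread_mutex_lock(&done_lock);		/* wait_for_completion() */
	while (!done)
		pthread_cond_wait(&done_cv, &done_lock);
	pthread_mutex_unlock(&done_lock);

	for (i = 0; i < NCPUS; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&barrier_cpu_count);	/* 0 once all fired */
}
```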
+3 -8
Documentation/RCU/stallwarn.txt
··· 225 225 In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed 226 226 for each CPU: 227 227 228 - 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D 228 + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1 229 229 230 230 The "last_accelerate:" prints the low-order 16 bits (in hex) of the 231 231 jiffies counter when this CPU last invoked rcu_try_advance_all_cbs() 232 232 from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from 233 - rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback 234 - status, so that an "l" indicates that all callbacks were lazy at the start 235 - of the last idle period and an "L" indicates that there are currently 236 - no non-lazy callbacks (in both cases, "." is printed otherwise, as 237 - shown above) and "D" indicates that dyntick-idle processing is enabled 238 - ("." is printed otherwise, for example, if disabled via the "nohz=" 239 - kernel boot parameter). 233 + rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle 234 + processing is enabled. 240 235 241 236 If the grace period ends just as the stall warning starts printing, 242 237 there will be a spurious stall-warning message, which will include
+182 -107
Documentation/RCU/whatisRCU.txt Documentation/RCU/whatisRCU.rst
··· 1 + .. _whatisrcu_doc: 2 + 1 3 What is RCU? -- "Read, Copy, Update" 4 + ====================================== 2 5 3 6 Please note that the "What is RCU?" LWN series is an excellent place 4 7 to start learning about RCU: 5 8 6 - 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ 7 - 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ 8 - 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ 9 - 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ 10 - 2010 Big API Table http://lwn.net/Articles/419086/ 11 - 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/ 12 - 2014 Big API Table http://lwn.net/Articles/609973/ 9 + | 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ 10 + | 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ 11 + | 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ 12 + | 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ 13 + | 2010 Big API Table http://lwn.net/Articles/419086/ 14 + | 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/ 15 + | 2014 Big API Table http://lwn.net/Articles/609973/ 13 16 14 17 15 18 What is RCU? ··· 27 24 to arrive at an understanding of RCU. This document provides several 28 25 different paths, as follows: 29 26 30 - 1. RCU OVERVIEW 31 - 2. WHAT IS RCU'S CORE API? 32 - 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? 33 - 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? 34 - 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? 35 - 6. ANALOGY WITH READER-WRITER LOCKING 36 - 7. FULL LIST OF RCU APIs 37 - 8. ANSWERS TO QUICK QUIZZES 27 + :ref:`1. RCU OVERVIEW <1_whatisRCU>` 28 + 29 + :ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>` 30 + 31 + :ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>` 32 + 33 + :ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>` 34 + 35 + :ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>` 36 + 37 + :ref:`6. 
ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>` 38 + 39 + :ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>` 40 + 41 + :ref:`8. ANSWERS TO QUICK QUIZZES <8_whatisRCU>` 38 42 39 43 People who prefer starting with a conceptual overview should focus on 40 44 Section 1, though most readers will profit by reading this section at ··· 59 49 that type of person, you have perused the source code and will therefore 60 50 never need this document anyway. ;-) 61 51 52 + .. _1_whatisRCU: 62 53 63 54 1. RCU OVERVIEW 55 + ---------------- 64 56 65 57 The basic idea behind RCU is to split updates into "removal" and 66 58 "reclamation" phases. The removal phase removes references to data items ··· 128 116 that readers are not doing any sort of synchronization operations??? 129 117 Read on to learn about how RCU's API makes this easy. 130 118 119 + .. _2_whatisRCU: 131 120 132 121 2. WHAT IS RCU'S CORE API? 122 + --------------------------- 133 123 134 124 The core RCU API is quite small: 135 125 ··· 150 136 at the function header comments. 151 137 152 138 rcu_read_lock() 153 - 139 + ^^^^^^^^^^^^^^^ 154 140 void rcu_read_lock(void); 155 141 156 142 Used by a reader to inform the reclaimer that the reader is ··· 164 150 longer-term references to data structures. 165 151 166 152 rcu_read_unlock() 167 - 153 + ^^^^^^^^^^^^^^^^^ 168 154 void rcu_read_unlock(void); 169 155 170 156 Used by a reader to inform the reclaimer that the reader is ··· 172 158 read-side critical sections may be nested and/or overlapping. 173 159 174 160 synchronize_rcu() 175 - 161 + ^^^^^^^^^^^^^^^^^ 176 162 void synchronize_rcu(void); 177 163 178 164 Marks the end of updater code and the beginning of reclaimer 179 165 code. It does this by blocking until all pre-existing RCU 180 166 read-side critical sections on all CPUs have completed. 
181 - Note that synchronize_rcu() will -not- necessarily wait for 167 + Note that synchronize_rcu() will **not** necessarily wait for 182 168 any subsequent RCU read-side critical sections to complete. 183 - For example, consider the following sequence of events: 169 + For example, consider the following sequence of events:: 184 170 185 171 CPU 0 CPU 1 CPU 2 186 172 ----------------- ------------------------- --------------- ··· 196 182 any that begin after synchronize_rcu() is invoked. 197 183 198 184 Of course, synchronize_rcu() does not necessarily return 199 - -immediately- after the last pre-existing RCU read-side critical 185 + **immediately** after the last pre-existing RCU read-side critical 200 186 section completes. For one thing, there might well be scheduling 201 187 delays. For another thing, many RCU implementations process 202 188 requests in batches in order to improve efficiencies, which can ··· 225 211 checklist.txt for some approaches to limiting the update rate. 226 212 227 213 rcu_assign_pointer() 228 - 214 + ^^^^^^^^^^^^^^^^^^^^ 229 215 void rcu_assign_pointer(p, typeof(p) v); 230 216 231 - Yes, rcu_assign_pointer() -is- implemented as a macro, though it 217 + Yes, rcu_assign_pointer() **is** implemented as a macro, though it 232 218 would be cool to be able to declare a function in this manner. 233 219 (Compiler experts will no doubt disagree.) 234 220 ··· 245 231 the _rcu list-manipulation primitives such as list_add_rcu(). 
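The publish guarantee that rcu_assign_pointer() provides (and that rcu_dereference(), described next, pairs with) can be modeled with C11 atomics. This is a userspace analogue under stated assumptions, not the kernel macros: the `toy_*` names are invented, rcu_assign_pointer() is treated as a store-release, and rcu_dereference() as a dependency-ordered load (`memory_order_consume`, which current compilers promote to acquire).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a;
};

static _Atomic(struct foo *) gp;	/* the RCU-protected global pointer */
static struct foo the_foo;

static void toy_assign_pointer(struct foo *p)
{
	/* Publish: initialization of *p is ordered before the pointer store. */
	atomic_store_explicit(&gp, p, memory_order_release);
}

static int toy_dereference_a(void)
{
	/* Subscribe: loads through p are ordered after the pointer load. */
	struct foo *p = atomic_load_explicit(&gp, memory_order_consume);

	return p ? p->a : -1;
}

int toy_publish_demo(void)
{
	the_foo.a = 42;			/* initialize BEFORE publishing */
	toy_assign_pointer(&the_foo);
	return toy_dereference_a();	/* a reader sees 42, never garbage */
}
```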
246 232 247 233 rcu_dereference() 248 - 234 + ^^^^^^^^^^^^^^^^^ 249 235 typeof(p) rcu_dereference(p); 250 236 251 237 Like rcu_assign_pointer(), rcu_dereference() must be implemented ··· 262 248 263 249 Common coding practice uses rcu_dereference() to copy an 264 250 RCU-protected pointer to a local variable, then dereferences 265 - this local variable, for example as follows: 251 + this local variable, for example as follows:: 266 252 267 253 p = rcu_dereference(head.next); 268 254 return p->data; 269 255 270 256 However, in this case, one could just as easily combine these 271 - into one statement: 257 + into one statement:: 272 258 273 259 return rcu_dereference(head.next)->data; 274 260 ··· 280 266 unnecessary overhead on Alpha CPUs. 281 267 282 268 Note that the value returned by rcu_dereference() is valid 283 - only within the enclosing RCU read-side critical section [1]. 284 - For example, the following is -not- legal: 269 + only within the enclosing RCU read-side critical section [1]_. 270 + For example, the following is **not** legal:: 285 271 286 272 rcu_read_lock(); 287 273 p = rcu_dereference(head.next); ··· 304 290 at any time, including immediately after the rcu_dereference(). 305 291 And, again like rcu_assign_pointer(), rcu_dereference() is 306 292 typically used indirectly, via the _rcu list-manipulation 307 - primitives, such as list_for_each_entry_rcu() [2]. 293 + primitives, such as list_for_each_entry_rcu() [2]_. 308 294 309 - [1] The variant rcu_dereference_protected() can be used outside 295 + .. [1] The variant rcu_dereference_protected() can be used outside 310 296 of an RCU read-side critical section as long as the usage is 311 297 protected by locks acquired by the update-side code. This variant 312 298 avoids the lockdep warning that would happen when using (for ··· 319 305 a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst 320 306 and the API's code comments for more details and example usage. 
321 307 322 - [2] If the list_for_each_entry_rcu() instance might be used by 308 + .. [2] If the list_for_each_entry_rcu() instance might be used by 323 309 update-side code as well as by RCU readers, then an additional 324 310 lockdep expression can be added to its list of arguments. 325 311 For example, given an additional "lock_is_held(&mylock)" argument, ··· 329 315 330 316 The following diagram shows how each API communicates among the 331 317 reader, updater, and reclaimer. 318 + :: 332 319 333 320 334 321 rcu_assign_pointer() ··· 390 375 Again, most uses will be of (a). The (b) and (c) cases are important 391 376 for specialized uses, but are relatively uncommon. 392 377 378 + .. _3_whatisRCU: 393 379 394 380 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? 381 + ----------------------------------------------- 395 382 396 383 This section shows a simple use of the core RCU API to protect a 397 384 global pointer to a dynamically allocated structure. More-typical 398 - uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. 385 + uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`, 386 + :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`. 387 + :: 399 388 400 389 struct foo { 401 390 int a; ··· 459 440 460 441 So, to sum up: 461 442 462 - o Use rcu_read_lock() and rcu_read_unlock() to guard RCU 443 + - Use rcu_read_lock() and rcu_read_unlock() to guard RCU 463 444 read-side critical sections. 464 445 465 - o Within an RCU read-side critical section, use rcu_dereference() 446 + - Within an RCU read-side critical section, use rcu_dereference() 466 447 to dereference RCU-protected pointers. 467 448 468 - o Use some solid scheme (such as locks or semaphores) to 449 + - Use some solid scheme (such as locks or semaphores) to 469 450 keep concurrent updates from interfering with each other. 470 451 471 - o Use rcu_assign_pointer() to update an RCU-protected pointer. 
452 + - Use rcu_assign_pointer() to update an RCU-protected pointer. 472 453 This primitive protects concurrent readers from the updater, 473 - -not- concurrent updates from each other! You therefore still 454 + **not** concurrent updates from each other! You therefore still 474 455 need to use locking (or something similar) to keep concurrent 475 456 rcu_assign_pointer() primitives from interfering with each other. 476 457 477 - o Use synchronize_rcu() -after- removing a data element from an 478 - RCU-protected data structure, but -before- reclaiming/freeing 458 + - Use synchronize_rcu() **after** removing a data element from an 459 + RCU-protected data structure, but **before** reclaiming/freeing 479 460 the data element, in order to wait for the completion of all 480 461 RCU read-side critical sections that might be referencing that 481 462 data item. 482 463 483 464 See checklist.txt for additional rules to follow when using RCU. 484 - And again, more-typical uses of RCU may be found in listRCU.txt, 485 - arrayRCU.txt, and NMI-RCU.txt. 465 + And again, more-typical uses of RCU may be found in :ref:`listRCU.rst 466 + <list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst 467 + <NMI_rcu_doc>`. 486 468 469 + .. _4_whatisRCU: 487 470 488 471 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? 472 + -------------------------------------------- 489 473 490 474 In the example above, foo_update_a() blocks until a grace period elapses. 491 475 This is quite simple, but in some cases one cannot afford to wait so 492 476 long -- there might be other high-priority work to be done. 493 477 494 478 In such cases, one uses call_rcu() rather than synchronize_rcu(). 495 - The call_rcu() API is as follows: 479 + The call_rcu() API is as follows:: 496 480 497 481 void call_rcu(struct rcu_head * head, 498 482 void (*func)(struct rcu_head *head)); ··· 503 481 This function invokes func(head) after a grace period has elapsed. 
504 482 This invocation might happen from either softirq or process context, 505 483 so the function is not permitted to block. The foo struct needs to 506 - have an rcu_head structure added, perhaps as follows: 484 + have an rcu_head structure added, perhaps as follows:: 507 485 508 486 struct foo { 509 487 int a; ··· 512 490 struct rcu_head rcu; 513 491 }; 514 492 515 - The foo_update_a() function might then be written as follows: 493 + The foo_update_a() function might then be written as follows:: 516 494 517 495 /* 518 496 * Create a new struct foo that is the same as the one currently ··· 542 520 call_rcu(&old_fp->rcu, foo_reclaim); 543 521 } 544 522 545 - The foo_reclaim() function might appear as follows: 523 + The foo_reclaim() function might appear as follows:: 546 524 547 525 void foo_reclaim(struct rcu_head *rp) 548 526 { ··· 566 544 The summary of advice is the same as for the previous section, except 567 545 that we are now using call_rcu() rather than synchronize_rcu(): 568 546 569 - o Use call_rcu() -after- removing a data element from an 547 + - Use call_rcu() **after** removing a data element from an 570 548 RCU-protected data structure in order to register a callback 571 549 function that will be invoked after the completion of all RCU 572 550 read-side critical sections that might be referencing that ··· 574 552 575 553 If the callback for call_rcu() is not doing anything more than calling 576 554 kfree() on the structure, you can use kfree_rcu() instead of call_rcu() 577 - to avoid having to write your own callback: 555 + to avoid having to write your own callback:: 578 556 579 557 kfree_rcu(old_fp, rcu); 580 558 581 559 Again, see checklist.txt for additional rules governing the use of RCU. 582 560 561 + .. _5_whatisRCU: 583 562 584 563 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? 
564 + ------------------------------------------------ 585 565 586 566 One of the nice things about RCU is that it has extremely simple "toy" 587 567 implementations that are a good first step towards understanding the ··· 603 579 604 580 605 581 5A. "TOY" IMPLEMENTATION #1: LOCKING 606 - 582 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 607 583 This section presents a "toy" RCU implementation that is based on 608 584 familiar locking primitives. Its overhead makes it a non-starter for 609 585 real-life use, as does its lack of scalability. It is also unsuitable ··· 615 591 However, it is probably the easiest implementation to relate to, so is 616 592 a good starting point. 617 593 618 - It is extremely simple: 594 + It is extremely simple:: 619 595 620 596 static DEFINE_RWLOCK(rcu_gp_mutex); 621 597 ··· 638 614 639 615 [You can ignore rcu_assign_pointer() and rcu_dereference() without missing 640 616 much. But here are simplified versions anyway. And whatever you do, 641 - don't forget about them when submitting patches making use of RCU!] 617 + don't forget about them when submitting patches making use of RCU!]:: 642 618 643 619 #define rcu_assign_pointer(p, v) \ 644 620 ({ \ ··· 671 647 But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, 672 648 so there can be no deadlock cycle. 673 649 674 - Quick Quiz #1: Why is this argument naive? How could a deadlock 650 + .. _quiz_1: 651 + 652 + Quick Quiz #1: 653 + Why is this argument naive? How could a deadlock 675 654 occur when using this algorithm in a real-world Linux 676 655 kernel? How could this deadlock be avoided? 677 656 657 + :ref:`Answers to Quick Quiz <8_whatisRCU>` 678 658 679 659 5B. "TOY" EXAMPLE #2: CLASSIC RCU 680 - 660 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 681 661 This section presents a "toy" RCU implementation that is based on 682 662 "classic RCU". 
It is also short on performance (but only for updates) and 683 663 on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT 684 664 kernels. The definitions of rcu_dereference() and rcu_assign_pointer() 685 665 are the same as those shown in the preceding section, so they are omitted. 666 + :: 686 667 687 668 void rcu_read_lock(void) { } 688 669 ··· 712 683 in terms of the sched_setaffinity() primitive. Of course, a somewhat less 713 684 "toy" implementation would restore the affinity upon completion rather 714 685 than just leaving all tasks running on the last CPU, but when I said 715 - "toy", I meant -toy-! 686 + "toy", I meant **toy**! 716 687 717 688 So how the heck is this supposed to work??? 718 689 719 690 Remember that it is illegal to block while in an RCU read-side critical 720 691 section. Therefore, if a given CPU executes a context switch, we know 721 692 that it must have completed all preceding RCU read-side critical sections. 722 - Once -all- CPUs have executed a context switch, then -all- preceding 693 + Once **all** CPUs have executed a context switch, then **all** preceding 723 694 RCU read-side critical sections will have completed. 724 695 725 696 So, suppose that we remove a data item from its structure and then invoke ··· 727 698 that there are no RCU read-side critical sections holding a reference 728 699 to that data item, so we can safely reclaim it. 729 700 730 - Quick Quiz #2: Give an example where Classic RCU's read-side 731 - overhead is -negative-. 701 + .. _quiz_2: 732 702 733 - Quick Quiz #3: If it is illegal to block in an RCU read-side 703 + Quick Quiz #2: 704 + Give an example where Classic RCU's read-side 705 + overhead is **negative**. 706 + 707 + :ref:`Answers to Quick Quiz <8_whatisRCU>` 708 + 709 + .. _quiz_3: 710 + 711 + Quick Quiz #3: 712 + If it is illegal to block in an RCU read-side 734 713 critical section, what the heck do you do in 735 714 PREEMPT_RT, where normal spinlocks can block??? 
736 715 716 + :ref:`Answers to Quick Quiz <8_whatisRCU>` 717 + 718 + .. _6_whatisRCU: 737 719 738 720 6. ANALOGY WITH READER-WRITER LOCKING 721 + -------------------------------------- 739 722 740 723 Although RCU can be used in many different ways, a very common use of 741 724 RCU is analogous to reader-writer locking. The following unified 742 725 diff shows how closely related RCU and reader-writer locking can be. 726 + :: 743 727 744 728 @@ -5,5 +5,5 @@ struct el { 745 729 int data; ··· 804 762 return 0; 805 763 } 806 764 807 - Or, for those who prefer a side-by-side listing: 765 + Or, for those who prefer a side-by-side listing:: 808 766 809 767 1 struct el { 1 struct el { 810 768 2 struct list_head list; 2 struct list_head list; ··· 816 774 8 rwlock_t listmutex; 8 spinlock_t listmutex; 817 775 9 struct el head; 9 struct el head; 818 776 819 - 1 int search(long key, int *result) 1 int search(long key, int *result) 820 - 2 { 2 { 821 - 3 struct list_head *lp; 3 struct list_head *lp; 822 - 4 struct el *p; 4 struct el *p; 823 - 5 5 824 - 6 read_lock(&listmutex); 6 rcu_read_lock(); 825 - 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { 826 - 8 if (p->key == key) { 8 if (p->key == key) { 827 - 9 *result = p->data; 9 *result = p->data; 828 - 10 read_unlock(&listmutex); 10 rcu_read_unlock(); 829 - 11 return 1; 11 return 1; 830 - 12 } 12 } 831 - 13 } 13 } 832 - 14 read_unlock(&listmutex); 14 rcu_read_unlock(); 833 - 15 return 0; 15 return 0; 834 - 16 } 16 } 777 + :: 835 778 836 - 1 int delete(long key) 1 int delete(long key) 837 - 2 { 2 { 838 - 3 struct el *p; 3 struct el *p; 839 - 4 4 840 - 5 write_lock(&listmutex); 5 spin_lock(&listmutex); 841 - 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { 842 - 7 if (p->key == key) { 7 if (p->key == key) { 843 - 8 list_del(&p->list); 8 list_del_rcu(&p->list); 844 - 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); 845 - 10 synchronize_rcu(); 846 - 10 kfree(p); 11 
kfree(p); 847 - 11 return 1; 12 return 1; 848 - 12 } 13 } 849 - 13 } 14 } 850 - 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); 851 - 15 return 0; 16 return 0; 852 - 16 } 17 } 779 + 1 int search(long key, int *result) 1 int search(long key, int *result) 780 + 2 { 2 { 781 + 3 struct list_head *lp; 3 struct list_head *lp; 782 + 4 struct el *p; 4 struct el *p; 783 + 5 5 784 + 6 read_lock(&listmutex); 6 rcu_read_lock(); 785 + 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { 786 + 8 if (p->key == key) { 8 if (p->key == key) { 787 + 9 *result = p->data; 9 *result = p->data; 788 + 10 read_unlock(&listmutex); 10 rcu_read_unlock(); 789 + 11 return 1; 11 return 1; 790 + 12 } 12 } 791 + 13 } 13 } 792 + 14 read_unlock(&listmutex); 14 rcu_read_unlock(); 793 + 15 return 0; 15 return 0; 794 + 16 } 16 } 795 + 796 + :: 797 + 798 + 1 int delete(long key) 1 int delete(long key) 799 + 2 { 2 { 800 + 3 struct el *p; 3 struct el *p; 801 + 4 4 802 + 5 write_lock(&listmutex); 5 spin_lock(&listmutex); 803 + 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { 804 + 7 if (p->key == key) { 7 if (p->key == key) { 805 + 8 list_del(&p->list); 8 list_del_rcu(&p->list); 806 + 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); 807 + 10 synchronize_rcu(); 808 + 10 kfree(p); 11 kfree(p); 809 + 11 return 1; 12 return 1; 810 + 12 } 13 } 811 + 13 } 14 } 812 + 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); 813 + 15 return 0; 16 return 0; 814 + 16 } 17 } 853 815 854 816 Either way, the differences are quite small. Read-side locking moves 855 817 to rcu_read_lock() and rcu_read_unlock, update-side locking moves from ··· 871 825 mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can 872 826 be used in place of synchronize_rcu(). 873 827 828 + .. _7_whatisRCU: 874 829 875 830 7. 
FULL LIST OF RCU APIs 831 + ------------------------- 876 832 877 833 The RCU APIs are documented in docbook-format header comments in the 878 834 Linux-kernel source code, but it helps to have a full list of the 879 835 APIs, since there does not appear to be a way to categorize them 880 836 in docbook. Here is the list, by category. 881 837 882 - RCU list traversal: 838 + RCU list traversal:: 883 839 884 840 list_entry_rcu 841 + list_entry_lockless 885 842 list_first_entry_rcu 886 843 list_next_rcu 887 844 list_for_each_entry_rcu 888 845 list_for_each_entry_continue_rcu 889 846 list_for_each_entry_from_rcu 847 + list_first_or_null_rcu 848 + list_next_or_null_rcu 890 849 hlist_first_rcu 891 850 hlist_next_rcu 892 851 hlist_pprev_rcu ··· 905 854 hlist_bl_first_rcu 906 855 hlist_bl_for_each_entry_rcu 907 856 908 - RCU pointer/list update: 857 + RCU pointer/list update:: 909 858 910 859 rcu_assign_pointer 911 860 list_add_rcu ··· 915 864 hlist_add_behind_rcu 916 865 hlist_add_before_rcu 917 866 hlist_add_head_rcu 867 + hlist_add_tail_rcu 918 868 hlist_del_rcu 919 869 hlist_del_init_rcu 920 870 hlist_replace_rcu 921 - list_splice_init_rcu() 871 + list_splice_init_rcu 872 + list_splice_tail_init_rcu 922 873 hlist_nulls_del_init_rcu 923 874 hlist_nulls_del_rcu 924 875 hlist_nulls_add_head_rcu ··· 929 876 hlist_bl_del_rcu 930 877 hlist_bl_set_first_rcu 931 878 932 - RCU: Critical sections Grace period Barrier 879 + RCU:: 880 + 881 + Critical sections Grace period Barrier 933 882 934 883 rcu_read_lock synchronize_net rcu_barrier 935 884 rcu_read_unlock synchronize_rcu ··· 940 885 rcu_dereference_check kfree_rcu 941 886 rcu_dereference_protected 942 887 943 - bh: Critical sections Grace period Barrier 888 + bh:: 889 + 890 + Critical sections Grace period Barrier 944 891 945 892 rcu_read_lock_bh call_rcu rcu_barrier 946 893 rcu_read_unlock_bh synchronize_rcu ··· 953 896 rcu_dereference_bh_protected 954 897 rcu_read_lock_bh_held 955 898 956 - sched: Critical sections Grace 
period Barrier 899 + sched:: 900 + 901 + Critical sections Grace period Barrier 957 902 958 903 rcu_read_lock_sched call_rcu rcu_barrier 959 904 rcu_read_unlock_sched synchronize_rcu ··· 969 910 rcu_read_lock_sched_held 970 911 971 912 972 - SRCU: Critical sections Grace period Barrier 913 + SRCU:: 914 + 915 + Critical sections Grace period Barrier 973 916 974 917 srcu_read_lock call_srcu srcu_barrier 975 918 srcu_read_unlock synchronize_srcu ··· 979 918 srcu_dereference_check 980 919 srcu_read_lock_held 981 920 982 - SRCU: Initialization/cleanup 921 + SRCU: Initialization/cleanup:: 922 + 983 923 DEFINE_SRCU 984 924 DEFINE_STATIC_SRCU 985 925 init_srcu_struct 986 926 cleanup_srcu_struct 987 927 988 - All: lockdep-checked RCU-protected pointer access 928 + All: lockdep-checked RCU-protected pointer access:: 989 929 990 930 rcu_access_pointer 991 931 rcu_dereference_raw ··· 1036 974 Of course, this all assumes that you have determined that RCU is in fact 1037 975 the right tool for your job. 1038 976 977 + .. _8_whatisRCU: 1039 978 1040 979 8. ANSWERS TO QUICK QUIZZES 980 + ---------------------------- 1041 981 1042 - Quick Quiz #1: Why is this argument naive? How could a deadlock 982 + Quick Quiz #1: 983 + Why is this argument naive? How could a deadlock 1043 984 occur when using this algorithm in a real-world Linux 1044 985 kernel? [Referring to the lock-based "toy" RCU 1045 986 algorithm.] 1046 987 1047 - Answer: Consider the following sequence of events: 988 + Answer: 989 + Consider the following sequence of events: 1048 990 1049 991 1. CPU 0 acquires some unrelated lock, call it 1050 992 "problematic_lock", disabling irq via ··· 1087 1021 approach where tasks in RCU read-side critical sections 1088 1022 cannot be blocked by tasks executing synchronize_rcu(). 1089 1023 1090 - Quick Quiz #2: Give an example where Classic RCU's read-side 1091 - overhead is -negative-. 
1024 + :ref:`Back to Quick Quiz #1 <quiz_1>` 1092 1025 1093 - Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT 1026 + Quick Quiz #2: 1027 + Give an example where Classic RCU's read-side 1028 + overhead is **negative**. 1029 + 1030 + Answer: 1031 + Imagine a single-CPU system with a non-CONFIG_PREEMPT 1094 1032 kernel where a routing table is used by process-context 1095 1033 code, but can be updated by irq-context code (for example, 1096 1034 by an "ICMP REDIRECT" packet). The usual way of handling ··· 1116 1046 even the theoretical possibility of negative overhead for 1117 1047 a synchronization primitive is a bit unexpected. ;-) 1118 1048 1119 - Quick Quiz #3: If it is illegal to block in an RCU read-side 1049 + :ref:`Back to Quick Quiz #2 <quiz_2>` 1050 + 1051 + Quick Quiz #3: 1052 + If it is illegal to block in an RCU read-side 1120 1053 critical section, what the heck do you do in 1121 1054 PREEMPT_RT, where normal spinlocks can block??? 1122 1055 1123 - Answer: Just as PREEMPT_RT permits preemption of spinlock 1056 + Answer: 1057 + Just as PREEMPT_RT permits preemption of spinlock 1124 1058 critical sections, it permits preemption of RCU 1125 1059 read-side critical sections. It also permits 1126 1060 spinlocks blocking while in RCU read-side critical ··· 1143 1069 Besides, how does the computer know what pizza parlor 1144 1070 the human being went to??? 1145 1071 1072 + :ref:`Back to Quick Quiz #3 <quiz_3>` 1146 1073 1147 1074 ACKNOWLEDGEMENTS 1148 1075
+13
Documentation/admin-guide/kernel-parameters.txt
··· 4001 4001 test until boot completes in order to avoid 4002 4002 interference. 4003 4003 4004 + rcuperf.kfree_rcu_test= [KNL] 4005 + Set to measure performance of kfree_rcu() flooding. 4006 + 4007 + rcuperf.kfree_nthreads= [KNL] 4008 + The number of threads running loops of kfree_rcu(). 4009 + 4010 + rcuperf.kfree_alloc_num= [KNL] 4011 + Number of allocations and frees done in an iteration. 4012 + 4013 + rcuperf.kfree_loops= [KNL] 4014 + Number of loops doing rcuperf.kfree_alloc_num number 4015 + of allocations and frees. 4016 + 4004 4017 rcuperf.nreaders= [KNL] 4005 4018 Set number of RCU readers. The value -1 selects 4006 4019 N, where N is the number of CPUs. A value
-2
arch/powerpc/include/asm/barrier.h
··· 18 18 * mb() prevents loads and stores being reordered across this point. 19 19 * rmb() prevents loads being reordered across this point. 20 20 * wmb() prevents stores being reordered across this point. 21 - * read_barrier_depends() prevents data-dependent loads being reordered 22 - * across this point (nop on PPC). 23 21 * 24 22 * *mb() variants without smp_ prefix must order all types of memory 25 23 * operations with one another. sync is the only instruction sufficient
+2 -2
drivers/net/wireless/mediatek/mt76/agg-rx.c
··· 281 281 { 282 282 struct mt76_rx_tid *tid = NULL; 283 283 284 - rcu_swap_protected(wcid->aggr[tidno], tid, 285 - lockdep_is_held(&dev->mutex)); 284 + tid = rcu_replace_pointer(wcid->aggr[tidno], tid, 285 + lockdep_is_held(&dev->mutex)); 286 286 if (tid) { 287 287 mt76_rx_aggr_shutdown(dev, tid); 288 288 kfree_rcu(tid, rcu_head);
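rcu_replace_pointer(), which this hunk substitutes for the removed rcu_swap_protected(), assigns a new value to an RCU-protected pointer and hands back the old one. A rough userspace model, with plain pointers standing in for the __rcu-annotated kind and the lockdep condition argument elided; it uses a GNU statement expression, as the kernel macro does, and the _sketch names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of rcu_replace_pointer() semantics: capture the old value,
 * store the new one, and evaluate to the old value. */
#define rcu_replace_pointer_sketch(rcu_ptr, ptr)	\
({							\
	__typeof__(ptr) __old = (rcu_ptr);		\
	(rcu_ptr) = (ptr);				\
	__old;						\
})

/* Illustrative shared slot plus two values to move between. */
static int slot_a = 1, slot_b = 2;
static int *shared_slot = &slot_a;
```

Returning the old pointer directly (rather than swapping through an out-parameter) is what lets callers like the mt76 hunk above write `tid = rcu_replace_pointer(...)` and then free the displaced object.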
+112 -24
include/linux/list.h
··· 23 23 #define LIST_HEAD(name) \ 24 24 struct list_head name = LIST_HEAD_INIT(name) 25 25 26 + /** 27 + * INIT_LIST_HEAD - Initialize a list_head structure 28 + * @list: list_head structure to be initialized. 29 + * 30 + * Initializes the list_head to point to itself. If it is a list header, 31 + * the result is an empty list. 32 + */ 26 33 static inline void INIT_LIST_HEAD(struct list_head *list) 27 34 { 28 35 WRITE_ONCE(list->next, list); ··· 127 120 entry->prev = NULL; 128 121 } 129 122 130 - /** 131 - * list_del - deletes entry from list. 132 - * @entry: the element to delete from the list. 133 - * Note: list_empty() on entry does not return true after this, the entry is 134 - * in an undefined state. 135 - */ 136 123 static inline void __list_del_entry(struct list_head *entry) 137 124 { 138 125 if (!__list_del_entry_valid(entry)) ··· 135 134 __list_del(entry->prev, entry->next); 136 135 } 137 136 137 + /** 138 + * list_del - deletes entry from list. 139 + * @entry: the element to delete from the list. 140 + * Note: list_empty() on entry does not return true after this, the entry is 141 + * in an undefined state. 142 + */ 138 143 static inline void list_del(struct list_head *entry) 139 144 { 140 145 __list_del_entry(entry); ··· 164 157 new->prev->next = new; 165 158 } 166 159 160 + /** 161 + * list_replace_init - replace old entry by new one and initialize the old one 162 + * @old : the element to be replaced 163 + * @new : the new element to insert 164 + * 165 + * If @old was empty, it will be overwritten. 166 + */ 167 167 static inline void list_replace_init(struct list_head *old, 168 - struct list_head *new) 168 + struct list_head *new) 169 169 { 170 170 list_replace(old, new); 171 171 INIT_LIST_HEAD(old); ··· 768 754 h->pprev = NULL; 769 755 } 770 756 757 + /** 758 + * hlist_unhashed - Has node been removed from list and reinitialized? 
759 + * @h: Node to be checked 760 + * 761 + * Note that not all removal functions will leave a node in unhashed 762 + * state. For example, hlist_nulls_del_init_rcu() does leave the 763 + * node in unhashed state, but hlist_nulls_del() does not. 764 + */ 771 765 static inline int hlist_unhashed(const struct hlist_node *h) 772 766 { 773 767 return !h->pprev; 774 768 } 775 769 770 + /** 771 + * hlist_unhashed_lockless - Version of hlist_unhashed for lockless use 772 + * @h: Node to be checked 773 + * 774 + * This variant of hlist_unhashed() must be used in lockless contexts 775 + * to avoid potential load-tearing. The READ_ONCE() is paired with the 776 + * various WRITE_ONCE() in hlist helpers that are defined below. 777 + */ 778 + static inline int hlist_unhashed_lockless(const struct hlist_node *h) 779 + { 780 + return !READ_ONCE(h->pprev); 781 + } 782 + 783 + /** 784 + * hlist_empty - Is the specified hlist_head structure an empty hlist? 785 + * @h: Structure to check. 786 + */ 776 787 static inline int hlist_empty(const struct hlist_head *h) 777 788 { 778 789 return !READ_ONCE(h->first); ··· 810 771 811 772 WRITE_ONCE(*pprev, next); 812 773 if (next) 813 - next->pprev = pprev; 774 + WRITE_ONCE(next->pprev, pprev); 814 775 } 815 776 777 + /** 778 + * hlist_del - Delete the specified hlist_node from its list 779 + * @n: Node to delete. 780 + * 781 + * Note that this function leaves the node in hashed state. Use 782 + * hlist_del_init() or similar instead to unhash @n. 783 + */ 816 784 static inline void hlist_del(struct hlist_node *n) 817 785 { 818 786 __hlist_del(n); ··· 827 781 n->pprev = LIST_POISON2; 828 782 } 829 783 784 + /** 785 + * hlist_del_init - Delete the specified hlist_node from its list and initialize 786 + * @n: Node to delete. 787 + * 788 + * Note that this function leaves the node in unhashed state. 
789 + */ 830 790 static inline void hlist_del_init(struct hlist_node *n) 831 791 { 832 792 if (!hlist_unhashed(n)) { ··· 841 789 } 842 790 } 843 791 792 + /** 793 + * hlist_add_head - add a new entry at the beginning of the hlist 794 + * @n: new entry to be added 795 + * @h: hlist head to add it after 796 + * 797 + * Insert a new entry after the specified head. 798 + * This is good for implementing stacks. 799 + */ 844 800 static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h) 845 801 { 846 802 struct hlist_node *first = h->first; 847 - n->next = first; 803 + WRITE_ONCE(n->next, first); 848 804 if (first) 849 - first->pprev = &n->next; 805 + WRITE_ONCE(first->pprev, &n->next); 850 806 WRITE_ONCE(h->first, n); 851 - n->pprev = &h->first; 807 + WRITE_ONCE(n->pprev, &h->first); 852 808 } 853 809 854 - /* next must be != NULL */ 810 + /** 811 + * hlist_add_before - add a new entry before the one specified 812 + * @n: new entry to be added 813 + * @next: hlist node to add it before, which must be non-NULL 814 + */ 855 815 static inline void hlist_add_before(struct hlist_node *n, 856 - struct hlist_node *next) 816 + struct hlist_node *next) 857 817 { 858 - n->pprev = next->pprev; 859 - n->next = next; 860 - next->pprev = &n->next; 818 + WRITE_ONCE(n->pprev, next->pprev); 819 + WRITE_ONCE(n->next, next); 820 + WRITE_ONCE(next->pprev, &n->next); 861 821 WRITE_ONCE(*(n->pprev), n); 862 822 } 863 823 824 + /** 825 + * hlist_add_behind - add a new entry after the one specified 826 + * @n: new entry to be added 827 + * @prev: hlist node to add it after, which must be non-NULL 828 + */ 864 829 static inline void hlist_add_behind(struct hlist_node *n, 865 830 struct hlist_node *prev) 866 831 { 867 - n->next = prev->next; 868 - prev->next = n; 869 - n->pprev = &prev->next; 832 + WRITE_ONCE(n->next, prev->next); 833 + WRITE_ONCE(prev->next, n); 834 + WRITE_ONCE(n->pprev, &prev->next); 870 835 871 836 if (n->next) 872 837 + 
WRITE_ONCE(n->next->pprev, &n->next); 873 838 } 874 839 875 - /* after that we'll appear to be on some hlist and hlist_del will work */ 840 + /** 841 + * hlist_add_fake - create a fake hlist consisting of a single headless node 842 + * @n: Node to make a fake list out of 843 + * 844 + * This makes @n appear to be its own predecessor on a headless hlist. 845 + * The point of this is to allow things like hlist_del() to work correctly 846 + * in cases where there is no list. 847 + */ 876 848 static inline void hlist_add_fake(struct hlist_node *n) 877 849 { 878 850 n->pprev = &n->next; 879 851 } 880 852 853 + /** 854 + * hlist_fake: Is this node a fake hlist? 855 + * @h: Node to check for being a self-referential fake hlist. 856 + */ 881 857 static inline bool hlist_fake(struct hlist_node *h) 882 858 { 883 859 return h->pprev == &h->next; 884 860 } 885 861 886 - /* 862 + /** 863 + * hlist_is_singular_node - is node the only element of the specified hlist? 864 + * @n: Node to check for singularity. 865 + * @h: Header for potentially singular list. 866 + * 887 867 * Check whether the node is the only node of the head without 888 - * accessing head: 868 + * accessing head, thus avoiding unnecessary cache misses. 889 869 */ 890 870 static inline bool 891 871 hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h) ··· 925 841 return !n->next && n->pprev == &h->first; 926 842 } 927 843 928 - /* 844 + /** 845 + * hlist_move_list - Move an hlist 846 + * @old: hlist_head for old list. 847 + * @new: hlist_head for new list. 848 + * 929 849 * Move a list from one list head to another. Fixup the pprev 930 850 * reference of the first entry if it exists. 931 851 */
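The pprev layout these new kernel-doc comments describe can be seen in a minimal userspace re-creation: @pprev points at the previous node's ->next slot (or at the head's ->first), so deletion never needs to know whether its predecessor is a node or the list head. Names carry a _sketch suffix to mark this as an illustration rather than the kernel code, and the WRITE_ONCE()/READ_ONCE() annotations are dropped since the sketch is single-threaded:

```c
#include <assert.h>
#include <stddef.h>

struct hnode {
	struct hnode *next;
	struct hnode **pprev;	/* points at predecessor's ->next slot */
};

struct hhead {
	struct hnode *first;
};

/* Mirror of hlist_unhashed(): NULL pprev means "not on any list". */
static int hnode_unhashed(const struct hnode *n)
{
	return !n->pprev;
}

/* Mirror of hlist_add_head(): push @n at the front of @h. */
static void hlist_add_head_sketch(struct hnode *n, struct hhead *h)
{
	struct hnode *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Mirror of hlist_del_init(): unlink via pprev, leave @n unhashed. */
static void hlist_del_init_sketch(struct hnode *n)
{
	if (!hnode_unhashed(n)) {
		*n->pprev = n->next;	/* works for head or interior node */
		if (n->next)
			n->next->pprev = n->pprev;
		n->next = NULL;
		n->pprev = NULL;
	}
}
```

Note how deletion updates `*n->pprev` without any branch on "am I first?", which is exactly the property the pprev design buys.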
+26 -4
include/linux/list_nulls.h
··· 56 56 return ((unsigned long)ptr) >> 1; 57 57 } 58 58 59 + /** 60 + * hlist_nulls_unhashed - Has node been removed and reinitialized? 61 + * @h: Node to be checked 62 + * 63 + * Note that not all removal functions will leave a node in unhashed state. 64 + * For example, hlist_del_init_rcu() leaves the node in unhashed state, 65 + * but hlist_nulls_del() does not. 66 + */ 59 67 static inline int hlist_nulls_unhashed(const struct hlist_nulls_node *h) 60 68 { 61 69 return !h->pprev; 70 + } 71 + 72 + /** 73 + * hlist_nulls_unhashed_lockless - Has node been removed and reinitialized? 74 + * @h: Node to be checked 75 + * 76 + * Note that not all removal functions will leave a node in unhashed state. 77 + * For example, hlist_del_init_rcu() leaves the node in unhashed state, 78 + * but hlist_nulls_del() does not. Unlike hlist_nulls_unhashed(), this 79 + * function may be used locklessly. 80 + */ 81 + static inline int hlist_nulls_unhashed_lockless(const struct hlist_nulls_node *h) 82 + { 83 + return !READ_ONCE(h->pprev); 62 84 } 63 85 64 86 static inline int hlist_nulls_empty(const struct hlist_nulls_head *h) ··· 94 72 struct hlist_nulls_node *first = h->first; 95 73 96 74 n->next = first; 97 - n->pprev = &h->first; 75 + WRITE_ONCE(n->pprev, &h->first); 98 76 h->first = n; 99 77 if (!is_a_nulls(first)) 100 - first->pprev = &n->next; 78 + WRITE_ONCE(first->pprev, &n->next); 101 79 } 102 80 103 81 static inline void __hlist_nulls_del(struct hlist_nulls_node *n) ··· 107 85 108 86 WRITE_ONCE(*pprev, next); 109 87 if (!is_a_nulls(next)) 110 - next->pprev = pprev; 88 + WRITE_ONCE(next->pprev, pprev); 111 89 } 112 90 113 91 static inline void hlist_nulls_del(struct hlist_nulls_node *n) 114 92 { 115 93 __hlist_nulls_del(n); 116 - n->pprev = LIST_POISON2; 94 + WRITE_ONCE(n->pprev, LIST_POISON2); 117 95 } 118 96 119 97 /**
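hlist_nulls lists end not in NULL but in an odd "nulls" marker encoding a list identifier, which is what the get_nulls_value() shown at the top of this hunk decodes. A hedged sketch of that encoding; make_nulls() is an illustrative constructor, not a kernel API, and the _sketch helpers mirror the kernel's is_a_nulls()/get_nulls_value():

```c
#include <assert.h>

/* Sketch: the terminating "pointer" is (value << 1) | 1, so the low
 * bit distinguishes a nulls marker from a real (aligned, even) node
 * address, and the remaining bits carry the list identifier. */
typedef struct nulls_node_sketch {
	struct nulls_node_sketch *next;
} nulls_node;

static nulls_node *make_nulls(unsigned long value)
{
	return (nulls_node *)((value << 1) | 1UL);
}

static int is_a_nulls_sketch(const nulls_node *ptr)
{
	return ((unsigned long)ptr & 1UL) != 0;
}

static unsigned long get_nulls_value_sketch(const nulls_node *ptr)
{
	return (unsigned long)ptr >> 1;
}
```

A lockless reader that ends a traversal on a nulls marker can check the recovered value against the bucket it started in; a mismatch means the node was moved to another chain mid-traversal and the lookup should be retried.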
-2
include/linux/rcu_segcblist.h
··· 22 22 struct rcu_head *head; 23 23 struct rcu_head **tail; 24 24 long len; 25 - long len_lazy; 26 25 }; 27 26 28 27 #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head } ··· 72 73 #else 73 74 long len; 74 75 #endif 75 - long len_lazy; 76 76 u8 enabled; 77 77 u8 offloaded; 78 78 };
+24 -14
include/linux/rculist.h
··· 40 40 */ 41 41 #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next))) 42 42 43 + /** 44 + * list_tail_rcu - returns the prev pointer of the head of the list 45 + * @head: the head of the list 46 + * 47 + * Note: This should only be used with the list header, and even then 48 + * only if list_del() and similar primitives are not also used on the 49 + * list header. 50 + */ 51 + #define list_tail_rcu(head) (*((struct list_head __rcu **)(&(head)->prev))) 52 + 43 53 /* 44 54 * Check during list traversal that we are within an RCU reader 45 55 */ ··· 183 173 { 184 174 if (!hlist_unhashed(n)) { 185 175 __hlist_del(n); 186 - n->pprev = NULL; 176 + WRITE_ONCE(n->pprev, NULL); 187 177 } 188 178 } 189 179 ··· 371 361 * @pos: the type * to use as a loop cursor. 372 362 * @head: the head for your list. 373 363 * @member: the name of the list_head within the struct. 374 - * @cond: optional lockdep expression if called from non-RCU protection. 364 + * @cond...: optional lockdep expression if called from non-RCU protection. 
375 365 * 376 366 * This list-traversal primitive may safely run concurrently with 377 367 * the _rcu list-mutation primitives such as list_add_rcu() ··· 483 473 static inline void hlist_del_rcu(struct hlist_node *n) 484 474 { 485 475 __hlist_del(n); 486 - n->pprev = LIST_POISON2; 476 + WRITE_ONCE(n->pprev, LIST_POISON2); 487 477 } 488 478 489 479 /** ··· 499 489 struct hlist_node *next = old->next; 500 490 501 491 new->next = next; 502 - new->pprev = old->pprev; 492 + WRITE_ONCE(new->pprev, old->pprev); 503 493 rcu_assign_pointer(*(struct hlist_node __rcu **)new->pprev, new); 504 494 if (next) 505 - new->next->pprev = &new->next; 506 - old->pprev = LIST_POISON2; 495 + WRITE_ONCE(new->next->pprev, &new->next); 496 + WRITE_ONCE(old->pprev, LIST_POISON2); 507 497 } 508 498 509 499 /* ··· 538 528 struct hlist_node *first = h->first; 539 529 540 530 n->next = first; 541 - n->pprev = &h->first; 531 + WRITE_ONCE(n->pprev, &h->first); 542 532 rcu_assign_pointer(hlist_first_rcu(h), n); 543 533 if (first) 544 - first->pprev = &n->next; 534 + WRITE_ONCE(first->pprev, &n->next); 545 535 } 546 536 547 537 /** ··· 574 564 575 565 if (last) { 576 566 n->next = last->next; 577 - n->pprev = &last->next; 567 + WRITE_ONCE(n->pprev, &last->next); 578 568 rcu_assign_pointer(hlist_next_rcu(last), n); 579 569 } else { 580 570 hlist_add_head_rcu(n, h); ··· 602 592 static inline void hlist_add_before_rcu(struct hlist_node *n, 603 593 struct hlist_node *next) 604 594 { 605 - n->pprev = next->pprev; 595 + WRITE_ONCE(n->pprev, next->pprev); 606 596 n->next = next; 607 597 rcu_assign_pointer(hlist_pprev_rcu(n), n); 608 - next->pprev = &n->next; 598 + WRITE_ONCE(next->pprev, &n->next); 609 599 } 610 600 611 601 /** ··· 630 620 struct hlist_node *prev) 631 621 { 632 622 n->next = prev->next; 633 - n->pprev = &prev->next; 623 + WRITE_ONCE(n->pprev, &prev->next); 634 624 rcu_assign_pointer(hlist_next_rcu(prev), n); 635 625 if (n->next) 636 - n->next->pprev = &n->next; 626 + 
WRITE_ONCE(n->next->pprev, &n->next); 637 627 } 638 628 639 629 #define __hlist_for_each_rcu(pos, head) \ ··· 646 636 * @pos: the type * to use as a loop cursor. 647 637 * @head: the head for your list. 648 638 * @member: the name of the hlist_node within the struct. 649 - * @cond: optional lockdep expression if called from non-RCU protection. 639 + * @cond...: optional lockdep expression if called from non-RCU protection. 650 640 * 651 641 * This list-traversal primitive may safely run concurrently with 652 642 * the _rcu list-mutation primitives such as hlist_add_head_rcu()
+14 -6
include/linux/rculist_nulls.h
··· 34 34 { 35 35 if (!hlist_nulls_unhashed(n)) { 36 36 __hlist_nulls_del(n); 37 - n->pprev = NULL; 37 + WRITE_ONCE(n->pprev, NULL); 38 38 } 39 39 } 40 40 41 + /** 42 + * hlist_nulls_first_rcu - returns the first element of the hash list. 43 + * @head: the head of the list. 44 + */ 41 45 #define hlist_nulls_first_rcu(head) \ 42 46 (*((struct hlist_nulls_node __rcu __force **)&(head)->first)) 43 47 48 + /** 49 + * hlist_nulls_next_rcu - returns the element of the list after @node. 50 + * @node: element of the list. 51 + */ 44 52 #define hlist_nulls_next_rcu(node) \ 45 53 (*((struct hlist_nulls_node __rcu __force **)&(node)->next)) 46 54 ··· 74 66 static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n) 75 67 { 76 68 __hlist_nulls_del(n); 77 - n->pprev = LIST_POISON2; 69 + WRITE_ONCE(n->pprev, LIST_POISON2); 78 70 } 79 71 80 72 /** ··· 102 94 struct hlist_nulls_node *first = h->first; 103 95 104 96 n->next = first; 105 - n->pprev = &h->first; 97 + WRITE_ONCE(n->pprev, &h->first); 106 98 rcu_assign_pointer(hlist_nulls_first_rcu(h), n); 107 99 if (!is_a_nulls(first)) 108 - first->pprev = &n->next; 100 + WRITE_ONCE(first->pprev, &n->next); 109 101 } 110 102 111 103 /** ··· 149 141 * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type 150 142 * @tpos: the type * to use as a loop cursor. 151 143 * @pos: the &struct hlist_nulls_node to use as a loop cursor. 152 - * @head: the head for your list. 144 + * @head: the head of the list. 153 145 * @member: the name of the hlist_nulls_node within the struct. 154 146 * 155 147 * The barrier() is needed to make sure compiler doesn't cache first element [1], ··· 169 161 * iterate over list of given type safe against removal of list entry 170 162 * @tpos: the type * to use as a loop cursor. 171 163 * @pos: the &struct hlist_nulls_node to use as a loop cursor. 172 - * @head: the head for your list. 164 + * @head: the head of the list. 173 165 * @member: the name of the hlist_nulls_node within the struct. 
174 166 */ 175 167 #define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \
+8 -20
include/linux/rcupdate.h
··· 154 154 * 155 155 * This macro resembles cond_resched(), except that it is defined to 156 156 * report potential quiescent states to RCU-tasks even if the cond_resched() 157 - * machinery were to be shut off, as some advocate for PREEMPT kernels. 157 + * machinery were to be shut off, as some advocate for PREEMPTION kernels. 158 158 */ 159 159 #define cond_resched_tasks_rcu_qs() \ 160 160 do { \ ··· 167 167 * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. 168 168 */ 169 169 170 - #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) 170 + #if defined(CONFIG_TREE_RCU) 171 171 #include <linux/rcutree.h> 172 172 #elif defined(CONFIG_TINY_RCU) 173 173 #include <linux/rcutiny.h> ··· 401 401 }) 402 402 403 403 /** 404 - * rcu_swap_protected() - swap an RCU and a regular pointer 405 - * @rcu_ptr: RCU pointer 406 - * @ptr: regular pointer 407 - * @c: the conditions under which the dereference will take place 408 - * 409 - * Perform swap(@rcu_ptr, @ptr) where @rcu_ptr is an RCU-annotated pointer and 410 - * @c is the argument that is passed to the rcu_dereference_protected() call 411 - * used to read that pointer. 412 - */ 413 - #define rcu_swap_protected(rcu_ptr, ptr, c) do { \ 414 - typeof(ptr) __tmp = rcu_dereference_protected((rcu_ptr), (c)); \ 415 - rcu_assign_pointer((rcu_ptr), (ptr)); \ 416 - (ptr) = __tmp; \ 417 - } while (0) 418 - 419 - /** 420 404 * rcu_access_pointer() - fetch RCU pointer with no dereferencing 421 405 * @p: The pointer to read 422 406 * ··· 582 598 * 583 599 * You can avoid reading and understanding the next paragraph by 584 600 * following this rule: don't put anything in an rcu_read_lock() RCU 585 - * read-side critical section that would block in a !PREEMPT kernel. 601 + * read-side critical section that would block in a !PREEMPTION kernel. 586 602 * But if you want the full story, read on! 
587 603 * 588 - * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), 604 + * In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU), 589 605 * it is illegal to block while in an RCU read-side critical section. 590 606 * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION 591 607 * kernel builds, RCU read-side critical sections may be preempted, ··· 895 911 WARN_ON_ONCE(func != (rcu_callback_t)~0L); 896 912 return false; 897 913 } 914 + 915 + /* kernel/ksysfs.c definitions */ 916 + extern int rcu_expedited; 917 + extern int rcu_normal; 898 918 899 919 #endif /* __LINUX_RCUPDATE_H */
+1
include/linux/rcutiny.h
··· 85 85 static inline void rcu_end_inkernel_boot(void) { } 86 86 static inline bool rcu_is_watching(void) { return true; } 87 87 static inline void rcu_momentary_dyntick_idle(void) { } 88 + static inline void kfree_rcu_scheduler_running(void) { } 88 89 89 90 /* Avoid RCU read-side critical sections leaking across. */ 90 91 static inline void rcu_all_qs(void) { barrier(); }
+1
include/linux/rcutree.h
··· 38 38 void rcu_barrier(void); 39 39 bool rcu_eqs_special_set(int cpu); 40 40 void rcu_momentary_dyntick_idle(void); 41 + void kfree_rcu_scheduler_running(void); 41 42 unsigned long get_state_synchronize_rcu(void); 42 43 void cond_synchronize_rcu(unsigned long oldstate); 43 44
+4 -1
include/linux/tick.h
··· 109 109 TICK_DEP_BIT_PERF_EVENTS = 1, 110 110 TICK_DEP_BIT_SCHED = 2, 111 111 TICK_DEP_BIT_CLOCK_UNSTABLE = 3, 112 - TICK_DEP_BIT_RCU = 4 112 + TICK_DEP_BIT_RCU = 4, 113 + TICK_DEP_BIT_RCU_EXP = 5 113 114 }; 115 + #define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU_EXP 114 116 115 117 #define TICK_DEP_MASK_NONE 0 116 118 #define TICK_DEP_MASK_POSIX_TIMER (1 << TICK_DEP_BIT_POSIX_TIMER) ··· 120 118 #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED) 121 119 #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE) 122 120 #define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU) 121 + #define TICK_DEP_MASK_RCU_EXP (1 << TICK_DEP_BIT_RCU_EXP) 123 122 124 123 #ifdef CONFIG_NO_HZ_COMMON 125 124 extern bool tick_nohz_enabled;
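The new TICK_DEP_BIT_RCU_EXP/TICK_DEP_MASK_RCU_EXP pair follows the file's existing one-hot scheme: each mask is simply `1 << bit`. Reproduced outside the kernel headers for illustration (the _SKETCH names are local to this example):

```c
#include <assert.h>

/* One-hot encoding of the tick-dependency bits shown above. */
enum tick_dep_bits_sketch {
	TICK_DEP_BIT_POSIX_TIMER_S	= 0,
	TICK_DEP_BIT_PERF_EVENTS_S	= 1,
	TICK_DEP_BIT_SCHED_S		= 2,
	TICK_DEP_BIT_CLOCK_UNSTABLE_S	= 3,
	TICK_DEP_BIT_RCU_S		= 4,
	TICK_DEP_BIT_RCU_EXP_S		= 5	/* new in this series */
};

#define TICK_DEP_MASK_RCU_SKETCH	(1 << TICK_DEP_BIT_RCU_S)
#define TICK_DEP_MASK_RCU_EXP_SKETCH	(1 << TICK_DEP_BIT_RCU_EXP_S)
```

Since the masks are disjoint single bits, independent subsystems can set and clear their own dependency without clobbering the others, and TICK_DEP_BIT_MAX lets generic code size per-bit bookkeeping arrays.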
+16 -24
include/trace/events/rcu.h
··· 41 41 TP_printk("%s", __entry->s) 42 42 ); 43 43 44 - #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) 44 + #if defined(CONFIG_TREE_RCU) 45 45 46 46 /* 47 47 * Tracepoint for grace-period events. Takes a string identifying the ··· 432 432 __entry->cpu, __entry->qsevent) 433 433 ); 434 434 435 - #endif /* #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) */ 435 + #endif /* #if defined(CONFIG_TREE_RCU) */ 436 436 437 437 /* 438 438 * Tracepoint for dyntick-idle entry/exit events. These take a string ··· 449 449 */ 450 450 TRACE_EVENT_RCU(rcu_dyntick, 451 451 452 - TP_PROTO(const char *polarity, long oldnesting, long newnesting, atomic_t dynticks), 452 + TP_PROTO(const char *polarity, long oldnesting, long newnesting, int dynticks), 453 453 454 454 TP_ARGS(polarity, oldnesting, newnesting, dynticks), 455 455 ··· 464 464 __entry->polarity = polarity; 465 465 __entry->oldnesting = oldnesting; 466 466 __entry->newnesting = newnesting; 467 - __entry->dynticks = atomic_read(&dynticks); 467 + __entry->dynticks = dynticks; 468 468 ), 469 469 470 470 TP_printk("%s %lx %lx %#3x", __entry->polarity, ··· 481 481 */ 482 482 TRACE_EVENT_RCU(rcu_callback, 483 483 484 - TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen_lazy, 485 - long qlen), 484 + TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen), 486 485 487 - TP_ARGS(rcuname, rhp, qlen_lazy, qlen), 486 + TP_ARGS(rcuname, rhp, qlen), 488 487 489 488 TP_STRUCT__entry( 490 489 __field(const char *, rcuname) 491 490 __field(void *, rhp) 492 491 __field(void *, func) 493 - __field(long, qlen_lazy) 494 492 __field(long, qlen) 495 493 ), 496 494 ··· 496 498 __entry->rcuname = rcuname; 497 499 __entry->rhp = rhp; 498 500 __entry->func = rhp->func; 499 - __entry->qlen_lazy = qlen_lazy; 500 501 __entry->qlen = qlen; 501 502 ), 502 503 503 - TP_printk("%s rhp=%p func=%ps %ld/%ld", 504 + TP_printk("%s rhp=%p func=%ps %ld", 504 505 __entry->rcuname, __entry->rhp, __entry->func, 505 - 
__entry->qlen_lazy, __entry->qlen) 506 + __entry->qlen) 506 507 ); 507 508 508 509 /* ··· 515 518 TRACE_EVENT_RCU(rcu_kfree_callback, 516 519 517 520 TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset, 518 - long qlen_lazy, long qlen), 521 + long qlen), 519 522 520 - TP_ARGS(rcuname, rhp, offset, qlen_lazy, qlen), 523 + TP_ARGS(rcuname, rhp, offset, qlen), 521 524 522 525 TP_STRUCT__entry( 523 526 __field(const char *, rcuname) 524 527 __field(void *, rhp) 525 528 __field(unsigned long, offset) 526 - __field(long, qlen_lazy) 527 529 __field(long, qlen) 528 530 ), 529 531 ··· 530 534 __entry->rcuname = rcuname; 531 535 __entry->rhp = rhp; 532 536 __entry->offset = offset; 533 - __entry->qlen_lazy = qlen_lazy; 534 537 __entry->qlen = qlen; 535 538 ), 536 539 537 - TP_printk("%s rhp=%p func=%ld %ld/%ld", 540 + TP_printk("%s rhp=%p func=%ld %ld", 538 541 __entry->rcuname, __entry->rhp, __entry->offset, 539 - __entry->qlen_lazy, __entry->qlen) 542 + __entry->qlen) 540 543 ); 541 544 542 545 /* ··· 547 552 */ 548 553 TRACE_EVENT_RCU(rcu_batch_start, 549 554 550 - TP_PROTO(const char *rcuname, long qlen_lazy, long qlen, long blimit), 555 + TP_PROTO(const char *rcuname, long qlen, long blimit), 551 556 552 - TP_ARGS(rcuname, qlen_lazy, qlen, blimit), 557 + TP_ARGS(rcuname, qlen, blimit), 553 558 554 559 TP_STRUCT__entry( 555 560 __field(const char *, rcuname) 556 - __field(long, qlen_lazy) 557 561 __field(long, qlen) 558 562 __field(long, blimit) 559 563 ), 560 564 561 565 TP_fast_assign( 562 566 __entry->rcuname = rcuname; 563 - __entry->qlen_lazy = qlen_lazy; 564 567 __entry->qlen = qlen; 565 568 __entry->blimit = blimit; 566 569 ), 567 570 568 - TP_printk("%s CBs=%ld/%ld bl=%ld", 569 - __entry->rcuname, __entry->qlen_lazy, __entry->qlen, 570 - __entry->blimit) 571 + TP_printk("%s CBs=%ld bl=%ld", 572 + __entry->rcuname, __entry->qlen, __entry->blimit) 571 573 ); 572 574 573 575 /*
+9 -8
kernel/rcu/Kconfig
··· 7 7 8 8 config TREE_RCU 9 9 bool 10 - default y if !PREEMPTION && SMP 10 + default y if SMP 11 11 help 12 12 This option selects the RCU implementation that is 13 13 designed for very large SMP system with hundreds or ··· 17 17 config PREEMPT_RCU 18 18 bool 19 19 default y if PREEMPTION 20 + select TREE_RCU 20 21 help 21 22 This option selects the RCU implementation that is 22 23 designed for very large SMP systems with hundreds or ··· 79 78 user-mode execution as quiescent states. 80 79 81 80 config RCU_STALL_COMMON 82 - def_bool ( TREE_RCU || PREEMPT_RCU ) 81 + def_bool TREE_RCU 83 82 help 84 83 This option enables RCU CPU stall code that is common between 85 84 the TINY and TREE variants of RCU. The purpose is to allow ··· 87 86 making these warnings mandatory for the tree variants. 88 87 89 88 config RCU_NEED_SEGCBLIST 90 - def_bool ( TREE_RCU || PREEMPT_RCU || TREE_SRCU ) 89 + def_bool ( TREE_RCU || TREE_SRCU ) 91 90 92 91 config RCU_FANOUT 93 92 int "Tree-based hierarchical RCU fanout value" 94 93 range 2 64 if 64BIT 95 94 range 2 32 if !64BIT 96 - depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT 95 + depends on TREE_RCU && RCU_EXPERT 97 96 default 64 if 64BIT 98 97 default 32 if !64BIT 99 98 help ··· 113 112 int "Tree-based hierarchical RCU leaf-level fanout value" 114 113 range 2 64 if 64BIT 115 114 range 2 32 if !64BIT 116 - depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT 115 + depends on TREE_RCU && RCU_EXPERT 117 116 default 16 118 117 help 119 118 This option controls the leaf-level fanout of hierarchical ··· 188 187 189 188 config RCU_NOCB_CPU 190 189 bool "Offload RCU callback processing from boot-selected CPUs" 191 - depends on TREE_RCU || PREEMPT_RCU 190 + depends on TREE_RCU 192 191 depends on RCU_EXPERT || NO_HZ_FULL 193 192 default n 194 193 help ··· 201 200 specified at boot time by the rcu_nocbs parameter. 
For each 202 201 such CPU, a kthread ("rcuox/N") will be created to invoke 203 202 callbacks, where the "N" is the CPU being offloaded, and where 204 - the "p" for RCU-preempt (PREEMPT kernels) and "s" for RCU-sched 205 - (!PREEMPT kernels). Nothing prevents this kthread from running 203 + the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched 204 + (!PREEMPTION kernels). Nothing prevents this kthread from running 206 205 on the specified CPUs, but (1) the kthreads may be preempted 207 206 between each callback, and (2) affinity or cgroups can be used 208 207 to force the kthreads to run on whatever set of CPUs is desired.
-1
kernel/rcu/Makefile
··· 9 9 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o 10 10 obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o 11 11 obj-$(CONFIG_TREE_RCU) += tree.o 12 - obj-$(CONFIG_PREEMPT_RCU) += tree.o 13 12 obj-$(CONFIG_TINY_RCU) += tiny.o 14 13 obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
+3 -30
kernel/rcu/rcu.h
··· 198 198 } 199 199 #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */ 200 200 201 - void kfree(const void *); 202 - 203 - /* 204 - * Reclaim the specified callback, either by invoking it (non-lazy case) 205 - * or freeing it directly (lazy case). Return true if lazy, false otherwise. 206 - */ 207 - static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head) 208 - { 209 - rcu_callback_t f; 210 - unsigned long offset = (unsigned long)head->func; 211 - 212 - rcu_lock_acquire(&rcu_callback_map); 213 - if (__is_kfree_rcu_offset(offset)) { 214 - trace_rcu_invoke_kfree_callback(rn, head, offset); 215 - kfree((void *)head - offset); 216 - rcu_lock_release(&rcu_callback_map); 217 - return true; 218 - } else { 219 - trace_rcu_invoke_callback(rn, head); 220 - f = head->func; 221 - WRITE_ONCE(head->func, (rcu_callback_t)0L); 222 - f(head); 223 - rcu_lock_release(&rcu_callback_map); 224 - return false; 225 - } 226 - } 227 - 228 201 #ifdef CONFIG_RCU_STALL_COMMON 229 202 230 203 extern int rcu_cpu_stall_ftrace_dump; ··· 254 281 */ 255 282 extern void resched_cpu(int cpu); 256 283 257 - #if defined(SRCU) || !defined(TINY_RCU) 284 + #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU) 258 285 259 286 #include <linux/rcu_node_tree.h> 260 287 ··· 391 418 #define raw_lockdep_assert_held_rcu_node(p) \ 392 419 lockdep_assert_held(&ACCESS_PRIVATE(p, lock)) 393 420 394 - #endif /* #if defined(SRCU) || !defined(TINY_RCU) */ 421 + #endif /* #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU) */ 395 422 396 423 #ifdef CONFIG_SRCU 397 424 void srcu_init(void); ··· 427 454 INVALID_RCU_FLAVOR 428 455 }; 429 456 430 - #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) 457 + #if defined(CONFIG_TREE_RCU) 431 458 void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, 432 459 unsigned long *gp_seq); 433 460 void do_trace_rcu_torture_read(const char *rcutorturename,
+3 -22
kernel/rcu/rcu_segcblist.c
··· 20 20 rclp->head = NULL; 21 21 rclp->tail = &rclp->head; 22 22 rclp->len = 0; 23 - rclp->len_lazy = 0; 24 23 } 25 24 26 25 /* 27 26 * Enqueue an rcu_head structure onto the specified callback list. 28 - * This function assumes that the callback is non-lazy because it 29 - * is intended for use by no-CBs CPUs, which do not distinguish 30 - * between lazy and non-lazy RCU callbacks. 31 27 */ 32 28 void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp) 33 29 { ··· 50 54 else 51 55 drclp->tail = &drclp->head; 52 56 drclp->len = srclp->len; 53 - drclp->len_lazy = srclp->len_lazy; 54 57 if (!rhp) { 55 58 rcu_cblist_init(srclp); 56 59 } else { ··· 57 62 srclp->head = rhp; 58 63 srclp->tail = &rhp->next; 59 64 WRITE_ONCE(srclp->len, 1); 60 - srclp->len_lazy = 0; 61 65 } 62 66 } 63 67 64 68 /* 65 69 * Dequeue the oldest rcu_head structure from the specified callback 66 - * list. This function assumes that the callback is non-lazy, but 67 - * the caller can later invoke rcu_cblist_dequeued_lazy() if it 68 - * finds otherwise (and if it cares about laziness). This allows 69 - * different users to have different ways of determining laziness. 70 + * list. 70 71 */ 71 72 struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp) 72 73 { ··· 152 161 for (i = 0; i < RCU_CBLIST_NSEGS; i++) 153 162 rsclp->tails[i] = &rsclp->head; 154 163 rcu_segcblist_set_len(rsclp, 0); 155 - rsclp->len_lazy = 0; 156 164 rsclp->enabled = 1; 157 165 } 158 166 ··· 163 173 { 164 174 WARN_ON_ONCE(!rcu_segcblist_empty(rsclp)); 165 175 WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp)); 166 - WARN_ON_ONCE(rcu_segcblist_n_lazy_cbs(rsclp)); 167 176 rsclp->enabled = 0; 168 177 } 169 178 ··· 242 253 * absolutely not OK for it to ever miss posting a callback. 
243 254 */ 244 255 void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, 245 - struct rcu_head *rhp, bool lazy) 256 + struct rcu_head *rhp) 246 257 { 247 258 rcu_segcblist_inc_len(rsclp); 248 - if (lazy) 249 - rsclp->len_lazy++; 250 259 smp_mb(); /* Ensure counts are updated before callback is enqueued. */ 251 260 rhp->next = NULL; 252 261 WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp); ··· 262 275 * period. You have been warned. 263 276 */ 264 277 bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, 265 - struct rcu_head *rhp, bool lazy) 278 + struct rcu_head *rhp) 266 279 { 267 280 int i; 268 281 269 282 if (rcu_segcblist_n_cbs(rsclp) == 0) 270 283 return false; 271 284 rcu_segcblist_inc_len(rsclp); 272 - if (lazy) 273 - rsclp->len_lazy++; 274 285 smp_mb(); /* Ensure counts are updated before callback is entrained. */ 275 286 rhp->next = NULL; 276 287 for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--) ··· 292 307 void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp, 293 308 struct rcu_cblist *rclp) 294 309 { 295 - rclp->len_lazy += rsclp->len_lazy; 296 - rsclp->len_lazy = 0; 297 310 rclp->len = rcu_segcblist_xchg_len(rsclp, 0); 298 311 } 299 312 ··· 344 361 void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp, 345 362 struct rcu_cblist *rclp) 346 363 { 347 - rsclp->len_lazy += rclp->len_lazy; 348 364 rcu_segcblist_add_len(rsclp, rclp->len); 349 - rclp->len_lazy = 0; 350 365 rclp->len = 0; 351 366 } 352 367
+2 -23
kernel/rcu/rcu_segcblist.h
··· 15 15 return READ_ONCE(rclp->len); 16 16 } 17 17 18 - /* 19 - * Account for the fact that a previously dequeued callback turned out 20 - * to be marked as lazy. 21 - */ 22 - static inline void rcu_cblist_dequeued_lazy(struct rcu_cblist *rclp) 23 - { 24 - rclp->len_lazy--; 25 - } 26 - 27 18 void rcu_cblist_init(struct rcu_cblist *rclp); 28 19 void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp); 29 20 void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp, ··· 48 57 #else 49 58 return READ_ONCE(rsclp->len); 50 59 #endif 51 - } 52 - 53 - /* Return number of lazy callbacks in segmented callback list. */ 54 - static inline long rcu_segcblist_n_lazy_cbs(struct rcu_segcblist *rsclp) 55 - { 56 - return rsclp->len_lazy; 57 - } 58 - 59 - /* Return number of lazy callbacks in segmented callback list. */ 60 - static inline long rcu_segcblist_n_nonlazy_cbs(struct rcu_segcblist *rsclp) 61 - { 62 - return rcu_segcblist_n_cbs(rsclp) - rsclp->len_lazy; 63 60 } 64 61 65 62 /* ··· 85 106 struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp); 86 107 bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp); 87 108 void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, 88 - struct rcu_head *rhp, bool lazy); 109 + struct rcu_head *rhp); 89 110 bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, 90 - struct rcu_head *rhp, bool lazy); 111 + struct rcu_head *rhp); 91 112 void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp, 92 113 struct rcu_cblist *rclp); 93 114 void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
+165 -8
kernel/rcu/rcuperf.c
··· 86 86 "Shutdown at end of performance tests."); 87 87 torture_param(int, verbose, 1, "Enable verbose debugging printk()s"); 88 88 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable"); 89 + torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?"); 89 90 90 91 static char *perf_type = "rcu"; 91 92 module_param(perf_type, charp, 0444); ··· 106 105 static wait_queue_head_t shutdown_wq; 107 106 static u64 t_rcu_perf_writer_started; 108 107 static u64 t_rcu_perf_writer_finished; 109 - static unsigned long b_rcu_perf_writer_started; 110 - static unsigned long b_rcu_perf_writer_finished; 108 + static unsigned long b_rcu_gp_test_started; 109 + static unsigned long b_rcu_gp_test_finished; 111 110 static DEFINE_PER_CPU(atomic_t, n_async_inflight); 112 111 113 112 #define MAX_MEAS 10000 ··· 379 378 if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) { 380 379 t_rcu_perf_writer_started = t; 381 380 if (gp_exp) { 382 - b_rcu_perf_writer_started = 381 + b_rcu_gp_test_started = 383 382 cur_ops->exp_completed() / 2; 384 383 } else { 385 - b_rcu_perf_writer_started = cur_ops->get_gp_seq(); 384 + b_rcu_gp_test_started = cur_ops->get_gp_seq(); 386 385 } 387 386 } 388 387 ··· 430 429 PERFOUT_STRING("Test complete"); 431 430 t_rcu_perf_writer_finished = t; 432 431 if (gp_exp) { 433 - b_rcu_perf_writer_finished = 432 + b_rcu_gp_test_finished = 434 433 cur_ops->exp_completed() / 2; 435 434 } else { 436 - b_rcu_perf_writer_finished = 435 + b_rcu_gp_test_finished = 437 436 cur_ops->get_gp_seq(); 438 437 } 439 438 if (shutdown) { ··· 516 515 t_rcu_perf_writer_finished - 517 516 t_rcu_perf_writer_started, 518 517 ngps, 519 - rcuperf_seq_diff(b_rcu_perf_writer_finished, 520 - b_rcu_perf_writer_started)); 518 + rcuperf_seq_diff(b_rcu_gp_test_finished, 519 + b_rcu_gp_test_started)); 521 520 for (i = 0; i < nrealwriters; i++) { 522 521 if (!writer_durations) 523 522 break; ··· 585 584 return -EINVAL; 586 585 } 587 586 587 + /* 
588 + * kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number 589 + * of iterations and measure total time and number of GP for all iterations to complete. 590 + */ 591 + 592 + torture_param(int, kfree_nthreads, -1, "Number of threads running loops of kfree_rcu()."); 593 + torture_param(int, kfree_alloc_num, 8000, "Number of allocations and frees done in an iteration."); 594 + torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num allocations and frees."); 595 + 596 + static struct task_struct **kfree_reader_tasks; 597 + static int kfree_nrealthreads; 598 + static atomic_t n_kfree_perf_thread_started; 599 + static atomic_t n_kfree_perf_thread_ended; 600 + 601 + struct kfree_obj { 602 + char kfree_obj[8]; 603 + struct rcu_head rh; 604 + }; 605 + 606 + static int 607 + kfree_perf_thread(void *arg) 608 + { 609 + int i, loop = 0; 610 + long me = (long)arg; 611 + struct kfree_obj *alloc_ptr; 612 + u64 start_time, end_time; 613 + 614 + VERBOSE_PERFOUT_STRING("kfree_perf_thread task started"); 615 + set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 616 + set_user_nice(current, MAX_NICE); 617 + 618 + start_time = ktime_get_mono_fast_ns(); 619 + 620 + if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) { 621 + if (gp_exp) 622 + b_rcu_gp_test_started = cur_ops->exp_completed() / 2; 623 + else 624 + b_rcu_gp_test_started = cur_ops->get_gp_seq(); 625 + } 626 + 627 + do { 628 + for (i = 0; i < kfree_alloc_num; i++) { 629 + alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL); 630 + if (!alloc_ptr) 631 + return -ENOMEM; 632 + 633 + kfree_rcu(alloc_ptr, rh); 634 + } 635 + 636 + cond_resched(); 637 + } while (!torture_must_stop() && ++loop < kfree_loops); 638 + 639 + if (atomic_inc_return(&n_kfree_perf_thread_ended) >= kfree_nrealthreads) { 640 + end_time = ktime_get_mono_fast_ns(); 641 + 642 + if (gp_exp) 643 + b_rcu_gp_test_finished = cur_ops->exp_completed() / 2; 644 + else 645 + 
b_rcu_gp_test_finished = cur_ops->get_gp_seq(); 646 + 647 + pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld\n", 648 + (unsigned long long)(end_time - start_time), kfree_loops, 649 + rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started)); 650 + if (shutdown) { 651 + smp_mb(); /* Assign before wake. */ 652 + wake_up(&shutdown_wq); 653 + } 654 + } 655 + 656 + torture_kthread_stopping("kfree_perf_thread"); 657 + return 0; 658 + } 659 + 660 + static void 661 + kfree_perf_cleanup(void) 662 + { 663 + int i; 664 + 665 + if (torture_cleanup_begin()) 666 + return; 667 + 668 + if (kfree_reader_tasks) { 669 + for (i = 0; i < kfree_nrealthreads; i++) 670 + torture_stop_kthread(kfree_perf_thread, 671 + kfree_reader_tasks[i]); 672 + kfree(kfree_reader_tasks); 673 + } 674 + 675 + torture_cleanup_end(); 676 + } 677 + 678 + /* 679 + * shutdown kthread. Just waits to be awakened, then shuts down system. 680 + */ 681 + static int 682 + kfree_perf_shutdown(void *arg) 683 + { 684 + do { 685 + wait_event(shutdown_wq, 686 + atomic_read(&n_kfree_perf_thread_ended) >= 687 + kfree_nrealthreads); 688 + } while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads); 689 + 690 + smp_mb(); /* Wake before output. */ 691 + 692 + kfree_perf_cleanup(); 693 + kernel_power_off(); 694 + return -EINVAL; 695 + } 696 + 697 + static int __init 698 + kfree_perf_init(void) 699 + { 700 + long i; 701 + int firsterr = 0; 702 + 703 + kfree_nrealthreads = compute_real(kfree_nthreads); 704 + /* Start up the kthreads. 
*/ 705 + if (shutdown) { 706 + init_waitqueue_head(&shutdown_wq); 707 + firsterr = torture_create_kthread(kfree_perf_shutdown, NULL, 708 + shutdown_task); 709 + if (firsterr) 710 + goto unwind; 711 + schedule_timeout_uninterruptible(1); 712 + } 713 + 714 + kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]), 715 + GFP_KERNEL); 716 + if (kfree_reader_tasks == NULL) { 717 + firsterr = -ENOMEM; 718 + goto unwind; 719 + } 720 + 721 + for (i = 0; i < kfree_nrealthreads; i++) { 722 + firsterr = torture_create_kthread(kfree_perf_thread, (void *)i, 723 + kfree_reader_tasks[i]); 724 + if (firsterr) 725 + goto unwind; 726 + } 727 + 728 + while (atomic_read(&n_kfree_perf_thread_started) < kfree_nrealthreads) 729 + schedule_timeout_uninterruptible(1); 730 + 731 + torture_init_end(); 732 + return 0; 733 + 734 + unwind: 735 + torture_init_end(); 736 + kfree_perf_cleanup(); 737 + return firsterr; 738 + } 739 + 588 740 static int __init 589 741 rcu_perf_init(void) 590 742 { ··· 769 615 } 770 616 if (cur_ops->init) 771 617 cur_ops->init(); 618 + 619 + if (kfree_rcu_test) 620 + return kfree_perf_init(); 772 621 773 622 nrealwriters = compute_real(nwriters); 774 623 nrealreaders = compute_real(nreaders);
+81 -60
kernel/rcu/rcutorture.c
··· 1661 1661 struct rcu_fwd_cb { 1662 1662 struct rcu_head rh; 1663 1663 struct rcu_fwd_cb *rfc_next; 1664 + struct rcu_fwd *rfc_rfp; 1664 1665 int rfc_gps; 1665 1666 }; 1666 - static DEFINE_SPINLOCK(rcu_fwd_lock); 1667 - static struct rcu_fwd_cb *rcu_fwd_cb_head; 1668 - static struct rcu_fwd_cb **rcu_fwd_cb_tail = &rcu_fwd_cb_head; 1669 - static long n_launders_cb; 1670 - static unsigned long rcu_fwd_startat; 1671 - static bool rcu_fwd_emergency_stop; 1667 + 1672 1668 #define MAX_FWD_CB_JIFFIES (8 * HZ) /* Maximum CB test duration. */ 1673 1669 #define MIN_FWD_CB_LAUNDERS 3 /* This many CB invocations to count. */ 1674 1670 #define MIN_FWD_CBS_LAUNDERED 100 /* Number of counted CBs. */ 1675 1671 #define FWD_CBS_HIST_DIV 10 /* Histogram buckets/second. */ 1672 + #define N_LAUNDERS_HIST (2 * MAX_FWD_CB_JIFFIES / (HZ / FWD_CBS_HIST_DIV)) 1673 + 1676 1674 struct rcu_launder_hist { 1677 1675 long n_launders; 1678 1676 unsigned long launder_gp_seq; 1679 1677 }; 1680 - #define N_LAUNDERS_HIST (2 * MAX_FWD_CB_JIFFIES / (HZ / FWD_CBS_HIST_DIV)) 1681 - static struct rcu_launder_hist n_launders_hist[N_LAUNDERS_HIST]; 1682 - static unsigned long rcu_launder_gp_seq_start; 1683 1678 1684 - static void rcu_torture_fwd_cb_hist(void) 1679 + struct rcu_fwd { 1680 + spinlock_t rcu_fwd_lock; 1681 + struct rcu_fwd_cb *rcu_fwd_cb_head; 1682 + struct rcu_fwd_cb **rcu_fwd_cb_tail; 1683 + long n_launders_cb; 1684 + unsigned long rcu_fwd_startat; 1685 + struct rcu_launder_hist n_launders_hist[N_LAUNDERS_HIST]; 1686 + unsigned long rcu_launder_gp_seq_start; 1687 + }; 1688 + 1689 + struct rcu_fwd *rcu_fwds; 1690 + bool rcu_fwd_emergency_stop; 1691 + 1692 + static void rcu_torture_fwd_cb_hist(struct rcu_fwd *rfp) 1685 1693 { 1686 1694 unsigned long gps; 1687 1695 unsigned long gps_old; 1688 1696 int i; 1689 1697 int j; 1690 1698 1691 - for (i = ARRAY_SIZE(n_launders_hist) - 1; i > 0; i--) 1692 - if (n_launders_hist[i].n_launders > 0) 1699 + for (i = ARRAY_SIZE(rfp->n_launders_hist) - 1; i > 
0; i--) 1700 + if (rfp->n_launders_hist[i].n_launders > 0) 1693 1701 break; 1694 1702 pr_alert("%s: Callback-invocation histogram (duration %lu jiffies):", 1695 - __func__, jiffies - rcu_fwd_startat); 1696 - gps_old = rcu_launder_gp_seq_start; 1703 + __func__, jiffies - rfp->rcu_fwd_startat); 1704 + gps_old = rfp->rcu_launder_gp_seq_start; 1697 1705 for (j = 0; j <= i; j++) { 1698 - gps = n_launders_hist[j].launder_gp_seq; 1706 + gps = rfp->n_launders_hist[j].launder_gp_seq; 1699 1707 pr_cont(" %ds/%d: %ld:%ld", 1700 - j + 1, FWD_CBS_HIST_DIV, n_launders_hist[j].n_launders, 1708 + j + 1, FWD_CBS_HIST_DIV, 1709 + rfp->n_launders_hist[j].n_launders, 1701 1710 rcutorture_seq_diff(gps, gps_old)); 1702 1711 gps_old = gps; 1703 1712 } ··· 1720 1711 int i; 1721 1712 struct rcu_fwd_cb *rfcp = container_of(rhp, struct rcu_fwd_cb, rh); 1722 1713 struct rcu_fwd_cb **rfcpp; 1714 + struct rcu_fwd *rfp = rfcp->rfc_rfp; 1723 1715 1724 1716 rfcp->rfc_next = NULL; 1725 1717 rfcp->rfc_gps++; 1726 - spin_lock_irqsave(&rcu_fwd_lock, flags); 1727 - rfcpp = rcu_fwd_cb_tail; 1728 - rcu_fwd_cb_tail = &rfcp->rfc_next; 1718 + spin_lock_irqsave(&rfp->rcu_fwd_lock, flags); 1719 + rfcpp = rfp->rcu_fwd_cb_tail; 1720 + rfp->rcu_fwd_cb_tail = &rfcp->rfc_next; 1729 1721 WRITE_ONCE(*rfcpp, rfcp); 1730 - WRITE_ONCE(n_launders_cb, n_launders_cb + 1); 1731 - i = ((jiffies - rcu_fwd_startat) / (HZ / FWD_CBS_HIST_DIV)); 1732 - if (i >= ARRAY_SIZE(n_launders_hist)) 1733 - i = ARRAY_SIZE(n_launders_hist) - 1; 1734 - n_launders_hist[i].n_launders++; 1735 - n_launders_hist[i].launder_gp_seq = cur_ops->get_gp_seq(); 1736 - spin_unlock_irqrestore(&rcu_fwd_lock, flags); 1722 + WRITE_ONCE(rfp->n_launders_cb, rfp->n_launders_cb + 1); 1723 + i = ((jiffies - rfp->rcu_fwd_startat) / (HZ / FWD_CBS_HIST_DIV)); 1724 + if (i >= ARRAY_SIZE(rfp->n_launders_hist)) 1725 + i = ARRAY_SIZE(rfp->n_launders_hist) - 1; 1726 + rfp->n_launders_hist[i].n_launders++; 1727 + rfp->n_launders_hist[i].launder_gp_seq = 
cur_ops->get_gp_seq(); 1728 + spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags); 1737 1729 } 1738 1730 1739 1731 // Give the scheduler a chance, even on nohz_full CPUs. 1740 1732 static void rcu_torture_fwd_prog_cond_resched(unsigned long iter) 1741 1733 { 1742 - if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) { 1734 + if (IS_ENABLED(CONFIG_PREEMPTION) && IS_ENABLED(CONFIG_NO_HZ_FULL)) { 1743 1735 // Real call_rcu() floods hit userspace, so emulate that. 1744 1736 if (need_resched() || (iter & 0xfff)) 1745 1737 schedule(); ··· 1754 1744 * Free all callbacks on the rcu_fwd_cb_head list, either because the 1755 1745 * test is over or because we hit an OOM event. 1756 1746 */ 1757 - static unsigned long rcu_torture_fwd_prog_cbfree(void) 1747 + static unsigned long rcu_torture_fwd_prog_cbfree(struct rcu_fwd *rfp) 1758 1748 { 1759 1749 unsigned long flags; 1760 1750 unsigned long freed = 0; 1761 1751 struct rcu_fwd_cb *rfcp; 1762 1752 1763 1753 for (;;) { 1764 - spin_lock_irqsave(&rcu_fwd_lock, flags); 1765 - rfcp = rcu_fwd_cb_head; 1754 + spin_lock_irqsave(&rfp->rcu_fwd_lock, flags); 1755 + rfcp = rfp->rcu_fwd_cb_head; 1766 1756 if (!rfcp) { 1767 - spin_unlock_irqrestore(&rcu_fwd_lock, flags); 1757 + spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags); 1768 1758 break; 1769 1759 } 1770 - rcu_fwd_cb_head = rfcp->rfc_next; 1771 - if (!rcu_fwd_cb_head) 1772 - rcu_fwd_cb_tail = &rcu_fwd_cb_head; 1773 - spin_unlock_irqrestore(&rcu_fwd_lock, flags); 1760 + rfp->rcu_fwd_cb_head = rfcp->rfc_next; 1761 + if (!rfp->rcu_fwd_cb_head) 1762 + rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head; 1763 + spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags); 1774 1764 kfree(rfcp); 1775 1765 freed++; 1776 1766 rcu_torture_fwd_prog_cond_resched(freed); ··· 1784 1774 } 1785 1775 1786 1776 /* Carry out need_resched()/cond_resched() forward-progress testing. 
*/ 1787 - static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries) 1777 + static void rcu_torture_fwd_prog_nr(struct rcu_fwd *rfp, 1778 + int *tested, int *tested_tries) 1788 1779 { 1789 1780 unsigned long cver; 1790 1781 unsigned long dur; ··· 1815 1804 sd = cur_ops->stall_dur() + 1; 1816 1805 sd4 = (sd + fwd_progress_div - 1) / fwd_progress_div; 1817 1806 dur = sd4 + torture_random(&trs) % (sd - sd4); 1818 - WRITE_ONCE(rcu_fwd_startat, jiffies); 1819 - stopat = rcu_fwd_startat + dur; 1807 + WRITE_ONCE(rfp->rcu_fwd_startat, jiffies); 1808 + stopat = rfp->rcu_fwd_startat + dur; 1820 1809 while (time_before(jiffies, stopat) && 1821 1810 !shutdown_time_arrived() && 1822 1811 !READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) { ··· 1851 1840 } 1852 1841 1853 1842 /* Carry out call_rcu() forward-progress testing. */ 1854 - static void rcu_torture_fwd_prog_cr(void) 1843 + static void rcu_torture_fwd_prog_cr(struct rcu_fwd *rfp) 1855 1844 { 1856 1845 unsigned long cver; 1857 1846 unsigned long flags; ··· 1875 1864 /* Loop continuously posting RCU callbacks. */ 1876 1865 WRITE_ONCE(rcu_fwd_cb_nodelay, true); 1877 1866 cur_ops->sync(); /* Later readers see above write. 
*/ 1878 - WRITE_ONCE(rcu_fwd_startat, jiffies); 1879 - stopat = rcu_fwd_startat + MAX_FWD_CB_JIFFIES; 1867 + WRITE_ONCE(rfp->rcu_fwd_startat, jiffies); 1868 + stopat = rfp->rcu_fwd_startat + MAX_FWD_CB_JIFFIES; 1880 1869 n_launders = 0; 1881 - n_launders_cb = 0; 1870 + rfp->n_launders_cb = 0; // Hoist initialization for multi-kthread 1882 1871 n_launders_sa = 0; 1883 1872 n_max_cbs = 0; 1884 1873 n_max_gps = 0; 1885 - for (i = 0; i < ARRAY_SIZE(n_launders_hist); i++) 1886 - n_launders_hist[i].n_launders = 0; 1874 + for (i = 0; i < ARRAY_SIZE(rfp->n_launders_hist); i++) 1875 + rfp->n_launders_hist[i].n_launders = 0; 1887 1876 cver = READ_ONCE(rcu_torture_current_version); 1888 1877 gps = cur_ops->get_gp_seq(); 1889 - rcu_launder_gp_seq_start = gps; 1878 + rfp->rcu_launder_gp_seq_start = gps; 1890 1879 tick_dep_set_task(current, TICK_DEP_BIT_RCU); 1891 1880 while (time_before(jiffies, stopat) && 1892 1881 !shutdown_time_arrived() && 1893 1882 !READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) { 1894 - rfcp = READ_ONCE(rcu_fwd_cb_head); 1883 + rfcp = READ_ONCE(rfp->rcu_fwd_cb_head); 1895 1884 rfcpn = NULL; 1896 1885 if (rfcp) 1897 1886 rfcpn = READ_ONCE(rfcp->rfc_next); ··· 1899 1888 if (rfcp->rfc_gps >= MIN_FWD_CB_LAUNDERS && 1900 1889 ++n_max_gps >= MIN_FWD_CBS_LAUNDERED) 1901 1890 break; 1902 - rcu_fwd_cb_head = rfcpn; 1891 + rfp->rcu_fwd_cb_head = rfcpn; 1903 1892 n_launders++; 1904 1893 n_launders_sa++; 1905 1894 } else { ··· 1911 1900 n_max_cbs++; 1912 1901 n_launders_sa = 0; 1913 1902 rfcp->rfc_gps = 0; 1903 + rfcp->rfc_rfp = rfp; 1914 1904 } 1915 1905 cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr); 1916 1906 rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs); ··· 1922 1910 } 1923 1911 } 1924 1912 stoppedat = jiffies; 1925 - n_launders_cb_snap = READ_ONCE(n_launders_cb); 1913 + n_launders_cb_snap = READ_ONCE(rfp->n_launders_cb); 1926 1914 cver = READ_ONCE(rcu_torture_current_version) - cver; 1927 1915 gps = 
rcutorture_seq_diff(cur_ops->get_gp_seq(), gps); 1928 1916 cur_ops->cb_barrier(); /* Wait for callbacks to be invoked. */ 1929 - (void)rcu_torture_fwd_prog_cbfree(); 1917 + (void)rcu_torture_fwd_prog_cbfree(rfp); 1930 1918 1931 1919 if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop) && 1932 1920 !shutdown_time_arrived()) { 1933 1921 WARN_ON(n_max_gps < MIN_FWD_CBS_LAUNDERED); 1934 1922 pr_alert("%s Duration %lu barrier: %lu pending %ld n_launders: %ld n_launders_sa: %ld n_max_gps: %ld n_max_cbs: %ld cver %ld gps %ld\n", 1935 1923 __func__, 1936 - stoppedat - rcu_fwd_startat, jiffies - stoppedat, 1924 + stoppedat - rfp->rcu_fwd_startat, jiffies - stoppedat, 1937 1925 n_launders + n_max_cbs - n_launders_cb_snap, 1938 1926 n_launders, n_launders_sa, 1939 1927 n_max_gps, n_max_cbs, cver, gps); 1940 - rcu_torture_fwd_cb_hist(); 1928 + rcu_torture_fwd_cb_hist(rfp); 1941 1929 } 1942 1930 schedule_timeout_uninterruptible(HZ); /* Let CBs drain. */ 1943 1931 tick_dep_clear_task(current, TICK_DEP_BIT_RCU); ··· 1952 1940 static int rcutorture_oom_notify(struct notifier_block *self, 1953 1941 unsigned long notused, void *nfreed) 1954 1942 { 1943 + struct rcu_fwd *rfp = rcu_fwds; 1944 + 1955 1945 WARN(1, "%s invoked upon OOM during forward-progress testing.\n", 1956 1946 __func__); 1957 - rcu_torture_fwd_cb_hist(); 1958 - rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rcu_fwd_startat)) / 2); 1947 + rcu_torture_fwd_cb_hist(rfp); 1948 + rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rfp->rcu_fwd_startat)) / 2); 1959 1949 WRITE_ONCE(rcu_fwd_emergency_stop, true); 1960 1950 smp_mb(); /* Emergency stop before free and wait to avoid hangs. 
*/ 1961 1951 pr_info("%s: Freed %lu RCU callbacks.\n", 1962 - __func__, rcu_torture_fwd_prog_cbfree()); 1952 + __func__, rcu_torture_fwd_prog_cbfree(rfp)); 1963 1953 rcu_barrier(); 1964 1954 pr_info("%s: Freed %lu RCU callbacks.\n", 1965 - __func__, rcu_torture_fwd_prog_cbfree()); 1955 + __func__, rcu_torture_fwd_prog_cbfree(rfp)); 1966 1956 rcu_barrier(); 1967 1957 pr_info("%s: Freed %lu RCU callbacks.\n", 1968 - __func__, rcu_torture_fwd_prog_cbfree()); 1958 + __func__, rcu_torture_fwd_prog_cbfree(rfp)); 1969 1959 smp_mb(); /* Frees before return to avoid redoing OOM. */ 1970 1960 (*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */ 1971 1961 pr_info("%s returning after OOM processing.\n", __func__); ··· 1981 1967 /* Carry out grace-period forward-progress testing. */ 1982 1968 static int rcu_torture_fwd_prog(void *args) 1983 1969 { 1970 + struct rcu_fwd *rfp = args; 1984 1971 int tested = 0; 1985 1972 int tested_tries = 0; 1986 1973 ··· 1993 1978 schedule_timeout_interruptible(fwd_progress_holdoff * HZ); 1994 1979 WRITE_ONCE(rcu_fwd_emergency_stop, false); 1995 1980 register_oom_notifier(&rcutorture_oom_nb); 1996 - rcu_torture_fwd_prog_nr(&tested, &tested_tries); 1997 - rcu_torture_fwd_prog_cr(); 1981 + rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries); 1982 + rcu_torture_fwd_prog_cr(rfp); 1998 1983 unregister_oom_notifier(&rcutorture_oom_nb); 1999 1984 2000 1985 /* Avoid slow periods, better to test when busy. */ ··· 2010 1995 /* If forward-progress checking is requested and feasible, spawn the thread. */ 2011 1996 static int __init rcu_torture_fwd_prog_init(void) 2012 1997 { 1998 + struct rcu_fwd *rfp; 1999 + 2013 2000 if (!fwd_progress) 2014 2001 return 0; /* Not requested, so don't do it. 
*/ 2015 2002 if (!cur_ops->stall_dur || cur_ops->stall_dur() <= 0 || ··· 2030 2013 fwd_progress_holdoff = 1; 2031 2014 if (fwd_progress_div <= 0) 2032 2015 fwd_progress_div = 4; 2033 - return torture_create_kthread(rcu_torture_fwd_prog, 2034 - NULL, fwd_prog_task); 2016 + rfp = kzalloc(sizeof(*rfp), GFP_KERNEL); 2017 + if (!rfp) 2018 + return -ENOMEM; 2019 + spin_lock_init(&rfp->rcu_fwd_lock); 2020 + rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head; 2021 + return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task); 2035 2022 } 2036 2023 2037 2024 /* Callback function for RCU barrier testing. */
+1 -1
kernel/rcu/srcutiny.c
··· 103 103 104 104 /* 105 105 * Workqueue handler to drive one grace period and invoke any callbacks 106 - * that become ready as a result. Single-CPU and !PREEMPT operation 106 + * that become ready as a result. Single-CPU and !PREEMPTION operation 107 107 * means that we get away with murder on synchronization. ;-) 108 108 */ 109 109 void srcu_drive_gp(struct work_struct *wp)
+6 -5
kernel/rcu/srcutree.c
··· 530 530 idx = rcu_seq_state(ssp->srcu_gp_seq); 531 531 WARN_ON_ONCE(idx != SRCU_STATE_SCAN2); 532 532 cbdelay = srcu_get_delay(ssp); 533 - ssp->srcu_last_gp_end = ktime_get_mono_fast_ns(); 533 + WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns()); 534 534 rcu_seq_end(&ssp->srcu_gp_seq); 535 535 gpseq = rcu_seq_current(&ssp->srcu_gp_seq); 536 536 if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq)) ··· 762 762 unsigned long flags; 763 763 struct srcu_data *sdp; 764 764 unsigned long t; 765 + unsigned long tlast; 765 766 766 767 /* If the local srcu_data structure has callbacks, not idle. */ 767 768 local_irq_save(flags); ··· 781 780 782 781 /* First, see if enough time has passed since the last GP. */ 783 782 t = ktime_get_mono_fast_ns(); 783 + tlast = READ_ONCE(ssp->srcu_last_gp_end); 784 784 if (exp_holdoff == 0 || 785 - time_in_range_open(t, ssp->srcu_last_gp_end, 786 - ssp->srcu_last_gp_end + exp_holdoff)) 785 + time_in_range_open(t, tlast, tlast + exp_holdoff)) 787 786 return false; /* Too soon after last GP. */ 788 787 789 788 /* Next, check for probable idleness. */ ··· 854 853 local_irq_save(flags); 855 854 sdp = this_cpu_ptr(ssp->sda); 856 855 spin_lock_rcu_node(sdp); 857 - rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false); 856 + rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp); 858 857 rcu_segcblist_advance(&sdp->srcu_cblist, 859 858 rcu_seq_current(&ssp->srcu_gp_seq)); 860 859 s = rcu_seq_snap(&ssp->srcu_gp_seq); ··· 1053 1052 sdp->srcu_barrier_head.func = srcu_barrier_cb; 1054 1053 debug_rcu_head_queue(&sdp->srcu_barrier_head); 1055 1054 if (!rcu_segcblist_entrain(&sdp->srcu_cblist, 1056 - &sdp->srcu_barrier_head, 0)) { 1055 + &sdp->srcu_barrier_head)) { 1057 1056 debug_rcu_head_unqueue(&sdp->srcu_barrier_head); 1058 1057 atomic_dec(&ssp->srcu_barrier_cpu_cnt); 1059 1058 }
+27 -1
kernel/rcu/tiny.c
··· 22 22 #include <linux/time.h> 23 23 #include <linux/cpu.h> 24 24 #include <linux/prefetch.h> 25 + #include <linux/slab.h> 25 26 26 27 #include "rcu.h" 27 28 ··· 74 73 } 75 74 } 76 75 76 + /* 77 + * Reclaim the specified callback, either by invoking it for non-kfree cases or 78 + * freeing it directly (for kfree). Return true if kfreeing, false otherwise. 79 + */ 80 + static inline bool rcu_reclaim_tiny(struct rcu_head *head) 81 + { 82 + rcu_callback_t f; 83 + unsigned long offset = (unsigned long)head->func; 84 + 85 + rcu_lock_acquire(&rcu_callback_map); 86 + if (__is_kfree_rcu_offset(offset)) { 87 + trace_rcu_invoke_kfree_callback("", head, offset); 88 + kfree((void *)head - offset); 89 + rcu_lock_release(&rcu_callback_map); 90 + return true; 91 + } 92 + 93 + trace_rcu_invoke_callback("", head); 94 + f = head->func; 95 + WRITE_ONCE(head->func, (rcu_callback_t)0L); 96 + f(head); 97 + rcu_lock_release(&rcu_callback_map); 98 + return false; 99 + } 100 + 77 101 /* Invoke the RCU callbacks whose grace period has elapsed. */ 78 102 static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused) 79 103 { ··· 126 100 prefetch(next); 127 101 debug_rcu_head_unqueue(list); 128 102 local_bh_disable(); 129 - __rcu_reclaim("", list); 103 + rcu_reclaim_tiny(list); 130 104 local_bh_enable(); 131 105 list = next; 132 106 }
+272 -52
kernel/rcu/tree.c
··· 43 43 #include <uapi/linux/sched/types.h> 44 44 #include <linux/prefetch.h> 45 45 #include <linux/delay.h> 46 - #include <linux/stop_machine.h> 47 46 #include <linux/random.h> 48 47 #include <linux/trace_events.h> 49 48 #include <linux/suspend.h> ··· 54 55 #include <linux/oom.h> 55 56 #include <linux/smpboot.h> 56 57 #include <linux/jiffies.h> 58 + #include <linux/slab.h> 57 59 #include <linux/sched/isolation.h> 58 60 #include <linux/sched/clock.h> 59 61 #include "../time/tick-internal.h" ··· 84 84 .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE, 85 85 .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR), 86 86 }; 87 - struct rcu_state rcu_state = { 87 + static struct rcu_state rcu_state = { 88 88 .level = { &rcu_state.node[0] }, 89 89 .gp_state = RCU_GP_IDLE, 90 90 .gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, ··· 188 188 * held, but the bit corresponding to the current CPU will be stable 189 189 * in most contexts. 190 190 */ 191 - unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp) 191 + static unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp) 192 192 { 193 193 return READ_ONCE(rnp->qsmaskinitnext); 194 194 } ··· 294 294 * 295 295 * No ordering, as we are sampling CPU-local information. 296 296 */ 297 - bool rcu_dynticks_curr_cpu_in_eqs(void) 297 + static bool rcu_dynticks_curr_cpu_in_eqs(void) 298 298 { 299 299 struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 300 300 ··· 305 305 * Snapshot the ->dynticks counter with full ordering so as to allow 306 306 * stable comparison of this counter with past and future snapshots. 307 307 */ 308 - int rcu_dynticks_snap(struct rcu_data *rdp) 308 + static int rcu_dynticks_snap(struct rcu_data *rdp) 309 309 { 310 310 int snap = atomic_add_return(0, &rdp->dynticks); 311 311 ··· 529 529 } 530 530 531 531 /* 532 - * Convert a ->gp_state value to a character string. 
533 - */ 534 - static const char *gp_state_getname(short gs) 535 - { 536 - if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names)) 537 - return "???"; 538 - return gp_state_names[gs]; 539 - } 540 - 541 - /* 542 532 * Send along grace-period-related data for rcutorture diagnostics. 543 533 */ 544 534 void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags, ··· 567 577 } 568 578 569 579 lockdep_assert_irqs_disabled(); 570 - trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, rdp->dynticks); 580 + trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks)); 571 581 WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)); 572 582 rdp = this_cpu_ptr(&rcu_data); 573 583 do_nocb_deferred_wakeup(rdp); ··· 640 650 * leave it in non-RCU-idle state. 641 651 */ 642 652 if (rdp->dynticks_nmi_nesting != 1) { 643 - trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2, rdp->dynticks); 653 + trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2, 654 + atomic_read(&rdp->dynticks)); 644 655 WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */ 645 656 rdp->dynticks_nmi_nesting - 2); 646 657 return; 647 658 } 648 659 649 660 /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */ 650 - trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, rdp->dynticks); 661 + trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, atomic_read(&rdp->dynticks)); 651 662 WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. 
*/ 652 663 653 664 if (irq) ··· 735 744 rcu_dynticks_task_exit(); 736 745 rcu_dynticks_eqs_exit(); 737 746 rcu_cleanup_after_idle(); 738 - trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, rdp->dynticks); 747 + trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, atomic_read(&rdp->dynticks)); 739 748 WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)); 740 749 WRITE_ONCE(rdp->dynticks_nesting, 1); 741 750 WARN_ON_ONCE(rdp->dynticks_nmi_nesting); ··· 791 800 */ 792 801 static __always_inline void rcu_nmi_enter_common(bool irq) 793 802 { 794 - struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 795 803 long incby = 2; 804 + struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 796 805 797 806 /* Complain about underflow. */ 798 807 WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0); ··· 819 828 } else if (tick_nohz_full_cpu(rdp->cpu) && 820 829 rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE && 821 830 READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) { 822 - rdp->rcu_forced_tick = true; 823 - tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU); 831 + raw_spin_lock_rcu_node(rdp->mynode); 832 + // Recheck under lock. 833 + if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) { 834 + rdp->rcu_forced_tick = true; 835 + tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU); 836 + } 837 + raw_spin_unlock_rcu_node(rdp->mynode); 824 838 } 825 839 trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="), 826 840 rdp->dynticks_nmi_nesting, 827 - rdp->dynticks_nmi_nesting + incby, rdp->dynticks); 841 + rdp->dynticks_nmi_nesting + incby, atomic_read(&rdp->dynticks)); 828 842 WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. 
*/ 829 843 rdp->dynticks_nmi_nesting + incby); 830 844 barrier(); ··· 894 898 */ 895 899 static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp) 896 900 { 901 + raw_lockdep_assert_held_rcu_node(rdp->mynode); 897 902 WRITE_ONCE(rdp->rcu_urgent_qs, false); 898 903 WRITE_ONCE(rdp->rcu_need_heavy_qs, false); 899 904 if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) { ··· 1931 1934 struct rcu_node *rnp_p; 1932 1935 1933 1936 raw_lockdep_assert_held_rcu_node(rnp); 1934 - if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPTION)) || 1937 + if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RCU)) || 1935 1938 WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) || 1936 1939 rnp->qsmask != 0) { 1937 1940 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); ··· 2143 2146 /* If no callbacks are ready, just return. */ 2144 2147 if (!rcu_segcblist_ready_cbs(&rdp->cblist)) { 2145 2148 trace_rcu_batch_start(rcu_state.name, 2146 - rcu_segcblist_n_lazy_cbs(&rdp->cblist), 2147 2149 rcu_segcblist_n_cbs(&rdp->cblist), 0); 2148 2150 trace_rcu_batch_end(rcu_state.name, 0, 2149 2151 !rcu_segcblist_empty(&rdp->cblist), ··· 2164 2168 if (unlikely(bl > 100)) 2165 2169 tlimit = local_clock() + rcu_resched_ns; 2166 2170 trace_rcu_batch_start(rcu_state.name, 2167 - rcu_segcblist_n_lazy_cbs(&rdp->cblist), 2168 2171 rcu_segcblist_n_cbs(&rdp->cblist), bl); 2169 2172 rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl); 2170 2173 if (offloaded) ··· 2174 2179 tick_dep_set_task(current, TICK_DEP_BIT_RCU); 2175 2180 rhp = rcu_cblist_dequeue(&rcl); 2176 2181 for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) { 2182 + rcu_callback_t f; 2183 + 2177 2184 debug_rcu_head_unqueue(rhp); 2178 - if (__rcu_reclaim(rcu_state.name, rhp)) 2179 - rcu_cblist_dequeued_lazy(&rcl); 2185 + 2186 + rcu_lock_acquire(&rcu_callback_map); 2187 + trace_rcu_invoke_callback(rcu_state.name, rhp); 2188 + 2189 + f = rhp->func; 2190 + WRITE_ONCE(rhp->func, (rcu_callback_t)0L); 2191 + f(rhp); 2192 + 2193 + rcu_lock_release(&rcu_callback_map); 2194 + 
2180 2195 /* 2181 2196 * Stop only if limit reached and CPU has something to do. 2182 2197 * Note: The rcl structure counts down from zero. ··· 2299 2294 mask = 0; 2300 2295 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2301 2296 if (rnp->qsmask == 0) { 2302 - if (!IS_ENABLED(CONFIG_PREEMPTION) || 2297 + if (!IS_ENABLED(CONFIG_PREEMPT_RCU) || 2303 2298 rcu_preempt_blocked_readers_cgp(rnp)) { 2304 2299 /* 2305 2300 * No point in scanning bits because they ··· 2313 2308 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 2314 2309 continue; 2315 2310 } 2316 - for_each_leaf_node_possible_cpu(rnp, cpu) { 2317 - unsigned long bit = leaf_node_cpu_bit(rnp, cpu); 2318 - if ((rnp->qsmask & bit) != 0) { 2319 - rdp = per_cpu_ptr(&rcu_data, cpu); 2320 - if (f(rdp)) { 2321 - mask |= bit; 2322 - rcu_disable_urgency_upon_qs(rdp); 2323 - } 2311 + for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) { 2312 + rdp = per_cpu_ptr(&rcu_data, cpu); 2313 + if (f(rdp)) { 2314 + mask |= rdp->grpmask; 2315 + rcu_disable_urgency_upon_qs(rdp); 2324 2316 } 2325 2317 } 2326 2318 if (mask != 0) { ··· 2476 2474 char work, *workp = this_cpu_ptr(&rcu_data.rcu_cpu_has_work); 2477 2475 int spincnt; 2478 2476 2477 + trace_rcu_utilization(TPS("Start CPU kthread@rcu_run")); 2479 2478 for (spincnt = 0; spincnt < 10; spincnt++) { 2480 - trace_rcu_utilization(TPS("Start CPU kthread@rcu_wait")); 2481 2479 local_bh_disable(); 2482 2480 *statusp = RCU_KTHREAD_RUNNING; 2483 2481 local_irq_disable(); ··· 2585 2583 * is expected to specify a CPU. 
2586 2584 */ 2587 2585 static void 2588 - __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy) 2586 + __call_rcu(struct rcu_head *head, rcu_callback_t func) 2589 2587 { 2590 2588 unsigned long flags; 2591 2589 struct rcu_data *rdp; ··· 2620 2618 if (rcu_segcblist_empty(&rdp->cblist)) 2621 2619 rcu_segcblist_init(&rdp->cblist); 2622 2620 } 2621 + 2623 2622 if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags)) 2624 2623 return; // Enqueued onto ->nocb_bypass, so just leave. 2625 2624 /* If we get here, rcu_nocb_try_bypass() acquired ->nocb_lock. */ 2626 - rcu_segcblist_enqueue(&rdp->cblist, head, lazy); 2625 + rcu_segcblist_enqueue(&rdp->cblist, head); 2627 2626 if (__is_kfree_rcu_offset((unsigned long)func)) 2628 2627 trace_rcu_kfree_callback(rcu_state.name, head, 2629 2628 (unsigned long)func, 2630 - rcu_segcblist_n_lazy_cbs(&rdp->cblist), 2631 2629 rcu_segcblist_n_cbs(&rdp->cblist)); 2632 2630 else 2633 2631 trace_rcu_callback(rcu_state.name, head, 2634 - rcu_segcblist_n_lazy_cbs(&rdp->cblist), 2635 2632 rcu_segcblist_n_cbs(&rdp->cblist)); 2636 2633 2637 2634 /* Go handle any RCU core processing required. */ ··· 2680 2679 */ 2681 2680 void call_rcu(struct rcu_head *head, rcu_callback_t func) 2682 2681 { 2683 - __call_rcu(head, func, 0); 2682 + __call_rcu(head, func); 2684 2683 } 2685 2684 EXPORT_SYMBOL_GPL(call_rcu); 2686 2685 2686 + 2687 + /* Maximum number of jiffies to wait before draining a batch. 
*/ 2688 + #define KFREE_DRAIN_JIFFIES (HZ / 50) 2689 + #define KFREE_N_BATCHES 2 2690 + 2691 + /** 2692 + * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests 2693 + * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period 2694 + * @head_free: List of kfree_rcu() objects waiting for a grace period 2695 + * @krcp: Pointer to @kfree_rcu_cpu structure 2696 + */ 2697 + 2698 + struct kfree_rcu_cpu_work { 2699 + struct rcu_work rcu_work; 2700 + struct rcu_head *head_free; 2701 + struct kfree_rcu_cpu *krcp; 2702 + }; 2703 + 2704 + /** 2705 + * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period 2706 + * @head: List of kfree_rcu() objects not yet waiting for a grace period 2707 + * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period 2708 + * @lock: Synchronize access to this structure 2709 + * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES 2710 + * @monitor_todo: Tracks whether a @monitor_work delayed work is pending 2711 + * @initialized: The @lock and @rcu_work fields have been initialized 2712 + * 2713 + * This is a per-CPU structure. The reason that it is not included in 2714 + * the rcu_data structure is to permit this code to be extracted from 2715 + * the RCU files. Such extraction could allow further optimization of 2716 + * the interactions with the slab allocators. 2717 + */ 2718 + struct kfree_rcu_cpu { 2719 + struct rcu_head *head; 2720 + struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; 2721 + spinlock_t lock; 2722 + struct delayed_work monitor_work; 2723 + bool monitor_todo; 2724 + bool initialized; 2725 + }; 2726 + 2727 + static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc); 2728 + 2687 2729 /* 2688 - * Queue an RCU callback for lazy invocation after a grace period. 
2689 - * This will likely be later named something like "call_rcu_lazy()", 2690 - * but this change will require some way of tagging the lazy RCU 2691 - * callbacks in the list of pending callbacks. Until then, this 2692 - * function may only be called from __kfree_rcu(). 2730 + * This function is invoked in workqueue context after a grace period. 2731 + * It frees all the objects queued on ->head_free. 2732 + */ 2733 + static void kfree_rcu_work(struct work_struct *work) 2734 + { 2735 + unsigned long flags; 2736 + struct rcu_head *head, *next; 2737 + struct kfree_rcu_cpu *krcp; 2738 + struct kfree_rcu_cpu_work *krwp; 2739 + 2740 + krwp = container_of(to_rcu_work(work), 2741 + struct kfree_rcu_cpu_work, rcu_work); 2742 + krcp = krwp->krcp; 2743 + spin_lock_irqsave(&krcp->lock, flags); 2744 + head = krwp->head_free; 2745 + krwp->head_free = NULL; 2746 + spin_unlock_irqrestore(&krcp->lock, flags); 2747 + 2748 + // List "head" is now private, so traverse locklessly. 2749 + for (; head; head = next) { 2750 + unsigned long offset = (unsigned long)head->func; 2751 + 2752 + next = head->next; 2753 + // Potentially optimize with kfree_bulk in future. 2754 + debug_rcu_head_unqueue(head); 2755 + rcu_lock_acquire(&rcu_callback_map); 2756 + trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset); 2757 + 2758 + if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset))) { 2759 + /* Could be optimized with kfree_bulk() in future. */ 2760 + kfree((void *)head - offset); 2761 + } 2762 + 2763 + rcu_lock_release(&rcu_callback_map); 2764 + cond_resched_tasks_rcu_qs(); 2765 + } 2766 + } 2767 + 2768 + /* 2769 + * Schedule the kfree batch RCU work to run in workqueue context after a GP. 2770 + * 2771 + * This function is invoked by kfree_rcu_monitor() when the KFREE_DRAIN_JIFFIES 2772 + * timeout has been reached. 
2773 + */ 2774 + static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp) 2775 + { 2776 + int i; 2777 + struct kfree_rcu_cpu_work *krwp = NULL; 2778 + 2779 + lockdep_assert_held(&krcp->lock); 2780 + for (i = 0; i < KFREE_N_BATCHES; i++) 2781 + if (!krcp->krw_arr[i].head_free) { 2782 + krwp = &(krcp->krw_arr[i]); 2783 + break; 2784 + } 2785 + 2786 + // If a previous RCU batch is in progress, we cannot immediately 2787 + // queue another one, so return false to tell caller to retry. 2788 + if (!krwp) 2789 + return false; 2790 + 2791 + krwp->head_free = krcp->head; 2792 + krcp->head = NULL; 2793 + INIT_RCU_WORK(&krwp->rcu_work, kfree_rcu_work); 2794 + queue_rcu_work(system_wq, &krwp->rcu_work); 2795 + return true; 2796 + } 2797 + 2798 + static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp, 2799 + unsigned long flags) 2800 + { 2801 + // Attempt to start a new batch. 2802 + krcp->monitor_todo = false; 2803 + if (queue_kfree_rcu_work(krcp)) { 2804 + // Success! Our job is done here. 2805 + spin_unlock_irqrestore(&krcp->lock, flags); 2806 + return; 2807 + } 2808 + 2809 + // Previous RCU batch still in progress, try again later. 2810 + krcp->monitor_todo = true; 2811 + schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); 2812 + spin_unlock_irqrestore(&krcp->lock, flags); 2813 + } 2814 + 2815 + /* 2816 + * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. 2817 + * It invokes kfree_rcu_drain_unlock() to attempt to start another batch. 
2818 + */ 2819 + static void kfree_rcu_monitor(struct work_struct *work) 2820 + { 2821 + unsigned long flags; 2822 + struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu, 2823 + monitor_work.work); 2824 + 2825 + spin_lock_irqsave(&krcp->lock, flags); 2826 + if (krcp->monitor_todo) 2827 + kfree_rcu_drain_unlock(krcp, flags); 2828 + else 2829 + spin_unlock_irqrestore(&krcp->lock, flags); 2830 + } 2831 + 2832 + /* 2833 + * Queue a request for lazy invocation of kfree() after a grace period. 2834 + * 2835 + * Each kfree_call_rcu() request is added to a batch. The batch will be drained 2836 + * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch 2837 + * will be kfree'd in workqueue context. This allows us to: 2838 + * 2839 + * 1. Batch requests together to reduce the number of grace periods during 2840 + * heavy kfree_rcu() load. 2841 + * 2842 + * 2. It makes it possible to use kfree_bulk() on a large number of 2843 + * kfree_rcu() requests thus reducing cache misses and the per-object 2844 + * overhead of kfree(). 2693 2845 */ 2694 2846 void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func) 2695 2847 { 2696 - __call_rcu(head, func, 1); 2848 + unsigned long flags; 2849 + struct kfree_rcu_cpu *krcp; 2850 + 2851 + local_irq_save(flags); // For safely calling this_cpu_ptr(). 2852 + krcp = this_cpu_ptr(&krc); 2853 + if (krcp->initialized) 2854 + spin_lock(&krcp->lock); 2855 + 2856 + // Queue the object but don't yet schedule the batch. 2857 + if (debug_rcu_head_queue(head)) { 2858 + // Probable double kfree_rcu(), just leak. 2859 + WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n", 2860 + __func__, head); 2861 + goto unlock_return; 2862 + } 2863 + head->func = func; 2864 + head->next = krcp->head; 2865 + krcp->head = head; 2866 + 2867 + // Set timer to drain after KFREE_DRAIN_JIFFIES. 
2868 + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && 2869 + !krcp->monitor_todo) { 2870 + krcp->monitor_todo = true; 2871 + schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); 2872 + } 2873 + 2874 + unlock_return: 2875 + if (krcp->initialized) 2876 + spin_unlock(&krcp->lock); 2877 + local_irq_restore(flags); 2697 2878 } 2698 2879 EXPORT_SYMBOL_GPL(kfree_call_rcu); 2699 2880 2881 + void __init kfree_rcu_scheduler_running(void) 2882 + { 2883 + int cpu; 2884 + unsigned long flags; 2885 + 2886 + for_each_online_cpu(cpu) { 2887 + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 2888 + 2889 + spin_lock_irqsave(&krcp->lock, flags); 2890 + if (!krcp->head || krcp->monitor_todo) { 2891 + spin_unlock_irqrestore(&krcp->lock, flags); 2892 + continue; 2893 + } 2894 + krcp->monitor_todo = true; 2895 + schedule_delayed_work_on(cpu, &krcp->monitor_work, 2896 + KFREE_DRAIN_JIFFIES); 2897 + spin_unlock_irqrestore(&krcp->lock, flags); 2898 + } 2899 + } 2900 + 2700 2901 /* 2701 2902 * During early boot, any blocking grace-period wait automatically 2702 - * implies a grace period. Later on, this is never the case for PREEMPT. 2903 + * implies a grace period. Later on, this is never the case for PREEMPTION. 2703 2904 * 2704 - * Howevr, because a context switch is a grace period for !PREEMPT, any 2905 + * However, because a context switch is a grace period for !PREEMPTION, any 2705 2906 * blocking grace-period wait automatically implies a grace period if 2706 2907 * there is only one CPU online at any point time during execution of 2707 2908 * either synchronize_rcu() or synchronize_rcu_expedited(). 
It is OK to ··· 3099 2896 debug_rcu_head_queue(&rdp->barrier_head); 3100 2897 rcu_nocb_lock(rdp); 3101 2898 WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies)); 3102 - if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) { 2899 + if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) { 3103 2900 atomic_inc(&rcu_state.barrier_cpu_count); 3104 2901 } else { 3105 2902 debug_rcu_head_unqueue(&rdp->barrier_head); ··· 3760 3557 struct workqueue_struct *rcu_gp_wq; 3761 3558 struct workqueue_struct *rcu_par_gp_wq; 3762 3559 3560 + static void __init kfree_rcu_batch_init(void) 3561 + { 3562 + int cpu; 3563 + int i; 3564 + 3565 + for_each_possible_cpu(cpu) { 3566 + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3567 + 3568 + spin_lock_init(&krcp->lock); 3569 + for (i = 0; i < KFREE_N_BATCHES; i++) 3570 + krcp->krw_arr[i].krcp = krcp; 3571 + INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); 3572 + krcp->initialized = true; 3573 + } 3574 + } 3575 + 3763 3576 void __init rcu_init(void) 3764 3577 { 3765 3578 int cpu; 3766 3579 3767 3580 rcu_early_boot_tests(); 3768 3581 3582 + kfree_rcu_batch_init(); 3769 3583 rcu_bootup_announce(); 3770 3584 rcu_init_geometry(); 3771 3585 rcu_init_one();
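The two-batch kfree_rcu() machinery added in this file can be summarized briefly: kfree_call_rcu() stacks objects on a per-CPU ->head list, and queue_kfree_rcu_work() later moves that list into whichever of the KFREE_N_BATCHES ->head_free slots is idle, returning false (so the monitor re-arms after KFREE_DRAIN_JIFFIES) when both batches are still waiting on a grace period. A minimal userspace C sketch of just that slot-selection step, with locking, workqueues, and the grace period itself elided, and with illustrative (non-kernel) names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KFREE_N_BATCHES 2

/* Illustrative stand-ins for the kernel's kfree_rcu_cpu{,_work} pair. */
struct batch {
	void *head_free;	/* objects waiting for a grace period */
};

struct krc {
	void *head;		/* objects not yet waiting for a GP */
	struct batch krw_arr[KFREE_N_BATCHES];
};

/* Mirror of queue_kfree_rcu_work()'s slot selection: move ->head into
 * the first idle batch; fail when both batches are still in flight so
 * the caller re-arms the drain monitor and retries later. */
static bool queue_batch(struct krc *krcp)
{
	struct batch *krwp = NULL;
	int i;

	for (i = 0; i < KFREE_N_BATCHES; i++)
		if (!krcp->krw_arr[i].head_free) {
			krwp = &krcp->krw_arr[i];
			break;
		}
	if (!krwp)
		return false;
	krwp->head_free = krcp->head;
	krcp->head = NULL;
	return true;
}
```

With two slots, a burst of kfree_rcu() load can keep one batch in flight under a grace period while the next batch accumulates, which is the point of the "multiple in-flight batches" commit in this series.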
+1 -17
kernel/rcu/tree.h
··· 16 16 #include <linux/cpumask.h> 17 17 #include <linux/seqlock.h> 18 18 #include <linux/swait.h> 19 - #include <linux/stop_machine.h> 20 19 #include <linux/rcu_node_tree.h> 21 20 22 21 #include "rcu_segcblist.h" ··· 181 182 bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */ 182 183 bool rcu_urgent_qs; /* GP old need light quiescent state. */ 183 184 bool rcu_forced_tick; /* Forced tick to provide QS. */ 185 + bool rcu_forced_tick_exp; /* ... provide QS to expedited GP. */ 184 186 #ifdef CONFIG_RCU_FAST_NO_HZ 185 - bool all_lazy; /* All CPU's CBs lazy at idle start? */ 186 187 unsigned long last_accelerate; /* Last jiffy CBs were accelerated. */ 187 188 unsigned long last_advance_all; /* Last jiffy CBs were all advanced. */ 188 189 int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */ ··· 367 368 #define RCU_GP_CLEANUP 7 /* Grace-period cleanup started. */ 368 369 #define RCU_GP_CLEANED 8 /* Grace-period cleanup complete. */ 369 370 370 - static const char * const gp_state_names[] = { 371 - "RCU_GP_IDLE", 372 - "RCU_GP_WAIT_GPS", 373 - "RCU_GP_DONE_GPS", 374 - "RCU_GP_ONOFF", 375 - "RCU_GP_INIT", 376 - "RCU_GP_WAIT_FQS", 377 - "RCU_GP_DOING_FQS", 378 - "RCU_GP_CLEANUP", 379 - "RCU_GP_CLEANED", 380 - }; 381 - 382 371 /* 383 372 * In order to export the rcu_state name to the tracing tools, it 384 373 * needs to be added in the __tracepoint_string section. 
··· 390 403 #define RCU_NAME rcu_name 391 404 #endif /* #else #ifdef CONFIG_TRACING */ 392 405 393 - int rcu_dynticks_snap(struct rcu_data *rdp); 394 - 395 406 /* Forward declarations for tree_plugin.h */ 396 407 static void rcu_bootup_announce(void); 397 408 static void rcu_qs(void); ··· 400 415 static int rcu_print_task_exp_stall(struct rcu_node *rnp); 401 416 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp); 402 417 static void rcu_flavor_sched_clock_irq(int user); 403 - void call_rcu(struct rcu_head *head, rcu_callback_t func); 404 418 static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck); 405 419 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags); 406 420 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
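Several hunks in this series (force_qs_rnp() in tree.c, and the expedited-grace-period code below) switch from scanning every possible CPU of a leaf rcu_node to for_each_leaf_node_cpu_mask(), which visits only the CPUs whose bits are set in the supplied mask. A standalone sketch of that bit-driven walk, using GCC's __builtin_ctzll() and illustrative names rather than the kernel macros:

```c
#include <assert.h>
#include <stdint.h>

/* Visit only the CPUs whose bits are set in a leaf-node mask, the way
 * for_each_leaf_node_cpu_mask() narrows the scan in force_qs_rnp().
 * Names are illustrative; grplo is the lowest CPU covered by the node. */
static int count_masked_cpus(uint64_t mask, int grplo, int *cpus, int ncpus)
{
	uint64_t m;
	int n = 0;

	for (m = mask; m; m &= m - 1) {		/* clear lowest set bit */
		int cpu = grplo + __builtin_ctzll(m);	/* bit index -> CPU */

		if (n < ncpus)
			cpus[n] = cpu;
		n++;
	}
	return n;
}
```

The m &= m - 1 step makes the cost proportional to the number of set bits rather than to the width of the mask, which is why the reworked loops also drop the per-CPU leaf_node_cpu_bit() recomputation.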
+91 -56
kernel/rcu/tree_exp.h
··· 21 21 } 22 22 23 23 /* 24 - * Return then value that expedited-grace-period counter will have 24 + * Return the value that the expedited-grace-period counter will have 25 25 * at the end of the current grace period. 26 26 */ 27 27 static __maybe_unused unsigned long rcu_exp_gp_seq_endval(void) ··· 39 39 } 40 40 41 41 /* 42 - * Take a snapshot of the expedited-grace-period counter. 42 + * Take a snapshot of the expedited-grace-period counter, which is the 43 + * earliest value that will indicate that a full grace period has 44 + * elapsed since the current time. 43 45 */ 44 46 static unsigned long rcu_exp_gp_seq_snap(void) 45 47 { ··· 136 134 rcu_for_each_node_breadth_first(rnp) { 137 135 raw_spin_lock_irqsave_rcu_node(rnp, flags); 138 136 WARN_ON_ONCE(rnp->expmask); 139 - rnp->expmask = rnp->expmaskinit; 137 + WRITE_ONCE(rnp->expmask, rnp->expmaskinit); 140 138 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 141 139 } 142 140 } ··· 145 143 * Return non-zero if there is no RCU expedited grace period in progress 146 144 * for the specified rcu_node structure, in other words, if all CPUs and 147 145 * tasks covered by the specified rcu_node structure have done their bit 148 - * for the current expedited grace period. Works only for preemptible 149 - * RCU -- other RCU implementation use other means. 150 - * 151 - * Caller must hold the specificed rcu_node structure's ->lock 146 + * for the current expedited grace period. 
152 147 */ 153 - static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp) 148 + static bool sync_rcu_exp_done(struct rcu_node *rnp) 154 149 { 155 150 raw_lockdep_assert_held_rcu_node(rnp); 156 - 157 151 return rnp->exp_tasks == NULL && 158 152 READ_ONCE(rnp->expmask) == 0; 159 153 } 160 154 161 155 /* 162 - * Like sync_rcu_preempt_exp_done(), but this function assumes the caller 163 - * doesn't hold the rcu_node's ->lock, and will acquire and release the lock 164 - * itself 156 + * Like sync_rcu_exp_done(), but where the caller does not hold the 157 + * rcu_node's ->lock. 165 158 */ 166 - static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp) 159 + static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp) 167 160 { 168 161 unsigned long flags; 169 162 bool ret; 170 163 171 164 raw_spin_lock_irqsave_rcu_node(rnp, flags); 172 - ret = sync_rcu_preempt_exp_done(rnp); 165 + ret = sync_rcu_exp_done(rnp); 173 166 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 174 167 175 168 return ret; ··· 178 181 * which the task was queued or to one of that rcu_node structure's ancestors, 179 182 * recursively up the tree. (Calm down, calm down, we do the recursion 180 183 * iteratively!) 181 - * 182 - * Caller must hold the specified rcu_node structure's ->lock. 
183 184 */ 184 185 static void __rcu_report_exp_rnp(struct rcu_node *rnp, 185 186 bool wake, unsigned long flags) ··· 185 190 { 186 191 unsigned long mask; 187 192 193 + raw_lockdep_assert_held_rcu_node(rnp); 188 194 for (;;) { 189 - if (!sync_rcu_preempt_exp_done(rnp)) { 195 + if (!sync_rcu_exp_done(rnp)) { 190 196 if (!rnp->expmask) 191 197 rcu_initiate_boost(rnp, flags); 192 198 else ··· 207 211 rnp = rnp->parent; 208 212 raw_spin_lock_rcu_node(rnp); /* irqs already disabled */ 209 213 WARN_ON_ONCE(!(rnp->expmask & mask)); 210 - rnp->expmask &= ~mask; 214 + WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask); 211 215 } 212 216 } 213 217 ··· 230 234 static void rcu_report_exp_cpu_mult(struct rcu_node *rnp, 231 235 unsigned long mask, bool wake) 232 236 { 237 + int cpu; 233 238 unsigned long flags; 239 + struct rcu_data *rdp; 234 240 235 241 raw_spin_lock_irqsave_rcu_node(rnp, flags); 236 242 if (!(rnp->expmask & mask)) { 237 243 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 238 244 return; 239 245 } 240 - rnp->expmask &= ~mask; 246 + WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask); 247 + for_each_leaf_node_cpu_mask(rnp, cpu, mask) { 248 + rdp = per_cpu_ptr(&rcu_data, cpu); 249 + if (!IS_ENABLED(CONFIG_NO_HZ_FULL) || !rdp->rcu_forced_tick_exp) 250 + continue; 251 + rdp->rcu_forced_tick_exp = false; 252 + tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP); 253 + } 241 254 __rcu_report_exp_rnp(rnp, wake, flags); /* Releases rnp->lock. */ 242 255 } 243 256 ··· 350 345 /* Each pass checks a CPU for identity, offline, and idle. */ 351 346 mask_ofl_test = 0; 352 347 for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) { 353 - unsigned long mask = leaf_node_cpu_bit(rnp, cpu); 354 348 struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 349 + unsigned long mask = rdp->grpmask; 355 350 int snap; 356 351 357 352 if (raw_smp_processor_id() == cpu || ··· 377 372 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 378 373 379 374 /* IPI the remaining CPUs for expedited quiescent state. 
*/ 380 - for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) { 381 - unsigned long mask = leaf_node_cpu_bit(rnp, cpu); 375 + for_each_leaf_node_cpu_mask(rnp, cpu, mask_ofl_ipi) { 382 376 struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 377 + unsigned long mask = rdp->grpmask; 383 378 384 - if (!(mask_ofl_ipi & mask)) 385 - continue; 386 379 retry_ipi: 387 380 if (rcu_dynticks_in_eqs_since(rdp, rdp->exp_dynticks_snap)) { 388 381 mask_ofl_test |= mask; ··· 392 389 } 393 390 ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0); 394 391 put_cpu(); 395 - if (!ret) { 396 - mask_ofl_ipi &= ~mask; 392 + /* The CPU will report the QS in response to the IPI. */ 393 + if (!ret) 397 394 continue; 398 - } 395 + 399 396 /* Failed, raced with CPU hotplug operation. */ 400 397 raw_spin_lock_irqsave_rcu_node(rnp, flags); 401 398 if ((rnp->qsmaskinitnext & mask) && ··· 406 403 schedule_timeout_uninterruptible(1); 407 404 goto retry_ipi; 408 405 } 409 - /* CPU really is offline, so we can ignore it. */ 410 - if (!(rnp->expmask & mask)) 411 - mask_ofl_ipi &= ~mask; 406 + /* CPU really is offline, so we must report its QS. */ 407 + if (rnp->expmask & mask) 408 + mask_ofl_test |= mask; 412 409 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 413 410 } 414 411 /* Report quiescent states for those that went offline. */ 415 - mask_ofl_test |= mask_ofl_ipi; 416 412 if (mask_ofl_test) 417 413 rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false); 418 414 } ··· 458 456 flush_work(&rnp->rew.rew_work); 459 457 } 460 458 461 - static void synchronize_sched_expedited_wait(void) 459 + /* 460 + * Wait for the expedited grace period to elapse, within time limit. 461 + * If the time limit is exceeded without the grace period elapsing, 462 + * return false, otherwise return true. 
463 + */ 464 + static bool synchronize_rcu_expedited_wait_once(long tlimit) 465 + { 466 + int t; 467 + struct rcu_node *rnp_root = rcu_get_root(); 468 + 469 + t = swait_event_timeout_exclusive(rcu_state.expedited_wq, 470 + sync_rcu_exp_done_unlocked(rnp_root), 471 + tlimit); 472 + // Workqueues should not be signaled. 473 + if (t > 0 || sync_rcu_exp_done_unlocked(rnp_root)) 474 + return true; 475 + WARN_ON(t < 0); /* workqueues should not be signaled. */ 476 + return false; 477 + } 478 + 479 + /* 480 + * Wait for the expedited grace period to elapse, issuing any needed 481 + * RCU CPU stall warnings along the way. 482 + */ 483 + static void synchronize_rcu_expedited_wait(void) 462 484 { 463 485 int cpu; 464 486 unsigned long jiffies_stall; 465 487 unsigned long jiffies_start; 466 488 unsigned long mask; 467 489 int ndetected; 490 + struct rcu_data *rdp; 468 491 struct rcu_node *rnp; 469 492 struct rcu_node *rnp_root = rcu_get_root(); 470 - int ret; 471 493 472 494 trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait")); 473 495 jiffies_stall = rcu_jiffies_till_stall_check(); 474 496 jiffies_start = jiffies; 497 + if (IS_ENABLED(CONFIG_NO_HZ_FULL)) { 498 + if (synchronize_rcu_expedited_wait_once(1)) 499 + return; 500 + rcu_for_each_leaf_node(rnp) { 501 + for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) { 502 + rdp = per_cpu_ptr(&rcu_data, cpu); 503 + if (rdp->rcu_forced_tick_exp) 504 + continue; 505 + rdp->rcu_forced_tick_exp = true; 506 + tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP); 507 + } 508 + } 509 + WARN_ON_ONCE(1); 510 + } 475 511 476 512 for (;;) { 477 - ret = swait_event_timeout_exclusive( 478 - rcu_state.expedited_wq, 479 - sync_rcu_preempt_exp_done_unlocked(rnp_root), 480 - jiffies_stall); 481 - if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root)) 513 + if (synchronize_rcu_expedited_wait_once(jiffies_stall)) 482 514 return; 483 - WARN_ON(ret < 0); /* workqueues should not be signaled. 
*/ 484 515 if (rcu_cpu_stall_suppress) 485 516 continue; 486 517 panic_on_rcu_stall(); ··· 526 491 struct rcu_data *rdp; 527 492 528 493 mask = leaf_node_cpu_bit(rnp, cpu); 529 - if (!(rnp->expmask & mask)) 494 + if (!(READ_ONCE(rnp->expmask) & mask)) 530 495 continue; 531 496 ndetected++; 532 497 rdp = per_cpu_ptr(&rcu_data, cpu); ··· 538 503 } 539 504 pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", 540 505 jiffies - jiffies_start, rcu_state.expedited_sequence, 541 - rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]); 506 + READ_ONCE(rnp_root->expmask), 507 + ".T"[!!rnp_root->exp_tasks]); 542 508 if (ndetected) { 543 509 pr_err("blocking rcu_node structures:"); 544 510 rcu_for_each_node_breadth_first(rnp) { 545 511 if (rnp == rnp_root) 546 512 continue; /* printed unconditionally */ 547 - if (sync_rcu_preempt_exp_done_unlocked(rnp)) 513 + if (sync_rcu_exp_done_unlocked(rnp)) 548 514 continue; 549 515 pr_cont(" l=%u:%d-%d:%#lx/%c", 550 516 rnp->level, rnp->grplo, rnp->grphi, 551 - rnp->expmask, 517 + READ_ONCE(rnp->expmask), 552 518 ".T"[!!rnp->exp_tasks]); 553 519 } 554 520 pr_cont("\n"); ··· 557 521 rcu_for_each_leaf_node(rnp) { 558 522 for_each_leaf_node_possible_cpu(rnp, cpu) { 559 523 mask = leaf_node_cpu_bit(rnp, cpu); 560 - if (!(rnp->expmask & mask)) 524 + if (!(READ_ONCE(rnp->expmask) & mask)) 561 525 continue; 562 526 dump_cpu_task(cpu); 563 527 } ··· 576 540 { 577 541 struct rcu_node *rnp; 578 542 579 - synchronize_sched_expedited_wait(); 543 + synchronize_rcu_expedited_wait(); 544 + 545 + // Switch over to wakeup mode, allowing the next GP to proceed. 546 + // End the previous grace period only after acquiring the mutex 547 + // to ensure that only one GP runs concurrently with wakeups. 548 + mutex_lock(&rcu_state.exp_wake_mutex); 580 549 rcu_exp_gp_seq_end(); 581 550 trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end")); 582 - 583 - /* 584 - * Switch over to wakeup mode, allowing the next GP, but -only- the 585 - * next GP, to proceed. 
586 - */ 587 - mutex_lock(&rcu_state.exp_wake_mutex); 588 551 589 552 rcu_for_each_node_breadth_first(rnp) { 590 553 if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) { ··· 594 559 spin_unlock(&rnp->exp_lock); 595 560 } 596 561 smp_mb(); /* All above changes before wakeup. */ 597 - wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]); 562 + wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]); 598 563 } 599 564 trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake")); 600 565 mutex_unlock(&rcu_state.exp_wake_mutex); ··· 645 610 * critical section. If also enabled or idle, immediately 646 611 * report the quiescent state, otherwise defer. 647 612 */ 648 - if (!t->rcu_read_lock_nesting) { 613 + if (!rcu_preempt_depth()) { 649 614 if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) || 650 615 rcu_dynticks_curr_cpu_in_eqs()) { 651 616 rcu_report_exp_rdp(rdp); ··· 669 634 * can have caused this quiescent state to already have been 670 635 * reported, so we really do need to check ->expmask. 671 636 */ 672 - if (t->rcu_read_lock_nesting > 0) { 637 + if (rcu_preempt_depth() > 0) { 673 638 raw_spin_lock_irqsave_rcu_node(rnp, flags); 674 639 if (rnp->expmask & rdp->grpmask) { 675 640 rdp->exp_deferred_qs = true; ··· 705 670 } 706 671 } 707 672 708 - /* PREEMPT=y, so no PREEMPT=n expedited grace period to clean up after. */ 673 + /* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */ 709 674 static void sync_sched_exp_online_cleanup(int cpu) 710 675 { 711 676 } ··· 820 785 * implementations, it is still unfriendly to real-time workloads, so is 821 786 * thus not recommended for any sort of common-case code. In fact, if 822 787 * you are using synchronize_rcu_expedited() in a loop, please restructure 823 - * your code to batch your updates, and then Use a single synchronize_rcu() 788 + * your code to batch your updates, and then use a single synchronize_rcu() 824 789 * instead. 
825 790 * 826 791 * This has the same semantics as (but is more brutal than) synchronize_rcu().
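The expedited path above leans on the rcu_exp_gp_seq_snap()/rcu_exp_gp_seq_end() contract: the counter's low bits mark a grace period in progress, and a snapshot must name the earliest counter value proving that a full grace period elapsed after the snapshot was taken. A toy single-threaded model of that contract (one state bit instead of the kernel's RCU_SEQ_STATE_MASK, wraparound ignored):

```c
#include <assert.h>

/* Toy model of the expedited grace-period sequence counter: the low bit
 * means a GP is in progress (the kernel uses RCU_SEQ_STATE_MASK and
 * ULONG_CMP_GE for wraparound; this sketch ignores both). */
static unsigned long exp_seq;

static void exp_gp_seq_start(void) { exp_seq++; }	/* odd: GP running */
static void exp_gp_seq_end(void)   { exp_seq++; }	/* even: GP done */

/* Earliest counter value proving a full GP elapsed after this call:
 * round past any in-progress GP, then require one more complete GP. */
static unsigned long exp_gp_seq_snap(void)
{
	return (exp_seq + 3) & ~1UL;
}

static int exp_gp_seq_done(unsigned long s)
{
	return exp_seq >= s;
}
```

Note that a snapshot taken while a grace period is already running must not be satisfied by that grace period, since readers may predate the snapshot; the +3 rounding encodes exactly that. It also explains the wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]) fix above, which wakes the waiters for the sequence number s rather than whatever the counter happens to hold at wakeup time.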
+78 -90
kernel/rcu/tree_plugin.h
··· 220 220 * blocked tasks. 221 221 */ 222 222 if (!rnp->gp_tasks && (blkd_state & RCU_GP_BLKD)) { 223 - rnp->gp_tasks = &t->rcu_node_entry; 223 + WRITE_ONCE(rnp->gp_tasks, &t->rcu_node_entry); 224 224 WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq); 225 225 } 226 226 if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD)) ··· 290 290 291 291 trace_rcu_utilization(TPS("Start context switch")); 292 292 lockdep_assert_irqs_disabled(); 293 - WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0); 294 - if (t->rcu_read_lock_nesting > 0 && 293 + WARN_ON_ONCE(!preempt && rcu_preempt_depth() > 0); 294 + if (rcu_preempt_depth() > 0 && 295 295 !t->rcu_read_unlock_special.b.blocked) { 296 296 297 297 /* Possibly blocking in an RCU read-side critical section. */ ··· 340 340 */ 341 341 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp) 342 342 { 343 - return rnp->gp_tasks != NULL; 343 + return READ_ONCE(rnp->gp_tasks) != NULL; 344 344 } 345 345 346 346 /* Bias and limit values for ->rcu_read_lock_nesting. */ 347 347 #define RCU_NEST_BIAS INT_MAX 348 348 #define RCU_NEST_NMAX (-INT_MAX / 2) 349 349 #define RCU_NEST_PMAX (INT_MAX / 2) 350 + 351 + static void rcu_preempt_read_enter(void) 352 + { 353 + current->rcu_read_lock_nesting++; 354 + } 355 + 356 + static void rcu_preempt_read_exit(void) 357 + { 358 + current->rcu_read_lock_nesting--; 359 + } 360 + 361 + static void rcu_preempt_depth_set(int val) 362 + { 363 + current->rcu_read_lock_nesting = val; 364 + } 350 365 351 366 /* 352 367 * Preemptible RCU implementation for rcu_read_lock(). ··· 370 355 */ 371 356 void __rcu_read_lock(void) 372 357 { 373 - current->rcu_read_lock_nesting++; 358 + rcu_preempt_read_enter(); 374 359 if (IS_ENABLED(CONFIG_PROVE_LOCKING)) 375 - WARN_ON_ONCE(current->rcu_read_lock_nesting > RCU_NEST_PMAX); 360 + WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX); 376 361 barrier(); /* critical section after entry code. 
*/ 377 362 } 378 363 EXPORT_SYMBOL_GPL(__rcu_read_lock); ··· 388 373 { 389 374 struct task_struct *t = current; 390 375 391 - if (t->rcu_read_lock_nesting != 1) { 392 - --t->rcu_read_lock_nesting; 376 + if (rcu_preempt_depth() != 1) { 377 + rcu_preempt_read_exit(); 393 378 } else { 394 379 barrier(); /* critical section before exit code. */ 395 - t->rcu_read_lock_nesting = -RCU_NEST_BIAS; 380 + rcu_preempt_depth_set(-RCU_NEST_BIAS); 396 381 barrier(); /* assign before ->rcu_read_unlock_special load */ 397 382 if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s))) 398 383 rcu_read_unlock_special(t); 399 384 barrier(); /* ->rcu_read_unlock_special load before assign */ 400 - t->rcu_read_lock_nesting = 0; 385 + rcu_preempt_depth_set(0); 401 386 } 402 387 if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { 403 - int rrln = t->rcu_read_lock_nesting; 388 + int rrln = rcu_preempt_depth(); 404 389 405 390 WARN_ON_ONCE(rrln < 0 && rrln > RCU_NEST_NMAX); 406 391 } ··· 459 444 local_irq_restore(flags); 460 445 return; 461 446 } 462 - t->rcu_read_unlock_special.b.deferred_qs = false; 463 - if (special.b.need_qs) { 447 + t->rcu_read_unlock_special.s = 0; 448 + if (special.b.need_qs) 464 449 rcu_qs(); 465 - t->rcu_read_unlock_special.b.need_qs = false; 466 - if (!t->rcu_read_unlock_special.s && !rdp->exp_deferred_qs) { 467 - local_irq_restore(flags); 468 - return; 469 - } 470 - } 471 450 472 451 /* 473 452 * Respond to a request by an expedited grace period for a ··· 469 460 * tasks are handled when removing the task from the 470 461 * blocked-tasks list below. 471 462 */ 472 - if (rdp->exp_deferred_qs) { 463 + if (rdp->exp_deferred_qs) 473 464 rcu_report_exp_rdp(rdp); 474 - if (!t->rcu_read_unlock_special.s) { 475 - local_irq_restore(flags); 476 - return; 477 - } 478 - } 479 465 480 466 /* Clean up if blocked during RCU read-side critical section. 
*/ 481 467 if (special.b.blocked) { 482 - t->rcu_read_unlock_special.b.blocked = false; 483 468 484 469 /* 485 470 * Remove this task from the list it blocked on. The task ··· 488 485 empty_norm = !rcu_preempt_blocked_readers_cgp(rnp); 489 486 WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq && 490 487 (!empty_norm || rnp->qsmask)); 491 - empty_exp = sync_rcu_preempt_exp_done(rnp); 488 + empty_exp = sync_rcu_exp_done(rnp); 492 489 smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */ 493 490 np = rcu_next_node_entry(t, rnp); 494 491 list_del_init(&t->rcu_node_entry); ··· 496 493 trace_rcu_unlock_preempted_task(TPS("rcu_preempt"), 497 494 rnp->gp_seq, t->pid); 498 495 if (&t->rcu_node_entry == rnp->gp_tasks) 499 - rnp->gp_tasks = np; 496 + WRITE_ONCE(rnp->gp_tasks, np); 500 497 if (&t->rcu_node_entry == rnp->exp_tasks) 501 498 rnp->exp_tasks = np; 502 499 if (IS_ENABLED(CONFIG_RCU_BOOST)) { ··· 512 509 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock, 513 510 * so we must take a snapshot of the expedited state. 
514 511 */ 515 - empty_exp_now = sync_rcu_preempt_exp_done(rnp); 512 + empty_exp_now = sync_rcu_exp_done(rnp); 516 513 if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) { 517 514 trace_rcu_quiescent_state_report(TPS("preempt_rcu"), 518 515 rnp->gp_seq, ··· 554 551 { 555 552 return (__this_cpu_read(rcu_data.exp_deferred_qs) || 556 553 READ_ONCE(t->rcu_read_unlock_special.s)) && 557 - t->rcu_read_lock_nesting <= 0; 554 + rcu_preempt_depth() <= 0; 558 555 } 559 556 560 557 /* ··· 567 564 static void rcu_preempt_deferred_qs(struct task_struct *t) 568 565 { 569 566 unsigned long flags; 570 - bool couldrecurse = t->rcu_read_lock_nesting >= 0; 567 + bool couldrecurse = rcu_preempt_depth() >= 0; 571 568 572 569 if (!rcu_preempt_need_deferred_qs(t)) 573 570 return; 574 571 if (couldrecurse) 575 - t->rcu_read_lock_nesting -= RCU_NEST_BIAS; 572 + rcu_preempt_depth_set(rcu_preempt_depth() - RCU_NEST_BIAS); 576 573 local_irq_save(flags); 577 574 rcu_preempt_deferred_qs_irqrestore(t, flags); 578 575 if (couldrecurse) 579 - t->rcu_read_lock_nesting += RCU_NEST_BIAS; 576 + rcu_preempt_depth_set(rcu_preempt_depth() + RCU_NEST_BIAS); 580 577 } 581 578 582 579 /* ··· 613 610 struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 614 611 struct rcu_node *rnp = rdp->mynode; 615 612 616 - t->rcu_read_unlock_special.b.exp_hint = false; 617 613 exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) || 618 - (rdp->grpmask & rnp->expmask) || 614 + (rdp->grpmask & READ_ONCE(rnp->expmask)) || 619 615 tick_nohz_full_cpu(rdp->cpu); 620 616 // Need to defer quiescent state until everything is enabled. 621 617 if (irqs_were_disabled && use_softirq && ··· 642 640 local_irq_restore(flags); 643 641 return; 644 642 } 645 - WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false); 646 643 rcu_preempt_deferred_qs_irqrestore(t, flags); 647 644 } 648 645 ··· 649 648 * Check that the list of blocked tasks for the newly completed grace 650 649 * period is in fact empty. 
It is a serious bug to complete a grace 651 650 * period that still has RCU readers blocked! This function must be 652 - * invoked -before- updating this rnp's ->gp_seq, and the rnp's ->lock 653 - * must be held by the caller. 651 + * invoked -before- updating this rnp's ->gp_seq. 654 652 * 655 653 * Also, if there are blocked tasks on the list, they automatically 656 654 * block the newly created grace period, so set up ->gp_tasks accordingly. ··· 659 659 struct task_struct *t; 660 660 661 661 RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n"); 662 + raw_lockdep_assert_held_rcu_node(rnp); 662 663 if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp))) 663 664 dump_blkd_tasks(rnp, 10); 664 665 if (rcu_preempt_has_tasks(rnp) && 665 666 (rnp->qsmaskinit || rnp->wait_blkd_tasks)) { 666 - rnp->gp_tasks = rnp->blkd_tasks.next; 667 + WRITE_ONCE(rnp->gp_tasks, rnp->blkd_tasks.next); 667 668 t = container_of(rnp->gp_tasks, struct task_struct, 668 669 rcu_node_entry); 669 670 trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"), ··· 687 686 if (user || rcu_is_cpu_rrupt_from_idle()) { 688 687 rcu_note_voluntary_context_switch(current); 689 688 } 690 - if (t->rcu_read_lock_nesting > 0 || 689 + if (rcu_preempt_depth() > 0 || 691 690 (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) { 692 691 /* No QS, force context switch if deferred. */ 693 692 if (rcu_preempt_need_deferred_qs(t)) { ··· 697 696 } else if (rcu_preempt_need_deferred_qs(t)) { 698 697 rcu_preempt_deferred_qs(t); /* Report deferred QS. */ 699 698 return; 700 - } else if (!t->rcu_read_lock_nesting) { 699 + } else if (!rcu_preempt_depth()) { 701 700 rcu_qs(); /* Report immediate QS. */ 702 701 return; 703 702 } 704 703 705 704 /* If GP is oldish, ask for help from rcu_read_unlock_special(). 
*/ 706 - if (t->rcu_read_lock_nesting > 0 && 705 + if (rcu_preempt_depth() > 0 && 707 706 __this_cpu_read(rcu_data.core_needs_qs) && 708 707 __this_cpu_read(rcu_data.cpu_no_qs.b.norm) && 709 708 !t->rcu_read_unlock_special.b.need_qs && ··· 724 723 struct task_struct *t = current; 725 724 726 725 if (unlikely(!list_empty(&current->rcu_node_entry))) { 727 - t->rcu_read_lock_nesting = 1; 726 + rcu_preempt_depth_set(1); 728 727 barrier(); 729 728 WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true); 730 - } else if (unlikely(t->rcu_read_lock_nesting)) { 731 - t->rcu_read_lock_nesting = 1; 729 + } else if (unlikely(rcu_preempt_depth())) { 730 + rcu_preempt_depth_set(1); 732 731 } else { 733 732 return; 734 733 } ··· 758 757 pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n", 759 758 __func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext); 760 759 pr_info("%s: ->gp_tasks %p ->boost_tasks %p ->exp_tasks %p\n", 761 - __func__, rnp->gp_tasks, rnp->boost_tasks, rnp->exp_tasks); 760 + __func__, READ_ONCE(rnp->gp_tasks), rnp->boost_tasks, 761 + rnp->exp_tasks); 762 762 pr_info("%s: ->blkd_tasks", __func__); 763 763 i = 0; 764 764 list_for_each(lhp, &rnp->blkd_tasks) { ··· 790 788 } 791 789 792 790 /* 793 - * Note a quiescent state for PREEMPT=n. Because we do not need to know 791 + * Note a quiescent state for PREEMPTION=n. Because we do not need to know 794 792 * how many quiescent states passed, just if there was at least one since 795 793 * the start of the grace period, this just sets a flag. The caller must 796 794 * have disabled preemption. ··· 840 838 EXPORT_SYMBOL_GPL(rcu_all_qs); 841 839 842 840 /* 843 - * Note a PREEMPT=n context switch. The caller must have disabled interrupts. 841 + * Note a PREEMPTION=n context switch. The caller must have disabled interrupts. 
844 842 */ 845 843 void rcu_note_context_switch(bool preempt) 846 844 { ··· 1264 1262 /* 1265 1263 * This code is invoked when a CPU goes idle, at which point we want 1266 1264 * to have the CPU do everything required for RCU so that it can enter 1267 - * the energy-efficient dyntick-idle mode. This is handled by a 1268 - * state machine implemented by rcu_prepare_for_idle() below. 1265 + * the energy-efficient dyntick-idle mode. 1269 1266 * 1270 - * The following three proprocessor symbols control this state machine: 1267 + * The following preprocessor symbol controls this: 1271 1268 * 1272 1269 * RCU_IDLE_GP_DELAY gives the number of jiffies that a CPU is permitted 1273 1270 * to sleep in dyntick-idle mode with RCU callbacks pending. This ··· 1275 1274 * number, be warned: Setting RCU_IDLE_GP_DELAY too high can hang your 1276 1275 * system. And if you are -that- concerned about energy efficiency, 1277 1276 * just power the system down and be done with it! 1278 - * RCU_IDLE_LAZY_GP_DELAY gives the number of jiffies that a CPU is 1279 - * permitted to sleep in dyntick-idle mode with only lazy RCU 1280 - * callbacks pending. Setting this too high can OOM your system. 1281 1277 * 1282 - * The values below work well in practice. If future workloads require 1278 + * The value below works well in practice. If future workloads require 1283 1279 * adjustment, they can be converted into kernel config parameters, though 1284 1280 * making the state machine smarter might be a better option. 1285 1281 */ 1286 1282 #define RCU_IDLE_GP_DELAY 4 /* Roughly one grace period. */ 1287 - #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ) /* Roughly six seconds. 
*/ 1288 1283 1289 1284 static int rcu_idle_gp_delay = RCU_IDLE_GP_DELAY; 1290 1285 module_param(rcu_idle_gp_delay, int, 0644); 1291 - static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; 1292 - module_param(rcu_idle_lazy_gp_delay, int, 0644); 1293 1286 1294 1287 /* 1295 1288 * Try to advance callbacks on the current CPU, but only if it has been ··· 1322 1327 /* 1323 1328 * Allow the CPU to enter dyntick-idle mode unless it has callbacks ready 1324 1329 * to invoke. If the CPU has callbacks, try to advance them. Tell the 1325 - * caller to set the timeout based on whether or not there are non-lazy 1326 - * callbacks. 1330 + * caller about what to set the timeout. 1327 1331 * 1328 1332 * The caller must have disabled interrupts. 1329 1333 */ ··· 1348 1354 } 1349 1355 rdp->last_accelerate = jiffies; 1350 1356 1351 - /* Request timer delay depending on laziness, and round. */ 1352 - rdp->all_lazy = !rcu_segcblist_n_nonlazy_cbs(&rdp->cblist); 1353 - if (rdp->all_lazy) { 1354 - dj = round_jiffies(rcu_idle_lazy_gp_delay + jiffies) - jiffies; 1355 - } else { 1356 - dj = round_up(rcu_idle_gp_delay + jiffies, 1357 - rcu_idle_gp_delay) - jiffies; 1358 - } 1357 + /* Request timer and round. */ 1358 + dj = round_up(rcu_idle_gp_delay + jiffies, rcu_idle_gp_delay) - jiffies; 1359 + 1359 1360 *nextevt = basemono + dj * TICK_NSEC; 1360 1361 return 0; 1361 1362 } 1362 1363 1363 1364 /* 1364 - * Prepare a CPU for idle from an RCU perspective. The first major task 1365 - * is to sense whether nohz mode has been enabled or disabled via sysfs. 1366 - * The second major task is to check to see if a non-lazy callback has 1367 - * arrived at a CPU that previously had only lazy callbacks. The third 1368 - * major task is to accelerate (that is, assign grace-period numbers to) 1369 - * any recently arrived callbacks. 1365 + * Prepare a CPU for idle from an RCU perspective. The first major task is to 1366 + * sense whether nohz mode has been enabled or disabled via sysfs. 
The second 1367 + * major task is to accelerate (that is, assign grace-period numbers to) any 1368 + * recently arrived callbacks. 1370 1369 * 1371 1370 * The caller must have disabled interrupts. 1372 1371 */ ··· 1384 1397 } 1385 1398 if (!tne) 1386 1399 return; 1387 - 1388 - /* 1389 - * If a non-lazy callback arrived at a CPU having only lazy 1390 - * callbacks, invoke RCU core for the side-effect of recalculating 1391 - * idle duration on re-entry to idle. 1392 - */ 1393 - if (rdp->all_lazy && rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)) { 1394 - rdp->all_lazy = false; 1395 - invoke_rcu_core(); 1396 - return; 1397 - } 1398 1400 1399 1401 /* 1400 1402 * If we have not yet accelerated this jiffy, accelerate all ··· 2297 2321 { 2298 2322 int cpu; 2299 2323 bool firsttime = true; 2324 + bool gotnocbs = false; 2325 + bool gotnocbscbs = true; 2300 2326 int ls = rcu_nocb_gp_stride; 2301 2327 int nl = 0; /* Next GP kthread. */ 2302 2328 struct rcu_data *rdp; ··· 2321 2343 rdp = per_cpu_ptr(&rcu_data, cpu); 2322 2344 if (rdp->cpu >= nl) { 2323 2345 /* New GP kthread, set up for CBs & next GP. */ 2346 + gotnocbs = true; 2324 2347 nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls; 2325 2348 rdp->nocb_gp_rdp = rdp; 2326 2349 rdp_gp = rdp; 2327 - if (!firsttime && dump_tree) 2328 - pr_cont("\n"); 2329 - firsttime = false; 2330 - pr_alert("%s: No-CB GP kthread CPU %d:", __func__, cpu); 2350 + if (dump_tree) { 2351 + if (!firsttime) 2352 + pr_cont("%s\n", gotnocbscbs 2353 + ? "" : " (self only)"); 2354 + gotnocbscbs = false; 2355 + firsttime = false; 2356 + pr_alert("%s: No-CB GP kthread CPU %d:", 2357 + __func__, cpu); 2358 + } 2331 2359 } else { 2332 2360 /* Another CB kthread, link to previous GP kthread. 
*/ 2361 + gotnocbscbs = true; 2333 2362 rdp->nocb_gp_rdp = rdp_gp; 2334 2363 rdp_prev->nocb_next_cb_rdp = rdp; 2335 - pr_alert(" %d", cpu); 2364 + if (dump_tree) 2365 + pr_cont(" %d", cpu); 2336 2366 } 2337 2367 rdp_prev = rdp; 2338 2368 } 2369 + if (gotnocbs && dump_tree) 2370 + pr_cont("%s\n", gotnocbscbs ? "" : " (self only)"); 2339 2371 } 2340 2372 2341 2373 /*
+27 -7
kernel/rcu/tree_stall.h
··· 163 163 // 164 164 // Printing RCU CPU stall warnings 165 165 166 - #ifdef CONFIG_PREEMPTION 166 + #ifdef CONFIG_PREEMPT_RCU 167 167 168 168 /* 169 169 * Dump detailed information for all tasks blocking the current RCU ··· 215 215 return ndetected; 216 216 } 217 217 218 - #else /* #ifdef CONFIG_PREEMPTION */ 218 + #else /* #ifdef CONFIG_PREEMPT_RCU */ 219 219 220 220 /* 221 221 * Because preemptible RCU does not exist, we never have to check for ··· 233 233 { 234 234 return 0; 235 235 } 236 - #endif /* #else #ifdef CONFIG_PREEMPTION */ 236 + #endif /* #else #ifdef CONFIG_PREEMPT_RCU */ 237 237 238 238 /* 239 239 * Dump stacks of all tasks running on stalled CPUs. First try using ··· 263 263 { 264 264 struct rcu_data *rdp = &per_cpu(rcu_data, cpu); 265 265 266 - sprintf(cp, "last_accelerate: %04lx/%04lx, Nonlazy posted: %c%c%c", 266 + sprintf(cp, "last_accelerate: %04lx/%04lx dyntick_enabled: %d", 267 267 rdp->last_accelerate & 0xffff, jiffies & 0xffff, 268 - ".l"[rdp->all_lazy], 269 - ".L"[!rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)], 270 - ".D"[!!rdp->tick_nohz_enabled_snap]); 268 + !!rdp->tick_nohz_enabled_snap); 271 269 } 272 270 273 271 #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */ ··· 276 278 } 277 279 278 280 #endif /* #else #ifdef CONFIG_RCU_FAST_NO_HZ */ 281 + 282 + static const char * const gp_state_names[] = { 283 + [RCU_GP_IDLE] = "RCU_GP_IDLE", 284 + [RCU_GP_WAIT_GPS] = "RCU_GP_WAIT_GPS", 285 + [RCU_GP_DONE_GPS] = "RCU_GP_DONE_GPS", 286 + [RCU_GP_ONOFF] = "RCU_GP_ONOFF", 287 + [RCU_GP_INIT] = "RCU_GP_INIT", 288 + [RCU_GP_WAIT_FQS] = "RCU_GP_WAIT_FQS", 289 + [RCU_GP_DOING_FQS] = "RCU_GP_DOING_FQS", 290 + [RCU_GP_CLEANUP] = "RCU_GP_CLEANUP", 291 + [RCU_GP_CLEANED] = "RCU_GP_CLEANED", 292 + }; 293 + 294 + /* 295 + * Convert a ->gp_state value to a character string. 
296 + */ 297 + static const char *gp_state_getname(short gs) 298 + { 299 + if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names)) 300 + return "???"; 301 + return gp_state_names[gs]; 302 + } 279 303 280 304 /* 281 305 * Print out diagnostic information for the specified stalled CPU.
+11 -3
kernel/rcu/update.c
··· 40 40 #include <linux/rcupdate_wait.h> 41 41 #include <linux/sched/isolation.h> 42 42 #include <linux/kprobes.h> 43 + #include <linux/slab.h> 43 44 44 45 #define CREATE_TRACE_POINTS 45 46 ··· 52 51 #define MODULE_PARAM_PREFIX "rcupdate." 53 52 54 53 #ifndef CONFIG_TINY_RCU 55 - extern int rcu_expedited; /* from sysctl */ 56 54 module_param(rcu_expedited, int, 0); 57 - extern int rcu_normal; /* from sysctl */ 58 55 module_param(rcu_normal, int, 0); 59 56 static int rcu_normal_after_boot; 60 57 module_param(rcu_normal_after_boot, int, 0); ··· 217 218 { 218 219 rcu_test_sync_prims(); 219 220 rcu_scheduler_active = RCU_SCHEDULER_RUNNING; 221 + kfree_rcu_scheduler_running(); 220 222 rcu_test_sync_prims(); 221 223 return 0; 222 224 } ··· 435 435 EXPORT_SYMBOL_GPL(rcuhead_debug_descr); 436 436 #endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */ 437 437 438 - #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) || defined(CONFIG_RCU_TRACE) 438 + #if defined(CONFIG_TREE_RCU) || defined(CONFIG_RCU_TRACE) 439 439 void do_trace_rcu_torture_read(const char *rcutorturename, struct rcu_head *rhp, 440 440 unsigned long secs, 441 441 unsigned long c_old, unsigned long c) ··· 853 853 854 854 DEFINE_STATIC_SRCU(early_srcu); 855 855 856 + struct early_boot_kfree_rcu { 857 + struct rcu_head rh; 858 + }; 859 + 856 860 static void early_boot_test_call_rcu(void) 857 861 { 858 862 static struct rcu_head head; 859 863 static struct rcu_head shead; 864 + struct early_boot_kfree_rcu *rhp; 860 865 861 866 call_rcu(&head, test_callback); 862 867 if (IS_ENABLED(CONFIG_SRCU)) 863 868 call_srcu(&early_srcu, &shead, test_callback); 869 + rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 870 + if (!WARN_ON_ONCE(!rhp)) 871 + kfree_rcu(rhp, rh); 864 872 } 865 873 866 874 void rcu_early_boot_tests(void)
+1 -1
kernel/sysctl.c
··· 1268 1268 .proc_handler = proc_do_static_key, 1269 1269 }, 1270 1270 #endif 1271 - #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) 1271 + #if defined(CONFIG_TREE_RCU) 1272 1272 { 1273 1273 .procname = "panic_on_rcu_stall", 1274 1274 .data = &sysctl_panic_on_rcu_stall,
+1 -4
net/tipc/crypto.c
··· 257 257 #define tipc_aead_rcu_ptr(rcu_ptr, lock) \ 258 258 rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock)) 259 259 260 - #define tipc_aead_rcu_swap(rcu_ptr, ptr, lock) \ 261 - rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock)) 262 - 263 260 #define tipc_aead_rcu_replace(rcu_ptr, ptr, lock) \ 264 261 do { \ 265 262 typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr), \ ··· 1186 1189 1187 1190 /* Move passive key if any */ 1188 1191 if (key.passive) { 1189 - tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, &rx->lock); 1192 + tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, lockdep_is_held(&rx->lock)); 1190 1193 x = (key.passive - key.pending + new_pending) % KEY_MAX; 1191 1194 new_passive = (x <= 0) ? x + KEY_MAX : x; 1192 1195 }
+9 -2
tools/testing/selftests/rcutorture/bin/cpus2use.sh
··· 15 15 exit 0 16 16 fi 17 17 ncpus=`grep '^processor' /proc/cpuinfo | wc -l` 18 - idlecpus=`mpstat | tail -1 | \ 19 - awk -v ncpus=$ncpus '{ print ncpus * ($7 + $NF) / 100 }'` 18 + if mpstat -V > /dev/null 2>&1 19 + then 20 + idlecpus=`mpstat | tail -1 | \ 21 + awk -v ncpus=$ncpus '{ print ncpus * ($7 + $NF) / 100 }'` 22 + else 23 + # No mpstat command, so use all available CPUs. 24 + echo The mpstat command is not available, so greedily using all CPUs. 25 + idlecpus=$ncpus 26 + fi 20 27 awk -v ncpus=$ncpus -v idlecpus=$idlecpus < /dev/null ' 21 28 BEGIN { 22 29 cpus2use = idlecpus;
+22 -8
tools/testing/selftests/rcutorture/bin/jitter.sh
··· 23 23 24 24 n=1 25 25 26 - starttime=`awk 'BEGIN { print systime(); }' < /dev/null` 26 + starttime=`gawk 'BEGIN { print systime(); }' < /dev/null` 27 + 28 + nohotplugcpus= 29 + for i in /sys/devices/system/cpu/cpu[0-9]* 30 + do 31 + if test -f $i/online 32 + then 33 + : 34 + else 35 + curcpu=`echo $i | sed -e 's/^[^0-9]*//'` 36 + nohotplugcpus="$nohotplugcpus $curcpu" 37 + fi 38 + done 27 39 28 40 while : 29 41 do 30 42 # Check for done. 31 - t=`awk -v s=$starttime 'BEGIN { print systime() - s; }' < /dev/null` 43 + t=`gawk -v s=$starttime 'BEGIN { print systime() - s; }' < /dev/null` 32 44 if test "$t" -gt "$duration" 33 45 then 34 46 exit 0; 35 47 fi 36 48 37 49 # Set affinity to randomly selected online CPU 38 - cpus=`grep 1 /sys/devices/system/cpu/*/online | 39 - sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'` 40 - 41 - # Do not leave out poor old cpu0 which may not be hot-pluggable 42 - if [ ! -f "/sys/devices/system/cpu/cpu0/online" ]; then 43 - cpus="0 $cpus" 50 + if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 | 51 + sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'` 52 + then 53 + : 54 + else 55 + cpus= 44 56 fi 57 + # Do not leave out non-hot-pluggable CPUs 58 + cpus="$cpus $nohotplugcpus" 45 59 46 60 cpumask=`awk -v cpus="$cpus" -v me=$me -v n=$n 'BEGIN { 47 61 srand(n + me + systime());
+2 -1
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
··· 25 25 tail -1 | sed -e 's/^\[[ 0-9.]*] //' | 26 26 awk '{ print \"[\" $1 \" \" $5 \" \" $6 \" \" $7 \"]\"; }' | 27 27 tr -d '\012\015'`" 28 + fwdprog="`grep 'rcu_torture_fwd_prog_cr Duration' $i/console.log 2> /dev/null | sed -e 's/^\[[^]]*] //' | sort -k15nr | head -1 | awk '{ print $14 " " $15 }'`" 28 29 if test -z "$ngps" 29 30 then 30 31 echo "$configfile ------- " $stopstate ··· 40 39 BEGIN { print ngps / dur }' < /dev/null` 41 40 title="$title ($ngpsps/s)" 42 41 fi 43 - echo $title $stopstate 42 + echo $title $stopstate $fwdprog 44 43 nclosecalls=`grep --binary-files=text 'torture: Reader Batch' $i/console.log | tail -1 | awk '{for (i=NF-8;i<=NF;i++) sum+=$i; } END {print sum}'` 45 44 if test -z "$nclosecalls" 46 45 then
+6 -7
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
··· 123 123 boot_args=$6 124 124 125 125 cd $KVM 126 - kstarttime=`awk 'BEGIN { print systime() }' < /dev/null` 126 + kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` 127 127 if test -z "$TORTURE_BUILDONLY" 128 128 then 129 129 echo ' ---' `date`: Starting kernel ··· 133 133 qemu_args="-enable-kvm -nographic $qemu_args" 134 134 cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment` 135 135 cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` 136 - vcpus=`identify_qemu_vcpus` 137 - if test $cpu_count -gt $vcpus 136 + if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS" 138 137 then 139 - echo CPU count limited from $cpu_count to $vcpus | tee -a $resdir/Warnings 140 - cpu_count=$vcpus 138 + echo CPU count limited from $cpu_count to $TORTURE_ALLOTED_CPUS | tee -a $resdir/Warnings 139 + cpu_count=$TORTURE_ALLOTED_CPUS 141 140 fi 142 141 qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`" 143 142 ··· 176 177 then 177 178 qemu_pid=`cat "$resdir/qemu_pid"` 178 179 fi 179 - kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` 180 + kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` 180 181 if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1 181 182 then 182 183 if test $kruntime -ge $seconds ··· 212 213 oldline="`tail $resdir/console.log`" 213 214 while : 214 215 do 215 - kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` 216 + kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` 216 217 if kill -0 $qemu_pid > /dev/null 2>&1 217 218 then 218 219 :
+21 -9
tools/testing/selftests/rcutorture/bin/kvm.sh
··· 24 24 dryrun="" 25 25 KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM 26 26 PATH=${KVM}/bin:$PATH; export PATH 27 - TORTURE_ALLOTED_CPUS="" 27 + . functions.sh 28 + 29 + TORTURE_ALLOTED_CPUS="`identify_qemu_vcpus`" 28 30 TORTURE_DEFCONFIG=defconfig 29 31 TORTURE_BOOT_IMAGE="" 30 32 TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD ··· 41 39 cpus=0 42 40 ds=`date +%Y.%m.%d-%H:%M:%S` 43 41 jitter="-1" 44 - 45 - . functions.sh 46 42 47 43 usage () { 48 44 echo "Usage: $scriptname optional arguments:" ··· 93 93 checkarg --cpus "(number)" "$#" "$2" '^[0-9]*$' '^--' 94 94 cpus=$2 95 95 TORTURE_ALLOTED_CPUS="$2" 96 + max_cpus="`identify_qemu_vcpus`" 97 + if test "$TORTURE_ALLOTED_CPUS" -gt "$max_cpus" 98 + then 99 + TORTURE_ALLOTED_CPUS=$max_cpus 100 + fi 96 101 shift 97 102 ;; 98 103 --datestamp) ··· 203 198 204 199 CONFIGFRAG=${KVM}/configs/${TORTURE_SUITE}; export CONFIGFRAG 205 200 201 + defaultconfigs="`tr '\012' ' ' < $CONFIGFRAG/CFLIST`" 206 202 if test -z "$configs" 207 203 then 208 - configs="`cat $CONFIGFRAG/CFLIST`" 204 + configs=$defaultconfigs 209 205 fi 210 206 211 207 if test -z "$resdir" ··· 215 209 fi 216 210 217 211 # Create a file of test-name/#cpus pairs, sorted by decreasing #cpus. 
218 - touch $T/cfgcpu 212 + configs_derep= 219 213 for CF in $configs 220 214 do 221 215 case $CF in ··· 228 222 CF1=$CF 229 223 ;; 230 224 esac 225 + for ((cur_rep=0;cur_rep<$config_reps;cur_rep++)) 226 + do 227 + configs_derep="$configs_derep $CF1" 228 + done 229 + done 230 + touch $T/cfgcpu 231 + configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`" 232 + for CF1 in $configs_derep 233 + do 231 234 if test -f "$CONFIGFRAG/$CF1" 232 235 then 233 236 cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$CF1` 234 237 cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"` 235 238 cpu_count=`configfrag_boot_maxcpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"` 236 - for ((cur_rep=0;cur_rep<$config_reps;cur_rep++)) 237 - do 238 - echo $CF1 $cpu_count >> $T/cfgcpu 239 - done 239 + echo $CF1 $cpu_count >> $T/cfgcpu 240 240 else 241 241 echo "The --configs file $CF1 does not exist, terminating." 242 242 exit 1
+3 -52
tools/testing/selftests/rcutorture/bin/mkinitrd.sh
··· 20 20 exit 0 21 21 fi 22 22 23 - T=${TMPDIR-/tmp}/mkinitrd.sh.$$ 24 - trap 'rm -rf $T' 0 2 25 - mkdir $T 26 - 27 - cat > $T/init << '__EOF___' 28 - #!/bin/sh 29 - # Run in userspace a few milliseconds every second. This helps to 30 - # exercise the NO_HZ_FULL portions of RCU. The 192 instances of "a" was 31 - # empirically shown to give a nice multi-millisecond burst of user-mode 32 - # execution on a 2GHz CPU, as desired. Modern CPUs will vary from a 33 - # couple of milliseconds up to perhaps 100 milliseconds, which is an 34 - # acceptable range. 35 - # 36 - # Why not calibrate an exact delay? Because within this initrd, we 37 - # are restricted to Bourne-shell builtins, which as far as I know do not 38 - # provide any means of obtaining a fine-grained timestamp. 39 - 40 - a4="a a a a" 41 - a16="$a4 $a4 $a4 $a4" 42 - a64="$a16 $a16 $a16 $a16" 43 - a192="$a64 $a64 $a64" 44 - while : 45 - do 46 - q= 47 - for i in $a192 48 - do 49 - q="$q $i" 50 - done 51 - sleep 1 52 - done 53 - __EOF___ 54 - 55 - # Try using dracut to create initrd 56 - if command -v dracut >/dev/null 2>&1 57 - then 58 - echo Creating $D/initrd using dracut. 59 - # Filesystem creation 60 - dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img 61 - cd $D 62 - mkdir -p initrd 63 - cd initrd 64 - zcat $T/initramfs.img | cpio -id 65 - cp $T/init init 66 - chmod +x init 67 - echo Done creating $D/initrd using dracut 68 - exit 0 69 - fi 70 - 71 - # No dracut, so create a C-language initrd/init program and statically 72 - # link it. This results in a very small initrd, but might be a bit less 73 - # future-proof than dracut. 74 - echo "Could not find dracut, attempting C initrd" 23 + # Create a C-language initrd/init infinite-loop program and statically 24 + # link it. This results in a very small initrd. 25 + echo "Creating a statically linked C-language initrd" 75 26 cd $D 76 27 mkdir -p initrd 77 28 cd initrd