Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:

- kfree_rcu updates

- RCU tasks updates

- Read-side scalability tests

- SRCU updates

- Torture-test updates

- Documentation updates

- Miscellaneous fixes

* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
torture: Remove obsolete "cd $KVM"
torture: Avoid duplicate specification of qemu command
torture: Dump ftrace at shutdown only if requested
torture: Add kvm-tranform.sh script for qemu-cmd files
torture: Add more tracing crib notes to kvm.sh
torture: Improve diagnostic for KCSAN-incapable compilers
torture: Correctly summarize build-only runs
torture: Pass --kmake-arg to all make invocations
rcutorture: Check for unwatched readers
torture: Abstract out console-log error detection
torture: Add a stop-run capability
torture: Create qemu-cmd in --buildonly runs
rcu/rcutorture: Replace 0 with false
torture: Add --allcpus argument to the kvm.sh script
torture: Remove whitespace from identify_qemu_vcpus output
rcutorture: NULL rcu_torture_current earlier in cleanup code
rcutorture: Handle non-statistic bang-string error messages
torture: Set configfile variable to current scenario
rcutorture: Add races with task-exit processing
locktorture: Use true and false to assign to bool variables
...

+2443 -728
+6 -1
Documentation/RCU/Design/Requirements/Requirements.rst
···
      would need to be instructions following ``rcu_read_unlock()``. Although
      ``synchronize_rcu()`` would guarantee that execution reached the
      ``rcu_read_unlock()``, it would not be able to guarantee that execution
 -    had completely left the trampoline.
 +    had completely left the trampoline. Worse yet, in some situations
 +    the trampoline's protection must extend a few instructions *prior* to
 +    execution reaching the trampoline. For example, these few instructions
 +    might calculate the address of the trampoline, so that entering the
 +    trampoline would be pre-ordained a surprisingly long time before execution
 +    actually reached the trampoline itself.

      The solution, in the form of `Tasks
      RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side
+12 -5
Documentation/RCU/checklist.txt → Documentation/RCU/checklist.rst
···
 + .. SPDX-License-Identifier: GPL-2.0
 +
 + ================================
   Review Checklist for RCU Patches
 + ================================


   This document contains a checklist for producing and reviewing patches
···
       __rcu sparse checks to validate your RCU code.  These can help
       find problems as follows:

 -     CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
 +     CONFIG_PROVE_LOCKING:
 +             check that accesses to RCU-protected data
               structures are carried out under the proper RCU
               read-side critical section, while holding the right
               combination of locks, or whatever other conditions
               are appropriate.

 -     CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
 +     CONFIG_DEBUG_OBJECTS_RCU_HEAD:
 +             check that you don't pass the
               same object to call_rcu() (or friends) before an RCU
               grace period has elapsed since the last time that you
               passed that same object to call_rcu() (or friends).

 -     __rcu sparse checks: tag the pointer to the RCU-protected data
 +     __rcu sparse checks:
 +             tag the pointer to the RCU-protected data
               structure with __rcu, and sparse will warn you if you
               access that pointer without the services of one of the
               variants of rcu_dereference().
···
       You instead need to use one of the barrier functions:

 -     o       call_rcu() -> rcu_barrier()
 -     o       call_srcu() -> srcu_barrier()
 +     -       call_rcu() -> rcu_barrier()
 +     -       call_srcu() -> srcu_barrier()

       However, these barrier functions are absolutely -not- guaranteed
       to wait for a grace period.  In fact, if there are no call_rcu()
+9
Documentation/RCU/index.rst
···
 + .. SPDX-License-Identifier: GPL-2.0
 +
   .. _rcu_concepts:

   ============
···
      :maxdepth: 3

      arrayRCU
 +    checklist
 +    lockdep
 +    lockdep-splat
      rcubarrier
      rcu_dereference
      whatisRCU
      rcu
 +    rculist_nulls
 +    rcuref
 +    torture
 +    stallwarn
      listRCU
      NMI-RCU
      UP
+52 -47
Documentation/RCU/lockdep-splat.txt → Documentation/RCU/lockdep-splat.rst
···
 + .. SPDX-License-Identifier: GPL-2.0
 +
 + =================
 + Lockdep-RCU Splat
 + =================
 +
   Lockdep-RCU was added to the Linux kernel in early 2010
   (http://lwn.net/Articles/371986/). This facility checks for some common
   misuses of the RCU API, most notably using one of the rcu_dereference()
···
   being the real world and all that.

   So let's look at an example RCU lockdep splat from 3.0-rc5, one that
 - has long since been fixed:
 + has long since been fixed::

 - =============================
 - WARNING: suspicious RCU usage
 - -----------------------------
 - block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
 +   =============================
 +   WARNING: suspicious RCU usage
 +   -----------------------------
 +   block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!

 - other info that might help us debug this:
 + other info that might help us debug this::

 +   rcu_scheduler_active = 1, debug_locks = 0
 +   3 locks held by scsi_scan_6/1552:
 +    #0:  (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>] scsi_scan_host_selected+0x5a/0x150
 +    #1:  (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>] elevator_exit+0x22/0x60
 +    #2:  (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>] cfq_exit_queue+0x43/0x190

 - rcu_scheduler_active = 1, debug_locks = 0
 - 3 locks held by scsi_scan_6/1552:
 -  #0:  (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>] scsi_scan_host_selected+0x5a/0x150
 -  #1:  (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>] elevator_exit+0x22/0x60
 -  #2:  (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>] cfq_exit_queue+0x43/0x190

 +   stack backtrace:
 +   Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
 +   Call Trace:
 +    [<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
 +    [<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
 +    [<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
 +    [<ffffffff812a5046>] elevator_exit+0x36/0x60
 +    [<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
 +    [<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
 +    [<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
 +    [<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
 +    [<ffffffff817da069>] ? error_exit+0x29/0xb0
 +    [<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
 +    [<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
 +    [<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 +    [<ffffffff817da069>] ? error_exit+0x29/0xb0
 +    [<ffffffff812bcc60>] ? kobject_del+0x40/0x40
 +    [<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
 +    [<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
 +    [<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
 +    [<ffffffff8145f170>] do_scan_async+0x20/0x160
 +    [<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
 +    [<ffffffff810975b6>] kthread+0xa6/0xb0
 +    [<ffffffff817db154>] kernel_thread_helper+0x4/0x10
 +    [<ffffffff81066430>] ? finish_task_switch+0x80/0x110
 +    [<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
 +    [<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
 +    [<ffffffff817db150>] ? gs_change+0xb/0xb

 - stack backtrace:
 - Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
 - Call Trace:
 -  [<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
 -  [<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
 -  [<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
 -  [<ffffffff812a5046>] elevator_exit+0x36/0x60
 -  [<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
 -  [<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
 -  [<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
 -  [<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
 -  [<ffffffff817da069>] ? error_exit+0x29/0xb0
 -  [<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
 -  [<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
 -  [<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 -  [<ffffffff817da069>] ? error_exit+0x29/0xb0
 -  [<ffffffff812bcc60>] ? kobject_del+0x40/0x40
 -  [<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
 -  [<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
 -  [<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
 -  [<ffffffff8145f170>] do_scan_async+0x20/0x160
 -  [<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
 -  [<ffffffff810975b6>] kthread+0xa6/0xb0
 -  [<ffffffff817db154>] kernel_thread_helper+0x4/0x10
 -  [<ffffffff81066430>] ? finish_task_switch+0x80/0x110
 -  [<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
 -  [<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
 -  [<ffffffff817db150>] ? gs_change+0xb/0xb
 -
 - Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:
 + Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows::

        if (rcu_dereference(ioc->ioc_data) == cic) {
···
   And maybe that lock really does protect this reference. If so, the fix
   is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
   take the struct request_queue "q" from cfq_exit_queue() as an argument,
 - which would permit us to invoke rcu_dereference_protected as follows:
 + which would permit us to invoke rcu_dereference_protected as follows::

        if (rcu_dereference_protected(ioc->ioc_data,
                                      lockdep_is_held(&q->queue_lock)) == cic) {
···
   section. In this case, the critical section must span the use of the
   return value from rcu_dereference(), or at least until there is some
   reference count incremented or some such. One way to handle this is to
 - add rcu_read_lock() and rcu_read_unlock() as follows:
 + add rcu_read_lock() and rcu_read_unlock() as follows::

        rcu_read_lock();
        if (rcu_dereference(ioc->ioc_data) == cic) {
···
   But in this particular case, we don't actually dereference the pointer
   returned from rcu_dereference(). Instead, that pointer is just compared
   to the cic pointer, which means that the rcu_dereference() can be replaced
 - by rcu_access_pointer() as follows:
 + by rcu_access_pointer() as follows::

        if (rcu_access_pointer(ioc->ioc_data) == cic) {
+8 -4
Documentation/RCU/lockdep.txt → Documentation/RCU/lockdep.rst
···
 + .. SPDX-License-Identifier: GPL-2.0
 +
 + ========================
   RCU and lockdep checking
 + ========================

   All flavors of RCU have lockdep checking available, so that lockdep is
   aware of when each task enters and leaves any flavor of RCU read-side
···
   deadlocks and the like.

   In addition, RCU provides the following primitives that check lockdep's
 - state:
 + state::

        rcu_read_lock_held() for normal RCU.
        rcu_read_lock_bh_held() for RCU-bh.
···
   The rcu_dereference_check() check expression can be any boolean
   expression, but would normally include a lockdep expression. However,
   any boolean expression can be used. For a moderately ornate example,
 - consider the following:
 + consider the following::

        file = rcu_dereference_check(fdt->fd[fd],
                                     lockdep_is_held(&files->file_lock) ||
···
   any change from taking place, and finally, in case (3) the current task
   is the only task accessing the file_struct, again preventing any change
   from taking place. If the above statement was invoked only from updater
 - code, it could instead be written as follows:
 + code, it could instead be written as follows::

        file = rcu_dereference_protected(fdt->fd[fd],
                                         lockdep_is_held(&files->file_lock) ||
···
   For example, the workqueue for_each_pwq() macro is intended to be used
   either within an RCU read-side critical section or with wq->mutex held.
 - It is thus implemented as follows:
 + It is thus implemented as follows::

        #define for_each_pwq(pwq, wq)
                list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,
+200
Documentation/RCU/rculist_nulls.rst
···
 + .. SPDX-License-Identifier: GPL-2.0
 +
 + =================================================
 + Using RCU hlist_nulls to protect list and objects
 + =================================================
 +
 + This section describes how to use hlist_nulls to
 + protect read-mostly linked lists and
 + objects using SLAB_TYPESAFE_BY_RCU allocations.
 +
 + Please read the basics in Documentation/RCU/listRCU.rst
 +
 + Using 'nulls'
 + =============
 +
 + Using special markers (called 'nulls') is a convenient way
 + to solve the following problem:
 +
 + A typical RCU linked list managing objects which are
 + allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
 + use the following algos:
 +
 + 1) Lookup algo
 + --------------
 +
 + ::
 +
 +   rcu_read_lock()
 +   begin:
 +   obj = lockless_lookup(key);
 +   if (obj) {
 +     if (!try_get_ref(obj)) // might fail for free objects
 +       goto begin;
 +     /*
 +      * Because a writer could delete the object, and a writer could
 +      * reuse this object before the RCU grace period, we
 +      * must check the key after getting the reference on the object
 +      */
 +     if (obj->key != key) { // not the object we expected
 +       put_ref(obj);
 +       goto begin;
 +     }
 +   }
 +   rcu_read_unlock();
 +
 + Beware that lockless_lookup(key) cannot use the traditional
 + hlist_for_each_entry_rcu(), but needs a version with an additional
 + memory barrier (smp_rmb())::
 +
 +   lockless_lookup(key)
 +   {
 +     struct hlist_node *node, *next;
 +     for (pos = rcu_dereference((head)->first);
 +          pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
 +          ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
 +          pos = rcu_dereference(next))
 +       if (obj->key == key)
 +         return obj;
 +     return NULL;
 +   }
 +
 + And note that the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
 +
 +   struct hlist_node *node;
 +   for (pos = rcu_dereference((head)->first);
 +        pos && ({ prefetch(pos->next); 1; }) &&
 +        ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
 +        pos = rcu_dereference(pos->next))
 +     if (obj->key == key)
 +       return obj;
 +   return NULL;
 +
 + Quoting Corey Minyard::
 +
 +   "If the object is moved from one list to another list in-between the
 +   time the hash is calculated and the next field is accessed, and the
 +   object has moved to the end of a new list, the traversal will not
 +   complete properly on the list it should have, since the object will
 +   be on the end of the new list and there's not a way to tell it's on a
 +   new list and restart the list traversal. I think that this can be
 +   solved by pre-fetching the "next" field (with proper barriers) before
 +   checking the key."
 +
 + 2) Insert algo
 + --------------
 +
 + We need to make sure a reader cannot read the new 'obj->obj_next' value
 + together with the previous value of 'obj->key'. Otherwise, an item could
 + be deleted from one chain and inserted into another chain. If the new
 + chain was empty before the move, the 'next' pointer is NULL, and a
 + lockless reader cannot detect that it missed the following items in the
 + original chain.
 +
 + ::
 +
 +   /*
 +    * Please note that new inserts are done at the head of list,
 +    * not in the middle or end.
 +    */
 +   obj = kmem_cache_alloc(...);
 +   lock_chain(); // typically a spin_lock()
 +   obj->key = key;
 +   /*
 +    * we need to make sure obj->key is updated before obj->next
 +    * or obj->refcnt
 +    */
 +   smp_wmb();
 +   atomic_set(&obj->refcnt, 1);
 +   hlist_add_head_rcu(&obj->obj_node, list);
 +   unlock_chain(); // typically a spin_unlock()
 +
 +
 + 3) Remove algo
 + --------------
 + Nothing special here, we can use a standard RCU hlist deletion.
 + But thanks to SLAB_TYPESAFE_BY_RCU, beware that a deleted object can be
 + reused very quickly (before the end of the RCU grace period)::
 +
 +   if (put_last_reference_on(obj)) {
 +     lock_chain(); // typically a spin_lock()
 +     hlist_del_init_rcu(&obj->obj_node);
 +     unlock_chain(); // typically a spin_unlock()
 +     kmem_cache_free(cachep, obj);
 +   }
 +
 +
 +
 + --------------------------------------------------------------------------
 +
 + Avoiding extra smp_rmb()
 + ========================
 +
 + With hlist_nulls we can avoid the extra smp_rmb() in lockless_lookup()
 + and the extra smp_wmb() in the insert function.
 +
 + For example, if we choose to store the slot number as the 'nulls'
 + end-of-list marker for each slot of the hash table, we can detect
 + a race (some writer did a delete and/or a move of an object
 + to another chain) by checking the final 'nulls' value if
 + the lookup met the end of chain. If the final 'nulls' value
 + is not the slot number, then we must restart the lookup at
 + the beginning. If the object was moved to the same chain,
 + then the reader doesn't care: it might eventually
 + scan the list again without harm.
 +
 +
 + 1) lookup algo
 + --------------
 +
 + ::
 +
 +   head = &table[slot];
 +   rcu_read_lock();
 +   begin:
 +   hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
 +     if (obj->key == key) {
 +       if (!try_get_ref(obj)) // might fail for free objects
 +         goto begin;
 +       if (obj->key != key) { // not the object we expected
 +         put_ref(obj);
 +         goto begin;
 +       }
 +       goto out;
 +     }
 +   }
 +   /*
 +    * if the nulls value we got at the end of this lookup is
 +    * not the expected one, we must restart lookup.
 +    * We probably met an item that was moved to another chain.
 +    */
 +   if (get_nulls_value(node) != slot)
 +     goto begin;
 +   obj = NULL;
 +
 +   out:
 +   rcu_read_unlock();
 +
 + 2) Insert function
 + ------------------
 +
 + ::
 +
 +   /*
 +    * Please note that new inserts are done at the head of list,
 +    * not in the middle or end.
 +    */
 +   obj = kmem_cache_alloc(cachep);
 +   lock_chain(); // typically a spin_lock()
 +   obj->key = key;
 +   /*
 +    * changes to obj->key must be visible before refcnt one
 +    */
 +   smp_wmb();
 +   atomic_set(&obj->refcnt, 1);
 +   /*
 +    * insert obj in RCU way (readers might be traversing chain)
 +    */
 +   hlist_nulls_add_head_rcu(&obj->obj_node, list);
 +   unlock_chain(); // typically a spin_unlock()
-172
Documentation/RCU/rculist_nulls.txt
+158
Documentation/RCU/rcuref.rst
···
 + .. SPDX-License-Identifier: GPL-2.0
 +
 + ====================================================================
 + Reference-count design for elements of lists/arrays protected by RCU
 + ====================================================================
 +
 +
 + Please note that the percpu-ref feature is likely your first
 + stop if you need to combine reference counts and RCU. Please see
 + include/linux/percpu-refcount.h for more information. However, in
 + those unusual cases where percpu-ref would consume too much memory,
 + please read on.
 +
 + ------------------------------------------------------------------------
 +
 + Reference counting on elements of lists which are protected by traditional
 + reader/writer spinlocks or semaphores is straightforward:
 +
 + CODE LISTING A::
 +
 +     1.                                  2.
 +     add()                               search_and_reference()
 +     {                                   {
 +         alloc_object                        read_lock(&list_lock);
 +         ...                                 search_for_element
 +         atomic_set(&el->rc, 1);             atomic_inc(&el->rc);
 +         write_lock(&list_lock);             ...
 +         add_element                         read_unlock(&list_lock);
 +         ...                                 ...
 +         write_unlock(&list_lock);       }
 +     }
 +
 +     3.                                  4.
 +     release_referenced()                delete()
 +     {                                   {
 +         ...                                 write_lock(&list_lock);
 +         if (atomic_dec_and_test(&el->rc))   ...
 +             kfree(el);
 +         ...                                 remove_element
 +     }                                       write_unlock(&list_lock);
 +                                             ...
 +                                             if (atomic_dec_and_test(&el->rc))
 +                                                 kfree(el);
 +                                             ...
 +                                         }
 +
 + If this list/array is made lock free using RCU, as in changing the
 + write_lock() in add() and delete() to spin_lock() and changing read_lock()
 + in search_and_reference() to rcu_read_lock(), the atomic_inc() in
 + search_and_reference() could potentially hold a reference to an element
 + which has already been deleted from the list/array. Use
 + atomic_inc_not_zero() in this scenario as follows:
 +
 + CODE LISTING B::
 +
 +     1.                                  2.
 +     add()                               search_and_reference()
 +     {                                   {
 +         alloc_object                        rcu_read_lock();
 +         ...                                 search_for_element
 +         atomic_set(&el->rc, 1);             if (!atomic_inc_not_zero(&el->rc)) {
 +         spin_lock(&list_lock);                  rcu_read_unlock();
 +                                                 return FAIL;
 +         add_element                         }
 +         ...                                 ...
 +         spin_unlock(&list_lock);            rcu_read_unlock();
 +     }                                   }
 +
 +     3.                                  4.
 +     release_referenced()                delete()
 +     {                                   {
 +         ...                                 spin_lock(&list_lock);
 +         if (atomic_dec_and_test(&el->rc))   ...
 +             call_rcu(&el->head, el_free);   remove_element
 +         ...                                 spin_unlock(&list_lock);
 +     }                                       ...
 +                                             if (atomic_dec_and_test(&el->rc))
 +                                                 call_rcu(&el->head, el_free);
 +                                             ...
 +                                         }
 +
 + Sometimes, a reference to the element needs to be obtained in the
 + update (write) stream. In such cases, atomic_inc_not_zero() might be
 + overkill, since we hold the update-side spinlock. One might instead
 + use atomic_inc() in such cases.
 +
 + It is not always convenient to deal with "FAIL" in the
 + search_and_reference() code path. In such cases, the
 + atomic_dec_and_test() may be moved from delete() to el_free()
 + as follows:
 +
 + CODE LISTING C::
 +
 +     1.                                  2.
 +     add()                               search_and_reference()
 +     {                                   {
 +         alloc_object                        rcu_read_lock();
 +         ...                                 search_for_element
 +         atomic_set(&el->rc, 1);             atomic_inc(&el->rc);
 +         spin_lock(&list_lock);              ...
 +
 +         add_element                         rcu_read_unlock();
 +         ...                             }
 +         spin_unlock(&list_lock);        4.
 +     }                                   delete()
 +     3.                                  {
 +     release_referenced()                    spin_lock(&list_lock);
 +     {                                       ...
 +         ...                                 remove_element
 +         if (atomic_dec_and_test(&el->rc))   spin_unlock(&list_lock);
 +             kfree(el);                      ...
 +         ...                                 call_rcu(&el->head, el_free);
 +     }                                       ...
 +     5.                                  }
 +     void el_free(struct rcu_head *rhp)
 +     {
 +         release_referenced();
 +     }
 +
 + The key point is that the initial reference added by add() is not removed
 + until after a grace period has elapsed following removal. This means that
 + search_and_reference() cannot find this element, which means that the value
 + of el->rc cannot increase. Thus, once it reaches zero, there are no
 + readers that can or ever will be able to reference the element. The
 + element can therefore safely be freed. This in turn guarantees that if
 + any reader finds the element, that reader may safely acquire a reference
 + without checking the value of the reference counter.
 +
 + A clear advantage of the RCU-based pattern in listing C over the one
 + in listing B is that any call to search_and_reference() that locates
 + a given object will succeed in obtaining a reference to that object,
 + even given a concurrent invocation of delete() for that same object.
 + Similarly, a clear advantage of both listings B and C over listing A is
 + that a call to delete() is not delayed even if there are an arbitrarily
 + large number of calls to search_and_reference() searching for the same
 + object that delete() was invoked on. Instead, all that is delayed is
 + the eventual invocation of kfree(), which is usually not a problem on
 + modern computer systems, even the small ones.
 +
 + In cases where delete() can sleep, synchronize_rcu() can be called from
 + delete(), so that el_free() can be subsumed into delete as follows::
 +
 +     4.
 +     delete()
 +     {
 +         spin_lock(&list_lock);
 +         ...
 +         remove_element
 +         spin_unlock(&list_lock);
 +         ...
 +         synchronize_rcu();
 +         if (atomic_dec_and_test(&el->rc))
 +             kfree(el);
 +         ...
 +     }
 +
 + As additional examples in the kernel, the pattern in listing C is used by
 + reference counting of struct pid, while the pattern in listing B is used by
 + struct posix_acl.
-151
Documentation/RCU/rcuref.txt
+41 -21
Documentation/RCU/stallwarn.txt Documentation/RCU/stallwarn.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================== 1 4 Using RCU's CPU Stall Detector 5 + ============================== 2 6 3 7 This document first discusses what sorts of issues RCU's CPU stall 4 8 detector can locate, and then discusses kernel parameters and Kconfig ··· 11 7 12 8 13 9 What Causes RCU CPU Stall Warnings? 10 + =================================== 14 11 15 12 So your kernel printed an RCU CPU stall warning. The next question is 16 13 "What caused it?" The following problems can result in RCU CPU stall 17 14 warnings: 18 15 19 - o A CPU looping in an RCU read-side critical section. 16 + - A CPU looping in an RCU read-side critical section. 20 17 21 - o A CPU looping with interrupts disabled. 18 + - A CPU looping with interrupts disabled. 22 19 23 - o A CPU looping with preemption disabled. 20 + - A CPU looping with preemption disabled. 24 21 25 - o A CPU looping with bottom halves disabled. 22 + - A CPU looping with bottom halves disabled. 26 23 27 - o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel 24 + - For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel 28 25 without invoking schedule(). If the looping in the kernel is 29 26 really expected and desirable behavior, you might need to add 30 27 some calls to cond_resched(). 31 28 32 - o Booting Linux using a console connection that is too slow to 29 + - Booting Linux using a console connection that is too slow to 33 30 keep up with the boot-time console-message rate. For example, 34 31 a 115Kbaud serial console can be -way- too slow to keep up 35 32 with boot-time message rates, and will frequently result in 36 33 RCU CPU stall warning messages. Especially if you have added 37 34 debug printk()s. 38 35 39 - o Anything that prevents RCU's grace-period kthreads from running. 36 + - Anything that prevents RCU's grace-period kthreads from running. 40 37 This can result in the "All QSes seen" console-log message. 
41 38 This message will include information on when the kthread last 42 39 ran and how often it should be expected to run. It can also 43 - result in the "rcu_.*kthread starved for" console-log message, 40 + result in the ``rcu_.*kthread starved for`` console-log message, 44 41 which will include additional debugging information. 45 42 46 - o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might 43 + - A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might 47 44 happen to preempt a low-priority task in the middle of an RCU 48 45 read-side critical section. This is especially damaging if 49 46 that low-priority task is not permitted to run on any other CPU, ··· 53 48 While the system is in the process of running itself out of 54 49 memory, you might see stall-warning messages. 55 50 56 - o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that 51 + - A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that 57 52 is running at a higher priority than the RCU softirq threads. 58 53 This will prevent RCU callbacks from ever being invoked, 59 54 and in a CONFIG_PREEMPT_RCU kernel will further prevent ··· 68 63 can increase your system's context-switch rate and thus degrade 69 64 performance. 70 65 71 - o A periodic interrupt whose handler takes longer than the time 66 + - A periodic interrupt whose handler takes longer than the time 72 67 interval between successive pairs of interrupts. This can 73 68 prevent RCU's kthreads and softirq handlers from running. 74 69 Note that certain high-overhead debugging options, for example ··· 76 71 considerably longer than normal, which can in turn result in 77 72 RCU CPU stall warnings. 78 73 79 - o Testing a workload on a fast system, tuning the stall-warning 74 + - Testing a workload on a fast system, tuning the stall-warning 80 75 timeout down to just barely avoid RCU CPU stall warnings, and then 81 76 running the same workload with the same stall-warning timeout on a 82 77 slow system. 
Note that thermal throttling and on-demand governors 83 78 can cause a single system to be sometimes fast and sometimes slow! 84 79 85 - o A hardware or software issue shuts off the scheduler-clock 80 + - A hardware or software issue shuts off the scheduler-clock 86 81 interrupt on a CPU that is not in dyntick-idle mode. This 87 82 problem really has happened, and seems to be most likely to 88 83 result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels. 89 84 90 - o A bug in the RCU implementation. 85 + - A hardware or software issue that prevents time-based wakeups 86 + from occurring. These issues can range from misconfigured or 87 + buggy timer hardware through bugs in the interrupt or exception 88 + path (whether hardware, firmware, or software) through bugs 89 + in Linux's timer subsystem through bugs in the scheduler, and, 90 + yes, even including bugs in RCU itself. 91 91 92 - o A hardware failure. This is quite unlikely, but has occurred 92 + - A bug in the RCU implementation. 93 + 94 + - A hardware failure. This is quite unlikely, but has occurred 93 95 at least once in real life. A CPU failed in a running system, 94 96 becoming unresponsive, but not causing an immediate crash. 95 97 This resulted in a series of RCU CPU stall warnings, eventually ··· 121 109 122 110 123 111 Fine-Tuning the RCU CPU Stall Detector 112 + ====================================== 124 113 125 114 The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's 126 115 CPU stall detector, which detects conditions that unduly delay RCU grace ··· 131 118 controlled by a set of kernel configuration variables and cpp macros: 132 119 133 120 CONFIG_RCU_CPU_STALL_TIMEOUT 121 + ---------------------------- 134 122 135 123 This kernel configuration parameter defines the period of time 136 124 that RCU will wait from the beginning of a grace period until it ··· 151 137 /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress. 
152 138 153 139 RCU_STALL_DELAY_DELTA 140 + --------------------- 154 141 155 142 Although the lockdep facility is extremely useful, it does add 156 143 some overhead. Therefore, under CONFIG_PROVE_RCU, the ··· 160 145 macro, not a kernel configuration parameter.) 161 146 162 147 RCU_STALL_RAT_DELAY 148 + ------------------- 163 149 164 150 The CPU stall detector tries to make the offending CPU print its 165 151 own warnings, as this often gives better-quality stack traces. ··· 171 155 parameter.) 172 156 173 157 rcupdate.rcu_task_stall_timeout 158 + ------------------------------- 174 159 175 160 This boot/sysfs parameter controls the RCU-tasks stall warning 176 161 interval. A value of zero or less suppresses RCU-tasks stall ··· 185 168 186 169 187 170 Interpreting RCU's CPU Stall-Detector "Splats" 171 + ============================================== 188 172 189 173 For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling, 190 - it will print a message similar to the following: 174 + it will print a message similar to the following:: 191 175 192 176 INFO: rcu_sched detected stalls on CPUs/tasks: 193 177 2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0 ··· 241 223 (625 in this case). 
242 224 243 225 In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed 244 - for each CPU: 226 + for each CPU:: 245 227 246 228 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1 247 229 ··· 253 235 254 236 If the grace period ends just as the stall warning starts printing, 255 237 there will be a spurious stall-warning message, which will include 256 - the following: 238 + the following:: 257 239 258 240 INFO: Stall ended before state dump start 259 241 ··· 266 248 267 249 If all CPUs and tasks have passed through quiescent states, but the 268 250 grace period has nevertheless failed to end, the stall-warning splat 269 - will include something like the following: 251 + will include something like the following:: 270 252 271 253 All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0 272 254 ··· 279 261 280 262 If the relevant grace-period kthread has been unable to run prior to 281 263 the stall warning, as was the case in the "All QSes seen" line above, 282 - the following additional line is printed: 264 + the following additional line is printed:: 283 265 284 266 kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5 285 267 ··· 294 276 295 277 296 278 Multiple Warnings From One Stall 279 + ================================ 297 280 298 281 If a stall lasts long enough, multiple stall-warning messages will be 299 282 printed for it. The second and subsequent messages are printed at ··· 304 285 305 286 306 287 Stall Warnings for Expedited Grace Periods 288 + ========================================== 307 289 308 290 If an expedited grace period detects a stall, it will place a message 309 - like the following in dmesg: 291 + like the following in dmesg:: 310 292 311 293 INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/. 312 294
+63 -52
Documentation/RCU/torture.txt Documentation/RCU/torture.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ========================== 1 4 RCU Torture Test Operation 5 + ========================== 2 6 3 7 4 8 CONFIG_RCU_TORTURE_TEST 9 + ======================= 5 10 6 11 The CONFIG_RCU_TORTURE_TEST config option is available for all RCU 7 12 implementations. It creates an rcutorture kernel module that can ··· 18 13 Module parameters are prefixed by "rcutorture." in 19 14 Documentation/admin-guide/kernel-parameters.txt. 20 15 21 - OUTPUT 16 + Output 17 + ====== 22 18 23 - The statistics output is as follows: 19 + The statistics output is as follows:: 24 20 25 21 rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 26 22 rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767 ··· 42 36 43 37 The entries are as follows: 44 38 45 - o "rtc": The hexadecimal address of the structure currently visible 39 + * "rtc": The hexadecimal address of the structure currently visible 46 40 to readers. 47 41 48 - o "ver": The number of times since boot that the RCU writer task 42 + * "ver": The number of times since boot that the RCU writer task 49 43 has changed the structure visible to readers. 50 44 51 - o "tfle": If non-zero, indicates that the "torture freelist" 45 + * "tfle": If non-zero, indicates that the "torture freelist" 52 46 containing structures to be placed into the "rtc" area is empty. 53 47 This condition is important, since it can fool you into thinking 54 48 that RCU is working when it is not. :-/ 55 49 56 - o "rta": Number of structures allocated from the torture freelist. 50 + * "rta": Number of structures allocated from the torture freelist. 
57 51 58 - o "rtaf": Number of allocations from the torture freelist that have 52 + * "rtaf": Number of allocations from the torture freelist that have 59 53 failed due to the list being empty. It is not unusual for this 60 54 to be non-zero, but it is bad for it to be a large fraction of 61 55 the value indicated by "rta". 62 56 63 - o "rtf": Number of frees into the torture freelist. 57 + * "rtf": Number of frees into the torture freelist. 64 58 65 - o "rtmbe": A non-zero value indicates that rcutorture believes that 59 + * "rtmbe": A non-zero value indicates that rcutorture believes that 66 60 rcu_assign_pointer() and rcu_dereference() are not working 67 61 correctly. This value should be zero. 68 62 69 - o "rtbe": A non-zero value indicates that one of the rcu_barrier() 63 + * "rtbe": A non-zero value indicates that one of the rcu_barrier() 70 64 family of functions is not working correctly. 71 65 72 - o "rtbke": rcutorture was unable to create the real-time kthreads 66 + * "rtbke": rcutorture was unable to create the real-time kthreads 73 67 used to force RCU priority inversion. This value should be zero. 74 68 75 - o "rtbre": Although rcutorture successfully created the kthreads 69 + * "rtbre": Although rcutorture successfully created the kthreads 76 70 used to force RCU priority inversion, it was unable to set them 77 71 to the real-time priority level of 1. This value should be zero. 78 72 79 - o "rtbf": The number of times that RCU priority boosting failed 73 + * "rtbf": The number of times that RCU priority boosting failed 80 74 to resolve RCU priority inversion. 81 75 82 - o "rtb": The number of times that rcutorture attempted to force 76 + * "rtb": The number of times that rcutorture attempted to force 83 77 an RCU priority inversion condition. If you are testing RCU 84 78 priority boosting via the "test_boost" module parameter, this 85 79 value should be non-zero. 
86 80 87 - o "nt": The number of times rcutorture ran RCU read-side code from 81 + * "nt": The number of times rcutorture ran RCU read-side code from 88 82 within a timer handler. This value should be non-zero only 89 83 if you specified the "irqreader" module parameter. 90 84 91 - o "Reader Pipe": Histogram of "ages" of structures seen by readers. 85 + * "Reader Pipe": Histogram of "ages" of structures seen by readers. 92 86 If any entries past the first two are non-zero, RCU is broken. 93 87 And rcutorture prints the error flag string "!!!" to make sure 94 88 you notice. The age of a newly allocated structure is zero, ··· 100 94 RCU. If you want to see what it looks like when broken, break 101 95 it yourself. ;-) 102 96 103 - o "Reader Batch": Another histogram of "ages" of structures seen 97 + * "Reader Batch": Another histogram of "ages" of structures seen 104 98 by readers, but in terms of counter flips (or batches) rather 105 99 than in terms of grace periods. The legal number of non-zero 106 100 entries is again two. The reason for this separate view is that 107 101 it is sometimes easier to get the third entry to show up in the 108 102 "Reader Batch" list than in the "Reader Pipe" list. 109 103 110 - o "Free-Block Circulation": Shows the number of torture structures 104 + * "Free-Block Circulation": Shows the number of torture structures 111 105 that have reached a given point in the pipeline. The first element 112 106 should closely correspond to the number of structures allocated, 113 107 the second to the number that have been removed from reader view, ··· 118 112 119 113 Different implementations of RCU can provide implementation-specific 120 114 additional information. 
For example, Tree SRCU provides the following 121 - additional line: 115 + additional line:: 122 116 123 117 srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6) 124 118 ··· 129 123 "old" and "current" values to the underlying array, and is useful for 130 124 debugging. The final "T" entry contains the totals of the counters. 131 125 132 - 133 - USAGE ON SPECIFIC KERNEL BUILDS 126 + Usage on Specific Kernel Builds 127 + =============================== 134 128 135 129 It is sometimes desirable to torture RCU on a specific kernel build, 136 130 for example, when preparing to put that kernel build into production. 137 131 In that case, the kernel should be built with CONFIG_RCU_TORTURE_TEST=m 138 132 so that the test can be started using modprobe and terminated using rmmod. 139 133 140 - For example, the following script may be used to torture RCU: 134 + For example, the following script may be used to torture RCU:: 141 135 142 136 #!/bin/sh 143 137 ··· 154 148 were no RCU failures, CPU-hotplug problems were detected. 155 149 156 150 157 - USAGE ON MAINLINE KERNELS 151 + Usage on Mainline Kernels 152 + ========================= 158 153 159 154 When using rcutorture to test changes to RCU itself, it is often 160 155 necessary to build a number of kernels in order to test that change ··· 187 180 --configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'". 188 181 Large systems can run multiple copies of the full set of scenarios, 189 182 for example, a system with 448 hardware threads can run five instances 190 - of the full set concurrently. To make this happen: 183 + of the full set concurrently. 
To make this happen:: 191 184 192 185 kvm.sh --cpus 448 --configs '5*CFLIST' 193 186 194 187 Alternatively, such a system can run 56 concurrent instances of a single 195 - eight-CPU scenario: 188 + eight-CPU scenario:: 196 189 197 190 kvm.sh --cpus 448 --configs '56*TREE04' 198 191 199 - Or 28 concurrent instances of each of two eight-CPU scenarios: 192 + Or 28 concurrent instances of each of two eight-CPU scenarios:: 200 193 201 194 kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04' 202 195 ··· 206 199 using the --bootargs parameter discussed below. 207 200 208 201 Sometimes additional debugging is useful, and in such cases the --kconfig 209 - parameter to kvm.sh may be used, for example, "--kconfig 'CONFIG_KASAN=y'". 202 + parameter to kvm.sh may be used, for example, ``--kconfig 'CONFIG_KASAN=y'``. 210 203 211 204 Kernel boot arguments can also be supplied, for example, to control 212 205 rcutorture's module parameters. For example, to test a change to RCU's 213 206 CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'". 214 207 This will of course result in the scripting reporting a failure, namely 215 208 the resulting RCU CPU stall warning. As noted above, reducing memory may 216 - require disabling rcutorture's callback-flooding tests: 209 + require disabling rcutorture's callback-flooding tests:: 217 210 218 211 kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \ 219 212 --bootargs 'rcutorture.fwd_progress=0' ··· 232 225 to a file. The build products and console output of each run are kept in 233 226 tools/testing/selftests/rcutorture/res in timestamped directories. A 234 227 given directory can be supplied to kvm-find-errors.sh in order to have 235 - it cycle you through summaries of errors and full error logs. For example: 228 + it cycle you through summaries of errors and full error logs. 
For example:: 236 229 237 230 tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \ 238 231 tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23 ··· 252 245 253 246 The most frequently used files in each per-scenario-run directory are: 254 247 255 - .config: This file contains the Kconfig options. 248 + .config: 249 + This file contains the Kconfig options. 256 250 257 - Make.out: This contains build output for a specific scenario. 251 + Make.out: 252 + This contains build output for a specific scenario. 258 253 259 - console.log: This contains the console output for a specific scenario. 254 + console.log: 255 + This contains the console output for a specific scenario. 260 256 This file may be examined once the kernel has booted, but 261 257 it might not exist if the build failed. 262 258 263 - vmlinux: This contains the kernel, which can be useful with tools like 259 + vmlinux: 260 + This contains the kernel, which can be useful with tools like 264 261 objdump and gdb. 265 262 266 263 A number of additional files are available, but are less frequently used. 267 264 Many are intended for debugging of rcutorture itself or of its scripting. 
268 265 269 266 As of v5.4, a successful run with the default set of scenarios produces 270 - the following summary at the end of the run on a 12-CPU system: 267 + the following summary at the end of the run on a 12-CPU system:: 271 268 272 - SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ] 273 - SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ] 274 - SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ] 275 - SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ] 276 - TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ] 277 - TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ] 278 - TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ] 279 - TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198 280 - TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631 281 - TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ] 282 - TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844 283 - TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497 284 - CPU count limited from 16 to 12 285 - TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961 286 - TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997 287 - TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732 288 - CPU count limited from 16 to 12 289 - TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011 269 + SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ] 270 + SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ] 271 + SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ] 272 + SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ] 273 + TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ] 274 + TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ] 275 + TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ] 276 + TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 
34198 277 + TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631 278 + TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ] 279 + TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844 280 + TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497 281 + CPU count limited from 16 to 12 282 + TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961 283 + TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997 284 + TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732 285 + CPU count limited from 16 to 12 286 + TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
+68
Documentation/admin-guide/kernel-parameters.txt
··· 4038 4038 latencies, which will choose a value aligned 4039 4039 with the appropriate hardware boundaries. 4040 4040 4041 + rcutree.rcu_min_cached_objs= [KNL] 4042 + Minimum number of objects which are cached and 4043 + maintained per one CPU. Object size is equal 4044 + to PAGE_SIZE. The cache allows to reduce the 4045 + pressure to page allocator, also it makes the 4046 + whole algorithm to behave better in low memory 4047 + condition. 4048 + 4041 4049 rcutree.jiffies_till_first_fqs= [KNL] 4042 4050 Set delay from grace-period initialization to 4043 4051 first attempt to force quiescent states. ··· 4266 4258 Set time (jiffies) between CPU-hotplug operations, 4267 4259 or zero to disable CPU-hotplug testing. 4268 4260 4261 + rcutorture.read_exit= [KNL] 4262 + Set the number of read-then-exit kthreads used 4263 + to test the interaction of RCU updaters and 4264 + task-exit processing. 4265 + 4266 + rcutorture.read_exit_burst= [KNL] 4267 + The number of times in a given read-then-exit 4268 + episode that a set of read-then-exit kthreads 4269 + is spawned. 4270 + 4271 + rcutorture.read_exit_delay= [KNL] 4272 + The delay, in seconds, between successive 4273 + read-then-exit testing episodes. 4274 + 4269 4275 rcutorture.shuffle_interval= [KNL] 4270 4276 Set task-shuffle interval (s). Shuffling tasks 4271 4277 allows some CPUs to go into dyntick-idle mode ··· 4428 4406 reboot_force is either force or not specified, 4429 4407 reboot_cpu is s[mp]#### with #### being the processor 4430 4408 to be used for rebooting. 4409 + 4410 + refscale.holdoff= [KNL] 4411 + Set test-start holdoff period. The purpose of 4412 + this parameter is to delay the start of the 4413 + test until boot completes in order to avoid 4414 + interference. 4415 + 4416 + refscale.loops= [KNL] 4417 + Set the number of loops over the synchronization 4418 + primitive under test. 
Increasing this number 4419 + reduces noise due to loop start/end overhead, 4420 + but the default has already reduced the per-pass 4421 + noise to a handful of picoseconds on ca. 2020 4422 + x86 laptops. 4423 + 4424 + refscale.nreaders= [KNL] 4425 + Set number of readers. The default value of -1 4426 + selects N, where N is roughly 75% of the number 4427 + of CPUs. A value of zero is an interesting choice. 4428 + 4429 + refscale.nruns= [KNL] 4430 + Set number of runs, each of which is dumped onto 4431 + the console log. 4432 + 4433 + refscale.readdelay= [KNL] 4434 + Set the read-side critical-section duration, 4435 + measured in microseconds. 4436 + 4437 + refscale.scale_type= [KNL] 4438 + Specify the read-protection implementation to test. 4439 + 4440 + refscale.shutdown= [KNL] 4441 + Shut down the system at the end of the performance 4442 + test. This defaults to 1 (shut it down) when 4443 + rcuperf is built into the kernel and to 0 (leave 4444 + it running) when rcuperf is built as a module. 4445 + 4446 + refscale.verbose= [KNL] 4447 + Enable additional printk() statements. 4431 4448 4432 4449 relax_domain_level= 4433 4450 [KNL, SMP] Set scheduler's default relax_domain_level. ··· 5142 5081 torture.disable_onoff_at_boot= [KNL] 5143 5082 Prevent the CPU-hotplug component of torturing 5144 5083 until after init has spawned. 5084 + 5085 + torture.ftrace_dump_at_shutdown= [KNL] 5086 + Dump the ftrace buffer at torture-test shutdown, 5087 + even if there were no errors. This can be a 5088 + very costly operation when many torture tests 5089 + are running concurrently, especially on systems 5090 + with rotating-rust storage. 5145 5091 5146 5092 tp720= [HW,PS2] 5147 5093
+1 -1
Documentation/locking/locktorture.rst
··· 166 166 two are self-explanatory, while the last indicates that while there 167 167 were no locking failures, CPU-hotplug problems were detected. 168 168 169 - Also see: Documentation/RCU/torture.txt 169 + Also see: Documentation/RCU/torture.rst
+2 -2
MAINTAINERS
··· 14449 14449 F: Documentation/RCU/ 14450 14450 F: include/linux/rcu* 14451 14451 F: kernel/rcu/ 14452 - X: Documentation/RCU/torture.txt 14452 + X: Documentation/RCU/torture.rst 14453 14453 X: include/linux/srcu*.h 14454 14454 X: kernel/rcu/srcu*.c 14455 14455 ··· 17301 17301 L: linux-kernel@vger.kernel.org 17302 17302 S: Supported 17303 17303 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev 17304 - F: Documentation/RCU/torture.txt 17304 + F: Documentation/RCU/torture.rst 17305 17305 F: kernel/locking/locktorture.c 17306 17306 F: kernel/rcu/rcuperf.c 17307 17307 F: kernel/rcu/rcutorture.c
+2
fs/btrfs/extent_io.c
··· 4541 4541 4542 4542 /* once for us */ 4543 4543 free_extent_map(em); 4544 + 4545 + cond_resched(); /* Allow large-extent preemption. */ 4544 4546 } 4545 4547 } 4546 4548 return try_release_extent_state(tree, page, mask);
+1 -1
include/linux/rculist.h
··· 512 512 * @right: The hlist head on the right 513 513 * 514 514 * The lists start out as [@left ][node1 ... ] and 515 - [@right ][node2 ... ] 515 + * [@right ][node2 ... ] 516 516 * The lists end up as [@left ][node2 ... ] 517 517 * [@right ][node1 ... ] 518 518 */
+1 -1
include/linux/rculist_nulls.h
··· 162 162 * The barrier() is needed to make sure compiler doesn't cache first element [1], 163 163 * as this loop can be restarted [2] 164 164 * [1] Documentation/core-api/atomic_ops.rst around line 114 165 - * [2] Documentation/RCU/rculist_nulls.txt around line 146 165 + * [2] Documentation/RCU/rculist_nulls.rst around line 146 166 166 */ 167 167 #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \ 168 168 for (({barrier();}), \
+46 -7
include/linux/rcupdate.h
··· 828 828 829 829 /* 830 830 * Does the specified offset indicate that the corresponding rcu_head 831 - * structure can be handled by kfree_rcu()? 831 + * structure can be handled by kvfree_rcu()? 832 832 */ 833 - #define __is_kfree_rcu_offset(offset) ((offset) < 4096) 833 + #define __is_kvfree_rcu_offset(offset) ((offset) < 4096) 834 834 835 835 /* 836 836 * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain. 837 837 */ 838 - #define __kfree_rcu(head, offset) \ 838 + #define __kvfree_rcu(head, offset) \ 839 839 do { \ 840 - BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \ 841 - kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \ 840 + BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \ 841 + kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \ 842 842 } while (0) 843 843 844 844 /** ··· 857 857 * Because the functions are not allowed in the low-order 4096 bytes of 858 858 * kernel virtual memory, offsets up to 4095 bytes can be accommodated. 859 859 * If the offset is larger than 4095 bytes, a compile-time error will 860 - * be generated in __kfree_rcu(). If this error is triggered, you can 860 + * be generated in __kvfree_rcu(). If this error is triggered, you can 861 861 * either fall back to use of call_rcu() or rearrange the structure to 862 862 * position the rcu_head structure into the first 4096 bytes. 863 863 * ··· 872 872 typeof (ptr) ___p = (ptr); \ 873 873 \ 874 874 if (___p) \ 875 - __kfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \ 875 + __kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \ 876 + } while (0) 877 + 878 + /** 879 + * kvfree_rcu() - kvfree an object after a grace period. 880 + * 881 + * This macro consists of one or two arguments and it is 882 + * based on whether an object is head-less or not. 
If it 883 + * has a head then a semantic stays the same as it used 884 + * to be before: 885 + * 886 + * kvfree_rcu(ptr, rhf); 887 + * 888 + * where @ptr is a pointer to kvfree(), @rhf is the name 889 + * of the rcu_head structure within the type of @ptr. 890 + * 891 + * When it comes to head-less variant, only one argument 892 + * is passed and that is just a pointer which has to be 893 + * freed after a grace period. Therefore the semantic is 894 + * 895 + * kvfree_rcu(ptr); 896 + * 897 + * where @ptr is a pointer to kvfree(). 898 + * 899 + * Please note, head-less way of freeing is permitted to 900 + * use from a context that has to follow might_sleep() 901 + * annotation. Otherwise, please switch and embed the 902 + * rcu_head structure within the type of @ptr. 903 + */ 904 + #define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \ 905 + kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__) 906 + 907 + #define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME 908 + #define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf) 909 + #define kvfree_rcu_arg_1(ptr) \ 910 + do { \ 911 + typeof(ptr) ___p = (ptr); \ 912 + \ 913 + if (___p) \ 914 + kvfree_call_rcu(NULL, (rcu_callback_t) (___p)); \ 876 915 } while (0) 877 916 878 917 /*
+2 -2
include/linux/rcupdate_trace.h
··· 36 36 /** 37 37 * rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section 38 38 * 39 - * When synchronize_rcu_trace() is invoked by one task, then that task 40 - * is guaranteed to block until all other tasks exit their read-side 39 + * When synchronize_rcu_tasks_trace() is invoked by one task, then that 40 + * task is guaranteed to block until all other tasks exit their read-side 41 41 * critical sections. Similarly, if call_rcu_trace() is invoked on one 42 42 * task while other tasks are within RCU read-side critical sections, 43 43 * invocation of the corresponding RCU callback is deferred until after
+18 -2
include/linux/rcutiny.h
··· 34 34 synchronize_rcu(); 35 35 } 36 36 37 - static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func) 37 + /* 38 + * Add one more declaration of kvfree() here. It is 39 + * not so straight forward to just include <linux/mm.h> 40 + * where it is defined due to getting many compile 41 + * errors caused by that include. 42 + */ 43 + extern void kvfree(const void *addr); 44 + 45 + static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) 38 46 { 39 - call_rcu(head, func); 47 + if (head) { 48 + call_rcu(head, func); 49 + return; 50 + } 51 + 52 + // kvfree_rcu(one_arg) call. 53 + might_sleep(); 54 + synchronize_rcu(); 55 + kvfree((void *) func); 40 56 } 41 57 42 58 void rcu_qs(void);
+1 -1
include/linux/rcutree.h
··· 33 33 } 34 34 35 35 void synchronize_rcu_expedited(void); 36 - void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func); 36 + void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func); 37 37 38 38 void rcu_barrier(void); 39 39 bool rcu_eqs_special_set(int cpu);
+5
include/linux/torture.h
··· 55 55 #define DEFINE_TORTURE_RANDOM_PERCPU(name) \ 56 56 DEFINE_PER_CPU(struct torture_random_state, name) 57 57 unsigned long torture_random(struct torture_random_state *trsp); 58 + static inline void torture_random_init(struct torture_random_state *trsp) 59 + { 60 + trsp->trs_state = 0; 61 + trsp->trs_count = 0; 62 + } 58 63 59 64 /* Task shuffler, which causes CPUs to occasionally go idle. */ 60 65 void torture_shuffle_task_register(struct task_struct *tp);
+10 -9
include/trace/events/rcu.h
··· 435 435 #endif /* #if defined(CONFIG_TREE_RCU) */ 436 436 437 437 /* 438 - * Tracepoint for dyntick-idle entry/exit events. These take a string 439 - * as argument: "Start" for entering dyntick-idle mode, "Startirq" for 440 - * entering it from irq/NMI, "End" for leaving it, "Endirq" for leaving it 441 - * to irq/NMI, "--=" for events moving towards idle, and "++=" for events 442 - * moving away from idle. 438 + * Tracepoint for dyntick-idle entry/exit events. These take 2 strings 439 + * as argument: 440 + * polarity: "Start", "End", "StillNonIdle" for entering, exiting or still not 441 + * being in dyntick-idle mode. 442 + * context: "USER" or "IDLE" or "IRQ". 443 + * NMIs nested in IRQs are inferred with dynticks_nesting > 1 in IRQ context. 443 444 * 444 445 * These events also take a pair of numbers, which indicate the nesting 445 446 * depth before and after the event of interest, and a third number that is ··· 507 506 508 507 /* 509 508 * Tracepoint for the registration of a single RCU callback of the special 510 - * kfree() form. The first argument is the RCU type, the second argument 509 + * kvfree() form. The first argument is the RCU type, the second argument 511 510 * is a pointer to the RCU callback, the third argument is the offset 512 511 * of the callback within the enclosing RCU-protected data structure, 513 512 * the fourth argument is the number of lazy callbacks queued, and the 514 513 * fifth argument is the total number of callbacks queued. 515 514 */ 516 - TRACE_EVENT_RCU(rcu_kfree_callback, 515 + TRACE_EVENT_RCU(rcu_kvfree_callback, 517 516 518 517 TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset, 519 518 long qlen), ··· 597 596 598 597 /* 599 598 * Tracepoint for the invocation of a single RCU callback of the special 600 - * kfree() form. The first argument is the RCU flavor, the second 599 + * kvfree() form. 
The first argument is the RCU flavor, the second 601 600 * argument is a pointer to the RCU callback, and the third argument 602 601 * is the offset of the callback within the enclosing RCU-protected 603 602 * data structure. 604 603 */ 605 - TRACE_EVENT_RCU(rcu_invoke_kfree_callback, 604 + TRACE_EVENT_RCU(rcu_invoke_kvfree_callback, 606 605 607 606 TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset), 608 607
+1 -3
kernel/locking/lockdep.c
··· 5851 5851 pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", 5852 5852 !rcu_lockdep_current_cpu_online() 5853 5853 ? "RCU used illegally from offline CPU!\n" 5854 - : !rcu_is_watching() 5855 - ? "RCU used illegally from idle CPU!\n" 5856 - : "", 5854 + : "", 5857 5855 rcu_scheduler_active, debug_locks); 5858 5856 5859 5857 /*
+7 -7
kernel/locking/locktorture.c
··· 631 631 cxt.cur_ops->writelock(); 632 632 if (WARN_ON_ONCE(lock_is_write_held)) 633 633 lwsp->n_lock_fail++; 634 - lock_is_write_held = 1; 634 + lock_is_write_held = true; 635 635 if (WARN_ON_ONCE(lock_is_read_held)) 636 636 lwsp->n_lock_fail++; /* rare, but... */ 637 637 638 638 lwsp->n_lock_acquired++; 639 639 cxt.cur_ops->write_delay(&rand); 640 - lock_is_write_held = 0; 640 + lock_is_write_held = false; 641 641 cxt.cur_ops->writeunlock(); 642 642 643 643 stutter_wait("lock_torture_writer"); ··· 665 665 schedule_timeout_uninterruptible(1); 666 666 667 667 cxt.cur_ops->readlock(); 668 - lock_is_read_held = 1; 668 + lock_is_read_held = true; 669 669 if (WARN_ON_ONCE(lock_is_write_held)) 670 670 lrsp->n_lock_fail++; /* rare, but... */ 671 671 672 672 lrsp->n_lock_acquired++; 673 673 cxt.cur_ops->read_delay(&rand); 674 - lock_is_read_held = 0; 674 + lock_is_read_held = false; 675 675 cxt.cur_ops->readunlock(); 676 676 677 677 stutter_wait("lock_torture_reader"); ··· 686 686 static void __torture_print_stats(char *page, 687 687 struct lock_stress_stats *statp, bool write) 688 688 { 689 - bool fail = 0; 689 + bool fail = false; 690 690 int i, n_stress; 691 691 long max = 0, min = statp ? statp[0].n_lock_acquired : 0; 692 692 long long sum = 0; ··· 904 904 905 905 /* Initialize the statistics so that each run gets its own numbers. */ 906 906 if (nwriters_stress) { 907 - lock_is_write_held = 0; 907 + lock_is_write_held = false; 908 908 cxt.lwsa = kmalloc_array(cxt.nrealwriters_stress, 909 909 sizeof(*cxt.lwsa), 910 910 GFP_KERNEL); ··· 935 935 } 936 936 937 937 if (nreaders_stress) { 938 - lock_is_read_held = 0; 938 + lock_is_read_held = false; 939 939 cxt.lrsa = kmalloc_array(cxt.nrealreaders_stress, 940 940 sizeof(*cxt.lrsa), 941 941 GFP_KERNEL);
+19
kernel/rcu/Kconfig.debug
··· 61 61 Say M if you want the RCU torture tests to build as a module. 62 62 Say N if you are unsure. 63 63 64 + config RCU_REF_SCALE_TEST 65 + tristate "Scalability tests for read-side synchronization (RCU and others)" 66 + depends on DEBUG_KERNEL 67 + select TORTURE_TEST 68 + select SRCU 69 + select TASKS_RCU 70 + select TASKS_RUDE_RCU 71 + select TASKS_TRACE_RCU 72 + default n 73 + help 74 + This option provides a kernel module that runs performance tests 75 + useful comparing RCU with various read-side synchronization mechanisms. 76 + The kernel module may be built after the fact on the running kernel to be 77 + tested, if desired. 78 + 79 + Say Y here if you want these performance tests built into the kernel. 80 + Say M if you want to build it as a module instead. 81 + Say N if you are unsure. 82 + 64 83 config RCU_CPU_STALL_TIMEOUT 65 84 int "RCU CPU stall timeout in seconds" 66 85 depends on RCU_STALL_COMMON
+1
kernel/rcu/Makefile
··· 12 12 obj-$(CONFIG_TINY_SRCU) += srcutiny.o 13 13 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o 14 14 obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o 15 + obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o 15 16 obj-$(CONFIG_TREE_RCU) += tree.o 16 17 obj-$(CONFIG_TINY_RCU) += tiny.o 17 18 obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
+13 -12
kernel/rcu/rcuperf.c
··· 69 69 * value specified by nr_cpus for a read-only test. 70 70 * 71 71 * Various other use cases may of course be specified. 72 + * 73 + * Note that this test's readers are intended only as a test load for 74 + * the writers. The reader performance statistics will be overly 75 + * pessimistic due to the per-critical-section interrupt disabling, 76 + * test-end checks, and the pair of calls through pointers. 72 77 */ 73 78 74 79 #ifdef MODULE ··· 314 309 } 315 310 316 311 /* 317 - * RCU perf reader kthread. Repeatedly does empty RCU read-side 318 - * critical section, minimizing update-side interference. 312 + * RCU perf reader kthread. Repeatedly does empty RCU read-side critical 313 + * section, minimizing update-side interference. However, the point of 314 + * this test is not to evaluate reader performance, but instead to serve 315 + * as a test load for update-side performance testing. 319 316 */ 320 317 static int 321 318 rcu_perf_reader(void *arg) ··· 583 576 static int 584 577 rcu_perf_shutdown(void *arg) 585 578 { 586 - do { 587 - wait_event(shutdown_wq, 588 - atomic_read(&n_rcu_perf_writer_finished) >= 589 - nrealwriters); 590 - } while (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters); 579 + wait_event(shutdown_wq, 580 + atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters); 591 581 smp_mb(); /* Wake before output. */ 592 582 rcu_perf_cleanup(); 593 583 kernel_power_off(); ··· 697 693 static int 698 694 kfree_perf_shutdown(void *arg) 699 695 { 700 - do { 701 - wait_event(shutdown_wq, 702 - atomic_read(&n_kfree_perf_thread_ended) >= 703 - kfree_nrealthreads); 704 - } while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads); 696 + wait_event(shutdown_wq, 697 + atomic_read(&n_kfree_perf_thread_ended) >= kfree_nrealthreads); 705 698 706 699 smp_mb(); /* Wake before output. */ 707 700
+113 -6
kernel/rcu/rcutorture.c
··· 7 7 * Authors: Paul E. McKenney <paulmck@linux.ibm.com> 8 8 * Josh Triplett <josh@joshtriplett.org> 9 9 * 10 - * See also: Documentation/RCU/torture.txt 10 + * See also: Documentation/RCU/torture.rst 11 11 */ 12 12 13 13 #define pr_fmt(fmt) fmt ··· 109 109 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); 110 110 torture_param(int, onoff_interval, 0, 111 111 "Time between CPU hotplugs (jiffies), 0=disable"); 112 + torture_param(int, read_exit_delay, 13, 113 + "Delay between read-then-exit episodes (s)"); 114 + torture_param(int, read_exit_burst, 16, 115 + "# of read-then-exit bursts per episode, zero to disable"); 112 116 torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles"); 113 117 torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable."); 114 118 torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable."); ··· 150 146 static struct task_struct *fwd_prog_task; 151 147 static struct task_struct **barrier_cbs_tasks; 152 148 static struct task_struct *barrier_task; 149 + static struct task_struct *read_exit_task; 153 150 154 151 #define RCU_TORTURE_PIPE_LEN 10 155 152 ··· 182 177 static atomic_long_t n_rcu_torture_timers; 183 178 static long n_barrier_attempts; 184 179 static long n_barrier_successes; /* did rcu_barrier test succeed? */ 180 + static unsigned long n_read_exits; 185 181 static struct list_head rcu_torture_removed; 186 182 static unsigned long shutdown_jiffies; 187 183 ··· 1172 1166 WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count); 1173 1167 } 1174 1168 } while (!torture_must_stop()); 1169 + rcu_torture_current = NULL; // Let stats task know that we are done. 1175 1170 /* Reset expediting back to unexpedited. 
*/ 1176 1171 if (expediting > 0) 1177 1172 expediting = -expediting; ··· 1377 1370 struct rt_read_seg *rtrsp1; 1378 1371 unsigned long long ts; 1379 1372 1373 + WARN_ON_ONCE(!rcu_is_watching()); 1380 1374 newstate = rcutorture_extend_mask(readstate, trsp); 1381 1375 rcutorture_one_extend(&readstate, newstate, trsp, rtrsp++); 1382 1376 started = cur_ops->get_gp_seq(); ··· 1547 1539 n_rcu_torture_boosts, 1548 1540 atomic_long_read(&n_rcu_torture_timers)); 1549 1541 torture_onoff_stats(); 1550 - pr_cont("barrier: %ld/%ld:%ld\n", 1542 + pr_cont("barrier: %ld/%ld:%ld ", 1551 1543 data_race(n_barrier_successes), 1552 1544 data_race(n_barrier_attempts), 1553 1545 data_race(n_rcu_torture_barrier_error)); 1546 + pr_cont("read-exits: %ld\n", data_race(n_read_exits)); 1554 1547 1555 1548 pr_alert("%s%s ", torture_type, TORTURE_FLAG); 1556 1549 if (atomic_read(&n_rcu_torture_mberror) || ··· 1643 1634 "stall_cpu=%d stall_cpu_holdoff=%d stall_cpu_irqsoff=%d " 1644 1635 "stall_cpu_block=%d " 1645 1636 "n_barrier_cbs=%d " 1646 - "onoff_interval=%d onoff_holdoff=%d\n", 1637 + "onoff_interval=%d onoff_holdoff=%d " 1638 + "read_exit_delay=%d read_exit_burst=%d\n", 1647 1639 torture_type, tag, nrealreaders, nfakewriters, 1648 1640 stat_interval, verbose, test_no_idle_hz, shuffle_interval, 1649 1641 stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter, ··· 1653 1643 stall_cpu, stall_cpu_holdoff, stall_cpu_irqsoff, 1654 1644 stall_cpu_block, 1655 1645 n_barrier_cbs, 1656 - onoff_interval, onoff_holdoff); 1646 + onoff_interval, onoff_holdoff, 1647 + read_exit_delay, read_exit_burst); 1657 1648 } 1658 1649 1659 1650 static int rcutorture_booster_cleanup(unsigned int cpu) ··· 2186 2175 static int rcu_torture_barrier_cbs(void *arg) 2187 2176 { 2188 2177 long myid = (long)arg; 2189 - bool lastphase = 0; 2178 + bool lastphase = false; 2190 2179 bool newphase; 2191 2180 struct rcu_head rcu; 2192 2181 ··· 2349 2338 return true; 2350 2339 } 2351 2340 2341 + static bool 
read_exit_child_stop; 2342 + static bool read_exit_child_stopped; 2343 + static wait_queue_head_t read_exit_wq; 2344 + 2345 + // Child kthread which just does an rcutorture reader and exits. 2346 + static int rcu_torture_read_exit_child(void *trsp_in) 2347 + { 2348 + struct torture_random_state *trsp = trsp_in; 2349 + 2350 + set_user_nice(current, MAX_NICE); 2351 + // Minimize time between reading and exiting. 2352 + while (!kthread_should_stop()) 2353 + schedule_timeout_uninterruptible(1); 2354 + (void)rcu_torture_one_read(trsp); 2355 + return 0; 2356 + } 2357 + 2358 + // Parent kthread which creates and destroys read-exit child kthreads. 2359 + static int rcu_torture_read_exit(void *unused) 2360 + { 2361 + int count = 0; 2362 + bool errexit = false; 2363 + int i; 2364 + struct task_struct *tsp; 2365 + DEFINE_TORTURE_RANDOM(trs); 2366 + 2367 + // Allocate and initialize. 2368 + set_user_nice(current, MAX_NICE); 2369 + VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of test"); 2370 + 2371 + // Each pass through this loop does one read-exit episode. 2372 + do { 2373 + if (++count > read_exit_burst) { 2374 + VERBOSE_TOROUT_STRING("rcu_torture_read_exit: End of episode"); 2375 + rcu_barrier(); // Wait for task_struct free, avoid OOM. 2376 + for (i = 0; i < read_exit_delay; i++) { 2377 + schedule_timeout_uninterruptible(HZ); 2378 + if (READ_ONCE(read_exit_child_stop)) 2379 + break; 2380 + } 2381 + if (!READ_ONCE(read_exit_child_stop)) 2382 + VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of episode"); 2383 + count = 0; 2384 + } 2385 + if (READ_ONCE(read_exit_child_stop)) 2386 + break; 2387 + // Spawn child. 
2388 + tsp = kthread_run(rcu_torture_read_exit_child, 2389 + &trs, "%s", 2390 + "rcu_torture_read_exit_child"); 2391 + if (IS_ERR(tsp)) { 2392 + VERBOSE_TOROUT_ERRSTRING("out of memory"); 2393 + errexit = true; 2394 + tsp = NULL; 2395 + break; 2396 + } 2397 + cond_resched(); 2398 + kthread_stop(tsp); 2399 + n_read_exits ++; 2400 + stutter_wait("rcu_torture_read_exit"); 2401 + } while (!errexit && !READ_ONCE(read_exit_child_stop)); 2402 + 2403 + // Clean up and exit. 2404 + smp_store_release(&read_exit_child_stopped, true); // After reaping. 2405 + smp_mb(); // Store before wakeup. 2406 + wake_up(&read_exit_wq); 2407 + while (!torture_must_stop()) 2408 + schedule_timeout_uninterruptible(1); 2409 + torture_kthread_stopping("rcu_torture_read_exit"); 2410 + return 0; 2411 + } 2412 + 2413 + static int rcu_torture_read_exit_init(void) 2414 + { 2415 + if (read_exit_burst <= 0) 2416 + return -EINVAL; 2417 + init_waitqueue_head(&read_exit_wq); 2418 + read_exit_child_stop = false; 2419 + read_exit_child_stopped = false; 2420 + return torture_create_kthread(rcu_torture_read_exit, NULL, 2421 + read_exit_task); 2422 + } 2423 + 2424 + static void rcu_torture_read_exit_cleanup(void) 2425 + { 2426 + if (!read_exit_task) 2427 + return; 2428 + WRITE_ONCE(read_exit_child_stop, true); 2429 + smp_mb(); // Above write before wait. 
2430 + wait_event(read_exit_wq, smp_load_acquire(&read_exit_child_stopped)); 2431 + torture_stop_kthread(rcutorture_read_exit, read_exit_task); 2432 + } 2433 + 2352 2434 static enum cpuhp_state rcutor_hp; 2353 2435 2354 2436 static void ··· 2463 2359 } 2464 2360 2465 2361 show_rcu_gp_kthreads(); 2362 + rcu_torture_read_exit_cleanup(); 2466 2363 rcu_torture_barrier_cleanup(); 2467 2364 torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task); 2468 2365 torture_stop_kthread(rcu_torture_stall, stall_task); ··· 2475 2370 reader_tasks[i]); 2476 2371 kfree(reader_tasks); 2477 2372 } 2478 - rcu_torture_current = NULL; 2479 2373 2480 2374 if (fakewriter_tasks) { 2481 2375 for (i = 0; i < nfakewriters; i++) { ··· 2784 2680 if (firsterr) 2785 2681 goto unwind; 2786 2682 firsterr = rcu_torture_barrier_init(); 2683 + if (firsterr) 2684 + goto unwind; 2685 + firsterr = rcu_torture_read_exit_init(); 2787 2686 if (firsterr) 2788 2687 goto unwind; 2789 2688 if (object_debug)
+717
kernel/rcu/refscale.c
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + // 3 + // Scalability test comparing RCU vs other mechanisms 4 + // for acquiring references on objects. 5 + // 6 + // Copyright (C) Google, 2020. 7 + // 8 + // Author: Joel Fernandes <joel@joelfernandes.org> 9 + 10 + #define pr_fmt(fmt) fmt 11 + 12 + #include <linux/atomic.h> 13 + #include <linux/bitops.h> 14 + #include <linux/completion.h> 15 + #include <linux/cpu.h> 16 + #include <linux/delay.h> 17 + #include <linux/err.h> 18 + #include <linux/init.h> 19 + #include <linux/interrupt.h> 20 + #include <linux/kthread.h> 21 + #include <linux/kernel.h> 22 + #include <linux/mm.h> 23 + #include <linux/module.h> 24 + #include <linux/moduleparam.h> 25 + #include <linux/notifier.h> 26 + #include <linux/percpu.h> 27 + #include <linux/rcupdate.h> 28 + #include <linux/rcupdate_trace.h> 29 + #include <linux/reboot.h> 30 + #include <linux/sched.h> 31 + #include <linux/spinlock.h> 32 + #include <linux/smp.h> 33 + #include <linux/stat.h> 34 + #include <linux/srcu.h> 35 + #include <linux/slab.h> 36 + #include <linux/torture.h> 37 + #include <linux/types.h> 38 + 39 + #include "rcu.h" 40 + 41 + #define SCALE_FLAG "-ref-scale: " 42 + 43 + #define SCALEOUT(s, x...) \ 44 + pr_alert("%s" SCALE_FLAG s, scale_type, ## x) 45 + 46 + #define VERBOSE_SCALEOUT(s, x...) \ 47 + do { if (verbose) pr_alert("%s" SCALE_FLAG s, scale_type, ## x); } while (0) 48 + 49 + #define VERBOSE_SCALEOUT_ERRSTRING(s, x...) \ 50 + do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! " s, scale_type, ## x); } while (0) 51 + 52 + MODULE_LICENSE("GPL"); 53 + MODULE_AUTHOR("Joel Fernandes (Google) <joel@joelfernandes.org>"); 54 + 55 + static char *scale_type = "rcu"; 56 + module_param(scale_type, charp, 0444); 57 + MODULE_PARM_DESC(scale_type, "Type of test (rcu, srcu, refcnt, rwsem, rwlock."); 58 + 59 + torture_param(int, verbose, 0, "Enable verbose debugging printk()s"); 60 + 61 + // Wait until there are multiple CPUs before starting test. 
62 + torture_param(int, holdoff, IS_BUILTIN(CONFIG_RCU_REF_SCALE_TEST) ? 10 : 0, 63 + "Holdoff time before test start (s)"); 64 + // Number of loops per experiment, all readers execute operations concurrently. 65 + torture_param(long, loops, 10000, "Number of loops per experiment."); 66 + // Number of readers, with -1 defaulting to about 75% of the CPUs. 67 + torture_param(int, nreaders, -1, "Number of readers, -1 for 75% of CPUs."); 68 + // Number of runs. 69 + torture_param(int, nruns, 30, "Number of experiments to run."); 70 + // Reader delay in nanoseconds, 0 for no delay. 71 + torture_param(int, readdelay, 0, "Read-side delay in nanoseconds."); 72 + 73 + #ifdef MODULE 74 + # define REFSCALE_SHUTDOWN 0 75 + #else 76 + # define REFSCALE_SHUTDOWN 1 77 + #endif 78 + 79 + torture_param(bool, shutdown, REFSCALE_SHUTDOWN, 80 + "Shutdown at end of scalability tests."); 81 + 82 + struct reader_task { 83 + struct task_struct *task; 84 + int start_reader; 85 + wait_queue_head_t wq; 86 + u64 last_duration_ns; 87 + }; 88 + 89 + static struct task_struct *shutdown_task; 90 + static wait_queue_head_t shutdown_wq; 91 + 92 + static struct task_struct *main_task; 93 + static wait_queue_head_t main_wq; 94 + static int shutdown_start; 95 + 96 + static struct reader_task *reader_tasks; 97 + 98 + // Number of readers that are part of the current experiment. 99 + static atomic_t nreaders_exp; 100 + 101 + // Use to wait for all threads to start. 102 + static atomic_t n_init; 103 + static atomic_t n_started; 104 + static atomic_t n_warmedup; 105 + static atomic_t n_cooleddown; 106 + 107 + // Track which experiment is currently running. 108 + static int exp_idx; 109 + 110 + // Operations vector for selecting different types of tests. 
111 + struct ref_scale_ops { 112 + void (*init)(void); 113 + void (*cleanup)(void); 114 + void (*readsection)(const int nloops); 115 + void (*delaysection)(const int nloops, const int udl, const int ndl); 116 + const char *name; 117 + }; 118 + 119 + static struct ref_scale_ops *cur_ops; 120 + 121 + static void un_delay(const int udl, const int ndl) 122 + { 123 + if (udl) 124 + udelay(udl); 125 + if (ndl) 126 + ndelay(ndl); 127 + } 128 + 129 + static void ref_rcu_read_section(const int nloops) 130 + { 131 + int i; 132 + 133 + for (i = nloops; i >= 0; i--) { 134 + rcu_read_lock(); 135 + rcu_read_unlock(); 136 + } 137 + } 138 + 139 + static void ref_rcu_delay_section(const int nloops, const int udl, const int ndl) 140 + { 141 + int i; 142 + 143 + for (i = nloops; i >= 0; i--) { 144 + rcu_read_lock(); 145 + un_delay(udl, ndl); 146 + rcu_read_unlock(); 147 + } 148 + } 149 + 150 + static void rcu_sync_scale_init(void) 151 + { 152 + } 153 + 154 + static struct ref_scale_ops rcu_ops = { 155 + .init = rcu_sync_scale_init, 156 + .readsection = ref_rcu_read_section, 157 + .delaysection = ref_rcu_delay_section, 158 + .name = "rcu" 159 + }; 160 + 161 + // Definitions for SRCU ref scale testing. 
162 + DEFINE_STATIC_SRCU(srcu_refctl_scale); 163 + static struct srcu_struct *srcu_ctlp = &srcu_refctl_scale; 164 + 165 + static void srcu_ref_scale_read_section(const int nloops) 166 + { 167 + int i; 168 + int idx; 169 + 170 + for (i = nloops; i >= 0; i--) { 171 + idx = srcu_read_lock(srcu_ctlp); 172 + srcu_read_unlock(srcu_ctlp, idx); 173 + } 174 + } 175 + 176 + static void srcu_ref_scale_delay_section(const int nloops, const int udl, const int ndl) 177 + { 178 + int i; 179 + int idx; 180 + 181 + for (i = nloops; i >= 0; i--) { 182 + idx = srcu_read_lock(srcu_ctlp); 183 + un_delay(udl, ndl); 184 + srcu_read_unlock(srcu_ctlp, idx); 185 + } 186 + } 187 + 188 + static struct ref_scale_ops srcu_ops = { 189 + .init = rcu_sync_scale_init, 190 + .readsection = srcu_ref_scale_read_section, 191 + .delaysection = srcu_ref_scale_delay_section, 192 + .name = "srcu" 193 + }; 194 + 195 + // Definitions for RCU Tasks ref scale testing: Empty read markers. 196 + // These definitions also work for RCU Rude readers. 197 + static void rcu_tasks_ref_scale_read_section(const int nloops) 198 + { 199 + int i; 200 + 201 + for (i = nloops; i >= 0; i--) 202 + continue; 203 + } 204 + 205 + static void rcu_tasks_ref_scale_delay_section(const int nloops, const int udl, const int ndl) 206 + { 207 + int i; 208 + 209 + for (i = nloops; i >= 0; i--) 210 + un_delay(udl, ndl); 211 + } 212 + 213 + static struct ref_scale_ops rcu_tasks_ops = { 214 + .init = rcu_sync_scale_init, 215 + .readsection = rcu_tasks_ref_scale_read_section, 216 + .delaysection = rcu_tasks_ref_scale_delay_section, 217 + .name = "rcu-tasks" 218 + }; 219 + 220 + // Definitions for RCU Tasks Trace ref scale testing. 
221 + static void rcu_trace_ref_scale_read_section(const int nloops) 222 + { 223 + int i; 224 + 225 + for (i = nloops; i >= 0; i--) { 226 + rcu_read_lock_trace(); 227 + rcu_read_unlock_trace(); 228 + } 229 + } 230 + 231 + static void rcu_trace_ref_scale_delay_section(const int nloops, const int udl, const int ndl) 232 + { 233 + int i; 234 + 235 + for (i = nloops; i >= 0; i--) { 236 + rcu_read_lock_trace(); 237 + un_delay(udl, ndl); 238 + rcu_read_unlock_trace(); 239 + } 240 + } 241 + 242 + static struct ref_scale_ops rcu_trace_ops = { 243 + .init = rcu_sync_scale_init, 244 + .readsection = rcu_trace_ref_scale_read_section, 245 + .delaysection = rcu_trace_ref_scale_delay_section, 246 + .name = "rcu-trace" 247 + }; 248 + 249 + // Definitions for reference count 250 + static atomic_t refcnt; 251 + 252 + static void ref_refcnt_section(const int nloops) 253 + { 254 + int i; 255 + 256 + for (i = nloops; i >= 0; i--) { 257 + atomic_inc(&refcnt); 258 + atomic_dec(&refcnt); 259 + } 260 + } 261 + 262 + static void ref_refcnt_delay_section(const int nloops, const int udl, const int ndl) 263 + { 264 + int i; 265 + 266 + for (i = nloops; i >= 0; i--) { 267 + atomic_inc(&refcnt); 268 + un_delay(udl, ndl); 269 + atomic_dec(&refcnt); 270 + } 271 + } 272 + 273 + static struct ref_scale_ops refcnt_ops = { 274 + .init = rcu_sync_scale_init, 275 + .readsection = ref_refcnt_section, 276 + .delaysection = ref_refcnt_delay_section, 277 + .name = "refcnt" 278 + }; 279 + 280 + // Definitions for rwlock 281 + static rwlock_t test_rwlock; 282 + 283 + static void ref_rwlock_init(void) 284 + { 285 + rwlock_init(&test_rwlock); 286 + } 287 + 288 + static void ref_rwlock_section(const int nloops) 289 + { 290 + int i; 291 + 292 + for (i = nloops; i >= 0; i--) { 293 + read_lock(&test_rwlock); 294 + read_unlock(&test_rwlock); 295 + } 296 + } 297 + 298 + static void ref_rwlock_delay_section(const int nloops, const int udl, const int ndl) 299 + { 300 + int i; 301 + 302 + for (i = nloops; i >= 0; i--) 
{ 303 + read_lock(&test_rwlock); 304 + un_delay(udl, ndl); 305 + read_unlock(&test_rwlock); 306 + } 307 + } 308 + 309 + static struct ref_scale_ops rwlock_ops = { 310 + .init = ref_rwlock_init, 311 + .readsection = ref_rwlock_section, 312 + .delaysection = ref_rwlock_delay_section, 313 + .name = "rwlock" 314 + }; 315 + 316 + // Definitions for rwsem 317 + static struct rw_semaphore test_rwsem; 318 + 319 + static void ref_rwsem_init(void) 320 + { 321 + init_rwsem(&test_rwsem); 322 + } 323 + 324 + static void ref_rwsem_section(const int nloops) 325 + { 326 + int i; 327 + 328 + for (i = nloops; i >= 0; i--) { 329 + down_read(&test_rwsem); 330 + up_read(&test_rwsem); 331 + } 332 + } 333 + 334 + static void ref_rwsem_delay_section(const int nloops, const int udl, const int ndl) 335 + { 336 + int i; 337 + 338 + for (i = nloops; i >= 0; i--) { 339 + down_read(&test_rwsem); 340 + un_delay(udl, ndl); 341 + up_read(&test_rwsem); 342 + } 343 + } 344 + 345 + static struct ref_scale_ops rwsem_ops = { 346 + .init = ref_rwsem_init, 347 + .readsection = ref_rwsem_section, 348 + .delaysection = ref_rwsem_delay_section, 349 + .name = "rwsem" 350 + }; 351 + 352 + static void rcu_scale_one_reader(void) 353 + { 354 + if (readdelay <= 0) 355 + cur_ops->readsection(loops); 356 + else 357 + cur_ops->delaysection(loops, readdelay / 1000, readdelay % 1000); 358 + } 359 + 360 + // Reader kthread. Repeatedly does empty RCU read-side 361 + // critical section, minimizing update-side interference. 
362 + static int 363 + ref_scale_reader(void *arg) 364 + { 365 + unsigned long flags; 366 + long me = (long)arg; 367 + struct reader_task *rt = &(reader_tasks[me]); 368 + u64 start; 369 + s64 duration; 370 + 371 + VERBOSE_SCALEOUT("ref_scale_reader %ld: task started", me); 372 + set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 373 + set_user_nice(current, MAX_NICE); 374 + atomic_inc(&n_init); 375 + if (holdoff) 376 + schedule_timeout_interruptible(holdoff * HZ); 377 + repeat: 378 + VERBOSE_SCALEOUT("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id()); 379 + 380 + // Wait for signal that this reader can start. 381 + wait_event(rt->wq, (atomic_read(&nreaders_exp) && smp_load_acquire(&rt->start_reader)) || 382 + torture_must_stop()); 383 + 384 + if (torture_must_stop()) 385 + goto end; 386 + 387 + // Make sure that the CPU is affinitized appropriately during testing. 388 + WARN_ON_ONCE(smp_processor_id() != me); 389 + 390 + WRITE_ONCE(rt->start_reader, 0); 391 + if (!atomic_dec_return(&n_started)) 392 + while (atomic_read_acquire(&n_started)) 393 + cpu_relax(); 394 + 395 + VERBOSE_SCALEOUT("ref_scale_reader %ld: experiment %d started", me, exp_idx); 396 + 397 + 398 + // To reduce noise, do an initial cache-warming invocation, check 399 + // in, and then keep warming until everyone has checked in. 400 + rcu_scale_one_reader(); 401 + if (!atomic_dec_return(&n_warmedup)) 402 + while (atomic_read_acquire(&n_warmedup)) 403 + rcu_scale_one_reader(); 404 + // Also keep interrupts disabled. This also has the effect 405 + // of preventing entries into slow path for rcu_read_unlock(). 406 + local_irq_save(flags); 407 + start = ktime_get_mono_fast_ns(); 408 + 409 + rcu_scale_one_reader(); 410 + 411 + duration = ktime_get_mono_fast_ns() - start; 412 + local_irq_restore(flags); 413 + 414 + rt->last_duration_ns = WARN_ON_ONCE(duration < 0) ? 
0 : duration; 415 + // To reduce runtime-skew noise, do maintain-load invocations until 416 + // everyone is done. 417 + if (!atomic_dec_return(&n_cooleddown)) 418 + while (atomic_read_acquire(&n_cooleddown)) 419 + rcu_scale_one_reader(); 420 + 421 + if (atomic_dec_and_test(&nreaders_exp)) 422 + wake_up(&main_wq); 423 + 424 + VERBOSE_SCALEOUT("ref_scale_reader %ld: experiment %d ended, (readers remaining=%d)", 425 + me, exp_idx, atomic_read(&nreaders_exp)); 426 + 427 + if (!torture_must_stop()) 428 + goto repeat; 429 + end: 430 + torture_kthread_stopping("ref_scale_reader"); 431 + return 0; 432 + } 433 + 434 + static void reset_readers(void) 435 + { 436 + int i; 437 + struct reader_task *rt; 438 + 439 + for (i = 0; i < nreaders; i++) { 440 + rt = &(reader_tasks[i]); 441 + 442 + rt->last_duration_ns = 0; 443 + } 444 + } 445 + 446 + // Print the results of each reader and return the sum of all their durations. 447 + static u64 process_durations(int n) 448 + { 449 + int i; 450 + struct reader_task *rt; 451 + char buf1[64]; 452 + char *buf; 453 + u64 sum = 0; 454 + 455 + buf = kmalloc(128 + nreaders * 32, GFP_KERNEL); 456 + if (!buf) 457 + return 0; 458 + buf[0] = 0; 459 + sprintf(buf, "Experiment #%d (Format: <THREAD-NUM>:<Total loop time in ns>)", 460 + exp_idx); 461 + 462 + for (i = 0; i < n && !torture_must_stop(); i++) { 463 + rt = &(reader_tasks[i]); 464 + sprintf(buf1, "%d: %llu\t", i, rt->last_duration_ns); 465 + 466 + if (i % 5 == 0) 467 + strcat(buf, "\n"); 468 + strcat(buf, buf1); 469 + 470 + sum += rt->last_duration_ns; 471 + } 472 + strcat(buf, "\n"); 473 + 474 + SCALEOUT("%s\n", buf); 475 + 476 + kfree(buf); 477 + return sum; 478 + } 479 + 480 + // The main_func is the main orchestrator, it performs a bunch of 481 + // experiments. For every experiment, it orders all the readers 482 + // involved to start and waits for them to finish the experiment. It 483 + // then reads their timestamps and starts the next experiment. 
Each 484 + // experiment progresses from 1 concurrent reader to N of them at which 485 + // point all the timestamps are printed. 486 + static int main_func(void *arg) 487 + { 488 + bool errexit = false; 489 + int exp, r; 490 + char buf1[64]; 491 + char *buf; 492 + u64 *result_avg; 493 + 494 + set_cpus_allowed_ptr(current, cpumask_of(nreaders % nr_cpu_ids)); 495 + set_user_nice(current, MAX_NICE); 496 + 497 + VERBOSE_SCALEOUT("main_func task started"); 498 + result_avg = kzalloc(nruns * sizeof(*result_avg), GFP_KERNEL); 499 + buf = kzalloc(64 + nruns * 32, GFP_KERNEL); 500 + if (!result_avg || !buf) { 501 + VERBOSE_SCALEOUT_ERRSTRING("out of memory"); 502 + errexit = true; 503 + } 504 + if (holdoff) 505 + schedule_timeout_interruptible(holdoff * HZ); 506 + 507 + // Wait for all threads to start. 508 + atomic_inc(&n_init); 509 + while (atomic_read(&n_init) < nreaders + 1) 510 + schedule_timeout_uninterruptible(1); 511 + 512 + // Start exp readers up per experiment 513 + for (exp = 0; exp < nruns && !torture_must_stop(); exp++) { 514 + if (errexit) 515 + break; 516 + if (torture_must_stop()) 517 + goto end; 518 + 519 + reset_readers(); 520 + atomic_set(&nreaders_exp, nreaders); 521 + atomic_set(&n_started, nreaders); 522 + atomic_set(&n_warmedup, nreaders); 523 + atomic_set(&n_cooleddown, nreaders); 524 + 525 + exp_idx = exp; 526 + 527 + for (r = 0; r < nreaders; r++) { 528 + smp_store_release(&reader_tasks[r].start_reader, 1); 529 + wake_up(&reader_tasks[r].wq); 530 + } 531 + 532 + VERBOSE_SCALEOUT("main_func: experiment started, waiting for %d readers", 533 + nreaders); 534 + 535 + wait_event(main_wq, 536 + !atomic_read(&nreaders_exp) || torture_must_stop()); 537 + 538 + VERBOSE_SCALEOUT("main_func: experiment ended"); 539 + 540 + if (torture_must_stop()) 541 + goto end; 542 + 543 + result_avg[exp] = div_u64(1000 * process_durations(nreaders), nreaders * loops); 544 + } 545 + 546 + // Print the average of all experiments 547 + SCALEOUT("END OF TEST. 
Calculating average duration per loop (nanoseconds)...\n"); 548 + 549 + buf[0] = 0; 550 + strcat(buf, "\n"); 551 + strcat(buf, "Runs\tTime(ns)\n"); 552 + 553 + for (exp = 0; exp < nruns; exp++) { 554 + u64 avg; 555 + u32 rem; 556 + 557 + if (errexit) 558 + break; 559 + avg = div_u64_rem(result_avg[exp], 1000, &rem); 560 + sprintf(buf1, "%d\t%llu.%03u\n", exp + 1, avg, rem); 561 + strcat(buf, buf1); 562 + } 563 + 564 + if (!errexit) 565 + SCALEOUT("%s", buf); 566 + 567 + // This will shutdown everything including us. 568 + if (shutdown) { 569 + shutdown_start = 1; 570 + wake_up(&shutdown_wq); 571 + } 572 + 573 + // Wait for torture to stop us 574 + while (!torture_must_stop()) 575 + schedule_timeout_uninterruptible(1); 576 + 577 + end: 578 + torture_kthread_stopping("main_func"); 579 + kfree(result_avg); 580 + kfree(buf); 581 + return 0; 582 + } 583 + 584 + static void 585 + ref_scale_print_module_parms(struct ref_scale_ops *cur_ops, const char *tag) 586 + { 587 + pr_alert("%s" SCALE_FLAG 588 + "--- %s: verbose=%d shutdown=%d holdoff=%d loops=%ld nreaders=%d nruns=%d readdelay=%d\n", scale_type, tag, 589 + verbose, shutdown, holdoff, loops, nreaders, nruns, readdelay); 590 + } 591 + 592 + static void 593 + ref_scale_cleanup(void) 594 + { 595 + int i; 596 + 597 + if (torture_cleanup_begin()) 598 + return; 599 + 600 + if (!cur_ops) { 601 + torture_cleanup_end(); 602 + return; 603 + } 604 + 605 + if (reader_tasks) { 606 + for (i = 0; i < nreaders; i++) 607 + torture_stop_kthread("ref_scale_reader", 608 + reader_tasks[i].task); 609 + } 610 + kfree(reader_tasks); 611 + 612 + torture_stop_kthread("main_task", main_task); 613 + kfree(main_task); 614 + 615 + // Do scale-type-specific cleanup operations. 616 + if (cur_ops->cleanup != NULL) 617 + cur_ops->cleanup(); 618 + 619 + torture_cleanup_end(); 620 + } 621 + 622 + // Shutdown kthread. Just waits to be awakened, then shuts down system. 
623 + static int 624 + ref_scale_shutdown(void *arg) 625 + { 626 + wait_event(shutdown_wq, shutdown_start); 627 + 628 + smp_mb(); // Wake before output. 629 + ref_scale_cleanup(); 630 + kernel_power_off(); 631 + 632 + return -EINVAL; 633 + } 634 + 635 + static int __init 636 + ref_scale_init(void) 637 + { 638 + long i; 639 + int firsterr = 0; 640 + static struct ref_scale_ops *scale_ops[] = { 641 + &rcu_ops, &srcu_ops, &rcu_trace_ops, &rcu_tasks_ops, 642 + &refcnt_ops, &rwlock_ops, &rwsem_ops, 643 + }; 644 + 645 + if (!torture_init_begin(scale_type, verbose)) 646 + return -EBUSY; 647 + 648 + for (i = 0; i < ARRAY_SIZE(scale_ops); i++) { 649 + cur_ops = scale_ops[i]; 650 + if (strcmp(scale_type, cur_ops->name) == 0) 651 + break; 652 + } 653 + if (i == ARRAY_SIZE(scale_ops)) { 654 + pr_alert("rcu-scale: invalid scale type: \"%s\"\n", scale_type); 655 + pr_alert("rcu-scale types:"); 656 + for (i = 0; i < ARRAY_SIZE(scale_ops); i++) 657 + pr_cont(" %s", scale_ops[i]->name); 658 + pr_cont("\n"); 659 + WARN_ON(!IS_MODULE(CONFIG_RCU_REF_SCALE_TEST)); 660 + firsterr = -EINVAL; 661 + cur_ops = NULL; 662 + goto unwind; 663 + } 664 + if (cur_ops->init) 665 + cur_ops->init(); 666 + 667 + ref_scale_print_module_parms(cur_ops, "Start of test"); 668 + 669 + // Shutdown task 670 + if (shutdown) { 671 + init_waitqueue_head(&shutdown_wq); 672 + firsterr = torture_create_kthread(ref_scale_shutdown, NULL, 673 + shutdown_task); 674 + if (firsterr) 675 + goto unwind; 676 + schedule_timeout_uninterruptible(1); 677 + } 678 + 679 + // Reader tasks (default to ~75% of online CPUs). 
680 + if (nreaders < 0) 681 + nreaders = (num_online_cpus() >> 1) + (num_online_cpus() >> 2); 682 + reader_tasks = kcalloc(nreaders, sizeof(reader_tasks[0]), 683 + GFP_KERNEL); 684 + if (!reader_tasks) { 685 + VERBOSE_SCALEOUT_ERRSTRING("out of memory"); 686 + firsterr = -ENOMEM; 687 + goto unwind; 688 + } 689 + 690 + VERBOSE_SCALEOUT("Starting %d reader threads\n", nreaders); 691 + 692 + for (i = 0; i < nreaders; i++) { 693 + firsterr = torture_create_kthread(ref_scale_reader, (void *)i, 694 + reader_tasks[i].task); 695 + if (firsterr) 696 + goto unwind; 697 + 698 + init_waitqueue_head(&(reader_tasks[i].wq)); 699 + } 700 + 701 + // Main Task 702 + init_waitqueue_head(&main_wq); 703 + firsterr = torture_create_kthread(main_func, NULL, main_task); 704 + if (firsterr) 705 + goto unwind; 706 + 707 + torture_init_end(); 708 + return 0; 709 + 710 + unwind: 711 + torture_init_end(); 712 + ref_scale_cleanup(); 713 + return firsterr; 714 + } 715 + 716 + module_init(ref_scale_init); 717 + module_exit(ref_scale_cleanup);
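The refscale hunks above compute each experiment's average with scaled integer division: main_func() multiplies the summed reader durations by 1000 before dividing (div_u64), then splits the result with div_u64_rem() to recover three fractional digits for the "%llu.%03u" printout. A minimal userspace sketch of the same fixed-point arithmetic (avg_ns_x1000() and fmt_avg() are illustrative names, not kernel interfaces):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace sketch of main_func()'s averaging: scale by 1000 before the
 * integer division so three decimal places survive, the way div_u64()
 * and div_u64_rem() are used in the hunk above. */
static uint64_t avg_ns_x1000(uint64_t total_ns, uint64_t readers, uint64_t loops)
{
	return (1000 * total_ns) / (readers * loops);
}

/* Render the scaled value the way main_func() prints it: "%llu.%03u". */
static const char *fmt_avg(uint64_t scaled)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "%llu.%03u",
		 (unsigned long long)(scaled / 1000), (unsigned)(scaled % 1000));
	return buf;
}
```

For example, 4 readers looping 100 times for a 123456 ns total give a scaled average of 308640, printed as 308.640.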
+8 -8
kernel/rcu/srcutree.c
··· 766 766 * it, if this function was preempted for enough time for the counters 767 767 * to wrap, it really doesn't matter whether or not we expedite the grace 768 768 * period. The extra overhead of a needlessly expedited grace period is 769 - * negligible when amoritized over that time period, and the extra latency 769 + * negligible when amortized over that time period, and the extra latency 770 770 * of a needlessly non-expedited grace period is similarly negligible. 771 771 */ 772 772 static bool srcu_might_be_idle(struct srcu_struct *ssp) ··· 777 777 unsigned long t; 778 778 unsigned long tlast; 779 779 780 + check_init_srcu_struct(ssp); 780 781 /* If the local srcu_data structure has callbacks, not idle. */ 781 - local_irq_save(flags); 782 - sdp = this_cpu_ptr(ssp->sda); 782 + sdp = raw_cpu_ptr(ssp->sda); 783 + spin_lock_irqsave_rcu_node(sdp, flags); 783 784 if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) { 784 - local_irq_restore(flags); 785 + spin_unlock_irqrestore_rcu_node(sdp, flags); 785 786 return false; /* Callbacks already present, so not idle. */ 786 787 } 787 - local_irq_restore(flags); 788 + spin_unlock_irqrestore_rcu_node(sdp, flags); 788 789 789 790 /* 790 791 * No local callbacks, so probabalistically probe global state. ··· 865 864 } 866 865 rhp->func = func; 867 866 idx = srcu_read_lock(ssp); 868 - local_irq_save(flags); 869 - sdp = this_cpu_ptr(ssp->sda); 870 - spin_lock_rcu_node(sdp); 867 + sdp = raw_cpu_ptr(ssp->sda); 868 + spin_lock_irqsave_rcu_node(sdp, flags); 871 869 rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp); 872 870 rcu_segcblist_advance(&sdp->srcu_cblist, 873 871 rcu_seq_current(&ssp->srcu_gp_seq));
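The srcutree.c hunk replaces a bare local_irq_save()/this_cpu_ptr() pair with raw_cpu_ptr() plus spin_lock_irqsave_rcu_node(): the pending-callback check is now made under the lock that actually protects the srcu_data callback list, so it no longer matters exactly which CPU's structure the task happens to sample. A rough userspace analogue of "lock the structure you inspect, rather than pinning execution" (the cb_bucket structure and its spinlock are illustrative, not kernel API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for a per-CPU srcu_data structure: the pending
 * count models rcu_segcblist_pend_cbs(), and the atomic_flag models the
 * node lock taken by spin_lock_irqsave_rcu_node(). */
struct cb_bucket {
	atomic_flag lock;
	int n_pending;		/* stand-in for the pending-callback list */
};

static void bucket_lock(struct cb_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;		/* spin, as a raw spinlock would */
}

static void bucket_unlock(struct cb_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Idle if no callbacks are pending; safe from any thread because the
 * check happens under the bucket's own lock, not via CPU pinning. */
static bool bucket_might_be_idle(struct cb_bucket *b)
{
	bool pending;

	bucket_lock(b);
	pending = b->n_pending != 0;
	bucket_unlock(b);
	return !pending;
}
```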
+25 -12
kernel/rcu/tasks.h
··· 103 103 #define RTGS_WAIT_READERS 9 104 104 #define RTGS_INVOKE_CBS 10 105 105 #define RTGS_WAIT_CBS 11 106 + #ifndef CONFIG_TINY_RCU 106 107 static const char * const rcu_tasks_gp_state_names[] = { 107 108 "RTGS_INIT", 108 109 "RTGS_WAIT_WAIT_CBS", ··· 118 117 "RTGS_INVOKE_CBS", 119 118 "RTGS_WAIT_CBS", 120 119 }; 120 + #endif /* #ifndef CONFIG_TINY_RCU */ 121 121 122 122 //////////////////////////////////////////////////////////////////////// 123 123 // ··· 131 129 rtp->gp_jiffies = jiffies; 132 130 } 133 131 132 + #ifndef CONFIG_TINY_RCU 134 133 /* Return state name. */ 135 134 static const char *tasks_gp_state_getname(struct rcu_tasks *rtp) 136 135 { ··· 142 139 return "???"; 143 140 return rcu_tasks_gp_state_names[j]; 144 141 } 142 + #endif /* #ifndef CONFIG_TINY_RCU */ 145 143 146 144 // Enqueue a callback for the specified flavor of Tasks RCU. 147 145 static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func, ··· 209 205 if (!rtp->cbs_head) { 210 206 WARN_ON(signal_pending(current)); 211 207 set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS); 212 - schedule_timeout_interruptible(HZ/10); 208 + schedule_timeout_idle(HZ/10); 213 209 } 214 210 continue; 215 211 } ··· 231 227 cond_resched(); 232 228 } 233 229 /* Paranoid sleep to keep this from entering a tight loop */ 234 - schedule_timeout_uninterruptible(HZ/10); 230 + schedule_timeout_idle(HZ/10); 235 231 236 232 set_tasks_gp_state(rtp, RTGS_WAIT_CBS); 237 233 } ··· 272 268 273 269 #endif /* #ifndef CONFIG_TINY_RCU */ 274 270 271 + #ifndef CONFIG_TINY_RCU 275 272 /* Dump out rcutorture-relevant state common to all RCU-tasks flavors. 
*/ 276 273 static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s) 277 274 { ··· 286 281 ".C"[!!data_race(rtp->cbs_head)], 287 282 s); 288 283 } 284 + #endif /* #ifndef CONFIG_TINY_RCU */ 289 285 290 286 static void exit_tasks_rcu_finish_trace(struct task_struct *t); 291 287 ··· 342 336 343 337 /* Slowly back off waiting for holdouts */ 344 338 set_tasks_gp_state(rtp, RTGS_WAIT_SCAN_HOLDOUTS); 345 - schedule_timeout_interruptible(HZ/fract); 339 + schedule_timeout_idle(HZ/fract); 346 340 347 341 if (fract > 1) 348 342 fract--; ··· 408 402 } 409 403 410 404 /* Processing between scanning taskslist and draining the holdout list. */ 411 - void rcu_tasks_postscan(struct list_head *hop) 405 + static void rcu_tasks_postscan(struct list_head *hop) 412 406 { 413 407 /* 414 408 * Wait for tasks that are in the process of exiting. This ··· 563 557 } 564 558 core_initcall(rcu_spawn_tasks_kthread); 565 559 560 + #ifndef CONFIG_TINY_RCU 566 561 static void show_rcu_tasks_classic_gp_kthread(void) 567 562 { 568 563 show_rcu_tasks_generic_gp_kthread(&rcu_tasks, ""); 569 564 } 565 + #endif /* #ifndef CONFIG_TINY_RCU */ 570 566 571 567 /* Do the srcu_read_lock() for the above synchronize_srcu(). */ 572 568 void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu) ··· 690 682 } 691 683 core_initcall(rcu_spawn_tasks_rude_kthread); 692 684 685 + #ifndef CONFIG_TINY_RCU 693 686 static void show_rcu_tasks_rude_gp_kthread(void) 694 687 { 695 688 show_rcu_tasks_generic_gp_kthread(&rcu_tasks_rude, ""); 696 689 } 690 + #endif /* #ifndef CONFIG_TINY_RCU */ 697 691 698 692 #else /* #ifdef CONFIG_TASKS_RUDE_RCU */ 699 693 static void show_rcu_tasks_rude_gp_kthread(void) {} ··· 737 727 738 728 #ifdef CONFIG_TASKS_TRACE_RCU 739 729 740 - atomic_t trc_n_readers_need_end; // Number of waited-for readers. 741 - DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks. 730 + static atomic_t trc_n_readers_need_end; // Number of waited-for readers. 
731 + static DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks. 742 732 743 733 // Record outstanding IPIs to each CPU. No point in sending two... 744 734 static DEFINE_PER_CPU(bool, trc_ipi_to_cpu); ··· 845 835 bool ofl = cpu_is_offline(cpu); 846 836 847 837 if (task_curr(t)) { 848 - WARN_ON_ONCE(ofl & !is_idle_task(t)); 838 + WARN_ON_ONCE(ofl && !is_idle_task(t)); 849 839 850 840 // If no chance of heavyweight readers, do it the hard way. 851 841 if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) ··· 1128 1118 * synchronize_rcu_tasks_trace - wait for a trace rcu-tasks grace period 1129 1119 * 1130 1120 * Control will return to the caller some time after a trace rcu-tasks 1131 - * grace period has elapsed, in other words after all currently 1132 - * executing rcu-tasks read-side critical sections have elapsed. These 1133 - * read-side critical sections are delimited by calls to schedule(), 1134 - * cond_resched_tasks_rcu_qs(), userspace execution, and (in theory, 1135 - * anyway) cond_resched(). 1121 + * grace period has elapsed, in other words after all currently executing 1122 + * rcu-tasks read-side critical sections have elapsed. These read-side 1123 + * critical sections are delimited by calls to rcu_read_lock_trace() 1124 + * and rcu_read_unlock_trace(). 
1136 1125 * 1137 1126 * This is a very specialized primitive, intended only for a few uses in 1138 1127 * tracing and other situations requiring manipulation of function preambles ··· 1173 1164 } 1174 1165 core_initcall(rcu_spawn_tasks_trace_kthread); 1175 1166 1167 + #ifndef CONFIG_TINY_RCU 1176 1168 static void show_rcu_tasks_trace_gp_kthread(void) 1177 1169 { 1178 1170 char buf[64]; ··· 1184 1174 data_race(n_heavy_reader_attempts)); 1185 1175 show_rcu_tasks_generic_gp_kthread(&rcu_tasks_trace, buf); 1186 1176 } 1177 + #endif /* #ifndef CONFIG_TINY_RCU */ 1187 1178 1188 1179 #else /* #ifdef CONFIG_TASKS_TRACE_RCU */ 1189 1180 static void exit_tasks_rcu_finish_trace(struct task_struct *t) { } 1190 1181 static inline void show_rcu_tasks_trace_gp_kthread(void) {} 1191 1182 #endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */ 1192 1183 1184 + #ifndef CONFIG_TINY_RCU 1193 1185 void show_rcu_tasks_gp_kthreads(void) 1194 1186 { 1195 1187 show_rcu_tasks_classic_gp_kthread(); 1196 1188 show_rcu_tasks_rude_gp_kthread(); 1197 1189 show_rcu_tasks_trace_gp_kthread(); 1198 1190 } 1191 + #endif /* #ifndef CONFIG_TINY_RCU */ 1199 1192 1200 1193 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ 1201 1194 static inline void rcu_tasks_bootup_oddness(void) {}
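Among the tasks.h changes above is the one-character fix of WARN_ON_ONCE(ofl & !is_idle_task(t)) to use &&. Because !x always evaluates to 0 or 1, the bitwise form tests only bit 0 of the left-hand operand; with a bool ofl the result happened to match, but the idiom silently goes wrong as soon as the left side is a multi-bit flag word. A hypothetical illustration of the failure mode (the flag values are made up for demonstration):

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy pattern from the old code: !idle is 0 or 1, so the AND inspects
 * only bit 0 of flags and ignores every other bit. */
static int should_warn_bitwise(unsigned int flags, bool idle)
{
	return (flags & !idle) != 0;
}

/* Intended meaning, as in the fixed WARN_ON_ONCE(): warn whenever any
 * flag bit is set and the task is not idle. */
static int should_warn_logical(unsigned int flags, bool idle)
{
	return flags && !idle;
}
```

With flags == 0x2 and a non-idle task, the bitwise form reports nothing (0x2 & 1 == 0) while the logical form correctly warns; for 0/1-valued flags the two coincide, which is why the kernel bug was latent.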
+4 -3
kernel/rcu/tiny.c
··· 23 23 #include <linux/cpu.h> 24 24 #include <linux/prefetch.h> 25 25 #include <linux/slab.h> 26 + #include <linux/mm.h> 26 27 27 28 #include "rcu.h" 28 29 ··· 85 84 unsigned long offset = (unsigned long)head->func; 86 85 87 86 rcu_lock_acquire(&rcu_callback_map); 88 - if (__is_kfree_rcu_offset(offset)) { 89 - trace_rcu_invoke_kfree_callback("", head, offset); 90 - kfree((void *)head - offset); 87 + if (__is_kvfree_rcu_offset(offset)) { 88 + trace_rcu_invoke_kvfree_callback("", head, offset); 89 + kvfree((void *)head - offset); 91 90 rcu_lock_release(&rcu_callback_map); 92 91 return true; 93 92 }
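The tiny.c hunk renames the kfree path to kvfree but keeps the underlying trick: kfree_rcu()/kvfree_rcu() store the byte offset of the embedded rcu_head in the callback's ->func slot, __is_kvfree_rcu_offset() distinguishes such small values from real function pointers, and subtracting the offset from the rcu_head address recovers the enclosing allocation. A hedged userspace sketch (struct widget and the 4096 cutoff are illustrative assumptions, and the function-pointer/integer casts rely on common ABI behavior):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal shapes mirroring the kernel pattern: an rcu_head embedded
 * inside the structure that will eventually be freed. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *);
};

struct widget {
	int payload;
	struct rcu_head rh;	/* embedded, as in typical kfree_rcu() users */
};

/* Assumed stand-in for __is_kvfree_rcu_offset(): small values cannot be
 * valid function pointers, so they must be offsets. */
static int is_offset(uintptr_t v)
{
	return v < 4096;
}

/* Smuggle the offset into the "func" slot, as kvfree_rcu() does. */
static void set_offset(struct rcu_head *h, size_t off)
{
	h->func = (void (*)(struct rcu_head *))off;
}

/* Mirrors tiny.c's kvfree((void *)head - offset). */
static void *outer_block(struct rcu_head *h)
{
	uintptr_t off = (uintptr_t)h->func;

	assert(is_offset(off));
	return (char *)h - off;
}
```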
+279 -126
kernel/rcu/tree.c
··· 57 57 #include <linux/slab.h> 58 58 #include <linux/sched/isolation.h> 59 59 #include <linux/sched/clock.h> 60 + #include <linux/vmalloc.h> 61 + #include <linux/mm.h> 60 62 #include "../time/tick-internal.h" 61 63 62 64 #include "tree.h" ··· 176 174 module_param(gp_init_delay, int, 0444); 177 175 static int gp_cleanup_delay; 178 176 module_param(gp_cleanup_delay, int, 0444); 177 + 178 + /* 179 + * This rcu parameter is runtime-read-only. It reflects 180 + * a minimum allowed number of objects which can be cached 181 + * per-CPU. Object size is equal to one page. This value 182 + * can be changed at boot time. 183 + */ 184 + static int rcu_min_cached_objs = 2; 185 + module_param(rcu_min_cached_objs, int, 0444); 179 186 180 187 /* Retrieve RCU kthreads priority for rcutorture */ 181 188 int rcu_get_gp_kthreads_prio(void) ··· 965 954 966 955 /** 967 956 * rcu_nmi_enter - inform RCU of entry to NMI context 968 - * @irq: Is this call from rcu_irq_enter? 969 957 * 970 958 * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and 971 959 * rdp->dynticks_nmi_nesting to let the RCU grace-period handling know ··· 1000 990 rcu_dynticks_eqs_exit(); 1001 991 // ... but is watching here. 
1002 992 1003 - if (!in_nmi()) 993 + if (!in_nmi()) { 994 + instrumentation_begin(); 1004 995 rcu_cleanup_after_idle(); 996 + instrumentation_end(); 997 + } 1005 998 1006 999 instrumentation_begin(); 1007 1000 // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs() ··· 1651 1638 if (delay > 0 && 1652 1639 !(rcu_seq_ctr(rcu_state.gp_seq) % 1653 1640 (rcu_num_nodes * PER_RCU_NODE_PERIOD * delay))) 1654 - schedule_timeout_uninterruptible(delay); 1641 + schedule_timeout_idle(delay); 1655 1642 } 1656 1643 1657 1644 static unsigned long sleep_duration; ··· 1674 1661 duration = xchg(&sleep_duration, 0UL); 1675 1662 if (duration > 0) { 1676 1663 pr_alert("%s: Waiting %lu jiffies\n", __func__, duration); 1677 - schedule_timeout_uninterruptible(duration); 1664 + schedule_timeout_idle(duration); 1678 1665 pr_alert("%s: Wait complete\n", __func__); 1679 1666 } 1680 1667 } ··· 2456 2443 local_irq_save(flags); 2457 2444 rcu_nocb_lock(rdp); 2458 2445 count = -rcl.len; 2446 + rdp->n_cbs_invoked += count; 2459 2447 trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(), 2460 2448 is_idle_task(current), rcu_is_callbacks_kthread()); 2461 2449 ··· 2740 2726 } 2741 2727 *statusp = RCU_KTHREAD_YIELDING; 2742 2728 trace_rcu_utilization(TPS("Start CPU kthread@rcu_yield")); 2743 - schedule_timeout_interruptible(2); 2729 + schedule_timeout_idle(2); 2744 2730 trace_rcu_utilization(TPS("End CPU kthread@rcu_yield")); 2745 2731 *statusp = RCU_KTHREAD_WAITING; 2746 2732 } ··· 2908 2894 return; // Enqueued onto ->nocb_bypass, so just leave. 2909 2895 // If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock. 
2910 2896 rcu_segcblist_enqueue(&rdp->cblist, head); 2911 - if (__is_kfree_rcu_offset((unsigned long)func)) 2912 - trace_rcu_kfree_callback(rcu_state.name, head, 2897 + if (__is_kvfree_rcu_offset((unsigned long)func)) 2898 + trace_rcu_kvfree_callback(rcu_state.name, head, 2913 2899 (unsigned long)func, 2914 2900 rcu_segcblist_n_cbs(&rdp->cblist)); 2915 2901 else ··· 2971 2957 /* Maximum number of jiffies to wait before draining a batch. */ 2972 2958 #define KFREE_DRAIN_JIFFIES (HZ / 50) 2973 2959 #define KFREE_N_BATCHES 2 2960 + #define FREE_N_CHANNELS 2 2961 + 2962 + /** 2963 + * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers 2964 + * @nr_records: Number of active pointers in the array 2965 + * @next: Next bulk object in the block chain 2966 + * @records: Array of the kvfree_rcu() pointers 2967 + */ 2968 + struct kvfree_rcu_bulk_data { 2969 + unsigned long nr_records; 2970 + struct kvfree_rcu_bulk_data *next; 2971 + void *records[]; 2972 + }; 2974 2973 2975 2974 /* 2976 2975 * This macro defines how many entries the "records" array 2977 2976 * will contain. It is based on the fact that the size of 2978 - * kfree_rcu_bulk_data structure becomes exactly one page. 2977 + * kvfree_rcu_bulk_data structure becomes exactly one page. 
2979 2978 */ 2980 - #define KFREE_BULK_MAX_ENTR ((PAGE_SIZE / sizeof(void *)) - 3) 2981 - 2982 - /** 2983 - * struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers 2984 - * @nr_records: Number of active pointers in the array 2985 - * @records: Array of the kfree_rcu() pointers 2986 - * @next: Next bulk object in the block chain 2987 - * @head_free_debug: For debug, when CONFIG_DEBUG_OBJECTS_RCU_HEAD is set 2988 - */ 2989 - struct kfree_rcu_bulk_data { 2990 - unsigned long nr_records; 2991 - void *records[KFREE_BULK_MAX_ENTR]; 2992 - struct kfree_rcu_bulk_data *next; 2993 - struct rcu_head *head_free_debug; 2994 - }; 2979 + #define KVFREE_BULK_MAX_ENTR \ 2980 + ((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *)) 2995 2981 2996 2982 /** 2997 2983 * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests 2998 2984 * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period 2999 2985 * @head_free: List of kfree_rcu() objects waiting for a grace period 3000 - * @bhead_free: Bulk-List of kfree_rcu() objects waiting for a grace period 2986 + * @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period 3001 2987 * @krcp: Pointer to @kfree_rcu_cpu structure 3002 2988 */ 3003 2989 3004 2990 struct kfree_rcu_cpu_work { 3005 2991 struct rcu_work rcu_work; 3006 2992 struct rcu_head *head_free; 3007 - struct kfree_rcu_bulk_data *bhead_free; 2993 + struct kvfree_rcu_bulk_data *bkvhead_free[FREE_N_CHANNELS]; 3008 2994 struct kfree_rcu_cpu *krcp; 3009 2995 }; 3010 2996 3011 2997 /** 3012 2998 * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period 3013 2999 * @head: List of kfree_rcu() objects not yet waiting for a grace period 3014 - * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period 3015 - * @bcached: Keeps at most one object for later reuse when build chain blocks 3000 + * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period 3016 
3001 * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period 3017 3002 * @lock: Synchronize access to this structure 3018 3003 * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES 3019 3004 * @monitor_todo: Tracks whether a @monitor_work delayed work is pending 3020 - * @initialized: The @lock and @rcu_work fields have been initialized 3005 + * @initialized: The @rcu_work fields have been initialized 3006 + * @count: Number of objects for which GP not started 3021 3007 * 3022 3008 * This is a per-CPU structure. The reason that it is not included in 3023 3009 * the rcu_data structure is to permit this code to be extracted from ··· 3026 3012 */ 3027 3013 struct kfree_rcu_cpu { 3028 3014 struct rcu_head *head; 3029 - struct kfree_rcu_bulk_data *bhead; 3030 - struct kfree_rcu_bulk_data *bcached; 3015 + struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS]; 3031 3016 struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; 3032 - spinlock_t lock; 3017 + raw_spinlock_t lock; 3033 3018 struct delayed_work monitor_work; 3034 3019 bool monitor_todo; 3035 3020 bool initialized; 3036 - // Number of objects for which GP not started 3037 3021 int count; 3022 + 3023 + /* 3024 + * A simple cache list that contains objects for 3025 + * reuse purpose. In order to save some per-cpu 3026 + * space the list is singular. Even though it is 3027 + * lockless an access has to be protected by the 3028 + * per-cpu lock. 
3029 + */ 3030 + struct llist_head bkvcache; 3031 + int nr_bkv_objs; 3038 3032 }; 3039 3033 3040 - static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc); 3034 + static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = { 3035 + .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock), 3036 + }; 3041 3037 3042 3038 static __always_inline void 3043 - debug_rcu_head_unqueue_bulk(struct rcu_head *head) 3039 + debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead) 3044 3040 { 3045 3041 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD 3046 - for (; head; head = head->next) 3047 - debug_rcu_head_unqueue(head); 3042 + int i; 3043 + 3044 + for (i = 0; i < bhead->nr_records; i++) 3045 + debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i])); 3048 3046 #endif 3047 + } 3048 + 3049 + static inline struct kfree_rcu_cpu * 3050 + krc_this_cpu_lock(unsigned long *flags) 3051 + { 3052 + struct kfree_rcu_cpu *krcp; 3053 + 3054 + local_irq_save(*flags); // For safely calling this_cpu_ptr(). 3055 + krcp = this_cpu_ptr(&krc); 3056 + raw_spin_lock(&krcp->lock); 3057 + 3058 + return krcp; 3059 + } 3060 + 3061 + static inline void 3062 + krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags) 3063 + { 3064 + raw_spin_unlock(&krcp->lock); 3065 + local_irq_restore(flags); 3066 + } 3067 + 3068 + static inline struct kvfree_rcu_bulk_data * 3069 + get_cached_bnode(struct kfree_rcu_cpu *krcp) 3070 + { 3071 + if (!krcp->nr_bkv_objs) 3072 + return NULL; 3073 + 3074 + krcp->nr_bkv_objs--; 3075 + return (struct kvfree_rcu_bulk_data *) 3076 + llist_del_first(&krcp->bkvcache); 3077 + } 3078 + 3079 + static inline bool 3080 + put_cached_bnode(struct kfree_rcu_cpu *krcp, 3081 + struct kvfree_rcu_bulk_data *bnode) 3082 + { 3083 + // Check the limit. 
3084 + if (krcp->nr_bkv_objs >= rcu_min_cached_objs) 3085 + return false; 3086 + 3087 + llist_add((struct llist_node *) bnode, &krcp->bkvcache); 3088 + krcp->nr_bkv_objs++; 3089 + return true; 3090 + 3049 3091 } 3050 3092 3051 3093 /* ··· 3111 3041 static void kfree_rcu_work(struct work_struct *work) 3112 3042 { 3113 3043 unsigned long flags; 3044 + struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS], *bnext; 3114 3045 struct rcu_head *head, *next; 3115 - struct kfree_rcu_bulk_data *bhead, *bnext; 3116 3046 struct kfree_rcu_cpu *krcp; 3117 3047 struct kfree_rcu_cpu_work *krwp; 3048 + int i, j; 3118 3049 3119 3050 krwp = container_of(to_rcu_work(work), 3120 3051 struct kfree_rcu_cpu_work, rcu_work); 3121 3052 krcp = krwp->krcp; 3122 - spin_lock_irqsave(&krcp->lock, flags); 3053 + 3054 + raw_spin_lock_irqsave(&krcp->lock, flags); 3055 + // Channels 1 and 2. 3056 + for (i = 0; i < FREE_N_CHANNELS; i++) { 3057 + bkvhead[i] = krwp->bkvhead_free[i]; 3058 + krwp->bkvhead_free[i] = NULL; 3059 + } 3060 + 3061 + // Channel 3. 3123 3062 head = krwp->head_free; 3124 3063 krwp->head_free = NULL; 3125 - bhead = krwp->bhead_free; 3126 - krwp->bhead_free = NULL; 3127 - spin_unlock_irqrestore(&krcp->lock, flags); 3064 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3128 3065 3129 - /* "bhead" is now private, so traverse locklessly. */ 3130 - for (; bhead; bhead = bnext) { 3131 - bnext = bhead->next; 3066 + // Handle two first channels. 3067 + for (i = 0; i < FREE_N_CHANNELS; i++) { 3068 + for (; bkvhead[i]; bkvhead[i] = bnext) { 3069 + bnext = bkvhead[i]->next; 3070 + debug_rcu_bhead_unqueue(bkvhead[i]); 3132 3071 3133 - debug_rcu_head_unqueue_bulk(bhead->head_free_debug); 3072 + rcu_lock_acquire(&rcu_callback_map); 3073 + if (i == 0) { // kmalloc() / kfree(). 
3074 + trace_rcu_invoke_kfree_bulk_callback( 3075 + rcu_state.name, bkvhead[i]->nr_records, 3076 + bkvhead[i]->records); 3134 3077 3135 - rcu_lock_acquire(&rcu_callback_map); 3136 - trace_rcu_invoke_kfree_bulk_callback(rcu_state.name, 3137 - bhead->nr_records, bhead->records); 3078 + kfree_bulk(bkvhead[i]->nr_records, 3079 + bkvhead[i]->records); 3080 + } else { // vmalloc() / vfree(). 3081 + for (j = 0; j < bkvhead[i]->nr_records; j++) { 3082 + trace_rcu_invoke_kvfree_callback( 3083 + rcu_state.name, 3084 + bkvhead[i]->records[j], 0); 3138 3085 3139 - kfree_bulk(bhead->nr_records, bhead->records); 3140 - rcu_lock_release(&rcu_callback_map); 3086 + vfree(bkvhead[i]->records[j]); 3087 + } 3088 + } 3089 + rcu_lock_release(&rcu_callback_map); 3141 3090 3142 - if (cmpxchg(&krcp->bcached, NULL, bhead)) 3143 - free_page((unsigned long) bhead); 3091 + krcp = krc_this_cpu_lock(&flags); 3092 + if (put_cached_bnode(krcp, bkvhead[i])) 3093 + bkvhead[i] = NULL; 3094 + krc_this_cpu_unlock(krcp, flags); 3144 3095 3145 - cond_resched_tasks_rcu_qs(); 3096 + if (bkvhead[i]) 3097 + free_page((unsigned long) bkvhead[i]); 3098 + 3099 + cond_resched_tasks_rcu_qs(); 3100 + } 3146 3101 } 3147 3102 3148 3103 /* ··· 3177 3082 */ 3178 3083 for (; head; head = next) { 3179 3084 unsigned long offset = (unsigned long)head->func; 3085 + void *ptr = (void *)head - offset; 3180 3086 3181 3087 next = head->next; 3182 - debug_rcu_head_unqueue(head); 3088 + debug_rcu_head_unqueue((struct rcu_head *)ptr); 3183 3089 rcu_lock_acquire(&rcu_callback_map); 3184 - trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset); 3090 + trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset); 3185 3091 3186 - if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset))) 3187 - kfree((void *)head - offset); 3092 + if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) 3093 + kvfree(ptr); 3188 3094 3189 3095 rcu_lock_release(&rcu_callback_map); 3190 3096 cond_resched_tasks_rcu_qs(); ··· 3201 3105 static inline bool 
queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp) 3202 3106 { 3203 3107 struct kfree_rcu_cpu_work *krwp; 3204 - bool queued = false; 3205 - int i; 3108 + bool repeat = false; 3109 + int i, j; 3206 3110 3207 3111 lockdep_assert_held(&krcp->lock); 3208 3112 ··· 3210 3114 krwp = &(krcp->krw_arr[i]); 3211 3115 3212 3116 /* 3213 - * Try to detach bhead or head and attach it over any 3117 + * Try to detach bkvhead or head and attach it over any 3214 3118 * available corresponding free channel. It can be that 3215 3119 * a previous RCU batch is in progress, it means that 3216 3120 * immediately to queue another one is not possible so 3217 3121 * return false to tell caller to retry. 3218 3122 */ 3219 - if ((krcp->bhead && !krwp->bhead_free) || 3123 + if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) || 3124 + (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) || 3220 3125 (krcp->head && !krwp->head_free)) { 3221 - /* Channel 1. */ 3222 - if (!krwp->bhead_free) { 3223 - krwp->bhead_free = krcp->bhead; 3224 - krcp->bhead = NULL; 3126 + // Channel 1 corresponds to SLAB ptrs. 3127 + // Channel 2 corresponds to vmalloc ptrs. 3128 + for (j = 0; j < FREE_N_CHANNELS; j++) { 3129 + if (!krwp->bkvhead_free[j]) { 3130 + krwp->bkvhead_free[j] = krcp->bkvhead[j]; 3131 + krcp->bkvhead[j] = NULL; 3132 + } 3225 3133 } 3226 3134 3227 - /* Channel 2. */ 3135 + // Channel 3 corresponds to emergency path. 3228 3136 if (!krwp->head_free) { 3229 3137 krwp->head_free = krcp->head; 3230 3138 krcp->head = NULL; ··· 3237 3137 WRITE_ONCE(krcp->count, 0); 3238 3138 3239 3139 /* 3240 - * One work is per one batch, so there are two "free channels", 3241 - * "bhead_free" and "head_free" the batch can handle. It can be 3242 - * that the work is in the pending state when two channels have 3243 - * been detached following each other, one by one. 3140 + * One work is per one batch, so there are three 3141 + * "free channels", the batch can handle. 
It can 3142 + * be that the work is in the pending state when 3143 + * channels have been detached following by each 3144 + * other. 3244 3145 */ 3245 3146 queue_rcu_work(system_wq, &krwp->rcu_work); 3246 - queued = true; 3247 3147 } 3148 + 3149 + // Repeat if any "free" corresponding channel is still busy. 3150 + if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head) 3151 + repeat = true; 3248 3152 } 3249 3153 3250 - return queued; 3154 + return !repeat; 3251 3155 } 3252 3156 3253 3157 static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp, ··· 3261 3157 krcp->monitor_todo = false; 3262 3158 if (queue_kfree_rcu_work(krcp)) { 3263 3159 // Success! Our job is done here. 3264 - spin_unlock_irqrestore(&krcp->lock, flags); 3160 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3265 3161 return; 3266 3162 } 3267 3163 3268 3164 // Previous RCU batch still in progress, try again later. 3269 3165 krcp->monitor_todo = true; 3270 3166 schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); 3271 - spin_unlock_irqrestore(&krcp->lock, flags); 3167 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3272 3168 } 3273 3169 3274 3170 /* ··· 3281 3177 struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu, 3282 3178 monitor_work.work); 3283 3179 3284 - spin_lock_irqsave(&krcp->lock, flags); 3180 + raw_spin_lock_irqsave(&krcp->lock, flags); 3285 3181 if (krcp->monitor_todo) 3286 3182 kfree_rcu_drain_unlock(krcp, flags); 3287 3183 else 3288 - spin_unlock_irqrestore(&krcp->lock, flags); 3184 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3289 3185 } 3290 3186 3291 3187 static inline bool 3292 - kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, 3293 - struct rcu_head *head, rcu_callback_t func) 3188 + kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr) 3294 3189 { 3295 - struct kfree_rcu_bulk_data *bnode; 3190 + struct kvfree_rcu_bulk_data *bnode; 3191 + int idx; 3296 3192 3297 3193 if (unlikely(!krcp->initialized)) 3298 
3194 return false; 3299 3195 3300 3196 lockdep_assert_held(&krcp->lock); 3197 + idx = !!is_vmalloc_addr(ptr); 3301 3198 3302 3199 /* Check if a new block is required. */ 3303 - if (!krcp->bhead || 3304 - krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) { 3305 - bnode = xchg(&krcp->bcached, NULL); 3200 + if (!krcp->bkvhead[idx] || 3201 + krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) { 3202 + bnode = get_cached_bnode(krcp); 3306 3203 if (!bnode) { 3307 - WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE); 3204 + /* 3205 + * To keep this path working on raw non-preemptible 3206 + * sections, prevent the optional entry into the 3207 + * allocator as it uses sleeping locks. In fact, even 3208 + * if the caller of kfree_rcu() is preemptible, this 3209 + * path still is not, as krcp->lock is a raw spinlock. 3210 + * With additional page pre-allocation in the works, 3211 + * hitting this return is going to be much less likely. 3212 + */ 3213 + if (IS_ENABLED(CONFIG_PREEMPT_RT)) 3214 + return false; 3308 3215 3309 - bnode = (struct kfree_rcu_bulk_data *) 3216 + /* 3217 + * NOTE: For one argument of kvfree_rcu() we can 3218 + * drop the lock and get the page in sleepable 3219 + * context. That would allow to maintain an array 3220 + * for the CONFIG_PREEMPT_RT as well if no cached 3221 + * pages are available. 3222 + */ 3223 + bnode = (struct kvfree_rcu_bulk_data *) 3310 3224 __get_free_page(GFP_NOWAIT | __GFP_NOWARN); 3311 3225 } 3312 3226 ··· 3334 3212 3335 3213 /* Initialize the new block. */ 3336 3214 bnode->nr_records = 0; 3337 - bnode->next = krcp->bhead; 3338 - bnode->head_free_debug = NULL; 3215 + bnode->next = krcp->bkvhead[idx]; 3339 3216 3340 3217 /* Attach it to the head. 
*/ 3341 - krcp->bhead = bnode; 3218 + krcp->bkvhead[idx] = bnode; 3342 3219 } 3343 3220 3344 - #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD 3345 - head->func = func; 3346 - head->next = krcp->bhead->head_free_debug; 3347 - krcp->bhead->head_free_debug = head; 3348 - #endif 3349 - 3350 3221 /* Finally insert. */ 3351 - krcp->bhead->records[krcp->bhead->nr_records++] = 3352 - (void *) head - (unsigned long) func; 3222 + krcp->bkvhead[idx]->records 3223 + [krcp->bkvhead[idx]->nr_records++] = ptr; 3353 3224 3354 3225 return true; 3355 3226 } 3356 3227 3357 3228 /* 3358 - * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace 3359 - * period. Please note there are two paths are maintained, one is the main one 3360 - * that uses kfree_bulk() interface and second one is emergency one, that is 3361 - * used only when the main path can not be maintained temporary, due to memory 3362 - * pressure. 3229 + * Queue a request for lazy invocation of the appropriate free routine after a 3230 + * grace period. Please note that three paths are maintained, two of which are 3231 + * the main ones that use the array-of-pointers interface and the third is an 3232 + * emergency one, used only when the main paths cannot be maintained 3233 + * temporarily, due to memory pressure. 3363 3234 * 3364 - * Each kfree_call_rcu() request is added to a batch. The batch will be drained 3235 + * Each kvfree_call_rcu() request is added to a batch. The batch will be drained 3365 3236 * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will 3366 3237 * be freed in workqueue context. This allows us to batch requests together to 3367 - * reduce the number of grace periods during heavy kfree_rcu() load. 3238 + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
3368 3239 */ 3369 - void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func) 3240 + void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) 3370 3241 { 3371 3242 unsigned long flags; 3372 3243 struct kfree_rcu_cpu *krcp; 3244 + bool success; 3245 + void *ptr; 3373 3246 3374 - local_irq_save(flags); // For safely calling this_cpu_ptr(). 3375 - krcp = this_cpu_ptr(&krc); 3376 - if (krcp->initialized) 3377 - spin_lock(&krcp->lock); 3247 + if (head) { 3248 + ptr = (void *) head - (unsigned long) func; 3249 + } else { 3250 + /* 3251 + * Please note there is a limitation for the head-less 3252 + * variant, that is why there is a clear rule for such 3253 + * objects: it can be used from might_sleep() context 3254 + * only. For other places please embed an rcu_head to 3255 + * your data. 3256 + */ 3257 + might_sleep(); 3258 + ptr = (unsigned long *) func; 3259 + } 3260 + 3261 + krcp = krc_this_cpu_lock(&flags); 3378 3262 3379 3263 // Queue the object but don't yet schedule the batch. 3380 - if (debug_rcu_head_queue(head)) { 3264 + if (debug_rcu_head_queue(ptr)) { 3381 3265 // Probable double kfree_rcu(), just leak. 3382 3266 WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n", 3383 3267 __func__, head); 3268 + 3269 + // Mark as success and leave. 3270 + success = true; 3384 3271 goto unlock_return; 3385 3272 } 3386 3273 ··· 3397 3266 * Under high memory pressure GFP_NOWAIT can fail, 3398 3267 * in that case the emergency path is maintained. 3399 3268 */ 3400 - if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) { 3269 + success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr); 3270 + if (!success) { 3271 + if (head == NULL) 3272 + // Inline if kvfree_rcu(one_arg) call. 
3273 + goto unlock_return; 3274 + 3401 3275 head->func = func; 3402 3276 head->next = krcp->head; 3403 3277 krcp->head = head; 3278 + success = true; 3404 3279 } 3405 3280 3406 3281 WRITE_ONCE(krcp->count, krcp->count + 1); ··· 3419 3282 } 3420 3283 3421 3284 unlock_return: 3422 - if (krcp->initialized) 3423 - spin_unlock(&krcp->lock); 3424 - local_irq_restore(flags); 3285 + krc_this_cpu_unlock(krcp, flags); 3286 + 3287 + /* 3288 + * Inline kvfree() after synchronize_rcu(). We can do 3289 + * it from might_sleep() context only, so the current 3290 + * CPU can pass the QS state. 3291 + */ 3292 + if (!success) { 3293 + debug_rcu_head_unqueue((struct rcu_head *) ptr); 3294 + synchronize_rcu(); 3295 + kvfree(ptr); 3296 + } 3425 3297 } 3426 - EXPORT_SYMBOL_GPL(kfree_call_rcu); 3298 + EXPORT_SYMBOL_GPL(kvfree_call_rcu); 3427 3299 3428 3300 static unsigned long 3429 3301 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) ··· 3461 3315 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3462 3316 3463 3317 count = krcp->count; 3464 - spin_lock_irqsave(&krcp->lock, flags); 3318 + raw_spin_lock_irqsave(&krcp->lock, flags); 3465 3319 if (krcp->monitor_todo) 3466 3320 kfree_rcu_drain_unlock(krcp, flags); 3467 3321 else 3468 - spin_unlock_irqrestore(&krcp->lock, flags); 3322 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3469 3323 3470 3324 sc->nr_to_scan -= count; 3471 3325 freed += count; ··· 3474 3328 break; 3475 3329 } 3476 3330 3477 - return freed; 3331 + return freed == 0 ? 
SHRINK_STOP : freed; 3478 3332 } 3479 3333 3480 3334 static struct shrinker kfree_rcu_shrinker = { ··· 3492 3346 for_each_online_cpu(cpu) { 3493 3347 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3494 3348 3495 - spin_lock_irqsave(&krcp->lock, flags); 3349 + raw_spin_lock_irqsave(&krcp->lock, flags); 3496 3350 if (!krcp->head || krcp->monitor_todo) { 3497 - spin_unlock_irqrestore(&krcp->lock, flags); 3351 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3498 3352 continue; 3499 3353 } 3500 3354 krcp->monitor_todo = true; 3501 3355 schedule_delayed_work_on(cpu, &krcp->monitor_work, 3502 3356 KFREE_DRAIN_JIFFIES); 3503 - spin_unlock_irqrestore(&krcp->lock, flags); 3357 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3504 3358 } 3505 3359 } 3506 3360 ··· 3988 3842 { 3989 3843 unsigned long flags; 3990 3844 unsigned long mask; 3991 - int nbits; 3992 - unsigned long oldmask; 3993 3845 struct rcu_data *rdp; 3994 3846 struct rcu_node *rnp; 3847 + bool newcpu; 3995 3848 3996 3849 if (per_cpu(rcu_cpu_started, cpu)) 3997 3850 return; ··· 4002 3857 mask = rdp->grpmask; 4003 3858 raw_spin_lock_irqsave_rcu_node(rnp, flags); 4004 3859 WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask); 4005 - oldmask = rnp->expmaskinitnext; 3860 + newcpu = !(rnp->expmaskinitnext & mask); 4006 3861 rnp->expmaskinitnext |= mask; 4007 - oldmask ^= rnp->expmaskinitnext; 4008 - nbits = bitmap_weight(&oldmask, BITS_PER_LONG); 4009 3862 /* Allow lockless access for expedited grace periods. */ 4010 - smp_store_release(&rcu_state.ncpus, rcu_state.ncpus + nbits); /* ^^^ */ 3863 + smp_store_release(&rcu_state.ncpus, rcu_state.ncpus + newcpu); /* ^^^ */ 4011 3864 ASSERT_EXCLUSIVE_WRITER(rcu_state.ncpus); 4012 3865 rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? 
*/ 4013 3866 rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq); ··· 4392 4249 4393 4250 for_each_possible_cpu(cpu) { 4394 4251 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 4252 + struct kvfree_rcu_bulk_data *bnode; 4395 4253 4396 - spin_lock_init(&krcp->lock); 4397 4254 for (i = 0; i < KFREE_N_BATCHES; i++) { 4398 4255 INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); 4399 4256 krcp->krw_arr[i].krcp = krcp; 4257 + } 4258 + 4259 + for (i = 0; i < rcu_min_cached_objs; i++) { 4260 + bnode = (struct kvfree_rcu_bulk_data *) 4261 + __get_free_page(GFP_NOWAIT | __GFP_NOWARN); 4262 + 4263 + if (bnode) 4264 + put_cached_bnode(krcp, bnode); 4265 + else 4266 + pr_err("Failed to preallocate for %d CPU!\n", cpu); 4400 4267 } 4401 4268 4402 4269 INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
+8 -7
kernel/rcu/tree.h
··· 41 41 raw_spinlock_t __private lock; /* Root rcu_node's lock protects */ 42 42 /* some rcu_state fields as well as */ 43 43 /* following. */ 44 - unsigned long gp_seq; /* Track rsp->rcu_gp_seq. */ 44 + unsigned long gp_seq; /* Track rsp->gp_seq. */ 45 45 unsigned long gp_seq_needed; /* Track furthest future GP request. */ 46 46 unsigned long completedqs; /* All QSes done for this node. */ 47 47 unsigned long qsmask; /* CPUs or groups that need to switch in */ ··· 73 73 unsigned long ffmask; /* Fully functional CPUs. */ 74 74 unsigned long grpmask; /* Mask to apply to parent qsmask. */ 75 75 /* Only one bit will be set in this mask. */ 76 - int grplo; /* lowest-numbered CPU or group here. */ 77 - int grphi; /* highest-numbered CPU or group here. */ 78 - u8 grpnum; /* CPU/group number for next level up. */ 76 + int grplo; /* lowest-numbered CPU here. */ 77 + int grphi; /* highest-numbered CPU here. */ 78 + u8 grpnum; /* group number for next level up. */ 79 79 u8 level; /* root is at level 0. */ 80 80 bool wait_blkd_tasks;/* Necessary to wait for blocked tasks to */ 81 81 /* exit RCU read-side critical sections */ ··· 149 149 /* Per-CPU data for read-copy update. */ 150 150 struct rcu_data { 151 151 /* 1) quiescent-state and grace-period handling : */ 152 - unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */ 152 + unsigned long gp_seq; /* Track rsp->gp_seq counter. */ 153 153 unsigned long gp_seq_needed; /* Track furthest future GP request. */ 154 154 union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */ 155 155 bool core_needs_qs; /* Core waits for quiesc state. */ ··· 171 171 /* different grace periods. */ 172 172 long qlen_last_fqs_check; 173 173 /* qlen at last check for QS forcing */ 174 + unsigned long n_cbs_invoked; /* # callbacks invoked since boot. */ 174 175 unsigned long n_force_qs_snap; 175 176 /* did other CPU force QS recently? 
*/ 176 177 long blimit; /* Upper limit on a processed batch */ ··· 302 301 u8 boost ____cacheline_internodealigned_in_smp; 303 302 /* Subject to priority boost. */ 304 303 unsigned long gp_seq; /* Grace-period sequence #. */ 304 + unsigned long gp_max; /* Maximum GP duration in */ 305 + /* jiffies. */ 305 306 struct task_struct *gp_kthread; /* Task for grace periods. */ 306 307 struct swait_queue_head gp_wq; /* Where GP task waits. */ 307 308 short gp_flags; /* Commands for GP task. */ ··· 349 346 /* a reluctant CPU. */ 350 347 unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */ 351 348 /* GP start. */ 352 - unsigned long gp_max; /* Maximum GP duration in */ 353 - /* jiffies. */ 354 349 const char *name; /* Name of structure. */ 355 350 char abbr; /* Abbreviated name. */ 356 351
+1 -1
kernel/rcu/tree_exp.h
··· 403 403 /* Online, so delay for a bit and try again. */ 404 404 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 405 405 trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("selectofl")); 406 - schedule_timeout_uninterruptible(1); 406 + schedule_timeout_idle(1); 407 407 goto retry_ipi; 408 408 } 409 409 /* CPU really is offline, so we must report its QS. */
+2 -2
kernel/rcu/tree_plugin.h
··· 1033 1033 if (spincnt > 10) { 1034 1034 WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_YIELDING); 1035 1035 trace_rcu_utilization(TPS("End boost kthread@rcu_yield")); 1036 - schedule_timeout_interruptible(2); 1036 + schedule_timeout_idle(2); 1037 1037 trace_rcu_utilization(TPS("Start boost kthread@rcu_yield")); 1038 1038 spincnt = 0; 1039 1039 } ··· 2005 2005 /* Polling, so trace if first poll in the series. */ 2006 2006 if (gotcbs) 2007 2007 trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll")); 2008 - schedule_timeout_interruptible(1); 2008 + schedule_timeout_idle(1); 2009 2009 } else if (!needwait_gp) { 2010 2010 /* Wait for callbacks to appear. */ 2011 2011 trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
+5 -4
kernel/rcu/tree_stall.h
··· 237 237 */ 238 238 static bool check_slow_task(struct task_struct *t, void *arg) 239 239 { 240 - struct rcu_node *rnp; 241 240 struct rcu_stall_chk_rdr *rscrp = arg; 242 241 243 242 if (task_curr(t)) 244 243 return false; // It is running, so decline to inspect it. 245 244 rscrp->nesting = t->rcu_read_lock_nesting; 246 245 rscrp->rs = t->rcu_read_unlock_special; 247 - rnp = t->rcu_blocked_node; 248 246 rscrp->on_blkd_list = !list_empty(&t->rcu_node_entry); 249 247 return true; 250 248 } ··· 466 468 467 469 /* 468 470 * OK, time to rat on our buddy... 469 - * See Documentation/RCU/stallwarn.txt for info on how to debug 471 + * See Documentation/RCU/stallwarn.rst for info on how to debug 470 472 * RCU CPU stall warnings. 471 473 */ 472 474 pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name); ··· 533 535 534 536 /* 535 537 * OK, time to rat on ourselves... 536 - * See Documentation/RCU/stallwarn.txt for info on how to debug 538 + * See Documentation/RCU/stallwarn.rst for info on how to debug 537 539 * RCU CPU stall warnings. 538 540 */ 539 541 pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name); ··· 647 649 */ 648 650 void show_rcu_gp_kthreads(void) 649 651 { 652 + unsigned long cbs = 0; 650 653 int cpu; 651 654 unsigned long j; 652 655 unsigned long ja; ··· 689 690 } 690 691 for_each_possible_cpu(cpu) { 691 692 rdp = per_cpu_ptr(&rcu_data, cpu); 693 + cbs += data_race(rdp->n_cbs_invoked); 692 694 if (rcu_segcblist_is_offloaded(&rdp->cblist)) 693 695 show_rcu_nocb_state(rdp); 694 696 } 697 + pr_info("RCU callbacks invoked since boot: %lu\n", cbs); 695 698 show_rcu_tasks_gp_kthreads(); 696 699 } 697 700 EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
+10 -6
kernel/rcu/update.c
··· 42 42 #include <linux/kprobes.h> 43 43 #include <linux/slab.h> 44 44 #include <linux/irq_work.h> 45 + #include <linux/rcupdate_trace.h> 45 46 46 47 #define CREATE_TRACE_POINTS 47 48 ··· 208 207 rcu_unexpedite_gp(); 209 208 if (rcu_normal_after_boot) 210 209 WRITE_ONCE(rcu_normal, 1); 211 - rcu_boot_ended = 1; 210 + rcu_boot_ended = true; 212 211 } 213 212 214 213 /* ··· 280 279 }; 281 280 EXPORT_SYMBOL_GPL(rcu_sched_lock_map); 282 281 282 + // Tell lockdep when RCU callbacks are being invoked. 283 283 static struct lock_class_key rcu_callback_key; 284 284 struct lockdep_map rcu_callback_map = 285 285 STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key); ··· 392 390 might_sleep(); 393 391 continue; 394 392 } 395 - init_rcu_head_on_stack(&rs_array[i].head); 396 - init_completion(&rs_array[i].completion); 397 393 for (j = 0; j < i; j++) 398 394 if (crcu_array[j] == crcu_array[i]) 399 395 break; 400 - if (j == i) 396 + if (j == i) { 397 + init_rcu_head_on_stack(&rs_array[i].head); 398 + init_completion(&rs_array[i].completion); 401 399 (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu); 400 + } 402 401 } 403 402 404 403 /* Wait for all callbacks to be invoked. */ ··· 410 407 for (j = 0; j < i; j++) 411 408 if (crcu_array[j] == crcu_array[i]) 412 409 break; 413 - if (j == i) 410 + if (j == i) { 414 411 wait_for_completion(&rs_array[i].completion); 415 - destroy_rcu_head_on_stack(&rs_array[i].head); 412 + destroy_rcu_head_on_stack(&rs_array[i].head); 413 + } 416 414 } 417 415 } 418 416 EXPORT_SYMBOL_GPL(__wait_rcu_gp);
+15 -7
kernel/time/tick-sched.c
··· 351 351 EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu); 352 352 353 353 /* 354 - * Set a per-task tick dependency. Posix CPU timers need this in order to elapse 355 - * per task timers. 354 + * Set a per-task tick dependency. RCU need this. Also posix CPU timers 355 + * in order to elapse per task timers. 356 356 */ 357 357 void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit) 358 358 { 359 - /* 360 - * We could optimize this with just kicking the target running the task 361 - * if that noise matters for nohz full users. 362 - */ 363 - tick_nohz_dep_set_all(&tsk->tick_dep_mask, bit); 359 + if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) { 360 + if (tsk == current) { 361 + preempt_disable(); 362 + tick_nohz_full_kick(); 363 + preempt_enable(); 364 + } else { 365 + /* 366 + * Some future tick_nohz_full_kick_task() 367 + * should optimize this. 368 + */ 369 + tick_nohz_full_kick_all(); 370 + } 371 + } 364 372 } 365 373 EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task); 366 374
+5 -1
kernel/torture.c
··· 45 45 static bool disable_onoff_at_boot; 46 46 module_param(disable_onoff_at_boot, bool, 0444); 47 47 48 + static bool ftrace_dump_at_shutdown; 49 + module_param(ftrace_dump_at_shutdown, bool, 0444); 50 + 48 51 static char *torture_type; 49 52 static int verbose; 50 53 ··· 530 527 torture_shutdown_hook(); 531 528 else 532 529 VERBOSE_TOROUT_STRING("No torture_shutdown_hook(), skipping."); 533 - rcu_ftrace_dump(DUMP_ALL); 530 + if (ftrace_dump_at_shutdown) 531 + rcu_ftrace_dump(DUMP_ALL); 534 532 kernel_power_off(); /* Shut down the system. */ 535 533 return 0; 536 534 }
+95 -8
lib/test_vmalloc.c
··· 15 15 #include <linux/delay.h> 16 16 #include <linux/rwsem.h> 17 17 #include <linux/mm.h> 18 + #include <linux/rcupdate.h> 19 + #include <linux/slab.h> 18 20 19 21 #define __param(type, name, init, msg) \ 20 22 static type name = init; \ ··· 37 35 38 36 __param(int, run_test_mask, INT_MAX, 39 37 "Set tests specified in the mask.\n\n" 40 - "\t\tid: 1, name: fix_size_alloc_test\n" 41 - "\t\tid: 2, name: full_fit_alloc_test\n" 42 - "\t\tid: 4, name: long_busy_list_alloc_test\n" 43 - "\t\tid: 8, name: random_size_alloc_test\n" 44 - "\t\tid: 16, name: fix_align_alloc_test\n" 45 - "\t\tid: 32, name: random_size_align_alloc_test\n" 46 - "\t\tid: 64, name: align_shift_alloc_test\n" 47 - "\t\tid: 128, name: pcpu_alloc_test\n" 38 + "\t\tid: 1, name: fix_size_alloc_test\n" 39 + "\t\tid: 2, name: full_fit_alloc_test\n" 40 + "\t\tid: 4, name: long_busy_list_alloc_test\n" 41 + "\t\tid: 8, name: random_size_alloc_test\n" 42 + "\t\tid: 16, name: fix_align_alloc_test\n" 43 + "\t\tid: 32, name: random_size_align_alloc_test\n" 44 + "\t\tid: 64, name: align_shift_alloc_test\n" 45 + "\t\tid: 128, name: pcpu_alloc_test\n" 46 + "\t\tid: 256, name: kvfree_rcu_1_arg_vmalloc_test\n" 47 + "\t\tid: 512, name: kvfree_rcu_2_arg_vmalloc_test\n" 48 + "\t\tid: 1024, name: kvfree_rcu_1_arg_slab_test\n" 49 + "\t\tid: 2048, name: kvfree_rcu_2_arg_slab_test\n" 48 50 /* Add a new test case description here. 
*/ 49 51 ); 50 52 ··· 322 316 return rv; 323 317 } 324 318 319 + struct test_kvfree_rcu { 320 + struct rcu_head rcu; 321 + unsigned char array[20]; 322 + }; 323 + 324 + static int 325 + kvfree_rcu_1_arg_vmalloc_test(void) 326 + { 327 + struct test_kvfree_rcu *p; 328 + int i; 329 + 330 + for (i = 0; i < test_loop_count; i++) { 331 + p = vmalloc(1 * PAGE_SIZE); 332 + if (!p) 333 + return -1; 334 + 335 + p->array[0] = 'a'; 336 + kvfree_rcu(p); 337 + } 338 + 339 + return 0; 340 + } 341 + 342 + static int 343 + kvfree_rcu_2_arg_vmalloc_test(void) 344 + { 345 + struct test_kvfree_rcu *p; 346 + int i; 347 + 348 + for (i = 0; i < test_loop_count; i++) { 349 + p = vmalloc(1 * PAGE_SIZE); 350 + if (!p) 351 + return -1; 352 + 353 + p->array[0] = 'a'; 354 + kvfree_rcu(p, rcu); 355 + } 356 + 357 + return 0; 358 + } 359 + 360 + static int 361 + kvfree_rcu_1_arg_slab_test(void) 362 + { 363 + struct test_kvfree_rcu *p; 364 + int i; 365 + 366 + for (i = 0; i < test_loop_count; i++) { 367 + p = kmalloc(sizeof(*p), GFP_KERNEL); 368 + if (!p) 369 + return -1; 370 + 371 + p->array[0] = 'a'; 372 + kvfree_rcu(p); 373 + } 374 + 375 + return 0; 376 + } 377 + 378 + static int 379 + kvfree_rcu_2_arg_slab_test(void) 380 + { 381 + struct test_kvfree_rcu *p; 382 + int i; 383 + 384 + for (i = 0; i < test_loop_count; i++) { 385 + p = kmalloc(sizeof(*p), GFP_KERNEL); 386 + if (!p) 387 + return -1; 388 + 389 + p->array[0] = 'a'; 390 + kvfree_rcu(p, rcu); 391 + } 392 + 393 + return 0; 394 + } 395 + 325 396 struct test_case_desc { 326 397 const char *test_name; 327 398 int (*test_func)(void); ··· 413 330 { "random_size_align_alloc_test", random_size_align_alloc_test }, 414 331 { "align_shift_alloc_test", align_shift_alloc_test }, 415 332 { "pcpu_alloc_test", pcpu_alloc_test }, 333 + { "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test }, 334 + { "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test }, 335 + { "kvfree_rcu_1_arg_slab_test", kvfree_rcu_1_arg_slab_test }, 336 + { 
"kvfree_rcu_2_arg_slab_test", kvfree_rcu_2_arg_slab_test }, 416 337 /* Add a new test case here. */ 417 338 }; 418 339
+3 -3
mm/list_lru.c
··· 373 373 struct list_lru_memcg *memcg_lrus; 374 374 /* 375 375 * This is called when shrinker has already been unregistered, 376 - * and nobody can use it. So, there is no need to use kvfree_rcu(). 376 + * and nobody can use it. So, there is no need to use kvfree_rcu_local(). 377 377 */ 378 378 memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true); 379 379 __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids); 380 380 kvfree(memcg_lrus); 381 381 } 382 382 383 - static void kvfree_rcu(struct rcu_head *head) 383 + static void kvfree_rcu_local(struct rcu_head *head) 384 384 { 385 385 struct list_lru_memcg *mlru; 386 386 ··· 419 419 rcu_assign_pointer(nlru->memcg_lrus, new); 420 420 spin_unlock_irq(&nlru->lock); 421 421 422 - call_rcu(&old->rcu, kvfree_rcu); 422 + call_rcu(&old->rcu, kvfree_rcu_local); 423 423 return 0; 424 424 } 425 425
+1
mm/mmap.c
··· 3171 3171 if (vma->vm_flags & VM_ACCOUNT) 3172 3172 nr_accounted += vma_pages(vma); 3173 3173 vma = remove_vma(vma); 3174 + cond_resched(); 3174 3175 } 3175 3176 vm_unacct_memory(nr_accounted); 3176 3177 }
+2 -2
net/core/sock.c
··· 1973 1973 1974 1974 /* 1975 1975 * Before updating sk_refcnt, we must commit prior changes to memory 1976 - * (Documentation/RCU/rculist_nulls.txt for details) 1976 + * (Documentation/RCU/rculist_nulls.rst for details) 1977 1977 */ 1978 1978 smp_wmb(); 1979 1979 refcount_set(&newsk->sk_refcnt, 2); ··· 3035 3035 sk_rx_queue_clear(sk); 3036 3036 /* 3037 3037 * Before updating sk_refcnt, we must commit prior changes to memory 3038 - * (Documentation/RCU/rculist_nulls.txt for details) 3038 + * (Documentation/RCU/rculist_nulls.rst for details) 3039 3039 */ 3040 3040 smp_wmb(); 3041 3041 refcount_set(&sk->sk_refcnt, 1);
+2 -2
tools/testing/selftests/rcutorture/bin/configinit.sh
··· 32 32 then 33 33 make clean > $resdir/Make.clean 2>&1 34 34 fi 35 - make $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1 35 + make $TORTURE_KMAKE_ARG $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1 36 36 mv .config .config.sav 37 37 sh $T/upd.sh < .config.sav > .config 38 38 cp .config .config.new 39 - yes '' | make oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err 39 + yes '' | make $TORTURE_KMAKE_ARG oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err 40 40 41 41 # verify new config matches specification. 42 42 configcheck.sh .config $c
+16
tools/testing/selftests/rcutorture/bin/console-badness.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Scan standard input for error messages, dumping any found to standard 5 + # output. 6 + # 7 + # Usage: console-badness.sh 8 + # 9 + # Copyright (C) 2020 Facebook, Inc. 10 + # 11 + # Authors: Paul E. McKenney <paulmck@kernel.org> 12 + 13 + egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' | 14 + grep -v 'ODEBUG: ' | 15 + grep -v 'This means that this is a DEBUG kernel and it is' | 16 + grep -v 'Warning: unable to open an initial console'
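The new console-badness.sh centralizes the error-pattern scan that kvm.sh and friends apply to console logs. A quick way to sanity-check the pattern is to feed a fake log through a copy of the same pipeline (the sample lines below are made up for illustration):

```shell
printf '%s\n' \
  'booting the kernel' \
  'WARNING: suspicious RCU usage' \
  'Warning: unable to open an initial console' \
  'all tests passed' |
egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:' |
grep -v 'Warning: unable to open an initial console'
```

Only the real warning survives: the benign "unable to open an initial console" line matches the broad `Warn` alternative but is then filtered back out by the `grep -v` allowlist.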
+19 -4
tools/testing/selftests/rcutorture/bin/functions.sh
··· 215 215 then 216 216 echo -device spapr-vlan,netdev=net0,mac=$TORTURE_QEMU_MAC 217 217 echo -netdev bridge,br=br0,id=net0 218 - elif test -n "$TORTURE_QEMU_INTERACTIVE" 219 - then 220 - echo -net nic -net user 221 218 fi 222 219 ;; 223 220 esac ··· 231 234 # Returns the number of virtual CPUs available to the aggregate of the 232 235 # guest OSes. 233 236 identify_qemu_vcpus () { 234 - lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://' 237 + lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://' -e 's/[ ]*//g' 235 238 } 236 239 237 240 # print_bug ··· 270 273 echo $2 -smp cores=`expr \( $3 + $nt - 1 \) / $nt`,threads=$nt 271 274 ;; 272 275 esac 276 + fi 277 + } 278 + 279 + # specify_qemu_net qemu-args 280 + # 281 + # Appends a string containing "-net none" to qemu-args, unless the incoming 282 + # qemu-args already contains "-smp" or unless the TORTURE_QEMU_INTERACTIVE 283 + # environment variable is set, in which case the string that is be added is 284 + # instead "-net nic -net user". 285 + specify_qemu_net () { 286 + if echo $1 | grep -q -e -net 287 + then 288 + echo $1 289 + elif test -n "$TORTURE_QEMU_INTERACTIVE" 290 + then 291 + echo $1 -net nic -net user 292 + else 293 + echo $1 -net none 273 294 fi 274 295 }
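The new specify_qemu_net() helper above defaults to "-net none" unless the caller already chose networking or an interactive run was requested. Its logic can be exercised outside the rcutorture harness with a standalone copy of the function:

```shell
# Standalone copy of the specify_qemu_net logic, for demonstration only.
specify_qemu_net () {
	if echo $1 | grep -q -e -net
	then
		echo $1
	elif test -n "$TORTURE_QEMU_INTERACTIVE"
	then
		echo $1 -net nic -net user
	else
		echo $1 -net none
	fi
}

unset TORTURE_QEMU_INTERACTIVE
specify_qemu_net "-nographic"		# -> "-nographic -net none"
specify_qemu_net "-nographic -net br0"	# already has -net: unchanged
TORTURE_QEMU_INTERACTIVE=1
specify_qemu_net "-nographic"		# -> "-nographic -net nic -net user"
```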
+6
tools/testing/selftests/rcutorture/bin/jitter.sh
··· 46 46 exit 0; 47 47 fi 48 48 49 + # Check for stop request. 50 + if test -f "$TORTURE_STOPFILE" 51 + then 52 + exit 1; 53 + fi 54 + 49 55 # Set affinity to randomly selected online CPU 50 56 if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 | 51 57 sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'`
+6
tools/testing/selftests/rcutorture/bin/kvm-build.sh
··· 9 9 # 10 10 # Authors: Paul E. McKenney <paulmck@linux.ibm.com> 11 11 12 + if test -f "$TORTURE_STOPFILE" 13 + then 14 + echo "kvm-build.sh early exit due to run STOP request" 15 + exit 1 16 + fi 17 + 12 18 config_template=${1} 13 19 if test -z "$config_template" -o ! -f "$config_template" -o ! -r "$config_template" 14 20 then
+108
tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh
··· 1 + #!/bin/sh 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Run a group of kvm.sh tests on the specified commits. This currently 5 + # unconditionally does three-minute runs on each scenario in CFLIST, 6 + # taking advantage of all available CPUs and trusting the "make" utility. 7 + # In the short term, adjustments can be made by editing this script and 8 + # CFLIST. If some adjustments appear to have ongoing value, this script 9 + # might grow some command-line arguments. 10 + # 11 + # Usage: kvm-check-branches.sh commit1 commit2..commit3 commit4 ... 12 + # 13 + # This script considers its arguments one at a time. If more elaborate 14 + # specification of commits is needed, please use "git rev-list" to 15 + # produce something that this simple script can understand. The reason 16 + # for retaining the simplicity is that it allows the user to more easily 17 + # see which commit came from which branch. 18 + # 19 + # This script creates a yyyy.mm.dd-hh.mm.ss-group entry in the "res" 20 + # directory. The calls to kvm.sh create the usual entries, but this script 21 + # moves them under the yyyy.mm.dd-hh.mm.ss-group entry, each in its own 22 + # directory numbered in run order, that is, "0001", "0002", and so on. 23 + # For successful runs, the large build artifacts are removed. Doing this 24 + # reduces the disk space required by about two orders of magnitude for 25 + # successful runs. 26 + # 27 + # Copyright (C) Facebook, 2020 28 + # 29 + # Authors: Paul E. McKenney <paulmck@kernel.org> 30 + 31 + if ! git status > /dev/null 2>&1 32 + then 33 + echo '!!!' This script needs to run in a git archive. 1>&2 34 + echo '!!!' Giving up. 1>&2 35 + exit 1 36 + fi 37 + 38 + # Remember where we started so that we can get back and the end. 39 + curcommit="`git status | head -1 | awk '{ print $NF }'`" 40 + 41 + nfail=0 42 + ntry=0 43 + resdir="tools/testing/selftests/rcutorture/res" 44 + ds="`date +%Y.%m.%d-%H.%M.%S`-group" 45 + if ! 
test -e $resdir 46 + then 47 + mkdir $resdir || : 48 + fi 49 + mkdir $resdir/$ds 50 + echo Results directory: $resdir/$ds 51 + 52 + KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM 53 + PATH=${KVM}/bin:$PATH; export PATH 54 + . functions.sh 55 + cpus="`identify_qemu_vcpus`" 56 + echo Using up to $cpus CPUs. 57 + 58 + # Each pass through this loop does one command-line argument. 59 + for gitbr in $@ 60 + do 61 + echo ' --- git branch ' $gitbr 62 + 63 + # Each pass through this loop tests one commit. 64 + for i in `git rev-list "$gitbr"` 65 + do 66 + ntry=`expr $ntry + 1` 67 + idir=`awk -v ntry="$ntry" 'END { printf "%04d", ntry; }' < /dev/null` 68 + echo ' --- commit ' $i from branch $gitbr 69 + date 70 + mkdir $resdir/$ds/$idir 71 + echo $gitbr > $resdir/$ds/$idir/gitbr 72 + echo $i >> $resdir/$ds/$idir/gitbr 73 + 74 + # Test the specified commit. 75 + git checkout $i > $resdir/$ds/$idir/git-checkout.out 2>&1 76 + echo git checkout return code: $? "(Commit $ntry: $i)" 77 + kvm.sh --cpus $cpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1 78 + ret=$? 79 + echo kvm.sh return code $ret for commit $i from branch $gitbr 80 + 81 + # Move the build products to their resting place. 82 + runresdir="`grep -m 1 '^Results directory:' < $resdir/$ds/$idir/kvm.sh.out | sed -e 's/^Results directory://'`" 83 + mv $runresdir $resdir/$ds/$idir 84 + rrd="`echo $runresdir | sed -e 's,^.*/,,'`" 85 + echo Run results: $resdir/$ds/$idir/$rrd 86 + if test "$ret" -ne 0 87 + then 88 + # Failure, so leave all evidence intact. 89 + nfail=`expr $nfail + 1` 90 + else 91 + # Success, so remove large files to save about 1GB. 92 + ( cd $resdir/$ds/$idir/$rrd; rm -f */vmlinux */bzImage */System.map */Module.symvers ) 93 + fi 94 + done 95 + done 96 + date 97 + 98 + # Go back to the original commit. 99 + git checkout "$curcommit" 100 + 101 + if test $nfail -ne 0 102 + then 103 + echo '!!! ' $nfail failures in $ntry 'runs!!!' 
104 + exit 1 105 + else 106 + echo No failures in $ntry runs. 107 + exit 0 108 + fi
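kvm-check-branches.sh names each per-commit results directory with a zero-padded run index ("0001", "0002", ...) so that lexical and run order agree. The awk one-liner it uses for the padding behaves like this:

```shell
# Same zero-padding trick as the script's per-run directory names.
for ntry in 1 12 123 1234
do
	idir=`awk -v ntry="$ntry" 'END { printf "%04d", ntry; }' < /dev/null`
	echo "$idir"
done
# prints: 0001 0012 0123 1234 (one per line)
```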
+71
tools/testing/selftests/rcutorture/bin/kvm-recheck-refscale.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Analyze a given results directory for refscale performance measurements. 5 + # 6 + # Usage: kvm-recheck-refscale.sh resdir 7 + # 8 + # Copyright (C) IBM Corporation, 2016 9 + # 10 + # Authors: Paul E. McKenney <paulmck@linux.ibm.com> 11 + 12 + i="$1" 13 + if test -d "$i" -a -r "$i" 14 + then 15 + : 16 + else 17 + echo Unreadable results directory: $i 18 + exit 1 19 + fi 20 + PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH 21 + . functions.sh 22 + 23 + configfile=`echo $i | sed -e 's/^.*\///'` 24 + 25 + sed -e 's/^\[[^]]*]//' < $i/console.log | tr -d '\015' | 26 + awk -v configfile="$configfile" ' 27 + /^[ ]*Runs Time\(ns\) *$/ { 28 + if (dataphase + 0 == 0) { 29 + dataphase = 1; 30 + # print configfile, $0; 31 + } 32 + next; 33 + } 34 + 35 + /[^ ]*[0-9][0-9]* [0-9][0-9]*\.[0-9][0-9]*$/ { 36 + if (dataphase == 1) { 37 + # print $0; 38 + readertimes[++n] = $2; 39 + sum += $2; 40 + } 41 + next; 42 + } 43 + 44 + { 45 + if (dataphase == 1) 46 + dataphase == 2; 47 + next; 48 + } 49 + 50 + END { 51 + print configfile " results:"; 52 + newNR = asort(readertimes); 53 + if (newNR <= 0) { 54 + print "No refscale records found???" 55 + exit; 56 + } 57 + medianidx = int(newNR / 2); 58 + if (newNR == medianidx * 2) 59 + medianvalue = (readertimes[medianidx - 1] + readertimes[medianidx]) / 2; 60 + else 61 + medianvalue = readertimes[medianidx]; 62 + points = "Points:"; 63 + for (i = 1; i <= newNR; i++) 64 + points = points " " readertimes[i]; 65 + print points; 66 + print "Average reader duration: " sum / newNR " nanoseconds"; 67 + print "Minimum reader duration: " readertimes[1]; 68 + print "Median reader duration: " medianvalue; 69 + print "Maximum reader duration: " readertimes[newNR]; 70 + print "Computed from refscale printk output."; 71 + }'
+13 -7
tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
··· 31 31 head -1 $resdir/log 32 32 fi 33 33 TORTURE_SUITE="`cat $i/../TORTURE_SUITE`" 34 + configfile=`echo $i | sed -e 's,^.*/,,'` 34 35 rm -f $i/console.log.*.diags 35 36 kvm-recheck-${TORTURE_SUITE}.sh $i 36 37 if test -f "$i/qemu-retval" && test "`cat $i/qemu-retval`" -ne 0 && test "`cat $i/qemu-retval`" -ne 137 ··· 44 43 then 45 44 echo QEMU killed 46 45 fi 47 - configcheck.sh $i/.config $i/ConfigFragment 46 + configcheck.sh $i/.config $i/ConfigFragment > $T 2>&1 47 + cat $T 48 48 if test -r $i/Make.oldconfig.err 49 49 then 50 50 cat $i/Make.oldconfig.err ··· 57 55 cat $i/Warnings 58 56 fi 59 57 else 60 - if test -f "$i/qemu-cmd" 61 - then 62 - print_bug qemu failed 63 - echo " $i" 64 - elif test -f "$i/buildonly" 58 + if test -f "$i/buildonly" 65 59 then 66 60 echo Build-only run, no boot/test 67 61 configcheck.sh $i/.config $i/ConfigFragment 68 62 parse-build.sh $i/Make.out $configfile 63 + elif test -f "$i/qemu-cmd" 64 + then 65 + print_bug qemu failed 66 + echo " $i" 69 67 else 70 68 print_bug Build failed 71 69 echo " $i" ··· 74 72 done 75 73 if test -f "$rd/kcsan.sum" 76 74 then 77 - if test -s "$rd/kcsan.sum" 75 + if grep -q CONFIG_KCSAN=y $T 76 + then 77 + echo "Compiler or architecture does not support KCSAN!" 78 + echo Did you forget to switch your compiler with '--kmake-arg CC=<cc-that-supports-kcsan>'? 79 + elif test -s "$rd/kcsan.sum" 78 80 then 79 81 echo KCSAN summary in $rd/kcsan.sum 80 82 else
+22 -5
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
··· 124 124 qemu_args=$5 125 125 boot_args=$6 126 126 127 - cd $KVM 128 127 kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` 129 128 if test -z "$TORTURE_BUILDONLY" 130 129 then ··· 140 141 cpu_count=$TORTURE_ALLOTED_CPUS 141 142 fi 142 143 qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`" 144 + qemu_args="`specify_qemu_net "$qemu_args"`" 143 145 144 146 # Generate architecture-specific and interaction-specific qemu arguments 145 147 qemu_args="$qemu_args `identify_qemu_args "$QEMU" "$resdir/console.log"`" ··· 152 152 boot_args="`configfrag_boot_params "$boot_args" "$config_template"`" 153 153 # Generate kernel-version-specific boot parameters 154 154 boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`" 155 + echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd 155 156 156 157 if test -n "$TORTURE_BUILDONLY" 157 158 then ··· 160 159 touch $resdir/buildonly 161 160 exit 0 162 161 fi 162 + 163 + # Decorate qemu-cmd with redirection, backgrounding, and PID capture 164 + sed -e 's/$/ 2>\&1 \&/' < $resdir/qemu-cmd > $T/qemu-cmd 165 + echo 'echo $! > $resdir/qemu_pid' >> $T/qemu-cmd 166 + 167 + # In case qemu refuses to run... 163 168 echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log 164 - echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd 165 - ( $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append "$qemu_append $boot_args" > $resdir/qemu-output 2>&1 & echo $! > $resdir/qemu_pid; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & 169 + 170 + # Attempt to run qemu 171 + ( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & 166 172 commandcompleted=0 167 173 sleep 10 # Give qemu's pid a chance to reach the file 168 174 if test -s "$resdir/qemu_pid" ··· 189 181 kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` 190 182 if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1 191 183 then 192 - if test $kruntime -ge $seconds 184 + if test $kruntime -ge $seconds -o -f "$TORTURE_STOPFILE" 193 185 then 194 186 break; 195 187 fi ··· 218 210 fi 219 211 if test $commandcompleted -eq 0 -a -n "$qemu_pid" 220 212 then 221 - echo Grace period for qemu job at pid $qemu_pid 213 + if ! test -f "$TORTURE_STOPFILE" 214 + then 215 + echo Grace period for qemu job at pid $qemu_pid 216 + fi 222 217 oldline="`tail $resdir/console.log`" 223 218 while : 224 219 do 220 + if test -f "$TORTURE_STOPFILE" 221 + then 222 + echo "PID $qemu_pid killed due to run STOP request" >> $resdir/Warnings 2>&1 223 + kill -KILL $qemu_pid 224 + break 225 + fi 225 226 kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` 226 227 if kill -0 $qemu_pid > /dev/null 2>&1 227 228 then
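The qemu-cmd decoration step in kvm-test-1-run.sh above appends output redirection and backgrounding to each stored command line. A minimal sketch of that sed pass; the qemu invocation here is a made-up stand-in:

```shell
# Sketch of the decoration applied to qemu-cmd: append " 2>&1 &"
# to the stored command line (the command itself is a stand-in).
echo 'qemu-system-x86_64 -nographic -kernel /tmp/bzImage' |
sed -e 's/$/ 2>\&1 \&/'
```

The `\&` escapes are needed because a bare `&` in a sed replacement stands for the whole matched text.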
+51
tools/testing/selftests/rcutorture/bin/kvm-transform.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Transform a qemu-cmd file to allow reuse. 5 + # 6 + # Usage: kvm-transform.sh bzImage console.log < qemu-cmd-in > qemu-cmd-out 7 + # 8 + # bzImage: Kernel and initrd from the same prior kvm.sh run. 9 + # console.log: File into which to place console output. 10 + # 11 + # The original qemu-cmd file is provided on standard input. 12 + # The transformed qemu-cmd file is on standard output. 13 + # The transformation assumes that the qemu command is confined to a 14 + # single line. It also assumes no whitespace in filenames. 15 + # 16 + # Copyright (C) 2020 Facebook, Inc. 17 + # 18 + # Authors: Paul E. McKenney <paulmck@kernel.org> 19 + 20 + image="$1" 21 + if test -z "$image" 22 + then 23 + echo Need kernel image file. 24 + exit 1 25 + fi 26 + consolelog="$2" 27 + if test -z "$consolelog" 28 + then 29 + echo "Need console log file name." 30 + exit 1 31 + fi 32 + 33 + awk -v image="$image" -v consolelog="$consolelog" ' 34 + { 35 + line = ""; 36 + for (i = 1; i <= NF; i++) { 37 + if (line == "") 38 + line = $i; 39 + else 40 + line = line " " $i; 41 + if ($i == "-serial") { 42 + i++; 43 + line = line " file:" consolelog; 44 + } 45 + if ($i == "-kernel") { 46 + i++; 47 + line = line " " image; 48 + } 49 + } 50 + print line; 51 + }'
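The awk substitution in kvm-transform.sh above can be demonstrated inline on a single made-up qemu command line (the paths and binary name are stand-ins, not output from a real run):

```shell
# Demonstration of the kvm-transform.sh substitution: splice a new
# console log after -serial and a new kernel image after -kernel.
echo 'qemu-system-x86_64 -serial file:/old/console.log -kernel /old/bzImage -m 512' |
awk -v image=/new/bzImage -v consolelog=/new/console.log '
{
	line = "";
	for (i = 1; i <= NF; i++) {
		if (line == "")
			line = $i;
		else
			line = line " " $i;
		if ($i == "-serial") {	# skip old console-log token
			i++;
			line = line " file:" consolelog;
		}
		if ($i == "-kernel") {	# skip old kernel-image token
			i++;
			line = line " " image;
		}
	}
	print line;
}'
```

Incrementing `i` inside the loop is what drops the old argument before the replacement is appended.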
+15 -4
tools/testing/selftests/rcutorture/bin/kvm.sh
··· 73 73 while test $# -gt 0 74 74 do 75 75 case "$1" in 76 + --allcpus) 77 + cpus=$TORTURE_ALLOTED_CPUS 78 + max_cpus=$TORTURE_ALLOTED_CPUS 79 + ;; 76 80 --bootargs|--bootarg) 77 81 checkarg --bootargs "(list of kernel boot arguments)" "$#" "$2" '.*' '^--' 78 82 TORTURE_BOOTARGS="$2" ··· 184 180 shift 185 181 ;; 186 182 --torture) 187 - checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\)$' '^--' 183 + checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\|refscale\)$' '^--' 188 184 TORTURE_SUITE=$2 189 185 shift 190 - if test "$TORTURE_SUITE" = rcuperf 186 + if test "$TORTURE_SUITE" = rcuperf || test "$TORTURE_SUITE" = refscale 191 187 then 192 - # If you really want jitter for rcuperf, specify 193 - # it after specifying rcuperf. (But why?) 188 + # If you really want jitter for refscale or 189 + # rcuperf, specify it after specifying the rcuperf 190 + # or the refscale. (But why jitter in these cases?) 194 191 jitter=0 195 192 fi 196 193 ;; ··· 338 333 mkdir -p "$resdir" || : 339 334 fi 340 335 mkdir $resdir/$ds 336 + TORTURE_RESDIR="$resdir/$ds"; export TORTURE_RESDIR 337 + TORTURE_STOPFILE="$resdir/$ds/STOP"; export TORTURE_STOPFILE 341 338 echo Results directory: $resdir/$ds 342 339 echo $scriptname $args 343 340 touch $resdir/$ds/log ··· 504 497 # Tracing: trace_event=rcu:rcu_grace_period,rcu:rcu_future_grace_period,rcu:rcu_grace_period_init,rcu:rcu_nocb_wake,rcu:rcu_preempt_task,rcu:rcu_unlock_preempted_task,rcu:rcu_quiescent_state_report,rcu:rcu_fqs,rcu:rcu_callback,rcu:rcu_kfree_callback,rcu:rcu_batch_start,rcu:rcu_invoke_callback,rcu:rcu_invoke_kfree_callback,rcu:rcu_batch_end,rcu:rcu_torture_read,rcu:rcu_barrier 505 498 # Function-graph tracing: ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop 506 499 # Also --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y" 500 + # Control buffer size: --bootargs trace_buf_size=3k 501 + # Get trace-buffer dumps on all oopses: --bootargs ftrace_dump_on_oops 502 + # Ditto, but dump only the oopsing CPU: --bootargs ftrace_dump_on_oops=orig_cpu 503 + # Heavy-handed way to also dump on warnings: --bootargs panic_on_warn
+18 -9
tools/testing/selftests/rcutorture/bin/parse-console.sh
··· 33 33 fi 34 34 cat /dev/null > $file.diags 35 35 36 - # Check for proper termination, except that rcuperf runs don't indicate this. 37 - if test "$TORTURE_SUITE" != rcuperf 36 + # Check for proper termination, except for rcuperf and refscale. 37 + if test "$TORTURE_SUITE" != rcuperf && test "$TORTURE_SUITE" != refscale 38 38 then 39 39 # check for abject failure 40 40 ··· 44 44 tail -1 | 45 45 awk ' 46 46 { 47 - for (i=NF-8;i<=NF;i++) 47 + normalexit = 1; 48 + for (i=NF-8;i<=NF;i++) { 49 + if (i <= 0 || i !~ /^[0-9]*$/) { 50 + bangstring = $0; 51 + gsub(/^\[[^]]*] /, "", bangstring); 52 + print bangstring; 53 + normalexit = 0; 54 + exit 0; 55 + } 48 56 sum+=$i; 57 + } 49 58 } 50 - END { print sum }'` 51 - print_bug $title FAILURE, $nerrs instances 59 + END { 60 + if (normalexit) 61 + print sum " instances" 62 + }'` 63 + print_bug $title FAILURE, $nerrs 52 64 exit 53 65 fi 54 66 ··· 116 104 fi 117 105 fi | tee -a $file.diags 118 106 119 - egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for' < $file | 120 - grep -v 'ODEBUG: ' | 121 - grep -v 'This means that this is a DEBUG kernel and it is' | 122 - grep -v 'Warning: unable to open an initial console' > $T.diags 107 + console-badness.sh < $file > $T.diags 123 108 if test -s $T.diags 124 109 then 125 110 print_warning "Assertion failure in $file $title"
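The bang-string handling added to parse-console.sh above distinguishes summary lines (trailing fields all numeric, summed into an instance count) from error messages (echoed verbatim, minus the timestamp). A simplified stand-alone rendition; the sample console lines are made up, and the field test is written against the field value `$i`:

```shell
# Simplified rendition of the bang-string check: numeric trailing
# fields are summed, anything else is reported as-is sans timestamp
# (sample console lines are made up).
summarize='
{
	normalexit = 1;
	for (i = NF - 8; i <= NF; i++) {
		if (i <= 0 || $i !~ /^[0-9]*$/) {
			bangstring = $0;
			gsub(/^\[[^]]*] /, "", bangstring);	# strip timestamp
			print bangstring;
			normalexit = 0;
			exit 0;
		}
		sum += $i;
	}
}
END { if (normalexit) print sum " instances" }'
echo '[ 7.2] Reader Pipe: 1 2 3 4 5 6 7 8 9' | awk "$summarize"
echo '[ 9.9] rcu-torture: !!! Writer stall' | awk "$summarize"
```

The `normalexit` flag matters because awk's `exit` still runs the `END` block, which would otherwise emit a bogus zero count for bang-string lines.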
+2
tools/testing/selftests/rcutorture/configs/refscale/CFLIST
··· 1 + NOPREEMPT 2 + PREEMPT
+2
tools/testing/selftests/rcutorture/configs/refscale/CFcommon
··· 1 + CONFIG_RCU_REF_SCALE_TEST=y 2 + CONFIG_PRINTK_TIME=y
+18
tools/testing/selftests/rcutorture/configs/refscale/NOPREEMPT
··· 1 + CONFIG_SMP=y 2 + CONFIG_PREEMPT_NONE=y 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=n 5 + #CHECK#CONFIG_PREEMPT_RCU=n 6 + CONFIG_HZ_PERIODIC=n 7 + CONFIG_NO_HZ_IDLE=y 8 + CONFIG_NO_HZ_FULL=n 9 + CONFIG_RCU_FAST_NO_HZ=n 10 + CONFIG_HOTPLUG_CPU=n 11 + CONFIG_SUSPEND=n 12 + CONFIG_HIBERNATION=n 13 + CONFIG_RCU_NOCB_CPU=n 14 + CONFIG_DEBUG_LOCK_ALLOC=n 15 + CONFIG_PROVE_LOCKING=n 16 + CONFIG_RCU_BOOST=n 17 + CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 18 + CONFIG_RCU_EXPERT=y
+18
tools/testing/selftests/rcutorture/configs/refscale/PREEMPT
··· 1 + CONFIG_SMP=y 2 + CONFIG_PREEMPT_NONE=n 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=y 5 + #CHECK#CONFIG_PREEMPT_RCU=y 6 + CONFIG_HZ_PERIODIC=n 7 + CONFIG_NO_HZ_IDLE=y 8 + CONFIG_NO_HZ_FULL=n 9 + CONFIG_RCU_FAST_NO_HZ=n 10 + CONFIG_HOTPLUG_CPU=n 11 + CONFIG_SUSPEND=n 12 + CONFIG_HIBERNATION=n 13 + CONFIG_RCU_NOCB_CPU=n 14 + CONFIG_DEBUG_LOCK_ALLOC=n 15 + CONFIG_PROVE_LOCKING=n 16 + CONFIG_RCU_BOOST=n 17 + CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 18 + CONFIG_RCU_EXPERT=y
+16
tools/testing/selftests/rcutorture/configs/refscale/ver_functions.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Torture-suite-dependent shell functions for the rest of the scripts. 5 + # 6 + # Copyright (C) IBM Corporation, 2015 7 + # 8 + # Authors: Paul E. McKenney <paulmck@linux.ibm.com> 9 + 10 + # per_version_boot_params bootparam-string config-file seconds 11 + # 12 + # Adds per-version torture-module parameters to kernels supporting them. 13 + per_version_boot_params () { 14 + echo $1 refscale.shutdown=1 \ 15 + refscale.verbose=1 16 + }