Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull v5.10 RCU changes from Paul E. McKenney:

- Debugging for smp_call_function().

- Strict grace periods for KASAN. The point of this series is to find
RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
further disabled by default. Finally, the help text includes
a goodly list of scary caveats.

- New smp_call_function() torture test.

- Torture-test updates.

- Documentation updates.

- Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

+1583 -422
+1 -1
Documentation/RCU/Design/Data-Structures/Data-Structures.rst
··· 963 ``->dynticks_nesting`` field is incremented up from zero, the 964 ``->dynticks_nmi_nesting`` field is set to a large positive number, and 965 whenever the ``->dynticks_nesting`` field is decremented down to zero, 966 - the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that 967 the number of misnested interrupts is not sufficient to overflow the 968 counter, this approach corrects the ``->dynticks_nmi_nesting`` field 969 every time the corresponding CPU enters the idle loop from process
··· 963 ``->dynticks_nesting`` field is incremented up from zero, the 964 ``->dynticks_nmi_nesting`` field is set to a large positive number, and 965 whenever the ``->dynticks_nesting`` field is decremented down to zero, 966 + the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that 967 the number of misnested interrupts is not sufficient to overflow the 968 counter, this approach corrects the ``->dynticks_nmi_nesting`` field 969 every time the corresponding CPU enters the idle loop from process
+2 -2
Documentation/RCU/Design/Requirements/Requirements.rst
··· 2162 this sort of thing. 2163 #. If a CPU is in a portion of the kernel that is absolutely positively 2164 no-joking guaranteed to never execute any RCU read-side critical 2165 - sections, and RCU believes this CPU to to be idle, no problem. This 2166 sort of thing is used by some architectures for light-weight 2167 exception handlers, which can then avoid the overhead of 2168 ``rcu_irq_enter()`` and ``rcu_irq_exit()`` at exception entry and ··· 2431 not have this property, given that any point in the code outside of an 2432 RCU read-side critical section can be a quiescent state. Therefore, 2433 *RCU-sched* was created, which follows “classic” RCU in that an 2434 - RCU-sched grace period waits for for pre-existing interrupt and NMI 2435 handlers. In kernels built with ``CONFIG_PREEMPT=n``, the RCU and 2436 RCU-sched APIs have identical implementations, while kernels built with 2437 ``CONFIG_PREEMPT=y`` provide a separate implementation for each.
··· 2162 this sort of thing. 2163 #. If a CPU is in a portion of the kernel that is absolutely positively 2164 no-joking guaranteed to never execute any RCU read-side critical 2165 + sections, and RCU believes this CPU to be idle, no problem. This 2166 sort of thing is used by some architectures for light-weight 2167 exception handlers, which can then avoid the overhead of 2168 ``rcu_irq_enter()`` and ``rcu_irq_exit()`` at exception entry and ··· 2431 not have this property, given that any point in the code outside of an 2432 RCU read-side critical section can be a quiescent state. Therefore, 2433 *RCU-sched* was created, which follows “classic” RCU in that an 2434 + RCU-sched grace period waits for pre-existing interrupt and NMI 2435 handlers. In kernels built with ``CONFIG_PREEMPT=n``, the RCU and 2436 RCU-sched APIs have identical implementations, while kernels built with 2437 ``CONFIG_PREEMPT=y`` provide a separate implementation for each.
+1 -1
Documentation/RCU/whatisRCU.rst
··· 360 361 There are at least three flavors of RCU usage in the Linux kernel. The diagram 362 above shows the most common one. On the updater side, the rcu_assign_pointer(), 363 - sychronize_rcu() and call_rcu() primitives used are the same for all three 364 flavors. However for protection (on the reader side), the primitives used vary 365 depending on the flavor: 366
··· 360 361 There are at least three flavors of RCU usage in the Linux kernel. The diagram 362 above shows the most common one. On the updater side, the rcu_assign_pointer(), 363 + synchronize_rcu() and call_rcu() primitives used are the same for all three 364 flavors. However for protection (on the reader side), the primitives used vary 365 depending on the flavor: 366
+135 -18
Documentation/admin-guide/kernel-parameters.txt
··· 3070 and gids from such clients. This is intended to ease 3071 migration from NFSv2/v3. 3072 3073 nmi_debug= [KNL,SH] Specify one or more actions to take 3074 when a NMI is triggered. 3075 Format: [state][,regs][,debounce][,die] ··· 4153 This wake_up() will be accompanied by a 4154 WARN_ONCE() splat and an ftrace_dump(). 4155 4156 rcutree.sysrq_rcu= [KNL] 4157 Commandeer a sysrq key to dump out Tree RCU's 4158 rcu_node tree with an eye towards determining 4159 why a new grace period has not yet started. 4160 4161 - rcuperf.gp_async= [KNL] 4162 Measure performance of asynchronous 4163 grace-period primitives such as call_rcu(). 4164 4165 - rcuperf.gp_async_max= [KNL] 4166 Specify the maximum number of outstanding 4167 callbacks per writer thread. When a writer 4168 thread exceeds this limit, it invokes the 4169 corresponding flavor of rcu_barrier() to allow 4170 previously posted callbacks to drain. 4171 4172 - rcuperf.gp_exp= [KNL] 4173 Measure performance of expedited synchronous 4174 grace-period primitives. 4175 4176 - rcuperf.holdoff= [KNL] 4177 Set test-start holdoff period. The purpose of 4178 this parameter is to delay the start of the 4179 test until boot completes in order to avoid 4180 interference. 4181 4182 - rcuperf.kfree_rcu_test= [KNL] 4183 Set to measure performance of kfree_rcu() flooding. 4184 4185 - rcuperf.kfree_nthreads= [KNL] 4186 The number of threads running loops of kfree_rcu(). 4187 4188 - rcuperf.kfree_alloc_num= [KNL] 4189 Number of allocations and frees done in an iteration. 4190 4191 - rcuperf.kfree_loops= [KNL] 4192 - Number of loops doing rcuperf.kfree_alloc_num number 4193 of allocations and frees. 4194 4195 - rcuperf.nreaders= [KNL] 4196 Set number of RCU readers. The value -1 selects 4197 N, where N is the number of CPUs. A value 4198 "n" less than -1 selects N-n+1, where N is again ··· 4210 A value of "n" less than or equal to -N selects 4211 a single reader. 4212 4213 - rcuperf.nwriters= [KNL] 4214 Set number of RCU writers. The values operate 4215 - the same as for rcuperf.nreaders. 4216 N, where N is the number of CPUs 4217 4218 - rcuperf.perf_type= [KNL] 4219 Specify the RCU implementation to test. 4220 4221 - rcuperf.shutdown= [KNL] 4222 Shut the system down after performance tests 4223 complete. This is useful for hands-off automated 4224 testing. 4225 4226 - rcuperf.verbose= [KNL] 4227 Enable additional printk() statements. 4228 4229 - rcuperf.writer_holdoff= [KNL] 4230 Write-side holdoff between grace periods, 4231 in microseconds. The default of zero says 4232 no holdoff. ··· 4278 rcutorture.gp_normal=, and rcutorture.gp_sync= 4279 are zero, rcutorture acts as if is interpreted 4280 they are all non-zero. 4281 4282 rcutorture.n_barrier_cbs= [KNL] 4283 Set callbacks/threads for rcu_barrier() testing. ··· 4512 refscale.shutdown= [KNL] 4513 Shut down the system at the end of the performance 4514 test. This defaults to 1 (shut it down) when 4515 - rcuperf is built into the kernel and to 0 (leave 4516 - it running) when rcuperf is built as a module. 4517 4518 refscale.verbose= [KNL] 4519 Enable additional printk() statements. ··· 4658 and so on. 4659 Format: integer between 0 and 10 4660 Default is 0. 4661 4662 skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate 4663 xtime_lock contention on larger systems, and/or RCU lock
··· 3070 and gids from such clients. This is intended to ease 3071 migration from NFSv2/v3. 3072 3073 + nmi_backtrace.backtrace_idle [KNL] 3074 + Dump stacks even of idle CPUs in response to an 3075 + NMI stack-backtrace request. 3076 + 3077 nmi_debug= [KNL,SH] Specify one or more actions to take 3078 when a NMI is triggered. 3079 Format: [state][,regs][,debounce][,die] ··· 4149 This wake_up() will be accompanied by a 4150 WARN_ONCE() splat and an ftrace_dump(). 4151 4152 + rcutree.rcu_unlock_delay= [KNL] 4153 + In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, 4154 + this specifies an rcu_read_unlock()-time delay 4155 + in microseconds. This defaults to zero. 4156 + Larger delays increase the probability of 4157 + catching RCU pointer leaks, that is, buggy use 4158 + of RCU-protected pointers after the relevant 4159 + rcu_read_unlock() has completed. 4160 + 4161 rcutree.sysrq_rcu= [KNL] 4162 Commandeer a sysrq key to dump out Tree RCU's 4163 rcu_node tree with an eye towards determining 4164 why a new grace period has not yet started. 4165 4166 + rcuscale.gp_async= [KNL] 4167 Measure performance of asynchronous 4168 grace-period primitives such as call_rcu(). 4169 4170 + rcuscale.gp_async_max= [KNL] 4171 Specify the maximum number of outstanding 4172 callbacks per writer thread. When a writer 4173 thread exceeds this limit, it invokes the 4174 corresponding flavor of rcu_barrier() to allow 4175 previously posted callbacks to drain. 4176 4177 + rcuscale.gp_exp= [KNL] 4178 Measure performance of expedited synchronous 4179 grace-period primitives. 4180 4181 + rcuscale.holdoff= [KNL] 4182 Set test-start holdoff period. The purpose of 4183 this parameter is to delay the start of the 4184 test until boot completes in order to avoid 4185 interference. 4186 4187 + rcuscale.kfree_rcu_test= [KNL] 4188 Set to measure performance of kfree_rcu() flooding. 4189 4190 + rcuscale.kfree_nthreads= [KNL] 4191 The number of threads running loops of kfree_rcu(). 4192 4193 + rcuscale.kfree_alloc_num= [KNL] 4194 Number of allocations and frees done in an iteration. 4195 4196 + rcuscale.kfree_loops= [KNL] 4197 + Number of loops doing rcuscale.kfree_alloc_num number 4198 of allocations and frees. 4199 4200 + rcuscale.nreaders= [KNL] 4201 Set number of RCU readers. The value -1 selects 4202 N, where N is the number of CPUs. A value 4203 "n" less than -1 selects N-n+1, where N is again ··· 4197 A value of "n" less than or equal to -N selects 4198 a single reader. 4199 4200 + rcuscale.nwriters= [KNL] 4201 Set number of RCU writers. The values operate 4202 + the same as for rcuscale.nreaders. 4203 N, where N is the number of CPUs 4204 4205 + rcuscale.perf_type= [KNL] 4206 Specify the RCU implementation to test. 4207 4208 + rcuscale.shutdown= [KNL] 4209 Shut the system down after performance tests 4210 complete. This is useful for hands-off automated 4211 testing. 4212 4213 + rcuscale.verbose= [KNL] 4214 Enable additional printk() statements. 4215 4216 + rcuscale.writer_holdoff= [KNL] 4217 Write-side holdoff between grace periods, 4218 in microseconds. The default of zero says 4219 no holdoff. ··· 4265 rcutorture.gp_normal=, and rcutorture.gp_sync= 4266 are zero, rcutorture acts as if is interpreted 4267 they are all non-zero. 4268 + 4269 + rcutorture.irqreader= [KNL] 4270 + Run RCU readers from irq handlers, or, more 4271 + accurately, from a timer handler. Not all RCU 4272 + flavors take kindly to this sort of thing. 4273 + 4274 + rcutorture.leakpointer= [KNL] 4275 + Leak an RCU-protected pointer out of the reader. 
4276 + This can of course result in splats, and is 4277 + intended to test the ability of things like 4278 + CONFIG_RCU_STRICT_GRACE_PERIOD=y to detect 4279 + such leaks. 4280 4281 rcutorture.n_barrier_cbs= [KNL] 4282 Set callbacks/threads for rcu_barrier() testing. ··· 4487 refscale.shutdown= [KNL] 4488 Shut down the system at the end of the performance 4489 test. This defaults to 1 (shut it down) when 4490 + refscale is built into the kernel and to 0 (leave 4491 + it running) when refscale is built as a module. 4492 4493 refscale.verbose= [KNL] 4494 Enable additional printk() statements. ··· 4633 and so on. 4634 Format: integer between 0 and 10 4635 Default is 0. 4636 + 4637 + scftorture.holdoff= [KNL] 4638 + Number of seconds to hold off before starting 4639 + test. Defaults to zero for module insertion and 4640 + to 10 seconds for built-in smp_call_function() 4641 + tests. 4642 + 4643 + scftorture.longwait= [KNL] 4644 + Request ridiculously long waits randomly selected 4645 + up to the chosen limit in seconds. Zero (the 4646 + default) disables this feature. Please note 4647 + that requesting even small non-zero numbers of 4648 + seconds can result in RCU CPU stall warnings, 4649 + softlockup complaints, and so on. 4650 + 4651 + scftorture.nthreads= [KNL] 4652 + Number of kthreads to spawn to invoke the 4653 + smp_call_function() family of functions. 4654 + The default of -1 specifies a number of kthreads 4655 + equal to the number of CPUs. 4656 + 4657 + scftorture.onoff_holdoff= [KNL] 4658 + Number seconds to wait after the start of the 4659 + test before initiating CPU-hotplug operations. 4660 + 4661 + scftorture.onoff_interval= [KNL] 4662 + Number seconds to wait between successive 4663 + CPU-hotplug operations. Specifying zero (which 4664 + is the default) disables CPU-hotplug operations. 4665 + 4666 + scftorture.shutdown_secs= [KNL] 4667 + The number of seconds following the start of the 4668 + test after which to shut down the system. The 4669 + default of zero avoids shutting down the system. 4670 + Non-zero values are useful for automated tests. 4671 + 4672 + scftorture.stat_interval= [KNL] 4673 + The number of seconds between outputting the 4674 + current test statistics to the console. A value 4675 + of zero disables statistics output. 4676 + 4677 + scftorture.stutter_cpus= [KNL] 4678 + The number of jiffies to wait between each change 4679 + to the set of CPUs under test. 4680 + 4681 + scftorture.use_cpus_read_lock= [KNL] 4682 + Use use_cpus_read_lock() instead of the default 4683 + preempt_disable() to disable CPU hotplug 4684 + while invoking one of the smp_call_function*() 4685 + functions. 4686 + 4687 + scftorture.verbose= [KNL] 4688 + Enable additional printk() statements. 4689 + 4690 + scftorture.weight_single= [KNL] 4691 + The probability weighting to use for the 4692 + smp_call_function_single() function with a zero 4693 + "wait" parameter. A value of -1 selects the 4694 + default if all other weights are -1. However, 4695 + if at least one weight has some other value, a 4696 + value of -1 will instead select a weight of zero. 4697 + 4698 + scftorture.weight_single_wait= [KNL] 4699 + The probability weighting to use for the 4700 + smp_call_function_single() function with a 4701 + non-zero "wait" parameter. See weight_single. 4702 + 4703 + scftorture.weight_many= [KNL] 4704 + The probability weighting to use for the 4705 + smp_call_function_many() function with a zero 4706 + "wait" parameter. See weight_single. 
4707 + Note well that setting a high probability for 4708 + this weighting can place serious IPI load 4709 + on the system. 4710 + 4711 + scftorture.weight_many_wait= [KNL] 4712 + The probability weighting to use for the 4713 + smp_call_function_many() function with a 4714 + non-zero "wait" parameter. See weight_single 4715 + and weight_many. 4716 + 4717 + scftorture.weight_all= [KNL] 4718 + The probability weighting to use for the 4719 + smp_call_function_all() function with a zero 4720 + "wait" parameter. See weight_single and 4721 + weight_many. 4722 + 4723 + scftorture.weight_all_wait= [KNL] 4724 + The probability weighting to use for the 4725 + smp_call_function_all() function with a 4726 + non-zero "wait" parameter. See weight_single 4727 + and weight_many. 4728 4729 skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate 4730 xtime_lock contention on larger systems, and/or RCU lock
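Note on the new scftorture.weight_single= and scftorture.weight_single_wait= parameters documented above: they weight the zero-"wait" and blocking forms of smp_call_function_single(). The following is a minimal sketch of those two call shapes, for orientation only; the handler, counter, and target CPU are hypothetical and not part of this series.

  #include <linux/atomic.h>
  #include <linux/smp.h>

  /* Hypothetical IPI handler: runs on the target CPU in interrupt context. */
  static void scf_demo_handler(void *info)
  {
          atomic_inc((atomic_t *)info);
  }

  static void scf_demo(atomic_t *counter)
  {
          /* wait == 0: fire-and-forget; the caller does not block. */
          smp_call_function_single(1, scf_demo_handler, counter, 0);

          /* wait == 1: the caller blocks until the handler has run on CPU 1. */
          smp_call_function_single(1, scf_demo_handler, counter, 1);
  }

The scftorture.weight_many*= and scftorture.weight_all*= parameters weight the corresponding multi-CPU variants in the same way.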
+2 -1
MAINTAINERS
··· 17547 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev 17548 F: Documentation/RCU/torture.rst 17549 F: kernel/locking/locktorture.c 17550 - F: kernel/rcu/rcuperf.c 17551 F: kernel/rcu/rcutorture.c 17552 F: kernel/torture.c 17553 17554 TOSHIBA ACPI EXTRAS DRIVER
··· 17547 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev 17548 F: Documentation/RCU/torture.rst 17549 F: kernel/locking/locktorture.c 17550 + F: kernel/rcu/rcuscale.c 17551 F: kernel/rcu/rcutorture.c 17552 + F: kernel/rcu/refscale.c 17553 F: kernel/torture.c 17554 17555 TOSHIBA ACPI EXTRAS DRIVER
+4 -2
arch/x86/kvm/mmu/page_track.c
··· 229 return; 230 231 idx = srcu_read_lock(&head->track_srcu); 232 - hlist_for_each_entry_rcu(n, &head->track_notifier_list, node) 233 if (n->track_write) 234 n->track_write(vcpu, gpa, new, bytes, n); 235 srcu_read_unlock(&head->track_srcu, idx); ··· 255 return; 256 257 idx = srcu_read_lock(&head->track_srcu); 258 - hlist_for_each_entry_rcu(n, &head->track_notifier_list, node) 259 if (n->track_flush_slot) 260 n->track_flush_slot(kvm, slot, n); 261 srcu_read_unlock(&head->track_srcu, idx);
··· 229 return; 230 231 idx = srcu_read_lock(&head->track_srcu); 232 + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, 233 + srcu_read_lock_held(&head->track_srcu)) 234 if (n->track_write) 235 n->track_write(vcpu, gpa, new, bytes, n); 236 srcu_read_unlock(&head->track_srcu, idx); ··· 254 return; 255 256 idx = srcu_read_lock(&head->track_srcu); 257 + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, 258 + srcu_read_lock_held(&head->track_srcu)) 259 if (n->track_flush_slot) 260 n->track_flush_slot(kvm, slot, n); 261 srcu_read_unlock(&head->track_srcu, idx);
+48
include/linux/rculist.h
··· 63 RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(), \ 64 "RCU-list traversed in non-reader section!"); \ 65 }) 66 #else 67 #define __list_check_rcu(dummy, cond, extra...) \ 68 ({ check_arg_count_one(extra); }) 69 #endif 70 71 /* ··· 394 pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) 395 396 /** 397 * list_entry_lockless - get the struct for this entry 398 * @ptr: the &struct list_head pointer. 399 * @type: the type of the struct this is embedded in. ··· 704 */ 705 #define hlist_for_each_entry_rcu(pos, head, member, cond...) \ 706 for (__list_check_rcu(dummy, ## cond, 0), \ 707 pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\ 708 typeof(*(pos)), member); \ 709 pos; \
··· 63 RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(), \ 64 "RCU-list traversed in non-reader section!"); \ 65 }) 66 + 67 + #define __list_check_srcu(cond) \ 68 + ({ \ 69 + RCU_LOCKDEP_WARN(!(cond), \ 70 + "RCU-list traversed without holding the required lock!");\ 71 + }) 72 #else 73 #define __list_check_rcu(dummy, cond, extra...) \ 74 ({ check_arg_count_one(extra); }) 75 + 76 + #define __list_check_srcu(cond) ({ }) 77 #endif 78 79 /* ··· 386 pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) 387 388 /** 389 + * list_for_each_entry_srcu - iterate over rcu list of given type 390 + * @pos: the type * to use as a loop cursor. 391 + * @head: the head for your list. 392 + * @member: the name of the list_head within the struct. 393 + * @cond: lockdep expression for the lock required to traverse the list. 394 + * 395 + * This list-traversal primitive may safely run concurrently with 396 + * the _rcu list-mutation primitives such as list_add_rcu() 397 + * as long as the traversal is guarded by srcu_read_lock(). 398 + * The lockdep expression srcu_read_lock_held() can be passed as the 399 + * cond argument from read side. 400 + */ 401 + #define list_for_each_entry_srcu(pos, head, member, cond) \ 402 + for (__list_check_srcu(cond), \ 403 + pos = list_entry_rcu((head)->next, typeof(*pos), member); \ 404 + &pos->member != (head); \ 405 + pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) 406 + 407 + /** 408 * list_entry_lockless - get the struct for this entry 409 * @ptr: the &struct list_head pointer. 410 * @type: the type of the struct this is embedded in. ··· 677 */ 678 #define hlist_for_each_entry_rcu(pos, head, member, cond...) \ 679 for (__list_check_rcu(dummy, ## cond, 0), \ 680 + pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\ 681 + typeof(*(pos)), member); \ 682 + pos; \ 683 + pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\ 684 + &(pos)->member)), typeof(*(pos)), member)) 685 + 686 + /** 687 + * hlist_for_each_entry_srcu - iterate over rcu list of given type 688 + * @pos: the type * to use as a loop cursor. 689 + * @head: the head for your list. 690 + * @member: the name of the hlist_node within the struct. 691 + * @cond: lockdep expression for the lock required to traverse the list. 692 + * 693 + * This list-traversal primitive may safely run concurrently with 694 + * the _rcu list-mutation primitives such as hlist_add_head_rcu() 695 + * as long as the traversal is guarded by srcu_read_lock(). 696 + * The lockdep expression srcu_read_lock_held() can be passed as the 697 + * cond argument from read side. 698 + */ 699 + #define hlist_for_each_entry_srcu(pos, head, member, cond) \ 700 + for (__list_check_srcu(cond), \ 701 pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\ 702 typeof(*(pos)), member); \ 703 pos; \
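The new list_for_each_entry_srcu() and hlist_for_each_entry_srcu() iterators added here take an explicit lockdep expression naming the srcu_struct that guards the traversal; the page_track.c hunk above is the in-tree conversion. A minimal usage sketch follows, in which the srcu_struct, list head, and node type are hypothetical.

  #include <linux/rculist.h>
  #include <linux/srcu.h>

  struct demo_node {
          int val;
          struct hlist_node node;
  };

  DEFINE_STATIC_SRCU(demo_srcu);
  static HLIST_HEAD(demo_head);

  /* Reader: traversal is legal only within the matching SRCU read-side section. */
  static int demo_sum(void)
  {
          struct demo_node *p;
          int idx, sum = 0;

          idx = srcu_read_lock(&demo_srcu);
          hlist_for_each_entry_srcu(p, &demo_head, node,
                                    srcu_read_lock_held(&demo_srcu))
                  sum += p->val;
          srcu_read_unlock(&demo_srcu, idx);
          return sum;
  }

When list-traversal checking is enabled, a cond expression that does not hold triggers the new "RCU-list traversed without holding the required lock!" splat shown in the hunk above.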
+13 -6
include/linux/rcupdate.h
··· 55 56 #else /* #ifdef CONFIG_PREEMPT_RCU */ 57 58 static inline void __rcu_read_lock(void) 59 { 60 preempt_disable(); ··· 69 static inline void __rcu_read_unlock(void) 70 { 71 preempt_enable(); 72 } 73 74 static inline int rcu_preempt_depth(void) ··· 716 "rcu_read_lock_bh() used illegally while idle"); 717 } 718 719 - /* 720 - * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section 721 * 722 * See rcu_read_lock_bh() for more information. 723 */ ··· 758 __acquire(RCU_SCHED); 759 } 760 761 - /* 762 - * rcu_read_unlock_sched - marks the end of a RCU-classic critical section 763 * 764 - * See rcu_read_lock_sched for more information. 765 */ 766 static inline void rcu_read_unlock_sched(void) 767 { ··· 952 } 953 954 /** 955 - * rcu_head_after_call_rcu - Has this rcu_head been passed to call_rcu()? 956 * @rhp: The rcu_head structure to test. 957 * @f: The function passed to call_rcu() along with @rhp. 958 *
··· 55 56 #else /* #ifdef CONFIG_PREEMPT_RCU */ 57 58 + #ifdef CONFIG_TINY_RCU 59 + #define rcu_read_unlock_strict() do { } while (0) 60 + #else 61 + void rcu_read_unlock_strict(void); 62 + #endif 63 + 64 static inline void __rcu_read_lock(void) 65 { 66 preempt_disable(); ··· 63 static inline void __rcu_read_unlock(void) 64 { 65 preempt_enable(); 66 + rcu_read_unlock_strict(); 67 } 68 69 static inline int rcu_preempt_depth(void) ··· 709 "rcu_read_lock_bh() used illegally while idle"); 710 } 711 712 + /** 713 + * rcu_read_unlock_bh() - marks the end of a softirq-only RCU critical section 714 * 715 * See rcu_read_lock_bh() for more information. 716 */ ··· 751 __acquire(RCU_SCHED); 752 } 753 754 + /** 755 + * rcu_read_unlock_sched() - marks the end of a RCU-classic critical section 756 * 757 + * See rcu_read_lock_sched() for more information. 758 */ 759 static inline void rcu_read_unlock_sched(void) 760 { ··· 945 } 946 947 /** 948 + * rcu_head_after_call_rcu() - Has this rcu_head been passed to call_rcu()? 949 * @rhp: The rcu_head structure to test. 950 * @f: The function passed to call_rcu() along with @rhp. 951 *
-1
include/linux/rcutiny.h
··· 103 static inline void rcu_end_inkernel_boot(void) { } 104 static inline bool rcu_inkernel_boot_has_ended(void) { return true; } 105 static inline bool rcu_is_watching(void) { return true; } 106 - static inline bool __rcu_is_watching(void) { return true; } 107 static inline void rcu_momentary_dyntick_idle(void) { } 108 static inline void kfree_rcu_scheduler_running(void) { } 109 static inline bool rcu_gp_might_be_stalled(void) { return false; }
··· 103 static inline void rcu_end_inkernel_boot(void) { } 104 static inline bool rcu_inkernel_boot_has_ended(void) { return true; } 105 static inline bool rcu_is_watching(void) { return true; } 106 static inline void rcu_momentary_dyntick_idle(void) { } 107 static inline void kfree_rcu_scheduler_running(void) { } 108 static inline bool rcu_gp_might_be_stalled(void) { return false; }
-1
include/linux/rcutree.h
··· 64 void rcu_end_inkernel_boot(void); 65 bool rcu_inkernel_boot_has_ended(void); 66 bool rcu_is_watching(void); 67 - bool __rcu_is_watching(void); 68 #ifndef CONFIG_PREEMPTION 69 void rcu_all_qs(void); 70 #endif
··· 64 void rcu_end_inkernel_boot(void); 65 bool rcu_inkernel_boot_has_ended(void); 66 bool rcu_is_watching(void); 67 #ifndef CONFIG_PREEMPTION 68 void rcu_all_qs(void); 69 #endif
+3
include/linux/smp.h
··· 26 struct { 27 struct llist_node llist; 28 unsigned int flags; 29 }; 30 }; 31 smp_call_func_t func;
··· 26 struct { 27 struct llist_node llist; 28 unsigned int flags; 29 + #ifdef CONFIG_64BIT 30 + u16 src, dst; 31 + #endif 32 }; 33 }; 34 smp_call_func_t func;
+3
include/linux/smp_types.h
··· 61 unsigned int u_flags; 62 atomic_t a_flags; 63 }; 64 }; 65 66 #endif /* __LINUX_SMP_TYPES_H */
··· 61 unsigned int u_flags; 62 atomic_t a_flags; 63 }; 64 + #ifdef CONFIG_64BIT 65 + u16 src, dst; 66 + #endif 67 }; 68 69 #endif /* __LINUX_SMP_TYPES_H */
+27 -27
include/trace/events/rcu.h
··· 74 75 TP_STRUCT__entry( 76 __field(const char *, rcuname) 77 - __field(unsigned long, gp_seq) 78 __field(const char *, gpevent) 79 ), 80 81 TP_fast_assign( 82 __entry->rcuname = rcuname; 83 - __entry->gp_seq = gp_seq; 84 __entry->gpevent = gpevent; 85 ), 86 87 - TP_printk("%s %lu %s", 88 __entry->rcuname, __entry->gp_seq, __entry->gpevent) 89 ); 90 ··· 114 115 TP_STRUCT__entry( 116 __field(const char *, rcuname) 117 - __field(unsigned long, gp_seq) 118 - __field(unsigned long, gp_seq_req) 119 __field(u8, level) 120 __field(int, grplo) 121 __field(int, grphi) ··· 124 125 TP_fast_assign( 126 __entry->rcuname = rcuname; 127 - __entry->gp_seq = gp_seq; 128 - __entry->gp_seq_req = gp_seq_req; 129 __entry->level = level; 130 __entry->grplo = grplo; 131 __entry->grphi = grphi; 132 __entry->gpevent = gpevent; 133 ), 134 135 - TP_printk("%s %lu %lu %u %d %d %s", 136 - __entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level, 137 __entry->grplo, __entry->grphi, __entry->gpevent) 138 ); 139 ··· 153 154 TP_STRUCT__entry( 155 __field(const char *, rcuname) 156 - __field(unsigned long, gp_seq) 157 __field(u8, level) 158 __field(int, grplo) 159 __field(int, grphi) ··· 162 163 TP_fast_assign( 164 __entry->rcuname = rcuname; 165 - __entry->gp_seq = gp_seq; 166 __entry->level = level; 167 __entry->grplo = grplo; 168 __entry->grphi = grphi; 169 __entry->qsmask = qsmask; 170 ), 171 172 - TP_printk("%s %lu %u %d %d %lx", 173 __entry->rcuname, __entry->gp_seq, __entry->level, 174 __entry->grplo, __entry->grphi, __entry->qsmask) 175 ); ··· 197 198 TP_STRUCT__entry( 199 __field(const char *, rcuname) 200 - __field(unsigned long, gpseq) 201 __field(const char *, gpevent) 202 ), 203 204 TP_fast_assign( 205 __entry->rcuname = rcuname; 206 - __entry->gpseq = gpseq; 207 __entry->gpevent = gpevent; 208 ), 209 210 - TP_printk("%s %lu %s", 211 __entry->rcuname, __entry->gpseq, __entry->gpevent) 212 ); 213 ··· 316 317 TP_STRUCT__entry( 318 __field(const char *, rcuname) 319 - __field(unsigned long, gp_seq) 320 __field(int, pid) 321 ), 322 323 TP_fast_assign( 324 __entry->rcuname = rcuname; 325 - __entry->gp_seq = gp_seq; 326 __entry->pid = pid; 327 ), 328 329 - TP_printk("%s %lu %d", 330 __entry->rcuname, __entry->gp_seq, __entry->pid) 331 ); 332 ··· 343 344 TP_STRUCT__entry( 345 __field(const char *, rcuname) 346 - __field(unsigned long, gp_seq) 347 __field(int, pid) 348 ), 349 350 TP_fast_assign( 351 __entry->rcuname = rcuname; 352 - __entry->gp_seq = gp_seq; 353 __entry->pid = pid; 354 ), 355 356 - TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid) 357 ); 358 359 /* ··· 374 375 TP_STRUCT__entry( 376 __field(const char *, rcuname) 377 - __field(unsigned long, gp_seq) 378 __field(unsigned long, mask) 379 __field(unsigned long, qsmask) 380 __field(u8, level) ··· 385 386 TP_fast_assign( 387 __entry->rcuname = rcuname; 388 - __entry->gp_seq = gp_seq; 389 __entry->mask = mask; 390 __entry->qsmask = qsmask; 391 __entry->level = level; ··· 394 __entry->gp_tasks = gp_tasks; 395 ), 396 397 - TP_printk("%s %lu %lx>%lx %u %d %d %u", 398 __entry->rcuname, __entry->gp_seq, 399 __entry->mask, __entry->qsmask, __entry->level, 400 __entry->grplo, __entry->grphi, __entry->gp_tasks) ··· 415 416 TP_STRUCT__entry( 417 __field(const char *, rcuname) 418 - __field(unsigned long, gp_seq) 419 __field(int, cpu) 420 __field(const char *, qsevent) 421 ), 422 423 TP_fast_assign( 424 __entry->rcuname = rcuname; 425 - __entry->gp_seq = gp_seq; 426 __entry->cpu = cpu; 427 __entry->qsevent = qsevent; 428 ), 429 
430 - TP_printk("%s %lu %d %s", 431 __entry->rcuname, __entry->gp_seq, 432 __entry->cpu, __entry->qsevent) 433 );
··· 74 75 TP_STRUCT__entry( 76 __field(const char *, rcuname) 77 + __field(long, gp_seq) 78 __field(const char *, gpevent) 79 ), 80 81 TP_fast_assign( 82 __entry->rcuname = rcuname; 83 + __entry->gp_seq = (long)gp_seq; 84 __entry->gpevent = gpevent; 85 ), 86 87 + TP_printk("%s %ld %s", 88 __entry->rcuname, __entry->gp_seq, __entry->gpevent) 89 ); 90 ··· 114 115 TP_STRUCT__entry( 116 __field(const char *, rcuname) 117 + __field(long, gp_seq) 118 + __field(long, gp_seq_req) 119 __field(u8, level) 120 __field(int, grplo) 121 __field(int, grphi) ··· 124 125 TP_fast_assign( 126 __entry->rcuname = rcuname; 127 + __entry->gp_seq = (long)gp_seq; 128 + __entry->gp_seq_req = (long)gp_seq_req; 129 __entry->level = level; 130 __entry->grplo = grplo; 131 __entry->grphi = grphi; 132 __entry->gpevent = gpevent; 133 ), 134 135 + TP_printk("%s %ld %ld %u %d %d %s", 136 + __entry->rcuname, (long)__entry->gp_seq, (long)__entry->gp_seq_req, __entry->level, 137 __entry->grplo, __entry->grphi, __entry->gpevent) 138 ); 139 ··· 153 154 TP_STRUCT__entry( 155 __field(const char *, rcuname) 156 + __field(long, gp_seq) 157 __field(u8, level) 158 __field(int, grplo) 159 __field(int, grphi) ··· 162 163 TP_fast_assign( 164 __entry->rcuname = rcuname; 165 + __entry->gp_seq = (long)gp_seq; 166 __entry->level = level; 167 __entry->grplo = grplo; 168 __entry->grphi = grphi; 169 __entry->qsmask = qsmask; 170 ), 171 172 + TP_printk("%s %ld %u %d %d %lx", 173 __entry->rcuname, __entry->gp_seq, __entry->level, 174 __entry->grplo, __entry->grphi, __entry->qsmask) 175 ); ··· 197 198 TP_STRUCT__entry( 199 __field(const char *, rcuname) 200 + __field(long, gpseq) 201 __field(const char *, gpevent) 202 ), 203 204 TP_fast_assign( 205 __entry->rcuname = rcuname; 206 + __entry->gpseq = (long)gpseq; 207 __entry->gpevent = gpevent; 208 ), 209 210 + TP_printk("%s %ld %s", 211 __entry->rcuname, __entry->gpseq, __entry->gpevent) 212 ); 213 ··· 316 317 TP_STRUCT__entry( 318 __field(const char *, rcuname) 319 + __field(long, gp_seq) 320 __field(int, pid) 321 ), 322 323 TP_fast_assign( 324 __entry->rcuname = rcuname; 325 + __entry->gp_seq = (long)gp_seq; 326 __entry->pid = pid; 327 ), 328 329 + TP_printk("%s %ld %d", 330 __entry->rcuname, __entry->gp_seq, __entry->pid) 331 ); 332 ··· 343 344 TP_STRUCT__entry( 345 __field(const char *, rcuname) 346 + __field(long, gp_seq) 347 __field(int, pid) 348 ), 349 350 TP_fast_assign( 351 __entry->rcuname = rcuname; 352 + __entry->gp_seq = (long)gp_seq; 353 __entry->pid = pid; 354 ), 355 356 + TP_printk("%s %ld %d", __entry->rcuname, __entry->gp_seq, __entry->pid) 357 ); 358 359 /* ··· 374 375 TP_STRUCT__entry( 376 __field(const char *, rcuname) 377 + __field(long, gp_seq) 378 __field(unsigned long, mask) 379 __field(unsigned long, qsmask) 380 __field(u8, level) ··· 385 386 TP_fast_assign( 387 __entry->rcuname = rcuname; 388 + __entry->gp_seq = (long)gp_seq; 389 __entry->mask = mask; 390 __entry->qsmask = qsmask; 391 __entry->level = level; ··· 394 __entry->gp_tasks = gp_tasks; 395 ), 396 397 + TP_printk("%s %ld %lx>%lx %u %d %d %u", 398 __entry->rcuname, __entry->gp_seq, 399 __entry->mask, __entry->qsmask, __entry->level, 400 __entry->grplo, __entry->grphi, __entry->gp_tasks) ··· 415 416 TP_STRUCT__entry( 417 __field(const char *, rcuname) 418 + __field(long, gp_seq) 419 __field(int, cpu) 420 __field(const char *, qsevent) 421 ), 422 423 TP_fast_assign( 424 __entry->rcuname = rcuname; 425 + __entry->gp_seq = (long)gp_seq; 426 __entry->cpu = cpu; 427 __entry->qsevent = qsevent; 428 ), 429 430 + 
TP_printk("%s %ld %d %s", 431 __entry->rcuname, __entry->gp_seq, 432 __entry->cpu, __entry->qsevent) 433 );
+2
kernel/Makefile
··· 133 KCSAN_SANITIZE_stackleak.o := n 134 KCOV_INSTRUMENT_stackleak.o := n 135 136 $(obj)/configs.o: $(obj)/config_data.gz 137 138 targets += config_data.gz
··· 133 KCSAN_SANITIZE_stackleak.o := n 134 KCOV_INSTRUMENT_stackleak.o := n 135 136 + obj-$(CONFIG_SCF_TORTURE_TEST) += scftorture.o 137 + 138 $(obj)/configs.o: $(obj)/config_data.gz 139 140 targets += config_data.gz
+1 -1
kernel/entry/common.c
··· 304 * terminate a grace period, if and only if the timer interrupt is 305 * not nested into another interrupt. 306 * 307 - * Checking for __rcu_is_watching() here would prevent the nesting 308 * interrupt to invoke rcu_irq_enter(). If that nested interrupt is 309 * the tick then rcu_flavor_sched_clock_irq() would wrongfully 310 * assume that it is the first interupt and eventually claim
··· 304 * terminate a grace period, if and only if the timer interrupt is 305 * not nested into another interrupt. 306 * 307 + * Checking for rcu_is_watching() here would prevent the nesting 308 * interrupt to invoke rcu_irq_enter(). If that nested interrupt is 309 * the tick then rcu_flavor_sched_clock_irq() would wrongfully 310 * assume that it is the first interupt and eventually claim
+1 -1
kernel/locking/locktorture.c
··· 566 #include <linux/percpu-rwsem.h> 567 static struct percpu_rw_semaphore pcpu_rwsem; 568 569 - void torture_percpu_rwsem_init(void) 570 { 571 BUG_ON(percpu_init_rwsem(&pcpu_rwsem)); 572 }
··· 566 #include <linux/percpu-rwsem.h> 567 static struct percpu_rw_semaphore pcpu_rwsem; 568 569 + static void torture_percpu_rwsem_init(void) 570 { 571 BUG_ON(percpu_init_rwsem(&pcpu_rwsem)); 572 }
+5 -3
kernel/rcu/Kconfig
··· 135 136 config RCU_FANOUT_LEAF 137 int "Tree-based hierarchical RCU leaf-level fanout value" 138 - range 2 64 if 64BIT 139 - range 2 32 if !64BIT 140 depends on TREE_RCU && RCU_EXPERT 141 - default 16 142 help 143 This option controls the leaf-level fanout of hierarchical 144 implementations of RCU, and allows trading off cache misses
··· 135 136 config RCU_FANOUT_LEAF 137 int "Tree-based hierarchical RCU leaf-level fanout value" 138 + range 2 64 if 64BIT && !RCU_STRICT_GRACE_PERIOD 139 + range 2 32 if !64BIT && !RCU_STRICT_GRACE_PERIOD 140 + range 2 3 if RCU_STRICT_GRACE_PERIOD 141 depends on TREE_RCU && RCU_EXPERT 142 + default 16 if !RCU_STRICT_GRACE_PERIOD 143 + default 2 if RCU_STRICT_GRACE_PERIOD 144 help 145 This option controls the leaf-level fanout of hierarchical 146 implementations of RCU, and allows trading off cache misses
+16 -1
kernel/rcu/Kconfig.debug
··· 23 tristate 24 default n 25 26 - config RCU_PERF_TEST 27 tristate "performance tests for RCU" 28 depends on DEBUG_KERNEL 29 select TORTURE_TEST ··· 113 114 Say N here if you need ultimate kernel/user switch latencies 115 Say Y if you are unsure 116 117 endmenu # "RCU Debugging"
··· 23 tristate 24 default n 25 26 + config RCU_SCALE_TEST 27 tristate "performance tests for RCU" 28 depends on DEBUG_KERNEL 29 select TORTURE_TEST ··· 113 114 Say N here if you need ultimate kernel/user switch latencies 115 Say Y if you are unsure 116 + 117 + config RCU_STRICT_GRACE_PERIOD 118 + bool "Provide debug RCU implementation with short grace periods" 119 + depends on DEBUG_KERNEL && RCU_EXPERT 120 + default n 121 + select PREEMPT_COUNT if PREEMPT=n 122 + help 123 + Select this option to build an RCU variant that is strict about 124 + grace periods, making them as short as it can. This limits 125 + scalability, destroys real-time response, degrades battery 126 + lifetime and kills performance. Don't try this on large 127 + machines, as in systems with more than about 10 or 20 CPUs. 128 + But in conjunction with tools like KASAN, it can be helpful 129 + when looking for certain types of RCU usage bugs, for example, 130 + too-short RCU read-side critical sections. 131 132 endmenu # "RCU Debugging"
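For context, the class of bug that RCU_STRICT_GRACE_PERIOD (together with KASAN) and the new rcutorture.leakpointer= parameter target is use of an RCU-protected pointer after the enclosing rcu_read_unlock(). A hedged sketch of that pattern; the pointer and structure below are hypothetical.

  #include <linux/rcupdate.h>

  struct demo { int val; };
  static struct demo __rcu *demo_gp;

  static int buggy_reader(void)
  {
          struct demo *p;

          rcu_read_lock();
          p = rcu_dereference(demo_gp);
          rcu_read_unlock();

          /* BUG: a concurrent updater may already have freed *p. */
          return p ? p->val : 0;
  }

Shorter grace periods make it more likely that the freeing actually happens in the window between rcu_read_unlock() and the buggy dereference, at which point KASAN flags the access.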
+1 -1
kernel/rcu/Makefile
··· 11 obj-$(CONFIG_TREE_SRCU) += srcutree.o 12 obj-$(CONFIG_TINY_SRCU) += srcutiny.o 13 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o 14 - obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o 15 obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o 16 obj-$(CONFIG_TREE_RCU) += tree.o 17 obj-$(CONFIG_TINY_RCU) += tiny.o
··· 11 obj-$(CONFIG_TREE_SRCU) += srcutree.o 12 obj-$(CONFIG_TINY_SRCU) += srcutiny.o 13 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o 14 + obj-$(CONFIG_RCU_SCALE_TEST) += rcuscale.o 15 obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o 16 obj-$(CONFIG_TREE_RCU) += tree.o 17 obj-$(CONFIG_TINY_RCU) += tiny.o
+9 -1
kernel/rcu/rcu_segcblist.c
··· 475 * Also advance to the oldest segment of callbacks whose 476 * ->gp_seq[] completion is at or after that passed in via "seq", 477 * skipping any empty segments. 478 */ 479 - if (++i >= RCU_NEXT_TAIL) 480 return false; 481 482 /*
··· 475 * Also advance to the oldest segment of callbacks whose 476 * ->gp_seq[] completion is at or after that passed in via "seq", 477 * skipping any empty segments. 478 + * 479 + * Note that segment "i" (and any lower-numbered segments 480 + * containing older callbacks) will be unaffected, and their 481 + * grace-period numbers remain unchanged. For example, if i == 482 + * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched. 483 + * Instead, the CBs in NEXT_TAIL will be merged with those in 484 + * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL 485 + * would be updated. NEXT_TAIL would then be empty. 486 */ 487 + if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL) 488 return false; 489 490 /*
+165 -165
kernel/rcu/rcuperf.c kernel/rcu/rcuscale.c
··· 1 // SPDX-License-Identifier: GPL-2.0+ 2 /* 3 - * Read-Copy Update module-based performance-test facility 4 * 5 * Copyright (C) IBM Corporation, 2015 6 * ··· 44 MODULE_LICENSE("GPL"); 45 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>"); 46 47 - #define PERF_FLAG "-perf:" 48 - #define PERFOUT_STRING(s) \ 49 - pr_alert("%s" PERF_FLAG " %s\n", perf_type, s) 50 - #define VERBOSE_PERFOUT_STRING(s) \ 51 - do { if (verbose) pr_alert("%s" PERF_FLAG " %s\n", perf_type, s); } while (0) 52 - #define VERBOSE_PERFOUT_ERRSTRING(s) \ 53 - do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0) 54 55 /* 56 * The intended use cases for the nreaders and nwriters module parameters ··· 61 * nr_cpus for a mixed reader/writer test. 62 * 63 * 2. Specify the nr_cpus kernel boot parameter, but set 64 - * rcuperf.nreaders to zero. This will set nwriters to the 65 * value specified by nr_cpus for an update-only test. 66 * 67 * 3. Specify the nr_cpus kernel boot parameter, but set 68 - * rcuperf.nwriters to zero. This will set nreaders to the 69 * value specified by nr_cpus for a read-only test. 70 * 71 * Various other use cases may of course be specified. 72 * 73 * Note that this test's readers are intended only as a test load for 74 - * the writers. The reader performance statistics will be overly 75 * pessimistic due to the per-critical-section interrupt disabling, 76 * test-end checks, and the pair of calls through pointers. 77 */ 78 79 #ifdef MODULE 80 - # define RCUPERF_SHUTDOWN 0 81 #else 82 - # define RCUPERF_SHUTDOWN 1 83 #endif 84 85 torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives"); ··· 88 torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); 89 torture_param(int, nreaders, -1, "Number of RCU reader threads"); 90 torture_param(int, nwriters, -1, "Number of RCU updater threads"); 91 - torture_param(bool, shutdown, RCUPERF_SHUTDOWN, 92 - "Shutdown at end of performance tests."); 93 torture_param(int, verbose, 1, "Enable verbose debugging printk()s"); 94 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable"); 95 - torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?"); 96 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate."); 97 98 - static char *perf_type = "rcu"; 99 - module_param(perf_type, charp, 0444); 100 - MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, srcu, ...)"); 101 102 static int nrealreaders; 103 static int nrealwriters; ··· 107 108 static u64 **writer_durations; 109 static int *writer_n_durations; 110 - static atomic_t n_rcu_perf_reader_started; 111 - static atomic_t n_rcu_perf_writer_started; 112 - static atomic_t n_rcu_perf_writer_finished; 113 static wait_queue_head_t shutdown_wq; 114 - static u64 t_rcu_perf_writer_started; 115 - static u64 t_rcu_perf_writer_finished; 116 static unsigned long b_rcu_gp_test_started; 117 static unsigned long b_rcu_gp_test_finished; 118 static DEFINE_PER_CPU(atomic_t, n_async_inflight); ··· 124 * Operations vector for selecting different types of tests. 125 */ 126 127 - struct rcu_perf_ops { 128 int ptype; 129 void (*init)(void); 130 void (*cleanup)(void); ··· 140 const char *name; 141 }; 142 143 - static struct rcu_perf_ops *cur_ops; 144 145 /* 146 - * Definitions for rcu perf testing. 
147 */ 148 149 - static int rcu_perf_read_lock(void) __acquires(RCU) 150 { 151 rcu_read_lock(); 152 return 0; 153 } 154 155 - static void rcu_perf_read_unlock(int idx) __releases(RCU) 156 { 157 rcu_read_unlock(); 158 } ··· 162 return 0; 163 } 164 165 - static void rcu_sync_perf_init(void) 166 { 167 } 168 169 - static struct rcu_perf_ops rcu_ops = { 170 .ptype = RCU_FLAVOR, 171 - .init = rcu_sync_perf_init, 172 - .readlock = rcu_perf_read_lock, 173 - .readunlock = rcu_perf_read_unlock, 174 .get_gp_seq = rcu_get_gp_seq, 175 .gp_diff = rcu_seq_diff, 176 .exp_completed = rcu_exp_batches_completed, ··· 182 }; 183 184 /* 185 - * Definitions for srcu perf testing. 186 */ 187 188 - DEFINE_STATIC_SRCU(srcu_ctl_perf); 189 - static struct srcu_struct *srcu_ctlp = &srcu_ctl_perf; 190 191 - static int srcu_perf_read_lock(void) __acquires(srcu_ctlp) 192 { 193 return srcu_read_lock(srcu_ctlp); 194 } 195 196 - static void srcu_perf_read_unlock(int idx) __releases(srcu_ctlp) 197 { 198 srcu_read_unlock(srcu_ctlp, idx); 199 } 200 201 - static unsigned long srcu_perf_completed(void) 202 { 203 return srcu_batches_completed(srcu_ctlp); 204 } ··· 213 srcu_barrier(srcu_ctlp); 214 } 215 216 - static void srcu_perf_synchronize(void) 217 { 218 synchronize_srcu(srcu_ctlp); 219 } 220 221 - static void srcu_perf_synchronize_expedited(void) 222 { 223 synchronize_srcu_expedited(srcu_ctlp); 224 } 225 226 - static struct rcu_perf_ops srcu_ops = { 227 .ptype = SRCU_FLAVOR, 228 - .init = rcu_sync_perf_init, 229 - .readlock = srcu_perf_read_lock, 230 - .readunlock = srcu_perf_read_unlock, 231 - .get_gp_seq = srcu_perf_completed, 232 .gp_diff = rcu_seq_diff, 233 - .exp_completed = srcu_perf_completed, 234 .async = srcu_call_rcu, 235 .gp_barrier = srcu_rcu_barrier, 236 - .sync = srcu_perf_synchronize, 237 - .exp_sync = srcu_perf_synchronize_expedited, 238 .name = "srcu" 239 }; 240 241 static struct srcu_struct srcud; 242 243 - static void srcu_sync_perf_init(void) 244 { 245 srcu_ctlp = &srcud; 246 init_srcu_struct(srcu_ctlp); 247 } 248 249 - static void srcu_sync_perf_cleanup(void) 250 { 251 cleanup_srcu_struct(srcu_ctlp); 252 } 253 254 - static struct rcu_perf_ops srcud_ops = { 255 .ptype = SRCU_FLAVOR, 256 - .init = srcu_sync_perf_init, 257 - .cleanup = srcu_sync_perf_cleanup, 258 - .readlock = srcu_perf_read_lock, 259 - .readunlock = srcu_perf_read_unlock, 260 - .get_gp_seq = srcu_perf_completed, 261 .gp_diff = rcu_seq_diff, 262 - .exp_completed = srcu_perf_completed, 263 .async = srcu_call_rcu, 264 .gp_barrier = srcu_rcu_barrier, 265 - .sync = srcu_perf_synchronize, 266 - .exp_sync = srcu_perf_synchronize_expedited, 267 .name = "srcud" 268 }; 269 270 /* 271 - * Definitions for RCU-tasks perf testing. 272 */ 273 274 - static int tasks_perf_read_lock(void) 275 { 276 return 0; 277 } 278 279 - static void tasks_perf_read_unlock(int idx) 280 { 281 } 282 283 - static struct rcu_perf_ops tasks_ops = { 284 .ptype = RCU_TASKS_FLAVOR, 285 - .init = rcu_sync_perf_init, 286 - .readlock = tasks_perf_read_lock, 287 - .readunlock = tasks_perf_read_unlock, 288 .get_gp_seq = rcu_no_completed, 289 .gp_diff = rcu_seq_diff, 290 .async = call_rcu_tasks, ··· 294 .name = "tasks" 295 }; 296 297 - static unsigned long rcuperf_seq_diff(unsigned long new, unsigned long old) 298 { 299 if (!cur_ops->gp_diff) 300 return new - old; ··· 302 } 303 304 /* 305 - * If performance tests complete, wait for shutdown to commence. 
306 */ 307 - static void rcu_perf_wait_shutdown(void) 308 { 309 cond_resched_tasks_rcu_qs(); 310 - if (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters) 311 return; 312 while (!torture_must_stop()) 313 schedule_timeout_uninterruptible(1); 314 } 315 316 /* 317 - * RCU perf reader kthread. Repeatedly does empty RCU read-side critical 318 - * section, minimizing update-side interference. However, the point of 319 - * this test is not to evaluate reader performance, but instead to serve 320 - * as a test load for update-side performance testing. 321 */ 322 static int 323 - rcu_perf_reader(void *arg) 324 { 325 unsigned long flags; 326 int idx; 327 long me = (long)arg; 328 329 - VERBOSE_PERFOUT_STRING("rcu_perf_reader task started"); 330 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 331 set_user_nice(current, MAX_NICE); 332 - atomic_inc(&n_rcu_perf_reader_started); 333 334 do { 335 local_irq_save(flags); 336 idx = cur_ops->readlock(); 337 cur_ops->readunlock(idx); 338 local_irq_restore(flags); 339 - rcu_perf_wait_shutdown(); 340 } while (!torture_must_stop()); 341 - torture_kthread_stopping("rcu_perf_reader"); 342 return 0; 343 } 344 345 /* 346 - * Callback function for asynchronous grace periods from rcu_perf_writer(). 347 */ 348 - static void rcu_perf_async_cb(struct rcu_head *rhp) 349 { 350 atomic_dec(this_cpu_ptr(&n_async_inflight)); 351 kfree(rhp); 352 } 353 354 /* 355 - * RCU perf writer kthread. Repeatedly does a grace period. 356 */ 357 static int 358 - rcu_perf_writer(void *arg) 359 { 360 int i = 0; 361 int i_max; ··· 366 u64 *wdp; 367 u64 *wdpp = writer_durations[me]; 368 369 - VERBOSE_PERFOUT_STRING("rcu_perf_writer task started"); 370 WARN_ON(!wdpp); 371 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 372 sched_set_fifo_low(current); ··· 383 schedule_timeout_uninterruptible(1); 384 385 t = ktime_get_mono_fast_ns(); 386 - if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) { 387 - t_rcu_perf_writer_started = t; 388 if (gp_exp) { 389 b_rcu_gp_test_started = 390 cur_ops->exp_completed() / 2; ··· 404 rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 405 if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) { 406 atomic_inc(this_cpu_ptr(&n_async_inflight)); 407 - cur_ops->async(rhp, rcu_perf_async_cb); 408 rhp = NULL; 409 } else if (!kthread_should_stop()) { 410 cur_ops->gp_barrier(); ··· 421 *wdp = t - *wdp; 422 i_max = i; 423 if (!started && 424 - atomic_read(&n_rcu_perf_writer_started) >= nrealwriters) 425 started = true; 426 if (!done && i >= MIN_MEAS) { 427 done = true; 428 sched_set_normal(current, 0); 429 - pr_alert("%s%s rcu_perf_writer %ld has %d measurements\n", 430 - perf_type, PERF_FLAG, me, MIN_MEAS); 431 - if (atomic_inc_return(&n_rcu_perf_writer_finished) >= 432 nrealwriters) { 433 schedule_timeout_interruptible(10); 434 rcu_ftrace_dump(DUMP_ALL); 435 - PERFOUT_STRING("Test complete"); 436 - t_rcu_perf_writer_finished = t; 437 if (gp_exp) { 438 b_rcu_gp_test_finished = 439 cur_ops->exp_completed() / 2; ··· 448 } 449 } 450 if (done && !alldone && 451 - atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters) 452 alldone = true; 453 if (started && !alldone && i < MAX_MEAS - 1) 454 i++; 455 - rcu_perf_wait_shutdown(); 456 } while (!torture_must_stop()); 457 if (gp_async) { 458 cur_ops->gp_barrier(); 459 } 460 writer_n_durations[me] = i_max; 461 - torture_kthread_stopping("rcu_perf_writer"); 462 return 0; 463 } 464 465 static void 466 - rcu_perf_print_module_parms(struct rcu_perf_ops *cur_ops, const char *tag) 467 { 468 
- pr_alert("%s" PERF_FLAG 469 "--- %s: nreaders=%d nwriters=%d verbose=%d shutdown=%d\n", 470 - perf_type, tag, nrealreaders, nrealwriters, verbose, shutdown); 471 } 472 473 static void 474 - rcu_perf_cleanup(void) 475 { 476 int i; 477 int j; ··· 484 * during the mid-boot phase, so have to wait till the end. 485 */ 486 if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) 487 - VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!"); 488 if (rcu_gp_is_normal() && gp_exp) 489 - VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!"); 490 if (gp_exp && gp_async) 491 - VERBOSE_PERFOUT_ERRSTRING("No expedited async GPs, so went with async!"); 492 493 if (torture_cleanup_begin()) 494 return; ··· 499 500 if (reader_tasks) { 501 for (i = 0; i < nrealreaders; i++) 502 - torture_stop_kthread(rcu_perf_reader, 503 reader_tasks[i]); 504 kfree(reader_tasks); 505 } 506 507 if (writer_tasks) { 508 for (i = 0; i < nrealwriters; i++) { 509 - torture_stop_kthread(rcu_perf_writer, 510 writer_tasks[i]); 511 if (!writer_n_durations) 512 continue; 513 j = writer_n_durations[i]; 514 pr_alert("%s%s writer %d gps: %d\n", 515 - perf_type, PERF_FLAG, i, j); 516 ngps += j; 517 } 518 pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n", 519 - perf_type, PERF_FLAG, 520 - t_rcu_perf_writer_started, t_rcu_perf_writer_finished, 521 - t_rcu_perf_writer_finished - 522 - t_rcu_perf_writer_started, 523 ngps, 524 - rcuperf_seq_diff(b_rcu_gp_test_finished, 525 - b_rcu_gp_test_started)); 526 for (i = 0; i < nrealwriters; i++) { 527 if (!writer_durations) 528 break; ··· 534 for (j = 0; j <= writer_n_durations[i]; j++) { 535 wdp = &wdpp[j]; 536 pr_alert("%s%s %4d writer-duration: %5d %llu\n", 537 - perf_type, PERF_FLAG, 538 i, j, *wdp); 539 if (j % 100 == 0) 540 schedule_timeout_uninterruptible(1); ··· 573 } 574 575 /* 576 - * RCU perf shutdown kthread. Just waits to be awakened, then shuts 577 * down system. 578 */ 579 static int 580 - rcu_perf_shutdown(void *arg) 581 { 582 wait_event(shutdown_wq, 583 - atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters); 584 smp_mb(); /* Wake before output. */ 585 - rcu_perf_cleanup(); 586 kernel_power_off(); 587 return -EINVAL; 588 } 589 590 /* 591 - * kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number 592 * of iterations and measure total time and number of GP for all iterations to complete. 
593 */ 594 ··· 598 599 static struct task_struct **kfree_reader_tasks; 600 static int kfree_nrealthreads; 601 - static atomic_t n_kfree_perf_thread_started; 602 - static atomic_t n_kfree_perf_thread_ended; 603 604 struct kfree_obj { 605 char kfree_obj[8]; ··· 607 }; 608 609 static int 610 - kfree_perf_thread(void *arg) 611 { 612 int i, loop = 0; 613 long me = (long)arg; ··· 615 u64 start_time, end_time; 616 long long mem_begin, mem_during = 0; 617 618 - VERBOSE_PERFOUT_STRING("kfree_perf_thread task started"); 619 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 620 set_user_nice(current, MAX_NICE); 621 622 start_time = ktime_get_mono_fast_ns(); 623 624 - if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) { 625 if (gp_exp) 626 b_rcu_gp_test_started = cur_ops->exp_completed() / 2; 627 else ··· 646 cond_resched(); 647 } while (!torture_must_stop() && ++loop < kfree_loops); 648 649 - if (atomic_inc_return(&n_kfree_perf_thread_ended) >= kfree_nrealthreads) { 650 end_time = ktime_get_mono_fast_ns(); 651 652 if (gp_exp) ··· 656 657 pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n", 658 (unsigned long long)(end_time - start_time), kfree_loops, 659 - rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started), 660 (mem_begin - mem_during) >> (20 - PAGE_SHIFT)); 661 662 if (shutdown) { ··· 665 } 666 } 667 668 - torture_kthread_stopping("kfree_perf_thread"); 669 return 0; 670 } 671 672 static void 673 - kfree_perf_cleanup(void) 674 { 675 int i; 676 ··· 679 680 if (kfree_reader_tasks) { 681 for (i = 0; i < kfree_nrealthreads; i++) 682 - torture_stop_kthread(kfree_perf_thread, 683 kfree_reader_tasks[i]); 684 kfree(kfree_reader_tasks); 685 } ··· 691 * shutdown kthread. Just waits to be awakened, then shuts down system. 692 */ 693 static int 694 - kfree_perf_shutdown(void *arg) 695 { 696 wait_event(shutdown_wq, 697 - atomic_read(&n_kfree_perf_thread_ended) >= kfree_nrealthreads); 698 699 smp_mb(); /* Wake before output. */ 700 701 - kfree_perf_cleanup(); 702 kernel_power_off(); 703 return -EINVAL; 704 } 705 706 static int __init 707 - kfree_perf_init(void) 708 { 709 long i; 710 int firsterr = 0; ··· 713 /* Start up the kthreads. */ 714 if (shutdown) { 715 init_waitqueue_head(&shutdown_wq); 716 - firsterr = torture_create_kthread(kfree_perf_shutdown, NULL, 717 shutdown_task); 718 if (firsterr) 719 goto unwind; ··· 730 } 731 732 for (i = 0; i < kfree_nrealthreads; i++) { 733 - firsterr = torture_create_kthread(kfree_perf_thread, (void *)i, 734 kfree_reader_tasks[i]); 735 if (firsterr) 736 goto unwind; 737 } 738 739 - while (atomic_read(&n_kfree_perf_thread_started) < kfree_nrealthreads) 740 schedule_timeout_uninterruptible(1); 741 742 torture_init_end(); ··· 744 745 unwind: 746 torture_init_end(); 747 - kfree_perf_cleanup(); 748 return firsterr; 749 } 750 751 static int __init 752 - rcu_perf_init(void) 753 { 754 long i; 755 int firsterr = 0; 756 - static struct rcu_perf_ops *perf_ops[] = { 757 &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops, 758 }; 759 760 - if (!torture_init_begin(perf_type, verbose)) 761 return -EBUSY; 762 763 - /* Process args and tell the world that the perf'er is on the job. 
*/ 764 - for (i = 0; i < ARRAY_SIZE(perf_ops); i++) { 765 - cur_ops = perf_ops[i]; 766 - if (strcmp(perf_type, cur_ops->name) == 0) 767 break; 768 } 769 - if (i == ARRAY_SIZE(perf_ops)) { 770 - pr_alert("rcu-perf: invalid perf type: \"%s\"\n", perf_type); 771 - pr_alert("rcu-perf types:"); 772 - for (i = 0; i < ARRAY_SIZE(perf_ops); i++) 773 - pr_cont(" %s", perf_ops[i]->name); 774 pr_cont("\n"); 775 - WARN_ON(!IS_MODULE(CONFIG_RCU_PERF_TEST)); 776 firsterr = -EINVAL; 777 cur_ops = NULL; 778 goto unwind; ··· 781 cur_ops->init(); 782 783 if (kfree_rcu_test) 784 - return kfree_perf_init(); 785 786 nrealwriters = compute_real(nwriters); 787 nrealreaders = compute_real(nreaders); 788 - atomic_set(&n_rcu_perf_reader_started, 0); 789 - atomic_set(&n_rcu_perf_writer_started, 0); 790 - atomic_set(&n_rcu_perf_writer_finished, 0); 791 - rcu_perf_print_module_parms(cur_ops, "Start of test"); 792 793 /* Start up the kthreads. */ 794 795 if (shutdown) { 796 init_waitqueue_head(&shutdown_wq); 797 - firsterr = torture_create_kthread(rcu_perf_shutdown, NULL, 798 shutdown_task); 799 if (firsterr) 800 goto unwind; ··· 803 reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]), 804 GFP_KERNEL); 805 if (reader_tasks == NULL) { 806 - VERBOSE_PERFOUT_ERRSTRING("out of memory"); 807 firsterr = -ENOMEM; 808 goto unwind; 809 } 810 for (i = 0; i < nrealreaders; i++) { 811 - firsterr = torture_create_kthread(rcu_perf_reader, (void *)i, 812 reader_tasks[i]); 813 if (firsterr) 814 goto unwind; 815 } 816 - while (atomic_read(&n_rcu_perf_reader_started) < nrealreaders) 817 schedule_timeout_uninterruptible(1); 818 writer_tasks = kcalloc(nrealwriters, sizeof(reader_tasks[0]), 819 GFP_KERNEL); ··· 823 kcalloc(nrealwriters, sizeof(*writer_n_durations), 824 GFP_KERNEL); 825 if (!writer_tasks || !writer_durations || !writer_n_durations) { 826 - VERBOSE_PERFOUT_ERRSTRING("out of memory"); 827 firsterr = -ENOMEM; 828 goto unwind; 829 } ··· 835 firsterr = -ENOMEM; 836 goto unwind; 837 } 838 - firsterr = torture_create_kthread(rcu_perf_writer, (void *)i, 839 writer_tasks[i]); 840 if (firsterr) 841 goto unwind; ··· 845 846 unwind: 847 torture_init_end(); 848 - rcu_perf_cleanup(); 849 return firsterr; 850 } 851 852 - module_init(rcu_perf_init); 853 - module_exit(rcu_perf_cleanup);
··· 1 // SPDX-License-Identifier: GPL-2.0+ 2 /* 3 + * Read-Copy Update module-based scalability-test facility 4 * 5 * Copyright (C) IBM Corporation, 2015 6 * ··· 44 MODULE_LICENSE("GPL"); 45 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>"); 46 47 + #define SCALE_FLAG "-scale:" 48 + #define SCALEOUT_STRING(s) \ 49 + pr_alert("%s" SCALE_FLAG " %s\n", scale_type, s) 50 + #define VERBOSE_SCALEOUT_STRING(s) \ 51 + do { if (verbose) pr_alert("%s" SCALE_FLAG " %s\n", scale_type, s); } while (0) 52 + #define VERBOSE_SCALEOUT_ERRSTRING(s) \ 53 + do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! %s\n", scale_type, s); } while (0) 54 55 /* 56 * The intended use cases for the nreaders and nwriters module parameters ··· 61 * nr_cpus for a mixed reader/writer test. 62 * 63 * 2. Specify the nr_cpus kernel boot parameter, but set 64 + * rcuscale.nreaders to zero. This will set nwriters to the 65 * value specified by nr_cpus for an update-only test. 66 * 67 * 3. Specify the nr_cpus kernel boot parameter, but set 68 + * rcuscale.nwriters to zero. This will set nreaders to the 69 * value specified by nr_cpus for a read-only test. 70 * 71 * Various other use cases may of course be specified. 72 * 73 * Note that this test's readers are intended only as a test load for 74 + * the writers. The reader scalability statistics will be overly 75 * pessimistic due to the per-critical-section interrupt disabling, 76 * test-end checks, and the pair of calls through pointers. 77 */ 78 79 #ifdef MODULE 80 + # define RCUSCALE_SHUTDOWN 0 81 #else 82 + # define RCUSCALE_SHUTDOWN 1 83 #endif 84 85 torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives"); ··· 88 torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); 89 torture_param(int, nreaders, -1, "Number of RCU reader threads"); 90 torture_param(int, nwriters, -1, "Number of RCU updater threads"); 91 + torture_param(bool, shutdown, RCUSCALE_SHUTDOWN, 92 + "Shutdown at end of scalability tests."); 93 torture_param(int, verbose, 1, "Enable verbose debugging printk()s"); 94 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable"); 95 + torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?"); 96 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate."); 97 98 + static char *scale_type = "rcu"; 99 + module_param(scale_type, charp, 0444); 100 + MODULE_PARM_DESC(scale_type, "Type of RCU to scalability-test (rcu, srcu, ...)"); 101 102 static int nrealreaders; 103 static int nrealwriters; ··· 107 108 static u64 **writer_durations; 109 static int *writer_n_durations; 110 + static atomic_t n_rcu_scale_reader_started; 111 + static atomic_t n_rcu_scale_writer_started; 112 + static atomic_t n_rcu_scale_writer_finished; 113 static wait_queue_head_t shutdown_wq; 114 + static u64 t_rcu_scale_writer_started; 115 + static u64 t_rcu_scale_writer_finished; 116 static unsigned long b_rcu_gp_test_started; 117 static unsigned long b_rcu_gp_test_finished; 118 static DEFINE_PER_CPU(atomic_t, n_async_inflight); ··· 124 * Operations vector for selecting different types of tests. 125 */ 126 127 + struct rcu_scale_ops { 128 int ptype; 129 void (*init)(void); 130 void (*cleanup)(void); ··· 140 const char *name; 141 }; 142 143 + static struct rcu_scale_ops *cur_ops; 144 145 /* 146 + * Definitions for rcu scalability testing. 
147 */ 148 149 + static int rcu_scale_read_lock(void) __acquires(RCU) 150 { 151 rcu_read_lock(); 152 return 0; 153 } 154 155 + static void rcu_scale_read_unlock(int idx) __releases(RCU) 156 { 157 rcu_read_unlock(); 158 } ··· 162 return 0; 163 } 164 165 + static void rcu_sync_scale_init(void) 166 { 167 } 168 169 + static struct rcu_scale_ops rcu_ops = { 170 .ptype = RCU_FLAVOR, 171 + .init = rcu_sync_scale_init, 172 + .readlock = rcu_scale_read_lock, 173 + .readunlock = rcu_scale_read_unlock, 174 .get_gp_seq = rcu_get_gp_seq, 175 .gp_diff = rcu_seq_diff, 176 .exp_completed = rcu_exp_batches_completed, ··· 182 }; 183 184 /* 185 + * Definitions for srcu scalability testing. 186 */ 187 188 + DEFINE_STATIC_SRCU(srcu_ctl_scale); 189 + static struct srcu_struct *srcu_ctlp = &srcu_ctl_scale; 190 191 + static int srcu_scale_read_lock(void) __acquires(srcu_ctlp) 192 { 193 return srcu_read_lock(srcu_ctlp); 194 } 195 196 + static void srcu_scale_read_unlock(int idx) __releases(srcu_ctlp) 197 { 198 srcu_read_unlock(srcu_ctlp, idx); 199 } 200 201 + static unsigned long srcu_scale_completed(void) 202 { 203 return srcu_batches_completed(srcu_ctlp); 204 } ··· 213 srcu_barrier(srcu_ctlp); 214 } 215 216 + static void srcu_scale_synchronize(void) 217 { 218 synchronize_srcu(srcu_ctlp); 219 } 220 221 + static void srcu_scale_synchronize_expedited(void) 222 { 223 synchronize_srcu_expedited(srcu_ctlp); 224 } 225 226 + static struct rcu_scale_ops srcu_ops = { 227 .ptype = SRCU_FLAVOR, 228 + .init = rcu_sync_scale_init, 229 + .readlock = srcu_scale_read_lock, 230 + .readunlock = srcu_scale_read_unlock, 231 + .get_gp_seq = srcu_scale_completed, 232 .gp_diff = rcu_seq_diff, 233 + .exp_completed = srcu_scale_completed, 234 .async = srcu_call_rcu, 235 .gp_barrier = srcu_rcu_barrier, 236 + .sync = srcu_scale_synchronize, 237 + .exp_sync = srcu_scale_synchronize_expedited, 238 .name = "srcu" 239 }; 240 241 static struct srcu_struct srcud; 242 243 + static void srcu_sync_scale_init(void) 244 { 245 srcu_ctlp = &srcud; 246 init_srcu_struct(srcu_ctlp); 247 } 248 249 + static void srcu_sync_scale_cleanup(void) 250 { 251 cleanup_srcu_struct(srcu_ctlp); 252 } 253 254 + static struct rcu_scale_ops srcud_ops = { 255 .ptype = SRCU_FLAVOR, 256 + .init = srcu_sync_scale_init, 257 + .cleanup = srcu_sync_scale_cleanup, 258 + .readlock = srcu_scale_read_lock, 259 + .readunlock = srcu_scale_read_unlock, 260 + .get_gp_seq = srcu_scale_completed, 261 .gp_diff = rcu_seq_diff, 262 + .exp_completed = srcu_scale_completed, 263 .async = srcu_call_rcu, 264 .gp_barrier = srcu_rcu_barrier, 265 + .sync = srcu_scale_synchronize, 266 + .exp_sync = srcu_scale_synchronize_expedited, 267 .name = "srcud" 268 }; 269 270 /* 271 + * Definitions for RCU-tasks scalability testing. 272 */ 273 274 + static int tasks_scale_read_lock(void) 275 { 276 return 0; 277 } 278 279 + static void tasks_scale_read_unlock(int idx) 280 { 281 } 282 283 + static struct rcu_scale_ops tasks_ops = { 284 .ptype = RCU_TASKS_FLAVOR, 285 + .init = rcu_sync_scale_init, 286 + .readlock = tasks_scale_read_lock, 287 + .readunlock = tasks_scale_read_unlock, 288 .get_gp_seq = rcu_no_completed, 289 .gp_diff = rcu_seq_diff, 290 .async = call_rcu_tasks, ··· 294 .name = "tasks" 295 }; 296 297 + static unsigned long rcuscale_seq_diff(unsigned long new, unsigned long old) 298 { 299 if (!cur_ops->gp_diff) 300 return new - old; ··· 302 } 303 304 /* 305 + * If scalability tests complete, wait for shutdown to commence. 
306 */ 307 + static void rcu_scale_wait_shutdown(void) 308 { 309 cond_resched_tasks_rcu_qs(); 310 + if (atomic_read(&n_rcu_scale_writer_finished) < nrealwriters) 311 return; 312 while (!torture_must_stop()) 313 schedule_timeout_uninterruptible(1); 314 } 315 316 /* 317 + * RCU scalability reader kthread. Repeatedly does empty RCU read-side 318 + * critical section, minimizing update-side interference. However, the 319 + * point of this test is not to evaluate reader scalability, but instead 320 + * to serve as a test load for update-side scalability testing. 321 */ 322 static int 323 + rcu_scale_reader(void *arg) 324 { 325 unsigned long flags; 326 int idx; 327 long me = (long)arg; 328 329 + VERBOSE_SCALEOUT_STRING("rcu_scale_reader task started"); 330 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 331 set_user_nice(current, MAX_NICE); 332 + atomic_inc(&n_rcu_scale_reader_started); 333 334 do { 335 local_irq_save(flags); 336 idx = cur_ops->readlock(); 337 cur_ops->readunlock(idx); 338 local_irq_restore(flags); 339 + rcu_scale_wait_shutdown(); 340 } while (!torture_must_stop()); 341 + torture_kthread_stopping("rcu_scale_reader"); 342 return 0; 343 } 344 345 /* 346 + * Callback function for asynchronous grace periods from rcu_scale_writer(). 347 */ 348 + static void rcu_scale_async_cb(struct rcu_head *rhp) 349 { 350 atomic_dec(this_cpu_ptr(&n_async_inflight)); 351 kfree(rhp); 352 } 353 354 /* 355 + * RCU scale writer kthread. Repeatedly does a grace period. 356 */ 357 static int 358 + rcu_scale_writer(void *arg) 359 { 360 int i = 0; 361 int i_max; ··· 366 u64 *wdp; 367 u64 *wdpp = writer_durations[me]; 368 369 + VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started"); 370 WARN_ON(!wdpp); 371 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 372 sched_set_fifo_low(current); ··· 383 schedule_timeout_uninterruptible(1); 384 385 t = ktime_get_mono_fast_ns(); 386 + if (atomic_inc_return(&n_rcu_scale_writer_started) >= nrealwriters) { 387 + t_rcu_scale_writer_started = t; 388 if (gp_exp) { 389 b_rcu_gp_test_started = 390 cur_ops->exp_completed() / 2; ··· 404 rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 405 if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) { 406 atomic_inc(this_cpu_ptr(&n_async_inflight)); 407 + cur_ops->async(rhp, rcu_scale_async_cb); 408 rhp = NULL; 409 } else if (!kthread_should_stop()) { 410 cur_ops->gp_barrier(); ··· 421 *wdp = t - *wdp; 422 i_max = i; 423 if (!started && 424 + atomic_read(&n_rcu_scale_writer_started) >= nrealwriters) 425 started = true; 426 if (!done && i >= MIN_MEAS) { 427 done = true; 428 sched_set_normal(current, 0); 429 + pr_alert("%s%s rcu_scale_writer %ld has %d measurements\n", 430 + scale_type, SCALE_FLAG, me, MIN_MEAS); 431 + if (atomic_inc_return(&n_rcu_scale_writer_finished) >= 432 nrealwriters) { 433 schedule_timeout_interruptible(10); 434 rcu_ftrace_dump(DUMP_ALL); 435 + SCALEOUT_STRING("Test complete"); 436 + t_rcu_scale_writer_finished = t; 437 if (gp_exp) { 438 b_rcu_gp_test_finished = 439 cur_ops->exp_completed() / 2; ··· 448 } 449 } 450 if (done && !alldone && 451 + atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters) 452 alldone = true; 453 if (started && !alldone && i < MAX_MEAS - 1) 454 i++; 455 + rcu_scale_wait_shutdown(); 456 } while (!torture_must_stop()); 457 if (gp_async) { 458 cur_ops->gp_barrier(); 459 } 460 writer_n_durations[me] = i_max; 461 + torture_kthread_stopping("rcu_scale_writer"); 462 return 0; 463 } 464 465 static void 466 + rcu_scale_print_module_parms(struct rcu_scale_ops 
*cur_ops, const char *tag) 467 { 468 + pr_alert("%s" SCALE_FLAG 469 "--- %s: nreaders=%d nwriters=%d verbose=%d shutdown=%d\n", 470 + scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown); 471 } 472 473 static void 474 + rcu_scale_cleanup(void) 475 { 476 int i; 477 int j; ··· 484 * during the mid-boot phase, so have to wait till the end. 485 */ 486 if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) 487 + VERBOSE_SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!"); 488 if (rcu_gp_is_normal() && gp_exp) 489 + VERBOSE_SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!"); 490 if (gp_exp && gp_async) 491 + VERBOSE_SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!"); 492 493 if (torture_cleanup_begin()) 494 return; ··· 499 500 if (reader_tasks) { 501 for (i = 0; i < nrealreaders; i++) 502 + torture_stop_kthread(rcu_scale_reader, 503 reader_tasks[i]); 504 kfree(reader_tasks); 505 } 506 507 if (writer_tasks) { 508 for (i = 0; i < nrealwriters; i++) { 509 + torture_stop_kthread(rcu_scale_writer, 510 writer_tasks[i]); 511 if (!writer_n_durations) 512 continue; 513 j = writer_n_durations[i]; 514 pr_alert("%s%s writer %d gps: %d\n", 515 + scale_type, SCALE_FLAG, i, j); 516 ngps += j; 517 } 518 pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n", 519 + scale_type, SCALE_FLAG, 520 + t_rcu_scale_writer_started, t_rcu_scale_writer_finished, 521 + t_rcu_scale_writer_finished - 522 + t_rcu_scale_writer_started, 523 ngps, 524 + rcuscale_seq_diff(b_rcu_gp_test_finished, 525 + b_rcu_gp_test_started)); 526 for (i = 0; i < nrealwriters; i++) { 527 if (!writer_durations) 528 break; ··· 534 for (j = 0; j <= writer_n_durations[i]; j++) { 535 wdp = &wdpp[j]; 536 pr_alert("%s%s %4d writer-duration: %5d %llu\n", 537 + scale_type, SCALE_FLAG, 538 i, j, *wdp); 539 if (j % 100 == 0) 540 schedule_timeout_uninterruptible(1); ··· 573 } 574 575 /* 576 + * RCU scalability shutdown kthread. Just waits to be awakened, then shuts 577 * down system. 578 */ 579 static int 580 + rcu_scale_shutdown(void *arg) 581 { 582 wait_event(shutdown_wq, 583 + atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters); 584 smp_mb(); /* Wake before output. */ 585 + rcu_scale_cleanup(); 586 kernel_power_off(); 587 return -EINVAL; 588 } 589 590 /* 591 + * kfree_rcu() scalability tests: Start a kfree_rcu() loop on all CPUs for number 592 * of iterations and measure total time and number of GP for all iterations to complete. 
593 */ 594 ··· 598 599 static struct task_struct **kfree_reader_tasks; 600 static int kfree_nrealthreads; 601 + static atomic_t n_kfree_scale_thread_started; 602 + static atomic_t n_kfree_scale_thread_ended; 603 604 struct kfree_obj { 605 char kfree_obj[8]; ··· 607 }; 608 609 static int 610 + kfree_scale_thread(void *arg) 611 { 612 int i, loop = 0; 613 long me = (long)arg; ··· 615 u64 start_time, end_time; 616 long long mem_begin, mem_during = 0; 617 618 + VERBOSE_SCALEOUT_STRING("kfree_scale_thread task started"); 619 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 620 set_user_nice(current, MAX_NICE); 621 622 start_time = ktime_get_mono_fast_ns(); 623 624 + if (atomic_inc_return(&n_kfree_scale_thread_started) >= kfree_nrealthreads) { 625 if (gp_exp) 626 b_rcu_gp_test_started = cur_ops->exp_completed() / 2; 627 else ··· 646 cond_resched(); 647 } while (!torture_must_stop() && ++loop < kfree_loops); 648 649 + if (atomic_inc_return(&n_kfree_scale_thread_ended) >= kfree_nrealthreads) { 650 end_time = ktime_get_mono_fast_ns(); 651 652 if (gp_exp) ··· 656 657 pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n", 658 (unsigned long long)(end_time - start_time), kfree_loops, 659 + rcuscale_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started), 660 (mem_begin - mem_during) >> (20 - PAGE_SHIFT)); 661 662 if (shutdown) { ··· 665 } 666 } 667 668 + torture_kthread_stopping("kfree_scale_thread"); 669 return 0; 670 } 671 672 static void 673 + kfree_scale_cleanup(void) 674 { 675 int i; 676 ··· 679 680 if (kfree_reader_tasks) { 681 for (i = 0; i < kfree_nrealthreads; i++) 682 + torture_stop_kthread(kfree_scale_thread, 683 kfree_reader_tasks[i]); 684 kfree(kfree_reader_tasks); 685 } ··· 691 * shutdown kthread. Just waits to be awakened, then shuts down system. 692 */ 693 static int 694 + kfree_scale_shutdown(void *arg) 695 { 696 wait_event(shutdown_wq, 697 + atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); 698 699 smp_mb(); /* Wake before output. */ 700 701 + kfree_scale_cleanup(); 702 kernel_power_off(); 703 return -EINVAL; 704 } 705 706 static int __init 707 + kfree_scale_init(void) 708 { 709 long i; 710 int firsterr = 0; ··· 713 /* Start up the kthreads. */ 714 if (shutdown) { 715 init_waitqueue_head(&shutdown_wq); 716 + firsterr = torture_create_kthread(kfree_scale_shutdown, NULL, 717 shutdown_task); 718 if (firsterr) 719 goto unwind; ··· 730 } 731 732 for (i = 0; i < kfree_nrealthreads; i++) { 733 + firsterr = torture_create_kthread(kfree_scale_thread, (void *)i, 734 kfree_reader_tasks[i]); 735 if (firsterr) 736 goto unwind; 737 } 738 739 + while (atomic_read(&n_kfree_scale_thread_started) < kfree_nrealthreads) 740 schedule_timeout_uninterruptible(1); 741 742 torture_init_end(); ··· 744 745 unwind: 746 torture_init_end(); 747 + kfree_scale_cleanup(); 748 return firsterr; 749 } 750 751 static int __init 752 + rcu_scale_init(void) 753 { 754 long i; 755 int firsterr = 0; 756 + static struct rcu_scale_ops *scale_ops[] = { 757 &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops, 758 }; 759 760 + if (!torture_init_begin(scale_type, verbose)) 761 return -EBUSY; 762 763 + /* Process args and announce that the scalability'er is on the job. 
*/ 764 + for (i = 0; i < ARRAY_SIZE(scale_ops); i++) { 765 + cur_ops = scale_ops[i]; 766 + if (strcmp(scale_type, cur_ops->name) == 0) 767 break; 768 } 769 + if (i == ARRAY_SIZE(scale_ops)) { 770 + pr_alert("rcu-scale: invalid scale type: \"%s\"\n", scale_type); 771 + pr_alert("rcu-scale types:"); 772 + for (i = 0; i < ARRAY_SIZE(scale_ops); i++) 773 + pr_cont(" %s", scale_ops[i]->name); 774 pr_cont("\n"); 775 + WARN_ON(!IS_MODULE(CONFIG_RCU_SCALE_TEST)); 776 firsterr = -EINVAL; 777 cur_ops = NULL; 778 goto unwind; ··· 781 cur_ops->init(); 782 783 if (kfree_rcu_test) 784 + return kfree_scale_init(); 785 786 nrealwriters = compute_real(nwriters); 787 nrealreaders = compute_real(nreaders); 788 + atomic_set(&n_rcu_scale_reader_started, 0); 789 + atomic_set(&n_rcu_scale_writer_started, 0); 790 + atomic_set(&n_rcu_scale_writer_finished, 0); 791 + rcu_scale_print_module_parms(cur_ops, "Start of test"); 792 793 /* Start up the kthreads. */ 794 795 if (shutdown) { 796 init_waitqueue_head(&shutdown_wq); 797 + firsterr = torture_create_kthread(rcu_scale_shutdown, NULL, 798 shutdown_task); 799 if (firsterr) 800 goto unwind; ··· 803 reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]), 804 GFP_KERNEL); 805 if (reader_tasks == NULL) { 806 + VERBOSE_SCALEOUT_ERRSTRING("out of memory"); 807 firsterr = -ENOMEM; 808 goto unwind; 809 } 810 for (i = 0; i < nrealreaders; i++) { 811 + firsterr = torture_create_kthread(rcu_scale_reader, (void *)i, 812 reader_tasks[i]); 813 if (firsterr) 814 goto unwind; 815 } 816 + while (atomic_read(&n_rcu_scale_reader_started) < nrealreaders) 817 schedule_timeout_uninterruptible(1); 818 writer_tasks = kcalloc(nrealwriters, sizeof(reader_tasks[0]), 819 GFP_KERNEL); ··· 823 kcalloc(nrealwriters, sizeof(*writer_n_durations), 824 GFP_KERNEL); 825 if (!writer_tasks || !writer_durations || !writer_n_durations) { 826 + VERBOSE_SCALEOUT_ERRSTRING("out of memory"); 827 firsterr = -ENOMEM; 828 goto unwind; 829 } ··· 835 firsterr = -ENOMEM; 836 goto unwind; 837 } 838 + firsterr = torture_create_kthread(rcu_scale_writer, (void *)i, 839 writer_tasks[i]); 840 if (firsterr) 841 goto unwind; ··· 845 846 unwind: 847 torture_init_end(); 848 + rcu_scale_cleanup(); 849 return firsterr; 850 } 851 852 + module_init(rcu_scale_init); 853 + module_exit(rcu_scale_cleanup);
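The rcu_scale_init() hunk above selects the flavor under test by matching the scale_type module parameter against the ->name field of each entry in scale_ops[], then dispatching through the chosen operations vector. Below is a minimal user-space C sketch of that ops-vector pattern; the structure and function names are illustrative stand-ins, not the kernel's.

#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the kernel's per-flavor operations vector. */
struct scale_ops_sketch {
	const char *name;
	void (*init)(void);
};

static void noop_init(void) { }

static struct scale_ops_sketch rcu_ops_sketch  = { .name = "rcu",  .init = noop_init };
static struct scale_ops_sketch srcu_ops_sketch = { .name = "srcu", .init = noop_init };

#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))

/* Match a type string against the ops table, as rcu_scale_init() does. */
static struct scale_ops_sketch *select_ops(const char *type)
{
	static struct scale_ops_sketch *ops[] = { &rcu_ops_sketch, &srcu_ops_sketch };
	size_t i;

	for (i = 0; i < ARRAY_SIZE_SKETCH(ops); i++)
		if (strcmp(type, ops[i]->name) == 0)
			return ops[i];
	fprintf(stderr, "invalid scale type: \"%s\"\n", type);
	return NULL;
}

int main(void)
{
	struct scale_ops_sketch *cur = select_ops("srcu");

	if (cur) {
		cur->init();
		printf("selected: %s\n", cur->name);
	}
	return 0;
}

As in the kernel code, an unrecognized type string falls out of the loop and is reported rather than silently defaulting to some flavor.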
+42 -19
kernel/rcu/rcutorture.c
··· 52 MODULE_LICENSE("GPL"); 53 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com> and Josh Triplett <josh@joshtriplett.org>"); 54 55 - #ifndef data_race 56 - #define data_race(expr) \ 57 - ({ \ 58 - expr; \ 59 - }) 60 - #endif 61 - #ifndef ASSERT_EXCLUSIVE_WRITER 62 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 63 - #endif 64 - #ifndef ASSERT_EXCLUSIVE_ACCESS 65 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 66 - #endif 67 - 68 /* Bits for ->extendables field, extendables param, and related definitions. */ 69 #define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */ 70 #define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1) ··· 87 "Use normal (non-expedited) GP wait primitives"); 88 torture_param(bool, gp_sync, false, "Use synchronous GP wait primitives"); 89 torture_param(int, irqreader, 1, "Allow RCU readers from irq handlers"); 90 torture_param(int, n_barrier_cbs, 0, 91 "# of callbacks/kthreads for barrier testing"); 92 torture_param(int, nfakewriters, 4, "Number of RCU fake writer threads"); ··· 173 static unsigned long n_read_exits; 174 static struct list_head rcu_torture_removed; 175 static unsigned long shutdown_jiffies; 176 177 static int rcu_torture_writer_state; 178 #define RTWS_FIXED_DELAY 0 ··· 1402 preempt_enable(); 1403 rcutorture_one_extend(&readstate, 0, trsp, rtrsp); 1404 WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK); 1405 1406 /* If error or close call, record the sequence of reader protections. */ 1407 if ((pipe_count > 1 || completed > 1) && !xchg(&err_segs_recorded, 1)) { ··· 1800 unsigned long rcu_launder_gp_seq_start; 1801 }; 1802 1803 static struct rcu_fwd *rcu_fwds; 1804 static bool rcu_fwd_emergency_stop; 1805 ··· 2067 static int rcutorture_oom_notify(struct notifier_block *self, 2068 unsigned long notused, void *nfreed) 2069 { 2070 - struct rcu_fwd *rfp = rcu_fwds; 2071 2072 WARN(1, "%s invoked upon OOM during forward-progress testing.\n", 2073 __func__); 2074 rcu_torture_fwd_cb_hist(rfp); ··· 2092 smp_mb(); /* Frees before return to avoid redoing OOM. */ 2093 (*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */ 2094 pr_info("%s returning after OOM processing.\n", __func__); 2095 return NOTIFY_OK; 2096 } 2097 ··· 2114 do { 2115 schedule_timeout_interruptible(fwd_progress_holdoff * HZ); 2116 WRITE_ONCE(rcu_fwd_emergency_stop, false); 2117 - register_oom_notifier(&rcutorture_oom_nb); 2118 if (!IS_ENABLED(CONFIG_TINY_RCU) || 2119 rcu_inkernel_boot_has_ended()) 2120 rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries); 2121 if (rcu_inkernel_boot_has_ended()) 2122 rcu_torture_fwd_prog_cr(rfp); 2123 - unregister_oom_notifier(&rcutorture_oom_nb); 2124 2125 /* Avoid slow periods, better to test when busy. */ 2126 stutter_wait("rcu_torture_fwd_prog"); ··· 2158 return -ENOMEM; 2159 spin_lock_init(&rfp->rcu_fwd_lock); 2160 rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head; 2161 return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task); 2162 } 2163 2164 /* Callback function for RCU barrier testing. 
*/ ··· 2475 show_rcu_gp_kthreads(); 2476 rcu_torture_read_exit_cleanup(); 2477 rcu_torture_barrier_cleanup(); 2478 - torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task); 2479 torture_stop_kthread(rcu_torture_stall, stall_task); 2480 torture_stop_kthread(rcu_torture_writer, writer_task); 2481 ··· 2497 2498 rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); 2499 srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); 2500 - pr_alert("%s: End-test grace-period state: g%lu f%#x\n", 2501 - cur_ops->name, gp_seq, flags); 2502 torture_stop_kthread(rcu_torture_stats, stats_task); 2503 torture_stop_kthread(rcu_torture_fqs, fqs_task); 2504 if (rcu_torture_can_boost()) ··· 2623 long i; 2624 int cpu; 2625 int firsterr = 0; 2626 static struct rcu_torture_ops *torture_ops[] = { 2627 &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, 2628 &busted_srcud_ops, &tasks_ops, &tasks_rude_ops, ··· 2667 nrealreaders = 1; 2668 } 2669 rcu_torture_print_module_parms(cur_ops, "Start of test"); 2670 2671 /* Set up the freelist. */ 2672
··· 52 MODULE_LICENSE("GPL"); 53 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com> and Josh Triplett <josh@joshtriplett.org>"); 54 55 /* Bits for ->extendables field, extendables param, and related definitions. */ 56 #define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */ 57 #define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1) ··· 100 "Use normal (non-expedited) GP wait primitives"); 101 torture_param(bool, gp_sync, false, "Use synchronous GP wait primitives"); 102 torture_param(int, irqreader, 1, "Allow RCU readers from irq handlers"); 103 + torture_param(int, leakpointer, 0, "Leak pointer dereferences from readers"); 104 torture_param(int, n_barrier_cbs, 0, 105 "# of callbacks/kthreads for barrier testing"); 106 torture_param(int, nfakewriters, 4, "Number of RCU fake writer threads"); ··· 185 static unsigned long n_read_exits; 186 static struct list_head rcu_torture_removed; 187 static unsigned long shutdown_jiffies; 188 + static unsigned long start_gp_seq; 189 190 static int rcu_torture_writer_state; 191 #define RTWS_FIXED_DELAY 0 ··· 1413 preempt_enable(); 1414 rcutorture_one_extend(&readstate, 0, trsp, rtrsp); 1415 WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK); 1416 + // This next splat is expected behavior if leakpointer, especially 1417 + // for CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels. 1418 + WARN_ON_ONCE(leakpointer && READ_ONCE(p->rtort_pipe_count) > 1); 1419 1420 /* If error or close call, record the sequence of reader protections. */ 1421 if ((pipe_count > 1 || completed > 1) && !xchg(&err_segs_recorded, 1)) { ··· 1808 unsigned long rcu_launder_gp_seq_start; 1809 }; 1810 1811 + static DEFINE_MUTEX(rcu_fwd_mutex); 1812 static struct rcu_fwd *rcu_fwds; 1813 static bool rcu_fwd_emergency_stop; 1814 ··· 2074 static int rcutorture_oom_notify(struct notifier_block *self, 2075 unsigned long notused, void *nfreed) 2076 { 2077 + struct rcu_fwd *rfp; 2078 2079 + mutex_lock(&rcu_fwd_mutex); 2080 + rfp = rcu_fwds; 2081 + if (!rfp) { 2082 + mutex_unlock(&rcu_fwd_mutex); 2083 + return NOTIFY_OK; 2084 + } 2085 WARN(1, "%s invoked upon OOM during forward-progress testing.\n", 2086 __func__); 2087 rcu_torture_fwd_cb_hist(rfp); ··· 2093 smp_mb(); /* Frees before return to avoid redoing OOM. */ 2094 (*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */ 2095 pr_info("%s returning after OOM processing.\n", __func__); 2096 + mutex_unlock(&rcu_fwd_mutex); 2097 return NOTIFY_OK; 2098 } 2099 ··· 2114 do { 2115 schedule_timeout_interruptible(fwd_progress_holdoff * HZ); 2116 WRITE_ONCE(rcu_fwd_emergency_stop, false); 2117 if (!IS_ENABLED(CONFIG_TINY_RCU) || 2118 rcu_inkernel_boot_has_ended()) 2119 rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries); 2120 if (rcu_inkernel_boot_has_ended()) 2121 rcu_torture_fwd_prog_cr(rfp); 2122 2123 /* Avoid slow periods, better to test when busy. 
*/ 2124 stutter_wait("rcu_torture_fwd_prog"); ··· 2160 return -ENOMEM; 2161 spin_lock_init(&rfp->rcu_fwd_lock); 2162 rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head; 2163 + mutex_lock(&rcu_fwd_mutex); 2164 + rcu_fwds = rfp; 2165 + mutex_unlock(&rcu_fwd_mutex); 2166 + register_oom_notifier(&rcutorture_oom_nb); 2167 return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task); 2168 + } 2169 + 2170 + static void rcu_torture_fwd_prog_cleanup(void) 2171 + { 2172 + struct rcu_fwd *rfp; 2173 + 2174 + torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task); 2175 + rfp = rcu_fwds; 2176 + mutex_lock(&rcu_fwd_mutex); 2177 + rcu_fwds = NULL; 2178 + mutex_unlock(&rcu_fwd_mutex); 2179 + unregister_oom_notifier(&rcutorture_oom_nb); 2180 + kfree(rfp); 2181 } 2182 2183 /* Callback function for RCU barrier testing. */ ··· 2460 show_rcu_gp_kthreads(); 2461 rcu_torture_read_exit_cleanup(); 2462 rcu_torture_barrier_cleanup(); 2463 + rcu_torture_fwd_prog_cleanup(); 2464 torture_stop_kthread(rcu_torture_stall, stall_task); 2465 torture_stop_kthread(rcu_torture_writer, writer_task); 2466 ··· 2482 2483 rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); 2484 srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); 2485 + pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n", 2486 + cur_ops->name, (long)gp_seq, flags, 2487 + rcutorture_seq_diff(gp_seq, start_gp_seq)); 2488 torture_stop_kthread(rcu_torture_stats, stats_task); 2489 torture_stop_kthread(rcu_torture_fqs, fqs_task); 2490 if (rcu_torture_can_boost()) ··· 2607 long i; 2608 int cpu; 2609 int firsterr = 0; 2610 + int flags = 0; 2611 + unsigned long gp_seq = 0; 2612 static struct rcu_torture_ops *torture_ops[] = { 2613 &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, 2614 &busted_srcud_ops, &tasks_ops, &tasks_rude_ops, ··· 2649 nrealreaders = 1; 2650 } 2651 rcu_torture_print_module_parms(cur_ops, "Start of test"); 2652 + rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); 2653 + srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); 2654 + start_gp_seq = gp_seq; 2655 + pr_alert("%s: Start-test grace-period state: g%ld f%#x\n", 2656 + cur_ops->name, (long)gp_seq, flags); 2657 2658 /* Set up the freelist. */ 2659
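The rcutorture change above registers the OOM notifier once at forward-progress init time and guards the rcu_fwds pointer with rcu_fwd_mutex, so a notification that arrives during or after cleanup sees either a valid structure or NULL and bails out cleanly. A hedged user-space sketch of that publish/retract hand-off pattern follows; the pthread primitives and names here are illustrative, not the kernel's API.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct fwd_state { long n_launders; };

static pthread_mutex_t fwd_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct fwd_state *fwd_state;	/* Shared with the callback. */

/* Callback that may run at any time; tolerates missing state. */
static void oom_style_callback(void)
{
	pthread_mutex_lock(&fwd_mutex);
	if (!fwd_state) {		/* Test already torn down. */
		pthread_mutex_unlock(&fwd_mutex);
		return;
	}
	printf("callback sees %ld launders\n", fwd_state->n_launders);
	pthread_mutex_unlock(&fwd_mutex);
}

static void fwd_init(void)
{
	struct fwd_state *s = calloc(1, sizeof(*s));

	pthread_mutex_lock(&fwd_mutex);
	fwd_state = s;			/* Publish under the lock. */
	pthread_mutex_unlock(&fwd_mutex);
}

static void fwd_cleanup(void)
{
	struct fwd_state *s;

	pthread_mutex_lock(&fwd_mutex);
	s = fwd_state;
	fwd_state = NULL;		/* Later callbacks now see NULL. */
	pthread_mutex_unlock(&fwd_mutex);
	free(s);
}

int main(void)
{
	fwd_init();
	oom_style_callback();
	fwd_cleanup();
	oom_style_callback();		/* Safely does nothing. */
	return 0;
}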
+5 -3
kernel/rcu/refscale.c
··· 546 // Print the average of all experiments 547 SCALEOUT("END OF TEST. Calculating average duration per loop (nanoseconds)...\n"); 548 549 - buf[0] = 0; 550 - strcat(buf, "\n"); 551 - strcat(buf, "Runs\tTime(ns)\n"); 552 553 for (exp = 0; exp < nruns; exp++) { 554 u64 avg;
··· 546 // Print the average of all experiments 547 SCALEOUT("END OF TEST. Calculating average duration per loop (nanoseconds)...\n"); 548 549 + if (!errexit) { 550 + buf[0] = 0; 551 + strcat(buf, "\n"); 552 + strcat(buf, "Runs\tTime(ns)\n"); 553 + } 554 555 for (exp = 0; exp < nruns; exp++) { 556 u64 avg;
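The refscale fix above wraps the summary-buffer initialization in "if (!errexit)", so the buffer is only written when the test did not take an early error exit (where, presumably, buf was never allocated). A minimal sketch of that guard, with hypothetical names:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	int errexit = 0;
	char *buf = malloc(64);

	if (!buf)
		errexit = 1;	/* Allocation failed: skip every use of buf. */

	if (!errexit) {
		buf[0] = 0;
		strcat(buf, "\nRuns\tTime(ns)\n");
		fputs(buf, stdout);
	}
	free(buf);
	return errexit;
}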
-13
kernel/rcu/srcutree.c
··· 29 #include "rcu.h" 30 #include "rcu_segcblist.h" 31 32 - #ifndef data_race 33 - #define data_race(expr) \ 34 - ({ \ 35 - expr; \ 36 - }) 37 - #endif 38 - #ifndef ASSERT_EXCLUSIVE_WRITER 39 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 40 - #endif 41 - #ifndef ASSERT_EXCLUSIVE_ACCESS 42 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 43 - #endif 44 - 45 /* Holdoff in nanoseconds for auto-expediting. */ 46 #define DEFAULT_SRCU_EXP_HOLDOFF (25 * 1000) 47 static ulong exp_holdoff = DEFAULT_SRCU_EXP_HOLDOFF;
··· 29 #include "rcu.h" 30 #include "rcu_segcblist.h" 31 32 /* Holdoff in nanoseconds for auto-expediting. */ 33 #define DEFAULT_SRCU_EXP_HOLDOFF (25 * 1000) 34 static ulong exp_holdoff = DEFAULT_SRCU_EXP_HOLDOFF;
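The srcutree.c hunk above (and the matching hunks in rcutorture.c, tree.c, and update.c) drops a local compatibility shim that supplied data_race() and the ASSERT_EXCLUSIVE_*() macros only when nothing else had defined them, presumably because the common headers now provide the real definitions. The idiom being removed is the usual conditional-fallback pattern; a stand-alone illustration with a generic macro name (not the kernel's) is below.

#include <stdio.h>

/* Provide a no-op fallback only if instrumentation did not define one. */
#ifndef TRACE_ACCESS
#define TRACE_ACCESS(expr) (expr)
#endif

int main(void)
{
	int x = 41;

	/* With no instrumentation, TRACE_ACCESS() is just a plain access. */
	printf("%d\n", TRACE_ACCESS(x) + 1);
	return 0;
}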
+111 -56
kernel/rcu/tree.c
··· 70 #endif 71 #define MODULE_PARAM_PREFIX "rcutree." 72 73 - #ifndef data_race 74 - #define data_race(expr) \ 75 - ({ \ 76 - expr; \ 77 - }) 78 - #endif 79 - #ifndef ASSERT_EXCLUSIVE_WRITER 80 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 81 - #endif 82 - #ifndef ASSERT_EXCLUSIVE_ACCESS 83 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 84 - #endif 85 - 86 /* Data structures. */ 87 88 /* ··· 164 module_param(gp_init_delay, int, 0444); 165 static int gp_cleanup_delay; 166 module_param(gp_cleanup_delay, int, 0444); 167 168 /* 169 * This rcu parameter is runtime-read-only. It reflects ··· 461 return __this_cpu_read(rcu_data.dynticks_nesting) == 0; 462 } 463 464 - #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch ... */ 465 - #define DEFAULT_MAX_RCU_BLIMIT 10000 /* ... even during callback flood. */ 466 static long blimit = DEFAULT_RCU_BLIMIT; 467 - #define DEFAULT_RCU_QHIMARK 10000 /* If this many pending, ignore blimit. */ 468 static long qhimark = DEFAULT_RCU_QHIMARK; 469 - #define DEFAULT_RCU_QLOMARK 100 /* Once only this many pending, use blimit. */ 470 static long qlowmark = DEFAULT_RCU_QLOMARK; 471 #define DEFAULT_RCU_QOVLD_MULT 2 472 #define DEFAULT_RCU_QOVLD (DEFAULT_RCU_QOVLD_MULT * DEFAULT_RCU_QHIMARK) 473 - static long qovld = DEFAULT_RCU_QOVLD; /* If this many pending, hammer QS. */ 474 - static long qovld_calc = -1; /* No pre-initialization lock acquisitions! */ 475 476 module_param(blimit, long, 0444); 477 module_param(qhimark, long, 0444); 478 module_param(qlowmark, long, 0444); 479 module_param(qovld, long, 0444); 480 481 - static ulong jiffies_till_first_fqs = ULONG_MAX; 482 static ulong jiffies_till_next_fqs = ULONG_MAX; 483 static bool rcu_kick_kthreads; 484 static int rcu_divisor = 7; ··· 1086 } 1087 } 1088 1089 - noinstr bool __rcu_is_watching(void) 1090 - { 1091 - return !rcu_dynticks_curr_cpu_in_eqs(); 1092 - } 1093 - 1094 /** 1095 * rcu_is_watching - see if RCU thinks that the current CPU is not idle 1096 * ··· 1218 return 1; 1219 } 1220 1221 - /* If waiting too long on an offline CPU, complain. */ 1222 - if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp)) && 1223 - time_after(jiffies, rcu_state.gp_start + HZ)) { 1224 bool onl; 1225 struct rcu_node *rnp1; 1226 1227 - WARN_ON(1); /* Offline CPUs are supposed to report QS! */ 1228 pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n", 1229 __func__, rnp->grplo, rnp->grphi, rnp->level, 1230 (long)rnp->gp_seq, (long)rnp->completedqs); ··· 1502 1503 /* Trace depending on how much we were able to accelerate. */ 1504 if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL)) 1505 - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB")); 1506 else 1507 - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB")); 1508 return ret; 1509 } 1510 ··· 1581 } 1582 1583 /* 1584 * Update CPU-local rcu_data state to record the beginnings and ends of 1585 * grace periods. The caller must hold the ->lock of the leaf rcu_node 1586 * structure corresponding to the current CPU, and must have irqs disabled. ··· 1663 } 1664 needwake = __note_gp_changes(rnp, rdp); 1665 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 1666 if (needwake) 1667 rcu_gp_kthread_wake(); 1668 } ··· 1699 schedule_timeout_idle(duration); 1700 pr_alert("%s: Wait complete\n", __func__); 1701 } 1702 } 1703 1704 /* ··· 1748 raw_spin_unlock_irq_rcu_node(rnp); 1749 1750 /* 1751 - * Apply per-leaf buffered online and offline operations to the 1752 - * rcu_node tree. 
Note that this new grace period need not wait 1753 - * for subsequent online CPUs, and that quiescent-state forcing 1754 - * will handle subsequent offline CPUs. 1755 */ 1756 rcu_state.gp_state = RCU_GP_ONOFF; 1757 rcu_for_each_leaf_node(rnp) { ··· 1840 cond_resched_tasks_rcu_qs(); 1841 WRITE_ONCE(rcu_state.gp_activity, jiffies); 1842 } 1843 1844 return true; 1845 } ··· 1933 break; 1934 /* If time for quiescent-state forcing, do it. */ 1935 if (!time_after(rcu_state.jiffies_force_qs, jiffies) || 1936 - (gf & RCU_GP_FLAG_FQS)) { 1937 trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, 1938 TPS("fqsstart")); 1939 rcu_gp_fqs(first_gp_fqs); ··· 2061 rcu_state.gp_flags & RCU_GP_FLAG_INIT); 2062 } 2063 raw_spin_unlock_irq_rcu_node(rnp); 2064 } 2065 2066 /* ··· 2243 * structure. This must be called from the specified CPU. 2244 */ 2245 static void 2246 - rcu_report_qs_rdp(int cpu, struct rcu_data *rdp) 2247 { 2248 unsigned long flags; 2249 unsigned long mask; ··· 2252 rcu_segcblist_is_offloaded(&rdp->cblist); 2253 struct rcu_node *rnp; 2254 2255 rnp = rdp->mynode; 2256 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2257 if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq || ··· 2269 return; 2270 } 2271 mask = rdp->grpmask; 2272 - if (rdp->cpu == smp_processor_id()) 2273 - rdp->core_needs_qs = false; 2274 if ((rnp->qsmask & mask) == 0) { 2275 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 2276 } else { ··· 2318 * Tell RCU we are done (but rcu_report_qs_rdp() will be the 2319 * judge of that). 2320 */ 2321 - rcu_report_qs_rdp(rdp->cpu, rdp); 2322 } 2323 2324 /* ··· 2415 */ 2416 static void rcu_do_batch(struct rcu_data *rdp) 2417 { 2418 unsigned long flags; 2419 const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) && 2420 rcu_segcblist_is_offloaded(&rdp->cblist); ··· 2444 rcu_nocb_lock(rdp); 2445 WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); 2446 pending = rcu_segcblist_n_cbs(&rdp->cblist); 2447 - bl = max(rdp->blimit, pending >> rcu_divisor); 2448 - if (unlikely(bl > 100)) 2449 - tlimit = local_clock() + rcu_resched_ns; 2450 trace_rcu_batch_start(rcu_state.name, 2451 rcu_segcblist_n_cbs(&rdp->cblist), bl); 2452 rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl); ··· 2593 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2594 rcu_state.cbovldnext |= !!rnp->cbovldmask; 2595 if (rnp->qsmask == 0) { 2596 - if (!IS_ENABLED(CONFIG_PREEMPT_RCU) || 2597 - rcu_preempt_blocked_readers_cgp(rnp)) { 2598 /* 2599 * No point in scanning bits because they 2600 * are all zero. But we might need to ··· 2661 } 2662 EXPORT_SYMBOL_GPL(rcu_force_quiescent_state); 2663 2664 /* Perform RCU core processing work for the current CPU. */ 2665 static __latent_entropy void rcu_core(void) 2666 { ··· 2713 /* Do any needed deferred wakeups of rcuo kthreads. 
*/ 2714 do_nocb_deferred_wakeup(rdp); 2715 trace_rcu_utilization(TPS("End RCU core")); 2716 } 2717 2718 static void rcu_core_si(struct softirq_action *h) ··· 3502 unsigned long count = 0; 3503 3504 /* Snapshot count of all CPUs */ 3505 - for_each_online_cpu(cpu) { 3506 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3507 3508 count += READ_ONCE(krcp->count); ··· 3517 int cpu, freed = 0; 3518 unsigned long flags; 3519 3520 - for_each_online_cpu(cpu) { 3521 int count; 3522 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3523 ··· 3550 int cpu; 3551 unsigned long flags; 3552 3553 - for_each_online_cpu(cpu) { 3554 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3555 3556 raw_spin_lock_irqsave(&krcp->lock, flags); ··· 3914 3915 /* Set up local state, ensuring consistent view of global state. */ 3916 rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu); 3917 WARN_ON_ONCE(rdp->dynticks_nesting != 1); 3918 WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp))); 3919 rdp->rcu_ofl_gp_seq = rcu_state.gp_seq; ··· 4033 return 0; 4034 } 4035 4036 - static DEFINE_PER_CPU(int, rcu_cpu_started); 4037 - 4038 /* 4039 * Mark the specified CPU as being online so that subsequent grace periods 4040 * (both expedited and normal) will wait on it. Note that this means that ··· 4052 struct rcu_node *rnp; 4053 bool newcpu; 4054 4055 - if (per_cpu(rcu_cpu_started, cpu)) 4056 - return; 4057 - 4058 - per_cpu(rcu_cpu_started, cpu) = 1; 4059 - 4060 rdp = per_cpu_ptr(&rcu_data, cpu); 4061 rnp = rdp->mynode; 4062 mask = rdp->grpmask; 4063 raw_spin_lock_irqsave_rcu_node(rnp, flags); ··· 4116 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4117 raw_spin_unlock(&rcu_state.ofl_lock); 4118 4119 - per_cpu(rcu_cpu_started, cpu) = 0; 4120 } 4121 4122 /*
··· 70 #endif 71 #define MODULE_PARAM_PREFIX "rcutree." 72 73 /* Data structures. */ 74 75 /* ··· 177 module_param(gp_init_delay, int, 0444); 178 static int gp_cleanup_delay; 179 module_param(gp_cleanup_delay, int, 0444); 180 + 181 + // Add delay to rcu_read_unlock() for strict grace periods. 182 + static int rcu_unlock_delay; 183 + #ifdef CONFIG_RCU_STRICT_GRACE_PERIOD 184 + module_param(rcu_unlock_delay, int, 0444); 185 + #endif 186 187 /* 188 * This rcu parameter is runtime-read-only. It reflects ··· 468 return __this_cpu_read(rcu_data.dynticks_nesting) == 0; 469 } 470 471 + #define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10) 472 + // Maximum callbacks per rcu_do_batch ... 473 + #define DEFAULT_MAX_RCU_BLIMIT 10000 // ... even during callback flood. 474 static long blimit = DEFAULT_RCU_BLIMIT; 475 + #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit. 476 static long qhimark = DEFAULT_RCU_QHIMARK; 477 + #define DEFAULT_RCU_QLOMARK 100 // Once only this many pending, use blimit. 478 static long qlowmark = DEFAULT_RCU_QLOMARK; 479 #define DEFAULT_RCU_QOVLD_MULT 2 480 #define DEFAULT_RCU_QOVLD (DEFAULT_RCU_QOVLD_MULT * DEFAULT_RCU_QHIMARK) 481 + static long qovld = DEFAULT_RCU_QOVLD; // If this many pending, hammer QS. 482 + static long qovld_calc = -1; // No pre-initialization lock acquisitions! 483 484 module_param(blimit, long, 0444); 485 module_param(qhimark, long, 0444); 486 module_param(qlowmark, long, 0444); 487 module_param(qovld, long, 0444); 488 489 + static ulong jiffies_till_first_fqs = IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 0 : ULONG_MAX; 490 static ulong jiffies_till_next_fqs = ULONG_MAX; 491 static bool rcu_kick_kthreads; 492 static int rcu_divisor = 7; ··· 1092 } 1093 } 1094 1095 /** 1096 * rcu_is_watching - see if RCU thinks that the current CPU is not idle 1097 * ··· 1229 return 1; 1230 } 1231 1232 + /* 1233 + * Complain if a CPU that is considered to be offline from RCU's 1234 + * perspective has not yet reported a quiescent state. After all, 1235 + * the offline CPU should have reported a quiescent state during 1236 + * the CPU-offline process, or, failing that, by rcu_gp_init() 1237 + * if it ran concurrently with either the CPU going offline or the 1238 + * last task on a leaf rcu_node structure exiting its RCU read-side 1239 + * critical section while all CPUs corresponding to that structure 1240 + * are offline. This added warning detects bugs in any of these 1241 + * code paths. 1242 + * 1243 + * The rcu_node structure's ->lock is held here, which excludes 1244 + * the relevant portions the CPU-hotplug code, the grace-period 1245 + * initialization code, and the rcu_read_unlock() code paths. 1246 + * 1247 + * For more detail, please refer to the "Hotplug CPU" section 1248 + * of RCU's Requirements documentation. 1249 + */ 1250 + if (WARN_ON_ONCE(!(rdp->grpmask & rcu_rnp_online_cpus(rnp)))) { 1251 bool onl; 1252 struct rcu_node *rnp1; 1253 1254 pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n", 1255 __func__, rnp->grplo, rnp->grphi, rnp->level, 1256 (long)rnp->gp_seq, (long)rnp->completedqs); ··· 1498 1499 /* Trace depending on how much we were able to accelerate. 
*/ 1500 if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL)) 1501 + trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccWaitCB")); 1502 else 1503 + trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB")); 1504 + 1505 return ret; 1506 } 1507 ··· 1576 } 1577 1578 /* 1579 + * In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, attempt to generate a 1580 + * quiescent state. This is intended to be invoked when the CPU notices 1581 + * a new grace period. 1582 + */ 1583 + static void rcu_strict_gp_check_qs(void) 1584 + { 1585 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) { 1586 + rcu_read_lock(); 1587 + rcu_read_unlock(); 1588 + } 1589 + } 1590 + 1591 + /* 1592 * Update CPU-local rcu_data state to record the beginnings and ends of 1593 * grace periods. The caller must hold the ->lock of the leaf rcu_node 1594 * structure corresponding to the current CPU, and must have irqs disabled. ··· 1645 } 1646 needwake = __note_gp_changes(rnp, rdp); 1647 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 1648 + rcu_strict_gp_check_qs(); 1649 if (needwake) 1650 rcu_gp_kthread_wake(); 1651 } ··· 1680 schedule_timeout_idle(duration); 1681 pr_alert("%s: Wait complete\n", __func__); 1682 } 1683 + } 1684 + 1685 + /* 1686 + * Handler for on_each_cpu() to invoke the target CPU's RCU core 1687 + * processing. 1688 + */ 1689 + static void rcu_strict_gp_boundary(void *unused) 1690 + { 1691 + invoke_rcu_core(); 1692 } 1693 1694 /* ··· 1720 raw_spin_unlock_irq_rcu_node(rnp); 1721 1722 /* 1723 + * Apply per-leaf buffered online and offline operations to 1724 + * the rcu_node tree. Note that this new grace period need not 1725 + * wait for subsequent online CPUs, and that RCU hooks in the CPU 1726 + * offlining path, when combined with checks in this function, 1727 + * will handle CPUs that are currently going offline or that will 1728 + * go offline later. Please also refer to "Hotplug CPU" section 1729 + * of RCU's Requirements documentation. 1730 */ 1731 rcu_state.gp_state = RCU_GP_ONOFF; 1732 rcu_for_each_leaf_node(rnp) { ··· 1809 cond_resched_tasks_rcu_qs(); 1810 WRITE_ONCE(rcu_state.gp_activity, jiffies); 1811 } 1812 + 1813 + // If strict, make all CPUs aware of new grace period. 1814 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 1815 + on_each_cpu(rcu_strict_gp_boundary, NULL, 0); 1816 1817 return true; 1818 } ··· 1898 break; 1899 /* If time for quiescent-state forcing, do it. */ 1900 if (!time_after(rcu_state.jiffies_force_qs, jiffies) || 1901 + (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) { 1902 trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, 1903 TPS("fqsstart")); 1904 rcu_gp_fqs(first_gp_fqs); ··· 2026 rcu_state.gp_flags & RCU_GP_FLAG_INIT); 2027 } 2028 raw_spin_unlock_irq_rcu_node(rnp); 2029 + 2030 + // If strict, make all CPUs aware of the end of the old grace period. 2031 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 2032 + on_each_cpu(rcu_strict_gp_boundary, NULL, 0); 2033 } 2034 2035 /* ··· 2204 * structure. This must be called from the specified CPU. 
2205 */ 2206 static void 2207 + rcu_report_qs_rdp(struct rcu_data *rdp) 2208 { 2209 unsigned long flags; 2210 unsigned long mask; ··· 2213 rcu_segcblist_is_offloaded(&rdp->cblist); 2214 struct rcu_node *rnp; 2215 2216 + WARN_ON_ONCE(rdp->cpu != smp_processor_id()); 2217 rnp = rdp->mynode; 2218 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2219 if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq || ··· 2229 return; 2230 } 2231 mask = rdp->grpmask; 2232 + rdp->core_needs_qs = false; 2233 if ((rnp->qsmask & mask) == 0) { 2234 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 2235 } else { ··· 2279 * Tell RCU we are done (but rcu_report_qs_rdp() will be the 2280 * judge of that). 2281 */ 2282 + rcu_report_qs_rdp(rdp); 2283 } 2284 2285 /* ··· 2376 */ 2377 static void rcu_do_batch(struct rcu_data *rdp) 2378 { 2379 + int div; 2380 unsigned long flags; 2381 const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) && 2382 rcu_segcblist_is_offloaded(&rdp->cblist); ··· 2404 rcu_nocb_lock(rdp); 2405 WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); 2406 pending = rcu_segcblist_n_cbs(&rdp->cblist); 2407 + div = READ_ONCE(rcu_divisor); 2408 + div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div; 2409 + bl = max(rdp->blimit, pending >> div); 2410 + if (unlikely(bl > 100)) { 2411 + long rrn = READ_ONCE(rcu_resched_ns); 2412 + 2413 + rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn; 2414 + tlimit = local_clock() + rrn; 2415 + } 2416 trace_rcu_batch_start(rcu_state.name, 2417 rcu_segcblist_n_cbs(&rdp->cblist), bl); 2418 rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl); ··· 2547 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2548 rcu_state.cbovldnext |= !!rnp->cbovldmask; 2549 if (rnp->qsmask == 0) { 2550 + if (rcu_preempt_blocked_readers_cgp(rnp)) { 2551 /* 2552 * No point in scanning bits because they 2553 * are all zero. But we might need to ··· 2616 } 2617 EXPORT_SYMBOL_GPL(rcu_force_quiescent_state); 2618 2619 + // Workqueue handler for an RCU reader for kernels enforcing struct RCU 2620 + // grace periods. 2621 + static void strict_work_handler(struct work_struct *work) 2622 + { 2623 + rcu_read_lock(); 2624 + rcu_read_unlock(); 2625 + } 2626 + 2627 /* Perform RCU core processing work for the current CPU. */ 2628 static __latent_entropy void rcu_core(void) 2629 { ··· 2660 /* Do any needed deferred wakeups of rcuo kthreads. */ 2661 do_nocb_deferred_wakeup(rdp); 2662 trace_rcu_utilization(TPS("End RCU core")); 2663 + 2664 + // If strict GPs, schedule an RCU reader in a clean environment. 2665 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 2666 + queue_work_on(rdp->cpu, rcu_gp_wq, &rdp->strict_work); 2667 } 2668 2669 static void rcu_core_si(struct softirq_action *h) ··· 3445 unsigned long count = 0; 3446 3447 /* Snapshot count of all CPUs */ 3448 + for_each_possible_cpu(cpu) { 3449 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3450 3451 count += READ_ONCE(krcp->count); ··· 3460 int cpu, freed = 0; 3461 unsigned long flags; 3462 3463 + for_each_possible_cpu(cpu) { 3464 int count; 3465 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3466 ··· 3493 int cpu; 3494 unsigned long flags; 3495 3496 + for_each_possible_cpu(cpu) { 3497 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3498 3499 raw_spin_lock_irqsave(&krcp->lock, flags); ··· 3857 3858 /* Set up local state, ensuring consistent view of global state. 
*/ 3859 rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu); 3860 + INIT_WORK(&rdp->strict_work, strict_work_handler); 3861 WARN_ON_ONCE(rdp->dynticks_nesting != 1); 3862 WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp))); 3863 rdp->rcu_ofl_gp_seq = rcu_state.gp_seq; ··· 3975 return 0; 3976 } 3977 3978 /* 3979 * Mark the specified CPU as being online so that subsequent grace periods 3980 * (both expedited and normal) will wait on it. Note that this means that ··· 3996 struct rcu_node *rnp; 3997 bool newcpu; 3998 3999 rdp = per_cpu_ptr(&rcu_data, cpu); 4000 + if (rdp->cpu_started) 4001 + return; 4002 + rdp->cpu_started = true; 4003 + 4004 rnp = rdp->mynode; 4005 mask = rdp->grpmask; 4006 raw_spin_lock_irqsave_rcu_node(rnp, flags); ··· 4061 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4062 raw_spin_unlock(&rcu_state.ofl_lock); 4063 4064 + rdp->cpu_started = false; 4065 } 4066 4067 /*
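Among the tree.c changes above, rcu_do_batch() now snapshots the rcu_divisor and rcu_resched_ns module parameters with READ_ONCE() and clamps them to sane ranges before use, so a concurrent runtime change cannot yield an absurd shift count or time budget. A user-space sketch of that clamp-then-use computation follows; the constants mirror the ones in the hunk, while the helper names are illustrative only.

#include <stdio.h>

#define NSEC_PER_MSEC 1000000L
#define NSEC_PER_SEC  1000000000L

/* Clamp a runtime-tunable divisor and compute the callback batch limit. */
static long batch_limit(long blimit, long pending, int divisor)
{
	int div = divisor;
	long bl;

	div = div < 0 ? 7 : div > (int)(sizeof(long) * 8 - 2) ?
		(int)(sizeof(long) * 8 - 2) : div;
	bl = pending >> div;
	return bl > blimit ? bl : blimit;	/* max(blimit, pending >> div) */
}

/* Clamp a runtime-tunable time budget to [1 ms, 1 s]. */
static long resched_budget_ns(long rrn)
{
	return rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC :
	       rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
}

int main(void)
{
	printf("bl=%ld\n", batch_limit(10, 1L << 20, 7));	/* 8192 */
	printf("bl=%ld (bogus divisor falls back to 7)\n",
	       batch_limit(10, 1L << 20, -5));
	printf("budget=%ld ns\n", resched_budget_ns(0));	/* clamped to 1 ms */
	return 0;
}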
+2
kernel/rcu/tree.h
··· 156 bool beenonline; /* CPU online at least once. */ 157 bool gpwrap; /* Possible ->gp_seq wrap. */ 158 bool exp_deferred_qs; /* This CPU awaiting a deferred QS? */ 159 struct rcu_node *mynode; /* This CPU's leaf of hierarchy */ 160 unsigned long grpmask; /* Mask to apply to leaf qsmask. */ 161 unsigned long ticks_this_gp; /* The number of scheduling-clock */ ··· 165 /* period it is aware of. */ 166 struct irq_work defer_qs_iw; /* Obtain later scheduler attention. */ 167 bool defer_qs_iw_pending; /* Scheduler attention pending? */ 168 169 /* 2) batch handling */ 170 struct rcu_segcblist cblist; /* Segmented callback list, with */
··· 156 bool beenonline; /* CPU online at least once. */ 157 bool gpwrap; /* Possible ->gp_seq wrap. */ 158 bool exp_deferred_qs; /* This CPU awaiting a deferred QS? */ 159 + bool cpu_started; /* RCU watching this onlining CPU. */ 160 struct rcu_node *mynode; /* This CPU's leaf of hierarchy */ 161 unsigned long grpmask; /* Mask to apply to leaf qsmask. */ 162 unsigned long ticks_this_gp; /* The number of scheduling-clock */ ··· 164 /* period it is aware of. */ 165 struct irq_work defer_qs_iw; /* Obtain later scheduler attention. */ 166 bool defer_qs_iw_pending; /* Scheduler attention pending? */ 167 + struct work_struct strict_work; /* Schedule readers for strict GPs. */ 168 169 /* 2) batch handling */ 170 struct rcu_segcblist cblist; /* Segmented callback list, with */
+2 -4
kernel/rcu/tree_exp.h
··· 732 /* Invoked on each online non-idle CPU for expedited quiescent state. */ 733 static void rcu_exp_handler(void *unused) 734 { 735 - struct rcu_data *rdp; 736 - struct rcu_node *rnp; 737 738 - rdp = this_cpu_ptr(&rcu_data); 739 - rnp = rdp->mynode; 740 if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || 741 __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) 742 return;
··· 732 /* Invoked on each online non-idle CPU for expedited quiescent state. */ 733 static void rcu_exp_handler(void *unused) 734 { 735 + struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 736 + struct rcu_node *rnp = rdp->mynode; 737 738 if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || 739 __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) 740 return;
+34 -6
kernel/rcu/tree_plugin.h
··· 36 pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n"); 37 if (IS_ENABLED(CONFIG_PROVE_RCU)) 38 pr_info("\tRCU lockdep checking is enabled.\n"); 39 if (RCU_NUM_LVLS >= 4) 40 pr_info("\tFour(or more)-level hierarchy is enabled.\n"); 41 if (RCU_FANOUT_LEAF != 16) ··· 376 rcu_preempt_read_enter(); 377 if (IS_ENABLED(CONFIG_PROVE_LOCKING)) 378 WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX); 379 barrier(); /* critical section after entry code. */ 380 } 381 EXPORT_SYMBOL_GPL(__rcu_read_lock); ··· 459 return; 460 } 461 t->rcu_read_unlock_special.s = 0; 462 - if (special.b.need_qs) 463 - rcu_qs(); 464 465 /* 466 * Respond to a request by an expedited grace period for a ··· 777 } 778 779 #else /* #ifdef CONFIG_PREEMPT_RCU */ 780 781 /* 782 * Tell them what RCU they are running. ··· 1954 * nearest grace period (if any) to wait for next. The CB kthreads 1955 * and the global grace-period kthread are awakened if needed. 1956 */ 1957 for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) { 1958 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check")); 1959 rcu_nocb_lock_irqsave(rdp, flags); ··· 2440 return; 2441 2442 waslocked = raw_spin_is_locked(&rdp->nocb_gp_lock); 2443 - wastimer = timer_pending(&rdp->nocb_timer); 2444 wassleep = swait_active(&rdp->nocb_gp_wq); 2445 - if (!rdp->nocb_defer_wakeup && !rdp->nocb_gp_sleep && 2446 - !waslocked && !wastimer && !wassleep) 2447 return; /* Nothing untowards. */ 2448 2449 - pr_info(" !!! %c%c%c%c %c\n", 2450 "lL"[waslocked], 2451 "dD"[!!rdp->nocb_defer_wakeup], 2452 "tT"[wastimer],
··· 36 pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n"); 37 if (IS_ENABLED(CONFIG_PROVE_RCU)) 38 pr_info("\tRCU lockdep checking is enabled.\n"); 39 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 40 + pr_info("\tRCU strict (and thus non-scalable) grace periods enabled.\n"); 41 if (RCU_NUM_LVLS >= 4) 42 pr_info("\tFour(or more)-level hierarchy is enabled.\n"); 43 if (RCU_FANOUT_LEAF != 16) ··· 374 rcu_preempt_read_enter(); 375 if (IS_ENABLED(CONFIG_PROVE_LOCKING)) 376 WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX); 377 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && rcu_state.gp_kthread) 378 + WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true); 379 barrier(); /* critical section after entry code. */ 380 } 381 EXPORT_SYMBOL_GPL(__rcu_read_lock); ··· 455 return; 456 } 457 t->rcu_read_unlock_special.s = 0; 458 + if (special.b.need_qs) { 459 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) { 460 + rcu_report_qs_rdp(rdp); 461 + udelay(rcu_unlock_delay); 462 + } else { 463 + rcu_qs(); 464 + } 465 + } 466 467 /* 468 * Respond to a request by an expedited grace period for a ··· 767 } 768 769 #else /* #ifdef CONFIG_PREEMPT_RCU */ 770 + 771 + /* 772 + * If strict grace periods are enabled, and if the calling 773 + * __rcu_read_unlock() marks the beginning of a quiescent state, immediately 774 + * report that quiescent state and, if requested, spin for a bit. 775 + */ 776 + void rcu_read_unlock_strict(void) 777 + { 778 + struct rcu_data *rdp; 779 + 780 + if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) || 781 + irqs_disabled() || preempt_count() || !rcu_state.gp_kthread) 782 + return; 783 + rdp = this_cpu_ptr(&rcu_data); 784 + rcu_report_qs_rdp(rdp); 785 + udelay(rcu_unlock_delay); 786 + } 787 + EXPORT_SYMBOL_GPL(rcu_read_unlock_strict); 788 789 /* 790 * Tell them what RCU they are running. ··· 1926 * nearest grace period (if any) to wait for next. The CB kthreads 1927 * and the global grace-period kthread are awakened if needed. 1928 */ 1929 + WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp); 1930 for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) { 1931 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check")); 1932 rcu_nocb_lock_irqsave(rdp, flags); ··· 2411 return; 2412 2413 waslocked = raw_spin_is_locked(&rdp->nocb_gp_lock); 2414 + wastimer = timer_pending(&rdp->nocb_bypass_timer); 2415 wassleep = swait_active(&rdp->nocb_gp_wq); 2416 + if (!rdp->nocb_gp_sleep && !waslocked && !wastimer && !wassleep) 2417 return; /* Nothing untowards. */ 2418 2419 + pr_info(" nocb GP activity on CB-only CPU!!! %c%c%c%c %c\n", 2420 "lL"[waslocked], 2421 "dD"[!!rdp->nocb_defer_wakeup], 2422 "tT"[wastimer],
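Most of the strict-grace-period hooks in the tree_plugin.h hunks above are wrapped in IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD), so the extra reporting work compiles away when the Kconfig option is off while still being parsed and type-checked. Below is a minimal stand-alone illustration of that compile-time-constant gating; the CONFIG macro and IS_ENABLED_SKETCH() helper are made-up stand-ins, not the kernel's definitions.

#include <stdio.h>

/* Stand-in for a Kconfig option: define to enable, leave undefined to disable. */
/* #define CONFIG_STRICT_SKETCH 1 */

#ifdef CONFIG_STRICT_SKETCH
#define IS_ENABLED_SKETCH() 1
#else
#define IS_ENABLED_SKETCH() 0
#endif

static void strict_mode_hook(void)
{
	puts("strict-mode hook ran");
}

int main(void)
{
	/*
	 * The condition is a compile-time constant, so the compiler can drop
	 * the call entirely when the option is off, yet the gated code is
	 * still compiled in both configurations and so cannot bit-rot.
	 */
	if (IS_ENABLED_SKETCH())
		strict_mode_hook();
	else
		puts("strict-mode hook compiled out");
	return 0;
}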
+4 -4
kernel/rcu/tree_stall.h
··· 158 { 159 unsigned long j; 160 161 - if (!rcu_kick_kthreads) 162 return; 163 j = READ_ONCE(rcu_state.jiffies_kick_kthreads); 164 if (time_after(jiffies, j) && rcu_state.gp_kthread && ··· 580 unsigned long js; 581 struct rcu_node *rnp; 582 583 - if ((rcu_stall_is_suppressed() && !rcu_kick_kthreads) || 584 !rcu_gp_in_progress()) 585 return; 586 rcu_stall_kick_kthreads(); ··· 623 624 /* We haven't checked in, so go dump stack. */ 625 print_cpu_stall(gps); 626 - if (rcu_cpu_stall_ftrace_dump) 627 rcu_ftrace_dump(DUMP_ALL); 628 629 } else if (rcu_gp_in_progress() && ··· 632 633 /* They had a few time units to dump stack, so complain. */ 634 print_other_cpu_stall(gs2, gps); 635 - if (rcu_cpu_stall_ftrace_dump) 636 rcu_ftrace_dump(DUMP_ALL); 637 } 638 }
··· 158 { 159 unsigned long j; 160 161 + if (!READ_ONCE(rcu_kick_kthreads)) 162 return; 163 j = READ_ONCE(rcu_state.jiffies_kick_kthreads); 164 if (time_after(jiffies, j) && rcu_state.gp_kthread && ··· 580 unsigned long js; 581 struct rcu_node *rnp; 582 583 + if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) || 584 !rcu_gp_in_progress()) 585 return; 586 rcu_stall_kick_kthreads(); ··· 623 624 /* We haven't checked in, so go dump stack. */ 625 print_cpu_stall(gps); 626 + if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) 627 rcu_ftrace_dump(DUMP_ALL); 628 629 } else if (rcu_gp_in_progress() && ··· 632 633 /* They had a few time units to dump stack, so complain. */ 634 print_other_cpu_stall(gs2, gps); 635 + if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) 636 rcu_ftrace_dump(DUMP_ALL); 637 } 638 }
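The tree_stall.h hunks above replace plain loads of rcu_kick_kthreads and rcu_cpu_stall_ftrace_dump with READ_ONCE(), marking the fact that these module parameters can change while the stall-warning code runs and forcing a single, non-refetched load. A hedged user-space sketch of why a snapshotted load matters when a flag is consulted more than once; the READ_INT_ONCE() macro below is a simplified, int-only stand-in, not the kernel's READ_ONCE().

#include <stdio.h>

/* Simplified stand-in: force a single volatile load of an int. */
#define READ_INT_ONCE(p) (*(volatile int *)(p))

static int stall_ftrace_dump;	/* May be flipped at any time by another thread. */

static void maybe_dump(void)
{
	/*
	 * Snapshot the flag once.  Without a marked access, the compiler is
	 * free to reload the variable at each use, so two uses in the same
	 * function could observe different values.
	 */
	int snap = READ_INT_ONCE(&stall_ftrace_dump);

	if (snap)
		printf("dumping (flag snapshot = %d)\n", snap);
	else
		printf("not dumping (flag snapshot = %d)\n", snap);
}

int main(void)
{
	maybe_dump();
	stall_ftrace_dump = 1;
	maybe_dump();
	return 0;
}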
-13
kernel/rcu/update.c
··· 53 #endif 54 #define MODULE_PARAM_PREFIX "rcupdate." 55 56 - #ifndef data_race 57 - #define data_race(expr) \ 58 - ({ \ 59 - expr; \ 60 - }) 61 - #endif 62 - #ifndef ASSERT_EXCLUSIVE_WRITER 63 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 64 - #endif 65 - #ifndef ASSERT_EXCLUSIVE_ACCESS 66 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 67 - #endif 68 - 69 #ifndef CONFIG_TINY_RCU 70 module_param(rcu_expedited, int, 0); 71 module_param(rcu_normal, int, 0);
··· 53 #endif 54 #define MODULE_PARAM_PREFIX "rcupdate." 55 56 #ifndef CONFIG_TINY_RCU 57 module_param(rcu_expedited, int, 0); 58 module_param(rcu_normal, int, 0);
+575
kernel/scftorture.c
···
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + // 3 + // Torture test for smp_call_function() and friends. 4 + // 5 + // Copyright (C) Facebook, 2020. 6 + // 7 + // Author: Paul E. McKenney <paulmck@kernel.org> 8 + 9 + #define pr_fmt(fmt) fmt 10 + 11 + #include <linux/atomic.h> 12 + #include <linux/bitops.h> 13 + #include <linux/completion.h> 14 + #include <linux/cpu.h> 15 + #include <linux/delay.h> 16 + #include <linux/err.h> 17 + #include <linux/init.h> 18 + #include <linux/interrupt.h> 19 + #include <linux/kthread.h> 20 + #include <linux/kernel.h> 21 + #include <linux/mm.h> 22 + #include <linux/module.h> 23 + #include <linux/moduleparam.h> 24 + #include <linux/notifier.h> 25 + #include <linux/percpu.h> 26 + #include <linux/rcupdate.h> 27 + #include <linux/rcupdate_trace.h> 28 + #include <linux/reboot.h> 29 + #include <linux/sched.h> 30 + #include <linux/spinlock.h> 31 + #include <linux/smp.h> 32 + #include <linux/stat.h> 33 + #include <linux/srcu.h> 34 + #include <linux/slab.h> 35 + #include <linux/torture.h> 36 + #include <linux/types.h> 37 + 38 + #define SCFTORT_STRING "scftorture" 39 + #define SCFTORT_FLAG SCFTORT_STRING ": " 40 + 41 + #define SCFTORTOUT(s, x...) \ 42 + pr_alert(SCFTORT_FLAG s, ## x) 43 + 44 + #define VERBOSE_SCFTORTOUT(s, x...) \ 45 + do { if (verbose) pr_alert(SCFTORT_FLAG s, ## x); } while (0) 46 + 47 + #define VERBOSE_SCFTORTOUT_ERRSTRING(s, x...) \ 48 + do { if (verbose) pr_alert(SCFTORT_FLAG "!!! " s, ## x); } while (0) 49 + 50 + MODULE_LICENSE("GPL"); 51 + MODULE_AUTHOR("Paul E. McKenney <paulmck@kernel.org>"); 52 + 53 + // Wait until there are multiple CPUs before starting test. 54 + torture_param(int, holdoff, IS_BUILTIN(CONFIG_SCF_TORTURE_TEST) ? 10 : 0, 55 + "Holdoff time before test start (s)"); 56 + torture_param(int, longwait, 0, "Include ridiculously long waits? 
(seconds)"); 57 + torture_param(int, nthreads, -1, "# threads, defaults to -1 for all CPUs."); 58 + torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); 59 + torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable"); 60 + torture_param(int, shutdown_secs, 0, "Shutdown time (ms), <= zero to disable."); 61 + torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s."); 62 + torture_param(int, stutter_cpus, 5, "Number of jiffies to change CPUs under test, 0=disable"); 63 + torture_param(bool, use_cpus_read_lock, 0, "Use cpus_read_lock() to exclude CPU hotplug."); 64 + torture_param(int, verbose, 0, "Enable verbose debugging printk()s"); 65 + torture_param(int, weight_single, -1, "Testing weight for single-CPU no-wait operations."); 66 + torture_param(int, weight_single_wait, -1, "Testing weight for single-CPU operations."); 67 + torture_param(int, weight_many, -1, "Testing weight for multi-CPU no-wait operations."); 68 + torture_param(int, weight_many_wait, -1, "Testing weight for multi-CPU operations."); 69 + torture_param(int, weight_all, -1, "Testing weight for all-CPU no-wait operations."); 70 + torture_param(int, weight_all_wait, -1, "Testing weight for all-CPU operations."); 71 + 72 + char *torture_type = ""; 73 + 74 + #ifdef MODULE 75 + # define SCFTORT_SHUTDOWN 0 76 + #else 77 + # define SCFTORT_SHUTDOWN 1 78 + #endif 79 + 80 + torture_param(bool, shutdown, SCFTORT_SHUTDOWN, "Shutdown at end of torture test."); 81 + 82 + struct scf_statistics { 83 + struct task_struct *task; 84 + int cpu; 85 + long long n_single; 86 + long long n_single_ofl; 87 + long long n_single_wait; 88 + long long n_single_wait_ofl; 89 + long long n_many; 90 + long long n_many_wait; 91 + long long n_all; 92 + long long n_all_wait; 93 + }; 94 + 95 + static struct scf_statistics *scf_stats_p; 96 + static struct task_struct *scf_torture_stats_task; 97 + static DEFINE_PER_CPU(long long, scf_invoked_count); 98 + 99 + // Data for random primitive selection 100 + #define SCF_PRIM_SINGLE 0 101 + #define SCF_PRIM_MANY 1 102 + #define SCF_PRIM_ALL 2 103 + #define SCF_NPRIMS (2 * 3) // Need wait and no-wait versions of each. 104 + 105 + static char *scf_prim_name[] = { 106 + "smp_call_function_single", 107 + "smp_call_function_many", 108 + "smp_call_function", 109 + }; 110 + 111 + struct scf_selector { 112 + unsigned long scfs_weight; 113 + int scfs_prim; 114 + bool scfs_wait; 115 + }; 116 + static struct scf_selector scf_sel_array[SCF_NPRIMS]; 117 + static int scf_sel_array_len; 118 + static unsigned long scf_sel_totweight; 119 + 120 + // Communicate between caller and handler. 121 + struct scf_check { 122 + bool scfc_in; 123 + bool scfc_out; 124 + int scfc_cpu; // -1 for not _single(). 125 + bool scfc_wait; 126 + }; 127 + 128 + // Use to wait for all threads to start. 129 + static atomic_t n_started; 130 + static atomic_t n_errs; 131 + static atomic_t n_mb_in_errs; 132 + static atomic_t n_mb_out_errs; 133 + static atomic_t n_alloc_errs; 134 + static bool scfdone; 135 + static char *bangstr = ""; 136 + 137 + static DEFINE_TORTURE_RANDOM_PERCPU(scf_torture_rand); 138 + 139 + // Print torture statistics. Caller must ensure serialization. 
140 + static void scf_torture_stats_print(void)
141 + {
142 + int cpu;
143 + int i;
144 + long long invoked_count = 0;
145 + bool isdone = READ_ONCE(scfdone);
146 + struct scf_statistics scfs = {};
147 +
148 + for_each_possible_cpu(cpu)
149 + invoked_count += data_race(per_cpu(scf_invoked_count, cpu));
150 + for (i = 0; i < nthreads; i++) {
151 + scfs.n_single += scf_stats_p[i].n_single;
152 + scfs.n_single_ofl += scf_stats_p[i].n_single_ofl;
153 + scfs.n_single_wait += scf_stats_p[i].n_single_wait;
154 + scfs.n_single_wait_ofl += scf_stats_p[i].n_single_wait_ofl;
155 + scfs.n_many += scf_stats_p[i].n_many;
156 + scfs.n_many_wait += scf_stats_p[i].n_many_wait;
157 + scfs.n_all += scf_stats_p[i].n_all;
158 + scfs.n_all_wait += scf_stats_p[i].n_all_wait;
159 + }
160 + if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) ||
161 + atomic_read(&n_mb_out_errs) || atomic_read(&n_alloc_errs))
162 + bangstr = "!!! ";
163 + pr_alert("%s %sscf_invoked_count %s: %lld single: %lld/%lld single_ofl: %lld/%lld many: %lld/%lld all: %lld/%lld ",
164 + SCFTORT_FLAG, bangstr, isdone ? "VER" : "ver", invoked_count,
165 + scfs.n_single, scfs.n_single_wait, scfs.n_single_ofl, scfs.n_single_wait_ofl,
166 + scfs.n_many, scfs.n_many_wait, scfs.n_all, scfs.n_all_wait);
167 + torture_onoff_stats();
168 + pr_cont("ste: %d stnmie: %d stnmoe: %d staf: %d\n", atomic_read(&n_errs),
169 + atomic_read(&n_mb_in_errs), atomic_read(&n_mb_out_errs),
170 + atomic_read(&n_alloc_errs));
171 + }
172 +
173 + // Periodically prints torture statistics, if periodic statistics printing
174 + // was specified via the stat_interval module parameter.
175 + static int
176 + scf_torture_stats(void *arg)
177 + {
178 + VERBOSE_TOROUT_STRING("scf_torture_stats task started");
179 + do {
180 + schedule_timeout_interruptible(stat_interval * HZ);
181 + scf_torture_stats_print();
182 + torture_shutdown_absorb("scf_torture_stats");
183 + } while (!torture_must_stop());
184 + torture_kthread_stopping("scf_torture_stats");
185 + return 0;
186 + }
187 +
188 + // Add a primitive to the scf_sel_array[].
189 + static void scf_sel_add(unsigned long weight, int prim, bool wait)
190 + {
191 + struct scf_selector *scfsp = &scf_sel_array[scf_sel_array_len];
192 +
193 + // If no weight, if array would overflow, if computing three-place
194 + // percentages would overflow, or if the scf_prim_name[] array would
195 + // overflow, don't bother. In the last three cases, complain.
196 + if (!weight ||
197 + WARN_ON_ONCE(scf_sel_array_len >= ARRAY_SIZE(scf_sel_array)) ||
198 + WARN_ON_ONCE(0 - 100000 * weight <= 100000 * scf_sel_totweight) ||
199 + WARN_ON_ONCE(prim >= ARRAY_SIZE(scf_prim_name)))
200 + return;
201 + scf_sel_totweight += weight;
202 + scfsp->scfs_weight = scf_sel_totweight;
203 + scfsp->scfs_prim = prim;
204 + scfsp->scfs_wait = wait;
205 + scf_sel_array_len++;
206 + }
207 +
208 + // Dump out weighting percentages for scf_prim_name[] array.
209 + static void scf_sel_dump(void)
210 + {
211 + int i;
212 + unsigned long oldw = 0;
213 + struct scf_selector *scfsp;
214 + unsigned long w;
215 +
216 + for (i = 0; i < scf_sel_array_len; i++) {
217 + scfsp = &scf_sel_array[i];
218 + w = (scfsp->scfs_weight - oldw) * 100000 / scf_sel_totweight;
219 + pr_info("%s: %3lu.%03lu %s(%s)\n", __func__, w / 1000, w % 1000,
220 + scf_prim_name[scfsp->scfs_prim],
221 + scfsp->scfs_wait ? "wait" : "nowait");
222 + oldw = scfsp->scfs_weight;
223 + }
224 + }
225 +
226 + // Randomly pick a primitive and wait/nowait, based on weightings.
227 + static struct scf_selector *scf_sel_rand(struct torture_random_state *trsp) 228 + { 229 + int i; 230 + unsigned long w = torture_random(trsp) % (scf_sel_totweight + 1); 231 + 232 + for (i = 0; i < scf_sel_array_len; i++) 233 + if (scf_sel_array[i].scfs_weight >= w) 234 + return &scf_sel_array[i]; 235 + WARN_ON_ONCE(1); 236 + return &scf_sel_array[0]; 237 + } 238 + 239 + // Update statistics and occasionally burn up mass quantities of CPU time, 240 + // if told to do so via scftorture.longwait. Otherwise, occasionally burn 241 + // a little bit. 242 + static void scf_handler(void *scfc_in) 243 + { 244 + int i; 245 + int j; 246 + unsigned long r = torture_random(this_cpu_ptr(&scf_torture_rand)); 247 + struct scf_check *scfcp = scfc_in; 248 + 249 + if (likely(scfcp)) { 250 + WRITE_ONCE(scfcp->scfc_out, false); // For multiple receivers. 251 + if (WARN_ON_ONCE(unlikely(!READ_ONCE(scfcp->scfc_in)))) 252 + atomic_inc(&n_mb_in_errs); 253 + } 254 + this_cpu_inc(scf_invoked_count); 255 + if (longwait <= 0) { 256 + if (!(r & 0xffc0)) 257 + udelay(r & 0x3f); 258 + goto out; 259 + } 260 + if (r & 0xfff) 261 + goto out; 262 + r = (r >> 12); 263 + if (longwait <= 0) { 264 + udelay((r & 0xff) + 1); 265 + goto out; 266 + } 267 + r = r % longwait + 1; 268 + for (i = 0; i < r; i++) { 269 + for (j = 0; j < 1000; j++) { 270 + udelay(1000); 271 + cpu_relax(); 272 + } 273 + } 274 + out: 275 + if (unlikely(!scfcp)) 276 + return; 277 + if (scfcp->scfc_wait) 278 + WRITE_ONCE(scfcp->scfc_out, true); 279 + else 280 + kfree(scfcp); 281 + } 282 + 283 + // As above, but check for correct CPU. 284 + static void scf_handler_1(void *scfc_in) 285 + { 286 + struct scf_check *scfcp = scfc_in; 287 + 288 + if (likely(scfcp) && WARN_ONCE(smp_processor_id() != scfcp->scfc_cpu, "%s: Wanted CPU %d got CPU %d\n", __func__, scfcp->scfc_cpu, smp_processor_id())) { 289 + atomic_inc(&n_errs); 290 + } 291 + scf_handler(scfcp); 292 + } 293 + 294 + // Randomly do an smp_call_function*() invocation. 295 + static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_random_state *trsp) 296 + { 297 + uintptr_t cpu; 298 + int ret = 0; 299 + struct scf_check *scfcp = NULL; 300 + struct scf_selector *scfsp = scf_sel_rand(trsp); 301 + 302 + if (use_cpus_read_lock) 303 + cpus_read_lock(); 304 + else 305 + preempt_disable(); 306 + if (scfsp->scfs_prim == SCF_PRIM_SINGLE || scfsp->scfs_wait) { 307 + scfcp = kmalloc(sizeof(*scfcp), GFP_ATOMIC); 308 + if (WARN_ON_ONCE(!scfcp)) { 309 + atomic_inc(&n_alloc_errs); 310 + } else { 311 + scfcp->scfc_cpu = -1; 312 + scfcp->scfc_wait = scfsp->scfs_wait; 313 + scfcp->scfc_out = false; 314 + } 315 + } 316 + switch (scfsp->scfs_prim) { 317 + case SCF_PRIM_SINGLE: 318 + cpu = torture_random(trsp) % nr_cpu_ids; 319 + if (scfsp->scfs_wait) 320 + scfp->n_single_wait++; 321 + else 322 + scfp->n_single++; 323 + if (scfcp) { 324 + scfcp->scfc_cpu = cpu; 325 + barrier(); // Prevent race-reduction compiler optimizations. 326 + scfcp->scfc_in = true; 327 + } 328 + ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, scfsp->scfs_wait); 329 + if (ret) { 330 + if (scfsp->scfs_wait) 331 + scfp->n_single_wait_ofl++; 332 + else 333 + scfp->n_single_ofl++; 334 + kfree(scfcp); 335 + scfcp = NULL; 336 + } 337 + break; 338 + case SCF_PRIM_MANY: 339 + if (scfsp->scfs_wait) 340 + scfp->n_many_wait++; 341 + else 342 + scfp->n_many++; 343 + if (scfcp) { 344 + barrier(); // Prevent race-reduction compiler optimizations. 
345 + scfcp->scfc_in = true; 346 + } 347 + smp_call_function_many(cpu_online_mask, scf_handler, scfcp, scfsp->scfs_wait); 348 + break; 349 + case SCF_PRIM_ALL: 350 + if (scfsp->scfs_wait) 351 + scfp->n_all_wait++; 352 + else 353 + scfp->n_all++; 354 + if (scfcp) { 355 + barrier(); // Prevent race-reduction compiler optimizations. 356 + scfcp->scfc_in = true; 357 + } 358 + smp_call_function(scf_handler, scfcp, scfsp->scfs_wait); 359 + break; 360 + default: 361 + WARN_ON_ONCE(1); 362 + if (scfcp) 363 + scfcp->scfc_out = true; 364 + } 365 + if (scfcp && scfsp->scfs_wait) { 366 + if (WARN_ON_ONCE((num_online_cpus() > 1 || scfsp->scfs_prim == SCF_PRIM_SINGLE) && 367 + !scfcp->scfc_out)) 368 + atomic_inc(&n_mb_out_errs); // Leak rather than trash! 369 + else 370 + kfree(scfcp); 371 + barrier(); // Prevent race-reduction compiler optimizations. 372 + } 373 + if (use_cpus_read_lock) 374 + cpus_read_unlock(); 375 + else 376 + preempt_enable(); 377 + if (!(torture_random(trsp) & 0xfff)) 378 + schedule_timeout_uninterruptible(1); 379 + } 380 + 381 + // SCF test kthread. Repeatedly does calls to members of the 382 + // smp_call_function() family of functions. 383 + static int scftorture_invoker(void *arg) 384 + { 385 + int cpu; 386 + DEFINE_TORTURE_RANDOM(rand); 387 + struct scf_statistics *scfp = (struct scf_statistics *)arg; 388 + bool was_offline = false; 389 + 390 + VERBOSE_SCFTORTOUT("scftorture_invoker %d: task started", scfp->cpu); 391 + cpu = scfp->cpu % nr_cpu_ids; 392 + set_cpus_allowed_ptr(current, cpumask_of(cpu)); 393 + set_user_nice(current, MAX_NICE); 394 + if (holdoff) 395 + schedule_timeout_interruptible(holdoff * HZ); 396 + 397 + VERBOSE_SCFTORTOUT("scftorture_invoker %d: Waiting for all SCF torturers from cpu %d", scfp->cpu, smp_processor_id()); 398 + 399 + // Make sure that the CPU is affinitized appropriately during testing. 
400 + WARN_ON_ONCE(smp_processor_id() != scfp->cpu);
401 +
402 + if (!atomic_dec_return(&n_started))
403 + while (atomic_read_acquire(&n_started)) {
404 + if (torture_must_stop()) {
405 + VERBOSE_SCFTORTOUT("scftorture_invoker %d ended before starting", scfp->cpu);
406 + goto end;
407 + }
408 + schedule_timeout_uninterruptible(1);
409 + }
410 +
411 + VERBOSE_SCFTORTOUT("scftorture_invoker %d started", scfp->cpu);
412 +
413 + do {
414 + scftorture_invoke_one(scfp, &rand);
415 + while (cpu_is_offline(cpu) && !torture_must_stop()) {
416 + schedule_timeout_interruptible(HZ / 5);
417 + was_offline = true;
418 + }
419 + if (was_offline) {
420 + set_cpus_allowed_ptr(current, cpumask_of(cpu));
421 + was_offline = false;
422 + }
423 + cond_resched();
424 + } while (!torture_must_stop());
425 +
426 + VERBOSE_SCFTORTOUT("scftorture_invoker %d ended", scfp->cpu);
427 + end:
428 + torture_kthread_stopping("scftorture_invoker");
429 + return 0;
430 + }
431 +
432 + static void
433 + scftorture_print_module_parms(const char *tag)
434 + {
435 + pr_alert(SCFTORT_FLAG
436 + "--- %s: verbose=%d holdoff=%d longwait=%d nthreads=%d onoff_holdoff=%d onoff_interval=%d shutdown_secs=%d stat_interval=%d stutter_cpus=%d use_cpus_read_lock=%d, weight_single=%d, weight_single_wait=%d, weight_many=%d, weight_many_wait=%d, weight_all=%d, weight_all_wait=%d\n", tag,
437 + verbose, holdoff, longwait, nthreads, onoff_holdoff, onoff_interval, shutdown_secs, stat_interval, stutter_cpus, use_cpus_read_lock, weight_single, weight_single_wait, weight_many, weight_many_wait, weight_all, weight_all_wait);
438 + }
439 +
440 + static void scf_cleanup_handler(void *unused)
441 + {
442 + }
443 +
444 + static void scf_torture_cleanup(void)
445 + {
446 + int i;
447 +
448 + if (torture_cleanup_begin())
449 + return;
450 +
451 + WRITE_ONCE(scfdone, true);
452 + if (nthreads)
453 + for (i = 0; i < nthreads; i++)
454 + torture_stop_kthread("scftorture_invoker", scf_stats_p[i].task);
455 + else
456 + goto end;
457 + smp_call_function(scf_cleanup_handler, NULL, 0);
458 + torture_stop_kthread(scf_torture_stats, scf_torture_stats_task);
459 + scf_torture_stats_print(); // -After- the stats thread is stopped!
460 + kfree(scf_stats_p); // -After- the last stats print has completed!
461 + scf_stats_p = NULL; 462 + 463 + if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) || atomic_read(&n_mb_out_errs)) 464 + scftorture_print_module_parms("End of test: FAILURE"); 465 + else if (torture_onoff_failures()) 466 + scftorture_print_module_parms("End of test: LOCK_HOTPLUG"); 467 + else 468 + scftorture_print_module_parms("End of test: SUCCESS"); 469 + 470 + end: 471 + torture_cleanup_end(); 472 + } 473 + 474 + static int __init scf_torture_init(void) 475 + { 476 + long i; 477 + int firsterr = 0; 478 + unsigned long weight_single1 = weight_single; 479 + unsigned long weight_single_wait1 = weight_single_wait; 480 + unsigned long weight_many1 = weight_many; 481 + unsigned long weight_many_wait1 = weight_many_wait; 482 + unsigned long weight_all1 = weight_all; 483 + unsigned long weight_all_wait1 = weight_all_wait; 484 + 485 + if (!torture_init_begin(SCFTORT_STRING, verbose)) 486 + return -EBUSY; 487 + 488 + scftorture_print_module_parms("Start of test"); 489 + 490 + if (weight_single == -1 && weight_single_wait == -1 && 491 + weight_many == -1 && weight_many_wait == -1 && 492 + weight_all == -1 && weight_all_wait == -1) { 493 + weight_single1 = 2 * nr_cpu_ids; 494 + weight_single_wait1 = 2 * nr_cpu_ids; 495 + weight_many1 = 2; 496 + weight_many_wait1 = 2; 497 + weight_all1 = 1; 498 + weight_all_wait1 = 1; 499 + } else { 500 + if (weight_single == -1) 501 + weight_single1 = 0; 502 + if (weight_single_wait == -1) 503 + weight_single_wait1 = 0; 504 + if (weight_many == -1) 505 + weight_many1 = 0; 506 + if (weight_many_wait == -1) 507 + weight_many_wait1 = 0; 508 + if (weight_all == -1) 509 + weight_all1 = 0; 510 + if (weight_all_wait == -1) 511 + weight_all_wait1 = 0; 512 + } 513 + if (weight_single1 == 0 && weight_single_wait1 == 0 && 514 + weight_many1 == 0 && weight_many_wait1 == 0 && 515 + weight_all1 == 0 && weight_all_wait1 == 0) { 516 + VERBOSE_SCFTORTOUT_ERRSTRING("all zero weights makes no sense"); 517 + firsterr = -EINVAL; 518 + goto unwind; 519 + } 520 + scf_sel_add(weight_single1, SCF_PRIM_SINGLE, false); 521 + scf_sel_add(weight_single_wait1, SCF_PRIM_SINGLE, true); 522 + scf_sel_add(weight_many1, SCF_PRIM_MANY, false); 523 + scf_sel_add(weight_many_wait1, SCF_PRIM_MANY, true); 524 + scf_sel_add(weight_all1, SCF_PRIM_ALL, false); 525 + scf_sel_add(weight_all_wait1, SCF_PRIM_ALL, true); 526 + scf_sel_dump(); 527 + 528 + if (onoff_interval > 0) { 529 + firsterr = torture_onoff_init(onoff_holdoff * HZ, onoff_interval, NULL); 530 + if (firsterr) 531 + goto unwind; 532 + } 533 + if (shutdown_secs > 0) { 534 + firsterr = torture_shutdown_init(shutdown_secs, scf_torture_cleanup); 535 + if (firsterr) 536 + goto unwind; 537 + } 538 + 539 + // Worker tasks invoking smp_call_function(). 
540 + if (nthreads < 0) 541 + nthreads = num_online_cpus(); 542 + scf_stats_p = kcalloc(nthreads, sizeof(scf_stats_p[0]), GFP_KERNEL); 543 + if (!scf_stats_p) { 544 + VERBOSE_SCFTORTOUT_ERRSTRING("out of memory"); 545 + firsterr = -ENOMEM; 546 + goto unwind; 547 + } 548 + 549 + VERBOSE_SCFTORTOUT("Starting %d smp_call_function() threads\n", nthreads); 550 + 551 + atomic_set(&n_started, nthreads); 552 + for (i = 0; i < nthreads; i++) { 553 + scf_stats_p[i].cpu = i; 554 + firsterr = torture_create_kthread(scftorture_invoker, (void *)&scf_stats_p[i], 555 + scf_stats_p[i].task); 556 + if (firsterr) 557 + goto unwind; 558 + } 559 + if (stat_interval > 0) { 560 + firsterr = torture_create_kthread(scf_torture_stats, NULL, scf_torture_stats_task); 561 + if (firsterr) 562 + goto unwind; 563 + } 564 + 565 + torture_init_end(); 566 + return 0; 567 + 568 + unwind: 569 + torture_init_end(); 570 + scf_torture_cleanup(); 571 + return firsterr; 572 + } 573 + 574 + module_init(scf_torture_init); 575 + module_exit(scf_torture_cleanup);
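A note on the selection logic above: scf_sel_add() records a running total of the weights, and scf_sel_rand() picks a random value in [0, scf_sel_totweight] and returns the first entry whose cumulative weight is at or above it. A standalone userspace sketch of the same scheme follows; the sel_add()/sel_rand() names and the example weights are illustrative only, and rand() stands in for torture_random():

/*
 * Standalone sketch of the cumulative-weight selection scheme used by
 * scf_sel_add()/scf_sel_rand().  Names and weights are illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct sel { unsigned long cum_weight; const char *name; };

static struct sel table[8];
static int table_len;
static unsigned long totweight;

/* Append an entry; each slot records the running total of the weights. */
static void sel_add(unsigned long weight, const char *name)
{
	if (!weight || table_len >= 8)
		return;
	totweight += weight;
	table[table_len].cum_weight = totweight;
	table[table_len].name = name;
	table_len++;
}

/* Pick a value in [0, totweight] and return the first entry at or above it. */
static const char *sel_rand(void)
{
	unsigned long w = (unsigned long)rand() % (totweight + 1);
	int i;

	for (i = 0; i < table_len; i++)
		if (table[i].cum_weight >= w)
			return table[i].name;
	return table[0].name;	/* Unreachable for a non-empty table. */
}

int main(void)
{
	int i;

	srand((unsigned)time(NULL));
	sel_add(4, "single");	/* Roughly 4 picks in 7. */
	sel_add(2, "many");	/* Roughly 2 picks in 7. */
	sel_add(1, "all");	/* Roughly 1 pick in 7. */
	for (i = 0; i < 10; i++)
		printf("%s\n", sel_rand());
	return 0;
}

With the default weights chosen in scf_torture_init() (2 * nr_cpu_ids for the single-CPU variants, 2 for the _many() variants, and 1 for the all-CPU variants), the single-CPU primitives therefore dominate the mix.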
+134
kernel/smp.c
··· 20 #include <linux/sched.h> 21 #include <linux/sched/idle.h> 22 #include <linux/hypervisor.h> 23 24 #include "smpboot.h" 25 #include "sched/smp.h" ··· 99 smpcfd_prepare_cpu(smp_processor_id()); 100 } 101 102 /* 103 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources 104 * ··· 205 */ 206 static __always_inline void csd_lock_wait(call_single_data_t *csd) 207 { 208 smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK)); 209 } 210 211 static __always_inline void csd_lock(call_single_data_t *csd) 212 { ··· 286 * We can unlock early even for the synchronous on-stack case, 287 * since we're doing this from the same CPU.. 288 */ 289 csd_unlock(csd); 290 local_irq_save(flags); 291 func(info); 292 local_irq_restore(flags); 293 return 0; 294 } ··· 390 entry = &csd_next->llist; 391 } 392 393 func(info); 394 csd_unlock(csd); 395 } else { 396 prev = &csd->llist; 397 } ··· 420 smp_call_func_t func = csd->func; 421 void *info = csd->info; 422 423 csd_unlock(csd); 424 func(info); 425 } else if (type == CSD_TYPE_IRQ_WORK) { 426 irq_work_single(csd); 427 } ··· 501 502 csd->func = func; 503 csd->info = info; 504 505 err = generic_exec_single(cpu, csd); 506 ··· 670 csd->flags |= CSD_TYPE_SYNC; 671 csd->func = func; 672 csd->info = info; 673 if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu))) 674 __cpumask_set_cpu(cpu, cfd->cpumask_ipi); 675 }
··· 20 #include <linux/sched.h> 21 #include <linux/sched/idle.h> 22 #include <linux/hypervisor.h> 23 + #include <linux/sched/clock.h> 24 + #include <linux/nmi.h> 25 + #include <linux/sched/debug.h> 26 27 #include "smpboot.h" 28 #include "sched/smp.h" ··· 96 smpcfd_prepare_cpu(smp_processor_id()); 97 } 98 99 + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG 100 + 101 + static DEFINE_PER_CPU(call_single_data_t *, cur_csd); 102 + static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); 103 + static DEFINE_PER_CPU(void *, cur_csd_info); 104 + 105 + #define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC) 106 + static atomic_t csd_bug_count = ATOMIC_INIT(0); 107 + 108 + /* Record current CSD work for current CPU, NULL to erase. */ 109 + static void csd_lock_record(call_single_data_t *csd) 110 + { 111 + if (!csd) { 112 + smp_mb(); /* NULL cur_csd after unlock. */ 113 + __this_cpu_write(cur_csd, NULL); 114 + return; 115 + } 116 + __this_cpu_write(cur_csd_func, csd->func); 117 + __this_cpu_write(cur_csd_info, csd->info); 118 + smp_wmb(); /* func and info before csd. */ 119 + __this_cpu_write(cur_csd, csd); 120 + smp_mb(); /* Update cur_csd before function call. */ 121 + /* Or before unlock, as the case may be. */ 122 + } 123 + 124 + static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) 125 + { 126 + unsigned int csd_type; 127 + 128 + csd_type = CSD_TYPE(csd); 129 + if (csd_type == CSD_TYPE_ASYNC || csd_type == CSD_TYPE_SYNC) 130 + return csd->dst; /* Other CSD_TYPE_ values might not have ->dst. */ 131 + return -1; 132 + } 133 + 134 + /* 135 + * Complain if too much time spent waiting. Note that only 136 + * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU, 137 + * so waiting on other types gets much less information. 138 + */ 139 + static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) 140 + { 141 + int cpu = -1; 142 + int cpux; 143 + bool firsttime; 144 + u64 ts2, ts_delta; 145 + call_single_data_t *cpu_cur_csd; 146 + unsigned int flags = READ_ONCE(csd->flags); 147 + 148 + if (!(flags & CSD_FLAG_LOCK)) { 149 + if (!unlikely(*bug_id)) 150 + return true; 151 + cpu = csd_lock_wait_getcpu(csd); 152 + pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n", 153 + *bug_id, raw_smp_processor_id(), cpu); 154 + return true; 155 + } 156 + 157 + ts2 = sched_clock(); 158 + ts_delta = ts2 - *ts1; 159 + if (likely(ts_delta <= CSD_LOCK_TIMEOUT)) 160 + return false; 161 + 162 + firsttime = !*bug_id; 163 + if (firsttime) 164 + *bug_id = atomic_inc_return(&csd_bug_count); 165 + cpu = csd_lock_wait_getcpu(csd); 166 + if (WARN_ONCE(cpu < 0 || cpu >= nr_cpu_ids, "%s: cpu = %d\n", __func__, cpu)) 167 + cpux = 0; 168 + else 169 + cpux = cpu; 170 + cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */ 171 + pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n", 172 + firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0, 173 + cpu, csd->func, csd->info); 174 + if (cpu_cur_csd && csd != cpu_cur_csd) { 175 + pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n", 176 + *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)), 177 + READ_ONCE(per_cpu(cur_csd_info, cpux))); 178 + } else { 179 + pr_alert("\tcsd: CSD lock (#%d) %s.\n", 180 + *bug_id, !cpu_cur_csd ? 
"unresponsive" : "handling this request"); 181 + } 182 + if (cpu >= 0) { 183 + if (!trigger_single_cpu_backtrace(cpu)) 184 + dump_cpu_task(cpu); 185 + if (!cpu_cur_csd) { 186 + pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu); 187 + arch_send_call_function_single_ipi(cpu); 188 + } 189 + } 190 + dump_stack(); 191 + *ts1 = ts2; 192 + 193 + return false; 194 + } 195 + 196 /* 197 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources 198 * ··· 105 */ 106 static __always_inline void csd_lock_wait(call_single_data_t *csd) 107 { 108 + int bug_id = 0; 109 + u64 ts0, ts1; 110 + 111 + ts1 = ts0 = sched_clock(); 112 + for (;;) { 113 + if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id)) 114 + break; 115 + cpu_relax(); 116 + } 117 + smp_acquire__after_ctrl_dep(); 118 + } 119 + 120 + #else 121 + static void csd_lock_record(call_single_data_t *csd) 122 + { 123 + } 124 + 125 + static __always_inline void csd_lock_wait(call_single_data_t *csd) 126 + { 127 smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK)); 128 } 129 + #endif 130 131 static __always_inline void csd_lock(call_single_data_t *csd) 132 { ··· 166 * We can unlock early even for the synchronous on-stack case, 167 * since we're doing this from the same CPU.. 168 */ 169 + csd_lock_record(csd); 170 csd_unlock(csd); 171 local_irq_save(flags); 172 func(info); 173 + csd_lock_record(NULL); 174 local_irq_restore(flags); 175 return 0; 176 } ··· 268 entry = &csd_next->llist; 269 } 270 271 + csd_lock_record(csd); 272 func(info); 273 csd_unlock(csd); 274 + csd_lock_record(NULL); 275 } else { 276 prev = &csd->llist; 277 } ··· 296 smp_call_func_t func = csd->func; 297 void *info = csd->info; 298 299 + csd_lock_record(csd); 300 csd_unlock(csd); 301 func(info); 302 + csd_lock_record(NULL); 303 } else if (type == CSD_TYPE_IRQ_WORK) { 304 irq_work_single(csd); 305 } ··· 375 376 csd->func = func; 377 csd->info = info; 378 + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG 379 + csd->src = smp_processor_id(); 380 + csd->dst = cpu; 381 + #endif 382 383 err = generic_exec_single(cpu, csd); 384 ··· 540 csd->flags |= CSD_TYPE_SYNC; 541 csd->func = func; 542 csd->info = info; 543 + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG 544 + csd->src = smp_processor_id(); 545 + csd->dst = cpu; 546 + #endif 547 if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu))) 548 __cpumask_set_cpu(cpu, cfd->cpumask_ipi); 549 }
+1 -1
kernel/time/tick-sched.c
··· 927 928 if (ratelimit < 10 && 929 (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) { 930 - pr_warn("NOHZ: local_softirq_pending %02x\n", 931 (unsigned int) local_softirq_pending()); 932 ratelimit++; 933 }
··· 927 928 if (ratelimit < 10 && 929 (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) { 930 + pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n", 931 (unsigned int) local_softirq_pending()); 932 ratelimit++; 933 }
+21
lib/Kconfig.debug
··· 1367 Say M if you want these self tests to build as a module. 1368 Say N if you are unsure. 1369 1370 endmenu # lock debugging 1371 1372 config TRACE_IRQFLAGS
··· 1367 Say M if you want these self tests to build as a module. 1368 Say N if you are unsure. 1369 1370 + config SCF_TORTURE_TEST 1371 + tristate "torture tests for smp_call_function*()" 1372 + depends on DEBUG_KERNEL 1373 + select TORTURE_TEST 1374 + help 1375 + This option provides a kernel module that runs torture tests 1376 + on the smp_call_function() family of primitives. The kernel 1377 + module may be built after the fact on the running kernel to 1378 + be tested, if desired. 1379 + 1380 + config CSD_LOCK_WAIT_DEBUG 1381 + bool "Debugging for csd_lock_wait(), called from smp_call_function*()" 1382 + depends on DEBUG_KERNEL 1383 + depends on 64BIT 1384 + default n 1385 + help 1386 + This option enables debug prints when CPUs are slow to respond 1387 + to the smp_call_function*() IPI wrappers. These debug prints 1388 + include the IPI handler function currently executing (if any) 1389 + and relevant stack traces. 1390 + 1391 endmenu # lock debugging 1392 1393 config TRACE_IRQFLAGS
+5 -1
lib/nmi_backtrace.c
··· 85 put_cpu(); 86 } 87 88 bool nmi_cpu_backtrace(struct pt_regs *regs) 89 { 90 int cpu = smp_processor_id(); 91 92 if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { 93 - if (regs && cpu_in_idle(instruction_pointer(regs))) { 94 pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n", 95 cpu, (void *)instruction_pointer(regs)); 96 } else {
··· 85 put_cpu(); 86 } 87 88 + // Dump stacks even for idle CPUs. 89 + static bool backtrace_idle; 90 + module_param(backtrace_idle, bool, 0644); 91 + 92 bool nmi_cpu_backtrace(struct pt_regs *regs) 93 { 94 int cpu = smp_processor_id(); 95 96 if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { 97 + if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) { 98 pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n", 99 cpu, (void *)instruction_pointer(regs)); 100 } else {
+3 -3
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf-ftrace.sh tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale-ftrace.sh
··· 1 #!/bin/bash 2 # SPDX-License-Identifier: GPL-2.0+ 3 # 4 - # Analyze a given results directory for rcuperf performance measurements, 5 # looking for ftrace data. Exits with 0 if data was found, analyzed, and 6 - # printed. Intended to be invoked from kvm-recheck-rcuperf.sh after 7 # argument checking. 8 # 9 - # Usage: kvm-recheck-rcuperf-ftrace.sh resdir 10 # 11 # Copyright (C) IBM Corporation, 2016 12 #
··· 1 #!/bin/bash 2 # SPDX-License-Identifier: GPL-2.0+ 3 # 4 + # Analyze a given results directory for rcuscale performance measurements, 5 # looking for ftrace data. Exits with 0 if data was found, analyzed, and 6 + # printed. Intended to be invoked from kvm-recheck-rcuscale.sh after 7 # argument checking. 8 # 9 + # Usage: kvm-recheck-rcuscale-ftrace.sh resdir 10 # 11 # Copyright (C) IBM Corporation, 2016 12 #
+7 -7
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf.sh tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale.sh
··· 1 #!/bin/bash 2 # SPDX-License-Identifier: GPL-2.0+ 3 # 4 - # Analyze a given results directory for rcuperf performance measurements. 5 # 6 - # Usage: kvm-recheck-rcuperf.sh resdir 7 # 8 # Copyright (C) IBM Corporation, 2016 9 # ··· 20 PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH 21 . functions.sh 22 23 - if kvm-recheck-rcuperf-ftrace.sh $i 24 then 25 # ftrace data was successfully analyzed, call it good! 26 exit 0 ··· 30 31 sed -e 's/^\[[^]]*]//' < $i/console.log | 32 awk ' 33 - /-perf: .* gps: .* batches:/ { 34 ngps = $9; 35 nbatches = $11; 36 } 37 38 - /-perf: .*writer-duration/ { 39 gptimes[++n] = $5 / 1000.; 40 sum += $5 / 1000.; 41 } ··· 43 END { 44 newNR = asort(gptimes); 45 if (newNR <= 0) { 46 - print "No rcuperf records found???" 47 exit; 48 } 49 pct50 = int(newNR * 50 / 100); ··· 79 print "99th percentile grace-period duration: " gptimes[pct99]; 80 print "Maximum grace-period duration: " gptimes[newNR]; 81 print "Grace periods: " ngps + 0 " Batches: " nbatches + 0 " Ratio: " ngps / nbatches; 82 - print "Computed from rcuperf printk output."; 83 }'
··· 1 #!/bin/bash 2 # SPDX-License-Identifier: GPL-2.0+ 3 # 4 + # Analyze a given results directory for rcuscale scalability measurements. 5 # 6 + # Usage: kvm-recheck-rcuscale.sh resdir 7 # 8 # Copyright (C) IBM Corporation, 2016 9 # ··· 20 PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH 21 . functions.sh 22 23 + if kvm-recheck-rcuscale-ftrace.sh $i 24 then 25 # ftrace data was successfully analyzed, call it good! 26 exit 0 ··· 30 31 sed -e 's/^\[[^]]*]//' < $i/console.log | 32 awk ' 33 + /-scale: .* gps: .* batches:/ { 34 ngps = $9; 35 nbatches = $11; 36 } 37 38 + /-scale: .*writer-duration/ { 39 gptimes[++n] = $5 / 1000.; 40 sum += $5 / 1000.; 41 } ··· 43 END { 44 newNR = asort(gptimes); 45 if (newNR <= 0) { 46 + print "No rcuscale records found???" 47 exit; 48 } 49 pct50 = int(newNR * 50 / 100); ··· 79 print "99th percentile grace-period duration: " gptimes[pct99]; 80 print "Maximum grace-period duration: " gptimes[newNR]; 81 print "Grace periods: " ngps + 0 " Batches: " nbatches + 0 " Ratio: " ngps / nbatches; 82 + print "Computed from rcuscale printk output."; 83 }'
+38
tools/testing/selftests/rcutorture/bin/kvm-recheck-scf.sh
···
··· 1 + #!/bin/bash
2 + # SPDX-License-Identifier: GPL-2.0+
3 + #
4 + # Analyze a given results directory for scftorture progress.
5 + #
6 + # Usage: kvm-recheck-scf.sh resdir
7 + #
8 + # Copyright (C) Facebook, 2020
9 + #
10 + # Authors: Paul E. McKenney <paulmck@kernel.org>
11 +
12 + i="$1"
13 + if test -d "$i" -a -r "$i"
14 + then
15 + :
16 + else
17 + echo Unreadable results directory: $i
18 + exit 1
19 + fi
20 + . functions.sh
21 +
22 + configfile=`echo $i | sed -e 's/^.*\///'`
23 + nscfs="`grep 'scf_invoked_count ver:' $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* scf_invoked_count ver: //' -e 's/ .*$//' | tr -d '\015'`"
24 + if test -z "$nscfs"
25 + then
26 + echo "$configfile ------- "
27 + else
28 + dur="`sed -e 's/^.* scftorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null`"
29 + if test -z "$dur"
30 + then
31 + rate=""
32 + else
33 + nscfss=`awk -v nscfs=$nscfs -v dur=$dur '
34 + BEGIN { print nscfs / dur }' < /dev/null`
35 + rate=" ($nscfss/s)"
36 + fi
37 + echo "${configfile} ------- ${nscfs} SCF handler invocations$rate"
38 + fi
+25 -8
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
··· 66 echo > $T/KcList 67 config_override_param "$config_dir/CFcommon" KcList "`cat $config_dir/CFcommon 2> /dev/null`" 68 config_override_param "$config_template" KcList "`cat $config_template 2> /dev/null`" 69 config_override_param "--kasan options" KcList "$TORTURE_KCONFIG_KASAN_ARG" 70 config_override_param "--kcsan options" KcList "$TORTURE_KCONFIG_KCSAN_ARG" 71 config_override_param "--kconfig argument" KcList "$TORTURE_KCONFIG_ARG" ··· 153 boot_args="`configfrag_boot_params "$boot_args" "$config_template"`" 154 # Generate kernel-version-specific boot parameters 155 boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`" 156 - echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd 157 158 if test -n "$TORTURE_BUILDONLY" 159 then ··· 176 # Attempt to run qemu 177 ( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & 178 commandcompleted=0 179 - sleep 10 # Give qemu's pid a chance to reach the file 180 - if test -s "$resdir/qemu_pid" 181 then 182 - qemu_pid=`cat "$resdir/qemu_pid"` 183 - echo Monitoring qemu job at pid $qemu_pid 184 - else 185 - qemu_pid="" 186 - echo Monitoring qemu job at yet-as-unknown pid 187 fi 188 while : 189 do
··· 66 echo > $T/KcList 67 config_override_param "$config_dir/CFcommon" KcList "`cat $config_dir/CFcommon 2> /dev/null`" 68 config_override_param "$config_template" KcList "`cat $config_template 2> /dev/null`" 69 + config_override_param "--gdb options" KcList "$TORTURE_KCONFIG_GDB_ARG" 70 config_override_param "--kasan options" KcList "$TORTURE_KCONFIG_KASAN_ARG" 71 config_override_param "--kcsan options" KcList "$TORTURE_KCONFIG_KCSAN_ARG" 72 config_override_param "--kconfig argument" KcList "$TORTURE_KCONFIG_ARG" ··· 152 boot_args="`configfrag_boot_params "$boot_args" "$config_template"`" 153 # Generate kernel-version-specific boot parameters 154 boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`" 155 + if test -n "$TORTURE_BOOT_GDB_ARG" 156 + then 157 + boot_args="$boot_args $TORTURE_BOOT_GDB_ARG" 158 + fi 159 + echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" $TORTURE_QEMU_GDB_ARG > $resdir/qemu-cmd 160 161 if test -n "$TORTURE_BUILDONLY" 162 then ··· 171 # Attempt to run qemu 172 ( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & 173 commandcompleted=0 174 + if test -z "$TORTURE_KCONFIG_GDB_ARG" 175 then 176 + sleep 10 # Give qemu's pid a chance to reach the file 177 + if test -s "$resdir/qemu_pid" 178 + then 179 + qemu_pid=`cat "$resdir/qemu_pid"` 180 + echo Monitoring qemu job at pid $qemu_pid 181 + else 182 + qemu_pid="" 183 + echo Monitoring qemu job at yet-as-unknown pid 184 + fi 185 + fi 186 + if test -n "$TORTURE_KCONFIG_GDB_ARG" 187 + then 188 + echo Waiting for you to attach a debug session, for example: > /dev/tty 189 + echo " gdb $base_resdir/vmlinux" > /dev/tty 190 + echo 'After symbols load and the "(gdb)" prompt appears:' > /dev/tty 191 + echo " target remote :1234" > /dev/tty 192 + echo " continue" > /dev/tty 193 + kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` 194 fi 195 while : 196 do
+31 -5
tools/testing/selftests/rcutorture/bin/kvm.sh
··· 31 TORTURE_BOOT_IMAGE="" 32 TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD 33 TORTURE_KCONFIG_ARG="" 34 TORTURE_KCONFIG_KASAN_ARG="" 35 TORTURE_KCONFIG_KCSAN_ARG="" 36 TORTURE_KMAKE_ARG="" ··· 49 50 usage () { 51 echo "Usage: $scriptname optional arguments:" 52 echo " --bootargs kernel-boot-arguments" 53 echo " --bootimage relative-path-to-kernel-boot-image" 54 echo " --buildonly" ··· 59 echo " --defconfig string" 60 echo " --dryrun sched|script" 61 echo " --duration minutes" 62 echo " --interactive" 63 echo " --jitter N [ maxsleep (us) [ maxspin (us) ] ]" 64 echo " --kconfig Kconfig-options" 65 echo " --kmake-arg kernel-make-arguments" 66 echo " --mac nn:nn:nn:nn:nn:nn" 67 - echo " --memory megabytes | nnnG" 68 echo " --no-initrd" 69 echo " --qemu-args qemu-arguments" 70 echo " --qemu-cmd qemu-system-..." 71 echo " --results absolute-pathname" 72 - echo " --torture rcu" 73 echo " --trust-make" 74 exit 1 75 } ··· 132 dur=$(($2*60)) 133 shift 134 ;; 135 --interactive) 136 TORTURE_QEMU_INTERACTIVE=1; export TORTURE_QEMU_INTERACTIVE 137 ;; ··· 198 shift 199 ;; 200 --torture) 201 - checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\|refscale\)$' '^--' 202 TORTURE_SUITE=$2 203 shift 204 - if test "$TORTURE_SUITE" = rcuperf || test "$TORTURE_SUITE" = refscale 205 then 206 # If you really want jitter for refscale or 207 - # rcuperf, specify it after specifying the rcuperf 208 # or the refscale. (But why jitter in these cases?) 209 jitter=0 210 fi ··· 262 done 263 touch $T/cfgcpu 264 configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`" 265 for CF1 in $configs_derep 266 do 267 if test -f "$CONFIGFRAG/$CF1" ··· 346 TORTURE_DEFCONFIG="$TORTURE_DEFCONFIG"; export TORTURE_DEFCONFIG 347 TORTURE_INITRD="$TORTURE_INITRD"; export TORTURE_INITRD 348 TORTURE_KCONFIG_ARG="$TORTURE_KCONFIG_ARG"; export TORTURE_KCONFIG_ARG 349 TORTURE_KCONFIG_KASAN_ARG="$TORTURE_KCONFIG_KASAN_ARG"; export TORTURE_KCONFIG_KASAN_ARG 350 TORTURE_KCONFIG_KCSAN_ARG="$TORTURE_KCONFIG_KCSAN_ARG"; export TORTURE_KCONFIG_KCSAN_ARG 351 TORTURE_KMAKE_ARG="$TORTURE_KMAKE_ARG"; export TORTURE_KMAKE_ARG
··· 31 TORTURE_BOOT_IMAGE="" 32 TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD 33 TORTURE_KCONFIG_ARG="" 34 + TORTURE_KCONFIG_GDB_ARG="" 35 + TORTURE_BOOT_GDB_ARG="" 36 + TORTURE_QEMU_GDB_ARG="" 37 TORTURE_KCONFIG_KASAN_ARG="" 38 TORTURE_KCONFIG_KCSAN_ARG="" 39 TORTURE_KMAKE_ARG="" ··· 46 47 usage () { 48 echo "Usage: $scriptname optional arguments:" 49 + echo " --allcpus" 50 echo " --bootargs kernel-boot-arguments" 51 echo " --bootimage relative-path-to-kernel-boot-image" 52 echo " --buildonly" ··· 55 echo " --defconfig string" 56 echo " --dryrun sched|script" 57 echo " --duration minutes" 58 + echo " --gdb" 59 + echo " --help" 60 echo " --interactive" 61 echo " --jitter N [ maxsleep (us) [ maxspin (us) ] ]" 62 echo " --kconfig Kconfig-options" 63 echo " --kmake-arg kernel-make-arguments" 64 echo " --mac nn:nn:nn:nn:nn:nn" 65 + echo " --memory megabytes|nnnG" 66 echo " --no-initrd" 67 echo " --qemu-args qemu-arguments" 68 echo " --qemu-cmd qemu-system-..." 69 echo " --results absolute-pathname" 70 + echo " --torture lock|rcu|rcuscale|refscale|scf" 71 echo " --trust-make" 72 exit 1 73 } ··· 126 dur=$(($2*60)) 127 shift 128 ;; 129 + --gdb) 130 + TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO=y"; export TORTURE_KCONFIG_GDB_ARG 131 + TORTURE_BOOT_GDB_ARG="nokaslr"; export TORTURE_BOOT_GDB_ARG 132 + TORTURE_QEMU_GDB_ARG="-s -S"; export TORTURE_QEMU_GDB_ARG 133 + ;; 134 + --help|-h) 135 + usage 136 + ;; 137 --interactive) 138 TORTURE_QEMU_INTERACTIVE=1; export TORTURE_QEMU_INTERACTIVE 139 ;; ··· 184 shift 185 ;; 186 --torture) 187 + checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuscale\|refscale\|scf\)$' '^--' 188 TORTURE_SUITE=$2 189 shift 190 + if test "$TORTURE_SUITE" = rcuscale || test "$TORTURE_SUITE" = refscale 191 then 192 # If you really want jitter for refscale or 193 + # rcuscale, specify it after specifying the rcuscale 194 # or the refscale. (But why jitter in these cases?) 195 jitter=0 196 fi ··· 248 done 249 touch $T/cfgcpu 250 configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`" 251 + if test -n "$TORTURE_KCONFIG_GDB_ARG" 252 + then 253 + if test "`echo $configs_derep | wc -w`" -gt 1 254 + then 255 + echo "The --config list is: $configs_derep." 256 + echo "Only one --config permitted with --gdb, terminating." 257 + exit 1 258 + fi 259 + fi 260 for CF1 in $configs_derep 261 do 262 if test -f "$CONFIGFRAG/$CF1" ··· 323 TORTURE_DEFCONFIG="$TORTURE_DEFCONFIG"; export TORTURE_DEFCONFIG 324 TORTURE_INITRD="$TORTURE_INITRD"; export TORTURE_INITRD 325 TORTURE_KCONFIG_ARG="$TORTURE_KCONFIG_ARG"; export TORTURE_KCONFIG_ARG 326 + TORTURE_KCONFIG_GDB_ARG="$TORTURE_KCONFIG_GDB_ARG"; export TORTURE_KCONFIG_GDB_ARG 327 + TORTURE_BOOT_GDB_ARG="$TORTURE_BOOT_GDB_ARG"; export TORTURE_BOOT_GDB_ARG 328 + TORTURE_QEMU_GDB_ARG="$TORTURE_QEMU_GDB_ARG"; export TORTURE_QEMU_GDB_ARG 329 TORTURE_KCONFIG_KASAN_ARG="$TORTURE_KCONFIG_KASAN_ARG"; export TORTURE_KCONFIG_KASAN_ARG 330 TORTURE_KCONFIG_KCSAN_ARG="$TORTURE_KCONFIG_KCSAN_ARG"; export TORTURE_KCONFIG_KCSAN_ARG 331 TORTURE_KMAKE_ARG="$TORTURE_KMAKE_ARG"; export TORTURE_KMAKE_ARG
+6 -5
tools/testing/selftests/rcutorture/bin/parse-console.sh
··· 33 fi 34 cat /dev/null > $file.diags 35 36 - # Check for proper termination, except for rcuperf and refscale. 37 - if test "$TORTURE_SUITE" != rcuperf && test "$TORTURE_SUITE" != refscale 38 then 39 # check for abject failure 40 ··· 67 grep --binary-files=text 'torture:.*ver:' $file | 68 egrep --binary-files=text -v '\(null\)|rtc: 000000000* ' | 69 sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' | 70 awk ' 71 BEGIN { 72 ver = 0; ··· 75 } 76 77 { 78 - if (!badseq && ($5 + 0 != $5 || $5 <= ver)) { 79 badseqno1 = ver; 80 - badseqno2 = $5; 81 badseqnr = NR; 82 badseq = 1; 83 } 84 - ver = $5 85 } 86 87 END {
··· 33 fi 34 cat /dev/null > $file.diags 35 36 + # Check for proper termination, except for rcuscale and refscale. 37 + if test "$TORTURE_SUITE" != rcuscale && test "$TORTURE_SUITE" != refscale 38 then 39 # check for abject failure 40 ··· 67 grep --binary-files=text 'torture:.*ver:' $file | 68 egrep --binary-files=text -v '\(null\)|rtc: 000000000* ' | 69 sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' | 70 + sed -e 's/^.*ver: //' | 71 awk ' 72 BEGIN { 73 ver = 0; ··· 74 } 75 76 { 77 + if (!badseq && ($1 + 0 != $1 || $1 <= ver)) { 78 badseqno1 = ver; 79 + badseqno2 = $1; 80 badseqnr = NR; 81 badseq = 1; 82 } 83 + ver = $1 84 } 85 86 END {
+1
tools/testing/selftests/rcutorture/configs/rcu/TREE05
··· 16 CONFIG_DEBUG_LOCK_ALLOC=y 17 CONFIG_PROVE_LOCKING=y 18 #CHECK#CONFIG_PROVE_RCU=y 19 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 20 CONFIG_RCU_EXPERT=y
··· 16 CONFIG_DEBUG_LOCK_ALLOC=y 17 CONFIG_PROVE_LOCKING=y 18 #CHECK#CONFIG_PROVE_RCU=y 19 + CONFIG_PROVE_RCU_LIST=y 20 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 21 CONFIG_RCU_EXPERT=y
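The new CONFIG_PROVE_RCU_LIST=y line makes lockdep verify that RCU-protected list traversals run under either an RCU reader or the lock named in the traversal's optional condition argument. A rough sketch of the sort of code this checks (the foo structure, foo_list, and foo_lock below are invented for illustration, not part of this series):

/*
 * Sketch of an RCU-protected list traversal of the kind that
 * CONFIG_PROVE_RCU_LIST teaches lockdep to check.
 */
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/lockdep.h>

struct foo {
	int val;
	struct list_head list;
};

static LIST_HEAD(foo_list);
static DEFINE_SPINLOCK(foo_lock);	/* Guards updates to foo_list. */

static bool foo_present(int val)
{
	struct foo *p;
	bool ret = false;

	rcu_read_lock();
	/*
	 * With PROVE_RCU_LIST=y, lockdep complains if this traversal runs
	 * without rcu_read_lock() and without holding foo_lock.
	 */
	list_for_each_entry_rcu(p, &foo_list, list, lockdep_is_held(&foo_lock))
		if (p->val == val) {
			ret = true;
			break;
		}
	rcu_read_unlock();
	return ret;
}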
tools/testing/selftests/rcutorture/configs/rcuperf/CFLIST tools/testing/selftests/rcutorture/configs/rcuscale/CFLIST
-2
tools/testing/selftests/rcutorture/configs/rcuperf/CFcommon
··· 1 - CONFIG_RCU_PERF_TEST=y 2 - CONFIG_PRINTK_TIME=y
···
tools/testing/selftests/rcutorture/configs/rcuperf/TINY tools/testing/selftests/rcutorture/configs/rcuscale/TINY
tools/testing/selftests/rcutorture/configs/rcuperf/TREE tools/testing/selftests/rcutorture/configs/rcuscale/TREE
tools/testing/selftests/rcutorture/configs/rcuperf/TREE54 tools/testing/selftests/rcutorture/configs/rcuscale/TREE54
+2 -2
tools/testing/selftests/rcutorture/configs/rcuperf/ver_functions.sh tools/testing/selftests/rcutorture/configs/rcuscale/ver_functions.sh
··· 11 # 12 # Adds per-version torture-module parameters to kernels supporting them. 13 per_version_boot_params () { 14 - echo $1 rcuperf.shutdown=1 \ 15 - rcuperf.verbose=1 16 }
··· 11 # 12 # Adds per-version torture-module parameters to kernels supporting them. 13 per_version_boot_params () { 14 + echo $1 rcuscale.shutdown=1 \ 15 + rcuscale.verbose=1 16 }
+2
tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
···
··· 1 + CONFIG_RCU_SCALE_TEST=y 2 + CONFIG_PRINTK_TIME=y
+2
tools/testing/selftests/rcutorture/configs/scf/CFLIST
···
··· 1 + NOPREEMPT 2 + PREEMPT
+2
tools/testing/selftests/rcutorture/configs/scf/CFcommon
···
··· 1 + CONFIG_SCF_TORTURE_TEST=y 2 + CONFIG_PRINTK_TIME=y
+9
tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT
···
··· 1 + CONFIG_SMP=y 2 + CONFIG_PREEMPT_NONE=y 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=n 5 + CONFIG_HZ_PERIODIC=n 6 + CONFIG_NO_HZ_IDLE=n 7 + CONFIG_NO_HZ_FULL=y 8 + CONFIG_DEBUG_LOCK_ALLOC=n 9 + CONFIG_PROVE_LOCKING=n
+1
tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT.boot
···
··· 1 + nohz_full=1
+9
tools/testing/selftests/rcutorture/configs/scf/PREEMPT
···
··· 1 + CONFIG_SMP=y 2 + CONFIG_PREEMPT_NONE=n 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=y 5 + CONFIG_HZ_PERIODIC=n 6 + CONFIG_NO_HZ_IDLE=y 7 + CONFIG_NO_HZ_FULL=n 8 + CONFIG_DEBUG_LOCK_ALLOC=y 9 + CONFIG_PROVE_LOCKING=y
+30
tools/testing/selftests/rcutorture/configs/scf/ver_functions.sh
···
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Torture-suite-dependent shell functions for the rest of the scripts. 5 + # 6 + # Copyright (C) Facebook, 2020 7 + # 8 + # Authors: Paul E. McKenney <paulmck@kernel.org> 9 + 10 + # scftorture_param_onoff bootparam-string config-file 11 + # 12 + # Adds onoff scftorture module parameters to kernels having it. 13 + scftorture_param_onoff () { 14 + if ! bootparam_hotplug_cpu "$1" && configfrag_hotplug_cpu "$2" 15 + then 16 + echo CPU-hotplug kernel, adding scftorture onoff. 1>&2 17 + echo scftorture.onoff_interval=1000 scftorture.onoff_holdoff=30 18 + fi 19 + } 20 + 21 + # per_version_boot_params bootparam-string config-file seconds 22 + # 23 + # Adds per-version torture-module parameters to kernels supporting them. 24 + per_version_boot_params () { 25 + echo $1 `scftorture_param_onoff "$1" "$2"` \ 26 + scftorture.stat_interval=15 \ 27 + scftorture.shutdown_secs=$3 \ 28 + scftorture.verbose=1 \ 29 + scf 30 + }
+7 -29
tools/testing/selftests/rcutorture/doc/initrd.txt
··· 1 - The rcutorture scripting tools automatically create the needed initrd 2 - directory using dracut. Failing that, this tool will create an initrd 3 - containing a single statically linked binary named "init" that loops 4 - over a very long sleep() call. In both cases, this creation is done 5 - by tools/testing/selftests/rcutorture/bin/mkinitrd.sh. 6 7 - However, if you are attempting to run rcutorture on a system that does 8 - not have dracut installed, and if you don't like the notion of static 9 - linking, you might wish to press an existing initrd into service: 10 11 ------------------------------------------------------------------------ 12 cd tools/testing/selftests/rcutorture ··· 14 cd initrd 15 cpio -id < /tmp/initrd.img.zcat 16 # Manually verify that initrd contains needed binaries and libraries. 17 - ------------------------------------------------------------------------ 18 - 19 - Interestingly enough, if you are running rcutorture, you don't really 20 - need userspace in many cases. Running without userspace has the 21 - advantage of allowing you to test your kernel independently of the 22 - distro in place, the root-filesystem layout, and so on. To make this 23 - happen, put the following script in the initrd's tree's "/init" file, 24 - with 0755 mode. 25 - 26 - ------------------------------------------------------------------------ 27 - #!/bin/sh 28 - 29 - while : 30 - do 31 - sleep 10 32 - done 33 - ------------------------------------------------------------------------ 34 - 35 - This approach also allows most of the binaries and libraries in the 36 - initrd filesystem to be dispensed with, which can save significant 37 - space in rcutorture's "res" directory.
··· 1 + The rcutorture scripting tools automatically create an initrd containing
2 + a single statically linked binary named "init" that loops over a
3 + very long sleep() call. This creation is done by
4 + tools/testing/selftests/rcutorture/bin/mkinitrd.sh.
5
6 + However, if you don't like the notion of statically linked bare-bones
7 + userspace environments, you might wish to press an existing initrd
8 + into service:
9
10 ------------------------------------------------------------------------
11 cd tools/testing/selftests/rcutorture
··· 15 cd initrd
16 cpio -id < /tmp/initrd.img.zcat
17 # Manually verify that initrd contains needed binaries and libraries.
+33 -8
tools/testing/selftests/rcutorture/doc/rcu-test-image.txt
··· 1 - This document describes one way to create the rcu-test-image file 2 - that contains the filesystem used by the guest-OS kernel. There are 3 - probably much better ways of doing this, and this filesystem could no 4 - doubt be smaller. It is probably also possible to simply download 5 - an appropriate image from any number of places. 6 7 That said, here are the commands: 8 ··· 61 https://help.ubuntu.com/community/JeOSVMBuilder 62 http://wiki.libvirt.org/page/UbuntuKVMWalkthrough 63 http://www.moe.co.uk/2011/01/07/pci_add_option_rom-failed-to-find-romfile-pxe-rtl8139-bin/ -- "apt-get install kvm-pxe" 64 - http://www.landley.net/writing/rootfs-howto.html 65 - http://en.wikipedia.org/wiki/Initrd 66 - http://en.wikipedia.org/wiki/Cpio 67 http://wiki.libvirt.org/page/UbuntuKVMWalkthrough
··· 1 + Normally, a minimal initrd is created automatically by the rcutorture
2 + scripting. But minimal really does mean "minimal", namely just a single
3 + root directory with a single statically linked executable named "init":
4 +
5 + $ size tools/testing/selftests/rcutorture/initrd/init
6 + text data bss dec hex filename
7 + 328 0 8 336 150 tools/testing/selftests/rcutorture/initrd/init
8 +
9 + Suppose you need to run some scripts, perhaps to monitor or control
10 + some aspect of the rcutorture testing. This will require a more fully
11 + filled-out userspace, perhaps containing libraries, executables for
12 + the shell and other utilities, and so forth. In that case, place your
13 + desired filesystem here:
14 +
15 + tools/testing/selftests/rcutorture/initrd
16 +
17 + For example, your tools/testing/selftests/rcutorture/initrd/init might
18 + be a script that does any needed mount operations and starts whatever
19 + scripts need starting to properly monitor or control your testing.
20 + The next rcutorture build will then incorporate this filesystem into
21 + the kernel image that is passed to qemu.
22 +
23 + Or maybe you need a real root filesystem for some reason, in which case
24 + please read on!
25 +
26 + The remainder of this document describes one way to create the
27 + rcu-test-image file that contains the filesystem used by the guest-OS
28 + kernel. There are probably much better ways of doing this, and this
29 + filesystem could no doubt be smaller. It is probably also possible to
30 + simply download an appropriate image from any number of places.
31
32 That said, here are the commands:
33
··· 36 https://help.ubuntu.com/community/JeOSVMBuilder
37 http://wiki.libvirt.org/page/UbuntuKVMWalkthrough
38 http://www.moe.co.uk/2011/01/07/pci_add_option_rom-failed-to-find-romfile-pxe-rtl8139-bin/ -- "apt-get install kvm-pxe"
39 + https://www.landley.net/writing/rootfs-howto.html
40 + https://en.wikipedia.org/wiki/Initrd
41 + https://en.wikipedia.org/wiki/Cpio
42 http://wiki.libvirt.org/page/UbuntuKVMWalkthrough