Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU changes from Ingo Molnar:

- Debugging for smp_call_function()

- RT raw/non-raw lock ordering fixes

- Strict grace periods for KASAN

- New smp_call_function() torture test

- Torture-test updates

- Documentation updates

- Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from
the RCU branch due to questions about the series. - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
smp: Make symbol 'csd_bug_count' static
kernel/smp: Provide CSD lock timeout diagnostics
smp: Add source and destination CPUs to __call_single_data
rcu: Shrink each possible cpu krcp
rcu/segcblist: Prevent useless GP start if no CBs to accelerate
torture: Add gdb support
rcutorture: Allow pointer leaks to test diagnostic code
rcutorture: Hoist OOM registry up one level
refperf: Avoid null pointer dereference when buf fails to allocate
rcutorture: Properly synchronize with OOM notifier
rcutorture: Properly set rcu_fwds for OOM handling
torture: Add kvm.sh --help and update help message
rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
torture: Update initrd documentation
rcutorture: Replace HTTP links with HTTPS ones
locktorture: Make function torture_percpu_rwsem_init() static
torture: document --allcpus argument added to the kvm.sh script
rcutorture: Output number of elapsed grace periods
rcutorture: Remove KCSAN stubs
rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
...

+1583 -422
+1 -1
Documentation/RCU/Design/Data-Structures/Data-Structures.rst
··· 963 963 ``->dynticks_nesting`` field is incremented up from zero, the 964 964 ``->dynticks_nmi_nesting`` field is set to a large positive number, and 965 965 whenever the ``->dynticks_nesting`` field is decremented down to zero, 966 - the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that 966 + the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that 967 967 the number of misnested interrupts is not sufficient to overflow the 968 968 counter, this approach corrects the ``->dynticks_nmi_nesting`` field 969 969 every time the corresponding CPU enters the idle loop from process
+2 -2
Documentation/RCU/Design/Requirements/Requirements.rst
··· 2162 2162 this sort of thing. 2163 2163 #. If a CPU is in a portion of the kernel that is absolutely positively 2164 2164 no-joking guaranteed to never execute any RCU read-side critical 2165 - sections, and RCU believes this CPU to to be idle, no problem. This 2165 + sections, and RCU believes this CPU to be idle, no problem. This 2166 2166 sort of thing is used by some architectures for light-weight 2167 2167 exception handlers, which can then avoid the overhead of 2168 2168 ``rcu_irq_enter()`` and ``rcu_irq_exit()`` at exception entry and ··· 2431 2431 not have this property, given that any point in the code outside of an 2432 2432 RCU read-side critical section can be a quiescent state. Therefore, 2433 2433 *RCU-sched* was created, which follows “classic” RCU in that an 2434 - RCU-sched grace period waits for for pre-existing interrupt and NMI 2434 + RCU-sched grace period waits for pre-existing interrupt and NMI 2435 2435 handlers. In kernels built with ``CONFIG_PREEMPT=n``, the RCU and 2436 2436 RCU-sched APIs have identical implementations, while kernels built with 2437 2437 ``CONFIG_PREEMPT=y`` provide a separate implementation for each.
+1 -1
Documentation/RCU/whatisRCU.rst
··· 360 360 361 361 There are at least three flavors of RCU usage in the Linux kernel. The diagram 362 362 above shows the most common one. On the updater side, the rcu_assign_pointer(), 363 - sychronize_rcu() and call_rcu() primitives used are the same for all three 363 + synchronize_rcu() and call_rcu() primitives used are the same for all three 364 364 flavors. However for protection (on the reader side), the primitives used vary 365 365 depending on the flavor: 366 366
+135 -18
Documentation/admin-guide/kernel-parameters.txt
··· 3095 3095 and gids from such clients. This is intended to ease 3096 3096 migration from NFSv2/v3. 3097 3097 3098 + nmi_backtrace.backtrace_idle [KNL] 3099 + Dump stacks even of idle CPUs in response to an 3100 + NMI stack-backtrace request. 3101 + 3098 3102 nmi_debug= [KNL,SH] Specify one or more actions to take 3099 3103 when a NMI is triggered. 3100 3104 Format: [state][,regs][,debounce][,die] ··· 4178 4174 This wake_up() will be accompanied by a 4179 4175 WARN_ONCE() splat and an ftrace_dump(). 4180 4176 4177 + rcutree.rcu_unlock_delay= [KNL] 4178 + In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, 4179 + this specifies an rcu_read_unlock()-time delay 4180 + in microseconds. This defaults to zero. 4181 + Larger delays increase the probability of 4182 + catching RCU pointer leaks, that is, buggy use 4183 + of RCU-protected pointers after the relevant 4184 + rcu_read_unlock() has completed. 4185 + 4181 4186 rcutree.sysrq_rcu= [KNL] 4182 4187 Commandeer a sysrq key to dump out Tree RCU's 4183 4188 rcu_node tree with an eye towards determining 4184 4189 why a new grace period has not yet started. 4185 4190 4186 - rcuperf.gp_async= [KNL] 4191 + rcuscale.gp_async= [KNL] 4187 4192 Measure performance of asynchronous 4188 4193 grace-period primitives such as call_rcu(). 4189 4194 4190 - rcuperf.gp_async_max= [KNL] 4195 + rcuscale.gp_async_max= [KNL] 4191 4196 Specify the maximum number of outstanding 4192 4197 callbacks per writer thread. When a writer 4193 4198 thread exceeds this limit, it invokes the 4194 4199 corresponding flavor of rcu_barrier() to allow 4195 4200 previously posted callbacks to drain. 4196 4201 4197 - rcuperf.gp_exp= [KNL] 4202 + rcuscale.gp_exp= [KNL] 4198 4203 Measure performance of expedited synchronous 4199 4204 grace-period primitives. 4200 4205 4201 - rcuperf.holdoff= [KNL] 4206 + rcuscale.holdoff= [KNL] 4202 4207 Set test-start holdoff period. The purpose of 4203 4208 this parameter is to delay the start of the 4204 4209 test until boot completes in order to avoid 4205 4210 interference. 4206 4211 4207 - rcuperf.kfree_rcu_test= [KNL] 4212 + rcuscale.kfree_rcu_test= [KNL] 4208 4213 Set to measure performance of kfree_rcu() flooding. 4209 4214 4210 - rcuperf.kfree_nthreads= [KNL] 4215 + rcuscale.kfree_nthreads= [KNL] 4211 4216 The number of threads running loops of kfree_rcu(). 4212 4217 4213 - rcuperf.kfree_alloc_num= [KNL] 4218 + rcuscale.kfree_alloc_num= [KNL] 4214 4219 Number of allocations and frees done in an iteration. 4215 4220 4216 - rcuperf.kfree_loops= [KNL] 4217 - Number of loops doing rcuperf.kfree_alloc_num number 4221 + rcuscale.kfree_loops= [KNL] 4222 + Number of loops doing rcuscale.kfree_alloc_num number 4218 4223 of allocations and frees. 4219 4224 4220 - rcuperf.nreaders= [KNL] 4225 + rcuscale.nreaders= [KNL] 4221 4226 Set number of RCU readers. The value -1 selects 4222 4227 N, where N is the number of CPUs. A value 4223 4228 "n" less than -1 selects N-n+1, where N is again ··· 4235 4222 A value of "n" less than or equal to -N selects 4236 4223 a single reader. 4237 4224 4238 - rcuperf.nwriters= [KNL] 4225 + rcuscale.nwriters= [KNL] 4239 4226 Set number of RCU writers. The values operate 4240 - the same as for rcuperf.nreaders. 4227 + the same as for rcuscale.nreaders. 4241 4228 N, where N is the number of CPUs 4242 4229 4243 - rcuperf.perf_type= [KNL] 4230 + rcuscale.perf_type= [KNL] 4244 4231 Specify the RCU implementation to test. 
4245 4232 4246 - rcuperf.shutdown= [KNL] 4233 + rcuscale.shutdown= [KNL] 4247 4234 Shut the system down after performance tests 4248 4235 complete. This is useful for hands-off automated 4249 4236 testing. 4250 4237 4251 - rcuperf.verbose= [KNL] 4238 + rcuscale.verbose= [KNL] 4252 4239 Enable additional printk() statements. 4253 4240 4254 - rcuperf.writer_holdoff= [KNL] 4241 + rcuscale.writer_holdoff= [KNL] 4255 4242 Write-side holdoff between grace periods, 4256 4243 in microseconds. The default of zero says 4257 4244 no holdoff. ··· 4303 4290 rcutorture.gp_normal=, and rcutorture.gp_sync= 4304 4291 are zero, rcutorture acts as if is interpreted 4305 4292 they are all non-zero. 4293 + 4294 + rcutorture.irqreader= [KNL] 4295 + Run RCU readers from irq handlers, or, more 4296 + accurately, from a timer handler. Not all RCU 4297 + flavors take kindly to this sort of thing. 4298 + 4299 + rcutorture.leakpointer= [KNL] 4300 + Leak an RCU-protected pointer out of the reader. 4301 + This can of course result in splats, and is 4302 + intended to test the ability of things like 4303 + CONFIG_RCU_STRICT_GRACE_PERIOD=y to detect 4304 + such leaks. 4306 4305 4307 4306 rcutorture.n_barrier_cbs= [KNL] 4308 4307 Set callbacks/threads for rcu_barrier() testing. ··· 4537 4512 refscale.shutdown= [KNL] 4538 4513 Shut down the system at the end of the performance 4539 4514 test. This defaults to 1 (shut it down) when 4540 - rcuperf is built into the kernel and to 0 (leave 4541 - it running) when rcuperf is built as a module. 4515 + refscale is built into the kernel and to 0 (leave 4516 + it running) when refscale is built as a module. 4542 4517 4543 4518 refscale.verbose= [KNL] 4544 4519 Enable additional printk() statements. ··· 4683 4658 and so on. 4684 4659 Format: integer between 0 and 10 4685 4660 Default is 0. 4661 + 4662 + scftorture.holdoff= [KNL] 4663 + Number of seconds to hold off before starting 4664 + test. Defaults to zero for module insertion and 4665 + to 10 seconds for built-in smp_call_function() 4666 + tests. 4667 + 4668 + scftorture.longwait= [KNL] 4669 + Request ridiculously long waits randomly selected 4670 + up to the chosen limit in seconds. Zero (the 4671 + default) disables this feature. Please note 4672 + that requesting even small non-zero numbers of 4673 + seconds can result in RCU CPU stall warnings, 4674 + softlockup complaints, and so on. 4675 + 4676 + scftorture.nthreads= [KNL] 4677 + Number of kthreads to spawn to invoke the 4678 + smp_call_function() family of functions. 4679 + The default of -1 specifies a number of kthreads 4680 + equal to the number of CPUs. 4681 + 4682 + scftorture.onoff_holdoff= [KNL] 4683 + Number seconds to wait after the start of the 4684 + test before initiating CPU-hotplug operations. 4685 + 4686 + scftorture.onoff_interval= [KNL] 4687 + Number seconds to wait between successive 4688 + CPU-hotplug operations. Specifying zero (which 4689 + is the default) disables CPU-hotplug operations. 4690 + 4691 + scftorture.shutdown_secs= [KNL] 4692 + The number of seconds following the start of the 4693 + test after which to shut down the system. The 4694 + default of zero avoids shutting down the system. 4695 + Non-zero values are useful for automated tests. 4696 + 4697 + scftorture.stat_interval= [KNL] 4698 + The number of seconds between outputting the 4699 + current test statistics to the console. A value 4700 + of zero disables statistics output. 
4701 + 4702 + scftorture.stutter_cpus= [KNL] 4703 + The number of jiffies to wait between each change 4704 + to the set of CPUs under test. 4705 + 4706 + scftorture.use_cpus_read_lock= [KNL] 4707 + Use use_cpus_read_lock() instead of the default 4708 + preempt_disable() to disable CPU hotplug 4709 + while invoking one of the smp_call_function*() 4710 + functions. 4711 + 4712 + scftorture.verbose= [KNL] 4713 + Enable additional printk() statements. 4714 + 4715 + scftorture.weight_single= [KNL] 4716 + The probability weighting to use for the 4717 + smp_call_function_single() function with a zero 4718 + "wait" parameter. A value of -1 selects the 4719 + default if all other weights are -1. However, 4720 + if at least one weight has some other value, a 4721 + value of -1 will instead select a weight of zero. 4722 + 4723 + scftorture.weight_single_wait= [KNL] 4724 + The probability weighting to use for the 4725 + smp_call_function_single() function with a 4726 + non-zero "wait" parameter. See weight_single. 4727 + 4728 + scftorture.weight_many= [KNL] 4729 + The probability weighting to use for the 4730 + smp_call_function_many() function with a zero 4731 + "wait" parameter. See weight_single. 4732 + Note well that setting a high probability for 4733 + this weighting can place serious IPI load 4734 + on the system. 4735 + 4736 + scftorture.weight_many_wait= [KNL] 4737 + The probability weighting to use for the 4738 + smp_call_function_many() function with a 4739 + non-zero "wait" parameter. See weight_single 4740 + and weight_many. 4741 + 4742 + scftorture.weight_all= [KNL] 4743 + The probability weighting to use for the 4744 + smp_call_function_all() function with a zero 4745 + "wait" parameter. See weight_single and 4746 + weight_many. 4747 + 4748 + scftorture.weight_all_wait= [KNL] 4749 + The probability weighting to use for the 4750 + smp_call_function_all() function with a 4751 + non-zero "wait" parameter. See weight_single 4752 + and weight_many. 4686 4753 4687 4754 skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate 4688 4755 xtime_lock contention on larger systems, and/or RCU lock
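Not part of the patch: as a usage illustration, the new scftorture parameters documented above might be combined on the kernel command line along these lines (the values are arbitrary and chosen only to show the knobs together):

    scftorture.nthreads=4 scftorture.weight_single=1 scftorture.weight_single_wait=1 scftorture.stat_interval=15 scftorture.shutdown_secs=600 scftorture.verbose=1

Per the descriptions above, this would spawn four kthreads whose calls are split between the waiting and non-waiting smp_call_function_single() variants, print statistics every 15 seconds, enable extra printk()s, and shut the system down 600 seconds after the test starts.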
+2 -1
MAINTAINERS
··· 17672 17672 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev 17673 17673 F: Documentation/RCU/torture.rst 17674 17674 F: kernel/locking/locktorture.c 17675 - F: kernel/rcu/rcuperf.c 17675 + F: kernel/rcu/rcuscale.c 17676 17676 F: kernel/rcu/rcutorture.c 17677 + F: kernel/rcu/refscale.c 17677 17678 F: kernel/torture.c 17678 17679 17679 17680 TOSHIBA ACPI EXTRAS DRIVER
+4 -2
arch/x86/kvm/mmu/page_track.c
··· 229 229 return; 230 230 231 231 idx = srcu_read_lock(&head->track_srcu); 232 - hlist_for_each_entry_rcu(n, &head->track_notifier_list, node) 232 + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, 233 + srcu_read_lock_held(&head->track_srcu)) 233 234 if (n->track_write) 234 235 n->track_write(vcpu, gpa, new, bytes, n); 235 236 srcu_read_unlock(&head->track_srcu, idx); ··· 255 254 return; 256 255 257 256 idx = srcu_read_lock(&head->track_srcu); 258 - hlist_for_each_entry_rcu(n, &head->track_notifier_list, node) 257 + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, 258 + srcu_read_lock_held(&head->track_srcu)) 259 259 if (n->track_flush_slot) 260 260 n->track_flush_slot(kvm, slot, n); 261 261 srcu_read_unlock(&head->track_srcu, idx);
+48
include/linux/rculist.h
··· 63 63 RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(), \ 64 64 "RCU-list traversed in non-reader section!"); \ 65 65 }) 66 + 67 + #define __list_check_srcu(cond) \ 68 + ({ \ 69 + RCU_LOCKDEP_WARN(!(cond), \ 70 + "RCU-list traversed without holding the required lock!");\ 71 + }) 66 72 #else 67 73 #define __list_check_rcu(dummy, cond, extra...) \ 68 74 ({ check_arg_count_one(extra); }) 75 + 76 + #define __list_check_srcu(cond) ({ }) 69 77 #endif 70 78 71 79 /* ··· 394 386 pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) 395 387 396 388 /** 389 + * list_for_each_entry_srcu - iterate over rcu list of given type 390 + * @pos: the type * to use as a loop cursor. 391 + * @head: the head for your list. 392 + * @member: the name of the list_head within the struct. 393 + * @cond: lockdep expression for the lock required to traverse the list. 394 + * 395 + * This list-traversal primitive may safely run concurrently with 396 + * the _rcu list-mutation primitives such as list_add_rcu() 397 + * as long as the traversal is guarded by srcu_read_lock(). 398 + * The lockdep expression srcu_read_lock_held() can be passed as the 399 + * cond argument from read side. 400 + */ 401 + #define list_for_each_entry_srcu(pos, head, member, cond) \ 402 + for (__list_check_srcu(cond), \ 403 + pos = list_entry_rcu((head)->next, typeof(*pos), member); \ 404 + &pos->member != (head); \ 405 + pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) 406 + 407 + /** 397 408 * list_entry_lockless - get the struct for this entry 398 409 * @ptr: the &struct list_head pointer. 399 410 * @type: the type of the struct this is embedded in. ··· 704 677 */ 705 678 #define hlist_for_each_entry_rcu(pos, head, member, cond...) \ 706 679 for (__list_check_rcu(dummy, ## cond, 0), \ 680 + pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\ 681 + typeof(*(pos)), member); \ 682 + pos; \ 683 + pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\ 684 + &(pos)->member)), typeof(*(pos)), member)) 685 + 686 + /** 687 + * hlist_for_each_entry_srcu - iterate over rcu list of given type 688 + * @pos: the type * to use as a loop cursor. 689 + * @head: the head for your list. 690 + * @member: the name of the hlist_node within the struct. 691 + * @cond: lockdep expression for the lock required to traverse the list. 692 + * 693 + * This list-traversal primitive may safely run concurrently with 694 + * the _rcu list-mutation primitives such as hlist_add_head_rcu() 695 + * as long as the traversal is guarded by srcu_read_lock(). 696 + * The lockdep expression srcu_read_lock_held() can be passed as the 697 + * cond argument from read side. 698 + */ 699 + #define hlist_for_each_entry_srcu(pos, head, member, cond) \ 700 + for (__list_check_srcu(cond), \ 707 701 pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\ 708 702 typeof(*(pos)), member); \ 709 703 pos; \
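Not part of the patch: a minimal read-side sketch of the new SRCU-aware hlist iterator, following the same pattern as the page_track.c hunk above. The srcu_struct, list, structure, and function names (my_srcu, my_list, struct foo, foo_visit_all) are hypothetical.

    #include <linux/printk.h>
    #include <linux/rculist.h>
    #include <linux/srcu.h>

    /* Hypothetical example data kept on an RCU-managed hlist. */
    struct foo {
            int val;
            struct hlist_node node;
    };

    DEFINE_STATIC_SRCU(my_srcu);
    static HLIST_HEAD(my_list);

    /*
     * Traverse the list under SRCU protection.  The srcu_read_lock_held()
     * expression is passed as the cond argument so that lockdep can verify
     * that the required SRCU read-side lock is actually held.
     */
    static void foo_visit_all(void)
    {
            struct foo *p;
            int idx;

            idx = srcu_read_lock(&my_srcu);
            hlist_for_each_entry_srcu(p, &my_list, node,
                                      srcu_read_lock_held(&my_srcu))
                    pr_info("foo val %d\n", p->val);
            srcu_read_unlock(&my_srcu, idx);
    }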
+13 -6
include/linux/rcupdate.h
··· 55 55 56 56 #else /* #ifdef CONFIG_PREEMPT_RCU */ 57 57 58 + #ifdef CONFIG_TINY_RCU 59 + #define rcu_read_unlock_strict() do { } while (0) 60 + #else 61 + void rcu_read_unlock_strict(void); 62 + #endif 63 + 58 64 static inline void __rcu_read_lock(void) 59 65 { 60 66 preempt_disable(); ··· 69 63 static inline void __rcu_read_unlock(void) 70 64 { 71 65 preempt_enable(); 66 + rcu_read_unlock_strict(); 72 67 } 73 68 74 69 static inline int rcu_preempt_depth(void) ··· 716 709 "rcu_read_lock_bh() used illegally while idle"); 717 710 } 718 711 719 - /* 720 - * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section 712 + /** 713 + * rcu_read_unlock_bh() - marks the end of a softirq-only RCU critical section 721 714 * 722 715 * See rcu_read_lock_bh() for more information. 723 716 */ ··· 758 751 __acquire(RCU_SCHED); 759 752 } 760 753 761 - /* 762 - * rcu_read_unlock_sched - marks the end of a RCU-classic critical section 754 + /** 755 + * rcu_read_unlock_sched() - marks the end of a RCU-classic critical section 763 756 * 764 - * See rcu_read_lock_sched for more information. 757 + * See rcu_read_lock_sched() for more information. 765 758 */ 766 759 static inline void rcu_read_unlock_sched(void) 767 760 { ··· 952 945 } 953 946 954 947 /** 955 - * rcu_head_after_call_rcu - Has this rcu_head been passed to call_rcu()? 948 + * rcu_head_after_call_rcu() - Has this rcu_head been passed to call_rcu()? 956 949 * @rhp: The rcu_head structure to test. 957 950 * @f: The function passed to call_rcu() along with @rhp. 958 951 *
-1
include/linux/rcutiny.h
··· 103 103 static inline void rcu_end_inkernel_boot(void) { } 104 104 static inline bool rcu_inkernel_boot_has_ended(void) { return true; } 105 105 static inline bool rcu_is_watching(void) { return true; } 106 - static inline bool __rcu_is_watching(void) { return true; } 107 106 static inline void rcu_momentary_dyntick_idle(void) { } 108 107 static inline void kfree_rcu_scheduler_running(void) { } 109 108 static inline bool rcu_gp_might_be_stalled(void) { return false; }
-1
include/linux/rcutree.h
··· 64 64 void rcu_end_inkernel_boot(void); 65 65 bool rcu_inkernel_boot_has_ended(void); 66 66 bool rcu_is_watching(void); 67 - bool __rcu_is_watching(void); 68 67 #ifndef CONFIG_PREEMPTION 69 68 void rcu_all_qs(void); 70 69 #endif
+3
include/linux/smp.h
··· 26 26 struct { 27 27 struct llist_node llist; 28 28 unsigned int flags; 29 + #ifdef CONFIG_64BIT 30 + u16 src, dst; 31 + #endif 29 32 }; 30 33 }; 31 34 smp_call_func_t func;
+3
include/linux/smp_types.h
··· 61 61 unsigned int u_flags; 62 62 atomic_t a_flags; 63 63 }; 64 + #ifdef CONFIG_64BIT 65 + u16 src, dst; 66 + #endif 64 67 }; 65 68 66 69 #endif /* __LINUX_SMP_TYPES_H */
+27 -27
include/trace/events/rcu.h
··· 74 74 75 75 TP_STRUCT__entry( 76 76 __field(const char *, rcuname) 77 - __field(unsigned long, gp_seq) 77 + __field(long, gp_seq) 78 78 __field(const char *, gpevent) 79 79 ), 80 80 81 81 TP_fast_assign( 82 82 __entry->rcuname = rcuname; 83 - __entry->gp_seq = gp_seq; 83 + __entry->gp_seq = (long)gp_seq; 84 84 __entry->gpevent = gpevent; 85 85 ), 86 86 87 - TP_printk("%s %lu %s", 87 + TP_printk("%s %ld %s", 88 88 __entry->rcuname, __entry->gp_seq, __entry->gpevent) 89 89 ); 90 90 ··· 114 114 115 115 TP_STRUCT__entry( 116 116 __field(const char *, rcuname) 117 - __field(unsigned long, gp_seq) 118 - __field(unsigned long, gp_seq_req) 117 + __field(long, gp_seq) 118 + __field(long, gp_seq_req) 119 119 __field(u8, level) 120 120 __field(int, grplo) 121 121 __field(int, grphi) ··· 124 124 125 125 TP_fast_assign( 126 126 __entry->rcuname = rcuname; 127 - __entry->gp_seq = gp_seq; 128 - __entry->gp_seq_req = gp_seq_req; 127 + __entry->gp_seq = (long)gp_seq; 128 + __entry->gp_seq_req = (long)gp_seq_req; 129 129 __entry->level = level; 130 130 __entry->grplo = grplo; 131 131 __entry->grphi = grphi; 132 132 __entry->gpevent = gpevent; 133 133 ), 134 134 135 - TP_printk("%s %lu %lu %u %d %d %s", 136 - __entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level, 135 + TP_printk("%s %ld %ld %u %d %d %s", 136 + __entry->rcuname, (long)__entry->gp_seq, (long)__entry->gp_seq_req, __entry->level, 137 137 __entry->grplo, __entry->grphi, __entry->gpevent) 138 138 ); 139 139 ··· 153 153 154 154 TP_STRUCT__entry( 155 155 __field(const char *, rcuname) 156 - __field(unsigned long, gp_seq) 156 + __field(long, gp_seq) 157 157 __field(u8, level) 158 158 __field(int, grplo) 159 159 __field(int, grphi) ··· 162 162 163 163 TP_fast_assign( 164 164 __entry->rcuname = rcuname; 165 - __entry->gp_seq = gp_seq; 165 + __entry->gp_seq = (long)gp_seq; 166 166 __entry->level = level; 167 167 __entry->grplo = grplo; 168 168 __entry->grphi = grphi; 169 169 __entry->qsmask = qsmask; 170 170 ), 171 171 172 - TP_printk("%s %lu %u %d %d %lx", 172 + TP_printk("%s %ld %u %d %d %lx", 173 173 __entry->rcuname, __entry->gp_seq, __entry->level, 174 174 __entry->grplo, __entry->grphi, __entry->qsmask) 175 175 ); ··· 197 197 198 198 TP_STRUCT__entry( 199 199 __field(const char *, rcuname) 200 - __field(unsigned long, gpseq) 200 + __field(long, gpseq) 201 201 __field(const char *, gpevent) 202 202 ), 203 203 204 204 TP_fast_assign( 205 205 __entry->rcuname = rcuname; 206 - __entry->gpseq = gpseq; 206 + __entry->gpseq = (long)gpseq; 207 207 __entry->gpevent = gpevent; 208 208 ), 209 209 210 - TP_printk("%s %lu %s", 210 + TP_printk("%s %ld %s", 211 211 __entry->rcuname, __entry->gpseq, __entry->gpevent) 212 212 ); 213 213 ··· 316 316 317 317 TP_STRUCT__entry( 318 318 __field(const char *, rcuname) 319 - __field(unsigned long, gp_seq) 319 + __field(long, gp_seq) 320 320 __field(int, pid) 321 321 ), 322 322 323 323 TP_fast_assign( 324 324 __entry->rcuname = rcuname; 325 - __entry->gp_seq = gp_seq; 325 + __entry->gp_seq = (long)gp_seq; 326 326 __entry->pid = pid; 327 327 ), 328 328 329 - TP_printk("%s %lu %d", 329 + TP_printk("%s %ld %d", 330 330 __entry->rcuname, __entry->gp_seq, __entry->pid) 331 331 ); 332 332 ··· 343 343 344 344 TP_STRUCT__entry( 345 345 __field(const char *, rcuname) 346 - __field(unsigned long, gp_seq) 346 + __field(long, gp_seq) 347 347 __field(int, pid) 348 348 ), 349 349 350 350 TP_fast_assign( 351 351 __entry->rcuname = rcuname; 352 - __entry->gp_seq = gp_seq; 352 + __entry->gp_seq = (long)gp_seq; 353 353 
__entry->pid = pid; 354 354 ), 355 355 356 - TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid) 356 + TP_printk("%s %ld %d", __entry->rcuname, __entry->gp_seq, __entry->pid) 357 357 ); 358 358 359 359 /* ··· 374 374 375 375 TP_STRUCT__entry( 376 376 __field(const char *, rcuname) 377 - __field(unsigned long, gp_seq) 377 + __field(long, gp_seq) 378 378 __field(unsigned long, mask) 379 379 __field(unsigned long, qsmask) 380 380 __field(u8, level) ··· 385 385 386 386 TP_fast_assign( 387 387 __entry->rcuname = rcuname; 388 - __entry->gp_seq = gp_seq; 388 + __entry->gp_seq = (long)gp_seq; 389 389 __entry->mask = mask; 390 390 __entry->qsmask = qsmask; 391 391 __entry->level = level; ··· 394 394 __entry->gp_tasks = gp_tasks; 395 395 ), 396 396 397 - TP_printk("%s %lu %lx>%lx %u %d %d %u", 397 + TP_printk("%s %ld %lx>%lx %u %d %d %u", 398 398 __entry->rcuname, __entry->gp_seq, 399 399 __entry->mask, __entry->qsmask, __entry->level, 400 400 __entry->grplo, __entry->grphi, __entry->gp_tasks) ··· 415 415 416 416 TP_STRUCT__entry( 417 417 __field(const char *, rcuname) 418 - __field(unsigned long, gp_seq) 418 + __field(long, gp_seq) 419 419 __field(int, cpu) 420 420 __field(const char *, qsevent) 421 421 ), 422 422 423 423 TP_fast_assign( 424 424 __entry->rcuname = rcuname; 425 - __entry->gp_seq = gp_seq; 425 + __entry->gp_seq = (long)gp_seq; 426 426 __entry->cpu = cpu; 427 427 __entry->qsevent = qsevent; 428 428 ), 429 429 430 - TP_printk("%s %lu %d %s", 430 + TP_printk("%s %ld %d %s", 431 431 __entry->rcuname, __entry->gp_seq, 432 432 __entry->cpu, __entry->qsevent) 433 433 );
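The switch from unsigned long to long in the trace fields above appears aimed at readability: grace-period sequence values that sit just below zero (and therefore wrap to huge unsigned numbers) print as small negatives instead. A standalone userspace illustration of the formatting difference, not kernel code:

    #include <stdio.h>

    int main(void)
    {
            /* An illustrative "just below zero" sequence value. */
            unsigned long gp_seq = -300UL;

            printf("as unsigned: %lu\n", gp_seq);       /* 18446744073709551316 on 64-bit */
            printf("as signed:   %ld\n", (long)gp_seq); /* -300 */
            return 0;
    }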
+2
kernel/Makefile
··· 134 134 KCSAN_SANITIZE_stackleak.o := n 135 135 KCOV_INSTRUMENT_stackleak.o := n 136 136 137 + obj-$(CONFIG_SCF_TORTURE_TEST) += scftorture.o 138 + 137 139 $(obj)/configs.o: $(obj)/config_data.gz 138 140 139 141 targets += config_data.gz
+1 -1
kernel/entry/common.c
··· 304 304 * terminate a grace period, if and only if the timer interrupt is 305 305 * not nested into another interrupt. 306 306 * 307 - * Checking for __rcu_is_watching() here would prevent the nesting 307 + * Checking for rcu_is_watching() here would prevent the nesting 308 308 * interrupt to invoke rcu_irq_enter(). If that nested interrupt is 309 309 * the tick then rcu_flavor_sched_clock_irq() would wrongfully 310 310 * assume that it is the first interupt and eventually claim
+1 -1
kernel/locking/locktorture.c
··· 566 566 #include <linux/percpu-rwsem.h> 567 567 static struct percpu_rw_semaphore pcpu_rwsem; 568 568 569 - void torture_percpu_rwsem_init(void) 569 + static void torture_percpu_rwsem_init(void) 570 570 { 571 571 BUG_ON(percpu_init_rwsem(&pcpu_rwsem)); 572 572 }
+5 -3
kernel/rcu/Kconfig
··· 135 135 136 136 config RCU_FANOUT_LEAF 137 137 int "Tree-based hierarchical RCU leaf-level fanout value" 138 - range 2 64 if 64BIT 139 - range 2 32 if !64BIT 138 + range 2 64 if 64BIT && !RCU_STRICT_GRACE_PERIOD 139 + range 2 32 if !64BIT && !RCU_STRICT_GRACE_PERIOD 140 + range 2 3 if RCU_STRICT_GRACE_PERIOD 140 141 depends on TREE_RCU && RCU_EXPERT 141 - default 16 142 + default 16 if !RCU_STRICT_GRACE_PERIOD 143 + default 2 if RCU_STRICT_GRACE_PERIOD 142 144 help 143 145 This option controls the leaf-level fanout of hierarchical 144 146 implementations of RCU, and allows trading off cache misses
+16 -1
kernel/rcu/Kconfig.debug
··· 23 23 tristate 24 24 default n 25 25 26 - config RCU_PERF_TEST 26 + config RCU_SCALE_TEST 27 27 tristate "performance tests for RCU" 28 28 depends on DEBUG_KERNEL 29 29 select TORTURE_TEST ··· 113 113 114 114 Say N here if you need ultimate kernel/user switch latencies 115 115 Say Y if you are unsure 116 + 117 + config RCU_STRICT_GRACE_PERIOD 118 + bool "Provide debug RCU implementation with short grace periods" 119 + depends on DEBUG_KERNEL && RCU_EXPERT 120 + default n 121 + select PREEMPT_COUNT if PREEMPT=n 122 + help 123 + Select this option to build an RCU variant that is strict about 124 + grace periods, making them as short as it can. This limits 125 + scalability, destroys real-time response, degrades battery 126 + lifetime and kills performance. Don't try this on large 127 + machines, as in systems with more than about 10 or 20 CPUs. 128 + But in conjunction with tools like KASAN, it can be helpful 129 + when looking for certain types of RCU usage bugs, for example, 130 + too-short RCU read-side critical sections. 116 131 117 132 endmenu # "RCU Debugging"
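Not part of the patch: a sketch of how the new option might be enabled for a small debug machine, per the help text above (the surrounding configuration is assumed, and CONFIG_KASAN is only one example of a companion tool):

    CONFIG_DEBUG_KERNEL=y
    CONFIG_RCU_EXPERT=y
    CONFIG_RCU_STRICT_GRACE_PERIOD=y
    CONFIG_KASAN=y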
+1 -1
kernel/rcu/Makefile
··· 11 11 obj-$(CONFIG_TREE_SRCU) += srcutree.o 12 12 obj-$(CONFIG_TINY_SRCU) += srcutiny.o 13 13 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o 14 - obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o 14 + obj-$(CONFIG_RCU_SCALE_TEST) += rcuscale.o 15 15 obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o 16 16 obj-$(CONFIG_TREE_RCU) += tree.o 17 17 obj-$(CONFIG_TINY_RCU) += tiny.o
+9 -1
kernel/rcu/rcu_segcblist.c
··· 475 475 * Also advance to the oldest segment of callbacks whose 476 476 * ->gp_seq[] completion is at or after that passed in via "seq", 477 477 * skipping any empty segments. 478 + * 479 + * Note that segment "i" (and any lower-numbered segments 480 + * containing older callbacks) will be unaffected, and their 481 + * grace-period numbers remain unchanged. For example, if i == 482 + * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched. 483 + * Instead, the CBs in NEXT_TAIL will be merged with those in 484 + * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL 485 + * would be updated. NEXT_TAIL would then be empty. 478 486 */ 479 - if (++i >= RCU_NEXT_TAIL) 487 + if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL) 480 488 return false; 481 489 482 490 /*
+165 -165
kernel/rcu/rcuperf.c → kernel/rcu/rcuscale.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0+ 2 2 /* 3 - * Read-Copy Update module-based performance-test facility 3 + * Read-Copy Update module-based scalability-test facility 4 4 * 5 5 * Copyright (C) IBM Corporation, 2015 6 6 * ··· 44 44 MODULE_LICENSE("GPL"); 45 45 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>"); 46 46 47 - #define PERF_FLAG "-perf:" 48 - #define PERFOUT_STRING(s) \ 49 - pr_alert("%s" PERF_FLAG " %s\n", perf_type, s) 50 - #define VERBOSE_PERFOUT_STRING(s) \ 51 - do { if (verbose) pr_alert("%s" PERF_FLAG " %s\n", perf_type, s); } while (0) 52 - #define VERBOSE_PERFOUT_ERRSTRING(s) \ 53 - do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0) 47 + #define SCALE_FLAG "-scale:" 48 + #define SCALEOUT_STRING(s) \ 49 + pr_alert("%s" SCALE_FLAG " %s\n", scale_type, s) 50 + #define VERBOSE_SCALEOUT_STRING(s) \ 51 + do { if (verbose) pr_alert("%s" SCALE_FLAG " %s\n", scale_type, s); } while (0) 52 + #define VERBOSE_SCALEOUT_ERRSTRING(s) \ 53 + do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! %s\n", scale_type, s); } while (0) 54 54 55 55 /* 56 56 * The intended use cases for the nreaders and nwriters module parameters ··· 61 61 * nr_cpus for a mixed reader/writer test. 62 62 * 63 63 * 2. Specify the nr_cpus kernel boot parameter, but set 64 - * rcuperf.nreaders to zero. This will set nwriters to the 64 + * rcuscale.nreaders to zero. This will set nwriters to the 65 65 * value specified by nr_cpus for an update-only test. 66 66 * 67 67 * 3. Specify the nr_cpus kernel boot parameter, but set 68 - * rcuperf.nwriters to zero. This will set nreaders to the 68 + * rcuscale.nwriters to zero. This will set nreaders to the 69 69 * value specified by nr_cpus for a read-only test. 70 70 * 71 71 * Various other use cases may of course be specified. 72 72 * 73 73 * Note that this test's readers are intended only as a test load for 74 - * the writers. The reader performance statistics will be overly 74 + * the writers. The reader scalability statistics will be overly 75 75 * pessimistic due to the per-critical-section interrupt disabling, 76 76 * test-end checks, and the pair of calls through pointers. 
77 77 */ 78 78 79 79 #ifdef MODULE 80 - # define RCUPERF_SHUTDOWN 0 80 + # define RCUSCALE_SHUTDOWN 0 81 81 #else 82 - # define RCUPERF_SHUTDOWN 1 82 + # define RCUSCALE_SHUTDOWN 1 83 83 #endif 84 84 85 85 torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives"); ··· 88 88 torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); 89 89 torture_param(int, nreaders, -1, "Number of RCU reader threads"); 90 90 torture_param(int, nwriters, -1, "Number of RCU updater threads"); 91 - torture_param(bool, shutdown, RCUPERF_SHUTDOWN, 92 - "Shutdown at end of performance tests."); 91 + torture_param(bool, shutdown, RCUSCALE_SHUTDOWN, 92 + "Shutdown at end of scalability tests."); 93 93 torture_param(int, verbose, 1, "Enable verbose debugging printk()s"); 94 94 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable"); 95 - torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?"); 95 + torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?"); 96 96 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate."); 97 97 98 - static char *perf_type = "rcu"; 99 - module_param(perf_type, charp, 0444); 100 - MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, srcu, ...)"); 98 + static char *scale_type = "rcu"; 99 + module_param(scale_type, charp, 0444); 100 + MODULE_PARM_DESC(scale_type, "Type of RCU to scalability-test (rcu, srcu, ...)"); 101 101 102 102 static int nrealreaders; 103 103 static int nrealwriters; ··· 107 107 108 108 static u64 **writer_durations; 109 109 static int *writer_n_durations; 110 - static atomic_t n_rcu_perf_reader_started; 111 - static atomic_t n_rcu_perf_writer_started; 112 - static atomic_t n_rcu_perf_writer_finished; 110 + static atomic_t n_rcu_scale_reader_started; 111 + static atomic_t n_rcu_scale_writer_started; 112 + static atomic_t n_rcu_scale_writer_finished; 113 113 static wait_queue_head_t shutdown_wq; 114 - static u64 t_rcu_perf_writer_started; 115 - static u64 t_rcu_perf_writer_finished; 114 + static u64 t_rcu_scale_writer_started; 115 + static u64 t_rcu_scale_writer_finished; 116 116 static unsigned long b_rcu_gp_test_started; 117 117 static unsigned long b_rcu_gp_test_finished; 118 118 static DEFINE_PER_CPU(atomic_t, n_async_inflight); ··· 124 124 * Operations vector for selecting different types of tests. 125 125 */ 126 126 127 - struct rcu_perf_ops { 127 + struct rcu_scale_ops { 128 128 int ptype; 129 129 void (*init)(void); 130 130 void (*cleanup)(void); ··· 140 140 const char *name; 141 141 }; 142 142 143 - static struct rcu_perf_ops *cur_ops; 143 + static struct rcu_scale_ops *cur_ops; 144 144 145 145 /* 146 - * Definitions for rcu perf testing. 146 + * Definitions for rcu scalability testing. 
147 147 */ 148 148 149 - static int rcu_perf_read_lock(void) __acquires(RCU) 149 + static int rcu_scale_read_lock(void) __acquires(RCU) 150 150 { 151 151 rcu_read_lock(); 152 152 return 0; 153 153 } 154 154 155 - static void rcu_perf_read_unlock(int idx) __releases(RCU) 155 + static void rcu_scale_read_unlock(int idx) __releases(RCU) 156 156 { 157 157 rcu_read_unlock(); 158 158 } ··· 162 162 return 0; 163 163 } 164 164 165 - static void rcu_sync_perf_init(void) 165 + static void rcu_sync_scale_init(void) 166 166 { 167 167 } 168 168 169 - static struct rcu_perf_ops rcu_ops = { 169 + static struct rcu_scale_ops rcu_ops = { 170 170 .ptype = RCU_FLAVOR, 171 - .init = rcu_sync_perf_init, 172 - .readlock = rcu_perf_read_lock, 173 - .readunlock = rcu_perf_read_unlock, 171 + .init = rcu_sync_scale_init, 172 + .readlock = rcu_scale_read_lock, 173 + .readunlock = rcu_scale_read_unlock, 174 174 .get_gp_seq = rcu_get_gp_seq, 175 175 .gp_diff = rcu_seq_diff, 176 176 .exp_completed = rcu_exp_batches_completed, ··· 182 182 }; 183 183 184 184 /* 185 - * Definitions for srcu perf testing. 185 + * Definitions for srcu scalability testing. 186 186 */ 187 187 188 - DEFINE_STATIC_SRCU(srcu_ctl_perf); 189 - static struct srcu_struct *srcu_ctlp = &srcu_ctl_perf; 188 + DEFINE_STATIC_SRCU(srcu_ctl_scale); 189 + static struct srcu_struct *srcu_ctlp = &srcu_ctl_scale; 190 190 191 - static int srcu_perf_read_lock(void) __acquires(srcu_ctlp) 191 + static int srcu_scale_read_lock(void) __acquires(srcu_ctlp) 192 192 { 193 193 return srcu_read_lock(srcu_ctlp); 194 194 } 195 195 196 - static void srcu_perf_read_unlock(int idx) __releases(srcu_ctlp) 196 + static void srcu_scale_read_unlock(int idx) __releases(srcu_ctlp) 197 197 { 198 198 srcu_read_unlock(srcu_ctlp, idx); 199 199 } 200 200 201 - static unsigned long srcu_perf_completed(void) 201 + static unsigned long srcu_scale_completed(void) 202 202 { 203 203 return srcu_batches_completed(srcu_ctlp); 204 204 } ··· 213 213 srcu_barrier(srcu_ctlp); 214 214 } 215 215 216 - static void srcu_perf_synchronize(void) 216 + static void srcu_scale_synchronize(void) 217 217 { 218 218 synchronize_srcu(srcu_ctlp); 219 219 } 220 220 221 - static void srcu_perf_synchronize_expedited(void) 221 + static void srcu_scale_synchronize_expedited(void) 222 222 { 223 223 synchronize_srcu_expedited(srcu_ctlp); 224 224 } 225 225 226 - static struct rcu_perf_ops srcu_ops = { 226 + static struct rcu_scale_ops srcu_ops = { 227 227 .ptype = SRCU_FLAVOR, 228 - .init = rcu_sync_perf_init, 229 - .readlock = srcu_perf_read_lock, 230 - .readunlock = srcu_perf_read_unlock, 231 - .get_gp_seq = srcu_perf_completed, 228 + .init = rcu_sync_scale_init, 229 + .readlock = srcu_scale_read_lock, 230 + .readunlock = srcu_scale_read_unlock, 231 + .get_gp_seq = srcu_scale_completed, 232 232 .gp_diff = rcu_seq_diff, 233 - .exp_completed = srcu_perf_completed, 233 + .exp_completed = srcu_scale_completed, 234 234 .async = srcu_call_rcu, 235 235 .gp_barrier = srcu_rcu_barrier, 236 - .sync = srcu_perf_synchronize, 237 - .exp_sync = srcu_perf_synchronize_expedited, 236 + .sync = srcu_scale_synchronize, 237 + .exp_sync = srcu_scale_synchronize_expedited, 238 238 .name = "srcu" 239 239 }; 240 240 241 241 static struct srcu_struct srcud; 242 242 243 - static void srcu_sync_perf_init(void) 243 + static void srcu_sync_scale_init(void) 244 244 { 245 245 srcu_ctlp = &srcud; 246 246 init_srcu_struct(srcu_ctlp); 247 247 } 248 248 249 - static void srcu_sync_perf_cleanup(void) 249 + static void srcu_sync_scale_cleanup(void) 250 250 { 
251 251 cleanup_srcu_struct(srcu_ctlp); 252 252 } 253 253 254 - static struct rcu_perf_ops srcud_ops = { 254 + static struct rcu_scale_ops srcud_ops = { 255 255 .ptype = SRCU_FLAVOR, 256 - .init = srcu_sync_perf_init, 257 - .cleanup = srcu_sync_perf_cleanup, 258 - .readlock = srcu_perf_read_lock, 259 - .readunlock = srcu_perf_read_unlock, 260 - .get_gp_seq = srcu_perf_completed, 256 + .init = srcu_sync_scale_init, 257 + .cleanup = srcu_sync_scale_cleanup, 258 + .readlock = srcu_scale_read_lock, 259 + .readunlock = srcu_scale_read_unlock, 260 + .get_gp_seq = srcu_scale_completed, 261 261 .gp_diff = rcu_seq_diff, 262 - .exp_completed = srcu_perf_completed, 262 + .exp_completed = srcu_scale_completed, 263 263 .async = srcu_call_rcu, 264 264 .gp_barrier = srcu_rcu_barrier, 265 - .sync = srcu_perf_synchronize, 266 - .exp_sync = srcu_perf_synchronize_expedited, 265 + .sync = srcu_scale_synchronize, 266 + .exp_sync = srcu_scale_synchronize_expedited, 267 267 .name = "srcud" 268 268 }; 269 269 270 270 /* 271 - * Definitions for RCU-tasks perf testing. 271 + * Definitions for RCU-tasks scalability testing. 272 272 */ 273 273 274 - static int tasks_perf_read_lock(void) 274 + static int tasks_scale_read_lock(void) 275 275 { 276 276 return 0; 277 277 } 278 278 279 - static void tasks_perf_read_unlock(int idx) 279 + static void tasks_scale_read_unlock(int idx) 280 280 { 281 281 } 282 282 283 - static struct rcu_perf_ops tasks_ops = { 283 + static struct rcu_scale_ops tasks_ops = { 284 284 .ptype = RCU_TASKS_FLAVOR, 285 - .init = rcu_sync_perf_init, 286 - .readlock = tasks_perf_read_lock, 287 - .readunlock = tasks_perf_read_unlock, 285 + .init = rcu_sync_scale_init, 286 + .readlock = tasks_scale_read_lock, 287 + .readunlock = tasks_scale_read_unlock, 288 288 .get_gp_seq = rcu_no_completed, 289 289 .gp_diff = rcu_seq_diff, 290 290 .async = call_rcu_tasks, ··· 294 294 .name = "tasks" 295 295 }; 296 296 297 - static unsigned long rcuperf_seq_diff(unsigned long new, unsigned long old) 297 + static unsigned long rcuscale_seq_diff(unsigned long new, unsigned long old) 298 298 { 299 299 if (!cur_ops->gp_diff) 300 300 return new - old; ··· 302 302 } 303 303 304 304 /* 305 - * If performance tests complete, wait for shutdown to commence. 305 + * If scalability tests complete, wait for shutdown to commence. 306 306 */ 307 - static void rcu_perf_wait_shutdown(void) 307 + static void rcu_scale_wait_shutdown(void) 308 308 { 309 309 cond_resched_tasks_rcu_qs(); 310 - if (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters) 310 + if (atomic_read(&n_rcu_scale_writer_finished) < nrealwriters) 311 311 return; 312 312 while (!torture_must_stop()) 313 313 schedule_timeout_uninterruptible(1); 314 314 } 315 315 316 316 /* 317 - * RCU perf reader kthread. Repeatedly does empty RCU read-side critical 318 - * section, minimizing update-side interference. However, the point of 319 - * this test is not to evaluate reader performance, but instead to serve 320 - * as a test load for update-side performance testing. 317 + * RCU scalability reader kthread. Repeatedly does empty RCU read-side 318 + * critical section, minimizing update-side interference. However, the 319 + * point of this test is not to evaluate reader scalability, but instead 320 + * to serve as a test load for update-side scalability testing. 
321 321 */ 322 322 static int 323 - rcu_perf_reader(void *arg) 323 + rcu_scale_reader(void *arg) 324 324 { 325 325 unsigned long flags; 326 326 int idx; 327 327 long me = (long)arg; 328 328 329 - VERBOSE_PERFOUT_STRING("rcu_perf_reader task started"); 329 + VERBOSE_SCALEOUT_STRING("rcu_scale_reader task started"); 330 330 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 331 331 set_user_nice(current, MAX_NICE); 332 - atomic_inc(&n_rcu_perf_reader_started); 332 + atomic_inc(&n_rcu_scale_reader_started); 333 333 334 334 do { 335 335 local_irq_save(flags); 336 336 idx = cur_ops->readlock(); 337 337 cur_ops->readunlock(idx); 338 338 local_irq_restore(flags); 339 - rcu_perf_wait_shutdown(); 339 + rcu_scale_wait_shutdown(); 340 340 } while (!torture_must_stop()); 341 - torture_kthread_stopping("rcu_perf_reader"); 341 + torture_kthread_stopping("rcu_scale_reader"); 342 342 return 0; 343 343 } 344 344 345 345 /* 346 - * Callback function for asynchronous grace periods from rcu_perf_writer(). 346 + * Callback function for asynchronous grace periods from rcu_scale_writer(). 347 347 */ 348 - static void rcu_perf_async_cb(struct rcu_head *rhp) 348 + static void rcu_scale_async_cb(struct rcu_head *rhp) 349 349 { 350 350 atomic_dec(this_cpu_ptr(&n_async_inflight)); 351 351 kfree(rhp); 352 352 } 353 353 354 354 /* 355 - * RCU perf writer kthread. Repeatedly does a grace period. 355 + * RCU scale writer kthread. Repeatedly does a grace period. 356 356 */ 357 357 static int 358 - rcu_perf_writer(void *arg) 358 + rcu_scale_writer(void *arg) 359 359 { 360 360 int i = 0; 361 361 int i_max; ··· 366 366 u64 *wdp; 367 367 u64 *wdpp = writer_durations[me]; 368 368 369 - VERBOSE_PERFOUT_STRING("rcu_perf_writer task started"); 369 + VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started"); 370 370 WARN_ON(!wdpp); 371 371 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 372 372 sched_set_fifo_low(current); ··· 383 383 schedule_timeout_uninterruptible(1); 384 384 385 385 t = ktime_get_mono_fast_ns(); 386 - if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) { 387 - t_rcu_perf_writer_started = t; 386 + if (atomic_inc_return(&n_rcu_scale_writer_started) >= nrealwriters) { 387 + t_rcu_scale_writer_started = t; 388 388 if (gp_exp) { 389 389 b_rcu_gp_test_started = 390 390 cur_ops->exp_completed() / 2; ··· 404 404 rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 405 405 if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) { 406 406 atomic_inc(this_cpu_ptr(&n_async_inflight)); 407 - cur_ops->async(rhp, rcu_perf_async_cb); 407 + cur_ops->async(rhp, rcu_scale_async_cb); 408 408 rhp = NULL; 409 409 } else if (!kthread_should_stop()) { 410 410 cur_ops->gp_barrier(); ··· 421 421 *wdp = t - *wdp; 422 422 i_max = i; 423 423 if (!started && 424 - atomic_read(&n_rcu_perf_writer_started) >= nrealwriters) 424 + atomic_read(&n_rcu_scale_writer_started) >= nrealwriters) 425 425 started = true; 426 426 if (!done && i >= MIN_MEAS) { 427 427 done = true; 428 428 sched_set_normal(current, 0); 429 - pr_alert("%s%s rcu_perf_writer %ld has %d measurements\n", 430 - perf_type, PERF_FLAG, me, MIN_MEAS); 431 - if (atomic_inc_return(&n_rcu_perf_writer_finished) >= 429 + pr_alert("%s%s rcu_scale_writer %ld has %d measurements\n", 430 + scale_type, SCALE_FLAG, me, MIN_MEAS); 431 + if (atomic_inc_return(&n_rcu_scale_writer_finished) >= 432 432 nrealwriters) { 433 433 schedule_timeout_interruptible(10); 434 434 rcu_ftrace_dump(DUMP_ALL); 435 - PERFOUT_STRING("Test complete"); 436 - 
t_rcu_perf_writer_finished = t; 435 + SCALEOUT_STRING("Test complete"); 436 + t_rcu_scale_writer_finished = t; 437 437 if (gp_exp) { 438 438 b_rcu_gp_test_finished = 439 439 cur_ops->exp_completed() / 2; ··· 448 448 } 449 449 } 450 450 if (done && !alldone && 451 - atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters) 451 + atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters) 452 452 alldone = true; 453 453 if (started && !alldone && i < MAX_MEAS - 1) 454 454 i++; 455 - rcu_perf_wait_shutdown(); 455 + rcu_scale_wait_shutdown(); 456 456 } while (!torture_must_stop()); 457 457 if (gp_async) { 458 458 cur_ops->gp_barrier(); 459 459 } 460 460 writer_n_durations[me] = i_max; 461 - torture_kthread_stopping("rcu_perf_writer"); 461 + torture_kthread_stopping("rcu_scale_writer"); 462 462 return 0; 463 463 } 464 464 465 465 static void 466 - rcu_perf_print_module_parms(struct rcu_perf_ops *cur_ops, const char *tag) 466 + rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag) 467 467 { 468 - pr_alert("%s" PERF_FLAG 468 + pr_alert("%s" SCALE_FLAG 469 469 "--- %s: nreaders=%d nwriters=%d verbose=%d shutdown=%d\n", 470 - perf_type, tag, nrealreaders, nrealwriters, verbose, shutdown); 470 + scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown); 471 471 } 472 472 473 473 static void 474 - rcu_perf_cleanup(void) 474 + rcu_scale_cleanup(void) 475 475 { 476 476 int i; 477 477 int j; ··· 484 484 * during the mid-boot phase, so have to wait till the end. 485 485 */ 486 486 if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) 487 - VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!"); 487 + VERBOSE_SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!"); 488 488 if (rcu_gp_is_normal() && gp_exp) 489 - VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!"); 489 + VERBOSE_SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!"); 490 490 if (gp_exp && gp_async) 491 - VERBOSE_PERFOUT_ERRSTRING("No expedited async GPs, so went with async!"); 491 + VERBOSE_SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!"); 492 492 493 493 if (torture_cleanup_begin()) 494 494 return; ··· 499 499 500 500 if (reader_tasks) { 501 501 for (i = 0; i < nrealreaders; i++) 502 - torture_stop_kthread(rcu_perf_reader, 502 + torture_stop_kthread(rcu_scale_reader, 503 503 reader_tasks[i]); 504 504 kfree(reader_tasks); 505 505 } 506 506 507 507 if (writer_tasks) { 508 508 for (i = 0; i < nrealwriters; i++) { 509 - torture_stop_kthread(rcu_perf_writer, 509 + torture_stop_kthread(rcu_scale_writer, 510 510 writer_tasks[i]); 511 511 if (!writer_n_durations) 512 512 continue; 513 513 j = writer_n_durations[i]; 514 514 pr_alert("%s%s writer %d gps: %d\n", 515 - perf_type, PERF_FLAG, i, j); 515 + scale_type, SCALE_FLAG, i, j); 516 516 ngps += j; 517 517 } 518 518 pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n", 519 - perf_type, PERF_FLAG, 520 - t_rcu_perf_writer_started, t_rcu_perf_writer_finished, 521 - t_rcu_perf_writer_finished - 522 - t_rcu_perf_writer_started, 519 + scale_type, SCALE_FLAG, 520 + t_rcu_scale_writer_started, t_rcu_scale_writer_finished, 521 + t_rcu_scale_writer_finished - 522 + t_rcu_scale_writer_started, 523 523 ngps, 524 - rcuperf_seq_diff(b_rcu_gp_test_finished, 525 - b_rcu_gp_test_started)); 524 + rcuscale_seq_diff(b_rcu_gp_test_finished, 525 + b_rcu_gp_test_started)); 526 526 for (i = 0; i < nrealwriters; i++) { 527 527 if 
(!writer_durations) 528 528 break; ··· 534 534 for (j = 0; j <= writer_n_durations[i]; j++) { 535 535 wdp = &wdpp[j]; 536 536 pr_alert("%s%s %4d writer-duration: %5d %llu\n", 537 - perf_type, PERF_FLAG, 537 + scale_type, SCALE_FLAG, 538 538 i, j, *wdp); 539 539 if (j % 100 == 0) 540 540 schedule_timeout_uninterruptible(1); ··· 573 573 } 574 574 575 575 /* 576 - * RCU perf shutdown kthread. Just waits to be awakened, then shuts 576 + * RCU scalability shutdown kthread. Just waits to be awakened, then shuts 577 577 * down system. 578 578 */ 579 579 static int 580 - rcu_perf_shutdown(void *arg) 580 + rcu_scale_shutdown(void *arg) 581 581 { 582 582 wait_event(shutdown_wq, 583 - atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters); 583 + atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters); 584 584 smp_mb(); /* Wake before output. */ 585 - rcu_perf_cleanup(); 585 + rcu_scale_cleanup(); 586 586 kernel_power_off(); 587 587 return -EINVAL; 588 588 } 589 589 590 590 /* 591 - * kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number 591 + * kfree_rcu() scalability tests: Start a kfree_rcu() loop on all CPUs for number 592 592 * of iterations and measure total time and number of GP for all iterations to complete. 593 593 */ 594 594 ··· 598 598 599 599 static struct task_struct **kfree_reader_tasks; 600 600 static int kfree_nrealthreads; 601 - static atomic_t n_kfree_perf_thread_started; 602 - static atomic_t n_kfree_perf_thread_ended; 601 + static atomic_t n_kfree_scale_thread_started; 602 + static atomic_t n_kfree_scale_thread_ended; 603 603 604 604 struct kfree_obj { 605 605 char kfree_obj[8]; ··· 607 607 }; 608 608 609 609 static int 610 - kfree_perf_thread(void *arg) 610 + kfree_scale_thread(void *arg) 611 611 { 612 612 int i, loop = 0; 613 613 long me = (long)arg; ··· 615 615 u64 start_time, end_time; 616 616 long long mem_begin, mem_during = 0; 617 617 618 - VERBOSE_PERFOUT_STRING("kfree_perf_thread task started"); 618 + VERBOSE_SCALEOUT_STRING("kfree_scale_thread task started"); 619 619 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); 620 620 set_user_nice(current, MAX_NICE); 621 621 622 622 start_time = ktime_get_mono_fast_ns(); 623 623 624 - if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) { 624 + if (atomic_inc_return(&n_kfree_scale_thread_started) >= kfree_nrealthreads) { 625 625 if (gp_exp) 626 626 b_rcu_gp_test_started = cur_ops->exp_completed() / 2; 627 627 else ··· 646 646 cond_resched(); 647 647 } while (!torture_must_stop() && ++loop < kfree_loops); 648 648 649 - if (atomic_inc_return(&n_kfree_perf_thread_ended) >= kfree_nrealthreads) { 649 + if (atomic_inc_return(&n_kfree_scale_thread_ended) >= kfree_nrealthreads) { 650 650 end_time = ktime_get_mono_fast_ns(); 651 651 652 652 if (gp_exp) ··· 656 656 657 657 pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n", 658 658 (unsigned long long)(end_time - start_time), kfree_loops, 659 - rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started), 659 + rcuscale_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started), 660 660 (mem_begin - mem_during) >> (20 - PAGE_SHIFT)); 661 661 662 662 if (shutdown) { ··· 665 665 } 666 666 } 667 667 668 - torture_kthread_stopping("kfree_perf_thread"); 668 + torture_kthread_stopping("kfree_scale_thread"); 669 669 return 0; 670 670 } 671 671 672 672 static void 673 - kfree_perf_cleanup(void) 673 + kfree_scale_cleanup(void) 674 674 { 675 675 int i; 676 676 ··· 679 679 680 680 if 
(kfree_reader_tasks) { 681 681 for (i = 0; i < kfree_nrealthreads; i++) 682 - torture_stop_kthread(kfree_perf_thread, 682 + torture_stop_kthread(kfree_scale_thread, 683 683 kfree_reader_tasks[i]); 684 684 kfree(kfree_reader_tasks); 685 685 } ··· 691 691 * shutdown kthread. Just waits to be awakened, then shuts down system. 692 692 */ 693 693 static int 694 - kfree_perf_shutdown(void *arg) 694 + kfree_scale_shutdown(void *arg) 695 695 { 696 696 wait_event(shutdown_wq, 697 - atomic_read(&n_kfree_perf_thread_ended) >= kfree_nrealthreads); 697 + atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); 698 698 699 699 smp_mb(); /* Wake before output. */ 700 700 701 - kfree_perf_cleanup(); 701 + kfree_scale_cleanup(); 702 702 kernel_power_off(); 703 703 return -EINVAL; 704 704 } 705 705 706 706 static int __init 707 - kfree_perf_init(void) 707 + kfree_scale_init(void) 708 708 { 709 709 long i; 710 710 int firsterr = 0; ··· 713 713 /* Start up the kthreads. */ 714 714 if (shutdown) { 715 715 init_waitqueue_head(&shutdown_wq); 716 - firsterr = torture_create_kthread(kfree_perf_shutdown, NULL, 716 + firsterr = torture_create_kthread(kfree_scale_shutdown, NULL, 717 717 shutdown_task); 718 718 if (firsterr) 719 719 goto unwind; ··· 730 730 } 731 731 732 732 for (i = 0; i < kfree_nrealthreads; i++) { 733 - firsterr = torture_create_kthread(kfree_perf_thread, (void *)i, 733 + firsterr = torture_create_kthread(kfree_scale_thread, (void *)i, 734 734 kfree_reader_tasks[i]); 735 735 if (firsterr) 736 736 goto unwind; 737 737 } 738 738 739 - while (atomic_read(&n_kfree_perf_thread_started) < kfree_nrealthreads) 739 + while (atomic_read(&n_kfree_scale_thread_started) < kfree_nrealthreads) 740 740 schedule_timeout_uninterruptible(1); 741 741 742 742 torture_init_end(); ··· 744 744 745 745 unwind: 746 746 torture_init_end(); 747 - kfree_perf_cleanup(); 747 + kfree_scale_cleanup(); 748 748 return firsterr; 749 749 } 750 750 751 751 static int __init 752 - rcu_perf_init(void) 752 + rcu_scale_init(void) 753 753 { 754 754 long i; 755 755 int firsterr = 0; 756 - static struct rcu_perf_ops *perf_ops[] = { 756 + static struct rcu_scale_ops *scale_ops[] = { 757 757 &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops, 758 758 }; 759 759 760 - if (!torture_init_begin(perf_type, verbose)) 760 + if (!torture_init_begin(scale_type, verbose)) 761 761 return -EBUSY; 762 762 763 - /* Process args and tell the world that the perf'er is on the job. */ 764 - for (i = 0; i < ARRAY_SIZE(perf_ops); i++) { 765 - cur_ops = perf_ops[i]; 766 - if (strcmp(perf_type, cur_ops->name) == 0) 763 + /* Process args and announce that the scalability'er is on the job. 
*/ 764 + for (i = 0; i < ARRAY_SIZE(scale_ops); i++) { 765 + cur_ops = scale_ops[i]; 766 + if (strcmp(scale_type, cur_ops->name) == 0) 767 767 break; 768 768 } 769 - if (i == ARRAY_SIZE(perf_ops)) { 770 - pr_alert("rcu-perf: invalid perf type: \"%s\"\n", perf_type); 771 - pr_alert("rcu-perf types:"); 772 - for (i = 0; i < ARRAY_SIZE(perf_ops); i++) 773 - pr_cont(" %s", perf_ops[i]->name); 769 + if (i == ARRAY_SIZE(scale_ops)) { 770 + pr_alert("rcu-scale: invalid scale type: \"%s\"\n", scale_type); 771 + pr_alert("rcu-scale types:"); 772 + for (i = 0; i < ARRAY_SIZE(scale_ops); i++) 773 + pr_cont(" %s", scale_ops[i]->name); 774 774 pr_cont("\n"); 775 - WARN_ON(!IS_MODULE(CONFIG_RCU_PERF_TEST)); 775 + WARN_ON(!IS_MODULE(CONFIG_RCU_SCALE_TEST)); 776 776 firsterr = -EINVAL; 777 777 cur_ops = NULL; 778 778 goto unwind; ··· 781 781 cur_ops->init(); 782 782 783 783 if (kfree_rcu_test) 784 - return kfree_perf_init(); 784 + return kfree_scale_init(); 785 785 786 786 nrealwriters = compute_real(nwriters); 787 787 nrealreaders = compute_real(nreaders); 788 - atomic_set(&n_rcu_perf_reader_started, 0); 789 - atomic_set(&n_rcu_perf_writer_started, 0); 790 - atomic_set(&n_rcu_perf_writer_finished, 0); 791 - rcu_perf_print_module_parms(cur_ops, "Start of test"); 788 + atomic_set(&n_rcu_scale_reader_started, 0); 789 + atomic_set(&n_rcu_scale_writer_started, 0); 790 + atomic_set(&n_rcu_scale_writer_finished, 0); 791 + rcu_scale_print_module_parms(cur_ops, "Start of test"); 792 792 793 793 /* Start up the kthreads. */ 794 794 795 795 if (shutdown) { 796 796 init_waitqueue_head(&shutdown_wq); 797 - firsterr = torture_create_kthread(rcu_perf_shutdown, NULL, 797 + firsterr = torture_create_kthread(rcu_scale_shutdown, NULL, 798 798 shutdown_task); 799 799 if (firsterr) 800 800 goto unwind; ··· 803 803 reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]), 804 804 GFP_KERNEL); 805 805 if (reader_tasks == NULL) { 806 - VERBOSE_PERFOUT_ERRSTRING("out of memory"); 806 + VERBOSE_SCALEOUT_ERRSTRING("out of memory"); 807 807 firsterr = -ENOMEM; 808 808 goto unwind; 809 809 } 810 810 for (i = 0; i < nrealreaders; i++) { 811 - firsterr = torture_create_kthread(rcu_perf_reader, (void *)i, 811 + firsterr = torture_create_kthread(rcu_scale_reader, (void *)i, 812 812 reader_tasks[i]); 813 813 if (firsterr) 814 814 goto unwind; 815 815 } 816 - while (atomic_read(&n_rcu_perf_reader_started) < nrealreaders) 816 + while (atomic_read(&n_rcu_scale_reader_started) < nrealreaders) 817 817 schedule_timeout_uninterruptible(1); 818 818 writer_tasks = kcalloc(nrealwriters, sizeof(reader_tasks[0]), 819 819 GFP_KERNEL); ··· 823 823 kcalloc(nrealwriters, sizeof(*writer_n_durations), 824 824 GFP_KERNEL); 825 825 if (!writer_tasks || !writer_durations || !writer_n_durations) { 826 - VERBOSE_PERFOUT_ERRSTRING("out of memory"); 826 + VERBOSE_SCALEOUT_ERRSTRING("out of memory"); 827 827 firsterr = -ENOMEM; 828 828 goto unwind; 829 829 } ··· 835 835 firsterr = -ENOMEM; 836 836 goto unwind; 837 837 } 838 - firsterr = torture_create_kthread(rcu_perf_writer, (void *)i, 838 + firsterr = torture_create_kthread(rcu_scale_writer, (void *)i, 839 839 writer_tasks[i]); 840 840 if (firsterr) 841 841 goto unwind; ··· 845 845 846 846 unwind: 847 847 torture_init_end(); 848 - rcu_perf_cleanup(); 848 + rcu_scale_cleanup(); 849 849 return firsterr; 850 850 } 851 851 852 - module_init(rcu_perf_init); 853 - module_exit(rcu_perf_cleanup); 852 + module_init(rcu_scale_init); 853 + module_exit(rcu_scale_cleanup);
+42 -19
kernel/rcu/rcutorture.c
··· 52 52 MODULE_LICENSE("GPL"); 53 53 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com> and Josh Triplett <josh@joshtriplett.org>"); 54 54 55 - #ifndef data_race 56 - #define data_race(expr) \ 57 - ({ \ 58 - expr; \ 59 - }) 60 - #endif 61 - #ifndef ASSERT_EXCLUSIVE_WRITER 62 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 63 - #endif 64 - #ifndef ASSERT_EXCLUSIVE_ACCESS 65 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 66 - #endif 67 - 68 55 /* Bits for ->extendables field, extendables param, and related definitions. */ 69 56 #define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */ 70 57 #define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1) ··· 87 100 "Use normal (non-expedited) GP wait primitives"); 88 101 torture_param(bool, gp_sync, false, "Use synchronous GP wait primitives"); 89 102 torture_param(int, irqreader, 1, "Allow RCU readers from irq handlers"); 103 + torture_param(int, leakpointer, 0, "Leak pointer dereferences from readers"); 90 104 torture_param(int, n_barrier_cbs, 0, 91 105 "# of callbacks/kthreads for barrier testing"); 92 106 torture_param(int, nfakewriters, 4, "Number of RCU fake writer threads"); ··· 173 185 static unsigned long n_read_exits; 174 186 static struct list_head rcu_torture_removed; 175 187 static unsigned long shutdown_jiffies; 188 + static unsigned long start_gp_seq; 176 189 177 190 static int rcu_torture_writer_state; 178 191 #define RTWS_FIXED_DELAY 0 ··· 1402 1413 preempt_enable(); 1403 1414 rcutorture_one_extend(&readstate, 0, trsp, rtrsp); 1404 1415 WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK); 1416 + // This next splat is expected behavior if leakpointer, especially 1417 + // for CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels. 1418 + WARN_ON_ONCE(leakpointer && READ_ONCE(p->rtort_pipe_count) > 1); 1405 1419 1406 1420 /* If error or close call, record the sequence of reader protections. */ 1407 1421 if ((pipe_count > 1 || completed > 1) && !xchg(&err_segs_recorded, 1)) { ··· 1800 1808 unsigned long rcu_launder_gp_seq_start; 1801 1809 }; 1802 1810 1811 + static DEFINE_MUTEX(rcu_fwd_mutex); 1803 1812 static struct rcu_fwd *rcu_fwds; 1804 1813 static bool rcu_fwd_emergency_stop; 1805 1814 ··· 2067 2074 static int rcutorture_oom_notify(struct notifier_block *self, 2068 2075 unsigned long notused, void *nfreed) 2069 2076 { 2070 - struct rcu_fwd *rfp = rcu_fwds; 2077 + struct rcu_fwd *rfp; 2071 2078 2079 + mutex_lock(&rcu_fwd_mutex); 2080 + rfp = rcu_fwds; 2081 + if (!rfp) { 2082 + mutex_unlock(&rcu_fwd_mutex); 2083 + return NOTIFY_OK; 2084 + } 2072 2085 WARN(1, "%s invoked upon OOM during forward-progress testing.\n", 2073 2086 __func__); 2074 2087 rcu_torture_fwd_cb_hist(rfp); ··· 2092 2093 smp_mb(); /* Frees before return to avoid redoing OOM. */ 2093 2094 (*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */ 2094 2095 pr_info("%s returning after OOM processing.\n", __func__); 2096 + mutex_unlock(&rcu_fwd_mutex); 2095 2097 return NOTIFY_OK; 2096 2098 } 2097 2099 ··· 2114 2114 do { 2115 2115 schedule_timeout_interruptible(fwd_progress_holdoff * HZ); 2116 2116 WRITE_ONCE(rcu_fwd_emergency_stop, false); 2117 - register_oom_notifier(&rcutorture_oom_nb); 2118 2117 if (!IS_ENABLED(CONFIG_TINY_RCU) || 2119 2118 rcu_inkernel_boot_has_ended()) 2120 2119 rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries); 2121 2120 if (rcu_inkernel_boot_has_ended()) 2122 2121 rcu_torture_fwd_prog_cr(rfp); 2123 - unregister_oom_notifier(&rcutorture_oom_nb); 2124 2122 2125 2123 /* Avoid slow periods, better to test when busy. 
*/ 2126 2124 stutter_wait("rcu_torture_fwd_prog"); ··· 2158 2160 return -ENOMEM; 2159 2161 spin_lock_init(&rfp->rcu_fwd_lock); 2160 2162 rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head; 2163 + mutex_lock(&rcu_fwd_mutex); 2164 + rcu_fwds = rfp; 2165 + mutex_unlock(&rcu_fwd_mutex); 2166 + register_oom_notifier(&rcutorture_oom_nb); 2161 2167 return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task); 2168 + } 2169 + 2170 + static void rcu_torture_fwd_prog_cleanup(void) 2171 + { 2172 + struct rcu_fwd *rfp; 2173 + 2174 + torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task); 2175 + rfp = rcu_fwds; 2176 + mutex_lock(&rcu_fwd_mutex); 2177 + rcu_fwds = NULL; 2178 + mutex_unlock(&rcu_fwd_mutex); 2179 + unregister_oom_notifier(&rcutorture_oom_nb); 2180 + kfree(rfp); 2162 2181 } 2163 2182 2164 2183 /* Callback function for RCU barrier testing. */ ··· 2475 2460 show_rcu_gp_kthreads(); 2476 2461 rcu_torture_read_exit_cleanup(); 2477 2462 rcu_torture_barrier_cleanup(); 2478 - torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task); 2463 + rcu_torture_fwd_prog_cleanup(); 2479 2464 torture_stop_kthread(rcu_torture_stall, stall_task); 2480 2465 torture_stop_kthread(rcu_torture_writer, writer_task); 2481 2466 ··· 2497 2482 2498 2483 rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); 2499 2484 srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); 2500 - pr_alert("%s: End-test grace-period state: g%lu f%#x\n", 2501 - cur_ops->name, gp_seq, flags); 2485 + pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n", 2486 + cur_ops->name, (long)gp_seq, flags, 2487 + rcutorture_seq_diff(gp_seq, start_gp_seq)); 2502 2488 torture_stop_kthread(rcu_torture_stats, stats_task); 2503 2489 torture_stop_kthread(rcu_torture_fqs, fqs_task); 2504 2490 if (rcu_torture_can_boost()) ··· 2623 2607 long i; 2624 2608 int cpu; 2625 2609 int firsterr = 0; 2610 + int flags = 0; 2611 + unsigned long gp_seq = 0; 2626 2612 static struct rcu_torture_ops *torture_ops[] = { 2627 2613 &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, 2628 2614 &busted_srcud_ops, &tasks_ops, &tasks_rude_ops, ··· 2667 2649 nrealreaders = 1; 2668 2650 } 2669 2651 rcu_torture_print_module_parms(cur_ops, "Start of test"); 2652 + rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq); 2653 + srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq); 2654 + start_gp_seq = gp_seq; 2655 + pr_alert("%s: Start-test grace-period state: g%ld f%#x\n", 2656 + cur_ops->name, (long)gp_seq, flags); 2670 2657 2671 2658 /* Set up the freelist. */ 2672 2659
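The rcutorture.c hunks above register the OOM notifier once at forward-progress init time and guard the shared rcu_fwds pointer with rcu_fwd_mutex, so the notifier observes either a fully published structure or NULL, never one that is being torn down. A userspace sketch of that publish/guard/retire pattern, with invented names and a plain pthread mutex standing in for the kernel's mutex:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct fwd_state { long n_launders; };

    static pthread_mutex_t fwd_mutex = PTHREAD_MUTEX_INITIALIZER;
    static struct fwd_state *fwd_state;     /* Published only under fwd_mutex. */

    /* Notifier-style callback: tolerate running before init or after cleanup. */
    static int oom_like_notify(void)
    {
        pthread_mutex_lock(&fwd_mutex);
        if (!fwd_state) {
            pthread_mutex_unlock(&fwd_mutex);
            return 0;                       /* Nothing to report on. */
        }
        printf("forward-progress state: %ld launders\n", fwd_state->n_launders);
        pthread_mutex_unlock(&fwd_mutex);
        return 0;
    }

    static void fwd_init(void)
    {
        struct fwd_state *p = calloc(1, sizeof(*p));

        pthread_mutex_lock(&fwd_mutex);
        fwd_state = p;                      /* Publish before anyone can use it. */
        pthread_mutex_unlock(&fwd_mutex);
    }

    static void fwd_cleanup(void)
    {
        struct fwd_state *p;

        pthread_mutex_lock(&fwd_mutex);
        p = fwd_state;
        fwd_state = NULL;                   /* Retire before freeing. */
        pthread_mutex_unlock(&fwd_mutex);
        free(p);
    }

    int main(void)
    {
        oom_like_notify();                  /* Before init: safely does nothing. */
        fwd_init();
        oom_like_notify();
        fwd_cleanup();
        oom_like_notify();                  /* After cleanup: safely does nothing. */
        return 0;
    }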
+5 -3
kernel/rcu/refscale.c
··· 546 546 // Print the average of all experiments 547 547 SCALEOUT("END OF TEST. Calculating average duration per loop (nanoseconds)...\n"); 548 548 549 - buf[0] = 0; 550 - strcat(buf, "\n"); 551 - strcat(buf, "Runs\tTime(ns)\n"); 549 + if (!errexit) { 550 + buf[0] = 0; 551 + strcat(buf, "\n"); 552 + strcat(buf, "Runs\tTime(ns)\n"); 553 + } 552 554 553 555 for (exp = 0; exp < nruns; exp++) { 554 556 u64 avg;
-13
kernel/rcu/srcutree.c
··· 29 29 #include "rcu.h" 30 30 #include "rcu_segcblist.h" 31 31 32 - #ifndef data_race 33 - #define data_race(expr) \ 34 - ({ \ 35 - expr; \ 36 - }) 37 - #endif 38 - #ifndef ASSERT_EXCLUSIVE_WRITER 39 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 40 - #endif 41 - #ifndef ASSERT_EXCLUSIVE_ACCESS 42 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 43 - #endif 44 - 45 32 /* Holdoff in nanoseconds for auto-expediting. */ 46 33 #define DEFAULT_SRCU_EXP_HOLDOFF (25 * 1000) 47 34 static ulong exp_holdoff = DEFAULT_SRCU_EXP_HOLDOFF;
+111 -56
kernel/rcu/tree.c
··· 70 70 #endif 71 71 #define MODULE_PARAM_PREFIX "rcutree." 72 72 73 - #ifndef data_race 74 - #define data_race(expr) \ 75 - ({ \ 76 - expr; \ 77 - }) 78 - #endif 79 - #ifndef ASSERT_EXCLUSIVE_WRITER 80 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 81 - #endif 82 - #ifndef ASSERT_EXCLUSIVE_ACCESS 83 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 84 - #endif 85 - 86 73 /* Data structures. */ 87 74 88 75 /* ··· 164 177 module_param(gp_init_delay, int, 0444); 165 178 static int gp_cleanup_delay; 166 179 module_param(gp_cleanup_delay, int, 0444); 180 + 181 + // Add delay to rcu_read_unlock() for strict grace periods. 182 + static int rcu_unlock_delay; 183 + #ifdef CONFIG_RCU_STRICT_GRACE_PERIOD 184 + module_param(rcu_unlock_delay, int, 0444); 185 + #endif 167 186 168 187 /* 169 188 * This rcu parameter is runtime-read-only. It reflects ··· 461 468 return __this_cpu_read(rcu_data.dynticks_nesting) == 0; 462 469 } 463 470 464 - #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch ... */ 465 - #define DEFAULT_MAX_RCU_BLIMIT 10000 /* ... even during callback flood. */ 471 + #define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10) 472 + // Maximum callbacks per rcu_do_batch ... 473 + #define DEFAULT_MAX_RCU_BLIMIT 10000 // ... even during callback flood. 466 474 static long blimit = DEFAULT_RCU_BLIMIT; 467 - #define DEFAULT_RCU_QHIMARK 10000 /* If this many pending, ignore blimit. */ 475 + #define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit. 468 476 static long qhimark = DEFAULT_RCU_QHIMARK; 469 - #define DEFAULT_RCU_QLOMARK 100 /* Once only this many pending, use blimit. */ 477 + #define DEFAULT_RCU_QLOMARK 100 // Once only this many pending, use blimit. 470 478 static long qlowmark = DEFAULT_RCU_QLOMARK; 471 479 #define DEFAULT_RCU_QOVLD_MULT 2 472 480 #define DEFAULT_RCU_QOVLD (DEFAULT_RCU_QOVLD_MULT * DEFAULT_RCU_QHIMARK) 473 - static long qovld = DEFAULT_RCU_QOVLD; /* If this many pending, hammer QS. */ 474 - static long qovld_calc = -1; /* No pre-initialization lock acquisitions! */ 481 + static long qovld = DEFAULT_RCU_QOVLD; // If this many pending, hammer QS. 482 + static long qovld_calc = -1; // No pre-initialization lock acquisitions! 475 483 476 484 module_param(blimit, long, 0444); 477 485 module_param(qhimark, long, 0444); 478 486 module_param(qlowmark, long, 0444); 479 487 module_param(qovld, long, 0444); 480 488 481 - static ulong jiffies_till_first_fqs = ULONG_MAX; 489 + static ulong jiffies_till_first_fqs = IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 0 : ULONG_MAX; 482 490 static ulong jiffies_till_next_fqs = ULONG_MAX; 483 491 static bool rcu_kick_kthreads; 484 492 static int rcu_divisor = 7; ··· 1086 1092 } 1087 1093 } 1088 1094 1089 - noinstr bool __rcu_is_watching(void) 1090 - { 1091 - return !rcu_dynticks_curr_cpu_in_eqs(); 1092 - } 1093 - 1094 1095 /** 1095 1096 * rcu_is_watching - see if RCU thinks that the current CPU is not idle 1096 1097 * ··· 1218 1229 return 1; 1219 1230 } 1220 1231 1221 - /* If waiting too long on an offline CPU, complain. */ 1222 - if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp)) && 1223 - time_after(jiffies, rcu_state.gp_start + HZ)) { 1232 + /* 1233 + * Complain if a CPU that is considered to be offline from RCU's 1234 + * perspective has not yet reported a quiescent state. 
After all, 1235 + * the offline CPU should have reported a quiescent state during 1236 + * the CPU-offline process, or, failing that, by rcu_gp_init() 1237 + * if it ran concurrently with either the CPU going offline or the 1238 + * last task on a leaf rcu_node structure exiting its RCU read-side 1239 + * critical section while all CPUs corresponding to that structure 1240 + * are offline. This added warning detects bugs in any of these 1241 + * code paths. 1242 + * 1243 + * The rcu_node structure's ->lock is held here, which excludes 1244 + * the relevant portions the CPU-hotplug code, the grace-period 1245 + * initialization code, and the rcu_read_unlock() code paths. 1246 + * 1247 + * For more detail, please refer to the "Hotplug CPU" section 1248 + * of RCU's Requirements documentation. 1249 + */ 1250 + if (WARN_ON_ONCE(!(rdp->grpmask & rcu_rnp_online_cpus(rnp)))) { 1224 1251 bool onl; 1225 1252 struct rcu_node *rnp1; 1226 1253 1227 - WARN_ON(1); /* Offline CPUs are supposed to report QS! */ 1228 1254 pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n", 1229 1255 __func__, rnp->grplo, rnp->grphi, rnp->level, 1230 1256 (long)rnp->gp_seq, (long)rnp->completedqs); ··· 1502 1498 1503 1499 /* Trace depending on how much we were able to accelerate. */ 1504 1500 if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL)) 1505 - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB")); 1501 + trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccWaitCB")); 1506 1502 else 1507 - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB")); 1503 + trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB")); 1504 + 1508 1505 return ret; 1509 1506 } 1510 1507 ··· 1581 1576 } 1582 1577 1583 1578 /* 1579 + * In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, attempt to generate a 1580 + * quiescent state. This is intended to be invoked when the CPU notices 1581 + * a new grace period. 1582 + */ 1583 + static void rcu_strict_gp_check_qs(void) 1584 + { 1585 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) { 1586 + rcu_read_lock(); 1587 + rcu_read_unlock(); 1588 + } 1589 + } 1590 + 1591 + /* 1584 1592 * Update CPU-local rcu_data state to record the beginnings and ends of 1585 1593 * grace periods. The caller must hold the ->lock of the leaf rcu_node 1586 1594 * structure corresponding to the current CPU, and must have irqs disabled. ··· 1663 1645 } 1664 1646 needwake = __note_gp_changes(rnp, rdp); 1665 1647 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 1648 + rcu_strict_gp_check_qs(); 1666 1649 if (needwake) 1667 1650 rcu_gp_kthread_wake(); 1668 1651 } ··· 1699 1680 schedule_timeout_idle(duration); 1700 1681 pr_alert("%s: Wait complete\n", __func__); 1701 1682 } 1683 + } 1684 + 1685 + /* 1686 + * Handler for on_each_cpu() to invoke the target CPU's RCU core 1687 + * processing. 1688 + */ 1689 + static void rcu_strict_gp_boundary(void *unused) 1690 + { 1691 + invoke_rcu_core(); 1702 1692 } 1703 1693 1704 1694 /* ··· 1748 1720 raw_spin_unlock_irq_rcu_node(rnp); 1749 1721 1750 1722 /* 1751 - * Apply per-leaf buffered online and offline operations to the 1752 - * rcu_node tree. Note that this new grace period need not wait 1753 - * for subsequent online CPUs, and that quiescent-state forcing 1754 - * will handle subsequent offline CPUs. 1723 + * Apply per-leaf buffered online and offline operations to 1724 + * the rcu_node tree. 
Note that this new grace period need not 1725 + * wait for subsequent online CPUs, and that RCU hooks in the CPU 1726 + * offlining path, when combined with checks in this function, 1727 + * will handle CPUs that are currently going offline or that will 1728 + * go offline later. Please also refer to "Hotplug CPU" section 1729 + * of RCU's Requirements documentation. 1755 1730 */ 1756 1731 rcu_state.gp_state = RCU_GP_ONOFF; 1757 1732 rcu_for_each_leaf_node(rnp) { ··· 1840 1809 cond_resched_tasks_rcu_qs(); 1841 1810 WRITE_ONCE(rcu_state.gp_activity, jiffies); 1842 1811 } 1812 + 1813 + // If strict, make all CPUs aware of new grace period. 1814 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 1815 + on_each_cpu(rcu_strict_gp_boundary, NULL, 0); 1843 1816 1844 1817 return true; 1845 1818 } ··· 1933 1898 break; 1934 1899 /* If time for quiescent-state forcing, do it. */ 1935 1900 if (!time_after(rcu_state.jiffies_force_qs, jiffies) || 1936 - (gf & RCU_GP_FLAG_FQS)) { 1901 + (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) { 1937 1902 trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, 1938 1903 TPS("fqsstart")); 1939 1904 rcu_gp_fqs(first_gp_fqs); ··· 2061 2026 rcu_state.gp_flags & RCU_GP_FLAG_INIT); 2062 2027 } 2063 2028 raw_spin_unlock_irq_rcu_node(rnp); 2029 + 2030 + // If strict, make all CPUs aware of the end of the old grace period. 2031 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 2032 + on_each_cpu(rcu_strict_gp_boundary, NULL, 0); 2064 2033 } 2065 2034 2066 2035 /* ··· 2243 2204 * structure. This must be called from the specified CPU. 2244 2205 */ 2245 2206 static void 2246 - rcu_report_qs_rdp(int cpu, struct rcu_data *rdp) 2207 + rcu_report_qs_rdp(struct rcu_data *rdp) 2247 2208 { 2248 2209 unsigned long flags; 2249 2210 unsigned long mask; ··· 2252 2213 rcu_segcblist_is_offloaded(&rdp->cblist); 2253 2214 struct rcu_node *rnp; 2254 2215 2216 + WARN_ON_ONCE(rdp->cpu != smp_processor_id()); 2255 2217 rnp = rdp->mynode; 2256 2218 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2257 2219 if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq || ··· 2269 2229 return; 2270 2230 } 2271 2231 mask = rdp->grpmask; 2272 - if (rdp->cpu == smp_processor_id()) 2273 - rdp->core_needs_qs = false; 2232 + rdp->core_needs_qs = false; 2274 2233 if ((rnp->qsmask & mask) == 0) { 2275 2234 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 2276 2235 } else { ··· 2318 2279 * Tell RCU we are done (but rcu_report_qs_rdp() will be the 2319 2280 * judge of that). 2320 2281 */ 2321 - rcu_report_qs_rdp(rdp->cpu, rdp); 2282 + rcu_report_qs_rdp(rdp); 2322 2283 } 2323 2284 2324 2285 /* ··· 2415 2376 */ 2416 2377 static void rcu_do_batch(struct rcu_data *rdp) 2417 2378 { 2379 + int div; 2418 2380 unsigned long flags; 2419 2381 const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) && 2420 2382 rcu_segcblist_is_offloaded(&rdp->cblist); ··· 2444 2404 rcu_nocb_lock(rdp); 2445 2405 WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); 2446 2406 pending = rcu_segcblist_n_cbs(&rdp->cblist); 2447 - bl = max(rdp->blimit, pending >> rcu_divisor); 2448 - if (unlikely(bl > 100)) 2449 - tlimit = local_clock() + rcu_resched_ns; 2407 + div = READ_ONCE(rcu_divisor); 2408 + div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div; 2409 + bl = max(rdp->blimit, pending >> div); 2410 + if (unlikely(bl > 100)) { 2411 + long rrn = READ_ONCE(rcu_resched_ns); 2412 + 2413 + rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? 
NSEC_PER_SEC : rrn; 2414 + tlimit = local_clock() + rrn; 2415 + } 2450 2416 trace_rcu_batch_start(rcu_state.name, 2451 2417 rcu_segcblist_n_cbs(&rdp->cblist), bl); 2452 2418 rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl); ··· 2593 2547 raw_spin_lock_irqsave_rcu_node(rnp, flags); 2594 2548 rcu_state.cbovldnext |= !!rnp->cbovldmask; 2595 2549 if (rnp->qsmask == 0) { 2596 - if (!IS_ENABLED(CONFIG_PREEMPT_RCU) || 2597 - rcu_preempt_blocked_readers_cgp(rnp)) { 2550 + if (rcu_preempt_blocked_readers_cgp(rnp)) { 2598 2551 /* 2599 2552 * No point in scanning bits because they 2600 2553 * are all zero. But we might need to ··· 2661 2616 } 2662 2617 EXPORT_SYMBOL_GPL(rcu_force_quiescent_state); 2663 2618 2619 + // Workqueue handler for an RCU reader for kernels enforcing struct RCU 2620 + // grace periods. 2621 + static void strict_work_handler(struct work_struct *work) 2622 + { 2623 + rcu_read_lock(); 2624 + rcu_read_unlock(); 2625 + } 2626 + 2664 2627 /* Perform RCU core processing work for the current CPU. */ 2665 2628 static __latent_entropy void rcu_core(void) 2666 2629 { ··· 2713 2660 /* Do any needed deferred wakeups of rcuo kthreads. */ 2714 2661 do_nocb_deferred_wakeup(rdp); 2715 2662 trace_rcu_utilization(TPS("End RCU core")); 2663 + 2664 + // If strict GPs, schedule an RCU reader in a clean environment. 2665 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 2666 + queue_work_on(rdp->cpu, rcu_gp_wq, &rdp->strict_work); 2716 2667 } 2717 2668 2718 2669 static void rcu_core_si(struct softirq_action *h) ··· 3500 3443 unsigned long count = 0; 3501 3444 3502 3445 /* Snapshot count of all CPUs */ 3503 - for_each_online_cpu(cpu) { 3446 + for_each_possible_cpu(cpu) { 3504 3447 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3505 3448 3506 3449 count += READ_ONCE(krcp->count); ··· 3515 3458 int cpu, freed = 0; 3516 3459 unsigned long flags; 3517 3460 3518 - for_each_online_cpu(cpu) { 3461 + for_each_possible_cpu(cpu) { 3519 3462 int count; 3520 3463 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3521 3464 ··· 3548 3491 int cpu; 3549 3492 unsigned long flags; 3550 3493 3551 - for_each_online_cpu(cpu) { 3494 + for_each_possible_cpu(cpu) { 3552 3495 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3553 3496 3554 3497 raw_spin_lock_irqsave(&krcp->lock, flags); ··· 3912 3855 3913 3856 /* Set up local state, ensuring consistent view of global state. */ 3914 3857 rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu); 3858 + INIT_WORK(&rdp->strict_work, strict_work_handler); 3915 3859 WARN_ON_ONCE(rdp->dynticks_nesting != 1); 3916 3860 WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp))); 3917 3861 rdp->rcu_ofl_gp_seq = rcu_state.gp_seq; ··· 4031 3973 return 0; 4032 3974 } 4033 3975 4034 - static DEFINE_PER_CPU(int, rcu_cpu_started); 4035 - 4036 3976 /* 4037 3977 * Mark the specified CPU as being online so that subsequent grace periods 4038 3978 * (both expedited and normal) will wait on it. 
Note that this means that ··· 4050 3994 struct rcu_node *rnp; 4051 3995 bool newcpu; 4052 3996 4053 - if (per_cpu(rcu_cpu_started, cpu)) 4054 - return; 4055 - 4056 - per_cpu(rcu_cpu_started, cpu) = 1; 4057 - 4058 3997 rdp = per_cpu_ptr(&rcu_data, cpu); 3998 + if (rdp->cpu_started) 3999 + return; 4000 + rdp->cpu_started = true; 4001 + 4059 4002 rnp = rdp->mynode; 4060 4003 mask = rdp->grpmask; 4061 4004 raw_spin_lock_irqsave_rcu_node(rnp, flags); ··· 4114 4059 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4115 4060 raw_spin_unlock(&rcu_state.ofl_lock); 4116 4061 4117 - per_cpu(rcu_cpu_started, cpu) = 0; 4062 + rdp->cpu_started = false; 4118 4063 } 4119 4064 4120 4065 /*
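Among the tree.c changes above, rcu_do_batch() now snapshots the rcu_divisor and rcu_resched_ns module parameters with READ_ONCE() and bounds them before use, so that nonsense values written at runtime cannot wreck the batch-size or time-limit calculations. A userspace sketch of that snapshot-then-bound idiom, with made-up parameter names and the READ_ONCE() snapshotting elided:

    #include <stdio.h>

    #define NSEC_PER_MSEC 1000000L
    #define NSEC_PER_SEC  1000000000L

    /* Pretend runtime-writable tunables. */
    static int divisor_param = 7;
    static long resched_ns_param = 3 * NSEC_PER_MSEC;

    int main(void)
    {
        long pending = 1L << 20;            /* Pretend count of queued callbacks. */
        int div = divisor_param;            /* Snapshot once, then sanity-bound. */
        long rrn = resched_ns_param;

        if (div < 0)
            div = 7;                        /* Nonsense value: fall back to the default. */
        else if (div > (int)(sizeof(long) * 8 - 2))
            div = sizeof(long) * 8 - 2;
        if (rrn < NSEC_PER_MSEC)            /* Keep the time limit within [1 ms, 1 s]. */
            rrn = NSEC_PER_MSEC;
        else if (rrn > NSEC_PER_SEC)
            rrn = NSEC_PER_SEC;

        printf("batch limit: %ld callbacks, time limit: %ld ns\n",
               pending >> div, rrn);
        return 0;
    }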
+2
kernel/rcu/tree.h
··· 156 156 bool beenonline; /* CPU online at least once. */ 157 157 bool gpwrap; /* Possible ->gp_seq wrap. */ 158 158 bool exp_deferred_qs; /* This CPU awaiting a deferred QS? */ 159 + bool cpu_started; /* RCU watching this onlining CPU. */ 159 160 struct rcu_node *mynode; /* This CPU's leaf of hierarchy */ 160 161 unsigned long grpmask; /* Mask to apply to leaf qsmask. */ 161 162 unsigned long ticks_this_gp; /* The number of scheduling-clock */ ··· 165 164 /* period it is aware of. */ 166 165 struct irq_work defer_qs_iw; /* Obtain later scheduler attention. */ 167 166 bool defer_qs_iw_pending; /* Scheduler attention pending? */ 167 + struct work_struct strict_work; /* Schedule readers for strict GPs. */ 168 168 169 169 /* 2) batch handling */ 170 170 struct rcu_segcblist cblist; /* Segmented callback list, with */
+2 -4
kernel/rcu/tree_exp.h
··· 732 732 /* Invoked on each online non-idle CPU for expedited quiescent state. */ 733 733 static void rcu_exp_handler(void *unused) 734 734 { 735 - struct rcu_data *rdp; 736 - struct rcu_node *rnp; 735 + struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 736 + struct rcu_node *rnp = rdp->mynode; 737 737 738 - rdp = this_cpu_ptr(&rcu_data); 739 - rnp = rdp->mynode; 740 738 if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || 741 739 __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) 742 740 return;
+34 -6
kernel/rcu/tree_plugin.h
··· 36 36 pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n"); 37 37 if (IS_ENABLED(CONFIG_PROVE_RCU)) 38 38 pr_info("\tRCU lockdep checking is enabled.\n"); 39 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 40 + pr_info("\tRCU strict (and thus non-scalable) grace periods enabled.\n"); 39 41 if (RCU_NUM_LVLS >= 4) 40 42 pr_info("\tFour(or more)-level hierarchy is enabled.\n"); 41 43 if (RCU_FANOUT_LEAF != 16) ··· 376 374 rcu_preempt_read_enter(); 377 375 if (IS_ENABLED(CONFIG_PROVE_LOCKING)) 378 376 WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX); 377 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && rcu_state.gp_kthread) 378 + WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true); 379 379 barrier(); /* critical section after entry code. */ 380 380 } 381 381 EXPORT_SYMBOL_GPL(__rcu_read_lock); ··· 459 455 return; 460 456 } 461 457 t->rcu_read_unlock_special.s = 0; 462 - if (special.b.need_qs) 463 - rcu_qs(); 458 + if (special.b.need_qs) { 459 + if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) { 460 + rcu_report_qs_rdp(rdp); 461 + udelay(rcu_unlock_delay); 462 + } else { 463 + rcu_qs(); 464 + } 465 + } 464 466 465 467 /* 466 468 * Respond to a request by an expedited grace period for a ··· 777 767 } 778 768 779 769 #else /* #ifdef CONFIG_PREEMPT_RCU */ 770 + 771 + /* 772 + * If strict grace periods are enabled, and if the calling 773 + * __rcu_read_unlock() marks the beginning of a quiescent state, immediately 774 + * report that quiescent state and, if requested, spin for a bit. 775 + */ 776 + void rcu_read_unlock_strict(void) 777 + { 778 + struct rcu_data *rdp; 779 + 780 + if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) || 781 + irqs_disabled() || preempt_count() || !rcu_state.gp_kthread) 782 + return; 783 + rdp = this_cpu_ptr(&rcu_data); 784 + rcu_report_qs_rdp(rdp); 785 + udelay(rcu_unlock_delay); 786 + } 787 + EXPORT_SYMBOL_GPL(rcu_read_unlock_strict); 780 788 781 789 /* 782 790 * Tell them what RCU they are running. ··· 1954 1926 * nearest grace period (if any) to wait for next. The CB kthreads 1955 1927 * and the global grace-period kthread are awakened if needed. 1956 1928 */ 1929 + WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp); 1957 1930 for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) { 1958 1931 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check")); 1959 1932 rcu_nocb_lock_irqsave(rdp, flags); ··· 2440 2411 return; 2441 2412 2442 2413 waslocked = raw_spin_is_locked(&rdp->nocb_gp_lock); 2443 - wastimer = timer_pending(&rdp->nocb_timer); 2414 + wastimer = timer_pending(&rdp->nocb_bypass_timer); 2444 2415 wassleep = swait_active(&rdp->nocb_gp_wq); 2445 - if (!rdp->nocb_defer_wakeup && !rdp->nocb_gp_sleep && 2446 - !waslocked && !wastimer && !wassleep) 2416 + if (!rdp->nocb_gp_sleep && !waslocked && !wastimer && !wassleep) 2447 2417 return; /* Nothing untowards. */ 2448 2418 2449 - pr_info(" !!! %c%c%c%c %c\n", 2419 + pr_info(" nocb GP activity on CB-only CPU!!! %c%c%c%c %c\n", 2450 2420 "lL"[waslocked], 2451 2421 "dD"[!!rdp->nocb_defer_wakeup], 2452 2422 "tT"[wastimer],
+4 -4
kernel/rcu/tree_stall.h
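The tree_stall.h hunks below wrap reads of the rcu_kick_kthreads and rcu_cpu_stall_ftrace_dump module parameters in READ_ONCE(), since both may be updated at runtime while the stall-warning code is consulting them. As a loose userspace analogue, here is the same "snapshot a concurrently writable flag once per decision" pattern, with a C11 atomic and invented names standing in for READ_ONCE() on a kernel parameter:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for a runtime-writable module parameter. */
    static atomic_bool kick_kthreads;

    static void maybe_kick(void)
    {
        /* Snapshot the flag once and base the whole decision on that read. */
        bool kick = atomic_load_explicit(&kick_kthreads, memory_order_relaxed);

        if (kick)
            puts("kicking grace-period kthread");
    }

    int main(void)
    {
        maybe_kick();                                   /* Flag clear: no-op. */
        atomic_store_explicit(&kick_kthreads, true, memory_order_relaxed);
        maybe_kick();                                   /* Flag set: acts. */
        return 0;
    }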
··· 158 158 { 159 159 unsigned long j; 160 160 161 - if (!rcu_kick_kthreads) 161 + if (!READ_ONCE(rcu_kick_kthreads)) 162 162 return; 163 163 j = READ_ONCE(rcu_state.jiffies_kick_kthreads); 164 164 if (time_after(jiffies, j) && rcu_state.gp_kthread && ··· 580 580 unsigned long js; 581 581 struct rcu_node *rnp; 582 582 583 - if ((rcu_stall_is_suppressed() && !rcu_kick_kthreads) || 583 + if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) || 584 584 !rcu_gp_in_progress()) 585 585 return; 586 586 rcu_stall_kick_kthreads(); ··· 623 623 624 624 /* We haven't checked in, so go dump stack. */ 625 625 print_cpu_stall(gps); 626 - if (rcu_cpu_stall_ftrace_dump) 626 + if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) 627 627 rcu_ftrace_dump(DUMP_ALL); 628 628 629 629 } else if (rcu_gp_in_progress() && ··· 632 632 633 633 /* They had a few time units to dump stack, so complain. */ 634 634 print_other_cpu_stall(gs2, gps); 635 - if (rcu_cpu_stall_ftrace_dump) 635 + if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) 636 636 rcu_ftrace_dump(DUMP_ALL); 637 637 } 638 638 }
-13
kernel/rcu/update.c
··· 53 53 #endif 54 54 #define MODULE_PARAM_PREFIX "rcupdate." 55 55 56 - #ifndef data_race 57 - #define data_race(expr) \ 58 - ({ \ 59 - expr; \ 60 - }) 61 - #endif 62 - #ifndef ASSERT_EXCLUSIVE_WRITER 63 - #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0) 64 - #endif 65 - #ifndef ASSERT_EXCLUSIVE_ACCESS 66 - #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0) 67 - #endif 68 - 69 56 #ifndef CONFIG_TINY_RCU 70 57 module_param(rcu_expedited, int, 0); 71 58 module_param(rcu_normal, int, 0);
+575
kernel/scftorture.c
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + // 3 + // Torture test for smp_call_function() and friends. 4 + // 5 + // Copyright (C) Facebook, 2020. 6 + // 7 + // Author: Paul E. McKenney <paulmck@kernel.org> 8 + 9 + #define pr_fmt(fmt) fmt 10 + 11 + #include <linux/atomic.h> 12 + #include <linux/bitops.h> 13 + #include <linux/completion.h> 14 + #include <linux/cpu.h> 15 + #include <linux/delay.h> 16 + #include <linux/err.h> 17 + #include <linux/init.h> 18 + #include <linux/interrupt.h> 19 + #include <linux/kthread.h> 20 + #include <linux/kernel.h> 21 + #include <linux/mm.h> 22 + #include <linux/module.h> 23 + #include <linux/moduleparam.h> 24 + #include <linux/notifier.h> 25 + #include <linux/percpu.h> 26 + #include <linux/rcupdate.h> 27 + #include <linux/rcupdate_trace.h> 28 + #include <linux/reboot.h> 29 + #include <linux/sched.h> 30 + #include <linux/spinlock.h> 31 + #include <linux/smp.h> 32 + #include <linux/stat.h> 33 + #include <linux/srcu.h> 34 + #include <linux/slab.h> 35 + #include <linux/torture.h> 36 + #include <linux/types.h> 37 + 38 + #define SCFTORT_STRING "scftorture" 39 + #define SCFTORT_FLAG SCFTORT_STRING ": " 40 + 41 + #define SCFTORTOUT(s, x...) \ 42 + pr_alert(SCFTORT_FLAG s, ## x) 43 + 44 + #define VERBOSE_SCFTORTOUT(s, x...) \ 45 + do { if (verbose) pr_alert(SCFTORT_FLAG s, ## x); } while (0) 46 + 47 + #define VERBOSE_SCFTORTOUT_ERRSTRING(s, x...) \ 48 + do { if (verbose) pr_alert(SCFTORT_FLAG "!!! " s, ## x); } while (0) 49 + 50 + MODULE_LICENSE("GPL"); 51 + MODULE_AUTHOR("Paul E. McKenney <paulmck@kernel.org>"); 52 + 53 + // Wait until there are multiple CPUs before starting test. 54 + torture_param(int, holdoff, IS_BUILTIN(CONFIG_SCF_TORTURE_TEST) ? 10 : 0, 55 + "Holdoff time before test start (s)"); 56 + torture_param(int, longwait, 0, "Include ridiculously long waits? 
(seconds)"); 57 + torture_param(int, nthreads, -1, "# threads, defaults to -1 for all CPUs."); 58 + torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); 59 + torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable"); 60 + torture_param(int, shutdown_secs, 0, "Shutdown time (ms), <= zero to disable."); 61 + torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s."); 62 + torture_param(int, stutter_cpus, 5, "Number of jiffies to change CPUs under test, 0=disable"); 63 + torture_param(bool, use_cpus_read_lock, 0, "Use cpus_read_lock() to exclude CPU hotplug."); 64 + torture_param(int, verbose, 0, "Enable verbose debugging printk()s"); 65 + torture_param(int, weight_single, -1, "Testing weight for single-CPU no-wait operations."); 66 + torture_param(int, weight_single_wait, -1, "Testing weight for single-CPU operations."); 67 + torture_param(int, weight_many, -1, "Testing weight for multi-CPU no-wait operations."); 68 + torture_param(int, weight_many_wait, -1, "Testing weight for multi-CPU operations."); 69 + torture_param(int, weight_all, -1, "Testing weight for all-CPU no-wait operations."); 70 + torture_param(int, weight_all_wait, -1, "Testing weight for all-CPU operations."); 71 + 72 + char *torture_type = ""; 73 + 74 + #ifdef MODULE 75 + # define SCFTORT_SHUTDOWN 0 76 + #else 77 + # define SCFTORT_SHUTDOWN 1 78 + #endif 79 + 80 + torture_param(bool, shutdown, SCFTORT_SHUTDOWN, "Shutdown at end of torture test."); 81 + 82 + struct scf_statistics { 83 + struct task_struct *task; 84 + int cpu; 85 + long long n_single; 86 + long long n_single_ofl; 87 + long long n_single_wait; 88 + long long n_single_wait_ofl; 89 + long long n_many; 90 + long long n_many_wait; 91 + long long n_all; 92 + long long n_all_wait; 93 + }; 94 + 95 + static struct scf_statistics *scf_stats_p; 96 + static struct task_struct *scf_torture_stats_task; 97 + static DEFINE_PER_CPU(long long, scf_invoked_count); 98 + 99 + // Data for random primitive selection 100 + #define SCF_PRIM_SINGLE 0 101 + #define SCF_PRIM_MANY 1 102 + #define SCF_PRIM_ALL 2 103 + #define SCF_NPRIMS (2 * 3) // Need wait and no-wait versions of each. 104 + 105 + static char *scf_prim_name[] = { 106 + "smp_call_function_single", 107 + "smp_call_function_many", 108 + "smp_call_function", 109 + }; 110 + 111 + struct scf_selector { 112 + unsigned long scfs_weight; 113 + int scfs_prim; 114 + bool scfs_wait; 115 + }; 116 + static struct scf_selector scf_sel_array[SCF_NPRIMS]; 117 + static int scf_sel_array_len; 118 + static unsigned long scf_sel_totweight; 119 + 120 + // Communicate between caller and handler. 121 + struct scf_check { 122 + bool scfc_in; 123 + bool scfc_out; 124 + int scfc_cpu; // -1 for not _single(). 125 + bool scfc_wait; 126 + }; 127 + 128 + // Use to wait for all threads to start. 129 + static atomic_t n_started; 130 + static atomic_t n_errs; 131 + static atomic_t n_mb_in_errs; 132 + static atomic_t n_mb_out_errs; 133 + static atomic_t n_alloc_errs; 134 + static bool scfdone; 135 + static char *bangstr = ""; 136 + 137 + static DEFINE_TORTURE_RANDOM_PERCPU(scf_torture_rand); 138 + 139 + // Print torture statistics. Caller must ensure serialization. 
140 + static void scf_torture_stats_print(void) 141 + { 142 + int cpu; 143 + int i; 144 + long long invoked_count = 0; 145 + bool isdone = READ_ONCE(scfdone); 146 + struct scf_statistics scfs = {}; 147 + 148 + for_each_possible_cpu(cpu) 149 + invoked_count += data_race(per_cpu(scf_invoked_count, cpu)); 150 + for (i = 0; i < nthreads; i++) { 151 + scfs.n_single += scf_stats_p[i].n_single; 152 + scfs.n_single_ofl += scf_stats_p[i].n_single_ofl; 153 + scfs.n_single_wait += scf_stats_p[i].n_single_wait; 154 + scfs.n_single_wait_ofl += scf_stats_p[i].n_single_wait_ofl; 155 + scfs.n_many += scf_stats_p[i].n_many; 156 + scfs.n_many_wait += scf_stats_p[i].n_many_wait; 157 + scfs.n_all += scf_stats_p[i].n_all; 158 + scfs.n_all_wait += scf_stats_p[i].n_all_wait; 159 + } 160 + if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) || 161 + atomic_read(&n_mb_out_errs) || atomic_read(&n_alloc_errs)) 162 + bangstr = "!!! "; 163 + pr_alert("%s %sscf_invoked_count %s: %lld single: %lld/%lld single_ofl: %lld/%lld many: %lld/%lld all: %lld/%lld ", 164 + SCFTORT_FLAG, bangstr, isdone ? "VER" : "ver", invoked_count, 165 + scfs.n_single, scfs.n_single_wait, scfs.n_single_ofl, scfs.n_single_wait_ofl, 166 + scfs.n_many, scfs.n_many_wait, scfs.n_all, scfs.n_all_wait); 167 + torture_onoff_stats(); 168 + pr_cont("ste: %d stnmie: %d stnmoe: %d staf: %d\n", atomic_read(&n_errs), 169 + atomic_read(&n_mb_in_errs), atomic_read(&n_mb_out_errs), 170 + atomic_read(&n_alloc_errs)); 171 + } 172 + 173 + // Periodically prints torture statistics, if periodic statistics printing 174 + // was specified via the stat_interval module parameter. 175 + static int 176 + scf_torture_stats(void *arg) 177 + { 178 + VERBOSE_TOROUT_STRING("scf_torture_stats task started"); 179 + do { 180 + schedule_timeout_interruptible(stat_interval * HZ); 181 + scf_torture_stats_print(); 182 + torture_shutdown_absorb("scf_torture_stats"); 183 + } while (!torture_must_stop()); 184 + torture_kthread_stopping("scf_torture_stats"); 185 + return 0; 186 + } 187 + 188 + // Add a primitive to the scf_sel_array[]. 189 + static void scf_sel_add(unsigned long weight, int prim, bool wait) 190 + { 191 + struct scf_selector *scfsp = &scf_sel_array[scf_sel_array_len]; 192 + 193 + // If no weight, if array would overflow, if computing three-place 194 + // percentages would overflow, or if the scf_prim_name[] array would 195 + // overflow, don't bother. In the last three two cases, complain. 196 + if (!weight || 197 + WARN_ON_ONCE(scf_sel_array_len >= ARRAY_SIZE(scf_sel_array)) || 198 + WARN_ON_ONCE(0 - 100000 * weight <= 100000 * scf_sel_totweight) || 199 + WARN_ON_ONCE(prim >= ARRAY_SIZE(scf_prim_name))) 200 + return; 201 + scf_sel_totweight += weight; 202 + scfsp->scfs_weight = scf_sel_totweight; 203 + scfsp->scfs_prim = prim; 204 + scfsp->scfs_wait = wait; 205 + scf_sel_array_len++; 206 + } 207 + 208 + // Dump out weighting percentages for scf_prim_name[] array. 209 + static void scf_sel_dump(void) 210 + { 211 + int i; 212 + unsigned long oldw = 0; 213 + struct scf_selector *scfsp; 214 + unsigned long w; 215 + 216 + for (i = 0; i < scf_sel_array_len; i++) { 217 + scfsp = &scf_sel_array[i]; 218 + w = (scfsp->scfs_weight - oldw) * 100000 / scf_sel_totweight; 219 + pr_info("%s: %3lu.%03lu %s(%s)\n", __func__, w / 1000, w % 1000, 220 + scf_prim_name[scfsp->scfs_prim], 221 + scfsp->scfs_wait ? "wait" : "nowait"); 222 + oldw = scfsp->scfs_weight; 223 + } 224 + } 225 + 226 + // Randomly pick a primitive and wait/nowait, based on weightings. 
227 + static struct scf_selector *scf_sel_rand(struct torture_random_state *trsp) 228 + { 229 + int i; 230 + unsigned long w = torture_random(trsp) % (scf_sel_totweight + 1); 231 + 232 + for (i = 0; i < scf_sel_array_len; i++) 233 + if (scf_sel_array[i].scfs_weight >= w) 234 + return &scf_sel_array[i]; 235 + WARN_ON_ONCE(1); 236 + return &scf_sel_array[0]; 237 + } 238 + 239 + // Update statistics and occasionally burn up mass quantities of CPU time, 240 + // if told to do so via scftorture.longwait. Otherwise, occasionally burn 241 + // a little bit. 242 + static void scf_handler(void *scfc_in) 243 + { 244 + int i; 245 + int j; 246 + unsigned long r = torture_random(this_cpu_ptr(&scf_torture_rand)); 247 + struct scf_check *scfcp = scfc_in; 248 + 249 + if (likely(scfcp)) { 250 + WRITE_ONCE(scfcp->scfc_out, false); // For multiple receivers. 251 + if (WARN_ON_ONCE(unlikely(!READ_ONCE(scfcp->scfc_in)))) 252 + atomic_inc(&n_mb_in_errs); 253 + } 254 + this_cpu_inc(scf_invoked_count); 255 + if (longwait <= 0) { 256 + if (!(r & 0xffc0)) 257 + udelay(r & 0x3f); 258 + goto out; 259 + } 260 + if (r & 0xfff) 261 + goto out; 262 + r = (r >> 12); 263 + if (longwait <= 0) { 264 + udelay((r & 0xff) + 1); 265 + goto out; 266 + } 267 + r = r % longwait + 1; 268 + for (i = 0; i < r; i++) { 269 + for (j = 0; j < 1000; j++) { 270 + udelay(1000); 271 + cpu_relax(); 272 + } 273 + } 274 + out: 275 + if (unlikely(!scfcp)) 276 + return; 277 + if (scfcp->scfc_wait) 278 + WRITE_ONCE(scfcp->scfc_out, true); 279 + else 280 + kfree(scfcp); 281 + } 282 + 283 + // As above, but check for correct CPU. 284 + static void scf_handler_1(void *scfc_in) 285 + { 286 + struct scf_check *scfcp = scfc_in; 287 + 288 + if (likely(scfcp) && WARN_ONCE(smp_processor_id() != scfcp->scfc_cpu, "%s: Wanted CPU %d got CPU %d\n", __func__, scfcp->scfc_cpu, smp_processor_id())) { 289 + atomic_inc(&n_errs); 290 + } 291 + scf_handler(scfcp); 292 + } 293 + 294 + // Randomly do an smp_call_function*() invocation. 295 + static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_random_state *trsp) 296 + { 297 + uintptr_t cpu; 298 + int ret = 0; 299 + struct scf_check *scfcp = NULL; 300 + struct scf_selector *scfsp = scf_sel_rand(trsp); 301 + 302 + if (use_cpus_read_lock) 303 + cpus_read_lock(); 304 + else 305 + preempt_disable(); 306 + if (scfsp->scfs_prim == SCF_PRIM_SINGLE || scfsp->scfs_wait) { 307 + scfcp = kmalloc(sizeof(*scfcp), GFP_ATOMIC); 308 + if (WARN_ON_ONCE(!scfcp)) { 309 + atomic_inc(&n_alloc_errs); 310 + } else { 311 + scfcp->scfc_cpu = -1; 312 + scfcp->scfc_wait = scfsp->scfs_wait; 313 + scfcp->scfc_out = false; 314 + } 315 + } 316 + switch (scfsp->scfs_prim) { 317 + case SCF_PRIM_SINGLE: 318 + cpu = torture_random(trsp) % nr_cpu_ids; 319 + if (scfsp->scfs_wait) 320 + scfp->n_single_wait++; 321 + else 322 + scfp->n_single++; 323 + if (scfcp) { 324 + scfcp->scfc_cpu = cpu; 325 + barrier(); // Prevent race-reduction compiler optimizations. 326 + scfcp->scfc_in = true; 327 + } 328 + ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, scfsp->scfs_wait); 329 + if (ret) { 330 + if (scfsp->scfs_wait) 331 + scfp->n_single_wait_ofl++; 332 + else 333 + scfp->n_single_ofl++; 334 + kfree(scfcp); 335 + scfcp = NULL; 336 + } 337 + break; 338 + case SCF_PRIM_MANY: 339 + if (scfsp->scfs_wait) 340 + scfp->n_many_wait++; 341 + else 342 + scfp->n_many++; 343 + if (scfcp) { 344 + barrier(); // Prevent race-reduction compiler optimizations. 
345 + scfcp->scfc_in = true; 346 + } 347 + smp_call_function_many(cpu_online_mask, scf_handler, scfcp, scfsp->scfs_wait); 348 + break; 349 + case SCF_PRIM_ALL: 350 + if (scfsp->scfs_wait) 351 + scfp->n_all_wait++; 352 + else 353 + scfp->n_all++; 354 + if (scfcp) { 355 + barrier(); // Prevent race-reduction compiler optimizations. 356 + scfcp->scfc_in = true; 357 + } 358 + smp_call_function(scf_handler, scfcp, scfsp->scfs_wait); 359 + break; 360 + default: 361 + WARN_ON_ONCE(1); 362 + if (scfcp) 363 + scfcp->scfc_out = true; 364 + } 365 + if (scfcp && scfsp->scfs_wait) { 366 + if (WARN_ON_ONCE((num_online_cpus() > 1 || scfsp->scfs_prim == SCF_PRIM_SINGLE) && 367 + !scfcp->scfc_out)) 368 + atomic_inc(&n_mb_out_errs); // Leak rather than trash! 369 + else 370 + kfree(scfcp); 371 + barrier(); // Prevent race-reduction compiler optimizations. 372 + } 373 + if (use_cpus_read_lock) 374 + cpus_read_unlock(); 375 + else 376 + preempt_enable(); 377 + if (!(torture_random(trsp) & 0xfff)) 378 + schedule_timeout_uninterruptible(1); 379 + } 380 + 381 + // SCF test kthread. Repeatedly does calls to members of the 382 + // smp_call_function() family of functions. 383 + static int scftorture_invoker(void *arg) 384 + { 385 + int cpu; 386 + DEFINE_TORTURE_RANDOM(rand); 387 + struct scf_statistics *scfp = (struct scf_statistics *)arg; 388 + bool was_offline = false; 389 + 390 + VERBOSE_SCFTORTOUT("scftorture_invoker %d: task started", scfp->cpu); 391 + cpu = scfp->cpu % nr_cpu_ids; 392 + set_cpus_allowed_ptr(current, cpumask_of(cpu)); 393 + set_user_nice(current, MAX_NICE); 394 + if (holdoff) 395 + schedule_timeout_interruptible(holdoff * HZ); 396 + 397 + VERBOSE_SCFTORTOUT("scftorture_invoker %d: Waiting for all SCF torturers from cpu %d", scfp->cpu, smp_processor_id()); 398 + 399 + // Make sure that the CPU is affinitized appropriately during testing. 
400 + WARN_ON_ONCE(smp_processor_id() != scfp->cpu); 401 + 402 + if (!atomic_dec_return(&n_started)) 403 + while (atomic_read_acquire(&n_started)) { 404 + if (torture_must_stop()) { 405 + VERBOSE_SCFTORTOUT("scftorture_invoker %d ended before starting", scfp->cpu); 406 + goto end; 407 + } 408 + schedule_timeout_uninterruptible(1); 409 + } 410 + 411 + VERBOSE_SCFTORTOUT("scftorture_invoker %d started", scfp->cpu); 412 + 413 + do { 414 + scftorture_invoke_one(scfp, &rand); 415 + while (cpu_is_offline(cpu) && !torture_must_stop()) { 416 + schedule_timeout_interruptible(HZ / 5); 417 + was_offline = true; 418 + } 419 + if (was_offline) { 420 + set_cpus_allowed_ptr(current, cpumask_of(cpu)); 421 + was_offline = false; 422 + } 423 + cond_resched(); 424 + } while (!torture_must_stop()); 425 + 426 + VERBOSE_SCFTORTOUT("scftorture_invoker %d ended", scfp->cpu); 427 + end: 428 + torture_kthread_stopping("scftorture_invoker"); 429 + return 0; 430 + } 431 + 432 + static void 433 + scftorture_print_module_parms(const char *tag) 434 + { 435 + pr_alert(SCFTORT_FLAG 436 + "--- %s: verbose=%d holdoff=%d longwait=%d nthreads=%d onoff_holdoff=%d onoff_interval=%d shutdown_secs=%d stat_interval=%d stutter_cpus=%d use_cpus_read_lock=%d, weight_single=%d, weight_single_wait=%d, weight_many=%d, weight_many_wait=%d, weight_all=%d, weight_all_wait=%d\n", tag, 437 + verbose, holdoff, longwait, nthreads, onoff_holdoff, onoff_interval, shutdown, stat_interval, stutter_cpus, use_cpus_read_lock, weight_single, weight_single_wait, weight_many, weight_many_wait, weight_all, weight_all_wait); 438 + } 439 + 440 + static void scf_cleanup_handler(void *unused) 441 + { 442 + } 443 + 444 + static void scf_torture_cleanup(void) 445 + { 446 + int i; 447 + 448 + if (torture_cleanup_begin()) 449 + return; 450 + 451 + WRITE_ONCE(scfdone, true); 452 + if (nthreads) 453 + for (i = 0; i < nthreads; i++) 454 + torture_stop_kthread("scftorture_invoker", scf_stats_p[i].task); 455 + else 456 + goto end; 457 + smp_call_function(scf_cleanup_handler, NULL, 0); 458 + torture_stop_kthread(scf_torture_stats, scf_torture_stats_task); 459 + scf_torture_stats_print(); // -After- the stats thread is stopped! 460 + kfree(scf_stats_p); // -After- the last stats print has completed! 
461 + scf_stats_p = NULL; 462 + 463 + if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) || atomic_read(&n_mb_out_errs)) 464 + scftorture_print_module_parms("End of test: FAILURE"); 465 + else if (torture_onoff_failures()) 466 + scftorture_print_module_parms("End of test: LOCK_HOTPLUG"); 467 + else 468 + scftorture_print_module_parms("End of test: SUCCESS"); 469 + 470 + end: 471 + torture_cleanup_end(); 472 + } 473 + 474 + static int __init scf_torture_init(void) 475 + { 476 + long i; 477 + int firsterr = 0; 478 + unsigned long weight_single1 = weight_single; 479 + unsigned long weight_single_wait1 = weight_single_wait; 480 + unsigned long weight_many1 = weight_many; 481 + unsigned long weight_many_wait1 = weight_many_wait; 482 + unsigned long weight_all1 = weight_all; 483 + unsigned long weight_all_wait1 = weight_all_wait; 484 + 485 + if (!torture_init_begin(SCFTORT_STRING, verbose)) 486 + return -EBUSY; 487 + 488 + scftorture_print_module_parms("Start of test"); 489 + 490 + if (weight_single == -1 && weight_single_wait == -1 && 491 + weight_many == -1 && weight_many_wait == -1 && 492 + weight_all == -1 && weight_all_wait == -1) { 493 + weight_single1 = 2 * nr_cpu_ids; 494 + weight_single_wait1 = 2 * nr_cpu_ids; 495 + weight_many1 = 2; 496 + weight_many_wait1 = 2; 497 + weight_all1 = 1; 498 + weight_all_wait1 = 1; 499 + } else { 500 + if (weight_single == -1) 501 + weight_single1 = 0; 502 + if (weight_single_wait == -1) 503 + weight_single_wait1 = 0; 504 + if (weight_many == -1) 505 + weight_many1 = 0; 506 + if (weight_many_wait == -1) 507 + weight_many_wait1 = 0; 508 + if (weight_all == -1) 509 + weight_all1 = 0; 510 + if (weight_all_wait == -1) 511 + weight_all_wait1 = 0; 512 + } 513 + if (weight_single1 == 0 && weight_single_wait1 == 0 && 514 + weight_many1 == 0 && weight_many_wait1 == 0 && 515 + weight_all1 == 0 && weight_all_wait1 == 0) { 516 + VERBOSE_SCFTORTOUT_ERRSTRING("all zero weights makes no sense"); 517 + firsterr = -EINVAL; 518 + goto unwind; 519 + } 520 + scf_sel_add(weight_single1, SCF_PRIM_SINGLE, false); 521 + scf_sel_add(weight_single_wait1, SCF_PRIM_SINGLE, true); 522 + scf_sel_add(weight_many1, SCF_PRIM_MANY, false); 523 + scf_sel_add(weight_many_wait1, SCF_PRIM_MANY, true); 524 + scf_sel_add(weight_all1, SCF_PRIM_ALL, false); 525 + scf_sel_add(weight_all_wait1, SCF_PRIM_ALL, true); 526 + scf_sel_dump(); 527 + 528 + if (onoff_interval > 0) { 529 + firsterr = torture_onoff_init(onoff_holdoff * HZ, onoff_interval, NULL); 530 + if (firsterr) 531 + goto unwind; 532 + } 533 + if (shutdown_secs > 0) { 534 + firsterr = torture_shutdown_init(shutdown_secs, scf_torture_cleanup); 535 + if (firsterr) 536 + goto unwind; 537 + } 538 + 539 + // Worker tasks invoking smp_call_function(). 
540 + if (nthreads < 0) 541 + nthreads = num_online_cpus(); 542 + scf_stats_p = kcalloc(nthreads, sizeof(scf_stats_p[0]), GFP_KERNEL); 543 + if (!scf_stats_p) { 544 + VERBOSE_SCFTORTOUT_ERRSTRING("out of memory"); 545 + firsterr = -ENOMEM; 546 + goto unwind; 547 + } 548 + 549 + VERBOSE_SCFTORTOUT("Starting %d smp_call_function() threads\n", nthreads); 550 + 551 + atomic_set(&n_started, nthreads); 552 + for (i = 0; i < nthreads; i++) { 553 + scf_stats_p[i].cpu = i; 554 + firsterr = torture_create_kthread(scftorture_invoker, (void *)&scf_stats_p[i], 555 + scf_stats_p[i].task); 556 + if (firsterr) 557 + goto unwind; 558 + } 559 + if (stat_interval > 0) { 560 + firsterr = torture_create_kthread(scf_torture_stats, NULL, scf_torture_stats_task); 561 + if (firsterr) 562 + goto unwind; 563 + } 564 + 565 + torture_init_end(); 566 + return 0; 567 + 568 + unwind: 569 + torture_init_end(); 570 + scf_torture_cleanup(); 571 + return firsterr; 572 + } 573 + 574 + module_init(scf_torture_init); 575 + module_exit(scf_torture_cleanup);
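The new scftorture module above chooses which smp_call_function*() variant to exercise by weight: scf_sel_add() records cumulative weights into scf_sel_array[], and scf_sel_rand() walks that array until the cumulative weight covers a random draw. A distilled userspace sketch of that cumulative-weight selection, where the labels and weights are arbitrary and rand() stands in for torture_random():

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative selector: cumulative weight plus an identifying label. */
    struct selector {
        unsigned long cum_weight;
        const char *label;
    };

    static struct selector sel[8];
    static int sel_len;
    static unsigned long sel_totweight;

    /* Append one entry; zero-weight entries are simply omitted. */
    static void sel_add(unsigned long weight, const char *label)
    {
        if (!weight || sel_len >= (int)(sizeof(sel) / sizeof(sel[0])))
            return;
        sel_totweight += weight;
        sel[sel_len].cum_weight = sel_totweight;
        sel[sel_len].label = label;
        sel_len++;
    }

    /* Pick an entry with probability proportional to its weight. */
    static const struct selector *sel_rand(void)
    {
        unsigned long w = (unsigned long)rand() % (sel_totweight + 1);
        int i;

        for (i = 0; i < sel_len; i++)
            if (sel[i].cum_weight >= w)
                return &sel[i];
        return &sel[0];                     /* Unreachable if the table is sane. */
    }

    int main(void)
    {
        int i;

        sel_add(4, "single");
        sel_add(2, "many");
        sel_add(1, "all");
        for (i = 0; i < 10; i++)
            printf("%s\n", sel_rand()->label);
        return 0;
    }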
+134
kernel/smp.c
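The smp.c diff just below adds the CSD-lock timeout diagnostics mentioned in the merge summary: when CONFIG_CSD_LOCK_WAIT_DEBUG is set, csd_lock_wait() polls the lock flag and, every CSD_LOCK_TIMEOUT (5 seconds), reports which CPU and handler appear stuck, dumps stacks, and may re-send the IPI. A rough userspace sketch of the poll-with-periodic-complaint structure, scaled down to a 200 ms period and with a helper thread standing in for the remote CPU:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define COMPLAINT_PERIOD_NS 200000000ULL    /* 200 ms here; the kernel uses 5 s. */

    static atomic_bool still_locked = true;

    static unsigned long long now_ns(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    /* Poll until the flag clears, complaining whenever another period elapses
     * instead of spinning silently forever. */
    static void lock_wait_with_diagnostics(void)
    {
        unsigned long long ts0 = now_ns(), ts1 = ts0;
        int bug_id = 0;

        while (atomic_load(&still_locked)) {
            unsigned long long ts2 = now_ns();

            if (ts2 - ts1 > COMPLAINT_PERIOD_NS) {
                if (!bug_id)
                    bug_id = 1;     /* The kernel assigns a global bug number here. */
                fprintf(stderr, "lock #%d unresponsive, %llu ns elapsed so far\n",
                        bug_id, ts2 - ts0);
                ts1 = ts2;
            }
        }
    }

    static void *unlocker(void *arg)
    {
        usleep(500000);                         /* Hold the "lock" for half a second. */
        atomic_store(&still_locked, false);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        pthread_create(&tid, NULL, unlocker, NULL);
        lock_wait_with_diagnostics();
        puts("lock released");
        pthread_join(tid, NULL);
        return 0;
    }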
··· 20 20 #include <linux/sched.h> 21 21 #include <linux/sched/idle.h> 22 22 #include <linux/hypervisor.h> 23 + #include <linux/sched/clock.h> 24 + #include <linux/nmi.h> 25 + #include <linux/sched/debug.h> 23 26 24 27 #include "smpboot.h" 25 28 #include "sched/smp.h" ··· 99 96 smpcfd_prepare_cpu(smp_processor_id()); 100 97 } 101 98 99 + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG 100 + 101 + static DEFINE_PER_CPU(call_single_data_t *, cur_csd); 102 + static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); 103 + static DEFINE_PER_CPU(void *, cur_csd_info); 104 + 105 + #define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC) 106 + static atomic_t csd_bug_count = ATOMIC_INIT(0); 107 + 108 + /* Record current CSD work for current CPU, NULL to erase. */ 109 + static void csd_lock_record(call_single_data_t *csd) 110 + { 111 + if (!csd) { 112 + smp_mb(); /* NULL cur_csd after unlock. */ 113 + __this_cpu_write(cur_csd, NULL); 114 + return; 115 + } 116 + __this_cpu_write(cur_csd_func, csd->func); 117 + __this_cpu_write(cur_csd_info, csd->info); 118 + smp_wmb(); /* func and info before csd. */ 119 + __this_cpu_write(cur_csd, csd); 120 + smp_mb(); /* Update cur_csd before function call. */ 121 + /* Or before unlock, as the case may be. */ 122 + } 123 + 124 + static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd) 125 + { 126 + unsigned int csd_type; 127 + 128 + csd_type = CSD_TYPE(csd); 129 + if (csd_type == CSD_TYPE_ASYNC || csd_type == CSD_TYPE_SYNC) 130 + return csd->dst; /* Other CSD_TYPE_ values might not have ->dst. */ 131 + return -1; 132 + } 133 + 134 + /* 135 + * Complain if too much time spent waiting. Note that only 136 + * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU, 137 + * so waiting on other types gets much less information. 138 + */ 139 + static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id) 140 + { 141 + int cpu = -1; 142 + int cpux; 143 + bool firsttime; 144 + u64 ts2, ts_delta; 145 + call_single_data_t *cpu_cur_csd; 146 + unsigned int flags = READ_ONCE(csd->flags); 147 + 148 + if (!(flags & CSD_FLAG_LOCK)) { 149 + if (!unlikely(*bug_id)) 150 + return true; 151 + cpu = csd_lock_wait_getcpu(csd); 152 + pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n", 153 + *bug_id, raw_smp_processor_id(), cpu); 154 + return true; 155 + } 156 + 157 + ts2 = sched_clock(); 158 + ts_delta = ts2 - *ts1; 159 + if (likely(ts_delta <= CSD_LOCK_TIMEOUT)) 160 + return false; 161 + 162 + firsttime = !*bug_id; 163 + if (firsttime) 164 + *bug_id = atomic_inc_return(&csd_bug_count); 165 + cpu = csd_lock_wait_getcpu(csd); 166 + if (WARN_ONCE(cpu < 0 || cpu >= nr_cpu_ids, "%s: cpu = %d\n", __func__, cpu)) 167 + cpux = 0; 168 + else 169 + cpux = cpu; 170 + cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */ 171 + pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n", 172 + firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0, 173 + cpu, csd->func, csd->info); 174 + if (cpu_cur_csd && csd != cpu_cur_csd) { 175 + pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n", 176 + *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)), 177 + READ_ONCE(per_cpu(cur_csd_info, cpux))); 178 + } else { 179 + pr_alert("\tcsd: CSD lock (#%d) %s.\n", 180 + *bug_id, !cpu_cur_csd ? 
"unresponsive" : "handling this request"); 181 + } 182 + if (cpu >= 0) { 183 + if (!trigger_single_cpu_backtrace(cpu)) 184 + dump_cpu_task(cpu); 185 + if (!cpu_cur_csd) { 186 + pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu); 187 + arch_send_call_function_single_ipi(cpu); 188 + } 189 + } 190 + dump_stack(); 191 + *ts1 = ts2; 192 + 193 + return false; 194 + } 195 + 102 196 /* 103 197 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources 104 198 * ··· 205 105 */ 206 106 static __always_inline void csd_lock_wait(call_single_data_t *csd) 207 107 { 108 + int bug_id = 0; 109 + u64 ts0, ts1; 110 + 111 + ts1 = ts0 = sched_clock(); 112 + for (;;) { 113 + if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id)) 114 + break; 115 + cpu_relax(); 116 + } 117 + smp_acquire__after_ctrl_dep(); 118 + } 119 + 120 + #else 121 + static void csd_lock_record(call_single_data_t *csd) 122 + { 123 + } 124 + 125 + static __always_inline void csd_lock_wait(call_single_data_t *csd) 126 + { 208 127 smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK)); 209 128 } 129 + #endif 210 130 211 131 static __always_inline void csd_lock(call_single_data_t *csd) 212 132 { ··· 286 166 * We can unlock early even for the synchronous on-stack case, 287 167 * since we're doing this from the same CPU.. 288 168 */ 169 + csd_lock_record(csd); 289 170 csd_unlock(csd); 290 171 local_irq_save(flags); 291 172 func(info); 173 + csd_lock_record(NULL); 292 174 local_irq_restore(flags); 293 175 return 0; 294 176 } ··· 390 268 entry = &csd_next->llist; 391 269 } 392 270 271 + csd_lock_record(csd); 393 272 func(info); 394 273 csd_unlock(csd); 274 + csd_lock_record(NULL); 395 275 } else { 396 276 prev = &csd->llist; 397 277 } ··· 420 296 smp_call_func_t func = csd->func; 421 297 void *info = csd->info; 422 298 299 + csd_lock_record(csd); 423 300 csd_unlock(csd); 424 301 func(info); 302 + csd_lock_record(NULL); 425 303 } else if (type == CSD_TYPE_IRQ_WORK) { 426 304 irq_work_single(csd); 427 305 } ··· 501 375 502 376 csd->func = func; 503 377 csd->info = info; 378 + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG 379 + csd->src = smp_processor_id(); 380 + csd->dst = cpu; 381 + #endif 504 382 505 383 err = generic_exec_single(cpu, csd); 506 384 ··· 670 540 csd->flags |= CSD_TYPE_SYNC; 671 541 csd->func = func; 672 542 csd->info = info; 543 + #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG 544 + csd->src = smp_processor_id(); 545 + csd->dst = cpu; 546 + #endif 673 547 if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu))) 674 548 __cpumask_set_cpu(cpu, cfd->cpumask_ipi); 675 549 }
+1 -1
kernel/time/tick-sched.c
··· 927 927 928 928 if (ratelimit < 10 && 929 929 (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) { 930 - pr_warn("NOHZ: local_softirq_pending %02x\n", 930 + pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n", 931 931 (unsigned int) local_softirq_pending()); 932 932 ratelimit++; 933 933 }
+21
lib/Kconfig.debug
··· 1367 1367 Say M if you want these self tests to build as a module. 1368 1368 Say N if you are unsure. 1369 1369 1370 + config SCF_TORTURE_TEST 1371 + tristate "torture tests for smp_call_function*()" 1372 + depends on DEBUG_KERNEL 1373 + select TORTURE_TEST 1374 + help 1375 + This option provides a kernel module that runs torture tests 1376 + on the smp_call_function() family of primitives. The kernel 1377 + module may be built after the fact on the running kernel to 1378 + be tested, if desired. 1379 + 1380 + config CSD_LOCK_WAIT_DEBUG 1381 + bool "Debugging for csd_lock_wait(), called from smp_call_function*()" 1382 + depends on DEBUG_KERNEL 1383 + depends on 64BIT 1384 + default n 1385 + help 1386 + This option enables debug prints when CPUs are slow to respond 1387 + to the smp_call_function*() IPI wrappers. These debug prints 1388 + include the IPI handler function currently executing (if any) 1389 + and relevant stack traces. 1390 + 1370 1391 endmenu # lock debugging 1371 1392 1372 1393 config TRACE_IRQFLAGS
+5 -1
lib/nmi_backtrace.c
··· 85 85 put_cpu(); 86 86 } 87 87 88 + // Dump stacks even for idle CPUs. 89 + static bool backtrace_idle; 90 + module_param(backtrace_idle, bool, 0644); 91 + 88 92 bool nmi_cpu_backtrace(struct pt_regs *regs) 89 93 { 90 94 int cpu = smp_processor_id(); 91 95 92 96 if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) { 93 - if (regs && cpu_in_idle(instruction_pointer(regs))) { 97 + if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) { 94 98 pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n", 95 99 cpu, (void *)instruction_pointer(regs)); 96 100 } else {
+3 -3
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf-ftrace.sh → tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale-ftrace.sh
··· 1 1 #!/bin/bash 2 2 # SPDX-License-Identifier: GPL-2.0+ 3 3 # 4 - # Analyze a given results directory for rcuperf performance measurements, 4 + # Analyze a given results directory for rcuscale performance measurements, 5 5 # looking for ftrace data. Exits with 0 if data was found, analyzed, and 6 - # printed. Intended to be invoked from kvm-recheck-rcuperf.sh after 6 + # printed. Intended to be invoked from kvm-recheck-rcuscale.sh after 7 7 # argument checking. 8 8 # 9 - # Usage: kvm-recheck-rcuperf-ftrace.sh resdir 9 + # Usage: kvm-recheck-rcuscale-ftrace.sh resdir 10 10 # 11 11 # Copyright (C) IBM Corporation, 2016 12 12 #
+7 -7
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf.sh → tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale.sh
··· 1 1 #!/bin/bash 2 2 # SPDX-License-Identifier: GPL-2.0+ 3 3 # 4 - # Analyze a given results directory for rcuperf performance measurements. 4 + # Analyze a given results directory for rcuscale scalability measurements. 5 5 # 6 - # Usage: kvm-recheck-rcuperf.sh resdir 6 + # Usage: kvm-recheck-rcuscale.sh resdir 7 7 # 8 8 # Copyright (C) IBM Corporation, 2016 9 9 # ··· 20 20 PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH 21 21 . functions.sh 22 22 23 - if kvm-recheck-rcuperf-ftrace.sh $i 23 + if kvm-recheck-rcuscale-ftrace.sh $i 24 24 then 25 25 # ftrace data was successfully analyzed, call it good! 26 26 exit 0 ··· 30 30 31 31 sed -e 's/^\[[^]]*]//' < $i/console.log | 32 32 awk ' 33 - /-perf: .* gps: .* batches:/ { 33 + /-scale: .* gps: .* batches:/ { 34 34 ngps = $9; 35 35 nbatches = $11; 36 36 } 37 37 38 - /-perf: .*writer-duration/ { 38 + /-scale: .*writer-duration/ { 39 39 gptimes[++n] = $5 / 1000.; 40 40 sum += $5 / 1000.; 41 41 } ··· 43 43 END { 44 44 newNR = asort(gptimes); 45 45 if (newNR <= 0) { 46 - print "No rcuperf records found???" 46 + print "No rcuscale records found???" 47 47 exit; 48 48 } 49 49 pct50 = int(newNR * 50 / 100); ··· 79 79 print "99th percentile grace-period duration: " gptimes[pct99]; 80 80 print "Maximum grace-period duration: " gptimes[newNR]; 81 81 print "Grace periods: " ngps + 0 " Batches: " nbatches + 0 " Ratio: " ngps / nbatches; 82 - print "Computed from rcuperf printk output."; 82 + print "Computed from rcuscale printk output."; 83 83 }'
+38
tools/testing/selftests/rcutorture/bin/kvm-recheck-scf.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Analyze a given results directory for scftorture progress. 5 + # 6 + # Usage: kvm-recheck-scf.sh resdir 7 + # 8 + # Copyright (C) Facebook, 2020 9 + # 10 + # Authors: Paul E. McKenney <paulmck@kernel.org> 11 + 12 + i="$1" 13 + if test -d "$i" -a -r "$i" 14 + then 15 + : 16 + else 17 + echo Unreadable results directory: $i 18 + exit 1 19 + fi 20 + . functions.sh 21 + 22 + configfile=`echo $i | sed -e 's/^.*\///'` 23 + nscfs="`grep 'scf_invoked_count ver:' $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* scf_invoked_count ver: //' -e 's/ .*$//' | tr -d '\015'`" 24 + if test -z "$nscfs" 25 + then 26 + echo "$configfile ------- " 27 + else 28 + dur="`sed -e 's/^.* scftorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null`" 29 + if test -z "$dur" 30 + then 31 + rate="" 32 + else 33 + nscfss=`awk -v nscfs=$nscfs -v dur=$dur ' 34 + BEGIN { print nscfs / dur }' < /dev/null` 35 + rate=" ($nscfss/s)" 36 + fi 37 + echo "${configfile} ------- ${nscfs} SCF handler invocations$rate" 38 + fi
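This script is normally run for each scenario by the kvm-recheck.sh scripting rather than by hand, but a hypothetical invocation and its one-line summary (with invented numbers and an invented results path) would look something like:

    $ kvm-recheck-scf.sh res/2020.10.10-12.34.56/NOPREEMPT
    NOPREEMPT ------- 1234567 SCF handler invocations (2057.61/s)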
+25 -8
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
··· 66 66 echo > $T/KcList 67 67 config_override_param "$config_dir/CFcommon" KcList "`cat $config_dir/CFcommon 2> /dev/null`" 68 68 config_override_param "$config_template" KcList "`cat $config_template 2> /dev/null`" 69 + config_override_param "--gdb options" KcList "$TORTURE_KCONFIG_GDB_ARG" 69 70 config_override_param "--kasan options" KcList "$TORTURE_KCONFIG_KASAN_ARG" 70 71 config_override_param "--kcsan options" KcList "$TORTURE_KCONFIG_KCSAN_ARG" 71 72 config_override_param "--kconfig argument" KcList "$TORTURE_KCONFIG_ARG" ··· 153 152 boot_args="`configfrag_boot_params "$boot_args" "$config_template"`" 154 153 # Generate kernel-version-specific boot parameters 155 154 boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`" 156 - echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd 155 + if test -n "$TORTURE_BOOT_GDB_ARG" 156 + then 157 + boot_args="$boot_args $TORTURE_BOOT_GDB_ARG" 158 + fi 159 + echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" $TORTURE_QEMU_GDB_ARG > $resdir/qemu-cmd 157 160 158 161 if test -n "$TORTURE_BUILDONLY" 159 162 then ··· 176 171 # Attempt to run qemu 177 172 ( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & 178 173 commandcompleted=0 179 - sleep 10 # Give qemu's pid a chance to reach the file 180 - if test -s "$resdir/qemu_pid" 174 + if test -z "$TORTURE_KCONFIG_GDB_ARG" 181 175 then 182 - qemu_pid=`cat "$resdir/qemu_pid"` 183 - echo Monitoring qemu job at pid $qemu_pid 184 - else 185 - qemu_pid="" 186 - echo Monitoring qemu job at yet-as-unknown pid 176 + sleep 10 # Give qemu's pid a chance to reach the file 177 + if test -s "$resdir/qemu_pid" 178 + then 179 + qemu_pid=`cat "$resdir/qemu_pid"` 180 + echo Monitoring qemu job at pid $qemu_pid 181 + else 182 + qemu_pid="" 183 + echo Monitoring qemu job at yet-as-unknown pid 184 + fi 185 + fi 186 + if test -n "$TORTURE_KCONFIG_GDB_ARG" 187 + then 188 + echo Waiting for you to attach a debug session, for example: > /dev/tty 189 + echo " gdb $base_resdir/vmlinux" > /dev/tty 190 + echo 'After symbols load and the "(gdb)" prompt appears:' > /dev/tty 191 + echo " target remote :1234" > /dev/tty 192 + echo " continue" > /dev/tty 193 + kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` 187 194 fi 188 195 while : 189 196 do
+31 -5
tools/testing/selftests/rcutorture/bin/kvm.sh
··· 31 31 TORTURE_BOOT_IMAGE="" 32 32 TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD 33 33 TORTURE_KCONFIG_ARG="" 34 + TORTURE_KCONFIG_GDB_ARG="" 35 + TORTURE_BOOT_GDB_ARG="" 36 + TORTURE_QEMU_GDB_ARG="" 34 37 TORTURE_KCONFIG_KASAN_ARG="" 35 38 TORTURE_KCONFIG_KCSAN_ARG="" 36 39 TORTURE_KMAKE_ARG="" ··· 49 46 50 47 usage () { 51 48 echo "Usage: $scriptname optional arguments:" 49 + echo " --allcpus" 52 50 echo " --bootargs kernel-boot-arguments" 53 51 echo " --bootimage relative-path-to-kernel-boot-image" 54 52 echo " --buildonly" ··· 59 55 echo " --defconfig string" 60 56 echo " --dryrun sched|script" 61 57 echo " --duration minutes" 58 + echo " --gdb" 59 + echo " --help" 62 60 echo " --interactive" 63 61 echo " --jitter N [ maxsleep (us) [ maxspin (us) ] ]" 64 62 echo " --kconfig Kconfig-options" 65 63 echo " --kmake-arg kernel-make-arguments" 66 64 echo " --mac nn:nn:nn:nn:nn:nn" 67 - echo " --memory megabytes | nnnG" 65 + echo " --memory megabytes|nnnG" 68 66 echo " --no-initrd" 69 67 echo " --qemu-args qemu-arguments" 70 68 echo " --qemu-cmd qemu-system-..." 71 69 echo " --results absolute-pathname" 72 - echo " --torture rcu" 70 + echo " --torture lock|rcu|rcuscale|refscale|scf" 73 71 echo " --trust-make" 74 72 exit 1 75 73 } ··· 132 126 dur=$(($2*60)) 133 127 shift 134 128 ;; 129 + --gdb) 130 + TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO=y"; export TORTURE_KCONFIG_GDB_ARG 131 + TORTURE_BOOT_GDB_ARG="nokaslr"; export TORTURE_BOOT_GDB_ARG 132 + TORTURE_QEMU_GDB_ARG="-s -S"; export TORTURE_QEMU_GDB_ARG 133 + ;; 134 + --help|-h) 135 + usage 136 + ;; 135 137 --interactive) 136 138 TORTURE_QEMU_INTERACTIVE=1; export TORTURE_QEMU_INTERACTIVE 137 139 ;; ··· 198 184 shift 199 185 ;; 200 186 --torture) 201 - checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\|refscale\)$' '^--' 187 + checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuscale\|refscale\|scf\)$' '^--' 202 188 TORTURE_SUITE=$2 203 189 shift 204 - if test "$TORTURE_SUITE" = rcuperf || test "$TORTURE_SUITE" = refscale 190 + if test "$TORTURE_SUITE" = rcuscale || test "$TORTURE_SUITE" = refscale 205 191 then 206 192 # If you really want jitter for refscale or 207 - # rcuperf, specify it after specifying the rcuperf 193 + # rcuscale, specify it after specifying the rcuscale 208 194 # or the refscale. (But why jitter in these cases?) 209 195 jitter=0 210 196 fi ··· 262 248 done 263 249 touch $T/cfgcpu 264 250 configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`" 251 + if test -n "$TORTURE_KCONFIG_GDB_ARG" 252 + then 253 + if test "`echo $configs_derep | wc -w`" -gt 1 254 + then 255 + echo "The --config list is: $configs_derep." 256 + echo "Only one --config permitted with --gdb, terminating." 
257 + exit 1 258 + fi 259 + fi 265 260 for CF1 in $configs_derep 266 261 do 267 262 if test -f "$CONFIGFRAG/$CF1" ··· 346 323 TORTURE_DEFCONFIG="$TORTURE_DEFCONFIG"; export TORTURE_DEFCONFIG 347 324 TORTURE_INITRD="$TORTURE_INITRD"; export TORTURE_INITRD 348 325 TORTURE_KCONFIG_ARG="$TORTURE_KCONFIG_ARG"; export TORTURE_KCONFIG_ARG 326 + TORTURE_KCONFIG_GDB_ARG="$TORTURE_KCONFIG_GDB_ARG"; export TORTURE_KCONFIG_GDB_ARG 327 + TORTURE_BOOT_GDB_ARG="$TORTURE_BOOT_GDB_ARG"; export TORTURE_BOOT_GDB_ARG 328 + TORTURE_QEMU_GDB_ARG="$TORTURE_QEMU_GDB_ARG"; export TORTURE_QEMU_GDB_ARG 349 329 TORTURE_KCONFIG_KASAN_ARG="$TORTURE_KCONFIG_KASAN_ARG"; export TORTURE_KCONFIG_KASAN_ARG 350 330 TORTURE_KCONFIG_KCSAN_ARG="$TORTURE_KCONFIG_KCSAN_ARG"; export TORTURE_KCONFIG_KCSAN_ARG 351 331 TORTURE_KMAKE_ARG="$TORTURE_KMAKE_ARG"; export TORTURE_KMAKE_ARG
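Putting the new kvm.sh options together, the following sketches show one way to drive the new scf suite and the new --gdb support; the paths and scenario choices are illustrative only:

    # From the top of the kernel source tree: run the smp_call_function()
    # torture suite on all CPUs for ten minutes.
    tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration 10

    # Debug a single scenario under gdb; as enforced above, only one
    # --configs entry may be combined with --gdb.
    tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcu --configs TREE05 --gdb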
+6 -5
tools/testing/selftests/rcutorture/bin/parse-console.sh
··· 33 33 fi 34 34 cat /dev/null > $file.diags 35 35 36 - # Check for proper termination, except for rcuperf and refscale. 37 - if test "$TORTURE_SUITE" != rcuperf && test "$TORTURE_SUITE" != refscale 36 + # Check for proper termination, except for rcuscale and refscale. 37 + if test "$TORTURE_SUITE" != rcuscale && test "$TORTURE_SUITE" != refscale 38 38 then 39 39 # check for abject failure 40 40 ··· 67 67 grep --binary-files=text 'torture:.*ver:' $file | 68 68 egrep --binary-files=text -v '\(null\)|rtc: 000000000* ' | 69 69 sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' | 70 + sed -e 's/^.*ver: //' | 70 71 awk ' 71 72 BEGIN { 72 73 ver = 0; ··· 75 74 } 76 75 77 76 { 78 - if (!badseq && ($5 + 0 != $5 || $5 <= ver)) { 77 + if (!badseq && ($1 + 0 != $1 || $1 <= ver)) { 79 78 badseqno1 = ver; 80 - badseqno2 = $5; 79 + badseqno2 = $1; 81 80 badseqnr = NR; 82 81 badseq = 1; 83 82 } 84 - ver = $5 83 + ver = $1 85 84 } 86 85 87 86 END {
+1
tools/testing/selftests/rcutorture/configs/rcu/TREE05
··· 16 16 CONFIG_DEBUG_LOCK_ALLOC=y 17 17 CONFIG_PROVE_LOCKING=y 18 18 #CHECK#CONFIG_PROVE_RCU=y 19 + CONFIG_PROVE_RCU_LIST=y 19 20 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 20 21 CONFIG_RCU_EXPERT=y
tools/testing/selftests/rcutorture/configs/rcuperf/CFLIST tools/testing/selftests/rcutorture/configs/rcuscale/CFLIST
-2
tools/testing/selftests/rcutorture/configs/rcuperf/CFcommon
··· 1 - CONFIG_RCU_PERF_TEST=y 2 - CONFIG_PRINTK_TIME=y
tools/testing/selftests/rcutorture/configs/rcuperf/TINY tools/testing/selftests/rcutorture/configs/rcuscale/TINY
tools/testing/selftests/rcutorture/configs/rcuperf/TREE tools/testing/selftests/rcutorture/configs/rcuscale/TREE
tools/testing/selftests/rcutorture/configs/rcuperf/TREE54 tools/testing/selftests/rcutorture/configs/rcuscale/TREE54
+2 -2
tools/testing/selftests/rcutorture/configs/rcuperf/ver_functions.sh tools/testing/selftests/rcutorture/configs/rcuscale/ver_functions.sh
··· 11 11 # 12 12 # Adds per-version torture-module parameters to kernels supporting them. 13 13 per_version_boot_params () { 14 - echo $1 rcuperf.shutdown=1 \ 15 - rcuperf.verbose=1 14 + echo $1 rcuscale.shutdown=1 \ 15 + rcuscale.verbose=1 16 16 }
+2
tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
··· 1 + CONFIG_RCU_SCALE_TEST=y 2 + CONFIG_PRINTK_TIME=y
+2
tools/testing/selftests/rcutorture/configs/scf/CFLIST
··· 1 + NOPREEMPT 2 + PREEMPT
+2
tools/testing/selftests/rcutorture/configs/scf/CFcommon
··· 1 + CONFIG_SCF_TORTURE_TEST=y 2 + CONFIG_PRINTK_TIME=y
+9
tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT
··· 1 + CONFIG_SMP=y 2 + CONFIG_PREEMPT_NONE=y 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=n 5 + CONFIG_HZ_PERIODIC=n 6 + CONFIG_NO_HZ_IDLE=n 7 + CONFIG_NO_HZ_FULL=y 8 + CONFIG_DEBUG_LOCK_ALLOC=n 9 + CONFIG_PROVE_LOCKING=n
+1
tools/testing/selftests/rcutorture/configs/scf/NOPREEMPT.boot
··· 1 + nohz_full=1
+9
tools/testing/selftests/rcutorture/configs/scf/PREEMPT
··· 1 + CONFIG_SMP=y 2 + CONFIG_PREEMPT_NONE=n 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=y 5 + CONFIG_HZ_PERIODIC=n 6 + CONFIG_NO_HZ_IDLE=y 7 + CONFIG_NO_HZ_FULL=n 8 + CONFIG_DEBUG_LOCK_ALLOC=y 9 + CONFIG_PROVE_LOCKING=y
+30
tools/testing/selftests/rcutorture/configs/scf/ver_functions.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Torture-suite-dependent shell functions for the rest of the scripts. 5 + # 6 + # Copyright (C) Facebook, 2020 7 + # 8 + # Authors: Paul E. McKenney <paulmck@kernel.org> 9 + 10 + # scftorture_param_onoff bootparam-string config-file 11 + # 12 + # Adds onoff scftorture module parameters to kernels having it. 13 + scftorture_param_onoff () { 14 + if ! bootparam_hotplug_cpu "$1" && configfrag_hotplug_cpu "$2" 15 + then 16 + echo CPU-hotplug kernel, adding scftorture onoff. 1>&2 17 + echo scftorture.onoff_interval=1000 scftorture.onoff_holdoff=30 18 + fi 19 + } 20 + 21 + # per_version_boot_params bootparam-string config-file seconds 22 + # 23 + # Adds per-version torture-module parameters to kernels supporting them. 24 + per_version_boot_params () { 25 + echo $1 `scftorture_param_onoff "$1" "$2"` \ 26 + scftorture.stat_interval=15 \ 27 + scftorture.shutdown_secs=$3 \ 28 + scftorture.verbose=1 \ 29 + scf 30 + }
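For concreteness, given an empty incoming boot-parameter string, a hotplug-capable scenario, and a hypothetical 600-second run, the function above would emit roughly:

    scftorture.onoff_interval=1000 scftorture.onoff_holdoff=30 scftorture.stat_interval=15 scftorture.shutdown_secs=600 scftorture.verbose=1 scf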
+7 -29
tools/testing/selftests/rcutorture/doc/initrd.txt
··· 1 - The rcutorture scripting tools automatically create the needed initrd 2 - directory using dracut. Failing that, this tool will create an initrd 3 - containing a single statically linked binary named "init" that loops 4 - over a very long sleep() call. In both cases, this creation is done 5 - by tools/testing/selftests/rcutorture/bin/mkinitrd.sh. 1 + The rcutorture scripting tools automatically create an initrd containing 2 + a single statically linked binary named "init" that loops over a 3 + very long sleep() call. In both cases, this creation is done by 4 + tools/testing/selftests/rcutorture/bin/mkinitrd.sh. 6 5 7 - However, if you are attempting to run rcutorture on a system that does 8 - not have dracut installed, and if you don't like the notion of static 9 - linking, you might wish to press an existing initrd into service: 6 + However, if you don't like the notion of statically linked bare-bones 7 + userspace environments, you might wish to press an existing initrd 8 + into service: 10 9 11 10 ------------------------------------------------------------------------ 12 11 cd tools/testing/selftests/rcutorture ··· 14 15 cd initrd 15 16 cpio -id < /tmp/initrd.img.zcat 16 17 # Manually verify that initrd contains needed binaries and libraries. 17 - ------------------------------------------------------------------------ 18 - 19 - Interestingly enough, if you are running rcutorture, you don't really 20 - need userspace in many cases. Running without userspace has the 21 - advantage of allowing you to test your kernel independently of the 22 - distro in place, the root-filesystem layout, and so on. To make this 23 - happen, put the following script in the initrd's tree's "/init" file, 24 - with 0755 mode. 25 - 26 - ------------------------------------------------------------------------ 27 - #!/bin/sh 28 - 29 - while : 30 - do 31 - sleep 10 32 - done 33 - ------------------------------------------------------------------------ 34 - 35 - This approach also allows most of the binaries and libraries in the 36 - initrd filesystem to be dispensed with, which can save significant 37 - space in rcutorture's "res" directory.
+33 -8
tools/testing/selftests/rcutorture/doc/rcu-test-image.txt
··· 1 - This document describes one way to create the rcu-test-image file 2 - that contains the filesystem used by the guest-OS kernel. There are 3 - probably much better ways of doing this, and this filesystem could no 4 - doubt be smaller. It is probably also possible to simply download 5 - an appropriate image from any number of places. 1 + Normally, a minimal initrd is created automatically by the rcutorture 2 + scripting. But minimal really does mean "minimal", namely just a single 3 + root directory with a single statically linked executable named "init": 4 + 5 + $ size tools/testing/selftests/rcutorture/initrd/init 6 + text data bss dec hex filename 7 + 328 0 8 336 150 tools/testing/selftests/rcutorture/initrd/init 8 + 9 + Suppose you need to run some scripts, perhaps to monitor or control 10 + some aspect of the rcutorture testing. This will require a more fully 11 + filled-out userspace, perhaps containing libraries, executables for 12 + the shell and other utilities, and soforth. In that case, place your 13 + desired filesystem here: 14 + 15 + tools/testing/selftests/rcutorture/initrd 16 + 17 + For example, your tools/testing/selftests/rcutorture/initrd/init might 18 + be a script that does any needed mount operations and starts whatever 19 + scripts need starting to properly monitor or control your testing. 20 + The next rcutorture build will then incorporate this filesystem into 21 + the kernel image that is passed to qemu. 22 + 23 + Or maybe you need a real root filesystem for some reason, in which case 24 + please read on! 25 + 26 + The remainder of this document describes one way to create the 27 + rcu-test-image file that contains the filesystem used by the guest-OS 28 + kernel. There are probably much better ways of doing this, and this 29 + filesystem could no doubt be smaller. It is probably also possible to 30 + simply download an appropriate image from any number of places. 6 31 7 32 That said, here are the commands: 8 33 ··· 61 36 https://help.ubuntu.com/community/JeOSVMBuilder 62 37 http://wiki.libvirt.org/page/UbuntuKVMWalkthrough 63 38 http://www.moe.co.uk/2011/01/07/pci_add_option_rom-failed-to-find-romfile-pxe-rtl8139-bin/ -- "apt-get install kvm-pxe" 64 - http://www.landley.net/writing/rootfs-howto.html 65 - http://en.wikipedia.org/wiki/Initrd 66 - http://en.wikipedia.org/wiki/Cpio 39 + https://www.landley.net/writing/rootfs-howto.html 40 + https://en.wikipedia.org/wiki/Initrd 41 + https://en.wikipedia.org/wiki/Cpio 67 42 http://wiki.libvirt.org/page/UbuntuKVMWalkthrough
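Tying this back to the initrd-based approach above, a bare-bones tools/testing/selftests/rcutorture/initrd/init script that mounts the pseudo-filesystems, kicks off a (hypothetical) monitoring script, and then idles might look like:

    #!/bin/sh
    # Minimal sketch only; /monitor.sh stands in for whatever monitoring
    # or control script you place in the initrd.
    mount -t proc proc /proc
    mount -t sysfs sysfs /sys
    /monitor.sh &
    while :
    do
        sleep 10
    done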