
Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Boqun Feng:
"Documentation:
- Add broken-timing possibility to stallwarn.rst
- Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
- Document self-propagating callbacks
- Point call_srcu() to call_rcu() for detailed memory ordering
- Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
- Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
- Remove references to old grace-period-wait primitives

srcu:
- Introduce srcu_read_{un,}lock_fast(), which is similar to
srcu_read_{un,}lock_lite(): it avoids smp_mb()s in lock and unlock
at the cost of calling synchronize_rcu() in synchronize_srcu()

Moreover, by returning the percpu offset of the counter at
srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
extra pointer dereferencing, which makes it faster than
srcu_read_{un,}lock_lite()

srcu_read_{un,}lock_fast() are intended to replace
rcu_read_{un,}lock_trace() if possible

RCU torture:
- Add get_torture_init_jiffies() to return the start time of the test
- Add a test_boost_holdoff module parameter to allow delaying
boosting tests when building rcutorture as built-in
- Add grace period sequence number logging at the beginning and end
of failure/close-call results
- Switch to hexadecimal for the expedited grace period sequence
number in the rcu_exp_grace_period trace point
- Make cur_ops->format_gp_seqs take buffer length
- Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
- Complain when invalid SRCU reader_flavor is specified
- Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
to use atomics even when percpu ops are NMI safe, and use the Kconfig
for SRCU lockdep testing

Misc:
- Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
- Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
- Fix get_state_synchronize_rcu_full() GP-start detection
- Move RCU Tasks self-tests to core_initcall()
- Print segment lengths in show_rcu_nocb_gp_state()
- Make RCU watch ct_kernel_exit_state() warning
- Flush console log from kernel_power_off()
- rcutorture: Allow a negative value for nfakewriters
- rcu: Update TREE05.boot to test normal synchronize_rcu()
- rcu: Use _full() API to debug synchronize_rcu()

Make RCU handle PREEMPT_LAZY better:
- Fix header guard for rcu_all_qs()
- rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
- Update __cond_resched comment about RCU quiescent states
- Handle unstable rdp in rcu_read_unlock_strict()
- Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
- osnoise: Provide quiescent states
- Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
combination
- Limit PREEMPT_RCU configurations
- Make rcutorture scenario TREE07 and scenario TREE10 use
PREEMPT_LAZY=y"

* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
rcu: limit PREEMPT_RCU configurations
rcutorture: Update ->extendables check for lazy preemption
rcutorture: Update rcutorture_one_extend_check() for lazy preemption
osnoise: provide quiescent states
rcu: Use _full() API to debug synchronize_rcu()
rcu: Update TREE05.boot to test normal synchronize_rcu()
rcutorture: Allow a negative value for nfakewriters
Flush console log from kernel_power_off()
context_tracking: Make RCU watch ct_kernel_exit_state() warning
rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
rcu-tasks: Move RCU Tasks self-tests to core_initcall()
rcu: Fix get_state_synchronize_rcu_full() GP-start detection
torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
rcutorture: Complain when invalid SRCU reader_flavor is specified
rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
rcutorture: Make cur_ops->format_gp_seqs take buffer length
rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
...

+717 -246
+1 -4
Documentation/RCU/rcubarrier.rst
···
329 329   was first added back in 2005. This is because on_each_cpu()
330 330   disables preemption, which acted as an RCU read-side critical
331 331   section, thus preventing CPU 0's grace period from completing
332     - until on_each_cpu() had dealt with all of the CPUs. However,
333     - with the advent of preemptible RCU, rcu_barrier() no longer
334     - waited on nonpreemptible regions of code in preemptible kernels,
335     - that being the job of the new rcu_barrier_sched() function.
    332 + until on_each_cpu() had dealt with all of the CPUs.
336 333
337 334   However, with the RCU flavor consolidation around v4.20, this
338 335   possibility was once again ruled out, because the consolidated
+7
Documentation/RCU/stallwarn.rst
···
 96  96   the ``rcu_.*timer wakeup didn't happen for`` console-log message,
 97  97   which will include additional debugging information.
 98  98
     99 + - A timer issue causes time to appear to jump forward, so that RCU
    100 +   believes that the RCU CPU stall-warning timeout has been exceeded
    101 +   when in fact much less time has passed.  This could be due to
    102 +   timer hardware bugs, timer driver bugs, or even corruption of
    103 +   the "jiffies" global variable.  These sorts of timer hardware
    104 +   and driver bugs are not uncommon when testing new hardware.
    105 +
 99 106   - A low-level kernel issue that either fails to invoke one of the
100 107     variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
101 108     ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
+5
Documentation/admin-guide/kernel-parameters.txt
···
5760 5760   rcutorture.test_boost_duration= [KNL]
5761 5761       Duration (s) of each individual boost test.
5762 5762
     5763 + rcutorture.test_boost_holdoff= [KNL]
     5764 +     Holdoff time (s) from start of test to the start
     5765 +     of RCU priority-boost testing.  Defaults to zero,
     5766 +     that is, no holdoff.
     5767 +
5763 5768   rcutorture.test_boost_interval= [KNL]
5764 5769       Interval (s) between each boost test.
5765 5770
+15 -5
Documentation/core-api/this_cpu_ops.rst
···
138 138   available. Instead, the offset of the local per cpu area is simply
139 139   added to the per cpu offset.
140 140
141     - Note that this operation is usually used in a code segment when
142     - preemption has been disabled. The pointer is then used to
143     - access local per cpu data in a critical section. When preemption
144     - is re-enabled this pointer is usually no longer useful since it may
145     - no longer point to per cpu data of the current processor.
    141 + Note that this operation can only be used in code segments where
    142 + smp_processor_id() may be used, for example, where preemption has been
    143 + disabled. The pointer is then used to access local per cpu data in a
    144 + critical section. When preemption is re-enabled this pointer is usually
    145 + no longer useful since it may no longer point to per cpu data of the
    146 + current processor.
146 147
    148 + The special cases where it makes sense to obtain a per-CPU pointer in
    149 + preemptible code are addressed by raw_cpu_ptr(), but such use cases need
    150 + to handle cases where two different CPUs are accessing the same per cpu
    151 + variable, which might well be that of a third CPU. These use cases are
    152 + typically performance optimizations. For example, SRCU implements a pair
    153 + of counters as a pair of per-CPU variables, and rcu_read_lock_nmisafe()
    154 + uses raw_cpu_ptr() to get a pointer to some CPU's counter, and uses
    155 + atomic_inc_long() to handle migration between the raw_cpu_ptr() and
    156 + the atomic_inc_long().
147 157
148 158   Per cpu variables and offsets
149 159   -----------------------------
+6
include/linux/printk.h
···
207 207   extern bool nbcon_device_try_acquire(struct console *con);
208 208   extern void nbcon_device_release(struct console *con);
209 209   void nbcon_atomic_flush_unsafe(void);
    210 + bool pr_flush(int timeout_ms, bool reset_on_progress);
210 211   #else
211 212   static inline __printf(1, 0)
212 213   int vprintk(const char *s, va_list args)
···
314 313
315 314   static inline void nbcon_atomic_flush_unsafe(void)
316 315   {
    316 + }
    317 +
    318 + static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
    319 + {
    320 +     return true;
317 321   }
318 322
319 323   #endif
+8 -17
include/linux/rcupdate.h
··· 95 95 96 96 static inline void __rcu_read_unlock(void) 97 97 { 98 - preempt_enable(); 99 98 if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) 100 99 rcu_read_unlock_strict(); 100 + preempt_enable(); 101 101 } 102 102 103 103 static inline int rcu_preempt_depth(void) ··· 120 120 void rcu_init(void); 121 121 extern int rcu_scheduler_active; 122 122 void rcu_sched_clock_irq(int user); 123 - 124 - #ifdef CONFIG_TASKS_RCU_GENERIC 125 - void rcu_init_tasks_generic(void); 126 - #else 127 - static inline void rcu_init_tasks_generic(void) { } 128 - #endif 129 123 130 124 #ifdef CONFIG_RCU_STALL_COMMON 131 125 void rcu_sysrq_start(void); ··· 800 806 * sections, invocation of the corresponding RCU callback is deferred 801 807 * until after the all the other CPUs exit their critical sections. 802 808 * 803 - * In v5.0 and later kernels, synchronize_rcu() and call_rcu() also 804 - * wait for regions of code with preemption disabled, including regions of 805 - * code with interrupts or softirqs disabled. In pre-v5.0 kernels, which 806 - * define synchronize_sched(), only code enclosed within rcu_read_lock() 807 - * and rcu_read_unlock() are guaranteed to be waited for. 809 + * Both synchronize_rcu() and call_rcu() also wait for regions of code 810 + * with preemption disabled, including regions of code with interrupts or 811 + * softirqs disabled. 808 812 * 809 813 * Note, however, that RCU callbacks are permitted to run concurrently 810 814 * with new RCU read-side critical sections. One way that this can happen ··· 857 865 * rcu_read_unlock() - marks the end of an RCU read-side critical section. 858 866 * 859 867 * In almost all situations, rcu_read_unlock() is immune from deadlock. 
860 - * In recent kernels that have consolidated synchronize_sched() and 861 - * synchronize_rcu_bh() into synchronize_rcu(), this deadlock immunity 862 - * also extends to the scheduler's runqueue and priority-inheritance 863 - * spinlocks, courtesy of the quiescent-state deferral that is carried 864 - * out when rcu_read_unlock() is invoked with interrupts disabled. 868 + * This deadlock immunity also extends to the scheduler's runqueue 869 + * and priority-inheritance spinlocks, courtesy of the quiescent-state 870 + * deferral that is carried out when rcu_read_unlock() is invoked with 871 + * interrupts disabled. 865 872 * 866 873 * See rcu_read_lock() for more information. 867 874 */
+3
include/linux/rcupdate_wait.h
···
16 16   struct rcu_synchronize {
17 17       struct rcu_head head;
18 18       struct completion completion;
   19 +
   20 +     /* This is for debugging. */
   21 +     struct rcu_gp_oldstate oldstate;
19 22   };
20 23   void wakeme_after_rcu(struct rcu_head *head);
21 24
+1 -1
include/linux/rcutree.h
···
100 100   void rcu_end_inkernel_boot(void);
101 101   bool rcu_inkernel_boot_has_ended(void);
102 102   bool rcu_is_watching(void);
103     - #ifndef CONFIG_PREEMPTION
    103 + #ifndef CONFIG_PREEMPT_RCU
104 104   void rcu_all_qs(void);
105 105   #endif
106 106
+88 -14
include/linux/srcu.h
··· 47 47 #define SRCU_READ_FLAVOR_NORMAL 0x1 // srcu_read_lock(). 48 48 #define SRCU_READ_FLAVOR_NMI 0x2 // srcu_read_lock_nmisafe(). 49 49 #define SRCU_READ_FLAVOR_LITE 0x4 // srcu_read_lock_lite(). 50 - #define SRCU_READ_FLAVOR_ALL 0x7 // All of the above. 50 + #define SRCU_READ_FLAVOR_FAST 0x8 // srcu_read_lock_fast(). 51 + #define SRCU_READ_FLAVOR_ALL (SRCU_READ_FLAVOR_NORMAL | SRCU_READ_FLAVOR_NMI | \ 52 + SRCU_READ_FLAVOR_LITE | SRCU_READ_FLAVOR_FAST) // All of the above. 53 + #define SRCU_READ_FLAVOR_SLOWGP (SRCU_READ_FLAVOR_LITE | SRCU_READ_FLAVOR_FAST) 54 + // Flavors requiring synchronize_rcu() 55 + // instead of smp_mb(). 56 + void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp); 51 57 52 58 #ifdef CONFIG_TINY_SRCU 53 59 #include <linux/srcutiny.h> ··· 66 60 void call_srcu(struct srcu_struct *ssp, struct rcu_head *head, 67 61 void (*func)(struct rcu_head *head)); 68 62 void cleanup_srcu_struct(struct srcu_struct *ssp); 69 - int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp); 70 - void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp); 71 - #ifdef CONFIG_TINY_SRCU 72 - #define __srcu_read_lock_lite __srcu_read_lock 73 - #define __srcu_read_unlock_lite __srcu_read_unlock 74 - #else // #ifdef CONFIG_TINY_SRCU 75 - int __srcu_read_lock_lite(struct srcu_struct *ssp) __acquires(ssp); 76 - void __srcu_read_unlock_lite(struct srcu_struct *ssp, int idx) __releases(ssp); 77 - #endif // #else // #ifdef CONFIG_TINY_SRCU 78 63 void synchronize_srcu(struct srcu_struct *ssp); 79 64 80 65 #define SRCU_GET_STATE_COMPLETED 0x1 ··· 255 258 } 256 259 257 260 /** 261 + * srcu_read_lock_fast - register a new reader for an SRCU-protected structure. 262 + * @ssp: srcu_struct in which to register the new reader. 263 + * 264 + * Enter an SRCU read-side critical section, but for a light-weight 265 + * smp_mb()-free reader. See srcu_read_lock() for more information. 
266 + * 267 + * If srcu_read_lock_fast() is ever used on an srcu_struct structure, 268 + * then none of the other flavors may be used, whether before, during, 269 + * or after. Note that grace-period auto-expediting is disabled for _fast 270 + * srcu_struct structures because auto-expedited grace periods invoke 271 + * synchronize_rcu_expedited(), IPIs and all. 272 + * 273 + * Note that srcu_read_lock_fast() can be invoked only from those contexts 274 + * where RCU is watching, that is, from contexts where it would be legal 275 + * to invoke rcu_read_lock(). Otherwise, lockdep will complain. 276 + */ 277 + static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp) 278 + { 279 + struct srcu_ctr __percpu *retval; 280 + 281 + srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST); 282 + retval = __srcu_read_lock_fast(ssp); 283 + rcu_try_lock_acquire(&ssp->dep_map); 284 + return retval; 285 + } 286 + 287 + /** 288 + * srcu_down_read_fast - register a new reader for an SRCU-protected structure. 289 + * @ssp: srcu_struct in which to register the new reader. 290 + * 291 + * Enter a semaphore-like SRCU read-side critical section, but for 292 + * a light-weight smp_mb()-free reader. See srcu_read_lock_fast() and 293 + * srcu_down_read() for more information. 294 + * 295 + * The same srcu_struct may be used concurrently by srcu_down_read_fast() 296 + * and srcu_read_lock_fast(). 297 + */ 298 + static inline struct srcu_ctr __percpu *srcu_down_read_fast(struct srcu_struct *ssp) __acquires(ssp) 299 + { 300 + WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi()); 301 + srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST); 302 + return __srcu_read_lock_fast(ssp); 303 + } 304 + 305 + /** 258 306 * srcu_read_lock_lite - register a new reader for an SRCU-protected structure. 259 307 * @ssp: srcu_struct in which to register the new reader. 
260 308 * ··· 320 278 { 321 279 int retval; 322 280 323 - srcu_check_read_flavor_lite(ssp); 281 + srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_LITE); 324 282 retval = __srcu_read_lock_lite(ssp); 325 283 rcu_try_lock_acquire(&ssp->dep_map); 326 284 return retval; ··· 377 335 * srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler. 378 336 * 379 337 * Calls to srcu_down_read() may be nested, similar to the manner in 380 - * which calls to down_read() may be nested. 338 + * which calls to down_read() may be nested. The same srcu_struct may be 339 + * used concurrently by srcu_down_read() and srcu_read_lock(). 381 340 */ 382 341 static inline int srcu_down_read(struct srcu_struct *ssp) __acquires(ssp) 383 342 { ··· 404 361 } 405 362 406 363 /** 364 + * srcu_read_unlock_fast - unregister a old reader from an SRCU-protected structure. 365 + * @ssp: srcu_struct in which to unregister the old reader. 366 + * @scp: return value from corresponding srcu_read_lock_fast(). 367 + * 368 + * Exit a light-weight SRCU read-side critical section. 369 + */ 370 + static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp) 371 + __releases(ssp) 372 + { 373 + srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST); 374 + srcu_lock_release(&ssp->dep_map); 375 + __srcu_read_unlock_fast(ssp, scp); 376 + } 377 + 378 + /** 379 + * srcu_up_read_fast - unregister a old reader from an SRCU-protected structure. 380 + * @ssp: srcu_struct in which to unregister the old reader. 381 + * @scp: return value from corresponding srcu_read_lock_fast(). 382 + * 383 + * Exit an SRCU read-side critical section, but not necessarily from 384 + * the same context as the maching srcu_down_read_fast(). 
385 + */ 386 + static inline void srcu_up_read_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp) 387 + __releases(ssp) 388 + { 389 + WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi()); 390 + srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST); 391 + __srcu_read_unlock_fast(ssp, scp); 392 + } 393 + 394 + /** 407 395 * srcu_read_unlock_lite - unregister a old reader from an SRCU-protected structure. 408 396 * @ssp: srcu_struct in which to unregister the old reader. 409 - * @idx: return value from corresponding srcu_read_lock(). 397 + * @idx: return value from corresponding srcu_read_lock_lite(). 410 398 * 411 399 * Exit a light-weight SRCU read-side critical section. 412 400 */ ··· 453 379 /** 454 380 * srcu_read_unlock_nmisafe - unregister a old reader from an SRCU-protected structure. 455 381 * @ssp: srcu_struct in which to unregister the old reader. 456 - * @idx: return value from corresponding srcu_read_lock(). 382 + * @idx: return value from corresponding srcu_read_lock_nmisafe(). 457 383 * 458 384 * Exit an SRCU read-side critical section, but in an NMI-safe manner. 459 385 */
+27 -2
include/linux/srcutiny.h
···
 64 64   {
 65 65       int idx;
 66 66
 67    -     preempt_disable();  // Needed for PREEMPT_AUTO
    67 +     preempt_disable();  // Needed for PREEMPT_LAZY
 68 68       idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
 69 69       WRITE_ONCE(ssp->srcu_lock_nesting[idx], READ_ONCE(ssp->srcu_lock_nesting[idx]) + 1);
 70 70       preempt_enable();
 71 71       return idx;
 72 72   }
    73 +
    74 + struct srcu_ctr;
    75 +
    76 + static inline bool __srcu_ptr_to_ctr(struct srcu_struct *ssp, struct srcu_ctr __percpu *scpp)
    77 + {
    78 +     return (int)(intptr_t)(struct srcu_ctr __force __kernel *)scpp;
    79 + }
    80 +
    81 + static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ssp, int idx)
    82 + {
    83 +     return (struct srcu_ctr __percpu *)(intptr_t)idx;
    84 + }
    85 +
    86 + static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp)
    87 + {
    88 +     return __srcu_ctr_to_ptr(ssp, __srcu_read_lock(ssp));
    89 + }
    90 +
    91 + static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
    92 + {
    93 +     __srcu_read_unlock(ssp, __srcu_ptr_to_ctr(ssp, scp));
    94 + }
    95 +
    96 + #define __srcu_read_lock_lite __srcu_read_lock
    97 + #define __srcu_read_unlock_lite __srcu_read_unlock
 73 98
 74 99   static inline void synchronize_srcu_expedited(struct srcu_struct *ssp)
 75 100  {
···
107 82   }
108 83
109 84   #define srcu_check_read_flavor(ssp, read_flavor) do { } while (0)
110    - #define srcu_check_read_flavor_lite(ssp) do { } while (0)
    85 + #define srcu_check_read_flavor_force(ssp, read_flavor) do { } while (0)
111 86
112 87   /* Defined here to avoid size increase for non-torture kernels. */
113 88   static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
+86 -12
include/linux/srcutree.h
··· 17 17 struct srcu_node; 18 18 struct srcu_struct; 19 19 20 + /* One element of the srcu_data srcu_ctrs array. */ 21 + struct srcu_ctr { 22 + atomic_long_t srcu_locks; /* Locks per CPU. */ 23 + atomic_long_t srcu_unlocks; /* Unlocks per CPU. */ 24 + }; 25 + 20 26 /* 21 27 * Per-CPU structure feeding into leaf srcu_node, similar in function 22 28 * to rcu_node. 23 29 */ 24 30 struct srcu_data { 25 31 /* Read-side state. */ 26 - atomic_long_t srcu_lock_count[2]; /* Locks per CPU. */ 27 - atomic_long_t srcu_unlock_count[2]; /* Unlocks per CPU. */ 32 + struct srcu_ctr srcu_ctrs[2]; /* Locks and unlocks per CPU. */ 28 33 int srcu_reader_flavor; /* Reader flavor for srcu_struct structure? */ 29 34 /* Values: SRCU_READ_FLAVOR_.* */ 30 35 ··· 100 95 * Per-SRCU-domain structure, similar in function to rcu_state. 101 96 */ 102 97 struct srcu_struct { 103 - unsigned int srcu_idx; /* Current rdr array element. */ 98 + struct srcu_ctr __percpu *srcu_ctrp; 104 99 struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */ 105 100 struct lockdep_map dep_map; 106 101 struct srcu_usage *srcu_sup; /* Update-side data. */ ··· 167 162 #define __SRCU_STRUCT_INIT(name, usage_name, pcpu_name) \ 168 163 { \ 169 164 .sda = &pcpu_name, \ 165 + .srcu_ctrp = &pcpu_name.srcu_ctrs[0], \ 170 166 __SRCU_STRUCT_INIT_COMMON(name, usage_name) \ 171 167 } 172 168 ··· 207 201 #define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */) 208 202 #define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static) 209 203 204 + int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp); 210 205 void synchronize_srcu_expedited(struct srcu_struct *ssp); 211 206 void srcu_barrier(struct srcu_struct *ssp); 212 207 void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf); 208 + 209 + // Converts a per-CPU pointer to an ->srcu_ctrs[] array element to that 210 + // element's index. 
211 + static inline bool __srcu_ptr_to_ctr(struct srcu_struct *ssp, struct srcu_ctr __percpu *scpp) 212 + { 213 + return scpp - &ssp->sda->srcu_ctrs[0]; 214 + } 215 + 216 + // Converts an integer to a per-CPU pointer to the corresponding 217 + // ->srcu_ctrs[] array element. 218 + static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ssp, int idx) 219 + { 220 + return &ssp->sda->srcu_ctrs[idx]; 221 + } 222 + 223 + /* 224 + * Counts the new reader in the appropriate per-CPU element of the 225 + * srcu_struct. Returns a pointer that must be passed to the matching 226 + * srcu_read_unlock_fast(). 227 + * 228 + * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side 229 + * critical sections either because they disables interrupts, because they 230 + * are a single instruction, or because they are a read-modify-write atomic 231 + * operation, depending on the whims of the architecture. 232 + * 233 + * This means that __srcu_read_lock_fast() is not all that fast 234 + * on architectures that support NMIs but do not supply NMI-safe 235 + * implementations of this_cpu_inc(). 236 + */ 237 + static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp) 238 + { 239 + struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp); 240 + 241 + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast()."); 242 + if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) 243 + this_cpu_inc(scp->srcu_locks.counter); /* Y */ 244 + else 245 + atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); /* Z */ 246 + barrier(); /* Avoid leaking the critical section. */ 247 + return scp; 248 + } 249 + 250 + /* 251 + * Removes the count for the old reader from the appropriate 252 + * per-CPU element of the srcu_struct. Note that this may well be a 253 + * different CPU than that which was incremented by the corresponding 254 + * srcu_read_lock_fast(), but it must be within the same task. 
255 + * 256 + * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side 257 + * critical sections either because they disables interrupts, because they 258 + * are a single instruction, or because they are a read-modify-write atomic 259 + * operation, depending on the whims of the architecture. 260 + * 261 + * This means that __srcu_read_unlock_fast() is not all that fast 262 + * on architectures that support NMIs but do not supply NMI-safe 263 + * implementations of this_cpu_inc(). 264 + */ 265 + static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp) 266 + { 267 + barrier(); /* Avoid leaking the critical section. */ 268 + if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) 269 + this_cpu_inc(scp->srcu_unlocks.counter); /* Z */ 270 + else 271 + atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); /* Z */ 272 + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast()."); 273 + } 213 274 214 275 /* 215 276 * Counts the new reader in the appropriate per-CPU element of the ··· 290 217 */ 291 218 static inline int __srcu_read_lock_lite(struct srcu_struct *ssp) 292 219 { 293 - int idx; 220 + struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp); 294 221 295 222 RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_lite()."); 296 - idx = READ_ONCE(ssp->srcu_idx) & 0x1; 297 - this_cpu_inc(ssp->sda->srcu_lock_count[idx].counter); /* Y */ 223 + this_cpu_inc(scp->srcu_locks.counter); /* Y */ 298 224 barrier(); /* Avoid leaking the critical section. */ 299 - return idx; 225 + return __srcu_ptr_to_ctr(ssp, scp); 300 226 } 301 227 302 228 /* ··· 312 240 static inline void __srcu_read_unlock_lite(struct srcu_struct *ssp, int idx) 313 241 { 314 242 barrier(); /* Avoid leaking the critical section. 
*/ 315 - this_cpu_inc(ssp->sda->srcu_unlock_count[idx].counter); /* Z */ 243 + this_cpu_inc(__srcu_ctr_to_ptr(ssp, idx)->srcu_unlocks.counter); /* Z */ 316 244 RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_lite()."); 317 245 } 318 246 319 247 void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor); 320 248 321 - // Record _lite() usage even for CONFIG_PROVE_RCU=n kernels. 322 - static inline void srcu_check_read_flavor_lite(struct srcu_struct *ssp) 249 + // Record reader usage even for CONFIG_PROVE_RCU=n kernels. This is 250 + // needed only for flavors that require grace-period smp_mb() calls to be 251 + // promoted to synchronize_rcu(). 252 + static inline void srcu_check_read_flavor_force(struct srcu_struct *ssp, int read_flavor) 323 253 { 324 254 struct srcu_data *sdp = raw_cpu_ptr(ssp->sda); 325 255 326 - if (likely(READ_ONCE(sdp->srcu_reader_flavor) & SRCU_READ_FLAVOR_LITE)) 256 + if (likely(READ_ONCE(sdp->srcu_reader_flavor) & read_flavor)) 327 257 return; 328 258 329 259 // Note that the cmpxchg() in __srcu_check_read_flavor() is fully ordered. 330 - __srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_LITE); 260 + __srcu_check_read_flavor(ssp, read_flavor); 331 261 } 332 262 333 263 // Record non-_lite() usage only for CONFIG_PROVE_RCU=y kernels.
+1
include/linux/torture.h
···
104 104   /* Initialization and cleanup. */
105 105   bool torture_init_begin(char *ttype, int v);
106 106   void torture_init_end(void);
    107 + unsigned long get_torture_init_jiffies(void);
107 108   bool torture_cleanup_begin(void);
108 109   void torture_cleanup_end(void);
109 110   bool torture_must_stop(void);
+1 -1
include/trace/events/rcu.h
···
207 207       __entry->gpevent = gpevent;
208 208   ),
209 209
210     - TP_printk("%s %ld %s",
    210 + TP_printk("%s %#lx %s",
211 211       __entry->rcuname, __entry->gpseq, __entry->gpevent)
212 212   );
213 213
-1
init/main.c
···
1553 1553
1554 1554       init_mm_internals();
1555 1555
1556      -     rcu_init_tasks_generic();
1557 1556       do_pre_smp_initcalls();
1558 1557       lockup_detector_init();
1559 1558
+4 -5
kernel/context_tracking.c
···
80 80    */
81 81   static noinstr void ct_kernel_exit_state(int offset)
82 82   {
83     -     int seq;
84     -
85 83       /*
86 84        * CPUs seeing atomic_add_return() must see prior RCU read-side
87 85        * critical sections, and we also must force ordering with the
88 86        * next idle sojourn.
89 87        */
90 88       rcu_task_trace_heavyweight_enter();  // Before CT state update!
91     -     seq = ct_state_inc(offset);
92     -     // RCU is no longer watching.  Better be in extended quiescent state!
93     -     WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & CT_RCU_WATCHING));
    89 +     // RCU is still watching.  Better not be in extended quiescent state!
    90 +     WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !rcu_is_watching_curr_cpu());
    91 +     (void)ct_state_inc(offset);
    92 +     // RCU is no longer watching.
94 93   }
95 94
96 95   /*
+1 -3
kernel/printk/printk.c
···
2461 2461   }
2462 2462   EXPORT_SYMBOL(_printk);
2463 2463
2464      - static bool pr_flush(int timeout_ms, bool reset_on_progress);
2465 2464   static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress);
2466 2465
2467 2466   #else /* CONFIG_PRINTK */
···
2473 2474
2474 2475   static u64 syslog_seq;
2475 2476
2476      - static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
2477 2477   static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
2478 2478
2479 2479   #endif /* CONFIG_PRINTK */
···
4464 4466    * Context: Process context. May sleep while acquiring console lock.
4465 4467    * Return: true if all usable printers are caught up.
4466 4468    */
4467      - static bool pr_flush(int timeout_ms, bool reset_on_progress)
     4469 + bool pr_flush(int timeout_ms, bool reset_on_progress)
4468 4470   {
4469 4471       return __pr_flush(NULL, timeout_ms, reset_on_progress);
4470 4472   }
+26 -9
kernel/rcu/Kconfig
··· 18 18 19 19 config PREEMPT_RCU 20 20 bool 21 - default y if PREEMPTION 21 + default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC) 22 22 select TREE_RCU 23 23 help 24 24 This option selects the RCU implementation that is ··· 65 65 help 66 66 This option selects the full-fledged version of SRCU. 67 67 68 + config FORCE_NEED_SRCU_NMI_SAFE 69 + bool "Force selection of NEED_SRCU_NMI_SAFE" 70 + depends on !TINY_SRCU 71 + select NEED_SRCU_NMI_SAFE 72 + default n 73 + help 74 + This option forces selection of the NEED_SRCU_NMI_SAFE 75 + Kconfig option, allowing testing of srcu_read_lock_nmisafe() 76 + and srcu_read_unlock_nmisafe() on architectures (like x86) 77 + that select the ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option. 78 + 68 79 config NEED_SRCU_NMI_SAFE 69 80 def_bool HAVE_NMI && !ARCH_HAS_NMI_SAFE_THIS_CPU_OPS && !TINY_SRCU 70 81 ··· 102 91 103 92 config TASKS_RCU 104 93 bool 105 - default NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO) 94 + default NEED_TASKS_RCU && PREEMPTION 106 95 select IRQ_WORK 107 96 108 97 config FORCE_TASKS_RUDE_RCU ··· 334 323 depends on RCU_NOCB_CPU 335 324 default n 336 325 help 337 - To save power, batch RCU callbacks and flush after delay, memory 338 - pressure, or callback list growing too big. 326 + To save power, batch RCU callbacks and delay starting the 327 + corresponding grace period for multiple seconds. The grace 328 + period will be started after this delay, in case of memory 329 + pressure, or if the corresponding CPU's callback list grows 330 + too large. 339 331 340 - Requires rcu_nocbs=all to be set. 332 + These delays happen only on rcu_nocbs CPUs, that is, CPUs 333 + whose callbacks have been offloaded. 341 334 342 - Use rcutree.enable_rcu_lazy=0 to turn it off at boot time. 335 + Use the rcutree.enable_rcu_lazy=0 kernel-boot parameter to 336 + globally disable these delays. 
343 337 344 338 config RCU_LAZY_DEFAULT_OFF 345 339 bool "Turn RCU lazy invocation off by default" 346 340 depends on RCU_LAZY 347 341 default n 348 342 help 349 - Allows building the kernel with CONFIG_RCU_LAZY=y yet keep it default 350 - off. Boot time param rcutree.enable_rcu_lazy=1 can be used to switch 351 - it back on. 343 + Build the kernel with CONFIG_RCU_LAZY=y, but cause the kernel 344 + to boot with these energy-efficiency delays disabled. Use the 345 + rcutree.enable_rcu_lazy=0 kernel-boot parameter to override 346 + the this option at boot time, thus re-enabling these delays. 352 347 353 348 config RCU_DOUBLE_CHECK_CB_TIME 354 349 bool "RCU callback-batch backup time check"
+16 -2
kernel/rcu/Kconfig.debug
···
54 54       Say N if you are unsure.
55 55
56 56   config RCU_TORTURE_TEST_CHK_RDR_STATE
57     -     tristate "Check rcutorture reader state"
    57 +     bool "Check rcutorture reader state"
58 58       depends on RCU_TORTURE_TEST
59 59       default n
60 60       help
···
70 70       Say N if you are unsure.
71 71
72 72   config RCU_TORTURE_TEST_LOG_CPU
73     -     tristate "Log CPU for rcutorture failures"
    73 +     bool "Log CPU for rcutorture failures"
74 74       depends on RCU_TORTURE_TEST
75 75       default n
76 76       help
···
82 82       less probable.
83 83
84 84       Say Y here if you want CPU IDs logged.
    85 +     Say N if you are unsure.
    86 +
    87 + config RCU_TORTURE_TEST_LOG_GP
    88 +     bool "Log grace-period numbers for rcutorture failures"
    89 +     depends on RCU_TORTURE_TEST
    90 +     default n
    91 +     help
    92 +         This option causes rcutorture to decorate each entry of its
    93 +         log of failure/close-call rcutorture reader segments with the
    94 +         corresponding grace-period sequence numbers.  This information
    95 +         can be useful, but it does incur additional overhead, overhead
    96 +         that can make both failures and close calls less probable.
    97 +
    98 +     Say Y here if you want grace-period sequence numbers logged.
85 99       Say N if you are unsure.
86 100
87 101   config RCU_REF_SCALE_TEST
+9 -4
kernel/rcu/rcu.h
··· 162 162 { 163 163 unsigned long cur_s = READ_ONCE(*sp); 164 164 165 - return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)); 165 + return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1)); 166 166 } 167 167 168 168 /* ··· 590 590 #endif 591 591 static inline void rcu_gp_set_torture_wait(int duration) { } 592 592 #endif 593 + unsigned long long rcutorture_gather_gp_seqs(void); 594 + void rcutorture_format_gp_seqs(unsigned long long seqs, char *cp, size_t len); 593 595 594 596 #ifdef CONFIG_TINY_SRCU 595 597 ··· 613 611 static inline bool rcu_watching_zero_in_eqs(int cpu, int *vp) { return false; } 614 612 static inline unsigned long rcu_get_gp_seq(void) { return 0; } 615 613 static inline unsigned long rcu_exp_batches_completed(void) { return 0; } 616 - static inline unsigned long 617 - srcu_batches_completed(struct srcu_struct *sp) { return 0; } 618 614 static inline void rcu_force_quiescent_state(void) { } 619 615 static inline bool rcu_check_boost_fail(unsigned long gp_state, int *cpup) { return true; } 620 616 static inline void show_rcu_gp_kthreads(void) { } ··· 624 624 bool rcu_watching_zero_in_eqs(int cpu, int *vp); 625 625 unsigned long rcu_get_gp_seq(void); 626 626 unsigned long rcu_exp_batches_completed(void); 627 - unsigned long srcu_batches_completed(struct srcu_struct *sp); 628 627 bool rcu_check_boost_fail(unsigned long gp_state, int *cpup); 629 628 void show_rcu_gp_kthreads(void); 630 629 int rcu_get_gp_kthreads_prio(void); ··· 634 635 void rcu_gp_slow_register(atomic_t *rgssp); 635 636 void rcu_gp_slow_unregister(atomic_t *rgssp); 636 637 #endif /* #else #ifdef CONFIG_TINY_RCU */ 638 + 639 + #ifdef CONFIG_TINY_SRCU 640 + static inline unsigned long srcu_batches_completed(struct srcu_struct *sp) { return 0; } 641 + #else // #ifdef CONFIG_TINY_SRCU 642 + unsigned long srcu_batches_completed(struct srcu_struct *sp); 643 + #endif // #else // #ifdef CONFIG_TINY_SRCU 637 644 638 645 #ifdef 
CONFIG_RCU_NOCB_CPU 639 646 void rcu_bind_current_to_nocb(void);
+110 -14
kernel/rcu/rcutorture.c
··· 135 135 torture_param(int, stutter, 5, "Number of seconds to run/halt test"); 136 136 torture_param(int, test_boost, 1, "Test RCU prio boost: 0=no, 1=maybe, 2=yes."); 137 137 torture_param(int, test_boost_duration, 4, "Duration of each boost test, seconds."); 138 + torture_param(int, test_boost_holdoff, 0, "Holdoff time from rcutorture start, seconds."); 138 139 torture_param(int, test_boost_interval, 7, "Interval between boost tests, seconds."); 139 140 torture_param(int, test_nmis, 0, "End-test NMI tests, 0 to disable."); 140 141 torture_param(bool, test_no_idle_hz, true, "Test support for tickless idle CPUs"); ··· 148 147 149 148 static int nrealnocbers; 150 149 static int nrealreaders; 150 + static int nrealfakewriters; 151 151 static struct task_struct *writer_task; 152 152 static struct task_struct **fakewriter_tasks; 153 153 static struct task_struct **reader_tasks; ··· 274 272 bool rt_preempted; 275 273 int rt_cpu; 276 274 int rt_end_cpu; 275 + unsigned long long rt_gp_seq; 276 + unsigned long long rt_gp_seq_end; 277 + u64 rt_ts; 277 278 }; 278 279 static int err_segs_recorded; 279 280 static struct rt_read_seg err_segs[RCUTORTURE_RDR_MAX_SEGS]; ··· 411 406 void (*gp_slow_register)(atomic_t *rgssp); 412 407 void (*gp_slow_unregister)(atomic_t *rgssp); 413 408 bool (*reader_blocked)(void); 409 + unsigned long long (*gather_gp_seqs)(void); 410 + void (*format_gp_seqs)(unsigned long long seqs, char *cp, size_t len); 414 411 long cbflood_max; 415 412 int irq_capable; 416 413 int can_boost; ··· 617 610 .reader_blocked = IS_ENABLED(CONFIG_RCU_TORTURE_TEST_LOG_CPU) 618 611 ? 
has_rcu_reader_blocked 619 612 : NULL, 613 + .gather_gp_seqs = rcutorture_gather_gp_seqs, 614 + .format_gp_seqs = rcutorture_format_gp_seqs, 620 615 .irq_capable = 1, 621 616 .can_boost = IS_ENABLED(CONFIG_RCU_BOOST), 622 617 .extendables = RCUTORTURE_MAX_EXTEND, ··· 664 655 .sync = synchronize_rcu_busted, 665 656 .exp_sync = synchronize_rcu_busted, 666 657 .call = call_rcu_busted, 658 + .gather_gp_seqs = rcutorture_gather_gp_seqs, 659 + .format_gp_seqs = rcutorture_format_gp_seqs, 667 660 .irq_capable = 1, 668 661 .extendables = RCUTORTURE_MAX_EXTEND, 669 662 .name = "busted" ··· 688 677 static int srcu_torture_read_lock(void) 689 678 { 690 679 int idx; 680 + struct srcu_ctr __percpu *scp; 691 681 int ret = 0; 682 + 683 + WARN_ON_ONCE(reader_flavor & ~SRCU_READ_FLAVOR_ALL); 692 684 693 685 if ((reader_flavor & SRCU_READ_FLAVOR_NORMAL) || !(reader_flavor & SRCU_READ_FLAVOR_ALL)) { 694 686 idx = srcu_read_lock(srcu_ctlp); ··· 707 693 idx = srcu_read_lock_lite(srcu_ctlp); 708 694 WARN_ON_ONCE(idx & ~0x1); 709 695 ret += idx << 2; 696 + } 697 + if (reader_flavor & SRCU_READ_FLAVOR_FAST) { 698 + scp = srcu_read_lock_fast(srcu_ctlp); 699 + idx = __srcu_ptr_to_ctr(srcu_ctlp, scp); 700 + WARN_ON_ONCE(idx & ~0x1); 701 + ret += idx << 3; 710 702 } 711 703 return ret; 712 704 } ··· 739 719 static void srcu_torture_read_unlock(int idx) 740 720 { 741 721 WARN_ON_ONCE((reader_flavor && (idx & ~reader_flavor)) || (!reader_flavor && (idx & ~0x1))); 722 + if (reader_flavor & SRCU_READ_FLAVOR_FAST) 723 + srcu_read_unlock_fast(srcu_ctlp, __srcu_ctr_to_ptr(srcu_ctlp, (idx & 0x8) >> 3)); 742 724 if (reader_flavor & SRCU_READ_FLAVOR_LITE) 743 725 srcu_read_unlock_lite(srcu_ctlp, (idx & 0x4) >> 2); 744 726 if (reader_flavor & SRCU_READ_FLAVOR_NMI) ··· 813 791 .readunlock = srcu_torture_read_unlock, 814 792 .readlock_held = torture_srcu_read_lock_held, 815 793 .get_gp_seq = srcu_torture_completed, 794 + .gp_diff = rcu_seq_diff, 816 795 .deferred_free = srcu_torture_deferred_free, 817 796 
.sync = srcu_torture_synchronize, 818 797 .exp_sync = srcu_torture_synchronize_expedited, ··· 857 834 .readunlock = srcu_torture_read_unlock, 858 835 .readlock_held = torture_srcu_read_lock_held, 859 836 .get_gp_seq = srcu_torture_completed, 837 + .gp_diff = rcu_seq_diff, 860 838 .deferred_free = srcu_torture_deferred_free, 861 839 .sync = srcu_torture_synchronize, 862 840 .exp_sync = srcu_torture_synchronize_expedited, ··· 1172 1148 unsigned long gp_state; 1173 1149 unsigned long gp_state_time; 1174 1150 unsigned long oldstarttime; 1151 + unsigned long booststarttime = get_torture_init_jiffies() + test_boost_holdoff * HZ; 1175 1152 1176 - VERBOSE_TOROUT_STRING("rcu_torture_boost started"); 1153 + if (test_boost_holdoff <= 0 || time_after(jiffies, booststarttime)) { 1154 + VERBOSE_TOROUT_STRING("rcu_torture_boost started"); 1155 + } else { 1156 + VERBOSE_TOROUT_STRING("rcu_torture_boost started holdoff period"); 1157 + while (time_before(jiffies, booststarttime)) { 1158 + schedule_timeout_idle(HZ); 1159 + if (kthread_should_stop()) 1160 + goto cleanup; 1161 + } 1162 + VERBOSE_TOROUT_STRING("rcu_torture_boost finished holdoff period"); 1163 + } 1177 1164 1178 1165 /* Set real-time priority. */ 1179 1166 sched_set_fifo_low(current); ··· 1260 1225 sched_set_fifo_low(current); 1261 1226 } while (!torture_must_stop()); 1262 1227 1228 + cleanup: 1263 1229 /* Clean up and exit. 
*/ 1264 1230 while (!kthread_should_stop()) { 1265 1231 torture_shutdown_absorb("rcu_torture_boost"); ··· 1764 1728 do { 1765 1729 torture_hrtimeout_jiffies(torture_random(&rand) % 10, &rand); 1766 1730 if (cur_ops->cb_barrier != NULL && 1767 - torture_random(&rand) % (nfakewriters * 8) == 0) { 1731 + torture_random(&rand) % (nrealfakewriters * 8) == 0) { 1768 1732 cur_ops->cb_barrier(); 1769 1733 } else { 1770 1734 switch (synctype[torture_random(&rand) % nsynctypes]) { ··· 1909 1873 #define ROEC_ARGS "%s %s: Current %#x To add %#x To remove %#x preempt_count() %#x\n", __func__, s, curstate, new, old, preempt_count() 1910 1874 static void rcutorture_one_extend_check(char *s, int curstate, int new, int old, bool insoftirq) 1911 1875 { 1876 + int mask; 1877 + 1912 1878 if (!IS_ENABLED(CONFIG_RCU_TORTURE_TEST_CHK_RDR_STATE)) 1913 1879 return; 1914 1880 ··· 1937 1899 WARN_ONCE(cur_ops->extendables && 1938 1900 !(curstate & (RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH)) && 1939 1901 (preempt_count() & SOFTIRQ_MASK), ROEC_ARGS); 1940 - WARN_ONCE(cur_ops->extendables && 1941 - !(curstate & (RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED)) && 1902 + 1903 + /* 1904 + * non-preemptible RCU in a preemptible kernel uses preempt_disable() 1905 + * as rcu_read_lock(). 1906 + */ 1907 + mask = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED; 1908 + if (!IS_ENABLED(CONFIG_PREEMPT_RCU)) 1909 + mask |= RCUTORTURE_RDR_RCU_1 | RCUTORTURE_RDR_RCU_2; 1910 + 1911 + WARN_ONCE(cur_ops->extendables && !(curstate & mask) && 1942 1912 (preempt_count() & PREEMPT_MASK), ROEC_ARGS); 1943 - WARN_ONCE(cur_ops->readlock_nesting && 1944 - !(curstate & (RCUTORTURE_RDR_RCU_1 | RCUTORTURE_RDR_RCU_2)) && 1913 + 1914 + /* 1915 + * non-preemptible RCU in a preemptible kernel uses "preempt_count() & 1916 + * PREEMPT_MASK" as ->readlock_nesting(). 
1917 + */ 1918 + mask = RCUTORTURE_RDR_RCU_1 | RCUTORTURE_RDR_RCU_2; 1919 + if (!IS_ENABLED(CONFIG_PREEMPT_RCU)) 1920 + mask |= RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED; 1921 + 1922 + WARN_ONCE(cur_ops->readlock_nesting && !(curstate & mask) && 1945 1923 cur_ops->readlock_nesting() > 0, ROEC_ARGS); 1946 1924 } 1947 1925 ··· 2018 1964 if (cur_ops->reader_blocked) 2019 1965 rtrsp[-1].rt_preempted = cur_ops->reader_blocked(); 2020 1966 } 1967 + } 1968 + // Sample grace-period sequence number, as good a place as any. 1969 + if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_LOG_GP) && cur_ops->gather_gp_seqs) { 1970 + rtrsp->rt_gp_seq = cur_ops->gather_gp_seqs(); 1971 + rtrsp->rt_ts = ktime_get_mono_fast_ns(); 1972 + if (!first) 1973 + rtrsp[-1].rt_gp_seq_end = rtrsp->rt_gp_seq; 2021 1974 } 2022 1975 2023 1976 /* ··· 2573 2512 "shuffle_interval=%d stutter=%d irqreader=%d " 2574 2513 "fqs_duration=%d fqs_holdoff=%d fqs_stutter=%d " 2575 2514 "test_boost=%d/%d test_boost_interval=%d " 2576 - "test_boost_duration=%d shutdown_secs=%d " 2515 + "test_boost_duration=%d test_boost_holdoff=%d shutdown_secs=%d " 2577 2516 "stall_cpu=%d stall_cpu_holdoff=%d stall_cpu_irqsoff=%d " 2578 2517 "stall_cpu_block=%d stall_cpu_repeat=%d " 2579 2518 "n_barrier_cbs=%d " ··· 2583 2522 "nocbs_nthreads=%d nocbs_toggle=%d " 2584 2523 "test_nmis=%d " 2585 2524 "preempt_duration=%d preempt_interval=%d\n", 2586 - torture_type, tag, nrealreaders, nfakewriters, 2525 + torture_type, tag, nrealreaders, nrealfakewriters, 2587 2526 stat_interval, verbose, test_no_idle_hz, shuffle_interval, 2588 2527 stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter, 2589 2528 test_boost, cur_ops->can_boost, 2590 - test_boost_interval, test_boost_duration, shutdown_secs, 2529 + test_boost_interval, test_boost_duration, test_boost_holdoff, shutdown_secs, 2591 2530 stall_cpu, stall_cpu_holdoff, stall_cpu_irqsoff, 2592 2531 stall_cpu_block, stall_cpu_repeat, 2593 2532 n_barrier_cbs, ··· 3614 3553 int flags = 0; 3615 3554 
unsigned long gp_seq = 0; 3616 3555 int i; 3556 + int j; 3617 3557 3618 3558 if (torture_cleanup_begin()) { 3619 3559 if (cur_ops->cb_barrier != NULL) { ··· 3659 3597 rcu_torture_reader_mbchk = NULL; 3660 3598 3661 3599 if (fakewriter_tasks) { 3662 - for (i = 0; i < nfakewriters; i++) 3600 + for (i = 0; i < nrealfakewriters; i++) 3663 3601 torture_stop_kthread(rcu_torture_fakewriter, 3664 3602 fakewriter_tasks[i]); 3665 3603 kfree(fakewriter_tasks); ··· 3697 3635 pr_alert("\t: No segments recorded!!!\n"); 3698 3636 firsttime = 1; 3699 3637 for (i = 0; i < rt_read_nsegs; i++) { 3700 - pr_alert("\t%d: %#4x", i, err_segs[i].rt_readstate); 3638 + if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_LOG_GP)) 3639 + pr_alert("\t%lluus ", div64_u64(err_segs[i].rt_ts, 1000ULL)); 3640 + else 3641 + pr_alert("\t"); 3642 + pr_cont("%d: %#4x", i, err_segs[i].rt_readstate); 3701 3643 if (err_segs[i].rt_delay_jiffies != 0) { 3702 3644 pr_cont("%s%ldjiffies", firsttime ? "" : "+", 3703 3645 err_segs[i].rt_delay_jiffies); ··· 3713 3647 pr_cont("->%-2d", err_segs[i].rt_end_cpu); 3714 3648 else 3715 3649 pr_cont(" ..."); 3650 + } 3651 + if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_LOG_GP) && 3652 + cur_ops->gather_gp_seqs && cur_ops->format_gp_seqs) { 3653 + char buf1[20+1]; 3654 + char buf2[20+1]; 3655 + char sepchar = '-'; 3656 + 3657 + cur_ops->format_gp_seqs(err_segs[i].rt_gp_seq, 3658 + buf1, ARRAY_SIZE(buf1)); 3659 + cur_ops->format_gp_seqs(err_segs[i].rt_gp_seq_end, 3660 + buf2, ARRAY_SIZE(buf2)); 3661 + if (err_segs[i].rt_gp_seq == err_segs[i].rt_gp_seq_end) { 3662 + if (buf2[0]) { 3663 + for (j = 0; buf2[j]; j++) 3664 + buf2[j] = '.'; 3665 + if (j) 3666 + buf2[j - 1] = ' '; 3667 + } 3668 + sepchar = ' '; 3669 + } 3670 + pr_cont(" %s%c%s", buf1, sepchar, buf2); 3716 3671 } 3717 3672 if (err_segs[i].rt_delay_ms != 0) { 3718 3673 pr_cont(" %s%ldms", firsttime ? 
"" : "+", ··· 4081 3994 4082 3995 rcu_torture_init_srcu_lockdep(); 4083 3996 3997 + if (nfakewriters >= 0) { 3998 + nrealfakewriters = nfakewriters; 3999 + } else { 4000 + nrealfakewriters = num_online_cpus() - 2 - nfakewriters; 4001 + if (nrealfakewriters <= 0) 4002 + nrealfakewriters = 1; 4003 + } 4004 + 4084 4005 if (nreaders >= 0) { 4085 4006 nrealreaders = nreaders; 4086 4007 } else { ··· 4145 4050 writer_task); 4146 4051 if (torture_init_error(firsterr)) 4147 4052 goto unwind; 4148 - if (nfakewriters > 0) { 4149 - fakewriter_tasks = kcalloc(nfakewriters, 4053 + 4054 + if (nrealfakewriters > 0) { 4055 + fakewriter_tasks = kcalloc(nrealfakewriters, 4150 4056 sizeof(fakewriter_tasks[0]), 4151 4057 GFP_KERNEL); 4152 4058 if (fakewriter_tasks == NULL) { ··· 4156 4060 goto unwind; 4157 4061 } 4158 4062 } 4159 - for (i = 0; i < nfakewriters; i++) { 4063 + for (i = 0; i < nrealfakewriters; i++) { 4160 4064 firsterr = torture_create_kthread(rcu_torture_fakewriter, 4161 4065 NULL, fakewriter_tasks[i]); 4162 4066 if (torture_init_error(firsterr))
+31 -1
kernel/rcu/refscale.c
··· 216 216 .name = "srcu" 217 217 }; 218 218 219 + static void srcu_fast_ref_scale_read_section(const int nloops) 220 + { 221 + int i; 222 + struct srcu_ctr __percpu *scp; 223 + 224 + for (i = nloops; i >= 0; i--) { 225 + scp = srcu_read_lock_fast(srcu_ctlp); 226 + srcu_read_unlock_fast(srcu_ctlp, scp); 227 + } 228 + } 229 + 230 + static void srcu_fast_ref_scale_delay_section(const int nloops, const int udl, const int ndl) 231 + { 232 + int i; 233 + struct srcu_ctr __percpu *scp; 234 + 235 + for (i = nloops; i >= 0; i--) { 236 + scp = srcu_read_lock_fast(srcu_ctlp); 237 + un_delay(udl, ndl); 238 + srcu_read_unlock_fast(srcu_ctlp, scp); 239 + } 240 + } 241 + 242 + static const struct ref_scale_ops srcu_fast_ops = { 243 + .init = rcu_sync_scale_init, 244 + .readsection = srcu_fast_ref_scale_read_section, 245 + .delaysection = srcu_fast_ref_scale_delay_section, 246 + .name = "srcu-fast" 247 + }; 248 + 219 249 static void srcu_lite_ref_scale_read_section(const int nloops) 220 250 { 221 251 int i; ··· 1193 1163 long i; 1194 1164 int firsterr = 0; 1195 1165 static const struct ref_scale_ops *scale_ops[] = { 1196 - &rcu_ops, &srcu_ops, &srcu_lite_ops, RCU_TRACE_OPS RCU_TASKS_OPS 1166 + &rcu_ops, &srcu_ops, &srcu_fast_ops, &srcu_lite_ops, RCU_TRACE_OPS RCU_TASKS_OPS 1197 1167 &refcnt_ops, &rwlock_ops, &rwsem_ops, &lock_ops, &lock_irq_ops, 1198 1168 &acqrel_ops, &sched_clock_ops, &clock_ops, &jiffies_ops, 1199 1169 &typesafe_ref_ops, &typesafe_lock_ops, &typesafe_seqlock_ops,
+13 -7
kernel/rcu/srcutiny.c
··· 20 20 #include "rcu_segcblist.h" 21 21 #include "rcu.h" 22 22 23 + #ifndef CONFIG_TREE_RCU 23 24 int rcu_scheduler_active __read_mostly; 25 + #else // #ifndef CONFIG_TREE_RCU 26 + extern int rcu_scheduler_active; 27 + #endif // #else // #ifndef CONFIG_TREE_RCU 24 28 static LIST_HEAD(srcu_boot_list); 25 29 static bool srcu_init_done; 26 30 ··· 102 98 { 103 99 int newval; 104 100 105 - preempt_disable(); // Needed for PREEMPT_AUTO 101 + preempt_disable(); // Needed for PREEMPT_LAZY 106 102 newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1; 107 103 WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval); 108 104 preempt_enable(); ··· 124 120 struct srcu_struct *ssp; 125 121 126 122 ssp = container_of(wp, struct srcu_struct, srcu_work); 127 - preempt_disable(); // Needed for PREEMPT_AUTO 123 + preempt_disable(); // Needed for PREEMPT_LAZY 128 124 if (ssp->srcu_gp_running || ULONG_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max))) { 129 125 preempt_enable(); 130 126 return; /* Already running or nothing to do. */ ··· 142 138 WRITE_ONCE(ssp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */ 143 139 preempt_enable(); 144 140 swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx])); 145 - preempt_disable(); // Needed for PREEMPT_AUTO 141 + preempt_disable(); // Needed for PREEMPT_LAZY 146 142 WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */ 147 143 WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1); 148 144 preempt_enable(); ··· 163 159 * at interrupt level, but the ->srcu_gp_running checks will 164 160 * straighten that out. 
165 161 */ 166 - preempt_disable(); // Needed for PREEMPT_AUTO 162 + preempt_disable(); // Needed for PREEMPT_LAZY 167 163 WRITE_ONCE(ssp->srcu_gp_running, false); 168 164 idx = ULONG_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)); 169 165 preempt_enable(); ··· 176 172 { 177 173 unsigned long cookie; 178 174 179 - preempt_disable(); // Needed for PREEMPT_AUTO 175 + preempt_disable(); // Needed for PREEMPT_LAZY 180 176 cookie = get_state_synchronize_srcu(ssp); 181 177 if (ULONG_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie)) { 182 178 preempt_enable(); ··· 203 199 204 200 rhp->func = func; 205 201 rhp->next = NULL; 206 - preempt_disable(); // Needed for PREEMPT_AUTO 202 + preempt_disable(); // Needed for PREEMPT_LAZY 207 203 local_irq_save(flags); 208 204 *ssp->srcu_cb_tail = rhp; 209 205 ssp->srcu_cb_tail = &rhp->next; ··· 265 261 { 266 262 unsigned long ret; 267 263 268 - preempt_disable(); // Needed for PREEMPT_AUTO 264 + preempt_disable(); // Needed for PREEMPT_LAZY 269 265 ret = get_state_synchronize_srcu(ssp); 270 266 srcu_gp_start_if_needed(ssp); 271 267 preempt_enable(); ··· 286 282 } 287 283 EXPORT_SYMBOL_GPL(poll_state_synchronize_srcu); 288 284 285 + #ifndef CONFIG_TREE_RCU 289 286 /* Lockdep diagnostics. */ 290 287 void __init rcu_scheduler_starting(void) 291 288 { 292 289 rcu_scheduler_active = RCU_SCHEDULER_RUNNING; 293 290 } 291 + #endif // #ifndef CONFIG_TREE_RCU 294 292 295 293 /* 296 294 * Queue work for srcu_struct structures with early boot callbacks.
+110 -97
kernel/rcu/srcutree.c
··· 116 116 /* 117 117 * Initialize SRCU per-CPU data. Note that statically allocated 118 118 * srcu_struct structures might already have srcu_read_lock() and 119 - * srcu_read_unlock() running against them. So if the is_static parameter 120 - * is set, don't initialize ->srcu_lock_count[] and ->srcu_unlock_count[]. 119 + * srcu_read_unlock() running against them. So if the is_static 120 + * parameter is set, don't initialize ->srcu_ctrs[].srcu_locks and 121 + * ->srcu_ctrs[].srcu_unlocks. 121 122 */ 122 123 static void init_srcu_struct_data(struct srcu_struct *ssp) 123 124 { ··· 129 128 * Initialize the per-CPU srcu_data array, which feeds into the 130 129 * leaves of the srcu_node tree. 131 130 */ 132 - BUILD_BUG_ON(ARRAY_SIZE(sdp->srcu_lock_count) != 133 - ARRAY_SIZE(sdp->srcu_unlock_count)); 134 131 for_each_possible_cpu(cpu) { 135 132 sdp = per_cpu_ptr(ssp->sda, cpu); 136 133 spin_lock_init(&ACCESS_PRIVATE(sdp, lock)); ··· 246 247 ssp->srcu_sup->node = NULL; 247 248 mutex_init(&ssp->srcu_sup->srcu_cb_mutex); 248 249 mutex_init(&ssp->srcu_sup->srcu_gp_mutex); 249 - ssp->srcu_idx = 0; 250 250 ssp->srcu_sup->srcu_gp_seq = SRCU_GP_SEQ_INITIAL_VAL; 251 251 ssp->srcu_sup->srcu_barrier_seq = 0; 252 252 mutex_init(&ssp->srcu_sup->srcu_barrier_mutex); 253 253 atomic_set(&ssp->srcu_sup->srcu_barrier_cpu_cnt, 0); 254 254 INIT_DELAYED_WORK(&ssp->srcu_sup->work, process_srcu); 255 255 ssp->srcu_sup->sda_is_static = is_static; 256 - if (!is_static) 256 + if (!is_static) { 257 257 ssp->sda = alloc_percpu(struct srcu_data); 258 + ssp->srcu_ctrp = &ssp->sda->srcu_ctrs[0]; 259 + } 258 260 if (!ssp->sda) 259 261 goto err_free_sup; 260 262 init_srcu_struct_data(ssp); ··· 429 429 } 430 430 431 431 /* 432 - * Computes approximate total of the readers' ->srcu_lock_count[] values 433 - * for the rank of per-CPU counters specified by idx, and returns true if 434 - * the caller did the proper barrier (gp), and if the count of the locks 435 - * matches that of the unlocks passed in. 
432 + * Computes approximate total of the readers' ->srcu_ctrs[].srcu_locks 433 + * values for the rank of per-CPU counters specified by idx, and returns 434 + * true if the caller did the proper barrier (gp), and if the count of 435 + * the locks matches that of the unlocks passed in. 436 436 */ 437 437 static bool srcu_readers_lock_idx(struct srcu_struct *ssp, int idx, bool gp, unsigned long unlocks) 438 438 { ··· 443 443 for_each_possible_cpu(cpu) { 444 444 struct srcu_data *sdp = per_cpu_ptr(ssp->sda, cpu); 445 445 446 - sum += atomic_long_read(&sdp->srcu_lock_count[idx]); 446 + sum += atomic_long_read(&sdp->srcu_ctrs[idx].srcu_locks); 447 447 if (IS_ENABLED(CONFIG_PROVE_RCU)) 448 448 mask = mask | READ_ONCE(sdp->srcu_reader_flavor); 449 449 } 450 450 WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && (mask & (mask - 1)), 451 451 "Mixed reader flavors for srcu_struct at %ps.\n", ssp); 452 - if (mask & SRCU_READ_FLAVOR_LITE && !gp) 452 + if (mask & SRCU_READ_FLAVOR_SLOWGP && !gp) 453 453 return false; 454 454 return sum == unlocks; 455 455 } 456 456 457 457 /* 458 - * Returns approximate total of the readers' ->srcu_unlock_count[] values 459 - * for the rank of per-CPU counters specified by idx. 458 + * Returns approximate total of the readers' ->srcu_ctrs[].srcu_unlocks 459 + * values for the rank of per-CPU counters specified by idx. 
460 460 */ 461 461 static unsigned long srcu_readers_unlock_idx(struct srcu_struct *ssp, int idx, unsigned long *rdm) 462 462 { ··· 467 467 for_each_possible_cpu(cpu) { 468 468 struct srcu_data *sdp = per_cpu_ptr(ssp->sda, cpu); 469 469 470 - sum += atomic_long_read(&sdp->srcu_unlock_count[idx]); 470 + sum += atomic_long_read(&sdp->srcu_ctrs[idx].srcu_unlocks); 471 471 mask = mask | READ_ONCE(sdp->srcu_reader_flavor); 472 472 } 473 473 WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && (mask & (mask - 1)), ··· 487 487 unsigned long unlocks; 488 488 489 489 unlocks = srcu_readers_unlock_idx(ssp, idx, &rdm); 490 - did_gp = !!(rdm & SRCU_READ_FLAVOR_LITE); 490 + did_gp = !!(rdm & SRCU_READ_FLAVOR_SLOWGP); 491 491 492 492 /* 493 493 * Make sure that a lock is always counted if the corresponding ··· 509 509 * If the locks are the same as the unlocks, then there must have 510 510 * been no readers on this index at some point in this function. 511 511 * But there might be more readers, as a task might have read 512 - * the current ->srcu_idx but not yet have incremented its CPU's 513 - * ->srcu_lock_count[idx] counter. In fact, it is possible 512 + * the current ->srcu_ctrp but not yet have incremented its CPU's 513 + * ->srcu_ctrs[idx].srcu_locks counter. In fact, it is possible 514 514 * that most of the tasks have been preempted between fetching 515 - * ->srcu_idx and incrementing ->srcu_lock_count[idx]. And there 516 - * could be almost (ULONG_MAX / sizeof(struct task_struct)) tasks 517 - * in a system whose address space was fully populated with memory. 518 - * Call this quantity Nt. 515 + * ->srcu_ctrp and incrementing ->srcu_ctrs[idx].srcu_locks. And 516 + * there could be almost (ULONG_MAX / sizeof(struct task_struct)) 517 + * tasks in a system whose address space was fully populated 518 + * with memory. Call this quantity Nt. 519 519 * 520 - * So suppose that the updater is preempted at this point in the 521 - * code for a long time. 
That now-preempted updater has already 522 - * flipped ->srcu_idx (possibly during the preceding grace period), 523 - * done an smp_mb() (again, possibly during the preceding grace 524 - * period), and summed up the ->srcu_unlock_count[idx] counters. 525 - * How many times can a given one of the aforementioned Nt tasks 526 - * increment the old ->srcu_idx value's ->srcu_lock_count[idx] 527 - * counter, in the absence of nesting? 520 + * So suppose that the updater is preempted at this 521 + * point in the code for a long time. That now-preempted 522 + * updater has already flipped ->srcu_ctrp (possibly during 523 + * the preceding grace period), done an smp_mb() (again, 524 + * possibly during the preceding grace period), and summed up 525 + * the ->srcu_ctrs[idx].srcu_unlocks counters. How many times 526 + * can a given one of the aforementioned Nt tasks increment the 527 + * old ->srcu_ctrp value's ->srcu_ctrs[idx].srcu_locks counter, 528 + * in the absence of nesting? 528 529 * 529 530 * It can clearly do so once, given that it has already fetched 530 - * the old value of ->srcu_idx and is just about to use that value 531 - * to index its increment of ->srcu_lock_count[idx]. But as soon as 532 - * it leaves that SRCU read-side critical section, it will increment 533 - * ->srcu_unlock_count[idx], which must follow the updater's above 534 - * read from that same value. Thus, as soon the reading task does 535 - * an smp_mb() and a later fetch from ->srcu_idx, that task will be 536 - * guaranteed to get the new index. Except that the increment of 537 - * ->srcu_unlock_count[idx] in __srcu_read_unlock() is after the 538 - * smp_mb(), and the fetch from ->srcu_idx in __srcu_read_lock() 539 - * is before the smp_mb(). 
Thus, that task might not see the new 540 - * value of ->srcu_idx until the -second- __srcu_read_lock(), 541 - * which in turn means that this task might well increment 542 - * ->srcu_lock_count[idx] for the old value of ->srcu_idx twice, 543 - * not just once. 531 + * the old value of ->srcu_ctrp and is just about to use that 532 + * value to index its increment of ->srcu_ctrs[idx].srcu_locks. 533 + * But as soon as it leaves that SRCU read-side critical section, 534 + * it will increment ->srcu_ctrs[idx].srcu_unlocks, which must 535 + * follow the updater's above read from that same value. Thus, 536 + * as soon as the reading task does an smp_mb() and a later fetch from 537 + * ->srcu_ctrp, that task will be guaranteed to get the new index. 538 + * Except that the increment of ->srcu_ctrs[idx].srcu_unlocks 539 + * in __srcu_read_unlock() is after the smp_mb(), and the fetch 540 + * from ->srcu_ctrp in __srcu_read_lock() is before the smp_mb(). 541 + * Thus, that task might not see the new value of ->srcu_ctrp until 542 + * the -second- __srcu_read_lock(), which in turn means that this 543 + * task might well increment ->srcu_ctrs[idx].srcu_locks for the 544 + * old value of ->srcu_ctrp twice, not just once. 545 + * 546 + * However, it is important to note that a given smp_mb() takes 547 + * effect not just for the task executing it, but also for any 548 + * later task running on that same CPU. 549 + * 550 + * That is, there can be almost Nt + Nc further increments 551 + * of ->srcu_ctrs[idx].srcu_locks for the old index, where Nc 552 + * is the number of CPUs. 
But this is OK because the size of 553 + * the task_struct structure limits the value of Nt and current 554 + * systems limit Nc to a few thousand. 554 555 * 555 556 * OK, but what about nesting? This does impose a limit on 556 557 * nesting of half of the size of the task_struct structure ··· 582 581 for_each_possible_cpu(cpu) { 583 582 struct srcu_data *sdp = per_cpu_ptr(ssp->sda, cpu); 584 583 585 - sum += atomic_long_read(&sdp->srcu_lock_count[0]); 586 - sum += atomic_long_read(&sdp->srcu_lock_count[1]); 587 - sum -= atomic_long_read(&sdp->srcu_unlock_count[0]); 588 - sum -= atomic_long_read(&sdp->srcu_unlock_count[1]); 584 + sum += atomic_long_read(&sdp->srcu_ctrs[0].srcu_locks); 585 + sum += atomic_long_read(&sdp->srcu_ctrs[1].srcu_locks); 586 + sum -= atomic_long_read(&sdp->srcu_ctrs[0].srcu_unlocks); 587 + sum -= atomic_long_read(&sdp->srcu_ctrs[1].srcu_unlocks); 589 588 } 590 589 return sum; 591 590 } ··· 648 647 unsigned long jbase = SRCU_INTERVAL; 649 648 struct srcu_usage *sup = ssp->srcu_sup; 650 649 650 + lockdep_assert_held(&ACCESS_PRIVATE(ssp->srcu_sup, lock)); 651 651 if (srcu_gp_is_expedited(ssp)) 652 652 jbase = 0; 653 653 if (rcu_seq_state(READ_ONCE(sup->srcu_gp_seq))) { ··· 676 674 void cleanup_srcu_struct(struct srcu_struct *ssp) 677 675 { 678 676 int cpu; 677 + unsigned long delay; 679 678 struct srcu_usage *sup = ssp->srcu_sup; 680 679 681 - if (WARN_ON(!srcu_get_delay(ssp))) 680 + spin_lock_irq_rcu_node(ssp->srcu_sup); 681 + delay = srcu_get_delay(ssp); 682 + spin_unlock_irq_rcu_node(ssp->srcu_sup); 683 + if (WARN_ON(!delay)) 682 684 return; /* Just leak it! */ 683 685 if (WARN_ON(srcu_readers_active(ssp))) 684 686 return; /* Just leak it! 
 */
···
 */
int __srcu_read_lock(struct srcu_struct *ssp)
{
-	int idx;
+	struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);

-	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
-	this_cpu_inc(ssp->sda->srcu_lock_count[idx].counter);
+	this_cpu_inc(scp->srcu_locks.counter);
	smp_mb(); /* B */  /* Avoid leaking the critical section. */
-	return idx;
+	return __srcu_ptr_to_ctr(ssp, scp);
}
EXPORT_SYMBOL_GPL(__srcu_read_lock);
···
void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
	smp_mb(); /* C */  /* Avoid leaking the critical section. */
-	this_cpu_inc(ssp->sda->srcu_unlock_count[idx].counter);
+	this_cpu_inc(__srcu_ctr_to_ptr(ssp, idx)->srcu_unlocks.counter);
}
EXPORT_SYMBOL_GPL(__srcu_read_unlock);
···
 */
int __srcu_read_lock_nmisafe(struct srcu_struct *ssp)
{
-	int idx;
-	struct srcu_data *sdp = raw_cpu_ptr(ssp->sda);
+	struct srcu_ctr __percpu *scpp = READ_ONCE(ssp->srcu_ctrp);
+	struct srcu_ctr *scp = raw_cpu_ptr(scpp);

-	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
-	atomic_long_inc(&sdp->srcu_lock_count[idx]);
+	atomic_long_inc(&scp->srcu_locks);
	smp_mb__after_atomic(); /* B */  /* Avoid leaking the critical section. */
-	return idx;
+	return __srcu_ptr_to_ctr(ssp, scpp);
}
EXPORT_SYMBOL_GPL(__srcu_read_lock_nmisafe);
···
 */
void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
{
-	struct srcu_data *sdp = raw_cpu_ptr(ssp->sda);
-
	smp_mb__before_atomic(); /* C */  /* Avoid leaking the critical section. */
-	atomic_long_inc(&sdp->srcu_unlock_count[idx]);
+	atomic_long_inc(&raw_cpu_ptr(__srcu_ctr_to_ptr(ssp, idx))->srcu_unlocks);
}
EXPORT_SYMBOL_GPL(__srcu_read_unlock_nmisafe);
···
/*
 * Wait until all readers counted by array index idx complete, but
 * loop an additional time if there is an expedited grace period pending.
- * The caller must ensure that ->srcu_idx is not changed while checking.
+ * The caller must ensure that ->srcu_ctrp is not changed while checking.
 */
static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
{
	unsigned long curdelay;

+	spin_lock_irq_rcu_node(ssp->srcu_sup);
	curdelay = !srcu_get_delay(ssp);
+	spin_unlock_irq_rcu_node(ssp->srcu_sup);

	for (;;) {
		if (srcu_readers_active_idx_check(ssp, idx))
···
}

/*
- * Increment the ->srcu_idx counter so that future SRCU readers will
+ * Increment the ->srcu_ctrp counter so that future SRCU readers will
 * use the other rank of the ->srcu_(un)lock_count[] arrays.  This allows
 * us to wait for pre-existing readers in a starvation-free manner.
 */
static void srcu_flip(struct srcu_struct *ssp)
{
	/*
-	 * Because the flip of ->srcu_idx is executed only if the
+	 * Because the flip of ->srcu_ctrp is executed only if the
	 * preceding call to srcu_readers_active_idx_check() found that
-	 * the ->srcu_unlock_count[] and ->srcu_lock_count[] sums matched
-	 * and because that summing uses atomic_long_read(), there is
-	 * ordering due to a control dependency between that summing and
-	 * the WRITE_ONCE() in this call to srcu_flip().  This ordering
-	 * ensures that if this updater saw a given reader's increment from
-	 * __srcu_read_lock(), that reader was using a value of ->srcu_idx
-	 * from before the previous call to srcu_flip(), which should be
-	 * quite rare.  This ordering thus helps forward progress because
-	 * the grace period could otherwise be delayed by additional
-	 * calls to __srcu_read_lock() using that old (soon to be new)
-	 * value of ->srcu_idx.
+	 * the ->srcu_ctrs[].srcu_unlocks and ->srcu_ctrs[].srcu_locks sums
+	 * matched and because that summing uses atomic_long_read(),
+	 * there is ordering due to a control dependency between that
+	 * summing and the WRITE_ONCE() in this call to srcu_flip().
+	 * This ordering ensures that if this updater saw a given reader's
+	 * increment from __srcu_read_lock(), that reader was using a value
+	 * of ->srcu_ctrp from before the previous call to srcu_flip(),
+	 * which should be quite rare.  This ordering thus helps forward
+	 * progress because the grace period could otherwise be delayed
+	 * by additional calls to __srcu_read_lock() using that old (soon
+	 * to be new) value of ->srcu_ctrp.
	 *
	 * This sum-equality check and ordering also ensures that if
	 * a given call to __srcu_read_lock() uses the new value of
-	 * ->srcu_idx, this updater's earlier scans cannot have seen
+	 * ->srcu_ctrp, this updater's earlier scans cannot have seen
	 * that reader's increments, which is all to the good, because
	 * this grace period need not wait on that reader.  After all,
	 * if those earlier scans had seen that reader, there would have
···
	 */
	smp_mb(); /* E */  /* Pairs with B and C. */

-	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1); // Flip the counter.
+	WRITE_ONCE(ssp->srcu_ctrp,
+		   &ssp->sda->srcu_ctrs[!(ssp->srcu_ctrp - &ssp->sda->srcu_ctrs[0])]);

	/*
	 * Ensure that if the updater misses an __srcu_read_unlock()
···

	check_init_srcu_struct(ssp);
	/* If _lite() readers, don't do unsolicited expediting. */
-	if (this_cpu_read(ssp->sda->srcu_reader_flavor) & SRCU_READ_FLAVOR_LITE)
+	if (this_cpu_read(ssp->sda->srcu_reader_flavor) & SRCU_READ_FLAVOR_SLOWGP)
		return false;
	/* If the local srcu_data structure has callbacks, not idle. */
	sdp = raw_cpu_ptr(ssp->sda);
···
 * read-side critical sections are delimited by srcu_read_lock() and
 * srcu_read_unlock(), and may be nested.
 *
- * The callback will be invoked from process context, but must nevertheless
- * be fast and must not block.
+ * The callback will be invoked from process context, but with bh
+ * disabled.  The callback function must therefore be fast and must
+ * not block.
+ *
+ * See the description of call_rcu() for more detailed information on
+ * memory ordering guarantees.
 */
void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
	       rcu_callback_t func)
···
 *
 * Wait for the count to drain to zero of both indexes.  To avoid the
 * possible starvation of synchronize_srcu(), it waits for the count of
- * the index=((->srcu_idx & 1) ^ 1) to drain to zero at first,
- * and then flip the srcu_idx and wait for the count of the other index.
+ * the index=!(ssp->srcu_ctrp - &ssp->sda->srcu_ctrs[0]) to drain to zero
+ * at first, and then flip the ->srcu_ctrp and wait for the count of the
+ * other index.
 *
 * Can block; must be called from process context.
 *
···
 */
unsigned long srcu_batches_completed(struct srcu_struct *ssp)
{
-	return READ_ONCE(ssp->srcu_idx);
+	return READ_ONCE(ssp->srcu_sup->srcu_gp_seq);
}
EXPORT_SYMBOL_GPL(srcu_batches_completed);
···

/*
 * Because readers might be delayed for an extended period after
- * fetching ->srcu_idx for their index, at any point in time there
+ * fetching ->srcu_ctrp for their index, at any point in time there
 * might well be readers using both idx=0 and idx=1.  We therefore
 * need to wait for readers to clear from both index values before
 * invoking a callback.
···
	}

	if (rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq)) == SRCU_STATE_SCAN1) {
-		idx = 1 ^ (ssp->srcu_idx & 1);
+		idx = !(ssp->srcu_ctrp - &ssp->sda->srcu_ctrs[0]);
		if (!try_check_zero(ssp, idx, 1)) {
			mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
			return; /* readers present, retry later. */
···
	 * SRCU read-side critical sections are normally short,
	 * so check at least twice in quick succession after a flip.
	 */
-	idx = 1 ^ (ssp->srcu_idx & 1);
+	idx = !(ssp->srcu_ctrp - &ssp->sda->srcu_ctrs[0]);
	if (!try_check_zero(ssp, idx, 2)) {
		mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
		return; /* readers present, retry later. */
···
	ssp = sup->srcu_ssp;

	srcu_advance_state(ssp);
+	spin_lock_irq_rcu_node(ssp->srcu_sup);
	curdelay = srcu_get_delay(ssp);
+	spin_unlock_irq_rcu_node(ssp->srcu_sup);
	if (curdelay) {
		WRITE_ONCE(sup->reschedule_count, 0);
	} else {
···
	int ss_state = READ_ONCE(ssp->srcu_sup->srcu_size_state);
	int ss_state_idx = ss_state;

-	idx = ssp->srcu_idx & 0x1;
+	idx = ssp->srcu_ctrp - &ssp->sda->srcu_ctrs[0];
	if (ss_state < 0 || ss_state >= ARRAY_SIZE(srcu_size_state_name))
		ss_state_idx = ARRAY_SIZE(srcu_size_state_name) - 1;
	pr_alert("%s%s Tree SRCU g%ld state %d (%s)",
···
		struct srcu_data *sdp;

		sdp = per_cpu_ptr(ssp->sda, cpu);
-		u0 = data_race(atomic_long_read(&sdp->srcu_unlock_count[!idx]));
-		u1 = data_race(atomic_long_read(&sdp->srcu_unlock_count[idx]));
+		u0 = data_race(atomic_long_read(&sdp->srcu_ctrs[!idx].srcu_unlocks));
+		u1 = data_race(atomic_long_read(&sdp->srcu_ctrs[idx].srcu_unlocks));

		/*
		 * Make sure that a lock is always counted if the corresponding
···
		 */
		smp_rmb();

-		l0 = data_race(atomic_long_read(&sdp->srcu_lock_count[!idx]));
-		l1 = data_race(atomic_long_read(&sdp->srcu_lock_count[idx]));
+		l0 = data_race(atomic_long_read(&sdp->srcu_ctrs[!idx].srcu_locks));
+		l1 = data_race(atomic_long_read(&sdp->srcu_ctrs[idx].srcu_locks));

		c0 = l0 - u0;
		c1 = l1 - u1;
···
	ssp->sda = alloc_percpu(struct srcu_data);
	if (WARN_ON_ONCE(!ssp->sda))
		return -ENOMEM;
+	ssp->srcu_ctrp = &ssp->sda->srcu_ctrs[0];
	}
	return 0;
}
+4 -1
kernel/rcu/tasks.h
···
#endif
}

-void __init rcu_init_tasks_generic(void)
+static int __init rcu_init_tasks_generic(void)
{
#ifdef CONFIG_TASKS_RCU
	rcu_spawn_tasks_kthread();
···

	// Run the self-tests.
	rcu_tasks_initiate_self_tests();
+
+	return 0;
}
+core_initcall(rcu_init_tasks_generic);

#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
static inline void rcu_tasks_bootup_oddness(void) {}
+14
kernel/rcu/tiny.c
···
}
EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);

+#if IS_ENABLED(CONFIG_RCU_TORTURE_TEST)
+unsigned long long rcutorture_gather_gp_seqs(void)
+{
+	return READ_ONCE(rcu_ctrlblk.gp_seq) & 0xffffULL;
+}
+EXPORT_SYMBOL_GPL(rcutorture_gather_gp_seqs);
+
+void rcutorture_format_gp_seqs(unsigned long long seqs, char *cp, size_t len)
+{
+	snprintf(cp, len, "g%04llx", seqs & 0xffffULL);
+}
+EXPORT_SYMBOL_GPL(rcutorture_format_gp_seqs);
+#endif
+
void __init rcu_init(void)
{
	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
+51 -13
kernel/rcu/tree.c
···
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

+/* Gather grace-period sequence numbers for rcutorture diagnostics. */
+unsigned long long rcutorture_gather_gp_seqs(void)
+{
+	return ((READ_ONCE(rcu_state.gp_seq) & 0xffffULL) << 40) |
+	       ((READ_ONCE(rcu_state.expedited_sequence) & 0xffffffULL) << 16) |
+	       (READ_ONCE(rcu_state.gp_seq_polled) & 0xffffULL);
+}
+EXPORT_SYMBOL_GPL(rcutorture_gather_gp_seqs);
+
+/* Format grace-period sequence numbers for rcutorture diagnostics. */
+void rcutorture_format_gp_seqs(unsigned long long seqs, char *cp, size_t len)
+{
+	unsigned int egp = (seqs >> 16) & 0xffffffULL;
+	unsigned int ggp = (seqs >> 40) & 0xffffULL;
+	unsigned int pgp = seqs & 0xffffULL;
+
+	snprintf(cp, len, "g%04x:e%06x:p%04x", ggp, egp, pgp);
+}
+EXPORT_SYMBOL_GPL(rcutorture_format_gp_seqs);
+
#if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK))
/*
 * An empty function that will trigger a reschedule on
···

	/* Handle the ends of any preceding grace periods first. */
	if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
-	    unlikely(READ_ONCE(rdp->gpwrap))) {
+	    unlikely(rdp->gpwrap)) {
		if (!offloaded)
			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
		rdp->core_needs_qs = false;
···

	/* Now handle the beginnings of any new-to-this-CPU grace periods. */
	if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) ||
-	    unlikely(READ_ONCE(rdp->gpwrap))) {
+	    unlikely(rdp->gpwrap)) {
		/*
		 * If the current grace period is waiting for this CPU,
		 * set up to detect a quiescent state, otherwise don't
···
	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
		WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
-	if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
+	if (IS_ENABLED(CONFIG_PROVE_RCU) && rdp->gpwrap)
		WRITE_ONCE(rdp->last_sched_clock, jiffies);
	WRITE_ONCE(rdp->gpwrap, false);
	rcu_gpnum_ovf(rnp, rdp);
···
{
	struct rcu_synchronize *rs = container_of(
			(struct rcu_head *) node, struct rcu_synchronize, head);
-	unsigned long oldstate = (unsigned long) rs->head.func;

	WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
-		  !poll_state_synchronize_rcu(oldstate),
-		  "A full grace period is not passed yet: %lu",
-		  rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
+		  !poll_state_synchronize_rcu_full(&rs->oldstate),
+		  "A full grace period is not passed yet!\n");

	/* Finally. */
	complete(&rs->completion);
···

	/* Advance to a new grace period and initialize state. */
	record_gp_stall_check_time();
+	/*
+	 * A new wait segment must be started before gp_seq advanced, so
+	 * that previous gp waiters won't observe the new gp_seq.
+	 */
+	start_new_poll = rcu_sr_normal_gp_init();
	/* Record GP times before starting GP, hence rcu_seq_start(). */
	rcu_seq_start(&rcu_state.gp_seq);
	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
-	start_new_poll = rcu_sr_normal_gp_init();
	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
	raw_spin_unlock_irq_rcu_node(rnp);
···
 * critical sections have completed.
 *
 * Use this API instead of call_rcu() if you don't want the callback to be
- * invoked after very long periods of time, which can happen on systems without
+ * delayed for very long periods of time, which can happen on systems without
 * memory pressure and on systems which are lightly loaded or mostly idle.
 * This function will cause callbacks to be invoked sooner than later at the
 * expense of extra power.  Other than that, this function is identical to, and
···
 * might well execute concurrently with RCU read-side critical sections
 * that started after call_rcu() was invoked.
 *
+ * It is perfectly legal to repost an RCU callback, potentially with
+ * a different callback function, from within its callback function.
+ * The specified function will be invoked after another full grace period
+ * has elapsed.  This use case is similar in form to the common practice
+ * of reposting a timer from within its own handler.
+ *
 * RCU read-side critical sections are delimited by rcu_read_lock()
 * and rcu_read_unlock(), and may be nested.  In addition, but only in
 * v5.0 and later, regions of code across which interrupts, preemption,
···
 *
 * Implementation of these memory-ordering guarantees is described here:
 * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
+ *
+ * Specific to call_rcu() (as opposed to the other call_rcu*() functions),
+ * in kernels built with CONFIG_RCU_LAZY=y, call_rcu() might delay for many
+ * seconds before starting the grace period needed by the corresponding
+ * callback.  This delay can significantly improve energy-efficiency
+ * on low-utilization battery-powered devices.  To avoid this delay,
+ * in latency-sensitive kernel code, use call_rcu_hurry().
 */
void call_rcu(struct rcu_head *head, rcu_callback_t func)
{
···
	 * snapshot before adding a request.
	 */
	if (IS_ENABLED(CONFIG_PROVE_RCU))
-		rs.head.func = (void *) get_state_synchronize_rcu();
+		get_state_synchronize_rcu_full(&rs.oldstate);

	rcu_sr_normal_add_req(&rs);

···
 */
void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
-	struct rcu_node *rnp = rcu_get_root();
-
	/*
	 * Any prior manipulation of RCU-protected data must happen
	 * before the loads from ->gp_seq and ->expedited_sequence.
	 */
	smp_mb(); /* ^^^ */
-	rgosp->rgos_norm = rcu_seq_snap(&rnp->gp_seq);
+
+	// Yes, rcu_state.gp_seq, not rnp_root->gp_seq, the latter's use
+	// in poll_state_synchronize_rcu_full() notwithstanding.  Use of
+	// the latter here would result in too-short grace periods due to
+	// interactions with newly onlined CPUs.
+	rgosp->rgos_norm = rcu_seq_snap(&rcu_state.gp_seq);
	rgosp->rgos_exp = rcu_seq_snap(&rcu_state.expedited_sequence);
}
EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_full);
+4 -2
kernel/rcu/tree_exp.h
···
 * specified leaf rcu_node structure, which is acquired by the caller.
 */
static void rcu_report_exp_cpu_mult(struct rcu_node *rnp, unsigned long flags,
-				    unsigned long mask, bool wake)
+				    unsigned long mask_in, bool wake)
	__releases(rnp->lock)
{
	int cpu;
+	unsigned long mask;
	struct rcu_data *rdp;

	raw_lockdep_assert_held_rcu_node(rnp);
-	if (!(rnp->expmask & mask)) {
+	if (!(rnp->expmask & mask_in)) {
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
		return;
	}
+	mask = mask_in & rnp->expmask;
	WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
	for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
		rdp = per_cpu_ptr(&rcu_data, cpu);
+15 -5
kernel/rcu/tree_nocb.h
···
/* Dump out nocb kthread state for the specified rcu_data structure. */
static void show_rcu_nocb_state(struct rcu_data *rdp)
{
-	char bufw[20];
-	char bufr[20];
+	char bufd[22];
+	char bufw[45];
+	char bufr[45];
+	char bufn[22];
+	char bufb[22];
	struct rcu_data *nocb_next_rdp;
	struct rcu_segcblist *rsclp = &rdp->cblist;
	bool waslocked;
···
				    typeof(*rdp),
				    nocb_entry_rdp);

-	sprintf(bufw, "%ld", rsclp->gp_seq[RCU_WAIT_TAIL]);
-	sprintf(bufr, "%ld", rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
-	pr_info("   CB %d^%d->%d %c%c%c%c%c F%ld L%ld C%d %c%c%s%c%s%c%c q%ld %c CPU %d%s\n",
+	sprintf(bufd, "%ld", rsclp->seglen[RCU_DONE_TAIL]);
+	sprintf(bufw, "%ld(%ld)", rsclp->seglen[RCU_WAIT_TAIL], rsclp->gp_seq[RCU_WAIT_TAIL]);
+	sprintf(bufr, "%ld(%ld)", rsclp->seglen[RCU_NEXT_READY_TAIL],
+		rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
+	sprintf(bufn, "%ld", rsclp->seglen[RCU_NEXT_TAIL]);
+	sprintf(bufb, "%ld", rcu_cblist_n_cbs(&rdp->nocb_bypass));
+	pr_info("   CB %d^%d->%d %c%c%c%c%c F%ld L%ld C%d %c%s%c%s%c%s%c%s%c%s q%ld %c CPU %d%s\n",
		rdp->cpu, rdp->nocb_gp_rdp->cpu,
		nocb_next_rdp ? nocb_next_rdp->cpu : -1,
		"kK"[!!rdp->nocb_cb_kthread],
···
		jiffies - rdp->nocb_nobypass_last,
		rdp->nocb_nobypass_count,
		".D"[rcu_segcblist_ready_cbs(rsclp)],
+		rcu_segcblist_segempty(rsclp, RCU_DONE_TAIL) ? "" : bufd,
		".W"[!rcu_segcblist_segempty(rsclp, RCU_WAIT_TAIL)],
		rcu_segcblist_segempty(rsclp, RCU_WAIT_TAIL) ? "" : bufw,
		".R"[!rcu_segcblist_segempty(rsclp, RCU_NEXT_READY_TAIL)],
		rcu_segcblist_segempty(rsclp, RCU_NEXT_READY_TAIL) ? "" : bufr,
		".N"[!rcu_segcblist_segempty(rsclp, RCU_NEXT_TAIL)],
+		rcu_segcblist_segempty(rsclp, RCU_NEXT_TAIL) ? "" : bufn,
		".B"[!!rcu_cblist_n_cbs(&rdp->nocb_bypass)],
+		!rcu_cblist_n_cbs(&rdp->nocb_bypass) ? "" : bufb,
		rcu_segcblist_n_cbs(&rdp->cblist),
		rdp->nocb_cb_kthread ? task_state_to_char(rdp->nocb_cb_kthread) : '.',
		rdp->nocb_cb_kthread ? (int)task_cpu(rdp->nocb_cb_kthread) : -1,
+17 -5
kernel/rcu/tree_plugin.h
···
{
	struct rcu_data *rdp;

-	if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
+	if (irqs_disabled() || in_atomic_preempt_off() || !rcu_state.gp_kthread)
		return;
+
+	/*
+	 * rcu_report_qs_rdp() can only be invoked with a stable rdp and
+	 * from the local CPU.
+	 *
+	 * The in_atomic_preempt_off() check ensures that we come here holding
+	 * the last preempt_count (which will get dropped once we return to
+	 * __rcu_read_unlock()).
+	 */
	rdp = this_cpu_ptr(&rcu_data);
	rdp->cpu_no_qs.b.norm = false;
	rcu_report_qs_rdp(rdp);
···
 */
static void rcu_flavor_sched_clock_irq(int user)
{
-	if (user || rcu_is_cpu_rrupt_from_idle()) {
+	if (user || rcu_is_cpu_rrupt_from_idle() ||
+	    (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
+	     (preempt_count() == HARDIRQ_OFFSET))) {

		/*
		 * Get here if this CPU took its interrupt from user
-		 * mode or from the idle loop, and if this is not a
-		 * nested interrupt.  In this case, the CPU is in
-		 * a quiescent state, so note it.
+		 * mode, from the idle loop without this being a nested
+		 * interrupt, or while not holding the task preempt count
+		 * (with PREEMPT_COUNT=y).  In this case, the CPU is in a
+		 * quiescent state, so note it.
		 *
		 * No memory barrier is required here because rcu_qs()
		 * references only CPU-local variables that other CPUs
+1
kernel/reboot.c
···
	migrate_to_reboot_cpu();
	syscore_shutdown();
	pr_emerg("Power down\n");
+	pr_flush(1000, true);
	kmsg_dump(KMSG_DUMP_SHUTDOWN);
	machine_power_off();
}
+3 -1
kernel/sched/core.c
···
		return 1;
	}
	/*
-	 * In preemptible kernels, ->rcu_read_lock_nesting tells the tick
+	 * In PREEMPT_RCU kernels, ->rcu_read_lock_nesting tells the tick
	 * whether the current CPU is in an RCU read-side critical section,
	 * so the tick can report quiescent states even for CPUs looping
	 * in kernel context.  In contrast, in non-preemptible kernels,
···
	 * RCU quiescent state.  Therefore, the following code causes
	 * cond_resched() to report a quiescent state, but only when RCU
	 * is in urgent need of one.
+	 * A third case, preemptible, but non-PREEMPT_RCU provides for
+	 * urgently needed quiescent states via rcu_flavor_sched_clock_irq().
	 */
#ifndef CONFIG_PREEMPT_RCU
	rcu_all_qs();
+12
kernel/torture.c
···
	stutter_task = NULL;
}

+static unsigned long torture_init_jiffies;
+
static void
torture_print_module_parms(void)
{
···
	torture_type = ttype;
	verbose = v;
	fullstop = FULLSTOP_DONTSTOP;
+	WRITE_ONCE(torture_init_jiffies, jiffies); // Lockless reads.
	torture_print_module_parms();
	return true;
}
···
	register_reboot_notifier(&torture_shutdown_nb);
}
EXPORT_SYMBOL_GPL(torture_init_end);
+
+/*
+ * Get the torture_init_begin()-time value of the jiffies counter.
+ */
+unsigned long get_torture_init_jiffies(void)
+{
+	return READ_ONCE(torture_init_jiffies);
+}
+EXPORT_SYMBOL_GPL(get_torture_init_jiffies);

/*
 * Clean up torture module.  Please note that this is -not- invoked via
+15 -17
kernel/trace/trace_osnoise.c
···

	/*
	 * In some cases, notably when running on a nohz_full CPU with
-	 * a stopped tick PREEMPT_RCU has no way to account for QSs.
-	 * This will eventually cause unwarranted noise as PREEMPT_RCU
-	 * will force preemption as the means of ending the current
-	 * grace period.  We avoid this problem by calling
-	 * rcu_momentary_eqs(), which performs a zero duration
-	 * EQS allowing PREEMPT_RCU to end the current grace period.
-	 * This call shouldn't be wrapped inside an RCU critical
-	 * section.
+	 * a stopped tick PREEMPT_RCU or PREEMPT_LAZY have no way to
+	 * account for QSs.  This will eventually cause unwarranted
+	 * noise as RCU forces preemption as the means of ending the
+	 * current grace period.  We avoid this by calling
+	 * rcu_momentary_eqs(), which performs a zero duration EQS
+	 * allowing RCU to end the current grace period.  This call
+	 * shouldn't be wrapped inside an RCU critical section.
	 *
-	 * Note that in non PREEMPT_RCU kernels QSs are handled through
-	 * cond_resched()
+	 * Normally QSs for other cases are handled through cond_resched().
+	 * For simplicity, however, we call rcu_momentary_eqs() for all
+	 * configurations here.
	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
-		if (!disable_irq)
-			local_irq_disable();
+	if (!disable_irq)
+		local_irq_disable();

-		rcu_momentary_eqs();
+	rcu_momentary_eqs();

-		if (!disable_irq)
-			local_irq_enable();
-	}
+	if (!disable_irq)
+		local_irq_enable();

	/*
	 * For the non-preemptive kernel config: let threads runs, if
+1 -1
tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
···
do
	err=
	val=$((d*1000+t*10+c))
-	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --bootargs "rcutorture.test_srcu_lockdep=$val" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1
+	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.test_srcu_lockdep=$val rcutorture.reader_flavor=0x2" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1
	ret=$?
	mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val"
	if test "$d" -ne 0 && test "$ret" -eq 0
+1
tools/testing/selftests/rcutorture/configs/rcu/SRCU-P.boot
···
rcupdate.rcu_self_test=1
rcutorture.fwd_progress=3
srcutree.big_cpu_lim=5
+rcutorture.reader_flavor=0x8
+6
tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
···
rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3
rcupdate.rcu_self_test=1
+
+# This part is for synchronize_rcu() testing
+rcutorture.nfakewriters=-1
+rcutorture.gp_sync=1
+rcupdate.rcu_normal=1
+rcutree.rcu_normal_wake_from_gp=1
+2 -1
tools/testing/selftests/rcutorture/configs/rcu/TREE07
···
CONFIG_SMP=y
CONFIG_NR_CPUS=16
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n
#CHECK#CONFIG_TREE_RCU=y
+2 -1
tools/testing/selftests/rcutorture/configs/rcu/TREE10
···
CONFIG_SMP=y
CONFIG_NR_CPUS=74
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_NONE=n
+CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
CONFIG_PREEMPT_DYNAMIC=n