Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux

Pull RCU updates from Joel Fernandes:

- Updates and additions to the MAINTAINERS file, with Boqun being added
to the RCU entry and Zqiang being added as an RCU reviewer.

I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.

- Resolution of a hotplug warning in the nohz code, achieved by fixing
cpu_is_hotpluggable() to check with the nohz subsystem.

Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.

- Avoidance of needless calls to the rcu-lazy shrinker in
CONFIG_RCU_LAZY=n kernels, fixed by Zqiang.

- Improvements to rcu-tasks stall reporting by Neeraj.

- Initial renaming of the single-argument k[v]free_rcu() to
k[v]free_rcu_mightsleep() for increased robustness, affecting several
components such as mac802154, drbd, vmw_vmci, tracing, and more.

A report by Eric Dumazet showed that the API could unknowingly be
used in an atomic context, so we'd rather make sure callers know what
they're asking for by being explicit:

https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/
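
To make the distinction concrete, here is a minimal sketch of the two
call patterns (illustrative only, reusing the hypothetical struct foo /
old_fp naming from Documentation/RCU/whatisRCU.rst; choose one form or
the other, never both for the same object):

```c
struct foo {
	struct list_head list;
	struct rcu_head rcu;	/* needed only by the two-argument form */
};

struct foo *old_fp;	/* assume already unlinked from all RCU-visible paths */

/* Two-argument form: never sleeps, so it is safe in atomic context. */
kfree_rcu(old_fp, rcu);

/*
 * Single-argument form, now renamed: it almost never blocks, but may
 * fall back to synchronize_rcu() on memory-allocation failure, so the
 * new name warns that it must not be called from atomic context.
 */
kfree_rcu_mightsleep(old_fp);
```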

- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.

- Better srcu_struct cache locality for readers, achieved by moving the
update-side data out of srcu_struct into a new srcu_usage structure,
in support of SRCU usage by Christoph Hellwig.

- Teach lockdep to detect deadlocks between srcu_read_lock() and
synchronize_srcu(), contributed by Boqun.

Previously, lockdep could not detect such deadlocks; now it can.
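
The canonical bug this catches is a self-deadlock like the following
sketch (illustrative only; my_srcu is a hypothetical SRCU domain, not
one from this merge):

```c
DEFINE_STATIC_SRCU(my_srcu);

static void buggy(void)
{
	int idx = srcu_read_lock(&my_srcu);

	/*
	 * synchronize_srcu() must wait for all pre-existing readers of
	 * my_srcu, including this one, so it can never return.  lockdep
	 * now flags this pattern via the new lock_sync() annotation on
	 * synchronize_srcu().
	 */
	synchronize_srcu(&my_srcu);

	srcu_read_unlock(&my_srcu, idx);
}
```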

- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, a test_nmis
module parameter, and more.

- Miscellaneous changes, various code cleanups, and comment improvements.

* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...

+916 -343
+3 -3
Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
··· 277 277 278 278 Again, only one request in a given batch need actually carry out a 279 279 grace-period operation, which means there must be an efficient way to 280 - identify which of many concurrent reqeusts will initiate the grace 280 + identify which of many concurrent requests will initiate the grace 281 281 period, and that there be an efficient way for the remaining requests to 282 282 wait for that grace period to complete. However, that is the topic of 283 283 the next section. ··· 405 405 In earlier implementations, the task requesting the expedited grace 406 406 period also drove it to completion. This straightforward approach had 407 407 the disadvantage of needing to account for POSIX signals sent to user 408 - tasks, so more recent implemementations use the Linux kernel's 408 + tasks, so more recent implementations use the Linux kernel's 409 409 workqueues (see Documentation/core-api/workqueue.rst). 410 410 411 411 The requesting task still does counter snapshotting and funnel-lock ··· 465 465 initialized, which does not happen until some time after the scheduler 466 466 spawns the first task. Given that there are parts of the kernel that 467 467 really do want to execute grace periods during this mid-boot “dead 468 - zone”, expedited grace periods must do something else during thie time. 468 + zone”, expedited grace periods must do something else during this time. 469 469 470 470 What they do is to fall back to the old practice of requiring that the 471 471 requesting task drive the expedited grace period, as was the case before
+1 -1
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
··· 168 168 +-----------------------------------------------------------------------+ 169 169 170 170 The approach must be extended to handle one final case, that of waking a 171 - task blocked in ``synchronize_rcu()``. This task might be affinitied to 171 + task blocked in ``synchronize_rcu()``. This task might be affined to 172 172 a CPU that is not yet aware that the grace period has ended, and thus 173 173 might not yet be subject to the grace period's memory ordering. 174 174 Therefore, there is an ``smp_mb()`` after the return from
+5 -5
Documentation/RCU/RTFP.txt
··· 201 201 In 2012, Josh Triplett received his Ph.D. with his dissertation 202 202 covering RCU-protected resizable hash tables and the relationship 203 203 between memory barriers and read-side traversal order: If the updater 204 - is making changes in the opposite direction from the read-side traveral 204 + is making changes in the opposite direction from the read-side traversal 205 205 order, the updater need only execute a memory-barrier instruction, 206 206 but if in the same direction, the updater needs to wait for a grace 207 207 period between the individual updates [JoshTriplettPhD]. Also in 2012, ··· 1245 1245 [Viewed September 5, 2005]" 1246 1246 ,annotation={ 1247 1247 First posting showing how RCU can be safely adapted for 1248 - preemptable RCU read side critical sections. 1248 + preemptible RCU read side critical sections. 1249 1249 } 1250 1250 } 1251 1251 ··· 1888 1888 \url{https://lore.kernel.org/r/20070910183004.GA3299@linux.vnet.ibm.com} 1889 1889 [Viewed October 25, 2007]" 1890 1890 ,annotation={ 1891 - Final patch for preemptable RCU to -rt. (Later patches were 1891 + Final patch for preemptible RCU to -rt. (Later patches were 1892 1892 to mainline, eventually incorporated.) 1893 1893 } 1894 1894 } ··· 2275 2275 \url{https://lore.kernel.org/r/20090724001429.GA17374@linux.vnet.ibm.com} 2276 2276 [Viewed August 15, 2009]" 2277 2277 ,annotation={ 2278 - First posting of simple and fast preemptable RCU. 2278 + First posting of simple and fast preemptible RCU. 2279 2279 } 2280 2280 } 2281 2281 ··· 2639 2639 RCU-protected hash tables, barriers vs. read-side traversal order. 2640 2640 . 
2641 2641 If the updater is making changes in the opposite direction from 2642 - the read-side traveral order, the updater need only execute a 2642 + the read-side traversal order, the updater need only execute a 2643 2643 memory-barrier instruction, but if in the same direction, the 2644 2644 updater needs to wait for a grace period between the individual 2645 2645 updates.
+2 -2
Documentation/RCU/UP.rst
··· 107 107 108 108 Quick Quiz #3: 109 109 Why can't synchronize_rcu() return immediately on UP systems running 110 - preemptable RCU? 110 + preemptible RCU? 111 111 112 112 .. _answer_quick_quiz_up: 113 113 ··· 143 143 144 144 Answer to Quick Quiz #3: 145 145 Why can't synchronize_rcu() return immediately on UP systems 146 - running preemptable RCU? 146 + running preemptible RCU? 147 147 148 148 Because some other task might have been preempted in the middle 149 149 of an RCU read-side critical section. If synchronize_rcu()
+1 -1
Documentation/RCU/checklist.rst
··· 70 70 can serve as rcu_read_lock_sched(), but is less readable and 71 71 prevents lockdep from detecting locking issues. 72 72 73 - Please not that you *cannot* rely on code known to be built 73 + Please note that you *cannot* rely on code known to be built 74 74 only in non-preemptible kernels. Such code can and will break, 75 75 especially in kernels built with CONFIG_PREEMPT_COUNT=y. 76 76
+1 -1
Documentation/RCU/lockdep.rst
··· 65 65 rcu_access_pointer(p): 66 66 Return the value of the pointer and omit all barriers, 67 67 but retain the compiler constraints that prevent duplicating 68 - or coalescsing. This is useful when testing the 68 + or coalescing. This is useful when testing the 69 69 value of the pointer itself, for example, against NULL. 70 70 71 71 The rcu_dereference_check() check expression can be any boolean
+2 -2
Documentation/RCU/torture.rst
··· 216 216 rcutorture's module parameters. For example, to test a change to RCU's 217 217 CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'". 218 218 This will of course result in the scripting reporting a failure, namely 219 - the resuling RCU CPU stall warning. As noted above, reducing memory may 219 + the resulting RCU CPU stall warning. As noted above, reducing memory may 220 220 require disabling rcutorture's callback-flooding tests:: 221 221 222 222 kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \ ··· 370 370 tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \ 371 371 --duration 24h 372 372 373 - In this case, most of the kvm-again.sh parmeters may be supplied following 373 + In this case, most of the kvm-again.sh parameters may be supplied following 374 374 the pathname of the old run-results directory.
+3 -3
Documentation/RCU/whatisRCU.rst
··· 597 597 If the occasional sleep is permitted, the single-argument form may 598 598 be used, omitting the rcu_head structure from struct foo. 599 599 600 - kfree_rcu(old_fp); 600 + kfree_rcu_mightsleep(old_fp); 601 601 602 - This variant of kfree_rcu() almost never blocks, but might do so by 603 - invoking synchronize_rcu() in response to memory-allocation failure. 602 + This variant almost never blocks, but might do so by invoking 603 + synchronize_rcu() in response to memory-allocation failure. 604 604 605 605 Again, see checklist.rst for additional rules governing the use of RCU. 606 606
+3 -1
MAINTAINERS
··· 17636 17636 M: "Paul E. McKenney" <paulmck@kernel.org> 17637 17637 M: Frederic Weisbecker <frederic@kernel.org> (kernel/rcu/tree_nocb.h) 17638 17638 M: Neeraj Upadhyay <quic_neeraju@quicinc.com> (kernel/rcu/tasks.h) 17639 + M: Joel Fernandes <joel@joelfernandes.org> 17639 17640 M: Josh Triplett <josh@joshtriplett.org> 17641 + M: Boqun Feng <boqun.feng@gmail.com> 17640 17642 R: Steven Rostedt <rostedt@goodmis.org> 17641 17643 R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> 17642 17644 R: Lai Jiangshan <jiangshanlai@gmail.com> 17643 - R: Joel Fernandes <joel@joelfernandes.org> 17645 + R: Zqiang <qiang1.zhang@intel.com> 17644 17646 L: rcu@vger.kernel.org 17645 17647 S: Supported 17646 17648 W: http://www.rdrop.com/users/paulmck/RCU/
-1
arch/arm64/kvm/Kconfig
··· 29 29 select KVM_MMIO 30 30 select KVM_GENERIC_DIRTYLOG_READ_PROTECT 31 31 select KVM_XFER_TO_GUEST_WORK 32 - select SRCU 33 32 select KVM_VFIO 34 33 select HAVE_KVM_EVENTFD 35 34 select HAVE_KVM_IRQFD
-1
arch/mips/kvm/Kconfig
··· 26 26 select HAVE_KVM_VCPU_ASYNC_IOCTL 27 27 select KVM_MMIO 28 28 select MMU_NOTIFIER 29 - select SRCU 30 29 select INTERVAL_TREE 31 30 select KVM_GENERIC_HARDWARE_ENABLING 32 31 help
-1
arch/powerpc/kvm/Kconfig
··· 22 22 select PREEMPT_NOTIFIERS 23 23 select HAVE_KVM_EVENTFD 24 24 select HAVE_KVM_VCPU_ASYNC_IOCTL 25 - select SRCU 26 25 select KVM_VFIO 27 26 select IRQ_BYPASS_MANAGER 28 27 select HAVE_KVM_IRQ_BYPASS
-1
arch/riscv/kvm/Kconfig
··· 28 28 select KVM_XFER_TO_GUEST_WORK 29 29 select HAVE_KVM_VCPU_ASYNC_IOCTL 30 30 select HAVE_KVM_EVENTFD 31 - select SRCU 32 31 help 33 32 Support hosting virtualized guest machines. 34 33
-1
arch/s390/kvm/Kconfig
··· 31 31 select HAVE_KVM_IRQ_ROUTING 32 32 select HAVE_KVM_INVALID_WAKEUPS 33 33 select HAVE_KVM_NO_POLL 34 - select SRCU 35 34 select KVM_VFIO 36 35 select INTERVAL_TREE 37 36 select MMU_NOTIFIER
-2
arch/x86/Kconfig
··· 283 283 select RTC_LIB 284 284 select RTC_MC146818_LIB 285 285 select SPARSE_IRQ 286 - select SRCU 287 286 select SYSCTL_EXCEPTION_TRACE 288 287 select THREAD_INFO_IN_TASK 289 288 select TRACE_IRQFLAGS_SUPPORT ··· 1937 1938 depends on X86_64 && CPU_SUP_INTEL && X86_X2APIC 1938 1939 depends on CRYPTO=y 1939 1940 depends on CRYPTO_SHA256=y 1940 - select SRCU 1941 1941 select MMU_NOTIFIER 1942 1942 select NUMA_KEEP_MEMINFO if NUMA 1943 1943 select XARRAY_MULTI
-1
arch/x86/kvm/Kconfig
··· 46 46 select KVM_XFER_TO_GUEST_WORK 47 47 select KVM_GENERIC_DIRTYLOG_READ_PROTECT 48 48 select KVM_VFIO 49 - select SRCU 50 49 select INTERVAL_TREE 51 50 select HAVE_KVM_PM_NOTIFIER if PM 52 51 select KVM_GENERIC_HARDWARE_ENABLING
+2 -1
drivers/base/cpu.c
··· 487 487 bool cpu_is_hotpluggable(unsigned int cpu) 488 488 { 489 489 struct device *dev = get_cpu_device(cpu); 490 - return dev && container_of(dev, struct cpu, dev)->hotpluggable; 490 + return dev && container_of(dev, struct cpu, dev)->hotpluggable 491 + && tick_nohz_cpu_hotpluggable(cpu); 491 492 } 492 493 EXPORT_SYMBOL_GPL(cpu_is_hotpluggable); 493 494
+3 -3
drivers/block/drbd/drbd_nl.c
··· 1615 1615 drbd_send_sync_param(peer_device); 1616 1616 } 1617 1617 1618 - kvfree_rcu(old_disk_conf); 1618 + kvfree_rcu_mightsleep(old_disk_conf); 1619 1619 kfree(old_plan); 1620 1620 mod_timer(&device->request_timer, jiffies + HZ); 1621 1621 goto success; ··· 2446 2446 2447 2447 mutex_unlock(&connection->resource->conf_update); 2448 2448 mutex_unlock(&connection->data.mutex); 2449 - kvfree_rcu(old_net_conf); 2449 + kvfree_rcu_mightsleep(old_net_conf); 2450 2450 2451 2451 if (connection->cstate >= C_WF_REPORT_PARAMS) { 2452 2452 struct drbd_peer_device *peer_device; ··· 2860 2860 new_disk_conf->disk_size = (sector_t)rs.resize_size; 2861 2861 rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf); 2862 2862 mutex_unlock(&device->resource->conf_update); 2863 - kvfree_rcu(old_disk_conf); 2863 + kvfree_rcu_mightsleep(old_disk_conf); 2864 2864 new_disk_conf = NULL; 2865 2865 } 2866 2866
+2 -2
drivers/block/drbd/drbd_receiver.c
··· 3759 3759 drbd_info(connection, "peer data-integrity-alg: %s\n", 3760 3760 integrity_alg[0] ? integrity_alg : "(none)"); 3761 3761 3762 - kvfree_rcu(old_net_conf); 3762 + kvfree_rcu_mightsleep(old_net_conf); 3763 3763 return 0; 3764 3764 3765 3765 disconnect_rcu_unlock: ··· 4127 4127 4128 4128 rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf); 4129 4129 mutex_unlock(&connection->resource->conf_update); 4130 - kvfree_rcu(old_disk_conf); 4130 + kvfree_rcu_mightsleep(old_disk_conf); 4131 4131 4132 4132 drbd_info(device, "Peer sets u_size to %lu sectors (old: %lu)\n", 4133 4133 (unsigned long)p_usize, (unsigned long)my_usize);
+1 -1
drivers/block/drbd/drbd_state.c
··· 2071 2071 conn_free_crypto(connection); 2072 2072 mutex_unlock(&connection->resource->conf_update); 2073 2073 2074 - kvfree_rcu(old_conf); 2074 + kvfree_rcu_mightsleep(old_conf); 2075 2075 } 2076 2076 2077 2077 if (ns_max.susp_fen) {
+1 -1
drivers/misc/vmw_vmci/vmci_context.c
··· 687 687 spin_unlock(&context->lock); 688 688 689 689 if (notifier) 690 - kvfree_rcu(notifier); 690 + kvfree_rcu_mightsleep(notifier); 691 691 692 692 vmci_ctx_put(context); 693 693
+1 -1
drivers/misc/vmw_vmci/vmci_event.c
··· 209 209 if (!s) 210 210 return VMCI_ERROR_NOT_FOUND; 211 211 212 - kvfree_rcu(s); 212 + kvfree_rcu_mightsleep(s); 213 213 214 214 return VMCI_SUCCESS; 215 215 }
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/int_port.c
··· 242 242 mlx5_del_flow_rules(int_port->rx_rule); 243 243 mapping_remove(ctx, int_port->mapping); 244 244 mlx5e_int_port_metadata_free(priv, int_port->match_metadata); 245 - kfree_rcu(int_port); 245 + kfree_rcu_mightsleep(int_port); 246 246 priv->num_ports--; 247 247 } 248 248
+2 -2
drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
··· 670 670 671 671 mlx5e_macsec_cleanup_sa(macsec, tx_sa, true); 672 672 mlx5_destroy_encryption_key(macsec->mdev, tx_sa->enc_key_id); 673 - kfree_rcu(tx_sa); 673 + kfree_rcu_mightsleep(tx_sa); 674 674 macsec_device->tx_sa[assoc_num] = NULL; 675 675 676 676 out: ··· 849 849 xa_erase(&macsec->sc_xarray, rx_sc->sc_xarray_element->fs_id); 850 850 metadata_dst_free(rx_sc->md_dst); 851 851 kfree(rx_sc->sc_xarray_element); 852 - kfree_rcu(rx_sc); 852 + kfree_rcu_mightsleep(rx_sc); 853 853 } 854 854 855 855 static int mlx5e_macsec_del_rxsc(struct macsec_context *ctx)
+1 -1
fs/ext4/super.c
··· 2500 2500 qname = rcu_replace_pointer(sbi->s_qf_names[i], qname, 2501 2501 lockdep_is_held(&sb->s_umount)); 2502 2502 if (qname) 2503 - kfree_rcu(qname); 2503 + kfree_rcu_mightsleep(qname); 2504 2504 } 2505 2505 } 2506 2506
+7 -1
include/linux/lockdep.h
··· 134 134 unsigned int read:2; /* see lock_acquire() comment */ 135 135 unsigned int check:1; /* see lock_acquire() comment */ 136 136 unsigned int hardirqs_off:1; 137 - unsigned int references:12; /* 32 bits */ 137 + unsigned int sync:1; 138 + unsigned int references:11; /* 32 bits */ 138 139 unsigned int pin_count; 139 140 }; 140 141 ··· 268 267 struct lockdep_map *nest_lock, unsigned long ip); 269 268 270 269 extern void lock_release(struct lockdep_map *lock, unsigned long ip); 270 + 271 + extern void lock_sync(struct lockdep_map *lock, unsigned int subclass, 272 + int read, int check, struct lockdep_map *nest_lock, 273 + unsigned long ip); 271 274 272 275 /* lock_is_held_type() returns */ 273 276 #define LOCK_STATE_UNKNOWN -1 ··· 559 554 #define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_) 560 555 #define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_) 561 556 #define lock_map_release(l) lock_release(l, _THIS_IP_) 557 + #define lock_map_sync(l) lock_sync(l, 0, 0, 1, NULL, _THIS_IP_) 562 558 563 559 #ifdef CONFIG_PROVE_LOCKING 564 560 # define might_lock(lock) \
+4 -1
include/linux/notifier.h
··· 73 73 74 74 struct srcu_notifier_head { 75 75 struct mutex mutex; 76 + #ifdef CONFIG_TREE_SRCU 77 + struct srcu_usage srcuu; 78 + #endif 76 79 struct srcu_struct srcu; 77 80 struct notifier_block __rcu *head; 78 81 }; ··· 110 107 { \ 111 108 .mutex = __MUTEX_INITIALIZER(name.mutex), \ 112 109 .head = NULL, \ 113 - .srcu = __SRCU_STRUCT_INIT(name.srcu, pcpu), \ 110 + .srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \ 114 111 } 115 112 116 113 #define ATOMIC_NOTIFIER_HEAD(name) \
+32 -2
include/linux/srcu.h
··· 102 102 return lock_is_held(&ssp->dep_map); 103 103 } 104 104 105 + /* 106 + * Annotations provide deadlock detection for SRCU. 107 + * 108 + * Similar to other lockdep annotations, except there is an additional 109 + * srcu_lock_sync(), which is basically an empty *write*-side critical section, 110 + * see lock_sync() for more information. 111 + */ 112 + 113 + /* Annotates a srcu_read_lock() */ 114 + static inline void srcu_lock_acquire(struct lockdep_map *map) 115 + { 116 + lock_map_acquire_read(map); 117 + } 118 + 119 + /* Annotates a srcu_read_lock() */ 120 + static inline void srcu_lock_release(struct lockdep_map *map) 121 + { 122 + lock_map_release(map); 123 + } 124 + 125 + /* Annotates a synchronize_srcu() */ 126 + static inline void srcu_lock_sync(struct lockdep_map *map) 127 + { 128 + lock_map_sync(map); 129 + } 130 + 105 131 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */ 106 132 107 133 static inline int srcu_read_lock_held(const struct srcu_struct *ssp) 108 134 { 109 135 return 1; 110 136 } 137 + 138 + #define srcu_lock_acquire(m) do { } while (0) 139 + #define srcu_lock_release(m) do { } while (0) 140 + #define srcu_lock_sync(m) do { } while (0) 111 141 112 142 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */ 113 143 ··· 212 182 213 183 srcu_check_nmi_safety(ssp, false); 214 184 retval = __srcu_read_lock(ssp); 215 - rcu_lock_acquire(&(ssp)->dep_map); 185 + srcu_lock_acquire(&(ssp)->dep_map); 216 186 return retval; 217 187 } 218 188 ··· 284 254 { 285 255 WARN_ON_ONCE(idx & ~0x1); 286 256 srcu_check_nmi_safety(ssp, false); 287 - rcu_lock_release(&(ssp)->dep_map); 257 + srcu_lock_release(&(ssp)->dep_map); 288 258 __srcu_read_unlock(ssp, idx); 289 259 } 290 260
+3 -3
include/linux/srcutiny.h
··· 31 31 32 32 void srcu_drive_gp(struct work_struct *wp); 33 33 34 - #define __SRCU_STRUCT_INIT(name, __ignored) \ 34 + #define __SRCU_STRUCT_INIT(name, __ignored, ___ignored) \ 35 35 { \ 36 36 .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq), \ 37 37 .srcu_cb_tail = &name.srcu_cb_head, \ ··· 44 44 * Tree SRCU, which needs some per-CPU data. 45 45 */ 46 46 #define DEFINE_SRCU(name) \ 47 - struct srcu_struct name = __SRCU_STRUCT_INIT(name, name) 47 + struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name) 48 48 #define DEFINE_STATIC_SRCU(name) \ 49 - static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name) 49 + static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name) 50 50 51 51 void synchronize_srcu(struct srcu_struct *ssp); 52 52
+66 -30
include/linux/srcutree.h
··· 58 58 }; 59 59 60 60 /* 61 - * Per-SRCU-domain structure, similar in function to rcu_state. 61 + * Per-SRCU-domain structure, update-side data linked from srcu_struct. 62 62 */ 63 - struct srcu_struct { 63 + struct srcu_usage { 64 64 struct srcu_node *node; /* Combining tree. */ 65 65 struct srcu_node *level[RCU_NUM_LVLS + 1]; 66 66 /* First node at each level. */ ··· 68 68 struct mutex srcu_cb_mutex; /* Serialize CB preparation. */ 69 69 spinlock_t __private lock; /* Protect counters and size state. */ 70 70 struct mutex srcu_gp_mutex; /* Serialize GP work. */ 71 - unsigned int srcu_idx; /* Current rdr array element. */ 72 71 unsigned long srcu_gp_seq; /* Grace-period seq #. */ 73 72 unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */ 74 73 unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ ··· 76 77 unsigned long srcu_size_jiffies; /* Current contention-measurement interval. */ 77 78 unsigned long srcu_n_lock_retries; /* Contention events in current interval. */ 78 79 unsigned long srcu_n_exp_nodelay; /* # expedited no-delays in current GP phase. */ 79 - struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */ 80 80 bool sda_is_static; /* May ->sda be passed to free_percpu()? */ 81 81 unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */ 82 82 struct mutex srcu_barrier_mutex; /* Serialize barrier ops. */ ··· 87 89 unsigned long reschedule_jiffies; 88 90 unsigned long reschedule_count; 89 91 struct delayed_work work; 90 - struct lockdep_map dep_map; 92 + struct srcu_struct *srcu_ssp; 91 93 }; 92 94 93 - /* Values for size state variable (->srcu_size_state). 
*/ 94 - #define SRCU_SIZE_SMALL 0 95 - #define SRCU_SIZE_ALLOC 1 96 - #define SRCU_SIZE_WAIT_BARRIER 2 97 - #define SRCU_SIZE_WAIT_CALL 3 98 - #define SRCU_SIZE_WAIT_CBS1 4 99 - #define SRCU_SIZE_WAIT_CBS2 5 100 - #define SRCU_SIZE_WAIT_CBS3 6 101 - #define SRCU_SIZE_WAIT_CBS4 7 102 - #define SRCU_SIZE_BIG 8 95 + /* 96 + * Per-SRCU-domain structure, similar in function to rcu_state. 97 + */ 98 + struct srcu_struct { 99 + unsigned int srcu_idx; /* Current rdr array element. */ 100 + struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */ 101 + struct lockdep_map dep_map; 102 + struct srcu_usage *srcu_sup; /* Update-side data. */ 103 + }; 104 + 105 + // Values for size state variable (->srcu_size_state). Once the state 106 + // has been set to SRCU_SIZE_ALLOC, the grace-period code advances through 107 + // this state machine one step per grace period until the SRCU_SIZE_BIG state 108 + // is reached. Otherwise, the state machine remains in the SRCU_SIZE_SMALL 109 + // state indefinitely. 110 + #define SRCU_SIZE_SMALL 0 // No srcu_node combining tree, ->node == NULL 111 + #define SRCU_SIZE_ALLOC 1 // An srcu_node tree is being allocated, initialized, 112 + // and then referenced by ->node. It will not be used. 113 + #define SRCU_SIZE_WAIT_BARRIER 2 // The srcu_node tree starts being used by everything 114 + // except call_srcu(), especially by srcu_barrier(). 115 + // By the end of this state, all CPUs and threads 116 + // are aware of this tree's existence. 117 + #define SRCU_SIZE_WAIT_CALL 3 // The srcu_node tree starts being used by call_srcu(). 118 + // By the end of this state, all of the call_srcu() 119 + // invocations that were running on a non-boot CPU 120 + // and using the boot CPU's callback queue will have 121 + // completed. 
122 + #define SRCU_SIZE_WAIT_CBS1 4 // Don't trust the ->srcu_have_cbs[] grace-period 123 + #define SRCU_SIZE_WAIT_CBS2 5 // sequence elements or the ->srcu_data_have_cbs[] 124 + #define SRCU_SIZE_WAIT_CBS3 6 // CPU-bitmask elements until all four elements of 125 + #define SRCU_SIZE_WAIT_CBS4 7 // each array have been initialized. 126 + #define SRCU_SIZE_BIG 8 // The srcu_node combining tree is fully initialized 127 + // and all aspects of it are being put to use. 103 128 104 129 /* Values for state variable (bottom bits of ->srcu_gp_seq). */ 105 130 #define SRCU_STATE_IDLE 0 106 131 #define SRCU_STATE_SCAN1 1 107 132 #define SRCU_STATE_SCAN2 2 108 133 109 - #define __SRCU_STRUCT_INIT(name, pcpu_name) \ 110 - { \ 111 - .sda = &pcpu_name, \ 112 - .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ 113 - .srcu_gp_seq_needed = -1UL, \ 114 - .work = __DELAYED_WORK_INITIALIZER(name.work, NULL, 0), \ 115 - __SRCU_DEP_MAP_INIT(name) \ 134 + #define __SRCU_USAGE_INIT(name) \ 135 + { \ 136 + .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ 137 + .srcu_gp_seq_needed = -1UL, \ 138 + .work = __DELAYED_WORK_INITIALIZER(name.work, NULL, 0), \ 139 + } 140 + 141 + #define __SRCU_STRUCT_INIT_COMMON(name, usage_name) \ 142 + .srcu_sup = &usage_name, \ 143 + __SRCU_DEP_MAP_INIT(name) 144 + 145 + #define __SRCU_STRUCT_INIT_MODULE(name, usage_name) \ 146 + { \ 147 + __SRCU_STRUCT_INIT_COMMON(name, usage_name) \ 148 + } 149 + 150 + #define __SRCU_STRUCT_INIT(name, usage_name, pcpu_name) \ 151 + { \ 152 + .sda = &pcpu_name, \ 153 + __SRCU_STRUCT_INIT_COMMON(name, usage_name) \ 116 154 } 117 155 118 156 /* ··· 171 137 * See include/linux/percpu-defs.h for the rules on per-CPU variables. 
172 138 */ 173 139 #ifdef MODULE 174 - # define __DEFINE_SRCU(name, is_static) \ 175 - is_static struct srcu_struct name; \ 176 - extern struct srcu_struct * const __srcu_struct_##name; \ 177 - struct srcu_struct * const __srcu_struct_##name \ 140 + # define __DEFINE_SRCU(name, is_static) \ 141 + static struct srcu_usage name##_srcu_usage = __SRCU_USAGE_INIT(name##_srcu_usage); \ 142 + is_static struct srcu_struct name = __SRCU_STRUCT_INIT_MODULE(name, name##_srcu_usage); \ 143 + extern struct srcu_struct * const __srcu_struct_##name; \ 144 + struct srcu_struct * const __srcu_struct_##name \ 178 145 __section("___srcu_struct_ptrs") = &name 179 146 #else 180 - # define __DEFINE_SRCU(name, is_static) \ 181 - static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \ 182 - is_static struct srcu_struct name = \ 183 - __SRCU_STRUCT_INIT(name, name##_srcu_data) 147 + # define __DEFINE_SRCU(name, is_static) \ 148 + static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \ 149 + static struct srcu_usage name##_srcu_usage = __SRCU_USAGE_INIT(name##_srcu_usage); \ 150 + is_static struct srcu_struct name = \ 151 + __SRCU_STRUCT_INIT(name, name##_srcu_usage, name##_srcu_data) 184 152 #endif 185 153 #define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */) 186 154 #define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
+2
include/linux/tick.h
··· 216 216 enum tick_dep_bits bit); 217 217 extern void tick_nohz_dep_clear_signal(struct signal_struct *signal, 218 218 enum tick_dep_bits bit); 219 + extern bool tick_nohz_cpu_hotpluggable(unsigned int cpu); 219 220 220 221 /* 221 222 * The below are tick_nohz_[set,clear]_dep() wrappers that optimize off-cases ··· 281 280 282 281 static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { } 283 282 static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { } 283 + static inline bool tick_nohz_cpu_hotpluggable(unsigned int cpu) { return true; } 284 284 285 285 static inline void tick_dep_set(enum tick_dep_bits bit) { } 286 286 static inline void tick_dep_clear(enum tick_dep_bits bit) { }
+1 -3
include/trace/events/rcu.h
··· 776 776 ), 777 777 778 778 TP_fast_assign( 779 - strncpy(__entry->rcutorturename, rcutorturename, 780 - RCUTORTURENAME_LEN); 781 - __entry->rcutorturename[RCUTORTURENAME_LEN - 1] = 0; 779 + strscpy(__entry->rcutorturename, rcutorturename, RCUTORTURENAME_LEN); 782 780 __entry->rhp = rhp; 783 781 __entry->secs = secs; 784 782 __entry->c_old = c_old;
+2 -1
include/trace/events/timer.h
··· 371 371 tick_dep_name(PERF_EVENTS) \ 372 372 tick_dep_name(SCHED) \ 373 373 tick_dep_name(CLOCK_UNSTABLE) \ 374 - tick_dep_name_end(RCU) 374 + tick_dep_name(RCU) \ 375 + tick_dep_name_end(RCU_EXP) 375 376 376 377 #undef tick_dep_name 377 378 #undef tick_dep_mask_name
+57 -7
kernel/locking/lockdep.c
··· 1881 1881 struct lock_class *source = hlock_class(src); 1882 1882 struct lock_class *target = hlock_class(tgt); 1883 1883 struct lock_class *parent = prt->class; 1884 + int src_read = src->read; 1885 + int tgt_read = tgt->read; 1884 1886 1885 1887 /* 1886 1888 * A direct locking problem where unsafe_class lock is taken ··· 1910 1908 printk(" Possible unsafe locking scenario:\n\n"); 1911 1909 printk(" CPU0 CPU1\n"); 1912 1910 printk(" ---- ----\n"); 1913 - printk(" lock("); 1911 + if (tgt_read != 0) 1912 + printk(" rlock("); 1913 + else 1914 + printk(" lock("); 1914 1915 __print_lock_name(target); 1915 1916 printk(KERN_CONT ");\n"); 1916 1917 printk(" lock("); ··· 1922 1917 printk(" lock("); 1923 1918 __print_lock_name(target); 1924 1919 printk(KERN_CONT ");\n"); 1925 - printk(" lock("); 1920 + if (src_read != 0) 1921 + printk(" rlock("); 1922 + else if (src->sync) 1923 + printk(" sync("); 1924 + else 1925 + printk(" lock("); 1926 1926 __print_lock_name(source); 1927 1927 printk(KERN_CONT ");\n"); 1928 1928 printk("\n *** DEADLOCK ***\n\n"); ··· 4541 4531 return 0; 4542 4532 } 4543 4533 } 4544 - if (!hlock->hardirqs_off) { 4534 + 4535 + /* 4536 + * For lock_sync(), don't mark the ENABLED usage, since lock_sync() 4537 + * creates no critical section and no extra dependency can be introduced 4538 + * by interrupts 4539 + */ 4540 + if (!hlock->hardirqs_off && !hlock->sync) { 4545 4541 if (hlock->read) { 4546 4542 if (!mark_lock(curr, hlock, 4547 4543 LOCK_ENABLED_HARDIRQ_READ)) ··· 4926 4910 static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, 4927 4911 int trylock, int read, int check, int hardirqs_off, 4928 4912 struct lockdep_map *nest_lock, unsigned long ip, 4929 - int references, int pin_count) 4913 + int references, int pin_count, int sync) 4930 4914 { 4931 4915 struct task_struct *curr = current; 4932 4916 struct lock_class *class = NULL; ··· 4977 4961 4978 4962 class_idx = class - lock_classes; 4979 4963 4980 - if (depth) { /* we're 
holding locks */ 4964 + if (depth && !sync) { 4965 + /* we're holding locks and the new held lock is not a sync */ 4981 4966 hlock = curr->held_locks + depth - 1; 4982 4967 if (hlock->class_idx == class_idx && nest_lock) { 4983 4968 if (!references) ··· 5012 4995 hlock->trylock = trylock; 5013 4996 hlock->read = read; 5014 4997 hlock->check = check; 4998 + hlock->sync = !!sync; 5015 4999 hlock->hardirqs_off = !!hardirqs_off; 5016 5000 hlock->references = references; 5017 5001 #ifdef CONFIG_LOCK_STAT ··· 5073 5055 5074 5056 if (!validate_chain(curr, hlock, chain_head, chain_key)) 5075 5057 return 0; 5058 + 5059 + /* For lock_sync(), we are done here since no actual critical section */ 5060 + if (hlock->sync) 5061 + return 1; 5076 5062 5077 5063 curr->curr_chain_key = chain_key; 5078 5064 curr->lockdep_depth++; ··· 5219 5197 hlock->read, hlock->check, 5220 5198 hlock->hardirqs_off, 5221 5199 hlock->nest_lock, hlock->acquire_ip, 5222 - hlock->references, hlock->pin_count)) { 5200 + hlock->references, hlock->pin_count, 0)) { 5223 5201 case 0: 5224 5202 return 1; 5225 5203 case 1: ··· 5689 5667 5690 5668 lockdep_recursion_inc(); 5691 5669 __lock_acquire(lock, subclass, trylock, read, check, 5692 - irqs_disabled_flags(flags), nest_lock, ip, 0, 0); 5670 + irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 0); 5693 5671 lockdep_recursion_finish(); 5694 5672 raw_local_irq_restore(flags); 5695 5673 } ··· 5714 5692 raw_local_irq_restore(flags); 5715 5693 } 5716 5694 EXPORT_SYMBOL_GPL(lock_release); 5695 + 5696 + /* 5697 + * lock_sync() - A special annotation for synchronize_{s,}rcu()-like API. 5698 + * 5699 + * No actual critical section is created by the APIs annotated with this: these 5700 + * APIs are used to wait for one or multiple critical sections (on other CPUs 5701 + * or threads), and it means that calling these APIs inside these critical 5702 + * sections is potential deadlock. 
5703 + */ 5704 + void lock_sync(struct lockdep_map *lock, unsigned subclass, int read, 5705 + int check, struct lockdep_map *nest_lock, unsigned long ip) 5706 + { 5707 + unsigned long flags; 5708 + 5709 + if (unlikely(!lockdep_enabled())) 5710 + return; 5711 + 5712 + raw_local_irq_save(flags); 5713 + check_flags(flags); 5714 + 5715 + lockdep_recursion_inc(); 5716 + __lock_acquire(lock, subclass, 0, read, check, 5717 + irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 1); 5718 + check_chain_key(current); 5719 + lockdep_recursion_finish(); 5720 + raw_local_irq_restore(flags); 5721 + } 5722 + EXPORT_SYMBOL_GPL(lock_sync); 5717 5723 5718 5724 noinstr int lock_is_held_type(const struct lockdep_map *lock, int read) 5719 5725 {
+1 -1
kernel/locking/test-ww_mutex.c
··· 659 659 if (ret) 660 660 return ret; 661 661 662 - ret = stress(4095, hweight32(STRESS_ALL)*ncpus, STRESS_ALL); 662 + ret = stress(2047, hweight32(STRESS_ALL)*ncpus, STRESS_ALL); 663 663 if (ret) 664 664 return ret; 665 665
-3
kernel/rcu/Kconfig
··· 53 53 54 54 Say N if you are unsure. 55 55 56 - config SRCU 57 - def_bool y 58 - 59 56 config TINY_SRCU 60 57 bool 61 58 default y if TINY_RCU
+41 -2
kernel/rcu/rcu.h
··· 14 14 15 15 /* 16 16 * Grace-period counter management. 17 + * 18 + * The two least significant bits contain the control flags. 19 + * The most significant bits contain the grace-period sequence counter. 20 + * 21 + * When both control flags are zero, no grace period is in progress. 22 + * When either bit is non-zero, a grace period has started and is in 23 + * progress. When the grace period completes, the control flags are reset 24 + * to 0 and the grace-period sequence counter is incremented. 25 + * 26 + * However some specific RCU usages make use of custom values. 27 + * 28 + * SRCU special control values: 29 + * 30 + * SRCU_SNP_INIT_SEQ : Invalid/init value set when SRCU node 31 + * is initialized. 32 + * 33 + * SRCU_STATE_IDLE : No SRCU gp is in progress 34 + * 35 + * SRCU_STATE_SCAN1 : State set by rcu_seq_start(). Indicates 36 + * we are scanning the readers on the slot 37 + * defined as inactive (there might well 38 + * be pending readers that will use that 39 + * index, but their number is bounded). 40 + * 41 + * SRCU_STATE_SCAN2 : State set manually via rcu_seq_set_state() 42 + * Indicates we are flipping the readers 43 + * index and then scanning the readers on the 44 + * slot newly designated as inactive (again, 45 + * the number of pending readers that will use 46 + * this inactive index is bounded). 47 + * 48 + * RCU polled GP special control value: 49 + * 50 + * RCU_GET_STATE_COMPLETED : State value indicating an already-completed 51 + * polled GP has completed. This value covers 52 + * both the state and the counter of the 53 + * grace-period sequence number. 17 54 */ 18 55 19 56 #define RCU_SEQ_CTR_SHIFT 2 ··· 378 341 * specified state structure (for SRCU) or the only rcu_state structure 379 342 * (for RCU). 
380 343 */ 381 - #define srcu_for_each_node_breadth_first(sp, rnp) \ 344 + #define _rcu_for_each_node_breadth_first(sp, rnp) \ 382 345 for ((rnp) = &(sp)->node[0]; \ 383 346 (rnp) < &(sp)->node[rcu_num_nodes]; (rnp)++) 384 347 #define rcu_for_each_node_breadth_first(rnp) \ 385 - srcu_for_each_node_breadth_first(&rcu_state, rnp) 348 + _rcu_for_each_node_breadth_first(&rcu_state, rnp) 349 + #define srcu_for_each_node_breadth_first(ssp, rnp) \ 350 + _rcu_for_each_node_breadth_first(ssp->srcu_sup, rnp) 386 351 387 352 /* 388 353 * Scan the leaves of the rcu_node hierarchy for the rcu_state structure.
+4 -5
kernel/rcu/rcuscale.c
··· 631 631 static int 632 632 rcu_scale_shutdown(void *arg) 633 633 { 634 - wait_event(shutdown_wq, 635 - atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters); 634 + wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters); 636 635 smp_mb(); /* Wake before output. */ 637 636 rcu_scale_cleanup(); 638 637 kernel_power_off(); ··· 715 716 // is tested. 716 717 if ((kfree_rcu_test_single && !kfree_rcu_test_double) || 717 718 (kfree_rcu_test_both && torture_random(&tr) & 0x800)) 718 - kfree_rcu(alloc_ptr); 719 + kfree_rcu_mightsleep(alloc_ptr); 719 720 else 720 721 kfree_rcu(alloc_ptr, rh); 721 722 } ··· 770 771 static int 771 772 kfree_scale_shutdown(void *arg) 772 773 { 773 - wait_event(shutdown_wq, 774 - atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); 774 + wait_event_idle(shutdown_wq, 775 + atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads); 775 776 776 777 smp_mb(); /* Wake before output. */ 777 778
+224 -10
kernel/rcu/rcutorture.c
··· 119 119 torture_param(int, test_boost, 1, "Test RCU prio boost: 0=no, 1=maybe, 2=yes."); 120 120 torture_param(int, test_boost_duration, 4, "Duration of each boost test, seconds."); 121 121 torture_param(int, test_boost_interval, 7, "Interval between boost tests, seconds."); 122 + torture_param(int, test_nmis, 0, "End-test NMI tests, 0 to disable."); 122 123 torture_param(bool, test_no_idle_hz, true, "Test support for tickless idle CPUs"); 124 + torture_param(int, test_srcu_lockdep, 0, "Test specified SRCU deadlock scenario."); 123 125 torture_param(int, verbose, 1, "Enable verbose debugging printk()s"); 124 126 125 127 static char *torture_type = "rcu"; ··· 181 179 static atomic_t n_rcu_torture_error; 182 180 static long n_rcu_torture_barrier_error; 183 181 static long n_rcu_torture_boost_ktrerror; 184 - static long n_rcu_torture_boost_rterror; 185 182 static long n_rcu_torture_boost_failure; 186 183 static long n_rcu_torture_boosts; 187 184 static atomic_long_t n_rcu_torture_timers; ··· 2195 2194 atomic_read(&n_rcu_torture_alloc), 2196 2195 atomic_read(&n_rcu_torture_alloc_fail), 2197 2196 atomic_read(&n_rcu_torture_free)); 2198 - pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld rtbre: %ld ", 2197 + pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld ", 2199 2198 atomic_read(&n_rcu_torture_mberror), 2200 2199 atomic_read(&n_rcu_torture_mbchk_fail), atomic_read(&n_rcu_torture_mbchk_tries), 2201 2200 n_rcu_torture_barrier_error, 2202 - n_rcu_torture_boost_ktrerror, 2203 - n_rcu_torture_boost_rterror); 2201 + n_rcu_torture_boost_ktrerror); 2204 2202 pr_cont("rtbf: %ld rtb: %ld nt: %ld ", 2205 2203 n_rcu_torture_boost_failure, 2206 2204 n_rcu_torture_boosts, ··· 2217 2217 if (atomic_read(&n_rcu_torture_mberror) || 2218 2218 atomic_read(&n_rcu_torture_mbchk_fail) || 2219 2219 n_rcu_torture_barrier_error || n_rcu_torture_boost_ktrerror || 2220 - n_rcu_torture_boost_rterror || n_rcu_torture_boost_failure || 2221 - i > 1) { 2220 + n_rcu_torture_boost_failure 
|| i > 1) { 2222 2221 pr_cont("%s", "!!! "); 2223 2222 atomic_inc(&n_rcu_torture_error); 2224 2223 WARN_ON_ONCE(atomic_read(&n_rcu_torture_mberror)); 2225 2224 WARN_ON_ONCE(atomic_read(&n_rcu_torture_mbchk_fail)); 2226 2225 WARN_ON_ONCE(n_rcu_torture_barrier_error); // rcu_barrier() 2227 2226 WARN_ON_ONCE(n_rcu_torture_boost_ktrerror); // no boost kthread 2228 - WARN_ON_ONCE(n_rcu_torture_boost_rterror); // can't set RT prio 2229 2227 WARN_ON_ONCE(n_rcu_torture_boost_failure); // boost failed (TIMER_SOFTIRQ RT prio?) 2230 2228 WARN_ON_ONCE(i > 1); // Too-short grace period 2231 2229 } ··· 2356 2358 "n_barrier_cbs=%d " 2357 2359 "onoff_interval=%d onoff_holdoff=%d " 2358 2360 "read_exit_delay=%d read_exit_burst=%d " 2359 - "nocbs_nthreads=%d nocbs_toggle=%d\n", 2361 + "nocbs_nthreads=%d nocbs_toggle=%d " 2362 + "test_nmis=%d\n", 2360 2363 torture_type, tag, nrealreaders, nfakewriters, 2361 2364 stat_interval, verbose, test_no_idle_hz, shuffle_interval, 2362 2365 stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter, ··· 2368 2369 n_barrier_cbs, 2369 2370 onoff_interval, onoff_holdoff, 2370 2371 read_exit_delay, read_exit_burst, 2371 - nocbs_nthreads, nocbs_toggle); 2372 + nocbs_nthreads, nocbs_toggle, 2373 + test_nmis); 2372 2374 } 2373 2375 2374 2376 static int rcutorture_booster_cleanup(unsigned int cpu) ··· 3273 3273 torture_stop_kthread(rcutorture_read_exit, read_exit_task); 3274 3274 } 3275 3275 3276 + static void rcutorture_test_nmis(int n) 3277 + { 3278 + #if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) 3279 + int cpu; 3280 + int dumpcpu; 3281 + int i; 3282 + 3283 + for (i = 0; i < n; i++) { 3284 + preempt_disable(); 3285 + cpu = smp_processor_id(); 3286 + dumpcpu = cpu + 1; 3287 + if (dumpcpu >= nr_cpu_ids) 3288 + dumpcpu = 0; 3289 + pr_alert("%s: CPU %d invoking dump_cpu_task(%d)\n", __func__, cpu, dumpcpu); 3290 + dump_cpu_task(dumpcpu); 3291 + preempt_enable(); 3292 + schedule_timeout_uninterruptible(15 * HZ); 3293 + } 3294 + #else // #if 
IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) 3295 + WARN_ONCE(n, "Non-zero rcutorture.test_nmis=%d permitted only when rcutorture is built in.\n", test_nmis); 3296 + #endif // #else // #if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) 3297 + } 3298 + 3276 3299 static enum cpuhp_state rcutor_hp; 3277 3300 3278 3301 static void ··· 3319 3296 rcu_gp_slow_unregister(NULL); 3320 3297 return; 3321 3298 } 3299 + 3300 + rcutorture_test_nmis(test_nmis); 3322 3301 3323 3302 if (cur_ops->gp_kthread_dbg) 3324 3303 cur_ops->gp_kthread_dbg(); ··· 3488 3463 cur_ops->sync(); 3489 3464 } 3490 3465 3466 + static DEFINE_MUTEX(mut0); 3467 + static DEFINE_MUTEX(mut1); 3468 + static DEFINE_MUTEX(mut2); 3469 + static DEFINE_MUTEX(mut3); 3470 + static DEFINE_MUTEX(mut4); 3471 + static DEFINE_MUTEX(mut5); 3472 + static DEFINE_MUTEX(mut6); 3473 + static DEFINE_MUTEX(mut7); 3474 + static DEFINE_MUTEX(mut8); 3475 + static DEFINE_MUTEX(mut9); 3476 + 3477 + static DECLARE_RWSEM(rwsem0); 3478 + static DECLARE_RWSEM(rwsem1); 3479 + static DECLARE_RWSEM(rwsem2); 3480 + static DECLARE_RWSEM(rwsem3); 3481 + static DECLARE_RWSEM(rwsem4); 3482 + static DECLARE_RWSEM(rwsem5); 3483 + static DECLARE_RWSEM(rwsem6); 3484 + static DECLARE_RWSEM(rwsem7); 3485 + static DECLARE_RWSEM(rwsem8); 3486 + static DECLARE_RWSEM(rwsem9); 3487 + 3488 + DEFINE_STATIC_SRCU(srcu0); 3489 + DEFINE_STATIC_SRCU(srcu1); 3490 + DEFINE_STATIC_SRCU(srcu2); 3491 + DEFINE_STATIC_SRCU(srcu3); 3492 + DEFINE_STATIC_SRCU(srcu4); 3493 + DEFINE_STATIC_SRCU(srcu5); 3494 + DEFINE_STATIC_SRCU(srcu6); 3495 + DEFINE_STATIC_SRCU(srcu7); 3496 + DEFINE_STATIC_SRCU(srcu8); 3497 + DEFINE_STATIC_SRCU(srcu9); 3498 + 3499 + static int srcu_lockdep_next(const char *f, const char *fl, const char *fs, const char *fu, int i, 3500 + int cyclelen, int deadlock) 3501 + { 3502 + int j = i + 1; 3503 + 3504 + if (j >= cyclelen) 3505 + j = deadlock ? 
0 : -1; 3506 + if (j >= 0) 3507 + pr_info("%s: %s(%d), %s(%d), %s(%d)\n", f, fl, i, fs, j, fu, i); 3508 + else 3509 + pr_info("%s: %s(%d), %s(%d)\n", f, fl, i, fu, i); 3510 + return j; 3511 + } 3512 + 3513 + // Test lockdep on SRCU-based deadlock scenarios. 3514 + static void rcu_torture_init_srcu_lockdep(void) 3515 + { 3516 + int cyclelen; 3517 + int deadlock; 3518 + bool err = false; 3519 + int i; 3520 + int j; 3521 + int idx; 3522 + struct mutex *muts[] = { &mut0, &mut1, &mut2, &mut3, &mut4, 3523 + &mut5, &mut6, &mut7, &mut8, &mut9 }; 3524 + struct rw_semaphore *rwsems[] = { &rwsem0, &rwsem1, &rwsem2, &rwsem3, &rwsem4, 3525 + &rwsem5, &rwsem6, &rwsem7, &rwsem8, &rwsem9 }; 3526 + struct srcu_struct *srcus[] = { &srcu0, &srcu1, &srcu2, &srcu3, &srcu4, 3527 + &srcu5, &srcu6, &srcu7, &srcu8, &srcu9 }; 3528 + int testtype; 3529 + 3530 + if (!test_srcu_lockdep) 3531 + return; 3532 + 3533 + deadlock = test_srcu_lockdep / 1000; 3534 + testtype = (test_srcu_lockdep / 10) % 100; 3535 + cyclelen = test_srcu_lockdep % 10; 3536 + WARN_ON_ONCE(ARRAY_SIZE(muts) != ARRAY_SIZE(srcus)); 3537 + if (WARN_ONCE(deadlock != !!deadlock, 3538 + "%s: test_srcu_lockdep=%d and deadlock digit %d must be zero or one.\n", 3539 + __func__, test_srcu_lockdep, deadlock)) 3540 + err = true; 3541 + if (WARN_ONCE(cyclelen <= 0, 3542 + "%s: test_srcu_lockdep=%d and cycle-length digit %d must be greater than zero.\n", 3543 + __func__, test_srcu_lockdep, cyclelen)) 3544 + err = true; 3545 + if (err) 3546 + goto err_out; 3547 + 3548 + if (testtype == 0) { 3549 + pr_info("%s: test_srcu_lockdep = %05d: SRCU %d-way %sdeadlock.\n", 3550 + __func__, test_srcu_lockdep, cyclelen, deadlock ? 
"" : "non-"); 3551 + if (deadlock && cyclelen == 1) 3552 + pr_info("%s: Expect hang.\n", __func__); 3553 + for (i = 0; i < cyclelen; i++) { 3554 + j = srcu_lockdep_next(__func__, "srcu_read_lock", "synchronize_srcu", 3555 + "srcu_read_unlock", i, cyclelen, deadlock); 3556 + idx = srcu_read_lock(srcus[i]); 3557 + if (j >= 0) 3558 + synchronize_srcu(srcus[j]); 3559 + srcu_read_unlock(srcus[i], idx); 3560 + } 3561 + return; 3562 + } 3563 + 3564 + if (testtype == 1) { 3565 + pr_info("%s: test_srcu_lockdep = %05d: SRCU/mutex %d-way %sdeadlock.\n", 3566 + __func__, test_srcu_lockdep, cyclelen, deadlock ? "" : "non-"); 3567 + for (i = 0; i < cyclelen; i++) { 3568 + pr_info("%s: srcu_read_lock(%d), mutex_lock(%d), mutex_unlock(%d), srcu_read_unlock(%d)\n", 3569 + __func__, i, i, i, i); 3570 + idx = srcu_read_lock(srcus[i]); 3571 + mutex_lock(muts[i]); 3572 + mutex_unlock(muts[i]); 3573 + srcu_read_unlock(srcus[i], idx); 3574 + 3575 + j = srcu_lockdep_next(__func__, "mutex_lock", "synchronize_srcu", 3576 + "mutex_unlock", i, cyclelen, deadlock); 3577 + mutex_lock(muts[i]); 3578 + if (j >= 0) 3579 + synchronize_srcu(srcus[j]); 3580 + mutex_unlock(muts[i]); 3581 + } 3582 + return; 3583 + } 3584 + 3585 + if (testtype == 2) { 3586 + pr_info("%s: test_srcu_lockdep = %05d: SRCU/rwsem %d-way %sdeadlock.\n", 3587 + __func__, test_srcu_lockdep, cyclelen, deadlock ? 
"" : "non-"); 3588 + for (i = 0; i < cyclelen; i++) { 3589 + pr_info("%s: srcu_read_lock(%d), down_read(%d), up_read(%d), srcu_read_unlock(%d)\n", 3590 + __func__, i, i, i, i); 3591 + idx = srcu_read_lock(srcus[i]); 3592 + down_read(rwsems[i]); 3593 + up_read(rwsems[i]); 3594 + srcu_read_unlock(srcus[i], idx); 3595 + 3596 + j = srcu_lockdep_next(__func__, "down_write", "synchronize_srcu", 3597 + "up_write", i, cyclelen, deadlock); 3598 + down_write(rwsems[i]); 3599 + if (j >= 0) 3600 + synchronize_srcu(srcus[j]); 3601 + up_write(rwsems[i]); 3602 + } 3603 + return; 3604 + } 3605 + 3606 + #ifdef CONFIG_TASKS_TRACE_RCU 3607 + if (testtype == 3) { 3608 + pr_info("%s: test_srcu_lockdep = %05d: SRCU and Tasks Trace RCU %d-way %sdeadlock.\n", 3609 + __func__, test_srcu_lockdep, cyclelen, deadlock ? "" : "non-"); 3610 + if (deadlock && cyclelen == 1) 3611 + pr_info("%s: Expect hang.\n", __func__); 3612 + for (i = 0; i < cyclelen; i++) { 3613 + char *fl = i == 0 ? "rcu_read_lock_trace" : "srcu_read_lock"; 3614 + char *fs = i == cyclelen - 1 ? "synchronize_rcu_tasks_trace" 3615 + : "synchronize_srcu"; 3616 + char *fu = i == 0 ? 
"rcu_read_unlock_trace" : "srcu_read_unlock"; 3617 + 3618 + j = srcu_lockdep_next(__func__, fl, fs, fu, i, cyclelen, deadlock); 3619 + if (i == 0) 3620 + rcu_read_lock_trace(); 3621 + else 3622 + idx = srcu_read_lock(srcus[i]); 3623 + if (j >= 0) { 3624 + if (i == cyclelen - 1) 3625 + synchronize_rcu_tasks_trace(); 3626 + else 3627 + synchronize_srcu(srcus[j]); 3628 + } 3629 + if (i == 0) 3630 + rcu_read_unlock_trace(); 3631 + else 3632 + srcu_read_unlock(srcus[i], idx); 3633 + } 3634 + return; 3635 + } 3636 + #endif // #ifdef CONFIG_TASKS_TRACE_RCU 3637 + 3638 + err_out: 3639 + pr_info("%s: test_srcu_lockdep = %05d does nothing.\n", __func__, test_srcu_lockdep); 3640 + pr_info("%s: test_srcu_lockdep = DNNL.\n", __func__); 3641 + pr_info("%s: D: Deadlock if nonzero.\n", __func__); 3642 + pr_info("%s: NN: Test number, 0=SRCU, 1=SRCU/mutex, 2=SRCU/rwsem, 3=SRCU/Tasks Trace RCU.\n", __func__); 3643 + pr_info("%s: L: Cycle length.\n", __func__); 3644 + if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU)) 3645 + pr_info("%s: NN=3 disallowed because kernel is built with CONFIG_TASKS_TRACE_RCU=n\n", __func__); 3646 + } 3647 + 3491 3648 static int __init 3492 3649 rcu_torture_init(void) 3493 3650 { ··· 3708 3501 pr_alert("rcu-torture: ->fqs NULL and non-zero fqs_duration, fqs disabled.\n"); 3709 3502 fqs_duration = 0; 3710 3503 } 3504 + if (nocbs_nthreads != 0 && (cur_ops != &rcu_ops || 3505 + !IS_ENABLED(CONFIG_RCU_NOCB_CPU))) { 3506 + pr_alert("rcu-torture types: %s and CONFIG_RCU_NOCB_CPU=%d, nocb toggle disabled.\n", 3507 + cur_ops->name, IS_ENABLED(CONFIG_RCU_NOCB_CPU)); 3508 + nocbs_nthreads = 0; 3509 + } 3711 3510 if (cur_ops->init) 3712 3511 cur_ops->init(); 3512 + 3513 + rcu_torture_init_srcu_lockdep(); 3713 3514 3714 3515 if (nreaders >= 0) { 3715 3516 nrealreaders = nreaders; ··· 3755 3540 atomic_set(&n_rcu_torture_error, 0); 3756 3541 n_rcu_torture_barrier_error = 0; 3757 3542 n_rcu_torture_boost_ktrerror = 0; 3758 - n_rcu_torture_boost_rterror = 0; 3759 3543 
n_rcu_torture_boost_failure = 0; 3760 3544 n_rcu_torture_boosts = 0; 3761 3545 for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++)
+1 -1
kernel/rcu/refscale.c
··· 1031 1031 static int 1032 1032 ref_scale_shutdown(void *arg) 1033 1033 { 1034 - wait_event(shutdown_wq, shutdown_start); 1034 + wait_event_idle(shutdown_wq, shutdown_start); 1035 1035 1036 1036 smp_mb(); // Wake before output. 1037 1037 ref_scale_cleanup();
+2
kernel/rcu/srcutiny.c
··· 197 197 { 198 198 struct rcu_synchronize rs; 199 199 200 + srcu_lock_sync(&ssp->dep_map); 201 + 200 202 RCU_LOCKDEP_WARN(lockdep_is_held(ssp) || 201 203 lock_is_held(&rcu_bh_lock_map) || 202 204 lock_is_held(&rcu_lock_map) ||
+243 -195
kernel/rcu/srcutree.c
··· 103 103 104 104 #define spin_trylock_irqsave_rcu_node(p, flags) \ 105 105 ({ \ 106 - bool ___locked = spin_trylock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \ 106 + bool ___locked = spin_trylock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \ 107 107 \ 108 108 if (___locked) \ 109 109 smp_mb__after_unlock_lock(); \ ··· 135 135 spin_lock_init(&ACCESS_PRIVATE(sdp, lock)); 136 136 rcu_segcblist_init(&sdp->srcu_cblist); 137 137 sdp->srcu_cblist_invoking = false; 138 - sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq; 139 - sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq; 138 + sdp->srcu_gp_seq_needed = ssp->srcu_sup->srcu_gp_seq; 139 + sdp->srcu_gp_seq_needed_exp = ssp->srcu_sup->srcu_gp_seq; 140 140 sdp->mynode = NULL; 141 141 sdp->cpu = cpu; 142 142 INIT_WORK(&sdp->work, srcu_invoke_callbacks); ··· 173 173 174 174 /* Initialize geometry if it has not already been initialized. */ 175 175 rcu_init_geometry(); 176 - ssp->node = kcalloc(rcu_num_nodes, sizeof(*ssp->node), gfp_flags); 177 - if (!ssp->node) 176 + ssp->srcu_sup->node = kcalloc(rcu_num_nodes, sizeof(*ssp->srcu_sup->node), gfp_flags); 177 + if (!ssp->srcu_sup->node) 178 178 return false; 179 179 180 180 /* Work out the overall tree geometry. */ 181 - ssp->level[0] = &ssp->node[0]; 181 + ssp->srcu_sup->level[0] = &ssp->srcu_sup->node[0]; 182 182 for (i = 1; i < rcu_num_lvls; i++) 183 - ssp->level[i] = ssp->level[i - 1] + num_rcu_lvl[i - 1]; 183 + ssp->srcu_sup->level[i] = ssp->srcu_sup->level[i - 1] + num_rcu_lvl[i - 1]; 184 184 rcu_init_levelspread(levelspread, num_rcu_lvl); 185 185 186 186 /* Each pass through this loop initializes one srcu_node structure. */ ··· 195 195 snp->srcu_gp_seq_needed_exp = SRCU_SNP_INIT_SEQ; 196 196 snp->grplo = -1; 197 197 snp->grphi = -1; 198 - if (snp == &ssp->node[0]) { 198 + if (snp == &ssp->srcu_sup->node[0]) { 199 199 /* Root node, special case. */ 200 200 snp->srcu_parent = NULL; 201 201 continue; 202 202 } 203 203 204 204 /* Non-root node. 
*/ 205 - if (snp == ssp->level[level + 1]) 205 + if (snp == ssp->srcu_sup->level[level + 1]) 206 206 level++; 207 - snp->srcu_parent = ssp->level[level - 1] + 208 - (snp - ssp->level[level]) / 207 + snp->srcu_parent = ssp->srcu_sup->level[level - 1] + 208 + (snp - ssp->srcu_sup->level[level]) / 209 209 levelspread[level - 1]; 210 210 } 211 211 ··· 214 214 * leaves of the srcu_node tree. 215 215 */ 216 216 level = rcu_num_lvls - 1; 217 - snp_first = ssp->level[level]; 217 + snp_first = ssp->srcu_sup->level[level]; 218 218 for_each_possible_cpu(cpu) { 219 219 sdp = per_cpu_ptr(ssp->sda, cpu); 220 220 sdp->mynode = &snp_first[cpu / levelspread[level]]; ··· 225 225 } 226 226 sdp->grpmask = 1 << (cpu - sdp->mynode->grplo); 227 227 } 228 - smp_store_release(&ssp->srcu_size_state, SRCU_SIZE_WAIT_BARRIER); 228 + smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER); 229 229 return true; 230 230 } 231 231 ··· 236 236 */ 237 237 static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static) 238 238 { 239 - ssp->srcu_size_state = SRCU_SIZE_SMALL; 240 - ssp->node = NULL; 241 - mutex_init(&ssp->srcu_cb_mutex); 242 - mutex_init(&ssp->srcu_gp_mutex); 239 + if (!is_static) 240 + ssp->srcu_sup = kzalloc(sizeof(*ssp->srcu_sup), GFP_KERNEL); 241 + if (!ssp->srcu_sup) 242 + return -ENOMEM; 243 + if (!is_static) 244 + spin_lock_init(&ACCESS_PRIVATE(ssp->srcu_sup, lock)); 245 + ssp->srcu_sup->srcu_size_state = SRCU_SIZE_SMALL; 246 + ssp->srcu_sup->node = NULL; 247 + mutex_init(&ssp->srcu_sup->srcu_cb_mutex); 248 + mutex_init(&ssp->srcu_sup->srcu_gp_mutex); 243 249 ssp->srcu_idx = 0; 244 - ssp->srcu_gp_seq = 0; 245 - ssp->srcu_barrier_seq = 0; 246 - mutex_init(&ssp->srcu_barrier_mutex); 247 - atomic_set(&ssp->srcu_barrier_cpu_cnt, 0); 248 - INIT_DELAYED_WORK(&ssp->work, process_srcu); 249 - ssp->sda_is_static = is_static; 250 + ssp->srcu_sup->srcu_gp_seq = 0; 251 + ssp->srcu_sup->srcu_barrier_seq = 0; 252 + mutex_init(&ssp->srcu_sup->srcu_barrier_mutex); 
253 + atomic_set(&ssp->srcu_sup->srcu_barrier_cpu_cnt, 0); 254 + INIT_DELAYED_WORK(&ssp->srcu_sup->work, process_srcu); 255 + ssp->srcu_sup->sda_is_static = is_static; 250 256 if (!is_static) 251 257 ssp->sda = alloc_percpu(struct srcu_data); 252 - if (!ssp->sda) 258 + if (!ssp->sda) { 259 + if (!is_static) 260 + kfree(ssp->srcu_sup); 253 261 return -ENOMEM; 262 + } 254 263 init_srcu_struct_data(ssp); 255 - ssp->srcu_gp_seq_needed_exp = 0; 256 - ssp->srcu_last_gp_end = ktime_get_mono_fast_ns(); 257 - if (READ_ONCE(ssp->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) { 264 + ssp->srcu_sup->srcu_gp_seq_needed_exp = 0; 265 + ssp->srcu_sup->srcu_last_gp_end = ktime_get_mono_fast_ns(); 266 + if (READ_ONCE(ssp->srcu_sup->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) { 258 267 if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC)) { 259 - if (!ssp->sda_is_static) { 268 + if (!ssp->srcu_sup->sda_is_static) { 260 269 free_percpu(ssp->sda); 261 270 ssp->sda = NULL; 271 + kfree(ssp->srcu_sup); 262 272 return -ENOMEM; 263 273 } 264 274 } else { 265 - WRITE_ONCE(ssp->srcu_size_state, SRCU_SIZE_BIG); 275 + WRITE_ONCE(ssp->srcu_sup->srcu_size_state, SRCU_SIZE_BIG); 266 276 } 267 277 } 268 - smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */ 278 + ssp->srcu_sup->srcu_ssp = ssp; 279 + smp_store_release(&ssp->srcu_sup->srcu_gp_seq_needed, 0); /* Init done. */ 269 280 return 0; 270 281 } 271 282 ··· 288 277 /* Don't re-initialize a lock while it is held. 
*/ 289 278 debug_check_no_locks_freed((void *)ssp, sizeof(*ssp)); 290 279 lockdep_init_map(&ssp->dep_map, name, key, 0); 291 - spin_lock_init(&ACCESS_PRIVATE(ssp, lock)); 292 280 return init_srcu_struct_fields(ssp, false); 293 281 } 294 282 EXPORT_SYMBOL_GPL(__init_srcu_struct); ··· 304 294 */ 305 295 int init_srcu_struct(struct srcu_struct *ssp) 306 296 { 307 - spin_lock_init(&ACCESS_PRIVATE(ssp, lock)); 308 297 return init_srcu_struct_fields(ssp, false); 309 298 } 310 299 EXPORT_SYMBOL_GPL(init_srcu_struct); ··· 315 306 */ 316 307 static void __srcu_transition_to_big(struct srcu_struct *ssp) 317 308 { 318 - lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock)); 319 - smp_store_release(&ssp->srcu_size_state, SRCU_SIZE_ALLOC); 309 + lockdep_assert_held(&ACCESS_PRIVATE(ssp->srcu_sup, lock)); 310 + smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_ALLOC); 320 311 } 321 312 322 313 /* ··· 327 318 unsigned long flags; 328 319 329 320 /* Double-checked locking on ->srcu_size-state. */ 330 - if (smp_load_acquire(&ssp->srcu_size_state) != SRCU_SIZE_SMALL) 321 + if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) != SRCU_SIZE_SMALL) 331 322 return; 332 - spin_lock_irqsave_rcu_node(ssp, flags); 333 - if (smp_load_acquire(&ssp->srcu_size_state) != SRCU_SIZE_SMALL) { 334 - spin_unlock_irqrestore_rcu_node(ssp, flags); 323 + spin_lock_irqsave_rcu_node(ssp->srcu_sup, flags); 324 + if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) != SRCU_SIZE_SMALL) { 325 + spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags); 335 326 return; 336 327 } 337 328 __srcu_transition_to_big(ssp); 338 - spin_unlock_irqrestore_rcu_node(ssp, flags); 329 + spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags); 339 330 } 340 331 341 332 /* ··· 346 337 { 347 338 unsigned long j; 348 339 349 - if (!SRCU_SIZING_IS_CONTEND() || ssp->srcu_size_state) 340 + if (!SRCU_SIZING_IS_CONTEND() || ssp->srcu_sup->srcu_size_state) 350 341 return; 351 342 j = jiffies; 352 - if (ssp->srcu_size_jiffies != j) { 
353 - ssp->srcu_size_jiffies = j; 354 - ssp->srcu_n_lock_retries = 0; 343 + if (ssp->srcu_sup->srcu_size_jiffies != j) { 344 + ssp->srcu_sup->srcu_size_jiffies = j; 345 + ssp->srcu_sup->srcu_n_lock_retries = 0; 355 346 } 356 - if (++ssp->srcu_n_lock_retries <= small_contention_lim) 347 + if (++ssp->srcu_sup->srcu_n_lock_retries <= small_contention_lim) 357 348 return; 358 349 __srcu_transition_to_big(ssp); 359 350 } ··· 370 361 371 362 if (spin_trylock_irqsave_rcu_node(sdp, *flags)) 372 363 return; 373 - spin_lock_irqsave_rcu_node(ssp, *flags); 364 + spin_lock_irqsave_rcu_node(ssp->srcu_sup, *flags); 374 365 spin_lock_irqsave_check_contention(ssp); 375 - spin_unlock_irqrestore_rcu_node(ssp, *flags); 366 + spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, *flags); 376 367 spin_lock_irqsave_rcu_node(sdp, *flags); 377 368 } 378 369 ··· 384 375 */ 385 376 static void spin_lock_irqsave_ssp_contention(struct srcu_struct *ssp, unsigned long *flags) 386 377 { 387 - if (spin_trylock_irqsave_rcu_node(ssp, *flags)) 378 + if (spin_trylock_irqsave_rcu_node(ssp->srcu_sup, *flags)) 388 379 return; 389 - spin_lock_irqsave_rcu_node(ssp, *flags); 380 + spin_lock_irqsave_rcu_node(ssp->srcu_sup, *flags); 390 381 spin_lock_irqsave_check_contention(ssp); 391 382 } 392 383 ··· 403 394 unsigned long flags; 404 395 405 396 /* The smp_load_acquire() pairs with the smp_store_release(). */ 406 - if (!rcu_seq_state(smp_load_acquire(&ssp->srcu_gp_seq_needed))) /*^^^*/ 397 + if (!rcu_seq_state(smp_load_acquire(&ssp->srcu_sup->srcu_gp_seq_needed))) /*^^^*/ 407 398 return; /* Already initialized. 
*/ 408 - spin_lock_irqsave_rcu_node(ssp, flags); 409 - if (!rcu_seq_state(ssp->srcu_gp_seq_needed)) { 410 - spin_unlock_irqrestore_rcu_node(ssp, flags); 399 + spin_lock_irqsave_rcu_node(ssp->srcu_sup, flags); 400 + if (!rcu_seq_state(ssp->srcu_sup->srcu_gp_seq_needed)) { 401 + spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags); 411 402 return; 412 403 } 413 404 init_srcu_struct_fields(ssp, true); 414 - spin_unlock_irqrestore_rcu_node(ssp, flags); 405 + spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags); 415 406 } 416 407 417 408 /* ··· 616 607 unsigned long gpstart; 617 608 unsigned long j; 618 609 unsigned long jbase = SRCU_INTERVAL; 610 + struct srcu_usage *sup = ssp->srcu_sup; 619 611 620 - if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp))) 612 + if (ULONG_CMP_LT(READ_ONCE(sup->srcu_gp_seq), READ_ONCE(sup->srcu_gp_seq_needed_exp))) 621 613 jbase = 0; 622 - if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) { 614 + if (rcu_seq_state(READ_ONCE(sup->srcu_gp_seq))) { 623 615 j = jiffies - 1; 624 - gpstart = READ_ONCE(ssp->srcu_gp_start); 616 + gpstart = READ_ONCE(sup->srcu_gp_start); 625 617 if (time_after(j, gpstart)) 626 618 jbase += j - gpstart; 627 619 if (!jbase) { 628 - WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); 629 - if (READ_ONCE(ssp->srcu_n_exp_nodelay) > srcu_max_nodelay_phase) 620 + WRITE_ONCE(sup->srcu_n_exp_nodelay, READ_ONCE(sup->srcu_n_exp_nodelay) + 1); 621 + if (READ_ONCE(sup->srcu_n_exp_nodelay) > srcu_max_nodelay_phase) 630 622 jbase = 1; 631 623 } 632 624 } ··· 644 634 void cleanup_srcu_struct(struct srcu_struct *ssp) 645 635 { 646 636 int cpu; 637 + struct srcu_usage *sup = ssp->srcu_sup; 647 638 648 639 if (WARN_ON(!srcu_get_delay(ssp))) 649 640 return; /* Just leak it! */ 650 641 if (WARN_ON(srcu_readers_active(ssp))) 651 642 return; /* Just leak it! 
*/ 652 - flush_delayed_work(&ssp->work); 643 + flush_delayed_work(&sup->work); 653 644 for_each_possible_cpu(cpu) { 654 645 struct srcu_data *sdp = per_cpu_ptr(ssp->sda, cpu); 655 646 ··· 659 648 if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist))) 660 649 return; /* Forgot srcu_barrier(), so just leak it! */ 661 650 } 662 - if (WARN_ON(rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) != SRCU_STATE_IDLE) || 663 - WARN_ON(rcu_seq_current(&ssp->srcu_gp_seq) != ssp->srcu_gp_seq_needed) || 651 + if (WARN_ON(rcu_seq_state(READ_ONCE(sup->srcu_gp_seq)) != SRCU_STATE_IDLE) || 652 + WARN_ON(rcu_seq_current(&sup->srcu_gp_seq) != sup->srcu_gp_seq_needed) || 664 653 WARN_ON(srcu_readers_active(ssp))) { 665 654 pr_info("%s: Active srcu_struct %p read state: %d gp state: %lu/%lu\n", 666 - __func__, ssp, rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)), 667 - rcu_seq_current(&ssp->srcu_gp_seq), ssp->srcu_gp_seq_needed); 655 + __func__, ssp, rcu_seq_state(READ_ONCE(sup->srcu_gp_seq)), 656 + rcu_seq_current(&sup->srcu_gp_seq), sup->srcu_gp_seq_needed); 668 657 return; /* Caller forgot to stop doing call_srcu()? 
 */
 	}
-	if (!ssp->sda_is_static) {
+	kfree(sup->node);
+	sup->node = NULL;
+	sup->srcu_size_state = SRCU_SIZE_SMALL;
+	if (!sup->sda_is_static) {
 		free_percpu(ssp->sda);
 		ssp->sda = NULL;
+		kfree(sup);
+		ssp->srcu_sup = NULL;
 	}
-	kfree(ssp->node);
-	ssp->node = NULL;
-	ssp->srcu_size_state = SRCU_SIZE_SMALL;
 }
 EXPORT_SYMBOL_GPL(cleanup_srcu_struct);

···

 	struct srcu_data *sdp;
 	int state;

-	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
+	if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
 		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else
 		sdp = this_cpu_ptr(ssp->sda);
-	lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
-	WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
+	lockdep_assert_held(&ACCESS_PRIVATE(ssp->srcu_sup, lock));
+	WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_sup->srcu_gp_seq, ssp->srcu_sup->srcu_gp_seq_needed));
 	spin_lock_rcu_node(sdp); /* Interrupts already disabled. */
 	rcu_segcblist_advance(&sdp->srcu_cblist,
-			      rcu_seq_current(&ssp->srcu_gp_seq));
+			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
 	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
-				       rcu_seq_snap(&ssp->srcu_gp_seq));
+				       rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq));
 	spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
-	WRITE_ONCE(ssp->srcu_gp_start, jiffies);
-	WRITE_ONCE(ssp->srcu_n_exp_nodelay, 0);
+	WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies);
+	WRITE_ONCE(ssp->srcu_sup->srcu_n_exp_nodelay, 0);
 	smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
-	rcu_seq_start(&ssp->srcu_gp_seq);
-	state = rcu_seq_state(ssp->srcu_gp_seq);
+	rcu_seq_start(&ssp->srcu_sup->srcu_gp_seq);
+	state = rcu_seq_state(ssp->srcu_sup->srcu_gp_seq);
 	WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
 }

···

 	unsigned long sgsne;
 	struct srcu_node *snp;
 	int ss_state;
+	struct srcu_usage *sup = ssp->srcu_sup;

 	/* Prevent more than one additional grace period. */
-	mutex_lock(&ssp->srcu_cb_mutex);
+	mutex_lock(&sup->srcu_cb_mutex);

 	/* End the current grace period. */
-	spin_lock_irq_rcu_node(ssp);
-	idx = rcu_seq_state(ssp->srcu_gp_seq);
+	spin_lock_irq_rcu_node(sup);
+	idx = rcu_seq_state(sup->srcu_gp_seq);
 	WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
-	if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
+	if (ULONG_CMP_LT(READ_ONCE(sup->srcu_gp_seq), READ_ONCE(sup->srcu_gp_seq_needed_exp)))
 		cbdelay = 0;

-	WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
-	rcu_seq_end(&ssp->srcu_gp_seq);
-	gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
-	if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
-		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, gpseq);
-	spin_unlock_irq_rcu_node(ssp);
-	mutex_unlock(&ssp->srcu_gp_mutex);
+	WRITE_ONCE(sup->srcu_last_gp_end, ktime_get_mono_fast_ns());
+	rcu_seq_end(&sup->srcu_gp_seq);
+	gpseq = rcu_seq_current(&sup->srcu_gp_seq);
+	if (ULONG_CMP_LT(sup->srcu_gp_seq_needed_exp, gpseq))
+		WRITE_ONCE(sup->srcu_gp_seq_needed_exp, gpseq);
+	spin_unlock_irq_rcu_node(sup);
+	mutex_unlock(&sup->srcu_gp_mutex);
 	/* A new grace period can start at this point.  But only one. */

 	/* Initiate callback invocation as needed. */
-	ss_state = smp_load_acquire(&ssp->srcu_size_state);
+	ss_state = smp_load_acquire(&sup->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_BARRIER) {
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, get_boot_cpu_id()),
 				      cbdelay);

···

 		srcu_for_each_node_breadth_first(ssp, snp) {
 			spin_lock_irq_rcu_node(snp);
 			cbs = false;
-			last_lvl = snp >= ssp->level[rcu_num_lvls - 1];
+			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
 			if (last_lvl)
 				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
 			snp->srcu_have_cbs[idx] = gpseq;

···

 	}

 	/* Callback initiation done, allow grace periods after next. */
-	mutex_unlock(&ssp->srcu_cb_mutex);
+	mutex_unlock(&sup->srcu_cb_mutex);

 	/* Start a new grace period if needed. */
-	spin_lock_irq_rcu_node(ssp);
-	gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
+	spin_lock_irq_rcu_node(sup);
+	gpseq = rcu_seq_current(&sup->srcu_gp_seq);
 	if (!rcu_seq_state(gpseq) &&
-	    ULONG_CMP_LT(gpseq, ssp->srcu_gp_seq_needed)) {
+	    ULONG_CMP_LT(gpseq, sup->srcu_gp_seq_needed)) {
 		srcu_gp_start(ssp);
-		spin_unlock_irq_rcu_node(ssp);
+		spin_unlock_irq_rcu_node(sup);
 		srcu_reschedule(ssp, 0);
 	} else {
-		spin_unlock_irq_rcu_node(ssp);
+		spin_unlock_irq_rcu_node(sup);
 	}

 	/* Transition to big if needed. */

···

 		if (ss_state == SRCU_SIZE_ALLOC)
 			init_srcu_struct_nodes(ssp, GFP_KERNEL);
 		else
-			smp_store_release(&ssp->srcu_size_state, ss_state + 1);
+			smp_store_release(&sup->srcu_size_state, ss_state + 1);
 	}
 }

···

 	if (snp)
 		for (; snp != NULL; snp = snp->srcu_parent) {
 			sgsne = READ_ONCE(snp->srcu_gp_seq_needed_exp);
-			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) ||
+			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_sup->srcu_gp_seq, s)) ||
 			    (!srcu_invl_snp_seq(sgsne) && ULONG_CMP_GE(sgsne, s)))
 				return;
 			spin_lock_irqsave_rcu_node(snp, flags);

···

 			spin_unlock_irqrestore_rcu_node(snp, flags);
 		}
 	spin_lock_irqsave_ssp_contention(ssp, &flags);
-	if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
-		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
-	spin_unlock_irqrestore_rcu_node(ssp, flags);
+	if (ULONG_CMP_LT(ssp->srcu_sup->srcu_gp_seq_needed_exp, s))
+		WRITE_ONCE(ssp->srcu_sup->srcu_gp_seq_needed_exp, s);
+	spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags);
 }

 /*

···

 	struct srcu_node *snp;
 	struct srcu_node *snp_leaf;
 	unsigned long snp_seq;
+	struct srcu_usage *sup = ssp->srcu_sup;

 	/* Ensure that snp node tree is fully initialized before traversing it */
-	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
+	if (smp_load_acquire(&sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
 		snp_leaf = NULL;
 	else
 		snp_leaf = sdp->mynode;

···

 	if (snp_leaf)
 		/* Each pass through the loop does one level of the srcu_node tree. */
 		for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) {
-			if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) && snp != snp_leaf)
+			if (WARN_ON_ONCE(rcu_seq_done(&sup->srcu_gp_seq, s)) && snp != snp_leaf)
 				return; /* GP already done and CBs recorded. */
 			spin_lock_irqsave_rcu_node(snp, flags);
 			snp_seq = snp->srcu_have_cbs[idx];

···

 	/* Top of tree, must ensure the grace period will be started. */
 	spin_lock_irqsave_ssp_contention(ssp, &flags);
-	if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed, s)) {
+	if (ULONG_CMP_LT(sup->srcu_gp_seq_needed, s)) {
 		/*
 		 * Record need for grace period s.  Pair with load
 		 * acquire setting up for initialization.
 		 */
-		smp_store_release(&ssp->srcu_gp_seq_needed, s); /*^^^*/
+		smp_store_release(&sup->srcu_gp_seq_needed, s); /*^^^*/
 	}
-	if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
-		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
+	if (!do_norm && ULONG_CMP_LT(sup->srcu_gp_seq_needed_exp, s))
+		WRITE_ONCE(sup->srcu_gp_seq_needed_exp, s);

 	/* If grace period not already in progress, start it. */
-	if (!WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) &&
-	    rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
-		WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
+	if (!WARN_ON_ONCE(rcu_seq_done(&sup->srcu_gp_seq, s)) &&
+	    rcu_seq_state(sup->srcu_gp_seq) == SRCU_STATE_IDLE) {
+		WARN_ON_ONCE(ULONG_CMP_GE(sup->srcu_gp_seq, sup->srcu_gp_seq_needed));
 		srcu_gp_start(ssp);

 		// And how can that list_add() in the "else" clause

···

 		// can only be executed during early boot when there is only
 		// the one boot CPU running with interrupts still disabled.
 		if (likely(srcu_init_done))
-			queue_delayed_work(rcu_gp_wq, &ssp->work,
+			queue_delayed_work(rcu_gp_wq, &sup->work,
 					   !!srcu_get_delay(ssp));
-		else if (list_empty(&ssp->work.work.entry))
-			list_add(&ssp->work.work.entry, &srcu_boot_list);
+		else if (list_empty(&sup->work.work.entry))
+			list_add(&sup->work.work.entry, &srcu_boot_list);
 	}
-	spin_unlock_irqrestore_rcu_node(ssp, flags);
+	spin_unlock_irqrestore_rcu_node(sup, flags);
 }

···

 static void srcu_flip(struct srcu_struct *ssp)
 {
 	/*
-	 * Ensure that if this updater saw a given reader's increment
-	 * from __srcu_read_lock(), that reader was using an old value
-	 * of ->srcu_idx.  Also ensure that if a given reader sees the
-	 * new value of ->srcu_idx, this updater's earlier scans cannot
-	 * have seen that reader's increments (which is OK, because this
-	 * grace period need not wait on that reader).
+	 * Because the flip of ->srcu_idx is executed only if the
+	 * preceding call to srcu_readers_active_idx_check() found that
+	 * the ->srcu_unlock_count[] and ->srcu_lock_count[] sums matched
+	 * and because that summing uses atomic_long_read(), there is
+	 * ordering due to a control dependency between that summing and
+	 * the WRITE_ONCE() in this call to srcu_flip().  This ordering
+	 * ensures that if this updater saw a given reader's increment from
+	 * __srcu_read_lock(), that reader was using a value of ->srcu_idx
+	 * from before the previous call to srcu_flip(), which should be
+	 * quite rare.  This ordering thus helps forward progress because
+	 * the grace period could otherwise be delayed by additional
+	 * calls to __srcu_read_lock() using that old (soon to be new)
+	 * value of ->srcu_idx.
+	 *
+	 * This sum-equality check and ordering also ensures that if
+	 * a given call to __srcu_read_lock() uses the new value of
+	 * ->srcu_idx, this updater's earlier scans cannot have seen
+	 * that reader's increments, which is all to the good, because
+	 * this grace period need not wait on that reader.  After all,
+	 * if those earlier scans had seen that reader, there would have
+	 * been a sum mismatch and this code would not be reached.
+	 *
+	 * This means that the following smp_mb() is redundant, but
+	 * it stays until either (1) Compilers learn about this sort of
+	 * control dependency or (2) Some production workload running on
+	 * a production system is unduly delayed by this slowpath smp_mb().
 	 */
 	smp_mb(); /* E */  /* Pairs with B and C. */

-	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
+	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1); // Flip the counter.

 	/*
 	 * Ensure that if the updater misses an __srcu_read_unlock()

···

 	/* First, see if enough time has passed since the last GP. */
 	t = ktime_get_mono_fast_ns();
-	tlast = READ_ONCE(ssp->srcu_last_gp_end);
+	tlast = READ_ONCE(ssp->srcu_sup->srcu_last_gp_end);
 	if (exp_holdoff == 0 ||
 	    time_in_range_open(t, tlast, tlast + exp_holdoff))
 		return false; /* Too soon after last GP. */

 	/* Next, check for probable idleness. */
-	curseq = rcu_seq_current(&ssp->srcu_gp_seq);
+	curseq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq);
 	smp_mb(); /* Order ->srcu_gp_seq with ->srcu_gp_seq_needed. */
-	if (ULONG_CMP_LT(curseq, READ_ONCE(ssp->srcu_gp_seq_needed)))
+	if (ULONG_CMP_LT(curseq, READ_ONCE(ssp->srcu_sup->srcu_gp_seq_needed)))
 		return false; /* Grace period in progress, so not idle. */
 	smp_mb(); /* Order ->srcu_gp_seq with prior access. */
-	if (curseq != rcu_seq_current(&ssp->srcu_gp_seq))
+	if (curseq != rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq))
 		return false; /* GP # changed, so not idle. */
 	return true; /* With reasonable probability, idle! */
 }

···

 	 * sequence number cannot wrap around in the meantime.
 	 */
 	idx = __srcu_read_lock_nmisafe(ssp);
-	ss_state = smp_load_acquire(&ssp->srcu_size_state);
+	ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
 	if (ss_state < SRCU_SIZE_WAIT_CALL)
 		sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
 	else

···

 	if (rhp)
 		rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
 	rcu_segcblist_advance(&sdp->srcu_cblist,
-			      rcu_seq_current(&ssp->srcu_gp_seq));
-	s = rcu_seq_snap(&ssp->srcu_gp_seq);
+			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+	s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
 	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
 	if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
 		sdp->srcu_gp_seq_needed = s;

···

 static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
 {
 	struct rcu_synchronize rcu;
+
+	srcu_lock_sync(&ssp->dep_map);

 	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
 			 lock_is_held(&rcu_bh_lock_map) ||

···

 	// Any prior manipulation of SRCU-protected data must happen
 	// before the load from ->srcu_gp_seq.
 	smp_mb();
-	return rcu_seq_snap(&ssp->srcu_gp_seq);
+	return rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
 }
 EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);

···

  */
 bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
 {
-	if (!rcu_seq_done(&ssp->srcu_gp_seq, cookie))
+	if (!rcu_seq_done(&ssp->srcu_sup->srcu_gp_seq, cookie))
 		return false;
 	// Ensure that the end of the SRCU grace period happens before
 	// any subsequent code that the caller might execute.

···

 	sdp = container_of(rhp, struct srcu_data, srcu_barrier_head);
 	ssp = sdp->ssp;
-	if (atomic_dec_and_test(&ssp->srcu_barrier_cpu_cnt))
-		complete(&ssp->srcu_barrier_completion);
+	if (atomic_dec_and_test(&ssp->srcu_sup->srcu_barrier_cpu_cnt))
+		complete(&ssp->srcu_sup->srcu_barrier_completion);
 }

 /*

···

 static void srcu_barrier_one_cpu(struct srcu_struct *ssp, struct srcu_data *sdp)
 {
 	spin_lock_irq_rcu_node(sdp);
-	atomic_inc(&ssp->srcu_barrier_cpu_cnt);
+	atomic_inc(&ssp->srcu_sup->srcu_barrier_cpu_cnt);
 	sdp->srcu_barrier_head.func = srcu_barrier_cb;
 	debug_rcu_head_queue(&sdp->srcu_barrier_head);
 	if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
 				   &sdp->srcu_barrier_head)) {
 		debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
-		atomic_dec(&ssp->srcu_barrier_cpu_cnt);
+		atomic_dec(&ssp->srcu_sup->srcu_barrier_cpu_cnt);
 	}
 	spin_unlock_irq_rcu_node(sdp);
 }

···

 {
 	int cpu;
 	int idx;
-	unsigned long s = rcu_seq_snap(&ssp->srcu_barrier_seq);
+	unsigned long s = rcu_seq_snap(&ssp->srcu_sup->srcu_barrier_seq);

 	check_init_srcu_struct(ssp);
-	mutex_lock(&ssp->srcu_barrier_mutex);
-	if (rcu_seq_done(&ssp->srcu_barrier_seq, s)) {
+	mutex_lock(&ssp->srcu_sup->srcu_barrier_mutex);
+	if (rcu_seq_done(&ssp->srcu_sup->srcu_barrier_seq, s)) {
 		smp_mb(); /* Force ordering following return. */
-		mutex_unlock(&ssp->srcu_barrier_mutex);
+		mutex_unlock(&ssp->srcu_sup->srcu_barrier_mutex);
 		return; /* Someone else did our work for us. */
 	}
-	rcu_seq_start(&ssp->srcu_barrier_seq);
-	init_completion(&ssp->srcu_barrier_completion);
+	rcu_seq_start(&ssp->srcu_sup->srcu_barrier_seq);
+	init_completion(&ssp->srcu_sup->srcu_barrier_completion);

 	/* Initial count prevents reaching zero until all CBs are posted. */
-	atomic_set(&ssp->srcu_barrier_cpu_cnt, 1);
+	atomic_set(&ssp->srcu_sup->srcu_barrier_cpu_cnt, 1);

 	idx = __srcu_read_lock_nmisafe(ssp);
-	if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
+	if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
 		srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, get_boot_cpu_id()));
 	else
 		for_each_possible_cpu(cpu)

···

 	__srcu_read_unlock_nmisafe(ssp, idx);

 	/* Remove the initial count, at which point reaching zero can happen. */
-	if (atomic_dec_and_test(&ssp->srcu_barrier_cpu_cnt))
-		complete(&ssp->srcu_barrier_completion);
-	wait_for_completion(&ssp->srcu_barrier_completion);
+	if (atomic_dec_and_test(&ssp->srcu_sup->srcu_barrier_cpu_cnt))
+		complete(&ssp->srcu_sup->srcu_barrier_completion);
+	wait_for_completion(&ssp->srcu_sup->srcu_barrier_completion);

-	rcu_seq_end(&ssp->srcu_barrier_seq);
-	mutex_unlock(&ssp->srcu_barrier_mutex);
+	rcu_seq_end(&ssp->srcu_sup->srcu_barrier_seq);
+	mutex_unlock(&ssp->srcu_sup->srcu_barrier_mutex);
 }
 EXPORT_SYMBOL_GPL(srcu_barrier);

···

 {
 	int idx;

-	mutex_lock(&ssp->srcu_gp_mutex);
+	mutex_lock(&ssp->srcu_sup->srcu_gp_mutex);

 	/*
 	 * Because readers might be delayed for an extended period after

···

 	 * The load-acquire ensures that we see the accesses performed
 	 * by the prior grace period.
 	 */
-	idx = rcu_seq_state(smp_load_acquire(&ssp->srcu_gp_seq)); /* ^^^ */
+	idx = rcu_seq_state(smp_load_acquire(&ssp->srcu_sup->srcu_gp_seq)); /* ^^^ */
 	if (idx == SRCU_STATE_IDLE) {
-		spin_lock_irq_rcu_node(ssp);
-		if (ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed)) {
-			WARN_ON_ONCE(rcu_seq_state(ssp->srcu_gp_seq));
-			spin_unlock_irq_rcu_node(ssp);
-			mutex_unlock(&ssp->srcu_gp_mutex);
+		spin_lock_irq_rcu_node(ssp->srcu_sup);
+		if (ULONG_CMP_GE(ssp->srcu_sup->srcu_gp_seq, ssp->srcu_sup->srcu_gp_seq_needed)) {
+			WARN_ON_ONCE(rcu_seq_state(ssp->srcu_sup->srcu_gp_seq));
+			spin_unlock_irq_rcu_node(ssp->srcu_sup);
+			mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
 			return;
 		}
-		idx = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
+		idx = rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq));
 		if (idx == SRCU_STATE_IDLE)
 			srcu_gp_start(ssp);
-		spin_unlock_irq_rcu_node(ssp);
+		spin_unlock_irq_rcu_node(ssp->srcu_sup);
 		if (idx != SRCU_STATE_IDLE) {
-			mutex_unlock(&ssp->srcu_gp_mutex);
+			mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
 			return; /* Someone else started the grace period. */
 		}
 	}

-	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN1) {
+	if (rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq)) == SRCU_STATE_SCAN1) {
 		idx = 1 ^ (ssp->srcu_idx & 1);
 		if (!try_check_zero(ssp, idx, 1)) {
-			mutex_unlock(&ssp->srcu_gp_mutex);
+			mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
 			return; /* readers present, retry later. */
 		}
 		srcu_flip(ssp);
-		spin_lock_irq_rcu_node(ssp);
-		rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
-		ssp->srcu_n_exp_nodelay = 0;
-		spin_unlock_irq_rcu_node(ssp);
+		spin_lock_irq_rcu_node(ssp->srcu_sup);
+		rcu_seq_set_state(&ssp->srcu_sup->srcu_gp_seq, SRCU_STATE_SCAN2);
+		ssp->srcu_sup->srcu_n_exp_nodelay = 0;
+		spin_unlock_irq_rcu_node(ssp->srcu_sup);
 	}

-	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {
+	if (rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq)) == SRCU_STATE_SCAN2) {

 		/*
 		 * SRCU read-side critical sections are normally short,

···

 		idx = 1 ^ (ssp->srcu_idx & 1);
 		if (!try_check_zero(ssp, idx, 2)) {
-			mutex_unlock(&ssp->srcu_gp_mutex);
+			mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
 			return; /* readers present, retry later. */
 		}
-		ssp->srcu_n_exp_nodelay = 0;
+		ssp->srcu_sup->srcu_n_exp_nodelay = 0;
 		srcu_gp_end(ssp); /* Releases ->srcu_gp_mutex. */
 	}
 }

···

 	rcu_cblist_init(&ready_cbs);
 	spin_lock_irq_rcu_node(sdp);
 	rcu_segcblist_advance(&sdp->srcu_cblist,
-			      rcu_seq_current(&ssp->srcu_gp_seq));
+			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
 	if (sdp->srcu_cblist_invoking ||
 	    !rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
 		spin_unlock_irq_rcu_node(sdp);

···

 	spin_lock_irq_rcu_node(sdp);
 	rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
 	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
-				       rcu_seq_snap(&ssp->srcu_gp_seq));
+				       rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq));
 	sdp->srcu_cblist_invoking = false;
 	more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
 	spin_unlock_irq_rcu_node(sdp);

···

 {
 	bool pushgp = true;

-	spin_lock_irq_rcu_node(ssp);
-	if (ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed)) {
-		if (!WARN_ON_ONCE(rcu_seq_state(ssp->srcu_gp_seq))) {
+	spin_lock_irq_rcu_node(ssp->srcu_sup);
+	if (ULONG_CMP_GE(ssp->srcu_sup->srcu_gp_seq, ssp->srcu_sup->srcu_gp_seq_needed)) {
+		if (!WARN_ON_ONCE(rcu_seq_state(ssp->srcu_sup->srcu_gp_seq))) {
 			/* All requests fulfilled, time to go idle. */
 			pushgp = false;
 		}
-	} else if (!rcu_seq_state(ssp->srcu_gp_seq)) {
+	} else if (!rcu_seq_state(ssp->srcu_sup->srcu_gp_seq)) {
 		/* Outstanding request and no GP.  Start one. */
 		srcu_gp_start(ssp);
 	}
-	spin_unlock_irq_rcu_node(ssp);
+	spin_unlock_irq_rcu_node(ssp->srcu_sup);

 	if (pushgp)
-		queue_delayed_work(rcu_gp_wq, &ssp->work, delay);
+		queue_delayed_work(rcu_gp_wq, &ssp->srcu_sup->work, delay);
 }

 /*

···

 	unsigned long curdelay;
 	unsigned long j;
 	struct srcu_struct *ssp;
+	struct srcu_usage *sup;

-	ssp = container_of(work, struct srcu_struct, work.work);
+	sup = container_of(work, struct srcu_usage, work.work);
+	ssp = sup->srcu_ssp;

 	srcu_advance_state(ssp);
 	curdelay = srcu_get_delay(ssp);
 	if (curdelay) {
-		WRITE_ONCE(ssp->reschedule_count, 0);
+		WRITE_ONCE(sup->reschedule_count, 0);
 	} else {
 		j = jiffies;
-		if (READ_ONCE(ssp->reschedule_jiffies) == j) {
-			WRITE_ONCE(ssp->reschedule_count, READ_ONCE(ssp->reschedule_count) + 1);
-			if (READ_ONCE(ssp->reschedule_count) > srcu_max_nodelay)
+		if (READ_ONCE(sup->reschedule_jiffies) == j) {
+			WRITE_ONCE(sup->reschedule_count, READ_ONCE(sup->reschedule_count) + 1);
+			if (READ_ONCE(sup->reschedule_count) > srcu_max_nodelay)
 				curdelay = 1;
 		} else {
-			WRITE_ONCE(ssp->reschedule_count, 1);
-			WRITE_ONCE(ssp->reschedule_jiffies, j);
+			WRITE_ONCE(sup->reschedule_count, 1);
+			WRITE_ONCE(sup->reschedule_jiffies, j);
 		}
 	}
 	srcu_reschedule(ssp, curdelay);

···

 	if (test_type != SRCU_FLAVOR)
 		return;
 	*flags = 0;
-	*gp_seq = rcu_seq_current(&ssp->srcu_gp_seq);
+	*gp_seq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq);
 }
 EXPORT_SYMBOL_GPL(srcutorture_get_gp_data);

···

 	int cpu;
 	int idx;
 	unsigned long s0 = 0, s1 = 0;
-	int ss_state = READ_ONCE(ssp->srcu_size_state);
+	int ss_state = READ_ONCE(ssp->srcu_sup->srcu_size_state);
 	int ss_state_idx = ss_state;

 	idx = ssp->srcu_idx & 0x1;
 	if (ss_state < 0 || ss_state >= ARRAY_SIZE(srcu_size_state_name))
 		ss_state_idx = ARRAY_SIZE(srcu_size_state_name) - 1;
 	pr_alert("%s%s Tree SRCU g%ld state %d (%s)",
-		 tt, tf, rcu_seq_current(&ssp->srcu_gp_seq), ss_state,
+		 tt, tf, rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq), ss_state,
 		 srcu_size_state_name[ss_state_idx]);
 	if (!ssp->sda) {
 		// Called after cleanup_srcu_struct(), perhaps.

···

 void __init srcu_init(void)
 {
-	struct srcu_struct *ssp;
+	struct srcu_usage *sup;

 	/* Decide on srcu_struct-size strategy. */
 	if (SRCU_SIZING_IS(SRCU_SIZING_AUTO)) {

···

 	 */
 	srcu_init_done = true;
 	while (!list_empty(&srcu_boot_list)) {
-		ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
+		sup = list_first_entry(&srcu_boot_list, struct srcu_usage,
 				      work.work.entry);
-		list_del_init(&ssp->work.work.entry);
-		if (SRCU_SIZING_IS(SRCU_SIZING_INIT) && ssp->srcu_size_state == SRCU_SIZE_SMALL)
-			ssp->srcu_size_state = SRCU_SIZE_ALLOC;
-		queue_work(rcu_gp_wq, &ssp->work.work);
+		list_del_init(&sup->work.work.entry);
+		if (SRCU_SIZING_IS(SRCU_SIZING_INIT) &&
+		    sup->srcu_size_state == SRCU_SIZE_SMALL)
+			sup->srcu_size_state = SRCU_SIZE_ALLOC;
+		queue_work(rcu_gp_wq, &sup->work.work);
 	}
 }

···

 static int srcu_module_coming(struct module *mod)
 {
 	int i;
+	struct srcu_struct *ssp;
 	struct srcu_struct **sspp = mod->srcu_struct_ptrs;
-	int ret;

 	for (i = 0; i < mod->num_srcu_structs; i++) {
-		ret = init_srcu_struct(*(sspp++));
-		if (WARN_ON_ONCE(ret))
-			return ret;
+		ssp = *(sspp++);
+		ssp->sda = alloc_percpu(struct srcu_data);
+		if (WARN_ON_ONCE(!ssp->sda))
+			return -ENOMEM;
 	}
 	return 0;
 }

···

 static void srcu_module_going(struct module *mod)
 {
 	int i;
+	struct srcu_struct *ssp;
 	struct srcu_struct **sspp = mod->srcu_struct_ptrs;

-	for (i = 0; i < mod->num_srcu_structs; i++)
-		cleanup_srcu_struct(*(sspp++));
+	for (i = 0; i < mod->num_srcu_structs; i++) {
+		ssp = *(sspp++);
+		if (!rcu_seq_state(smp_load_acquire(&ssp->srcu_sup->srcu_gp_seq_needed)) &&
+		    !WARN_ON_ONCE(!ssp->srcu_sup->sda_is_static))
+			cleanup_srcu_struct(ssp);
+		if (!WARN_ON(srcu_readers_active(ssp)))
+			free_percpu(ssp->sda);
+	}
 }

 /* Handle one module, either coming or going. */
kernel/rcu/tasks.h (+33)

···

 		.kname = #rt_name,					\
 	}

+#ifdef CONFIG_TASKS_RCU
 /* Track exiting tasks in order to allow them to be waited for. */
 DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
+#endif
+
+#ifdef CONFIG_TASKS_RCU
+/* Report delay in synchronize_srcu() completion in rcu_tasks_postscan(). */
+static void tasks_rcu_exit_srcu_stall(struct timer_list *unused);
+static DEFINE_TIMER(tasks_rcu_exit_srcu_stall_timer, tasks_rcu_exit_srcu_stall);
+#endif

 /* Avoid IPIing CPUs early in the grace period. */
 #define RCU_TASK_IPI_DELAY (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) ? HZ / 2 : 0)

···

 /* Processing between scanning taskslist and draining the holdout list. */
 static void rcu_tasks_postscan(struct list_head *hop)
 {
+	int rtsi = READ_ONCE(rcu_task_stall_info);
+
+	if (!IS_ENABLED(CONFIG_TINY_RCU)) {
+		tasks_rcu_exit_srcu_stall_timer.expires = jiffies + rtsi;
+		add_timer(&tasks_rcu_exit_srcu_stall_timer);
+	}
+
 	/*
 	 * Exiting tasks may escape the tasklist scan. Those are vulnerable
 	 * until their final schedule() with TASK_DEAD state. To cope with

···

 	 * call to synchronize_rcu().
 	 */
 	synchronize_srcu(&tasks_rcu_exit_srcu);
+
+	if (!IS_ENABLED(CONFIG_TINY_RCU))
+		del_timer_sync(&tasks_rcu_exit_srcu_stall_timer);
 }

 /* See if tasks are still holding out, complain if so. */

···

 void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func);
 DEFINE_RCU_TASKS(rcu_tasks, rcu_tasks_wait_gp, call_rcu_tasks, "RCU Tasks");
+
+static void tasks_rcu_exit_srcu_stall(struct timer_list *unused)
+{
+#ifndef CONFIG_TINY_RCU
+	int rtsi;
+
+	rtsi = READ_ONCE(rcu_task_stall_info);
+	pr_info("%s: %s grace period number %lu (since boot) gp_state: %s is %lu jiffies old.\n",
+		__func__, rcu_tasks.kname, rcu_tasks.tasks_gp_seq,
+		tasks_gp_state_getname(&rcu_tasks), jiffies - rcu_tasks.gp_jiffies);
+	pr_info("Please check any exiting tasks stuck between calls to exit_tasks_rcu_start() and exit_tasks_rcu_finish()\n");
+	tasks_rcu_exit_srcu_stall_timer.expires = jiffies + rtsi;
+	add_timer(&tasks_rcu_exit_srcu_stall_timer);
+#endif // #ifndef CONFIG_TINY_RCU
+}

 /**
  * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
kernel/rcu/tree.c (+11 -7)

···

 	}
 	raw_spin_unlock_rcu_node(rdp->mynode);
 }
+NOKPROBE_SYMBOL(__rcu_irq_enter_check_tick);
 #endif /* CONFIG_NO_HZ_FULL */

 /*

···

 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
 	bool needacc = false;
 	struct rcu_node *rnp;

···

 	 * NOCB kthreads have their own way to deal with that...
 	 */
 	if (!rcu_rdp_is_offloaded(rdp)) {
-		needwake = rcu_accelerate_cbs(rnp, rdp);
+		/*
+		 * The current GP has not yet ended, so it
+		 * should not be possible for rcu_accelerate_cbs()
+		 * to return true.  So complain, but don't awaken.
+		 */
+		WARN_ON_ONCE(rcu_accelerate_cbs(rnp, rdp));
 	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
 		/*
 		 * ...but NOCB kthreads may miss or delay callbacks acceleration

···

 	rcu_disable_urgency_upon_qs(rdp);
 	rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
 	/* ^^^ Released rnp->lock */
-	if (needwake)
-		rcu_gp_kthread_wake();

 	if (needacc) {
 		rcu_nocb_lock_irqsave(rdp, flags);

···

 			break;
 		}
 	} else {
+		// In rcuoc context, so no worries about depriving
+		// other softirq vectors of CPU cycles.
 		local_bh_enable();
 		lockdep_assert_irqs_enabled();
 		cond_resched_tasks_rcu_qs();

···

 	else
 		qovld_calc = qovld;

-	// Kick-start any polled grace periods that started early.
-	if (!(per_cpu_ptr(&rcu_data, cpu)->mynode->exp_seq_poll_rq & 0x1))
-		(void)start_poll_synchronize_rcu_expedited();
+	// Kick-start in case any polled grace periods started early.
+	(void)start_poll_synchronize_rcu_expedited();

 	rcu_test_sync_prims();
 }
kernel/rcu/tree_exp.h (+10 -6)

···

 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root();
+	unsigned long flags;

 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
 	jiffies_stall = rcu_exp_jiffies_till_stall_check();

···

 		if (synchronize_rcu_expedited_wait_once(1))
 			return;
 		rcu_for_each_leaf_node(rnp) {
+			raw_spin_lock_irqsave_rcu_node(rnp, flags);
 			mask = READ_ONCE(rnp->expmask);
 			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
 				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->rcu_forced_tick_exp)
 					continue;
 				rdp->rcu_forced_tick_exp = true;
-				preempt_disable();
 				if (cpu_online(cpu))
 					tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
-				preempt_enable();
 			}
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
 		j = READ_ONCE(jiffies_till_first_fqs);
 		if (synchronize_rcu_expedited_wait_once(j + HZ))

···

 	int ndetected = 0;
 	struct task_struct *t;

-	if (!READ_ONCE(rnp->exp_tasks))
-		return 0;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	if (!rnp->exp_tasks) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		return 0;
+	}
 	t = list_entry(rnp->exp_tasks->prev,
 		       struct task_struct, rcu_node_entry);
 	list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {

···

 	if (rcu_init_invoked())
 		raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
 	if (!poll_state_synchronize_rcu(s)) {
-		rnp->exp_seq_poll_rq = s;
-		if (rcu_init_invoked())
+		if (rcu_init_invoked()) {
+			rnp->exp_seq_poll_rq = s;
 			queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
+		}
 	}
 	if (rcu_init_invoked())
 		raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
kernel/rcu/tree_nocb.h (+4)

···

 }
 EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);

+#ifdef CONFIG_RCU_LAZY
 static unsigned long
 lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {

···

 	.batch = 0,
 	.seeks = DEFAULT_SEEKS,
 };
+#endif // #ifdef CONFIG_RCU_LAZY

 void __init rcu_init_nohz(void)
 {

···

 	if (!rcu_state.nocb_is_setup)
 		return;

+#ifdef CONFIG_RCU_LAZY
 	if (register_shrinker(&lazy_rcu_shrinker, "rcu-lazy"))
 		pr_err("Failed to register lazy_rcu shrinker!\n");
+#endif // #ifdef CONFIG_RCU_LAZY

 	if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
 		pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
kernel/time/tick-sched.c (+13 -3)

···

 		return true;
 	}

+	if (val & TICK_DEP_MASK_RCU_EXP) {
+		trace_tick_stop(0, TICK_DEP_MASK_RCU_EXP);
+		return true;
+	}
+
 	return false;
 }

···

 	tick_nohz_full_running = true;
 }

-static int tick_nohz_cpu_down(unsigned int cpu)
+bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
 {
 	/*
 	 * The tick_do_timer_cpu CPU handles housekeeping duty (unbound

···

 	 * CPUs. It must remain online when nohz full is enabled.
 	 */
 	if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
-		return -EBUSY;
-	return 0;
+		return false;
+	return true;
+}
+
+static int tick_nohz_cpu_down(unsigned int cpu)
+{
+	return tick_nohz_cpu_hotpluggable(cpu) ? 0 : -EBUSY;
+}

 void __init tick_nohz_init(void)
+1 -1
kernel/trace/trace_osnoise.c
···
159 159 	if (!found)
160 160 		return;
161 161 
162 - 	kvfree_rcu(inst);
162 + 	kvfree_rcu_mightsleep(inst);
163 163 }
164 164 
165 165 /*
+1 -1
kernel/trace/trace_probe.c
···
1172 1172 		return -ENOENT;
1173 1173 
1174 1174 	list_del_rcu(&link->list);
1175 - 	kvfree_rcu(link);
1175 + 	kvfree_rcu_mightsleep(link);
1176 1176 
1177 1177 	if (list_empty(&tp->event->files))
1178 1178 		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
+1 -1
lib/test_vmalloc.c
···
334 334 			return -1;
335 335 
336 336 		p->array[0] = 'a';
337 - 		kvfree_rcu(p);
337 + 		kvfree_rcu_mightsleep(p);
338 338 	}
339 339 
340 340 	return 0;
-1
mm/Kconfig
···
686 686 
687 687 config MMU_NOTIFIER
688 688 	bool
689 - 	select SRCU
690 689 	select INTERVAL_TREE
691 690 
692 691 config KSM
+2 -2
net/core/sysctl_net_core.c
···
177 177 		if (orig_sock_table) {
178 178 			static_branch_dec(&rps_needed);
179 179 			static_branch_dec(&rfs_needed);
180 - 			kvfree_rcu(orig_sock_table);
180 + 			kvfree_rcu_mightsleep(orig_sock_table);
181 181 		}
182 182 	}
183 183 }
···
215 215 				   lockdep_is_held(&flow_limit_update_mutex));
216 216 		if (cur && !cpumask_test_cpu(i, mask)) {
217 217 			RCU_INIT_POINTER(sd->flow_limit, NULL);
218 - 			kfree_rcu(cur);
218 + 			kfree_rcu_mightsleep(cur);
219 219 		} else if (!cur && cpumask_test_cpu(i, mask)) {
220 220 			cur = kzalloc_node(len, GFP_KERNEL,
221 221 					   cpu_to_node(i));
+2 -2
net/mac802154/scan.c
···
52 52 	request = rcu_replace_pointer(local->scan_req, NULL, 1);
53 53 	if (!request)
54 54 		return 0;
55 - 	kfree_rcu(request);
55 + 	kvfree_rcu_mightsleep(request);
56 56 
57 57 	/* Advertize first, while we know the devices cannot be removed */
58 58 	if (aborted)
···
403 403 	request = rcu_replace_pointer(local->beacon_req, NULL, 1);
404 404 	if (!request)
405 405 		return 0;
406 - 	kfree_rcu(request);
406 + 	kvfree_rcu_mightsleep(request);
407 407 
408 408 	nl802154_beaconing_done(wpan_dev);
409 409 
+9
scripts/checkpatch.pl
···
6388 6388 		}
6389 6389 	}
6390 6390 
6391 + # check for soon-to-be-deprecated single-argument k[v]free_rcu() API
6392 + 	if ($line =~ /\bk[v]?free_rcu\s*\([^(]+\)/) {
6393 + 		if ($line =~ /\bk[v]?free_rcu\s*\([^,]+\)/) {
6394 + 			ERROR("DEPRECATED_API",
6395 + 			      "Single-argument k[v]free_rcu() API is deprecated, please pass rcu_head object or call k[v]free_rcu_mightsleep()." . $herecurr);
6396 + 		}
6397 + 	}
6398 + 
6399 + 
6391 6400 # check for unnecessary "Out of Memory" messages
6392 6401 	if ($line =~ /^\+.*\b$logFunctions\s*\(/ &&
6393 6402 	    $prevline =~ /^[ \+]\s*if\s*\(\s*(\!\s*|NULL\s*==\s*)?($Lval)(\s*==\s*NULL\s*)?\s*\)/ &&
+20 -6
tools/rcu/extract-stall.sh
···
1 1 #!/bin/sh
2 2 # SPDX-License-Identifier: GPL-2.0+
3 - #
4 - # Extract any RCU CPU stall warnings present in specified file.
5 - # Filter out clocksource lines.  Note that preceding-lines excludes the
6 - # initial line of the stall warning but trailing-lines includes it.
7 - #
8 - # Usage: extract-stall.sh dmesg-file [ preceding-lines [ trailing-lines ] ]
3 + 
4 + usage() {
5 + 	echo Extract any RCU CPU stall warnings present in specified file.
6 + 	echo Filter out clocksource lines.  Note that preceding-lines excludes the
7 + 	echo initial line of the stall warning but trailing-lines includes it.
8 + 	echo
9 + 	echo Usage: $(basename $0) dmesg-file [ preceding-lines [ trailing-lines ] ]
10 + 	echo
11 + 	echo Error: $1
12 + }
13 + 
14 + # Terminate the script, if the argument is missing
15 + 
16 + if test -f "$1" && test -r "$1"
17 + then
18 + 	:
19 + else
20 + 	usage "Console log file \"$1\" missing or unreadable."
21 + 	exit 1
22 + fi
9 23 
10 24 echo $1
11 25 preceding_lines="${2-3}"
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-again.sh
···
193 193 		qemu_cmd_dir="`dirname "$i"`"
194 194 		kernel_dir="`echo $qemu_cmd_dir | sed -e 's/\.[0-9]\+$//'`"
195 195 		jitter_dir="`dirname "$kernel_dir"`"
196 - 		kvm-transform.sh "$kernel_dir/bzImage" "$qemu_cmd_dir/console.log" "$jitter_dir" $dur "$bootargs" < $T/qemu-cmd > $i
196 + 		kvm-transform.sh "$kernel_dir/bzImage" "$qemu_cmd_dir/console.log" "$jitter_dir" "$dur" "$bootargs" < $T/qemu-cmd > $i
197 197 		if test -n "$arg_remote"
198 198 		then
199 199 			echo "# TORTURE_KCONFIG_GDB_ARG=''" >> $i
+78
tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
···
1 + #!/bin/bash
2 + # SPDX-License-Identifier: GPL-2.0+
3 + #
4 + # Run SRCU-lockdep tests and report any that fail to meet expectations.
5 + #
6 + # Copyright (C) 2021 Meta Platforms, Inc.
7 + #
8 + # Authors: Paul E. McKenney <paulmck@kernel.org>
9 + 
10 + usage () {
11 + 	echo "Usage: $scriptname optional arguments:"
12 + 	echo "       --datestamp string"
13 + 	exit 1
14 + }
15 + 
16 + ds=`date +%Y.%m.%d-%H.%M.%S`-srcu_lockdep
17 + scriptname="$0"
18 + 
19 + T="`mktemp -d ${TMPDIR-/tmp}/srcu_lockdep.sh.XXXXXX`"
20 + trap 'rm -rf $T' 0
21 + 
22 + RCUTORTURE="`pwd`/tools/testing/selftests/rcutorture"; export RCUTORTURE
23 + PATH=${RCUTORTURE}/bin:$PATH; export PATH
24 + . functions.sh
25 + 
26 + while test $# -gt 0
27 + do
28 + 	case "$1" in
29 + 	--datestamp)
30 + 		checkarg --datestamp "(relative pathname)" "$#" "$2" '^[a-zA-Z0-9._/-]*$' '^--'
31 + 		ds=$2
32 + 		shift
33 + 		;;
34 + 	*)
35 + 		echo Unknown argument $1
36 + 		usage
37 + 		;;
38 + 	esac
39 + 	shift
40 + done
41 + 
42 + err=
43 + nerrs=0
44 + for d in 0 1
45 + do
46 + 	for t in 0 1 2
47 + 	do
48 + 		for c in 1 2 3
49 + 		do
50 + 			err=
51 + 			val=$((d*1000+t*10+c))
52 + 			tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --bootargs "rcutorture.test_srcu_lockdep=$val" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1
53 + 			ret=$?
54 + 			mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val"
55 + 			if test "$d" -ne 0 && test "$ret" -eq 0
56 + 			then
57 + 				err=1
58 + 				echo -n Unexpected success for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
59 + 			fi
60 + 			if test "$d" -eq 0 && test "$ret" -ne 0
61 + 			then
62 + 				err=1
63 + 				echo -n Unexpected failure for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
64 + 			fi
65 + 			if test -n "$err"
66 + 			then
67 + 				grep "rcu_torture_init_srcu_lockdep: test_srcu_lockdep = " "$RCUTORTURE/res/$ds/$val/SRCU-P/console.log" | sed -e 's/^.*rcu_torture_init_srcu_lockdep://' >> "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
68 + 				cat "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
69 + 				nerrs=$((nerrs+1))
70 + 			fi
71 + 		done
72 + 	done
73 + done
74 + if test "$nerrs" -ne 0
75 + then
76 + 	exit 1
77 + fi
78 + exit 0
+3 -3
tools/testing/selftests/rcutorture/bin/torture.sh
···
497 497 
498 498 	if test "$do_clocksourcewd" = "yes"
499 499 	then
500 - 		torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000"
500 + 		torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 tsc=watchdog"
501 501 		torture_set "clocksourcewd-1" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make
502 502 
503 - 		torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 clocksource.max_cswd_read_retries=1"
503 + 		torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 clocksource.max_cswd_read_retries=1 tsc=watchdog"
504 504 		torture_set "clocksourcewd-2" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make
505 505 
506 506 		# In case our work is already done...
507 507 		if test "$do_rcutorture" != "yes"
508 508 		then
509 - 			torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000"
509 + 			torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 tsc=watchdog"
510 510 			torture_set "clocksourcewd-3" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --trust-make
511 511 		fi
512 512 	fi
+1
tools/testing/selftests/rcutorture/configs/rcu/TREE01
···
15 15 CONFIG_RCU_BOOST=n
16 16 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
17 17 CONFIG_RCU_EXPERT=y
18 + CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
+1
tools/testing/selftests/rcutorture/configs/rcu/TREE04
···
15 15 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
16 16 CONFIG_RCU_EXPERT=y
17 17 CONFIG_RCU_EQS_DEBUG=y
18 + CONFIG_RCU_LAZY=y
-4
tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
···
71 71 
72 72 These are controlled by CONFIG_PREEMPT and/or CONFIG_SMP.
73 73 
74 - CONFIG_SRCU
75 - 
76 - Selected by CONFIG_RCU_TORTURE_TEST, so cannot disable.
77 - 
78 74 
79 75 boot parameters ignored: TBD