Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Paul McKenney:
"Documentation updates:

- Update whatisRCU.rst and checklist.rst for recent RCU API additions

- Fix RCU documentation formatting and typos

- Replace dead Ottawa Linux Symposium links in RTFP.txt

Miscellaneous RCU updates:

- Document that rcu_barrier() hurries RCU_LAZY callbacks

- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler()

- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly

- Make an initial set of changes to accommodate upcoming
system_percpu_wq changes

SRCU updates:

- Create an srcu_read_lock_fast_notrace() for eventual use in
tracing, including adding guards

- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast()

- Document the relationship between the srcu_flip() function's
memory-barrier D and SRCU-fast readers

- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed()

Torture-test updates:

- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary

- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels
without doing non-debug kernels

- Fix a number of false-positive diagnostics that were being
triggered by rcutorture starting before boot completed. Running
multiple near-CPU-bound rcutorture processes when there is only the
boot CPU is, after all, a bit excessive

- Substitute kcalloc() for kzalloc()

- Remove a redundant kfree() and NULL out kfree()ed objects"

* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
rcu: WQ_UNBOUND added to sync_wq workqueue
rcu: WQ_PERCPU added to alloc_workqueue users
rcu: replace use of system_wq with system_percpu_wq
refperf: Set reader_tasks to NULL after kfree()
refperf: Remove redundant kfree() after torture_stop_kthread()
srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
rculist: move list_for_each_rcu() to where it belongs
refscale: Use kcalloc() instead of kzalloc()
rcutorture: Use kcalloc() instead of kzalloc()
docs: rcu: Replace multiple dead OLS links in RTFP.txt
doc: Fix typo in RCU's torture.rst documentation
Documentation: RCU: Retitle toctree index
Documentation: RCU: Reduce toctree depth
Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
doc: Add RCU guards to checklist.rst
doc: Update whatisRCU.rst for recent RCU API additions
rcutorture: Delay forward-progress testing until boot completes
...

+315 -128
+24 -28
Documentation/RCU/Design/Requirements/Requirements.rst
··· 1973 1973 Note that grace period initialization (rcu_gp_init()) must carefully sequence 1974 1974 CPU hotplug scanning with grace period state changes. For example, the 1975 1975 following race could occur in rcu_gp_init() if rcu_seq_start() were to happen 1976 - after the CPU hotplug scanning. 1977 - 1978 - .. code-block:: none 1976 + after the CPU hotplug scanning:: 1979 1977 1980 1978 CPU0 (rcu_gp_init) CPU1 CPU2 1981 1979 --------------------- ---- ---- ··· 2006 2008 kfree(r1); 2007 2009 r2 = *r0; // USE-AFTER-FREE! 2008 2010 2009 - By incrementing gp_seq first, CPU1's RCU read-side critical section 2011 + By incrementing ``gp_seq`` first, CPU1's RCU read-side critical section 2010 2012 is guaranteed to not be missed by CPU2. 2011 2013 2012 - **Concurrent Quiescent State Reporting for Offline CPUs** 2014 + Concurrent Quiescent State Reporting for Offline CPUs 2015 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2013 2016 2014 2017 RCU must ensure that CPUs going offline report quiescent states to avoid 2015 2018 blocking grace periods. This requires careful synchronization to handle 2016 2019 race conditions 2017 2020 2018 - **Race condition causing Offline CPU to hang GP** 2021 + Race condition causing Offline CPU to hang GP 2022 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2019 2023 2020 - A race between CPU offlining and new GP initialization (gp_init) may occur 2021 - because `rcu_report_qs_rnp()` in `rcutree_report_cpu_dead()` must temporarily 2022 - release the `rcu_node` lock to wake the RCU grace-period kthread: 2023 - 2024 - .. 
code-block:: none 2024 + A race between CPU offlining and new GP initialization (gp_init()) may occur 2025 + because rcu_report_qs_rnp() in rcutree_report_cpu_dead() must temporarily 2026 + release the ``rcu_node`` lock to wake the RCU grace-period kthread:: 2025 2027 2026 2028 CPU1 (going offline) CPU0 (GP kthread) 2027 2029 -------------------- ----------------- ··· 2042 2044 // Reacquire lock (but too late) 2043 2045 rnp->qsmaskinitnext &= ~mask // Finally clears bit 2044 2046 2045 - Without `ofl_lock`, the new grace period includes the offline CPU and waits 2047 + Without ``ofl_lock``, the new grace period includes the offline CPU and waits 2046 2048 forever for its quiescent state causing a GP hang. 2047 2049 2048 - **A solution with ofl_lock** 2050 + A solution with ofl_lock 2051 + ^^^^^^^^^^^^^^^^^^^^^^^^ 2049 2052 2050 - The `ofl_lock` (offline lock) prevents `rcu_gp_init()` from running during 2051 - the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`: 2052 - 2053 - .. code-block:: none 2053 + The ``ofl_lock`` (offline lock) prevents rcu_gp_init() from running during 2054 + the vulnerable window when rcu_report_qs_rnp() has released ``rnp->lock``:: 2054 2055 2055 2056 CPU0 (rcu_gp_init) CPU1 (rcutree_report_cpu_dead) 2056 2057 ------------------ ------------------------------ ··· 2062 2065 arch_spin_unlock(&ofl_lock) ---> // Now CPU1 can proceed 2063 2066 } // But snapshot already taken 2064 2067 2065 - **Another race causing GP hangs in rcu_gpu_init(): Reporting QS for Now-offline CPUs** 2068 + Another race causing GP hangs in rcu_gpu_init(): Reporting QS for Now-offline CPUs 2069 + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2066 2070 2067 2071 After the first loop takes an atomic snapshot of online CPUs, as shown above, 2068 - the second loop in `rcu_gp_init()` detects CPUs that went offline between 2069 - releasing `ofl_lock` and acquiring the per-node `rnp->lock`. 
This detection is 2070 - crucial because: 2072 + the second loop in rcu_gp_init() detects CPUs that went offline between 2073 + releasing ``ofl_lock`` and acquiring the per-node ``rnp->lock``. 2074 + This detection is crucial because: 2071 2075 2072 2076 1. The CPU might have gone offline after the snapshot but before the second loop 2073 2077 2. The offline CPU cannot report its own QS if it's already dead 2074 2078 3. Without this detection, the grace period would wait forever for CPUs that 2075 2079 are now offline. 2076 2080 2077 - The second loop performs this detection safely: 2078 - 2079 - .. code-block:: none 2081 + The second loop performs this detection safely:: 2080 2082 2081 2083 rcu_for_each_node_breadth_first(rnp) { 2082 2084 raw_spin_lock_irqsave_rcu_node(rnp, flags); ··· 2089 2093 } 2090 2094 2091 2095 This approach ensures atomicity: quiescent state reporting for offline CPUs 2092 - happens either in `rcu_gp_init()` (second loop) or in `rcutree_report_cpu_dead()`, 2093 - never both and never neither. The `rnp->lock` held throughout the sequence 2094 - prevents races - `rcutree_report_cpu_dead()` also acquires this lock when 2095 - clearing `qsmaskinitnext`, ensuring mutual exclusion. 2096 + happens either in rcu_gp_init() (second loop) or in rcutree_report_cpu_dead(), 2097 + never both and never neither. The ``rnp->lock`` held throughout the sequence 2098 + prevents races - rcutree_report_cpu_dead() also acquires this lock when 2099 + clearing ``qsmaskinitnext``, ensuring mutual exclusion. 2096 2100 2097 2101 Scheduler and RCU 2098 2102 ~~~~~~~~~~~~~~~~~
+3 -3
Documentation/RCU/RTFP.txt
··· 641 641 ,Month="July" 642 642 ,Year="2001" 643 643 ,note="Available: 644 - \url{http://www.linuxsymposium.org/2001/abstracts/readcopy.php} 644 + \url{https://kernel.org/doc/ols/2001/read-copy.pdf} 645 645 \url{http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf} 646 646 [Viewed June 23, 2004]" 647 647 ,annotation={ ··· 1480 1480 ,Year="2006" 1481 1481 ,pages="v2 123-138" 1482 1482 ,note="Available: 1483 - \url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184} 1483 + \url{https://kernel.org/doc/ols/2006/ols2006v2-pages-131-146.pdf} 1484 1484 \url{http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf} 1485 1485 [Viewed January 1, 2007]" 1486 1486 ,annotation={ ··· 1511 1511 ,Year="2006" 1512 1512 ,pages="v2 249-254" 1513 1513 ,note="Available: 1514 - \url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184} 1514 + \url{https://kernel.org/doc/ols/2006/ols2006v2-pages-249-262.pdf} 1515 1515 [Viewed January 11, 2009]" 1516 1516 ,annotation={ 1517 1517 Uses RCU-protected radix tree for a lockless page cache.
+19 -8
Documentation/RCU/checklist.rst
··· 69 69 Explicit disabling of preemption (preempt_disable(), for example) 70 70 can serve as rcu_read_lock_sched(), but is less readable and 71 71 prevents lockdep from detecting locking issues. Acquiring a 72 - spinlock also enters an RCU read-side critical section. 72 + raw spinlock also enters an RCU read-side critical section. 73 + 74 + The guard(rcu)() and scoped_guard(rcu) primitives designate 75 + the remainder of the current scope or the next statement, 76 + respectively, as the RCU read-side critical section. Use of 77 + these guards can be less error-prone than rcu_read_lock(), 78 + rcu_read_unlock(), and friends. 73 79 74 80 Please note that you *cannot* rely on code known to be built 75 81 only in non-preemptible kernels. Such code can and will break, ··· 411 405 13. Unlike most flavors of RCU, it *is* permissible to block in an 412 406 SRCU read-side critical section (demarked by srcu_read_lock() 413 407 and srcu_read_unlock()), hence the "SRCU": "sleepable RCU". 414 - Please note that if you don't need to sleep in read-side critical 415 - sections, you should be using RCU rather than SRCU, because RCU 416 - is almost always faster and easier to use than is SRCU. 408 + As with RCU, guard(srcu)() and scoped_guard(srcu) forms are 409 + available, and often provide greater ease of use. Please note 410 + that if you don't need to sleep in read-side critical sections, 411 + you should be using RCU rather than SRCU, because RCU is almost 412 + always faster and easier to use than is SRCU. 417 413 418 414 Also unlike other forms of RCU, explicit initialization and 419 415 cleanup is required either at build time via DEFINE_SRCU() ··· 451 443 real-time workloads than is synchronize_rcu_expedited(). 452 444 453 445 It is also permissible to sleep in RCU Tasks Trace read-side 454 - critical section, which are delimited by rcu_read_lock_trace() and 455 - rcu_read_unlock_trace(). 
However, this is a specialized flavor 456 - of RCU, and you should not use it without first checking with 457 - its current users. In most cases, you should instead use SRCU. 446 + critical section, which are delimited by rcu_read_lock_trace() 447 + and rcu_read_unlock_trace(). However, this is a specialized 448 + flavor of RCU, and you should not use it without first checking 449 + with its current users. In most cases, you should instead 450 + use SRCU. As with RCU and SRCU, guard(rcu_tasks_trace)() and 451 + scoped_guard(rcu_tasks_trace) are available, and often provide 452 + greater ease of use. 458 453 459 454 Note that rcu_assign_pointer() relates to SRCU just as it does to 460 455 other forms of RCU, but instead of rcu_dereference() you should
+3 -3
Documentation/RCU/index.rst
··· 1 1 .. SPDX-License-Identifier: GPL-2.0 2 2 3 - .. _rcu_concepts: 3 + .. _rcu_handbook: 4 4 5 5 ============ 6 - RCU concepts 6 + RCU Handbook 7 7 ============ 8 8 9 9 .. toctree:: 10 - :maxdepth: 3 10 + :maxdepth: 2 11 11 12 12 checklist 13 13 lockdep
+2 -2
Documentation/RCU/torture.rst
··· 344 344 345 345 And this is why the kvm-remote.sh script exists. 346 346 347 - If you the following command works:: 347 + If the following command works:: 348 348 349 349 ssh system0 date 350 350 ··· 364 364 The kvm.sh ``--dryrun scenarios`` argument is useful for working out 365 365 how many scenarios may be run in one batch across a group of systems. 366 366 367 - You can also re-run a previous remote run in a manner similar to kvm.sh: 367 + You can also re-run a previous remote run in a manner similar to kvm.sh:: 368 368 369 369 kvm-remote.sh "system0 system1 system2 system3 system4 system5" \ 370 370 tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \
+116 -30
Documentation/RCU/whatisRCU.rst
··· 1021 1021 list_entry_rcu 1022 1022 list_entry_lockless 1023 1023 list_first_entry_rcu 1024 + list_first_or_null_rcu 1025 + list_tail_rcu 1024 1026 list_next_rcu 1027 + list_next_or_null_rcu 1025 1028 list_for_each_entry_rcu 1026 1029 list_for_each_entry_continue_rcu 1027 1030 list_for_each_entry_from_rcu 1028 - list_first_or_null_rcu 1029 - list_next_or_null_rcu 1031 + list_for_each_entry_lockless 1030 1032 hlist_first_rcu 1031 1033 hlist_next_rcu 1032 1034 hlist_pprev_rcu 1033 1035 hlist_for_each_entry_rcu 1036 + hlist_for_each_entry_rcu_notrace 1034 1037 hlist_for_each_entry_rcu_bh 1035 1038 hlist_for_each_entry_from_rcu 1036 1039 hlist_for_each_entry_continue_rcu 1037 1040 hlist_for_each_entry_continue_rcu_bh 1038 1041 hlist_nulls_first_rcu 1042 + hlist_nulls_next_rcu 1039 1043 hlist_nulls_for_each_entry_rcu 1044 + hlist_nulls_for_each_entry_safe 1040 1045 hlist_bl_first_rcu 1041 1046 hlist_bl_for_each_entry_rcu 1042 1047 1043 1048 RCU pointer/list update:: 1044 1049 1045 1050 rcu_assign_pointer 1051 + rcu_replace_pointer 1052 + INIT_LIST_HEAD_RCU 1046 1053 list_add_rcu 1047 1054 list_add_tail_rcu 1048 1055 list_del_rcu 1049 1056 list_replace_rcu 1057 + list_splice_init_rcu 1058 + list_splice_tail_init_rcu 1050 1059 hlist_add_behind_rcu 1051 1060 hlist_add_before_rcu 1052 1061 hlist_add_head_rcu ··· 1063 1054 hlist_del_rcu 1064 1055 hlist_del_init_rcu 1065 1056 hlist_replace_rcu 1066 - list_splice_init_rcu 1067 - list_splice_tail_init_rcu 1068 1057 hlist_nulls_del_init_rcu 1069 1058 hlist_nulls_del_rcu 1070 1059 hlist_nulls_add_head_rcu 1060 + hlist_nulls_add_tail_rcu 1061 + hlist_nulls_add_fake 1062 + hlists_swap_heads_rcu 1071 1063 hlist_bl_add_head_rcu 1072 - hlist_bl_del_init_rcu 1073 1064 hlist_bl_del_rcu 1074 1065 hlist_bl_set_first_rcu 1075 1066 1076 1067 RCU:: 1077 1068 1078 - Critical sections Grace period Barrier 1069 + Critical sections Grace period Barrier 1079 1070 1080 - rcu_read_lock synchronize_net rcu_barrier 1081 - rcu_read_unlock 
synchronize_rcu 1082 - rcu_dereference synchronize_rcu_expedited 1083 - rcu_read_lock_held call_rcu 1084 - rcu_dereference_check kfree_rcu 1085 - rcu_dereference_protected 1071 + rcu_read_lock synchronize_net rcu_barrier 1072 + rcu_read_unlock synchronize_rcu 1073 + guard(rcu)() synchronize_rcu_expedited 1074 + scoped_guard(rcu) synchronize_rcu_mult 1075 + rcu_dereference call_rcu 1076 + rcu_dereference_check call_rcu_hurry 1077 + rcu_dereference_protected kfree_rcu 1078 + rcu_read_lock_held kvfree_rcu 1079 + rcu_read_lock_any_held kfree_rcu_mightsleep 1080 + rcu_pointer_handoff cond_synchronize_rcu 1081 + unrcu_pointer cond_synchronize_rcu_full 1082 + cond_synchronize_rcu_expedited 1083 + cond_synchronize_rcu_expedited_full 1084 + get_completed_synchronize_rcu 1085 + get_completed_synchronize_rcu_full 1086 + get_state_synchronize_rcu 1087 + get_state_synchronize_rcu_full 1088 + poll_state_synchronize_rcu 1089 + poll_state_synchronize_rcu_full 1090 + same_state_synchronize_rcu 1091 + same_state_synchronize_rcu_full 1092 + start_poll_synchronize_rcu 1093 + start_poll_synchronize_rcu_full 1094 + start_poll_synchronize_rcu_expedited 1095 + start_poll_synchronize_rcu_expedited_full 1086 1096 1087 1097 bh:: 1088 1098 1089 1099 Critical sections Grace period Barrier 1090 1100 1091 - rcu_read_lock_bh call_rcu rcu_barrier 1092 - rcu_read_unlock_bh synchronize_rcu 1093 - [local_bh_disable] synchronize_rcu_expedited 1101 + rcu_read_lock_bh [Same as RCU] [Same as RCU] 1102 + rcu_read_unlock_bh 1103 + [local_bh_disable] 1094 1104 [and friends] 1095 1105 rcu_dereference_bh 1096 1106 rcu_dereference_bh_check ··· 1120 1092 1121 1093 Critical sections Grace period Barrier 1122 1094 1123 - rcu_read_lock_sched call_rcu rcu_barrier 1124 - rcu_read_unlock_sched synchronize_rcu 1125 - [preempt_disable] synchronize_rcu_expedited 1095 + rcu_read_lock_sched [Same as RCU] [Same as RCU] 1096 + rcu_read_unlock_sched 1097 + [preempt_disable] 1126 1098 [and friends] 1127 1099 
rcu_read_lock_sched_notrace 1128 1100 rcu_read_unlock_sched_notrace ··· 1132 1104 rcu_read_lock_sched_held 1133 1105 1134 1106 1107 + RCU: Initialization/cleanup/ordering:: 1108 + 1109 + RCU_INIT_POINTER 1110 + RCU_INITIALIZER 1111 + RCU_POINTER_INITIALIZER 1112 + init_rcu_head 1113 + destroy_rcu_head 1114 + init_rcu_head_on_stack 1115 + destroy_rcu_head_on_stack 1116 + SLAB_TYPESAFE_BY_RCU 1117 + 1118 + 1119 + RCU: Quiescents states and control:: 1120 + 1121 + cond_resched_tasks_rcu_qs 1122 + rcu_all_qs 1123 + rcu_softirq_qs_periodic 1124 + rcu_end_inkernel_boot 1125 + rcu_expedite_gp 1126 + rcu_gp_is_expedited 1127 + rcu_unexpedite_gp 1128 + rcu_cpu_stall_reset 1129 + rcu_head_after_call_rcu 1130 + rcu_is_watching 1131 + 1132 + 1133 + RCU-sync primitive:: 1134 + 1135 + rcu_sync_is_idle 1136 + rcu_sync_init 1137 + rcu_sync_enter 1138 + rcu_sync_exit 1139 + rcu_sync_dtor 1140 + 1141 + 1135 1142 RCU-Tasks:: 1136 1143 1137 - Critical sections Grace period Barrier 1144 + Critical sections Grace period Barrier 1138 1145 1139 - N/A call_rcu_tasks rcu_barrier_tasks 1146 + N/A call_rcu_tasks rcu_barrier_tasks 1140 1147 synchronize_rcu_tasks 1141 1148 1142 1149 1143 1150 RCU-Tasks-Rude:: 1144 1151 1145 - Critical sections Grace period Barrier 1152 + Critical sections Grace period Barrier 1146 1153 1147 - N/A N/A 1148 - synchronize_rcu_tasks_rude 1154 + N/A synchronize_rcu_tasks_rude rcu_barrier_tasks_rude 1155 + call_rcu_tasks_rude 1149 1156 1150 1157 1151 1158 RCU-Tasks-Trace:: 1152 1159 1153 - Critical sections Grace period Barrier 1160 + Critical sections Grace period Barrier 1154 1161 1155 - rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace 1162 + rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace 1156 1163 rcu_read_unlock_trace synchronize_rcu_tasks_trace 1164 + guard(rcu_tasks_trace)() 1165 + scoped_guard(rcu_tasks_trace) 1166 + 1167 + 1168 + SRCU list traversal:: 1169 + list_for_each_entry_srcu 1170 + hlist_for_each_entry_srcu 1157 1171 
1158 1172 1159 1173 SRCU:: 1160 1174 1161 - Critical sections Grace period Barrier 1175 + Critical sections Grace period Barrier 1162 1176 1163 - srcu_read_lock call_srcu srcu_barrier 1164 - srcu_read_unlock synchronize_srcu 1165 - srcu_dereference synchronize_srcu_expedited 1177 + srcu_read_lock call_srcu srcu_barrier 1178 + srcu_read_unlock synchronize_srcu 1179 + srcu_read_lock_fast synchronize_srcu_expedited 1180 + srcu_read_unlock_fast get_state_synchronize_srcu 1181 + srcu_read_lock_nmisafe start_poll_synchronize_srcu 1182 + srcu_read_unlock_nmisafe start_poll_synchronize_srcu_expedited 1183 + srcu_read_lock_notrace poll_state_synchronize_srcu 1184 + srcu_read_unlock_notrace 1185 + srcu_down_read 1186 + srcu_up_read 1187 + srcu_down_read_fast 1188 + srcu_up_read_fast 1189 + guard(srcu)() 1190 + scoped_guard(srcu) 1191 + srcu_read_lock_held 1192 + srcu_dereference 1166 1193 srcu_dereference_check 1194 + srcu_dereference_notrace 1167 1195 srcu_read_lock_held 1168 1196 1169 - SRCU: Initialization/cleanup:: 1197 + 1198 + SRCU: Initialization/cleanup/ordering:: 1170 1199 1171 1200 DEFINE_SRCU 1172 1201 DEFINE_STATIC_SRCU 1173 1202 init_srcu_struct 1174 1203 cleanup_srcu_struct 1204 + smp_mb__after_srcu_read_unlock 1175 1205 1176 1206 All: lockdep-checked RCU utility APIs:: 1177 1207
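The expanded API table above lists the polled grace-period interfaces (get_state_synchronize_rcu(), poll_state_synchronize_rcu(), and friends). Their contract can be modeled with a toy sequence counter; this sketches the idea only, not the kernel's actual gp_seq encoding:

```c
#include <stdbool.h>

/* Toy model of RCU's polled grace-period API: a bare counter stands in
 * for the kernel's gp_seq state. */
static unsigned long toy_gp_seq;

/* Returns a "cookie" naming a grace period that has not yet completed. */
static unsigned long toy_get_state(void)
{
	return toy_gp_seq + 1;
}

/* A full grace period elapsing is modeled as one counter increment. */
static void toy_grace_period(void)
{
	toy_gp_seq++;
}

/* True once a full grace period has elapsed since the cookie was taken,
 * so the caller may reclaim without blocking in synchronize_rcu(). */
static bool toy_poll_state(unsigned long cookie)
{
	return toy_gp_seq >= cookie;
}
```

The _full variants in the real API carry a wider snapshot of grace-period state, but the take-cookie-then-poll usage pattern is the same.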
-10
include/linux/list.h
··· 709 709 for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next) 710 710 711 711 /** 712 - * list_for_each_rcu - Iterate over a list in an RCU-safe fashion 713 - * @pos: the &struct list_head to use as a loop cursor. 714 - * @head: the head for your list. 715 - */ 716 - #define list_for_each_rcu(pos, head) \ 717 - for (pos = rcu_dereference((head)->next); \ 718 - !list_is_head(pos, (head)); \ 719 - pos = rcu_dereference(pos->next)) 720 - 721 - /** 722 712 * list_for_each_continue - continue iteration over a list 723 713 * @pos: the &struct list_head to use as a loop cursor. 724 714 * @head: the head for your list.
+10
include/linux/rculist.h
··· 43 43 #define list_bidir_prev_rcu(list) (*((struct list_head __rcu **)(&(list)->prev))) 44 44 45 45 /** 46 + * list_for_each_rcu - Iterate over a list in an RCU-safe fashion 47 + * @pos: the &struct list_head to use as a loop cursor. 48 + * @head: the head for your list. 49 + */ 50 + #define list_for_each_rcu(pos, head) \ 51 + for (pos = rcu_dereference((head)->next); \ 52 + !list_is_head(pos, (head)); \ 53 + pos = rcu_dereference(pos->next)) 54 + 55 + /** 46 56 * list_tail_rcu - returns the prev pointer of the head of the list 47 57 * @head: the head of the list 48 58 *
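With list_for_each_rcu() now living in rculist.h alongside the other RCU list primitives, its shape is easy to see: an ordinary list walk whose next-pointer loads go through rcu_dereference(). A userspace approximation using C11 acquire loads (toy types, not the kernel's list_head):

```c
#include <stdatomic.h>

/* Toy circular list; acquire loads stand in for rcu_dereference(). */
struct toy_node {
	_Atomic(struct toy_node *) next;
	int val;
};

#define toy_for_each(pos, head)                                               \
	for (pos = atomic_load_explicit(&(head)->next, memory_order_acquire); \
	     pos != (head);                                                   \
	     pos = atomic_load_explicit(&pos->next, memory_order_acquire))

static int toy_sum(struct toy_node *head)
{
	struct toy_node *pos;
	int sum = 0;

	toy_for_each(pos, head)
		sum += pos->val;
	return sum;
}
```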
+34
include/linux/srcu.h
··· 275 275 { 276 276 struct srcu_ctr __percpu *retval; 277 277 278 + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast()."); 278 279 srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST); 279 280 retval = __srcu_read_lock_fast(ssp); 280 281 rcu_try_lock_acquire(&ssp->dep_map); 282 + return retval; 283 + } 284 + 285 + /* 286 + * Used by tracing, cannot be traced and cannot call lockdep. 287 + * See srcu_read_lock_fast() for more information. 288 + */ 289 + static inline struct srcu_ctr __percpu *srcu_read_lock_fast_notrace(struct srcu_struct *ssp) 290 + __acquires(ssp) 291 + { 292 + struct srcu_ctr __percpu *retval; 293 + 294 + srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST); 295 + retval = __srcu_read_lock_fast(ssp); 281 296 return retval; 282 297 } 283 298 ··· 310 295 static inline struct srcu_ctr __percpu *srcu_down_read_fast(struct srcu_struct *ssp) __acquires(ssp) 311 296 { 312 297 WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi()); 298 + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_down_read_fast()."); 313 299 srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST); 314 300 return __srcu_read_lock_fast(ssp); 315 301 } ··· 405 389 srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST); 406 390 srcu_lock_release(&ssp->dep_map); 407 391 __srcu_read_unlock_fast(ssp, scp); 392 + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast()."); 393 + } 394 + 395 + /* 396 + * Used by tracing, cannot be traced and cannot call lockdep. 397 + * See srcu_read_unlock_fast() for more information. 
398 + */ 399 + static inline void srcu_read_unlock_fast_notrace(struct srcu_struct *ssp, 400 + struct srcu_ctr __percpu *scp) __releases(ssp) 401 + { 402 + srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST); 403 + __srcu_read_unlock_fast(ssp, scp); 408 404 } 409 405 410 406 /** ··· 433 405 WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi()); 434 406 srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST); 435 407 __srcu_read_unlock_fast(ssp, scp); 408 + RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_up_read_fast()."); 436 409 } 437 410 438 411 /** ··· 513 484 DEFINE_LOCK_GUARD_1(srcu_fast, struct srcu_struct, 514 485 _T->scp = srcu_read_lock_fast(_T->lock), 515 486 srcu_read_unlock_fast(_T->lock, _T->scp), 487 + struct srcu_ctr __percpu *scp) 488 + 489 + DEFINE_LOCK_GUARD_1(srcu_fast_notrace, struct srcu_struct, 490 + _T->scp = srcu_read_lock_fast_notrace(_T->lock), 491 + srcu_read_unlock_fast_notrace(_T->lock, _T->scp), 516 492 struct srcu_ctr __percpu *scp) 517 493 518 494 #endif
+30 -19
include/linux/srcutree.h
··· 232 232 * srcu_read_unlock_fast(). 233 233 * 234 234 * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side 235 - * critical sections either because they disables interrupts, because they 236 - * are a single instruction, or because they are a read-modify-write atomic 237 - * operation, depending on the whims of the architecture. 235 + * critical sections either because they disables interrupts, because 236 + * they are a single instruction, or because they are read-modify-write 237 + * atomic operations, depending on the whims of the architecture. 238 + * This matters because the SRCU-fast grace-period mechanism uses either 239 + * synchronize_rcu() or synchronize_rcu_expedited(), that is, RCU, 240 + * *not* SRCU, in order to eliminate the need for the read-side smp_mb() 241 + * invocations that are used by srcu_read_lock() and srcu_read_unlock(). 242 + * The __srcu_read_unlock_fast() function also relies on this same RCU 243 + * (again, *not* SRCU) trick to eliminate the need for smp_mb(). 244 + * 245 + * The key point behind this RCU trick is that if any part of a given 246 + * RCU reader precedes the beginning of a given RCU grace period, then 247 + * the entirety of that RCU reader and everything preceding it happens 248 + * before the end of that same RCU grace period. Similarly, if any part 249 + * of a given RCU reader follows the end of a given RCU grace period, 250 + * then the entirety of that RCU reader and everything following it 251 + * happens after the beginning of that same RCU grace period. Therefore, 252 + * the operations labeled Y in __srcu_read_lock_fast() and those labeled Z 253 + * in __srcu_read_unlock_fast() are ordered against the corresponding SRCU 254 + * read-side critical section from the viewpoint of the SRCU grace period. 255 + * This is all the ordering that is required, hence no calls to smp_mb(). 
238 256 * 239 257 * This means that __srcu_read_lock_fast() is not all that fast 240 258 * on architectures that support NMIs but do not supply NMI-safe 241 259 * implementations of this_cpu_inc(). 242 260 */ 243 - static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp) 261 + static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct srcu_struct *ssp) 244 262 { 245 263 struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp); 246 264 247 - RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast()."); 248 265 if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) 249 - this_cpu_inc(scp->srcu_locks.counter); /* Y */ 266 + this_cpu_inc(scp->srcu_locks.counter); // Y, and implicit RCU reader. 250 267 else 251 - atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); /* Z */ 268 + atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); // Y, and implicit RCU reader. 252 269 barrier(); /* Avoid leaking the critical section. */ 253 270 return scp; 254 271 } ··· 276 259 * different CPU than that which was incremented by the corresponding 277 260 * srcu_read_lock_fast(), but it must be within the same task. 278 261 * 279 - * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side 280 - * critical sections either because they disables interrupts, because they 281 - * are a single instruction, or because they are a read-modify-write atomic 282 - * operation, depending on the whims of the architecture. 283 - * 284 - * This means that __srcu_read_unlock_fast() is not all that fast 285 - * on architectures that support NMIs but do not supply NMI-safe 286 - * implementations of this_cpu_inc(). 262 + * Please see the __srcu_read_lock_fast() function's header comment for 263 + * information on implicit RCU readers and NMI safety. 
287 264 */ 288 - static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp) 265 + static inline void notrace 266 + __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp) 289 267 { 290 268 barrier(); /* Avoid leaking the critical section. */ 291 269 if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) 292 - this_cpu_inc(scp->srcu_unlocks.counter); /* Z */ 270 + this_cpu_inc(scp->srcu_unlocks.counter); // Z, and implicit RCU reader. 293 271 else 294 - atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); /* Z */ 295 - RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast()."); 272 + atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); // Z, and implicit RCU reader. 296 273 } 297 274 298 275 void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor);
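The srcutree.h comment rewritten above explains why per-CPU counter increments can double as implicit RCU readers. Stripped of the per-CPU and memory-ordering machinery, the bookkeeping reduces to a lock/unlock counter pair per reader; a single-"CPU" toy version:

```c
#include <stdbool.h>

/* Toy SRCU-style counter pair: one "CPU", so two plain counters.
 * The kernel sums srcu_locks/srcu_unlocks across all CPUs; the final
 * comparison is the same idea. */
static unsigned long toy_nlocks, toy_nunlocks;

static void toy_read_lock(void)   { toy_nlocks++; }
static void toy_read_unlock(void) { toy_nunlocks++; }

/* The grace period may end only when every lock has a matching unlock,
 * i.e. no reader is still inside its critical section. */
static bool toy_readers_quiesced(void)
{
	return toy_nlocks == toy_nunlocks;
}
```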
+1
kernel/cgroup/dmem.c
··· 14 14 #include <linux/mutex.h> 15 15 #include <linux/page_counter.h> 16 16 #include <linux/parser.h> 17 + #include <linux/rculist.h> 17 18 #include <linux/slab.h> 18 19 19 20 struct dmem_cgroup_region {
+20 -7
kernel/rcu/rcutorture.c
@@ -1528,7 +1528,7 @@
 static int
 rcu_torture_writer(void *arg)
 {
-	bool boot_ended;
+	bool booting_still = false;
 	bool can_expedite = !rcu_gp_is_expedited() && !rcu_gp_is_normal();
 	unsigned long cookie;
 	struct rcu_gp_oldstate cookie_full;
@@ -1539,6 +1539,7 @@
 	struct rcu_gp_oldstate gp_snap1_full;
 	int i;
 	int idx;
+	unsigned long j;
 	int oldnice = task_nice(current);
 	struct rcu_gp_oldstate *rgo = NULL;
 	int rgo_size = 0;
@@ -1572,15 +1571,25 @@
 		return 0;
 	}
 	if (cur_ops->poll_active > 0) {
-		ulo = kzalloc(cur_ops->poll_active * sizeof(ulo[0]), GFP_KERNEL);
+		ulo = kcalloc(cur_ops->poll_active, sizeof(*ulo), GFP_KERNEL);
 		if (!WARN_ON(!ulo))
 			ulo_size = cur_ops->poll_active;
 	}
 	if (cur_ops->poll_active_full > 0) {
-		rgo = kzalloc(cur_ops->poll_active_full * sizeof(rgo[0]), GFP_KERNEL);
+		rgo = kcalloc(cur_ops->poll_active_full, sizeof(*rgo), GFP_KERNEL);
 		if (!WARN_ON(!rgo))
 			rgo_size = cur_ops->poll_active_full;
 	}
+
+	// If the system is still booting, let it finish.
+	j = jiffies;
+	while (!torture_must_stop() && !rcu_inkernel_boot_has_ended()) {
+		booting_still = true;
+		schedule_timeout_interruptible(HZ);
+	}
+	if (booting_still)
+		pr_alert("%s" TORTURE_FLAG " Waited %lu jiffies for boot to complete.\n",
+			 torture_type, jiffies - j);
 
 	do {
 		rcu_torture_writer_state = RTWS_FIXED_DELAY;
@@ -1780,13 +1769,11 @@
 				       !rcu_gp_is_normal();
 		}
 		rcu_torture_writer_state = RTWS_STUTTER;
-		boot_ended = rcu_inkernel_boot_has_ended();
 		stutter_waited = stutter_wait("rcu_torture_writer");
 		if (stutter_waited &&
 		    !atomic_read(&rcu_fwd_cb_nodelay) &&
 		    !cur_ops->slow_gps &&
 		    !torture_must_stop() &&
-		    boot_ended &&
 		    time_after(jiffies, stallsdone))
 			for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
 				if (list_empty(&rcu_tortures[i].rtort_free) &&
@@ -2446,7 +2437,8 @@
 			torture_hrtimeout_us(500, 1000, &rand);
 			lastsleep = jiffies + 10;
 		}
-		while (torture_num_online_cpus() < mynumonline && !torture_must_stop())
+		while (!torture_must_stop() &&
+		       (torture_num_online_cpus() < mynumonline || !rcu_inkernel_boot_has_ended()))
 			schedule_timeout_interruptible(HZ / 5);
 		stutter_wait("rcu_torture_reader");
 	} while (!torture_must_stop());
@@ -2766,7 +2756,8 @@
 		cur_ops->stats();
 	if (rtcv_snap == rcu_torture_current_version &&
 	    rcu_access_pointer(rcu_torture_current) &&
-	    !rcu_stall_is_suppressed()) {
+	    !rcu_stall_is_suppressed() &&
+	    rcu_inkernel_boot_has_ended()) {
 		int __maybe_unused flags = 0;
 		unsigned long __maybe_unused gp_seq = 0;
 
@@ -3457,6 +3446,8 @@
 	int tested_tries = 0;
 
 	VERBOSE_TOROUT_STRING("rcu_torture_fwd_progress task started");
+	while (!rcu_inkernel_boot_has_ended())
+		schedule_timeout_interruptible(HZ / 10);
 	rcu_bind_current_to_nocb();
 	if (!IS_ENABLED(CONFIG_SMP) || !IS_ENABLED(CONFIG_RCU_BOOST))
 		set_user_nice(current, MAX_NICE);
+2 -2
kernel/rcu/refscale.c
@@ -1021,7 +1021,7 @@
 	set_user_nice(current, MAX_NICE);
 
 	VERBOSE_SCALEOUT("main_func task started");
-	result_avg = kzalloc(nruns * sizeof(*result_avg), GFP_KERNEL);
+	result_avg = kcalloc(nruns, sizeof(*result_avg), GFP_KERNEL);
 	buf = kzalloc(800 + 64, GFP_KERNEL);
 	if (!result_avg || !buf) {
 		SCALEOUT_ERRSTRING("out of memory");
@@ -1133,9 +1133,9 @@
 				      reader_tasks[i].task);
 	}
 	kfree(reader_tasks);
+	reader_tasks = NULL;
 
 	torture_stop_kthread("main_task", main_task);
-	kfree(main_task);
 
 	// Do scale-type-specific cleanup operations.
 	if (cur_ops->cleanup != NULL)
+1 -3
kernel/rcu/srcutiny.c
@@ -176,10 +176,9 @@
 {
 	unsigned long cookie;
 
-	preempt_disable();  // Needed for PREEMPT_LAZY
+	lockdep_assert_preemption_disabled(); // Needed for PREEMPT_LAZY
 	cookie = get_state_synchronize_srcu(ssp);
 	if (ULONG_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie)) {
-		preempt_enable();
 		return;
 	}
 	WRITE_ONCE(ssp->srcu_idx_max, cookie);
@@ -188,7 +189,6 @@
 		else if (list_empty(&ssp->srcu_work.entry))
 			list_add(&ssp->srcu_work.entry, &srcu_boot_list);
 	}
-	preempt_enable();
 }
 
 /*
+10
kernel/rcu/srcutree.c
@@ -1168,6 +1168,16 @@
 	 * counter update.  Note that both this memory barrier and the
 	 * one in srcu_readers_active_idx_check() provide the guarantee
 	 * for __srcu_read_lock().
+	 *
+	 * Note that this is a performance optimization, in which we spend
+	 * an otherwise unnecessary smp_mb() in order to reduce the number
+	 * of full per-CPU-variable scans in srcu_readers_lock_idx() and
+	 * srcu_readers_unlock_idx().  But this performance optimization
+	 * is not so optimal for SRCU-fast, where we would be spending
+	 * not smp_mb(), but rather synchronize_rcu().  At the same time,
+	 * the overhead of the smp_mb() is in the noise, so there is no
+	 * point in omitting it in the SRCU-fast case.  So the same code
+	 * is executed either way.
 	 */
 	smp_mb(); /* D */ /* Pairs with C. */
 }
+2 -2
kernel/rcu/tasks.h
@@ -553,13 +553,13 @@
 		rtpcp_next = rtp->rtpcp_array[index];
 		if (rtpcp_next->cpu < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
 			cpuwq = rcu_cpu_beenfullyonline(rtpcp_next->cpu) ? rtpcp_next->cpu : WORK_CPU_UNBOUND;
-			queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
+			queue_work_on(cpuwq, system_percpu_wq, &rtpcp_next->rtp_work);
 			index++;
 			if (index < num_possible_cpus()) {
 				rtpcp_next = rtp->rtpcp_array[index];
 				if (rtpcp_next->cpu < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
 					cpuwq = rcu_cpu_beenfullyonline(rtpcp_next->cpu) ? rtpcp_next->cpu : WORK_CPU_UNBOUND;
-					queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
+					queue_work_on(cpuwq, system_percpu_wq, &rtpcp_next->rtp_work);
 				}
 			}
 		}
+7 -2
kernel/rcu/tree.c
@@ -3800,6 +3800,11 @@
 * to complete.  For example, if there are no RCU callbacks queued anywhere
 * in the system, then rcu_barrier() is within its rights to return
 * immediately, without waiting for anything, much less an RCU grace period.
+* In fact, rcu_barrier() will normally not result in any RCU grace periods
+* beyond those that were already destined to be executed.
+*
+* In kernels built with CONFIG_RCU_LAZY=y, this function also hurries all
+* pending lazy RCU callbacks.
 */
 void rcu_barrier(void)
 {
@@ -4890,10 +4885,10 @@
 	rcutree_online_cpu(cpu);
 
 	/* Create workqueue for Tree SRCU and for expedited GPs. */
-	rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
+	rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM | WQ_PERCPU, 0);
 	WARN_ON(!rcu_gp_wq);
 
-	sync_wq = alloc_workqueue("sync_wq", WQ_MEM_RECLAIM, 0);
+	sync_wq = alloc_workqueue("sync_wq", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
 	WARN_ON(!sync_wq);
 
 	/* Respect if explicitly disabled via a boot parameter. */
+1 -4
kernel/rcu/tree_plugin.h
@@ -626,11 +626,10 @@
 */
 static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
 {
-	unsigned long flags;
 	struct rcu_data *rdp;
 
+	lockdep_assert_irqs_disabled();
 	rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
-	local_irq_save(flags);
 
 	/*
 	 * If the IRQ work handler happens to run in the middle of RCU read-side
@@ -646,8 +647,6 @@
 	 */
 	if (rcu_preempt_depth() > 0)
 		WRITE_ONCE(rdp->defer_qs_iw_pending, DEFER_QS_IDLE);
-
-	local_irq_restore(flags);
 }
 
 /*
+5 -2
kernel/torture.c
@@ -359,6 +359,8 @@
 		torture_hrtimeout_jiffies(onoff_holdoff, &rand);
 		VERBOSE_TOROUT_STRING("torture_onoff end holdoff");
 	}
+	while (!rcu_inkernel_boot_has_ended())
+		schedule_timeout_interruptible(HZ / 10);
 	while (!torture_must_stop()) {
 		if (disable_onoff_at_boot && !rcu_inkernel_boot_has_ended()) {
 			torture_hrtimeout_jiffies(HZ / 10, &rand);
@@ -799,8 +797,9 @@
 static void
 torture_print_module_parms(void)
 {
-	pr_alert("torture module --- %s: disable_onoff_at_boot=%d ftrace_dump_at_shutdown=%d verbose_sleep_frequency=%d verbose_sleep_duration=%d random_shuffle=%d\n",
-		 torture_type, disable_onoff_at_boot, ftrace_dump_at_shutdown, verbose_sleep_frequency, verbose_sleep_duration, random_shuffle);
+	pr_alert("torture module --- %s: disable_onoff_at_boot=%d ftrace_dump_at_shutdown=%d verbose_sleep_frequency=%d verbose_sleep_duration=%d random_shuffle=%d%s\n",
+		 torture_type, disable_onoff_at_boot, ftrace_dump_at_shutdown, verbose_sleep_frequency, verbose_sleep_duration, random_shuffle,
+		 rcu_inkernel_boot_has_ended() ? "" : " still booting");
 }
 
 /*
+24 -3
tools/testing/selftests/rcutorture/bin/jitter.sh
@@ -39,6 +39,22 @@
 	fi
 done
 
+# Uses global variables startsecs, startns, endsecs, endns, and limit.
+# Exit code is success for time not yet elapsed and failure otherwise.
+function timecheck {
+	local done=`awk -v limit=$limit \
+		-v startsecs=$startsecs \
+		-v startns=$startns \
+		-v endsecs=$endsecs \
+		-v endns=$endns < /dev/null '
+	BEGIN {
+		delta = (endsecs - startsecs) * 1000 * 1000;
+		delta += int((endns - startns) / 1000);
+		print delta >= limit;
+	}'`
+	return $done
+}
+
 while :
 do
 	# Check for done.
@@ -101,15 +85,20 @@
 	n=$(($n+1))
 	sleep .$sleeptime
 
-	# Spin a random duration
+	# Spin a random duration, but with rather coarse granularity.
 	limit=`awk -v me=$me -v n=$n -v spinmax=$spinmax 'BEGIN {
 		srand(n + me + systime());
 		printf("%06d", int(rand() * spinmax));
 	}' < /dev/null`
 	n=$(($n+1))
-	for i in {1..$limit}
+	startsecs=`date +%s`
+	startns=`date +%N`
+	endsecs=$startsecs
+	endns=$startns
+	while timecheck
 	do
-		echo > /dev/null
+		endsecs=`date +%s`
+		endns=`date +%N`
 	done
 done
+1
tools/testing/selftests/rcutorture/bin/torture.sh
@@ -94,6 +94,7 @@
 	echo "       --do-kvfree / --do-no-kvfree / --no-kvfree"
 	echo "       --do-locktorture / --do-no-locktorture / --no-locktorture"
 	echo "       --do-none"
+	echo "       --do-normal / --do-no-normal / --no-normal"
 	echo "       --do-rcuscale / --do-no-rcuscale / --no-rcuscale"
 	echo "       --do-rcutasksflavors / --do-no-rcutasksflavors / --no-rcutasksflavors"
 	echo "       --do-rcutorture / --do-no-rcutorture / --no-rcutorture"