Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Merge branches 'rcu/torture', 'rcu/fixes', 'rcu/docs', 'rcu/refscale', 'rcu/tasks' and 'rcu/stall' into rcu/next

rcu/torture: RCU torture, locktorture and generic torture infrastructure
rcu/fixes: Generic and misc fixes
rcu/docs: RCU documentation updates
rcu/refscale: RCU reference scalability test updates
rcu/tasks: RCU tasks updates
rcu/stall: Stall detection updates

+488 -229
+1 -1
Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
··· 181 181 of this wait (or series of waits, as the case may be) is to permit a 182 182 concurrent CPU-hotplug operation to complete. 183 183 #. In the case of RCU-sched, one of the last acts of an outgoing CPU is 184 - to invoke ``rcu_report_dead()``, which reports a quiescent state for 184 + to invoke ``rcutree_report_cpu_dead()``, which reports a quiescent state for 185 185 that CPU. However, this is likely paranoia-induced redundancy. 186 186 187 187 +-----------------------------------------------------------------------+
-9
Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
··· 566 566 style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text> 567 567 <text 568 568 xml:space="preserve" 569 - x="8335.4873" 570 - y="5357.1006" 571 - font-style="normal" 572 - font-weight="bold" 573 - font-size="192" 574 - id="text202-7-9-6-0" 575 - style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text> 576 - <text 577 - xml:space="preserve" 578 569 x="8768.4678" 579 570 y="6224.9038" 580 571 font-style="normal"
+2 -2
Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
··· 1135 1135 font-weight="bold" 1136 1136 font-size="192" 1137 1137 id="text202-7-5-3-27-6-5" 1138 - style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text> 1138 + style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text> 1139 1139 <text 1140 1140 xml:space="preserve" 1141 1141 x="3745.7725" ··· 1256 1256 font-style="normal" 1257 1257 y="3679.27" 1258 1258 x="-3804.9949" 1259 - xml:space="preserve">rcu_cpu_starting()</text> 1259 + xml:space="preserve">rcutree_report_cpu_starting()</text> 1260 1260 <g 1261 1261 style="fill:none;stroke-width:0.025in" 1262 1262 id="g3107-7-5-0"
+2 -11
Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
··· 1448 1448 style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text> 1449 1449 <text 1450 1450 xml:space="preserve" 1451 - x="8335.4873" 1452 - y="5357.1006" 1453 - font-style="normal" 1454 - font-weight="bold" 1455 - font-size="192" 1456 - id="text202-7-9-6-0" 1457 - style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text> 1458 - <text 1459 - xml:space="preserve" 1460 1451 x="8768.4678" 1461 1452 y="6224.9038" 1462 1453 font-style="normal" ··· 3265 3274 font-weight="bold" 3266 3275 font-size="192" 3267 3276 id="text202-7-5-3-27-6-5" 3268 - style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text> 3277 + style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text> 3269 3278 <text 3270 3279 xml:space="preserve" 3271 3280 x="3745.7725" ··· 3386 3395 font-style="normal" 3387 3396 y="3679.27" 3388 3397 x="-3804.9949" 3389 - xml:space="preserve">rcu_cpu_starting()</text> 3398 + xml:space="preserve">rcutree_report_cpu_starting()</text> 3390 3399 <g 3391 3400 style="fill:none;stroke-width:0.025in" 3392 3401 id="g3107-7-5-0"
+2 -2
Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
··· 607 607 font-weight="bold" 608 608 font-size="192" 609 609 id="text202-7-5-3-27-6" 610 - style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text> 610 + style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text> 611 611 <text 612 612 xml:space="preserve" 613 613 x="3745.7725" ··· 728 728 font-style="normal" 729 729 y="3679.27" 730 730 x="-3804.9949" 731 - xml:space="preserve">rcu_cpu_starting()</text> 731 + xml:space="preserve">rcutree_report_cpu_starting()</text> 732 732 <g 733 733 style="fill:none;stroke-width:0.025in" 734 734 id="g3107-7-5-0"
+2 -2
Documentation/RCU/Design/Requirements/Requirements.rst
··· 1955 1955 1956 1956 An offline CPU's quiescent state will be reported either: 1957 1957 1958 - 1. As the CPU goes offline using RCU's hotplug notifier (rcu_report_dead()). 1958 + 1. As the CPU goes offline using RCU's hotplug notifier (rcutree_report_cpu_dead()). 1959 1959 2. When grace period initialization (rcu_gp_init()) detects a 1960 1960 race either with CPU offlining or with a task unblocking on a leaf 1961 1961 ``rcu_node`` structure whose CPUs are all offline. 1962 1962 1963 - The CPU-online path (rcu_cpu_starting()) should never need to report 1963 + The CPU-online path (rcutree_report_cpu_starting()) should never need to report 1964 1964 a quiescent state for an offline CPU. However, as a debugging measure, 1965 1965 it does emit a warning if a quiescent state was not already reported 1966 1966 for that CPU.
+9
Documentation/RCU/listRCU.rst
··· 8 8 that all of the required memory ordering is provided by the list macros. 9 9 This document describes several list-based RCU use cases. 10 10 11 + When iterating a list while holding the rcu_read_lock(), writers may 12 + modify the list. The reader is guaranteed to see all of the elements 13 + which were added to the list before they acquired the rcu_read_lock() 14 + and are still on the list when they drop the rcu_read_unlock(). 15 + Elements which are added to, or removed from the list may or may not 16 + be seen. If the writer calls list_replace_rcu(), the reader may see 17 + either the old element or the new element; they will not see both, 18 + nor will they see neither. 19 + 11 20 12 21 Example 1: Read-mostly list: Deferred Destruction 13 22 -------------------------------------------------
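The reader-side guarantee that the new listRCU.rst paragraph describes is easiest to see in the usual lookup pattern. A minimal sketch (hypothetical struct foo and foo_list, not taken from this series) of a reader that may run concurrently with list_add_rcu()/list_del_rcu() updaters:

    #include <linux/rculist.h>
    #include <linux/rcupdate.h>

    struct foo {
            int key;
            struct list_head list;
    };

    static LIST_HEAD(foo_list);     /* updaters serialize among themselves with their own lock */

    static bool foo_key_present(int key)
    {
            struct foo *p;
            bool found = false;

            rcu_read_lock();
            list_for_each_entry_rcu(p, &foo_list, list) {
                    if (p->key == key) {
                            found = true;
                            break;
                    }
            }
            rcu_read_unlock();
            return found;
    }

Elements added or removed while this reader is between rcu_read_lock() and rcu_read_unlock() may or may not be seen, exactly as the paragraph states, but every element that is on the list for the whole critical section is guaranteed to be visited.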
+2 -2
Documentation/RCU/whatisRCU.rst
··· 59 59 with example uses should focus on Sections 3 and 4. People who need to 60 60 understand the RCU implementation should focus on Section 5, then dive 61 61 into the kernel source code. People who reason best by analogy should 62 - focus on Section 6. Section 7 serves as an index to the docbook API 63 - documentation, and Section 8 is the traditional answer key. 62 + focus on Section 6 and 7. Section 8 serves as an index to the docbook 63 + API documentation, and Section 9 is the traditional answer key. 64 64 65 65 So, start with the section that makes the most sense to you and your 66 66 preferred method of learning. If you need to know everything about
+13
Documentation/admin-guide/kernel-parameters.txt
··· 4820 4820 Set maximum number of finished RCU callbacks to 4821 4821 process in one batch. 4822 4822 4823 + rcutree.do_rcu_barrier= [KNL] 4824 + Request a call to rcu_barrier(). This is 4825 + throttled so that userspace tests can safely 4826 + hammer on the sysfs variable if they so choose. 4827 + If triggered before the RCU grace-period machinery 4828 + is fully active, this will error out with EAGAIN. 4829 + 4823 4830 rcutree.dump_tree= [KNL] 4824 4831 Dump the structure of the rcu_node combining tree 4825 4832 out at early boot. This is used for diagnostic ··· 5479 5472 this parameter is to delay the start of the 5480 5473 test until boot completes in order to avoid 5481 5474 interference. 5475 + 5476 + refscale.lookup_instances= [KNL] 5477 + Number of data elements to use for the forms of 5478 + SLAB_TYPESAFE_BY_RCU testing. A negative number 5479 + is negated and multiplied by nr_cpu_ids, while 5480 + zero specifies nr_cpu_ids. 5482 5481 5483 5482 refscale.loops= [KNL] 5484 5483 Set the number of loops over the synchronization
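Since rcutree.do_rcu_barrier is writable at run time (the implementation in kernel/rcu/tree.c below registers it with mode 0644), a userspace test can trigger it through the usual module-parameter file. A minimal sketch (userspace C; the /sys/module/rcutree/parameters/ path is the standard location for these parameters and is an assumption here, error handling trimmed):

    #include <stdio.h>

    /* Ask the kernel for a throttled rcu_barrier(); the write returns once
     * the barrier (or a concurrent one satisfying the request) has completed. */
    int request_rcu_barrier(void)
    {
            FILE *fp = fopen("/sys/module/rcutree/parameters/do_rcu_barrier", "w");

            if (!fp)
                    return -1;
            fprintf(fp, "1\n");
            return fclose(fp);      /* flush; fails with EAGAIN early in boot */
    }

Reading the same file back reports the number of requests still outstanding, per param_get_do_rcu_barrier() in the tree.c hunk.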
+2 -2
arch/arm64/kernel/smp.c
··· 215 215 if (system_uses_irq_prio_masking()) 216 216 init_gic_priority_masking(); 217 217 218 - rcu_cpu_starting(cpu); 218 + rcutree_report_cpu_starting(cpu); 219 219 trace_hardirqs_off(); 220 220 221 221 /* ··· 401 401 402 402 /* Mark this CPU absent */ 403 403 set_cpu_present(cpu, 0); 404 - rcu_report_dead(cpu); 404 + rcutree_report_cpu_dead(); 405 405 406 406 if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) { 407 407 update_cpu_boot_status(CPU_KILL_ME);
+1 -1
arch/powerpc/kernel/smp.c
··· 1629 1629 1630 1630 smp_store_cpu_info(cpu); 1631 1631 set_dec(tb_ticks_per_jiffy); 1632 - rcu_cpu_starting(cpu); 1632 + rcutree_report_cpu_starting(cpu); 1633 1633 cpu_callin_map[cpu] = 1; 1634 1634 1635 1635 if (smp_ops->setup_cpu)
+1 -1
arch/s390/kernel/smp.c
··· 898 898 S390_lowcore.restart_flags = 0; 899 899 restore_access_regs(S390_lowcore.access_regs_save_area); 900 900 cpu_init(); 901 - rcu_cpu_starting(cpu); 901 + rcutree_report_cpu_starting(cpu); 902 902 init_cpu_timer(); 903 903 vtime_init(); 904 904 vdso_getcpu_init();
+1 -1
arch/x86/kernel/smpboot.c
··· 288 288 289 289 cpu_init(); 290 290 fpu__init_cpu(); 291 - rcu_cpu_starting(raw_smp_processor_id()); 291 + rcutree_report_cpu_starting(raw_smp_processor_id()); 292 292 x86_cpuinit.early_percpu_clock_init(); 293 293 294 294 ap_starting();
+1 -1
include/linux/interrupt.h
··· 566 566 * 567 567 * _ RCU: 568 568 * 1) rcutree_migrate_callbacks() migrates the queue. 569 - * 2) rcu_report_dead() reports the final quiescent states. 569 + * 2) rcutree_report_cpu_dead() reports the final quiescent states. 570 570 * 571 571 * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue 572 572 */
+32
include/linux/rcu_notifier.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0+ */ 2 + /* 3 + * Read-Copy Update notifiers, initially RCU CPU stall notifier. 4 + * Separate from rcupdate.h to avoid #include loops. 5 + * 6 + * Copyright (C) 2023 Paul E. McKenney. 7 + */ 8 + 9 + #ifndef __LINUX_RCU_NOTIFIER_H 10 + #define __LINUX_RCU_NOTIFIER_H 11 + 12 + // Actions for RCU CPU stall notifier calls. 13 + #define RCU_STALL_NOTIFY_NORM 1 14 + #define RCU_STALL_NOTIFY_EXP 2 15 + 16 + #ifdef CONFIG_RCU_STALL_COMMON 17 + 18 + #include <linux/notifier.h> 19 + #include <linux/types.h> 20 + 21 + int rcu_stall_chain_notifier_register(struct notifier_block *n); 22 + int rcu_stall_chain_notifier_unregister(struct notifier_block *n); 23 + 24 + #else // #ifdef CONFIG_RCU_STALL_COMMON 25 + 26 + // No RCU CPU stall warnings in Tiny RCU. 27 + static inline int rcu_stall_chain_notifier_register(struct notifier_block *n) { return -EEXIST; } 28 + static inline int rcu_stall_chain_notifier_unregister(struct notifier_block *n) { return -ENOENT; } 29 + 30 + #endif // #else // #ifdef CONFIG_RCU_STALL_COMMON 31 + 32 + #endif /* __LINUX_RCU_NOTIFIER_H */
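The new header gives out-of-tree and test code a two-call API. A minimal sketch of a module hooking the stall-notifier chain (hypothetical names; the rcutorture.c hunk below does essentially the same thing) might be:

    #include <linux/module.h>
    #include <linux/notifier.h>
    #include <linux/printk.h>
    #include <linux/rcu_notifier.h>

    static int my_stall_nf(struct notifier_block *nb, unsigned long action, void *data)
    {
            /* action: RCU_STALL_NOTIFY_NORM or RCU_STALL_NOTIFY_EXP;
             * data: stall duration in jiffies, passed as a void pointer. */
            pr_warn("RCU stall observed: action=%lu duration=%lu jiffies\n",
                    action, (unsigned long)data);
            return NOTIFY_OK;
    }

    static struct notifier_block my_stall_nb = {
            .notifier_call = my_stall_nf,
    };

    static int __init my_stall_watch_init(void)
    {
            /* Returns 0, or -EEXIST on kernels without CONFIG_RCU_STALL_COMMON. */
            return rcu_stall_chain_notifier_register(&my_stall_nb);
    }
    module_init(my_stall_watch_init);

    static void __exit my_stall_watch_exit(void)
    {
            rcu_stall_chain_notifier_unregister(&my_stall_nb);
    }
    module_exit(my_stall_watch_exit);

    MODULE_LICENSE("GPL");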
-2
include/linux/rcupdate.h
··· 122 122 void rcu_init(void); 123 123 extern int rcu_scheduler_active; 124 124 void rcu_sched_clock_irq(int user); 125 - void rcu_report_dead(unsigned int cpu); 126 - void rcutree_migrate_callbacks(int cpu); 127 125 128 126 #ifdef CONFIG_TASKS_RCU_GENERIC 129 127 void rcu_init_tasks_generic(void);
+1 -1
include/linux/rcutiny.h
··· 171 171 #define rcutree_offline_cpu NULL 172 172 #define rcutree_dead_cpu NULL 173 173 #define rcutree_dying_cpu NULL 174 - static inline void rcu_cpu_starting(unsigned int cpu) { } 174 + static inline void rcutree_report_cpu_starting(unsigned int cpu) { } 175 175 176 176 #endif /* __LINUX_RCUTINY_H */
+14 -3
include/linux/rcutree.h
··· 37 37 void kvfree_call_rcu(struct rcu_head *head, void *ptr); 38 38 39 39 void rcu_barrier(void); 40 - bool rcu_eqs_special_set(int cpu); 41 40 void rcu_momentary_dyntick_idle(void); 42 41 void kfree_rcu_scheduler_running(void); 43 42 bool rcu_gp_might_be_stalled(void); ··· 110 111 /* RCUtree hotplug events */ 111 112 int rcutree_prepare_cpu(unsigned int cpu); 112 113 int rcutree_online_cpu(unsigned int cpu); 113 - int rcutree_offline_cpu(unsigned int cpu); 114 + void rcutree_report_cpu_starting(unsigned int cpu); 115 + 116 + #ifdef CONFIG_HOTPLUG_CPU 114 117 int rcutree_dead_cpu(unsigned int cpu); 115 118 int rcutree_dying_cpu(unsigned int cpu); 116 - void rcu_cpu_starting(unsigned int cpu); 119 + int rcutree_offline_cpu(unsigned int cpu); 120 + #else 121 + #define rcutree_dead_cpu NULL 122 + #define rcutree_dying_cpu NULL 123 + #define rcutree_offline_cpu NULL 124 + #endif 125 + 126 + void rcutree_migrate_callbacks(int cpu); 127 + 128 + /* Called from hotplug and also arm64 early secondary boot failure */ 129 + void rcutree_report_cpu_dead(void); 117 130 118 131 #endif /* __LINUX_RCUTREE_H */
+3 -2
include/linux/slab.h
··· 245 245 size_t ksize(const void *objp); 246 246 247 247 #ifdef CONFIG_PRINTK 248 - bool kmem_valid_obj(void *object); 249 - void kmem_dump_obj(void *object); 248 + bool kmem_dump_obj(void *object); 249 + #else 250 + static inline bool kmem_dump_obj(void *object) { return false; } 250 251 #endif 251 252 252 253 /*
+10 -3
kernel/cpu.c
··· 1372 1372 cpuhp_bp_sync_dead(cpu); 1373 1373 1374 1374 tick_cleanup_dead_cpu(cpu); 1375 + 1376 + /* 1377 + * Callbacks must be re-integrated right away to the RCU state machine. 1378 + * Otherwise an RCU callback could block a further teardown function 1379 + * waiting for its completion. 1380 + */ 1375 1381 rcutree_migrate_callbacks(cpu); 1382 + 1376 1383 return 0; 1377 1384 } 1378 1385 ··· 1395 1388 struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state); 1396 1389 1397 1390 BUG_ON(st->state != CPUHP_AP_OFFLINE); 1398 - rcu_report_dead(smp_processor_id()); 1391 + rcutree_report_cpu_dead(); 1399 1392 st->state = CPUHP_AP_IDLE_DEAD; 1400 1393 /* 1401 - * We cannot call complete after rcu_report_dead() so we delegate it 1394 + * We cannot call complete after rcutree_report_cpu_dead() so we delegate it 1402 1395 * to an online cpu. 1403 1396 */ 1404 1397 smp_call_function_single(cpumask_first(cpu_online_mask), ··· 1624 1617 struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); 1625 1618 enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE); 1626 1619 1627 - rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ 1620 + rcutree_report_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ 1628 1621 cpumask_set_cpu(cpu, &cpus_booted_once_mask); 1629 1622 1630 1623 /*
+13
kernel/rcu/rcu.h
··· 10 10 #ifndef __LINUX_RCU_H 11 11 #define __LINUX_RCU_H 12 12 13 + #include <linux/slab.h> 13 14 #include <trace/events/rcu.h> 14 15 15 16 /* ··· 248 247 { 249 248 } 250 249 #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */ 250 + 251 + static inline void debug_rcu_head_callback(struct rcu_head *rhp) 252 + { 253 + if (unlikely(!rhp->func)) 254 + kmem_dump_obj(rhp); 255 + } 251 256 252 257 extern int rcu_cpu_stall_suppress_at_boot; 253 258 ··· 656 649 #else 657 650 bool rcu_cpu_beenfullyonline(int cpu); 658 651 #endif 652 + 653 + #ifdef CONFIG_RCU_STALL_COMMON 654 + int rcu_stall_notifier_call_chain(unsigned long val, void *v); 655 + #else // #ifdef CONFIG_RCU_STALL_COMMON 656 + static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; } 657 + #endif // #else // #ifdef CONFIG_RCU_STALL_COMMON 659 658 660 659 #endif /* __LINUX_RCU_H */
+2 -2
kernel/rcu/rcu_segcblist.c
··· 368 368 smp_mb(); /* Ensure counts are updated before callback is entrained. */ 369 369 rhp->next = NULL; 370 370 for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--) 371 - if (rsclp->tails[i] != rsclp->tails[i - 1]) 371 + if (!rcu_segcblist_segempty(rsclp, i)) 372 372 break; 373 373 rcu_segcblist_inc_seglen(rsclp, i); 374 374 WRITE_ONCE(*rsclp->tails[i], rhp); ··· 551 551 * as their ->gp_seq[] grace-period completion sequence number. 552 552 */ 553 553 for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--) 554 - if (rsclp->tails[i] != rsclp->tails[i - 1] && 554 + if (!rcu_segcblist_segempty(rsclp, i) && 555 555 ULONG_CMP_LT(rsclp->gp_seq[i], seq)) 556 556 break; 557 557
+21
kernel/rcu/rcutorture.c
··· 21 21 #include <linux/spinlock.h> 22 22 #include <linux/smp.h> 23 23 #include <linux/rcupdate_wait.h> 24 + #include <linux/rcu_notifier.h> 24 25 #include <linux/interrupt.h> 25 26 #include <linux/sched/signal.h> 26 27 #include <uapi/linux/sched/types.h> ··· 2429 2428 return 0; 2430 2429 } 2431 2430 2431 + static int rcu_torture_stall_nf(struct notifier_block *nb, unsigned long v, void *ptr) 2432 + { 2433 + pr_info("%s: v=%lu, duration=%lu.\n", __func__, v, (unsigned long)ptr); 2434 + return NOTIFY_OK; 2435 + } 2436 + 2437 + static struct notifier_block rcu_torture_stall_block = { 2438 + .notifier_call = rcu_torture_stall_nf, 2439 + }; 2440 + 2432 2441 /* 2433 2442 * CPU-stall kthread. It waits as specified by stall_cpu_holdoff, then 2434 2443 * induces a CPU stall for the time specified by stall_cpu. ··· 2446 2435 static int rcu_torture_stall(void *args) 2447 2436 { 2448 2437 int idx; 2438 + int ret; 2449 2439 unsigned long stop_at; 2450 2440 2451 2441 VERBOSE_TOROUT_STRING("rcu_torture_stall task started"); 2442 + ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block); 2443 + if (ret) 2444 + pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n", 2445 + __func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : ""); 2452 2446 if (stall_cpu_holdoff > 0) { 2453 2447 VERBOSE_TOROUT_STRING("rcu_torture_stall begin holdoff"); 2454 2448 schedule_timeout_interruptible(stall_cpu_holdoff * HZ); ··· 2497 2481 cur_ops->readunlock(idx); 2498 2482 } 2499 2483 pr_alert("%s end.\n", __func__); 2484 + if (!ret) { 2485 + ret = rcu_stall_chain_notifier_unregister(&rcu_torture_stall_block); 2486 + if (ret) 2487 + pr_info("%s: rcu_stall_chain_notifier_unregister() returned %d.\n", __func__, ret); 2488 + } 2500 2489 torture_shutdown_absorb("rcu_torture_stall"); 2501 2490 while (!kthread_should_stop()) 2502 2491 schedule_timeout_interruptible(10 * HZ);
+3 -3
kernel/rcu/refscale.c
··· 655 655 goto retry; 656 656 } 657 657 un_delay(udl, ndl); 658 + b = READ_ONCE(rtsp->a); 658 659 // Remember, seqlock read-side release can fail. 659 660 if (!rts_release(rtsp, start)) { 660 661 rcu_read_unlock(); 661 662 goto retry; 662 663 } 663 - b = READ_ONCE(rtsp->a); 664 664 WARN_ONCE(a != b, "Re-read of ->a changed from %u to %u.\n", a, b); 665 665 b = rtsp->b; 666 666 rcu_read_unlock(); ··· 1025 1025 ref_scale_print_module_parms(struct ref_scale_ops *cur_ops, const char *tag) 1026 1026 { 1027 1027 pr_alert("%s" SCALE_FLAG 1028 - "--- %s: verbose=%d shutdown=%d holdoff=%d loops=%ld nreaders=%d nruns=%d readdelay=%d\n", scale_type, tag, 1029 - verbose, shutdown, holdoff, loops, nreaders, nruns, readdelay); 1028 + "--- %s: verbose=%d verbose_batched=%d shutdown=%d holdoff=%d lookup_instances=%ld loops=%ld nreaders=%d nruns=%d readdelay=%d\n", scale_type, tag, 1029 + verbose, verbose_batched, shutdown, holdoff, lookup_instances, loops, nreaders, nruns, readdelay); 1030 1030 } 1031 1031 1032 1032 static void
+1
kernel/rcu/srcutiny.c
··· 138 138 while (lh) { 139 139 rhp = lh; 140 140 lh = lh->next; 141 + debug_rcu_head_callback(rhp); 141 142 local_bh_disable(); 142 143 rhp->func(rhp); 143 144 local_bh_enable();
+51 -23
kernel/rcu/srcutree.c
··· 223 223 snp->grplo = cpu; 224 224 snp->grphi = cpu; 225 225 } 226 - sdp->grpmask = 1 << (cpu - sdp->mynode->grplo); 226 + sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo); 227 227 } 228 228 smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER); 229 229 return true; ··· 255 255 ssp->srcu_sup->sda_is_static = is_static; 256 256 if (!is_static) 257 257 ssp->sda = alloc_percpu(struct srcu_data); 258 - if (!ssp->sda) { 259 - if (!is_static) 260 - kfree(ssp->srcu_sup); 261 - return -ENOMEM; 262 - } 258 + if (!ssp->sda) 259 + goto err_free_sup; 263 260 init_srcu_struct_data(ssp); 264 261 ssp->srcu_sup->srcu_gp_seq_needed_exp = 0; 265 262 ssp->srcu_sup->srcu_last_gp_end = ktime_get_mono_fast_ns(); 266 263 if (READ_ONCE(ssp->srcu_sup->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) { 267 - if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC)) { 268 - if (!ssp->srcu_sup->sda_is_static) { 269 - free_percpu(ssp->sda); 270 - ssp->sda = NULL; 271 - kfree(ssp->srcu_sup); 272 - return -ENOMEM; 273 - } 274 - } else { 275 - WRITE_ONCE(ssp->srcu_sup->srcu_size_state, SRCU_SIZE_BIG); 276 - } 264 + if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC)) 265 + goto err_free_sda; 266 + WRITE_ONCE(ssp->srcu_sup->srcu_size_state, SRCU_SIZE_BIG); 277 267 } 278 268 ssp->srcu_sup->srcu_ssp = ssp; 279 269 smp_store_release(&ssp->srcu_sup->srcu_gp_seq_needed, 0); /* Init done. */ 280 270 return 0; 271 + 272 + err_free_sda: 273 + if (!is_static) { 274 + free_percpu(ssp->sda); 275 + ssp->sda = NULL; 276 + } 277 + err_free_sup: 278 + if (!is_static) { 279 + kfree(ssp->srcu_sup); 280 + ssp->srcu_sup = NULL; 281 + } 282 + return -ENOMEM; 281 283 } 282 284 283 285 #ifdef CONFIG_DEBUG_LOCK_ALLOC ··· 784 782 spin_lock_rcu_node(sdp); /* Interrupts already disabled. */ 785 783 rcu_segcblist_advance(&sdp->srcu_cblist, 786 784 rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq)); 787 - (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, 788 - rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq)); 785 + WARN_ON_ONCE(!rcu_segcblist_segempty(&sdp->srcu_cblist, RCU_NEXT_TAIL)); 789 786 spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */ 790 787 WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies); 791 788 WRITE_ONCE(ssp->srcu_sup->srcu_n_exp_nodelay, 0); ··· 834 833 int cpu; 835 834 836 835 for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) { 837 - if (!(mask & (1 << (cpu - snp->grplo)))) 836 + if (!(mask & (1UL << (cpu - snp->grplo)))) 838 837 continue; 839 838 srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay); 840 839 } ··· 1243 1242 spin_lock_irqsave_sdp_contention(sdp, &flags); 1244 1243 if (rhp) 1245 1244 rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp); 1245 + /* 1246 + * The snapshot for acceleration must be taken _before_ the read of the 1247 + * current gp sequence used for advancing, otherwise advancing may fail 1248 + * and acceleration may then fail too. 1249 + * 1250 + * This could happen if: 1251 + * 1252 + * 1) The RCU_WAIT_TAIL segment has callbacks (gp_num = X + 4) and the 1253 + * RCU_NEXT_READY_TAIL also has callbacks (gp_num = X + 8). 1254 + * 1255 + * 2) The grace period for RCU_WAIT_TAIL is seen as started but not 1256 + * completed so rcu_seq_current() returns X + SRCU_STATE_SCAN1. 1257 + * 1258 + * 3) This value is passed to rcu_segcblist_advance() which can't move 1259 + * any segment forward and fails. 1260 + * 1261 + * 4) srcu_gp_start_if_needed() still proceeds with callback acceleration. 
1262 + * But then the call to rcu_seq_snap() observes the grace period for the 1263 + * RCU_WAIT_TAIL segment as completed and the subsequent one for the 1264 + * RCU_NEXT_READY_TAIL segment as started (ie: X + 4 + SRCU_STATE_SCAN1) 1265 + * so it returns a snapshot of the next grace period, which is X + 12. 1266 + * 1267 + * 5) The value of X + 12 is passed to rcu_segcblist_accelerate() but the 1268 + * freshly enqueued callback in RCU_NEXT_TAIL can't move to 1269 + * RCU_NEXT_READY_TAIL which already has callbacks for a previous grace 1270 + * period (gp_num = X + 8). So acceleration fails. 1271 + */ 1272 + s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq); 1246 1273 rcu_segcblist_advance(&sdp->srcu_cblist, 1247 1274 rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq)); 1248 - s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq); 1249 - (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s); 1275 + WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s) && rhp); 1250 1276 if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) { 1251 1277 sdp->srcu_gp_seq_needed = s; 1252 1278 needgp = true; ··· 1720 1692 ssp = sdp->ssp; 1721 1693 rcu_cblist_init(&ready_cbs); 1722 1694 spin_lock_irq_rcu_node(sdp); 1695 + WARN_ON_ONCE(!rcu_segcblist_segempty(&sdp->srcu_cblist, RCU_NEXT_TAIL)); 1723 1696 rcu_segcblist_advance(&sdp->srcu_cblist, 1724 1697 rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq)); 1725 1698 if (sdp->srcu_cblist_invoking || ··· 1737 1708 rhp = rcu_cblist_dequeue(&ready_cbs); 1738 1709 for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) { 1739 1710 debug_rcu_head_unqueue(rhp); 1711 + debug_rcu_head_callback(rhp); 1740 1712 local_bh_disable(); 1741 1713 rhp->func(rhp); 1742 1714 local_bh_enable(); ··· 1750 1720 */ 1751 1721 spin_lock_irq_rcu_node(sdp); 1752 1722 rcu_segcblist_add_len(&sdp->srcu_cblist, -len); 1753 - (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, 1754 - rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq)); 1755 1723 sdp->srcu_cblist_invoking = false; 1756 1724 more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist); 1757 1725 spin_unlock_irq_rcu_node(sdp);
+8 -3
kernel/rcu/tasks.h
··· 432 432 static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp) 433 433 { 434 434 int cpu; 435 + int dequeue_limit; 435 436 unsigned long flags; 436 437 bool gpdone = poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq); 437 438 long n; ··· 440 439 long ncbsnz = 0; 441 440 int needgpcb = 0; 442 441 443 - for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) { 442 + dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim); 443 + for (cpu = 0; cpu < dequeue_limit; cpu++) { 444 444 struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu); 445 445 446 446 /* Advance and accelerate any new callbacks. */ ··· 540 538 raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags); 541 539 len = rcl.len; 542 540 for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) { 541 + debug_rcu_head_callback(rhp); 543 542 local_bh_disable(); 544 543 rhp->func(rhp); 545 544 local_bh_enable(); ··· 1087 1084 } 1088 1085 EXPORT_SYMBOL_GPL(rcu_barrier_tasks); 1089 1086 1090 - int rcu_tasks_lazy_ms = -1; 1087 + static int rcu_tasks_lazy_ms = -1; 1091 1088 module_param(rcu_tasks_lazy_ms, int, 0444); 1092 1089 1093 1090 static int __init rcu_spawn_tasks_kthread(void) ··· 1982 1979 1983 1980 static void rcu_tasks_initiate_self_tests(void) 1984 1981 { 1985 - pr_info("Running RCU-tasks wait API self tests\n"); 1986 1982 #ifdef CONFIG_TASKS_RCU 1983 + pr_info("Running RCU Tasks wait API self tests\n"); 1987 1984 tests[0].runstart = jiffies; 1988 1985 synchronize_rcu_tasks(); 1989 1986 call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback); 1990 1987 #endif 1991 1988 1992 1989 #ifdef CONFIG_TASKS_RUDE_RCU 1990 + pr_info("Running RCU Tasks Rude wait API self tests\n"); 1993 1991 tests[1].runstart = jiffies; 1994 1992 synchronize_rcu_tasks_rude(); 1995 1993 call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback); 1996 1994 #endif 1997 1995 1998 1996 #ifdef CONFIG_TASKS_TRACE_RCU 1997 + pr_info("Running RCU Tasks Trace wait API self tests\n"); 1999 1998 tests[2].runstart = jiffies; 2000 1999 synchronize_rcu_tasks_trace(); 2001 2000 call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
+1
kernel/rcu/tiny.c
··· 97 97 98 98 trace_rcu_invoke_callback("", head); 99 99 f = head->func; 100 + debug_rcu_head_callback(head); 100 101 WRITE_ONCE(head->func, (rcu_callback_t)0L); 101 102 f(head); 102 103 rcu_lock_release(&rcu_callback_map);
+170 -72
kernel/rcu/tree.c
··· 31 31 #include <linux/bitops.h> 32 32 #include <linux/export.h> 33 33 #include <linux/completion.h> 34 + #include <linux/kmemleak.h> 34 35 #include <linux/moduleparam.h> 35 36 #include <linux/panic.h> 36 37 #include <linux/panic_notifier.h> ··· 1261 1260 /* Unregister a counter, with NULL for not caring which. */ 1262 1261 void rcu_gp_slow_unregister(atomic_t *rgssp) 1263 1262 { 1264 - WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress); 1263 + WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress && rcu_gp_slow_suppress != NULL); 1265 1264 1266 1265 WRITE_ONCE(rcu_gp_slow_suppress, NULL); 1267 1266 } ··· 1557 1556 */ 1558 1557 static void rcu_gp_fqs(bool first_time) 1559 1558 { 1559 + int nr_fqs = READ_ONCE(rcu_state.nr_fqs_jiffies_stall); 1560 1560 struct rcu_node *rnp = rcu_get_root(); 1561 1561 1562 1562 WRITE_ONCE(rcu_state.gp_activity, jiffies); 1563 1563 WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1); 1564 + 1565 + WARN_ON_ONCE(nr_fqs > 3); 1566 + /* Only countdown nr_fqs for stall purposes if jiffies moves. */ 1567 + if (nr_fqs) { 1568 + if (nr_fqs == 1) { 1569 + WRITE_ONCE(rcu_state.jiffies_stall, 1570 + jiffies + rcu_jiffies_till_stall_check()); 1571 + } 1572 + WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs); 1573 + } 1574 + 1564 1575 if (first_time) { 1565 1576 /* Collect dyntick-idle snapshots. */ 1566 1577 force_qs_rnp(dyntick_save_progress_counter); ··· 2148 2135 trace_rcu_invoke_callback(rcu_state.name, rhp); 2149 2136 2150 2137 f = rhp->func; 2138 + debug_rcu_head_callback(rhp); 2151 2139 WRITE_ONCE(rhp->func, (rcu_callback_t)0L); 2152 2140 f(rhp); 2153 2141 ··· 2727 2713 */ 2728 2714 void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func) 2729 2715 { 2730 - return __call_rcu_common(head, func, false); 2716 + __call_rcu_common(head, func, false); 2731 2717 } 2732 2718 EXPORT_SYMBOL_GPL(call_rcu_hurry); 2733 2719 #endif ··· 2778 2764 */ 2779 2765 void call_rcu(struct rcu_head *head, rcu_callback_t func) 2780 2766 { 2781 - return __call_rcu_common(head, func, IS_ENABLED(CONFIG_RCU_LAZY)); 2767 + __call_rcu_common(head, func, IS_ENABLED(CONFIG_RCU_LAZY)); 2782 2768 } 2783 2769 EXPORT_SYMBOL_GPL(call_rcu); 2784 2770 ··· 3401 3387 krcp->head_gp_snap = get_state_synchronize_rcu(); 3402 3388 success = true; 3403 3389 } 3390 + 3391 + /* 3392 + * The kvfree_rcu() caller considers the pointer freed at this point 3393 + * and likely removes any references to it. Since the actual slab 3394 + * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore 3395 + * this object (no scanning or false positives reporting). 3396 + */ 3397 + kmemleak_ignore(ptr); 3404 3398 3405 3399 // Set timer to drain after KFREE_DRAIN_JIFFIES. 3406 3400 if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) ··· 4105 4083 } 4106 4084 EXPORT_SYMBOL_GPL(rcu_barrier); 4107 4085 4086 + static unsigned long rcu_barrier_last_throttle; 4087 + 4088 + /** 4089 + * rcu_barrier_throttled - Do rcu_barrier(), but limit to one per second 4090 + * 4091 + * This can be thought of as guard rails around rcu_barrier() that 4092 + * permits unrestricted userspace use, at least assuming the hardware's 4093 + * try_cmpxchg() is robust. There will be at most one call per second to 4094 + * rcu_barrier() system-wide from use of this function, which means that 4095 + * callers might needlessly wait a second or three. 4096 + * 4097 + * This is intended for use by test suites to avoid OOM by flushing RCU 4098 + * callbacks from the previous test before starting the next. 
See the 4099 + * rcutree.do_rcu_barrier module parameter for more information. 4100 + * 4101 + * Why not simply make rcu_barrier() more scalable? That might be 4102 + * the eventual endpoint, but let's keep it simple for the time being. 4103 + * Note that the module parameter infrastructure serializes calls to a 4104 + * given .set() function, but should concurrent .set() invocation ever be 4105 + * possible, we are ready! 4106 + */ 4107 + static void rcu_barrier_throttled(void) 4108 + { 4109 + unsigned long j = jiffies; 4110 + unsigned long old = READ_ONCE(rcu_barrier_last_throttle); 4111 + unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence); 4112 + 4113 + while (time_in_range(j, old, old + HZ / 16) || 4114 + !try_cmpxchg(&rcu_barrier_last_throttle, &old, j)) { 4115 + schedule_timeout_idle(HZ / 16); 4116 + if (rcu_seq_done(&rcu_state.barrier_sequence, s)) { 4117 + smp_mb(); /* caller's subsequent code after above check. */ 4118 + return; 4119 + } 4120 + j = jiffies; 4121 + old = READ_ONCE(rcu_barrier_last_throttle); 4122 + } 4123 + rcu_barrier(); 4124 + } 4125 + 4126 + /* 4127 + * Invoke rcu_barrier_throttled() when a rcutree.do_rcu_barrier 4128 + * request arrives. We insist on a true value to allow for possible 4129 + * future expansion. 4130 + */ 4131 + static int param_set_do_rcu_barrier(const char *val, const struct kernel_param *kp) 4132 + { 4133 + bool b; 4134 + int ret; 4135 + 4136 + if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING) 4137 + return -EAGAIN; 4138 + ret = kstrtobool(val, &b); 4139 + if (!ret && b) { 4140 + atomic_inc((atomic_t *)kp->arg); 4141 + rcu_barrier_throttled(); 4142 + atomic_dec((atomic_t *)kp->arg); 4143 + } 4144 + return ret; 4145 + } 4146 + 4147 + /* 4148 + * Output the number of outstanding rcutree.do_rcu_barrier requests. 4149 + */ 4150 + static int param_get_do_rcu_barrier(char *buffer, const struct kernel_param *kp) 4151 + { 4152 + return sprintf(buffer, "%d\n", atomic_read((atomic_t *)kp->arg)); 4153 + } 4154 + 4155 + static const struct kernel_param_ops do_rcu_barrier_ops = { 4156 + .set = param_set_do_rcu_barrier, 4157 + .get = param_get_do_rcu_barrier, 4158 + }; 4159 + static atomic_t do_rcu_barrier; 4160 + module_param_cb(do_rcu_barrier, &do_rcu_barrier_ops, &do_rcu_barrier, 0644); 4161 + 4108 4162 /* 4109 4163 * Compute the mask of online CPUs for the specified rcu_node structure. 4110 4164 * This will not be stable unless the rcu_node structure's ->lock is ··· 4228 4130 rdp = this_cpu_ptr(&rcu_data); 4229 4131 /* 4230 4132 * Strictly, we care here about the case where the current CPU is 4231 - * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask 4133 + * in rcutree_report_cpu_starting() and thus has an excuse for rdp->grpmask 4232 4134 * not being up to date. So arch_spin_is_locked() might have a 4233 4135 * false positive if it's held by some *other* CPU, but that's 4234 4136 * OK because that just means a false *negative* on the warning. ··· 4247 4149 static bool rcu_init_invoked(void) 4248 4150 { 4249 4151 return !!rcu_state.n_online_cpus; 4250 - } 4251 - 4252 - /* 4253 - * Near the end of the offline process. Trace the fact that this CPU 4254 - * is going offline. 
4255 - */ 4256 - int rcutree_dying_cpu(unsigned int cpu) 4257 - { 4258 - bool blkd; 4259 - struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 4260 - struct rcu_node *rnp = rdp->mynode; 4261 - 4262 - if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) 4263 - return 0; 4264 - 4265 - blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask); 4266 - trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq), 4267 - blkd ? TPS("cpuofl-bgp") : TPS("cpuofl")); 4268 - return 0; 4269 4152 } 4270 4153 4271 4154 /* ··· 4292 4213 } 4293 4214 raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ 4294 4215 } 4295 - } 4296 - 4297 - /* 4298 - * The CPU has been completely removed, and some other CPU is reporting 4299 - * this fact from process context. Do the remainder of the cleanup. 4300 - * There can only be one CPU hotplug operation at a time, so no need for 4301 - * explicit locking. 4302 - */ 4303 - int rcutree_dead_cpu(unsigned int cpu) 4304 - { 4305 - if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) 4306 - return 0; 4307 - 4308 - WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1); 4309 - // Stop-machine done, so allow nohz_full to disable tick. 4310 - tick_dep_clear(TICK_DEP_BIT_RCU); 4311 - return 0; 4312 4216 } 4313 4217 4314 4218 /* ··· 4447 4385 } 4448 4386 4449 4387 /* 4450 - * Near the beginning of the process. The CPU is still very much alive 4451 - * with pretty much all services enabled. 4452 - */ 4453 - int rcutree_offline_cpu(unsigned int cpu) 4454 - { 4455 - unsigned long flags; 4456 - struct rcu_data *rdp; 4457 - struct rcu_node *rnp; 4458 - 4459 - rdp = per_cpu_ptr(&rcu_data, cpu); 4460 - rnp = rdp->mynode; 4461 - raw_spin_lock_irqsave_rcu_node(rnp, flags); 4462 - rnp->ffmask &= ~rdp->grpmask; 4463 - raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4464 - 4465 - rcutree_affinity_setting(cpu, cpu); 4466 - 4467 - // nohz_full CPUs need the tick for stop-machine to work quickly 4468 - tick_dep_set(TICK_DEP_BIT_RCU); 4469 - return 0; 4470 - } 4471 - 4472 - /* 4473 4388 * Mark the specified CPU as being online so that subsequent grace periods 4474 4389 * (both expedited and normal) will wait on it. Note that this means that 4475 4390 * incoming CPUs are not allowed to use RCU read-side critical sections ··· 4457 4418 * from the incoming CPU rather than from the cpuhp_step mechanism. 4458 4419 * This is because this function must be invoked at a precise location. 4459 4420 * This incoming CPU must not have enabled interrupts yet. 4421 + * 4422 + * This mirrors the effects of rcutree_report_cpu_dead(). 4460 4423 */ 4461 - void rcu_cpu_starting(unsigned int cpu) 4424 + void rcutree_report_cpu_starting(unsigned int cpu) 4462 4425 { 4463 4426 unsigned long mask; 4464 4427 struct rcu_data *rdp; ··· 4514 4473 * Note that this function is special in that it is invoked directly 4515 4474 * from the outgoing CPU rather than from the cpuhp_step mechanism. 4516 4475 * This is because this function must be invoked at a precise location. 4476 + * 4477 + * This mirrors the effect of rcutree_report_cpu_starting(). 4517 4478 */ 4518 - void rcu_report_dead(unsigned int cpu) 4479 + void rcutree_report_cpu_dead(void) 4519 4480 { 4520 - unsigned long flags, seq_flags; 4481 + unsigned long flags; 4521 4482 unsigned long mask; 4522 - struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 4483 + struct rcu_data *rdp = this_cpu_ptr(&rcu_data); 4523 4484 struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. 
*/ 4524 4485 4486 + /* 4487 + * IRQS must be disabled from now on and until the CPU dies, or an interrupt 4488 + * may introduce a new READ-side while it is actually off the QS masks. 4489 + */ 4490 + lockdep_assert_irqs_disabled(); 4525 4491 // Do any dangling deferred wakeups. 4526 4492 do_nocb_deferred_wakeup(rdp); 4527 4493 ··· 4536 4488 4537 4489 /* Remove outgoing CPU from mask in the leaf rcu_node structure. */ 4538 4490 mask = rdp->grpmask; 4539 - local_irq_save(seq_flags); 4540 4491 arch_spin_lock(&rcu_state.ofl_lock); 4541 4492 raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */ 4542 4493 rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq); ··· 4549 4502 WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask); 4550 4503 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4551 4504 arch_spin_unlock(&rcu_state.ofl_lock); 4552 - local_irq_restore(seq_flags); 4553 - 4554 4505 rdp->cpu_started = false; 4555 4506 } 4556 4507 ··· 4603 4558 cpu, rcu_segcblist_n_cbs(&rdp->cblist), 4604 4559 rcu_segcblist_first_cb(&rdp->cblist)); 4605 4560 } 4606 - #endif 4561 + 4562 + /* 4563 + * The CPU has been completely removed, and some other CPU is reporting 4564 + * this fact from process context. Do the remainder of the cleanup. 4565 + * There can only be one CPU hotplug operation at a time, so no need for 4566 + * explicit locking. 4567 + */ 4568 + int rcutree_dead_cpu(unsigned int cpu) 4569 + { 4570 + WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1); 4571 + // Stop-machine done, so allow nohz_full to disable tick. 4572 + tick_dep_clear(TICK_DEP_BIT_RCU); 4573 + return 0; 4574 + } 4575 + 4576 + /* 4577 + * Near the end of the offline process. Trace the fact that this CPU 4578 + * is going offline. 4579 + */ 4580 + int rcutree_dying_cpu(unsigned int cpu) 4581 + { 4582 + bool blkd; 4583 + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 4584 + struct rcu_node *rnp = rdp->mynode; 4585 + 4586 + blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask); 4587 + trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq), 4588 + blkd ? TPS("cpuofl-bgp") : TPS("cpuofl")); 4589 + return 0; 4590 + } 4591 + 4592 + /* 4593 + * Near the beginning of the process. The CPU is still very much alive 4594 + * with pretty much all services enabled. 4595 + */ 4596 + int rcutree_offline_cpu(unsigned int cpu) 4597 + { 4598 + unsigned long flags; 4599 + struct rcu_data *rdp; 4600 + struct rcu_node *rnp; 4601 + 4602 + rdp = per_cpu_ptr(&rcu_data, cpu); 4603 + rnp = rdp->mynode; 4604 + raw_spin_lock_irqsave_rcu_node(rnp, flags); 4605 + rnp->ffmask &= ~rdp->grpmask; 4606 + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4607 + 4608 + rcutree_affinity_setting(cpu, cpu); 4609 + 4610 + // nohz_full CPUs need the tick for stop-machine to work quickly 4611 + tick_dep_set(TICK_DEP_BIT_RCU); 4612 + return 0; 4613 + } 4614 + #endif /* #ifdef CONFIG_HOTPLUG_CPU */ 4607 4615 4608 4616 /* 4609 4617 * On non-huge systems, use expedited RCU grace periods to make suspend ··· 5088 4990 pm_notifier(rcu_pm_notify, 0); 5089 4991 WARN_ON(num_online_cpus() > 1); // Only one CPU this early in boot. 5090 4992 rcutree_prepare_cpu(cpu); 5091 - rcu_cpu_starting(cpu); 4993 + rcutree_report_cpu_starting(cpu); 5092 4994 rcutree_online_cpu(cpu); 5093 4995 5094 4996 /* Create workqueue for Tree SRCU and for expedited GPs. */
+4
kernel/rcu/tree.h
··· 386 386 /* in jiffies. */ 387 387 unsigned long jiffies_stall; /* Time at which to check */ 388 388 /* for CPU stalls. */ 389 + int nr_fqs_jiffies_stall; /* Number of fqs loops after 390 + * which read jiffies and set 391 + * jiffies_stall. Stall 392 + * warnings disabled if !0. */ 389 393 unsigned long jiffies_resched; /* Time at which to resched */ 390 394 /* a reluctant CPU. */ 391 395 unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */
+5 -1
kernel/rcu/tree_exp.h
··· 621 621 } 622 622 623 623 for (;;) { 624 + unsigned long j; 625 + 624 626 if (synchronize_rcu_expedited_wait_once(jiffies_stall)) 625 627 return; 626 628 if (rcu_stall_is_suppressed()) 627 629 continue; 630 + j = jiffies; 631 + rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_EXP, (void *)(j - jiffies_start)); 628 632 trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall")); 629 633 pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", 630 634 rcu_state.name); ··· 651 647 } 652 648 } 653 649 pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", 654 - jiffies - jiffies_start, rcu_state.expedited_sequence, 650 + j - jiffies_start, rcu_state.expedited_sequence, 655 651 data_race(rnp_root->expmask), 656 652 ".T"[!!data_race(rnp_root->exp_tasks)]); 657 653 if (ndetected) {
+98 -37
kernel/rcu/tree_stall.h
··· 8 8 */ 9 9 10 10 #include <linux/kvm_para.h> 11 + #include <linux/rcu_notifier.h> 11 12 12 13 ////////////////////////////////////////////////////////////////////////////// 13 14 // ··· 150 149 /** 151 150 * rcu_cpu_stall_reset - restart stall-warning timeout for current grace period 152 151 * 152 + * To perform the reset request from the caller, disable stall detection until 153 + * 3 fqs loops have passed. This is required to ensure a fresh jiffies is 154 + * loaded. It should be safe to do from the fqs loop as enough timer 155 + * interrupts and context switches should have passed. 156 + * 153 157 * The caller must disable hard irqs. 154 158 */ 155 159 void rcu_cpu_stall_reset(void) 156 160 { 157 - WRITE_ONCE(rcu_state.jiffies_stall, 158 - jiffies + rcu_jiffies_till_stall_check()); 161 + WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3); 162 + WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX); 159 163 } 160 164 161 165 ////////////////////////////////////////////////////////////////////////////// ··· 176 170 WRITE_ONCE(rcu_state.gp_start, j); 177 171 j1 = rcu_jiffies_till_stall_check(); 178 172 smp_mb(); // ->gp_start before ->jiffies_stall and caller's ->gp_seq. 173 + WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 0); 179 174 WRITE_ONCE(rcu_state.jiffies_stall, j + j1); 180 175 rcu_state.jiffies_resched = j + j1 / 2; 181 176 rcu_state.n_force_qs_gpstart = READ_ONCE(rcu_state.n_force_qs); ··· 541 534 data_race(READ_ONCE(rcu_state.gp_state)), 542 535 gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu); 543 536 if (gpk) { 537 + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 538 + 544 539 pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); 545 540 pr_err("RCU grace-period kthread stack dump:\n"); 546 541 sched_show_task(gpk); 547 - if (cpu >= 0) { 548 - if (cpu_is_offline(cpu)) { 549 - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); 550 - } else { 551 - pr_err("Stack dump where RCU GP kthread last ran:\n"); 552 - dump_cpu_task(cpu); 553 - } 542 + if (cpu_is_offline(cpu)) { 543 + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); 544 + } else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) { 545 + pr_err("Stack dump where RCU GP kthread last ran:\n"); 546 + dump_cpu_task(cpu); 554 547 } 555 548 wake_up_process(gpk); 556 549 } ··· 718 711 719 712 static void check_cpu_stall(struct rcu_data *rdp) 720 713 { 721 - bool didstall = false; 714 + bool self_detected; 722 715 unsigned long gs1; 723 716 unsigned long gs2; 724 717 unsigned long gps; ··· 732 725 !rcu_gp_in_progress()) 733 726 return; 734 727 rcu_stall_kick_kthreads(); 728 + 729 + /* 730 + * Check if it was requested (via rcu_cpu_stall_reset()) that the FQS 731 + * loop has to set jiffies to ensure a non-stale jiffies value. This 732 + * is required to have good jiffies value after coming out of long 733 + * breaks of jiffies updates. Not doing so can cause false positives. 734 + */ 735 + if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0) 736 + return; 737 + 735 738 j = jiffies; 736 739 737 740 /* ··· 775 758 return; /* No stall or GP completed since entering function. 
*/ 776 759 rnp = rdp->mynode; 777 760 jn = jiffies + ULONG_MAX / 2; 761 + self_detected = READ_ONCE(rnp->qsmask) & rdp->grpmask; 778 762 if (rcu_gp_in_progress() && 779 - (READ_ONCE(rnp->qsmask) & rdp->grpmask) && 763 + (self_detected || ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) && 780 764 cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { 781 - 782 765 /* 783 766 * If a virtual machine is stopped by the host it can look to 784 767 * the watchdog like an RCU stall. Check to see if the host ··· 787 770 if (kvm_check_and_clear_guest_paused()) 788 771 return; 789 772 790 - /* We haven't checked in, so go dump stack. */ 791 - print_cpu_stall(gps); 773 + rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_NORM, (void *)j - gps); 774 + if (self_detected) { 775 + /* We haven't checked in, so go dump stack. */ 776 + print_cpu_stall(gps); 777 + } else { 778 + /* They had a few time units to dump stack, so complain. */ 779 + print_other_cpu_stall(gs2, gps); 780 + } 781 + 792 782 if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) 793 783 rcu_ftrace_dump(DUMP_ALL); 794 - didstall = true; 795 784 796 - } else if (rcu_gp_in_progress() && 797 - ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) && 798 - cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { 799 - 800 - /* 801 - * If a virtual machine is stopped by the host it can look to 802 - * the watchdog like an RCU stall. Check to see if the host 803 - * stopped the vm. 804 - */ 805 - if (kvm_check_and_clear_guest_paused()) 806 - return; 807 - 808 - /* They had a few time units to dump stack, so complain. */ 809 - print_other_cpu_stall(gs2, gps); 810 - if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) 811 - rcu_ftrace_dump(DUMP_ALL); 812 - didstall = true; 813 - } 814 - if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) { 815 - jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3; 816 - WRITE_ONCE(rcu_state.jiffies_stall, jn); 785 + if (READ_ONCE(rcu_state.jiffies_stall) == jn) { 786 + jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3; 787 + WRITE_ONCE(rcu_state.jiffies_stall, jn); 788 + } 817 789 } 818 790 } 819 791 820 792 ////////////////////////////////////////////////////////////////////////////// 821 793 // 822 - // RCU forward-progress mechanisms, including of callback invocation. 794 + // RCU forward-progress mechanisms, including for callback invocation. 823 795 824 796 825 797 /* ··· 1060 1054 return 0; 1061 1055 } 1062 1056 early_initcall(rcu_sysrq_init); 1057 + 1058 + 1059 + ////////////////////////////////////////////////////////////////////////////// 1060 + // 1061 + // RCU CPU stall-warning notifiers 1062 + 1063 + static ATOMIC_NOTIFIER_HEAD(rcu_cpu_stall_notifier_list); 1064 + 1065 + /** 1066 + * rcu_stall_chain_notifier_register - Add an RCU CPU stall notifier 1067 + * @n: Entry to add. 1068 + * 1069 + * Adds an RCU CPU stall notifier to an atomic notifier chain. 1070 + * The @action passed to a notifier will be @RCU_STALL_NOTIFY_NORM or 1071 + * friends. The @data will be the duration of the stalled grace period, 1072 + * in jiffies, coerced to a void* pointer. 1073 + * 1074 + * Returns 0 on success, %-EEXIST on error. 1075 + */ 1076 + int rcu_stall_chain_notifier_register(struct notifier_block *n) 1077 + { 1078 + return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n); 1079 + } 1080 + EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_register); 1081 + 1082 + /** 1083 + * rcu_stall_chain_notifier_unregister - Remove an RCU CPU stall notifier 1084 + * @n: Entry to add. 
1085 + * 1086 + * Removes an RCU CPU stall notifier from an atomic notifier chain. 1087 + * 1088 + * Returns zero on success, %-ENOENT on failure. 1089 + */ 1090 + int rcu_stall_chain_notifier_unregister(struct notifier_block *n) 1091 + { 1092 + return atomic_notifier_chain_unregister(&rcu_cpu_stall_notifier_list, n); 1093 + } 1094 + EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_unregister); 1095 + 1096 + /* 1097 + * rcu_stall_notifier_call_chain - Call functions in an RCU CPU stall notifier chain 1098 + * @val: Value passed unmodified to notifier function 1099 + * @v: Pointer passed unmodified to notifier function 1100 + * 1101 + * Calls each function in the RCU CPU stall notifier chain in turn, which 1102 + * is an atomic call chain. See atomic_notifier_call_chain() for more 1103 + * information. 1104 + * 1105 + * This is for use within RCU, hence the omission of the extra asterisk 1106 + * to indicate a non-kerneldoc format header comment. 1107 + */ 1108 + int rcu_stall_notifier_call_chain(unsigned long val, void *v) 1109 + { 1110 + return atomic_notifier_call_chain(&rcu_cpu_stall_notifier_list, val, v); 1111 + }
+11 -30
mm/slab_common.c
··· 528 528 } 529 529 530 530 #ifdef CONFIG_PRINTK 531 - /** 532 - * kmem_valid_obj - does the pointer reference a valid slab object? 533 - * @object: pointer to query. 534 - * 535 - * Return: %true if the pointer is to a not-yet-freed object from 536 - * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer 537 - * is to an already-freed object, and %false otherwise. 538 - */ 539 - bool kmem_valid_obj(void *object) 540 - { 541 - struct folio *folio; 542 - 543 - /* Some arches consider ZERO_SIZE_PTR to be a valid address. */ 544 - if (object < (void *)PAGE_SIZE || !virt_addr_valid(object)) 545 - return false; 546 - folio = virt_to_folio(object); 547 - return folio_test_slab(folio); 548 - } 549 - EXPORT_SYMBOL_GPL(kmem_valid_obj); 550 - 551 531 static void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab) 552 532 { 553 533 if (__kfence_obj_info(kpp, object, slab)) ··· 546 566 * and, if available, the slab name, return address, and stack trace from 547 567 * the allocation and last free path of that object. 548 568 * 549 - * This function will splat if passed a pointer to a non-slab object. 550 - * If you are not sure what type of object you have, you should instead 551 - * use mem_dump_obj(). 569 + * Return: %true if the pointer is to a not-yet-freed object from 570 + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer 571 + * is to an already-freed object, and %false otherwise. 552 572 */ 553 - void kmem_dump_obj(void *object) 573 + bool kmem_dump_obj(void *object) 554 574 { 555 575 char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc"; 556 576 int i; ··· 558 578 unsigned long ptroffset; 559 579 struct kmem_obj_info kp = { }; 560 580 561 - if (WARN_ON_ONCE(!virt_addr_valid(object))) 562 - return; 581 + /* Some arches consider ZERO_SIZE_PTR to be a valid address. */ 582 + if (object < (void *)PAGE_SIZE || !virt_addr_valid(object)) 583 + return false; 563 584 slab = virt_to_slab(object); 564 - if (WARN_ON_ONCE(!slab)) { 565 - pr_cont(" non-slab memory.\n"); 566 - return; 567 - } 585 + if (!slab) 586 + return false; 587 + 568 588 kmem_obj_info(&kp, object, slab); 569 589 if (kp.kp_slab_cache) 570 590 pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name); ··· 601 621 pr_info(" %pS\n", kp.kp_free_stack[i]); 602 622 } 603 623 624 + return true; 604 625 } 605 626 EXPORT_SYMBOL_GPL(kmem_dump_obj); 606 627 #endif
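For callers, the net effect of this hunk is that the old kmem_valid_obj()/kmem_dump_obj() pair collapses into a single call whose return value says whether the pointer was a slab object at all; mm/util.c below is updated in exactly this way. A sketch of the resulting caller pattern (hypothetical debug helper, not from the patch):

    static void report_suspect_pointer(void *p)
    {
            if (kmem_dump_obj(p))           /* dumps slab provenance if it can */
                    return;
            pr_info("pointer %px is not a slab object\n", p);
    }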
+1 -3
mm/util.c
··· 1060 1060 { 1061 1061 const char *type; 1062 1062 1063 - if (kmem_valid_obj(object)) { 1064 - kmem_dump_obj(object); 1063 + if (kmem_dump_obj(object)) 1065 1064 return; 1066 - } 1067 1065 1068 1066 if (vmalloc_dump_obj(object)) 1069 1067 return;
-9
scripts/checkpatch.pl
··· 6427 6427 } 6428 6428 } 6429 6429 6430 - # check for soon-to-be-deprecated single-argument k[v]free_rcu() API 6431 - if ($line =~ /\bk[v]?free_rcu\s*\([^(]+\)/) { 6432 - if ($line =~ /\bk[v]?free_rcu\s*\([^,]+\)/) { 6433 - ERROR("DEPRECATED_API", 6434 - "Single-argument k[v]free_rcu() API is deprecated, please pass rcu_head object or call k[v]free_rcu_mightsleep()." . $herecurr); 6435 - } 6436 - } 6437 - 6438 - 6439 6430 # check for unnecessary "Out of Memory" messages 6440 6431 if ($line =~ /^\+.*\b$logFunctions\s*\(/ && 6441 6432 $prevline =~ /^[ \+]\s*if\s*\(\s*(\!\s*|NULL\s*==\s*)?($Lval)(\s*==\s*NULL\s*)?\s*\)/ &&