Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

- Bitmap parsing support for "all" as an alias for all bits

- Documentation updates

- Miscellaneous fixes, including some that overlap into mm and lockdep

- kvfree_rcu() updates

- mem_dump_obj() updates, with acks from one of the slab-allocator
  maintainers

- RCU NOCB CPU updates, including limited deoffloading

- SRCU updates

- Tasks-RCU updates

- Torture-test updates

* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
rcu: Add missing __releases() annotation
rcu: Remove obsolete rcu_read_unlock() deadlock commentary
rcu: Improve comments describing RCU read-side critical sections
rcu: Create an unrcu_pointer() to remove __rcu from a pointer
srcu: Early test SRCU polling start
rcu: Fix various typos in comments
rcu/nocb: Unify timers
rcu/nocb: Prepare for fine-grained deferred wakeup
rcu/nocb: Only cancel nocb timer if not polling
rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
rcu/nocb: Allow de-offloading rdp leader
rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
rcu: Don't penalize priority boosting when there is nothing to boost
rcu: Point to documentation of ordering guarantees
rcu: Make rcu_gp_cleanup() be noinline for tracing
rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
...

+1253 -578
+3 -3
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
··· 21 21 to see the effects of all accesses prior to the beginning of that grace 22 22 period that are within RCU read-side critical sections. 23 23 Similarly, any code that happens before the beginning of a given RCU grace 24 - period is guaranteed to see the effects of all accesses following the end 24 + period is guaranteed to not see the effects of all accesses following the end 25 25 of that grace period that are within RCU read-side critical sections. 26 26 27 27 Note well that RCU-sched read-side critical sections include any region ··· 339 339 leftmost ``rcu_node`` structure offlines its last CPU and if the next 340 340 ``rcu_node`` structure has no online CPUs). 341 341 342 - .. kernel-figure:: TreeRCU-gp-init-1.svg 342 + .. kernel-figure:: TreeRCU-gp-init-2.svg 343 343 344 344 The final ``rcu_gp_init()`` pass through the ``rcu_node`` tree traverses 345 345 breadth-first, setting each ``rcu_node`` structure's ``->gp_seq`` field 346 346 to the newly advanced value from the ``rcu_state`` structure, as shown 347 347 in the following diagram. 348 348 349 - .. kernel-figure:: TreeRCU-gp-init-1.svg 349 + .. kernel-figure:: TreeRCU-gp-init-3.svg 350 350 351 351 This change will also cause each CPU's next call to 352 352 ``__note_gp_changes()`` to notice that a new grace period has started,
+5
Documentation/admin-guide/kernel-parameters.rst
··· 76 76 will also change. Use the same on a small 4 core system, and "16-N" becomes 77 77 "16-3" and now the same boot input will be flagged as invalid (start > end). 78 78 79 + The special case-tolerant group name "all" has a meaning of selecting all CPUs, 80 + so that "nohz_full=all" is the equivalent of "nohz_full=0-N". 81 + 82 + The semantics of "N" and "all" is supported on a level of bitmaps and holds for 83 + all users of bitmap_parse(). 79 84 80 85 This document may not be entirely up to date and comprehensive. The command 81 86 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
+5
Documentation/admin-guide/kernel-parameters.txt
··· 4354 4354 whole algorithm to behave better in low memory 4355 4355 condition. 4356 4356 4357 + rcutree.rcu_delay_page_cache_fill_msec= [KNL] 4358 + Set the page-cache refill delay (in milliseconds) 4359 + in response to low-memory conditions. The range 4360 + of permitted values is in the range 0:100000. 4361 + 4357 4362 rcutree.jiffies_till_first_fqs= [KNL] 4358 4363 Set delay from grace-period initialization to 4359 4364 first attempt to force quiescent states.
+36 -36
include/linux/rcupdate.h
··· 315 315 #define RCU_LOCKDEP_WARN(c, s) \ 316 316 do { \ 317 317 static bool __section(".data.unlikely") __warned; \ 318 - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ 318 + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \ 319 319 __warned = true; \ 320 320 lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ 321 321 } \ ··· 373 373 #define unrcu_pointer(p) \ 374 374 ({ \ 375 375 typeof(*p) *_________p1 = (typeof(*p) *__force)(p); \ 376 - rcu_check_sparse(p, __rcu); \ 376 + rcu_check_sparse(p, __rcu); \ 377 377 ((typeof(*p) __force __kernel *)(_________p1)); \ 378 378 }) 379 379 ··· 532 532 * @p: The pointer to read, prior to dereferencing 533 533 * @c: The conditions under which the dereference will take place 534 534 * 535 - * This is the RCU-bh counterpart to rcu_dereference_check(). 535 + * This is the RCU-bh counterpart to rcu_dereference_check(). However, 536 + * please note that starting in v5.0 kernels, vanilla RCU grace periods 537 + * wait for local_bh_disable() regions of code in addition to regions of 538 + * code demarked by rcu_read_lock() and rcu_read_unlock(). This means 539 + * that synchronize_rcu(), call_rcu, and friends all take not only 540 + * rcu_read_lock() but also rcu_read_lock_bh() into account. 536 541 */ 537 542 #define rcu_dereference_bh_check(p, c) \ 538 543 __rcu_dereference_check((p), (c) || rcu_read_lock_bh_held(), __rcu) ··· 548 543 * @c: The conditions under which the dereference will take place 549 544 * 550 545 * This is the RCU-sched counterpart to rcu_dereference_check(). 546 + * However, please note that starting in v5.0 kernels, vanilla RCU grace 547 + * periods wait for preempt_disable() regions of code in addition to 548 + * regions of code demarked by rcu_read_lock() and rcu_read_unlock(). 549 + * This means that synchronize_rcu(), call_rcu, and friends all take not 550 + * only rcu_read_lock() but also rcu_read_lock_sched() into account. 
551 551 */ 552 552 #define rcu_dereference_sched_check(p, c) \ 553 553 __rcu_dereference_check((p), (c) || rcu_read_lock_sched_held(), \ ··· 644 634 * sections, invocation of the corresponding RCU callback is deferred 645 635 * until after the all the other CPUs exit their critical sections. 646 636 * 637 + * In v5.0 and later kernels, synchronize_rcu() and call_rcu() also 638 + * wait for regions of code with preemption disabled, including regions of 639 + * code with interrupts or softirqs disabled. In pre-v5.0 kernels, which 640 + * define synchronize_sched(), only code enclosed within rcu_read_lock() 641 + * and rcu_read_unlock() are guaranteed to be waited for. 642 + * 647 643 * Note, however, that RCU callbacks are permitted to run concurrently 648 644 * with new RCU read-side critical sections. One way that this can happen 649 645 * is via the following sequence of events: (1) CPU 0 enters an RCU ··· 702 686 /** 703 687 * rcu_read_unlock() - marks the end of an RCU read-side critical section. 704 688 * 705 - * In most situations, rcu_read_unlock() is immune from deadlock. 706 - * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock() 707 - * is responsible for deboosting, which it does via rt_mutex_unlock(). 708 - * Unfortunately, this function acquires the scheduler's runqueue and 709 - * priority-inheritance spinlocks. This means that deadlock could result 710 - * if the caller of rcu_read_unlock() already holds one of these locks or 711 - * any lock that is ever acquired while holding them. 712 - * 713 - * That said, RCU readers are never priority boosted unless they were 714 - * preempted. Therefore, one way to avoid deadlock is to make sure 715 - * that preemption never happens within any RCU read-side critical 716 - * section whose outermost rcu_read_unlock() is called with one of 717 - * rt_mutex_unlock()'s locks held. 
Such preemption can be avoided in 718 - * a number of ways, for example, by invoking preempt_disable() before 719 - * critical section's outermost rcu_read_lock(). 720 - * 721 - * Given that the set of locks acquired by rt_mutex_unlock() might change 722 - * at any time, a somewhat more future-proofed approach is to make sure 723 - * that that preemption never happens within any RCU read-side critical 724 - * section whose outermost rcu_read_unlock() is called with irqs disabled. 725 - * This approach relies on the fact that rt_mutex_unlock() currently only 726 - * acquires irq-disabled locks. 727 - * 728 - * The second of these two approaches is best in most situations, 729 - * however, the first approach can also be useful, at least to those 730 - * developers willing to keep abreast of the set of locks acquired by 731 - * rt_mutex_unlock(). 689 + * In almost all situations, rcu_read_unlock() is immune from deadlock. 690 + * In recent kernels that have consolidated synchronize_sched() and 691 + * synchronize_rcu_bh() into synchronize_rcu(), this deadlock immunity 692 + * also extends to the scheduler's runqueue and priority-inheritance 693 + * spinlocks, courtesy of the quiescent-state deferral that is carried 694 + * out when rcu_read_unlock() is invoked with interrupts disabled. 732 695 * 733 696 * See rcu_read_lock() for more information. 734 697 */ ··· 723 728 /** 724 729 * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section 725 730 * 726 - * This is equivalent of rcu_read_lock(), but also disables softirqs. 727 - * Note that anything else that disables softirqs can also serve as 728 - * an RCU read-side critical section. 731 + * This is equivalent to rcu_read_lock(), but also disables softirqs. 732 + * Note that anything else that disables softirqs can also serve as an RCU 733 + * read-side critical section. However, please note that this equivalence 734 + * applies only to v5.0 and later. 
Before v5.0, rcu_read_lock() and 735 + * rcu_read_lock_bh() were unrelated. 729 736 * 730 737 * Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh() 731 738 * must occur in the same context, for example, it is illegal to invoke ··· 760 763 /** 761 764 * rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section 762 765 * 763 - * This is equivalent of rcu_read_lock(), but disables preemption. 764 - * Read-side critical sections can also be introduced by anything else 765 - * that disables preemption, including local_irq_disable() and friends. 766 + * This is equivalent to rcu_read_lock(), but also disables preemption. 767 + * Read-side critical sections can also be introduced by anything else that 768 + * disables preemption, including local_irq_disable() and friends. However, 769 + * please note that the equivalence to rcu_read_lock() applies only to 770 + * v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_sched() 771 + * were unrelated. 766 772 * 767 773 * Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched() 768 774 * must occur in the same context, for example, it is illegal to invoke
-1
include/linux/rcutiny.h
··· 86 86 static inline void rcu_irq_exit_irqson(void) { } 87 87 static inline void rcu_irq_enter_irqson(void) { } 88 88 static inline void rcu_irq_exit(void) { } 89 - static inline void rcu_irq_exit_preempt(void) { } 90 89 static inline void rcu_irq_exit_check_preempt(void) { } 91 90 #define rcu_is_idle_cpu(cpu) \ 92 91 (is_idle_task(current) && !in_nmi() && !in_irq() && !in_serving_softirq())
-1
include/linux/rcutree.h
··· 49 49 void rcu_idle_exit(void); 50 50 void rcu_irq_enter(void); 51 51 void rcu_irq_exit(void); 52 - void rcu_irq_exit_preempt(void); 53 52 void rcu_irq_enter_irqson(void); 54 53 void rcu_irq_exit_irqson(void); 55 54 bool rcu_is_idle_cpu(int cpu);
+6
include/linux/srcu.h
··· 64 64 unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp); 65 65 bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie); 66 66 67 + #ifdef CONFIG_SRCU 68 + void srcu_init(void); 69 + #else /* #ifdef CONFIG_SRCU */ 70 + static inline void srcu_init(void) { } 71 + #endif /* #else #ifdef CONFIG_SRCU */ 72 + 67 73 #ifdef CONFIG_DEBUG_LOCK_ALLOC 68 74 69 75 /**
-2
include/linux/srcutree.h
··· 82 82 /* callback for the barrier */ 83 83 /* operation. */ 84 84 struct delayed_work work; 85 - #ifdef CONFIG_DEBUG_LOCK_ALLOC 86 85 struct lockdep_map dep_map; 87 - #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */ 88 86 }; 89 87 90 88 /* Values for state variable (bottom bits of ->srcu_gp_seq). */
-2
include/linux/timer.h
··· 192 192 193 193 #define del_singleshot_timer_sync(t) del_timer_sync(t) 194 194 195 - extern bool timer_curr_running(struct timer_list *timer); 196 - 197 195 extern void init_timers(void); 198 196 struct hrtimer; 199 197 extern enum hrtimer_restart it_real_fn(struct hrtimer *);
+1
include/trace/events/rcu.h
··· 278 278 * "WakeNot": Don't wake rcuo kthread. 279 279 * "WakeNotPoll": Don't wake rcuo kthread because it is polling. 280 280 * "WakeOvfIsDeferred": Wake rcuo kthread later, CB list is huge. 281 + * "WakeBypassIsDeferred": Wake rcuo kthread later, bypass list is contended. 281 282 * "WokeEmpty": rcuo CB kthread woke to find empty list. 282 283 */ 283 284 TRACE_EVENT_RCU(rcu_nocb_wake,
+2
init/main.c
··· 42 42 #include <linux/profile.h> 43 43 #include <linux/kfence.h> 44 44 #include <linux/rcupdate.h> 45 + #include <linux/srcu.h> 45 46 #include <linux/moduleparam.h> 46 47 #include <linux/kallsyms.h> 47 48 #include <linux/writeback.h> ··· 1009 1008 tick_init(); 1010 1009 rcu_init_nohz(); 1011 1010 init_timers(); 1011 + srcu_init(); 1012 1012 hrtimers_init(); 1013 1013 softirq_init(); 1014 1014 timekeeping_init();
+4 -2
kernel/locking/lockdep.c
··· 6506 6506 void lockdep_rcu_suspicious(const char *file, const int line, const char *s) 6507 6507 { 6508 6508 struct task_struct *curr = current; 6509 + int dl = READ_ONCE(debug_locks); 6509 6510 6510 6511 /* Note: the following can be executed concurrently, so be careful. */ 6511 6512 pr_warn("\n"); ··· 6516 6515 pr_warn("-----------------------------\n"); 6517 6516 pr_warn("%s:%d %s!\n", file, line, s); 6518 6517 pr_warn("\nother info that might help us debug this:\n\n"); 6519 - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", 6518 + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s", 6520 6519 !rcu_lockdep_current_cpu_online() 6521 6520 ? "RCU used illegally from offline CPU!\n" 6522 6521 : "", 6523 - rcu_scheduler_active, debug_locks); 6522 + rcu_scheduler_active, dl, 6523 + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n"); 6524 6524 6525 6525 /* 6526 6526 * If a CPU is in the RCU-free window in idle (ie: in the section
+1 -1
kernel/rcu/Kconfig.debug
··· 116 116 117 117 config RCU_STRICT_GRACE_PERIOD 118 118 bool "Provide debug RCU implementation with short grace periods" 119 - depends on DEBUG_KERNEL && RCU_EXPERT 119 + depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 120 120 default n 121 121 select PREEMPT_COUNT if PREEMPT=n 122 122 help
+8 -6
kernel/rcu/rcu.h
··· 308 308 } 309 309 } 310 310 311 + extern void rcu_init_geometry(void); 312 + 311 313 /* Returns a pointer to the first leaf rcu_node structure. */ 312 314 #define rcu_first_leaf_node() (rcu_state.level[rcu_num_lvls - 1]) 313 315 ··· 424 422 425 423 #endif /* #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU) */ 426 424 427 - #ifdef CONFIG_SRCU 428 - void srcu_init(void); 429 - #else /* #ifdef CONFIG_SRCU */ 430 - static inline void srcu_init(void) { } 431 - #endif /* #else #ifdef CONFIG_SRCU */ 432 - 433 425 #ifdef CONFIG_TINY_RCU 434 426 /* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */ 435 427 static inline bool rcu_gp_is_normal(void) { return true; } ··· 437 441 void rcu_expedite_gp(void); 438 442 void rcu_unexpedite_gp(void); 439 443 void rcupdate_announce_bootup_oddness(void); 444 + #ifdef CONFIG_TASKS_RCU_GENERIC 440 445 void show_rcu_tasks_gp_kthreads(void); 446 + #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ 447 + static inline void show_rcu_tasks_gp_kthreads(void) {} 448 + #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */ 441 449 void rcu_request_urgent_qs_task(struct task_struct *t); 442 450 #endif /* #else #ifdef CONFIG_TINY_RCU */ 443 451 ··· 519 519 static inline unsigned long 520 520 srcu_batches_completed(struct srcu_struct *sp) { return 0; } 521 521 static inline void rcu_force_quiescent_state(void) { } 522 + static inline bool rcu_check_boost_fail(unsigned long gp_state, int *cpup) { return true; } 522 523 static inline void show_rcu_gp_kthreads(void) { } 523 524 static inline int rcu_get_gp_kthreads_prio(void) { return 0; } 524 525 static inline void rcu_fwd_progress_check(unsigned long j) { } ··· 528 527 unsigned long rcu_get_gp_seq(void); 529 528 unsigned long rcu_exp_batches_completed(void); 530 529 unsigned long srcu_batches_completed(struct srcu_struct *sp); 530 + bool rcu_check_boost_fail(unsigned long gp_state, int *cpup); 531 531 void show_rcu_gp_kthreads(void); 532 532 int rcu_get_gp_kthreads_prio(void); 
533 533 void rcu_fwd_progress_check(unsigned long j);
+169 -144
kernel/rcu/rcutorture.c
··· 245 245 return rcu_torture_writer_state_names[i]; 246 246 } 247 247 248 - #if defined(CONFIG_RCU_BOOST) && defined(CONFIG_PREEMPT_RT) 249 - # define rcu_can_boost() 1 250 - #else 251 - # define rcu_can_boost() 0 252 - #endif 253 - 254 248 #ifdef CONFIG_RCU_TRACE 255 249 static u64 notrace rcu_trace_clock_local(void) 256 250 { ··· 325 331 void (*read_delay)(struct torture_random_state *rrsp, 326 332 struct rt_read_seg *rtrsp); 327 333 void (*readunlock)(int idx); 334 + int (*readlock_held)(void); 328 335 unsigned long (*get_gp_seq)(void); 329 336 unsigned long (*gp_diff)(unsigned long new, unsigned long old); 330 337 void (*deferred_free)(struct rcu_torture *p); ··· 340 345 void (*fqs)(void); 341 346 void (*stats)(void); 342 347 void (*gp_kthread_dbg)(void); 348 + bool (*check_boost_failed)(unsigned long gp_state, int *cpup); 343 349 int (*stall_dur)(void); 344 350 int irq_capable; 345 351 int can_boost; ··· 354 358 /* 355 359 * Definitions for rcu torture testing. 356 360 */ 361 + 362 + static int torture_readlock_not_held(void) 363 + { 364 + return rcu_read_lock_bh_held() || rcu_read_lock_sched_held(); 365 + } 357 366 358 367 static int rcu_torture_read_lock(void) __acquires(RCU) 359 368 { ··· 484 483 } 485 484 486 485 static struct rcu_torture_ops rcu_ops = { 487 - .ttype = RCU_FLAVOR, 488 - .init = rcu_sync_torture_init, 489 - .readlock = rcu_torture_read_lock, 490 - .read_delay = rcu_read_delay, 491 - .readunlock = rcu_torture_read_unlock, 492 - .get_gp_seq = rcu_get_gp_seq, 493 - .gp_diff = rcu_seq_diff, 494 - .deferred_free = rcu_torture_deferred_free, 495 - .sync = synchronize_rcu, 496 - .exp_sync = synchronize_rcu_expedited, 497 - .get_gp_state = get_state_synchronize_rcu, 498 - .start_gp_poll = start_poll_synchronize_rcu, 499 - .poll_gp_state = poll_state_synchronize_rcu, 500 - .cond_sync = cond_synchronize_rcu, 501 - .call = call_rcu, 502 - .cb_barrier = rcu_barrier, 503 - .fqs = rcu_force_quiescent_state, 504 - .stats = NULL, 505 - .gp_kthread_dbg = 
show_rcu_gp_kthreads, 506 - .stall_dur = rcu_jiffies_till_stall_check, 507 - .irq_capable = 1, 508 - .can_boost = rcu_can_boost(), 509 - .extendables = RCUTORTURE_MAX_EXTEND, 510 - .name = "rcu" 486 + .ttype = RCU_FLAVOR, 487 + .init = rcu_sync_torture_init, 488 + .readlock = rcu_torture_read_lock, 489 + .read_delay = rcu_read_delay, 490 + .readunlock = rcu_torture_read_unlock, 491 + .readlock_held = torture_readlock_not_held, 492 + .get_gp_seq = rcu_get_gp_seq, 493 + .gp_diff = rcu_seq_diff, 494 + .deferred_free = rcu_torture_deferred_free, 495 + .sync = synchronize_rcu, 496 + .exp_sync = synchronize_rcu_expedited, 497 + .get_gp_state = get_state_synchronize_rcu, 498 + .start_gp_poll = start_poll_synchronize_rcu, 499 + .poll_gp_state = poll_state_synchronize_rcu, 500 + .cond_sync = cond_synchronize_rcu, 501 + .call = call_rcu, 502 + .cb_barrier = rcu_barrier, 503 + .fqs = rcu_force_quiescent_state, 504 + .stats = NULL, 505 + .gp_kthread_dbg = show_rcu_gp_kthreads, 506 + .check_boost_failed = rcu_check_boost_fail, 507 + .stall_dur = rcu_jiffies_till_stall_check, 508 + .irq_capable = 1, 509 + .can_boost = IS_ENABLED(CONFIG_RCU_BOOST), 510 + .extendables = RCUTORTURE_MAX_EXTEND, 511 + .name = "rcu" 511 512 }; 512 513 513 514 /* ··· 543 540 .readlock = rcu_torture_read_lock, 544 541 .read_delay = rcu_read_delay, /* just reuse rcu's version. 
*/ 545 542 .readunlock = rcu_torture_read_unlock, 543 + .readlock_held = torture_readlock_not_held, 546 544 .get_gp_seq = rcu_no_completed, 547 545 .deferred_free = rcu_busted_torture_deferred_free, 548 546 .sync = synchronize_rcu_busted, ··· 591 587 static void srcu_torture_read_unlock(int idx) __releases(srcu_ctlp) 592 588 { 593 589 srcu_read_unlock(srcu_ctlp, idx); 590 + } 591 + 592 + static int torture_srcu_read_lock_held(void) 593 + { 594 + return srcu_read_lock_held(srcu_ctlp); 594 595 } 595 596 596 597 static unsigned long srcu_torture_completed(void) ··· 655 646 .readlock = srcu_torture_read_lock, 656 647 .read_delay = srcu_read_delay, 657 648 .readunlock = srcu_torture_read_unlock, 649 + .readlock_held = torture_srcu_read_lock_held, 658 650 .get_gp_seq = srcu_torture_completed, 659 651 .deferred_free = srcu_torture_deferred_free, 660 652 .sync = srcu_torture_synchronize, ··· 691 681 .readlock = srcu_torture_read_lock, 692 682 .read_delay = srcu_read_delay, 693 683 .readunlock = srcu_torture_read_unlock, 684 + .readlock_held = torture_srcu_read_lock_held, 694 685 .get_gp_seq = srcu_torture_completed, 695 686 .deferred_free = srcu_torture_deferred_free, 696 687 .sync = srcu_torture_synchronize, ··· 711 700 .readlock = srcu_torture_read_lock, 712 701 .read_delay = rcu_read_delay, 713 702 .readunlock = srcu_torture_read_unlock, 703 + .readlock_held = torture_srcu_read_lock_held, 714 704 .get_gp_seq = srcu_torture_completed, 715 705 .deferred_free = srcu_torture_deferred_free, 716 706 .sync = srcu_torture_synchronize, ··· 799 787 .readlock = rcu_torture_read_lock_trivial, 800 788 .read_delay = rcu_read_delay, /* just reuse rcu's version. 
*/ 801 789 .readunlock = rcu_torture_read_unlock_trivial, 790 + .readlock_held = torture_readlock_not_held, 802 791 .get_gp_seq = rcu_no_completed, 803 792 .sync = synchronize_rcu_trivial, 804 793 .exp_sync = synchronize_rcu_trivial, ··· 863 850 .readlock = tasks_tracing_torture_read_lock, 864 851 .read_delay = srcu_read_delay, /* just reuse srcu's version. */ 865 852 .readunlock = tasks_tracing_torture_read_unlock, 853 + .readlock_held = rcu_read_lock_trace_held, 866 854 .get_gp_seq = rcu_no_completed, 867 855 .deferred_free = rcu_tasks_tracing_torture_deferred_free, 868 856 .sync = synchronize_rcu_tasks_trace, ··· 885 871 return cur_ops->gp_diff(new, old); 886 872 } 887 873 888 - static bool __maybe_unused torturing_tasks(void) 889 - { 890 - return cur_ops == &tasks_ops || cur_ops == &tasks_rude_ops; 891 - } 892 - 893 874 /* 894 875 * RCU torture priority-boost testing. Runs one real-time thread per 895 - * CPU for moderate bursts, repeatedly registering RCU callbacks and 896 - * spinning waiting for them to be invoked. If a given callback takes 897 - * too long to be invoked, we assume that priority inversion has occurred. 876 + * CPU for moderate bursts, repeatedly starting grace periods and waiting 877 + * for them to complete. If a given grace period takes too long, we assume 878 + * that priority inversion has occurred. 
898 879 */ 899 - 900 - struct rcu_boost_inflight { 901 - struct rcu_head rcu; 902 - int inflight; 903 - }; 904 - 905 - static void rcu_torture_boost_cb(struct rcu_head *head) 906 - { 907 - struct rcu_boost_inflight *rbip = 908 - container_of(head, struct rcu_boost_inflight, rcu); 909 - 910 - /* Ensure RCU-core accesses precede clearing ->inflight */ 911 - smp_store_release(&rbip->inflight, 0); 912 - } 913 880 914 881 static int old_rt_runtime = -1; 915 882 ··· 918 923 old_rt_runtime = -1; 919 924 } 920 925 921 - static bool rcu_torture_boost_failed(unsigned long start, unsigned long end) 926 + static bool rcu_torture_boost_failed(unsigned long gp_state, unsigned long *start) 922 927 { 928 + int cpu; 923 929 static int dbg_done; 930 + unsigned long end = jiffies; 931 + bool gp_done; 932 + unsigned long j; 933 + static unsigned long last_persist; 934 + unsigned long lp; 935 + unsigned long mininterval = test_boost_duration * HZ - HZ / 2; 924 936 925 - if (end - start > test_boost_duration * HZ - HZ / 2) { 937 + if (end - *start > mininterval) { 938 + // Recheck after checking time to avoid false positives. 939 + smp_mb(); // Time check before grace-period check. 940 + if (cur_ops->poll_gp_state(gp_state)) 941 + return false; // passed, though perhaps just barely 942 + if (cur_ops->check_boost_failed && !cur_ops->check_boost_failed(gp_state, &cpu)) { 943 + // At most one persisted message per boost test. 
944 + j = jiffies; 945 + lp = READ_ONCE(last_persist); 946 + if (time_after(j, lp + mininterval) && cmpxchg(&last_persist, lp, j) == lp) 947 + pr_info("Boost inversion persisted: No QS from CPU %d\n", cpu); 948 + return false; // passed on a technicality 949 + } 926 950 VERBOSE_TOROUT_STRING("rcu_torture_boost boosting failed"); 927 951 n_rcu_torture_boost_failure++; 928 - if (!xchg(&dbg_done, 1) && cur_ops->gp_kthread_dbg) 952 + if (!xchg(&dbg_done, 1) && cur_ops->gp_kthread_dbg) { 953 + pr_info("Boost inversion thread ->rt_priority %u gp_state %lu jiffies %lu\n", 954 + current->rt_priority, gp_state, end - *start); 929 955 cur_ops->gp_kthread_dbg(); 956 + // Recheck after print to flag grace period ending during splat. 957 + gp_done = cur_ops->poll_gp_state(gp_state); 958 + pr_info("Boost inversion: GP %lu %s.\n", gp_state, 959 + gp_done ? "ended already" : "still pending"); 930 960 931 - return true; /* failed */ 961 + } 962 + 963 + return true; // failed 964 + } else if (cur_ops->check_boost_failed && !cur_ops->check_boost_failed(gp_state, NULL)) { 965 + *start = jiffies; 932 966 } 933 967 934 - return false; /* passed */ 968 + return false; // passed 935 969 } 936 970 937 971 static int rcu_torture_boost(void *arg) 938 972 { 939 - unsigned long call_rcu_time; 940 973 unsigned long endtime; 974 + unsigned long gp_state; 975 + unsigned long gp_state_time; 941 976 unsigned long oldstarttime; 942 - struct rcu_boost_inflight rbi = { .inflight = 0 }; 943 977 944 978 VERBOSE_TOROUT_STRING("rcu_torture_boost started"); 945 979 946 980 /* Set real-time priority. */ 947 981 sched_set_fifo_low(current); 948 982 949 - init_rcu_head_on_stack(&rbi.rcu); 950 983 /* Each pass through the following loop does one boost-test cycle. 
*/ 951 984 do { 952 985 bool failed = false; // Test failed already in this test interval 953 - bool firsttime = true; 986 + bool gp_initiated = false; 954 987 955 - /* Increment n_rcu_torture_boosts once per boost-test */ 956 - while (!kthread_should_stop()) { 957 - if (mutex_trylock(&boost_mutex)) { 958 - n_rcu_torture_boosts++; 959 - mutex_unlock(&boost_mutex); 960 - break; 961 - } 962 - schedule_timeout_uninterruptible(1); 963 - } 964 988 if (kthread_should_stop()) 965 989 goto checkwait; 966 990 ··· 993 979 goto checkwait; 994 980 } 995 981 996 - /* Do one boost-test interval. */ 982 + // Do one boost-test interval. 997 983 endtime = oldstarttime + test_boost_duration * HZ; 998 984 while (time_before(jiffies, endtime)) { 999 - /* If we don't have a callback in flight, post one. */ 1000 - if (!smp_load_acquire(&rbi.inflight)) { 1001 - /* RCU core before ->inflight = 1. */ 1002 - smp_store_release(&rbi.inflight, 1); 1003 - cur_ops->call(&rbi.rcu, rcu_torture_boost_cb); 1004 - /* Check if the boost test failed */ 1005 - if (!firsttime && !failed) 1006 - failed = rcu_torture_boost_failed(call_rcu_time, jiffies); 1007 - call_rcu_time = jiffies; 1008 - firsttime = false; 985 + // Has current GP gone too long? 986 + if (gp_initiated && !failed && !cur_ops->poll_gp_state(gp_state)) 987 + failed = rcu_torture_boost_failed(gp_state, &gp_state_time); 988 + // If we don't have a grace period in flight, start one. 989 + if (!gp_initiated || cur_ops->poll_gp_state(gp_state)) { 990 + gp_state = cur_ops->start_gp_poll(); 991 + gp_initiated = true; 992 + gp_state_time = jiffies; 1009 993 } 1010 - if (stutter_wait("rcu_torture_boost")) 994 + if (stutter_wait("rcu_torture_boost")) { 1011 995 sched_set_fifo_low(current); 996 + // If the grace period already ended, 997 + // we don't know when that happened, so 998 + // start over. 
999 + if (cur_ops->poll_gp_state(gp_state)) 1000 + gp_initiated = false; 1001 + } 1012 1002 if (torture_must_stop()) 1013 1003 goto checkwait; 1014 1004 } 1015 1005 1016 - /* 1017 - * If boost never happened, then inflight will always be 1, in 1018 - * this case the boost check would never happen in the above 1019 - * loop so do another one here. 1020 - */ 1021 - if (!firsttime && !failed && smp_load_acquire(&rbi.inflight)) 1022 - rcu_torture_boost_failed(call_rcu_time, jiffies); 1006 + // In case the grace period extended beyond the end of the loop. 1007 + if (gp_initiated && !failed && !cur_ops->poll_gp_state(gp_state)) 1008 + rcu_torture_boost_failed(gp_state, &gp_state_time); 1023 1009 1024 1010 /* 1025 1011 * Set the start time of the next test interval. ··· 1028 1014 * interval. Besides, we are running at RT priority, 1029 1015 * so delays should be relatively rare. 1030 1016 */ 1031 - while (oldstarttime == boost_starttime && 1032 - !kthread_should_stop()) { 1017 + while (oldstarttime == boost_starttime && !kthread_should_stop()) { 1033 1018 if (mutex_trylock(&boost_mutex)) { 1034 - boost_starttime = jiffies + 1035 - test_boost_interval * HZ; 1019 + if (oldstarttime == boost_starttime) { 1020 + boost_starttime = jiffies + test_boost_interval * HZ; 1021 + n_rcu_torture_boosts++; 1022 + } 1036 1023 mutex_unlock(&boost_mutex); 1037 1024 break; 1038 1025 } ··· 1045 1030 sched_set_fifo_low(current); 1046 1031 } while (!torture_must_stop()); 1047 1032 1048 - while (smp_load_acquire(&rbi.inflight)) 1049 - schedule_timeout_uninterruptible(1); // rcu_barrier() deadlocks. 1050 - 1051 1033 /* Clean up and exit. 
*/ 1052 - while (!kthread_should_stop() || smp_load_acquire(&rbi.inflight)) { 1034 + while (!kthread_should_stop()) { 1053 1035 torture_shutdown_absorb("rcu_torture_boost"); 1054 1036 schedule_timeout_uninterruptible(1); 1055 1037 } 1056 - destroy_rcu_head_on_stack(&rbi.rcu); 1057 1038 torture_kthread_stopping("rcu_torture_boost"); 1058 1039 return 0; 1059 1040 } ··· 1564 1553 started = cur_ops->get_gp_seq(); 1565 1554 ts = rcu_trace_clock_local(); 1566 1555 p = rcu_dereference_check(rcu_torture_current, 1567 - rcu_read_lock_bh_held() || 1568 - rcu_read_lock_sched_held() || 1569 - srcu_read_lock_held(srcu_ctlp) || 1570 - rcu_read_lock_trace_held() || 1571 - torturing_tasks()); 1556 + !cur_ops->readlock_held || cur_ops->readlock_held()); 1572 1557 if (p == NULL) { 1573 1558 /* Wait for rcu_torture_writer to get underway */ 1574 1559 rcutorture_one_extend(&readstate, 0, trsp, rtrsp); ··· 1868 1861 torture_shutdown_absorb("rcu_torture_stats"); 1869 1862 } while (!torture_must_stop()); 1870 1863 torture_kthread_stopping("rcu_torture_stats"); 1871 - 1872 - { 1873 - struct rcu_head *rhp; 1874 - struct kmem_cache *kcp; 1875 - static int z; 1876 - 1877 - kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL); 1878 - rhp = kmem_cache_alloc(kcp, GFP_KERNEL); 1879 - pr_alert("mem_dump_obj() slab test: rcu_torture_stats = %px, &rhp = %px, rhp = %px, &z = %px\n", stats_task, &rhp, rhp, &z); 1880 - pr_alert("mem_dump_obj(ZERO_SIZE_PTR):"); 1881 - mem_dump_obj(ZERO_SIZE_PTR); 1882 - pr_alert("mem_dump_obj(NULL):"); 1883 - mem_dump_obj(NULL); 1884 - pr_alert("mem_dump_obj(%px):", &rhp); 1885 - mem_dump_obj(&rhp); 1886 - pr_alert("mem_dump_obj(%px):", rhp); 1887 - mem_dump_obj(rhp); 1888 - pr_alert("mem_dump_obj(%px):", &rhp->func); 1889 - mem_dump_obj(&rhp->func); 1890 - pr_alert("mem_dump_obj(%px):", &z); 1891 - mem_dump_obj(&z); 1892 - kmem_cache_free(kcp, rhp); 1893 - kmem_cache_destroy(kcp); 1894 - rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 1895 - 
pr_alert("mem_dump_obj() kmalloc test: rcu_torture_stats = %px, &rhp = %px, rhp = %px\n", stats_task, &rhp, rhp); 1896 - pr_alert("mem_dump_obj(kmalloc %px):", rhp); 1897 - mem_dump_obj(rhp); 1898 - pr_alert("mem_dump_obj(kmalloc %px):", &rhp->func); 1899 - mem_dump_obj(&rhp->func); 1900 - kfree(rhp); 1901 - rhp = vmalloc(4096); 1902 - pr_alert("mem_dump_obj() vmalloc test: rcu_torture_stats = %px, &rhp = %px, rhp = %px\n", stats_task, &rhp, rhp); 1903 - pr_alert("mem_dump_obj(vmalloc %px):", rhp); 1904 - mem_dump_obj(rhp); 1905 - pr_alert("mem_dump_obj(vmalloc %px):", &rhp->func); 1906 - mem_dump_obj(&rhp->func); 1907 - vfree(rhp); 1908 - } 1909 - 1910 1864 return 0; 1865 + } 1866 + 1867 + /* Test mem_dump_obj() and friends. */ 1868 + static void rcu_torture_mem_dump_obj(void) 1869 + { 1870 + struct rcu_head *rhp; 1871 + struct kmem_cache *kcp; 1872 + static int z; 1873 + 1874 + kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL); 1875 + rhp = kmem_cache_alloc(kcp, GFP_KERNEL); 1876 + pr_alert("mem_dump_obj() slab test: rcu_torture_stats = %px, &rhp = %px, rhp = %px, &z = %px\n", stats_task, &rhp, rhp, &z); 1877 + pr_alert("mem_dump_obj(ZERO_SIZE_PTR):"); 1878 + mem_dump_obj(ZERO_SIZE_PTR); 1879 + pr_alert("mem_dump_obj(NULL):"); 1880 + mem_dump_obj(NULL); 1881 + pr_alert("mem_dump_obj(%px):", &rhp); 1882 + mem_dump_obj(&rhp); 1883 + pr_alert("mem_dump_obj(%px):", rhp); 1884 + mem_dump_obj(rhp); 1885 + pr_alert("mem_dump_obj(%px):", &rhp->func); 1886 + mem_dump_obj(&rhp->func); 1887 + pr_alert("mem_dump_obj(%px):", &z); 1888 + mem_dump_obj(&z); 1889 + kmem_cache_free(kcp, rhp); 1890 + kmem_cache_destroy(kcp); 1891 + rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 1892 + pr_alert("mem_dump_obj() kmalloc test: rcu_torture_stats = %px, &rhp = %px, rhp = %px\n", stats_task, &rhp, rhp); 1893 + pr_alert("mem_dump_obj(kmalloc %px):", rhp); 1894 + mem_dump_obj(rhp); 1895 + pr_alert("mem_dump_obj(kmalloc %px):", &rhp->func); 1896 + mem_dump_obj(&rhp->func); 1897 + 
kfree(rhp); 1898 + rhp = vmalloc(4096); 1899 + pr_alert("mem_dump_obj() vmalloc test: rcu_torture_stats = %px, &rhp = %px, rhp = %px\n", stats_task, &rhp, rhp); 1900 + pr_alert("mem_dump_obj(vmalloc %px):", rhp); 1901 + mem_dump_obj(rhp); 1902 + pr_alert("mem_dump_obj(vmalloc %px):", &rhp->func); 1903 + mem_dump_obj(&rhp->func); 1904 + vfree(rhp); 1911 1905 } 1912 1906 1913 1907 static void ··· 2642 2634 2643 2635 if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2) 2644 2636 return false; 2645 - if (!cur_ops->call) 2637 + if (!cur_ops->start_gp_poll || !cur_ops->poll_gp_state) 2646 2638 return false; 2647 2639 2648 2640 prio = rcu_get_gp_kthreads_prio(); ··· 2650 2642 return false; 2651 2643 2652 2644 if (prio < 2) { 2653 - if (boost_warn_once == 1) 2645 + if (boost_warn_once == 1) 2654 2646 return false; 2655 2647 2656 2648 pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME); ··· 2825 2817 cur_ops->cb_barrier(); 2826 2818 if (cur_ops->cleanup != NULL) 2827 2819 cur_ops->cleanup(); 2820 + 2821 + rcu_torture_mem_dump_obj(); 2828 2822 2829 2823 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ 2830 2824 ··· 3130 3120 if (firsterr < 0) 3131 3121 goto unwind; 3132 3122 rcutor_hp = firsterr; 3123 + 3124 + // Testing RCU priority boosting requires rcutorture do 3125 + // some serious abuse. Counter this by running ksoftirqd 3126 + // at higher priority. 3127 + if (IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)) { 3128 + for_each_online_cpu(cpu) { 3129 + struct sched_param sp; 3130 + struct task_struct *t; 3131 + 3132 + t = per_cpu(ksoftirqd, cpu); 3133 + WARN_ON_ONCE(!t); 3134 + sp.sched_priority = 2; 3135 + sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); 3136 + } 3137 + } 3133 3138 } 3134 3139 shutdown_jiffies = jiffies + shutdown_secs * HZ; 3135 3140 firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
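The rcutorture boost-test rework above drops the callback-and-`inflight` bookkeeping in favor of the polled grace-period interface (`start_gp_poll`/`poll_gp_state`, backed in the RCU flavor by `start_poll_synchronize_rcu()` and `poll_state_synchronize_rcu()`). A userspace sketch of the polling idea, with a plain monotonic counter standing in for the kernel's `rcu_seq`-encoded grace-period counter (all `model_*` names are hypothetical):

```c
#include <stdbool.h>

/* Hypothetical userspace model of the polled grace-period API: a
 * snapshot taken at start_poll time names a grace period that has
 * not yet completed; polling succeeds once the counter has advanced
 * to or past that snapshot. The real kernel counter uses rcu_seq
 * even/odd encoding, which this sketch omits. */
static unsigned long gp_seq;            /* models rcu_state.gp_seq */

static unsigned long model_start_poll(void)
{
	return gp_seq + 1;              /* a future GP must reach this value */
}

static bool model_poll_state(unsigned long oldstate)
{
	return gp_seq >= oldstate;      /* has a full GP elapsed since then? */
}

static void model_gp_end(void)
{
	gp_seq++;                       /* grace-period kthread finishes a GP */
}
```

This is why the reworked boost loop can simply re-check `poll_gp_state(gp_state)` after the loop: the snapshot remains valid however long the grace period takes.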
+107 -2
kernel/rcu/refscale.c
··· 362 362 .name = "rwsem" 363 363 }; 364 364 365 + // Definitions for global spinlock 366 + static DEFINE_SPINLOCK(test_lock); 367 + 368 + static void ref_lock_section(const int nloops) 369 + { 370 + int i; 371 + 372 + preempt_disable(); 373 + for (i = nloops; i >= 0; i--) { 374 + spin_lock(&test_lock); 375 + spin_unlock(&test_lock); 376 + } 377 + preempt_enable(); 378 + } 379 + 380 + static void ref_lock_delay_section(const int nloops, const int udl, const int ndl) 381 + { 382 + int i; 383 + 384 + preempt_disable(); 385 + for (i = nloops; i >= 0; i--) { 386 + spin_lock(&test_lock); 387 + un_delay(udl, ndl); 388 + spin_unlock(&test_lock); 389 + } 390 + preempt_enable(); 391 + } 392 + 393 + static struct ref_scale_ops lock_ops = { 394 + .readsection = ref_lock_section, 395 + .delaysection = ref_lock_delay_section, 396 + .name = "lock" 397 + }; 398 + 399 + // Definitions for global irq-save spinlock 400 + 401 + static void ref_lock_irq_section(const int nloops) 402 + { 403 + unsigned long flags; 404 + int i; 405 + 406 + preempt_disable(); 407 + for (i = nloops; i >= 0; i--) { 408 + spin_lock_irqsave(&test_lock, flags); 409 + spin_unlock_irqrestore(&test_lock, flags); 410 + } 411 + preempt_enable(); 412 + } 413 + 414 + static void ref_lock_irq_delay_section(const int nloops, const int udl, const int ndl) 415 + { 416 + unsigned long flags; 417 + int i; 418 + 419 + preempt_disable(); 420 + for (i = nloops; i >= 0; i--) { 421 + spin_lock_irqsave(&test_lock, flags); 422 + un_delay(udl, ndl); 423 + spin_unlock_irqrestore(&test_lock, flags); 424 + } 425 + preempt_enable(); 426 + } 427 + 428 + static struct ref_scale_ops lock_irq_ops = { 429 + .readsection = ref_lock_irq_section, 430 + .delaysection = ref_lock_irq_delay_section, 431 + .name = "lock-irq" 432 + }; 433 + 434 + // Definitions acquire-release. 
435 + static DEFINE_PER_CPU(unsigned long, test_acqrel); 436 + 437 + static void ref_acqrel_section(const int nloops) 438 + { 439 + unsigned long x; 440 + int i; 441 + 442 + preempt_disable(); 443 + for (i = nloops; i >= 0; i--) { 444 + x = smp_load_acquire(this_cpu_ptr(&test_acqrel)); 445 + smp_store_release(this_cpu_ptr(&test_acqrel), x + 1); 446 + } 447 + preempt_enable(); 448 + } 449 + 450 + static void ref_acqrel_delay_section(const int nloops, const int udl, const int ndl) 451 + { 452 + unsigned long x; 453 + int i; 454 + 455 + preempt_disable(); 456 + for (i = nloops; i >= 0; i--) { 457 + x = smp_load_acquire(this_cpu_ptr(&test_acqrel)); 458 + un_delay(udl, ndl); 459 + smp_store_release(this_cpu_ptr(&test_acqrel), x + 1); 460 + } 461 + preempt_enable(); 462 + } 463 + 464 + static struct ref_scale_ops acqrel_ops = { 465 + .readsection = ref_acqrel_section, 466 + .delaysection = ref_acqrel_delay_section, 467 + .name = "acqrel" 468 + }; 469 + 365 470 static void rcu_scale_one_reader(void) 366 471 { 367 472 if (readdelay <= 0) ··· 758 653 long i; 759 654 int firsterr = 0; 760 655 static struct ref_scale_ops *scale_ops[] = { 761 - &rcu_ops, &srcu_ops, &rcu_trace_ops, &rcu_tasks_ops, 762 - &refcnt_ops, &rwlock_ops, &rwsem_ops, 656 + &rcu_ops, &srcu_ops, &rcu_trace_ops, &rcu_tasks_ops, &refcnt_ops, &rwlock_ops, 657 + &rwsem_ops, &lock_ops, &lock_irq_ops, &acqrel_ops, 763 658 }; 764 659 765 660 if (!torture_init_begin(scale_type, verbose))
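The new "acqrel" refscale variant above times a load-acquire/store-release pair on a per-CPU counter. A userspace analogue using C11 atomics (a sketch: the kernel's `smp_load_acquire()`/`smp_store_release()` correspond to these orderings, but the kernel version runs on a per-CPU variable with preemption disabled, which is not modeled here):

```c
#include <stdatomic.h>

/* Userspace analogue of ref_acqrel_section(): repeatedly perform an
 * acquire load followed by a release store of the incremented value.
 * Like the kernel loop, this iterates nloops + 1 times (i counts
 * from nloops down to 0). */
static _Atomic unsigned long test_acqrel;

static void acqrel_section(int nloops)
{
	for (int i = nloops; i >= 0; i--) {
		unsigned long x = atomic_load_explicit(&test_acqrel,
						       memory_order_acquire);
		atomic_store_explicit(&test_acqrel, x + 1,
				      memory_order_release);
	}
}
```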
+15 -13
kernel/rcu/srcutree.c
··· 80 80 * srcu_read_unlock() running against them. So if the is_static parameter 81 81 * is set, don't initialize ->srcu_lock_count[] and ->srcu_unlock_count[]. 82 82 */ 83 - static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static) 83 + static void init_srcu_struct_nodes(struct srcu_struct *ssp) 84 84 { 85 85 int cpu; 86 86 int i; ··· 89 89 struct srcu_data *sdp; 90 90 struct srcu_node *snp; 91 91 struct srcu_node *snp_first; 92 + 93 + /* Initialize geometry if it has not already been initialized. */ 94 + rcu_init_geometry(); 92 95 93 96 /* Work out the overall tree geometry. */ 94 97 ssp->level[0] = &ssp->node[0]; ··· 151 148 timer_setup(&sdp->delay_work, srcu_delay_timer, 0); 152 149 sdp->ssp = ssp; 153 150 sdp->grpmask = 1 << (cpu - sdp->mynode->grplo); 154 - if (is_static) 155 - continue; 156 - 157 - /* Dynamically allocated, better be no srcu_read_locks()! */ 158 - for (i = 0; i < ARRAY_SIZE(sdp->srcu_lock_count); i++) { 159 - sdp->srcu_lock_count[i] = 0; 160 - sdp->srcu_unlock_count[i] = 0; 161 - } 162 151 } 163 152 } 164 153 ··· 174 179 ssp->sda = alloc_percpu(struct srcu_data); 175 180 if (!ssp->sda) 176 181 return -ENOMEM; 177 - init_srcu_struct_nodes(ssp, is_static); 182 + init_srcu_struct_nodes(ssp); 178 183 ssp->srcu_gp_seq_needed_exp = 0; 179 184 ssp->srcu_last_gp_end = ktime_get_mono_fast_ns(); 180 185 smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */ ··· 772 777 spin_unlock_irqrestore_rcu_node(sdp, flags); 773 778 774 779 /* 775 - * No local callbacks, so probabalistically probe global state. 780 + * No local callbacks, so probabilistically probe global state. 776 781 * Exact information would require acquiring locks, which would 777 - * kill scalability, hence the probabalistic nature of the probe. 782 + * kill scalability, hence the probabilistic nature of the probe. 778 783 */ 779 784 780 785 /* First, see if enough time has passed since the last GP. 
*/ ··· 994 999 * Of course, these memory-ordering guarantees apply only when 995 1000 * synchronize_srcu(), srcu_read_lock(), and srcu_read_unlock() are 996 1001 * passed the same srcu_struct structure. 1002 + * 1003 + * Implementation of these memory-ordering guarantees is similar to 1004 + * that of synchronize_rcu(). 997 1005 * 998 1006 * If SRCU is likely idle, expedite the first request. This semantic 999 1007 * was provided by Classic SRCU, and is relied upon by its users, so TREE ··· 1390 1392 { 1391 1393 struct srcu_struct *ssp; 1392 1394 1395 + /* 1396 + * Once that is set, call_srcu() can follow the normal path and 1397 + * queue delayed work. This must follow RCU workqueues creation 1398 + * and timers initialization. 1399 + */ 1393 1400 srcu_init_done = true; 1394 1401 while (!list_empty(&srcu_boot_list)) { 1395 1402 ssp = list_first_entry(&srcu_boot_list, struct srcu_struct, 1396 1403 work.work.entry); 1397 - check_init_srcu_struct(ssp); 1398 1404 list_del_init(&ssp->work.work.entry); 1399 1405 queue_work(rcu_gp_wq, &ssp->work.work); 1400 1406 }
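The srcutree.c changes above revolve around SRCU's per-index reader accounting: readers bump a lock counter for the active index on entry and the matching unlock counter on exit, and a grace period for an index completes only once those counters balance. A single-threaded toy model of that bookkeeping (the real code sums per-CPU counters, flips the index, and relies on memory barriers, none of which is modeled; `model_*` names are hypothetical):

```c
#include <stdbool.h>

/* Toy model of SRCU reader accounting. Each reader records the index
 * it entered under and increments the matching unlock counter on
 * exit; readers_done() for an index means no reader remains inside
 * that epoch. */
static unsigned long srcu_lock_count[2];
static unsigned long srcu_unlock_count[2];
static int srcu_idx;                    /* active index, flipped per GP */

static int model_srcu_read_lock(void)
{
	int idx = srcu_idx;

	srcu_lock_count[idx]++;
	return idx;                     /* reader passes idx to unlock */
}

static void model_srcu_read_unlock(int idx)
{
	srcu_unlock_count[idx]++;
}

static bool model_srcu_readers_done(int idx)
{
	return srcu_unlock_count[idx] == srcu_lock_count[idx];
}
```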
+2 -2
kernel/rcu/sync.c
··· 94 94 rcu_sync_call(rsp); 95 95 } else { 96 96 /* 97 - * We're at least a GP after the last rcu_sync_exit(); eveybody 97 + * We're at least a GP after the last rcu_sync_exit(); everybody 98 98 * will now have observed the write side critical section. 99 - * Let 'em rip!. 99 + * Let 'em rip! 100 100 */ 101 101 WRITE_ONCE(rsp->gp_state, GP_IDLE); 102 102 }
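The comment being fixed above describes rcu_sync's guarantee: once a grace period has elapsed after the last `rcu_sync_exit()`, every reader has observed the end of the write-side critical section and may return to the fast path. A toy model of the `gp_state` progression (states follow kernel/rcu/sync.c, but the asynchronous grace-period callbacks are collapsed into direct transitions, and GP_REPLAY handling is omitted):

```c
#include <stdbool.h>

/* Toy model of rcu_sync state transitions: readers use a fast path
 * while gp_state is GP_IDLE; a writer entering forces the slow path
 * after one grace period, and exiting restores the fast path after
 * another. */
enum model_gp_state { GP_IDLE, GP_ENTER, GP_PASSED, GP_EXIT };

static enum model_gp_state gp_state = GP_IDLE;

static bool model_rcu_sync_is_idle(void)
{
	return gp_state == GP_IDLE;
}

static void model_rcu_sync_enter(void)
{
	gp_state = GP_ENTER;
	/* ...a grace period elapses here... */
	gp_state = GP_PASSED;           /* readers now take the slow path */
}

static void model_rcu_sync_exit(void)
{
	gp_state = GP_EXIT;
	/* ...a grace period elapses; everybody has now observed the
	 * write-side critical section. Let 'em rip! */
	gp_state = GP_IDLE;
}
```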
+51 -7
kernel/rcu/tasks.h
··· 23 23 * struct rcu_tasks - Definition for a Tasks-RCU-like mechanism. 24 24 * @cbs_head: Head of callback list. 25 25 * @cbs_tail: Tail pointer for callback list. 26 - * @cbs_wq: Wait queue allowning new callback to get kthread's attention. 26 + * @cbs_wq: Wait queue allowing new callback to get kthread's attention. 27 27 * @cbs_lock: Lock protecting callback list. 28 28 * @kthread_ptr: This flavor's grace-period/callback-invocation kthread. 29 29 * @gp_func: This flavor's grace-period-wait function. ··· 377 377 // Finally, this implementation does not support high call_rcu_tasks() 378 378 // rates from multiple CPUs. If this is required, per-CPU callback lists 379 379 // will be needed. 380 + // 381 + // The implementation uses rcu_tasks_wait_gp(), which relies on function 382 + // pointers in the rcu_tasks structure. The rcu_spawn_tasks_kthread() 383 + // function sets these function pointers up so that rcu_tasks_wait_gp() 384 + // invokes these functions in this order: 385 + // 386 + // rcu_tasks_pregp_step(): 387 + // Invokes synchronize_rcu() in order to wait for all in-flight 388 + // t->on_rq and t->nvcsw transitions to complete. This works because 389 + // all such transitions are carried out with interrupts disabled. 390 + // rcu_tasks_pertask(), invoked on every non-idle task: 391 + // For every runnable non-idle task other than the current one, use 392 + // get_task_struct() to pin down that task, snapshot that task's 393 + // number of voluntary context switches, and add that task to the 394 + // holdout list. 395 + // rcu_tasks_postscan(): 396 + // Invoke synchronize_srcu() to ensure that all tasks that were 397 + // in the process of exiting (and which thus might not know to 398 + // synchronize with this RCU Tasks grace period) have completed 399 + // exiting. 400 + // check_all_holdout_tasks(), repeatedly until holdout list is empty: 401 + // Scans the holdout list, attempting to identify a quiescent state 402 + // for each task on the list. 
If there is a quiescent state, the 403 + // corresponding task is removed from the holdout list. 404 + // rcu_tasks_postgp(): 405 + // Invokes synchronize_rcu() in order to ensure that all prior 406 + // t->on_rq and t->nvcsw transitions are seen by all CPUs and tasks 407 + // to have happened before the end of this RCU Tasks grace period. 408 + // Again, this works because all such transitions are carried out 409 + // with interrupts disabled. 410 + // 411 + // For each exiting task, the exit_tasks_rcu_start() and 412 + // exit_tasks_rcu_finish() functions begin and end, respectively, the SRCU 413 + // read-side critical sections waited for by rcu_tasks_postscan(). 414 + // 415 + // Pre-grace-period update-side code is ordered before the grace via the 416 + // ->cbs_lock and the smp_mb__after_spinlock(). Pre-grace-period read-side 417 + // code is ordered before the grace period via synchronize_rcu() call 418 + // in rcu_tasks_pregp_step() and by the scheduler's locks and interrupt 419 + // disabling. 380 420 381 421 /* Pre-grace-period preparation. */ 382 422 static void rcu_tasks_pregp_step(void) ··· 544 504 * or transition to usermode execution. As such, there are no read-side 545 505 * primitives analogous to rcu_read_lock() and rcu_read_unlock() because 546 506 * this primitive is intended to determine that all tasks have passed 547 - * through a safe state, not so much for data-strcuture synchronization. 507 + * through a safe state, not so much for data-structure synchronization. 548 508 * 549 509 * See the description of call_rcu() for more detailed information on 550 510 * memory ordering guarantees. ··· 645 605 // passing an empty function to schedule_on_each_cpu(). This approach 646 606 // provides an asynchronous call_rcu_tasks_rude() API and batching 647 607 // of concurrent calls to the synchronous synchronize_rcu_rude() API. 
648 - // This sends IPIs far and wide and induces otherwise unnecessary context 649 - // switches on all online CPUs, whether idle or not. 608 + // This invokes schedule_on_each_cpu() in order to send IPIs far and wide 609 + // and induces otherwise unnecessary context switches on all online CPUs, 610 + // whether idle or not. 611 + // 612 + // Callback handling is provided by the rcu_tasks_kthread() function. 613 + // 614 + // Ordering is provided by the scheduler's context-switch code. 650 615 651 616 // Empty function to allow workqueues to force a context switch. 652 617 static void rcu_tasks_be_rude(struct work_struct *work) ··· 682 637 * there are no read-side primitives analogous to rcu_read_lock() and 683 638 * rcu_read_unlock() because this primitive is intended to determine 684 639 * that all tasks have passed through a safe state, not so much for 685 - * data-strcuture synchronization. 640 + * data-structure synchronization. 686 641 * 687 642 * See the description of call_rcu() for more detailed information on 688 643 * memory ordering guarantees. ··· 1208 1163 * there are no read-side primitives analogous to rcu_read_lock() and 1209 1164 * rcu_read_unlock() because this primitive is intended to determine 1210 1165 * that all tasks have passed through a safe state, not so much for 1211 - * data-strcuture synchronization. 1166 + * data-structure synchronization. 1212 1167 * 1213 1168 * See the description of call_rcu() for more detailed information on 1214 1169 * memory ordering guarantees. ··· 1401 1356 1402 1357 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ 1403 1358 static inline void rcu_tasks_bootup_oddness(void) {} 1404 - void show_rcu_tasks_gp_kthreads(void) {} 1405 1359 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
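The new design comment above walks through the RCU Tasks grace-period algorithm: snapshot each task's voluntary-context-switch count in `rcu_tasks_pertask()`, then repeatedly rescan the holdout list, dropping any task whose count has advanced. A userspace sketch of just that holdout scan (task pinning via `get_task_struct()`, the SRCU handling of exiting tasks, and the bracketing `synchronize_rcu()` calls are not modeled; `model_*` names are hypothetical):

```c
#include <stddef.h>

/* Toy model of the RCU Tasks holdout scan: a task leaves the holdout
 * list once its voluntary-context-switch count differs from the
 * snapshot taken at grace-period start, i.e. once it has passed
 * through a quiescent state. */
struct model_task {
	unsigned long nvcsw;            /* voluntary context switches */
	unsigned long nvcsw_snap;       /* snapshot at GP start */
	int on_holdout;
};

static void model_pertask(struct model_task *t)
{
	t->nvcsw_snap = t->nvcsw;       /* as in rcu_tasks_pertask() */
	t->on_holdout = 1;
}

/* Returns the number of tasks still holding out. */
static int model_check_holdouts(struct model_task *tasks, size_t n)
{
	int remaining = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].on_holdout)
			continue;
		if (tasks[i].nvcsw != tasks[i].nvcsw_snap)
			tasks[i].on_holdout = 0; /* quiescent state seen */
		else
			remaining++;
	}
	return remaining;
}
```

The grace-period kthread would loop on a scan like this until it returns zero, then run the post-GP step.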
-1
kernel/rcu/tiny.c
··· 221 221 { 222 222 open_softirq(RCU_SOFTIRQ, rcu_process_callbacks); 223 223 rcu_early_boot_tests(); 224 - srcu_init(); 225 224 }
+175 -140
kernel/rcu/tree.c
··· 188 188 static int rcu_min_cached_objs = 5; 189 189 module_param(rcu_min_cached_objs, int, 0444); 190 190 191 + // A page shrinker can ask for pages to be freed to make them 192 + // available for other parts of the system. This usually happens 193 + // under low memory conditions, and in that case we should also 194 + // defer page-cache filling for a short time period. 195 + // 196 + // The default value is 5 seconds, which is long enough to reduce 197 + // interference with the shrinker while it asks other systems to 198 + // drain their caches. 199 + static int rcu_delay_page_cache_fill_msec = 5000; 200 + module_param(rcu_delay_page_cache_fill_msec, int, 0444); 201 + 191 202 /* Retrieve RCU kthreads priority for rcutorture */ 192 203 int rcu_get_gp_kthreads_prio(void) 193 204 { ··· 215 204 * the need for long delays to increase some race probabilities with the 216 205 * need for fast grace periods to increase other race probabilities. 217 206 */ 218 - #define PER_RCU_NODE_PERIOD 3 /* Number of grace periods between delays. */ 207 + #define PER_RCU_NODE_PERIOD 3 /* Number of grace periods between delays for debugging. */ 219 208 220 209 /* 221 210 * Compute the mask of online CPUs for the specified rcu_node structure. ··· 255 244 { 256 245 rcu_qs(); 257 246 rcu_preempt_deferred_qs(current); 247 + rcu_tasks_qs(current, false); 258 248 } 259 249 260 250 /* ··· 847 835 rcu_nmi_exit(); 848 836 } 849 837 850 - /** 851 - * rcu_irq_exit_preempt - Inform RCU that current CPU is exiting irq 852 - * towards in kernel preemption 853 - * 854 - * Same as rcu_irq_exit() but has a sanity check that scheduling is safe 855 - * from RCU point of view. Invoked from return from interrupt before kernel 856 - * preemption. 
857 - */ 858 - void rcu_irq_exit_preempt(void) 859 - { 860 - lockdep_assert_irqs_disabled(); 861 - rcu_nmi_exit(); 862 - 863 - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) <= 0, 864 - "RCU dynticks_nesting counter underflow/zero!"); 865 - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 866 - DYNTICK_IRQ_NONIDLE, 867 - "Bad RCU dynticks_nmi_nesting counter\n"); 868 - RCU_LOCKDEP_WARN(rcu_dynticks_curr_cpu_in_eqs(), 869 - "RCU in extended quiescent state!"); 870 - } 871 - 872 838 #ifdef CONFIG_PROVE_RCU 873 839 /** 874 840 * rcu_irq_exit_check_preempt - Validate that scheduling is possible ··· 951 961 */ 952 962 void noinstr rcu_user_exit(void) 953 963 { 954 - rcu_eqs_exit(1); 964 + rcu_eqs_exit(true); 955 965 } 956 966 957 967 /** ··· 1217 1227 #endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */ 1218 1228 1219 1229 /* 1220 - * We are reporting a quiescent state on behalf of some other CPU, so 1230 + * When trying to report a quiescent state on behalf of some other CPU, 1221 1231 * it is our responsibility to check for and handle potential overflow 1222 1232 * of the rcu_node ->gp_seq counter with respect to the rcu_data counters. 1223 1233 * After all, the CPU might be in deep idle state, and thus executing no ··· 2040 2050 /* 2041 2051 * Clean up after the old grace period. 2042 2052 */ 2043 - static void rcu_gp_cleanup(void) 2053 + static noinline void rcu_gp_cleanup(void) 2044 2054 { 2045 2055 int cpu; 2046 2056 bool needgp = false; ··· 2481 2491 2482 2492 /* 2483 2493 * Invoke any RCU callbacks that have made it to the end of their grace 2484 - * period. Thottle as specified by rdp->blimit. 2494 + * period. Throttle as specified by rdp->blimit. 2485 2495 */ 2486 2496 static void rcu_do_batch(struct rcu_data *rdp) 2487 2497 { ··· 2621 2631 * state, for example, user mode or idle loop. It also schedules RCU 2622 2632 * core processing. 
If the current grace period has gone on too long, 2623 2633 * it will ask the scheduler to manufacture a context switch for the sole 2624 - * purpose of providing a providing the needed quiescent state. 2634 + * purpose of providing the needed quiescent state. 2625 2635 */ 2626 2636 void rcu_sched_clock_irq(int user) 2627 2637 { ··· 2903 2913 "%s: Could not start rcuc kthread, OOM is now expected behavior\n", __func__); 2904 2914 return 0; 2905 2915 } 2906 - early_initcall(rcu_spawn_core_kthreads); 2907 2916 2908 2917 /* 2909 2918 * Handle any core-RCU processing required by a call_rcu() invocation. ··· 3073 3084 * period elapses, in other words after all pre-existing RCU read-side 3074 3085 * critical sections have completed. However, the callback function 3075 3086 * might well execute concurrently with RCU read-side critical sections 3076 - * that started after call_rcu() was invoked. RCU read-side critical 3077 - * sections are delimited by rcu_read_lock() and rcu_read_unlock(), and 3078 - * may be nested. In addition, regions of code across which interrupts, 3079 - * preemption, or softirqs have been disabled also serve as RCU read-side 3080 - * critical sections. This includes hardware interrupt handlers, softirq 3081 - * handlers, and NMI handlers. 3087 + * that started after call_rcu() was invoked. 3088 + * 3089 + * RCU read-side critical sections are delimited by rcu_read_lock() 3090 + * and rcu_read_unlock(), and may be nested. In addition, but only in 3091 + * v5.0 and later, regions of code across which interrupts, preemption, 3092 + * or softirqs have been disabled also serve as RCU read-side critical 3093 + * sections. This includes hardware interrupt handlers, softirq handlers, 3094 + * and NMI handlers. 3082 3095 * 3083 3096 * Note that all CPUs must agree that the grace period extended beyond 3084 3097 * all pre-existing RCU read-side critical section. 
On systems with more ··· 3100 3109 * between the call to call_rcu() and the invocation of "func()" -- even 3101 3110 * if CPU A and CPU B are the same CPU (but again only if the system has 3102 3111 * more than one CPU). 3112 + * 3113 + * Implementation of these memory-ordering guarantees is described here: 3114 + * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst. 3103 3115 */ 3104 3116 void call_rcu(struct rcu_head *head, rcu_callback_t func) 3105 3117 { ··· 3167 3173 * Even though it is lockless an access has to be protected by the 3168 3174 * per-cpu lock. 3169 3175 * @page_cache_work: A work to refill the cache when it is empty 3176 + * @backoff_page_cache_fill: Delay cache refills 3170 3177 * @work_in_progress: Indicates that page_cache_work is running 3171 3178 * @hrtimer: A hrtimer for scheduling a page_cache_work 3172 3179 * @nr_bkv_objs: number of allocated objects at @bkvcache. ··· 3187 3192 bool initialized; 3188 3193 int count; 3189 3194 3190 - struct work_struct page_cache_work; 3195 + struct delayed_work page_cache_work; 3196 + atomic_t backoff_page_cache_fill; 3191 3197 atomic_t work_in_progress; 3192 3198 struct hrtimer hrtimer; 3193 3199 ··· 3235 3239 if (!krcp->nr_bkv_objs) 3236 3240 return NULL; 3237 3241 3238 - krcp->nr_bkv_objs--; 3242 + WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1); 3239 3243 return (struct kvfree_rcu_bulk_data *) 3240 3244 llist_del_first(&krcp->bkvcache); 3241 3245 } ··· 3249 3253 return false; 3250 3254 3251 3255 llist_add((struct llist_node *) bnode, &krcp->bkvcache); 3252 - krcp->nr_bkv_objs++; 3256 + WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1); 3253 3257 return true; 3258 + } 3254 3259 3260 + static int 3261 + drain_page_cache(struct kfree_rcu_cpu *krcp) 3262 + { 3263 + unsigned long flags; 3264 + struct llist_node *page_list, *pos, *n; 3265 + int freed = 0; 3266 + 3267 + raw_spin_lock_irqsave(&krcp->lock, flags); 3268 + page_list = llist_del_all(&krcp->bkvcache); 3269 + 
WRITE_ONCE(krcp->nr_bkv_objs, 0); 3270 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3271 + 3272 + llist_for_each_safe(pos, n, page_list) { 3273 + free_page((unsigned long)pos); 3274 + freed++; 3275 + } 3276 + 3277 + return freed; 3255 3278 } 3256 3279 3257 3280 /* 3258 3281 * This function is invoked in workqueue context after a grace period. 3259 - * It frees all the objects queued on ->bhead_free or ->head_free. 3282 + * It frees all the objects queued on ->bkvhead_free or ->head_free. 3260 3283 */ 3261 3284 static void kfree_rcu_work(struct work_struct *work) 3262 3285 { ··· 3302 3287 krwp->head_free = NULL; 3303 3288 raw_spin_unlock_irqrestore(&krcp->lock, flags); 3304 3289 3305 - // Handle two first channels. 3290 + // Handle the first two channels. 3306 3291 for (i = 0; i < FREE_N_CHANNELS; i++) { 3307 3292 for (; bkvhead[i]; bkvhead[i] = bnext) { 3308 3293 bnext = bkvhead[i]->next; ··· 3340 3325 } 3341 3326 3342 3327 /* 3343 - * Emergency case only. It can happen under low memory 3344 - * condition when an allocation gets failed, so the "bulk" 3345 - * path can not be temporary maintained. 3328 + * This is used when the "bulk" path can not be used for the 3329 + * double-argument of kvfree_rcu(). This happens when the 3330 + * page-cache is empty, which means that objects are instead 3331 + * queued on a linked list through their rcu_head structures. 3332 + * This list is named "Channel 3". 3346 3333 */ 3347 3334 for (; head; head = next) { 3348 3335 unsigned long offset = (unsigned long)head->func; ··· 3364 3347 } 3365 3348 3366 3349 /* 3367 - * Schedule the kfree batch RCU work to run in workqueue context after a GP. 3368 - * 3369 - * This function is invoked by kfree_rcu_monitor() when the KFREE_DRAIN_JIFFIES 3370 - * timeout has been reached. 3350 + * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. 
3371 3351 */ 3372 - static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp) 3352 + static void kfree_rcu_monitor(struct work_struct *work) 3373 3353 { 3374 - struct kfree_rcu_cpu_work *krwp; 3375 - bool repeat = false; 3354 + struct kfree_rcu_cpu *krcp = container_of(work, 3355 + struct kfree_rcu_cpu, monitor_work.work); 3356 + unsigned long flags; 3376 3357 int i, j; 3377 3358 3378 - lockdep_assert_held(&krcp->lock); 3359 + raw_spin_lock_irqsave(&krcp->lock, flags); 3379 3360 3361 + // Attempt to start a new batch. 3380 3362 for (i = 0; i < KFREE_N_BATCHES; i++) { 3381 - krwp = &(krcp->krw_arr[i]); 3363 + struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]); 3382 3364 3383 - /* 3384 - * Try to detach bkvhead or head and attach it over any 3385 - * available corresponding free channel. It can be that 3386 - * a previous RCU batch is in progress, it means that 3387 - * immediately to queue another one is not possible so 3388 - * return false to tell caller to retry. 3389 - */ 3365 + // Try to detach bkvhead or head and attach it over any 3366 + // available corresponding free channel. It can be that 3367 + // a previous RCU batch is in progress, it means that 3368 + // immediately to queue another one is not possible so 3369 + // in that case the monitor work is rearmed. 3390 3370 if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) || 3391 3371 (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) || 3392 3372 (krcp->head && !krwp->head_free)) { 3393 - // Channel 1 corresponds to SLAB ptrs. 3394 - // Channel 2 corresponds to vmalloc ptrs. 3373 + // Channel 1 corresponds to the SLAB-pointer bulk path. 3374 + // Channel 2 corresponds to vmalloc-pointer bulk path. 3395 3375 for (j = 0; j < FREE_N_CHANNELS; j++) { 3396 3376 if (!krwp->bkvhead_free[j]) { 3397 3377 krwp->bkvhead_free[j] = krcp->bkvhead[j]; ··· 3396 3382 } 3397 3383 } 3398 3384 3399 - // Channel 3 corresponds to emergency path. 
3385 + // Channel 3 corresponds to both SLAB and vmalloc 3386 + // objects queued on the linked list. 3400 3387 if (!krwp->head_free) { 3401 3388 krwp->head_free = krcp->head; 3402 3389 krcp->head = NULL; ··· 3405 3390 3406 3391 WRITE_ONCE(krcp->count, 0); 3407 3392 3408 - /* 3409 - * One work is per one batch, so there are three 3410 - * "free channels", the batch can handle. It can 3411 - * be that the work is in the pending state when 3412 - * channels have been detached following by each 3413 - * other. 3414 - */ 3393 + // One work is per one batch, so there are three 3394 + // "free channels", the batch can handle. It can 3395 + // be that the work is in the pending state when 3396 + // channels have been detached following by each 3397 + // other. 3415 3398 queue_rcu_work(system_wq, &krwp->rcu_work); 3416 3399 } 3417 - 3418 - // Repeat if any "free" corresponding channel is still busy. 3419 - if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head) 3420 - repeat = true; 3421 3400 } 3422 3401 3423 - return !repeat; 3424 - } 3425 - 3426 - static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp, 3427 - unsigned long flags) 3428 - { 3429 - // Attempt to start a new batch. 3430 - krcp->monitor_todo = false; 3431 - if (queue_kfree_rcu_work(krcp)) { 3432 - // Success! Our job is done here. 3433 - raw_spin_unlock_irqrestore(&krcp->lock, flags); 3434 - return; 3435 - } 3436 - 3437 - // Previous RCU batch still in progress, try again later. 3438 - krcp->monitor_todo = true; 3439 - schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); 3440 - raw_spin_unlock_irqrestore(&krcp->lock, flags); 3441 - } 3442 - 3443 - /* 3444 - * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. 3445 - * It invokes kfree_rcu_drain_unlock() to attempt to start another batch. 
3446 - */ 3447 - static void kfree_rcu_monitor(struct work_struct *work) 3448 - { 3449 - unsigned long flags; 3450 - struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu, 3451 - monitor_work.work); 3452 - 3453 - raw_spin_lock_irqsave(&krcp->lock, flags); 3454 - if (krcp->monitor_todo) 3455 - kfree_rcu_drain_unlock(krcp, flags); 3402 + // If there is nothing to detach, it means that our job is 3403 + // successfully done here. In case of having at least one 3404 + // of the channels that is still busy we should rearm the 3405 + // work to repeat an attempt. Because previous batches are 3406 + // still in progress. 3407 + if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) 3408 + krcp->monitor_todo = false; 3456 3409 else 3457 - raw_spin_unlock_irqrestore(&krcp->lock, flags); 3410 + schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); 3411 + 3412 + raw_spin_unlock_irqrestore(&krcp->lock, flags); 3458 3413 } 3459 3414 3460 3415 static enum hrtimer_restart ··· 3433 3448 struct kfree_rcu_cpu *krcp = 3434 3449 container_of(t, struct kfree_rcu_cpu, hrtimer); 3435 3450 3436 - queue_work(system_highpri_wq, &krcp->page_cache_work); 3451 + queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); 3437 3452 return HRTIMER_NORESTART; 3438 3453 } 3439 3454 ··· 3442 3457 struct kvfree_rcu_bulk_data *bnode; 3443 3458 struct kfree_rcu_cpu *krcp = 3444 3459 container_of(work, struct kfree_rcu_cpu, 3445 - page_cache_work); 3460 + page_cache_work.work); 3446 3461 unsigned long flags; 3462 + int nr_pages; 3447 3463 bool pushed; 3448 3464 int i; 3449 3465 3450 - for (i = 0; i < rcu_min_cached_objs; i++) { 3466 + nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ? 
3467 + 1 : rcu_min_cached_objs; 3468 + 3469 + for (i = 0; i < nr_pages; i++) { 3451 3470 bnode = (struct kvfree_rcu_bulk_data *) 3452 3471 __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); 3453 3472 ··· 3468 3479 } 3469 3480 3470 3481 atomic_set(&krcp->work_in_progress, 0); 3482 + atomic_set(&krcp->backoff_page_cache_fill, 0); 3471 3483 } 3472 3484 3473 3485 static void ··· 3476 3486 { 3477 3487 if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && 3478 3488 !atomic_xchg(&krcp->work_in_progress, 1)) { 3479 - hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, 3480 - HRTIMER_MODE_REL); 3481 - krcp->hrtimer.function = schedule_page_work_fn; 3482 - hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); 3489 + if (atomic_read(&krcp->backoff_page_cache_fill)) { 3490 + queue_delayed_work(system_wq, 3491 + &krcp->page_cache_work, 3492 + msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); 3493 + } else { 3494 + hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 3495 + krcp->hrtimer.function = schedule_page_work_fn; 3496 + hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); 3497 + } 3483 3498 } 3484 3499 } 3485 3500 ··· 3549 3554 } 3550 3555 3551 3556 /* 3552 - * Queue a request for lazy invocation of appropriate free routine after a 3553 - * grace period. Please note there are three paths are maintained, two are the 3554 - * main ones that use array of pointers interface and third one is emergency 3555 - * one, that is used only when the main path can not be maintained temporary, 3556 - * due to memory pressure. 3557 + * Queue a request for lazy invocation of the appropriate free routine 3558 + * after a grace period. Please note that three paths are maintained, 3559 + * two for the common case using arrays of pointers and a third one that 3560 + * is used only when the main paths cannot be used, for example, due to 3561 + * memory pressure. 3557 3562 * 3558 3563 * Each kvfree_call_rcu() request is added to a batch. 
The batch will be drained 3559 3564 * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will ··· 3642 3647 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3643 3648 3644 3649 count += READ_ONCE(krcp->count); 3650 + count += READ_ONCE(krcp->nr_bkv_objs); 3651 + atomic_set(&krcp->backoff_page_cache_fill, 1); 3645 3652 } 3646 3653 3647 3654 return count; ··· 3653 3656 kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) 3654 3657 { 3655 3658 int cpu, freed = 0; 3656 - unsigned long flags; 3657 3659 3658 3660 for_each_possible_cpu(cpu) { 3659 3661 int count; 3660 3662 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3661 3663 3662 3664 count = krcp->count; 3663 - raw_spin_lock_irqsave(&krcp->lock, flags); 3664 - if (krcp->monitor_todo) 3665 - kfree_rcu_drain_unlock(krcp, flags); 3666 - else 3667 - raw_spin_unlock_irqrestore(&krcp->lock, flags); 3665 + count += drain_page_cache(krcp); 3666 + kfree_rcu_monitor(&krcp->monitor_work.work); 3668 3667 3669 3668 sc->nr_to_scan -= count; 3670 3669 freed += count; ··· 3688 3695 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 3689 3696 3690 3697 raw_spin_lock_irqsave(&krcp->lock, flags); 3691 - if (!krcp->head || krcp->monitor_todo) { 3698 + if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) || 3699 + krcp->monitor_todo) { 3692 3700 raw_spin_unlock_irqrestore(&krcp->lock, flags); 3693 3701 continue; 3694 3702 } ··· 3746 3752 * read-side critical sections have completed. Note, however, that 3747 3753 * upon return from synchronize_rcu(), the caller might well be executing 3748 3754 * concurrently with new RCU read-side critical sections that began while 3749 - * synchronize_rcu() was waiting. RCU read-side critical sections are 3750 - * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. 
3751 - * In addition, regions of code across which interrupts, preemption, or 3752 - * softirqs have been disabled also serve as RCU read-side critical 3755 + * synchronize_rcu() was waiting. 3756 + * 3757 + * RCU read-side critical sections are delimited by rcu_read_lock() 3758 + * and rcu_read_unlock(), and may be nested. In addition, but only in 3759 + * v5.0 and later, regions of code across which interrupts, preemption, 3760 + * or softirqs have been disabled also serve as RCU read-side critical 3753 3761 * sections. This includes hardware interrupt handlers, softirq handlers, 3754 3762 * and NMI handlers. 3755 3763 * ··· 3772 3776 * to have executed a full memory barrier during the execution of 3773 3777 * synchronize_rcu() -- even if CPU A and CPU B are the same CPU (but 3774 3778 * again only if the system has more than one CPU). 3779 + * 3780 + * Implementation of these memory-ordering guarantees is described here: 3781 + * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst. 3775 3782 */ 3776 3783 void synchronize_rcu(void) 3777 3784 { ··· 3845 3846 /** 3846 3847 * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period 3847 3848 * 3848 - * @oldstate: return from call to get_state_synchronize_rcu() or start_poll_synchronize_rcu() 3849 + * @oldstate: value from get_state_synchronize_rcu() or start_poll_synchronize_rcu() 3849 3850 * 3850 3851 * If a full RCU grace period has elapsed since the earlier call from 3851 3852 * which oldstate was obtained, return @true, otherwise return @false. 3852 - * If @false is returned, it is the caller's responsibilty to invoke this 3853 + * If @false is returned, it is the caller's responsibility to invoke this 3853 3854 * function later on until it does return @true. Alternatively, the caller 3854 3855 * can explicitly wait for a grace period, for example, by passing @oldstate 3855 3856 * to cond_synchronize_rcu() or by directly invoking synchronize_rcu(). 
··· 3861 3862 * (many hours even on 32-bit systems) should check them occasionally 3862 3863 * and either refresh them or set a flag indicating that the grace period 3863 3864 * has completed. 3865 + * 3866 + * This function provides the same memory-ordering guarantees that 3867 + * would be provided by a synchronize_rcu() that was invoked at the call 3868 + * to the function that provided @oldstate, and that returned at the end 3869 + * of this function. 3864 3870 */ 3865 3871 bool poll_state_synchronize_rcu(unsigned long oldstate) 3866 3872 { ··· 3880 3876 /** 3881 3877 * cond_synchronize_rcu - Conditionally wait for an RCU grace period 3882 3878 * 3883 - * @oldstate: return value from earlier call to get_state_synchronize_rcu() 3879 + * @oldstate: value from get_state_synchronize_rcu() or start_poll_synchronize_rcu() 3884 3880 * 3885 3881 * If a full RCU grace period has elapsed since the earlier call to 3886 3882 * get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return. ··· 3890 3886 * counter wrap is harmless. If the counter wraps, we have waited for 3891 3887 * more than 2 billion grace periods (and way more on a 64-bit system!), 3892 3888 * so waiting for one additional grace period should be just fine. 3889 + * 3890 + * This function provides the same memory-ordering guarantees that 3891 + * would be provided by a synchronize_rcu() that was invoked at the call 3892 + * to the function that provided @oldstate, and that returned at the end 3893 + * of this function. 3893 3894 */ 3894 3895 void cond_synchronize_rcu(unsigned long oldstate) 3895 3896 { ··· 3922 3913 check_cpu_stall(rdp); 3923 3914 3924 3915 /* Does this CPU need a deferred NOCB wakeup? */ 3925 - if (rcu_nocb_need_deferred_wakeup(rdp)) 3916 + if (rcu_nocb_need_deferred_wakeup(rdp, RCU_NOCB_WAKE)) 3926 3917 return 1; 3927 3918 3928 3919 /* Is this a nohz_full CPU in userspace or idle? (Ignore RCU if so.) 
*/ ··· 4105 4096 /* 4106 4097 * Propagate ->qsinitmask bits up the rcu_node tree to account for the 4107 4098 * first CPU in a given leaf rcu_node structure coming online. The caller 4108 - * must hold the corresponding leaf rcu_node ->lock with interrrupts 4099 + * must hold the corresponding leaf rcu_node ->lock with interrupts 4109 4100 * disabled. 4110 4101 */ 4111 4102 static void rcu_init_new_rnp(struct rcu_node *rnp_leaf) ··· 4200 4191 rdp->rcu_iw_gp_seq = rdp->gp_seq - 1; 4201 4192 trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl")); 4202 4193 raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 4203 - rcu_prepare_kthreads(cpu); 4194 + rcu_spawn_one_boost_kthread(rnp); 4204 4195 rcu_spawn_cpu_nocb_kthread(cpu); 4205 4196 WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1); 4206 4197 ··· 4483 4474 wake_up_process(t); 4484 4475 rcu_spawn_nocb_kthreads(); 4485 4476 rcu_spawn_boost_kthreads(); 4477 + rcu_spawn_core_kthreads(); 4486 4478 return 0; 4487 4479 } 4488 4480 early_initcall(rcu_spawn_gp_kthread); ··· 4594 4584 * replace the definitions in tree.h because those are needed to size 4595 4585 * the ->node array in the rcu_state structure. 4596 4586 */ 4597 - static void __init rcu_init_geometry(void) 4587 + void rcu_init_geometry(void) 4598 4588 { 4599 4589 ulong d; 4600 4590 int i; 4591 + static unsigned long old_nr_cpu_ids; 4601 4592 int rcu_capacity[RCU_NUM_LVLS]; 4593 + static bool initialized; 4594 + 4595 + if (initialized) { 4596 + /* 4597 + * Warn if setup_nr_cpu_ids() had not yet been invoked, 4598 + * unless nr_cpus_ids == NR_CPUS, in which case who cares? 4599 + */ 4600 + WARN_ON_ONCE(old_nr_cpu_ids != nr_cpu_ids); 4601 + return; 4602 + } 4603 + 4604 + old_nr_cpu_ids = nr_cpu_ids; 4605 + initialized = true; 4602 4606 4603 4607 /* 4604 4608 * Initialize any unspecified boot parameters. ··· 4713 4689 int cpu; 4714 4690 int i; 4715 4691 4692 + /* Clamp it to [0:100] seconds interval. 
*/ 4693 + if (rcu_delay_page_cache_fill_msec < 0 || 4694 + rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) { 4695 + 4696 + rcu_delay_page_cache_fill_msec = 4697 + clamp(rcu_delay_page_cache_fill_msec, 0, 4698 + (int) (100 * MSEC_PER_SEC)); 4699 + 4700 + pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n", 4701 + rcu_delay_page_cache_fill_msec); 4702 + } 4703 + 4716 4704 for_each_possible_cpu(cpu) { 4717 4705 struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); 4718 4706 ··· 4734 4698 } 4735 4699 4736 4700 INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); 4737 - INIT_WORK(&krcp->page_cache_work, fill_page_cache_func); 4701 + INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func); 4738 4702 krcp->initialized = true; 4739 4703 } 4740 4704 if (register_shrinker(&kfree_rcu_shrinker)) ··· 4768 4732 rcutree_online_cpu(cpu); 4769 4733 } 4770 4734 4771 - /* Create workqueue for expedited GPs and for Tree SRCU. */ 4735 + /* Create workqueue for Tree SRCU and for expedited GPs. */ 4772 4736 rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0); 4773 4737 WARN_ON(!rcu_gp_wq); 4774 4738 rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0); 4775 4739 WARN_ON(!rcu_par_gp_wq); 4776 - srcu_init(); 4777 4740 4778 4741 /* Fill in default value for rcutree.qovld boot parameter. */ 4779 4742 /* -After- the rcu_node ->lock fields are initialized! */
+7 -7
kernel/rcu/tree.h
··· 115 115 /* boosting for this rcu_node structure. */ 116 116 unsigned int boost_kthread_status; 117 117 /* State of boost_kthread_task for tracing. */ 118 + unsigned long n_boosts; /* Number of boosts for this rcu_node structure. */ 118 119 #ifdef CONFIG_RCU_NOCB_CPU 119 120 struct swait_queue_head nocb_gp_wq[2]; 120 121 /* Place for rcu_nocb_kthread() to wait GP. */ ··· 154 153 unsigned long gp_seq; /* Track rsp->gp_seq counter. */ 155 154 unsigned long gp_seq_needed; /* Track furthest future GP request. */ 156 155 union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */ 157 - bool core_needs_qs; /* Core waits for quiesc state. */ 156 + bool core_needs_qs; /* Core waits for quiescent state. */ 158 157 bool beenonline; /* CPU online at least once. */ 159 158 bool gpwrap; /* Possible ->gp_seq wrap. */ 160 159 bool exp_deferred_qs; /* This CPU awaiting a deferred QS? */ ··· 219 218 220 219 /* The following fields are used by GP kthread, hence own cacheline. */ 221 220 raw_spinlock_t nocb_gp_lock ____cacheline_internodealigned_in_smp; 222 - struct timer_list nocb_bypass_timer; /* Force nocb_bypass flush. */ 223 221 u8 nocb_gp_sleep; /* Is the nocb GP thread asleep? */ 224 222 u8 nocb_gp_bypass; /* Found a bypass on last scan? */ 225 223 u8 nocb_gp_gp; /* GP to wait for on last scan? */ ··· 257 257 }; 258 258 259 259 /* Values for nocb_defer_wakeup field in struct rcu_data. 
*/ 260 - #define RCU_NOCB_WAKE_OFF -1 261 260 #define RCU_NOCB_WAKE_NOT 0 262 - #define RCU_NOCB_WAKE 1 263 - #define RCU_NOCB_WAKE_FORCE 2 261 + #define RCU_NOCB_WAKE_BYPASS 1 262 + #define RCU_NOCB_WAKE 2 263 + #define RCU_NOCB_WAKE_FORCE 3 264 264 265 265 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500)) 266 266 /* For jiffies_till_first_fqs and */ ··· 417 417 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp); 418 418 static bool rcu_is_callbacks_kthread(void); 419 419 static void rcu_cpu_kthread_setup(unsigned int cpu); 420 + static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp); 420 421 static void __init rcu_spawn_boost_kthreads(void); 421 - static void rcu_prepare_kthreads(int cpu); 422 422 static void rcu_cleanup_after_idle(void); 423 423 static void rcu_prepare_for_idle(void); 424 424 static bool rcu_preempt_has_tasks(struct rcu_node *rnp); ··· 434 434 bool *was_alldone, unsigned long flags); 435 435 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty, 436 436 unsigned long flags); 437 - static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); 437 + static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level); 438 438 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp); 439 439 static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); 440 440 static void rcu_spawn_cpu_nocb_kthread(int cpu);
+114 -127
kernel/rcu/tree_plugin.h
··· 33 33 return false; 34 34 } 35 35 36 - static inline bool rcu_running_nocb_timer(struct rcu_data *rdp) 37 - { 38 - return (timer_curr_running(&rdp->nocb_timer) && !in_irq()); 39 - } 40 36 #else 41 37 static inline int rcu_lockdep_is_held_nocb(struct rcu_data *rdp) 42 38 { ··· 40 44 } 41 45 42 46 static inline bool rcu_current_is_nocb_kthread(struct rcu_data *rdp) 43 - { 44 - return false; 45 - } 46 - 47 - static inline bool rcu_running_nocb_timer(struct rcu_data *rdp) 48 47 { 49 48 return false; 50 49 } ··· 63 72 rcu_lockdep_is_held_nocb(rdp) || 64 73 (rdp == this_cpu_ptr(&rcu_data) && 65 74 !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible())) || 66 - rcu_current_is_nocb_kthread(rdp) || 67 - rcu_running_nocb_timer(rdp)), 75 + rcu_current_is_nocb_kthread(rdp)), 68 76 "Unsafe read of RCU_NOCB offloaded state" 69 77 ); 70 78 ··· 1088 1098 /* Lock only for side effect: boosts task t's priority. */ 1089 1099 rt_mutex_lock(&rnp->boost_mtx); 1090 1100 rt_mutex_unlock(&rnp->boost_mtx); /* Then keep lockdep happy. 
*/ 1101 + rnp->n_boosts++; 1091 1102 1092 1103 return READ_ONCE(rnp->exp_tasks) != NULL || 1093 1104 READ_ONCE(rnp->boost_tasks) != NULL; ··· 1188 1197 */ 1189 1198 static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) 1190 1199 { 1191 - int rnp_index = rnp - rcu_get_root(); 1192 1200 unsigned long flags; 1201 + int rnp_index = rnp - rcu_get_root(); 1193 1202 struct sched_param sp; 1194 1203 struct task_struct *t; 1195 1204 1196 - if (!IS_ENABLED(CONFIG_PREEMPT_RCU)) 1197 - return; 1198 - 1199 - if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0) 1205 + if (rnp->boost_kthread_task || !rcu_scheduler_fully_active) 1200 1206 return; 1201 1207 1202 1208 rcu_state.boost = 1; 1203 - 1204 - if (rnp->boost_kthread_task != NULL) 1205 - return; 1206 1209 1207 1210 t = kthread_create(rcu_boost_kthread, (void *)rnp, 1208 1211 "rcub/%d", rnp_index); ··· 1249 1264 struct rcu_node *rnp; 1250 1265 1251 1266 rcu_for_each_leaf_node(rnp) 1252 - rcu_spawn_one_boost_kthread(rnp); 1253 - } 1254 - 1255 - static void rcu_prepare_kthreads(int cpu) 1256 - { 1257 - struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 1258 - struct rcu_node *rnp = rdp->mynode; 1259 - 1260 - /* Fire up the incoming CPU's kthread and leaf rcu_node kthread. 
*/ 1261 - if (rcu_scheduler_fully_active) 1262 - rcu_spawn_one_boost_kthread(rnp); 1267 + if (rcu_rnp_online_cpus(rnp)) 1268 + rcu_spawn_one_boost_kthread(rnp); 1263 1269 } 1264 1270 1265 1271 #else /* #ifdef CONFIG_RCU_BOOST */ ··· 1270 1294 { 1271 1295 } 1272 1296 1297 + static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp) 1298 + { 1299 + } 1300 + 1273 1301 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) 1274 1302 { 1275 1303 } 1276 1304 1277 1305 static void __init rcu_spawn_boost_kthreads(void) 1278 - { 1279 - } 1280 - 1281 - static void rcu_prepare_kthreads(int cpu) 1282 1306 { 1283 1307 } 1284 1308 ··· 1511 1535 static int __init rcu_nocb_setup(char *str) 1512 1536 { 1513 1537 alloc_bootmem_cpumask_var(&rcu_nocb_mask); 1514 - if (!strcasecmp(str, "all")) /* legacy: use "0-N" instead */ 1538 + if (cpulist_parse(str, rcu_nocb_mask)) { 1539 + pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n"); 1515 1540 cpumask_setall(rcu_nocb_mask); 1516 - else 1517 - if (cpulist_parse(str, rcu_nocb_mask)) { 1518 - pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n"); 1519 - cpumask_setall(rcu_nocb_mask); 1520 - } 1541 + } 1521 1542 return 1; 1522 1543 } 1523 1544 __setup("rcu_nocbs=", rcu_nocb_setup); ··· 1665 1692 return false; 1666 1693 } 1667 1694 1668 - /* 1669 - * Kick the GP kthread for this NOCB group. Caller holds ->nocb_lock 1670 - * and this function releases it. 
1671 - */ 1672 - static bool wake_nocb_gp(struct rcu_data *rdp, bool force, 1673 - unsigned long flags) 1674 - __releases(rdp->nocb_lock) 1695 + static bool __wake_nocb_gp(struct rcu_data *rdp_gp, 1696 + struct rcu_data *rdp, 1697 + bool force, unsigned long flags) 1698 + __releases(rdp_gp->nocb_gp_lock) 1675 1699 { 1676 1700 bool needwake = false; 1677 - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; 1678 1701 1679 - lockdep_assert_held(&rdp->nocb_lock); 1680 1702 if (!READ_ONCE(rdp_gp->nocb_gp_kthread)) { 1681 - rcu_nocb_unlock_irqrestore(rdp, flags); 1703 + raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); 1682 1704 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, 1683 1705 TPS("AlreadyAwake")); 1684 1706 return false; 1685 1707 } 1686 1708 1687 - if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) { 1688 - WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT); 1689 - del_timer(&rdp->nocb_timer); 1709 + if (rdp_gp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) { 1710 + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT); 1711 + del_timer(&rdp_gp->nocb_timer); 1690 1712 } 1691 - rcu_nocb_unlock_irqrestore(rdp, flags); 1692 - raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); 1713 + 1693 1714 if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) { 1694 1715 WRITE_ONCE(rdp_gp->nocb_gp_sleep, false); 1695 1716 needwake = true; 1696 - trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake")); 1697 1717 } 1698 1718 raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); 1699 - if (needwake) 1719 + if (needwake) { 1720 + trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake")); 1700 1721 wake_up_process(rdp_gp->nocb_gp_kthread); 1722 + } 1701 1723 1702 1724 return needwake; 1725 + } 1726 + 1727 + /* 1728 + * Kick the GP kthread for this NOCB group. 
1729 + */ 1730 + static bool wake_nocb_gp(struct rcu_data *rdp, bool force) 1731 + { 1732 + unsigned long flags; 1733 + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; 1734 + 1735 + raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); 1736 + return __wake_nocb_gp(rdp_gp, rdp, force, flags); 1703 1737 } 1704 1738 1705 1739 /* ··· 1716 1736 static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype, 1717 1737 const char *reason) 1718 1738 { 1719 - if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_OFF) 1720 - return; 1721 - if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) 1722 - mod_timer(&rdp->nocb_timer, jiffies + 1); 1723 - if (rdp->nocb_defer_wakeup < waketype) 1724 - WRITE_ONCE(rdp->nocb_defer_wakeup, waketype); 1739 + unsigned long flags; 1740 + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; 1741 + 1742 + raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); 1743 + 1744 + /* 1745 + * Bypass wakeup overrides previous deferments. In case 1746 + * of callback storm, no need to wake up too early. 1747 + */ 1748 + if (waketype == RCU_NOCB_WAKE_BYPASS) { 1749 + mod_timer(&rdp_gp->nocb_timer, jiffies + 2); 1750 + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype); 1751 + } else { 1752 + if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE) 1753 + mod_timer(&rdp_gp->nocb_timer, jiffies + 1); 1754 + if (rdp_gp->nocb_defer_wakeup < waketype) 1755 + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype); 1756 + } 1757 + 1758 + raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); 1759 + 1725 1760 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason); 1726 1761 } 1727 1762 ··· 1935 1940 } 1936 1941 1937 1942 /* 1938 - * Awaken the no-CBs grace-period kthead if needed, either due to it 1943 + * Awaken the no-CBs grace-period kthread if needed, either due to it 1939 1944 * legitimately being asleep or due to overload conditions. 1940 1945 * 1941 1946 * If warranted, also wake up the kthread servicing this CPUs queues. 
··· 1963 1968 rdp->qlen_last_fqs_check = len; 1964 1969 if (!irqs_disabled_flags(flags)) { 1965 1970 /* ... if queue was empty ... */ 1966 - wake_nocb_gp(rdp, false, flags); 1971 + rcu_nocb_unlock_irqrestore(rdp, flags); 1972 + wake_nocb_gp(rdp, false); 1967 1973 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, 1968 1974 TPS("WakeEmpty")); 1969 1975 } else { 1976 + rcu_nocb_unlock_irqrestore(rdp, flags); 1970 1977 wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE, 1971 1978 TPS("WakeEmptyIsDeferred")); 1972 - rcu_nocb_unlock_irqrestore(rdp, flags); 1973 1979 } 1974 1980 } else if (len > rdp->qlen_last_fqs_check + qhimark) { 1975 1981 /* ... or if many callbacks queued. */ ··· 1985 1989 smp_mb(); /* Enqueue before timer_pending(). */ 1986 1990 if ((rdp->nocb_cb_sleep || 1987 1991 !rcu_segcblist_ready_cbs(&rdp->cblist)) && 1988 - !timer_pending(&rdp->nocb_bypass_timer)) 1992 + !timer_pending(&rdp->nocb_timer)) { 1993 + rcu_nocb_unlock_irqrestore(rdp, flags); 1989 1994 wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE, 1990 1995 TPS("WakeOvfIsDeferred")); 1991 - rcu_nocb_unlock_irqrestore(rdp, flags); 1996 + } else { 1997 + rcu_nocb_unlock_irqrestore(rdp, flags); 1998 + trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot")); 1999 + } 1992 2000 } else { 1993 2001 rcu_nocb_unlock_irqrestore(rdp, flags); 1994 2002 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot")); 1995 2003 } 1996 2004 return; 1997 - } 1998 - 1999 - /* Wake up the no-CBs GP kthread to flush ->nocb_bypass. */ 2000 - static void do_nocb_bypass_wakeup_timer(struct timer_list *t) 2001 - { 2002 - unsigned long flags; 2003 - struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer); 2004 - 2005 - trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer")); 2006 - rcu_nocb_lock_irqsave(rdp, flags); 2007 - smp_mb__after_spinlock(); /* Timer expire before wakeup. 
*/ 2008 - __call_rcu_nocb_wake(rdp, true, flags); 2009 2005 } 2010 2006 2011 2007 /* ··· 2106 2118 bypass = true; 2107 2119 } 2108 2120 rnp = rdp->mynode; 2109 - if (bypass) { // Avoid race with first bypass CB. 2110 - WRITE_ONCE(my_rdp->nocb_defer_wakeup, 2111 - RCU_NOCB_WAKE_NOT); 2112 - del_timer(&my_rdp->nocb_timer); 2113 - } 2121 + 2114 2122 // Advance callbacks if helpful and low contention. 2115 2123 needwake_gp = false; 2116 2124 if (!rcu_segcblist_restempty(&rdp->cblist, ··· 2152 2168 my_rdp->nocb_gp_bypass = bypass; 2153 2169 my_rdp->nocb_gp_gp = needwait_gp; 2154 2170 my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0; 2171 + 2155 2172 if (bypass && !rcu_nocb_poll) { 2156 2173 // At least one child with non-empty ->nocb_bypass, so set 2157 2174 // timer in order to avoid stranding its callbacks. 2158 - raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags); 2159 - mod_timer(&my_rdp->nocb_bypass_timer, j + 2); 2160 - raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags); 2175 + wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS, 2176 + TPS("WakeBypassIsDeferred")); 2161 2177 } 2162 2178 if (rcu_nocb_poll) { 2163 2179 /* Polling, so trace if first poll in the series. */ ··· 2181 2197 } 2182 2198 if (!rcu_nocb_poll) { 2183 2199 raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags); 2184 - if (bypass) 2185 - del_timer(&my_rdp->nocb_bypass_timer); 2200 + if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) { 2201 + WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT); 2202 + del_timer(&my_rdp->nocb_timer); 2203 + } 2186 2204 WRITE_ONCE(my_rdp->nocb_gp_sleep, true); 2187 2205 raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags); 2188 2206 } ··· 2320 2334 } 2321 2335 2322 2336 /* Is a deferred wakeup of rcu_nocb_kthread() required? 
*/ 2323 - static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) 2337 + static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level) 2324 2338 { 2325 - return READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT; 2339 + return READ_ONCE(rdp->nocb_defer_wakeup) >= level; 2326 2340 } 2327 2341 2328 2342 /* Do a deferred wakeup of rcu_nocb_kthread(). */ 2329 - static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp) 2343 + static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp_gp, 2344 + struct rcu_data *rdp, int level, 2345 + unsigned long flags) 2346 + __releases(rdp_gp->nocb_gp_lock) 2330 2347 { 2331 - unsigned long flags; 2332 2348 int ndw; 2333 2349 int ret; 2334 2350 2335 - rcu_nocb_lock_irqsave(rdp, flags); 2336 - if (!rcu_nocb_need_deferred_wakeup(rdp)) { 2337 - rcu_nocb_unlock_irqrestore(rdp, flags); 2351 + if (!rcu_nocb_need_deferred_wakeup(rdp_gp, level)) { 2352 + raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); 2338 2353 return false; 2339 2354 } 2340 - ndw = READ_ONCE(rdp->nocb_defer_wakeup); 2341 - ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags); 2355 + 2356 + ndw = rdp_gp->nocb_defer_wakeup; 2357 + ret = __wake_nocb_gp(rdp_gp, rdp, ndw == RCU_NOCB_WAKE_FORCE, flags); 2342 2358 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake")); 2343 2359 2344 2360 return ret; ··· 2349 2361 /* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */ 2350 2362 static void do_nocb_deferred_wakeup_timer(struct timer_list *t) 2351 2363 { 2364 + unsigned long flags; 2352 2365 struct rcu_data *rdp = from_timer(rdp, t, nocb_timer); 2353 2366 2354 - do_nocb_deferred_wakeup_common(rdp); 2367 + WARN_ON_ONCE(rdp->nocb_gp_rdp != rdp); 2368 + trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer")); 2369 + 2370 + raw_spin_lock_irqsave(&rdp->nocb_gp_lock, flags); 2371 + smp_mb__after_spinlock(); /* Timer expire before wakeup. 
*/ 2372 + do_nocb_deferred_wakeup_common(rdp, rdp, RCU_NOCB_WAKE_BYPASS, flags); 2355 2373 } 2356 2374 2357 2375 /* ··· 2367 2373 */ 2368 2374 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp) 2369 2375 { 2370 - if (rcu_nocb_need_deferred_wakeup(rdp)) 2371 - return do_nocb_deferred_wakeup_common(rdp); 2372 - return false; 2376 + unsigned long flags; 2377 + struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; 2378 + 2379 + if (!rdp_gp || !rcu_nocb_need_deferred_wakeup(rdp_gp, RCU_NOCB_WAKE)) 2380 + return false; 2381 + 2382 + raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); 2383 + return do_nocb_deferred_wakeup_common(rdp_gp, rdp, RCU_NOCB_WAKE, flags); 2373 2384 } 2374 2385 2375 2386 void rcu_nocb_flush_deferred_wakeup(void) ··· 2442 2443 swait_event_exclusive(rdp->nocb_state_wq, 2443 2444 !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB | 2444 2445 SEGCBLIST_KTHREAD_GP)); 2445 - rcu_nocb_lock_irqsave(rdp, flags); 2446 - /* Make sure nocb timer won't stay around */ 2447 - WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_OFF); 2448 - rcu_nocb_unlock_irqrestore(rdp, flags); 2449 - del_timer_sync(&rdp->nocb_timer); 2450 - 2451 2446 /* 2452 - * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY with CB unlocked 2453 - * and IRQs disabled but let's be paranoid. 2447 + * Lock one last time to acquire latest callback updates from kthreads 2448 + * so we can later handle callbacks locally without locking. 2454 2449 */ 2455 2450 rcu_nocb_lock_irqsave(rdp, flags); 2451 + /* 2452 + * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb 2453 + * lock is released but how about being paranoid for once? 
2454 + */ 2456 2455 rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY); 2457 2456 /* 2458 2457 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use ··· 2470 2473 struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); 2471 2474 int ret = 0; 2472 2475 2473 - if (rdp == rdp->nocb_gp_rdp) { 2474 - pr_info("Can't deoffload an rdp GP leader (yet)\n"); 2475 - return -EINVAL; 2476 - } 2477 2476 mutex_lock(&rcu_state.barrier_mutex); 2478 2477 cpus_read_lock(); 2479 2478 if (rcu_rdp_is_offloaded(rdp)) { ··· 2510 2517 * SEGCBLIST_SOFTIRQ_ONLY mode. 2511 2518 */ 2512 2519 raw_spin_lock_irqsave(&rdp->nocb_lock, flags); 2513 - /* Re-enable nocb timer */ 2514 - WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT); 2520 + 2515 2521 /* 2516 2522 * We didn't take the nocb lock while working on the 2517 2523 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode. ··· 2618 2626 raw_spin_lock_init(&rdp->nocb_bypass_lock); 2619 2627 raw_spin_lock_init(&rdp->nocb_gp_lock); 2620 2628 timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0); 2621 - timer_setup(&rdp->nocb_bypass_timer, do_nocb_bypass_wakeup_timer, 0); 2622 2629 rcu_cblist_init(&rdp->nocb_bypass); 2623 2630 } 2624 2631 ··· 2776 2785 { 2777 2786 struct rcu_node *rnp = rdp->mynode; 2778 2787 2779 - pr_info("nocb GP %d %c%c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n", 2788 + pr_info("nocb GP %d %c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n", 2780 2789 rdp->cpu, 2781 2790 "kK"[!!rdp->nocb_gp_kthread], 2782 2791 "lL"[raw_spin_is_locked(&rdp->nocb_gp_lock)], 2783 2792 "dD"[!!rdp->nocb_defer_wakeup], 2784 2793 "tT"[timer_pending(&rdp->nocb_timer)], 2785 - "bB"[timer_pending(&rdp->nocb_bypass_timer)], 2786 2794 "sS"[!!rdp->nocb_gp_sleep], 2787 2795 ".W"[swait_active(&rdp->nocb_gp_wq)], 2788 2796 ".W"[swait_active(&rnp->nocb_gp_wq[0])], ··· 2802 2812 char bufr[20]; 2803 2813 struct rcu_segcblist *rsclp = &rdp->cblist; 2804 2814 bool waslocked; 2805 - bool wastimer; 2806 2815 bool wassleep; 2807 2816 2808 2817 if 
(rdp->nocb_gp_rdp == rdp) ··· 2838 2849 return; 2839 2850 2840 2851 waslocked = raw_spin_is_locked(&rdp->nocb_gp_lock); 2841 - wastimer = timer_pending(&rdp->nocb_bypass_timer); 2842 2852 wassleep = swait_active(&rdp->nocb_gp_wq); 2843 - if (!rdp->nocb_gp_sleep && !waslocked && !wastimer && !wassleep) 2844 - return; /* Nothing untowards. */ 2853 + if (!rdp->nocb_gp_sleep && !waslocked && !wassleep) 2854 + return; /* Nothing untoward. */ 2845 2855 2846 - pr_info(" nocb GP activity on CB-only CPU!!! %c%c%c%c %c\n", 2856 + pr_info(" nocb GP activity on CB-only CPU!!! %c%c%c %c\n", 2847 2857 "lL"[waslocked], 2848 2858 "dD"[!!rdp->nocb_defer_wakeup], 2849 - "tT"[wastimer], 2850 2859 "sS"[!!rdp->nocb_gp_sleep], 2851 2860 ".W"[wassleep]); 2852 2861 } ··· 2909 2922 { 2910 2923 } 2911 2924 2912 - static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) 2925 + static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level) 2913 2926 { 2914 2927 return false; 2915 2928 }
+76 -8
kernel/rcu/tree_stall.h
··· 314 314 * tasks blocked within RCU read-side critical sections.
315 315 */
316 316 static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
317 + __releases(rnp->lock)
317 318 {
318 319 raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
319 320 return 0;
··· 718 717
719 718
720 719 /*
720 + * Check to see if a failure to end RCU priority inversion was due to
721 + * a CPU not passing through a quiescent state. When this happens, there
722 + * is nothing that RCU priority boosting can do to help, so we shouldn't
723 + * count this as an RCU priority boosting failure. A return of true says
724 + * RCU priority boosting is to blame, and false says otherwise. If false
725 + * is returned, the first of the CPUs to blame is stored through cpup.
726 + * If there was no CPU blocking the current grace period, but also nothing
727 + * in need of being boosted, *cpup is set to -1. This can happen in case
728 + * of vCPU preemption while the last CPU is reporting its quiescent state,
729 + * for example.
730 + *
731 + * If cpup is NULL, then a lockless quick check is carried out, suitable
732 + * for high-rate usage. On the other hand, if cpup is non-NULL, each
733 + * rcu_node structure's ->lock is acquired, ruling out high-rate usage.
734 + */
735 + bool rcu_check_boost_fail(unsigned long gp_state, int *cpup)
736 + {
737 + bool atb = false;
738 + int cpu;
739 + unsigned long flags;
740 + struct rcu_node *rnp;
741 +
742 + rcu_for_each_leaf_node(rnp) {
743 + if (!cpup) {
744 + if (READ_ONCE(rnp->qsmask)) {
745 + return false;
746 + } else {
747 + if (READ_ONCE(rnp->gp_tasks))
748 + atb = true;
749 + continue;
750 + }
751 + }
752 + *cpup = -1;
753 + raw_spin_lock_irqsave_rcu_node(rnp, flags);
754 + if (rnp->gp_tasks)
755 + atb = true;
756 + if (!rnp->qsmask) {
757 + // No CPUs without quiescent states for this rnp.
758 + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
759 + continue;
760 + }
761 + // Find the first holdout CPU.
762 + for_each_leaf_node_possible_cpu(rnp, cpu) { 763 + if (rnp->qsmask & (1UL << (cpu - rnp->grplo))) { 764 + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 765 + *cpup = cpu; 766 + return false; 767 + } 768 + } 769 + raw_spin_unlock_irqrestore_rcu_node(rnp, flags); 770 + } 771 + // Can't blame CPUs, so must blame RCU priority boosting. 772 + return atb; 773 + } 774 + EXPORT_SYMBOL_GPL(rcu_check_boost_fail); 775 + 776 + /* 721 777 * Show the state of the grace-period kthreads. 722 778 */ 723 779 void show_rcu_gp_kthreads(void) ··· 784 726 unsigned long j; 785 727 unsigned long ja; 786 728 unsigned long jr; 729 + unsigned long js; 787 730 unsigned long jw; 788 731 struct rcu_data *rdp; 789 732 struct rcu_node *rnp; ··· 793 734 j = jiffies; 794 735 ja = j - data_race(rcu_state.gp_activity); 795 736 jr = j - data_race(rcu_state.gp_req_activity); 737 + js = j - data_race(rcu_state.gp_start); 796 738 jw = j - data_race(rcu_state.gp_wake_time); 797 - pr_info("%s: wait state: %s(%d) ->state: %#x delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n", 739 + pr_info("%s: wait state: %s(%d) ->state: %#lx ->rt_priority %u delta ->gp_start %lu ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_max %lu ->gp_flags %#x\n", 798 740 rcu_state.name, gp_state_getname(rcu_state.gp_state), 799 - rcu_state.gp_state, t ? t->__state : 0x1ffff, 800 - ja, jr, jw, (long)data_race(rcu_state.gp_wake_seq), 741 + rcu_state.gp_state, t ? t->__state : 0x1ffffL, t ? 
t->rt_priority : 0xffU, 742 + js, ja, jr, jw, (long)data_race(rcu_state.gp_wake_seq), 801 743 (long)data_race(rcu_state.gp_seq), 802 744 (long)data_race(rcu_get_root()->gp_seq_needed), 745 + data_race(rcu_state.gp_max), 803 746 data_race(rcu_state.gp_flags)); 804 747 rcu_for_each_node_breadth_first(rnp) { 805 - if (ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq), 806 - READ_ONCE(rnp->gp_seq_needed))) 748 + if (ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq), READ_ONCE(rnp->gp_seq_needed)) && 749 + !data_race(rnp->qsmask) && !data_race(rnp->boost_tasks) && 750 + !data_race(rnp->exp_tasks) && !data_race(rnp->gp_tasks)) 807 751 continue; 808 - pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n", 809 - rnp->grplo, rnp->grphi, (long)data_race(rnp->gp_seq), 810 - (long)data_race(rnp->gp_seq_needed)); 752 + pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld ->qsmask %#lx %c%c%c%c ->n_boosts %ld\n", 753 + rnp->grplo, rnp->grphi, 754 + (long)data_race(rnp->gp_seq), (long)data_race(rnp->gp_seq_needed), 755 + data_race(rnp->qsmask), 756 + ".b"[!!data_race(rnp->boost_kthread_task)], 757 + ".B"[!!data_race(rnp->boost_tasks)], 758 + ".E"[!!data_race(rnp->exp_tasks)], 759 + ".G"[!!data_race(rnp->gp_tasks)], 760 + data_race(rnp->n_boosts)); 811 761 if (!rcu_is_leaf_node(rnp)) 812 762 continue; 813 763 for_each_leaf_node_possible_cpu(rnp, cpu) {
+6 -2
kernel/rcu/update.c
··· 277 277 278 278 noinstr int notrace debug_lockdep_rcu_enabled(void) 279 279 { 280 - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && 280 + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) && 281 281 current->lockdep_recursion == 0; 282 282 } 283 283 EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); ··· 524 524 } 525 525 526 526 DEFINE_STATIC_SRCU(early_srcu); 527 + static unsigned long early_srcu_cookie; 527 528 528 529 struct early_boot_kfree_rcu { 529 530 struct rcu_head rh; ··· 537 536 struct early_boot_kfree_rcu *rhp; 538 537 539 538 call_rcu(&head, test_callback); 540 - if (IS_ENABLED(CONFIG_SRCU)) 539 + if (IS_ENABLED(CONFIG_SRCU)) { 540 + early_srcu_cookie = start_poll_synchronize_srcu(&early_srcu); 541 541 call_srcu(&early_srcu, &shead, test_callback); 542 + } 542 543 rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); 543 544 if (!WARN_ON_ONCE(!rhp)) 544 545 kfree_rcu(rhp, rh); ··· 566 563 if (IS_ENABLED(CONFIG_SRCU)) { 567 564 early_boot_test_counter++; 568 565 srcu_barrier(&early_srcu); 566 + WARN_ON_ONCE(!poll_state_synchronize_srcu(&early_srcu, early_srcu_cookie)); 569 567 } 570 568 } 571 569 if (rcu_self_test_counter != early_boot_test_counter) {
-14
kernel/time/timer.c
··· 1237 1237 } 1238 1238 EXPORT_SYMBOL(try_to_del_timer_sync); 1239 1239 1240 - bool timer_curr_running(struct timer_list *timer) 1241 - { 1242 - int i; 1243 - 1244 - for (i = 0; i < NR_BASES; i++) { 1245 - struct timer_base *base = this_cpu_ptr(&timer_bases[i]); 1246 - 1247 - if (base->running_timer == timer) 1248 - return true; 1249 - } 1250 - 1251 - return false; 1252 - } 1253 - 1254 1240 #ifdef CONFIG_PREEMPT_RT 1255 1241 static __init void timer_base_init_expiry_lock(struct timer_base *base) 1256 1242 {
+9
lib/bitmap.c
··· 581 581 { 582 582 unsigned int lastbit = r->nbits - 1; 583 583 584 + if (!strncasecmp(str, "all", 3)) { 585 + r->start = 0; 586 + r->end = lastbit; 587 + str += 3; 588 + 589 + goto check_pattern; 590 + } 591 + 584 592 str = bitmap_getnum(str, &r->start, lastbit); 585 593 if (IS_ERR(str)) 586 594 return str; ··· 603 595 if (IS_ERR(str)) 604 596 return str; 605 597 598 + check_pattern: 606 599 if (end_of_region(*str)) 607 600 goto no_pattern; 608 601
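The new branch above treats a case-insensitive "all" as an alias for the full region, bit 0 through nbits - 1, optionally still followed by the usual ":used/group" pattern. A userspace sketch of that behavior (parse_region is a made-up helper, not the kernel API, and it only handles the "all" forms exercised by the new test cases):

```shell
# Sketch only: expand "all" to 0..nbits-1, then apply an optional
# ":used/group" pattern, where the first "used" bits of each "group"
# are set. Prints the resulting mask in hex.
parse_region () {	# parse_region SPEC NBITS
	spec="$1"
	nbits="$2"
	case "$spec" in
	[Aa][Ll][Ll])	start=0; end=$((nbits - 1)); pat=;;
	[Aa][Ll][Ll]:*)	start=0; end=$((nbits - 1)); pat="${spec#*:}";;
	*)		echo "only \"all\" forms handled here" >&2; return 1;;
	esac
	if test -n "$pat"
	then
		used="${pat%/*}"
		group="${pat#*/}"
	else
		used=$nbits
		group=$nbits
	fi
	mask=0
	i=$start
	while test $i -le $end
	do
		if test $(( (i - start) % group )) -lt $used
		then
			mask=$((mask | (1 << i)))
		fi
		i=$((i + 1))
	done
	printf '0x%08x\n' $mask
}
parse_region all 32	# 0xffffffff: every bit of a 32-bit map
parse_region ALL:1/2 32	# 0x55555555: first 1 bit of each group of 2
```

The "ALL:1/2" result matches the new lib/test_bitmap.c expectation that "all:1/2" and "ALL:1/2" behave like "0-31:1/2".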
+7
lib/test_bitmap.c
··· 366 366 {0, "0-31:1/3,1-31:1/3,2-31:1/3", &exp1[8 * step], 32, 0}, 367 367 {0, "1-10:8/12,8-31:24/29,0-31:0/3", &exp1[9 * step], 32, 0}, 368 368 369 + {0, "all", &exp1[8 * step], 32, 0}, 370 + {0, "0, 1, all, ", &exp1[8 * step], 32, 0}, 371 + {0, "all:1/2", &exp1[4 * step], 32, 0}, 372 + {0, "ALL:1/2", &exp1[4 * step], 32, 0}, 373 + {-EINVAL, "al", NULL, 8, 0}, 374 + {-EINVAL, "alll", NULL, 8, 0}, 375 + 369 376 {-EINVAL, "-1", NULL, 8, 0}, 370 377 {-EINVAL, "-0", NULL, 8, 0}, 371 378 {-EINVAL, "10-1", NULL, 8, 0},
+1 -1
mm/oom_kill.c
··· 922 922 continue; 923 923 } 924 924 /* 925 - * No kthead_use_mm() user needs to read from the userspace so 925 + * No kthread_use_mm() user needs to read from the userspace so 926 926 * we are ok to reap it. 927 927 */ 928 928 if (unlikely(p->flags & PF_KTHREAD))
+1
mm/slab.h
··· 634 634 struct kmem_cache *kp_slab_cache; 635 635 void *kp_ret; 636 636 void *kp_stack[KS_ADDRS_COUNT]; 637 + void *kp_free_stack[KS_ADDRS_COUNT]; 637 638 }; 638 639 void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page); 639 640 #endif
+11 -1
mm/slab_common.c
··· 575 575 * depends on the type of object and on how much debugging is enabled. 576 576 * For a slab-cache object, the fact that it is a slab object is printed, 577 577 * and, if available, the slab name, return address, and stack trace from 578 - * the allocation of that object. 578 + * the allocation and last free path of that object. 579 579 * 580 580 * This function will splat if passed a pointer to a non-slab object. 581 581 * If you are not sure what type of object you have, you should instead ··· 620 620 break; 621 621 pr_info(" %pS\n", kp.kp_stack[i]); 622 622 } 623 + 624 + if (kp.kp_free_stack[0]) 625 + pr_cont(" Free path:\n"); 626 + 627 + for (i = 0; i < ARRAY_SIZE(kp.kp_free_stack); i++) { 628 + if (!kp.kp_free_stack[i]) 629 + break; 630 + pr_info(" %pS\n", kp.kp_free_stack[i]); 631 + } 632 + 623 633 } 624 634 EXPORT_SYMBOL_GPL(kmem_dump_obj); 625 635 #endif
+8
mm/slub.c
··· 4045 4045 !(s->flags & SLAB_STORE_USER)) 4046 4046 return; 4047 4047 #ifdef CONFIG_SLUB_DEBUG 4048 + objp = fixup_red_left(s, objp); 4048 4049 trackp = get_track(s, objp, TRACK_ALLOC); 4049 4050 kpp->kp_ret = (void *)trackp->addr; 4050 4051 #ifdef CONFIG_STACKTRACE 4051 4052 for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) { 4052 4053 kpp->kp_stack[i] = (void *)trackp->addrs[i]; 4053 4054 if (!kpp->kp_stack[i]) 4055 + break; 4056 + } 4057 + 4058 + trackp = get_track(s, objp, TRACK_FREE); 4059 + for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) { 4060 + kpp->kp_free_stack[i] = (void *)trackp->addrs[i]; 4061 + if (!kpp->kp_free_stack[i]) 4054 4062 break; 4055 4063 } 4056 4064 #endif
+1 -1
mm/util.c
··· 983 983 * depends on the type of object and on how much debugging is enabled. 984 984 * For example, for a slab-cache object, the slab name is printed, and, 985 985 * if available, the return address and stack trace from the allocation 986 - * of that object. 986 + * and last free path of that object. 987 987 */ 988 988 void mem_dump_obj(void *object) 989 989 {
+46
tools/rcu/rcu-cbs.py
··· 1 + #!/usr/bin/env drgn 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Dump out the number of RCU callbacks outstanding. 5 + # 6 + # On older kernels having multiple flavors of RCU, this dumps out the 7 + # number of callbacks for the most heavily used flavor. 8 + # 9 + # Usage: sudo drgn rcu-cbs.py 10 + # 11 + # Copyright (C) 2021 Facebook, Inc. 12 + # 13 + # Authors: Paul E. McKenney <paulmck@kernel.org> 14 + 15 + import sys 16 + import drgn 17 + from drgn import NULL, Object 18 + from drgn.helpers.linux import * 19 + 20 + def get_rdp0(prog): 21 + try: 22 + rdp0 = prog.variable('rcu_preempt_data', 'kernel/rcu/tree.c'); 23 + except LookupError: 24 + rdp0 = NULL; 25 + 26 + if rdp0 == NULL: 27 + try: 28 + rdp0 = prog.variable('rcu_sched_data', 29 + 'kernel/rcu/tree.c'); 30 + except LookupError: 31 + rdp0 = NULL; 32 + 33 + if rdp0 == NULL: 34 + rdp0 = prog.variable('rcu_data', 'kernel/rcu/tree.c'); 35 + return rdp0.address_of_(); 36 + 37 + rdp0 = get_rdp0(prog); 38 + 39 + # Sum up RCU callbacks. 40 + sum = 0; 41 + for cpu in for_each_possible_cpu(prog): 42 + rdp = per_cpu_ptr(rdp0, cpu); 43 + len = rdp.cblist.len.value_(); 44 + # print("CPU " + str(cpu) + " RCU callbacks: " + str(len)); 45 + sum += len; 46 + print("Number of RCU callbacks in flight: " + str(sum));
+9 -24
tools/testing/selftests/rcutorture/bin/kvm-again.sh
··· 29 29 echo "Usage: $scriptname /path/to/old/run [ options ]" 30 30 exit 1 31 31 fi 32 - if ! cp "$oldrun/batches" $T/batches.oldrun 32 + if ! cp "$oldrun/scenarios" $T/scenarios.oldrun 33 33 then 34 34 # Later on, can reconstitute this from console.log files. 35 35 echo Prior run scenarios file does not exist: $oldrun/scenarios ··· 143 143 usage 144 144 fi 145 145 rm -f "$rundir"/*/{console.log,console.log.diags,qemu_pid,qemu-retval,Warnings,kvm-test-1-run.sh.out,kvm-test-1-run-qemu.sh.out,vmlinux} "$rundir"/log 146 + touch "$rundir/log" 147 + echo $scriptname $args | tee -a "$rundir/log" 146 148 echo $oldrun > "$rundir/re-run" 147 149 if ! test -d "$rundir/../../bin" 148 150 then ··· 167 165 grep '^#' $i | sed -e 's/^# //' > $T/qemu-cmd-settings 168 166 . $T/qemu-cmd-settings 169 167 170 - grep -v '^#' $T/batches.oldrun | awk ' 171 - BEGIN { 172 - oldbatch = 1; 173 - } 174 - 168 + grep -v '^#' $T/scenarios.oldrun | awk ' 175 169 { 176 - if (oldbatch != $1) { 177 - print "kvm-test-1-run-batch.sh" curbatch; 178 - curbatch = ""; 179 - oldbatch = $1; 180 - } 181 - curbatch = curbatch " " $2; 182 - } 183 - 184 - END { 185 - print "kvm-test-1-run-batch.sh" curbatch 170 + curbatch = ""; 171 + for (i = 2; i <= NF; i++) 172 + curbatch = curbatch " " $i; 173 + print "kvm-test-1-run-batch.sh" curbatch; 174 + }' > $T/runbatches.sh 187 175 188 176 if test -n "$dryrun" ··· 180 188 echo ---- Dryrun complete, directory: $rundir | tee -a "$rundir/log" 181 189 else 182 190 ( cd "$rundir"; sh $T/runbatches.sh ) 183 - kcsan-collapse.sh "$rundir" | tee -a "$rundir/log" 184 - echo | tee -a "$rundir/log" 185 - echo ---- Results directory: $rundir | tee -a "$rundir/log" 186 - kvm-recheck.sh "$rundir" > $T/kvm-recheck.sh.out 2>&1 187 - ret=$? 188 - cat $T/kvm-recheck.sh.out | tee -a "$rundir/log" 189 - echo " --- Done at `date` (`get_starttime_duration $starttime`) exitcode $ret" | tee -a "$rundir/log" 190 - exit $ret 191 + kvm-end-run-stats.sh "$rundir" "$starttime" 191 192 fi
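The simplified awk program above maps each line of the scenarios file ("N. scenario ...") to one kvm-test-1-run-batch.sh command, one batch per line. Running the same program over a fabricated scenarios file (the scenario names here are invented) shows the transformation:

```shell
# Fabricated scenarios file in the "N. scenario ..." format.
cat > /tmp/scenarios.demo <<'EOF'
1. TREE01 TREE02
2. SRCU-P
EOF
# Same awk program as the kvm-again.sh hunk above: drop the batch
# number ($1) and emit one kvm-test-1-run-batch.sh line per batch.
grep -v '^#' /tmp/scenarios.demo | awk '
{
	curbatch = "";
	for (i = 2; i <= NF; i++)
		curbatch = curbatch " " $i;
	print "kvm-test-1-run-batch.sh" curbatch;
}'
```

This prints one "kvm-test-1-run-batch.sh TREE01 TREE02" line for batch 1 and "kvm-test-1-run-batch.sh SRCU-P" for batch 2.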
+4 -2
tools/testing/selftests/rcutorture/bin/kvm-build.sh
··· 40 40 then 41 41 exit 2 42 42 fi 43 - ncpus=`cpus2use.sh` 44 - make -j$ncpus $TORTURE_KMAKE_ARG > $resdir/Make.out 2>&1 43 + 44 + # Tell "make" to use double the number of real CPUs on the build system. 45 + ncpus="`getconf _NPROCESSORS_ONLN`" 46 + make -j$((2 * ncpus)) $TORTURE_KMAKE_ARG > $resdir/Make.out 2>&1 45 47 retval=$? 46 48 if test $retval -ne 0 || grep "rcu[^/]*": < $resdir/Make.out | egrep -q "Stop|Error|error:|warning:" || egrep -q "Stop|Error|error:" < $resdir/Make.out 47 49 then
+40
tools/testing/selftests/rcutorture/bin/kvm-end-run-stats.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Check the status of the specified run. 5 + # 6 + # Usage: kvm-end-run-stats.sh /path/to/run starttime 7 + # 8 + # Copyright (C) 2021 Facebook, Inc. 9 + # 10 + # Authors: Paul E. McKenney <paulmck@kernel.org> 11 + 12 + # scriptname=$0 13 + # args="$*" 14 + rundir="$1" 15 + if ! test -d "$rundir" 16 + then 17 + echo kvm-end-run-stats.sh: Specified run directory does not exist: $rundir 18 + exit 1 19 + fi 20 + 21 + T=${TMPDIR-/tmp}/kvm-end-run-stats.sh.$$ 22 + trap 'rm -rf $T' 0 23 + mkdir $T 24 + 25 + KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM 26 + PATH=${KVM}/bin:$PATH; export PATH 27 + . functions.sh 28 + default_starttime="`get_starttime`" 29 + starttime="${2-$default_starttime}" 30 + 31 + echo | tee -a "$rundir/log" 32 + echo | tee -a "$rundir/log" 33 + echo " --- `date` Test summary:" | tee -a "$rundir/log" 34 + echo Results directory: $rundir | tee -a "$rundir/log" 35 + kcsan-collapse.sh "$rundir" | tee -a "$rundir/log" 36 + kvm-recheck.sh "$rundir" > $T/kvm-recheck.sh.out 2>&1 37 + ret=$? 38 + cat $T/kvm-recheck.sh.out | tee -a "$rundir/log" 39 + echo " --- Done at `date` (`get_starttime_duration $starttime`) exitcode $ret" | tee -a "$rundir/log" 40 + exit $ret
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh
··· 43 43 else 44 44 echo No build errors. 45 45 fi 46 - if grep -q -e "--buildonly" < ${rundir}/log 46 + if grep -q -e "--build-\?only" < ${rundir}/log && ! test -f "${rundir}/remote-log" 47 47 then 48 48 echo Build-only run, no console logs to check. 49 49 exit $editorret
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
··· 31 31 echo "$configfile ------- " $stopstate 32 32 else 33 33 title="$configfile ------- $ngps GPs" 34 - dur=`sed -e 's/^.* rcutorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null` 34 + dur=`grep -v '^#' $i/qemu-cmd | sed -e 's/^.* rcutorture.shutdown_secs=//' -e 's/ .*$//'` 35 35 if test -z "$dur" 36 36 then 37 37 :
+249
tools/testing/selftests/rcutorture/bin/kvm-remote.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0+ 3 + # 4 + # Run a series of tests on remote systems under KVM. 5 + # 6 + # Usage: kvm-remote.sh "systems" [ <kvm.sh args> ] 7 + # kvm-remote.sh "systems" /path/to/old/run [ <kvm-again.sh args> ] 8 + # 9 + # Copyright (C) 2021 Facebook, Inc. 10 + # 11 + # Authors: Paul E. McKenney <paulmck@kernel.org> 12 + 13 + scriptname=$0 14 + args="$*" 15 + 16 + if ! test -d tools/testing/selftests/rcutorture/bin 17 + then 18 + echo $scriptname must be run from top-level directory of kernel source tree. 19 + exit 1 20 + fi 21 + 22 + KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM 23 + PATH=${KVM}/bin:$PATH; export PATH 24 + . functions.sh 25 + 26 + starttime="`get_starttime`" 27 + 28 + systems="$1" 29 + if test -z "$systems" 30 + then 31 + echo $scriptname: Empty list of systems will go nowhere good, giving up. 32 + exit 1 33 + fi 34 + shift 35 + 36 + # Pathnames: 37 + # T: /tmp/kvm-remote.sh.$$ 38 + # resdir: /tmp/kvm-remote.sh.$$/res 39 + # rundir: /tmp/kvm-remote.sh.$$/res/$ds ("-remote" suffix) 40 + # oldrun: `pwd`/tools/testing/.../res/$otherds 41 + # 42 + # Pathname segments: 43 + # TD: kvm-remote.sh.$$ 44 + # ds: yyyy.mm.dd-hh.mm.ss-remote 45 + 46 + TD=kvm-remote.sh.$$ 47 + T=${TMPDIR-/tmp}/$TD 48 + trap 'rm -rf $T' 0 49 + mkdir $T 50 + 51 + resdir="$T/res" 52 + ds=`date +%Y.%m.%d-%H.%M.%S`-remote 53 + rundir=$resdir/$ds 54 + echo Results directory: $rundir 55 + echo $scriptname $args 56 + if echo $1 | grep -q '^--' 57 + then 58 + # Fresh build. Create a datestamp unless the caller supplied one. 59 + datestamp="`echo "$@" | awk -v ds="$ds" '{ 60 + for (i = 1; i < NF; i++) { 61 + if ($i == "--datestamp") { 62 + ds = ""; 63 + break; 64 + } 65 + } 66 + if (ds != "") 67 + print "--datestamp " ds; 68 + }'`" 69 + kvm.sh --remote "$@" $datestamp --buildonly > $T/kvm.sh.out 2>&1 70 + ret=$? 71 + if test "$ret" -ne 0 72 + then 73 + echo $scriptname: kvm.sh failed exit code $? 
74 + cat $T/kvm.sh.out 75 + exit 2 76 + fi 77 + oldrun="`grep -m 1 "^Results directory: " $T/kvm.sh.out | awk '{ print $3 }'`" 78 + touch "$oldrun/remote-log" 79 + echo $scriptname $args >> "$oldrun/remote-log" 80 + echo | tee -a "$oldrun/remote-log" 81 + echo " ----" kvm.sh output: "(`date`)" | tee -a "$oldrun/remote-log" 82 + cat $T/kvm.sh.out | tee -a "$oldrun/remote-log" 83 + # We are going to run this, so remove the buildonly files. 84 + rm -f "$oldrun"/*/buildonly 85 + kvm-again.sh $oldrun --dryrun --remote --rundir "$rundir" > $T/kvm-again.sh.out 2>&1 86 + ret=$? 87 + if test "$ret" -ne 0 88 + then 89 + echo $scriptname: kvm-again.sh failed exit code $? | tee -a "$oldrun/remote-log" 90 + cat $T/kvm-again.sh.out | tee -a "$oldrun/remote-log" 91 + exit 2 92 + fi 93 + else 94 + # Re-use old run. 95 + oldrun="$1" 96 + if ! echo $oldrun | grep -q '^/' 97 + then 98 + oldrun="`pwd`/$oldrun" 99 + fi 100 + shift 101 + touch "$oldrun/remote-log" 102 + echo $scriptname $args >> "$oldrun/remote-log" 103 + kvm-again.sh "$oldrun" "$@" --dryrun --remote --rundir "$rundir" > $T/kvm-again.sh.out 2>&1 104 + ret=$? 105 + if test "$ret" -ne 0 106 + then 107 + echo $scriptname: kvm-again.sh failed exit code $? | tee -a "$oldrun/remote-log" 108 + cat $T/kvm-again.sh.out | tee -a "$oldrun/remote-log" 109 + exit 2 110 + fi 111 + cp -a "$rundir" "$KVM/res/" 112 + oldrun="$KVM/res/$ds" 113 + fi 114 + echo | tee -a "$oldrun/remote-log" 115 + echo " ----" kvm-again.sh output: "(`date`)" | tee -a "$oldrun/remote-log" 116 + cat $T/kvm-again.sh.out 117 + echo | tee -a "$oldrun/remote-log" 118 + echo Remote run directory: $rundir | tee -a "$oldrun/remote-log" 119 + echo Local build-side run directory: $oldrun | tee -a "$oldrun/remote-log" 120 + 121 + # Create the kvm-remote-N.sh scripts in the bin directory. 
122 + awk < "$rundir"/scenarios -v dest="$T/bin" -v rundir="$rundir" ' 123 + { 124 + n = $1; 125 + sub(/\./, "", n); 126 + fn = dest "/kvm-remote-" n ".sh" 127 + scenarios = ""; 128 + for (i = 2; i <= NF; i++) 129 + scenarios = scenarios " " $i; 130 + print "kvm-test-1-run-batch.sh" scenarios > fn; 131 + print "rm " rundir "/remote.run" >> fn; 132 + }' 133 + chmod +x $T/bin/kvm-remote-*.sh 134 + ( cd "`dirname $T`"; tar -chzf $T/binres.tgz "$TD/bin" "$TD/res" ) 135 + 136 + # Check first to avoid the need for cleanup for system-name typos 137 + for i in $systems 138 + do 139 + ncpus="`ssh $i getconf _NPROCESSORS_ONLN 2> /dev/null`" 140 + echo $i: $ncpus CPUs " " `date` | tee -a "$oldrun/remote-log" 141 + ret=$? 142 + if test "$ret" -ne 0 143 + then 144 + echo System $i unreachable, giving up. | tee -a "$oldrun/remote-log" 145 + exit 4 | tee -a "$oldrun/remote-log" 146 + fi 147 + done 148 + 149 + # Download and expand the tarball on all systems. 150 + for i in $systems 151 + do 152 + echo Downloading tarball to $i `date` | tee -a "$oldrun/remote-log" 153 + cat $T/binres.tgz | ssh $i "cd /tmp; tar -xzf -" 154 + ret=$? 155 + if test "$ret" -ne 0 156 + then 157 + echo Unable to download $T/binres.tgz to system $i, giving up. | tee -a "$oldrun/remote-log" 158 + exit 10 | tee -a "$oldrun/remote-log" 159 + fi 160 + done 161 + 162 + # Function to check for presence of a file on the specified system. 163 + # Complain if the system cannot be reached, and retry after a wait. 164 + # Currently just waits forever if a machine disappears. 165 + # 166 + # Usage: checkremotefile system pathname 167 + checkremotefile () { 168 + local ret 169 + local sleeptime=60 170 + 171 + while : 172 + do 173 + ssh $1 "test -f \"$2\"" 174 + ret=$? 175 + if test "$ret" -ne 255 176 + then 177 + return $ret 178 + fi 179 + echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. 
`date` 180 + sleep $sleeptime 181 + done 182 + } 183 + 184 + # Function to start batches on idle remote $systems 185 + # 186 + # Usage: startbatches curbatch nbatches 187 + # 188 + # Batches are numbered starting at 1. Returns the next batch to start. 189 + # Be careful to redirect all debug output to FD 2 (stderr). 190 + startbatches () { 191 + local curbatch="$1" 192 + local nbatches="$2" 193 + local ret 194 + 195 + # Each pass through the following loop examines one system. 196 + for i in $systems 197 + do 198 + if test "$curbatch" -gt "$nbatches" 199 + then 200 + echo $((nbatches + 1)) 201 + return 0 202 + fi 203 + if checkremotefile "$i" "$resdir/$ds/remote.run" 1>&2 204 + then 205 + continue # System still running last test, skip. 206 + fi 207 + ssh "$i" "cd \"$resdir/$ds\"; touch remote.run; PATH=\"$T/bin:$PATH\" nohup kvm-remote-$curbatch.sh > kvm-remote-$curbatch.sh.out 2>&1 &" 1>&2 208 + ret=$? 209 + if test "$ret" -ne 0 210 + then 211 + echo ssh $i failed: exitcode $ret 1>&2 212 + exit 11 213 + fi 214 + echo " ----" System $i Batch `head -n $curbatch < "$rundir"/scenarios | tail -1` `date` 1>&2 215 + curbatch=$((curbatch + 1)) 216 + done 217 + echo $curbatch 218 + } 219 + 220 + # Launch all the scenarios. 221 + nbatches="`wc -l "$rundir"/scenarios | awk '{ print $1 }'`" 222 + curbatch=1 223 + while test "$curbatch" -le "$nbatches" 224 + do 225 + startbatches $curbatch $nbatches > $T/curbatch 2> $T/startbatches.stderr 226 + curbatch="`cat $T/curbatch`" 227 + if test -s "$T/startbatches.stderr" 228 + then 229 + cat "$T/startbatches.stderr" | tee -a "$oldrun/remote-log" 230 + fi 231 + if test "$curbatch" -le "$nbatches" 232 + then 233 + sleep 30 234 + fi 235 + done 236 + echo All batches started. `date` 237 + 238 + # Wait for all remaining scenarios to complete and collect results. 
239 + for i in $systems 240 + do 241 + while checkremotefile "$i" "$resdir/$ds/remote.run" 242 + do 243 + sleep 30 244 + done 245 + ( cd "$oldrun"; ssh $i "cd $rundir; tar -czf - kvm-remote-*.sh.out */console.log */kvm-test-1-run*.sh.out */qemu_pid */qemu-retval; rm -rf $T > /dev/null 2>&1" | tar -xzf - ) 246 + done 247 + 248 + ( kvm-end-run-stats.sh "$oldrun" "$starttime"; echo $? > $T/exitcode ) | tee -a "$oldrun/remote-log" 249 + exit "`cat $T/exitcode`"
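The startbatches loop in kvm-remote.sh above hands the next batch to the first idle system, where "idle" means the remote.run flag file is absent on that host. A local, ssh-free sketch of that dispatch logic (the system names and flag-file paths are simulated, and the real script's stderr debug output is merged onto stdout here for brevity):

```shell
# Each "system" is a local directory; a remote.run flag file marks it
# busy, mirroring the flag that kvm-remote.sh checks over ssh.
T=`mktemp -d`
systems="alpha beta"
mkdir "$T/alpha" "$T/beta"
touch "$T/alpha/remote.run"	# alpha is mid-batch; beta is idle

startbatches_sim () {	# startbatches_sim curbatch nbatches
	curbatch=$1
	nbatches=$2
	for i in $systems
	do
		if test "$curbatch" -gt "$nbatches"
		then
			echo $((nbatches + 1))
			return 0
		fi
		if test -f "$T/$i/remote.run"
		then
			continue	# System still running last test, skip.
		fi
		echo "assign batch $curbatch to $i"
		touch "$T/$i/remote.run"	# Mark the system busy.
		curbatch=$((curbatch + 1))
	done
	echo $curbatch	# Next batch to start on a later pass.
}

out="`startbatches_sim 1 3`"
echo "$out"	# assigns batch 1 to beta, then reports next batch 2
rm -rf "$T"
```

As in the real script, a pass that finds no idle system leaves curbatch unchanged, and the caller sleeps and retries until all batches have been started.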
+41 -20
tools/testing/selftests/rcutorture/bin/kvm.sh
··· 20 20 21 21 cd `dirname $scriptname`/../../../../../ 22 22 23 + # This script knows only English. 24 + LANG=en_US.UTF-8; export LANG 25 + 23 26 dur=$((30*60)) 24 27 dryrun="" 25 28 KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM ··· 44 41 TORTURE_KCONFIG_KCSAN_ARG="" 45 42 TORTURE_KMAKE_ARG="" 46 43 TORTURE_QEMU_MEM=512 44 + TORTURE_REMOTE= 47 45 TORTURE_SHUTDOWN_GRACE=180 48 46 TORTURE_SUITE=rcu 49 47 TORTURE_MOD=rcutorture ··· 68 64 echo " --cpus N" 69 65 echo " --datestamp string" 70 66 echo " --defconfig string" 71 - echo " --dryrun batches|sched|script" 67 + echo " --dryrun batches|scenarios|sched|script" 72 68 echo " --duration minutes | <seconds>s | <hours>h | <days>d" 73 69 echo " --gdb" 74 70 echo " --help" ··· 81 77 echo " --no-initrd" 82 78 echo " --qemu-args qemu-arguments" 83 79 echo " --qemu-cmd qemu-system-..." 80 + echo " --remote" 84 81 echo " --results absolute-pathname" 85 82 echo " --torture lock|rcu|rcuscale|refscale|scf" 86 83 echo " --trust-make" ··· 117 112 checkarg --cpus "(number)" "$#" "$2" '^[0-9]*$' '^--' 118 113 cpus=$2 119 114 TORTURE_ALLOTED_CPUS="$2" 120 - max_cpus="`identify_qemu_vcpus`" 121 - if test "$TORTURE_ALLOTED_CPUS" -gt "$max_cpus" 115 + if test -z "$TORTURE_REMOTE" 122 116 then 123 - TORTURE_ALLOTED_CPUS=$max_cpus 117 + max_cpus="`identify_qemu_vcpus`" 118 + if test "$TORTURE_ALLOTED_CPUS" -gt "$max_cpus" 119 + then 120 + TORTURE_ALLOTED_CPUS=$max_cpus 121 + fi 124 122 fi 125 123 shift 126 124 ;; ··· 138 130 shift 139 131 ;; 140 132 --dryrun) 141 - checkarg --dryrun "batches|sched|script" $# "$2" 'batches\|sched\|script' '^--' 133 + checkarg --dryrun "batches|sched|script" $# "$2" 'batches\|scenarios\|sched\|script' '^--' 142 134 dryrun=$2 143 135 shift 144 136 ;; ··· 213 205 checkarg --qemu-cmd "(qemu-system-...)" $# "$2" 'qemu-system-' '^--' 214 206 TORTURE_QEMU_CMD="$2" 215 207 shift 208 + ;; 209 + --remote) 210 + TORTURE_REMOTE=1 216 211 ;; 217 212 --results) 218 213 checkarg --results "(absolute 
pathname)" "$#" "$2" '^/' '^error' ··· 561 550 if (ncpus != 0) 562 551 dump(first, i, batchnum); 563 552 }' >> $T/script 564 - 565 - cat << '___EOF___' >> $T/script 566 - echo | tee -a $TORTURE_RESDIR/log 567 - echo | tee -a $TORTURE_RESDIR/log 568 - echo " --- `date` Test summary:" | tee -a $TORTURE_RESDIR/log 569 - ___EOF___ 570 - cat << ___EOF___ >> $T/script 571 - echo Results directory: $resdir/$ds | tee -a $resdir/$ds/log 572 - kcsan-collapse.sh $resdir/$ds | tee -a $resdir/$ds/log 573 - kvm-recheck.sh $resdir/$ds > $T/kvm-recheck.sh.out 2>&1 574 - ___EOF___ 575 - echo 'ret=$?' >> $T/script 576 - echo "cat $T/kvm-recheck.sh.out | tee -a $resdir/$ds/log" >> $T/script 577 - echo 'exit $ret' >> $T/script 553 + echo kvm-end-run-stats.sh "$resdir/$ds" "$starttime" >> $T/script 578 554 579 555 # Extract the tests and their batches from the script. 580 556 egrep 'Start batch|Starting build\.' $T/script | grep -v ">>" | ··· 574 576 { 575 577 print batchno, $1, $2 576 578 }' > $T/batches 579 + 580 + # As above, but one line per batch. 581 + grep -v '^#' $T/batches | awk ' 582 + BEGIN { 583 + oldbatch = 1; 584 + } 585 + 586 + { 587 + if (oldbatch != $1) { 588 + print ++n ". " curbatch; 589 + curbatch = ""; 590 + oldbatch = $1; 591 + } 592 + curbatch = curbatch " " $2; 593 + } 594 + 595 + END { 596 + print ++n ". " curbatch; 597 + }' > $T/scenarios 577 598 578 599 if test "$dryrun" = script 579 600 then ··· 614 597 then 615 598 cat $T/batches 616 599 exit 0 600 + elif test "$dryrun" = scenarios 601 + then 602 + cat $T/scenarios 603 + exit 0 617 604 else 618 605 # Not a dryrun. Record the batches and the number of CPUs, then run the script. 619 606 bash $T/script 620 607 ret=$? 621 608 cp $T/batches $resdir/$ds/batches 609 + cp $T/scenarios $resdir/$ds/scenarios 622 610 echo '#' cpus=$cpus >> $resdir/$ds/batches 623 - echo " --- Done at `date` (`get_starttime_duration $starttime`) exitcode $ret" | tee -a $resdir/$ds/log 624 611 exit $ret 625 612 fi 626 613
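The second awk pass added to kvm.sh above collapses the per-scenario batches file into one numbered line per batch, producing the new scenarios file consumed by kvm-again.sh and kvm-remote.sh. A standalone check with made-up scenario names:

```shell
# Fabricated batches file: one "batchnum scenario" pair per line.
cat > /tmp/batches.demo <<'EOF'
1 TREE01
1 TREE02
2 SRCU-P
EOF
# Same awk program as the kvm.sh hunk above: group consecutive lines
# sharing a batch number into a single numbered scenarios line.
grep -v '^#' /tmp/batches.demo | awk '
BEGIN {
	oldbatch = 1;
}

{
	if (oldbatch != $1) {
		print ++n ". " curbatch;
		curbatch = "";
		oldbatch = $1;
	}
	curbatch = curbatch " " $2;
}

END {
	print ++n ". " curbatch;
}'
```

This prints "1.  TREE01 TREE02" and "2.  SRCU-P"; the second space after the batch number comes from curbatch accumulating each scenario with a leading blank, which is the format the "N. scenario ..." consumers expect.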
+1 -1
tools/testing/selftests/rcutorture/bin/torture.sh
··· 302 302 kcsan_kmake_tag="--kmake-args" 303 303 cur_kcsan_kmake_args="$kcsan_kmake_args" 304 304 fi 305 - torture_one $* --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" $kcsan_kmake_tag $cur_kcsan_kmake_args --kcsan 305 + torture_one "$@" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" $kcsan_kmake_tag $cur_kcsan_kmake_args --kcsan 306 306 fi 307 307 } 308 308
+17
tools/testing/selftests/rcutorture/configs/rcu/BUSTED-BOOST
··· 1 + CONFIG_SMP=y 2 + CONFIG_NR_CPUS=16 3 + CONFIG_PREEMPT_NONE=n 4 + CONFIG_PREEMPT_VOLUNTARY=n 5 + CONFIG_PREEMPT=y 6 + #CHECK#CONFIG_PREEMPT_RCU=y 7 + CONFIG_HZ_PERIODIC=y 8 + CONFIG_NO_HZ_IDLE=n 9 + CONFIG_NO_HZ_FULL=n 10 + CONFIG_RCU_TRACE=y 11 + CONFIG_HOTPLUG_CPU=y 12 + CONFIG_RCU_FANOUT=2 13 + CONFIG_RCU_FANOUT_LEAF=2 14 + CONFIG_RCU_NOCB_CPU=n 15 + CONFIG_DEBUG_LOCK_ALLOC=n 16 + CONFIG_DEBUG_OBJECTS_RCU_HEAD=n 17 + CONFIG_RCU_EXPERT=y
+8
tools/testing/selftests/rcutorture/configs/rcu/BUSTED-BOOST.boot
··· 1 + rcutorture.test_boost=2 2 + rcutorture.stutter=0 3 + rcutree.gp_preinit_delay=12 4 + rcutree.gp_init_delay=3 5 + rcutree.gp_cleanup_delay=3 6 + rcutree.kthread_prio=2 7 + threadirqs 8 + tree.use_softirq=0
+1 -1
tools/testing/selftests/rcutorture/configs/rcuscale/TREE
··· 7 7 CONFIG_NO_HZ_IDLE=y 8 8 CONFIG_NO_HZ_FULL=n 9 9 CONFIG_RCU_FAST_NO_HZ=n 10 - CONFIG_HOTPLUG_CPU=n 10 + CONFIG_HOTPLUG_CPU=y 11 11 CONFIG_SUSPEND=n 12 12 CONFIG_HIBERNATION=n 13 13 CONFIG_RCU_NOCB_CPU=n
+1 -1
tools/testing/selftests/rcutorture/configs/rcuscale/TREE54
··· 8 8 CONFIG_NO_HZ_IDLE=y 9 9 CONFIG_NO_HZ_FULL=n 10 10 CONFIG_RCU_FAST_NO_HZ=n 11 - CONFIG_HOTPLUG_CPU=n 11 + CONFIG_HOTPLUG_CPU=y 12 12 CONFIG_SUSPEND=n 13 13 CONFIG_HIBERNATION=n 14 14 CONFIG_RCU_FANOUT=3
+1 -1
tools/testing/selftests/rcutorture/configs/refscale/NOPREEMPT
··· 7 7 CONFIG_NO_HZ_IDLE=y 8 8 CONFIG_NO_HZ_FULL=n 9 9 CONFIG_RCU_FAST_NO_HZ=n 10 - CONFIG_HOTPLUG_CPU=n 10 + CONFIG_HOTPLUG_CPU=y 11 11 CONFIG_SUSPEND=n 12 12 CONFIG_HIBERNATION=n 13 13 CONFIG_RCU_NOCB_CPU=n
+1 -1
tools/testing/selftests/rcutorture/configs/refscale/PREEMPT
··· 7 7 CONFIG_NO_HZ_IDLE=y 8 8 CONFIG_NO_HZ_FULL=n 9 9 CONFIG_RCU_FAST_NO_HZ=n 10 - CONFIG_HOTPLUG_CPU=n 10 + CONFIG_HOTPLUG_CPU=y 11 11 CONFIG_SUSPEND=n 12 12 CONFIG_HIBERNATION=n 13 13 CONFIG_RCU_NOCB_CPU=n
+1 -1
tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/locks.h
··· 174 174 } 175 175 176 176 struct completion { 177 - /* Hopefuly this won't overflow. */ 177 + /* Hopefully this won't overflow. */ 178 178 unsigned int count; 179 179 }; 180 180