Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:

- changes related to No-CBs CPUs and NO_HZ_FULL

- RCU-tasks implementation

- torture-test updates

- miscellaneous fixes

- locktorture updates

- RCU documentation updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
workqueue: Use cond_resched_rcu_qs macro
workqueue: Add quiescent state between work items
locktorture: Cleanup header usage
locktorture: Cannot hold read and write lock
locktorture: Fix __acquire annotation for spinlock irq
locktorture: Support rwlocks
rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
locktorture: Document boot/module parameters
rcutorture: Rename rcutorture_runnable parameter
locktorture: Add test scenario for rwsem_lock
locktorture: Add test scenario for mutex_lock
locktorture: Make torture scripting account for new _runnable name
locktorture: Introduce torture context
locktorture: Support rwsems
locktorture: Add infrastructure for torturing read locks
torture: Address race in module cleanup
locktorture: Make statistics generic
locktorture: Teach about lock debugging
locktorture: Support mutexes
locktorture: Add documentation
...

+1935 -546
+24 -9
Documentation/RCU/stallwarn.txt
··· 56 56 two jiffies. (This is a cpp macro, not a kernel configuration 57 57 parameter.) 58 58 59 - When a CPU detects that it is stalling, it will print a message similar 60 - to the following: 59 + rcupdate.rcu_task_stall_timeout 60 + 61 + This boot/sysfs parameter controls the RCU-tasks stall warning 62 + interval. A value of zero or less suppresses RCU-tasks stall 63 + warnings. A positive value sets the stall-warning interval 64 + in jiffies. An RCU-tasks stall warning starts with the line: 65 + 66 + INFO: rcu_tasks detected stalls on tasks: 67 + 68 + And continues with the output of sched_show_task() for each 69 + task stalling the current RCU-tasks grace period. 70 + 71 + For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling, 72 + it will print a message similar to the following: 61 73 62 74 INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies) 63 75 ··· 186 174 o A CPU looping with bottom halves disabled. This condition can 187 175 result in RCU-sched and RCU-bh stalls. 188 176 189 - o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel 190 - without invoking schedule(). 177 + o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the 178 + kernel without invoking schedule(). Note that cond_resched() 179 + does not necessarily prevent RCU CPU stall warnings. Therefore, 180 + if the looping in the kernel is really expected and desirable 181 + behavior, you might need to replace some of the cond_resched() 182 + calls with calls to cond_resched_rcu_qs(). 191 183 192 184 o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might 193 185 happen to preempt a low-priority task in the middle of an RCU ··· 224 208 This resulted in a series of RCU CPU stall warnings, eventually 225 209 leading the realization that the CPU had failed. 226 210 227 - The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. 
228 - SRCU does not have its own CPU stall warnings, but its calls to 229 - synchronize_sched() will result in RCU-sched detecting RCU-sched-related 230 - CPU stalls. Please note that RCU only detects CPU stalls when there is 231 - a grace period in progress. No grace period, no CPU stall warnings. 211 + The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall 212 + warnings. Note that SRCU does -not- have CPU stall warnings. Please note 213 + that RCU only detects CPU stalls when there is a grace period in progress. 214 + No grace period, no CPU stall warnings. 232 215 233 216 To diagnose the cause of the stall, inspect the stack traces. 234 217 The offending function will usually be near the top of the stack.
+67 -1
Documentation/kernel-parameters.txt
··· 1723 1723 lockd.nlm_udpport=M [NFS] Assign UDP port. 1724 1724 Format: <integer> 1725 1725 1726 + locktorture.nreaders_stress= [KNL] 1727 + Set the number of locking read-acquisition kthreads. 1728 + Defaults to being automatically set based on the 1729 + number of online CPUs. 1730 + 1731 + locktorture.nwriters_stress= [KNL] 1732 + Set the number of locking write-acquisition kthreads. 1733 + 1734 + locktorture.onoff_holdoff= [KNL] 1735 + Set time (s) after boot for CPU-hotplug testing. 1736 + 1737 + locktorture.onoff_interval= [KNL] 1738 + Set time (s) between CPU-hotplug operations, or 1739 + zero to disable CPU-hotplug testing. 1740 + 1741 + locktorture.shuffle_interval= [KNL] 1742 + Set task-shuffle interval (jiffies). Shuffling 1743 + tasks allows some CPUs to go into dyntick-idle 1744 + mode during the locktorture test. 1745 + 1746 + locktorture.shutdown_secs= [KNL] 1747 + Set time (s) after boot system shutdown. This 1748 + is useful for hands-off automated testing. 1749 + 1750 + locktorture.stat_interval= [KNL] 1751 + Time (s) between statistics printk()s. 1752 + 1753 + locktorture.stutter= [KNL] 1754 + Time (s) to stutter testing, for example, 1755 + specifying five seconds causes the test to run for 1756 + five seconds, wait for five seconds, and so on. 1757 + This tests the locking primitive's ability to 1758 + transition abruptly to and from idle. 1759 + 1760 + locktorture.torture_runnable= [BOOT] 1761 + Start locktorture running at boot time. 1762 + 1763 + locktorture.torture_type= [KNL] 1764 + Specify the locking implementation to test. 1765 + 1766 + locktorture.verbose= [KNL] 1767 + Enable additional printk() statements. 1768 + 1726 1769 logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver 1727 1770 Format: <irq> 1728 1771 ··· 2943 2900 Lazy RCU callbacks are those which RCU can 2944 2901 prove do nothing more than free memory. 
2945 2902 2903 + rcutorture.cbflood_inter_holdoff= [KNL] 2904 + Set holdoff time (jiffies) between successive 2905 + callback-flood tests. 2906 + 2907 + rcutorture.cbflood_intra_holdoff= [KNL] 2908 + Set holdoff time (jiffies) between successive 2909 + bursts of callbacks within a given callback-flood 2910 + test. 2911 + 2912 + rcutorture.cbflood_n_burst= [KNL] 2913 + Set the number of bursts making up a given 2914 + callback-flood test. Set this to zero to 2915 + disable callback-flood testing. 2916 + 2917 + rcutorture.cbflood_n_per_burst= [KNL] 2918 + Set the number of callbacks to be registered 2919 + in a given burst of a callback-flood test. 2920 + 2946 2921 rcutorture.fqs_duration= [KNL] 2947 2922 Set duration of force_quiescent_state bursts. 2948 2923 ··· 3000 2939 Set time (s) between CPU-hotplug operations, or 3001 2940 zero to disable CPU-hotplug testing. 3002 2941 3003 - rcutorture.rcutorture_runnable= [BOOT] 2942 + rcutorture.torture_runnable= [BOOT] 3004 2943 Start rcutorture running at boot time. 3005 2944 3006 2945 rcutorture.shuffle_interval= [KNL] ··· 3061 3000 3062 3001 rcupdate.rcu_cpu_stall_timeout= [KNL] 3063 3002 Set timeout for RCU CPU stall warning messages. 3003 + 3004 + rcupdate.rcu_task_stall_timeout= [KNL] 3005 + Set timeout in jiffies for RCU task stall warning 3006 + messages. Disable with a value less than or equal 3007 + to zero. 3064 3008 3065 3009 rdinit= [KNL] 3066 3010 Format: <full_path>
+147
Documentation/locking/locktorture.txt
··· 1 + Kernel Lock Torture Test Operation 2 + 3 + CONFIG_LOCK_TORTURE_TEST 4 + 5 + The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module 6 + that runs torture tests on core kernel locking primitives. The kernel 7 + module, 'locktorture', may be built after the fact on the running 8 + kernel to be tested, if desired. The tests periodically output status 9 + messages via printk(), which can be examined via dmesg (perhaps 10 + grepping for "torture"). The test is started when the module is loaded, 11 + and stops when the module is unloaded. This program is based on how RCU 12 + is tortured, via rcutorture. 13 + 14 + This torture test consists of creating a number of kernel threads which 15 + acquire the lock and hold it for a specific amount of time, thus simulating 16 + different critical region behaviors. The amount of contention on the lock 17 + can be simulated by enlarging this critical region hold time and/or 18 + creating more kthreads. 19 + 20 + 21 + MODULE PARAMETERS 22 + 23 + This module has the following parameters: 24 + 25 + 26 + ** Locktorture-specific ** 27 + 28 + nwriters_stress Number of kernel threads that will stress exclusive lock 29 + ownership (writers). The default value is twice the number 30 + of online CPUs. 31 + 32 + nreaders_stress Number of kernel threads that will stress shared lock 33 + ownership (readers). The default is the same as the number 34 + of writers. If the user did not specify nwriters_stress, then 35 + both readers and writers will be the number of online CPUs. 36 + 37 + torture_type Type of lock to torture. By default, only spinlocks will 38 + be tortured. This module can torture the following locks, 39 + with string values as follows: 40 + 41 + o "lock_busted": Simulates a buggy lock implementation. 42 + 43 + o "spin_lock": spin_lock() and spin_unlock() pairs. 44 + 45 + o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() 46 + pairs. 47 + 48 + o "rw_lock": read/write lock() and unlock() rwlock pairs. 
49 + 50 + o "rw_lock_irq": read/write lock_irq() and unlock_irq() 51 + rwlock pairs. 52 + 53 + o "mutex_lock": mutex_lock() and mutex_unlock() pairs. 54 + 55 + o "rwsem_lock": read/write down() and up() semaphore pairs. 56 + 57 + torture_runnable Start locktorture at boot time in the case where the 58 + module is built into the kernel, otherwise wait for 59 + torture_runnable to be set via sysfs before starting. 60 + By default it will begin once the module is loaded. 61 + 62 + 63 + ** Torture-framework (RCU + locking) ** 64 + 65 + shutdown_secs The number of seconds to run the test before terminating 66 + the test and powering off the system. The default is 67 + zero, which disables test termination and system shutdown. 68 + This capability is useful for automated testing. 69 + 70 + onoff_interval The number of seconds between each attempt to execute a 71 + randomly selected CPU-hotplug operation. Defaults 72 + to zero, which disables CPU hotplugging. In 73 + CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently 74 + refuse to do any CPU-hotplug operations regardless of 75 + what value is specified for onoff_interval. 76 + 77 + onoff_holdoff The number of seconds to wait until starting CPU-hotplug 78 + operations. This would normally only be used when 79 + locktorture was built into the kernel and started 80 + automatically at boot time, in which case it is useful 81 + in order to avoid confusing boot-time code with CPUs 82 + coming and going. This parameter is only useful if 83 + CONFIG_HOTPLUG_CPU is enabled. 84 + 85 + stat_interval Number of seconds between statistics-related printk()s. 86 + By default, locktorture will report stats every 60 seconds. 87 + Setting the interval to zero causes the statistics to 88 + be printed -only- when the module is unloaded. 89 + 90 + 91 + stutter The length of time to run the test before pausing for this 92 + same period of time. 
Defaults to "stutter=5", so as 93 + to run and pause for (roughly) five-second intervals. 94 + Specifying "stutter=0" causes the test to run continuously 95 + without pausing, which is the old default behavior. 96 + 97 + shuffle_interval The number of seconds to keep the test threads affinitied 98 + to a particular subset of the CPUs, defaults to 3 seconds. 99 + Used in conjunction with test_no_idle_hz. 100 + 101 + verbose Enable verbose debugging printing, via printk(). Enabled 102 + by default. This extra information is mostly related to 103 + high-level errors and reports from the main 'torture' 104 + framework. 105 + 106 + 107 + STATISTICS 108 + 109 + Statistics are printed in the following format: 110 + 111 + spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 112 + (A) (B) (C) (D) (E) 113 + 114 + (A): Lock type that is being tortured -- torture_type parameter. 115 + 116 + (B): Number of writer lock acquisitions. If dealing with a read/write primitive 117 + a second "Reads" statistics line is printed. 118 + 119 + (C): Number of times the lock was acquired. 120 + 121 + (D): Min and max number of times threads failed to acquire the lock. 122 + 123 + (E): true/false values if there were errors acquiring the lock. This should 124 + -only- be positive if there is a bug in the locking primitive's 125 + implementation. Otherwise a lock should never fail (i.e., spin_lock()). 126 + Of course, the same applies for (C), above. A dummy example of this is 127 + the "lock_busted" type. 128 + 129 + USAGE 130 + 131 + The following script may be used to torture locks: 132 + 133 + #!/bin/sh 134 + 135 + modprobe locktorture 136 + sleep 3600 137 + rmmod locktorture 138 + dmesg | grep torture: 139 + 140 + The output can be manually inspected for the error flag of "!!!". 141 + One could of course create a more elaborate script that automatically 142 + checked for such errors. 
The "rmmod" command forces a "SUCCESS", 143 + "FAILURE", or "RCU_HOTPLUG" indication to be printk()ed. The first 144 + two are self-explanatory, while the last indicates that while there 145 + were no locking failures, CPU-hotplug problems were detected. 146 + 147 + Also see: Documentation/RCU/torture.txt
+66 -62
Documentation/memory-barriers.txt
··· 574 574 in the following example: 575 575 576 576 q = ACCESS_ONCE(a); 577 - if (ACCESS_ONCE(q)) { 577 + if (q) { 578 578 ACCESS_ONCE(b) = p; 579 579 } 580 580 581 - Please note that ACCESS_ONCE() is not optional! Without the ACCESS_ONCE(), 582 - the compiler is within its rights to transform this example: 583 - 584 - q = a; 585 - if (q) { 586 - b = p; /* BUG: Compiler can reorder!!! */ 587 - do_something(); 588 - } else { 589 - b = p; /* BUG: Compiler can reorder!!! */ 590 - do_something_else(); 591 - } 592 - 593 - into this, which of course defeats the ordering: 594 - 595 - b = p; 596 - q = a; 597 - if (q) 598 - do_something(); 599 - else 600 - do_something_else(); 581 + Please note that ACCESS_ONCE() is not optional! Without the 582 + ACCESS_ONCE(), the compiler might combine the load from 'a' with other loads from 583 + 'a', and the store to 'b' with other stores to 'b', with possible highly 584 + counterintuitive effects on ordering. 601 585 602 586 Worse yet, if the compiler is able to prove (say) that the value of 603 587 variable 'a' is always non-zero, it would be well within its rights ··· 589 605 as follows: 590 606 591 607 q = a; 592 - b = p; /* BUG: Compiler can reorder!!! */ 593 - do_something(); 608 + b = p; /* BUG: Compiler and CPU can both reorder!!! */ 594 609 595 - The solution is again ACCESS_ONCE() and barrier(), which preserves the 596 - ordering between the load from variable 'a' and the store to variable 'b': 610 + So don't leave out the ACCESS_ONCE(). 
611 + 612 + It is tempting to try to enforce ordering on identical stores on both 613 + branches of the "if" statement as follows: 597 614 598 615 q = ACCESS_ONCE(a); 599 616 if (q) { ··· 607 622 do_something_else(); 608 623 } 609 624 610 - The initial ACCESS_ONCE() is required to prevent the compiler from 611 - proving the value of 'a', and the pair of barrier() invocations are 612 - required to prevent the compiler from pulling the two identical stores 613 - to 'b' out from the legs of the "if" statement. 614 - 615 - It is important to note that control dependencies absolutely require a 616 - a conditional. For example, the following "optimized" version of 617 - the above example breaks ordering, which is why the barrier() invocations 618 - are absolutely required if you have identical stores in both legs of 619 - the "if" statement: 625 + Unfortunately, current compilers will transform this as follows at high 626 + optimization levels: 620 627 621 628 q = ACCESS_ONCE(a); 629 + barrier(); 622 630 ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ 623 631 if (q) { 624 632 /* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */ ··· 621 643 do_something_else(); 622 644 } 623 645 624 - It is of course legal for the prior load to be part of the conditional, 625 - for example, as follows: 646 + Now there is no conditional between the load from 'a' and the store to 647 + 'b', which means that the CPU is within its rights to reorder them: 648 + The conditional is absolutely required, and must be present in the 649 + assembly code even after all compiler optimizations have been applied. 
650 + Therefore, if you need ordering in this example, you need explicit 651 + memory barriers, for example, smp_store_release(): 626 652 627 - if (ACCESS_ONCE(a) > 0) { 628 - barrier(); 629 - ACCESS_ONCE(b) = q / 2; 653 + q = ACCESS_ONCE(a); 654 + if (q) { 655 + smp_store_release(&b, p); 630 656 do_something(); 631 657 } else { 632 - barrier(); 633 - ACCESS_ONCE(b) = q / 3; 658 + smp_store_release(&b, p); 634 659 do_something_else(); 635 660 } 636 661 637 - This will again ensure that the load from variable 'a' is ordered before the 638 - stores to variable 'b'. 662 + In contrast, without explicit memory barriers, two-legged-if control 663 + ordering is guaranteed only when the stores differ, for example: 664 + 665 + q = ACCESS_ONCE(a); 666 + if (q) { 667 + ACCESS_ONCE(b) = p; 668 + do_something(); 669 + } else { 670 + ACCESS_ONCE(b) = r; 671 + do_something_else(); 672 + } 673 + 674 + The initial ACCESS_ONCE() is still required to prevent the compiler from 675 + proving the value of 'a'. 639 676 640 677 In addition, you need to be careful what you do with the local variable 'q', 641 678 otherwise the compiler might be able to guess the value and again remove ··· 658 665 659 666 q = ACCESS_ONCE(a); 660 667 if (q % MAX) { 661 - barrier(); 662 668 ACCESS_ONCE(b) = p; 663 669 do_something(); 664 670 } else { 665 - barrier(); 666 - ACCESS_ONCE(b) = p; 671 + ACCESS_ONCE(b) = r; 667 672 do_something_else(); 668 673 } 669 674 ··· 673 682 ACCESS_ONCE(b) = p; 674 683 do_something_else(); 675 684 676 - This transformation loses the ordering between the load from variable 'a' 677 - and the store to variable 'b'. If you are relying on this ordering, you 678 - should do something like the following: 685 + Given this transformation, the CPU is not required to respect the ordering 686 + between the load from variable 'a' and the store to variable 'b'. It is 687 + tempting to add a barrier(), but this does not help. 
The conditional 688 + is gone, and the barrier won't bring it back. Therefore, if you are 689 + relying on this ordering, you should make sure that MAX is greater than 690 + one, perhaps as follows: 679 691 680 692 q = ACCESS_ONCE(a); 681 693 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ ··· 686 692 ACCESS_ONCE(b) = p; 687 693 do_something(); 688 694 } else { 689 - ACCESS_ONCE(b) = p; 695 + ACCESS_ONCE(b) = r; 690 696 do_something_else(); 691 697 } 692 698 699 + Please note once again that the stores to 'b' differ. If they were 700 + identical, as noted earlier, the compiler could pull this store outside 701 + of the 'if' statement. 702 + 693 703 Finally, control dependencies do -not- provide transitivity. This is 694 - demonstrated by two related examples: 704 + demonstrated by two related examples, with the initial values of 705 + x and y both being zero: 695 706 696 707 CPU 0 CPU 1 697 708 ===================== ===================== 698 709 r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(y); 699 - if (r1 >= 0) if (r2 >= 0) 710 + if (r1 > 0) if (r2 > 0) 700 711 ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1; 701 712 702 713 assert(!(r1 == 1 && r2 == 1)); 703 714 704 715 The above two-CPU example will never trigger the assert(). However, 705 716 if control dependencies guaranteed transitivity (which they do not), 706 - then adding the following two CPUs would guarantee a related assertion: 717 + then adding the following CPU would guarantee a related assertion: 707 718 708 - CPU 2 CPU 3 709 - ===================== ===================== 710 - ACCESS_ONCE(x) = 2; ACCESS_ONCE(y) = 2; 719 + CPU 2 720 + ===================== 721 + ACCESS_ONCE(x) = 2; 711 722 712 - assert(!(r1 == 2 && r2 == 2 && x == 1 && y == 1)); /* FAILS!!! */ 723 + assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */ 713 724 714 - But because control dependencies do -not- provide transitivity, the 715 - above assertion can fail after the combined four-CPU example completes. 
716 - If you need the four-CPU example to provide ordering, you will need 717 - smp_mb() between the loads and stores in the CPU 0 and CPU 1 code fragments. 725 + But because control dependencies do -not- provide transitivity, the above 726 + assertion can fail after the combined three-CPU example completes. If you 727 + need the three-CPU example to provide ordering, you will need smp_mb() 728 + between the loads and stores in the CPU 0 and CPU 1 code fragments, 729 + that is, just before or just after the "if" statements. 730 + 731 + These two examples are the LB and WWC litmus tests from this paper: 732 + http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this 733 + site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html. 718 734 719 735 In summary: 720 736
+1 -1
fs/file.c
··· 367 367 struct file * file = xchg(&fdt->fd[i], NULL); 368 368 if (file) { 369 369 filp_close(file, files); 370 - cond_resched(); 370 + cond_resched_rcu_qs(); 371 371 } 372 372 } 373 373 i++;
+2
include/linux/cpu.h
··· 213 213 extern void cpu_hotplug_begin(void); 214 214 extern void cpu_hotplug_done(void); 215 215 extern void get_online_cpus(void); 216 + extern bool try_get_online_cpus(void); 216 217 extern void put_online_cpus(void); 217 218 extern void cpu_hotplug_disable(void); 218 219 extern void cpu_hotplug_enable(void); ··· 231 230 static inline void cpu_hotplug_begin(void) {} 232 231 static inline void cpu_hotplug_done(void) {} 233 232 #define get_online_cpus() do { } while (0) 233 + #define try_get_online_cpus() true 234 234 #define put_online_cpus() do { } while (0) 235 235 #define cpu_hotplug_disable() do { } while (0) 236 236 #define cpu_hotplug_enable() do { } while (0)
+11 -1
include/linux/init_task.h
··· 111 111 #ifdef CONFIG_PREEMPT_RCU 112 112 #define INIT_TASK_RCU_PREEMPT(tsk) \ 113 113 .rcu_read_lock_nesting = 0, \ 114 - .rcu_read_unlock_special = 0, \ 114 + .rcu_read_unlock_special.s = 0, \ 115 115 .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry), \ 116 116 INIT_TASK_RCU_TREE_PREEMPT() 117 117 #else 118 118 #define INIT_TASK_RCU_PREEMPT(tsk) 119 + #endif 120 + #ifdef CONFIG_TASKS_RCU 121 + #define INIT_TASK_RCU_TASKS(tsk) \ 122 + .rcu_tasks_holdout = false, \ 123 + .rcu_tasks_holdout_list = \ 124 + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), \ 125 + .rcu_tasks_idle_cpu = -1, 126 + #else 127 + #define INIT_TASK_RCU_TASKS(tsk) 119 128 #endif 120 129 121 130 extern struct cred init_cred; ··· 233 224 INIT_FTRACE_GRAPH \ 234 225 INIT_TRACE_RECURSION \ 235 226 INIT_TASK_RCU_PREEMPT(tsk) \ 227 + INIT_TASK_RCU_TASKS(tsk) \ 236 228 INIT_CPUSET_SEQ(tsk) \ 237 229 INIT_RT_MUTEXES(tsk) \ 238 230 INIT_VTIME(tsk) \
+1
include/linux/lockdep.h
··· 510 510 511 511 #define lock_map_acquire(l) lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_) 512 512 #define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_) 513 + #define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_) 513 514 #define lock_map_release(l) lock_release(l, 1, _THIS_IP_) 514 515 515 516 #ifdef CONFIG_PROVE_LOCKING
+65 -41
include/linux/rcupdate.h
··· 47 47 #include <asm/barrier.h> 48 48 49 49 extern int rcu_expedited; /* for sysctl */ 50 - #ifdef CONFIG_RCU_TORTURE_TEST 51 - extern int rcutorture_runnable; /* for sysctl */ 52 - #endif /* #ifdef CONFIG_RCU_TORTURE_TEST */ 53 50 54 51 enum rcutorture_type { 55 52 RCU_FLAVOR, 56 53 RCU_BH_FLAVOR, 57 54 RCU_SCHED_FLAVOR, 55 + RCU_TASKS_FLAVOR, 58 56 SRCU_FLAVOR, 59 57 INVALID_RCU_FLAVOR 60 58 }; ··· 195 197 196 198 void synchronize_sched(void); 197 199 200 + /** 201 + * call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period 202 + * @head: structure to be used for queueing the RCU updates. 203 + * @func: actual callback function to be invoked after the grace period 204 + * 205 + * The callback function will be invoked some time after a full grace 206 + * period elapses, in other words after all currently executing RCU 207 + * read-side critical sections have completed. call_rcu_tasks() assumes 208 + * that the read-side critical sections end at a voluntary context 209 + * switch (not a preemption!), entry into idle, or transition to usermode 210 + * execution. As such, there are no read-side primitives analogous to 211 + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended 212 + * to determine that all tasks have passed through a safe state, not so 213 + * much for data-structure synchronization. 214 + * 215 + * See the description of call_rcu() for more detailed information on 216 + * memory ordering guarantees. 
217 + */ 218 + void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head)); 219 + void synchronize_rcu_tasks(void); 220 + void rcu_barrier_tasks(void); 221 + 198 222 #ifdef CONFIG_PREEMPT_RCU 199 223 200 224 void __rcu_read_lock(void); ··· 258 238 259 239 /* Internal to kernel */ 260 240 void rcu_init(void); 261 - void rcu_sched_qs(int cpu); 262 - void rcu_bh_qs(int cpu); 241 + void rcu_sched_qs(void); 242 + void rcu_bh_qs(void); 263 243 void rcu_check_callbacks(int cpu, int user); 264 244 struct notifier_block; 265 245 void rcu_idle_enter(void); ··· 289 269 struct task_struct *next) { } 290 270 #endif /* CONFIG_RCU_USER_QS */ 291 271 272 + #ifdef CONFIG_RCU_NOCB_CPU 273 + void rcu_init_nohz(void); 274 + #else /* #ifdef CONFIG_RCU_NOCB_CPU */ 275 + static inline void rcu_init_nohz(void) 276 + { 277 + } 278 + #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ 279 + 292 280 /** 293 281 * RCU_NONIDLE - Indicate idle-loop code that needs RCU readers 294 282 * @a: Code that RCU needs to pay attention to. ··· 321 293 do { a; } while (0); \ 322 294 rcu_irq_exit(); \ 323 295 } while (0) 296 + 297 + /* 298 + * Note a voluntary context switch for RCU-tasks benefit. This is a 299 + * macro rather than an inline function to avoid #include hell. 
300 + */ 301 + #ifdef CONFIG_TASKS_RCU 302 + #define TASKS_RCU(x) x 303 + extern struct srcu_struct tasks_rcu_exit_srcu; 304 + #define rcu_note_voluntary_context_switch(t) \ 305 + do { \ 306 + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ 307 + ACCESS_ONCE((t)->rcu_tasks_holdout) = false; \ 308 + } while (0) 309 + #else /* #ifdef CONFIG_TASKS_RCU */ 310 + #define TASKS_RCU(x) do { } while (0) 311 + #define rcu_note_voluntary_context_switch(t) do { } while (0) 312 + #endif /* #else #ifdef CONFIG_TASKS_RCU */ 313 + 314 + /** 315 + * cond_resched_rcu_qs - Report potential quiescent states to RCU 316 + * 317 + * This macro resembles cond_resched(), except that it is defined to 318 + * report potential quiescent states to RCU-tasks even if the cond_resched() 319 + * machinery were to be shut off, as some advocate for PREEMPT kernels. 320 + */ 321 + #define cond_resched_rcu_qs() \ 322 + do { \ 323 + rcu_note_voluntary_context_switch(current); \ 324 + cond_resched(); \ 325 + } while (0) 324 326 325 327 #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) 326 328 bool __rcu_is_watching(void); ··· 407 349 #else /* #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */ 408 350 static inline bool rcu_lockdep_current_cpu_online(void) 409 351 { 410 - return 1; 352 + return true; 411 353 } 412 354 #endif /* #else #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */ 413 355 ··· 429 371 extern struct lockdep_map rcu_callback_map; 430 372 int debug_lockdep_rcu_enabled(void); 431 373 432 - /** 433 - * rcu_read_lock_held() - might we be in RCU read-side critical section? 434 - * 435 - * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU 436 - * read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, 437 - * this assumes we are in an RCU read-side critical section unless it can 438 - * prove otherwise. 
This is useful for debug checks in functions that 439 - * require that they be called within an RCU read-side critical section. 440 - * 441 - * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot 442 - * and while lockdep is disabled. 443 - * 444 - * Note that rcu_read_lock() and the matching rcu_read_unlock() must 445 - * occur in the same context, for example, it is illegal to invoke 446 - * rcu_read_unlock() in process context if the matching rcu_read_lock() 447 - * was invoked from within an irq handler. 448 - * 449 - * Note that rcu_read_lock() is disallowed if the CPU is either idle or 450 - * offline from an RCU perspective, so check for those as well. 451 - */ 452 - static inline int rcu_read_lock_held(void) 453 - { 454 - if (!debug_lockdep_rcu_enabled()) 455 - return 1; 456 - if (!rcu_is_watching()) 457 - return 0; 458 - if (!rcu_lockdep_current_cpu_online()) 459 - return 0; 460 - return lock_is_held(&rcu_lock_map); 461 - } 462 - 463 - /* 464 - * rcu_read_lock_bh_held() is defined out of line to avoid #include-file 465 - * hell. 466 - */ 374 + int rcu_read_lock_held(void); 467 375 int rcu_read_lock_bh_held(void); 468 376 469 377 /**
+1 -1
include/linux/rcutiny.h
··· 80 80 81 81 static inline void rcu_note_context_switch(int cpu) 82 82 { 83 - rcu_sched_qs(cpu); 83 + rcu_sched_qs(); 84 84 } 85 85 86 86 /*
+23 -18
include/linux/sched.h
··· 1213 1213 struct hrtimer dl_timer; 1214 1214 }; 1215 1215 1216 + union rcu_special { 1217 + struct { 1218 + bool blocked; 1219 + bool need_qs; 1220 + } b; 1221 + short s; 1222 + }; 1216 1223 struct rcu_node; 1217 1224 1218 1225 enum perf_event_task_context { ··· 1272 1265 1273 1266 #ifdef CONFIG_PREEMPT_RCU 1274 1267 int rcu_read_lock_nesting; 1275 - char rcu_read_unlock_special; 1268 + union rcu_special rcu_read_unlock_special; 1276 1269 struct list_head rcu_node_entry; 1277 1270 #endif /* #ifdef CONFIG_PREEMPT_RCU */ 1278 1271 #ifdef CONFIG_TREE_PREEMPT_RCU 1279 1272 struct rcu_node *rcu_blocked_node; 1280 1273 #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */ 1274 + #ifdef CONFIG_TASKS_RCU 1275 + unsigned long rcu_tasks_nvcsw; 1276 + bool rcu_tasks_holdout; 1277 + struct list_head rcu_tasks_holdout_list; 1278 + int rcu_tasks_idle_cpu; 1279 + #endif /* #ifdef CONFIG_TASKS_RCU */ 1281 1280 1282 1281 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) 1283 1282 struct sched_info sched_info; ··· 2027 2014 extern void task_clear_jobctl_pending(struct task_struct *task, 2028 2015 unsigned int mask); 2029 2016 2017 + static inline void rcu_copy_process(struct task_struct *p) 2018 + { 2030 2019 #ifdef CONFIG_PREEMPT_RCU 2031 - 2032 - #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */ 2033 - #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. 
*/ 2034 - 2035 - static inline void rcu_copy_process(struct task_struct *p) 2036 - { 2037 2020 p->rcu_read_lock_nesting = 0; 2038 - p->rcu_read_unlock_special = 0; 2039 - #ifdef CONFIG_TREE_PREEMPT_RCU 2021 + p->rcu_read_unlock_special.s = 0; 2040 2022 p->rcu_blocked_node = NULL; 2041 - #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */ 2042 2023 INIT_LIST_HEAD(&p->rcu_node_entry); 2024 + #endif /* #ifdef CONFIG_PREEMPT_RCU */ 2025 + #ifdef CONFIG_TASKS_RCU 2026 + p->rcu_tasks_holdout = false; 2027 + INIT_LIST_HEAD(&p->rcu_tasks_holdout_list); 2028 + p->rcu_tasks_idle_cpu = -1; 2029 + #endif /* #ifdef CONFIG_TASKS_RCU */ 2043 2030 } 2044 - 2045 - #else 2046 - 2047 - static inline void rcu_copy_process(struct task_struct *p) 2048 - { 2049 - } 2050 - 2051 - #endif 2052 2031 2053 2032 static inline void tsk_restore_flags(struct task_struct *task, 2054 2033 unsigned long orig_flags, unsigned long flags)
+3 -2
include/linux/torture.h
··· 51 51 52 52 /* Definitions for online/offline exerciser. */ 53 53 int torture_onoff_init(long ooholdoff, long oointerval); 54 - char *torture_onoff_stats(char *page); 54 + void torture_onoff_stats(void); 55 55 bool torture_onoff_failures(void); 56 56 57 57 /* Low-rider random number generator. */ ··· 77 77 /* Initialization and cleanup. */ 78 78 bool torture_init_begin(char *ttype, bool v, int *runnable); 79 79 void torture_init_end(void); 80 - bool torture_cleanup(void); 80 + bool torture_cleanup_begin(void); 81 + void torture_cleanup_end(void); 81 82 bool torture_must_stop(void); 82 83 bool torture_must_stop_irq(void); 83 84 void torture_kthread_stopping(char *title);
+3
include/trace/events/rcu.h
··· 180 180 * argument is a string as follows: 181 181 * 182 182 * "WakeEmpty": Wake rcuo kthread, first CB to empty list. 183 + * "WakeEmptyIsDeferred": Wake rcuo kthread later, first CB to empty list. 183 184 * "WakeOvf": Wake rcuo kthread, CB list is huge. 185 + * "WakeOvfIsDeferred": Wake rcuo kthread later, CB list is huge. 184 186 * "WakeNot": Don't wake rcuo kthread. 185 187 * "WakeNotPoll": Don't wake rcuo kthread because it is polling. 188 + * "DeferredWake": Carried out the "IsDeferred" wakeup. 186 189 * "Poll": Start of new polling cycle for rcu_nocb_poll. 187 190 * "Sleep": Sleep waiting for CBs for !rcu_nocb_poll. 188 191 * "WokeEmpty": rcuo kthread woke to find empty list.
+12 -2
init/Kconfig
··· 507 507 This option enables preemptible-RCU code that is common between 508 508 TREE_PREEMPT_RCU and, in the old days, TINY_PREEMPT_RCU. 509 509 510 + config TASKS_RCU 511 + bool "Task_based RCU implementation using voluntary context switch" 512 + default n 513 + help 514 + This option enables a task-based RCU implementation that uses 515 + only voluntary context switch (not preemption!), idle, and 516 + user-mode execution as quiescent states. 517 + 518 + If unsure, say N. 519 + 510 520 config RCU_STALL_COMMON 511 521 def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE ) 512 522 help ··· 747 737 748 738 config RCU_NOCB_CPU_NONE 749 739 bool "No build_forced no-CBs CPUs" 750 - depends on RCU_NOCB_CPU && !NO_HZ_FULL_ALL 740 + depends on RCU_NOCB_CPU 751 741 help 752 742 This option does not force any of the CPUs to be no-CBs CPUs. 753 743 Only CPUs designated by the rcu_nocbs= boot parameter will be ··· 761 751 762 752 config RCU_NOCB_CPU_ZERO 763 753 bool "CPU 0 is a build_forced no-CBs CPU" 764 - depends on RCU_NOCB_CPU && !NO_HZ_FULL_ALL 754 + depends on RCU_NOCB_CPU 765 755 help 766 756 This option forces CPU 0 to be a no-CBs CPU, so that its RCU 767 757 callbacks are invoked by a per-CPU kthread whose name begins
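For exercising the new option, `CONFIG_TASKS_RCU` pairs with the `tasks` torture type added to rcutorture later in this series. An illustrative config fragment (not a recommended production setting):

```
CONFIG_TASKS_RCU=y
CONFIG_RCU_TORTURE_TEST=m
```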
+1
init/main.c
··· 583 583 early_irq_init(); 584 584 init_IRQ(); 585 585 tick_init(); 586 + rcu_init_nohz(); 586 587 init_timers(); 587 588 hrtimers_init(); 588 589 softirq_init();
+15 -1
kernel/cpu.c
··· 79 79 80 80 /* Lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin/end() */ 81 81 #define cpuhp_lock_acquire_read() lock_map_acquire_read(&cpu_hotplug.dep_map) 82 + #define cpuhp_lock_acquire_tryread() \ 83 + lock_map_acquire_tryread(&cpu_hotplug.dep_map) 82 84 #define cpuhp_lock_acquire() lock_map_acquire(&cpu_hotplug.dep_map) 83 85 #define cpuhp_lock_release() lock_map_release(&cpu_hotplug.dep_map) 84 86 ··· 93 91 mutex_lock(&cpu_hotplug.lock); 94 92 cpu_hotplug.refcount++; 95 93 mutex_unlock(&cpu_hotplug.lock); 96 - 97 94 } 98 95 EXPORT_SYMBOL_GPL(get_online_cpus); 96 + 97 + bool try_get_online_cpus(void) 98 + { 99 + if (cpu_hotplug.active_writer == current) 100 + return true; 101 + if (!mutex_trylock(&cpu_hotplug.lock)) 102 + return false; 103 + cpuhp_lock_acquire_tryread(); 104 + cpu_hotplug.refcount++; 105 + mutex_unlock(&cpu_hotplug.lock); 106 + return true; 107 + } 108 + EXPORT_SYMBOL_GPL(try_get_online_cpus); 99 109 100 110 void put_online_cpus(void) 101 111 {
+3
kernel/exit.c
··· 667 667 { 668 668 struct task_struct *tsk = current; 669 669 int group_dead; 670 + TASKS_RCU(int tasks_rcu_i); 670 671 671 672 profile_task_exit(tsk); 672 673 ··· 776 775 */ 777 776 flush_ptrace_hw_breakpoint(tsk); 778 777 778 + TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu)); 779 779 exit_notify(tsk, group_dead); 780 780 proc_exit_connector(tsk); 781 781 #ifdef CONFIG_NUMA ··· 816 814 if (tsk->nr_dirtied) 817 815 __this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied); 818 816 exit_rcu(); 817 + TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i)); 819 818 820 819 /* 821 820 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
+445 -84
kernel/locking/locktorture.c
··· 20 20 * Author: Paul E. McKenney <paulmck@us.ibm.com> 21 21 * Based on kernel/rcu/torture.c. 22 22 */ 23 - #include <linux/types.h> 24 23 #include <linux/kernel.h> 25 - #include <linux/init.h> 26 24 #include <linux/module.h> 27 25 #include <linux/kthread.h> 28 - #include <linux/err.h> 29 26 #include <linux/spinlock.h> 27 + #include <linux/rwlock.h> 28 + #include <linux/mutex.h> 29 + #include <linux/rwsem.h> 30 30 #include <linux/smp.h> 31 31 #include <linux/interrupt.h> 32 32 #include <linux/sched.h> 33 33 #include <linux/atomic.h> 34 - #include <linux/bitops.h> 35 - #include <linux/completion.h> 36 34 #include <linux/moduleparam.h> 37 - #include <linux/percpu.h> 38 - #include <linux/notifier.h> 39 - #include <linux/reboot.h> 40 - #include <linux/freezer.h> 41 - #include <linux/cpu.h> 42 35 #include <linux/delay.h> 43 - #include <linux/stat.h> 44 36 #include <linux/slab.h> 45 - #include <linux/trace_clock.h> 46 - #include <asm/byteorder.h> 47 37 #include <linux/torture.h> 48 38 49 39 MODULE_LICENSE("GPL"); ··· 41 51 42 52 torture_param(int, nwriters_stress, -1, 43 53 "Number of write-locking stress-test threads"); 54 + torture_param(int, nreaders_stress, -1, 55 + "Number of read-locking stress-test threads"); 44 56 torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); 45 57 torture_param(int, onoff_interval, 0, 46 58 "Time between CPU hotplugs (s), 0=disable"); ··· 58 66 static char *torture_type = "spin_lock"; 59 67 module_param(torture_type, charp, 0444); 60 68 MODULE_PARM_DESC(torture_type, 61 - "Type of lock to torture (spin_lock, spin_lock_irq, ...)"); 62 - 63 - static atomic_t n_lock_torture_errors; 69 + "Type of lock to torture (spin_lock, spin_lock_irq, mutex_lock, ...)"); 64 70 65 71 static struct task_struct *stats_task; 66 72 static struct task_struct **writer_tasks; 73 + static struct task_struct **reader_tasks; 67 74 68 - static int nrealwriters_stress; 69 75 static bool lock_is_write_held; 76 + static bool 
lock_is_read_held; 70 77 71 - struct lock_writer_stress_stats { 72 - long n_write_lock_fail; 73 - long n_write_lock_acquired; 78 + struct lock_stress_stats { 79 + long n_lock_fail; 80 + long n_lock_acquired; 74 81 }; 75 - static struct lock_writer_stress_stats *lwsa; 76 82 77 83 #if defined(MODULE) 78 84 #define LOCKTORTURE_RUNNABLE_INIT 1 79 85 #else 80 86 #define LOCKTORTURE_RUNNABLE_INIT 0 81 87 #endif 82 - int locktorture_runnable = LOCKTORTURE_RUNNABLE_INIT; 83 - module_param(locktorture_runnable, int, 0444); 84 - MODULE_PARM_DESC(locktorture_runnable, "Start locktorture at module init"); 88 + int torture_runnable = LOCKTORTURE_RUNNABLE_INIT; 89 + module_param(torture_runnable, int, 0444); 90 + MODULE_PARM_DESC(torture_runnable, "Start locktorture at module init"); 85 91 86 92 /* Forward reference. */ 87 93 static void lock_torture_cleanup(void); ··· 92 102 int (*writelock)(void); 93 103 void (*write_delay)(struct torture_random_state *trsp); 94 104 void (*writeunlock)(void); 105 + int (*readlock)(void); 106 + void (*read_delay)(struct torture_random_state *trsp); 107 + void (*readunlock)(void); 95 108 unsigned long flags; 96 109 const char *name; 97 110 }; 98 111 99 - static struct lock_torture_ops *cur_ops; 100 - 112 + struct lock_torture_cxt { 113 + int nrealwriters_stress; 114 + int nrealreaders_stress; 115 + bool debug_lock; 116 + atomic_t n_lock_torture_errors; 117 + struct lock_torture_ops *cur_ops; 118 + struct lock_stress_stats *lwsa; /* writer statistics */ 119 + struct lock_stress_stats *lrsa; /* reader statistics */ 120 + }; 121 + static struct lock_torture_cxt cxt = { 0, 0, false, 122 + ATOMIC_INIT(0), 123 + NULL, NULL}; 101 124 /* 102 125 * Definitions for lock torture testing. 103 126 */ ··· 126 123 127 124 /* We want a long delay occasionally to force massive contention. 
*/ 128 125 if (!(torture_random(trsp) % 129 - (nrealwriters_stress * 2000 * longdelay_us))) 126 + (cxt.nrealwriters_stress * 2000 * longdelay_us))) 130 127 mdelay(longdelay_us); 131 128 #ifdef CONFIG_PREEMPT 132 - if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) 129 + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) 133 130 preempt_schedule(); /* Allow test to be preempted. */ 134 131 #endif 135 132 } ··· 143 140 .writelock = torture_lock_busted_write_lock, 144 141 .write_delay = torture_lock_busted_write_delay, 145 142 .writeunlock = torture_lock_busted_write_unlock, 143 + .readlock = NULL, 144 + .read_delay = NULL, 145 + .readunlock = NULL, 146 146 .name = "lock_busted" 147 147 }; 148 148 ··· 166 160 * we want a long delay occasionally to force massive contention. 167 161 */ 168 162 if (!(torture_random(trsp) % 169 - (nrealwriters_stress * 2000 * longdelay_us))) 163 + (cxt.nrealwriters_stress * 2000 * longdelay_us))) 170 164 mdelay(longdelay_us); 171 165 if (!(torture_random(trsp) % 172 - (nrealwriters_stress * 2 * shortdelay_us))) 166 + (cxt.nrealwriters_stress * 2 * shortdelay_us))) 173 167 udelay(shortdelay_us); 174 168 #ifdef CONFIG_PREEMPT 175 - if (!(torture_random(trsp) % (nrealwriters_stress * 20000))) 169 + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) 176 170 preempt_schedule(); /* Allow test to be preempted. 
*/ 177 171 #endif 178 172 } ··· 186 180 .writelock = torture_spin_lock_write_lock, 187 181 .write_delay = torture_spin_lock_write_delay, 188 182 .writeunlock = torture_spin_lock_write_unlock, 183 + .readlock = NULL, 184 + .read_delay = NULL, 185 + .readunlock = NULL, 189 186 .name = "spin_lock" 190 187 }; 191 188 192 189 static int torture_spin_lock_write_lock_irq(void) 193 - __acquires(torture_spinlock_irq) 190 + __acquires(torture_spinlock) 194 191 { 195 192 unsigned long flags; 196 193 197 194 spin_lock_irqsave(&torture_spinlock, flags); 198 - cur_ops->flags = flags; 195 + cxt.cur_ops->flags = flags; 199 196 return 0; 200 197 } 201 198 202 199 static void torture_lock_spin_write_unlock_irq(void) 203 200 __releases(torture_spinlock) 204 201 { 205 - spin_unlock_irqrestore(&torture_spinlock, cur_ops->flags); 202 + spin_unlock_irqrestore(&torture_spinlock, cxt.cur_ops->flags); 206 203 } 207 204 208 205 static struct lock_torture_ops spin_lock_irq_ops = { 209 206 .writelock = torture_spin_lock_write_lock_irq, 210 207 .write_delay = torture_spin_lock_write_delay, 211 208 .writeunlock = torture_lock_spin_write_unlock_irq, 209 + .readlock = NULL, 210 + .read_delay = NULL, 211 + .readunlock = NULL, 212 212 .name = "spin_lock_irq" 213 + }; 214 + 215 + static DEFINE_RWLOCK(torture_rwlock); 216 + 217 + static int torture_rwlock_write_lock(void) __acquires(torture_rwlock) 218 + { 219 + write_lock(&torture_rwlock); 220 + return 0; 221 + } 222 + 223 + static void torture_rwlock_write_delay(struct torture_random_state *trsp) 224 + { 225 + const unsigned long shortdelay_us = 2; 226 + const unsigned long longdelay_ms = 100; 227 + 228 + /* We want a short delay mostly to emulate likely code, and 229 + * we want a long delay occasionally to force massive contention. 
230 + */ 231 + if (!(torture_random(trsp) % 232 + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) 233 + mdelay(longdelay_ms); 234 + else 235 + udelay(shortdelay_us); 236 + } 237 + 238 + static void torture_rwlock_write_unlock(void) __releases(torture_rwlock) 239 + { 240 + write_unlock(&torture_rwlock); 241 + } 242 + 243 + static int torture_rwlock_read_lock(void) __acquires(torture_rwlock) 244 + { 245 + read_lock(&torture_rwlock); 246 + return 0; 247 + } 248 + 249 + static void torture_rwlock_read_delay(struct torture_random_state *trsp) 250 + { 251 + const unsigned long shortdelay_us = 10; 252 + const unsigned long longdelay_ms = 100; 253 + 254 + /* We want a short delay mostly to emulate likely code, and 255 + * we want a long delay occasionally to force massive contention. 256 + */ 257 + if (!(torture_random(trsp) % 258 + (cxt.nrealreaders_stress * 2000 * longdelay_ms))) 259 + mdelay(longdelay_ms); 260 + else 261 + udelay(shortdelay_us); 262 + } 263 + 264 + static void torture_rwlock_read_unlock(void) __releases(torture_rwlock) 265 + { 266 + read_unlock(&torture_rwlock); 267 + } 268 + 269 + static struct lock_torture_ops rw_lock_ops = { 270 + .writelock = torture_rwlock_write_lock, 271 + .write_delay = torture_rwlock_write_delay, 272 + .writeunlock = torture_rwlock_write_unlock, 273 + .readlock = torture_rwlock_read_lock, 274 + .read_delay = torture_rwlock_read_delay, 275 + .readunlock = torture_rwlock_read_unlock, 276 + .name = "rw_lock" 277 + }; 278 + 279 + static int torture_rwlock_write_lock_irq(void) __acquires(torture_rwlock) 280 + { 281 + unsigned long flags; 282 + 283 + write_lock_irqsave(&torture_rwlock, flags); 284 + cxt.cur_ops->flags = flags; 285 + return 0; 286 + } 287 + 288 + static void torture_rwlock_write_unlock_irq(void) 289 + __releases(torture_rwlock) 290 + { 291 + write_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags); 292 + } 293 + 294 + static int torture_rwlock_read_lock_irq(void) __acquires(torture_rwlock) 295 + { 296 + unsigned 
long flags; 297 + 298 + read_lock_irqsave(&torture_rwlock, flags); 299 + cxt.cur_ops->flags = flags; 300 + return 0; 301 + } 302 + 303 + static void torture_rwlock_read_unlock_irq(void) 304 + __releases(torture_rwlock) 305 + { 306 + write_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags); 307 + } 308 + 309 + static struct lock_torture_ops rw_lock_irq_ops = { 310 + .writelock = torture_rwlock_write_lock_irq, 311 + .write_delay = torture_rwlock_write_delay, 312 + .writeunlock = torture_rwlock_write_unlock_irq, 313 + .readlock = torture_rwlock_read_lock_irq, 314 + .read_delay = torture_rwlock_read_delay, 315 + .readunlock = torture_rwlock_read_unlock_irq, 316 + .name = "rw_lock_irq" 317 + }; 318 + 319 + static DEFINE_MUTEX(torture_mutex); 320 + 321 + static int torture_mutex_lock(void) __acquires(torture_mutex) 322 + { 323 + mutex_lock(&torture_mutex); 324 + return 0; 325 + } 326 + 327 + static void torture_mutex_delay(struct torture_random_state *trsp) 328 + { 329 + const unsigned long longdelay_ms = 100; 330 + 331 + /* We want a long delay occasionally to force massive contention. */ 332 + if (!(torture_random(trsp) % 333 + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) 334 + mdelay(longdelay_ms * 5); 335 + else 336 + mdelay(longdelay_ms / 5); 337 + #ifdef CONFIG_PREEMPT 338 + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) 339 + preempt_schedule(); /* Allow test to be preempted. 
*/ 340 + #endif 341 + } 342 + 343 + static void torture_mutex_unlock(void) __releases(torture_mutex) 344 + { 345 + mutex_unlock(&torture_mutex); 346 + } 347 + 348 + static struct lock_torture_ops mutex_lock_ops = { 349 + .writelock = torture_mutex_lock, 350 + .write_delay = torture_mutex_delay, 351 + .writeunlock = torture_mutex_unlock, 352 + .readlock = NULL, 353 + .read_delay = NULL, 354 + .readunlock = NULL, 355 + .name = "mutex_lock" 356 + }; 357 + 358 + static DECLARE_RWSEM(torture_rwsem); 359 + static int torture_rwsem_down_write(void) __acquires(torture_rwsem) 360 + { 361 + down_write(&torture_rwsem); 362 + return 0; 363 + } 364 + 365 + static void torture_rwsem_write_delay(struct torture_random_state *trsp) 366 + { 367 + const unsigned long longdelay_ms = 100; 368 + 369 + /* We want a long delay occasionally to force massive contention. */ 370 + if (!(torture_random(trsp) % 371 + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) 372 + mdelay(longdelay_ms * 10); 373 + else 374 + mdelay(longdelay_ms / 10); 375 + #ifdef CONFIG_PREEMPT 376 + if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) 377 + preempt_schedule(); /* Allow test to be preempted. */ 378 + #endif 379 + } 380 + 381 + static void torture_rwsem_up_write(void) __releases(torture_rwsem) 382 + { 383 + up_write(&torture_rwsem); 384 + } 385 + 386 + static int torture_rwsem_down_read(void) __acquires(torture_rwsem) 387 + { 388 + down_read(&torture_rwsem); 389 + return 0; 390 + } 391 + 392 + static void torture_rwsem_read_delay(struct torture_random_state *trsp) 393 + { 394 + const unsigned long longdelay_ms = 100; 395 + 396 + /* We want a long delay occasionally to force massive contention. 
*/ 397 + if (!(torture_random(trsp) % 398 + (cxt.nrealwriters_stress * 2000 * longdelay_ms))) 399 + mdelay(longdelay_ms * 2); 400 + else 401 + mdelay(longdelay_ms / 2); 402 + #ifdef CONFIG_PREEMPT 403 + if (!(torture_random(trsp) % (cxt.nrealreaders_stress * 20000))) 404 + preempt_schedule(); /* Allow test to be preempted. */ 405 + #endif 406 + } 407 + 408 + static void torture_rwsem_up_read(void) __releases(torture_rwsem) 409 + { 410 + up_read(&torture_rwsem); 411 + } 412 + 413 + static struct lock_torture_ops rwsem_lock_ops = { 414 + .writelock = torture_rwsem_down_write, 415 + .write_delay = torture_rwsem_write_delay, 416 + .writeunlock = torture_rwsem_up_write, 417 + .readlock = torture_rwsem_down_read, 418 + .read_delay = torture_rwsem_read_delay, 419 + .readunlock = torture_rwsem_up_read, 420 + .name = "rwsem_lock" 213 421 }; 214 422 215 423 /* ··· 432 212 */ 433 213 static int lock_torture_writer(void *arg) 434 214 { 435 - struct lock_writer_stress_stats *lwsp = arg; 215 + struct lock_stress_stats *lwsp = arg; 436 216 static DEFINE_TORTURE_RANDOM(rand); 437 217 438 218 VERBOSE_TOROUT_STRING("lock_torture_writer task started"); ··· 441 221 do { 442 222 if ((torture_random(&rand) & 0xfffff) == 0) 443 223 schedule_timeout_uninterruptible(1); 444 - cur_ops->writelock(); 224 + 225 + cxt.cur_ops->writelock(); 445 226 if (WARN_ON_ONCE(lock_is_write_held)) 446 - lwsp->n_write_lock_fail++; 227 + lwsp->n_lock_fail++; 447 228 lock_is_write_held = 1; 448 - lwsp->n_write_lock_acquired++; 449 - cur_ops->write_delay(&rand); 229 + if (WARN_ON_ONCE(lock_is_read_held)) 230 + lwsp->n_lock_fail++; /* rare, but... 
*/ 231 + 232 + lwsp->n_lock_acquired++; 233 + cxt.cur_ops->write_delay(&rand); 450 234 lock_is_write_held = 0; 451 - cur_ops->writeunlock(); 235 + cxt.cur_ops->writeunlock(); 236 + 452 237 stutter_wait("lock_torture_writer"); 453 238 } while (!torture_must_stop()); 454 239 torture_kthread_stopping("lock_torture_writer"); ··· 461 236 } 462 237 463 238 /* 239 + * Lock torture reader kthread. Repeatedly acquires and releases 240 + * the reader lock. 241 + */ 242 + static int lock_torture_reader(void *arg) 243 + { 244 + struct lock_stress_stats *lrsp = arg; 245 + static DEFINE_TORTURE_RANDOM(rand); 246 + 247 + VERBOSE_TOROUT_STRING("lock_torture_reader task started"); 248 + set_user_nice(current, MAX_NICE); 249 + 250 + do { 251 + if ((torture_random(&rand) & 0xfffff) == 0) 252 + schedule_timeout_uninterruptible(1); 253 + 254 + cxt.cur_ops->readlock(); 255 + lock_is_read_held = 1; 256 + if (WARN_ON_ONCE(lock_is_write_held)) 257 + lrsp->n_lock_fail++; /* rare, but... */ 258 + 259 + lrsp->n_lock_acquired++; 260 + cxt.cur_ops->read_delay(&rand); 261 + lock_is_read_held = 0; 262 + cxt.cur_ops->readunlock(); 263 + 264 + stutter_wait("lock_torture_reader"); 265 + } while (!torture_must_stop()); 266 + torture_kthread_stopping("lock_torture_reader"); 267 + return 0; 268 + } 269 + 270 + /* 464 271 * Create an lock-torture-statistics message in the specified buffer. 465 272 */ 466 - static void lock_torture_printk(char *page) 273 + static void __torture_print_stats(char *page, 274 + struct lock_stress_stats *statp, bool write) 467 275 { 468 276 bool fail = 0; 469 - int i; 277 + int i, n_stress; 470 278 long max = 0; 471 - long min = lwsa[0].n_write_lock_acquired; 279 + long min = statp[0].n_lock_acquired; 472 280 long long sum = 0; 473 281 474 - for (i = 0; i < nrealwriters_stress; i++) { 475 - if (lwsa[i].n_write_lock_fail) 282 + n_stress = write ? 
cxt.nrealwriters_stress : cxt.nrealreaders_stress; 283 + for (i = 0; i < n_stress; i++) { 284 + if (statp[i].n_lock_fail) 476 285 fail = true; 477 - sum += lwsa[i].n_write_lock_acquired; 478 - if (max < lwsa[i].n_write_lock_fail) 479 - max = lwsa[i].n_write_lock_fail; 480 - if (min > lwsa[i].n_write_lock_fail) 481 - min = lwsa[i].n_write_lock_fail; 286 + sum += statp[i].n_lock_acquired; 287 + if (max < statp[i].n_lock_fail) 288 + max = statp[i].n_lock_fail; 289 + if (min > statp[i].n_lock_fail) 290 + min = statp[i].n_lock_fail; 482 291 } 483 - page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG); 484 292 page += sprintf(page, 485 - "Writes: Total: %lld Max/Min: %ld/%ld %s Fail: %d %s\n", 293 + "%s: Total: %lld Max/Min: %ld/%ld %s Fail: %d %s\n", 294 + write ? "Writes" : "Reads ", 486 295 sum, max, min, max / 2 > min ? "???" : "", 487 296 fail, fail ? "!!!" : ""); 488 297 if (fail) 489 - atomic_inc(&n_lock_torture_errors); 298 + atomic_inc(&cxt.n_lock_torture_errors); 490 299 } 491 300 492 301 /* ··· 533 274 */ 534 275 static void lock_torture_stats_print(void) 535 276 { 536 - int size = nrealwriters_stress * 200 + 8192; 277 + int size = cxt.nrealwriters_stress * 200 + 8192; 537 278 char *buf; 279 + 280 + if (cxt.cur_ops->readlock) 281 + size += cxt.nrealreaders_stress * 200 + 8192; 538 282 539 283 buf = kmalloc(size, GFP_KERNEL); 540 284 if (!buf) { ··· 545 283 size); 546 284 return; 547 285 } 548 - lock_torture_printk(buf); 286 + 287 + __torture_print_stats(buf, cxt.lwsa, true); 549 288 pr_alert("%s", buf); 550 289 kfree(buf); 290 + 291 + if (cxt.cur_ops->readlock) { 292 + buf = kmalloc(size, GFP_KERNEL); 293 + if (!buf) { 294 + pr_err("lock_torture_stats_print: Out of memory, need: %d", 295 + size); 296 + return; 297 + } 298 + 299 + __torture_print_stats(buf, cxt.lrsa, false); 300 + pr_alert("%s", buf); 301 + kfree(buf); 302 + } 551 303 } 552 304 553 305 /* ··· 588 312 const char *tag) 589 313 { 590 314 pr_alert("%s" TORTURE_FLAG 591 - "--- %s: 
nwriters_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", 592 - torture_type, tag, nrealwriters_stress, stat_interval, verbose, 593 - shuffle_interval, stutter, shutdown_secs, 315 + "--- %s%s: nwriters_stress=%d nreaders_stress=%d stat_interval=%d verbose=%d shuffle_interval=%d stutter=%d shutdown_secs=%d onoff_interval=%d onoff_holdoff=%d\n", 316 + torture_type, tag, cxt.debug_lock ? " [debug]": "", 317 + cxt.nrealwriters_stress, cxt.nrealreaders_stress, stat_interval, 318 + verbose, shuffle_interval, stutter, shutdown_secs, 594 319 onoff_interval, onoff_holdoff); 595 320 } 596 321 ··· 599 322 { 600 323 int i; 601 324 602 - if (torture_cleanup()) 325 + if (torture_cleanup_begin()) 603 326 return; 604 327 605 328 if (writer_tasks) { 606 - for (i = 0; i < nrealwriters_stress; i++) 329 + for (i = 0; i < cxt.nrealwriters_stress; i++) 607 330 torture_stop_kthread(lock_torture_writer, 608 331 writer_tasks[i]); 609 332 kfree(writer_tasks); 610 333 writer_tasks = NULL; 611 334 } 612 335 336 + if (reader_tasks) { 337 + for (i = 0; i < cxt.nrealreaders_stress; i++) 338 + torture_stop_kthread(lock_torture_reader, 339 + reader_tasks[i]); 340 + kfree(reader_tasks); 341 + reader_tasks = NULL; 342 + } 343 + 613 344 torture_stop_kthread(lock_torture_stats, stats_task); 614 345 lock_torture_stats_print(); /* -After- the stats thread is stopped! 
*/ 615 346 616 - if (atomic_read(&n_lock_torture_errors)) 617 - lock_torture_print_module_parms(cur_ops, 347 + if (atomic_read(&cxt.n_lock_torture_errors)) 348 + lock_torture_print_module_parms(cxt.cur_ops, 618 349 "End of test: FAILURE"); 619 350 else if (torture_onoff_failures()) 620 - lock_torture_print_module_parms(cur_ops, 351 + lock_torture_print_module_parms(cxt.cur_ops, 621 352 "End of test: LOCK_HOTPLUG"); 622 353 else 623 - lock_torture_print_module_parms(cur_ops, 354 + lock_torture_print_module_parms(cxt.cur_ops, 624 355 "End of test: SUCCESS"); 356 + torture_cleanup_end(); 625 357 } 626 358 627 359 static int __init lock_torture_init(void) 628 360 { 629 - int i; 361 + int i, j; 630 362 int firsterr = 0; 631 363 static struct lock_torture_ops *torture_ops[] = { 632 - &lock_busted_ops, &spin_lock_ops, &spin_lock_irq_ops, 364 + &lock_busted_ops, 365 + &spin_lock_ops, &spin_lock_irq_ops, 366 + &rw_lock_ops, &rw_lock_irq_ops, 367 + &mutex_lock_ops, 368 + &rwsem_lock_ops, 633 369 }; 634 370 635 - if (!torture_init_begin(torture_type, verbose, &locktorture_runnable)) 371 + if (!torture_init_begin(torture_type, verbose, &torture_runnable)) 636 372 return -EBUSY; 637 373 638 374 /* Process args and tell the world that the torturer is on the job. */ 639 375 for (i = 0; i < ARRAY_SIZE(torture_ops); i++) { 640 - cur_ops = torture_ops[i]; 641 - if (strcmp(torture_type, cur_ops->name) == 0) 376 + cxt.cur_ops = torture_ops[i]; 377 + if (strcmp(torture_type, cxt.cur_ops->name) == 0) 642 378 break; 643 379 } 644 380 if (i == ARRAY_SIZE(torture_ops)) { ··· 664 374 torture_init_end(); 665 375 return -EINVAL; 666 376 } 667 - if (cur_ops->init) 668 - cur_ops->init(); /* no "goto unwind" prior to this point!!! */ 377 + if (cxt.cur_ops->init) 378 + cxt.cur_ops->init(); /* no "goto unwind" prior to this point!!! 
*/ 669 379 670 380 if (nwriters_stress >= 0) 671 - nrealwriters_stress = nwriters_stress; 381 + cxt.nrealwriters_stress = nwriters_stress; 672 382 else 673 - nrealwriters_stress = 2 * num_online_cpus(); 674 - lock_torture_print_module_parms(cur_ops, "Start of test"); 383 + cxt.nrealwriters_stress = 2 * num_online_cpus(); 384 + 385 + #ifdef CONFIG_DEBUG_MUTEXES 386 + if (strncmp(torture_type, "mutex", 5) == 0) 387 + cxt.debug_lock = true; 388 + #endif 389 + #ifdef CONFIG_DEBUG_SPINLOCK 390 + if ((strncmp(torture_type, "spin", 4) == 0) || 391 + (strncmp(torture_type, "rw_lock", 7) == 0)) 392 + cxt.debug_lock = true; 393 + #endif 675 394 676 395 /* Initialize the statistics so that each run gets its own numbers. */ 677 396 678 397 lock_is_write_held = 0; 679 - lwsa = kmalloc(sizeof(*lwsa) * nrealwriters_stress, GFP_KERNEL); 680 - if (lwsa == NULL) { 681 - VERBOSE_TOROUT_STRING("lwsa: Out of memory"); 398 + cxt.lwsa = kmalloc(sizeof(*cxt.lwsa) * cxt.nrealwriters_stress, GFP_KERNEL); 399 + if (cxt.lwsa == NULL) { 400 + VERBOSE_TOROUT_STRING("cxt.lwsa: Out of memory"); 682 401 firsterr = -ENOMEM; 683 402 goto unwind; 684 403 } 685 - for (i = 0; i < nrealwriters_stress; i++) { 686 - lwsa[i].n_write_lock_fail = 0; 687 - lwsa[i].n_write_lock_acquired = 0; 404 + for (i = 0; i < cxt.nrealwriters_stress; i++) { 405 + cxt.lwsa[i].n_lock_fail = 0; 406 + cxt.lwsa[i].n_lock_acquired = 0; 688 407 } 689 408 690 - /* Start up the kthreads. */ 409 + if (cxt.cur_ops->readlock) { 410 + if (nreaders_stress >= 0) 411 + cxt.nrealreaders_stress = nreaders_stress; 412 + else { 413 + /* 414 + * By default distribute evenly the number of 415 + * readers and writers. We still run the same number 416 + * of threads as the writer-only locks default. 
417 + */ 418 + if (nwriters_stress < 0) /* user doesn't care */ 419 + cxt.nrealwriters_stress = num_online_cpus(); 420 + cxt.nrealreaders_stress = cxt.nrealwriters_stress; 421 + } 691 422 423 + lock_is_read_held = 0; 424 + cxt.lrsa = kmalloc(sizeof(*cxt.lrsa) * cxt.nrealreaders_stress, GFP_KERNEL); 425 + if (cxt.lrsa == NULL) { 426 + VERBOSE_TOROUT_STRING("cxt.lrsa: Out of memory"); 427 + firsterr = -ENOMEM; 428 + kfree(cxt.lwsa); 429 + goto unwind; 430 + } 431 + 432 + for (i = 0; i < cxt.nrealreaders_stress; i++) { 433 + cxt.lrsa[i].n_lock_fail = 0; 434 + cxt.lrsa[i].n_lock_acquired = 0; 435 + } 436 + } 437 + lock_torture_print_module_parms(cxt.cur_ops, "Start of test"); 438 + 439 + /* Prepare torture context. */ 692 440 if (onoff_interval > 0) { 693 441 firsterr = torture_onoff_init(onoff_holdoff * HZ, 694 442 onoff_interval * HZ); ··· 750 422 goto unwind; 751 423 } 752 424 753 - writer_tasks = kzalloc(nrealwriters_stress * sizeof(writer_tasks[0]), 425 + writer_tasks = kzalloc(cxt.nrealwriters_stress * sizeof(writer_tasks[0]), 754 426 GFP_KERNEL); 755 427 if (writer_tasks == NULL) { 756 428 VERBOSE_TOROUT_ERRSTRING("writer_tasks: Out of memory"); 757 429 firsterr = -ENOMEM; 758 430 goto unwind; 759 431 } 760 - for (i = 0; i < nrealwriters_stress; i++) { 761 - firsterr = torture_create_kthread(lock_torture_writer, &lwsa[i], 432 + 433 + if (cxt.cur_ops->readlock) { 434 + reader_tasks = kzalloc(cxt.nrealreaders_stress * sizeof(reader_tasks[0]), 435 + GFP_KERNEL); 436 + if (reader_tasks == NULL) { 437 + VERBOSE_TOROUT_ERRSTRING("reader_tasks: Out of memory"); 438 + firsterr = -ENOMEM; 439 + goto unwind; 440 + } 441 + } 442 + 443 + /* 444 + * Create the kthreads and start torturing (oh, those poor little locks). 445 + * 446 + * TODO: Note that we interleave writers with readers, giving writers a 447 + * slight advantage, by creating its kthread first. This can be modified 448 + * for very specific needs, or even let the user choose the policy, if 449 + * ever wanted. 
450 + */ 451 + for (i = 0, j = 0; i < cxt.nrealwriters_stress || 452 + j < cxt.nrealreaders_stress; i++, j++) { 453 + if (i >= cxt.nrealwriters_stress) 454 + goto create_reader; 455 + 456 + /* Create writer. */ 457 + firsterr = torture_create_kthread(lock_torture_writer, &cxt.lwsa[i], 762 458 writer_tasks[i]); 459 + if (firsterr) 460 + goto unwind; 461 + 462 + create_reader: 463 + if (cxt.cur_ops->readlock == NULL || (j >= cxt.nrealreaders_stress)) 464 + continue; 465 + /* Create reader. */ 466 + firsterr = torture_create_kthread(lock_torture_reader, &cxt.lrsa[j], 467 + reader_tasks[j]); 763 468 if (firsterr) 764 469 goto unwind; 765 470 }
+198 -80
kernel/rcu/rcutorture.c
··· 49 49 #include <linux/trace_clock.h> 50 50 #include <asm/byteorder.h> 51 51 #include <linux/torture.h> 52 + #include <linux/vmalloc.h> 52 53 53 54 MODULE_LICENSE("GPL"); 54 55 MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and Josh Triplett <josh@joshtriplett.org>"); 55 56 56 57 58 + torture_param(int, cbflood_inter_holdoff, HZ, 59 + "Holdoff between floods (jiffies)"); 60 + torture_param(int, cbflood_intra_holdoff, 1, 61 + "Holdoff between bursts (jiffies)"); 62 + torture_param(int, cbflood_n_burst, 3, "# bursts in flood, zero to disable"); 63 + torture_param(int, cbflood_n_per_burst, 20000, 64 + "# callbacks per burst in flood"); 57 65 torture_param(int, fqs_duration, 0, 58 66 "Duration of fqs bursts (us), 0 to disable"); 59 67 torture_param(int, fqs_holdoff, 0, "Holdoff time within fqs bursts (us)"); ··· 104 96 MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, rcu_bh, ...)"); 105 97 106 98 static int nrealreaders; 99 + static int ncbflooders; 107 100 static struct task_struct *writer_task; 108 101 static struct task_struct **fakewriter_tasks; 109 102 static struct task_struct **reader_tasks; 110 103 static struct task_struct *stats_task; 104 + static struct task_struct **cbflood_task; 111 105 static struct task_struct *fqs_task; 112 106 static struct task_struct *boost_tasks[NR_CPUS]; 113 107 static struct task_struct *stall_task; ··· 148 138 static long n_rcu_torture_timers; 149 139 static long n_barrier_attempts; 150 140 static long n_barrier_successes; 141 + static atomic_long_t n_cbfloods; 151 142 static struct list_head rcu_torture_removed; 152 143 153 144 static int rcu_torture_writer_state; ··· 168 157 #else 169 158 #define RCUTORTURE_RUNNABLE_INIT 0 170 159 #endif 171 - int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT; 172 - module_param(rcutorture_runnable, int, 0444); 173 - MODULE_PARM_DESC(rcutorture_runnable, "Start rcutorture at boot"); 160 + static int torture_runnable = RCUTORTURE_RUNNABLE_INIT; 161 + 
module_param(torture_runnable, int, 0444); 162 + MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot"); 174 163 175 164 #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) 176 165 #define rcu_can_boost() 1 ··· 193 182 #endif /* #else #ifdef CONFIG_RCU_TRACE */ 194 183 195 184 static unsigned long boost_starttime; /* jiffies of next boost test start. */ 196 - DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */ 185 + static DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */ 197 186 /* and boost task create/destroy. */ 198 187 static atomic_t barrier_cbs_count; /* Barrier callbacks registered. */ 199 188 static bool barrier_phase; /* Test phase. */ ··· 253 242 void (*call)(struct rcu_head *head, void (*func)(struct rcu_head *rcu)); 254 243 void (*cb_barrier)(void); 255 244 void (*fqs)(void); 256 - void (*stats)(char *page); 245 + void (*stats)(void); 257 246 int irq_capable; 258 247 int can_boost; 259 248 const char *name; ··· 536 525 srcu_barrier(&srcu_ctl); 537 526 } 538 527 539 - static void srcu_torture_stats(char *page) 528 + static void srcu_torture_stats(void) 540 529 { 541 530 int cpu; 542 531 int idx = srcu_ctl.completed & 0x1; 543 532 544 - page += sprintf(page, "%s%s per-CPU(idx=%d):", 545 - torture_type, TORTURE_FLAG, idx); 533 + pr_alert("%s%s per-CPU(idx=%d):", 534 + torture_type, TORTURE_FLAG, idx); 546 535 for_each_possible_cpu(cpu) { 547 536 long c0, c1; 548 537 549 538 c0 = (long)per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[!idx]; 550 539 c1 = (long)per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[idx]; 551 - page += sprintf(page, " %d(%ld,%ld)", cpu, c0, c1); 540 + pr_cont(" %d(%ld,%ld)", cpu, c0, c1); 552 541 } 553 - sprintf(page, "\n"); 542 + pr_cont("\n"); 554 543 } 555 544 556 545 static void srcu_torture_synchronize_expedited(void) ··· 611 600 .irq_capable = 1, 612 601 .name = "sched" 613 602 }; 603 + 604 + #ifdef CONFIG_TASKS_RCU 605 + 606 + /* 607 + * Definitions for RCU-tasks torture testing. 
608 + */ 609 + 610 + static int tasks_torture_read_lock(void) 611 + { 612 + return 0; 613 + } 614 + 615 + static void tasks_torture_read_unlock(int idx) 616 + { 617 + } 618 + 619 + static void rcu_tasks_torture_deferred_free(struct rcu_torture *p) 620 + { 621 + call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb); 622 + } 623 + 624 + static struct rcu_torture_ops tasks_ops = { 625 + .ttype = RCU_TASKS_FLAVOR, 626 + .init = rcu_sync_torture_init, 627 + .readlock = tasks_torture_read_lock, 628 + .read_delay = rcu_read_delay, /* just reuse rcu's version. */ 629 + .readunlock = tasks_torture_read_unlock, 630 + .completed = rcu_no_completed, 631 + .deferred_free = rcu_tasks_torture_deferred_free, 632 + .sync = synchronize_rcu_tasks, 633 + .exp_sync = synchronize_rcu_tasks, 634 + .call = call_rcu_tasks, 635 + .cb_barrier = rcu_barrier_tasks, 636 + .fqs = NULL, 637 + .stats = NULL, 638 + .irq_capable = 1, 639 + .name = "tasks" 640 + }; 641 + 642 + #define RCUTORTURE_TASKS_OPS &tasks_ops, 643 + 644 + #else /* #ifdef CONFIG_TASKS_RCU */ 645 + 646 + #define RCUTORTURE_TASKS_OPS 647 + 648 + #endif /* #else #ifdef CONFIG_TASKS_RCU */ 614 649 615 650 /* 616 651 * RCU torture priority-boost testing. Runs one real-time thread per ··· 724 667 } 725 668 call_rcu_time = jiffies; 726 669 } 727 - cond_resched(); 670 + cond_resched_rcu_qs(); 728 671 stutter_wait("rcu_torture_boost"); 729 672 if (torture_must_stop()) 730 673 goto checkwait; ··· 761 704 smp_mb(); /* order accesses to ->inflight before stack-frame death. */ 762 705 destroy_rcu_head_on_stack(&rbi.rcu); 763 706 torture_kthread_stopping("rcu_torture_boost"); 707 + return 0; 708 + } 709 + 710 + static void rcu_torture_cbflood_cb(struct rcu_head *rhp) 711 + { 712 + } 713 + 714 + /* 715 + * RCU torture callback-flood kthread. Repeatedly induces bursts of calls 716 + * to call_rcu() or analogous, increasing the probability of occurrence 717 + * of callback-overflow corner cases. 
718 + */ 719 + static int 720 + rcu_torture_cbflood(void *arg) 721 + { 722 + int err = 1; 723 + int i; 724 + int j; 725 + struct rcu_head *rhp; 726 + 727 + if (cbflood_n_per_burst > 0 && 728 + cbflood_inter_holdoff > 0 && 729 + cbflood_intra_holdoff > 0 && 730 + cur_ops->call && 731 + cur_ops->cb_barrier) { 732 + rhp = vmalloc(sizeof(*rhp) * 733 + cbflood_n_burst * cbflood_n_per_burst); 734 + err = !rhp; 735 + } 736 + if (err) { 737 + VERBOSE_TOROUT_STRING("rcu_torture_cbflood disabled: Bad args or OOM"); 738 + while (!torture_must_stop()) 739 + schedule_timeout_interruptible(HZ); 740 + return 0; 741 + } 742 + VERBOSE_TOROUT_STRING("rcu_torture_cbflood task started"); 743 + do { 744 + schedule_timeout_interruptible(cbflood_inter_holdoff); 745 + atomic_long_inc(&n_cbfloods); 746 + WARN_ON(signal_pending(current)); 747 + for (i = 0; i < cbflood_n_burst; i++) { 748 + for (j = 0; j < cbflood_n_per_burst; j++) { 749 + cur_ops->call(&rhp[i * cbflood_n_per_burst + j], 750 + rcu_torture_cbflood_cb); 751 + } 752 + schedule_timeout_interruptible(cbflood_intra_holdoff); 753 + WARN_ON(signal_pending(current)); 754 + } 755 + cur_ops->cb_barrier(); 756 + stutter_wait("rcu_torture_cbflood"); 757 + } while (!torture_must_stop()); 758 + torture_kthread_stopping("rcu_torture_cbflood"); 764 759 return 0; 765 760 } 766 761 ··· 1128 1019 __this_cpu_inc(rcu_torture_batch[completed]); 1129 1020 preempt_enable(); 1130 1021 cur_ops->readunlock(idx); 1131 - cond_resched(); 1022 + cond_resched_rcu_qs(); 1132 1023 stutter_wait("rcu_torture_reader"); 1133 1024 } while (!torture_must_stop()); 1134 1025 if (irqreader && cur_ops->irq_capable) { ··· 1140 1031 } 1141 1032 1142 1033 /* 1143 - * Create an RCU-torture statistics message in the specified buffer. 1034 + * Print torture statistics. Caller must ensure that there is only 1035 + * one call to this function at a given time!!! 
This is normally 1036 + * accomplished by relying on the module system to only have one copy 1037 + * of the module loaded, and then by giving the rcu_torture_stats 1038 + * kthread full control (or the init/cleanup functions when rcu_torture_stats 1039 + * thread is not running). 1144 1040 */ 1145 1041 static void 1146 - rcu_torture_printk(char *page) 1042 + rcu_torture_stats_print(void) 1147 1043 { 1148 1044 int cpu; 1149 1045 int i; ··· 1166 1052 if (pipesummary[i] != 0) 1167 1053 break; 1168 1054 } 1169 - page += sprintf(page, "%s%s ", torture_type, TORTURE_FLAG); 1170 - page += sprintf(page, 1171 - "rtc: %p ver: %lu tfle: %d rta: %d rtaf: %d rtf: %d ", 1172 - rcu_torture_current, 1173 - rcu_torture_current_version, 1174 - list_empty(&rcu_torture_freelist), 1175 - atomic_read(&n_rcu_torture_alloc), 1176 - atomic_read(&n_rcu_torture_alloc_fail), 1177 - atomic_read(&n_rcu_torture_free)); 1178 - page += sprintf(page, "rtmbe: %d rtbke: %ld rtbre: %ld ", 1179 - atomic_read(&n_rcu_torture_mberror), 1180 - n_rcu_torture_boost_ktrerror, 1181 - n_rcu_torture_boost_rterror); 1182 - page += sprintf(page, "rtbf: %ld rtb: %ld nt: %ld ", 1183 - n_rcu_torture_boost_failure, 1184 - n_rcu_torture_boosts, 1185 - n_rcu_torture_timers); 1186 - page = torture_onoff_stats(page); 1187 - page += sprintf(page, "barrier: %ld/%ld:%ld", 1188 - n_barrier_successes, 1189 - n_barrier_attempts, 1190 - n_rcu_torture_barrier_error); 1191 - page += sprintf(page, "\n%s%s ", torture_type, TORTURE_FLAG); 1055 + 1056 + pr_alert("%s%s ", torture_type, TORTURE_FLAG); 1057 + pr_cont("rtc: %p ver: %lu tfle: %d rta: %d rtaf: %d rtf: %d ", 1058 + rcu_torture_current, 1059 + rcu_torture_current_version, 1060 + list_empty(&rcu_torture_freelist), 1061 + atomic_read(&n_rcu_torture_alloc), 1062 + atomic_read(&n_rcu_torture_alloc_fail), 1063 + atomic_read(&n_rcu_torture_free)); 1064 + pr_cont("rtmbe: %d rtbke: %ld rtbre: %ld ", 1065 + atomic_read(&n_rcu_torture_mberror), 1066 + n_rcu_torture_boost_ktrerror, 
1067 + n_rcu_torture_boost_rterror); 1068 + pr_cont("rtbf: %ld rtb: %ld nt: %ld ", 1069 + n_rcu_torture_boost_failure, 1070 + n_rcu_torture_boosts, 1071 + n_rcu_torture_timers); 1072 + torture_onoff_stats(); 1073 + pr_cont("barrier: %ld/%ld:%ld ", 1074 + n_barrier_successes, 1075 + n_barrier_attempts, 1076 + n_rcu_torture_barrier_error); 1077 + pr_cont("cbflood: %ld\n", atomic_long_read(&n_cbfloods)); 1078 + 1079 + pr_alert("%s%s ", torture_type, TORTURE_FLAG); 1192 1080 if (atomic_read(&n_rcu_torture_mberror) != 0 || 1193 1081 n_rcu_torture_barrier_error != 0 || 1194 1082 n_rcu_torture_boost_ktrerror != 0 || 1195 1083 n_rcu_torture_boost_rterror != 0 || 1196 1084 n_rcu_torture_boost_failure != 0 || 1197 1085 i > 1) { 1198 - page += sprintf(page, "!!! "); 1086 + pr_cont("%s", "!!! "); 1199 1087 atomic_inc(&n_rcu_torture_error); 1200 1088 WARN_ON_ONCE(1); 1201 1089 } 1202 - page += sprintf(page, "Reader Pipe: "); 1090 + pr_cont("Reader Pipe: "); 1203 1091 for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) 1204 - page += sprintf(page, " %ld", pipesummary[i]); 1205 - page += sprintf(page, "\n%s%s ", torture_type, TORTURE_FLAG); 1206 - page += sprintf(page, "Reader Batch: "); 1092 + pr_cont(" %ld", pipesummary[i]); 1093 + pr_cont("\n"); 1094 + 1095 + pr_alert("%s%s ", torture_type, TORTURE_FLAG); 1096 + pr_cont("Reader Batch: "); 1207 1097 for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) 1208 - page += sprintf(page, " %ld", batchsummary[i]); 1209 - page += sprintf(page, "\n%s%s ", torture_type, TORTURE_FLAG); 1210 - page += sprintf(page, "Free-Block Circulation: "); 1098 + pr_cont(" %ld", batchsummary[i]); 1099 + pr_cont("\n"); 1100 + 1101 + pr_alert("%s%s ", torture_type, TORTURE_FLAG); 1102 + pr_cont("Free-Block Circulation: "); 1211 1103 for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) { 1212 - page += sprintf(page, " %d", 1213 - atomic_read(&rcu_torture_wcount[i])); 1104 + pr_cont(" %d", atomic_read(&rcu_torture_wcount[i])); 1214 1105 } 1215 - page += sprintf(page, "\n"); 
1106 + pr_cont("\n"); 1107 + 1216 1108 if (cur_ops->stats) 1217 - cur_ops->stats(page); 1109 + cur_ops->stats(); 1218 1110 if (rtcv_snap == rcu_torture_current_version && 1219 1111 rcu_torture_current != NULL) { 1220 1112 int __maybe_unused flags; ··· 1229 1109 1230 1110 rcutorture_get_gp_data(cur_ops->ttype, 1231 1111 &flags, &gpnum, &completed); 1232 - page += sprintf(page, 1233 - "??? Writer stall state %d g%lu c%lu f%#x\n", 1234 - rcu_torture_writer_state, 1235 - gpnum, completed, flags); 1112 + pr_alert("??? Writer stall state %d g%lu c%lu f%#x\n", 1113 + rcu_torture_writer_state, 1114 + gpnum, completed, flags); 1236 1115 show_rcu_gp_kthreads(); 1237 1116 rcutorture_trace_dump(); 1238 1117 } 1239 1118 rtcv_snap = rcu_torture_current_version; 1240 - } 1241 - 1242 - /* 1243 - * Print torture statistics. Caller must ensure that there is only 1244 - * one call to this function at a given time!!! This is normally 1245 - * accomplished by relying on the module system to only have one copy 1246 - * of the module loaded, and then by giving the rcu_torture_stats 1247 - * kthread full control (or the init/cleanup functions when rcu_torture_stats 1248 - * thread is not running). 
1249 - */ 1250 - static void 1251 - rcu_torture_stats_print(void) 1252 - { 1253 - int size = nr_cpu_ids * 200 + 8192; 1254 - char *buf; 1255 - 1256 - buf = kmalloc(size, GFP_KERNEL); 1257 - if (!buf) { 1258 - pr_err("rcu-torture: Out of memory, need: %d", size); 1259 - return; 1260 - } 1261 - rcu_torture_printk(buf); 1262 - pr_alert("%s", buf); 1263 - kfree(buf); 1264 1119 } 1265 1120 1266 1121 /* ··· 1390 1295 if (atomic_dec_and_test(&barrier_cbs_count)) 1391 1296 wake_up(&barrier_wq); 1392 1297 } while (!torture_must_stop()); 1393 - cur_ops->cb_barrier(); 1298 + if (cur_ops->cb_barrier != NULL) 1299 + cur_ops->cb_barrier(); 1394 1300 destroy_rcu_head_on_stack(&rcu); 1395 1301 torture_kthread_stopping("rcu_torture_barrier_cbs"); 1396 1302 return 0; ··· 1514 1418 int i; 1515 1419 1516 1420 rcutorture_record_test_transition(); 1517 - if (torture_cleanup()) { 1421 + if (torture_cleanup_begin()) { 1518 1422 if (cur_ops->cb_barrier != NULL) 1519 1423 cur_ops->cb_barrier(); 1520 1424 return; ··· 1543 1447 1544 1448 torture_stop_kthread(rcu_torture_stats, stats_task); 1545 1449 torture_stop_kthread(rcu_torture_fqs, fqs_task); 1450 + for (i = 0; i < ncbflooders; i++) 1451 + torture_stop_kthread(rcu_torture_cbflood, cbflood_task[i]); 1546 1452 if ((test_boost == 1 && cur_ops->can_boost) || 1547 1453 test_boost == 2) { 1548 1454 unregister_cpu_notifier(&rcutorture_cpu_nb); ··· 1566 1468 "End of test: RCU_HOTPLUG"); 1567 1469 else 1568 1470 rcu_torture_print_module_parms(cur_ops, "End of test: SUCCESS"); 1471 + torture_cleanup_end(); 1569 1472 } 1570 1473 1571 1474 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD ··· 1633 1534 int firsterr = 0; 1634 1535 static struct rcu_torture_ops *torture_ops[] = { 1635 1536 &rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &sched_ops, 1537 + RCUTORTURE_TASKS_OPS 1636 1538 }; 1637 1539 1638 - if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable)) 1540 + if (!torture_init_begin(torture_type, verbose, &torture_runnable)) 1639 1541 
return -EBUSY; 1640 1542 1641 1543 /* Process args and tell the world that the torturer is on the job. */ ··· 1793 1693 goto unwind; 1794 1694 if (object_debug) 1795 1695 rcu_test_debug_objects(); 1696 + if (cbflood_n_burst > 0) { 1697 + /* Create the cbflood threads */ 1698 + ncbflooders = (num_online_cpus() + 3) / 4; 1699 + cbflood_task = kcalloc(ncbflooders, sizeof(*cbflood_task), 1700 + GFP_KERNEL); 1701 + if (!cbflood_task) { 1702 + VERBOSE_TOROUT_ERRSTRING("out of memory"); 1703 + firsterr = -ENOMEM; 1704 + goto unwind; 1705 + } 1706 + for (i = 0; i < ncbflooders; i++) { 1707 + firsterr = torture_create_kthread(rcu_torture_cbflood, 1708 + NULL, 1709 + cbflood_task[i]); 1710 + if (firsterr) 1711 + goto unwind; 1712 + } 1713 + } 1796 1714 rcutorture_record_test_transition(); 1797 1715 torture_init_end(); 1798 1716 return 0;
+11 -9
kernel/rcu/tiny.c
··· 51 51 52 52 #include "tiny_plugin.h" 53 53 54 - /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */ 54 + /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcu/tree.c. */ 55 55 static void rcu_idle_enter_common(long long newval) 56 56 { 57 57 if (newval) { ··· 62 62 } 63 63 RCU_TRACE(trace_rcu_dyntick(TPS("Start"), 64 64 rcu_dynticks_nesting, newval)); 65 - if (!is_idle_task(current)) { 65 + if (IS_ENABLED(CONFIG_RCU_TRACE) && !is_idle_task(current)) { 66 66 struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); 67 67 68 68 RCU_TRACE(trace_rcu_dyntick(TPS("Entry error: not idle task"), ··· 72 72 current->pid, current->comm, 73 73 idle->pid, idle->comm); /* must be idle task! */ 74 74 } 75 - rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */ 75 + rcu_sched_qs(); /* implies rcu_bh_inc() */ 76 76 barrier(); 77 77 rcu_dynticks_nesting = newval; 78 78 } ··· 114 114 } 115 115 EXPORT_SYMBOL_GPL(rcu_irq_exit); 116 116 117 - /* Common code for rcu_idle_exit() and rcu_irq_enter(), see kernel/rcutree.c. */ 117 + /* Common code for rcu_idle_exit() and rcu_irq_enter(), see kernel/rcu/tree.c. */ 118 118 static void rcu_idle_exit_common(long long oldval) 119 119 { 120 120 if (oldval) { ··· 123 123 return; 124 124 } 125 125 RCU_TRACE(trace_rcu_dyntick(TPS("End"), oldval, rcu_dynticks_nesting)); 126 - if (!is_idle_task(current)) { 126 + if (IS_ENABLED(CONFIG_RCU_TRACE) && !is_idle_task(current)) { 127 127 struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); 128 128 129 129 RCU_TRACE(trace_rcu_dyntick(TPS("Exit error: not idle task"), ··· 217 217 * are at it, given that any rcu quiescent state is also an rcu_bh 218 218 * quiescent state. Use "+" instead of "||" to defeat short circuiting. 219 219 */ 220 - void rcu_sched_qs(int cpu) 220 + void rcu_sched_qs(void) 221 221 { 222 222 unsigned long flags; 223 223 ··· 231 231 /* 232 232 * Record an rcu_bh quiescent state. 
233 233 */ 234 - void rcu_bh_qs(int cpu) 234 + void rcu_bh_qs(void) 235 235 { 236 236 unsigned long flags; 237 237 ··· 251 251 { 252 252 RCU_TRACE(check_cpu_stalls()); 253 253 if (user || rcu_is_cpu_rrupt_from_idle()) 254 - rcu_sched_qs(cpu); 254 + rcu_sched_qs(); 255 255 else if (!in_softirq()) 256 - rcu_bh_qs(cpu); 256 + rcu_bh_qs(); 257 + if (user) 258 + rcu_note_voluntary_context_switch(current); 257 259 } 258 260 259 261 /*
+69 -46
kernel/rcu/tree.c
··· 79 79 * the tracing userspace tools to be able to decipher the string 80 80 * address to the matching string. 81 81 */ 82 - #define RCU_STATE_INITIALIZER(sname, sabbr, cr) \ 82 + #ifdef CONFIG_TRACING 83 + # define DEFINE_RCU_TPS(sname) \ 83 84 static char sname##_varname[] = #sname; \ 84 - static const char *tp_##sname##_varname __used __tracepoint_string = sname##_varname; \ 85 + static const char *tp_##sname##_varname __used __tracepoint_string = sname##_varname; 86 + # define RCU_STATE_NAME(sname) sname##_varname 87 + #else 88 + # define DEFINE_RCU_TPS(sname) 89 + # define RCU_STATE_NAME(sname) __stringify(sname) 90 + #endif 91 + 92 + #define RCU_STATE_INITIALIZER(sname, sabbr, cr) \ 93 + DEFINE_RCU_TPS(sname) \ 85 94 struct rcu_state sname##_state = { \ 86 95 .level = { &sname##_state.node[0] }, \ 87 96 .call = cr, \ ··· 102 93 .orphan_donetail = &sname##_state.orphan_donelist, \ 103 94 .barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \ 104 95 .onoff_mutex = __MUTEX_INITIALIZER(sname##_state.onoff_mutex), \ 105 - .name = sname##_varname, \ 96 + .name = RCU_STATE_NAME(sname), \ 106 97 .abbr = sabbr, \ 107 98 }; \ 108 99 DEFINE_PER_CPU(struct rcu_data, sname##_data) ··· 197 188 * one since the start of the grace period, this just sets a flag. 198 189 * The caller must have disabled preemption. 
199 190 */ 200 - void rcu_sched_qs(int cpu) 191 + void rcu_sched_qs(void) 201 192 { 202 - struct rcu_data *rdp = &per_cpu(rcu_sched_data, cpu); 203 - 204 - if (rdp->passed_quiesce == 0) 205 - trace_rcu_grace_period(TPS("rcu_sched"), rdp->gpnum, TPS("cpuqs")); 206 - rdp->passed_quiesce = 1; 193 + if (!__this_cpu_read(rcu_sched_data.passed_quiesce)) { 194 + trace_rcu_grace_period(TPS("rcu_sched"), 195 + __this_cpu_read(rcu_sched_data.gpnum), 196 + TPS("cpuqs")); 197 + __this_cpu_write(rcu_sched_data.passed_quiesce, 1); 198 + } 207 199 } 208 200 209 - void rcu_bh_qs(int cpu) 201 + void rcu_bh_qs(void) 210 202 { 211 - struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu); 212 - 213 - if (rdp->passed_quiesce == 0) 214 - trace_rcu_grace_period(TPS("rcu_bh"), rdp->gpnum, TPS("cpuqs")); 215 - rdp->passed_quiesce = 1; 203 + if (!__this_cpu_read(rcu_bh_data.passed_quiesce)) { 204 + trace_rcu_grace_period(TPS("rcu_bh"), 205 + __this_cpu_read(rcu_bh_data.gpnum), 206 + TPS("cpuqs")); 207 + __this_cpu_write(rcu_bh_data.passed_quiesce, 1); 208 + } 216 209 } 217 210 218 211 static DEFINE_PER_CPU(int, rcu_sched_qs_mask); ··· 289 278 void rcu_note_context_switch(int cpu) 290 279 { 291 280 trace_rcu_utilization(TPS("Start context switch")); 292 - rcu_sched_qs(cpu); 281 + rcu_sched_qs(); 293 282 rcu_preempt_note_context_switch(cpu); 294 283 if (unlikely(raw_cpu_read(rcu_sched_qs_mask))) 295 284 rcu_momentary_dyntick_idle(); ··· 537 526 atomic_inc(&rdtp->dynticks); 538 527 smp_mb__after_atomic(); /* Force ordering with next sojourn. */ 539 528 WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1); 529 + rcu_dynticks_task_enter(); 540 530 541 531 /* 542 532 * It is illegal to enter an extended quiescent state while ··· 654 642 static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval, 655 643 int user) 656 644 { 645 + rcu_dynticks_task_exit(); 657 646 smp_mb__before_atomic(); /* Force ordering w/previous sojourn. 
*/ 658 647 atomic_inc(&rdtp->dynticks); 659 648 /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */ ··· 832 819 */ 833 820 bool notrace rcu_is_watching(void) 834 821 { 835 - int ret; 822 + bool ret; 836 823 837 824 preempt_disable(); 838 825 ret = __rcu_is_watching(); ··· 1660 1647 rnp->level, rnp->grplo, 1661 1648 rnp->grphi, rnp->qsmask); 1662 1649 raw_spin_unlock_irq(&rnp->lock); 1663 - cond_resched(); 1650 + cond_resched_rcu_qs(); 1664 1651 } 1665 1652 1666 1653 mutex_unlock(&rsp->onoff_mutex); ··· 1681 1668 if (fqs_state == RCU_SAVE_DYNTICK) { 1682 1669 /* Collect dyntick-idle snapshots. */ 1683 1670 if (is_sysidle_rcu_state(rsp)) { 1684 - isidle = 1; 1671 + isidle = true; 1685 1672 maxj = jiffies - ULONG_MAX / 4; 1686 1673 } 1687 1674 force_qs_rnp(rsp, dyntick_save_progress_counter, ··· 1690 1677 fqs_state = RCU_FORCE_QS; 1691 1678 } else { 1692 1679 /* Handle dyntick-idle and offline CPUs. */ 1693 - isidle = 0; 1680 + isidle = false; 1694 1681 force_qs_rnp(rsp, rcu_implicit_dynticks_qs, &isidle, &maxj); 1695 1682 } 1696 1683 /* Clear flag to prevent immediate re-entry. */ 1697 1684 if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) { 1698 1685 raw_spin_lock_irq(&rnp->lock); 1699 1686 smp_mb__after_unlock_lock(); 1700 - ACCESS_ONCE(rsp->gp_flags) &= ~RCU_GP_FLAG_FQS; 1687 + ACCESS_ONCE(rsp->gp_flags) = 1688 + ACCESS_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS; 1701 1689 raw_spin_unlock_irq(&rnp->lock); 1702 1690 } 1703 1691 return fqs_state; ··· 1750 1736 /* smp_mb() provided by prior unlock-lock pair. */ 1751 1737 nocb += rcu_future_gp_cleanup(rsp, rnp); 1752 1738 raw_spin_unlock_irq(&rnp->lock); 1753 - cond_resched(); 1739 + cond_resched_rcu_qs(); 1754 1740 } 1755 1741 rnp = rcu_get_root(rsp); 1756 1742 raw_spin_lock_irq(&rnp->lock); ··· 1799 1785 /* Locking provides needed memory barrier. 
*/ 1800 1786 if (rcu_gp_init(rsp)) 1801 1787 break; 1802 - cond_resched(); 1803 - flush_signals(current); 1788 + cond_resched_rcu_qs(); 1789 + WARN_ON(signal_pending(current)); 1804 1790 trace_rcu_grace_period(rsp->name, 1805 1791 ACCESS_ONCE(rsp->gpnum), 1806 1792 TPS("reqwaitsig")); ··· 1842 1828 trace_rcu_grace_period(rsp->name, 1843 1829 ACCESS_ONCE(rsp->gpnum), 1844 1830 TPS("fqsend")); 1845 - cond_resched(); 1831 + cond_resched_rcu_qs(); 1846 1832 } else { 1847 1833 /* Deal with stray signal. */ 1848 - cond_resched(); 1849 - flush_signals(current); 1834 + cond_resched_rcu_qs(); 1835 + WARN_ON(signal_pending(current)); 1850 1836 trace_rcu_grace_period(rsp->name, 1851 1837 ACCESS_ONCE(rsp->gpnum), 1852 1838 TPS("fqswaitsig")); ··· 1942 1928 { 1943 1929 WARN_ON_ONCE(!rcu_gp_in_progress(rsp)); 1944 1930 raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags); 1945 - wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */ 1931 + rcu_gp_kthread_wake(rsp); 1946 1932 } 1947 1933 1948 1934 /* ··· 2224 2210 /* Adjust any no-longer-needed kthreads. */ 2225 2211 rcu_boost_kthread_setaffinity(rnp, -1); 2226 2212 2227 - /* Remove the dead CPU from the bitmasks in the rcu_node hierarchy. */ 2228 - 2229 2213 /* Exclude any attempts to start a new grace period. */ 2230 2214 mutex_lock(&rsp->onoff_mutex); 2231 2215 raw_spin_lock_irqsave(&rsp->orphan_lock, flags); ··· 2405 2393 * at least not while the corresponding CPU is online. 2406 2394 */ 2407 2395 2408 - rcu_sched_qs(cpu); 2409 - rcu_bh_qs(cpu); 2396 + rcu_sched_qs(); 2397 + rcu_bh_qs(); 2410 2398 2411 2399 } else if (!in_softirq()) { 2412 2400 ··· 2417 2405 * critical section, so note it. 
2418 2406 */ 2419 2407 2420 - rcu_bh_qs(cpu); 2408 + rcu_bh_qs(); 2421 2409 } 2422 2410 rcu_preempt_check_callbacks(cpu); 2423 2411 if (rcu_pending(cpu)) 2424 2412 invoke_rcu_core(); 2413 + if (user) 2414 + rcu_note_voluntary_context_switch(current); 2425 2415 trace_rcu_utilization(TPS("End scheduler-tick")); 2426 2416 } 2427 2417 ··· 2446 2432 struct rcu_node *rnp; 2447 2433 2448 2434 rcu_for_each_leaf_node(rsp, rnp) { 2449 - cond_resched(); 2435 + cond_resched_rcu_qs(); 2450 2436 mask = 0; 2451 2437 raw_spin_lock_irqsave(&rnp->lock, flags); 2452 2438 smp_mb__after_unlock_lock(); ··· 2463 2449 for (; cpu <= rnp->grphi; cpu++, bit <<= 1) { 2464 2450 if ((rnp->qsmask & bit) != 0) { 2465 2451 if ((rnp->qsmaskinit & bit) != 0) 2466 - *isidle = 0; 2452 + *isidle = false; 2467 2453 if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj)) 2468 2454 mask |= bit; 2469 2455 } ··· 2519 2505 raw_spin_unlock_irqrestore(&rnp_old->lock, flags); 2520 2506 return; /* Someone beat us to it. */ 2521 2507 } 2522 - ACCESS_ONCE(rsp->gp_flags) |= RCU_GP_FLAG_FQS; 2508 + ACCESS_ONCE(rsp->gp_flags) = 2509 + ACCESS_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS; 2523 2510 raw_spin_unlock_irqrestore(&rnp_old->lock, flags); 2524 - wake_up(&rsp->gp_wq); /* Memory barrier implied by wake_up() path. */ 2511 + rcu_gp_kthread_wake(rsp); 2525 2512 } 2526 2513 2527 2514 /* ··· 2940 2925 * restructure your code to batch your updates, and then use a single 2941 2926 * synchronize_sched() instead. 2942 2927 * 2943 - * Note that it is illegal to call this function while holding any lock 2944 - * that is acquired by a CPU-hotplug notifier. And yes, it is also illegal 2945 - * to call this function from a CPU-hotplug notifier. Failing to observe 2946 - * these restriction will result in deadlock. 
2947 - * 2948 2928 * This implementation can be thought of as an application of ticket 2949 2929 * locking to RCU, with sync_sched_expedited_started and 2950 2930 * sync_sched_expedited_done taking on the roles of the halves ··· 2989 2979 */ 2990 2980 snap = atomic_long_inc_return(&rsp->expedited_start); 2991 2981 firstsnap = snap; 2992 - get_online_cpus(); 2982 + if (!try_get_online_cpus()) { 2983 + /* CPU hotplug operation in flight, fall back to normal GP. */ 2984 + wait_rcu_gp(call_rcu_sched); 2985 + atomic_long_inc(&rsp->expedited_normal); 2986 + return; 2987 + } 2993 2988 WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id())); 2994 2989 2995 2990 /* ··· 3041 3026 * and they started after our first try, so their grace 3042 3027 * period works for us. 3043 3028 */ 3044 - get_online_cpus(); 3029 + if (!try_get_online_cpus()) { 3030 + /* CPU hotplug operation in flight, use normal GP. */ 3031 + wait_rcu_gp(call_rcu_sched); 3032 + atomic_long_inc(&rsp->expedited_normal); 3033 + return; 3034 + } 3045 3035 snap = atomic_long_read(&rsp->expedited_start); 3046 3036 smp_mb(); /* ensure read is before try_stop_cpus(). */ 3047 3037 } ··· 3462 3442 case CPU_UP_PREPARE_FROZEN: 3463 3443 rcu_prepare_cpu(cpu); 3464 3444 rcu_prepare_kthreads(cpu); 3445 + rcu_spawn_all_nocb_kthreads(cpu); 3465 3446 break; 3466 3447 case CPU_ONLINE: 3467 3448 case CPU_DOWN_FAILED: ··· 3510 3489 } 3511 3490 3512 3491 /* 3513 - * Spawn the kthread that handles this RCU flavor's grace periods. 3492 + * Spawn the kthreads that handle each RCU flavor's grace periods. 
3514 3493 */ 3515 3494 static int __init rcu_spawn_gp_kthread(void) 3516 3495 { ··· 3519 3498 struct rcu_state *rsp; 3520 3499 struct task_struct *t; 3521 3500 3501 + rcu_scheduler_fully_active = 1; 3522 3502 for_each_rcu_flavor(rsp) { 3523 3503 t = kthread_run(rcu_gp_kthread, rsp, "%s", rsp->name); 3524 3504 BUG_ON(IS_ERR(t)); ··· 3527 3505 raw_spin_lock_irqsave(&rnp->lock, flags); 3528 3506 rsp->gp_kthread = t; 3529 3507 raw_spin_unlock_irqrestore(&rnp->lock, flags); 3530 - rcu_spawn_nocb_kthreads(rsp); 3531 3508 } 3509 + rcu_spawn_nocb_kthreads(); 3510 + rcu_spawn_boost_kthreads(); 3532 3511 return 0; 3533 3512 } 3534 3513 early_initcall(rcu_spawn_gp_kthread);
+15 -3
kernel/rcu/tree.h
··· 350 350 int nocb_p_count_lazy; /* (approximate). */ 351 351 wait_queue_head_t nocb_wq; /* For nocb kthreads to sleep on. */ 352 352 struct task_struct *nocb_kthread; 353 - bool nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */ 353 + int nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */ 354 354 355 355 /* The following fields are used by the leader, hence own cacheline. */ 356 356 struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp; ··· 382 382 #define RCU_SAVE_DYNTICK 2 /* Need to scan dyntick state. */ 383 383 #define RCU_FORCE_QS 3 /* Need to force quiescent state. */ 384 384 #define RCU_SIGNAL_INIT RCU_SAVE_DYNTICK 385 + 386 + /* Values for nocb_defer_wakeup field in struct rcu_data. */ 387 + #define RCU_NOGP_WAKE_NOT 0 388 + #define RCU_NOGP_WAKE 1 389 + #define RCU_NOGP_WAKE_FORCE 2 385 390 386 391 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500)) 387 392 /* For jiffies_till_first_fqs and */ ··· 577 572 static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp, 578 573 struct rcu_node *rnp); 579 574 #endif /* #ifdef CONFIG_RCU_BOOST */ 575 + static void __init rcu_spawn_boost_kthreads(void); 580 576 static void rcu_prepare_kthreads(int cpu); 581 577 static void rcu_cleanup_after_idle(int cpu); 582 578 static void rcu_prepare_for_idle(int cpu); ··· 595 589 static bool rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, 596 590 struct rcu_data *rdp, 597 591 unsigned long flags); 598 - static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); 592 + static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); 599 593 static void do_nocb_deferred_wakeup(struct rcu_data *rdp); 600 594 static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); 601 - static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp); 595 + static void rcu_spawn_all_nocb_kthreads(int cpu); 596 + static void __init rcu_spawn_nocb_kthreads(void); 597 + #ifdef CONFIG_RCU_NOCB_CPU 598 + static void __init 
rcu_organize_nocb_kthreads(struct rcu_state *rsp); 599 + #endif /* #ifdef CONFIG_RCU_NOCB_CPU */ 602 600 static void __maybe_unused rcu_kick_nohz_cpu(int cpu); 603 601 static bool init_nocb_callback_list(struct rcu_data *rdp); 604 602 static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq); ··· 615 605 static void rcu_bind_gp_kthread(void); 616 606 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); 617 607 static bool rcu_nohz_full_cpu(struct rcu_state *rsp); 608 + static void rcu_dynticks_task_enter(void); 609 + static void rcu_dynticks_task_exit(void); 618 610 619 611 #endif /* #ifndef RCU_TREE_NONCORE */ 620 612
+271 -129
kernel/rcu/tree_plugin.h
··· 85 85 pr_info("\tBoot-time adjustment of leaf fanout to %d.\n", rcu_fanout_leaf); 86 86 if (nr_cpu_ids != NR_CPUS) 87 87 pr_info("\tRCU restricting CPUs from NR_CPUS=%d to nr_cpu_ids=%d.\n", NR_CPUS, nr_cpu_ids); 88 - #ifdef CONFIG_RCU_NOCB_CPU 89 - #ifndef CONFIG_RCU_NOCB_CPU_NONE 90 - if (!have_rcu_nocb_mask) { 91 - zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL); 92 - have_rcu_nocb_mask = true; 93 - } 94 - #ifdef CONFIG_RCU_NOCB_CPU_ZERO 95 - pr_info("\tOffload RCU callbacks from CPU 0\n"); 96 - cpumask_set_cpu(0, rcu_nocb_mask); 97 - #endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */ 98 - #ifdef CONFIG_RCU_NOCB_CPU_ALL 99 - pr_info("\tOffload RCU callbacks from all CPUs\n"); 100 - cpumask_copy(rcu_nocb_mask, cpu_possible_mask); 101 - #endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */ 102 - #endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */ 103 - if (have_rcu_nocb_mask) { 104 - if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { 105 - pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n"); 106 - cpumask_and(rcu_nocb_mask, cpu_possible_mask, 107 - rcu_nocb_mask); 108 - } 109 - cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask); 110 - pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf); 111 - if (rcu_nocb_poll) 112 - pr_info("\tPoll for callbacks from no-CBs CPUs.\n"); 113 - } 114 - #endif /* #ifdef CONFIG_RCU_NOCB_CPU */ 115 88 } 116 89 117 90 #ifdef CONFIG_TREE_PREEMPT_RCU ··· 107 134 * Return the number of RCU-preempt batches processed thus far 108 135 * for debug and statistics. 109 136 */ 110 - long rcu_batches_completed_preempt(void) 137 + static long rcu_batches_completed_preempt(void) 111 138 { 112 139 return rcu_preempt_state.completed; 113 140 } ··· 128 155 * not in a quiescent state. There might be any number of tasks blocked 129 156 * while in an RCU read-side critical section. 
130 157 * 131 - * Unlike the other rcu_*_qs() functions, callers to this function 132 - * must disable irqs in order to protect the assignment to 133 - * ->rcu_read_unlock_special. 158 + * As with the other rcu_*_qs() functions, callers to this function 159 + * must disable preemption. 134 160 */ 135 - static void rcu_preempt_qs(int cpu) 161 + static void rcu_preempt_qs(void) 136 162 { 137 - struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu); 138 - 139 - if (rdp->passed_quiesce == 0) 140 - trace_rcu_grace_period(TPS("rcu_preempt"), rdp->gpnum, TPS("cpuqs")); 141 - rdp->passed_quiesce = 1; 142 - current->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS; 163 + if (!__this_cpu_read(rcu_preempt_data.passed_quiesce)) { 164 + trace_rcu_grace_period(TPS("rcu_preempt"), 165 + __this_cpu_read(rcu_preempt_data.gpnum), 166 + TPS("cpuqs")); 167 + __this_cpu_write(rcu_preempt_data.passed_quiesce, 1); 168 + barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */ 169 + current->rcu_read_unlock_special.b.need_qs = false; 170 + } 143 171 } 144 172 145 173 /* ··· 164 190 struct rcu_node *rnp; 165 191 166 192 if (t->rcu_read_lock_nesting > 0 && 167 - (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) { 193 + !t->rcu_read_unlock_special.b.blocked) { 168 194 169 195 /* Possibly blocking in an RCU read-side critical section. 
*/ 170 196 rdp = per_cpu_ptr(rcu_preempt_state.rda, cpu); 171 197 rnp = rdp->mynode; 172 198 raw_spin_lock_irqsave(&rnp->lock, flags); 173 199 smp_mb__after_unlock_lock(); 174 - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED; 200 + t->rcu_read_unlock_special.b.blocked = true; 175 201 t->rcu_blocked_node = rnp; 176 202 177 203 /* ··· 213 239 : rnp->gpnum + 1); 214 240 raw_spin_unlock_irqrestore(&rnp->lock, flags); 215 241 } else if (t->rcu_read_lock_nesting < 0 && 216 - t->rcu_read_unlock_special) { 242 + t->rcu_read_unlock_special.s) { 217 243 218 244 /* 219 245 * Complete exit from RCU read-side critical section on ··· 231 257 * grace period, then the fact that the task has been enqueued 232 258 * means that we continue to block the current grace period. 233 259 */ 234 - local_irq_save(flags); 235 - rcu_preempt_qs(cpu); 236 - local_irq_restore(flags); 260 + rcu_preempt_qs(); 237 261 } 238 262 239 263 /* ··· 312 340 bool drop_boost_mutex = false; 313 341 #endif /* #ifdef CONFIG_RCU_BOOST */ 314 342 struct rcu_node *rnp; 315 - int special; 343 + union rcu_special special; 316 344 317 345 /* NMI handlers cannot block and cannot safely manipulate state. */ 318 346 if (in_nmi()) ··· 322 350 323 351 /* 324 352 * If RCU core is waiting for this CPU to exit critical section, 325 - * let it know that we have done so. 353 + * let it know that we have done so. Because irqs are disabled, 354 + * t->rcu_read_unlock_special cannot change. 326 355 */ 327 356 special = t->rcu_read_unlock_special; 328 - if (special & RCU_READ_UNLOCK_NEED_QS) { 329 - rcu_preempt_qs(smp_processor_id()); 330 - if (!t->rcu_read_unlock_special) { 357 + if (special.b.need_qs) { 358 + rcu_preempt_qs(); 359 + if (!t->rcu_read_unlock_special.s) { 331 360 local_irq_restore(flags); 332 361 return; 333 362 } ··· 341 368 } 342 369 343 370 /* Clean up if blocked during RCU read-side critical section. 
*/ 344 - if (special & RCU_READ_UNLOCK_BLOCKED) { 345 - t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BLOCKED; 371 + if (special.b.blocked) { 372 + t->rcu_read_unlock_special.b.blocked = false; 346 373 347 374 /* 348 375 * Remove this task from the list it blocked on. The ··· 626 653 struct task_struct *t = current; 627 654 628 655 if (t->rcu_read_lock_nesting == 0) { 629 - rcu_preempt_qs(cpu); 656 + rcu_preempt_qs(); 630 657 return; 631 658 } 632 659 if (t->rcu_read_lock_nesting > 0 && 633 - per_cpu(rcu_preempt_data, cpu).qs_pending) 634 - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS; 660 + per_cpu(rcu_preempt_data, cpu).qs_pending && 661 + !per_cpu(rcu_preempt_data, cpu).passed_quiesce) 662 + t->rcu_read_unlock_special.b.need_qs = true; 635 663 } 636 664 637 665 #ifdef CONFIG_RCU_BOOST ··· 793 819 * In fact, if you are using synchronize_rcu_expedited() in a loop, 794 820 * please restructure your code to batch your updates, and then Use a 795 821 * single synchronize_rcu() instead. 796 - * 797 - * Note that it is illegal to call this function while holding any lock 798 - * that is acquired by a CPU-hotplug notifier. And yes, it is also illegal 799 - * to call this function from a CPU-hotplug notifier. Failing to observe 800 - * these restriction will result in deadlock. 801 822 */ 802 823 void synchronize_rcu_expedited(void) 803 824 { ··· 814 845 * being boosted. This simplifies the process of moving tasks 815 846 * from leaf to root rcu_node structures. 816 847 */ 817 - get_online_cpus(); 848 + if (!try_get_online_cpus()) { 849 + /* CPU-hotplug operation in flight, fall back to normal GP. */ 850 + wait_rcu_gp(call_rcu); 851 + return; 852 + } 818 853 819 854 /* 820 855 * Acquire lock, falling back to synchronize_rcu() if too many ··· 870 897 871 898 /* Clean up and exit. */ 872 899 smp_mb(); /* ensure expedited GP seen before counter increment. 
*/ 873 - ACCESS_ONCE(sync_rcu_preempt_exp_count)++; 900 + ACCESS_ONCE(sync_rcu_preempt_exp_count) = 901 + sync_rcu_preempt_exp_count + 1; 874 902 unlock_mb_ret: 875 903 mutex_unlock(&sync_rcu_preempt_exp_mutex); 876 904 mb_ret: ··· 915 941 return; 916 942 t->rcu_read_lock_nesting = 1; 917 943 barrier(); 918 - t->rcu_read_unlock_special = RCU_READ_UNLOCK_BLOCKED; 944 + t->rcu_read_unlock_special.b.blocked = true; 919 945 __rcu_read_unlock(); 920 946 } 921 947 ··· 1436 1462 }; 1437 1463 1438 1464 /* 1439 - * Spawn all kthreads -- called as soon as the scheduler is running. 1465 + * Spawn boost kthreads -- called as soon as the scheduler is running. 1440 1466 */ 1441 - static int __init rcu_spawn_kthreads(void) 1467 + static void __init rcu_spawn_boost_kthreads(void) 1442 1468 { 1443 1469 struct rcu_node *rnp; 1444 1470 int cpu; 1445 1471 1446 - rcu_scheduler_fully_active = 1; 1447 1472 for_each_possible_cpu(cpu) 1448 1473 per_cpu(rcu_cpu_has_work, cpu) = 0; 1449 1474 BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec)); ··· 1452 1479 rcu_for_each_leaf_node(rcu_state_p, rnp) 1453 1480 (void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp); 1454 1481 } 1455 - return 0; 1456 1482 } 1457 - early_initcall(rcu_spawn_kthreads); 1458 1483 1459 1484 static void rcu_prepare_kthreads(int cpu) 1460 1485 { ··· 1490 1519 { 1491 1520 } 1492 1521 1493 - static int __init rcu_scheduler_really_started(void) 1522 + static void __init rcu_spawn_boost_kthreads(void) 1494 1523 { 1495 - rcu_scheduler_fully_active = 1; 1496 - return 0; 1497 1524 } 1498 - early_initcall(rcu_scheduler_really_started); 1499 1525 1500 1526 static void rcu_prepare_kthreads(int cpu) 1501 1527 { ··· 1593 1625 1594 1626 /* Exit early if we advanced recently. 
*/ 1595 1627 if (jiffies == rdtp->last_advance_all) 1596 - return 0; 1628 + return false; 1597 1629 rdtp->last_advance_all = jiffies; 1598 1630 1599 1631 for_each_rcu_flavor(rsp) { ··· 1816 1848 get_online_cpus(); 1817 1849 for_each_online_cpu(cpu) { 1818 1850 smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1); 1819 - cond_resched(); 1851 + cond_resched_rcu_qs(); 1820 1852 } 1821 1853 put_online_cpus(); 1822 1854 ··· 2043 2075 if (!ACCESS_ONCE(rdp_leader->nocb_kthread)) 2044 2076 return; 2045 2077 if (ACCESS_ONCE(rdp_leader->nocb_leader_sleep) || force) { 2046 - /* Prior xchg orders against prior callback enqueue. */ 2078 + /* Prior smp_mb__after_atomic() orders against prior enqueue. */ 2047 2079 ACCESS_ONCE(rdp_leader->nocb_leader_sleep) = false; 2048 2080 wake_up(&rdp_leader->nocb_wq); 2049 2081 } ··· 2072 2104 ACCESS_ONCE(*old_rhpp) = rhp; 2073 2105 atomic_long_add(rhcount, &rdp->nocb_q_count); 2074 2106 atomic_long_add(rhcount_lazy, &rdp->nocb_q_count_lazy); 2107 + smp_mb__after_atomic(); /* Store *old_rhpp before _wake test. */ 2075 2108 2076 2109 /* If we are not being polled and there is a kthread, awaken it ... */ 2077 2110 t = ACCESS_ONCE(rdp->nocb_kthread); ··· 2089 2120 trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 2090 2121 TPS("WakeEmpty")); 2091 2122 } else { 2092 - rdp->nocb_defer_wakeup = true; 2123 + rdp->nocb_defer_wakeup = RCU_NOGP_WAKE; 2093 2124 trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 2094 2125 TPS("WakeEmptyIsDeferred")); 2095 2126 } 2096 2127 rdp->qlen_last_fqs_check = 0; 2097 2128 } else if (len > rdp->qlen_last_fqs_check + qhimark) { 2098 2129 /* ... or if many callbacks queued. 
*/ 2099 - wake_nocb_leader(rdp, true); 2130 + if (!irqs_disabled_flags(flags)) { 2131 + wake_nocb_leader(rdp, true); 2132 + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 2133 + TPS("WakeOvf")); 2134 + } else { 2135 + rdp->nocb_defer_wakeup = RCU_NOGP_WAKE_FORCE; 2136 + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 2137 + TPS("WakeOvfIsDeferred")); 2138 + } 2100 2139 rdp->qlen_last_fqs_check = LONG_MAX / 2; 2101 - trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeOvf")); 2102 2140 } else { 2103 2141 trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeNot")); 2104 2142 } ··· 2126 2150 { 2127 2151 2128 2152 if (!rcu_is_nocb_cpu(rdp->cpu)) 2129 - return 0; 2153 + return false; 2130 2154 __call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy, flags); 2131 2155 if (__is_kfree_rcu_offset((unsigned long)rhp->func)) 2132 2156 trace_rcu_kfree_callback(rdp->rsp->name, rhp, ··· 2137 2161 trace_rcu_callback(rdp->rsp->name, rhp, 2138 2162 -atomic_long_read(&rdp->nocb_q_count_lazy), 2139 2163 -atomic_long_read(&rdp->nocb_q_count)); 2140 - return 1; 2164 + 2165 + /* 2166 + * If called from an extended quiescent state with interrupts 2167 + * disabled, invoke the RCU core in order to allow the idle-entry 2168 + * deferred-wakeup check to function. 2169 + */ 2170 + if (irqs_disabled_flags(flags) && 2171 + !rcu_is_watching() && 2172 + cpu_online(smp_processor_id())) 2173 + invoke_rcu_core(); 2174 + 2175 + return true; 2141 2176 } 2142 2177 2143 2178 /* ··· 2164 2177 2165 2178 /* If this is not a no-CBs CPU, tell the caller to do it the old way. 
*/ 2166 2179 if (!rcu_is_nocb_cpu(smp_processor_id())) 2167 - return 0; 2180 + return false; 2168 2181 rsp->qlen = 0; 2169 2182 rsp->qlen_lazy = 0; 2170 2183 ··· 2183 2196 rsp->orphan_nxtlist = NULL; 2184 2197 rsp->orphan_nxttail = &rsp->orphan_nxtlist; 2185 2198 } 2186 - return 1; 2199 + return true; 2187 2200 } 2188 2201 2189 2202 /* ··· 2216 2229 (d = ULONG_CMP_GE(ACCESS_ONCE(rnp->completed), c))); 2217 2230 if (likely(d)) 2218 2231 break; 2219 - flush_signals(current); 2232 + WARN_ON(signal_pending(current)); 2220 2233 trace_rcu_future_gp(rnp, rdp, c, TPS("ResumeWait")); 2221 2234 } 2222 2235 trace_rcu_future_gp(rnp, rdp, c, TPS("EndWait")); ··· 2275 2288 if (!rcu_nocb_poll) 2276 2289 trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, 2277 2290 "WokeEmpty"); 2278 - flush_signals(current); 2291 + WARN_ON(signal_pending(current)); 2279 2292 schedule_timeout_interruptible(1); 2280 2293 2281 2294 /* Rescan in case we were a victim of memory ordering. */ ··· 2314 2327 atomic_long_add(rdp->nocb_gp_count, &rdp->nocb_follower_count); 2315 2328 atomic_long_add(rdp->nocb_gp_count_lazy, 2316 2329 &rdp->nocb_follower_count_lazy); 2330 + smp_mb__after_atomic(); /* Store *tail before wakeup. */ 2317 2331 if (rdp != my_rdp && tail == &rdp->nocb_follower_head) { 2318 2332 /* 2319 2333 * List was empty, wake up the follower. 
··· 2355 2367 if (!rcu_nocb_poll) 2356 2368 trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 2357 2369 "WokeEmpty"); 2358 - flush_signals(current); 2370 + WARN_ON(signal_pending(current)); 2359 2371 schedule_timeout_interruptible(1); 2360 2372 } 2361 2373 } ··· 2416 2428 list = next; 2417 2429 } 2418 2430 trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1); 2419 - ACCESS_ONCE(rdp->nocb_p_count) -= c; 2420 - ACCESS_ONCE(rdp->nocb_p_count_lazy) -= cl; 2431 + ACCESS_ONCE(rdp->nocb_p_count) = rdp->nocb_p_count - c; 2432 + ACCESS_ONCE(rdp->nocb_p_count_lazy) = 2433 + rdp->nocb_p_count_lazy - cl; 2421 2434 rdp->n_nocbs_invoked += c; 2422 2435 } 2423 2436 return 0; 2424 2437 } 2425 2438 2426 2439 /* Is a deferred wakeup of rcu_nocb_kthread() required? */ 2427 - static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) 2440 + static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) 2428 2441 { 2429 2442 return ACCESS_ONCE(rdp->nocb_defer_wakeup); 2430 2443 } ··· 2433 2444 /* Do a deferred wakeup of rcu_nocb_kthread(). 
*/ 2434 2445 static void do_nocb_deferred_wakeup(struct rcu_data *rdp) 2435 2446 { 2447 + int ndw; 2448 + 2436 2449 if (!rcu_nocb_need_deferred_wakeup(rdp)) 2437 2450 return; 2438 - ACCESS_ONCE(rdp->nocb_defer_wakeup) = false; 2439 - wake_nocb_leader(rdp, false); 2440 - trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty")); 2451 + ndw = ACCESS_ONCE(rdp->nocb_defer_wakeup); 2452 + ACCESS_ONCE(rdp->nocb_defer_wakeup) = RCU_NOGP_WAKE_NOT; 2453 + wake_nocb_leader(rdp, ndw == RCU_NOGP_WAKE_FORCE); 2454 + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWake")); 2455 + } 2456 + 2457 + void __init rcu_init_nohz(void) 2458 + { 2459 + int cpu; 2460 + bool need_rcu_nocb_mask = true; 2461 + struct rcu_state *rsp; 2462 + 2463 + #ifdef CONFIG_RCU_NOCB_CPU_NONE 2464 + need_rcu_nocb_mask = false; 2465 + #endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */ 2466 + 2467 + #if defined(CONFIG_NO_HZ_FULL) 2468 + if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask)) 2469 + need_rcu_nocb_mask = true; 2470 + #endif /* #if defined(CONFIG_NO_HZ_FULL) */ 2471 + 2472 + if (!have_rcu_nocb_mask && need_rcu_nocb_mask) { 2473 + if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) { 2474 + pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n"); 2475 + return; 2476 + } 2477 + have_rcu_nocb_mask = true; 2478 + } 2479 + if (!have_rcu_nocb_mask) 2480 + return; 2481 + 2482 + #ifdef CONFIG_RCU_NOCB_CPU_ZERO 2483 + pr_info("\tOffload RCU callbacks from CPU 0\n"); 2484 + cpumask_set_cpu(0, rcu_nocb_mask); 2485 + #endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */ 2486 + #ifdef CONFIG_RCU_NOCB_CPU_ALL 2487 + pr_info("\tOffload RCU callbacks from all CPUs\n"); 2488 + cpumask_copy(rcu_nocb_mask, cpu_possible_mask); 2489 + #endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */ 2490 + #if defined(CONFIG_NO_HZ_FULL) 2491 + if (tick_nohz_full_running) 2492 + cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask); 2493 + #endif /* #if defined(CONFIG_NO_HZ_FULL) */ 2494 
+ 2495 + if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) { 2496 + pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n"); 2497 + cpumask_and(rcu_nocb_mask, cpu_possible_mask, 2498 + rcu_nocb_mask); 2499 + } 2500 + cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask); 2501 + pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf); 2502 + if (rcu_nocb_poll) 2503 + pr_info("\tPoll for callbacks from no-CBs CPUs.\n"); 2504 + 2505 + for_each_rcu_flavor(rsp) { 2506 + for_each_cpu(cpu, rcu_nocb_mask) { 2507 + struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); 2508 + 2509 + /* 2510 + * If there are early callbacks, they will need 2511 + * to be moved to the nocb lists. 2512 + */ 2513 + WARN_ON_ONCE(rdp->nxttail[RCU_NEXT_TAIL] != 2514 + &rdp->nxtlist && 2515 + rdp->nxttail[RCU_NEXT_TAIL] != NULL); 2516 + init_nocb_callback_list(rdp); 2517 + } 2518 + rcu_organize_nocb_kthreads(rsp); 2519 + } 2441 2520 } 2442 2521 2443 2522 /* Initialize per-rcu_data variables for no-CBs CPUs. */ ··· 2516 2459 rdp->nocb_follower_tail = &rdp->nocb_follower_head; 2517 2460 } 2518 2461 2462 + /* 2463 + * If the specified CPU is a no-CBs CPU that does not already have its 2464 + * rcuo kthread for the specified RCU flavor, spawn it. If the CPUs are 2465 + * brought online out of order, this can require re-organizing the 2466 + * leader-follower relationships. 2467 + */ 2468 + static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu) 2469 + { 2470 + struct rcu_data *rdp; 2471 + struct rcu_data *rdp_last; 2472 + struct rcu_data *rdp_old_leader; 2473 + struct rcu_data *rdp_spawn = per_cpu_ptr(rsp->rda, cpu); 2474 + struct task_struct *t; 2475 + 2476 + /* 2477 + * If this isn't a no-CBs CPU or if it already has an rcuo kthread, 2478 + * then nothing to do. 2479 + */ 2480 + if (!rcu_is_nocb_cpu(cpu) || rdp_spawn->nocb_kthread) 2481 + return; 2482 + 2483 + /* If we didn't spawn the leader first, reorganize! 
*/ 2484 + rdp_old_leader = rdp_spawn->nocb_leader; 2485 + if (rdp_old_leader != rdp_spawn && !rdp_old_leader->nocb_kthread) { 2486 + rdp_last = NULL; 2487 + rdp = rdp_old_leader; 2488 + do { 2489 + rdp->nocb_leader = rdp_spawn; 2490 + if (rdp_last && rdp != rdp_spawn) 2491 + rdp_last->nocb_next_follower = rdp; 2492 + rdp_last = rdp; 2493 + rdp = rdp->nocb_next_follower; 2494 + rdp_last->nocb_next_follower = NULL; 2495 + } while (rdp); 2496 + rdp_spawn->nocb_next_follower = rdp_old_leader; 2497 + } 2498 + 2499 + /* Spawn the kthread for this CPU and RCU flavor. */ 2500 + t = kthread_run(rcu_nocb_kthread, rdp_spawn, 2501 + "rcuo%c/%d", rsp->abbr, cpu); 2502 + BUG_ON(IS_ERR(t)); 2503 + ACCESS_ONCE(rdp_spawn->nocb_kthread) = t; 2504 + } 2505 + 2506 + /* 2507 + * If the specified CPU is a no-CBs CPU that does not already have its 2508 + * rcuo kthreads, spawn them. 2509 + */ 2510 + static void rcu_spawn_all_nocb_kthreads(int cpu) 2511 + { 2512 + struct rcu_state *rsp; 2513 + 2514 + if (rcu_scheduler_fully_active) 2515 + for_each_rcu_flavor(rsp) 2516 + rcu_spawn_one_nocb_kthread(rsp, cpu); 2517 + } 2518 + 2519 + /* 2520 + * Once the scheduler is running, spawn rcuo kthreads for all online 2521 + * no-CBs CPUs. This assumes that the early_initcall()s happen before 2522 + * non-boot CPUs come online -- if this changes, we will need to add 2523 + * some mutual exclusion. 2524 + */ 2525 + static void __init rcu_spawn_nocb_kthreads(void) 2526 + { 2527 + int cpu; 2528 + 2529 + for_each_online_cpu(cpu) 2530 + rcu_spawn_all_nocb_kthreads(cpu); 2531 + } 2532 + 2519 2533 /* How many follower CPU IDs per leader? Default of -1 for sqrt(nr_cpu_ids). */ 2520 2534 static int rcu_nocb_leader_stride = -1; 2521 2535 module_param(rcu_nocb_leader_stride, int, 0444); 2522 2536 2523 2537 /* 2524 - * Create a kthread for each RCU flavor for each no-CBs CPU. 2538 + * Initialize leader-follower relationships for all no-CBs CPUs. 
2526 2539 */ 2527 - static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) 2540 + static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) 2528 2541 { 2529 2542 int cpu; 2530 2543 int ls = rcu_nocb_leader_stride; ··· 2602 2475 struct rcu_data *rdp; 2603 2476 struct rcu_data *rdp_leader = NULL; /* Suppress misguided gcc warn. */ 2604 2477 struct rcu_data *rdp_prev = NULL; 2605 - struct task_struct *t; 2606 2478 2607 - if (rcu_nocb_mask == NULL) 2479 + if (!have_rcu_nocb_mask) 2608 2480 return; 2609 - #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) 2610 - if (tick_nohz_full_running) 2611 - cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask); 2612 - #endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */ 2613 2481 if (ls == -1) { 2614 2482 ls = int_sqrt(nr_cpu_ids); 2615 2483 rcu_nocb_leader_stride = ls; ··· 2627 2505 rdp_prev->nocb_next_follower = rdp; 2628 2506 } 2629 2507 rdp_prev = rdp; 2630 - 2631 - /* Spawn the kthread for this CPU. 
*/ 2632 - t = kthread_run(rcu_nocb_kthread, rdp, 2633 - "rcuo%c/%d", rsp->abbr, cpu); 2634 - BUG_ON(IS_ERR(t)); 2635 - ACCESS_ONCE(rdp->nocb_kthread) = t; 2636 2508 } 2637 2509 } 2638 2510 2639 2511 /* Prevent __call_rcu() from enqueuing callbacks on no-CBs CPUs */ 2640 2512 static bool init_nocb_callback_list(struct rcu_data *rdp) 2641 2513 { 2642 - if (rcu_nocb_mask == NULL || 2643 - !cpumask_test_cpu(rdp->cpu, rcu_nocb_mask)) 2514 + if (!rcu_is_nocb_cpu(rdp->cpu)) 2644 2515 return false; 2516 + 2645 2517 rdp->nxttail[RCU_NEXT_TAIL] = NULL; 2646 2518 return true; 2647 2519 } ··· 2657 2541 static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, 2658 2542 bool lazy, unsigned long flags) 2659 2543 { 2660 - return 0; 2544 + return false; 2661 2545 } 2662 2546 2663 2547 static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, 2664 2548 struct rcu_data *rdp, 2665 2549 unsigned long flags) 2666 2550 { 2667 - return 0; 2551 + return false; 2668 2552 } 2669 2553 2670 2554 static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) 2671 2555 { 2672 2556 } 2673 2557 2674 - static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) 2558 + static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp) 2675 2559 { 2676 2560 return false; 2677 2561 } ··· 2680 2564 { 2681 2565 } 2682 2566 2683 - static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp) 2567 + static void rcu_spawn_all_nocb_kthreads(int cpu) 2568 + { 2569 + } 2570 + 2571 + static void __init rcu_spawn_nocb_kthreads(void) 2684 2572 { 2685 2573 } 2686 2574 ··· 2715 2595 2716 2596 #ifdef CONFIG_NO_HZ_FULL_SYSIDLE 2717 2597 2718 - /* 2719 - * Define RCU flavor that holds sysidle state. This needs to be the 2720 - * most active flavor of RCU. 
2721 - */ 2722 - #ifdef CONFIG_PREEMPT_RCU 2723 - static struct rcu_state *rcu_sysidle_state = &rcu_preempt_state; 2724 - #else /* #ifdef CONFIG_PREEMPT_RCU */ 2725 - static struct rcu_state *rcu_sysidle_state = &rcu_sched_state; 2726 - #endif /* #else #ifdef CONFIG_PREEMPT_RCU */ 2727 - 2728 2598 static int full_sysidle_state; /* Current system-idle state. */ 2729 2599 #define RCU_SYSIDLE_NOT 0 /* Some CPU is not idle. */ 2730 2600 #define RCU_SYSIDLE_SHORT 1 /* All CPUs idle for brief period. */ ··· 2731 2621 static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq) 2732 2622 { 2733 2623 unsigned long j; 2624 + 2625 + /* If there are no nohz_full= CPUs, no need to track this. */ 2626 + if (!tick_nohz_full_enabled()) 2627 + return; 2734 2628 2735 2629 /* Adjust nesting, check for fully idle. */ 2736 2630 if (irq) { ··· 2801 2687 */ 2802 2688 static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq) 2803 2689 { 2690 + /* If there are no nohz_full= CPUs, no need to track this. */ 2691 + if (!tick_nohz_full_enabled()) 2692 + return; 2693 + 2804 2694 /* Adjust nesting, check for already non-idle. */ 2805 2695 if (irq) { 2806 2696 rdtp->dynticks_idle_nesting++; ··· 2859 2741 unsigned long j; 2860 2742 struct rcu_dynticks *rdtp = rdp->dynticks; 2861 2743 2744 + /* If there are no nohz_full= CPUs, don't check system-wide idleness. */ 2745 + if (!tick_nohz_full_enabled()) 2746 + return; 2747 + 2862 2748 /* 2863 2749 * If some other CPU has already reported non-idle, if this is 2864 2750 * not the flavor of RCU that tracks sysidle state, or if this 2865 2751 * is an offline or the timekeeping CPU, nothing to do. 
2866 2752 */ 2867 - if (!*isidle || rdp->rsp != rcu_sysidle_state || 2753 + if (!*isidle || rdp->rsp != rcu_state_p || 2868 2754 cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) 2869 2755 return; 2870 2756 if (rcu_gp_in_progress(rdp->rsp)) ··· 2894 2772 */ 2895 2773 static bool is_sysidle_rcu_state(struct rcu_state *rsp) 2896 2774 { 2897 - return rsp == rcu_sysidle_state; 2775 + return rsp == rcu_state_p; 2898 2776 } 2899 2777 2900 2778 /* ··· 2972 2850 static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, 2973 2851 unsigned long maxj, bool gpkt) 2974 2852 { 2975 - if (rsp != rcu_sysidle_state) 2853 + if (rsp != rcu_state_p) 2976 2854 return; /* Wrong flavor, ignore. */ 2977 2855 if (gpkt && nr_cpu_ids <= CONFIG_NO_HZ_FULL_SYSIDLE_SMALL) 2978 2856 return; /* Running state machine from timekeeping CPU. */ ··· 2989 2867 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, 2990 2868 unsigned long maxj) 2991 2869 { 2870 + /* If there are no nohz_full= CPUs, no need to track this. */ 2871 + if (!tick_nohz_full_enabled()) 2872 + return; 2873 + 2992 2874 rcu_sysidle_report(rsp, isidle, maxj, true); 2993 2875 } 2994 2876 ··· 3019 2893 3020 2894 /* 3021 2895 * Check to see if the system is fully idle, other than the timekeeping CPU. 3022 - * The caller must have disabled interrupts. 2896 + * The caller must have disabled interrupts. This is not intended to be 2897 + * called unless tick_nohz_full_enabled(). 3023 2898 */ 3024 2899 bool rcu_sys_is_idle(void) 3025 2900 { ··· 3046 2919 3047 2920 /* Scan all the CPUs looking for nonidle CPUs. 
*/ 3048 2921 for_each_possible_cpu(cpu) { 3049 - rdp = per_cpu_ptr(rcu_sysidle_state->rda, cpu); 2922 + rdp = per_cpu_ptr(rcu_state_p->rda, cpu); 3050 2923 rcu_sysidle_check_cpu(rdp, &isidle, &maxj); 3051 2924 if (!isidle) 3052 2925 break; 3053 2926 } 3054 - rcu_sysidle_report(rcu_sysidle_state, 3055 - isidle, maxj, false); 2927 + rcu_sysidle_report(rcu_state_p, isidle, maxj, false); 3056 2928 oldrss = rss; 3057 2929 rss = ACCESS_ONCE(full_sysidle_state); 3058 2930 } ··· 3078 2952 * provided by the memory allocator. 3079 2953 */ 3080 2954 if (nr_cpu_ids > CONFIG_NO_HZ_FULL_SYSIDLE_SMALL && 3081 - !rcu_gp_in_progress(rcu_sysidle_state) && 2955 + !rcu_gp_in_progress(rcu_state_p) && 3082 2956 !rsh.inuse && xchg(&rsh.inuse, 1) == 0) 3083 2957 call_rcu(&rsh.rh, rcu_sysidle_cb); 3084 2958 return false; ··· 3161 3035 if (!is_housekeeping_cpu(raw_smp_processor_id())) 3162 3036 housekeeping_affine(current); 3163 3037 #endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */ 3038 + } 3039 + 3040 + /* Record the current task on dyntick-idle entry. */ 3041 + static void rcu_dynticks_task_enter(void) 3042 + { 3043 + #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) 3044 + ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id(); 3045 + #endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */ 3046 + } 3047 + 3048 + /* Record no current task on dyntick-idle exit. */ 3049 + static void rcu_dynticks_task_exit(void) 3050 + { 3051 + #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) 3052 + ACCESS_ONCE(current->rcu_tasks_idle_cpu) = -1; 3053 + #endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */ 3164 3054 }
+344 -1
kernel/rcu/update.c
··· 47 47 #include <linux/hardirq.h> 48 48 #include <linux/delay.h> 49 49 #include <linux/module.h> 50 + #include <linux/kthread.h> 51 + #include <linux/tick.h> 50 52 51 53 #define CREATE_TRACE_POINTS 52 54 ··· 93 91 barrier(); /* critical section before exit code. */ 94 92 t->rcu_read_lock_nesting = INT_MIN; 95 93 barrier(); /* assign before ->rcu_read_unlock_special load */ 96 - if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special))) 94 + if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special.s))) 97 95 rcu_read_unlock_special(t); 98 96 barrier(); /* ->rcu_read_unlock_special load before assign */ 99 97 t->rcu_read_lock_nesting = 0; ··· 137 135 current->lockdep_recursion == 0; 138 136 } 139 137 EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled); 138 + 139 + /** 140 + * rcu_read_lock_held() - might we be in RCU read-side critical section? 141 + * 142 + * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU 143 + * read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, 144 + * this assumes we are in an RCU read-side critical section unless it can 145 + * prove otherwise. This is useful for debug checks in functions that 146 + * require that they be called within an RCU read-side critical section. 147 + * 148 + * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot 149 + * and while lockdep is disabled. 150 + * 151 + * Note that rcu_read_lock() and the matching rcu_read_unlock() must 152 + * occur in the same context, for example, it is illegal to invoke 153 + * rcu_read_unlock() in process context if the matching rcu_read_lock() 154 + * was invoked from within an irq handler. 155 + * 156 + * Note that rcu_read_lock() is disallowed if the CPU is either idle or 157 + * offline from an RCU perspective, so check for those as well. 
158 + */ 159 + int rcu_read_lock_held(void) 160 + { 161 + if (!debug_lockdep_rcu_enabled()) 162 + return 1; 163 + if (!rcu_is_watching()) 164 + return 0; 165 + if (!rcu_lockdep_current_cpu_online()) 166 + return 0; 167 + return lock_is_held(&rcu_lock_map); 168 + } 169 + EXPORT_SYMBOL_GPL(rcu_read_lock_held); 140 170 141 171 /** 142 172 * rcu_read_lock_bh_held() - might we be in RCU-bh read-side critical section? ··· 381 347 early_initcall(check_cpu_stall_init); 382 348 383 349 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ 350 + 351 + #ifdef CONFIG_TASKS_RCU 352 + 353 + /* 354 + * Simple variant of RCU whose quiescent states are voluntary context switch, 355 + * user-space execution, and idle. As such, grace periods can take one good 356 + * long time. There are no read-side primitives similar to rcu_read_lock() 357 + * and rcu_read_unlock() because this implementation is intended to get 358 + * the system into a safe state for some of the manipulations involved in 359 + * tracing and the like. Finally, this implementation does not support 360 + * high call_rcu_tasks() rates from multiple CPUs. If this is required, 361 + * per-CPU callback lists will be needed. 362 + */ 363 + 364 + /* Global list of callbacks and associated lock. */ 365 + static struct rcu_head *rcu_tasks_cbs_head; 366 + static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head; 367 + static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq); 368 + static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock); 369 + 370 + /* Track exiting tasks in order to allow them to be waited for. */ 371 + DEFINE_SRCU(tasks_rcu_exit_srcu); 372 + 373 + /* Control stall timeouts. Disable with <= 0, otherwise jiffies till stall. */ 374 + static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10; 375 + module_param(rcu_task_stall_timeout, int, 0644); 376 + 377 + static void rcu_spawn_tasks_kthread(void); 378 + 379 + /* 380 + * Post an RCU-tasks callback. 
First call must be from process context 381 + * after the scheduler is fully operational. 382 + */ 383 + void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp)) 384 + { 385 + unsigned long flags; 386 + bool needwake; 387 + 388 + rhp->next = NULL; 389 + rhp->func = func; 390 + raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags); 391 + needwake = !rcu_tasks_cbs_head; 392 + *rcu_tasks_cbs_tail = rhp; 393 + rcu_tasks_cbs_tail = &rhp->next; 394 + raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); 395 + if (needwake) { 396 + rcu_spawn_tasks_kthread(); 397 + wake_up(&rcu_tasks_cbs_wq); 398 + } 399 + } 400 + EXPORT_SYMBOL_GPL(call_rcu_tasks); 401 + 402 + /** 403 + * synchronize_rcu_tasks - wait until an rcu-tasks grace period has elapsed. 404 + * 405 + * Control will return to the caller some time after a full rcu-tasks 406 + * grace period has elapsed, in other words after all currently 407 + * executing rcu-tasks read-side critical sections have elapsed. These 408 + * read-side critical sections are delimited by calls to schedule(), 409 + * cond_resched_rcu_qs(), idle execution, userspace execution, calls 410 + * to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched(). 411 + * 412 + * This is a very specialized primitive, intended only for a few uses in 413 + * tracing and other situations requiring manipulation of function 414 + * preambles and profiling hooks. The synchronize_rcu_tasks() function 415 + * is not (yet) intended for heavy use from multiple CPUs. 416 + * 417 + * Note that this guarantee implies further memory-ordering guarantees. 418 + * On systems with more than one CPU, when synchronize_rcu_tasks() returns, 419 + * each CPU is guaranteed to have executed a full memory barrier since the 420 + * end of its last RCU-tasks read-side critical section whose beginning 421 + * preceded the call to synchronize_rcu_tasks(). 
In addition, each CPU 422 + * having an RCU-tasks read-side critical section that extends beyond 423 + * the return from synchronize_rcu_tasks() is guaranteed to have executed 424 + * a full memory barrier after the beginning of synchronize_rcu_tasks() 425 + * and before the beginning of that RCU-tasks read-side critical section. 426 + * Note that these guarantees include CPUs that are offline, idle, or 427 + * executing in user mode, as well as CPUs that are executing in the kernel. 428 + * 429 + * Furthermore, if CPU A invoked synchronize_rcu_tasks(), which returned 430 + * to its caller on CPU B, then both CPU A and CPU B are guaranteed 431 + * to have executed a full memory barrier during the execution of 432 + * synchronize_rcu_tasks() -- even if CPU A and CPU B are the same CPU 433 + * (but again only if the system has more than one CPU). 434 + */ 435 + void synchronize_rcu_tasks(void) 436 + { 437 + /* Complain if the scheduler has not started. */ 438 + rcu_lockdep_assert(!rcu_scheduler_active, 439 + "synchronize_rcu_tasks called too soon"); 440 + 441 + /* Wait for the grace period. */ 442 + wait_rcu_gp(call_rcu_tasks); 443 + } 444 + EXPORT_SYMBOL_GPL(synchronize_rcu_tasks); 445 + 446 + /** 447 + * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks. 448 + * 449 + * Although the current implementation is guaranteed to wait, it is not 450 + * obligated to, for example, if there are no pending callbacks. 451 + */ 452 + void rcu_barrier_tasks(void) 453 + { 454 + /* There is only one callback queue, so this is easy. ;-) */ 455 + synchronize_rcu_tasks(); 456 + } 457 + EXPORT_SYMBOL_GPL(rcu_barrier_tasks); 458 + 459 + /* See if tasks are still holding out, complain if so. 
*/ 460 + static void check_holdout_task(struct task_struct *t, 461 + bool needreport, bool *firstreport) 462 + { 463 + int cpu; 464 + 465 + if (!ACCESS_ONCE(t->rcu_tasks_holdout) || 466 + t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) || 467 + !ACCESS_ONCE(t->on_rq) || 468 + (IS_ENABLED(CONFIG_NO_HZ_FULL) && 469 + !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) { 470 + ACCESS_ONCE(t->rcu_tasks_holdout) = false; 471 + list_del_init(&t->rcu_tasks_holdout_list); 472 + put_task_struct(t); 473 + return; 474 + } 475 + if (!needreport) 476 + return; 477 + if (*firstreport) { 478 + pr_err("INFO: rcu_tasks detected stalls on tasks:\n"); 479 + *firstreport = false; 480 + } 481 + cpu = task_cpu(t); 482 + pr_alert("%p: %c%c nvcsw: %lu/%lu holdout: %d idle_cpu: %d/%d\n", 483 + t, ".I"[is_idle_task(t)], 484 + "N."[cpu < 0 || !tick_nohz_full_cpu(cpu)], 485 + t->rcu_tasks_nvcsw, t->nvcsw, t->rcu_tasks_holdout, 486 + t->rcu_tasks_idle_cpu, cpu); 487 + sched_show_task(t); 488 + } 489 + 490 + /* RCU-tasks kthread that detects grace periods and invokes callbacks. */ 491 + static int __noreturn rcu_tasks_kthread(void *arg) 492 + { 493 + unsigned long flags; 494 + struct task_struct *g, *t; 495 + unsigned long lastreport; 496 + struct rcu_head *list; 497 + struct rcu_head *next; 498 + LIST_HEAD(rcu_tasks_holdouts); 499 + 500 + /* FIXME: Add housekeeping affinity. */ 501 + 502 + /* 503 + * Each pass through the following loop makes one check for 504 + * newly arrived callbacks, and, if there are some, waits for 505 + * one RCU-tasks grace period and then invokes the callbacks. 506 + * This loop is terminated by the system going down. ;-) 507 + */ 508 + for (;;) { 509 + 510 + /* Pick up any new callbacks. 
*/ 511 + raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags); 512 + list = rcu_tasks_cbs_head; 513 + rcu_tasks_cbs_head = NULL; 514 + rcu_tasks_cbs_tail = &rcu_tasks_cbs_head; 515 + raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags); 516 + 517 + /* If there were none, wait a bit and start over. */ 518 + if (!list) { 519 + wait_event_interruptible(rcu_tasks_cbs_wq, 520 + rcu_tasks_cbs_head); 521 + if (!rcu_tasks_cbs_head) { 522 + WARN_ON(signal_pending(current)); 523 + schedule_timeout_interruptible(HZ/10); 524 + } 525 + continue; 526 + } 527 + 528 + /* 529 + * Wait for all pre-existing t->on_rq and t->nvcsw 530 + * transitions to complete. Invoking synchronize_sched() 531 + * suffices because all these transitions occur with 532 + * interrupts disabled. Without this synchronize_sched(), 533 + * a read-side critical section that started before the 534 + * grace period might be incorrectly seen as having started 535 + * after the grace period. 536 + * 537 + * This synchronize_sched() also dispenses with the 538 + * need for a memory barrier on the first store to 539 + * ->rcu_tasks_holdout, as it forces the store to happen 540 + * after the beginning of the grace period. 541 + */ 542 + synchronize_sched(); 543 + 544 + /* 545 + * There were callbacks, so we need to wait for an 546 + * RCU-tasks grace period. Start off by scanning 547 + * the task list for tasks that are not already 548 + * voluntarily blocked. Mark these tasks and make 549 + * a list of them in rcu_tasks_holdouts. 550 + */ 551 + rcu_read_lock(); 552 + for_each_process_thread(g, t) { 553 + if (t != current && ACCESS_ONCE(t->on_rq) && 554 + !is_idle_task(t)) { 555 + get_task_struct(t); 556 + t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw); 557 + ACCESS_ONCE(t->rcu_tasks_holdout) = true; 558 + list_add(&t->rcu_tasks_holdout_list, 559 + &rcu_tasks_holdouts); 560 + } 561 + } 562 + rcu_read_unlock(); 563 + 564 + /* 565 + * Wait for tasks that are in the process of exiting. 
566 + 		 * This does only part of the job, ensuring that all
567 + 		 * tasks that were previously exiting reach the point
568 + 		 * where they have disabled preemption, allowing the
569 + 		 * later synchronize_sched() to finish the job.
570 + 		 */
571 + 		synchronize_srcu(&tasks_rcu_exit_srcu);
572 +
573 + 		/*
574 + 		 * Each pass through the following loop scans the list
575 + 		 * of holdout tasks, removing any that are no longer
576 + 		 * holdouts. When the list is empty, we are done.
577 + 		 */
578 + 		lastreport = jiffies;
579 + 		while (!list_empty(&rcu_tasks_holdouts)) {
580 + 			bool firstreport;
581 + 			bool needreport;
582 + 			int rtst;
583 + 			struct task_struct *t1;
584 +
585 + 			schedule_timeout_interruptible(HZ);
586 + 			rtst = ACCESS_ONCE(rcu_task_stall_timeout);
587 + 			needreport = rtst > 0 &&
588 + 				     time_after(jiffies, lastreport + rtst);
589 + 			if (needreport)
590 + 				lastreport = jiffies;
591 + 			firstreport = true;
592 + 			WARN_ON(signal_pending(current));
593 + 			list_for_each_entry_safe(t, t1, &rcu_tasks_holdouts,
594 + 						 rcu_tasks_holdout_list) {
595 + 				check_holdout_task(t, needreport, &firstreport);
596 + 				cond_resched();
597 + 			}
598 + 		}
599 +
600 + 		/*
601 + 		 * Because ->on_rq and ->nvcsw are not guaranteed
602 + 		 * to have full memory barriers prior to them in the
603 + 		 * schedule() path, memory reordering on other CPUs could
604 + 		 * cause their RCU-tasks read-side critical sections to
605 + 		 * extend past the end of the grace period. However,
606 + 		 * because these ->nvcsw updates are carried out with
607 + 		 * interrupts disabled, we can use synchronize_sched()
608 + 		 * to force the needed ordering on all such CPUs.
609 + 		 *
610 + 		 * This synchronize_sched() also confines all
611 + 		 * ->rcu_tasks_holdout accesses to be within the grace
612 + 		 * period, avoiding the need for memory barriers for
613 + 		 * ->rcu_tasks_holdout accesses.
614 + 		 *
615 + 		 * In addition, this synchronize_sched() waits for exiting
616 + 		 * tasks to complete their final preempt_disable() region
617 + 		 * of execution, cleaning up after the synchronize_srcu()
618 + 		 * above.
619 + 		 */
620 + 		synchronize_sched();
621 +
622 + 		/* Invoke the callbacks. */
623 + 		while (list) {
624 + 			next = list->next;
625 + 			local_bh_disable();
626 + 			list->func(list);
627 + 			local_bh_enable();
628 + 			list = next;
629 + 			cond_resched();
630 + 		}
631 + 		schedule_timeout_uninterruptible(HZ/10);
632 + 	}
633 + }
634 +
635 + /* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */
636 + static void rcu_spawn_tasks_kthread(void)
637 + {
638 + 	static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
639 + 	static struct task_struct *rcu_tasks_kthread_ptr;
640 + 	struct task_struct *t;
641 +
642 + 	if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
643 + 		smp_mb(); /* Ensure caller sees full kthread. */
644 + 		return;
645 + 	}
646 + 	mutex_lock(&rcu_tasks_kthread_mutex);
647 + 	if (rcu_tasks_kthread_ptr) {
648 + 		mutex_unlock(&rcu_tasks_kthread_mutex);
649 + 		return;
650 + 	}
651 + 	t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
652 + 	BUG_ON(IS_ERR(t));
653 + 	smp_mb(); /* Ensure others see full kthread. */
654 + 	ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
655 + 	mutex_unlock(&rcu_tasks_kthread_mutex);
656 + }
657 +
658 + #endif /* #ifdef CONFIG_TASKS_RCU */
+1 -1
kernel/softirq.c
··· 278 278 	pending >>= softirq_bit;
279 279 	}
280 280
281 - 	rcu_bh_qs(smp_processor_id());
281 + 	rcu_bh_qs();
282 282 	local_irq_disable();
283 283
284 284 	pending = local_softirq_pending();
-9
kernel/sysctl.c
··· 1055 1055 		.child		= key_sysctls,
1056 1056 	},
1057 1057 #endif
1058 - #ifdef CONFIG_RCU_TORTURE_TEST
1059 - 	{
1060 - 		.procname       = "rcutorture_runnable",
1061 - 		.data           = &rcutorture_runnable,
1062 - 		.maxlen         = sizeof(int),
1063 - 		.mode           = 0644,
1064 - 		.proc_handler	= proc_dointvec,
1065 - 	},
1066 - #endif
1067 1058 #ifdef CONFIG_PERF_EVENTS
1068 1059 	/*
1069 1060 	 * User-space scripts rely on the existence of this file
+20 -12
kernel/torture.c
··· 211 211 /*
212 212  * Print online/offline testing statistics.
213 213  */
214 - char *torture_onoff_stats(char *page)
214 + void torture_onoff_stats(void)
215 215 {
216 216 #ifdef CONFIG_HOTPLUG_CPU
217 - 	page += sprintf(page,
218 - 		"onoff: %ld/%ld:%ld/%ld %d,%d:%d,%d %lu:%lu (HZ=%d) ",
219 - 		n_online_successes, n_online_attempts,
220 - 		n_offline_successes, n_offline_attempts,
221 - 		min_online, max_online,
222 - 		min_offline, max_offline,
223 - 		sum_online, sum_offline, HZ);
217 + 	pr_cont("onoff: %ld/%ld:%ld/%ld %d,%d:%d,%d %lu:%lu (HZ=%d) ",
218 + 		n_online_successes, n_online_attempts,
219 + 		n_offline_successes, n_offline_attempts,
220 + 		min_online, max_online,
221 + 		min_offline, max_offline,
222 + 		sum_online, sum_offline, HZ);
224 223 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
225 - 	return page;
226 224 }
227 225 EXPORT_SYMBOL_GPL(torture_onoff_stats);
228 226
··· 633 635  *
634 636  * This must be called before the caller starts shutting down its own
635 637  * kthreads.
638 +  *
639 +  * torture_cleanup_begin() and torture_cleanup_end() must be paired
640 +  * for cleanup to work correctly. They are separate because kthreads
641 +  * may still need to reference torture_type during cleanup, so it is
642 +  * set to NULL only after all other cleanup calls have completed.
636 643  */
637 - bool torture_cleanup(void)
644 + bool torture_cleanup_begin(void)
638 645 {
639 646 	mutex_lock(&fullstop_mutex);
640 647 	if (ACCESS_ONCE(fullstop) == FULLSTOP_SHUTDOWN) {
··· 654 651 	torture_shuffle_cleanup();
655 652 	torture_stutter_cleanup();
656 653 	torture_onoff_cleanup();
654 + 	return false;
655 + }
656 + EXPORT_SYMBOL_GPL(torture_cleanup_begin);
657 +
658 + void torture_cleanup_end(void)
659 + {
657 660 	mutex_lock(&fullstop_mutex);
658 661 	torture_type = NULL;
659 662 	mutex_unlock(&fullstop_mutex);
660 - 	return false;
661 663 }
662 - EXPORT_SYMBOL_GPL(torture_cleanup);
664 + EXPORT_SYMBOL_GPL(torture_cleanup_end);
663 665
664 666 /*
665 667  * Is it time for the current torture test to stop?
+3 -2
kernel/workqueue.c
··· 2043 2043 	 * kernels, where a requeueing work item waiting for something to
2044 2044 	 * happen could deadlock with stop_machine as such work item could
2045 2045 	 * indefinitely requeue itself while all other CPUs are trapped in
2046 - 	 * stop_machine.
2046 + 	 * stop_machine. At the same time, report a quiescent RCU state so
2047 + 	 * the same condition doesn't freeze RCU.
2047 2048 	 */
2048 - 	cond_resched();
2049 + 	cond_resched_rcu_qs();
2049 2050
2050 2051 	spin_lock_irq(&pool->lock);
+1 -1
mm/mlock.c
··· 789 789
790 790 		/* Ignore errors */
791 791 		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
792 - 		cond_resched();
792 + 		cond_resched_rcu_qs();
793 793 	}
794 794 out:
795 795 	return 0;
+2 -2
tools/testing/selftests/rcutorture/bin/config2frag.sh
··· 1 - #!/bin/sh 2 - # Usage: sh config2frag.sh < .config > configfrag 1 + #!/bin/bash 2 + # Usage: config2frag.sh < .config > configfrag 3 3 # 4 4 # Converts the "# CONFIG_XXX is not set" to "CONFIG_XXX=n" so that the 5 5 # resulting file becomes a legitimate Kconfig fragment.
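The conversion that config2frag.sh performs on a .config can be illustrated with a one-line sed sketch; the exact sed expression below is an illustrative assumption, not the script's actual code:

```shell
# Hypothetical sed expression mirroring the documented conversion:
#   "# CONFIG_XXX is not set"  ->  "CONFIG_XXX=n"
echo '# CONFIG_FOO is not set' |
	sed -e 's/^# CONFIG_\(.*\) is not set$/CONFIG_\1=n/'
# prints CONFIG_FOO=n
```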
+2 -2
tools/testing/selftests/rcutorture/bin/configcheck.sh
··· 1 - #!/bin/sh 2 - # Usage: sh configcheck.sh .config .config-template 1 + #!/bin/bash 2 + # Usage: configcheck.sh .config .config-template 3 3 # 4 4 # This program is free software; you can redistribute it and/or modify 5 5 # it under the terms of the GNU General Public License as published by
+2 -2
tools/testing/selftests/rcutorture/bin/configinit.sh
··· 1 - #!/bin/sh 1 + #!/bin/bash 2 2 # 3 - # sh configinit.sh config-spec-file [ build output dir ] 3 + # Usage: configinit.sh config-spec-file [ build output dir ] 4 4 # 5 5 # Create a .config file from the spec file. Run from the kernel source tree. 6 6 # Exits with 0 if all went well, with 1 if all went well but the config
+20
tools/testing/selftests/rcutorture/bin/functions.sh
··· 64 64 	fi
65 65 }
66 66
67 + # configfrag_boot_cpus bootparam-string config-fragment-file config-cpus
68 + #
69 + # Decreases number of CPUs based on any maxcpus= boot parameters specified.
70 + configfrag_boot_cpus () {
71 + 	local bootargs="`configfrag_boot_params "$1" "$2"`"
72 + 	local maxcpus
73 + 	if echo "${bootargs}" | grep -q 'maxcpus=[0-9]'
74 + 	then
75 + 		maxcpus="`echo "${bootargs}" | sed -e 's/^.*maxcpus=\([0-9]*\).*$/\1/'`"
76 + 		if test "$3" -gt "$maxcpus"
77 + 		then
78 + 			echo $maxcpus
79 + 		else
80 + 			echo $3
81 + 		fi
82 + 	else
83 + 		echo $3
84 + 	fi
85 + }
86 +
67 87 # configfrag_hotplug_cpu config-fragment-file
68 88 #
69 89 # Returns 1 if the config fragment specifies hotplug CPU.
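The clamping behavior of the new configfrag_boot_cpus() helper can be exercised standalone. The sketch below inlines the same maxcpus= parsing; clamp_cpus is a hypothetical name, and the real function additionally calls configfrag_boot_params() to merge .boot-file and command-line boot arguments rather than taking the string directly:

```shell
# Standalone sketch of configfrag_boot_cpus()'s clamping logic.
# Args: boot-parameter string, configured CPU count.
clamp_cpus () {
	local bootargs="$1"
	local cpus="$2"
	local maxcpus
	if echo "${bootargs}" | grep -q 'maxcpus=[0-9]'
	then
		# Extract the numeric maxcpus= value, as the real helper does.
		maxcpus="`echo "${bootargs}" | sed -e 's/^.*maxcpus=\([0-9]*\).*$/\1/'`"
		if test "$cpus" -gt "$maxcpus"
		then
			echo $maxcpus
		else
			echo $cpus
		fi
	else
		echo $cpus
	fi
}

clamp_cpus "rcutorture.torture_type=rcu_bh maxcpus=8" 16	# prints 8
clamp_cpus "rcutorture.torture_type=rcu_bh" 16			# prints 16
```

This is why TREE01 below can drop CONFIG_NR_CPUS while adding maxcpus=8 to TREE01.boot: the scripts clamp the qemu CPU count to the boot parameter.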
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-build.sh
··· 2 2 # 3 3 # Build a kvm-ready Linux kernel from the tree in the current directory. 4 4 # 5 - # Usage: sh kvm-build.sh config-template build-dir more-configs 5 + # Usage: kvm-build.sh config-template build-dir more-configs 6 6 # 7 7 # This program is free software; you can redistribute it and/or modify 8 8 # it under the terms of the GNU General Public License as published by
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh
··· 2 2 # 3 3 # Analyze a given results directory for locktorture progress. 4 4 # 5 - # Usage: sh kvm-recheck-lock.sh resdir 5 + # Usage: kvm-recheck-lock.sh resdir 6 6 # 7 7 # This program is free software; you can redistribute it and/or modify 8 8 # it under the terms of the GNU General Public License as published by
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
··· 2 2 # 3 3 # Analyze a given results directory for rcutorture progress. 4 4 # 5 - # Usage: sh kvm-recheck-rcu.sh resdir 5 + # Usage: kvm-recheck-rcu.sh resdir 6 6 # 7 7 # This program is free software; you can redistribute it and/or modify 8 8 # it under the terms of the GNU General Public License as published by
+1 -1
tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
··· 4 4 # check the build and console output for errors. Given a directory 5 5 # containing results directories, this recursively checks them all. 6 6 # 7 - # Usage: sh kvm-recheck.sh resdir ... 7 + # Usage: kvm-recheck.sh resdir ... 8 8 # 9 9 # This program is free software; you can redistribute it and/or modify 10 10 # it under the terms of the GNU General Public License as published by
+3 -2
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
··· 6 6 # Execute this in the source tree. Do not run it as a background task
7 7 # because qemu does not seem to like that much.
8 8 #
9 - # Usage: sh kvm-test-1-run.sh config builddir resdir minutes qemu-args boot_args
9 + # Usage: kvm-test-1-run.sh config builddir resdir minutes qemu-args boot_args
10 10 #
11 11 # qemu-args defaults to "-nographic", along with arguments specifying the
12 12 #			number of CPUs and other options generated from
··· 140 140 # Generate -smp qemu argument.
141 141 qemu_args="-nographic $qemu_args"
142 142 cpu_count=`configNR_CPUS.sh $config_template`
143 + cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"`
143 144 vcpus=`identify_qemu_vcpus`
144 145 if test $cpu_count -gt $vcpus
145 146 then
··· 215 214 	fi
216 215 	if test $kruntime -ge $((seconds + grace))
217 216 	then
218 - 		echo "!!! Hang at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1
217 + 		echo "!!! PID $qemu_pid hung at $kruntime vs. $seconds seconds" >> $resdir/Warnings 2>&1
219 218 		kill -KILL $qemu_pid
220 219 		break
221 220 	fi
+4 -2
tools/testing/selftests/rcutorture/bin/kvm.sh
··· 7 7 # Edit the definitions below to set the locations of the various directories,
8 8 # as well as the test duration.
9 9 #
10 - # Usage: sh kvm.sh [ options ]
10 + # Usage: kvm.sh [ options ]
11 11 #
12 12 # This program is free software; you can redistribute it and/or modify
13 13 # it under the terms of the GNU General Public License as published by
··· 188 188 do
189 189 	if test -f "$CONFIGFRAG/$kversion/$CF"
190 190 	then
191 - 		echo $CF `configNR_CPUS.sh $CONFIGFRAG/$kversion/$CF` >> $T/cfgcpu
191 + 		cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$kversion/$CF`
192 + 		cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$kversion/$CF" "$cpu_count"`
193 + 		echo $CF $cpu_count >> $T/cfgcpu
192 194 	else
193 195 		echo "The --configs file $CF does not exist, terminating."
194 196 		exit 1
+2 -3
tools/testing/selftests/rcutorture/bin/parse-build.sh
··· 1 - #!/bin/sh 1 + #!/bin/bash 2 2 # 3 3 # Check the build output from an rcutorture run for goodness. 4 4 # The "file" is a pathname on the local system, and "title" is ··· 6 6 # 7 7 # The file must contain kernel build output. 8 8 # 9 - # Usage: 10 - # sh parse-build.sh file title 9 + # Usage: parse-build.sh file title 11 10 # 12 11 # This program is free software; you can redistribute it and/or modify 13 12 # it under the terms of the GNU General Public License as published by
+6 -3
tools/testing/selftests/rcutorture/bin/parse-console.sh
··· 1 - #!/bin/sh
1 + #!/bin/bash
2 2 #
3 3 # Check the console output from an rcutorture run for oopses.
4 4 # The "file" is a pathname on the local system, and "title" is
5 5 # a text string for error-message purposes.
6 6 #
7 - # Usage:
8 - #	sh parse-console.sh file title
7 + # Usage: parse-console.sh file title
9 8 #
10 9 # This program is free software; you can redistribute it and/or modify
11 10 # it under the terms of the GNU General Public License as published by
··· 32 33
33 34 . functions.sh
34 35
36 + if grep -Pq '\x00' < $file
37 + then
38 + 	print_warning Console output contains nul bytes, old qemu still running?
39 + fi
35 40 egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:' < $file | grep -v 'ODEBUG: ' | grep -v 'Warning: unable to open an initial console' > $T
36 41 if test -s $T
37 42 then
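The new nul-byte check added to parse-console.sh can be reproduced in isolation; GNU grep's -P (PCRE) mode is assumed, where \x00 matches a literal nul byte (a leftover qemu writing over a console log is one way such bytes appear):

```shell
# Write a log containing an embedded nul byte, then detect it the same
# way the script does. The filename is arbitrary.
printf 'console log\0truncated tail' > console.log.$$
if grep -Pq '\x00' < console.log.$$
then
	echo "nul bytes found"
fi
rm -f console.log.$$
# prints "nul bytes found"
```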
+2 -3
tools/testing/selftests/rcutorture/bin/parse-torture.sh
··· 1 - #!/bin/sh 1 + #!/bin/bash 2 2 # 3 3 # Check the console output from a torture run for goodness. 4 4 # The "file" is a pathname on the local system, and "title" is ··· 7 7 # The file must contain torture output, but can be interspersed 8 8 # with other dmesg text, as in console-log output. 9 9 # 10 - # Usage: 11 - # sh parse-torture.sh file title 10 + # Usage: parse-torture.sh file title 12 11 # 13 12 # This program is free software; you can redistribute it and/or modify 14 13 # it under the terms of the GNU General Public License as published by
+3
tools/testing/selftests/rcutorture/configs/lock/CFLIST
··· 1 1 LOCK01 2 + LOCK02 3 + LOCK03 4 + LOCK04
+6
tools/testing/selftests/rcutorture/configs/lock/LOCK02
··· 1 + CONFIG_SMP=y 2 + CONFIG_NR_CPUS=4 3 + CONFIG_HOTPLUG_CPU=y 4 + CONFIG_PREEMPT_NONE=n 5 + CONFIG_PREEMPT_VOLUNTARY=n 6 + CONFIG_PREEMPT=y
+1
tools/testing/selftests/rcutorture/configs/lock/LOCK02.boot
··· 1 + locktorture.torture_type=mutex_lock
+6
tools/testing/selftests/rcutorture/configs/lock/LOCK03
··· 1 + CONFIG_SMP=y 2 + CONFIG_NR_CPUS=4 3 + CONFIG_HOTPLUG_CPU=y 4 + CONFIG_PREEMPT_NONE=n 5 + CONFIG_PREEMPT_VOLUNTARY=n 6 + CONFIG_PREEMPT=y
+1
tools/testing/selftests/rcutorture/configs/lock/LOCK03.boot
··· 1 + locktorture.torture_type=rwsem_lock
+6
tools/testing/selftests/rcutorture/configs/lock/LOCK04
··· 1 + CONFIG_SMP=y 2 + CONFIG_NR_CPUS=4 3 + CONFIG_HOTPLUG_CPU=y 4 + CONFIG_PREEMPT_NONE=n 5 + CONFIG_PREEMPT_VOLUNTARY=n 6 + CONFIG_PREEMPT=y
+1
tools/testing/selftests/rcutorture/configs/lock/LOCK04.boot
··· 1 + locktorture.torture_type=rw_lock
+1 -1
tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh
··· 38 38 echo $1 `locktorture_param_onoff "$1" "$2"` \ 39 39 locktorture.stat_interval=15 \ 40 40 locktorture.shutdown_secs=$3 \ 41 - locktorture.locktorture_runnable=1 \ 41 + locktorture.torture_runnable=1 \ 42 42 locktorture.verbose=1 43 43 }
+3
tools/testing/selftests/rcutorture/configs/rcu/CFLIST
··· 11 11 SRCU-P 12 12 TINY01 13 13 TINY02 14 + TASKS01 15 + TASKS02 16 + TASKS03
+9
tools/testing/selftests/rcutorture/configs/rcu/TASKS01
··· 1 + CONFIG_SMP=y 2 + CONFIG_NR_CPUS=2 3 + CONFIG_HOTPLUG_CPU=y 4 + CONFIG_PREEMPT_NONE=n 5 + CONFIG_PREEMPT_VOLUNTARY=n 6 + CONFIG_PREEMPT=y 7 + CONFIG_DEBUG_LOCK_ALLOC=y 8 + CONFIG_PROVE_RCU=y 9 + CONFIG_TASKS_RCU=y
+1
tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot
··· 1 + rcutorture.torture_type=tasks
+5
tools/testing/selftests/rcutorture/configs/rcu/TASKS02
··· 1 + CONFIG_SMP=n 2 + CONFIG_PREEMPT_NONE=y 3 + CONFIG_PREEMPT_VOLUNTARY=n 4 + CONFIG_PREEMPT=n 5 + CONFIG_TASKS_RCU=y
+1
tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot
··· 1 + rcutorture.torture_type=tasks
+13
tools/testing/selftests/rcutorture/configs/rcu/TASKS03
··· 1 + CONFIG_SMP=y 2 + CONFIG_NR_CPUS=2 3 + CONFIG_HOTPLUG_CPU=n 4 + CONFIG_SUSPEND=n 5 + CONFIG_HIBERNATION=n 6 + CONFIG_PREEMPT_NONE=n 7 + CONFIG_PREEMPT_VOLUNTARY=n 8 + CONFIG_PREEMPT=y 9 + CONFIG_TASKS_RCU=y 10 + CONFIG_HZ_PERIODIC=n 11 + CONFIG_NO_HZ_IDLE=n 12 + CONFIG_NO_HZ_FULL=y 13 + CONFIG_NO_HZ_FULL_ALL=y
+1
tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot
··· 1 + rcutorture.torture_type=tasks
+1 -3
tools/testing/selftests/rcutorture/configs/rcu/TREE01
··· 1 1 CONFIG_SMP=y
2 - CONFIG_NR_CPUS=8
3 2 CONFIG_PREEMPT_NONE=n
4 3 CONFIG_PREEMPT_VOLUNTARY=n
5 4 CONFIG_PREEMPT=y
··· 9 10 CONFIG_RCU_FAST_NO_HZ=y
10 11 CONFIG_RCU_TRACE=y
11 12 CONFIG_HOTPLUG_CPU=y
12 - CONFIG_RCU_FANOUT=8
13 - CONFIG_RCU_FANOUT_EXACT=n
13 + CONFIG_MAXSMP=y
14 14 CONFIG_RCU_NOCB_CPU=y
15 15 CONFIG_RCU_NOCB_CPU_ZERO=y
16 16 CONFIG_DEBUG_LOCK_ALLOC=n
+1 -1
tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
··· 1 - rcutorture.torture_type=rcu_bh 1 + rcutorture.torture_type=rcu_bh maxcpus=8
+2 -1
tools/testing/selftests/rcutorture/configs/rcu/TREE07
··· 1 1 CONFIG_SMP=y
2 2 CONFIG_NR_CPUS=16
3 + CONFIG_CPUMASK_OFFSTACK=y
3 4 CONFIG_PREEMPT_NONE=y
4 5 CONFIG_PREEMPT_VOLUNTARY=n
5 6 CONFIG_PREEMPT=n
··· 8 7 CONFIG_HZ_PERIODIC=n
9 8 CONFIG_NO_HZ_IDLE=n
10 9 CONFIG_NO_HZ_FULL=y
11 - CONFIG_NO_HZ_FULL_ALL=y
10 + CONFIG_NO_HZ_FULL_ALL=n
12 11 CONFIG_NO_HZ_FULL_SYSIDLE=y
13 12 CONFIG_RCU_FAST_NO_HZ=n
14 13 CONFIG_RCU_TRACE=y
+1
tools/testing/selftests/rcutorture/configs/rcu/TREE07.boot
··· 1 + nohz_full=2-9
+1 -1
tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh
··· 51 51 `rcutorture_param_n_barrier_cbs "$1"` \ 52 52 rcutorture.stat_interval=15 \ 53 53 rcutorture.shutdown_secs=$3 \ 54 - rcutorture.rcutorture_runnable=1 \ 54 + rcutorture.torture_runnable=1 \ 55 55 rcutorture.test_no_idle_hz=1 \ 56 56 rcutorture.verbose=1 57 57 }
+1
tools/testing/selftests/rcutorture/doc/initrd.txt
··· 6 6 That said, here are the commands: 7 7 8 8 ------------------------------------------------------------------------ 9 + cd tools/testing/selftests/rcutorture 9 10 zcat /initrd.img > /tmp/initrd.img.zcat 10 11 mkdir initrd 11 12 cd initrd