Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs: locking: convert docs to ReST and rename to *.rst

Convert the locking documents to ReST and add them to the
kernel development book, where they belong.

Most of the changes here are just to make Sphinx properly
parse the text files; they're already in good shape and
don't require massive changes in order to be parsed.

The conversion is actually:
- add blank lines and indentation in order to identify paragraphs;
- fix table markups;
- add some list markups;
- mark literal blocks;
- adjust title markups.
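
As an illustration of the kinds of changes listed above, here is a
small ReST sketch (patterns assembled from the hunks in this patch,
not a verbatim excerpt):

```rst
.. A literal block is marked by ending the intro line with "::"
   and indenting the quoted text:

A contrived example::

    modprobe/2287 is trying to acquire lock:

.. An ASCII-art table becomes a ReST grid table:

+--------------+-------------+--------------+
|              | irq enabled | irq disabled |
+--------------+-------------+--------------+
| ever in irq  |      ?      |      -       |
+--------------+-------------+--------------+

.. Title markup is adjusted to Sphinx-style adornment lines:

===============
Lock Statistics
===============
```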

At its new index.rst, let's add an :orphan: tag while it is not linked
to the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>

+511 -387
+1 -1
Documentation/kernel-hacking/locking.rst
··· 1364 1364 Further reading 1365 1365 =============== 1366 1366 1367 - - ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking 1367 + - ``Documentation/locking/spinlocks.rst``: Linus Torvalds' spinlocking 1368 1368 tutorial in the kernel sources. 1369 1369 1370 1370 - Unix Systems for Modern Architectures: Symmetric Multiprocessing and
+24
Documentation/locking/index.rst
··· 1 + :orphan: 2 + 3 + ======= 4 + locking 5 + ======= 6 + 7 + .. toctree:: 8 + :maxdepth: 1 9 + 10 + lockdep-design 11 + lockstat 12 + locktorture 13 + mutex-design 14 + rt-mutex-design 15 + rt-mutex 16 + spinlocks 17 + ww-mutex-design 18 + 19 + .. only:: subproject and html 20 + 21 + Indices 22 + ======= 23 + 24 + * :ref:`genindex`
+28 -23
Documentation/locking/lockdep-design.txt Documentation/locking/lockdep-design.rst
··· 2 2 ===================================== 3 3 4 4 started by Ingo Molnar <mingo@redhat.com> 5 + 5 6 additions by Arjan van de Ven <arjan@linux.intel.com> 6 7 7 8 Lock-class ··· 57 56 58 57 When locking rules are violated, these usage bits are presented in the 59 58 locking error messages, inside curlies, with a total of 2 * n STATEs bits. 60 - A contrived example: 59 + A contrived example:: 61 60 62 61 modprobe/2287 is trying to acquire lock: 63 62 (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 ··· 71 70 above respectively, and the character displayed at each bit position 72 71 indicates: 73 72 73 + === =================================================== 74 74 '.' acquired while irqs disabled and not in irq context 75 75 '-' acquired in irq context 76 76 '+' acquired with irqs enabled 77 77 '?' acquired in irq context with irqs enabled. 78 + === =================================================== 78 79 79 - The bits are illustrated with an example: 80 + The bits are illustrated with an example:: 80 81 81 82 (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 82 83 |||| ··· 93 90 shown in the table below. The bit character is able to indicate which 94 91 exact case is for the lock as of the reporting time. 95 92 96 - ------------------------------------------- 93 + +--------------+-------------+--------------+ 97 94 | | irq enabled | irq disabled | 98 - |-------------------------------------------| 95 + +--------------+-------------+--------------+ 99 96 | ever in irq | ? | - | 100 - |-------------------------------------------| 97 + +--------------+-------------+--------------+ 101 98 | never in irq | + | . | 102 - ------------------------------------------- 99 + +--------------+-------------+--------------+ 103 100 104 101 The character '-' suggests irq is disabled because if otherwise the 105 102 charactor '?' would have been shown instead. 
Similar deduction can be ··· 116 113 117 114 A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The 118 115 following states must be exclusive: only one of them is allowed to be set 119 - for any lock-class based on its usage: 116 + for any lock-class based on its usage:: 120 117 121 118 <hardirq-safe> or <hardirq-unsafe> 122 119 <softirq-safe> or <softirq-unsafe> ··· 137 134 The same lock-class must not be acquired twice, because this could lead 138 135 to lock recursion deadlocks. 139 136 140 - Furthermore, two locks can not be taken in inverse order: 137 + Furthermore, two locks can not be taken in inverse order:: 141 138 142 139 <L1> -> <L2> 143 140 <L2> -> <L1> ··· 151 148 acquired in a circular fashion. 152 149 153 150 Furthermore, the following usage based lock dependencies are not allowed 154 - between any two lock-classes: 151 + between any two lock-classes:: 155 152 156 153 <hardirq-safe> -> <hardirq-unsafe> 157 154 <softirq-safe> -> <softirq-unsafe> ··· 207 204 In order to teach the validator about this correct usage model, new 208 205 versions of the various locking primitives were added that allow you to 209 206 specify a "nesting level". An example call, for the block device mutex, 210 - looks like this: 207 + looks like this:: 211 208 212 - enum bdev_bd_mutex_lock_class 213 - { 209 + enum bdev_bd_mutex_lock_class 210 + { 214 211 BD_MUTEX_NORMAL, 215 212 BD_MUTEX_WHOLE, 216 213 BD_MUTEX_PARTITION 217 - }; 214 + }; 218 215 219 - mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION); 216 + mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION); 220 217 221 218 In this case the locking is done on a bdev object that is known to be a 222 219 partition. ··· 237 234 As the name suggests, lockdep_assert_held* family of macros assert that a 238 235 particular lock is held at a certain time (and generate a WARN() otherwise). 239 236 This annotation is largely used all over the kernel, e.g. 
kernel/sched/ 240 - core.c 237 + core.c:: 241 238 242 239 void update_rq_clock(struct rq *rq) 243 240 { ··· 256 253 layer assumes a lock remains taken, but a lower layer thinks it can maybe drop 257 254 and reacquire the lock ("unwittingly" introducing races). lockdep_pin_lock() 258 255 returns a 'struct pin_cookie' that is then used by lockdep_unpin_lock() to check 259 - that nobody tampered with the lock, e.g. kernel/sched/sched.h 256 + that nobody tampered with the lock, e.g. kernel/sched/sched.h:: 260 257 261 258 static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf) 262 259 { ··· 283 280 locking sequence that occurred at least once during the lifetime of the 284 281 kernel, the validator proves it with a 100% certainty that no 285 282 combination and timing of these locking sequences can cause any class of 286 - lock related deadlock. [*] 283 + lock related deadlock. [1]_ 287 284 288 285 I.e. complex multi-CPU and multi-task locking scenarios do not have to 289 286 occur in practice to prove a deadlock: only the simple 'component' ··· 302 299 every possible hardirq and softirq nesting scenario (which is impossible 303 300 to do in practice). 304 301 305 - [*] assuming that the validator itself is 100% correct, and no other 302 + .. [1] 303 + 304 + assuming that the validator itself is 100% correct, and no other 306 305 part of the system corrupts the state of the validator in any way. 307 306 We also assume that all NMI/SMM paths [which could interrupt 308 307 even hardirq-disabled codepaths] are correct and do not interfere ··· 315 310 Performance: 316 311 ------------ 317 312 318 - The above rules require _massive_ amounts of runtime checking. If we did 313 + The above rules require **massive** amounts of runtime checking. If we did 319 314 that for every lock taken and for every irqs-enable event, it would 320 315 render the system practically unusably slow. 
The complexity of checking 321 316 is O(N^2), so even with just a few hundred lock-classes we'd have to do ··· 374 369 375 370 Of course, if you do run out of lock classes, the next thing to do is 376 371 to find the offending lock classes. First, the following command gives 377 - you the number of lock classes currently in use along with the maximum: 372 + you the number of lock classes currently in use along with the maximum:: 378 373 379 374 grep "lock-classes" /proc/lockdep_stats 380 375 381 - This command produces the following output on a modest system: 376 + This command produces the following output on a modest system:: 382 377 383 - lock-classes: 748 [max: 8191] 378 + lock-classes: 748 [max: 8191] 384 379 385 380 If the number allocated (748 above) increases continually over time, 386 381 then there is likely a leak. The following command can be used to 387 - identify the leaking lock classes: 382 + identify the leaking lock classes:: 388 383 389 384 grep "BD" /proc/lockdep 390 385
+204
Documentation/locking/lockstat.rst
··· 1 + =============== 2 + Lock Statistics 3 + =============== 4 + 5 + What 6 + ==== 7 + 8 + As the name suggests, it provides statistics on locks. 9 + 10 + 11 + Why 12 + === 13 + 14 + Because things like lock contention can severely impact performance. 15 + 16 + How 17 + === 18 + 19 + Lockdep already has hooks in the lock functions and maps lock instances to 20 + lock classes. We build on that (see Documentation/locking/lockdep-design.rst). 21 + The graph below shows the relation between the lock functions and the various 22 + hooks therein:: 23 + 24 + __acquire 25 + | 26 + lock _____ 27 + | \ 28 + | __contended 29 + | | 30 + | <wait> 31 + | _______/ 32 + |/ 33 + | 34 + __acquired 35 + | 36 + . 37 + <hold> 38 + . 39 + | 40 + __release 41 + | 42 + unlock 43 + 44 + lock, unlock - the regular lock functions 45 + __* - the hooks 46 + <> - states 47 + 48 + With these hooks we provide the following statistics: 49 + 50 + con-bounces 51 + - number of lock contention that involved x-cpu data 52 + contentions 53 + - number of lock acquisitions that had to wait 54 + wait time 55 + min 56 + - shortest (non-0) time we ever had to wait for a lock 57 + max 58 + - longest time we ever had to wait for a lock 59 + total 60 + - total time we spend waiting on this lock 61 + avg 62 + - average time spent waiting on this lock 63 + acq-bounces 64 + - number of lock acquisitions that involved x-cpu data 65 + acquisitions 66 + - number of times we took the lock 67 + hold time 68 + min 69 + - shortest (non-0) time we ever held the lock 70 + max 71 + - longest time we ever held the lock 72 + total 73 + - total time this lock was held 74 + avg 75 + - average time this lock was held 76 + 77 + These numbers are gathered per lock class, per read/write state (when 78 + applicable). 79 + 80 + It also tracks 4 contention points per class. A contention point is a call site 81 + that had to wait on lock acquisition. 
82 + 83 + Configuration 84 + ------------- 85 + 86 + Lock statistics are enabled via CONFIG_LOCK_STAT. 87 + 88 + Usage 89 + ----- 90 + 91 + Enable collection of statistics:: 92 + 93 + # echo 1 >/proc/sys/kernel/lock_stat 94 + 95 + Disable collection of statistics:: 96 + 97 + # echo 0 >/proc/sys/kernel/lock_stat 98 + 99 + Look at the current lock statistics:: 100 + 101 + ( line numbers not part of actual output, done for clarity in the explanation 102 + below ) 103 + 104 + # less /proc/lock_stat 105 + 106 + 01 lock_stat version 0.4 107 + 02----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 108 + 03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg 109 + 04----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 110 + 05 111 + 06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99 112 + 07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77 113 + 08 --------------- 114 + 09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280 115 + 10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 116 + 11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 117 + 12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80 118 + 13 --------------- 119 + 14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0 120 + 15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250 121 + 16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 122 + 17 &mm->mmap_sem 68 
[<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 123 + 18 124 + 19............................................................................................................................................................................................................................. 125 + 20 126 + 21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48 127 + 22 --------------- 128 + 23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 129 + 24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 130 + 25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 131 + 26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 132 + 27 --------------- 133 + 28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 134 + 29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 135 + 30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 136 + 31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 137 + 138 + 139 + This excerpt shows the first two lock class statistics. Line 01 shows the 140 + output version - each time the format changes this will be updated. Line 02-04 141 + show the header with column descriptions. Lines 05-18 and 20-31 show the actual 142 + statistics. These statistics come in two parts; the actual stats separated by a 143 + short separator (line 08, 13) from the contention points. 144 + 145 + Lines 09-12 show the first 4 recorded contention points (the code 146 + which tries to get the lock) and lines 14-17 show the first 4 recorded 147 + contended points (the lock holder). It is possible that the max 148 + con-bounces point is missing in the statistics. 149 + 150 + The first lock (05-18) is a read/write lock, and shows two lines above the 151 + short separator. The contention points don't match the column descriptors, 152 + they have two: contentions and [<IP>] symbol. 
The second set of contention 153 + points are the points we're contending with. 154 + 155 + The integer part of the time values is in us. 156 + 157 + Dealing with nested locks, subclasses may appear:: 158 + 159 + 32........................................................................................................................................................................................................................... 160 + 33 161 + 34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82 162 + 35 --------- 163 + 36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 164 + 37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a 165 + 38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a 166 + 39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb 167 + 40 --------- 168 + 41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 169 + 42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a 170 + 43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 171 + 44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8 172 + 45 173 + 46........................................................................................................................................................................................................................... 
174 + 47 175 + 48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84 176 + 49 ----------- 177 + 50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 178 + 51 ----------- 179 + 52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 180 + 53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8 181 + 54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 182 + 55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a 183 + 184 + Line 48 shows statistics for the second subclass (/1) of &rq->lock class 185 + (subclass starts from 0), since in this case, as line 50 suggests, 186 + double_rq_lock actually acquires a nested lock of two spinlocks. 187 + 188 + View the top contending locks:: 189 + 190 + # grep : /proc/lock_stat | head 191 + clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71 192 + tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59 193 + &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59 194 + &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69 195 + &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93 196 + tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72 197 + tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89 198 + rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89 199 + &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27 200 + rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76 201 + 202 + Clear the statistics:: 203 + 204 
+ # echo 0 > /proc/lock_stat
-183
Documentation/locking/lockstat.txt
··· 1 - 2 - LOCK STATISTICS 3 - 4 - - WHAT 5 - 6 - As the name suggests, it provides statistics on locks. 7 - 8 - - WHY 9 - 10 - Because things like lock contention can severely impact performance. 11 - 12 - - HOW 13 - 14 - Lockdep already has hooks in the lock functions and maps lock instances to 15 - lock classes. We build on that (see Documentation/locking/lockdep-design.txt). 16 - The graph below shows the relation between the lock functions and the various 17 - hooks therein. 18 - 19 - __acquire 20 - | 21 - lock _____ 22 - | \ 23 - | __contended 24 - | | 25 - | <wait> 26 - | _______/ 27 - |/ 28 - | 29 - __acquired 30 - | 31 - . 32 - <hold> 33 - . 34 - | 35 - __release 36 - | 37 - unlock 38 - 39 - lock, unlock - the regular lock functions 40 - __* - the hooks 41 - <> - states 42 - 43 - With these hooks we provide the following statistics: 44 - 45 - con-bounces - number of lock contention that involved x-cpu data 46 - contentions - number of lock acquisitions that had to wait 47 - wait time min - shortest (non-0) time we ever had to wait for a lock 48 - max - longest time we ever had to wait for a lock 49 - total - total time we spend waiting on this lock 50 - avg - average time spent waiting on this lock 51 - acq-bounces - number of lock acquisitions that involved x-cpu data 52 - acquisitions - number of times we took the lock 53 - hold time min - shortest (non-0) time we ever held the lock 54 - max - longest time we ever held the lock 55 - total - total time this lock was held 56 - avg - average time this lock was held 57 - 58 - These numbers are gathered per lock class, per read/write state (when 59 - applicable). 60 - 61 - It also tracks 4 contention points per class. A contention point is a call site 62 - that had to wait on lock acquisition. 63 - 64 - - CONFIGURATION 65 - 66 - Lock statistics are enabled via CONFIG_LOCK_STAT. 
67 - 68 - - USAGE 69 - 70 - Enable collection of statistics: 71 - 72 - # echo 1 >/proc/sys/kernel/lock_stat 73 - 74 - Disable collection of statistics: 75 - 76 - # echo 0 >/proc/sys/kernel/lock_stat 77 - 78 - Look at the current lock statistics: 79 - 80 - ( line numbers not part of actual output, done for clarity in the explanation 81 - below ) 82 - 83 - # less /proc/lock_stat 84 - 85 - 01 lock_stat version 0.4 86 - 02----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 87 - 03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg 88 - 04----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 89 - 05 90 - 06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99 91 - 07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77 92 - 08 --------------- 93 - 09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280 94 - 10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 95 - 11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 96 - 12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80 97 - 13 --------------- 98 - 14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0 99 - 15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250 100 - 16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 101 - 17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 102 - 18 103 - 
19............................................................................................................................................................................................................................. 104 - 20 105 - 21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48 106 - 22 --------------- 107 - 23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 108 - 24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 109 - 25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 110 - 26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 111 - 27 --------------- 112 - 28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 113 - 29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 114 - 30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 115 - 31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 116 - 117 - 118 - This excerpt shows the first two lock class statistics. Line 01 shows the 119 - output version - each time the format changes this will be updated. Line 02-04 120 - show the header with column descriptions. Lines 05-18 and 20-31 show the actual 121 - statistics. These statistics come in two parts; the actual stats separated by a 122 - short separator (line 08, 13) from the contention points. 123 - 124 - Lines 09-12 show the first 4 recorded contention points (the code 125 - which tries to get the lock) and lines 14-17 show the first 4 recorded 126 - contended points (the lock holder). It is possible that the max 127 - con-bounces point is missing in the statistics. 128 - 129 - The first lock (05-18) is a read/write lock, and shows two lines above the 130 - short separator. The contention points don't match the column descriptors, 131 - they have two: contentions and [<IP>] symbol. The second set of contention 132 - points are the points we're contending with. 
133 - 134 - The integer part of the time values is in us. 135 - 136 - Dealing with nested locks, subclasses may appear: 137 - 138 - 32........................................................................................................................................................................................................................... 139 - 33 140 - 34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82 141 - 35 --------- 142 - 36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 143 - 37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a 144 - 38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a 145 - 39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb 146 - 40 --------- 147 - 41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 148 - 42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a 149 - 43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 150 - 44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8 151 - 45 152 - 46........................................................................................................................................................................................................................... 
153 - 47 154 - 48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84 155 - 49 ----------- 156 - 50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 157 - 51 ----------- 158 - 52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 159 - 53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8 160 - 54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 161 - 55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a 162 - 163 - Line 48 shows statistics for the second subclass (/1) of &rq->lock class 164 - (subclass starts from 0), since in this case, as line 50 suggests, 165 - double_rq_lock actually acquires a nested lock of two spinlocks. 166 - 167 - View the top contending locks: 168 - 169 - # grep : /proc/lock_stat | head 170 - clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71 171 - tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59 172 - &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59 173 - &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69 174 - &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93 175 - tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72 176 - tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89 177 - rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89 178 - &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27 179 - rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76 180 - 181 - Clear the statistics: 182 - 183 - 
# echo 0 > /proc/lock_stat
+65 -40
Documentation/locking/locktorture.txt Documentation/locking/locktorture.rst
··· 1 + ================================== 1 2 Kernel Lock Torture Test Operation 3 + ================================== 2 4 3 5 CONFIG_LOCK_TORTURE_TEST 6 + ======================== 4 7 5 8 The CONFIG LOCK_TORTURE_TEST config option provides a kernel module 6 9 that runs torture tests on core kernel locking primitives. The kernel ··· 21 18 creating more kthreads. 22 19 23 20 24 - MODULE PARAMETERS 21 + Module Parameters 22 + ================= 25 23 26 24 This module has the following parameters: 27 25 28 26 29 - ** Locktorture-specific ** 27 + Locktorture-specific 28 + -------------------- 30 29 31 - nwriters_stress Number of kernel threads that will stress exclusive lock 30 + nwriters_stress 31 + Number of kernel threads that will stress exclusive lock 32 32 ownership (writers). The default value is twice the number 33 33 of online CPUs. 34 34 35 - nreaders_stress Number of kernel threads that will stress shared lock 35 + nreaders_stress 36 + Number of kernel threads that will stress shared lock 36 37 ownership (readers). The default is the same amount of writer 37 38 locks. If the user did not specify nwriters_stress, then 38 39 both readers and writers be the amount of online CPUs. 39 40 40 - torture_type Type of lock to torture. By default, only spinlocks will 41 + torture_type 42 + Type of lock to torture. By default, only spinlocks will 41 43 be tortured. This module can torture the following locks, 42 44 with string values as follows: 43 45 44 - o "lock_busted": Simulates a buggy lock implementation. 46 + - "lock_busted": 47 + Simulates a buggy lock implementation. 45 48 46 - o "spin_lock": spin_lock() and spin_unlock() pairs. 49 + - "spin_lock": 50 + spin_lock() and spin_unlock() pairs. 47 51 48 - o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() 49 - pairs. 52 + - "spin_lock_irq": 53 + spin_lock_irq() and spin_unlock_irq() pairs. 50 54 51 - o "rw_lock": read/write lock() and unlock() rwlock pairs. 
55 + - "rw_lock": 56 + read/write lock() and unlock() rwlock pairs. 52 57 53 - o "rw_lock_irq": read/write lock_irq() and unlock_irq() 54 - rwlock pairs. 58 + - "rw_lock_irq": 59 + read/write lock_irq() and unlock_irq() 60 + rwlock pairs. 55 61 56 - o "mutex_lock": mutex_lock() and mutex_unlock() pairs. 62 + - "mutex_lock": 63 + mutex_lock() and mutex_unlock() pairs. 57 64 58 - o "rtmutex_lock": rtmutex_lock() and rtmutex_unlock() 59 - pairs. Kernel must have CONFIG_RT_MUTEX=y. 65 + - "rtmutex_lock": 66 + rtmutex_lock() and rtmutex_unlock() pairs. 67 + Kernel must have CONFIG_RT_MUTEX=y. 60 68 61 - o "rwsem_lock": read/write down() and up() semaphore pairs. 69 + - "rwsem_lock": 70 + read/write down() and up() semaphore pairs. 62 71 63 72 64 - ** Torture-framework (RCU + locking) ** 73 + Torture-framework (RCU + locking) 74 + --------------------------------- 65 75 66 - shutdown_secs The number of seconds to run the test before terminating 76 + shutdown_secs 77 + The number of seconds to run the test before terminating 67 78 the test and powering off the system. The default is 68 79 zero, which disables test termination and system shutdown. 69 80 This capability is useful for automated testing. 70 81 71 - onoff_interval The number of seconds between each attempt to execute a 82 + onoff_interval 83 + The number of seconds between each attempt to execute a 72 84 randomly selected CPU-hotplug operation. Defaults 73 85 to zero, which disables CPU hotplugging. In 74 86 CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently 75 87 refuse to do any CPU-hotplug operations regardless of 76 88 what value is specified for onoff_interval. 77 89 78 - onoff_holdoff The number of seconds to wait until starting CPU-hotplug 90 + onoff_holdoff 91 + The number of seconds to wait until starting CPU-hotplug 79 92 operations. 
This would normally only be used when 80 93 locktorture was built into the kernel and started 81 94 automatically at boot time, in which case it is useful ··· 99 80 coming and going. This parameter is only useful if 100 81 CONFIG_HOTPLUG_CPU is enabled. 101 82 102 - stat_interval Number of seconds between statistics-related printk()s. 83 + stat_interval 84 + Number of seconds between statistics-related printk()s. 103 85 By default, locktorture will report stats every 60 seconds. 104 86 Setting the interval to zero causes the statistics to 105 87 be printed -only- when the module is unloaded, and this 106 88 is the default. 107 89 108 - stutter The length of time to run the test before pausing for this 90 + stutter 91 + The length of time to run the test before pausing for this 109 92 same period of time. Defaults to "stutter=5", so as 110 93 to run and pause for (roughly) five-second intervals. 111 94 Specifying "stutter=0" causes the test to run continuously 112 95 without pausing, which is the old default behavior. 113 96 114 - shuffle_interval The number of seconds to keep the test threads affinitied 97 + shuffle_interval 98 + The number of seconds to keep the test threads affinitied 115 99 to a particular subset of the CPUs, defaults to 3 seconds. 116 100 Used in conjunction with test_no_idle_hz. 117 101 118 - verbose Enable verbose debugging printing, via printk(). Enabled 102 + verbose 103 + Enable verbose debugging printing, via printk(). Enabled 119 104 by default. This extra information is mostly related to 120 105 high-level errors and reports from the main 'torture' 121 106 framework. 
122 107 123 108 124 - STATISTICS 109 + Statistics 110 + ========== 125 111 126 - Statistics are printed in the following format: 112 + Statistics are printed in the following format:: 127 113 128 - spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 129 - (A) (B) (C) (D) (E) 114 + spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 115 + (A) (B) (C) (D) (E) 130 116 131 - (A): Lock type that is being tortured -- torture_type parameter. 117 + (A): Lock type that is being tortured -- torture_type parameter. 132 118 133 - (B): Number of writer lock acquisitions. If dealing with a read/write primitive 134 - a second "Reads" statistics line is printed. 119 + (B): Number of writer lock acquisitions. If dealing with a read/write 120 + primitive a second "Reads" statistics line is printed. 135 121 136 - (C): Number of times the lock was acquired. 122 + (C): Number of times the lock was acquired. 137 123 138 - (D): Min and max number of times threads failed to acquire the lock. 124 + (D): Min and max number of times threads failed to acquire the lock. 139 125 140 - (E): true/false values if there were errors acquiring the lock. This should 141 - -only- be positive if there is a bug in the locking primitive's 142 - implementation. Otherwise a lock should never fail (i.e., spin_lock()). 143 - Of course, the same applies for (C), above. A dummy example of this is 144 - the "lock_busted" type. 126 + (E): true/false values if there were errors acquiring the lock. This should 127 + -only- be positive if there is a bug in the locking primitive's 128 + implementation. Otherwise a lock should never fail (i.e., spin_lock()). 129 + Of course, the same applies for (C), above. A dummy example of this is 130 + the "lock_busted" type. 145 131 146 - USAGE 132 + Usage 133 + ===== 147 134 148 - The following script may be used to torture locks: 135 + The following script may be used to torture locks:: 149 136 150 137 #!/bin/sh 151 138
+18 -8
Documentation/locking/mutex-design.txt Documentation/locking/mutex-design.rst
··· 1 + ======================= 1 2 Generic Mutex Subsystem 3 + ======================= 2 4 3 5 started by Ingo Molnar <mingo@redhat.com> 6 + 4 7 updated by Davidlohr Bueso <davidlohr@hp.com> 5 8 6 9 What are mutexes? ··· 26 23 Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h 27 24 and implemented in kernel/locking/mutex.c. These locks use an atomic variable 28 25 (->owner) to keep track of the lock state during its lifetime. Field owner 29 - actually contains 'struct task_struct *' to the current lock owner and it is 26 + actually contains `struct task_struct *` to the current lock owner and it is 30 27 therefore NULL if not currently owned. Since task_struct pointers are aligned 31 28 to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g., 32 29 if waiter list is non-empty). In its most basic form it also includes a ··· 104 101 105 102 Interfaces 106 103 ---------- 107 - Statically define the mutex: 104 + Statically define the mutex:: 105 + 108 106 DEFINE_MUTEX(name); 109 107 110 - Dynamically initialize the mutex: 108 + Dynamically initialize the mutex:: 109 + 111 110 mutex_init(mutex); 112 111 113 - Acquire the mutex, uninterruptible: 112 + Acquire the mutex, uninterruptible:: 113 + 114 114 void mutex_lock(struct mutex *lock); 115 115 void mutex_lock_nested(struct mutex *lock, unsigned int subclass); 116 116 int mutex_trylock(struct mutex *lock); 117 117 118 - Acquire the mutex, interruptible: 118 + Acquire the mutex, interruptible:: 119 + 119 120 int mutex_lock_interruptible_nested(struct mutex *lock, 120 121 unsigned int subclass); 121 122 int mutex_lock_interruptible(struct mutex *lock); 122 123 123 - Acquire the mutex, interruptible, if dec to 0: 124 + Acquire the mutex, interruptible, if dec to 0:: 125 + 124 126 int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); 125 127 126 - Unlock the mutex: 128 + Unlock the mutex:: 129 + 127 130 void mutex_unlock(struct mutex *lock); 128 131 129 - Test if
the mutex is taken: 132 + Test if the mutex is taken:: 133 + 130 134 int mutex_is_locked(struct mutex *lock); 131 135 132 136 Disadvantages
+77 -62
Documentation/locking/rt-mutex-design.txt Documentation/locking/rt-mutex-design.rst
··· 1 - # 2 - # Copyright (c) 2006 Steven Rostedt 3 - # Licensed under the GNU Free Documentation License, Version 1.2 4 - # 5 - 1 + ============================== 6 2 RT-mutex implementation design 7 - ------------------------------ 3 + ============================== 4 + 5 + Copyright (c) 2006 Steven Rostedt 6 + 7 + Licensed under the GNU Free Documentation License, Version 1.2 8 + 8 9 9 10 This document tries to describe the design of the rtmutex.c implementation. 10 11 It doesn't describe the reasons why rtmutex.c exists. For that please see 11 - Documentation/locking/rt-mutex.txt. Although this document does explain problems 12 + Documentation/locking/rt-mutex.rst. Although this document does explain problems 12 13 that happen without this code, it does so only to provide the context needed 13 14 to understand what the code actually does. 14 15 ··· 42 41 never give C a chance to release the lock. This is called unbounded priority 43 42 inversion. 44 43 45 - Here's a little ASCII art to show the problem. 44 + Here's a little ASCII art to show the problem:: 46 45 47 - grab lock L1 (owned by C) 48 - | 49 - A ---+ 50 - C preempted by B 51 - | 52 - C +----+ 46 + grab lock L1 (owned by C) 47 + | 48 + A ---+ 49 + C preempted by B 50 + | 51 + C +----+ 53 52 54 - B +--------> 55 - B now keeps A from running. 53 + B +--------> 54 + B now keeps A from running. 56 55 57 56 58 57 Priority Inheritance (PI) ··· 76 75 Here I explain some terminology that is used in this document to help describe 77 76 the design that is used to implement PI. 78 77 79 - PI chain - The PI chain is an ordered series of locks and processes that cause 78 + PI chain 79 + - The PI chain is an ordered series of locks and processes that cause 80 80 processes to inherit priorities from a previous process that is 81 81 blocked on one of its locks. This is described in more detail 82 82 later in this document.
83 83 84 - mutex - In this document, to differentiate from locks that implement 84 + mutex 85 + - In this document, to differentiate from locks that implement 85 86 PI and spin locks that are used in the PI code, from now on 86 87 the PI locks will be called a mutex. 87 88 88 - lock - In this document from now on, I will use the term lock when 89 + lock 90 + - In this document from now on, I will use the term lock when 89 91 referring to spin locks that are used to protect parts of the PI 90 92 algorithm. These locks disable preemption for UP (when 91 93 CONFIG_PREEMPT is enabled) and on SMP prevents multiple CPUs from 92 94 entering critical sections simultaneously. 93 95 94 - spin lock - Same as lock above. 96 + spin lock 97 + - Same as lock above. 95 98 96 - waiter - A waiter is a struct that is stored on the stack of a blocked 99 + waiter 100 + - A waiter is a struct that is stored on the stack of a blocked 97 101 process. Since the scope of the waiter is within the code for 98 102 a process being blocked on the mutex, it is fine to allocate 99 103 the waiter on the process's stack (local variable). This ··· 110 104 waiter is sometimes used in reference to the task that is waiting 111 105 on a mutex. This is the same as waiter->task. 112 106 113 - waiters - A list of processes that are blocked on a mutex. 107 + waiters 108 + - A list of processes that are blocked on a mutex. 114 109 115 - top waiter - The highest priority process waiting on a specific mutex. 110 + top waiter 111 + - The highest priority process waiting on a specific mutex. 116 112 117 - top pi waiter - The highest priority process waiting on one of the mutexes 113 + top pi waiter 114 + - The highest priority process waiting on one of the mutexes 118 115 that a specific process owns. 
119 116 120 - Note: task and process are used interchangeably in this document, mostly to 117 + Note: 118 + task and process are used interchangeably in this document, mostly to 121 119 differentiate between two processes that are being described together. 122 120 123 121 ··· 133 123 would never diverge, since a process can't be blocked on more than one 134 124 mutex at a time. 135 125 136 - Example: 126 + Example:: 137 127 138 128 Process: A, B, C, D, E 139 129 Mutexes: L1, L2, L3, L4 ··· 147 137 D owns L4 148 138 E blocked on L4 149 139 150 - The chain would be: 140 + The chain would be:: 151 141 152 142 E->L4->D->L3->C->L2->B->L1->A 153 143 154 144 To show where two chains merge, we could add another process F and 155 145 another mutex L5 where B owns L5 and F is blocked on mutex L5. 156 146 157 - The chain for F would be: 147 + The chain for F would be:: 158 148 159 149 F->L5->B->L1->A 160 150 161 151 Since a process may own more than one mutex, but never be blocked on more than 162 152 one, the chains merge. 163 153 164 - Here we show both chains: 154 + Here we show both chains:: 165 155 166 156 E->L4->D->L3->C->L2-+ 167 157 | ··· 175 165 176 166 Also since a mutex may have more than one process blocked on it, we can 177 167 have multiple chains merge at mutexes. If we add another process G that is 178 - blocked on mutex L2: 168 + blocked on mutex L2:: 179 169 180 170 G->L2->B->L1->A 181 171 182 172 And once again, to show how this can grow I will show the merging chains 183 - again. 173 + again:: 184 174 185 175 E->L4->D->L3->C-+ 186 176 +->L2-+ ··· 194 184 to that of G. 195 185 196 186 Mutex Waiters Tree 197 - ----------------- 187 + ------------------ 198 188 199 189 Every mutex keeps track of all the waiters that are blocked on itself. The 200 190 mutex has a rbtree to store these waiters by priority. This tree is protected ··· 229 219 the nesting of mutexes. 
Let's look at the example where we have 3 mutexes, 230 220 L1, L2, and L3, and four separate functions func1, func2, func3 and func4. 231 221 The following shows a locking order of L1->L2->L3, but may not actually 232 - be directly nested that way. 222 + be directly nested that way:: 233 223 234 - void func1(void) 235 - { 224 + void func1(void) 225 + { 236 226 mutex_lock(L1); 237 227 238 228 /* do anything */ 239 229 240 230 mutex_unlock(L1); 241 - } 231 + } 242 232 243 - void func2(void) 244 - { 233 + void func2(void) 234 + { 245 235 mutex_lock(L1); 246 236 mutex_lock(L2); 247 237 ··· 249 239 250 240 mutex_unlock(L2); 251 241 mutex_unlock(L1); 252 - } 242 + } 253 243 254 - void func3(void) 255 - { 244 + void func3(void) 245 + { 256 246 mutex_lock(L2); 257 247 mutex_lock(L3); 258 248 ··· 260 250 261 251 mutex_unlock(L3); 262 252 mutex_unlock(L2); 263 - } 253 + } 264 254 265 - void func4(void) 266 - { 255 + void func4(void) 256 + { 267 257 mutex_lock(L3); 268 258 269 259 /* do something again */ 270 260 271 261 mutex_unlock(L3); 272 - } 262 + } 273 263 274 264 Now we add 4 processes that run each of these functions separately. 275 265 Processes A, B, C, and D which run functions func1, func2, func3 and func4 276 266 respectively, and such that D runs first and A last. With D being preempted 277 - in func4 in the "do something again" area, we have a locking that follows: 267 + in func4 in the "do something again" area, we have a locking that follows:: 278 268 279 - D owns L3 280 - C blocked on L3 281 - C owns L2 282 - B blocked on L2 283 - B owns L1 284 - A blocked on L1 269 + D owns L3 270 + C blocked on L3 271 + C owns L2 272 + B blocked on L2 273 + B owns L1 274 + A blocked on L1 285 275 286 - And thus we have the chain A->L1->B->L2->C->L3->D. 276 + And thus we have the chain A->L1->B->L2->C->L3->D. 
287 277 288 278 This gives us a PI depth of 4 (four processes), but looking at any of the 289 279 functions individually, it seems as though they only have at most a locking ··· 308 298 significant bit to be used as a flag. Bit 0 is used as the "Has Waiters" 309 299 flag. It's set whenever there are waiters on a mutex. 310 300 311 - See Documentation/locking/rt-mutex.txt for further details. 301 + See Documentation/locking/rt-mutex.rst for further details. 312 302 313 303 cmpxchg Tricks 314 304 -------------- ··· 317 307 is used (when applicable) to keep the fast path of grabbing and releasing 318 308 mutexes short. 319 309 320 - cmpxchg is basically the following function performed atomically: 310 + cmpxchg is basically the following function performed atomically:: 321 311 322 - unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) 323 - { 312 + unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) 313 + { 324 314 unsigned long T = *A; 325 315 if (*A == *B) { 326 316 *A = *C; 327 317 } 328 318 return T; 329 - } 330 - #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) 319 + } 320 + #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) 331 321 332 322 This is really nice to have, since it allows you to only update a variable 333 323 if the variable is what you expect it to be. You know if it succeeded if ··· 362 352 new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c 363 353 to implement the actual change in priority. 364 354 365 - (Note: For the "prio" field in task_struct, the lower the number, the 355 + Note: 356 + For the "prio" field in task_struct, the lower the number, the 366 357 higher the priority. A "prio" of 5 is of higher priority than a 367 - "prio" of 10.) 358 + "prio" of 10. 368 359 369 360 It is interesting to note that rt_mutex_adjust_prio can either increase 370 361 or decrease the priority of the task. 
In the case that a higher priority ··· 450 439 forces the current owner to synchronize with this code. 451 440 452 441 The lock is taken if the following are true: 442 + 453 443 1) The lock has no owner 454 444 2) The current task is the highest priority against all other 455 445 waiters of the lock ··· 558 546 ------- 559 547 560 548 Author: Steven Rostedt <rostedt@goodmis.org> 549 + 561 550 Updated: Alex Shi <alex.shi@linaro.org> - 7/6/2017 562 551 563 - Original Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and 552 + Original Reviewers: 553 + Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and 564 554 Randy Dunlap 555 + 565 556 Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior 566 557 567 558 Updates
+17 -13
Documentation/locking/rt-mutex.txt Documentation/locking/rt-mutex.rst
··· 1 + ================================== 1 2 RT-mutex subsystem with PI support 2 - ---------------------------------- 3 + ================================== 3 4 4 5 RT-mutexes with priority inheritance are used to support PI-futexes, 5 6 which enable pthread_mutex_t priority inheritance attributes ··· 47 46 structure: 48 47 49 48 lock->owner holds the task_struct pointer of the owner. Bit 0 is used to 50 - keep track of the "lock has waiters" state. 49 + keep track of the "lock has waiters" state: 51 50 52 - owner bit0 51 + ============ ======= ================================================ 52 + owner bit0 Notes 53 + ============ ======= ================================================ 53 54 NULL 0 lock is free (fast acquire possible) 54 55 NULL 1 lock is free and has waiters and the top waiter 55 - is going to take the lock* 56 + is going to take the lock [1]_ 56 57 taskpointer 0 lock is held (fast release possible) 57 - taskpointer 1 lock is held and has waiters** 58 + taskpointer 1 lock is held and has waiters [2]_ 59 + ============ ======= ================================================ 58 60 59 61 The fast atomic compare exchange based acquire and release is only 60 62 possible when bit 0 of lock->owner is 0. 61 63 62 - (*) It also can be a transitional state when grabbing the lock 63 - with ->wait_lock is held. To prevent any fast path cmpxchg to the lock, 64 - we need to set the bit0 before looking at the lock, and the owner may be 65 - NULL in this small time, hence this can be a transitional state. 64 + .. [1] It also can be a transitional state when grabbing the lock 65 + with ->wait_lock is held. To prevent any fast path cmpxchg to the lock, 66 + we need to set the bit0 before looking at the lock, and the owner may 67 + be NULL in this small time, hence this can be a transitional state. 66 68 67 - (**) There is a small time when bit 0 is set but there are no 68 - waiters. This can happen when grabbing the lock in the slow path. 
69 - To prevent a cmpxchg of the owner releasing the lock, we need to 70 - set this bit before looking at the lock. 69 + .. [2] There is a small time when bit 0 is set but there are no 70 + waiters. This can happen when grabbing the lock in the slow path. 71 + To prevent a cmpxchg of the owner releasing the lock, we need to 72 + set this bit before looking at the lock. 71 73 72 74 BTW, there is still technically a "Pending Owner", it's just not called 73 75 that anymore. The pending owner happens to be the top_waiter of a lock
+21 -11
Documentation/locking/spinlocks.txt Documentation/locking/spinlocks.rst
··· 1 + =============== 2 + Locking lessons 3 + =============== 4 + 1 5 Lesson 1: Spin locks 6 + ==================== 2 7 3 - The most basic primitive for locking is spinlock. 8 + The most basic primitive for locking is spinlock:: 4 9 5 - static DEFINE_SPINLOCK(xxx_lock); 10 + static DEFINE_SPINLOCK(xxx_lock); 6 11 7 12 unsigned long flags; 8 13 ··· 24 19 NOTE! Implications of spin_locks for memory are further described in: 25 20 26 21 Documentation/memory-barriers.txt 22 + 27 23 (5) LOCK operations. 24 + 28 25 (6) UNLOCK operations. 29 26 30 27 The above is usually pretty simple (you usually need and want only one 31 28 spinlock for most things - using more than one spinlock can make things a 32 29 lot more complex and even slower and is usually worth it only for 33 - sequences that you _know_ need to be split up: avoid it at all cost if you 30 + sequences that you **know** need to be split up: avoid it at all cost if you 34 31 aren't sure). 35 32 36 33 This is really the only really hard part about spinlocks: once you start 37 34 using spinlocks they tend to expand to areas you might not have noticed 38 35 before, because you have to make sure the spinlocks correctly protect the 39 - shared data structures _everywhere_ they are used. The spinlocks are most 36 + shared data structures **everywhere** they are used. The spinlocks are most 40 37 easily added to places that are completely independent of other code (for 41 38 example, internal driver data structures that nobody else ever touches). 42 39 43 - NOTE! The spin-lock is safe only when you _also_ use the lock itself 40 + NOTE! The spin-lock is safe only when you **also** use the lock itself 44 41 to do locking across CPU's, which implies that EVERYTHING that 45 42 touches a shared variable has to agree about the spinlock they want 46 43 to use. ··· 50 43 ---- 51 44 52 45 Lesson 2: reader-writer spinlocks. 
46 + ================================== 53 47 54 48 If your data accesses have a very natural pattern where you usually tend 55 49 to mostly read from the shared variables, the reader-writer locks ··· 62 54 simple spinlocks. Unless the reader critical section is long, you 63 55 are better off just using spinlocks. 64 56 65 - The routines look the same as above: 57 + The routines look the same as above:: 66 58 67 59 rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock); 68 60 ··· 79 71 The above kind of lock may be useful for complex data structures like 80 72 linked lists, especially searching for entries without changing the list 81 73 itself. The read lock allows many concurrent readers. Anything that 82 - _changes_ the list will have to get the write lock. 74 + **changes** the list will have to get the write lock. 83 75 84 76 NOTE! RCU is better for list traversal, but requires careful 85 77 attention to design detail (see Documentation/RCU/listRCU.txt). ··· 95 87 ---- 96 88 97 89 Lesson 3: spinlocks revisited. 90 + ============================== 98 91 99 92 The single spin-lock primitives above are by no means the only ones. They 100 93 are the safest ones, and the ones that work under all circumstances, 101 - but partly _because_ they are safe they are also fairly slow. They are slower 94 + but partly **because** they are safe they are also fairly slow. They are slower 102 95 than they'd need to be, because they do have to disable interrupts 103 96 (which is just a single instruction on an x86, but it's an expensive one - 104 97 and on other architectures it can be worse). ··· 107 98 If you have a case where you have to protect a data structure across 108 99 several CPU's and you want to use spinlocks you can potentially use 109 100 cheaper versions of the spinlocks.
IFF you know that the spinlocks are 110 - never used in interrupt handlers, you can use the non-irq versions: 101 + never used in interrupt handlers, you can use the non-irq versions:: 111 102 112 103 spin_lock(&lock); 113 104 ... ··· 119 110 manipulated from a "process context", ie no interrupts involved. 120 111 121 112 The reasons you mustn't use these versions if you have interrupts that 122 - play with the spinlock is that you can get deadlocks: 113 + play with the spinlock is that you can get deadlocks:: 123 114 124 115 spin_lock(&lock); 125 116 ... ··· 156 147 ---- 157 148 158 149 Reference information: 150 + ====================== 159 151 160 152 For dynamic initialization, use spin_lock_init() or rwlock_init() as 161 - appropriate: 153 + appropriate:: 162 154 163 155 spinlock_t xxx_lock; 164 156 rwlock_t xxx_rw_lock;
+46 -36
Documentation/locking/ww-mutex-design.txt Documentation/locking/ww-mutex-design.rst
··· 1 + ====================================== 1 2 Wound/Wait Deadlock-Proof Mutex Design 2 3 ====================================== 3 4 ··· 86 85 no deadlock potential and hence the ww_mutex_lock call will block and not 87 86 prematurely return -EDEADLK. The advantage of the _slow functions is in 88 87 interface safety: 88 + 89 89 - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow 90 90 has a void return type. Note that since ww mutex code needs loops/retries 91 91 anyway the __must_check doesn't result in spurious warnings, even though the ··· 117 115 and you want to reduce the number of rollbacks. 118 116 119 117 There are three different ways to acquire locks within the same w/w class. Common 120 - definitions for methods #1 and #2: 118 + definitions for methods #1 and #2:: 121 119 122 - static DEFINE_WW_CLASS(ww_class); 120 + static DEFINE_WW_CLASS(ww_class); 123 121 124 - struct obj { 122 + struct obj { 125 123 struct ww_mutex lock; 126 124 /* obj data */ 127 - }; 125 + }; 128 126 129 - struct obj_entry { 127 + struct obj_entry { 130 128 struct list_head head; 131 129 struct obj *obj; 132 - }; 130 + }; 133 131 134 132 Method 1, using a list in execbuf->buffers that's not allowed to be reordered. 135 133 This is useful if a list of required objects is already tracked somewhere. 136 134 Furthermore the lock helper can propagate the -EALREADY return code back to 137 135 the caller as a signal that an object is twice on the list. This is useful if 138 136 the list is constructed from userspace input and the ABI requires userspace to 139 - not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl). 137 + not have duplicate entries (e.g.
for a gpu commandbuffer submission ioctl):: 140 138 141 - int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 142 - { 139 + int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 140 + { 143 141 struct obj *res_obj = NULL; 144 142 struct obj_entry *contended_entry = NULL; 145 143 struct obj_entry *entry; 146 144 147 145 ww_acquire_init(ctx, &ww_class); 148 146 149 - retry: 147 + retry: 150 148 list_for_each_entry (entry, list, head) { 151 149 if (entry->obj == res_obj) { 152 150 res_obj = NULL; ··· 162 160 ww_acquire_done(ctx); 163 161 return 0; 164 162 165 - err: 163 + err: 166 164 list_for_each_entry_continue_reverse (entry, list, head) 167 165 ww_mutex_unlock(&entry->obj->lock); 168 166 ··· 178 176 ww_acquire_fini(ctx); 179 177 180 178 return ret; 181 - } 179 + } 182 180 183 181 Method 2, using a list in execbuf->buffers that can be reordered. Same semantics 184 182 of duplicate entry detection using -EALREADY as method 1 above. But the 185 - list-reordering allows for a bit more idiomatic code. 
183 + list-reordering allows for a bit more idiomatic code:: 186 184 187 - int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 188 - { 185 + int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 186 + { 189 187 struct obj_entry *entry, *entry2; 190 188 191 189 ww_acquire_init(ctx, &ww_class); ··· 218 216 219 217 ww_acquire_done(ctx); 220 218 return 0; 221 - } 219 + } 222 220 223 - Unlocking works the same way for both methods #1 and #2: 221 + Unlocking works the same way for both methods #1 and #2:: 224 222 225 - void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 226 - { 223 + void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 224 + { 227 225 struct obj_entry *entry; 228 226 229 227 list_for_each_entry (entry, list, head) 230 228 ww_mutex_unlock(&entry->obj->lock); 231 229 232 230 ww_acquire_fini(ctx); 233 - } 231 + } 234 232 235 233 Method 3 is useful if the list of objects is constructed ad-hoc and not upfront, 236 234 e.g. when adjusting edges in a graph where each node has its own ww_mutex lock, 237 235 and edges can only be changed when holding the locks of all involved nodes. w/w 238 236 mutexes are a natural fit for such a case for two reasons: 237 + 239 238 - They can handle lock-acquisition in any order which allows us to start walking 240 239 a graph from a starting point and then iteratively discovering new edges and 241 240 locking down the nodes those edges connect to. ··· 246 243 as a starting point). 247 244 248 245 Note that this approach differs in two important ways from the above methods: 246 + 249 247 - Since the list of objects is dynamically constructed (and might very well be 250 248 different when retrying due to hitting the -EDEADLK die condition) there's 251 249 no need to keep any object on a persistent list when it's not locked. We can ··· 264 260 265 261 Also, method 3 can't fail the lock acquisition step since it doesn't return 266 262 -EALREADY. 
Of course this would be different when using the _interruptible 267 - variants, but that's outside of the scope of these examples here. 263 + variants, but that's outside of the scope of these examples here:: 268 264 269 - struct obj { 265 + struct obj { 270 266 struct ww_mutex ww_mutex; 271 267 struct list_head locked_list; 272 - }; 268 + }; 273 269 274 - static DEFINE_WW_CLASS(ww_class); 270 + static DEFINE_WW_CLASS(ww_class); 275 271 276 - void __unlock_objs(struct list_head *list) 277 - { 272 + void __unlock_objs(struct list_head *list) 273 + { 278 274 struct obj *entry, *temp; 279 275 280 276 list_for_each_entry_safe (entry, temp, list, locked_list) { ··· 283 279 list_del(&entry->locked_list); 284 280 ww_mutex_unlock(&entry->ww_mutex); 285 281 } 286 - } 282 + } 287 283 288 - void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 289 - { 284 + void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 285 + { 290 286 struct obj *obj; 291 287 292 288 ww_acquire_init(ctx, &ww_class); 293 289 294 - retry: 290 + retry: 295 291 /* re-init loop start state */ 296 292 loop { 297 293 /* magic code which walks over a graph and decides which objects ··· 316 312 317 313 ww_acquire_done(ctx); 318 314 return 0; 319 - } 315 + } 320 316 321 - void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 322 - { 317 + void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) 318 + { 323 319 __unlock_objs(list); 324 320 ww_acquire_fini(ctx); 325 - } 321 + } 326 322 327 323 Method 4: Only lock one single object. In that case deadlock detection and 328 324 prevention is obviously overkill, since with grabbing just one lock you can't ··· 333 329 ---------------------- 334 330 335 331 Design: 332 + ^^^^^^^ 333 + 336 334 ww_mutex currently encapsulates a struct mutex; this means no extra overhead for 337 335 normal mutex locks, which are far more common.
As such there is only a small 338 336 increase in code size if wait/wound mutexes are not used. 339 337 340 338 We maintain the following invariants for the wait list: 339 + 341 340 (1) Waiters with an acquire context are sorted by stamp order; waiters 342 341 without an acquire context are interspersed in FIFO order. 343 342 (2) For Wait-Die, among waiters with contexts, only the first one can have ··· 362 355 therefore be directed towards the uncontended cases. 363 356 364 357 Lockdep: 358 + ^^^^^^^^ 359 + 365 360 Special care has been taken to warn for as many cases of api abuse 366 361 as possible. Some common api abuses will be caught with 367 362 CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended. ··· 388 379 having called ww_acquire_fini on the first. 389 380 - 'normal' deadlocks that can occur. 390 381 391 - FIXME: Update this section once we have the TASK_DEADLOCK task state flag magic 392 - implemented. 382 + FIXME: 383 + Update this section once we have the TASK_DEADLOCK task state flag magic 384 + implemented.
+1 -1
Documentation/pi-futex.txt
··· 119 119 robust-futex, PI-futex, robust+PI-futex. 120 120 121 121 More details about priority inheritance can be found in 122 - Documentation/locking/rt-mutex.txt. 122 + Documentation/locking/rt-mutex.rst.
+1 -1
Documentation/translations/it_IT/kernel-hacking/locking.rst
··· 1404 1404 Approfondimenti 1405 1405 =============== 1406 1406 1407 - - ``Documentation/locking/spinlocks.txt``: la guida di Linus Torvalds agli 1407 + - ``Documentation/locking/spinlocks.rst``: la guida di Linus Torvalds agli 1408 1408 spinlock del kernel. 1409 1409 1410 1410 - Unix Systems for Modern Architectures: Symmetric Multiprocessing and
+1 -1
drivers/gpu/drm/drm_modeset_lock.c
··· 36 36 * of extra utility/tracking out of our acquire-ctx. This is provided 37 37 * by &struct drm_modeset_lock and &struct drm_modeset_acquire_ctx. 38 38 * 39 - * For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.txt 39 + * For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.rst 40 40 * 41 41 * The basic usage pattern is to:: 42 42 *
+1 -1
include/linux/lockdep.h
··· 5 5 * Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com> 6 6 * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra 7 7 * 8 - * see Documentation/locking/lockdep-design.txt for more details. 8 + * see Documentation/locking/lockdep-design.rst for more details. 9 9 */ 10 10 #ifndef __LINUX_LOCKDEP_H 11 11 #define __LINUX_LOCKDEP_H
+1 -1
include/linux/mutex.h
··· 151 151 152 152 /* 153 153 * See kernel/locking/mutex.c for detailed documentation of these APIs. 154 - * Also see Documentation/locking/mutex-design.txt. 154 + * Also see Documentation/locking/mutex-design.rst. 155 155 */ 156 156 #ifdef CONFIG_DEBUG_LOCK_ALLOC 157 157 extern void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+1 -1
include/linux/rwsem.h
··· 160 160 * static then another method for expressing nested locking is 161 161 * the explicit definition of lock class keys and the use of 162 162 * lockdep_set_class() at lock initialization time. 163 - * See Documentation/locking/lockdep-design.txt for more details.) 163 + * See Documentation/locking/lockdep-design.rst for more details.) 164 164 */ 165 165 extern void down_read_nested(struct rw_semaphore *sem, int subclass); 166 166 extern void down_write_nested(struct rw_semaphore *sem, int subclass);
+1 -1
kernel/locking/mutex.c
··· 16 16 * by Steven Rostedt, based on work by Gregory Haskins, Peter Morreale 17 17 * and Sven Dietrich. 18 18 * 19 - * Also see Documentation/locking/mutex-design.txt. 19 + * Also see Documentation/locking/mutex-design.rst. 20 20 */ 21 21 #include <linux/mutex.h> 22 22 #include <linux/ww_mutex.h>
+1 -1
kernel/locking/rtmutex.c
··· 9 9 * Copyright (C) 2005 Kihon Technologies Inc., Steven Rostedt 10 10 * Copyright (C) 2006 Esben Nielsen 11 11 * 12 - * See Documentation/locking/rt-mutex-design.txt for details. 12 + * See Documentation/locking/rt-mutex-design.rst for details. 13 13 */ 14 14 #include <linux/spinlock.h> 15 15 #include <linux/export.h>
+2 -2
lib/Kconfig.debug
··· 1139 1139 the proof of observed correctness is also maintained for an 1140 1140 arbitrary combination of these separate locking variants. 1141 1141 1142 - For more details, see Documentation/locking/lockdep-design.txt. 1142 + For more details, see Documentation/locking/lockdep-design.rst. 1143 1143 1144 1144 config LOCK_STAT 1145 1145 bool "Lock usage statistics" ··· 1153 1153 help 1154 1154 This feature enables tracking lock contention points 1155 1155 1156 - For more details, see Documentation/locking/lockstat.txt 1156 + For more details, see Documentation/locking/lockstat.rst 1157 1157 1158 1158 This also enables lock events required by "perf lock", 1159 1159 subcommand of perf.