Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:

- Continued user-access cleanups in the futex code.

- percpu-rwsem rewrite that uses its own waitqueue and atomic_t
instead of an embedded rwsem. This addresses a couple of
weaknesses, but the primary motivation was complications on the -rt
kernel.

- Introduce raw lock nesting detection on lockdep
(CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
lock differences. This too originates from -rt.

- Reuse lockdep zapped chain_hlocks entries, to conserve RAM
footprint on distro-ish kernels running into the "BUG:
MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
chain-entries pool.

- Misc cleanups, smaller fixes and enhancements - see the changelog
for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Documentation/locking/locktypes: Minor copy editor fixes
Documentation/locking/locktypes: Further clarifications and wordsmithing
m68knommu: Remove mm.h include from uaccess_no.h
x86: get rid of user_atomic_cmpxchg_inatomic()
generic arch_futex_atomic_op_inuser() doesn't need access_ok()
x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
objtool: whitelist __sanitizer_cov_trace_switch()
[parisc, s390, sparc64] no need for access_ok() in futex handling
sh: no need of access_ok() in arch_futex_atomic_op_inuser()
futex: arch_futex_atomic_op_inuser() calling conventions change
completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
lockdep: Add posixtimer context tracing bits
lockdep: Annotate irq_work
lockdep: Add hrtimer context tracing bits
lockdep: Introduce wait-type checks
completion: Use simple wait queues
sched/swait: Prepare usage in completions
...

+1622 -713
+1
Documentation/locking/index.rst
··· 7 7 .. toctree:: 8 8 :maxdepth: 1 9 9 10 + locktypes 10 11 lockdep-design 11 12 lockstat 12 13 locktorture
+347
Documentation/locking/locktypes.rst
.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into two categories:

 - Sleeping locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.


Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock(). Furthermore, it is also necessary to evaluate the debugging
versions of these primitives. In short, don't acquire sleeping locks from
other contexts unless there is no other option.

Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - spinlock_t
 - rwlock_t

Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================

Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.


rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels. Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts. This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.


semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.

semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores. After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.


rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independent of the kernel configuration.

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers. In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.


raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels. Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state. raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.

spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption disabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate. Non-PREEMPT_RT kernels
   avoid migration by disabling preemption. PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations. Non-PREEMPT_RT
   kernels leave task state untouched. However, PREEMPT_RT must change
   task state if the task blocks during acquisition. Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
    lock()
      block()
        task->saved_state = task->state
        task->state = TASK_UNINTERRUPTIBLE
        schedule()
    lock wakeup
      task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available. Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING. Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
    lock()
      block()
        task->saved_state = task->state
        task->state = TASK_UNINTERRUPTIBLE
        schedule()
        non lock wakeup
          task->saved_state = TASK_RUNNING

    lock wakeup
      task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.


rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers. In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.


PREEMPT_RT caveats
==================

spinlock_t and rwlock_t
-----------------------

These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications. For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

   local_irq_disable();
   spin_lock(&lock);

and is fully equivalent to::

   spin_lock_irq(&lock);

Same applies to rwlock_t and the _irqsave() suffix variants.

On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires a
fully preemptible context. Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts. In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism. Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.


raw_spinlock_t
--------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t, for example, the critical section must avoid
allocating memory. Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts. However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);


bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex. Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site. In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.


Lock type nesting rules
=======================

The most basic rules are:

 - Lock types of the same lock category (sleeping, spinning) can nest
   arbitrarily as long as they respect the general lock ordering rules to
   prevent deadlocks.

 - Sleeping lock types cannot nest inside spinning lock types.

 - Spinning lock types can nest inside sleeping lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired while
holding a raw spinlock. This results in the following nesting ordering:

 1) Sleeping locks
 2) spinlock_t and rwlock_t
 3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
+2 -3
arch/alpha/include/asm/futex.h
··· 31 31 { 32 32 int oldval = 0, ret; 33 33 34 - pagefault_disable(); 34 + if (!access_ok(uaddr, sizeof(u32))) 35 + return -EFAULT; 35 36 36 37 switch (op) { 37 38 case FUTEX_OP_SET: ··· 53 52 default: 54 53 ret = -ENOSYS; 55 54 } 56 - 57 - pagefault_enable(); 58 55 59 56 if (!ret) 60 57 *oval = oldval;
+3 -2
arch/arc/include/asm/futex.h
··· 75 75 { 76 76 int oldval = 0, ret; 77 77 78 + if (!access_ok(uaddr, sizeof(u32))) 79 + return -EFAULT; 80 + 78 81 #ifndef CONFIG_ARC_HAS_LLSC 79 82 preempt_disable(); /* to guarantee atomic r-m-w of futex op */ 80 83 #endif 81 - pagefault_disable(); 82 84 83 85 switch (op) { 84 86 case FUTEX_OP_SET: ··· 103 101 ret = -ENOSYS; 104 102 } 105 103 106 - pagefault_enable(); 107 104 #ifndef CONFIG_ARC_HAS_LLSC 108 105 preempt_enable(); 109 106 #endif
+3 -2
arch/arm/include/asm/futex.h
··· 134 134 { 135 135 int oldval = 0, ret, tmp; 136 136 137 + if (!access_ok(uaddr, sizeof(u32))) 138 + return -EFAULT; 139 + 137 140 #ifndef CONFIG_SMP 138 141 preempt_disable(); 139 142 #endif 140 - pagefault_disable(); 141 143 142 144 switch (op) { 143 145 case FUTEX_OP_SET: ··· 161 159 ret = -ENOSYS; 162 160 } 163 161 164 - pagefault_enable(); 165 162 #ifndef CONFIG_SMP 166 163 preempt_enable(); 167 164 #endif
+2 -3
arch/arm64/include/asm/futex.h
··· 48 48 int oldval = 0, ret, tmp; 49 49 u32 __user *uaddr = __uaccess_mask_ptr(_uaddr); 50 50 51 - pagefault_disable(); 51 + if (!access_ok(_uaddr, sizeof(u32))) 52 + return -EFAULT; 52 53 53 54 switch (op) { 54 55 case FUTEX_OP_SET: ··· 75 74 default: 76 75 ret = -ENOSYS; 77 76 } 78 - 79 - pagefault_enable(); 80 77 81 78 if (!ret) 82 79 *oval = oldval;
-1
arch/csky/include/asm/uaccess.h
··· 11 11 #include <linux/errno.h> 12 12 #include <linux/types.h> 13 13 #include <linux/sched.h> 14 - #include <linux/mm.h> 15 14 #include <linux/string.h> 16 15 #include <linux/version.h> 17 16 #include <asm/segment.h>
+2 -3
arch/hexagon/include/asm/futex.h
··· 36 36 { 37 37 int oldval = 0, ret; 38 38 39 - pagefault_disable(); 39 + if (!access_ok(uaddr, sizeof(u32))) 40 + return -EFAULT; 40 41 41 42 switch (op) { 42 43 case FUTEX_OP_SET: ··· 62 61 default: 63 62 ret = -ENOSYS; 64 63 } 65 - 66 - pagefault_enable(); 67 64 68 65 if (!ret) 69 66 *oval = oldval;
-1
arch/hexagon/include/asm/uaccess.h
··· 10 10 /* 11 11 * User space memory access functions 12 12 */ 13 - #include <linux/mm.h> 14 13 #include <asm/sections.h> 15 14 16 15 /*
+2 -3
arch/ia64/include/asm/futex.h
··· 50 50 { 51 51 int oldval = 0, ret; 52 52 53 - pagefault_disable(); 53 + if (!access_ok(uaddr, sizeof(u32))) 54 + return -EFAULT; 54 55 55 56 switch (op) { 56 57 case FUTEX_OP_SET: ··· 74 73 default: 75 74 ret = -ENOSYS; 76 75 } 77 - 78 - pagefault_enable(); 79 76 80 77 if (!ret) 81 78 *oval = oldval;
-1
arch/ia64/include/asm/uaccess.h
··· 35 35 36 36 #include <linux/compiler.h> 37 37 #include <linux/page-flags.h> 38 - #include <linux/mm.h> 39 38 40 39 #include <asm/intrinsics.h> 41 40 #include <asm/pgtable.h>
+1
arch/ia64/kernel/process.c
··· 681 681 machine_halt(); 682 682 } 683 683 684 + EXPORT_SYMBOL(ia64_delay_loop);
+1
arch/ia64/mm/ioremap.c
··· 8 8 #include <linux/module.h> 9 9 #include <linux/efi.h> 10 10 #include <linux/io.h> 11 + #include <linux/mm.h> 11 12 #include <linux/vmalloc.h> 12 13 #include <asm/io.h> 13 14 #include <asm/meminit.h>
-1
arch/m68k/include/asm/uaccess_no.h
··· 5 5 /* 6 6 * User space memory access functions 7 7 */ 8 - #include <linux/mm.h> 9 8 #include <linux/string.h> 10 9 11 10 #include <asm/segment.h>
+2 -3
arch/microblaze/include/asm/futex.h
··· 34 34 { 35 35 int oldval = 0, ret; 36 36 37 - pagefault_disable(); 37 + if (!access_ok(uaddr, sizeof(u32))) 38 + return -EFAULT; 38 39 39 40 switch (op) { 40 41 case FUTEX_OP_SET: ··· 56 55 default: 57 56 ret = -ENOSYS; 58 57 } 59 - 60 - pagefault_enable(); 61 58 62 59 if (!ret) 63 60 *oval = oldval;
-1
arch/microblaze/include/asm/uaccess.h
··· 12 12 #define _ASM_MICROBLAZE_UACCESS_H 13 13 14 14 #include <linux/kernel.h> 15 - #include <linux/mm.h> 16 15 17 16 #include <asm/mmu.h> 18 17 #include <asm/page.h>
+2 -3
arch/mips/include/asm/futex.h
··· 89 89 { 90 90 int oldval = 0, ret; 91 91 92 - pagefault_disable(); 92 + if (!access_ok(uaddr, sizeof(u32))) 93 + return -EFAULT; 93 94 94 95 switch (op) { 95 96 case FUTEX_OP_SET: ··· 116 115 default: 117 116 ret = -ENOSYS; 118 117 } 119 - 120 - pagefault_enable(); 121 118 122 119 if (!ret) 123 120 *oval = oldval;
+2 -4
arch/nds32/include/asm/futex.h
··· 66 66 { 67 67 int oldval = 0, ret; 68 68 69 - 70 - pagefault_disable(); 69 + if (!access_ok(uaddr, sizeof(u32))) 70 + return -EFAULT; 71 71 switch (op) { 72 72 case FUTEX_OP_SET: 73 73 __futex_atomic_op("move %0, %3", ret, oldval, tmp, uaddr, ··· 92 92 default: 93 93 ret = -ENOSYS; 94 94 } 95 - 96 - pagefault_enable(); 97 95 98 96 if (!ret) 99 97 *oval = oldval;
-1
arch/nds32/include/asm/uaccess.h
··· 11 11 #include <asm/errno.h> 12 12 #include <asm/memory.h> 13 13 #include <asm/types.h> 14 - #include <linux/mm.h> 15 14 16 15 #define __asmeq(x, y) ".ifnc " x "," y " ; .err ; .endif\n\t" 17 16
+2 -3
arch/openrisc/include/asm/futex.h
··· 35 35 { 36 36 int oldval = 0, ret; 37 37 38 - pagefault_disable(); 38 + if (!access_ok(uaddr, sizeof(u32))) 39 + return -EFAULT; 39 40 40 41 switch (op) { 41 42 case FUTEX_OP_SET: ··· 57 56 default: 58 57 ret = -ENOSYS; 59 58 } 60 - 61 - pagefault_enable(); 62 59 63 60 if (!ret) 64 61 *oval = oldval;
-2
arch/parisc/include/asm/futex.h
··· 40 40 u32 tmp; 41 41 42 42 _futex_spin_lock_irqsave(uaddr, &flags); 43 - pagefault_disable(); 44 43 45 44 ret = -EFAULT; 46 45 if (unlikely(get_user(oldval, uaddr) != 0)) ··· 72 73 ret = -EFAULT; 73 74 74 75 out_pagefault_enable: 75 - pagefault_enable(); 76 76 _futex_spin_unlock_irqrestore(uaddr, &flags); 77 77 78 78 if (!ret)
+2 -3
arch/powerpc/include/asm/futex.h
··· 35 35 { 36 36 int oldval = 0, ret; 37 37 38 + if (!access_ok(uaddr, sizeof(u32))) 39 + return -EFAULT; 38 40 allow_read_write_user(uaddr, uaddr, sizeof(*uaddr)); 39 - pagefault_disable(); 40 41 41 42 switch (op) { 42 43 case FUTEX_OP_SET: ··· 58 57 default: 59 58 ret = -ENOSYS; 60 59 } 61 - 62 - pagefault_enable(); 63 60 64 61 *oval = oldval; 65 62
+9 -9
arch/powerpc/platforms/ps3/device-init.c
··· 13 13 #include <linux/init.h> 14 14 #include <linux/slab.h> 15 15 #include <linux/reboot.h> 16 + #include <linux/rcuwait.h> 16 17 17 18 #include <asm/firmware.h> 18 19 #include <asm/lv1call.h> ··· 671 670 spinlock_t lock; 672 671 u64 tag; 673 672 u64 lv1_status; 674 - struct completion done; 673 + struct rcuwait wait; 674 + bool done; 675 675 }; 676 676 677 677 enum ps3_notify_type { ··· 714 712 pr_debug("%s:%u: completed, status 0x%llx\n", __func__, 715 713 __LINE__, status); 716 714 dev->lv1_status = status; 717 - complete(&dev->done); 715 + dev->done = true; 716 + rcuwait_wake_up(&dev->wait); 718 717 } 719 718 spin_unlock(&dev->lock); 720 719 return IRQ_HANDLED; ··· 728 725 unsigned long flags; 729 726 int res; 730 727 731 - init_completion(&dev->done); 732 728 spin_lock_irqsave(&dev->lock, flags); 733 729 res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar, 734 730 &dev->tag) 735 731 : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar, 736 732 &dev->tag); 733 + dev->done = false; 737 734 spin_unlock_irqrestore(&dev->lock, flags); 738 735 if (res) { 739 736 pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res); ··· 741 738 } 742 739 pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op); 743 740 744 - res = wait_event_interruptible(dev->done.wait, 745 - dev->done.done || kthread_should_stop()); 741 + rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE); 742 + 746 743 if (kthread_should_stop()) 747 744 res = -EINTR; 748 - if (res) { 749 - pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op); 750 - return res; 751 - } 752 745 753 746 if (dev->lv1_status) { 754 747 pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__, ··· 809 810 } 810 811 811 812 spin_lock_init(&dev.lock); 813 + rcuwait_init(&dev.wait); 812 814 813 815 res = request_irq(irq, ps3_notification_interrupt, 0, 814 816 "ps3_notification", &dev);
+2 -3
arch/riscv/include/asm/futex.h
··· 46 46 { 47 47 int oldval = 0, ret = 0; 48 48 49 - pagefault_disable(); 49 + if (!access_ok(uaddr, sizeof(u32))) 50 + return -EFAULT; 50 51 51 52 switch (op) { 52 53 case FUTEX_OP_SET: ··· 73 72 default: 74 73 ret = -ENOSYS; 75 74 } 76 - 77 - pagefault_enable(); 78 75 79 76 if (!ret) 80 77 *oval = oldval;
-2
arch/s390/include/asm/futex.h
··· 29 29 mm_segment_t old_fs; 30 30 31 31 old_fs = enable_sacf_uaccess(); 32 - pagefault_disable(); 33 32 switch (op) { 34 33 case FUTEX_OP_SET: 35 34 __futex_atomic_op("lr %2,%5\n", ··· 53 54 default: 54 55 ret = -ENOSYS; 55 56 } 56 - pagefault_enable(); 57 57 disable_sacf_uaccess(old_fs); 58 58 59 59 if (!ret)
-4
arch/sh/include/asm/futex.h
··· 34 34 u32 oldval, newval, prev; 35 35 int ret; 36 36 37 - pagefault_disable(); 38 - 39 37 do { 40 38 ret = get_user(oldval, uaddr); 41 39 ··· 64 66 65 67 ret = futex_atomic_cmpxchg_inatomic(&prev, uaddr, oldval, newval); 66 68 } while (!ret && prev != oldval); 67 - 68 - pagefault_enable(); 69 69 70 70 if (!ret) 71 71 *oval = oldval;
-4
arch/sparc/include/asm/futex_64.h
··· 38 38 if (unlikely((((unsigned long) uaddr) & 0x3UL))) 39 39 return -EINVAL; 40 40 41 - pagefault_disable(); 42 - 43 41 switch (op) { 44 42 case FUTEX_OP_SET: 45 43 __futex_cas_op("mov\t%4, %1", ret, oldval, uaddr, oparg); ··· 57 59 default: 58 60 ret = -ENOSYS; 59 61 } 60 - 61 - pagefault_enable(); 62 62 63 63 if (!ret) 64 64 *oval = oldval;
+63 -36
arch/x86/include/asm/futex.h
··· 12 12 #include <asm/processor.h> 13 13 #include <asm/smap.h> 14 14 15 - #define __futex_atomic_op1(insn, ret, oldval, uaddr, oparg) \ 16 - asm volatile("\t" ASM_STAC "\n" \ 17 - "1:\t" insn "\n" \ 18 - "2:\t" ASM_CLAC "\n" \ 15 + #define unsafe_atomic_op1(insn, oval, uaddr, oparg, label) \ 16 + do { \ 17 + int oldval = 0, ret; \ 18 + asm volatile("1:\t" insn "\n" \ 19 + "2:\n" \ 19 20 "\t.section .fixup,\"ax\"\n" \ 20 21 "3:\tmov\t%3, %1\n" \ 21 22 "\tjmp\t2b\n" \ 22 23 "\t.previous\n" \ 23 24 _ASM_EXTABLE_UA(1b, 3b) \ 24 25 : "=r" (oldval), "=r" (ret), "+m" (*uaddr) \ 25 - : "i" (-EFAULT), "0" (oparg), "1" (0)) 26 + : "i" (-EFAULT), "0" (oparg), "1" (0)); \ 27 + if (ret) \ 28 + goto label; \ 29 + *oval = oldval; \ 30 + } while(0) 26 31 27 - #define __futex_atomic_op2(insn, ret, oldval, uaddr, oparg) \ 28 - asm volatile("\t" ASM_STAC "\n" \ 29 - "1:\tmovl %2, %0\n" \ 30 - "\tmovl\t%0, %3\n" \ 32 + 33 + #define unsafe_atomic_op2(insn, oval, uaddr, oparg, label) \ 34 + do { \ 35 + int oldval = 0, ret, tem; \ 36 + asm volatile("1:\tmovl %2, %0\n" \ 37 + "2:\tmovl\t%0, %3\n" \ 31 38 "\t" insn "\n" \ 32 - "2:\t" LOCK_PREFIX "cmpxchgl %3, %2\n" \ 33 - "\tjnz\t1b\n" \ 34 - "3:\t" ASM_CLAC "\n" \ 39 + "3:\t" LOCK_PREFIX "cmpxchgl %3, %2\n" \ 40 + "\tjnz\t2b\n" \ 41 + "4:\n" \ 35 42 "\t.section .fixup,\"ax\"\n" \ 36 - "4:\tmov\t%5, %1\n" \ 37 - "\tjmp\t3b\n" \ 43 + "5:\tmov\t%5, %1\n" \ 44 + "\tjmp\t4b\n" \ 38 45 "\t.previous\n" \ 39 - _ASM_EXTABLE_UA(1b, 4b) \ 40 - _ASM_EXTABLE_UA(2b, 4b) \ 46 + _ASM_EXTABLE_UA(1b, 5b) \ 47 + _ASM_EXTABLE_UA(3b, 5b) \ 41 48 : "=&a" (oldval), "=&r" (ret), \ 42 49 "+m" (*uaddr), "=&r" (tem) \ 43 - : "r" (oparg), "i" (-EFAULT), "1" (0)) 50 + : "r" (oparg), "i" (-EFAULT), "1" (0)); \ 51 + if (ret) \ 52 + goto label; \ 53 + *oval = oldval; \ 54 + } while(0) 44 55 45 - static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, 56 + static __always_inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, 46 57 u32 
__user *uaddr) 47 58 { 48 - int oldval = 0, ret, tem; 49 - 50 - pagefault_disable(); 59 + if (!user_access_begin(uaddr, sizeof(u32))) 60 + return -EFAULT; 51 61 52 62 switch (op) { 53 63 case FUTEX_OP_SET: 54 - __futex_atomic_op1("xchgl %0, %2", ret, oldval, uaddr, oparg); 64 + unsafe_atomic_op1("xchgl %0, %2", oval, uaddr, oparg, Efault); 55 65 break; 56 66 case FUTEX_OP_ADD: 57 - __futex_atomic_op1(LOCK_PREFIX "xaddl %0, %2", ret, oldval, 58 - uaddr, oparg); 67 + unsafe_atomic_op1(LOCK_PREFIX "xaddl %0, %2", oval, 68 + uaddr, oparg, Efault); 59 69 break; 60 70 case FUTEX_OP_OR: 61 - __futex_atomic_op2("orl %4, %3", ret, oldval, uaddr, oparg); 71 + unsafe_atomic_op2("orl %4, %3", oval, uaddr, oparg, Efault); 62 72 break; 63 73 case FUTEX_OP_ANDN: 64 - __futex_atomic_op2("andl %4, %3", ret, oldval, uaddr, ~oparg); 74 + unsafe_atomic_op2("andl %4, %3", oval, uaddr, ~oparg, Efault); 65 75 break; 66 76 case FUTEX_OP_XOR: 67 - __futex_atomic_op2("xorl %4, %3", ret, oldval, uaddr, oparg); 77 + unsafe_atomic_op2("xorl %4, %3", oval, uaddr, oparg, Efault); 68 78 break; 69 79 default: 70 - ret = -ENOSYS; 80 + user_access_end(); 81 + return -ENOSYS; 71 82 } 72 - 73 - pagefault_enable(); 74 - 75 - if (!ret) 76 - *oval = oldval; 77 - 78 - return ret; 83 + user_access_end(); 84 + return 0; 85 + Efault: 86 + user_access_end(); 87 + return -EFAULT; 79 88 } 80 89 81 90 static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, 82 91 u32 oldval, u32 newval) 83 92 { 84 - return user_atomic_cmpxchg_inatomic(uval, uaddr, oldval, newval); 93 + int ret = 0; 94 + 95 + if (!user_access_begin(uaddr, sizeof(u32))) 96 + return -EFAULT; 97 + asm volatile("\n" 98 + "1:\t" LOCK_PREFIX "cmpxchgl %4, %2\n" 99 + "2:\n" 100 + "\t.section .fixup, \"ax\"\n" 101 + "3:\tmov %3, %0\n" 102 + "\tjmp 2b\n" 103 + "\t.previous\n" 104 + _ASM_EXTABLE_UA(1b, 3b) 105 + : "+r" (ret), "=a" (oldval), "+m" (*uaddr) 106 + : "i" (-EFAULT), "r" (newval), "1" (oldval) 107 + : "memory" 108 + ); 109 + 
user_access_end(); 110 + *uval = oldval; 111 + return ret; 85 112 } 86 113 87 114 #endif
-93
arch/x86/include/asm/uaccess.h
··· 584 584 unsigned long __must_check clear_user(void __user *mem, unsigned long len); 585 585 unsigned long __must_check __clear_user(void __user *mem, unsigned long len); 586 586 587 - extern void __cmpxchg_wrong_size(void) 588 - __compiletime_error("Bad argument size for cmpxchg"); 589 - 590 - #define __user_atomic_cmpxchg_inatomic(uval, ptr, old, new, size) \ 591 - ({ \ 592 - int __ret = 0; \ 593 - __typeof__(*(ptr)) __old = (old); \ 594 - __typeof__(*(ptr)) __new = (new); \ 595 - __uaccess_begin_nospec(); \ 596 - switch (size) { \ 597 - case 1: \ 598 - { \ 599 - asm volatile("\n" \ 600 - "1:\t" LOCK_PREFIX "cmpxchgb %4, %2\n" \ 601 - "2:\n" \ 602 - "\t.section .fixup, \"ax\"\n" \ 603 - "3:\tmov %3, %0\n" \ 604 - "\tjmp 2b\n" \ 605 - "\t.previous\n" \ 606 - _ASM_EXTABLE_UA(1b, 3b) \ 607 - : "+r" (__ret), "=a" (__old), "+m" (*(ptr)) \ 608 - : "i" (-EFAULT), "q" (__new), "1" (__old) \ 609 - : "memory" \ 610 - ); \ 611 - break; \ 612 - } \ 613 - case 2: \ 614 - { \ 615 - asm volatile("\n" \ 616 - "1:\t" LOCK_PREFIX "cmpxchgw %4, %2\n" \ 617 - "2:\n" \ 618 - "\t.section .fixup, \"ax\"\n" \ 619 - "3:\tmov %3, %0\n" \ 620 - "\tjmp 2b\n" \ 621 - "\t.previous\n" \ 622 - _ASM_EXTABLE_UA(1b, 3b) \ 623 - : "+r" (__ret), "=a" (__old), "+m" (*(ptr)) \ 624 - : "i" (-EFAULT), "r" (__new), "1" (__old) \ 625 - : "memory" \ 626 - ); \ 627 - break; \ 628 - } \ 629 - case 4: \ 630 - { \ 631 - asm volatile("\n" \ 632 - "1:\t" LOCK_PREFIX "cmpxchgl %4, %2\n" \ 633 - "2:\n" \ 634 - "\t.section .fixup, \"ax\"\n" \ 635 - "3:\tmov %3, %0\n" \ 636 - "\tjmp 2b\n" \ 637 - "\t.previous\n" \ 638 - _ASM_EXTABLE_UA(1b, 3b) \ 639 - : "+r" (__ret), "=a" (__old), "+m" (*(ptr)) \ 640 - : "i" (-EFAULT), "r" (__new), "1" (__old) \ 641 - : "memory" \ 642 - ); \ 643 - break; \ 644 - } \ 645 - case 8: \ 646 - { \ 647 - if (!IS_ENABLED(CONFIG_X86_64)) \ 648 - __cmpxchg_wrong_size(); \ 649 - \ 650 - asm volatile("\n" \ 651 - "1:\t" LOCK_PREFIX "cmpxchgq %4, %2\n" \ 652 - "2:\n" \ 653 - "\t.section 
.fixup, \"ax\"\n" \ 654 - "3:\tmov %3, %0\n" \ 655 - "\tjmp 2b\n" \ 656 - "\t.previous\n" \ 657 - _ASM_EXTABLE_UA(1b, 3b) \ 658 - : "+r" (__ret), "=a" (__old), "+m" (*(ptr)) \ 659 - : "i" (-EFAULT), "r" (__new), "1" (__old) \ 660 - : "memory" \ 661 - ); \ 662 - break; \ 663 - } \ 664 - default: \ 665 - __cmpxchg_wrong_size(); \ 666 - } \ 667 - __uaccess_end(); \ 668 - *(uval) = __old; \ 669 - __ret; \ 670 - }) 671 - 672 - #define user_atomic_cmpxchg_inatomic(uval, ptr, old, new) \ 673 - ({ \ 674 - access_ok((ptr), sizeof(*(ptr))) ? \ 675 - __user_atomic_cmpxchg_inatomic((uval), (ptr), \ 676 - (old), (new), sizeof(*(ptr))) : \ 677 - -EFAULT; \ 678 - }) 679 - 680 587 /* 681 588 * movsl can be slow when source and dest are not both 8-byte aligned 682 589 */
+2 -3
arch/xtensa/include/asm/futex.h
··· 72 72 #if XCHAL_HAVE_S32C1I || XCHAL_HAVE_EXCLUSIVE 73 73 int oldval = 0, ret; 74 74 75 - pagefault_disable(); 75 + if (!access_ok(uaddr, sizeof(u32))) 76 + return -EFAULT; 76 77 77 78 switch (op) { 78 79 case FUTEX_OP_SET: ··· 99 98 default: 100 99 ret = -ENOSYS; 101 100 } 102 - 103 - pagefault_enable(); 104 101 105 102 if (!ret) 106 103 *oval = oldval;
+5 -16
drivers/net/wireless/intersil/orinoco/orinoco_usb.c
···
     return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-    complete(comp);
-    complete(comp);
-    complete(comp);
-    complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
     struct ezusb_priv *upriv = ctx->upriv;
···
 
             netif_wake_queue(dev);
         }
-        ezusb_complete_all(&ctx->done);
+        complete_all(&ctx->done);
         ezusb_request_context_put(ctx);
         break;
···
         /* This is normal, as all request contexts get flushed
          * when the device is disconnected */
         err("Called, CTX not terminating, but device gone");
-        ezusb_complete_all(&ctx->done);
+        complete_all(&ctx->done);
         ezusb_request_context_put(ctx);
         break;
     }
···
          * get the chance to run themselves. So we make sure
          * that we don't sleep for ever */
         int msecs = DEF_TIMEOUT * (1000 / HZ);
-        while (!ctx->done.done && msecs--)
+
+        while (!try_wait_for_completion(&ctx->done) && msecs--)
             udelay(1000);
     } else {
-        wait_event_interruptible(ctx->done.wait,
-                                 ctx->done.done);
+        wait_for_completion(&ctx->done);
     }
     break;
 default:
+13 -9
drivers/pci/switch/switchtec.c
···
 
     enum mrpc_state state;
 
-    struct completion comp;
+    wait_queue_head_t cmd_comp;
     struct kref kref;
     struct list_head list;
 
+    bool cmd_done;
     u32 cmd;
     u32 status;
     u32 return_code;
···
     stuser->stdev = stdev;
     kref_init(&stuser->kref);
     INIT_LIST_HEAD(&stuser->list);
-    init_completion(&stuser->comp);
+    init_waitqueue_head(&stuser->cmd_comp);
     stuser->event_cnt = atomic_read(&stdev->event_cnt);
 
     dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
···
     kref_get(&stuser->kref);
     stuser->read_len = sizeof(stuser->data);
     stuser_set_state(stuser, MRPC_QUEUED);
-    init_completion(&stuser->comp);
+    stuser->cmd_done = false;
     list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
     mrpc_cmd_submit(stdev);
···
     memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
                   stuser->read_len);
 out:
-    complete_all(&stuser->comp);
+    stuser->cmd_done = true;
+    wake_up_interruptible(&stuser->cmd_comp);
     list_del_init(&stuser->list);
     stuser_put(stuser);
     stdev->mrpc_busy = 0;
···
     mutex_unlock(&stdev->mrpc_mutex);
 
     if (filp->f_flags & O_NONBLOCK) {
-        if (!try_wait_for_completion(&stuser->comp))
+        if (!stuser->cmd_done)
             return -EAGAIN;
     } else {
-        rc = wait_for_completion_interruptible(&stuser->comp);
+        rc = wait_event_interruptible(stuser->cmd_comp,
+                                      stuser->cmd_done);
         if (rc < 0)
             return rc;
     }
···
     struct switchtec_dev *stdev = stuser->stdev;
     __poll_t ret = 0;
 
-    poll_wait(filp, &stuser->comp.wait, wait);
+    poll_wait(filp, &stuser->cmd_comp, wait);
     poll_wait(filp, &stdev->event_wq, wait);
 
     if (lock_mutex_and_test_alive(stdev))
···
 
     mutex_unlock(&stdev->mrpc_mutex);
 
-    if (try_wait_for_completion(&stuser->comp))
+    if (stuser->cmd_done)
         ret |= EPOLLIN | EPOLLRDNORM;
 
     if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
···
 
     /* Wake up and kill any users waiting on an MRPC request */
     list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
-        complete_all(&stuser->comp);
+        stuser->cmd_done = true;
+        wake_up_interruptible(&stuser->cmd_comp);
         list_del_init(&stuser->list);
         stuser_put(stuser);
     }
+1
drivers/platform/x86/dell-smo8800.c
···
 #include <linux/interrupt.h>
 #include <linux/miscdevice.h>
 #include <linux/uaccess.h>
+#include <linux/fs.h>
 
 struct smo8800_device {
     u32 irq;                     /* acpi device irq */
+1
drivers/platform/x86/wmi.c
···
 #include <linux/uaccess.h>
 #include <linux/uuid.h>
 #include <linux/wmi.h>
+#include <linux/fs.h>
 #include <uapi/linux/wmi.h>
 
 ACPI_MODULE_NAME("wmi");
+1
drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
···
 #include <linux/acpi.h>
 #include <linux/uaccess.h>
 #include <linux/miscdevice.h>
+#include <linux/fs.h>
 #include "acpi_thermal_rel.h"
 
 static acpi_handle acpi_thermal_rel_handle;
+12 -12
drivers/thermal/intel/x86_pkg_temp_thermal.c
···
 /* Array of zone pointers */
 static struct zone_device **zones;
 /* Serializes interrupt notification, work and hotplug */
-static DEFINE_SPINLOCK(pkg_temp_lock);
+static DEFINE_RAW_SPINLOCK(pkg_temp_lock);
 /* Protects zone operation in the work function against hotplug removal */
 static DEFINE_MUTEX(thermal_zone_mutex);
···
     u64 msr_val, wr_val;
 
     mutex_lock(&thermal_zone_mutex);
-    spin_lock_irq(&pkg_temp_lock);
+    raw_spin_lock_irq(&pkg_temp_lock);
     ++pkg_work_cnt;
 
     zonedev = pkg_temp_thermal_get_dev(cpu);
     if (!zonedev) {
-        spin_unlock_irq(&pkg_temp_lock);
+        raw_spin_unlock_irq(&pkg_temp_lock);
         mutex_unlock(&thermal_zone_mutex);
         return;
     }
···
     }
 
     enable_pkg_thres_interrupt();
-    spin_unlock_irq(&pkg_temp_lock);
+    raw_spin_unlock_irq(&pkg_temp_lock);
 
     /*
      * If tzone is not NULL, then thermal_zone_mutex will prevent the
···
     struct zone_device *zonedev;
     unsigned long flags;
 
-    spin_lock_irqsave(&pkg_temp_lock, flags);
+    raw_spin_lock_irqsave(&pkg_temp_lock, flags);
     ++pkg_interrupt_cnt;
 
     disable_pkg_thres_interrupt();
···
         pkg_thermal_schedule_work(zonedev->cpu, &zonedev->work);
     }
 
-    spin_unlock_irqrestore(&pkg_temp_lock, flags);
+    raw_spin_unlock_irqrestore(&pkg_temp_lock, flags);
     return 0;
 }
···
             zonedev->msr_pkg_therm_high);
 
     cpumask_set_cpu(cpu, &zonedev->cpumask);
-    spin_lock_irq(&pkg_temp_lock);
+    raw_spin_lock_irq(&pkg_temp_lock);
     zones[id] = zonedev;
-    spin_unlock_irq(&pkg_temp_lock);
+    raw_spin_unlock_irq(&pkg_temp_lock);
     return 0;
 }
···
     }
 
     /* Protect against work and interrupts */
-    spin_lock_irq(&pkg_temp_lock);
+    raw_spin_lock_irq(&pkg_temp_lock);
 
     /*
      * Check whether this cpu was the current target and store the new
···
      * To cancel the work we need to drop the lock, otherwise
      * we might deadlock if the work needs to be flushed.
      */
-    spin_unlock_irq(&pkg_temp_lock);
+    raw_spin_unlock_irq(&pkg_temp_lock);
     cancel_delayed_work_sync(&zonedev->work);
-    spin_lock_irq(&pkg_temp_lock);
+    raw_spin_lock_irq(&pkg_temp_lock);
     /*
      * If this is not the last cpu in the package and the work
      * did not run after we dropped the lock above, then we
···
         pkg_thermal_schedule_work(target, &zonedev->work);
     }
 
-    spin_unlock_irq(&pkg_temp_lock);
+    raw_spin_unlock_irq(&pkg_temp_lock);
 
     /* Final cleanup if this is the last cpu */
     if (lastcpu)
+1 -1
drivers/usb/gadget/function/f_fs.c
···
     pr_info("%s(): freeing\n", __func__);
     ffs_data_clear(ffs);
     BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
-           waitqueue_active(&ffs->ep0req_completion.wait) ||
+           swait_active(&ffs->ep0req_completion.wait) ||
            waitqueue_active(&ffs->wait));
     destroy_workqueue(ffs->io_completion_wq);
     kfree(ffs->dev_name);
+2 -2
drivers/usb/gadget/legacy/inode.c
···
     spin_unlock_irq (&epdata->dev->lock);
 
     if (likely (value == 0)) {
-        value = wait_event_interruptible (done.wait, done.done);
+        value = wait_for_completion_interruptible(&done);
         if (value != 0) {
             spin_lock_irq (&epdata->dev->lock);
             if (likely (epdata->ep != NULL)) {
···
                 usb_ep_dequeue (epdata->ep, epdata->req);
                 spin_unlock_irq (&epdata->dev->lock);
 
-                wait_event (done.wait, done.done);
+                wait_for_completion(&done);
                 if (epdata->status == -ECONNRESET)
                     epdata->status = -EINTR;
             } else {
+7 -12
fs/buffer.c
···
      * decide that the page is now completely done.
      */
     first = page_buffers(page);
-    local_irq_save(flags);
-    bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+    spin_lock_irqsave(&first->b_uptodate_lock, flags);
     clear_buffer_async_read(bh);
     unlock_buffer(bh);
     tmp = bh;
···
         }
         tmp = tmp->b_this_page;
     } while (tmp != bh);
-    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
 
     /*
      * If none of the buffers had errors and they are all
···
     return;
 
 still_busy:
-    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
     return;
 }
···
     }
 
     first = page_buffers(page);
-    local_irq_save(flags);
-    bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+    spin_lock_irqsave(&first->b_uptodate_lock, flags);
 
     clear_buffer_async_write(bh);
     unlock_buffer(bh);
···
         }
         tmp = tmp->b_this_page;
     }
-    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
     end_page_writeback(page);
     return;
 
 still_busy:
-    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
     return;
 }
 EXPORT_SYMBOL(end_buffer_async_write);
···
     struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
     if (ret) {
         INIT_LIST_HEAD(&ret->b_assoc_buffers);
+        spin_lock_init(&ret->b_uptodate_lock);
         preempt_disable();
         __this_cpu_inc(bh_accounting.nr);
         recalc_bh_state();
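The hunk above retires a bit spinlock (one bit of `b_state` doubling as a lock, with interrupts disabled around it by hand) in favour of a dedicated `spinlock_t`, which behaves properly on PREEMPT_RT. As a rough userspace illustration of what a bit spinlock is, here is a sketch using C11 atomics; all names are invented for the example and this is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a bit spinlock: one bit of an existing
 * word serves as the lock, so no separate lock field is needed -- the
 * space saving that the patch above deliberately gives up. */
static void bit_spin_lock_model(int bit, _Atomic unsigned long *word)
{
	unsigned long mask = 1UL << bit;

	/* Spin until we are the thread that flipped the bit from 0 to 1. */
	while (atomic_fetch_or(word, mask) & mask)
		;
}

static void bit_spin_unlock_model(int bit, _Atomic unsigned long *word)
{
	atomic_fetch_and(word, ~(1UL << bit));
}

static bool bit_is_locked_model(int bit, _Atomic unsigned long *word)
{
	return atomic_load(word) & (1UL << bit);
}
```

Because the lock is just a bit, there is no storage for a lockdep map or an RT-aware wait mechanism, which is why such locks get converted to real spinlocks in this series.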
+3 -5
fs/ext4/page-io.c
···
     }
     bh = head = page_buffers(page);
     /*
-     * We check all buffers in the page under BH_Uptodate_Lock
+     * We check all buffers in the page under b_uptodate_lock
      * to avoid races with other end io clearing async_write flags
      */
-    local_irq_save(flags);
-    bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
+    spin_lock_irqsave(&head->b_uptodate_lock, flags);
     do {
         if (bh_offset(bh) < bio_start ||
             bh_offset(bh) + bh->b_size > bio_end) {
···
         if (bio->bi_status)
             buffer_io_error(bh);
     } while ((bh = bh->b_this_page) != head);
-    bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&head->b_uptodate_lock, flags);
     if (!under_io) {
         fscrypt_free_bounce_page(bounce_page);
         end_page_writeback(page);
+3 -6
fs/ntfs/aops.c
···
             "0x%llx.", (unsigned long long)bh->b_blocknr);
     }
     first = page_buffers(page);
-    local_irq_save(flags);
-    bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+    spin_lock_irqsave(&first->b_uptodate_lock, flags);
     clear_buffer_async_read(bh);
     unlock_buffer(bh);
     tmp = bh;
···
         }
         tmp = tmp->b_this_page;
     } while (tmp != bh);
-    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
     /*
      * If none of the buffers had errors then we can set the page uptodate,
      * but we first have to perform the post read mst fixups, if the
···
     unlock_page(page);
     return;
 still_busy:
-    bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-    local_irq_restore(flags);
+    spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
     return;
 }
+1 -1
include/acpi/acpi_bus.h
···
 
 #ifdef CONFIG_ACPI
 
-#include <linux/proc_fs.h>
+struct proc_dir_entry;
 
 #define ACPI_BUS_FILE_ROOT	"acpi"
 extern struct proc_dir_entry *acpi_root_dir;
+3 -2
include/asm-generic/bitops.h
···
 
 /*
  * For the benefit of those who are trying to port Linux to another
- * architecture, here are some C-language equivalents.  You should
- * recode these in the native assembly language, if at all possible.
+ * architecture, here are some C-language equivalents.  They should
+ * generate reasonable code, so take a look at what your compiler spits
+ * out before rolling your own buggy implementation in assembly language.
  *
  * C language equivalents written by Theodore Ts'o, 9/26/92
  */
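The reworded comment argues that the generic C bitops usually compile well enough that hand-written assembly is not worth it. A minimal userspace sketch of what such a C-language equivalent looks like (non-atomic variants; helper names invented for the example, not the kernel's):

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG_MODEL (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic set_bit/test_bit in plain C, in the spirit of
 * include/asm-generic/bitops.h: index into an array of longs,
 * then mask the bit within the selected word. */
static void set_bit_model(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG_MODEL] |= 1UL << (nr % BITS_PER_LONG_MODEL);
}

static bool test_bit_model(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG_MODEL] >> (nr % BITS_PER_LONG_MODEL)) & 1;
}
```

A modern compiler typically turns the shift-and-mask pattern into the architecture's native bit instructions, which is the comment's point.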
-2
include/asm-generic/futex.h
···
     u32 tmp;
 
     preempt_disable();
-    pagefault_disable();
 
     ret = -EFAULT;
     if (unlikely(get_user(oldval, uaddr) != 0))
···
         ret = -EFAULT;
 
 out_pagefault_enable:
-    pagefault_enable();
     preempt_enable();
 
     if (ret == 0)
+3 -3
include/linux/buffer_head.h
···
     BH_Dirty,	/* Is dirty */
     BH_Lock,	/* Is locked */
     BH_Req,		/* Has been submitted for I/O */
-    BH_Uptodate_Lock,/* Used by the first bh in a page, to serialise
-                      * IO completion of other buffers in the page
-                      */
 
     BH_Mapped,	/* Has a disk mapping */
     BH_New,		/* Disk mapping was newly created by get_block */
···
     struct address_space *b_assoc_map;	/* mapping this buffer is
                                            associated with */
     atomic_t b_count;		/* users using this buffer_head */
+    spinlock_t b_uptodate_lock;	/* Used by the first bh in a page, to
+                                 * serialise IO completion of other
+                                 * buffers in the page */
 };
 
 /*
+4 -4
include/linux/completion.h
···
  * See kernel/sched/completion.c for details.
  */
 
-#include <linux/wait.h>
+#include <linux/swait.h>
 
 /*
  * struct completion - structure used to maintain state for a "completion"
···
  */
 struct completion {
     unsigned int done;
-    wait_queue_head_t wait;
+    struct swait_queue_head wait;
 };
 
 #define init_completion_map(x, m) __init_completion(x)
···
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-    { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+    { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
     (*({ init_completion_map(&(work), &(map)); &(work); }))
···
 static inline void __init_completion(struct completion *x)
 {
     x->done = 0;
-    init_waitqueue_head(&x->wait);
+    init_swait_queue_head(&x->wait);
 }
 
 /**
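Switching completions to simple waitqueues is what forces the driver cleanups earlier in this series: callers may no longer peek at `done.done` or `done.wait` and must go through the completion API. The API's behaviour rests on `done` being a counter with `UINT_MAX` reserved for "completed forever". A single-threaded userspace sketch of just that counter logic, with the locking and waitqueue omitted (names are invented; see kernel/sched/completion.c for the real code):

```c
#include <limits.h>
#include <stdbool.h>

struct completion_model {
	unsigned int done;	/* UINT_MAX means complete_all() was called */
};

static void complete_model(struct completion_model *x)
{
	if (x->done != UINT_MAX)
		x->done++;
}

static void complete_all_model(struct completion_model *x)
{
	x->done = UINT_MAX;
}

/* Mirrors the spirit of try_wait_for_completion(): consume one
 * pending "done" event, if any, without blocking. */
static bool try_wait_model(struct completion_model *x)
{
	if (!x->done)
		return false;
	if (x->done != UINT_MAX)
		x->done--;
	return true;
}
```

This is why `complete_all()` lets every subsequent waiter through while a single `complete()` releases exactly one.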
+2
include/linux/irq_work.h
···
 
 /* Doesn't want IPI, wait for tick: */
 #define IRQ_WORK_LAZY		BIT(2)
+/* Run hard IRQ context, even on RT */
+#define IRQ_WORK_HARD_IRQ	BIT(3)
 
 #define IRQ_WORK_CLAIMED	(IRQ_WORK_PENDING | IRQ_WORK_BUSY)
+47 -1
include/linux/irqflags.h
···
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define trace_hardirq_enter()			\
 do {						\
-	current->hardirq_context++;		\
+	if (!current->hardirq_context++)	\
+		current->hardirq_threaded = 0;	\
+} while (0)
+# define trace_hardirq_threaded()		\
+do {						\
+	current->hardirq_threaded = 1;		\
 } while (0)
 # define trace_hardirq_exit()			\
 do {						\
···
 do {						\
 	current->softirq_context--;		\
 } while (0)
+
+# define lockdep_hrtimer_enter(__hrtimer)	\
+do {						\
+	if (!__hrtimer->is_hard)		\
+		current->irq_config = 1;	\
+} while (0)
+
+# define lockdep_hrtimer_exit(__hrtimer)	\
+do {						\
+	if (!__hrtimer->is_hard)		\
+		current->irq_config = 0;	\
+} while (0)
+
+# define lockdep_posixtimer_enter()		\
+do {						\
+	current->irq_config = 1;		\
+} while (0)
+
+# define lockdep_posixtimer_exit()		\
+do {						\
+	current->irq_config = 0;		\
+} while (0)
+
+# define lockdep_irq_work_enter(__work)					\
+do {									\
+	if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))		\
+		current->irq_config = 1;				\
+} while (0)
+# define lockdep_irq_work_exit(__work)					\
+do {									\
+	if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))		\
+		current->irq_config = 0;				\
+} while (0)
+
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
···
 # define trace_hardirqs_enabled(p)	0
 # define trace_softirqs_enabled(p)	0
 # define trace_hardirq_enter()		do { } while (0)
+# define trace_hardirq_threaded()	do { } while (0)
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
+# define lockdep_hrtimer_enter(__hrtimer)	do { } while (0)
+# define lockdep_hrtimer_exit(__hrtimer)	do { } while (0)
+# define lockdep_posixtimer_enter()		do { } while (0)
+# define lockdep_posixtimer_exit()		do { } while (0)
+# define lockdep_irq_work_enter(__work)		do { } while (0)
+# define lockdep_irq_work_exit(__work)		do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
+75 -11
include/linux/lockdep.h
···
 
 #include <linux/types.h>
 
+enum lockdep_wait_type {
+	LD_WAIT_INV = 0,	/* not checked, catch all */
+
+	LD_WAIT_FREE,		/* wait free, rcu etc.. */
+	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
+
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
+#else
+	LD_WAIT_CONFIG = LD_WAIT_SPIN,
+#endif
+	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */
+
+	LD_WAIT_MAX,		/* must be last */
+};
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
···
 	int				name_version;
 	const char			*name;
 
+	short				wait_type_inner;
+	short				wait_type_outer;
+
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
 	unsigned long			contending_point[LOCKSTAT_POINTS];
···
 	struct lock_class_key		*key;
 	struct lock_class		*class_cache[NR_LOCKDEP_CACHING_CLASSES];
 	const char			*name;
+	short				wait_type_outer; /* can be taken in this context */
+	short				wait_type_inner; /* presents this context */
 #ifdef CONFIG_LOCK_STAT
 	int				cpu;
 	unsigned long			ip;
···
  * to lockdep:
  */
 
-extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
-			     struct lock_class_key *key, int subclass);
+extern void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+	struct lock_class_key *key, int subclass, short inner, short outer);
+
+static inline void
+lockdep_init_map_wait(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass, short inner)
+{
+	lockdep_init_map_waits(lock, name, key, subclass, inner, LD_WAIT_INV);
+}
+
+static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
+			     struct lock_class_key *key, int subclass)
+{
+	lockdep_init_map_wait(lock, name, key, subclass, LD_WAIT_INV);
+}
 
 /*
  * Reinitialize a lock key - for cases where there is special locking or
···
  * of dependencies wrong: they are either too broad (they need a class-split)
  * or they are too narrow (they suffer from a false class-split):
  */
-#define lockdep_set_class(lock, key) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, 0)
-#define lockdep_set_class_and_name(lock, key, name) \
-		lockdep_init_map(&(lock)->dep_map, name, key, 0)
-#define lockdep_set_class_and_subclass(lock, key, sub) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, sub)
-#define lockdep_set_subclass(lock, sub)	\
-		lockdep_init_map(&(lock)->dep_map, #lock, \
-				 (lock)->dep_map.key, sub)
+#define lockdep_set_class(lock, key)				\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_name(lock, key, name)		\
+	lockdep_init_map_waits(&(lock)->dep_map, name, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_subclass(lock, key, sub)		\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_subclass(lock, sub)					\
+	lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+			       (lock)->dep_map.wait_type_inner,		\
+			       (lock)->dep_map.wait_type_outer)
 
 #define lockdep_set_novalidate_class(lock) \
 	lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock)
+
 /*
  * Compare locking classes
  */
···
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
 # define lockdep_init()				do { } while (0)
+# define lockdep_init_map_waits(lock, name, key, sub, inner, outer) \
+		do { (void)(name); (void)(key); } while (0)
+# define lockdep_init_map_wait(lock, name, key, sub, inner) \
+		do { (void)(name); (void)(key); } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
···
 # define lockdep_assert_irqs_enabled()	do { } while (0)
 # define lockdep_assert_irqs_disabled()	do { } while (0)
 # define lockdep_assert_in_irq()	do { } while (0)
+#endif
+
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+
+# define lockdep_assert_RT_in_threaded_ctx() do {			\
+		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
+			  current->hardirq_context &&			\
+			  !(current->hardirq_threaded || current->irq_config), \
+			  "Not in threaded context on PREEMPT_RT as expected\n"); \
+} while (0)
+
+#else
+
+# define lockdep_assert_RT_in_threaded_ctx() do { } while (0)
+
 #endif
 
 #ifdef CONFIG_LOCKDEP
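The new wait types form an ordering (LD_WAIT_FREE < LD_WAIT_SPIN < LD_WAIT_CONFIG < LD_WAIT_SLEEP), and CONFIG_PROVE_RAW_LOCK_NESTING complains when a lock with a larger wait type is acquired while holding one with a smaller type: on PREEMPT_RT a spinlock_t sleeps, so taking it inside a raw_spinlock_t would sleep in a non-sleepable context. A hypothetical userspace sketch of the core of that check (the real validation lives in kernel/locking/lockdep.c and is considerably more involved; names here are invented):

```c
#include <stdbool.h>

enum ld_wait_model {
	LD_WAIT_INV_M = 0,	/* not checked */
	LD_WAIT_FREE_M,		/* e.g. RCU */
	LD_WAIT_SPIN_M,		/* raw_spinlock_t: always spins */
	LD_WAIT_CONFIG_M,	/* spinlock_t: sleeps on PREEMPT_RT */
	LD_WAIT_SLEEP_M,	/* mutex, rwsem: always sleeps */
};

/* While holding a lock of inner wait type @held, acquiring a lock
 * whose inner wait type is larger would mean sleeping (on RT) inside
 * a context that must not sleep. */
static bool nesting_ok_model(enum ld_wait_model held, enum ld_wait_model acquire)
{
	if (held == LD_WAIT_INV_M || acquire == LD_WAIT_INV_M)
		return true;	/* unchecked, catch all */
	return acquire <= held;
}
```

So mutex-inside-raw_spinlock and spinlock_t-inside-raw_spinlock are flagged, while the reverse nestings remain valid.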
+5 -2
include/linux/mutex.h
···
 } while (0)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
-		, .dep_map = { .name = #lockname }
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
+		, .dep_map = {					\
+			.name = #lockname,			\
+			.wait_type_inner = LD_WAIT_SLEEP,	\
+		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
 #endif
+49 -34
include/linux/percpu-rwsem.h
···
 #define _LINUX_PERCPU_RWSEM_H
 
 #include <linux/atomic.h>
-#include <linux/rwsem.h>
 #include <linux/percpu.h>
 #include <linux/rcuwait.h>
+#include <linux/wait.h>
 #include <linux/rcu_sync.h>
 #include <linux/lockdep.h>
 
 struct percpu_rw_semaphore {
 	struct rcu_sync		rss;
 	unsigned int __percpu	*read_count;
-	struct rw_semaphore	rw_sem; /* slowpath */
-	struct rcuwait          writer; /* blocked writer */
-	int			readers_block;
+	struct rcuwait		writer;
+	wait_queue_head_t	waiters;
+	atomic_t		block;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	dep_map;
+#endif
 };
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname },
+#else
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
+#endif
 
 #define __DEFINE_PERCPU_RWSEM(name, is_static)				\
 static DEFINE_PER_CPU(unsigned int, __percpu_rwsem_rc_##name);		\
 is_static struct percpu_rw_semaphore name = {				\
 	.rss = __RCU_SYNC_INITIALIZER(name.rss),			\
 	.read_count = &__percpu_rwsem_rc_##name,			\
-	.rw_sem = __RWSEM_INITIALIZER(name.rw_sem),			\
 	.writer = __RCUWAIT_INITIALIZER(name.writer),			\
+	.waiters = __WAIT_QUEUE_HEAD_INITIALIZER(name.waiters),		\
+	.block = ATOMIC_INIT(0),					\
+	__PERCPU_RWSEM_DEP_MAP_INIT(name)				\
 }
+
 #define DEFINE_PERCPU_RWSEM(name)		\
 	__DEFINE_PERCPU_RWSEM(name, /* not static */)
 #define DEFINE_STATIC_PERCPU_RWSEM(name)	\
 	__DEFINE_PERCPU_RWSEM(name, static)
 
-extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
-extern void __percpu_up_read(struct percpu_rw_semaphore *);
+extern bool __percpu_down_read(struct percpu_rw_semaphore *, bool);
 
 static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
 {
 	might_sleep();
 
-	rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
+	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	preempt_disable();
 	/*
···
 	 * and that once the synchronize_rcu() is done, the writer will see
 	 * anything we did within this RCU-sched read-size critical section.
 	 */
-	__this_cpu_inc(*sem->read_count);
-	if (unlikely(!rcu_sync_is_idle(&sem->rss)))
+	if (likely(rcu_sync_is_idle(&sem->rss)))
+		__this_cpu_inc(*sem->read_count);
+	else
 		__percpu_down_read(sem, false); /* Unconditional memory barrier */
 	/*
 	 * The preempt_enable() prevents the compiler from
···
 	preempt_enable();
 }
 
-static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
+static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 {
-	int ret = 1;
+	bool ret = true;
 
 	preempt_disable();
 	/*
 	 * Same as in percpu_down_read().
 	 */
-	__this_cpu_inc(*sem->read_count);
-	if (unlikely(!rcu_sync_is_idle(&sem->rss)))
+	if (likely(rcu_sync_is_idle(&sem->rss)))
+		__this_cpu_inc(*sem->read_count);
+	else
 		ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */
 	preempt_enable();
 	/*
···
 	 */
 
 	if (ret)
-		rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_);
+		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
 
 	return ret;
 }
 
 static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
 {
+	rwsem_release(&sem->dep_map, _RET_IP_);
+
 	preempt_disable();
 	/*
 	 * Same as in percpu_down_read().
 	 */
-	if (likely(rcu_sync_is_idle(&sem->rss)))
+	if (likely(rcu_sync_is_idle(&sem->rss))) {
 		__this_cpu_dec(*sem->read_count);
-	else
-		__percpu_up_read(sem); /* Unconditional memory barrier */
+	} else {
+		/*
+		 * slowpath; reader will only ever wake a single blocked
+		 * writer.
+		 */
+		smp_mb(); /* B matches C */
+		/*
+		 * In other words, if they see our decrement (presumably to
+		 * aggregate zero, as that is the only time it matters) they
+		 * will also see our critical section.
+		 */
+		__this_cpu_dec(*sem->read_count);
+		rcuwait_wake_up(&sem->writer);
+	}
 	preempt_enable();
-
-	rwsem_release(&sem->rw_sem.dep_map, _RET_IP_);
 }
 
 extern void percpu_down_write(struct percpu_rw_semaphore *);
···
 	__percpu_init_rwsem(sem, #sem, &rwsem_key);		\
 })
 
-#define percpu_rwsem_is_held(sem) lockdep_is_held(&(sem)->rw_sem)
-
-#define percpu_rwsem_assert_held(sem)				\
-	lockdep_assert_held(&(sem)->rw_sem)
+#define percpu_rwsem_is_held(sem)	lockdep_is_held(sem)
+#define percpu_rwsem_assert_held(sem)	lockdep_assert_held(sem)
 
 static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
 					bool read, unsigned long ip)
 {
-	lock_release(&sem->rw_sem.dep_map, ip);
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-	if (!read)
-		atomic_long_set(&sem->rw_sem.owner, RWSEM_OWNER_UNKNOWN);
-#endif
+	lock_release(&sem->dep_map, ip);
 }
 
 static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
 					bool read, unsigned long ip)
 {
-	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-	if (!read)
-		atomic_long_set(&sem->rw_sem.owner, (long)current);
-#endif
+	lock_acquire(&sem->dep_map, 0, 1, read, 1, NULL, ip);
 }
 
 #endif
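The reader-side rewrite above keeps the per-CPU counter increment inside the `rcu_sync_is_idle()` branch, so the fast path is just an increment while the slow path owns all counter and wakeup handling. A much-simplified, single-threaded userspace model of that fast/slow split (all names invented; the real code depends on RCU, per-CPU ops, and memory barriers that a toy model cannot capture):

```c
#include <stdbool.h>

struct pcpu_rwsem_model {
	int read_count;		/* stands in for the per-CPU reader counter */
	bool writer_active;	/* stands in for !rcu_sync_is_idle() + block */
};

/* Fast path: no writer anywhere, just bump the counter and go.
 * Slow path: modelled here as simply failing the trylock, where the
 * kernel would instead fall into __percpu_down_read(). */
static bool down_read_trylock_model(struct pcpu_rwsem_model *sem)
{
	if (!sem->writer_active) {
		sem->read_count++;
		return true;
	}
	return false;
}

static void up_read_model(struct pcpu_rwsem_model *sem)
{
	sem->read_count--;	/* kernel: also wake a waiting writer */
}
```

The point of the structure is that the common (no-writer) case touches only CPU-local state, which is what makes percpu-rwsem readers nearly free.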
+10 -2
include/linux/rcuwait.h
···
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
···
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
+									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
 									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
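`rcuwait_wait_event()` now takes the task state as an argument and bails out with -EINTR when a signal is pending in an interruptible state. A hypothetical userspace model of just that per-iteration control flow, without RCU or scheduling (names and the error constant are invented for the example):

```c
#include <stdbool.h>

#define EINTR_MODEL 4	/* stand-in for the kernel's EINTR */

/* One iteration of the rcuwait_wait_event() loop body: returns 1 when
 * the condition is met, -EINTR_MODEL when waiting interruptibly with a
 * signal pending, and 0 when the caller should "schedule()" and retry. */
static int wait_step_model(bool condition, bool interruptible,
			   bool signal_pending)
{
	if (condition)
		return 1;
	if (interruptible && signal_pending)
		return -EINTR_MODEL;
	return 0;
}
```

Note the ordering: the condition is always checked before the signal, so a wait whose condition is already true succeeds even if a signal is pending, matching the macro above.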
+5 -1
include/linux/rwlock_types.h
···
 #define RWLOCK_MAGIC		0xdeaf1eed
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define RW_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RW_DEP_MAP_INIT(lockname)			\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_CONFIG,	\
+	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
 #endif
+5 -7
include/linux/rwsem.h
···
 #endif
 };
 
-/*
- * Setting all bits of the owner field except bit 0 will indicate
- * that the rwsem is writer-owned with an unknown owner.
- */
-#define RWSEM_OWNER_UNKNOWN	(-2L)
-
 /* In all implementations count != 0 means locked */
 static inline int rwsem_is_locked(struct rw_semaphore *sem)
 {
···
 /* Common initializer macros and functions */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+# define __RWSEM_DEP_MAP_INIT(lockname)			\
+	, .dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SLEEP,	\
+	}
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
 #endif
+2
include/linux/sched.h
···
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
+	unsigned int			hardirq_threaded;
 	unsigned long			hardirq_enable_ip;
 	unsigned long			hardirq_disable_ip;
 	unsigned int			hardirq_enable_event;
···
 	unsigned int			softirq_enable_event;
 	int				softirqs_enabled;
 	int				softirq_context;
+	int				irq_config;
 #endif
 
 #ifdef CONFIG_LOCKDEP
+25 -10
include/linux/spinlock.h
···
 
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-				   struct lock_class_key *key);
-# define raw_spin_lock_init(lock)				\
-do {								\
-	static struct lock_class_key __key;			\
-								\
-	__raw_spin_lock_init((lock), #lock, &__key);		\
+				   struct lock_class_key *key, short inner);
+
+# define raw_spin_lock_init(lock)					\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN);	\
 } while (0)
 
 #else
···
 	return &lock->rlock;
 }
 
-#define spin_lock_init(_lock)				\
-do {							\
-	spinlock_check(_lock);				\
-	raw_spin_lock_init(&(_lock)->rlock);		\
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+# define spin_lock_init(lock)					\
+do {								\
+	static struct lock_class_key __key;			\
+								\
+	__raw_spin_lock_init(spinlock_check(lock),		\
+			     #lock, &__key, LD_WAIT_CONFIG);	\
 } while (0)
+
+#else
+
+# define spin_lock_init(_lock)			\
+do {						\
+	spinlock_check(_lock);			\
+	*(_lock) = __SPIN_LOCK_UNLOCKED(_lock);	\
+} while (0)
+
+#endif
 
 static __always_inline void spin_lock(spinlock_t *lock)
 {
+20 -4
include/linux/spinlock_types.h
··· 33 33 #define SPINLOCK_OWNER_INIT ((void *)-1L) 34 34 35 35 #ifdef CONFIG_DEBUG_LOCK_ALLOC 36 - # define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } 36 + # define RAW_SPIN_DEP_MAP_INIT(lockname) \ 37 + .dep_map = { \ 38 + .name = #lockname, \ 39 + .wait_type_inner = LD_WAIT_SPIN, \ 40 + } 41 + # define SPIN_DEP_MAP_INIT(lockname) \ 42 + .dep_map = { \ 43 + .name = #lockname, \ 44 + .wait_type_inner = LD_WAIT_CONFIG, \ 45 + } 37 46 #else 47 + # define RAW_SPIN_DEP_MAP_INIT(lockname) 38 48 # define SPIN_DEP_MAP_INIT(lockname) 39 49 #endif 40 50 ··· 61 51 { \ 62 52 .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ 63 53 SPIN_DEBUG_INIT(lockname) \ 64 - SPIN_DEP_MAP_INIT(lockname) } 54 + RAW_SPIN_DEP_MAP_INIT(lockname) } 65 55 66 56 #define __RAW_SPIN_LOCK_UNLOCKED(lockname) \ 67 57 (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname) ··· 82 72 }; 83 73 } spinlock_t; 84 74 75 + #define ___SPIN_LOCK_INITIALIZER(lockname) \ 76 + { \ 77 + .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ 78 + SPIN_DEBUG_INIT(lockname) \ 79 + SPIN_DEP_MAP_INIT(lockname) } 80 + 85 81 #define __SPIN_LOCK_INITIALIZER(lockname) \ 86 - { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } } 82 + { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } } 87 83 88 84 #define __SPIN_LOCK_UNLOCKED(lockname) \ 89 - (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname) 85 + (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname) 90 86 91 87 #define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x) 92 88
+1
include/linux/wait.h
··· 20 20 #define WQ_FLAG_EXCLUSIVE 0x01 21 21 #define WQ_FLAG_WOKEN 0x02 22 22 #define WQ_FLAG_BOOKMARK 0x04 23 + #define WQ_FLAG_CUSTOM 0x08 23 24 24 25 /* 25 26 * A single wait-queue entry structure:
+2 -2
kernel/cpu.c
··· 331 331 332 332 static void lockdep_acquire_cpus_lock(void) 333 333 { 334 - rwsem_acquire(&cpu_hotplug_lock.rw_sem.dep_map, 0, 0, _THIS_IP_); 334 + rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_); 335 335 } 336 336 337 337 static void lockdep_release_cpus_lock(void) 338 338 { 339 - rwsem_release(&cpu_hotplug_lock.rw_sem.dep_map, _THIS_IP_); 339 + rwsem_release(&cpu_hotplug_lock.dep_map, _THIS_IP_); 340 340 } 341 341 342 342 /*
+1
kernel/exit.c
··· 258 258 wake_up_process(task); 259 259 rcu_read_unlock(); 260 260 } 261 + EXPORT_SYMBOL_GPL(rcuwait_wake_up); 261 262 262 263 /* 263 264 * Determine if a process group is "orphaned", according to the POSIX
+8 -99
kernel/futex.c
··· 135 135 * 136 136 * Where (A) orders the waiters increment and the futex value read through 137 137 * atomic operations (see hb_waiters_inc) and where (B) orders the write 138 - * to futex and the waiters read -- this is done by the barriers for both 139 - * shared and private futexes in get_futex_key_refs(). 138 + * to futex and the waiters read (see hb_waiters_pending()). 140 139 * 141 140 * This yields the following case (where X:=waiters, Y:=futex): 142 141 * ··· 330 331 static inline void compat_exit_robust_list(struct task_struct *curr) { } 331 332 #endif 332 333 333 - static inline void futex_get_mm(union futex_key *key) 334 - { 335 - mmgrab(key->private.mm); 336 - /* 337 - * Ensure futex_get_mm() implies a full barrier such that 338 - * get_futex_key() implies a full barrier. This is relied upon 339 - * as smp_mb(); (B), see the ordering comment above. 340 - */ 341 - smp_mb__after_atomic(); 342 - } 343 - 344 334 /* 345 335 * Reflects a new waiter being added to the waitqueue. 346 336 */ ··· 358 370 static inline int hb_waiters_pending(struct futex_hash_bucket *hb) 359 371 { 360 372 #ifdef CONFIG_SMP 373 + /* 374 + * Full barrier (B), see the ordering comment above. 375 + */ 376 + smp_mb(); 361 377 return atomic_read(&hb->waiters); 362 378 #else 363 379 return 1; ··· 397 405 && key1->both.word == key2->both.word 398 406 && key1->both.ptr == key2->both.ptr 399 407 && key1->both.offset == key2->both.offset); 400 - } 401 - 402 - /* 403 - * Take a reference to the resource addressed by a key. 404 - * Can be called while holding spinlocks. 405 - * 406 - */ 407 - static void get_futex_key_refs(union futex_key *key) 408 - { 409 - if (!key->both.ptr) 410 - return; 411 - 412 - /* 413 - * On MMU less systems futexes are always "private" as there is no per 414 - * process address space. We need the smp wmb nevertheless - yes, 415 - * arch/blackfin has MMU less SMP ... 
416 - */ 417 - if (!IS_ENABLED(CONFIG_MMU)) { 418 - smp_mb(); /* explicit smp_mb(); (B) */ 419 - return; 420 - } 421 - 422 - switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) { 423 - case FUT_OFF_INODE: 424 - smp_mb(); /* explicit smp_mb(); (B) */ 425 - break; 426 - case FUT_OFF_MMSHARED: 427 - futex_get_mm(key); /* implies smp_mb(); (B) */ 428 - break; 429 - default: 430 - /* 431 - * Private futexes do not hold reference on an inode or 432 - * mm, therefore the only purpose of calling get_futex_key_refs 433 - * is because we need the barrier for the lockless waiter check. 434 - */ 435 - smp_mb(); /* explicit smp_mb(); (B) */ 436 - } 437 - } 438 - 439 - /* 440 - * Drop a reference to the resource addressed by a key. 441 - * The hash bucket spinlock must not be held. This is 442 - * a no-op for private futexes, see comment in the get 443 - * counterpart. 444 - */ 445 - static void drop_futex_key_refs(union futex_key *key) 446 - { 447 - if (!key->both.ptr) { 448 - /* If we're here then we tried to put a key we failed to get */ 449 - WARN_ON_ONCE(1); 450 - return; 451 - } 452 - 453 - if (!IS_ENABLED(CONFIG_MMU)) 454 - return; 455 - 456 - switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) { 457 - case FUT_OFF_INODE: 458 - break; 459 - case FUT_OFF_MMSHARED: 460 - mmdrop(key->private.mm); 461 - break; 462 - } 463 408 } 464 409 465 410 enum futex_access { ··· 530 601 if (!fshared) { 531 602 key->private.mm = mm; 532 603 key->private.address = address; 533 - get_futex_key_refs(key); /* implies smp_mb(); (B) */ 534 604 return 0; 535 605 } 536 606 ··· 669 741 rcu_read_unlock(); 670 742 } 671 743 672 - get_futex_key_refs(key); /* implies smp_mb(); (B) */ 673 - 674 744 out: 675 745 put_page(page); 676 746 return err; ··· 676 750 677 751 static inline void put_futex_key(union futex_key *key) 678 752 { 679 - drop_futex_key_refs(key); 680 753 } 681 754 682 755 /** ··· 1665 1740 oparg = 1 << oparg; 1666 1741 } 1667 1742 1668 - if (!access_ok(uaddr, 
sizeof(u32))) 1669 - return -EFAULT; 1670 - 1743 + pagefault_disable(); 1671 1744 ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr); 1745 + pagefault_enable(); 1672 1746 if (ret) 1673 1747 return ret; 1674 1748 ··· 1809 1885 plist_add(&q->list, &hb2->chain); 1810 1886 q->lock_ptr = &hb2->lock; 1811 1887 } 1812 - get_futex_key_refs(key2); 1813 1888 q->key = *key2; 1814 1889 } 1815 1890 ··· 1830 1907 void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key, 1831 1908 struct futex_hash_bucket *hb) 1832 1909 { 1833 - get_futex_key_refs(key); 1834 1910 q->key = *key; 1835 1911 1836 1912 __unqueue_futex(q); ··· 1940 2018 u32 *cmpval, int requeue_pi) 1941 2019 { 1942 2020 union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT; 1943 - int drop_count = 0, task_count = 0, ret; 2021 + int task_count = 0, ret; 1944 2022 struct futex_pi_state *pi_state = NULL; 1945 2023 struct futex_hash_bucket *hb1, *hb2; 1946 2024 struct futex_q *this, *next; ··· 2061 2139 */ 2062 2140 if (ret > 0) { 2063 2141 WARN_ON(pi_state); 2064 - drop_count++; 2065 2142 task_count++; 2066 2143 /* 2067 2144 * If we acquired the lock, then the user space value ··· 2180 2259 * doing so. 2181 2260 */ 2182 2261 requeue_pi_wake_futex(this, &key2, hb2); 2183 - drop_count++; 2184 2262 continue; 2185 2263 } else if (ret) { 2186 2264 /* ··· 2200 2280 } 2201 2281 } 2202 2282 requeue_futex(this, hb1, hb2, &key2); 2203 - drop_count++; 2204 2283 } 2205 2284 2206 2285 /* ··· 2213 2294 double_unlock_hb(hb1, hb2); 2214 2295 wake_up_q(&wake_q); 2215 2296 hb_waiters_dec(hb2); 2216 - 2217 - /* 2218 - * drop_futex_key_refs() must be called outside the spinlocks. During 2219 - * the requeue we moved futex_q's from the hash bucket at key1 to the 2220 - * one at key2 and updated their key pointer. We no longer need to 2221 - * hold the references to key1. 
2222 - */ 2223 - while (--drop_count >= 0) 2224 - drop_futex_key_refs(&key1); 2225 2297 2226 2298 out_put_keys: 2227 2299 put_futex_key(&key2); ··· 2343 2433 ret = 1; 2344 2434 } 2345 2435 2346 - drop_futex_key_refs(&q->key); 2347 2436 return ret; 2348 2437 } 2349 2438
+7
kernel/irq/handle.c
··· 145 145 for_each_action_of_desc(desc, action) { 146 146 irqreturn_t res; 147 147 148 + /* 149 + * If this IRQ would be threaded under force_irqthreads, mark it so. 150 + */ 151 + if (irq_settings_can_thread(desc) && 152 + !(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))) 153 + trace_hardirq_threaded(); 154 + 148 155 trace_irq_handler_entry(irq, action); 149 156 res = action->handler(irq, action->dev_id); 150 157 trace_irq_handler_exit(irq, action, res);
+2
kernel/irq_work.c
··· 153 153 */ 154 154 flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags); 155 155 156 + lockdep_irq_work_enter(work); 156 157 work->func(work); 158 + lockdep_irq_work_exit(work); 157 159 /* 158 160 * Clear the BUSY bit and return to the free state if 159 161 * no-one else claimed it meanwhile.
+543 -131
kernel/locking/lockdep.c
··· 84 84 * to use a raw spinlock - we really dont want the spinlock 85 85 * code to recurse back into the lockdep code... 86 86 */ 87 - static arch_spinlock_t lockdep_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; 87 + static arch_spinlock_t __lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; 88 + static struct task_struct *__owner; 89 + 90 + static inline void lockdep_lock(void) 91 + { 92 + DEBUG_LOCKS_WARN_ON(!irqs_disabled()); 93 + 94 + arch_spin_lock(&__lock); 95 + __owner = current; 96 + current->lockdep_recursion++; 97 + } 98 + 99 + static inline void lockdep_unlock(void) 100 + { 101 + if (debug_locks && DEBUG_LOCKS_WARN_ON(__owner != current)) 102 + return; 103 + 104 + current->lockdep_recursion--; 105 + __owner = NULL; 106 + arch_spin_unlock(&__lock); 107 + } 108 + 109 + static inline bool lockdep_assert_locked(void) 110 + { 111 + return DEBUG_LOCKS_WARN_ON(__owner != current); 112 + } 113 + 88 114 static struct task_struct *lockdep_selftest_task_struct; 115 + 89 116 90 117 static int graph_lock(void) 91 118 { 92 - arch_spin_lock(&lockdep_lock); 119 + lockdep_lock(); 93 120 /* 94 121 * Make sure that if another CPU detected a bug while 95 122 * walking the graph we dont change it (while the other ··· 124 97 * dropped already) 125 98 */ 126 99 if (!debug_locks) { 127 - arch_spin_unlock(&lockdep_lock); 100 + lockdep_unlock(); 128 101 return 0; 129 102 } 130 - /* prevent any recursions within lockdep from causing deadlocks */ 131 - current->lockdep_recursion++; 132 103 return 1; 133 104 } 134 105 135 - static inline int graph_unlock(void) 106 + static inline void graph_unlock(void) 136 107 { 137 - if (debug_locks && !arch_spin_is_locked(&lockdep_lock)) { 138 - /* 139 - * The lockdep graph lock isn't locked while we expect it to 140 - * be, we're confused now, bye! 
141 - */ 142 - return DEBUG_LOCKS_WARN_ON(1); 143 - } 144 - 145 - current->lockdep_recursion--; 146 - arch_spin_unlock(&lockdep_lock); 147 - return 0; 108 + lockdep_unlock(); 148 109 } 149 110 150 111 /* ··· 143 128 { 144 129 int ret = debug_locks_off(); 145 130 146 - arch_spin_unlock(&lockdep_lock); 131 + lockdep_unlock(); 147 132 148 133 return ret; 149 134 } ··· 162 147 #define KEYHASH_SIZE (1UL << KEYHASH_BITS) 163 148 static struct hlist_head lock_keys_hash[KEYHASH_SIZE]; 164 149 unsigned long nr_lock_classes; 150 + unsigned long nr_zapped_classes; 165 151 #ifndef CONFIG_DEBUG_LOCKDEP 166 152 static 167 153 #endif ··· 393 377 task->lockdep_recursion = 0; 394 378 } 395 379 380 + /* 381 + * Split the recrursion counter in two to readily detect 'off' vs recursion. 382 + */ 383 + #define LOCKDEP_RECURSION_BITS 16 384 + #define LOCKDEP_OFF (1U << LOCKDEP_RECURSION_BITS) 385 + #define LOCKDEP_RECURSION_MASK (LOCKDEP_OFF - 1) 386 + 396 387 void lockdep_off(void) 397 388 { 398 - current->lockdep_recursion++; 389 + current->lockdep_recursion += LOCKDEP_OFF; 399 390 } 400 391 EXPORT_SYMBOL(lockdep_off); 401 392 402 393 void lockdep_on(void) 403 394 { 404 - current->lockdep_recursion--; 395 + current->lockdep_recursion -= LOCKDEP_OFF; 405 396 } 406 397 EXPORT_SYMBOL(lockdep_on); 398 + 399 + static inline void lockdep_recursion_finish(void) 400 + { 401 + if (WARN_ON_ONCE(--current->lockdep_recursion)) 402 + current->lockdep_recursion = 0; 403 + } 407 404 408 405 void lockdep_set_selftest_task(struct task_struct *task) 409 406 { ··· 604 575 #include "lockdep_states.h" 605 576 #undef LOCKDEP_STATE 606 577 [LOCK_USED] = "INITIAL USE", 578 + [LOCK_USAGE_STATES] = "IN-NMI", 607 579 }; 608 580 #endif 609 581 ··· 683 653 684 654 printk(KERN_CONT " ("); 685 655 __print_lock_name(class); 686 - printk(KERN_CONT "){%s}", usage); 656 + printk(KERN_CONT "){%s}-{%hd:%hd}", usage, 657 + class->wait_type_outer ?: class->wait_type_inner, 658 + class->wait_type_inner); 687 659 } 688 660 
689 661 static void print_lockdep_cache(struct lockdep_map *lock) ··· 819 787 return count + 1; 820 788 } 821 789 790 + /* used from NMI context -- must be lockless */ 822 791 static inline struct lock_class * 823 792 look_up_lock_class(const struct lockdep_map *lock, unsigned int subclass) 824 793 { ··· 1103 1070 1104 1071 #endif /* CONFIG_DEBUG_LOCKDEP */ 1105 1072 1073 + static void init_chain_block_buckets(void); 1074 + 1106 1075 /* 1107 1076 * Initialize the lock_classes[] array elements, the free_lock_classes list 1108 1077 * and also the delayed_free structure. 1109 1078 */ 1110 1079 static void init_data_structures_once(void) 1111 1080 { 1112 - static bool ds_initialized, rcu_head_initialized; 1081 + static bool __read_mostly ds_initialized, rcu_head_initialized; 1113 1082 int i; 1114 1083 1115 1084 if (likely(rcu_head_initialized)) ··· 1135 1100 INIT_LIST_HEAD(&lock_classes[i].locks_after); 1136 1101 INIT_LIST_HEAD(&lock_classes[i].locks_before); 1137 1102 } 1103 + init_chain_block_buckets(); 1138 1104 } 1139 1105 1140 1106 static inline struct hlist_head *keyhashentry(const struct lock_class_key *key) ··· 1266 1230 WARN_ON_ONCE(!list_empty(&class->locks_before)); 1267 1231 WARN_ON_ONCE(!list_empty(&class->locks_after)); 1268 1232 class->name_version = count_matching_names(class); 1233 + class->wait_type_inner = lock->wait_type_inner; 1234 + class->wait_type_outer = lock->wait_type_outer; 1269 1235 /* 1270 1236 * We use RCU's safe list-add method to make 1271 1237 * parallel walking of the hash-list safe: ··· 1507 1469 struct circular_queue *cq = &lock_cq; 1508 1470 int ret = 1; 1509 1471 1472 + lockdep_assert_locked(); 1473 + 1510 1474 if (match(source_entry, data)) { 1511 1475 *target_entry = source_entry; 1512 1476 ret = 0; ··· 1530 1490 } 1531 1491 1532 1492 head = get_dep_list(lock, offset); 1533 - 1534 - DEBUG_LOCKS_WARN_ON(!irqs_disabled()); 1535 1493 1536 1494 list_for_each_entry_rcu(entry, head, entry) { 1537 1495 if (!lock_accessed(entry)) { ··· 
1757 1719 this.class = class; 1758 1720 1759 1721 raw_local_irq_save(flags); 1760 - arch_spin_lock(&lockdep_lock); 1722 + lockdep_lock(); 1761 1723 ret = __lockdep_count_forward_deps(&this); 1762 - arch_spin_unlock(&lockdep_lock); 1724 + lockdep_unlock(); 1763 1725 raw_local_irq_restore(flags); 1764 1726 1765 1727 return ret; ··· 1784 1746 this.class = class; 1785 1747 1786 1748 raw_local_irq_save(flags); 1787 - arch_spin_lock(&lockdep_lock); 1749 + lockdep_lock(); 1788 1750 ret = __lockdep_count_backward_deps(&this); 1789 - arch_spin_unlock(&lockdep_lock); 1751 + lockdep_unlock(); 1790 1752 raw_local_irq_restore(flags); 1791 1753 1792 1754 return ret; ··· 2336 2298 return 0; 2337 2299 } 2338 2300 2339 - static void inc_chains(void) 2340 - { 2341 - if (current->hardirq_context) 2342 - nr_hardirq_chains++; 2343 - else { 2344 - if (current->softirq_context) 2345 - nr_softirq_chains++; 2346 - else 2347 - nr_process_chains++; 2348 - } 2349 - } 2350 - 2351 2301 #else 2352 2302 2353 2303 static inline int check_irq_usage(struct task_struct *curr, ··· 2343 2317 { 2344 2318 return 1; 2345 2319 } 2320 + #endif /* CONFIG_TRACE_IRQFLAGS */ 2346 2321 2347 - static inline void inc_chains(void) 2322 + static void inc_chains(int irq_context) 2348 2323 { 2349 - nr_process_chains++; 2324 + if (irq_context & LOCK_CHAIN_HARDIRQ_CONTEXT) 2325 + nr_hardirq_chains++; 2326 + else if (irq_context & LOCK_CHAIN_SOFTIRQ_CONTEXT) 2327 + nr_softirq_chains++; 2328 + else 2329 + nr_process_chains++; 2350 2330 } 2351 2331 2352 - #endif /* CONFIG_TRACE_IRQFLAGS */ 2332 + static void dec_chains(int irq_context) 2333 + { 2334 + if (irq_context & LOCK_CHAIN_HARDIRQ_CONTEXT) 2335 + nr_hardirq_chains--; 2336 + else if (irq_context & LOCK_CHAIN_SOFTIRQ_CONTEXT) 2337 + nr_softirq_chains--; 2338 + else 2339 + nr_process_chains--; 2340 + } 2353 2341 2354 2342 static void 2355 2343 print_deadlock_scenario(struct held_lock *nxt, struct held_lock *prv) ··· 2662 2622 2663 2623 struct lock_chain 
lock_chains[MAX_LOCKDEP_CHAINS]; 2664 2624 static DECLARE_BITMAP(lock_chains_in_use, MAX_LOCKDEP_CHAINS); 2665 - int nr_chain_hlocks; 2666 2625 static u16 chain_hlocks[MAX_LOCKDEP_CHAIN_HLOCKS]; 2626 + unsigned long nr_zapped_lock_chains; 2627 + unsigned int nr_free_chain_hlocks; /* Free chain_hlocks in buckets */ 2628 + unsigned int nr_lost_chain_hlocks; /* Lost chain_hlocks */ 2629 + unsigned int nr_large_chain_blocks; /* size > MAX_CHAIN_BUCKETS */ 2630 + 2631 + /* 2632 + * The first 2 chain_hlocks entries in the chain block in the bucket 2633 + * list contains the following meta data: 2634 + * 2635 + * entry[0]: 2636 + * Bit 15 - always set to 1 (it is not a class index) 2637 + * Bits 0-14 - upper 15 bits of the next block index 2638 + * entry[1] - lower 16 bits of next block index 2639 + * 2640 + * A next block index of all 1 bits means it is the end of the list. 2641 + * 2642 + * On the unsized bucket (bucket-0), the 3rd and 4th entries contain 2643 + * the chain block size: 2644 + * 2645 + * entry[2] - upper 16 bits of the chain block size 2646 + * entry[3] - lower 16 bits of the chain block size 2647 + */ 2648 + #define MAX_CHAIN_BUCKETS 16 2649 + #define CHAIN_BLK_FLAG (1U << 15) 2650 + #define CHAIN_BLK_LIST_END 0xFFFFU 2651 + 2652 + static int chain_block_buckets[MAX_CHAIN_BUCKETS]; 2653 + 2654 + static inline int size_to_bucket(int size) 2655 + { 2656 + if (size > MAX_CHAIN_BUCKETS) 2657 + return 0; 2658 + 2659 + return size - 1; 2660 + } 2661 + 2662 + /* 2663 + * Iterate all the chain blocks in a bucket. 
2664 + */ 2665 + #define for_each_chain_block(bucket, prev, curr) \ 2666 + for ((prev) = -1, (curr) = chain_block_buckets[bucket]; \ 2667 + (curr) >= 0; \ 2668 + (prev) = (curr), (curr) = chain_block_next(curr)) 2669 + 2670 + /* 2671 + * next block or -1 2672 + */ 2673 + static inline int chain_block_next(int offset) 2674 + { 2675 + int next = chain_hlocks[offset]; 2676 + 2677 + WARN_ON_ONCE(!(next & CHAIN_BLK_FLAG)); 2678 + 2679 + if (next == CHAIN_BLK_LIST_END) 2680 + return -1; 2681 + 2682 + next &= ~CHAIN_BLK_FLAG; 2683 + next <<= 16; 2684 + next |= chain_hlocks[offset + 1]; 2685 + 2686 + return next; 2687 + } 2688 + 2689 + /* 2690 + * bucket-0 only 2691 + */ 2692 + static inline int chain_block_size(int offset) 2693 + { 2694 + return (chain_hlocks[offset + 2] << 16) | chain_hlocks[offset + 3]; 2695 + } 2696 + 2697 + static inline void init_chain_block(int offset, int next, int bucket, int size) 2698 + { 2699 + chain_hlocks[offset] = (next >> 16) | CHAIN_BLK_FLAG; 2700 + chain_hlocks[offset + 1] = (u16)next; 2701 + 2702 + if (size && !bucket) { 2703 + chain_hlocks[offset + 2] = size >> 16; 2704 + chain_hlocks[offset + 3] = (u16)size; 2705 + } 2706 + } 2707 + 2708 + static inline void add_chain_block(int offset, int size) 2709 + { 2710 + int bucket = size_to_bucket(size); 2711 + int next = chain_block_buckets[bucket]; 2712 + int prev, curr; 2713 + 2714 + if (unlikely(size < 2)) { 2715 + /* 2716 + * We can't store single entries on the freelist. Leak them. 2717 + * 2718 + * One possible way out would be to uniquely mark them, other 2719 + * than with CHAIN_BLK_FLAG, such that we can recover them when 2720 + * the block before it is re-added. 2721 + */ 2722 + if (size) 2723 + nr_lost_chain_hlocks++; 2724 + return; 2725 + } 2726 + 2727 + nr_free_chain_hlocks += size; 2728 + if (!bucket) { 2729 + nr_large_chain_blocks++; 2730 + 2731 + /* 2732 + * Variable sized, sort large to small. 
2733 + */ 2734 + for_each_chain_block(0, prev, curr) { 2735 + if (size >= chain_block_size(curr)) 2736 + break; 2737 + } 2738 + init_chain_block(offset, curr, 0, size); 2739 + if (prev < 0) 2740 + chain_block_buckets[0] = offset; 2741 + else 2742 + init_chain_block(prev, offset, 0, 0); 2743 + return; 2744 + } 2745 + /* 2746 + * Fixed size, add to head. 2747 + */ 2748 + init_chain_block(offset, next, bucket, size); 2749 + chain_block_buckets[bucket] = offset; 2750 + } 2751 + 2752 + /* 2753 + * Only the first block in the list can be deleted. 2754 + * 2755 + * For the variable size bucket[0], the first block (the largest one) is 2756 + * returned, broken up and put back into the pool. So if a chain block of 2757 + * length > MAX_CHAIN_BUCKETS is ever used and zapped, it will just be 2758 + * queued up after the primordial chain block and never be used until the 2759 + * hlock entries in the primordial chain block is almost used up. That 2760 + * causes fragmentation and reduce allocation efficiency. That can be 2761 + * monitored by looking at the "large chain blocks" number in lockdep_stats. 2762 + */ 2763 + static inline void del_chain_block(int bucket, int size, int next) 2764 + { 2765 + nr_free_chain_hlocks -= size; 2766 + chain_block_buckets[bucket] = next; 2767 + 2768 + if (!bucket) 2769 + nr_large_chain_blocks--; 2770 + } 2771 + 2772 + static void init_chain_block_buckets(void) 2773 + { 2774 + int i; 2775 + 2776 + for (i = 0; i < MAX_CHAIN_BUCKETS; i++) 2777 + chain_block_buckets[i] = -1; 2778 + 2779 + add_chain_block(0, ARRAY_SIZE(chain_hlocks)); 2780 + } 2781 + 2782 + /* 2783 + * Return offset of a chain block of the right size or -1 if not found. 2784 + * 2785 + * Fairly simple worst-fit allocator with the addition of a number of size 2786 + * specific free lists. 
2787 + */ 2788 + static int alloc_chain_hlocks(int req) 2789 + { 2790 + int bucket, curr, size; 2791 + 2792 + /* 2793 + * We rely on the MSB to act as an escape bit to denote freelist 2794 + * pointers. Make sure this bit isn't set in 'normal' class_idx usage. 2795 + */ 2796 + BUILD_BUG_ON((MAX_LOCKDEP_KEYS-1) & CHAIN_BLK_FLAG); 2797 + 2798 + init_data_structures_once(); 2799 + 2800 + if (nr_free_chain_hlocks < req) 2801 + return -1; 2802 + 2803 + /* 2804 + * We require a minimum of 2 (u16) entries to encode a freelist 2805 + * 'pointer'. 2806 + */ 2807 + req = max(req, 2); 2808 + bucket = size_to_bucket(req); 2809 + curr = chain_block_buckets[bucket]; 2810 + 2811 + if (bucket) { 2812 + if (curr >= 0) { 2813 + del_chain_block(bucket, req, chain_block_next(curr)); 2814 + return curr; 2815 + } 2816 + /* Try bucket 0 */ 2817 + curr = chain_block_buckets[0]; 2818 + } 2819 + 2820 + /* 2821 + * The variable sized freelist is sorted by size; the first entry is 2822 + * the largest. Use it if it fits. 2823 + */ 2824 + if (curr >= 0) { 2825 + size = chain_block_size(curr); 2826 + if (likely(size >= req)) { 2827 + del_chain_block(0, size, chain_block_next(curr)); 2828 + add_chain_block(curr + req, size - req); 2829 + return curr; 2830 + } 2831 + } 2832 + 2833 + /* 2834 + * Last resort, split a block in a larger sized bucket. 
2835 + */ 2836 + for (size = MAX_CHAIN_BUCKETS; size > req; size--) { 2837 + bucket = size_to_bucket(size); 2838 + curr = chain_block_buckets[bucket]; 2839 + if (curr < 0) 2840 + continue; 2841 + 2842 + del_chain_block(bucket, size, chain_block_next(curr)); 2843 + add_chain_block(curr + req, size - req); 2844 + return curr; 2845 + } 2846 + 2847 + return -1; 2848 + } 2849 + 2850 + static inline void free_chain_hlocks(int base, int size) 2851 + { 2852 + add_chain_block(base, max(size, 2)); 2853 + } 2667 2854 2668 2855 struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i) 2669 2856 { ··· 3070 2803 * disabled to make this an IRQ-safe lock.. for recursion reasons 3071 2804 * lockdep won't complain about its own locking errors. 3072 2805 */ 3073 - if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) 2806 + if (lockdep_assert_locked()) 3074 2807 return 0; 3075 2808 3076 2809 chain = alloc_lock_chain(); ··· 3091 2824 BUILD_BUG_ON((1UL << 6) <= ARRAY_SIZE(curr->held_locks)); 3092 2825 BUILD_BUG_ON((1UL << 8*sizeof(chain_hlocks[0])) <= ARRAY_SIZE(lock_classes)); 3093 2826 3094 - if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) { 3095 - chain->base = nr_chain_hlocks; 3096 - for (j = 0; j < chain->depth - 1; j++, i++) { 3097 - int lock_id = curr->held_locks[i].class_idx; 3098 - chain_hlocks[chain->base + j] = lock_id; 3099 - } 3100 - chain_hlocks[chain->base + j] = class - lock_classes; 3101 - nr_chain_hlocks += chain->depth; 3102 - } else { 2827 + j = alloc_chain_hlocks(chain->depth); 2828 + if (j < 0) { 3103 2829 if (!debug_locks_off_graph_unlock()) 3104 2830 return 0; 3105 2831 ··· 3101 2841 return 0; 3102 2842 } 3103 2843 2844 + chain->base = j; 2845 + for (j = 0; j < chain->depth - 1; j++, i++) { 2846 + int lock_id = curr->held_locks[i].class_idx; 2847 + 2848 + chain_hlocks[chain->base + j] = lock_id; 2849 + } 2850 + chain_hlocks[chain->base + j] = class - lock_classes; 3104 2851 hlist_add_head_rcu(&chain->entry, hash_head); 3105 2852 
debug_atomic_inc(chain_lookup_misses); 3106 - inc_chains(); 2853 + inc_chains(chain->irq_context); 3107 2854 3108 2855 return 1; 3109 2856 } ··· 3254 2987 { 3255 2988 return 1; 3256 2989 } 2990 + 2991 + static void init_chain_block_buckets(void) { } 3257 2992 #endif /* CONFIG_PROVE_LOCKING */ 3258 2993 3259 2994 /* ··· 3698 3429 if (DEBUG_LOCKS_WARN_ON(current->hardirq_context)) 3699 3430 return; 3700 3431 3701 - current->lockdep_recursion = 1; 3432 + current->lockdep_recursion++; 3702 3433 __trace_hardirqs_on_caller(ip); 3703 - current->lockdep_recursion = 0; 3434 + lockdep_recursion_finish(); 3704 3435 } 3705 3436 NOKPROBE_SYMBOL(lockdep_hardirqs_on); 3706 3437 ··· 3756 3487 return; 3757 3488 } 3758 3489 3759 - current->lockdep_recursion = 1; 3490 + current->lockdep_recursion++; 3760 3491 /* 3761 3492 * We'll do an OFF -> ON transition: 3762 3493 */ ··· 3771 3502 */ 3772 3503 if (curr->hardirqs_enabled) 3773 3504 mark_held_locks(curr, LOCK_ENABLED_SOFTIRQ); 3774 - current->lockdep_recursion = 0; 3505 + lockdep_recursion_finish(); 3775 3506 } 3776 3507 3777 3508 /* ··· 3865 3596 3866 3597 static inline unsigned int task_irq_context(struct task_struct *task) 3867 3598 { 3868 - return 2 * !!task->hardirq_context + !!task->softirq_context; 3599 + return LOCK_CHAIN_HARDIRQ_CONTEXT * !!task->hardirq_context + 3600 + LOCK_CHAIN_SOFTIRQ_CONTEXT * !!task->softirq_context; 3869 3601 } 3870 3602 3871 3603 static int separate_irq_context(struct task_struct *curr, ··· 3952 3682 return ret; 3953 3683 } 3954 3684 3685 + static int 3686 + print_lock_invalid_wait_context(struct task_struct *curr, 3687 + struct held_lock *hlock) 3688 + { 3689 + if (!debug_locks_off()) 3690 + return 0; 3691 + if (debug_locks_silent) 3692 + return 0; 3693 + 3694 + pr_warn("\n"); 3695 + pr_warn("=============================\n"); 3696 + pr_warn("[ BUG: Invalid wait context ]\n"); 3697 + print_kernel_ident(); 3698 + pr_warn("-----------------------------\n"); 3699 + 3700 + pr_warn("%s/%d is trying to 
lock:\n", curr->comm, task_pid_nr(curr)); 3701 + print_lock(hlock); 3702 + 3703 + pr_warn("other info that might help us debug this:\n"); 3704 + lockdep_print_held_locks(curr); 3705 + 3706 + pr_warn("stack backtrace:\n"); 3707 + dump_stack(); 3708 + 3709 + return 0; 3710 + } 3711 + 3712 + /* 3713 + * Verify the wait_type context. 3714 + * 3715 + * This check validates we takes locks in the right wait-type order; that is it 3716 + * ensures that we do not take mutexes inside spinlocks and do not attempt to 3717 + * acquire spinlocks inside raw_spinlocks and the sort. 3718 + * 3719 + * The entire thing is slightly more complex because of RCU, RCU is a lock that 3720 + * can be taken from (pretty much) any context but also has constraints. 3721 + * However when taken in a stricter environment the RCU lock does not loosen 3722 + * the constraints. 3723 + * 3724 + * Therefore we must look for the strictest environment in the lock stack and 3725 + * compare that to the lock we're trying to acquire. 3726 + */ 3727 + static int check_wait_context(struct task_struct *curr, struct held_lock *next) 3728 + { 3729 + short next_inner = hlock_class(next)->wait_type_inner; 3730 + short next_outer = hlock_class(next)->wait_type_outer; 3731 + short curr_inner; 3732 + int depth; 3733 + 3734 + if (!curr->lockdep_depth || !next_inner || next->trylock) 3735 + return 0; 3736 + 3737 + if (!next_outer) 3738 + next_outer = next_inner; 3739 + 3740 + /* 3741 + * Find start of current irq_context.. 3742 + */ 3743 + for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) { 3744 + struct held_lock *prev = curr->held_locks + depth; 3745 + if (prev->irq_context != next->irq_context) 3746 + break; 3747 + } 3748 + depth++; 3749 + 3750 + /* 3751 + * Set appropriate wait type for the context; for IRQs we have to take 3752 + * into account force_irqthread as that is implied by PREEMPT_RT. 3753 + */ 3754 + if (curr->hardirq_context) { 3755 + /* 3756 + * Check if force_irqthreads will run us threaded. 
3757 + */ 3758 + if (curr->hardirq_threaded || curr->irq_config) 3759 + curr_inner = LD_WAIT_CONFIG; 3760 + else 3761 + curr_inner = LD_WAIT_SPIN; 3762 + } else if (curr->softirq_context) { 3763 + /* 3764 + * Softirqs are always threaded. 3765 + */ 3766 + curr_inner = LD_WAIT_CONFIG; 3767 + } else { 3768 + curr_inner = LD_WAIT_MAX; 3769 + } 3770 + 3771 + for (; depth < curr->lockdep_depth; depth++) { 3772 + struct held_lock *prev = curr->held_locks + depth; 3773 + short prev_inner = hlock_class(prev)->wait_type_inner; 3774 + 3775 + if (prev_inner) { 3776 + /* 3777 + * We can have a bigger inner than a previous one 3778 + * when outer is smaller than inner, as with RCU. 3779 + * 3780 + * Also due to trylocks. 3781 + */ 3782 + curr_inner = min(curr_inner, prev_inner); 3783 + } 3784 + } 3785 + 3786 + if (next_outer > curr_inner) 3787 + return print_lock_invalid_wait_context(curr, next); 3788 + 3789 + return 0; 3790 + } 3791 + 3955 3792 #else /* CONFIG_PROVE_LOCKING */ 3956 3793 3957 3794 static inline int ··· 4078 3701 return 0; 4079 3702 } 4080 3703 3704 + static inline int check_wait_context(struct task_struct *curr, 3705 + struct held_lock *next) 3706 + { 3707 + return 0; 3708 + } 3709 + 4081 3710 #endif /* CONFIG_PROVE_LOCKING */ 4082 3711 4083 3712 /* 4084 3713 * Initialize a lock instance's lock-class mapping info: 4085 3714 */ 4086 - void lockdep_init_map(struct lockdep_map *lock, const char *name, 4087 - struct lock_class_key *key, int subclass) 3715 + void lockdep_init_map_waits(struct lockdep_map *lock, const char *name, 3716 + struct lock_class_key *key, int subclass, 3717 + short inner, short outer) 4088 3718 { 4089 3719 int i; 4090 3720 ··· 4111 3727 } 4112 3728 4113 3729 lock->name = name; 3730 + 3731 + lock->wait_type_outer = outer; 3732 + lock->wait_type_inner = inner; 4114 3733 4115 3734 /* 4116 3735 * No key, no joy, we need to hash something. 
··· 4142 3755 return; 4143 3756 4144 3757 raw_local_irq_save(flags); 4145 - current->lockdep_recursion = 1; 3758 + current->lockdep_recursion++; 4146 3759 register_lock_class(lock, subclass, 1); 4147 - current->lockdep_recursion = 0; 3760 + lockdep_recursion_finish(); 4148 3761 raw_local_irq_restore(flags); 4149 3762 } 4150 3763 } 4151 - EXPORT_SYMBOL_GPL(lockdep_init_map); 3764 + EXPORT_SYMBOL_GPL(lockdep_init_map_waits); 4152 3765 4153 3766 struct lock_class_key __lockdep_no_validate__; 4154 3767 EXPORT_SYMBOL_GPL(__lockdep_no_validate__); ··· 4249 3862 4250 3863 class_idx = class - lock_classes; 4251 3864 4252 - if (depth) { 3865 + if (depth) { /* we're holding locks */ 4253 3866 hlock = curr->held_locks + depth - 1; 4254 3867 if (hlock->class_idx == class_idx && nest_lock) { 4255 3868 if (!references) ··· 4290 3903 hlock->holdtime_stamp = lockstat_clock(); 4291 3904 #endif 4292 3905 hlock->pin_count = pin_count; 3906 + 3907 + if (check_wait_context(curr, hlock)) 3908 + return 0; 4293 3909 4294 3910 /* Initialize the lock usage bit */ 4295 3911 if (!mark_usage(curr, hlock, check)) ··· 4529 4139 return 0; 4530 4140 } 4531 4141 4532 - lockdep_init_map(lock, name, key, 0); 4142 + lockdep_init_map_waits(lock, name, key, 0, 4143 + lock->wait_type_inner, 4144 + lock->wait_type_outer); 4533 4145 class = register_lock_class(lock, subclass, 0); 4534 4146 hlock->class_idx = class - lock_classes; 4535 4147 ··· 4829 4437 return; 4830 4438 4831 4439 raw_local_irq_save(flags); 4832 - current->lockdep_recursion = 1; 4440 + current->lockdep_recursion++; 4833 4441 check_flags(flags); 4834 4442 if (__lock_set_class(lock, name, key, subclass, ip)) 4835 4443 check_chain_key(current); 4836 - current->lockdep_recursion = 0; 4444 + lockdep_recursion_finish(); 4837 4445 raw_local_irq_restore(flags); 4838 4446 } 4839 4447 EXPORT_SYMBOL_GPL(lock_set_class); ··· 4846 4454 return; 4847 4455 4848 4456 raw_local_irq_save(flags); 4849 - current->lockdep_recursion = 1; 4457 + 
current->lockdep_recursion++; 4850 4458 check_flags(flags); 4851 4459 if (__lock_downgrade(lock, ip)) 4852 4460 check_chain_key(current); 4853 - current->lockdep_recursion = 0; 4461 + lockdep_recursion_finish(); 4854 4462 raw_local_irq_restore(flags); 4855 4463 } 4856 4464 EXPORT_SYMBOL_GPL(lock_downgrade); 4465 + 4466 + /* NMI context !!! */ 4467 + static void verify_lock_unused(struct lockdep_map *lock, struct held_lock *hlock, int subclass) 4468 + { 4469 + #ifdef CONFIG_PROVE_LOCKING 4470 + struct lock_class *class = look_up_lock_class(lock, subclass); 4471 + 4472 + /* if it doesn't have a class (yet), it certainly hasn't been used yet */ 4473 + if (!class) 4474 + return; 4475 + 4476 + if (!(class->usage_mask & LOCK_USED)) 4477 + return; 4478 + 4479 + hlock->class_idx = class - lock_classes; 4480 + 4481 + print_usage_bug(current, hlock, LOCK_USED, LOCK_USAGE_STATES); 4482 + #endif 4483 + } 4484 + 4485 + static bool lockdep_nmi(void) 4486 + { 4487 + if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK) 4488 + return false; 4489 + 4490 + if (!in_nmi()) 4491 + return false; 4492 + 4493 + return true; 4494 + } 4857 4495 4858 4496 /* 4859 4497 * We are not always called with irqs disabled - do that here, ··· 4895 4473 { 4896 4474 unsigned long flags; 4897 4475 4898 - if (unlikely(current->lockdep_recursion)) 4476 + if (unlikely(current->lockdep_recursion)) { 4477 + /* XXX allow trylock from NMI ?!? 
*/ 4478 + if (lockdep_nmi() && !trylock) { 4479 + struct held_lock hlock; 4480 + 4481 + hlock.acquire_ip = ip; 4482 + hlock.instance = lock; 4483 + hlock.nest_lock = nest_lock; 4484 + hlock.irq_context = 2; // XXX 4485 + hlock.trylock = trylock; 4486 + hlock.read = read; 4487 + hlock.check = check; 4488 + hlock.hardirqs_off = true; 4489 + hlock.references = 0; 4490 + 4491 + verify_lock_unused(lock, &hlock, subclass); 4492 + } 4899 4493 return; 4494 + } 4900 4495 4901 4496 raw_local_irq_save(flags); 4902 4497 check_flags(flags); 4903 4498 4904 - current->lockdep_recursion = 1; 4499 + current->lockdep_recursion++; 4905 4500 trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip); 4906 4501 __lock_acquire(lock, subclass, trylock, read, check, 4907 4502 irqs_disabled_flags(flags), nest_lock, ip, 0, 0); 4908 - current->lockdep_recursion = 0; 4503 + lockdep_recursion_finish(); 4909 4504 raw_local_irq_restore(flags); 4910 4505 } 4911 4506 EXPORT_SYMBOL_GPL(lock_acquire); ··· 4936 4497 4937 4498 raw_local_irq_save(flags); 4938 4499 check_flags(flags); 4939 - current->lockdep_recursion = 1; 4500 + current->lockdep_recursion++; 4940 4501 trace_lock_release(lock, ip); 4941 4502 if (__lock_release(lock, ip)) 4942 4503 check_chain_key(current); 4943 - current->lockdep_recursion = 0; 4504 + lockdep_recursion_finish(); 4944 4505 raw_local_irq_restore(flags); 4945 4506 } 4946 4507 EXPORT_SYMBOL_GPL(lock_release); ··· 4956 4517 raw_local_irq_save(flags); 4957 4518 check_flags(flags); 4958 4519 4959 - current->lockdep_recursion = 1; 4520 + current->lockdep_recursion++; 4960 4521 ret = __lock_is_held(lock, read); 4961 - current->lockdep_recursion = 0; 4522 + lockdep_recursion_finish(); 4962 4523 raw_local_irq_restore(flags); 4963 4524 4964 4525 return ret; ··· 4977 4538 raw_local_irq_save(flags); 4978 4539 check_flags(flags); 4979 4540 4980 - current->lockdep_recursion = 1; 4541 + current->lockdep_recursion++; 4981 4542 cookie = __lock_pin_lock(lock); 4982 - 
current->lockdep_recursion = 0; 4543 + lockdep_recursion_finish(); 4983 4544 raw_local_irq_restore(flags); 4984 4545 4985 4546 return cookie; ··· 4996 4557 raw_local_irq_save(flags); 4997 4558 check_flags(flags); 4998 4559 4999 - current->lockdep_recursion = 1; 4560 + current->lockdep_recursion++; 5000 4561 __lock_repin_lock(lock, cookie); 5001 - current->lockdep_recursion = 0; 4562 + lockdep_recursion_finish(); 5002 4563 raw_local_irq_restore(flags); 5003 4564 } 5004 4565 EXPORT_SYMBOL_GPL(lock_repin_lock); ··· 5013 4574 raw_local_irq_save(flags); 5014 4575 check_flags(flags); 5015 4576 5016 - current->lockdep_recursion = 1; 4577 + current->lockdep_recursion++; 5017 4578 __lock_unpin_lock(lock, cookie); 5018 - current->lockdep_recursion = 0; 4579 + lockdep_recursion_finish(); 5019 4580 raw_local_irq_restore(flags); 5020 4581 } 5021 4582 EXPORT_SYMBOL_GPL(lock_unpin_lock); ··· 5151 4712 5152 4713 raw_local_irq_save(flags); 5153 4714 check_flags(flags); 5154 - current->lockdep_recursion = 1; 4715 + current->lockdep_recursion++; 5155 4716 trace_lock_contended(lock, ip); 5156 4717 __lock_contended(lock, ip); 5157 - current->lockdep_recursion = 0; 4718 + lockdep_recursion_finish(); 5158 4719 raw_local_irq_restore(flags); 5159 4720 } 5160 4721 EXPORT_SYMBOL_GPL(lock_contended); ··· 5171 4732 5172 4733 raw_local_irq_save(flags); 5173 4734 check_flags(flags); 5174 - current->lockdep_recursion = 1; 4735 + current->lockdep_recursion++; 5175 4736 __lock_acquired(lock, ip); 5176 - current->lockdep_recursion = 0; 4737 + lockdep_recursion_finish(); 5177 4738 raw_local_irq_restore(flags); 5178 4739 } 5179 4740 EXPORT_SYMBOL_GPL(lock_acquired); ··· 5207 4768 struct lock_class *class) 5208 4769 { 5209 4770 #ifdef CONFIG_PROVE_LOCKING 5210 - struct lock_chain *new_chain; 5211 - u64 chain_key; 5212 4771 int i; 5213 4772 5214 4773 for (i = chain->base; i < chain->base + chain->depth; i++) { 5215 4774 if (chain_hlocks[i] != class - lock_classes) 5216 4775 continue; 5217 - /* The code 
below leaks one chain_hlock[] entry. */ 5218 - if (--chain->depth > 0) { 5219 - memmove(&chain_hlocks[i], &chain_hlocks[i + 1], 5220 - (chain->base + chain->depth - i) * 5221 - sizeof(chain_hlocks[0])); 5222 - } 5223 4776 /* 5224 4777 * Each lock class occurs at most once in a lock chain so once 5225 4778 * we found a match we can break out of this loop. 5226 4779 */ 5227 - goto recalc; 4780 + goto free_lock_chain; 5228 4781 } 5229 4782 /* Since the chain has not been modified, return. */ 5230 4783 return; 5231 4784 5232 - recalc: 5233 - chain_key = INITIAL_CHAIN_KEY; 5234 - for (i = chain->base; i < chain->base + chain->depth; i++) 5235 - chain_key = iterate_chain_key(chain_key, chain_hlocks[i]); 5236 - if (chain->depth && chain->chain_key == chain_key) 5237 - return; 4785 + free_lock_chain: 4786 + free_chain_hlocks(chain->base, chain->depth); 5238 4787 /* Overwrite the chain key for concurrent RCU readers. */ 5239 - WRITE_ONCE(chain->chain_key, chain_key); 4788 + WRITE_ONCE(chain->chain_key, INITIAL_CHAIN_KEY); 4789 + dec_chains(chain->irq_context); 4790 + 5240 4791 /* 5241 4792 * Note: calling hlist_del_rcu() from inside a 5242 4793 * hlist_for_each_entry_rcu() loop is safe. 5243 4794 */ 5244 4795 hlist_del_rcu(&chain->entry); 5245 4796 __set_bit(chain - lock_chains, pf->lock_chains_being_freed); 5246 - if (chain->depth == 0) 5247 - return; 5248 - /* 5249 - * If the modified lock chain matches an existing lock chain, drop 5250 - * the modified lock chain. 
5251 - */ 5252 - if (lookup_chain_cache(chain_key)) 5253 - return; 5254 - new_chain = alloc_lock_chain(); 5255 - if (WARN_ON_ONCE(!new_chain)) { 5256 - debug_locks_off(); 5257 - return; 5258 - } 5259 - *new_chain = *chain; 5260 - hlist_add_head_rcu(&new_chain->entry, chainhashentry(chain_key)); 4797 + nr_zapped_lock_chains++; 5261 4798 #endif 5262 4799 } 5263 4800 ··· 5289 4874 } 5290 4875 5291 4876 remove_class_from_lock_chains(pf, class); 4877 + nr_zapped_classes++; 5292 4878 } 5293 4879 5294 4880 static void reinit_class(struct lock_class *class) ··· 5374 4958 return; 5375 4959 5376 4960 raw_local_irq_save(flags); 5377 - arch_spin_lock(&lockdep_lock); 5378 - current->lockdep_recursion = 1; 4961 + lockdep_lock(); 5379 4962 5380 4963 /* closed head */ 5381 4964 pf = delayed_free.pf + (delayed_free.index ^ 1); ··· 5386 4971 */ 5387 4972 call_rcu_zapped(delayed_free.pf + delayed_free.index); 5388 4973 5389 - current->lockdep_recursion = 0; 5390 - arch_spin_unlock(&lockdep_lock); 4974 + lockdep_unlock(); 5391 4975 raw_local_irq_restore(flags); 5392 4976 } 5393 4977 ··· 5431 5017 init_data_structures_once(); 5432 5018 5433 5019 raw_local_irq_save(flags); 5434 - arch_spin_lock(&lockdep_lock); 5435 - current->lockdep_recursion = 1; 5020 + lockdep_lock(); 5436 5021 pf = get_pending_free(); 5437 5022 __lockdep_free_key_range(pf, start, size); 5438 5023 call_rcu_zapped(pf); 5439 - current->lockdep_recursion = 0; 5440 - arch_spin_unlock(&lockdep_lock); 5024 + lockdep_unlock(); 5441 5025 raw_local_irq_restore(flags); 5442 5026 5443 5027 /* ··· 5457 5045 init_data_structures_once(); 5458 5046 5459 5047 raw_local_irq_save(flags); 5460 - arch_spin_lock(&lockdep_lock); 5048 + lockdep_lock(); 5461 5049 __lockdep_free_key_range(pf, start, size); 5462 5050 __free_zapped_classes(pf); 5463 - arch_spin_unlock(&lockdep_lock); 5051 + lockdep_unlock(); 5464 5052 raw_local_irq_restore(flags); 5465 5053 } 5466 5054 ··· 5556 5144 unsigned long flags; 5557 5145 5558 5146 
raw_local_irq_save(flags); 5559 - arch_spin_lock(&lockdep_lock); 5147 + lockdep_lock(); 5560 5148 __lockdep_reset_lock(pf, lock); 5561 5149 __free_zapped_classes(pf); 5562 - arch_spin_unlock(&lockdep_lock); 5150 + lockdep_unlock(); 5563 5151 raw_local_irq_restore(flags); 5564 5152 } 5565 5153
+12 -2
kernel/locking/lockdep_internals.h
··· 106 106 #define STACK_TRACE_HASH_SIZE 16384 107 107 #endif 108 108 109 + /* 110 + * Bit definitions for lock_chain.irq_context 111 + */ 112 + #define LOCK_CHAIN_SOFTIRQ_CONTEXT (1 << 0) 113 + #define LOCK_CHAIN_HARDIRQ_CONTEXT (1 << 1) 114 + 109 115 #define MAX_LOCKDEP_CHAINS (1UL << MAX_LOCKDEP_CHAINS_BITS) 110 116 111 117 #define MAX_LOCKDEP_CHAIN_HLOCKS (MAX_LOCKDEP_CHAINS*5) ··· 130 124 struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i); 131 125 132 126 extern unsigned long nr_lock_classes; 127 + extern unsigned long nr_zapped_classes; 128 + extern unsigned long nr_zapped_lock_chains; 133 129 extern unsigned long nr_list_entries; 134 130 long lockdep_next_lockchain(long i); 135 131 unsigned long lock_chain_count(void); 136 - extern int nr_chain_hlocks; 137 132 extern unsigned long nr_stack_trace_entries; 138 133 139 134 extern unsigned int nr_hardirq_chains; 140 135 extern unsigned int nr_softirq_chains; 141 136 extern unsigned int nr_process_chains; 142 - extern unsigned int max_lockdep_depth; 137 + extern unsigned int nr_free_chain_hlocks; 138 + extern unsigned int nr_lost_chain_hlocks; 139 + extern unsigned int nr_large_chain_blocks; 143 140 141 + extern unsigned int max_lockdep_depth; 144 142 extern unsigned int max_bfs_queue_depth; 145 143 146 144 #ifdef CONFIG_PROVE_LOCKING
+27 -4
kernel/locking/lockdep_proc.c
··· 128 128 struct lock_chain *chain = v; 129 129 struct lock_class *class; 130 130 int i; 131 + static const char * const irq_strs[] = { 132 + [0] = "0", 133 + [LOCK_CHAIN_HARDIRQ_CONTEXT] = "hardirq", 134 + [LOCK_CHAIN_SOFTIRQ_CONTEXT] = "softirq", 135 + [LOCK_CHAIN_SOFTIRQ_CONTEXT| 136 + LOCK_CHAIN_HARDIRQ_CONTEXT] = "hardirq|softirq", 137 + }; 131 138 132 139 if (v == SEQ_START_TOKEN) { 133 - if (nr_chain_hlocks > MAX_LOCKDEP_CHAIN_HLOCKS) 140 + if (!nr_free_chain_hlocks) 134 141 seq_printf(m, "(buggered) "); 135 142 seq_printf(m, "all lock chains:\n"); 136 143 return 0; 137 144 } 138 145 139 - seq_printf(m, "irq_context: %d\n", chain->irq_context); 146 + seq_printf(m, "irq_context: %s\n", irq_strs[chain->irq_context]); 140 147 141 148 for (i = 0; i < chain->depth; i++) { 142 149 class = lock_chain_get_class(chain, i); ··· 278 271 #ifdef CONFIG_PROVE_LOCKING 279 272 seq_printf(m, " dependency chains: %11lu [max: %lu]\n", 280 273 lock_chain_count(), MAX_LOCKDEP_CHAINS); 281 - seq_printf(m, " dependency chain hlocks: %11d [max: %lu]\n", 282 - nr_chain_hlocks, MAX_LOCKDEP_CHAIN_HLOCKS); 274 + seq_printf(m, " dependency chain hlocks used: %11lu [max: %lu]\n", 275 + MAX_LOCKDEP_CHAIN_HLOCKS - 276 + (nr_free_chain_hlocks + nr_lost_chain_hlocks), 277 + MAX_LOCKDEP_CHAIN_HLOCKS); 278 + seq_printf(m, " dependency chain hlocks lost: %11u\n", 279 + nr_lost_chain_hlocks); 283 280 #endif 284 281 285 282 #ifdef CONFIG_TRACE_IRQFLAGS ··· 347 336 seq_printf(m, " debug_locks: %11u\n", 348 337 debug_locks); 349 338 339 + /* 340 + * Zapped classes and lockdep data buffer reuse statistics. 341 + */ 342 + seq_puts(m, "\n"); 343 + seq_printf(m, " zapped classes: %11lu\n", 344 + nr_zapped_classes); 345 + #ifdef CONFIG_PROVE_LOCKING 346 + seq_printf(m, " zapped lock chains: %11lu\n", 347 + nr_zapped_lock_chains); 348 + seq_printf(m, " large chain blocks: %11u\n", 349 + nr_large_chain_blocks); 350 + #endif 350 351 return 0; 351 352 } 352 353
+1 -1
kernel/locking/mutex-debug.c
··· 85 85 * Make sure we are not reinitializing a held lock: 86 86 */ 87 87 debug_check_no_locks_freed((void *)lock, sizeof(*lock)); 88 - lockdep_init_map(&lock->dep_map, name, key, 0); 88 + lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP); 89 89 #endif 90 90 lock->magic = lock; 91 91 }
+145 -71
kernel/locking/percpu-rwsem.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 2 #include <linux/atomic.h> 3 - #include <linux/rwsem.h> 4 3 #include <linux/percpu.h> 4 + #include <linux/wait.h> 5 5 #include <linux/lockdep.h> 6 6 #include <linux/percpu-rwsem.h> 7 7 #include <linux/rcupdate.h> 8 8 #include <linux/sched.h> 9 + #include <linux/sched/task.h> 9 10 #include <linux/errno.h> 10 11 11 - #include "rwsem.h" 12 - 13 12 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem, 14 - const char *name, struct lock_class_key *rwsem_key) 13 + const char *name, struct lock_class_key *key) 15 14 { 16 15 sem->read_count = alloc_percpu(int); 17 16 if (unlikely(!sem->read_count)) 18 17 return -ENOMEM; 19 18 20 - /* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */ 21 19 rcu_sync_init(&sem->rss); 22 - __init_rwsem(&sem->rw_sem, name, rwsem_key); 23 20 rcuwait_init(&sem->writer); 24 - sem->readers_block = 0; 21 + init_waitqueue_head(&sem->waiters); 22 + atomic_set(&sem->block, 0); 23 + #ifdef CONFIG_DEBUG_LOCK_ALLOC 24 + debug_check_no_locks_freed((void *)sem, sizeof(*sem)); 25 + lockdep_init_map(&sem->dep_map, name, key, 0); 26 + #endif 25 27 return 0; 26 28 } 27 29 EXPORT_SYMBOL_GPL(__percpu_init_rwsem); ··· 43 41 } 44 42 EXPORT_SYMBOL_GPL(percpu_free_rwsem); 45 43 46 - int __percpu_down_read(struct percpu_rw_semaphore *sem, int try) 44 + static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem) 47 45 { 46 + __this_cpu_inc(*sem->read_count); 47 + 48 48 /* 49 49 * Due to having preemption disabled the decrement happens on 50 50 * the same CPU as the increment, avoiding the 51 51 * increment-on-one-CPU-and-decrement-on-another problem. 52 52 * 53 - * If the reader misses the writer's assignment of readers_block, then 54 - * the writer is guaranteed to see the reader's increment. 53 + * If the reader misses the writer's assignment of sem->block, then the 54 + * writer is guaranteed to see the reader's increment. 
55 55 * 56 56 * Conversely, any readers that increment their sem->read_count after 57 - * the writer looks are guaranteed to see the readers_block value, 58 - * which in turn means that they are guaranteed to immediately 59 - * decrement their sem->read_count, so that it doesn't matter that the 60 - * writer missed them. 57 + * the writer looks are guaranteed to see the sem->block value, which 58 + * in turn means that they are guaranteed to immediately decrement 59 + * their sem->read_count, so that it doesn't matter that the writer 60 + * missed them. 61 61 */ 62 62 63 63 smp_mb(); /* A matches D */ 64 64 65 65 /* 66 - * If !readers_block the critical section starts here, matched by the 66 + * If !sem->block the critical section starts here, matched by the 67 67 * release in percpu_up_write(). 68 68 */ 69 - if (likely(!smp_load_acquire(&sem->readers_block))) 70 - return 1; 69 + if (likely(!atomic_read_acquire(&sem->block))) 70 + return true; 71 71 72 - /* 73 - * Per the above comment; we still have preemption disabled and 74 - * will thus decrement on the same CPU as we incremented. 75 - */ 76 - __percpu_up_read(sem); 77 - 78 - if (try) 79 - return 0; 80 - 81 - /* 82 - * We either call schedule() in the wait, or we'll fall through 83 - * and reschedule on the preempt_enable() in percpu_down_read(). 84 - */ 85 - preempt_enable_no_resched(); 86 - 87 - /* 88 - * Avoid lockdep for the down/up_read() we already have them. 89 - */ 90 - __down_read(&sem->rw_sem); 91 - this_cpu_inc(*sem->read_count); 92 - __up_read(&sem->rw_sem); 93 - 94 - preempt_disable(); 95 - return 1; 96 - } 97 - EXPORT_SYMBOL_GPL(__percpu_down_read); 98 - 99 - void __percpu_up_read(struct percpu_rw_semaphore *sem) 100 - { 101 - smp_mb(); /* B matches C */ 102 - /* 103 - * In other words, if they see our decrement (presumably to aggregate 104 - * zero, as that is the only time it matters) they will also see our 105 - * critical section. 
106 - */ 107 72 __this_cpu_dec(*sem->read_count); 108 73 109 - /* Prod writer to recheck readers_active */ 74 + /* Prod writer to re-evaluate readers_active_check() */ 110 75 rcuwait_wake_up(&sem->writer); 76 + 77 + return false; 111 78 } 112 - EXPORT_SYMBOL_GPL(__percpu_up_read); 79 + 80 + static inline bool __percpu_down_write_trylock(struct percpu_rw_semaphore *sem) 81 + { 82 + if (atomic_read(&sem->block)) 83 + return false; 84 + 85 + return atomic_xchg(&sem->block, 1) == 0; 86 + } 87 + 88 + static bool __percpu_rwsem_trylock(struct percpu_rw_semaphore *sem, bool reader) 89 + { 90 + if (reader) { 91 + bool ret; 92 + 93 + preempt_disable(); 94 + ret = __percpu_down_read_trylock(sem); 95 + preempt_enable(); 96 + 97 + return ret; 98 + } 99 + return __percpu_down_write_trylock(sem); 100 + } 101 + 102 + /* 103 + * The return value of wait_queue_entry::func means: 104 + * 105 + * <0 - error, wakeup is terminated and the error is returned 106 + * 0 - no wakeup, a next waiter is tried 107 + * >0 - woken, if EXCLUSIVE, counted towards @nr_exclusive. 108 + * 109 + * We use EXCLUSIVE for both readers and writers to preserve FIFO order, 110 + * and play games with the return value to allow waking multiple readers. 111 + * 112 + * Specifically, we wake readers until we've woken a single writer, or until a 113 + * trylock fails. 
114 + */ 115 + static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry, 116 + unsigned int mode, int wake_flags, 117 + void *key) 118 + { 119 + struct task_struct *p = get_task_struct(wq_entry->private); 120 + bool reader = wq_entry->flags & WQ_FLAG_CUSTOM; 121 + struct percpu_rw_semaphore *sem = key; 122 + 123 + /* concurrent against percpu_down_write(), can get stolen */ 124 + if (!__percpu_rwsem_trylock(sem, reader)) 125 + return 1; 126 + 127 + list_del_init(&wq_entry->entry); 128 + smp_store_release(&wq_entry->private, NULL); 129 + 130 + wake_up_process(p); 131 + put_task_struct(p); 132 + 133 + return !reader; /* wake (readers until) 1 writer */ 134 + } 135 + 136 + static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader) 137 + { 138 + DEFINE_WAIT_FUNC(wq_entry, percpu_rwsem_wake_function); 139 + bool wait; 140 + 141 + spin_lock_irq(&sem->waiters.lock); 142 + /* 143 + * Serialize against the wakeup in percpu_up_write(), if we fail 144 + * the trylock, the wakeup must see us on the list. 145 + */ 146 + wait = !__percpu_rwsem_trylock(sem, reader); 147 + if (wait) { 148 + wq_entry.flags |= WQ_FLAG_EXCLUSIVE | reader * WQ_FLAG_CUSTOM; 149 + __add_wait_queue_entry_tail(&sem->waiters, &wq_entry); 150 + } 151 + spin_unlock_irq(&sem->waiters.lock); 152 + 153 + while (wait) { 154 + set_current_state(TASK_UNINTERRUPTIBLE); 155 + if (!smp_load_acquire(&wq_entry.private)) 156 + break; 157 + schedule(); 158 + } 159 + __set_current_state(TASK_RUNNING); 160 + } 161 + 162 + bool __percpu_down_read(struct percpu_rw_semaphore *sem, bool try) 163 + { 164 + if (__percpu_down_read_trylock(sem)) 165 + return true; 166 + 167 + if (try) 168 + return false; 169 + 170 + preempt_enable(); 171 + percpu_rwsem_wait(sem, /* .reader = */ true); 172 + preempt_disable(); 173 + 174 + return true; 175 + } 176 + EXPORT_SYMBOL_GPL(__percpu_down_read); 113 177 114 178 #define per_cpu_sum(var) \ 115 179 ({ \ ··· 192 124 * zero. 
If this sum is zero, then it is stable due to the fact that if any 193 125 * newly arriving readers increment a given counter, they will immediately 194 126 * decrement that same counter. 127 + * 128 + * Assumes sem->block is set. 195 129 */ 196 130 static bool readers_active_check(struct percpu_rw_semaphore *sem) 197 131 { ··· 212 142 213 143 void percpu_down_write(struct percpu_rw_semaphore *sem) 214 144 { 145 + might_sleep(); 146 + rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_); 147 + 215 148 /* Notify readers to take the slow path. */ 216 149 rcu_sync_enter(&sem->rss); 217 150 218 - down_write(&sem->rw_sem); 151 + /* 152 + * Try set sem->block; this provides writer-writer exclusion. 153 + * Having sem->block set makes new readers block. 154 + */ 155 + if (!__percpu_down_write_trylock(sem)) 156 + percpu_rwsem_wait(sem, /* .reader = */ false); 157 + 158 + /* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */ 219 159 220 160 /* 221 - * Notify new readers to block; up until now, and thus throughout the 222 - * longish rcu_sync_enter() above, new readers could still come in. 223 - */ 224 - WRITE_ONCE(sem->readers_block, 1); 225 - 226 - smp_mb(); /* D matches A */ 227 - 228 - /* 229 - * If they don't see our writer of readers_block, then we are 230 - * guaranteed to see their sem->read_count increment, and therefore 231 - * will wait for them. 161 + * If they don't see our store of sem->block, then we are guaranteed to 162 + * see their sem->read_count increment, and therefore will wait for 163 + * them. 232 164 */ 233 165 234 - /* Wait for all now active readers to complete. */ 235 - rcuwait_wait_event(&sem->writer, readers_active_check(sem)); 166 + /* Wait for all active readers to complete. 
*/ 167 + rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE); 236 168 } 237 169 EXPORT_SYMBOL_GPL(percpu_down_write); 238 170 239 171 void percpu_up_write(struct percpu_rw_semaphore *sem) 240 172 { 173 + rwsem_release(&sem->dep_map, _RET_IP_); 174 + 241 175 /* 242 176 * Signal the writer is done, no fast path yet. 243 177 * ··· 252 178 * Therefore we force it through the slow path which guarantees an 253 179 * acquire and thereby guarantees the critical section's consistency. 254 180 */ 255 - smp_store_release(&sem->readers_block, 0); 181 + atomic_set_release(&sem->block, 0); 256 182 257 183 /* 258 - * Release the write lock, this will allow readers back in the game. 184 + * Prod any pending reader/writer to make progress. 259 185 */ 260 - up_write(&sem->rw_sem); 186 + __wake_up(&sem->waiters, TASK_NORMAL, 1, sem); 261 187 262 188 /* 263 189 * Once this completes (at least one RCU-sched grace period hence) the
+3 -6
kernel/locking/rwsem.c
··· 28 28 #include <linux/rwsem.h> 29 29 #include <linux/atomic.h> 30 30 31 - #include "rwsem.h" 32 31 #include "lock_events.h" 33 32 34 33 /* ··· 328 329 * Make sure we are not reinitializing a held semaphore: 329 330 */ 330 331 debug_check_no_locks_freed((void *)sem, sizeof(*sem)); 331 - lockdep_init_map(&sem->dep_map, name, key, 0); 332 + lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP); 332 333 #endif 333 334 #ifdef CONFIG_DEBUG_RWSEMS 334 335 sem->magic = sem; ··· 658 659 struct task_struct *owner; 659 660 unsigned long flags; 660 661 bool ret = true; 661 - 662 - BUILD_BUG_ON(!(RWSEM_OWNER_UNKNOWN & RWSEM_NONSPINNABLE)); 663 662 664 663 if (need_resched()) { 665 664 lockevent_inc(rwsem_opt_fail); ··· 1335 1338 /* 1336 1339 * lock for reading 1337 1340 */ 1338 - inline void __down_read(struct rw_semaphore *sem) 1341 + static inline void __down_read(struct rw_semaphore *sem) 1339 1342 { 1340 1343 if (!rwsem_read_trylock(sem)) { 1341 1344 rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE); ··· 1423 1426 /* 1424 1427 * unlock after reading 1425 1428 */ 1426 - inline void __up_read(struct rw_semaphore *sem) 1429 + static inline void __up_read(struct rw_semaphore *sem) 1427 1430 { 1428 1431 long tmp; 1429 1432
-10
kernel/locking/rwsem.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - 3 - #ifndef __INTERNAL_RWSEM_H 4 - #define __INTERNAL_RWSEM_H 5 - #include <linux/rwsem.h> 6 - 7 - extern void __down_read(struct rw_semaphore *sem); 8 - extern void __up_read(struct rw_semaphore *sem); 9 - 10 - #endif /* __INTERNAL_RWSEM_H */
+3 -3
kernel/locking/spinlock_debug.c
··· 14 14 #include <linux/export.h> 15 15 16 16 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name, 17 - struct lock_class_key *key) 17 + struct lock_class_key *key, short inner) 18 18 { 19 19 #ifdef CONFIG_DEBUG_LOCK_ALLOC 20 20 /* 21 21 * Make sure we are not reinitializing a held lock: 22 22 */ 23 23 debug_check_no_locks_freed((void *)lock, sizeof(*lock)); 24 - lockdep_init_map(&lock->dep_map, name, key, 0); 24 + lockdep_init_map_wait(&lock->dep_map, name, key, 0, inner); 25 25 #endif 26 26 lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; 27 27 lock->magic = SPINLOCK_MAGIC; ··· 39 39 * Make sure we are not reinitializing a held lock: 40 40 */ 41 41 debug_check_no_locks_freed((void *)lock, sizeof(*lock)); 42 - lockdep_init_map(&lock->dep_map, name, key, 0); 42 + lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_CONFIG); 43 43 #endif 44 44 lock->raw_lock = (arch_rwlock_t) __ARCH_RW_LOCK_UNLOCKED; 45 45 lock->magic = RWLOCK_MAGIC;
+1
kernel/rcu/tree.c
··· 1124 1124 !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq && 1125 1125 (rnp->ffmask & rdp->grpmask)) { 1126 1126 init_irq_work(&rdp->rcu_iw, rcu_iw_handler); 1127 + atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ); 1127 1128 rdp->rcu_iw_pending = true; 1128 1129 rdp->rcu_iw_gp_seq = rnp->gp_seq; 1129 1130 irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
+18 -6
kernel/rcu/update.c
··· 239 239 240 240 #ifdef CONFIG_DEBUG_LOCK_ALLOC 241 241 static struct lock_class_key rcu_lock_key; 242 - struct lockdep_map rcu_lock_map = 243 - STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key); 242 + struct lockdep_map rcu_lock_map = { 243 + .name = "rcu_read_lock", 244 + .key = &rcu_lock_key, 245 + .wait_type_outer = LD_WAIT_FREE, 246 + .wait_type_inner = LD_WAIT_CONFIG, /* XXX PREEMPT_RCU ? */ 247 + }; 244 248 EXPORT_SYMBOL_GPL(rcu_lock_map); 245 249 246 250 static struct lock_class_key rcu_bh_lock_key; 247 - struct lockdep_map rcu_bh_lock_map = 248 - STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_bh", &rcu_bh_lock_key); 251 + struct lockdep_map rcu_bh_lock_map = { 252 + .name = "rcu_read_lock_bh", 253 + .key = &rcu_bh_lock_key, 254 + .wait_type_outer = LD_WAIT_FREE, 255 + .wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_LOCK also makes BH preemptible */ 256 + }; 249 257 EXPORT_SYMBOL_GPL(rcu_bh_lock_map); 250 258 251 259 static struct lock_class_key rcu_sched_lock_key; 252 - struct lockdep_map rcu_sched_lock_map = 253 - STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key); 260 + struct lockdep_map rcu_sched_lock_map = { 261 + .name = "rcu_read_lock_sched", 262 + .key = &rcu_sched_lock_key, 263 + .wait_type_outer = LD_WAIT_FREE, 264 + .wait_type_inner = LD_WAIT_SPIN, 265 + }; 254 266 EXPORT_SYMBOL_GPL(rcu_sched_lock_map); 255 267 256 268 static struct lock_class_key rcu_callback_key;
+19 -17
kernel/sched/completion.c
··· 29 29 { 30 30 unsigned long flags; 31 31 32 - spin_lock_irqsave(&x->wait.lock, flags); 32 + raw_spin_lock_irqsave(&x->wait.lock, flags); 33 33 34 34 if (x->done != UINT_MAX) 35 35 x->done++; 36 - __wake_up_locked(&x->wait, TASK_NORMAL, 1); 37 - spin_unlock_irqrestore(&x->wait.lock, flags); 36 + swake_up_locked(&x->wait); 37 + raw_spin_unlock_irqrestore(&x->wait.lock, flags); 38 38 } 39 39 EXPORT_SYMBOL(complete); 40 40 ··· 58 58 { 59 59 unsigned long flags; 60 60 61 - spin_lock_irqsave(&x->wait.lock, flags); 61 + lockdep_assert_RT_in_threaded_ctx(); 62 + 63 + raw_spin_lock_irqsave(&x->wait.lock, flags); 62 64 x->done = UINT_MAX; 63 - __wake_up_locked(&x->wait, TASK_NORMAL, 0); 64 - spin_unlock_irqrestore(&x->wait.lock, flags); 65 + swake_up_all_locked(&x->wait); 66 + raw_spin_unlock_irqrestore(&x->wait.lock, flags); 65 67 } 66 68 EXPORT_SYMBOL(complete_all); 67 69 ··· 72 70 long (*action)(long), long timeout, int state) 73 71 { 74 72 if (!x->done) { 75 - DECLARE_WAITQUEUE(wait, current); 73 + DECLARE_SWAITQUEUE(wait); 76 74 77 - __add_wait_queue_entry_tail_exclusive(&x->wait, &wait); 78 75 do { 79 76 if (signal_pending_state(state, current)) { 80 77 timeout = -ERESTARTSYS; 81 78 break; 82 79 } 80 + __prepare_to_swait(&x->wait, &wait); 83 81 __set_current_state(state); 84 - spin_unlock_irq(&x->wait.lock); 82 + raw_spin_unlock_irq(&x->wait.lock); 85 83 timeout = action(timeout); 86 - spin_lock_irq(&x->wait.lock); 84 + raw_spin_lock_irq(&x->wait.lock); 87 85 } while (!x->done && timeout); 88 - __remove_wait_queue(&x->wait, &wait); 86 + __finish_swait(&x->wait, &wait); 89 87 if (!x->done) 90 88 return timeout; 91 89 } ··· 102 100 103 101 complete_acquire(x); 104 102 105 - spin_lock_irq(&x->wait.lock); 103 + raw_spin_lock_irq(&x->wait.lock); 106 104 timeout = do_wait_for_common(x, action, timeout, state); 107 - spin_unlock_irq(&x->wait.lock); 105 + raw_spin_unlock_irq(&x->wait.lock); 108 106 109 107 complete_release(x); 110 108 ··· 293 291 if (!READ_ONCE(x->done)) 
294 292 return false; 295 293 296 - spin_lock_irqsave(&x->wait.lock, flags); 294 + raw_spin_lock_irqsave(&x->wait.lock, flags); 297 295 if (!x->done) 298 296 ret = false; 299 297 else if (x->done != UINT_MAX) 300 298 x->done--; 301 - spin_unlock_irqrestore(&x->wait.lock, flags); 299 + raw_spin_unlock_irqrestore(&x->wait.lock, flags); 302 300 return ret; 303 301 } 304 302 EXPORT_SYMBOL(try_wait_for_completion); ··· 324 322 * otherwise we can end up freeing the completion before complete() 325 323 * is done referencing it. 326 324 */ 327 - spin_lock_irqsave(&x->wait.lock, flags); 328 - spin_unlock_irqrestore(&x->wait.lock, flags); 325 + raw_spin_lock_irqsave(&x->wait.lock, flags); 326 + raw_spin_unlock_irqrestore(&x->wait.lock, flags); 329 327 return true; 330 328 } 331 329 EXPORT_SYMBOL(completion_done);
+3
kernel/sched/sched.h
··· 2492 2492 return true; 2493 2493 } 2494 2494 #endif 2495 + 2496 + void swake_up_all_locked(struct swait_queue_head *q); 2497 + void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+14 -1
kernel/sched/swait.c
··· 32 32 } 33 33 EXPORT_SYMBOL(swake_up_locked); 34 34 35 + /* 36 + * Wake up all waiters. This is an interface which is solely exposed for 37 + * completions and not for general usage. 38 + * 39 + * It is intentionally different from swake_up_all() to allow usage from 40 + * hard interrupt context and interrupt disabled regions. 41 + */ 42 + void swake_up_all_locked(struct swait_queue_head *q) 43 + { 44 + while (!list_empty(&q->task_list)) 45 + swake_up_locked(q); 46 + } 47 + 35 48 void swake_up_one(struct swait_queue_head *q) 36 49 { 37 50 unsigned long flags; ··· 82 69 } 83 70 EXPORT_SYMBOL(swake_up_all); 84 71 85 - static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) 72 + void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) 86 73 { 87 74 wait->task = current; 88 75 if (list_empty(&wait->task_list))
+5 -1
kernel/time/hrtimer.c
··· 1404 1404 base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0; 1405 1405 base += hrtimer_clockid_to_base(clock_id); 1406 1406 timer->is_soft = softtimer; 1407 - timer->is_hard = !softtimer; 1407 + timer->is_hard = !!(mode & HRTIMER_MODE_HARD); 1408 1408 timer->base = &cpu_base->clock_base[base]; 1409 1409 timerqueue_init(&timer->node); 1410 1410 } ··· 1514 1514 */ 1515 1515 raw_spin_unlock_irqrestore(&cpu_base->lock, flags); 1516 1516 trace_hrtimer_expire_entry(timer, now); 1517 + lockdep_hrtimer_enter(timer); 1518 + 1517 1519 restart = fn(timer); 1520 + 1521 + lockdep_hrtimer_exit(timer); 1518 1522 trace_hrtimer_expire_exit(timer); 1519 1523 raw_spin_lock_irq(&cpu_base->lock); 1520 1524
+4 -3
kernel/time/jiffies.c
··· 58 58 .max_cycles = 10, 59 59 }; 60 60 61 - __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock); 61 + __cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock); 62 + __cacheline_aligned_in_smp seqcount_t jiffies_seq; 62 63 63 64 #if (BITS_PER_LONG < 64) 64 65 u64 get_jiffies_64(void) ··· 68 67 u64 ret; 69 68 70 69 do { 71 - seq = read_seqbegin(&jiffies_lock); 70 + seq = read_seqcount_begin(&jiffies_seq); 72 71 ret = jiffies_64; 73 - } while (read_seqretry(&jiffies_lock, seq)); 72 + } while (read_seqcount_retry(&jiffies_seq, seq)); 74 73 return ret; 75 74 } 76 75 EXPORT_SYMBOL(get_jiffies_64);
kernel/time/posix-cpu-timers.c (+5 -1)
@@ -1126,8 +1126,11 @@
 	if (!fastpath_timer_check(tsk))
 		return;
 
-	if (!lock_task_sighand(tsk, &flags))
+	lockdep_posixtimer_enter();
+	if (!lock_task_sighand(tsk, &flags)) {
+		lockdep_posixtimer_exit();
 		return;
+	}
 	/*
 	 * Here we take off tsk->signal->cpu_timers[N] and
 	 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1169,6 +1172,7 @@
 		cpu_timer_fire(timer);
 		spin_unlock(&timer->it_lock);
 	}
+	lockdep_posixtimer_exit();
 }
 
 /*
kernel/time/tick-common.c (+6 -4)
@@ -84,13 +84,15 @@
 static void tick_periodic(int cpu)
 {
 	if (tick_do_timer_cpu == cpu) {
-		write_seqlock(&jiffies_lock);
+		raw_spin_lock(&jiffies_lock);
+		write_seqcount_begin(&jiffies_seq);
 
 		/* Keep track of the next tick event */
 		tick_next_period = ktime_add(tick_next_period, tick_period);
 
 		do_timer(1);
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		update_wall_time();
 	}
 
@@ -162,9 +164,9 @@
 	ktime_t next;
 
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		next = tick_next_period;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 
 	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
 
kernel/time/tick-sched.c (+13 -7)
@@ -65,7 +65,8 @@
 		return;
 
 	/* Reevaluate with jiffies_lock held */
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 
 	delta = ktime_sub(now, last_jiffies_update);
 	if (delta >= tick_period) {
@@ -91,9 +92,11 @@
 		/* Keep the tick_next_period variable up to date */
 		tick_next_period = ktime_add(last_jiffies_update, tick_period);
 	} else {
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		return;
 	}
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
@@ -105,11 +108,13 @@
 {
 	ktime_t period;
 
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
 	if (last_jiffies_update == 0)
 		last_jiffies_update = tick_next_period;
 	period = last_jiffies_update;
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	return period;
 }
@@ -240,6 +245,7 @@
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 	.func = nohz_full_kick_func,
+	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
 };
 
 /*
@@ -676,10 +682,10 @@
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		basemono = last_jiffies_update;
 		basejiff = jiffies;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	ts->last_jiffies = basejiff;
 	ts->timer_expires_base = basemono;
 
kernel/time/timekeeping.c (+4 -2)
@@ -2397,8 +2397,10 @@
  */
 void xtime_update(unsigned long ticks)
 {
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	do_timer(ticks);
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
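All the writer-side hunks in this series follow the same protocol: take the raw spinlock to serialize writers, bump the seqcount, do the update, bump the seqcount again, unlock. A userspace sketch of that shape with a pthread mutex standing in for the raw spinlock (toy_* names and the mapping in the comments are illustrative, not the kernel API):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* toy_lock stands in for jiffies_lock, toy_seq for jiffies_seq. */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic unsigned int toy_seq;
static unsigned long toy_ticks;		/* stands in for jiffies_64 */

/* The xtime_update() shape: lock, open write section, update, close, unlock. */
static void toy_xtime_update(unsigned long ticks)
{
	pthread_mutex_lock(&toy_lock);		/* raw_spin_lock(&jiffies_lock) */
	atomic_fetch_add(&toy_seq, 1);		/* write_seqcount_begin(&jiffies_seq) */
	toy_ticks += ticks;			/* do_timer(ticks) */
	atomic_fetch_add(&toy_seq, 1);		/* write_seqcount_end(&jiffies_seq) */
	pthread_mutex_unlock(&toy_lock);	/* raw_spin_unlock(&jiffies_lock) */
}
```

Because the lock only serializes writers and the seqcount alone publishes the update to readers, the lock can be a raw spinlock on PREEMPT_RT without readers ever blocking on it.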
kernel/time/timekeeping.h (+2 -1)
@@ -25,7 +25,8 @@
 extern void do_timer(unsigned long ticks);
 extern void update_wall_time(void);
 
-extern seqlock_t jiffies_lock;
+extern raw_spinlock_t jiffies_lock;
+extern seqcount_t jiffies_seq;
 
 #define CS_NAME_LEN	32
 
lib/Kconfig.debug (+17)
@@ -1086,6 +1086,23 @@
 
 	 For more details, see Documentation/locking/lockdep-design.rst.
 
+config PROVE_RAW_LOCK_NESTING
+	bool "Enable raw_spinlock - spinlock nesting checks"
+	depends on PROVE_LOCKING
+	default n
+	help
+	 Enable the raw_spinlock vs. spinlock nesting checks which ensure
+	 that the lock nesting rules for PREEMPT_RT enabled kernels are
+	 not violated.
+
+	 NOTE: There are known nesting problems. So if you enable this
+	 option expect lockdep splats until these problems have been fully
+	 addressed which is work in progress. This config switch allows to
+	 identify and analyze these problems. It will be removed and the
+	 check permanentely enabled once the main issues have been fixed.
+
+	 If unsure, select N.
+
 config LOCK_STAT
 	bool "Lock usage statistics"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
tools/objtool/check.c (+1)
@@ -488,6 +488,7 @@
 	"__sanitizer_cov_trace_cmp2",
 	"__sanitizer_cov_trace_cmp4",
 	"__sanitizer_cov_trace_cmp8",
+	"__sanitizer_cov_trace_switch",
 	/* UBSAN */
 	"ubsan_type_mismatch_common",
 	"__ubsan_handle_type_mismatch",