Merge tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
"The biggest change to core locking facilities in this cycle is the
introduction of local_lock_t - this primitive comes from the -rt
project and identifies CPU-local locking dependencies normally handled
opaquely behind preempt_disable() or local_irq_save/disable() critical
sections.

The generated code on mainline kernels doesn't change as a result, but
still there are benefits: improved debugging and better documentation
of data structure accesses.

The new local_lock_t primitives are introduced and then utilized in a
couple of kernel subsystems. No change in functionality is intended.

There are also other smaller changes and cleanups."

* tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
zram: Use local lock to protect per-CPU data
zram: Allocate struct zcomp_strm as per-CPU memory
connector/cn_proc: Protect send_msg() with a local lock
squashfs: Make use of local lock in multi_cpu decompressor
mm/swap: Use local_lock for protection
radix-tree: Use local_lock for protection
locking: Introduce local_lock()
locking/lockdep: Replace zero-length array with flexible-array
locking/rtmutex: Remove unused rt_mutex_cmpxchg_relaxed()

Linus Torvalds 5 years ago 60056060 2227e5b2

+502 -110

15 changed files

Documentation/locking/locktypes.rst
drivers/block/zram/zcomp.c
drivers/block/zram/zcomp.h
drivers/connector/cn_proc.c
fs/squashfs/decompressor_multi_percpu.c
include/linux/idr.h
include/linux/local_lock.h
include/linux/local_lock_internal.h
include/linux/radix-tree.h
include/linux/swap.h
kernel/locking/lockdep.c
kernel/locking/rtmutex.c
lib/radix-tree.c
mm/compaction.c
mm/swap.c

+204 -11

Documentation/locking/locktypes.rst

···
 into two categories:

  - Sleeping locks
+ - CPU local locks
  - Spinning locks

 This document conceptually describes these lock types and provides rules
···

 On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

+ - local_lock
  - spinlock_t
  - rwlock_t
+
+
+CPU local locks
+---------------
+
+ - local_lock
+
+On non-PREEMPT_RT kernels, local_lock functions are wrappers around
+preemption and interrupt disabling primitives. Contrary to other locking
+mechanisms, disabling preemption or interrupts are pure CPU local
+concurrency control mechanisms and not suited for inter-CPU concurrency
+control.
+

 Spinning locks
 --------------
···
  _irq()              Disable / enable interrupts
  _irqsave/restore()  Save and disable / restore interrupt disabled state
  =================== ====================================================
+

 Owner semantics
 ===============
···
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.
+
+
+local_lock
+==========
+
+local_lock provides a named scope to critical sections which are protected
+by disabling preemption or interrupts.
+
+On non-PREEMPT_RT kernels local_lock operations map to the preemption and
+interrupt disabling and enabling primitives:
+
+ =========================== ======================
+ local_lock(&llock)          preempt_disable()
+ local_unlock(&llock)        preempt_enable()
+ local_lock_irq(&llock)      local_irq_disable()
+ local_unlock_irq(&llock)    local_irq_enable()
+ local_lock_save(&llock)     local_irq_save()
+ local_lock_restore(&llock)  local_irq_save()
+ =========================== ======================
+
+The named scope of local_lock has two advantages over the regular
+primitives:
+
+  - The lock name allows static analysis and is also a clear documentation
+    of the protection scope while the regular primitives are scopeless and
+    opaque.
+
+  - If lockdep is enabled the local_lock gains a lockmap which allows to
+    validate the correctness of the protection. This can detect cases where
+    e.g. a function using preempt_disable() as protection mechanism is
+    invoked from interrupt or soft-interrupt context. Aside of that
+    lockdep_assert_held(&llock) works as with any other locking primitive.
+
+local_lock and PREEMPT_RT
+-------------------------
+
+PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
+semantics:
+
+  - All spinlock_t changes also apply to local_lock.
+
+local_lock usage
+----------------
+
+local_lock should be used in situations where disabling preemption or
+interrupts is the appropriate form of concurrency control to protect
+per-CPU data structures on a non PREEMPT_RT kernel.
+
+local_lock is not suitable to protect against preemption or interrupts on a
+PREEMPT_RT kernel due to the PREEMPT_RT specific spinlock_t semantics.


 raw_spinlock_t and spinlock_t
···
 PREEMPT_RT caveats
 ==================

+local_lock on RT
+----------------
+
+The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
+implications. For example, on a non-PREEMPT_RT kernel the following code
+sequence works as expected::
+
+  local_lock_irq(&local_lock);
+  raw_spin_lock(&lock);
+
+and is fully equivalent to::
+
+  raw_spin_lock_irq(&lock);
+
+On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
+is mapped to a per-CPU spinlock_t which neither disables interrupts nor
+preemption. The following code sequence works perfectly correct on both
+PREEMPT_RT and non-PREEMPT_RT kernels::
+
+  local_lock_irq(&local_lock);
+  spin_lock(&lock);
+
+Another caveat with local locks is that each local_lock has a specific
+protection scope. So the following substitution is wrong::
+
+  func1()
+  {
+    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
+    func3();
+    local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_1, flags);
+  }
+
+  func2()
+  {
+    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
+    func3();
+    local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_2, flags);
+  }
+
+  func3()
+  {
+    lockdep_assert_irqs_disabled();
+    access_protected_data();
+  }
+
+On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
+local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
+of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel
+because local_lock_irqsave() does not disable interrupts due to the
+PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::
+
+  func1()
+  {
+    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
+    func3();
+    local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
+  }
+
+  func2()
+  {
+    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
+    func3();
+    local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
+  }
+
+  func3()
+  {
+    lockdep_assert_held(&local_lock);
+    access_protected_data();
+  }
+
+
 spinlock_t and rwlock_t
 -----------------------

-These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
 have a few implications. For example, on a non-PREEMPT_RT kernel the
 following code sequence works as expected::

···
 allowing things like per-CPU interrupt disabled locks to be acquired.
 However, this approach should be used only where absolutely necessary.

+A typical scenario is protection of per-CPU variables in thread context::

-raw_spinlock_t
---------------
+  struct foo *p = get_cpu_ptr(&var1);
+
+  spin_lock(&p->lock);
+  p->count += this_cpu_read(var2);
+
+This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
+this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
+not allow to acquire p->lock because get_cpu_ptr() implicitly disables
+preemption. The following substitution works on both kernels::
+
+  struct foo *p;
+
+  migrate_disable();
+  p = this_cpu_ptr(&var1);
+  spin_lock(&p->lock);
+  p->count += this_cpu_read(var2);
+
+On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable()
+which makes the above code fully equivalent. On a PREEMPT_RT kernel
+migrate_disable() ensures that the task is pinned on the current CPU which
+in turn guarantees that the per-CPU access to var1 and var2 are staying on
+the same CPU.
+
+The migrate_disable() substitution is not valid for the following
+scenario::
+
+  func()
+  {
+    struct foo *p;
+
+    migrate_disable();
+    p = this_cpu_ptr(&var1);
+    p->val = func2();
+
+While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because
+here migrate_disable() does not protect against reentrancy from a
+preempting task. A correct substitution for this case is::
+
+  func()
+  {
+    struct foo *p;
+
+    local_lock(&foo_lock);
+    p = this_cpu_ptr(&var1);
+    p->val = func2();
+
+On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
+preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
+underlying per-CPU spinlock.
+
+
+raw_spinlock_t on RT
+--------------------

 Acquiring a raw_spinlock_t disables preemption and possibly also
 interrupts, so the critical section must avoid acquiring a regular
···

 The most basic rules are:

-  - Lock types of the same lock category (sleeping, spinning) can nest
-    arbitrarily as long as they respect the general lock ordering rules to
-    prevent deadlocks.
+  - Lock types of the same lock category (sleeping, CPU local, spinning)
+    can nest arbitrarily as long as they respect the general lock ordering
+    rules to prevent deadlocks.

-  - Sleeping lock types cannot nest inside spinning lock types.
+  - Sleeping lock types cannot nest inside CPU local and spinning lock types.

-  - Spinning lock types can nest inside sleeping lock types.
+  - CPU local and spinning lock types can nest inside sleeping lock types.
+
+  - Spinning lock types can nest inside all lock types

 These constraints apply both in PREEMPT_RT and otherwise.

 The fact that PREEMPT_RT changes the lock category of spinlock_t and
-rwlock_t from spinning to sleeping means that they cannot be acquired while
-holding a raw spinlock. This results in the following nesting ordering:
+rwlock_t from spinning to sleeping and substitutes local_lock with a
+per-CPU spinlock_t means that they cannot be acquired while holding a raw
+spinlock. This results in the following nesting ordering:

   1) Sleeping locks
-  2) spinlock_t and rwlock_t
+  2) spinlock_t, rwlock_t, local_lock
   3) raw_spinlock_t and bit spinlocks

 Lockdep will complain if these constraints are violated, both in
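
The PREEMPT_RT nesting order at the end of the locktypes.rst change can be read as a rank comparison: a lock may be acquired while holding locks of the same or an outer level. The following is a minimal user-space sketch of that rule only. The enum constants and rt_may_nest() are hypothetical names, not kernel or lockdep APIs, and real lockdep validation additionally tracks lock classes, wait types and acquisition history.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the PREEMPT_RT nesting order from locktypes.rst:
 *   1) Sleeping locks
 *   2) spinlock_t, rwlock_t, local_lock
 *   3) raw_spinlock_t and bit spinlocks
 */
enum rt_lock_kind {
	RT_SLEEPING_LOCK,	/* mutex, rw_semaphore, ... */
	RT_SPINLOCK_T,		/* sleeping lock on PREEMPT_RT */
	RT_RWLOCK_T,		/* sleeping lock on PREEMPT_RT */
	RT_LOCAL_LOCK,		/* per-CPU spinlock_t on PREEMPT_RT */
	RT_RAW_SPINLOCK_T,
	RT_BIT_SPINLOCK,
};

/* Position of a lock type in the nesting order, outermost first. */
static int rt_nesting_level(enum rt_lock_kind kind)
{
	switch (kind) {
	case RT_SLEEPING_LOCK:
		return 1;
	case RT_SPINLOCK_T:
	case RT_RWLOCK_T:
	case RT_LOCAL_LOCK:
		return 2;
	case RT_RAW_SPINLOCK_T:
	case RT_BIT_SPINLOCK:
		return 3;
	}
	return 0;	/* unknown kind */
}

/*
 * True if @inner may be acquired while @outer is held on PREEMPT_RT.
 * Same-level nesting is allowed; the general lock ordering rules
 * between individual lock instances are not modelled here.
 */
static bool rt_may_nest(enum rt_lock_kind outer, enum rt_lock_kind inner)
{
	assert(rt_nesting_level(outer) > 0 && rt_nesting_level(inner) > 0);
	return rt_nesting_level(outer) <= rt_nesting_level(inner);
}
```

For example, rt_may_nest(RT_RAW_SPINLOCK_T, RT_LOCAL_LOCK) is false, matching the rule that a local_lock, being a per-CPU spinlock_t on PREEMPT_RT, cannot be acquired while holding a raw spinlock.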

+19 -25

drivers/block/zram/zcomp.c

···
 	if (!IS_ERR_OR_NULL(zstrm->tfm))
 		crypto_free_comp(zstrm->tfm);
 	free_pages((unsigned long)zstrm->buffer, 1);
-	kfree(zstrm);
+	zstrm->tfm = NULL;
+	zstrm->buffer = NULL;
 }

 /*
- * allocate new zcomp_strm structure with ->tfm initialized by
- * backend, return NULL on error
+ * Initialize zcomp_strm structure with ->tfm initialized by backend, and
+ * ->buffer. Return a negative value on error.
  */
-static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
+static int zcomp_strm_init(struct zcomp_strm *zstrm, struct zcomp *comp)
 {
-	struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
-	if (!zstrm)
-		return NULL;
-
 	zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
 	/*
 	 * allocate 2 pages. 1 for compressed data, plus 1 extra for the
···
 	zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
 	if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
 		zcomp_strm_free(zstrm);
-		zstrm = NULL;
+		return -ENOMEM;
 	}
-	return zstrm;
+	return 0;
 }

 bool zcomp_available_algorithm(const char *comp)
···

 struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
 {
-	return *get_cpu_ptr(comp->stream);
+	local_lock(&comp->stream->lock);
+	return this_cpu_ptr(comp->stream);
 }

 void zcomp_stream_put(struct zcomp *comp)
 {
-	put_cpu_ptr(comp->stream);
+	local_unlock(&comp->stream->lock);
 }

 int zcomp_compress(struct zcomp_strm *zstrm,
···
 {
 	struct zcomp *comp = hlist_entry(node, struct zcomp, node);
 	struct zcomp_strm *zstrm;
+	int ret;

-	if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
-		return 0;
+	zstrm = per_cpu_ptr(comp->stream, cpu);
+	local_lock_init(&zstrm->lock);

-	zstrm = zcomp_strm_alloc(comp);
-	if (IS_ERR_OR_NULL(zstrm)) {
+	ret = zcomp_strm_init(zstrm, comp);
+	if (ret)
 		pr_err("Can't allocate a compression stream\n");
-		return -ENOMEM;
-	}
-	*per_cpu_ptr(comp->stream, cpu) = zstrm;
-	return 0;
+	return ret;
 }

 int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
···
 	struct zcomp *comp = hlist_entry(node, struct zcomp, node);
 	struct zcomp_strm *zstrm;

-	zstrm = *per_cpu_ptr(comp->stream, cpu);
-	if (!IS_ERR_OR_NULL(zstrm))
-		zcomp_strm_free(zstrm);
-	*per_cpu_ptr(comp->stream, cpu) = NULL;
+	zstrm = per_cpu_ptr(comp->stream, cpu);
+	zcomp_strm_free(zstrm);
 	return 0;
 }

···
 {
 	int ret;

-	comp->stream = alloc_percpu(struct zcomp_strm *);
+	comp->stream = alloc_percpu(struct zcomp_strm);
 	if (!comp->stream)
 		return -ENOMEM;

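
With struct zcomp_strm now allocated directly as per-CPU memory, the hotplug callbacks initialize and tear down the same slot in place, which is why zcomp_strm_free() now clears ->tfm and ->buffer instead of calling kfree(). The sketch below models that lifecycle in user space under stated assumptions: struct model_strm, model_strm_init() and model_strm_free() are hypothetical, malloc()/calloc() stand in for crypto_alloc_comp() and __get_free_pages(), the local_lock_t member is omitted, and a plain NULL check replaces IS_ERR_OR_NULL().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct zcomp_strm (lock omitted). */
struct model_strm {
	void *tfm;	/* stands in for struct crypto_comp * */
	void *buffer;	/* stands in for the 2-page buffer */
};

/* Release both resources and reset the slot so it can be reused. */
static void model_strm_free(struct model_strm *zstrm)
{
	free(zstrm->tfm);	/* free(NULL) is a no-op */
	free(zstrm->buffer);
	zstrm->tfm = NULL;
	zstrm->buffer = NULL;
}

/*
 * Initialize an existing slot in place, like zcomp_strm_init().
 * @fail_tfm simulates the transform allocation failing.
 * Returns 0 on success or -ENOMEM, leaving the slot cleared on failure.
 */
static int model_strm_init(struct model_strm *zstrm, bool fail_tfm)
{
	assert(zstrm != NULL);
	zstrm->tfm = fail_tfm ? NULL : malloc(64);
	zstrm->buffer = calloc(2, 4096);
	if (!zstrm->tfm || !zstrm->buffer) {
		model_strm_free(zstrm);
		return -ENOMEM;
	}
	return 0;
}
```

Clearing the pointers makes a repeated model_strm_free() harmless. Presumably this matters in the kernel because the same per-CPU slot can be freed again by zcomp_cpu_dead() after a failed zcomp_strm_init(), or reinitialized when a CPU comes back online.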

+4 -1

drivers/block/zram/zcomp.h

···

 #ifndef _ZCOMP_H_
 #define _ZCOMP_H_
+#include <linux/local_lock.h>

 struct zcomp_strm {
+	/* The members ->buffer and ->tfm are protected by ->lock. */
+	local_lock_t lock;
 	/* compression/decompression buffer */
 	void *buffer;
 	struct crypto_comp *tfm;
···

 /* dynamic per-device compression frontend */
 struct zcomp {
-	struct zcomp_strm * __percpu *stream;
+	struct zcomp_strm __percpu *stream;
 	const char *name;
 	struct hlist_node node;
 };

+14 -7

drivers/connector/cn_proc.c

···
 #include <linux/pid_namespace.h>

 #include <linux/cn_proc.h>
+#include <linux/local_lock.h>

 /*
  * Size of a cn_msg followed by a proc_event structure. Since the
···
 static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
 static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC };

-/* proc_event_counts is used as the sequence number of the netlink message */
-static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
+/* local_event.count is used as the sequence number of the netlink message */
+struct local_event {
+	local_lock_t lock;
+	__u32 count;
+};
+static DEFINE_PER_CPU(struct local_event, local_event) = {
+	.lock = INIT_LOCAL_LOCK(lock),
+};

 static inline void send_msg(struct cn_msg *msg)
 {
-	preempt_disable();
+	local_lock(&local_event.lock);

-	msg->seq = __this_cpu_inc_return(proc_event_counts) - 1;
+	msg->seq = __this_cpu_inc_return(local_event.count) - 1;
 	((struct proc_event *)msg->data)->cpu = smp_processor_id();

 	/*
-	 * Preemption remains disabled during send to ensure the messages are
-	 * ordered according to their sequence numbers.
+	 * local_lock() disables preemption during send to ensure the messages
+	 * are ordered according to their sequence numbers.
 	 *
 	 * If cn_netlink_send() fails, the data is not sent.
 	 */
 	cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);

-	preempt_enable();
+	local_unlock(&local_event.lock);
 }

 void proc_fork_connector(struct task_struct *task)
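
In the converted send_msg(), the per-CPU counter and the local_lock_t that guards it live in one struct, so each CPU hands out its own sequence numbers starting at 0. The following is a hypothetical single-threaded user-space model of that behaviour: struct model_local_event, model_events and model_send_msg() are illustrative names, the CPU is passed in explicitly, and a plain depth counter stands in for the lock.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical user-space model of cn_proc's per-CPU struct local_event. */
struct model_local_event {
	int lock_depth;		/* stands in for local_lock_t */
	uint32_t count;		/* per-CPU sequence counter */
};

static struct model_local_event model_events[MODEL_NR_CPUS];

/* Mirrors send_msg(): lock, derive the sequence number, "send", unlock. */
static uint32_t model_send_msg(int cpu)
{
	struct model_local_event *ev;
	uint32_t seq;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ev = &model_events[cpu];
	ev->lock_depth++;		/* local_lock(&local_event.lock) */
	seq = ++ev->count - 1;		/* __this_cpu_inc_return(...) - 1 */
	/* cn_netlink_send() would run here, still under the lock. */
	ev->lock_depth--;		/* local_unlock(&local_event.lock) */
	return seq;
}
```

Two messages on CPU 0 get sequence numbers 0 and 1, while the first message on CPU 1 starts again at 0, because the counters are per CPU.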

+14 -7

fs/squashfs/decompressor_multi_percpu.c

···
 #include <linux/slab.h>
 #include <linux/percpu.h>
 #include <linux/buffer_head.h>
+#include <linux/local_lock.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
···
  */

 struct squashfs_stream {
-	void		*stream;
+	void			*stream;
+	local_lock_t	lock;
 };

 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
···
 			err = PTR_ERR(stream->stream);
 			goto out;
 		}
+		local_lock_init(&stream->lock);
 	}

 	kfree(comp_opts);
···
 int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
 	int b, int offset, int length, struct squashfs_page_actor *output)
 {
-	struct squashfs_stream __percpu *percpu =
-			(struct squashfs_stream __percpu *) msblk->stream;
-	struct squashfs_stream *stream = get_cpu_ptr(percpu);
-	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-		offset, length, output);
-	put_cpu_ptr(stream);
+	struct squashfs_stream *stream;
+	int res;
+
+	local_lock(&msblk->stream->lock);
+	stream = this_cpu_ptr(msblk->stream);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+			offset, length, output);
+
+	local_unlock(&msblk->stream->lock);

 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",

+1 -1

include/linux/idr.h

···
  */
 static inline void idr_preload_end(void)
 {
-	preempt_enable();
+	local_unlock(&radix_tree_preloads.lock);
 }

 /**

+54

include/linux/local_lock.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_LOCAL_LOCK_H
+#define _LINUX_LOCAL_LOCK_H
+
+#include <linux/local_lock_internal.h>
+
+/**
+ * local_lock_init - Runtime initialize a lock instance
+ */
+#define local_lock_init(lock)		__local_lock_init(lock)
+
+/**
+ * local_lock - Acquire a per CPU local lock
+ * @lock:	The lock variable
+ */
+#define local_lock(lock)		__local_lock(lock)
+
+/**
+ * local_lock_irq - Acquire a per CPU local lock and disable interrupts
+ * @lock:	The lock variable
+ */
+#define local_lock_irq(lock)		__local_lock_irq(lock)
+
+/**
+ * local_lock_irqsave - Acquire a per CPU local lock, save and disable
+ *			 interrupts
+ * @lock:	The lock variable
+ * @flags:	Storage for interrupt flags
+ */
+#define local_lock_irqsave(lock, flags) \
+	__local_lock_irqsave(lock, flags)
+
+/**
+ * local_unlock - Release a per CPU local lock
+ * @lock:	The lock variable
+ */
+#define local_unlock(lock)		__local_unlock(lock)
+
+/**
+ * local_unlock_irq - Release a per CPU local lock and enable interrupts
+ * @lock:	The lock variable
+ */
+#define local_unlock_irq(lock)		__local_unlock_irq(lock)
+
+/**
+ * local_unlock_irqrestore - Release a per CPU local lock and restore
+ *			      interrupt flags
+ * @lock:	The lock variable
+ * @flags:	Interrupt flags to restore
+ */
+#define local_unlock_irqrestore(lock, flags) \
+	__local_unlock_irqrestore(lock, flags)
+
+#endif

+90

include/linux/local_lock_internal.h

···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_LOCAL_LOCK_H
+# error "Do not include directly, include linux/local_lock.h"
+#endif
+
+#include <linux/percpu-defs.h>
+#include <linux/lockdep.h>
+
+typedef struct {
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	dep_map;
+	struct task_struct	*owner;
+#endif
+} local_lock_t;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define LL_DEP_MAP_INIT(lockname) \
+	.dep_map = { \
+		.name = #lockname, \
+		.wait_type_inner = LD_WAIT_CONFIG, \
+	}
+#else
+# define LL_DEP_MAP_INIT(lockname)
+#endif
+
+#define INIT_LOCAL_LOCK(lockname)	{ LL_DEP_MAP_INIT(lockname) }
+
+#define __local_lock_init(lock) \
+do { \
+	static struct lock_class_key __key; \
+ \
+	debug_check_no_locks_freed((void *)lock, sizeof(*lock));\
+	lockdep_init_map_wait(&(lock)->dep_map, #lock, &__key, 0, LD_WAIT_CONFIG);\
+} while (0)
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static inline void local_lock_acquire(local_lock_t *l)
+{
+	lock_map_acquire(&l->dep_map);
+	DEBUG_LOCKS_WARN_ON(l->owner);
+	l->owner = current;
+}
+
+static inline void local_lock_release(local_lock_t *l)
+{
+	DEBUG_LOCKS_WARN_ON(l->owner != current);
+	l->owner = NULL;
+	lock_map_release(&l->dep_map);
+}
+
+#else /* CONFIG_DEBUG_LOCK_ALLOC */
+static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_lock_release(local_lock_t *l) { }
+#endif /* !CONFIG_DEBUG_LOCK_ALLOC */
+
+#define __local_lock(lock) \
+	do { \
+		preempt_disable(); \
+		local_lock_acquire(this_cpu_ptr(lock)); \
+	} while (0)
+
+#define __local_lock_irq(lock) \
+	do { \
+		local_irq_disable(); \
+		local_lock_acquire(this_cpu_ptr(lock)); \
+	} while (0)
+
+#define __local_lock_irqsave(lock, flags) \
+	do { \
+		local_irq_save(flags); \
+		local_lock_acquire(this_cpu_ptr(lock)); \
+	} while (0)
+
+#define __local_unlock(lock) \
+	do { \
+		local_lock_release(this_cpu_ptr(lock)); \
+		preempt_enable(); \
+	} while (0)
+
+#define __local_unlock_irq(lock) \
+	do { \
+		local_lock_release(this_cpu_ptr(lock)); \
+		local_irq_enable(); \
+	} while (0)
+
+#define __local_unlock_irqrestore(lock, flags) \
+	do { \
+		local_lock_release(this_cpu_ptr(lock)); \
+		local_irq_restore(flags); \
+	} while (0)
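
On a non-PREEMPT_RT kernel with CONFIG_DEBUG_LOCK_ALLOC, __local_lock() disables preemption before resolving this_cpu_ptr(lock), and local_lock_acquire() warns if that CPU's instance already has an owner. The sketch below models only this owner bookkeeping in user space. Every model_* name is hypothetical, global integers play the role of the current CPU and the preempt count, a counter replaces DEBUG_LOCKS_WARN_ON(), and the lockdep map (lock_map_acquire()/lock_map_release()) is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Hypothetical stand-ins for current, the current CPU and preempt_count. */
struct model_task { const char *comm; };
static struct model_task *model_current;
static int model_cpu;
static int model_preempt_count;
static int model_warnings;	/* counts DEBUG_LOCKS_WARN_ON() hits */

/* Debug-build local_lock_t, reduced to its owner field. */
typedef struct {
	struct model_task *owner;
} model_local_lock_t;

/* One instance per CPU, like a DEFINE_PER_CPU() local_lock_t. */
static model_local_lock_t model_llock[MODEL_NR_CPUS];

static void model_warn_on(int cond)
{
	if (cond)
		model_warnings++;
}

static void model_lock_acquire(model_local_lock_t *l)
{
	model_warn_on(l->owner != NULL);		/* already owned */
	l->owner = model_current;
}

static void model_lock_release(model_local_lock_t *l)
{
	model_warn_on(l->owner != model_current);	/* not the owner */
	l->owner = NULL;
}

/* __local_lock(): disable preemption, then resolve this CPU's instance. */
static void model_local_lock(model_local_lock_t *percpu)
{
	model_preempt_count++;
	model_lock_acquire(&percpu[model_cpu]);
}

/* __local_unlock(): release this CPU's instance, then enable preemption. */
static void model_local_unlock(model_local_lock_t *percpu)
{
	model_lock_release(&percpu[model_cpu]);
	model_preempt_count--;
	assert(model_preempt_count >= 0);
}
```

Taking the same CPU's lock twice without releasing it increments model_warnings, which corresponds to the DEBUG_LOCKS_WARN_ON(l->owner) check in local_lock_acquire(). With debugging disabled, only the preemption toggle remains, which is why the generated code on mainline kernels does not change.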

+10 -1

include/linux/radix-tree.h

···
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/xarray.h>
+#include <linux/local_lock.h>

 /* Keep unconverted code working */
 #define radix_tree_root		xarray
 #define radix_tree_node		xa_node
+
+struct radix_tree_preload {
+	local_lock_t lock;
+	unsigned nr;
+	/* nodes->parent points to next preallocated node */
+	struct radix_tree_node *nodes;
+};
+DECLARE_PER_CPU(struct radix_tree_preload, radix_tree_preloads);

 /*
  * The bottom two bits of the slot determine how the remaining bits in the
···

 static inline void radix_tree_preload_end(void)
 {
-	preempt_enable();
+	local_unlock(&radix_tree_preloads.lock);
 }

 void __rcu **idr_get_free(struct radix_tree_root *root,

include/linux/swap.h

···
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
+extern void lru_add_drain_cpu_zone(struct zone *zone);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);

+1 -1

kernel/locking/lockdep.c

···
 	struct hlist_node	hash_entry;
 	u32			hash;
 	u32			nr_entries;
-	unsigned long		entries[0] __aligned(sizeof(unsigned long));
+	unsigned long		entries[] __aligned(sizeof(unsigned long));
 };
 #define LOCK_TRACE_SIZE_IN_LONGS \
 	(sizeof(struct lock_trace) / sizeof(unsigned long))
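
The lockdep.c change replaces the GNU zero-length array entries[0] with a C99 flexible array member, entries[]. In both forms sizeof() does not count the trailing array, so one allocation can hold the header plus a variable number of entries. A minimal user-space sketch of that allocation pattern follows, using a hypothetical struct trace_model that drops hash_entry and the __aligned() attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical struct shaped like lockdep's struct lock_trace, minus
 * hash_entry and __aligned(), ending in a C99 flexible array member
 * instead of the old zero-length entries[0].
 */
struct trace_model {
	uint32_t hash;
	uint32_t nr_entries;
	unsigned long entries[];	/* not counted by sizeof() */
};

/* One allocation holds the header plus @n trailing entries. */
static struct trace_model *trace_model_alloc(uint32_t n)
{
	struct trace_model *t;

	t = malloc(sizeof(*t) + n * sizeof(t->entries[0]));
	if (!t)
		return NULL;
	t->hash = 0;
	t->nr_entries = n;
	for (uint32_t i = 0; i < n; i++)
		t->entries[i] = 100 + i;
	return t;
}
```

Unlike entries[0], the flexible array form is standard C, and compilers can diagnose misuse such as placing it anywhere other than the last member.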

-2

kernel/locking/rtmutex.c

···
  * set up.
  */
 #ifndef CONFIG_DEBUG_RT_MUTEXES
-# define rt_mutex_cmpxchg_relaxed(l,c,n) (cmpxchg_relaxed(&l->owner, c, n) == c)
 # define rt_mutex_cmpxchg_acquire(l,c,n) (cmpxchg_acquire(&l->owner, c, n) == c)
 # define rt_mutex_cmpxchg_release(l,c,n) (cmpxchg_release(&l->owner, c, n) == c)

···
 }

 #else
-# define rt_mutex_cmpxchg_relaxed(l,c,n)	(0)
 # define rt_mutex_cmpxchg_acquire(l,c,n)	(0)
 # define rt_mutex_cmpxchg_release(l,c,n)	(0)


+9 -11

lib/radix-tree.c

···
 #include <linux/kernel.h>
 #include <linux/kmemleak.h>
 #include <linux/percpu.h>
+#include <linux/local_lock.h>
 #include <linux/preempt.h>		/* in_interrupt() */
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/xarray.h>
-

 /*
  * Radix tree node cache.
···
 /*
  * Per-cpu pool of preloaded nodes
  */
-struct radix_tree_preload {
-	unsigned nr;
-	/* nodes->parent points to next preallocated node */
-	struct radix_tree_node *nodes;
+DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = {
+	.lock = INIT_LOCAL_LOCK(lock),
 };
-static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
+EXPORT_PER_CPU_SYMBOL_GPL(radix_tree_preloads);

 static inline struct radix_tree_node *entry_to_node(void *ptr)
 {
···
 	 */
 	gfp_mask &= ~__GFP_ACCOUNT;

-	preempt_disable();
+	local_lock(&radix_tree_preloads.lock);
 	rtp = this_cpu_ptr(&radix_tree_preloads);
 	while (rtp->nr < nr) {
-		preempt_enable();
+		local_unlock(&radix_tree_preloads.lock);
 		node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
 		if (node == NULL)
 			goto out;
-		preempt_disable();
+		local_lock(&radix_tree_preloads.lock);
 		rtp = this_cpu_ptr(&radix_tree_preloads);
 		if (rtp->nr < nr) {
 			node->parent = rtp->nodes;
···
 	if (gfpflags_allow_blocking(gfp_mask))
 		return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
 	/* Preloading doesn't help anything with this gfp mask, skip it */
-	preempt_disable();
+	local_lock(&radix_tree_preloads.lock);
 	return 0;
 }
 EXPORT_SYMBOL(radix_tree_maybe_preload);
···
 void idr_preload(gfp_t gfp_mask)
 {
 	if (__radix_tree_preload(gfp_mask, IDR_PRELOAD_SIZE))
-		preempt_disable();
+		local_lock(&radix_tree_preloads.lock);
 }
 EXPORT_SYMBOL(idr_preload);

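
__radix_tree_preload() drops the local lock around kmem_cache_alloc(), since the allocation may sleep. After re-acquiring the lock it re-reads this_cpu_ptr(), because the task may have moved to another CPU in between. The sketch below is a hypothetical user-space model of that loop. The model_* names are illustrative, a global integer stands in for the current CPU, malloc()/free() stand in for kmem_cache_alloc()/kmem_cache_free(), -1 replaces -ENOMEM, and a forced migration on every allocation exaggerates the race so that the re-read is actually exercised.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 2

/* Hypothetical user-space model of struct radix_tree_preload. */
struct model_node {
	struct model_node *parent;	/* links the preallocated list */
};

struct model_preload {
	unsigned nr;
	struct model_node *nodes;
};

static struct model_preload model_preloads[MODEL_NR_CPUS];
static int model_cpu;		/* CPU the task currently runs on */
static int model_lock_depth;	/* stands in for radix_tree_preloads.lock */

/* Forces a "migration" while the lock is dropped, to exercise the re-read. */
static void model_migrate(void)
{
	model_cpu = (model_cpu + 1) % MODEL_NR_CPUS;
}

/*
 * Mirrors the loop in __radix_tree_preload(): returns 0 with the lock
 * held once the current CPU's pool has at least @nr nodes, or -1 with
 * the lock released if allocation fails.
 */
static int model_preload(unsigned nr)
{
	struct model_preload *rtp;
	struct model_node *node;

	model_lock_depth++;				/* local_lock() */
	rtp = &model_preloads[model_cpu];		/* this_cpu_ptr() */
	while (rtp->nr < nr) {
		model_lock_depth--;			/* local_unlock() */
		assert(model_lock_depth >= 0);
		node = malloc(sizeof(*node));		/* may sleep in the kernel */
		if (node == NULL)
			return -1;
		model_migrate();
		model_lock_depth++;			/* local_lock() again */
		rtp = &model_preloads[model_cpu];	/* re-read after relocking */
		if (rtp->nr < nr) {
			node->parent = rtp->nodes;
			rtp->nodes = node;
			rtp->nr++;
		} else {
			free(node);
		}
	}
	return 0;
}
```

On success the function returns with the lock still held, mirroring how radix_tree_preload() leaves radix_tree_preloads.lock held until radix_tree_preload_end() calls local_unlock().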

+1 -5

mm/compaction.c

···
 		 * would succeed.
 		 */
 		if (cc->order > 0 && last_migrated_pfn) {
-			int cpu;
 			unsigned long current_block_start =
 				block_start_pfn(cc->migrate_pfn, cc->order);

 			if (last_migrated_pfn < current_block_start) {
-				cpu = get_cpu();
-				lru_add_drain_cpu(cpu);
-				drain_local_pages(cc->zone);
-				put_cpu();
+				lru_add_drain_cpu_zone(cc->zone);
 				/* No more flushing until we migrate again */
 				last_migrated_pfn = 0;
 			}

+80 -38

mm/swap.c

··· 35 35 #include <linux/uio.h> 36 36 #include <linux/hugetlb.h> 37 37 #include <linux/page_idle.h> 38 38 + #include <linux/local_lock.h> 38 39 39 40 #include "internal.h" 40 41 ··· 45 44 /* How many pages do we try to swap or page in/out together? */ 46 45 int page_cluster; 47 46 48 48 - static DEFINE_PER_CPU(struct pagevec, lru_add_pvec); 49 49 - static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs); 50 50 - static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs); 51 51 - static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs); 52 52 - static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs); 47 47 + /* Protecting only lru_rotate.pvec which requires disabling interrupts */ 48 48 + struct lru_rotate { 49 49 + local_lock_t lock; 50 50 + struct pagevec pvec; 51 51 + }; 52 52 + static DEFINE_PER_CPU(struct lru_rotate, lru_rotate) = { 53 53 + .lock = INIT_LOCAL_LOCK(lock), 54 54 + }; 55 55 + 56 56 + /* 57 57 + * The following struct pagevec are grouped together because they are protected 58 58 + * by disabling preemption (and interrupts remain enabled). 
59 59 + */ 60 60 + struct lru_pvecs { 61 61 + local_lock_t lock; 62 62 + struct pagevec lru_add; 63 63 + struct pagevec lru_deactivate_file; 64 64 + struct pagevec lru_deactivate; 65 65 + struct pagevec lru_lazyfree; 53 66 #ifdef CONFIG_SMP 54 54 - static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs); 67 67 + struct pagevec activate_page; 55 68 #endif 69 69 + }; 70 70 + static DEFINE_PER_CPU(struct lru_pvecs, lru_pvecs) = { 71 71 + .lock = INIT_LOCAL_LOCK(lock), 72 72 + }; 56 73 57 74 /* 58 75 * This path almost never happens for VM activity - pages are normally ··· 273 254 unsigned long flags; 274 255 275 256 get_page(page); 276 276 - local_irq_save(flags); 277 277 - pvec = this_cpu_ptr(&lru_rotate_pvecs); 257 257 + local_lock_irqsave(&lru_rotate.lock, flags); 258 258 + pvec = this_cpu_ptr(&lru_rotate.pvec); 278 259 if (!pagevec_add(pvec, page) || PageCompound(page)) 279 260 pagevec_move_tail(pvec); 280 280 - local_irq_restore(flags); 261 261 + local_unlock_irqrestore(&lru_rotate.lock, flags); 281 262 } 282 263 } 283 264 ··· 312 293 #ifdef CONFIG_SMP 313 294 static void activate_page_drain(int cpu) 314 295 { 315 315 - struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu); 296 296 + struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu); 316 297 317 298 if (pagevec_count(pvec)) 318 299 pagevec_lru_move_fn(pvec, __activate_page, NULL); ··· 320 301 321 302 static bool need_activate_page_drain(int cpu) 322 303 { 323 323 - return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0; 304 304 + return pagevec_count(&per_cpu(lru_pvecs.activate_page, cpu)) != 0; 324 305 } 325 306 326 307 void activate_page(struct page *page) 327 308 { 328 309 page = compound_head(page); 329 310 if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { 330 330 - struct pagevec *pvec = &get_cpu_var(activate_page_pvecs); 311 311 + struct pagevec *pvec; 331 312 313 313 + local_lock(&lru_pvecs.lock); 314 314 + pvec = this_cpu_ptr(&lru_pvecs.activate_page); 332 315 
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, __activate_page, NULL);
-		put_cpu_var(activate_page_pvecs);
+		local_unlock(&lru_pvecs.lock);
 	}
 }

···

 static void __lru_cache_activate_page(struct page *page)
 {
-	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+	struct pagevec *pvec;
 	int i;
+
+	local_lock(&lru_pvecs.lock);
+	pvec = this_cpu_ptr(&lru_pvecs.lru_add);

 	/*
 	 * Search backwards on the optimistic assumption that the page being
···
 		}
 	}

-	put_cpu_var(lru_add_pvec);
+	local_unlock(&lru_pvecs.lock);
 }

 /*
···
 	} else if (!PageActive(page)) {
 		/*
 		 * If the page is on the LRU, queue it for activation via
-		 * activate_page_pvecs. Otherwise, assume the page is on a
+		 * lru_pvecs.activate_page. Otherwise, assume the page is on a
 		 * pagevec, mark it active and it'll be moved to the active
 		 * LRU on the next drain.
415 391 */ ··· 428 404 429 405 static void __lru_cache_add(struct page *page) 430 406 { 431 431 - struct pagevec *pvec = &get_cpu_var(lru_add_pvec); 407 407 + struct pagevec *pvec; 432 408 409 409 + local_lock(&lru_pvecs.lock); 410 410 + pvec = this_cpu_ptr(&lru_pvecs.lru_add); 433 411 get_page(page); 434 412 if (!pagevec_add(pvec, page) || PageCompound(page)) 435 413 __pagevec_lru_add(pvec); 436 436 - put_cpu_var(lru_add_pvec); 414 414 + local_unlock(&lru_pvecs.lock); 437 415 } 438 416 439 417 /** ··· 619 593 */ 620 594 void lru_add_drain_cpu(int cpu) 621 595 { 622 622 - struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu); 596 596 + struct pagevec *pvec = &per_cpu(lru_pvecs.lru_add, cpu); 623 597 624 598 if (pagevec_count(pvec)) 625 599 __pagevec_lru_add(pvec); 626 600 627 627 - pvec = &per_cpu(lru_rotate_pvecs, cpu); 601 601 + pvec = &per_cpu(lru_rotate.pvec, cpu); 628 602 if (pagevec_count(pvec)) { 629 603 unsigned long flags; 630 604 631 605 /* No harm done if a racing interrupt already did this */ 632 632 - local_irq_save(flags); 606 606 + local_lock_irqsave(&lru_rotate.lock, flags); 633 607 pagevec_move_tail(pvec); 634 634 - local_irq_restore(flags); 608 608 + local_unlock_irqrestore(&lru_rotate.lock, flags); 635 609 } 636 610 637 637 - pvec = &per_cpu(lru_deactivate_file_pvecs, cpu); 611 611 + pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu); 638 612 if (pagevec_count(pvec)) 639 613 pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); 640 614 641 641 - pvec = &per_cpu(lru_deactivate_pvecs, cpu); 615 615 + pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu); 642 616 if (pagevec_count(pvec)) 643 617 pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); 644 618 645 645 - pvec = &per_cpu(lru_lazyfree_pvecs, cpu); 619 619 + pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu); 646 620 if (pagevec_count(pvec)) 647 621 pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); 648 622 ··· 667 641 return; 668 642 669 643 if (likely(get_page_unless_zero(page))) { 670 670 - struct 
pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs); 644 644 + struct pagevec *pvec; 645 645 + 646 646 + local_lock(&lru_pvecs.lock); 647 647 + pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file); 671 648 672 649 if (!pagevec_add(pvec, page) || PageCompound(page)) 673 650 pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); 674 674 - put_cpu_var(lru_deactivate_file_pvecs); 651 651 + local_unlock(&lru_pvecs.lock); 675 652 } 676 653 } 677 654 ··· 689 660 void deactivate_page(struct page *page) 690 661 { 691 662 if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { 692 692 - struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs); 663 663 + struct pagevec *pvec; 693 664 665 665 + local_lock(&lru_pvecs.lock); 666 666 + pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate); 694 667 get_page(page); 695 668 if (!pagevec_add(pvec, page) || PageCompound(page)) 696 669 pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); 697 697 - put_cpu_var(lru_deactivate_pvecs); 670 670 + local_unlock(&lru_pvecs.lock); 698 671 } 699 672 } 700 673 ··· 711 680 { 712 681 if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && 713 682 !PageSwapCache(page) && !PageUnevictable(page)) { 714 714 - struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs); 683 683 + struct pagevec *pvec; 715 684 685 685 + local_lock(&lru_pvecs.lock); 686 686 + pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree); 716 687 get_page(page); 717 688 if (!pagevec_add(pvec, page) || PageCompound(page)) 718 689 pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); 719 719 - put_cpu_var(lru_lazyfree_pvecs); 690 690 + local_unlock(&lru_pvecs.lock); 720 691 } 721 692 } 722 693 723 694 void lru_add_drain(void) 724 695 { 725 725 - lru_add_drain_cpu(get_cpu()); 726 726 - put_cpu(); 696 696 + local_lock(&lru_pvecs.lock); 697 697 + lru_add_drain_cpu(smp_processor_id()); 698 698 + local_unlock(&lru_pvecs.lock); 699 699 + } 700 700 + 701 701 + void lru_add_drain_cpu_zone(struct zone *zone) 702 702 + { 703 703 + 
local_lock(&lru_pvecs.lock); 704 704 + lru_add_drain_cpu(smp_processor_id()); 705 705 + drain_local_pages(zone); 706 706 + local_unlock(&lru_pvecs.lock); 727 707 } 728 708 729 709 #ifdef CONFIG_SMP ··· 785 743 for_each_online_cpu(cpu) { 786 744 struct work_struct *work = &per_cpu(lru_add_drain_work, cpu); 787 745 788 788 - if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) || 789 789 - pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) || 790 790 - pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) || 791 791 - pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) || 792 792 - pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) || 746 746 + if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) || 747 747 + pagevec_count(&per_cpu(lru_rotate.pvec, cpu)) || 748 748 + pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) || 749 749 + pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) || 750 750 + pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) || 793 751 need_activate_page_drain(cpu)) { 794 752 INIT_WORK(work, lru_add_drain_per_cpu); 795 753 queue_work_on(cpu, mm_percpu_wq, work);
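The recurring change in this mm/swap.c diff is that each per-CPU pagevec is now reached through a structure that embeds the `local_lock_t` guarding it. Every `get_cpu_var()`/`put_cpu_var()` pair becomes `local_lock()`/`local_unlock()`, and every `local_irq_save()`/`local_irq_restore()` pair becomes `local_lock_irqsave()`/`local_unlock_irqrestore()` on that named lock. On mainline (non-RT) kernels these still reduce to disabling preemption or interrupts, so the gain is an explicit, checkable link between lock and data. The userspace toy model below sketches that link under stated assumptions. It is not kernel code: the simulated CPU array and every `toy_` name are hypothetical, and an `assert()` stands in for the debug check that flags a second acquisition of the same CPU's lock.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS   4   /* hypothetical CPU count for the model */
#define TOY_PVEC_SIZE 15  /* assumption: PAGEVEC_SIZE of this kernel era */

/* Stand-in for local_lock_t: only records whether this CPU's instance is held. */
struct toy_local_lock {
	bool held;
};

struct toy_pagevec {
	unsigned int nr;
	void *pages[TOY_PVEC_SIZE];
};

/* Mirrors the shape of struct lru_pvecs: one lock next to the data it covers. */
struct toy_lru_pvecs {
	struct toy_local_lock lock;
	struct toy_pagevec lru_add;
	struct toy_pagevec lru_deactivate;
};

/* One instance per simulated CPU, playing the role of DEFINE_PER_CPU(). */
static struct toy_lru_pvecs toy_lru_pvecs[TOY_NR_CPUS];

/*
 * Hypothetical stand-in for local_lock() followed by this_cpu_ptr().
 * In the kernel, local_lock() disables preemption, so the task cannot
 * migrate while it uses the pointer; here the caller names the CPU.
 */
static struct toy_lru_pvecs *toy_local_lock(int cpu)
{
	struct toy_lru_pvecs *p;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	p = &toy_lru_pvecs[cpu];
	assert(!p->lock.held);  /* debug check: no recursive acquisition */
	p->lock.held = true;
	return p;
}

static void toy_local_unlock(struct toy_lru_pvecs *p)
{
	assert(p->lock.held);   /* releasing a lock we do not hold is a bug */
	p->lock.held = false;
}
```

The model passes the CPU explicitly only because userspace has no per-CPU accessor. In the kernel code above, the `this_cpu_ptr()` that follows `local_lock()` keeps pointing at the current CPU's instance because preemption stays disabled until `local_unlock()`.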
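Each locked section in the diff wraps the same batching idiom, `if (!pagevec_add(pvec, page) || PageCompound(page))` followed by a flush. `pagevec_add()` stores the page and returns the number of free slots left, so a return value of 0 means the vector has just filled and must be drained. A compound page forces an immediate flush as well. Below is a minimal userspace sketch of that idiom. `toy_pagevec`, `toy_pagevec_add()`, `toy_flush()` and `toy_add_page()` are hypothetical names, the flush only counts pages instead of moving them between LRU lists, and the compound-page case is modeled by a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PVEC_SIZE 15  /* assumption: PAGEVEC_SIZE of this kernel era */

struct toy_pagevec {
	unsigned int nr;
	const void *pages[TOY_PVEC_SIZE];
};

/* Total pages handed to flushes; stands in for the LRU move work. */
static unsigned int toy_flushed;

/* Like pagevec_add(): store the page, return the slots still free. */
static unsigned int toy_pagevec_add(struct toy_pagevec *pvec, const void *page)
{
	assert(pvec->nr < TOY_PVEC_SIZE);  /* callers flush before overflow */
	pvec->pages[pvec->nr++] = page;
	return TOY_PVEC_SIZE - pvec->nr;
}

/* Stand-in for pagevec_lru_move_fn()/__pagevec_lru_add(): empty the batch. */
static void toy_flush(struct toy_pagevec *pvec)
{
	toy_flushed += pvec->nr;
	pvec->nr = 0;
}

/* The call-site idiom: batch the page, flush when full or when compound. */
static void toy_add_page(struct toy_pagevec *pvec, const void *page, bool compound)
{
	if (!toy_pagevec_add(pvec, page) || compound)
		toy_flush(pvec);
}
```

With this model, fourteen ordinary pages stay batched, the fifteenth triggers a flush of all fifteen, and a compound page is flushed immediately. Batching amortizes the cost of taking the LRU lock, which is why the per-CPU vectors need protection at all.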
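`lru_add_drain_all()` queues drain work only on CPUs that actually have pages batched. It tests every pagevec, now reached as a `lru_pvecs` member or as `lru_rotate.pvec`, with `pagevec_count()`, and it also calls `need_activate_page_drain()`. The sketch below models just that selection step in userspace as a bitmask. The count structure, `TOY_NR_CPUS`, `toy_needs_drain()` and `toy_cpus_needing_drain()` are hypothetical. The real code calls `INIT_WORK()` and `queue_work_on()` for each selected CPU instead of returning a mask.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4  /* hypothetical number of online CPUs */

/* Per-CPU batch sizes, one field per pagevec tested in the diff. */
struct toy_cpu_counts {
	unsigned int lru_add;
	unsigned int lru_rotate;
	unsigned int lru_deactivate_file;
	unsigned int lru_deactivate;
	unsigned int lru_lazyfree;
	unsigned int activate_page;
};

/* Same condition as the if () in lru_add_drain_all(): any non-empty vector. */
static bool toy_needs_drain(const struct toy_cpu_counts *c)
{
	return c->lru_add || c->lru_rotate || c->lru_deactivate_file ||
	       c->lru_deactivate || c->lru_lazyfree || c->activate_page;
}

/* Returns a mask with bit N set when CPU N would get drain work queued. */
static unsigned int toy_cpus_needing_drain(const struct toy_cpu_counts counts[TOY_NR_CPUS])
{
	unsigned int mask = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_needs_drain(&counts[cpu]))
			mask |= 1u << cpu;
	return mask;
}
```

For example, if only CPU 1 has a rotate batch and CPU 3 has a pending activation, the mask is `0xA`, so CPUs 0 and 2 are left alone. Skipping idle CPUs avoids waking them only to find nothing to drain.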