Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: locking: Describe seqlock design and usage

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

- Divide all documentation on a seqcount_t vs. seqlock_t basis. The
description for both mechanisms was intermingled, which is incorrect
since the usage constraints for each type are vastly different.

- Add an introductory paragraph describing the internal design of, and
rationale for, sequence counters.

- Document seqcount_t writer non-preemptibility requirement, which was
not previously documented anywhere, and provide a clear rationale.

- Provide template code for seqcount_t and seqlock_t initialization
and reader/writer critical sections.

- Recommend using seqlock_t by default. It implicitly handles the
serialization and non-preemptibility requirements of writers.

At seqlock.h:

- Remove references to brlocks as they've long been removed from the
kernel.

- Remove references to gcc-3.x since the kernel's minimum supported
gcc version is 4.9.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: 6ec4476ac825 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-2-a.darwish@linutronix.de

Authored by Ahmed S. Darwish, committed by Peter Zijlstra (0d24f65e, f05d6717). +209 -43 overall.

Documentation/locking/index.rst (+1)
@@ -14,6 +14,7 @@
    mutex-design
    rt-mutex-design
    rt-mutex
+   seqlock
    spinlocks
    ww-mutex-design
    preempt-locking
Documentation/locking/seqlock.rst (new file, +170)
======================================
Sequence counters and sequential locks
======================================

Introduction
============

Sequence counters are a reader-writer consistency mechanism with
lockless readers (read-only retry loops), and no writer starvation. They
are used for data that's rarely written to (e.g. system time), where the
reader wants a consistent set of information and is willing to retry if
that information changes.

A data set is consistent when the sequence count at the beginning of the
read side critical section is even and the same sequence count value is
read again at the end of the critical section. The data in the set must
be copied out inside the read side critical section. If the sequence
count has changed between the start and the end of the critical section,
the reader must retry.

Writers increment the sequence count at the start and the end of their
critical section. After starting the critical section the sequence count
is odd and indicates to the readers that an update is in progress. At
the end of the write side critical section the sequence count becomes
even again which lets readers make progress.

A sequence counter write side critical section must never be preempted
or interrupted by read side sections. Otherwise the reader will spin for
the entire scheduler tick due to the odd sequence count value and the
interrupted writer. If that reader belongs to a real-time scheduling
class, it can spin forever and the kernel will livelock.

This mechanism cannot be used if the protected data contains pointers,
as the writer can invalidate a pointer that the reader is following.


.. _seqcount_t:

Sequence counters (``seqcount_t``)
==================================

This is the raw counting mechanism, which does not protect against
multiple writers. Write side critical sections must thus be serialized
by an external lock.

If the write serialization primitive is not implicitly disabling
preemption, preemption must be explicitly disabled before entering the
write side section. If the read section can be invoked from hardirq or
softirq contexts, interrupts or bottom halves must also be respectively
disabled before entering the write section.

If it's desired to automatically handle the sequence counter
requirements of writer serialization and non-preemptibility, use
:ref:`seqlock_t` instead.

Initialization::

	/* dynamic */
	seqcount_t foo_seqcount;
	seqcount_init(&foo_seqcount);

	/* static */
	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

	/* C99 struct init */
	struct {
		.seq = SEQCNT_ZERO(foo.seq),
	} foo;

Write path::

	/* Serialized context with disabled preemption */

	write_seqcount_begin(&foo_seqcount);

	/* ... [[write-side critical section]] ... */

	write_seqcount_end(&foo_seqcount);

Read path::

	do {
		seq = read_seqcount_begin(&foo_seqcount);

		/* ... [[read-side critical section]] ... */

	} while (read_seqcount_retry(&foo_seqcount, seq));


.. _seqlock_t:

Sequential locks (``seqlock_t``)
================================

This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
embedded spinlock for writer serialization and non-preemptibility.

If the read side section can be invoked from hardirq or softirq context,
use the write side function variants which disable interrupts or bottom
halves respectively.

Initialization::

	/* dynamic */
	seqlock_t foo_seqlock;
	seqlock_init(&foo_seqlock);

	/* static */
	static DEFINE_SEQLOCK(foo_seqlock);

	/* C99 struct init */
	struct {
		.seql = __SEQLOCK_UNLOCKED(foo.seql)
	} foo;

Write path::

	write_seqlock(&foo_seqlock);

	/* ... [[write-side critical section]] ... */

	write_sequnlock(&foo_seqlock);

Read path, three categories:

1. Normal sequence readers, which never block a writer but must retry
   if a writer is in progress, by detecting change in the sequence
   number. Writers do not wait for a sequence reader::

	do {
		seq = read_seqbegin(&foo_seqlock);

		/* ... [[read-side critical section]] ... */

	} while (read_seqretry(&foo_seqlock, seq));

2. Locking readers, which will wait if a writer or another locking
   reader is in progress. A locking reader in progress will also block
   a writer from entering its critical section. This read lock is
   exclusive. Unlike rwlock_t, only one locking reader can acquire it::

	read_seqlock_excl(&foo_seqlock);

	/* ... [[read-side critical section]] ... */

	read_sequnlock_excl(&foo_seqlock);

3. Conditional lockless reader (as in 1), or locking reader (as in 2),
   according to a passed marker. This is used to avoid lockless readers
   starvation (too many retry loops) in case of a sharp spike in write
   activity. First, a lockless read is tried (even marker passed). If
   that trial fails (odd sequence counter is returned, which is used as
   the next iteration marker), the lockless read is transformed to a
   full locking read and no retry loop is necessary::

	/* marker; even initialization */
	int seq = 0;
	do {
		read_seqbegin_or_lock(&foo_seqlock, &seq);

		/* ... [[read-side critical section]] ... */

	} while (need_seqretry(&foo_seqlock, seq));
	done_seqretry(&foo_seqlock, seq);


API documentation
=================

.. kernel-doc:: include/linux/seqlock.h
include/linux/seqlock.h (+38 -43)
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
+ * See Documentation/locking/seqlock.rst
  *
- * Expected non-blocking reader usage:
- *	do {
- *	    seq = read_seqbegin(&foo);
- *	...
- *	} while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
···
 #include <asm/processor.h>
 
 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
···
  * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
  * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
  * atomics; if there is a matching read_seqcount_retry() call, no following
- * memory operations are considered atomic. Usage of seqlocks via seqlock_t
- * interface is not affected.
+ * memory operations are considered atomic. Usage of the seqlock_t interface
+ * is not affected.
  */
 #define KCSAN_SEQLOCK_REGION_MAX 1000
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
···
 	smp_wmb(); /* increment "sequence" before following stores */
 }
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
···
 	kcsan_nestable_atomic_end();
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *    - Comments on top of seqcount_t
+ *    - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems. We think these are
- * OK now. Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\