Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

tools/memory-model: Add access-marking documentation

This commit adapts the "Concurrency bugs should fear the big bad data-race
detector (part 2)" LWN article (https://lwn.net/Articles/816854/)
to kernel-documentation form. This allows more easily updating the
material as needed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
[ paulmck: Apply Marco Elver feedback. ]
[ paulmck: Update per Akira Yokosawa feedback. ]
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

+479
+479
tools/memory-model/Documentation/access-marking.txt
··· 1 + MARKING SHARED-MEMORY ACCESSES 2 + ============================== 3 + 4 + This document provides guidelines for marking intentionally concurrent 5 + normal accesses to shared memory, that is "normal" as in accesses that do 6 + not use read-modify-write atomic operations. It also describes how to 7 + document these accesses, both with comments and with special assertions 8 + processed by the Kernel Concurrency Sanitizer (KCSAN). This discussion 9 + builds on an earlier LWN article [1]. 10 + 11 + 12 + ACCESS-MARKING OPTIONS 13 + ====================== 14 + 15 + The Linux kernel provides the following access-marking options: 16 + 17 + 1. Plain C-language accesses (unmarked), for example, "a = b;" 18 + 19 + 2. Data-race marking, for example, "data_race(a = b);" 20 + 21 + 3. READ_ONCE(), for example, "a = READ_ONCE(b);" 22 + The various forms of atomic_read() also fit in here. 23 + 24 + 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);" 25 + The various forms of atomic_set() also fit in here. 26 + 27 + 28 + These may be used in combination, as shown in this admittedly improbable 29 + example: 30 + 31 + WRITE_ONCE(a, b + data_race(c + d) + READ_ONCE(e)); 32 + 33 + Neither plain C-language accesses nor data_race() (#1 and #2 above) place 34 + any sort of constraint on the compiler's choice of optimizations [2]. 35 + In contrast, READ_ONCE() and WRITE_ONCE() (#3 and #4 above) restrict the 36 + compiler's use of code-motion and common-subexpression optimizations. 37 + Therefore, if a given access is involved in an intentional data race, 38 + using READ_ONCE() for loads and WRITE_ONCE() for stores is usually 39 + preferable to data_race(), which in turn is usually preferable to plain 40 + C-language accesses. 41 + 42 + KCSAN will complain about many types of data races involving plain 43 + C-language accesses, but marking all accesses involved in a given data 44 + race with one of data_race(), READ_ONCE(), or WRITE_ONCE(), will prevent 45 + KCSAN from complaining. Of course, lack of KCSAN complaints does not 46 + imply correct code. Therefore, please take a thoughtful approach 47 + when responding to KCSAN complaints. Churning the code base with 48 + ill-considered additions of data_race(), READ_ONCE(), and WRITE_ONCE() 49 + is unhelpful. 50 + 51 + In fact, the following sections describe situations where use of 52 + data_race() and even plain C-language accesses is preferable to 53 + READ_ONCE() and WRITE_ONCE(). 54 + 55 + 56 + Use of the data_race() Macro 57 + ---------------------------- 58 + 59 + Here are some situations where data_race() should be used instead of 60 + READ_ONCE() and WRITE_ONCE(): 61 + 62 + 1. Data-racy loads from shared variables whose values are used only 63 + for diagnostic purposes. 64 + 65 + 2. Data-racy reads whose values are checked against marked reload. 66 + 67 + 3. Reads whose values feed into error-tolerant heuristics. 68 + 69 + 4. Writes setting values that feed into error-tolerant heuristics. 70 + 71 + 72 + Data-Racy Reads for Approximate Diagnostics 73 + 74 + Approximate diagnostics include lockdep reports, monitoring/statistics 75 + (including /proc and /sys output), WARN*()/BUG*() checks whose return 76 + values are ignored, and other situations where reads from shared variables 77 + are not an integral part of the core concurrency design. 78 + 79 + In fact, use of data_race() instead READ_ONCE() for these diagnostic 80 + reads can enable better checking of the remaining accesses implementing 81 + the core concurrency design. For example, suppose that the core design 82 + prevents any non-diagnostic reads from shared variable x from running 83 + concurrently with updates to x. Then using plain C-language writes 84 + to x allows KCSAN to detect reads from x from within regions of code 85 + that fail to exclude the updates. In this case, it is important to use 86 + data_race() for the diagnostic reads because otherwise KCSAN would give 87 + false-positive warnings about these diagnostic reads. 88 + 89 + In theory, plain C-language loads can also be used for this use case. 90 + However, in practice this will have the disadvantage of causing KCSAN 91 + to generate false positives because KCSAN will have no way of knowing 92 + that the resulting data race was intentional. 93 + 94 + 95 + Data-Racy Reads That Are Checked Against Marked Reload 96 + 97 + The values from some reads are not implicitly trusted. They are instead 98 + fed into some operation that checks the full value against a later marked 99 + load from memory, which means that the occasional arbitrarily bogus value 100 + is not a problem. For example, if a bogus value is fed into cmpxchg(), 101 + all that happens is that this cmpxchg() fails, which normally results 102 + in a retry. Unless the race condition that resulted in the bogus value 103 + recurs, this retry will with high probability succeed, so no harm done. 104 + 105 + However, please keep in mind that a data_race() load feeding into 106 + a cmpxchg_relaxed() might still be subject to load fusing on some 107 + architectures. Therefore, it is best to capture the return value from 108 + the failing cmpxchg() for the next iteration of the loop, an approach 109 + that provides the compiler much less scope for mischievous optimizations. 110 + Capturing the return value from cmpxchg() also saves a memory reference 111 + in many cases. 112 + 113 + In theory, plain C-language loads can also be used for this use case. 114 + However, in practice this will have the disadvantage of causing KCSAN 115 + to generate false positives because KCSAN will have no way of knowing 116 + that the resulting data race was intentional. 117 + 118 + 119 + Reads Feeding Into Error-Tolerant Heuristics 120 + 121 + Values from some reads feed into heuristics that can tolerate occasional 122 + errors. Such reads can use data_race(), thus allowing KCSAN to focus on 123 + the other accesses to the relevant shared variables. But please note 124 + that data_race() loads are subject to load fusing, which can result in 125 + consistent errors, which in turn are quite capable of breaking heuristics. 126 + Therefore use of data_race() should be limited to cases where some other 127 + code (such as a barrier() call) will force the occasional reload. 128 + 129 + In theory, plain C-language loads can also be used for this use case. 130 + However, in practice this will have the disadvantage of causing KCSAN 131 + to generate false positives because KCSAN will have no way of knowing 132 + that the resulting data race was intentional. 133 + 134 + 135 + Writes Setting Values Feeding Into Error-Tolerant Heuristics 136 + 137 + The values read into error-tolerant heuristics come from somewhere, 138 + for example, from sysfs. This means that some code in sysfs writes 139 + to this same variable, and these writes can also use data_race(). 140 + After all, if the heuristic can tolerate the occasional bogus value 141 + due to compiler-mangled reads, it can also tolerate the occasional 142 + compiler-mangled write, at least assuming that the proper value is in 143 + place once the write completes. 144 + 145 + Plain C-language stores can also be used for this use case. However, 146 + in kernels built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, this 147 + will have the disadvantage of causing KCSAN to generate false positives 148 + because KCSAN will have no way of knowing that the resulting data race 149 + was intentional. 150 + 151 + 152 + Use of Plain C-Language Accesses 153 + -------------------------------- 154 + 155 + Here are some example situations where plain C-language accesses should 156 + used instead of READ_ONCE(), WRITE_ONCE(), and data_race(): 157 + 158 + 1. Accesses protected by mutual exclusion, including strict locking 159 + and sequence locking. 160 + 161 + 2. Initialization-time and cleanup-time accesses. This covers a 162 + wide variety of situations, including the uniprocessor phase of 163 + system boot, variables to be used by not-yet-spawned kthreads, 164 + structures not yet published to reference-counted or RCU-protected 165 + data structures, and the cleanup side of any of these situations. 166 + 167 + 3. Per-CPU variables that are not accessed from other CPUs. 168 + 169 + 4. Private per-task variables, including on-stack variables, some 170 + fields in the task_struct structure, and task-private heap data. 171 + 172 + 5. Any other loads for which there is not supposed to be a concurrent 173 + store to that same variable. 174 + 175 + 6. Any other stores for which there should be neither concurrent 176 + loads nor concurrent stores to that same variable. 177 + 178 + But note that KCSAN makes two explicit exceptions to this rule 179 + by default, refraining from flagging plain C-language stores: 180 + 181 + a. No matter what. You can override this default by building 182 + with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n. 183 + 184 + b. When the store writes the value already contained in 185 + that variable. You can override this default by building 186 + with CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. 187 + 188 + c. When one of the stores is in an interrupt handler and 189 + the other in the interrupted code. You can override this 190 + default by building with CONFIG_KCSAN_INTERRUPT_WATCHER=y. 191 + 192 + Note that it is important to use plain C-language accesses in these cases, 193 + because doing otherwise prevents KCSAN from detecting violations of your 194 + code's synchronization rules. 195 + 196 + 197 + ACCESS-DOCUMENTATION OPTIONS 198 + ============================ 199 + 200 + It is important to comment marked accesses so that people reading your 201 + code, yourself included, are reminded of the synchronization design. 202 + However, it is even more important to comment plain C-language accesses 203 + that are intentionally involved in data races. Such comments are 204 + needed to remind people reading your code, again, yourself included, 205 + of how the compiler has been prevented from optimizing those accesses 206 + into concurrency bugs. 207 + 208 + It is also possible to tell KCSAN about your synchronization design. 209 + For example, ASSERT_EXCLUSIVE_ACCESS(foo) tells KCSAN that any 210 + concurrent access to variable foo by any other CPU is an error, even 211 + if that concurrent access is marked with READ_ONCE(). In addition, 212 + ASSERT_EXCLUSIVE_WRITER(foo) tells KCSAN that although it is OK for there 213 + to be concurrent reads from foo from other CPUs, it is an error for some 214 + other CPU to be concurrently writing to foo, even if that concurrent 215 + write is marked with data_race() or WRITE_ONCE(). 216 + 217 + Note that although KCSAN will call out data races involving either 218 + ASSERT_EXCLUSIVE_ACCESS() or ASSERT_EXCLUSIVE_WRITER() on the one hand 219 + and data_race() writes on the other, KCSAN will not report the location 220 + of these data_race() writes. 221 + 222 + 223 + EXAMPLES 224 + ======== 225 + 226 + As noted earlier, the goal is to prevent the compiler from destroying 227 + your concurrent algorithm, to help the human reader, and to inform 228 + KCSAN of aspects of your concurrency design. This section looks at a 229 + few examples showing how this can be done. 230 + 231 + 232 + Lock Protection With Lockless Diagnostic Access 233 + ----------------------------------------------- 234 + 235 + For example, suppose a shared variable "foo" is read only while a 236 + reader-writer spinlock is read-held, written only while that same 237 + spinlock is write-held, except that it is also read locklessly for 238 + diagnostic purposes. The code might look as follows: 239 + 240 + int foo; 241 + DEFINE_RWLOCK(foo_rwlock); 242 + 243 + void update_foo(int newval) 244 + { 245 + write_lock(&foo_rwlock); 246 + foo = newval; 247 + do_something(newval); 248 + write_unlock(&foo_rwlock); 249 + } 250 + 251 + int read_foo(void) 252 + { 253 + int ret; 254 + 255 + read_lock(&foo_rwlock); 256 + do_something_else(); 257 + ret = foo; 258 + read_unlock(&foo_rwlock); 259 + return ret; 260 + } 261 + 262 + int read_foo_diagnostic(void) 263 + { 264 + return data_race(foo); 265 + } 266 + 267 + The reader-writer lock prevents the compiler from introducing concurrency 268 + bugs into any part of the main algorithm using foo, which means that 269 + the accesses to foo within both update_foo() and read_foo() can (and 270 + should) be plain C-language accesses. One benefit of making them be 271 + plain C-language accesses is that KCSAN can detect any erroneous lockless 272 + reads from or updates to foo. The data_race() in read_foo_diagnostic() 273 + tells KCSAN that data races are expected, and should be silently 274 + ignored. This data_race() also tells the human reading the code that 275 + read_foo_diagnostic() might sometimes return a bogus value. 276 + 277 + However, please note that your kernel must be built with 278 + CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n in order for KCSAN to 279 + detect a buggy lockless write. If you need KCSAN to detect such a 280 + write even if that write did not change the value of foo, you also 281 + need CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. If you need KCSAN to 282 + detect such a write happening in an interrupt handler running on the 283 + same CPU doing the legitimate lock-protected write, you also need 284 + CONFIG_KCSAN_INTERRUPT_WATCHER=y. With some or all of these Kconfig 285 + options set properly, KCSAN can be quite helpful, although it is not 286 + necessarily a full replacement for hardware watchpoints. On the other 287 + hand, neither are hardware watchpoints a full replacement for KCSAN 288 + because it is not always easy to tell hardware watchpoint to conditionally 289 + trap on accesses. 290 + 291 + 292 + Lock-Protected Writes With Lockless Reads 293 + ----------------------------------------- 294 + 295 + For another example, suppose a shared variable "foo" is updated only 296 + while holding a spinlock, but is read locklessly. The code might look 297 + as follows: 298 + 299 + int foo; 300 + DEFINE_SPINLOCK(foo_lock); 301 + 302 + void update_foo(int newval) 303 + { 304 + spin_lock(&foo_lock); 305 + WRITE_ONCE(foo, newval); 306 + ASSERT_EXCLUSIVE_WRITER(foo); 307 + do_something(newval); 308 + spin_unlock(&foo_wlock); 309 + } 310 + 311 + int read_foo(void) 312 + { 313 + do_something_else(); 314 + return READ_ONCE(foo); 315 + } 316 + 317 + Because foo is read locklessly, all accesses are marked. The purpose 318 + of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy 319 + concurrent lockless write. 320 + 321 + 322 + Lockless Reads and Writes 323 + ------------------------- 324 + 325 + For another example, suppose a shared variable "foo" is both read and 326 + updated locklessly. The code might look as follows: 327 + 328 + int foo; 329 + 330 + int update_foo(int newval) 331 + { 332 + int ret; 333 + 334 + ret = xchg(&foo, newval); 335 + do_something(newval); 336 + return ret; 337 + } 338 + 339 + int read_foo(void) 340 + { 341 + do_something_else(); 342 + return READ_ONCE(foo); 343 + } 344 + 345 + Because foo is accessed locklessly, all accesses are marked. It does 346 + not make sense to use ASSERT_EXCLUSIVE_WRITER() in this case because 347 + there really can be concurrent lockless writers. KCSAN would 348 + flag any concurrent plain C-language reads from foo, and given 349 + CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, also any concurrent plain 350 + C-language writes to foo. 351 + 352 + 353 + Lockless Reads and Writes, But With Single-Threaded Initialization 354 + ------------------------------------------------------------------ 355 + 356 + For yet another example, suppose that foo is initialized in a 357 + single-threaded manner, but that a number of kthreads are then created 358 + that locklessly and concurrently access foo. Some snippets of this code 359 + might look as follows: 360 + 361 + int foo; 362 + 363 + void initialize_foo(int initval, int nkthreads) 364 + { 365 + int i; 366 + 367 + foo = initval; 368 + ASSERT_EXCLUSIVE_ACCESS(foo); 369 + for (i = 0; i < nkthreads; i++) 370 + kthread_run(access_foo_concurrently, ...); 371 + } 372 + 373 + /* Called from access_foo_concurrently(). */ 374 + int update_foo(int newval) 375 + { 376 + int ret; 377 + 378 + ret = xchg(&foo, newval); 379 + do_something(newval); 380 + return ret; 381 + } 382 + 383 + /* Also called from access_foo_concurrently(). */ 384 + int read_foo(void) 385 + { 386 + do_something_else(); 387 + return READ_ONCE(foo); 388 + } 389 + 390 + The initialize_foo() uses a plain C-language write to foo because there 391 + are not supposed to be concurrent accesses during initialization. The 392 + ASSERT_EXCLUSIVE_ACCESS() allows KCSAN to flag buggy concurrent unmarked 393 + reads, and the ASSERT_EXCLUSIVE_ACCESS() call further allows KCSAN to 394 + flag buggy concurrent writes, even if: (1) Those writes are marked or 395 + (2) The kernel was built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y. 396 + 397 + 398 + Checking Stress-Test Race Coverage 399 + ---------------------------------- 400 + 401 + When designing stress tests it is important to ensure that race conditions 402 + of interest really do occur. For example, consider the following code 403 + fragment: 404 + 405 + int foo; 406 + 407 + int update_foo(int newval) 408 + { 409 + return xchg(&foo, newval); 410 + } 411 + 412 + int xor_shift_foo(int shift, int mask) 413 + { 414 + int old, new, newold; 415 + 416 + newold = data_race(foo); /* Checked by cmpxchg(). */ 417 + do { 418 + old = newold; 419 + new = (old << shift) ^ mask; 420 + newold = cmpxchg(&foo, old, new); 421 + } while (newold != old); 422 + return old; 423 + } 424 + 425 + int read_foo(void) 426 + { 427 + return READ_ONCE(foo); 428 + } 429 + 430 + If it is possible for update_foo(), xor_shift_foo(), and read_foo() to be 431 + invoked concurrently, the stress test should force this concurrency to 432 + actually happen. KCSAN can evaluate the stress test when the above code 433 + is modified to read as follows: 434 + 435 + int foo; 436 + 437 + int update_foo(int newval) 438 + { 439 + ASSERT_EXCLUSIVE_ACCESS(foo); 440 + return xchg(&foo, newval); 441 + } 442 + 443 + int xor_shift_foo(int shift, int mask) 444 + { 445 + int old, new, newold; 446 + 447 + newold = data_race(foo); /* Checked by cmpxchg(). */ 448 + do { 449 + old = newold; 450 + new = (old << shift) ^ mask; 451 + ASSERT_EXCLUSIVE_ACCESS(foo); 452 + newold = cmpxchg(&foo, old, new); 453 + } while (newold != old); 454 + return old; 455 + } 456 + 457 + 458 + int read_foo(void) 459 + { 460 + ASSERT_EXCLUSIVE_ACCESS(foo); 461 + return READ_ONCE(foo); 462 + } 463 + 464 + If a given stress-test run does not result in KCSAN complaints from 465 + each possible pair of ASSERT_EXCLUSIVE_ACCESS() invocations, the 466 + stress test needs improvement. If the stress test was to be evaluated 467 + on a regular basis, it would be wise to place the above instances of 468 + ASSERT_EXCLUSIVE_ACCESS() under #ifdef so that they did not result in 469 + false positives when not evaluating the stress test. 470 + 471 + 472 + REFERENCES 473 + ========== 474 + 475 + [1] "Concurrency bugs should fear the big bad data-race detector (part 2)" 476 + https://lwn.net/Articles/816854/ 477 + 478 + [2] "Who's afraid of a big bad optimizing compiler?" 479 + https://lwn.net/Articles/793253/