
Documentation/memory-barriers.txt: various fixes

Fix various grammatical issues in Documentation/memory-barriers.txt.

Cc: "Robert P. J. Day" <rpjday@mindspring.com>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Jarek Poplawski and committed by Linus Torvalds
81fc6323 03491c92

+49 -49
Documentation/memory-barriers.txt
@@ -24,7 +24,7 @@
  (*) Explicit kernel barriers.
 
      - Compiler barrier.
-     - The CPU memory barriers.
+     - CPU memory barriers.
      - MMIO write barrier.
 
  (*) Implicit kernel memory barriers.
@@ -265,7 +265,7 @@
 ordering over the memory operations on either side of the barrier.
 
 Such enforcement is important because the CPUs and other devices in a system
-can use a variety of tricks to improve performance - including reordering,
+can use a variety of tricks to improve performance, including reordering,
 deferral and combination of memory operations; speculative loads; speculative
 branch prediction and various types of caching. Memory barriers are used to
 override or suppress these tricks, allowing the code to sanely control the
@@ -457,7 +457,7 @@
         (Q == &A) implies (D == 1)
         (Q == &B) implies (D == 4)
 
-But! CPU 2's perception of P may be updated _before_ its perception of B, thus
+But!  CPU 2's perception of P may be updated _before_ its perception of B, thus
 leading to the following situation:
 
         (Q == &B) and (D == 2) ????
@@ -573,7 +573,7 @@
 the "weaker" type.
 
 [!] Note that the stores before the write barrier would normally be expected to
-match the loads after the read barrier or data dependency barrier, and vice
+match the loads after the read barrier or the data dependency barrier, and vice
 versa:
 
         CPU 1                           CPU 2
@@ -588,7 +588,7 @@
 EXAMPLES OF MEMORY BARRIER SEQUENCES
 ------------------------------------
 
-Firstly, write barriers act as a partial orderings on store operations.
+Firstly, write barriers act as partial orderings on store operations.
 Consider the following sequence of events:
 
         CPU 1
@@ -608,15 +608,15 @@
         +-------+       :      :
         |       |       +------+
         |       |------>| C=3  |     }     /\
-        |       |  :    +------+     }-----  \  -----> Events perceptible
-        |       |  :    | A=1  |     }        \/       to rest of system
+        |       |  :    +------+     }-----  \  -----> Events perceptible to
+        |       |  :    | A=1  |     }        \/       the rest of the system
         |       |  :    +------+     }
         | CPU 1 |  :    | B=2  |     }
         |       |       +------+     }
         |       |   wwwwwwwwwwwwwwww }   <--- At this point the write barrier
         |       |       +------+     }        requires all stores prior to the
         |       |  :    | E=5  |     }        barrier to be committed before
-        |       |  :    +------+     }        further stores may be take place.
+        |       |  :    +------+     }        further stores may take place
         |       |------>| D=4  |     }
         |       |       +------+
         +-------+       :      :
@@ -626,7 +626,7 @@
                            V
 
 
-Secondly, data dependency barriers act as a partial orderings on data-dependent
+Secondly, data dependency barriers act as partial orderings on data-dependent
 loads. Consider the following sequence of events:
 
         CPU 1                   CPU 2
@@ -975,7 +975,7 @@
 
         barrier();
 
-This a general barrier - lesser varieties of compiler barrier do not exist.
+This is a general barrier - lesser varieties of compiler barrier do not exist.
 
 The compiler barrier has no direct effect on the CPU, which may then reorder
 things however it wishes.
@@ -997,7 +997,7 @@
 All CPU memory barriers unconditionally imply compiler barriers.
 
 SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
-systems because it is assumed that a CPU will be appear to be self-consistent,
+systems because it is assumed that a CPU will appear to be self-consistent,
 and will order overlapping accesses correctly with respect to itself.
 
 [!] Note that SMP memory barriers _must_ be used to control the ordering of
@@ -1146,9 +1146,9 @@
 Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is
 equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
 
-[!] Note: one of the consequence of LOCKs and UNLOCKs being only one-way
-barriers is that the effects instructions outside of a critical section may
-seep into the inside of the critical section.
+[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way
+barriers is that the effects of instructions outside of a critical section
+may seep into the inside of the critical section.
 
 A LOCK followed by an UNLOCK may not be assumed to be full memory barrier
 because it is possible for an access preceding the LOCK to happen after the
@@ -1239,7 +1239,7 @@
         UNLOCK M                        UNLOCK Q
         *D = d;                         *H = h;
 
-Then there is no guarantee as to what order CPU #3 will see the accesses to *A
+Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
 on the separate CPUs. It might, for example, see:
 
@@ -1269,12 +1269,12 @@
         UNLOCK M [2]
         *H = h;
 
-CPU #3 might see:
+CPU 3 might see:
 
         *E, LOCK M [1], *C, *B, *A, UNLOCK M [1],
                 LOCK M [2], *H, *F, *G, UNLOCK M [2], *D
 
-But assuming CPU #1 gets the lock first, it won't see any of:
+But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
 
         *B, *C, *D, *F, *G or *H preceding LOCK M [1]
         *A, *B or *C following UNLOCK M [1]
@@ -1327,12 +1327,12 @@
                 mmiowb();
                 spin_unlock(Q);
 
-this will ensure that the two stores issued on CPU #1 appear at the PCI bridge
-before either of the stores issued on CPU #2.
+this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
+before either of the stores issued on CPU 2.
 
 
-Furthermore, following a store by a load to the same device obviates the need
-for an mmiowb(), because the load forces the store to complete before the load
+Furthermore, following a store by a load from the same device obviates the need
+for the mmiowb(), because the load forces the store to complete before the load
 is performed:
 
         CPU 1                           CPU 2
@@ -1363,7 +1363,7 @@
 
  (*) Atomic operations.
 
- (*) Accessing devices (I/O).
+ (*) Accessing devices.
 
  (*) Interrupts.
 
@@ -1399,7 +1399,7 @@
  (1) read the next pointer from this waiter's record to know as to where the
      next waiter record is;
 
- (4) read the pointer to the waiter's task structure;
+ (2) read the pointer to the waiter's task structure;
 
  (3) clear the task pointer to tell the waiter it has been given the semaphore;
 
@@ -1407,7 +1407,7 @@
 
  (5) release the reference held on the waiter's task struct.
 
-In otherwords, it has to perform this sequence of events:
+In other words, it has to perform this sequence of events:
 
         LOAD waiter->list.next;
         LOAD waiter->task;
@@ -1502,7 +1502,7 @@
 such the implicit memory barrier effects are necessary.
 
 
-The following operation are potential problems as they do _not_ imply memory
+The following operations are potential problems as they do _not_ imply memory
 barriers, but might be used for implementing such things as UNLOCK-class
 operations:
 
@@ -1517,7 +1517,7 @@
 
 The following also do _not_ imply memory barriers, and so may require explicit
 memory barriers under some circumstances (smp_mb__before_atomic_dec() for
-instance)):
+instance):
 
         atomic_add();
         atomic_sub();
@@ -1641,8 +1641,8 @@
      indeed have special I/O space access cycles and instructions, but many
      CPUs don't have such a concept.
 
-     The PCI bus, amongst others, defines an I/O space concept - which on such
-     CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
+     The PCI bus, amongst others, defines an I/O space concept which - on such
+     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
      space. However, it may also be mapped as a virtual I/O space in the CPU's
      memory map, particularly on those CPUs that don't support alternate I/O
      spaces.
@@ -1664,7 +1664,7 @@
      i386 architecture machines, for example, this is controlled by way of the
      MTRR registers.
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,,
+     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
      provided they're not accessing a prefetchable device.
 
      However, intermediary hardware (such as a PCI bridge) may indulge in
@@ -1689,6 +1689,6 @@
 
  (*) ioreadX(), iowriteX()
 
-     These will perform as appropriate for the type of access they're actually
+     These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
@@ -1705,7 +1705,7 @@
 
 This means that it must be considered that the CPU will execute its instruction
 stream in any order it feels like - or even in parallel - provided that if an
-instruction in the stream depends on the an earlier instruction, then that
+instruction in the stream depends on an earlier instruction, then that
 earlier instruction must be sufficiently complete[*] before the later
 instruction may proceed; in other words: provided that the appearance of
 causality is maintained.
@@ -1795,8 +1795,8 @@
 become apparent in the same order on those other CPUs.
 
 
-Consider dealing with a system that has pair of CPUs (1 & 2), each of which has
-a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
+Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
+has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
 
                     :
                     :                          +--------+
@@ -1835,7 +1835,7 @@
 
  (*) the coherency queue is not flushed by normal loads to lines already
      present in the cache, even though the contents of the queue may
-     potentially effect those loads.
+     potentially affect those loads.
 
 Imagine, then, that two writes are made on the first CPU, with a write barrier
 between them to guarantee that they will appear to reach that CPU's caches in
@@ -1845,7 +1845,7 @@
         =============== =============== =======================================
                                         u == 0, v == 1 and p == &u, q == &u
         v = 2;
-        smp_wmb();                      Make sure change to v visible before
+        smp_wmb();                      Make sure change to v is visible before
                                         change to p
         <A:modify v=2>                  v is now in cache A exclusively
         p = &v;
@@ -1853,7 +1853,7 @@
 
 The write memory barrier forces the other CPUs in the system to perceive that
 the local CPU's caches have apparently been updated in the correct order. But
-now imagine that the second CPU that wants to read those values:
+now imagine that the second CPU wants to read those values:
 
         CPU 1           CPU 2           COMMENT
         =============== =============== =======================================
@@ -1861,7 +1861,7 @@
         q = p;
         x = *q;
 
-The above pair of reads may then fail to happen in expected order, as the
+The above pair of reads may then fail to happen in the expected order, as the
 cacheline holding p may get updated in one of the second CPU's caches whilst
 the update to the cacheline holding v is delayed in the other of the second
 CPU's caches by some other cache event:
@@ -1916,7 +1916,7 @@
 
 Other CPUs may also have split caches, but must coordinate between the various
 cachelets for normal memory accesses. The semantics of the Alpha removes the
-need for coordination in absence of memory barriers.
+need for coordination in the absence of memory barriers.
 
 
 CACHE COHERENCY VS DMA
@@ -1931,10 +1931,10 @@
 
 In addition, the data DMA'd to RAM by a device may be overwritten by dirty
 cache lines being written back to RAM from a CPU's cache after the device has
-installed its own data, or cache lines simply present in a CPUs cache may
-simply obscure the fact that RAM has been updated, until at such time as the
-cacheline is discarded from the CPU's cache and reloaded. To deal with this,
-the appropriate part of the kernel must invalidate the overlapping bits of the
+installed its own data, or cache lines present in the CPU's cache may simply
+obscure the fact that RAM has been updated, until at such time as the cacheline
+is discarded from the CPU's cache and reloaded. To deal with this, the
+appropriate part of the kernel must invalidate the overlapping bits of the
 cache on each CPU.
 
 See Documentation/cachetlb.txt for more information on cache management.
@@ -1944,7 +1944,7 @@
 -----------------------
 
 Memory mapped I/O usually takes place through memory locations that are part of
-a window in the CPU's memory space that have different properties assigned than
+a window in the CPU's memory space that has different properties assigned than
 the usual RAM directed window.
 
 Amongst these properties is usually the fact that such accesses bypass the
@@ -1960,7 +1960,7 @@
 =========================
 
 A programmer might take it for granted that the CPU will perform memory
-operations in exactly the order specified, so that if a CPU is, for example,
+operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
         a = *A;
@@ -1969,7 +1969,7 @@
         d = *D;
         *E = e;
 
-They would then expect that the CPU will complete the memory operation for each
+they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
 operations as seen by external observers in the system:
 
@@ -1986,8 +1986,8 @@
  (*) loads may be done speculatively, and the result discarded should it prove
      to have been unnecessary;
 
- (*) loads may be done speculatively, leading to the result having being
-     fetched at the wrong time in the expected sequence of events;
+ (*) loads may be done speculatively, leading to the result having been fetched
+     at the wrong time in the expected sequence of events;
 
  (*) the order of the memory accesses may be rearranged to promote better use
      of the CPU buses and caches;
@@ -2069,12 +2069,12 @@
 
 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to have
-two semantically related cache lines updating at separate times. This is where
+two semantically-related cache lines updated at separate times. This is where
 the data dependency barrier really becomes necessary as this synchronises both
 caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
 
-The Alpha defines the Linux's kernel's memory barrier model.
+The Alpha defines the Linux kernel's memory barrier model.
 
 See the subsection on "Cache Coherency" above.
 