
tools/memory-model: Provide extra ordering for unlock+lock pair on the same CPU

A recent discussion[1] shows that we are in favor of strengthening the
ordering of unlock + lock on the same CPU: an unlock and a po-after
lock should provide the so-called RCtso ordering, that is, a memory
access S po-before the unlock should be ordered against a memory
access R po-after the lock, unless S is a store and R is a load.

This strengthening meets programmers' expectation of a "sequence of
two locked regions to be ordered wrt each other" (from Linus), and it
reduces the mental burden of reasoning about locks.  Therefore add it
to the LKMM.

[1]: https://lore.kernel.org/lkml/20210909185937.GA12379@rowland.harvard.edu/
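
For illustration, the strengthened guarantee can be written as a
litmus test in herd7 syntax (a sketch; the test name and the use of
smp_wmb() in P1 are ours, but the pattern mirrors the two-lock example
added to explanation.txt).  Under the strengthened model the "exists"
clause is forbidden even though P0 uses two different locks:

	C unlock-lock-same-cpu

	{}

	P0(int *x, int *y, spinlock_t *s, spinlock_t *t)
	{
		int r1;
		int r2;

		spin_lock(s);
		r1 = READ_ONCE(*x);
		spin_unlock(s);
		spin_lock(t);
		r2 = READ_ONCE(*y);
		spin_unlock(t);
	}

	P1(int *x, int *y)
	{
		WRITE_ONCE(*y, 1);
		smp_wmb();
		WRITE_ONCE(*x, 1);
	}

	exists (0:r1=1 /\ 0:r2=0)

Here both accesses in P0 are loads, so the RCtso exception (S a store,
R a load) does not apply and the unlock + lock pair orders them.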

Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> (RISC-V)
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Authored by Boqun Feng, committed by Paul E. McKenney
ddfe1294 fa55b7dc

+28 -22
+25 -19
tools/memory-model/Documentation/explanation.txt
···
 lock-acquires -- have two properties beyond those of ordinary releases
 and acquires.
 
-First, when a lock-acquire reads from a lock-release, the LKMM
-requires that every instruction po-before the lock-release must
-execute before any instruction po-after the lock-acquire.  This would
-naturally hold if the release and acquire operations were on different
-CPUs, but the LKMM says it holds even when they are on the same CPU.
-For example:
+First, when a lock-acquire reads from or is po-after a lock-release,
+the LKMM requires that every instruction po-before the lock-release
+must execute before any instruction po-after the lock-acquire.  This
+would naturally hold if the release and acquire operations were on
+different CPUs and accessed the same lock variable, but the LKMM says
+it also holds when they are on the same CPU, even if they access
+different lock variables.  For example:
 
 	int x, y;
-	spinlock_t s;
+	spinlock_t s, t;
 
 	P0()
 	{
···
 		spin_lock(&s);
 		r1 = READ_ONCE(x);
 		spin_unlock(&s);
-		spin_lock(&s);
+		spin_lock(&t);
 		r2 = READ_ONCE(y);
-		spin_unlock(&s);
+		spin_unlock(&t);
 	}
 
 	P1()
···
 		WRITE_ONCE(x, 1);
 	}
 
-Here the second spin_lock() reads from the first spin_unlock(), and
-therefore the load of x must execute before the load of y.  Thus we
-cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
-MP pattern).
+Here the second spin_lock() is po-after the first spin_unlock(), and
+therefore the load of x must execute before the load of y, even though
+the two locking operations use different locks.  Thus we cannot have
+r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern).
 
 This requirement does not apply to ordinary release and acquire
 fences, only to lock-related operations.  For instance, suppose P0()
···
 
 and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
 
-Second, when a lock-acquire reads from a lock-release, and some other
-stores W and W' occur po-before the lock-release and po-after the
-lock-acquire respectively, the LKMM requires that W must propagate to
-each CPU before W' does.  For example, consider:
+Second, when a lock-acquire reads from or is po-after a lock-release,
+and some other stores W and W' occur po-before the lock-release and
+po-after the lock-acquire respectively, the LKMM requires that W must
+propagate to each CPU before W' does.  For example, consider:
 
 	int x, y;
-	spinlock_t x;
+	spinlock_t s;
 
 	P0()
 	{
···
 
 If r1 = 1 at the end then the spin_lock() in P1 must have read from
 the spin_unlock() in P0.  Hence the store to x must propagate to P2
-before the store to y does, so we cannot have r2 = 1 and r3 = 0.
+before the store to y does, so we cannot have r2 = 1 and r3 = 0.  But
+if P1 had used a lock variable different from s, the writes could have
+propagated in either order.  (On the other hand, if the code in P0 and
+P1 had all executed on a single CPU, as in the example before this
+one, then the writes would have propagated in order even if the two
+critical sections used different lock variables.)
 
 These two special requirements for lock-release and lock-acquire do
 not arise from the operational model.  Nevertheless, kernel developers
+3 -3
tools/memory-model/linux-kernel.cat
···
 (* Release Acquire *)
 let acq-po = [Acquire] ; po ; [M]
 let po-rel = [M] ; po ; [Release]
-let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po
+let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po
 
 (* Fences *)
 let R4rmb = R \ Noreturn (* Reads for which rmb works *)
···
 let overwrite = co | fr
 let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
 let to-r = addr | (dep ; [Marked] ; rfi)
-let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
+let ppo = to-r | to-w | fence | (po-unlock-lock-po & int)
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-		   po-unlock-rf-lock-po) ; [Marked]
+		   po-unlock-lock-po) ; [Marked]
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	    [Marked] ; rfe? ; [Marked]
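
The rf alternative in the renamed po-unlock-lock-po relation covers
the cross-CPU case, where a lock-acquire on one CPU reads from a
lock-release on another.  A litmus-test sketch of that case (names and
the smp_rmb() in P2 are ours; the pattern follows the second example
in explanation.txt):

	C unlock-lock-w-w-prop

	{}

	P0(int *x, spinlock_t *s)
	{
		spin_lock(s);
		WRITE_ONCE(*x, 1);
		spin_unlock(s);
	}

	P1(int *x, int *y, spinlock_t *s)
	{
		int r1;

		spin_lock(s);
		r1 = READ_ONCE(*x);
		spin_unlock(s);
		WRITE_ONCE(*y, 1);
	}

	P2(int *x, int *y)
	{
		int r2;
		int r3;

		r2 = READ_ONCE(*y);
		smp_rmb();
		r3 = READ_ONCE(*x);
	}

	exists (1:r1=1 /\ 2:r2=1 /\ 2:r3=0)

With r1 = 1, P1's spin_lock() reads from P0's spin_unlock(), so
cumul-fence forces the store to x to propagate to P2 before the store
to y does, and the "exists" clause is forbidden.  Because this is the
rf case, it requires that both critical sections use the same lock s.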