Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs/mm: document latest changes to vm_lock

Change the documentation to reflect that vm_lock is integrated into vma
and replaced with vm_refcnt. Document newly introduced
vma_start_read_locked{_nested} functions.

Link: https://lkml.kernel.org/r/20250213224655.1680278-19-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Suren Baghdasaryan and committed by Andrew Morton
795f2961 31041385

+26 -18
Documentation/mm/process_addrs.rst
@@ -716,9 +716,14 @@
 critical section, then attempts to VMA lock it via :c:func:`!vma_start_read`,
 before releasing the RCU lock via :c:func:`!rcu_read_unlock`.
 
-VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for
-their duration and the caller of :c:func:`!lock_vma_under_rcu` must release it
-via :c:func:`!vma_end_read`.
+In cases when the user already holds mmap read lock, :c:func:`!vma_start_read_locked`
+and :c:func:`!vma_start_read_locked_nested` can be used. These functions do not
+fail due to lock contention but the caller should still check their return values
+in case they fail for other reasons.
+
+VMA read locks increment :c:member:`!vma.vm_refcnt` reference counter for their
+duration and the caller of :c:func:`!lock_vma_under_rcu` must drop it via
+:c:func:`!vma_end_read`.
 
 VMA **write** locks are acquired via :c:func:`!vma_start_write` in instances where a
 VMA is about to be modified, unlike :c:func:`!vma_start_read` the lock is always
@@ -726,9 +731,9 @@
 lock, releasing or downgrading the mmap write lock also releases the VMA write
 lock so there is no :c:func:`!vma_end_write` function.
 
-Note that a semaphore write lock is not held across a VMA lock. Rather, a
-sequence number is used for serialisation, and the write semaphore is only
-acquired at the point of write lock to update this.
+Note that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily
+modified so that readers can detect the presense of a writer. The reference counter is
+restored once the vma sequence number used for serialisation is updated.
 
 This ensures the semantics we require - VMA write locks provide exclusive write
 access to the VMA.
@@ -738,7 +743,7 @@
 
 The VMA lock mechanism is designed to be a lightweight means of avoiding the use
 of the heavily contended mmap lock. It is implemented using a combination of a
-read/write semaphore and sequence numbers belonging to the containing
+reference counter and sequence numbers belonging to the containing
 :c:struct:`!struct mm_struct` and the VMA.
 
 Read locks are acquired via :c:func:`!vma_start_read`, which is an optimistic
@@ -779,28 +784,31 @@
 keep VMAs locked across entirely separate write operations. It also maintains
 correct lock ordering.
 
-Each time a VMA read lock is acquired, we acquire a read lock on the
-:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that
-the sequence count of the VMA does not match that of the mm.
+Each time a VMA read lock is acquired, we increment :c:member:`!vma.vm_refcnt`
+reference counter and check that the sequence count of the VMA does not match
+that of the mm.
 
-If it does, the read lock fails. If it does not, we hold the lock, excluding
-writers, but permitting other readers, who will also obtain this lock under RCU.
+If it does, the read lock fails and :c:member:`!vma.vm_refcnt` is dropped.
+If it does not, we keep the reference counter raised, excluding writers, but
+permitting other readers, who can also obtain this lock under RCU.
 
 Importantly, maple tree operations performed in :c:func:`!lock_vma_under_rcu`
 are also RCU safe, so the whole read lock operation is guaranteed to function
 correctly.
 
-On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`
-read/write semaphore, before setting the VMA's sequence number under this lock,
-also simultaneously holding the mmap write lock.
+On the write side, we set a bit in :c:member:`!vma.vm_refcnt` which can't be
+modified by readers and wait for all readers to drop their reference count.
+Once there are no readers, the VMA's sequence number is set to match that of
+the mm. During this entire operation mmap write lock is held.
 
 This way, if any read locks are in effect, :c:func:`!vma_start_write` will sleep
 until these are finished and mutual exclusion is achieved.
 
-After setting the VMA's sequence number, the lock is released, avoiding
-complexity with a long-term held write lock.
+After setting the VMA's sequence number, the bit in :c:member:`!vma.vm_refcnt`
+indicating a writer is cleared. From this point on, VMA's sequence number will
+indicate VMA's write-locked state until mmap write lock is dropped or downgraded.
 
-This clever combination of a read/write semaphore and sequence count allows for
+This clever combination of a reference counter and sequence count allows for
 fast RCU-based per-VMA lock acquisition (especially on page fault, though
 utilised elsewhere) with minimal complexity around lock ordering.
 
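
For illustration only, the reference-counter-plus-sequence-number scheme described in the updated documentation can be modelled in userspace C11. This is a rough sketch, not the kernel implementation: every `model_*` name, the `MODEL_WRITER_BIT` value, and the zero baseline for the counter are assumptions made for the example. Where the documentation says `vma_start_write` sleeps until readers finish, this single-threaded model simply reports failure instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_WRITER_BIT 0x40000000u /* assumed flag bit, set only by a writer */

struct model_mm {
	unsigned int seq;        /* bumped when the mmap write lock is dropped */
};

struct model_vma {
	atomic_uint refcnt;      /* reader count, plus the writer flag bit */
	unsigned int seq;        /* equal to mm->seq means "write-locked" */
	struct model_mm *mm;
};

/* Read lock: raise the counter, then check the sequence numbers. */
static bool model_vma_start_read(struct model_vma *vma)
{
	unsigned int old = atomic_load(&vma->refcnt);

	do {
		if (old & MODEL_WRITER_BIT)
			return false;    /* a writer is present, back off */
	} while (!atomic_compare_exchange_weak(&vma->refcnt, &old, old + 1));

	if (vma->seq == vma->mm->seq) {
		/* VMA is write-locked: drop the reference and fail. */
		atomic_fetch_sub(&vma->refcnt, 1);
		return false;
	}
	return true;             /* counter stays raised, excluding writers */
}

static void model_vma_end_read(struct model_vma *vma)
{
	unsigned int old = atomic_fetch_sub(&vma->refcnt, 1);

	assert((old & ~MODEL_WRITER_BIT) > 0); /* must pair with a start_read */
}

/*
 * Write lock: the caller is assumed to hold the mmap write lock.
 * Returns false if readers are present (the kernel would sleep instead).
 */
static bool model_vma_start_write(struct model_vma *vma)
{
	atomic_fetch_or(&vma->refcnt, MODEL_WRITER_BIT);

	if (atomic_load(&vma->refcnt) != MODEL_WRITER_BIT) {
		atomic_fetch_and(&vma->refcnt, ~MODEL_WRITER_BIT);
		return false;
	}

	vma->seq = vma->mm->seq; /* mark the VMA as write-locked */
	atomic_fetch_and(&vma->refcnt, ~MODEL_WRITER_BIT);
	return true;
}

/* Dropping the mmap write lock unlocks every VMA at once by bumping mm->seq. */
static void model_mmap_write_unlock(struct model_mm *mm)
{
	mm->seq++;
}
```

In this model a write-locked VMA stays locked after the writer bit is cleared, because its sequence number still matches that of the mm; only `model_mmap_write_unlock` makes read locks succeed again, which mirrors the absence of a `vma_end_write` function.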