Fix get_user_pages() race for write access

There's no real guarantee that handle_mm_fault() will always be able to
break a COW situation - if an update from another thread ends up
modifying the page table in some way, handle_mm_fault() may end up
requiring us to re-try the operation.

That's normally fine, but get_user_pages() ended up re-trying it as a
read, and thus a write access could in theory end up losing the dirty
bit or be done on a page that had not been properly COW'ed.
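
As a rough illustration of the problem, here is a minimal userspace sketch
(not kernel code) of the old retry logic. The fake_pte / old_follow_page
names are invented for illustration, and the racing update is simulated
inline: once the loop downgrades to a read lookup, it can "succeed" on a
page that was never safely made writable.

/*
 * Userspace sketch (NOT kernel code) of the old, racy retry logic.
 * All fake_* / old_* names are invented for illustration only.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_pte {
	bool present;
	bool writable;	/* stands in for pte_write() */
};

/* Old follow_page() test: for a write lookup, the pte must be writable. */
static bool old_follow_page(const struct fake_pte *pte, bool lookup_write)
{
	return pte->present && (!lookup_write || pte->writable);
}

int main(void)
{
	struct fake_pte pte = { .present = true, .writable = false };
	bool write = true;
	bool lookup_write = write;

	while (!old_follow_page(&pte, lookup_write)) {
		/* handle_mm_fault() breaks COW and makes the pte writable... */
		pte.writable = true;

		/* ...but the old code downgraded further lookups to reads. */
		lookup_write = false;

		/* Simulate a racing thread undoing the COW break. */
		pte.writable = false;
	}

	/* The loop "succeeds", yet the caller wanted write access. */
	printf("old logic returned a %s page for a write access\n",
	       pte.writable ? "writable" : "read-only");
	return 0;
}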

This makes get_user_pages() always retry write accesses as write
accesses by making "follow_page()" require that a writable follow has
the dirty bit set. That simplifies the code and solves the race: if the
COW break fails for some reason, we'll just loop around and try again.
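
A minimal userspace sketch of the new retry behaviour, again with invented
fake_* names standing in for follow_page() and handle_mm_fault(): a lookup
for write access only succeeds once the pte is dirty, so a failed or raced
COW break simply sends us around the loop again.

/*
 * Userspace sketch (NOT kernel code) of the retry loop after this change.
 * fake_* names are invented; the real code lives in mm/memory.c.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_pte {
	bool present;
	bool dirty;	/* stands in for pte_dirty() */
};

/*
 * New behaviour: a lookup for write access only succeeds once the pte is
 * dirty, i.e. once a write fault has really gone through on a private copy.
 */
static bool fake_follow_page(const struct fake_pte *pte, bool write)
{
	if (!pte->present)
		return false;
	if (write && !pte->dirty)
		return false;
	return true;
}

/* A successful write fault breaks COW and leaves the pte dirty. */
static void fake_handle_mm_fault(struct fake_pte *pte, int attempt)
{
	pte->present = true;
	/* First attempt: pretend a racing thread undid the COW break. */
	pte->dirty = (attempt > 0);
}

int main(void)
{
	struct fake_pte pte = { .present = true, .dirty = false };
	bool write = true;
	int attempt = 0;

	/* Always retry the lookup with the original write intent. */
	while (!fake_follow_page(&pte, write))
		fake_handle_mm_fault(&pte, attempt++);

	printf("got a page after %d fault(s); pte is dirty: %s\n",
	       attempt, pte.dirty ? "yes" : "no");
	return 0;
}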

Signed-off-by: Linus Torvalds <torvalds@osdl.org>

+4 -17
mm/memory.c
···
 	pte = *ptep;
 	pte_unmap(ptep);
 	if (pte_present(pte)) {
-		if (write && !pte_write(pte))
+		if (write && !pte_dirty(pte))
 			goto out;
 		if (read && !pte_read(pte))
 			goto out;
 		pfn = pte_pfn(pte);
 		if (pfn_valid(pfn)) {
 			page = pfn_to_page(pfn);
-			if (accessed) {
-				if (write && !pte_dirty(pte) &&!PageDirty(page))
-					set_page_dirty(page);
+			if (accessed)
 				mark_page_accessed(page);
-			}
 			return page;
 		}
 	}
···
 	spin_lock(&mm->page_table_lock);
 	do {
 		struct page *page;
-		int lookup_write = write;
 
 		cond_resched_lock(&mm->page_table_lock);
-		while (!(page = follow_page(mm, start, lookup_write))) {
+		while (!(page = follow_page(mm, start, write))) {
 			/*
 			 * Shortcut for anonymous pages. We don't want
 			 * to force the creation of pages tables for
···
 			 * nobody touched so far. This is important
 			 * for doing a core dump for these mappings.
 			 */
-			if (!lookup_write &&
-			    untouched_anonymous_page(mm,vma,start)) {
+			if (!write && untouched_anonymous_page(mm,vma,start)) {
 				page = ZERO_PAGE(start);
 				break;
 			}
···
 			default:
 				BUG();
 			}
-			/*
-			 * Now that we have performed a write fault
-			 * and surely no longer have a shared page we
-			 * shouldn't write, we shouldn't ignore an
-			 * unwritable page in the page table if
-			 * we are forcing write access.
-			 */
-			lookup_write = write && !force;
 			spin_lock(&mm->page_table_lock);
 		}
 		if (pages) {