Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[PATCH] page fault retry with NOPAGE_REFAULT

Add a way for a no_page() handler to request a retry of the faulting
instruction. On a page fault this goes back to userland so the instruction
runs again; in get_user_pages() the fault is simply retried in the loop. I
added a cond_resched() in the loop in the latter case.
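
The retry path described above can be sketched as a small userspace model. This is not kernel code: only the three NOPAGE_* values come from this patch, while struct page, the MODEL_FAULT_* codes, model_do_no_page(), model_get_page() and flaky_nopage() are hypothetical stand-ins. The model also simplifies one point: a real handler that returns NOPAGE_REFAULT has installed the mapping itself (or expects a later fault to succeed), whereas here the handler just succeeds on a later call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; only the three NOPAGE_* values come from the patch. */
struct page { int id; };

#define NOPAGE_SIGBUS	((struct page *) NULL)
#define NOPAGE_OOM	((struct page *) (intptr_t) -1)
#define NOPAGE_REFAULT	((struct page *) (intptr_t) -2)

/* Hypothetical result codes, loosely modelled on VM_FAULT_*. */
enum model_fault { MODEL_FAULT_OOM, MODEL_FAULT_SIGBUS, MODEL_FAULT_MINOR };

typedef struct page *(*model_nopage_fn)(void *ctx);

/* Mirrors the patched check: REFAULT maps nothing and reports a minor fault. */
enum model_fault model_do_no_page(model_nopage_fn nopage, void *ctx,
				  struct page **slot)
{
	struct page *new_page = nopage(ctx);

	if (new_page == NOPAGE_SIGBUS)
		return MODEL_FAULT_SIGBUS;
	else if (new_page == NOPAGE_OOM)
		return MODEL_FAULT_OOM;
	else if (new_page == NOPAGE_REFAULT)
		return MODEL_FAULT_MINOR;

	*slot = new_page;
	return MODEL_FAULT_MINOR;
}

/* Models the get_user_pages() loop: keep faulting until something is mapped. */
struct page *model_get_page(model_nopage_fn nopage, void *ctx)
{
	struct page *slot = NULL;

	assert(nopage != NULL);
	while (slot == NULL) {
		if (model_do_no_page(nopage, ctx, &slot) != MODEL_FAULT_MINOR)
			return NULL;	/* SIGBUS or OOM: give up */
		/* the kernel loop calls cond_resched() at this point */
	}
	return slot;
}

/* Hypothetical handler: asks for a refault a few times, then succeeds. */
struct flaky_ctx { int refaults_left; int calls; struct page backing; };

struct page *flaky_nopage(void *ctx)
{
	struct flaky_ctx *f = ctx;

	f->calls++;
	if (f->refaults_left > 0) {
		f->refaults_left--;
		return NOPAGE_REFAULT;
	}
	return &f->backing;
}
```

With refaults_left set to 3, model_get_page() calls the handler four times before it returns the backing page. Because the loop can spin for as long as the handler keeps asking for a refault, a reschedule point inside it matters.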

The problem I have with signal and spufs is an actual bug affecting apps and I
don't see other ways of fixing it.

In addition, we are having issues with infiniband and 64k pages (related to
the way the hypervisor deals with some HV cards) that will require us to muck
around with the MMU from within the IB driver's no_page() (it's a pSeries
specific driver) and return to the caller the same way using NOPAGE_REFAULT.

And to add to this, the graphics folks have been following a new approach of
memory management that involves transparently swapping objects between video
RAM and main memory. To do that, they need to install PTEs from a no_page()
handler as well, and that also requires returning with NOPAGE_REFAULT.

(For the latter, they are currently using io_remap_pfn_range to install one PTE
from no_page(), which is a bit racy: we need to add a check for the PTE having
already been installed after taking the lock. That's OK for now, they are only
at the proof-of-concept stage. I'll send a patch adding a "clean" function to
do that; we can use it from spufs too and get rid of the sparsemem hacks we do
to create struct page for SPEs. Basically, that provides a generic solution
for being able to have no_page() map hardware devices, which is something that
I think sound driver folks have been asking for for some time too.)
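
The missing check mentioned above (look at the PTE again once the lock is held, and install only if it is still empty) can be illustrated with a userspace sketch. Everything here is an assumption made for illustration: struct model_pte_slot, MODEL_PTE_NONE, the PTE encoding and model_install_pte() are hypothetical, and a C11 atomic_flag spinlock stands in for the page table lock. It is not the "clean" function referred to in the text.

```c
#include <stdatomic.h>

#define MODEL_PTE_NONE		0UL
#define MODEL_PTE_PRESENT	1UL
#define MODEL_PFN_SHIFT		12

/* Hypothetical one-entry "page table": a lock plus a single PTE value. */
struct model_pte_slot {
	atomic_flag lock;
	unsigned long pte;
};

/*
 * Install a mapping for pfn unless another faulter got there first.
 * Returns 1 if this call installed the PTE, 0 if it was already present.
 * In both cases the caller would go on to return NOPAGE_REFAULT.
 */
int model_install_pte(struct model_pte_slot *slot, unsigned long pfn)
{
	int installed = 0;

	/* Spin until we own the lock (stand-in for the page table lock). */
	while (atomic_flag_test_and_set_explicit(&slot->lock,
						 memory_order_acquire))
		;

	/* Re-check under the lock: a racing fault may have filled the slot. */
	if (slot->pte == MODEL_PTE_NONE) {
		slot->pte = (pfn << MODEL_PFN_SHIFT) | MODEL_PTE_PRESENT;
		installed = 1;
	}

	atomic_flag_clear_explicit(&slot->lock, memory_order_release);
	return installed;
}
```

A second call for the same slot finds the PTE already populated and leaves it alone, which is the behaviour the unconditional install lacks.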

All of these things depend on having the NOPAGE_REFAULT exit path from
no_page() handlers.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Authored by Benjamin Herrenschmidt and committed by Linus Torvalds
7f7bbbe5 1ca4cb24

 include/linux/mm.h |    1 +
 mm/memory.c        |    9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -593,6 +593,7 @@
  */
 #define NOPAGE_SIGBUS	(NULL)
 #define NOPAGE_OOM	((struct page *) (-1))
+#define NOPAGE_REFAULT	((struct page *) (-2))	/* Return to userspace, rerun */
 
 /*
  * Error return values for the *_nopfn functions
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1086,6 +1086,7 @@
 				default:
 					BUG();
 				}
+				cond_resched();
 			}
 			if (pages) {
 				pages[i] = page;
@@ -2170,11 +2169,13 @@
 	 * after the next truncate_count read.
 	 */
 
-	/* no page was available -- either SIGBUS or OOM */
-	if (new_page == NOPAGE_SIGBUS)
+	/* no page was available -- either SIGBUS, OOM or REFAULT */
+	if (unlikely(new_page == NOPAGE_SIGBUS))
 		return VM_FAULT_SIGBUS;
-	if (new_page == NOPAGE_OOM)
+	else if (unlikely(new_page == NOPAGE_OOM))
 		return VM_FAULT_OOM;
+	else if (unlikely(new_page == NOPAGE_REFAULT))
+		return VM_FAULT_MINOR;
 
 	/*
 	 * Should we do an early C-O-W break?