Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: non-atomically mark page accessed during page cache allocation where possible

aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after. Once the page is
visible, atomic operations are necessary, which is noticeable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticeable with fast storage. The objective of this patch is to initialise
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page. This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
be called before the page is visible and can be done non-atomically.

The primary APIs of concern in this case are the following and are used
by most filesystems.

find_get_page
find_lock_page
find_or_create_page
grab_cache_page_nowait
grab_cache_page_write_begin

All of them are very similar in detail, so the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior, such as whether the page should be marked accessed or not. The
old APIs are preserved but are basically thin wrappers around this core
function.

Each of the filesystems is then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job. There is a slight snag in that the timing of the
mark_page_accessed() call has now changed, so in rare cases it's possible a
page gets to the end of the LRU as PageReferenced whereas previously it
might have been repromoted. This is expected to be rare, but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change. It is also the case that some filesystems may now be marking
pages accessed that previously were not, but it makes sense that
filesystems have consistent behaviour in this regard.

The test case used to evaluate this is a simple dd of a large file done
multiple times with the file deleted on each iteration. The size of the
file is 1/10th of physical memory to avoid dirty page balancing. In the
async case it is possible that the workload completes without even hitting
the disk and will have variable results, but it highlights the impact of
mark_page_accessed for async IO. The sync results are expected to be more
stable. The exception is tmpfs, where the normal case is for the "IO" to
not hit the disk.
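The commit does not give the exact invocation, but the described workload
can be approximated with a sketch like the following (TESTDIR and
ITERATIONS are assumptions; the file size is derived from /proc/meminfo):

```shell
#!/bin/sh
# Hypothetical reconstruction of the benchmark: repeated dd of a file
# sized at 1/10th of physical memory, deleted each iteration.
TESTDIR=${TESTDIR:-/mnt/test}
ITERATIONS=${ITERATIONS:-5}

mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
filesz=$((mem_kb / 10 * 1024))		# bytes: 1/10th of physical memory

for i in $(seq "$ITERATIONS"); do
	dd if=/dev/zero of="$TESTDIR/ddfile" bs=1M count=$((filesz / 1048576))
	sync				# sync case; omit for the async case
	rm -f "$TESTDIR/ddfile"
done
```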

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts. Throughput and wall times are presented for sync IO; only wall
times are shown for async, as the granularity reported by dd and the
variability are unsuitable for comparison. As async results were variable
due to writeback timings, I'm only reporting the maximum figures. The sync
results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
3.15.0-rc3 3.15.0-rc3
vanilla accessed-v2
ext3 Max elapsed 13.9900 ( 0.00%) 11.5900 ( 17.16%)
tmpfs Max elapsed 0.5100 ( 0.00%) 0.4900 ( 3.92%)
btrfs Max elapsed 12.8100 ( 0.00%) 12.7800 ( 0.23%)
ext4 Max elapsed 18.6000 ( 0.00%) 13.3400 ( 28.28%)
xfs Max elapsed 12.5600 ( 0.00%) 2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck, but the average figures looked reasonable.

samples percentage
ext3 86107 0.9783 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
ext3 23833 0.2710 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3 5036 0.0573 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4 64566 0.8961 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
ext4 5322 0.0713 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4 2869 0.0384 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs 62126 1.7675 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
xfs 1904 0.0554 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs 103 0.0030 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs 10655 0.1338 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
btrfs 2020 0.0273 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs 587 0.0079 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs 59562 3.2628 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
tmpfs 1210 0.0696 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs 94 0.0054 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Mel Gorman and committed by Linus Torvalds
commit 2457aec6 (parent e7470ee8)

+220 -165
+6 -5
fs/btrfs/extent_io.c
··· 4510 4510 spin_unlock(&eb->refs_lock); 4511 4511 } 4512 4512 4513 - static void mark_extent_buffer_accessed(struct extent_buffer *eb) 4513 + static void mark_extent_buffer_accessed(struct extent_buffer *eb, 4514 + struct page *accessed) 4514 4515 { 4515 4516 unsigned long num_pages, i; 4516 4517 ··· 4520 4519 num_pages = num_extent_pages(eb->start, eb->len); 4521 4520 for (i = 0; i < num_pages; i++) { 4522 4521 struct page *p = extent_buffer_page(eb, i); 4523 - mark_page_accessed(p); 4522 + if (p != accessed) 4523 + mark_page_accessed(p); 4524 4524 } 4525 4525 } 4526 4526 ··· 4535 4533 start >> PAGE_CACHE_SHIFT); 4536 4534 if (eb && atomic_inc_not_zero(&eb->refs)) { 4537 4535 rcu_read_unlock(); 4538 - mark_extent_buffer_accessed(eb); 4536 + mark_extent_buffer_accessed(eb, NULL); 4539 4537 return eb; 4540 4538 } 4541 4539 rcu_read_unlock(); ··· 4583 4581 spin_unlock(&mapping->private_lock); 4584 4582 unlock_page(p); 4585 4583 page_cache_release(p); 4586 - mark_extent_buffer_accessed(exists); 4584 + mark_extent_buffer_accessed(exists, p); 4587 4585 goto free_eb; 4588 4586 } 4589 4587 ··· 4598 4596 attach_extent_buffer_page(eb, p); 4599 4597 spin_unlock(&mapping->private_lock); 4600 4598 WARN_ON(PageDirty(p)); 4601 - mark_page_accessed(p); 4602 4599 eb->pages[i] = p; 4603 4600 if (!PageUptodate(p)) 4604 4601 uptodate = 0;
+3 -2
fs/btrfs/file.c
··· 470 470 for (i = 0; i < num_pages; i++) { 471 471 /* page checked is some magic around finding pages that 472 472 * have been modified without going through btrfs_set_page_dirty 473 - * clear it here 473 + * clear it here. There should be no need to mark the pages 474 + * accessed as prepare_pages should have marked them accessed 475 + * in prepare_pages via find_or_create_page() 474 476 */ 475 477 ClearPageChecked(pages[i]); 476 478 unlock_page(pages[i]); 477 - mark_page_accessed(pages[i]); 478 479 page_cache_release(pages[i]); 479 480 } 480 481 }
+4 -3
fs/buffer.c
··· 227 227 int all_mapped = 1; 228 228 229 229 index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits); 230 - page = find_get_page(bd_mapping, index); 230 + page = find_get_page_flags(bd_mapping, index, FGP_ACCESSED); 231 231 if (!page) 232 232 goto out; 233 233 ··· 1366 1366 struct buffer_head *bh = lookup_bh_lru(bdev, block, size); 1367 1367 1368 1368 if (bh == NULL) { 1369 + /* __find_get_block_slow will mark the page accessed */ 1369 1370 bh = __find_get_block_slow(bdev, block); 1370 1371 if (bh) 1371 1372 bh_lru_install(bh); 1372 - } 1373 - if (bh) 1373 + } else 1374 1374 touch_buffer(bh); 1375 + 1375 1376 return bh; 1376 1377 } 1377 1378 EXPORT_SYMBOL(__find_get_block);
+8 -6
fs/ext4/mballoc.c
··· 1044 1044 * allocating. If we are looking at the buddy cache we would 1045 1045 * have taken a reference using ext4_mb_load_buddy and that 1046 1046 * would have pinned buddy page to page cache. 1047 + * The call to ext4_mb_get_buddy_page_lock will mark the 1048 + * page accessed. 1047 1049 */ 1048 1050 ret = ext4_mb_get_buddy_page_lock(sb, group, &e4b); 1049 1051 if (ret || !EXT4_MB_GRP_NEED_INIT(this_grp)) { ··· 1064 1062 ret = -EIO; 1065 1063 goto err; 1066 1064 } 1067 - mark_page_accessed(page); 1068 1065 1069 1066 if (e4b.bd_buddy_page == NULL) { 1070 1067 /* ··· 1083 1082 ret = -EIO; 1084 1083 goto err; 1085 1084 } 1086 - mark_page_accessed(page); 1087 1085 err: 1088 1086 ext4_mb_put_buddy_page_lock(&e4b); 1089 1087 return ret; ··· 1141 1141 1142 1142 /* we could use find_or_create_page(), but it locks page 1143 1143 * what we'd like to avoid in fast path ... */ 1144 - page = find_get_page(inode->i_mapping, pnum); 1144 + page = find_get_page_flags(inode->i_mapping, pnum, FGP_ACCESSED); 1145 1145 if (page == NULL || !PageUptodate(page)) { 1146 1146 if (page) 1147 1147 /* ··· 1176 1176 ret = -EIO; 1177 1177 goto err; 1178 1178 } 1179 + 1180 + /* Pages marked accessed already */ 1179 1181 e4b->bd_bitmap_page = page; 1180 1182 e4b->bd_bitmap = page_address(page) + (poff * sb->s_blocksize); 1181 - mark_page_accessed(page); 1182 1183 1183 1184 block++; 1184 1185 pnum = block / blocks_per_page; 1185 1186 poff = block % blocks_per_page; 1186 1187 1187 - page = find_get_page(inode->i_mapping, pnum); 1188 + page = find_get_page_flags(inode->i_mapping, pnum, FGP_ACCESSED); 1188 1189 if (page == NULL || !PageUptodate(page)) { 1189 1190 if (page) 1190 1191 page_cache_release(page); ··· 1210 1209 ret = -EIO; 1211 1210 goto err; 1212 1211 } 1212 + 1213 + /* Pages marked accessed already */ 1213 1214 e4b->bd_buddy_page = page; 1214 1215 e4b->bd_buddy = page_address(page) + (poff * sb->s_blocksize); 1215 - mark_page_accessed(page); 1216 1216 1217 1217 
BUG_ON(e4b->bd_bitmap_page == NULL); 1218 1218 BUG_ON(e4b->bd_buddy_page == NULL);
-3
fs/f2fs/checkpoint.c
··· 69 69 goto repeat; 70 70 } 71 71 out: 72 - mark_page_accessed(page); 73 72 return page; 74 73 } 75 74 ··· 136 137 if (!page) 137 138 continue; 138 139 if (PageUptodate(page)) { 139 - mark_page_accessed(page); 140 140 f2fs_put_page(page, 1); 141 141 continue; 142 142 } 143 143 144 144 f2fs_submit_page_mbio(sbi, page, blk_addr, &fio); 145 - mark_page_accessed(page); 146 145 f2fs_put_page(page, 0); 147 146 } 148 147 out:
-2
fs/f2fs/node.c
··· 967 967 goto repeat; 968 968 } 969 969 got_it: 970 - mark_page_accessed(page); 971 970 return page; 972 971 } 973 972 ··· 1021 1022 f2fs_put_page(page, 1); 1022 1023 return ERR_PTR(-EIO); 1023 1024 } 1024 - mark_page_accessed(page); 1025 1025 return page; 1026 1026 } 1027 1027
-2
fs/fuse/file.c
··· 1089 1089 tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes); 1090 1090 flush_dcache_page(page); 1091 1091 1092 - mark_page_accessed(page); 1093 - 1094 1092 if (!tmp) { 1095 1093 unlock_page(page); 1096 1094 page_cache_release(page);
-1
fs/gfs2/aops.c
··· 577 577 p = kmap_atomic(page); 578 578 memcpy(buf + copied, p + offset, amt); 579 579 kunmap_atomic(p); 580 - mark_page_accessed(page); 581 580 page_cache_release(page); 582 581 copied += amt; 583 582 index++;
+2 -2
fs/gfs2/meta_io.c
··· 136 136 yield(); 137 137 } 138 138 } else { 139 - page = find_lock_page(mapping, index); 139 + page = find_get_page_flags(mapping, index, 140 + FGP_LOCK|FGP_ACCESSED); 140 141 if (!page) 141 142 return NULL; 142 143 } ··· 154 153 map_bh(bh, sdp->sd_vfs, blkno); 155 154 156 155 unlock_page(page); 157 - mark_page_accessed(page); 158 156 page_cache_release(page); 159 157 160 158 return bh;
-1
fs/ntfs/attrib.c
··· 1748 1748 if (page) { 1749 1749 set_page_dirty(page); 1750 1750 unlock_page(page); 1751 - mark_page_accessed(page); 1752 1751 page_cache_release(page); 1753 1752 } 1754 1753 ntfs_debug("Done.");
-1
fs/ntfs/file.c
··· 2060 2060 } 2061 2061 do { 2062 2062 unlock_page(pages[--do_pages]); 2063 - mark_page_accessed(pages[do_pages]); 2064 2063 page_cache_release(pages[do_pages]); 2065 2064 } while (do_pages); 2066 2065 if (unlikely(status))
+1
include/linux/page-flags.h
··· 198 198 TESTPAGEFLAG(Locked, locked) 199 199 PAGEFLAG(Error, error) TESTCLEARFLAG(Error, error) 200 200 PAGEFLAG(Referenced, referenced) TESTCLEARFLAG(Referenced, referenced) 201 + __SETPAGEFLAG(Referenced, referenced) 201 202 PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty) 202 203 PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru) 203 204 PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active)
+101 -6
include/linux/pagemap.h
··· 259 259 pgoff_t page_cache_prev_hole(struct address_space *mapping, 260 260 pgoff_t index, unsigned long max_scan); 261 261 262 + #define FGP_ACCESSED 0x00000001 263 + #define FGP_LOCK 0x00000002 264 + #define FGP_CREAT 0x00000004 265 + #define FGP_WRITE 0x00000008 266 + #define FGP_NOFS 0x00000010 267 + #define FGP_NOWAIT 0x00000020 268 + 269 + struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, 270 + int fgp_flags, gfp_t cache_gfp_mask, gfp_t radix_gfp_mask); 271 + 272 + /** 273 + * find_get_page - find and get a page reference 274 + * @mapping: the address_space to search 275 + * @offset: the page index 276 + * 277 + * Looks up the page cache slot at @mapping & @offset. If there is a 278 + * page cache page, it is returned with an increased refcount. 279 + * 280 + * Otherwise, %NULL is returned. 281 + */ 282 + static inline struct page *find_get_page(struct address_space *mapping, 283 + pgoff_t offset) 284 + { 285 + return pagecache_get_page(mapping, offset, 0, 0, 0); 286 + } 287 + 288 + static inline struct page *find_get_page_flags(struct address_space *mapping, 289 + pgoff_t offset, int fgp_flags) 290 + { 291 + return pagecache_get_page(mapping, offset, fgp_flags, 0, 0); 292 + } 293 + 294 + /** 295 + * find_lock_page - locate, pin and lock a pagecache page 296 + * pagecache_get_page - find and get a page reference 297 + * @mapping: the address_space to search 298 + * @offset: the page index 299 + * 300 + * Looks up the page cache slot at @mapping & @offset. If there is a 301 + * page cache page, it is returned locked and with an increased 302 + * refcount. 303 + * 304 + * Otherwise, %NULL is returned. 305 + * 306 + * find_lock_page() may sleep. 
307 + */ 308 + static inline struct page *find_lock_page(struct address_space *mapping, 309 + pgoff_t offset) 310 + { 311 + return pagecache_get_page(mapping, offset, FGP_LOCK, 0, 0); 312 + } 313 + 314 + /** 315 + * find_or_create_page - locate or add a pagecache page 316 + * @mapping: the page's address_space 317 + * @index: the page's index into the mapping 318 + * @gfp_mask: page allocation mode 319 + * 320 + * Looks up the page cache slot at @mapping & @offset. If there is a 321 + * page cache page, it is returned locked and with an increased 322 + * refcount. 323 + * 324 + * If the page is not present, a new page is allocated using @gfp_mask 325 + * and added to the page cache and the VM's LRU list. The page is 326 + * returned locked and with an increased refcount. 327 + * 328 + * On memory exhaustion, %NULL is returned. 329 + * 330 + * find_or_create_page() may sleep, even if @gfp_flags specifies an 331 + * atomic allocation! 332 + */ 333 + static inline struct page *find_or_create_page(struct address_space *mapping, 334 + pgoff_t offset, gfp_t gfp_mask) 335 + { 336 + return pagecache_get_page(mapping, offset, 337 + FGP_LOCK|FGP_ACCESSED|FGP_CREAT, 338 + gfp_mask, gfp_mask & GFP_RECLAIM_MASK); 339 + } 340 + 341 + /** 342 + * grab_cache_page_nowait - returns locked page at given index in given cache 343 + * @mapping: target address_space 344 + * @index: the page index 345 + * 346 + * Same as grab_cache_page(), but do not wait if the page is unavailable. 347 + * This is intended for speculative data generators, where the data can 348 + * be regenerated if the page couldn't be grabbed. This routine should 349 + * be safe to call while holding the lock for another page. 350 + * 351 + * Clear __GFP_FS when allocating the page to avoid recursion into the fs 352 + * and deadlock against the caller's locked page. 
353 + */ 354 + static inline struct page *grab_cache_page_nowait(struct address_space *mapping, 355 + pgoff_t index) 356 + { 357 + return pagecache_get_page(mapping, index, 358 + FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT, 359 + mapping_gfp_mask(mapping), 360 + GFP_NOFS); 361 + } 362 + 262 363 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset); 263 - struct page *find_get_page(struct address_space *mapping, pgoff_t offset); 264 364 struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset); 265 - struct page *find_lock_page(struct address_space *mapping, pgoff_t offset); 266 - struct page *find_or_create_page(struct address_space *mapping, pgoff_t index, 267 - gfp_t gfp_mask); 268 365 unsigned find_get_entries(struct address_space *mapping, pgoff_t start, 269 366 unsigned int nr_entries, struct page **entries, 270 367 pgoff_t *indices); ··· 384 287 return find_or_create_page(mapping, index, mapping_gfp_mask(mapping)); 385 288 } 386 289 387 - extern struct page * grab_cache_page_nowait(struct address_space *mapping, 388 - pgoff_t index); 389 290 extern struct page * read_cache_page(struct address_space *mapping, 390 291 pgoff_t index, filler_t *filler, void *data); 391 292 extern struct page * read_cache_page_gfp(struct address_space *mapping,
+1
include/linux/swap.h
··· 311 311 struct lruvec *lruvec, struct list_head *head); 312 312 extern void activate_page(struct page *); 313 313 extern void mark_page_accessed(struct page *); 314 + extern void init_page_accessed(struct page *page); 314 315 extern void lru_add_drain(void); 315 316 extern void lru_add_drain_cpu(int cpu); 316 317 extern void lru_add_drain_all(void);
+78 -130
mm/filemap.c
··· 982 982 EXPORT_SYMBOL(find_get_entry); 983 983 984 984 /** 985 - * find_get_page - find and get a page reference 986 - * @mapping: the address_space to search 987 - * @offset: the page index 988 - * 989 - * Looks up the page cache slot at @mapping & @offset. If there is a 990 - * page cache page, it is returned with an increased refcount. 991 - * 992 - * Otherwise, %NULL is returned. 993 - */ 994 - struct page *find_get_page(struct address_space *mapping, pgoff_t offset) 995 - { 996 - struct page *page = find_get_entry(mapping, offset); 997 - 998 - if (radix_tree_exceptional_entry(page)) 999 - page = NULL; 1000 - return page; 1001 - } 1002 - EXPORT_SYMBOL(find_get_page); 1003 - 1004 - /** 1005 985 * find_lock_entry - locate, pin and lock a page cache entry 1006 986 * @mapping: the address_space to search 1007 987 * @offset: the page cache index ··· 1018 1038 EXPORT_SYMBOL(find_lock_entry); 1019 1039 1020 1040 /** 1021 - * find_lock_page - locate, pin and lock a pagecache page 1041 + * pagecache_get_page - find and get a page reference 1022 1042 * @mapping: the address_space to search 1023 1043 * @offset: the page index 1044 + * @fgp_flags: PCG flags 1045 + * @gfp_mask: gfp mask to use if a page is to be allocated 1024 1046 * 1025 - * Looks up the page cache slot at @mapping & @offset. If there is a 1026 - * page cache page, it is returned locked and with an increased 1027 - * refcount. 1047 + * Looks up the page cache slot at @mapping & @offset. 1028 1048 * 1029 - * Otherwise, %NULL is returned. 1049 + * PCG flags modify how the page is returned 1030 1050 * 1031 - * find_lock_page() may sleep. 1051 + * FGP_ACCESSED: the page will be marked accessed 1052 + * FGP_LOCK: Page is return locked 1053 + * FGP_CREAT: If page is not present then a new page is allocated using 1054 + * @gfp_mask and added to the page cache and the VM's LRU 1055 + * list. The page is returned locked and with an increased 1056 + * refcount. Otherwise, %NULL is returned. 
1057 + * 1058 + * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even 1059 + * if the GFP flags specified for FGP_CREAT are atomic. 1060 + * 1061 + * If there is a page cache page, it is returned with an increased refcount. 1032 1062 */ 1033 - struct page *find_lock_page(struct address_space *mapping, pgoff_t offset) 1034 - { 1035 - struct page *page = find_lock_entry(mapping, offset); 1036 - 1037 - if (radix_tree_exceptional_entry(page)) 1038 - page = NULL; 1039 - return page; 1040 - } 1041 - EXPORT_SYMBOL(find_lock_page); 1042 - 1043 - /** 1044 - * find_or_create_page - locate or add a pagecache page 1045 - * @mapping: the page's address_space 1046 - * @index: the page's index into the mapping 1047 - * @gfp_mask: page allocation mode 1048 - * 1049 - * Looks up the page cache slot at @mapping & @offset. If there is a 1050 - * page cache page, it is returned locked and with an increased 1051 - * refcount. 1052 - * 1053 - * If the page is not present, a new page is allocated using @gfp_mask 1054 - * and added to the page cache and the VM's LRU list. The page is 1055 - * returned locked and with an increased refcount. 1056 - * 1057 - * On memory exhaustion, %NULL is returned. 1058 - * 1059 - * find_or_create_page() may sleep, even if @gfp_flags specifies an 1060 - * atomic allocation! 
1061 - */ 1062 - struct page *find_or_create_page(struct address_space *mapping, 1063 - pgoff_t index, gfp_t gfp_mask) 1063 + struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, 1064 + int fgp_flags, gfp_t cache_gfp_mask, gfp_t radix_gfp_mask) 1064 1065 { 1065 1066 struct page *page; 1066 - int err; 1067 + 1067 1068 repeat: 1068 - page = find_lock_page(mapping, index); 1069 - if (!page) { 1070 - page = __page_cache_alloc(gfp_mask); 1069 + page = find_get_entry(mapping, offset); 1070 + if (radix_tree_exceptional_entry(page)) 1071 + page = NULL; 1072 + if (!page) 1073 + goto no_page; 1074 + 1075 + if (fgp_flags & FGP_LOCK) { 1076 + if (fgp_flags & FGP_NOWAIT) { 1077 + if (!trylock_page(page)) { 1078 + page_cache_release(page); 1079 + return NULL; 1080 + } 1081 + } else { 1082 + lock_page(page); 1083 + } 1084 + 1085 + /* Has the page been truncated? */ 1086 + if (unlikely(page->mapping != mapping)) { 1087 + unlock_page(page); 1088 + page_cache_release(page); 1089 + goto repeat; 1090 + } 1091 + VM_BUG_ON_PAGE(page->index != offset, page); 1092 + } 1093 + 1094 + if (page && (fgp_flags & FGP_ACCESSED)) 1095 + mark_page_accessed(page); 1096 + 1097 + no_page: 1098 + if (!page && (fgp_flags & FGP_CREAT)) { 1099 + int err; 1100 + if ((fgp_flags & FGP_WRITE) && mapping_cap_account_dirty(mapping)) 1101 + cache_gfp_mask |= __GFP_WRITE; 1102 + if (fgp_flags & FGP_NOFS) { 1103 + cache_gfp_mask &= ~__GFP_FS; 1104 + radix_gfp_mask &= ~__GFP_FS; 1105 + } 1106 + 1107 + page = __page_cache_alloc(cache_gfp_mask); 1071 1108 if (!page) 1072 1109 return NULL; 1073 - /* 1074 - * We want a regular kernel memory (not highmem or DMA etc) 1075 - * allocation for the radix tree nodes, but we need to honour 1076 - * the context-specific requirements the caller has asked for. 1077 - * GFP_RECLAIM_MASK collects those requirements. 
1078 - */ 1079 - err = add_to_page_cache_lru(page, mapping, index, 1080 - (gfp_mask & GFP_RECLAIM_MASK)); 1110 + 1111 + if (WARN_ON_ONCE(!(fgp_flags & FGP_LOCK))) 1112 + fgp_flags |= FGP_LOCK; 1113 + 1114 + /* Init accessed so avoit atomic mark_page_accessed later */ 1115 + if (fgp_flags & FGP_ACCESSED) 1116 + init_page_accessed(page); 1117 + 1118 + err = add_to_page_cache_lru(page, mapping, offset, radix_gfp_mask); 1081 1119 if (unlikely(err)) { 1082 1120 page_cache_release(page); 1083 1121 page = NULL; ··· 1103 1105 goto repeat; 1104 1106 } 1105 1107 } 1108 + 1106 1109 return page; 1107 1110 } 1108 - EXPORT_SYMBOL(find_or_create_page); 1111 + EXPORT_SYMBOL(pagecache_get_page); 1109 1112 1110 1113 /** 1111 1114 * find_get_entries - gang pagecache lookup ··· 1402 1403 return ret; 1403 1404 } 1404 1405 EXPORT_SYMBOL(find_get_pages_tag); 1405 - 1406 - /** 1407 - * grab_cache_page_nowait - returns locked page at given index in given cache 1408 - * @mapping: target address_space 1409 - * @index: the page index 1410 - * 1411 - * Same as grab_cache_page(), but do not wait if the page is unavailable. 1412 - * This is intended for speculative data generators, where the data can 1413 - * be regenerated if the page couldn't be grabbed. This routine should 1414 - * be safe to call while holding the lock for another page. 1415 - * 1416 - * Clear __GFP_FS when allocating the page to avoid recursion into the fs 1417 - * and deadlock against the caller's locked page. 
1418 - */ 1419 - struct page * 1420 - grab_cache_page_nowait(struct address_space *mapping, pgoff_t index) 1421 - { 1422 - struct page *page = find_get_page(mapping, index); 1423 - 1424 - if (page) { 1425 - if (trylock_page(page)) 1426 - return page; 1427 - page_cache_release(page); 1428 - return NULL; 1429 - } 1430 - page = __page_cache_alloc(mapping_gfp_mask(mapping) & ~__GFP_FS); 1431 - if (page && add_to_page_cache_lru(page, mapping, index, GFP_NOFS)) { 1432 - page_cache_release(page); 1433 - page = NULL; 1434 - } 1435 - return page; 1436 - } 1437 - EXPORT_SYMBOL(grab_cache_page_nowait); 1438 1406 1439 1407 /* 1440 1408 * CD/DVDs are error prone. When a medium error occurs, the driver may fail ··· 2372 2406 { 2373 2407 const struct address_space_operations *aops = mapping->a_ops; 2374 2408 2375 - mark_page_accessed(page); 2376 2409 return aops->write_end(file, mapping, pos, len, copied, page, fsdata); 2377 2410 } 2378 2411 EXPORT_SYMBOL(pagecache_write_end); ··· 2453 2488 struct page *grab_cache_page_write_begin(struct address_space *mapping, 2454 2489 pgoff_t index, unsigned flags) 2455 2490 { 2456 - int status; 2457 - gfp_t gfp_mask; 2458 2491 struct page *page; 2459 - gfp_t gfp_notmask = 0; 2492 + int fgp_flags = FGP_LOCK|FGP_ACCESSED|FGP_WRITE|FGP_CREAT; 2460 2493 2461 - gfp_mask = mapping_gfp_mask(mapping); 2462 - if (mapping_cap_account_dirty(mapping)) 2463 - gfp_mask |= __GFP_WRITE; 2464 2494 if (flags & AOP_FLAG_NOFS) 2465 - gfp_notmask = __GFP_FS; 2466 - repeat: 2467 - page = find_lock_page(mapping, index); 2468 - if (page) 2469 - goto found; 2495 + fgp_flags |= FGP_NOFS; 2470 2496 2471 - page = __page_cache_alloc(gfp_mask & ~gfp_notmask); 2472 - if (!page) 2473 - return NULL; 2474 - status = add_to_page_cache_lru(page, mapping, index, 2475 - GFP_KERNEL & ~gfp_notmask); 2476 - if (unlikely(status)) { 2477 - page_cache_release(page); 2478 - if (status == -EEXIST) 2479 - goto repeat; 2480 - return NULL; 2481 - } 2482 - found: 2483 - 
wait_for_stable_page(page); 2497 + page = pagecache_get_page(mapping, index, fgp_flags, 2498 + mapping_gfp_mask(mapping), 2499 + GFP_KERNEL); 2500 + if (page) 2501 + wait_for_stable_page(page); 2502 + 2484 2503 return page; 2485 2504 } 2486 2505 EXPORT_SYMBOL(grab_cache_page_write_begin); ··· 2513 2564 2514 2565 status = a_ops->write_begin(file, mapping, pos, bytes, flags, 2515 2566 &page, &fsdata); 2516 - if (unlikely(status)) 2567 + if (unlikely(status < 0)) 2517 2568 break; 2518 2569 2519 2570 if (mapping_writably_mapped(mapping)) ··· 2522 2573 copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes); 2523 2574 flush_dcache_page(page); 2524 2575 2525 - mark_page_accessed(page); 2526 2576 status = a_ops->write_end(file, mapping, pos, bytes, copied, 2527 2577 page, fsdata); 2528 2578 if (unlikely(status < 0))
+5 -1
mm/shmem.c
··· 1372 1372 loff_t pos, unsigned len, unsigned flags, 1373 1373 struct page **pagep, void **fsdata) 1374 1374 { 1375 + int ret; 1375 1376 struct inode *inode = mapping->host; 1376 1377 pgoff_t index = pos >> PAGE_CACHE_SHIFT; 1377 - return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL); 1378 + ret = shmem_getpage(inode, index, pagep, SGP_WRITE, NULL); 1379 + if (ret == 0 && *pagep) 1380 + init_page_accessed(*pagep); 1381 + return ret; 1378 1382 } 1379 1383 1380 1384 static int
+11
mm/swap.c
··· 614 614 } 615 615 EXPORT_SYMBOL(mark_page_accessed); 616 616 617 + /* 618 + * Used to mark_page_accessed(page) that is not visible yet and when it is 619 + * still safe to use non-atomic ops 620 + */ 621 + void init_page_accessed(struct page *page) 622 + { 623 + if (!PageReferenced(page)) 624 + __SetPageReferenced(page); 625 + } 626 + EXPORT_SYMBOL(init_page_accessed); 627 + 617 628 static void __lru_cache_add(struct page *page) 618 629 { 619 630 struct pagevec *pvec = &get_cpu_var(lru_add_pvec);