Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm, swap: use offset of swap entry as key of swap cache

This patch improves the performance of swap cache operations when the
type of the swap device is not 0. Originally, the whole swap entry
value was used as the key of the swap cache, even though there is one
radix tree per swap device. If the type of the swap device is not 0,
the height of the radix tree of the swap cache is increased
unnecessarily, especially on 64-bit architectures. For example, for a
1GB swap device on x86_64, the height of the radix tree of the swap
cache is 11; but if the offset of the swap entry is used as the key
instead, the height is only 4. The extra height causes unnecessary
radix tree descents and an increased cache footprint.

This patch reduces the height of the radix tree of the swap cache by
using the offset of the swap entry, instead of the whole swap entry
value, as the key of the swap cache. In a 32-process sequential
swap-out test case on a Xeon E5 v3 system with a RAM disk as swap, the
lock contention on the swap cache spinlock is reduced from 20.15% to
12.19% when the type of the swap device is 1.

Use the whole swap entry as key,

perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list: 10.37,
perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg: 9.78,

Use the swap offset as key,

perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list: 6.25,
perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg: 5.94,

Link: http://lkml.kernel.org/r/1473270649-27229-1-git-send-email-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Huang Ying, committed by Linus Torvalds
f6ab1f7f 87744ab3

+16 -14
+4 -4  include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1048,18 +1048,18 @@
 	return page->mapping;
 }
 
+extern pgoff_t __page_file_index(struct page *page);
+
 /*
  * Return the pagecache index of the passed page. Regular pagecache pages
- * use ->index whereas swapcache pages use ->private
+ * use ->index whereas swapcache pages use swp_offset(->private)
  */
 static inline pgoff_t page_index(struct page *page)
 {
 	if (unlikely(PageSwapCache(page)))
-		return page_private(page);
+		return __page_file_index(page);
 	return page->index;
 }
-
-extern pgoff_t __page_file_index(struct page *page);
 
 /*
  * Return the file index of the page. Regular pagecache pages use ->index
+3 -2  mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4408,7 +4408,7 @@
 	 * Because lookup_swap_cache() updates some statistics counter,
 	 * we call find_get_page() with swapper_space directly.
 	 */
-	page = find_get_page(swap_address_space(ent), ent.val);
+	page = find_get_page(swap_address_space(ent), swp_offset(ent));
 	if (do_memsw_account())
 		entry->val = ent.val;
 
@@ -4446,7 +4446,8 @@
 			swp_entry_t swp = radix_to_swp_entry(page);
 			if (do_memsw_account())
 				*entry = swp;
-			page = find_get_page(swap_address_space(swp), swp.val);
+			page = find_get_page(swap_address_space(swp),
+					     swp_offset(swp));
 		}
 	} else
 		page = find_get_page(mapping, pgoff);
+3 -2  mm/mincore.c
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -66,7 +66,8 @@
 		 */
		if (radix_tree_exceptional_entry(page)) {
			swp_entry_t swp = radix_to_swp_entry(page);
-			page = find_get_page(swap_address_space(swp), swp.val);
+			page = find_get_page(swap_address_space(swp),
+					     swp_offset(swp));
		}
	} else
		page = find_get_page(mapping, pgoff);
@@ -151,7 +150,7 @@
	} else {
 #ifdef CONFIG_SWAP
		*vec = mincore_page(swap_address_space(entry),
-				    entry.val);
+				    swp_offset(entry));
 #else
		WARN_ON(1);
		*vec = 1;
+4 -4  mm/swap_state.c
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -94,7 +94,7 @@
	address_space = swap_address_space(entry);
	spin_lock_irq(&address_space->tree_lock);
	error = radix_tree_insert(&address_space->page_tree,
-				  entry.val, page);
+				  swp_offset(entry), page);
	if (likely(!error)) {
		address_space->nrpages++;
		__inc_node_page_state(page, NR_FILE_PAGES);
@@ -145,7 +145,7 @@
 
	entry.val = page_private(page);
	address_space = swap_address_space(entry);
-	radix_tree_delete(&address_space->page_tree, page_private(page));
+	radix_tree_delete(&address_space->page_tree, swp_offset(entry));
	set_page_private(page, 0);
	ClearPageSwapCache(page);
	address_space->nrpages--;
@@ -283,7 +283,7 @@
 {
	struct page *page;
 
-	page = find_get_page(swap_address_space(entry), entry.val);
+	page = find_get_page(swap_address_space(entry), swp_offset(entry));
 
	if (page) {
		INC_CACHE_INFO(find_success);
@@ -310,7 +310,7 @@
		 * called after lookup_swap_cache() failed, re-calling
		 * that would confuse statistics.
		 */
-		found_page = find_get_page(swapper_space, entry.val);
+		found_page = find_get_page(swapper_space, swp_offset(entry));
		if (found_page)
			break;
+2 -2  mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -105,7 +105,7 @@
	struct page *page;
	int ret = 0;
 
-	page = find_get_page(swap_address_space(entry), entry.val);
+	page = find_get_page(swap_address_space(entry), swp_offset(entry));
	if (!page)
		return 0;
	/*
@@ -1005,7 +1005,7 @@
	if (p) {
		if (swap_entry_free(p, entry, 1) == SWAP_HAS_CACHE) {
			page = find_get_page(swap_address_space(entry),
-					     entry.val);
+					     swp_offset(entry));
			if (page && !trylock_page(page)) {
				put_page(page);
				page = NULL;