Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

virtio-mem wants to allow to offline memory blocks of which some parts
were unplugged (allocated via alloc_contig_range()), especially, to later
offline and remove completely unplugged memory blocks. The important part
is that PageOffline() has to remain set until the section is offline, so
these pages will never get accessed (e.g., when dumping). The pages should
not be handed back to the buddy (which would require clearing PageOffline()
and result in issues if offlining fails and the pages are suddenly in the
buddy).

Let's allow that by making it possible to isolate any PageOffline() page
when offlining. This way, we can reach the memory hotplug notifier
MEM_GOING_OFFLINE, where the driver can signal that it is fine with
offlining this page by dropping its reference count. PageOffline() pages
with a reference count of 0 can then be skipped when offlining the
pages (as if they were free, although they are not in the buddy).
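
The per-page decision described above can be modelled in user space. The following is only a sketch under stated assumptions: `struct model_page`, `classify_page()` and `range_can_go_offline()` are hypothetical names that do not exist in the kernel, and the fields merely stand in for PageOffline(), __PageMovable()/PageLRU() and page_count().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the fields the sketch needs. */
struct model_page {
	bool offline;   /* models PageOffline() */
	bool movable;   /* models __PageMovable() or PageLRU() */
	int refcount;   /* models page_count() */
};

enum page_action {
	PAGE_MIGRATE,   /* movable: migrate it out of the range */
	PAGE_SKIP,      /* PageOffline() with refcount 0: treat like free */
	PAGE_UNMOVABLE, /* PageOffline() still referenced: offlining fails */
	PAGE_OTHER,     /* anything else; not covered by this sketch */
};

/* Movable pages are checked first, so movable PageOffline() pages migrate. */
static enum page_action classify_page(const struct model_page *page)
{
	if (page->movable)
		return PAGE_MIGRATE;
	if (page->offline)
		return page->refcount ? PAGE_UNMOVABLE : PAGE_SKIP;
	return PAGE_OTHER;
}

/* Returns false if any page is an unmovable, still referenced offline page. */
static bool range_can_go_offline(const struct model_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (classify_page(&pages[i]) == PAGE_UNMOVABLE)
			return false;
	}
	return true;
}
```

In this model, a driver that never drops its reference keeps the page in the PAGE_UNMOVABLE state, which corresponds to the "no observable change" case for existing PageOffline() users.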

Anybody who uses PageOffline() pages and does not agree to offline them
(e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
decrement the reference count and will thereby make offlining fail when
trying to migrate such an unmovable page. So there should be no observable
change. The same applies to balloon compaction users (movable PageOffline()
pages): the pages will simply be migrated.

Note 1: If offlining fails, a driver has to increment the reference
count again in MEM_CANCEL_OFFLINE.

Note 2: A driver that makes use of this has to be aware that re-onlining
the memory block has to be handled by hooking into onlining code
(online_page_callback_t), re-setting PageOffline() on the pages and
not giving them to the buddy.
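
The driver-side obligations from Note 1 and Note 2 can be sketched as a small user-space model. Everything here is hypothetical and only for illustration: `struct drv_page`, `enum model_event`, `model_notifier()` and `model_online_page()` are not kernel interfaces, and the sketch assumes the driver holds exactly one reference per unplugged page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event names mirroring the memory notifier actions. */
enum model_event {
	MODEL_GOING_OFFLINE,
	MODEL_CANCEL_OFFLINE,
};

/* Hypothetical stand-in for struct page. */
struct drv_page {
	bool offline;  /* models PageOffline() */
	bool in_buddy; /* models PageBuddy() */
	int refcount;  /* models page_count() */
};

/* Driver side: adjust the reference of every unplugged page in the block. */
static void model_notifier(struct drv_page *pages, size_t n,
			   enum model_event event)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].offline)
			continue;
		if (event == MODEL_GOING_OFFLINE) {
			assert(pages[i].refcount == 1);
			pages[i].refcount--; /* agree to offlining */
		} else {
			assert(pages[i].refcount == 0);
			pages[i].refcount++; /* offlining failed: undo */
		}
	}
}

/*
 * Driver side, on re-onlining: pages the driver knows to be unplugged are
 * marked offline again and kept out of the buddy; all other pages are
 * released to the buddy as usual.
 */
static void model_online_page(struct drv_page *page, bool unplugged)
{
	if (unplugged) {
		page->offline = true;
		page->refcount = 1;
		page->in_buddy = false;
	} else {
		page->offline = false;
		page->refcount = 0;
		page->in_buddy = true;
	}
}
```

The point of the model is the symmetry: every decrement in the going-offline path is matched by an increment in the cancel path, so a failed offlining attempt leaves the pages exactly as they were.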

Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-7-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Authored by David Hildenbrand and committed by Michael S. Tsirkin
aa218795 255f5985

+77 -10
+10
include/linux/page-flags.h
···
  * not onlined when onlining the section).
  * The content of these pages is effectively stale. Such pages should not
  * be touched (read/write/dump/save) except by their owner.
+ *
+ * If a driver wants to allow to offline unmovable PageOffline() pages without
+ * putting them back to the buddy, it can do so via the memory notifier by
+ * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
+ * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
+ * pages (now with a reference count of zero) are treated like free pages,
+ * allowing the containing memory block to get offlined. A driver that
+ * relies on this feature is aware that re-onlining the memory block will
+ * require to re-set the pages PageOffline() and not giving them to the
+ * buddy via online_page_callback_t.
  */
 PAGE_TYPE_OPS(Offline, offline)

+34 -10
mm/memory_hotplug.c
···

 /*
  * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
- * non-lru movable pages and hugepages). We scan pfn because it's much
- * easier than scanning over linked list. This function returns the pfn
- * of the first found movable page if it's found, otherwise 0.
+ * non-lru movable pages and hugepages). Will skip over most unmovable
+ * pages (esp., pages that can be skipped when offlining), but bail out on
+ * definitely unmovable pages.
+ *
+ * Returns:
+ *	0 in case a movable page is found and movable_pfn was updated.
+ *	-ENOENT in case no movable page was found.
+ *	-EBUSY in case a definitely unmovable page was found.
  */
-static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
+static int scan_movable_pages(unsigned long start, unsigned long end,
+			      unsigned long *movable_pfn)
 {
 	unsigned long pfn;

···
 			continue;
 		page = pfn_to_page(pfn);
 		if (PageLRU(page))
-			return pfn;
+			goto found;
 		if (__PageMovable(page))
-			return pfn;
+			goto found;
+
+		/*
+		 * PageOffline() pages that are not marked __PageMovable() and
+		 * have a reference count > 0 (after MEM_GOING_OFFLINE) are
+		 * definitely unmovable. If their reference count would be 0,
+		 * they could at least be skipped when offlining memory.
+		 */
+		if (PageOffline(page) && page_count(page))
+			return -EBUSY;

 		if (!PageHuge(page))
 			continue;
 		head = compound_head(page);
 		if (page_huge_active(head))
-			return pfn;
+			goto found;
 		skip = compound_nr(head) - (page - head);
 		pfn += skip - 1;
 	}
+	return -ENOENT;
+found:
+	*movable_pfn = pfn;
 	return 0;
 }

···
 	}

 	do {
-		for (pfn = start_pfn; pfn;) {
+		pfn = start_pfn;
+		do {
 			if (signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
···
 			cond_resched();
 			lru_add_drain_all();

-			pfn = scan_movable_pages(pfn, end_pfn);
-			if (pfn) {
+			ret = scan_movable_pages(pfn, end_pfn, &pfn);
+			if (!ret) {
 				/*
 				 * TODO: fatal migration failures should bail
 				 * out
 				 */
 				do_migrate_range(pfn, end_pfn);
 			}
+		} while (!ret);
+
+		if (ret != -ENOENT) {
+			reason = "unmovable page";
+			goto failed_removal_isolated;
 		}

 		/*
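
The new return convention of scan_movable_pages() (0, -ENOENT, -EBUSY) and the caller loop that consumes it can be illustrated with a user-space model. This is a sketch only, not the kernel code: `struct sim_page`, `sim_scan_movable()` and `sim_offline_range()` are hypothetical names, array indices stand in for pfns, and "migration" is reduced to marking the page as free.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct sim_page {
	bool movable;  /* models PageLRU() or __PageMovable() */
	bool offline;  /* models PageOffline() */
	int refcount;  /* models page_count() */
};

/*
 * Returns 0 and stores the index in *found if a movable page exists in
 * [start, end), -EBUSY on a referenced unmovable offline page, and
 * -ENOENT if neither was encountered.
 */
static int sim_scan_movable(const struct sim_page *pages, size_t start,
			    size_t end, size_t *found)
{
	for (size_t i = start; i < end; i++) {
		if (pages[i].movable) {
			*found = i;
			return 0;
		}
		if (pages[i].offline && pages[i].refcount)
			return -EBUSY;
	}
	return -ENOENT;
}

/* Caller loop: "migrate" until nothing movable is left or -EBUSY is hit. */
static int sim_offline_range(struct sim_page *pages, size_t n)
{
	size_t idx = 0;
	int ret;

	do {
		ret = sim_scan_movable(pages, idx, n, &idx);
		if (!ret) {
			/* Stands in for migration: the page becomes free. */
			pages[idx].movable = false;
			pages[idx].offline = false;
			pages[idx].refcount = 0;
		}
	} while (!ret);

	/* -ENOENT means the scan completed; anything else is a failure. */
	return ret == -ENOENT ? 0 : ret;
}
```

Returning the found position through an out parameter frees the return value for error codes, which is what lets the caller tell "nothing left to migrate" apart from "unmovable page".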
+24
mm/page_alloc.c
···
 		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			continue;

+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
 		if (__PageMovable(page) || PageLRU(page))
 			continue;

···
 		 * page_count() is not 0.
 		 */
 		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
+			pfn++;
+			offlined_pages++;
+			continue;
+		}
+		/*
+		 * At this point all remaining PageOffline() pages have a
+		 * reference count of 0 and can simply be skipped.
+		 */
+		if (PageOffline(page)) {
+			BUG_ON(page_count(page));
+			BUG_ON(PageBuddy(page));
 			pfn++;
 			offlined_pages++;
 			continue;
+9
mm/page_isolation.c
···
  *			a bit mask)
  *			MEMORY_OFFLINE - isolate to offline (!allocate) memory
  *					 e.g., skip over PageHWPoison() pages
+ *					 and PageOffline() pages.
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
  *
···
 			pfn += 1 << page_order(page);
 		else if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			/* A HWPoisoned page cannot be also PageBuddy */
+			pfn++;
+		else if ((flags & MEMORY_OFFLINE) && PageOffline(page) &&
+			 !page_count(page))
+			/*
+			 * The responsible driver agreed to skip PageOffline()
+			 * pages when offlining memory by dropping its
+			 * reference in MEM_GOING_OFFLINE.
+			 */
 			pfn++;
 		else
 			break;