Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupt context. They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites:

o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically can now trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Mel Gorman, committed by Linus Torvalds
d0164adc 016c13da

+210 -172
+8 -6
Documentation/vm/balance
@@ -1,12 +1,14 @@
 Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
 
-Memory balancing is needed for non __GFP_WAIT as well as for non
-__GFP_IO allocations.
+Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+well as for non __GFP_IO allocations.
 
-There are two reasons to be requesting non __GFP_WAIT allocations:
-the caller can not sleep (typically intr context), or does not want
-to incur cost overheads of page stealing and possible swap io for
-whatever reasons.
+The first reason why a caller may avoid reclaim is that the caller can not
+sleep due to holding a spinlock or is in interrupt context. The second may
+be that the caller is willing to fail the allocation without incurring the
+overhead of page reclaim. This may happen for opportunistic high-order
+allocation requests that have order-0 fallback options. In such cases,
+the caller may also wish to avoid waking kswapd.
 
 __GFP_IO allocation requests are made to prevent file system deadlocks.
 
+3 -3
arch/arm/mm/dma-mapping.c
@@ -651,12 +651,12 @@
 
 	if (nommu())
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
-	else if (dev_get_cma_area(dev) && (gfp & __GFP_WAIT))
+	else if (dev_get_cma_area(dev) && (gfp & __GFP_DIRECT_RECLAIM))
 		addr = __alloc_from_contiguous(dev, size, prot, &page,
 					caller, want_vaddr);
 	else if (is_coherent)
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
-	else if (!(gfp & __GFP_WAIT))
+	else if (!gfpflags_allow_blocking(gfp))
 		addr = __alloc_from_pool(size, &page);
 	else
 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
@@ -1363,7 +1363,7 @@
 	*handle = DMA_ERROR_CODE;
 	size = PAGE_ALIGN(size);
 
-	if (!(gfp & __GFP_WAIT))
+	if (!gfpflags_allow_blocking(gfp))
 		return __iommu_alloc_atomic(dev, size, handle);
 
 	/*
+1 -1
arch/arm/xen/mm.c
@@ -25,7 +25,7 @@
 unsigned long xen_get_swiotlb_free_pages(unsigned int order)
 {
 	struct memblock_region *reg;
-	gfp_t flags = __GFP_NOWARN;
+	gfp_t flags = __GFP_NOWARN|__GFP_KSWAPD_RECLAIM;
 
 	for_each_memblock(memory, reg) {
 		if (reg->base < (phys_addr_t)0xffffffff) {
+2 -2
arch/arm64/mm/dma-mapping.c
@@ -100,7 +100,7 @@
 	if (IS_ENABLED(CONFIG_ZONE_DMA) &&
 	    dev->coherent_dma_mask <= DMA_BIT_MASK(32))
 		flags |= GFP_DMA;
-	if (dev_get_cma_area(dev) && (flags & __GFP_WAIT)) {
+	if (dev_get_cma_area(dev) && gfpflags_allow_blocking(flags)) {
 		struct page *page;
 		void *addr;
 
@@ -148,7 +148,7 @@
 
 	size = PAGE_ALIGN(size);
 
-	if (!coherent && !(flags & __GFP_WAIT)) {
+	if (!coherent && !gfpflags_allow_blocking(flags)) {
 		struct page *page = NULL;
 		void *addr = __alloc_from_pool(size, &page, flags);
 
+1 -1
arch/x86/kernel/pci-dma.c
@@ -90,7 +90,7 @@
 again:
 	page = NULL;
 	/* CMA can be used only in the context which permits sleeping */
-	if (flag & __GFP_WAIT) {
+	if (gfpflags_allow_blocking(flag)) {
 		page = dma_alloc_from_contiguous(dev, count, get_order(size));
 		if (page && page_to_phys(page) + size > dma_mask) {
 			dma_release_from_contiguous(dev, page, count);
+13 -13
block/bio.c
@@ -211,7 +211,7 @@
 		bvl = mempool_alloc(pool, gfp_mask);
 	} else {
 		struct biovec_slab *bvs = bvec_slabs + *idx;
-		gfp_t __gfp_mask = gfp_mask & ~(__GFP_WAIT | __GFP_IO);
+		gfp_t __gfp_mask = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
 
 		/*
 		 * Make this allocation restricted and don't dump info on
@@ -221,11 +221,11 @@
 		__gfp_mask |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
 
 		/*
-		 * Try a slab allocation. If this fails and __GFP_WAIT
+		 * Try a slab allocation. If this fails and __GFP_DIRECT_RECLAIM
 		 * is set, retry with the 1-entry mempool
 		 */
 		bvl = kmem_cache_alloc(bvs->slab, __gfp_mask);
-		if (unlikely(!bvl && (gfp_mask & __GFP_WAIT))) {
+		if (unlikely(!bvl && (gfp_mask & __GFP_DIRECT_RECLAIM))) {
 			*idx = BIOVEC_MAX_IDX;
 			goto fallback;
 		}
@@ -395,12 +395,12 @@
  * If @bs is NULL, uses kmalloc() to allocate the bio; else the allocation is
  * backed by the @bs's mempool.
  *
- * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
- * able to allocate a bio. This is due to the mempool guarantees. To make this
- * work, callers must never allocate more than 1 bio at a time from this pool.
- * Callers that need to allocate more than 1 bio must always submit the
- * previously allocated bio for IO before attempting to allocate a new one.
- * Failure to do so can cause deadlocks under memory pressure.
+ * When @bs is not NULL, if %__GFP_DIRECT_RECLAIM is set then bio_alloc will
+ * always be able to allocate a bio. This is due to the mempool guarantees.
+ * To make this work, callers must never allocate more than 1 bio at a time
+ * from this pool. Callers that need to allocate more than 1 bio must always
+ * submit the previously allocated bio for IO before attempting to allocate
+ * a new one. Failure to do so can cause deadlocks under memory pressure.
 *
 * Note that when running under generic_make_request() (i.e. any block
 * driver), bios are not submitted until after you return - see the code in
@@ -459,13 +459,13 @@
 	 * We solve this, and guarantee forward progress, with a rescuer
 	 * workqueue per bio_set. If we go to allocate and there are
 	 * bios on current->bio_list, we first try the allocation
-	 * without __GFP_WAIT; if that fails, we punt those bios we
-	 * would be blocking to the rescuer workqueue before we retry
-	 * with the original gfp_flags.
+	 * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
+	 * bios we would be blocking to the rescuer workqueue before
+	 * we retry with the original gfp_flags.
 	 */
 
 	if (current->bio_list && !bio_list_empty(current->bio_list))
-		gfp_mask &= ~__GFP_WAIT;
+		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
 
 	p = mempool_alloc(bs->bio_pool, gfp_mask);
 	if (!p && gfp_mask != saved_gfp) {
+8 -8
block/blk-core.c
@@ -1206,8 +1206,8 @@
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
- * Get a free request from @q. If %__GFP_WAIT is set in @gfp_mask, this
- * function keeps retrying under memory pressure and fails iff @q is dead.
+ * Get a free request from @q. If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
+ * this function keeps retrying under memory pressure and fails iff @q is dead.
 *
 * Must be called with @q->queue_lock held and,
 * Returns ERR_PTR on failure, with @q->queue_lock held.
@@ -1227,7 +1227,7 @@
 	if (!IS_ERR(rq))
 		return rq;
 
-	if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) {
+	if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
 		blk_put_rl(rl);
 		return rq;
 	}
@@ -1305,11 +1305,11 @@
  * BUG.
  *
  * WARNING: When allocating/cloning a bio-chain, careful consideration should be
- * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for
- * anything but the first bio in the chain. Otherwise you risk waiting for IO
- * completion of a bio that hasn't been submitted yet, thus resulting in a
- * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead
- * of bio_alloc(), as that avoids the mempool deadlock.
+ * given to how you allocate bios. In particular, you cannot use
+ * __GFP_DIRECT_RECLAIM for anything but the first bio in the chain. Otherwise
+ * you risk waiting for IO completion of a bio that hasn't been submitted yet,
+ * thus resulting in a deadlock. Alternatively bios should be allocated using
+ * bio_kmalloc() instead of bio_alloc(), as that avoids the mempool deadlock.
 * If possible a big IO should be split into smaller parts when allocation
 * fails. Partial allocation should not be an error, or you risk a live-lock.
 */
+1 -1
block/blk-ioc.c
@@ -289,7 +289,7 @@
 {
 	struct io_context *ioc;
 
-	might_sleep_if(gfp_flags & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(gfp_flags));
 
 	do {
 		task_lock(task);
+1 -1
block/blk-mq-tag.c
@@ -268,7 +268,7 @@
 	if (tag != -1)
 		return tag;
 
-	if (!(data->gfp & __GFP_WAIT))
+	if (!gfpflags_allow_blocking(data->gfp))
 		return -1;
 
 	bs = bt_wait_ptr(bt, hctx);
+3 -3
block/blk-mq.c
@@ -244,11 +244,11 @@
 
 	ctx = blk_mq_get_ctx(q);
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
-	blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_WAIT,
+	blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_DIRECT_RECLAIM,
 			reserved, ctx, hctx);
 
 	rq = __blk_mq_alloc_request(&alloc_data, rw);
-	if (!rq && (gfp & __GFP_WAIT)) {
+	if (!rq && (gfp & __GFP_DIRECT_RECLAIM)) {
 		__blk_mq_run_hw_queue(hctx);
 		blk_mq_put_ctx(ctx);
 
@@ -1186,7 +1186,7 @@
 	ctx = blk_mq_get_ctx(q);
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 	blk_mq_set_alloc_data(&alloc_data, q,
-			__GFP_WAIT|GFP_ATOMIC, false, ctx, hctx);
+			__GFP_WAIT|__GFP_HIGH, false, ctx, hctx);
 	rq = __blk_mq_alloc_request(&alloc_data, rw);
 	ctx = alloc_data.ctx;
 	hctx = alloc_data.hctx;
+2 -1
drivers/block/drbd/drbd_receiver.c
@@ -357,7 +357,8 @@
 	}
 
 	if (has_payload && data_size) {
-		page = drbd_alloc_pages(peer_device, nr_pages, (gfp_mask & __GFP_WAIT));
+		page = drbd_alloc_pages(peer_device, nr_pages,
+					gfpflags_allow_blocking(gfp_mask));
 		if (!page)
 			goto fail;
 	}
+1 -1
drivers/block/osdblk.c
@@ -271,7 +271,7 @@
 		goto err_out;
 
 	tmp->bi_bdev = NULL;
-	gfpmask &= ~__GFP_WAIT;
+	gfpmask &= ~__GFP_DIRECT_RECLAIM;
 	tmp->bi_next = NULL;
 
 	if (!new_chain)
+2 -1
drivers/connector/connector.c
@@ -124,7 +124,8 @@
 	if (group)
 		return netlink_broadcast(dev->nls, skb, portid, group,
 					 gfp_mask);
-	return netlink_unicast(dev->nls, skb, portid, !(gfp_mask&__GFP_WAIT));
+	return netlink_unicast(dev->nls, skb, portid,
+			       !gfpflags_allow_blocking(gfp_mask));
 }
 EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
 
+1 -1
drivers/firewire/core-cdev.c
@@ -486,7 +486,7 @@
 static int add_client_resource(struct client *client,
 			       struct client_resource *resource, gfp_t gfp_mask)
 {
-	bool preload = !!(gfp_mask & __GFP_WAIT);
+	bool preload = gfpflags_allow_blocking(gfp_mask);
 	unsigned long flags;
 	int ret;
 
+1 -1
drivers/gpu/drm/i915/i915_gem.c
@@ -2215,7 +2215,7 @@
 	 */
 	mapping = file_inode(obj->base.filp)->i_mapping;
 	gfp = mapping_gfp_mask(mapping);
-	gfp |= __GFP_NORETRY | __GFP_NOWARN | __GFP_NO_KSWAPD;
+	gfp |= __GFP_NORETRY | __GFP_NOWARN;
 	gfp &= ~(__GFP_IO | __GFP_WAIT);
 	sg = st->sgl;
 	st->nents = 0;
+1 -1
drivers/infiniband/core/sa_query.c
@@ -1083,7 +1083,7 @@
 
 static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 {
-	bool preload = !!(gfp_mask & __GFP_WAIT);
+	bool preload = gfpflags_allow_blocking(gfp_mask);
 	unsigned long flags;
 	int ret, id;
 
+1 -1
drivers/iommu/amd_iommu.c
@@ -2668,7 +2668,7 @@
 
 	page = alloc_pages(flag | __GFP_NOWARN, get_order(size));
 	if (!page) {
-		if (!(flag & __GFP_WAIT))
+		if (!gfpflags_allow_blocking(flag))
 			return NULL;
 
 		page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
+1 -1
drivers/iommu/intel-iommu.c
@@ -3647,7 +3647,7 @@
 		flags |= GFP_DMA32;
 	}
 
-	if (flags & __GFP_WAIT) {
+	if (gfpflags_allow_blocking(flags)) {
 		unsigned int count = size >> PAGE_SHIFT;
 
 		page = dma_alloc_from_contiguous(dev, count, order);
+3 -3
drivers/md/dm-crypt.c
@@ -994,7 +994,7 @@
 	struct bio_vec *bvec;
 
 retry:
-	if (unlikely(gfp_mask & __GFP_WAIT))
+	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
 		mutex_lock(&cc->bio_alloc_lock);
 
 	clone = bio_alloc_bioset(GFP_NOIO, nr_iovecs, cc->bs);
@@ -1010,7 +1010,7 @@
 		if (!page) {
 			crypt_free_buffer_pages(cc, clone);
 			bio_put(clone);
-			gfp_mask |= __GFP_WAIT;
+			gfp_mask |= __GFP_DIRECT_RECLAIM;
 			goto retry;
 		}
 
@@ -1027,7 +1027,7 @@
 	}
 
 return_clone:
-	if (unlikely(gfp_mask & __GFP_WAIT))
+	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
 		mutex_unlock(&cc->bio_alloc_lock);
 
 	return clone;
+1 -1
drivers/md/dm-kcopyd.c
@@ -244,7 +244,7 @@
 	*pages = NULL;
 
 	do {
-		pl = alloc_pl(__GFP_NOWARN | __GFP_NORETRY);
+		pl = alloc_pl(__GFP_NOWARN | __GFP_NORETRY | __GFP_KSWAPD_RECLAIM);
 		if (unlikely(!pl)) {
 			/* Use reserved pages */
 			pl = kc->pages;
+1 -1
drivers/media/pci/solo6x10/solo6x10-v4l2-enc.c
@@ -1297,7 +1297,7 @@
 	solo_enc->vidq.ops = &solo_enc_video_qops;
 	solo_enc->vidq.mem_ops = &vb2_dma_sg_memops;
 	solo_enc->vidq.drv_priv = solo_enc;
-	solo_enc->vidq.gfp_flags = __GFP_DMA32;
+	solo_enc->vidq.gfp_flags = __GFP_DMA32 | __GFP_KSWAPD_RECLAIM;
 	solo_enc->vidq.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
 	solo_enc->vidq.buf_struct_size = sizeof(struct solo_vb2_buf);
 	solo_enc->vidq.lock = &solo_enc->lock;
+1 -1
drivers/media/pci/solo6x10/solo6x10-v4l2.c
@@ -678,7 +678,7 @@
 	solo_dev->vidq.mem_ops = &vb2_dma_contig_memops;
 	solo_dev->vidq.drv_priv = solo_dev;
 	solo_dev->vidq.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
-	solo_dev->vidq.gfp_flags = __GFP_DMA32;
+	solo_dev->vidq.gfp_flags = __GFP_DMA32 | __GFP_KSWAPD_RECLAIM;
 	solo_dev->vidq.buf_struct_size = sizeof(struct solo_vb2_buf);
 	solo_dev->vidq.lock = &solo_dev->lock;
 	ret = vb2_queue_init(&solo_dev->vidq);
+1 -1
drivers/media/pci/tw68/tw68-video.c
@@ -979,7 +979,7 @@
 	dev->vidq.ops = &tw68_video_qops;
 	dev->vidq.mem_ops = &vb2_dma_sg_memops;
 	dev->vidq.drv_priv = dev;
-	dev->vidq.gfp_flags = __GFP_DMA32;
+	dev->vidq.gfp_flags = __GFP_DMA32 | __GFP_KSWAPD_RECLAIM;
 	dev->vidq.buf_struct_size = sizeof(struct tw68_buf);
 	dev->vidq.lock = &dev->lock;
 	dev->vidq.min_buffers_needed = 2;
+1 -2
drivers/mtd/mtdcore.c
@@ -1188,8 +1188,7 @@
  */
 void *mtd_kmalloc_up_to(const struct mtd_info *mtd, size_t *size)
 {
-	gfp_t flags = __GFP_NOWARN | __GFP_WAIT |
-		       __GFP_NORETRY | __GFP_NO_KSWAPD;
+	gfp_t flags = __GFP_NOWARN | __GFP_DIRECT_RECLAIM | __GFP_NORETRY;
 	size_t min_alloc = max_t(size_t, mtd->writesize, PAGE_SIZE);
 	void *kbuf;
 
+1 -1
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -691,7 +691,7 @@
 {
 	if (fp->rx_frag_size) {
 		/* GFP_KERNEL allocations are used only during initialization */
-		if (unlikely(gfp_mask & __GFP_WAIT))
+		if (unlikely(gfpflags_allow_blocking(gfp_mask)))
 			return (void *)__get_free_page(gfp_mask);
 
 		return netdev_alloc_frag(fp->rx_frag_size);
+1 -1
drivers/staging/android/ion/ion_system_heap.c
@@ -27,7 +27,7 @@
 #include "ion_priv.h"
 
 static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN |
-				     __GFP_NORETRY) & ~__GFP_WAIT;
+				     __GFP_NORETRY) & ~__GFP_DIRECT_RECLAIM;
 static gfp_t low_order_gfp_flags  = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN);
 static const unsigned int orders[] = {8, 4, 0};
 static const int num_orders = ARRAY_SIZE(orders);
+1 -1
drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
@@ -95,7 +95,7 @@
 do {								\
 	LASSERT(!in_interrupt() ||				\
 		((size) <= LIBCFS_VMALLOC_SIZE &&		\
-		 ((mask) & __GFP_WAIT) == 0));			\
+		 !gfpflags_allow_blocking(mask)));		\
 } while (0)
 
 #define LIBCFS_ALLOC_POST(ptr, size)				\
+1 -1
drivers/usb/host/u132-hcd.c
@@ -2244,7 +2244,7 @@
 {
 	struct u132 *u132 = hcd_to_u132(hcd);
 	if (irqs_disabled()) {
-		if (__GFP_WAIT & mem_flags) {
+		if (gfpflags_allow_blocking(mem_flags)) {
 			printk(KERN_ERR "invalid context for function that might sleep\n");
 			return -EINVAL;
 		}
+1 -1
drivers/video/fbdev/vermilion/vermilion.c
@@ -99,7 +99,7 @@
 		 * below the first 16MB.
 		 */
 
-		flags = __GFP_DMA | __GFP_HIGH;
+		flags = __GFP_DMA | __GFP_HIGH | __GFP_KSWAPD_RECLAIM;
 		va->logical =
 			 __get_free_pages(flags, --max_order);
 	} while (va->logical == 0 && max_order > min_order);
+1 -1
fs/btrfs/disk-io.c
@@ -2572,7 +2572,7 @@
 	fs_info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL;
 	fs_info->avg_delayed_ref_runtime = NSEC_PER_SEC >> 6; /* div by 64 */
 	/* readahead state */
-	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
+	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	spin_lock_init(&fs_info->reada_lock);
 
 	fs_info->thread_pool_size = min_t(unsigned long,
+7 -7
fs/btrfs/extent_io.c
@@ -594,7 +594,7 @@
 	if (bits & (EXTENT_IOBITS | EXTENT_BOUNDARY))
 		clear = 1;
 again:
-	if (!prealloc && (mask & __GFP_WAIT)) {
+	if (!prealloc && gfpflags_allow_blocking(mask)) {
 		/*
 		 * Don't care for allocation failure here because we might end
 		 * up not needing the pre-allocated extent state at all, which
@@ -718,7 +718,7 @@
 	if (start > end)
 		goto out;
 	spin_unlock(&tree->lock);
-	if (mask & __GFP_WAIT)
+	if (gfpflags_allow_blocking(mask))
 		cond_resched();
 	goto again;
 }
@@ -850,7 +850,7 @@
 
 	bits |= EXTENT_FIRST_DELALLOC;
 again:
-	if (!prealloc && (mask & __GFP_WAIT)) {
+	if (!prealloc && gfpflags_allow_blocking(mask)) {
 		prealloc = alloc_extent_state(mask);
 		BUG_ON(!prealloc);
 	}
@@ -1028,7 +1028,7 @@
 	if (start > end)
 		goto out;
 	spin_unlock(&tree->lock);
-	if (mask & __GFP_WAIT)
+	if (gfpflags_allow_blocking(mask))
 		cond_resched();
 	goto again;
 }
@@ -1076,7 +1076,7 @@
 	btrfs_debug_check_extent_io_range(tree, start, end);
 
 again:
-	if (!prealloc && (mask & __GFP_WAIT)) {
+	if (!prealloc && gfpflags_allow_blocking(mask)) {
 		/*
 		 * Best effort, don't worry if extent state allocation fails
 		 * here for the first iteration. We might have a cached state
@@ -1253,7 +1253,7 @@
 	if (start > end)
 		goto out;
 	spin_unlock(&tree->lock);
-	if (mask & __GFP_WAIT)
+	if (gfpflags_allow_blocking(mask))
 		cond_resched();
 	first_iteration = false;
 	goto again;
@@ -4319,7 +4319,7 @@
 	u64 start = page_offset(page);
 	u64 end = start + PAGE_CACHE_SIZE - 1;
 
-	if ((mask & __GFP_WAIT) &&
+	if (gfpflags_allow_blocking(mask) &&
 	    page->mapping->host->i_size > 16 * 1024 * 1024) {
 		u64 len;
 		while (start <= end) {
+2 -2
fs/btrfs/volumes.c
@@ -156,8 +156,8 @@
 	spin_lock_init(&dev->reada_lock);
 	atomic_set(&dev->reada_in_flight, 0);
 	atomic_set(&dev->dev_stats_ccnt, 0);
-	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_WAIT);
-	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_WAIT);
+	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
+	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 
 	return dev;
 }
+1 -1
fs/ext4/super.c
@@ -1058,7 +1058,7 @@
 		return 0;
 	if (journal)
 		return jbd2_journal_try_to_free_buffers(journal, page,
-							wait & ~__GFP_WAIT);
+						wait & ~__GFP_DIRECT_RECLAIM);
 	return try_to_free_buffers(page);
 }
 
+1 -1
fs/fscache/cookie.c
@@ -111,7 +111,7 @@
 
 	/* radix tree insertion won't use the preallocation pool unless it's
 	 * told it may not wait */
-	INIT_RADIX_TREE(&cookie->stores, GFP_NOFS & ~__GFP_WAIT);
+	INIT_RADIX_TREE(&cookie->stores, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 
 	switch (cookie->def->type) {
 	case FSCACHE_COOKIE_TYPE_INDEX:
+3 -3
fs/fscache/page.c
@@ -58,7 +58,7 @@
 
 /*
  * decide whether a page can be released, possibly by cancelling a store to it
- * - we're allowed to sleep if __GFP_WAIT is flagged
+ * - we're allowed to sleep if __GFP_DIRECT_RECLAIM is flagged
 */
 bool __fscache_maybe_release_page(struct fscache_cookie *cookie,
 				  struct page *page,
@@ -122,7 +122,7 @@
 	 * allocator as the work threads writing to the cache may all end up
 	 * sleeping on memory allocation, so we may need to impose a timeout
 	 * too. */
-	if (!(gfp & __GFP_WAIT) || !(gfp & __GFP_FS)) {
+	if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
 		fscache_stat(&fscache_n_store_vmscan_busy);
 		return false;
 	}
@@ -132,7 +132,7 @@
 	_debug("fscache writeout timeout page: %p{%lx}",
 	       page, page->index);
 
-	gfp &= ~__GFP_WAIT;
+	gfp &= ~__GFP_DIRECT_RECLAIM;
 	goto try_again;
 }
 EXPORT_SYMBOL(__fscache_maybe_release_page);
+2 -2
fs/jbd2/transaction.c
@@ -1937,8 +1937,8 @@
 * @journal: journal for operation
 * @page: to try and free
 * @gfp_mask: we use the mask to detect how hard should we try to release
-* buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to
-* release the buffers.
+* buffers. If __GFP_DIRECT_RECLAIM and __GFP_FS is set, we wait for commit
+* code to release the buffers.
 *
 *
 * For all the buffers on this page,
+3 -3
fs/nfs/file.c
@@ -473,8 +473,8 @@
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
 	/* Always try to initiate a 'commit' if relevant, but only
-	 * wait for it if __GFP_WAIT is set. Even then, only wait 1
-	 * second and only if the 'bdi' is not congested.
+	 * wait for it if the caller allows blocking. Even then,
+	 * only wait 1 second and only if the 'bdi' is not congested.
 	 * Waiting indefinitely can cause deadlocks when the NFS
 	 * server is on this machine, when a new TCP connection is
 	 * needed and in other rare cases. There is no particular
@@ -484,7 +484,7 @@
 	if (mapping) {
 		struct nfs_server *nfss = NFS_SERVER(mapping->host);
 		nfs_commit_inode(mapping->host, 0);
-		if ((gfp & __GFP_WAIT) &&
+		if (gfpflags_allow_blocking(gfp) &&
 		    !bdi_write_congested(&nfss->backing_dev_info)) {
 			wait_on_page_bit_killable_timeout(page, PG_private,
 							  HZ);
+1 -1
fs/xfs/xfs_qm.c
@@ -525,7 +525,7 @@
 	unsigned long freed;
 	int error;
 
-	if ((sc->gfp_mask & (__GFP_FS|__GFP_WAIT)) != (__GFP_FS|__GFP_WAIT))
+	if ((sc->gfp_mask & (__GFP_FS|__GFP_DIRECT_RECLAIM)) != (__GFP_FS|__GFP_DIRECT_RECLAIM))
 		return 0;
 
 	INIT_LIST_HEAD(&isol.buffers);
+33 -13
include/linux/gfp.h
@@ -29,12 +29,13 @@
 #define ___GFP_NOMEMALLOC	0x10000u
 #define ___GFP_HARDWALL		0x20000u
 #define ___GFP_THISNODE		0x40000u
-#define ___GFP_WAIT		0x80000u
+#define ___GFP_ATOMIC		0x80000u
 #define ___GFP_NOACCOUNT	0x100000u
 #define ___GFP_NOTRACK		0x200000u
-#define ___GFP_NO_KSWAPD	0x400000u
+#define ___GFP_DIRECT_RECLAIM	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+#define ___GFP_KSWAPD_RECLAIM	0x2000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -72,7 +71,7 @@
  * __GFP_MOVABLE: Flag that this page will be movable by the page migration
  * mechanism or reclaimed
 */
-#define __GFP_WAIT	((__force gfp_t)___GFP_WAIT)	/* Can wait and reschedule? */
+#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)	/* Caller cannot wait or reschedule */
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)	/* Should access emergency pools? */
 #define __GFP_IO	((__force gfp_t)___GFP_IO)	/* Can start physical IO? */
 #define __GFP_FS	((__force gfp_t)___GFP_FS)	/* Can call down to low-level FS? */
@@ -95,9 +94,21 @@
 #define __GFP_NOACCOUNT	((__force gfp_t)___GFP_NOACCOUNT) /* Don't account to kmemcg */
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)  /* Don't track with kmemcheck */
 
-#define __GFP_NO_KSWAPD	((__force gfp_t)___GFP_NO_KSWAPD)
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)	/* Allocator intends to dirty page */
+
+/*
+ * A caller that is willing to wait may enter direct reclaim and will
+ * wake kswapd to reclaim pages in the background until the high
+ * watermark is met. A caller may wish to clear __GFP_DIRECT_RECLAIM to
+ * avoid unnecessary delays when a fallback option is available but
+ * still allow kswapd to reclaim in the background. The kswapd flag
+ * can be cleared when the reclaiming of pages would cause unnecessary
+ * disruption.
+ */
+#define __GFP_WAIT ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM))
+#define __GFP_DIRECT_RECLAIM	((__force gfp_t)___GFP_DIRECT_RECLAIM) /* Caller can reclaim */
+#define __GFP_KSWAPD_RECLAIM	((__force gfp_t)___GFP_KSWAPD_RECLAIM) /* kswapd can wake */
 
 /*
  * This may seem redundant, but it's a way of annotating false positives vs.
@@ -117,13 +104,15 @@
 */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 26	/* Room for N __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
-/* This equals 0, but use constants in case they ever change */
-#define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)
-/* GFP_ATOMIC means both !wait (__GFP_WAIT not set) and use emergency pool */
-#define GFP_ATOMIC	(__GFP_HIGH)
+/*
+ * GFP_ATOMIC callers can not sleep, need the allocation to succeed.
+ * A lower watermark is applied to allow access to "atomic reserves"
+ */
+#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
+#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
 #define GFP_NOIO	(__GFP_WAIT)
 #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
 #define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
@@ -134,10 +119,10 @@
 #define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_IOFS	(__GFP_IO | __GFP_FS)
-#define GFP_TRANSHUGE	(GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
-			 __GFP_NO_KSWAPD)
+#define GFP_IOFS	(__GFP_IO | __GFP_FS | __GFP_KSWAPD_RECLAIM)
+#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+			  __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
+			 ~__GFP_KSWAPD_RECLAIM)
 
 /* This mask makes up all the page movable related flags */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
@@ -177,6 +162,11 @@
 
 	/* Group based on mobility */
 	return (gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
+}
+
+static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
+{
+	return gfp_flags & __GFP_DIRECT_RECLAIM;
 }
 
 #ifdef CONFIG_HIGHMEM
+3 -3
include/linux/skbuff.h
@@ -1224,7 +1224,7 @@
 
 static inline int skb_unclone(struct sk_buff *skb, gfp_t pri)
 {
-	might_sleep_if(pri & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(pri));
 
 	if (skb_cloned(skb))
 		return pskb_expand_head(skb, 0, 0, pri);
@@ -1308,7 +1308,7 @@
 */
 static inline struct sk_buff *skb_share_check(struct sk_buff *skb, gfp_t pri)
 {
-	might_sleep_if(pri & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(pri));
 	if (skb_shared(skb)) {
 		struct sk_buff *nskb = skb_clone(skb, pri);
 
@@ -1344,7 +1344,7 @@
 static inline struct sk_buff *skb_unshare(struct sk_buff *skb,
 					  gfp_t pri)
 {
-	might_sleep_if(pri & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(pri));
 	if (skb_cloned(skb)) {
 		struct sk_buff *nskb = skb_copy(skb, pri);
 
+1 -1
include/net/sock.h
@@ -2041,7 +2041,7 @@
 */
 static inline struct page_frag *sk_page_frag(struct sock *sk)
 {
-	if (sk->sk_allocation & __GFP_WAIT)
+	if (gfpflags_allow_blocking(sk->sk_allocation))
 		return &current->task_frag;
 
 	return &sk->sk_frag;
+3 -2
include/trace/events/gfpflags.h
···
 	{(unsigned long)GFP_ATOMIC,		"GFP_ATOMIC"},		\
 	{(unsigned long)GFP_NOIO,		"GFP_NOIO"},		\
 	{(unsigned long)__GFP_HIGH,		"GFP_HIGH"},		\
-	{(unsigned long)__GFP_WAIT,		"GFP_WAIT"},		\
+	{(unsigned long)__GFP_ATOMIC,		"GFP_ATOMIC"},		\
 	{(unsigned long)__GFP_IO,		"GFP_IO"},		\
 	{(unsigned long)__GFP_COLD,		"GFP_COLD"},		\
 	{(unsigned long)__GFP_NOWARN,		"GFP_NOWARN"},		\
···
 	{(unsigned long)__GFP_RECLAIMABLE,	"GFP_RECLAIMABLE"},	\
 	{(unsigned long)__GFP_MOVABLE,		"GFP_MOVABLE"},		\
 	{(unsigned long)__GFP_NOTRACK,		"GFP_NOTRACK"},		\
-	{(unsigned long)__GFP_NO_KSWAPD,	"GFP_NO_KSWAPD"},	\
+	{(unsigned long)__GFP_DIRECT_RECLAIM,	"GFP_DIRECT_RECLAIM"},	\
+	{(unsigned long)__GFP_KSWAPD_RECLAIM,	"GFP_KSWAPD_RECLAIM"},	\
 	{(unsigned long)__GFP_OTHER_NODE,	"GFP_OTHER_NODE"}	\
 	) : "GFP_NOWAIT"
 
+3 -3
kernel/audit.c
···
 	if (unlikely(audit_filter_type(type)))
 		return NULL;
 
-	if (gfp_mask & __GFP_WAIT) {
+	if (gfp_mask & __GFP_DIRECT_RECLAIM) {
 		if (audit_pid && audit_pid == current->pid)
-			gfp_mask &= ~__GFP_WAIT;
+			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
 		else
 			reserve = 0;
 	}
 
 	while (audit_backlog_limit
 	       && skb_queue_len(&audit_skb_queue) > audit_backlog_limit + reserve) {
-		if (gfp_mask & __GFP_WAIT && audit_backlog_wait_time) {
+		if (gfp_mask & __GFP_DIRECT_RECLAIM && audit_backlog_wait_time) {
 			long sleep_time;
 
 			sleep_time = timeout_start + audit_backlog_wait_time - jiffies;
+1 -1
kernel/cgroup.c
···
 
 	idr_preload(gfp_mask);
 	spin_lock_bh(&cgroup_idr_lock);
-	ret = idr_alloc(idr, ptr, start, end, gfp_mask & ~__GFP_WAIT);
+	ret = idr_alloc(idr, ptr, start, end, gfp_mask & ~__GFP_DIRECT_RECLAIM);
 	spin_unlock_bh(&cgroup_idr_lock);
 	idr_preload_end();
 	return ret;
+1 -1
kernel/locking/lockdep.c
···
 		return;
 
 	/* no reclaim without waiting on it */
-	if (!(gfp_mask & __GFP_WAIT))
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
 		return;
 
 	/* this guy won't enter reclaim */
+1 -1
kernel/power/snapshot.c
···
 	while (to_alloc-- > 0) {
 		struct page *page;
 
-		page = alloc_image_page(__GFP_HIGHMEM);
+		page = alloc_image_page(__GFP_HIGHMEM|__GFP_KSWAPD_RECLAIM);
 		memory_bm_set_bit(bm, page_to_pfn(page));
 	}
 	return nr_highmem;
+1 -1
kernel/smp.c
···
 	cpumask_var_t cpus;
 	int cpu, ret;
 
-	might_sleep_if(gfp_flags & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(gfp_flags));
 
 	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
 		preempt_disable();
+2 -2
lib/idr.c
···
 	 * allocation guarantee.  Disallow usage from those contexts.
 	 */
 	WARN_ON_ONCE(in_interrupt());
-	might_sleep_if(gfp_mask & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
 
 	preempt_disable();
 
···
 	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
 	int id;
 
-	might_sleep_if(gfp_mask & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
 
 	/* sanity checks */
 	if (WARN_ON_ONCE(start < 0))
+5 -5
lib/radix-tree.c
···
 	 * preloading in the interrupt anyway as all the allocations have to
 	 * be atomic. So just do normal allocation when in interrupt.
 	 */
-	if (!(gfp_mask & __GFP_WAIT) && !in_interrupt()) {
+	if (!gfpflags_allow_blocking(gfp_mask) && !in_interrupt()) {
 		struct radix_tree_preload *rtp;
 
 		/*
···
 * with preemption not disabled.
 *
 * To make use of this facility, the radix tree must be initialised without
- * __GFP_WAIT being passed to INIT_RADIX_TREE().
+ * __GFP_DIRECT_RECLAIM being passed to INIT_RADIX_TREE().
 */
 static int __radix_tree_preload(gfp_t gfp_mask)
 {
···
 * with preemption not disabled.
 *
 * To make use of this facility, the radix tree must be initialised without
- * __GFP_WAIT being passed to INIT_RADIX_TREE().
+ * __GFP_DIRECT_RECLAIM being passed to INIT_RADIX_TREE().
 */
 int radix_tree_preload(gfp_t gfp_mask)
 {
 	/* Warn on non-sensical use... */
-	WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT));
+	WARN_ON_ONCE(!gfpflags_allow_blocking(gfp_mask));
 	return __radix_tree_preload(gfp_mask);
 }
 EXPORT_SYMBOL(radix_tree_preload);
···
 */
 int radix_tree_maybe_preload(gfp_t gfp_mask)
 {
-	if (gfp_mask & __GFP_WAIT)
+	if (gfpflags_allow_blocking(gfp_mask))
 		return __radix_tree_preload(gfp_mask);
 	/* Preloading doesn't help anything with this gfp mask, skip it */
 	preempt_disable();
+1 -1
mm/backing-dev.c
···
 {
 	struct bdi_writeback *wb;
 
-	might_sleep_if(gfp & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(gfp));
 
 	if (!memcg_css->parent)
 		return &bdi->wb;
+1 -1
mm/dmapool.c
···
 	size_t offset;
 	void *retval;
 
-	might_sleep_if(mem_flags & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(mem_flags));
 
 	spin_lock_irqsave(&pool->lock, flags);
 	list_for_each_entry(page, &pool->page_list, page_list) {
+3 -3
mm/memcontrol.c
···
 	if (unlikely(task_in_memcg_oom(current)))
 		goto nomem;
 
-	if (!(gfp_mask & __GFP_WAIT))
+	if (!gfpflags_allow_blocking(gfp_mask))
 		goto nomem;
 
 	mem_cgroup_events(mem_over_limit, MEMCG_MAX, 1);
···
 {
 	int ret;
 
-	/* Try a single bulk charge without reclaim first */
-	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_WAIT, count);
+	/* Try a single bulk charge without reclaim first, kswapd may wake */
+	ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
 	if (!ret) {
 		mc.precharge += count;
 		return ret;
+5 -5
mm/mempool.c
···
 	gfp_t gfp_temp;
 
 	VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
-	might_sleep_if(gfp_mask & __GFP_WAIT);
+	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
 	gfp_mask |= __GFP_NOMEMALLOC;	/* don't allocate emergency reserves */
 	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 
-	gfp_temp = gfp_mask & ~(__GFP_WAIT|__GFP_IO);
+	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
 repeat_alloc:
 
···
 	}
 
 	/*
-	 * We use gfp mask w/o __GFP_WAIT or IO for the first round.  If
+	 * We use gfp mask w/o direct reclaim or IO for the first round.  If
 	 * alloc failed with that and @pool was empty, retry immediately.
 	 */
 	if (gfp_temp != gfp_mask) {
···
 		goto repeat_alloc;
 	}
 
-	/* We must not sleep if !__GFP_WAIT */
-	if (!(gfp_mask & __GFP_WAIT)) {
+	/* We must not sleep if !__GFP_DIRECT_RECLAIM */
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		return NULL;
 	}
+1 -1
mm/migrate.c
···
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
-					 ~GFP_IOFS, 0);
+					 ~(__GFP_IO | __GFP_FS), 0);
 
 	return newpage;
 }
+27 -16
mm/page_alloc.c
···
 	WARN_ON(!mutex_is_locked(&pm_mutex));
 	WARN_ON(saved_gfp_mask);
 	saved_gfp_mask = gfp_allowed_mask;
-	gfp_allowed_mask &= ~GFP_IOFS;
+	gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);
 }
 
 bool pm_suspended_storage(void)
 {
-	if ((gfp_allowed_mask & GFP_IOFS) == GFP_IOFS)
+	if ((gfp_allowed_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
 		return false;
 	return true;
 }
···
 		return false;
 	if (fail_page_alloc.ignore_gfp_highmem && (gfp_mask & __GFP_HIGHMEM))
 		return false;
-	if (fail_page_alloc.ignore_gfp_wait && (gfp_mask & __GFP_WAIT))
+	if (fail_page_alloc.ignore_gfp_wait && (gfp_mask & __GFP_DIRECT_RECLAIM))
 		return false;
 
 	return should_fail(&fail_page_alloc.attr, 1 << order);
···
 	if (test_thread_flag(TIF_MEMDIE) ||
 	    (current->flags & (PF_MEMALLOC | PF_EXITING)))
 		filter &= ~SHOW_MEM_FILTER_NODES;
-	if (in_interrupt() || !(gfp_mask & __GFP_WAIT))
+	if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM))
 		filter &= ~SHOW_MEM_FILTER_NODES;
 
 	if (fmt) {
···
 gfp_to_alloc_flags(gfp_t gfp_mask)
 {
 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
-	const bool atomic = !(gfp_mask & (__GFP_WAIT | __GFP_NO_KSWAPD));
 
 	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
 	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
···
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (atomic == true) and ALLOC_HIGH (__GFP_HIGH).
+	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
 
-	if (atomic) {
+	if (gfp_mask & __GFP_ATOMIC) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
···
 	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
 }
 
+static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
+{
+	return (gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)) == GFP_TRANSHUGE;
+}
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
-	const gfp_t wait = gfp_mask & __GFP_WAIT;
+	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
 	struct page *page = NULL;
 	int alloc_flags;
 	unsigned long pages_reclaimed = 0;
···
 	}
 
 	/*
+	 * We also sanity check to catch abuse of atomic reserves being used by
+	 * callers that are not in atomic context.
+	 */
+	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
+				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
+		gfp_mask &= ~__GFP_ATOMIC;
+
+	/*
 	 * If this allocation cannot block and it is for a specific node, then
 	 * fail early. There's no need to wakeup kswapd or retry for a
 	 * speculative node-specific allocation.
 	 */
-	if (IS_ENABLED(CONFIG_NUMA) && (gfp_mask & __GFP_THISNODE) && !wait)
+	if (IS_ENABLED(CONFIG_NUMA) && (gfp_mask & __GFP_THISNODE) && !can_direct_reclaim)
 		goto nopage;
 
 retry:
-	if (!(gfp_mask & __GFP_NO_KSWAPD))
+	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 
 	/*
···
 		}
 	}
 
-	/* Atomic allocations - we can't balance anything */
-	if (!wait) {
+	/* Caller is not willing to reclaim, we can't balance anything */
+	if (!can_direct_reclaim) {
 		/*
 		 * All existing users of the deprecated __GFP_NOFAIL are
 		 * blockable, so warn of any new users that actually allow this
···
 		goto got_pg;
 
 	/* Checks for THP-specific high-order allocations */
-	if ((gfp_mask & GFP_TRANSHUGE) == GFP_TRANSHUGE) {
+	if (is_thp_gfp_mask(gfp_mask)) {
 		/*
 		 * If compaction is deferred for high-order allocations, it is
 		 * because sync compaction recently failed. If this is the case
···
 	 * fault, so use asynchronous memory compaction for THP unless it is
 	 * khugepaged trying to collapse.
 	 */
-	if ((gfp_mask & GFP_TRANSHUGE) != GFP_TRANSHUGE ||
-					(current->flags & PF_KTHREAD))
+	if (!is_thp_gfp_mask(gfp_mask) || (current->flags & PF_KTHREAD))
 		migration_mode = MIGRATE_SYNC_LIGHT;
 
 	/* Try direct reclaim and then allocating */
···
 
 	lockdep_trace_alloc(gfp_mask);
 
-	might_sleep_if(gfp_mask & __GFP_WAIT);
+	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
 	if (should_fail_alloc_page(gfp_mask, order))
 		return NULL;
+9 -9
mm/slab.c
···
 }
 
 /*
- * Construct gfp mask to allocate from a specific node but do not invoke reclaim
- * or warn about failures.
+ * Construct gfp mask to allocate from a specific node but do not direct reclaim
+ * or warn about failures. kswapd may still wake to reclaim in the background.
 */
 static inline gfp_t gfp_exact_node(gfp_t flags)
 {
-	return (flags | __GFP_THISNODE | __GFP_NOWARN) & ~__GFP_WAIT;
+	return (flags | __GFP_THISNODE | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM;
 }
 #endif
 
···
 
 	offset *= cachep->colour_off;
 
-	if (local_flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(local_flags))
 		local_irq_enable();
 
 	/*
···
 
 	cache_init_objs(cachep, page);
 
-	if (local_flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
 	check_irq_off();
 	spin_lock(&n->list_lock);
···
 opps1:
 	kmem_freepages(cachep, page);
 failed:
-	if (local_flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
 	return 0;
 }
···
 static inline void cache_alloc_debugcheck_before(struct kmem_cache *cachep,
 						gfp_t flags)
 {
-	might_sleep_if(flags & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(flags));
 #if DEBUG
 	kmem_flagcheck(cachep, flags);
 #endif
···
 	 */
 	struct page *page;
 
-	if (local_flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(local_flags))
 		local_irq_enable();
 	kmem_flagcheck(cache, flags);
 	page = kmem_getpages(cache, local_flags, numa_mem_id());
-	if (local_flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
 	if (page) {
 		/*
+5 -5
mm/slub.c
···
 {
 	flags &= gfp_allowed_mask;
 	lockdep_trace_alloc(flags);
-	might_sleep_if(flags & __GFP_WAIT);
+	might_sleep_if(gfpflags_allow_blocking(flags));
 
 	if (should_failslab(s->object_size, flags, s->flags))
 		return NULL;
···
 
 	flags &= gfp_allowed_mask;
 
-	if (flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(flags))
 		local_irq_enable();
 
 	flags |= s->allocflags;
···
 	 * so we fall-back to the minimum order allocation.
 	 */
 	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
-	if ((alloc_gfp & __GFP_WAIT) && oo_order(oo) > oo_order(s->min))
-		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_WAIT;
+	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
+		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_DIRECT_RECLAIM;
 
 	page = alloc_slab_page(s, alloc_gfp, node, oo);
 	if (unlikely(!page)) {
···
 	page->frozen = 1;
 
 out:
-	if (flags & __GFP_WAIT)
+	if (gfpflags_allow_blocking(flags))
 		local_irq_disable();
 	if (!page)
 		return NULL;
+1 -1
mm/vmalloc.c
···
 			goto fail;
 		}
 		area->pages[i] = page;
-		if (gfp_mask & __GFP_WAIT)
+		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
 	}
 
+2 -2
mm/vmscan.c
···
 	 * won't get blocked by normal direct-reclaimers, forming a circular
 	 * deadlock.
 	 */
-	if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
+	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
 		inactive >>= 3;
 
 	return isolated > inactive;
···
 	/*
 	 * Do not scan if the allocation should not be delayed.
 	 */
-	if (!(gfp_mask & __GFP_WAIT) || (current->flags & PF_MEMALLOC))
+	if (!gfpflags_allow_blocking(gfp_mask) || (current->flags & PF_MEMALLOC))
 		return ZONE_RECLAIM_NOSCAN;
 
 	/*
+3 -2
mm/zswap.c
···
 static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 {
 	struct zswap_pool *pool;
-	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN;
+	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
 
 	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
 	if (!pool) {
···
 	/* store */
 	len = dlen + sizeof(struct zswap_header);
 	ret = zpool_malloc(entry->pool->zpool, len,
-			   __GFP_NORETRY | __GFP_NOWARN, &handle);
+			   __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM,
+			   &handle);
 	if (ret == -ENOSPC) {
 		zswap_reject_compress_poor++;
 		goto put_dstmem;
+4 -4
net/core/skbuff.c
···
 	len += NET_SKB_PAD;
 
 	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
-	    (gfp_mask & (__GFP_WAIT | GFP_DMA))) {
+	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
 			goto skb_fail;
···
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
 	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
-	    (gfp_mask & (__GFP_WAIT | GFP_DMA))) {
+	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
 			goto skb_fail;
···
 		return NULL;
 
 	gfp_head = gfp_mask;
-	if (gfp_head & __GFP_WAIT)
+	if (gfp_head & __GFP_DIRECT_RECLAIM)
 		gfp_head |= __GFP_REPEAT;
 
 	*errcode = -ENOBUFS;
···
 
 	while (order) {
 		if (npages >= 1 << order) {
-			page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
+			page = alloc_pages((gfp_mask & ~__GFP_DIRECT_RECLAIM) |
 					   __GFP_COMP |
 					   __GFP_NOWARN |
 					   __GFP_NORETRY,
+4 -2
net/core/sock.c
···
 
 	pfrag->offset = 0;
 	if (SKB_FRAG_PAGE_ORDER) {
-		pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
-					  __GFP_NOWARN | __GFP_NORETRY,
+		/* Avoid direct reclaim but allow kswapd to wake */
+		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
+					  __GFP_COMP | __GFP_NOWARN |
+					  __GFP_NORETRY,
 					  SKB_FRAG_PAGE_ORDER);
 		if (likely(pfrag->page)) {
 			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
+1 -1
net/netlink/af_netlink.c
···
 	consume_skb(info.skb2);
 
 	if (info.delivered) {
-		if (info.congested && (allocation & __GFP_WAIT))
+		if (info.congested && gfpflags_allow_blocking(allocation))
 			yield();
 		return 0;
 	}
+2 -2
net/rds/ib_recv.c
···
 	gfp_t slab_mask = GFP_NOWAIT;
 	gfp_t page_mask = GFP_NOWAIT;
 
-	if (gfp & __GFP_WAIT) {
+	if (gfp & __GFP_DIRECT_RECLAIM) {
 		slab_mask = GFP_KERNEL;
 		page_mask = GFP_HIGHUSER;
 	}
···
 	struct ib_recv_wr *failed_wr;
 	unsigned int posted = 0;
 	int ret = 0;
-	bool can_wait = !!(gfp & __GFP_WAIT);
+	bool can_wait = !!(gfp & __GFP_DIRECT_RECLAIM);
 	u32 pos;
 
 	/* the goal here is to just make sure that someone, somewhere
+1 -1
net/rxrpc/ar-connection.c
···
 	if (bundle->num_conns >= 20) {
 		_debug("too many conns");
 
-		if (!(gfp & __GFP_WAIT)) {
+		if (!gfpflags_allow_blocking(gfp)) {
 			_leave(" = -EAGAIN");
 			return -EAGAIN;
 		}
+1 -1
net/sctp/associola.c
···
 /* Set an association id for a given association */
 int sctp_assoc_set_id(struct sctp_association *asoc, gfp_t gfp)
 {
-	bool preload = !!(gfp & __GFP_WAIT);
+	bool preload = gfpflags_allow_blocking(gfp);
 	int ret;
 
 	/* If the id is already assigned, keep it. */