Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'akpm' (patches from Andrew)

Merge updates from Andrew Morton:

- fsnotify fix

- poll() timeout fix

- a few scripts/ tweaks

- debugobjects updates

- the (small) ocfs2 queue

- minor fixes to kernel/padata.c

- maybe half of the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
mm, page_alloc: restore the original nodemask if the fast path allocation failed
mm, page_alloc: uninline the bad page part of check_new_page()
mm, page_alloc: don't duplicate code in free_pcp_prepare
mm, page_alloc: defer debugging checks of pages allocated from the PCP
mm, page_alloc: defer debugging checks of freed pages until a PCP drain
cpuset: use static key better and convert to new API
mm, page_alloc: inline pageblock lookup in page free fast paths
mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
mm, page_alloc: pull out side effects from free_pages_check
mm, page_alloc: un-inline the bad part of free_pages_check
mm, page_alloc: check multiple page fields with a single branch
mm, page_alloc: remove field from alloc_context
mm, page_alloc: avoid looking up the first zone in a zonelist twice
mm, page_alloc: shortcut watermark checks for order-0 pages
mm, page_alloc: reduce cost of fair zone allocation policy retry
mm, page_alloc: shorten the page allocator fast path
mm, page_alloc: check once if a zone has isolated pageblocks
mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
mm, page_alloc: simplify last cpupid reset
mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
...

+2310 -1631
+14 -12
Documentation/DocBook/debugobjects.tmpl
··· 316 316 </itemizedlist> 317 317 </para> 318 318 <para> 319 - The function returns 1 when the fixup was successful, 320 - otherwise 0. The return value is used to update the 319 + The function returns true when the fixup was successful, 320 + otherwise false. The return value is used to update the 321 321 statistics. 322 322 </para> 323 323 <para> ··· 341 341 </itemizedlist> 342 342 </para> 343 343 <para> 344 - The function returns 1 when the fixup was successful, 345 - otherwise 0. The return value is used to update the 344 + The function returns true when the fixup was successful, 345 + otherwise false. The return value is used to update the 346 346 statistics. 347 347 </para> 348 348 <para> ··· 359 359 statically initialized object or not. In case it is it calls 360 360 debug_object_init() and debug_object_activate() to make the 361 361 object known to the tracker and marked active. In this case 362 - the function should return 0 because this is not a real fixup. 362 + the function should return false because this is not a real 363 + fixup. 363 364 </para> 364 365 </sect1> 365 366 ··· 377 376 </itemizedlist> 378 377 </para> 379 378 <para> 380 - The function returns 1 when the fixup was successful, 381 - otherwise 0. The return value is used to update the 379 + The function returns true when the fixup was successful, 380 + otherwise false. The return value is used to update the 382 381 statistics. 383 382 </para> 384 383 </sect1> ··· 398 397 </itemizedlist> 399 398 </para> 400 399 <para> 401 - The function returns 1 when the fixup was successful, 402 - otherwise 0. The return value is used to update the 400 + The function returns true when the fixup was successful, 401 + otherwise false. The return value is used to update the 403 402 statistics. 404 403 </para> 405 404 </sect1> ··· 415 414 debug bucket. 416 415 </para> 417 416 <para> 418 - The function returns 1 when the fixup was successful, 419 - otherwise 0. The return value is used to update the 417 + The function returns true when the fixup was successful, 418 + otherwise false. The return value is used to update the 420 419 statistics. 421 420 </para> 422 421 <para> ··· 428 427 case. The fixup function should check if this is a legitimate 429 428 case of a statically initialized object or not. In this case only 430 429 debug_object_init() should be called to make the object known to 431 - the tracker. Then the function should return 0 because this is not 430 + the tracker. Then the function should return false because this 431 + is not 432 432 a real fixup. 433 433 </para> 434 434 </sect1>
+8
Documentation/kernel-parameters.txt
··· 2168 2168 [KNL,SH] Allow user to override the default size for 2169 2169 per-device physically contiguous DMA buffers. 2170 2170 2171 + memhp_default_state=online/offline 2172 + [KNL] Set the initial state for the memory hotplug 2173 + onlining policy. If not specified, the default value is 2174 + set according to the 2175 + CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config 2176 + option. 2177 + See Documentation/memory-hotplug.txt. 2178 + 2171 2179 memmap=exactmap [KNL,X86] Enable setting of an exact 2172 2180 E820 memory map, as specified by the user. 2173 2181 Such memmap=exactmap lines can be constructed based on
+5 -4
Documentation/memory-hotplug.txt
··· 261 261 262 262 % cat /sys/devices/system/memory/auto_online_blocks 263 263 264 - The default is "offline" which means the newly added memory is not in a 265 - ready-to-use state and you have to "online" the newly added memory blocks 266 - manually. Automatic onlining can be requested by writing "online" to 267 - "auto_online_blocks" file: 264 + The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config 265 + option. If it is disabled the default is "offline" which means the newly added 266 + memory is not in a ready-to-use state and you have to "online" the newly added 267 + memory blocks manually. Automatic onlining can be requested by writing "online" 268 + to "auto_online_blocks" file: 268 269 269 270 % echo online > /sys/devices/system/memory/auto_online_blocks 270 271
+14
Documentation/sysctl/vm.txt
··· 57 57 - panic_on_oom 58 58 - percpu_pagelist_fraction 59 59 - stat_interval 60 + - stat_refresh 60 61 - swappiness 61 62 - user_reserve_kbytes 62 63 - vfs_cache_pressure ··· 753 752 754 753 The time interval between which vm statistics are updated. The default 755 754 is 1 second. 755 + 756 + ============================================================== 757 + 758 + stat_refresh 759 + 760 + Any read or write (by root only) flushes all the per-cpu vm statistics 761 + into their global totals, for more accurate reports when testing 762 + e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo 763 + 764 + As a side-effect, it also checks for negative totals (elsewhere reported 765 + as 0) and "fails" with EINVAL if any are found, with a warning in dmesg. 766 + (At time of writing, a few stats are known sometimes to be found negative, 767 + with no ill effects: errors and warnings on these stats are suppressed.) 756 768 757 769 ============================================================== 758 770
+5 -5
Documentation/vm/transhuge.txt
··· 394 394 Refcounting on THP is mostly consistent with refcounting on other compound 395 395 pages: 396 396 397 - - get_page()/put_page() and GUP operate in head page's ->_count. 397 + - get_page()/put_page() and GUP operate in head page's ->_refcount. 398 398 399 - - ->_count in tail pages is always zero: get_page_unless_zero() never 399 + - ->_refcount in tail pages is always zero: get_page_unless_zero() never 400 400 succeed on tail pages. 401 401 402 402 - map/unmap of the pages with PTE entry increment/decrement ->_mapcount ··· 426 426 sum of mapcount of all sub-pages plus one (split_huge_page caller must 427 427 have reference for head page). 428 428 429 - split_huge_page uses migration entries to stabilize page->_count and 429 + split_huge_page uses migration entries to stabilize page->_refcount and 430 430 page->_mapcount. 431 431 432 432 We safe against physical memory scanners too: the only legitimate way 433 433 scanner can get reference to a page is get_page_unless_zero(). 434 434 435 - All tail pages has zero ->_count until atomic_add(). It prevent scanner 435 + All tail pages has zero ->_refcount until atomic_add(). It prevent scanner 436 436 from geting reference to tail page up to the point. After the atomic_add() 437 - we don't care about ->_count value. We already known how many references 437 + we don't care about ->_refcount value. We already known how many references 438 438 with should uncharge from head page. 439 439 440 440 For head page get_page_unless_zero() will succeed and we don't mind. It's
-2
arch/arc/include/asm/hugepage.h
··· 61 61 extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr, 62 62 pmd_t *pmd); 63 63 64 - #define has_transparent_hugepage() 1 65 - 66 64 /* Generic variants assume pgtable_t is struct page *, hence need for these */ 67 65 #define __HAVE_ARCH_PGTABLE_DEPOSIT 68 66 extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-5
arch/arm/include/asm/pgtable-3level.h
··· 281 281 flush_pmd_entry(pmdp); 282 282 } 283 283 284 - static inline int has_transparent_hugepage(void) 285 - { 286 - return 1; 287 - } 288 - 289 284 #endif /* __ASSEMBLY__ */ 290 285 291 286 #endif /* _ASM_PGTABLE_3LEVEL_H */
-5
arch/arm64/include/asm/pgtable.h
··· 316 316 317 317 #define set_pmd_at(mm, addr, pmdp, pmd) set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd)) 318 318 319 - static inline int has_transparent_hugepage(void) 320 - { 321 - return 1; 322 - } 323 - 324 319 #define __pgprot_modify(prot,mask,bits) \ 325 320 __pgprot((pgprot_val(prot) & ~(mask)) | (bits)) 326 321
+1
arch/arm64/mm/hugetlbpage.c
··· 307 307 } else if (ps == PUD_SIZE) { 308 308 hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); 309 309 } else { 310 + hugetlb_bad_size(); 310 311 pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10); 311 312 return 0; 312 313 }
+1
arch/metag/mm/hugetlbpage.c
··· 239 239 if (ps == (1 << HPAGE_SHIFT)) { 240 240 hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT); 241 241 } else { 242 + hugetlb_bad_size(); 242 243 pr_err("hugepagesz: Unsupported page size %lu M\n", 243 244 ps >> 20); 244 245 return 0;
+1
arch/mips/include/asm/pgtable.h
··· 533 533 534 534 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 535 535 536 + #define has_transparent_hugepage has_transparent_hugepage 536 537 extern int has_transparent_hugepage(void); 537 538 538 539 static inline int pmd_trans_huge(pmd_t pmd)
+11 -10
arch/mips/mm/tlb-r4k.c
··· 405 405 406 406 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 407 407 408 - int __init has_transparent_hugepage(void) 408 + int has_transparent_hugepage(void) 409 409 { 410 - unsigned int mask; 411 - unsigned long flags; 410 + static unsigned int mask = -1; 412 411 413 - local_irq_save(flags); 414 - write_c0_pagemask(PM_HUGE_MASK); 415 - back_to_back_c0_hazard(); 416 - mask = read_c0_pagemask(); 417 - write_c0_pagemask(PM_DEFAULT_MASK); 412 + if (mask == -1) { /* first call comes during __init */ 413 + unsigned long flags; 418 414 419 - local_irq_restore(flags); 420 - 415 + local_irq_save(flags); 416 + write_c0_pagemask(PM_HUGE_MASK); 417 + back_to_back_c0_hazard(); 418 + mask = read_c0_pagemask(); 419 + write_c0_pagemask(PM_DEFAULT_MASK); 420 + local_irq_restore(flags); 421 + } 421 422 return mask == PM_HUGE_MASK; 422 423 } 423 424
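The MIPS hunk above turns has_transparent_hugepage() from an __init-only probe into one that caches the hardware answer on its first call, so later callers never re-run code that only made sense at init time. A minimal sketch of that memoize-on-first-call pattern; probe_hw() and the 0xff mask are hypothetical stand-ins for the c0 pagemask register sequence:

```c
#include <assert.h>

static int probe_calls;                  /* counts real hardware probes */

/* Hypothetical stand-in for the write/read of the c0 pagemask register. */
static unsigned int probe_hw(void)
{
    probe_calls++;
    return 0xff;                         /* pretend this is what the HW reported */
}

static int has_feature(void)
{
    static unsigned int mask = -1;       /* -1 means "not probed yet" */

    if (mask == (unsigned int)-1)        /* only the first call really probes */
        mask = probe_hw();
    return mask == 0xff;
}
```

Every call after the first returns the cached answer; in the kernel the first call still happens during __init, which is why the comment in the patch notes exactly that.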
+1
arch/powerpc/include/asm/book3s/64/pgtable.h
··· 219 219 pmd_t *pmdp, pmd_t pmd); 220 220 extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr, 221 221 pmd_t *pmd); 222 + #define has_transparent_hugepage has_transparent_hugepage 222 223 extern int has_transparent_hugepage(void); 223 224 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ 224 225
-1
arch/powerpc/include/asm/pgtable.h
··· 65 65 struct page **pages, int *nr); 66 66 #ifndef CONFIG_TRANSPARENT_HUGEPAGE 67 67 #define pmd_large(pmd) 0 68 - #define has_transparent_hugepage() 0 69 68 #endif 70 69 pte_t *__find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, 71 70 bool *is_thp, unsigned *shift);
+4 -2
arch/powerpc/mm/hugetlbpage.c
··· 772 772 773 773 size = memparse(str, &str); 774 774 775 - if (add_huge_page_size(size) != 0) 776 - printk(KERN_WARNING "Invalid huge page size specified(%llu)\n", size); 775 + if (add_huge_page_size(size) != 0) { 776 + hugetlb_bad_size(); 777 + pr_err("Invalid huge page size specified(%llu)\n", size); 778 + } 777 779 778 780 return 1; 779 781 }
+1
arch/s390/include/asm/pgtable.h
··· 1223 1223 return pmd_val(pmd) & _SEGMENT_ENTRY_LARGE; 1224 1224 } 1225 1225 1226 + #define has_transparent_hugepage has_transparent_hugepage 1226 1227 static inline int has_transparent_hugepage(void) 1227 1228 { 1228 1229 return MACHINE_HAS_HPAGE ? 1 : 0;
-2
arch/sparc/include/asm/pgtable_64.h
··· 681 681 return pte_val(pte) & _PAGE_PMD_HUGE; 682 682 } 683 683 684 - #define has_transparent_hugepage() 1 685 - 686 684 static inline pmd_t pmd_mkold(pmd_t pmd) 687 685 { 688 686 pte_t pte = __pte(pmd_val(pmd));
-1
arch/tile/include/asm/pgtable.h
··· 487 487 } 488 488 489 489 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 490 - #define has_transparent_hugepage() 1 491 490 #define pmd_trans_huge pmd_huge_page 492 491 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ 493 492
+1 -3
arch/tile/kernel/setup.c
··· 962 962 cpumask_set_cpu(best_cpu, &node_2_cpu_mask[node]); 963 963 cpu_2_node[best_cpu] = node; 964 964 cpumask_clear_cpu(best_cpu, &unbound_cpus); 965 - node = next_node(node, default_nodes); 966 - if (node == MAX_NUMNODES) 967 - node = first_node(default_nodes); 965 + node = next_node_in(node, default_nodes); 968 966 } 969 967 970 968 /* Print out node assignments and set defaults for disabled cpus */
+6 -1
arch/tile/mm/hugetlbpage.c
··· 308 308 309 309 static __init int setup_hugepagesz(char *opt) 310 310 { 311 + int rc; 312 + 311 313 if (!saw_hugepagesz) { 312 314 saw_hugepagesz = true; 313 315 memset(huge_shift, 0, sizeof(huge_shift)); 314 316 } 315 - return __setup_hugepagesz(memparse(opt, NULL)); 317 + rc = __setup_hugepagesz(memparse(opt, NULL)); 318 + if (rc) 319 + hugetlb_bad_size(); 320 + return rc; 316 321 } 317 322 __setup("hugepagesz=", setup_hugepagesz); 318 323
+1 -1
arch/tile/mm/init.c
··· 679 679 * Hacky direct set to avoid unnecessary 680 680 * lock take/release for EVERY page here. 681 681 */ 682 - p->_count.counter = 0; 682 + p->_refcount.counter = 0; 683 683 p->_mapcount.counter = -1; 684 684 } 685 685 init_page_count(page);
+1
arch/x86/include/asm/pgtable.h
··· 181 181 return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE; 182 182 } 183 183 184 + #define has_transparent_hugepage has_transparent_hugepage 184 185 static inline int has_transparent_hugepage(void) 185 186 { 186 187 return boot_cpu_has(X86_FEATURE_PSE);
+1
arch/x86/mm/hugetlbpage.c
··· 165 165 } else if (ps == PUD_SIZE && boot_cpu_has(X86_FEATURE_GBPAGES)) { 166 166 hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); 167 167 } else { 168 + hugetlb_bad_size(); 168 169 printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n", 169 170 ps >> 20); 170 171 return 0;
+1 -3
arch/x86/mm/numa.c
··· 617 617 if (early_cpu_to_node(i) != NUMA_NO_NODE) 618 618 continue; 619 619 numa_set_node(i, rr); 620 - rr = next_node(rr, node_online_map); 621 - if (rr == MAX_NUMNODES) 622 - rr = first_node(node_online_map); 620 + rr = next_node_in(rr, node_online_map); 623 621 } 624 622 } 625 623
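The tile and x86 hunks both replace the open-coded "advance, then wrap if we fell off the end" pair of next_node()/first_node() with the new next_node_in() helper. A toy model of its wrap-around semantics, using a plain unsigned bitmask as a stand-in for the kernel's nodemask_t:

```c
#include <assert.h>

#define MAX_NODES 32

/* First set bit in the mask, or MAX_NODES if the mask is empty. */
static int first_node_in(unsigned int map)
{
    for (int n = 0; n < MAX_NODES; n++)
        if (map & (1u << n))
            return n;
    return MAX_NODES;
}

/* Next set bit strictly after 'node', wrapping to the first set bit. */
static int next_node_in(int node, unsigned int map)
{
    for (int n = node + 1; n < MAX_NODES; n++)
        if (map & (1u << n))
            return n;
    return first_node_in(map);           /* wrap around, as the helper does */
}
```

Folding the wrap into the helper is what lets both call sites drop their explicit `if (node == MAX_NUMNODES)` fallback branch.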
+1 -1
drivers/block/aoe/aoecmd.c
··· 861 861 * discussion. 862 862 * 863 863 * We cannot use get_page in the workaround, because it insists on a 864 - * positive page count as a precondition. So we use _count directly. 864 + * positive page count as a precondition. So we use _refcount directly. 865 865 */ 866 866 static void 867 867 bio_pageinc(struct bio *bio)
+1 -1
drivers/hwtracing/intel_th/msu.c
··· 1164 1164 if (!atomic_dec_and_mutex_lock(&msc->mmap_count, &msc->buf_mutex)) 1165 1165 return; 1166 1166 1167 - /* drop page _counts */ 1167 + /* drop page _refcounts */ 1168 1168 for (pg = 0; pg < msc->nr_pages; pg++) { 1169 1169 struct page *page = msc_buffer_get_page(msc, pg); 1170 1170
+1 -1
drivers/media/usb/dvb-usb/dib0700_core.c
··· 517 517 if (nb_packet_buffer_size < 1) 518 518 nb_packet_buffer_size = 1; 519 519 520 - /* get the fimware version */ 520 + /* get the firmware version */ 521 521 usb_control_msg(udev, usb_rcvctrlpipe(udev, 0), 522 522 REQUEST_GET_VERSION, 523 523 USB_TYPE_VENDOR | USB_DIR_IN, 0, 0,
+1 -1
drivers/net/ethernet/cavium/thunder/nicvf_queues.c
··· 23 23 if (!nic->rb_pageref || !nic->rb_page) 24 24 return; 25 25 26 - atomic_add(nic->rb_pageref, &nic->rb_page->_count); 26 + page_ref_add(nic->rb_page, nic->rb_pageref); 27 27 nic->rb_pageref = 0; 28 28 } 29 29
+10 -10
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
··· 433 433 for (i = 0; i < MLX5_MPWRQ_PAGES_PER_WQE; i++) { 434 434 if (unlikely(mlx5e_alloc_and_map_page(rq, wi, i))) 435 435 goto err_unmap; 436 - atomic_add(mlx5e_mpwqe_strides_per_page(rq), 437 - &wi->umr.dma_info[i].page->_count); 436 + page_ref_add(wi->umr.dma_info[i].page, 437 + mlx5e_mpwqe_strides_per_page(rq)); 438 438 wi->skbs_frags[i] = 0; 439 439 } 440 440 ··· 452 452 while (--i >= 0) { 453 453 dma_unmap_page(rq->pdev, wi->umr.dma_info[i].addr, PAGE_SIZE, 454 454 PCI_DMA_FROMDEVICE); 455 - atomic_sub(mlx5e_mpwqe_strides_per_page(rq), 456 - &wi->umr.dma_info[i].page->_count); 455 + page_ref_sub(wi->umr.dma_info[i].page, 456 + mlx5e_mpwqe_strides_per_page(rq)); 457 457 put_page(wi->umr.dma_info[i].page); 458 458 } 459 459 dma_unmap_single(rq->pdev, wi->umr.mtt_addr, mtt_sz, PCI_DMA_TODEVICE); ··· 477 477 for (i = 0; i < MLX5_MPWRQ_PAGES_PER_WQE; i++) { 478 478 dma_unmap_page(rq->pdev, wi->umr.dma_info[i].addr, PAGE_SIZE, 479 479 PCI_DMA_FROMDEVICE); 480 - atomic_sub(mlx5e_mpwqe_strides_per_page(rq) - wi->skbs_frags[i], 481 - &wi->umr.dma_info[i].page->_count); 480 + page_ref_sub(wi->umr.dma_info[i].page, 481 + mlx5e_mpwqe_strides_per_page(rq) - wi->skbs_frags[i]); 482 482 put_page(wi->umr.dma_info[i].page); 483 483 } 484 484 dma_unmap_single(rq->pdev, wi->umr.mtt_addr, mtt_sz, PCI_DMA_TODEVICE); ··· 527 527 */ 528 528 split_page(wi->dma_info.page, MLX5_MPWRQ_WQE_PAGE_ORDER); 529 529 for (i = 0; i < MLX5_MPWRQ_PAGES_PER_WQE; i++) { 530 - atomic_add(mlx5e_mpwqe_strides_per_page(rq), 531 - &wi->dma_info.page[i]._count); 530 + page_ref_add(&wi->dma_info.page[i], 531 + mlx5e_mpwqe_strides_per_page(rq)); 532 532 wi->skbs_frags[i] = 0; 533 533 } 534 534 ··· 551 551 dma_unmap_page(rq->pdev, wi->dma_info.addr, rq->wqe_sz, 552 552 PCI_DMA_FROMDEVICE); 553 553 for (i = 0; i < MLX5_MPWRQ_PAGES_PER_WQE; i++) { 554 - atomic_sub(mlx5e_mpwqe_strides_per_page(rq) - wi->skbs_frags[i], 555 - &wi->dma_info.page[i]._count); 554 + page_ref_sub(&wi->dma_info.page[i], 555 + mlx5e_mpwqe_strides_per_page(rq) - wi->skbs_frags[i]); 556 556 put_page(&wi->dma_info.page[i]); 557 557 } 558 558 }
+3 -3
drivers/net/ethernet/qlogic/qede/qede_main.c
··· 920 920 * network stack to take the ownership of the page 921 921 * which can be recycled multiple times by the driver. 922 922 */ 923 - atomic_inc(&curr_cons->data->_count); 923 + page_ref_inc(curr_cons->data); 924 924 qede_reuse_page(edev, rxq, curr_cons); 925 925 } 926 926 ··· 1036 1036 /* Incr page ref count to reuse on allocation failure 1037 1037 * so that it doesn't get freed while freeing SKB. 1038 1038 */ 1039 - atomic_inc(&current_bd->data->_count); 1039 + page_ref_inc(current_bd->data); 1040 1040 goto out; 1041 1041 } 1042 1042 ··· 1487 1487 * freeing SKB. 1488 1488 */ 1489 1489 1490 - atomic_inc(&sw_rx_data->data->_count); 1490 + page_ref_inc(sw_rx_data->data); 1491 1491 rxq->rx_alloc_errors++; 1492 1492 qede_recycle_rx_bd_ring(rxq, edev, 1493 1493 fp_cqe->bd_num);
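The driver hunks above replace raw atomic_inc()/atomic_add() pokes at page->_count with the page_ref_* accessors, which name the intent and give a single place to hook refcount tracing. A sketch of what such wrappers look like, using C11 atomics on a one-field stand-in for struct page (not the kernel's definition):

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy struct page: just the refcount, renamed _refcount as in this series. */
struct page { atomic_int _refcount; };

static void page_ref_add(struct page *p, int nr)
{
    atomic_fetch_add(&p->_refcount, nr);     /* bulk reference grab */
}

static void page_ref_inc(struct page *p)
{
    page_ref_add(p, 1);                      /* single reference grab */
}

static void page_ref_sub(struct page *p, int nr)
{
    atomic_fetch_sub(&p->_refcount, nr);     /* bulk reference drop */
}

static int page_ref_count(struct page *p)
{
    return atomic_load(&p->_refcount);
}
```

Call sites read as "take nr references on this page" rather than "add nr to this atomic field", which is exactly the readability gain the conversion buys.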
+1 -1
drivers/scsi/bfa/bfi.h
··· 356 356 u8 port0_mode; /* device mode for port 0 */ 357 357 u8 port1_mode; /* device mode for port 1 */ 358 358 u32 exec; /* exec vector */ 359 - u32 bootenv; /* fimware boot env */ 359 + u32 bootenv; /* firmware boot env */ 360 360 u32 rsvd_b[2]; 361 361 struct bfi_ioc_fwver_s fwver; 362 362 u32 md5sum[BFI_IOC_MD5SUM_SZ];
+1 -1
drivers/staging/comedi/drivers/daqboard2000.c
··· 26 26 * Much of the functionality of this driver was determined from reading 27 27 * the source code for the Windows driver. 28 28 * 29 - * The FPGA on the board requires fimware, which is available from 29 + * The FPGA on the board requires firmware, which is available from 30 30 * http://www.comedi.org in the comedi_nonfree_firmware tarball. 31 31 * 32 32 * Configuration options: not applicable, uses PCI auto config
+5 -5
fs/buffer.c
··· 255 255 */ 256 256 static void free_more_memory(void) 257 257 { 258 - struct zone *zone; 258 + struct zoneref *z; 259 259 int nid; 260 260 261 261 wakeup_flusher_threads(1024, WB_REASON_FREE_MORE_MEM); 262 262 yield(); 263 263 264 264 for_each_online_node(nid) { 265 - (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS), 266 - gfp_zone(GFP_NOFS), NULL, 267 - &zone); 268 - if (zone) 265 + 266 + z = first_zones_zonelist(node_zonelist(nid, GFP_NOFS), 267 + gfp_zone(GFP_NOFS), NULL); 268 + if (z->zone) 269 269 try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0, 270 270 GFP_NOFS, NULL); 271 271 }
+6 -6
fs/eventpoll.c
··· 1583 1583 return ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0, false); 1584 1584 } 1585 1585 1586 - static inline struct timespec ep_set_mstimeout(long ms) 1586 + static inline struct timespec64 ep_set_mstimeout(long ms) 1587 1587 { 1588 - struct timespec now, ts = { 1588 + struct timespec64 now, ts = { 1589 1589 .tv_sec = ms / MSEC_PER_SEC, 1590 1590 .tv_nsec = NSEC_PER_MSEC * (ms % MSEC_PER_SEC), 1591 1591 }; 1592 1592 1593 - ktime_get_ts(&now); 1594 - return timespec_add_safe(now, ts); 1593 + ktime_get_ts64(&now); 1594 + return timespec64_add_safe(now, ts); 1595 1595 } 1596 1596 1597 1597 /** ··· 1621 1621 ktime_t expires, *to = NULL; 1622 1622 1623 1623 if (timeout > 0) { 1624 - struct timespec end_time = ep_set_mstimeout(timeout); 1624 + struct timespec64 end_time = ep_set_mstimeout(timeout); 1625 1625 1626 1626 slack = select_estimate_accuracy(&end_time); 1627 1627 to = &expires; 1628 - *to = timespec_to_ktime(end_time); 1628 + *to = timespec64_to_ktime(end_time); 1629 1629 } else if (timeout == 0) { 1630 1630 /* 1631 1631 * Avoid the unnecessary trip to the wait queue loop, if the
+7
fs/notify/fsnotify.h
··· 56 56 fsnotify_destroy_marks(&real_mount(mnt)->mnt_fsnotify_marks, 57 57 &mnt->mnt_root->d_lock); 58 58 } 59 + /* prepare for freeing all marks associated with given group */ 60 + extern void fsnotify_detach_group_marks(struct fsnotify_group *group); 61 + /* 62 + * wait for fsnotify_mark_srcu period to end and free all marks in destroy_list 63 + */ 64 + extern void fsnotify_mark_destroy_list(void); 65 + 59 66 /* 60 67 * update the dentry->d_flags of all of inode's children to indicate if inode cares 61 68 * about events that happen to its children.
+13 -4
fs/notify/group.c
··· 47 47 */ 48 48 void fsnotify_destroy_group(struct fsnotify_group *group) 49 49 { 50 - /* clear all inode marks for this group */ 51 - fsnotify_clear_marks_by_group(group); 50 + /* clear all inode marks for this group, attach them to destroy_list */ 51 + fsnotify_detach_group_marks(group); 52 52 53 - synchronize_srcu(&fsnotify_mark_srcu); 53 + /* 54 + * Wait for fsnotify_mark_srcu period to end and free all marks in 55 + * destroy_list 56 + */ 57 + fsnotify_mark_destroy_list(); 54 58 55 - /* clear the notification queue of all events */ 59 + /* 60 + * Since we have waited for fsnotify_mark_srcu in 61 + * fsnotify_mark_destroy_list() there can be no outstanding event 62 + * notification against this group. So clearing the notification queue 63 + * of all events is reliable now. 64 + */ 56 65 fsnotify_flush_notify(group); 57 66 58 67 /*
+61 -17
fs/notify/mark.c
··· 97 97 static DEFINE_SPINLOCK(destroy_lock); 98 98 static LIST_HEAD(destroy_list); 99 99 100 - static void fsnotify_mark_destroy(struct work_struct *work); 101 - static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy); 100 + static void fsnotify_mark_destroy_workfn(struct work_struct *work); 101 + static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy_workfn); 102 102 103 103 void fsnotify_get_mark(struct fsnotify_mark *mark) 104 104 { ··· 173 173 } 174 174 175 175 /* 176 - * Free fsnotify mark. The freeing is actually happening from a kthread which 177 - * first waits for srcu period end. Caller must have a reference to the mark 178 - * or be protected by fsnotify_mark_srcu. 176 + * Prepare mark for freeing and add it to the list of marks prepared for 177 + * freeing. The actual freeing must happen after SRCU period ends and the 178 + * caller is responsible for this. 179 + * 180 + * The function returns true if the mark was added to the list of marks for 181 + * freeing. The function returns false if someone else has already called 182 + * __fsnotify_free_mark() for the mark. 179 183 */ 180 - void fsnotify_free_mark(struct fsnotify_mark *mark) 184 + static bool __fsnotify_free_mark(struct fsnotify_mark *mark) 181 185 { 182 186 struct fsnotify_group *group = mark->group; 183 187 ··· 189 185 /* something else already called this function on this mark */ 190 186 if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) { 191 187 spin_unlock(&mark->lock); 192 - return; 188 + return false; 193 189 } 194 190 mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE; 195 191 spin_unlock(&mark->lock); 196 - 197 - spin_lock(&destroy_lock); 198 - list_add(&mark->g_list, &destroy_list); 199 - spin_unlock(&destroy_lock); 200 - queue_delayed_work(system_unbound_wq, &reaper_work, 201 - FSNOTIFY_REAPER_DELAY); 202 192 203 193 /* 204 194 * Some groups like to know that marks are being freed. This is a ··· 201 203 */ 202 204 if (group->ops->freeing_mark) 203 205 group->ops->freeing_mark(mark, group); 206 + 207 + spin_lock(&destroy_lock); 208 + list_add(&mark->g_list, &destroy_list); 209 + spin_unlock(&destroy_lock); 210 + 211 + return true; 212 + } 213 + 214 + /* 215 + * Free fsnotify mark. The freeing is actually happening from a workqueue which 216 + * first waits for srcu period end. Caller must have a reference to the mark 217 + * or be protected by fsnotify_mark_srcu. 218 + */ 219 + void fsnotify_free_mark(struct fsnotify_mark *mark) 220 + { 221 + if (__fsnotify_free_mark(mark)) { 222 + queue_delayed_work(system_unbound_wq, &reaper_work, 223 + FSNOTIFY_REAPER_DELAY); 224 + } 204 225 } 205 226 206 227 void fsnotify_destroy_mark(struct fsnotify_mark *mark, ··· 485 468 } 486 469 487 470 /* 488 - * Given a group, destroy all of the marks associated with that group. 471 + * Given a group, prepare for freeing all the marks associated with that group. 472 + * The marks are attached to the list of marks prepared for destruction, the 473 + * caller is responsible for freeing marks in that list after SRCU period has 474 + * ended. 489 475 */ 490 - void fsnotify_clear_marks_by_group(struct fsnotify_group *group) 476 + void fsnotify_detach_group_marks(struct fsnotify_group *group) 491 477 { 492 - fsnotify_clear_marks_by_group_flags(group, (unsigned int)-1); 478 + struct fsnotify_mark *mark; 479 + 480 + while (1) { 481 + mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING); 482 + if (list_empty(&group->marks_list)) { 483 + mutex_unlock(&group->mark_mutex); 484 + break; 485 + } 486 + mark = list_first_entry(&group->marks_list, 487 + struct fsnotify_mark, g_list); 488 + fsnotify_get_mark(mark); 489 + fsnotify_detach_mark(mark); 490 + mutex_unlock(&group->mark_mutex); 491 + __fsnotify_free_mark(mark); 492 + fsnotify_put_mark(mark); 493 + } 493 494 } 494 495 495 496 void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old) ··· 534 499 mark->free_mark = free_mark; 535 500 } 536 501 537 - static void fsnotify_mark_destroy(struct work_struct *work) 502 + /* 503 + * Destroy all marks in destroy_list, waits for SRCU period to finish before 504 + * actually freeing marks. 505 + */ 506 + void fsnotify_mark_destroy_list(void) 538 507 { 539 508 struct fsnotify_mark *mark, *next; 540 509 struct list_head private_destroy_list; ··· 554 515 list_del_init(&mark->g_list); 555 516 fsnotify_put_mark(mark); 556 517 } 518 + } 519 + 520 + static void fsnotify_mark_destroy_workfn(struct work_struct *work) 521 + { 522 + fsnotify_mark_destroy_list(); 557 523 }
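The fsnotify rework above splits teardown into two phases: __fsnotify_free_mark() only detaches a mark onto destroy_list, and fsnotify_mark_destroy_list() frees everything after the SRCU grace period has ended. A toy model of that deferred-destruction scheme; the names mirror the patch, but the types and the grace-period wait are simulated stand-ins:

```c
#include <assert.h>
#include <stdlib.h>

struct mark { struct mark *next; };

static struct mark *destroy_list;          /* marks queued for freeing */
static int freed;                          /* how many were actually freed */

static void free_mark(struct mark *m)      /* phase 1: queue, don't free yet */
{
    m->next = destroy_list;
    destroy_list = m;
}

static void wait_for_grace_period(void)    /* stands in for synchronize_srcu() */
{
}

static void mark_destroy_list(void)        /* phase 2: free everything queued */
{
    wait_for_grace_period();
    while (destroy_list) {
        struct mark *m = destroy_list;
        destroy_list = m->next;
        free(m);
        freed++;
    }
}
```

Separating the phases is what lets fsnotify_destroy_group() wait exactly once for the grace period and then know no outstanding readers can still see any of the group's marks.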
+1 -2
fs/ocfs2/alloc.c
··· 5351 5351 { 5352 5352 int ret; 5353 5353 u32 left_cpos, rec_range, trunc_range; 5354 - int wants_rotate = 0, is_rightmost_tree_rec = 0; 5354 + int is_rightmost_tree_rec = 0; 5355 5355 struct super_block *sb = ocfs2_metadata_cache_get_super(et->et_ci); 5356 5356 struct ocfs2_path *left_path = NULL; 5357 5357 struct ocfs2_extent_list *el = path_leaf_el(path); ··· 5457 5457 5458 5458 memset(rec, 0, sizeof(*rec)); 5459 5459 ocfs2_cleanup_merge(el, index); 5460 - wants_rotate = 1; 5461 5460 5462 5461 next_free = le16_to_cpu(el->l_next_free_rec); 5463 5462 if (is_rightmost_tree_rec && next_free > 1) {
+2 -3
fs/ocfs2/cluster/heartbeat.c
··· 1456 1456 1457 1457 static int o2hb_read_block_input(struct o2hb_region *reg, 1458 1458 const char *page, 1459 - size_t count, 1460 1459 unsigned long *ret_bytes, 1461 1460 unsigned int *ret_bits) 1462 1461 { ··· 1498 1499 if (reg->hr_bdev) 1499 1500 return -EINVAL; 1500 1501 1501 - status = o2hb_read_block_input(reg, page, count, 1502 - &block_bytes, &block_bits); 1502 + status = o2hb_read_block_input(reg, page, &block_bytes, 1503 + &block_bits); 1503 1504 if (status) 1504 1505 return status; 1505 1506
+1 -1
fs/ocfs2/ocfs2_fs.h
··· 580 580 /*00*/ __u8 es_valid; 581 581 __u8 es_reserved1[3]; 582 582 __le32 es_node_num; 583 - /*10*/ 583 + /*08*/ 584 584 }; 585 585 586 586 /*
+1 -5
fs/ocfs2/slot_map.c
··· 535 535 spin_unlock(&osb->osb_lock); 536 536 537 537 status = ocfs2_update_disk_slot(osb, si, slot_num); 538 - if (status < 0) { 538 + if (status < 0) 539 539 mlog_errno(status); 540 - goto bail; 541 - } 542 540 543 - bail: 544 541 ocfs2_free_slot_info(osb); 545 542 } 546 -
+1 -1
fs/proc/page.c
··· 142 142 143 143 144 144 /* 145 - * Caveats on high order pages: page->_count will only be set 145 + * Caveats on high order pages: page->_refcount will only be set 146 146 * -1 on the head page; SLUB/SLQB do the same for PG_slab; 147 147 * SLOB won't set PG_slab at all on compound pages. 148 148 */
+37 -30
fs/select.c
··· 47 47 48 48 #define MAX_SLACK (100 * NSEC_PER_MSEC) 49 49 50 - static long __estimate_accuracy(struct timespec *tv) 50 + static long __estimate_accuracy(struct timespec64 *tv) 51 51 { 52 52 long slack; 53 53 int divfactor = 1000; ··· 70 70 return slack; 71 71 } 72 72 73 - u64 select_estimate_accuracy(struct timespec *tv) 73 + u64 select_estimate_accuracy(struct timespec64 *tv) 74 74 { 75 75 u64 ret; 76 - struct timespec now; 76 + struct timespec64 now; 77 77 78 78 /* 79 79 * Realtime tasks get a slack of 0 for obvious reasons. ··· 82 82 if (rt_task(current)) 83 83 return 0; 84 84 85 - ktime_get_ts(&now); 86 - now = timespec_sub(*tv, now); 85 + ktime_get_ts64(&now); 86 + now = timespec64_sub(*tv, now); 87 87 ret = __estimate_accuracy(&now); 88 88 if (ret < current->timer_slack_ns) 89 89 return current->timer_slack_ns; ··· 260 260 261 261 /** 262 262 * poll_select_set_timeout - helper function to setup the timeout value 263 - * @to: pointer to timespec variable for the final timeout 263 + * @to: pointer to timespec64 variable for the final timeout 264 264 * @sec: seconds (from user space) 265 265 * @nsec: nanoseconds (from user space) 266 266 * ··· 269 269 * 270 270 * Returns -EINVAL if sec/nsec are not normalized. Otherwise 0. 
271 271 */ 272 - int poll_select_set_timeout(struct timespec *to, long sec, long nsec) 272 + int poll_select_set_timeout(struct timespec64 *to, time64_t sec, long nsec) 273 273 { 274 - struct timespec ts = {.tv_sec = sec, .tv_nsec = nsec}; 274 + struct timespec64 ts = {.tv_sec = sec, .tv_nsec = nsec}; 275 275 276 - if (!timespec_valid(&ts)) 276 + if (!timespec64_valid(&ts)) 277 277 return -EINVAL; 278 278 279 279 /* Optimize for the zero timeout value here */ 280 280 if (!sec && !nsec) { 281 281 to->tv_sec = to->tv_nsec = 0; 282 282 } else { 283 - ktime_get_ts(to); 284 - *to = timespec_add_safe(*to, ts); 283 + ktime_get_ts64(to); 284 + *to = timespec64_add_safe(*to, ts); 285 285 } 286 286 return 0; 287 287 } 288 288 289 - static int poll_select_copy_remaining(struct timespec *end_time, void __user *p, 289 + static int poll_select_copy_remaining(struct timespec64 *end_time, 290 + void __user *p, 290 291 int timeval, int ret) 291 292 { 293 + struct timespec64 rts64; 292 294 struct timespec rts; 293 295 struct timeval rtv; 294 296 ··· 304 302 if (!end_time->tv_sec && !end_time->tv_nsec) 305 303 return ret; 306 304 307 - ktime_get_ts(&rts); 308 - rts = timespec_sub(*end_time, rts); 309 - if (rts.tv_sec < 0) 310 - rts.tv_sec = rts.tv_nsec = 0; 305 + ktime_get_ts64(&rts64); 306 + rts64 = timespec64_sub(*end_time, rts64); 307 + if (rts64.tv_sec < 0) 308 + rts64.tv_sec = rts64.tv_nsec = 0; 309 + 310 + rts = timespec64_to_timespec(rts64); 311 311 312 312 if (timeval) { 313 313 if (sizeof(rtv) > sizeof(rtv.tv_sec) + sizeof(rtv.tv_usec)) 314 314 memset(&rtv, 0, sizeof(rtv)); 315 - rtv.tv_sec = rts.tv_sec; 316 - rtv.tv_usec = rts.tv_nsec / NSEC_PER_USEC; 315 + rtv.tv_sec = rts64.tv_sec; 316 + rtv.tv_usec = rts64.tv_nsec / NSEC_PER_USEC; 317 317 318 318 if (!copy_to_user(p, &rtv, sizeof(rtv))) 319 319 return ret; ··· 400 396 wait->_key |= POLLOUT_SET; 401 397 } 402 398 403 - int do_select(int n, fd_set_bits *fds, struct timespec *end_time) 399 + int do_select(int n, fd_set_bits 
*fds, struct timespec64 *end_time) 404 400 { 405 401 ktime_t expire, *to = NULL; 406 402 struct poll_wqueues table; ··· 526 522 * pointer to the expiry value. 527 523 */ 528 524 if (end_time && !to) { 529 - expire = timespec_to_ktime(*end_time); 525 + expire = timespec64_to_ktime(*end_time); 530 526 to = &expire; 531 527 } 532 528 ··· 549 545 * I'm trying ERESTARTNOHAND which restart only when you want to. 550 546 */ 551 547 int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, 552 - fd_set __user *exp, struct timespec *end_time) 548 + fd_set __user *exp, struct timespec64 *end_time) 553 549 { 554 550 fd_set_bits fds; 555 551 void *bits; ··· 626 622 SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp, 627 623 fd_set __user *, exp, struct timeval __user *, tvp) 628 624 { 629 - struct timespec end_time, *to = NULL; 625 + struct timespec64 end_time, *to = NULL; 630 626 struct timeval tv; 631 627 int ret; 632 628 ··· 652 648 const sigset_t __user *sigmask, size_t sigsetsize) 653 649 { 654 650 sigset_t ksigmask, sigsaved; 655 - struct timespec ts, end_time, *to = NULL; 651 + struct timespec ts; 652 + struct timespec64 ts64, end_time, *to = NULL; 656 653 int ret; 657 654 658 655 if (tsp) { 659 656 if (copy_from_user(&ts, tsp, sizeof(ts))) 660 657 return -EFAULT; 658 + ts64 = timespec_to_timespec64(ts); 661 659 662 660 to = &end_time; 663 - if (poll_select_set_timeout(to, ts.tv_sec, ts.tv_nsec)) 661 + if (poll_select_set_timeout(to, ts64.tv_sec, ts64.tv_nsec)) 664 662 return -EINVAL; 665 663 } 666 664 ··· 785 779 } 786 780 787 781 static int do_poll(struct poll_list *list, struct poll_wqueues *wait, 788 - struct timespec *end_time) 782 + struct timespec64 *end_time) 789 783 { 790 784 poll_table* pt = &wait->pt; 791 785 ktime_t expire, *to = NULL; ··· 860 854 * pointer to the expiry value. 
861 855 */ 862 856 if (end_time && !to) { 863 - expire = timespec_to_ktime(*end_time); 857 + expire = timespec64_to_ktime(*end_time); 864 858 to = &expire; 865 859 } 866 860 ··· 874 868 sizeof(struct pollfd)) 875 869 876 870 int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds, 877 - struct timespec *end_time) 871 + struct timespec64 *end_time) 878 872 { 879 873 struct poll_wqueues table; 880 874 int err = -EFAULT, fdcount, len, size; ··· 942 936 { 943 937 struct pollfd __user *ufds = restart_block->poll.ufds; 944 938 int nfds = restart_block->poll.nfds; 945 - struct timespec *to = NULL, end_time; 939 + struct timespec64 *to = NULL, end_time; 946 940 int ret; 947 941 948 942 if (restart_block->poll.has_timeout) { ··· 963 957 SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds, 964 958 int, timeout_msecs) 965 959 { 966 - struct timespec end_time, *to = NULL; 960 + struct timespec64 end_time, *to = NULL; 967 961 int ret; 968 962 969 963 if (timeout_msecs >= 0) { ··· 999 993 size_t, sigsetsize) 1000 994 { 1001 995 sigset_t ksigmask, sigsaved; 1002 - struct timespec ts, end_time, *to = NULL; 996 + struct timespec ts; 997 + struct timespec64 end_time, *to = NULL; 1003 998 int ret; 1004 999 1005 1000 if (tsp) {
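The fs/select.c hunks above convert the poll/select timeout plumbing from `struct timespec` to `struct timespec64`, so the arithmetic keeps working past 2038 on 32-bit targets. The remaining-time computation in `poll_select_copy_remaining()` can be sketched in userspace like this (the `ts64` type and `remaining()` helper are made-up stand-ins; the kernel uses `timespec64_sub()` on `ktime_get_ts64()` output):

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Toy stand-in for struct timespec64: 64-bit seconds even on 32-bit. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Sketch of the poll_select_copy_remaining() arithmetic: end_time - now,
 * normalized, and clamped at zero when the deadline already passed. */
static struct ts64 remaining(struct ts64 end, struct ts64 now)
{
	struct ts64 r = {
		.tv_sec  = end.tv_sec - now.tv_sec,
		.tv_nsec = end.tv_nsec - now.tv_nsec,
	};

	if (r.tv_nsec < 0) {
		r.tv_sec -= 1;
		r.tv_nsec += NSEC_PER_SEC;
	}
	if (r.tv_sec < 0)
		r.tv_sec = r.tv_nsec = 0;
	return r;
}
```

The compat-facing copies back to userspace still go through 32-bit `struct timespec`/`timeval`, which is why the hunk converts with `timespec64_to_timespec()` before `copy_to_user()`.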
+8
include/asm-generic/pgtable.h
··· 806 806 #define io_remap_pfn_range remap_pfn_range 807 807 #endif 808 808 809 + #ifndef has_transparent_hugepage 810 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE 811 + #define has_transparent_hugepage() 1 812 + #else 813 + #define has_transparent_hugepage() 0 814 + #endif 815 + #endif 816 + 809 817 #endif /* _ASM_GENERIC_PGTABLE_H */
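The asm-generic hunk above gives every architecture a default `has_transparent_hugepage()` that simply reflects the config option; an arch needing a runtime probe defines its own macro before including the generic header. The fallback shape, compiled standalone (the `CONFIG_` symbol is hand-defined here purely for illustration — in a real build it comes from Kconfig):

```c
/* Illustration only: pretend Kconfig enabled THP. */
#define CONFIG_TRANSPARENT_HUGEPAGE

/* The asm-generic fallback: used only if the arch header did not
 * already provide has_transparent_hugepage() itself. */
#ifndef has_transparent_hugepage
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define has_transparent_hugepage() 1
#else
#define has_transparent_hugepage() 0
#endif
#endif
```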
+8 -8
include/linux/bootmem.h
··· 83 83 unsigned long goal); 84 84 extern void *__alloc_bootmem_nopanic(unsigned long size, 85 85 unsigned long align, 86 - unsigned long goal); 86 + unsigned long goal) __malloc; 87 87 extern void *__alloc_bootmem_node(pg_data_t *pgdat, 88 88 unsigned long size, 89 89 unsigned long align, 90 - unsigned long goal); 90 + unsigned long goal) __malloc; 91 91 void *__alloc_bootmem_node_high(pg_data_t *pgdat, 92 92 unsigned long size, 93 93 unsigned long align, 94 - unsigned long goal); 94 + unsigned long goal) __malloc; 95 95 extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat, 96 96 unsigned long size, 97 97 unsigned long align, 98 - unsigned long goal); 98 + unsigned long goal) __malloc; 99 99 void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat, 100 100 unsigned long size, 101 101 unsigned long align, 102 102 unsigned long goal, 103 - unsigned long limit); 103 + unsigned long limit) __malloc; 104 104 extern void *__alloc_bootmem_low(unsigned long size, 105 105 unsigned long align, 106 - unsigned long goal); 106 + unsigned long goal) __malloc; 107 107 void *__alloc_bootmem_low_nopanic(unsigned long size, 108 108 unsigned long align, 109 - unsigned long goal); 109 + unsigned long goal) __malloc; 110 110 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, 111 111 unsigned long size, 112 112 unsigned long align, 113 - unsigned long goal); 113 + unsigned long goal) __malloc; 114 114 115 115 #ifdef CONFIG_NO_BOOTMEM 116 116 /* We are using top down, so it is safe to use 0 here */
+3 -3
include/linux/compaction.h
··· 39 39 40 40 extern int fragmentation_index(struct zone *zone, unsigned int order); 41 41 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, 42 - int alloc_flags, const struct alloc_context *ac, 43 - enum migrate_mode mode, int *contended); 42 + unsigned int alloc_flags, const struct alloc_context *ac, 43 + enum migrate_mode mode, int *contended); 44 44 extern void compact_pgdat(pg_data_t *pgdat, int order); 45 45 extern void reset_isolation_suitable(pg_data_t *pgdat); 46 46 extern unsigned long compaction_suitable(struct zone *zone, int order, 47 - int alloc_flags, int classzone_idx); 47 + unsigned int alloc_flags, int classzone_idx); 48 48 49 49 extern void defer_compaction(struct zone *zone, int order); 50 50 extern bool compaction_deferred(struct zone *zone, int order);
+1
include/linux/compiler-gcc.h
··· 142 142 143 143 #if GCC_VERSION >= 30400 144 144 #define __must_check __attribute__((warn_unused_result)) 145 + #define __malloc __attribute__((__malloc__)) 145 146 #endif 146 147 147 148 #if GCC_VERSION >= 40000
+4
include/linux/compiler.h
··· 357 357 #define __deprecated_for_modules 358 358 #endif 359 359 360 + #ifndef __malloc 361 + #define __malloc 362 + #endif 363 + 360 364 /* 361 365 * Allow us to avoid 'defined but not used' warnings on functions and data, 362 366 * as well as force them to be emitted to the assembly file.
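The compiler-gcc.h/compiler.h pair above introduces `__malloc` — GCC's `__attribute__((__malloc__))`, which promises the returned pointer aliases no other live pointer, with an empty fallback for other compilers — and the bootmem, devres, and kasprintf families get annotated with it throughout this series. The same shape in a userspace sketch (`ATTR_MALLOC` and `zalloc()` are made-up names):

```c
#include <stdlib.h>

/* Same shape as the kernel's __malloc: the GCC attribute when
 * available, an empty definition otherwise. */
#ifdef __GNUC__
#define ATTR_MALLOC __attribute__((__malloc__))
#else
#define ATTR_MALLOC
#endif

/* The annotation lets the optimizer assume the result does not alias
 * anything it already has cached, so values stay in registers across
 * the call. */
static void *zalloc(size_t size) ATTR_MALLOC;

static void *zalloc(size_t size)
{
	return calloc(1, size);
}
```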
+28 -14
include/linux/cpuset.h
··· 16 16 17 17 #ifdef CONFIG_CPUSETS 18 18 19 - extern struct static_key cpusets_enabled_key; 19 + extern struct static_key_false cpusets_enabled_key; 20 20 static inline bool cpusets_enabled(void) 21 21 { 22 - return static_key_false(&cpusets_enabled_key); 22 + return static_branch_unlikely(&cpusets_enabled_key); 23 23 } 24 24 25 25 static inline int nr_cpusets(void) 26 26 { 27 27 /* jump label reference count + the top-level cpuset */ 28 - return static_key_count(&cpusets_enabled_key) + 1; 28 + return static_key_count(&cpusets_enabled_key.key) + 1; 29 29 } 30 30 31 31 static inline void cpuset_inc(void) 32 32 { 33 - static_key_slow_inc(&cpusets_enabled_key); 33 + static_branch_inc(&cpusets_enabled_key); 34 34 } 35 35 36 36 static inline void cpuset_dec(void) 37 37 { 38 - static_key_slow_dec(&cpusets_enabled_key); 38 + static_branch_dec(&cpusets_enabled_key); 39 39 } 40 40 41 41 extern int cpuset_init(void); ··· 48 48 void cpuset_init_current_mems_allowed(void); 49 49 int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask); 50 50 51 - extern int __cpuset_node_allowed(int node, gfp_t gfp_mask); 51 + extern bool __cpuset_node_allowed(int node, gfp_t gfp_mask); 52 52 53 - static inline int cpuset_node_allowed(int node, gfp_t gfp_mask) 53 + static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask) 54 54 { 55 - return nr_cpusets() <= 1 || __cpuset_node_allowed(node, gfp_mask); 55 + if (cpusets_enabled()) 56 + return __cpuset_node_allowed(node, gfp_mask); 57 + return true; 56 58 } 57 59 58 - static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask) 60 + static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask) 59 61 { 60 - return cpuset_node_allowed(zone_to_nid(z), gfp_mask); 62 + return __cpuset_node_allowed(zone_to_nid(z), gfp_mask); 63 + } 64 + 65 + static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask) 66 + { 67 + if (cpusets_enabled()) 68 + return __cpuset_zone_allowed(z, gfp_mask); 69 + return true; 61 70 } 
62 71 63 72 extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1, ··· 181 172 return 1; 182 173 } 183 174 184 - static inline int cpuset_node_allowed(int node, gfp_t gfp_mask) 175 + static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask) 185 176 { 186 - return 1; 177 + return true; 187 178 } 188 179 189 - static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask) 180 + static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask) 190 181 { 191 - return 1; 182 + return true; 183 + } 184 + 185 + static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask) 186 + { 187 + return true; 192 188 } 193 189 194 190 static inline int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
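The cpuset.h conversion above moves `cpusets_enabled_key` to the `static_branch_*` API and hoists the `cpusets_enabled()` test into the inline wrappers, so a kernel with no cpusets in use never calls into `__cpuset_node_allowed()` at all. A plain bool models the control flow here — in the kernel the guard is a static branch patched in the instruction stream at runtime, not a load-and-test:

```c
#include <stdbool.h>

/* Models static_branch_unlikely(&cpusets_enabled_key); the kernel
 * patches this branch at runtime rather than testing a variable. */
static bool cpusets_enabled_flag;
static int slow_path_calls;

/* Stand-in for __cpuset_node_allowed(): the expensive check. */
static bool slow_node_allowed(int node)
{
	slow_path_calls++;
	return node != 3;	/* pretend node 3 is walled off */
}

/* Shape of the new cpuset_node_allowed(): unconditionally true on the
 * fast path unless cpusets are actually in use. */
static bool node_allowed(int node)
{
	if (cpusets_enabled_flag)
		return slow_node_allowed(node);
	return true;
}
```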
+10 -7
include/linux/debugobjects.h
··· 38 38 * @name: name of the object typee 39 39 * @debug_hint: function returning address, which have associated 40 40 * kernel symbol, to allow identify the object 41 + * @is_static_object return true if the obj is static, otherwise return false 41 42 * @fixup_init: fixup function, which is called when the init check 42 - * fails 43 + * fails. All fixup functions must return true if fixup 44 + * was successful, otherwise return false 43 45 * @fixup_activate: fixup function, which is called when the activate check 44 46 * fails 45 47 * @fixup_destroy: fixup function, which is called when the destroy check ··· 53 51 */ 54 52 struct debug_obj_descr { 55 53 const char *name; 56 - void *(*debug_hint) (void *addr); 57 - int (*fixup_init) (void *addr, enum debug_obj_state state); 58 - int (*fixup_activate) (void *addr, enum debug_obj_state state); 59 - int (*fixup_destroy) (void *addr, enum debug_obj_state state); 60 - int (*fixup_free) (void *addr, enum debug_obj_state state); 61 - int (*fixup_assert_init)(void *addr, enum debug_obj_state state); 54 + void *(*debug_hint)(void *addr); 55 + bool (*is_static_object)(void *addr); 56 + bool (*fixup_init)(void *addr, enum debug_obj_state state); 57 + bool (*fixup_activate)(void *addr, enum debug_obj_state state); 58 + bool (*fixup_destroy)(void *addr, enum debug_obj_state state); 59 + bool (*fixup_free)(void *addr, enum debug_obj_state state); 60 + bool (*fixup_assert_init)(void *addr, enum debug_obj_state state); 62 61 }; 63 62 64 63 #ifdef CONFIG_DEBUG_OBJECTS
+6 -6
include/linux/device.h
··· 609 609 610 610 #ifdef CONFIG_DEBUG_DEVRES 611 611 extern void *__devres_alloc_node(dr_release_t release, size_t size, gfp_t gfp, 612 - int nid, const char *name); 612 + int nid, const char *name) __malloc; 613 613 #define devres_alloc(release, size, gfp) \ 614 614 __devres_alloc_node(release, size, gfp, NUMA_NO_NODE, #release) 615 615 #define devres_alloc_node(release, size, gfp, nid) \ 616 616 __devres_alloc_node(release, size, gfp, nid, #release) 617 617 #else 618 618 extern void *devres_alloc_node(dr_release_t release, size_t size, gfp_t gfp, 619 - int nid); 619 + int nid) __malloc; 620 620 static inline void *devres_alloc(dr_release_t release, size_t size, gfp_t gfp) 621 621 { 622 622 return devres_alloc_node(release, size, gfp, NUMA_NO_NODE); ··· 648 648 extern int devres_release_group(struct device *dev, void *id); 649 649 650 650 /* managed devm_k.alloc/kfree for device drivers */ 651 - extern void *devm_kmalloc(struct device *dev, size_t size, gfp_t gfp); 651 + extern void *devm_kmalloc(struct device *dev, size_t size, gfp_t gfp) __malloc; 652 652 extern __printf(3, 0) 653 653 char *devm_kvasprintf(struct device *dev, gfp_t gfp, const char *fmt, 654 - va_list ap); 654 + va_list ap) __malloc; 655 655 extern __printf(3, 4) 656 - char *devm_kasprintf(struct device *dev, gfp_t gfp, const char *fmt, ...); 656 + char *devm_kasprintf(struct device *dev, gfp_t gfp, const char *fmt, ...) __malloc; 657 657 static inline void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp) 658 658 { 659 659 return devm_kmalloc(dev, size, gfp | __GFP_ZERO); ··· 671 671 return devm_kmalloc_array(dev, n, size, flags | __GFP_ZERO); 672 672 } 673 673 extern void devm_kfree(struct device *dev, void *p); 674 - extern char *devm_kstrdup(struct device *dev, const char *s, gfp_t gfp); 674 + extern char *devm_kstrdup(struct device *dev, const char *s, gfp_t gfp) __malloc; 675 675 extern void *devm_kmemdup(struct device *dev, const void *src, size_t len, 676 676 gfp_t gfp); 677 677
-2
include/linux/fsnotify_backend.h
··· 359 359 extern void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group); 360 360 /* run all the marks in a group, and clear all of the marks where mark->flags & flags is true*/ 361 361 extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, unsigned int flags); 362 - /* run all the marks in a group, and flag them to be freed */ 363 - extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group); 364 362 extern void fsnotify_get_mark(struct fsnotify_mark *mark); 365 363 extern void fsnotify_put_mark(struct fsnotify_mark *mark); 366 364 extern void fsnotify_unmount_inodes(struct super_block *sb);
+1 -3
include/linux/huge_mm.h
··· 28 28 extern int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, 29 29 unsigned long addr, unsigned long end, 30 30 unsigned char *vec); 31 - extern bool move_huge_pmd(struct vm_area_struct *vma, 32 - struct vm_area_struct *new_vma, 33 - unsigned long old_addr, 31 + extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, 34 32 unsigned long new_addr, unsigned long old_end, 35 33 pmd_t *old_pmd, pmd_t *new_pmd); 36 34 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
+1
include/linux/hugetlb.h
··· 338 338 /* arch callback */ 339 339 int __init alloc_bootmem_huge_page(struct hstate *h); 340 340 341 + void __init hugetlb_bad_size(void); 341 342 void __init hugetlb_add_hstate(unsigned order); 342 343 struct hstate *size_to_hstate(unsigned long size); 343 344
+3 -3
include/linux/hugetlb_inline.h
··· 5 5 6 6 #include <linux/mm.h> 7 7 8 - static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) 8 + static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma) 9 9 { 10 10 return !!(vma->vm_flags & VM_HUGETLB); 11 11 } 12 12 13 13 #else 14 14 15 - static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) 15 + static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma) 16 16 { 17 - return 0; 17 + return false; 18 18 } 19 19 20 20 #endif
+2 -2
include/linux/kernel.h
··· 412 412 int scnprintf(char *buf, size_t size, const char *fmt, ...); 413 413 extern __printf(3, 0) 414 414 int vscnprintf(char *buf, size_t size, const char *fmt, va_list args); 415 - extern __printf(2, 3) 415 + extern __printf(2, 3) __malloc 416 416 char *kasprintf(gfp_t gfp, const char *fmt, ...); 417 - extern __printf(2, 0) 417 + extern __printf(2, 0) __malloc 418 418 char *kvasprintf(gfp_t gfp, const char *fmt, va_list args); 419 419 extern __printf(2, 0) 420 420 const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args);
-6
include/linux/memcontrol.h
··· 658 658 return 0; 659 659 } 660 660 661 - static inline void 662 - mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, 663 - int increment) 664 - { 665 - } 666 - 667 661 static inline unsigned long 668 662 mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, 669 663 int nid, unsigned int lru_mask)
+3 -3
include/linux/memory_hotplug.h
··· 247 247 248 248 #ifdef CONFIG_MEMORY_HOTREMOVE 249 249 250 - extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); 250 + extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); 251 251 extern void try_offline_node(int nid); 252 252 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); 253 253 extern void remove_memory(int nid, u64 start, u64 size); 254 254 255 255 #else 256 - static inline int is_mem_section_removable(unsigned long pfn, 256 + static inline bool is_mem_section_removable(unsigned long pfn, 257 257 unsigned long nr_pages) 258 258 { 259 - return 0; 259 + return false; 260 260 } 261 261 262 262 static inline void try_offline_node(int nid) {}
+11 -5
include/linux/mempolicy.h
··· 172 172 extern void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); 173 173 174 174 /* Check if a vma is migratable */ 175 - static inline int vma_migratable(struct vm_area_struct *vma) 175 + static inline bool vma_migratable(struct vm_area_struct *vma) 176 176 { 177 177 if (vma->vm_flags & (VM_IO | VM_PFNMAP)) 178 - return 0; 178 + return false; 179 179 180 180 #ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION 181 181 if (vma->vm_flags & VM_HUGETLB) 182 - return 0; 182 + return false; 183 183 #endif 184 184 185 185 /* ··· 190 190 if (vma->vm_file && 191 191 gfp_zone(mapping_gfp_mask(vma->vm_file->f_mapping)) 192 192 < policy_zone) 193 - return 0; 194 - return 1; 193 + return false; 194 + return true; 195 195 } 196 196 197 197 extern int mpol_misplaced(struct page *, struct vm_area_struct *, unsigned long); ··· 226 226 227 227 static inline void mpol_free_shared_policy(struct shared_policy *p) 228 228 { 229 + } 230 + 231 + static inline struct mempolicy * 232 + mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx) 233 + { 234 + return NULL; 229 235 } 230 236 231 237 #define vma_policy(vma) NULL
+2 -1
include/linux/mempool.h
··· 5 5 #define _LINUX_MEMPOOL_H 6 6 7 7 #include <linux/wait.h> 8 + #include <linux/compiler.h> 8 9 9 10 struct kmem_cache; 10 11 ··· 32 31 33 32 extern int mempool_resize(mempool_t *pool, int new_min_nr); 34 33 extern void mempool_destroy(mempool_t *pool); 35 - extern void * mempool_alloc(mempool_t *pool, gfp_t gfp_mask); 34 + extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc; 36 35 extern void mempool_free(void *element, mempool_t *pool); 37 36 38 37 /*
+5 -27
include/linux/mm.h
··· 447 447 * On nommu, vmalloc/vfree wrap through kmalloc/kfree directly, so there 448 448 * is no special casing required. 449 449 */ 450 - static inline int is_vmalloc_addr(const void *x) 450 + static inline bool is_vmalloc_addr(const void *x) 451 451 { 452 452 #ifdef CONFIG_MMU 453 453 unsigned long addr = (unsigned long)x; 454 454 455 455 return addr >= VMALLOC_START && addr < VMALLOC_END; 456 456 #else 457 - return 0; 457 + return false; 458 458 #endif 459 459 } 460 460 #ifdef CONFIG_MMU ··· 734 734 page = compound_head(page); 735 735 /* 736 736 * Getting a normal page or the head of a compound page 737 - * requires to already have an elevated page->_count. 737 + * requires to already have an elevated page->_refcount. 738 738 */ 739 739 VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page); 740 740 page_ref_inc(page); ··· 850 850 851 851 static inline void page_cpupid_reset_last(struct page *page) 852 852 { 853 - int cpupid = (1 << LAST_CPUPID_SHIFT) - 1; 854 - 855 - page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT); 856 - page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT; 853 + page->flags |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT; 857 854 } 858 855 #endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */ 859 856 #else /* !CONFIG_NUMA_BALANCING */ ··· 1029 1032 return page->index; 1030 1033 } 1031 1034 1032 - /* 1033 - * Return true if this page is mapped into pagetables. 1034 - * For compound page it returns true if any subpage of compound page is mapped. 
1035 - */ 1036 - static inline bool page_mapped(struct page *page) 1037 - { 1038 - int i; 1039 - if (likely(!PageCompound(page))) 1040 - return atomic_read(&page->_mapcount) >= 0; 1041 - page = compound_head(page); 1042 - if (atomic_read(compound_mapcount_ptr(page)) >= 0) 1043 - return true; 1044 - if (PageHuge(page)) 1045 - return false; 1046 - for (i = 0; i < hpage_nr_pages(page); i++) { 1047 - if (atomic_read(&page[i]._mapcount) >= 0) 1048 - return true; 1049 - } 1050 - return false; 1051 - } 1035 + bool page_mapped(struct page *page); 1052 1036 1053 1037 /* 1054 1038 * Return true only if the page has been allocated with
+18 -6
include/linux/mm_inline.h
··· 22 22 return !PageSwapBacked(page); 23 23 } 24 24 25 + static __always_inline void __update_lru_size(struct lruvec *lruvec, 26 + enum lru_list lru, int nr_pages) 27 + { 28 + __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, nr_pages); 29 + } 30 + 31 + static __always_inline void update_lru_size(struct lruvec *lruvec, 32 + enum lru_list lru, int nr_pages) 33 + { 34 + #ifdef CONFIG_MEMCG 35 + mem_cgroup_update_lru_size(lruvec, lru, nr_pages); 36 + #else 37 + __update_lru_size(lruvec, lru, nr_pages); 38 + #endif 39 + } 40 + 25 41 static __always_inline void add_page_to_lru_list(struct page *page, 26 42 struct lruvec *lruvec, enum lru_list lru) 27 43 { 28 - int nr_pages = hpage_nr_pages(page); 29 - mem_cgroup_update_lru_size(lruvec, lru, nr_pages); 44 + update_lru_size(lruvec, lru, hpage_nr_pages(page)); 30 45 list_add(&page->lru, &lruvec->lists[lru]); 31 - __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, nr_pages); 32 46 } 33 47 34 48 static __always_inline void del_page_from_lru_list(struct page *page, 35 49 struct lruvec *lruvec, enum lru_list lru) 36 50 { 37 - int nr_pages = hpage_nr_pages(page); 38 - mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); 39 51 list_del(&page->lru); 40 - __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages); 52 + update_lru_size(lruvec, lru, -hpage_nr_pages(page)); 41 53 } 42 54 43 55 /**
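The mm_inline.h hunk above folds the memcg and per-zone LRU accounting into one `update_lru_size()` helper taking a signed page delta, so add and delete no longer open-code two counter updates each. A toy model of that consolidation (a single counter stands in for the per-zone vmstat):

```c
/* One signed-delta helper in place of paired add/sub bookkeeping;
 * a single counter stands in for the per-zone LRU stat. */
static long lru_pages;

static void update_lru_size(int nr_pages)
{
	lru_pages += nr_pages;
}

static void add_page_to_lru(int nr_pages)
{
	update_lru_size(nr_pages);
	/* list_add(&page->lru, ...) would follow here */
}

static void del_page_from_lru(int nr_pages)
{
	/* list_del(&page->lru) would come first here */
	update_lru_size(-nr_pages);
}
```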
+9 -5
include/linux/mm_types.h
··· 73 73 unsigned long counters; 74 74 #else 75 75 /* 76 - * Keep _count separate from slub cmpxchg_double data. 77 - * As the rest of the double word is protected by 78 - * slab_lock but _count is not. 76 + * Keep _refcount separate from slub cmpxchg_double 77 + * data. As the rest of the double word is protected by 78 + * slab_lock but _refcount is not. 79 79 */ 80 80 unsigned counters; 81 81 #endif ··· 97 97 }; 98 98 int units; /* SLOB */ 99 99 }; 100 - atomic_t _count; /* Usage count, see below. */ 100 + /* 101 + * Usage count, *USE WRAPPER FUNCTION* 102 + * when manual accounting. See page_ref.h 103 + */ 104 + atomic_t _refcount; 101 105 }; 102 106 unsigned int active; /* SLAB */ 103 107 }; ··· 252 248 __u32 offset; 253 249 #endif 254 250 /* we maintain a pagecount bias, so that we dont dirty cache line 255 - * containing page->_count every time we allocate a fragment. 251 + * containing page->_refcount every time we allocate a fragment. 256 252 */ 257 253 unsigned int pagecnt_bias; 258 254 bool pfmemalloc;
+25 -21
include/linux/mmzone.h
··· 85 85 get_pfnblock_flags_mask(page, page_to_pfn(page), \ 86 86 PB_migrate_end, MIGRATETYPE_MASK) 87 87 88 - static inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn) 89 - { 90 - BUILD_BUG_ON(PB_migrate_end - PB_migrate != 2); 91 - return get_pfnblock_flags_mask(page, pfn, PB_migrate_end, 92 - MIGRATETYPE_MASK); 93 - } 94 - 95 88 struct free_area { 96 89 struct list_head free_list[MIGRATE_TYPES]; 97 90 unsigned long nr_free; ··· 740 747 void build_all_zonelists(pg_data_t *pgdat, struct zone *zone); 741 748 void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx); 742 749 bool zone_watermark_ok(struct zone *z, unsigned int order, 743 - unsigned long mark, int classzone_idx, int alloc_flags); 750 + unsigned long mark, int classzone_idx, 751 + unsigned int alloc_flags); 744 752 bool zone_watermark_ok_safe(struct zone *z, unsigned int order, 745 753 unsigned long mark, int classzone_idx); 746 754 enum memmap_context { ··· 822 828 static inline int is_highmem(struct zone *zone) 823 829 { 824 830 #ifdef CONFIG_HIGHMEM 825 - int zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones; 826 - return zone_off == ZONE_HIGHMEM * sizeof(*zone) || 827 - (zone_off == ZONE_MOVABLE * sizeof(*zone) && 828 - zone_movable_is_highmem()); 831 + return is_highmem_idx(zone_idx(zone)); 829 832 #else 830 833 return 0; 831 834 #endif ··· 913 922 #endif /* CONFIG_NUMA */ 914 923 } 915 924 925 + struct zoneref *__next_zones_zonelist(struct zoneref *z, 926 + enum zone_type highest_zoneidx, 927 + nodemask_t *nodes); 928 + 916 929 /** 917 930 * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point 918 931 * @z - The cursor used as a starting point for the search ··· 929 934 * being examined. It should be advanced by one before calling 930 935 * next_zones_zonelist again. 
931 936 */ 932 - struct zoneref *next_zones_zonelist(struct zoneref *z, 937 + static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z, 933 938 enum zone_type highest_zoneidx, 934 - nodemask_t *nodes); 939 + nodemask_t *nodes) 940 + { 941 + if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx)) 942 + return z; 943 + return __next_zones_zonelist(z, highest_zoneidx, nodes); 944 + } 935 945 936 946 /** 937 947 * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist ··· 952 952 */ 953 953 static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, 954 954 enum zone_type highest_zoneidx, 955 - nodemask_t *nodes, 956 - struct zone **zone) 955 + nodemask_t *nodes) 957 956 { 958 - struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs, 957 + return next_zones_zonelist(zonelist->_zonerefs, 959 958 highest_zoneidx, nodes); 960 - *zone = zonelist_zone(z); 961 - return z; 962 959 } 963 960 964 961 /** ··· 970 973 * within a given nodemask 971 974 */ 972 975 #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ 973 - for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \ 976 + for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ 974 977 zone; \ 975 978 z = next_zones_zonelist(++z, highidx, nodemask), \ 976 - zone = zonelist_zone(z)) \ 979 + zone = zonelist_zone(z)) 980 + 981 + #define for_next_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ 982 + for (zone = z->zone; \ 983 + zone; \ 984 + z = next_zones_zonelist(++z, highidx, nodemask), \ 985 + zone = zonelist_zone(z)) 986 + 977 987 978 988 /** 979 989 * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
+10 -1
include/linux/nodemask.h
··· 43 43 * 44 44 * int first_node(mask) Number lowest set bit, or MAX_NUMNODES 45 45 * int next_node(node, mask) Next node past 'node', or MAX_NUMNODES 46 + * int next_node_in(node, mask) Next node past 'node', or wrap to first, 47 + * or MAX_NUMNODES 46 48 * int first_unset_node(mask) First node not set in mask, or 47 - * MAX_NUMNODES. 49 + * MAX_NUMNODES 48 50 * 49 51 * nodemask_t nodemask_of_node(node) Return nodemask with bit 'node' set 50 52 * NODE_MASK_ALL Initializer - all bits set ··· 260 258 { 261 259 return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); 262 260 } 261 + 262 + /* 263 + * Find the next present node in src, starting after node n, wrapping around to 264 + * the first node in src if needed. Returns MAX_NUMNODES if src is empty. 265 + */ 266 + #define next_node_in(n, src) __next_node_in((n), &(src)) 267 + int __next_node_in(int node, const nodemask_t *srcp); 263 268 264 269 static inline void init_nodemask_of_node(nodemask_t *mask, int node) 265 270 {
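`next_node_in()`, added above, is `next_node()` with wrap-around: past the last set node it restarts at the first, and only an empty mask yields `MAX_NUMNODES`. Its semantics over a toy 8-bit node mask (names are made up; the kernel version works on `nodemask_t` bitmaps):

```c
#define MAX_NODES 8

/* Toy next_node_in(): next set bit after 'node', wrapping to the
 * lowest set bit; MAX_NODES signals an empty mask. */
static int toy_next_node_in(int node, unsigned int mask)
{
	for (int i = 1; i <= MAX_NODES; i++) {
		int n = (node + i) % MAX_NODES;

		if (mask & (1u << n))
			return n;
	}
	return MAX_NODES;
}
```

Note that a mask with a single node maps that node back to itself, which is what makes the helper safe for round-robin interleaving callers.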
+8
include/linux/oom.h
··· 72 72 73 73 extern void mark_oom_victim(struct task_struct *tsk); 74 74 75 + #ifdef CONFIG_MMU 76 + extern void try_oom_reaper(struct task_struct *tsk); 77 + #else 78 + static inline void try_oom_reaper(struct task_struct *tsk) 79 + { 80 + } 81 + #endif 82 + 75 83 extern unsigned long oom_badness(struct task_struct *p, 76 84 struct mem_cgroup *memcg, const nodemask_t *nodemask, 77 85 unsigned long totalpages);
-5
include/linux/padata.h
··· 175 175 extern void padata_do_serial(struct padata_priv *padata); 176 176 extern int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, 177 177 cpumask_var_t cpumask); 178 - extern int padata_set_cpumasks(struct padata_instance *pinst, 179 - cpumask_var_t pcpumask, 180 - cpumask_var_t cbcpumask); 181 - extern int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask); 182 - extern int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask); 183 178 extern int padata_start(struct padata_instance *pinst); 184 179 extern void padata_stop(struct padata_instance *pinst); 185 180 extern int padata_register_cpumask_notifier(struct padata_instance *pinst,
+6 -1
include/linux/page-flags.h
··· 371 371 #define PAGE_MAPPING_KSM 2 372 372 #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM) 373 373 374 + static __always_inline int PageAnonHead(struct page *page) 375 + { 376 + return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0; 377 + } 378 + 374 379 static __always_inline int PageAnon(struct page *page) 375 380 { 376 381 page = compound_head(page); 377 - return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0; 382 + return PageAnonHead(page); 378 383 } 379 384 380 385 #ifdef CONFIG_KSM
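`PageAnonHead()` above works because `page->mapping` is a tagged pointer: `struct address_space` is word-aligned, so the low bits are free to carry `PAGE_MAPPING_ANON`/`PAGE_MAPPING_KSM`, and the head-page variant lets callers that already hold the head skip the `compound_head()` lookup. The tagging trick itself, in a standalone sketch with made-up names:

```c
#include <stdint.h>

#define MAPPING_ANON 0x1UL

/* Word-aligned target type: its low pointer bits are always zero, so
 * one of them can carry a flag (as PAGE_MAPPING_ANON does). */
struct address_space_toy {
	long dummy;
};

static void *tag_anon(struct address_space_toy *mapping)
{
	return (void *)((uintptr_t)mapping | MAPPING_ANON);
}

static int is_anon(const void *mapping)
{
	return ((uintptr_t)mapping & MAPPING_ANON) != 0;
}

static struct address_space_toy *untag(void *mapping)
{
	return (struct address_space_toy *)((uintptr_t)mapping & ~MAPPING_ANON);
}
```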
+13 -13
include/linux/page_ref.h
··· 63 63 64 64 static inline int page_ref_count(struct page *page) 65 65 { 66 - return atomic_read(&page->_count); 66 + return atomic_read(&page->_refcount); 67 67 } 68 68 69 69 static inline int page_count(struct page *page) 70 70 { 71 - return atomic_read(&compound_head(page)->_count); 71 + return atomic_read(&compound_head(page)->_refcount); 72 72 } 73 73 74 74 static inline void set_page_count(struct page *page, int v) 75 75 { 76 - atomic_set(&page->_count, v); 76 + atomic_set(&page->_refcount, v); 77 77 if (page_ref_tracepoint_active(__tracepoint_page_ref_set)) 78 78 __page_ref_set(page, v); 79 79 } ··· 89 89 90 90 static inline void page_ref_add(struct page *page, int nr) 91 91 { 92 - atomic_add(nr, &page->_count); 92 + atomic_add(nr, &page->_refcount); 93 93 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod)) 94 94 __page_ref_mod(page, nr); 95 95 } 96 96 97 97 static inline void page_ref_sub(struct page *page, int nr) 98 98 { 99 - atomic_sub(nr, &page->_count); 99 + atomic_sub(nr, &page->_refcount); 100 100 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod)) 101 101 __page_ref_mod(page, -nr); 102 102 } 103 103 104 104 static inline void page_ref_inc(struct page *page) 105 105 { 106 - atomic_inc(&page->_count); 106 + atomic_inc(&page->_refcount); 107 107 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod)) 108 108 __page_ref_mod(page, 1); 109 109 } 110 110 111 111 static inline void page_ref_dec(struct page *page) 112 112 { 113 - atomic_dec(&page->_count); 113 + atomic_dec(&page->_refcount); 114 114 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod)) 115 115 __page_ref_mod(page, -1); 116 116 } 117 117 118 118 static inline int page_ref_sub_and_test(struct page *page, int nr) 119 119 { 120 - int ret = atomic_sub_and_test(nr, &page->_count); 120 + int ret = atomic_sub_and_test(nr, &page->_refcount); 121 121 122 122 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_test)) 123 123 __page_ref_mod_and_test(page, -nr, ret); ··· 
126 126 127 127 static inline int page_ref_dec_and_test(struct page *page) 128 128 { 129 - int ret = atomic_dec_and_test(&page->_count); 129 + int ret = atomic_dec_and_test(&page->_refcount); 130 130 131 131 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_test)) 132 132 __page_ref_mod_and_test(page, -1, ret); ··· 135 135 136 136 static inline int page_ref_dec_return(struct page *page) 137 137 { 138 - int ret = atomic_dec_return(&page->_count); 138 + int ret = atomic_dec_return(&page->_refcount); 139 139 140 140 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_return)) 141 141 __page_ref_mod_and_return(page, -1, ret); ··· 144 144 145 145 static inline int page_ref_add_unless(struct page *page, int nr, int u) 146 146 { 147 - int ret = atomic_add_unless(&page->_count, nr, u); 147 + int ret = atomic_add_unless(&page->_refcount, nr, u); 148 148 149 149 if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_unless)) 150 150 __page_ref_mod_unless(page, nr, ret); ··· 153 153 154 154 static inline int page_ref_freeze(struct page *page, int count) 155 155 { 156 - int ret = likely(atomic_cmpxchg(&page->_count, count, 0) == count); 156 + int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count); 157 157 158 158 if (page_ref_tracepoint_active(__tracepoint_page_ref_freeze)) 159 159 __page_ref_freeze(page, count, ret); ··· 165 165 VM_BUG_ON_PAGE(page_count(page) != 0, page); 166 166 VM_BUG_ON(count == 0); 167 167 168 - atomic_set(&page->_count, count); 168 + atomic_set(&page->_refcount, count); 169 169 if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze)) 170 170 __page_ref_unfreeze(page, count); 171 171 }
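The page_ref.h hunks are the mechanical half of renaming `page->_count` to `page->_refcount`; the point of routing every access through these wrappers is that each modification passes one place where a tracepoint can fire. The pattern, reduced to a toy page with an event counter standing in for the `__page_ref_*` tracepoints:

```c
/* Toy page: _refcount plus an event counter in place of the
 * __page_ref_* tracepoints that observe every modification. */
struct toy_page {
	int _refcount;
};

static int ref_events;

static void toy_page_ref_inc(struct toy_page *page)
{
	page->_refcount++;
	ref_events++;
}

static int toy_page_ref_dec_and_test(struct toy_page *page)
{
	ref_events++;
	return --page->_refcount == 0;
}
```

This is also why mm_types.h (above) grows the "*USE WRAPPER FUNCTION* when manual accounting" comment next to `_refcount`: a raw `atomic_*` on the field would bypass the instrumentation.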
+4 -4
include/linux/pagemap.h
··· 90 90 91 91 /* 92 92 * speculatively take a reference to a page. 93 - * If the page is free (_count == 0), then _count is untouched, and 0 94 - * is returned. Otherwise, _count is incremented by 1 and 1 is returned. 93 + * If the page is free (_refcount == 0), then _refcount is untouched, and 0 94 + * is returned. Otherwise, _refcount is incremented by 1 and 1 is returned. 95 95 * 96 96 * This function must be called inside the same rcu_read_lock() section as has 97 97 * been used to lookup the page in the pagecache radix-tree (or page table): 98 - * this allows allocators to use a synchronize_rcu() to stabilize _count. 98 + * this allows allocators to use a synchronize_rcu() to stabilize _refcount. 99 99 * 100 100 * Unless an RCU grace period has passed, the count of all pages coming out 101 101 * of the allocator must be considered unstable. page_count may return higher ··· 111 111 * 2. conditionally increment refcount 112 112 * 3. check the page is still in pagecache (if no, goto 1) 113 113 * 114 - * Remove-side that cares about stability of _count (eg. reclaim) has the 114 + * Remove-side that cares about stability of _refcount (eg. reclaim) has the 115 115 * following (with tree_lock held for write): 116 116 * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg) 117 117 * B. remove page from pagecache
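The comment block above describes the lookup-side protocol in prose. As a rough userspace sketch of step 2 (this is not kernel code: the struct, the C11 atomics, and the function name below are stand-ins for the kernel's `struct page`, `atomic_t`, and the real `page_cache_get_speculative()` machinery), the conditional increment that refuses to touch a free page looks like:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the refcount matters here. */
struct page { atomic_int refcount; };

/* Bump the refcount only if it is currently non-zero (i.e. the page
 * is not free). On a compare-exchange race the loop re-reads the
 * current count and retries; a zero count means the caller must
 * restart its lookup, mirroring the protocol described above. */
static bool get_page_unless_zero(struct page *p)
{
	int c = atomic_load(&p->refcount);

	while (c > 0) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &c, c + 1))
			return true;
	}
	return false;	/* page was free; lookup must be retried */
}
```

The point of the sketch is that the increment and the zero-check are one atomic step, which is what lets the remove side stabilize `_refcount` with a single `atomic_cmpxchg` to 0.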
+6 -5
include/linux/poll.h
··· 96 96 extern void poll_freewait(struct poll_wqueues *pwq); 97 97 extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state, 98 98 ktime_t *expires, unsigned long slack); 99 - extern u64 select_estimate_accuracy(struct timespec *tv); 99 + extern u64 select_estimate_accuracy(struct timespec64 *tv); 100 100 101 101 102 102 static inline int poll_schedule(struct poll_wqueues *pwq, int state) ··· 153 153 154 154 #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1) 155 155 156 - extern int do_select(int n, fd_set_bits *fds, struct timespec *end_time); 156 + extern int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time); 157 157 extern int do_sys_poll(struct pollfd __user * ufds, unsigned int nfds, 158 - struct timespec *end_time); 158 + struct timespec64 *end_time); 159 159 extern int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, 160 - fd_set __user *exp, struct timespec *end_time); 160 + fd_set __user *exp, struct timespec64 *end_time); 161 161 162 - extern int poll_select_set_timeout(struct timespec *to, long sec, long nsec); 162 + extern int poll_select_set_timeout(struct timespec64 *to, time64_t sec, 163 + long nsec); 163 164 164 165 #endif /* _LINUX_POLL_H */
+8 -8
include/linux/slab.h
··· 315 315 } 316 316 #endif /* !CONFIG_SLOB */ 317 317 318 - void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment; 319 - void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags) __assume_slab_alignment; 318 + void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc; 319 + void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags) __assume_slab_alignment __malloc; 320 320 void kmem_cache_free(struct kmem_cache *, void *); 321 321 322 322 /* ··· 339 339 } 340 340 341 341 #ifdef CONFIG_NUMA 342 - void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment; 343 - void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node) __assume_slab_alignment; 342 + void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment __malloc; 343 + void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node) __assume_slab_alignment __malloc; 344 344 #else 345 345 static __always_inline void *__kmalloc_node(size_t size, gfp_t flags, int node) 346 346 { ··· 354 354 #endif 355 355 356 356 #ifdef CONFIG_TRACING 357 - extern void *kmem_cache_alloc_trace(struct kmem_cache *, gfp_t, size_t) __assume_slab_alignment; 357 + extern void *kmem_cache_alloc_trace(struct kmem_cache *, gfp_t, size_t) __assume_slab_alignment __malloc; 358 358 359 359 #ifdef CONFIG_NUMA 360 360 extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, 361 361 gfp_t gfpflags, 362 - int node, size_t size) __assume_slab_alignment; 362 + int node, size_t size) __assume_slab_alignment __malloc; 363 363 #else 364 364 static __always_inline void * 365 365 kmem_cache_alloc_node_trace(struct kmem_cache *s, ··· 392 392 } 393 393 #endif /* CONFIG_TRACING */ 394 394 395 - extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment; 395 + extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment __malloc; 396 396 397 397 #ifdef CONFIG_TRACING 398 - extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment; 398 + extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment __malloc; 399 399 #else 400 400 static __always_inline void * 401 401 kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
+4
include/linux/slab_def.h
··· 80 80 struct kasan_cache kasan_info; 81 81 #endif 82 82 83 + #ifdef CONFIG_SLAB_FREELIST_RANDOM 84 + void *random_seq; 85 + #endif 86 + 83 87 struct kmem_cache_node *node[MAX_NUMNODES]; 84 88 }; 85 89
+1 -1
include/linux/string.h
··· 119 119 120 120 extern void kfree_const(const void *x); 121 121 122 - extern char *kstrdup(const char *s, gfp_t gfp); 122 + extern char *kstrdup(const char *s, gfp_t gfp) __malloc; 123 123 extern const char *kstrdup_const(const char *s, gfp_t gfp); 124 124 extern char *kstrndup(const char *s, size_t len, gfp_t gfp); 125 125 extern void *kmemdup(const void *src, size_t len, gfp_t gfp);
+7 -10
include/linux/time64.h
··· 65 65 # define timespec64_equal timespec_equal 66 66 # define timespec64_compare timespec_compare 67 67 # define set_normalized_timespec64 set_normalized_timespec 68 - # define timespec64_add_safe timespec_add_safe 69 68 # define timespec64_add timespec_add 70 69 # define timespec64_sub timespec_sub 71 70 # define timespec64_valid timespec_valid ··· 132 133 } 133 134 134 135 extern void set_normalized_timespec64(struct timespec64 *ts, time64_t sec, s64 nsec); 135 - 136 - /* 137 - * timespec64_add_safe assumes both values are positive and checks for 138 - * overflow. It will return TIME_T_MAX if the returned value would be 139 - * smaller then either of the arguments. 140 - */ 141 - extern struct timespec64 timespec64_add_safe(const struct timespec64 lhs, 142 - const struct timespec64 rhs); 143 - 144 136 145 137 static inline struct timespec64 timespec64_add(struct timespec64 lhs, 146 138 struct timespec64 rhs) ··· 213 223 } 214 224 215 225 #endif 226 + 227 + /* 228 + * timespec64_add_safe assumes both values are positive and checks for 229 + * overflow. It will return TIME64_MAX in case of overflow. 230 + */ 231 + extern struct timespec64 timespec64_add_safe(const struct timespec64 lhs, 232 + const struct timespec64 rhs); 216 233 217 234 #endif /* _LINUX_TIME64_H */
+4 -2
include/linux/vmstat.h
··· 163 163 #ifdef CONFIG_NUMA 164 164 165 165 extern unsigned long node_page_state(int node, enum zone_stat_item item); 166 - extern void zone_statistics(struct zone *, struct zone *, gfp_t gfp); 167 166 168 167 #else 169 168 170 169 #define node_page_state(node, item) global_page_state(item) 171 - #define zone_statistics(_zl, _z, gfp) do { } while (0) 172 170 173 171 #endif /* CONFIG_NUMA */ 174 172 ··· 190 192 void quiet_vmstat(void); 191 193 void cpu_vm_stats_fold(int cpu); 192 194 void refresh_zone_stat_thresholds(void); 195 + 196 + struct ctl_table; 197 + int vmstat_refresh(struct ctl_table *, int write, 198 + void __user *buffer, size_t *lenp, loff_t *ppos); 193 199 194 200 void drain_zonestat(struct zone *zone, struct per_cpu_pageset *); 195 201
+9
init/Kconfig
··· 1742 1742 1743 1743 endchoice 1744 1744 1745 + config SLAB_FREELIST_RANDOM 1746 + default n 1747 + depends on SLAB 1748 + bool "SLAB freelist randomization" 1749 + help 1750 + Randomizes the freelist order used on creating new SLABs. This 1751 + security feature reduces the predictability of the kernel slab 1752 + allocator against heap overflows. 1753 + 1745 1754 config SLUB_CPU_PARTIAL 1746 1755 default y 1747 1756 depends on SLUB && SMP
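The new Kconfig help text above is terse; the underlying idea is just a shuffle of the order in which a fresh slab's object indices are threaded onto the freelist. A minimal userspace illustration (not the kernel implementation: `randomize_freelist()` is an invented name, and `rand()` stands in for the kernel's random source) is a plain Fisher-Yates shuffle over the index sequence:

```c
#include <stdlib.h>

/* Build an identity sequence 0..count-1, then Fisher-Yates shuffle it.
 * With CONFIG_SLAB_FREELIST_RANDOM, a sequence like this decides the
 * order in which objects of a new slab are linked onto the freelist,
 * so consecutive allocations no longer return adjacent objects. */
static void randomize_freelist(unsigned int *seq, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		seq[i] = i;		/* identity order to start from */

	if (count < 2)
		return;

	for (i = count - 1; i > 0; i--) {
		unsigned int j = (unsigned int)rand() % (i + 1);
		unsigned int tmp = seq[i];

		seq[i] = seq[j];	/* classic Fisher-Yates swap */
		seq[j] = tmp;
	}
}
```

The result is always a permutation of the indices, so every object is still handed out exactly once; only the predictability of the order is lost, which is what frustrates heap-overflow exploitation.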
+8 -14
kernel/cpuset.c
··· 61 61 #include <linux/cgroup.h> 62 62 #include <linux/wait.h> 63 63 64 - struct static_key cpusets_enabled_key __read_mostly = STATIC_KEY_INIT_FALSE; 64 + DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key); 65 65 66 66 /* See "Frequency meter" comments, below. */ 67 67 ··· 2528 2528 * GFP_KERNEL - any node in enclosing hardwalled cpuset ok 2529 2529 * GFP_USER - only nodes in current tasks mems allowed ok. 2530 2530 */ 2531 - int __cpuset_node_allowed(int node, gfp_t gfp_mask) 2531 + bool __cpuset_node_allowed(int node, gfp_t gfp_mask) 2532 2532 { 2533 2533 struct cpuset *cs; /* current cpuset ancestors */ 2534 2534 int allowed; /* is allocation in zone z allowed? */ 2535 2535 unsigned long flags; 2536 2536 2537 2537 if (in_interrupt()) 2538 - return 1; 2538 + return true; 2539 2539 if (node_isset(node, current->mems_allowed)) 2540 - return 1; 2540 + return true; 2541 2541 /* 2542 2542 * Allow tasks that have access to memory reserves because they have 2543 2543 * been OOM killed to get memory anywhere. 2544 2544 */ 2545 2545 if (unlikely(test_thread_flag(TIF_MEMDIE))) 2546 - return 1; 2546 + return true; 2547 2547 if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */ 2548 - return 0; 2548 + return false; 2549 2549 2550 2550 if (current->flags & PF_EXITING) /* Let dying task have memory */ 2551 - return 1; 2551 + return true; 2552 2552 2553 2553 /* Not hardwall and node outside mems_allowed: scan up cpusets */ 2554 2554 spin_lock_irqsave(&callback_lock, flags); ··· 2591 2591 2592 2592 static int cpuset_spread_node(int *rotor) 2593 2593 { 2594 - int node; 2595 - 2596 - node = next_node(*rotor, current->mems_allowed); 2597 - if (node == MAX_NUMNODES) 2598 - node = first_node(current->mems_allowed); 2599 - *rotor = node; 2600 - return node; 2594 + return *rotor = next_node_in(*rotor, current->mems_allowed); 2601 2595 } 2602 2596 2603 2597 int cpuset_mem_spread_node(void)
+1 -1
kernel/kexec_core.c
··· 1410 1410 VMCOREINFO_STRUCT_SIZE(list_head); 1411 1411 VMCOREINFO_SIZE(nodemask_t); 1412 1412 VMCOREINFO_OFFSET(page, flags); 1413 - VMCOREINFO_OFFSET(page, _count); 1413 + VMCOREINFO_OFFSET(page, _refcount); 1414 1414 VMCOREINFO_OFFSET(page, mapping); 1415 1415 VMCOREINFO_OFFSET(page, lru); 1416 1416 VMCOREINFO_OFFSET(page, _mapcount);
+37 -101
kernel/padata.c
··· 607 607 } 608 608 609 609 /** 610 - * padata_set_cpumasks - Set both parallel and serial cpumasks. The first 611 - * one is used by parallel workers and the second one 612 - * by the wokers doing serialization. 613 - * 614 - * @pinst: padata instance 615 - * @pcpumask: the cpumask to use for parallel workers 616 - * @cbcpumask: the cpumsak to use for serial workers 617 - */ 618 - int padata_set_cpumasks(struct padata_instance *pinst, cpumask_var_t pcpumask, 619 - cpumask_var_t cbcpumask) 620 - { 621 - int err; 622 - 623 - mutex_lock(&pinst->lock); 624 - get_online_cpus(); 625 - 626 - err = __padata_set_cpumasks(pinst, pcpumask, cbcpumask); 627 - 628 - put_online_cpus(); 629 - mutex_unlock(&pinst->lock); 630 - 631 - return err; 632 - 633 - } 634 - EXPORT_SYMBOL(padata_set_cpumasks); 635 - 636 - /** 637 610 * padata_set_cpumask: Sets specified by @cpumask_type cpumask to the value 638 611 * equivalent to @cpumask. 639 612 * ··· 647 674 } 648 675 EXPORT_SYMBOL(padata_set_cpumask); 649 676 677 + /** 678 + * padata_start - start the parallel processing 679 + * 680 + * @pinst: padata instance to start 681 + */ 682 + int padata_start(struct padata_instance *pinst) 683 + { 684 + int err = 0; 685 + 686 + mutex_lock(&pinst->lock); 687 + 688 + if (pinst->flags & PADATA_INVALID) 689 + err = -EINVAL; 690 + 691 + __padata_start(pinst); 692 + 693 + mutex_unlock(&pinst->lock); 694 + 695 + return err; 696 + } 697 + EXPORT_SYMBOL(padata_start); 698 + 699 + /** 700 + * padata_stop - stop the parallel processing 701 + * 702 + * @pinst: padata instance to stop 703 + */ 704 + void padata_stop(struct padata_instance *pinst) 705 + { 706 + mutex_lock(&pinst->lock); 707 + __padata_stop(pinst); 708 + mutex_unlock(&pinst->lock); 709 + } 710 + EXPORT_SYMBOL(padata_stop); 711 + 712 + #ifdef CONFIG_HOTPLUG_CPU 713 + 650 714 static int __padata_add_cpu(struct padata_instance *pinst, int cpu) 651 715 { 652 716 struct parallel_data *pd; ··· 703 693 704 694 return 0; 705 695 } 706 - 707 - /** 708 - * padata_add_cpu - add a cpu to one or both(parallel and serial) 709 - * padata cpumasks. 710 - * 711 - * @pinst: padata instance 712 - * @cpu: cpu to add 713 - * @mask: bitmask of flags specifying to which cpumask @cpu shuld be added. 714 - * The @mask may be any combination of the following flags: 715 - * PADATA_CPU_SERIAL - serial cpumask 716 - * PADATA_CPU_PARALLEL - parallel cpumask 717 - */ 718 - 719 - int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask) 720 - { 721 - int err; 722 - 723 - if (!(mask & (PADATA_CPU_SERIAL | PADATA_CPU_PARALLEL))) 724 - return -EINVAL; 725 - 726 - mutex_lock(&pinst->lock); 727 - 728 - get_online_cpus(); 729 - if (mask & PADATA_CPU_SERIAL) 730 - cpumask_set_cpu(cpu, pinst->cpumask.cbcpu); 731 - if (mask & PADATA_CPU_PARALLEL) 732 - cpumask_set_cpu(cpu, pinst->cpumask.pcpu); 733 - 734 - err = __padata_add_cpu(pinst, cpu); 735 - put_online_cpus(); 736 - 737 - mutex_unlock(&pinst->lock); 738 - 739 - return err; 740 - } 741 - EXPORT_SYMBOL(padata_add_cpu); 742 696 743 697 static int __padata_remove_cpu(struct padata_instance *pinst, int cpu) 744 698 { ··· 762 788 return err; 763 789 } 764 790 EXPORT_SYMBOL(padata_remove_cpu); 765 - 766 - /** 767 - * padata_start - start the parallel processing 768 - * 769 - * @pinst: padata instance to start 770 - */ 771 - int padata_start(struct padata_instance *pinst) 772 - { 773 - int err = 0; 774 - 775 - mutex_lock(&pinst->lock); 776 - 777 - if (pinst->flags & PADATA_INVALID) 778 - err =-EINVAL; 779 - 780 - __padata_start(pinst); 781 - 782 - mutex_unlock(&pinst->lock); 783 - 784 - return err; 785 - } 786 - EXPORT_SYMBOL(padata_start); 787 - 788 - /** 789 - * padata_stop - stop the parallel processing 790 - * 791 - * @pinst: padata instance to stop 792 - */ 793 - void padata_stop(struct padata_instance *pinst) 794 - { 795 - mutex_lock(&pinst->lock); 796 - __padata_stop(pinst); 797 - mutex_unlock(&pinst->lock); 798 - } 799 - EXPORT_SYMBOL(padata_stop); 800 - 801 - #ifdef CONFIG_HOTPLUG_CPU 802 791 803 792 static inline int pinst_has_cpu(struct padata_instance *pinst, int cpu) 804 793 { ··· 1028 1091 err: 1029 1092 return NULL; 1030 1093 } 1031 - EXPORT_SYMBOL(padata_alloc); 1032 1094 1033 1095 /** 1034 1096 * padata_free - free a padata instance
+3 -23
kernel/rcu/update.c
··· 380 380 debug_object_free(head, &rcuhead_debug_descr); 381 381 } 382 382 383 - /* 384 - * fixup_activate is called when: 385 - * - an active object is activated 386 - * - an unknown object is activated (might be a statically initialized object) 387 - * Activation is performed internally by call_rcu(). 388 - */ 389 - static int rcuhead_fixup_activate(void *addr, enum debug_obj_state state) 383 + static bool rcuhead_is_static_object(void *addr) 390 384 { 391 - struct rcu_head *head = addr; 392 - 393 - switch (state) { 394 - 395 - case ODEBUG_STATE_NOTAVAILABLE: 396 - /* 397 - * This is not really a fixup. We just make sure that it is 398 - * tracked in the object tracker. 399 - */ 400 - debug_object_init(head, &rcuhead_debug_descr); 401 - debug_object_activate(head, &rcuhead_debug_descr); 402 - return 0; 403 - default: 404 - return 1; 405 - } 385 + return true; 406 386 } 407 387 408 388 /** ··· 420 440 421 441 struct debug_obj_descr rcuhead_debug_descr = { 422 442 .name = "rcu_head", 423 - .fixup_activate = rcuhead_fixup_activate, 443 + .is_static_object = rcuhead_is_static_object, 424 444 }; 425 445 EXPORT_SYMBOL_GPL(rcuhead_debug_descr); 426 446 #endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+7
kernel/sysctl.c
··· 1521 1521 .mode = 0644, 1522 1522 .proc_handler = proc_dointvec_jiffies, 1523 1523 }, 1524 + { 1525 + .procname = "stat_refresh", 1526 + .data = NULL, 1527 + .maxlen = 0, 1528 + .mode = 0600, 1529 + .proc_handler = vmstat_refresh, 1530 + }, 1524 1531 #endif 1525 1532 #ifdef CONFIG_MMU 1526 1533 {
+9 -14
kernel/time/hrtimer.c
··· 334 334 * fixup_init is called when: 335 335 * - an active object is initialized 336 336 */ 337 - static int hrtimer_fixup_init(void *addr, enum debug_obj_state state) 337 + static bool hrtimer_fixup_init(void *addr, enum debug_obj_state state) 338 338 { 339 339 struct hrtimer *timer = addr; 340 340 ··· 342 342 case ODEBUG_STATE_ACTIVE: 343 343 hrtimer_cancel(timer); 344 344 debug_object_init(timer, &hrtimer_debug_descr); 345 - return 1; 345 + return true; 346 346 default: 347 - return 0; 347 + return false; 348 348 } 349 349 } 350 350 351 351 /* 352 352 * fixup_activate is called when: 353 353 * - an active object is activated 354 - * - an unknown object is activated (might be a statically initialized object) 354 + * - an unknown non-static object is activated 355 355 */ 356 - static int hrtimer_fixup_activate(void *addr, enum debug_obj_state state) 356 + static bool hrtimer_fixup_activate(void *addr, enum debug_obj_state state) 357 357 { 358 358 switch (state) { 359 - 360 - case ODEBUG_STATE_NOTAVAILABLE: 361 - WARN_ON_ONCE(1); 362 - return 0; 363 - 364 359 case ODEBUG_STATE_ACTIVE: 365 360 WARN_ON(1); 366 361 367 362 default: 368 - return 0; 363 + return false; 369 364 } 370 365 } 371 366 ··· 368 373 * fixup_free is called when: 369 374 * - an active object is freed 370 375 */ 371 - static int hrtimer_fixup_free(void *addr, enum debug_obj_state state) 376 + static bool hrtimer_fixup_free(void *addr, enum debug_obj_state state) 372 377 { 373 378 struct hrtimer *timer = addr; 374 379 ··· 376 381 case ODEBUG_STATE_ACTIVE: 377 382 hrtimer_cancel(timer); 378 383 debug_object_free(timer, &hrtimer_debug_descr); 379 - return 1; 384 + return true; 380 385 default: 381 - return 0; 386 + return false; 382 387 } 383 388 } 384 389
+21
kernel/time/time.c
··· 769 769 770 770 return res; 771 771 } 772 + 773 + /* 774 + * Add two timespec64 values and do a safety check for overflow. 775 + * It's assumed that both values are valid (>= 0). 776 + * And, each timespec64 is in normalized form. 777 + */ 778 + struct timespec64 timespec64_add_safe(const struct timespec64 lhs, 779 + const struct timespec64 rhs) 780 + { 781 + struct timespec64 res; 782 + 783 + set_normalized_timespec64(&res, lhs.tv_sec + rhs.tv_sec, 784 + lhs.tv_nsec + rhs.tv_nsec); 785 + 786 + if (unlikely(res.tv_sec < lhs.tv_sec || res.tv_sec < rhs.tv_sec)) { 787 + res.tv_sec = TIME64_MAX; 788 + res.tv_nsec = 0; 789 + } 790 + 791 + return res; 792 + }
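The new `timespec64_add_safe()` above assumes normalized, non-negative inputs and saturates to `TIME64_MAX` on overflow. A self-contained userspace re-derivation of the same rule (a sketch, not the kernel function: `struct ts64` and `ts64_add_safe` are invented stand-ins, and the overflow test is done before the addition to avoid signed-overflow undefined behaviour, which the kernel sidesteps differently via its build flags) looks like:

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL
#define TIME64_MAX	INT64_MAX	/* mirrors the kernel constant */

struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Add two normalized, non-negative timespec-like values; saturate the
 * result to TIME64_MAX seconds when the sum would overflow. The "- 1"
 * leaves room for a possible carry out of the nanosecond field. */
static struct ts64 ts64_add_safe(struct ts64 lhs, struct ts64 rhs)
{
	struct ts64 res;

	if (lhs.tv_sec > TIME64_MAX - rhs.tv_sec - 1) {
		res.tv_sec = TIME64_MAX;	/* overflow: saturate */
		res.tv_nsec = 0;
		return res;
	}
	res.tv_sec = lhs.tv_sec + rhs.tv_sec;
	res.tv_nsec = lhs.tv_nsec + rhs.tv_nsec;
	if (res.tv_nsec >= NSEC_PER_SEC) {	/* normalize the carry */
		res.tv_sec++;
		res.tv_nsec -= NSEC_PER_SEC;
	}
	return res;
}
```

This pre-check is slightly conservative (it also saturates when the exact sum would be `TIME64_MAX`), but it keeps the sketch well-defined C.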
+24 -39
kernel/time/timer.c
··· 489 489 return ((struct timer_list *) addr)->function; 490 490 } 491 491 492 + static bool timer_is_static_object(void *addr) 493 + { 494 + struct timer_list *timer = addr; 495 + 496 + return (timer->entry.pprev == NULL && 497 + timer->entry.next == TIMER_ENTRY_STATIC); 498 + } 499 + 492 500 /* 493 501 * fixup_init is called when: 494 502 * - an active object is initialized 495 503 */ 496 - static int timer_fixup_init(void *addr, enum debug_obj_state state) 504 + static bool timer_fixup_init(void *addr, enum debug_obj_state state) 497 505 { 498 506 struct timer_list *timer = addr; 499 507 ··· 509 501 case ODEBUG_STATE_ACTIVE: 510 502 del_timer_sync(timer); 511 503 debug_object_init(timer, &timer_debug_descr); 512 - return 1; 504 + return true; 513 505 default: 514 - return 0; 506 + return false; 515 507 } 516 508 } 517 509 ··· 524 516 /* 525 517 * fixup_activate is called when: 526 518 * - an active object is activated 527 - * - an unknown object is activated (might be a statically initialized object) 519 + * - an unknown non-static object is activated 528 520 */ 529 - static int timer_fixup_activate(void *addr, enum debug_obj_state state) 521 + static bool timer_fixup_activate(void *addr, enum debug_obj_state state) 530 522 { 531 523 struct timer_list *timer = addr; 532 524 533 525 switch (state) { 534 - 535 526 case ODEBUG_STATE_NOTAVAILABLE: 536 - /* 537 - * This is not really a fixup. The timer was 538 - * statically initialized. We just make sure that it 539 - * is tracked in the object tracker. 540 - */ 541 - if (timer->entry.pprev == NULL && 542 - timer->entry.next == TIMER_ENTRY_STATIC) { 543 - debug_object_init(timer, &timer_debug_descr); 544 - debug_object_activate(timer, &timer_debug_descr); 545 - return 0; 546 - } else { 547 - setup_timer(timer, stub_timer, 0); 548 - return 1; 549 - } 550 - return 0; 527 + setup_timer(timer, stub_timer, 0); 528 + return true; 551 529 552 530 case ODEBUG_STATE_ACTIVE: 553 531 WARN_ON(1); 554 532 555 533 default: 556 - return 0; 534 + return false; 557 535 } 558 536 } 559 537 ··· 547 553 * fixup_free is called when: 548 554 * - an active object is freed 549 555 */ 550 - static int timer_fixup_free(void *addr, enum debug_obj_state state) 556 + static bool timer_fixup_free(void *addr, enum debug_obj_state state) 551 557 { 552 558 struct timer_list *timer = addr; 553 559 ··· 555 561 case ODEBUG_STATE_ACTIVE: 556 562 del_timer_sync(timer); 557 563 debug_object_free(timer, &timer_debug_descr); 558 - return 1; 564 + return true; 559 565 default: 560 - return 0; 566 + return false; 561 567 } 562 568 } 563 569 ··· 565 571 * fixup_assert_init is called when: 566 572 * - an untracked/uninit-ed object is found 567 573 */ 568 - static int timer_fixup_assert_init(void *addr, enum debug_obj_state state) 574 + static bool timer_fixup_assert_init(void *addr, enum debug_obj_state state) 569 575 { 570 576 struct timer_list *timer = addr; 571 577 572 578 switch (state) { 573 579 case ODEBUG_STATE_NOTAVAILABLE: 574 - if (timer->entry.next == TIMER_ENTRY_STATIC) { 575 - /* 576 - * This is not really a fixup. The timer was 577 - * statically initialized. We just make sure that it 578 - * is tracked in the object tracker. 579 - */ 580 - debug_object_init(timer, &timer_debug_descr); 581 - return 0; 582 - } else { 583 - setup_timer(timer, stub_timer, 0); 584 - return 1; 585 - } 580 + setup_timer(timer, stub_timer, 0); 581 + return true; 586 582 default: 587 - return 0; 583 + return false; 588 584 } 589 585 } 590 586 591 587 static struct debug_obj_descr timer_debug_descr = { 592 588 .name = "timer_list", 593 589 .debug_hint = timer_debug_hint, 590 + .is_static_object = timer_is_static_object, 594 591 .fixup_init = timer_fixup_init, 595 592 .fixup_activate = timer_fixup_activate, 596 593 .fixup_free = timer_fixup_free,
+14 -40
kernel/workqueue.c
··· 433 433 return ((struct work_struct *) addr)->func; 434 434 } 435 435 436 + static bool work_is_static_object(void *addr) 437 + { 438 + struct work_struct *work = addr; 439 + 440 + return test_bit(WORK_STRUCT_STATIC_BIT, work_data_bits(work)); 441 + } 442 + 436 443 /* 437 444 * fixup_init is called when: 438 445 * - an active object is initialized 439 446 */ 440 - static int work_fixup_init(void *addr, enum debug_obj_state state) 447 + static bool work_fixup_init(void *addr, enum debug_obj_state state) 441 448 { 442 449 struct work_struct *work = addr; 443 450 ··· 452 445 case ODEBUG_STATE_ACTIVE: 453 446 cancel_work_sync(work); 454 447 debug_object_init(work, &work_debug_descr); 455 - return 1; 448 + return true; 456 449 default: 457 - return 0; 458 - } 459 - } 460 - 461 - /* 462 - * fixup_activate is called when: 463 - * - an active object is activated 464 - * - an unknown object is activated (might be a statically initialized object) 465 - */ 466 - static int work_fixup_activate(void *addr, enum debug_obj_state state) 467 - { 468 - struct work_struct *work = addr; 469 - 470 - switch (state) { 471 - 472 - case ODEBUG_STATE_NOTAVAILABLE: 473 - /* 474 - * This is not really a fixup. The work struct was 475 - * statically initialized. We just make sure that it 476 - * is tracked in the object tracker. 477 - */ 478 - if (test_bit(WORK_STRUCT_STATIC_BIT, work_data_bits(work))) { 479 - debug_object_init(work, &work_debug_descr); 480 - debug_object_activate(work, &work_debug_descr); 481 - return 0; 482 - } 483 - WARN_ON_ONCE(1); 484 - return 0; 485 - 486 - case ODEBUG_STATE_ACTIVE: 487 - WARN_ON(1); 488 - 489 - default: 490 - return 0; 450 + return false; 491 451 } 492 452 } 493 453 ··· 462 488 * fixup_free is called when: 463 489 * - an active object is freed 464 490 */ 465 - static int work_fixup_free(void *addr, enum debug_obj_state state) 491 + static bool work_fixup_free(void *addr, enum debug_obj_state state) 466 492 { 467 493 struct work_struct *work = addr; 468 494 ··· 470 496 case ODEBUG_STATE_ACTIVE: 471 497 cancel_work_sync(work); 472 498 debug_object_free(work, &work_debug_descr); 473 - return 1; 499 + return true; 474 500 default: 475 - return 0; 501 + return false; 476 502 } 477 503 } 478 504 479 505 static struct debug_obj_descr work_debug_descr = { 480 506 .name = "work_struct", 481 507 .debug_hint = work_debug_hint, 508 + .is_static_object = work_is_static_object, 482 509 .fixup_init = work_fixup_init, 483 - .fixup_activate = work_fixup_activate, 484 510 .fixup_free = work_fixup_free, 485 511 }; 486 512
+1 -1
lib/Makefile
··· 25 25 sha1.o md5.o irq_regs.o argv_split.o \ 26 26 flex_proportions.o ratelimit.o show_mem.o \ 27 27 is_single_threaded.o plist.o decompress.o kobject_uevent.o \ 28 - earlycpio.o seq_buf.o nmi_backtrace.o 28 + earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o 29 29 30 30 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o 31 31 lib-$(CONFIG_MMU) += ioremap.o
+53 -39
lib/debugobjects.c
··· 269 269 * Try to repair the damage, so we have a better chance to get useful 270 270 * debug output. 271 271 */ 272 - static int 273 - debug_object_fixup(int (*fixup)(void *addr, enum debug_obj_state state), 272 + static bool 273 + debug_object_fixup(bool (*fixup)(void *addr, enum debug_obj_state state), 274 274 void * addr, enum debug_obj_state state) 275 275 { 276 - int fixed = 0; 277 - 278 - if (fixup) 279 - fixed = fixup(addr, state); 280 - debug_objects_fixups += fixed; 281 - return fixed; 276 + if (fixup && fixup(addr, state)) { 277 + debug_objects_fixups++; 278 + return true; 279 + } 280 + return false; 282 281 } 283 282 284 283 static void debug_object_is_on_stack(void *addr, int onstack) ··· 415 416 state = obj->state; 416 417 raw_spin_unlock_irqrestore(&db->lock, flags); 417 418 ret = debug_object_fixup(descr->fixup_activate, addr, state); 418 - return ret ? -EINVAL : 0; 419 + return ret ? 0 : -EINVAL; 419 420 420 421 case ODEBUG_STATE_DESTROYED: 421 422 debug_print_object(obj, "activate"); ··· 431 432 432 433 raw_spin_unlock_irqrestore(&db->lock, flags); 433 434 /* 434 - * This happens when a static object is activated. We 435 - * let the type specific code decide whether this is 436 - * true or not. 435 + * We are here when a static object is activated. We 436 + * let the type specific code confirm whether this is 437 + * true or not. if true, we just make sure that the 438 + * static object is tracked in the object tracker. If 439 + * not, this must be a bug, so we try to fix it up. 437 440 */ 438 - if (debug_object_fixup(descr->fixup_activate, addr, 441 + if (descr->is_static_object && descr->is_static_object(addr)) { 439 - ODEBUG_STATE_NOTAVAILABLE)) { 442 + /* track this static object */ 443 + debug_object_init(addr, descr); 444 + debug_object_activate(addr, descr); 445 + } else { 440 446 debug_print_object(&o, "activate"); 441 - return -EINVAL; 447 + ret = debug_object_fixup(descr->fixup_activate, addr, 448 + ODEBUG_STATE_NOTAVAILABLE); 449 + return ret ? 0 : -EINVAL; 442 450 } 443 451 return 0; 444 452 } ··· 609 603 610 604 raw_spin_unlock_irqrestore(&db->lock, flags); 611 605 /* 612 - * Maybe the object is static. Let the type specific 613 - * code decide what to do. 606 + * Maybe the object is static, and we let the type specific 607 + * code confirm. Track this static object if true, else invoke 608 + * fixup. 614 609 */ 615 - if (debug_object_fixup(descr->fixup_assert_init, addr, 610 + if (descr->is_static_object && descr->is_static_object(addr)) { 616 - ODEBUG_STATE_NOTAVAILABLE)) 611 + /* Track this static object */ 612 + debug_object_init(addr, descr); 613 + } else { 617 614 debug_print_object(&o, "assert_init"); 615 + debug_object_fixup(descr->fixup_assert_init, addr, 616 + ODEBUG_STATE_NOTAVAILABLE); 617 + } 618 618 return; 619 619 } 620 620 ··· 805 793 806 794 static __initdata struct debug_obj_descr descr_type_test; 807 795 796 + static bool __init is_static_object(void *addr) 797 + { 798 + struct self_test *obj = addr; 799 + 800 + return obj->static_init; 801 + } 802 + 808 803 /* 809 804 * fixup_init is called when: 810 805 * - an active object is initialized 811 806 */ 812 - static int __init fixup_init(void *addr, enum debug_obj_state state) 807 + static bool __init fixup_init(void *addr, enum debug_obj_state state) 813 808 { 814 809 struct self_test *obj = addr; 815 810 ··· 824 805 case ODEBUG_STATE_ACTIVE: 825 806 debug_object_deactivate(obj, &descr_type_test); 826 807 debug_object_init(obj, &descr_type_test); 827 - return 1; 808 + return true; 828 809 default: 829 - return 0; 810 + return false; 830 811 } 831 812 } 832 813 833 814 /* 834 815 * fixup_activate is called when: 835 816 * - an active object is activated 836 - * - an unknown object is activated (might be a statically initialized object) 817 + * - an unknown non-static object is activated 837 818 */ 838 - static int __init fixup_activate(void *addr, enum debug_obj_state state) 819 + static bool __init fixup_activate(void *addr, enum debug_obj_state state) 839 820 { 840 821 struct self_test *obj = addr; 841 822 842 823 switch (state) { 843 824 case ODEBUG_STATE_NOTAVAILABLE: 844 - if (obj->static_init == 1) { 845 - debug_object_init(obj, &descr_type_test); 846 - debug_object_activate(obj, &descr_type_test); 847 - return 0; 848 - } 849 - return 1; 850 - 825 + return true; 826 case ODEBUG_STATE_ACTIVE: 852 827 debug_object_deactivate(obj, &descr_type_test); 853 828 debug_object_activate(obj, &descr_type_test); 854 - return 1; 829 + return true; 855 830 856 831 default: 857 - return 0; 832 + return false; 858 833 } 859 834 } 860 835 ··· 856 843 * fixup_destroy is called when: 857 844 * - an active object is destroyed 858 845 */ 859 - static int __init fixup_destroy(void *addr, enum debug_obj_state state) 846 + static bool __init fixup_destroy(void *addr, enum debug_obj_state state) 860 847 { 861 848 struct self_test *obj = addr; 862 849 ··· 864 851 case ODEBUG_STATE_ACTIVE: 865 852 debug_object_deactivate(obj, &descr_type_test); 866 853 debug_object_destroy(obj, &descr_type_test); 867 - return 1; 854 + return true; 868 855 default: 869 - return 0; 856 + return false; 870 857 } 871 858 } 872 859 ··· 874 861 * fixup_free is called when: 875 862 * - an active object is freed 876 863 */ 877 - static int __init fixup_free(void *addr, enum debug_obj_state state) 864 + static bool __init fixup_free(void *addr, enum debug_obj_state state) 878 865 { 879 866 struct self_test *obj = addr; 880 867 ··· 882 869 case ODEBUG_STATE_ACTIVE: 883 870 debug_object_deactivate(obj, &descr_type_test); 884 871 debug_object_free(obj, &descr_type_test); 885 - return 1; 872 + return true; 886 873 default: 887 - return 0; 874 + return false; 888 875 } 889 876 } 890 877 ··· 930 917 931 918 static __initdata struct debug_obj_descr descr_type_test = { 932 919 .name = "selftest", 920 + .is_static_object = is_static_object, 933 921 .fixup_init = fixup_init, 934 922 .fixup_activate = fixup_activate, 935 923 .fixup_destroy = fixup_destroy,
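The debugobjects rework above factors the "is this a statically initialized object?" question out of the fixup callbacks into a dedicated `is_static_object()` hook. A toy userspace model of the new activate-time dispatch (not kernel code: the descriptor struct, counters, and `activate_untracked()` below are invented for illustration, and the hash-table and locking details are omitted) shows the control flow:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct debug_obj_descr: only the two callbacks
 * relevant to the untracked-object path. fixup_activate() now returns
 * a bool: true means the damage was repaired. */
struct obj_descr {
	bool (*is_static_object)(void *addr);
	bool (*fixup_activate)(void *addr);
};

static int objects_tracked, objects_fixed;

/* Untracked object is being activated: if the type-specific hook says
 * it is a static object, just start tracking it (not a real fixup);
 * otherwise this is a bug and we ask fixup_activate() to repair it. */
static int activate_untracked(void *addr, const struct obj_descr *descr)
{
	if (descr->is_static_object && descr->is_static_object(addr)) {
		objects_tracked++;	/* track the static object */
		return 0;
	}
	if (descr->fixup_activate && descr->fixup_activate(addr)) {
		objects_fixed++;	/* repaired */
		return 0;
	}
	return -1;			/* the kernel returns -EINVAL here */
}
```

This is why the per-subsystem `fixup_activate()` implementations in the diffs above could drop their `ODEBUG_STATE_NOTAVAILABLE` static-object special cases: that decision now lives in one place.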
+30
lib/nodemask.c
··· 1 + #include <linux/nodemask.h> 2 + #include <linux/module.h> 3 + #include <linux/random.h> 4 + 5 + int __next_node_in(int node, const nodemask_t *srcp) 6 + { 7 + int ret = __next_node(node, srcp); 8 + 9 + if (ret == MAX_NUMNODES) 10 + ret = __first_node(srcp); 11 + return ret; 12 + } 13 + EXPORT_SYMBOL(__next_node_in); 14 + 15 + #ifdef CONFIG_NUMA 16 + /* 17 + * Return the bit number of a random bit set in the nodemask. 18 + * (returns NUMA_NO_NODE if nodemask is empty) 19 + */ 20 + int node_random(const nodemask_t *maskp) 21 + { 22 + int w, bit = NUMA_NO_NODE; 23 + 24 + w = nodes_weight(*maskp); 25 + if (w) 26 + bit = bitmap_ord_to_pos(maskp->bits, 27 + get_random_int() % w, MAX_NUMNODES); 28 + return bit; 29 + } 30 + #endif
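The new `__next_node_in()` above is a wrap-around search: take the next set bit after `node`, and fall back to the first set bit when the search runs off the end. A self-contained userspace sketch of the same logic (a plain 64-bit mask stands in for `nodemask_t`, and the `MAX_NODES` sentinel plays the role of `MAX_NUMNODES`; the function names mirror the kernel helpers but are reimplemented here):

```c
#include <stdint.h>

#define MAX_NODES 64	/* stand-in for MAX_NUMNODES */

/* Lowest set bit, or MAX_NODES if the mask is empty. */
static int first_node(uint64_t mask)
{
	for (int n = 0; n < MAX_NODES; n++)
		if (mask & (1ULL << n))
			return n;
	return MAX_NODES;
}

/* Next set bit strictly after 'node', or MAX_NODES if none. */
static int next_node(int node, uint64_t mask)
{
	for (int n = node + 1; n < MAX_NODES; n++)
		if (mask & (1ULL << n))
			return n;
	return MAX_NODES;
}

/* The wrap-around variant: running off the end restarts the search
 * at the beginning, so repeated calls cycle through the set bits. */
static int next_node_in(int node, uint64_t mask)
{
	int ret = next_node(node, mask);

	if (ret == MAX_NODES)
		ret = first_node(mask);
	return ret;
}
```

This cycling behaviour is exactly what lets the cpuset change earlier in the series collapse `cpuset_spread_node()`'s explicit wrap-around check into a single `next_node_in()` call.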
+3 -3
lib/percpu_counter.c
··· 19 19 20 20 static struct debug_obj_descr percpu_counter_debug_descr; 21 21 22 - static int percpu_counter_fixup_free(void *addr, enum debug_obj_state state) 22 + static bool percpu_counter_fixup_free(void *addr, enum debug_obj_state state) 23 23 { 24 24 struct percpu_counter *fbc = addr; 25 25 ··· 27 27 case ODEBUG_STATE_ACTIVE: 28 28 percpu_counter_destroy(fbc); 29 29 debug_object_free(fbc, &percpu_counter_debug_descr); 30 - return 1; 30 + return true; 31 31 default: 32 - return 0; 32 + return false; 33 33 } 34 34 } 35 35
+16 -5
mm/Kconfig
··· 192 192 def_bool y 193 193 depends on SPARSEMEM && MEMORY_HOTPLUG 194 194 195 + config MEMORY_HOTPLUG_DEFAULT_ONLINE 196 + bool "Online the newly added memory blocks by default" 197 + default n 198 + depends on MEMORY_HOTPLUG 199 + help 200 + This option sets the default policy setting for memory hotplug 201 + onlining policy (/sys/devices/system/memory/auto_online_blocks) which 202 + determines what happens to newly added memory regions. Policy setting 203 + can always be changed at runtime. 204 + See Documentation/memory-hotplug.txt for more information. 205 + 206 + Say Y here if you want all hot-plugged memory blocks to appear in 207 + 'online' state by default. 208 + Say N here if you want the default policy to keep all hot-plugged 209 + memory blocks in 'offline' state. 210 + 195 211 config MEMORY_HOTREMOVE 196 212 bool "Allow for memory hot remove" 197 213 select MEMORY_ISOLATION ··· 283 267 284 268 config PHYS_ADDR_T_64BIT 285 269 def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT 286 - 287 - config ZONE_DMA_FLAG 288 - int 289 - default "0" if !ZONE_DMA 290 - default "1" 291 270 292 271 config BOUNCE 293 272 bool "Enable bounce buffers"
+117 -41
mm/compaction.c
··· 42 42 #define CREATE_TRACE_POINTS 43 43 #include <trace/events/compaction.h> 44 44 45 + #define block_start_pfn(pfn, order) round_down(pfn, 1UL << (order)) 46 + #define block_end_pfn(pfn, order) ALIGN((pfn) + 1, 1UL << (order)) 47 + #define pageblock_start_pfn(pfn) block_start_pfn(pfn, pageblock_order) 48 + #define pageblock_end_pfn(pfn) block_end_pfn(pfn, pageblock_order) 49 + 45 50 static unsigned long release_freepages(struct list_head *freelist) 46 51 { 47 52 struct page *page, *next; ··· 166 161 zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn; 167 162 zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn; 168 163 zone->compact_cached_free_pfn = 169 - round_down(zone_end_pfn(zone) - 1, pageblock_nr_pages); 164 + pageblock_start_pfn(zone_end_pfn(zone) - 1); 170 165 } 171 166 172 167 /* ··· 524 519 LIST_HEAD(freelist); 525 520 526 521 pfn = start_pfn; 527 - block_start_pfn = pfn & ~(pageblock_nr_pages - 1); 522 + block_start_pfn = pageblock_start_pfn(pfn); 528 523 if (block_start_pfn < cc->zone->zone_start_pfn) 529 524 block_start_pfn = cc->zone->zone_start_pfn; 530 - block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); 525 + block_end_pfn = pageblock_end_pfn(pfn); 531 526 532 527 for (; pfn < end_pfn; pfn += isolated, 533 528 block_start_pfn = block_end_pfn, ··· 543 538 * scanning range to right one. 
544 539 */ 545 540 if (pfn >= block_end_pfn) { 546 - block_start_pfn = pfn & ~(pageblock_nr_pages - 1); 547 - block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); 541 + block_start_pfn = pageblock_start_pfn(pfn); 542 + block_end_pfn = pageblock_end_pfn(pfn); 548 543 block_end_pfn = min(block_end_pfn, end_pfn); 549 544 } 550 545 ··· 638 633 { 639 634 struct zone *zone = cc->zone; 640 635 unsigned long nr_scanned = 0, nr_isolated = 0; 641 - struct list_head *migratelist = &cc->migratepages; 642 636 struct lruvec *lruvec; 643 637 unsigned long flags = 0; 644 638 bool locked = false; 645 639 struct page *page = NULL, *valid_page = NULL; 646 640 unsigned long start_pfn = low_pfn; 641 + bool skip_on_failure = false; 642 + unsigned long next_skip_pfn = 0; 647 643 648 644 /* 649 645 * Ensure that there are not too many pages isolated from the LRU ··· 665 659 if (compact_should_abort(cc)) 666 660 return 0; 667 661 662 + if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) { 663 + skip_on_failure = true; 664 + next_skip_pfn = block_end_pfn(low_pfn, cc->order); 665 + } 666 + 668 667 /* Time to isolate some pages for migration */ 669 668 for (; low_pfn < end_pfn; low_pfn++) { 670 669 bool is_lru; 670 + 671 + if (skip_on_failure && low_pfn >= next_skip_pfn) { 672 + /* 673 + * We have isolated all migration candidates in the 674 + * previous order-aligned block, and did not skip it due 675 + * to failure. We should migrate the pages now and 676 + * hopefully succeed compaction. 677 + */ 678 + if (nr_isolated) 679 + break; 680 + 681 + /* 682 + * We failed to isolate in the previous order-aligned 683 + * block. Set the new boundary to the end of the 684 + * current block. Note we can't simply increase 685 + * next_skip_pfn by 1 << order, as low_pfn might have 686 + * been incremented by a higher number due to skipping 687 + * a compound or a high-order buddy page in the 688 + * previous loop iteration. 
689 + */ 690 + next_skip_pfn = block_end_pfn(low_pfn, cc->order); 691 + } 671 692 672 693 /* 673 694 * Periodically drop the lock (if held) regardless of its ··· 707 674 break; 708 675 709 676 if (!pfn_valid_within(low_pfn)) 710 - continue; 677 + goto isolate_fail; 711 678 nr_scanned++; 712 679 713 680 page = pfn_to_page(low_pfn); ··· 762 729 if (likely(comp_order < MAX_ORDER)) 763 730 low_pfn += (1UL << comp_order) - 1; 764 731 765 - continue; 732 + goto isolate_fail; 766 733 } 767 734 768 735 if (!is_lru) 769 - continue; 736 + goto isolate_fail; 770 737 771 738 /* 772 739 * Migration will fail if an anonymous page is pinned in memory, ··· 775 742 */ 776 743 if (!page_mapping(page) && 777 744 page_count(page) > page_mapcount(page)) 778 - continue; 745 + goto isolate_fail; 779 746 780 747 /* If we already hold the lock, we can skip some rechecking */ 781 748 if (!locked) { ··· 786 753 787 754 /* Recheck PageLRU and PageCompound under lock */ 788 755 if (!PageLRU(page)) 789 - continue; 756 + goto isolate_fail; 790 757 791 758 /* 792 759 * Page become compound since the non-locked check, ··· 795 762 */ 796 763 if (unlikely(PageCompound(page))) { 797 764 low_pfn += (1UL << compound_order(page)) - 1; 798 - continue; 765 + goto isolate_fail; 799 766 } 800 767 } 801 768 ··· 803 770 804 771 /* Try isolate the page */ 805 772 if (__isolate_lru_page(page, isolate_mode) != 0) 806 - continue; 773 + goto isolate_fail; 807 774 808 775 VM_BUG_ON_PAGE(PageCompound(page), page); 809 776 ··· 811 778 del_page_from_lru_list(page, lruvec, page_lru(page)); 812 779 813 780 isolate_success: 814 - list_add(&page->lru, migratelist); 781 + list_add(&page->lru, &cc->migratepages); 815 782 cc->nr_migratepages++; 816 783 nr_isolated++; 784 + 785 + /* 786 + * Record where we could have freed pages by migration and not 787 + * yet flushed them to buddy allocator. 788 + * - this is the lowest page that was isolated and likely be 789 + * then freed by migration. 
790 + */ 791 + if (!cc->last_migrated_pfn) 792 + cc->last_migrated_pfn = low_pfn; 817 793 818 794 /* Avoid isolating too much */ 819 795 if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) { 820 796 ++low_pfn; 821 797 break; 798 + } 799 + 800 + continue; 801 + isolate_fail: 802 + if (!skip_on_failure) 803 + continue; 804 + 805 + /* 806 + * We have isolated some pages, but then failed. Release them 807 + * instead of migrating, as we cannot form the cc->order buddy 808 + * page anyway. 809 + */ 810 + if (nr_isolated) { 811 + if (locked) { 812 + spin_unlock_irqrestore(&zone->lru_lock, flags); 813 + locked = false; 814 + } 815 + acct_isolated(zone, cc); 816 + putback_movable_pages(&cc->migratepages); 817 + cc->nr_migratepages = 0; 818 + cc->last_migrated_pfn = 0; 819 + nr_isolated = 0; 820 + } 821 + 822 + if (low_pfn < next_skip_pfn) { 823 + low_pfn = next_skip_pfn - 1; 824 + /* 825 + * The check near the loop beginning would have updated 826 + * next_skip_pfn too, but this is a bit simpler. 827 + */ 828 + next_skip_pfn += 1UL << cc->order; 822 829 } 823 830 } 824 831 ··· 907 834 908 835 /* Scan block by block. First and last block may be incomplete */ 909 836 pfn = start_pfn; 910 - block_start_pfn = pfn & ~(pageblock_nr_pages - 1); 837 + block_start_pfn = pageblock_start_pfn(pfn); 911 838 if (block_start_pfn < cc->zone->zone_start_pfn) 912 839 block_start_pfn = cc->zone->zone_start_pfn; 913 - block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); 840 + block_end_pfn = pageblock_end_pfn(pfn); 914 841 915 842 for (; pfn < end_pfn; pfn = block_end_pfn, 916 843 block_start_pfn = block_end_pfn, ··· 997 924 * is using. 
998 925 */ 999 926 isolate_start_pfn = cc->free_pfn; 1000 - block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1); 927 + block_start_pfn = pageblock_start_pfn(cc->free_pfn); 1001 928 block_end_pfn = min(block_start_pfn + pageblock_nr_pages, 1002 929 zone_end_pfn(zone)); 1003 - low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages); 930 + low_pfn = pageblock_end_pfn(cc->migrate_pfn); 1004 931 1005 932 /* 1006 933 * Isolate free pages until enough are available to migrate the ··· 1143 1070 unsigned long block_start_pfn; 1144 1071 unsigned long block_end_pfn; 1145 1072 unsigned long low_pfn; 1146 - unsigned long isolate_start_pfn; 1147 1073 struct page *page; 1148 1074 const isolate_mode_t isolate_mode = 1149 1075 (sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) | ··· 1153 1081 * initialized by compact_zone() 1154 1082 */ 1155 1083 low_pfn = cc->migrate_pfn; 1156 - block_start_pfn = cc->migrate_pfn & ~(pageblock_nr_pages - 1); 1084 + block_start_pfn = pageblock_start_pfn(low_pfn); 1157 1085 if (block_start_pfn < zone->zone_start_pfn) 1158 1086 block_start_pfn = zone->zone_start_pfn; 1159 1087 1160 1088 /* Only scan within a pageblock boundary */ 1161 - block_end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages); 1089 + block_end_pfn = pageblock_end_pfn(low_pfn); 1162 1090 1163 1091 /* 1164 1092 * Iterate over whole pageblocks until we find the first suitable. ··· 1197 1125 continue; 1198 1126 1199 1127 /* Perform the isolation */ 1200 - isolate_start_pfn = low_pfn; 1201 1128 low_pfn = isolate_migratepages_block(cc, low_pfn, 1202 1129 block_end_pfn, isolate_mode); 1203 1130 ··· 1204 1133 acct_isolated(zone, cc); 1205 1134 return ISOLATE_ABORT; 1206 1135 } 1207 - 1208 - /* 1209 - * Record where we could have freed pages by migration and not 1210 - * yet flushed them to buddy allocator. 1211 - * - this is the lowest page that could have been isolated and 1212 - * then freed by migration. 
1213 - */ 1214 - if (cc->nr_migratepages && !cc->last_migrated_pfn) 1215 - cc->last_migrated_pfn = isolate_start_pfn; 1216 1136 1217 1137 /* 1218 1138 * Either we isolated something and proceed with migration. Or ··· 1313 1251 * COMPACT_CONTINUE - If compaction should run now 1314 1252 */ 1315 1253 static unsigned long __compaction_suitable(struct zone *zone, int order, 1316 - int alloc_flags, int classzone_idx) 1254 + unsigned int alloc_flags, 1255 + int classzone_idx) 1317 1256 { 1318 1257 int fragindex; 1319 1258 unsigned long watermark; ··· 1359 1296 } 1360 1297 1361 1298 unsigned long compaction_suitable(struct zone *zone, int order, 1362 - int alloc_flags, int classzone_idx) 1299 + unsigned int alloc_flags, 1300 + int classzone_idx) 1363 1301 { 1364 1302 unsigned long ret; 1365 1303 ··· 1407 1343 cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync]; 1408 1344 cc->free_pfn = zone->compact_cached_free_pfn; 1409 1345 if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) { 1410 - cc->free_pfn = round_down(end_pfn - 1, pageblock_nr_pages); 1346 + cc->free_pfn = pageblock_start_pfn(end_pfn - 1); 1411 1347 zone->compact_cached_free_pfn = cc->free_pfn; 1412 1348 } 1413 1349 if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) { ··· 1462 1398 ret = COMPACT_CONTENDED; 1463 1399 goto out; 1464 1400 } 1401 + /* 1402 + * We failed to migrate at least one page in the current 1403 + * order-aligned block, so skip the rest of it. 
1404 + */ 1405 + if (cc->direct_compaction && 1406 + (cc->mode == MIGRATE_ASYNC)) { 1407 + cc->migrate_pfn = block_end_pfn( 1408 + cc->migrate_pfn - 1, cc->order); 1409 + /* Draining pcplists is useless in this case */ 1410 + cc->last_migrated_pfn = 0; 1411 + 1412 + } 1465 1413 } 1466 1414 1467 1415 check_drain: ··· 1487 1411 if (cc->order > 0 && cc->last_migrated_pfn) { 1488 1412 int cpu; 1489 1413 unsigned long current_block_start = 1490 - cc->migrate_pfn & ~((1UL << cc->order) - 1); 1414 + block_start_pfn(cc->migrate_pfn, cc->order); 1491 1415 1492 1416 if (cc->last_migrated_pfn < current_block_start) { 1493 1417 cpu = get_cpu(); ··· 1512 1436 cc->nr_freepages = 0; 1513 1437 VM_BUG_ON(free_pfn == 0); 1514 1438 /* The cached pfn is always the first in a pageblock */ 1515 - free_pfn &= ~(pageblock_nr_pages-1); 1439 + free_pfn = pageblock_start_pfn(free_pfn); 1516 1440 /* 1517 1441 * Only go back, not forward. The cached pfn might have been 1518 1442 * already reset to zone end in compact_finished() ··· 1532 1456 1533 1457 static unsigned long compact_zone_order(struct zone *zone, int order, 1534 1458 gfp_t gfp_mask, enum migrate_mode mode, int *contended, 1535 - int alloc_flags, int classzone_idx) 1459 + unsigned int alloc_flags, int classzone_idx) 1536 1460 { 1537 1461 unsigned long ret; 1538 1462 struct compact_control cc = { ··· 1573 1497 * This is the main entry point for direct page compaction. 
1574 1498 */ 1575 1499 unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, 1576 - int alloc_flags, const struct alloc_context *ac, 1577 - enum migrate_mode mode, int *contended) 1500 + unsigned int alloc_flags, const struct alloc_context *ac, 1501 + enum migrate_mode mode, int *contended) 1578 1502 { 1579 1503 int may_enter_fs = gfp_mask & __GFP_FS; 1580 1504 int may_perform_io = gfp_mask & __GFP_IO; ··· 1602 1526 1603 1527 status = compact_zone_order(zone, order, gfp_mask, mode, 1604 1528 &zone_contended, alloc_flags, 1605 - ac->classzone_idx); 1529 + ac_classzone_idx(ac)); 1606 1530 rc = max(status, rc); 1607 1531 /* 1608 1532 * It takes at least one zone that wasn't lock contended ··· 1612 1536 1613 1537 /* If a normal allocation would succeed, stop compacting */ 1614 1538 if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 1615 - ac->classzone_idx, alloc_flags)) { 1539 + ac_classzone_idx(ac), alloc_flags)) { 1616 1540 /* 1617 1541 * We think the allocation will succeed in this zone, 1618 1542 * but it is not certain, hence the false. The caller
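The four new helpers at the top of compaction.c name an alignment idiom that was previously open-coded at every call site. The subtle part is the `pfn + 1` inside `block_end_pfn()`: a pfn sitting exactly on a block boundary reports the end of its *own* block rather than itself. A user-space sketch with simplified copies of the underlying kernel macros (power-of-two block sizes assumed):

```c
#include <assert.h>

/* Simplified user-space copies of the kernel helpers the new macros
 * build on; both require a power-of-two alignment. */
#define round_down(x, y)	((x) & ~((y) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

/* The new compaction.c helpers, verbatim in shape. */
#define block_start_pfn(pfn, order)	round_down(pfn, 1UL << (order))
#define block_end_pfn(pfn, order)	ALIGN((pfn) + 1, 1UL << (order))
```

With a typical pageblock_order of 9 (512-page blocks), pfn 511 ends its block at 512, while pfn 512 already belongs to the next block ending at 1024.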
+1 -1
mm/filemap.c
··· 213 213 * some other bad page check should catch it later. 214 214 */ 215 215 page_mapcount_reset(page); 216 - atomic_sub(mapcount, &page->_count); 216 + page_ref_sub(page, mapcount); 217 217 } 218 218 } 219 219
+4 -8
mm/highmem.c
··· 112 112 113 113 unsigned int nr_free_highpages (void) 114 114 { 115 - pg_data_t *pgdat; 115 + struct zone *zone; 116 116 unsigned int pages = 0; 117 117 118 - for_each_online_pgdat(pgdat) { 119 - pages += zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM], 120 - NR_FREE_PAGES); 121 - if (zone_movable_is_highmem()) 122 - pages += zone_page_state( 123 - &pgdat->node_zones[ZONE_MOVABLE], 124 - NR_FREE_PAGES); 118 + for_each_populated_zone(zone) { 119 + if (is_highmem(zone)) 120 + pages += zone_page_state(zone, NR_FREE_PAGES); 125 121 } 126 122 127 123 return pages;
+4 -7
mm/huge_memory.c
··· 1698 1698 return 1; 1699 1699 } 1700 1700 1701 - bool move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma, 1702 - unsigned long old_addr, 1701 + bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, 1703 1702 unsigned long new_addr, unsigned long old_end, 1704 1703 pmd_t *old_pmd, pmd_t *new_pmd) 1705 1704 { 1706 1705 spinlock_t *old_ptl, *new_ptl; 1707 1706 pmd_t pmd; 1708 - 1709 1707 struct mm_struct *mm = vma->vm_mm; 1710 1708 1711 1709 if ((old_addr & ~HPAGE_PMD_MASK) || 1712 1710 (new_addr & ~HPAGE_PMD_MASK) || 1713 - old_end - old_addr < HPAGE_PMD_SIZE || 1714 - (new_vma->vm_flags & VM_NOHUGEPAGE)) 1711 + old_end - old_addr < HPAGE_PMD_SIZE) 1715 1712 return false; 1716 1713 1717 1714 /* ··· 3110 3113 VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail); 3111 3114 3112 3115 /* 3113 - * tail_page->_count is zero and not changing from under us. But 3116 + * tail_page->_refcount is zero and not changing from under us. But 3114 3117 * get_page_unless_zero() may be running from under us on the 3115 3118 * tail_page. If we used atomic_set() below instead of atomic_inc(), we 3116 3119 * would then run atomic_set() concurrently with ··· 3337 3340 if (mlocked) 3338 3341 lru_add_drain(); 3339 3342 3340 - /* Prevent deferred_split_scan() touching ->_count */ 3343 + /* Prevent deferred_split_scan() touching ->_refcount */ 3341 3344 spin_lock_irqsave(&pgdata->split_queue_lock, flags); 3342 3345 count = page_count(head); 3343 3346 mapcount = total_mapcount(head);
+26 -11
mm/hugetlb.c
··· 51 51 static struct hstate * __initdata parsed_hstate; 52 52 static unsigned long __initdata default_hstate_max_huge_pages; 53 53 static unsigned long __initdata default_hstate_size; 54 + static bool __initdata parsed_valid_hugepagesz = true; 54 55 55 56 /* 56 57 * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages, ··· 145 144 } 146 145 } 147 146 148 - if (spool->min_hpages != -1) { /* minimum size accounting */ 147 + /* minimum size accounting */ 148 + if (spool->min_hpages != -1 && spool->rsv_hpages) { 149 149 if (delta > spool->rsv_hpages) { 150 150 /* 151 151 * Asking for more reserves than those already taken on ··· 184 182 if (spool->max_hpages != -1) /* maximum size accounting */ 185 183 spool->used_hpages -= delta; 186 184 187 - if (spool->min_hpages != -1) { /* minimum size accounting */ 185 + /* minimum size accounting */ 186 + if (spool->min_hpages != -1 && spool->used_hpages < spool->min_hpages) { 188 187 if (spool->rsv_hpages + delta <= spool->min_hpages) 189 188 ret = 0; 190 189 else ··· 940 937 */ 941 938 static int next_node_allowed(int nid, nodemask_t *nodes_allowed) 942 939 { 943 - nid = next_node(nid, *nodes_allowed); 944 - if (nid == MAX_NUMNODES) 945 - nid = first_node(*nodes_allowed); 940 + nid = next_node_in(nid, *nodes_allowed); 946 941 VM_BUG_ON(nid >= MAX_NUMNODES); 947 942 948 943 return nid; ··· 1031 1030 return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE); 1032 1031 } 1033 1032 1034 - static bool pfn_range_valid_gigantic(unsigned long start_pfn, 1035 - unsigned long nr_pages) 1033 + static bool pfn_range_valid_gigantic(struct zone *z, 1034 + unsigned long start_pfn, unsigned long nr_pages) 1036 1035 { 1037 1036 unsigned long i, end_pfn = start_pfn + nr_pages; 1038 1037 struct page *page; ··· 1042 1041 return false; 1043 1042 1044 1043 page = pfn_to_page(i); 1044 + 1045 + if (page_zone(page) != z) 1046 + return false; 1045 1047 1046 1048 if (PageReserved(page)) 1047 1049 return false; ··· 1078 1074 
1079 1075 pfn = ALIGN(z->zone_start_pfn, nr_pages); 1080 1076 while (zone_spans_last_pfn(z, pfn, nr_pages)) { 1081 - if (pfn_range_valid_gigantic(pfn, nr_pages)) { 1077 + if (pfn_range_valid_gigantic(z, pfn, nr_pages)) { 1082 1078 /* 1083 1079 * We release the zone lock here because 1084 1080 * alloc_contig_range() will also lock the zone ··· 2663 2659 subsys_initcall(hugetlb_init); 2664 2660 2665 2661 /* Should be called on processing a hugepagesz=... option */ 2662 + void __init hugetlb_bad_size(void) 2663 + { 2664 + parsed_valid_hugepagesz = false; 2665 + } 2666 + 2666 2667 void __init hugetlb_add_hstate(unsigned int order) 2667 2668 { 2668 2669 struct hstate *h; ··· 2687 2678 for (i = 0; i < MAX_NUMNODES; ++i) 2688 2679 INIT_LIST_HEAD(&h->hugepage_freelists[i]); 2689 2680 INIT_LIST_HEAD(&h->hugepage_activelist); 2690 - h->next_nid_to_alloc = first_node(node_states[N_MEMORY]); 2691 - h->next_nid_to_free = first_node(node_states[N_MEMORY]); 2681 + h->next_nid_to_alloc = first_memory_node; 2682 + h->next_nid_to_free = first_memory_node; 2692 2683 snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", 2693 2684 huge_page_size(h)/1024); 2694 2685 ··· 2700 2691 unsigned long *mhp; 2701 2692 static unsigned long *last_mhp; 2702 2693 2694 + if (!parsed_valid_hugepagesz) { 2695 + pr_warn("hugepages = %s preceded by " 2696 + "an unsupported hugepagesz, ignoring\n", s); 2697 + parsed_valid_hugepagesz = true; 2698 + return 1; 2699 + } 2703 2700 /* 2704 2701 * !hugetlb_max_hstate means we haven't parsed a hugepagesz= parameter yet, 2705 2702 * so this hugepages= parameter goes to the "default hstate". 2706 2703 */ 2707 - if (!hugetlb_max_hstate) 2704 + else if (!hugetlb_max_hstate) 2708 2705 mhp = &default_hstate_max_huge_pages; 2709 2706 else 2710 2707 mhp = &parsed_hstate->max_huge_pages;
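The `parsed_valid_hugepagesz` flag makes the `hugepagesz=`/`hugepages=` pair order-aware: an unsupported `hugepagesz=` now poisons the `hugepages=` that follows it, instead of silently crediting the count to the wrong hstate. A toy model of that handshake (names and the two accepted sizes are invented for this sketch, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>

static bool parsed_valid_sz_sim = true;
static unsigned long saved_pages_sim;

/* Stand-in for the hugepagesz= handler; an unsupported size has the
 * effect of hugetlb_bad_size() in the patch above. */
static void parse_hugepagesz_sim(unsigned long size_kb)
{
	/* this sketch accepts only 2 MB and 1 GB pages */
	parsed_valid_sz_sim = (size_kb == 2048 || size_kb == 1048576);
}

/* Stand-in for the hugepages= handler. */
static bool parse_hugepages_sim(unsigned long count)
{
	if (!parsed_valid_sz_sim) {
		/* mirrors the pr_warn() + re-arm in the patch above */
		parsed_valid_sz_sim = true;
		return false;
	}
	saved_pages_sim = count;
	return true;
}
```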
+5 -4
mm/internal.h
··· 58 58 } 59 59 60 60 /* 61 - * Turn a non-refcounted page (->_count == 0) into refcounted with 61 + * Turn a non-refcounted page (->_refcount == 0) into refcounted with 62 62 * a count of one. 63 63 */ 64 64 static inline void set_page_refcounted(struct page *page) ··· 102 102 struct alloc_context { 103 103 struct zonelist *zonelist; 104 104 nodemask_t *nodemask; 105 - struct zone *preferred_zone; 106 - int classzone_idx; 105 + struct zoneref *preferred_zoneref; 107 106 int migratetype; 108 107 enum zone_type high_zoneidx; 109 108 bool spread_dirty_pages; 110 109 }; 110 + 111 + #define ac_classzone_idx(ac) zonelist_zone_idx(ac->preferred_zoneref) 111 112 112 113 /* 113 114 * Locate the struct page for both the matching buddy in our ··· 176 175 bool direct_compaction; /* False from kcompactd or /proc/... */ 177 176 int order; /* order a direct compactor needs */ 178 177 const gfp_t gfp_mask; /* gfp mask of a direct compactor */ 179 - const int alloc_flags; /* alloc flags of a direct compactor */ 178 + const unsigned int alloc_flags; /* alloc flags of a direct compactor */ 180 179 const int classzone_idx; /* zone index of a direct compactor */ 181 180 struct zone *zone; 182 181 int contended; /* Signal need_sched() or lock
+27 -11
mm/memcontrol.c
··· 1023 1023 * @lru: index of lru list the page is sitting on 1024 1024 * @nr_pages: positive when adding or negative when removing 1025 1025 * 1026 - * This function must be called when a page is added to or removed from an 1027 - * lru list. 1026 + * This function must be called under lru_lock, just before a page is added 1027 + * to or just after a page is removed from an lru list (that ordering being 1028 + * so as to allow it to check that lru_size 0 is consistent with list_empty). 1028 1029 */ 1029 1030 void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, 1030 1031 int nr_pages) 1031 1032 { 1032 1033 struct mem_cgroup_per_zone *mz; 1033 1034 unsigned long *lru_size; 1035 + long size; 1036 + bool empty; 1037 + 1038 + __update_lru_size(lruvec, lru, nr_pages); 1034 1039 1035 1040 if (mem_cgroup_disabled()) 1036 1041 return; 1037 1042 1038 1043 mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec); 1039 1044 lru_size = mz->lru_size + lru; 1040 - *lru_size += nr_pages; 1041 - VM_BUG_ON((long)(*lru_size) < 0); 1045 + empty = list_empty(lruvec->lists + lru); 1046 + 1047 + if (nr_pages < 0) 1048 + *lru_size += nr_pages; 1049 + 1050 + size = *lru_size; 1051 + if (WARN_ONCE(size < 0 || empty != !size, 1052 + "%s(%p, %d, %d): lru_size %ld but %sempty\n", 1053 + __func__, lruvec, lru, nr_pages, size, empty ? 
"" : "not ")) { 1054 + VM_BUG_ON(1); 1055 + *lru_size = 0; 1056 + } 1057 + 1058 + if (nr_pages > 0) 1059 + *lru_size += nr_pages; 1042 1060 } 1043 1061 1044 1062 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg) ··· 1275 1257 */ 1276 1258 if (fatal_signal_pending(current) || task_will_free_mem(current)) { 1277 1259 mark_oom_victim(current); 1260 + try_oom_reaper(current); 1278 1261 goto unlock; 1279 1262 } 1280 1263 ··· 1408 1389 mem_cgroup_may_update_nodemask(memcg); 1409 1390 node = memcg->last_scanned_node; 1410 1391 1411 - node = next_node(node, memcg->scan_nodes); 1412 - if (node == MAX_NUMNODES) 1413 - node = first_node(memcg->scan_nodes); 1392 + node = next_node_in(node, memcg->scan_nodes); 1414 1393 /* 1415 - * We call this when we hit limit, not when pages are added to LRU. 1416 - * No LRU may hold pages because all pages are UNEVICTABLE or 1417 - * memcg is too small and all pages are not on LRU. In that case, 1418 - * we use curret node. 1394 + * mem_cgroup_may_update_nodemask might have seen no reclaimable pages 1395 + * last time it really checked all the LRUs due to rate limiting. 1396 + * Fallback to the current node in that case for simplicity. 1419 1397 */ 1420 1398 if (unlikely(node == MAX_NUMNODES)) 1421 1399 node = numa_node_id();
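The reworked `mem_cgroup_update_lru_size()` applies decrements before the sanity check and increments after it, so `lru_size == 0` can be compared against `list_empty()` at the one moment where both must agree. A stripped-down model of that ordering, with the list state passed in as a flag (all names invented):

```c
#include <assert.h>
#include <stdbool.h>

static long lru_size_sim;

/*
 * Toy model of the check ordering: @nr_pages < 0 is applied before the
 * consistency test and @nr_pages > 0 after it.  Returns false (and
 * clamps the counter to zero, as the kernel does after WARN_ONCE) when
 * the counter disagrees with @list_empty.
 */
static bool update_lru_size_sim(long nr_pages, bool list_empty)
{
	bool ok = true;
	long size;

	if (nr_pages < 0)
		lru_size_sim += nr_pages;

	size = lru_size_sim;
	if (size < 0 || list_empty != !size) {
		lru_size_sim = 0;
		ok = false;
	}

	if (nr_pages > 0)
		lru_size_sim += nr_pages;
	return ok;
}
```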
+18 -3
mm/memory_hotplug.c
··· 78 78 #define memhp_lock_acquire() lock_map_acquire(&mem_hotplug.dep_map) 79 79 #define memhp_lock_release() lock_map_release(&mem_hotplug.dep_map) 80 80 81 + #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE 81 82 bool memhp_auto_online; 83 + #else 84 + bool memhp_auto_online = true; 85 + #endif 82 86 EXPORT_SYMBOL_GPL(memhp_auto_online); 87 + 88 + static int __init setup_memhp_default_state(char *str) 89 + { 90 + if (!strcmp(str, "online")) 91 + memhp_auto_online = true; 92 + else if (!strcmp(str, "offline")) 93 + memhp_auto_online = false; 94 + 95 + return 1; 96 + } 97 + __setup("memhp_default_state=", setup_memhp_default_state); 83 98 84 99 void get_online_mems(void) 85 100 { ··· 1425 1410 } 1426 1411 1427 1412 /* Checks if this range of memory is likely to be hot-removable. */ 1428 - int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) 1413 + bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) 1429 1414 { 1430 1415 struct page *page = pfn_to_page(start_pfn); 1431 1416 struct page *end_page = page + nr_pages; ··· 1433 1418 /* Check the starting page of each pageblock within the range */ 1434 1419 for (; page < end_page; page = next_active_pageblock(page)) { 1435 1420 if (!is_pageblock_removable_nolock(page)) 1436 - return 0; 1421 + return false; 1437 1422 cond_resched(); 1438 1423 } 1439 1424 1440 1425 /* All pageblocks in the memory block are likely to be hot-removable */ 1441 - return 1; 1426 + return true; 1442 1427 } 1443 1428 1444 1429 /*
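Besides the Kconfig default, the patch adds a `memhp_default_state=` boot parameter. Its handler is a plain keyword match that leaves the compiled-in default untouched on unknown input; sketched here in user space (the `*_sim` names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Compiled-in default; the kernel takes it from
 * CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE. */
static bool memhp_auto_online_sim = false;

/*
 * Mirrors setup_memhp_default_state(): only the two known keywords
 * change the policy; anything else keeps the default.
 */
static int setup_memhp_default_state_sim(const char *str)
{
	if (!strcmp(str, "online"))
		memhp_auto_online_sim = true;
	else if (!strcmp(str, "offline"))
		memhp_auto_online_sim = false;

	return 1;	/* __setup() handlers return 1 for "consumed" */
}
```

At runtime the same policy remains writable through `/sys/devices/system/memory/auto_online_blocks`, as the Kconfig help text notes.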
+23 -40
mm/mempolicy.c
··· 97 97 98 98 #include <asm/tlbflush.h> 99 99 #include <asm/uaccess.h> 100 - #include <linux/random.h> 101 100 102 101 #include "internal.h" 103 102 ··· 346 347 BUG(); 347 348 348 349 if (!node_isset(current->il_next, tmp)) { 349 - current->il_next = next_node(current->il_next, tmp); 350 - if (current->il_next >= MAX_NUMNODES) 351 - current->il_next = first_node(tmp); 350 + current->il_next = next_node_in(current->il_next, tmp); 352 351 if (current->il_next >= MAX_NUMNODES) 353 352 current->il_next = numa_node_id(); 354 353 } ··· 1706 1709 struct task_struct *me = current; 1707 1710 1708 1711 nid = me->il_next; 1709 - next = next_node(nid, policy->v.nodes); 1710 - if (next >= MAX_NUMNODES) 1711 - next = first_node(policy->v.nodes); 1712 + next = next_node_in(nid, policy->v.nodes); 1712 1713 if (next < MAX_NUMNODES) 1713 1714 me->il_next = next; 1714 1715 return nid; ··· 1739 1744 return interleave_nodes(policy); 1740 1745 1741 1746 case MPOL_BIND: { 1747 + struct zoneref *z; 1748 + 1742 1749 /* 1743 1750 * Follow bind policy behavior and start allocation at the 1744 1751 * first node. 1745 1752 */ 1746 1753 struct zonelist *zonelist; 1747 - struct zone *zone; 1748 1754 enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL); 1749 1755 zonelist = &NODE_DATA(node)->node_zonelists[0]; 1750 - (void)first_zones_zonelist(zonelist, highest_zoneidx, 1751 - &policy->v.nodes, 1752 - &zone); 1753 - return zone ? zone->node : node; 1756 + z = first_zones_zonelist(zonelist, highest_zoneidx, 1757 + &policy->v.nodes); 1758 + return z->zone ? z->zone->node : node; 1754 1759 } 1755 1760 1756 1761 default: ··· 1758 1763 } 1759 1764 } 1760 1765 1761 - /* Do static interleaving for a VMA with known offset. */ 1766 + /* 1767 + * Do static interleaving for a VMA with known offset @n. Returns the n'th 1768 + * node in pol->v.nodes (starting from n=0), wrapping around if n exceeds the 1769 + * number of present nodes. 
1770 + */ 1762 1771 static unsigned offset_il_node(struct mempolicy *pol, 1763 - struct vm_area_struct *vma, unsigned long off) 1772 + struct vm_area_struct *vma, unsigned long n) 1764 1773 { 1765 1774 unsigned nnodes = nodes_weight(pol->v.nodes); 1766 1775 unsigned target; 1767 - int c; 1768 - int nid = NUMA_NO_NODE; 1776 + int i; 1777 + int nid; 1769 1778 1770 1779 if (!nnodes) 1771 1780 return numa_node_id(); 1772 - target = (unsigned int)off % nnodes; 1773 - c = 0; 1774 - do { 1781 + target = (unsigned int)n % nnodes; 1782 + nid = first_node(pol->v.nodes); 1783 + for (i = 0; i < target; i++) 1775 1784 nid = next_node(nid, pol->v.nodes); 1776 - c++; 1777 - } while (c <= target); 1778 1785 return nid; 1779 1786 } 1780 1787 ··· 1800 1803 return offset_il_node(pol, vma, off); 1801 1804 } else 1802 1805 return interleave_nodes(pol); 1803 - } 1804 - 1805 - /* 1806 - * Return the bit number of a random bit set in the nodemask. 1807 - * (returns NUMA_NO_NODE if nodemask is empty) 1808 - */ 1809 - int node_random(const nodemask_t *maskp) 1810 - { 1811 - int w, bit = NUMA_NO_NODE; 1812 - 1813 - w = nodes_weight(*maskp); 1814 - if (w) 1815 - bit = bitmap_ord_to_pos(maskp->bits, 1816 - get_random_int() % w, MAX_NUMNODES); 1817 - return bit; 1818 1806 } 1819 1807 1820 1808 #ifdef CONFIG_HUGETLBFS ··· 2266 2284 int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long addr) 2267 2285 { 2268 2286 struct mempolicy *pol; 2269 - struct zone *zone; 2287 + struct zoneref *z; 2270 2288 int curnid = page_to_nid(page); 2271 2289 unsigned long pgoff; 2272 2290 int thiscpu = raw_smp_processor_id(); ··· 2298 2316 break; 2299 2317 2300 2318 case MPOL_BIND: 2319 + 2301 2320 /* 2302 2321 * allows binding to multiple nodes. 
2303 2322 * use current page if in policy nodemask, ··· 2307 2324 */ 2308 2325 if (node_isset(curnid, pol->v.nodes)) 2309 2326 goto out; 2310 - (void)first_zones_zonelist( 2327 + z = first_zones_zonelist( 2311 2328 node_zonelist(numa_node_id(), GFP_HIGHUSER), 2312 2329 gfp_zone(GFP_HIGHUSER), 2313 - &pol->v.nodes, &zone); 2314 - polnid = zone->node; 2330 + &pol->v.nodes); 2331 + polnid = z->zone->node; 2315 2332 break; 2316 2333 2317 2334 default:
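The rewritten `offset_il_node()` selects the n'th set node by walking forward from `first_node()`, rather than the old counting loop that started outside the mask. A user-space sketch of the same selection, using GCC bit builtins in place of the nodemask helpers (illustrative names, not kernel API):

```c
#include <assert.h>
#include <limits.h>

/*
 * Return the n'th set bit of @mask (n taken modulo the number of set
 * bits), walking forward from the first set bit.  Mirrors the shape of
 * the reworked offset_il_node(); requires GCC/Clang builtins.
 */
static unsigned int nth_node_sim(unsigned long mask, unsigned long n)
{
	unsigned int nnodes = (unsigned int)__builtin_popcountl(mask);
	unsigned int target, i, nid;

	if (!nnodes)
		return UINT_MAX;	/* caller falls back to the local node */
	target = (unsigned int)(n % nnodes);

	nid = (unsigned int)__builtin_ctzl(mask);	/* first set bit */
	for (i = 0; i < target; i++) {
		/* clear bits up to and including nid, take the next one;
		 * target < nnodes guarantees a set bit remains */
		unsigned long higher = mask & ~((1UL << (nid + 1)) - 1);

		nid = (unsigned int)__builtin_ctzl(higher);
	}
	return nid;
}
```

The modulo makes interleaving wrap, so page offsets beyond the node count cycle through the allowed nodes, matching the comment added above `offset_il_node()`.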
+3 -3
mm/migrate.c
··· 332 332 newpage->index = page->index; 333 333 newpage->mapping = page->mapping; 334 334 if (PageSwapBacked(page)) 335 - SetPageSwapBacked(newpage); 335 + __SetPageSwapBacked(newpage); 336 336 337 337 return MIGRATEPAGE_SUCCESS; 338 338 } ··· 378 378 newpage->index = page->index; 379 379 newpage->mapping = page->mapping; 380 380 if (PageSwapBacked(page)) 381 - SetPageSwapBacked(newpage); 381 + __SetPageSwapBacked(newpage); 382 382 383 383 get_page(newpage); /* add cache reference */ 384 384 if (PageSwapCache(page)) { ··· 1791 1791 1792 1792 /* Prepare a page as a migration target */ 1793 1793 __SetPageLocked(new_page); 1794 - SetPageSwapBacked(new_page); 1794 + __SetPageSwapBacked(new_page); 1795 1795 1796 1796 /* anon mapping, we can simply copy page->mapping to the new page: */ 1797 1797 new_page->mapping = page->mapping;
-5
mm/mmap.c
··· 55 55 #define arch_mmap_check(addr, len, flags) (0) 56 56 #endif 57 57 58 - #ifndef arch_rebalance_pgtables 59 - #define arch_rebalance_pgtables(addr, len) (addr) 60 - #endif 61 - 62 58 #ifdef CONFIG_HAVE_ARCH_MMAP_RND_BITS 63 59 const int mmap_rnd_bits_min = CONFIG_ARCH_MMAP_RND_BITS_MIN; 64 60 const int mmap_rnd_bits_max = CONFIG_ARCH_MMAP_RND_BITS_MAX; ··· 1907 1911 if (offset_in_page(addr)) 1908 1912 return -EINVAL; 1909 1913 1910 - addr = arch_rebalance_pgtables(addr, len); 1911 1914 error = security_mmap_addr(addr); 1912 1915 return error ? error : addr; 1913 1916 }
+1 -1
mm/mmzone.c
··· 52 52 } 53 53 54 54 /* Returns the next zone at or below highest_zoneidx in a zonelist */ 55 - struct zoneref *next_zones_zonelist(struct zoneref *z, 55 + struct zoneref *__next_zones_zonelist(struct zoneref *z, 56 56 enum zone_type highest_zoneidx, 57 57 nodemask_t *nodes) 58 58 {
+24 -23
mm/mremap.c
··· 70 70
 	return pmd;
 }

+static void take_rmap_locks(struct vm_area_struct *vma)
+{
+	if (vma->vm_file)
+		i_mmap_lock_write(vma->vm_file->f_mapping);
+	if (vma->anon_vma)
+		anon_vma_lock_write(vma->anon_vma);
+}
+
+static void drop_rmap_locks(struct vm_area_struct *vma)
+{
+	if (vma->anon_vma)
+		anon_vma_unlock_write(vma->anon_vma);
+	if (vma->vm_file)
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+}
+
 static pte_t move_soft_dirty_pte(pte_t pte)
 {
 	/*
··· 106 90
 		struct vm_area_struct *new_vma, pmd_t *new_pmd,
 		unsigned long new_addr, bool need_rmap_locks)
 {
-	struct address_space *mapping = NULL;
-	struct anon_vma *anon_vma = NULL;
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *old_pte, *new_pte, pte;
 	spinlock_t *old_ptl, *new_ptl;
··· 128 114
 	 * serialize access to individual ptes, but only rmap traversal
 	 * order guarantees that we won't miss both the old and new ptes).
 	 */
-	if (need_rmap_locks) {
-		if (vma->vm_file) {
-			mapping = vma->vm_file->f_mapping;
-			i_mmap_lock_write(mapping);
-		}
-		if (vma->anon_vma) {
-			anon_vma = vma->anon_vma;
-			anon_vma_lock_write(anon_vma);
-		}
-	}
+	if (need_rmap_locks)
+		take_rmap_locks(vma);

 	/*
 	 * We don't have to worry about the ordering of src and dst
··· 157 151
 	spin_unlock(new_ptl);
 	pte_unmap(new_pte - 1);
 	pte_unmap_unlock(old_pte - 1, old_ptl);
-	if (anon_vma)
-		anon_vma_unlock_write(anon_vma);
-	if (mapping)
-		i_mmap_unlock_write(mapping);
+	if (need_rmap_locks)
+		drop_rmap_locks(vma);
 }

 #define LATENCY_LIMIT	(64 * PAGE_SIZE)
··· 197 193
 	if (pmd_trans_huge(*old_pmd)) {
 		if (extent == HPAGE_PMD_SIZE) {
 			bool moved;
-			VM_BUG_ON_VMA(vma->vm_file || !vma->anon_vma,
-				      vma);
 			/* See comment in move_ptes() */
 			if (need_rmap_locks)
-				anon_vma_lock_write(vma->anon_vma);
-			moved = move_huge_pmd(vma, new_vma, old_addr,
-					      new_addr, old_end,
-					      old_pmd, new_pmd);
+				take_rmap_locks(vma);
+			moved = move_huge_pmd(vma, old_addr, new_addr,
+					      old_end, old_pmd, new_pmd);
 			if (need_rmap_locks)
-				anon_vma_unlock_write(vma->anon_vma);
+				drop_rmap_locks(vma);
 			if (moved) {
 				need_flush = true;
 				continue;
+88 -24
mm/oom_kill.c
··· 412 412

 #define K(x) ((x) << (PAGE_SHIFT-10))

+/*
+ * task->mm can be NULL if the task is the exited group leader. So to
+ * determine whether the task is using a particular mm, we examine all the
+ * task's threads: if one of those is using this mm then this task was also
+ * using it.
+ */
+static bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
+{
+	struct task_struct *t;
+
+	for_each_thread(p, t) {
+		struct mm_struct *t_mm = READ_ONCE(t->mm);
+		if (t_mm)
+			return t_mm == mm;
+	}
+	return false;
+}
+
+
 #ifdef CONFIG_MMU
 /*
  * OOM Reaper kernel thread which tries to reap the memory used by the OOM
··· 510 491
 	up_read(&mm->mmap_sem);

 	/*
-	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
-	 * reasonably reclaimable memory anymore. OOM killer can continue
-	 * by selecting other victim if unmapping hasn't led to any
-	 * improvements. This also means that selecting this task doesn't
-	 * make any sense.
+	 * This task can be safely ignored because we cannot do much more
+	 * to release its memory.
 	 */
 	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
-	exit_oom_victim(tsk);
 out:
 	mmput(mm);
 	return ret;
··· 533 518
 		task_pid_nr(tsk), tsk->comm);
 	debug_show_all_locks();
 }
+
+	/*
+	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
+	 * reasonably reclaimable memory anymore or it is not a good candidate
+	 * for the oom victim right now because it cannot release its memory
+	 * itself nor by the oom reaper.
+	 */
+	tsk->oom_reaper_list = NULL;
+	exit_oom_victim(tsk);

 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);
··· 585 561
 	oom_reaper_list = tsk;
 	spin_unlock(&oom_reaper_lock);
 	wake_up(&oom_reaper_wait);
+}
+
+/* Check if we can reap the given task. This has to be called with stable
+ * tsk->mm
+ */
+void try_oom_reaper(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+	struct task_struct *p;
+
+	if (!mm)
+		return;
+
+	/*
+	 * There might be other threads/processes which are either not
+	 * dying or even not killable.
+	 */
+	if (atomic_read(&mm->mm_users) > 1) {
+		rcu_read_lock();
+		for_each_process(p) {
+			bool exiting;
+
+			if (!process_shares_mm(p, mm))
+				continue;
+			if (same_thread_group(p, tsk))
+				continue;
+			if (fatal_signal_pending(p))
+				continue;
+
+			/*
+			 * If the task is exiting make sure the whole thread group
+			 * is exiting and cannot acces mm anymore.
+			 */
+			spin_lock_irq(&p->sighand->siglock);
+			exiting = signal_group_exit(p->signal);
+			spin_unlock_irq(&p->sighand->siglock);
+			if (exiting)
+				continue;
+
+			/* Give up */
+			rcu_read_unlock();
+			return;
+		}
+		rcu_read_unlock();
+	}
+
+	wake_oom_reaper(tsk);
 }

 static int __init oom_init(void)
··· 724 653
 }

 /*
- * task->mm can be NULL if the task is the exited group leader. So to
- * determine whether the task is using a particular mm, we examine all the
- * task's threads: if one of those is using this mm then this task was also
- * using it.
- */
-static bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
-{
-	struct task_struct *t;
-
-	for_each_thread(p, t) {
-		struct mm_struct *t_mm = READ_ONCE(t->mm);
-		if (t_mm)
-			return t_mm == mm;
-	}
-	return false;
-}
-
-/*
  * Must be called while holding a reference to p, which will be released upon
  * returning.
  */
··· 747 694
 	task_lock(p);
 	if (p->mm && task_will_free_mem(p)) {
 		mark_oom_victim(p);
+		try_oom_reaper(p);
 		task_unlock(p);
 		put_task_struct(p);
 		return;
··· 927 873
 	if (current->mm &&
 	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
 		mark_oom_victim(current);
+		try_oom_reaper(current);
 		return true;
 	}
+
+	/*
+	 * The OOM killer does not compensate for IO-less reclaim.
+	 * pagefault_out_of_memory lost its gfp context so we have to
+	 * make sure exclude 0 mask - all other users should have at least
+	 * ___GFP_DIRECT_RECLAIM to get here.
+	 */
+	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
+		return true;

 	/*
 	 * Check if there were limitations on the allocation (only relevant for
+6 -2
mm/page-writeback.c
··· 296 296
 #ifdef CONFIG_HIGHMEM
 	int node;
 	unsigned long x = 0;
+	int i;

 	for_each_node_state(node, N_HIGH_MEMORY) {
-		struct zone *z = &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
+		for (i = 0; i < MAX_NR_ZONES; i++) {
+			struct zone *z = &NODE_DATA(node)->node_zones[i];

-		x += zone_dirtyable_memory(z);
+			if (is_highmem(z))
+				x += zone_dirtyable_memory(z);
+		}
 	}
 	/*
 	 * Unreclaimable memory (kernel memory or anonymous memory
+566 -381
mm/page_alloc.c
··· 352 352
 }
 #endif

+/* Return a pointer to the bitmap storing bits affecting a block of pages */
+static inline unsigned long *get_pageblock_bitmap(struct page *page,
+					unsigned long pfn)
+{
+#ifdef CONFIG_SPARSEMEM
+	return __pfn_to_section(pfn)->pageblock_flags;
+#else
+	return page_zone(page)->pageblock_flags;
+#endif /* CONFIG_SPARSEMEM */
+}
+
+static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
+{
+#ifdef CONFIG_SPARSEMEM
+	pfn &= (PAGES_PER_SECTION-1);
+	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
+#else
+	pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
+	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
+#endif /* CONFIG_SPARSEMEM */
+}
+
+/**
+ * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages
+ * @page: The page within the block of interest
+ * @pfn: The target page frame number
+ * @end_bitidx: The last bit of interest to retrieve
+ * @mask: mask of bits that the caller is interested in
+ *
+ * Return: pageblock_bits flags
+ */
+static __always_inline unsigned long __get_pfnblock_flags_mask(struct page *page,
+					unsigned long pfn,
+					unsigned long end_bitidx,
+					unsigned long mask)
+{
+	unsigned long *bitmap;
+	unsigned long bitidx, word_bitidx;
+	unsigned long word;
+
+	bitmap = get_pageblock_bitmap(page, pfn);
+	bitidx = pfn_to_bitidx(page, pfn);
+	word_bitidx = bitidx / BITS_PER_LONG;
+	bitidx &= (BITS_PER_LONG-1);
+
+	word = bitmap[word_bitidx];
+	bitidx += end_bitidx;
+	return (word >> (BITS_PER_LONG - bitidx - 1)) & mask;
+}
+
+unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
+					unsigned long end_bitidx,
+					unsigned long mask)
+{
+	return __get_pfnblock_flags_mask(page, pfn, end_bitidx, mask);
+}
+
+static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
+{
+	return __get_pfnblock_flags_mask(page, pfn, PB_migrate_end, MIGRATETYPE_MASK);
+}
+
+/**
+ * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages
+ * @page: The page within the block of interest
+ * @flags: The flags to set
+ * @pfn: The target page frame number
+ * @end_bitidx: The last bit of interest
+ * @mask: mask of bits that the caller is interested in
+ */
+void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
+					unsigned long pfn,
+					unsigned long end_bitidx,
+					unsigned long mask)
+{
+	unsigned long *bitmap;
+	unsigned long bitidx, word_bitidx;
+	unsigned long old_word, word;
+
+	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
+
+	bitmap = get_pageblock_bitmap(page, pfn);
+	bitidx = pfn_to_bitidx(page, pfn);
+	word_bitidx = bitidx / BITS_PER_LONG;
+	bitidx &= (BITS_PER_LONG-1);
+
+	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
+
+	bitidx += end_bitidx;
+	mask <<= (BITS_PER_LONG - bitidx - 1);
+	flags <<= (BITS_PER_LONG - bitidx - 1);
+
+	word = READ_ONCE(bitmap[word_bitidx]);
+	for (;;) {
+		old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags);
+		if (word == old_word)
+			break;
+		word = old_word;
+	}
+}

 void set_pageblock_migratetype(struct page *page, int migratetype)
 {
··· 884 784
 	zone->free_area[order].nr_free++;
 }

-static inline int free_pages_check(struct page *page)
+/*
+ * A bad page could be due to a number of fields. Instead of multiple branches,
+ * try and check multiple fields with one check. The caller must do a detailed
+ * check if necessary.
+ */
+static inline bool page_expected_state(struct page *page,
+					unsigned long check_flags)
 {
-	const char *bad_reason = NULL;
-	unsigned long bad_flags = 0;
+	if (unlikely(atomic_read(&page->_mapcount) != -1))
+		return false;
+
+	if (unlikely((unsigned long)page->mapping |
+			page_ref_count(page) |
+#ifdef CONFIG_MEMCG
+			(unsigned long)page->mem_cgroup |
+#endif
+			(page->flags & check_flags)))
+		return false;
+
+	return true;
+}
+
+static void free_pages_check_bad(struct page *page)
+{
+	const char *bad_reason;
+	unsigned long bad_flags;
+
+	bad_reason = NULL;
+	bad_flags = 0;

 	if (unlikely(atomic_read(&page->_mapcount) != -1))
 		bad_reason = "nonzero mapcount";
 	if (unlikely(page->mapping != NULL))
 		bad_reason = "non-NULL mapping";
 	if (unlikely(page_ref_count(page) != 0))
-		bad_reason = "nonzero _count";
+		bad_reason = "nonzero _refcount";
 	if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_FREE)) {
 		bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 		bad_flags = PAGE_FLAGS_CHECK_AT_FREE;
··· 928 803
 	if (unlikely(page->mem_cgroup))
 		bad_reason = "page still charged to cgroup";
 #endif
-	if (unlikely(bad_reason)) {
-		bad_page(page, bad_reason, bad_flags);
-		return 1;
-	}
-	page_cpupid_reset_last(page);
-	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
-		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
-	return 0;
+	bad_page(page, bad_reason, bad_flags);
 }

-/*
- * Frees a number of pages from the PCP lists
- * Assumes all pages on list are in same zone, and of same order.
- * count is the number of pages to free.
- *
- * If the zone was previously in an "all pages pinned" state then look to
- * see if this freeing clears that state.
- *
- * And clear the zone's pages_scanned counter, to hold off the "all pages are
- * pinned" detection logic.
- */
-static void free_pcppages_bulk(struct zone *zone, int count,
-					struct per_cpu_pages *pcp)
+static inline int free_pages_check(struct page *page)
 {
-	int migratetype = 0;
-	int batch_free = 0;
-	int to_free = count;
-	unsigned long nr_scanned;
+	if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
+		return 0;

-	spin_lock(&zone->lock);
-	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
-	if (nr_scanned)
-		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
-
-	while (to_free) {
-		struct page *page;
-		struct list_head *list;
-
-		/*
-		 * Remove pages from lists in a round-robin fashion. A
-		 * batch_free count is maintained that is incremented when an
-		 * empty list is encountered. This is so more pages are freed
-		 * off fuller lists instead of spinning excessively around empty
-		 * lists
-		 */
-		do {
-			batch_free++;
-			if (++migratetype == MIGRATE_PCPTYPES)
-				migratetype = 0;
-			list = &pcp->lists[migratetype];
-		} while (list_empty(list));
-
-		/* This is the only non-empty list. Free them all. */
-		if (batch_free == MIGRATE_PCPTYPES)
-			batch_free = to_free;
-
-		do {
-			int mt;	/* migratetype of the to-be-freed page */
-
-			page = list_last_entry(list, struct page, lru);
-			/* must delete as __free_one_page list manipulates */
-			list_del(&page->lru);
-
-			mt = get_pcppage_migratetype(page);
-			/* MIGRATE_ISOLATE page should not go to pcplists */
-			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
-			/* Pageblock could have been isolated meanwhile */
-			if (unlikely(has_isolate_pageblock(zone)))
-				mt = get_pageblock_migratetype(page);
-
-			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
-			trace_mm_page_pcpu_drain(page, 0, mt);
-		} while (--to_free && --batch_free && !list_empty(list));
-	}
-	spin_unlock(&zone->lock);
-}
-
-static void free_one_page(struct zone *zone,
-				struct page *page, unsigned long pfn,
-				unsigned int order,
-				int migratetype)
-{
-	unsigned long nr_scanned;
-	spin_lock(&zone->lock);
-	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
-	if (nr_scanned)
-		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
-
-	if (unlikely(has_isolate_pageblock(zone) ||
-		is_migrate_isolate(migratetype))) {
-		migratetype = get_pfnblock_migratetype(page, pfn);
-	}
-	__free_one_page(page, pfn, zone, order, migratetype);
-	spin_unlock(&zone->lock);
+	/* Something has gone sideways, find it */
+	free_pages_check_bad(page);
+	return 1;
 }

 static int free_tail_pages_check(struct page *head_page, struct page *page)
··· 989 947
 	page->mapping = NULL;
 	clear_compound_head(page);
 	return ret;
+}
+
+static __always_inline bool free_pages_prepare(struct page *page,
+					unsigned int order, bool check_free)
+{
+	int bad = 0;
+
+	VM_BUG_ON_PAGE(PageTail(page), page);
+
+	trace_mm_page_free(page, order);
+	kmemcheck_free_shadow(page, order);
+	kasan_free_pages(page, order);
+
+	/*
+	 * Check tail pages before head page information is cleared to
+	 * avoid checking PageCompound for order-0 pages.
+	 */
+	if (unlikely(order)) {
+		bool compound = PageCompound(page);
+		int i;
+
+		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
+
+		for (i = 1; i < (1 << order); i++) {
+			if (compound)
+				bad += free_tail_pages_check(page, page + i);
+			if (unlikely(free_pages_check(page + i))) {
+				bad++;
+				continue;
+			}
+			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+		}
+	}
+	if (PageAnonHead(page))
+		page->mapping = NULL;
+	if (check_free)
+		bad += free_pages_check(page);
+	if (bad)
+		return false;
+
+	page_cpupid_reset_last(page);
+	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	reset_page_owner(page, order);
+
+	if (!PageHighMem(page)) {
+		debug_check_no_locks_freed(page_address(page),
+					   PAGE_SIZE << order);
+		debug_check_no_obj_freed(page_address(page),
+					   PAGE_SIZE << order);
+	}
+	arch_free_page(page, order);
+	kernel_poison_pages(page, 1 << order, 0);
+	kernel_map_pages(page, 1 << order, 0);
+
+	return true;
+}
+
+#ifdef CONFIG_DEBUG_VM
+static inline bool free_pcp_prepare(struct page *page)
+{
+	return free_pages_prepare(page, 0, true);
+}
+
+static inline bool bulkfree_pcp_prepare(struct page *page)
+{
+	return false;
+}
+#else
+static bool free_pcp_prepare(struct page *page)
+{
+	return free_pages_prepare(page, 0, false);
+}
+
+static bool bulkfree_pcp_prepare(struct page *page)
+{
+	return free_pages_check(page);
+}
+#endif /* CONFIG_DEBUG_VM */
+
+/*
+ * Frees a number of pages from the PCP lists
+ * Assumes all pages on list are in same zone, and of same order.
+ * count is the number of pages to free.
+ *
+ * If the zone was previously in an "all pages pinned" state then look to
+ * see if this freeing clears that state.
+ *
+ * And clear the zone's pages_scanned counter, to hold off the "all pages are
+ * pinned" detection logic.
+ */
+static void free_pcppages_bulk(struct zone *zone, int count,
+					struct per_cpu_pages *pcp)
+{
+	int migratetype = 0;
+	int batch_free = 0;
+	unsigned long nr_scanned;
+	bool isolated_pageblocks;
+
+	spin_lock(&zone->lock);
+	isolated_pageblocks = has_isolate_pageblock(zone);
+	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
+	if (nr_scanned)
+		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
+
+	while (count) {
+		struct page *page;
+		struct list_head *list;
+
+		/*
+		 * Remove pages from lists in a round-robin fashion. A
+		 * batch_free count is maintained that is incremented when an
+		 * empty list is encountered. This is so more pages are freed
+		 * off fuller lists instead of spinning excessively around empty
+		 * lists
+		 */
+		do {
+			batch_free++;
+			if (++migratetype == MIGRATE_PCPTYPES)
+				migratetype = 0;
+			list = &pcp->lists[migratetype];
+		} while (list_empty(list));
+
+		/* This is the only non-empty list. Free them all. */
+		if (batch_free == MIGRATE_PCPTYPES)
+			batch_free = count;
+
+		do {
+			int mt;	/* migratetype of the to-be-freed page */
+
+			page = list_last_entry(list, struct page, lru);
+			/* must delete as __free_one_page list manipulates */
+			list_del(&page->lru);
+
+			mt = get_pcppage_migratetype(page);
+			/* MIGRATE_ISOLATE page should not go to pcplists */
+			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
+			/* Pageblock could have been isolated meanwhile */
+			if (unlikely(isolated_pageblocks))
+				mt = get_pageblock_migratetype(page);
+
+			if (bulkfree_pcp_prepare(page))
+				continue;
+
+			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
+			trace_mm_page_pcpu_drain(page, 0, mt);
+		} while (--count && --batch_free && !list_empty(list));
+	}
+	spin_unlock(&zone->lock);
+}
+
+static void free_one_page(struct zone *zone,
+				struct page *page, unsigned long pfn,
+				unsigned int order,
+				int migratetype)
+{
+	unsigned long nr_scanned;
+	spin_lock(&zone->lock);
+	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
+	if (nr_scanned)
+		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
+
+	if (unlikely(has_isolate_pageblock(zone) ||
+		is_migrate_isolate(migratetype))) {
+		migratetype = get_pfnblock_migratetype(page, pfn);
+	}
+	__free_one_page(page, pfn, zone, order, migratetype);
+	spin_unlock(&zone->lock);
 }

 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
··· 1231 1022
 	}
 }

-static bool free_pages_prepare(struct page *page, unsigned int order)
-{
-	bool compound = PageCompound(page);
-	int i, bad = 0;
-
-	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
-
-	trace_mm_page_free(page, order);
-	kmemcheck_free_shadow(page, order);
-	kasan_free_pages(page, order);
-
-	if (PageAnon(page))
-		page->mapping = NULL;
-	bad += free_pages_check(page);
-	for (i = 1; i < (1 << order); i++) {
-		if (compound)
-			bad += free_tail_pages_check(page, page + i);
-		bad += free_pages_check(page + i);
-	}
-	if (bad)
-		return false;
-
-	reset_page_owner(page, order);
-
-	if (!PageHighMem(page)) {
-		debug_check_no_locks_freed(page_address(page),
-					   PAGE_SIZE << order);
-		debug_check_no_obj_freed(page_address(page),
-					   PAGE_SIZE << order);
-	}
-	arch_free_page(page, order);
-	kernel_poison_pages(page, 1 << order, 0);
-	kernel_map_pages(page, 1 << order, 0);
-
-	return true;
-}
-
 static void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
 	int migratetype;
 	unsigned long pfn = page_to_pfn(page);

-	if (!free_pages_prepare(page, order))
+	if (!free_pages_prepare(page, order, true))
 		return;

 	migratetype = get_pfnblock_migratetype(page, pfn);
··· 1247 1076
 	local_irq_restore(flags);
 }

-static void __init __free_pages_boot_core(struct page *page,
-					unsigned long pfn, unsigned int order)
+static void __init __free_pages_boot_core(struct page *page, unsigned int order)
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
··· 1324 1154
 {
 	if (early_page_uninitialised(pfn))
 		return;
-	return __free_pages_boot_core(page, pfn, order);
+	return __free_pages_boot_core(page, order);
 }

 /*
··· 1409 1239
 	if (nr_pages == MAX_ORDER_NR_PAGES &&
 	    (pfn & (MAX_ORDER_NR_PAGES-1)) == 0) {
 		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-		__free_pages_boot_core(page, pfn, MAX_ORDER-1);
+		__free_pages_boot_core(page, MAX_ORDER-1);
 		return;
 	}

-	for (i = 0; i < nr_pages; i++, page++, pfn++)
-		__free_pages_boot_core(page, pfn, 0);
+	for (i = 0; i < nr_pages; i++, page++)
+		__free_pages_boot_core(page, 0);
 }

 /* Completion tracking for deferred_init_memmap() threads */
··· 1647 1477
 	}
 }

-/*
- * This page is about to be returned from the page allocator
- */
-static inline int check_new_page(struct page *page)
+static void check_new_page_bad(struct page *page)
 {
 	const char *bad_reason = NULL;
 	unsigned long bad_flags = 0;
··· 1670 1503
 	if (unlikely(page->mem_cgroup))
 		bad_reason = "page still charged to cgroup";
 #endif
-	if (unlikely(bad_reason)) {
-		bad_page(page, bad_reason, bad_flags);
-		return 1;
-	}
-	return 0;
+	bad_page(page, bad_reason, bad_flags);
+}
+
+/*
+ * This page is about to be returned from the page allocator
+ */
+static inline int check_new_page(struct page *page)
+{
+	if (likely(page_expected_state(page,
+				PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON)))
+		return 0;
+
+	check_new_page_bad(page);
+	return 1;
 }

 static inline bool free_pages_prezeroed(bool poisoned)
··· 1692 1516
 	page_poisoning_enabled() && poisoned;
 }

-static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
-								int alloc_flags)
+#ifdef CONFIG_DEBUG_VM
+static bool check_pcp_refill(struct page *page)
+{
+	return false;
+}
+
+static bool check_new_pcp(struct page *page)
+{
+	return check_new_page(page);
+}
+#else
+static bool check_pcp_refill(struct page *page)
+{
+	return check_new_page(page);
+}
+static bool check_new_pcp(struct page *page)
+{
+	return false;
+}
+#endif /* CONFIG_DEBUG_VM */
+
+static bool check_new_pages(struct page *page, unsigned int order)
+{
+	int i;
+	for (i = 0; i < (1 << order); i++) {
+		struct page *p = page + i;
+
+		if (unlikely(check_new_page(p)))
+			return true;
+	}
+
+	return false;
+}
+
+static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+						unsigned int alloc_flags)
 {
 	int i;
 	bool poisoned = true;

 	for (i = 0; i < (1 << order); i++) {
 		struct page *p = page + i;
-		if (unlikely(check_new_page(p)))
-			return 1;
 		if (poisoned)
 			poisoned &= page_is_poisoned(p);
 	}
··· 1765 1557
 		set_page_pfmemalloc(page);
 	else
 		clear_page_pfmemalloc(page);
-
-	return 0;
 }

 /*
··· 2186 1980
 		if (unlikely(page == NULL))
 			break;

+		if (unlikely(check_pcp_refill(page)))
+			continue;
+
 		/*
 		 * Split buddy pages returned by expand() are received here
 		 * in physical page order. The page is added to the callers and
··· 2366 2157
 	for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
 		if (pfn_valid(pfn)) {
 			page = pfn_to_page(pfn);
+
+			if (page_zone(page) != zone)
+				continue;
+
 			if (!swsusp_page_is_forbidden(page))
 				swsusp_unset_page_free(page);
 		}
··· 2400 2187
 	unsigned long pfn = page_to_pfn(page);
 	int migratetype;

-	if (!free_pages_prepare(page, 0))
+	if (!free_pcp_prepare(page))
 		return;

 	migratetype = get_pfnblock_migratetype(page, pfn);
··· 2556 2343
 }

 /*
+ * Update NUMA hit/miss statistics
+ *
+ * Must be called with interrupts disabled.
+ *
+ * When __GFP_OTHER_NODE is set assume the node of the preferred
+ * zone is the local node. This is useful for daemons who allocate
+ * memory on behalf of other processes.
+ */
+static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
+								gfp_t flags)
+{
+#ifdef CONFIG_NUMA
+	int local_nid = numa_node_id();
+	enum zone_stat_item local_stat = NUMA_LOCAL;
+
+	if (unlikely(flags & __GFP_OTHER_NODE)) {
+		local_stat = NUMA_OTHER;
+		local_nid = preferred_zone->node;
+	}
+
+	if (z->node == local_nid) {
+		__inc_zone_state(z, NUMA_HIT);
+		__inc_zone_state(z, local_stat);
+	} else {
+		__inc_zone_state(z, NUMA_MISS);
+		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
+	}
+#endif
+}
+
+/*
  * Allocate a page from the given zone. Use pcplists for order-0 allocations.
  */
 static inline
 struct page *buffered_rmqueue(struct zone *preferred_zone,
 			struct zone *zone, unsigned int order,
-			gfp_t gfp_flags, int alloc_flags, int migratetype)
+			gfp_t gfp_flags, unsigned int alloc_flags,
+			int migratetype)
 {
 	unsigned long flags;
 	struct page *page;
··· 2604 2359
 		struct list_head *list;

 		local_irq_save(flags);
-		pcp = &this_cpu_ptr(zone->pageset)->pcp;
-		list = &pcp->lists[migratetype];
-		if (list_empty(list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
-					migratetype, cold);
-			if (unlikely(list_empty(list)))
-				goto failed;
-		}
+		do {
+			pcp = &this_cpu_ptr(zone->pageset)->pcp;
+			list = &pcp->lists[migratetype];
+			if (list_empty(list)) {
+				pcp->count += rmqueue_bulk(zone, 0,
+						pcp->batch, list,
+						migratetype, cold);
+				if (unlikely(list_empty(list)))
+					goto failed;
+			}

-		if (cold)
-			page = list_last_entry(list, struct page, lru);
-		else
-			page = list_first_entry(list, struct page, lru);
+			if (cold)
+				page = list_last_entry(list, struct page, lru);
+			else
+				page = list_first_entry(list, struct page, lru);
+		} while (page && check_new_pcp(page));

+		__dec_zone_state(zone, NR_ALLOC_BATCH);
 		list_del(&page->lru);
 		pcp->count--;
 	} else {
··· 2632 2384
 		WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
 		spin_lock_irqsave(&zone->lock, flags);

-		page = NULL;
-		if (alloc_flags & ALLOC_HARDER) {
-			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
-			if (page)
-				trace_mm_page_alloc_zone_locked(page, order, migratetype);
-		}
-		if (!page)
-			page = __rmqueue(zone, order, migratetype);
+		do {
+			page = NULL;
+			if (alloc_flags & ALLOC_HARDER) {
+				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+				if (page)
+					trace_mm_page_alloc_zone_locked(page, order, migratetype);
+			}
+			if (!page)
+				page = __rmqueue(zone, order, migratetype);
+		} while (page && check_new_pages(page, order));
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
+		__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
 		__mod_zone_freepage_state(zone, -(1 << order),
 					  get_pcppage_migratetype(page));
 	}

-	__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
 	if (atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]) <= 0 &&
 	    !test_bit(ZONE_FAIR_DEPLETED, &zone->flags))
 		set_bit(ZONE_FAIR_DEPLETED, &zone->flags);
··· 2751 2501
  * to check in the allocation paths if no pages are free.
  */
 static bool __zone_watermark_ok(struct zone *z, unsigned int order,
-			unsigned long mark, int classzone_idx, int alloc_flags,
+			unsigned long mark, int classzone_idx,
+			unsigned int alloc_flags,
 			long free_pages)
 {
 	long min = mark;
 	int o;
-	const int alloc_harder = (alloc_flags & ALLOC_HARDER);
+	const bool alloc_harder = (alloc_flags & ALLOC_HARDER);

 	/* free_pages may go negative - that's OK */
 	free_pages -= (1 << order) - 1;
··· 2820 2569
 }

 bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
-		      int classzone_idx, int alloc_flags)
+		      int classzone_idx, unsigned int alloc_flags)
 {
 	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
 					zone_page_state(z, NR_FREE_PAGES));
+}
+
+static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
+		unsigned long mark, int classzone_idx, unsigned int alloc_flags)
+{
+	long free_pages = zone_page_state(z, NR_FREE_PAGES);
+	long cma_pages = 0;
+
+#ifdef CONFIG_CMA
+	/* If allocation can't use CMA areas don't use free CMA pages */
+	if (!(alloc_flags & ALLOC_CMA))
+		cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES);
+#endif
+
+	/*
+	 * Fast check for order-0 only. If this fails then the reserves
+	 * need to be calculated. There is a corner case where the check
+	 * passes but only the high-order atomic reserve are free. If
+	 * the caller is !atomic then it'll uselessly search the free
+	 * list. That corner case is then slower but it is harmless.
+	 */
+	if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx])
+		return true;
+
+	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
+					free_pages);
 }

 bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
··· 2907 2630
 get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 						const struct alloc_context *ac)
 {
-	struct zonelist *zonelist = ac->zonelist;
-	struct zoneref *z;
-	struct page *page = NULL;
+	struct zoneref *z = ac->preferred_zoneref;
 	struct zone *zone;
-	int nr_fair_skipped = 0;
-	bool zonelist_rescan;
+	bool fair_skipped = false;
+	bool apply_fair = (alloc_flags & ALLOC_FAIR);

 zonelist_scan:
-	zonelist_rescan = false;
-
 	/*
 	 * Scan zonelist, looking for a zone with enough free.
 	 * See also __cpuset_node_allowed() comment in kernel/cpuset.c.
 	 */
-	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
+	for_next_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 								ac->nodemask) {
+		struct page *page;
 		unsigned long mark;

 		if (cpusets_enabled() &&
 			(alloc_flags & ALLOC_CPUSET) &&
-			!cpuset_zone_allowed(zone, gfp_mask))
+			!__cpuset_zone_allowed(zone, gfp_mask))
 				continue;
 		/*
 		 * Distribute pages in proportion to the individual
··· 2932 2658
 		 * page was allocated in should have no effect on the
 		 * time the page has in memory before being reclaimed.
2934 2660 */ 2935 - if (alloc_flags & ALLOC_FAIR) { 2936 - if (!zone_local(ac->preferred_zone, zone)) 2937 - break; 2661 + if (apply_fair) { 2938 2662 if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) { 2939 - nr_fair_skipped++; 2663 + fair_skipped = true; 2940 2664 continue; 2665 + } 2666 + if (!zone_local(ac->preferred_zoneref->zone, zone)) { 2667 + if (fair_skipped) 2668 + goto reset_fair; 2669 + apply_fair = false; 2941 2670 } 2942 2671 } 2943 2672 /* ··· 2973 2696 continue; 2974 2697 2975 2698 mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; 2976 - if (!zone_watermark_ok(zone, order, mark, 2977 - ac->classzone_idx, alloc_flags)) { 2699 + if (!zone_watermark_fast(zone, order, mark, 2700 + ac_classzone_idx(ac), alloc_flags)) { 2978 2701 int ret; 2979 2702 2980 2703 /* Checked here to keep the fast path fast */ ··· 2983 2706 goto try_this_zone; 2984 2707 2985 2708 if (zone_reclaim_mode == 0 || 2986 - !zone_allows_reclaim(ac->preferred_zone, zone)) 2709 + !zone_allows_reclaim(ac->preferred_zoneref->zone, zone)) 2987 2710 continue; 2988 2711 2989 2712 ret = zone_reclaim(zone, gfp_mask, order); ··· 2997 2720 default: 2998 2721 /* did we reclaim enough */ 2999 2722 if (zone_watermark_ok(zone, order, mark, 3000 - ac->classzone_idx, alloc_flags)) 2723 + ac_classzone_idx(ac), alloc_flags)) 3001 2724 goto try_this_zone; 3002 2725 3003 2726 continue; ··· 3005 2728 } 3006 2729 3007 2730 try_this_zone: 3008 - page = buffered_rmqueue(ac->preferred_zone, zone, order, 2731 + page = buffered_rmqueue(ac->preferred_zoneref->zone, zone, order, 3009 2732 gfp_mask, alloc_flags, ac->migratetype); 3010 2733 if (page) { 3011 - if (prep_new_page(page, order, gfp_mask, alloc_flags)) 3012 - goto try_this_zone; 2734 + prep_new_page(page, order, gfp_mask, alloc_flags); 3013 2735 3014 2736 /* 3015 2737 * If this is a high-order atomic allocation then check ··· 3029 2753 * include remote zones now, before entering the slowpath and waking 3030 2754 * kswapd: prefer spilling to a remote 
zone over swapping locally. 3031 2755 */ 3032 - if (alloc_flags & ALLOC_FAIR) { 3033 - alloc_flags &= ~ALLOC_FAIR; 3034 - if (nr_fair_skipped) { 3035 - zonelist_rescan = true; 3036 - reset_alloc_batches(ac->preferred_zone); 3037 - } 3038 - if (nr_online_nodes > 1) 3039 - zonelist_rescan = true; 3040 - } 3041 - 3042 - if (zonelist_rescan) 2756 + if (fair_skipped) { 2757 + reset_fair: 2758 + apply_fair = false; 2759 + fair_skipped = false; 2760 + reset_alloc_batches(ac->preferred_zoneref->zone); 3043 2761 goto zonelist_scan; 2762 + } 3044 2763 3045 2764 return NULL; 3046 2765 } ··· 3143 2872 /* The OOM killer does not needlessly kill tasks for lowmem */ 3144 2873 if (ac->high_zoneidx < ZONE_NORMAL) 3145 2874 goto out; 3146 - /* The OOM killer does not compensate for IO-less reclaim */ 3147 - if (!(gfp_mask & __GFP_FS)) { 3148 - /* 3149 - * XXX: Page reclaim didn't yield anything, 3150 - * and the OOM killer can't be invoked, but 3151 - * keep looping as per tradition. 3152 - * 3153 - * But do not keep looping if oom_killer_disable() 3154 - * was already called, for the system is trying to 3155 - * enter a quiescent state during suspend. 3156 - */ 3157 - *did_some_progress = !oom_killer_disabled; 3158 - goto out; 3159 - } 3160 2875 if (pm_suspended_storage()) 3161 2876 goto out; 2877 + /* 2878 + * XXX: GFP_NOFS allocations should rather fail than rely on 2879 + * other request to make a forward progress. 2880 + * We are in an unfortunate situation where out_of_memory cannot 2881 + * do much for this context but let's try it to at least get 2882 + * access to memory reserved if the current task is killed (see 2883 + * out_of_memory). Once filesystems are ready to handle allocation 2884 + * failures more gracefully we should just bail out here. 
2885 + */ 2886 + 3162 2887 /* The OOM killer may not free memory on a specific node */ 3163 2888 if (gfp_mask & __GFP_THISNODE) 3164 2889 goto out; ··· 3184 2917 /* Try memory compaction for high-order allocations before reclaim */ 3185 2918 static struct page * 3186 2919 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, 3187 - int alloc_flags, const struct alloc_context *ac, 2920 + unsigned int alloc_flags, const struct alloc_context *ac, 3188 2921 enum migrate_mode mode, int *contended_compaction, 3189 2922 bool *deferred_compaction) 3190 2923 { ··· 3240 2973 #else 3241 2974 static inline struct page * 3242 2975 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, 3243 - int alloc_flags, const struct alloc_context *ac, 2976 + unsigned int alloc_flags, const struct alloc_context *ac, 3244 2977 enum migrate_mode mode, int *contended_compaction, 3245 2978 bool *deferred_compaction) 3246 2979 { ··· 3280 3013 /* The really slow allocator path where we enter direct reclaim */ 3281 3014 static inline struct page * 3282 3015 __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, 3283 - int alloc_flags, const struct alloc_context *ac, 3016 + unsigned int alloc_flags, const struct alloc_context *ac, 3284 3017 unsigned long *did_some_progress) 3285 3018 { 3286 3019 struct page *page = NULL; ··· 3316 3049 3317 3050 for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, 3318 3051 ac->high_zoneidx, ac->nodemask) 3319 - wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone)); 3052 + wakeup_kswapd(zone, order, ac_classzone_idx(ac)); 3320 3053 } 3321 3054 3322 - static inline int 3055 + static inline unsigned int 3323 3056 gfp_to_alloc_flags(gfp_t gfp_mask) 3324 3057 { 3325 - int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; 3058 + unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; 3326 3059 3327 3060 /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. 
*/ 3328 3061 BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH); ··· 3383 3116 { 3384 3117 bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; 3385 3118 struct page *page = NULL; 3386 - int alloc_flags; 3119 + unsigned int alloc_flags; 3387 3120 unsigned long pages_reclaimed = 0; 3388 3121 unsigned long did_some_progress; 3389 3122 enum migrate_mode migration_mode = MIGRATE_ASYNC; ··· 3419 3152 * to how we want to proceed. 3420 3153 */ 3421 3154 alloc_flags = gfp_to_alloc_flags(gfp_mask); 3422 - 3423 - /* 3424 - * Find the true preferred zone if the allocation is unconstrained by 3425 - * cpusets. 3426 - */ 3427 - if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) { 3428 - struct zoneref *preferred_zoneref; 3429 - preferred_zoneref = first_zones_zonelist(ac->zonelist, 3430 - ac->high_zoneidx, NULL, &ac->preferred_zone); 3431 - ac->classzone_idx = zonelist_zone_idx(preferred_zoneref); 3432 - } 3433 3155 3434 3156 /* This is the last chance, in general, before the goto nopage. 
*/ 3435 3157 page = get_page_from_freelist(gfp_mask, order, ··· 3534 3278 if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) || 3535 3279 ((gfp_mask & __GFP_REPEAT) && pages_reclaimed < (1 << order))) { 3536 3280 /* Wait for some write requests to complete then retry */ 3537 - wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50); 3281 + wait_iff_congested(ac->preferred_zoneref->zone, BLK_RW_ASYNC, HZ/50); 3538 3282 goto retry; 3539 3283 } 3540 3284 ··· 3572 3316 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, 3573 3317 struct zonelist *zonelist, nodemask_t *nodemask) 3574 3318 { 3575 - struct zoneref *preferred_zoneref; 3576 - struct page *page = NULL; 3319 + struct page *page; 3577 3320 unsigned int cpuset_mems_cookie; 3578 - int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; 3579 - gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ 3321 + unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR; 3322 + gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */ 3580 3323 struct alloc_context ac = { 3581 3324 .high_zoneidx = gfp_zone(gfp_mask), 3325 + .zonelist = zonelist, 3582 3326 .nodemask = nodemask, 3583 3327 .migratetype = gfpflags_to_migratetype(gfp_mask), 3584 3328 }; 3329 + 3330 + if (cpusets_enabled()) { 3331 + alloc_mask |= __GFP_HARDWALL; 3332 + alloc_flags |= ALLOC_CPUSET; 3333 + if (!ac.nodemask) 3334 + ac.nodemask = &cpuset_current_mems_allowed; 3335 + } 3585 3336 3586 3337 gfp_mask &= gfp_allowed_mask; 3587 3338 ··· 3613 3350 retry_cpuset: 3614 3351 cpuset_mems_cookie = read_mems_allowed_begin(); 3615 3352 3616 - /* We set it here, as __alloc_pages_slowpath might have changed it */ 3617 - ac.zonelist = zonelist; 3618 - 3619 3353 /* Dirty zone balancing only done in the fast path */ 3620 3354 ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE); 3621 3355 3622 3356 /* The preferred zone is used for statistics later */ 3623 - preferred_zoneref = first_zones_zonelist(ac.zonelist, 
ac.high_zoneidx, 3624 - ac.nodemask ? : &cpuset_current_mems_allowed, 3625 - &ac.preferred_zone); 3626 - if (!ac.preferred_zone) 3627 - goto out; 3628 - ac.classzone_idx = zonelist_zone_idx(preferred_zoneref); 3629 - 3630 - /* First allocation attempt */ 3631 - alloc_mask = gfp_mask|__GFP_HARDWALL; 3632 - page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac); 3633 - if (unlikely(!page)) { 3634 - /* 3635 - * Runtime PM, block IO and its error handling path 3636 - * can deadlock because I/O on the device might not 3637 - * complete. 3638 - */ 3639 - alloc_mask = memalloc_noio_flags(gfp_mask); 3640 - ac.spread_dirty_pages = false; 3641 - 3642 - page = __alloc_pages_slowpath(alloc_mask, order, &ac); 3357 + ac.preferred_zoneref = first_zones_zonelist(ac.zonelist, 3358 + ac.high_zoneidx, ac.nodemask); 3359 + if (!ac.preferred_zoneref) { 3360 + page = NULL; 3361 + goto no_zone; 3643 3362 } 3644 3363 3645 - if (kmemcheck_enabled && page) 3646 - kmemcheck_pagealloc_alloc(page, order, gfp_mask); 3364 + /* First allocation attempt */ 3365 + page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac); 3366 + if (likely(page)) 3367 + goto out; 3647 3368 3648 - trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype); 3369 + /* 3370 + * Runtime PM, block IO and its error handling path can deadlock 3371 + * because I/O on the device might not complete. 3372 + */ 3373 + alloc_mask = memalloc_noio_flags(gfp_mask); 3374 + ac.spread_dirty_pages = false; 3649 3375 3650 - out: 3376 + /* 3377 + * Restore the original nodemask if it was potentially replaced with 3378 + * &cpuset_current_mems_allowed to optimize the fast-path attempt. 
3379 + */ 3380 + if (cpusets_enabled()) 3381 + ac.nodemask = nodemask; 3382 + page = __alloc_pages_slowpath(alloc_mask, order, &ac); 3383 + 3384 + no_zone: 3651 3385 /* 3652 3386 * When updating a task's mems_allowed, it is possible to race with 3653 3387 * parallel threads in such a way that an allocation can fail while 3654 3388 * the mask is being updated. If a page allocation is about to fail, 3655 3389 * check if the cpuset changed during allocation and if so, retry. 3656 3390 */ 3657 - if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) 3391 + if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) { 3392 + alloc_mask = gfp_mask; 3658 3393 goto retry_cpuset; 3394 + } 3395 + 3396 + out: 3397 + if (kmemcheck_enabled && page) 3398 + kmemcheck_pagealloc_alloc(page, order, gfp_mask); 3399 + 3400 + trace_mm_page_alloc(page, order, alloc_mask, ac.migratetype); 3659 3401 3660 3402 return page; 3661 3403 } ··· 4058 3790 { 4059 3791 int zone_type; /* needs to be signed */ 4060 3792 unsigned long managed_pages = 0; 3793 + unsigned long managed_highpages = 0; 3794 + unsigned long free_highpages = 0; 4061 3795 pg_data_t *pgdat = NODE_DATA(nid); 4062 3796 4063 3797 for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) ··· 4068 3798 val->sharedram = node_page_state(nid, NR_SHMEM); 4069 3799 val->freeram = node_page_state(nid, NR_FREE_PAGES); 4070 3800 #ifdef CONFIG_HIGHMEM 4071 - val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].managed_pages; 4072 - val->freehigh = zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM], 4073 - NR_FREE_PAGES); 3801 + for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) { 3802 + struct zone *zone = &pgdat->node_zones[zone_type]; 3803 + 3804 + if (is_highmem(zone)) { 3805 + managed_highpages += zone->managed_pages; 3806 + free_highpages += zone_page_state(zone, NR_FREE_PAGES); 3807 + } 3808 + } 3809 + val->totalhigh = managed_highpages; 3810 + val->freehigh = free_highpages; 4074 3811 #else 4075 - 
val->totalhigh = 0; 4076 - val->freehigh = 0; 3812 + val->totalhigh = managed_highpages; 3813 + val->freehigh = free_highpages; 4077 3814 #endif 4078 3815 val->mem_unit = PAGE_SIZE; 4079 3816 } ··· 4667 4390 */ 4668 4391 int local_memory_node(int node) 4669 4392 { 4670 - struct zone *zone; 4393 + struct zoneref *z; 4671 4394 4672 - (void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL), 4395 + z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL), 4673 4396 gfp_zone(GFP_KERNEL), 4674 - NULL, 4675 - &zone); 4676 - return zone->node; 4397 + NULL); 4398 + return z->zone->node; 4677 4399 } 4678 4400 #endif 4679 4401 ··· 7001 6725 return table; 7002 6726 } 7003 6727 7004 - /* Return a pointer to the bitmap storing bits affecting a block of pages */ 7005 - static inline unsigned long *get_pageblock_bitmap(struct zone *zone, 7006 - unsigned long pfn) 7007 - { 7008 - #ifdef CONFIG_SPARSEMEM 7009 - return __pfn_to_section(pfn)->pageblock_flags; 7010 - #else 7011 - return zone->pageblock_flags; 7012 - #endif /* CONFIG_SPARSEMEM */ 7013 - } 7014 - 7015 - static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn) 7016 - { 7017 - #ifdef CONFIG_SPARSEMEM 7018 - pfn &= (PAGES_PER_SECTION-1); 7019 - return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS; 7020 - #else 7021 - pfn = pfn - round_down(zone->zone_start_pfn, pageblock_nr_pages); 7022 - return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS; 7023 - #endif /* CONFIG_SPARSEMEM */ 7024 - } 7025 - 7026 - /** 7027 - * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages 7028 - * @page: The page within the block of interest 7029 - * @pfn: The target page frame number 7030 - * @end_bitidx: The last bit of interest to retrieve 7031 - * @mask: mask of bits that the caller is interested in 7032 - * 7033 - * Return: pageblock_bits flags 7034 - */ 7035 - unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn, 7036 - unsigned long end_bitidx, 7037 - 
unsigned long mask) 7038 - { 7039 - struct zone *zone; 7040 - unsigned long *bitmap; 7041 - unsigned long bitidx, word_bitidx; 7042 - unsigned long word; 7043 - 7044 - zone = page_zone(page); 7045 - bitmap = get_pageblock_bitmap(zone, pfn); 7046 - bitidx = pfn_to_bitidx(zone, pfn); 7047 - word_bitidx = bitidx / BITS_PER_LONG; 7048 - bitidx &= (BITS_PER_LONG-1); 7049 - 7050 - word = bitmap[word_bitidx]; 7051 - bitidx += end_bitidx; 7052 - return (word >> (BITS_PER_LONG - bitidx - 1)) & mask; 7053 - } 7054 - 7055 - /** 7056 - * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages 7057 - * @page: The page within the block of interest 7058 - * @flags: The flags to set 7059 - * @pfn: The target page frame number 7060 - * @end_bitidx: The last bit of interest 7061 - * @mask: mask of bits that the caller is interested in 7062 - */ 7063 - void set_pfnblock_flags_mask(struct page *page, unsigned long flags, 7064 - unsigned long pfn, 7065 - unsigned long end_bitidx, 7066 - unsigned long mask) 7067 - { 7068 - struct zone *zone; 7069 - unsigned long *bitmap; 7070 - unsigned long bitidx, word_bitidx; 7071 - unsigned long old_word, word; 7072 - 7073 - BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4); 7074 - 7075 - zone = page_zone(page); 7076 - bitmap = get_pageblock_bitmap(zone, pfn); 7077 - bitidx = pfn_to_bitidx(zone, pfn); 7078 - word_bitidx = bitidx / BITS_PER_LONG; 7079 - bitidx &= (BITS_PER_LONG-1); 7080 - 7081 - VM_BUG_ON_PAGE(!zone_spans_pfn(zone, pfn), page); 7082 - 7083 - bitidx += end_bitidx; 7084 - mask <<= (BITS_PER_LONG - bitidx - 1); 7085 - flags <<= (BITS_PER_LONG - bitidx - 1); 7086 - 7087 - word = READ_ONCE(bitmap[word_bitidx]); 7088 - for (;;) { 7089 - old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags); 7090 - if (word == old_word) 7091 - break; 7092 - word = old_word; 7093 - } 7094 - } 7095 - 7096 6728 /* 7097 6729 * This function checks whether pageblock includes unmovable pages or not. 
7098 6730 * If @count is not zero, it is okay to include less @count unmovable pages ··· 7048 6864 * We can't use page_count without pin a page 7049 6865 * because another CPU can free compound page. 7050 6866 * This check already skips compound tails of THP 7051 - * because their page->_count is zero at all time. 6867 + * because their page->_refcount is zero at all time. 7052 6868 */ 7053 6869 if (!page_ref_count(page)) { 7054 6870 if (PageBuddy(page)) ··· 7361 7177 7362 7178 #ifdef CONFIG_MEMORY_HOTREMOVE 7363 7179 /* 7364 - * All pages in the range must be isolated before calling this. 7180 + * All pages in the range must be in a single zone and isolated 7181 + * before calling this. 7365 7182 */ 7366 7183 void 7367 7184 __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+4 -6
mm/page_isolation.c
··· 246 246 return pfn; 247 247 } 248 248 249 + /* Caller should ensure that requested range is in a single zone */ 249 250 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, 250 251 bool skip_hwpoisoned_pages) 251 252 { ··· 289 288 * accordance with memory policy of the user process if possible. For 290 289 * now as a simple work-around, we use the next node for destination. 291 290 */ 292 - if (PageHuge(page)) { 293 - int node = next_online_node(page_to_nid(page)); 294 - if (node == MAX_NUMNODES) 295 - node = first_online_node; 291 + if (PageHuge(page)) 296 292 return alloc_huge_page_node(page_hstate(compound_head(page)), 297 - node); 298 - } 293 + next_node_in(page_to_nid(page), 294 + node_online_map)); 299 295 300 296 if (PageHighMem(page)) 301 297 gfp_mask |= __GFP_HIGHMEM;
+4 -1
mm/page_owner.c
··· 143 143 goto err; 144 144 145 145 /* Print information relevant to grouping pages by mobility */ 146 - pageblock_mt = get_pfnblock_migratetype(page, pfn); 146 + pageblock_mt = get_pageblock_migratetype(page); 147 147 page_mt = gfpflags_to_migratetype(page_ext->gfp_mask); 148 148 ret += snprintf(kbuf + ret, count - ret, 149 149 "PFN %lu type %s Block %lu type %s Flags %#lx(%pGp)\n", ··· 300 300 continue; 301 301 302 302 page = pfn_to_page(pfn); 303 + 304 + if (page_zone(page) != zone) 305 + continue; 303 306 304 307 /* 305 308 * We are safe to check buddy flag and order, because
+2 -2
mm/rmap.c
··· 409 409 list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) { 410 410 struct anon_vma *anon_vma = avc->anon_vma; 411 411 412 - BUG_ON(anon_vma->degree); 412 + VM_WARN_ON(anon_vma->degree); 413 413 put_anon_vma(anon_vma); 414 414 415 415 list_del(&avc->same_vma); ··· 1249 1249 int nr = compound ? hpage_nr_pages(page) : 1; 1250 1250 1251 1251 VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); 1252 - SetPageSwapBacked(page); 1252 + __SetPageSwapBacked(page); 1253 1253 if (compound) { 1254 1254 VM_BUG_ON_PAGE(!PageTransHuge(page), page); 1255 1255 /* increment count (starts at -1) */
+60 -70
mm/shmem.c
··· 101 101 enum sgp_type { 102 102 SGP_READ, /* don't exceed i_size, don't allocate page */ 103 103 SGP_CACHE, /* don't exceed i_size, may allocate page */ 104 - SGP_DIRTY, /* like SGP_CACHE, but set new page dirty */ 105 104 SGP_WRITE, /* may exceed i_size, may allocate !Uptodate page */ 106 105 SGP_FALLOC, /* like SGP_WRITE, but make existing page Uptodate */ 107 106 }; ··· 121 122 static int shmem_replace_page(struct page **pagep, gfp_t gfp, 122 123 struct shmem_inode_info *info, pgoff_t index); 123 124 static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, 124 - struct page **pagep, enum sgp_type sgp, gfp_t gfp, int *fault_type); 125 + struct page **pagep, enum sgp_type sgp, 126 + gfp_t gfp, struct mm_struct *fault_mm, int *fault_type); 125 127 126 128 static inline int shmem_getpage(struct inode *inode, pgoff_t index, 127 - struct page **pagep, enum sgp_type sgp, int *fault_type) 129 + struct page **pagep, enum sgp_type sgp) 128 130 { 129 131 return shmem_getpage_gfp(inode, index, pagep, sgp, 130 - mapping_gfp_mask(inode->i_mapping), fault_type); 132 + mapping_gfp_mask(inode->i_mapping), NULL, NULL); 131 133 } 132 134 133 135 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb) ··· 169 169 170 170 /* 171 171 * ... whereas tmpfs objects are accounted incrementally as 172 - * pages are allocated, in order to allow huge sparse files. 172 + * pages are allocated, in order to allow large sparse files. 173 173 * shmem_getpage reports shmem_acct_block failure as -ENOSPC not -ENOMEM, 174 174 * so that a failure on a sparse tmpfs mapping will give SIGBUS not OOM. 
175 175 */ ··· 528 528 529 529 if (partial_start) { 530 530 struct page *page = NULL; 531 - shmem_getpage(inode, start - 1, &page, SGP_READ, NULL); 531 + shmem_getpage(inode, start - 1, &page, SGP_READ); 532 532 if (page) { 533 533 unsigned int top = PAGE_SIZE; 534 534 if (start > end) { ··· 543 543 } 544 544 if (partial_end) { 545 545 struct page *page = NULL; 546 - shmem_getpage(inode, end, &page, SGP_READ, NULL); 546 + shmem_getpage(inode, end, &page, SGP_READ); 547 547 if (page) { 548 548 zero_user_segment(page, 0, partial_end); 549 549 set_page_dirty(page); ··· 947 947 return 0; 948 948 } 949 949 950 - #ifdef CONFIG_NUMA 951 - #ifdef CONFIG_TMPFS 950 + #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS) 952 951 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) 953 952 { 954 953 char buffer[64]; ··· 971 972 } 972 973 return mpol; 973 974 } 974 - #endif /* CONFIG_TMPFS */ 975 + #else /* !CONFIG_NUMA || !CONFIG_TMPFS */ 976 + static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) 977 + { 978 + } 979 + static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo) 980 + { 981 + return NULL; 982 + } 983 + #endif /* CONFIG_NUMA && CONFIG_TMPFS */ 984 + #ifndef CONFIG_NUMA 985 + #define vm_policy vm_private_data 986 + #endif 975 987 976 988 static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, 977 989 struct shmem_inode_info *info, pgoff_t index) ··· 1018 1008 pvma.vm_ops = NULL; 1019 1009 pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index); 1020 1010 1021 - page = alloc_page_vma(gfp, &pvma, 0); 1011 + page = alloc_pages_vma(gfp, 0, &pvma, 0, numa_node_id(), false); 1012 + if (page) { 1013 + __SetPageLocked(page); 1014 + __SetPageSwapBacked(page); 1015 + } 1022 1016 1023 1017 /* Drop reference taken by mpol_shared_policy_lookup() */ 1024 1018 mpol_cond_put(pvma.vm_policy); 1025 1019 1026 1020 return page; 1027 1021 } 1028 - #else /* !CONFIG_NUMA */ 1029 - #ifdef CONFIG_TMPFS 1030 - 
static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) 1031 - { 1032 - } 1033 - #endif /* CONFIG_TMPFS */ 1034 - 1035 - static inline struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, 1036 - struct shmem_inode_info *info, pgoff_t index) 1037 - { 1038 - return swapin_readahead(swap, gfp, NULL, 0); 1039 - } 1040 - 1041 - static inline struct page *shmem_alloc_page(gfp_t gfp, 1042 - struct shmem_inode_info *info, pgoff_t index) 1043 - { 1044 - return alloc_page(gfp); 1045 - } 1046 - #endif /* CONFIG_NUMA */ 1047 - 1048 - #if !defined(CONFIG_NUMA) || !defined(CONFIG_TMPFS) 1049 - static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo) 1050 - { 1051 - return NULL; 1052 - } 1053 - #endif 1054 1022 1055 1023 /* 1056 1024 * When a page is moved from swapcache to shmem filecache (either by the ··· 1072 1084 copy_highpage(newpage, oldpage); 1073 1085 flush_dcache_page(newpage); 1074 1086 1075 - __SetPageLocked(newpage); 1076 1087 SetPageUptodate(newpage); 1077 - SetPageSwapBacked(newpage); 1078 1088 set_page_private(newpage, swap_index); 1079 1089 SetPageSwapCache(newpage); 1080 1090 ··· 1116 1130 * 1117 1131 * If we allocate a new one we do not mark it dirty. That's up to the 1118 1132 * vm. If we swap it in we mark it dirty since we also free the swap 1119 - * entry since a page cannot live in both the swap and page cache 1133 + * entry since a page cannot live in both the swap and page cache. 1134 + * 1135 + * fault_mm and fault_type are only supplied by shmem_fault: 1136 + * otherwise they are NULL. 
1120 1137 */ 1121 1138 static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, 1122 - struct page **pagep, enum sgp_type sgp, gfp_t gfp, int *fault_type) 1139 + struct page **pagep, enum sgp_type sgp, gfp_t gfp, 1140 + struct mm_struct *fault_mm, int *fault_type) 1123 1141 { 1124 1142 struct address_space *mapping = inode->i_mapping; 1125 1143 struct shmem_inode_info *info; 1126 1144 struct shmem_sb_info *sbinfo; 1145 + struct mm_struct *charge_mm; 1127 1146 struct mem_cgroup *memcg; 1128 1147 struct page *page; 1129 1148 swp_entry_t swap; ··· 1146 1155 page = NULL; 1147 1156 } 1148 1157 1149 - if (sgp != SGP_WRITE && sgp != SGP_FALLOC && 1158 + if (sgp <= SGP_CACHE && 1150 1159 ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) { 1151 1160 error = -EINVAL; 1152 1161 goto unlock; ··· 1174 1183 */ 1175 1184 info = SHMEM_I(inode); 1176 1185 sbinfo = SHMEM_SB(inode->i_sb); 1186 + charge_mm = fault_mm ? : current->mm; 1177 1187 1178 1188 if (swap.val) { 1179 1189 /* Look it up and read it in.. */ 1180 1190 page = lookup_swap_cache(swap); 1181 1191 if (!page) { 1182 - /* here we actually do the io */ 1183 - if (fault_type) 1192 + /* Or update major stats only when swapin succeeds?? 
*/ 1193 + if (fault_type) { 1184 1194 *fault_type |= VM_FAULT_MAJOR; 1195 + count_vm_event(PGMAJFAULT); 1196 + mem_cgroup_count_vm_event(fault_mm, PGMAJFAULT); 1197 + } 1198 + /* Here we actually start the io */ 1185 1199 page = shmem_swapin(swap, gfp, info, index); 1186 1200 if (!page) { 1187 1201 error = -ENOMEM; ··· 1213 1217 goto failed; 1214 1218 } 1215 1219 1216 - error = mem_cgroup_try_charge(page, current->mm, gfp, &memcg, 1220 + error = mem_cgroup_try_charge(page, charge_mm, gfp, &memcg, 1217 1221 false); 1218 1222 if (!error) { 1219 1223 error = shmem_add_to_page_cache(page, mapping, index, ··· 1271 1275 error = -ENOMEM; 1272 1276 goto decused; 1273 1277 } 1274 - 1275 - __SetPageSwapBacked(page); 1276 - __SetPageLocked(page); 1277 1278 if (sgp == SGP_WRITE) 1278 1279 __SetPageReferenced(page); 1279 1280 1280 - error = mem_cgroup_try_charge(page, current->mm, gfp, &memcg, 1281 + error = mem_cgroup_try_charge(page, charge_mm, gfp, &memcg, 1281 1282 false); 1282 1283 if (error) 1283 1284 goto decused; ··· 1314 1321 flush_dcache_page(page); 1315 1322 SetPageUptodate(page); 1316 1323 } 1317 - if (sgp == SGP_DIRTY) 1318 - set_page_dirty(page); 1319 1324 } 1320 1325 1321 1326 /* Perhaps the file has been truncated since we checked */ 1322 - if (sgp != SGP_WRITE && sgp != SGP_FALLOC && 1327 + if (sgp <= SGP_CACHE && 1323 1328 ((loff_t)index << PAGE_SHIFT) >= i_size_read(inode)) { 1324 1329 if (alloced) { 1325 1330 ClearPageDirty(page); ··· 1363 1372 static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) 1364 1373 { 1365 1374 struct inode *inode = file_inode(vma->vm_file); 1375 + gfp_t gfp = mapping_gfp_mask(inode->i_mapping); 1366 1376 int error; 1367 1377 int ret = VM_FAULT_LOCKED; 1368 1378 ··· 1425 1433 spin_unlock(&inode->i_lock); 1426 1434 } 1427 1435 1428 - error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, &ret); 1436 + error = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE, 1437 + gfp, vma->vm_mm, &ret); 1429 
1438 if (error) 1430 1439 return ((error == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS); 1431 - 1432 - if (ret & VM_FAULT_MAJOR) { 1433 - count_vm_event(PGMAJFAULT); 1434 - mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT); 1435 - } 1436 1440 return ret; 1437 1441 } 1438 1442 ··· 1575 1587 return -EPERM; 1576 1588 } 1577 1589 1578 - return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL); 1590 + return shmem_getpage(inode, index, pagep, SGP_WRITE); 1579 1591 } 1580 1592 1581 1593 static int ··· 1621 1633 * and even mark them dirty, so it cannot exceed the max_blocks limit. 1622 1634 */ 1623 1635 if (!iter_is_iovec(to)) 1624 - sgp = SGP_DIRTY; 1636 + sgp = SGP_CACHE; 1625 1637 1626 1638 index = *ppos >> PAGE_SHIFT; 1627 1639 offset = *ppos & ~PAGE_MASK; ··· 1641 1653 break; 1642 1654 } 1643 1655 1644 - error = shmem_getpage(inode, index, &page, sgp, NULL); 1656 + error = shmem_getpage(inode, index, &page, sgp); 1645 1657 if (error) { 1646 1658 if (error == -EINVAL) 1647 1659 error = 0; 1648 1660 break; 1649 1661 } 1650 - if (page) 1662 + if (page) { 1663 + if (sgp == SGP_CACHE) 1664 + set_page_dirty(page); 1651 1665 unlock_page(page); 1666 + } 1652 1667 1653 1668 /* 1654 1669 * We must evaluate after, since reads (unlike writes) ··· 1757 1766 error = 0; 1758 1767 1759 1768 while (spd.nr_pages < nr_pages) { 1760 - error = shmem_getpage(inode, index, &page, SGP_CACHE, NULL); 1769 + error = shmem_getpage(inode, index, &page, SGP_CACHE); 1761 1770 if (error) 1762 1771 break; 1763 1772 unlock_page(page); ··· 1779 1788 page = spd.pages[page_nr]; 1780 1789 1781 1790 if (!PageUptodate(page) || page->mapping != mapping) { 1782 - error = shmem_getpage(inode, index, &page, 1783 - SGP_CACHE, NULL); 1791 + error = shmem_getpage(inode, index, &page, SGP_CACHE); 1784 1792 if (error) 1785 1793 break; 1786 1794 unlock_page(page); ··· 2222 2232 else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced) 2223 2233 error = -ENOMEM; 2224 2234 else 2225 - error = 
shmem_getpage(inode, index, &page, SGP_FALLOC, 2226 - NULL); 2235 + error = shmem_getpage(inode, index, &page, SGP_FALLOC); 2227 2236 if (error) { 2228 2237 /* Remove the !PageUptodate pages we added */ 2229 2238 shmem_undo_range(inode, ··· 2540 2551 inode->i_op = &shmem_short_symlink_operations; 2541 2552 } else { 2542 2553 inode_nohighmem(inode); 2543 - error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL); 2554 + error = shmem_getpage(inode, 0, &page, SGP_WRITE); 2544 2555 if (error) { 2545 2556 iput(inode); 2546 2557 return error; ··· 2581 2592 return ERR_PTR(-ECHILD); 2582 2593 } 2583 2594 } else { 2584 - error = shmem_getpage(inode, 0, &page, SGP_READ, NULL); 2595 + error = shmem_getpage(inode, 0, &page, SGP_READ); 2585 2596 if (error) 2586 2597 return ERR_PTR(error); 2587 2598 unlock_page(page); ··· 3485 3496 int error; 3486 3497 3487 3498 BUG_ON(mapping->a_ops != &shmem_aops); 3488 - error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE, gfp, NULL); 3499 + error = shmem_getpage_gfp(inode, index, &page, SGP_CACHE, 3500 + gfp, NULL, NULL); 3489 3501 if (error) 3490 3502 page = ERR_PTR(error); 3491 3503 else
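One subtlety in the shmem hunks above: the truncation guard shrinks from two inequality tests (`sgp != SGP_WRITE && sgp != SGP_FALLOC`) to a single ordered comparison (`sgp <= SGP_CACHE`). That only works because, once SGP_DIRTY is gone, the modes that may extend i_size sort strictly after the ones that may not. A minimal userspace sketch of the equivalence (the enum mirrors mm/shmem.c after this series; `needs_isize_check_old`/`needs_isize_check_new` are illustrative names, not kernel functions):

```c
#include <assert.h>
#include <stdbool.h>

/* Mode ordering as in mm/shmem.c once SGP_DIRTY is removed: the modes
 * that may extend i_size (write, falloc) sort after those that may not. */
enum sgp_type {
	SGP_READ,	/* don't exceed i_size, don't allocate page */
	SGP_CACHE,	/* don't exceed i_size, may allocate page */
	SGP_WRITE,	/* may exceed i_size, may allocate !Uptodate page */
	SGP_FALLOC,	/* like SGP_WRITE, but make existing page Uptodate */
};

/* Old form of the guard: enumerate the exempt modes explicitly */
static bool needs_isize_check_old(enum sgp_type sgp)
{
	return sgp != SGP_WRITE && sgp != SGP_FALLOC;
}

/* New form from the hunk above: one ordered comparison */
static bool needs_isize_check_new(enum sgp_type sgp)
{
	return sgp <= SGP_CACHE;
}
```

The single comparison keeps the fast path short, but it silently depends on enum order: any future SGP_* mode has to slot in on the correct side of SGP_CACHE.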
+471 -302
mm/slab.c
··· 213 213 static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp); 214 214 static void cache_reap(struct work_struct *unused); 215 215 216 + static inline void fixup_objfreelist_debug(struct kmem_cache *cachep, 217 + void **list); 218 + static inline void fixup_slab_list(struct kmem_cache *cachep, 219 + struct kmem_cache_node *n, struct page *page, 220 + void **list); 216 221 static int slab_early_init = 1; 217 222 218 223 #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node)) ··· 426 421 .name = "kmem_cache", 427 422 }; 428 423 429 - #define BAD_ALIEN_MAGIC 0x01020304ul 430 - 431 424 static DEFINE_PER_CPU(struct delayed_work, slab_reap_work); 432 425 433 426 static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep) ··· 522 519 523 520 static void init_reap_node(int cpu) 524 521 { 525 - int node; 526 - 527 - node = next_node(cpu_to_mem(cpu), node_online_map); 528 - if (node == MAX_NUMNODES) 529 - node = first_node(node_online_map); 530 - 531 - per_cpu(slab_reap_node, cpu) = node; 522 + per_cpu(slab_reap_node, cpu) = next_node_in(cpu_to_mem(cpu), 523 + node_online_map); 532 524 } 533 525 534 526 static void next_reap_node(void) 535 527 { 536 528 int node = __this_cpu_read(slab_reap_node); 537 529 538 - node = next_node(node, node_online_map); 539 - if (unlikely(node >= MAX_NUMNODES)) 540 - node = first_node(node_online_map); 530 + node = next_node_in(node, node_online_map); 541 531 __this_cpu_write(slab_reap_node, node); 542 532 } 543 533 ··· 640 644 static inline struct alien_cache **alloc_alien_cache(int node, 641 645 int limit, gfp_t gfp) 642 646 { 643 - return (struct alien_cache **)BAD_ALIEN_MAGIC; 647 + return NULL; 644 648 } 645 649 646 650 static inline void free_alien_cache(struct alien_cache **ac_ptr) ··· 846 850 } 847 851 #endif 848 852 853 + static int init_cache_node(struct kmem_cache *cachep, int node, gfp_t gfp) 854 + { 855 + struct kmem_cache_node *n; 856 + 857 + /* 858 + * Set up the kmem_cache_node for cpu 
before we can 859 + * begin anything. Make sure some other cpu on this 860 + * node has not already allocated this 861 + */ 862 + n = get_node(cachep, node); 863 + if (n) { 864 + spin_lock_irq(&n->list_lock); 865 + n->free_limit = (1 + nr_cpus_node(node)) * cachep->batchcount + 866 + cachep->num; 867 + spin_unlock_irq(&n->list_lock); 868 + 869 + return 0; 870 + } 871 + 872 + n = kmalloc_node(sizeof(struct kmem_cache_node), gfp, node); 873 + if (!n) 874 + return -ENOMEM; 875 + 876 + kmem_cache_node_init(n); 877 + n->next_reap = jiffies + REAPTIMEOUT_NODE + 878 + ((unsigned long)cachep) % REAPTIMEOUT_NODE; 879 + 880 + n->free_limit = 881 + (1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num; 882 + 883 + /* 884 + * The kmem_cache_nodes don't come and go as CPUs 885 + * come and go. slab_mutex is sufficient 886 + * protection here. 887 + */ 888 + cachep->node[node] = n; 889 + 890 + return 0; 891 + } 892 + 849 893 /* 850 894 * Allocates and initializes node for a node on each slab cache, used for 851 895 * either memory or cpu hotplug. If memory is being hot-added, the kmem_cache_node ··· 897 861 */ 898 862 static int init_cache_node_node(int node) 899 863 { 864 + int ret; 900 865 struct kmem_cache *cachep; 901 - struct kmem_cache_node *n; 902 - const size_t memsize = sizeof(struct kmem_cache_node); 903 866 904 867 list_for_each_entry(cachep, &slab_caches, list) { 905 - /* 906 - * Set up the kmem_cache_node for cpu before we can 907 - * begin anything. Make sure some other cpu on this 908 - * node has not already allocated this 909 - */ 910 - n = get_node(cachep, node); 911 - if (!n) { 912 - n = kmalloc_node(memsize, GFP_KERNEL, node); 913 - if (!n) 914 - return -ENOMEM; 915 - kmem_cache_node_init(n); 916 - n->next_reap = jiffies + REAPTIMEOUT_NODE + 917 - ((unsigned long)cachep) % REAPTIMEOUT_NODE; 918 - 919 - /* 920 - * The kmem_cache_nodes don't come and go as CPUs 921 - * come and go. slab_mutex is sufficient 922 - * protection here. 
923 - */ 924 - cachep->node[node] = n; 925 - } 926 - 927 - spin_lock_irq(&n->list_lock); 928 - n->free_limit = 929 - (1 + nr_cpus_node(node)) * 930 - cachep->batchcount + cachep->num; 931 - spin_unlock_irq(&n->list_lock); 868 + ret = init_cache_node(cachep, node, GFP_KERNEL); 869 + if (ret) 870 + return ret; 932 871 } 872 + 933 873 return 0; 934 874 } 935 875 936 - static inline int slabs_tofree(struct kmem_cache *cachep, 937 - struct kmem_cache_node *n) 876 + static int setup_kmem_cache_node(struct kmem_cache *cachep, 877 + int node, gfp_t gfp, bool force_change) 938 878 { 939 - return (n->free_objects + cachep->num - 1) / cachep->num; 879 + int ret = -ENOMEM; 880 + struct kmem_cache_node *n; 881 + struct array_cache *old_shared = NULL; 882 + struct array_cache *new_shared = NULL; 883 + struct alien_cache **new_alien = NULL; 884 + LIST_HEAD(list); 885 + 886 + if (use_alien_caches) { 887 + new_alien = alloc_alien_cache(node, cachep->limit, gfp); 888 + if (!new_alien) 889 + goto fail; 890 + } 891 + 892 + if (cachep->shared) { 893 + new_shared = alloc_arraycache(node, 894 + cachep->shared * cachep->batchcount, 0xbaadf00d, gfp); 895 + if (!new_shared) 896 + goto fail; 897 + } 898 + 899 + ret = init_cache_node(cachep, node, gfp); 900 + if (ret) 901 + goto fail; 902 + 903 + n = get_node(cachep, node); 904 + spin_lock_irq(&n->list_lock); 905 + if (n->shared && force_change) { 906 + free_block(cachep, n->shared->entry, 907 + n->shared->avail, node, &list); 908 + n->shared->avail = 0; 909 + } 910 + 911 + if (!n->shared || force_change) { 912 + old_shared = n->shared; 913 + n->shared = new_shared; 914 + new_shared = NULL; 915 + } 916 + 917 + if (!n->alien) { 918 + n->alien = new_alien; 919 + new_alien = NULL; 920 + } 921 + 922 + spin_unlock_irq(&n->list_lock); 923 + slabs_destroy(cachep, &list); 924 + 925 + /* 926 + * To protect lockless access to n->shared during irq disabled context. 
927 + * If n->shared isn't NULL in irq disabled context, accessing to it is 928 + * guaranteed to be valid until irq is re-enabled, because it will be 929 + * freed after synchronize_sched(). 930 + */ 931 + if (force_change) 932 + synchronize_sched(); 933 + 934 + fail: 935 + kfree(old_shared); 936 + kfree(new_shared); 937 + free_alien_cache(new_alien); 938 + 939 + return ret; 940 940 } 941 941 942 942 static void cpuup_canceled(long cpu) ··· 1039 967 n = get_node(cachep, node); 1040 968 if (!n) 1041 969 continue; 1042 - drain_freelist(cachep, n, slabs_tofree(cachep, n)); 970 + drain_freelist(cachep, n, INT_MAX); 1043 971 } 1044 972 } 1045 973 1046 974 static int cpuup_prepare(long cpu) 1047 975 { 1048 976 struct kmem_cache *cachep; 1049 - struct kmem_cache_node *n = NULL; 1050 977 int node = cpu_to_mem(cpu); 1051 978 int err; 1052 979 ··· 1064 993 * array caches 1065 994 */ 1066 995 list_for_each_entry(cachep, &slab_caches, list) { 1067 - struct array_cache *shared = NULL; 1068 - struct alien_cache **alien = NULL; 1069 - 1070 - if (cachep->shared) { 1071 - shared = alloc_arraycache(node, 1072 - cachep->shared * cachep->batchcount, 1073 - 0xbaadf00d, GFP_KERNEL); 1074 - if (!shared) 1075 - goto bad; 1076 - } 1077 - if (use_alien_caches) { 1078 - alien = alloc_alien_cache(node, cachep->limit, GFP_KERNEL); 1079 - if (!alien) { 1080 - kfree(shared); 1081 - goto bad; 1082 - } 1083 - } 1084 - n = get_node(cachep, node); 1085 - BUG_ON(!n); 1086 - 1087 - spin_lock_irq(&n->list_lock); 1088 - if (!n->shared) { 1089 - /* 1090 - * We are serialised from CPU_DEAD or 1091 - * CPU_UP_CANCELLED by the cpucontrol lock 1092 - */ 1093 - n->shared = shared; 1094 - shared = NULL; 1095 - } 1096 - #ifdef CONFIG_NUMA 1097 - if (!n->alien) { 1098 - n->alien = alien; 1099 - alien = NULL; 1100 - } 1101 - #endif 1102 - spin_unlock_irq(&n->list_lock); 1103 - kfree(shared); 1104 - free_alien_cache(alien); 996 + err = setup_kmem_cache_node(cachep, node, GFP_KERNEL, false); 997 + if (err) 998 + 
goto bad; 1105 999 } 1106 1000 1107 1001 return 0; ··· 1155 1119 if (!n) 1156 1120 continue; 1157 1121 1158 - drain_freelist(cachep, n, slabs_tofree(cachep, n)); 1122 + drain_freelist(cachep, n, INT_MAX); 1159 1123 1160 1124 if (!list_empty(&n->slabs_full) || 1161 1125 !list_empty(&n->slabs_partial)) { ··· 1236 1200 } 1237 1201 } 1238 1202 1203 + #ifdef CONFIG_SLAB_FREELIST_RANDOM 1204 + static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list, 1205 + size_t count) 1206 + { 1207 + size_t i; 1208 + unsigned int rand; 1209 + 1210 + for (i = 0; i < count; i++) 1211 + list[i] = i; 1212 + 1213 + /* Fisher-Yates shuffle */ 1214 + for (i = count - 1; i > 0; i--) { 1215 + rand = prandom_u32_state(state); 1216 + rand %= (i + 1); 1217 + swap(list[i], list[rand]); 1218 + } 1219 + } 1220 + 1221 + /* Create a random sequence per cache */ 1222 + static int cache_random_seq_create(struct kmem_cache *cachep, gfp_t gfp) 1223 + { 1224 + unsigned int seed, count = cachep->num; 1225 + struct rnd_state state; 1226 + 1227 + if (count < 2) 1228 + return 0; 1229 + 1230 + /* If it fails, we will just use the global lists */ 1231 + cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), gfp); 1232 + if (!cachep->random_seq) 1233 + return -ENOMEM; 1234 + 1235 + /* Get best entropy at this stage */ 1236 + get_random_bytes_arch(&seed, sizeof(seed)); 1237 + prandom_seed_state(&state, seed); 1238 + 1239 + freelist_randomize(&state, cachep->random_seq, count); 1240 + return 0; 1241 + } 1242 + 1243 + /* Destroy the per-cache random freelist sequence */ 1244 + static void cache_random_seq_destroy(struct kmem_cache *cachep) 1245 + { 1246 + kfree(cachep->random_seq); 1247 + cachep->random_seq = NULL; 1248 + } 1249 + #else 1250 + static inline int cache_random_seq_create(struct kmem_cache *cachep, gfp_t gfp) 1251 + { 1252 + return 0; 1253 + } 1254 + static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { } 1255 + #endif /* CONFIG_SLAB_FREELIST_RANDOM */ 1256 + 
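The freelist_randomize() hunk above is a textbook in-place Fisher-Yates pass: fill the list with the identity permutation, then walk from the tail swapping each slot with a randomly chosen slot at or below it. A standalone userspace sketch (assumptions of this example: xorshift32 stands in for prandom_u32_state(), and freelist_idx_t is fixed at 16 bits, whereas mm/slab.c sizes it per cache):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t freelist_idx_t;	/* fixed width here; an assumption */

/* xorshift32 stands in for the kernel's prandom_u32_state() */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/* Userspace sketch of freelist_randomize(): identity fill, then an
 * in-place Fisher-Yates shuffle from the tail down */
static void freelist_randomize(uint32_t *state, freelist_idx_t *list,
			       size_t count)
{
	freelist_idx_t tmp;
	uint32_t rand;
	size_t i;

	for (i = 0; i < count; i++)
		list[i] = (freelist_idx_t)i;

	/* Fisher-Yates shuffle */
	for (i = count - 1; i > 0; i--) {
		rand = xorshift32(state) % (i + 1);
		tmp = list[i];
		list[i] = list[rand];
		list[rand] = tmp;
	}
}
```

Whatever the generator, the property that matters is that the result is a permutation: every object index appears exactly once, so the slab's freelist still covers all objects, just in an unpredictable order.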
1257 + 1239 1258 /* 1240 1259 * Initialisation. Called after the page allocator have been initialised and 1241 1260 * before smp_init(). ··· 1303 1212 sizeof(struct rcu_head)); 1304 1213 kmem_cache = &kmem_cache_boot; 1305 1214 1306 - if (num_possible_nodes() == 1) 1215 + if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1) 1307 1216 use_alien_caches = 0; 1308 1217 1309 1218 for (i = 0; i < NUM_INIT_LISTS; i++) ··· 1872 1781 1873 1782 /* 1874 1783 * Needed to avoid possible looping condition 1875 - * in cache_grow() 1784 + * in cache_grow_begin() 1876 1785 */ 1877 1786 if (OFF_SLAB(freelist_cache)) 1878 1787 continue; ··· 2229 2138 cachep->freelist_size = cachep->num * sizeof(freelist_idx_t); 2230 2139 cachep->flags = flags; 2231 2140 cachep->allocflags = __GFP_COMP; 2232 - if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA)) 2141 + if (flags & SLAB_CACHE_DMA) 2233 2142 cachep->allocflags |= GFP_DMA; 2234 2143 cachep->size = size; 2235 2144 cachep->reciprocal_buffer_size = reciprocal_value(size); ··· 2271 2180 BUG_ON(irqs_disabled()); 2272 2181 } 2273 2182 2183 + static void check_mutex_acquired(void) 2184 + { 2185 + BUG_ON(!mutex_is_locked(&slab_mutex)); 2186 + } 2187 + 2274 2188 static void check_spinlock_acquired(struct kmem_cache *cachep) 2275 2189 { 2276 2190 #ifdef CONFIG_SMP ··· 2295 2199 #else 2296 2200 #define check_irq_off() do { } while(0) 2297 2201 #define check_irq_on() do { } while(0) 2202 + #define check_mutex_acquired() do { } while(0) 2298 2203 #define check_spinlock_acquired(x) do { } while(0) 2299 2204 #define check_spinlock_acquired_node(x, y) do { } while(0) 2300 2205 #endif 2301 2206 2302 - static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n, 2303 - struct array_cache *ac, 2304 - int force, int node); 2207 + static void drain_array_locked(struct kmem_cache *cachep, struct array_cache *ac, 2208 + int node, bool free_all, struct list_head *list) 2209 + { 2210 + int tofree; 2211 + 2212 + if (!ac || !ac->avail) 2213 
+ return; 2214 + 2215 + tofree = free_all ? ac->avail : (ac->limit + 4) / 5; 2216 + if (tofree > ac->avail) 2217 + tofree = (ac->avail + 1) / 2; 2218 + 2219 + free_block(cachep, ac->entry, tofree, node, list); 2220 + ac->avail -= tofree; 2221 + memmove(ac->entry, &(ac->entry[tofree]), sizeof(void *) * ac->avail); 2222 + } 2305 2223 2306 2224 static void do_drain(void *arg) 2307 2225 { ··· 2339 2229 { 2340 2230 struct kmem_cache_node *n; 2341 2231 int node; 2232 + LIST_HEAD(list); 2342 2233 2343 2234 on_each_cpu(do_drain, cachep, 1); 2344 2235 check_irq_on(); ··· 2347 2236 if (n->alien) 2348 2237 drain_alien_cache(cachep, n->alien); 2349 2238 2350 - for_each_kmem_cache_node(cachep, node, n) 2351 - drain_array(cachep, n, n->shared, 1, node); 2239 + for_each_kmem_cache_node(cachep, node, n) { 2240 + spin_lock_irq(&n->list_lock); 2241 + drain_array_locked(cachep, n->shared, node, true, &list); 2242 + spin_unlock_irq(&n->list_lock); 2243 + 2244 + slabs_destroy(cachep, &list); 2245 + } 2352 2246 } 2353 2247 2354 2248 /* ··· 2404 2288 2405 2289 check_irq_on(); 2406 2290 for_each_kmem_cache_node(cachep, node, n) { 2407 - drain_freelist(cachep, n, slabs_tofree(cachep, n)); 2291 + drain_freelist(cachep, n, INT_MAX); 2408 2292 2409 2293 ret += !list_empty(&n->slabs_full) || 2410 2294 !list_empty(&n->slabs_partial); ··· 2421 2305 { 2422 2306 int i; 2423 2307 struct kmem_cache_node *n; 2308 + 2309 + cache_random_seq_destroy(cachep); 2424 2310 2425 2311 free_percpu(cachep->cpu_cache); 2426 2312 ··· 2530 2412 #endif 2531 2413 } 2532 2414 2415 + #ifdef CONFIG_SLAB_FREELIST_RANDOM 2416 + /* Hold information during a freelist initialization */ 2417 + union freelist_init_state { 2418 + struct { 2419 + unsigned int pos; 2420 + freelist_idx_t *list; 2421 + unsigned int count; 2422 + unsigned int rand; 2423 + }; 2424 + struct rnd_state rnd_state; 2425 + }; 2426 + 2427 + /* 2428 + * Initialize the state based on the randomization method available.
2429 + * return true if the pre-computed list is available, false otherwise. 2430 + */ 2431 + static bool freelist_state_initialize(union freelist_init_state *state, 2432 + struct kmem_cache *cachep, 2433 + unsigned int count) 2434 + { 2435 + bool ret; 2436 + unsigned int rand; 2437 + 2438 + /* Use best entropy available to define a random shift */ 2439 + get_random_bytes_arch(&rand, sizeof(rand)); 2440 + 2441 + /* Use a random state if the pre-computed list is not available */ 2442 + if (!cachep->random_seq) { 2443 + prandom_seed_state(&state->rnd_state, rand); 2444 + ret = false; 2445 + } else { 2446 + state->list = cachep->random_seq; 2447 + state->count = count; 2448 + state->pos = 0; 2449 + state->rand = rand; 2450 + ret = true; 2451 + } 2452 + return ret; 2453 + } 2454 + 2455 + /* Get the next entry on the list and randomize it using a random shift */ 2456 + static freelist_idx_t next_random_slot(union freelist_init_state *state) 2457 + { 2458 + return (state->list[state->pos++] + state->rand) % state->count; 2459 + } 2460 + 2461 + /* 2462 + * Shuffle the freelist initialization state based on pre-computed lists. 2463 + * return true if the list was successfully shuffled, false otherwise. 2464 + */ 2465 + static bool shuffle_freelist(struct kmem_cache *cachep, struct page *page) 2466 + { 2467 + unsigned int objfreelist = 0, i, count = cachep->num; 2468 + union freelist_init_state state; 2469 + bool precomputed; 2470 + 2471 + if (count < 2) 2472 + return false; 2473 + 2474 + precomputed = freelist_state_initialize(&state, cachep, count); 2475 + 2476 + /* Take a random entry as the objfreelist */ 2477 + if (OBJFREELIST_SLAB(cachep)) { 2478 + if (!precomputed) 2479 + objfreelist = count - 1; 2480 + else 2481 + objfreelist = next_random_slot(&state); 2482 + page->freelist = index_to_obj(cachep, page, objfreelist) + 2483 + obj_offset(cachep); 2484 + count--; 2485 + } 2486 + 2487 + /* 2488 + * On early boot, generate the list dynamically.
2489 + * Later use a pre-computed list for speed. 2490 + */ 2491 + if (!precomputed) { 2492 + freelist_randomize(&state.rnd_state, page->freelist, count); 2493 + } else { 2494 + for (i = 0; i < count; i++) 2495 + set_free_obj(page, i, next_random_slot(&state)); 2496 + } 2497 + 2498 + if (OBJFREELIST_SLAB(cachep)) 2499 + set_free_obj(page, cachep->num - 1, objfreelist); 2500 + 2501 + return true; 2502 + } 2503 + #else 2504 + static inline bool shuffle_freelist(struct kmem_cache *cachep, 2505 + struct page *page) 2506 + { 2507 + return false; 2508 + } 2509 + #endif /* CONFIG_SLAB_FREELIST_RANDOM */ 2510 + 2533 2511 static void cache_init_objs(struct kmem_cache *cachep, 2534 2512 struct page *page) 2535 2513 { 2536 2514 int i; 2537 2515 void *objp; 2516 + bool shuffled; 2538 2517 2539 2518 cache_init_objs_debug(cachep, page); 2540 2519 2541 - if (OBJFREELIST_SLAB(cachep)) { 2520 + /* Try to randomize the freelist if enabled */ 2521 + shuffled = shuffle_freelist(cachep, page); 2522 + 2523 + if (!shuffled && OBJFREELIST_SLAB(cachep)) { 2542 2524 page->freelist = index_to_obj(cachep, page, cachep->num - 1) + 2543 2525 obj_offset(cachep); 2544 2526 } ··· 2652 2434 kasan_poison_object_data(cachep, objp); 2653 2435 } 2654 2436 2655 - set_free_obj(page, i, i); 2656 - } 2657 - } 2658 - 2659 - static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags) 2660 - { 2661 - if (CONFIG_ZONE_DMA_FLAG) { 2662 - if (flags & GFP_DMA) 2663 - BUG_ON(!(cachep->allocflags & GFP_DMA)); 2664 - else 2665 - BUG_ON(cachep->allocflags & GFP_DMA); 2437 + if (!shuffled) 2438 + set_free_obj(page, i, i); 2666 2439 } 2667 2440 } 2668 2441 ··· 2711 2502 * Grow (by 1) the number of slabs within a cache. This is called by 2712 2503 * kmem_cache_alloc() when there are no active objs left in a cache. 
2713 2504 */ 2714 - static int cache_grow(struct kmem_cache *cachep, 2715 - gfp_t flags, int nodeid, struct page *page) 2505 + static struct page *cache_grow_begin(struct kmem_cache *cachep, 2506 + gfp_t flags, int nodeid) 2716 2507 { 2717 2508 void *freelist; 2718 2509 size_t offset; 2719 2510 gfp_t local_flags; 2511 + int page_node; 2720 2512 struct kmem_cache_node *n; 2513 + struct page *page; 2721 2514 2722 2515 /* 2723 2516 * Be lazy and only check for valid flags here, keeping it out of the ··· 2731 2520 } 2732 2521 local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); 2733 2522 2734 - /* Take the node list lock to change the colour_next on this node */ 2735 2523 check_irq_off(); 2736 - n = get_node(cachep, nodeid); 2737 - spin_lock(&n->list_lock); 2738 - 2739 - /* Get colour for the slab, and cal the next value. */ 2740 - offset = n->colour_next; 2741 - n->colour_next++; 2742 - if (n->colour_next >= cachep->colour) 2743 - n->colour_next = 0; 2744 - spin_unlock(&n->list_lock); 2745 - 2746 - offset *= cachep->colour_off; 2747 - 2748 2524 if (gfpflags_allow_blocking(local_flags)) 2749 2525 local_irq_enable(); 2750 - 2751 - /* 2752 - * The test for missing atomic flag is performed here, rather than 2753 - * the more obvious place, simply to reduce the critical path length 2754 - * in kmem_cache_alloc(). If a caller is seriously mis-behaving they 2755 - * will eventually be caught here (where it matters). 2756 - */ 2757 - kmem_flagcheck(cachep, flags); 2758 2526 2759 2527 /* 2760 2528 * Get mem for the objs. Attempt to allocate a physical page from 2761 2529 * 'nodeid'. 2762 2530 */ 2763 - if (!page) 2764 - page = kmem_getpages(cachep, local_flags, nodeid); 2531 + page = kmem_getpages(cachep, local_flags, nodeid); 2765 2532 if (!page) 2766 2533 goto failed; 2767 2534 2535 + page_node = page_to_nid(page); 2536 + n = get_node(cachep, page_node); 2537 + 2538 + /* Get colour for the slab, and cal the next value. 
*/ 2539 + n->colour_next++; 2540 + if (n->colour_next >= cachep->colour) 2541 + n->colour_next = 0; 2542 + 2543 + offset = n->colour_next; 2544 + if (offset >= cachep->colour) 2545 + offset = 0; 2546 + 2547 + offset *= cachep->colour_off; 2548 + 2768 2549 /* Get slab management. */ 2769 2550 freelist = alloc_slabmgmt(cachep, page, offset, 2770 - local_flags & ~GFP_CONSTRAINT_MASK, nodeid); 2551 + local_flags & ~GFP_CONSTRAINT_MASK, page_node); 2771 2552 if (OFF_SLAB(cachep) && !freelist) 2772 2553 goto opps1; 2773 2554 ··· 2770 2567 2771 2568 if (gfpflags_allow_blocking(local_flags)) 2772 2569 local_irq_disable(); 2773 - check_irq_off(); 2774 - spin_lock(&n->list_lock); 2775 2570 2776 - /* Make slab active. */ 2777 - list_add_tail(&page->lru, &(n->slabs_free)); 2778 - STATS_INC_GROWN(cachep); 2779 - n->free_objects += cachep->num; 2780 - spin_unlock(&n->list_lock); 2781 - return 1; 2571 + return page; 2572 + 2782 2573 opps1: 2783 2574 kmem_freepages(cachep, page); 2784 2575 failed: 2785 2576 if (gfpflags_allow_blocking(local_flags)) 2786 2577 local_irq_disable(); 2787 - return 0; 2578 + return NULL; 2579 + } 2580 + 2581 + static void cache_grow_end(struct kmem_cache *cachep, struct page *page) 2582 + { 2583 + struct kmem_cache_node *n; 2584 + void *list = NULL; 2585 + 2586 + check_irq_off(); 2587 + 2588 + if (!page) 2589 + return; 2590 + 2591 + INIT_LIST_HEAD(&page->lru); 2592 + n = get_node(cachep, page_to_nid(page)); 2593 + 2594 + spin_lock(&n->list_lock); 2595 + if (!page->active) 2596 + list_add_tail(&page->lru, &(n->slabs_free)); 2597 + else 2598 + fixup_slab_list(cachep, n, page, &list); 2599 + STATS_INC_GROWN(cachep); 2600 + n->free_objects += cachep->num - page->active; 2601 + spin_unlock(&n->list_lock); 2602 + 2603 + fixup_objfreelist_debug(cachep, &list); 2788 2604 } 2789 2605 2790 2606 #if DEBUG ··· 3007 2785 return obj; 3008 2786 } 3009 2787 2788 + /* 2789 + * Slab list should be fixed up by fixup_slab_list() for existing slab 2790 + * or 
cache_grow_end() for new slab 2791 + */ 2792 + static __always_inline int alloc_block(struct kmem_cache *cachep, 2793 + struct array_cache *ac, struct page *page, int batchcount) 2794 + { 2795 + /* 2796 + * There must be at least one object available for 2797 + * allocation. 2798 + */ 2799 + BUG_ON(page->active >= cachep->num); 2800 + 2801 + while (page->active < cachep->num && batchcount--) { 2802 + STATS_INC_ALLOCED(cachep); 2803 + STATS_INC_ACTIVE(cachep); 2804 + STATS_SET_HIGH(cachep); 2805 + 2806 + ac->entry[ac->avail++] = slab_get_obj(cachep, page); 2807 + } 2808 + 2809 + return batchcount; 2810 + } 2811 + 3010 2812 static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags) 3011 2813 { 3012 2814 int batchcount; 3013 2815 struct kmem_cache_node *n; 3014 - struct array_cache *ac; 2816 + struct array_cache *ac, *shared; 3015 2817 int node; 3016 2818 void *list = NULL; 2819 + struct page *page; 3017 2820 3018 2821 check_irq_off(); 3019 2822 node = numa_mem_id(); 3020 2823 3021 - retry: 3022 2824 ac = cpu_cache_get(cachep); 3023 2825 batchcount = ac->batchcount; 3024 2826 if (!ac->touched && batchcount > BATCHREFILL_LIMIT) { ··· 3056 2810 n = get_node(cachep, node); 3057 2811 3058 2812 BUG_ON(ac->avail > 0 || !n); 2813 + shared = READ_ONCE(n->shared); 2814 + if (!n->free_objects && (!shared || !shared->avail)) 2815 + goto direct_grow; 2816 + 3059 2817 spin_lock(&n->list_lock); 2818 + shared = READ_ONCE(n->shared); 3060 2819 3061 2820 /* See if we can refill from the shared array */ 3062 - if (n->shared && transfer_objects(ac, n->shared, batchcount)) { 3063 - n->shared->touched = 1; 2821 + if (shared && transfer_objects(ac, shared, batchcount)) { 2822 + shared->touched = 1; 3064 2823 goto alloc_done; 3065 2824 } 3066 2825 3067 2826 while (batchcount > 0) { 3068 - struct page *page; 3069 2827 /* Get slab alloc is to come from. 
*/ 3070 2828 page = get_first_slab(n, false); 3071 2829 if (!page) ··· 3077 2827 3078 2828 check_spinlock_acquired(cachep); 3079 2829 3080 - /* 3081 - * The slab was either on partial or free list so 3082 - * there must be at least one object available for 3083 - * allocation. 3084 - */ 3085 - BUG_ON(page->active >= cachep->num); 3086 - 3087 - while (page->active < cachep->num && batchcount--) { 3088 - STATS_INC_ALLOCED(cachep); 3089 - STATS_INC_ACTIVE(cachep); 3090 - STATS_SET_HIGH(cachep); 3091 - 3092 - ac->entry[ac->avail++] = slab_get_obj(cachep, page); 3093 - } 3094 - 2830 + batchcount = alloc_block(cachep, ac, page, batchcount); 3095 2831 fixup_slab_list(cachep, n, page, &list); 3096 2832 } 3097 2833 ··· 3087 2851 spin_unlock(&n->list_lock); 3088 2852 fixup_objfreelist_debug(cachep, &list); 3089 2853 2854 + direct_grow: 3090 2855 if (unlikely(!ac->avail)) { 3091 - int x; 3092 - 3093 2856 /* Check if we can use obj in pfmemalloc slab */ 3094 2857 if (sk_memalloc_socks()) { 3095 2858 void *obj = cache_alloc_pfmemalloc(cachep, n, flags); ··· 3097 2862 return obj; 3098 2863 } 3099 2864 3100 - x = cache_grow(cachep, gfp_exact_node(flags), node, NULL); 2865 + page = cache_grow_begin(cachep, gfp_exact_node(flags), node); 3101 2866 3102 - /* cache_grow can reenable interrupts, then ac could change. */ 2867 + /* 2868 + * cache_grow_begin() can reenable interrupts, 2869 + * then ac could change. 2870 + */ 3103 2871 ac = cpu_cache_get(cachep); 3104 - node = numa_mem_id(); 2872 + if (!ac->avail && page) 2873 + alloc_block(cachep, ac, page, batchcount); 2874 + cache_grow_end(cachep, page); 3105 2875 3106 - /* no objects in sight? abort */ 3107 - if (!x && ac->avail == 0) 2876 + if (!ac->avail) 3108 2877 return NULL; 3109 - 3110 - if (!ac->avail) /* objects refilled by interrupt? 
*/ 3111 - goto retry; 3112 2878 } 3113 2879 ac->touched = 1; 3114 2880 ··· 3120 2884 gfp_t flags) 3121 2885 { 3122 2886 might_sleep_if(gfpflags_allow_blocking(flags)); 3123 - #if DEBUG 3124 - kmem_flagcheck(cachep, flags); 3125 - #endif 3126 2887 } 3127 2888 3128 2889 #if DEBUG ··· 3231 2998 static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) 3232 2999 { 3233 3000 struct zonelist *zonelist; 3234 - gfp_t local_flags; 3235 3001 struct zoneref *z; 3236 3002 struct zone *zone; 3237 3003 enum zone_type high_zoneidx = gfp_zone(flags); 3238 3004 void *obj = NULL; 3005 + struct page *page; 3239 3006 int nid; 3240 3007 unsigned int cpuset_mems_cookie; 3241 3008 3242 3009 if (flags & __GFP_THISNODE) 3243 3010 return NULL; 3244 - 3245 - local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); 3246 3011 3247 3012 retry_cpuset: 3248 3013 cpuset_mems_cookie = read_mems_allowed_begin(); ··· 3271 3040 * We may trigger various forms of reclaim on the allowed 3272 3041 * set and go into memory reserves if necessary. 3273 3042 */ 3274 - struct page *page; 3275 - 3276 - if (gfpflags_allow_blocking(local_flags)) 3277 - local_irq_enable(); 3278 - kmem_flagcheck(cache, flags); 3279 - page = kmem_getpages(cache, local_flags, numa_mem_id()); 3280 - if (gfpflags_allow_blocking(local_flags)) 3281 - local_irq_disable(); 3043 + page = cache_grow_begin(cache, flags, numa_mem_id()); 3044 + cache_grow_end(cache, page); 3282 3045 if (page) { 3283 - /* 3284 - * Insert into the appropriate per node queues 3285 - */ 3286 3046 nid = page_to_nid(page); 3287 - if (cache_grow(cache, flags, nid, page)) { 3288 - obj = ____cache_alloc_node(cache, 3289 - gfp_exact_node(flags), nid); 3290 - if (!obj) 3291 - /* 3292 - * Another processor may allocate the 3293 - * objects in the slab since we are 3294 - * not holding any locks. 
3295 - */ 3296 - goto retry; 3297 - } else { 3298 - /* cache_grow already freed obj */ 3299 - obj = NULL; 3300 - } 3047 + obj = ____cache_alloc_node(cache, 3048 + gfp_exact_node(flags), nid); 3049 + 3050 + /* 3051 + * Another processor may allocate the objects in 3052 + * the slab since we are not holding any locks. 3053 + */ 3054 + if (!obj) 3055 + goto retry; 3301 3056 } 3302 3057 } 3303 3058 ··· 3300 3083 { 3301 3084 struct page *page; 3302 3085 struct kmem_cache_node *n; 3303 - void *obj; 3086 + void *obj = NULL; 3304 3087 void *list = NULL; 3305 - int x; 3306 3088 3307 3089 VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES); 3308 3090 n = get_node(cachep, nodeid); 3309 3091 BUG_ON(!n); 3310 3092 3311 - retry: 3312 3093 check_irq_off(); 3313 3094 spin_lock(&n->list_lock); 3314 3095 page = get_first_slab(n, false); ··· 3328 3113 3329 3114 spin_unlock(&n->list_lock); 3330 3115 fixup_objfreelist_debug(cachep, &list); 3331 - goto done; 3116 + return obj; 3332 3117 3333 3118 must_grow: 3334 3119 spin_unlock(&n->list_lock); 3335 - x = cache_grow(cachep, gfp_exact_node(flags), nodeid, NULL); 3336 - if (x) 3337 - goto retry; 3120 + page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid); 3121 + if (page) { 3122 + /* This slab isn't counted yet so don't update free_objects */ 3123 + obj = slab_get_obj(cachep, page); 3124 + } 3125 + cache_grow_end(cachep, page); 3338 3126 3339 - return fallback_alloc(cachep, flags); 3340 - 3341 - done: 3342 - return obj; 3127 + return obj ? 
obj : fallback_alloc(cachep, flags); 3343 3128 } 3344 3129 3345 3130 static __always_inline void * ··· 3457 3242 { 3458 3243 int i; 3459 3244 struct kmem_cache_node *n = get_node(cachep, node); 3245 + struct page *page; 3246 + 3247 + n->free_objects += nr_objects; 3460 3248 3461 3249 for (i = 0; i < nr_objects; i++) { 3462 3250 void *objp; ··· 3472 3254 check_spinlock_acquired_node(cachep, node); 3473 3255 slab_put_obj(cachep, page, objp); 3474 3256 STATS_DEC_ACTIVE(cachep); 3475 - n->free_objects++; 3476 3257 3477 3258 /* fixup slab chains */ 3478 - if (page->active == 0) { 3479 - if (n->free_objects > n->free_limit) { 3480 - n->free_objects -= cachep->num; 3481 - list_add_tail(&page->lru, list); 3482 - } else { 3483 - list_add(&page->lru, &n->slabs_free); 3484 - } 3485 - } else { 3259 + if (page->active == 0) 3260 + list_add(&page->lru, &n->slabs_free); 3261 + else { 3486 3262 /* Unconditionally move a slab to the end of the 3487 3263 * partial list on free - maximum time for the 3488 3264 * other objects to be freed, too. 3489 3265 */ 3490 3266 list_add_tail(&page->lru, &n->slabs_partial); 3491 3267 } 3268 + } 3269 + 3270 + while (n->free_objects > n->free_limit && !list_empty(&n->slabs_free)) { 3271 + n->free_objects -= cachep->num; 3272 + 3273 + page = list_last_entry(&n->slabs_free, struct page, lru); 3274 + list_del(&page->lru); 3275 + list_add(&page->lru, list); 3492 3276 } 3493 3277 } 3494 3278 ··· 3865 3645 /* 3866 3646 * This initializes kmem_cache_node or resizes various caches for all nodes. 
 */
-static int alloc_kmem_cache_node(struct kmem_cache *cachep, gfp_t gfp)
+static int setup_kmem_cache_nodes(struct kmem_cache *cachep, gfp_t gfp)
 {
+	int ret;
 	int node;
 	struct kmem_cache_node *n;
-	struct array_cache *new_shared;
-	struct alien_cache **new_alien = NULL;
 
 	for_each_online_node(node) {
-
-		if (use_alien_caches) {
-			new_alien = alloc_alien_cache(node, cachep->limit, gfp);
-			if (!new_alien)
-				goto fail;
-		}
-
-		new_shared = NULL;
-		if (cachep->shared) {
-			new_shared = alloc_arraycache(node,
-				cachep->shared*cachep->batchcount,
-				0xbaadf00d, gfp);
-			if (!new_shared) {
-				free_alien_cache(new_alien);
-				goto fail;
-			}
-		}
-
-		n = get_node(cachep, node);
-		if (n) {
-			struct array_cache *shared = n->shared;
-			LIST_HEAD(list);
-
-			spin_lock_irq(&n->list_lock);
-
-			if (shared)
-				free_block(cachep, shared->entry,
-						shared->avail, node, &list);
-
-			n->shared = new_shared;
-			if (!n->alien) {
-				n->alien = new_alien;
-				new_alien = NULL;
-			}
-			n->free_limit = (1 + nr_cpus_node(node)) *
-					cachep->batchcount + cachep->num;
-			spin_unlock_irq(&n->list_lock);
-			slabs_destroy(cachep, &list);
-			kfree(shared);
-			free_alien_cache(new_alien);
-			continue;
-		}
-		n = kmalloc_node(sizeof(struct kmem_cache_node), gfp, node);
-		if (!n) {
-			free_alien_cache(new_alien);
-			kfree(new_shared);
+		ret = setup_kmem_cache_node(cachep, node, gfp, true);
+		if (ret)
 			goto fail;
-		}
 
-		kmem_cache_node_init(n);
-		n->next_reap = jiffies + REAPTIMEOUT_NODE +
-				((unsigned long)cachep) % REAPTIMEOUT_NODE;
-		n->shared = new_shared;
-		n->alien = new_alien;
-		n->free_limit = (1 + nr_cpus_node(node)) *
-					cachep->batchcount + cachep->num;
-		cachep->node[node] = n;
 	}
+
 	return 0;
 
 fail:
···
 	cachep->shared = shared;
 
 	if (!prev)
-		goto alloc_node;
+		goto setup_node;
 
 	for_each_online_cpu(cpu) {
 		LIST_HEAD(list);
···
 	}
 	free_percpu(prev);
 
-alloc_node:
-	return alloc_kmem_cache_node(cachep, gfp);
+setup_node:
+	return setup_kmem_cache_nodes(cachep, gfp);
 }
 
 static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
···
 	int limit = 0;
 	int shared = 0;
 	int batchcount = 0;
+
+	err = cache_random_seq_create(cachep, gfp);
+	if (err)
+		goto end;
 
 	if (!is_root_cache(cachep)) {
 		struct kmem_cache *root = memcg_root_cache(cachep);
···
 	batchcount = (limit + 1) / 2;
 skip_setup:
 	err = do_tune_cpucache(cachep, limit, batchcount, shared, gfp);
+end:
 	if (err)
 		pr_err("enable_cpucache failed for %s, error %d\n",
 		       cachep->name, -err);
···
  * if drain_array() is used on the shared array.
  */
 static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n,
-			 struct array_cache *ac, int force, int node)
+			 struct array_cache *ac, int node)
 {
 	LIST_HEAD(list);
-	int tofree;
+
+	/* ac from n->shared can be freed if we don't hold the slab_mutex. */
+	check_mutex_acquired();
 
 	if (!ac || !ac->avail)
 		return;
-	if (ac->touched && !force) {
+
+	if (ac->touched) {
 		ac->touched = 0;
-	} else {
-		spin_lock_irq(&n->list_lock);
-		if (ac->avail) {
-			tofree = force ? ac->avail : (ac->limit + 4) / 5;
-			if (tofree > ac->avail)
-				tofree = (ac->avail + 1) / 2;
-			free_block(cachep, ac->entry, tofree, node, &list);
-			ac->avail -= tofree;
-			memmove(ac->entry, &(ac->entry[tofree]),
-				sizeof(void *) * ac->avail);
-		}
-		spin_unlock_irq(&n->list_lock);
-		slabs_destroy(cachep, &list);
+		return;
 	}
+
+	spin_lock_irq(&n->list_lock);
+	drain_array_locked(cachep, ac, node, false, &list);
+	spin_unlock_irq(&n->list_lock);
+
+	slabs_destroy(cachep, &list);
 }
 
 /**
···
 
 		reap_alien(searchp, n);
 
-		drain_array(searchp, n, cpu_cache_get(searchp), 0, node);
+		drain_array(searchp, n, cpu_cache_get(searchp), node);
 
 		/*
 		 * These are racy checks but it does not matter
···
 
 		n->next_reap = jiffies + REAPTIMEOUT_NODE;
 
-		drain_array(searchp, n, n->shared, 0, node);
+		drain_array(searchp, n, n->shared, node);
 
 		if (n->free_touched)
 			n->free_touched = 0;
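The drain_array() hunk above moves the open-coded partial drain into drain_array_locked(), but the non-forced heuristic it preserves is visible in the removed lines: free about a fifth of the cache's limit, capped at half of what is currently cached. A minimal userspace sketch of just that arithmetic (drain_tofree is an illustrative name, not a kernel function):

```c
/* Sketch of the non-forced drain heuristic from the hunk above:
 * target ~20% of the array's limit, rounded up, but never free
 * more than half (rounded up) of the objects currently available. */
static int drain_tofree(int limit, int avail)
{
	int tofree = (limit + 4) / 5;	/* roughly limit/5, rounded up */

	if (tofree > avail)
		tofree = (avail + 1) / 2;	/* cap at half of avail */
	return tofree;
}
```

With a limit of 120 and 100 cached objects this frees 24; with only 10 cached it backs off to 5.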
+8 -8
mm/slub.c
···
 		tmp.counters = counters_new;
 		/*
 		 * page->counters can cover frozen/inuse/objects as well
-		 * as page->_count. If we assign to ->counters directly
-		 * we run the risk of losing updates to page->_count, so
+		 * as page->_refcount. If we assign to ->counters directly
+		 * we run the risk of losing updates to page->_refcount, so
 		 * be careful and only assign to the fields we need.
 		 */
 		page->frozen = tmp.frozen;
···
 	 * may return off node objects because partial slabs are obtained
 	 * from other nodes and filled up.
 	 *
-	 * If /sys/kernel/slab/xx/defrag_ratio is set to 100 (which makes
-	 * defrag_ratio = 1000) then every (well almost) allocation will
-	 * first attempt to defrag slab caches on other nodes. This means
-	 * scanning over all nodes to look for partial slabs which may be
-	 * expensive if we do it every time we are trying to find a slab
+	 * If /sys/kernel/slab/xx/remote_node_defrag_ratio is set to 100
+	 * (which makes defrag_ratio = 1000) then every (well almost)
+	 * allocation will first attempt to defrag slab caches on other nodes.
+	 * This means scanning over all nodes to look for partial slabs which
+	 * may be expensive if we do it every time we are trying to find a slab
 	 * with available objects.
 	 */
 	if (!s->remote_node_defrag_ratio ||
···
 		 * s->cpu_partial is checked locklessly (see put_cpu_partial),
 		 * so we have to make sure the change is visible.
 		 */
-		kick_all_cpus_sync();
+		synchronize_sched();
 	}
 
 	flush_all(s);
+1 -2
mm/swap_state.c
···
 
 	/* May fail (-ENOMEM) if radix-tree node allocation failed. */
 	__SetPageLocked(new_page);
-	SetPageSwapBacked(new_page);
+	__SetPageSwapBacked(new_page);
 	err = __add_to_swap_cache(new_page, entry);
 	if (likely(!err)) {
 		radix_tree_preload_end();
···
 		return new_page;
 	}
 	radix_tree_preload_end();
-	ClearPageSwapBacked(new_page);
 	__ClearPageLocked(new_page);
 	/*
 	 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
+23
mm/util.c
···
 	return __page_rmapping(page);
 }
 
+/*
+ * Return true if this page is mapped into pagetables.
+ * For compound page it returns true if any subpage of compound page is mapped.
+ */
+bool page_mapped(struct page *page)
+{
+	int i;
+
+	if (likely(!PageCompound(page)))
+		return atomic_read(&page->_mapcount) >= 0;
+	page = compound_head(page);
+	if (atomic_read(compound_mapcount_ptr(page)) >= 0)
+		return true;
+	if (PageHuge(page))
+		return false;
+	for (i = 0; i < hpage_nr_pages(page); i++) {
+		if (atomic_read(&page[i]._mapcount) >= 0)
+			return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(page_mapped);
+
 struct anon_vma *page_anon_vma(struct page *page)
 {
 	unsigned long mapping;
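The page_mapped() code added above relies on the kernel's convention that a page's _mapcount starts at -1, so "mapped" simply means the atomic counter reads greater than or equal to zero. A self-contained userspace model of that biased-counter convention (mock_page and friends are illustrative, not kernel API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative model of the _mapcount convention used by page_mapped():
 * the counter is biased to -1 when unmapped, so any value >= 0 means at
 * least one pagetable mapping exists. */
struct mock_page {
	atomic_int mapcount;
};

static void mock_page_init(struct mock_page *p)
{
	atomic_init(&p->mapcount, -1);	/* -1 == no mappings */
}

static void mock_map(struct mock_page *p)
{
	atomic_fetch_add(&p->mapcount, 1);
}

static void mock_unmap(struct mock_page *p)
{
	atomic_fetch_sub(&p->mapcount, 1);
}

static bool mock_page_mapped(struct mock_page *p)
{
	return atomic_load(&p->mapcount) >= 0;
}
```

The bias lets a single atomic read answer "is this mapped at all?" without a separate flag.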
+12 -15
mm/vmscan.c
···
 *
 * Reversing the order of the tests ensures such a situation cannot
 * escape unnoticed. The smp_rmb is needed to ensure the page->flags
- * load is not satisfied before that of page->_count.
+ * load is not satisfied before that of page->_refcount.
 *
 * Note that if SetPageDirty is always performed via set_page_dirty,
 * and thus under tree_lock, then this ordering is not required.
···
 	for (scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan &&
 					!list_empty(src); scan++) {
 		struct page *page;
-		int nr_pages;
 
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
···
 
 		switch (__isolate_lru_page(page, mode)) {
 		case 0:
-			nr_pages = hpage_nr_pages(page);
-			mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
+			nr_taken += hpage_nr_pages(page);
 			list_move(&page->lru, dst);
-			nr_taken += nr_pages;
 			break;
 
 		case -EBUSY:
···
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
 				     &nr_scanned, sc, isolate_mode, lru);
 
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
+	update_lru_size(lruvec, lru, -nr_taken);
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
+	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	if (global_reclaim(sc)) {
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
···
 				false);
 
 	spin_lock_irq(&zone->lru_lock);
-
-	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	if (global_reclaim(sc)) {
 		if (current_is_kswapd())
···
 * It is safe to rely on PG_active against the non-LRU pages in here because
 * nobody will play with that bit on a non-LRU page.
 *
- * The downside is that we have to touch page->_count against each page.
+ * The downside is that we have to touch page->_refcount against each page.
 * But we had to alter page->flags anyway.
 */
 
···
 		SetPageLRU(page);
 
 		nr_pages = hpage_nr_pages(page);
-		mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
+		update_lru_size(lruvec, lru, nr_pages);
 		list_move(&page->lru, &lruvec->lists[lru]);
 		pgmoved += nr_pages;
 
···
 			list_add(&page->lru, pages_to_free);
 		}
 	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	if (!is_active_lru(lru))
 		__count_vm_events(PGDEACTIVATE, pgmoved);
 }
···
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
 				     &nr_scanned, sc, isolate_mode, lru);
-	if (global_reclaim(sc))
-		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
 
+	update_lru_size(lruvec, lru, -nr_taken);
+	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
+	if (global_reclaim(sc))
+		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
 	__count_zone_vm_events(PGREFILL, zone, nr_scanned);
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
-	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
+
 	spin_unlock_irq(&zone->lru_lock);
 
 	while (!list_empty(&l_hold)) {
+74 -38
mm/vmstat.c
···
 
 #ifdef CONFIG_NUMA
 /*
- * zonelist = the list of zones passed to the allocator
- * z        = the zone from which the allocation occurred.
- *
- * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
- */
-void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags)
-{
-	if (z->zone_pgdat == preferred_zone->zone_pgdat) {
-		__inc_zone_state(z, NUMA_HIT);
-	} else {
-		__inc_zone_state(z, NUMA_MISS);
-		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
-	}
-	if (z->node == ((flags & __GFP_OTHER_NODE) ?
-			preferred_zone->node : numa_node_id()))
-		__inc_zone_state(z, NUMA_LOCAL);
-	else
-		__inc_zone_state(z, NUMA_OTHER);
-}
-
-/*
  * Determine the per node value of a stat item.
  */
 unsigned long node_page_state(int node, enum zone_stat_item item)
 {
 	struct zone *zones = NODE_DATA(node)->node_zones;
+	int i;
+	unsigned long count = 0;
 
-	return
-#ifdef CONFIG_ZONE_DMA
-		zone_page_state(&zones[ZONE_DMA], item) +
-#endif
-#ifdef CONFIG_ZONE_DMA32
-		zone_page_state(&zones[ZONE_DMA32], item) +
-#endif
-#ifdef CONFIG_HIGHMEM
-		zone_page_state(&zones[ZONE_HIGHMEM], item) +
-#endif
-		zone_page_state(&zones[ZONE_NORMAL], item) +
-		zone_page_state(&zones[ZONE_MOVABLE], item);
+	for (i = 0; i < MAX_NR_ZONES; i++)
+		count += zone_page_state(zones + i, item);
+
+	return count;
 }
 
 #endif
···
 		if (!memmap_valid_within(pfn, page, zone))
 			continue;
 
+		if (page_zone(page) != zone)
+			continue;
+
 		mtype = get_pageblock_migratetype(page);
 
 		if (mtype < MIGRATE_TYPES)
···
 		block_end_pfn = min(block_end_pfn, end_pfn);
 
 		page = pfn_to_page(pfn);
-		pageblock_mt = get_pfnblock_migratetype(page, pfn);
+		pageblock_mt = get_pageblock_migratetype(page);
 
 		for (; pfn < block_end_pfn; pfn++) {
 			if (!pfn_valid_within(pfn))
 				continue;
 
 			page = pfn_to_page(pfn);
+
+			if (page_zone(page) != zone)
+				continue;
+
 			if (PageBuddy(page)) {
 				pfn += (1UL << page_order(page)) - 1;
 				continue;
···
 static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
 int sysctl_stat_interval __read_mostly = HZ;
 static cpumask_var_t cpu_stat_off;
+
+#ifdef CONFIG_PROC_FS
+static void refresh_vm_stats(struct work_struct *work)
+{
+	refresh_cpu_vm_stats(true);
+}
+
+int vmstat_refresh(struct ctl_table *table, int write,
+		   void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	long val;
+	int err;
+	int i;
+
+	/*
+	 * The regular update, every sysctl_stat_interval, may come later
+	 * than expected: leaving a significant amount in per_cpu buckets.
+	 * This is particularly misleading when checking a quantity of HUGE
+	 * pages, immediately after running a test. /proc/sys/vm/stat_refresh,
+	 * which can equally be echo'ed to or cat'ted from (by root),
+	 * can be used to update the stats just before reading them.
+	 *
+	 * Oh, and since global_page_state() etc. are so careful to hide
+	 * transiently negative values, report an error here if any of
+	 * the stats is negative, so we know to go looking for imbalance.
+	 */
+	err = schedule_on_each_cpu(refresh_vm_stats);
+	if (err)
+		return err;
+	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
+		val = atomic_long_read(&vm_stat[i]);
+		if (val < 0) {
+			switch (i) {
+			case NR_ALLOC_BATCH:
+			case NR_PAGES_SCANNED:
+				/*
+				 * These are often seen to go negative in
+				 * recent kernels, but not to go permanently
+				 * negative. Whilst it would be nicer not to
+				 * have exceptions, rooting them out would be
+				 * another task, of rather low priority.
+				 */
+				break;
+			default:
+				pr_warn("%s: %s %ld\n",
+					__func__, vmstat_text[i], val);
+				err = -EINVAL;
+				break;
+			}
+		}
+	}
+	if (err)
+		return err;
+	if (write)
+		*ppos += *lenp;
+	else
+		*lenp = 0;
+	return 0;
+}
+#endif /* CONFIG_PROC_FS */
 
 static void vmstat_update(struct work_struct *w)
 {
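The vmstat_refresh() addition above scans every counter after the flush and flags negative values, while excusing a couple of items known to go transiently negative. A simplified userspace model of that scan (check_counters and the skip parameters are illustrative, not kernel code):

```c
/* Sketch of the negative-counter check in vmstat_refresh(): walk the
 * counter array, ignore up to two known-noisy indices, and report an
 * error if any other counter has gone negative. */
static int check_counters(const long *vals, int n, int skip_a, int skip_b)
{
	int err = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (i == skip_a || i == skip_b)
			continue;	/* tolerated, like NR_PAGES_SCANNED */
		if (vals[i] < 0)
			err = -1;	/* stand-in for -EINVAL */
	}
	return err;
}
```

Passing an out-of-range index (e.g. -9) for a skip slot disables that exception.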
+5 -3
net/socket.c
···
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
-	struct timespec end_time;
+	struct timespec64 end_time;
+	struct timespec64 timeout64;
 
 	if (timeout &&
 	    poll_select_set_timeout(&end_time, timeout->tv_sec,
···
 			flags |= MSG_DONTWAIT;
 
 		if (timeout) {
-			ktime_get_ts(timeout);
-			*timeout = timespec_sub(end_time, *timeout);
+			ktime_get_ts64(&timeout64);
+			*timeout = timespec64_to_timespec(
+					timespec64_sub(end_time, timeout64));
 			if (timeout->tv_sec < 0) {
 				timeout->tv_sec = timeout->tv_nsec = 0;
 				break;
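The net/socket.c hunk above converts recvmmsg's deadline bookkeeping to timespec64: read the monotonic clock, subtract it from the precomputed deadline, and clamp a negative remainder to zero. A self-contained userspace sketch of that remaining-time computation (timespec_remaining is an illustrative name):

```c
#include <time.h>

/* Sketch of the remaining-timeout arithmetic used in the hunk above:
 * remainder = deadline - now, with nanosecond borrow, clamped to zero
 * once the deadline has passed. */
static struct timespec timespec_remaining(struct timespec deadline,
					  struct timespec now)
{
	struct timespec rem;

	rem.tv_sec = deadline.tv_sec - now.tv_sec;
	rem.tv_nsec = deadline.tv_nsec - now.tv_nsec;
	if (rem.tv_nsec < 0) {		/* borrow a second */
		rem.tv_sec--;
		rem.tv_nsec += 1000000000L;
	}
	if (rem.tv_sec < 0) {		/* deadline already passed */
		rem.tv_sec = 0;
		rem.tv_nsec = 0;
	}
	return rem;
}
```

In the kernel the same clamp is what makes the subsequent "tv_sec < 0" break fire at most once.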
+1 -1
net/wireless/util.c
···
 	struct skb_shared_info *sh = skb_shinfo(skb);
 	int page_offset;
 
-	atomic_inc(&page->_count);
+	page_ref_inc(page);
 	page_offset = ptr - page_address(page);
 	skb_add_rx_frag(skb, sh->nr_frags, page, page_offset, len, size);
 }
+6
scripts/bloat-o-meter
···
 new = getsizes(sys.argv[2])
 grow, shrink, add, remove, up, down = 0, 0, 0, 0, 0, 0
 delta, common = [], {}
+otot, ntot = 0, 0
 
 for a in old:
     if a in new:
         common[a] = 1
 
 for name in old:
+    otot += old[name]
     if name not in common:
         remove += 1
         down += old[name]
         delta.append((-old[name], name))
 
 for name in new:
+    ntot += new[name]
     if name not in common:
         add += 1
         up += new[name]
···
 print("%-40s %7s %7s %+7s" % ("function", "old", "new", "delta"))
 for d, n in delta:
     if d: print("%-40s %7s %7s %+7d" % (n, old.get(n,"-"), new.get(n,"-"), d))
+
+print("Total: Before=%d, After=%d, chg %f%%" % \
+    (otot, ntot, (ntot - otot)*100/otot))
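The bloat-o-meter change above prints the overall size change as (new - old) * 100 / old. The same computation, written in C to match the other examples in this note (pct_change is an illustrative helper, not part of the script):

```c
/* Relative size change as a percentage, matching the bloat-o-meter
 * "chg" figure: positive means the binary grew, negative it shrank. */
static double pct_change(long otot, long ntot)
{
	return (double)(ntot - otot) * 100.0 / (double)otot;
}
```

For example, growing from 200 to 210 bytes reports a 5% change; shrinking from 200 to 150 reports -25%.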
+39 -16
scripts/decode_stacktrace.sh
···
 # (c) 2014, Sasha Levin <sasha.levin@oracle.com>
 #set -x
 
-if [[ $# != 2 ]]; then
+if [[ $# < 2 ]]; then
 	echo "Usage:"
-	echo "	$0 [vmlinux] [base path]"
+	echo "	$0 [vmlinux] [base path] [modules path]"
 	exit 1
 fi
 
 vmlinux=$1
 basepath=$2
+modpath=$3
 declare -A cache
+declare -A modcache
 
 parse_symbol() {
 	# The structure of symbol at this point is:
···
 	#
 	# For example:
 	#	do_basic_setup+0x9c/0xbf
+
+	if [[ $module == "" ]] ; then
+		local objfile=$vmlinux
+	elif [[ "${modcache[$module]+isset}" == "isset" ]]; then
+		local objfile=${modcache[$module]}
+	else
+		[[ $modpath == "" ]] && return
+		local objfile=$(find "$modpath" -name $module.ko -print -quit)
+		[[ $objfile == "" ]] && return
+		modcache[$module]=$objfile
+	fi
 
 	# Remove the englobing parenthesis
 	symbol=${symbol#\(}
···
 	# Use 'nm vmlinux' to figure out the base address of said symbol.
 	# It's actually faster to call it every time than to load it
 	# all into bash.
-	if [[ "${cache[$name]+isset}" == "isset" ]]; then
-		local base_addr=${cache[$name]}
+	if [[ "${cache[$module,$name]+isset}" == "isset" ]]; then
+		local base_addr=${cache[$module,$name]}
 	else
-		local base_addr=$(nm "$vmlinux" | grep -i ' t ' | awk "/ $name\$/ {print \$1}" | head -n1)
-		cache["$name"]="$base_addr"
+		local base_addr=$(nm "$objfile" | grep -i ' t ' | awk "/ $name\$/ {print \$1}" | head -n1)
+		cache[$module,$name]="$base_addr"
 	fi
 	# Let's start doing the math to get the exact address into the
 	# symbol. First, strip out the symbol total length.
···
 	local address=$(printf "%x\n" "$expr")
 
 	# Pass it to addr2line to get filename and line number
-	# Could get more than one result
-	if [[ "${cache[$address]+isset}" == "isset" ]]; then
-		local code=${cache[$address]}
+	# Could get more than one result
+	if [[ "${cache[$module,$address]+isset}" == "isset" ]]; then
+		local code=${cache[$module,$address]}
 	else
-		local code=$(addr2line -i -e "$vmlinux" "$address")
-		cache[$address]=$code
+		local code=$(addr2line -i -e "$objfile" "$address")
+		cache[$module,$address]=$code
 	fi
 
 	# addr2line doesn't return a proper error code if it fails, so
···
 		fi
 	done
 
-	# The symbol is the last element, process it
-	symbol=${words[$last]}
+	if [[ ${words[$last]} =~ \[([^]]+)\] ]]; then
+		module=${words[$last]}
+		module=${module#\[}
+		module=${module%\]}
+		symbol=${words[$last-1]}
+		unset words[$last-1]
+	else
+		# The symbol is the last element, process it
+		symbol=${words[$last]}
+		module=
+	fi
+
 	unset words[$last]
 	parse_symbol # modifies $symbol
 
 	# Add up the line number to the symbol
-	echo "${words[@]}" "$symbol"
+	echo "${words[@]}" "$symbol $module"
 }
 
 while read line; do
···
 		handle_line "$line"
 	# Is it a code line?
 	elif [[ $line == *Code:* ]]; then
-		decode_code "$line"
-	else
+		decode_code "$line"
+	else
 		# Nothing special in this line, show it as is
 		echo "$line"
 	fi
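The handle_line() hunk above peels the surrounding brackets off a "[module]" token with the shell expansions `${module#\[}` and `${module%\]}`. The equivalent trimming in C, for comparison (strip_brackets is an illustrative helper, not part of the script):

```c
#include <string.h>

/* Strip one pair of enclosing square brackets in place, mirroring the
 * ${module#\[} / ${module%\]} expansions in decode_stacktrace.sh. Strings
 * without a bracket pair are left untouched. */
static void strip_brackets(char *s)
{
	size_t len = strlen(s);

	if (len >= 2 && s[0] == '[' && s[len - 1] == ']') {
		memmove(s, s + 1, len - 2);	/* shift body left */
		s[len - 2] = '\0';		/* drop trailing ']' */
	}
}
```

So "[ext4]" becomes "ext4", while a plain symbol name passes through unchanged.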
+1
scripts/spelling.txt
···
 fetaure||feature
 fetaures||features
 fileystem||filesystem
+fimware||firmware
 finanize||finalize
 findn||find
 finilizes||finalizes