Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'akpm' (patches from Andrew)

Pull updates from Andrew Morton:
"Most of -mm and quite a number of other subsystems: hotfixes, scripts,
ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.

MM is fairly quiet this time. Holidays, I assume"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
kcov: ignore fault-inject and stacktrace
include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
execve: warn if process starts with executable stack
reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
init/main.c: fix misleading "This architecture does not have kernel memory protection" message
init/main.c: fix quoted value handling in unknown_bootoption
init/main.c: remove unnecessary repair_env_string in do_initcall_level
init/main.c: log arguments and environment passed to init
fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
fs/binfmt_elf.c: coredump: delete duplicated overflow check
fs/binfmt_elf.c: coredump: allocate core ELF header on stack
fs/binfmt_elf.c: make BAD_ADDR() unlikely
fs/binfmt_elf.c: better codegen around current->mm
fs/binfmt_elf.c: don't copy ELF header around
fs/binfmt_elf.c: fix ->start_code calculation
fs/binfmt_elf.c: smaller code generation around auxv vector fill
lib/find_bit.c: uninline helper _find_next_bit()
lib/find_bit.c: join _find_next_bit{_le}
uapi: rename ext2_swab() to swab() and share globally in swab.h
lib/scatterlist.c: adjust indentation in __sg_alloc_table
...

+2722 -1291
+12
Documentation/admin-guide/kernel-parameters.txt
··· 834 834 dump out devices still on the deferred probe list after 835 835 retrying. 836 836 837 + dfltcc= [HW,S390] 838 + Format: { on | off | def_only | inf_only | always } 839 + on: s390 zlib hardware support for compression on 840 + level 1 and decompression (default) 841 + off: No s390 zlib hardware support 842 + def_only: s390 zlib hardware support for deflate 843 + only (compression on level 1) 844 + inf_only: s390 zlib hardware support for inflate 845 + only (decompression) 846 + always: Same as 'on' but ignores the selected compression 847 + level always using hardware support (used for debugging) 848 + 837 849 dhash_entries= [KNL] 838 850 Set number of hash buckets for dentry cache. 839 851
+1
Documentation/core-api/index.rst
··· 31 31 generic-radix-tree 32 32 memory-allocation 33 33 mm-api 34 + pin_user_pages 34 35 gfp_mask-from-fs-io 35 36 timekeeping 36 37 boot-time-mm
+232
Documentation/core-api/pin_user_pages.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ==================================================== 4 + pin_user_pages() and related calls 5 + ==================================================== 6 + 7 + .. contents:: :local: 8 + 9 + Overview 10 + ======== 11 + 12 + This document describes the following functions:: 13 + 14 + pin_user_pages() 15 + pin_user_pages_fast() 16 + pin_user_pages_remote() 17 + 18 + Basic description of FOLL_PIN 19 + ============================= 20 + 21 + FOLL_PIN and FOLL_LONGTERM are flags that can be passed to the get_user_pages*() 22 + ("gup") family of functions. FOLL_PIN has significant interactions and 23 + interdependencies with FOLL_LONGTERM, so both are covered here. 24 + 25 + FOLL_PIN is internal to gup, meaning that it should not appear at the gup call 26 + sites. This allows the associated wrapper functions (pin_user_pages*() and 27 + others) to set the correct combination of these flags, and to check for problems 28 + as well. 29 + 30 + FOLL_LONGTERM, on the other hand, *is* allowed to be set at the gup call sites. 31 + This is in order to avoid creating a large number of wrapper functions to cover 32 + all combinations of get*(), pin*(), FOLL_LONGTERM, and more. Also, the 33 + pin_user_pages*() APIs are clearly distinct from the get_user_pages*() APIs, so 34 + that's a natural dividing line, and a good point to make separate wrapper calls. 35 + In other words, use pin_user_pages*() for DMA-pinned pages, and 36 + get_user_pages*() for other cases. There are four cases described later on in 37 + this document, to further clarify that concept. 38 + 39 + FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However, 40 + multiple threads and call sites are free to pin the same struct pages, via both 41 + FOLL_PIN and FOLL_GET. It's just the call site that needs to choose one or the 42 + other, not the struct page(s). 
43 + 44 + The FOLL_PIN implementation is nearly the same as FOLL_GET, except that FOLL_PIN 45 + uses a different reference counting technique. 46 + 47 + FOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying that is, 48 + FOLL_LONGTERM is a specific, more restrictive case of FOLL_PIN. 49 + 50 + Which flags are set by each wrapper 51 + =================================== 52 + 53 + For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup 54 + flags the caller provides. The caller is required to pass in a non-null struct 55 + pages* array, and the function then pins pages by incrementing each page's 56 + refcount by a special value. For now, that value is +1, just like get_user_pages*().:: 57 + 58 + Function 59 + -------- 60 + pin_user_pages FOLL_PIN is always set internally by this function. 61 + pin_user_pages_fast FOLL_PIN is always set internally by this function. 62 + pin_user_pages_remote FOLL_PIN is always set internally by this function. 63 + 64 + For these get_user_pages*() functions, FOLL_GET might not even be specified. 65 + Behavior is a little more complex than above. If FOLL_GET was *not* specified, 66 + but the caller passed in a non-null struct pages* array, then the function 67 + sets FOLL_GET for you, and proceeds to pin pages by incrementing the refcount 68 + of each page by +1.:: 69 + 70 + Function 71 + -------- 72 + get_user_pages FOLL_GET is sometimes set internally by this function. 73 + get_user_pages_fast FOLL_GET is sometimes set internally by this function. 74 + get_user_pages_remote FOLL_GET is sometimes set internally by this function. 75 + 76 + Tracking dma-pinned pages 77 + ========================= 78 + 79 + Some of the key design constraints, and solutions, for tracking dma-pinned 80 + pages: 81 + 82 + * An actual reference count, per struct page, is required. This is because 83 + multiple processes may pin and unpin a page. 
84 + 85 + * False positives (reporting that a page is dma-pinned, when in fact it is not) 86 + are acceptable, but false negatives are not. 87 + 88 + * struct page may not be increased in size for this, and all fields are already 89 + used. 90 + 91 + * Given the above, we can overload the page->_refcount field by using, sort of, 92 + the upper bits in that field for a dma-pinned count. "Sort of", means that, 93 + rather than dividing page->_refcount into bit fields, we simply add a medium- 94 + large value (GUP_PIN_COUNTING_BIAS, initially chosen to be 1024: 10 bits) to 95 + page->_refcount. This provides fuzzy behavior: if a page has get_page() called 96 + on it 1024 times, then it will appear to have a single dma-pinned count. 97 + And again, that's acceptable. 98 + 99 + This also leads to limitations: there are only 31-10==21 bits available for a 100 + counter that increments 10 bits at a time. 101 + 102 + TODO: for 1GB and larger huge pages, this is cutting it close. That's because 103 + when pin_user_pages() follows such pages, it increments the head page by "1" 104 + (where "1" used to mean "+1" for get_user_pages(), but now means "+1024" for 105 + pin_user_pages()) for each tail page. So if you have a 1GB huge page: 106 + 107 + * There are 256K (18 bits) worth of 4 KB tail pages. 108 + * There are 21 bits available to count up via GUP_PIN_COUNTING_BIAS (that is, 109 + 10 bits at a time) 110 + * There are 21 - 18 == 3 bits available to count. Except that there aren't, 111 + because you need to allow for a few normal get_page() calls on the head page, 112 + as well. Fortunately, the approach of using addition, rather than "hard" 113 + bitfields, within page->_refcount, allows for sharing these bits gracefully. 114 + But we're still looking at about 8 references. 
115 + 116 + This, however, is a missing feature more than anything else, because it's easily 117 + solved by addressing an obvious inefficiency in the original get_user_pages() 118 + approach of retrieving pages: stop treating all the pages as if they were 119 + PAGE_SIZE. Retrieve huge pages as huge pages. The callers need to be aware of 120 + this, so some work is required. Once that's in place, this limitation mostly 121 + disappears from view, because there will be ample refcounting range available. 122 + 123 + * Callers must specifically request "dma-pinned tracking of pages". In other 124 + words, just calling get_user_pages() will not suffice; a new set of functions, 125 + pin_user_pages() and related, must be used. 126 + 127 + FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flags 128 + ========================================================== 129 + 130 + Thanks to Jan Kara, Vlastimil Babka and several other -mm people, for describing 131 + these categories: 132 + 133 + CASE 1: Direct IO (DIO) 134 + ----------------------- 135 + There are GUP references to pages that are serving 136 + as DIO buffers. These buffers are needed for a relatively short time (so they 137 + are not "long term"). No special synchronization with page_mkclean() or 138 + munmap() is provided. Therefore, flags to set at the call site are: :: 139 + 140 + FOLL_PIN 141 + 142 + ...but rather than setting FOLL_PIN directly, call sites should use one of 143 + the pin_user_pages*() routines that set FOLL_PIN. 144 + 145 + CASE 2: RDMA 146 + ------------ 147 + There are GUP references to pages that are serving as DMA 148 + buffers. These buffers are needed for a long time ("long term"). No special 149 + synchronization with page_mkclean() or munmap() is provided. Therefore, flags 150 + to set at the call site are: :: 151 + 152 + FOLL_PIN | FOLL_LONGTERM 153 + 154 + NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. 
That's 155 + because DAX pages do not have a separate page cache, and so "pinning" implies 156 + locking down file system blocks, which is not (yet) supported in that way. 157 + 158 + CASE 3: Hardware with page faulting support 159 + ------------------------------------------- 160 + Here, a well-written driver doesn't normally need to pin pages at all. However, 161 + if the driver does choose to do so, it can register MMU notifiers for the range, 162 + and will be called back upon invalidation. Either way (avoiding page pinning, or 163 + using MMU notifiers to unpin upon request), there is proper synchronization with 164 + both filesystem and mm (page_mkclean(), munmap(), etc). 165 + 166 + Therefore, neither flag needs to be set. 167 + 168 + In this case, ideally, neither get_user_pages() nor pin_user_pages() should be 169 + called. Instead, the software should be written so that it does not pin pages. 170 + This allows mm and filesystems to operate more efficiently and reliably. 171 + 172 + CASE 4: Pinning for struct page manipulation only 173 + ------------------------------------------------- 174 + Here, normal GUP calls are sufficient, so neither flag needs to be set. 175 + 176 + page_dma_pinned(): the whole point of pinning 177 + ============================================= 178 + 179 + The whole point of marking pages as "DMA-pinned" or "gup-pinned" is to be able 180 + to query, "is this page DMA-pinned?" That allows code such as page_mkclean() 181 + (and file system writeback code in general) to make informed decisions about 182 + what to do when a page cannot be unmapped due to such pins. 183 + 184 + What to do in those cases is the subject of a years-long series of discussions 185 + and debates (see the References at the end of this document). It's a TODO item 186 + here: fill in the details once that's worked out. 
Meanwhile, it's safe to say 187 + that having this available: :: 188 + 189 + static inline bool page_dma_pinned(struct page *page) 190 + 191 + ...is a prerequisite to solving the long-running gup+DMA problem. 192 + 193 + Another way of thinking about FOLL_GET, FOLL_PIN, and FOLL_LONGTERM 194 + =================================================================== 195 + 196 + Another way of thinking about these flags is as a progression of restrictions: 197 + FOLL_GET is for struct page manipulation, without affecting the data that the 198 + struct page refers to. FOLL_PIN is a *replacement* for FOLL_GET, and is for 199 + short term pins on pages whose data *will* get accessed. As such, FOLL_PIN is 200 + a "more severe" form of pinning. And finally, FOLL_LONGTERM is an even more 201 + restrictive case that has FOLL_PIN as a prerequisite: this is for pages that 202 + will be pinned longterm, and whose data will be accessed. 203 + 204 + Unit testing 205 + ============ 206 + This file:: 207 + 208 + tools/testing/selftests/vm/gup_benchmark.c 209 + 210 + has the following new calls to exercise the new pin*() wrapper functions: 211 + 212 + * PIN_FAST_BENCHMARK (./gup_benchmark -a) 213 + * PIN_BENCHMARK (./gup_benchmark -b) 214 + 215 + You can monitor how many total dma-pinned pages have been acquired and released 216 + since the system was booted, via two new /proc/vmstat entries: :: 217 + 218 + /proc/vmstat/nr_foll_pin_requested 219 + /proc/vmstat/nr_foll_pin_returned 220 + 221 + Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is 222 + because there is a noticeable performance drop in unpin_user_page(), when they 223 + are activated. 
224 + 225 + References 226 + ========== 227 + 228 + * `Some slow progress on get_user_pages() (Apr 2, 2019) <https://lwn.net/Articles/784574/>`_ 229 + * `DMA and get_user_pages() (LPC: Dec 12, 2018) <https://lwn.net/Articles/774411/>`_ 230 + * `The trouble with get_user_pages() (Apr 30, 2018) <https://lwn.net/Articles/753027/>`_ 231 + 232 + John Hubbard, October, 2019
+13
Documentation/vm/zswap.rst
··· 130 130 existing pages which are marked as same-value filled pages remain stored 131 131 unchanged in zswap until they are either loaded or invalidated. 132 132 133 + To prevent zswap from shrinking the pool when zswap is full and there is high 134 + pressure on swap (this would result in flipping pages in and out of the zswap 135 + pool without any real benefit, but with a performance drop for the system), a 136 + special parameter has been introduced to implement a sort of hysteresis: once 137 + the limit has been hit, zswap refuses to take pages into the pool until it has 138 + sufficient space again. To set the threshold at which zswap starts accepting 139 + pages again after it became full, use the sysfs ``accept_threshold_percent`` 140 + attribute, e.g.:: 141 + 142 + echo 80 > /sys/module/zswap/parameters/accept_threshold_percent 143 + 144 + Setting this parameter to 100 will disable the hysteresis. 145 + 133 146 A debugfs interface is provided for various statistic about pool size, number 134 147 of pages stored, same-value filled pages and various counters for the reasons 135 148 pages are rejected.
+5 -5
arch/powerpc/mm/book3s64/iommu_api.c
··· 103 103 for (entry = 0; entry < entries; entry += chunk) { 104 104 unsigned long n = min(entries - entry, chunk); 105 105 106 - ret = get_user_pages(ua + (entry << PAGE_SHIFT), n, 106 + ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n, 107 107 FOLL_WRITE | FOLL_LONGTERM, 108 108 mem->hpages + entry, NULL); 109 109 if (ret == n) { ··· 167 167 return 0; 168 168 169 169 free_exit: 170 - /* free the reference taken */ 171 - for (i = 0; i < pinned; i++) 172 - put_page(mem->hpages[i]); 170 + /* free the references taken */ 171 + unpin_user_pages(mem->hpages, pinned); 173 172 174 173 vfree(mem->hpas); 175 174 kfree(mem); ··· 214 215 if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY) 215 216 SetPageDirty(page); 216 217 217 - put_page(page); 218 + unpin_user_page(page); 219 + 218 220 mem->hpas[i] = 0; 219 221 } 220 222 }
+4 -4
arch/s390/boot/compressed/decompressor.c
··· 30 30 extern unsigned char _compressed_end[]; 31 31 32 32 #ifdef CONFIG_HAVE_KERNEL_BZIP2 33 - #define HEAP_SIZE 0x400000 33 + #define BOOT_HEAP_SIZE 0x400000 34 34 #else 35 - #define HEAP_SIZE 0x10000 35 + #define BOOT_HEAP_SIZE 0x10000 36 36 #endif 37 37 38 38 static unsigned long free_mem_ptr = (unsigned long) _end; 39 - static unsigned long free_mem_end_ptr = (unsigned long) _end + HEAP_SIZE; 39 + static unsigned long free_mem_end_ptr = (unsigned long) _end + BOOT_HEAP_SIZE; 40 40 41 41 #ifdef CONFIG_KERNEL_GZIP 42 42 #include "../../../../lib/decompress_inflate.c" ··· 62 62 #include "../../../../lib/decompress_unxz.c" 63 63 #endif 64 64 65 - #define decompress_offset ALIGN((unsigned long)_end + HEAP_SIZE, PAGE_SIZE) 65 + #define decompress_offset ALIGN((unsigned long)_end + BOOT_HEAP_SIZE, PAGE_SIZE) 66 66 67 67 unsigned long mem_safe_offset(void) 68 68 {
+14
arch/s390/boot/ipl_parm.c
··· 14 14 char __bootdata(early_command_line)[COMMAND_LINE_SIZE]; 15 15 struct ipl_parameter_block __bootdata_preserved(ipl_block); 16 16 int __bootdata_preserved(ipl_block_valid); 17 + unsigned int __bootdata_preserved(zlib_dfltcc_support) = ZLIB_DFLTCC_FULL; 17 18 18 19 unsigned long __bootdata(vmalloc_size) = VMALLOC_DEFAULT_SIZE; 19 20 unsigned long __bootdata(memory_end); ··· 229 228 230 229 if (!strcmp(param, "vmalloc") && val) 231 230 vmalloc_size = round_up(memparse(val, NULL), PAGE_SIZE); 231 + 232 + if (!strcmp(param, "dfltcc")) { 233 + if (!strcmp(val, "off")) 234 + zlib_dfltcc_support = ZLIB_DFLTCC_DISABLED; 235 + else if (!strcmp(val, "on")) 236 + zlib_dfltcc_support = ZLIB_DFLTCC_FULL; 237 + else if (!strcmp(val, "def_only")) 238 + zlib_dfltcc_support = ZLIB_DFLTCC_DEFLATE_ONLY; 239 + else if (!strcmp(val, "inf_only")) 240 + zlib_dfltcc_support = ZLIB_DFLTCC_INFLATE_ONLY; 241 + else if (!strcmp(val, "always")) 242 + zlib_dfltcc_support = ZLIB_DFLTCC_FULL_DEBUG; 243 + } 232 244 233 245 if (!strcmp(param, "noexec")) { 234 246 rc = kstrtobool(val, &enabled);
+7
arch/s390/include/asm/setup.h
··· 79 79 char command_line[ARCH_COMMAND_LINE_SIZE]; /* 0x10480 */ 80 80 }; 81 81 82 + extern unsigned int zlib_dfltcc_support; 83 + #define ZLIB_DFLTCC_DISABLED 0 84 + #define ZLIB_DFLTCC_FULL 1 85 + #define ZLIB_DFLTCC_DEFLATE_ONLY 2 86 + #define ZLIB_DFLTCC_INFLATE_ONLY 3 87 + #define ZLIB_DFLTCC_FULL_DEBUG 4 88 + 82 89 extern int noexec_disabled; 83 90 extern int memory_end_set; 84 91 extern unsigned long memory_end;
+5 -9
arch/s390/kernel/setup.c
··· 111 111 unsigned long __bootdata_preserved(__sdma); 112 112 unsigned long __bootdata_preserved(__edma); 113 113 unsigned long __bootdata_preserved(__kaslr_offset); 114 + unsigned int __bootdata_preserved(zlib_dfltcc_support); 115 + EXPORT_SYMBOL(zlib_dfltcc_support); 114 116 115 117 unsigned long VMALLOC_START; 116 118 EXPORT_SYMBOL(VMALLOC_START); ··· 761 759 memblock_free(start, size); 762 760 } 763 761 764 - static void __init memblock_physmem_add(phys_addr_t start, phys_addr_t size) 765 - { 766 - memblock_dbg("memblock_physmem_add: [%#016llx-%#016llx]\n", 767 - start, start + size - 1); 768 - memblock_add_range(&memblock.memory, start, size, 0, 0); 769 - memblock_add_range(&memblock.physmem, start, size, 0, 0); 770 - } 771 - 772 762 static const char * __init get_mem_info_source(void) 773 763 { 774 764 switch (mem_detect.info_source) { ··· 785 791 get_mem_info_source(), mem_detect.info_source); 786 792 /* keep memblock lists close to the kernel */ 787 793 memblock_set_bottom_up(true); 788 - for_each_mem_detect_block(i, &start, &end) 794 + for_each_mem_detect_block(i, &start, &end) { 795 + memblock_add(start, end - start); 789 796 memblock_physmem_add(start, end - start); 797 + } 790 798 memblock_set_bottom_up(false); 791 799 memblock_dump_all(); 792 800 }
+18 -16
drivers/acpi/thermal.c
··· 27 27 #include <linux/acpi.h> 28 28 #include <linux/workqueue.h> 29 29 #include <linux/uaccess.h> 30 + #include <linux/units.h> 30 31 31 32 #define PREFIX "ACPI: " 32 33 ··· 173 172 struct acpi_handle_list devices; 174 173 struct thermal_zone_device *thermal_zone; 175 174 int tz_enabled; 176 - int kelvin_offset; 175 + int kelvin_offset; /* in millidegrees */ 177 176 struct work_struct thermal_check_work; 178 177 }; 179 178 ··· 298 297 if (crt == -1) { 299 298 tz->trips.critical.flags.valid = 0; 300 299 } else if (crt > 0) { 301 - unsigned long crt_k = CELSIUS_TO_DECI_KELVIN(crt); 300 + unsigned long crt_k = celsius_to_deci_kelvin(crt); 301 + 302 302 /* 303 303 * Allow override critical threshold 304 304 */ ··· 335 333 if (psv == -1) { 336 334 status = AE_SUPPORT; 337 335 } else if (psv > 0) { 338 - tmp = CELSIUS_TO_DECI_KELVIN(psv); 336 + tmp = celsius_to_deci_kelvin(psv); 339 337 status = AE_OK; 340 338 } else { 341 339 status = acpi_evaluate_integer(tz->device->handle, ··· 415 413 break; 416 414 if (i == 1) 417 415 tz->trips.active[0].temperature = 418 - CELSIUS_TO_DECI_KELVIN(act); 416 + celsius_to_deci_kelvin(act); 419 417 else 420 418 /* 421 419 * Don't allow override higher than ··· 423 421 */ 424 422 tz->trips.active[i - 1].temperature = 425 423 (tz->trips.active[i - 2].temperature < 426 - CELSIUS_TO_DECI_KELVIN(act) ? 424 + celsius_to_deci_kelvin(act) ? 
427 425 tz->trips.active[i - 2].temperature : 428 - CELSIUS_TO_DECI_KELVIN(act)); 426 + celsius_to_deci_kelvin(act)); 429 427 break; 430 428 } else { 431 429 tz->trips.active[i].temperature = tmp; ··· 521 519 if (result) 522 520 return result; 523 521 524 - *temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(tz->temperature, 522 + *temp = deci_kelvin_to_millicelsius_with_offset(tz->temperature, 525 523 tz->kelvin_offset); 526 524 return 0; 527 525 } ··· 626 624 627 625 if (tz->trips.critical.flags.valid) { 628 626 if (!trip) { 629 - *temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET( 627 + *temp = deci_kelvin_to_millicelsius_with_offset( 630 628 tz->trips.critical.temperature, 631 629 tz->kelvin_offset); 632 630 return 0; ··· 636 634 637 635 if (tz->trips.hot.flags.valid) { 638 636 if (!trip) { 639 - *temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET( 637 + *temp = deci_kelvin_to_millicelsius_with_offset( 640 638 tz->trips.hot.temperature, 641 639 tz->kelvin_offset); 642 640 return 0; ··· 646 644 647 645 if (tz->trips.passive.flags.valid) { 648 646 if (!trip) { 649 - *temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET( 647 + *temp = deci_kelvin_to_millicelsius_with_offset( 650 648 tz->trips.passive.temperature, 651 649 tz->kelvin_offset); 652 650 return 0; ··· 657 655 for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && 658 656 tz->trips.active[i].flags.valid; i++) { 659 657 if (!trip) { 660 - *temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET( 658 + *temp = deci_kelvin_to_millicelsius_with_offset( 661 659 tz->trips.active[i].temperature, 662 660 tz->kelvin_offset); 663 661 return 0; ··· 674 672 struct acpi_thermal *tz = thermal->devdata; 675 673 676 674 if (tz->trips.critical.flags.valid) { 677 - *temperature = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET( 675 + *temperature = deci_kelvin_to_millicelsius_with_offset( 678 676 tz->trips.critical.temperature, 679 677 tz->kelvin_offset); 680 678 return 0; ··· 694 692 695 693 if (type == THERMAL_TRIP_ACTIVE) { 696 694 int trip_temp; 697 - int temp = 
DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET( 695 + int temp = deci_kelvin_to_millicelsius_with_offset( 698 696 tz->temperature, tz->kelvin_offset); 699 697 if (thermal_get_trip_temp(thermal, trip, &trip_temp)) 700 698 return -EINVAL; ··· 1045 1043 { 1046 1044 if (tz->trips.critical.flags.valid && 1047 1045 (tz->trips.critical.temperature % 5) == 1) 1048 - tz->kelvin_offset = 2731; 1046 + tz->kelvin_offset = 273100; 1049 1047 else 1050 - tz->kelvin_offset = 2732; 1048 + tz->kelvin_offset = 273200; 1051 1049 } 1052 1050 1053 1051 static void acpi_thermal_check_fn(struct work_struct *work) ··· 1089 1087 INIT_WORK(&tz->thermal_check_work, acpi_thermal_check_fn); 1090 1088 1091 1089 pr_info(PREFIX "%s [%s] (%ld C)\n", acpi_device_name(device), 1092 - acpi_device_bid(device), DECI_KELVIN_TO_CELSIUS(tz->temperature)); 1090 + acpi_device_bid(device), deci_kelvin_to_celsius(tz->temperature)); 1093 1091 goto end; 1094 1092 1095 1093 free_memory:
+3 -22
drivers/base/memory.c
··· 70 70 } 71 71 EXPORT_SYMBOL(unregister_memory_notifier); 72 72 73 - static ATOMIC_NOTIFIER_HEAD(memory_isolate_chain); 74 - 75 - int register_memory_isolate_notifier(struct notifier_block *nb) 76 - { 77 - return atomic_notifier_chain_register(&memory_isolate_chain, nb); 78 - } 79 - EXPORT_SYMBOL(register_memory_isolate_notifier); 80 - 81 - void unregister_memory_isolate_notifier(struct notifier_block *nb) 82 - { 83 - atomic_notifier_chain_unregister(&memory_isolate_chain, nb); 84 - } 85 - EXPORT_SYMBOL(unregister_memory_isolate_notifier); 86 - 87 73 static void memory_block_release(struct device *dev) 88 74 { 89 75 struct memory_block *mem = to_memory_block(dev); ··· 161 175 return blocking_notifier_call_chain(&memory_chain, val, v); 162 176 } 163 177 164 - int memory_isolate_notify(unsigned long val, void *v) 165 - { 166 - return atomic_notifier_call_chain(&memory_isolate_chain, val, v); 167 - } 168 - 169 178 /* 170 179 * The probe routines leave the pages uninitialized, just as the bootmem code 171 180 * does. Make sure we do not access them, but instead use only information from ··· 206 225 */ 207 226 static int 208 227 memory_block_action(unsigned long start_section_nr, unsigned long action, 209 - int online_type) 228 + int online_type, int nid) 210 229 { 211 230 unsigned long start_pfn; 212 231 unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; ··· 219 238 if (!pages_correctly_probed(start_pfn)) 220 239 return -EBUSY; 221 240 222 - ret = online_pages(start_pfn, nr_pages, online_type); 241 + ret = online_pages(start_pfn, nr_pages, online_type, nid); 223 242 break; 224 243 case MEM_OFFLINE: 225 244 ret = offline_pages(start_pfn, nr_pages); ··· 245 264 mem->state = MEM_GOING_OFFLINE; 246 265 247 266 ret = memory_block_action(mem->start_section_nr, to_state, 248 - mem->online_type); 267 + mem->online_type, mem->nid); 249 268 250 269 mem->state = ret ? from_state_req : to_state; 251 270
+6 -4
drivers/block/zram/zram_drv.c
··· 207 207 208 208 static bool page_same_filled(void *ptr, unsigned long *element) 209 209 { 210 - unsigned int pos; 211 210 unsigned long *page; 212 211 unsigned long val; 212 + unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1; 213 213 214 214 page = (unsigned long *)ptr; 215 215 val = page[0]; 216 216 217 - for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++) { 217 + if (val != page[last_pos]) 218 + return false; 219 + 220 + for (pos = 1; pos < last_pos; pos++) { 218 221 if (val != page[pos]) 219 222 return false; 220 223 } ··· 629 626 struct bio bio; 630 627 struct bio_vec bio_vec; 631 628 struct page *page; 632 - ssize_t ret; 629 + ssize_t ret = len; 633 630 int mode; 634 631 unsigned long blk_idx = 0; 635 632 ··· 765 762 766 763 if (blk_idx) 767 764 free_block_bdev(zram, blk_idx); 768 - ret = len; 769 765 __free_page(page); 770 766 release_init_lock: 771 767 up_read(&zram->init_lock);
+3 -3
drivers/gpu/drm/via/via_dmablit.c
··· 188 188 kfree(vsg->desc_pages); 189 189 /* fall through */ 190 190 case dr_via_pages_locked: 191 - put_user_pages_dirty_lock(vsg->pages, vsg->num_pages, 192 - (vsg->direction == DMA_FROM_DEVICE)); 191 + unpin_user_pages_dirty_lock(vsg->pages, vsg->num_pages, 192 + (vsg->direction == DMA_FROM_DEVICE)); 193 193 /* fall through */ 194 194 case dr_via_pages_alloc: 195 195 vfree(vsg->pages); ··· 239 239 vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages)); 240 240 if (NULL == vsg->pages) 241 241 return -ENOMEM; 242 - ret = get_user_pages_fast((unsigned long)xfer->mem_addr, 242 + ret = pin_user_pages_fast((unsigned long)xfer->mem_addr, 243 243 vsg->num_pages, 244 244 vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0, 245 245 vsg->pages);
+3 -3
drivers/iio/adc/qcom-vadc-common.c
··· 6 6 #include <linux/log2.h> 7 7 #include <linux/err.h> 8 8 #include <linux/module.h> 9 + #include <linux/units.h> 9 10 10 11 #include "qcom-vadc-common.h" 11 12 ··· 237 236 voltage = 0; 238 237 } 239 238 240 - voltage -= KELVINMIL_CELSIUSMIL; 241 - *result_mdec = voltage; 239 + *result_mdec = milli_kelvin_to_millicelsius(voltage); 242 240 243 241 return 0; 244 242 } ··· 325 325 { 326 326 *result_mdec = qcom_vadc_scale_code_voltage_factor(adc_code, 327 327 prescale, data, 2); 328 - *result_mdec -= KELVINMIL_CELSIUSMIL; 328 + *result_mdec = milli_kelvin_to_millicelsius(*result_mdec); 329 329 330 330 return 0; 331 331 }
-1
drivers/iio/adc/qcom-vadc-common.h
··· 38 38 #define VADC_AVG_SAMPLES_MAX 512 39 39 #define ADC5_AVG_SAMPLES_MAX 16 40 40 41 - #define KELVINMIL_CELSIUSMIL 273150 42 41 #define PMIC5_CHG_TEMP_SCALE_FACTOR 377500 43 42 #define PMIC5_SMB_TEMP_CONSTANT 419400 44 43 #define PMIC5_SMB_TEMP_SCALE_FACTOR 356
+7 -12
drivers/infiniband/core/umem.c
··· 54 54 55 55 for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) { 56 56 page = sg_page_iter_page(&sg_iter); 57 - put_user_pages_dirty_lock(&page, 1, umem->writable && dirty); 57 + unpin_user_pages_dirty_lock(&page, 1, umem->writable && dirty); 58 58 } 59 59 60 60 sg_free_table(&umem->sg_head); ··· 257 257 sg = umem->sg_head.sgl; 258 258 259 259 while (npages) { 260 - down_read(&mm->mmap_sem); 261 - ret = get_user_pages(cur_base, 262 - min_t(unsigned long, npages, 263 - PAGE_SIZE / sizeof (struct page *)), 264 - gup_flags | FOLL_LONGTERM, 265 - page_list, NULL); 266 - if (ret < 0) { 267 - up_read(&mm->mmap_sem); 260 + ret = pin_user_pages_fast(cur_base, 261 + min_t(unsigned long, npages, 262 + PAGE_SIZE / 263 + sizeof(struct page *)), 264 + gup_flags | FOLL_LONGTERM, page_list); 265 + if (ret < 0) 268 266 goto umem_release; 269 - } 270 267 271 268 cur_base += ret * PAGE_SIZE; 272 269 npages -= ret; ··· 271 274 sg = ib_umem_add_sg_table(sg, page_list, ret, 272 275 dma_get_max_seg_size(device->dma_device), 273 276 &umem->sg_nents); 274 - 275 - up_read(&mm->mmap_sem); 276 277 } 277 278 278 279 sg_mark_end(sg);
+6 -7
drivers/infiniband/core/umem_odp.c
··· 293 293 * The function returns -EFAULT if the DMA mapping operation fails. It returns 294 294 * -EAGAIN if a concurrent invalidation prevents us from updating the page. 295 295 * 296 - * The page is released via put_user_page even if the operation failed. For 297 - * on-demand pinning, the page is released whenever it isn't stored in the 298 - * umem. 296 + * The page is released via put_page even if the operation failed. For on-demand 297 + * pinning, the page is released whenever it isn't stored in the umem. 299 298 */ 300 299 static int ib_umem_odp_map_dma_single_page( 301 300 struct ib_umem_odp *umem_odp, ··· 347 348 } 348 349 349 350 out: 350 - put_user_page(page); 351 + put_page(page); 351 352 return ret; 352 353 } 353 354 ··· 457 458 ret = -EFAULT; 458 459 break; 459 460 } 460 - put_user_page(local_page_list[j]); 461 + put_page(local_page_list[j]); 461 462 continue; 462 463 } 463 464 ··· 484 485 * ib_umem_odp_map_dma_single_page(). 485 486 */ 486 487 if (npages - (j + 1) > 0) 487 - put_user_pages(&local_page_list[j+1], 488 - npages - (j + 1)); 488 + release_pages(&local_page_list[j+1], 489 + npages - (j + 1)); 489 490 break; 490 491 } 491 492 }
+2 -2
drivers/infiniband/hw/hfi1/user_pages.c
··· 106 106 int ret; 107 107 unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0); 108 108 109 - ret = get_user_pages_fast(vaddr, npages, gup_flags, pages); 109 + ret = pin_user_pages_fast(vaddr, npages, gup_flags, pages); 110 110 if (ret < 0) 111 111 return ret; 112 112 ··· 118 118 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p, 119 119 size_t npages, bool dirty) 120 120 { 121 - put_user_pages_dirty_lock(p, npages, dirty); 121 + unpin_user_pages_dirty_lock(p, npages, dirty); 122 122 123 123 if (mm) { /* during close after signal, mm can be NULL */ 124 124 atomic64_sub(npages, &mm->pinned_vm);
+4 -4
drivers/infiniband/hw/mthca/mthca_memfree.c
··· 472 472 goto out; 473 473 } 474 474 475 - ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, 475 + ret = pin_user_pages_fast(uaddr & PAGE_MASK, 1, 476 476 FOLL_WRITE | FOLL_LONGTERM, pages); 477 477 if (ret < 0) 478 478 goto out; ··· 482 482 483 483 ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); 484 484 if (ret < 0) { 485 - put_user_page(pages[0]); 485 + unpin_user_page(pages[0]); 486 486 goto out; 487 487 } 488 488 ··· 490 490 mthca_uarc_virt(dev, uar, i)); 491 491 if (ret) { 492 492 pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); 493 - put_user_page(sg_page(&db_tab->page[i].mem)); 493 + unpin_user_page(sg_page(&db_tab->page[i].mem)); 494 494 goto out; 495 495 } 496 496 ··· 556 556 if (db_tab->page[i].uvirt) { 557 557 mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1); 558 558 pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE); 559 - put_user_page(sg_page(&db_tab->page[i].mem)); 559 + unpin_user_page(sg_page(&db_tab->page[i].mem)); 560 560 } 561 561 } 562 562
+2 -2
drivers/infiniband/hw/qib/qib_user_pages.c
··· 40 40 static void __qib_release_user_pages(struct page **p, size_t num_pages, 41 41 int dirty) 42 42 { 43 - put_user_pages_dirty_lock(p, num_pages, dirty); 43 + unpin_user_pages_dirty_lock(p, num_pages, dirty); 44 44 } 45 45 46 46 /** ··· 108 108 109 109 down_read(&current->mm->mmap_sem); 110 110 for (got = 0; got < num_pages; got += ret) { 111 - ret = get_user_pages(start_page + got * PAGE_SIZE, 111 + ret = pin_user_pages(start_page + got * PAGE_SIZE, 112 112 num_pages - got, 113 113 FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE, 114 114 p + got, NULL);
+4 -4
drivers/infiniband/hw/qib/qib_user_sdma.c
··· 317 317 * the caller can ignore this page. 318 318 */ 319 319 if (put) { 320 - put_user_page(page); 320 + unpin_user_page(page); 321 321 } else { 322 322 /* coalesce case */ 323 323 kunmap(page); ··· 631 631 kunmap(pkt->addr[i].page); 632 632 633 633 if (pkt->addr[i].put_page) 634 - put_user_page(pkt->addr[i].page); 634 + unpin_user_page(pkt->addr[i].page); 635 635 else 636 636 __free_page(pkt->addr[i].page); 637 637 } else if (pkt->addr[i].kvaddr) { ··· 670 670 else 671 671 j = npages; 672 672 673 - ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages); 673 + ret = pin_user_pages_fast(addr, j, FOLL_LONGTERM, pages); 674 674 if (ret != j) { 675 675 i = 0; 676 676 j = ret; ··· 706 706 /* if error, return all pages not managed by pkt */ 707 707 free_pages: 708 708 while (i < j) 709 - put_user_page(pages[i++]); 709 + unpin_user_page(pages[i++]); 710 710 711 711 done: 712 712 return ret;
+2 -2
drivers/infiniband/hw/usnic/usnic_uiom.c
··· 75 75 for_each_sg(chunk->page_list, sg, chunk->nents, i) { 76 76 page = sg_page(sg); 77 77 pa = sg_phys(sg); 78 - put_user_pages_dirty_lock(&page, 1, dirty); 78 + unpin_user_pages_dirty_lock(&page, 1, dirty); 79 79 usnic_dbg("pa: %pa\n", &pa); 80 80 } 81 81 kfree(chunk); ··· 141 141 ret = 0; 142 142 143 143 while (npages) { 144 - ret = get_user_pages(cur_base, 144 + ret = pin_user_pages(cur_base, 145 145 min_t(unsigned long, npages, 146 146 PAGE_SIZE / sizeof(struct page *)), 147 147 gup_flags | FOLL_LONGTERM,
+2 -2
drivers/infiniband/sw/siw/siw_mem.c
··· 63 63 static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages, 64 64 bool dirty) 65 65 { 66 - put_user_pages_dirty_lock(chunk->plist, num_pages, dirty); 66 + unpin_user_pages_dirty_lock(chunk->plist, num_pages, dirty); 67 67 } 68 68 69 69 void siw_umem_release(struct siw_umem *umem, bool dirty) ··· 426 426 while (nents) { 427 427 struct page **plist = &umem->page_chunk[i].plist[got]; 428 428 429 - rv = get_user_pages(first_page_va, nents, 429 + rv = pin_user_pages(first_page_va, nents, 430 430 foll_flags | FOLL_LONGTERM, 431 431 plist, NULL); 432 432 if (rv < 0)
+4 -4
drivers/media/v4l2-core/videobuf-dma-sg.c
··· 183 183 dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n", 184 184 data, size, dma->nr_pages); 185 185 186 - err = get_user_pages(data & PAGE_MASK, dma->nr_pages, 186 + err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, 187 187 flags | FOLL_LONGTERM, dma->pages, NULL); 188 188 189 189 if (err != dma->nr_pages) { 190 190 dma->nr_pages = (err >= 0) ? err : 0; 191 - dprintk(1, "get_user_pages: err=%d [%d]\n", err, 191 + dprintk(1, "pin_user_pages: err=%d [%d]\n", err, 192 192 dma->nr_pages); 193 193 return err < 0 ? err : -EINVAL; 194 194 } ··· 349 349 BUG_ON(dma->sglen); 350 350 351 351 if (dma->pages) { 352 - for (i = 0; i < dma->nr_pages; i++) 353 - put_page(dma->pages[i]); 352 + unpin_user_pages_dirty_lock(dma->pages, dma->nr_pages, 353 + dma->direction == DMA_FROM_DEVICE); 354 354 kfree(dma->pages); 355 355 dma->pages = NULL; 356 356 }
-1
drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h
··· 296 296 * possible, the driver should only write the valid vnics into the internal 297 297 * ram according to the appropriate port mode. 298 298 */ 299 - #define BITS_TO_BYTES(x) ((x)/8) 300 299 301 300 /* CMNG constants, as derived from system spec calculations */ 302 301
+2 -1
drivers/net/wireless/intel/iwlegacy/4965-mac.c
··· 27 27 #include <linux/firmware.h> 28 28 #include <linux/etherdevice.h> 29 29 #include <linux/if_arp.h> 30 + #include <linux/units.h> 30 31 31 32 #include <net/mac80211.h> 32 33 ··· 6469 6468 il->hw_params.valid_rx_ant = il->cfg->valid_rx_ant; 6470 6469 6471 6470 il->hw_params.ct_kill_threshold = 6472 - CELSIUS_TO_KELVIN(CT_KILL_THRESHOLD_LEGACY); 6471 + celsius_to_kelvin(CT_KILL_THRESHOLD_LEGACY); 6473 6472 6474 6473 il->hw_params.sens = &il4965_sensitivity; 6475 6474 il->hw_params.beacon_time_tsf_bits = IL4965_EXT_BEACON_TIME_POS;
+9 -8
drivers/net/wireless/intel/iwlegacy/4965.c
··· 17 17 #include <linux/sched.h> 18 18 #include <linux/skbuff.h> 19 19 #include <linux/netdevice.h> 20 + #include <linux/units.h> 20 21 #include <net/mac80211.h> 21 22 #include <linux/etherdevice.h> 22 23 #include <asm/unaligned.h> ··· 1105 1104 /* get current temperature (Celsius) */ 1106 1105 current_temp = max(il->temperature, IL_TX_POWER_TEMPERATURE_MIN); 1107 1106 current_temp = min(il->temperature, IL_TX_POWER_TEMPERATURE_MAX); 1108 - current_temp = KELVIN_TO_CELSIUS(current_temp); 1107 + current_temp = kelvin_to_celsius(current_temp); 1109 1108 1110 1109 /* select thermal txpower adjustment params, based on channel group 1111 1110 * (same frequency group used for mimo txatten adjustment) */ ··· 1611 1610 temperature = 1612 1611 (temperature * 97) / 100 + TEMPERATURE_CALIB_KELVIN_OFFSET; 1613 1612 1614 - D_TEMP("Calibrated temperature: %dK, %dC\n", temperature, 1615 - KELVIN_TO_CELSIUS(temperature)); 1613 + D_TEMP("Calibrated temperature: %dK, %ldC\n", temperature, 1614 + kelvin_to_celsius(temperature)); 1616 1615 1617 1616 return temperature; 1618 1617 } ··· 1671 1670 1672 1671 if (il->temperature != temp) { 1673 1672 if (il->temperature) 1674 - D_TEMP("Temperature changed " "from %dC to %dC\n", 1675 - KELVIN_TO_CELSIUS(il->temperature), 1676 - KELVIN_TO_CELSIUS(temp)); 1673 + D_TEMP("Temperature changed " "from %ldC to %ldC\n", 1674 + kelvin_to_celsius(il->temperature), 1675 + kelvin_to_celsius(temp)); 1677 1676 else 1678 - D_TEMP("Temperature " "initialized to %dC\n", 1679 - KELVIN_TO_CELSIUS(temp)); 1677 + D_TEMP("Temperature " "initialized to %ldC\n", 1678 + kelvin_to_celsius(temp)); 1680 1679 } 1681 1680 1682 1681 il->temperature = temp;
-3
drivers/net/wireless/intel/iwlegacy/common.h
··· 779 779 u16 nrg_th_cca; 780 780 }; 781 781 782 - #define KELVIN_TO_CELSIUS(x) ((x)-273) 783 - #define CELSIUS_TO_KELVIN(x) ((x)+273) 784 - 785 782 /** 786 783 * struct il_hw_params 787 784 * @bcast_id: f/w broadcast station ID
-5
drivers/net/wireless/intel/iwlwifi/dvm/dev.h
··· 237 237 u16 nrg_th_cca; 238 238 }; 239 239 240 - 241 - #define KELVIN_TO_CELSIUS(x) ((x)-273) 242 - #define CELSIUS_TO_KELVIN(x) ((x)+273) 243 - 244 - 245 240 /****************************************************************************** 246 241 * 247 242 * Functions implemented in core module which are forward declared here
+4 -2
drivers/net/wireless/intel/iwlwifi/dvm/devices.c
··· 10 10 * 11 11 *****************************************************************************/ 12 12 13 + #include <linux/units.h> 14 + 13 15 /* 14 16 * DVM device-specific data & functions 15 17 */ ··· 347 345 static void iwl5150_set_ct_threshold(struct iwl_priv *priv) 348 346 { 349 347 const s32 volt2temp_coef = IWL_5150_VOLTAGE_TO_TEMPERATURE_COEFF; 350 - s32 threshold = (s32)CELSIUS_TO_KELVIN(CT_KILL_THRESHOLD_LEGACY) - 348 + s32 threshold = (s32)celsius_to_kelvin(CT_KILL_THRESHOLD_LEGACY) - 351 349 iwl_temp_calib_to_offset(priv); 352 350 353 351 priv->hw_params.ct_kill_threshold = threshold * volt2temp_coef; ··· 383 381 vt = le32_to_cpu(priv->statistics.common.temperature); 384 382 vt = vt / IWL_5150_VOLTAGE_TO_TEMPERATURE_COEFF + offset; 385 383 /* now vt hold the temperature in Kelvin */ 386 - priv->temperature = KELVIN_TO_CELSIUS(vt); 384 + priv->temperature = kelvin_to_celsius(vt); 387 385 iwl_tt_handler(priv); 388 386 } 389 387
-6
drivers/nvdimm/pmem.c
··· 337 337 put_disk(pmem->disk); 338 338 } 339 339 340 - static void pmem_pagemap_page_free(struct page *page) 341 - { 342 - wake_up_var(&page->_refcount); 343 - } 344 - 345 340 static const struct dev_pagemap_ops fsdax_pagemap_ops = { 346 - .page_free = pmem_pagemap_page_free, 347 341 .kill = pmem_pagemap_kill, 348 342 .cleanup = pmem_pagemap_cleanup, 349 343 };
+5 -8
drivers/nvme/host/hwmon.c
··· 5 5 */ 6 6 7 7 #include <linux/hwmon.h> 8 + #include <linux/units.h> 8 9 #include <asm/unaligned.h> 9 10 10 11 #include "nvme.h" 11 - 12 - /* These macros should be moved to linux/temperature.h */ 13 - #define MILLICELSIUS_TO_KELVIN(t) DIV_ROUND_CLOSEST((t) + 273150, 1000) 14 - #define KELVIN_TO_MILLICELSIUS(t) ((t) * 1000L - 273150) 15 12 16 13 struct nvme_hwmon_data { 17 14 struct nvme_ctrl *ctrl; ··· 32 35 return -EIO; 33 36 if (ret < 0) 34 37 return ret; 35 - *temp = KELVIN_TO_MILLICELSIUS(status & NVME_TEMP_THRESH_MASK); 38 + *temp = kelvin_to_millicelsius(status & NVME_TEMP_THRESH_MASK); 36 39 37 40 return 0; 38 41 } ··· 43 46 unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT; 44 47 int ret; 45 48 46 - temp = MILLICELSIUS_TO_KELVIN(temp); 49 + temp = millicelsius_to_kelvin(temp); 47 50 threshold |= clamp_val(temp, 0, NVME_TEMP_THRESH_MASK); 48 51 49 52 if (under) ··· 85 88 case hwmon_temp_min: 86 89 return nvme_get_temp_thresh(data->ctrl, channel, true, val); 87 90 case hwmon_temp_crit: 88 - *val = KELVIN_TO_MILLICELSIUS(data->ctrl->cctemp); 91 + *val = kelvin_to_millicelsius(data->ctrl->cctemp); 89 92 return 0; 90 93 default: 91 94 break; ··· 102 105 temp = get_unaligned_le16(log->temperature); 103 106 else 104 107 temp = le16_to_cpu(log->temp_sensor[channel - 1]); 105 - *val = KELVIN_TO_MILLICELSIUS(temp); 108 + *val = kelvin_to_millicelsius(temp); 106 109 break; 107 110 case hwmon_temp_alarm: 108 111 *val = !!(log->critical_warning & NVME_SMART_CRIT_TEMPERATURE);
+12 -23
drivers/platform/goldfish/goldfish_pipe.c
··· 257 257 } 258 258 } 259 259 260 - static int pin_user_pages(unsigned long first_page, 261 - unsigned long last_page, 262 - unsigned int last_page_size, 263 - int is_write, 264 - struct page *pages[MAX_BUFFERS_PER_COMMAND], 265 - unsigned int *iter_last_page_size) 260 + static int goldfish_pin_pages(unsigned long first_page, 261 + unsigned long last_page, 262 + unsigned int last_page_size, 263 + int is_write, 264 + struct page *pages[MAX_BUFFERS_PER_COMMAND], 265 + unsigned int *iter_last_page_size) 266 266 { 267 267 int ret; 268 268 int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1; ··· 274 274 *iter_last_page_size = last_page_size; 275 275 } 276 276 277 - ret = get_user_pages_fast(first_page, requested_pages, 277 + ret = pin_user_pages_fast(first_page, requested_pages, 278 278 !is_write ? FOLL_WRITE : 0, 279 279 pages); 280 280 if (ret <= 0) ··· 283 283 *iter_last_page_size = PAGE_SIZE; 284 284 285 285 return ret; 286 - } 287 - 288 - static void release_user_pages(struct page **pages, int pages_count, 289 - int is_write, s32 consumed_size) 290 - { 291 - int i; 292 - 293 - for (i = 0; i < pages_count; i++) { 294 - if (!is_write && consumed_size > 0) 295 - set_page_dirty(pages[i]); 296 - put_page(pages[i]); 297 - } 298 286 } 299 287 300 288 /* Populate the call parameters, merging adjacent pages together */ ··· 342 354 if (mutex_lock_interruptible(&pipe->lock)) 343 355 return -ERESTARTSYS; 344 356 345 - pages_count = pin_user_pages(first_page, last_page, 346 - last_page_size, is_write, 347 - pipe->pages, &iter_last_page_size); 357 + pages_count = goldfish_pin_pages(first_page, last_page, 358 + last_page_size, is_write, 359 + pipe->pages, &iter_last_page_size); 348 360 if (pages_count < 0) { 349 361 mutex_unlock(&pipe->lock); 350 362 return pages_count; ··· 360 372 361 373 *consumed_size = pipe->command_buffer->rw_params.consumed_size; 362 374 363 - release_user_pages(pipe->pages, pages_count, is_write, *consumed_size); 375 +
unpin_user_pages_dirty_lock(pipe->pages, pages_count, 376 + !is_write && *consumed_size > 0); 364 377 365 378 mutex_unlock(&pipe->lock); 366 379 return 0;
+3 -4
drivers/platform/x86/asus-wmi.c
··· 33 33 #include <linux/seq_file.h> 34 34 #include <linux/platform_data/x86/asus-wmi.h> 35 35 #include <linux/platform_device.h> 36 - #include <linux/thermal.h> 37 36 #include <linux/acpi.h> 38 37 #include <linux/dmi.h> 38 + #include <linux/units.h> 39 39 40 40 #include <acpi/battery.h> 41 41 #include <acpi/video.h> ··· 1514 1514 if (err < 0) 1515 1515 return err; 1516 1516 1517 - value = DECI_KELVIN_TO_CELSIUS((value & 0xFFFF)) * 1000; 1518 - 1519 - return sprintf(buf, "%d\n", value); 1517 + return sprintf(buf, "%ld\n", 1518 + deci_kelvin_to_millicelsius(value & 0xFFFF)); 1520 1519 } 1521 1520 1522 1521 /* Fan1 */
+6 -3
drivers/platform/x86/intel_menlow.c
··· 22 22 #include <linux/slab.h> 23 23 #include <linux/thermal.h> 24 24 #include <linux/types.h> 25 + #include <linux/units.h> 25 26 26 27 MODULE_AUTHOR("Thomas Sujith"); 27 28 MODULE_AUTHOR("Zhang Rui"); ··· 303 302 int result; 304 303 305 304 result = sensor_get_auxtrip(attr->handle, idx, &value); 305 + if (result) 306 + return result; 306 307 307 - return result ? result : sprintf(buf, "%lu", DECI_KELVIN_TO_CELSIUS(value)); 308 + return sprintf(buf, "%lu", deci_kelvin_to_celsius(value)); 308 309 } 309 310 310 311 static ssize_t aux0_show(struct device *dev, ··· 335 332 if (value < 0) 336 333 return -EINVAL; 337 334 338 - result = sensor_set_auxtrip(attr->handle, idx, 339 - CELSIUS_TO_DECI_KELVIN(value)); 335 + result = sensor_set_auxtrip(attr->handle, idx, 336 + celsius_to_deci_kelvin(value)); 340 337 return result ? result : count; 341 338 } 342 339
-2
drivers/thermal/armada_thermal.c
··· 21 21 22 22 #include "thermal_core.h" 23 23 24 - #define TO_MCELSIUS(c) ((c) * 1000) 25 - 26 24 /* Thermal Manager Control and Status Register */ 27 25 #define PMU_TDC0_SW_RST_MASK (0x1 << 1) 28 26 #define PMU_TM_DISABLE_OFFS 0
+4 -3
drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
··· 8 8 #include <linux/init.h> 9 9 #include <linux/acpi.h> 10 10 #include <linux/thermal.h> 11 + #include <linux/units.h> 11 12 #include "int340x_thermal_zone.h" 12 13 13 14 static int int340x_thermal_get_zone_temp(struct thermal_zone_device *zone, ··· 35 34 *temp = (unsigned long)conv_temp * 10; 36 35 } else 37 36 /* _TMP returns the temperature in tenths of degrees Kelvin */ 38 - *temp = DECI_KELVIN_TO_MILLICELSIUS(tmp); 37 + *temp = deci_kelvin_to_millicelsius(tmp); 39 38 40 39 return 0; 41 40 } ··· 117 116 118 117 snprintf(name, sizeof(name), "PAT%d", trip); 119 118 status = acpi_execute_simple_method(d->adev->handle, name, 120 - MILLICELSIUS_TO_DECI_KELVIN(temp)); 119 + millicelsius_to_deci_kelvin(temp)); 121 120 if (ACPI_FAILURE(status)) 122 121 return -EIO; 123 122 ··· 164 163 if (ACPI_FAILURE(status)) 165 164 return -EIO; 166 165 167 - *temp = DECI_KELVIN_TO_MILLICELSIUS(r); 166 + *temp = deci_kelvin_to_millicelsius(r); 168 167 169 168 return 0; 170 169 }
+2 -1
drivers/thermal/intel/intel_pch_thermal.c
··· 13 13 #include <linux/pci.h> 14 14 #include <linux/acpi.h> 15 15 #include <linux/thermal.h> 16 + #include <linux/units.h> 16 17 #include <linux/pm.h> 17 18 18 19 /* Intel PCH thermal Device IDs */ ··· 94 93 if (ACPI_SUCCESS(status)) { 95 94 unsigned long trip_temp; 96 95 97 - trip_temp = DECI_KELVIN_TO_MILLICELSIUS(r); 96 + trip_temp = deci_kelvin_to_millicelsius(r); 98 97 if (trip_temp) { 99 98 ptd->psv_temp = trip_temp; 100 99 ptd->psv_trip_id = *nr_trips;
+7 -28
drivers/vfio/vfio_iommu_type1.c
··· 309 309 { 310 310 if (!is_invalid_reserved_pfn(pfn)) { 311 311 struct page *page = pfn_to_page(pfn); 312 - if (prot & IOMMU_WRITE) 313 - SetPageDirty(page); 314 - put_page(page); 312 + 313 + unpin_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE); 315 314 return 1; 316 315 } 317 316 return 0; ··· 321 322 { 322 323 struct page *page[1]; 323 324 struct vm_area_struct *vma; 324 - struct vm_area_struct *vmas[1]; 325 325 unsigned int flags = 0; 326 326 int ret; 327 327 ··· 328 330 flags |= FOLL_WRITE; 329 331 330 332 down_read(&mm->mmap_sem); 331 - if (mm == current->mm) { 332 - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, 333 - vmas); 334 - } else { 335 - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, 336 - vmas, NULL); 337 - /* 338 - * The lifetime of a vaddr_get_pfn() page pin is 339 - * userspace-controlled. In the fs-dax case this could 340 - * lead to indefinite stalls in filesystem operations. 341 - * Disallow attempts to pin fs-dax pages via this 342 - * interface. 343 - */ 344 - if (ret > 0 && vma_is_fsdax(vmas[0])) { 345 - ret = -EOPNOTSUPP; 346 - put_page(page[0]); 347 - } 348 - } 349 - up_read(&mm->mmap_sem); 350 - 333 + ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, 334 + page, NULL, NULL); 351 335 if (ret == 1) { 352 336 *pfn = page_to_pfn(page[0]); 353 - return 0; 337 + ret = 0; 338 + goto done; 354 339 } 355 - 356 - down_read(&mm->mmap_sem); 357 340 358 341 vaddr = untagged_addr(vaddr); 359 342 ··· 345 366 if (is_invalid_reserved_pfn(*pfn)) 346 367 ret = 0; 347 368 } 348 - 369 + done: 349 370 up_read(&mm->mmap_sem); 350 371 return ret; 351 372 }
+74 -70
fs/binfmt_elf.c
··· 97 97 .min_coredump = ELF_EXEC_PAGESIZE, 98 98 }; 99 99 100 - #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE) 100 + #define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE)) 101 101 102 102 static int set_brk(unsigned long start, unsigned long end, int prot) 103 103 { ··· 161 161 #endif 162 162 163 163 static int 164 - create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, 165 - unsigned long load_addr, unsigned long interp_load_addr) 164 + create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, 165 + unsigned long load_addr, unsigned long interp_load_addr, 166 + unsigned long e_entry) 166 167 { 168 + struct mm_struct *mm = current->mm; 167 169 unsigned long p = bprm->p; 168 170 int argc = bprm->argc; 169 171 int envc = bprm->envc; ··· 178 176 unsigned char k_rand_bytes[16]; 179 177 int items; 180 178 elf_addr_t *elf_info; 181 - int ei_index = 0; 179 + int ei_index; 182 180 const struct cred *cred = current_cred(); 183 181 struct vm_area_struct *vma; 184 182 ··· 228 226 return -EFAULT; 229 227 230 228 /* Create the ELF interpreter info */ 231 - elf_info = (elf_addr_t *)current->mm->saved_auxv; 229 + elf_info = (elf_addr_t *)mm->saved_auxv; 232 230 /* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */ 233 231 #define NEW_AUX_ENT(id, val) \ 234 232 do { \ 235 - elf_info[ei_index++] = id; \ 236 - elf_info[ei_index++] = val; \ 233 + *elf_info++ = id; \ 234 + *elf_info++ = val; \ 237 235 } while (0) 238 236 239 237 #ifdef ARCH_DLINFO ··· 253 251 NEW_AUX_ENT(AT_PHNUM, exec->e_phnum); 254 252 NEW_AUX_ENT(AT_BASE, interp_load_addr); 255 253 NEW_AUX_ENT(AT_FLAGS, 0); 256 - NEW_AUX_ENT(AT_ENTRY, exec->e_entry); 254 + NEW_AUX_ENT(AT_ENTRY, e_entry); 257 255 NEW_AUX_ENT(AT_UID, from_kuid_munged(cred->user_ns, cred->uid)); 258 256 NEW_AUX_ENT(AT_EUID, from_kuid_munged(cred->user_ns, cred->euid)); 259 257 NEW_AUX_ENT(AT_GID, from_kgid_munged(cred->user_ns, cred->gid)); ··· 277 275 } 278 276 #undef NEW_AUX_ENT 279 277
/* AT_NULL is zero; clear the rest too */ 280 - memset(&elf_info[ei_index], 0, 281 - sizeof current->mm->saved_auxv - ei_index * sizeof elf_info[0]); 278 + memset(elf_info, 0, (char *)mm->saved_auxv + 279 + sizeof(mm->saved_auxv) - (char *)elf_info); 282 280 283 281 /* And advance past the AT_NULL entry. */ 284 - ei_index += 2; 282 + elf_info += 2; 285 283 284 + ei_index = elf_info - (elf_addr_t *)mm->saved_auxv; 286 285 sp = STACK_ADD(p, ei_index); 287 286 288 287 items = (argc + 1) + (envc + 1) + 1; ··· 302 299 * Grow the stack manually; some architectures have a limit on how 303 300 * far ahead a user-space access may be in order to grow the stack. 304 301 */ 305 - vma = find_extend_vma(current->mm, bprm->p); 302 + vma = find_extend_vma(mm, bprm->p); 306 303 if (!vma) 307 304 return -EFAULT; 308 305 ··· 311 308 return -EFAULT; 312 309 313 310 /* Populate list of argv pointers back to argv strings. */ 314 - p = current->mm->arg_end = current->mm->arg_start; 311 + p = mm->arg_end = mm->arg_start; 315 312 while (argc-- > 0) { 316 313 size_t len; 317 314 if (__put_user((elf_addr_t)p, sp++)) ··· 323 320 } 324 321 if (__put_user(0, sp++)) 325 322 return -EFAULT; 326 - current->mm->arg_end = p; 323 + mm->arg_end = p; 327 324 328 325 /* Populate list of envp pointers back to envp strings. */ 329 - current->mm->env_end = current->mm->env_start = p; 326 + mm->env_end = mm->env_start = p; 330 327 while (envc-- > 0) { 331 328 size_t len; 332 329 if (__put_user((elf_addr_t)p, sp++)) ··· 338 335 } 339 336 if (__put_user(0, sp++)) 340 337 return -EFAULT; 341 - current->mm->env_end = p; 338 + mm->env_end = p; 342 339 343 340 /* Put the elf_info on the stack in the right place.
*/ 344 - if (copy_to_user(sp, elf_info, ei_index * sizeof(elf_addr_t))) 341 + if (copy_to_user(sp, mm->saved_auxv, ei_index * sizeof(elf_addr_t))) 345 342 return -EFAULT; 346 343 return 0; 347 344 } ··· 692 689 int bss_prot = 0; 693 690 int retval, i; 694 691 unsigned long elf_entry; 692 + unsigned long e_entry; 695 693 unsigned long interp_load_addr = 0; 696 694 unsigned long start_code, end_code, start_data, end_data; 697 695 unsigned long reloc_func_desc __maybe_unused = 0; 698 696 int executable_stack = EXSTACK_DEFAULT; 697 + struct elfhdr *elf_ex = (struct elfhdr *)bprm->buf; 699 698 struct { 700 - struct elfhdr elf_ex; 701 699 struct elfhdr interp_elf_ex; 702 700 } *loc; 703 701 struct arch_elf_state arch_state = INIT_ARCH_ELF_STATE; 702 + struct mm_struct *mm; 704 703 struct pt_regs *regs; 705 704 706 705 loc = kmalloc(sizeof(*loc), GFP_KERNEL); ··· 710 705 retval = -ENOMEM; 711 706 goto out_ret; 712 707 } 713 - 714 - /* Get the exec-header */ 715 - loc->elf_ex = *((struct elfhdr *)bprm->buf); 716 708 717 709 retval = -ENOEXEC; 718 710 /* First of all, some simple consistency checks */ 719 - if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0) 711 + if (memcmp(elf_ex->e_ident, ELFMAG, SELFMAG) != 0) 720 712 goto out; 721 713 722 - if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN) 714 + if (elf_ex->e_type != ET_EXEC && elf_ex->e_type != ET_DYN) 723 715 goto out; 724 - if (!elf_check_arch(&loc->elf_ex)) 716 + if (!elf_check_arch(elf_ex)) 725 717 goto out; 726 - if (elf_check_fdpic(&loc->elf_ex)) 718 + if (elf_check_fdpic(elf_ex)) 727 719 goto out; 728 720 if (!bprm->file->f_op->mmap) 729 721 goto out; 730 722 731 - elf_phdata = load_elf_phdrs(&loc->elf_ex, bprm->file); 723 + elf_phdata = load_elf_phdrs(elf_ex, bprm->file); 732 724 if (!elf_phdata) 733 725 goto out; 734 726 735 727 elf_ppnt = elf_phdata; 736 - for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++) { 728 + for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) { 737 729 char
*elf_interpreter; 738 730 739 731 if (elf_ppnt->p_type != PT_INTERP) ··· 784 782 } 785 783 786 784 elf_ppnt = elf_phdata; 787 - for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++) 785 + for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) 788 786 switch (elf_ppnt->p_type) { 789 787 case PT_GNU_STACK: 790 788 if (elf_ppnt->p_flags & PF_X) ··· 794 792 break; 795 793 796 794 case PT_LOPROC ... PT_HIPROC: 797 - retval = arch_elf_pt_proc(&loc->elf_ex, elf_ppnt, 795 + retval = arch_elf_pt_proc(elf_ex, elf_ppnt, 798 796 bprm->file, false, 799 797 &arch_state); 800 798 if (retval) ··· 838 836 * still possible to return an error to the code that invoked 839 837 * the exec syscall. 840 838 */ 841 - retval = arch_check_elf(&loc->elf_ex, 839 + retval = arch_check_elf(elf_ex, 842 840 !!interpreter, &loc->interp_elf_ex, 843 841 &arch_state); 844 842 if (retval) ··· 851 849 852 850 /* Do this immediately, since STACK_TOP as used in setup_arg_pages 853 851 may depend on the personality. */ 854 - SET_PERSONALITY2(loc->elf_ex, &arch_state); 855 - if (elf_read_implies_exec(loc->elf_ex, executable_stack)) 852 + SET_PERSONALITY2(*elf_ex, &arch_state); 853 + if (elf_read_implies_exec(*elf_ex, executable_stack)) 856 854 current->personality |= READ_IMPLIES_EXEC; 857 855 858 856 if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) ··· 879 877 /* Now we do a little grungy work by mmapping the ELF image into 880 878 the correct location in memory. */ 881 879 for(i = 0, elf_ppnt = elf_phdata; 882 - i < loc->elf_ex.e_phnum; i++, elf_ppnt++) { 880 + i < elf_ex->e_phnum; i++, elf_ppnt++) { 883 881 int elf_prot, elf_flags; 884 882 unsigned long k, vaddr; 885 883 unsigned long total_size = 0; ··· 923 921 * If we are loading ET_EXEC or we have already performed 924 922 * the ET_DYN load_addr calculations, proceed normally.
925 923 */ 926 - if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) { 924 + if (elf_ex->e_type == ET_EXEC || load_addr_set) { 927 925 elf_flags |= MAP_FIXED; 928 - } else if (loc->elf_ex.e_type == ET_DYN) { 926 + } else if (elf_ex->e_type == ET_DYN) { 929 927 /* 930 928 * This logic is run once for the first LOAD Program 931 929 * Header for ET_DYN binaries to calculate the ··· 974 972 load_bias = ELF_PAGESTART(load_bias - vaddr); 975 973 976 974 total_size = total_mapping_size(elf_phdata, 977 - loc->elf_ex.e_phnum); 975 + elf_ex->e_phnum); 978 976 if (!total_size) { 979 977 retval = -EINVAL; 980 978 goto out_free_dentry; ··· 992 990 if (!load_addr_set) { 993 991 load_addr_set = 1; 994 992 load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset); 995 - if (loc->elf_ex.e_type == ET_DYN) { 993 + if (elf_ex->e_type == ET_DYN) { 996 994 load_bias += error - 997 995 ELF_PAGESTART(load_bias + vaddr); 998 996 load_addr += load_bias; ··· 1000 998 } 1001 999 } 1002 1000 k = elf_ppnt->p_vaddr; 1003 - if (k < start_code) 1001 + if ((elf_ppnt->p_flags & PF_X) && k < start_code) 1004 1002 start_code = k; 1005 1003 if (start_data < k) 1006 1004 start_data = k; ··· 1033 1031 } 1034 1032 } 1035 1033 1036 - loc->elf_ex.e_entry += load_bias; 1034 + e_entry = elf_ex->e_entry + load_bias; 1037 1035 elf_bss += load_bias; 1038 1036 elf_brk += load_bias; 1039 1037 start_code += load_bias; ··· 1076 1074 allow_write_access(interpreter); 1077 1075 fput(interpreter); 1078 1076 } else { 1079 - elf_entry = loc->elf_ex.e_entry; 1077 + elf_entry = e_entry; 1080 1078 if (BAD_ADDR(elf_entry)) { 1081 1079 retval = -EINVAL; 1082 1080 goto out_free_dentry; ··· 1094 1092 goto out; 1095 1093 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ 1096 1094 1097 - retval = create_elf_tables(bprm, &loc->elf_ex, 1098 - load_addr, interp_load_addr); 1095 + retval = create_elf_tables(bprm, elf_ex, 1096 + load_addr, interp_load_addr, e_entry); 1099 1097 if (retval < 0) 1100 1098 goto out; 1101 - current->mm->end_code = 
end_code; 1102 - current->mm->start_code = start_code; 1103 - current->mm->start_data = start_data; 1104 - current->mm->end_data = end_data; 1105 - current->mm->start_stack = bprm->p; 1099 + 1100 + mm = current->mm; 1101 + mm->end_code = end_code; 1102 + mm->start_code = start_code; 1103 + mm->start_data = start_data; 1104 + mm->end_data = end_data; 1105 + mm->start_stack = bprm->p; 1106 1106 1107 1107 if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) { 1108 1108 /* ··· 1115 1111 * growing down), and into the unused ELF_ET_DYN_BASE region. 1116 1112 */ 1117 1113 if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && 1118 - loc->elf_ex.e_type == ET_DYN && !interpreter) 1119 - current->mm->brk = current->mm->start_brk = 1120 - ELF_ET_DYN_BASE; 1114 + elf_ex->e_type == ET_DYN && !interpreter) { 1115 + mm->brk = mm->start_brk = ELF_ET_DYN_BASE; 1116 + } 1121 1117 1122 - current->mm->brk = current->mm->start_brk = 1123 - arch_randomize_brk(current->mm); 1118 + mm->brk = mm->start_brk = arch_randomize_brk(mm); 1124 1119 #ifdef compat_brk_randomized 1125 1120 current->brk_randomized = 1; 1126 1121 #endif ··· 1577 1574 */ 1578 1575 static int fill_files_note(struct memelfnote *note) 1579 1576 { 1577 + struct mm_struct *mm = current->mm; 1580 1578 struct vm_area_struct *vma; 1581 1579 unsigned count, size, names_ofs, remaining, n; 1582 1580 user_long_t *data; ··· 1585 1581 char *name_base, *name_curpos; 1586 1582 1587 1583 /* *Estimated* file count and total data size needed */ 1588 - count = current->mm->map_count; 1584 + count = mm->map_count; 1589 1585 if (count > UINT_MAX / 64) 1590 1586 return -EINVAL; 1591 1587 size = count * 64; ··· 1595 1591 if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */ 1596 1592 return -EINVAL; 1597 1593 size = round_up(size, PAGE_SIZE); 1594 + /* 1595 + * "size" can be 0 here legitimately. 1596 + * Let it ENOMEM and omit NT_FILE section which will be empty anyway.
1597 + */ 1598 1598 data = kvmalloc(size, GFP_KERNEL); 1599 1599 if (ZERO_OR_NULL_PTR(data)) 1600 1600 return -ENOMEM; ··· 1607 1599 name_base = name_curpos = ((char *)data) + names_ofs; 1608 1600 remaining = size - names_ofs; 1609 1601 count = 0; 1610 - for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) { 1602 + for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) { 1611 1603 struct file *file; 1612 1604 const char *filename; 1613 1605 ··· 1641 1633 data[0] = count; 1642 1634 data[1] = PAGE_SIZE; 1643 1635 /* 1644 - * Count usually is less than current->mm->map_count, 1636 + * Count usually is less than mm->map_count, 1645 1637 * we need to move filenames down. 1646 1638 */ 1647 - n = current->mm->map_count - count; 1639 + n = mm->map_count - count; 1648 1640 if (n != 0) { 1649 1641 unsigned shift_bytes = n * 3 * sizeof(data[0]); 1650 1642 memmove(name_base - shift_bytes, name_base, ··· 2190 2182 int segs, i; 2191 2183 size_t vma_data_size = 0; 2192 2184 struct vm_area_struct *vma, *gate_vma; 2193 - struct elfhdr *elf = NULL; 2185 + struct elfhdr elf; 2194 2186 loff_t offset = 0, dataoff; 2195 2187 struct elf_note_info info = { }; 2196 2188 struct elf_phdr *phdr4note = NULL; ··· 2211 2203 * exists while dumping the mm->vm_next areas to the core file. 2212 2204 */ 2213 2205 2214 - /* alloc memory for large data structures: too large to be on stack */ 2215 - elf = kmalloc(sizeof(*elf), GFP_KERNEL); 2216 - if (!elf) 2217 - goto out; 2218 2206 /* 2219 2207 * The number of segs are recorded into ELF header as 16bit value. 2220 2208 * Please check DEFAULT_MAX_MAP_COUNT definition when you modify here. ··· 2234 2230 * Collect all the non-memory information about the process for the 2235 2231 * notes. This also sets up the file header.
2236 2232 */ 2237 - if (!fill_note_info(elf, e_phnum, &info, cprm->siginfo, cprm->regs)) 2233 + if (!fill_note_info(&elf, e_phnum, &info, cprm->siginfo, cprm->regs)) 2238 2234 goto cleanup; 2239 2235 2240 2236 has_dumped = 1; ··· 2242 2238 fs = get_fs(); 2243 2239 set_fs(KERNEL_DS); 2244 2240 2245 - offset += sizeof(*elf); /* Elf header */ 2241 + offset += sizeof(elf); /* Elf header */ 2246 2242 offset += segs * sizeof(struct elf_phdr); /* Program headers */ 2247 2243 2248 2244 /* Write notes phdr entry */ ··· 2261 2257 2262 2258 dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE); 2263 2259 2264 - if (segs - 1 > ULONG_MAX / sizeof(*vma_filesz)) 2265 - goto end_coredump; 2260 + /* 2261 + * Zero vma process will get ZERO_SIZE_PTR here. 2262 + * Let coredump continue for register state at least. 2263 + */ 2266 2264 vma_filesz = kvmalloc(array_size(sizeof(*vma_filesz), (segs - 1)), 2267 2265 GFP_KERNEL); 2268 - if (ZERO_OR_NULL_PTR(vma_filesz)) 2266 + if (!vma_filesz) 2269 2267 goto end_coredump; 2270 2268 2271 2269 for (i = 0, vma = first_vma(current, gate_vma); vma != NULL; ··· 2287 2281 shdr4extnum = kmalloc(sizeof(*shdr4extnum), GFP_KERNEL); 2288 2282 if (!shdr4extnum) 2289 2283 goto end_coredump; 2290 - fill_extnum_info(elf, shdr4extnum, e_shoff, segs); 2284 + fill_extnum_info(&elf, shdr4extnum, e_shoff, segs); 2291 2285 } 2292 2286 2293 2287 offset = dataoff; 2294 2288 2295 - if (!dump_emit(cprm, elf, sizeof(*elf))) 2289 + if (!dump_emit(cprm, &elf, sizeof(elf))) 2296 2290 goto end_coredump; 2297 2291 2298 2292 if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note))) ··· 2376 2370 kfree(shdr4extnum); 2377 2371 kvfree(vma_filesz); 2378 2372 kfree(phdr4note); 2379 - kfree(elf); 2380 - out: 2381 2373 return has_dumped; 2382 2374 } 2383 2375
+1 -1
fs/btrfs/compression.c
··· 1290 1290 /* copy bytes from the working buffer into the pages */ 1291 1291 while (working_bytes > 0) { 1292 1292 bytes = min_t(unsigned long, bvec.bv_len, 1293 - PAGE_SIZE - buf_offset); 1293 + PAGE_SIZE - (buf_offset % PAGE_SIZE)); 1294 1294 bytes = min(bytes, working_bytes); 1295 1295 1296 1296 kaddr = kmap_atomic(bvec.bv_page);
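The one-line fix above matters because, with the new 4-page dfltcc workspace buffer introduced in fs/btrfs/zlib.c, `buf_offset` can legitimately exceed `PAGE_SIZE`, so the old `PAGE_SIZE - buf_offset` would underflow. A userspace sketch of the clamping arithmetic (function name and shape are illustrative, not the kernel's):

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-in for the copy-length clamp in the btrfs decompress
 * path: a copy must not cross a page boundary of the working buffer, even
 * when buf_offset points into the 2nd..4th page of a multi-page buffer. */
static unsigned long copy_len(unsigned long bv_len, unsigned long buf_offset,
                              unsigned long working_bytes)
{
	unsigned long bytes = bv_len;

	/* the old form, PAGE_SIZE - buf_offset, underflows once
	 * buf_offset >= PAGE_SIZE (possible with a 4-page dfltcc buffer) */
	if (PAGE_SIZE - (buf_offset % PAGE_SIZE) < bytes)
		bytes = PAGE_SIZE - (buf_offset % PAGE_SIZE);
	if (working_bytes < bytes)
		bytes = working_bytes;
	return bytes;
}
```

With `buf_offset = 5000`, the old expression would wrap around to a huge unsigned value; the modulo form caps the copy at the 3192 bytes left in that page.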
+100 -35
fs/btrfs/zlib.c
··· 20 20 #include <linux/refcount.h> 21 21 #include "compression.h" 22 22 23 + /* workspace buffer size for s390 zlib hardware support */ 24 + #define ZLIB_DFLTCC_BUF_SIZE (4 * PAGE_SIZE) 25 + 23 26 struct workspace { 24 27 z_stream strm; 25 28 char *buf; 29 + unsigned int buf_size; 26 30 struct list_head list; 27 31 int level; 28 32 }; ··· 65 61 zlib_inflate_workspacesize()); 66 62 workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL); 67 63 workspace->level = level; 68 - workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL); 64 + workspace->buf = NULL; 65 + /* 66 + * In case of s390 zlib hardware support, allocate a larger workspace 67 + * buffer. If the allocation fails, fall back to a single page buffer. 68 + */ 69 + if (zlib_deflate_dfltcc_enabled()) { 70 + workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE, 71 + __GFP_NOMEMALLOC | __GFP_NORETRY | 72 + __GFP_NOWARN | GFP_NOIO); 73 + workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE; 74 + } 75 + if (!workspace->buf) { 76 + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL); 77 + workspace->buf_size = PAGE_SIZE; 78 + } 69 79 if (!workspace->strm.workspace || !workspace->buf) 70 80 goto fail; 71 81 ··· 103 85 struct page *in_page = NULL; 104 86 struct page *out_page = NULL; 105 87 unsigned long bytes_left; 88 + unsigned int in_buf_pages; 106 89 unsigned long len = *total_out; 107 90 unsigned long nr_dest_pages = *out_pages; 108 91 const unsigned long max_out = nr_dest_pages * PAGE_SIZE; ··· 121 102 workspace->strm.total_in = 0; 122 103 workspace->strm.total_out = 0; 123 104 124 - in_page = find_get_page(mapping, start >> PAGE_SHIFT); 125 - data_in = kmap(in_page); 126 - 127 105 out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM); 128 106 if (out_page == NULL) { 129 107 ret = -ENOMEM; ··· 130 114 pages[0] = out_page; 131 115 nr_pages = 1; 132 116 133 - workspace->strm.next_in = data_in; 117 + workspace->strm.next_in = workspace->buf; 118 + workspace->strm.avail_in = 0; 134 119 workspace->strm.next_out = cpage_out; 135 120 
workspace->strm.avail_out = PAGE_SIZE; 136 - workspace->strm.avail_in = min(len, PAGE_SIZE); 137 121 138 122 while (workspace->strm.total_in < len) { 123 + /* 124 + * Get next input pages and copy the contents to 125 + * the workspace buffer if required. 126 + */ 127 + if (workspace->strm.avail_in == 0) { 128 + bytes_left = len - workspace->strm.total_in; 129 + in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE), 130 + workspace->buf_size / PAGE_SIZE); 131 + if (in_buf_pages > 1) { 132 + int i; 133 + 134 + for (i = 0; i < in_buf_pages; i++) { 135 + if (in_page) { 136 + kunmap(in_page); 137 + put_page(in_page); 138 + } 139 + in_page = find_get_page(mapping, 140 + start >> PAGE_SHIFT); 141 + data_in = kmap(in_page); 142 + memcpy(workspace->buf + i * PAGE_SIZE, 143 + data_in, PAGE_SIZE); 144 + start += PAGE_SIZE; 145 + } 146 + workspace->strm.next_in = workspace->buf; 147 + } else { 148 + if (in_page) { 149 + kunmap(in_page); 150 + put_page(in_page); 151 + } 152 + in_page = find_get_page(mapping, 153 + start >> PAGE_SHIFT); 154 + data_in = kmap(in_page); 155 + start += PAGE_SIZE; 156 + workspace->strm.next_in = data_in; 157 + } 158 + workspace->strm.avail_in = min(bytes_left, 159 + (unsigned long) workspace->buf_size); 160 + } 161 + 139 162 ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH); 140 163 if (ret != Z_OK) { 141 164 pr_debug("BTRFS: deflate in loop returned %d\n", ··· 216 161 /* we're all done */ 217 162 if (workspace->strm.total_in >= len) 218 163 break; 219 - 220 - /* we've read in a full page, get a new one */ 221 - if (workspace->strm.avail_in == 0) { 222 - if (workspace->strm.total_out > max_out) 223 - break; 224 - 225 - bytes_left = len - workspace->strm.total_in; 226 - kunmap(in_page); 227 - put_page(in_page); 228 - 229 - start += PAGE_SIZE; 230 - in_page = find_get_page(mapping, 231 - start >> PAGE_SHIFT); 232 - data_in = kmap(in_page); 233 - workspace->strm.avail_in = min(bytes_left, 234 - PAGE_SIZE); 235 - workspace->strm.next_in = data_in; 236 
- } 164 + if (workspace->strm.total_out > max_out) 165 + break; 237 166 } 238 167 workspace->strm.avail_in = 0; 239 - ret = zlib_deflate(&workspace->strm, Z_FINISH); 240 - zlib_deflateEnd(&workspace->strm); 241 - 242 - if (ret != Z_STREAM_END) { 243 - ret = -EIO; 244 - goto out; 168 + /* 169 + * Call deflate with Z_FINISH flush parameter providing more output 170 + * space but no more input data, until it returns with Z_STREAM_END. 171 + */ 172 + while (ret != Z_STREAM_END) { 173 + ret = zlib_deflate(&workspace->strm, Z_FINISH); 174 + if (ret == Z_STREAM_END) 175 + break; 176 + if (ret != Z_OK && ret != Z_BUF_ERROR) { 177 + zlib_deflateEnd(&workspace->strm); 178 + ret = -EIO; 179 + goto out; 180 + } else if (workspace->strm.avail_out == 0) { 181 + /* get another page for the stream end */ 182 + kunmap(out_page); 183 + if (nr_pages == nr_dest_pages) { 184 + out_page = NULL; 185 + ret = -E2BIG; 186 + goto out; 187 + } 188 + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM); 189 + if (out_page == NULL) { 190 + ret = -ENOMEM; 191 + goto out; 192 + } 193 + cpage_out = kmap(out_page); 194 + pages[nr_pages] = out_page; 195 + nr_pages++; 196 + workspace->strm.avail_out = PAGE_SIZE; 197 + workspace->strm.next_out = cpage_out; 198 + } 245 199 } 200 + zlib_deflateEnd(&workspace->strm); 246 201 247 202 if (workspace->strm.total_out >= workspace->strm.total_in) { 248 203 ret = -E2BIG; ··· 296 231 297 232 workspace->strm.total_out = 0; 298 233 workspace->strm.next_out = workspace->buf; 299 - workspace->strm.avail_out = PAGE_SIZE; 234 + workspace->strm.avail_out = workspace->buf_size; 300 235 301 236 /* If it's deflate, and it's got no preset dictionary, then 302 237 we can tell zlib to skip the adler32 check. 
*/ ··· 335 270 } 336 271 337 272 workspace->strm.next_out = workspace->buf; 338 - workspace->strm.avail_out = PAGE_SIZE; 273 + workspace->strm.avail_out = workspace->buf_size; 339 274 340 275 if (workspace->strm.avail_in == 0) { 341 276 unsigned long tmp; ··· 385 320 workspace->strm.total_in = 0; 386 321 387 322 workspace->strm.next_out = workspace->buf; 388 - workspace->strm.avail_out = PAGE_SIZE; 323 + workspace->strm.avail_out = workspace->buf_size; 389 324 workspace->strm.total_out = 0; 390 325 /* If it's deflate, and it's got no preset dictionary, then 391 326 we can tell zlib to skip the adler32 check. */ ··· 429 364 buf_offset = 0; 430 365 431 366 bytes = min(PAGE_SIZE - pg_offset, 432 - PAGE_SIZE - buf_offset); 367 + PAGE_SIZE - (buf_offset % PAGE_SIZE)); 433 368 bytes = min(bytes, bytes_left); 434 369 435 370 kaddr = kmap_atomic(dest_page); ··· 440 375 bytes_left -= bytes; 441 376 next: 442 377 workspace->strm.next_out = workspace->buf; 443 - workspace->strm.avail_out = PAGE_SIZE; 378 + workspace->strm.avail_out = workspace->buf_size; 444 379 } 445 380 446 381 if (ret != Z_STREAM_END && bytes_left != 0)
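The refill branch added to the compress loop batches input pages into the workspace buffer. Its arithmetic (how many pages to copy, how many bytes zlib may consume) can be restated as two pure helpers; this is a sketch using the kernel's `DIV_ROUND_UP` definition, with hypothetical function names:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* How many input pages to copy into the workspace buffer on refill, given
 * the bytes still to compress and the buffer size (one page, or four when
 * dfltcc hardware support enlarged the buffer). */
static unsigned long refill_pages(unsigned long bytes_left,
				  unsigned long buf_size)
{
	unsigned long want = DIV_ROUND_UP(bytes_left, PAGE_SIZE);
	unsigned long have = buf_size / PAGE_SIZE;

	return want < have ? want : have;
}

/* avail_in after the refill: everything that fit into the buffer. */
static unsigned long refill_avail_in(unsigned long bytes_left,
				     unsigned long buf_size)
{
	return bytes_left < buf_size ? bytes_left : buf_size;
}
```

With a 4-page buffer and 100000 bytes left, the refill copies 4 pages and exposes 16384 bytes to `zlib_deflate()`; with only 5000 bytes left it copies 2 pages and exposes 5000. The single-page fallback degenerates to the pre-patch one-page-at-a-time behavior.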
+5
fs/exec.c
··· 760 760 goto out_unlock; 761 761 BUG_ON(prev != vma); 762 762 763 + if (unlikely(vm_flags & VM_EXEC)) { 764 + pr_warn_once("process '%pD4' started with executable stack\n", 765 + bprm->file); 766 + } 767 + 763 768 /* Move stack pages down in memory. */ 764 769 if (stack_shift) { 765 770 ret = shift_arg_pages(vma, stack_shift);
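The new warning uses `pr_warn_once()`, so the message fires at most once per boot no matter how many processes start with an executable stack. A minimal userspace sketch of that once-only idiom (the kernel version additionally handles SMP races; this toy does not):

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the warning was emitted on this call, 0 if suppressed. */
static int warn_once_exec_stack(const char *comm)
{
	static int warned;

	if (warned)
		return 0;
	warned = 1;
	fprintf(stderr, "process '%s' started with executable stack\n", comm);
	return 1;
}
```

The first offender is named in the log; later offenders are silent, which keeps the warning useful without flooding dmesg.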
+1 -1
fs/fs-writeback.c
··· 2063 2063 struct bdi_writeback, dwork); 2064 2064 long pages_written; 2065 2065 2066 - set_worker_desc("flush-%s", dev_name(wb->bdi->dev)); 2066 + set_worker_desc("flush-%s", bdi_dev_name(wb->bdi)); 2067 2067 current->flags |= PF_SWAPWRITE; 2068 2068 2069 2069 if (likely(!current_is_workqueue_rescuer() ||
+3 -3
fs/io_uring.c
··· 6005 6005 struct io_mapped_ubuf *imu = &ctx->user_bufs[i]; 6006 6006 6007 6007 for (j = 0; j < imu->nr_bvecs; j++) 6008 - put_user_page(imu->bvec[j].bv_page); 6008 + unpin_user_page(imu->bvec[j].bv_page); 6009 6009 6010 6010 if (ctx->account_mem) 6011 6011 io_unaccount_mem(ctx->user, imu->nr_bvecs); ··· 6126 6126 6127 6127 ret = 0; 6128 6128 down_read(&current->mm->mmap_sem); 6129 - pret = get_user_pages(ubuf, nr_pages, 6129 + pret = pin_user_pages(ubuf, nr_pages, 6130 6130 FOLL_WRITE | FOLL_LONGTERM, 6131 6131 pages, vmas); 6132 6132 if (pret == nr_pages) { ··· 6150 6150 * release any pages we did get 6151 6151 */ 6152 6152 if (pret > 0) 6153 - put_user_pages(pages, pret); 6153 + unpin_user_pages(pages, pret); 6154 6154 if (ctx->account_mem) 6155 6155 io_unaccount_mem(ctx->user, nr_pages); 6156 6156 kvfree(imu->bvec);
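The io_uring hunks are a mechanical conversion from `get_user_pages()`/`put_user_page()` to `pin_user_pages()`/`unpin_user_page()`. The point of the rename is that pinned and merely-referenced pages must be released through different paths. A toy model (mock types, not the kernel's) that makes a mismatched release observable:

```c
#include <assert.h>

/* Toy page: FOLL_GET-style references drop via put; FOLL_PIN-style
 * references drop via unpin, which also releases the reference. */
struct mock_page {
	int refcount;
	int pincount;
};

static void mock_pin_user_page(struct mock_page *p)
{
	p->refcount++;
	p->pincount++;
}

static void mock_unpin_user_page(struct mock_page *p)
{
	p->pincount--;
	p->refcount--;
}

static void mock_get_page(struct mock_page *p)  { p->refcount++; }
static void mock_put_page(struct mock_page *p)  { p->refcount--; }
```

Pairing pin with put (instead of unpin) leaves a stale pin count behind; that is exactly the bug class the renamed API makes greppable.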
+1 -1
fs/ocfs2/cluster/quorum.c
··· 73 73 "system by restarting ***\n"); 74 74 emergency_restart(); 75 75 break; 76 - }; 76 + } 77 77 } 78 78 79 79 /* Indicate that a timeout occurred on a heartbeat region write. The
-2
fs/ocfs2/dlm/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 - ccflags-y := -I $(srctree)/$(src)/.. 3 - 4 2 obj-$(CONFIG_OCFS2_FS_O2CB) += ocfs2_dlm.o 5 3 6 4 ocfs2_dlm-objs := dlmdomain.o dlmdebug.o dlmthread.o dlmrecovery.o \
+4 -4
fs/ocfs2/dlm/dlmast.c
··· 23 23 #include <linux/spinlock.h> 24 24 25 25 26 - #include "cluster/heartbeat.h" 27 - #include "cluster/nodemanager.h" 28 - #include "cluster/tcp.h" 26 + #include "../cluster/heartbeat.h" 27 + #include "../cluster/nodemanager.h" 28 + #include "../cluster/tcp.h" 29 29 30 30 #include "dlmapi.h" 31 31 #include "dlmcommon.h" 32 32 33 33 #define MLOG_MASK_PREFIX ML_DLM 34 - #include "cluster/masklog.h" 34 + #include "../cluster/masklog.h" 35 35 36 36 static void dlm_update_lvb(struct dlm_ctxt *dlm, struct dlm_lock_resource *res, 37 37 struct dlm_lock *lock);
-4
fs/ocfs2/dlm/dlmcommon.h
··· 688 688 __be32 pad2; 689 689 }; 690 690 691 - 692 - #define BITS_PER_BYTE 8 693 - #define BITS_TO_BYTES(bits) (((bits)+BITS_PER_BYTE-1)/BITS_PER_BYTE) 694 - 695 691 struct dlm_query_join_request 696 692 { 697 693 u8 node_idx;
+4 -4
fs/ocfs2/dlm/dlmconvert.c
··· 23 23 #include <linux/spinlock.h> 24 24 25 25 26 - #include "cluster/heartbeat.h" 27 - #include "cluster/nodemanager.h" 28 - #include "cluster/tcp.h" 26 + #include "../cluster/heartbeat.h" 27 + #include "../cluster/nodemanager.h" 28 + #include "../cluster/tcp.h" 29 29 30 30 #include "dlmapi.h" 31 31 #include "dlmcommon.h" ··· 33 33 #include "dlmconvert.h" 34 34 35 35 #define MLOG_MASK_PREFIX ML_DLM 36 - #include "cluster/masklog.h" 36 + #include "../cluster/masklog.h" 37 37 38 38 /* NOTE: __dlmconvert_master is the only function in here that 39 39 * needs a spinlock held on entry (res->spinlock) and it is the
+4 -4
fs/ocfs2/dlm/dlmdebug.c
··· 17 17 #include <linux/debugfs.h> 18 18 #include <linux/export.h> 19 19 20 - #include "cluster/heartbeat.h" 21 - #include "cluster/nodemanager.h" 22 - #include "cluster/tcp.h" 20 + #include "../cluster/heartbeat.h" 21 + #include "../cluster/nodemanager.h" 22 + #include "../cluster/tcp.h" 23 23 24 24 #include "dlmapi.h" 25 25 #include "dlmcommon.h" ··· 27 27 #include "dlmdebug.h" 28 28 29 29 #define MLOG_MASK_PREFIX ML_DLM 30 - #include "cluster/masklog.h" 30 + #include "../cluster/masklog.h" 31 31 32 32 static int stringify_lockname(const char *lockname, int locklen, char *buf, 33 33 int len);
+4 -4
fs/ocfs2/dlm/dlmdomain.c
··· 20 20 #include <linux/debugfs.h> 21 21 #include <linux/sched/signal.h> 22 22 23 - #include "cluster/heartbeat.h" 24 - #include "cluster/nodemanager.h" 25 - #include "cluster/tcp.h" 23 + #include "../cluster/heartbeat.h" 24 + #include "../cluster/nodemanager.h" 25 + #include "../cluster/tcp.h" 26 26 27 27 #include "dlmapi.h" 28 28 #include "dlmcommon.h" ··· 30 30 #include "dlmdebug.h" 31 31 32 32 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_DOMAIN) 33 - #include "cluster/masklog.h" 33 + #include "../cluster/masklog.h" 34 34 35 35 /* 36 36 * ocfs2 node maps are array of long int, which limits to send them freely
+4 -4
fs/ocfs2/dlm/dlmlock.c
··· 25 25 #include <linux/delay.h> 26 26 27 27 28 - #include "cluster/heartbeat.h" 29 - #include "cluster/nodemanager.h" 30 - #include "cluster/tcp.h" 28 + #include "../cluster/heartbeat.h" 29 + #include "../cluster/nodemanager.h" 30 + #include "../cluster/tcp.h" 31 31 32 32 #include "dlmapi.h" 33 33 #include "dlmcommon.h" ··· 35 35 #include "dlmconvert.h" 36 36 37 37 #define MLOG_MASK_PREFIX ML_DLM 38 - #include "cluster/masklog.h" 38 + #include "../cluster/masklog.h" 39 39 40 40 static struct kmem_cache *dlm_lock_cache; 41 41
+4 -6
fs/ocfs2/dlm/dlmmaster.c
··· 25 25 #include <linux/delay.h> 26 26 27 27 28 - #include "cluster/heartbeat.h" 29 - #include "cluster/nodemanager.h" 30 - #include "cluster/tcp.h" 28 + #include "../cluster/heartbeat.h" 29 + #include "../cluster/nodemanager.h" 30 + #include "../cluster/tcp.h" 31 31 32 32 #include "dlmapi.h" 33 33 #include "dlmcommon.h" ··· 35 35 #include "dlmdebug.h" 36 36 37 37 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_MASTER) 38 - #include "cluster/masklog.h" 38 + #include "../cluster/masklog.h" 39 39 40 40 static void dlm_mle_node_down(struct dlm_ctxt *dlm, 41 41 struct dlm_master_list_entry *mle, ··· 2553 2553 2554 2554 if (!dlm_grab(dlm)) 2555 2555 return -EINVAL; 2556 - 2557 - BUG_ON(target == O2NM_MAX_NODES); 2558 2556 2559 2557 name = res->lockname.name; 2560 2558 namelen = res->lockname.len;
+5 -5
fs/ocfs2/dlm/dlmrecovery.c
··· 26 26 #include <linux/delay.h> 27 27 28 28 29 - #include "cluster/heartbeat.h" 30 - #include "cluster/nodemanager.h" 31 - #include "cluster/tcp.h" 29 + #include "../cluster/heartbeat.h" 30 + #include "../cluster/nodemanager.h" 31 + #include "../cluster/tcp.h" 32 32 33 33 #include "dlmapi.h" 34 34 #include "dlmcommon.h" 35 35 #include "dlmdomain.h" 36 36 37 37 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_RECOVERY) 38 - #include "cluster/masklog.h" 38 + #include "../cluster/masklog.h" 39 39 40 40 static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node); 41 41 ··· 1668 1668 int dlm_do_master_requery(struct dlm_ctxt *dlm, struct dlm_lock_resource *res, 1669 1669 u8 nodenum, u8 *real_master) 1670 1670 { 1671 - int ret = -EINVAL; 1671 + int ret; 1672 1672 struct dlm_master_requery req; 1673 1673 int status = DLM_LOCK_RES_OWNER_UNKNOWN; 1674 1674
+4 -4
fs/ocfs2/dlm/dlmthread.c
··· 25 25 #include <linux/delay.h> 26 26 27 27 28 - #include "cluster/heartbeat.h" 29 - #include "cluster/nodemanager.h" 30 - #include "cluster/tcp.h" 28 + #include "../cluster/heartbeat.h" 29 + #include "../cluster/nodemanager.h" 30 + #include "../cluster/tcp.h" 31 31 32 32 #include "dlmapi.h" 33 33 #include "dlmcommon.h" 34 34 #include "dlmdomain.h" 35 35 36 36 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_THREAD) 37 - #include "cluster/masklog.h" 37 + #include "../cluster/masklog.h" 38 38 39 39 static int dlm_thread(void *data); 40 40 static void dlm_flush_asts(struct dlm_ctxt *dlm);
+4 -4
fs/ocfs2/dlm/dlmunlock.c
··· 23 23 #include <linux/spinlock.h> 24 24 #include <linux/delay.h> 25 25 26 - #include "cluster/heartbeat.h" 27 - #include "cluster/nodemanager.h" 28 - #include "cluster/tcp.h" 26 + #include "../cluster/heartbeat.h" 27 + #include "../cluster/nodemanager.h" 28 + #include "../cluster/tcp.h" 29 29 30 30 #include "dlmapi.h" 31 31 #include "dlmcommon.h" 32 32 33 33 #define MLOG_MASK_PREFIX ML_DLM 34 - #include "cluster/masklog.h" 34 + #include "../cluster/masklog.h" 35 35 36 36 #define DLM_UNLOCK_FREE_LOCK 0x00000001 37 37 #define DLM_UNLOCK_CALL_AST 0x00000002
-2
fs/ocfs2/dlmfs/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 - ccflags-y := -I $(srctree)/$(src)/.. 3 - 4 2 obj-$(CONFIG_OCFS2_FS) += ocfs2_dlmfs.o 5 3 6 4 ocfs2_dlmfs-objs := userdlm.o dlmfs.o
+2 -2
fs/ocfs2/dlmfs/dlmfs.c
··· 33 33 34 34 #include <linux/uaccess.h> 35 35 36 - #include "stackglue.h" 36 + #include "../stackglue.h" 37 37 #include "userdlm.h" 38 38 39 39 #define MLOG_MASK_PREFIX ML_DLMFS 40 - #include "cluster/masklog.h" 40 + #include "../cluster/masklog.h" 41 41 42 42 43 43 static const struct super_operations dlmfs_ops;
+3 -3
fs/ocfs2/dlmfs/userdlm.c
··· 21 21 #include <linux/types.h> 22 22 #include <linux/crc32.h> 23 23 24 - #include "ocfs2_lockingver.h" 25 - #include "stackglue.h" 24 + #include "../ocfs2_lockingver.h" 25 + #include "../stackglue.h" 26 26 #include "userdlm.h" 27 27 28 28 #define MLOG_MASK_PREFIX ML_DLMFS 29 - #include "cluster/masklog.h" 29 + #include "../cluster/masklog.h" 30 30 31 31 32 32 static inline struct user_lock_res *user_lksb_to_lock_res(struct ocfs2_dlm_lksb *lksb)
+1 -1
fs/ocfs2/dlmglue.c
··· 570 570 mlog_bug_on_msg(1, "type: %d\n", type); 571 571 ops = NULL; /* thanks, gcc */ 572 572 break; 573 - }; 573 + } 574 574 575 575 ocfs2_build_lock_name(type, OCFS2_I(inode)->ip_blkno, 576 576 generation, res->l_name);
+5 -3
fs/ocfs2/journal.h
··· 597 597 { 598 598 struct ocfs2_inode_info *oi = OCFS2_I(inode); 599 599 600 - oi->i_sync_tid = handle->h_transaction->t_tid; 601 - if (datasync) 602 - oi->i_datasync_tid = handle->h_transaction->t_tid; 600 + if (!is_handle_aborted(handle)) { 601 + oi->i_sync_tid = handle->h_transaction->t_tid; 602 + if (datasync) 603 + oi->i_datasync_tid = handle->h_transaction->t_tid; 604 + } 603 605 } 604 606 605 607 #endif /* OCFS2_JOURNAL_H */
+1 -2
fs/ocfs2/namei.c
··· 586 586 mlog_errno(status); 587 587 } 588 588 589 - oi->i_sync_tid = handle->h_transaction->t_tid; 590 - oi->i_datasync_tid = handle->h_transaction->t_tid; 589 + ocfs2_update_inode_fsync_trans(handle, inode, 1); 591 590 592 591 leave: 593 592 if (status < 0) {
+2 -1
fs/reiserfs/stree.c
··· 2240 2240 /* also releases the path */ 2241 2241 unfix_nodes(&s_ins_balance); 2242 2242 #ifdef REISERQUOTA_DEBUG 2243 - reiserfs_debug(th->t_super, REISERFS_DEBUG_CODE, 2243 + if (inode) 2244 + reiserfs_debug(th->t_super, REISERFS_DEBUG_CODE, 2244 2245 "reiserquota insert_item(): freeing %u id=%u type=%c", 2245 2246 quota_bytes, inode->i_uid, head2type(ih)); 2246 2247 #endif
+10
include/linux/backing-dev.h
··· 13 13 #include <linux/fs.h> 14 14 #include <linux/sched.h> 15 15 #include <linux/blkdev.h> 16 + #include <linux/device.h> 16 17 #include <linux/writeback.h> 17 18 #include <linux/blk-cgroup.h> 18 19 #include <linux/backing-dev-defs.h> ··· 503 502 { 504 503 return bdi_congested(bdi, (1 << WB_sync_congested) | 505 504 (1 << WB_async_congested)); 505 + } 506 + 507 + extern const char *bdi_unknown_name; 508 + 509 + static inline const char *bdi_dev_name(struct backing_dev_info *bdi) 510 + { 511 + if (!bdi || !bdi->dev) 512 + return bdi_unknown_name; 513 + return dev_name(bdi->dev); 506 514 } 507 515 508 516 #endif /* _LINUX_BACKING_DEV_H */
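`bdi_dev_name()` is a NULL-safe accessor: it hands back a fixed fallback string instead of letting callers such as fs-writeback dereference a device that is gone or was never registered. A userspace restatement of the pattern (the kernel's fallback string is `bdi_unknown_name`; the name below is illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *dev_name_or_unknown(const char *dev_name)
{
	static const char unknown[] = "(unknown)";

	return dev_name ? dev_name : unknown;
}
```

Callers can format the result unconditionally ("flush-%s"), which is why the fs-writeback hunk above becomes safe without any extra branching at the call site.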
+1
include/linux/bitops.h
··· 13 13 14 14 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE) 15 15 #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(long)) 16 + #define BITS_TO_BYTES(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(char)) 16 17 17 18 extern unsigned int __sw_hweight8(unsigned int w); 18 19 extern unsigned int __sw_hweight16(unsigned int w);
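The new `BITS_TO_BYTES()` hoists the macro that ocfs2's dlmcommon.h used to define privately (and which the companion hunk deletes) into bitops.h. It can be exercised directly in userspace by supplying the kernel's `DIV_ROUND_UP` and `BITS_PER_BYTE` definitions:

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE 8
#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* As added to include/linux/bitops.h: bytes needed to hold nr bits. */
#define BITS_TO_BYTES(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(char))
```

Nine bits need two bytes, sixty-four need eight; the rounding-up is the whole point of routing through `DIV_ROUND_UP` rather than a plain shift.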
+5 -1
include/linux/fs.h
··· 2737 2737 2738 2738 extern bool filemap_range_has_page(struct address_space *, loff_t lstart, 2739 2739 loff_t lend); 2740 - extern int filemap_write_and_wait(struct address_space *mapping); 2741 2740 extern int filemap_write_and_wait_range(struct address_space *mapping, 2742 2741 loff_t lstart, loff_t lend); 2743 2742 extern int __filemap_fdatawrite_range(struct address_space *mapping, ··· 2745 2746 loff_t start, loff_t end); 2746 2747 extern int filemap_check_errors(struct address_space *mapping); 2747 2748 extern void __filemap_set_wb_err(struct address_space *mapping, int err); 2749 + 2750 + static inline int filemap_write_and_wait(struct address_space *mapping) 2751 + { 2752 + return filemap_write_and_wait_range(mapping, 0, LLONG_MAX); 2753 + } 2748 2754 2749 2755 extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, 2750 2756 loff_t lend);
+2 -3
include/linux/io-mapping.h
··· 28 28 29 29 #ifdef CONFIG_HAVE_ATOMIC_IOMAP 30 30 31 + #include <linux/pfn.h> 31 32 #include <asm/iomap.h> 32 33 /* 33 34 * For small address space machines, mapping large objects ··· 65 64 unsigned long offset) 66 65 { 67 66 resource_size_t phys_addr; 68 - unsigned long pfn; 69 67 70 68 BUG_ON(offset >= mapping->size); 71 69 phys_addr = mapping->base + offset; 72 - pfn = (unsigned long) (phys_addr >> PAGE_SHIFT); 73 - return iomap_atomic_prot_pfn(pfn, mapping->prot); 70 + return iomap_atomic_prot_pfn(PHYS_PFN(phys_addr), mapping->prot); 74 71 } 75 72 76 73 static inline void
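The io-mapping hunk replaces an open-coded shift with `PHYS_PFN()` from linux/pfn.h, which is simply `phys >> PAGE_SHIFT`. Restated for a 4 KiB page size (PAGE_SHIFT varies by architecture; 12 is assumed here):

```c
#include <assert.h>

#define PAGE_SHIFT 12
/* As in include/linux/pfn.h: physical address to page frame number. */
#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))
```

The macro removes the local `pfn` variable and its cast, so `io_mapping_map_atomic_wc()` reads as a single expression; behavior is unchanged.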
+3 -4
include/linux/memblock.h
··· 113 113 int memblock_remove(phys_addr_t base, phys_addr_t size); 114 114 int memblock_free(phys_addr_t base, phys_addr_t size); 115 115 int memblock_reserve(phys_addr_t base, phys_addr_t size); 116 + #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP 117 + int memblock_physmem_add(phys_addr_t base, phys_addr_t size); 118 + #endif 116 119 void memblock_trim_memory(phys_addr_t align); 117 120 bool memblock_overlaps_region(struct memblock_type *type, 118 121 phys_addr_t base, phys_addr_t size); ··· 130 127 void reset_all_zones_managed_pages(void); 131 128 132 129 /* Low level functions */ 133 - int memblock_add_range(struct memblock_type *type, 134 - phys_addr_t base, phys_addr_t size, 135 - int nid, enum memblock_flags flags); 136 - 137 130 void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags, 138 131 struct memblock_type *type_a, 139 132 struct memblock_type *type_b, phys_addr_t *out_start,
-29
include/linux/memory.h
··· 29 29 int section_count; /* serialized by mem_sysfs_mutex */ 30 30 int online_type; /* for passing data to online routine */ 31 31 int phys_device; /* to which fru does this belong? */ 32 - void *hw; /* optional pointer to fw/hw data */ 33 - int (*phys_callback)(struct memory_block *); 34 32 struct device dev; 35 33 int nid; /* NID for this memory block */ 36 34 }; ··· 51 53 int status_change_nid_normal; 52 54 int status_change_nid_high; 53 55 int status_change_nid; 54 - }; 55 - 56 - /* 57 - * During pageblock isolation, count the number of pages within the 58 - * range [start_pfn, start_pfn + nr_pages) which are owned by code 59 - * in the notifier chain. 60 - */ 61 - #define MEM_ISOLATE_COUNT (1<<0) 62 - 63 - struct memory_isolate_notify { 64 - unsigned long start_pfn; /* Start of range to check */ 65 - unsigned int nr_pages; /* # pages in range to check */ 66 - unsigned int pages_found; /* # pages owned found by callbacks */ 67 56 }; 68 57 69 58 struct notifier_block; ··· 79 94 { 80 95 return 0; 81 96 } 82 - static inline int register_memory_isolate_notifier(struct notifier_block *nb) 83 - { 84 - return 0; 85 - } 86 - static inline void unregister_memory_isolate_notifier(struct notifier_block *nb) 87 - { 88 - } 89 - static inline int memory_isolate_notify(unsigned long val, void *v) 90 - { 91 - return 0; 92 - } 93 97 #else 94 98 extern int register_memory_notifier(struct notifier_block *nb); 95 99 extern void unregister_memory_notifier(struct notifier_block *nb); 96 - extern int register_memory_isolate_notifier(struct notifier_block *nb); 97 - extern void unregister_memory_isolate_notifier(struct notifier_block *nb); 98 100 int create_memory_block_devices(unsigned long start, unsigned long size); 99 101 void remove_memory_block_devices(unsigned long start, unsigned long size); 100 102 extern void memory_dev_init(void); 101 103 extern int memory_notify(unsigned long val, void *v); 102 - extern int memory_isolate_notify(unsigned long val, void *v); 103 104 
extern struct memory_block *find_memory_block(struct mem_section *); 104 105 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *); 105 106 extern int walk_memory_blocks(unsigned long start, unsigned long size,
+2 -1
include/linux/memory_hotplug.h
··· 94 94 extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages); 95 95 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro); 96 96 /* VM interface that may be used by firmware interface */ 97 - extern int online_pages(unsigned long, unsigned long, int); 97 + extern int online_pages(unsigned long pfn, unsigned long nr_pages, 98 + int online_type, int nid); 98 99 extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn, 99 100 unsigned long *valid_start, unsigned long *valid_end); 100 101 extern unsigned long __offline_isolated_pages(unsigned long start_pfn,
+70 -34
include/linux/mm.h
··· 70 70 atomic_long_add(count, &_totalram_pages); 71 71 } 72 72 73 - static inline void totalram_pages_set(long val) 74 - { 75 - atomic_long_set(&_totalram_pages, val); 76 - } 77 - 78 73 extern void * high_memory; 79 74 extern int page_cluster; 80 75 ··· 911 916 912 917 #define ZONEID_PGSHIFT (ZONEID_PGOFF * (ZONEID_SHIFT != 0)) 913 918 914 - #if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS 915 - #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS 916 - #endif 917 - 918 919 #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1) 919 920 #define NODES_MASK ((1UL << NODES_WIDTH) - 1) 920 921 #define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1) ··· 938 947 #endif 939 948 940 949 #ifdef CONFIG_DEV_PAGEMAP_OPS 941 - void __put_devmap_managed_page(struct page *page); 950 + void free_devmap_managed_page(struct page *page); 942 951 DECLARE_STATIC_KEY_FALSE(devmap_managed_key); 943 - static inline bool put_devmap_managed_page(struct page *page) 952 + 953 + static inline bool page_is_devmap_managed(struct page *page) 944 954 { 945 955 if (!static_branch_unlikely(&devmap_managed_key)) 946 956 return false; ··· 950 958 switch (page->pgmap->type) { 951 959 case MEMORY_DEVICE_PRIVATE: 952 960 case MEMORY_DEVICE_FS_DAX: 953 - __put_devmap_managed_page(page); 954 961 return true; 955 962 default: 956 963 break; ··· 957 966 return false; 958 967 } 959 968 969 + void put_devmap_managed_page(struct page *page); 970 + 960 971 #else /* CONFIG_DEV_PAGEMAP_OPS */ 961 - static inline bool put_devmap_managed_page(struct page *page) 972 + static inline bool page_is_devmap_managed(struct page *page) 962 973 { 963 974 return false; 975 + } 976 + 977 + static inline void put_devmap_managed_page(struct page *page) 978 + { 964 979 } 965 980 #endif /* CONFIG_DEV_PAGEMAP_OPS */ 966 981 ··· 1020 1023 * need to inform the device driver through callback. See 1021 1024 * include/linux/memremap.h and HMM for details. 
1022 1025 */ 1023 - if (put_devmap_managed_page(page)) 1026 + if (page_is_devmap_managed(page)) { 1027 + put_devmap_managed_page(page); 1024 1028 return; 1029 + } 1025 1030 1026 1031 if (put_page_testzero(page)) 1027 1032 __put_page(page); 1028 1033 } 1029 1034 1030 1035 /** 1031 - * put_user_page() - release a gup-pinned page 1036 + * unpin_user_page() - release a gup-pinned page 1032 1037 * @page: pointer to page to be released 1033 1038 * 1034 - * Pages that were pinned via get_user_pages*() must be released via 1035 - * either put_user_page(), or one of the put_user_pages*() routines 1036 - * below. This is so that eventually, pages that are pinned via 1037 - * get_user_pages*() can be separately tracked and uniquely handled. In 1038 - * particular, interactions with RDMA and filesystems need special 1039 - * handling. 1039 + * Pages that were pinned via pin_user_pages*() must be released via either 1040 + * unpin_user_page(), or one of the unpin_user_pages*() routines. This is so 1041 + * that eventually such pages can be separately tracked and uniquely handled. In 1042 + * particular, interactions with RDMA and filesystems need special handling. 1040 1043 * 1041 - * put_user_page() and put_page() are not interchangeable, despite this early 1042 - * implementation that makes them look the same. put_user_page() calls must 1043 - * be perfectly matched up with get_user_page() calls. 1044 + * unpin_user_page() and put_page() are not interchangeable, despite this early 1045 + * implementation that makes them look the same. unpin_user_page() calls must 1046 + * be perfectly matched up with pin*() calls. 
1044 1047 */ 1045 - static inline void put_user_page(struct page *page) 1048 + static inline void unpin_user_page(struct page *page) 1046 1049 { 1047 1050 put_page(page); 1048 1051 } 1049 1052 1050 - void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, 1051 - bool make_dirty); 1053 + void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, 1054 + bool make_dirty); 1052 1055 1053 - void put_user_pages(struct page **pages, unsigned long npages); 1056 + void unpin_user_pages(struct page **pages, unsigned long npages); 1054 1057 1055 1058 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) 1056 1059 #define SECTION_IN_PAGE_FLAGS ··· 1498 1501 unsigned long start, unsigned long nr_pages, 1499 1502 unsigned int gup_flags, struct page **pages, 1500 1503 struct vm_area_struct **vmas, int *locked); 1504 + long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, 1505 + unsigned long start, unsigned long nr_pages, 1506 + unsigned int gup_flags, struct page **pages, 1507 + struct vm_area_struct **vmas, int *locked); 1501 1508 long get_user_pages(unsigned long start, unsigned long nr_pages, 1502 1509 unsigned int gup_flags, struct page **pages, 1503 1510 struct vm_area_struct **vmas); 1511 + long pin_user_pages(unsigned long start, unsigned long nr_pages, 1512 + unsigned int gup_flags, struct page **pages, 1513 + struct vm_area_struct **vmas); 1504 1514 long get_user_pages_locked(unsigned long start, unsigned long nr_pages, 1505 1515 unsigned int gup_flags, struct page **pages, int *locked); 1506 1516 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, 1507 1517 struct page **pages, unsigned int gup_flags); 1508 1518 1509 1519 int get_user_pages_fast(unsigned long start, int nr_pages, 1520 + unsigned int gup_flags, struct page **pages); 1521 + int pin_user_pages_fast(unsigned long start, int nr_pages, 1510 1522 unsigned int gup_flags, struct page **pages); 1511 1523 1512 1524 int 
account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); ··· 2581 2575 #define FOLL_ANON 0x8000 /* don't do file mappings */ 2582 2576 #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ 2583 2577 #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ 2578 + #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ 2584 2579 2585 2580 /* 2586 - * NOTE on FOLL_LONGTERM: 2581 + * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each 2582 + * other. Here is what they mean, and how to use them: 2587 2583 * 2588 2584 * FOLL_LONGTERM indicates that the page will be held for an indefinite time 2589 - * period _often_ under userspace control. This is contrasted with 2590 - * iov_iter_get_pages() where usages which are transient. 2585 + * period _often_ under userspace control. This is in contrast to 2586 + * iov_iter_get_pages(), whose usages are transient. 2591 2587 * 2592 2588 * FIXME: For pages which are part of a filesystem, mappings are subject to the 2593 2589 * lifetime enforced by the filesystem and we need guarantees that longterm ··· 2604 2596 * Currently only get_user_pages() and get_user_pages_fast() support this flag 2605 2597 * and calls to get_user_pages_[un]locked are specifically not allowed. This 2606 2598 * is due to an incompatibility with the FS DAX check and 2607 - * FAULT_FLAG_ALLOW_RETRY 2599 + * FAULT_FLAG_ALLOW_RETRY. 2608 2600 * 2609 - * In the CMA case: longterm pins in a CMA region would unnecessarily fragment 2610 - * that region. And so CMA attempts to migrate the page before pinning when 2601 + * In the CMA case: long term pins in a CMA region would unnecessarily fragment 2602 + * that region. And so, CMA attempts to migrate the page before pinning, when 2611 2603 * FOLL_LONGTERM is specified. 2604 + * 2605 + * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount, 2606 + * but an additional pin counting system) will be invoked. 
This is intended for 2607 + * anything that gets a page reference and then touches page data (for example, 2608 + * Direct IO). This lets the filesystem know that some non-file-system entity is 2609 + * potentially changing the pages' data. In contrast to FOLL_GET (whose pages 2610 + * are released via put_page()), FOLL_PIN pages must be released, ultimately, by 2611 + * a call to unpin_user_page(). 2612 + * 2613 + * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different 2614 + * and separate refcounting mechanisms, however, and that means that each has 2615 + * its own acquire and release mechanisms: 2616 + * 2617 + * FOLL_GET: get_user_pages*() to acquire, and put_page() to release. 2618 + * 2619 + * FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release. 2620 + * 2621 + * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call. 2622 + * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based 2623 + * calls applied to them, and that's perfectly OK. This is a constraint on the 2624 + * callers, not on the pages.) 2625 + * 2626 + * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never 2627 + * directly by the caller. That's in order to help avoid mismatches when 2628 + * releasing pages: get_user_pages*() pages must be released via put_page(), 2629 + * while pin_user_pages*() pages must be released via unpin_user_page(). 2630 + * 2631 + * Please see Documentation/vm/pin_user_pages.rst for more information. 2612 2632 */ 2613 2633 2614 2634 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
+1 -1
include/linux/mmzone.h
··· 758 758 759 759 #ifdef CONFIG_NUMA 760 760 /* 761 - * zone reclaim becomes active if more unmapped pages exist. 761 + * node reclaim becomes active if more unmapped pages exist. 762 762 */ 763 763 unsigned long min_unmapped_pages; 764 764 unsigned long min_slab_pages;
+2 -2
include/linux/page-isolation.h
··· 33 33 #define MEMORY_OFFLINE 0x1 34 34 #define REPORT_FAILURE 0x2 35 35 36 - bool has_unmovable_pages(struct zone *zone, struct page *page, int count, 37 - int migratetype, int flags); 36 + struct page *has_unmovable_pages(struct zone *zone, struct page *page, 37 + int migratetype, int flags); 38 38 void set_pageblock_migratetype(struct page *page, int migratetype); 39 39 int move_freepages_block(struct zone *zone, struct page *page, 40 40 int migratetype, int *num_movable);
+1
include/linux/swab.h
··· 7 7 # define swab16 __swab16 8 8 # define swab32 __swab32 9 9 # define swab64 __swab64 10 + # define swab __swab 10 11 # define swahw32 __swahw32 11 12 # define swahb32 __swahb32 12 13 # define swab16p __swab16p
-11
include/linux/thermal.h
··· 32 32 /* use value, which < 0K, to indicate an invalid/uninitialized temperature */ 33 33 #define THERMAL_TEMP_INVALID -274000 34 34 35 - /* Unit conversion macros */ 36 - #define DECI_KELVIN_TO_CELSIUS(t) ({ \ 37 - long _t = (t); \ 38 - ((_t-2732 >= 0) ? (_t-2732+5)/10 : (_t-2732-5)/10); \ 39 - }) 40 - #define CELSIUS_TO_DECI_KELVIN(t) ((t)*10+2732) 41 - #define DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(t, off) (((t) - (off)) * 100) 42 - #define DECI_KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(t, 2732) 43 - #define MILLICELSIUS_TO_DECI_KELVIN_WITH_OFFSET(t, off) (((t) / 100) + (off)) 44 - #define MILLICELSIUS_TO_DECI_KELVIN(t) MILLICELSIUS_TO_DECI_KELVIN_WITH_OFFSET(t, 2732) 45 - 46 35 /* Default Thermal Governor */ 47 36 #if defined(CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE) 48 37 #define DEFAULT_THERMAL_GOVERNOR "step_wise"
+84
include/linux/units.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _LINUX_UNITS_H 3 + #define _LINUX_UNITS_H 4 + 5 + #include <linux/kernel.h> 6 + 7 + #define ABSOLUTE_ZERO_MILLICELSIUS -273150 8 + 9 + static inline long milli_kelvin_to_millicelsius(long t) 10 + { 11 + return t + ABSOLUTE_ZERO_MILLICELSIUS; 12 + } 13 + 14 + static inline long millicelsius_to_milli_kelvin(long t) 15 + { 16 + return t - ABSOLUTE_ZERO_MILLICELSIUS; 17 + } 18 + 19 + #define MILLIDEGREE_PER_DEGREE 1000 20 + #define MILLIDEGREE_PER_DECIDEGREE 100 21 + 22 + static inline long kelvin_to_millicelsius(long t) 23 + { 24 + return milli_kelvin_to_millicelsius(t * MILLIDEGREE_PER_DEGREE); 25 + } 26 + 27 + static inline long millicelsius_to_kelvin(long t) 28 + { 29 + t = millicelsius_to_milli_kelvin(t); 30 + 31 + return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DEGREE); 32 + } 33 + 34 + static inline long deci_kelvin_to_celsius(long t) 35 + { 36 + t = milli_kelvin_to_millicelsius(t * MILLIDEGREE_PER_DECIDEGREE); 37 + 38 + return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DEGREE); 39 + } 40 + 41 + static inline long celsius_to_deci_kelvin(long t) 42 + { 43 + t = millicelsius_to_milli_kelvin(t * MILLIDEGREE_PER_DEGREE); 44 + 45 + return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DECIDEGREE); 46 + } 47 + 48 + /** 49 + * deci_kelvin_to_millicelsius_with_offset - convert Kelvin to Celsius 50 + * @t: temperature value in decidegrees Kelvin 51 + * @offset: difference between Kelvin and Celsius in millidegrees 52 + * 53 + * Return: temperature value in millidegrees Celsius 54 + */ 55 + static inline long deci_kelvin_to_millicelsius_with_offset(long t, long offset) 56 + { 57 + return t * MILLIDEGREE_PER_DECIDEGREE - offset; 58 + } 59 + 60 + static inline long deci_kelvin_to_millicelsius(long t) 61 + { 62 + return milli_kelvin_to_millicelsius(t * MILLIDEGREE_PER_DECIDEGREE); 63 + } 64 + 65 + static inline long millicelsius_to_deci_kelvin(long t) 66 + { 67 + t = millicelsius_to_milli_kelvin(t); 68 + 69 + return DIV_ROUND_CLOSEST(t, 
MILLIDEGREE_PER_DECIDEGREE); 70 + } 71 + 72 + static inline long kelvin_to_celsius(long t) 73 + { 74 + return t + DIV_ROUND_CLOSEST(ABSOLUTE_ZERO_MILLICELSIUS, 75 + MILLIDEGREE_PER_DEGREE); 76 + } 77 + 78 + static inline long celsius_to_kelvin(long t) 79 + { 80 + return t - DIV_ROUND_CLOSEST(ABSOLUTE_ZERO_MILLICELSIUS, 81 + MILLIDEGREE_PER_DEGREE); 82 + } 83 + 84 + #endif /* _LINUX_UNITS_H */
+6
include/linux/zlib.h
··· 191 191 exceed those passed here. 192 192 */ 193 193 194 + extern int zlib_deflate_dfltcc_enabled (void); 195 + /* 196 + Returns 1 if Deflate-Conversion facility is installed and enabled, 197 + otherwise 0. 198 + */ 199 + 194 200 /* 195 201 extern int deflateInit (z_streamp strm, int level); 196 202
+2 -2
include/trace/events/kmem.h
··· 88 88 __entry->node = node; 89 89 ), 90 90 91 - TP_printk("call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d", 92 - __entry->call_site, 91 + TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d", 92 + (void *)__entry->call_site, 93 93 __entry->ptr, 94 94 __entry->bytes_req, 95 95 __entry->bytes_alloc,
+17 -20
include/trace/events/writeback.h
··· 67 67 68 68 TP_fast_assign( 69 69 strscpy_pad(__entry->name, 70 - mapping ? dev_name(inode_to_bdi(mapping->host)->dev) : "(unknown)", 71 - 32); 70 + bdi_dev_name(mapping ? inode_to_bdi(mapping->host) : 71 + NULL), 32); 72 72 __entry->ino = mapping ? mapping->host->i_ino : 0; 73 73 __entry->index = page->index; 74 74 ), ··· 111 111 struct backing_dev_info *bdi = inode_to_bdi(inode); 112 112 113 113 /* may be called for files on pseudo FSes w/ unregistered bdi */ 114 - strscpy_pad(__entry->name, 115 - bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32); 114 + strscpy_pad(__entry->name, bdi_dev_name(bdi), 32); 116 115 __entry->ino = inode->i_ino; 117 116 __entry->state = inode->i_state; 118 117 __entry->flags = flags; ··· 192 193 ), 193 194 194 195 TP_fast_assign( 195 - strncpy(__entry->name, dev_name(inode_to_bdi(inode)->dev), 32); 196 + strncpy(__entry->name, bdi_dev_name(inode_to_bdi(inode)), 32); 196 197 __entry->ino = inode->i_ino; 197 198 __entry->cgroup_ino = __trace_wbc_assign_cgroup(wbc); 198 199 __entry->history = history; ··· 221 222 ), 222 223 223 224 TP_fast_assign( 224 - strncpy(__entry->name, dev_name(old_wb->bdi->dev), 32); 225 + strncpy(__entry->name, bdi_dev_name(old_wb->bdi), 32); 225 226 __entry->ino = inode->i_ino; 226 227 __entry->old_cgroup_ino = __trace_wb_assign_cgroup(old_wb); 227 228 __entry->new_cgroup_ino = __trace_wb_assign_cgroup(new_wb); ··· 254 255 struct address_space *mapping = page_mapping(page); 255 256 struct inode *inode = mapping ? mapping->host : NULL; 256 257 257 - strncpy(__entry->name, dev_name(wb->bdi->dev), 32); 258 + strncpy(__entry->name, bdi_dev_name(wb->bdi), 32); 258 259 __entry->bdi_id = wb->bdi->id; 259 260 __entry->ino = inode ? 
inode->i_ino : 0; 260 261 __entry->memcg_id = wb->memcg_css->id; ··· 287 288 ), 288 289 289 290 TP_fast_assign( 290 - strncpy(__entry->name, dev_name(wb->bdi->dev), 32); 291 + strncpy(__entry->name, bdi_dev_name(wb->bdi), 32); 291 292 __entry->cgroup_ino = __trace_wb_assign_cgroup(wb); 292 293 __entry->frn_bdi_id = frn_bdi_id; 293 294 __entry->frn_memcg_id = frn_memcg_id; ··· 317 318 318 319 TP_fast_assign( 319 320 strscpy_pad(__entry->name, 320 - dev_name(inode_to_bdi(inode)->dev), 32); 321 + bdi_dev_name(inode_to_bdi(inode)), 32); 321 322 __entry->ino = inode->i_ino; 322 323 __entry->sync_mode = wbc->sync_mode; 323 324 __entry->cgroup_ino = __trace_wbc_assign_cgroup(wbc); ··· 360 361 __field(ino_t, cgroup_ino) 361 362 ), 362 363 TP_fast_assign( 363 - strscpy_pad(__entry->name, 364 - wb->bdi->dev ? dev_name(wb->bdi->dev) : 365 - "(unknown)", 32); 364 + strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32); 366 365 __entry->nr_pages = work->nr_pages; 367 366 __entry->sb_dev = work->sb ? 
work->sb->s_dev : 0; 368 367 __entry->sync_mode = work->sync_mode; ··· 413 416 __field(ino_t, cgroup_ino) 414 417 ), 415 418 TP_fast_assign( 416 - strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32); 419 + strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32); 417 420 __entry->cgroup_ino = __trace_wb_assign_cgroup(wb); 418 421 ), 419 422 TP_printk("bdi %s: cgroup_ino=%lu", ··· 435 438 __array(char, name, 32) 436 439 ), 437 440 TP_fast_assign( 438 - strscpy_pad(__entry->name, dev_name(bdi->dev), 32); 441 + strscpy_pad(__entry->name, bdi_dev_name(bdi), 32); 439 442 ), 440 443 TP_printk("bdi %s", 441 444 __entry->name ··· 460 463 ), 461 464 462 465 TP_fast_assign( 463 - strscpy_pad(__entry->name, dev_name(bdi->dev), 32); 466 + strscpy_pad(__entry->name, bdi_dev_name(bdi), 32); 464 467 __entry->nr_to_write = wbc->nr_to_write; 465 468 __entry->pages_skipped = wbc->pages_skipped; 466 469 __entry->sync_mode = wbc->sync_mode; ··· 511 514 ), 512 515 TP_fast_assign( 513 516 unsigned long *older_than_this = work->older_than_this; 514 - strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32); 517 + strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32); 515 518 __entry->older = older_than_this ? *older_than_this : 0; 516 519 __entry->age = older_than_this ? 
517 520 (jiffies - *older_than_this) * 1000 / HZ : -1; ··· 597 600 ), 598 601 599 602 TP_fast_assign( 600 - strscpy_pad(__entry->bdi, dev_name(wb->bdi->dev), 32); 603 + strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32); 601 604 __entry->write_bw = KBps(wb->write_bandwidth); 602 605 __entry->avg_write_bw = KBps(wb->avg_write_bandwidth); 603 606 __entry->dirty_rate = KBps(dirty_rate); ··· 662 665 663 666 TP_fast_assign( 664 667 unsigned long freerun = (thresh + bg_thresh) / 2; 665 - strscpy_pad(__entry->bdi, dev_name(wb->bdi->dev), 32); 668 + strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32); 666 669 667 670 __entry->limit = global_wb_domain.dirty_limit; 668 671 __entry->setpoint = (global_wb_domain.dirty_limit + ··· 723 726 724 727 TP_fast_assign( 725 728 strscpy_pad(__entry->name, 726 - dev_name(inode_to_bdi(inode)->dev), 32); 729 + bdi_dev_name(inode_to_bdi(inode)), 32); 727 730 __entry->ino = inode->i_ino; 728 731 __entry->state = inode->i_state; 729 732 __entry->dirtied_when = inode->dirtied_when; ··· 797 800 798 801 TP_fast_assign( 799 802 strscpy_pad(__entry->name, 800 - dev_name(inode_to_bdi(inode)->dev), 32); 803 + bdi_dev_name(inode_to_bdi(inode)), 32); 801 804 __entry->ino = inode->i_ino; 802 805 __entry->state = inode->i_state; 803 806 __entry->dirtied_when = inode->dirtied_when;
+10
include/uapi/linux/swab.h
··· 4 4 5 5 #include <linux/types.h> 6 6 #include <linux/compiler.h> 7 + #include <asm/bitsperlong.h> 7 8 #include <asm/swab.h> 8 9 9 10 /* ··· 132 131 ___constant_swab64(x) : \ 133 132 __fswab64(x)) 134 133 #endif 134 + 135 + static __always_inline unsigned long __swab(const unsigned long y) 136 + { 137 + #if BITS_PER_LONG == 64 138 + return __swab64(y); 139 + #else /* BITS_PER_LONG == 32 */ 140 + return __swab32(y); 141 + #endif 142 + } 135 143 136 144 /** 137 145 * __swahw32 - return a word-swapped 32-bit value
+1 -1
include/uapi/linux/sysctl.h
··· 195 195 VM_MIN_UNMAPPED=32, /* Set min percent of unmapped pages */ 196 196 VM_PANIC_ON_OOM=33, /* panic at out-of-memory */ 197 197 VM_VDSO_ENABLED=34, /* map VDSO into new processes? */ 198 - VM_MIN_SLAB=35, /* Percent pages ignored by zone reclaim */ 198 + VM_MIN_SLAB=35, /* Percent pages ignored by node reclaim */ 199 199 }; 200 200 201 201
+27 -9
init/main.c
··· 246 246 early_param("loglevel", loglevel); 247 247 248 248 /* Change NUL term back to "=", to make "param" the whole string. */ 249 - static int __init repair_env_string(char *param, char *val, 250 - const char *unused, void *arg) 249 + static void __init repair_env_string(char *param, char *val) 251 250 { 252 251 if (val) { 253 252 /* param=val or param="val"? */ ··· 255 256 else if (val == param+strlen(param)+2) { 256 257 val[-2] = '='; 257 258 memmove(val-1, val, strlen(val)+1); 258 - val--; 259 259 } else 260 260 BUG(); 261 261 } 262 - return 0; 263 262 } 264 263 265 264 /* Anything after -- gets handed straight to init. */ ··· 269 272 if (panic_later) 270 273 return 0; 271 274 272 - repair_env_string(param, val, unused, NULL); 275 + repair_env_string(param, val); 273 276 274 277 for (i = 0; argv_init[i]; i++) { 275 278 if (i == MAX_INIT_ARGS) { ··· 289 292 static int __init unknown_bootoption(char *param, char *val, 290 293 const char *unused, void *arg) 291 294 { 292 - repair_env_string(param, val, unused, NULL); 295 + size_t len = strlen(param); 296 + 297 + repair_env_string(param, val); 293 298 294 299 /* Handle obsolete-style parameters */ 295 300 if (obsolete_checksetup(param)) 296 301 return 0; 297 302 298 303 /* Unused module parameter. 
*/ 299 - if (strchr(param, '.') && (!val || strchr(param, '.') < val)) 304 + if (strnchr(param, len, '.')) 300 305 return 0; 301 306 302 307 if (panic_later) ··· 312 313 panic_later = "env"; 313 314 panic_param = param; 314 315 } 315 - if (!strncmp(param, envp_init[i], val - param)) 316 + if (!strncmp(param, envp_init[i], len+1)) 316 317 break; 317 318 } 318 319 envp_init[i] = param; ··· 990 991 "late", 991 992 }; 992 993 994 + static int __init ignore_unknown_bootoption(char *param, char *val, 995 + const char *unused, void *arg) 996 + { 997 + return 0; 998 + } 999 + 993 1000 static void __init do_initcall_level(int level) 994 1001 { 995 1002 initcall_entry_t *fn; ··· 1005 1000 initcall_command_line, __start___param, 1006 1001 __stop___param - __start___param, 1007 1002 level, level, 1008 - NULL, &repair_env_string); 1003 + NULL, ignore_unknown_bootoption); 1009 1004 1010 1005 trace_initcall_level(initcall_level_names[level]); 1011 1006 for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++) ··· 1048 1043 1049 1044 static int run_init_process(const char *init_filename) 1050 1045 { 1046 + const char *const *p; 1047 + 1051 1048 argv_init[0] = init_filename; 1052 1049 pr_info("Run %s as init process\n", init_filename); 1050 + pr_debug(" with arguments:\n"); 1051 + for (p = argv_init; *p; p++) 1052 + pr_debug(" %s\n", *p); 1053 + pr_debug(" with environment:\n"); 1054 + for (p = envp_init; *p; p++) 1055 + pr_debug(" %s\n", *p); 1053 1056 return do_execve(getname_kernel(init_filename), 1054 1057 (const char __user *const __user *)argv_init, 1055 1058 (const char __user *const __user *)envp_init); ··· 1103 1090 rodata_test(); 1104 1091 } else 1105 1092 pr_info("Kernel memory protection disabled.\n"); 1093 + } 1094 + #elif defined(CONFIG_ARCH_HAS_STRICT_KERNEL_RWX) 1095 + static inline void mark_readonly(void) 1096 + { 1097 + pr_warn("Kernel memory protection not selected by kernel config.\n"); 1106 1098 } 1107 1099 #else 1108 1100 static inline void 
mark_readonly(void)
+1
kernel/Makefile
··· 27 27 # and produce insane amounts of uninteresting coverage. 28 28 KCOV_INSTRUMENT_module.o := n 29 29 KCOV_INSTRUMENT_extable.o := n 30 + KCOV_INSTRUMENT_stacktrace.o := n 30 31 # Don't self-instrument. 31 32 KCOV_INSTRUMENT_kcov.o := n 32 33 KASAN_SANITIZE_kcov.o := n
+7
lib/Kconfig
··· 278 278 tristate 279 279 select BITREVERSE 280 280 281 + config ZLIB_DFLTCC 282 + def_bool y 283 + depends on S390 284 + prompt "Enable s390x DEFLATE CONVERSION CALL support for kernel zlib" 285 + help 286 + Enable s390x hardware support for zlib in the kernel. 287 + 281 288 config LZO_COMPRESS 282 289 tristate 283 290
+2
lib/Makefile
··· 16 16 KCOV_INSTRUMENT_list_debug.o := n 17 17 KCOV_INSTRUMENT_debugobjects.o := n 18 18 KCOV_INSTRUMENT_dynamic_debug.o := n 19 + KCOV_INSTRUMENT_fault-inject.o := n 19 20 20 21 # Early boot use of cmdline, don't instrument it 21 22 ifdef CONFIG_AMD_MEM_ENCRYPT ··· 141 140 obj-$(CONFIG_842_DECOMPRESS) += 842/ 142 141 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/ 143 142 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/ 143 + obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc/ 144 144 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/ 145 145 obj-$(CONFIG_BCH) += bch.o 146 146 obj-$(CONFIG_LZO_COMPRESS) += lzo/
+13
lib/decompress_inflate.c
··· 10 10 #include "zlib_inflate/inftrees.c" 11 11 #include "zlib_inflate/inffast.c" 12 12 #include "zlib_inflate/inflate.c" 13 + #ifdef CONFIG_ZLIB_DFLTCC 14 + #include "zlib_dfltcc/dfltcc.c" 15 + #include "zlib_dfltcc/dfltcc_inflate.c" 16 + #endif 13 17 14 18 #else /* STATIC */ 15 19 /* initramfs et al: linked */ ··· 80 76 } 81 77 82 78 strm->workspace = malloc(flush ? zlib_inflate_workspacesize() : 79 + #ifdef CONFIG_ZLIB_DFLTCC 80 + /* Always allocate the full workspace for DFLTCC */ 81 + zlib_inflate_workspacesize()); 82 + #else 83 83 sizeof(struct inflate_state)); 84 + #endif 84 85 if (strm->workspace == NULL) { 85 86 error("Out of memory while allocating workspace"); 86 87 goto gunzip_nomem4; ··· 132 123 133 124 rc = zlib_inflateInit2(strm, -MAX_WBITS); 134 125 126 + #ifdef CONFIG_ZLIB_DFLTCC 127 + /* Always keep the window for DFLTCC */ 128 + #else 135 129 if (!flush) { 136 130 WS(strm)->inflate_state.wsize = 0; 137 131 WS(strm)->inflate_state.window = NULL; 138 132 } 133 + #endif 139 134 140 135 while (rc == Z_OK) { 141 136 if (strm->avail_in == 0) {
+20 -58
lib/find_bit.c
··· 17 17 #include <linux/export.h> 18 18 #include <linux/kernel.h> 19 19 20 - #if !defined(find_next_bit) || !defined(find_next_zero_bit) || \ 21 - !defined(find_next_and_bit) 22 - 20 + #if !defined(find_next_bit) || !defined(find_next_zero_bit) || \ 21 + !defined(find_next_bit_le) || !defined(find_next_zero_bit_le) || \ 22 + !defined(find_next_and_bit) 23 23 /* 24 24 * This is a common helper function for find_next_bit, find_next_zero_bit, and 25 25 * find_next_and_bit. The differences are: ··· 27 27 * searching it for one bits. 28 28 * - The optional "addr2", which is anded with "addr1" if present. 29 29 */ 30 - static inline unsigned long _find_next_bit(const unsigned long *addr1, 30 + static unsigned long _find_next_bit(const unsigned long *addr1, 31 31 const unsigned long *addr2, unsigned long nbits, 32 - unsigned long start, unsigned long invert) 32 + unsigned long start, unsigned long invert, unsigned long le) 33 33 { 34 - unsigned long tmp; 34 + unsigned long tmp, mask; 35 35 36 36 if (unlikely(start >= nbits)) 37 37 return nbits; ··· 42 42 tmp ^= invert; 43 43 44 44 /* Handle 1st word. 
*/ 45 - tmp &= BITMAP_FIRST_WORD_MASK(start); 45 + mask = BITMAP_FIRST_WORD_MASK(start); 46 + if (le) 47 + mask = swab(mask); 48 + 49 + tmp &= mask; 50 + 46 51 start = round_down(start, BITS_PER_LONG); 47 52 48 53 while (!tmp) { ··· 61 56 tmp ^= invert; 62 57 } 63 58 59 + if (le) 60 + tmp = swab(tmp); 61 + 64 62 return min(start + __ffs(tmp), nbits); 65 63 } 66 64 #endif ··· 75 67 unsigned long find_next_bit(const unsigned long *addr, unsigned long size, 76 68 unsigned long offset) 77 69 { 78 - return _find_next_bit(addr, NULL, size, offset, 0UL); 70 + return _find_next_bit(addr, NULL, size, offset, 0UL, 0); 79 71 } 80 72 EXPORT_SYMBOL(find_next_bit); 81 73 #endif ··· 84 76 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, 85 77 unsigned long offset) 86 78 { 87 - return _find_next_bit(addr, NULL, size, offset, ~0UL); 79 + return _find_next_bit(addr, NULL, size, offset, ~0UL, 0); 88 80 } 89 81 EXPORT_SYMBOL(find_next_zero_bit); 90 82 #endif ··· 94 86 const unsigned long *addr2, unsigned long size, 95 87 unsigned long offset) 96 88 { 97 - return _find_next_bit(addr1, addr2, size, offset, 0UL); 89 + return _find_next_bit(addr1, addr2, size, offset, 0UL, 0); 98 90 } 99 91 EXPORT_SYMBOL(find_next_and_bit); 100 92 #endif ··· 157 149 158 150 #ifdef __BIG_ENDIAN 159 151 160 - /* include/linux/byteorder does not support "unsigned long" type */ 161 - static inline unsigned long ext2_swab(const unsigned long y) 162 - { 163 - #if BITS_PER_LONG == 64 164 - return (unsigned long) __swab64((u64) y); 165 - #elif BITS_PER_LONG == 32 166 - return (unsigned long) __swab32((u32) y); 167 - #else 168 - #error BITS_PER_LONG not defined 169 - #endif 170 - } 171 - 172 - #if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le) 173 - static inline unsigned long _find_next_bit_le(const unsigned long *addr1, 174 - const unsigned long *addr2, unsigned long nbits, 175 - unsigned long start, unsigned long invert) 176 - { 177 - unsigned long tmp; 178 - 179 - 
if (unlikely(start >= nbits)) 180 - return nbits; 181 - 182 - tmp = addr1[start / BITS_PER_LONG]; 183 - if (addr2) 184 - tmp &= addr2[start / BITS_PER_LONG]; 185 - tmp ^= invert; 186 - 187 - /* Handle 1st word. */ 188 - tmp &= ext2_swab(BITMAP_FIRST_WORD_MASK(start)); 189 - start = round_down(start, BITS_PER_LONG); 190 - 191 - while (!tmp) { 192 - start += BITS_PER_LONG; 193 - if (start >= nbits) 194 - return nbits; 195 - 196 - tmp = addr1[start / BITS_PER_LONG]; 197 - if (addr2) 198 - tmp &= addr2[start / BITS_PER_LONG]; 199 - tmp ^= invert; 200 - } 201 - 202 - return min(start + __ffs(ext2_swab(tmp)), nbits); 203 - } 204 - #endif 205 - 206 152 #ifndef find_next_zero_bit_le 207 153 unsigned long find_next_zero_bit_le(const void *addr, unsigned 208 154 long size, unsigned long offset) 209 155 { 210 - return _find_next_bit_le(addr, NULL, size, offset, ~0UL); 156 + return _find_next_bit(addr, NULL, size, offset, ~0UL, 1); 211 157 } 212 158 EXPORT_SYMBOL(find_next_zero_bit_le); 213 159 #endif ··· 170 208 unsigned long find_next_bit_le(const void *addr, unsigned 171 209 long size, unsigned long offset) 172 210 { 173 - return _find_next_bit_le(addr, NULL, size, offset, 0UL); 211 + return _find_next_bit(addr, NULL, size, offset, 0UL, 1); 174 212 } 175 213 EXPORT_SYMBOL(find_next_bit_le); 176 214 #endif
+1 -1
lib/scatterlist.c
··· 311 311 if (prv) 312 312 table->nents = ++table->orig_nents; 313 313 314 - return -ENOMEM; 314 + return -ENOMEM; 315 315 } 316 316 317 317 sg_init_table(sg, alloc_size);
+5 -4
lib/test_bitmap.c
··· 275 275 static void __init test_replace(void) 276 276 { 277 277 unsigned int nbits = 64; 278 + unsigned int nlongs = DIV_ROUND_UP(nbits, BITS_PER_LONG); 278 279 DECLARE_BITMAP(bmap, 1024); 279 280 280 281 bitmap_zero(bmap, 1024); 281 - bitmap_replace(bmap, &exp2[0], &exp2[1], exp2_to_exp3_mask, nbits); 282 + bitmap_replace(bmap, &exp2[0 * nlongs], &exp2[1 * nlongs], exp2_to_exp3_mask, nbits); 282 283 expect_eq_bitmap(bmap, exp3_0_1, nbits); 283 284 284 285 bitmap_zero(bmap, 1024); 285 - bitmap_replace(bmap, &exp2[1], &exp2[0], exp2_to_exp3_mask, nbits); 286 + bitmap_replace(bmap, &exp2[1 * nlongs], &exp2[0 * nlongs], exp2_to_exp3_mask, nbits); 286 287 expect_eq_bitmap(bmap, exp3_1_0, nbits); 287 288 288 289 bitmap_fill(bmap, 1024); 289 - bitmap_replace(bmap, &exp2[0], &exp2[1], exp2_to_exp3_mask, nbits); 290 + bitmap_replace(bmap, &exp2[0 * nlongs], &exp2[1 * nlongs], exp2_to_exp3_mask, nbits); 290 291 expect_eq_bitmap(bmap, exp3_0_1, nbits); 291 292 292 293 bitmap_fill(bmap, 1024); 293 - bitmap_replace(bmap, &exp2[1], &exp2[0], exp2_to_exp3_mask, nbits); 294 + bitmap_replace(bmap, &exp2[1 * nlongs], &exp2[0 * nlongs], exp2_to_exp3_mask, nbits); 294 295 expect_eq_bitmap(bmap, exp3_1_0, nbits); 295 296 } 296 297
+1
lib/test_kasan.c
··· 158 158 if (!ptr1 || !ptr2) { 159 159 pr_err("Allocation failed\n"); 160 160 kfree(ptr1); 161 + kfree(ptr2); 161 162 return; 162 163 } 163 164
+47 -38
lib/zlib_deflate/deflate.c
··· 52 52 #include <linux/zutil.h> 53 53 #include "defutil.h" 54 54 55 + /* architecture-specific bits */ 56 + #ifdef CONFIG_ZLIB_DFLTCC 57 + # include "../zlib_dfltcc/dfltcc.h" 58 + #else 59 + #define DEFLATE_RESET_HOOK(strm) do {} while (0) 60 + #define DEFLATE_HOOK(strm, flush, bstate) 0 61 + #define DEFLATE_NEED_CHECKSUM(strm) 1 62 + #define DEFLATE_DFLTCC_ENABLED() 0 63 + #endif 55 64 56 65 /* =========================================================================== 57 66 * Function prototypes. 58 67 */ 59 - typedef enum { 60 - need_more, /* block not completed, need more input or more output */ 61 - block_done, /* block flush performed */ 62 - finish_started, /* finish started, need only more output at next deflate */ 63 - finish_done /* finish done, accept no more input or output */ 64 - } block_state; 65 68 66 69 typedef block_state (*compress_func) (deflate_state *s, int flush); 67 70 /* Compression function. Returns the block state after the call. */ ··· 75 72 static block_state deflate_slow (deflate_state *s, int flush); 76 73 static void lm_init (deflate_state *s); 77 74 static void putShortMSB (deflate_state *s, uInt b); 78 - static void flush_pending (z_streamp strm); 79 75 static int read_buf (z_streamp strm, Byte *buf, unsigned size); 80 76 static uInt longest_match (deflate_state *s, IPos cur_match); 81 77 ··· 99 97 /* Minimum amount of lookahead, except at the end of the input file. 100 98 * See deflate.c for comments about the MIN_MATCH+1. 
101 99 */ 100 + 101 + /* Workspace to be allocated for deflate processing */ 102 + typedef struct deflate_workspace { 103 + /* State memory for the deflator */ 104 + deflate_state deflate_memory; 105 + #ifdef CONFIG_ZLIB_DFLTCC 106 + /* State memory for s390 hardware deflate */ 107 + struct dfltcc_state dfltcc_memory; 108 + #endif 109 + Byte *window_memory; 110 + Pos *prev_memory; 111 + Pos *head_memory; 112 + char *overlay_memory; 113 + } deflate_workspace; 114 + 115 + #ifdef CONFIG_ZLIB_DFLTCC 116 + /* dfltcc_state must be doubleword aligned for DFLTCC call */ 117 + static_assert(offsetof(struct deflate_workspace, dfltcc_memory) % 8 == 0); 118 + #endif 102 119 103 120 /* Values for max_lazy_match, good_match and max_chain_length, depending on 104 121 * the desired pack level (0..9). The values given below have been tuned to ··· 228 207 */ 229 208 next = (char *) mem; 230 209 next += sizeof(*mem); 210 + #ifdef CONFIG_ZLIB_DFLTCC 211 + /* 212 + * DFLTCC requires the window to be page aligned. 213 + * Thus, we overallocate and take the aligned portion of the buffer. 214 + */ 215 + mem->window_memory = (Byte *) PTR_ALIGN(next, PAGE_SIZE); 216 + #else 231 217 mem->window_memory = (Byte *) next; 218 + #endif 232 219 next += zlib_deflate_window_memsize(windowBits); 233 220 mem->prev_memory = (Pos *) next; 234 221 next += zlib_deflate_prev_memsize(windowBits); ··· 306 277 zlib_tr_init(s); 307 278 lm_init(s); 308 279 280 + DEFLATE_RESET_HOOK(strm); 281 + 309 282 return Z_OK; 310 283 } 311 284 ··· 324 293 put_byte(s, (Byte)(b >> 8)); 325 294 put_byte(s, (Byte)(b & 0xff)); 326 295 } 327 - 328 - /* ========================================================================= 329 - * Flush as much pending output as possible. All deflate() output goes 330 - * through this function so some applications may wish to modify it 331 - * to avoid allocating a large strm->next_out buffer and copying into it. 332 - * (See also read_buf()). 
333 - */ 334 - static void flush_pending( 335 - z_streamp strm 336 - ) 337 - { 338 - deflate_state *s = (deflate_state *) strm->state; 339 - unsigned len = s->pending; 340 - 341 - if (len > strm->avail_out) len = strm->avail_out; 342 - if (len == 0) return; 343 - 344 - if (strm->next_out != NULL) { 345 - memcpy(strm->next_out, s->pending_out, len); 346 - strm->next_out += len; 347 - } 348 - s->pending_out += len; 349 - strm->total_out += len; 350 - strm->avail_out -= len; 351 - s->pending -= len; 352 - if (s->pending == 0) { 353 - s->pending_out = s->pending_buf; 354 - } 355 - } 356 296 357 297 /* ========================================================================= */ 358 298 int zlib_deflate( ··· 406 404 (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) { 407 405 block_state bstate; 408 406 409 - bstate = (*(configuration_table[s->level].func))(s, flush); 407 + bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate : 408 + (*(configuration_table[s->level].func))(s, flush); 410 409 411 410 if (bstate == finish_started || bstate == finish_done) { 412 411 s->status = FINISH_STATE; ··· 506 503 507 504 strm->avail_in -= len; 508 505 509 - if (!((deflate_state *)(strm->state))->noheader) { 506 + if (!DEFLATE_NEED_CHECKSUM(strm)) {} 507 + else if (!((deflate_state *)(strm->state))->noheader) { 510 508 strm->adler = zlib_adler32(strm->adler, strm->next_in, len); 511 509 } 512 510 memcpy(buf, strm->next_in, len); ··· 1138 1134 + zlib_deflate_prev_memsize(windowBits) 1139 1135 + zlib_deflate_head_memsize(memLevel) 1140 1136 + zlib_deflate_overlay_memsize(memLevel); 1137 + } 1138 + 1139 + int zlib_deflate_dfltcc_enabled(void) 1140 + { 1141 + return DEFLATE_DFLTCC_ENABLED(); 1141 1142 }
+1
lib/zlib_deflate/deflate_syms.c
··· 12 12 #include <linux/zlib.h> 13 13 14 14 EXPORT_SYMBOL(zlib_deflate_workspacesize); 15 + EXPORT_SYMBOL(zlib_deflate_dfltcc_enabled); 15 16 EXPORT_SYMBOL(zlib_deflate); 16 17 EXPORT_SYMBOL(zlib_deflateInit2); 17 18 EXPORT_SYMBOL(zlib_deflateEnd);
-54
lib/zlib_deflate/deftree.c
··· 76 76 * probability, to avoid transmitting the lengths for unused bit length codes. 77 77 */ 78 78 79 - #define Buf_size (8 * 2*sizeof(char)) 80 - /* Number of bits used within bi_buf. (bi_buf might be implemented on 81 - * more than 16 bits on some systems.) 82 - */ 83 - 84 79 /* =========================================================================== 85 80 * Local data. These are initialized only once. 86 81 */ ··· 142 147 static void compress_block (deflate_state *s, ct_data *ltree, 143 148 ct_data *dtree); 144 149 static void set_data_type (deflate_state *s); 145 - static void bi_windup (deflate_state *s); 146 150 static void bi_flush (deflate_state *s); 147 151 static void copy_block (deflate_state *s, char *buf, unsigned len, 148 152 int header); ··· 162 168 * must not have side effects. dist_code[256] and dist_code[257] are never 163 169 * used. 164 170 */ 165 - 166 - /* =========================================================================== 167 - * Send a value on a given number of bits. 168 - * IN assertion: length <= 16 and value fits in length bits. 169 - */ 170 - #ifdef DEBUG_ZLIB 171 - static void send_bits (deflate_state *s, int value, int length); 172 - 173 - static void send_bits( 174 - deflate_state *s, 175 - int value, /* value to send */ 176 - int length /* number of bits */ 177 - ) 178 - { 179 - Tracevv((stderr," l %2d v %4x ", length, value)); 180 - Assert(length > 0 && length <= 15, "invalid length"); 181 - s->bits_sent += (ulg)length; 182 - 183 - /* If not enough room in bi_buf, use (valid) bits from bi_buf and 184 - * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid)) 185 - * unused bits in value. 
186 - */ 187 - if (s->bi_valid > (int)Buf_size - length) { 188 - s->bi_buf |= (value << s->bi_valid); 189 - put_short(s, s->bi_buf); 190 - s->bi_buf = (ush)value >> (Buf_size - s->bi_valid); 191 - s->bi_valid += length - Buf_size; 192 - } else { 193 - s->bi_buf |= value << s->bi_valid; 194 - s->bi_valid += length; 195 - } 196 - } 197 - #else /* !DEBUG_ZLIB */ 198 - 199 - #define send_bits(s, value, length) \ 200 - { int len = length;\ 201 - if (s->bi_valid > (int)Buf_size - len) {\ 202 - int val = value;\ 203 - s->bi_buf |= (val << s->bi_valid);\ 204 - put_short(s, s->bi_buf);\ 205 - s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\ 206 - s->bi_valid += len - Buf_size;\ 207 - } else {\ 208 - s->bi_buf |= (value) << s->bi_valid;\ 209 - s->bi_valid += len;\ 210 - }\ 211 - } 212 - #endif /* DEBUG_ZLIB */ 213 171 214 172 /* =========================================================================== 215 173 * Initialize the various 'constant' tables. In a multi-threaded environment,
+124 -10
lib/zlib_deflate/defutil.h
··· 1 + #ifndef DEFUTIL_H 2 + #define DEFUTIL_H 1 3 2 - 4 + #include <linux/zutil.h> 3 5 4 6 #define Assert(err, str) 5 7 #define Trace(dummy) ··· 240 238 241 239 } deflate_state; 242 240 243 - typedef struct deflate_workspace { 244 - /* State memory for the deflator */ 245 - deflate_state deflate_memory; 246 - Byte *window_memory; 247 - Pos *prev_memory; 248 - Pos *head_memory; 249 - char *overlay_memory; 250 - } deflate_workspace; 251 - 241 + #ifdef CONFIG_ZLIB_DFLTCC 242 + #define zlib_deflate_window_memsize(windowBits) \ 243 + (2 * (1 << (windowBits)) * sizeof(Byte) + PAGE_SIZE) 244 + #else 252 245 #define zlib_deflate_window_memsize(windowBits) \ 253 246 (2 * (1 << (windowBits)) * sizeof(Byte)) 247 + #endif 254 248 #define zlib_deflate_prev_memsize(windowBits) \ 255 249 ((1 << (windowBits)) * sizeof(Pos)) 256 250 #define zlib_deflate_head_memsize(memLevel) \ ··· 291 293 } 292 294 293 295 /* =========================================================================== 296 + * Reverse the first len bits of a code, using straightforward code (a faster 297 + * method would use a table) 298 + * IN assertion: 1 <= len <= 15 299 + */ 300 + static inline unsigned bi_reverse( 301 + unsigned code, /* the value to invert */ 302 + int len /* its bit length */ 303 + ) 304 + { 305 + register unsigned res = 0; 306 + do { 307 + res |= code & 1; 308 + code >>= 1, res <<= 1; 309 + } while (--len > 0); 310 + return res >> 1; 311 + } 312 + 313 + /* =========================================================================== 294 314 * Flush the bit buffer, keeping at most 7 bits in it. 
295 315 */ 296 316 static inline void bi_flush(deflate_state *s) ··· 341 325 #endif 342 326 } 343 327 328 + typedef enum { 329 + need_more, /* block not completed, need more input or more output */ 330 + block_done, /* block flush performed */ 331 + finish_started, /* finish started, need only more output at next deflate */ 332 + finish_done /* finish done, accept no more input or output */ 333 + } block_state; 334 + 335 + #define Buf_size (8 * 2*sizeof(char)) 336 + /* Number of bits used within bi_buf. (bi_buf might be implemented on 337 + * more than 16 bits on some systems.) 338 + */ 339 + 340 + /* =========================================================================== 341 + * Send a value on a given number of bits. 342 + * IN assertion: length <= 16 and value fits in length bits. 343 + */ 344 + #ifdef DEBUG_ZLIB 345 + static void send_bits (deflate_state *s, int value, int length); 346 + 347 + static void send_bits( 348 + deflate_state *s, 349 + int value, /* value to send */ 350 + int length /* number of bits */ 351 + ) 352 + { 353 + Tracevv((stderr," l %2d v %4x ", length, value)); 354 + Assert(length > 0 && length <= 15, "invalid length"); 355 + s->bits_sent += (ulg)length; 356 + 357 + /* If not enough room in bi_buf, use (valid) bits from bi_buf and 358 + * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid)) 359 + * unused bits in value. 
360 + */ 361 + if (s->bi_valid > (int)Buf_size - length) { 362 + s->bi_buf |= (value << s->bi_valid); 363 + put_short(s, s->bi_buf); 364 + s->bi_buf = (ush)value >> (Buf_size - s->bi_valid); 365 + s->bi_valid += length - Buf_size; 366 + } else { 367 + s->bi_buf |= value << s->bi_valid; 368 + s->bi_valid += length; 369 + } 370 + } 371 + #else /* !DEBUG_ZLIB */ 372 + 373 + #define send_bits(s, value, length) \ 374 + { int len = length;\ 375 + if (s->bi_valid > (int)Buf_size - len) {\ 376 + int val = value;\ 377 + s->bi_buf |= (val << s->bi_valid);\ 378 + put_short(s, s->bi_buf);\ 379 + s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\ 380 + s->bi_valid += len - Buf_size;\ 381 + } else {\ 382 + s->bi_buf |= (value) << s->bi_valid;\ 383 + s->bi_valid += len;\ 384 + }\ 385 + } 386 + #endif /* DEBUG_ZLIB */ 387 + 388 + static inline void zlib_tr_send_bits( 389 + deflate_state *s, 390 + int value, 391 + int length 392 + ) 393 + { 394 + send_bits(s, value, length); 395 + } 396 + 397 + /* ========================================================================= 398 + * Flush as much pending output as possible. All deflate() output goes 399 + * through this function so some applications may wish to modify it 400 + * to avoid allocating a large strm->next_out buffer and copying into it. 401 + * (See also read_buf()). 402 + */ 403 + static inline void flush_pending( 404 + z_streamp strm 405 + ) 406 + { 407 + deflate_state *s = (deflate_state *) strm->state; 408 + unsigned len = s->pending; 409 + 410 + if (len > strm->avail_out) len = strm->avail_out; 411 + if (len == 0) return; 412 + 413 + if (strm->next_out != NULL) { 414 + memcpy(strm->next_out, s->pending_out, len); 415 + strm->next_out += len; 416 + } 417 + s->pending_out += len; 418 + strm->total_out += len; 419 + strm->avail_out -= len; 420 + s->pending -= len; 421 + if (s->pending == 0) { 422 + s->pending_out = s->pending_buf; 423 + } 424 + } 425 + #endif /* DEFUTIL_H */
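The bit-level helpers moved into defutil.h above are small enough to exercise standalone. Here is a userspace copy of `bi_reverse()` as it appears in the hunk (the kernel builds it as `static inline`; nothing else is assumed):

```c
#include <assert.h>

/* Standalone copy of bi_reverse() from defutil.h: reverse the first
 * len bits of a code, one bit per iteration.
 * IN assertion: 1 <= len <= 15. */
static unsigned bi_reverse(unsigned code, int len)
{
	unsigned res = 0;

	do {
		res |= code & 1;	/* take the current low bit */
		code >>= 1, res <<= 1;	/* shift it toward the high end */
	} while (--len > 0);
	return res >> 1;		/* undo the final extra shift */
}
```

For example, reversing the 4-bit code 0b1011 yields 0b1101; dfltcc_deflate.c relies on this when it mirrors the hardware's End-of-block Symbol before handing it to `zlib_tr_send_bits()`.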
+11
lib/zlib_dfltcc/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0-only 2 + # 3 + # This is a modified version of zlib, which does all memory 4 + # allocation ahead of time. 5 + # 6 + # This is the code for s390 zlib hardware support. 7 + # 8 + 9 + obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc.o 10 + 11 + zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_inflate.o dfltcc_syms.o
+55
lib/zlib_dfltcc/dfltcc.c
··· 1 + // SPDX-License-Identifier: Zlib 2 + /* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */ 3 + 4 + #include <linux/zutil.h> 5 + #include "dfltcc_util.h" 6 + #include "dfltcc.h" 7 + 8 + char *oesc_msg( 9 + char *buf, 10 + int oesc 11 + ) 12 + { 13 + if (oesc == 0x00) 14 + return NULL; /* Successful completion */ 15 + else { 16 + #ifdef STATIC 17 + return NULL; /* Ignore for pre-boot decompressor */ 18 + #else 19 + sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc); 20 + return buf; 21 + #endif 22 + } 23 + } 24 + 25 + void dfltcc_reset( 26 + z_streamp strm, 27 + uInt size 28 + ) 29 + { 30 + struct dfltcc_state *dfltcc_state = 31 + (struct dfltcc_state *)((char *)strm->state + size); 32 + struct dfltcc_qaf_param *param = 33 + (struct dfltcc_qaf_param *)&dfltcc_state->param; 34 + 35 + /* Initialize available functions */ 36 + if (is_dfltcc_enabled()) { 37 + dfltcc(DFLTCC_QAF, param, NULL, NULL, NULL, NULL, NULL); 38 + memmove(&dfltcc_state->af, param, sizeof(dfltcc_state->af)); 39 + } else 40 + memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af)); 41 + 42 + /* Initialize parameter block */ 43 + memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param)); 44 + dfltcc_state->param.nt = 1; 45 + 46 + /* Initialize tuning parameters */ 47 + if (zlib_dfltcc_support == ZLIB_DFLTCC_FULL_DEBUG) 48 + dfltcc_state->level_mask = DFLTCC_LEVEL_MASK_DEBUG; 49 + else 50 + dfltcc_state->level_mask = DFLTCC_LEVEL_MASK; 51 + dfltcc_state->block_size = DFLTCC_BLOCK_SIZE; 52 + dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE; 53 + dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE; 54 + dfltcc_state->param.ribm = DFLTCC_RIBM; 55 + }
+155
lib/zlib_dfltcc/dfltcc.h
··· 1 + // SPDX-License-Identifier: Zlib 2 + #ifndef DFLTCC_H 3 + #define DFLTCC_H 4 + 5 + #include "../zlib_deflate/defutil.h" 6 + #include <asm/facility.h> 7 + #include <asm/setup.h> 8 + 9 + /* 10 + * Tuning parameters. 11 + */ 12 + #define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */ 13 + #define DFLTCC_LEVEL_MASK_DEBUG 0x3fe /* DFLTCC compression for all levels */ 14 + #define DFLTCC_BLOCK_SIZE 1048576 15 + #define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096 16 + #define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096 17 + #define DFLTCC_RIBM 0 18 + 19 + #define DFLTCC_FACILITY 151 20 + 21 + /* 22 + * Parameter Block for Query Available Functions. 23 + */ 24 + struct dfltcc_qaf_param { 25 + char fns[16]; 26 + char reserved1[8]; 27 + char fmts[2]; 28 + char reserved2[6]; 29 + }; 30 + 31 + static_assert(sizeof(struct dfltcc_qaf_param) == 32); 32 + 33 + #define DFLTCC_FMT0 0 34 + 35 + /* 36 + * Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand. 37 + */ 38 + struct dfltcc_param_v0 { 39 + uint16_t pbvn; /* Parameter-Block-Version Number */ 40 + uint8_t mvn; /* Model-Version Number */ 41 + uint8_t ribm; /* Reserved for IBM use */ 42 + unsigned reserved32 : 31; 43 + unsigned cf : 1; /* Continuation Flag */ 44 + uint8_t reserved64[8]; 45 + unsigned nt : 1; /* New Task */ 46 + unsigned reserved129 : 1; 47 + unsigned cvt : 1; /* Check Value Type */ 48 + unsigned reserved131 : 1; 49 + unsigned htt : 1; /* Huffman-Table Type */ 50 + unsigned bcf : 1; /* Block-Continuation Flag */ 51 + unsigned bcc : 1; /* Block Closing Control */ 52 + unsigned bhf : 1; /* Block Header Final */ 53 + unsigned reserved136 : 1; 54 + unsigned reserved137 : 1; 55 + unsigned dhtgc : 1; /* DHT Generation Control */ 56 + unsigned reserved139 : 5; 57 + unsigned reserved144 : 5; 58 + unsigned sbb : 3; /* Sub-Byte Boundary */ 59 + uint8_t oesc; /* Operation-Ending-Supplemental Code */ 60 + unsigned reserved160 : 12; 61 + unsigned ifs : 4; /* Incomplete-Function Status */ 62 + uint16_t 
ifl; /* Incomplete-Function Length */ 63 + uint8_t reserved192[8]; 64 + uint8_t reserved256[8]; 65 + uint8_t reserved320[4]; 66 + uint16_t hl; /* History Length */ 67 + unsigned reserved368 : 1; 68 + uint16_t ho : 15; /* History Offset */ 69 + uint32_t cv; /* Check Value */ 70 + unsigned eobs : 15; /* End-of-block Symbol */ 71 + unsigned reserved431: 1; 72 + uint8_t eobl : 4; /* End-of-block Length */ 73 + unsigned reserved436 : 12; 74 + unsigned reserved448 : 4; 75 + uint16_t cdhtl : 12; /* Compressed-Dynamic-Huffman Table 76 + Length */ 77 + uint8_t reserved464[6]; 78 + uint8_t cdht[288]; 79 + uint8_t reserved[32]; 80 + uint8_t csb[1152]; 81 + }; 82 + 83 + static_assert(sizeof(struct dfltcc_param_v0) == 1536); 84 + 85 + #define CVT_CRC32 0 86 + #define CVT_ADLER32 1 87 + #define HTT_FIXED 0 88 + #define HTT_DYNAMIC 1 89 + 90 + /* 91 + * Extension of inflate_state and deflate_state for DFLTCC. 92 + */ 93 + struct dfltcc_state { 94 + struct dfltcc_param_v0 param; /* Parameter block */ 95 + struct dfltcc_qaf_param af; /* Available functions */ 96 + uLong level_mask; /* Levels on which to use DFLTCC */ 97 + uLong block_size; /* New block each X bytes */ 98 + uLong block_threshold; /* New block after total_in > X */ 99 + uLong dht_threshold; /* New block only if avail_in >= X */ 100 + char msg[64]; /* Buffer for strm->msg */ 101 + }; 102 + 103 + /* Resides right after inflate_state or deflate_state */ 104 + #define GET_DFLTCC_STATE(state) ((struct dfltcc_state *)((state) + 1)) 105 + 106 + /* External functions */ 107 + int dfltcc_can_deflate(z_streamp strm); 108 + int dfltcc_deflate(z_streamp strm, 109 + int flush, 110 + block_state *result); 111 + void dfltcc_reset(z_streamp strm, uInt size); 112 + int dfltcc_can_inflate(z_streamp strm); 113 + typedef enum { 114 + DFLTCC_INFLATE_CONTINUE, 115 + DFLTCC_INFLATE_BREAK, 116 + DFLTCC_INFLATE_SOFTWARE, 117 + } dfltcc_inflate_action; 118 + dfltcc_inflate_action dfltcc_inflate(z_streamp strm, 119 + int flush, int *ret); 120 
+ static inline int is_dfltcc_enabled(void) 121 + { 122 + return (zlib_dfltcc_support != ZLIB_DFLTCC_DISABLED && 123 + test_facility(DFLTCC_FACILITY)); 124 + } 125 + 126 + #define DEFLATE_RESET_HOOK(strm) \ 127 + dfltcc_reset((strm), sizeof(deflate_state)) 128 + 129 + #define DEFLATE_HOOK dfltcc_deflate 130 + 131 + #define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm))) 132 + 133 + #define DEFLATE_DFLTCC_ENABLED() is_dfltcc_enabled() 134 + 135 + #define INFLATE_RESET_HOOK(strm) \ 136 + dfltcc_reset((strm), sizeof(struct inflate_state)) 137 + 138 + #define INFLATE_TYPEDO_HOOK(strm, flush) \ 139 + if (dfltcc_can_inflate((strm))) { \ 140 + dfltcc_inflate_action action; \ 141 + \ 142 + RESTORE(); \ 143 + action = dfltcc_inflate((strm), (flush), &ret); \ 144 + LOAD(); \ 145 + if (action == DFLTCC_INFLATE_CONTINUE) \ 146 + break; \ 147 + else if (action == DFLTCC_INFLATE_BREAK) \ 148 + goto inf_leave; \ 149 + } 150 + 151 + #define INFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_inflate((strm))) 152 + 153 + #define INFLATE_NEED_UPDATEWINDOW(strm) (!dfltcc_can_inflate((strm))) 154 + 155 + #endif /* DFLTCC_H */
+279
lib/zlib_dfltcc/dfltcc_deflate.c
··· 1 + // SPDX-License-Identifier: Zlib 2 + 3 + #include "../zlib_deflate/defutil.h" 4 + #include "dfltcc_util.h" 5 + #include "dfltcc.h" 6 + #include <asm/setup.h> 7 + #include <linux/zutil.h> 8 + 9 + /* 10 + * Compress. 11 + */ 12 + int dfltcc_can_deflate( 13 + z_streamp strm 14 + ) 15 + { 16 + deflate_state *state = (deflate_state *)strm->state; 17 + struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state); 18 + 19 + /* Check for kernel dfltcc command line parameter */ 20 + if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED || 21 + zlib_dfltcc_support == ZLIB_DFLTCC_INFLATE_ONLY) 22 + return 0; 23 + 24 + /* Unsupported compression settings */ 25 + if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy, 26 + dfltcc_state->level_mask)) 27 + return 0; 28 + 29 + /* Unsupported hardware */ 30 + if (!is_bit_set(dfltcc_state->af.fns, DFLTCC_GDHT) || 31 + !is_bit_set(dfltcc_state->af.fns, DFLTCC_CMPR) || 32 + !is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0)) 33 + return 0; 34 + 35 + return 1; 36 + } 37 + 38 + static void dfltcc_gdht( 39 + z_streamp strm 40 + ) 41 + { 42 + deflate_state *state = (deflate_state *)strm->state; 43 + struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param; 44 + size_t avail_in = strm->avail_in; 45 + 46 + dfltcc(DFLTCC_GDHT, 47 + param, NULL, NULL, 48 + &strm->next_in, &avail_in, NULL); 49 + } 50 + 51 + static dfltcc_cc dfltcc_cmpr( 52 + z_streamp strm 53 + ) 54 + { 55 + deflate_state *state = (deflate_state *)strm->state; 56 + struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param; 57 + size_t avail_in = strm->avail_in; 58 + size_t avail_out = strm->avail_out; 59 + dfltcc_cc cc; 60 + 61 + cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR, 62 + param, &strm->next_out, &avail_out, 63 + &strm->next_in, &avail_in, state->window); 64 + strm->total_in += (strm->avail_in - avail_in); 65 + strm->total_out += (strm->avail_out - avail_out); 66 + strm->avail_in = avail_in; 67 + strm->avail_out = avail_out; 68 + return
cc; 69 + } 70 + 71 + static void send_eobs( 72 + z_streamp strm, 73 + const struct dfltcc_param_v0 *param 74 + ) 75 + { 76 + deflate_state *state = (deflate_state *)strm->state; 77 + 78 + zlib_tr_send_bits( 79 + state, 80 + bi_reverse(param->eobs >> (15 - param->eobl), param->eobl), 81 + param->eobl); 82 + flush_pending(strm); 83 + if (state->pending != 0) { 84 + /* The remaining data is located in pending_out[0:pending]. If someone 85 + * calls put_byte() - this might happen in deflate() - the byte will be 86 + * placed into pending_buf[pending], which is incorrect. Move the 87 + * remaining data to the beginning of pending_buf so that put_byte() is 88 + * usable again. 89 + */ 90 + memmove(state->pending_buf, state->pending_out, state->pending); 91 + state->pending_out = state->pending_buf; 92 + } 93 + #ifdef ZLIB_DEBUG 94 + state->compressed_len += param->eobl; 95 + #endif 96 + } 97 + 98 + int dfltcc_deflate( 99 + z_streamp strm, 100 + int flush, 101 + block_state *result 102 + ) 103 + { 104 + deflate_state *state = (deflate_state *)strm->state; 105 + struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state); 106 + struct dfltcc_param_v0 *param = &dfltcc_state->param; 107 + uInt masked_avail_in; 108 + dfltcc_cc cc; 109 + int need_empty_block; 110 + int soft_bcc; 111 + int no_flush; 112 + 113 + if (!dfltcc_can_deflate(strm)) 114 + return 0; 115 + 116 + again: 117 + masked_avail_in = 0; 118 + soft_bcc = 0; 119 + no_flush = flush == Z_NO_FLUSH; 120 + 121 + /* Trailing empty block. Switch to software, except when Continuation Flag 122 + * is set, which means that DFLTCC has buffered some output in the 123 + * parameter block and needs to be called again in order to flush it. 124 + */ 125 + if (flush == Z_FINISH && strm->avail_in == 0 && !param->cf) { 126 + if (param->bcf) { 127 + /* A block is still open, and the hardware does not support closing 128 + * blocks without adding data. Thus, close it manually. 
129 + */ 130 + send_eobs(strm, param); 131 + param->bcf = 0; 132 + } 133 + return 0; 134 + } 135 + 136 + if (strm->avail_in == 0 && !param->cf) { 137 + *result = need_more; 138 + return 1; 139 + } 140 + 141 + /* There is an open non-BFINAL block, we are not going to close it just 142 + * yet, we have compressed more than DFLTCC_BLOCK_SIZE bytes and we see 143 + * more than DFLTCC_DHT_MIN_SAMPLE_SIZE bytes. Open a new block with a new 144 + * DHT in order to adapt to a possibly changed input data distribution. 145 + */ 146 + if (param->bcf && no_flush && 147 + strm->total_in > dfltcc_state->block_threshold && 148 + strm->avail_in >= dfltcc_state->dht_threshold) { 149 + if (param->cf) { 150 + /* We need to flush the DFLTCC buffer before writing the 151 + * End-of-block Symbol. Mask the input data and proceed as usual. 152 + */ 153 + masked_avail_in += strm->avail_in; 154 + strm->avail_in = 0; 155 + no_flush = 0; 156 + } else { 157 + /* DFLTCC buffer is empty, so we can manually write the 158 + * End-of-block Symbol right away. 159 + */ 160 + send_eobs(strm, param); 161 + param->bcf = 0; 162 + dfltcc_state->block_threshold = 163 + strm->total_in + dfltcc_state->block_size; 164 + if (strm->avail_out == 0) { 165 + *result = need_more; 166 + return 1; 167 + } 168 + } 169 + } 170 + 171 + /* The caller gave us too much data. Pass only one block worth of 172 + * uncompressed data to DFLTCC and mask the rest, so that on the next 173 + * iteration we start a new block. 174 + */ 175 + if (no_flush && strm->avail_in > dfltcc_state->block_size) { 176 + masked_avail_in += (strm->avail_in - dfltcc_state->block_size); 177 + strm->avail_in = dfltcc_state->block_size; 178 + } 179 + 180 + /* When we have an open non-BFINAL deflate block and caller indicates that 181 + * the stream is ending, we need to close an open deflate block and open a 182 + * BFINAL one. 
183 + */ 184 + need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf; 185 + 186 + /* Translate stream to parameter block */ 187 + param->cvt = CVT_ADLER32; 188 + if (!no_flush) 189 + /* We need to close a block. Always do this in software - when there is 190 + * no input data, the hardware will not honor BCC. */ 191 + soft_bcc = 1; 192 + if (flush == Z_FINISH && !param->bcf) 193 + /* We are about to open a BFINAL block, set Block Header Final bit 194 + * until the stream ends. 195 + */ 196 + param->bhf = 1; 197 + /* DFLTCC-CMPR will write to next_out, so make sure that buffers with 198 + * higher precedence are empty. 199 + */ 200 + Assert(state->pending == 0, "There must be no pending bytes"); 201 + Assert(state->bi_valid < 8, "There must be less than 8 pending bits"); 202 + param->sbb = (unsigned int)state->bi_valid; 203 + if (param->sbb > 0) 204 + *strm->next_out = (Byte)state->bi_buf; 205 + if (param->hl) 206 + param->nt = 0; /* Honor history */ 207 + param->cv = strm->adler; 208 + 209 + /* When opening a block, choose a Huffman-Table Type */ 210 + if (!param->bcf) { 211 + if (strm->total_in == 0 && dfltcc_state->block_threshold > 0) { 212 + param->htt = HTT_FIXED; 213 + } 214 + else { 215 + param->htt = HTT_DYNAMIC; 216 + dfltcc_gdht(strm); 217 + } 218 + } 219 + 220 + /* Deflate */ 221 + do { 222 + cc = dfltcc_cmpr(strm); 223 + if (strm->avail_in < 4096 && masked_avail_in > 0) 224 + /* We are about to call DFLTCC with a small input buffer, which is 225 + * inefficient. Since there is masked data, there will be at least 226 + * one more DFLTCC call, so skip the current one and make the next 227 + * one handle more data.
228 + */ 229 + break; 230 + } while (cc == DFLTCC_CC_AGAIN); 231 + 232 + /* Translate parameter block to stream */ 233 + strm->msg = oesc_msg(dfltcc_state->msg, param->oesc); 234 + state->bi_valid = param->sbb; 235 + if (state->bi_valid == 0) 236 + state->bi_buf = 0; /* Avoid accessing next_out */ 237 + else 238 + state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1); 239 + strm->adler = param->cv; 240 + 241 + /* Unmask the input data */ 242 + strm->avail_in += masked_avail_in; 243 + masked_avail_in = 0; 244 + 245 + /* If we encounter an error, it means there is a bug in DFLTCC call */ 246 + Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG"); 247 + 248 + /* Update Block-Continuation Flag. It will be used to check whether to call 249 + * GDHT the next time. 250 + */ 251 + if (cc == DFLTCC_CC_OK) { 252 + if (soft_bcc) { 253 + send_eobs(strm, param); 254 + param->bcf = 0; 255 + dfltcc_state->block_threshold = 256 + strm->total_in + dfltcc_state->block_size; 257 + } else 258 + param->bcf = 1; 259 + if (flush == Z_FINISH) { 260 + if (need_empty_block) 261 + /* Make the current deflate() call also close the stream */ 262 + return 0; 263 + else { 264 + bi_windup(state); 265 + *result = finish_done; 266 + } 267 + } else { 268 + if (flush == Z_FULL_FLUSH) 269 + param->hl = 0; /* Clear history */ 270 + *result = flush == Z_NO_FLUSH ? need_more : block_done; 271 + } 272 + } else { 273 + param->bcf = 1; 274 + *result = need_more; 275 + } 276 + if (strm->avail_in != 0 && strm->avail_out != 0) 277 + goto again; /* deflate() must use all input or all output */ 278 + return 1; 279 + }
+149
lib/zlib_dfltcc/dfltcc_inflate.c
··· 1 + // SPDX-License-Identifier: Zlib 2 + 3 + #include "../zlib_inflate/inflate.h" 4 + #include "dfltcc_util.h" 5 + #include "dfltcc.h" 6 + #include <asm/setup.h> 7 + #include <linux/zutil.h> 8 + 9 + /* 10 + * Expand. 11 + */ 12 + int dfltcc_can_inflate( 13 + z_streamp strm 14 + ) 15 + { 16 + struct inflate_state *state = (struct inflate_state *)strm->state; 17 + struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state); 18 + 19 + /* Check for kernel dfltcc command line parameter */ 20 + if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED || 21 + zlib_dfltcc_support == ZLIB_DFLTCC_DEFLATE_ONLY) 22 + return 0; 23 + 24 + /* Unsupported compression settings */ 25 + if (state->wbits != HB_BITS) 26 + return 0; 27 + 28 + /* Unsupported hardware */ 29 + return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) && 30 + is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0); 31 + } 32 + 33 + static int dfltcc_was_inflate_used( 34 + z_streamp strm 35 + ) 36 + { 37 + struct inflate_state *state = (struct inflate_state *)strm->state; 38 + struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param; 39 + 40 + return !param->nt; 41 + } 42 + 43 + static int dfltcc_inflate_disable( 44 + z_streamp strm 45 + ) 46 + { 47 + struct inflate_state *state = (struct inflate_state *)strm->state; 48 + struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state); 49 + 50 + if (!dfltcc_can_inflate(strm)) 51 + return 0; 52 + if (dfltcc_was_inflate_used(strm)) 53 + /* DFLTCC has already decompressed some data. Since there is not 54 + * enough information to resume decompression in software, the call 55 + * must fail. 
56 + */ 57 + return 1; 58 + /* DFLTCC was not used yet - decompress in software */ 59 + memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af)); 60 + return 0; 61 + } 62 + 63 + static dfltcc_cc dfltcc_xpnd( 64 + z_streamp strm 65 + ) 66 + { 67 + struct inflate_state *state = (struct inflate_state *)strm->state; 68 + struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param; 69 + size_t avail_in = strm->avail_in; 70 + size_t avail_out = strm->avail_out; 71 + dfltcc_cc cc; 72 + 73 + cc = dfltcc(DFLTCC_XPND | HBT_CIRCULAR, 74 + param, &strm->next_out, &avail_out, 75 + &strm->next_in, &avail_in, state->window); 76 + strm->avail_in = avail_in; 77 + strm->avail_out = avail_out; 78 + return cc; 79 + } 80 + 81 + dfltcc_inflate_action dfltcc_inflate( 82 + z_streamp strm, 83 + int flush, 84 + int *ret 85 + ) 86 + { 87 + struct inflate_state *state = (struct inflate_state *)strm->state; 88 + struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state); 89 + struct dfltcc_param_v0 *param = &dfltcc_state->param; 90 + dfltcc_cc cc; 91 + 92 + if (flush == Z_BLOCK) { 93 + /* DFLTCC does not support stopping on block boundaries */ 94 + if (dfltcc_inflate_disable(strm)) { 95 + *ret = Z_STREAM_ERROR; 96 + return DFLTCC_INFLATE_BREAK; 97 + } else 98 + return DFLTCC_INFLATE_SOFTWARE; 99 + } 100 + 101 + if (state->last) { 102 + if (state->bits != 0) { 103 + strm->next_in++; 104 + strm->avail_in--; 105 + state->bits = 0; 106 + } 107 + state->mode = CHECK; 108 + return DFLTCC_INFLATE_CONTINUE; 109 + } 110 + 111 + if (strm->avail_in == 0 && !param->cf) 112 + return DFLTCC_INFLATE_BREAK; 113 + 114 + if (!state->window || state->wsize == 0) { 115 + state->mode = MEM; 116 + return DFLTCC_INFLATE_CONTINUE; 117 + } 118 + 119 + /* Translate stream to parameter block */ 120 + param->cvt = CVT_ADLER32; 121 + param->sbb = state->bits; 122 + param->hl = state->whave; /* Software and hardware history formats match */ 123 + param->ho = (state->write - state->whave) & ((1 << HB_BITS) - 1); 124 + if 
(param->hl) 125 + param->nt = 0; /* Honor history for the first block */ 126 + param->cv = state->flags ? REVERSE(state->check) : state->check; 127 + 128 + /* Inflate */ 129 + do { 130 + cc = dfltcc_xpnd(strm); 131 + } while (cc == DFLTCC_CC_AGAIN); 132 + 133 + /* Translate parameter block to stream */ 134 + strm->msg = oesc_msg(dfltcc_state->msg, param->oesc); 135 + state->last = cc == DFLTCC_CC_OK; 136 + state->bits = param->sbb; 137 + state->whave = param->hl; 138 + state->write = (param->ho + param->hl) & ((1 << HB_BITS) - 1); 139 + state->check = state->flags ? REVERSE(param->cv) : param->cv; 140 + if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) { 141 + /* Report an error if stream is corrupted */ 142 + state->mode = BAD; 143 + return DFLTCC_INFLATE_CONTINUE; 144 + } 145 + state->mode = TYPEDO; 146 + /* Break if operands are exhausted, otherwise continue looping */ 147 + return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ? 148 + DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE; 149 + }
+17
lib/zlib_dfltcc/dfltcc_syms.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * linux/lib/zlib_dfltcc/dfltcc_syms.c 4 + * 5 + * Exported symbols for the s390 zlib dfltcc support. 6 + * 7 + */ 8 + 9 + #include <linux/init.h> 10 + #include <linux/module.h> 11 + #include <linux/zlib.h> 12 + #include "dfltcc.h" 13 + 14 + EXPORT_SYMBOL(dfltcc_can_deflate); 15 + EXPORT_SYMBOL(dfltcc_deflate); 16 + EXPORT_SYMBOL(dfltcc_reset); 17 + MODULE_LICENSE("GPL");
+103
lib/zlib_dfltcc/dfltcc_util.h
··· 1 + // SPDX-License-Identifier: Zlib 2 + #ifndef DFLTCC_UTIL_H 3 + #define DFLTCC_UTIL_H 4 + 5 + #include <linux/zutil.h> 6 + 7 + /* 8 + * C wrapper for the DEFLATE CONVERSION CALL instruction. 9 + */ 10 + typedef enum { 11 + DFLTCC_CC_OK = 0, 12 + DFLTCC_CC_OP1_TOO_SHORT = 1, 13 + DFLTCC_CC_OP2_TOO_SHORT = 2, 14 + DFLTCC_CC_OP2_CORRUPT = 2, 15 + DFLTCC_CC_AGAIN = 3, 16 + } dfltcc_cc; 17 + 18 + #define DFLTCC_QAF 0 19 + #define DFLTCC_GDHT 1 20 + #define DFLTCC_CMPR 2 21 + #define DFLTCC_XPND 4 22 + #define HBT_CIRCULAR (1 << 7) 23 + #define HB_BITS 15 24 + #define HB_SIZE (1 << HB_BITS) 25 + 26 + static inline dfltcc_cc dfltcc( 27 + int fn, 28 + void *param, 29 + Byte **op1, 30 + size_t *len1, 31 + const Byte **op2, 32 + size_t *len2, 33 + void *hist 34 + ) 35 + { 36 + Byte *t2 = op1 ? *op1 : NULL; 37 + size_t t3 = len1 ? *len1 : 0; 38 + const Byte *t4 = op2 ? *op2 : NULL; 39 + size_t t5 = len2 ? *len2 : 0; 40 + register int r0 __asm__("r0") = fn; 41 + register void *r1 __asm__("r1") = param; 42 + register Byte *r2 __asm__("r2") = t2; 43 + register size_t r3 __asm__("r3") = t3; 44 + register const Byte *r4 __asm__("r4") = t4; 45 + register size_t r5 __asm__("r5") = t5; 46 + int cc; 47 + 48 + __asm__ volatile( 49 + ".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n" 50 + "ipm %[cc]\n" 51 + : [r2] "+r" (r2) 52 + , [r3] "+r" (r3) 53 + , [r4] "+r" (r4) 54 + , [r5] "+r" (r5) 55 + , [cc] "=r" (cc) 56 + : [r0] "r" (r0) 57 + , [r1] "r" (r1) 58 + , [hist] "r" (hist) 59 + : "cc", "memory"); 60 + t2 = r2; t3 = r3; t4 = r4; t5 = r5; 61 + 62 + if (op1) 63 + *op1 = t2; 64 + if (len1) 65 + *len1 = t3; 66 + if (op2) 67 + *op2 = t4; 68 + if (len2) 69 + *len2 = t5; 70 + return (cc >> 28) & 3; 71 + } 72 + 73 + static inline int is_bit_set( 74 + const char *bits, 75 + int n 76 + ) 77 + { 78 + return bits[n / 8] & (1 << (7 - (n % 8))); 79 + } 80 + 81 + static inline void turn_bit_off( 82 + char *bits, 83 + int n 84 + ) 85 + { 86 + bits[n / 8] &= ~(1 << (7 - (n % 8))); 87 + } 88 + 89 + 
static inline int dfltcc_are_params_ok( 90 + int level, 91 + uInt window_bits, 92 + int strategy, 93 + uLong level_mask 94 + ) 95 + { 96 + return (level_mask & (1 << level)) != 0 && 97 + (window_bits == HB_BITS) && 98 + (strategy == Z_DEFAULT_STRATEGY); 99 + } 100 + 101 + char *oesc_msg(char *buf, int oesc); 102 + 103 + #endif /* DFLTCC_UTIL_H */
+24 -8
lib/zlib_inflate/inflate.c
··· 15 15 #include "inffast.h" 16 16 #include "infutil.h" 17 17 18 + /* architecture-specific bits */ 19 + #ifdef CONFIG_ZLIB_DFLTCC 20 + # include "../zlib_dfltcc/dfltcc.h" 21 + #else 22 + #define INFLATE_RESET_HOOK(strm) do {} while (0) 23 + #define INFLATE_TYPEDO_HOOK(strm, flush) do {} while (0) 24 + #define INFLATE_NEED_UPDATEWINDOW(strm) 1 25 + #define INFLATE_NEED_CHECKSUM(strm) 1 26 + #endif 27 + 18 28 int zlib_inflate_workspacesize(void) 19 29 { 20 30 return sizeof(struct inflate_workspace); ··· 52 42 state->write = 0; 53 43 state->whave = 0; 54 44 45 + INFLATE_RESET_HOOK(strm); 55 46 return Z_OK; 56 47 } 57 48 ··· 77 66 return Z_STREAM_ERROR; 78 67 } 79 68 state->wbits = (unsigned)windowBits; 69 + #ifdef CONFIG_ZLIB_DFLTCC 70 + /* 71 + * DFLTCC requires the window to be page aligned. 72 + * Thus, we overallocate and take the aligned portion of the buffer. 73 + */ 74 + state->window = PTR_ALIGN(&WS(strm)->working_window[0], PAGE_SIZE); 75 + #else 80 76 state->window = &WS(strm)->working_window[0]; 77 + #endif 81 78 82 79 return zlib_inflateReset(strm); 83 80 } ··· 245 226 hold >>= bits & 7; \ 246 227 bits -= bits & 7; \ 247 228 } while (0) 248 - 249 - /* Reverse the bytes in a 32-bit value */ 250 - #define REVERSE(q) \ 251 - ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \ 252 - (((q) & 0xff00) << 8) + (((q) & 0xff) << 24)) 253 229 254 230 /* 255 231 inflate() uses a state machine to process as much input data and generate as ··· 409 395 if (flush == Z_BLOCK) goto inf_leave; 410 396 /* fall through */ 411 397 case TYPEDO: 398 + INFLATE_TYPEDO_HOOK(strm, flush); 412 399 if (state->last) { 413 400 BYTEBITS(); 414 401 state->mode = CHECK; ··· 707 692 out -= left; 708 693 strm->total_out += out; 709 694 state->total += out; 710 - if (out) 695 + if (INFLATE_NEED_CHECKSUM(strm) && out) 711 696 strm->adler = state->check = 712 697 UPDATE(state->check, put - out, out); 713 698 out = left; ··· 741 726 */ 742 727 inf_leave: 743 728 RESTORE(); 744 - if (state->wsize 
|| (state->mode < CHECK && out != strm->avail_out)) 729 + if (INFLATE_NEED_UPDATEWINDOW(strm) && 730 + (state->wsize || (state->mode < CHECK && out != strm->avail_out))) 745 731 zlib_updatewindow(strm, out); 746 732 747 733 in -= strm->avail_in; ··· 750 734 strm->total_in += in; 751 735 strm->total_out += out; 752 736 state->total += out; 753 - if (state->wrap && out) 737 + if (INFLATE_NEED_CHECKSUM(strm) && state->wrap && out) 754 738 strm->adler = state->check = 755 739 UPDATE(state->check, strm->next_out - out, out); 756 740
+8
lib/zlib_inflate/inflate.h
··· 11 11 subject to change. Applications should only use zlib.h. 12 12 */ 13 13 14 + #include "inftrees.h" 15 + 14 16 /* Possible inflate modes between inflate() calls */ 15 17 typedef enum { 16 18 HEAD, /* i: waiting for magic header */ ··· 110 108 unsigned short work[288]; /* work area for code table building */ 111 109 code codes[ENOUGH]; /* space for code tables */ 112 110 }; 111 + 112 + /* Reverse the bytes in a 32-bit value */ 113 + #define REVERSE(q) \ 114 + ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \ 115 + (((q) & 0xff00) << 8) + (((q) & 0xff) << 24)) 116 + 113 117 #endif
+16 -2
lib/zlib_inflate/infutil.h
··· 12 12 #define _INFUTIL_H 13 13 14 14 #include <linux/zlib.h> 15 + #ifdef CONFIG_ZLIB_DFLTCC 16 + #include "../zlib_dfltcc/dfltcc.h" 17 + #include <asm/page.h> 18 + #endif 15 19 16 20 /* memory allocation for inflation */ 17 21 18 22 struct inflate_workspace { 19 23 struct inflate_state inflate_state; 20 - unsigned char working_window[1 << MAX_WBITS]; 24 + #ifdef CONFIG_ZLIB_DFLTCC 25 + struct dfltcc_state dfltcc_state; 26 + unsigned char working_window[(1 << MAX_WBITS) + PAGE_SIZE]; 27 + #else 28 + unsigned char working_window[(1 << MAX_WBITS)]; 29 + #endif 21 30 }; 22 31 23 - #define WS(z) ((struct inflate_workspace *)(z->workspace)) 32 + #ifdef CONFIG_ZLIB_DFLTCC 33 + /* dfltcc_state must be doubleword aligned for DFLTCC call */ 34 + static_assert(offsetof(struct inflate_workspace, dfltcc_state) % 8 == 0); 35 + #endif 36 + 37 + #define WS(strm) ((struct inflate_workspace *)(strm->workspace)) 24 38 25 39 #endif
+1
mm/Makefile
··· 20 20 KCOV_INSTRUMENT_memcontrol.o := n 21 21 KCOV_INSTRUMENT_mmzone.o := n 22 22 KCOV_INSTRUMENT_vmstat.o := n 23 + KCOV_INSTRUMENT_failslab.o := n 23 24 24 25 CFLAGS_init-mm.o += $(call cc-disable-warning, override-init) 25 26 CFLAGS_init-mm.o += $(call cc-disable-warning, initializer-overrides)
+1
mm/backing-dev.c
··· 21 21 EXPORT_SYMBOL_GPL(noop_backing_dev_info); 22 22 23 23 static struct class *bdi_class; 24 + const char *bdi_unknown_name = "(unknown)"; 24 25 25 26 /* 26 27 * bdi_lock protects bdi_tree and updates to bdi_list. bdi_list has RCU
+13 -3
mm/debug.c
··· 46 46 { 47 47 struct address_space *mapping; 48 48 bool page_poisoned = PagePoisoned(page); 49 + /* 50 + * Accessing the pageblock without the zone lock. It could change to 51 + * "isolate" again in the meantime, but since we are just dumping the 52 + * state for debugging, it should be fine to accept a bit of 53 + * inaccuracy here due to racing. 54 + */ 55 + bool page_cma = is_migrate_cma_page(page); 49 56 int mapcount; 57 + char *type = ""; 50 58 51 59 /* 52 60 * If struct page is poisoned don't access Page*() functions as that ··· 86 78 page, page_ref_count(page), mapcount, 87 79 page->mapping, page_to_pgoff(page)); 88 80 if (PageKsm(page)) 89 - pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags); 81 + type = "ksm "; 90 82 else if (PageAnon(page)) 91 - pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags); 83 + type = "anon "; 92 84 else if (mapping) { 93 85 if (mapping->host && mapping->host->i_dentry.first) { 94 86 struct dentry *dentry; ··· 96 88 pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry); 97 89 } else 98 90 pr_warn("%ps\n", mapping->a_ops); 99 - pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); 100 91 } 101 92 BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); 93 + 94 + pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags, 95 + page_cma ? " CMA" : ""); 102 96 103 97 hex_only: 104 98 print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
+4 -4
mm/early_ioremap.c
··· 121 121 } 122 122 } 123 123 124 - if (WARN(slot < 0, "%s(%08llx, %08lx) not found slot\n", 125 - __func__, (u64)phys_addr, size)) 124 + if (WARN(slot < 0, "%s(%pa, %08lx) not found slot\n", 125 + __func__, &phys_addr, size)) 126 126 return NULL; 127 127 128 128 /* Don't allow wraparound or zero size */ ··· 158 158 --idx; 159 159 --nrpages; 160 160 } 161 - WARN(early_ioremap_debug, "%s(%08llx, %08lx) [%d] => %08lx + %08lx\n", 162 - __func__, (u64)phys_addr, size, slot, offset, slot_virt[slot]); 161 + WARN(early_ioremap_debug, "%s(%pa, %08lx) [%d] => %08lx + %08lx\n", 162 + __func__, &phys_addr, size, slot, offset, slot_virt[slot]); 163 163 164 164 prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]); 165 165 return prev_map[slot];
+6 -28
mm/filemap.c
··· 632 632 return mapping->nrpages; 633 633 } 634 634 635 - int filemap_write_and_wait(struct address_space *mapping) 636 - { 637 - int err = 0; 638 - 639 - if (mapping_needs_writeback(mapping)) { 640 - err = filemap_fdatawrite(mapping); 641 - /* 642 - * Even if the above returned error, the pages may be 643 - * written partially (e.g. -ENOSPC), so we wait for it. 644 - * But the -EIO is special case, it may indicate the worst 645 - * thing (e.g. bug) happened, so we avoid waiting for it. 646 - */ 647 - if (err != -EIO) { 648 - int err2 = filemap_fdatawait(mapping); 649 - if (!err) 650 - err = err2; 651 - } else { 652 - /* Clear any previously stored errors */ 653 - filemap_check_errors(mapping); 654 - } 655 - } else { 656 - err = filemap_check_errors(mapping); 657 - } 658 - return err; 659 - } 660 - EXPORT_SYMBOL(filemap_write_and_wait); 661 - 662 635 /** 663 636 * filemap_write_and_wait_range - write out & wait on a file range 664 637 * @mapping: the address_space for the pages ··· 653 680 if (mapping_needs_writeback(mapping)) { 654 681 err = __filemap_fdatawrite_range(mapping, lstart, lend, 655 682 WB_SYNC_ALL); 656 - /* See comment of filemap_write_and_wait() */ 683 + /* 684 + * Even if the above returned error, the pages may be 685 + * written partially (e.g. -ENOSPC), so we wait for it. 686 + * But the -EIO is special case, it may indicate the worst 687 + * thing (e.g. bug) happened, so we avoid waiting for it. 688 + */ 657 689 if (err != -EIO) { 658 690 int err2 = filemap_fdatawait_range(mapping, 659 691 lstart, lend);
+310 -191
mm/gup.c
··· 29 29 unsigned int page_mask; 30 30 }; 31 31 32 + /* 33 + * Return the compound head page with ref appropriately incremented, 34 + * or NULL if that failed. 35 + */ 36 + static inline struct page *try_get_compound_head(struct page *page, int refs) 37 + { 38 + struct page *head = compound_head(page); 39 + 40 + if (WARN_ON_ONCE(page_ref_count(head) < 0)) 41 + return NULL; 42 + if (unlikely(!page_cache_add_speculative(head, refs))) 43 + return NULL; 44 + return head; 45 + } 46 + 32 47 /** 33 - * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages 48 + * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages 34 49 * @pages: array of pages to be maybe marked dirty, and definitely released. 35 50 * @npages: number of pages in the @pages array. 36 51 * @make_dirty: whether to mark the pages dirty ··· 55 40 * 56 41 * For each page in the @pages array, make that page (or its head page, if a 57 42 * compound page) dirty, if @make_dirty is true, and if the page was previously 58 - * listed as clean. In any case, releases all pages using put_user_page(), 59 - * possibly via put_user_pages(), for the non-dirty case. 43 + * listed as clean. In any case, releases all pages using unpin_user_page(), 44 + * possibly via unpin_user_pages(), for the non-dirty case. 60 45 * 61 - * Please see the put_user_page() documentation for details. 46 + * Please see the unpin_user_page() documentation for details. 62 47 * 63 48 * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is 64 49 * required, then the caller should a) verify that this is really correct, 65 50 * because _lock() is usually required, and b) hand code it: 66 - * set_page_dirty_lock(), put_user_page(). 51 + * set_page_dirty_lock(), unpin_user_page(). 
67 52 * 68 53 */ 69 - void put_user_pages_dirty_lock(struct page **pages, unsigned long npages, 70 - bool make_dirty) 54 + void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, 55 + bool make_dirty) 71 56 { 72 57 unsigned long index; 73 58 ··· 78 63 */ 79 64 80 65 if (!make_dirty) { 81 - put_user_pages(pages, npages); 66 + unpin_user_pages(pages, npages); 82 67 return; 83 68 } 84 69 ··· 106 91 */ 107 92 if (!PageDirty(page)) 108 93 set_page_dirty_lock(page); 109 - put_user_page(page); 94 + unpin_user_page(page); 110 95 } 111 96 } 112 - EXPORT_SYMBOL(put_user_pages_dirty_lock); 97 + EXPORT_SYMBOL(unpin_user_pages_dirty_lock); 113 98 114 99 /** 115 - * put_user_pages() - release an array of gup-pinned pages. 100 + * unpin_user_pages() - release an array of gup-pinned pages. 116 101 * @pages: array of pages to be marked dirty and released. 117 102 * @npages: number of pages in the @pages array. 118 103 * 119 - * For each page in the @pages array, release the page using put_user_page(). 104 + * For each page in the @pages array, release the page using unpin_user_page(). 120 105 * 121 - * Please see the put_user_page() documentation for details. 106 + * Please see the unpin_user_page() documentation for details. 122 107 */ 123 - void put_user_pages(struct page **pages, unsigned long npages) 108 + void unpin_user_pages(struct page **pages, unsigned long npages) 124 109 { 125 110 unsigned long index; 126 111 ··· 130 115 * single operation to the head page should suffice. 131 116 */ 132 117 for (index = 0; index < npages; index++) 133 - put_user_page(pages[index]); 118 + unpin_user_page(pages[index]); 134 119 } 135 - EXPORT_SYMBOL(put_user_pages); 120 + EXPORT_SYMBOL(unpin_user_pages); 136 121 137 122 #ifdef CONFIG_MMU 138 123 static struct page *no_page_table(struct vm_area_struct *vma, ··· 194 179 spinlock_t *ptl; 195 180 pte_t *ptep, pte; 196 181 182 + /* FOLL_GET and FOLL_PIN are mutually exclusive. 
*/ 183 + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) == 184 + (FOLL_PIN | FOLL_GET))) 185 + return ERR_PTR(-EINVAL); 197 186 retry: 198 187 if (unlikely(pmd_bad(*pmd))) 199 188 return no_page_table(vma, flags); ··· 342 323 pmdval = READ_ONCE(*pmd); 343 324 if (pmd_none(pmdval)) 344 325 return no_page_table(vma, flags); 345 - if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) { 326 + if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) { 346 327 page = follow_huge_pmd(mm, address, pmd, flags); 347 328 if (page) 348 329 return page; ··· 452 433 pud = pud_offset(p4dp, address); 453 434 if (pud_none(*pud)) 454 435 return no_page_table(vma, flags); 455 - if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) { 436 + if (pud_huge(*pud) && is_vm_hugetlb_page(vma)) { 456 437 page = follow_huge_pud(mm, address, pud, flags); 457 438 if (page) 458 439 return page; ··· 815 796 816 797 start = untagged_addr(start); 817 798 818 - VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET)); 799 + VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN))); 819 800 820 801 /* 821 802 * If FOLL_FORCE is set then do not force a full fault as the hinting ··· 1039 1020 BUG_ON(*locked != 1); 1040 1021 } 1041 1022 1042 - if (pages) 1023 + /* 1024 + * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior 1025 + * is to set FOLL_GET if the caller wants pages[] filled in (but has 1026 + * carelessly failed to specify FOLL_GET), so keep doing that, but only 1027 + * for FOLL_GET, not for the newer FOLL_PIN. 1028 + * 1029 + * FOLL_PIN always expects pages to be non-null, but no need to assert 1030 + * that here, as any failures will be obvious enough. 
1031 + */ 1032 + if (pages && !(flags & FOLL_PIN)) 1043 1033 flags |= FOLL_GET; 1044 1034 1045 1035 pages_done = 0; ··· 1123 1095 } 1124 1096 return pages_done; 1125 1097 } 1126 - 1127 - /* 1128 - * get_user_pages_remote() - pin user pages in memory 1129 - * @tsk: the task_struct to use for page fault accounting, or 1130 - * NULL if faults are not to be recorded. 1131 - * @mm: mm_struct of target mm 1132 - * @start: starting user address 1133 - * @nr_pages: number of pages from start to pin 1134 - * @gup_flags: flags modifying lookup behaviour 1135 - * @pages: array that receives pointers to the pages pinned. 1136 - * Should be at least nr_pages long. Or NULL, if caller 1137 - * only intends to ensure the pages are faulted in. 1138 - * @vmas: array of pointers to vmas corresponding to each page. 1139 - * Or NULL if the caller does not require them. 1140 - * @locked: pointer to lock flag indicating whether lock is held and 1141 - * subsequently whether VM_FAULT_RETRY functionality can be 1142 - * utilised. Lock must initially be held. 1143 - * 1144 - * Returns either number of pages pinned (which may be less than the 1145 - * number requested), or an error. Details about the return value: 1146 - * 1147 - * -- If nr_pages is 0, returns 0. 1148 - * -- If nr_pages is >0, but no pages were pinned, returns -errno. 1149 - * -- If nr_pages is >0, and some pages were pinned, returns the number of 1150 - * pages pinned. Again, this may be less than nr_pages. 1151 - * 1152 - * The caller is responsible for releasing returned @pages, via put_page(). 1153 - * 1154 - * @vmas are valid only as long as mmap_sem is held. 1155 - * 1156 - * Must be called with mmap_sem held for read or write. 1157 - * 1158 - * get_user_pages walks a process's page tables and takes a reference to 1159 - * each struct page that each user address corresponds to at a given 1160 - * instant. 
That is, it takes the page that would be accessed if a user 1161 - * thread accesses the given user virtual address at that instant. 1162 - * 1163 - * This does not guarantee that the page exists in the user mappings when 1164 - * get_user_pages returns, and there may even be a completely different 1165 - * page there in some cases (eg. if mmapped pagecache has been invalidated 1166 - * and subsequently re faulted). However it does guarantee that the page 1167 - * won't be freed completely. And mostly callers simply care that the page 1168 - * contains data that was valid *at some point in time*. Typically, an IO 1169 - * or similar operation cannot guarantee anything stronger anyway because 1170 - * locks can't be held over the syscall boundary. 1171 - * 1172 - * If gup_flags & FOLL_WRITE == 0, the page must not be written to. If the page 1173 - * is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must 1174 - * be called after the page is finished with, and before put_page is called. 1175 - * 1176 - * get_user_pages is typically used for fewer-copy IO operations, to get a 1177 - * handle on the memory by some means other than accesses via the user virtual 1178 - * addresses. The pages may be submitted for DMA to devices or accessed via 1179 - * their kernel linear mapping (via the kmap APIs). Care should be taken to 1180 - * use the correct cache flushing APIs. 1181 - * 1182 - * See also get_user_pages_fast, for performance critical applications. 1183 - * 1184 - * get_user_pages should be phased out in favor of 1185 - * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing 1186 - * should use get_user_pages because it cannot pass 1187 - * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault. 
1188 - */ 1189 - long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, 1190 - unsigned long start, unsigned long nr_pages, 1191 - unsigned int gup_flags, struct page **pages, 1192 - struct vm_area_struct **vmas, int *locked) 1193 - { 1194 - /* 1195 - * FIXME: Current FOLL_LONGTERM behavior is incompatible with 1196 - * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on 1197 - * vmas. As there are no users of this flag in this call we simply 1198 - * disallow this option for now. 1199 - */ 1200 - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM)) 1201 - return -EINVAL; 1202 - 1203 - return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, 1204 - locked, 1205 - gup_flags | FOLL_TOUCH | FOLL_REMOTE); 1206 - } 1207 - EXPORT_SYMBOL(get_user_pages_remote); 1208 1098 1209 1099 /** 1210 1100 * populate_vma_page_range() - populate a range of pages in the vma. ··· 1558 1612 #endif /* CONFIG_FS_DAX || CONFIG_CMA */ 1559 1613 1560 1614 /* 1615 + * get_user_pages_remote() - pin user pages in memory 1616 + * @tsk: the task_struct to use for page fault accounting, or 1617 + * NULL if faults are not to be recorded. 1618 + * @mm: mm_struct of target mm 1619 + * @start: starting user address 1620 + * @nr_pages: number of pages from start to pin 1621 + * @gup_flags: flags modifying lookup behaviour 1622 + * @pages: array that receives pointers to the pages pinned. 1623 + * Should be at least nr_pages long. Or NULL, if caller 1624 + * only intends to ensure the pages are faulted in. 1625 + * @vmas: array of pointers to vmas corresponding to each page. 1626 + * Or NULL if the caller does not require them. 1627 + * @locked: pointer to lock flag indicating whether lock is held and 1628 + * subsequently whether VM_FAULT_RETRY functionality can be 1629 + * utilised. Lock must initially be held. 1630 + * 1631 + * Returns either number of pages pinned (which may be less than the 1632 + * number requested), or an error. 
Details about the return value: 1633 + * 1634 + * -- If nr_pages is 0, returns 0. 1635 + * -- If nr_pages is >0, but no pages were pinned, returns -errno. 1636 + * -- If nr_pages is >0, and some pages were pinned, returns the number of 1637 + * pages pinned. Again, this may be less than nr_pages. 1638 + * 1639 + * The caller is responsible for releasing returned @pages, via put_page(). 1640 + * 1641 + * @vmas are valid only as long as mmap_sem is held. 1642 + * 1643 + * Must be called with mmap_sem held for read or write. 1644 + * 1645 + * get_user_pages walks a process's page tables and takes a reference to 1646 + * each struct page that each user address corresponds to at a given 1647 + * instant. That is, it takes the page that would be accessed if a user 1648 + * thread accesses the given user virtual address at that instant. 1649 + * 1650 + * This does not guarantee that the page exists in the user mappings when 1651 + * get_user_pages returns, and there may even be a completely different 1652 + * page there in some cases (eg. if mmapped pagecache has been invalidated 1653 + * and subsequently re faulted). However it does guarantee that the page 1654 + * won't be freed completely. And mostly callers simply care that the page 1655 + * contains data that was valid *at some point in time*. Typically, an IO 1656 + * or similar operation cannot guarantee anything stronger anyway because 1657 + * locks can't be held over the syscall boundary. 1658 + * 1659 + * If gup_flags & FOLL_WRITE == 0, the page must not be written to. If the page 1660 + * is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must 1661 + * be called after the page is finished with, and before put_page is called. 1662 + * 1663 + * get_user_pages is typically used for fewer-copy IO operations, to get a 1664 + * handle on the memory by some means other than accesses via the user virtual 1665 + * addresses. 
The pages may be submitted for DMA to devices or accessed via 1666 + * their kernel linear mapping (via the kmap APIs). Care should be taken to 1667 + * use the correct cache flushing APIs. 1668 + * 1669 + * See also get_user_pages_fast, for performance critical applications. 1670 + * 1671 + * get_user_pages should be phased out in favor of 1672 + * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing 1673 + * should use get_user_pages because it cannot pass 1674 + * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault. 1675 + */ 1676 + #ifdef CONFIG_MMU 1677 + long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, 1678 + unsigned long start, unsigned long nr_pages, 1679 + unsigned int gup_flags, struct page **pages, 1680 + struct vm_area_struct **vmas, int *locked) 1681 + { 1682 + /* 1683 + * FOLL_PIN must only be set internally by the pin_user_pages*() APIs, 1684 + * never directly by the caller, so enforce that with an assertion: 1685 + */ 1686 + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) 1687 + return -EINVAL; 1688 + 1689 + /* 1690 + * Parts of FOLL_LONGTERM behavior are incompatible with 1691 + * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on 1692 + * vmas. However, this only comes up if locked is set, and there are 1693 + * callers that do request FOLL_LONGTERM, but do not set locked. So, 1694 + * allow what we can. 
1695 + */ 1696 + if (gup_flags & FOLL_LONGTERM) { 1697 + if (WARN_ON_ONCE(locked)) 1698 + return -EINVAL; 1699 + /* 1700 + * This will check the vmas (even if our vmas arg is NULL) 1701 + * and return -ENOTSUPP if DAX isn't allowed in this case: 1702 + */ 1703 + return __gup_longterm_locked(tsk, mm, start, nr_pages, pages, 1704 + vmas, gup_flags | FOLL_TOUCH | 1705 + FOLL_REMOTE); 1706 + } 1707 + 1708 + return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, 1709 + locked, 1710 + gup_flags | FOLL_TOUCH | FOLL_REMOTE); 1711 + } 1712 + EXPORT_SYMBOL(get_user_pages_remote); 1713 + 1714 + #else /* CONFIG_MMU */ 1715 + long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, 1716 + unsigned long start, unsigned long nr_pages, 1717 + unsigned int gup_flags, struct page **pages, 1718 + struct vm_area_struct **vmas, int *locked) 1719 + { 1720 + return 0; 1721 + } 1722 + #endif /* !CONFIG_MMU */ 1723 + 1724 + /* 1561 1725 * This is the same as get_user_pages_remote(), just with a 1562 1726 * less-flexible calling convention where we assume that the task 1563 1727 * and mm being operated on are the current task's and don't allow ··· 1678 1622 unsigned int gup_flags, struct page **pages, 1679 1623 struct vm_area_struct **vmas) 1680 1624 { 1625 + /* 1626 + * FOLL_PIN must only be set internally by the pin_user_pages*() APIs, 1627 + * never directly by the caller, so enforce that with an assertion: 1628 + */ 1629 + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) 1630 + return -EINVAL; 1631 + 1681 1632 return __gup_longterm_locked(current, current->mm, start, nr_pages, 1682 1633 pages, vmas, gup_flags | FOLL_TOUCH); 1683 1634 } ··· 1870 1807 } 1871 1808 } 1872 1809 1873 - /* 1874 - * Return the compund head page with ref appropriately incremented, 1875 - * or NULL if that failed. 
1876 - */ 1877 - static inline struct page *try_get_compound_head(struct page *page, int refs) 1878 - { 1879 - struct page *head = compound_head(page); 1880 - if (WARN_ON_ONCE(page_ref_count(head) < 0)) 1881 - return NULL; 1882 - if (unlikely(!page_cache_add_speculative(head, refs))) 1883 - return NULL; 1884 - return head; 1885 - } 1886 - 1887 1810 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL 1888 1811 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, 1889 1812 unsigned int flags, struct page **pages, int *nr) ··· 2027 1978 } 2028 1979 #endif 2029 1980 1981 + static int record_subpages(struct page *page, unsigned long addr, 1982 + unsigned long end, struct page **pages) 1983 + { 1984 + int nr; 1985 + 1986 + for (nr = 0; addr != end; addr += PAGE_SIZE) 1987 + pages[nr++] = page++; 1988 + 1989 + return nr; 1990 + } 1991 + 1992 + static void put_compound_head(struct page *page, int refs) 1993 + { 1994 + VM_BUG_ON_PAGE(page_ref_count(page) < refs, page); 1995 + /* 1996 + * Calling put_page() for each ref is unnecessarily slow. Only the last 1997 + * ref needs a put_page(). 
1998 + */ 1999 + if (refs > 1) 2000 + page_ref_sub(page, refs - 1); 2001 + put_page(page); 2002 + } 2003 + 2030 2004 #ifdef CONFIG_ARCH_HAS_HUGEPD 2031 2005 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, 2032 2006 unsigned long sz) ··· 2079 2007 /* hugepages are never "special" */ 2080 2008 VM_BUG_ON(!pfn_valid(pte_pfn(pte))); 2081 2009 2082 - refs = 0; 2083 2010 head = pte_page(pte); 2084 - 2085 2011 page = head + ((addr & (sz-1)) >> PAGE_SHIFT); 2086 - do { 2087 - VM_BUG_ON(compound_head(page) != head); 2088 - pages[*nr] = page; 2089 - (*nr)++; 2090 - page++; 2091 - refs++; 2092 - } while (addr += PAGE_SIZE, addr != end); 2012 + refs = record_subpages(page, addr, end, pages + *nr); 2093 2013 2094 2014 head = try_get_compound_head(head, refs); 2095 - if (!head) { 2096 - *nr -= refs; 2015 + if (!head) 2097 2016 return 0; 2098 - } 2099 2017 2100 2018 if (unlikely(pte_val(pte) != pte_val(*ptep))) { 2101 - /* Could be optimized better */ 2102 - *nr -= refs; 2103 - while (refs--) 2104 - put_page(head); 2019 + put_compound_head(head, refs); 2105 2020 return 0; 2106 2021 } 2107 2022 2023 + *nr += refs; 2108 2024 SetPageReferenced(head); 2109 2025 return 1; 2110 2026 } ··· 2139 2079 return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr); 2140 2080 } 2141 2081 2142 - refs = 0; 2143 2082 page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); 2144 - do { 2145 - pages[*nr] = page; 2146 - (*nr)++; 2147 - page++; 2148 - refs++; 2149 - } while (addr += PAGE_SIZE, addr != end); 2083 + refs = record_subpages(page, addr, end, pages + *nr); 2150 2084 2151 2085 head = try_get_compound_head(pmd_page(orig), refs); 2152 - if (!head) { 2153 - *nr -= refs; 2086 + if (!head) 2154 2087 return 0; 2155 - } 2156 2088 2157 2089 if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { 2158 - *nr -= refs; 2159 - while (refs--) 2160 - put_page(head); 2090 + put_compound_head(head, refs); 2161 2091 return 0; 2162 2092 } 2163 2093 2094 + *nr += refs; 2164 2095 
SetPageReferenced(head); 2165 2096 return 1; 2166 2097 } ··· 2171 2120 return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr); 2172 2121 } 2173 2122 2174 - refs = 0; 2175 2123 page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); 2176 - do { 2177 - pages[*nr] = page; 2178 - (*nr)++; 2179 - page++; 2180 - refs++; 2181 - } while (addr += PAGE_SIZE, addr != end); 2124 + refs = record_subpages(page, addr, end, pages + *nr); 2182 2125 2183 2126 head = try_get_compound_head(pud_page(orig), refs); 2184 - if (!head) { 2185 - *nr -= refs; 2127 + if (!head) 2186 2128 return 0; 2187 - } 2188 2129 2189 2130 if (unlikely(pud_val(orig) != pud_val(*pudp))) { 2190 - *nr -= refs; 2191 - while (refs--) 2192 - put_page(head); 2131 + put_compound_head(head, refs); 2193 2132 return 0; 2194 2133 } 2195 2134 2135 + *nr += refs; 2196 2136 SetPageReferenced(head); 2197 2137 return 1; 2198 2138 } ··· 2199 2157 return 0; 2200 2158 2201 2159 BUILD_BUG_ON(pgd_devmap(orig)); 2202 - refs = 0; 2160 + 2203 2161 page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT); 2204 - do { 2205 - pages[*nr] = page; 2206 - (*nr)++; 2207 - page++; 2208 - refs++; 2209 - } while (addr += PAGE_SIZE, addr != end); 2162 + refs = record_subpages(page, addr, end, pages + *nr); 2210 2163 2211 2164 head = try_get_compound_head(pgd_page(orig), refs); 2212 - if (!head) { 2213 - *nr -= refs; 2165 + if (!head) 2214 2166 return 0; 2215 - } 2216 2167 2217 2168 if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { 2218 - *nr -= refs; 2219 - while (refs--) 2220 - put_page(head); 2169 + put_compound_head(head, refs); 2221 2170 return 0; 2222 2171 } 2223 2172 2173 + *nr += refs; 2224 2174 SetPageReferenced(head); 2225 2175 return 1; 2226 2176 } ··· 2271 2237 pud_t pud = READ_ONCE(*pudp); 2272 2238 2273 2239 next = pud_addr_end(addr, end); 2274 - if (pud_none(pud)) 2240 + if (unlikely(!pud_present(pud))) 2275 2241 return 0; 2276 2242 if (unlikely(pud_huge(pud))) { 2277 2243 if (!gup_huge_pud(pud, pudp, addr, next, 
flags, ··· 2427 2393 return ret; 2428 2394 } 2429 2395 2430 - /** 2431 - * get_user_pages_fast() - pin user pages in memory 2432 - * @start: starting user address 2433 - * @nr_pages: number of pages from start to pin 2434 - * @gup_flags: flags modifying pin behaviour 2435 - * @pages: array that receives pointers to the pages pinned. 2436 - * Should be at least nr_pages long. 2437 - * 2438 - * Attempt to pin user pages in memory without taking mm->mmap_sem. 2439 - * If not successful, it will fall back to taking the lock and 2440 - * calling get_user_pages(). 2441 - * 2442 - * Returns number of pages pinned. This may be fewer than the number 2443 - * requested. If nr_pages is 0 or negative, returns 0. If no pages 2444 - * were pinned, returns -errno. 2445 - */ 2446 - int get_user_pages_fast(unsigned long start, int nr_pages, 2447 - unsigned int gup_flags, struct page **pages) 2396 + static int internal_get_user_pages_fast(unsigned long start, int nr_pages, 2397 + unsigned int gup_flags, 2398 + struct page **pages) 2448 2399 { 2449 2400 unsigned long addr, len, end; 2450 2401 int nr = 0, ret = 0; 2451 2402 2452 - if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) 2403 + if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | 2404 + FOLL_FORCE | FOLL_PIN))) 2453 2405 return -EINVAL; 2454 2406 2455 2407 start = untagged_addr(start) & PAGE_MASK; ··· 2475 2455 2476 2456 return ret; 2477 2457 } 2458 + 2459 + /** 2460 + * get_user_pages_fast() - pin user pages in memory 2461 + * @start: starting user address 2462 + * @nr_pages: number of pages from start to pin 2463 + * @gup_flags: flags modifying pin behaviour 2464 + * @pages: array that receives pointers to the pages pinned. 2465 + * Should be at least nr_pages long. 2466 + * 2467 + * Attempt to pin user pages in memory without taking mm->mmap_sem. 2468 + * If not successful, it will fall back to taking the lock and 2469 + * calling get_user_pages(). 2470 + * 2471 + * Returns number of pages pinned. 
This may be fewer than the number requested. 2472 + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns 2473 + * -errno. 2474 + */ 2475 + int get_user_pages_fast(unsigned long start, int nr_pages, 2476 + unsigned int gup_flags, struct page **pages) 2477 + { 2478 + /* 2479 + * FOLL_PIN must only be set internally by the pin_user_pages*() APIs, 2480 + * never directly by the caller, so enforce that: 2481 + */ 2482 + if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) 2483 + return -EINVAL; 2484 + 2485 + return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); 2486 + } 2478 2487 EXPORT_SYMBOL_GPL(get_user_pages_fast); 2488 + 2489 + /** 2490 + * pin_user_pages_fast() - pin user pages in memory without taking locks 2491 + * 2492 + * For now, this is a placeholder function, until various call sites are 2493 + * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, 2494 + * this is identical to get_user_pages_fast(). 2495 + * 2496 + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It 2497 + * is NOT intended for Case 2 (RDMA: long-term pins). 2498 + */ 2499 + int pin_user_pages_fast(unsigned long start, int nr_pages, 2500 + unsigned int gup_flags, struct page **pages) 2501 + { 2502 + /* 2503 + * This is a placeholder, until the pin functionality is activated. 2504 + * Until then, just behave like the corresponding get_user_pages*() 2505 + * routine. 2506 + */ 2507 + return get_user_pages_fast(start, nr_pages, gup_flags, pages); 2508 + } 2509 + EXPORT_SYMBOL_GPL(pin_user_pages_fast); 2510 + 2511 + /** 2512 + * pin_user_pages_remote() - pin pages of a remote process (task != current) 2513 + * 2514 + * For now, this is a placeholder function, until various call sites are 2515 + * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, 2516 + * this is identical to get_user_pages_remote(). 2517 + * 2518 + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. 
It 2519 + * is NOT intended for Case 2 (RDMA: long-term pins). 2520 + */ 2521 + long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, 2522 + unsigned long start, unsigned long nr_pages, 2523 + unsigned int gup_flags, struct page **pages, 2524 + struct vm_area_struct **vmas, int *locked) 2525 + { 2526 + /* 2527 + * This is a placeholder, until the pin functionality is activated. 2528 + * Until then, just behave like the corresponding get_user_pages*() 2529 + * routine. 2530 + */ 2531 + return get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags, pages, 2532 + vmas, locked); 2533 + } 2534 + EXPORT_SYMBOL(pin_user_pages_remote); 2535 + 2536 + /** 2537 + * pin_user_pages() - pin user pages in memory for use by other devices 2538 + * 2539 + * For now, this is a placeholder function, until various call sites are 2540 + * converted to use the correct get_user_pages*() or pin_user_pages*() API. So, 2541 + * this is identical to get_user_pages(). 2542 + * 2543 + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It 2544 + * is NOT intended for Case 2 (RDMA: long-term pins). 2545 + */ 2546 + long pin_user_pages(unsigned long start, unsigned long nr_pages, 2547 + unsigned int gup_flags, struct page **pages, 2548 + struct vm_area_struct **vmas) 2549 + { 2550 + /* 2551 + * This is a placeholder, until the pin functionality is activated. 2552 + * Until then, just behave like the corresponding get_user_pages*() 2553 + * routine. 2554 + */ 2555 + return get_user_pages(start, nr_pages, gup_flags, pages, vmas); 2556 + } 2557 + EXPORT_SYMBOL(pin_user_pages);
+6 -3
mm/gup_benchmark.c
··· 49 49 nr = (next - addr) / PAGE_SIZE; 50 50 } 51 51 52 + /* Filter out most gup flags: only allow a tiny subset here: */ 53 + gup->flags &= FOLL_WRITE; 54 + 52 55 switch (cmd) { 53 56 case GUP_FAST_BENCHMARK: 54 - nr = get_user_pages_fast(addr, nr, gup->flags & 1, 57 + nr = get_user_pages_fast(addr, nr, gup->flags, 55 58 pages + i); 56 59 break; 57 60 case GUP_LONGTERM_BENCHMARK: 58 61 nr = get_user_pages(addr, nr, 59 - (gup->flags & 1) | FOLL_LONGTERM, 62 + gup->flags | FOLL_LONGTERM, 60 63 pages + i, NULL); 61 64 break; 62 65 case GUP_BENCHMARK: 63 - nr = get_user_pages(addr, nr, gup->flags & 1, pages + i, 66 + nr = get_user_pages(addr, nr, gup->flags, pages + i, 64 67 NULL); 65 68 break; 66 69 default:
+18 -26
mm/huge_memory.c
··· 177 177 { 178 178 ssize_t ret = count; 179 179 180 - if (!memcmp("always", buf, 181 - min(sizeof("always")-1, count))) { 180 + if (sysfs_streq(buf, "always")) { 182 181 clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags); 183 182 set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags); 184 - } else if (!memcmp("madvise", buf, 185 - min(sizeof("madvise")-1, count))) { 183 + } else if (sysfs_streq(buf, "madvise")) { 186 184 clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags); 187 185 set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags); 188 - } else if (!memcmp("never", buf, 189 - min(sizeof("never")-1, count))) { 186 + } else if (sysfs_streq(buf, "never")) { 190 187 clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags); 191 188 clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags); 192 189 } else ··· 247 250 struct kobj_attribute *attr, 248 251 const char *buf, size_t count) 249 252 { 250 - if (!memcmp("always", buf, 251 - min(sizeof("always")-1, count))) { 253 + if (sysfs_streq(buf, "always")) { 252 254 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); 253 255 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); 254 256 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); 255 257 set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); 256 - } else if (!memcmp("defer+madvise", buf, 257 - min(sizeof("defer+madvise")-1, count))) { 258 + } else if (sysfs_streq(buf, "defer+madvise")) { 258 259 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); 259 260 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); 260 261 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); 261 262 set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); 262 - } 
else if (!memcmp("defer", buf, 263 - min(sizeof("defer")-1, count))) { 263 + } else if (sysfs_streq(buf, "defer")) { 264 264 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); 265 265 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); 266 266 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); 267 267 set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); 268 - } else if (!memcmp("madvise", buf, 269 - min(sizeof("madvise")-1, count))) { 268 + } else if (sysfs_streq(buf, "madvise")) { 270 269 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); 271 270 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); 272 271 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); 273 272 set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); 274 - } else if (!memcmp("never", buf, 275 - min(sizeof("never")-1, count))) { 273 + } else if (sysfs_streq(buf, "never")) { 276 274 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); 277 275 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); 278 276 clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); ··· 2707 2715 { 2708 2716 struct page *head = compound_head(page); 2709 2717 struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); 2710 - struct deferred_split *ds_queue = get_deferred_split_queue(page); 2718 + struct deferred_split *ds_queue = get_deferred_split_queue(head); 2711 2719 struct anon_vma *anon_vma = NULL; 2712 2720 struct address_space *mapping = NULL; 2713 2721 int count, mapcount, extra_pins, ret; ··· 2715 2723 unsigned long flags; 2716 2724 pgoff_t end; 2717 2725 2718 - VM_BUG_ON_PAGE(is_huge_zero_page(page), page); 2719 - VM_BUG_ON_PAGE(!PageLocked(page), page); 2720 - 
VM_BUG_ON_PAGE(!PageCompound(page), page); 2726 + VM_BUG_ON_PAGE(is_huge_zero_page(head), head); 2727 + VM_BUG_ON_PAGE(!PageLocked(head), head); 2728 + VM_BUG_ON_PAGE(!PageCompound(head), head); 2721 2729 2722 - if (PageWriteback(page)) 2730 + if (PageWriteback(head)) 2723 2731 return -EBUSY; 2724 2732 2725 2733 if (PageAnon(head)) { ··· 2770 2778 goto out_unlock; 2771 2779 } 2772 2780 2773 - mlocked = PageMlocked(page); 2781 + mlocked = PageMlocked(head); 2774 2782 unmap_page(head); 2775 2783 VM_BUG_ON_PAGE(compound_mapcount(head), head); 2776 2784 ··· 2802 2810 ds_queue->split_queue_len--; 2803 2811 list_del(page_deferred_list(head)); 2804 2812 } 2813 + spin_unlock(&ds_queue->split_queue_lock); 2805 2814 if (mapping) { 2806 - if (PageSwapBacked(page)) 2807 - __dec_node_page_state(page, NR_SHMEM_THPS); 2815 + if (PageSwapBacked(head)) 2816 + __dec_node_page_state(head, NR_SHMEM_THPS); 2808 2817 else 2809 - __dec_node_page_state(page, NR_FILE_THPS); 2818 + __dec_node_page_state(head, NR_FILE_THPS); 2810 2819 } 2811 2820 2812 - spin_unlock(&ds_queue->split_queue_lock); 2813 2821 __split_huge_page(page, list, end, flags); 2814 2822 if (PageSwapCache(head)) { 2815 2823 swp_entry_t entry = { .val = page_private(head) };
+56 -56
mm/kmemleak.c
··· 13 13 * 14 14 * The following locks and mutexes are used by kmemleak: 15 15 * 16 - * - kmemleak_lock (rwlock): protects the object_list modifications and 16 + * - kmemleak_lock (raw_spinlock_t): protects the object_list modifications and 17 17 * accesses to the object_tree_root. The object_list is the main list 18 18 * holding the metadata (struct kmemleak_object) for the allocated memory 19 19 * blocks. The object_tree_root is a red black tree used to look-up ··· 22 22 * object_tree_root in the create_object() function called from the 23 23 * kmemleak_alloc() callback and removed in delete_object() called from the 24 24 * kmemleak_free() callback 25 - * - kmemleak_object.lock (spinlock): protects a kmemleak_object. Accesses to 26 - * the metadata (e.g. count) are protected by this lock. Note that some 27 - * members of this structure may be protected by other means (atomic or 28 - * kmemleak_lock). This lock is also held when scanning the corresponding 29 - * memory block to avoid the kernel freeing it via the kmemleak_free() 30 - * callback. This is less heavyweight than holding a global lock like 31 - * kmemleak_lock during scanning 25 + * - kmemleak_object.lock (raw_spinlock_t): protects a kmemleak_object. 26 + * Accesses to the metadata (e.g. count) are protected by this lock. Note 27 + * that some members of this structure may be protected by other means 28 + * (atomic or kmemleak_lock). This lock is also held when scanning the 29 + * corresponding memory block to avoid the kernel freeing it via the 30 + * kmemleak_free() callback. This is less heavyweight than holding a global 31 + * lock like kmemleak_lock during scanning. 32 32 * - scan_mutex (mutex): ensures that only one thread may scan the memory for 33 33 * unreferenced objects at a time. The gray_list contains the objects which 34 34 * are already referenced or marked as false positives and need to be ··· 135 135 * (use_count) and freed using the RCU mechanism. 
136 136 */ 137 137 struct kmemleak_object { 138 - spinlock_t lock; 138 + raw_spinlock_t lock; 139 139 unsigned int flags; /* object status flags */ 140 140 struct list_head object_list; 141 141 struct list_head gray_list; ··· 191 191 static LIST_HEAD(mem_pool_free_list); 192 192 /* search tree for object boundaries */ 193 193 static struct rb_root object_tree_root = RB_ROOT; 194 - /* rw_lock protecting the access to object_list and object_tree_root */ 195 - static DEFINE_RWLOCK(kmemleak_lock); 194 + /* protecting the access to object_list and object_tree_root */ 195 + static DEFINE_RAW_SPINLOCK(kmemleak_lock); 196 196 197 197 /* allocation caches for kmemleak internal data */ 198 198 static struct kmem_cache *object_cache; ··· 426 426 } 427 427 428 428 /* slab allocation failed, try the memory pool */ 429 - write_lock_irqsave(&kmemleak_lock, flags); 429 + raw_spin_lock_irqsave(&kmemleak_lock, flags); 430 430 object = list_first_entry_or_null(&mem_pool_free_list, 431 431 typeof(*object), object_list); 432 432 if (object) ··· 435 435 object = &mem_pool[--mem_pool_free_count]; 436 436 else 437 437 pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n"); 438 - write_unlock_irqrestore(&kmemleak_lock, flags); 438 + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); 439 439 440 440 return object; 441 441 } ··· 453 453 } 454 454 455 455 /* add the object to the memory pool free list */ 456 - write_lock_irqsave(&kmemleak_lock, flags); 456 + raw_spin_lock_irqsave(&kmemleak_lock, flags); 457 457 list_add(&object->object_list, &mem_pool_free_list); 458 - write_unlock_irqrestore(&kmemleak_lock, flags); 458 + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); 459 459 } 460 460 461 461 /* ··· 514 514 struct kmemleak_object *object; 515 515 516 516 rcu_read_lock(); 517 - read_lock_irqsave(&kmemleak_lock, flags); 517 + raw_spin_lock_irqsave(&kmemleak_lock, flags); 518 518 object = lookup_object(ptr, alias); 519 - 
read_unlock_irqrestore(&kmemleak_lock, flags); 519 + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); 520 520 521 521 /* check whether the object is still available */ 522 522 if (object && !get_object(object)) ··· 546 546 unsigned long flags; 547 547 struct kmemleak_object *object; 548 548 549 - write_lock_irqsave(&kmemleak_lock, flags); 549 + raw_spin_lock_irqsave(&kmemleak_lock, flags); 550 550 object = lookup_object(ptr, alias); 551 551 if (object) 552 552 __remove_object(object); 553 - write_unlock_irqrestore(&kmemleak_lock, flags); 553 + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); 554 554 555 555 return object; 556 556 } ··· 585 585 INIT_LIST_HEAD(&object->object_list); 586 586 INIT_LIST_HEAD(&object->gray_list); 587 587 INIT_HLIST_HEAD(&object->area_list); 588 - spin_lock_init(&object->lock); 588 + raw_spin_lock_init(&object->lock); 589 589 atomic_set(&object->use_count, 1); 590 590 object->flags = OBJECT_ALLOCATED; 591 591 object->pointer = ptr; ··· 617 617 /* kernel backtrace */ 618 618 object->trace_len = __save_stack_trace(object->trace); 619 619 620 - write_lock_irqsave(&kmemleak_lock, flags); 620 + raw_spin_lock_irqsave(&kmemleak_lock, flags); 621 621 622 622 untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr); 623 623 min_addr = min(min_addr, untagged_ptr); ··· 649 649 650 650 list_add_tail_rcu(&object->object_list, &object_list); 651 651 out: 652 - write_unlock_irqrestore(&kmemleak_lock, flags); 652 + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); 653 653 return object; 654 654 } 655 655 ··· 667 667 * Locking here also ensures that the corresponding memory block 668 668 * cannot be freed when it is being scanned. 
669 669 */ 670 - spin_lock_irqsave(&object->lock, flags); 670 + raw_spin_lock_irqsave(&object->lock, flags); 671 671 object->flags &= ~OBJECT_ALLOCATED; 672 - spin_unlock_irqrestore(&object->lock, flags); 672 + raw_spin_unlock_irqrestore(&object->lock, flags); 673 673 put_object(object); 674 674 } 675 675 ··· 739 739 { 740 740 unsigned long flags; 741 741 742 - spin_lock_irqsave(&object->lock, flags); 742 + raw_spin_lock_irqsave(&object->lock, flags); 743 743 __paint_it(object, color); 744 - spin_unlock_irqrestore(&object->lock, flags); 744 + raw_spin_unlock_irqrestore(&object->lock, flags); 745 745 } 746 746 747 747 static void paint_ptr(unsigned long ptr, int color) ··· 798 798 if (scan_area_cache) 799 799 area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp)); 800 800 801 - spin_lock_irqsave(&object->lock, flags); 801 + raw_spin_lock_irqsave(&object->lock, flags); 802 802 if (!area) { 803 803 pr_warn_once("Cannot allocate a scan area, scanning the full object\n"); 804 804 /* mark the object for full scan to avoid false positives */ ··· 820 820 821 821 hlist_add_head(&area->node, &object->area_list); 822 822 out_unlock: 823 - spin_unlock_irqrestore(&object->lock, flags); 823 + raw_spin_unlock_irqrestore(&object->lock, flags); 824 824 put_object(object); 825 825 } 826 826 ··· 842 842 return; 843 843 } 844 844 845 - spin_lock_irqsave(&object->lock, flags); 845 + raw_spin_lock_irqsave(&object->lock, flags); 846 846 object->excess_ref = excess_ref; 847 - spin_unlock_irqrestore(&object->lock, flags); 847 + raw_spin_unlock_irqrestore(&object->lock, flags); 848 848 put_object(object); 849 849 } 850 850 ··· 864 864 return; 865 865 } 866 866 867 - spin_lock_irqsave(&object->lock, flags); 867 + raw_spin_lock_irqsave(&object->lock, flags); 868 868 object->flags |= OBJECT_NO_SCAN; 869 - spin_unlock_irqrestore(&object->lock, flags); 869 + raw_spin_unlock_irqrestore(&object->lock, flags); 870 870 put_object(object); 871 871 } 872 872 ··· 1026 1026 return; 1027 1027 } 
1028 1028 1029 - spin_lock_irqsave(&object->lock, flags); 1029 + raw_spin_lock_irqsave(&object->lock, flags); 1030 1030 object->trace_len = __save_stack_trace(object->trace); 1031 - spin_unlock_irqrestore(&object->lock, flags); 1031 + raw_spin_unlock_irqrestore(&object->lock, flags); 1032 1032 1033 1033 put_object(object); 1034 1034 } ··· 1233 1233 unsigned long flags; 1234 1234 unsigned long untagged_ptr; 1235 1235 1236 - read_lock_irqsave(&kmemleak_lock, flags); 1236 + raw_spin_lock_irqsave(&kmemleak_lock, flags); 1237 1237 for (ptr = start; ptr < end; ptr++) { 1238 1238 struct kmemleak_object *object; 1239 1239 unsigned long pointer; ··· 1268 1268 * previously acquired in scan_object(). These locks are 1269 1269 * enclosed by scan_mutex. 1270 1270 */ 1271 - spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); 1271 + raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); 1272 1272 /* only pass surplus references (object already gray) */ 1273 1273 if (color_gray(object)) { 1274 1274 excess_ref = object->excess_ref; ··· 1277 1277 excess_ref = 0; 1278 1278 update_refs(object); 1279 1279 } 1280 - spin_unlock(&object->lock); 1280 + raw_spin_unlock(&object->lock); 1281 1281 1282 1282 if (excess_ref) { 1283 1283 object = lookup_object(excess_ref, 0); ··· 1286 1286 if (object == scanned) 1287 1287 /* circular reference, ignore */ 1288 1288 continue; 1289 - spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); 1289 + raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); 1290 1290 update_refs(object); 1291 - spin_unlock(&object->lock); 1291 + raw_spin_unlock(&object->lock); 1292 1292 } 1293 1293 } 1294 - read_unlock_irqrestore(&kmemleak_lock, flags); 1294 + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); 1295 1295 } 1296 1296 1297 1297 /* ··· 1324 1324 * Once the object->lock is acquired, the corresponding memory block 1325 1325 * cannot be freed (the same lock is acquired in delete_object). 
1326 1326 */ 1327 - spin_lock_irqsave(&object->lock, flags); 1327 + raw_spin_lock_irqsave(&object->lock, flags); 1328 1328 if (object->flags & OBJECT_NO_SCAN) 1329 1329 goto out; 1330 1330 if (!(object->flags & OBJECT_ALLOCATED)) ··· 1344 1344 if (start >= end) 1345 1345 break; 1346 1346 1347 - spin_unlock_irqrestore(&object->lock, flags); 1347 + raw_spin_unlock_irqrestore(&object->lock, flags); 1348 1348 cond_resched(); 1349 - spin_lock_irqsave(&object->lock, flags); 1349 + raw_spin_lock_irqsave(&object->lock, flags); 1350 1350 } while (object->flags & OBJECT_ALLOCATED); 1351 1351 } else 1352 1352 hlist_for_each_entry(area, &object->area_list, node) ··· 1354 1354 (void *)(area->start + area->size), 1355 1355 object); 1356 1356 out: 1357 - spin_unlock_irqrestore(&object->lock, flags); 1357 + raw_spin_unlock_irqrestore(&object->lock, flags); 1358 1358 } 1359 1359 1360 1360 /* ··· 1407 1407 /* prepare the kmemleak_object's */ 1408 1408 rcu_read_lock(); 1409 1409 list_for_each_entry_rcu(object, &object_list, object_list) { 1410 - spin_lock_irqsave(&object->lock, flags); 1410 + raw_spin_lock_irqsave(&object->lock, flags); 1411 1411 #ifdef DEBUG 1412 1412 /* 1413 1413 * With a few exceptions there should be a maximum of ··· 1424 1424 if (color_gray(object) && get_object(object)) 1425 1425 list_add_tail(&object->gray_list, &gray_list); 1426 1426 1427 - spin_unlock_irqrestore(&object->lock, flags); 1427 + raw_spin_unlock_irqrestore(&object->lock, flags); 1428 1428 } 1429 1429 rcu_read_unlock(); 1430 1430 ··· 1492 1492 */ 1493 1493 rcu_read_lock(); 1494 1494 list_for_each_entry_rcu(object, &object_list, object_list) { 1495 - spin_lock_irqsave(&object->lock, flags); 1495 + raw_spin_lock_irqsave(&object->lock, flags); 1496 1496 if (color_white(object) && (object->flags & OBJECT_ALLOCATED) 1497 1497 && update_checksum(object) && get_object(object)) { 1498 1498 /* color it gray temporarily */ 1499 1499 object->count = object->min_count; 1500 1500 
list_add_tail(&object->gray_list, &gray_list); 1501 1501 } 1502 - spin_unlock_irqrestore(&object->lock, flags); 1502 + raw_spin_unlock_irqrestore(&object->lock, flags); 1503 1503 } 1504 1504 rcu_read_unlock(); 1505 1505 ··· 1519 1519 */ 1520 1520 rcu_read_lock(); 1521 1521 list_for_each_entry_rcu(object, &object_list, object_list) { 1522 - spin_lock_irqsave(&object->lock, flags); 1522 + raw_spin_lock_irqsave(&object->lock, flags); 1523 1523 if (unreferenced_object(object) && 1524 1524 !(object->flags & OBJECT_REPORTED)) { 1525 1525 object->flags |= OBJECT_REPORTED; ··· 1529 1529 1530 1530 new_leaks++; 1531 1531 } 1532 - spin_unlock_irqrestore(&object->lock, flags); 1532 + raw_spin_unlock_irqrestore(&object->lock, flags); 1533 1533 } 1534 1534 rcu_read_unlock(); 1535 1535 ··· 1681 1681 struct kmemleak_object *object = v; 1682 1682 unsigned long flags; 1683 1683 1684 - spin_lock_irqsave(&object->lock, flags); 1684 + raw_spin_lock_irqsave(&object->lock, flags); 1685 1685 if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object)) 1686 1686 print_unreferenced(seq, object); 1687 - spin_unlock_irqrestore(&object->lock, flags); 1687 + raw_spin_unlock_irqrestore(&object->lock, flags); 1688 1688 return 0; 1689 1689 } 1690 1690 ··· 1714 1714 return -EINVAL; 1715 1715 } 1716 1716 1717 - spin_lock_irqsave(&object->lock, flags); 1717 + raw_spin_lock_irqsave(&object->lock, flags); 1718 1718 dump_object_info(object); 1719 - spin_unlock_irqrestore(&object->lock, flags); 1719 + raw_spin_unlock_irqrestore(&object->lock, flags); 1720 1720 1721 1721 put_object(object); 1722 1722 return 0; ··· 1735 1735 1736 1736 rcu_read_lock(); 1737 1737 list_for_each_entry_rcu(object, &object_list, object_list) { 1738 - spin_lock_irqsave(&object->lock, flags); 1738 + raw_spin_lock_irqsave(&object->lock, flags); 1739 1739 if ((object->flags & OBJECT_REPORTED) && 1740 1740 unreferenced_object(object)) 1741 1741 __paint_it(object, KMEMLEAK_GREY); 1742 - spin_unlock_irqrestore(&object->lock, flags);
1742 + raw_spin_unlock_irqrestore(&object->lock, flags); 1743 1743 } 1744 1744 rcu_read_unlock(); 1745 1745
+17 -5
mm/memblock.c
··· 575 575 * Return: 576 576 * 0 on success, -errno on failure. 577 577 */ 578 - int __init_memblock memblock_add_range(struct memblock_type *type, 578 + static int __init_memblock memblock_add_range(struct memblock_type *type, 579 579 phys_addr_t base, phys_addr_t size, 580 580 int nid, enum memblock_flags flags) 581 581 { ··· 694 694 { 695 695 phys_addr_t end = base + size - 1; 696 696 697 - memblock_dbg("memblock_add: [%pa-%pa] %pS\n", 697 + memblock_dbg("%s: [%pa-%pa] %pS\n", __func__, 698 698 &base, &end, (void *)_RET_IP_); 699 699 700 700 return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0); ··· 795 795 { 796 796 phys_addr_t end = base + size - 1; 797 797 798 - memblock_dbg("memblock_remove: [%pa-%pa] %pS\n", 798 + memblock_dbg("%s: [%pa-%pa] %pS\n", __func__, 799 799 &base, &end, (void *)_RET_IP_); 800 800 801 801 return memblock_remove_range(&memblock.memory, base, size); ··· 813 813 { 814 814 phys_addr_t end = base + size - 1; 815 815 816 - memblock_dbg(" memblock_free: [%pa-%pa] %pS\n", 816 + memblock_dbg("%s: [%pa-%pa] %pS\n", __func__, 817 817 &base, &end, (void *)_RET_IP_); 818 818 819 819 kmemleak_free_part_phys(base, size); ··· 824 824 { 825 825 phys_addr_t end = base + size - 1; 826 826 827 - memblock_dbg("memblock_reserve: [%pa-%pa] %pS\n", 827 + memblock_dbg("%s: [%pa-%pa] %pS\n", __func__, 828 828 &base, &end, (void *)_RET_IP_); 829 829 830 830 return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0); 831 831 } 832 + 833 + #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP 834 + int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size) 835 + { 836 + phys_addr_t end = base + size - 1; 837 + 838 + memblock_dbg("%s: [%pa-%pa] %pS\n", __func__, 839 + &base, &end, (void *)_RET_IP_); 840 + 841 + return memblock_add_range(&memblock.physmem, base, size, MAX_NUMNODES, 0); 842 + } 843 + #endif 832 844 833 845 /** 834 846 * memblock_setclr_flag - set or clear flag for a memory region
+3 -22
mm/memcontrol.c
··· 5340 5340 __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); 5341 5341 } 5342 5342 5343 - #ifdef CONFIG_TRANSPARENT_HUGEPAGE 5344 - if (compound && !list_empty(page_deferred_list(page))) { 5345 - spin_lock(&from->deferred_split_queue.split_queue_lock); 5346 - list_del_init(page_deferred_list(page)); 5347 - from->deferred_split_queue.split_queue_len--; 5348 - spin_unlock(&from->deferred_split_queue.split_queue_lock); 5349 - } 5350 - #endif 5351 5343 /* 5352 5344 * It is safe to change page->mem_cgroup here because the page 5353 5345 * is referenced, charged, and isolated - we can't race with ··· 5348 5356 5349 5357 /* caller should have done css_get */ 5350 5358 page->mem_cgroup = to; 5351 - 5352 - #ifdef CONFIG_TRANSPARENT_HUGEPAGE 5353 - if (compound && list_empty(page_deferred_list(page))) { 5354 - spin_lock(&to->deferred_split_queue.split_queue_lock); 5355 - list_add_tail(page_deferred_list(page), 5356 - &to->deferred_split_queue.split_queue); 5357 - to->deferred_split_queue.split_queue_len++; 5358 - spin_unlock(&to->deferred_split_queue.split_queue_lock); 5359 - } 5360 - #endif 5361 5359 5362 5360 spin_unlock_irqrestore(&from->move_lock, flags); 5363 5361 ··· 6633 6651 { 6634 6652 struct mem_cgroup *memcg; 6635 6653 unsigned int nr_pages; 6636 - bool compound; 6637 6654 unsigned long flags; 6638 6655 6639 6656 VM_BUG_ON_PAGE(!PageLocked(oldpage), oldpage); ··· 6654 6673 return; 6655 6674 6656 6675 /* Force-charge the new page. The old one will be freed soon */ 6657 - compound = PageTransHuge(newpage); 6658 - nr_pages = compound ? 
hpage_nr_pages(newpage) : 1; 6676 + nr_pages = hpage_nr_pages(newpage); 6659 6677 6660 6678 page_counter_charge(&memcg->memory, nr_pages); 6661 6679 if (do_memsw_account()) ··· 6664 6684 commit_charge(newpage, memcg, false); 6665 6685 6666 6686 local_irq_save(flags); 6667 - mem_cgroup_charge_statistics(memcg, newpage, compound, nr_pages); 6687 + mem_cgroup_charge_statistics(memcg, newpage, PageTransHuge(newpage), 6688 + nr_pages); 6668 6689 memcg_check_events(memcg, newpage); 6669 6690 local_irq_restore(flags); 6670 6691 }
+9 -15
mm/memory_hotplug.c
··· 783 783 return default_zone_for_pfn(nid, start_pfn, nr_pages); 784 784 } 785 785 786 - int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type) 786 + int __ref online_pages(unsigned long pfn, unsigned long nr_pages, 787 + int online_type, int nid) 787 788 { 788 789 unsigned long flags; 789 790 unsigned long onlined_pages = 0; 790 791 struct zone *zone; 791 792 int need_zonelists_rebuild = 0; 792 - int nid; 793 793 int ret; 794 794 struct memory_notify arg; 795 - struct memory_block *mem; 796 795 797 796 mem_hotplug_begin(); 798 - 799 - /* 800 - * We can't use pfn_to_nid() because nid might be stored in struct page 801 - * which is not yet initialized. Instead, we find nid from memory block. 802 - */ 803 - mem = find_memory_block(__pfn_to_section(pfn)); 804 - nid = mem->nid; 805 - put_device(&mem->dev); 806 797 807 798 /* associate pfn range with the zone */ 808 799 zone = zone_for_pfn_range(online_type, nid, pfn, nr_pages); ··· 1173 1182 if (!zone_spans_pfn(zone, pfn)) 1174 1183 return false; 1175 1184 1176 - return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, 1185 + return !has_unmovable_pages(zone, page, MIGRATE_MOVABLE, 1177 1186 MEMORY_OFFLINE); 1178 1187 } 1179 1188 ··· 1755 1764 1756 1765 BUG_ON(check_hotplug_memory_range(start, size)); 1757 1766 1758 - mem_hotplug_begin(); 1759 - 1760 1767 /* 1761 1768 * All memory blocks must be offlined before removing memory. Check 1762 1769 * whether all memory blocks in question are offline and return error ··· 1767 1778 /* remove memmap entry */ 1768 1779 firmware_map_remove(start, start + size, "System RAM"); 1769 1780 1770 - /* remove memory block devices before removing memory */ 1781 + /* 1782 + * Memory block device removal under the device_hotplug_lock is 1783 + * a barrier against racing online attempts. 
1784 + */ 1771 1785 remove_memory_block_devices(start, size); 1786 + 1787 + mem_hotplug_begin(); 1772 1788 1773 1789 arch_remove_memory(nid, start, size, NULL); 1774 1790 memblock_free(start, size);
+3 -3
mm/mempolicy.c
··· 2821 2821 char *flags = strchr(str, '='); 2822 2822 int err = 1, mode; 2823 2823 2824 + if (flags) 2825 + *flags++ = '\0'; /* terminate mode string */ 2826 + 2824 2827 if (nodelist) { 2825 2828 /* NUL-terminate mode or flags string */ 2826 2829 *nodelist++ = '\0'; ··· 2833 2830 goto out; 2834 2831 } else 2835 2832 nodes_clear(nodes); 2836 - 2837 - if (flags) 2838 - *flags++ = '\0'; /* terminate mode string */ 2839 2833 2840 2834 mode = match_string(policy_modes, MPOL_MAX, str); 2841 2835 if (mode < 0)
+35 -40
mm/memremap.c
··· 27 27 28 28 static int devmap_managed_enable_get(struct dev_pagemap *pgmap) 29 29 { 30 - if (!pgmap->ops || !pgmap->ops->page_free) { 30 + if (pgmap->type == MEMORY_DEVICE_PRIVATE && 31 + (!pgmap->ops || !pgmap->ops->page_free)) { 31 32 WARN(1, "Missing page_free method\n"); 32 33 return -EINVAL; 33 34 } ··· 411 410 EXPORT_SYMBOL_GPL(get_dev_pagemap); 412 411 413 412 #ifdef CONFIG_DEV_PAGEMAP_OPS 414 - void __put_devmap_managed_page(struct page *page) 413 + void free_devmap_managed_page(struct page *page) 415 414 { 416 - int count = page_ref_dec_return(page); 415 + /* notify page idle for dax */ 416 + if (!is_device_private_page(page)) { 417 + wake_up_var(&page->_refcount); 418 + return; 419 + } 420 + 421 + /* Clear Active bit in case of parallel mark_page_accessed */ 422 + __ClearPageActive(page); 423 + __ClearPageWaiters(page); 424 + 425 + mem_cgroup_uncharge(page); 417 426 418 427 /* 419 - * If refcount is 1 then page is freed and refcount is stable as nobody 420 - * holds a reference on the page. 428 + * When a device_private page is freed, the page->mapping field 429 + * may still contain a (stale) mapping value. For example, the 430 + * lower bits of page->mapping may still identify the page as an 431 + * anonymous page. Ultimately, this entire field is just stale 432 + * and wrong, and it will cause errors if not cleared. One 433 + * example is: 434 + * 435 + * migrate_vma_pages() 436 + * migrate_vma_insert_page() 437 + * page_add_new_anon_rmap() 438 + * __page_set_anon_rmap() 439 + * ...checks page->mapping, via PageAnon(page) call, 440 + * and incorrectly concludes that the page is an 441 + * anonymous page. Therefore, it incorrectly, 442 + * silently fails to set up the new anon rmap. 443 + * 444 + * For other types of ZONE_DEVICE pages, migration is either 445 + * handled differently or not done at all, so there is no need 446 + * to clear page->mapping. 
421 447 */ 422 - if (count == 1) { 423 - /* Clear Active bit in case of parallel mark_page_accessed */ 424 - __ClearPageActive(page); 425 - __ClearPageWaiters(page); 426 - 427 - mem_cgroup_uncharge(page); 428 - 429 - /* 430 - * When a device_private page is freed, the page->mapping field 431 - * may still contain a (stale) mapping value. For example, the 432 - * lower bits of page->mapping may still identify the page as 433 - * an anonymous page. Ultimately, this entire field is just 434 - * stale and wrong, and it will cause errors if not cleared. 435 - * One example is: 436 - * 437 - * migrate_vma_pages() 438 - * migrate_vma_insert_page() 439 - * page_add_new_anon_rmap() 440 - * __page_set_anon_rmap() 441 - * ...checks page->mapping, via PageAnon(page) call, 442 - * and incorrectly concludes that the page is an 443 - * anonymous page. Therefore, it incorrectly, 444 - * silently fails to set up the new anon rmap. 445 - * 446 - * For other types of ZONE_DEVICE pages, migration is either 447 - * handled differently or not done at all, so there is no need 448 - * to clear page->mapping. 449 - */ 450 - if (is_device_private_page(page)) 451 - page->mapping = NULL; 452 - 453 - page->pgmap->ops->page_free(page); 454 - } else if (!count) 455 - __put_page(page); 448 + page->mapping = NULL; 449 + page->pgmap->ops->page_free(page); 456 450 } 457 - EXPORT_SYMBOL(__put_devmap_managed_page); 458 451 #endif /* CONFIG_DEV_PAGEMAP_OPS */
+51 -26
mm/migrate.c
··· 48 48 #include <linux/page_owner.h> 49 49 #include <linux/sched/mm.h> 50 50 #include <linux/ptrace.h> 51 + #include <linux/oom.h> 51 52 52 53 #include <asm/tlbflush.h> 53 54 ··· 987 986 } 988 987 989 988 /* 990 - * Anonymous and movable page->mapping will be cleard by 989 + * Anonymous and movable page->mapping will be cleared by 991 990 * free_pages_prepare so don't reset it here for keeping 992 991 * the type to work PageAnon, for example. 993 992 */ ··· 1200 1199 /* 1201 1200 * A page that has been migrated has all references 1202 1201 * removed and will be freed. A page that has not been 1203 - * migrated will have kepts its references and be 1204 - * restored. 1202 + * migrated will have kept its references and be restored. 1205 1203 */ 1206 1204 list_del(&page->lru); 1207 1205 ··· 1627 1627 start = i; 1628 1628 } else if (node != current_node) { 1629 1629 err = do_move_pages_to_node(mm, &pagelist, current_node); 1630 - if (err) 1630 + if (err) { 1631 + /* 1632 + * Positive err means the number of failed 1633 + * pages to migrate. Since we are going to 1634 + * abort and return the number of non-migrated 1635 + * pages, so need to incude the rest of the 1636 + * nr_pages that have not been attempted as 1637 + * well. 
1638 + */ 1639 + if (err > 0) 1640 + err += nr_pages - i - 1; 1631 1641 goto out; 1642 + } 1632 1643 err = store_status(status, start, current_node, i - start); 1633 1644 if (err) 1634 1645 goto out; ··· 1670 1659 goto out_flush; 1671 1660 1672 1661 err = do_move_pages_to_node(mm, &pagelist, current_node); 1673 - if (err) 1662 + if (err) { 1663 + if (err > 0) 1664 + err += nr_pages - i - 1; 1674 1665 goto out; 1666 + } 1675 1667 if (i > start) { 1676 1668 err = store_status(status, start, current_node, i - start); 1677 1669 if (err) ··· 1688 1674 1689 1675 /* Make sure we do not overwrite the existing error */ 1690 1676 err1 = do_move_pages_to_node(mm, &pagelist, current_node); 1677 + /* 1678 + * Don't have to report non-attempted pages here since: 1679 + * - If the above loop is done gracefully all pages have been 1680 + * attempted. 1681 + * - If the above loop is aborted it means a fatal error 1682 + * happened, should return ret. 1683 + */ 1691 1684 if (!err1) 1692 1685 err1 = store_status(status, start, current_node, i - start); 1693 - if (!err) 1686 + if (err >= 0) 1694 1687 err = err1; 1695 1688 out: 1696 1689 return err; ··· 2156 2135 struct migrate_vma *migrate = walk->private; 2157 2136 unsigned long addr; 2158 2137 2159 - for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) { 2138 + for (addr = start; addr < end; addr += PAGE_SIZE) { 2160 2139 migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE; 2161 2140 migrate->dst[migrate->npages] = 0; 2162 2141 migrate->npages++; ··· 2173 2152 struct migrate_vma *migrate = walk->private; 2174 2153 unsigned long addr; 2175 2154 2176 - for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) { 2155 + for (addr = start; addr < end; addr += PAGE_SIZE) { 2177 2156 migrate->dst[migrate->npages] = 0; 2178 2157 migrate->src[migrate->npages++] = 0; 2179 2158 } ··· 2696 2675 } 2697 2676 EXPORT_SYMBOL(migrate_vma_setup); 2698 2677 2678 + /* 2679 + * This code closely matches the code in: 2680 + * 
__handle_mm_fault() 2681 + * handle_pte_fault() 2682 + * do_anonymous_page() 2683 + * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE 2684 + * private page. 2685 + */ 2699 2686 static void migrate_vma_insert_page(struct migrate_vma *migrate, 2700 2687 unsigned long addr, 2701 2688 struct page *page, ··· 2784 2755 2785 2756 ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); 2786 2757 2758 + if (check_stable_address_space(mm)) 2759 + goto unlock_abort; 2760 + 2787 2761 if (pte_present(*ptep)) { 2788 2762 unsigned long pfn = pte_pfn(*ptep); 2789 2763 2790 - if (!is_zero_pfn(pfn)) { 2791 - pte_unmap_unlock(ptep, ptl); 2792 - mem_cgroup_cancel_charge(page, memcg, false); 2793 - goto abort; 2794 - } 2764 + if (!is_zero_pfn(pfn)) 2765 + goto unlock_abort; 2795 2766 flush = true; 2796 - } else if (!pte_none(*ptep)) { 2797 - pte_unmap_unlock(ptep, ptl); 2798 - mem_cgroup_cancel_charge(page, memcg, false); 2799 - goto abort; 2800 - } 2767 + } else if (!pte_none(*ptep)) 2768 + goto unlock_abort; 2801 2769 2802 2770 /* 2803 - * Check for usefaultfd but do not deliver the fault. Instead, 2771 + * Check for userfaultfd but do not deliver the fault. Instead, 2804 2772 * just back off. 
2805 2773 */ 2806 - if (userfaultfd_missing(vma)) { 2807 - pte_unmap_unlock(ptep, ptl); 2808 - mem_cgroup_cancel_charge(page, memcg, false); 2809 - goto abort; 2810 - } 2774 + if (userfaultfd_missing(vma)) 2775 + goto unlock_abort; 2811 2776 2812 2777 inc_mm_counter(mm, MM_ANONPAGES); 2813 2778 page_add_new_anon_rmap(page, vma, addr, false); ··· 2825 2802 *src = MIGRATE_PFN_MIGRATE; 2826 2803 return; 2827 2804 2805 + unlock_abort: 2806 + pte_unmap_unlock(ptep, ptl); 2807 + mem_cgroup_cancel_charge(page, memcg, false); 2828 2808 abort: 2829 2809 *src &= ~MIGRATE_PFN_MIGRATE; 2830 2810 } ··· 2860 2834 } 2861 2835 2862 2836 if (!page) { 2863 - if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE)) { 2837 + if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE)) 2864 2838 continue; 2865 - } 2866 2839 if (!notified) { 2867 2840 notified = true; 2868 2841
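The do_pages_move() error accounting introduced above can be checked in isolation: a positive return from do_move_pages_to_node() is the number of pages that failed in the flushed batch, and on abort the pages after index i that were never attempted (nr_pages - i - 1) must be added so the caller sees the total number of non-migrated pages, while a negative errno passes through unchanged. A minimal userspace model (not kernel code; the function name is illustrative):

```c
/*
 * Model of the abort-path accounting added to do_pages_move():
 * err > 0  -> pages that failed in the batch; also count the pages after
 *             index i that were never attempted.
 * err <= 0 -> success or fatal errno; pass through unchanged.
 */
long account_abort_err(long err, unsigned long nr_pages, unsigned long i)
{
	if (err > 0)
		err += nr_pages - i - 1;
	return err;
}
```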
+13 -17
mm/mmap.c
··· 1270 1270 */ 1271 1271 struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma) 1272 1272 { 1273 - struct anon_vma *anon_vma; 1274 - struct vm_area_struct *near; 1273 + struct anon_vma *anon_vma = NULL; 1275 1274 1276 - near = vma->vm_next; 1277 - if (!near) 1278 - goto try_prev; 1275 + /* Try next first. */ 1276 + if (vma->vm_next) { 1277 + anon_vma = reusable_anon_vma(vma->vm_next, vma, vma->vm_next); 1278 + if (anon_vma) 1279 + return anon_vma; 1280 + } 1279 1281 1280 - anon_vma = reusable_anon_vma(near, vma, near); 1281 - if (anon_vma) 1282 - return anon_vma; 1283 - try_prev: 1284 - near = vma->vm_prev; 1285 - if (!near) 1286 - goto none; 1282 + /* Try prev next. */ 1283 + if (vma->vm_prev) 1284 + anon_vma = reusable_anon_vma(vma->vm_prev, vma->vm_prev, vma); 1287 1285 1288 - anon_vma = reusable_anon_vma(near, near, vma); 1289 - if (anon_vma) 1290 - return anon_vma; 1291 - none: 1292 1286 /* 1287 + * We might reach here with anon_vma == NULL if we can't find 1288 + * any reusable anon_vma. 1293 1289 * There's no absolute need to look only at touching neighbours: 1294 1290 * we could search further afield for "compatible" anon_vmas. 1295 1291 * But it would probably just be a waste of time searching, ··· 1293 1297 * We're trying to allow mprotect remerging later on, 1294 1298 * not trying to minimize memory used for anon_vmas. 1295 1299 */ 1296 - return NULL; 1300 + return anon_vma; 1297 1301 } 1298 1302 1299 1303 /*
+2
mm/oom_kill.c
··· 26 26 #include <linux/sched/mm.h> 27 27 #include <linux/sched/coredump.h> 28 28 #include <linux/sched/task.h> 29 + #include <linux/sched/debug.h> 29 30 #include <linux/swap.h> 30 31 #include <linux/timex.h> 31 32 #include <linux/jiffies.h> ··· 621 620 622 621 pr_info("oom_reaper: unable to reap pid:%d (%s)\n", 623 622 task_pid_nr(tsk), tsk->comm); 623 + sched_show_task(tsk); 624 624 debug_show_all_locks(); 625 625 626 626 done:
+44 -31
mm/page_alloc.c
··· 5848 5848 return false; 5849 5849 } 5850 5850 5851 + #ifdef CONFIG_SPARSEMEM 5852 + /* Skip PFNs that belong to non-present sections */ 5853 + static inline __meminit unsigned long next_pfn(unsigned long pfn) 5854 + { 5855 + unsigned long section_nr; 5856 + 5857 + section_nr = pfn_to_section_nr(++pfn); 5858 + if (present_section_nr(section_nr)) 5859 + return pfn; 5860 + 5861 + while (++section_nr <= __highest_present_section_nr) { 5862 + if (present_section_nr(section_nr)) 5863 + return section_nr_to_pfn(section_nr); 5864 + } 5865 + 5866 + return -1; 5867 + } 5868 + #else 5869 + static inline __meminit unsigned long next_pfn(unsigned long pfn) 5870 + { 5871 + return pfn++; 5872 + } 5873 + #endif 5874 + 5851 5875 /* 5852 5876 * Initially all pages are reserved - free ones are freed 5853 5877 * up by memblock_free_all() once the early boot process is ··· 5911 5887 * function. They do not exist on hotplugged memory. 5912 5888 */ 5913 5889 if (context == MEMMAP_EARLY) { 5914 - if (!early_pfn_valid(pfn)) 5890 + if (!early_pfn_valid(pfn)) { 5891 + pfn = next_pfn(pfn) - 1; 5915 5892 continue; 5893 + } 5916 5894 if (!early_pfn_in_nid(pfn, nid)) 5917 5895 continue; 5918 5896 if (overlap_memmap_init(zone, &pfn)) ··· 8180 8154 8181 8155 /* 8182 8156 * This function checks whether pageblock includes unmovable pages or not. 8183 - * If @count is not zero, it is okay to include less @count unmovable pages 8184 8157 * 8185 8158 * PageLRU check without isolation or lru_lock could race so that 8186 8159 * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable 8187 8160 * check without lock_page also may miss some movable non-lru pages at 8188 8161 * race condition. So you can't expect this function should be exact. 8162 + * 8163 + * Returns a page without holding a reference. If the caller wants to 8164 + * dereference that page (e.g., dumping), it has to make sure that it 8165 + * cannot get removed (e.g., via memory unplug) concurrently. 
8166 + * 8189 8167 */ 8190 - bool has_unmovable_pages(struct zone *zone, struct page *page, int count, 8191 - int migratetype, int flags) 8168 + struct page *has_unmovable_pages(struct zone *zone, struct page *page, 8169 + int migratetype, int flags) 8192 8170 { 8193 - unsigned long found; 8194 8171 unsigned long iter = 0; 8195 8172 unsigned long pfn = page_to_pfn(page); 8196 - const char *reason = "unmovable page"; 8197 8173 8198 8174 /* 8199 8175 * TODO we could make this much more efficient by not checking every ··· 8212 8184 * so consider them movable here. 8213 8185 */ 8214 8186 if (is_migrate_cma(migratetype)) 8215 - return false; 8187 + return NULL; 8216 8188 8217 - reason = "CMA page"; 8218 - goto unmovable; 8189 + return page; 8219 8190 } 8220 8191 8221 - for (found = 0; iter < pageblock_nr_pages; iter++) { 8222 - unsigned long check = pfn + iter; 8223 - 8224 - if (!pfn_valid_within(check)) 8192 + for (; iter < pageblock_nr_pages; iter++) { 8193 + if (!pfn_valid_within(pfn + iter)) 8225 8194 continue; 8226 8195 8227 - page = pfn_to_page(check); 8196 + page = pfn_to_page(pfn + iter); 8228 8197 8229 8198 if (PageReserved(page)) 8230 - goto unmovable; 8199 + return page; 8231 8200 8232 8201 /* 8233 8202 * If the zone is movable and we have ruled out all reserved ··· 8244 8219 unsigned int skip_pages; 8245 8220 8246 8221 if (!hugepage_migration_supported(page_hstate(head))) 8247 - goto unmovable; 8222 + return page; 8248 8223 8249 8224 skip_pages = compound_nr(head) - (page - head); 8250 8225 iter += skip_pages - 1; ··· 8270 8245 if ((flags & MEMORY_OFFLINE) && PageHWPoison(page)) 8271 8246 continue; 8272 8247 8273 - if (__PageMovable(page)) 8248 + if (__PageMovable(page) || PageLRU(page)) 8274 8249 continue; 8275 8250 8276 - if (!PageLRU(page)) 8277 - found++; 8278 8251 /* 8279 8252 * If there are RECLAIMABLE pages, we need to check 8280 8253 * it. 
But now, memory offline itself doesn't call ··· 8286 8263 * is set to both of a memory hole page and a _used_ kernel 8287 8264 * page at boot. 8288 8265 */ 8289 - if (found > count) 8290 - goto unmovable; 8266 + return page; 8291 8267 } 8292 - return false; 8293 - unmovable: 8294 - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE); 8295 - if (flags & REPORT_FAILURE) 8296 - dump_page(pfn_to_page(pfn + iter), reason); 8297 - return true; 8268 + return NULL; 8298 8269 } 8299 8270 8300 8271 #ifdef CONFIG_CONTIG_ALLOC ··· 8692 8675 BUG_ON(!PageBuddy(page)); 8693 8676 order = page_order(page); 8694 8677 offlined_pages += 1 << order; 8695 - #ifdef CONFIG_DEBUG_VM 8696 - pr_info("remove from free list %lx %d %lx\n", 8697 - pfn, 1 << order, end_pfn); 8698 - #endif 8699 8678 del_page_from_free_area(page, &zone->free_area[order]); 8700 8679 pfn += (1 << order); 8701 8680 }
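The SPARSEMEM next_pfn() above skips whole non-present sections during memmap init. Its shape can be verified with a small model where a section is a fixed number of pages and presence is a plain array; SECTION_PAGES and the `present` array are stand-ins for PAGES_PER_SECTION and present_section_nr(), not the real kernel interfaces:

```c
#include <stdbool.h>

#define SECTION_PAGES 8UL /* stand-in for PAGES_PER_SECTION */

/*
 * Mirror of next_pfn(): advance one pfn; if its section is present return
 * it, otherwise jump to the first pfn of the next present section, or
 * (unsigned long)-1 when no present section remains.
 */
unsigned long next_pfn_model(unsigned long pfn, const bool *present,
			     unsigned long nr_sections)
{
	unsigned long sec = ++pfn / SECTION_PAGES;

	if (sec < nr_sections && present[sec])
		return pfn;
	while (++sec < nr_sections)
		if (present[sec])
			return sec * SECTION_PAGES;
	return (unsigned long)-1;
}
```

Note the caller in memmap_init_zone() assigns `pfn = next_pfn(pfn) - 1` before `continue`, compensating for the loop's own `pfn++`.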
+18 -35
mm/page_isolation.c
··· 17 17 18 18 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags) 19 19 { 20 + struct page *unmovable = NULL; 20 21 struct zone *zone; 21 - unsigned long flags, pfn; 22 - struct memory_isolate_notify arg; 23 - int notifier_ret; 22 + unsigned long flags; 24 23 int ret = -EBUSY; 25 24 26 25 zone = page_zone(page); ··· 34 35 if (is_migrate_isolate_page(page)) 35 36 goto out; 36 37 37 - pfn = page_to_pfn(page); 38 - arg.start_pfn = pfn; 39 - arg.nr_pages = pageblock_nr_pages; 40 - arg.pages_found = 0; 41 - 42 - /* 43 - * It may be possible to isolate a pageblock even if the 44 - * migratetype is not MIGRATE_MOVABLE. The memory isolation 45 - * notifier chain is used by balloon drivers to return the 46 - * number of pages in a range that are held by the balloon 47 - * driver to shrink memory. If all the pages are accounted for 48 - * by balloons, are free, or on the LRU, isolation can continue. 49 - * Later, for example, when memory hotplug notifier runs, these 50 - * pages reported as "can be isolated" should be isolated(freed) 51 - * by the balloon driver through the memory notifier chain. 52 - */ 53 - notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); 54 - notifier_ret = notifier_to_errno(notifier_ret); 55 - if (notifier_ret) 56 - goto out; 57 38 /* 58 39 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. 59 40 * We just check MOVABLE pages. 60 41 */ 61 - if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype, 62 - isol_flags)) 63 - ret = 0; 64 - 65 - /* 66 - * immobile means "not-on-lru" pages. If immobile is larger than 67 - * removable-by-driver pages reported by notifier, we'll fail. 
68 - */ 69 - 70 - out: 71 - if (!ret) { 42 + unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags); 43 + if (!unmovable) { 72 44 unsigned long nr_pages; 73 45 int mt = get_pageblock_migratetype(page); 74 46 ··· 49 79 NULL); 50 80 51 81 __mod_zone_freepage_state(zone, -nr_pages, mt); 82 + ret = 0; 52 83 } 53 84 85 + out: 54 86 spin_unlock_irqrestore(&zone->lock, flags); 55 - if (!ret) 87 + if (!ret) { 56 88 drain_all_pages(zone); 89 + } else { 90 + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE); 91 + 92 + if ((isol_flags & REPORT_FAILURE) && unmovable) 93 + /* 94 + * printk() with zone->lock held will likely trigger a 95 + * lockdep splat, so defer it here. 96 + */ 97 + dump_page(unmovable, "unmovable page"); 98 + } 99 + 57 100 return ret; 58 101 } 59 102
+8 -4
mm/page_vma_mapped.c
··· 52 52 return true; 53 53 } 54 54 55 - static inline bool pfn_in_hpage(struct page *hpage, unsigned long pfn) 55 + static inline bool pfn_is_match(struct page *page, unsigned long pfn) 56 56 { 57 - unsigned long hpage_pfn = page_to_pfn(hpage); 57 + unsigned long page_pfn = page_to_pfn(page); 58 + 59 + /* normal page and hugetlbfs page */ 60 + if (!PageTransCompound(page) || PageHuge(page)) 61 + return page_pfn == pfn; 58 62 59 63 /* THP can be referenced by any subpage */ 60 - return pfn >= hpage_pfn && pfn - hpage_pfn < hpage_nr_pages(hpage); 64 + return pfn >= page_pfn && pfn - page_pfn < hpage_nr_pages(page); 61 65 } 62 66 63 67 /** ··· 112 108 pfn = pte_pfn(*pvmw->pte); 113 109 } 114 110 115 - return pfn_in_hpage(pvmw->page, pfn); 111 + return pfn_is_match(pvmw->page, pfn); 116 112 } 117 113 118 114 /**
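The new pfn_is_match() separates two cases: a normal or hugetlbfs page matches only its own pfn, while a THP can be referenced through any of its subpages. A standalone model of the predicate, with page identity reduced to a head pfn, a subpage count, and a THP flag (hypothetical parameter names, not the kernel's struct page API):

```c
#include <stdbool.h>

/*
 * head_pfn/nr_pages describe the compound page; is_thp distinguishes a
 * transparent huge page (any subpage matches) from normal and hugetlbfs
 * pages (only the exact pfn matches).
 */
bool pfn_is_match_model(unsigned long head_pfn, unsigned long nr_pages,
			bool is_thp, unsigned long pfn)
{
	if (!is_thp)
		return head_pfn == pfn;
	return pfn >= head_pfn && pfn - head_pfn < nr_pages;
}
```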
+15 -13
mm/process_vm_access.c
··· 42 42 if (copy > len) 43 43 copy = len; 44 44 45 - if (vm_write) { 45 + if (vm_write) 46 46 copied = copy_page_from_iter(page, offset, copy, iter); 47 - set_page_dirty_lock(page); 48 - } else { 47 + else 49 48 copied = copy_page_to_iter(page, offset, copy, iter); 50 - } 49 + 51 50 len -= copied; 52 51 if (copied < copy && iov_iter_count(iter)) 53 52 return -EFAULT; ··· 95 96 flags |= FOLL_WRITE; 96 97 97 98 while (!rc && nr_pages && iov_iter_count(iter)) { 98 - int pages = min(nr_pages, max_pages_per_loop); 99 + int pinned_pages = min(nr_pages, max_pages_per_loop); 99 100 int locked = 1; 100 101 size_t bytes; 101 102 ··· 105 106 * current/current->mm 106 107 */ 107 108 down_read(&mm->mmap_sem); 108 - pages = get_user_pages_remote(task, mm, pa, pages, flags, 109 - process_pages, NULL, &locked); 109 + pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages, 110 + flags, process_pages, 111 + NULL, &locked); 110 112 if (locked) 111 113 up_read(&mm->mmap_sem); 112 - if (pages <= 0) 114 + if (pinned_pages <= 0) 113 115 return -EFAULT; 114 116 115 - bytes = pages * PAGE_SIZE - start_offset; 117 + bytes = pinned_pages * PAGE_SIZE - start_offset; 116 118 if (bytes > len) 117 119 bytes = len; 118 120 ··· 122 122 vm_write); 123 123 len -= bytes; 124 124 start_offset = 0; 125 - nr_pages -= pages; 126 - pa += pages * PAGE_SIZE; 127 - while (pages) 128 - put_page(process_pages[--pages]); 125 + nr_pages -= pinned_pages; 126 + pa += pinned_pages * PAGE_SIZE; 127 + 128 + /* If vm_write is set, the pages need to be made dirty: */ 129 + unpin_user_pages_dirty_lock(process_pages, pinned_pages, 130 + vm_write); 129 131 } 130 132 131 133 return rc;
+47 -41
mm/slub.c
··· 439 439 } 440 440 441 441 #ifdef CONFIG_SLUB_DEBUG 442 + static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)]; 443 + static DEFINE_SPINLOCK(object_map_lock); 444 + 442 445 /* 443 446 * Determine a map of object in use on a page. 444 447 * 445 448 * Node listlock must be held to guarantee that the page does 446 449 * not vanish from under us. 447 450 */ 448 - static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map) 451 + static unsigned long *get_map(struct kmem_cache *s, struct page *page) 449 452 { 450 453 void *p; 451 454 void *addr = page_address(page); 452 455 456 + VM_BUG_ON(!irqs_disabled()); 457 + 458 + spin_lock(&object_map_lock); 459 + 460 + bitmap_zero(object_map, page->objects); 461 + 453 462 for (p = page->freelist; p; p = get_freepointer(s, p)) 454 - set_bit(slab_index(p, s, addr), map); 463 + set_bit(slab_index(p, s, addr), object_map); 464 + 465 + return object_map; 466 + } 467 + 468 + static void put_map(unsigned long *map) 469 + { 470 + VM_BUG_ON(map != object_map); 471 + lockdep_assert_held(&object_map_lock); 472 + 473 + spin_unlock(&object_map_lock); 455 474 } 456 475 457 476 static inline unsigned int size_from_object(struct kmem_cache *s) ··· 3694 3675 #ifdef CONFIG_SLUB_DEBUG 3695 3676 void *addr = page_address(page); 3696 3677 void *p; 3697 - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); 3698 - if (!map) 3699 - return; 3678 + unsigned long *map; 3679 + 3700 3680 slab_err(s, page, text, s->name); 3701 3681 slab_lock(page); 3702 3682 3703 - get_map(s, page, map); 3683 + map = get_map(s, page); 3704 3684 for_each_object(p, s, addr, page->objects) { 3705 3685 3706 3686 if (!test_bit(slab_index(p, s, addr), map)) { ··· 3707 3689 print_tracking(s, p); 3708 3690 } 3709 3691 } 3692 + put_map(map); 3693 + 3710 3694 slab_unlock(page); 3711 - bitmap_free(map); 3712 3695 #endif 3713 3696 } 3714 3697 ··· 4403 4384 #endif 4404 4385 4405 4386 #ifdef CONFIG_SLUB_DEBUG 4406 - static void 
validate_slab(struct kmem_cache *s, struct page *page, 4407 - unsigned long *map) 4387 + static void validate_slab(struct kmem_cache *s, struct page *page) 4408 4388 { 4409 4389 void *p; 4410 4390 void *addr = page_address(page); 4391 + unsigned long *map; 4392 + 4393 + slab_lock(page); 4411 4394 4412 4395 if (!check_slab(s, page) || !on_freelist(s, page, NULL)) 4413 - return; 4396 + goto unlock; 4414 4397 4415 4398 /* Now we know that a valid freelist exists */ 4416 - bitmap_zero(map, page->objects); 4417 - 4418 - get_map(s, page, map); 4399 + map = get_map(s, page); 4419 4400 for_each_object(p, s, addr, page->objects) { 4420 4401 u8 val = test_bit(slab_index(p, s, addr), map) ? 4421 4402 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; ··· 4423 4404 if (!check_object(s, page, p, val)) 4424 4405 break; 4425 4406 } 4426 - } 4427 - 4428 - static void validate_slab_slab(struct kmem_cache *s, struct page *page, 4429 - unsigned long *map) 4430 - { 4431 - slab_lock(page); 4432 - validate_slab(s, page, map); 4407 + put_map(map); 4408 + unlock: 4433 4409 slab_unlock(page); 4434 4410 } 4435 4411 4436 4412 static int validate_slab_node(struct kmem_cache *s, 4437 - struct kmem_cache_node *n, unsigned long *map) 4413 + struct kmem_cache_node *n) 4438 4414 { 4439 4415 unsigned long count = 0; 4440 4416 struct page *page; ··· 4438 4424 spin_lock_irqsave(&n->list_lock, flags); 4439 4425 4440 4426 list_for_each_entry(page, &n->partial, slab_list) { 4441 - validate_slab_slab(s, page, map); 4427 + validate_slab(s, page); 4442 4428 count++; 4443 4429 } 4444 4430 if (count != n->nr_partial) ··· 4449 4435 goto out; 4450 4436 4451 4437 list_for_each_entry(page, &n->full, slab_list) { 4452 - validate_slab_slab(s, page, map); 4438 + validate_slab(s, page); 4453 4439 count++; 4454 4440 } 4455 4441 if (count != atomic_long_read(&n->nr_slabs)) ··· 4466 4452 int node; 4467 4453 unsigned long count = 0; 4468 4454 struct kmem_cache_node *n; 4469 - unsigned long *map = bitmap_alloc(oo_objects(s->max), 
GFP_KERNEL); 4470 - 4471 - if (!map) 4472 - return -ENOMEM; 4473 4455 4474 4456 flush_all(s); 4475 4457 for_each_kmem_cache_node(s, node, n) 4476 - count += validate_slab_node(s, n, map); 4477 - bitmap_free(map); 4458 + count += validate_slab_node(s, n); 4459 + 4478 4460 return count; 4479 4461 } 4480 4462 /* ··· 4600 4590 } 4601 4591 4602 4592 static void process_slab(struct loc_track *t, struct kmem_cache *s, 4603 - struct page *page, enum track_item alloc, 4604 - unsigned long *map) 4593 + struct page *page, enum track_item alloc) 4605 4594 { 4606 4595 void *addr = page_address(page); 4607 4596 void *p; 4597 + unsigned long *map; 4608 4598 4609 - bitmap_zero(map, page->objects); 4610 - get_map(s, page, map); 4611 - 4599 + map = get_map(s, page); 4612 4600 for_each_object(p, s, addr, page->objects) 4613 4601 if (!test_bit(slab_index(p, s, addr), map)) 4614 4602 add_location(t, s, get_track(s, p, alloc)); 4603 + put_map(map); 4615 4604 } 4616 4605 4617 4606 static int list_locations(struct kmem_cache *s, char *buf, ··· 4621 4612 struct loc_track t = { 0, 0, NULL }; 4622 4613 int node; 4623 4614 struct kmem_cache_node *n; 4624 - unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL); 4625 4615 4626 - if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), 4627 - GFP_KERNEL)) { 4628 - bitmap_free(map); 4616 + if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), 4617 + GFP_KERNEL)) { 4629 4618 return sprintf(buf, "Out of memory\n"); 4630 4619 } 4631 4620 /* Push back cpu slabs */ ··· 4638 4631 4639 4632 spin_lock_irqsave(&n->list_lock, flags); 4640 4633 list_for_each_entry(page, &n->partial, slab_list) 4641 - process_slab(&t, s, page, alloc, map); 4634 + process_slab(&t, s, page, alloc); 4642 4635 list_for_each_entry(page, &n->full, slab_list) 4643 - process_slab(&t, s, page, alloc, map); 4636 + process_slab(&t, s, page, alloc); 4644 4637 spin_unlock_irqrestore(&n->list_lock, flags); 4645 4638 } 4646 4639 ··· 4689 4682 } 4690 4683 
4691 4684 free_loc_track(&t); 4692 - bitmap_free(map); 4693 4685 if (!t.count) 4694 4686 len += sprintf(buf, "No data\n"); 4695 4687 return len;
+1 -1
mm/sparse.c
··· 789 789 ms->usage = NULL; 790 790 } 791 791 memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); 792 - ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr); 792 + ms->section_mem_map = (unsigned long)NULL; 793 793 } 794 794 795 795 if (section_is_early && memmap)
+26 -1
mm/swap.c
··· 813 813 * processing, and instead, expect a call to 814 814 * put_page_testzero(). 815 815 */ 816 - if (put_devmap_managed_page(page)) 816 + if (page_is_devmap_managed(page)) { 817 + put_devmap_managed_page(page); 817 818 continue; 819 + } 818 820 } 819 821 820 822 page = compound_head(page); ··· 1104 1102 * _really_ don't want to cluster much more 1105 1103 */ 1106 1104 } 1105 + 1106 + #ifdef CONFIG_DEV_PAGEMAP_OPS 1107 + void put_devmap_managed_page(struct page *page) 1108 + { 1109 + int count; 1110 + 1111 + if (WARN_ON_ONCE(!page_is_devmap_managed(page))) 1112 + return; 1113 + 1114 + count = page_ref_dec_return(page); 1115 + 1116 + /* 1117 + * devmap page refcounts are 1-based, rather than 0-based: if 1118 + * refcount is 1, then the page is free and the refcount is 1119 + * stable because nobody holds a reference on the page. 1120 + */ 1121 + if (count == 1) 1122 + free_devmap_managed_page(page); 1123 + else if (!count) 1124 + __put_page(page); 1125 + } 1126 + EXPORT_SYMBOL(put_devmap_managed_page); 1127 + #endif
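put_devmap_managed_page() above encodes the 1-based refcount convention for device-managed pages: after the decrement, a count of 1 means the page is free and goes back to the driver via free_devmap_managed_page(), while 0 falls through to __put_page(). The decision table, modeled as a pure function (the outcome characters are illustrative only):

```c
/*
 * Outcome of the post-decrement refcount check in put_devmap_managed_page():
 * 'F' -> free_devmap_managed_page(), 'P' -> __put_page(), '-' -> nothing.
 */
char devmap_put_outcome(int count_after_dec)
{
	if (count_after_dec == 1)
		return 'F';
	if (count_after_dec == 0)
		return 'P';
	return '-';
}
```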
+1 -1
mm/swapfile.c
··· 2737 2737 else 2738 2738 type = si->type + 1; 2739 2739 2740 + ++(*pos); 2740 2741 for (; (si = swap_type_to_swap_info(type)); type++) { 2741 2742 if (!(si->flags & SWP_USED) || !si->swap_map) 2742 2743 continue; 2743 - ++*pos; 2744 2744 return si; 2745 2745 } 2746 2746
+3 -21
mm/vmscan.c
··· 146 146 struct reclaim_state reclaim_state; 147 147 }; 148 148 149 - #ifdef ARCH_HAS_PREFETCH 150 - #define prefetch_prev_lru_page(_page, _base, _field) \ 151 - do { \ 152 - if ((_page)->lru.prev != _base) { \ 153 - struct page *prev; \ 154 - \ 155 - prev = lru_to_page(&(_page->lru)); \ 156 - prefetch(&prev->_field); \ 157 - } \ 158 - } while (0) 159 - #else 160 - #define prefetch_prev_lru_page(_page, _base, _field) do { } while (0) 161 - #endif 162 - 163 149 #ifdef ARCH_HAS_PREFETCHW 164 150 #define prefetchw_prev_lru_page(_page, _base, _field) \ 165 151 do { \ ··· 2681 2695 } while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL))); 2682 2696 } 2683 2697 2684 - static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc) 2698 + static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) 2685 2699 { 2686 2700 struct reclaim_state *reclaim_state = current->reclaim_state; 2687 2701 unsigned long nr_reclaimed, nr_scanned; ··· 2860 2874 */ 2861 2875 if (reclaimable) 2862 2876 pgdat->kswapd_failures = 0; 2863 - 2864 - return reclaimable; 2865 2877 } 2866 2878 2867 2879 /* ··· 4110 4126 */ 4111 4127 int node_reclaim_mode __read_mostly; 4112 4128 4113 - #define RECLAIM_OFF 0 4114 - #define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */ 4115 - #define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */ 4116 - #define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */ 4129 + #define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */ 4130 + #define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */ 4117 4131 4118 4132 /* 4119 4133 * Priority for NODE_RECLAIM. This determines the fraction of pages
+56 -32
mm/zswap.c
··· 32 32 #include <linux/swapops.h> 33 33 #include <linux/writeback.h> 34 34 #include <linux/pagemap.h> 35 + #include <linux/workqueue.h> 35 36 36 37 /********************************* 37 38 * statistics ··· 65 64 static u64 zswap_reject_kmemcache_fail; 66 65 /* Duplicate store was encountered (rare) */ 67 66 static u64 zswap_duplicate_entry; 67 + 68 + /* Shrinker work queue */ 69 + static struct workqueue_struct *shrink_wq; 70 + /* Pool limit was hit, we need to calm down */ 71 + static bool zswap_pool_reached_full; 68 72 69 73 /********************************* 70 74 * tunables ··· 115 109 static unsigned int zswap_max_pool_percent = 20; 116 110 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644); 117 111 112 + /* The threshold for accepting new pages after the max_pool_percent was hit */ 113 + static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */ 114 + module_param_named(accept_threshold_percent, zswap_accept_thr_percent, 115 + uint, 0644); 116 + 118 117 /* Enable/disable handling same-value filled pages (enabled by default) */ 119 118 static bool zswap_same_filled_pages_enabled = true; 120 119 module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled, ··· 134 123 struct crypto_comp * __percpu *tfm; 135 124 struct kref kref; 136 125 struct list_head list; 137 - struct work_struct work; 126 + struct work_struct release_work; 127 + struct work_struct shrink_work; 138 128 struct hlist_node node; 139 129 char tfm_name[CRYPTO_MAX_ALG_NAME]; 140 130 }; ··· 223 211 static bool zswap_is_full(void) 224 212 { 225 213 return totalram_pages() * zswap_max_pool_percent / 100 < 214 + DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE); 215 + } 216 + 217 + static bool zswap_can_accept(void) 218 + { 219 + return totalram_pages() * zswap_accept_thr_percent / 100 * 220 + zswap_max_pool_percent / 100 > 226 221 DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE); 227 222 } 228 223 ··· 520 501 return NULL; 521 502 } 522 503 504 + 
static void shrink_worker(struct work_struct *w) 505 + { 506 + struct zswap_pool *pool = container_of(w, typeof(*pool), 507 + shrink_work); 508 + 509 + if (zpool_shrink(pool->zpool, 1, NULL)) 510 + zswap_reject_reclaim_fail++; 511 + zswap_pool_put(pool); 512 + } 513 + 523 514 static struct zswap_pool *zswap_pool_create(char *type, char *compressor) 524 515 { 525 516 struct zswap_pool *pool; ··· 580 551 */ 581 552 kref_init(&pool->kref); 582 553 INIT_LIST_HEAD(&pool->list); 554 + INIT_WORK(&pool->shrink_work, shrink_worker); 583 555 584 556 zswap_pool_debug("created", pool); 585 557 ··· 654 624 655 625 static void __zswap_pool_release(struct work_struct *work) 656 626 { 657 - struct zswap_pool *pool = container_of(work, typeof(*pool), work); 627 + struct zswap_pool *pool = container_of(work, typeof(*pool), 628 + release_work); 658 629 659 630 synchronize_rcu(); 660 631 ··· 678 647 679 648 list_del_rcu(&pool->list); 680 649 681 - INIT_WORK(&pool->work, __zswap_pool_release); 682 - schedule_work(&pool->work); 650 + INIT_WORK(&pool->release_work, __zswap_pool_release); 651 + schedule_work(&pool->release_work); 683 652 684 653 spin_unlock(&zswap_pools_lock); 685 654 } ··· 973 942 return ret; 974 943 } 975 944 976 - static int zswap_shrink(void) 977 - { 978 - struct zswap_pool *pool; 979 - int ret; 980 - 981 - pool = zswap_pool_last_get(); 982 - if (!pool) 983 - return -ENOENT; 984 - 985 - ret = zpool_shrink(pool->zpool, 1, NULL); 986 - 987 - zswap_pool_put(pool); 988 - 989 - return ret; 990 - } 991 - 992 945 static int zswap_is_page_same_filled(void *ptr, unsigned long *value) 993 946 { 994 947 unsigned int pos; ··· 1026 1011 1027 1012 /* reclaim space if needed */ 1028 1013 if (zswap_is_full()) { 1029 - zswap_pool_limit_hit++; 1030 - if (zswap_shrink()) { 1031 - zswap_reject_reclaim_fail++; 1032 - ret = -ENOMEM; 1033 - goto reject; 1034 - } 1014 + struct zswap_pool *pool; 1035 1015 1036 - /* A second zswap_is_full() check after 1037 - * zswap_shrink() to make sure it's 
now 1038 - * under the max_pool_percent 1039 - */ 1040 - if (zswap_is_full()) { 1016 + zswap_pool_limit_hit++; 1017 + zswap_pool_reached_full = true; 1018 + pool = zswap_pool_last_get(); 1019 + if (pool) 1020 + queue_work(shrink_wq, &pool->shrink_work); 1021 + ret = -ENOMEM; 1022 + goto reject; 1023 + } 1024 + 1025 + if (zswap_pool_reached_full) { 1026 + if (!zswap_can_accept()) { 1041 1027 ret = -ENOMEM; 1042 1028 goto reject; 1043 - } 1029 + } else 1030 + zswap_pool_reached_full = false; 1044 1031 } 1045 1032 1046 1033 /* allocate entry */ ··· 1349 1332 zswap_enabled = false; 1350 1333 } 1351 1334 1335 + shrink_wq = create_workqueue("zswap-shrink"); 1336 + if (!shrink_wq) 1337 + goto fallback_fail; 1338 + 1352 1339 frontswap_register_ops(&zswap_frontswap_ops); 1353 1340 if (zswap_debugfs_init()) 1354 1341 pr_warn("debugfs initialization failed\n"); 1355 1342 return 0; 1356 1343 1344 + fallback_fail: 1345 + if (pool) 1346 + zswap_pool_destroy(pool); 1357 1347 hp_fail: 1358 1348 cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE); 1359 1349 dstmem_fail:
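zswap_is_full() and the new zswap_can_accept() above are integer-percentage comparisons against the pool size in pages: the pool is full once it reaches max_pool_percent of RAM, and after zswap_pool_reached_full is set, stores resume only when the pool shrinks below accept_threshold_percent of that limit (90% of 20% by default). A sketch of the pair of checks, with pool_pages standing in for DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE):

```c
#include <stdbool.h>

/* Pool has hit max_pool_percent of total RAM (all sizes in pages). */
bool zswap_is_full_model(unsigned long total_pages, unsigned int max_percent,
			 unsigned long pool_pages)
{
	return total_pages * max_percent / 100 < pool_pages;
}

/* After hitting the limit, accept stores again only below the threshold. */
bool zswap_can_accept_model(unsigned long total_pages, unsigned int max_percent,
			    unsigned int accept_percent, unsigned long pool_pages)
{
	return total_pages * accept_percent / 100 * max_percent / 100 > pool_pages;
}
```

With the defaults (20% cap, 90% threshold) on a 1,000,000-page machine the cap is 200,000 pages and stores resume below 180,000 pages, giving the shrinker worker headroom instead of retrying a full pool on every store.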
+2 -2
net/xdp/xdp_umem.c
··· 212 212 213 213 static void xdp_umem_unpin_pages(struct xdp_umem *umem) 214 214 { 215 - put_user_pages_dirty_lock(umem->pgs, umem->npgs, true); 215 + unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); 216 216 217 217 kfree(umem->pgs); 218 218 umem->pgs = NULL; ··· 291 291 return -ENOMEM; 292 292 293 293 down_read(&current->mm->mmap_sem); 294 - npgs = get_user_pages(umem->address, umem->npgs, 294 + npgs = pin_user_pages(umem->address, umem->npgs, 295 295 gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL); 296 296 up_read(&current->mm->mmap_sem); 297 297
+14
scripts/spelling.txt
··· 39 39 accquire||acquire 40 40 accquired||acquired 41 41 accross||across 42 + accumalate||accumulate 43 + accumalator||accumulator 42 44 acessable||accessible 43 45 acess||access 44 46 acessing||accessing ··· 108 106 alot||a lot 109 107 alow||allow 110 108 alows||allows 109 + alreay||already 111 110 alredy||already 112 111 altough||although 113 112 alue||value ··· 244 241 calescing||coalescing 245 242 calle||called 246 243 callibration||calibration 244 + callled||called 247 245 calucate||calculate 248 246 calulate||calculate 249 247 cancelation||cancellation ··· 315 311 comparsion||comparison 316 312 compatability||compatibility 317 313 compatable||compatible 314 + compatibililty||compatibility 318 315 compatibiliy||compatibility 319 316 compatibilty||compatibility 320 317 compatiblity||compatibility ··· 335 330 conbination||combination 336 331 conditionaly||conditionally 337 332 conditon||condition 333 + condtion||condition 338 334 conected||connected 339 335 conector||connector 340 336 connecetd||connected ··· 394 388 deafult||default 395 389 deamon||daemon 396 390 debouce||debounce 391 + decendant||descendant 392 + decendants||descendants 397 393 decompres||decompress 398 394 decsribed||described 399 395 decription||description ··· 419 411 delares||declares 420 412 delaring||declaring 421 413 delemiter||delimiter 414 + delievered||delivered 422 415 demodualtor||demodulator 423 416 demension||dimension 424 417 dependancies||dependencies 425 418 dependancy||dependency 426 419 dependant||dependent 420 + dependend||dependent 427 421 depreacted||deprecated 428 422 depreacte||deprecate 429 423 desactivate||deactivate ··· 801 791 irrelevent||irrelevant 802 792 isnt||isn't 803 793 isssue||issue 794 + issus||issues 804 795 iternations||iterations 805 796 itertation||iteration 806 797 itslef||itself ··· 1006 995 pendantic||pedantic 1007 996 peprocessor||preprocessor 1008 997 perfoming||performing 998 + perfomring||performing 1009 999 peripherial||peripheral 1010 1000 
permissons||permissions 1011 1001 peroid||period ··· 1178 1166 retreiving||retrieving 1179 1167 retrive||retrieve 1180 1168 retrived||retrieved 1169 + retrun||return 1170 + retun||return 1181 1171 retuned||returned 1182 1172 reudce||reduce 1183 1173 reuest||request
+5 -1
tools/testing/selftests/vm/gup_benchmark.c
··· 18 18 #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark) 19 19 #define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark) 20 20 21 + /* Just the flags we need, copied from mm.h: */ 22 + #define FOLL_WRITE 0x01 /* check pte is writable */ 23 + 21 24 struct gup_benchmark { 22 25 __u64 get_delta_usec; 23 26 __u64 put_delta_usec; ··· 88 85 } 89 86 90 87 gup.nr_pages_per_call = nr_pages; 91 - gup.flags = write; 88 + if (write) 89 + gup.flags |= FOLL_WRITE; 92 90 93 91 fd = open("/sys/kernel/debug/gup_benchmark", O_RDWR); 94 92 if (fd == -1)
+2 -2
tools/vm/slabinfo.c
··· 720 720 return; 721 721 722 722 if (sanity && !s->sanity_checks) { 723 - set_obj(s, "sanity", 1); 723 + set_obj(s, "sanity_checks", 1); 724 724 } 725 725 if (!sanity && s->sanity_checks) { 726 726 if (slab_empty(s)) 727 - set_obj(s, "sanity", 0); 727 + set_obj(s, "sanity_checks", 0); 728 728 else 729 729 fprintf(stderr, "%s not empty cannot disable sanity checks\n", s->name); 730 730 }