Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'akpm' (patches from Andrew)

Merge second patchbomb from Andrew Morton:

- the rest of MM

- various misc bits

- add ability to run /sbin/reboot at reboot time

- printk/vsprintf changes

- fiddle with seq_printf() return value

* akpm: (114 commits)
parisc: remove use of seq_printf return value
lru_cache: remove use of seq_printf return value
tracing: remove use of seq_printf return value
cgroup: remove use of seq_printf return value
proc: remove use of seq_printf return value
s390: remove use of seq_printf return value
cris fasttimer: remove use of seq_printf return value
cris: remove use of seq_printf return value
openrisc: remove use of seq_printf return value
ARM: plat-pxa: remove use of seq_printf return value
nios2: cpuinfo: remove use of seq_printf return value
microblaze: mb: remove use of seq_printf return value
ipc: remove use of seq_printf return value
rtc: remove use of seq_printf return value
power: wakeup: remove use of seq_printf return value
x86: mtrr: if: remove use of seq_printf return value
linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
MAINTAINERS: CREDITS: remove Stefano Brivio from B43
.mailmap: add Ricardo Ribalda
CREDITS: add Ricardo Ribalda Delgado
...

+3273 -1808
+1
.mailmap
··· 100 100 Ralf Baechle <ralf@linux-mips.org> 101 101 Ralf Wildenhues <Ralf.Wildenhues@gmx.de> 102 102 Rémi Denis-Courmont <rdenis@simphalempin.com> 103 + Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> 103 104 Rudolf Marek <R.Marek@sh.cvut.cz> 104 105 Rui Saraiva <rmps@joel.ist.utl.pt> 105 106 Sachin P Sant <ssant@in.ibm.com>
+17
CREDITS
··· 508 508 W: http://paulbristow.net/linux/idefloppy.html 509 509 D: Maintainer of IDE/ATAPI floppy driver 510 510 511 + N: Stefano Brivio 512 + E: stefano.brivio@polimi.it 513 + D: Broadcom B43 driver 514 + 511 515 N: Dominik Brodowski 512 516 E: linux@brodo.de 513 517 W: http://www.brodo.de/ ··· 3011 3007 W: http://www.qsl.net/dl1bke/ 3012 3008 D: Generic Z8530 driver, AX.25 DAMA slave implementation 3013 3009 D: Several AX.25 hacks 3010 + 3011 + N: Ricardo Ribalda Delgado 3012 + E: ricardo.ribalda@gmail.com 3013 + W: http://ribalda.com 3014 + D: PLX USB338x driver 3015 + D: PCA9634 driver 3016 + D: Option GTM671WFS 3017 + D: Fintek F81216A 3018 + D: Various kernel hacks 3019 + S: Qtechnology A/S 3020 + S: Valby Langgade 142 3021 + S: 2500 Valby 3022 + S: Denmark 3014 3023 3015 3024 N: Francois-Rene Rideau 3016 3025 E: fare@tunes.org
+119
Documentation/ABI/obsolete/sysfs-block-zram
··· 1 + What: /sys/block/zram<id>/num_reads 2 + Date: August 2015 3 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 4 + Description: 5 + The num_reads file is read-only and specifies the number of 6 + reads (failed or successful) done on this device. 7 + Now accessible via zram<id>/stat node. 8 + 9 + What: /sys/block/zram<id>/num_writes 10 + Date: August 2015 11 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 12 + Description: 13 + The num_writes file is read-only and specifies the number of 14 + writes (failed or successful) done on this device. 15 + Now accessible via zram<id>/stat node. 16 + 17 + What: /sys/block/zram<id>/invalid_io 18 + Date: August 2015 19 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 20 + Description: 21 + The invalid_io file is read-only and specifies the number of 22 + non-page-size-aligned I/O requests issued to this device. 23 + Now accessible via zram<id>/io_stat node. 24 + 25 + What: /sys/block/zram<id>/failed_reads 26 + Date: August 2015 27 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 28 + Description: 29 + The failed_reads file is read-only and specifies the number of 30 + failed reads happened on this device. 31 + Now accessible via zram<id>/io_stat node. 32 + 33 + What: /sys/block/zram<id>/failed_writes 34 + Date: August 2015 35 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 36 + Description: 37 + The failed_writes file is read-only and specifies the number of 38 + failed writes happened on this device. 39 + Now accessible via zram<id>/io_stat node. 40 + 41 + What: /sys/block/zram<id>/notify_free 42 + Date: August 2015 43 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 44 + Description: 45 + The notify_free file is read-only. Depending on device usage 46 + scenario it may account a) the number of pages freed because 47 + of swap slot free notifications or b) the number of pages freed 48 + because of REQ_DISCARD requests sent by bio. 
The former ones 49 + are sent to a swap block device when a swap slot is freed, which 50 + implies that this disk is being used as a swap disk. The latter 51 + ones are sent by filesystem mounted with discard option, 52 + whenever some data blocks are getting discarded. 53 + Now accessible via zram<id>/io_stat node. 54 + 55 + What: /sys/block/zram<id>/zero_pages 56 + Date: August 2015 57 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 58 + Description: 59 + The zero_pages file is read-only and specifies number of zero 60 + filled pages written to this disk. No memory is allocated for 61 + such pages. 62 + Now accessible via zram<id>/mm_stat node. 63 + 64 + What: /sys/block/zram<id>/orig_data_size 65 + Date: August 2015 66 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 67 + Description: 68 + The orig_data_size file is read-only and specifies uncompressed 69 + size of data stored in this disk. This excludes zero-filled 70 + pages (zero_pages) since no memory is allocated for them. 71 + Unit: bytes 72 + Now accessible via zram<id>/mm_stat node. 73 + 74 + What: /sys/block/zram<id>/compr_data_size 75 + Date: August 2015 76 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 77 + Description: 78 + The compr_data_size file is read-only and specifies compressed 79 + size of data stored in this disk. So, compression ratio can be 80 + calculated using orig_data_size and this statistic. 81 + Unit: bytes 82 + Now accessible via zram<id>/mm_stat node. 83 + 84 + What: /sys/block/zram<id>/mem_used_total 85 + Date: August 2015 86 + Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 87 + Description: 88 + The mem_used_total file is read-only and specifies the amount 89 + of memory, including allocator fragmentation and metadata 90 + overhead, allocated for this disk. So, allocator space 91 + efficiency can be calculated using compr_data_size and this 92 + statistic. 93 + Unit: bytes 94 + Now accessible via zram<id>/mm_stat node. 
95 + 96 + What:		/sys/block/zram<id>/mem_used_max 97 + Date:		August 2015 98 + Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 99 + Description: 100 + The mem_used_max file is read/write and specifies the maximum 101 + amount of memory zram has consumed to store compressed data. 102 + For resetting the value, you should write "0". Otherwise, 103 + you could see -EINVAL. 104 + Unit: bytes 105 + Downgraded to write-only node: so it's possible to set new 106 + value only; its current value is stored in zram<id>/mm_stat 107 + node. 108 + 109 + What:		/sys/block/zram<id>/mem_limit 110 + Date:		August 2015 111 + Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 112 + Description: 113 + The mem_limit file is read/write and specifies the maximum 114 + amount of memory ZRAM can use to store the compressed data. 115 + The limit could be changed in run time and "0" means disable 116 + the limit. No limit is the initial state. Unit: bytes 117 + Downgraded to write-only node: so it's possible to set new 118 + value only; its current value is stored in zram<id>/mm_stat 119 + node.
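The documented write semantics for mem_used_max (only "0" is accepted as a reset; anything else yields -EINVAL) boil down to a small validation step. A userspace sketch of that contract; the function name is hypothetical and EINVAL here is the ordinary errno constant:

```c
#include <errno.h>
#include <stdlib.h>

/* Mimics the documented mem_used_max store: writing "0" resets the
 * watermark; any other input is rejected with -EINVAL. */
static int mem_used_max_store(const char *buf)
{
	char *end;
	unsigned long val = strtoul(buf, &end, 10);

	if (end == buf || (*end != '\0' && *end != '\n') || val != 0)
		return -EINVAL;
	return 0; /* reset accepted */
}
```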
+25
Documentation/ABI/testing/sysfs-block-zram
··· 141 141 amount of memory ZRAM can use to store the compressed data. The 142 142 limit could be changed in run time and "0" means disable the 143 143 limit. No limit is the initial state. Unit: bytes 144 + 145 + What:		/sys/block/zram<id>/compact 146 + Date:		August 2015 147 + Contact:	Minchan Kim <minchan@kernel.org> 148 + Description: 149 + The compact file is write-only and triggers compaction for 150 + the allocator zram uses. The allocator moves some objects so 151 + that it can free fragmented space. 152 + 153 + What:		/sys/block/zram<id>/io_stat 154 + Date:		August 2015 155 + Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 156 + Description: 157 + The io_stat file is read-only and accumulates device's I/O 158 + statistics not accounted by block layer. For example, 159 + failed_reads, failed_writes, etc. File format is similar to 160 + block layer statistics file format. 161 + 162 + What:		/sys/block/zram<id>/mm_stat 163 + Date:		August 2015 164 + Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com> 165 + Description: 166 + The mm_stat file is read-only and represents device's mm 167 + statistics (orig_data_size, compr_data_size, etc.) in a format 168 + similar to block layer statistics file format.
+73 -14
Documentation/blockdev/zram.txt
··· 98 98 mount /dev/zram1 /tmp 99 99 100 100 7) Stats: 101 - Per-device statistics are exported as various nodes under 102 - /sys/block/zram<id>/ 103 - disksize 104 - num_reads 105 - num_writes 106 - failed_reads 107 - failed_writes 108 - invalid_io 109 - notify_free 110 - zero_pages 111 - orig_data_size 112 - compr_data_size 113 - mem_used_total 114 - mem_used_max 101 + Per-device statistics are exported as various nodes under /sys/block/zram<id>/ 102 + 103 + A brief description of exported device attributes. For more details please 104 + read Documentation/ABI/testing/sysfs-block-zram. 105 + 106 + Name access description 107 + ---- ------ ----------- 108 + disksize RW show and set the device's disk size 109 + initstate RO shows the initialization state of the device 110 + reset WO trigger device reset 111 + num_reads RO the number of reads 112 + failed_reads RO the number of failed reads 113 + num_writes RO the number of writes 114 + failed_writes RO the number of failed writes 115 + invalid_io RO the number of non-page-size-aligned I/O requests 116 + max_comp_streams RW the number of possible concurrent compress operations 117 + comp_algorithm RW show and change the compression algorithm 118 + notify_free RO the number of notifications to free pages (either 119 + slot free notifications or REQ_DISCARD requests) 120 + zero_pages RO the number of zero filled pages written to this disk 121 + orig_data_size RO uncompressed size of data stored in this disk 122 + compr_data_size RO compressed size of data stored in this disk 123 + mem_used_total RO the amount of memory allocated for this disk 124 + mem_used_max RW the maximum amount of memory zram has consumed to 125 + store compressed data 126 + mem_limit RW the maximum amount of memory ZRAM can use to store 127 + the compressed data 128 + num_migrated RO the number of objects migrated by compaction 129 + 130 + 131 + WARNING 132 + ======= 133 + per-stat sysfs attributes are considered to be deprecated. 
134 + The basic strategy is: 135 + -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11) 136 + -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11) 137 + 138 + The list of deprecated attributes can be found here: 139 + Documentation/ABI/obsolete/sysfs-block-zram 140 + 141 + Basically, every attribute that has its own read accessible sysfs node 142 + (e.g. num_reads) *AND* is accessible via one of the stat files (zram<id>/stat 143 + or zram<id>/io_stat or zram<id>/mm_stat) is considered to be deprecated. 144 + 145 + User space is advised to use the following files to read the device statistics. 146 + 147 + File /sys/block/zram<id>/stat 148 + 149 + Represents block layer statistics. Read Documentation/block/stat.txt for 150 + details. 151 + 152 + File /sys/block/zram<id>/io_stat 153 + 154 + The stat file represents device's I/O statistics not accounted by block 155 + layer and, thus, not available in zram<id>/stat file. It consists of a 156 + single line of text and contains the following stats separated by 157 + whitespace: 158 + failed_reads 159 + failed_writes 160 + invalid_io 161 + notify_free 162 + 163 + File /sys/block/zram<id>/mm_stat 164 + 165 + The stat file represents device's mm statistics. It consists of a single 166 + line of text and contains the following stats separated by whitespace: 167 + orig_data_size 168 + compr_data_size 169 + mem_used_total 170 + mem_limit 171 + mem_used_max 172 + zero_pages 173 + num_migrated 115 174 116 175 8) Deactivate: 117 176 swapoff /dev/zram0
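Since mm_stat is a single whitespace-separated line in the field order listed above, consuming it from userspace is a one-line sscanf. A sketch; the struct and function names are made up for illustration:

```c
#include <stdio.h>

/* Field order as documented above for /sys/block/zram<id>/mm_stat:
 * orig_data_size compr_data_size mem_used_total mem_limit
 * mem_used_max zero_pages num_migrated */
struct zram_mm_stat {
	unsigned long long orig_data_size, compr_data_size, mem_used_total,
			   mem_limit, mem_used_max, zero_pages, num_migrated;
};

/* Parse one line read from the mm_stat node.
 * Returns 0 on success, -1 if the line does not carry seven fields. */
static int parse_mm_stat(const char *line, struct zram_mm_stat *st)
{
	int n = sscanf(line, "%llu %llu %llu %llu %llu %llu %llu",
		       &st->orig_data_size, &st->compr_data_size,
		       &st->mem_used_total, &st->mem_limit,
		       &st->mem_used_max, &st->zero_pages,
		       &st->num_migrated);
	return n == 7 ? 0 : -1;
}
```

The compression ratio mentioned in the ABI description is then simply orig_data_size divided by compr_data_size.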
+8
Documentation/filesystems/Locking
··· 523 523 void (*close)(struct vm_area_struct*); 524 524 int (*fault)(struct vm_area_struct*, struct vm_fault *); 525 525 int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *); 526 + int (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *); 526 527 int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); 527 528 528 529 locking rules: ··· 533 532 fault: yes can return with page locked 534 533 map_pages: yes 535 534 page_mkwrite: yes can return with page locked 535 + pfn_mkwrite: yes 536 536 access: yes 537 537 538 538 ->fault() is called when a previously not present pte is about ··· 559 557 the page has been truncated, the filesystem should not look up a new page 560 558 like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which 561 559 will cause the VM to retry the fault. 560 + 561 + ->pfn_mkwrite() is the same as page_mkwrite but when the pte is 562 + VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is 563 + VM_FAULT_NOPAGE. Or one of the VM_FAULT_ERROR types. The default behavior 564 + after this call is to make the pte read-write, unless pfn_mkwrite returns 565 + an error. 562 566 563 567 ->access() is called when get_user_pages() fails in 564 568 access_process_vm(), typically used to debug a process through
+37 -12
Documentation/printk-formats.txt
··· 8 8 unsigned long long %llu or %llx 9 9 size_t %zu or %zx 10 10 ssize_t %zd or %zx 11 + s32 %d or %x 12 + u32 %u or %x 13 + s64 %lld or %llx 14 + u64 %llu or %llx 15 + 16 + If <type> is dependent on a config option for its size (e.g., sector_t, 17 + blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a 18 + format specifier of its largest possible type and explicitly cast to it. 19 + Example: 20 + 21 + printk("test: sector number/total blocks: %llu/%llu\n", 22 + (unsigned long long)sector, (unsigned long long)blockcount); 23 + 24 + Reminder: sizeof() result is of type size_t. 25 + 11 26 12 27 Raw pointer value SHOULD be printed with %p. The kernel supports 13 28 the following extended format specifiers for pointer types: ··· 69 54 70 55 For printing struct resources. The 'R' and 'r' specifiers result in a 71 56 printed resource with ('R') or without ('r') a decoded flags member. 57 + Passed by reference. 72 58 73 59 Physical addresses types phys_addr_t: 74 60 ··· 148 132 specifier to use reversed byte order suitable for visual interpretation 149 133 of Bluetooth addresses which are in the little endian order. 150 134 135 + Passed by reference. 136 + 151 137 IPv4 addresses: 152 138 153 139 %pI4 1.2.3.4 ··· 164 146 host, network, big or little endian order addresses respectively. Where 165 147 no specifier is provided the default network/big endian order is used. 166 148 149 + Passed by reference. 150 + 167 151 IPv6 addresses: 168 152 169 153 %pI6 0001:0002:0003:0004:0005:0006:0007:0008 ··· 179 159 The additional 'c' specifier can be used with the 'I' specifier to 180 160 print a compressed IPv6 address as described by 181 161 http://tools.ietf.org/html/rfc5952 162 + 163 + Passed by reference. 182 164 183 165 IPv4/IPv6 addresses (generic, with port, flowinfo, scope): 184 166 ··· 208 186 specifiers can be used as well and are ignored in case of an IPv6 209 187 address. 210 188 189 + Passed by reference. 
190 + 211 191 Further examples: 212 192 213 193 %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 ··· 231 207 Where no additional specifiers are used the default little endian 232 208 order with lower case hex characters will be printed. 233 209 210 + Passed by reference. 211 + 234 212 dentry names: 235 213 %pd{,2,3,4} 236 214 %pD{,2,3,4} ··· 241 215 a mix of old and new ones, but it won't oops. %pd dentry is a safer 242 216 equivalent of %s dentry->d_name.name we used to use, %pd<n> prints 243 217 n last components. %pD does the same thing for struct file. 218 + 219 + Passed by reference. 244 220 245 221 struct va_format: 246 222 ··· 259 231 Do not use this feature without some mechanism to verify the 260 232 correctness of the format string and va_list arguments. 261 233 262 - u64 SHOULD be printed with %llu/%llx: 234 + Passed by reference. 263 235 264 - printk("%llu", u64_var); 236 + struct clk: 265 237 266 - s64 SHOULD be printed with %lld/%llx: 238 + %pC pll1 239 + %pCn pll1 240 + %pCr 1560000000 267 241 268 - printk("%lld", s64_var); 242 + For printing struct clk structures. '%pC' and '%pCn' print the name 243 + (Common Clock Framework) or address (legacy clock framework) of the 244 + structure; '%pCr' prints the current clock rate. 269 245 270 - If <type> is dependent on a config option for its size (e.g., sector_t, 271 - blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a 272 - format specifier of its largest possible type and explicitly cast to it. 273 - Example: 246 + Passed by reference. 274 247 275 - printk("test: sector number/total blocks: %llu/%llu\n", 276 - (unsigned long long)sector, (unsigned long long)blockcount); 277 - 278 - Reminder: sizeof() result is of type size_t. 279 248 280 249 Thank you for your cooperation and attention. 281 250
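The cast-to-largest-type rule that this patch moves near the top of printk-formats.txt works identically with userspace printf, which shares the %llu/%llx length modifiers. A minimal sketch; the sector_t typedef here is a stand-in assumption, since the real type's width depends on kernel config:

```c
#include <stdio.h>
#include <stdint.h>

/* Stand-in for a config-dependent kernel type: sector_t may be
 * 32 or 64 bits depending on configuration, hence the cast rule. */
typedef uint64_t sector_t;

/* Format per the documented rule: cast to the largest possible type
 * and use that type's format specifier. */
static void format_blocks(char *buf, size_t len,
			  sector_t sector, uint64_t blockcount)
{
	snprintf(buf, len, "sector number/total blocks: %llu/%llu",
		 (unsigned long long)sector,
		 (unsigned long long)blockcount);
}
```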
+11
Documentation/sysctl/vm.txt
··· 21 21 - admin_reserve_kbytes 22 22 - block_dump 23 23 - compact_memory 24 + - compact_unevictable_allowed 24 25 - dirty_background_bytes 25 26 - dirty_background_ratio 26 27 - dirty_bytes ··· 104 103 all zones are compacted such that free memory is available in contiguous 105 104 blocks where possible. This can be important for example in the allocation of 106 105 huge pages although processes will also directly compact memory as required. 106 + 107 + ============================================================== 108 + 109 + compact_unevictable_allowed 110 + 111 + Available only when CONFIG_COMPACTION is set. When set to 1, compaction is 112 + allowed to examine the unevictable lru (mlocked pages) for pages to compact. 113 + This should be used on systems where stalls for minor page faults are an 114 + acceptable trade for large contiguous free memory. Set to 0 to prevent 115 + compaction from moving pages that are unevictable. Default value is 1. 107 116 108 117 ============================================================== 109 118
+38 -17
Documentation/vm/hugetlbpage.txt
··· 267 267 type hugetlbfs: 268 268 269 269 mount -t hugetlbfs \ 270 - -o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \ 271 - none /mnt/huge 270 + -o uid=<value>,gid=<value>,mode=<value>,pagesize=<value>,size=<value>,\ 271 + min_size=<value>,nr_inodes=<value> none /mnt/huge 272 272 273 273 This command mounts a (pseudo) filesystem of type hugetlbfs on the directory 274 274 /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid 275 275 options sets the owner and group of the root of the file system. By default 276 276 the uid and gid of the current process are taken. The mode option sets the 277 277 mode of root of file system to value & 01777. This value is given in octal. 278 - By default the value 0755 is picked. The size option sets the maximum value of 279 - memory (huge pages) allowed for that filesystem (/mnt/huge). The size is 280 - rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of 281 - inodes that /mnt/huge can use. If the size or nr_inodes option is not 282 - provided on command line then no limits are set. For size and nr_inodes 283 - options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For 284 - example, size=2K has the same meaning as size=2048. 278 + By default the value 0755 is picked. If the platform supports multiple huge 279 + page sizes, the pagesize option can be used to specify the huge page size and 280 + associated pool. pagesize is specified in bytes. If pagesize is not specified 281 + the platform's default huge page size and associated pool will be used. The 282 + size option sets the maximum value of memory (huge pages) allowed for that 283 + filesystem (/mnt/huge). The size option can be specified in bytes, or as a 284 + percentage of the specified huge page pool (nr_hugepages). The size is 285 + rounded down to HPAGE_SIZE boundary. The min_size option sets the minimum 286 + value of memory (huge pages) allowed for the filesystem. 
min_size can be 287 + specified in the same way as size, either bytes or a percentage of the 288 + huge page pool. At mount time, the number of huge pages specified by 289 + min_size are reserved for use by the filesystem. If there are not enough 290 + free huge pages available, the mount will fail. As huge pages are allocated 291 + to the filesystem and freed, the reserve count is adjusted so that the sum 292 + of allocated and reserved huge pages is always at least min_size. The option 293 + nr_inodes sets the maximum number of inodes that /mnt/huge can use. If the 294 + size, min_size or nr_inodes option is not provided on command line then 295 + no limits are set. For pagesize, size, min_size and nr_inodes options, you 296 + can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For example, size=2K 297 + has the same meaning as size=2048. 285 298 286 299 While read system calls are supported on files that reside on hugetlb 287 300 file systems, write system calls are not. ··· 302 289 Regular chown, chgrp, and chmod commands (with right permissions) could be 303 290 used to change the file attributes on hugetlbfs. 304 291 305 - Also, it is important to note that no such mount command is required if the 292 + Also, it is important to note that no such mount command is required if 306 293 applications are going to use only shmat/shmget system calls or mmap with 307 - MAP_HUGETLB. Users who wish to use hugetlb page via shared memory segment 308 - should be a member of a supplementary group and system admin needs to 309 - configure that gid into /proc/sys/vm/hugetlb_shm_group. It is possible for 310 - same or different applications to use any combination of mmaps and shm* 311 - calls, though the mount of filesystem will be required for using mmap calls 312 - without MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see 313 - map_hugetlb.c. 294 + MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see map_hugetlb 295 + below. 
296 + 297 + Users who wish to use hugetlb memory via shared memory segment should be a 298 + member of a supplementary group and system admin needs to configure that gid 299 + into /proc/sys/vm/hugetlb_shm_group. It is possible for same or different 300 + applications to use any combination of mmaps and shm* calls, though the mount of 301 + filesystem will be required for using mmap calls without MAP_HUGETLB. 302 + 303 + Syscalls that operate on memory backed by hugetlb pages only have their lengths 304 + aligned to the native page size of the processor; they will normally fail with 305 + errno set to EINVAL or exclude hugetlb pages that extend beyond the length if 306 + not hugepage aligned. For example, munmap(2) will fail if memory is backed by 307 + a hugetlb page and the length is smaller than the hugepage size. 308 + 314 309 315 310 Examples 316 311 ========
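Both behaviors the new text describes — size= being rounded down to a huge page boundary, and syscalls like munmap(2) failing on lengths that do not cover whole huge pages — are simple power-of-two arithmetic. A sketch, assuming 2 MB huge pages (the actual size is platform-dependent, as the text notes):

```c
/* Assumed huge page size for illustration; real platforms vary and
 * may support several sizes (see the pagesize= mount option above). */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/* Round a requested hugetlbfs size down to a huge page boundary,
 * as the size= mount option does. */
static unsigned long round_down_hpage(unsigned long size)
{
	return size & ~(HPAGE_SIZE - 1);
}

/* A length passed to munmap() on hugetlb-backed memory must cover
 * whole huge pages; otherwise the call fails with EINVAL. */
static int hugetlb_len_ok(unsigned long len)
{
	return (len & (HPAGE_SIZE - 1)) == 0;
}
```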
+12
Documentation/vm/unevictable-lru.txt
··· 22 22 - Filtering special vmas. 23 23 - munlock()/munlockall() system call handling. 24 24 - Migrating mlocked pages. 25 + - Compacting mlocked pages. 25 26 - mmap(MAP_LOCKED) system call handling. 26 27 - munmap()/exit()/exec() system call handling. 27 28 - try_to_unmap(). ··· 449 448 process is released. To ensure that we don't strand pages on the unevictable 450 449 list because of a race between munlock and migration, page migration uses the 451 450 putback_lru_page() function to add migrated pages back to the LRU. 451 + 452 + 453 + COMPACTING MLOCKED PAGES 454 + ------------------------ 455 + 456 + The unevictable LRU can be scanned for compactable regions and the default 457 + behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls 458 + this behavior (see Documentation/sysctl/vm.txt). Once scanning of the 459 + unevictable LRU is enabled, the work of compaction is mostly handled by 460 + the page migration code and the same work flow as described in MIGRATING 461 + MLOCKED PAGES will apply. 452 462 453 463 454 464 mmap(MAP_LOCKED) SYSTEM CALL HANDLING
+70
Documentation/vm/zsmalloc.txt
··· 1 + zsmalloc 2 + -------- 3 + 4 + This allocator is designed for use with zram. Thus, the allocator is 5 + supposed to work well under low memory conditions. In particular, it 6 + never attempts higher order page allocation which is very likely to 7 + fail under memory pressure. On the other hand, if we just use single 8 + (0-order) pages, it would suffer from very high fragmentation -- 9 + any object of size PAGE_SIZE/2 or larger would occupy an entire page. 10 + This was one of the major issues with its predecessor (xvmalloc). 11 + 12 + To overcome these issues, zsmalloc allocates a bunch of 0-order pages 13 + and links them together using various 'struct page' fields. These linked 14 + pages act as a single higher-order page i.e. an object can span 0-order 15 + page boundaries. The code refers to these linked pages as a single entity 16 + called zspage. 17 + 18 + For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE 19 + since this satisfies the requirements of all its current users (in the 20 + worst case, page is incompressible and is thus stored "as-is" i.e. in 21 + uncompressed form). For allocation requests larger than this size, failure 22 + is returned (see zs_malloc). 23 + 24 + Additionally, zs_malloc() does not return a dereferenceable pointer. 25 + Instead, it returns an opaque handle (unsigned long) which encodes the 26 + actual location of the allocated object. The reason for this indirection is 27 + that zsmalloc does not keep zspages permanently mapped since that would cause 28 + issues on 32-bit systems where the VA region for kernel space mappings 29 + is very small. So, before using the allocated memory, the object has to 30 + be mapped using zs_map_object() to get a usable pointer and subsequently 31 + unmapped using zs_unmap_object(). 32 + 33 + stat 34 + ---- 35 + 36 + With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via 37 + /sys/kernel/debug/zsmalloc/<user name>. 
Here is a sample of stat output: 38 + 39 + # cat /sys/kernel/debug/zsmalloc/zram0/classes 40 + 41 + class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage 42 + .. 43 + .. 44 + 9 176 0 1 186 129 8 4 45 + 10 192 1 0 2880 2872 135 3 46 + 11 208 0 1 819 795 42 2 47 + 12 224 0 1 219 159 12 4 48 + .. 49 + .. 50 + 51 + 52 + class: index 53 + size: object size zspage stores 54 + almost_empty: the number of ZS_ALMOST_EMPTY zspages (see below) 55 + almost_full: the number of ZS_ALMOST_FULL zspages (see below) 56 + obj_allocated: the number of objects allocated 57 + obj_used: the number of objects allocated to the user 58 + pages_used: the number of pages allocated for the class 59 + pages_per_zspage: the number of 0-order pages to make a zspage 60 + 61 + We assign a zspage to ZS_ALMOST_EMPTY fullness group when: 62 + n <= N / f, where 63 + n = number of allocated objects 64 + N = total number of objects zspage can store 65 + f = fullness_threshold_frac (i.e., 4 at the moment) 66 + 67 + Similarly, we assign zspage to: 68 + ZS_ALMOST_FULL when n > N / f 69 + ZS_EMPTY when n == 0 70 + ZS_FULL when n == N
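The fullness-group rules at the end of the new zsmalloc.txt translate directly into a classification function. A userspace sketch; the enum names mirror the kernel's and f = 4 as stated above:

```c
enum fullness_group { ZS_EMPTY, ZS_ALMOST_EMPTY, ZS_ALMOST_FULL, ZS_FULL };

#define FULLNESS_THRESHOLD_FRAC 4 /* f in the formula above */

/* n = number of allocated objects in the zspage,
 * N = total number of objects the zspage can store. */
static enum fullness_group classify_zspage(unsigned int n, unsigned int N)
{
	if (n == 0)
		return ZS_EMPTY;
	if (n == N)
		return ZS_FULL;
	if (n <= N / FULLNESS_THRESHOLD_FRAC)
		return ZS_ALMOST_EMPTY;
	return ZS_ALMOST_FULL;
}
```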
+37 -38
MAINTAINERS
··· 625 625 F: include/linux/amd-iommu.h 626 626 627 627 AMD KFD 628 - M: Oded Gabbay <oded.gabbay@amd.com> 629 - L: dri-devel@lists.freedesktop.org 630 - T: git git://people.freedesktop.org/~gabbayo/linux.git 631 - S: Supported 632 - F: drivers/gpu/drm/amd/amdkfd/ 628 + M: Oded Gabbay <oded.gabbay@amd.com> 629 + L: dri-devel@lists.freedesktop.org 630 + T: git git://people.freedesktop.org/~gabbayo/linux.git 631 + S: Supported 632 + F: drivers/gpu/drm/amd/amdkfd/ 633 633 F: drivers/gpu/drm/amd/include/cik_structs.h 634 634 F: drivers/gpu/drm/amd/include/kgd_kfd_interface.h 635 - F: drivers/gpu/drm/radeon/radeon_kfd.c 636 - F: drivers/gpu/drm/radeon/radeon_kfd.h 637 - F: include/uapi/linux/kfd_ioctl.h 635 + F: drivers/gpu/drm/radeon/radeon_kfd.c 636 + F: drivers/gpu/drm/radeon/radeon_kfd.h 637 + F: include/uapi/linux/kfd_ioctl.h 638 638 639 639 AMD MICROCODE UPDATE SUPPORT 640 640 M: Borislav Petkov <bp@alien8.de> ··· 1915 1915 F: drivers/media/radio/radio-aztech* 1916 1916 1917 1917 B43 WIRELESS DRIVER 1918 - M: Stefano Brivio <stefano.brivio@polimi.it> 1919 1918 L: linux-wireless@vger.kernel.org 1920 1919 L: b43-dev@lists.infradead.org 1921 1920 W: http://wireless.kernel.org/en/users/Drivers/b43 1922 - S: Maintained 1921 + S: Odd Fixes 1923 1922 F: drivers/net/wireless/b43/ 1924 1923 1925 1924 B43LEGACY WIRELESS DRIVER 1926 1925 M: Larry Finger <Larry.Finger@lwfinger.net> 1927 - M: Stefano Brivio <stefano.brivio@polimi.it> 1928 1926 L: linux-wireless@vger.kernel.org 1929 1927 L: b43-dev@lists.infradead.org 1930 1928 W: http://wireless.kernel.org/en/users/Drivers/b43 ··· 1965 1967 F: fs/befs/ 1966 1968 1967 1969 BECKHOFF CX5020 ETHERCAT MASTER DRIVER 1968 - M: Dariusz Marcinkiewicz <reksio@newterm.pl> 1969 - L: netdev@vger.kernel.org 1970 - S: Maintained 1971 - F: drivers/net/ethernet/ec_bhf.c 1970 + M: Dariusz Marcinkiewicz <reksio@newterm.pl> 1971 + L: netdev@vger.kernel.org 1972 + S: Maintained 1973 + F: drivers/net/ethernet/ec_bhf.c 1972 1974 1973 1975 BFS FILE 
SYSTEM 1974 1976 M: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk> ··· 2894 2896 F: drivers/net/ethernet/chelsio/cxgb3/ 2895 2897 2896 2898 CXGB3 ISCSI DRIVER (CXGB3I) 2897 - M: Karen Xie <kxie@chelsio.com> 2898 - L: linux-scsi@vger.kernel.org 2899 - W: http://www.chelsio.com 2900 - S: Supported 2901 - F: drivers/scsi/cxgbi/cxgb3i 2899 + M: Karen Xie <kxie@chelsio.com> 2900 + L: linux-scsi@vger.kernel.org 2901 + W: http://www.chelsio.com 2902 + S: Supported 2903 + F: drivers/scsi/cxgbi/cxgb3i 2902 2904 2903 2905 CXGB3 IWARP RNIC DRIVER (IW_CXGB3) 2904 2906 M: Steve Wise <swise@chelsio.com> ··· 2915 2917 F: drivers/net/ethernet/chelsio/cxgb4/ 2916 2918 2917 2919 CXGB4 ISCSI DRIVER (CXGB4I) 2918 - M: Karen Xie <kxie@chelsio.com> 2919 - L: linux-scsi@vger.kernel.org 2920 - W: http://www.chelsio.com 2921 - S: Supported 2922 - F: drivers/scsi/cxgbi/cxgb4i 2920 + M: Karen Xie <kxie@chelsio.com> 2921 + L: linux-scsi@vger.kernel.org 2922 + W: http://www.chelsio.com 2923 + S: Supported 2924 + F: drivers/scsi/cxgbi/cxgb4i 2923 2925 2924 2926 CXGB4 IWARP RNIC DRIVER (IW_CXGB4) 2925 2927 M: Steve Wise <swise@chelsio.com> ··· 5221 5223 INTEL WIRELESS WIMAX CONNECTION 2400 5222 5224 M: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> 5223 5225 M: linux-wimax@intel.com 5224 - L: wimax@linuxwimax.org (subscribers-only) 5226 + L: wimax@linuxwimax.org (subscribers-only) 5225 5227 S: Supported 5226 5228 W: http://linuxwimax.org 5227 5229 F: Documentation/wimax/README.i2400m ··· 5924 5926 F: arch/powerpc/platforms/52xx/ 5925 5927 5926 5928 LINUX FOR POWERPC EMBEDDED PPC4XX 5927 - M: Alistair Popple <alistair@popple.id.au> 5929 + M: Alistair Popple <alistair@popple.id.au> 5928 5930 M: Matt Porter <mporter@kernel.crashing.org> 5929 5931 W: http://www.penguinppc.org/ 5930 5932 L: linuxppc-dev@lists.ozlabs.org ··· 6397 6399 F: drivers/watchdog/mena21_wdt.c 6398 6400 6399 6401 MEN CHAMELEON BUS (mcb) 6400 - M: Johannes Thumshirn <johannes.thumshirn@men.de> 6402 + M: Johannes 
Thumshirn <johannes.thumshirn@men.de> 6401 6403 S: Supported 6402 6404 F: drivers/mcb/ 6403 6405 F: include/linux/mcb.h ··· 7953 7955 S: Maintained 7954 7956 7955 7957 QAT DRIVER 7956 - M: Tadeusz Struk <tadeusz.struk@intel.com> 7957 - L: qat-linux@intel.com 7958 - S: Supported 7959 - F: drivers/crypto/qat/ 7958 + M: Tadeusz Struk <tadeusz.struk@intel.com> 7959 + L: qat-linux@intel.com 7960 + S: Supported 7961 + F: drivers/crypto/qat/ 7960 7962 7961 7963 QIB DRIVER 7962 7964 M: Mike Marciniszyn <infinipath@intel.com> ··· 10127 10129 F: include/uapi/linux/cdrom.h 10128 10130 10129 10131 UNISYS S-PAR DRIVERS 10130 - M: Benjamin Romer <benjamin.romer@unisys.com> 10131 - M: David Kershner <david.kershner@unisys.com> 10132 - L: sparmaintainer@unisys.com (Unisys internal) 10133 - S: Supported 10134 - F: drivers/staging/unisys/ 10132 + M: Benjamin Romer <benjamin.romer@unisys.com> 10133 + M: David Kershner <david.kershner@unisys.com> 10134 + L: sparmaintainer@unisys.com (Unisys internal) 10135 + S: Supported 10136 + F: drivers/staging/unisys/ 10135 10137 10136 10138 UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER 10137 10139 M: Vinayak Holikatti <vinholikatti@gmail.com> ··· 10688 10690 WIMAX STACK 10689 10691 M: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> 10690 10692 M: linux-wimax@intel.com 10691 - L: wimax@linuxwimax.org (subscribers-only) 10693 + L: wimax@linuxwimax.org (subscribers-only) 10692 10694 S: Supported 10693 10695 W: http://linuxwimax.org 10694 10696 F: Documentation/wimax/README.wimax ··· 10979 10981 S: Maintained 10980 10982 F: mm/zsmalloc.c 10981 10983 F: include/linux/zsmalloc.h 10984 + F: Documentation/vm/zsmalloc.txt 10982 10985 10983 10986 ZSWAP COMPRESSED SWAP CACHING 10984 10987 M: Seth Jennings <sjennings@variantweb.net>
+52 -57
arch/arm/plat-pxa/dma.c
··· 51 51 52 52 static int dbg_show_requester_chan(struct seq_file *s, void *p) 53 53 { 54 - int pos = 0; 55 54 int chan = (int)s->private; 56 55 int i; 57 56 u32 drcmr; 58 57 59 - pos += seq_printf(s, "DMA channel %d requesters list :\n", chan); 58 + seq_printf(s, "DMA channel %d requesters list :\n", chan); 60 59 for (i = 0; i < DMA_MAX_REQUESTERS; i++) { 61 60 drcmr = DRCMR(i); 62 61 if ((drcmr & DRCMR_CHLNUM) == chan) 63 - pos += seq_printf(s, "\tRequester %d (MAPVLD=%d)\n", i, 64 - !!(drcmr & DRCMR_MAPVLD)); 62 + seq_printf(s, "\tRequester %d (MAPVLD=%d)\n", 63 + i, !!(drcmr & DRCMR_MAPVLD)); 65 64 } 66 - return pos; 65 + 66 + return 0; 67 67 } 68 68 69 69 static inline int dbg_burst_from_dcmd(u32 dcmd) ··· 83 83 84 84 static int dbg_show_descriptors(struct seq_file *s, void *p) 85 85 { 86 - int pos = 0; 87 86 int chan = (int)s->private; 88 87 int i, max_show = 20, burst, width; 89 88 u32 dcmd; ··· 93 94 spin_lock_irqsave(&dma_channels[chan].lock, flags); 94 95 phys_desc = DDADR(chan); 95 96 96 - pos += seq_printf(s, "DMA channel %d descriptors :\n", chan); 97 - pos += seq_printf(s, "[%03d] First descriptor unknown\n", 0); 97 + seq_printf(s, "DMA channel %d descriptors :\n", chan); 98 + seq_printf(s, "[%03d] First descriptor unknown\n", 0); 98 99 for (i = 1; i < max_show && is_phys_valid(phys_desc); i++) { 99 100 desc = phys_to_virt(phys_desc); 100 101 dcmd = desc->dcmd; 101 102 burst = dbg_burst_from_dcmd(dcmd); 102 103 width = (1 << ((dcmd >> 14) & 0x3)) >> 1; 103 104 104 - pos += seq_printf(s, "[%03d] Desc at %08lx(virt %p)\n", 105 - i, phys_desc, desc); 106 - pos += seq_printf(s, "\tDDADR = %08x\n", desc->ddadr); 107 - pos += seq_printf(s, "\tDSADR = %08x\n", desc->dsadr); 108 - pos += seq_printf(s, "\tDTADR = %08x\n", desc->dtadr); 109 - pos += seq_printf(s, "\tDCMD = %08x (%s%s%s%s%s%s%sburst=%d" 110 - " width=%d len=%d)\n", 111 - dcmd, 112 - DCMD_STR(INCSRCADDR), DCMD_STR(INCTRGADDR), 113 - DCMD_STR(FLOWSRC), DCMD_STR(FLOWTRG), 114 - 
DCMD_STR(STARTIRQEN), DCMD_STR(ENDIRQEN), 115 - DCMD_STR(ENDIAN), burst, width, 116 - dcmd & DCMD_LENGTH); 105 + seq_printf(s, "[%03d] Desc at %08lx(virt %p)\n", 106 + i, phys_desc, desc); 107 + seq_printf(s, "\tDDADR = %08x\n", desc->ddadr); 108 + seq_printf(s, "\tDSADR = %08x\n", desc->dsadr); 109 + seq_printf(s, "\tDTADR = %08x\n", desc->dtadr); 110 + seq_printf(s, "\tDCMD = %08x (%s%s%s%s%s%s%sburst=%d width=%d len=%d)\n", 111 + dcmd, 112 + DCMD_STR(INCSRCADDR), DCMD_STR(INCTRGADDR), 113 + DCMD_STR(FLOWSRC), DCMD_STR(FLOWTRG), 114 + DCMD_STR(STARTIRQEN), DCMD_STR(ENDIRQEN), 115 + DCMD_STR(ENDIAN), burst, width, 116 + dcmd & DCMD_LENGTH); 117 117 phys_desc = desc->ddadr; 118 118 } 119 119 if (i == max_show) 120 - pos += seq_printf(s, "[%03d] Desc at %08lx ... max display reached\n", 121 - i, phys_desc); 120 + seq_printf(s, "[%03d] Desc at %08lx ... max display reached\n", 121 + i, phys_desc); 122 122 else 123 - pos += seq_printf(s, "[%03d] Desc at %08lx is %s\n", 124 - i, phys_desc, phys_desc == DDADR_STOP ? 125 - "DDADR_STOP" : "invalid"); 123 + seq_printf(s, "[%03d] Desc at %08lx is %s\n", 124 + i, phys_desc, phys_desc == DDADR_STOP ? 125 + "DDADR_STOP" : "invalid"); 126 126 127 127 spin_unlock_irqrestore(&dma_channels[chan].lock, flags); 128 - return pos; 128 + 129 + return 0; 129 130 } 130 131 131 132 static int dbg_show_chan_state(struct seq_file *s, void *p) 132 133 { 133 - int pos = 0; 134 134 int chan = (int)s->private; 135 135 u32 dcsr, dcmd; 136 136 int burst, width; ··· 140 142 burst = dbg_burst_from_dcmd(dcmd); 141 143 width = (1 << ((dcmd >> 14) & 0x3)) >> 1; 142 144 143 - pos += seq_printf(s, "DMA channel %d\n", chan); 144 - pos += seq_printf(s, "\tPriority : %s\n", 145 - str_prio[dma_channels[chan].prio]); 146 - pos += seq_printf(s, "\tUnaligned transfer bit: %s\n", 147 - DALGN & (1 << chan) ? 
"yes" : "no"); 148 - pos += seq_printf(s, "\tDCSR = %08x (%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s)\n", 149 - dcsr, DCSR_STR(RUN), DCSR_STR(NODESC), 150 - DCSR_STR(STOPIRQEN), DCSR_STR(EORIRQEN), 151 - DCSR_STR(EORJMPEN), DCSR_STR(EORSTOPEN), 152 - DCSR_STR(SETCMPST), DCSR_STR(CLRCMPST), 153 - DCSR_STR(CMPST), DCSR_STR(EORINTR), DCSR_STR(REQPEND), 154 - DCSR_STR(STOPSTATE), DCSR_STR(ENDINTR), 155 - DCSR_STR(STARTINTR), DCSR_STR(BUSERR)); 145 + seq_printf(s, "DMA channel %d\n", chan); 146 + seq_printf(s, "\tPriority : %s\n", str_prio[dma_channels[chan].prio]); 147 + seq_printf(s, "\tUnaligned transfer bit: %s\n", 148 + DALGN & (1 << chan) ? "yes" : "no"); 149 + seq_printf(s, "\tDCSR = %08x (%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s)\n", 150 + dcsr, DCSR_STR(RUN), DCSR_STR(NODESC), 151 + DCSR_STR(STOPIRQEN), DCSR_STR(EORIRQEN), 152 + DCSR_STR(EORJMPEN), DCSR_STR(EORSTOPEN), 153 + DCSR_STR(SETCMPST), DCSR_STR(CLRCMPST), 154 + DCSR_STR(CMPST), DCSR_STR(EORINTR), DCSR_STR(REQPEND), 155 + DCSR_STR(STOPSTATE), DCSR_STR(ENDINTR), 156 + DCSR_STR(STARTINTR), DCSR_STR(BUSERR)); 156 157 157 - pos += seq_printf(s, "\tDCMD = %08x (%s%s%s%s%s%s%sburst=%d width=%d" 158 - " len=%d)\n", 159 - dcmd, 160 - DCMD_STR(INCSRCADDR), DCMD_STR(INCTRGADDR), 161 - DCMD_STR(FLOWSRC), DCMD_STR(FLOWTRG), 162 - DCMD_STR(STARTIRQEN), DCMD_STR(ENDIRQEN), 163 - DCMD_STR(ENDIAN), burst, width, dcmd & DCMD_LENGTH); 164 - pos += seq_printf(s, "\tDSADR = %08x\n", DSADR(chan)); 165 - pos += seq_printf(s, "\tDTADR = %08x\n", DTADR(chan)); 166 - pos += seq_printf(s, "\tDDADR = %08x\n", DDADR(chan)); 167 - return pos; 158 + seq_printf(s, "\tDCMD = %08x (%s%s%s%s%s%s%sburst=%d width=%d len=%d)\n", 159 + dcmd, 160 + DCMD_STR(INCSRCADDR), DCMD_STR(INCTRGADDR), 161 + DCMD_STR(FLOWSRC), DCMD_STR(FLOWTRG), 162 + DCMD_STR(STARTIRQEN), DCMD_STR(ENDIRQEN), 163 + DCMD_STR(ENDIAN), burst, width, dcmd & DCMD_LENGTH); 164 + seq_printf(s, "\tDSADR = %08x\n", DSADR(chan)); 165 + seq_printf(s, "\tDTADR = %08x\n", DTADR(chan)); 166 + 
seq_printf(s, "\tDDADR = %08x\n", DDADR(chan)); 167 + 168 + return 0; 168 169 } 169 170 170 171 static int dbg_show_state(struct seq_file *s, void *p) 171 172 { 172 - int pos = 0; 173 - 174 173 /* basic device status */ 175 - pos += seq_printf(s, "DMA engine status\n"); 176 - pos += seq_printf(s, "\tChannel number: %d\n", num_dma_channels); 174 + seq_puts(s, "DMA engine status\n"); 175 + seq_printf(s, "\tChannel number: %d\n", num_dma_channels); 177 176 178 - return pos; 177 + return 0; 179 178 } 180 179 181 180 #define DBGFS_FUNC_DECL(name) \
+39 -46
arch/cris/arch-v10/kernel/fasttimer.c
··· 527 527 i = debug_log_cnt; 528 528 529 529 while (i != end_i || debug_log_cnt_wrapped) { 530 - if (seq_printf(m, debug_log_string[i], debug_log_value[i]) < 0) 530 + seq_printf(m, debug_log_string[i], debug_log_value[i]); 531 + if (seq_has_overflowed(m)) 531 532 return 0; 532 533 i = (i+1) % DEBUG_LOG_MAX; 533 534 } ··· 543 542 int cur = (fast_timers_started - i - 1) % NUM_TIMER_STATS; 544 543 545 544 #if 1 //ndef FAST_TIMER_LOG 546 - seq_printf(m, "div: %i freq: %i delay: %i" 547 - "\n", 545 + seq_printf(m, "div: %i freq: %i delay: %i\n", 548 546 timer_div_settings[cur], 549 547 timer_freq_settings[cur], 550 548 timer_delay_settings[cur]); 551 549 #endif 552 550 #ifdef FAST_TIMER_LOG 553 551 t = &timer_started_log[cur]; 554 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 555 - "d: %6li us data: 0x%08lX" 556 - "\n", 557 - t->name, 558 - (unsigned long)t->tv_set.tv_jiff, 559 - (unsigned long)t->tv_set.tv_usec, 560 - (unsigned long)t->tv_expires.tv_jiff, 561 - (unsigned long)t->tv_expires.tv_usec, 562 - t->delay_us, 563 - t->data) < 0) 552 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 553 + t->name, 554 + (unsigned long)t->tv_set.tv_jiff, 555 + (unsigned long)t->tv_set.tv_usec, 556 + (unsigned long)t->tv_expires.tv_jiff, 557 + (unsigned long)t->tv_expires.tv_usec, 558 + t->delay_us, 559 + t->data); 560 + if (seq_has_overflowed(m)) 564 561 return 0; 565 562 #endif 566 563 } ··· 570 571 seq_printf(m, "Timers added: %i\n", fast_timers_added); 571 572 for (i = 0; i < num_to_show; i++) { 572 573 t = &timer_added_log[(fast_timers_added - i - 1) % NUM_TIMER_STATS]; 573 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 574 - "d: %6li us data: 0x%08lX" 575 - "\n", 576 - t->name, 577 - (unsigned long)t->tv_set.tv_jiff, 578 - (unsigned long)t->tv_set.tv_usec, 579 - (unsigned long)t->tv_expires.tv_jiff, 580 - (unsigned long)t->tv_expires.tv_usec, 581 - t->delay_us, 582 - t->data) < 0) 574 + seq_printf(m, "%-14s s: %6lu.%06lu 
e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 575 + t->name, 576 + (unsigned long)t->tv_set.tv_jiff, 577 + (unsigned long)t->tv_set.tv_usec, 578 + (unsigned long)t->tv_expires.tv_jiff, 579 + (unsigned long)t->tv_expires.tv_usec, 580 + t->delay_us, 581 + t->data); 582 + if (seq_has_overflowed(m)) 583 583 return 0; 584 584 } 585 585 seq_putc(m, '\n'); ··· 588 590 seq_printf(m, "Timers expired: %i\n", fast_timers_expired); 589 591 for (i = 0; i < num_to_show; i++) { 590 592 t = &timer_expired_log[(fast_timers_expired - i - 1) % NUM_TIMER_STATS]; 591 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 592 - "d: %6li us data: 0x%08lX" 593 - "\n", 594 - t->name, 595 - (unsigned long)t->tv_set.tv_jiff, 596 - (unsigned long)t->tv_set.tv_usec, 597 - (unsigned long)t->tv_expires.tv_jiff, 598 - (unsigned long)t->tv_expires.tv_usec, 599 - t->delay_us, 600 - t->data) < 0) 593 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 594 + t->name, 595 + (unsigned long)t->tv_set.tv_jiff, 596 + (unsigned long)t->tv_set.tv_usec, 597 + (unsigned long)t->tv_expires.tv_jiff, 598 + (unsigned long)t->tv_expires.tv_usec, 599 + t->delay_us, 600 + t->data); 601 + if (seq_has_overflowed(m)) 601 602 return 0; 602 603 } 603 604 seq_putc(m, '\n'); ··· 608 611 while (t) { 609 612 nextt = t->next; 610 613 local_irq_restore(flags); 611 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 612 - "d: %6li us data: 0x%08lX" 613 - /* " func: 0x%08lX" */ 614 - "\n", 615 - t->name, 616 - (unsigned long)t->tv_set.tv_jiff, 617 - (unsigned long)t->tv_set.tv_usec, 618 - (unsigned long)t->tv_expires.tv_jiff, 619 - (unsigned long)t->tv_expires.tv_usec, 620 - t->delay_us, 621 - t->data 622 - /* , t->function */ 623 - ) < 0) 614 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 615 + t->name, 616 + (unsigned long)t->tv_set.tv_jiff, 617 + (unsigned long)t->tv_set.tv_usec, 618 + (unsigned long)t->tv_expires.tv_jiff, 619 + (unsigned 
long)t->tv_expires.tv_usec, 620 + t->delay_us, 621 + t->data); 622 + if (seq_has_overflowed(m)) 624 623 return 0; 625 624 local_irq_save(flags); 626 625 if (t->next != nextt)
+30 -28
arch/cris/arch-v10/kernel/setup.c
··· 63 63 else 64 64 info = &cpu_info[revision]; 65 65 66 - return seq_printf(m, 67 - "processor\t: 0\n" 68 - "cpu\t\t: CRIS\n" 69 - "cpu revision\t: %lu\n" 70 - "cpu model\t: %s\n" 71 - "cache size\t: %d kB\n" 72 - "fpu\t\t: %s\n" 73 - "mmu\t\t: %s\n" 74 - "mmu DMA bug\t: %s\n" 75 - "ethernet\t: %s Mbps\n" 76 - "token ring\t: %s\n" 77 - "scsi\t\t: %s\n" 78 - "ata\t\t: %s\n" 79 - "usb\t\t: %s\n" 80 - "bogomips\t: %lu.%02lu\n", 66 + seq_printf(m, 67 + "processor\t: 0\n" 68 + "cpu\t\t: CRIS\n" 69 + "cpu revision\t: %lu\n" 70 + "cpu model\t: %s\n" 71 + "cache size\t: %d kB\n" 72 + "fpu\t\t: %s\n" 73 + "mmu\t\t: %s\n" 74 + "mmu DMA bug\t: %s\n" 75 + "ethernet\t: %s Mbps\n" 76 + "token ring\t: %s\n" 77 + "scsi\t\t: %s\n" 78 + "ata\t\t: %s\n" 79 + "usb\t\t: %s\n" 80 + "bogomips\t: %lu.%02lu\n", 81 81 82 - revision, 83 - info->model, 84 - info->cache, 85 - info->flags & HAS_FPU ? "yes" : "no", 86 - info->flags & HAS_MMU ? "yes" : "no", 87 - info->flags & HAS_MMU_BUG ? "yes" : "no", 88 - info->flags & HAS_ETHERNET100 ? "10/100" : "10", 89 - info->flags & HAS_TOKENRING ? "4/16 Mbps" : "no", 90 - info->flags & HAS_SCSI ? "yes" : "no", 91 - info->flags & HAS_ATA ? "yes" : "no", 92 - info->flags & HAS_USB ? "yes" : "no", 93 - (loops_per_jiffy * HZ + 500) / 500000, 94 - ((loops_per_jiffy * HZ + 500) / 5000) % 100); 82 + revision, 83 + info->model, 84 + info->cache, 85 + info->flags & HAS_FPU ? "yes" : "no", 86 + info->flags & HAS_MMU ? "yes" : "no", 87 + info->flags & HAS_MMU_BUG ? "yes" : "no", 88 + info->flags & HAS_ETHERNET100 ? "10/100" : "10", 89 + info->flags & HAS_TOKENRING ? "4/16 Mbps" : "no", 90 + info->flags & HAS_SCSI ? "yes" : "no", 91 + info->flags & HAS_ATA ? "yes" : "no", 92 + info->flags & HAS_USB ? "yes" : "no", 93 + (loops_per_jiffy * HZ + 500) / 500000, 94 + ((loops_per_jiffy * HZ + 500) / 5000) % 100); 95 + 96 + return 0; 95 97 } 96 98 97 99 #endif /* CONFIG_PROC_FS */
+39 -46
arch/cris/arch-v32/kernel/fasttimer.c
··· 501 501 i = debug_log_cnt; 502 502 503 503 while ((i != end_i || debug_log_cnt_wrapped)) { 504 - if (seq_printf(m, debug_log_string[i], debug_log_value[i]) < 0) 504 + seq_printf(m, debug_log_string[i], debug_log_value[i]); 505 + if (seq_has_overflowed(m)) 505 506 return 0; 506 507 i = (i+1) % DEBUG_LOG_MAX; 507 508 } ··· 517 516 int cur = (fast_timers_started - i - 1) % NUM_TIMER_STATS; 518 517 519 518 #if 1 //ndef FAST_TIMER_LOG 520 - seq_printf(m, "div: %i delay: %i" 521 - "\n", 519 + seq_printf(m, "div: %i delay: %i\n", 522 520 timer_div_settings[cur], 523 521 timer_delay_settings[cur]); 524 522 #endif 525 523 #ifdef FAST_TIMER_LOG 526 524 t = &timer_started_log[cur]; 527 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 528 - "d: %6li us data: 0x%08lX" 529 - "\n", 530 - t->name, 531 - (unsigned long)t->tv_set.tv_jiff, 532 - (unsigned long)t->tv_set.tv_usec, 533 - (unsigned long)t->tv_expires.tv_jiff, 534 - (unsigned long)t->tv_expires.tv_usec, 535 - t->delay_us, 536 - t->data) < 0) 525 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 526 + t->name, 527 + (unsigned long)t->tv_set.tv_jiff, 528 + (unsigned long)t->tv_set.tv_usec, 529 + (unsigned long)t->tv_expires.tv_jiff, 530 + (unsigned long)t->tv_expires.tv_usec, 531 + t->delay_us, 532 + t->data); 533 + if (seq_has_overflowed(m)) 537 534 return 0; 538 535 #endif 539 536 } ··· 543 544 seq_printf(m, "Timers added: %i\n", fast_timers_added); 544 545 for (i = 0; i < num_to_show; i++) { 545 546 t = &timer_added_log[(fast_timers_added - i - 1) % NUM_TIMER_STATS]; 546 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 547 - "d: %6li us data: 0x%08lX" 548 - "\n", 549 - t->name, 550 - (unsigned long)t->tv_set.tv_jiff, 551 - (unsigned long)t->tv_set.tv_usec, 552 - (unsigned long)t->tv_expires.tv_jiff, 553 - (unsigned long)t->tv_expires.tv_usec, 554 - t->delay_us, 555 - t->data) < 0) 547 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 548 + 
t->name, 549 + (unsigned long)t->tv_set.tv_jiff, 550 + (unsigned long)t->tv_set.tv_usec, 551 + (unsigned long)t->tv_expires.tv_jiff, 552 + (unsigned long)t->tv_expires.tv_usec, 553 + t->delay_us, 554 + t->data); 555 + if (seq_has_overflowed(m)) 556 556 return 0; 557 557 } 558 558 seq_putc(m, '\n'); ··· 561 563 seq_printf(m, "Timers expired: %i\n", fast_timers_expired); 562 564 for (i = 0; i < num_to_show; i++){ 563 565 t = &timer_expired_log[(fast_timers_expired - i - 1) % NUM_TIMER_STATS]; 564 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 565 - "d: %6li us data: 0x%08lX" 566 - "\n", 567 - t->name, 568 - (unsigned long)t->tv_set.tv_jiff, 569 - (unsigned long)t->tv_set.tv_usec, 570 - (unsigned long)t->tv_expires.tv_jiff, 571 - (unsigned long)t->tv_expires.tv_usec, 572 - t->delay_us, 573 - t->data) < 0) 566 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 567 + t->name, 568 + (unsigned long)t->tv_set.tv_jiff, 569 + (unsigned long)t->tv_set.tv_usec, 570 + (unsigned long)t->tv_expires.tv_jiff, 571 + (unsigned long)t->tv_expires.tv_usec, 572 + t->delay_us, 573 + t->data); 574 + if (seq_has_overflowed(m)) 574 575 return 0; 575 576 } 576 577 seq_putc(m, '\n'); ··· 581 584 while (t != NULL){ 582 585 nextt = t->next; 583 586 local_irq_restore(flags); 584 - if (seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu " 585 - "d: %6li us data: 0x%08lX" 586 - /* " func: 0x%08lX" */ 587 - "\n", 588 - t->name, 589 - (unsigned long)t->tv_set.tv_jiff, 590 - (unsigned long)t->tv_set.tv_usec, 591 - (unsigned long)t->tv_expires.tv_jiff, 592 - (unsigned long)t->tv_expires.tv_usec, 593 - t->delay_us, 594 - t->data 595 - /* , t->function */ 596 - ) < 0) 587 + seq_printf(m, "%-14s s: %6lu.%06lu e: %6lu.%06lu d: %6li us data: 0x%08lX\n", 588 + t->name, 589 + (unsigned long)t->tv_set.tv_jiff, 590 + (unsigned long)t->tv_set.tv_usec, 591 + (unsigned long)t->tv_expires.tv_jiff, 592 + (unsigned long)t->tv_expires.tv_usec, 593 + t->delay_us, 594 + t->data); 595 
+ if (seq_has_overflowed(m)) 597 596 return 0; 598 597 local_irq_save(flags); 599 598 if (t->next != nextt)
+31 -29
arch/cris/arch-v32/kernel/setup.c
··· 77 77 } 78 78 } 79 79 80 - return seq_printf(m, 81 - "processor\t: %d\n" 82 - "cpu\t\t: CRIS\n" 83 - "cpu revision\t: %lu\n" 84 - "cpu model\t: %s\n" 85 - "cache size\t: %d KB\n" 86 - "fpu\t\t: %s\n" 87 - "mmu\t\t: %s\n" 88 - "mmu DMA bug\t: %s\n" 89 - "ethernet\t: %s Mbps\n" 90 - "token ring\t: %s\n" 91 - "scsi\t\t: %s\n" 92 - "ata\t\t: %s\n" 93 - "usb\t\t: %s\n" 94 - "bogomips\t: %lu.%02lu\n\n", 80 + seq_printf(m, 81 + "processor\t: %d\n" 82 + "cpu\t\t: CRIS\n" 83 + "cpu revision\t: %lu\n" 84 + "cpu model\t: %s\n" 85 + "cache size\t: %d KB\n" 86 + "fpu\t\t: %s\n" 87 + "mmu\t\t: %s\n" 88 + "mmu DMA bug\t: %s\n" 89 + "ethernet\t: %s Mbps\n" 90 + "token ring\t: %s\n" 91 + "scsi\t\t: %s\n" 92 + "ata\t\t: %s\n" 93 + "usb\t\t: %s\n" 94 + "bogomips\t: %lu.%02lu\n\n", 95 95 96 - cpu, 97 - revision, 98 - info->cpu_model, 99 - info->cache_size, 100 - info->flags & HAS_FPU ? "yes" : "no", 101 - info->flags & HAS_MMU ? "yes" : "no", 102 - info->flags & HAS_MMU_BUG ? "yes" : "no", 103 - info->flags & HAS_ETHERNET100 ? "10/100" : "10", 104 - info->flags & HAS_TOKENRING ? "4/16 Mbps" : "no", 105 - info->flags & HAS_SCSI ? "yes" : "no", 106 - info->flags & HAS_ATA ? "yes" : "no", 107 - info->flags & HAS_USB ? "yes" : "no", 108 - (loops_per_jiffy * HZ + 500) / 500000, 109 - ((loops_per_jiffy * HZ + 500) / 5000) % 100); 96 + cpu, 97 + revision, 98 + info->cpu_model, 99 + info->cache_size, 100 + info->flags & HAS_FPU ? "yes" : "no", 101 + info->flags & HAS_MMU ? "yes" : "no", 102 + info->flags & HAS_MMU_BUG ? "yes" : "no", 103 + info->flags & HAS_ETHERNET100 ? "10/100" : "10", 104 + info->flags & HAS_TOKENRING ? "4/16 Mbps" : "no", 105 + info->flags & HAS_SCSI ? "yes" : "no", 106 + info->flags & HAS_ATA ? "yes" : "no", 107 + info->flags & HAS_USB ? "yes" : "no", 108 + (loops_per_jiffy * HZ + 500) / 500000, 109 + ((loops_per_jiffy * HZ + 500) / 5000) % 100); 110 + 111 + return 0; 110 112 } 111 113 112 114 #endif /* CONFIG_PROC_FS */
+68 -71
arch/microblaze/kernel/cpu/mb.c
··· 27 27 28 28 static int show_cpuinfo(struct seq_file *m, void *v) 29 29 { 30 - int count = 0; 31 30 char *fpga_family = "Unknown"; 32 31 char *cpu_ver = "Unknown"; 33 32 int i; ··· 47 48 } 48 49 } 49 50 50 - count = seq_printf(m, 51 - "CPU-Family: MicroBlaze\n" 52 - "FPGA-Arch: %s\n" 53 - "CPU-Ver: %s, %s endian\n" 54 - "CPU-MHz: %d.%02d\n" 55 - "BogoMips: %lu.%02lu\n", 56 - fpga_family, 57 - cpu_ver, 58 - cpuinfo.endian ? "little" : "big", 59 - cpuinfo.cpu_clock_freq / 60 - 1000000, 61 - cpuinfo.cpu_clock_freq % 62 - 1000000, 63 - loops_per_jiffy / (500000 / HZ), 64 - (loops_per_jiffy / (5000 / HZ)) % 100); 51 + seq_printf(m, 52 + "CPU-Family: MicroBlaze\n" 53 + "FPGA-Arch: %s\n" 54 + "CPU-Ver: %s, %s endian\n" 55 + "CPU-MHz: %d.%02d\n" 56 + "BogoMips: %lu.%02lu\n", 57 + fpga_family, 58 + cpu_ver, 59 + cpuinfo.endian ? "little" : "big", 60 + cpuinfo.cpu_clock_freq / 1000000, 61 + cpuinfo.cpu_clock_freq % 1000000, 62 + loops_per_jiffy / (500000 / HZ), 63 + (loops_per_jiffy / (5000 / HZ)) % 100); 65 64 66 - count += seq_printf(m, 67 - "HW:\n Shift:\t\t%s\n" 68 - " MSR:\t\t%s\n" 69 - " PCMP:\t\t%s\n" 70 - " DIV:\t\t%s\n", 71 - (cpuinfo.use_instr & PVR0_USE_BARREL_MASK) ? "yes" : "no", 72 - (cpuinfo.use_instr & PVR2_USE_MSR_INSTR) ? "yes" : "no", 73 - (cpuinfo.use_instr & PVR2_USE_PCMP_INSTR) ? "yes" : "no", 74 - (cpuinfo.use_instr & PVR0_USE_DIV_MASK) ? "yes" : "no"); 65 + seq_printf(m, 66 + "HW:\n Shift:\t\t%s\n" 67 + " MSR:\t\t%s\n" 68 + " PCMP:\t\t%s\n" 69 + " DIV:\t\t%s\n", 70 + (cpuinfo.use_instr & PVR0_USE_BARREL_MASK) ? "yes" : "no", 71 + (cpuinfo.use_instr & PVR2_USE_MSR_INSTR) ? "yes" : "no", 72 + (cpuinfo.use_instr & PVR2_USE_PCMP_INSTR) ? "yes" : "no", 73 + (cpuinfo.use_instr & PVR0_USE_DIV_MASK) ? 
"yes" : "no"); 75 74 76 - count += seq_printf(m, 77 - " MMU:\t\t%x\n", 78 - cpuinfo.mmu); 75 + seq_printf(m, " MMU:\t\t%x\n", cpuinfo.mmu); 79 76 80 - count += seq_printf(m, 81 - " MUL:\t\t%s\n" 82 - " FPU:\t\t%s\n", 83 - (cpuinfo.use_mult & PVR2_USE_MUL64_MASK) ? "v2" : 84 - (cpuinfo.use_mult & PVR0_USE_HW_MUL_MASK) ? "v1" : "no", 85 - (cpuinfo.use_fpu & PVR2_USE_FPU2_MASK) ? "v2" : 86 - (cpuinfo.use_fpu & PVR0_USE_FPU_MASK) ? "v1" : "no"); 77 + seq_printf(m, 78 + " MUL:\t\t%s\n" 79 + " FPU:\t\t%s\n", 80 + (cpuinfo.use_mult & PVR2_USE_MUL64_MASK) ? "v2" : 81 + (cpuinfo.use_mult & PVR0_USE_HW_MUL_MASK) ? "v1" : "no", 82 + (cpuinfo.use_fpu & PVR2_USE_FPU2_MASK) ? "v2" : 83 + (cpuinfo.use_fpu & PVR0_USE_FPU_MASK) ? "v1" : "no"); 87 84 88 - count += seq_printf(m, 89 - " Exc:\t\t%s%s%s%s%s%s%s%s\n", 90 - (cpuinfo.use_exc & PVR2_OPCODE_0x0_ILL_MASK) ? "op0x0 " : "", 91 - (cpuinfo.use_exc & PVR2_UNALIGNED_EXC_MASK) ? "unal " : "", 92 - (cpuinfo.use_exc & PVR2_ILL_OPCODE_EXC_MASK) ? "ill " : "", 93 - (cpuinfo.use_exc & PVR2_IOPB_BUS_EXC_MASK) ? "iopb " : "", 94 - (cpuinfo.use_exc & PVR2_DOPB_BUS_EXC_MASK) ? "dopb " : "", 95 - (cpuinfo.use_exc & PVR2_DIV_ZERO_EXC_MASK) ? "zero " : "", 96 - (cpuinfo.use_exc & PVR2_FPU_EXC_MASK) ? "fpu " : "", 97 - (cpuinfo.use_exc & PVR2_USE_FSL_EXC) ? "fsl " : ""); 85 + seq_printf(m, 86 + " Exc:\t\t%s%s%s%s%s%s%s%s\n", 87 + (cpuinfo.use_exc & PVR2_OPCODE_0x0_ILL_MASK) ? "op0x0 " : "", 88 + (cpuinfo.use_exc & PVR2_UNALIGNED_EXC_MASK) ? "unal " : "", 89 + (cpuinfo.use_exc & PVR2_ILL_OPCODE_EXC_MASK) ? "ill " : "", 90 + (cpuinfo.use_exc & PVR2_IOPB_BUS_EXC_MASK) ? "iopb " : "", 91 + (cpuinfo.use_exc & PVR2_DOPB_BUS_EXC_MASK) ? "dopb " : "", 92 + (cpuinfo.use_exc & PVR2_DIV_ZERO_EXC_MASK) ? "zero " : "", 93 + (cpuinfo.use_exc & PVR2_FPU_EXC_MASK) ? "fpu " : "", 94 + (cpuinfo.use_exc & PVR2_USE_FSL_EXC) ? "fsl " : ""); 98 95 99 - count += seq_printf(m, 100 - "Stream-insns:\t%sprivileged\n", 101 - cpuinfo.mmu_privins ? 
"un" : ""); 96 + seq_printf(m, 97 + "Stream-insns:\t%sprivileged\n", 98 + cpuinfo.mmu_privins ? "un" : ""); 102 99 103 100 if (cpuinfo.use_icache) 104 - count += seq_printf(m, 105 - "Icache:\t\t%ukB\tline length:\t%dB\n", 106 - cpuinfo.icache_size >> 10, 107 - cpuinfo.icache_line_length); 101 + seq_printf(m, 102 + "Icache:\t\t%ukB\tline length:\t%dB\n", 103 + cpuinfo.icache_size >> 10, 104 + cpuinfo.icache_line_length); 108 105 else 109 - count += seq_printf(m, "Icache:\t\tno\n"); 106 + seq_puts(m, "Icache:\t\tno\n"); 110 107 111 108 if (cpuinfo.use_dcache) { 112 - count += seq_printf(m, 113 - "Dcache:\t\t%ukB\tline length:\t%dB\n", 114 - cpuinfo.dcache_size >> 10, 115 - cpuinfo.dcache_line_length); 116 - seq_printf(m, "Dcache-Policy:\t"); 109 + seq_printf(m, 110 + "Dcache:\t\t%ukB\tline length:\t%dB\n", 111 + cpuinfo.dcache_size >> 10, 112 + cpuinfo.dcache_line_length); 113 + seq_puts(m, "Dcache-Policy:\t"); 117 114 if (cpuinfo.dcache_wb) 118 - count += seq_printf(m, "write-back\n"); 115 + seq_puts(m, "write-back\n"); 119 116 else 120 - count += seq_printf(m, "write-through\n"); 121 - } else 122 - count += seq_printf(m, "Dcache:\t\tno\n"); 117 + seq_puts(m, "write-through\n"); 118 + } else { 119 + seq_puts(m, "Dcache:\t\tno\n"); 120 + } 123 121 124 - count += seq_printf(m, 125 - "HW-Debug:\t%s\n", 126 - cpuinfo.hw_debug ? "yes" : "no"); 122 + seq_printf(m, 123 + "HW-Debug:\t%s\n", 124 + cpuinfo.hw_debug ? "yes" : "no"); 127 125 128 - count += seq_printf(m, 129 - "PVR-USR1:\t%02x\n" 130 - "PVR-USR2:\t%08x\n", 131 - cpuinfo.pvr_user1, 132 - cpuinfo.pvr_user2); 126 + seq_printf(m, 127 + "PVR-USR1:\t%02x\n" 128 + "PVR-USR2:\t%08x\n", 129 + cpuinfo.pvr_user1, 130 + cpuinfo.pvr_user2); 133 131 134 - count += seq_printf(m, "Page size:\t%lu\n", PAGE_SIZE); 132 + seq_printf(m, "Page size:\t%lu\n", PAGE_SIZE); 133 + 135 134 return 0; 136 135 } 137 136
+34 -35
arch/nios2/kernel/cpuinfo.c
··· 126 126 */ 127 127 static int show_cpuinfo(struct seq_file *m, void *v) 128 128 { 129 - int count = 0; 130 129 const u32 clockfreq = cpuinfo.cpu_clock_freq; 131 130 132 - count = seq_printf(m, 133 - "CPU:\t\tNios II/%s\n" 134 - "MMU:\t\t%s\n" 135 - "FPU:\t\tnone\n" 136 - "Clocking:\t%u.%02u MHz\n" 137 - "BogoMips:\t%lu.%02lu\n" 138 - "Calibration:\t%lu loops\n", 139 - cpuinfo.cpu_impl, 140 - cpuinfo.mmu ? "present" : "none", 141 - clockfreq / 1000000, (clockfreq / 100000) % 10, 142 - (loops_per_jiffy * HZ) / 500000, 143 - ((loops_per_jiffy * HZ) / 5000) % 100, 144 - (loops_per_jiffy * HZ)); 131 + seq_printf(m, 132 + "CPU:\t\tNios II/%s\n" 133 + "MMU:\t\t%s\n" 134 + "FPU:\t\tnone\n" 135 + "Clocking:\t%u.%02u MHz\n" 136 + "BogoMips:\t%lu.%02lu\n" 137 + "Calibration:\t%lu loops\n", 138 + cpuinfo.cpu_impl, 139 + cpuinfo.mmu ? "present" : "none", 140 + clockfreq / 1000000, (clockfreq / 100000) % 10, 141 + (loops_per_jiffy * HZ) / 500000, 142 + ((loops_per_jiffy * HZ) / 5000) % 100, 143 + (loops_per_jiffy * HZ)); 145 144 146 - count += seq_printf(m, 147 - "HW:\n" 148 - " MUL:\t\t%s\n" 149 - " MULX:\t\t%s\n" 150 - " DIV:\t\t%s\n", 151 - cpuinfo.has_mul ? "yes" : "no", 152 - cpuinfo.has_mulx ? "yes" : "no", 153 - cpuinfo.has_div ? "yes" : "no"); 145 + seq_printf(m, 146 + "HW:\n" 147 + " MUL:\t\t%s\n" 148 + " MULX:\t\t%s\n" 149 + " DIV:\t\t%s\n", 150 + cpuinfo.has_mul ? "yes" : "no", 151 + cpuinfo.has_mulx ? "yes" : "no", 152 + cpuinfo.has_div ? 
"yes" : "no"); 154 153 155 - count += seq_printf(m, 156 - "Icache:\t\t%ukB, line length: %u\n", 157 - cpuinfo.icache_size >> 10, 158 - cpuinfo.icache_line_size); 154 + seq_printf(m, 155 + "Icache:\t\t%ukB, line length: %u\n", 156 + cpuinfo.icache_size >> 10, 157 + cpuinfo.icache_line_size); 159 158 160 - count += seq_printf(m, 161 - "Dcache:\t\t%ukB, line length: %u\n", 162 - cpuinfo.dcache_size >> 10, 163 - cpuinfo.dcache_line_size); 159 + seq_printf(m, 160 + "Dcache:\t\t%ukB, line length: %u\n", 161 + cpuinfo.dcache_size >> 10, 162 + cpuinfo.dcache_line_size); 164 163 165 - count += seq_printf(m, 166 - "TLB:\t\t%u ways, %u entries, %u PID bits\n", 167 - cpuinfo.tlb_num_ways, 168 - cpuinfo.tlb_num_entries, 169 - cpuinfo.tlb_pid_num_bits); 164 + seq_printf(m, 165 + "TLB:\t\t%u ways, %u entries, %u PID bits\n", 166 + cpuinfo.tlb_num_ways, 167 + cpuinfo.tlb_num_entries, 168 + cpuinfo.tlb_pid_num_bits); 170 169 171 170 return 0; 172 171 }
+26 -24
arch/openrisc/kernel/setup.c
··· 329 329 version = (vr & SPR_VR_VER) >> 24; 330 330 revision = vr & SPR_VR_REV; 331 331 332 - return seq_printf(m, 333 - "cpu\t\t: OpenRISC-%x\n" 334 - "revision\t: %d\n" 335 - "frequency\t: %ld\n" 336 - "dcache size\t: %d bytes\n" 337 - "dcache block size\t: %d bytes\n" 338 - "icache size\t: %d bytes\n" 339 - "icache block size\t: %d bytes\n" 340 - "immu\t\t: %d entries, %lu ways\n" 341 - "dmmu\t\t: %d entries, %lu ways\n" 342 - "bogomips\t: %lu.%02lu\n", 343 - version, 344 - revision, 345 - loops_per_jiffy * HZ, 346 - cpuinfo.dcache_size, 347 - cpuinfo.dcache_block_size, 348 - cpuinfo.icache_size, 349 - cpuinfo.icache_block_size, 350 - 1 << ((mfspr(SPR_DMMUCFGR) & SPR_DMMUCFGR_NTS) >> 2), 351 - 1 + (mfspr(SPR_DMMUCFGR) & SPR_DMMUCFGR_NTW), 352 - 1 << ((mfspr(SPR_IMMUCFGR) & SPR_IMMUCFGR_NTS) >> 2), 353 - 1 + (mfspr(SPR_IMMUCFGR) & SPR_IMMUCFGR_NTW), 354 - (loops_per_jiffy * HZ) / 500000, 355 - ((loops_per_jiffy * HZ) / 5000) % 100); 332 + seq_printf(m, 333 + "cpu\t\t: OpenRISC-%x\n" 334 + "revision\t: %d\n" 335 + "frequency\t: %ld\n" 336 + "dcache size\t: %d bytes\n" 337 + "dcache block size\t: %d bytes\n" 338 + "icache size\t: %d bytes\n" 339 + "icache block size\t: %d bytes\n" 340 + "immu\t\t: %d entries, %lu ways\n" 341 + "dmmu\t\t: %d entries, %lu ways\n" 342 + "bogomips\t: %lu.%02lu\n", 343 + version, 344 + revision, 345 + loops_per_jiffy * HZ, 346 + cpuinfo.dcache_size, 347 + cpuinfo.dcache_block_size, 348 + cpuinfo.icache_size, 349 + cpuinfo.icache_block_size, 350 + 1 << ((mfspr(SPR_DMMUCFGR) & SPR_DMMUCFGR_NTS) >> 2), 351 + 1 + (mfspr(SPR_DMMUCFGR) & SPR_DMMUCFGR_NTW), 352 + 1 << ((mfspr(SPR_IMMUCFGR) & SPR_IMMUCFGR_NTS) >> 2), 353 + 1 + (mfspr(SPR_IMMUCFGR) & SPR_IMMUCFGR_NTW), 354 + (loops_per_jiffy * HZ) / 500000, 355 + ((loops_per_jiffy * HZ) / 5000) % 100); 356 + 357 + return 0; 356 358 } 357 359 358 360 static void *c_start(struct seq_file *m, loff_t * pos)
+3 -2
arch/powerpc/platforms/powernv/opal-power.c
··· 29 29 30 30 switch (type) { 31 31 case SOFT_REBOOT: 32 - /* Fall through. The service processor is responsible for 33 - * bringing the machine back up */ 32 + pr_info("OPAL: reboot requested\n"); 33 + orderly_reboot(); 34 + break; 34 35 case SOFT_OFF: 35 36 pr_info("OPAL: poweroff requested\n"); 36 37 orderly_poweroff(true);
+1
arch/s390/Kconfig
··· 328 328 select COMPAT_BINFMT_ELF if BINFMT_ELF 329 329 select ARCH_WANT_OLD_COMPAT_IPC 330 330 select COMPAT_OLD_SIGACTION 331 + depends on MULTIUSER 331 332 help 332 333 Select this option if you want to enable your system kernel to 333 334 handle system-calls from ELF binaries for 31 bit ESA. This option
+4 -2
arch/s390/pci/pci_debug.c
··· 45 45 46 46 if (!zdev) 47 47 return 0; 48 - if (!zdev->fmb) 49 - return seq_printf(m, "FMB statistics disabled\n"); 48 + if (!zdev->fmb) { 49 + seq_puts(m, "FMB statistics disabled\n"); 50 + return 0; 51 + } 50 52 51 53 /* header */ 52 54 seq_printf(m, "FMB @ %p\n", zdev->fmb);
+5 -7
arch/x86/kernel/cpu/mtrr/if.c
··· 404 404 static int mtrr_seq_show(struct seq_file *seq, void *offset) 405 405 { 406 406 char factor; 407 - int i, max, len; 407 + int i, max; 408 408 mtrr_type type; 409 409 unsigned long base, size; 410 410 411 - len = 0; 412 411 max = num_var_ranges; 413 412 for (i = 0; i < max; i++) { 414 413 mtrr_if->get(i, &base, &size, &type); ··· 424 425 size >>= 20 - PAGE_SHIFT; 425 426 } 426 427 /* Base can be > 32bit */ 427 - len += seq_printf(seq, "reg%02i: base=0x%06lx000 " 428 - "(%5luMB), size=%5lu%cB, count=%d: %s\n", 429 - i, base, base >> (20 - PAGE_SHIFT), size, 430 - factor, mtrr_usage_table[i], 431 - mtrr_attrib_to_str(type)); 428 + seq_printf(seq, "reg%02i: base=0x%06lx000 (%5luMB), size=%5lu%cB, count=%d: %s\n", 429 + i, base, base >> (20 - PAGE_SHIFT), 430 + size, factor, 431 + mtrr_usage_table[i], mtrr_attrib_to_str(type)); 432 432 } 433 433 return 0; 434 434 }
+7 -9
drivers/base/power/wakeup.c
··· 843 843 unsigned long active_count; 844 844 ktime_t active_time; 845 845 ktime_t prevent_sleep_time; 846 - int ret; 847 846 848 847 spin_lock_irqsave(&ws->lock, flags); 849 848 ··· 865 866 active_time = ktime_set(0, 0); 866 867 } 867 868 868 - ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t" 869 - "%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n", 870 - ws->name, active_count, ws->event_count, 871 - ws->wakeup_count, ws->expire_count, 872 - ktime_to_ms(active_time), ktime_to_ms(total_time), 873 - ktime_to_ms(max_time), ktime_to_ms(ws->last_time), 874 - ktime_to_ms(prevent_sleep_time)); 869 + seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n", 870 + ws->name, active_count, ws->event_count, 871 + ws->wakeup_count, ws->expire_count, 872 + ktime_to_ms(active_time), ktime_to_ms(total_time), 873 + ktime_to_ms(max_time), ktime_to_ms(ws->last_time), 874 + ktime_to_ms(prevent_sleep_time)); 875 875 876 876 spin_unlock_irqrestore(&ws->lock, flags); 877 877 878 - return ret; 878 + return 0; 879 879 } 880 880 881 881 /**
+2 -2
drivers/block/paride/pg.c
··· 137 137 138 138 */ 139 139 140 - static bool verbose = 0; 140 + static int verbose; 141 141 static int major = PG_MAJOR; 142 142 static char *name = PG_NAME; 143 143 static int disable = 0; ··· 168 168 169 169 #include <asm/uaccess.h> 170 170 171 - module_param(verbose, bool, 0644); 171 + module_param(verbose, int, 0644); 172 172 module_param(major, int, 0); 173 173 module_param(name, charp, 0); 174 174 module_param_array(drive0, int, NULL, 0);
+73
drivers/block/zram/zram_drv.c
··· 43 43 /* Module params (documentation at end) */ 44 44 static unsigned int num_devices = 1; 45 45 46 + static inline void deprecated_attr_warn(const char *name) 47 + { 48 + pr_warn_once("%d (%s) Attribute %s (and others) will be removed. %s\n", 49 + task_pid_nr(current), 50 + current->comm, 51 + name, 52 + "See zram documentation."); 53 + } 54 + 46 55 #define ZRAM_ATTR_RO(name) \ 47 56 static ssize_t name##_show(struct device *d, \ 48 57 struct device_attribute *attr, char *b) \ 49 58 { \ 50 59 struct zram *zram = dev_to_zram(d); \ 60 + \ 61 + deprecated_attr_warn(__stringify(name)); \ 51 62 return scnprintf(b, PAGE_SIZE, "%llu\n", \ 52 63 (u64)atomic64_read(&zram->stats.name)); \ 53 64 } \ ··· 100 89 { 101 90 struct zram *zram = dev_to_zram(dev); 102 91 92 + deprecated_attr_warn("orig_data_size"); 103 93 return scnprintf(buf, PAGE_SIZE, "%llu\n", 104 94 (u64)(atomic64_read(&zram->stats.pages_stored)) << PAGE_SHIFT); 105 95 } ··· 111 99 u64 val = 0; 112 100 struct zram *zram = dev_to_zram(dev); 113 101 102 + deprecated_attr_warn("mem_used_total"); 114 103 down_read(&zram->init_lock); 115 104 if (init_done(zram)) { 116 105 struct zram_meta *meta = zram->meta; ··· 141 128 u64 val; 142 129 struct zram *zram = dev_to_zram(dev); 143 130 131 + deprecated_attr_warn("mem_limit"); 144 132 down_read(&zram->init_lock); 145 133 val = zram->limit_pages; 146 134 up_read(&zram->init_lock); ··· 173 159 u64 val = 0; 174 160 struct zram *zram = dev_to_zram(dev); 175 161 162 + deprecated_attr_warn("mem_used_max"); 176 163 down_read(&zram->init_lock); 177 164 if (init_done(zram)) 178 165 val = atomic_long_read(&zram->stats.max_used_pages); ··· 685 670 static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index, 686 671 int offset, int rw) 687 672 { 673 + unsigned long start_time = jiffies; 688 674 int ret; 675 + 676 + generic_start_io_acct(rw, bvec->bv_len >> SECTOR_SHIFT, 677 + &zram->disk->part0); 689 678 690 679 if (rw == READ) { 691 680 
atomic64_inc(&zram->stats.num_reads); ··· 698 679 atomic64_inc(&zram->stats.num_writes); 699 680 ret = zram_bvec_write(zram, bvec, index, offset); 700 681 } 682 + 683 + generic_end_io_acct(rw, &zram->disk->part0, start_time); 701 684 702 685 if (unlikely(ret)) { 703 686 if (rw == READ) ··· 1048 1027 static DEVICE_ATTR_RW(max_comp_streams); 1049 1028 static DEVICE_ATTR_RW(comp_algorithm); 1050 1029 1030 + static ssize_t io_stat_show(struct device *dev, 1031 + struct device_attribute *attr, char *buf) 1032 + { 1033 + struct zram *zram = dev_to_zram(dev); 1034 + ssize_t ret; 1035 + 1036 + down_read(&zram->init_lock); 1037 + ret = scnprintf(buf, PAGE_SIZE, 1038 + "%8llu %8llu %8llu %8llu\n", 1039 + (u64)atomic64_read(&zram->stats.failed_reads), 1040 + (u64)atomic64_read(&zram->stats.failed_writes), 1041 + (u64)atomic64_read(&zram->stats.invalid_io), 1042 + (u64)atomic64_read(&zram->stats.notify_free)); 1043 + up_read(&zram->init_lock); 1044 + 1045 + return ret; 1046 + } 1047 + 1048 + static ssize_t mm_stat_show(struct device *dev, 1049 + struct device_attribute *attr, char *buf) 1050 + { 1051 + struct zram *zram = dev_to_zram(dev); 1052 + u64 orig_size, mem_used = 0; 1053 + long max_used; 1054 + ssize_t ret; 1055 + 1056 + down_read(&zram->init_lock); 1057 + if (init_done(zram)) 1058 + mem_used = zs_get_total_pages(zram->meta->mem_pool); 1059 + 1060 + orig_size = atomic64_read(&zram->stats.pages_stored); 1061 + max_used = atomic_long_read(&zram->stats.max_used_pages); 1062 + 1063 + ret = scnprintf(buf, PAGE_SIZE, 1064 + "%8llu %8llu %8llu %8lu %8ld %8llu %8llu\n", 1065 + orig_size << PAGE_SHIFT, 1066 + (u64)atomic64_read(&zram->stats.compr_data_size), 1067 + mem_used << PAGE_SHIFT, 1068 + zram->limit_pages << PAGE_SHIFT, 1069 + max_used << PAGE_SHIFT, 1070 + (u64)atomic64_read(&zram->stats.zero_pages), 1071 + (u64)atomic64_read(&zram->stats.num_migrated)); 1072 + up_read(&zram->init_lock); 1073 + 1074 + return ret; 1075 + } 1076 + 1077 + static DEVICE_ATTR_RO(io_stat); 
1078 + static DEVICE_ATTR_RO(mm_stat); 1051 1079 ZRAM_ATTR_RO(num_reads); 1052 1080 ZRAM_ATTR_RO(num_writes); 1053 1081 ZRAM_ATTR_RO(failed_reads); ··· 1124 1054 &dev_attr_mem_used_max.attr, 1125 1055 &dev_attr_max_comp_streams.attr, 1126 1056 &dev_attr_comp_algorithm.attr, 1057 + &dev_attr_io_stat.attr, 1058 + &dev_attr_mm_stat.attr, 1127 1059 NULL, 1128 1060 }; 1129 1061 ··· 1154 1082 if (!zram->disk) { 1155 1083 pr_warn("Error allocating disk structure for device %d\n", 1156 1084 device_id); 1085 + ret = -ENOMEM; 1157 1086 goto out_free_queue; 1158 1087 } 1159 1088
+1
drivers/block/zram/zram_drv.h
··· 84 84 atomic64_t compr_data_size; /* compressed size of pages stored */ 85 85 atomic64_t num_reads; /* failed + successful */ 86 86 atomic64_t num_writes; /* --do-- */ 87 + atomic64_t num_migrated; /* no. of migrated object */ 87 88 atomic64_t failed_reads; /* can happen when memory is too low */ 88 89 atomic64_t failed_writes; /* can happen when memory is too low */ 89 90 atomic64_t invalid_io; /* non-page-aligned I/O requests */
+26 -28
drivers/parisc/ccio-dma.c
··· 1021 1021 #ifdef CONFIG_PROC_FS 1022 1022 static int ccio_proc_info(struct seq_file *m, void *p) 1023 1023 { 1024 - int len = 0; 1025 1024 struct ioc *ioc = ioc_list; 1026 1025 1027 1026 while (ioc != NULL) { ··· 1030 1031 int j; 1031 1032 #endif 1032 1033 1033 - len += seq_printf(m, "%s\n", ioc->name); 1034 + seq_printf(m, "%s\n", ioc->name); 1034 1035 1035 - len += seq_printf(m, "Cujo 2.0 bug : %s\n", 1036 - (ioc->cujo20_bug ? "yes" : "no")); 1036 + seq_printf(m, "Cujo 2.0 bug : %s\n", 1037 + (ioc->cujo20_bug ? "yes" : "no")); 1037 1038 1038 - len += seq_printf(m, "IO PDIR size : %d bytes (%d entries)\n", 1039 - total_pages * 8, total_pages); 1039 + seq_printf(m, "IO PDIR size : %d bytes (%d entries)\n", 1040 + total_pages * 8, total_pages); 1040 1041 1041 1042 #ifdef CCIO_COLLECT_STATS 1042 - len += seq_printf(m, "IO PDIR entries : %ld free %ld used (%d%%)\n", 1043 - total_pages - ioc->used_pages, ioc->used_pages, 1044 - (int)(ioc->used_pages * 100 / total_pages)); 1043 + seq_printf(m, "IO PDIR entries : %ld free %ld used (%d%%)\n", 1044 + total_pages - ioc->used_pages, ioc->used_pages, 1045 + (int)(ioc->used_pages * 100 / total_pages)); 1045 1046 #endif 1046 1047 1047 - len += seq_printf(m, "Resource bitmap : %d bytes (%d pages)\n", 1048 - ioc->res_size, total_pages); 1048 + seq_printf(m, "Resource bitmap : %d bytes (%d pages)\n", 1049 + ioc->res_size, total_pages); 1049 1050 1050 1051 #ifdef CCIO_COLLECT_STATS 1051 1052 min = max = ioc->avg_search[0]; ··· 1057 1058 min = ioc->avg_search[j]; 1058 1059 } 1059 1060 avg /= CCIO_SEARCH_SAMPLE; 1060 - len += seq_printf(m, " Bitmap search : %ld/%ld/%ld (min/avg/max CPU Cycles)\n", 1061 - min, avg, max); 1061 + seq_printf(m, " Bitmap search : %ld/%ld/%ld (min/avg/max CPU Cycles)\n", 1062 + min, avg, max); 1062 1063 1063 - len += seq_printf(m, "pci_map_single(): %8ld calls %8ld pages (avg %d/1000)\n", 1064 - ioc->msingle_calls, ioc->msingle_pages, 1065 - (int)((ioc->msingle_pages * 1000)/ioc->msingle_calls)); 1064 
+ seq_printf(m, "pci_map_single(): %8ld calls %8ld pages (avg %d/1000)\n", 1065 + ioc->msingle_calls, ioc->msingle_pages, 1066 + (int)((ioc->msingle_pages * 1000)/ioc->msingle_calls)); 1066 1067 1067 1068 /* KLUGE - unmap_sg calls unmap_single for each mapped page */ 1068 1069 min = ioc->usingle_calls - ioc->usg_calls; 1069 1070 max = ioc->usingle_pages - ioc->usg_pages; 1070 - len += seq_printf(m, "pci_unmap_single: %8ld calls %8ld pages (avg %d/1000)\n", 1071 - min, max, (int)((max * 1000)/min)); 1071 + seq_printf(m, "pci_unmap_single: %8ld calls %8ld pages (avg %d/1000)\n", 1072 + min, max, (int)((max * 1000)/min)); 1072 1073 1073 - len += seq_printf(m, "pci_map_sg() : %8ld calls %8ld pages (avg %d/1000)\n", 1074 - ioc->msg_calls, ioc->msg_pages, 1075 - (int)((ioc->msg_pages * 1000)/ioc->msg_calls)); 1074 + seq_printf(m, "pci_map_sg() : %8ld calls %8ld pages (avg %d/1000)\n", 1075 + ioc->msg_calls, ioc->msg_pages, 1076 + (int)((ioc->msg_pages * 1000)/ioc->msg_calls)); 1076 1077 1077 - len += seq_printf(m, "pci_unmap_sg() : %8ld calls %8ld pages (avg %d/1000)\n\n\n", 1078 - ioc->usg_calls, ioc->usg_pages, 1079 - (int)((ioc->usg_pages * 1000)/ioc->usg_calls)); 1078 + seq_printf(m, "pci_unmap_sg() : %8ld calls %8ld pages (avg %d/1000)\n\n\n", 1079 + ioc->usg_calls, ioc->usg_pages, 1080 + (int)((ioc->usg_pages * 1000)/ioc->usg_calls)); 1080 1081 #endif /* CCIO_COLLECT_STATS */ 1081 1082 1082 1083 ioc = ioc->next; ··· 1100 1101 1101 1102 static int ccio_proc_bitmap_info(struct seq_file *m, void *p) 1102 1103 { 1103 - int len = 0; 1104 1104 struct ioc *ioc = ioc_list; 1105 1105 1106 1106 while (ioc != NULL) { ··· 1108 1110 1109 1111 for (j = 0; j < (ioc->res_size / sizeof(u32)); j++) { 1110 1112 if ((j & 7) == 0) 1111 - len += seq_puts(m, "\n "); 1112 - len += seq_printf(m, "%08x", *res_ptr); 1113 + seq_puts(m, "\n "); 1114 + seq_printf(m, "%08x", *res_ptr); 1113 1115 res_ptr++; 1114 1116 } 1115 - len += seq_puts(m, "\n\n"); 1117 + seq_puts(m, "\n\n"); 1116 1118 ioc = 
ioc->next; 1117 1119 break; /* XXX - remove me */ 1118 1120 }
+39 -41
drivers/parisc/sba_iommu.c
··· 1774 1774 #ifdef SBA_COLLECT_STATS 1775 1775 unsigned long avg = 0, min, max; 1776 1776 #endif 1777 - int i, len = 0; 1777 + int i; 1778 1778 1779 - len += seq_printf(m, "%s rev %d.%d\n", 1780 - sba_dev->name, 1781 - (sba_dev->hw_rev & 0x7) + 1, 1782 - (sba_dev->hw_rev & 0x18) >> 3 1783 - ); 1784 - len += seq_printf(m, "IO PDIR size : %d bytes (%d entries)\n", 1785 - (int) ((ioc->res_size << 3) * sizeof(u64)), /* 8 bits/byte */ 1786 - total_pages); 1779 + seq_printf(m, "%s rev %d.%d\n", 1780 + sba_dev->name, 1781 + (sba_dev->hw_rev & 0x7) + 1, 1782 + (sba_dev->hw_rev & 0x18) >> 3); 1783 + seq_printf(m, "IO PDIR size : %d bytes (%d entries)\n", 1784 + (int)((ioc->res_size << 3) * sizeof(u64)), /* 8 bits/byte */ 1785 + total_pages); 1787 1786 1788 - len += seq_printf(m, "Resource bitmap : %d bytes (%d pages)\n", 1789 - ioc->res_size, ioc->res_size << 3); /* 8 bits per byte */ 1787 + seq_printf(m, "Resource bitmap : %d bytes (%d pages)\n", 1788 + ioc->res_size, ioc->res_size << 3); /* 8 bits per byte */ 1790 1789 1791 - len += seq_printf(m, "LMMIO_BASE/MASK/ROUTE %08x %08x %08x\n", 1792 - READ_REG32(sba_dev->sba_hpa + LMMIO_DIST_BASE), 1793 - READ_REG32(sba_dev->sba_hpa + LMMIO_DIST_MASK), 1794 - READ_REG32(sba_dev->sba_hpa + LMMIO_DIST_ROUTE) 1795 - ); 1790 + seq_printf(m, "LMMIO_BASE/MASK/ROUTE %08x %08x %08x\n", 1791 + READ_REG32(sba_dev->sba_hpa + LMMIO_DIST_BASE), 1792 + READ_REG32(sba_dev->sba_hpa + LMMIO_DIST_MASK), 1793 + READ_REG32(sba_dev->sba_hpa + LMMIO_DIST_ROUTE)); 1796 1794 1797 1795 for (i=0; i<4; i++) 1798 - len += seq_printf(m, "DIR%d_BASE/MASK/ROUTE %08x %08x %08x\n", i, 1799 - READ_REG32(sba_dev->sba_hpa + LMMIO_DIRECT0_BASE + i*0x18), 1800 - READ_REG32(sba_dev->sba_hpa + LMMIO_DIRECT0_MASK + i*0x18), 1801 - READ_REG32(sba_dev->sba_hpa + LMMIO_DIRECT0_ROUTE + i*0x18) 1802 - ); 1796 + seq_printf(m, "DIR%d_BASE/MASK/ROUTE %08x %08x %08x\n", 1797 + i, 1798 + READ_REG32(sba_dev->sba_hpa + LMMIO_DIRECT0_BASE + i*0x18), 1799 + 
READ_REG32(sba_dev->sba_hpa + LMMIO_DIRECT0_MASK + i*0x18), 1800 + READ_REG32(sba_dev->sba_hpa + LMMIO_DIRECT0_ROUTE + i*0x18)); 1803 1801 1804 1802 #ifdef SBA_COLLECT_STATS 1805 - len += seq_printf(m, "IO PDIR entries : %ld free %ld used (%d%%)\n", 1806 - total_pages - ioc->used_pages, ioc->used_pages, 1807 - (int) (ioc->used_pages * 100 / total_pages)); 1803 + seq_printf(m, "IO PDIR entries : %ld free %ld used (%d%%)\n", 1804 + total_pages - ioc->used_pages, ioc->used_pages, 1805 + (int)(ioc->used_pages * 100 / total_pages)); 1808 1806 1809 1807 min = max = ioc->avg_search[0]; 1810 1808 for (i = 0; i < SBA_SEARCH_SAMPLE; i++) { ··· 1811 1813 if (ioc->avg_search[i] < min) min = ioc->avg_search[i]; 1812 1814 } 1813 1815 avg /= SBA_SEARCH_SAMPLE; 1814 - len += seq_printf(m, " Bitmap search : %ld/%ld/%ld (min/avg/max CPU Cycles)\n", 1815 - min, avg, max); 1816 + seq_printf(m, " Bitmap search : %ld/%ld/%ld (min/avg/max CPU Cycles)\n", 1817 + min, avg, max); 1816 1818 1817 - len += seq_printf(m, "pci_map_single(): %12ld calls %12ld pages (avg %d/1000)\n", 1818 - ioc->msingle_calls, ioc->msingle_pages, 1819 - (int) ((ioc->msingle_pages * 1000)/ioc->msingle_calls)); 1819 + seq_printf(m, "pci_map_single(): %12ld calls %12ld pages (avg %d/1000)\n", 1820 + ioc->msingle_calls, ioc->msingle_pages, 1821 + (int)((ioc->msingle_pages * 1000)/ioc->msingle_calls)); 1820 1822 1821 1823 /* KLUGE - unmap_sg calls unmap_single for each mapped page */ 1822 1824 min = ioc->usingle_calls; 1823 1825 max = ioc->usingle_pages - ioc->usg_pages; 1824 - len += seq_printf(m, "pci_unmap_single: %12ld calls %12ld pages (avg %d/1000)\n", 1825 - min, max, (int) ((max * 1000)/min)); 1826 + seq_printf(m, "pci_unmap_single: %12ld calls %12ld pages (avg %d/1000)\n", 1827 + min, max, (int)((max * 1000)/min)); 1826 1828 1827 - len += seq_printf(m, "pci_map_sg() : %12ld calls %12ld pages (avg %d/1000)\n", 1828 - ioc->msg_calls, ioc->msg_pages, 1829 - (int) ((ioc->msg_pages * 1000)/ioc->msg_calls)); 1829 + 
seq_printf(m, "pci_map_sg() : %12ld calls %12ld pages (avg %d/1000)\n", 1830 + ioc->msg_calls, ioc->msg_pages, 1831 + (int)((ioc->msg_pages * 1000)/ioc->msg_calls)); 1830 1832 1831 - len += seq_printf(m, "pci_unmap_sg() : %12ld calls %12ld pages (avg %d/1000)\n", 1832 - ioc->usg_calls, ioc->usg_pages, 1833 - (int) ((ioc->usg_pages * 1000)/ioc->usg_calls)); 1833 + seq_printf(m, "pci_unmap_sg() : %12ld calls %12ld pages (avg %d/1000)\n", 1834 + ioc->usg_calls, ioc->usg_pages, 1835 + (int)((ioc->usg_pages * 1000)/ioc->usg_calls)); 1834 1836 #endif 1835 1837 1836 1838 return 0; ··· 1856 1858 struct sba_device *sba_dev = sba_list; 1857 1859 struct ioc *ioc = &sba_dev->ioc[0]; /* FIXME: Multi-IOC support! */ 1858 1860 unsigned int *res_ptr = (unsigned int *)ioc->res_map; 1859 - int i, len = 0; 1861 + int i; 1860 1862 1861 1863 for (i = 0; i < (ioc->res_size/sizeof(unsigned int)); ++i, ++res_ptr) { 1862 1864 if ((i & 7) == 0) 1863 - len += seq_printf(m, "\n "); 1864 - len += seq_printf(m, " %08x", *res_ptr); 1865 + seq_puts(m, "\n "); 1866 + seq_printf(m, " %08x", *res_ptr); 1865 1867 } 1866 - len += seq_printf(m, "\n"); 1868 + seq_putc(m, '\n'); 1867 1869 1868 1870 return 0; 1869 1871 }
+19 -17
drivers/rtc/rtc-cmos.c
··· 459 459 /* NOTE: at least ICH6 reports battery status using a different 460 460 * (non-RTC) bit; and SQWE is ignored on many current systems. 461 461 */ 462 - return seq_printf(seq, 463 - "periodic_IRQ\t: %s\n" 464 - "update_IRQ\t: %s\n" 465 - "HPET_emulated\t: %s\n" 466 - // "square_wave\t: %s\n" 467 - "BCD\t\t: %s\n" 468 - "DST_enable\t: %s\n" 469 - "periodic_freq\t: %d\n" 470 - "batt_status\t: %s\n", 471 - (rtc_control & RTC_PIE) ? "yes" : "no", 472 - (rtc_control & RTC_UIE) ? "yes" : "no", 473 - is_hpet_enabled() ? "yes" : "no", 474 - // (rtc_control & RTC_SQWE) ? "yes" : "no", 475 - (rtc_control & RTC_DM_BINARY) ? "no" : "yes", 476 - (rtc_control & RTC_DST_EN) ? "yes" : "no", 477 - cmos->rtc->irq_freq, 478 - (valid & RTC_VRT) ? "okay" : "dead"); 462 + seq_printf(seq, 463 + "periodic_IRQ\t: %s\n" 464 + "update_IRQ\t: %s\n" 465 + "HPET_emulated\t: %s\n" 466 + // "square_wave\t: %s\n" 467 + "BCD\t\t: %s\n" 468 + "DST_enable\t: %s\n" 469 + "periodic_freq\t: %d\n" 470 + "batt_status\t: %s\n", 471 + (rtc_control & RTC_PIE) ? "yes" : "no", 472 + (rtc_control & RTC_UIE) ? "yes" : "no", 473 + is_hpet_enabled() ? "yes" : "no", 474 + // (rtc_control & RTC_SQWE) ? "yes" : "no", 475 + (rtc_control & RTC_DM_BINARY) ? "no" : "yes", 476 + (rtc_control & RTC_DST_EN) ? "yes" : "no", 477 + cmos->rtc->irq_freq, 478 + (valid & RTC_VRT) ? "okay" : "dead"); 479 + 480 + return 0; 479 481 } 480 482 481 483 #else
+3 -3
drivers/rtc/rtc-ds1305.c
··· 434 434 } 435 435 436 436 done: 437 - return seq_printf(seq, 438 - "trickle_charge\t: %s%s\n", 439 - diodes, resistors); 437 + seq_printf(seq, "trickle_charge\t: %s%s\n", diodes, resistors); 438 + 439 + return 0; 440 440 } 441 441 442 442 #else
+9 -7
drivers/rtc/rtc-mrst.c
··· 277 277 valid = vrtc_cmos_read(RTC_VALID); 278 278 spin_unlock_irq(&rtc_lock); 279 279 280 - return seq_printf(seq, 281 - "periodic_IRQ\t: %s\n" 282 - "alarm\t\t: %s\n" 283 - "BCD\t\t: no\n" 284 - "periodic_freq\t: daily (not adjustable)\n", 285 - (rtc_control & RTC_PIE) ? "on" : "off", 286 - (rtc_control & RTC_AIE) ? "on" : "off"); 280 + seq_printf(seq, 281 + "periodic_IRQ\t: %s\n" 282 + "alarm\t\t: %s\n" 283 + "BCD\t\t: no\n" 284 + "periodic_freq\t: daily (not adjustable)\n", 285 + (rtc_control & RTC_PIE) ? "on" : "off", 286 + (rtc_control & RTC_AIE) ? "on" : "off"); 287 + 288 + return 0; 287 289 } 288 290 289 291 #else
+3 -1
drivers/rtc/rtc-tegra.c
··· 261 261 if (!dev || !dev->driver) 262 262 return 0; 263 263 264 - return seq_printf(seq, "name\t\t: %s\n", dev_name(dev)); 264 + seq_printf(seq, "name\t\t: %s\n", dev_name(dev)); 265 + 266 + return 0; 265 267 } 266 268 267 269 static irqreturn_t tegra_rtc_irq_handler(int irq, void *data)
+7 -5
drivers/s390/cio/blacklist.c
··· 330 330 if (!iter->in_range) { 331 331 /* First device in range. */ 332 332 if ((iter->devno == __MAX_SUBCHANNEL) || 333 - !is_blacklisted(iter->ssid, iter->devno + 1)) 333 + !is_blacklisted(iter->ssid, iter->devno + 1)) { 334 334 /* Singular device. */ 335 - return seq_printf(s, "0.%x.%04x\n", 336 - iter->ssid, iter->devno); 335 + seq_printf(s, "0.%x.%04x\n", iter->ssid, iter->devno); 336 + return 0; 337 + } 337 338 iter->in_range = 1; 338 - return seq_printf(s, "0.%x.%04x-", iter->ssid, iter->devno); 339 + seq_printf(s, "0.%x.%04x-", iter->ssid, iter->devno); 340 + return 0; 339 341 } 340 342 if ((iter->devno == __MAX_SUBCHANNEL) || 341 343 !is_blacklisted(iter->ssid, iter->devno + 1)) { 342 344 /* Last device in range. */ 343 345 iter->in_range = 0; 344 - return seq_printf(s, "0.%x.%04x\n", iter->ssid, iter->devno); 346 + seq_printf(s, "0.%x.%04x\n", iter->ssid, iter->devno); 345 347 } 346 348 return 0; 347 349 }
+1 -2
drivers/sbus/char/bbc_envctrl.c
··· 160 160 printk(KERN_CRIT "kenvctrld: Shutting down the system now.\n"); 161 161 162 162 shutting_down = 1; 163 - if (orderly_poweroff(true) < 0) 164 - printk(KERN_CRIT "envctrl: shutdown execution failed\n"); 163 + orderly_poweroff(true); 165 164 } 166 165 167 166 #define WARN_INTERVAL (30 * HZ)
+1 -6
drivers/sbus/char/envctrl.c
··· 970 970 static void envctrl_do_shutdown(void) 971 971 { 972 972 static int inprog = 0; 973 - int ret; 974 973 975 974 if (inprog != 0) 976 975 return; 977 976 978 977 inprog = 1; 979 978 printk(KERN_CRIT "kenvctrld: WARNING: Shutting down the system now.\n"); 980 - ret = orderly_poweroff(true); 981 - if (ret < 0) { 982 - printk(KERN_CRIT "kenvctrld: WARNING: system shutdown failed!\n"); 983 - inprog = 0; /* unlikely to succeed, but we could try again */ 984 - } 979 + orderly_poweroff(true); 985 980 } 986 981 987 982 static struct task_struct *kenvctrld_task;
+1
drivers/staging/lustre/lustre/Kconfig
··· 10 10 select CRYPTO_SHA1 11 11 select CRYPTO_SHA256 12 12 select CRYPTO_SHA512 13 + depends on MULTIUSER 13 14 help 14 15 This option enables Lustre file system client support. Choose Y 15 16 here if you want to access a Lustre file system cluster. To compile
+17
fs/dax.c
··· 464 464 EXPORT_SYMBOL_GPL(dax_fault); 465 465 466 466 /** 467 + * dax_pfn_mkwrite - handle first write to DAX page 468 + * @vma: The virtual memory area where the fault occurred 469 + * @vmf: The description of the fault 470 + * 471 + */ 472 + int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) 473 + { 474 + struct super_block *sb = file_inode(vma->vm_file)->i_sb; 475 + 476 + sb_start_pagefault(sb); 477 + file_update_time(vma->vm_file); 478 + sb_end_pagefault(sb); 479 + return VM_FAULT_NOPAGE; 480 + } 481 + EXPORT_SYMBOL_GPL(dax_pfn_mkwrite); 482 + 483 + /** 467 484 * dax_zero_page_range - zero a range within a page of a DAX file 468 485 * @inode: The file being truncated 469 486 * @from: The file offset that is being truncated to
-1
fs/ext2/ext2.h
··· 793 793 int datasync); 794 794 extern const struct inode_operations ext2_file_inode_operations; 795 795 extern const struct file_operations ext2_file_operations; 796 - extern const struct file_operations ext2_dax_file_operations; 797 796 798 797 /* inode.c */ 799 798 extern const struct address_space_operations ext2_aops;
+1 -16
fs/ext2/file.c
··· 39 39 static const struct vm_operations_struct ext2_dax_vm_ops = { 40 40 .fault = ext2_dax_fault, 41 41 .page_mkwrite = ext2_dax_mkwrite, 42 + .pfn_mkwrite = dax_pfn_mkwrite, 42 43 }; 43 44 44 45 static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) ··· 106 105 .splice_read = generic_file_splice_read, 107 106 .splice_write = iter_file_splice_write, 108 107 }; 109 - 110 - #ifdef CONFIG_FS_DAX 111 - const struct file_operations ext2_dax_file_operations = { 112 - .llseek = generic_file_llseek, 113 - .read_iter = generic_file_read_iter, 114 - .write_iter = generic_file_write_iter, 115 - .unlocked_ioctl = ext2_ioctl, 116 - #ifdef CONFIG_COMPAT 117 - .compat_ioctl = ext2_compat_ioctl, 118 - #endif 119 - .mmap = ext2_file_mmap, 120 - .open = dquot_file_open, 121 - .release = ext2_release_file, 122 - .fsync = ext2_fsync, 123 - }; 124 - #endif 125 108 126 109 const struct inode_operations ext2_file_inode_operations = { 127 110 #ifdef CONFIG_EXT2_FS_XATTR
+1 -4
fs/ext2/inode.c
··· 1388 1388 1389 1389 if (S_ISREG(inode->i_mode)) { 1390 1390 inode->i_op = &ext2_file_inode_operations; 1391 - if (test_opt(inode->i_sb, DAX)) { 1392 - inode->i_mapping->a_ops = &ext2_aops; 1393 - inode->i_fop = &ext2_dax_file_operations; 1394 - } else if (test_opt(inode->i_sb, NOBH)) { 1391 + if (test_opt(inode->i_sb, NOBH)) { 1395 1392 inode->i_mapping->a_ops = &ext2_nobh_aops; 1396 1393 inode->i_fop = &ext2_file_operations; 1397 1394 } else {
+2 -8
fs/ext2/namei.c
··· 104 104 return PTR_ERR(inode); 105 105 106 106 inode->i_op = &ext2_file_inode_operations; 107 - if (test_opt(inode->i_sb, DAX)) { 108 - inode->i_mapping->a_ops = &ext2_aops; 109 - inode->i_fop = &ext2_dax_file_operations; 110 - } else if (test_opt(inode->i_sb, NOBH)) { 107 + if (test_opt(inode->i_sb, NOBH)) { 111 108 inode->i_mapping->a_ops = &ext2_nobh_aops; 112 109 inode->i_fop = &ext2_file_operations; 113 110 } else { ··· 122 125 return PTR_ERR(inode); 123 126 124 127 inode->i_op = &ext2_file_inode_operations; 125 - if (test_opt(inode->i_sb, DAX)) { 126 - inode->i_mapping->a_ops = &ext2_aops; 127 - inode->i_fop = &ext2_dax_file_operations; 128 - } else if (test_opt(inode->i_sb, NOBH)) { 128 + if (test_opt(inode->i_sb, NOBH)) { 129 129 inode->i_mapping->a_ops = &ext2_nobh_aops; 130 130 inode->i_fop = &ext2_file_operations; 131 131 } else {
-1
fs/ext4/ext4.h
··· 2593 2593 /* file.c */ 2594 2594 extern const struct inode_operations ext4_file_inode_operations; 2595 2595 extern const struct file_operations ext4_file_operations; 2596 - extern const struct file_operations ext4_dax_file_operations; 2597 2596 extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin); 2598 2597 2599 2598 /* inline.c */
+1 -18
fs/ext4/file.c
··· 206 206 static const struct vm_operations_struct ext4_dax_vm_ops = { 207 207 .fault = ext4_dax_fault, 208 208 .page_mkwrite = ext4_dax_mkwrite, 209 + .pfn_mkwrite = dax_pfn_mkwrite, 209 210 }; 210 211 #else 211 212 #define ext4_dax_vm_ops ext4_file_vm_ops ··· 622 621 .splice_write = iter_file_splice_write, 623 622 .fallocate = ext4_fallocate, 624 623 }; 625 - 626 - #ifdef CONFIG_FS_DAX 627 - const struct file_operations ext4_dax_file_operations = { 628 - .llseek = ext4_llseek, 629 - .read_iter = generic_file_read_iter, 630 - .write_iter = ext4_file_write_iter, 631 - .unlocked_ioctl = ext4_ioctl, 632 - #ifdef CONFIG_COMPAT 633 - .compat_ioctl = ext4_compat_ioctl, 634 - #endif 635 - .mmap = ext4_file_mmap, 636 - .open = ext4_file_open, 637 - .release = ext4_release_file, 638 - .fsync = ext4_sync_file, 639 - /* Splice not yet supported with DAX */ 640 - .fallocate = ext4_fallocate, 641 - }; 642 - #endif 643 624 644 625 const struct inode_operations ext4_file_inode_operations = { 645 626 .setattr = ext4_setattr,
+1 -4
fs/ext4/inode.c
··· 4090 4090 4091 4091 if (S_ISREG(inode->i_mode)) { 4092 4092 inode->i_op = &ext4_file_inode_operations; 4093 - if (test_opt(inode->i_sb, DAX)) 4094 - inode->i_fop = &ext4_dax_file_operations; 4095 - else 4096 - inode->i_fop = &ext4_file_operations; 4093 + inode->i_fop = &ext4_file_operations; 4097 4094 ext4_set_aops(inode); 4098 4095 } else if (S_ISDIR(inode->i_mode)) { 4099 4096 inode->i_op = &ext4_dir_inode_operations;
+2 -8
fs/ext4/namei.c
··· 2235 2235 err = PTR_ERR(inode); 2236 2236 if (!IS_ERR(inode)) { 2237 2237 inode->i_op = &ext4_file_inode_operations; 2238 - if (test_opt(inode->i_sb, DAX)) 2239 - inode->i_fop = &ext4_dax_file_operations; 2240 - else 2241 - inode->i_fop = &ext4_file_operations; 2238 + inode->i_fop = &ext4_file_operations; 2242 2239 ext4_set_aops(inode); 2243 2240 err = ext4_add_nondir(handle, dentry, inode); 2244 2241 if (!err && IS_DIRSYNC(dir)) ··· 2299 2302 err = PTR_ERR(inode); 2300 2303 if (!IS_ERR(inode)) { 2301 2304 inode->i_op = &ext4_file_inode_operations; 2302 - if (test_opt(inode->i_sb, DAX)) 2303 - inode->i_fop = &ext4_dax_file_operations; 2304 - else 2305 - inode->i_fop = &ext4_file_operations; 2305 + inode->i_fop = &ext4_file_operations; 2306 2306 ext4_set_aops(inode); 2307 2307 d_tmpfile(dentry, inode); 2308 2308 err = ext4_orphan_add(handle, inode);
+71 -19
fs/hugetlbfs/inode.c
··· 48 48 kuid_t uid; 49 49 kgid_t gid; 50 50 umode_t mode; 51 - long nr_blocks; 51 + long max_hpages; 52 52 long nr_inodes; 53 53 struct hstate *hstate; 54 + long min_hpages; 54 55 }; 55 56 56 57 struct hugetlbfs_inode_info { ··· 69 68 enum { 70 69 Opt_size, Opt_nr_inodes, 71 70 Opt_mode, Opt_uid, Opt_gid, 72 - Opt_pagesize, 71 + Opt_pagesize, Opt_min_size, 73 72 Opt_err, 74 73 }; 75 74 ··· 80 79 {Opt_uid, "uid=%u"}, 81 80 {Opt_gid, "gid=%u"}, 82 81 {Opt_pagesize, "pagesize=%s"}, 82 + {Opt_min_size, "min_size=%s"}, 83 83 {Opt_err, NULL}, 84 84 }; 85 85 ··· 731 729 .show_options = generic_show_options, 732 730 }; 733 731 732 + enum { NO_SIZE, SIZE_STD, SIZE_PERCENT }; 733 + 734 + /* 735 + * Convert size option passed from command line to number of huge pages 736 + * in the pool specified by hstate. Size option could be in bytes 737 + * (val_type == SIZE_STD) or percentage of the pool (val_type == SIZE_PERCENT). 738 + */ 739 + static long long 740 + hugetlbfs_size_to_hpages(struct hstate *h, unsigned long long size_opt, 741 + int val_type) 742 + { 743 + if (val_type == NO_SIZE) 744 + return -1; 745 + 746 + if (val_type == SIZE_PERCENT) { 747 + size_opt <<= huge_page_shift(h); 748 + size_opt *= h->max_huge_pages; 749 + do_div(size_opt, 100); 750 + } 751 + 752 + size_opt >>= huge_page_shift(h); 753 + return size_opt; 754 + } 755 + 734 756 static int 735 757 hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig) 736 758 { 737 759 char *p, *rest; 738 760 substring_t args[MAX_OPT_ARGS]; 739 761 int option; 740 - unsigned long long size = 0; 741 - enum { NO_SIZE, SIZE_STD, SIZE_PERCENT } setsize = NO_SIZE; 762 + unsigned long long max_size_opt = 0, min_size_opt = 0; 763 + int max_val_type = NO_SIZE, min_val_type = NO_SIZE; 742 764 743 765 if (!options) 744 766 return 0; ··· 800 774 /* memparse() will accept a K/M/G without a digit */ 801 775 if (!isdigit(*args[0].from)) 802 776 goto bad_val; 803 - size = memparse(args[0].from, &rest); 804 - setsize = 
SIZE_STD; 777 + max_size_opt = memparse(args[0].from, &rest); 778 + max_val_type = SIZE_STD; 805 779 if (*rest == '%') 806 - setsize = SIZE_PERCENT; 780 + max_val_type = SIZE_PERCENT; 807 781 break; 808 782 } 809 783 ··· 826 800 break; 827 801 } 828 802 803 + case Opt_min_size: { 804 + /* memparse() will accept a K/M/G without a digit */ 805 + if (!isdigit(*args[0].from)) 806 + goto bad_val; 807 + min_size_opt = memparse(args[0].from, &rest); 808 + min_val_type = SIZE_STD; 809 + if (*rest == '%') 810 + min_val_type = SIZE_PERCENT; 811 + break; 812 + } 813 + 829 814 default: 830 815 pr_err("Bad mount option: \"%s\"\n", p); 831 816 return -EINVAL; ··· 844 807 } 845 808 } 846 809 847 - /* Do size after hstate is set up */ 848 - if (setsize > NO_SIZE) { 849 - struct hstate *h = pconfig->hstate; 850 - if (setsize == SIZE_PERCENT) { 851 - size <<= huge_page_shift(h); 852 - size *= h->max_huge_pages; 853 - do_div(size, 100); 854 - } 855 - pconfig->nr_blocks = (size >> huge_page_shift(h)); 810 + /* 811 + * Use huge page pool size (in hstate) to convert the size 812 + * options to number of huge pages. If NO_SIZE, -1 is returned. 
813 + */ 814 + pconfig->max_hpages = hugetlbfs_size_to_hpages(pconfig->hstate, 815 + max_size_opt, max_val_type); 816 + pconfig->min_hpages = hugetlbfs_size_to_hpages(pconfig->hstate, 817 + min_size_opt, min_val_type); 818 + 819 + /* 820 + * If max_size was specified, then min_size must be smaller 821 + */ 822 + if (max_val_type > NO_SIZE && 823 + pconfig->min_hpages > pconfig->max_hpages) { 824 + pr_err("minimum size can not be greater than maximum size\n"); 825 + return -EINVAL; 856 826 } 857 827 858 828 return 0; ··· 878 834 879 835 save_mount_options(sb, data); 880 836 881 - config.nr_blocks = -1; /* No limit on size by default */ 837 + config.max_hpages = -1; /* No limit on size by default */ 882 838 config.nr_inodes = -1; /* No limit on number of inodes by default */ 883 839 config.uid = current_fsuid(); 884 840 config.gid = current_fsgid(); 885 841 config.mode = 0755; 886 842 config.hstate = &default_hstate; 843 + config.min_hpages = -1; /* No default minimum size */ 887 844 ret = hugetlbfs_parse_options(data, &config); 888 845 if (ret) 889 846 return ret; ··· 898 853 sbinfo->max_inodes = config.nr_inodes; 899 854 sbinfo->free_inodes = config.nr_inodes; 900 855 sbinfo->spool = NULL; 901 - if (config.nr_blocks != -1) { 902 - sbinfo->spool = hugepage_new_subpool(config.nr_blocks); 856 + /* 857 + * Allocate and initialize subpool if maximum or minimum size is 858 + * specified. Any needed reservations (for minimim size) are taken 859 + * taken when the subpool is created. 860 + */ 861 + if (config.max_hpages != -1 || config.min_hpages != -1) { 862 + sbinfo->spool = hugepage_new_subpool(config.hstate, 863 + config.max_hpages, 864 + config.min_hpages); 903 865 if (!sbinfo->spool) 904 866 goto out_free; 905 867 }
+12 -19
fs/jfs/jfs_metapage.c
··· 183 183 184 184 #endif 185 185 186 - static void init_once(void *foo) 187 - { 188 - struct metapage *mp = (struct metapage *)foo; 189 - 190 - mp->lid = 0; 191 - mp->lsn = 0; 192 - mp->flag = 0; 193 - mp->data = NULL; 194 - mp->clsn = 0; 195 - mp->log = NULL; 196 - set_bit(META_free, &mp->flag); 197 - init_waitqueue_head(&mp->wait); 198 - } 199 - 200 186 static inline struct metapage *alloc_metapage(gfp_t gfp_mask) 201 187 { 202 - return mempool_alloc(metapage_mempool, gfp_mask); 188 + struct metapage *mp = mempool_alloc(metapage_mempool, gfp_mask); 189 + 190 + if (mp) { 191 + mp->lid = 0; 192 + mp->lsn = 0; 193 + mp->data = NULL; 194 + mp->clsn = 0; 195 + mp->log = NULL; 196 + init_waitqueue_head(&mp->wait); 197 + } 198 + return mp; 203 199 } 204 200 205 201 static inline void free_metapage(struct metapage *mp) 206 202 { 207 - mp->flag = 0; 208 - set_bit(META_free, &mp->flag); 209 - 210 203 mempool_free(mp, metapage_mempool); 211 204 } 212 205 ··· 209 216 * Allocate the metapage structures 210 217 */ 211 218 metapage_cache = kmem_cache_create("jfs_mp", sizeof(struct metapage), 212 - 0, 0, init_once); 219 + 0, 0, NULL); 213 220 if (metapage_cache == NULL) 214 221 return -ENOMEM; 215 222
-1
fs/jfs/jfs_metapage.h
··· 48 48 49 49 /* metapage flag */ 50 50 #define META_locked 0 51 - #define META_free 1 52 51 #define META_dirty 2 53 52 #define META_sync 3 54 53 #define META_discard 4
+1 -1
fs/nfs/Kconfig
··· 1 1 config NFS_FS 2 2 tristate "NFS client support" 3 - depends on INET && FILE_LOCKING 3 + depends on INET && FILE_LOCKING && MULTIUSER 4 4 select LOCKD 5 5 select SUNRPC 6 6 select NFS_ACL_SUPPORT if NFS_V3_ACL
+1
fs/nfsd/Kconfig
··· 6 6 select SUNRPC 7 7 select EXPORTFS 8 8 select NFS_ACL_SUPPORT if NFSD_V2_ACL 9 + depends on MULTIUSER 9 10 help 10 11 Choose Y here if you want to allow other computers to access 11 12 files residing on this system using Sun's Network File System
+23 -3
fs/proc/array.c
··· 99 99 buf = m->buf + m->count; 100 100 101 101 /* Ignore error for now */ 102 - string_escape_str(tcomm, &buf, m->size - m->count, 103 - ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\"); 102 + buf += string_escape_str(tcomm, buf, m->size - m->count, 103 + ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\"); 104 104 105 105 m->count = buf - m->buf; 106 106 seq_putc(m, '\n'); ··· 188 188 from_kgid_munged(user_ns, GROUP_AT(group_info, g))); 189 189 put_cred(cred); 190 190 191 + #ifdef CONFIG_PID_NS 192 + seq_puts(m, "\nNStgid:"); 193 + for (g = ns->level; g <= pid->level; g++) 194 + seq_printf(m, "\t%d", 195 + task_tgid_nr_ns(p, pid->numbers[g].ns)); 196 + seq_puts(m, "\nNSpid:"); 197 + for (g = ns->level; g <= pid->level; g++) 198 + seq_printf(m, "\t%d", 199 + task_pid_nr_ns(p, pid->numbers[g].ns)); 200 + seq_puts(m, "\nNSpgid:"); 201 + for (g = ns->level; g <= pid->level; g++) 202 + seq_printf(m, "\t%d", 203 + task_pgrp_nr_ns(p, pid->numbers[g].ns)); 204 + seq_puts(m, "\nNSsid:"); 205 + for (g = ns->level; g <= pid->level; g++) 206 + seq_printf(m, "\t%d", 207 + task_session_nr_ns(p, pid->numbers[g].ns)); 208 + #endif 191 209 seq_putc(m, '\n'); 192 210 } 193 211 ··· 632 614 pid_t pid; 633 615 634 616 pid = pid_nr_ns(v, inode->i_sb->s_fs_info); 635 - return seq_printf(seq, "%d ", pid); 617 + seq_printf(seq, "%d ", pid); 618 + 619 + return 0; 636 620 } 637 621 638 622 static void *children_seq_start(struct seq_file *seq, loff_t *pos)
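The conversions in fs/proc/array.c above all follow one shape: the seq_printf() result is no longer meaningful, so ->show() callbacks emit their output and then return 0 unconditionally. A userspace stand-in for the pattern (all names here are hypothetical, not the kernel API):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace stand-in for the pattern above: seq_printf() now returns
 * void in the kernel, so show() callbacks return 0 unconditionally
 * instead of forwarding its result.  All names are hypothetical.
 */
struct seq_file_sketch {
	char buf[256];
	size_t count;
};

static void seq_printf_sketch(struct seq_file_sketch *m, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(m->buf + m->count, sizeof(m->buf) - m->count, fmt, ap);
	va_end(ap);
	if (n > 0)
		m->count += (size_t)n;
}

/* was: return seq_printf(seq, "%d ", pid); */
static int children_show_sketch(struct seq_file_sketch *m, int pid)
{
	seq_printf_sketch(m, "%d ", pid);
	return 0;
}
```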
+47 -35
fs/proc/base.c
··· 238 238 239 239 wchan = get_wchan(task); 240 240 241 - if (lookup_symbol_name(wchan, symname) < 0) 241 + if (lookup_symbol_name(wchan, symname) < 0) { 242 242 if (!ptrace_may_access(task, PTRACE_MODE_READ)) 243 243 return 0; 244 - else 245 - return seq_printf(m, "%lu", wchan); 246 - else 247 - return seq_printf(m, "%s", symname); 244 + seq_printf(m, "%lu", wchan); 245 + } else { 246 + seq_printf(m, "%s", symname); 247 + } 248 + 249 + return 0; 248 250 } 249 251 #endif /* CONFIG_KALLSYMS */ 250 252 ··· 311 309 static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns, 312 310 struct pid *pid, struct task_struct *task) 313 311 { 314 - return seq_printf(m, "%llu %llu %lu\n", 315 - (unsigned long long)task->se.sum_exec_runtime, 316 - (unsigned long long)task->sched_info.run_delay, 317 - task->sched_info.pcount); 312 + seq_printf(m, "%llu %llu %lu\n", 313 + (unsigned long long)task->se.sum_exec_runtime, 314 + (unsigned long long)task->sched_info.run_delay, 315 + task->sched_info.pcount); 316 + 317 + return 0; 318 318 } 319 319 #endif 320 320 ··· 391 387 points = oom_badness(task, NULL, NULL, totalpages) * 392 388 1000 / totalpages; 393 389 read_unlock(&tasklist_lock); 394 - return seq_printf(m, "%lu\n", points); 390 + seq_printf(m, "%lu\n", points); 391 + 392 + return 0; 395 393 } 396 394 397 395 struct limit_names { ··· 438 432 * print the file header 439 433 */ 440 434 seq_printf(m, "%-25s %-20s %-20s %-10s\n", 441 - "Limit", "Soft Limit", "Hard Limit", "Units"); 435 + "Limit", "Soft Limit", "Hard Limit", "Units"); 442 436 443 437 for (i = 0; i < RLIM_NLIMITS; i++) { 444 438 if (rlim[i].rlim_cur == RLIM_INFINITY) 445 439 seq_printf(m, "%-25s %-20s ", 446 - lnames[i].name, "unlimited"); 440 + lnames[i].name, "unlimited"); 447 441 else 448 442 seq_printf(m, "%-25s %-20lu ", 449 - lnames[i].name, rlim[i].rlim_cur); 443 + lnames[i].name, rlim[i].rlim_cur); 450 444 451 445 if (rlim[i].rlim_max == RLIM_INFINITY) 452 446 seq_printf(m, "%-20s ", 
"unlimited"); ··· 468 462 { 469 463 long nr; 470 464 unsigned long args[6], sp, pc; 471 - int res = lock_trace(task); 465 + int res; 466 + 467 + res = lock_trace(task); 472 468 if (res) 473 469 return res; 474 470 ··· 485 477 args[0], args[1], args[2], args[3], args[4], args[5], 486 478 sp, pc); 487 479 unlock_trace(task); 488 - return res; 480 + 481 + return 0; 489 482 } 490 483 #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */ 491 484 ··· 2011 2002 notify = timer->it_sigev_notify; 2012 2003 2013 2004 seq_printf(m, "ID: %d\n", timer->it_id); 2014 - seq_printf(m, "signal: %d/%p\n", timer->sigq->info.si_signo, 2015 - timer->sigq->info.si_value.sival_ptr); 2005 + seq_printf(m, "signal: %d/%p\n", 2006 + timer->sigq->info.si_signo, 2007 + timer->sigq->info.si_value.sival_ptr); 2016 2008 seq_printf(m, "notify: %s/%s.%d\n", 2017 - nstr[notify & ~SIGEV_THREAD_ID], 2018 - (notify & SIGEV_THREAD_ID) ? "tid" : "pid", 2019 - pid_nr_ns(timer->it_pid, tp->ns)); 2009 + nstr[notify & ~SIGEV_THREAD_ID], 2010 + (notify & SIGEV_THREAD_ID) ? 
"tid" : "pid", 2011 + pid_nr_ns(timer->it_pid, tp->ns)); 2020 2012 seq_printf(m, "ClockID: %d\n", timer->it_clock); 2021 2013 2022 2014 return 0; ··· 2362 2352 2363 2353 unlock_task_sighand(task, &flags); 2364 2354 } 2365 - result = seq_printf(m, 2366 - "rchar: %llu\n" 2367 - "wchar: %llu\n" 2368 - "syscr: %llu\n" 2369 - "syscw: %llu\n" 2370 - "read_bytes: %llu\n" 2371 - "write_bytes: %llu\n" 2372 - "cancelled_write_bytes: %llu\n", 2373 - (unsigned long long)acct.rchar, 2374 - (unsigned long long)acct.wchar, 2375 - (unsigned long long)acct.syscr, 2376 - (unsigned long long)acct.syscw, 2377 - (unsigned long long)acct.read_bytes, 2378 - (unsigned long long)acct.write_bytes, 2379 - (unsigned long long)acct.cancelled_write_bytes); 2355 + seq_printf(m, 2356 + "rchar: %llu\n" 2357 + "wchar: %llu\n" 2358 + "syscr: %llu\n" 2359 + "syscw: %llu\n" 2360 + "read_bytes: %llu\n" 2361 + "write_bytes: %llu\n" 2362 + "cancelled_write_bytes: %llu\n", 2363 + (unsigned long long)acct.rchar, 2364 + (unsigned long long)acct.wchar, 2365 + (unsigned long long)acct.syscr, 2366 + (unsigned long long)acct.syscw, 2367 + (unsigned long long)acct.read_bytes, 2368 + (unsigned long long)acct.write_bytes, 2369 + (unsigned long long)acct.cancelled_write_bytes); 2370 + result = 0; 2371 + 2380 2372 out_unlock: 2381 2373 mutex_unlock(&task->signal->cred_guard_mutex); 2382 2374 return result;
+3
fs/splice.c
··· 523 523 loff_t isize, left; 524 524 int ret; 525 525 526 + if (IS_DAX(in->f_mapping->host)) 527 + return default_file_splice_read(in, ppos, pipe, len, flags); 528 + 526 529 isize = i_size_read(in->f_mapping->host); 527 530 if (unlikely(*ppos >= isize)) 528 531 return 0;
-67
include/linux/a.out.h
··· 4 4 #include <uapi/linux/a.out.h> 5 5 6 6 #ifndef __ASSEMBLY__ 7 - #if defined (M_OLDSUN2) 8 - #else 9 - #endif 10 - #if defined (M_68010) 11 - #else 12 - #endif 13 - #if defined (M_68020) 14 - #else 15 - #endif 16 - #if defined (M_SPARC) 17 - #else 18 - #endif 19 - #if !defined (N_MAGIC) 20 - #endif 21 - #if !defined (N_BADMAG) 22 - #endif 23 - #if !defined (N_TXTOFF) 24 - #endif 25 - #if !defined (N_DATOFF) 26 - #endif 27 - #if !defined (N_TRELOFF) 28 - #endif 29 - #if !defined (N_DRELOFF) 30 - #endif 31 - #if !defined (N_SYMOFF) 32 - #endif 33 - #if !defined (N_STROFF) 34 - #endif 35 - #if !defined (N_TXTADDR) 36 - #endif 37 - #if defined(vax) || defined(hp300) || defined(pyr) 38 - #endif 39 - #ifdef sony 40 - #endif /* Sony. */ 41 - #ifdef is68k 42 - #endif 43 - #if defined(m68k) && defined(PORTAR) 44 - #endif 45 7 #ifdef linux 46 8 #include <asm/page.h> 47 9 #if defined(__i386__) || defined(__mc68000__) ··· 13 51 #endif 14 52 #endif 15 53 #endif 16 - #ifndef N_DATADDR 17 - #endif 18 - #if !defined (N_BSSADDR) 19 - #endif 20 - #if !defined (N_NLIST_DECLARED) 21 - #endif /* no N_NLIST_DECLARED. */ 22 - #if !defined (N_UNDF) 23 - #endif 24 - #if !defined (N_ABS) 25 - #endif 26 - #if !defined (N_TEXT) 27 - #endif 28 - #if !defined (N_DATA) 29 - #endif 30 - #if !defined (N_BSS) 31 - #endif 32 - #if !defined (N_FN) 33 - #endif 34 - #if !defined (N_EXT) 35 - #endif 36 - #if !defined (N_TYPE) 37 - #endif 38 - #if !defined (N_STAB) 39 - #endif 40 - #if !defined (N_RELOCATION_INFO_DECLARED) 41 - #ifdef NS32K 42 - #else 43 - #endif 44 - #endif /* no N_RELOCATION_INFO_DECLARED. */ 45 54 #endif /*__ASSEMBLY__ */ 46 55 #endif /* __A_OUT_GNU_H__ */
+2 -6
include/linux/bitmap.h
··· 172 172 extern int bitmap_print_to_pagebuf(bool list, char *buf, 173 173 const unsigned long *maskp, int nmaskbits); 174 174 175 - #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG)) 176 - #define BITMAP_LAST_WORD_MASK(nbits) \ 177 - ( \ 178 - ((nbits) % BITS_PER_LONG) ? \ 179 - (1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL \ 180 - ) 175 + #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) 176 + #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1))) 181 177 182 178 #define small_const_nbits(nbits) \ 183 179 (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
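The rewritten macros drop the modulo and the conditional: for a power-of-two BITS_PER_LONG, `x % BITS_PER_LONG` equals `x & (BITS_PER_LONG - 1)`, and `~0UL >> (-(nbits) & (BITS_PER_LONG - 1))` produces the same mask as the old ternary, including the full-word case. A quick userspace check of the equivalence:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Old BITMAP_LAST_WORD_MASK: branch on whether nbits fills the word. */
static unsigned long last_mask_old(unsigned int nbits)
{
	return (nbits % BITS_PER_LONG) ?
		(1UL << (nbits % BITS_PER_LONG)) - 1 : ~0UL;
}

/* New form: branch-free; the unsigned negation wraps, and masking with
 * BITS_PER_LONG - 1 keeps the shift count in range. */
static unsigned long last_mask_new(unsigned int nbits)
{
	return ~0UL >> (-nbits & (BITS_PER_LONG - 1));
}
```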
+29
include/linux/capability.h
··· 205 205 cap_intersect(permitted, __cap_nfsd_set)); 206 206 } 207 207 208 + #ifdef CONFIG_MULTIUSER 208 209 extern bool has_capability(struct task_struct *t, int cap); 209 210 extern bool has_ns_capability(struct task_struct *t, 210 211 struct user_namespace *ns, int cap); ··· 214 213 struct user_namespace *ns, int cap); 215 214 extern bool capable(int cap); 216 215 extern bool ns_capable(struct user_namespace *ns, int cap); 216 + #else 217 + static inline bool has_capability(struct task_struct *t, int cap) 218 + { 219 + return true; 220 + } 221 + static inline bool has_ns_capability(struct task_struct *t, 222 + struct user_namespace *ns, int cap) 223 + { 224 + return true; 225 + } 226 + static inline bool has_capability_noaudit(struct task_struct *t, int cap) 227 + { 228 + return true; 229 + } 230 + static inline bool has_ns_capability_noaudit(struct task_struct *t, 231 + struct user_namespace *ns, int cap) 232 + { 233 + return true; 234 + } 235 + static inline bool capable(int cap) 236 + { 237 + return true; 238 + } 239 + static inline bool ns_capable(struct user_namespace *ns, int cap) 240 + { 241 + return true; 242 + } 243 + #endif /* CONFIG_MULTIUSER */ 217 244 extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap); 218 245 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap); 219 246
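The !CONFIG_MULTIUSER branch replaces every capability check with a static inline that grants unconditionally, so callers compile unchanged. A minimal userspace sketch of the same stub pattern (CONFIG_MULTIUSER left undefined here; the CAP_SYS_ADMIN value is illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the stub pattern above: with CONFIG_MULTIUSER undefined,
 * capability checks collapse to always-true inlines and callers need
 * no source changes. */
/* #define CONFIG_MULTIUSER 1 */

#define CAP_SYS_ADMIN 21

#ifdef CONFIG_MULTIUSER
extern bool capable(int cap);		/* real check lives elsewhere */
#else
static inline bool capable(int cap)
{
	(void)cap;
	return true;		/* single-user: everything is allowed */
}
#endif
```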
+1
include/linux/compaction.h
··· 34 34 extern int sysctl_extfrag_threshold; 35 35 extern int sysctl_extfrag_handler(struct ctl_table *table, int write, 36 36 void __user *buffer, size_t *length, loff_t *ppos); 37 + extern int sysctl_compact_unevictable_allowed; 37 38 38 39 extern int fragmentation_index(struct zone *zone, unsigned int order); 39 40 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
+19 -4
include/linux/cred.h
··· 62 62 groups_free(group_info); \ 63 63 } while (0) 64 64 65 - extern struct group_info *groups_alloc(int); 66 65 extern struct group_info init_groups; 66 + #ifdef CONFIG_MULTIUSER 67 + extern struct group_info *groups_alloc(int); 67 68 extern void groups_free(struct group_info *); 69 + 70 + extern int in_group_p(kgid_t); 71 + extern int in_egroup_p(kgid_t); 72 + #else 73 + static inline void groups_free(struct group_info *group_info) 74 + { 75 + } 76 + 77 + static inline int in_group_p(kgid_t grp) 78 + { 79 + return 1; 80 + } 81 + static inline int in_egroup_p(kgid_t grp) 82 + { 83 + return 1; 84 + } 85 + #endif 68 86 extern int set_current_groups(struct group_info *); 69 87 extern void set_groups(struct cred *, struct group_info *); 70 88 extern int groups_search(const struct group_info *, kgid_t); ··· 91 73 /* access the groups "array" with this macro */ 92 74 #define GROUP_AT(gi, i) \ 93 75 ((gi)->blocks[(i) / NGROUPS_PER_BLOCK][(i) % NGROUPS_PER_BLOCK]) 94 - 95 - extern int in_group_p(kgid_t); 96 - extern int in_egroup_p(kgid_t); 97 76 98 77 /* 99 78 * The security context of a task
+1 -1
include/linux/fs.h
··· 2615 2615 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t); 2616 2616 int dax_truncate_page(struct inode *, loff_t from, get_block_t); 2617 2617 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t); 2618 + int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *); 2618 2619 #define dax_mkwrite(vma, vmf, gb) dax_fault(vma, vmf, gb) 2619 2620 2620 2621 #ifdef CONFIG_BLOCK ··· 2680 2679 loff_t inode_get_bytes(struct inode *inode); 2681 2680 void inode_set_bytes(struct inode *inode, loff_t bytes); 2682 2681 2683 - extern int vfs_readdir(struct file *, filldir_t, void *); 2684 2682 extern int iterate_dir(struct file *, struct dir_context *); 2685 2683 2686 2684 extern int vfs_stat(const char __user *, struct kstat *);
+9 -11
include/linux/hugetlb.h
··· 22 22 struct hugepage_subpool { 23 23 spinlock_t lock; 24 24 long count; 25 - long max_hpages, used_hpages; 25 + long max_hpages; /* Maximum huge pages or -1 if no maximum. */ 26 + long used_hpages; /* Used count against maximum, includes */ 27 + /* both alloced and reserved pages. */ 28 + struct hstate *hstate; 29 + long min_hpages; /* Minimum huge pages or -1 if no minimum. */ 30 + long rsv_hpages; /* Pages reserved against global pool to */ 31 + /* satisfy minimum size. */ 26 32 }; 27 33 28 34 struct resv_map { ··· 44 38 #define for_each_hstate(h) \ 45 39 for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++) 46 40 47 - struct hugepage_subpool *hugepage_new_subpool(long nr_blocks); 41 + struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages, 42 + long min_hpages); 48 43 void hugepage_put_subpool(struct hugepage_subpool *spool); 49 - 50 - int PageHuge(struct page *page); 51 44 52 45 void reset_vma_resv_huge_pages(struct vm_area_struct *vma); 53 46 int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); ··· 84 79 int dequeue_hwpoisoned_huge_page(struct page *page); 85 80 bool isolate_huge_page(struct page *page, struct list_head *list); 86 81 void putback_active_hugepage(struct page *page); 87 - bool is_hugepage_active(struct page *page); 88 82 void free_huge_page(struct page *page); 89 83 90 84 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE ··· 112 108 unsigned long address, unsigned long end, pgprot_t newprot); 113 109 114 110 #else /* !CONFIG_HUGETLB_PAGE */ 115 - 116 - static inline int PageHuge(struct page *page) 117 - { 118 - return 0; 119 - } 120 111 121 112 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma) 122 113 { ··· 151 152 return false; 152 153 } 153 154 #define putback_active_hugepage(p) do {} while (0) 154 - #define is_hugepage_active(x) false 155 155 156 156 static inline unsigned long hugetlb_change_protection(struct vm_area_struct *vma, 157 157 unsigned long 
address, unsigned long end, pgprot_t newprot)
-8
include/linux/ioport.h
··· 196 196 197 197 /* Compatibility cruft */ 198 198 #define release_region(start,n) __release_region(&ioport_resource, (start), (n)) 199 - #define check_mem_region(start,n) __check_region(&iomem_resource, (start), (n)) 200 199 #define release_mem_region(start,n) __release_region(&iomem_resource, (start), (n)) 201 200 202 - extern int __check_region(struct resource *, resource_size_t, resource_size_t); 203 201 extern void __release_region(struct resource *, resource_size_t, 204 202 resource_size_t); 205 203 #ifdef CONFIG_MEMORY_HOTREMOVE 206 204 extern int release_mem_region_adjustable(struct resource *, resource_size_t, 207 205 resource_size_t); 208 206 #endif 209 - 210 - static inline int __deprecated check_region(resource_size_t s, 211 - resource_size_t n) 212 - { 213 - return __check_region(&ioport_resource, s, n); 214 - } 215 207 216 208 /* Wrappers for managed devices */ 217 209 struct device;
+2
include/linux/kasan.h
··· 44 44 45 45 void kasan_kmalloc_large(const void *ptr, size_t size); 46 46 void kasan_kfree_large(const void *ptr); 47 + void kasan_kfree(void *ptr); 47 48 void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size); 48 49 void kasan_krealloc(const void *object, size_t new_size); 49 50 ··· 72 71 73 72 static inline void kasan_kmalloc_large(void *ptr, size_t size) {} 74 73 static inline void kasan_kfree_large(const void *ptr) {} 74 + static inline void kasan_kfree(void *ptr) {} 75 75 static inline void kasan_kmalloc(struct kmem_cache *s, const void *object, 76 76 size_t size) {} 77 77 static inline void kasan_krealloc(const void *object, size_t new_size) {}
-17
include/linux/ksm.h
··· 35 35 __ksm_exit(mm); 36 36 } 37 37 38 - /* 39 - * A KSM page is one of those write-protected "shared pages" or "merged pages" 40 - * which KSM maps into multiple mms, wherever identical anonymous page content 41 - * is found in VM_MERGEABLE vmas. It's a PageAnon page, pointing not to any 42 - * anon_vma, but to that page's node of the stable tree. 43 - */ 44 - static inline int PageKsm(struct page *page) 45 - { 46 - return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) == 47 - (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM); 48 - } 49 - 50 38 static inline struct stable_node *page_stable_node(struct page *page) 51 39 { 52 40 return PageKsm(page) ? page_rmapping(page) : NULL; ··· 73 85 74 86 static inline void ksm_exit(struct mm_struct *mm) 75 87 { 76 - } 77 - 78 - static inline int PageKsm(struct page *page) 79 - { 80 - return 0; 81 88 } 82 89 83 90 #ifdef CONFIG_MMU
+2 -1
include/linux/mempool.h
··· 36 36 37 37 /* 38 38 * A mempool_alloc_t and mempool_free_t that get the memory from 39 - * a slab that is passed in through pool_data. 39 + * a slab cache that is passed in through pool_data. 40 + * Note: the slab cache may not have a ctor function. 40 41 */ 41 42 void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data); 42 43 void mempool_free_slab(void *element, void *pool_data);
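The new restriction exists because a mempool element can be handed back to a caller without passing through the slab constructor again, which is exactly why the jfs change above moved initialization into alloc_metapage(). A toy userspace pool (hypothetical names) showing why per-allocation init is required:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pool illustrating the note above: a recycled element carries
 * stale state, so fields must be reset on every allocation rather
 * than once in a constructor.  Names are hypothetical. */
struct elem {
	int lid;
	void *data;
};

static struct elem *freelist;		/* one-slot "pool" */

static struct elem *pool_alloc(void)
{
	struct elem *e;

	if (freelist) {
		e = freelist;		/* recycled: still holds old values */
		freelist = NULL;
	} else {
		e = malloc(sizeof(*e));
	}
	if (e) {			/* init on every alloc, as jfs now does */
		e->lid = 0;
		e->data = NULL;
	}
	return e;
}

static void pool_free(struct elem *e)
{
	freelist = e;			/* no cleanup on the way back in */
}
```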
+9 -91
include/linux/mm.h
··· 251 251 * writable, if an error is returned it will cause a SIGBUS */ 252 252 int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf); 253 253 254 + /* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */ 255 + int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf); 256 + 254 257 /* called by access_process_vm when get_user_pages() fails, typically 255 258 * for use by special VMAs that can switch between memory and hardware 256 259 */ ··· 497 494 return atomic_read(&compound_head(page)->_count); 498 495 } 499 496 500 - #ifdef CONFIG_HUGETLB_PAGE 501 - extern int PageHeadHuge(struct page *page_head); 502 - #else /* CONFIG_HUGETLB_PAGE */ 503 - static inline int PageHeadHuge(struct page *page_head) 504 - { 505 - return 0; 506 - } 507 - #endif /* CONFIG_HUGETLB_PAGE */ 508 - 509 497 static inline bool __compound_tail_refcounted(struct page *page) 510 498 { 511 - return !PageSlab(page) && !PageHeadHuge(page); 499 + return PageAnon(page) && !PageSlab(page) && !PageHeadHuge(page); 512 500 } 513 501 514 502 /* ··· 563 569 static inline void init_page_count(struct page *page) 564 570 { 565 571 atomic_set(&page->_count, 1); 566 - } 567 - 568 - /* 569 - * PageBuddy() indicate that the page is free and in the buddy system 570 - * (see mm/page_alloc.c). 571 - * 572 - * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to 573 - * -2 so that an underflow of the page_mapcount() won't be mistaken 574 - * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very 575 - * efficiently by most CPU architectures. 
576 - */ 577 - #define PAGE_BUDDY_MAPCOUNT_VALUE (-128) 578 - 579 - static inline int PageBuddy(struct page *page) 580 - { 581 - return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE; 582 - } 583 - 584 - static inline void __SetPageBuddy(struct page *page) 585 - { 586 - VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); 587 - atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE); 588 - } 589 - 590 - static inline void __ClearPageBuddy(struct page *page) 591 - { 592 - VM_BUG_ON_PAGE(!PageBuddy(page), page); 593 - atomic_set(&page->_mapcount, -1); 594 - } 595 - 596 - #define PAGE_BALLOON_MAPCOUNT_VALUE (-256) 597 - 598 - static inline int PageBalloon(struct page *page) 599 - { 600 - return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE; 601 - } 602 - 603 - static inline void __SetPageBalloon(struct page *page) 604 - { 605 - VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); 606 - atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE); 607 - } 608 - 609 - static inline void __ClearPageBalloon(struct page *page) 610 - { 611 - VM_BUG_ON_PAGE(!PageBalloon(page), page); 612 - atomic_set(&page->_mapcount, -1); 613 572 } 614 573 615 574 void put_page(struct page *page); ··· 953 1006 #define page_address_init() do { } while(0) 954 1007 #endif 955 1008 956 - /* 957 - * On an anonymous page mapped into a user virtual memory area, 958 - * page->mapping points to its anon_vma, not to a struct address_space; 959 - * with the PAGE_MAPPING_ANON bit set to distinguish it. See rmap.h. 960 - * 961 - * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled, 962 - * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit; 963 - * and then page->mapping points, not to an anon_vma, but to a private 964 - * structure which KSM associates with that merged page. See ksm.h. 965 - * 966 - * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used. 
967 - * 968 - * Please note that, confusingly, "page_mapping" refers to the inode 969 - * address_space which maps the page from disk; whereas "page_mapped" 970 - * refers to user virtual address space into which the page is mapped. 971 - */ 972 - #define PAGE_MAPPING_ANON 1 973 - #define PAGE_MAPPING_KSM 2 974 - #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM) 975 - 1009 + extern void *page_rmapping(struct page *page); 1010 + extern struct anon_vma *page_anon_vma(struct page *page); 976 1011 extern struct address_space *page_mapping(struct page *page); 977 - 978 - /* Neutral page->mapping pointer to address_space or anon_vma or other */ 979 - static inline void *page_rmapping(struct page *page) 980 - { 981 - return (void *)((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS); 982 - } 983 1012 984 1013 extern struct address_space *__page_file_mapping(struct page *); 985 1014 ··· 966 1043 return __page_file_mapping(page); 967 1044 968 1045 return page->mapping; 969 - } 970 - 971 - static inline int PageAnon(struct page *page) 972 - { 973 - return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0; 974 1046 } 975 1047 976 1048 /* ··· 1893 1975 static inline unsigned long 1894 1976 vm_unmapped_area(struct vm_unmapped_area_info *info) 1895 1977 { 1896 - if (!(info->flags & VM_UNMAPPED_AREA_TOPDOWN)) 1897 - return unmapped_area(info); 1898 - else 1978 + if (info->flags & VM_UNMAPPED_AREA_TOPDOWN) 1899 1979 return unmapped_area_topdown(info); 1980 + else 1981 + return unmapped_area(info); 1900 1982 } 1901 1983 1902 1984 /* truncate.c */
+4 -4
include/linux/mmzone.h
··· 842 842 843 843 extern int movable_zone; 844 844 845 + #ifdef CONFIG_HIGHMEM 845 846 static inline int zone_movable_is_highmem(void) 846 847 { 847 - #if defined(CONFIG_HIGHMEM) && defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) 848 + #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP 848 849 return movable_zone == ZONE_HIGHMEM; 849 - #elif defined(CONFIG_HIGHMEM) 850 - return (ZONE_MOVABLE - 1) == ZONE_HIGHMEM; 851 850 #else 852 - return 0; 851 + return (ZONE_MOVABLE - 1) == ZONE_HIGHMEM; 853 852 #endif 854 853 } 854 + #endif 855 855 856 856 static inline int is_highmem_idx(enum zone_type idx) 857 857 {
+103
include/linux/page-flags.h
··· 289 289 #define __PG_HWPOISON 0 290 290 #endif 291 291 292 + /* 293 + * On an anonymous page mapped into a user virtual memory area, 294 + * page->mapping points to its anon_vma, not to a struct address_space; 295 + * with the PAGE_MAPPING_ANON bit set to distinguish it. See rmap.h. 296 + * 297 + * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled, 298 + * the PAGE_MAPPING_KSM bit may be set along with the PAGE_MAPPING_ANON bit; 299 + * and then page->mapping points, not to an anon_vma, but to a private 300 + * structure which KSM associates with that merged page. See ksm.h. 301 + * 302 + * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is currently never used. 303 + * 304 + * Please note that, confusingly, "page_mapping" refers to the inode 305 + * address_space which maps the page from disk; whereas "page_mapped" 306 + * refers to user virtual address space into which the page is mapped. 307 + */ 308 + #define PAGE_MAPPING_ANON 1 309 + #define PAGE_MAPPING_KSM 2 310 + #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM) 311 + 312 + static inline int PageAnon(struct page *page) 313 + { 314 + return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0; 315 + } 316 + 317 + #ifdef CONFIG_KSM 318 + /* 319 + * A KSM page is one of those write-protected "shared pages" or "merged pages" 320 + * which KSM maps into multiple mms, wherever identical anonymous page content 321 + * is found in VM_MERGEABLE vmas. It's a PageAnon page, pointing not to any 322 + * anon_vma, but to that page's node of the stable tree. 
323 + */ 324 + static inline int PageKsm(struct page *page) 325 + { 326 + return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) == 327 + (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM); 328 + } 329 + #else 330 + TESTPAGEFLAG_FALSE(Ksm) 331 + #endif 332 + 292 333 u64 stable_page_flags(struct page *page); 293 334 294 335 static inline int PageUptodate(struct page *page) ··· 467 426 468 427 #endif /* !PAGEFLAGS_EXTENDED */ 469 428 429 + #ifdef CONFIG_HUGETLB_PAGE 430 + int PageHuge(struct page *page); 431 + int PageHeadHuge(struct page *page); 432 + bool page_huge_active(struct page *page); 433 + #else 434 + TESTPAGEFLAG_FALSE(Huge) 435 + TESTPAGEFLAG_FALSE(HeadHuge) 436 + 437 + static inline bool page_huge_active(struct page *page) 438 + { 439 + return 0; 440 + } 441 + #endif 442 + 443 + 470 444 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 471 445 /* 472 446 * PageHuge() only returns true for hugetlbfs pages, but not for ··· 534 478 return 0; 535 479 } 536 480 #endif 481 + 482 + /* 483 + * PageBuddy() indicate that the page is free and in the buddy system 484 + * (see mm/page_alloc.c). 485 + * 486 + * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to 487 + * -2 so that an underflow of the page_mapcount() won't be mistaken 488 + * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very 489 + * efficiently by most CPU architectures. 
490 + */ 491 + #define PAGE_BUDDY_MAPCOUNT_VALUE (-128) 492 + 493 + static inline int PageBuddy(struct page *page) 494 + { 495 + return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE; 496 + } 497 + 498 + static inline void __SetPageBuddy(struct page *page) 499 + { 500 + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); 501 + atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE); 502 + } 503 + 504 + static inline void __ClearPageBuddy(struct page *page) 505 + { 506 + VM_BUG_ON_PAGE(!PageBuddy(page), page); 507 + atomic_set(&page->_mapcount, -1); 508 + } 509 + 510 + #define PAGE_BALLOON_MAPCOUNT_VALUE (-256) 511 + 512 + static inline int PageBalloon(struct page *page) 513 + { 514 + return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE; 515 + } 516 + 517 + static inline void __SetPageBalloon(struct page *page) 518 + { 519 + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); 520 + atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE); 521 + } 522 + 523 + static inline void __ClearPageBalloon(struct page *page) 524 + { 525 + VM_BUG_ON_PAGE(!PageBalloon(page), page); 526 + atomic_set(&page->_mapcount, -1); 527 + } 537 528 538 529 /* 539 530 * If network-based swap is enabled, sl*b must keep track of whether pages
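The comment block that moved into page-flags.h documents pointer tagging: because page->mapping is an aligned pointer, its two low bits are free to encode whether it points at an anon_vma and, with KSM, at a stable-tree node. A userspace sketch of the same trick (sketch types only, not the kernel structures):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace sketch of the page->mapping tagging documented above: the
 * low bits of an aligned pointer encode what it points to. */
#define MAPPING_ANON  1UL
#define MAPPING_KSM   2UL
#define MAPPING_FLAGS (MAPPING_ANON | MAPPING_KSM)

struct page_sketch { void *mapping; };

static int page_anon(const struct page_sketch *p)
{
	return ((uintptr_t)p->mapping & MAPPING_ANON) != 0;
}

/* KSM pages carry both bits; MAPPING_KSM alone is never used. */
static int page_ksm(const struct page_sketch *p)
{
	return ((uintptr_t)p->mapping & MAPPING_FLAGS) ==
	       (MAPPING_ANON | MAPPING_KSM);
}
```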
+5
include/linux/printk.h
··· 255 255 printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__) 256 256 #define pr_info(fmt, ...) \ 257 257 printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) 258 + /* 259 + * Like KERN_CONT, pr_cont() should only be used when continuing 260 + * a line with no newline ('\n') enclosed. Otherwise it defaults 261 + * back to KERN_DEFAULT. 262 + */ 258 263 #define pr_cont(fmt, ...) \ 259 264 printk(KERN_CONT fmt, ##__VA_ARGS__) 260 265
+2 -1
include/linux/reboot.h
··· 70 70 #define POWEROFF_CMD_PATH_LEN 256 71 71 extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN]; 72 72 73 - extern int orderly_poweroff(bool force); 73 + extern void orderly_poweroff(bool force); 74 + extern void orderly_reboot(void); 74 75 75 76 /* 76 77 * Emergency restart, callable from an interrupt handler.
-8
include/linux/rmap.h
··· 105 105 __put_anon_vma(anon_vma); 106 106 } 107 107 108 - static inline struct anon_vma *page_anon_vma(struct page *page) 109 - { 110 - if (((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 111 - PAGE_MAPPING_ANON) 112 - return NULL; 113 - return page_rmapping(page); 114 - } 115 - 116 108 static inline void vma_lock_anon_vma(struct vm_area_struct *vma) 117 109 { 118 110 struct anon_vma *anon_vma = vma->anon_vma;
+4 -4
include/linux/string_helpers.h
··· 47 47 #define ESCAPE_ANY_NP (ESCAPE_ANY | ESCAPE_NP) 48 48 #define ESCAPE_HEX 0x20 49 49 50 - int string_escape_mem(const char *src, size_t isz, char **dst, size_t osz, 50 + int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, 51 51 unsigned int flags, const char *esc); 52 52 53 53 static inline int string_escape_mem_any_np(const char *src, size_t isz, 54 - char **dst, size_t osz, const char *esc) 54 + char *dst, size_t osz, const char *esc) 55 55 { 56 56 return string_escape_mem(src, isz, dst, osz, ESCAPE_ANY_NP, esc); 57 57 } 58 58 59 - static inline int string_escape_str(const char *src, char **dst, size_t sz, 59 + static inline int string_escape_str(const char *src, char *dst, size_t sz, 60 60 unsigned int flags, const char *esc) 61 61 { 62 62 return string_escape_mem(src, strlen(src), dst, sz, flags, esc); 63 63 } 64 64 65 - static inline int string_escape_str_any_np(const char *src, char **dst, 65 + static inline int string_escape_str_any_np(const char *src, char *dst, 66 66 size_t sz, const char *esc) 67 67 { 68 68 return string_escape_str(src, dst, sz, ESCAPE_ANY_NP, esc);
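The signature change (a plain char * destination and a count-style return) lets callers such as fs/proc/array.c advance their buffer by the return value, snprintf-style. A userspace sketch of the new convention that escapes only '\n' (hypothetical name; the kernel helper handles many escape classes):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the new convention above: dst is a plain char * and the
 * return value is how many characters the escaped form needs, so a
 * caller can do buf += escape(...).  Only '\n' is handled here. */
static int escape_str_sketch(const char *src, char *dst, size_t osz)
{
	size_t out = 0;

	for (; *src; src++) {
		if (*src == '\n') {
			if (out < osz)
				dst[out] = '\\';
			out++;
			if (out < osz)
				dst[out] = 'n';
			out++;
		} else {
			if (out < osz)
				dst[out] = *src;
			out++;
		}
	}
	return (int)out;
}
```

As with snprintf(), the return value counts what the escaped form requires even when the buffer is too small, so callers can size and retry.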
+1 -1
include/linux/swap.h
··· 307 307 extern void lru_add_drain_cpu(int cpu); 308 308 extern void lru_add_drain_all(void); 309 309 extern void rotate_reclaimable_page(struct page *page); 310 - extern void deactivate_page(struct page *page); 310 + extern void deactivate_file_page(struct page *page); 311 311 extern void swap_setup(void); 312 312 313 313 extern void add_page_to_unevictable_list(struct page *page);
-6
include/linux/types.h
··· 146 146 typedef u32 dma_addr_t; 147 147 #endif /* dma_addr_t */ 148 148 149 - #ifdef __CHECKER__ 150 - #else 151 - #endif 152 - #ifdef __CHECK_ENDIAN__ 153 - #else 154 - #endif 155 149 typedef unsigned __bitwise__ gfp_t; 156 150 typedef unsigned __bitwise__ fmode_t; 157 151 typedef unsigned __bitwise__ oom_flags_t;
+12
include/linux/uidgid.h
··· 29 29 #define KUIDT_INIT(value) (kuid_t){ value } 30 30 #define KGIDT_INIT(value) (kgid_t){ value } 31 31 32 + #ifdef CONFIG_MULTIUSER 32 33 static inline uid_t __kuid_val(kuid_t uid) 33 34 { 34 35 return uid.val; ··· 39 38 { 40 39 return gid.val; 41 40 } 41 + #else 42 + static inline uid_t __kuid_val(kuid_t uid) 43 + { 44 + return 0; 45 + } 46 + 47 + static inline gid_t __kgid_val(kgid_t gid) 48 + { 49 + return 0; 50 + } 51 + #endif 42 52 43 53 #define GLOBAL_ROOT_UID KUIDT_INIT(0) 44 54 #define GLOBAL_ROOT_GID KGIDT_INIT(0)
+1
include/linux/zsmalloc.h
··· 47 47 void zs_unmap_object(struct zs_pool *pool, unsigned long handle); 48 48 49 49 unsigned long zs_get_total_pages(struct zs_pool *pool); 50 + unsigned long zs_compact(struct zs_pool *pool); 50 51 51 52 #endif
+66
include/trace/events/cma.h
··· 1 + #undef TRACE_SYSTEM 2 + #define TRACE_SYSTEM cma 3 + 4 + #if !defined(_TRACE_CMA_H) || defined(TRACE_HEADER_MULTI_READ) 5 + #define _TRACE_CMA_H 6 + 7 + #include <linux/types.h> 8 + #include <linux/tracepoint.h> 9 + 10 + TRACE_EVENT(cma_alloc, 11 + 12 + TP_PROTO(unsigned long pfn, const struct page *page, 13 + unsigned int count, unsigned int align), 14 + 15 + TP_ARGS(pfn, page, count, align), 16 + 17 + TP_STRUCT__entry( 18 + __field(unsigned long, pfn) 19 + __field(const struct page *, page) 20 + __field(unsigned int, count) 21 + __field(unsigned int, align) 22 + ), 23 + 24 + TP_fast_assign( 25 + __entry->pfn = pfn; 26 + __entry->page = page; 27 + __entry->count = count; 28 + __entry->align = align; 29 + ), 30 + 31 + TP_printk("pfn=%lx page=%p count=%u align=%u", 32 + __entry->pfn, 33 + __entry->page, 34 + __entry->count, 35 + __entry->align) 36 + ); 37 + 38 + TRACE_EVENT(cma_release, 39 + 40 + TP_PROTO(unsigned long pfn, const struct page *page, 41 + unsigned int count), 42 + 43 + TP_ARGS(pfn, page, count), 44 + 45 + TP_STRUCT__entry( 46 + __field(unsigned long, pfn) 47 + __field(const struct page *, page) 48 + __field(unsigned int, count) 49 + ), 50 + 51 + TP_fast_assign( 52 + __entry->pfn = pfn; 53 + __entry->page = page; 54 + __entry->count = count; 55 + ), 56 + 57 + TP_printk("pfn=%lx page=%p count=%u", 58 + __entry->pfn, 59 + __entry->page, 60 + __entry->count) 61 + ); 62 + 63 + #endif /* _TRACE_CMA_H */ 64 + 65 + /* This part must be outside protection */ 66 + #include <trace/define_trace.h>
+18 -1
init/Kconfig
··· 394 394 395 395 config BSD_PROCESS_ACCT 396 396 bool "BSD Process Accounting" 397 + depends on MULTIUSER 397 398 help 398 399 If you say Y here, a user level program will be able to instruct the 399 400 kernel (via a special system call) to write process accounting ··· 421 420 config TASKSTATS 422 421 bool "Export task/process statistics through netlink" 423 422 depends on NET 423 + depends on MULTIUSER 424 424 default n 425 425 help 426 426 Export selected statistics for tasks/processes through the ··· 1162 1160 1163 1161 menuconfig NAMESPACES 1164 1162 bool "Namespaces support" if EXPERT 1163 + depends on MULTIUSER 1165 1164 default !EXPERT 1166 1165 help 1167 1166 Provides the way to make tasks work with different objects using ··· 1359 1356 1360 1357 config UID16 1361 1358 bool "Enable 16-bit UID system calls" if EXPERT 1362 - depends on HAVE_UID16 1359 + depends on HAVE_UID16 && MULTIUSER 1363 1360 default y 1364 1361 help 1365 1362 This enables the legacy 16-bit UID syscall wrappers. 1363 + 1364 + config MULTIUSER 1365 + bool "Multiple users, groups and capabilities support" if EXPERT 1366 + default y 1367 + help 1368 + This option enables support for non-root users, groups and 1369 + capabilities. 1370 + 1371 + If you say N here, all processes will run with UID 0, GID 0, and all 1372 + possible capabilities. Saying N here also compiles out support for 1373 + system calls related to UIDs, GIDs, and capabilities, such as setuid, 1374 + setgid, and capset. 1375 + 1376 + If unsure, say Y here. 1366 1377 1367 1378 config SGETMASK_SYSCALL 1368 1379 bool "sgetmask/ssetmask syscalls support" if EXPERT
+18 -16
ipc/msg.c
··· 1015 1015 struct user_namespace *user_ns = seq_user_ns(s); 1016 1016 struct msg_queue *msq = it; 1017 1017 1018 - return seq_printf(s, 1019 - "%10d %10d %4o %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n", 1020 - msq->q_perm.key, 1021 - msq->q_perm.id, 1022 - msq->q_perm.mode, 1023 - msq->q_cbytes, 1024 - msq->q_qnum, 1025 - msq->q_lspid, 1026 - msq->q_lrpid, 1027 - from_kuid_munged(user_ns, msq->q_perm.uid), 1028 - from_kgid_munged(user_ns, msq->q_perm.gid), 1029 - from_kuid_munged(user_ns, msq->q_perm.cuid), 1030 - from_kgid_munged(user_ns, msq->q_perm.cgid), 1031 - msq->q_stime, 1032 - msq->q_rtime, 1033 - msq->q_ctime); 1018 + seq_printf(s, 1019 + "%10d %10d %4o %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n", 1020 + msq->q_perm.key, 1021 + msq->q_perm.id, 1022 + msq->q_perm.mode, 1023 + msq->q_cbytes, 1024 + msq->q_qnum, 1025 + msq->q_lspid, 1026 + msq->q_lrpid, 1027 + from_kuid_munged(user_ns, msq->q_perm.uid), 1028 + from_kgid_munged(user_ns, msq->q_perm.gid), 1029 + from_kuid_munged(user_ns, msq->q_perm.cuid), 1030 + from_kgid_munged(user_ns, msq->q_perm.cgid), 1031 + msq->q_stime, 1032 + msq->q_rtime, 1033 + msq->q_ctime); 1034 + 1035 + return 0; 1034 1036 } 1035 1037 #endif 1036 1038
+14 -12
ipc/sem.c
··· 2170 2170 2171 2171 sem_otime = get_semotime(sma); 2172 2172 2173 - return seq_printf(s, 2174 - "%10d %10d %4o %10u %5u %5u %5u %5u %10lu %10lu\n", 2175 - sma->sem_perm.key, 2176 - sma->sem_perm.id, 2177 - sma->sem_perm.mode, 2178 - sma->sem_nsems, 2179 - from_kuid_munged(user_ns, sma->sem_perm.uid), 2180 - from_kgid_munged(user_ns, sma->sem_perm.gid), 2181 - from_kuid_munged(user_ns, sma->sem_perm.cuid), 2182 - from_kgid_munged(user_ns, sma->sem_perm.cgid), 2183 - sem_otime, 2184 - sma->sem_ctime); 2173 + seq_printf(s, 2174 + "%10d %10d %4o %10u %5u %5u %5u %5u %10lu %10lu\n", 2175 + sma->sem_perm.key, 2176 + sma->sem_perm.id, 2177 + sma->sem_perm.mode, 2178 + sma->sem_nsems, 2179 + from_kuid_munged(user_ns, sma->sem_perm.uid), 2180 + from_kgid_munged(user_ns, sma->sem_perm.gid), 2181 + from_kuid_munged(user_ns, sma->sem_perm.cuid), 2182 + from_kgid_munged(user_ns, sma->sem_perm.cgid), 2183 + sem_otime, 2184 + sma->sem_ctime); 2185 + 2186 + return 0; 2185 2187 } 2186 2188 #endif
+22 -20
ipc/shm.c
··· 1342 1342 #define SIZE_SPEC "%21lu" 1343 1343 #endif 1344 1344 1345 - return seq_printf(s, 1346 - "%10d %10d %4o " SIZE_SPEC " %5u %5u " 1347 - "%5lu %5u %5u %5u %5u %10lu %10lu %10lu " 1348 - SIZE_SPEC " " SIZE_SPEC "\n", 1349 - shp->shm_perm.key, 1350 - shp->shm_perm.id, 1351 - shp->shm_perm.mode, 1352 - shp->shm_segsz, 1353 - shp->shm_cprid, 1354 - shp->shm_lprid, 1355 - shp->shm_nattch, 1356 - from_kuid_munged(user_ns, shp->shm_perm.uid), 1357 - from_kgid_munged(user_ns, shp->shm_perm.gid), 1358 - from_kuid_munged(user_ns, shp->shm_perm.cuid), 1359 - from_kgid_munged(user_ns, shp->shm_perm.cgid), 1360 - shp->shm_atim, 1361 - shp->shm_dtim, 1362 - shp->shm_ctim, 1363 - rss * PAGE_SIZE, 1364 - swp * PAGE_SIZE); 1345 + seq_printf(s, 1346 + "%10d %10d %4o " SIZE_SPEC " %5u %5u " 1347 + "%5lu %5u %5u %5u %5u %10lu %10lu %10lu " 1348 + SIZE_SPEC " " SIZE_SPEC "\n", 1349 + shp->shm_perm.key, 1350 + shp->shm_perm.id, 1351 + shp->shm_perm.mode, 1352 + shp->shm_segsz, 1353 + shp->shm_cprid, 1354 + shp->shm_lprid, 1355 + shp->shm_nattch, 1356 + from_kuid_munged(user_ns, shp->shm_perm.uid), 1357 + from_kgid_munged(user_ns, shp->shm_perm.gid), 1358 + from_kuid_munged(user_ns, shp->shm_perm.cuid), 1359 + from_kgid_munged(user_ns, shp->shm_perm.cgid), 1360 + shp->shm_atim, 1361 + shp->shm_dtim, 1362 + shp->shm_ctim, 1363 + rss * PAGE_SIZE, 1364 + swp * PAGE_SIZE); 1365 + 1366 + return 0; 1365 1367 } 1366 1368 #endif
+4 -2
ipc/util.c
··· 837 837 struct ipc_proc_iter *iter = s->private; 838 838 struct ipc_proc_iface *iface = iter->iface; 839 839 840 - if (it == SEQ_START_TOKEN) 841 - return seq_puts(s, iface->header); 840 + if (it == SEQ_START_TOKEN) { 841 + seq_puts(s, iface->header); 842 + return 0; 843 + } 842 844 843 845 return iface->show(s, it); 844 846 }
+3 -1
kernel/Makefile
··· 9 9 extable.o params.o \ 10 10 kthread.o sys_ni.o nsproxy.o \ 11 11 notifier.o ksysfs.o cred.o reboot.o \ 12 - async.o range.o groups.o smpboot.o 12 + async.o range.o smpboot.o 13 + 14 + obj-$(CONFIG_MULTIUSER) += groups.o 13 15 14 16 ifdef CONFIG_FUNCTION_TRACER 15 17 # Do not trace debug files and internal ftrace files
+19 -16
kernel/capability.c
··· 35 35 } 36 36 __setup("no_file_caps", file_caps_disable); 37 37 38 + #ifdef CONFIG_MULTIUSER 38 39 /* 39 40 * More recent versions of libcap are available from: 40 41 * ··· 387 386 } 388 387 EXPORT_SYMBOL(ns_capable); 389 388 389 + 390 + /** 391 + * capable - Determine if the current task has a superior capability in effect 392 + * @cap: The capability to be tested for 393 + * 394 + * Return true if the current task has the given superior capability currently 395 + * available for use, false if not. 396 + * 397 + * This sets PF_SUPERPRIV on the task if the capability is available on the 398 + * assumption that it's about to be used. 399 + */ 400 + bool capable(int cap) 401 + { 402 + return ns_capable(&init_user_ns, cap); 403 + } 404 + EXPORT_SYMBOL(capable); 405 + #endif /* CONFIG_MULTIUSER */ 406 + 390 407 /** 391 408 * file_ns_capable - Determine if the file's opener had a capability in effect 392 409 * @file: The file we want to check ··· 429 410 return false; 430 411 } 431 412 EXPORT_SYMBOL(file_ns_capable); 432 - 433 - /** 434 - * capable - Determine if the current task has a superior capability in effect 435 - * @cap: The capability to be tested for 436 - * 437 - * Return true if the current task has the given superior capability currently 438 - * available for use, false if not. 439 - * 440 - * This sets PF_SUPERPRIV on the task if the capability is available on the 441 - * assumption that it's about to be used. 442 - */ 443 - bool capable(int cap) 444 - { 445 - return ns_capable(&init_user_ns, cap); 446 - } 447 - EXPORT_SYMBOL(capable); 448 413 449 414 /** 450 415 * capable_wrt_inode_uidgid - Check nsown_capable and uid and gid mapped
+4 -2
kernel/cgroup.c
··· 4196 4196 4197 4197 static int cgroup_pidlist_show(struct seq_file *s, void *v) 4198 4198 { 4199 - return seq_printf(s, "%d\n", *(int *)v); 4199 + seq_printf(s, "%d\n", *(int *)v); 4200 + 4201 + return 0; 4200 4202 } 4201 4203 4202 4204 static u64 cgroup_read_notify_on_release(struct cgroup_subsys_state *css, ··· 5453 5451 struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss) 5454 5452 { 5455 5453 WARN_ON_ONCE(!rcu_read_lock_held()); 5456 - return idr_find(&ss->css_idr, id); 5454 + return id > 0 ? idr_find(&ss->css_idr, id) : NULL; 5457 5455 } 5458 5456 5459 5457 #ifdef CONFIG_CGROUP_DEBUG
+3
kernel/cred.c
··· 29 29 30 30 static struct kmem_cache *cred_jar; 31 31 32 + /* init to 2 - one for init_task, one to ensure it is never freed */ 33 + struct group_info init_groups = { .usage = ATOMIC_INIT(2) }; 34 + 32 35 /* 33 36 * The initial credentials for the initial task 34 37 */
-3
kernel/groups.c
··· 9 9 #include <linux/user_namespace.h> 10 10 #include <asm/uaccess.h> 11 11 12 - /* init to 2 - one for init_task, one to ensure it is never freed */ 13 - struct group_info init_groups = { .usage = ATOMIC_INIT(2) }; 14 - 15 12 struct group_info *groups_alloc(int gidsetsize) 16 13 { 17 14 struct group_info *group_info;
+2 -2
kernel/hung_task.c
··· 169 169 return; 170 170 171 171 rcu_read_lock(); 172 - do_each_thread(g, t) { 172 + for_each_process_thread(g, t) { 173 173 if (!max_count--) 174 174 goto unlock; 175 175 if (!--batch_count) { ··· 180 180 /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */ 181 181 if (t->state == TASK_UNINTERRUPTIBLE) 182 182 check_hung_task(t, timeout); 183 - } while_each_thread(g, t); 183 + } 184 184 unlock: 185 185 rcu_read_unlock(); 186 186 }
+48 -5
kernel/reboot.c
··· 387 387 } 388 388 389 389 char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff"; 390 + static const char reboot_cmd[] = "/sbin/reboot"; 390 391 391 - static int __orderly_poweroff(bool force) 392 + static int run_cmd(const char *cmd) 392 393 { 393 394 char **argv; 394 395 static char *envp[] = { ··· 398 397 NULL 399 398 }; 400 399 int ret; 401 - 402 - argv = argv_split(GFP_KERNEL, poweroff_cmd, NULL); 400 + argv = argv_split(GFP_KERNEL, cmd, NULL); 403 401 if (argv) { 404 402 ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC); 405 403 argv_free(argv); ··· 406 406 ret = -ENOMEM; 407 407 } 408 408 409 + return ret; 410 + } 411 + 412 + static int __orderly_reboot(void) 413 + { 414 + int ret; 415 + 416 + ret = run_cmd(reboot_cmd); 417 + 418 + if (ret) { 419 + pr_warn("Failed to start orderly reboot: forcing the issue\n"); 420 + emergency_sync(); 421 + kernel_restart(NULL); 422 + } 423 + 424 + return ret; 425 + } 426 + 427 + static int __orderly_poweroff(bool force) 428 + { 429 + int ret; 430 + 431 + ret = run_cmd(poweroff_cmd); 432 + 433 + if (ret && force) { 434 + pr_warn("Failed to start orderly shutdown: forcing the issue\n"); 435 + 436 + /* 437 + * I guess this should try to kick off some daemon to sync and 438 + * poweroff asap. Or not even bother syncing if we're doing an ··· 461 436 * This may be called from any context to trigger a system shutdown. 462 437 * If the orderly shutdown fails, it will force an immediate shutdown. 463 438 */ 464 - int orderly_poweroff(bool force) 439 + void orderly_poweroff(bool force) 465 440 { 466 441 if (force) /* do not override the pending "true" */ 467 442 poweroff_force = true; 468 443 schedule_work(&poweroff_work); 469 - return 0; 470 444 } 471 445 EXPORT_SYMBOL_GPL(orderly_poweroff); 446 + 447 + static void reboot_work_func(struct work_struct *work) 448 + { 449 + __orderly_reboot(); 450 + } 451 + 452 + static DECLARE_WORK(reboot_work, reboot_work_func); 453 + 454 + /** 455 + * orderly_reboot - Trigger an orderly system reboot 456 + * 457 + * This may be called from any context to trigger a system reboot. 458 + * If the orderly reboot fails, it will force an immediate reboot. 459 + */ 460 + void orderly_reboot(void) 461 + { 462 + schedule_work(&reboot_work); 463 + } 464 + EXPORT_SYMBOL_GPL(orderly_reboot); 472 465 473 466 static int __init reboot_setup(char *str) 474 467 {
-32
kernel/resource.c
··· 1034 1034 * 1035 1035 * request_region creates a new busy region. 1036 1036 * 1037 - * check_region returns non-zero if the area is already busy. 1038 - * 1039 1037 * release_region releases a matching busy region. 1040 1038 */ 1041 1039 ··· 1094 1096 return res; 1095 1097 } 1096 1098 EXPORT_SYMBOL(__request_region); 1097 - 1098 - /** 1099 - * __check_region - check if a resource region is busy or free 1100 - * @parent: parent resource descriptor 1101 - * @start: resource start address 1102 - * @n: resource region size 1103 - * 1104 - * Returns 0 if the region is free at the moment it is checked, 1105 - * returns %-EBUSY if the region is busy. 1106 - * 1107 - * NOTE: 1108 - * This function is deprecated because its use is racy. 1109 - * Even if it returns 0, a subsequent call to request_region() 1110 - * may fail because another driver etc. just allocated the region. 1111 - * Do NOT use it. It will be removed from the kernel. 1112 - */ 1113 - int __check_region(struct resource *parent, resource_size_t start, 1114 - resource_size_t n) 1115 - { 1116 - struct resource * res; 1117 - 1118 - res = __request_region(parent, start, n, "check-region", 0); 1119 - if (!res) 1120 - return -EBUSY; 1121 - 1122 - release_resource(res); 1123 - free_resource(res); 1124 - return 0; 1125 - } 1126 - EXPORT_SYMBOL(__check_region); 1127 1099 1128 1100 /** 1129 1101 * __release_region - release a previously reserved resource region
+2
kernel/sys.c
··· 325 325 * SMP: There are not races, the GIDs are checked only by filesystem 326 326 * operations (as far as semantic preservation is concerned). 327 327 */ 328 + #ifdef CONFIG_MULTIUSER 328 329 SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid) 329 330 { 330 331 struct user_namespace *ns = current_user_ns(); ··· 816 815 commit_creds(new); 817 816 return old_fsgid; 818 817 } 818 + #endif /* CONFIG_MULTIUSER */ 819 819 820 820 /** 821 821 * sys_getpid - return the thread group id of the current process
+14
kernel/sys_ni.c
··· 159 159 cond_syscall(sys_fadvise64); 160 160 cond_syscall(sys_fadvise64_64); 161 161 cond_syscall(sys_madvise); 162 + cond_syscall(sys_setuid); 163 + cond_syscall(sys_setregid); 164 + cond_syscall(sys_setgid); 165 + cond_syscall(sys_setreuid); 166 + cond_syscall(sys_setresuid); 167 + cond_syscall(sys_getresuid); 168 + cond_syscall(sys_setresgid); 169 + cond_syscall(sys_getresgid); 170 + cond_syscall(sys_setgroups); 171 + cond_syscall(sys_getgroups); 172 + cond_syscall(sys_setfsuid); 173 + cond_syscall(sys_setfsgid); 174 + cond_syscall(sys_capget); 175 + cond_syscall(sys_capset); 162 176 163 177 /* arch-specific weak syscall entries */ 164 178 cond_syscall(sys_pciconfig_read);
+9
kernel/sysctl.c
··· 1335 1335 .extra1 = &min_extfrag_threshold, 1336 1336 .extra2 = &max_extfrag_threshold, 1337 1337 }, 1338 + { 1339 + .procname = "compact_unevictable_allowed", 1340 + .data = &sysctl_compact_unevictable_allowed, 1341 + .maxlen = sizeof(int), 1342 + .mode = 0644, 1343 + .proc_handler = proc_dointvec, 1344 + .extra1 = &zero, 1345 + .extra2 = &one, 1346 + }, 1338 1347 1339 1348 #endif /* CONFIG_COMPACTION */ 1340 1349 {
+2 -2
kernel/trace/trace_stack.c
··· 327 327 local_irq_enable(); 328 328 } 329 329 330 - static int trace_lookup_stack(struct seq_file *m, long i) 330 + static void trace_lookup_stack(struct seq_file *m, long i) 331 331 { 332 332 unsigned long addr = stack_dump_trace[i]; 333 333 334 - return seq_printf(m, "%pS\n", (void *)addr); 334 + seq_printf(m, "%pS\n", (void *)addr); 335 335 } 336 336 337 337 static void print_disabled(struct seq_file *m)
+5 -4
lib/lru_cache.c
··· 247 247 * progress) and "changed", when this in fact lead to an successful 248 248 * update of the cache. 249 249 */ 250 - return seq_printf(seq, "\t%s: used:%u/%u " 251 - "hits:%lu misses:%lu starving:%lu locked:%lu changed:%lu\n", 252 - lc->name, lc->used, lc->nr_elements, 253 - lc->hits, lc->misses, lc->starving, lc->locked, lc->changed); 250 + seq_printf(seq, "\t%s: used:%u/%u hits:%lu misses:%lu starving:%lu locked:%lu changed:%lu\n", 251 + lc->name, lc->used, lc->nr_elements, 252 + lc->hits, lc->misses, lc->starving, lc->locked, lc->changed); 253 + 254 + return 0; 254 255 } 255 256 256 257 static struct hlist_head *lc_hash_slot(struct lru_cache *lc, unsigned int enr)
+78 -111
lib/string_helpers.c
··· 239 239 } 240 240 EXPORT_SYMBOL(string_unescape); 241 241 242 - static int escape_passthrough(unsigned char c, char **dst, size_t *osz) 242 + static bool escape_passthrough(unsigned char c, char **dst, char *end) 243 243 { 244 244 char *out = *dst; 245 245 246 - if (*osz < 1) 247 - return -ENOMEM; 248 - 249 - *out++ = c; 250 - 251 - *dst = out; 252 - *osz -= 1; 253 - 254 - return 1; 246 + if (out < end) 247 + *out = c; 248 + *dst = out + 1; 249 + return true; 255 250 } 256 251 257 - static int escape_space(unsigned char c, char **dst, size_t *osz) 252 + static bool escape_space(unsigned char c, char **dst, char *end) 258 253 { 259 254 char *out = *dst; 260 255 unsigned char to; 261 - 262 - if (*osz < 2) 263 - return -ENOMEM; 264 256 265 257 switch (c) { 266 258 case '\n': ··· 271 279 to = 'f'; 272 280 break; 273 281 default: 274 - return 0; 282 + return false; 275 283 } 276 284 277 - *out++ = '\\'; 278 - *out++ = to; 285 + if (out < end) 286 + *out = '\\'; 287 + ++out; 288 + if (out < end) 289 + *out = to; 290 + ++out; 279 291 280 292 *dst = out; 281 - *osz -= 2; 282 - 283 - return 1; 293 + return true; 284 294 } 285 295 286 - static int escape_special(unsigned char c, char **dst, size_t *osz) 296 + static bool escape_special(unsigned char c, char **dst, char *end) 287 297 { 288 298 char *out = *dst; 289 299 unsigned char to; 290 - 291 - if (*osz < 2) 292 - return -ENOMEM; 293 300 294 301 switch (c) { 295 302 case '\\': ··· 301 310 to = 'e'; 302 311 break; 303 312 default: 304 - return 0; 313 + return false; 305 314 } 306 315 307 - *out++ = '\\'; 308 - *out++ = to; 316 + if (out < end) 317 + *out = '\\'; 318 + ++out; 319 + if (out < end) 320 + *out = to; 321 + ++out; 309 322 310 323 *dst = out; 311 - *osz -= 2; 312 - 313 - return 1; 324 + return true; 314 325 } 315 326 316 - static int escape_null(unsigned char c, char **dst, size_t *osz) 327 + static bool escape_null(unsigned char c, char **dst, char *end) 317 328 { 318 329 char *out = *dst; 319 - 320 - if (*osz < 2) 321 - return -ENOMEM; 322 330 323 331 if (c) 324 - return 0; 332 + return false; 325 333 326 - *out++ = '\\'; 327 - *out++ = '0'; 334 + if (out < end) 335 + *out = '\\'; 336 + ++out; 337 + if (out < end) 338 + *out = '0'; 339 + ++out; 328 340 329 341 *dst = out; 330 - *osz -= 2; 331 - 332 - return 1; 342 + return true; 333 343 } 334 344 335 - static int escape_octal(unsigned char c, char **dst, size_t *osz) 345 + static bool escape_octal(unsigned char c, char **dst, char *end) 336 346 { 337 347 char *out = *dst; 338 348 339 - if (*osz < 4) 340 - return -ENOMEM; 341 - 342 - *out++ = '\\'; 343 - *out++ = ((c >> 6) & 0x07) + '0'; 344 - *out++ = ((c >> 3) & 0x07) + '0'; 345 - *out++ = ((c >> 0) & 0x07) + '0'; 349 + if (out < end) 350 + *out = '\\'; 351 + ++out; 352 + if (out < end) 353 + *out = ((c >> 6) & 0x07) + '0'; 354 + ++out; 355 + if (out < end) 356 + *out = ((c >> 3) & 0x07) + '0'; 357 + ++out; 358 + if (out < end) 359 + *out = ((c >> 0) & 0x07) + '0'; 360 + ++out; 346 361 347 362 *dst = out; 348 - *osz -= 4; 349 - 350 - return 1; 363 + return true; 351 364 } 352 365 353 - static int escape_hex(unsigned char c, char **dst, size_t *osz) 366 + static bool escape_hex(unsigned char c, char **dst, char *end) 354 367 { 355 368 char *out = *dst; 356 369 357 - if (*osz < 4) 358 - return -ENOMEM; 359 - 360 - *out++ = '\\'; 361 - *out++ = 'x'; 362 - *out++ = hex_asc_hi(c); 363 - *out++ = hex_asc_lo(c); 370 + if (out < end) 371 + *out = '\\'; 372 + ++out; 373 + if (out < end) 374 + *out = 'x'; 375 + ++out; 376 + if (out < end) 377 + *out = hex_asc_hi(c); 378 + ++out; 379 + if (out < end) 380 + *out = hex_asc_lo(c); 381 + ++out; 364 382 365 383 *dst = out; 366 - *osz -= 4; 367 - 368 - return 1; 384 + return true; 369 385 } 370 386 371 387 /** ··· 424 426 * it if needs. 425 427 * 426 428 * Return: 427 - * The amount of the characters processed to the destination buffer, or 428 - * %-ENOMEM if the size of buffer is not enough to put an escaped character is 429 - * returned. 430 - * 431 - * Even in the case of error @dst pointer will be updated to point to the byte 432 - * after the last processed character. 429 + * The total size of the escaped output that would be generated for 430 + * the given input and flags. To check whether the output was 431 + * truncated, compare the return value to osz. There is room left in 432 + * dst for a '\0' terminator if and only if ret < osz. 433 433 */ 434 - int string_escape_mem(const char *src, size_t isz, char **dst, size_t osz, 434 + int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, 435 435 unsigned int flags, const char *esc) 436 436 { 437 - char *out = *dst, *p = out; 437 + char *p = dst; 438 + char *end = p + osz; 438 439 bool is_dict = esc && *esc; 439 - int ret = 0; 440 440 441 441 while (isz--) { 442 442 unsigned char c = *src++; ··· 454 458 (is_dict && !strchr(esc, c))) { 455 459 /* do nothing */ 456 460 } else { 457 - if (flags & ESCAPE_SPACE) { 458 - ret = escape_space(c, &p, &osz); 459 - if (ret < 0) 460 - break; 461 - if (ret > 0) 462 - continue; 463 - } 461 + if (flags & ESCAPE_SPACE && escape_space(c, &p, end)) 462 + continue; 464 463 465 - if (flags & ESCAPE_SPECIAL) { 466 - ret = escape_special(c, &p, &osz); 467 - if (ret < 0) 468 - break; 469 - if (ret > 0) 470 - continue; 471 - } 464 + if (flags & ESCAPE_SPECIAL && escape_special(c, &p, end)) 465 + continue; 472 466 473 - if (flags & ESCAPE_NULL) { 474 - ret = escape_null(c, &p, &osz); 475 - if (ret < 0) 476 - break; 477 - if (ret > 0) 478 - continue; 479 - } 467 + if (flags & ESCAPE_NULL && escape_null(c, &p, end)) 468 + continue; 480 469 481 470 /* ESCAPE_OCTAL and ESCAPE_HEX always go last */ 482 - if (flags & ESCAPE_OCTAL) { 483 - ret = escape_octal(c, &p, &osz); 484 - if (ret < 0) 485 - break; 471 + if (flags & ESCAPE_OCTAL && escape_octal(c, &p, end)) 486 472 continue; 487 - } 488 - if (flags & ESCAPE_HEX) { 489 - ret = escape_hex(c, &p, &osz); 490 - if (ret < 0) 491 - break; 473 + 474 + if (flags & ESCAPE_HEX && escape_hex(c, &p, end)) 492 475 continue; 493 - } 494 476 } 495 477 496 - ret = escape_passthrough(c, &p, &osz); 497 - if (ret < 0) 498 - break; 478 + escape_passthrough(c, &p, end); 499 479 } 500 480 501 - *dst = p; 502 - 503 - if (ret < 0) 504 - return ret; 505 - 506 - return p - out; 481 + return p - dst; 507 482 } 508 483 EXPORT_SYMBOL(string_escape_mem);
+4 -4
lib/test-hexdump.c
··· 18 18 19 19 static const unsigned char data_a[] = ".2.{....p..$}.4...1.....L...C..."; 20 20 21 - static const char *test_data_1_le[] __initconst = { 21 + static const char * const test_data_1_le[] __initconst = { 22 22 "be", "32", "db", "7b", "0a", "18", "93", "b2", 23 23 "70", "ba", "c4", "24", "7d", "83", "34", "9b", 24 24 "a6", "9c", "31", "ad", "9c", "0f", "ac", "e9", 25 25 "4c", "d1", "19", "99", "43", "b1", "af", "0c", 26 26 }; 27 27 28 - static const char *test_data_2_le[] __initconst = { 28 + static const char *test_data_2_le[] __initdata = { 29 29 "32be", "7bdb", "180a", "b293", 30 30 "ba70", "24c4", "837d", "9b34", 31 31 "9ca6", "ad31", "0f9c", "e9ac", 32 32 "d14c", "9919", "b143", "0caf", 33 33 }; 34 34 35 - static const char *test_data_4_le[] __initconst = { 35 + static const char *test_data_4_le[] __initdata = { 36 36 "7bdb32be", "b293180a", "24c4ba70", "9b34837d", 37 37 "ad319ca6", "e9ac0f9c", "9919d14c", "0cafb143", 38 38 }; 39 39 40 - static const char *test_data_8_le[] __initconst = { 40 + static const char *test_data_8_le[] __initdata = { 41 41 "b293180a7bdb32be", "9b34837d24c4ba70", 42 42 "e9ac0f9cad319ca6", "0cafb1439919d14c", 43 43 };
+20 -20
lib/test-string_helpers.c
··· 260 260 return NULL; 261 261 } 262 262 263 + static __init void 264 + test_string_escape_overflow(const char *in, int p, unsigned int flags, const char *esc, 265 + int q_test, const char *name) 266 + { 267 + int q_real; 268 + 269 + q_real = string_escape_mem(in, p, NULL, 0, flags, esc); 270 + if (q_real != q_test) 271 + pr_warn("Test '%s' failed: flags = %u, osz = 0, expected %d, got %d\n", 272 + name, flags, q_test, q_real); 273 + } 274 + 263 275 static __init void test_string_escape(const char *name, 264 276 const struct test_string_2 *s2, 265 277 unsigned int flags, const char *esc) 266 278 { 267 - int q_real = 512; 268 - char *out_test = kmalloc(q_real, GFP_KERNEL); 269 - char *out_real = kmalloc(q_real, GFP_KERNEL); 279 + size_t out_size = 512; 280 + char *out_test = kmalloc(out_size, GFP_KERNEL); 281 + char *out_real = kmalloc(out_size, GFP_KERNEL); 270 282 char *in = kmalloc(256, GFP_KERNEL); 271 - char *buf = out_real; 272 283 int p = 0, q_test = 0; 284 + int q_real; 273 285 274 286 if (!out_test || !out_real || !in) 275 287 goto out; ··· 313 301 q_test += len; 314 302 } 315 303 316 - q_real = string_escape_mem(in, p, &buf, q_real, flags, esc); 304 + q_real = string_escape_mem(in, p, out_real, out_size, flags, esc); 317 305 318 306 test_string_check_buf(name, flags, in, p, out_real, q_real, out_test, 319 307 q_test); 308 + 309 + test_string_escape_overflow(in, p, flags, esc, q_test, name); 310 + 320 311 out: 321 312 kfree(in); 322 313 kfree(out_real); 323 314 kfree(out_test); 324 - } 325 - 326 - static __init void test_string_escape_nomem(void) 327 - { 328 - char *in = "\eb \\C\007\"\x90\r]"; 329 - char out[64], *buf = out; 330 - int rc = -ENOMEM, ret; 331 - 332 - ret = string_escape_str_any_np(in, &buf, strlen(in), NULL); 333 - if (ret == rc) 334 - return; 335 - 336 - pr_err("Test 'escape nomem' failed: got %d instead of %d\n", ret, rc); 337 315 } 338 316 339 317 static int __init test_string_helpers_init(void) ··· 343 341 /* With dictionary */ 344 342 for (i = 0; i < (ESCAPE_ANY_NP | ESCAPE_HEX) + 1; i++) 345 343 test_string_escape("escape 1", escape1, i, TEST_STRING_2_DICT_1); 346 - 347 - test_string_escape_nomem(); 348 344 349 345 return -EINVAL; 350 346 }
+72 -38
lib/vsprintf.c
··· 17 17 */ 18 18 19 19 #include <stdarg.h> 20 + #include <linux/clk-provider.h> 20 21 #include <linux/module.h> /* for KSYM_SYMBOL_LEN */ 21 22 #include <linux/types.h> 22 23 #include <linux/string.h> ··· 341 340 return len; 342 341 } 343 342 344 - #define ZEROPAD 1 /* pad with zero */ 345 - #define SIGN 2 /* unsigned/signed long */ 343 + #define SIGN 1 /* unsigned/signed, must be 1 */ 344 + #define LEFT 2 /* left justified */ 346 345 #define PLUS 4 /* show plus */ 347 346 #define SPACE 8 /* space if plus */ 348 - #define LEFT 16 /* left justified */ 347 + #define ZEROPAD 16 /* pad with zero, must be 16 == '0' - ' ' */ 349 348 #define SMALL 32 /* use lowercase in hex (must be 32 == 0x20) */ 350 349 #define SPECIAL 64 /* prefix hex with "0x", octal with "0" */ 351 350 ··· 384 383 char *number(char *buf, char *end, unsigned long long num, 385 384 struct printf_spec spec) 386 385 { 387 - /* we are called with base 8, 10 or 16, only, thus don't need "G..." */ 388 - static const char digits[16] = "0123456789ABCDEF"; /* "GHIJKLMNOPQRSTUVWXYZ"; */ 389 - 390 - char tmp[66]; 386 + char tmp[3 * sizeof(num)]; 391 387 char sign; 392 388 char locase; 393 389 int need_pfx = ((spec.flags & SPECIAL) && spec.base != 10); ··· 420 422 /* generate full string in tmp[], in reverse order */ 421 423 i = 0; 422 424 if (num < spec.base) 423 - tmp[i++] = digits[num] | locase; 424 - /* Generic code, for any base: 425 - else do { 426 - tmp[i++] = (digits[do_div(num,base)] | locase); 427 - } while (num != 0); 428 - */ 425 + tmp[i++] = hex_asc_upper[num] | locase; 429 426 else if (spec.base != 10) { /* 8 or 16 */ 430 427 int mask = spec.base - 1; 431 428 int shift = 3; ··· 428 435 if (spec.base == 16) 429 436 shift = 4; 430 437 do { 431 - tmp[i++] = (digits[((unsigned char)num) & mask] | locase); 438 + tmp[i++] = (hex_asc_upper[((unsigned char)num) & mask] | locase); 432 439 num >>= shift; 433 440 } while (num); 434 441 } else { /* base 10 */ ··· 440 447 spec.precision = i; 441 448 /* leading space padding */ 442 449 spec.field_width -= spec.precision; 443 - if (!(spec.flags & (ZEROPAD+LEFT))) { 450 + if (!(spec.flags & (ZEROPAD | LEFT))) { 444 451 while (--spec.field_width >= 0) { 445 452 if (buf < end) 446 453 *buf = ' '; ··· 468 475 } 469 476 /* zero or space padding */ 470 477 if (!(spec.flags & LEFT)) { 471 - char c = (spec.flags & ZEROPAD) ? '0' : ' '; 478 + char c = ' ' + (spec.flags & ZEROPAD); 479 + BUILD_BUG_ON(' ' + ZEROPAD != '0'); 472 480 while (--spec.field_width >= 0) { 473 481 if (buf < end) 474 482 *buf = c; ··· 777 783 if (spec.field_width > 0) 778 784 len = min_t(int, spec.field_width, 64); 779 785 780 - for (i = 0; i < len && buf < end - 1; i++) { 781 - buf = hex_byte_pack(buf, addr[i]); 786 + for (i = 0; i < len; ++i) { 787 + if (buf < end) 788 + *buf = hex_asc_hi(addr[i]); 789 + ++buf; 790 + if (buf < end) 791 + *buf = hex_asc_lo(addr[i]); 792 + ++buf; 782 793 783 - if (buf < end && separator && i != len - 1) 784 - *buf++ = separator; 794 + if (separator && i != len - 1) { 795 + if (buf < end) 796 + *buf = separator; 797 + ++buf; 798 + } 785 799 } 786 800 787 801 return buf; ··· 1235 1233 1236 1234 len = spec.field_width < 0 ? 1 : spec.field_width; 1237 1235 1238 - /* Ignore the error. We print as many characters as we can */ 1239 - string_escape_mem(addr, len, &buf, end - buf, flags, NULL); 1236 + /* 1237 + * string_escape_mem() writes as many characters as it can to 1238 + * the given buffer, and returns the total size of the output 1239 + * had the buffer been big enough. 1240 + */ 1241 + buf += string_escape_mem(addr, len, buf, buf < end ? end - buf : 0, flags, NULL); 1240 1242 1241 1243 return buf; 1242 1244 } ··· 1328 1322 return number(buf, end, num, spec); 1329 1323 } 1330 1324 1325 + static noinline_for_stack 1326 + char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec, 1327 + const char *fmt) 1328 + { 1329 + if (!IS_ENABLED(CONFIG_HAVE_CLK) || !clk) 1330 + return string(buf, end, NULL, spec); 1331 + 1332 + switch (fmt[1]) { 1333 + case 'r': 1334 + return number(buf, end, clk_get_rate(clk), spec); 1335 + 1336 + case 'n': 1337 + default: 1338 + #ifdef CONFIG_COMMON_CLK 1339 + return string(buf, end, __clk_get_name(clk), spec); 1340 + #else 1341 + spec.base = 16; 1342 + spec.field_width = sizeof(unsigned long) * 2 + 2; 1343 + spec.flags |= SPECIAL | SMALL | ZEROPAD; 1344 + return number(buf, end, (unsigned long)clk, spec); 1345 + #endif 1346 + } 1347 + } 1348 + 1331 1349 int kptr_restrict __read_mostly; 1332 1350 1333 1351 /* ··· 1434 1404 * (default assumed to be phys_addr_t, passed by reference) 1435 1405 * - 'd[234]' For a dentry name (optionally 2-4 last components) 1436 1406 * - 'D[234]' Same as 'd' but for a struct file 1407 + * - 'C' For a clock, it prints the name (Common Clock Framework) or address 1408 + * (legacy clock framework) of the clock 1409 + * - 'Cn' For a clock, it prints the name (Common Clock Framework) or address 1410 + * (legacy clock framework) of the clock 1411 + * - 'Cr' For a clock, it prints the current rate of the clock 1437 1412 * 1438 1413 * Note: The difference between 'S' and 'F' is that on ia64 and ppc64 1439 1414 * function pointers are really function descriptors, which contain a ··· 1583 1548 return address_val(buf, end, ptr, spec, fmt); 1584 1549 case 'd': 1585 1550 return dentry_name(buf, end, ptr, spec, fmt); 1551 + case 'C': 1552 + return clock(buf, end, ptr, spec, fmt); 1586 1553 case 'D': 1587 1554 return dentry_name(buf, end, 1588 1555 ((const struct file *)ptr)->f_path.dentry, ··· 1775 1738 if (spec->qualifier == 'L') 1776 1739 spec->type = FORMAT_TYPE_LONG_LONG; 1777 1740 else if (spec->qualifier == 'l') { 1778 - if (spec->flags & SIGN) 1779 - spec->type = FORMAT_TYPE_LONG; 1780 - else 1781 - spec->type = FORMAT_TYPE_ULONG; 1741 + BUILD_BUG_ON(FORMAT_TYPE_ULONG + SIGN != FORMAT_TYPE_LONG); 1742 + spec->type = FORMAT_TYPE_ULONG + (spec->flags & SIGN); 1782 1743 } else if (_tolower(spec->qualifier) == 'z') { 1783 1744 spec->type = FORMAT_TYPE_SIZE_T; 1784 1745 } else if (spec->qualifier == 't') { 1785 1746 spec->type = FORMAT_TYPE_PTRDIFF; 1786 1747 } else if (spec->qualifier == 'H') { 1787 - if (spec->flags & SIGN) 1788 - spec->type = FORMAT_TYPE_BYTE; 1789 - else 1790 - spec->type = FORMAT_TYPE_UBYTE; 1748 + BUILD_BUG_ON(FORMAT_TYPE_UBYTE + SIGN != FORMAT_TYPE_BYTE); 1749 + spec->type = FORMAT_TYPE_UBYTE + (spec->flags & SIGN); 1791 1750 } else if (spec->qualifier == 'h') { 1792 - if (spec->flags & SIGN) 1793 - spec->type = FORMAT_TYPE_SHORT; 1794 - else 1795 - spec->type = FORMAT_TYPE_USHORT; 1751 + BUILD_BUG_ON(FORMAT_TYPE_USHORT + SIGN != FORMAT_TYPE_SHORT); 1752 + spec->type = FORMAT_TYPE_USHORT + (spec->flags & SIGN); 1796 1753 } else { 1797 - if (spec->flags & SIGN) 1798 - spec->type = FORMAT_TYPE_INT; 1799 - else 1800 - spec->type = FORMAT_TYPE_UINT; 1754 + BUILD_BUG_ON(FORMAT_TYPE_UINT + SIGN != FORMAT_TYPE_INT); 1755 + spec->type = FORMAT_TYPE_UINT + (spec->flags & SIGN); 1801 1756 } 1802 1757 1803 1758 return ++fmt - start; ··· 1829 1800 * %*pE[achnops] print an escaped buffer 1830 1801 * %*ph[CDN] a variable-length hex string with a separator (supports up to 64 1831 1802 * bytes of the input) 1803 + * %pC output the name (Common Clock Framework) or address (legacy clock 1804 + * framework) of a clock 1805 + * %pCn output the name (Common Clock Framework) or address (legacy clock 1806 + * framework) of a clock 1807 + * %pCr output the current rate of a clock 1832 1808 * %n is ignored 1833 1809 * 1834 1810 * ** Please update Documentation/printk-formats.txt when making changes **
+5
mm/cma.c
··· 23 23 # define DEBUG 24 24 #endif 25 25 #endif 26 + #define CREATE_TRACE_POINTS 26 27 27 28 #include <linux/memblock.h> 28 29 #include <linux/err.h> ··· 35 34 #include <linux/cma.h> 36 35 #include <linux/highmem.h> 37 36 #include <linux/io.h> 37 + #include <trace/events/cma.h> 38 38 39 39 #include "cma.h" 40 40 ··· 416 414 start = bitmap_no + mask + 1; 417 415 } 418 416 417 + trace_cma_alloc(page ? pfn : -1UL, page, count, align); 418 + 419 419 pr_debug("%s(): returned %p\n", __func__, page); 420 420 return page; 421 421 } ··· 450 446 451 447 free_contig_range(pfn, count); 452 448 cma_clear_bitmap(cma, pfn, count); 449 + trace_cma_release(pfn, pages, count); 453 450 454 451 return true; 455 452 }
+38 -3
mm/cma_debug.c
··· 30 30 31 31 return 0; 32 32 } 33 - 34 33 DEFINE_SIMPLE_ATTRIBUTE(cma_debugfs_fops, cma_debugfs_get, NULL, "%llu\n"); 34 + 35 + static int cma_used_get(void *data, u64 *val) 36 + { 37 + struct cma *cma = data; 38 + unsigned long used; 39 + 40 + mutex_lock(&cma->lock); 41 + /* pages counter is smaller than sizeof(int) */ 42 + used = bitmap_weight(cma->bitmap, (int)cma->count); 43 + mutex_unlock(&cma->lock); 44 + *val = (u64)used << cma->order_per_bit; 45 + 46 + return 0; 47 + } 48 + DEFINE_SIMPLE_ATTRIBUTE(cma_used_fops, cma_used_get, NULL, "%llu\n"); 49 + 50 + static int cma_maxchunk_get(void *data, u64 *val) 51 + { 52 + struct cma *cma = data; 53 + unsigned long maxchunk = 0; 54 + unsigned long start, end = 0; 55 + 56 + mutex_lock(&cma->lock); 57 + for (;;) { 58 + start = find_next_zero_bit(cma->bitmap, cma->count, end); 59 + if (start >= cma->count) 60 + break; 61 + end = find_next_bit(cma->bitmap, cma->count, start); 62 + maxchunk = max(end - start, maxchunk); 63 + } 64 + mutex_unlock(&cma->lock); 65 + *val = (u64)maxchunk << cma->order_per_bit; 66 + 67 + return 0; 68 + } 69 + DEFINE_SIMPLE_ATTRIBUTE(cma_maxchunk_fops, cma_maxchunk_get, NULL, "%llu\n"); 35 70 36 71 static void cma_add_to_cma_mem_list(struct cma *cma, struct cma_mem *mem) 37 72 { ··· 126 91 127 92 return cma_free_mem(cma, pages); 128 93 } 129 - 130 94 DEFINE_SIMPLE_ATTRIBUTE(cma_free_fops, NULL, cma_free_write, "%llu\n"); 131 95 132 96 static int cma_alloc_mem(struct cma *cma, int count) ··· 158 124 159 125 return cma_alloc_mem(cma, pages); 160 126 } 161 - 162 127 DEFINE_SIMPLE_ATTRIBUTE(cma_alloc_fops, NULL, cma_alloc_write, "%llu\n"); 163 128 164 129 static void cma_debugfs_add_one(struct cma *cma, int idx) ··· 182 149 &cma->count, &cma_debugfs_fops); 183 150 debugfs_create_file("order_per_bit", S_IRUGO, tmp, 184 151 &cma->order_per_bit, &cma_debugfs_fops); 152 + debugfs_create_file("used", S_IRUGO, tmp, cma, &cma_used_fops); 153 + debugfs_create_file("maxchunk", S_IRUGO, tmp, cma, 
&cma_maxchunk_fops); 185 154 186 155 u32s = DIV_ROUND_UP(cma_bitmap_maxno(cma), BITS_PER_BYTE * sizeof(u32)); 187 156 debugfs_create_u32_array("bitmap", S_IRUGO, tmp, (u32*)cma->bitmap, u32s);
+38 -22
mm/compaction.c
··· 391 391 return false; 392 392 } 393 393 394 - /* Returns true if the page is within a block suitable for migration to */ 395 - static bool suitable_migration_target(struct page *page) 396 - { 397 - /* If the page is a large free page, then disallow migration */ 398 - if (PageBuddy(page)) { 399 - /* 400 - * We are checking page_order without zone->lock taken. But 401 - * the only small danger is that we skip a potentially suitable 402 - * pageblock, so it's not worth to check order for valid range. 403 - */ 404 - if (page_order_unsafe(page) >= pageblock_order) 405 - return false; 406 - } 407 - 408 - /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */ 409 - if (migrate_async_suitable(get_pageblock_migratetype(page))) 410 - return true; 411 - 412 - /* Otherwise skip the block */ 413 - return false; 414 - } 415 - 416 394 /* 417 395 * Isolate free pages onto a private freelist. If @strict is true, will abort 418 396 * returning 0 on any invalid PFNs or non-free pages inside of the pageblock ··· 874 896 875 897 #endif /* CONFIG_COMPACTION || CONFIG_CMA */ 876 898 #ifdef CONFIG_COMPACTION 899 + 900 + /* Returns true if the page is within a block suitable for migration to */ 901 + static bool suitable_migration_target(struct page *page) 902 + { 903 + /* If the page is a large free page, then disallow migration */ 904 + if (PageBuddy(page)) { 905 + /* 906 + * We are checking page_order without zone->lock taken. But 907 + * the only small danger is that we skip a potentially suitable 908 + * pageblock, so it's not worth to check order for valid range. 
909 + */ 910 + if (page_order_unsafe(page) >= pageblock_order) 911 + return false; 912 + } 913 + 914 + /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */ 915 + if (migrate_async_suitable(get_pageblock_migratetype(page))) 916 + return true; 917 + 918 + /* Otherwise skip the block */ 919 + return false; 920 + } 921 + 877 922 /* 878 923 * Based on information in the current compact_control, find blocks 879 924 * suitable for isolating free pages from and then isolate them. ··· 1048 1047 } isolate_migrate_t; 1049 1048 1050 1049 /* 1050 + * Allow userspace to control policy on scanning the unevictable LRU for 1051 + * compactable pages. 1052 + */ 1053 + int sysctl_compact_unevictable_allowed __read_mostly = 1; 1054 + 1055 + /* 1051 1056 * Isolate all pages that can be migrated from the first suitable block, 1052 1057 * starting at the block pointed to by the migrate scanner pfn within 1053 1058 * compact_control. ··· 1064 1057 unsigned long low_pfn, end_pfn; 1065 1058 struct page *page; 1066 1059 const isolate_mode_t isolate_mode = 1060 + (sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) | 1067 1061 (cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0); 1068 1062 1069 1063 /* ··· 1605 1597 cc->zone = zone; 1606 1598 INIT_LIST_HEAD(&cc->freepages); 1607 1599 INIT_LIST_HEAD(&cc->migratepages); 1600 + 1601 + /* 1602 + * When called via /proc/sys/vm/compact_memory 1603 + * this makes sure we compact the whole zone regardless of 1604 + * cached scanner positions. 1605 + */ 1606 + if (cc->order == -1) 1607 + __reset_isolation_suitable(zone); 1608 1608 1609 1609 if (cc->order == -1 || !compaction_deferred(zone, cc->order)) 1610 1610 compact_zone(zone, cc);
+2 -2
mm/gup.c
··· 1019 1019 * 1020 1020 * for an example see gup_get_pte in arch/x86/mm/gup.c 1021 1021 */ 1022 - pte_t pte = ACCESS_ONCE(*ptep); 1022 + pte_t pte = READ_ONCE(*ptep); 1023 1023 struct page *page; 1024 1024 1025 1025 /* ··· 1309 1309 local_irq_save(flags); 1310 1310 pgdp = pgd_offset(mm, addr); 1311 1311 do { 1312 - pgd_t pgd = ACCESS_ONCE(*pgdp); 1312 + pgd_t pgd = READ_ONCE(*pgdp); 1313 1313 1314 1314 next = pgd_addr_end(addr, end); 1315 1315 if (pgd_none(pgd))
+49 -37
mm/huge_memory.c
··· 67 67 68 68 static int khugepaged(void *none); 69 69 static int khugepaged_slab_init(void); 70 + static void khugepaged_slab_exit(void); 70 71 71 72 #define MM_SLOTS_HASH_BITS 10 72 73 static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS); ··· 110 109 int nr_zones = 0; 111 110 unsigned long recommended_min; 112 111 113 - if (!khugepaged_enabled()) 114 - return 0; 115 - 116 112 for_each_populated_zone(zone) 117 113 nr_zones++; 118 114 ··· 141 143 setup_per_zone_wmarks(); 142 144 return 0; 143 145 } 144 - late_initcall(set_recommended_min_free_kbytes); 145 146 146 - static int start_khugepaged(void) 147 + static int start_stop_khugepaged(void) 147 148 { 148 149 int err = 0; 149 150 if (khugepaged_enabled()) { ··· 153 156 pr_err("khugepaged: kthread_run(khugepaged) failed\n"); 154 157 err = PTR_ERR(khugepaged_thread); 155 158 khugepaged_thread = NULL; 159 + goto fail; 156 160 } 157 161 158 162 if (!list_empty(&khugepaged_scan.mm_head)) ··· 164 166 kthread_stop(khugepaged_thread); 165 167 khugepaged_thread = NULL; 166 168 } 167 - 169 + fail: 168 170 return err; 169 171 } 170 172 ··· 181 183 struct page *zero_page; 182 184 retry: 183 185 if (likely(atomic_inc_not_zero(&huge_zero_refcount))) 184 - return ACCESS_ONCE(huge_zero_page); 186 + return READ_ONCE(huge_zero_page); 185 187 186 188 zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE, 187 189 HPAGE_PMD_ORDER); ··· 200 202 /* We take additional reference here. 
It will be put back by shrinker */ 201 203 atomic_set(&huge_zero_refcount, 2); 202 204 preempt_enable(); 203 - return ACCESS_ONCE(huge_zero_page); 205 + return READ_ONCE(huge_zero_page); 204 206 } 205 207 206 208 static void put_huge_zero_page(void) ··· 298 300 int err; 299 301 300 302 mutex_lock(&khugepaged_mutex); 301 - err = start_khugepaged(); 303 + err = start_stop_khugepaged(); 302 304 mutex_unlock(&khugepaged_mutex); 303 305 304 306 if (err) ··· 632 634 633 635 err = hugepage_init_sysfs(&hugepage_kobj); 634 636 if (err) 635 - return err; 637 + goto err_sysfs; 636 638 637 639 err = khugepaged_slab_init(); 638 640 if (err) 639 - goto out; 641 + goto err_slab; 640 642 641 - register_shrinker(&huge_zero_page_shrinker); 643 + err = register_shrinker(&huge_zero_page_shrinker); 644 + if (err) 645 + goto err_hzp_shrinker; 642 646 643 647 /* 644 648 * By default disable transparent hugepages on smaller systems, 645 649 * where the extra memory used could hurt more than TLB overhead 646 650 * is likely to save. The admin can still enable it through /sys. 
647 651 */ 648 - if (totalram_pages < (512 << (20 - PAGE_SHIFT))) 652 + if (totalram_pages < (512 << (20 - PAGE_SHIFT))) { 649 653 transparent_hugepage_flags = 0; 654 + return 0; 655 + } 650 656 651 - start_khugepaged(); 657 + err = start_stop_khugepaged(); 658 + if (err) 659 + goto err_khugepaged; 652 660 653 661 return 0; 654 - out: 662 + err_khugepaged: 663 + unregister_shrinker(&huge_zero_page_shrinker); 664 + err_hzp_shrinker: 665 + khugepaged_slab_exit(); 666 + err_slab: 655 667 hugepage_exit_sysfs(hugepage_kobj); 668 + err_sysfs: 656 669 return err; 657 670 } 658 671 subsys_initcall(hugepage_init); ··· 717 708 static int __do_huge_pmd_anonymous_page(struct mm_struct *mm, 718 709 struct vm_area_struct *vma, 719 710 unsigned long haddr, pmd_t *pmd, 720 - struct page *page) 711 + struct page *page, gfp_t gfp) 721 712 { 722 713 struct mem_cgroup *memcg; 723 714 pgtable_t pgtable; ··· 725 716 726 717 VM_BUG_ON_PAGE(!PageCompound(page), page); 727 718 728 - if (mem_cgroup_try_charge(page, mm, GFP_TRANSHUGE, &memcg)) 719 + if (mem_cgroup_try_charge(page, mm, gfp, &memcg)) 729 720 return VM_FAULT_OOM; 730 721 731 722 pgtable = pte_alloc_one(mm, haddr); ··· 831 822 count_vm_event(THP_FAULT_FALLBACK); 832 823 return VM_FAULT_FALLBACK; 833 824 } 834 - if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page))) { 825 + if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page, gfp))) { 835 826 put_page(page); 836 827 count_vm_event(THP_FAULT_FALLBACK); 837 828 return VM_FAULT_FALLBACK; ··· 1089 1080 unsigned long haddr; 1090 1081 unsigned long mmun_start; /* For mmu_notifiers */ 1091 1082 unsigned long mmun_end; /* For mmu_notifiers */ 1083 + gfp_t huge_gfp; /* for allocation and charge */ 1092 1084 1093 1085 ptl = pmd_lockptr(mm, pmd); 1094 1086 VM_BUG_ON_VMA(!vma->anon_vma, vma); ··· 1116 1106 alloc: 1117 1107 if (transparent_hugepage_enabled(vma) && 1118 1108 !transparent_hugepage_debug_cow()) { 1119 - gfp_t gfp; 1120 - 1121 - gfp = 
alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0); 1122 - new_page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER); 1109 + huge_gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0); 1110 + new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PMD_ORDER); 1123 1111 } else 1124 1112 new_page = NULL; 1125 1113 ··· 1138 1130 goto out; 1139 1131 } 1140 1132 1141 - if (unlikely(mem_cgroup_try_charge(new_page, mm, 1142 - GFP_TRANSHUGE, &memcg))) { 1133 + if (unlikely(mem_cgroup_try_charge(new_page, mm, huge_gfp, &memcg))) { 1143 1134 put_page(new_page); 1144 1135 if (page) { 1145 1136 split_huge_page(page); ··· 1983 1976 return 0; 1984 1977 } 1985 1978 1979 + static void __init khugepaged_slab_exit(void) 1980 + { 1981 + kmem_cache_destroy(mm_slot_cache); 1982 + } 1983 + 1986 1984 static inline struct mm_slot *alloc_mm_slot(void) 1987 1985 { 1988 1986 if (!mm_slot_cache) /* initialization failed */ ··· 2335 2323 return true; 2336 2324 } 2337 2325 2338 - static struct page 2339 - *khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm, 2326 + static struct page * 2327 + khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm, 2340 2328 struct vm_area_struct *vma, unsigned long address, 2341 2329 int node) 2342 2330 { 2343 - gfp_t flags; 2344 - 2345 2331 VM_BUG_ON_PAGE(*hpage, *hpage); 2346 - 2347 - /* Only allocate from the target node */ 2348 - flags = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) | 2349 - __GFP_THISNODE; 2350 2332 2351 2333 /* 2352 2334 * Before allocating the hugepage, release the mmap_sem read lock. 
··· 2350 2344 */ 2351 2345 up_read(&mm->mmap_sem); 2352 2346 2353 - *hpage = alloc_pages_exact_node(node, flags, HPAGE_PMD_ORDER); 2347 + *hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER); 2354 2348 if (unlikely(!*hpage)) { 2355 2349 count_vm_event(THP_COLLAPSE_ALLOC_FAILED); 2356 2350 *hpage = ERR_PTR(-ENOMEM); ··· 2403 2397 return true; 2404 2398 } 2405 2399 2406 - static struct page 2407 - *khugepaged_alloc_page(struct page **hpage, struct mm_struct *mm, 2400 + static struct page * 2401 + khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm, 2408 2402 struct vm_area_struct *vma, unsigned long address, 2409 2403 int node) 2410 2404 { 2411 2405 up_read(&mm->mmap_sem); 2412 2406 VM_BUG_ON(!*hpage); 2407 + 2413 2408 return *hpage; 2414 2409 } 2415 2410 #endif ··· 2445 2438 struct mem_cgroup *memcg; 2446 2439 unsigned long mmun_start; /* For mmu_notifiers */ 2447 2440 unsigned long mmun_end; /* For mmu_notifiers */ 2441 + gfp_t gfp; 2448 2442 2449 2443 VM_BUG_ON(address & ~HPAGE_PMD_MASK); 2450 2444 2445 + /* Only allocate from the target node */ 2446 + gfp = alloc_hugepage_gfpmask(khugepaged_defrag(), __GFP_OTHER_NODE) | 2447 + __GFP_THISNODE; 2448 + 2451 2449 /* release the mmap_sem read lock. */ 2452 - new_page = khugepaged_alloc_page(hpage, mm, vma, address, node); 2450 + new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node); 2453 2451 if (!new_page) 2454 2452 return; 2455 2453 2456 2454 if (unlikely(mem_cgroup_try_charge(new_page, mm, 2457 - GFP_TRANSHUGE, &memcg))) 2455 + gfp, &memcg))) 2458 2456 return; 2459 2457 2460 2458 /*
+166 -70
mm/hugetlb.c
··· 61 61 static int num_fault_mutexes; 62 62 static struct mutex *htlb_fault_mutex_table ____cacheline_aligned_in_smp; 63 63 64 + /* Forward declaration */ 65 + static int hugetlb_acct_memory(struct hstate *h, long delta); 66 + 64 67 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool) 65 68 { 66 69 bool free = (spool->count == 0) && (spool->used_hpages == 0); ··· 71 68 spin_unlock(&spool->lock); 72 69 73 70 /* If no pages are used, and no other handles to the subpool 74 - * remain, free the subpool the subpool remain */ 75 - if (free) 71 + * remain, give up any reservations based on minimum size and 72 + * free the subpool */ 73 + if (free) { 74 + if (spool->min_hpages != -1) 75 + hugetlb_acct_memory(spool->hstate, 76 + -spool->min_hpages); 76 77 kfree(spool); 78 + } 77 79 } 78 80 79 - struct hugepage_subpool *hugepage_new_subpool(long nr_blocks) 81 + struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages, 82 + long min_hpages) 80 83 { 81 84 struct hugepage_subpool *spool; 82 85 83 - spool = kmalloc(sizeof(*spool), GFP_KERNEL); 86 + spool = kzalloc(sizeof(*spool), GFP_KERNEL); 84 87 if (!spool) 85 88 return NULL; 86 89 87 90 spin_lock_init(&spool->lock); 88 91 spool->count = 1; 89 - spool->max_hpages = nr_blocks; 90 - spool->used_hpages = 0; 92 + spool->max_hpages = max_hpages; 93 + spool->hstate = h; 94 + spool->min_hpages = min_hpages; 95 + 96 + if (min_hpages != -1 && hugetlb_acct_memory(h, min_hpages)) { 97 + kfree(spool); 98 + return NULL; 99 + } 100 + spool->rsv_hpages = min_hpages; 91 101 92 102 return spool; 93 103 } ··· 113 97 unlock_or_release_subpool(spool); 114 98 } 115 99 116 - static int hugepage_subpool_get_pages(struct hugepage_subpool *spool, 100 + /* 101 + * Subpool accounting for allocating and reserving pages. 102 + * Return -ENOMEM if there are not enough resources to satisfy 103 + * the request. 
Otherwise, return the number of pages by which the 104 + * global pools must be adjusted (upward). The returned value may 105 + * only be different than the passed value (delta) in the case where 106 + * a subpool minimum size must be maintained. 107 + */ 108 + static long hugepage_subpool_get_pages(struct hugepage_subpool *spool, 117 109 long delta) 118 110 { 119 - int ret = 0; 111 + long ret = delta; 120 112 121 113 if (!spool) 122 - return 0; 114 + return ret; 123 115 124 116 spin_lock(&spool->lock); 125 - if ((spool->used_hpages + delta) <= spool->max_hpages) { 126 - spool->used_hpages += delta; 127 - } else { 128 - ret = -ENOMEM; 129 - } 130 - spin_unlock(&spool->lock); 131 117 118 + if (spool->max_hpages != -1) { /* maximum size accounting */ 119 + if ((spool->used_hpages + delta) <= spool->max_hpages) 120 + spool->used_hpages += delta; 121 + else { 122 + ret = -ENOMEM; 123 + goto unlock_ret; 124 + } 125 + } 126 + 127 + if (spool->min_hpages != -1) { /* minimum size accounting */ 128 + if (delta > spool->rsv_hpages) { 129 + /* 130 + * Asking for more reserves than those already taken on 131 + * behalf of subpool. Return difference. 132 + */ 133 + ret = delta - spool->rsv_hpages; 134 + spool->rsv_hpages = 0; 135 + } else { 136 + ret = 0; /* reserves already accounted for */ 137 + spool->rsv_hpages -= delta; 138 + } 139 + } 140 + 141 + unlock_ret: 142 + spin_unlock(&spool->lock); 132 143 return ret; 133 144 } 134 145 135 - static void hugepage_subpool_put_pages(struct hugepage_subpool *spool, 146 + /* 147 + * Subpool accounting for freeing and unreserving pages. 148 + * Return the number of global page reservations that must be dropped. 149 + * The return value may only be different than the passed value (delta) 150 + * in the case where a subpool minimum size must be maintained. 
151 + */ 152 + static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, 136 153 long delta) 137 154 { 155 + long ret = delta; 156 + 138 157 if (!spool) 139 - return; 158 + return delta; 140 159 141 160 spin_lock(&spool->lock); 142 - spool->used_hpages -= delta; 143 - /* If hugetlbfs_put_super couldn't free spool due to 144 - * an outstanding quota reference, free it now. */ 161 + 162 + if (spool->max_hpages != -1) /* maximum size accounting */ 163 + spool->used_hpages -= delta; 164 + 165 + if (spool->min_hpages != -1) { /* minimum size accounting */ 166 + if (spool->rsv_hpages + delta <= spool->min_hpages) 167 + ret = 0; 168 + else 169 + ret = spool->rsv_hpages + delta - spool->min_hpages; 170 + 171 + spool->rsv_hpages += delta; 172 + if (spool->rsv_hpages > spool->min_hpages) 173 + spool->rsv_hpages = spool->min_hpages; 174 + } 175 + 176 + /* 177 + * If hugetlbfs_put_super couldn't free spool due to an outstanding 178 + * quota reference, free it now. 179 + */ 145 180 unlock_or_release_subpool(spool); 181 + 182 + return ret; 146 183 } 147 184 148 185 static inline struct hugepage_subpool *subpool_inode(struct inode *inode) ··· 924 855 return NULL; 925 856 } 926 857 858 + /* 859 + * Test to determine whether the hugepage is "active/in-use" (i.e. being linked 860 + * to hstate->hugepage_activelist.) 861 + * 862 + * This function can be called for tail pages, but never returns true for them. 
863 + */ 864 + bool page_huge_active(struct page *page) 865 + { 866 + VM_BUG_ON_PAGE(!PageHuge(page), page); 867 + return PageHead(page) && PagePrivate(&page[1]); 868 + } 869 + 870 + /* never called for tail page */ 871 + static void set_page_huge_active(struct page *page) 872 + { 873 + VM_BUG_ON_PAGE(!PageHeadHuge(page), page); 874 + SetPagePrivate(&page[1]); 875 + } 876 + 877 + static void clear_page_huge_active(struct page *page) 878 + { 879 + VM_BUG_ON_PAGE(!PageHeadHuge(page), page); 880 + ClearPagePrivate(&page[1]); 881 + } 882 + 927 883 void free_huge_page(struct page *page) 928 884 { 929 885 /* ··· 968 874 restore_reserve = PagePrivate(page); 969 875 ClearPagePrivate(page); 970 876 877 + /* 878 + * A return code of zero implies that the subpool will be under its 879 + * minimum size if the reservation is not restored after page is free. 880 + * Therefore, force restore_reserve operation. 881 + */ 882 + if (hugepage_subpool_put_pages(spool, 1) == 0) 883 + restore_reserve = true; 884 + 971 885 spin_lock(&hugetlb_lock); 886 + clear_page_huge_active(page); 972 887 hugetlb_cgroup_uncharge_page(hstate_index(h), 973 888 pages_per_huge_page(h), page); 974 889 if (restore_reserve) ··· 994 891 enqueue_huge_page(h, page); 995 892 } 996 893 spin_unlock(&hugetlb_lock); 997 - hugepage_subpool_put_pages(spool, 1); 998 894 } 999 895 1000 896 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid) ··· 1488 1386 if (chg < 0) 1489 1387 return ERR_PTR(-ENOMEM); 1490 1388 if (chg || avoid_reserve) 1491 - if (hugepage_subpool_get_pages(spool, 1)) 1389 + if (hugepage_subpool_get_pages(spool, 1) < 0) 1492 1390 return ERR_PTR(-ENOSPC); 1493 1391 1494 1392 ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); ··· 2556 2454 struct resv_map *resv = vma_resv_map(vma); 2557 2455 struct hugepage_subpool *spool = subpool_vma(vma); 2558 2456 unsigned long reserve, start, end; 2457 + long gbl_reserve; 2559 2458 2560 2459 if (!resv || 
!is_vma_resv_set(vma, HPAGE_RESV_OWNER)) 2561 2460 return; ··· 2569 2466 kref_put(&resv->refs, resv_map_release); 2570 2467 2571 2468 if (reserve) { 2572 - hugetlb_acct_memory(h, -reserve); 2573 - hugepage_subpool_put_pages(spool, reserve); 2469 + /* 2470 + * Decrement reserve counts. The global reserve count may be 2471 + * adjusted if the subpool has a minimum size. 2472 + */ 2473 + gbl_reserve = hugepage_subpool_put_pages(spool, reserve); 2474 + hugetlb_acct_memory(h, -gbl_reserve); 2574 2475 } 2575 2476 } 2576 2477 ··· 2998 2891 copy_user_huge_page(new_page, old_page, address, vma, 2999 2892 pages_per_huge_page(h)); 3000 2893 __SetPageUptodate(new_page); 2894 + set_page_huge_active(new_page); 3001 2895 3002 2896 mmun_start = address & huge_page_mask(h); 3003 2897 mmun_end = mmun_start + huge_page_size(h); ··· 3111 3003 } 3112 3004 clear_huge_page(page, address, pages_per_huge_page(h)); 3113 3005 __SetPageUptodate(page); 3006 + set_page_huge_active(page); 3114 3007 3115 3008 if (vma->vm_flags & VM_MAYSHARE) { 3116 3009 int err; ··· 3556 3447 struct hstate *h = hstate_inode(inode); 3557 3448 struct hugepage_subpool *spool = subpool_inode(inode); 3558 3449 struct resv_map *resv_map; 3450 + long gbl_reserve; 3559 3451 3560 3452 /* 3561 3453 * Only apply hugepage reservation if asked. At fault time, an ··· 3593 3483 goto out_err; 3594 3484 } 3595 3485 3596 - /* There must be enough pages in the subpool for the mapping */ 3597 - if (hugepage_subpool_get_pages(spool, chg)) { 3486 + /* 3487 + * There must be enough pages in the subpool for the mapping. If 3488 + * the subpool has a minimum size, there may be some global 3489 + * reservations already in place (gbl_reserve). 3490 + */ 3491 + gbl_reserve = hugepage_subpool_get_pages(spool, chg); 3492 + if (gbl_reserve < 0) { 3598 3493 ret = -ENOSPC; 3599 3494 goto out_err; 3600 3495 } ··· 3608 3493 * Check enough hugepages are available for the reservation. 
3609 3494 * Hand the pages back to the subpool if there are not 3610 3495 */ 3611 - ret = hugetlb_acct_memory(h, chg); 3496 + ret = hugetlb_acct_memory(h, gbl_reserve); 3612 3497 if (ret < 0) { 3613 - hugepage_subpool_put_pages(spool, chg); 3498 + /* put back original number of pages, chg */ 3499 + (void)hugepage_subpool_put_pages(spool, chg); 3614 3500 goto out_err; 3615 3501 } 3616 3502 ··· 3641 3525 struct resv_map *resv_map = inode_resv_map(inode); 3642 3526 long chg = 0; 3643 3527 struct hugepage_subpool *spool = subpool_inode(inode); 3528 + long gbl_reserve; 3644 3529 3645 3530 if (resv_map) 3646 3531 chg = region_truncate(resv_map, offset); ··· 3649 3532 inode->i_blocks -= (blocks_per_huge_page(h) * freed); 3650 3533 spin_unlock(&inode->i_lock); 3651 3534 3652 - hugepage_subpool_put_pages(spool, (chg - freed)); 3653 - hugetlb_acct_memory(h, -(chg - freed)); 3535 + /* 3536 + * If the subpool has a minimum size, the number of global 3537 + * reservations to be released may be adjusted. 3538 + */ 3539 + gbl_reserve = hugepage_subpool_put_pages(spool, (chg - freed)); 3540 + hugetlb_acct_memory(h, -gbl_reserve); 3654 3541 } 3655 3542 3656 3543 #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE ··· 3896 3775 3897 3776 #ifdef CONFIG_MEMORY_FAILURE 3898 3777 3899 - /* Should be called in hugetlb_lock */ 3900 - static int is_hugepage_on_freelist(struct page *hpage) 3901 - { 3902 - struct page *page; 3903 - struct page *tmp; 3904 - struct hstate *h = page_hstate(hpage); 3905 - int nid = page_to_nid(hpage); 3906 - 3907 - list_for_each_entry_safe(page, tmp, &h->hugepage_freelists[nid], lru) 3908 - if (page == hpage) 3909 - return 1; 3910 - return 0; 3911 - } 3912 - 3913 3778 /* 3914 3779 * This function is called from memory failure code. 3915 3780 * Assume the caller holds page lock of the head page. 
··· 3907 3800 int ret = -EBUSY; 3908 3801 3909 3802 spin_lock(&hugetlb_lock); 3910 - if (is_hugepage_on_freelist(hpage)) { 3803 + /* 3804 + * Just checking !page_huge_active is not enough, because that could be 3805 + * an isolated/hwpoisoned hugepage (which have >0 refcount). 3806 + */ 3807 + if (!page_huge_active(hpage) && !page_count(hpage)) { 3911 3808 /* 3912 3809 * Hwpoisoned hugepage isn't linked to activelist or freelist, 3913 3810 * but dangling hpage->lru can trigger list-debug warnings ··· 3931 3820 3932 3821 bool isolate_huge_page(struct page *page, struct list_head *list) 3933 3822 { 3823 + bool ret = true; 3824 + 3934 3825 VM_BUG_ON_PAGE(!PageHead(page), page); 3935 - if (!get_page_unless_zero(page)) 3936 - return false; 3937 3826 spin_lock(&hugetlb_lock); 3827 + if (!page_huge_active(page) || !get_page_unless_zero(page)) { 3828 + ret = false; 3829 + goto unlock; 3830 + } 3831 + clear_page_huge_active(page); 3938 3832 list_move_tail(&page->lru, list); 3833 + unlock: 3939 3834 spin_unlock(&hugetlb_lock); 3940 - return true; 3835 + return ret; 3941 3836 } 3942 3837 3943 3838 void putback_active_hugepage(struct page *page) 3944 3839 { 3945 3840 VM_BUG_ON_PAGE(!PageHead(page), page); 3946 3841 spin_lock(&hugetlb_lock); 3842 + set_page_huge_active(page); 3947 3843 list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist); 3948 3844 spin_unlock(&hugetlb_lock); 3949 3845 put_page(page); 3950 - } 3951 - 3952 - bool is_hugepage_active(struct page *page) 3953 - { 3954 - VM_BUG_ON_PAGE(!PageHuge(page), page); 3955 - /* 3956 - * This function can be called for a tail page because the caller, 3957 - * scan_movable_pages, scans through a given pfn-range which typically 3958 - * covers one memory block. In systems using gigantic hugepage (1GB 3959 - * for x86_64,) a hugepage is larger than a memory block, and we don't 3960 - * support migrating such large hugepages for now, so return false 3961 - * when called for tail pages. 
3962 - */ 3963 - if (PageTail(page)) 3964 - return false; 3965 - /* 3966 - * Refcount of a hwpoisoned hugepages is 1, but they are not active, 3967 - * so we should return false for them. 3968 - */ 3969 - if (unlikely(PageHWPoison(page))) 3970 - return false; 3971 - return page_count(page) > 0; 3972 3846 }
+2 -2
mm/internal.h
··· 224 224 * PageBuddy() should be checked first by the caller to minimize race window, 225 225 * and invalid values must be handled gracefully. 226 226 * 227 - * ACCESS_ONCE is used so that if the caller assigns the result into a local 227 + * READ_ONCE is used so that if the caller assigns the result into a local 228 228 * variable and e.g. tests it for valid range before using, the compiler cannot 229 229 * decide to remove the variable and inline the page_private(page) multiple 230 230 * times, potentially observing different values in the tests and the actual 231 231 * use of the result. 232 232 */ 233 - #define page_order_unsafe(page) ACCESS_ONCE(page_private(page)) 233 + #define page_order_unsafe(page) READ_ONCE(page_private(page)) 234 234 235 235 static inline bool is_cow_mapping(vm_flags_t flags) 236 236 {
+13
mm/kasan/kasan.c
··· 389 389 kasan_kmalloc(page->slab_cache, object, size); 390 390 } 391 391 392 + void kasan_kfree(void *ptr) 393 + { 394 + struct page *page; 395 + 396 + page = virt_to_head_page(ptr); 397 + 398 + if (unlikely(!PageSlab(page))) 399 + kasan_poison_shadow(ptr, PAGE_SIZE << compound_order(page), 400 + KASAN_FREE_PAGE); 401 + else 402 + kasan_slab_free(page->slab_cache, ptr); 403 + } 404 + 392 405 void kasan_kfree_large(const void *ptr) 393 406 { 394 407 struct page *page = virt_to_page(ptr);
+5 -5
mm/ksm.c
··· 542 542 expected_mapping = (void *)stable_node + 543 543 (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM); 544 544 again: 545 - kpfn = ACCESS_ONCE(stable_node->kpfn); 545 + kpfn = READ_ONCE(stable_node->kpfn); 546 546 page = pfn_to_page(kpfn); 547 547 548 548 /* ··· 551 551 * but on Alpha we need to be more careful. 552 552 */ 553 553 smp_read_barrier_depends(); 554 - if (ACCESS_ONCE(page->mapping) != expected_mapping) 554 + if (READ_ONCE(page->mapping) != expected_mapping) 555 555 goto stale; 556 556 557 557 /* ··· 577 577 cpu_relax(); 578 578 } 579 579 580 - if (ACCESS_ONCE(page->mapping) != expected_mapping) { 580 + if (READ_ONCE(page->mapping) != expected_mapping) { 581 581 put_page(page); 582 582 goto stale; 583 583 } 584 584 585 585 if (lock_it) { 586 586 lock_page(page); 587 - if (ACCESS_ONCE(page->mapping) != expected_mapping) { 587 + if (READ_ONCE(page->mapping) != expected_mapping) { 588 588 unlock_page(page); 589 589 put_page(page); 590 590 goto stale; ··· 600 600 * before checking whether node->kpfn has been changed. 601 601 */ 602 602 smp_rmb(); 603 - if (ACCESS_ONCE(stable_node->kpfn) != kpfn) 603 + if (READ_ONCE(stable_node->kpfn) != kpfn) 604 604 goto again; 605 605 remove_node_from_stable_tree(stable_node); 606 606 return NULL;
+16 -2
mm/memblock.c
··· 580 580 return memblock_add_range(&memblock.memory, base, size, nid, 0); 581 581 } 582 582 583 + static int __init_memblock memblock_add_region(phys_addr_t base, 584 + phys_addr_t size, 585 + int nid, 586 + unsigned long flags) 587 + { 588 + struct memblock_type *_rgn = &memblock.memory; 589 + 590 + memblock_dbg("memblock_add: [%#016llx-%#016llx] flags %#02lx %pF\n", 591 + (unsigned long long)base, 592 + (unsigned long long)base + size - 1, 593 + flags, (void *)_RET_IP_); 594 + 595 + return memblock_add_range(_rgn, base, size, nid, flags); 596 + } 597 + 583 598 int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size) 584 599 { 585 - return memblock_add_range(&memblock.memory, base, size, 586 - MAX_NUMNODES, 0); 600 + return memblock_add_region(base, size, MAX_NUMNODES, 0); 587 601 } 588 602 589 603 /**
+17 -30
mm/memcontrol.c
··· 259 259 * page cache and RSS per cgroup. We would eventually like to provide 260 260 * statistics based on the statistics developed by Rik Van Riel for clock-pro, 261 261 * to help the administrator determine what knobs to tune. 262 - * 263 - * TODO: Add a water mark for the memory controller. Reclaim will begin when 264 - * we hit the water mark. May be even add a low water mark, such that 265 - * no reclaim occurs from a cgroup at it's low water mark, this is 266 - * a feature that will be implemented much later in the future. 267 262 */ 268 263 struct mem_cgroup { 269 264 struct cgroup_subsys_state css; ··· 455 460 return memcg->css.id; 456 461 } 457 462 463 + /* 464 + * A helper function to get mem_cgroup from ID. must be called under 465 + * rcu_read_lock(). The caller is responsible for calling 466 + * css_tryget_online() if the mem_cgroup is used for charging. (dropping 467 + * refcnt from swap can be called against removed memcg.) 468 + */ 458 469 static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id) 459 470 { 460 471 struct cgroup_subsys_state *css; ··· 674 673 static unsigned long soft_limit_excess(struct mem_cgroup *memcg) 675 674 { 676 675 unsigned long nr_pages = page_counter_read(&memcg->memory); 677 - unsigned long soft_limit = ACCESS_ONCE(memcg->soft_limit); 676 + unsigned long soft_limit = READ_ONCE(memcg->soft_limit); 678 677 unsigned long excess = 0; 679 678 680 679 if (nr_pages > soft_limit) ··· 1042 1041 goto out_unlock; 1043 1042 1044 1043 do { 1045 - pos = ACCESS_ONCE(iter->position); 1044 + pos = READ_ONCE(iter->position); 1046 1045 /* 1047 1046 * A racing update may change the position and 1048 1047 * put the last reference, hence css_tryget(), ··· 1359 1358 unsigned long limit; 1360 1359 1361 1360 count = page_counter_read(&memcg->memory); 1362 - limit = ACCESS_ONCE(memcg->memory.limit); 1361 + limit = READ_ONCE(memcg->memory.limit); 1363 1362 if (count < limit) 1364 1363 margin = limit - count; 1365 1364 1366 1365 if 
(do_swap_account) { 1367 1366 count = page_counter_read(&memcg->memsw); 1368 - limit = ACCESS_ONCE(memcg->memsw.limit); 1367 + limit = READ_ONCE(memcg->memsw.limit); 1369 1368 if (count <= limit) 1370 1369 margin = min(margin, limit - count); 1371 1370 } ··· 2350 2349 } 2351 2350 2352 2351 /* 2353 - * A helper function to get mem_cgroup from ID. must be called under 2354 - * rcu_read_lock(). The caller is responsible for calling 2355 - * css_tryget_online() if the mem_cgroup is used for charging. (dropping 2356 - * refcnt from swap can be called against removed memcg.) 2357 - */ 2358 - static struct mem_cgroup *mem_cgroup_lookup(unsigned short id) 2359 - { 2360 - /* ID 0 is unused ID */ 2361 - if (!id) 2362 - return NULL; 2363 - return mem_cgroup_from_id(id); 2364 - } 2365 - 2366 - /* 2367 2352 * try_get_mem_cgroup_from_page - look up page's memcg association 2368 2353 * @page: the page 2369 2354 * ··· 2375 2388 ent.val = page_private(page); 2376 2389 id = lookup_swap_cgroup_id(ent); 2377 2390 rcu_read_lock(); 2378 - memcg = mem_cgroup_lookup(id); 2391 + memcg = mem_cgroup_from_id(id); 2379 2392 if (memcg && !css_tryget_online(&memcg->css)) 2380 2393 memcg = NULL; 2381 2394 rcu_read_unlock(); ··· 2637 2650 return cachep; 2638 2651 2639 2652 memcg = get_mem_cgroup_from_mm(current->mm); 2640 - kmemcg_id = ACCESS_ONCE(memcg->kmemcg_id); 2653 + kmemcg_id = READ_ONCE(memcg->kmemcg_id); 2641 2654 if (kmemcg_id < 0) 2642 2655 goto out; 2643 2656 ··· 5007 5020 * tunable will only affect upcoming migrations, not the current one. 5008 5021 * So we need to save it, and keep it going. 
5009 5022 */ 5010 - move_flags = ACCESS_ONCE(memcg->move_charge_at_immigrate); 5023 + move_flags = READ_ONCE(memcg->move_charge_at_immigrate); 5011 5024 if (move_flags) { 5012 5025 struct mm_struct *mm; 5013 5026 struct mem_cgroup *from = mem_cgroup_from_task(p); ··· 5241 5254 static int memory_low_show(struct seq_file *m, void *v) 5242 5255 { 5243 5256 struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m)); 5244 - unsigned long low = ACCESS_ONCE(memcg->low); 5257 + unsigned long low = READ_ONCE(memcg->low); 5245 5258 5246 5259 if (low == PAGE_COUNTER_MAX) 5247 5260 seq_puts(m, "max\n"); ··· 5271 5284 static int memory_high_show(struct seq_file *m, void *v) 5272 5285 { 5273 5286 struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m)); 5274 - unsigned long high = ACCESS_ONCE(memcg->high); 5287 + unsigned long high = READ_ONCE(memcg->high); 5275 5288 5276 5289 if (high == PAGE_COUNTER_MAX) 5277 5290 seq_puts(m, "max\n"); ··· 5301 5314 static int memory_max_show(struct seq_file *m, void *v) 5302 5315 { 5303 5316 struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m)); 5304 - unsigned long max = ACCESS_ONCE(memcg->memory.limit); 5317 + unsigned long max = READ_ONCE(memcg->memory.limit); 5305 5318 5306 5319 if (max == PAGE_COUNTER_MAX) 5307 5320 seq_puts(m, "max\n"); ··· 5856 5869 5857 5870 id = swap_cgroup_record(entry, 0); 5858 5871 rcu_read_lock(); 5859 - memcg = mem_cgroup_lookup(id); 5872 + memcg = mem_cgroup_from_id(id); 5860 5873 if (memcg) { 5861 5874 if (!mem_cgroup_is_root(memcg)) 5862 5875 page_counter_uncharge(&memcg->memsw, 1);
+89 -33
mm/memory-failure.c
··· 521 521 [RECOVERED] = "Recovered", 522 522 }; 523 523 524 + enum action_page_type { 525 + MSG_KERNEL, 526 + MSG_KERNEL_HIGH_ORDER, 527 + MSG_SLAB, 528 + MSG_DIFFERENT_COMPOUND, 529 + MSG_POISONED_HUGE, 530 + MSG_HUGE, 531 + MSG_FREE_HUGE, 532 + MSG_UNMAP_FAILED, 533 + MSG_DIRTY_SWAPCACHE, 534 + MSG_CLEAN_SWAPCACHE, 535 + MSG_DIRTY_MLOCKED_LRU, 536 + MSG_CLEAN_MLOCKED_LRU, 537 + MSG_DIRTY_UNEVICTABLE_LRU, 538 + MSG_CLEAN_UNEVICTABLE_LRU, 539 + MSG_DIRTY_LRU, 540 + MSG_CLEAN_LRU, 541 + MSG_TRUNCATED_LRU, 542 + MSG_BUDDY, 543 + MSG_BUDDY_2ND, 544 + MSG_UNKNOWN, 545 + }; 546 + 547 + static const char * const action_page_types[] = { 548 + [MSG_KERNEL] = "reserved kernel page", 549 + [MSG_KERNEL_HIGH_ORDER] = "high-order kernel page", 550 + [MSG_SLAB] = "kernel slab page", 551 + [MSG_DIFFERENT_COMPOUND] = "different compound page after locking", 552 + [MSG_POISONED_HUGE] = "huge page already hardware poisoned", 553 + [MSG_HUGE] = "huge page", 554 + [MSG_FREE_HUGE] = "free huge page", 555 + [MSG_UNMAP_FAILED] = "unmapping failed page", 556 + [MSG_DIRTY_SWAPCACHE] = "dirty swapcache page", 557 + [MSG_CLEAN_SWAPCACHE] = "clean swapcache page", 558 + [MSG_DIRTY_MLOCKED_LRU] = "dirty mlocked LRU page", 559 + [MSG_CLEAN_MLOCKED_LRU] = "clean mlocked LRU page", 560 + [MSG_DIRTY_UNEVICTABLE_LRU] = "dirty unevictable LRU page", 561 + [MSG_CLEAN_UNEVICTABLE_LRU] = "clean unevictable LRU page", 562 + [MSG_DIRTY_LRU] = "dirty LRU page", 563 + [MSG_CLEAN_LRU] = "clean LRU page", 564 + [MSG_TRUNCATED_LRU] = "already truncated LRU page", 565 + [MSG_BUDDY] = "free buddy page", 566 + [MSG_BUDDY_2ND] = "free buddy page (2nd try)", 567 + [MSG_UNKNOWN] = "unknown page", 568 + }; 569 + 524 570 /* 525 571 * XXX: It is possible that a page is isolated from LRU cache, 526 572 * and then kept in swap cache or failed to remove from page cache. 
··· 823 777 static struct page_state { 824 778 unsigned long mask; 825 779 unsigned long res; 826 - char *msg; 780 + enum action_page_type type; 827 781 int (*action)(struct page *p, unsigned long pfn); 828 782 } error_states[] = { 829 - { reserved, reserved, "reserved kernel", me_kernel }, 783 + { reserved, reserved, MSG_KERNEL, me_kernel }, 830 784 /* 831 785 * free pages are specially detected outside this table: 832 786 * PG_buddy pages only make a small fraction of all free pages. ··· 837 791 * currently unused objects without touching them. But just 838 792 * treat it as standard kernel for now. 839 793 */ 840 - { slab, slab, "kernel slab", me_kernel }, 794 + { slab, slab, MSG_SLAB, me_kernel }, 841 795 842 796 #ifdef CONFIG_PAGEFLAGS_EXTENDED 843 - { head, head, "huge", me_huge_page }, 844 - { tail, tail, "huge", me_huge_page }, 797 + { head, head, MSG_HUGE, me_huge_page }, 798 + { tail, tail, MSG_HUGE, me_huge_page }, 845 799 #else 846 - { compound, compound, "huge", me_huge_page }, 800 + { compound, compound, MSG_HUGE, me_huge_page }, 847 801 #endif 848 802 849 - { sc|dirty, sc|dirty, "dirty swapcache", me_swapcache_dirty }, 850 - { sc|dirty, sc, "clean swapcache", me_swapcache_clean }, 803 + { sc|dirty, sc|dirty, MSG_DIRTY_SWAPCACHE, me_swapcache_dirty }, 804 + { sc|dirty, sc, MSG_CLEAN_SWAPCACHE, me_swapcache_clean }, 851 805 852 - { mlock|dirty, mlock|dirty, "dirty mlocked LRU", me_pagecache_dirty }, 853 - { mlock|dirty, mlock, "clean mlocked LRU", me_pagecache_clean }, 806 + { mlock|dirty, mlock|dirty, MSG_DIRTY_MLOCKED_LRU, me_pagecache_dirty }, 807 + { mlock|dirty, mlock, MSG_CLEAN_MLOCKED_LRU, me_pagecache_clean }, 854 808 855 - { unevict|dirty, unevict|dirty, "dirty unevictable LRU", me_pagecache_dirty }, 856 - { unevict|dirty, unevict, "clean unevictable LRU", me_pagecache_clean }, 809 + { unevict|dirty, unevict|dirty, MSG_DIRTY_UNEVICTABLE_LRU, me_pagecache_dirty }, 810 + { unevict|dirty, unevict, MSG_CLEAN_UNEVICTABLE_LRU, me_pagecache_clean }, 
857 811 858 - { lru|dirty, lru|dirty, "dirty LRU", me_pagecache_dirty }, 859 - { lru|dirty, lru, "clean LRU", me_pagecache_clean }, 812 + { lru|dirty, lru|dirty, MSG_DIRTY_LRU, me_pagecache_dirty }, 813 + { lru|dirty, lru, MSG_CLEAN_LRU, me_pagecache_clean }, 860 814 861 815 /* 862 816 * Catchall entry: must be at end. 863 817 */ 864 - { 0, 0, "unknown page state", me_unknown }, 818 + { 0, 0, MSG_UNKNOWN, me_unknown }, 865 819 }; 866 820 867 821 #undef dirty ··· 881 835 * "Dirty/Clean" indication is not 100% accurate due to the possibility of 882 836 * setting PG_dirty outside page lock. See also comment above set_page_dirty(). 883 837 */ 884 - static void action_result(unsigned long pfn, char *msg, int result) 838 + static void action_result(unsigned long pfn, enum action_page_type type, int result) 885 839 { 886 - pr_err("MCE %#lx: %s page recovery: %s\n", 887 - pfn, msg, action_name[result]); 840 + pr_err("MCE %#lx: recovery action for %s: %s\n", 841 + pfn, action_page_types[type], action_name[result]); 888 842 } 889 843 890 844 static int page_action(struct page_state *ps, struct page *p, ··· 900 854 count--; 901 855 if (count != 0) { 902 856 printk(KERN_ERR 903 - "MCE %#lx: %s page still referenced by %d users\n", 904 - pfn, ps->msg, count); 857 + "MCE %#lx: %s still referenced by %d users\n", 858 + pfn, action_page_types[ps->type], count); 905 859 result = FAILED; 906 860 } 907 - action_result(pfn, ps->msg, result); 861 + action_result(pfn, ps->type, result); 908 862 909 863 /* Could do more checks here if page looks ok */ 910 864 /* ··· 1152 1106 if (!(flags & MF_COUNT_INCREASED) && 1153 1107 !get_page_unless_zero(hpage)) { 1154 1108 if (is_free_buddy_page(p)) { 1155 - action_result(pfn, "free buddy", DELAYED); 1109 + action_result(pfn, MSG_BUDDY, DELAYED); 1156 1110 return 0; 1157 1111 } else if (PageHuge(hpage)) { 1158 1112 /* ··· 1169 1123 } 1170 1124 set_page_hwpoison_huge_page(hpage); 1171 1125 res = dequeue_hwpoisoned_huge_page(hpage); 1172 - 
action_result(pfn, "free huge", 1126 + action_result(pfn, MSG_FREE_HUGE, 1173 1127 res ? IGNORED : DELAYED); 1174 1128 unlock_page(hpage); 1175 1129 return res; 1176 1130 } else { 1177 - action_result(pfn, "high order kernel", IGNORED); 1131 + action_result(pfn, MSG_KERNEL_HIGH_ORDER, IGNORED); 1178 1132 return -EBUSY; 1179 1133 } 1180 1134 } ··· 1196 1150 */ 1197 1151 if (is_free_buddy_page(p)) { 1198 1152 if (flags & MF_COUNT_INCREASED) 1199 - action_result(pfn, "free buddy", DELAYED); 1153 + action_result(pfn, MSG_BUDDY, DELAYED); 1200 1154 else 1201 - action_result(pfn, "free buddy, 2nd try", DELAYED); 1155 + action_result(pfn, MSG_BUDDY_2ND, 1156 + DELAYED); 1202 1157 return 0; 1203 1158 } 1204 1159 } ··· 1212 1165 * If this happens just bail out. 1213 1166 */ 1214 1167 if (compound_head(p) != hpage) { 1215 - action_result(pfn, "different compound page after locking", IGNORED); 1168 + action_result(pfn, MSG_DIFFERENT_COMPOUND, IGNORED); 1216 1169 res = -EBUSY; 1217 1170 goto out; 1218 1171 } ··· 1252 1205 * on the head page to show that the hugepage is hwpoisoned 1253 1206 */ 1254 1207 if (PageHuge(p) && PageTail(p) && TestSetPageHWPoison(hpage)) { 1255 - action_result(pfn, "hugepage already hardware poisoned", 1256 - IGNORED); 1208 + action_result(pfn, MSG_POISONED_HUGE, IGNORED); 1257 1209 unlock_page(hpage); 1258 1210 put_page(hpage); 1259 1211 return 0; ··· 1281 1235 */ 1282 1236 if (hwpoison_user_mappings(p, pfn, trapno, flags, &hpage) 1283 1237 != SWAP_SUCCESS) { 1284 - action_result(pfn, "unmapping failed", IGNORED); 1238 + action_result(pfn, MSG_UNMAP_FAILED, IGNORED); 1285 1239 res = -EBUSY; 1286 1240 goto out; 1287 1241 } ··· 1290 1244 * Torn down by someone else? 
1291 1245 */ 1292 1246 if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) { 1293 - action_result(pfn, "already truncated LRU", IGNORED); 1247 + action_result(pfn, MSG_TRUNCATED_LRU, IGNORED); 1294 1248 res = -EBUSY; 1295 1249 goto out; 1296 1250 } ··· 1586 1540 } 1587 1541 unlock_page(hpage); 1588 1542 1589 - /* Keep page count to indicate a given hugepage is isolated. */ 1590 - list_move(&hpage->lru, &pagelist); 1543 + ret = isolate_huge_page(hpage, &pagelist); 1544 + if (ret) { 1545 + /* 1546 + * get_any_page() and isolate_huge_page() takes a refcount each, 1547 + * so need to drop one here. 1548 + */ 1549 + put_page(hpage); 1550 + } else { 1551 + pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn); 1552 + return -EBUSY; 1553 + } 1554 + 1591 1555 ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, 1592 1556 MIGRATE_SYNC, MR_MEMORY_FAILURE); 1593 1557 if (ret) {
+45 -11
mm/memory.c
··· 690 690 /* 691 691 * Choose text because data symbols depend on CONFIG_KALLSYMS_ALL=y 692 692 */ 693 - if (vma->vm_ops) 694 - printk(KERN_ALERT "vma->vm_ops->fault: %pSR\n", 695 - vma->vm_ops->fault); 696 - if (vma->vm_file) 697 - printk(KERN_ALERT "vma->vm_file->f_op->mmap: %pSR\n", 698 - vma->vm_file->f_op->mmap); 693 + pr_alert("file:%pD fault:%pf mmap:%pf readpage:%pf\n", 694 + vma->vm_file, 695 + vma->vm_ops ? vma->vm_ops->fault : NULL, 696 + vma->vm_file ? vma->vm_file->f_op->mmap : NULL, 697 + mapping ? mapping->a_ops->readpage : NULL); 699 698 dump_stack(); 700 699 add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); 701 700 } ··· 2180 2181 return VM_FAULT_OOM; 2181 2182 } 2182 2183 2184 + /* 2185 + * Handle write page faults for VM_MIXEDMAP or VM_PFNMAP for a VM_SHARED 2186 + * mapping 2187 + */ 2188 + static int wp_pfn_shared(struct mm_struct *mm, 2189 + struct vm_area_struct *vma, unsigned long address, 2190 + pte_t *page_table, spinlock_t *ptl, pte_t orig_pte, 2191 + pmd_t *pmd) 2192 + { 2193 + if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) { 2194 + struct vm_fault vmf = { 2195 + .page = NULL, 2196 + .pgoff = linear_page_index(vma, address), 2197 + .virtual_address = (void __user *)(address & PAGE_MASK), 2198 + .flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE, 2199 + }; 2200 + int ret; 2201 + 2202 + pte_unmap_unlock(page_table, ptl); 2203 + ret = vma->vm_ops->pfn_mkwrite(vma, &vmf); 2204 + if (ret & VM_FAULT_ERROR) 2205 + return ret; 2206 + page_table = pte_offset_map_lock(mm, pmd, address, &ptl); 2207 + /* 2208 + * We might have raced with another page fault while we 2209 + * released the pte_offset_map_lock. 
2210 + */ 2211 + if (!pte_same(*page_table, orig_pte)) { 2212 + pte_unmap_unlock(page_table, ptl); 2213 + return 0; 2214 + } 2215 + } 2216 + return wp_page_reuse(mm, vma, address, page_table, ptl, orig_pte, 2217 + NULL, 0, 0); 2218 + } 2219 + 2183 2220 static int wp_page_shared(struct mm_struct *mm, struct vm_area_struct *vma, 2184 2221 unsigned long address, pte_t *page_table, 2185 2222 pmd_t *pmd, spinlock_t *ptl, pte_t orig_pte, ··· 2294 2259 * VM_PFNMAP VMA. 2295 2260 * 2296 2261 * We should not cow pages in a shared writeable mapping. 2297 - * Just mark the pages writable as we can't do any dirty 2298 - * accounting on raw pfn maps. 2262 + * Just mark the pages writable and/or call ops->pfn_mkwrite. 2299 2263 */ 2300 2264 if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) == 2301 2265 (VM_WRITE|VM_SHARED)) 2302 - return wp_page_reuse(mm, vma, address, page_table, ptl, 2303 - orig_pte, old_page, 0, 0); 2266 + return wp_pfn_shared(mm, vma, address, page_table, ptl, 2267 + orig_pte, pmd); 2304 2268 2305 2269 pte_unmap_unlock(page_table, ptl); 2306 2270 return wp_page_copy(mm, vma, address, page_table, pmd, ··· 2879 2845 struct vm_fault vmf; 2880 2846 int off; 2881 2847 2882 - nr_pages = ACCESS_ONCE(fault_around_bytes) >> PAGE_SHIFT; 2848 + nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT; 2883 2849 mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK; 2884 2850 2885 2851 start_addr = max(address & mask, vma->vm_start);
+1 -1
mm/memory_hotplug.c
··· 1373 1373 if (PageLRU(page)) 1374 1374 return pfn; 1375 1375 if (PageHuge(page)) { 1376 - if (is_hugepage_active(page)) 1376 + if (page_huge_active(page)) 1377 1377 return pfn; 1378 1378 else 1379 1379 pfn = round_up(pfn + 1,
+115 -2
mm/mempool.c
··· 6 6 * extreme VM load. 7 7 * 8 8 * started by Ingo Molnar, Copyright (C) 2001 9 + * debugging by David Rientjes, Copyright (C) 2015 9 10 */ 10 11 11 12 #include <linux/mm.h> 12 13 #include <linux/slab.h> 14 + #include <linux/highmem.h> 15 + #include <linux/kasan.h> 13 16 #include <linux/kmemleak.h> 14 17 #include <linux/export.h> 15 18 #include <linux/mempool.h> 16 19 #include <linux/blkdev.h> 17 20 #include <linux/writeback.h> 21 + #include "slab.h" 22 + 23 + #if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB_DEBUG_ON) 24 + static void poison_error(mempool_t *pool, void *element, size_t size, 25 + size_t byte) 26 + { 27 + const int nr = pool->curr_nr; 28 + const int start = max_t(int, byte - (BITS_PER_LONG / 8), 0); 29 + const int end = min_t(int, byte + (BITS_PER_LONG / 8), size); 30 + int i; 31 + 32 + pr_err("BUG: mempool element poison mismatch\n"); 33 + pr_err("Mempool %p size %zu\n", pool, size); 34 + pr_err(" nr=%d @ %p: %s0x", nr, element, start > 0 ? "... " : ""); 35 + for (i = start; i < end; i++) 36 + pr_cont("%x ", *(u8 *)(element + i)); 37 + pr_cont("%s\n", end < size ? "..." : ""); 38 + dump_stack(); 39 + } 40 + 41 + static void __check_element(mempool_t *pool, void *element, size_t size) 42 + { 43 + u8 *obj = element; 44 + size_t i; 45 + 46 + for (i = 0; i < size; i++) { 47 + u8 exp = (i < size - 1) ? 
POISON_FREE : POISON_END; 48 + 49 + if (obj[i] != exp) { 50 + poison_error(pool, element, size, i); 51 + return; 52 + } 53 + } 54 + memset(obj, POISON_INUSE, size); 55 + } 56 + 57 + static void check_element(mempool_t *pool, void *element) 58 + { 59 + /* Mempools backed by slab allocator */ 60 + if (pool->free == mempool_free_slab || pool->free == mempool_kfree) 61 + __check_element(pool, element, ksize(element)); 62 + 63 + /* Mempools backed by page allocator */ 64 + if (pool->free == mempool_free_pages) { 65 + int order = (int)(long)pool->pool_data; 66 + void *addr = kmap_atomic((struct page *)element); 67 + 68 + __check_element(pool, addr, 1UL << (PAGE_SHIFT + order)); 69 + kunmap_atomic(addr); 70 + } 71 + } 72 + 73 + static void __poison_element(void *element, size_t size) 74 + { 75 + u8 *obj = element; 76 + 77 + memset(obj, POISON_FREE, size - 1); 78 + obj[size - 1] = POISON_END; 79 + } 80 + 81 + static void poison_element(mempool_t *pool, void *element) 82 + { 83 + /* Mempools backed by slab allocator */ 84 + if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc) 85 + __poison_element(element, ksize(element)); 86 + 87 + /* Mempools backed by page allocator */ 88 + if (pool->alloc == mempool_alloc_pages) { 89 + int order = (int)(long)pool->pool_data; 90 + void *addr = kmap_atomic((struct page *)element); 91 + 92 + __poison_element(addr, 1UL << (PAGE_SHIFT + order)); 93 + kunmap_atomic(addr); 94 + } 95 + } 96 + #else /* CONFIG_DEBUG_SLAB || CONFIG_SLUB_DEBUG_ON */ 97 + static inline void check_element(mempool_t *pool, void *element) 98 + { 99 + } 100 + static inline void poison_element(mempool_t *pool, void *element) 101 + { 102 + } 103 + #endif /* CONFIG_DEBUG_SLAB || CONFIG_SLUB_DEBUG_ON */ 104 + 105 + static void kasan_poison_element(mempool_t *pool, void *element) 106 + { 107 + if (pool->alloc == mempool_alloc_slab) 108 + kasan_slab_free(pool->pool_data, element); 109 + if (pool->alloc == mempool_kmalloc) 110 + kasan_kfree(element); 111 + 
if (pool->alloc == mempool_alloc_pages) 112 + kasan_free_pages(element, (unsigned long)pool->pool_data); 113 + } 114 + 115 + static void kasan_unpoison_element(mempool_t *pool, void *element) 116 + { 117 + if (pool->alloc == mempool_alloc_slab) 118 + kasan_slab_alloc(pool->pool_data, element); 119 + if (pool->alloc == mempool_kmalloc) 120 + kasan_krealloc(element, (size_t)pool->pool_data); 121 + if (pool->alloc == mempool_alloc_pages) 122 + kasan_alloc_pages(element, (unsigned long)pool->pool_data); 123 + } 18 124 19 125 static void add_element(mempool_t *pool, void *element) 20 126 { 21 127 BUG_ON(pool->curr_nr >= pool->min_nr); 128 + poison_element(pool, element); 129 + kasan_poison_element(pool, element); 22 130 pool->elements[pool->curr_nr++] = element; 23 131 } 24 132 25 133 static void *remove_element(mempool_t *pool) 26 134 { 27 - BUG_ON(pool->curr_nr <= 0); 28 - return pool->elements[--pool->curr_nr]; 135 + void *element = pool->elements[--pool->curr_nr]; 136 + 137 + BUG_ON(pool->curr_nr < 0); 138 + check_element(pool, element); 139 + kasan_unpoison_element(pool, element); 140 + return element; 29 141 } 30 142 31 143 /** ··· 446 334 void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data) 447 335 { 448 336 struct kmem_cache *mem = pool_data; 337 + VM_BUG_ON(mem->ctor); 449 338 return kmem_cache_alloc(mem, gfp_mask); 450 339 } 451 340 EXPORT_SYMBOL(mempool_alloc_slab);
+2 -1
mm/migrate.c
··· 537 537 * Please do not reorder this without considering how mm/ksm.c's 538 538 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache(). 539 539 */ 540 - ClearPageSwapCache(page); 540 + if (PageSwapCache(page)) 541 + ClearPageSwapCache(page); 541 542 ClearPagePrivate(page); 542 543 set_page_private(page, 0); 543 544
+10 -11
mm/mmap.c
··· 1133 1133 * by another page fault trying to merge _that_. But that's ok: if it 1134 1134 * is being set up, that automatically means that it will be a singleton 1135 1135 * acceptable for merging, so we can do all of this optimistically. But 1136 - * we do that ACCESS_ONCE() to make sure that we never re-load the pointer. 1136 + * we do that READ_ONCE() to make sure that we never re-load the pointer. 1137 1137 * 1138 1138 * IOW: that the "list_is_singular()" test on the anon_vma_chain only 1139 1139 * matters for the 'stable anon_vma' case (ie the thing we want to avoid ··· 1147 1147 static struct anon_vma *reusable_anon_vma(struct vm_area_struct *old, struct vm_area_struct *a, struct vm_area_struct *b) 1148 1148 { 1149 1149 if (anon_vma_compatible(a, b)) { 1150 - struct anon_vma *anon_vma = ACCESS_ONCE(old->anon_vma); 1150 + struct anon_vma *anon_vma = READ_ONCE(old->anon_vma); 1151 1151 1152 1152 if (anon_vma && list_is_singular(&old->anon_vma_chain)) 1153 1153 return anon_vma; ··· 1551 1551 1552 1552 /* Clear old maps */ 1553 1553 error = -ENOMEM; 1554 - munmap_back: 1555 - if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent)) { 1554 + while (find_vma_links(mm, addr, addr + len, &prev, &rb_link, 1555 + &rb_parent)) { 1556 1556 if (do_munmap(mm, addr, len)) 1557 1557 return -ENOMEM; 1558 - goto munmap_back; 1559 1558 } 1560 1559 1561 1560 /* ··· 1570 1571 /* 1571 1572 * Can we just expand an old mapping? 
1572 1573 */ 1573 - vma = vma_merge(mm, prev, addr, addr + len, vm_flags, NULL, file, pgoff, NULL); 1574 + vma = vma_merge(mm, prev, addr, addr + len, vm_flags, NULL, file, pgoff, 1575 + NULL); 1574 1576 if (vma) 1575 1577 goto out; 1576 1578 ··· 2100 2100 actual_size = size; 2101 2101 if (size && (vma->vm_flags & (VM_GROWSUP | VM_GROWSDOWN))) 2102 2102 actual_size -= PAGE_SIZE; 2103 - if (actual_size > ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur)) 2103 + if (actual_size > READ_ONCE(rlim[RLIMIT_STACK].rlim_cur)) 2104 2104 return -ENOMEM; 2105 2105 2106 2106 /* mlock limit tests */ ··· 2108 2108 unsigned long locked; 2109 2109 unsigned long limit; 2110 2110 locked = mm->locked_vm + grow; 2111 - limit = ACCESS_ONCE(rlim[RLIMIT_MEMLOCK].rlim_cur); 2111 + limit = READ_ONCE(rlim[RLIMIT_MEMLOCK].rlim_cur); 2112 2112 limit >>= PAGE_SHIFT; 2113 2113 if (locked > limit && !capable(CAP_IPC_LOCK)) 2114 2114 return -ENOMEM; ··· 2739 2739 /* 2740 2740 * Clear old maps. this also does some error checking for us 2741 2741 */ 2742 - munmap_back: 2743 - if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent)) { 2742 + while (find_vma_links(mm, addr, addr + len, &prev, &rb_link, 2743 + &rb_parent)) { 2744 2744 if (do_munmap(mm, addr, len)) 2745 2745 return -ENOMEM; 2746 - goto munmap_back; 2747 2746 } 2748 2747 2749 2748 /* Check against address space limits *after* clearing old maps... */
+8 -17
mm/mremap.c
··· 345 345 struct vm_area_struct *vma = find_vma(mm, addr); 346 346 347 347 if (!vma || vma->vm_start > addr) 348 - goto Efault; 348 + return ERR_PTR(-EFAULT); 349 349 350 350 if (is_vm_hugetlb_page(vma)) 351 - goto Einval; 351 + return ERR_PTR(-EINVAL); 352 352 353 353 /* We can't remap across vm area boundaries */ 354 354 if (old_len > vma->vm_end - addr) 355 - goto Efault; 355 + return ERR_PTR(-EFAULT); 356 356 357 357 /* Need to be careful about a growing mapping */ 358 358 if (new_len > old_len) { 359 359 unsigned long pgoff; 360 360 361 361 if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)) 362 - goto Efault; 362 + return ERR_PTR(-EFAULT); 363 363 pgoff = (addr - vma->vm_start) >> PAGE_SHIFT; 364 364 pgoff += vma->vm_pgoff; 365 365 if (pgoff + (new_len >> PAGE_SHIFT) < pgoff) 366 - goto Einval; 366 + return ERR_PTR(-EINVAL); 367 367 } 368 368 369 369 if (vma->vm_flags & VM_LOCKED) { ··· 372 372 lock_limit = rlimit(RLIMIT_MEMLOCK); 373 373 locked += new_len - old_len; 374 374 if (locked > lock_limit && !capable(CAP_IPC_LOCK)) 375 - goto Eagain; 375 + return ERR_PTR(-EAGAIN); 376 376 } 377 377 378 378 if (!may_expand_vm(mm, (new_len - old_len) >> PAGE_SHIFT)) 379 - goto Enomem; 379 + return ERR_PTR(-ENOMEM); 380 380 381 381 if (vma->vm_flags & VM_ACCOUNT) { 382 382 unsigned long charged = (new_len - old_len) >> PAGE_SHIFT; 383 383 if (security_vm_enough_memory_mm(mm, charged)) 384 - goto Efault; 384 + return ERR_PTR(-ENOMEM); 385 385 *p = charged; 386 386 } 387 387 388 388 return vma; 389 - 390 - Efault: /* very odd choice for most of the cases, but... */ 391 - return ERR_PTR(-EFAULT); 392 - Einval: 393 - return ERR_PTR(-EINVAL); 394 - Enomem: 395 - return ERR_PTR(-ENOMEM); 396 - Eagain: 397 - return ERR_PTR(-EAGAIN); 398 389 } 399 390 400 391 static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
+1 -1
mm/oom_kill.c
··· 408 408 static DECLARE_RWSEM(oom_sem); 409 409 410 410 /** 411 - * mark_tsk_oom_victim - marks the given taks as OOM victim. 411 + * mark_tsk_oom_victim - marks the given task as OOM victim. 412 412 * @tsk: task to mark 413 413 * 414 414 * Has to be called with oom_sem taken for read and never after
+2 -1
mm/page-writeback.c
··· 2228 2228 * it will confuse readahead and make it restart the size rampup 2229 2229 * process. But it's a trivial problem. 2230 2230 */ 2231 - ClearPageReclaim(page); 2231 + if (PageReclaim(page)) 2232 + ClearPageReclaim(page); 2232 2233 #ifdef CONFIG_BLOCK 2233 2234 if (!spd) 2234 2235 spd = __set_page_dirty_buffers;
+3 -3
mm/page_alloc.c
··· 1371 1371 int to_drain, batch; 1372 1372 1373 1373 local_irq_save(flags); 1374 - batch = ACCESS_ONCE(pcp->batch); 1374 + batch = READ_ONCE(pcp->batch); 1375 1375 to_drain = min(pcp->count, batch); 1376 1376 if (to_drain > 0) { 1377 1377 free_pcppages_bulk(zone, to_drain, pcp); ··· 1570 1570 list_add_tail(&page->lru, &pcp->lists[migratetype]); 1571 1571 pcp->count++; 1572 1572 if (pcp->count >= pcp->high) { 1573 - unsigned long batch = ACCESS_ONCE(pcp->batch); 1573 + unsigned long batch = READ_ONCE(pcp->batch); 1574 1574 free_pcppages_bulk(zone, batch, pcp); 1575 1575 pcp->count -= batch; 1576 1576 } ··· 6207 6207 mask <<= (BITS_PER_LONG - bitidx - 1); 6208 6208 flags <<= (BITS_PER_LONG - bitidx - 1); 6209 6209 6210 - word = ACCESS_ONCE(bitmap[word_bitidx]); 6210 + word = READ_ONCE(bitmap[word_bitidx]); 6211 6211 for (;;) { 6212 6212 old_word = cmpxchg(&bitmap[word_bitidx], word, (word & ~mask) | flags); 6213 6213 if (word == old_word)
+3 -3
mm/rmap.c
··· 456 456 unsigned long anon_mapping; 457 457 458 458 rcu_read_lock(); 459 - anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping); 459 + anon_mapping = (unsigned long)READ_ONCE(page->mapping); 460 460 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON) 461 461 goto out; 462 462 if (!page_mapped(page)) ··· 500 500 unsigned long anon_mapping; 501 501 502 502 rcu_read_lock(); 503 - anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping); 503 + anon_mapping = (unsigned long)READ_ONCE(page->mapping); 504 504 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON) 505 505 goto out; 506 506 if (!page_mapped(page)) 507 507 goto out; 508 508 509 509 anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); 510 - root_anon_vma = ACCESS_ONCE(anon_vma->root); 510 + root_anon_vma = READ_ONCE(anon_vma->root); 511 511 if (down_read_trylock(&root_anon_vma->rwsem)) { 512 512 /* 513 513 * If the page is still mapped, then this anon_vma is still
+2 -2
mm/slub.c
··· 4277 4277 int node; 4278 4278 struct page *page; 4279 4279 4280 - page = ACCESS_ONCE(c->page); 4280 + page = READ_ONCE(c->page); 4281 4281 if (!page) 4282 4282 continue; 4283 4283 ··· 4292 4292 total += x; 4293 4293 nodes[node] += x; 4294 4294 4295 - page = ACCESS_ONCE(c->partial); 4295 + page = READ_ONCE(c->partial); 4296 4296 if (page) { 4297 4297 node = page_to_nid(page); 4298 4298 if (flags & SO_TOTAL)
+21 -13
mm/swap.c
··· 31 31 #include <linux/memcontrol.h> 32 32 #include <linux/gfp.h> 33 33 #include <linux/uio.h> 34 + #include <linux/hugetlb.h> 34 35 35 36 #include "internal.h" 36 37 ··· 43 42 44 43 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec); 45 44 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs); 46 - static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs); 45 + static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs); 47 46 48 47 /* 49 48 * This path almost never happens for VM activity - pages are normally ··· 76 75 { 77 76 compound_page_dtor *dtor; 78 77 79 - __page_cache_release(page); 78 + /* 79 + * __page_cache_release() is supposed to be called for thp, not for 80 + * hugetlb. This is because hugetlb page does never have PageLRU set 81 + * (it's never listed to any LRU lists) and no memcg routines should 82 + * be called for hugetlb (it has a separate hugetlb_cgroup.) 83 + */ 84 + if (!PageHuge(page)) 85 + __page_cache_release(page); 80 86 dtor = get_compound_page_dtor(page); 81 87 (*dtor)(page); 82 88 } ··· 751 743 * be write it out by flusher threads as this is much more effective 752 744 * than the single-page writeout from reclaim. 
753 745 */ 754 - static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, 746 + static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, 755 747 void *arg) 756 748 { 757 749 int lru, file; ··· 819 811 local_irq_restore(flags); 820 812 } 821 813 822 - pvec = &per_cpu(lru_deactivate_pvecs, cpu); 814 + pvec = &per_cpu(lru_deactivate_file_pvecs, cpu); 823 815 if (pagevec_count(pvec)) 824 - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); 816 + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); 825 817 826 818 activate_page_drain(cpu); 827 819 } 828 820 829 821 /** 830 - * deactivate_page - forcefully deactivate a page 822 + * deactivate_file_page - forcefully deactivate a file page 831 823 * @page: page to deactivate 832 824 * 833 825 * This function hints the VM that @page is a good reclaim candidate, 834 826 * for example if its invalidation fails due to the page being dirty 835 827 * or under writeback. 836 828 */ 837 - void deactivate_page(struct page *page) 829 + void deactivate_file_page(struct page *page) 838 830 { 839 831 /* 840 - * In a workload with many unevictable page such as mprotect, unevictable 841 - * page deactivation for accelerating reclaim is pointless. 832 + * In a workload with many unevictable page such as mprotect, 833 + * unevictable page deactivation for accelerating reclaim is pointless. 
842 834 */ 843 835 if (PageUnevictable(page)) 844 836 return; 845 837 846 838 if (likely(get_page_unless_zero(page))) { 847 - struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs); 839 + struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs); 848 840 849 841 if (!pagevec_add(pvec, page)) 850 - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); 851 - put_cpu_var(lru_deactivate_pvecs); 842 + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); 843 + put_cpu_var(lru_deactivate_file_pvecs); 852 844 } 853 845 } 854 846 ··· 880 872 881 873 if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) || 882 874 pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) || 883 - pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) || 875 + pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) || 884 876 need_activate_page_drain(cpu)) { 885 877 INIT_WORK(work, lru_add_drain_per_cpu); 886 878 schedule_work_on(cpu, work);
+1 -1
mm/swap_state.c
··· 390 390 unsigned int pages, max_pages, last_ra; 391 391 static atomic_t last_readahead_pages; 392 392 393 - max_pages = 1 << ACCESS_ONCE(page_cluster); 393 + max_pages = 1 << READ_ONCE(page_cluster); 394 394 if (max_pages <= 1) 395 395 return 1; 396 396
+1 -1
mm/swapfile.c
··· 1312 1312 else 1313 1313 continue; 1314 1314 } 1315 - count = ACCESS_ONCE(si->swap_map[i]); 1315 + count = READ_ONCE(si->swap_map[i]); 1316 1316 if (count && swap_count(count) != SWAP_MAP_BAD) 1317 1317 break; 1318 1318 }
+1 -1
mm/truncate.c
··· 490 490 * of interest and try to speed up its reclaim. 491 491 */ 492 492 if (!ret) 493 - deactivate_page(page); 493 + deactivate_file_page(page); 494 494 count += ret; 495 495 } 496 496 pagevec_remove_exceptionals(&pvec);
+36 -5
mm/util.c
··· 325 325 } 326 326 EXPORT_SYMBOL(kvfree); 327 327 328 + static inline void *__page_rmapping(struct page *page) 329 + { 330 + unsigned long mapping; 331 + 332 + mapping = (unsigned long)page->mapping; 333 + mapping &= ~PAGE_MAPPING_FLAGS; 334 + 335 + return (void *)mapping; 336 + } 337 + 338 + /* Neutral page->mapping pointer to address_space or anon_vma or other */ 339 + void *page_rmapping(struct page *page) 340 + { 341 + page = compound_head(page); 342 + return __page_rmapping(page); 343 + } 344 + 345 + struct anon_vma *page_anon_vma(struct page *page) 346 + { 347 + unsigned long mapping; 348 + 349 + page = compound_head(page); 350 + mapping = (unsigned long)page->mapping; 351 + if ((mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON) 352 + return NULL; 353 + return __page_rmapping(page); 354 + } 355 + 328 356 struct address_space *page_mapping(struct page *page) 329 357 { 330 - struct address_space *mapping = page->mapping; 358 + unsigned long mapping; 331 359 332 360 /* This happens if someone calls flush_dcache_page on slab page */ 333 361 if (unlikely(PageSlab(page))) ··· 365 337 swp_entry_t entry; 366 338 367 339 entry.val = page_private(page); 368 - mapping = swap_address_space(entry); 369 - } else if ((unsigned long)mapping & PAGE_MAPPING_ANON) 370 - mapping = NULL; 371 - return mapping; 340 + return swap_address_space(entry); 341 + } 342 + 343 + mapping = (unsigned long)page->mapping; 344 + if (mapping & PAGE_MAPPING_FLAGS) 345 + return NULL; 346 + return page->mapping; 372 347 } 373 348 374 349 int overcommit_ratio_handler(struct ctl_table *table, int write,
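The mm/util.c hunk introduces `__page_rmapping()` and `page_anon_vma()`, both built on the same trick: since `page->mapping` points to an aligned structure, its low bits are guaranteed zero and can carry type tags (`PAGE_MAPPING_FLAGS`, of which `PAGE_MAPPING_ANON` is one). Masking the tags off recovers the real pointer. A self-contained sketch of that low-bit tagging, with illustrative names rather than the kernel's constants:

```c
#include <assert.h>
#include <stdint.h>

/* Low-bit tags; valid because mapping pointers are >= 4-byte aligned. */
#define MAPPING_ANON  0x1UL
#define MAPPING_FLAGS 0x3UL

/* Store a pointer with the "anon" tag set in its low bit. */
static uintptr_t tag_anon(void *p)
{
    return (uintptr_t)p | MAPPING_ANON;
}

/* Mirrors __page_rmapping(): strip tag bits to recover the pointer,
 * regardless of which tag (if any) was set. */
static void *untag(uintptr_t v)
{
    return (void *)(v & ~MAPPING_FLAGS);
}

/* Mirrors the check in page_anon_vma(): the tag bits must equal
 * exactly MAPPING_ANON for the mapping to be an anon_vma. */
static int is_anon(uintptr_t v)
{
    return (v & MAPPING_FLAGS) == MAPPING_ANON;
}
```

Note how the rewritten `page_mapping()` above uses the same mask: any set flag bit means the field is not a plain `struct address_space *`, so it returns NULL.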
+55 -40
mm/vmalloc.c
··· 765 765 spinlock_t lock; 766 766 struct vmap_area *va; 767 767 unsigned long free, dirty; 768 - DECLARE_BITMAP(dirty_map, VMAP_BBMAP_BITS); 768 + unsigned long dirty_min, dirty_max; /*< dirty range */ 769 769 struct list_head free_list; 770 770 struct rcu_head rcu_head; 771 771 struct list_head purge; ··· 796 796 return addr; 797 797 } 798 798 799 - static struct vmap_block *new_vmap_block(gfp_t gfp_mask) 799 + static void *vmap_block_vaddr(unsigned long va_start, unsigned long pages_off) 800 + { 801 + unsigned long addr; 802 + 803 + addr = va_start + (pages_off << PAGE_SHIFT); 804 + BUG_ON(addr_to_vb_idx(addr) != addr_to_vb_idx(va_start)); 805 + return (void *)addr; 806 + } 807 + 808 + /** 809 + * new_vmap_block - allocates new vmap_block and occupies 2^order pages in this 810 + * block. Of course pages number can't exceed VMAP_BBMAP_BITS 811 + * @order: how many 2^order pages should be occupied in newly allocated block 812 + * @gfp_mask: flags for the page level allocator 813 + * 814 + * Returns: virtual address in a newly allocated block or ERR_PTR(-errno) 815 + */ 816 + static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) 800 817 { 801 818 struct vmap_block_queue *vbq; 802 819 struct vmap_block *vb; 803 820 struct vmap_area *va; 804 821 unsigned long vb_idx; 805 822 int node, err; 823 + void *vaddr; 806 824 807 825 node = numa_node_id(); 808 826 ··· 844 826 return ERR_PTR(err); 845 827 } 846 828 829 + vaddr = vmap_block_vaddr(va->va_start, 0); 847 830 spin_lock_init(&vb->lock); 848 831 vb->va = va; 849 - vb->free = VMAP_BBMAP_BITS; 832 + /* At least something should be left free */ 833 + BUG_ON(VMAP_BBMAP_BITS <= (1UL << order)); 834 + vb->free = VMAP_BBMAP_BITS - (1UL << order); 850 835 vb->dirty = 0; 851 - bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS); 836 + vb->dirty_min = VMAP_BBMAP_BITS; 837 + vb->dirty_max = 0; 852 838 INIT_LIST_HEAD(&vb->free_list); 853 839 854 840 vb_idx = addr_to_vb_idx(va->va_start); ··· 864 842 865 843 vbq = 
&get_cpu_var(vmap_block_queue); 866 844 spin_lock(&vbq->lock); 867 - list_add_rcu(&vb->free_list, &vbq->free); 845 + list_add_tail_rcu(&vb->free_list, &vbq->free); 868 846 spin_unlock(&vbq->lock); 869 847 put_cpu_var(vmap_block_queue); 870 848 871 - return vb; 849 + return vaddr; 872 850 } 873 851 874 852 static void free_vmap_block(struct vmap_block *vb) ··· 903 881 if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) { 904 882 vb->free = 0; /* prevent further allocs after releasing lock */ 905 883 vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */ 906 - bitmap_fill(vb->dirty_map, VMAP_BBMAP_BITS); 884 + vb->dirty_min = 0; 885 + vb->dirty_max = VMAP_BBMAP_BITS; 907 886 spin_lock(&vbq->lock); 908 887 list_del_rcu(&vb->free_list); 909 888 spin_unlock(&vbq->lock); ··· 933 910 { 934 911 struct vmap_block_queue *vbq; 935 912 struct vmap_block *vb; 936 - unsigned long addr = 0; 913 + void *vaddr = NULL; 937 914 unsigned int order; 938 915 939 916 BUG_ON(size & ~PAGE_MASK); ··· 948 925 } 949 926 order = get_order(size); 950 927 951 - again: 952 928 rcu_read_lock(); 953 929 vbq = &get_cpu_var(vmap_block_queue); 954 930 list_for_each_entry_rcu(vb, &vbq->free, free_list) { 955 - int i; 931 + unsigned long pages_off; 956 932 957 933 spin_lock(&vb->lock); 958 - if (vb->free < 1UL << order) 959 - goto next; 934 + if (vb->free < (1UL << order)) { 935 + spin_unlock(&vb->lock); 936 + continue; 937 + } 960 938 961 - i = VMAP_BBMAP_BITS - vb->free; 962 - addr = vb->va->va_start + (i << PAGE_SHIFT); 963 - BUG_ON(addr_to_vb_idx(addr) != 964 - addr_to_vb_idx(vb->va->va_start)); 939 + pages_off = VMAP_BBMAP_BITS - vb->free; 940 + vaddr = vmap_block_vaddr(vb->va->va_start, pages_off); 965 941 vb->free -= 1UL << order; 966 942 if (vb->free == 0) { 967 943 spin_lock(&vbq->lock); 968 944 list_del_rcu(&vb->free_list); 969 945 spin_unlock(&vbq->lock); 970 946 } 947 + 971 948 spin_unlock(&vb->lock); 972 949 break; 973 - next: 974 - spin_unlock(&vb->lock); 975 
950 } 976 951 977 952 put_cpu_var(vmap_block_queue); 978 953 rcu_read_unlock(); 979 954 980 - if (!addr) { 981 - vb = new_vmap_block(gfp_mask); 982 - if (IS_ERR(vb)) 983 - return vb; 984 - goto again; 985 - } 955 + /* Allocate new block if nothing was found */ 956 + if (!vaddr) 957 + vaddr = new_vmap_block(order, gfp_mask); 986 958 987 - return (void *)addr; 959 + return vaddr; 988 960 } 989 961 990 962 static void vb_free(const void *addr, unsigned long size) ··· 997 979 order = get_order(size); 998 980 999 981 offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1); 982 + offset >>= PAGE_SHIFT; 1000 983 1001 984 vb_idx = addr_to_vb_idx((unsigned long)addr); 1002 985 rcu_read_lock(); ··· 1008 989 vunmap_page_range((unsigned long)addr, (unsigned long)addr + size); 1009 990 1010 991 spin_lock(&vb->lock); 1011 - BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order)); 992 + 993 + /* Expand dirty range */ 994 + vb->dirty_min = min(vb->dirty_min, offset); 995 + vb->dirty_max = max(vb->dirty_max, offset + (1UL << order)); 1012 996 1013 997 vb->dirty += 1UL << order; 1014 998 if (vb->dirty == VMAP_BBMAP_BITS) { ··· 1050 1028 1051 1029 rcu_read_lock(); 1052 1030 list_for_each_entry_rcu(vb, &vbq->free, free_list) { 1053 - int i, j; 1054 - 1055 1031 spin_lock(&vb->lock); 1056 - i = find_first_bit(vb->dirty_map, VMAP_BBMAP_BITS); 1057 - if (i < VMAP_BBMAP_BITS) { 1032 + if (vb->dirty) { 1033 + unsigned long va_start = vb->va->va_start; 1058 1034 unsigned long s, e; 1059 1035 1060 - j = find_last_bit(vb->dirty_map, 1061 - VMAP_BBMAP_BITS); 1062 - j = j + 1; /* need exclusive index */ 1036 + s = va_start + (vb->dirty_min << PAGE_SHIFT); 1037 + e = va_start + (vb->dirty_max << PAGE_SHIFT); 1063 1038 1064 - s = vb->va->va_start + (i << PAGE_SHIFT); 1065 - e = vb->va->va_start + (j << PAGE_SHIFT); 1039 + start = min(s, start); 1040 + end = max(e, end); 1041 + 1066 1042 flush = 1; 1067 - 1068 - if (s < start) 1069 - start = s; 1070 - if (e > end) 1071 - end = e; 
1072 1043 } 1073 1044 spin_unlock(&vb->lock); 1074 1045 }
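The mm/vmalloc.c change replaces `vmap_block`'s per-page dirty bitmap with a single `[dirty_min, dirty_max)` pair. That works because the flush path (`vm_unmap_aliases()` above) only ever needs one contiguous span per block to invalidate, so tracking the extremes is enough; the cost is that the range may over-cover pages that are not actually dirty. A minimal sketch of the technique, under the assumption that "empty" is encoded as min == NBITS, max == 0, exactly as `new_vmap_block()` initializes it:

```c
#include <assert.h>

#define NBITS 1024UL   /* stands in for VMAP_BBMAP_BITS */

struct dirty_range {
    unsigned long min;   /* initialize to NBITS => empty */
    unsigned long max;   /* initialize to 0 */
};

/* Mirrors the vb_free() hunk above: expand the tracked range to
 * cover [off, off + npages). */
static void mark_dirty(struct dirty_range *d,
                       unsigned long off, unsigned long npages)
{
    if (off < d->min)
        d->min = off;
    if (off + npages > d->max)
        d->max = off + npages;
}

/* Returns 1 and fills the page-offset span [*s, *e) if anything is
 * dirty, as the flush loop does when computing start/end. */
static int dirty_span(const struct dirty_range *d,
                      unsigned long *s, unsigned long *e)
{
    if (d->min >= d->max)
        return 0;
    *s = d->min;
    *e = d->max;
    return 1;
}
```

Compared with the removed `find_first_bit()`/`find_last_bit()` scan over `VMAP_BBMAP_BITS`, marking and querying are O(1), which is the motivation for the conversion.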
+716 -273
mm/zsmalloc.c
··· 12 12 */ 13 13 14 14 /* 15 - * This allocator is designed for use with zram. Thus, the allocator is 16 - * supposed to work well under low memory conditions. In particular, it 17 - * never attempts higher order page allocation which is very likely to 18 - * fail under memory pressure. On the other hand, if we just use single 19 - * (0-order) pages, it would suffer from very high fragmentation -- 20 - * any object of size PAGE_SIZE/2 or larger would occupy an entire page. 21 - * This was one of the major issues with its predecessor (xvmalloc). 22 - * 23 - * To overcome these issues, zsmalloc allocates a bunch of 0-order pages 24 - * and links them together using various 'struct page' fields. These linked 25 - * pages act as a single higher-order page i.e. an object can span 0-order 26 - * page boundaries. The code refers to these linked pages as a single entity 27 - * called zspage. 28 - * 29 - * For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE 30 - * since this satisfies the requirements of all its current users (in the 31 - * worst case, page is incompressible and is thus stored "as-is" i.e. in 32 - * uncompressed form). For allocation requests larger than this size, failure 33 - * is returned (see zs_malloc). 34 - * 35 - * Additionally, zs_malloc() does not return a dereferenceable pointer. 36 - * Instead, it returns an opaque handle (unsigned long) which encodes actual 37 - * location of the allocated object. The reason for this indirection is that 38 - * zsmalloc does not keep zspages permanently mapped since that would cause 39 - * issues on 32-bit systems where the VA region for kernel space mappings 40 - * is very small. So, before using the allocating memory, the object has to 41 - * be mapped using zs_map_object() to get a usable pointer and subsequently 42 - * unmapped using zs_unmap_object(). 43 - * 44 15 * Following is how we use various fields and flags of underlying 45 16 * struct page(s) to form a zspage. 
46 17 * ··· 28 57 * 29 58 * page->private (union with page->first_page): refers to the 30 59 * component page after the first page 60 + * If the page is first_page for huge object, it stores handle. 61 + * Look at size_class->huge. 31 62 * page->freelist: points to the first free object in zspage. 32 63 * Free objects are linked together using in-place 33 64 * metadata. ··· 51 78 52 79 #include <linux/module.h> 53 80 #include <linux/kernel.h> 81 + #include <linux/sched.h> 54 82 #include <linux/bitops.h> 55 83 #include <linux/errno.h> 56 84 #include <linux/highmem.h> ··· 84 110 #define ZS_MAX_ZSPAGE_ORDER 2 85 111 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) 86 112 113 + #define ZS_HANDLE_SIZE (sizeof(unsigned long)) 114 + 87 115 /* 88 116 * Object location (<PFN>, <obj_idx>) is encoded as 89 117 * as single (unsigned long) handle value. ··· 109 133 #endif 110 134 #endif 111 135 #define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) 112 - #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS) 136 + 137 + /* 138 + * Memory for allocating for handle keeps object position by 139 + * encoding <page, obj_idx> and the encoded value has a room 140 + * in least bit(ie, look at obj_to_location). 141 + * We use the bit to synchronize between object access by 142 + * user and migration. 143 + */ 144 + #define HANDLE_PIN_BIT 0 145 + 146 + /* 147 + * Head in allocated object should have OBJ_ALLOCATED_TAG 148 + * to identify the object was allocated or not. 149 + * It's okay to add the status bit in the least bit because 150 + * header keeps handle which is 4byte-aligned address so we 151 + * have room for two bit at least. 152 + */ 153 + #define OBJ_ALLOCATED_TAG 1 154 + #define OBJ_TAG_BITS 1 155 + #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) 113 156 #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) 114 157 115 158 #define MAX(a, b) ((a) >= (b) ? 
(a) : (b)) 116 159 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ 117 160 #define ZS_MIN_ALLOC_SIZE \ 118 161 MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) 162 + /* each chunk includes extra space to keep handle */ 119 163 #define ZS_MAX_ALLOC_SIZE PAGE_SIZE 120 164 121 165 /* ··· 168 172 enum zs_stat_type { 169 173 OBJ_ALLOCATED, 170 174 OBJ_USED, 175 + CLASS_ALMOST_FULL, 176 + CLASS_ALMOST_EMPTY, 171 177 NR_ZS_STAT_TYPE, 172 178 }; 173 179 ··· 214 216 215 217 /* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */ 216 218 int pages_per_zspage; 219 + /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */ 220 + bool huge; 217 221 218 222 #ifdef CONFIG_ZSMALLOC_STAT 219 223 struct zs_size_stat stats; ··· 233 233 * This must be power of 2 and less than or equal to ZS_ALIGN 234 234 */ 235 235 struct link_free { 236 - /* Handle of next free chunk (encodes <PFN, obj_idx>) */ 237 - void *next; 236 + union { 237 + /* 238 + * Position of next free chunk (encodes <PFN, obj_idx>) 239 + * It's valid for non-allocated object 240 + */ 241 + void *next; 242 + /* 243 + * Handle of allocated object. 244 + */ 245 + unsigned long handle; 246 + }; 238 247 }; 239 248 240 249 struct zs_pool { 241 250 char *name; 242 251 243 252 struct size_class **size_class; 253 + struct kmem_cache *handle_cachep; 244 254 245 255 gfp_t flags; /* allocation flags used when growing pool */ 246 256 atomic_long_t pages_allocated; ··· 277 267 #endif 278 268 char *vm_addr; /* address of kmap_atomic()'ed pages */ 279 269 enum zs_mapmode vm_mm; /* mapping mode */ 270 + bool huge; 280 271 }; 272 + 273 + static int create_handle_cache(struct zs_pool *pool) 274 + { 275 + pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE, 276 + 0, 0, NULL); 277 + return pool->handle_cachep ? 
0 : 1; 278 + } 279 + 280 + static void destroy_handle_cache(struct zs_pool *pool) 281 + { 282 + kmem_cache_destroy(pool->handle_cachep); 283 + } 284 + 285 + static unsigned long alloc_handle(struct zs_pool *pool) 286 + { 287 + return (unsigned long)kmem_cache_alloc(pool->handle_cachep, 288 + pool->flags & ~__GFP_HIGHMEM); 289 + } 290 + 291 + static void free_handle(struct zs_pool *pool, unsigned long handle) 292 + { 293 + kmem_cache_free(pool->handle_cachep, (void *)handle); 294 + } 295 + 296 + static void record_obj(unsigned long handle, unsigned long obj) 297 + { 298 + *(unsigned long *)handle = obj; 299 + } 281 300 282 301 /* zpool driver */ 283 302 ··· 385 346 MODULE_ALIAS("zpool-zsmalloc"); 386 347 #endif /* CONFIG_ZPOOL */ 387 348 349 + static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage) 350 + { 351 + return pages_per_zspage * PAGE_SIZE / size; 352 + } 353 + 388 354 /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */ 389 355 static DEFINE_PER_CPU(struct mapping_area, zs_map_area); 390 356 ··· 440 396 idx = DIV_ROUND_UP(size - ZS_MIN_ALLOC_SIZE, 441 397 ZS_SIZE_CLASS_DELTA); 442 398 443 - return idx; 399 + return min(zs_size_classes - 1, idx); 444 400 } 401 + 402 + #ifdef CONFIG_ZSMALLOC_STAT 403 + 404 + static inline void zs_stat_inc(struct size_class *class, 405 + enum zs_stat_type type, unsigned long cnt) 406 + { 407 + class->stats.objs[type] += cnt; 408 + } 409 + 410 + static inline void zs_stat_dec(struct size_class *class, 411 + enum zs_stat_type type, unsigned long cnt) 412 + { 413 + class->stats.objs[type] -= cnt; 414 + } 415 + 416 + static inline unsigned long zs_stat_get(struct size_class *class, 417 + enum zs_stat_type type) 418 + { 419 + return class->stats.objs[type]; 420 + } 421 + 422 + static int __init zs_stat_init(void) 423 + { 424 + if (!debugfs_initialized()) 425 + return -ENODEV; 426 + 427 + zs_stat_root = debugfs_create_dir("zsmalloc", NULL); 428 + if (!zs_stat_root) 429 + return -ENOMEM; 430 
+ 431 + return 0; 432 + } 433 + 434 + static void __exit zs_stat_exit(void) 435 + { 436 + debugfs_remove_recursive(zs_stat_root); 437 + } 438 + 439 + static int zs_stats_size_show(struct seq_file *s, void *v) 440 + { 441 + int i; 442 + struct zs_pool *pool = s->private; 443 + struct size_class *class; 444 + int objs_per_zspage; 445 + unsigned long class_almost_full, class_almost_empty; 446 + unsigned long obj_allocated, obj_used, pages_used; 447 + unsigned long total_class_almost_full = 0, total_class_almost_empty = 0; 448 + unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; 449 + 450 + seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s\n", 451 + "class", "size", "almost_full", "almost_empty", 452 + "obj_allocated", "obj_used", "pages_used", 453 + "pages_per_zspage"); 454 + 455 + for (i = 0; i < zs_size_classes; i++) { 456 + class = pool->size_class[i]; 457 + 458 + if (class->index != i) 459 + continue; 460 + 461 + spin_lock(&class->lock); 462 + class_almost_full = zs_stat_get(class, CLASS_ALMOST_FULL); 463 + class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY); 464 + obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); 465 + obj_used = zs_stat_get(class, OBJ_USED); 466 + spin_unlock(&class->lock); 467 + 468 + objs_per_zspage = get_maxobj_per_zspage(class->size, 469 + class->pages_per_zspage); 470 + pages_used = obj_allocated / objs_per_zspage * 471 + class->pages_per_zspage; 472 + 473 + seq_printf(s, " %5u %5u %11lu %12lu %13lu %10lu %10lu %16d\n", 474 + i, class->size, class_almost_full, class_almost_empty, 475 + obj_allocated, obj_used, pages_used, 476 + class->pages_per_zspage); 477 + 478 + total_class_almost_full += class_almost_full; 479 + total_class_almost_empty += class_almost_empty; 480 + total_objs += obj_allocated; 481 + total_used_objs += obj_used; 482 + total_pages += pages_used; 483 + } 484 + 485 + seq_puts(s, "\n"); 486 + seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu\n", 487 + "Total", "", total_class_almost_full, 
488 + total_class_almost_empty, total_objs, 489 + total_used_objs, total_pages); 490 + 491 + return 0; 492 + } 493 + 494 + static int zs_stats_size_open(struct inode *inode, struct file *file) 495 + { 496 + return single_open(file, zs_stats_size_show, inode->i_private); 497 + } 498 + 499 + static const struct file_operations zs_stat_size_ops = { 500 + .open = zs_stats_size_open, 501 + .read = seq_read, 502 + .llseek = seq_lseek, 503 + .release = single_release, 504 + }; 505 + 506 + static int zs_pool_stat_create(char *name, struct zs_pool *pool) 507 + { 508 + struct dentry *entry; 509 + 510 + if (!zs_stat_root) 511 + return -ENODEV; 512 + 513 + entry = debugfs_create_dir(name, zs_stat_root); 514 + if (!entry) { 515 + pr_warn("debugfs dir <%s> creation failed\n", name); 516 + return -ENOMEM; 517 + } 518 + pool->stat_dentry = entry; 519 + 520 + entry = debugfs_create_file("classes", S_IFREG | S_IRUGO, 521 + pool->stat_dentry, pool, &zs_stat_size_ops); 522 + if (!entry) { 523 + pr_warn("%s: debugfs file entry <%s> creation failed\n", 524 + name, "classes"); 525 + return -ENOMEM; 526 + } 527 + 528 + return 0; 529 + } 530 + 531 + static void zs_pool_stat_destroy(struct zs_pool *pool) 532 + { 533 + debugfs_remove_recursive(pool->stat_dentry); 534 + } 535 + 536 + #else /* CONFIG_ZSMALLOC_STAT */ 537 + 538 + static inline void zs_stat_inc(struct size_class *class, 539 + enum zs_stat_type type, unsigned long cnt) 540 + { 541 + } 542 + 543 + static inline void zs_stat_dec(struct size_class *class, 544 + enum zs_stat_type type, unsigned long cnt) 545 + { 546 + } 547 + 548 + static inline unsigned long zs_stat_get(struct size_class *class, 549 + enum zs_stat_type type) 550 + { 551 + return 0; 552 + } 553 + 554 + static int __init zs_stat_init(void) 555 + { 556 + return 0; 557 + } 558 + 559 + static void __exit zs_stat_exit(void) 560 + { 561 + } 562 + 563 + static inline int zs_pool_stat_create(char *name, struct zs_pool *pool) 564 + { 565 + return 0; 566 + } 567 + 568 + static 
inline void zs_pool_stat_destroy(struct zs_pool *pool) 569 + { 570 + } 571 + 572 + #endif 573 + 445 574 446 575 /* 447 576 * For each size class, zspages are divided into different groups ··· 636 419 fg = ZS_EMPTY; 637 420 else if (inuse == max_objects) 638 421 fg = ZS_FULL; 639 - else if (inuse <= max_objects / fullness_threshold_frac) 422 + else if (inuse <= 3 * max_objects / fullness_threshold_frac) 640 423 fg = ZS_ALMOST_EMPTY; 641 424 else 642 425 fg = ZS_ALMOST_FULL; ··· 665 448 list_add_tail(&page->lru, &(*head)->lru); 666 449 667 450 *head = page; 451 + zs_stat_inc(class, fullness == ZS_ALMOST_EMPTY ? 452 + CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1); 668 453 } 669 454 670 455 /* ··· 692 473 struct page, lru); 693 474 694 475 list_del_init(&page->lru); 476 + zs_stat_dec(class, fullness == ZS_ALMOST_EMPTY ? 477 + CLASS_ALMOST_EMPTY : CLASS_ALMOST_FULL, 1); 695 478 } 696 479 697 480 /* ··· 705 484 * page from the freelist of the old fullness group to that of the new 706 485 * fullness group. 707 486 */ 708 - static enum fullness_group fix_fullness_group(struct zs_pool *pool, 487 + static enum fullness_group fix_fullness_group(struct size_class *class, 709 488 struct page *page) 710 489 { 711 490 int class_idx; 712 - struct size_class *class; 713 491 enum fullness_group currfg, newfg; 714 492 715 493 BUG_ON(!is_first_page(page)); ··· 718 498 if (newfg == currfg) 719 499 goto out; 720 500 721 - class = pool->size_class[class_idx]; 722 501 remove_zspage(page, class, currfg); 723 502 insert_zspage(page, class, newfg); 724 503 set_zspage_mapping(page, class_idx, newfg); ··· 731 512 * to form a zspage for each size class. This is important 732 513 * to reduce wastage due to unusable space left at end of 733 514 * each zspage which is given as: 734 - * wastage = Zp - Zp % size_class 515 + * wastage = Zp % class_size 516 + * usage = Zp - wastage 735 517 * where Zp = zspage size = k * PAGE_SIZE where k = 1, 2, ... 
736 518 * 737 519 * For example, for size class of 3/8 * PAGE_SIZE, we should ··· 791 571 792 572 /* 793 573 * Encode <page, obj_idx> as a single handle value. 794 - * On hardware platforms with physical memory starting at 0x0 the pfn 795 - * could be 0 so we ensure that the handle will never be 0 by adjusting the 796 - * encoded obj_idx value before encoding. 574 + * We use the least bit of handle for tagging. 797 575 */ 798 - static void *obj_location_to_handle(struct page *page, unsigned long obj_idx) 576 + static void *location_to_obj(struct page *page, unsigned long obj_idx) 799 577 { 800 - unsigned long handle; 578 + unsigned long obj; 801 579 802 580 if (!page) { 803 581 BUG_ON(obj_idx); 804 582 return NULL; 805 583 } 806 584 807 - handle = page_to_pfn(page) << OBJ_INDEX_BITS; 808 - handle |= ((obj_idx + 1) & OBJ_INDEX_MASK); 585 + obj = page_to_pfn(page) << OBJ_INDEX_BITS; 586 + obj |= ((obj_idx) & OBJ_INDEX_MASK); 587 + obj <<= OBJ_TAG_BITS; 809 588 810 - return (void *)handle; 589 + return (void *)obj; 811 590 } 812 591 813 592 /* 814 593 * Decode <page, obj_idx> pair from the given object handle. We adjust the 815 594 * decoded obj_idx back to its original value since it was adjusted in 816 - * obj_location_to_handle(). 595 + * location_to_obj(). 
817 596 */ 818 - static void obj_handle_to_location(unsigned long handle, struct page **page, 597 + static void obj_to_location(unsigned long obj, struct page **page, 819 598 unsigned long *obj_idx) 820 599 { 821 - *page = pfn_to_page(handle >> OBJ_INDEX_BITS); 822 - *obj_idx = (handle & OBJ_INDEX_MASK) - 1; 600 + obj >>= OBJ_TAG_BITS; 601 + *page = pfn_to_page(obj >> OBJ_INDEX_BITS); 602 + *obj_idx = (obj & OBJ_INDEX_MASK); 603 + } 604 + 605 + static unsigned long handle_to_obj(unsigned long handle) 606 + { 607 + return *(unsigned long *)handle; 608 + } 609 + 610 + static unsigned long obj_to_head(struct size_class *class, struct page *page, 611 + void *obj) 612 + { 613 + if (class->huge) { 614 + VM_BUG_ON(!is_first_page(page)); 615 + return *(unsigned long *)page_private(page); 616 + } else 617 + return *(unsigned long *)obj; 823 618 } 824 619 825 620 static unsigned long obj_idx_to_offset(struct page *page, ··· 846 611 off = page->index; 847 612 848 613 return off + obj_idx * class_size; 614 + } 615 + 616 + static inline int trypin_tag(unsigned long handle) 617 + { 618 + unsigned long *ptr = (unsigned long *)handle; 619 + 620 + return !test_and_set_bit_lock(HANDLE_PIN_BIT, ptr); 621 + } 622 + 623 + static void pin_tag(unsigned long handle) 624 + { 625 + while (!trypin_tag(handle)); 626 + } 627 + 628 + static void unpin_tag(unsigned long handle) 629 + { 630 + unsigned long *ptr = (unsigned long *)handle; 631 + 632 + clear_bit_unlock(HANDLE_PIN_BIT, ptr); 849 633 } 850 634 851 635 static void reset_page(struct page *page) ··· 928 674 link = (struct link_free *)vaddr + off / sizeof(*link); 929 675 930 676 while ((off += class->size) < PAGE_SIZE) { 931 - link->next = obj_location_to_handle(page, i++); 677 + link->next = location_to_obj(page, i++); 932 678 link += class->size / sizeof(*link); 933 679 } 934 680 ··· 938 684 * page (if present) 939 685 */ 940 686 next_page = get_next_page(page); 941 - link->next = obj_location_to_handle(next_page, 0); 687 + link->next = 
location_to_obj(next_page, 0); 942 688 kunmap_atomic(vaddr); 943 689 page = next_page; 944 690 off %= PAGE_SIZE; ··· 992 738 993 739 init_zspage(first_page, class); 994 740 995 - first_page->freelist = obj_location_to_handle(first_page, 0); 741 + first_page->freelist = location_to_obj(first_page, 0); 996 742 /* Maximum number of objects we can store in this zspage */ 997 743 first_page->objects = class->pages_per_zspage * PAGE_SIZE / class->size; 998 744 ··· 1114 860 { 1115 861 int sizes[2]; 1116 862 void *addr; 1117 - char *buf = area->vm_buf; 863 + char *buf; 1118 864 1119 865 /* no write fastpath */ 1120 866 if (area->vm_mm == ZS_MM_RO) 1121 867 goto out; 868 + 869 + buf = area->vm_buf; 870 + if (!area->huge) { 871 + buf = buf + ZS_HANDLE_SIZE; 872 + size -= ZS_HANDLE_SIZE; 873 + off += ZS_HANDLE_SIZE; 874 + } 1122 875 1123 876 sizes[0] = PAGE_SIZE - off; 1124 877 sizes[1] = size - sizes[0]; ··· 1213 952 zs_size_classes = nr; 1214 953 } 1215 954 1216 - static unsigned int get_maxobj_per_zspage(int size, int pages_per_zspage) 1217 - { 1218 - return pages_per_zspage * PAGE_SIZE / size; 1219 - } 1220 - 1221 955 static bool can_merge(struct size_class *prev, int size, int pages_per_zspage) 1222 956 { 1223 957 if (prev->pages_per_zspage != pages_per_zspage) ··· 1225 969 return true; 1226 970 } 1227 971 1228 - #ifdef CONFIG_ZSMALLOC_STAT 1229 - 1230 - static inline void zs_stat_inc(struct size_class *class, 1231 - enum zs_stat_type type, unsigned long cnt) 972 + static bool zspage_full(struct page *page) 1232 973 { 1233 - class->stats.objs[type] += cnt; 974 + BUG_ON(!is_first_page(page)); 975 + 976 + return page->inuse == page->objects; 1234 977 } 1235 - 1236 - static inline void zs_stat_dec(struct size_class *class, 1237 - enum zs_stat_type type, unsigned long cnt) 1238 - { 1239 - class->stats.objs[type] -= cnt; 1240 - } 1241 - 1242 - static inline unsigned long zs_stat_get(struct size_class *class, 1243 - enum zs_stat_type type) 1244 - { 1245 - return 
class->stats.objs[type]; 1246 - } 1247 - 1248 - static int __init zs_stat_init(void) 1249 - { 1250 - if (!debugfs_initialized()) 1251 - return -ENODEV; 1252 - 1253 - zs_stat_root = debugfs_create_dir("zsmalloc", NULL); 1254 - if (!zs_stat_root) 1255 - return -ENOMEM; 1256 - 1257 - return 0; 1258 - } 1259 - 1260 - static void __exit zs_stat_exit(void) 1261 - { 1262 - debugfs_remove_recursive(zs_stat_root); 1263 - } 1264 - 1265 - static int zs_stats_size_show(struct seq_file *s, void *v) 1266 - { 1267 - int i; 1268 - struct zs_pool *pool = s->private; 1269 - struct size_class *class; 1270 - int objs_per_zspage; 1271 - unsigned long obj_allocated, obj_used, pages_used; 1272 - unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; 1273 - 1274 - seq_printf(s, " %5s %5s %13s %10s %10s\n", "class", "size", 1275 - "obj_allocated", "obj_used", "pages_used"); 1276 - 1277 - for (i = 0; i < zs_size_classes; i++) { 1278 - class = pool->size_class[i]; 1279 - 1280 - if (class->index != i) 1281 - continue; 1282 - 1283 - spin_lock(&class->lock); 1284 - obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); 1285 - obj_used = zs_stat_get(class, OBJ_USED); 1286 - spin_unlock(&class->lock); 1287 - 1288 - objs_per_zspage = get_maxobj_per_zspage(class->size, 1289 - class->pages_per_zspage); 1290 - pages_used = obj_allocated / objs_per_zspage * 1291 - class->pages_per_zspage; 1292 - 1293 - seq_printf(s, " %5u %5u %10lu %10lu %10lu\n", i, 1294 - class->size, obj_allocated, obj_used, pages_used); 1295 - 1296 - total_objs += obj_allocated; 1297 - total_used_objs += obj_used; 1298 - total_pages += pages_used; 1299 - } 1300 - 1301 - seq_puts(s, "\n"); 1302 - seq_printf(s, " %5s %5s %10lu %10lu %10lu\n", "Total", "", 1303 - total_objs, total_used_objs, total_pages); 1304 - 1305 - return 0; 1306 - } 1307 - 1308 - static int zs_stats_size_open(struct inode *inode, struct file *file) 1309 - { 1310 - return single_open(file, zs_stats_size_show, inode->i_private); 1311 - } 1312 - 1313 - 
static const struct file_operations zs_stat_size_ops = { 1314 - .open = zs_stats_size_open, 1315 - .read = seq_read, 1316 - .llseek = seq_lseek, 1317 - .release = single_release, 1318 - }; 1319 - 1320 - static int zs_pool_stat_create(char *name, struct zs_pool *pool) 1321 - { 1322 - struct dentry *entry; 1323 - 1324 - if (!zs_stat_root) 1325 - return -ENODEV; 1326 - 1327 - entry = debugfs_create_dir(name, zs_stat_root); 1328 - if (!entry) { 1329 - pr_warn("debugfs dir <%s> creation failed\n", name); 1330 - return -ENOMEM; 1331 - } 1332 - pool->stat_dentry = entry; 1333 - 1334 - entry = debugfs_create_file("obj_in_classes", S_IFREG | S_IRUGO, 1335 - pool->stat_dentry, pool, &zs_stat_size_ops); 1336 - if (!entry) { 1337 - pr_warn("%s: debugfs file entry <%s> creation failed\n", 1338 - name, "obj_in_classes"); 1339 - return -ENOMEM; 1340 - } 1341 - 1342 - return 0; 1343 - } 1344 - 1345 - static void zs_pool_stat_destroy(struct zs_pool *pool) 1346 - { 1347 - debugfs_remove_recursive(pool->stat_dentry); 1348 - } 1349 - 1350 - #else /* CONFIG_ZSMALLOC_STAT */ 1351 - 1352 - static inline void zs_stat_inc(struct size_class *class, 1353 - enum zs_stat_type type, unsigned long cnt) 1354 - { 1355 - } 1356 - 1357 - static inline void zs_stat_dec(struct size_class *class, 1358 - enum zs_stat_type type, unsigned long cnt) 1359 - { 1360 - } 1361 - 1362 - static inline unsigned long zs_stat_get(struct size_class *class, 1363 - enum zs_stat_type type) 1364 - { 1365 - return 0; 1366 - } 1367 - 1368 - static int __init zs_stat_init(void) 1369 - { 1370 - return 0; 1371 - } 1372 - 1373 - static void __exit zs_stat_exit(void) 1374 - { 1375 - } 1376 - 1377 - static inline int zs_pool_stat_create(char *name, struct zs_pool *pool) 1378 - { 1379 - return 0; 1380 - } 1381 - 1382 - static inline void zs_pool_stat_destroy(struct zs_pool *pool) 1383 - { 1384 - } 1385 - 1386 - #endif 1387 978 1388 979 unsigned long zs_get_total_pages(struct zs_pool *pool) 1389 980 { ··· 1256 1153 enum zs_mapmode 
mm) 1257 1154 { 1258 1155 struct page *page; 1259 - unsigned long obj_idx, off; 1156 + unsigned long obj, obj_idx, off; 1260 1157 1261 1158 unsigned int class_idx; 1262 1159 enum fullness_group fg; 1263 1160 struct size_class *class; 1264 1161 struct mapping_area *area; 1265 1162 struct page *pages[2]; 1163 + void *ret; 1266 1164 1267 1165 BUG_ON(!handle); 1268 1166 ··· 1274 1170 */ 1275 1171 BUG_ON(in_interrupt()); 1276 1172 1277 - obj_handle_to_location(handle, &page, &obj_idx); 1173 + /* From now on, migration cannot move the object */ 1174 + pin_tag(handle); 1175 + 1176 + obj = handle_to_obj(handle); 1177 + obj_to_location(obj, &page, &obj_idx); 1278 1178 get_zspage_mapping(get_first_page(page), &class_idx, &fg); 1279 1179 class = pool->size_class[class_idx]; 1280 1180 off = obj_idx_to_offset(page, obj_idx, class->size); ··· 1288 1180 if (off + class->size <= PAGE_SIZE) { 1289 1181 /* this object is contained entirely within a page */ 1290 1182 area->vm_addr = kmap_atomic(page); 1291 - return area->vm_addr + off; 1183 + ret = area->vm_addr + off; 1184 + goto out; 1292 1185 } 1293 1186 1294 1187 /* this object spans two pages */ ··· 1297 1188 pages[1] = get_next_page(page); 1298 1189 BUG_ON(!pages[1]); 1299 1190 1300 - return __zs_map_object(area, pages, off, class->size); 1191 + ret = __zs_map_object(area, pages, off, class->size); 1192 + out: 1193 + if (!class->huge) 1194 + ret += ZS_HANDLE_SIZE; 1195 + 1196 + return ret; 1301 1197 } 1302 1198 EXPORT_SYMBOL_GPL(zs_map_object); 1303 1199 1304 1200 void zs_unmap_object(struct zs_pool *pool, unsigned long handle) 1305 1201 { 1306 1202 struct page *page; 1307 - unsigned long obj_idx, off; 1203 + unsigned long obj, obj_idx, off; 1308 1204 1309 1205 unsigned int class_idx; 1310 1206 enum fullness_group fg; ··· 1318 1204 1319 1205 BUG_ON(!handle); 1320 1206 1321 - obj_handle_to_location(handle, &page, &obj_idx); 1207 + obj = handle_to_obj(handle); 1208 + obj_to_location(obj, &page, &obj_idx); 1322 1209 
get_zspage_mapping(get_first_page(page), &class_idx, &fg); 1323 1210 class = pool->size_class[class_idx]; 1324 1211 off = obj_idx_to_offset(page, obj_idx, class->size); ··· 1337 1222 __zs_unmap_object(area, pages, off, class->size); 1338 1223 } 1339 1224 put_cpu_var(zs_map_area); 1225 + unpin_tag(handle); 1340 1226 } 1341 1227 EXPORT_SYMBOL_GPL(zs_unmap_object); 1228 + 1229 + static unsigned long obj_malloc(struct page *first_page, 1230 + struct size_class *class, unsigned long handle) 1231 + { 1232 + unsigned long obj; 1233 + struct link_free *link; 1234 + 1235 + struct page *m_page; 1236 + unsigned long m_objidx, m_offset; 1237 + void *vaddr; 1238 + 1239 + handle |= OBJ_ALLOCATED_TAG; 1240 + obj = (unsigned long)first_page->freelist; 1241 + obj_to_location(obj, &m_page, &m_objidx); 1242 + m_offset = obj_idx_to_offset(m_page, m_objidx, class->size); 1243 + 1244 + vaddr = kmap_atomic(m_page); 1245 + link = (struct link_free *)vaddr + m_offset / sizeof(*link); 1246 + first_page->freelist = link->next; 1247 + if (!class->huge) 1248 + /* record handle in the header of allocated chunk */ 1249 + link->handle = handle; 1250 + else 1251 + /* record handle in first_page->private */ 1252 + set_page_private(first_page, handle); 1253 + kunmap_atomic(vaddr); 1254 + first_page->inuse++; 1255 + zs_stat_inc(class, OBJ_USED, 1); 1256 + 1257 + return obj; 1258 + } 1259 + 1342 1260 1343 1261 /** 1344 1262 * zs_malloc - Allocate block of given size from pool. 
··· 1384 1236 */ 1385 1237 unsigned long zs_malloc(struct zs_pool *pool, size_t size) 1386 1238 { 1387 - unsigned long obj; 1388 - struct link_free *link; 1239 + unsigned long handle, obj; 1389 1240 struct size_class *class; 1390 - void *vaddr; 1391 - 1392 - struct page *first_page, *m_page; 1393 - unsigned long m_objidx, m_offset; 1241 + struct page *first_page; 1394 1242 1395 1243 if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE)) 1396 1244 return 0; 1397 1245 1246 + handle = alloc_handle(pool); 1247 + if (!handle) 1248 + return 0; 1249 + 1250 + /* extra space in chunk to keep the handle */ 1251 + size += ZS_HANDLE_SIZE; 1398 1252 class = pool->size_class[get_size_class_index(size)]; 1399 1253 1400 1254 spin_lock(&class->lock); ··· 1405 1255 if (!first_page) { 1406 1256 spin_unlock(&class->lock); 1407 1257 first_page = alloc_zspage(class, pool->flags); 1408 - if (unlikely(!first_page)) 1258 + if (unlikely(!first_page)) { 1259 + free_handle(pool, handle); 1409 1260 return 0; 1261 + } 1410 1262 1411 1263 set_zspage_mapping(first_page, class->index, ZS_EMPTY); 1412 1264 atomic_long_add(class->pages_per_zspage, ··· 1419 1267 class->size, class->pages_per_zspage)); 1420 1268 } 1421 1269 1422 - obj = (unsigned long)first_page->freelist; 1423 - obj_handle_to_location(obj, &m_page, &m_objidx); 1424 - m_offset = obj_idx_to_offset(m_page, m_objidx, class->size); 1425 - 1426 - vaddr = kmap_atomic(m_page); 1427 - link = (struct link_free *)vaddr + m_offset / sizeof(*link); 1428 - first_page->freelist = link->next; 1429 - memset(link, POISON_INUSE, sizeof(*link)); 1430 - kunmap_atomic(vaddr); 1431 - 1432 - first_page->inuse++; 1433 - zs_stat_inc(class, OBJ_USED, 1); 1270 + obj = obj_malloc(first_page, class, handle); 1434 1271 /* Now move the zspage to another fullness group, if required */ 1435 - fix_fullness_group(pool, first_page); 1272 + fix_fullness_group(class, first_page); 1273 + record_obj(handle, obj); 1436 1274 spin_unlock(&class->lock); 1437 1275 1438 - return obj; 
1276 + return handle; 1439 1277 } 1440 1278 EXPORT_SYMBOL_GPL(zs_malloc); 1441 1279 1442 - void zs_free(struct zs_pool *pool, unsigned long obj) 1280 + static void obj_free(struct zs_pool *pool, struct size_class *class, 1281 + unsigned long obj) 1443 1282 { 1444 1283 struct link_free *link; 1445 1284 struct page *first_page, *f_page; 1446 1285 unsigned long f_objidx, f_offset; 1447 1286 void *vaddr; 1287 + int class_idx; 1288 + enum fullness_group fullness; 1448 1289 1290 + BUG_ON(!obj); 1291 + 1292 + obj &= ~OBJ_ALLOCATED_TAG; 1293 + obj_to_location(obj, &f_page, &f_objidx); 1294 + first_page = get_first_page(f_page); 1295 + 1296 + get_zspage_mapping(first_page, &class_idx, &fullness); 1297 + f_offset = obj_idx_to_offset(f_page, f_objidx, class->size); 1298 + 1299 + vaddr = kmap_atomic(f_page); 1300 + 1301 + /* Insert this object in containing zspage's freelist */ 1302 + link = (struct link_free *)(vaddr + f_offset); 1303 + link->next = first_page->freelist; 1304 + if (class->huge) 1305 + set_page_private(first_page, 0); 1306 + kunmap_atomic(vaddr); 1307 + first_page->freelist = (void *)obj; 1308 + first_page->inuse--; 1309 + zs_stat_dec(class, OBJ_USED, 1); 1310 + } 1311 + 1312 + void zs_free(struct zs_pool *pool, unsigned long handle) 1313 + { 1314 + struct page *first_page, *f_page; 1315 + unsigned long obj, f_objidx; 1449 1316 int class_idx; 1450 1317 struct size_class *class; 1451 1318 enum fullness_group fullness; 1452 1319 1453 - if (unlikely(!obj)) 1320 + if (unlikely(!handle)) 1454 1321 return; 1455 1322 1456 - obj_handle_to_location(obj, &f_page, &f_objidx); 1323 + pin_tag(handle); 1324 + obj = handle_to_obj(handle); 1325 + obj_to_location(obj, &f_page, &f_objidx); 1457 1326 first_page = get_first_page(f_page); 1458 1327 1459 1328 get_zspage_mapping(first_page, &class_idx, &fullness); 1460 1329 class = pool->size_class[class_idx]; 1461 - f_offset = obj_idx_to_offset(f_page, f_objidx, class->size); 1462 1330 1463 1331 spin_lock(&class->lock); 1464 - 1465 
- /* Insert this object in containing zspage's freelist */ 1466 - vaddr = kmap_atomic(f_page); 1467 - link = (struct link_free *)(vaddr + f_offset); 1468 - link->next = first_page->freelist; 1469 - kunmap_atomic(vaddr); 1470 - first_page->freelist = (void *)obj; 1471 - 1472 - first_page->inuse--; 1473 - fullness = fix_fullness_group(pool, first_page); 1474 - 1475 - zs_stat_dec(class, OBJ_USED, 1); 1476 - if (fullness == ZS_EMPTY) 1332 + obj_free(pool, class, obj); 1333 + fullness = fix_fullness_group(class, first_page); 1334 + if (fullness == ZS_EMPTY) { 1477 1335 zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( 1478 1336 class->size, class->pages_per_zspage)); 1479 - 1480 - spin_unlock(&class->lock); 1481 - 1482 - if (fullness == ZS_EMPTY) { 1483 1337 atomic_long_sub(class->pages_per_zspage, 1484 1338 &pool->pages_allocated); 1485 1339 free_zspage(first_page); 1486 1340 } 1341 + spin_unlock(&class->lock); 1342 + unpin_tag(handle); 1343 + 1344 + free_handle(pool, handle); 1487 1345 } 1488 1346 EXPORT_SYMBOL_GPL(zs_free); 1347 + 1348 + static void zs_object_copy(unsigned long src, unsigned long dst, 1349 + struct size_class *class) 1350 + { 1351 + struct page *s_page, *d_page; 1352 + unsigned long s_objidx, d_objidx; 1353 + unsigned long s_off, d_off; 1354 + void *s_addr, *d_addr; 1355 + int s_size, d_size, size; 1356 + int written = 0; 1357 + 1358 + s_size = d_size = class->size; 1359 + 1360 + obj_to_location(src, &s_page, &s_objidx); 1361 + obj_to_location(dst, &d_page, &d_objidx); 1362 + 1363 + s_off = obj_idx_to_offset(s_page, s_objidx, class->size); 1364 + d_off = obj_idx_to_offset(d_page, d_objidx, class->size); 1365 + 1366 + if (s_off + class->size > PAGE_SIZE) 1367 + s_size = PAGE_SIZE - s_off; 1368 + 1369 + if (d_off + class->size > PAGE_SIZE) 1370 + d_size = PAGE_SIZE - d_off; 1371 + 1372 + s_addr = kmap_atomic(s_page); 1373 + d_addr = kmap_atomic(d_page); 1374 + 1375 + while (1) { 1376 + size = min(s_size, d_size); 1377 + memcpy(d_addr + d_off, 
s_addr + s_off, size); 1378 + written += size; 1379 + 1380 + if (written == class->size) 1381 + break; 1382 + 1383 + s_off += size; 1384 + s_size -= size; 1385 + d_off += size; 1386 + d_size -= size; 1387 + 1388 + if (s_off >= PAGE_SIZE) { 1389 + kunmap_atomic(d_addr); 1390 + kunmap_atomic(s_addr); 1391 + s_page = get_next_page(s_page); 1392 + BUG_ON(!s_page); 1393 + s_addr = kmap_atomic(s_page); 1394 + d_addr = kmap_atomic(d_page); 1395 + s_size = class->size - written; 1396 + s_off = 0; 1397 + } 1398 + 1399 + if (d_off >= PAGE_SIZE) { 1400 + kunmap_atomic(d_addr); 1401 + d_page = get_next_page(d_page); 1402 + BUG_ON(!d_page); 1403 + d_addr = kmap_atomic(d_page); 1404 + d_size = class->size - written; 1405 + d_off = 0; 1406 + } 1407 + } 1408 + 1409 + kunmap_atomic(d_addr); 1410 + kunmap_atomic(s_addr); 1411 + } 1412 + 1413 + /* 1414 + * Find alloced object in zspage from index object and 1415 + * return handle. 1416 + */ 1417 + static unsigned long find_alloced_obj(struct page *page, int index, 1418 + struct size_class *class) 1419 + { 1420 + unsigned long head; 1421 + int offset = 0; 1422 + unsigned long handle = 0; 1423 + void *addr = kmap_atomic(page); 1424 + 1425 + if (!is_first_page(page)) 1426 + offset = page->index; 1427 + offset += class->size * index; 1428 + 1429 + while (offset < PAGE_SIZE) { 1430 + head = obj_to_head(class, page, addr + offset); 1431 + if (head & OBJ_ALLOCATED_TAG) { 1432 + handle = head & ~OBJ_ALLOCATED_TAG; 1433 + if (trypin_tag(handle)) 1434 + break; 1435 + handle = 0; 1436 + } 1437 + 1438 + offset += class->size; 1439 + index++; 1440 + } 1441 + 1442 + kunmap_atomic(addr); 1443 + return handle; 1444 + } 1445 + 1446 + struct zs_compact_control { 1447 + /* Source page for migration which could be a subpage of zspage. */ 1448 + struct page *s_page; 1449 + /* Destination page for migration which should be a first page 1450 + * of zspage. 
*/ 1451 + struct page *d_page; 1452 + /* Starting object index within @s_page which used for live object 1453 + * in the subpage. */ 1454 + int index; 1455 + /* how many of objects are migrated */ 1456 + int nr_migrated; 1457 + }; 1458 + 1459 + static int migrate_zspage(struct zs_pool *pool, struct size_class *class, 1460 + struct zs_compact_control *cc) 1461 + { 1462 + unsigned long used_obj, free_obj; 1463 + unsigned long handle; 1464 + struct page *s_page = cc->s_page; 1465 + struct page *d_page = cc->d_page; 1466 + unsigned long index = cc->index; 1467 + int nr_migrated = 0; 1468 + int ret = 0; 1469 + 1470 + while (1) { 1471 + handle = find_alloced_obj(s_page, index, class); 1472 + if (!handle) { 1473 + s_page = get_next_page(s_page); 1474 + if (!s_page) 1475 + break; 1476 + index = 0; 1477 + continue; 1478 + } 1479 + 1480 + /* Stop if there is no more space */ 1481 + if (zspage_full(d_page)) { 1482 + unpin_tag(handle); 1483 + ret = -ENOMEM; 1484 + break; 1485 + } 1486 + 1487 + used_obj = handle_to_obj(handle); 1488 + free_obj = obj_malloc(d_page, class, handle); 1489 + zs_object_copy(used_obj, free_obj, class); 1490 + index++; 1491 + record_obj(handle, free_obj); 1492 + unpin_tag(handle); 1493 + obj_free(pool, class, used_obj); 1494 + nr_migrated++; 1495 + } 1496 + 1497 + /* Remember last position in this iteration */ 1498 + cc->s_page = s_page; 1499 + cc->index = index; 1500 + cc->nr_migrated = nr_migrated; 1501 + 1502 + return ret; 1503 + } 1504 + 1505 + static struct page *alloc_target_page(struct size_class *class) 1506 + { 1507 + int i; 1508 + struct page *page; 1509 + 1510 + for (i = 0; i < _ZS_NR_FULLNESS_GROUPS; i++) { 1511 + page = class->fullness_list[i]; 1512 + if (page) { 1513 + remove_zspage(page, class, i); 1514 + break; 1515 + } 1516 + } 1517 + 1518 + return page; 1519 + } 1520 + 1521 + static void putback_zspage(struct zs_pool *pool, struct size_class *class, 1522 + struct page *first_page) 1523 + { 1524 + enum fullness_group fullness; 1525 + 
1526 + BUG_ON(!is_first_page(first_page)); 1527 + 1528 + fullness = get_fullness_group(first_page); 1529 + insert_zspage(first_page, class, fullness); 1530 + set_zspage_mapping(first_page, class->index, fullness); 1531 + 1532 + if (fullness == ZS_EMPTY) { 1533 + zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage( 1534 + class->size, class->pages_per_zspage)); 1535 + atomic_long_sub(class->pages_per_zspage, 1536 + &pool->pages_allocated); 1537 + 1538 + free_zspage(first_page); 1539 + } 1540 + } 1541 + 1542 + static struct page *isolate_source_page(struct size_class *class) 1543 + { 1544 + struct page *page; 1545 + 1546 + page = class->fullness_list[ZS_ALMOST_EMPTY]; 1547 + if (page) 1548 + remove_zspage(page, class, ZS_ALMOST_EMPTY); 1549 + 1550 + return page; 1551 + } 1552 + 1553 + static unsigned long __zs_compact(struct zs_pool *pool, 1554 + struct size_class *class) 1555 + { 1556 + int nr_to_migrate; 1557 + struct zs_compact_control cc; 1558 + struct page *src_page; 1559 + struct page *dst_page = NULL; 1560 + unsigned long nr_total_migrated = 0; 1561 + 1562 + spin_lock(&class->lock); 1563 + while ((src_page = isolate_source_page(class))) { 1564 + 1565 + BUG_ON(!is_first_page(src_page)); 1566 + 1567 + /* The goal is to migrate all live objects in source page */ 1568 + nr_to_migrate = src_page->inuse; 1569 + cc.index = 0; 1570 + cc.s_page = src_page; 1571 + 1572 + while ((dst_page = alloc_target_page(class))) { 1573 + cc.d_page = dst_page; 1574 + /* 1575 + * If there is no more space in dst_page, try to 1576 + * allocate another zspage. 
1577 + */ 1578 + if (!migrate_zspage(pool, class, &cc)) 1579 + break; 1580 + 1581 + putback_zspage(pool, class, dst_page); 1582 + nr_total_migrated += cc.nr_migrated; 1583 + nr_to_migrate -= cc.nr_migrated; 1584 + } 1585 + 1586 + /* Stop if we couldn't find slot */ 1587 + if (dst_page == NULL) 1588 + break; 1589 + 1590 + putback_zspage(pool, class, dst_page); 1591 + putback_zspage(pool, class, src_page); 1592 + spin_unlock(&class->lock); 1593 + nr_total_migrated += cc.nr_migrated; 1594 + cond_resched(); 1595 + spin_lock(&class->lock); 1596 + } 1597 + 1598 + if (src_page) 1599 + putback_zspage(pool, class, src_page); 1600 + 1601 + spin_unlock(&class->lock); 1602 + 1603 + return nr_total_migrated; 1604 + } 1605 + 1606 + unsigned long zs_compact(struct zs_pool *pool) 1607 + { 1608 + int i; 1609 + unsigned long nr_migrated = 0; 1610 + struct size_class *class; 1611 + 1612 + for (i = zs_size_classes - 1; i >= 0; i--) { 1613 + class = pool->size_class[i]; 1614 + if (!class) 1615 + continue; 1616 + if (class->index != i) 1617 + continue; 1618 + nr_migrated += __zs_compact(pool, class); 1619 + } 1620 + 1621 + return nr_migrated; 1622 + } 1623 + EXPORT_SYMBOL_GPL(zs_compact); 1489 1624 1490 1625 /** 1491 1626 * zs_create_pool - Creates an allocation pool to work from. 
··· 1794 1355 if (!pool) 1795 1356 return NULL; 1796 1357 1797 - pool->name = kstrdup(name, GFP_KERNEL); 1798 - if (!pool->name) { 1358 + pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *), 1359 + GFP_KERNEL); 1360 + if (!pool->size_class) { 1799 1361 kfree(pool); 1800 1362 return NULL; 1801 1363 } 1802 1364 1803 - pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *), 1804 - GFP_KERNEL); 1805 - if (!pool->size_class) { 1806 - kfree(pool->name); 1807 - kfree(pool); 1808 - return NULL; 1809 - } 1365 + pool->name = kstrdup(name, GFP_KERNEL); 1366 + if (!pool->name) 1367 + goto err; 1368 + 1369 + if (create_handle_cache(pool)) 1370 + goto err; 1810 1371 1811 1372 /* 1812 1373 * Iterate reversly, because, size of size_class that we want to use ··· 1845 1406 class->size = size; 1846 1407 class->index = i; 1847 1408 class->pages_per_zspage = pages_per_zspage; 1409 + if (pages_per_zspage == 1 && 1410 + get_maxobj_per_zspage(size, pages_per_zspage) == 1) 1411 + class->huge = true; 1848 1412 spin_lock_init(&class->lock); 1849 1413 pool->size_class[i] = class; 1850 1414 ··· 1892 1450 kfree(class); 1893 1451 } 1894 1452 1453 + destroy_handle_cache(pool); 1895 1454 kfree(pool->size_class); 1896 1455 kfree(pool->name); 1897 1456 kfree(pool);
+2
net/sunrpc/Kconfig
··· 1 1 config SUNRPC 2 2 tristate 3 + depends on MULTIUSER 3 4 4 5 config SUNRPC_GSS 5 6 tristate 6 7 select OID_REGISTRY 8 + depends on MULTIUSER 7 9 8 10 config SUNRPC_BACKCHANNEL 9 11 bool
+5 -3
net/sunrpc/cache.c
··· 1072 1072 1073 1073 if (len < 0) return; 1074 1074 1075 - ret = string_escape_str(str, &bp, len, ESCAPE_OCTAL, "\\ \n\t"); 1076 - if (ret < 0 || ret == len) 1075 + ret = string_escape_str(str, bp, len, ESCAPE_OCTAL, "\\ \n\t"); 1076 + if (ret >= len) { 1077 + bp += len; 1077 1078 len = -1; 1078 - else { 1079 + } else { 1080 + bp += ret; 1079 1081 len -= ret; 1080 1082 *bp++ = ' '; 1081 1083 len--;
+1
security/Kconfig
··· 21 21 config SECURITY 22 22 bool "Enable different security models" 23 23 depends on SYSFS 24 + depends on MULTIUSER 24 25 help 25 26 This allows you to choose different security modules to be 26 27 configured into your kernel.
+6 -2
tools/testing/selftests/powerpc/mm/hugetlb_vs_thp_test.c
··· 21 21 * Typically the mmap will fail because no huge pages are 22 22 * allocated on the system. But if there are huge pages 23 23 * allocated the mmap will succeed. That's fine too, we just 24 - * munmap here before continuing. 24 + * munmap here before continuing. munmap() length of 25 + * MAP_HUGETLB memory must be hugepage aligned. 25 26 */ 26 - munmap(addr, SIZE); 27 + if (munmap(addr, SIZE)) { 28 + perror("munmap"); 29 + return 1; 30 + } 27 31 } 28 32 29 33 p = mmap(addr, SIZE, PROT_READ | PROT_WRITE,
+3 -1
tools/testing/selftests/vm/hugetlbfstest.c
··· 34 34 int *p; 35 35 int flags = MAP_PRIVATE | MAP_POPULATE | extra_flags; 36 36 u64 before, after; 37 + int ret; 37 38 38 39 before = read_rss(); 39 40 p = mmap(NULL, length, PROT_READ | PROT_WRITE, flags, fd, 0); ··· 45 44 !"rss didn't grow as expected"); 46 45 if (!unmap) 47 46 return; 48 - munmap(p, length); 47 + ret = munmap(p, length); 48 + assert(!ret || !"munmap returned an unexpected error"); 49 49 after = read_rss(); 50 50 assert(llabs(after - before) < 0x40000 || 51 51 !"rss didn't shrink as expected");
+5 -1
tools/testing/selftests/vm/map_hugetlb.c
··· 73 73 write_bytes(addr); 74 74 ret = read_bytes(addr); 75 75 76 - munmap(addr, LENGTH); 76 + /* munmap() length of MAP_HUGETLB memory must be hugepage aligned */ 77 + if (munmap(addr, LENGTH)) { 78 + perror("munmap"); 79 + exit(1); 80 + } 77 81 78 82 return ret; 79 83 }