Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'akpm' (patches from Andrew)

Merge third patch-bomb from Andrew Morton:

- even more of the rest of MM

- lib/ updates

- checkpatch updates

- small changes to a few scruffy filesystems

- kmod fixes/cleanups

- kexec updates

- a dma-mapping cleanup series from hch

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
dma-mapping: consolidate dma_set_mask
dma-mapping: consolidate dma_supported
dma-mapping: consolidate dma_mapping_error
dma-mapping: consolidate dma_{alloc,free}_noncoherent
dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
mm: make sure all file VMAs have ->vm_ops set
mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
mm: mark most vm_operations_struct const
namei: fix warning while make xmldocs caused by namei.c
ipc: convert invalid scenarios to use WARN_ON
zlib_deflate/deftree: remove bi_reverse()
lib/decompress_unlzma: Do a NULL check for pointer
lib/decompressors: use real out buf size for gunzip with kernel
fs/affs: make root lookup from blkdev logical size
sysctl: fix int -> unsigned long assignments in INT_MIN case
kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
kexec: align crash_notes allocation to make it be inside one physical page
kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
kexec: split kexec_load syscall from kexec core code
...

+4775 -4360
+4
CREDITS
··· 2992 2992 S: Santa Clara, CA 95052 2993 2993 S: USA 2994 2994 2995 + N: Anil Ravindranath 2996 + E: anil_ravindranath@pmc-sierra.com 2997 + D: PMC-Sierra MaxRAID driver 2998 + 2995 2999 N: Eric S. Raymond 2996 3000 E: esr@thyrsus.com 2997 3001 W: http://www.tuxedo.org/~esr/
+2
Documentation/vm/00-INDEX
··· 14 14 - a brief summary of hugetlbpage support in the Linux kernel. 15 15 hwpoison.txt 16 16 - explains what hwpoison is 17 + idle_page_tracking.txt 18 + - description of the idle page tracking feature. 17 19 ksm.txt 18 20 - how to use the Kernel Samepage Merging feature. 19 21 numa
+98
Documentation/vm/idle_page_tracking.txt
··· 1 + MOTIVATION 2 + 3 + The idle page tracking feature allows to track which memory pages are being 4 + accessed by a workload and which are idle. This information can be useful for 5 + estimating the workload's working set size, which, in turn, can be taken into 6 + account when configuring the workload parameters, setting memory cgroup limits, 7 + or deciding where to place the workload within a compute cluster. 8 + 9 + It is enabled by CONFIG_IDLE_PAGE_TRACKING=y. 10 + 11 + USER API 12 + 13 + The idle page tracking API is located at /sys/kernel/mm/page_idle. Currently, 14 + it consists of the only read-write file, /sys/kernel/mm/page_idle/bitmap. 15 + 16 + The file implements a bitmap where each bit corresponds to a memory page. The 17 + bitmap is represented by an array of 8-byte integers, and the page at PFN #i is 18 + mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is 19 + set, the corresponding page is idle. 20 + 21 + A page is considered idle if it has not been accessed since it was marked idle 22 + (for more details on what "accessed" actually means see the IMPLEMENTATION 23 + DETAILS section). To mark a page idle one has to set the bit corresponding to 24 + the page by writing to the file. A value written to the file is OR-ed with the 25 + current bitmap value. 26 + 27 + Only accesses to user memory pages are tracked. These are pages mapped to a 28 + process address space, page cache and buffer pages, swap cache pages. For other 29 + page types (e.g. SLAB pages) an attempt to mark a page idle is silently ignored, 30 + and hence such pages are never reported idle. 31 + 32 + For huge pages the idle flag is set only on the head page, so one has to read 33 + /proc/kpageflags in order to correctly count idle huge pages. 
34 + 35 + Reading from or writing to /sys/kernel/mm/page_idle/bitmap will return 36 + -EINVAL if you are not starting the read/write on an 8-byte boundary, or 37 + if the size of the read/write is not a multiple of 8 bytes. Writing to 38 + this file beyond max PFN will return -ENXIO. 39 + 40 + That said, in order to estimate the amount of pages that are not used by a 41 + workload one should: 42 + 43 + 1. Mark all the workload's pages as idle by setting corresponding bits in 44 + /sys/kernel/mm/page_idle/bitmap. The pages can be found by reading 45 + /proc/pid/pagemap if the workload is represented by a process, or by 46 + filtering out alien pages using /proc/kpagecgroup in case the workload is 47 + placed in a memory cgroup. 48 + 49 + 2. Wait until the workload accesses its working set. 50 + 51 + 3. Read /sys/kernel/mm/page_idle/bitmap and count the number of bits set. If 52 + one wants to ignore certain types of pages, e.g. mlocked pages since they 53 + are not reclaimable, he or she can filter them out using /proc/kpageflags. 54 + 55 + See Documentation/vm/pagemap.txt for more information about /proc/pid/pagemap, 56 + /proc/kpageflags, and /proc/kpagecgroup. 57 + 58 + IMPLEMENTATION DETAILS 59 + 60 + The kernel internally keeps track of accesses to user memory pages in order to 61 + reclaim unreferenced pages first on memory shortage conditions. A page is 62 + considered referenced if it has been recently accessed via a process address 63 + space, in which case one or more PTEs it is mapped to will have the Accessed bit 64 + set, or marked accessed explicitly by the kernel (see mark_page_accessed()). The 65 + latter happens when: 66 + 67 + - a userspace process reads or writes a page using a system call (e.g. read(2) 68 + or write(2)) 69 + 70 + - a page that is used for storing filesystem buffers is read or written, 71 + because a process needs filesystem metadata stored in it (e.g. 
lists a 72 + directory tree) 73 + 74 + - a page is accessed by a device driver using get_user_pages() 75 + 76 + When a dirty page is written to swap or disk as a result of memory reclaim or 77 + exceeding the dirty memory limit, it is not marked referenced. 78 + 79 + The idle memory tracking feature adds a new page flag, the Idle flag. This flag 80 + is set manually, by writing to /sys/kernel/mm/page_idle/bitmap (see the USER API 81 + section), and cleared automatically whenever a page is referenced as defined 82 + above. 83 + 84 + When a page is marked idle, the Accessed bit must be cleared in all PTEs it is 85 + mapped to, otherwise we will not be able to detect accesses to the page coming 86 + from a process address space. To avoid interference with the reclaimer, which, 87 + as noted above, uses the Accessed bit to promote actively referenced pages, one 88 + more page flag is introduced, the Young flag. When the PTE Accessed bit is 89 + cleared as a result of setting or updating a page's Idle flag, the Young flag 90 + is set on the page. The reclaimer treats the Young flag as an extra PTE 91 + Accessed bit and therefore will consider such a page as referenced. 92 + 93 + Since the idle memory tracking feature is based on the memory reclaimer logic, 94 + it only works with pages that are on an LRU list, other pages are silently 95 + ignored. That means it will ignore a user memory page if it is isolated, but 96 + since there are usually not many of them, it should not affect the overall 97 + result noticeably. In order not to stall scanning of the idle page bitmap, 98 + locked pages may be skipped too.
+12 -1
Documentation/vm/pagemap.txt
··· 5 5 userspace programs to examine the page tables and related information by 6 6 reading files in /proc. 7 7 8 - There are three components to pagemap: 8 + There are four components to pagemap: 9 9 10 10 * /proc/pid/pagemap. This file lets a userspace process find out which 11 11 physical frame each virtual page is mapped to. It contains one 64-bit ··· 70 70 22. THP 71 71 23. BALLOON 72 72 24. ZERO_PAGE 73 + 25. IDLE 74 + 75 + * /proc/kpagecgroup. This file contains a 64-bit inode number of the 76 + memory cgroup each page is charged to, indexed by PFN. Only available when 77 + CONFIG_MEMCG is set. 73 78 74 79 Short descriptions to the page flags: 75 80 ··· 120 115 121 116 24. ZERO_PAGE 122 117 zero page for pfn_zero or huge_zero page 118 + 119 + 25. IDLE 120 + page has not been accessed since it was marked idle (see 121 + Documentation/vm/idle_page_tracking.txt). Note that this flag may be 122 + stale in case the page was accessed via a PTE. To make sure the flag 123 + is up-to-date one has to read /sys/kernel/mm/page_idle/bitmap first. 123 124 124 125 [IO related page flags] 125 126 1. ERROR IO error occurred
+28 -8
Documentation/vm/zswap.txt
··· 32 32 An example command to enable zswap at runtime, assuming sysfs is mounted 33 33 at /sys, is: 34 34 35 - echo 1 > /sys/modules/zswap/parameters/enabled 35 + echo 1 > /sys/module/zswap/parameters/enabled 36 36 37 37 When zswap is disabled at runtime it will stop storing pages that are 38 38 being swapped out. However, it will _not_ immediately write out or fault ··· 49 49 evict pages from its own compressed pool on an LRU basis and write them back to 50 50 the backing swap device in the case that the compressed pool is full. 51 51 52 - Zswap makes use of zbud for the managing the compressed memory pool. Each 53 - allocation in zbud is not directly accessible by address. Rather, a handle is 52 + Zswap makes use of zpool for the managing the compressed memory pool. Each 53 + allocation in zpool is not directly accessible by address. Rather, a handle is 54 54 returned by the allocation routine and that handle must be mapped before being 55 55 accessed. The compressed memory pool grows on demand and shrinks as compressed 56 - pages are freed. The pool is not preallocated. 56 + pages are freed. The pool is not preallocated. By default, a zpool of type 57 + zbud is created, but it can be selected at boot time by setting the "zpool" 58 + attribute, e.g. zswap.zpool=zbud. It can also be changed at runtime using the 59 + sysfs "zpool" attribute, e.g. 60 + 61 + echo zbud > /sys/module/zswap/parameters/zpool 62 + 63 + The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which 64 + means the compression ratio will always be 2:1 or worse (because of half-full 65 + zbud pages). The zsmalloc type zpool has a more complex compressed page 66 + storage method, and it can achieve greater storage densities. However, 67 + zsmalloc does not implement compressed page eviction, so once zswap fills it 68 + cannot evict the oldest page, it can only reject new pages. 
57 69 58 70 When a swap page is passed from frontswap to zswap, zswap maintains a mapping 59 - of the swap entry, a combination of the swap type and swap offset, to the zbud 71 + of the swap entry, a combination of the swap type and swap offset, to the zpool 60 72 handle that references that compressed swap page. This mapping is achieved 61 73 with a red-black tree per swap type. The swap offset is the search key for the 62 74 tree nodes. ··· 86 74 * max_pool_percent - The maximum percentage of memory that the compressed 87 75 pool can occupy. 88 76 89 - Zswap allows the compressor to be selected at kernel boot time by setting the 90 - “compressor” attribute. The default compressor is lzo. e.g. 91 - zswap.compressor=deflate 77 + The default compressor is lzo, but it can be selected at boot time by setting 78 + the “compressor” attribute, e.g. zswap.compressor=lzo. It can also be changed 79 + at runtime using the sysfs "compressor" attribute, e.g. 80 + 81 + echo lzo > /sys/module/zswap/parameters/compressor 82 + 83 + When the zpool and/or compressor parameter is changed at runtime, any existing 84 + compressed pages are not modified; they are left in their own zpool. When a 85 + request is made for a page in an old zpool, it is uncompressed using its 86 + original compressor. Once all pages are removed from an old zpool, the zpool 87 + and its compressor are freed. 92 88 93 89 A debugfs interface is provided for various statistic about pool size, number 94 90 of pages stored, and various counters for the reasons pages are rejected.
+1 -2
MAINTAINERS
··· 8199 8199 F: include/linux/i2c/pmbus.h 8200 8200 8201 8201 PMC SIERRA MaxRAID DRIVER 8202 - M: Anil Ravindranath <anil_ravindranath@pmc-sierra.com> 8203 8202 L: linux-scsi@vger.kernel.org 8204 8203 W: http://www.pmc-sierra.com/ 8205 - S: Supported 8204 + S: Orphan 8206 8205 F: drivers/scsi/pmcraid.* 8207 8206 8208 8207 PMC SIERRA PM8001 DRIVER
+3
arch/Kconfig
··· 2 2 # General architecture dependent options 3 3 # 4 4 5 + config KEXEC_CORE 6 + bool 7 + 5 8 config OPROFILE 6 9 tristate "OProfile system profiling" 7 10 depends on PROFILING
-36
arch/alpha/include/asm/dma-mapping.h
··· 12 12 13 13 #include <asm-generic/dma-mapping-common.h> 14 14 15 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 16 - 17 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 18 - dma_addr_t *dma_handle, gfp_t gfp, 19 - struct dma_attrs *attrs) 20 - { 21 - return get_dma_ops(dev)->alloc(dev, size, dma_handle, gfp, attrs); 22 - } 23 - 24 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 25 - 26 - static inline void dma_free_attrs(struct device *dev, size_t size, 27 - void *vaddr, dma_addr_t dma_handle, 28 - struct dma_attrs *attrs) 29 - { 30 - get_dma_ops(dev)->free(dev, size, vaddr, dma_handle, attrs); 31 - } 32 - 33 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 34 - { 35 - return get_dma_ops(dev)->mapping_error(dev, dma_addr); 36 - } 37 - 38 - static inline int dma_supported(struct device *dev, u64 mask) 39 - { 40 - return get_dma_ops(dev)->dma_supported(dev, mask); 41 - } 42 - 43 - static inline int dma_set_mask(struct device *dev, u64 mask) 44 - { 45 - return get_dma_ops(dev)->set_dma_mask(dev, mask); 46 - } 47 - 48 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 49 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 50 - 51 15 #define dma_cache_sync(dev, va, size, dir) ((void)0) 52 16 53 17 #endif /* _ALPHA_DMA_MAPPING_H */
-10
arch/alpha/kernel/pci-noop.c
··· 166 166 return mask < 0x00ffffffUL ? 0 : 1; 167 167 } 168 168 169 - static int alpha_noop_set_mask(struct device *dev, u64 mask) 170 - { 171 - if (!dev->dma_mask || !dma_supported(dev, mask)) 172 - return -EIO; 173 - 174 - *dev->dma_mask = mask; 175 - return 0; 176 - } 177 - 178 169 struct dma_map_ops alpha_noop_ops = { 179 170 .alloc = alpha_noop_alloc_coherent, 180 171 .free = alpha_noop_free_coherent, ··· 173 182 .map_sg = alpha_noop_map_sg, 174 183 .mapping_error = alpha_noop_mapping_error, 175 184 .dma_supported = alpha_noop_supported, 176 - .set_dma_mask = alpha_noop_set_mask, 177 185 }; 178 186 179 187 struct dma_map_ops *dma_ops = &alpha_noop_ops;
-11
arch/alpha/kernel/pci_iommu.c
··· 939 939 return dma_addr == 0; 940 940 } 941 941 942 - static int alpha_pci_set_mask(struct device *dev, u64 mask) 943 - { 944 - if (!dev->dma_mask || 945 - !pci_dma_supported(alpha_gendev_to_pci(dev), mask)) 946 - return -EIO; 947 - 948 - *dev->dma_mask = mask; 949 - return 0; 950 - } 951 - 952 942 struct dma_map_ops alpha_pci_ops = { 953 943 .alloc = alpha_pci_alloc_coherent, 954 944 .free = alpha_pci_free_coherent, ··· 948 958 .unmap_sg = alpha_pci_unmap_sg, 949 959 .mapping_error = alpha_pci_mapping_error, 950 960 .dma_supported = alpha_pci_supported, 951 - .set_dma_mask = alpha_pci_set_mask, 952 961 }; 953 962 954 963 struct dma_map_ops *dma_ops = &alpha_pci_ops;
+1
arch/arm/Kconfig
··· 2020 2020 bool "Kexec system call (EXPERIMENTAL)" 2021 2021 depends on (!SMP || PM_SLEEP_SMP) 2022 2022 depends on !CPU_V7M 2023 + select KEXEC_CORE 2023 2024 help 2024 2025 kexec is a system call that implements the ability to shutdown your 2025 2026 current kernel, and to start another kernel. It is like a reboot
+1 -1
arch/arm/boot/compressed/decompress.c
··· 57 57 58 58 int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x)) 59 59 { 60 - return decompress(input, len, NULL, NULL, output, NULL, error); 60 + return __decompress(input, len, NULL, NULL, output, 0, NULL, error); 61 61 }
+8 -60
arch/arm/include/asm/dma-mapping.h
··· 8 8 #include <linux/dma-attrs.h> 9 9 #include <linux/dma-debug.h> 10 10 11 - #include <asm-generic/dma-coherent.h> 12 11 #include <asm/memory.h> 13 12 14 13 #include <xen/xen.h> ··· 38 39 dev->archdata.dma_ops = ops; 39 40 } 40 41 41 - #include <asm-generic/dma-mapping-common.h> 42 + #define HAVE_ARCH_DMA_SUPPORTED 1 43 + extern int dma_supported(struct device *dev, u64 mask); 42 44 43 - static inline int dma_set_mask(struct device *dev, u64 mask) 44 - { 45 - return get_dma_ops(dev)->set_dma_mask(dev, mask); 46 - } 45 + /* 46 + * Note that while the generic code provides dummy dma_{alloc,free}_noncoherent 47 + * implementations, we don't provide a dma_cache_sync function so drivers using 48 + * this API are highlighted with build warnings. 49 + */ 50 + #include <asm-generic/dma-mapping-common.h> 47 51 48 52 #ifdef __arch_page_to_dma 49 53 #error Please update to __arch_pfn_to_dma ··· 169 167 170 168 static inline void dma_mark_clean(void *addr, size_t size) { } 171 169 172 - /* 173 - * DMA errors are defined by all-bits-set in the DMA address. 174 - */ 175 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 176 - { 177 - debug_dma_mapping_error(dev, dma_addr); 178 - return dma_addr == DMA_ERROR_CODE; 179 - } 180 - 181 - /* 182 - * Dummy noncoherent implementation. We don't provide a dma_cache_sync 183 - * function so drivers using this API are highlighted with build warnings. 
184 - */ 185 - static inline void *dma_alloc_noncoherent(struct device *dev, size_t size, 186 - dma_addr_t *handle, gfp_t gfp) 187 - { 188 - return NULL; 189 - } 190 - 191 - static inline void dma_free_noncoherent(struct device *dev, size_t size, 192 - void *cpu_addr, dma_addr_t handle) 193 - { 194 - } 195 - 196 - extern int dma_supported(struct device *dev, u64 mask); 197 - 198 170 extern int arm_dma_set_mask(struct device *dev, u64 dma_mask); 199 171 200 172 /** ··· 184 208 */ 185 209 extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, 186 210 gfp_t gfp, struct dma_attrs *attrs); 187 - 188 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 189 - 190 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 191 - dma_addr_t *dma_handle, gfp_t flag, 192 - struct dma_attrs *attrs) 193 - { 194 - struct dma_map_ops *ops = get_dma_ops(dev); 195 - void *cpu_addr; 196 - BUG_ON(!ops); 197 - 198 - cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs); 199 - debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); 200 - return cpu_addr; 201 - } 202 211 203 212 /** 204 213 * arm_dma_free - free memory allocated by arm_dma_alloc ··· 201 240 */ 202 241 extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr, 203 242 dma_addr_t handle, struct dma_attrs *attrs); 204 - 205 - #define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL) 206 - 207 - static inline void dma_free_attrs(struct device *dev, size_t size, 208 - void *cpu_addr, dma_addr_t dma_handle, 209 - struct dma_attrs *attrs) 210 - { 211 - struct dma_map_ops *ops = get_dma_ops(dev); 212 - BUG_ON(!ops); 213 - 214 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 215 - ops->free(dev, size, cpu_addr, dma_handle, attrs); 216 - } 217 243 218 244 /** 219 245 * arm_dma_mmap - map a coherent DMA allocation into user space
-12
arch/arm/mm/dma-mapping.c
··· 676 676 gfp_t gfp, struct dma_attrs *attrs) 677 677 { 678 678 pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL); 679 - void *memory; 680 - 681 - if (dma_alloc_from_coherent(dev, size, handle, &memory)) 682 - return memory; 683 679 684 680 return __dma_alloc(dev, size, handle, gfp, prot, false, 685 681 attrs, __builtin_return_address(0)); ··· 684 688 static void *arm_coherent_dma_alloc(struct device *dev, size_t size, 685 689 dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs) 686 690 { 687 - void *memory; 688 - 689 - if (dma_alloc_from_coherent(dev, size, handle, &memory)) 690 - return memory; 691 - 692 691 return __dma_alloc(dev, size, handle, gfp, PAGE_KERNEL, true, 693 692 attrs, __builtin_return_address(0)); 694 693 } ··· 742 751 { 743 752 struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); 744 753 bool want_vaddr = !dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs); 745 - 746 - if (dma_release_from_coherent(dev, get_order(size), cpu_addr)) 747 - return; 748 754 749 755 size = PAGE_ALIGN(size); 750 756
-69
arch/arm64/include/asm/dma-mapping.h
··· 22 22 #include <linux/types.h> 23 23 #include <linux/vmalloc.h> 24 24 25 - #include <asm-generic/dma-coherent.h> 26 - 27 25 #include <xen/xen.h> 28 26 #include <asm/xen/hypervisor.h> 29 27 ··· 84 86 return (phys_addr_t)dev_addr; 85 87 } 86 88 87 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dev_addr) 88 - { 89 - struct dma_map_ops *ops = get_dma_ops(dev); 90 - debug_dma_mapping_error(dev, dev_addr); 91 - return ops->mapping_error(dev, dev_addr); 92 - } 93 - 94 - static inline int dma_supported(struct device *dev, u64 mask) 95 - { 96 - struct dma_map_ops *ops = get_dma_ops(dev); 97 - return ops->dma_supported(dev, mask); 98 - } 99 - 100 - static inline int dma_set_mask(struct device *dev, u64 mask) 101 - { 102 - if (!dev->dma_mask || !dma_supported(dev, mask)) 103 - return -EIO; 104 - *dev->dma_mask = mask; 105 - 106 - return 0; 107 - } 108 - 109 89 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size) 110 90 { 111 91 if (!dev->dma_mask) ··· 93 117 } 94 118 95 119 static inline void dma_mark_clean(void *addr, size_t size) 96 - { 97 - } 98 - 99 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 100 - #define dma_free_coherent(d, s, h, f) dma_free_attrs(d, s, h, f, NULL) 101 - 102 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 103 - dma_addr_t *dma_handle, gfp_t flags, 104 - struct dma_attrs *attrs) 105 - { 106 - struct dma_map_ops *ops = get_dma_ops(dev); 107 - void *vaddr; 108 - 109 - if (dma_alloc_from_coherent(dev, size, dma_handle, &vaddr)) 110 - return vaddr; 111 - 112 - vaddr = ops->alloc(dev, size, dma_handle, flags, attrs); 113 - debug_dma_alloc_coherent(dev, size, *dma_handle, vaddr); 114 - return vaddr; 115 - } 116 - 117 - static inline void dma_free_attrs(struct device *dev, size_t size, 118 - void *vaddr, dma_addr_t dev_addr, 119 - struct dma_attrs *attrs) 120 - { 121 - struct dma_map_ops *ops = get_dma_ops(dev); 122 - 123 - if 
(dma_release_from_coherent(dev, get_order(size), vaddr)) 124 - return; 125 - 126 - debug_dma_free_coherent(dev, size, vaddr, dev_addr); 127 - ops->free(dev, size, vaddr, dev_addr, attrs); 128 - } 129 - 130 - /* 131 - * There is no dma_cache_sync() implementation, so just return NULL here. 132 - */ 133 - static inline void *dma_alloc_noncoherent(struct device *dev, size_t size, 134 - dma_addr_t *handle, gfp_t flags) 135 - { 136 - return NULL; 137 - } 138 - 139 - static inline void dma_free_noncoherent(struct device *dev, size_t size, 140 - void *cpu_addr, dma_addr_t handle) 141 120 { 142 121 } 143 122
+1 -1
arch/h8300/boot/compressed/misc.c
··· 70 70 free_mem_ptr = (unsigned long)&_end; 71 71 free_mem_end_ptr = free_mem_ptr + HEAP_SIZE; 72 72 73 - decompress(input_data, input_len, NULL, NULL, output, NULL, error); 73 + __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error); 74 74 }
-44
arch/h8300/include/asm/dma-mapping.h
··· 1 1 #ifndef _H8300_DMA_MAPPING_H 2 2 #define _H8300_DMA_MAPPING_H 3 3 4 - #include <asm-generic/dma-coherent.h> 5 - 6 4 extern struct dma_map_ops h8300_dma_map_ops; 7 5 8 6 static inline struct dma_map_ops *get_dma_ops(struct device *dev) ··· 9 11 } 10 12 11 13 #include <asm-generic/dma-mapping-common.h> 12 - 13 - static inline int dma_supported(struct device *dev, u64 mask) 14 - { 15 - return 0; 16 - } 17 - 18 - static inline int dma_set_mask(struct device *dev, u64 mask) 19 - { 20 - return 0; 21 - } 22 - 23 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 24 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 25 - 26 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 27 - 28 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 29 - dma_addr_t *dma_handle, gfp_t flag, 30 - struct dma_attrs *attrs) 31 - { 32 - struct dma_map_ops *ops = get_dma_ops(dev); 33 - void *memory; 34 - 35 - memory = ops->alloc(dev, size, dma_handle, flag, attrs); 36 - return memory; 37 - } 38 - 39 - #define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL) 40 - 41 - static inline void dma_free_attrs(struct device *dev, size_t size, 42 - void *cpu_addr, dma_addr_t dma_handle, 43 - struct dma_attrs *attrs) 44 - { 45 - struct dma_map_ops *ops = get_dma_ops(dev); 46 - 47 - ops->free(dev, size, cpu_addr, dma_handle, attrs); 48 - } 49 - 50 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 51 - { 52 - return 0; 53 - } 54 14 55 15 #endif
+2 -47
arch/hexagon/include/asm/dma-mapping.h
··· 31 31 32 32 struct device; 33 33 extern int bad_dma_address; 34 + #define DMA_ERROR_CODE bad_dma_address 34 35 35 36 extern struct dma_map_ops *dma_ops; 36 - 37 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 38 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 39 37 40 38 static inline struct dma_map_ops *get_dma_ops(struct device *dev) 41 39 { ··· 43 45 return dma_ops; 44 46 } 45 47 48 + #define HAVE_ARCH_DMA_SUPPORTED 1 46 49 extern int dma_supported(struct device *dev, u64 mask); 47 - extern int dma_set_mask(struct device *dev, u64 mask); 48 50 extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle); 49 51 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size, 50 52 enum dma_data_direction direction); ··· 56 58 if (!dev->dma_mask) 57 59 return 0; 58 60 return addr + size - 1 <= *dev->dma_mask; 59 - } 60 - 61 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 62 - { 63 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 64 - 65 - if (dma_ops->mapping_error) 66 - return dma_ops->mapping_error(dev, dma_addr); 67 - 68 - return (dma_addr == bad_dma_address); 69 - } 70 - 71 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 72 - 73 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 74 - dma_addr_t *dma_handle, gfp_t flag, 75 - struct dma_attrs *attrs) 76 - { 77 - void *ret; 78 - struct dma_map_ops *ops = get_dma_ops(dev); 79 - 80 - BUG_ON(!dma_ops); 81 - 82 - ret = ops->alloc(dev, size, dma_handle, flag, attrs); 83 - 84 - debug_dma_alloc_coherent(dev, size, *dma_handle, ret); 85 - 86 - return ret; 87 - } 88 - 89 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 90 - 91 - static inline void dma_free_attrs(struct device *dev, size_t size, 92 - void *cpu_addr, dma_addr_t dma_handle, 93 - struct dma_attrs *attrs) 94 - { 95 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 96 - 97 - BUG_ON(!dma_ops); 98 - 99 
- dma_ops->free(dev, size, cpu_addr, dma_handle, attrs); 100 - 101 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 102 61 } 103 62 104 63 #endif
-11
arch/hexagon/kernel/dma.c
··· 44 44 } 45 45 EXPORT_SYMBOL(dma_supported); 46 46 47 - int dma_set_mask(struct device *dev, u64 mask) 48 - { 49 - if (!dev->dma_mask || !dma_supported(dev, mask)) 50 - return -EIO; 51 - 52 - *dev->dma_mask = mask; 53 - 54 - return 0; 55 - } 56 - EXPORT_SYMBOL(dma_set_mask); 57 - 58 47 static struct gen_pool *coherent_pool; 59 48 60 49
+1
arch/ia64/Kconfig
··· 518 518 config KEXEC 519 519 bool "kexec system call" 520 520 depends on !IA64_HP_SIM && (!SMP || HOTPLUG_CPU) 521 + select KEXEC_CORE 521 522 help 522 523 kexec is a system call that implements the ability to shutdown your 523 524 current kernel, and to start another kernel. It is like a reboot
-50
arch/ia64/include/asm/dma-mapping.h
··· 23 23 extern void machvec_dma_sync_sg(struct device *, struct scatterlist *, int, 24 24 enum dma_data_direction); 25 25 26 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 27 - 28 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 29 - dma_addr_t *daddr, gfp_t gfp, 30 - struct dma_attrs *attrs) 31 - { 32 - struct dma_map_ops *ops = platform_dma_get_ops(dev); 33 - void *caddr; 34 - 35 - caddr = ops->alloc(dev, size, daddr, gfp, attrs); 36 - debug_dma_alloc_coherent(dev, size, *daddr, caddr); 37 - return caddr; 38 - } 39 - 40 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 41 - 42 - static inline void dma_free_attrs(struct device *dev, size_t size, 43 - void *caddr, dma_addr_t daddr, 44 - struct dma_attrs *attrs) 45 - { 46 - struct dma_map_ops *ops = platform_dma_get_ops(dev); 47 - debug_dma_free_coherent(dev, size, caddr, daddr); 48 - ops->free(dev, size, caddr, daddr, attrs); 49 - } 50 - 51 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 52 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 53 - 54 26 #define get_dma_ops(dev) platform_dma_get_ops(dev) 55 27 56 28 #include <asm-generic/dma-mapping-common.h> 57 - 58 - static inline int dma_mapping_error(struct device *dev, dma_addr_t daddr) 59 - { 60 - struct dma_map_ops *ops = platform_dma_get_ops(dev); 61 - debug_dma_mapping_error(dev, daddr); 62 - return ops->mapping_error(dev, daddr); 63 - } 64 - 65 - static inline int dma_supported(struct device *dev, u64 mask) 66 - { 67 - struct dma_map_ops *ops = platform_dma_get_ops(dev); 68 - return ops->dma_supported(dev, mask); 69 - } 70 - 71 - static inline int 72 - dma_set_mask (struct device *dev, u64 mask) 73 - { 74 - if (!dev->dma_mask || !dma_supported(dev, mask)) 75 - return -EIO; 76 - *dev->dma_mask = mask; 77 - return 0; 78 - } 79 29 80 30 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size) 81 31 {
+2 -1
arch/m32r/boot/compressed/misc.c
··· 86 86 free_mem_end_ptr = free_mem_ptr + BOOT_HEAP_SIZE; 87 87 88 88 puts("\nDecompressing Linux... "); 89 - decompress(input_data, input_len, NULL, NULL, output_data, NULL, error); 89 + __decompress(input_data, input_len, NULL, NULL, output_data, 0, 90 + NULL, error); 90 91 puts("done.\nBooting the kernel.\n"); 91 92 }
+1
arch/m68k/Kconfig
··· 95 95 config KEXEC 96 96 bool "kexec system call" 97 97 depends on M68KCLASSIC 98 + select KEXEC_CORE 98 99 help 99 100 kexec is a system call that implements the ability to shutdown your 100 101 current kernel, and to start another kernel. It is like a reboot
-70
arch/microblaze/include/asm/dma-mapping.h
··· 27 27 #include <linux/dma-debug.h> 28 28 #include <linux/dma-attrs.h> 29 29 #include <asm/io.h> 30 - #include <asm-generic/dma-coherent.h> 31 30 #include <asm/cacheflush.h> 32 31 33 32 #define DMA_ERROR_CODE (~(dma_addr_t)0x0) ··· 42 43 static inline struct dma_map_ops *get_dma_ops(struct device *dev) 43 44 { 44 45 return &dma_direct_ops; 45 - } 46 - 47 - static inline int dma_supported(struct device *dev, u64 mask) 48 - { 49 - struct dma_map_ops *ops = get_dma_ops(dev); 50 - 51 - if (unlikely(!ops)) 52 - return 0; 53 - if (!ops->dma_supported) 54 - return 1; 55 - return ops->dma_supported(dev, mask); 56 - } 57 - 58 - static inline int dma_set_mask(struct device *dev, u64 dma_mask) 59 - { 60 - struct dma_map_ops *ops = get_dma_ops(dev); 61 - 62 - if (unlikely(ops == NULL)) 63 - return -EIO; 64 - if (ops->set_dma_mask) 65 - return ops->set_dma_mask(dev, dma_mask); 66 - if (!dev->dma_mask || !dma_supported(dev, dma_mask)) 67 - return -EIO; 68 - *dev->dma_mask = dma_mask; 69 - return 0; 70 46 } 71 47 72 48 #include <asm-generic/dma-mapping-common.h> ··· 60 86 default: 61 87 BUG(); 62 88 } 63 - } 64 - 65 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 66 - { 67 - struct dma_map_ops *ops = get_dma_ops(dev); 68 - 69 - debug_dma_mapping_error(dev, dma_addr); 70 - if (ops->mapping_error) 71 - return ops->mapping_error(dev, dma_addr); 72 - 73 - return (dma_addr == DMA_ERROR_CODE); 74 - } 75 - 76 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 77 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 78 - 79 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 80 - 81 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 82 - dma_addr_t *dma_handle, gfp_t flag, 83 - struct dma_attrs *attrs) 84 - { 85 - struct dma_map_ops *ops = get_dma_ops(dev); 86 - void *memory; 87 - 88 - BUG_ON(!ops); 89 - 90 - memory = ops->alloc(dev, size, dma_handle, flag, attrs); 91 - 
92 - debug_dma_alloc_coherent(dev, size, *dma_handle, memory); 93 - return memory; 94 - } 95 - 96 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d, s, c, h, NULL) 97 - 98 - static inline void dma_free_attrs(struct device *dev, size_t size, 99 - void *cpu_addr, dma_addr_t dma_handle, 100 - struct dma_attrs *attrs) 101 - { 102 - struct dma_map_ops *ops = get_dma_ops(dev); 103 - 104 - BUG_ON(!ops); 105 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 106 - ops->free(dev, size, cpu_addr, dma_handle, attrs); 107 89 } 108 90 109 91 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
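The hunks above and below all follow one consolidation pattern: the per-arch copies of helpers such as dma_supported() and dma_set_mask() move into asm-generic/dma-mapping-common.h, and an architecture that still needs its own version opts out by defining HAVE_ARCH_DMA_SUPPORTED or HAVE_ARCH_DMA_SET_MASK before including the common header (as the openrisc, powerpc, sparc, tile and x86 hunks in this series do). A minimal userspace sketch of that guard pattern follows; the types are simplified stand-ins, not the kernel's real definitions, and -EIO is hard-coded as -5:

```c
/* Sketch of the series' consolidation pattern: generic fallbacks live in
 * one common header, guarded so an architecture can override them by
 * defining HAVE_ARCH_DMA_SUPPORTED / HAVE_ARCH_DMA_SET_MASK first.
 * All names below are simplified stand-ins for the kernel types. */
#include <stddef.h>

typedef unsigned long long u64;

struct device;
struct dma_map_ops {
        int (*dma_supported)(struct device *dev, u64 mask);
        int (*set_dma_mask)(struct device *dev, u64 mask);
};
struct device {
        u64 *dma_mask;
        struct dma_map_ops *ops;
};

static inline struct dma_map_ops *get_dma_ops(struct device *dev)
{
        return dev->ops;
}

/* --- what asm-generic/dma-mapping-common.h now provides --- */
#ifndef HAVE_ARCH_DMA_SUPPORTED
static inline int dma_supported(struct device *dev, u64 mask)
{
        struct dma_map_ops *ops = get_dma_ops(dev);

        if (!ops)
                return 0;
        if (!ops->dma_supported)
                return 1;       /* no hook means the mask is accepted */
        return ops->dma_supported(dev, mask);
}
#endif

#ifndef HAVE_ARCH_DMA_SET_MASK
static inline int dma_set_mask(struct device *dev, u64 mask)
{
        struct dma_map_ops *ops = get_dma_ops(dev);

        if (ops && ops->set_dma_mask)
                return ops->set_dma_mask(dev, mask);
        if (!dev->dma_mask || !dma_supported(dev, mask))
                return -5;      /* stand-in for -EIO */
        *dev->dma_mask = mask;
        return 0;
}
#endif
```

Each removed arch copy above is a near-duplicate of these two fallbacks, which is why the series can delete them wholesale.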
+1
arch/mips/Kconfig
··· 2597 2597 2598 2598 config KEXEC 2599 2599 bool "Kexec system call" 2600 + select KEXEC_CORE 2600 2601 help 2601 2602 kexec is a system call that implements the ability to shutdown your 2602 2603 current kernel, and to start another kernel. It is like a reboot
+2 -2
arch/mips/boot/compressed/decompress.c
··· 111 111 puts("\n"); 112 112 113 113 /* Decompress the kernel with according algorithm */ 114 - decompress((char *)zimage_start, zimage_size, 0, 0, 115 - (void *)VMLINUX_LOAD_ADDRESS_ULL, 0, error); 114 + __decompress((char *)zimage_start, zimage_size, 0, 0, 115 + (void *)VMLINUX_LOAD_ADDRESS_ULL, 0, 0, error); 116 116 117 117 /* FIXME: should we flush cache here? */ 118 118 puts("Now, booting the kernel...\n");
-8
arch/mips/cavium-octeon/dma-octeon.c
··· 161 161 { 162 162 void *ret; 163 163 164 - if (dma_alloc_from_coherent(dev, size, dma_handle, &ret)) 165 - return ret; 166 - 167 164 /* ignore region specifiers */ 168 165 gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM); 169 166 ··· 191 194 static void octeon_dma_free_coherent(struct device *dev, size_t size, 192 195 void *vaddr, dma_addr_t dma_handle, struct dma_attrs *attrs) 193 196 { 194 - int order = get_order(size); 195 - 196 - if (dma_release_from_coherent(dev, order, vaddr)) 197 - return; 198 - 199 197 swiotlb_free_coherent(dev, size, vaddr, dma_handle); 200 198 } 201 199
-67
arch/mips/include/asm/dma-mapping.h
··· 4 4 #include <linux/scatterlist.h> 5 5 #include <asm/dma-coherence.h> 6 6 #include <asm/cache.h> 7 - #include <asm-generic/dma-coherent.h> 8 7 9 8 #ifndef CONFIG_SGI_IP27 /* Kludge to fix 2.6.39 build for IP27 */ 10 9 #include <dma-coherence.h> ··· 31 32 32 33 #include <asm-generic/dma-mapping-common.h> 33 34 34 - static inline int dma_supported(struct device *dev, u64 mask) 35 - { 36 - struct dma_map_ops *ops = get_dma_ops(dev); 37 - return ops->dma_supported(dev, mask); 38 - } 39 - 40 - static inline int dma_mapping_error(struct device *dev, u64 mask) 41 - { 42 - struct dma_map_ops *ops = get_dma_ops(dev); 43 - 44 - debug_dma_mapping_error(dev, mask); 45 - return ops->mapping_error(dev, mask); 46 - } 47 - 48 - static inline int 49 - dma_set_mask(struct device *dev, u64 mask) 50 - { 51 - struct dma_map_ops *ops = get_dma_ops(dev); 52 - 53 - if(!dev->dma_mask || !dma_supported(dev, mask)) 54 - return -EIO; 55 - 56 - if (ops->set_dma_mask) 57 - return ops->set_dma_mask(dev, mask); 58 - 59 - *dev->dma_mask = mask; 60 - 61 - return 0; 62 - } 63 - 64 35 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size, 65 36 enum dma_data_direction direction); 66 - 67 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 68 - 69 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 70 - dma_addr_t *dma_handle, gfp_t gfp, 71 - struct dma_attrs *attrs) 72 - { 73 - void *ret; 74 - struct dma_map_ops *ops = get_dma_ops(dev); 75 - 76 - ret = ops->alloc(dev, size, dma_handle, gfp, attrs); 77 - 78 - debug_dma_alloc_coherent(dev, size, *dma_handle, ret); 79 - 80 - return ret; 81 - } 82 - 83 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 84 - 85 - static inline void dma_free_attrs(struct device *dev, size_t size, 86 - void *vaddr, dma_addr_t dma_handle, 87 - struct dma_attrs *attrs) 88 - { 89 - struct dma_map_ops *ops = get_dma_ops(dev); 90 - 91 - ops->free(dev, size, vaddr, dma_handle, attrs); 92 - 93 - debug_dma_free_coherent(dev, size, vaddr, dma_handle);
94 - } 95 - 96 - 97 - void *dma_alloc_noncoherent(struct device *dev, size_t size, 98 - dma_addr_t *dma_handle, gfp_t flag); 99 - 100 - void dma_free_noncoherent(struct device *dev, size_t size, 101 - void *vaddr, dma_addr_t dma_handle); 102 37 103 38 #endif /* _ASM_DMA_MAPPING_H */
+3 -8
arch/mips/loongson64/common/dma-swiotlb.c
··· 14 14 { 15 15 void *ret; 16 16 17 - if (dma_alloc_from_coherent(dev, size, dma_handle, &ret)) 18 - return ret; 19 - 20 17 /* ignore region specifiers */ 21 18 gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM); 22 19 ··· 43 46 static void loongson_dma_free_coherent(struct device *dev, size_t size, 44 47 void *vaddr, dma_addr_t dma_handle, struct dma_attrs *attrs) 45 48 { 46 - int order = get_order(size); 47 - 48 - if (dma_release_from_coherent(dev, order, vaddr)) 49 - return; 50 - 51 49 swiotlb_free_coherent(dev, size, vaddr, dma_handle); 52 50 } 53 51 ··· 85 93 86 94 static int loongson_dma_set_mask(struct device *dev, u64 mask) 87 95 { 96 + if (!dev->dma_mask || !dma_supported(dev, mask)) 97 + return -EIO; 98 + 88 99 if (mask > DMA_BIT_MASK(loongson_sysconf.dma_mask_bits)) { 89 100 *dev->dma_mask = DMA_BIT_MASK(loongson_sysconf.dma_mask_bits); 90 101 return -EIO;
+12 -9
arch/mips/mm/dma-default.c
··· 112 112 return gfp | dma_flag; 113 113 } 114 114 115 - void *dma_alloc_noncoherent(struct device *dev, size_t size, 115 + static void *mips_dma_alloc_noncoherent(struct device *dev, size_t size, 116 116 dma_addr_t * dma_handle, gfp_t gfp) 117 117 { 118 118 void *ret; ··· 128 128 129 129 return ret; 130 130 } 131 - EXPORT_SYMBOL(dma_alloc_noncoherent); 132 131 133 132 static void *mips_dma_alloc_coherent(struct device *dev, size_t size, 134 133 dma_addr_t * dma_handle, gfp_t gfp, struct dma_attrs *attrs) ··· 136 137 struct page *page = NULL; 137 138 unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; 138 139 139 - if (dma_alloc_from_coherent(dev, size, dma_handle, &ret)) 140 - return ret; 140 + /* 141 + * XXX: seems like the coherent and non-coherent implementations could 142 + * be consolidated. 143 + */ 144 + if (dma_get_attr(DMA_ATTR_NON_CONSISTENT, attrs)) 145 + return mips_dma_alloc_noncoherent(dev, size, dma_handle, gfp); 141 146 142 147 gfp = massage_gfp_flags(dev, gfp); 143 148 ··· 167 164 } 168 165 169 166 170 - void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr, 171 - dma_addr_t dma_handle) 167 + static void mips_dma_free_noncoherent(struct device *dev, size_t size, 168 + void *vaddr, dma_addr_t dma_handle) 172 169 { 173 170 plat_unmap_dma_mem(dev, dma_handle, size, DMA_BIDIRECTIONAL); 174 171 free_pages((unsigned long) vaddr, get_order(size)); 175 172 } 176 - EXPORT_SYMBOL(dma_free_noncoherent); 177 173 178 174 static void mips_dma_free_coherent(struct device *dev, size_t size, void *vaddr, 179 175 dma_addr_t dma_handle, struct dma_attrs *attrs) 180 176 { 181 177 unsigned long addr = (unsigned long) vaddr; 182 - int order = get_order(size); 183 178 unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; 184 179 struct page *page = NULL; 185 180 186 - if (dma_release_from_coherent(dev, order, vaddr)) 181 + if (dma_get_attr(DMA_ATTR_NON_CONSISTENT, attrs)) { 182 + mips_dma_free_noncoherent(dev, size, vaddr, dma_handle); 187 183 return; 
184 + } 188 185 189 186 plat_unmap_dma_mem(dev, dma_handle, size, DMA_BIDIRECTIONAL); 190 187
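As the dma-default.c hunk above shows, MIPS no longer exports separate dma_{alloc,free}_noncoherent() symbols: the coherent entry points now dispatch on the DMA_ATTR_NON_CONSISTENT attribute, and the generic wrappers (visible being deleted from the openrisc header below) just set that flag and call the ordinary attrs path. A toy userspace sketch of that routing, with malloc() standing in for the page allocator and all names simplified:

```c
/* Sketch of the attrs-based noncoherent path: one flag, one entry point,
 * and the backend (like mips_dma_alloc_coherent above) branches on it.
 * Simplified userspace stand-ins, not the kernel's definitions. */
#include <stdlib.h>
#include <stdbool.h>

enum { DMA_ATTR_NON_CONSISTENT = 1 << 0 };

struct dma_attrs { unsigned long flags; };

static void dma_set_attr(int attr, struct dma_attrs *attrs)
{
        attrs->flags |= attr;
}

static bool dma_get_attr(int attr, struct dma_attrs *attrs)
{
        return attrs && (attrs->flags & attr);
}

/* counter so a caller can observe which path was taken */
static int noncoherent_allocs;

static void *backend_alloc(size_t size, struct dma_attrs *attrs)
{
        if (dma_get_attr(DMA_ATTR_NON_CONSISTENT, attrs))
                noncoherent_allocs++;   /* non-consistent branch */
        return malloc(size);            /* stands in for page allocation */
}

static void *dma_alloc_attrs(size_t size, struct dma_attrs *attrs)
{
        return backend_alloc(size, attrs);
}

/* the consolidated wrapper: set the attribute, reuse the same path */
static void *dma_alloc_noncoherent(size_t size)
{
        struct dma_attrs attrs = { 0 };

        dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs);
        return dma_alloc_attrs(size, &attrs);
}
```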
-10
arch/mips/netlogic/common/nlm-dma.c
··· 47 47 static void *nlm_dma_alloc_coherent(struct device *dev, size_t size, 48 48 dma_addr_t *dma_handle, gfp_t gfp, struct dma_attrs *attrs) 49 49 { 50 - void *ret; 51 - 52 - if (dma_alloc_from_coherent(dev, size, dma_handle, &ret)) 53 - return ret; 54 - 55 50 /* ignore region specifiers */ 56 51 gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM); 57 52 ··· 64 69 static void nlm_dma_free_coherent(struct device *dev, size_t size, 65 70 void *vaddr, dma_addr_t dma_handle, struct dma_attrs *attrs) 66 71 { 67 - int order = get_order(size); 68 - 69 - if (dma_release_from_coherent(dev, order, vaddr)) 70 - return; 71 - 72 72 swiotlb_free_coherent(dev, size, vaddr, dma_handle); 73 73 } 74 74
+2 -65
arch/openrisc/include/asm/dma-mapping.h
··· 23 23 */ 24 24 25 25 #include <linux/dma-debug.h> 26 - #include <asm-generic/dma-coherent.h> 27 26 #include <linux/kmemcheck.h> 28 27 #include <linux/dma-mapping.h> 29 28 ··· 35 36 return &or1k_dma_map_ops; 36 37 } 37 38 38 - #include <asm-generic/dma-mapping-common.h> 39 - 40 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 41 - 42 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 43 - dma_addr_t *dma_handle, gfp_t gfp, 44 - struct dma_attrs *attrs) 45 - { 46 - struct dma_map_ops *ops = get_dma_ops(dev); 47 - void *memory; 48 - 49 - memory = ops->alloc(dev, size, dma_handle, gfp, attrs); 50 - 51 - debug_dma_alloc_coherent(dev, size, *dma_handle, memory); 52 - 53 - return memory; 54 - } 55 - 56 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 57 - 58 - static inline void dma_free_attrs(struct device *dev, size_t size, 59 - void *cpu_addr, dma_addr_t dma_handle, 60 - struct dma_attrs *attrs) 61 - { 62 - struct dma_map_ops *ops = get_dma_ops(dev); 63 - 64 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 65 - 66 - ops->free(dev, size, cpu_addr, dma_handle, attrs); 67 - } 68 - 69 - static inline void *dma_alloc_noncoherent(struct device *dev, size_t size, 70 - dma_addr_t *dma_handle, gfp_t gfp) 71 - { 72 - struct dma_attrs attrs; 73 - 74 - dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs); 75 - 76 - return dma_alloc_attrs(dev, size, dma_handle, gfp, &attrs); 77 - } 78 - 79 - static inline void dma_free_noncoherent(struct device *dev, size_t size, 80 - void *cpu_addr, dma_addr_t dma_handle) 81 - { 82 - struct dma_attrs attrs; 83 - 84 - dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs); 85 - 86 - dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs); 87 - } 88 - 39 + #define HAVE_ARCH_DMA_SUPPORTED 1 89 40 static inline int dma_supported(struct device *dev, u64 dma_mask) 90 41 { 91 42 /* Support 32 bit DMA mask exclusively */ 92 43 return dma_mask == DMA_BIT_MASK(32); 93 44 } 94 45 95 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
96 - { 97 - return 0; 98 - } 46 + #include <asm-generic/dma-mapping-common.h> 99 47 100 - static inline int dma_set_mask(struct device *dev, u64 dma_mask) 101 - { 102 - if (!dev->dma_mask || !dma_supported(dev, dma_mask)) 103 - return -EIO; 104 - 105 - *dev->dma_mask = dma_mask; 106 - 107 - return 0; 108 - } 109 48 #endif /* __ASM_OPENRISC_DMA_MAPPING_H */
+1
arch/powerpc/Kconfig
··· 420 420 config KEXEC 421 421 bool "kexec system call" 422 422 depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) 423 + select KEXEC_CORE 423 424 help 424 425 kexec is a system call that implements the ability to shutdown your 425 426 current kernel, and to start another kernel. It is like a reboot
+5 -63
arch/powerpc/include/asm/dma-mapping.h
··· 18 18 #include <asm/io.h> 19 19 #include <asm/swiotlb.h> 20 20 21 + #ifdef CONFIG_PPC64 21 22 #define DMA_ERROR_CODE (~(dma_addr_t)0x0) 23 + #endif 22 24 23 25 /* Some dma direct funcs must be visible for use in other dma_ops */ 24 26 extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size, ··· 122 120 /* this will be removed soon */ 123 121 #define flush_write_buffers() 124 122 123 + #define HAVE_ARCH_DMA_SET_MASK 1 124 + extern int dma_set_mask(struct device *dev, u64 dma_mask); 125 + 125 126 #include <asm-generic/dma-mapping-common.h> 126 127 127 - static inline int dma_supported(struct device *dev, u64 mask) 128 - { 129 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 130 - 131 - if (unlikely(dma_ops == NULL)) 132 - return 0; 133 - if (dma_ops->dma_supported == NULL) 134 - return 1; 135 - return dma_ops->dma_supported(dev, mask); 136 - } 137 - 138 - extern int dma_set_mask(struct device *dev, u64 dma_mask); 139 128 extern int __dma_set_mask(struct device *dev, u64 dma_mask); 140 129 extern u64 __dma_get_required_mask(struct device *dev); 141 - 142 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 143 - 144 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 145 - dma_addr_t *dma_handle, gfp_t flag, 146 - struct dma_attrs *attrs) 147 - { 148 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 149 - void *cpu_addr; 150 - 151 - BUG_ON(!dma_ops); 152 - 153 - cpu_addr = dma_ops->alloc(dev, size, dma_handle, flag, attrs); 154 - 155 - debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); 156 - 157 - return cpu_addr; 158 - } 159 - 160 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 161 - 162 - static inline void dma_free_attrs(struct device *dev, size_t size, 163 - void *cpu_addr, dma_addr_t dma_handle, 164 - struct dma_attrs *attrs) 165 - { 166 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 167 - 168 - BUG_ON(!dma_ops); 169 - 170 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
171 - 172 - dma_ops->free(dev, size, cpu_addr, dma_handle, attrs); 173 - } 174 - 175 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 176 - { 177 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 178 - 179 - debug_dma_mapping_error(dev, dma_addr); 180 - if (dma_ops->mapping_error) 181 - return dma_ops->mapping_error(dev, dma_addr); 182 - 183 - #ifdef CONFIG_PPC64 184 - return (dma_addr == DMA_ERROR_CODE); 185 - #else 186 - return 0; 187 - #endif 188 - } 189 130 190 131 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size) 191 132 { ··· 154 209 { 155 210 return daddr - get_dma_offset(dev); 156 211 } 157 - 158 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 159 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 160 212 161 213 #define ARCH_HAS_DMA_MMAP_COHERENT 162 214
+1
arch/s390/Kconfig
··· 48 48 49 49 config KEXEC 50 50 def_bool y 51 + select KEXEC_CORE 51 52 52 53 config AUDIT_ARCH 53 54 def_bool y
+1 -1
arch/s390/boot/compressed/misc.c
··· 167 167 #endif 168 168 169 169 puts("Uncompressing Linux... "); 170 - decompress(input_data, input_len, NULL, NULL, output, NULL, error); 170 + __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error); 171 171 puts("Ok, booting the kernel.\n"); 172 172 return (unsigned long) output; 173 173 }
-55
arch/s390/include/asm/dma-mapping.h
··· 18 18 return &s390_dma_ops; 19 19 } 20 20 21 - extern int dma_set_mask(struct device *dev, u64 mask); 22 - 23 21 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size, 24 22 enum dma_data_direction direction) 25 23 { 26 24 } 27 25 28 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 29 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 30 - 31 26 #include <asm-generic/dma-mapping-common.h> 32 - 33 - static inline int dma_supported(struct device *dev, u64 mask) 34 - { 35 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 36 - 37 - if (dma_ops->dma_supported == NULL) 38 - return 1; 39 - return dma_ops->dma_supported(dev, mask); 40 - } 41 27 42 28 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size) 43 29 { 44 30 if (!dev->dma_mask) 45 31 return false; 46 32 return addr + size - 1 <= *dev->dma_mask; 47 - } 48 - 49 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 50 - { 51 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 52 - 53 - debug_dma_mapping_error(dev, dma_addr); 54 - if (dma_ops->mapping_error) 55 - return dma_ops->mapping_error(dev, dma_addr); 56 - return dma_addr == DMA_ERROR_CODE; 57 - } 58 - 59 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 60 - 61 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 62 - dma_addr_t *dma_handle, gfp_t flags, 63 - struct dma_attrs *attrs) 64 - { 65 - struct dma_map_ops *ops = get_dma_ops(dev); 66 - void *cpu_addr; 67 - 68 - BUG_ON(!ops); 69 - 70 - cpu_addr = ops->alloc(dev, size, dma_handle, flags, attrs); 71 - debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); 72 - 73 - return cpu_addr; 74 - } 75 - 76 - #define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL) 77 - 78 - static inline void dma_free_attrs(struct device *dev, size_t size, 79 - void *cpu_addr, dma_addr_t dma_handle, 80 - struct dma_attrs *attrs) 81 - { 82 - struct dma_map_ops *ops = get_dma_ops(dev);
83 - 84 - BUG_ON(!ops); 85 - 86 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 87 - ops->free(dev, size, cpu_addr, dma_handle, attrs); 88 33 } 89 34 90 35 #endif /* _ASM_S390_DMA_MAPPING_H */
-10
arch/s390/pci/pci_dma.c
··· 262 262 spin_unlock_irqrestore(&zdev->iommu_bitmap_lock, flags); 263 263 } 264 264 265 - int dma_set_mask(struct device *dev, u64 mask) 266 - { 267 - if (!dev->dma_mask || !dma_supported(dev, mask)) 268 - return -EIO; 269 - 270 - *dev->dma_mask = mask; 271 - return 0; 272 - } 273 - EXPORT_SYMBOL_GPL(dma_set_mask); 274 - 275 265 static dma_addr_t s390_dma_map_pages(struct device *dev, struct page *page, 276 266 unsigned long offset, size_t size, 277 267 enum dma_data_direction direction,
+1
arch/sh/Kconfig
··· 602 602 config KEXEC 603 603 bool "kexec system call (EXPERIMENTAL)" 604 604 depends on SUPERH32 && MMU 605 + select KEXEC_CORE 605 606 help 606 607 kexec is a system call that implements the ability to shutdown your 607 608 current kernel, and to start another kernel. It is like a reboot
+1 -1
arch/sh/boot/compressed/misc.c
··· 132 132 133 133 puts("Uncompressing Linux... "); 134 134 cache_control(CACHE_ENABLE); 135 - decompress(input_data, input_len, NULL, NULL, output, NULL, error); 135 + __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error); 136 136 cache_control(CACHE_DISABLE); 137 137 puts("Ok, booting the kernel.\n"); 138 138 }
+2 -75
arch/sh/include/asm/dma-mapping.h
··· 9 9 return dma_ops; 10 10 } 11 11 12 - #include <asm-generic/dma-coherent.h> 12 + #define DMA_ERROR_CODE 0 13 + 13 14 #include <asm-generic/dma-mapping-common.h> 14 - 15 - static inline int dma_supported(struct device *dev, u64 mask) 16 - { 17 - struct dma_map_ops *ops = get_dma_ops(dev); 18 - 19 - if (ops->dma_supported) 20 - return ops->dma_supported(dev, mask); 21 - 22 - return 1; 23 - } 24 - 25 - static inline int dma_set_mask(struct device *dev, u64 mask) 26 - { 27 - struct dma_map_ops *ops = get_dma_ops(dev); 28 - 29 - if (!dev->dma_mask || !dma_supported(dev, mask)) 30 - return -EIO; 31 - if (ops->set_dma_mask) 32 - return ops->set_dma_mask(dev, mask); 33 - 34 - *dev->dma_mask = mask; 35 - 36 - return 0; 37 - } 38 15 39 16 void dma_cache_sync(struct device *dev, void *vaddr, size_t size, 40 17 enum dma_data_direction dir); 41 - 42 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 43 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 44 - 45 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 46 - { 47 - struct dma_map_ops *ops = get_dma_ops(dev); 48 - 49 - debug_dma_mapping_error(dev, dma_addr); 50 - if (ops->mapping_error) 51 - return ops->mapping_error(dev, dma_addr); 52 - 53 - return dma_addr == 0; 54 - } 55 - 56 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 57 - 58 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 59 - dma_addr_t *dma_handle, gfp_t gfp, 60 - struct dma_attrs *attrs) 61 - { 62 - struct dma_map_ops *ops = get_dma_ops(dev); 63 - void *memory; 64 - 65 - if (dma_alloc_from_coherent(dev, size, dma_handle, &memory)) 66 - return memory; 67 - if (!ops->alloc) 68 - return NULL; 69 - 70 - memory = ops->alloc(dev, size, dma_handle, gfp, attrs); 71 - debug_dma_alloc_coherent(dev, size, *dma_handle, memory); 72 - 73 - return memory; 74 - } 75 - 76 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 77 - 78 - static inline void dma_free_attrs(struct device *dev, size_t size,
79 - void *vaddr, dma_addr_t dma_handle, 80 - struct dma_attrs *attrs) 81 - { 82 - struct dma_map_ops *ops = get_dma_ops(dev); 83 - 84 - if (dma_release_from_coherent(dev, get_order(size), vaddr)) 85 - return; 86 - 87 - debug_dma_free_coherent(dev, size, vaddr, dma_handle); 88 - if (ops->free) 89 - ops->free(dev, size, vaddr, dma_handle, attrs); 90 - } 91 18 92 19 /* arch/sh/mm/consistent.c */ 93 20 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
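Note that sh now defines DMA_ERROR_CODE before pulling in the common header: the consolidated dma_mapping_error(), which replaces the per-arch copies deleted throughout this series, falls back to comparing the handle against that arch-chosen sentinel when the ops provide no mapping_error hook (sh picks 0, sparc and 64-bit powerpc pick all-ones). A simplified stand-in sketch of that check, not the kernel's actual header:

```c
/* Sketch of the consolidated dma_mapping_error(): prefer the ops hook,
 * otherwise compare against the arch-defined DMA_ERROR_CODE sentinel.
 * Types are simplified userspace stand-ins. */
typedef unsigned long long dma_addr_t;

#define DMA_ERROR_CODE (~(dma_addr_t)0)     /* the sparc/powerpc choice */

struct dma_map_ops {
        int (*mapping_error)(dma_addr_t addr);
};

static int dma_mapping_error(struct dma_map_ops *ops, dma_addr_t addr)
{
        if (ops && ops->mapping_error)
                return ops->mapping_error(addr);   /* arch-specific check */
        return addr == DMA_ERROR_CODE;             /* sentinel compare */
}
```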
+4 -36
arch/sparc/include/asm/dma-mapping.h
··· 7 7 8 8 #define DMA_ERROR_CODE (~(dma_addr_t)0x0) 9 9 10 + #define HAVE_ARCH_DMA_SUPPORTED 1 10 11 int dma_supported(struct device *dev, u64 mask); 11 - 12 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 13 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 14 12 15 13 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size, 16 14 enum dma_data_direction dir) ··· 37 39 return dma_ops; 38 40 } 39 41 40 - #include <asm-generic/dma-mapping-common.h> 41 - 42 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 43 - 44 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 45 - dma_addr_t *dma_handle, gfp_t flag, 46 - struct dma_attrs *attrs) 47 - { 48 - struct dma_map_ops *ops = get_dma_ops(dev); 49 - void *cpu_addr; 50 - 51 - cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs); 52 - debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); 53 - return cpu_addr; 54 - } 55 - 56 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 57 - 58 - static inline void dma_free_attrs(struct device *dev, size_t size, 59 - void *cpu_addr, dma_addr_t dma_handle, 60 - struct dma_attrs *attrs) 61 - { 62 - struct dma_map_ops *ops = get_dma_ops(dev); 63 - 64 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 65 - ops->free(dev, size, cpu_addr, dma_handle, attrs); 66 - } 67 - 68 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 69 - { 70 - debug_dma_mapping_error(dev, dma_addr); 71 - return (dma_addr == DMA_ERROR_CODE); 72 - } 42 + #define HAVE_ARCH_DMA_SET_MASK 1 73 43 74 44 static inline int dma_set_mask(struct device *dev, u64 mask) 75 45 { ··· 51 85 #endif 52 86 return -EINVAL; 53 87 } 88 + 89 + #include <asm-generic/dma-mapping-common.h> 54 90 55 91 #endif
+1
arch/tile/Kconfig
··· 205 205 206 206 config KEXEC 207 207 bool "kexec system call" 208 + select KEXEC_CORE 208 209 ---help--- 209 210 kexec is a system call that implements the ability to shutdown your 210 211 current kernel, and to start another kernel. It is like a reboot
+2 -43
arch/tile/include/asm/dma-mapping.h
··· 59 59 60 60 static inline void dma_mark_clean(void *addr, size_t size) {} 61 61 62 - #include <asm-generic/dma-mapping-common.h> 63 - 64 62 static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops) 65 63 { 66 64 dev->archdata.dma_ops = ops; ··· 72 74 return addr + size - 1 <= *dev->dma_mask; 73 75 } 74 76 75 - static inline int 76 - dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 77 - { 78 - debug_dma_mapping_error(dev, dma_addr); 79 - return get_dma_ops(dev)->mapping_error(dev, dma_addr); 80 - } 77 + #define HAVE_ARCH_DMA_SET_MASK 1 81 78 82 - static inline int 83 - dma_supported(struct device *dev, u64 mask) 84 - { 85 - return get_dma_ops(dev)->dma_supported(dev, mask); 86 - } 79 + #include <asm-generic/dma-mapping-common.h> 87 80 88 81 static inline int 89 82 dma_set_mask(struct device *dev, u64 mask) ··· 104 115 105 116 return 0; 106 117 } 107 - 108 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 109 - dma_addr_t *dma_handle, gfp_t flag, 110 - struct dma_attrs *attrs) 111 - { 112 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 113 - void *cpu_addr; 114 - 115 - cpu_addr = dma_ops->alloc(dev, size, dma_handle, flag, attrs); 116 - 117 - debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); 118 - 119 - return cpu_addr; 120 - } 121 - 122 - static inline void dma_free_attrs(struct device *dev, size_t size, 123 - void *cpu_addr, dma_addr_t dma_handle, 124 - struct dma_attrs *attrs) 125 - { 126 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 127 - 128 - debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 129 - 130 - dma_ops->free(dev, size, cpu_addr, dma_handle, attrs); 131 - } 132 - 133 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 134 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) 135 - #define dma_free_coherent(d, s, v, h) dma_free_attrs(d, s, v, h, NULL) 136 - #define dma_free_noncoherent(d, s, v, h) dma_free_attrs(d, s, v, h, NULL) 
137 118 138 119 /* 139 120 * dma_alloc_noncoherent() is #defined to return coherent memory,
+2 -2
arch/unicore32/boot/compressed/misc.c
··· 119 119 output_ptr = get_unaligned_le32(tmp); 120 120 121 121 arch_decomp_puts("Uncompressing Linux..."); 122 - decompress(input_data, input_data_end - input_data, NULL, NULL, 123 - output_data, NULL, error); 122 + __decompress(input_data, input_data_end - input_data, NULL, NULL, 123 + output_data, 0, NULL, error); 124 124 arch_decomp_puts(" done, booting the kernel.\n"); 125 125 return output_ptr; 126 126 }
-57
arch/unicore32/include/asm/dma-mapping.h
··· 18 18 #include <linux/scatterlist.h> 19 19 #include <linux/swiotlb.h> 20 20 21 - #include <asm-generic/dma-coherent.h> 22 - 23 21 #include <asm/memory.h> 24 22 #include <asm/cacheflush.h> 25 23 ··· 26 28 static inline struct dma_map_ops *get_dma_ops(struct device *dev) 27 29 { 28 30 return &swiotlb_dma_map_ops; 29 - } 30 - 31 - static inline int dma_supported(struct device *dev, u64 mask) 32 - { 33 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 34 - 35 - if (unlikely(dma_ops == NULL)) 36 - return 0; 37 - 38 - return dma_ops->dma_supported(dev, mask); 39 - } 40 - 41 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 42 - { 43 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 44 - 45 - if (dma_ops->mapping_error) 46 - return dma_ops->mapping_error(dev, dma_addr); 47 - 48 - return 0; 49 - } 50 32 51 33 #include <asm-generic/dma-mapping-common.h> ··· 49 71 } 50 72 51 73 static inline void dma_mark_clean(void *addr, size_t size) {} 52 - 53 - static inline int dma_set_mask(struct device *dev, u64 dma_mask) 54 - { 55 - if (!dev->dma_mask || !dma_supported(dev, dma_mask)) 56 - return -EIO; 57 - 58 - *dev->dma_mask = dma_mask; 59 - 60 - return 0; 61 - } 62 - 63 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 64 - 65 - static inline void *dma_alloc_attrs(struct device *dev, size_t size, 66 - dma_addr_t *dma_handle, gfp_t flag, 67 - struct dma_attrs *attrs) 68 - { 69 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 70 - 71 - return dma_ops->alloc(dev, size, dma_handle, flag, attrs); 72 - } 73 - 74 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 75 - 76 - static inline void dma_free_attrs(struct device *dev, size_t size, 77 - void *cpu_addr, dma_addr_t dma_handle, 78 - struct dma_attrs *attrs) 79 - { 80 - struct dma_map_ops *dma_ops = get_dma_ops(dev); 81 - 82 - dma_ops->free(dev, size, cpu_addr, dma_handle, attrs); 83 - } 84 - 85 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
86 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 87 74 88 75 static inline void dma_cache_sync(struct device *dev, void *vaddr, 89 76 size_t size, enum dma_data_direction direction)
+2 -1
arch/x86/Kconfig
··· 1754 1754 1755 1755 config KEXEC 1756 1756 bool "kexec system call" 1757 + select KEXEC_CORE 1757 1758 ---help--- 1758 1759 kexec is a system call that implements the ability to shutdown your 1759 1760 current kernel, and to start another kernel. It is like a reboot ··· 1771 1770 1772 1771 config KEXEC_FILE 1773 1772 bool "kexec file based system call" 1773 + select KEXEC_CORE 1774 1774 select BUILD_BIN2C 1775 - depends on KEXEC 1776 1775 depends on X86_64 1777 1776 depends on CRYPTO=y 1778 1777 depends on CRYPTO_SHA256=y
+2 -1
arch/x86/boot/compressed/misc.c
··· 448 448 #endif 449 449 450 450 debug_putstr("\nDecompressing Linux... "); 451 - decompress(input_data, input_len, NULL, NULL, output, NULL, error); 451 + __decompress(input_data, input_len, NULL, NULL, output, output_len, 452 + NULL, error); 452 453 parse_elf(output); 453 454 /* 454 455 * 32-bit always performs relocations. 64-bit relocations are only
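The decompress entry point also changed shape across these hunks: callers now use __decompress(), which takes an extra output-buffer-length argument ("lib/decompressors: use real out buf size for gunzip with kernel" in the series). Callers that cannot know the size pass 0, as the mips, s390, sh and unicore32 hunks above do, while x86 passes output_len here. The sketch below is an illustrative toy of why the length matters at the copy-out step, not the kernel's lib/decompress_*.c code; copy_out() and its semantics are this example's own invention:

```c
/* Toy sketch of a bounds-aware copy-out: with the destination size known,
 * the final chunk can be clamped and an out-of-range write refused,
 * instead of silently running past the buffer.  A length of 0 means
 * "unknown", matching the callers above that pass 0. */
#include <string.h>
#include <stddef.h>

static long copy_out(unsigned char *dst, size_t out_len, size_t pos,
                     const unsigned char *chunk, size_t n)
{
        if (out_len != 0) {                  /* 0 == size unknown */
                if (pos >= out_len)
                        return -1;           /* would overflow: error out */
                if (n > out_len - pos)
                        n = out_len - pos;   /* clamp the final chunk */
        }
        memcpy(dst + pos, chunk, n);
        return (long)n;
}
```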
+1 -1
arch/x86/boot/header.S
··· 414 414 # define XLF23 0 415 415 #endif 416 416 417 - #if defined(CONFIG_X86_64) && defined(CONFIG_EFI) && defined(CONFIG_KEXEC) 417 + #if defined(CONFIG_X86_64) && defined(CONFIG_EFI) && defined(CONFIG_KEXEC_CORE) 418 418 # define XLF4 XLF_EFI_KEXEC 419 419 #else 420 420 # define XLF4 0
+1 -1
arch/x86/entry/vsyscall/vsyscall_64.c
··· 277 277 { 278 278 return "[vsyscall]"; 279 279 } 280 - static struct vm_operations_struct gate_vma_ops = { 280 + static const struct vm_operations_struct gate_vma_ops = { 281 281 .name = gate_vma_name, 282 282 }; 283 283 static struct vm_area_struct gate_vma = {
+5 -29
arch/x86/include/asm/dma-mapping.h
··· 12 12 #include <linux/dma-attrs.h> 13 13 #include <asm/io.h> 14 14 #include <asm/swiotlb.h> 15 - #include <asm-generic/dma-coherent.h> 16 15 #include <linux/dma-contiguous.h> 17 16 18 17 #ifdef CONFIG_ISA ··· 40 41 #endif 41 42 } 42 43 43 - #include <asm-generic/dma-mapping-common.h> 44 + bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp); 45 + #define arch_dma_alloc_attrs arch_dma_alloc_attrs 44 46 45 - /* Make sure we keep the same behaviour */ 46 - static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 47 - { 48 - struct dma_map_ops *ops = get_dma_ops(dev); 49 - debug_dma_mapping_error(dev, dma_addr); 50 - if (ops->mapping_error) 51 - return ops->mapping_error(dev, dma_addr); 52 - 53 - return (dma_addr == DMA_ERROR_CODE); 54 - } 55 - 56 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) 57 - #define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) 58 - 47 + #define HAVE_ARCH_DMA_SUPPORTED 1 59 48 extern int dma_supported(struct device *hwdev, u64 mask); 60 - extern int dma_set_mask(struct device *dev, u64 mask); 49 + 50 + #include <asm-generic/dma-mapping-common.h> 61 51 62 52 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size, 63 53 dma_addr_t *dma_addr, gfp_t flag, ··· 112 124 #endif 113 125 return gfp; 114 126 } 115 - 116 - #define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL) 117 - 118 - void * 119 - dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle, 120 - gfp_t gfp, struct dma_attrs *attrs); 121 - 122 - #define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL) 123 - 124 - void dma_free_attrs(struct device *dev, size_t size, 125 - void *vaddr, dma_addr_t bus, 126 - struct dma_attrs *attrs); 127 127 128 128 #endif
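The arch_dma_alloc_attrs() declaration above is the new hook this series adds for x86: the now-generic dma_alloc_attrs() first lets the architecture rewrite the device pointer and gfp flags (the pci-dma.c hunk below substitutes a fallback device and filters zone flags) and aborts if the hook returns false. A minimal userspace sketch of that control flow, with simplified stand-in types and malloc() standing in for ops->alloc():

```c
/* Sketch of the arch_dma_alloc_attrs() hook: the arch may redirect the
 * device and massage gfp before the generic allocation proceeds, or veto
 * it entirely by returning false.  Simplified userspace stand-ins. */
#include <stdlib.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned gfp_t;

struct device { bool dma_capable; };

static struct device fallback_dev = { true };

/* arch hook: rewrite *dev / *gfp in place; false aborts the allocation */
static bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp)
{
        *gfp &= ~0x1u;                  /* stand-in for stripping zone flags */
        if (!*dev)
                *dev = &fallback_dev;   /* x86-style fallback device */
        return (*dev)->dma_capable;
}

static void *dma_alloc_attrs(struct device *dev, size_t size, gfp_t gfp)
{
        if (!arch_dma_alloc_attrs(&dev, &gfp))
                return NULL;            /* arch vetoed the allocation */
        return malloc(size);            /* stands in for ops->alloc() */
}
```

Architectures without the hook simply never define arch_dma_alloc_attrs, and the generic header compiles the call away.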
+1 -1
arch/x86/include/asm/kdebug.h
··· 29 29 extern void __show_regs(struct pt_regs *regs, int all); 30 30 extern unsigned long oops_begin(void); 31 31 extern void oops_end(unsigned long, struct pt_regs *, int signr); 32 - #ifdef CONFIG_KEXEC 32 + #ifdef CONFIG_KEXEC_CORE 33 33 extern int in_crash_kexec; 34 34 #else 35 35 /* no crash dump is ever in progress if no crash kernel can be kexec'd */
+2 -2
arch/x86/kernel/Makefile
··· 71 71 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o 72 72 obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o 73 73 obj-$(CONFIG_X86_TSC) += trace_clock.o 74 - obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o 75 - obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o 74 + obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o 75 + obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o 76 76 obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o 77 77 obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o 78 78 obj-y += kprobes/
+2 -2
arch/x86/kernel/kvmclock.c
··· 200 200 * kind of shutdown from our side, we unregister the clock by writting anything 201 201 * that does not have the 'enable' bit set in the msr 202 202 */ 203 - #ifdef CONFIG_KEXEC 203 + #ifdef CONFIG_KEXEC_CORE 204 204 static void kvm_crash_shutdown(struct pt_regs *regs) 205 205 { 206 206 native_write_msr(msr_kvm_system_time, 0, 0); ··· 259 259 x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; 260 260 x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; 261 261 machine_ops.shutdown = kvm_shutdown; 262 - #ifdef CONFIG_KEXEC 262 + #ifdef CONFIG_KEXEC_CORE 263 263 machine_ops.crash_shutdown = kvm_crash_shutdown; 264 264 #endif 265 265 kvm_get_preset_lpj();
+9 -51
arch/x86/kernel/pci-dma.c
··· 58 58 /* Number of entries preallocated for DMA-API debugging */
59 59 #define PREALLOC_DMA_DEBUG_ENTRIES 65536
60 60
61 - int dma_set_mask(struct device *dev, u64 mask)
62 - {
63 - if (!dev->dma_mask || !dma_supported(dev, mask))
64 - return -EIO;
65 -
66 - *dev->dma_mask = mask;
67 -
68 - return 0;
69 - }
70 - EXPORT_SYMBOL(dma_set_mask);
71 -
72 61 void __init pci_iommu_alloc(void)
73 62 {
74 63 struct iommu_table_entry *p;
··· 129 140 free_pages((unsigned long)vaddr, get_order(size));
130 141 }
131 142
132 - void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
133 - gfp_t gfp, struct dma_attrs *attrs)
143 + bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp)
134 144 {
135 - struct dma_map_ops *ops = get_dma_ops(dev);
136 - void *memory;
145 + *gfp = dma_alloc_coherent_gfp_flags(*dev, *gfp);
146 + *gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
137 147
138 - gfp &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
148 + if (!*dev)
149 + *dev = &x86_dma_fallback_dev;
150 + if (!is_device_dma_capable(*dev))
151 + return false;
152 + return true;
139 153
140 - if (dma_alloc_from_coherent(dev, size, dma_handle, &memory))
141 - return memory;
142 -
143 - if (!dev)
144 - dev = &x86_dma_fallback_dev;
145 -
146 - if (!is_device_dma_capable(dev))
147 - return NULL;
148 -
149 - if (!ops->alloc)
150 - return NULL;
151 -
152 - memory = ops->alloc(dev, size, dma_handle,
153 - dma_alloc_coherent_gfp_flags(dev, gfp), attrs);
154 - debug_dma_alloc_coherent(dev, size, *dma_handle, memory);
155 -
156 - return memory;
157 154 }
158 - EXPORT_SYMBOL(dma_alloc_attrs);
159 -
160 - void dma_free_attrs(struct device *dev, size_t size,
161 - void *vaddr, dma_addr_t bus,
162 - struct dma_attrs *attrs)
163 - {
164 - struct dma_map_ops *ops = get_dma_ops(dev);
165 -
166 - WARN_ON(irqs_disabled()); /* for portability */
167 -
168 - if (dma_release_from_coherent(dev, get_order(size), vaddr))
169 - return;
170 -
171 - debug_dma_free_coherent(dev, size, vaddr, bus);
172 - if (ops->free)
173 - ops->free(dev, size, vaddr, bus, attrs);
174 - }
175 - EXPORT_SYMBOL(dma_free_attrs);
155 + EXPORT_SYMBOL(arch_dma_alloc_attrs);
176 156
177 157 /*
178 158 * See <Documentation/x86/x86_64/boot-options.txt> for the iommu kernel
+2 -2
arch/x86/kernel/reboot.c
··· 673 673 .emergency_restart = native_machine_emergency_restart,
674 674 .restart = native_machine_restart,
675 675 .halt = native_machine_halt,
676 - #ifdef CONFIG_KEXEC
676 + #ifdef CONFIG_KEXEC_CORE
677 677 .crash_shutdown = native_machine_crash_shutdown,
678 678 #endif
679 679 };
··· 703 703 machine_ops.halt();
704 704 }
705 705
706 - #ifdef CONFIG_KEXEC
706 + #ifdef CONFIG_KEXEC_CORE
707 707 void machine_crash_shutdown(struct pt_regs *regs)
708 708 {
709 709 machine_ops.crash_shutdown(regs);
+1 -1
arch/x86/kernel/setup.c
··· 478 478 * --------- Crashkernel reservation ------------------------------
479 479 */
480 480
481 - #ifdef CONFIG_KEXEC
481 + #ifdef CONFIG_KEXEC_CORE
482 482
483 483 /*
484 484 * Keep the crash kernel below this limit. On 32 bits earlier kernels
+1 -1
arch/x86/kernel/vmlinux.lds.S
··· 364 364
365 365 #endif /* CONFIG_X86_32 */
366 366
367 - #ifdef CONFIG_KEXEC
367 + #ifdef CONFIG_KEXEC_CORE
368 368 #include <asm/kexec.h>
369 369
370 370 . = ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
+4 -4
arch/x86/kvm/vmx.c
··· 1264 1264 vmcs, phys_addr);
1265 1265 }
1266 1266
1267 - #ifdef CONFIG_KEXEC
1267 + #ifdef CONFIG_KEXEC_CORE
1268 1268 /*
1269 1269 * This bitmap is used to indicate whether the vmclear
1270 1270 * operation is enabled on all cpus. All disabled by
··· 1302 1302 #else
1303 1303 static inline void crash_enable_local_vmclear(int cpu) { }
1304 1304 static inline void crash_disable_local_vmclear(int cpu) { }
1305 - #endif /* CONFIG_KEXEC */
1305 + #endif /* CONFIG_KEXEC_CORE */
1306 1306
1307 1307 static void __loaded_vmcs_clear(void *arg)
1308 1308 {
··· 10411 10411 if (r)
10412 10412 return r;
10413 10413
10414 - #ifdef CONFIG_KEXEC
10414 + #ifdef CONFIG_KEXEC_CORE
10415 10415 rcu_assign_pointer(crash_vmclear_loaded_vmcss,
10416 10416 crash_vmclear_local_loaded_vmcss);
10417 10417 #endif
··· 10421 10421
10422 10422 static void __exit vmx_exit(void)
10423 10423 {
10424 - #ifdef CONFIG_KEXEC
10424 + #ifdef CONFIG_KEXEC_CORE
10425 10425 RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
10426 10426 synchronize_rcu();
10427 10427 #endif
+7 -44
arch/x86/mm/mpx.c
··· 42 42 */
43 43 static unsigned long mpx_mmap(unsigned long len)
44 44 {
45 - unsigned long ret;
46 - unsigned long addr, pgoff;
47 45 struct mm_struct *mm = current->mm;
48 - vm_flags_t vm_flags;
49 - struct vm_area_struct *vma;
46 + unsigned long addr, populate;
50 47
51 48 /* Only bounds table can be allocated here */
52 49 if (len != mpx_bt_size_bytes(mm))
53 50 return -EINVAL;
54 51
55 52 down_write(&mm->mmap_sem);
56 -
57 - /* Too many mappings? */
58 - if (mm->map_count > sysctl_max_map_count) {
59 - ret = -ENOMEM;
60 - goto out;
61 - }
62 -
63 - /* Obtain the address to map to. we verify (or select) it and ensure
64 - * that it represents a valid section of the address space.
65 - */
66 - addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
67 - if (addr & ~PAGE_MASK) {
68 - ret = addr;
69 - goto out;
70 - }
71 -
72 - vm_flags = VM_READ | VM_WRITE | VM_MPX |
73 - mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
74 -
75 - /* Set pgoff according to addr for anon_vma */
76 - pgoff = addr >> PAGE_SHIFT;
77 -
78 - ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
79 - if (IS_ERR_VALUE(ret))
80 - goto out;
81 -
82 - vma = find_vma(mm, ret);
83 - if (!vma) {
84 - ret = -ENOMEM;
85 - goto out;
86 - }
87 -
88 - if (vm_flags & VM_LOCKED) {
89 - up_write(&mm->mmap_sem);
90 - mm_populate(ret, len);
91 - return ret;
92 - }
93 -
94 - out:
53 + addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
54 + MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate);
95 55 up_write(&mm->mmap_sem);
96 - return ret;
56 + if (populate)
57 + mm_populate(addr, populate);
58 +
59 + return addr;
97 60 }
98 61
99 62 enum reg_type {
+2 -2
arch/x86/platform/efi/efi.c
··· 650 650
651 651 static void __init save_runtime_map(void)
652 652 {
653 - #ifdef CONFIG_KEXEC
653 + #ifdef CONFIG_KEXEC_CORE
654 654 efi_memory_desc_t *md;
655 655 void *tmp, *p, *q = NULL;
656 656 int count = 0;
··· 748 748
749 749 static void __init kexec_enter_virtual_mode(void)
750 750 {
751 - #ifdef CONFIG_KEXEC
751 + #ifdef CONFIG_KEXEC_CORE
752 752 efi_memory_desc_t *md;
753 753 void *p;
754 754
+3 -3
arch/x86/platform/uv/uv_nmi.c
··· 492 492 touch_nmi_watchdog();
493 493 }
494 494
495 - #if defined(CONFIG_KEXEC)
495 + #if defined(CONFIG_KEXEC_CORE)
496 496 static atomic_t uv_nmi_kexec_failed;
497 497 static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
498 498 {
··· 519 519 uv_nmi_sync_exit(0);
520 520 }
521 521
522 - #else /* !CONFIG_KEXEC */
522 + #else /* !CONFIG_KEXEC_CORE */
523 523 static inline void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
524 524 {
525 525 if (master)
526 526 pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n");
527 527 }
528 - #endif /* !CONFIG_KEXEC */
528 + #endif /* !CONFIG_KEXEC_CORE */
529 529
530 530 #ifdef CONFIG_KGDB
531 531 #ifdef CONFIG_KGDB_KDB
-60
arch/xtensa/include/asm/dma-mapping.h
··· 32 32
33 33 #include <asm-generic/dma-mapping-common.h>
34 34
35 - #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL)
36 - #define dma_free_noncoherent(d, s, v, h) dma_free_attrs(d, s, v, h, NULL)
37 - #define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL)
38 - #define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL)
39 -
40 - static inline void *dma_alloc_attrs(struct device *dev, size_t size,
41 - dma_addr_t *dma_handle, gfp_t gfp,
42 - struct dma_attrs *attrs)
43 - {
44 - void *ret;
45 - struct dma_map_ops *ops = get_dma_ops(dev);
46 -
47 - if (dma_alloc_from_coherent(dev, size, dma_handle, &ret))
48 - return ret;
49 -
50 - ret = ops->alloc(dev, size, dma_handle, gfp, attrs);
51 - debug_dma_alloc_coherent(dev, size, *dma_handle, ret);
52 -
53 - return ret;
54 - }
55 -
56 - static inline void dma_free_attrs(struct device *dev, size_t size,
57 - void *vaddr, dma_addr_t dma_handle,
58 - struct dma_attrs *attrs)
59 - {
60 - struct dma_map_ops *ops = get_dma_ops(dev);
61 -
62 - if (dma_release_from_coherent(dev, get_order(size), vaddr))
63 - return;
64 -
65 - ops->free(dev, size, vaddr, dma_handle, attrs);
66 - debug_dma_free_coherent(dev, size, vaddr, dma_handle);
67 - }
68 -
69 - static inline int
70 - dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
71 - {
72 - struct dma_map_ops *ops = get_dma_ops(dev);
73 -
74 - debug_dma_mapping_error(dev, dma_addr);
75 - return ops->mapping_error(dev, dma_addr);
76 - }
77 -
78 - static inline int
79 - dma_supported(struct device *dev, u64 mask)
80 - {
81 - return 1;
82 - }
83 -
84 - static inline int
85 - dma_set_mask(struct device *dev, u64 mask)
86 - {
87 - if(!dev->dma_mask || !dma_supported(dev, mask))
88 - return -EIO;
89 -
90 - *dev->dma_mask = mask;
91 -
92 - return 0;
93 - }
94 -
95 35 void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
96 36 enum dma_data_direction direction);
97 37
+1 -1
drivers/android/binder.c
··· 2834 2834 return VM_FAULT_SIGBUS;
2835 2835 }
2836 2836
2837 - static struct vm_operations_struct binder_vm_ops = {
2837 + static const struct vm_operations_struct binder_vm_ops = {
2838 2838 .open = binder_vma_open,
2839 2839 .close = binder_vma_close,
2840 2840 .fault = binder_vm_fault,
+2 -14
drivers/crypto/qat/qat_common/adf_transport_debug.c
··· 86 86 {
87 87 struct adf_etr_ring_data *ring = sfile->private;
88 88 struct adf_etr_bank_data *bank = ring->bank;
89 - uint32_t *msg = v;
90 89 void __iomem *csr = ring->bank->csr_addr;
91 - int i, x;
92 90
93 91 if (v == SEQ_START_TOKEN) {
94 92 int head, tail, empty;
··· 111 113 seq_puts(sfile, "----------- Ring data ------------\n");
112 114 return 0;
113 115 }
114 - seq_printf(sfile, "%p:", msg);
115 - x = 0;
116 - i = 0;
117 - for (; i < (ADF_MSG_SIZE_TO_BYTES(ring->msg_size) >> 2); i++) {
118 - seq_printf(sfile, " %08X", *(msg + i));
119 - if ((ADF_MSG_SIZE_TO_BYTES(ring->msg_size) >> 2) != i + 1 &&
120 - (++x == 8)) {
121 - seq_printf(sfile, "\n%p:", msg + i + 1);
122 - x = 0;
123 - }
124 - }
125 - seq_puts(sfile, "\n");
116 + seq_hex_dump(sfile, "", DUMP_PREFIX_ADDRESS, 32, 4,
117 + v, ADF_MSG_SIZE_TO_BYTES(ring->msg_size), false);
126 118 return 0;
127 119 }
+1 -1
drivers/firmware/efi/Kconfig
··· 43 43
44 44 config EFI_RUNTIME_MAP
45 45 bool "Export efi runtime maps to sysfs"
46 - depends on X86 && EFI && KEXEC
46 + depends on X86 && EFI && KEXEC_CORE
47 47 default y
48 48 help
49 49 Export efi runtime memory maps to /sys/firmware/efi/runtime-map.
+1 -1
drivers/gpu/drm/vgem/vgem_drv.c
··· 125 125 }
126 126 }
127 127
128 - static struct vm_operations_struct vgem_gem_vm_ops = {
128 + static const struct vm_operations_struct vgem_gem_vm_ops = {
129 129 .fault = vgem_gem_fault,
130 130 .open = drm_gem_vm_open,
131 131 .close = drm_gem_vm_close,
+1 -1
drivers/hsi/clients/cmt_speech.c
··· 1110 1110 return 0;
1111 1111 }
1112 1112
1113 - static struct vm_operations_struct cs_char_vm_ops = {
1113 + static const struct vm_operations_struct cs_char_vm_ops = {
1114 1114 .fault = cs_char_vma_fault,
1115 1115 };
1116 1116
+1 -1
drivers/infiniband/hw/qib/qib_file_ops.c
··· 908 908 return 0;
909 909 }
910 910
911 - static struct vm_operations_struct qib_file_vm_ops = {
911 + static const struct vm_operations_struct qib_file_vm_ops = {
912 912 .fault = qib_file_vma_fault,
913 913 };
914 914
+1 -1
drivers/infiniband/hw/qib/qib_mmap.c
··· 75 75 kref_put(&ip->ref, qib_release_mmap_info);
76 76 }
77 77
78 - static struct vm_operations_struct qib_vm_ops = {
78 + static const struct vm_operations_struct qib_vm_ops = {
79 79 .open = qib_vma_open,
80 80 .close = qib_vma_close,
81 81 };
+1 -1
drivers/media/platform/omap/omap_vout.c
··· 872 872 vout->mmap_count--;
873 873 }
874 874
875 - static struct vm_operations_struct omap_vout_vm_ops = {
875 + static const struct vm_operations_struct omap_vout_vm_ops = {
876 876 .open = omap_vout_vm_open,
877 877 .close = omap_vout_vm_close,
878 878 };
+1 -1
drivers/misc/genwqe/card_dev.c
··· 418 418 kfree(dma_map);
419 419 }
420 420
421 - static struct vm_operations_struct genwqe_vma_ops = {
421 + static const struct vm_operations_struct genwqe_vma_ops = {
422 422 .open = genwqe_vma_open,
423 423 .close = genwqe_vma_close,
424 424 };
+7 -28
drivers/net/wireless/ath/wil6210/debugfs.c
··· 156 156 .llseek = seq_lseek,
157 157 };
158 158
159 + static void wil_seq_hexdump(struct seq_file *s, void *p, int len,
160 + const char *prefix)
161 + {
162 + seq_hex_dump(s, prefix, DUMP_PREFIX_NONE, 16, 1, p, len, false);
163 + }
164 +
159 165 static void wil_print_ring(struct seq_file *s, const char *prefix,
160 166 void __iomem *off)
161 167 {
··· 218 212 le16_to_cpu(hdr.seq), len,
219 213 le16_to_cpu(hdr.type), hdr.flags);
220 214 if (len <= MAX_MBOXITEM_SIZE) {
221 - int n = 0;
222 - char printbuf[16 * 3 + 2];
223 215 unsigned char databuf[MAX_MBOXITEM_SIZE];
224 216 void __iomem *src = wmi_buffer(wil, d.addr) +
225 217 sizeof(struct wil6210_mbox_hdr);
··· 227 223 * reading header
228 224 */
229 225 wil_memcpy_fromio_32(databuf, src, len);
230 - while (n < len) {
231 - int l = min(len - n, 16);
232 -
233 - hex_dump_to_buffer(databuf + n, l,
234 - 16, 1, printbuf,
235 - sizeof(printbuf),
236 - false);
237 - seq_printf(s, " : %s\n", printbuf);
238 - n += l;
239 - }
226 + wil_seq_hexdump(s, databuf, len, " : ");
240 227 }
241 228 } else {
242 229 seq_puts(s, "\n");
··· 861 866 .write = wil_write_file_wmi,
862 867 .open = simple_open,
863 868 };
864 -
865 - static void wil_seq_hexdump(struct seq_file *s, void *p, int len,
866 - const char *prefix)
867 - {
868 - char printbuf[16 * 3 + 2];
869 - int i = 0;
870 -
871 - while (i < len) {
872 - int l = min(len - i, 16);
873 -
874 - hex_dump_to_buffer(p + i, l, 16, 1, printbuf,
875 - sizeof(printbuf), false);
876 - seq_printf(s, "%s%s\n", prefix, printbuf);
877 - i += l;
878 - }
879 - }
880 869
881 870 static void wil_seq_print_skb(struct seq_file *s, struct sk_buff *skb)
882 871 {
+3 -10
drivers/parisc/ccio-dma.c
··· 1103 1103 struct ioc *ioc = ioc_list;
1104 1104
1105 1105 while (ioc != NULL) {
1106 - u32 *res_ptr = (u32 *)ioc->res_map;
1107 - int j;
1108 -
1109 - for (j = 0; j < (ioc->res_size / sizeof(u32)); j++) {
1110 - if ((j & 7) == 0)
1111 - seq_puts(m, "\n ");
1112 - seq_printf(m, "%08x", *res_ptr);
1113 - res_ptr++;
1114 - }
1115 - seq_puts(m, "\n\n");
1106 + seq_hex_dump(m, " ", DUMP_PREFIX_NONE, 32, 4, ioc->res_map,
1107 + ioc->res_size, false);
1108 + seq_putc(m, '\n');
1116 1109 ioc = ioc->next;
1117 1110 break; /* XXX - remove me */
1118 1111 }
+2 -7
drivers/parisc/sba_iommu.c
··· 1854 1854 {
1855 1855 struct sba_device *sba_dev = sba_list;
1856 1856 struct ioc *ioc = &sba_dev->ioc[0]; /* FIXME: Multi-IOC support! */
1857 - unsigned int *res_ptr = (unsigned int *)ioc->res_map;
1858 - int i;
1859 1857
1860 - for (i = 0; i < (ioc->res_size/sizeof(unsigned int)); ++i, ++res_ptr) {
1861 - if ((i & 7) == 0)
1862 - seq_puts(m, "\n ");
1863 - seq_printf(m, " %08x", *res_ptr);
1864 - }
1858 + seq_hex_dump(m, " ", DUMP_PREFIX_NONE, 32, 4, ioc->res_map,
1859 + ioc->res_size, false);
1865 1860 seq_putc(m, '\n');
1866 1861
1867 1862 return 0;
+1 -1
drivers/pci/pci-driver.c
··· 467 467 pci_msi_shutdown(pci_dev);
468 468 pci_msix_shutdown(pci_dev);
469 469
470 - #ifdef CONFIG_KEXEC
470 + #ifdef CONFIG_KEXEC_CORE
471 471 /*
472 472 * If this is a kexec reboot, turn off Bus Master bit on the
473 473 * device to tell it to not continue to do DMA. Don't touch
+1 -9
drivers/s390/crypto/zcrypt_api.c
··· 1206 1206 static void sprinthx4(unsigned char *title, struct seq_file *m,
1207 1207 unsigned int *array, unsigned int len)
1208 1208 {
1209 - int r;
1210 -
1211 1209 seq_printf(m, "\n%s\n", title);
1212 - for (r = 0; r < len; r++) {
1213 - if ((r % 8) == 0)
1214 - seq_printf(m, " ");
1215 - seq_printf(m, "%08X ", array[r]);
1216 - if ((r % 8) == 7)
1217 - seq_putc(m, '\n');
1218 - }
1210 + seq_hex_dump(m, " ", DUMP_PREFIX_NONE, 32, 4, array, len, false);
1219 1211 seq_putc(m, '\n');
1220 1212 }
1221 1213
+1 -1
drivers/staging/android/ion/ion.c
··· 997 997 mutex_unlock(&buffer->lock);
998 998 }
999 999
1000 - static struct vm_operations_struct ion_vma_ops = {
1000 + static const struct vm_operations_struct ion_vma_ops = {
1001 1001 .open = ion_vm_open,
1002 1002 .close = ion_vm_close,
1003 1003 .fault = ion_vm_fault,
+1 -1
drivers/staging/comedi/comedi_fops.c
··· 2156 2156 comedi_buf_map_put(bm);
2157 2157 }
2158 2158
2159 - static struct vm_operations_struct comedi_vm_ops = {
2159 + static const struct vm_operations_struct comedi_vm_ops = {
2160 2160 .open = comedi_vm_open,
2161 2161 .close = comedi_vm_close,
2162 2162 };
+1 -1
drivers/video/fbdev/omap2/omapfb/omapfb-main.c
··· 1091 1091 omapfb_put_mem_region(rg);
1092 1092 }
1093 1093
1094 - static struct vm_operations_struct mmap_user_ops = {
1094 + static const struct vm_operations_struct mmap_user_ops = {
1095 1095 .open = mmap_user_open,
1096 1096 .close = mmap_user_close,
1097 1097 };
+1 -1
drivers/xen/gntalloc.c
··· 494 494 mutex_unlock(&gref_mutex);
495 495 }
496 496
497 - static struct vm_operations_struct gntalloc_vmops = {
497 + static const struct vm_operations_struct gntalloc_vmops = {
498 498 .open = gntalloc_vma_open,
499 499 .close = gntalloc_vma_close,
500 500 };
+1 -1
drivers/xen/gntdev.c
··· 433 433 return map->pages[(addr - map->pages_vm_start) >> PAGE_SHIFT];
434 434 }
435 435
436 - static struct vm_operations_struct gntdev_vmops = {
436 + static const struct vm_operations_struct gntdev_vmops = {
437 437 .open = gntdev_vma_open,
438 438 .close = gntdev_vma_close,
439 439 .find_special_page = gntdev_vma_find_special_page,
+2 -2
drivers/xen/privcmd.c
··· 414 414 return 0;
415 415 }
416 416
417 - static struct vm_operations_struct privcmd_vm_ops;
417 + static const struct vm_operations_struct privcmd_vm_ops;
418 418
419 419 static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
420 420 {
··· 605 605 return VM_FAULT_SIGBUS;
606 606 }
607 607
608 - static struct vm_operations_struct privcmd_vm_ops = {
608 + static const struct vm_operations_struct privcmd_vm_ops = {
609 609 .close = privcmd_close,
610 610 .fault = privcmd_fault
611 611 };
-6
drivers/xen/swiotlb-xen.c
··· 311 311 */
312 312 flags &= ~(__GFP_DMA | __GFP_HIGHMEM);
313 313
314 - if (dma_alloc_from_coherent(hwdev, size, dma_handle, &ret))
315 - return ret;
316 -
317 314 /* On ARM this function returns an ioremap'ped virtual address for
318 315 * which virt_to_phys doesn't return the corresponding physical
319 316 * address. In fact on ARM virt_to_phys only works for kernel direct
··· 352 355 int order = get_order(size);
353 356 phys_addr_t phys;
354 357 u64 dma_mask = DMA_BIT_MASK(32);
355 -
356 - if (dma_release_from_coherent(hwdev, order, vaddr))
357 - return;
358 358
359 359 if (hwdev && hwdev->coherent_dma_mask)
360 360 dma_mask = hwdev->coherent_dma_mask;
+5 -3
fs/affs/super.c
··· 18 18 #include <linux/sched.h>
19 19 #include <linux/slab.h>
20 20 #include <linux/writeback.h>
21 + #include <linux/blkdev.h>
21 22 #include "affs.h"
22 23
23 24 static int affs_statfs(struct dentry *dentry, struct kstatfs *buf);
··· 353 352 * blocks, we will have to change it.
354 353 */
355 354
356 - size = sb->s_bdev->bd_inode->i_size >> 9;
355 + size = i_size_read(sb->s_bdev->bd_inode) >> 9;
357 356 pr_debug("initial blocksize=%d, #blocks=%d\n", 512, size);
358 357
359 358 affs_set_blocksize(sb, PAGE_SIZE);
360 359 /* Try to find root block. Its location depends on the block size. */
361 360
362 - i = 512;
363 - j = 4096;
361 + i = bdev_logical_block_size(sb->s_bdev);
362 + j = PAGE_SIZE;
364 363 if (blocksize > 0) {
365 364 i = j = blocksize;
366 365 size = size / (blocksize / 512);
367 366 }
367 +
368 368 for (blocksize = i; blocksize <= j; blocksize <<= 1, size >>= 1) {
369 369 sbi->s_root_block = root_block;
370 370 if (root_block < 0)
+1 -1
fs/ceph/addr.c
··· 1593 1593 return err;
1594 1594 }
1595 1595
1596 - static struct vm_operations_struct ceph_vmops = {
1596 + static const struct vm_operations_struct ceph_vmops = {
1597 1597 .fault = ceph_filemap_fault,
1598 1598 .page_mkwrite = ceph_page_mkwrite,
1599 1599 };
+1 -1
fs/cifs/file.c
··· 3216 3216 return VM_FAULT_LOCKED;
3217 3217 }
3218 3218
3219 - static struct vm_operations_struct cifs_file_vm_ops = {
3219 + static const struct vm_operations_struct cifs_file_vm_ops = {
3220 3220 .fault = filemap_fault,
3221 3221 .map_pages = filemap_map_pages,
3222 3222 .page_mkwrite = cifs_page_mkwrite,
+3 -3
fs/coda/upcall.c
··· 353 353 char *result;
354 354
355 355 insize = max_t(unsigned int,
356 - INSIZE(readlink), OUTSIZE(readlink)+ *length + 1);
356 + INSIZE(readlink), OUTSIZE(readlink)+ *length);
357 357 UPARG(CODA_READLINK);
358 358
359 359 inp->coda_readlink.VFid = *fid;
··· 361 361 error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
362 362 if (!error) {
363 363 retlen = outp->coda_readlink.count;
364 - if ( retlen > *length )
365 - retlen = *length;
364 + if (retlen >= *length)
365 + retlen = *length - 1;
366 366 *length = retlen;
367 367 result = (char *)outp + (long)outp->coda_readlink.data;
368 368 memcpy(buffer, result, retlen);
+38 -8
fs/coredump.c
··· 513 513 const struct cred *old_cred;
514 514 struct cred *cred;
515 515 int retval = 0;
516 - int flag = 0;
517 516 int ispipe;
518 517 struct files_struct *displaced;
519 - bool need_nonrelative = false;
518 + /* require nonrelative corefile path and be extra careful */
519 + bool need_suid_safe = false;
520 520 bool core_dumped = false;
521 521 static atomic_t core_dump_count = ATOMIC_INIT(0);
522 522 struct coredump_params cprm = {
··· 550 550 */
551 551 if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
552 552 /* Setuid core dump mode */
553 - flag = O_EXCL; /* Stop rewrite attacks */
554 553 cred->fsuid = GLOBAL_ROOT_UID; /* Dump root private */
555 - need_nonrelative = true;
554 + need_suid_safe = true;
556 555 }
557 556
558 557 retval = coredump_wait(siginfo->si_signo, &core_state);
··· 632 633 if (cprm.limit < binfmt->min_coredump)
633 634 goto fail_unlock;
634 635
635 - if (need_nonrelative && cn.corename[0] != '/') {
636 + if (need_suid_safe && cn.corename[0] != '/') {
636 637 printk(KERN_WARNING "Pid %d(%s) can only dump core "\
637 638 "to fully qualified path!\n",
638 639 task_tgid_vnr(current), current->comm);
··· 640 641 goto fail_unlock;
641 642 }
642 643
644 + /*
645 + * Unlink the file if it exists unless this is a SUID
646 + * binary - in that case, we're running around with root
647 + * privs and don't want to unlink another user's coredump.
648 + */
649 + if (!need_suid_safe) {
650 + mm_segment_t old_fs;
651 +
652 + old_fs = get_fs();
653 + set_fs(KERNEL_DS);
654 + /*
655 + * If it doesn't exist, that's fine. If there's some
656 + * other problem, we'll catch it at the filp_open().
657 + */
658 + (void) sys_unlink((const char __user *)cn.corename);
659 + set_fs(old_fs);
660 + }
661 +
662 + /*
663 + * There is a race between unlinking and creating the
664 + * file, but if that causes an EEXIST here, that's
665 + * fine - another process raced with us while creating
666 + * the corefile, and the other process won. To userspace,
667 + * what matters is that at least one of the two processes
668 + * writes its coredump successfully, not which one.
669 + */
643 670 cprm.file = filp_open(cn.corename,
644 - O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
671 + O_CREAT | 2 | O_NOFOLLOW |
672 + O_LARGEFILE | O_EXCL,
645 673 0600);
646 674 if (IS_ERR(cprm.file))
647 675 goto fail_unlock;
··· 685 659 if (!S_ISREG(inode->i_mode))
686 660 goto close_fail;
687 661 /*
688 - * Dont allow local users get cute and trick others to coredump
689 - * into their pre-created files.
662 + * Don't dump core if the filesystem changed owner or mode
663 + * of the file during file creation. This is an issue when
664 + * a process dumps core while its cwd is e.g. on a vfat
665 + * filesystem.
690 666 */
691 667 if (!uid_eq(inode->i_uid, current_fsuid()))
668 + goto close_fail;
669 + if ((inode->i_mode & 0677) != 0600)
692 670 goto close_fail;
693 671 if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
694 672 goto close_fail;
+4 -5
fs/hfs/bnode.c
··· 288 288 page_cache_release(page);
289 289 goto fail;
290 290 }
291 - page_cache_release(page);
292 291 node->page[i] = page;
293 292 }
294 293
··· 397 398
398 399 void hfs_bnode_free(struct hfs_bnode *node)
399 400 {
400 - //int i;
401 + int i;
401 402
402 - //for (i = 0; i < node->tree->pages_per_bnode; i++)
403 - // if (node->page[i])
404 - // page_cache_release(node->page[i]);
403 + for (i = 0; i < node->tree->pages_per_bnode; i++)
404 + if (node->page[i])
405 + page_cache_release(node->page[i]);
405 406 kfree(node);
406 407 }
407 408
+11 -9
fs/hfs/brec.c
··· 131 131 hfs_bnode_write(node, entry, data_off + key_len, entry_len);
132 132 hfs_bnode_dump(node);
133 133
134 - if (new_node) {
135 - /* update parent key if we inserted a key
136 - * at the start of the first node
137 - */
138 - if (!rec && new_node != node)
139 - hfs_brec_update_parent(fd);
134 + /*
135 + * update parent key if we inserted a key
136 + * at the start of the node and it is not the new node
137 + */
138 + if (!rec && new_node != node) {
139 + hfs_bnode_read_key(node, fd->search_key, data_off + size);
140 + hfs_brec_update_parent(fd);
141 + }
140 142
143 + if (new_node) {
141 144 hfs_bnode_put(fd->bnode);
142 145 if (!new_node->parent) {
143 146 hfs_btree_inc_height(tree);
··· 168 165 }
169 166 goto again;
170 167 }
171 -
172 - if (!rec)
173 - hfs_brec_update_parent(fd);
174 168
175 169 return 0;
176 170 }
··· 366 366 if (IS_ERR(parent))
367 367 return PTR_ERR(parent);
368 368 __hfs_brec_find(parent, fd);
369 + if (fd->record < 0)
370 + return -ENOENT;
369 371 hfs_bnode_dump(parent);
370 372 rec = fd->record;
371 373
-3
fs/hfsplus/bnode.c
··· 454 454 page_cache_release(page);
455 455 goto fail;
456 456 }
457 - page_cache_release(page);
458 457 node->page[i] = page;
459 458 }
460 459
··· 565 566
566 567 void hfs_bnode_free(struct hfs_bnode *node)
567 568 {
568 - #if 0
569 569 int i;
570 570
571 571 for (i = 0; i < node->tree->pages_per_bnode; i++)
572 572 if (node->page[i])
573 573 page_cache_release(node->page[i]);
574 - #endif
575 574 kfree(node);
576 575 }
577 576
+1 -1
fs/namei.c
··· 2438 2438
2439 2439 /**
2440 2440 * path_mountpoint - look up a path to be umounted
2441 - * @nameidata: lookup context
2441 + * @nd: lookup context
2442 2442 * @flags: lookup flags
2443 2443 * @path: pointer to container for result
2444 2444 *
+45 -68
fs/proc/base.c
··· 1230 1230 size_t count, loff_t *ppos)
1231 1231 {
1232 1232 struct inode * inode = file_inode(file);
1233 - char *page, *tmp;
1234 - ssize_t length;
1235 1233 uid_t loginuid;
1236 1234 kuid_t kloginuid;
1235 + int rv;
1237 1236
1238 1237 rcu_read_lock();
1239 1238 if (current != pid_task(proc_pid(inode), PIDTYPE_PID)) {
··· 1241 1242 }
1242 1243 rcu_read_unlock();
1243 1244
1244 - if (count >= PAGE_SIZE)
1245 - count = PAGE_SIZE - 1;
1246 -
1247 1245 if (*ppos != 0) {
1248 1246 /* No partial writes. */
1249 1247 return -EINVAL;
1250 1248 }
1251 - page = (char*)__get_free_page(GFP_TEMPORARY);
1252 - if (!page)
1253 - return -ENOMEM;
1254 - length = -EFAULT;
1255 - if (copy_from_user(page, buf, count))
1256 - goto out_free_page;
1257 1249
1258 - page[count] = '\0';
1259 - loginuid = simple_strtoul(page, &tmp, 10);
1260 - if (tmp == page) {
1261 - length = -EINVAL;
1262 - goto out_free_page;
1263 -
1264 - }
1250 + rv = kstrtou32_from_user(buf, count, 10, &loginuid);
1251 + if (rv < 0)
1252 + return rv;
1265 1253
1266 1254 /* is userspace tring to explicitly UNSET the loginuid? */
1267 1255 if (loginuid == AUDIT_UID_UNSET) {
1268 1256 kloginuid = INVALID_UID;
1269 1257 } else {
1270 1258 kloginuid = make_kuid(file->f_cred->user_ns, loginuid);
1271 - if (!uid_valid(kloginuid)) {
1272 - length = -EINVAL;
1273 - goto out_free_page;
1274 - }
1259 + if (!uid_valid(kloginuid))
1260 + return -EINVAL;
1275 1261 }
1276 1262
1277 - length = audit_set_loginuid(kloginuid);
1278 - if (likely(length == 0))
1279 - length = count;
1280 -
1281 - out_free_page:
1282 - free_page((unsigned long) page);
1283 - return length;
1263 + rv = audit_set_loginuid(kloginuid);
1264 + if (rv < 0)
1265 + return rv;
1266 + return count;
1284 1267 }
1285 1268
1286 1269 static const struct file_operations proc_loginuid_operations = {
··· 1316 1335 const char __user * buf, size_t count, loff_t *ppos)
1317 1336 {
1318 1337 struct task_struct *task;
1319 - char buffer[PROC_NUMBUF], *end;
1338 + char buffer[PROC_NUMBUF];
1320 1339 int make_it_fail;
1340 + int rv;
1321 1341
1322 1342 if (!capable(CAP_SYS_RESOURCE))
1323 1343 return -EPERM;
··· 1327 1345 count = sizeof(buffer) - 1;
1328 1346 if (copy_from_user(buffer, buf, count))
1329 1347 return -EFAULT;
1330 - make_it_fail = simple_strtol(strstrip(buffer), &end, 0);
1331 - if (*end)
1332 - return -EINVAL;
1348 + rv = kstrtoint(strstrip(buffer), 0, &make_it_fail);
1349 + if (rv < 0)
1350 + return rv;
1333 1351 if (make_it_fail < 0 || make_it_fail > 1)
1334 1352 return -EINVAL;
1335 1353
··· 1818 1836 return dir_emit(ctx, name, len, 1, DT_UNKNOWN);
1819 1837 }
1820 1838
1821 - #ifdef CONFIG_CHECKPOINT_RESTORE
1822 -
1823 1839 /*
1824 1840 * dname_to_vma_addr - maps a dentry name into two unsigned longs
1825 1841 * which represent vma start and end addresses.
··· 1843 1863
1844 1864 if (flags & LOOKUP_RCU)
1845 1865 return -ECHILD;
1846 -
1847 - if (!capable(CAP_SYS_ADMIN)) {
1848 - status = -EPERM;
1849 - goto out_notask;
1850 - }
1851 1866
1852 1867 inode = d_inode(dentry);
1853 1868 task = get_proc_task(inode);
··· 1932 1957 unsigned char name[4*sizeof(long)+2]; /* max: %lx-%lx\0 */
1933 1958 };
1934 1959
1960 + /*
1961 + * Only allow CAP_SYS_ADMIN to follow the links, due to concerns about how the
1962 + * symlinks may be used to bypass permissions on ancestor directories in the
1963 + * path to the file in question.
1964 + */
1965 + static const char *
1966 + proc_map_files_follow_link(struct dentry *dentry, void **cookie)
1967 + {
1968 + if (!capable(CAP_SYS_ADMIN))
1969 + return ERR_PTR(-EPERM);
1970 +
1971 + return proc_pid_follow_link(dentry, NULL);
1972 + }
1973 +
1974 + /*
1975 + * Identical to proc_pid_link_inode_operations except for follow_link()
1976 + */
1977 + static const struct inode_operations proc_map_files_link_inode_operations = {
1978 + .readlink = proc_pid_readlink,
1979 + .follow_link = proc_map_files_follow_link,
1980 + .setattr = proc_setattr,
1981 + };
1982 +
1935 1983 static int
1936 1984 proc_map_files_instantiate(struct inode *dir, struct dentry *dentry,
1937 1985 struct task_struct *task, const void *ptr)
··· 1970 1972 ei = PROC_I(inode);
1971 1973 ei->op.proc_get_link = proc_map_files_get_link;
1972 1974
1973 - inode->i_op = &proc_pid_link_inode_operations;
1975 + inode->i_op = &proc_map_files_link_inode_operations;
1974 1976 inode->i_size = 64;
1975 1977 inode->i_mode = S_IFLNK;
1976 1978
··· 1993 1995 struct task_struct *task;
1994 1996 int result;
1995 1997 struct mm_struct *mm;
1996 -
1997 - result = -EPERM;
1998 - if (!capable(CAP_SYS_ADMIN))
1999 - goto out;
2000 1998
2001 1999 result = -ENOENT;
2002 2000 task = get_proc_task(dir);
··· 2046 2052 struct map_files_info info;
2047 2053 struct map_files_info *p;
2048 2054 int ret;
2049 -
2050 - ret = -EPERM;
2051 - if (!capable(CAP_SYS_ADMIN))
2052 - goto out;
2053 2055
2054 2056 ret = -ENOENT;
2055 2057 task = get_proc_task(file_inode(file));
··· 2235 2245 .llseek = seq_lseek,
2236 2246 .release = seq_release_private,
2237 2247 };
2238 - #endif /* CONFIG_CHECKPOINT_RESTORE */
2239 2248
2240 2249 static int proc_pident_instantiate(struct inode *dir,
2241 2250 struct dentry *dentry, struct task_struct *task, const void *ptr)
··· 2470 2481 {
2471 2482 struct task_struct *task;
2472 2483 struct mm_struct *mm;
2473 - char buffer[PROC_NUMBUF], *end;
2474 2484 unsigned int val;
2475 2485 int ret;
2476 2486 int i;
2477 2487 unsigned long mask;
2478 2488
2479 - ret = -EFAULT;
2480 - memset(buffer, 0, sizeof(buffer));
2481 - if (count > sizeof(buffer) - 1)
2482 - count = sizeof(buffer) - 1;
2483 - if (copy_from_user(buffer, buf, count))
2484 - goto out_no_task;
2485 -
2486 - ret = -EINVAL;
2487 - val = (unsigned int)simple_strtoul(buffer, &end, 0);
2488 - if (*end == '\n')
2489 - end++;
2490 - if (end - buffer == 0)
2491 - goto out_no_task;
2489 + ret = kstrtouint_from_user(buf, count, 0, &val);
2490 + if (ret < 0)
2491 + return ret;
2492 2492
2493 2493 ret = -ESRCH;
2494 2494 task = get_proc_task(file_inode(file));
2495 2495 if (!task)
2496 2496 goto out_no_task;
2497 2497
2498 - ret = end - buffer;
2499 2498 mm = get_task_mm(task);
2500 2499 if (!mm)
2501 2500 goto out_no_mm;
··· 2499 2522 out_no_mm:
2500 2523 put_task_struct(task);
2501 2524 out_no_task:
2502 - return ret;
2525 + if (ret < 0)
2526 + return ret;
2527 + return count;
2503 2528 }
2504 2529
2505 2530 static const struct file_operations proc_coredump_filter_operations = {
··· 2723 2744 static const struct pid_entry tgid_base_stuff[] = {
2724 2745 DIR("task", S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations),
2725 2746 DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
2726 - #ifdef CONFIG_CHECKPOINT_RESTORE
2727 2747 DIR("map_files", S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations),
2728 - #endif
2729 2748 DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
2730 2749 DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
2731 2750 #ifdef CONFIG_NET
+22 -22
fs/proc/generic.c
··· 26 26 27 27 #include "internal.h" 28 28 29 - static DEFINE_SPINLOCK(proc_subdir_lock); 29 + static DEFINE_RWLOCK(proc_subdir_lock); 30 30 31 31 static int proc_match(unsigned int len, const char *name, struct proc_dir_entry *de) 32 32 { ··· 172 172 { 173 173 int rv; 174 174 175 - spin_lock(&proc_subdir_lock); 175 + read_lock(&proc_subdir_lock); 176 176 rv = __xlate_proc_name(name, ret, residual); 177 - spin_unlock(&proc_subdir_lock); 177 + read_unlock(&proc_subdir_lock); 178 178 return rv; 179 179 } 180 180 ··· 231 231 { 232 232 struct inode *inode; 233 233 234 - spin_lock(&proc_subdir_lock); 234 + read_lock(&proc_subdir_lock); 235 235 de = pde_subdir_find(de, dentry->d_name.name, dentry->d_name.len); 236 236 if (de) { 237 237 pde_get(de); 238 - spin_unlock(&proc_subdir_lock); 238 + read_unlock(&proc_subdir_lock); 239 239 inode = proc_get_inode(dir->i_sb, de); 240 240 if (!inode) 241 241 return ERR_PTR(-ENOMEM); ··· 243 243 d_add(dentry, inode); 244 244 return NULL; 245 245 } 246 - spin_unlock(&proc_subdir_lock); 246 + read_unlock(&proc_subdir_lock); 247 247 return ERR_PTR(-ENOENT); 248 248 } 249 249 ··· 270 270 if (!dir_emit_dots(file, ctx)) 271 271 return 0; 272 272 273 - spin_lock(&proc_subdir_lock); 273 + read_lock(&proc_subdir_lock); 274 274 de = pde_subdir_first(de); 275 275 i = ctx->pos - 2; 276 276 for (;;) { 277 277 if (!de) { 278 - spin_unlock(&proc_subdir_lock); 278 + read_unlock(&proc_subdir_lock); 279 279 return 0; 280 280 } 281 281 if (!i) ··· 287 287 do { 288 288 struct proc_dir_entry *next; 289 289 pde_get(de); 290 - spin_unlock(&proc_subdir_lock); 290 + read_unlock(&proc_subdir_lock); 291 291 if (!dir_emit(ctx, de->name, de->namelen, 292 292 de->low_ino, de->mode >> 12)) { 293 293 pde_put(de); 294 294 return 0; 295 295 } 296 - spin_lock(&proc_subdir_lock); 296 + read_lock(&proc_subdir_lock); 297 297 ctx->pos++; 298 298 next = pde_subdir_next(de); 299 299 pde_put(de); 300 300 de = next; 301 301 } while (de); 302 - spin_unlock(&proc_subdir_lock); 
302 + read_unlock(&proc_subdir_lock); 303 303 return 1; 304 304 } 305 305 ··· 338 338 if (ret) 339 339 return ret; 340 340 341 - spin_lock(&proc_subdir_lock); 341 + write_lock(&proc_subdir_lock); 342 342 dp->parent = dir; 343 343 if (pde_subdir_insert(dir, dp) == false) { 344 344 WARN(1, "proc_dir_entry '%s/%s' already registered\n", 345 345 dir->name, dp->name); 346 - spin_unlock(&proc_subdir_lock); 346 + write_unlock(&proc_subdir_lock); 347 347 proc_free_inum(dp->low_ino); 348 348 return -EEXIST; 349 349 } 350 - spin_unlock(&proc_subdir_lock); 350 + write_unlock(&proc_subdir_lock); 351 351 352 352 return 0; 353 353 } ··· 549 549 const char *fn = name; 550 550 unsigned int len; 551 551 552 - spin_lock(&proc_subdir_lock); 552 + write_lock(&proc_subdir_lock); 553 553 if (__xlate_proc_name(name, &parent, &fn) != 0) { 554 - spin_unlock(&proc_subdir_lock); 554 + write_unlock(&proc_subdir_lock); 555 555 return; 556 556 } 557 557 len = strlen(fn); ··· 559 559 de = pde_subdir_find(parent, fn, len); 560 560 if (de) 561 561 rb_erase(&de->subdir_node, &parent->subdir); 562 - spin_unlock(&proc_subdir_lock); 562 + write_unlock(&proc_subdir_lock); 563 563 if (!de) { 564 564 WARN(1, "name '%s'\n", name); 565 565 return; ··· 583 583 const char *fn = name; 584 584 unsigned int len; 585 585 586 - spin_lock(&proc_subdir_lock); 586 + write_lock(&proc_subdir_lock); 587 587 if (__xlate_proc_name(name, &parent, &fn) != 0) { 588 - spin_unlock(&proc_subdir_lock); 588 + write_unlock(&proc_subdir_lock); 589 589 return -ENOENT; 590 590 } 591 591 len = strlen(fn); 592 592 593 593 root = pde_subdir_find(parent, fn, len); 594 594 if (!root) { 595 - spin_unlock(&proc_subdir_lock); 595 + write_unlock(&proc_subdir_lock); 596 596 return -ENOENT; 597 597 } 598 598 rb_erase(&root->subdir_node, &parent->subdir); ··· 605 605 de = next; 606 606 continue; 607 607 } 608 - spin_unlock(&proc_subdir_lock); 608 + write_unlock(&proc_subdir_lock); 609 609 610 610 proc_entry_rundown(de); 611 611 next = 
de->parent; ··· 616 616 break; 617 617 pde_put(de); 618 618 619 - spin_lock(&proc_subdir_lock); 619 + write_lock(&proc_subdir_lock); 620 620 de = next; 621 621 } 622 622 pde_put(root);
+65
fs/proc/page.c
··· 9 9 #include <linux/proc_fs.h> 10 10 #include <linux/seq_file.h> 11 11 #include <linux/hugetlb.h> 12 + #include <linux/memcontrol.h> 13 + #include <linux/mmu_notifier.h> 14 + #include <linux/page_idle.h> 12 15 #include <linux/kernel-page-flags.h> 13 16 #include <asm/uaccess.h> 14 17 #include "internal.h" 15 18 16 19 #define KPMSIZE sizeof(u64) 17 20 #define KPMMASK (KPMSIZE - 1) 21 + #define KPMBITS (KPMSIZE * BITS_PER_BYTE) 18 22 19 23 /* /proc/kpagecount - an array exposing page counts 20 24 * ··· 58 54 pfn++; 59 55 out++; 60 56 count -= KPMSIZE; 57 + 58 + cond_resched(); 61 59 } 62 60 63 61 *ppos += (char __user *)out - buf; ··· 152 146 if (PageBalloon(page)) 153 147 u |= 1 << KPF_BALLOON; 154 148 149 + if (page_is_idle(page)) 150 + u |= 1 << KPF_IDLE; 151 + 155 152 u |= kpf_copy_bit(k, KPF_LOCKED, PG_locked); 156 153 157 154 u |= kpf_copy_bit(k, KPF_SLAB, PG_slab); ··· 221 212 pfn++; 222 213 out++; 223 214 count -= KPMSIZE; 215 + 216 + cond_resched(); 224 217 } 225 218 226 219 *ppos += (char __user *)out - buf; ··· 236 225 .read = kpageflags_read, 237 226 }; 238 227 228 + #ifdef CONFIG_MEMCG 229 + static ssize_t kpagecgroup_read(struct file *file, char __user *buf, 230 + size_t count, loff_t *ppos) 231 + { 232 + u64 __user *out = (u64 __user *)buf; 233 + struct page *ppage; 234 + unsigned long src = *ppos; 235 + unsigned long pfn; 236 + ssize_t ret = 0; 237 + u64 ino; 238 + 239 + pfn = src / KPMSIZE; 240 + count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src); 241 + if (src & KPMMASK || count & KPMMASK) 242 + return -EINVAL; 243 + 244 + while (count > 0) { 245 + if (pfn_valid(pfn)) 246 + ppage = pfn_to_page(pfn); 247 + else 248 + ppage = NULL; 249 + 250 + if (ppage) 251 + ino = page_cgroup_ino(ppage); 252 + else 253 + ino = 0; 254 + 255 + if (put_user(ino, out)) { 256 + ret = -EFAULT; 257 + break; 258 + } 259 + 260 + pfn++; 261 + out++; 262 + count -= KPMSIZE; 263 + 264 + cond_resched(); 265 + } 266 + 267 + *ppos += (char __user *)out - buf; 268 + 
if (!ret) 269 + ret = (char __user *)out - buf; 270 + return ret; 271 + } 272 + 273 + static const struct file_operations proc_kpagecgroup_operations = { 274 + .llseek = mem_lseek, 275 + .read = kpagecgroup_read, 276 + }; 277 + #endif /* CONFIG_MEMCG */ 278 + 239 279 static int __init proc_page_init(void) 240 280 { 241 281 proc_create("kpagecount", S_IRUSR, NULL, &proc_kpagecount_operations); 242 282 proc_create("kpageflags", S_IRUSR, NULL, &proc_kpageflags_operations); 283 + #ifdef CONFIG_MEMCG 284 + proc_create("kpagecgroup", S_IRUSR, NULL, &proc_kpagecgroup_operations); 285 + #endif 243 286 return 0; 244 287 } 245 288 fs_initcall(proc_page_init);
+4 -1
fs/proc/task_mmu.c
··· 13 13 #include <linux/swap.h> 14 14 #include <linux/swapops.h> 15 15 #include <linux/mmu_notifier.h> 16 + #include <linux/page_idle.h> 16 17 17 18 #include <asm/elf.h> 18 19 #include <asm/uaccess.h> ··· 460 459 461 460 mss->resident += size; 462 461 /* Accumulate the size in pages that have been accessed. */ 463 - if (young || PageReferenced(page)) 462 + if (young || page_is_young(page) || PageReferenced(page)) 464 463 mss->referenced += size; 465 464 mapcount = page_mapcount(page); 466 465 if (mapcount >= 2) { ··· 808 807 809 808 /* Clear accessed and referenced bits. */ 810 809 pmdp_test_and_clear_young(vma, addr, pmd); 810 + test_and_clear_page_young(page); 811 811 ClearPageReferenced(page); 812 812 out: 813 813 spin_unlock(ptl); ··· 836 834 837 835 /* Clear accessed and referenced bits. */ 838 836 ptep_test_and_clear_young(vma, addr, pte); 837 + test_and_clear_page_young(page); 839 838 ClearPageReferenced(page); 840 839 } 841 840 pte_unmap_unlock(pte - 1, ptl);
+42
fs/seq_file.c
··· 12 12 #include <linux/slab.h> 13 13 #include <linux/cred.h> 14 14 #include <linux/mm.h> 15 + #include <linux/printk.h> 15 16 16 17 #include <asm/uaccess.h> 17 18 #include <asm/page.h> ··· 773 772 seq_putc(m, c); 774 773 } 775 774 EXPORT_SYMBOL(seq_pad); 775 + 776 + /* A complete analogue of print_hex_dump() */ 777 + void seq_hex_dump(struct seq_file *m, const char *prefix_str, int prefix_type, 778 + int rowsize, int groupsize, const void *buf, size_t len, 779 + bool ascii) 780 + { 781 + const u8 *ptr = buf; 782 + int i, linelen, remaining = len; 783 + int ret; 784 + 785 + if (rowsize != 16 && rowsize != 32) 786 + rowsize = 16; 787 + 788 + for (i = 0; i < len && !seq_has_overflowed(m); i += rowsize) { 789 + linelen = min(remaining, rowsize); 790 + remaining -= rowsize; 791 + 792 + switch (prefix_type) { 793 + case DUMP_PREFIX_ADDRESS: 794 + seq_printf(m, "%s%p: ", prefix_str, ptr + i); 795 + break; 796 + case DUMP_PREFIX_OFFSET: 797 + seq_printf(m, "%s%.8x: ", prefix_str, i); 798 + break; 799 + default: 800 + seq_printf(m, "%s", prefix_str); 801 + break; 802 + } 803 + 804 + ret = hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize, 805 + m->buf + m->count, m->size - m->count, 806 + ascii); 807 + if (ret >= m->size - m->count) { 808 + seq_set_overflow(m); 809 + } else { 810 + m->count += ret; 811 + seq_putc(m, '\n'); 812 + } 813 + } 814 + } 815 + EXPORT_SYMBOL(seq_hex_dump); 776 816 777 817 struct list_head *seq_list_start(struct list_head *head, loff_t pos) 778 818 {
+118
include/asm-generic/dma-mapping-common.h
··· 6 6 #include <linux/scatterlist.h> 7 7 #include <linux/dma-debug.h> 8 8 #include <linux/dma-attrs.h> 9 + #include <asm-generic/dma-coherent.h> 9 10 10 11 static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr, 11 12 size_t size, ··· 237 236 } 238 237 239 238 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, NULL) 239 + 240 + #ifndef arch_dma_alloc_attrs 241 + #define arch_dma_alloc_attrs(dev, flag) (true) 242 + #endif 243 + 244 + static inline void *dma_alloc_attrs(struct device *dev, size_t size, 245 + dma_addr_t *dma_handle, gfp_t flag, 246 + struct dma_attrs *attrs) 247 + { 248 + struct dma_map_ops *ops = get_dma_ops(dev); 249 + void *cpu_addr; 250 + 251 + BUG_ON(!ops); 252 + 253 + if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr)) 254 + return cpu_addr; 255 + 256 + if (!arch_dma_alloc_attrs(&dev, &flag)) 257 + return NULL; 258 + if (!ops->alloc) 259 + return NULL; 260 + 261 + cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs); 262 + debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); 263 + return cpu_addr; 264 + } 265 + 266 + static inline void dma_free_attrs(struct device *dev, size_t size, 267 + void *cpu_addr, dma_addr_t dma_handle, 268 + struct dma_attrs *attrs) 269 + { 270 + struct dma_map_ops *ops = get_dma_ops(dev); 271 + 272 + BUG_ON(!ops); 273 + WARN_ON(irqs_disabled()); 274 + 275 + if (dma_release_from_coherent(dev, get_order(size), cpu_addr)) 276 + return; 277 + 278 + if (!ops->free) 279 + return; 280 + 281 + debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); 282 + ops->free(dev, size, cpu_addr, dma_handle, attrs); 283 + } 284 + 285 + static inline void *dma_alloc_coherent(struct device *dev, size_t size, 286 + dma_addr_t *dma_handle, gfp_t flag) 287 + { 288 + return dma_alloc_attrs(dev, size, dma_handle, flag, NULL); 289 + } 290 + 291 + static inline void dma_free_coherent(struct device *dev, size_t size, 292 + void *cpu_addr, dma_addr_t dma_handle) 293 + { 294 + 
return dma_free_attrs(dev, size, cpu_addr, dma_handle, NULL); 295 + } 296 + 297 + static inline void *dma_alloc_noncoherent(struct device *dev, size_t size, 298 + dma_addr_t *dma_handle, gfp_t gfp) 299 + { 300 + DEFINE_DMA_ATTRS(attrs); 301 + 302 + dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs); 303 + return dma_alloc_attrs(dev, size, dma_handle, gfp, &attrs); 304 + } 305 + 306 + static inline void dma_free_noncoherent(struct device *dev, size_t size, 307 + void *cpu_addr, dma_addr_t dma_handle) 308 + { 309 + DEFINE_DMA_ATTRS(attrs); 310 + 311 + dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs); 312 + dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs); 313 + } 314 + 315 + static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 316 + { 317 + debug_dma_mapping_error(dev, dma_addr); 318 + 319 + if (get_dma_ops(dev)->mapping_error) 320 + return get_dma_ops(dev)->mapping_error(dev, dma_addr); 321 + 322 + #ifdef DMA_ERROR_CODE 323 + return dma_addr == DMA_ERROR_CODE; 324 + #else 325 + return 0; 326 + #endif 327 + } 328 + 329 + #ifndef HAVE_ARCH_DMA_SUPPORTED 330 + static inline int dma_supported(struct device *dev, u64 mask) 331 + { 332 + struct dma_map_ops *ops = get_dma_ops(dev); 333 + 334 + if (!ops) 335 + return 0; 336 + if (!ops->dma_supported) 337 + return 1; 338 + return ops->dma_supported(dev, mask); 339 + } 340 + #endif 341 + 342 + #ifndef HAVE_ARCH_DMA_SET_MASK 343 + static inline int dma_set_mask(struct device *dev, u64 mask) 344 + { 345 + struct dma_map_ops *ops = get_dma_ops(dev); 346 + 347 + if (ops->set_dma_mask) 348 + return ops->set_dma_mask(dev, mask); 349 + 350 + if (!dev->dma_mask || !dma_supported(dev, mask)) 351 + return -EIO; 352 + *dev->dma_mask = mask; 353 + return 0; 354 + } 355 + #endif 240 356 241 357 #endif
+14 -3
include/linux/kexec.h
··· 16 16 17 17 #include <uapi/linux/kexec.h> 18 18 19 - #ifdef CONFIG_KEXEC 19 + #ifdef CONFIG_KEXEC_CORE 20 20 #include <linux/list.h> 21 21 #include <linux/linkage.h> 22 22 #include <linux/compat.h> ··· 318 318 size_t crash_get_memory_size(void); 319 319 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); 320 320 321 - #else /* !CONFIG_KEXEC */ 321 + int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf, 322 + unsigned long buf_len); 323 + void * __weak arch_kexec_kernel_image_load(struct kimage *image); 324 + int __weak arch_kimage_file_post_load_cleanup(struct kimage *image); 325 + int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, 326 + unsigned long buf_len); 327 + int __weak arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr, 328 + Elf_Shdr *sechdrs, unsigned int relsec); 329 + int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, 330 + unsigned int relsec); 331 + 332 + #else /* !CONFIG_KEXEC_CORE */ 322 333 struct pt_regs; 323 334 struct task_struct; 324 335 static inline void crash_kexec(struct pt_regs *regs) { } 325 336 static inline int kexec_should_crash(struct task_struct *p) { return 0; } 326 337 #define kexec_in_progress false 327 - #endif /* CONFIG_KEXEC */ 338 + #endif /* CONFIG_KEXEC_CORE */ 328 339 329 340 #endif /* !defined(__ASSEBMLY__) */ 330 341
-2
include/linux/kmod.h
··· 85 85 UMH_DISABLED, 86 86 }; 87 87 88 - extern void usermodehelper_init(void); 89 - 90 88 extern int __usermodehelper_disable(enum umh_disable_depth depth); 91 89 extern void __usermodehelper_set_disable_depth(enum umh_disable_depth depth); 92 90
+2 -8
include/linux/memcontrol.h
··· 305 305 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *); 306 306 307 307 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg); 308 - 309 - struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); 310 308 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); 311 - 312 309 struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg); 310 + 313 311 static inline 314 312 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ 315 313 return css ? container_of(css, struct mem_cgroup, css) : NULL; ··· 343 345 } 344 346 345 347 struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page); 348 + ino_t page_cgroup_ino(struct page *page); 346 349 347 350 static inline bool mem_cgroup_disabled(void) 348 351 { ··· 552 553 struct zone *zone) 553 554 { 554 555 return &zone->lruvec; 555 - } 556 - 557 - static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) 558 - { 559 - return NULL; 560 556 } 561 557 562 558 static inline bool mm_match_cgroup(struct mm_struct *mm,
+10 -2
include/linux/mm.h
··· 1873 1873 1874 1874 extern unsigned long mmap_region(struct file *file, unsigned long addr, 1875 1875 unsigned long len, vm_flags_t vm_flags, unsigned long pgoff); 1876 - extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, 1876 + extern unsigned long do_mmap(struct file *file, unsigned long addr, 1877 1877 unsigned long len, unsigned long prot, unsigned long flags, 1878 - unsigned long pgoff, unsigned long *populate); 1878 + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate); 1879 1879 extern int do_munmap(struct mm_struct *, unsigned long, size_t); 1880 + 1881 + static inline unsigned long 1882 + do_mmap_pgoff(struct file *file, unsigned long addr, 1883 + unsigned long len, unsigned long prot, unsigned long flags, 1884 + unsigned long pgoff, unsigned long *populate) 1885 + { 1886 + return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate); 1887 + } 1880 1888 1881 1889 #ifdef CONFIG_MMU 1882 1890 extern int __mm_populate(unsigned long addr, unsigned long len,
+46
include/linux/mmu_notifier.h
··· 66 66 unsigned long end); 67 67 68 68 /* 69 + * clear_young is a lightweight version of clear_flush_young. Like the 70 + * latter, it is supposed to test-and-clear the young/accessed bitflag 71 + * in the secondary pte, but it may omit flushing the secondary tlb. 72 + */ 73 + int (*clear_young)(struct mmu_notifier *mn, 74 + struct mm_struct *mm, 75 + unsigned long start, 76 + unsigned long end); 77 + 78 + /* 69 79 * test_young is called to check the young/accessed bitflag in 70 80 * the secondary pte. This is used to know if the page is 71 81 * frequently used without actually clearing the flag or tearing ··· 213 203 extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, 214 204 unsigned long start, 215 205 unsigned long end); 206 + extern int __mmu_notifier_clear_young(struct mm_struct *mm, 207 + unsigned long start, 208 + unsigned long end); 216 209 extern int __mmu_notifier_test_young(struct mm_struct *mm, 217 210 unsigned long address); 218 211 extern void __mmu_notifier_change_pte(struct mm_struct *mm, ··· 241 228 { 242 229 if (mm_has_notifiers(mm)) 243 230 return __mmu_notifier_clear_flush_young(mm, start, end); 231 + return 0; 232 + } 233 + 234 + static inline int mmu_notifier_clear_young(struct mm_struct *mm, 235 + unsigned long start, 236 + unsigned long end) 237 + { 238 + if (mm_has_notifiers(mm)) 239 + return __mmu_notifier_clear_young(mm, start, end); 244 240 return 0; 245 241 } 246 242 ··· 330 308 ___address, \ 331 309 ___address + \ 332 310 PMD_SIZE); \ 311 + __young; \ 312 + }) 313 + 314 + #define ptep_clear_young_notify(__vma, __address, __ptep) \ 315 + ({ \ 316 + int __young; \ 317 + struct vm_area_struct *___vma = __vma; \ 318 + unsigned long ___address = __address; \ 319 + __young = ptep_test_and_clear_young(___vma, ___address, __ptep);\ 320 + __young |= mmu_notifier_clear_young(___vma->vm_mm, ___address, \ 321 + ___address + PAGE_SIZE); \ 322 + __young; \ 323 + }) 324 + 325 + #define pmdp_clear_young_notify(__vma, __address, 
__pmdp) \ 326 + ({ \ 327 + int __young; \ 328 + struct vm_area_struct *___vma = __vma; \ 329 + unsigned long ___address = __address; \ 330 + __young = pmdp_test_and_clear_young(___vma, ___address, __pmdp);\ 331 + __young |= mmu_notifier_clear_young(___vma->vm_mm, ___address, \ 332 + ___address + PMD_SIZE); \ 333 333 __young; \ 334 334 }) 335 335 ··· 471 427 472 428 #define ptep_clear_flush_young_notify ptep_clear_flush_young 473 429 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young 430 + #define ptep_clear_young_notify ptep_test_and_clear_young 431 + #define pmdp_clear_young_notify pmdp_test_and_clear_young 474 432 #define ptep_clear_flush_notify ptep_clear_flush 475 433 #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush 476 434 #define pmdp_huge_get_and_clear_notify pmdp_huge_get_and_clear
+11
include/linux/page-flags.h
··· 109 109 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 110 110 PG_compound_lock, 111 111 #endif 112 + #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) 113 + PG_young, 114 + PG_idle, 115 + #endif 112 116 __NR_PAGEFLAGS, 113 117 114 118 /* Filesystems */ ··· 291 287 #else 292 288 PAGEFLAG_FALSE(HWPoison) 293 289 #define __PG_HWPOISON 0 290 + #endif 291 + 292 + #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) 293 + TESTPAGEFLAG(Young, young) 294 + SETPAGEFLAG(Young, young) 295 + TESTCLEARFLAG(Young, young) 296 + PAGEFLAG(Idle, idle) 294 297 #endif 295 298 296 299 /*
+4
include/linux/page_ext.h
··· 26 26 PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ 27 27 PAGE_EXT_DEBUG_GUARD, 28 28 PAGE_EXT_OWNER, 29 + #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) 30 + PAGE_EXT_YOUNG, 31 + PAGE_EXT_IDLE, 32 + #endif 29 33 }; 30 34 31 35 /*
+110
include/linux/page_idle.h
··· 1 + #ifndef _LINUX_MM_PAGE_IDLE_H 2 + #define _LINUX_MM_PAGE_IDLE_H 3 + 4 + #include <linux/bitops.h> 5 + #include <linux/page-flags.h> 6 + #include <linux/page_ext.h> 7 + 8 + #ifdef CONFIG_IDLE_PAGE_TRACKING 9 + 10 + #ifdef CONFIG_64BIT 11 + static inline bool page_is_young(struct page *page) 12 + { 13 + return PageYoung(page); 14 + } 15 + 16 + static inline void set_page_young(struct page *page) 17 + { 18 + SetPageYoung(page); 19 + } 20 + 21 + static inline bool test_and_clear_page_young(struct page *page) 22 + { 23 + return TestClearPageYoung(page); 24 + } 25 + 26 + static inline bool page_is_idle(struct page *page) 27 + { 28 + return PageIdle(page); 29 + } 30 + 31 + static inline void set_page_idle(struct page *page) 32 + { 33 + SetPageIdle(page); 34 + } 35 + 36 + static inline void clear_page_idle(struct page *page) 37 + { 38 + ClearPageIdle(page); 39 + } 40 + #else /* !CONFIG_64BIT */ 41 + /* 42 + * If there is not enough space to store Idle and Young bits in page flags, use 43 + * page ext flags instead. 
44 + */ 45 + extern struct page_ext_operations page_idle_ops; 46 + 47 + static inline bool page_is_young(struct page *page) 48 + { 49 + return test_bit(PAGE_EXT_YOUNG, &lookup_page_ext(page)->flags); 50 + } 51 + 52 + static inline void set_page_young(struct page *page) 53 + { 54 + set_bit(PAGE_EXT_YOUNG, &lookup_page_ext(page)->flags); 55 + } 56 + 57 + static inline bool test_and_clear_page_young(struct page *page) 58 + { 59 + return test_and_clear_bit(PAGE_EXT_YOUNG, 60 + &lookup_page_ext(page)->flags); 61 + } 62 + 63 + static inline bool page_is_idle(struct page *page) 64 + { 65 + return test_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags); 66 + } 67 + 68 + static inline void set_page_idle(struct page *page) 69 + { 70 + set_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags); 71 + } 72 + 73 + static inline void clear_page_idle(struct page *page) 74 + { 75 + clear_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags); 76 + } 77 + #endif /* CONFIG_64BIT */ 78 + 79 + #else /* !CONFIG_IDLE_PAGE_TRACKING */ 80 + 81 + static inline bool page_is_young(struct page *page) 82 + { 83 + return false; 84 + } 85 + 86 + static inline void set_page_young(struct page *page) 87 + { 88 + } 89 + 90 + static inline bool test_and_clear_page_young(struct page *page) 91 + { 92 + return false; 93 + } 94 + 95 + static inline bool page_is_idle(struct page *page) 96 + { 97 + return false; 98 + } 99 + 100 + static inline void set_page_idle(struct page *page) 101 + { 102 + } 103 + 104 + static inline void clear_page_idle(struct page *page) 105 + { 106 + } 107 + 108 + #endif /* CONFIG_IDLE_PAGE_TRACKING */ 109 + 110 + #endif /* _LINUX_MM_PAGE_IDLE_H */
+2 -9
include/linux/poison.h
··· 19 19 * under normal circumstances, used to verify that nobody uses 20 20 * non-initialized list entries. 21 21 */ 22 - #define LIST_POISON1 ((void *) 0x00100100 + POISON_POINTER_DELTA) 23 - #define LIST_POISON2 ((void *) 0x00200200 + POISON_POINTER_DELTA) 22 + #define LIST_POISON1 ((void *) 0x100 + POISON_POINTER_DELTA) 23 + #define LIST_POISON2 ((void *) 0x200 + POISON_POINTER_DELTA) 24 24 25 25 /********** include/linux/timer.h **********/ 26 26 /* ··· 69 69 #define ATM_POISON_FREE 0x12 70 70 #define ATM_POISON 0xdeadbeef 71 71 72 - /********** net/ **********/ 73 - #define NEIGHBOR_DEAD 0xdeadbeef 74 - #define NETFILTER_LINK_POISON 0xdead57ac 75 - 76 72 /********** kernel/mutexes **********/ 77 73 #define MUTEX_DEBUG_INIT 0x11 78 74 #define MUTEX_DEBUG_FREE 0x22 ··· 78 82 79 83 /********** security/ **********/ 80 84 #define KEY_DESTROY 0xbd 81 - 82 - /********** sound/oss/ **********/ 83 - #define OSS_POISON_FREE 0xAB 84 85 85 86 #endif
+10 -4
include/linux/printk.h
··· 404 404 static DEFINE_RATELIMIT_STATE(_rs, \ 405 405 DEFAULT_RATELIMIT_INTERVAL, \ 406 406 DEFAULT_RATELIMIT_BURST); \ 407 - DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt); \ 407 + DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, pr_fmt(fmt)); \ 408 408 if (unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT) && \ 409 409 __ratelimit(&_rs)) \ 410 - __dynamic_pr_debug(&descriptor, fmt, ##__VA_ARGS__); \ 410 + __dynamic_pr_debug(&descriptor, pr_fmt(fmt), ##__VA_ARGS__); \ 411 411 } while (0) 412 412 #elif defined(DEBUG) 413 413 #define pr_debug_ratelimited(fmt, ...) \ ··· 456 456 groupsize, buf, len, ascii) \ 457 457 dynamic_hex_dump(prefix_str, prefix_type, rowsize, \ 458 458 groupsize, buf, len, ascii) 459 - #else 459 + #elif defined(DEBUG) 460 460 #define print_hex_dump_debug(prefix_str, prefix_type, rowsize, \ 461 461 groupsize, buf, len, ascii) \ 462 462 print_hex_dump(KERN_DEBUG, prefix_str, prefix_type, rowsize, \ 463 463 groupsize, buf, len, ascii) 464 - #endif /* defined(CONFIG_DYNAMIC_DEBUG) */ 464 + #else 465 + static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type, 466 + int rowsize, int groupsize, 467 + const void *buf, size_t len, bool ascii) 468 + { 469 + } 470 + #endif 465 471 466 472 #endif
+4
include/linux/seq_file.h
··· 122 122 __printf(2, 3) int seq_printf(struct seq_file *, const char *, ...); 123 123 __printf(2, 0) int seq_vprintf(struct seq_file *, const char *, va_list args); 124 124 125 + void seq_hex_dump(struct seq_file *m, const char *prefix_str, int prefix_type, 126 + int rowsize, int groupsize, const void *buf, size_t len, 127 + bool ascii); 128 + 125 129 int seq_path(struct seq_file *, const struct path *, const char *); 126 130 int seq_file_path(struct seq_file *, struct file *, const char *); 127 131 int seq_dentry(struct seq_file *, struct dentry *, const char *);
+7 -7
include/linux/string_helpers.h
··· 48 48 #define ESCAPE_HEX 0x20 49 49 50 50 int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, 51 - unsigned int flags, const char *esc); 51 + unsigned int flags, const char *only); 52 52 53 53 static inline int string_escape_mem_any_np(const char *src, size_t isz, 54 - char *dst, size_t osz, const char *esc) 54 + char *dst, size_t osz, const char *only) 55 55 { 56 - return string_escape_mem(src, isz, dst, osz, ESCAPE_ANY_NP, esc); 56 + return string_escape_mem(src, isz, dst, osz, ESCAPE_ANY_NP, only); 57 57 } 58 58 59 59 static inline int string_escape_str(const char *src, char *dst, size_t sz, 60 - unsigned int flags, const char *esc) 60 + unsigned int flags, const char *only) 61 61 { 62 - return string_escape_mem(src, strlen(src), dst, sz, flags, esc); 62 + return string_escape_mem(src, strlen(src), dst, sz, flags, only); 63 63 } 64 64 65 65 static inline int string_escape_str_any_np(const char *src, char *dst, 66 - size_t sz, const char *esc) 66 + size_t sz, const char *only) 67 67 { 68 - return string_escape_str(src, dst, sz, ESCAPE_ANY_NP, esc); 68 + return string_escape_str(src, dst, sz, ESCAPE_ANY_NP, only); 69 69 } 70 70 71 71 #endif
+2
include/linux/zpool.h
··· 36 36 ZPOOL_MM_DEFAULT = ZPOOL_MM_RW 37 37 }; 38 38 39 + bool zpool_has_pool(char *type); 40 + 39 41 struct zpool *zpool_create_pool(char *type, char *name, 40 42 gfp_t gfp, const struct zpool_ops *ops); 41 43
+1
include/uapi/linux/kernel-page-flags.h
··· 33 33 #define KPF_THP 22 34 34 #define KPF_BALLOON 23 35 35 #define KPF_ZERO_PAGE 24 36 + #define KPF_IDLE 25 36 37 37 38 38 39 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
+2 -2
init/initramfs.c
··· 526 526 527 527 static void __init free_initrd(void) 528 528 { 529 - #ifdef CONFIG_KEXEC 529 + #ifdef CONFIG_KEXEC_CORE 530 530 unsigned long crashk_start = (unsigned long)__va(crashk_res.start); 531 531 unsigned long crashk_end = (unsigned long)__va(crashk_res.end); 532 532 #endif 533 533 if (do_retain_initrd) 534 534 goto skip; 535 535 536 - #ifdef CONFIG_KEXEC 536 + #ifdef CONFIG_KEXEC_CORE 537 537 /* 538 538 * If the initrd region is overlapped with crashkernel reserved region, 539 539 * free only memory that is not part of crashkernel region.
-1
init/main.c
··· 877 877 static void __init do_basic_setup(void) 878 878 { 879 879 cpuset_init_smp(); 880 - usermodehelper_init(); 881 880 shmem_init(); 882 881 driver_init(); 883 882 init_irq_proc();
+1 -1
ipc/msgutil.c
··· 123 123 size_t len = src->m_ts; 124 124 size_t alen; 125 125 126 - BUG_ON(dst == NULL); 126 + WARN_ON(dst == NULL); 127 127 if (src->m_ts > dst->m_ts) 128 128 return ERR_PTR(-EINVAL); 129 129
+2 -2
ipc/shm.c
··· 159 159 * We raced in the idr lookup or with shm_destroy(). Either way, the 160 160 * ID is busted. 161 161 */ 162 - BUG_ON(IS_ERR(ipcp)); 162 + WARN_ON(IS_ERR(ipcp)); 163 163 164 164 return container_of(ipcp, struct shmid_kernel, shm_perm); 165 165 } ··· 393 393 return ret; 394 394 sfd->vm_ops = vma->vm_ops; 395 395 #ifdef CONFIG_MMU 396 - BUG_ON(!sfd->vm_ops->fault); 396 + WARN_ON(!sfd->vm_ops->fault); 397 397 #endif 398 398 vma->vm_ops = &shm_vm_ops; 399 399 shm_open(vma);
+2
kernel/Makefile
··· 49 49 obj-$(CONFIG_MODULE_SIG) += module_signing.o 50 50 obj-$(CONFIG_KALLSYMS) += kallsyms.o 51 51 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o 52 + obj-$(CONFIG_KEXEC_CORE) += kexec_core.o 52 53 obj-$(CONFIG_KEXEC) += kexec.o 54 + obj-$(CONFIG_KEXEC_FILE) += kexec_file.o 53 55 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o 54 56 obj-$(CONFIG_COMPAT) += compat.o 55 57 obj-$(CONFIG_CGROUPS) += cgroup.o
+9 -4
kernel/cred.c
··· 20 20 #include <linux/cn_proc.h> 21 21 22 22 #if 0 23 - #define kdebug(FMT, ...) \ 24 - printk("[%-5.5s%5u] "FMT"\n", current->comm, current->pid ,##__VA_ARGS__) 23 + #define kdebug(FMT, ...) \ 24 + printk("[%-5.5s%5u] " FMT "\n", \ 25 + current->comm, current->pid, ##__VA_ARGS__) 25 26 #else 26 - #define kdebug(FMT, ...) \ 27 - no_printk("[%-5.5s%5u] "FMT"\n", current->comm, current->pid ,##__VA_ARGS__) 27 + #define kdebug(FMT, ...) \ 28 + do { \ 29 + if (0) \ 30 + no_printk("[%-5.5s%5u] " FMT "\n", \ 31 + current->comm, current->pid, ##__VA_ARGS__); \ 32 + } while (0) 28 33 #endif 29 34 30 35 static struct kmem_cache *cred_jar;
+1 -1
kernel/events/core.c
··· 9094 9094 mutex_unlock(&swhash->hlist_mutex); 9095 9095 } 9096 9096 9097 - #if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC 9097 + #if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC_CORE 9098 9098 static void __perf_event_exit_context(void *__info) 9099 9099 { 9100 9100 struct remove_event re = { .detach_group = true };
-1
kernel/extable.c
··· 18 18 #include <linux/ftrace.h> 19 19 #include <linux/memory.h> 20 20 #include <linux/module.h> 21 - #include <linux/ftrace.h> 22 21 #include <linux/mutex.h> 23 22 #include <linux/init.h> 24 23
+3 -2528
kernel/kexec.c
··· 1 1 /* 2 - * kexec.c - kexec system call 2 + * kexec.c - kexec_load system call 3 3 * Copyright (C) 2002-2004 Eric Biederman <ebiederm@xmission.com> 4 4 * 5 5 * This source code is licensed under the GNU General Public License, 6 6 * Version 2. See the file COPYING for more details. 7 7 */ 8 8 9 - #define pr_fmt(fmt) "kexec: " fmt 10 - 11 9 #include <linux/capability.h> 12 10 #include <linux/mm.h> 13 11 #include <linux/file.h> 14 - #include <linux/slab.h> 15 - #include <linux/fs.h> 16 12 #include <linux/kexec.h> 17 13 #include <linux/mutex.h> 18 14 #include <linux/list.h> 19 - #include <linux/highmem.h> 20 15 #include <linux/syscalls.h> 21 - #include <linux/reboot.h> 22 - #include <linux/ioport.h> 23 - #include <linux/hardirq.h> 24 - #include <linux/elf.h> 25 - #include <linux/elfcore.h> 26 - #include <linux/utsname.h> 27 - #include <linux/numa.h> 28 - #include <linux/suspend.h> 29 - #include <linux/device.h> 30 - #include <linux/freezer.h> 31 - #include <linux/pm.h> 32 - #include <linux/cpu.h> 33 - #include <linux/console.h> 34 16 #include <linux/vmalloc.h> 35 - #include <linux/swap.h> 36 - #include <linux/syscore_ops.h> 37 - #include <linux/compiler.h> 38 - #include <linux/hugetlb.h> 17 + #include <linux/slab.h> 39 18 40 - #include <asm/page.h> 41 - #include <asm/uaccess.h> 42 - #include <asm/io.h> 43 - #include <asm/sections.h> 44 - 45 - #include <crypto/hash.h> 46 - #include <crypto/sha.h> 47 - 48 - /* Per cpu memory for storing cpu states in case of system crash. 
*/ 49 - note_buf_t __percpu *crash_notes; 50 - 51 - /* vmcoreinfo stuff */ 52 - static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES]; 53 - u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; 54 - size_t vmcoreinfo_size; 55 - size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); 56 - 57 - /* Flag to indicate we are going to kexec a new kernel */ 58 - bool kexec_in_progress = false; 59 - 60 - /* 61 - * Declare these symbols weak so that if architecture provides a purgatory, 62 - * these will be overridden. 63 - */ 64 - char __weak kexec_purgatory[0]; 65 - size_t __weak kexec_purgatory_size = 0; 66 - 67 - #ifdef CONFIG_KEXEC_FILE 68 - static int kexec_calculate_store_digests(struct kimage *image); 69 - #endif 70 - 71 - /* Location of the reserved area for the crash kernel */ 72 - struct resource crashk_res = { 73 - .name = "Crash kernel", 74 - .start = 0, 75 - .end = 0, 76 - .flags = IORESOURCE_BUSY | IORESOURCE_MEM 77 - }; 78 - struct resource crashk_low_res = { 79 - .name = "Crash kernel", 80 - .start = 0, 81 - .end = 0, 82 - .flags = IORESOURCE_BUSY | IORESOURCE_MEM 83 - }; 84 - 85 - int kexec_should_crash(struct task_struct *p) 86 - { 87 - /* 88 - * If crash_kexec_post_notifiers is enabled, don't run 89 - * crash_kexec() here yet, which must be run after panic 90 - * notifiers in panic(). 91 - */ 92 - if (crash_kexec_post_notifiers) 93 - return 0; 94 - /* 95 - * There are 4 panic() calls in do_exit() path, each of which 96 - * corresponds to each of these 4 conditions. 97 - */ 98 - if (in_interrupt() || !p->pid || is_global_init(p) || panic_on_oops) 99 - return 1; 100 - return 0; 101 - } 102 - 103 - /* 104 - * When kexec transitions to the new kernel there is a one-to-one 105 - * mapping between physical and virtual addresses. On processors 106 - * where you can disable the MMU this is trivial, and easy. For 107 - * others it is still a simple predictable page table to setup. 
108 - * 109 - * In that environment kexec copies the new kernel to its final 110 - * resting place. This means I can only support memory whose 111 - * physical address can fit in an unsigned long. In particular 112 - * addresses where (pfn << PAGE_SHIFT) > ULONG_MAX cannot be handled. 113 - * If the assembly stub has more restrictive requirements 114 - * KEXEC_SOURCE_MEMORY_LIMIT and KEXEC_DEST_MEMORY_LIMIT can be 115 - * defined more restrictively in <asm/kexec.h>. 116 - * 117 - * The code for the transition from the current kernel to 118 - * the new kernel is placed in the control_code_buffer, whose size 119 - * is given by KEXEC_CONTROL_PAGE_SIZE. In the best case only a single 120 - * page of memory is necessary, but some architectures require more. 121 - * Because this memory must be identity mapped in the transition from 122 - * virtual to physical addresses it must live in the range 123 - * 0 - TASK_SIZE, as only the user space mappings are arbitrarily 124 - * modifiable. 125 - * 126 - * The assembly stub in the control code buffer is passed a linked list 127 - * of descriptor pages detailing the source pages of the new kernel, 128 - * and the destination addresses of those source pages. As this data 129 - * structure is not used in the context of the current OS, it must 130 - * be self-contained. 131 - * 132 - * The code has been made to work with highmem pages and will use a 133 - * destination page in its final resting place (if it happens 134 - * to allocate it). The end product of this is that most of the 135 - * physical address space, and most of RAM can be used. 136 - * 137 - * Future directions include: 138 - * - allocating a page table with the control code buffer identity 139 - * mapped, to simplify machine_kexec and make kexec_on_panic more 140 - * reliable. 141 - */ 142 - 143 - /* 144 - * KIMAGE_NO_DEST is an impossible destination address..., for 145 - * allocating pages whose destination address we do not care about. 
146 - */ 147 - #define KIMAGE_NO_DEST (-1UL) 148 - 149 - static int kimage_is_destination_range(struct kimage *image, 150 - unsigned long start, unsigned long end); 151 - static struct page *kimage_alloc_page(struct kimage *image, 152 - gfp_t gfp_mask, 153 - unsigned long dest); 19 + #include "kexec_internal.h" 154 20 155 21 static int copy_user_segment_list(struct kimage *image, 156 22 unsigned long nr_segments, ··· 34 168 35 169 return ret; 36 170 } 37 - 38 - static int sanity_check_segment_list(struct kimage *image) 39 - { 40 - int result, i; 41 - unsigned long nr_segments = image->nr_segments; 42 - 43 - /* 44 - * Verify we have good destination addresses. The caller is 45 - * responsible for making certain we don't attempt to load 46 - * the new image into invalid or reserved areas of RAM. This 47 - * just verifies it is an address we can use. 48 - * 49 - * Since the kernel does everything in page size chunks ensure 50 - * the destination addresses are page aligned. Too many 51 - * special cases crop up when we don't do this. The most 52 - * insidious is getting overlapping destination addresses 53 - * simply because addresses are changed to page size 54 - * granularity. 55 - */ 56 - result = -EADDRNOTAVAIL; 57 - for (i = 0; i < nr_segments; i++) { 58 - unsigned long mstart, mend; 59 - 60 - mstart = image->segment[i].mem; 61 - mend = mstart + image->segment[i].memsz; 62 - if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK)) 63 - return result; 64 - if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT) 65 - return result; 66 - } 67 - 68 - /* Verify our destination addresses do not overlap. 69 - * If we allowed overlapping destination addresses 70 - * through, very weird things can happen with no 71 - * easy explanation as one segment stops on another. 
72 - */ 73 - result = -EINVAL; 74 - for (i = 0; i < nr_segments; i++) { 75 - unsigned long mstart, mend; 76 - unsigned long j; 77 - 78 - mstart = image->segment[i].mem; 79 - mend = mstart + image->segment[i].memsz; 80 - for (j = 0; j < i; j++) { 81 - unsigned long pstart, pend; 82 - pstart = image->segment[j].mem; 83 - pend = pstart + image->segment[j].memsz; 84 - /* Do the segments overlap ? */ 85 - if ((mend > pstart) && (mstart < pend)) 86 - return result; 87 - } 88 - } 89 - 90 - /* Ensure our buffer sizes are strictly less than 91 - * our memory sizes. This should always be the case, 92 - * and it is easier to check up front than to be surprised 93 - * later on. 94 - */ 95 - result = -EINVAL; 96 - for (i = 0; i < nr_segments; i++) { 97 - if (image->segment[i].bufsz > image->segment[i].memsz) 98 - return result; 99 - } 100 - 101 - /* 102 - * Verify we have good destination addresses. Normally 103 - * the caller is responsible for making certain we don't 104 - * attempt to load the new image into invalid or reserved 105 - * areas of RAM. But crash kernels are preloaded into a 106 - * reserved area of ram. We must ensure the addresses 107 - * are in the reserved area otherwise preloading the 108 - * kernel could corrupt things. 
109 - */ 110 - 111 - if (image->type == KEXEC_TYPE_CRASH) { 112 - result = -EADDRNOTAVAIL; 113 - for (i = 0; i < nr_segments; i++) { 114 - unsigned long mstart, mend; 115 - 116 - mstart = image->segment[i].mem; 117 - mend = mstart + image->segment[i].memsz - 1; 118 - /* Ensure we are within the crash kernel limits */ 119 - if ((mstart < crashk_res.start) || 120 - (mend > crashk_res.end)) 121 - return result; 122 - } 123 - } 124 - 125 - return 0; 126 - } 127 - 128 - static struct kimage *do_kimage_alloc_init(void) 129 - { 130 - struct kimage *image; 131 - 132 - /* Allocate a controlling structure */ 133 - image = kzalloc(sizeof(*image), GFP_KERNEL); 134 - if (!image) 135 - return NULL; 136 - 137 - image->head = 0; 138 - image->entry = &image->head; 139 - image->last_entry = &image->head; 140 - image->control_page = ~0; /* By default this does not apply */ 141 - image->type = KEXEC_TYPE_DEFAULT; 142 - 143 - /* Initialize the list of control pages */ 144 - INIT_LIST_HEAD(&image->control_pages); 145 - 146 - /* Initialize the list of destination pages */ 147 - INIT_LIST_HEAD(&image->dest_pages); 148 - 149 - /* Initialize the list of unusable pages */ 150 - INIT_LIST_HEAD(&image->unusable_pages); 151 - 152 - return image; 153 - } 154 - 155 - static void kimage_free_page_list(struct list_head *list); 156 171 157 172 static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, 158 173 unsigned long nr_segments, ··· 101 354 return ret; 102 355 } 103 356 104 - #ifdef CONFIG_KEXEC_FILE 105 - static int copy_file_from_fd(int fd, void **buf, unsigned long *buf_len) 106 - { 107 - struct fd f = fdget(fd); 108 - int ret; 109 - struct kstat stat; 110 - loff_t pos; 111 - ssize_t bytes = 0; 112 - 113 - if (!f.file) 114 - return -EBADF; 115 - 116 - ret = vfs_getattr(&f.file->f_path, &stat); 117 - if (ret) 118 - goto out; 119 - 120 - if (stat.size > INT_MAX) { 121 - ret = -EFBIG; 122 - goto out; 123 - } 124 - 125 - /* Don't hand 0 to vmalloc, it whines. 
*/ 126 - if (stat.size == 0) { 127 - ret = -EINVAL; 128 - goto out; 129 - } 130 - 131 - *buf = vmalloc(stat.size); 132 - if (!*buf) { 133 - ret = -ENOMEM; 134 - goto out; 135 - } 136 - 137 - pos = 0; 138 - while (pos < stat.size) { 139 - bytes = kernel_read(f.file, pos, (char *)(*buf) + pos, 140 - stat.size - pos); 141 - if (bytes < 0) { 142 - vfree(*buf); 143 - ret = bytes; 144 - goto out; 145 - } 146 - 147 - if (bytes == 0) 148 - break; 149 - pos += bytes; 150 - } 151 - 152 - if (pos != stat.size) { 153 - ret = -EBADF; 154 - vfree(*buf); 155 - goto out; 156 - } 157 - 158 - *buf_len = pos; 159 - out: 160 - fdput(f); 161 - return ret; 162 - } 163 - 164 - /* Architectures can provide this probe function */ 165 - int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf, 166 - unsigned long buf_len) 167 - { 168 - return -ENOEXEC; 169 - } 170 - 171 - void * __weak arch_kexec_kernel_image_load(struct kimage *image) 172 - { 173 - return ERR_PTR(-ENOEXEC); 174 - } 175 - 176 - void __weak arch_kimage_file_post_load_cleanup(struct kimage *image) 177 - { 178 - } 179 - 180 - int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, 181 - unsigned long buf_len) 182 - { 183 - return -EKEYREJECTED; 184 - } 185 - 186 - /* Apply relocations of type RELA */ 187 - int __weak 188 - arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, 189 - unsigned int relsec) 190 - { 191 - pr_err("RELA relocation unsupported.\n"); 192 - return -ENOEXEC; 193 - } 194 - 195 - /* Apply relocations of type REL */ 196 - int __weak 197 - arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, 198 - unsigned int relsec) 199 - { 200 - pr_err("REL relocation unsupported.\n"); 201 - return -ENOEXEC; 202 - } 203 - 204 - /* 205 - * Free up memory used by kernel, initrd, and command line. 
This is temporary 206 - * memory allocation which is not needed any more after these buffers have 207 - * been loaded into separate segments and have been copied elsewhere. 208 - */ 209 - static void kimage_file_post_load_cleanup(struct kimage *image) 210 - { 211 - struct purgatory_info *pi = &image->purgatory_info; 212 - 213 - vfree(image->kernel_buf); 214 - image->kernel_buf = NULL; 215 - 216 - vfree(image->initrd_buf); 217 - image->initrd_buf = NULL; 218 - 219 - kfree(image->cmdline_buf); 220 - image->cmdline_buf = NULL; 221 - 222 - vfree(pi->purgatory_buf); 223 - pi->purgatory_buf = NULL; 224 - 225 - vfree(pi->sechdrs); 226 - pi->sechdrs = NULL; 227 - 228 - /* See if architecture has anything to cleanup post load */ 229 - arch_kimage_file_post_load_cleanup(image); 230 - 231 - /* 232 - * Above call should have called into bootloader to free up 233 - * any data stored in kimage->image_loader_data. It should 234 - * be ok now to free it up. 235 - */ 236 - kfree(image->image_loader_data); 237 - image->image_loader_data = NULL; 238 - } 239 - 240 - /* 241 - * In file mode list of segments is prepared by kernel. 
Copy relevant 242 - * data from user space, do error checking, prepare segment list 243 - */ 244 - static int 245 - kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, 246 - const char __user *cmdline_ptr, 247 - unsigned long cmdline_len, unsigned flags) 248 - { 249 - int ret = 0; 250 - void *ldata; 251 - 252 - ret = copy_file_from_fd(kernel_fd, &image->kernel_buf, 253 - &image->kernel_buf_len); 254 - if (ret) 255 - return ret; 256 - 257 - /* Call arch image probe handlers */ 258 - ret = arch_kexec_kernel_image_probe(image, image->kernel_buf, 259 - image->kernel_buf_len); 260 - 261 - if (ret) 262 - goto out; 263 - 264 - #ifdef CONFIG_KEXEC_VERIFY_SIG 265 - ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf, 266 - image->kernel_buf_len); 267 - if (ret) { 268 - pr_debug("kernel signature verification failed.\n"); 269 - goto out; 270 - } 271 - pr_debug("kernel signature verification successful.\n"); 272 - #endif 273 - /* It is possible that there no initramfs is being loaded */ 274 - if (!(flags & KEXEC_FILE_NO_INITRAMFS)) { 275 - ret = copy_file_from_fd(initrd_fd, &image->initrd_buf, 276 - &image->initrd_buf_len); 277 - if (ret) 278 - goto out; 279 - } 280 - 281 - if (cmdline_len) { 282 - image->cmdline_buf = kzalloc(cmdline_len, GFP_KERNEL); 283 - if (!image->cmdline_buf) { 284 - ret = -ENOMEM; 285 - goto out; 286 - } 287 - 288 - ret = copy_from_user(image->cmdline_buf, cmdline_ptr, 289 - cmdline_len); 290 - if (ret) { 291 - ret = -EFAULT; 292 - goto out; 293 - } 294 - 295 - image->cmdline_buf_len = cmdline_len; 296 - 297 - /* command line should be a string with last byte null */ 298 - if (image->cmdline_buf[cmdline_len - 1] != '\0') { 299 - ret = -EINVAL; 300 - goto out; 301 - } 302 - } 303 - 304 - /* Call arch image load handlers */ 305 - ldata = arch_kexec_kernel_image_load(image); 306 - 307 - if (IS_ERR(ldata)) { 308 - ret = PTR_ERR(ldata); 309 - goto out; 310 - } 311 - 312 - image->image_loader_data = ldata; 313 - out: 314 
- /* In case of error, free up all allocated memory in this function */ 315 - if (ret) 316 - kimage_file_post_load_cleanup(image); 317 - return ret; 318 - } 319 - 320 - static int 321 - kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, 322 - int initrd_fd, const char __user *cmdline_ptr, 323 - unsigned long cmdline_len, unsigned long flags) 324 - { 325 - int ret; 326 - struct kimage *image; 327 - bool kexec_on_panic = flags & KEXEC_FILE_ON_CRASH; 328 - 329 - image = do_kimage_alloc_init(); 330 - if (!image) 331 - return -ENOMEM; 332 - 333 - image->file_mode = 1; 334 - 335 - if (kexec_on_panic) { 336 - /* Enable special crash kernel control page alloc policy. */ 337 - image->control_page = crashk_res.start; 338 - image->type = KEXEC_TYPE_CRASH; 339 - } 340 - 341 - ret = kimage_file_prepare_segments(image, kernel_fd, initrd_fd, 342 - cmdline_ptr, cmdline_len, flags); 343 - if (ret) 344 - goto out_free_image; 345 - 346 - ret = sanity_check_segment_list(image); 347 - if (ret) 348 - goto out_free_post_load_bufs; 349 - 350 - ret = -ENOMEM; 351 - image->control_code_page = kimage_alloc_control_pages(image, 352 - get_order(KEXEC_CONTROL_PAGE_SIZE)); 353 - if (!image->control_code_page) { 354 - pr_err("Could not allocate control_code_buffer\n"); 355 - goto out_free_post_load_bufs; 356 - } 357 - 358 - if (!kexec_on_panic) { 359 - image->swap_page = kimage_alloc_control_pages(image, 0); 360 - if (!image->swap_page) { 361 - pr_err("Could not allocate swap buffer\n"); 362 - goto out_free_control_pages; 363 - } 364 - } 365 - 366 - *rimage = image; 367 - return 0; 368 - out_free_control_pages: 369 - kimage_free_page_list(&image->control_pages); 370 - out_free_post_load_bufs: 371 - kimage_file_post_load_cleanup(image); 372 - out_free_image: 373 - kfree(image); 374 - return ret; 375 - } 376 - #else /* CONFIG_KEXEC_FILE */ 377 - static inline void kimage_file_post_load_cleanup(struct kimage *image) { } 378 - #endif /* CONFIG_KEXEC_FILE */ 379 - 380 - static int 
kimage_is_destination_range(struct kimage *image, 381 - unsigned long start, 382 - unsigned long end) 383 - { 384 - unsigned long i; 385 - 386 - for (i = 0; i < image->nr_segments; i++) { 387 - unsigned long mstart, mend; 388 - 389 - mstart = image->segment[i].mem; 390 - mend = mstart + image->segment[i].memsz; 391 - if ((end > mstart) && (start < mend)) 392 - return 1; 393 - } 394 - 395 - return 0; 396 - } 397 - 398 - static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order) 399 - { 400 - struct page *pages; 401 - 402 - pages = alloc_pages(gfp_mask, order); 403 - if (pages) { 404 - unsigned int count, i; 405 - pages->mapping = NULL; 406 - set_page_private(pages, order); 407 - count = 1 << order; 408 - for (i = 0; i < count; i++) 409 - SetPageReserved(pages + i); 410 - } 411 - 412 - return pages; 413 - } 414 - 415 - static void kimage_free_pages(struct page *page) 416 - { 417 - unsigned int order, count, i; 418 - 419 - order = page_private(page); 420 - count = 1 << order; 421 - for (i = 0; i < count; i++) 422 - ClearPageReserved(page + i); 423 - __free_pages(page, order); 424 - } 425 - 426 - static void kimage_free_page_list(struct list_head *list) 427 - { 428 - struct list_head *pos, *next; 429 - 430 - list_for_each_safe(pos, next, list) { 431 - struct page *page; 432 - 433 - page = list_entry(pos, struct page, lru); 434 - list_del(&page->lru); 435 - kimage_free_pages(page); 436 - } 437 - } 438 - 439 - static struct page *kimage_alloc_normal_control_pages(struct kimage *image, 440 - unsigned int order) 441 - { 442 - /* Control pages are special, they are the intermediaries 443 - * that are needed while we copy the rest of the pages 444 - * to their final resting place. As such they must 445 - * not conflict with either the destination addresses 446 - * or memory the kernel is already using. 
447 - * 448 - * The only case where we really need more than one of 449 - * these are for architectures where we cannot disable 450 - * the MMU and must instead generate an identity mapped 451 - * page table for all of the memory. 452 - * 453 - * At worst this runs in O(N) of the image size. 454 - */ 455 - struct list_head extra_pages; 456 - struct page *pages; 457 - unsigned int count; 458 - 459 - count = 1 << order; 460 - INIT_LIST_HEAD(&extra_pages); 461 - 462 - /* Loop while I can allocate a page and the page allocated 463 - * is a destination page. 464 - */ 465 - do { 466 - unsigned long pfn, epfn, addr, eaddr; 467 - 468 - pages = kimage_alloc_pages(KEXEC_CONTROL_MEMORY_GFP, order); 469 - if (!pages) 470 - break; 471 - pfn = page_to_pfn(pages); 472 - epfn = pfn + count; 473 - addr = pfn << PAGE_SHIFT; 474 - eaddr = epfn << PAGE_SHIFT; 475 - if ((epfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) || 476 - kimage_is_destination_range(image, addr, eaddr)) { 477 - list_add(&pages->lru, &extra_pages); 478 - pages = NULL; 479 - } 480 - } while (!pages); 481 - 482 - if (pages) { 483 - /* Remember the allocated page... */ 484 - list_add(&pages->lru, &image->control_pages); 485 - 486 - /* Because the page is already in it's destination 487 - * location we will never allocate another page at 488 - * that address. Therefore kimage_alloc_pages 489 - * will not return it (again) and we don't need 490 - * to give it an entry in image->segment[]. 491 - */ 492 - } 493 - /* Deal with the destination pages I have inadvertently allocated. 494 - * 495 - * Ideally I would convert multi-page allocations into single 496 - * page allocations, and add everything to image->dest_pages. 497 - * 498 - * For now it is simpler to just free the pages. 
499 - */ 500 - kimage_free_page_list(&extra_pages); 501 - 502 - return pages; 503 - } 504 - 505 - static struct page *kimage_alloc_crash_control_pages(struct kimage *image, 506 - unsigned int order) 507 - { 508 - /* Control pages are special, they are the intermediaries 509 - * that are needed while we copy the rest of the pages 510 - * to their final resting place. As such they must 511 - * not conflict with either the destination addresses 512 - * or memory the kernel is already using. 513 - * 514 - * Control pages are also the only pages we must allocate 515 - * when loading a crash kernel. All of the other pages 516 - * are specified by the segments and we just memcpy 517 - * into them directly. 518 - * 519 - * The only case where we really need more than one of 520 - * these are for architectures where we cannot disable 521 - * the MMU and must instead generate an identity mapped 522 - * page table for all of the memory. 523 - * 524 - * Given the low demand this implements a very simple 525 - * allocator that finds the first hole of the appropriate 526 - * size in the reserved memory region, and allocates all 527 - * of the memory up to and including the hole. 
528 - */ 529 - unsigned long hole_start, hole_end, size; 530 - struct page *pages; 531 - 532 - pages = NULL; 533 - size = (1 << order) << PAGE_SHIFT; 534 - hole_start = (image->control_page + (size - 1)) & ~(size - 1); 535 - hole_end = hole_start + size - 1; 536 - while (hole_end <= crashk_res.end) { 537 - unsigned long i; 538 - 539 - if (hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT) 540 - break; 541 - /* See if I overlap any of the segments */ 542 - for (i = 0; i < image->nr_segments; i++) { 543 - unsigned long mstart, mend; 544 - 545 - mstart = image->segment[i].mem; 546 - mend = mstart + image->segment[i].memsz - 1; 547 - if ((hole_end >= mstart) && (hole_start <= mend)) { 548 - /* Advance the hole to the end of the segment */ 549 - hole_start = (mend + (size - 1)) & ~(size - 1); 550 - hole_end = hole_start + size - 1; 551 - break; 552 - } 553 - } 554 - /* If I don't overlap any segments I have found my hole! */ 555 - if (i == image->nr_segments) { 556 - pages = pfn_to_page(hole_start >> PAGE_SHIFT); 557 - break; 558 - } 559 - } 560 - if (pages) 561 - image->control_page = hole_end; 562 - 563 - return pages; 564 - } 565 - 566 - 567 - struct page *kimage_alloc_control_pages(struct kimage *image, 568 - unsigned int order) 569 - { 570 - struct page *pages = NULL; 571 - 572 - switch (image->type) { 573 - case KEXEC_TYPE_DEFAULT: 574 - pages = kimage_alloc_normal_control_pages(image, order); 575 - break; 576 - case KEXEC_TYPE_CRASH: 577 - pages = kimage_alloc_crash_control_pages(image, order); 578 - break; 579 - } 580 - 581 - return pages; 582 - } 583 - 584 - static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) 585 - { 586 - if (*image->entry != 0) 587 - image->entry++; 588 - 589 - if (image->entry == image->last_entry) { 590 - kimage_entry_t *ind_page; 591 - struct page *page; 592 - 593 - page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST); 594 - if (!page) 595 - return -ENOMEM; 596 - 597 - ind_page = page_address(page); 598 - *image->entry 
= virt_to_phys(ind_page) | IND_INDIRECTION; 599 - image->entry = ind_page; 600 - image->last_entry = ind_page + 601 - ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); 602 - } 603 - *image->entry = entry; 604 - image->entry++; 605 - *image->entry = 0; 606 - 607 - return 0; 608 - } 609 - 610 - static int kimage_set_destination(struct kimage *image, 611 - unsigned long destination) 612 - { 613 - int result; 614 - 615 - destination &= PAGE_MASK; 616 - result = kimage_add_entry(image, destination | IND_DESTINATION); 617 - 618 - return result; 619 - } 620 - 621 - 622 - static int kimage_add_page(struct kimage *image, unsigned long page) 623 - { 624 - int result; 625 - 626 - page &= PAGE_MASK; 627 - result = kimage_add_entry(image, page | IND_SOURCE); 628 - 629 - return result; 630 - } 631 - 632 - 633 - static void kimage_free_extra_pages(struct kimage *image) 634 - { 635 - /* Walk through and free any extra destination pages I may have */ 636 - kimage_free_page_list(&image->dest_pages); 637 - 638 - /* Walk through and free any unusable pages I have cached */ 639 - kimage_free_page_list(&image->unusable_pages); 640 - 641 - } 642 - static void kimage_terminate(struct kimage *image) 643 - { 644 - if (*image->entry != 0) 645 - image->entry++; 646 - 647 - *image->entry = IND_DONE; 648 - } 649 - 650 - #define for_each_kimage_entry(image, ptr, entry) \ 651 - for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ 652 - ptr = (entry & IND_INDIRECTION) ? 
\ 653 - phys_to_virt((entry & PAGE_MASK)) : ptr + 1) 654 - 655 - static void kimage_free_entry(kimage_entry_t entry) 656 - { 657 - struct page *page; 658 - 659 - page = pfn_to_page(entry >> PAGE_SHIFT); 660 - kimage_free_pages(page); 661 - } 662 - 663 - static void kimage_free(struct kimage *image) 664 - { 665 - kimage_entry_t *ptr, entry; 666 - kimage_entry_t ind = 0; 667 - 668 - if (!image) 669 - return; 670 - 671 - kimage_free_extra_pages(image); 672 - for_each_kimage_entry(image, ptr, entry) { 673 - if (entry & IND_INDIRECTION) { 674 - /* Free the previous indirection page */ 675 - if (ind & IND_INDIRECTION) 676 - kimage_free_entry(ind); 677 - /* Save this indirection page until we are 678 - * done with it. 679 - */ 680 - ind = entry; 681 - } else if (entry & IND_SOURCE) 682 - kimage_free_entry(entry); 683 - } 684 - /* Free the final indirection page */ 685 - if (ind & IND_INDIRECTION) 686 - kimage_free_entry(ind); 687 - 688 - /* Handle any machine specific cleanup */ 689 - machine_kexec_cleanup(image); 690 - 691 - /* Free the kexec control pages... */ 692 - kimage_free_page_list(&image->control_pages); 693 - 694 - /* 695 - * Free up any temporary buffers allocated. This might hit if 696 - * error occurred much later after buffer allocation. 
697 - */ 698 - if (image->file_mode) 699 - kimage_file_post_load_cleanup(image); 700 - 701 - kfree(image); 702 - } 703 - 704 - static kimage_entry_t *kimage_dst_used(struct kimage *image, 705 - unsigned long page) 706 - { 707 - kimage_entry_t *ptr, entry; 708 - unsigned long destination = 0; 709 - 710 - for_each_kimage_entry(image, ptr, entry) { 711 - if (entry & IND_DESTINATION) 712 - destination = entry & PAGE_MASK; 713 - else if (entry & IND_SOURCE) { 714 - if (page == destination) 715 - return ptr; 716 - destination += PAGE_SIZE; 717 - } 718 - } 719 - 720 - return NULL; 721 - } 722 - 723 - static struct page *kimage_alloc_page(struct kimage *image, 724 - gfp_t gfp_mask, 725 - unsigned long destination) 726 - { 727 - /* 728 - * Here we implement safeguards to ensure that a source page 729 - * is not copied to its destination page before the data on 730 - * the destination page is no longer useful. 731 - * 732 - * To do this we maintain the invariant that a source page is 733 - * either its own destination page, or it is not a 734 - * destination page at all. 735 - * 736 - * That is slightly stronger than required, but the proof 737 - * that no problems will not occur is trivial, and the 738 - * implementation is simply to verify. 739 - * 740 - * When allocating all pages normally this algorithm will run 741 - * in O(N) time, but in the worst case it will run in O(N^2) 742 - * time. If the runtime is a problem the data structures can 743 - * be fixed. 744 - */ 745 - struct page *page; 746 - unsigned long addr; 747 - 748 - /* 749 - * Walk through the list of destination pages, and see if I 750 - * have a match. 
751 - */ 752 - list_for_each_entry(page, &image->dest_pages, lru) { 753 - addr = page_to_pfn(page) << PAGE_SHIFT; 754 - if (addr == destination) { 755 - list_del(&page->lru); 756 - return page; 757 - } 758 - } 759 - page = NULL; 760 - while (1) { 761 - kimage_entry_t *old; 762 - 763 - /* Allocate a page, if we run out of memory give up */ 764 - page = kimage_alloc_pages(gfp_mask, 0); 765 - if (!page) 766 - return NULL; 767 - /* If the page cannot be used file it away */ 768 - if (page_to_pfn(page) > 769 - (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { 770 - list_add(&page->lru, &image->unusable_pages); 771 - continue; 772 - } 773 - addr = page_to_pfn(page) << PAGE_SHIFT; 774 - 775 - /* If it is the destination page we want use it */ 776 - if (addr == destination) 777 - break; 778 - 779 - /* If the page is not a destination page use it */ 780 - if (!kimage_is_destination_range(image, addr, 781 - addr + PAGE_SIZE)) 782 - break; 783 - 784 - /* 785 - * I know that the page is someones destination page. 786 - * See if there is already a source page for this 787 - * destination page. And if so swap the source pages. 788 - */ 789 - old = kimage_dst_used(image, addr); 790 - if (old) { 791 - /* If so move it */ 792 - unsigned long old_addr; 793 - struct page *old_page; 794 - 795 - old_addr = *old & PAGE_MASK; 796 - old_page = pfn_to_page(old_addr >> PAGE_SHIFT); 797 - copy_highpage(page, old_page); 798 - *old = addr | (*old & ~PAGE_MASK); 799 - 800 - /* The old page I have found cannot be a 801 - * destination page, so return it if it's 802 - * gfp_flags honor the ones passed in. 803 - */ 804 - if (!(gfp_mask & __GFP_HIGHMEM) && 805 - PageHighMem(old_page)) { 806 - kimage_free_pages(old_page); 807 - continue; 808 - } 809 - addr = old_addr; 810 - page = old_page; 811 - break; 812 - } else { 813 - /* Place the page on the destination list I 814 - * will use it later. 
815 - */ 816 - list_add(&page->lru, &image->dest_pages); 817 - } 818 - } 819 - 820 - return page; 821 - } 822 - 823 - static int kimage_load_normal_segment(struct kimage *image, 824 - struct kexec_segment *segment) 825 - { 826 - unsigned long maddr; 827 - size_t ubytes, mbytes; 828 - int result; 829 - unsigned char __user *buf = NULL; 830 - unsigned char *kbuf = NULL; 831 - 832 - result = 0; 833 - if (image->file_mode) 834 - kbuf = segment->kbuf; 835 - else 836 - buf = segment->buf; 837 - ubytes = segment->bufsz; 838 - mbytes = segment->memsz; 839 - maddr = segment->mem; 840 - 841 - result = kimage_set_destination(image, maddr); 842 - if (result < 0) 843 - goto out; 844 - 845 - while (mbytes) { 846 - struct page *page; 847 - char *ptr; 848 - size_t uchunk, mchunk; 849 - 850 - page = kimage_alloc_page(image, GFP_HIGHUSER, maddr); 851 - if (!page) { 852 - result = -ENOMEM; 853 - goto out; 854 - } 855 - result = kimage_add_page(image, page_to_pfn(page) 856 - << PAGE_SHIFT); 857 - if (result < 0) 858 - goto out; 859 - 860 - ptr = kmap(page); 861 - /* Start with a clear page */ 862 - clear_page(ptr); 863 - ptr += maddr & ~PAGE_MASK; 864 - mchunk = min_t(size_t, mbytes, 865 - PAGE_SIZE - (maddr & ~PAGE_MASK)); 866 - uchunk = min(ubytes, mchunk); 867 - 868 - /* For file based kexec, source pages are in kernel memory */ 869 - if (image->file_mode) 870 - memcpy(ptr, kbuf, uchunk); 871 - else 872 - result = copy_from_user(ptr, buf, uchunk); 873 - kunmap(page); 874 - if (result) { 875 - result = -EFAULT; 876 - goto out; 877 - } 878 - ubytes -= uchunk; 879 - maddr += mchunk; 880 - if (image->file_mode) 881 - kbuf += mchunk; 882 - else 883 - buf += mchunk; 884 - mbytes -= mchunk; 885 - } 886 - out: 887 - return result; 888 - } 889 - 890 - static int kimage_load_crash_segment(struct kimage *image, 891 - struct kexec_segment *segment) 892 - { 893 - /* For crash dumps kernels we simply copy the data from 894 - * user space to it's destination. 
895 - * We do things a page at a time for the sake of kmap. 896 - */ 897 - unsigned long maddr; 898 - size_t ubytes, mbytes; 899 - int result; 900 - unsigned char __user *buf = NULL; 901 - unsigned char *kbuf = NULL; 902 - 903 - result = 0; 904 - if (image->file_mode) 905 - kbuf = segment->kbuf; 906 - else 907 - buf = segment->buf; 908 - ubytes = segment->bufsz; 909 - mbytes = segment->memsz; 910 - maddr = segment->mem; 911 - while (mbytes) { 912 - struct page *page; 913 - char *ptr; 914 - size_t uchunk, mchunk; 915 - 916 - page = pfn_to_page(maddr >> PAGE_SHIFT); 917 - if (!page) { 918 - result = -ENOMEM; 919 - goto out; 920 - } 921 - ptr = kmap(page); 922 - ptr += maddr & ~PAGE_MASK; 923 - mchunk = min_t(size_t, mbytes, 924 - PAGE_SIZE - (maddr & ~PAGE_MASK)); 925 - uchunk = min(ubytes, mchunk); 926 - if (mchunk > uchunk) { 927 - /* Zero the trailing part of the page */ 928 - memset(ptr + uchunk, 0, mchunk - uchunk); 929 - } 930 - 931 - /* For file based kexec, source pages are in kernel memory */ 932 - if (image->file_mode) 933 - memcpy(ptr, kbuf, uchunk); 934 - else 935 - result = copy_from_user(ptr, buf, uchunk); 936 - kexec_flush_icache_page(page); 937 - kunmap(page); 938 - if (result) { 939 - result = -EFAULT; 940 - goto out; 941 - } 942 - ubytes -= uchunk; 943 - maddr += mchunk; 944 - if (image->file_mode) 945 - kbuf += mchunk; 946 - else 947 - buf += mchunk; 948 - mbytes -= mchunk; 949 - } 950 - out: 951 - return result; 952 - } 953 - 954 - static int kimage_load_segment(struct kimage *image, 955 - struct kexec_segment *segment) 956 - { 957 - int result = -ENOMEM; 958 - 959 - switch (image->type) { 960 - case KEXEC_TYPE_DEFAULT: 961 - result = kimage_load_normal_segment(image, segment); 962 - break; 963 - case KEXEC_TYPE_CRASH: 964 - result = kimage_load_crash_segment(image, segment); 965 - break; 966 - } 967 - 968 - return result; 969 - } 970 - 971 357 /* 972 358 * Exec Kernel system call: for obvious reasons only root may call it. 
973 359 * ··· 121 1241 * kexec does not sync, or unmount filesystems so if you need 122 1242 * that to happen you need to do that yourself. 123 1243 */ 124 - struct kimage *kexec_image; 125 - struct kimage *kexec_crash_image; 126 - int kexec_load_disabled; 127 - 128 - static DEFINE_MUTEX(kexec_mutex); 129 1244 130 1245 SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments, 131 1246 struct kexec_segment __user *, segments, unsigned long, flags) ··· 215 1340 return result; 216 1341 } 217 1342 218 - /* 219 - * Add and remove page tables for crashkernel memory 220 - * 221 - * Provide an empty default implementation here -- architecture 222 - * code may override this 223 - */ 224 - void __weak crash_map_reserved_pages(void) 225 - {} 226 - 227 - void __weak crash_unmap_reserved_pages(void) 228 - {} 229 - 230 1343 #ifdef CONFIG_COMPAT 231 1344 COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry, 232 1345 compat_ulong_t, nr_segments, ··· 253 1390 return sys_kexec_load(entry, nr_segments, ksegments, flags); 254 1391 } 255 1392 #endif 256 - 257 - #ifdef CONFIG_KEXEC_FILE 258 - SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, 259 - unsigned long, cmdline_len, const char __user *, cmdline_ptr, 260 - unsigned long, flags) 261 - { 262 - int ret = 0, i; 263 - struct kimage **dest_image, *image; 264 - 265 - /* We only trust the superuser with rebooting the system. */ 266 - if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) 267 - return -EPERM; 268 - 269 - /* Make sure we have a legal set of flags */ 270 - if (flags != (flags & KEXEC_FILE_FLAGS)) 271 - return -EINVAL; 272 - 273 - image = NULL; 274 - 275 - if (!mutex_trylock(&kexec_mutex)) 276 - return -EBUSY; 277 - 278 - dest_image = &kexec_image; 279 - if (flags & KEXEC_FILE_ON_CRASH) 280 - dest_image = &kexec_crash_image; 281 - 282 - if (flags & KEXEC_FILE_UNLOAD) 283 - goto exchange; 284 - 285 - /* 286 - * In case of crash, new kernel gets loaded in reserved region. 
It is 287 - * same memory where old crash kernel might be loaded. Free any 288 - * current crash dump kernel before we corrupt it. 289 - */ 290 - if (flags & KEXEC_FILE_ON_CRASH) 291 - kimage_free(xchg(&kexec_crash_image, NULL)); 292 - 293 - ret = kimage_file_alloc_init(&image, kernel_fd, initrd_fd, cmdline_ptr, 294 - cmdline_len, flags); 295 - if (ret) 296 - goto out; 297 - 298 - ret = machine_kexec_prepare(image); 299 - if (ret) 300 - goto out; 301 - 302 - ret = kexec_calculate_store_digests(image); 303 - if (ret) 304 - goto out; 305 - 306 - for (i = 0; i < image->nr_segments; i++) { 307 - struct kexec_segment *ksegment; 308 - 309 - ksegment = &image->segment[i]; 310 - pr_debug("Loading segment %d: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", 311 - i, ksegment->buf, ksegment->bufsz, ksegment->mem, 312 - ksegment->memsz); 313 - 314 - ret = kimage_load_segment(image, &image->segment[i]); 315 - if (ret) 316 - goto out; 317 - } 318 - 319 - kimage_terminate(image); 320 - 321 - /* 322 - * Free up any temporary buffers allocated which are not needed 323 - * after image has been loaded 324 - */ 325 - kimage_file_post_load_cleanup(image); 326 - exchange: 327 - image = xchg(dest_image, image); 328 - out: 329 - mutex_unlock(&kexec_mutex); 330 - kimage_free(image); 331 - return ret; 332 - } 333 - 334 - #endif /* CONFIG_KEXEC_FILE */ 335 - 336 - void crash_kexec(struct pt_regs *regs) 337 - { 338 - /* Take the kexec_mutex here to prevent sys_kexec_load 339 - * running on one cpu from replacing the crash kernel 340 - * we are using after a panic on a different cpu. 341 - * 342 - * If the crash kernel was not located in a fixed area 343 - * of memory the xchg(&kexec_crash_image) would be 344 - * sufficient. But since I reuse the memory... 
345 -  */
346 - 	if (mutex_trylock(&kexec_mutex)) {
347 - 		if (kexec_crash_image) {
348 - 			struct pt_regs fixed_regs;
349 - 
350 - 			crash_setup_regs(&fixed_regs, regs);
351 - 			crash_save_vmcoreinfo();
352 - 			machine_crash_shutdown(&fixed_regs);
353 - 			machine_kexec(kexec_crash_image);
354 - 		}
355 - 		mutex_unlock(&kexec_mutex);
356 - 	}
357 - }
358 - 
359 - size_t crash_get_memory_size(void)
360 - {
361 - 	size_t size = 0;
362 - 	mutex_lock(&kexec_mutex);
363 - 	if (crashk_res.end != crashk_res.start)
364 - 		size = resource_size(&crashk_res);
365 - 	mutex_unlock(&kexec_mutex);
366 - 	return size;
367 - }
368 - 
369 - void __weak crash_free_reserved_phys_range(unsigned long begin,
370 - 					   unsigned long end)
371 - {
372 - 	unsigned long addr;
373 - 
374 - 	for (addr = begin; addr < end; addr += PAGE_SIZE)
375 - 		free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
376 - }
377 - 
378 - int crash_shrink_memory(unsigned long new_size)
379 - {
380 - 	int ret = 0;
381 - 	unsigned long start, end;
382 - 	unsigned long old_size;
383 - 	struct resource *ram_res;
384 - 
385 - 	mutex_lock(&kexec_mutex);
386 - 
387 - 	if (kexec_crash_image) {
388 - 		ret = -ENOENT;
389 - 		goto unlock;
390 - 	}
391 - 	start = crashk_res.start;
392 - 	end = crashk_res.end;
393 - 	old_size = (end == 0) ? 0 : end - start + 1;
394 - 	if (new_size >= old_size) {
395 - 		ret = (new_size == old_size) ? 0 : -EINVAL;
396 - 		goto unlock;
397 - 	}
398 - 
399 - 	ram_res = kzalloc(sizeof(*ram_res), GFP_KERNEL);
400 - 	if (!ram_res) {
401 - 		ret = -ENOMEM;
402 - 		goto unlock;
403 - 	}
404 - 
405 - 	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
406 - 	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
407 - 
408 - 	crash_map_reserved_pages();
409 - 	crash_free_reserved_phys_range(end, crashk_res.end);
410 - 
411 - 	if ((start == end) && (crashk_res.parent != NULL))
412 - 		release_resource(&crashk_res);
413 - 
414 - 	ram_res->start = end;
415 - 	ram_res->end = crashk_res.end;
416 - 	ram_res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
417 - 	ram_res->name = "System RAM";
418 - 
419 - 	crashk_res.end = end - 1;
420 - 
421 - 	insert_resource(&iomem_resource, ram_res);
422 - 	crash_unmap_reserved_pages();
423 - 
424 - unlock:
425 - 	mutex_unlock(&kexec_mutex);
426 - 	return ret;
427 - }
428 - 
429 - static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
430 - 			    size_t data_len)
431 - {
432 - 	struct elf_note note;
433 - 
434 - 	note.n_namesz = strlen(name) + 1;
435 - 	note.n_descsz = data_len;
436 - 	note.n_type = type;
437 - 	memcpy(buf, &note, sizeof(note));
438 - 	buf += (sizeof(note) + 3)/4;
439 - 	memcpy(buf, name, note.n_namesz);
440 - 	buf += (note.n_namesz + 3)/4;
441 - 	memcpy(buf, data, note.n_descsz);
442 - 	buf += (note.n_descsz + 3)/4;
443 - 
444 - 	return buf;
445 - }
446 - 
447 - static void final_note(u32 *buf)
448 - {
449 - 	struct elf_note note;
450 - 
451 - 	note.n_namesz = 0;
452 - 	note.n_descsz = 0;
453 - 	note.n_type = 0;
454 - 	memcpy(buf, &note, sizeof(note));
455 - }
456 - 
457 - void crash_save_cpu(struct pt_regs *regs, int cpu)
458 - {
459 - 	struct elf_prstatus prstatus;
460 - 	u32 *buf;
461 - 
462 - 	if ((cpu < 0) || (cpu >= nr_cpu_ids))
463 - 		return;
464 - 
465 - 	/* Using ELF notes here is opportunistic.
466 - 	 * I need a well defined structure format
467 - 	 * for the data I pass, and I need tags
468 - 	 * on the data to indicate what information I have
469 - 	 * squirrelled away. ELF notes happen to provide
470 - 	 * all of that, so there is no need to invent something new.
471 - 	 */
472 - 	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
473 - 	if (!buf)
474 - 		return;
475 - 	memset(&prstatus, 0, sizeof(prstatus));
476 - 	prstatus.pr_pid = current->pid;
477 - 	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
478 - 	buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
479 - 			      &prstatus, sizeof(prstatus));
480 - 	final_note(buf);
481 - }
482 - 
483 - static int __init crash_notes_memory_init(void)
484 - {
485 - 	/* Allocate memory for saving cpu registers. */
486 - 	crash_notes = alloc_percpu(note_buf_t);
487 - 	if (!crash_notes) {
488 - 		pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
489 - 		return -ENOMEM;
490 - 	}
491 - 	return 0;
492 - }
493 - subsys_initcall(crash_notes_memory_init);
494 - 
495 - 
496 - /*
497 -  * parsing the "crashkernel" commandline
498 -  *
499 -  * this code is intended to be called from architecture specific code
500 -  */
501 - 
502 - 
503 - /*
504 -  * This function parses command lines in the format
505 -  *
506 -  *	crashkernel=ramsize-range:size[,...][@offset]
507 -  *
508 -  * The function returns 0 on success and -EINVAL on failure.
509 -  */
510 - static int __init parse_crashkernel_mem(char *cmdline,
511 - 					unsigned long long system_ram,
512 - 					unsigned long long *crash_size,
513 - 					unsigned long long *crash_base)
514 - {
515 - 	char *cur = cmdline, *tmp;
516 - 
517 - 	/* for each entry of the comma-separated list */
518 - 	do {
519 - 		unsigned long long start, end = ULLONG_MAX, size;
520 - 
521 - 		/* get the start of the range */
522 - 		start = memparse(cur, &tmp);
523 - 		if (cur == tmp) {
524 - 			pr_warn("crashkernel: Memory value expected\n");
525 - 			return -EINVAL;
526 - 		}
527 - 		cur = tmp;
528 - 		if (*cur != '-') {
529 - 			pr_warn("crashkernel: '-' expected\n");
530 - 			return -EINVAL;
531 - 		}
532 - 		cur++;
533 - 
534 - 		/* if no ':' is here, than we read the end */
535 - 		if (*cur != ':') {
536 - 			end = memparse(cur, &tmp);
537 - 			if (cur == tmp) {
538 - 				pr_warn("crashkernel: Memory value expected\n");
539 - 				return -EINVAL;
540 - 			}
541 - 			cur = tmp;
542 - 			if (end <= start) {
543 - 				pr_warn("crashkernel: end <= start\n");
544 - 				return -EINVAL;
545 - 			}
546 - 		}
547 - 
548 - 		if (*cur != ':') {
549 - 			pr_warn("crashkernel: ':' expected\n");
550 - 			return -EINVAL;
551 - 		}
552 - 		cur++;
553 - 
554 - 		size = memparse(cur, &tmp);
555 - 		if (cur == tmp) {
556 - 			pr_warn("Memory value expected\n");
557 - 			return -EINVAL;
558 - 		}
559 - 		cur = tmp;
560 - 		if (size >= system_ram) {
561 - 			pr_warn("crashkernel: invalid size\n");
562 - 			return -EINVAL;
563 - 		}
564 - 
565 - 		/* match ? */
566 - 		if (system_ram >= start && system_ram < end) {
567 - 			*crash_size = size;
568 - 			break;
569 - 		}
570 - 	} while (*cur++ == ',');
571 - 
572 - 	if (*crash_size > 0) {
573 - 		while (*cur && *cur != ' ' && *cur != '@')
574 - 			cur++;
575 - 		if (*cur == '@') {
576 - 			cur++;
577 - 			*crash_base = memparse(cur, &tmp);
578 - 			if (cur == tmp) {
579 - 				pr_warn("Memory value expected after '@'\n");
580 - 				return -EINVAL;
581 - 			}
582 - 		}
583 - 	}
584 - 
585 - 	return 0;
586 - }
587 - 
588 - /*
589 -  * That function parses "simple" (old) crashkernel command lines like
590 -  *
591 -  *	crashkernel=size[@offset]
592 -  *
593 -  * It returns 0 on success and -EINVAL on failure.
594 -  */
595 - static int __init parse_crashkernel_simple(char *cmdline,
596 - 					   unsigned long long *crash_size,
597 - 					   unsigned long long *crash_base)
598 - {
599 - 	char *cur = cmdline;
600 - 
601 - 	*crash_size = memparse(cmdline, &cur);
602 - 	if (cmdline == cur) {
603 - 		pr_warn("crashkernel: memory value expected\n");
604 - 		return -EINVAL;
605 - 	}
606 - 
607 - 	if (*cur == '@')
608 - 		*crash_base = memparse(cur+1, &cur);
609 - 	else if (*cur != ' ' && *cur != '\0') {
610 - 		pr_warn("crashkernel: unrecognized char\n");
611 - 		return -EINVAL;
612 - 	}
613 - 
614 - 	return 0;
615 - }
616 - 
617 - #define SUFFIX_HIGH 0
618 - #define SUFFIX_LOW 1
619 - #define SUFFIX_NULL 2
620 - static __initdata char *suffix_tbl[] = {
621 - 	[SUFFIX_HIGH] = ",high",
622 - 	[SUFFIX_LOW] = ",low",
623 - 	[SUFFIX_NULL] = NULL,
624 - };
625 - 
626 - /*
627 -  * That function parses "suffix" crashkernel command lines like
628 -  *
629 -  *	crashkernel=size,[high|low]
630 -  *
631 -  * It returns 0 on success and -EINVAL on failure.
632 - */ 633 - static int __init parse_crashkernel_suffix(char *cmdline, 634 - unsigned long long *crash_size, 635 - const char *suffix) 636 - { 637 - char *cur = cmdline; 638 - 639 - *crash_size = memparse(cmdline, &cur); 640 - if (cmdline == cur) { 641 - pr_warn("crashkernel: memory value expected\n"); 642 - return -EINVAL; 643 - } 644 - 645 - /* check with suffix */ 646 - if (strncmp(cur, suffix, strlen(suffix))) { 647 - pr_warn("crashkernel: unrecognized char\n"); 648 - return -EINVAL; 649 - } 650 - cur += strlen(suffix); 651 - if (*cur != ' ' && *cur != '\0') { 652 - pr_warn("crashkernel: unrecognized char\n"); 653 - return -EINVAL; 654 - } 655 - 656 - return 0; 657 - } 658 - 659 - static __init char *get_last_crashkernel(char *cmdline, 660 - const char *name, 661 - const char *suffix) 662 - { 663 - char *p = cmdline, *ck_cmdline = NULL; 664 - 665 - /* find crashkernel and use the last one if there are more */ 666 - p = strstr(p, name); 667 - while (p) { 668 - char *end_p = strchr(p, ' '); 669 - char *q; 670 - 671 - if (!end_p) 672 - end_p = p + strlen(p); 673 - 674 - if (!suffix) { 675 - int i; 676 - 677 - /* skip the one with any known suffix */ 678 - for (i = 0; suffix_tbl[i]; i++) { 679 - q = end_p - strlen(suffix_tbl[i]); 680 - if (!strncmp(q, suffix_tbl[i], 681 - strlen(suffix_tbl[i]))) 682 - goto next; 683 - } 684 - ck_cmdline = p; 685 - } else { 686 - q = end_p - strlen(suffix); 687 - if (!strncmp(q, suffix, strlen(suffix))) 688 - ck_cmdline = p; 689 - } 690 - next: 691 - p = strstr(p+1, name); 692 - } 693 - 694 - if (!ck_cmdline) 695 - return NULL; 696 - 697 - return ck_cmdline; 698 - } 699 - 700 - static int __init __parse_crashkernel(char *cmdline, 701 - unsigned long long system_ram, 702 - unsigned long long *crash_size, 703 - unsigned long long *crash_base, 704 - const char *name, 705 - const char *suffix) 706 - { 707 - char *first_colon, *first_space; 708 - char *ck_cmdline; 709 - 710 - BUG_ON(!crash_size || !crash_base); 711 - *crash_size = 0; 
712 - *crash_base = 0; 713 - 714 - ck_cmdline = get_last_crashkernel(cmdline, name, suffix); 715 - 716 - if (!ck_cmdline) 717 - return -EINVAL; 718 - 719 - ck_cmdline += strlen(name); 720 - 721 - if (suffix) 722 - return parse_crashkernel_suffix(ck_cmdline, crash_size, 723 - suffix); 724 - /* 725 - * if the commandline contains a ':', then that's the extended 726 - * syntax -- if not, it must be the classic syntax 727 - */ 728 - first_colon = strchr(ck_cmdline, ':'); 729 - first_space = strchr(ck_cmdline, ' '); 730 - if (first_colon && (!first_space || first_colon < first_space)) 731 - return parse_crashkernel_mem(ck_cmdline, system_ram, 732 - crash_size, crash_base); 733 - 734 - return parse_crashkernel_simple(ck_cmdline, crash_size, crash_base); 735 - } 736 - 737 - /* 738 - * That function is the entry point for command line parsing and should be 739 - * called from the arch-specific code. 740 - */ 741 - int __init parse_crashkernel(char *cmdline, 742 - unsigned long long system_ram, 743 - unsigned long long *crash_size, 744 - unsigned long long *crash_base) 745 - { 746 - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, 747 - "crashkernel=", NULL); 748 - } 749 - 750 - int __init parse_crashkernel_high(char *cmdline, 751 - unsigned long long system_ram, 752 - unsigned long long *crash_size, 753 - unsigned long long *crash_base) 754 - { 755 - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, 756 - "crashkernel=", suffix_tbl[SUFFIX_HIGH]); 757 - } 758 - 759 - int __init parse_crashkernel_low(char *cmdline, 760 - unsigned long long system_ram, 761 - unsigned long long *crash_size, 762 - unsigned long long *crash_base) 763 - { 764 - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, 765 - "crashkernel=", suffix_tbl[SUFFIX_LOW]); 766 - } 767 - 768 - static void update_vmcoreinfo_note(void) 769 - { 770 - u32 *buf = vmcoreinfo_note; 771 - 772 - if (!vmcoreinfo_size) 773 - return; 774 - buf = 
append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data, 775 - vmcoreinfo_size); 776 - final_note(buf); 777 - } 778 - 779 - void crash_save_vmcoreinfo(void) 780 - { 781 - vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds()); 782 - update_vmcoreinfo_note(); 783 - } 784 - 785 - void vmcoreinfo_append_str(const char *fmt, ...) 786 - { 787 - va_list args; 788 - char buf[0x50]; 789 - size_t r; 790 - 791 - va_start(args, fmt); 792 - r = vscnprintf(buf, sizeof(buf), fmt, args); 793 - va_end(args); 794 - 795 - r = min(r, vmcoreinfo_max_size - vmcoreinfo_size); 796 - 797 - memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r); 798 - 799 - vmcoreinfo_size += r; 800 - } 801 - 802 - /* 803 - * provide an empty default implementation here -- architecture 804 - * code may override this 805 - */ 806 - void __weak arch_crash_save_vmcoreinfo(void) 807 - {} 808 - 809 - unsigned long __weak paddr_vmcoreinfo_note(void) 810 - { 811 - return __pa((unsigned long)(char *)&vmcoreinfo_note); 812 - } 813 - 814 - static int __init crash_save_vmcoreinfo_init(void) 815 - { 816 - VMCOREINFO_OSRELEASE(init_uts_ns.name.release); 817 - VMCOREINFO_PAGESIZE(PAGE_SIZE); 818 - 819 - VMCOREINFO_SYMBOL(init_uts_ns); 820 - VMCOREINFO_SYMBOL(node_online_map); 821 - #ifdef CONFIG_MMU 822 - VMCOREINFO_SYMBOL(swapper_pg_dir); 823 - #endif 824 - VMCOREINFO_SYMBOL(_stext); 825 - VMCOREINFO_SYMBOL(vmap_area_list); 826 - 827 - #ifndef CONFIG_NEED_MULTIPLE_NODES 828 - VMCOREINFO_SYMBOL(mem_map); 829 - VMCOREINFO_SYMBOL(contig_page_data); 830 - #endif 831 - #ifdef CONFIG_SPARSEMEM 832 - VMCOREINFO_SYMBOL(mem_section); 833 - VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS); 834 - VMCOREINFO_STRUCT_SIZE(mem_section); 835 - VMCOREINFO_OFFSET(mem_section, section_mem_map); 836 - #endif 837 - VMCOREINFO_STRUCT_SIZE(page); 838 - VMCOREINFO_STRUCT_SIZE(pglist_data); 839 - VMCOREINFO_STRUCT_SIZE(zone); 840 - VMCOREINFO_STRUCT_SIZE(free_area); 841 - VMCOREINFO_STRUCT_SIZE(list_head); 842 - VMCOREINFO_SIZE(nodemask_t); 
843 - VMCOREINFO_OFFSET(page, flags); 844 - VMCOREINFO_OFFSET(page, _count); 845 - VMCOREINFO_OFFSET(page, mapping); 846 - VMCOREINFO_OFFSET(page, lru); 847 - VMCOREINFO_OFFSET(page, _mapcount); 848 - VMCOREINFO_OFFSET(page, private); 849 - VMCOREINFO_OFFSET(pglist_data, node_zones); 850 - VMCOREINFO_OFFSET(pglist_data, nr_zones); 851 - #ifdef CONFIG_FLAT_NODE_MEM_MAP 852 - VMCOREINFO_OFFSET(pglist_data, node_mem_map); 853 - #endif 854 - VMCOREINFO_OFFSET(pglist_data, node_start_pfn); 855 - VMCOREINFO_OFFSET(pglist_data, node_spanned_pages); 856 - VMCOREINFO_OFFSET(pglist_data, node_id); 857 - VMCOREINFO_OFFSET(zone, free_area); 858 - VMCOREINFO_OFFSET(zone, vm_stat); 859 - VMCOREINFO_OFFSET(zone, spanned_pages); 860 - VMCOREINFO_OFFSET(free_area, free_list); 861 - VMCOREINFO_OFFSET(list_head, next); 862 - VMCOREINFO_OFFSET(list_head, prev); 863 - VMCOREINFO_OFFSET(vmap_area, va_start); 864 - VMCOREINFO_OFFSET(vmap_area, list); 865 - VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER); 866 - log_buf_kexec_setup(); 867 - VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); 868 - VMCOREINFO_NUMBER(NR_FREE_PAGES); 869 - VMCOREINFO_NUMBER(PG_lru); 870 - VMCOREINFO_NUMBER(PG_private); 871 - VMCOREINFO_NUMBER(PG_swapcache); 872 - VMCOREINFO_NUMBER(PG_slab); 873 - #ifdef CONFIG_MEMORY_FAILURE 874 - VMCOREINFO_NUMBER(PG_hwpoison); 875 - #endif 876 - VMCOREINFO_NUMBER(PG_head_mask); 877 - VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE); 878 - #ifdef CONFIG_HUGETLBFS 879 - VMCOREINFO_SYMBOL(free_huge_page); 880 - #endif 881 - 882 - arch_crash_save_vmcoreinfo(); 883 - update_vmcoreinfo_note(); 884 - 885 - return 0; 886 - } 887 - 888 - subsys_initcall(crash_save_vmcoreinfo_init); 889 - 890 - #ifdef CONFIG_KEXEC_FILE 891 - static int locate_mem_hole_top_down(unsigned long start, unsigned long end, 892 - struct kexec_buf *kbuf) 893 - { 894 - struct kimage *image = kbuf->image; 895 - unsigned long temp_start, temp_end; 896 - 897 - temp_end = min(end, kbuf->buf_max); 898 - temp_start = 
temp_end - kbuf->memsz; 899 - 900 - do { 901 - /* align down start */ 902 - temp_start = temp_start & (~(kbuf->buf_align - 1)); 903 - 904 - if (temp_start < start || temp_start < kbuf->buf_min) 905 - return 0; 906 - 907 - temp_end = temp_start + kbuf->memsz - 1; 908 - 909 - /* 910 - * Make sure this does not conflict with any of existing 911 - * segments 912 - */ 913 - if (kimage_is_destination_range(image, temp_start, temp_end)) { 914 - temp_start = temp_start - PAGE_SIZE; 915 - continue; 916 - } 917 - 918 - /* We found a suitable memory range */ 919 - break; 920 - } while (1); 921 - 922 - /* If we are here, we found a suitable memory range */ 923 - kbuf->mem = temp_start; 924 - 925 - /* Success, stop navigating through remaining System RAM ranges */ 926 - return 1; 927 - } 928 - 929 - static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end, 930 - struct kexec_buf *kbuf) 931 - { 932 - struct kimage *image = kbuf->image; 933 - unsigned long temp_start, temp_end; 934 - 935 - temp_start = max(start, kbuf->buf_min); 936 - 937 - do { 938 - temp_start = ALIGN(temp_start, kbuf->buf_align); 939 - temp_end = temp_start + kbuf->memsz - 1; 940 - 941 - if (temp_end > end || temp_end > kbuf->buf_max) 942 - return 0; 943 - /* 944 - * Make sure this does not conflict with any of existing 945 - * segments 946 - */ 947 - if (kimage_is_destination_range(image, temp_start, temp_end)) { 948 - temp_start = temp_start + PAGE_SIZE; 949 - continue; 950 - } 951 - 952 - /* We found a suitable memory range */ 953 - break; 954 - } while (1); 955 - 956 - /* If we are here, we found a suitable memory range */ 957 - kbuf->mem = temp_start; 958 - 959 - /* Success, stop navigating through remaining System RAM ranges */ 960 - return 1; 961 - } 962 - 963 - static int locate_mem_hole_callback(u64 start, u64 end, void *arg) 964 - { 965 - struct kexec_buf *kbuf = (struct kexec_buf *)arg; 966 - unsigned long sz = end - start + 1; 967 - 968 - /* Returning 0 will take to next memory 
range */ 969 - if (sz < kbuf->memsz) 970 - return 0; 971 - 972 - if (end < kbuf->buf_min || start > kbuf->buf_max) 973 - return 0; 974 - 975 - /* 976 - * Allocate memory top down with-in ram range. Otherwise bottom up 977 - * allocation. 978 - */ 979 - if (kbuf->top_down) 980 - return locate_mem_hole_top_down(start, end, kbuf); 981 - return locate_mem_hole_bottom_up(start, end, kbuf); 982 - } 983 - 984 - /* 985 - * Helper function for placing a buffer in a kexec segment. This assumes 986 - * that kexec_mutex is held. 987 - */ 988 - int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz, 989 - unsigned long memsz, unsigned long buf_align, 990 - unsigned long buf_min, unsigned long buf_max, 991 - bool top_down, unsigned long *load_addr) 992 - { 993 - 994 - struct kexec_segment *ksegment; 995 - struct kexec_buf buf, *kbuf; 996 - int ret; 997 - 998 - /* Currently adding segment this way is allowed only in file mode */ 999 - if (!image->file_mode) 1000 - return -EINVAL; 1001 - 1002 - if (image->nr_segments >= KEXEC_SEGMENT_MAX) 1003 - return -EINVAL; 1004 - 1005 - /* 1006 - * Make sure we are not trying to add buffer after allocating 1007 - * control pages. All segments need to be placed first before 1008 - * any control pages are allocated. As control page allocation 1009 - * logic goes through list of segments to make sure there are 1010 - * no destination overlaps. 
1011 - */ 1012 - if (!list_empty(&image->control_pages)) { 1013 - WARN_ON(1); 1014 - return -EINVAL; 1015 - } 1016 - 1017 - memset(&buf, 0, sizeof(struct kexec_buf)); 1018 - kbuf = &buf; 1019 - kbuf->image = image; 1020 - kbuf->buffer = buffer; 1021 - kbuf->bufsz = bufsz; 1022 - 1023 - kbuf->memsz = ALIGN(memsz, PAGE_SIZE); 1024 - kbuf->buf_align = max(buf_align, PAGE_SIZE); 1025 - kbuf->buf_min = buf_min; 1026 - kbuf->buf_max = buf_max; 1027 - kbuf->top_down = top_down; 1028 - 1029 - /* Walk the RAM ranges and allocate a suitable range for the buffer */ 1030 - if (image->type == KEXEC_TYPE_CRASH) 1031 - ret = walk_iomem_res("Crash kernel", 1032 - IORESOURCE_MEM | IORESOURCE_BUSY, 1033 - crashk_res.start, crashk_res.end, kbuf, 1034 - locate_mem_hole_callback); 1035 - else 1036 - ret = walk_system_ram_res(0, -1, kbuf, 1037 - locate_mem_hole_callback); 1038 - if (ret != 1) { 1039 - /* A suitable memory range could not be found for buffer */ 1040 - return -EADDRNOTAVAIL; 1041 - } 1042 - 1043 - /* Found a suitable memory range */ 1044 - ksegment = &image->segment[image->nr_segments]; 1045 - ksegment->kbuf = kbuf->buffer; 1046 - ksegment->bufsz = kbuf->bufsz; 1047 - ksegment->mem = kbuf->mem; 1048 - ksegment->memsz = kbuf->memsz; 1049 - image->nr_segments++; 1050 - *load_addr = ksegment->mem; 1051 - return 0; 1052 - } 1053 - 1054 - /* Calculate and store the digest of segments */ 1055 - static int kexec_calculate_store_digests(struct kimage *image) 1056 - { 1057 - struct crypto_shash *tfm; 1058 - struct shash_desc *desc; 1059 - int ret = 0, i, j, zero_buf_sz, sha_region_sz; 1060 - size_t desc_size, nullsz; 1061 - char *digest; 1062 - void *zero_buf; 1063 - struct kexec_sha_region *sha_regions; 1064 - struct purgatory_info *pi = &image->purgatory_info; 1065 - 1066 - zero_buf = __va(page_to_pfn(ZERO_PAGE(0)) << PAGE_SHIFT); 1067 - zero_buf_sz = PAGE_SIZE; 1068 - 1069 - tfm = crypto_alloc_shash("sha256", 0, 0); 1070 - if (IS_ERR(tfm)) { 1071 - ret = PTR_ERR(tfm); 1072 - 
goto out; 1073 - } 1074 - 1075 - desc_size = crypto_shash_descsize(tfm) + sizeof(*desc); 1076 - desc = kzalloc(desc_size, GFP_KERNEL); 1077 - if (!desc) { 1078 - ret = -ENOMEM; 1079 - goto out_free_tfm; 1080 - } 1081 - 1082 - sha_region_sz = KEXEC_SEGMENT_MAX * sizeof(struct kexec_sha_region); 1083 - sha_regions = vzalloc(sha_region_sz); 1084 - if (!sha_regions) 1085 - goto out_free_desc; 1086 - 1087 - desc->tfm = tfm; 1088 - desc->flags = 0; 1089 - 1090 - ret = crypto_shash_init(desc); 1091 - if (ret < 0) 1092 - goto out_free_sha_regions; 1093 - 1094 - digest = kzalloc(SHA256_DIGEST_SIZE, GFP_KERNEL); 1095 - if (!digest) { 1096 - ret = -ENOMEM; 1097 - goto out_free_sha_regions; 1098 - } 1099 - 1100 - for (j = i = 0; i < image->nr_segments; i++) { 1101 - struct kexec_segment *ksegment; 1102 - 1103 - ksegment = &image->segment[i]; 1104 - /* 1105 - * Skip purgatory as it will be modified once we put digest 1106 - * info in purgatory. 1107 - */ 1108 - if (ksegment->kbuf == pi->purgatory_buf) 1109 - continue; 1110 - 1111 - ret = crypto_shash_update(desc, ksegment->kbuf, 1112 - ksegment->bufsz); 1113 - if (ret) 1114 - break; 1115 - 1116 - /* 1117 - * Assume rest of the buffer is filled with zero and 1118 - * update digest accordingly. 
1119 - */ 1120 - nullsz = ksegment->memsz - ksegment->bufsz; 1121 - while (nullsz) { 1122 - unsigned long bytes = nullsz; 1123 - 1124 - if (bytes > zero_buf_sz) 1125 - bytes = zero_buf_sz; 1126 - ret = crypto_shash_update(desc, zero_buf, bytes); 1127 - if (ret) 1128 - break; 1129 - nullsz -= bytes; 1130 - } 1131 - 1132 - if (ret) 1133 - break; 1134 - 1135 - sha_regions[j].start = ksegment->mem; 1136 - sha_regions[j].len = ksegment->memsz; 1137 - j++; 1138 - } 1139 - 1140 - if (!ret) { 1141 - ret = crypto_shash_final(desc, digest); 1142 - if (ret) 1143 - goto out_free_digest; 1144 - ret = kexec_purgatory_get_set_symbol(image, "sha_regions", 1145 - sha_regions, sha_region_sz, 0); 1146 - if (ret) 1147 - goto out_free_digest; 1148 - 1149 - ret = kexec_purgatory_get_set_symbol(image, "sha256_digest", 1150 - digest, SHA256_DIGEST_SIZE, 0); 1151 - if (ret) 1152 - goto out_free_digest; 1153 - } 1154 - 1155 - out_free_digest: 1156 - kfree(digest); 1157 - out_free_sha_regions: 1158 - vfree(sha_regions); 1159 - out_free_desc: 1160 - kfree(desc); 1161 - out_free_tfm: 1162 - kfree(tfm); 1163 - out: 1164 - return ret; 1165 - } 1166 - 1167 - /* Actually load purgatory. Lot of code taken from kexec-tools */ 1168 - static int __kexec_load_purgatory(struct kimage *image, unsigned long min, 1169 - unsigned long max, int top_down) 1170 - { 1171 - struct purgatory_info *pi = &image->purgatory_info; 1172 - unsigned long align, buf_align, bss_align, buf_sz, bss_sz, bss_pad; 1173 - unsigned long memsz, entry, load_addr, curr_load_addr, bss_addr, offset; 1174 - unsigned char *buf_addr, *src; 1175 - int i, ret = 0, entry_sidx = -1; 1176 - const Elf_Shdr *sechdrs_c; 1177 - Elf_Shdr *sechdrs = NULL; 1178 - void *purgatory_buf = NULL; 1179 - 1180 - /* 1181 - * sechdrs_c points to section headers in purgatory and are read 1182 - * only. No modifications allowed. 
1183 - */ 1184 - sechdrs_c = (void *)pi->ehdr + pi->ehdr->e_shoff; 1185 - 1186 - /* 1187 - * We can not modify sechdrs_c[] and its fields. It is read only. 1188 - * Copy it over to a local copy where one can store some temporary 1189 - * data and free it at the end. We need to modify ->sh_addr and 1190 - * ->sh_offset fields to keep track of permanent and temporary 1191 - * locations of sections. 1192 - */ 1193 - sechdrs = vzalloc(pi->ehdr->e_shnum * sizeof(Elf_Shdr)); 1194 - if (!sechdrs) 1195 - return -ENOMEM; 1196 - 1197 - memcpy(sechdrs, sechdrs_c, pi->ehdr->e_shnum * sizeof(Elf_Shdr)); 1198 - 1199 - /* 1200 - * We seem to have multiple copies of sections. First copy is which 1201 - * is embedded in kernel in read only section. Some of these sections 1202 - * will be copied to a temporary buffer and relocated. And these 1203 - * sections will finally be copied to their final destination at 1204 - * segment load time. 1205 - * 1206 - * Use ->sh_offset to reflect section address in memory. It will 1207 - * point to original read only copy if section is not allocatable. 1208 - * Otherwise it will point to temporary copy which will be relocated. 1209 - * 1210 - * Use ->sh_addr to contain final address of the section where it 1211 - * will go during execution time. 1212 - */ 1213 - for (i = 0; i < pi->ehdr->e_shnum; i++) { 1214 - if (sechdrs[i].sh_type == SHT_NOBITS) 1215 - continue; 1216 - 1217 - sechdrs[i].sh_offset = (unsigned long)pi->ehdr + 1218 - sechdrs[i].sh_offset; 1219 - } 1220 - 1221 - /* 1222 - * Identify entry point section and make entry relative to section 1223 - * start. 
1224 - */ 1225 - entry = pi->ehdr->e_entry; 1226 - for (i = 0; i < pi->ehdr->e_shnum; i++) { 1227 - if (!(sechdrs[i].sh_flags & SHF_ALLOC)) 1228 - continue; 1229 - 1230 - if (!(sechdrs[i].sh_flags & SHF_EXECINSTR)) 1231 - continue; 1232 - 1233 - /* Make entry section relative */ 1234 - if (sechdrs[i].sh_addr <= pi->ehdr->e_entry && 1235 - ((sechdrs[i].sh_addr + sechdrs[i].sh_size) > 1236 - pi->ehdr->e_entry)) { 1237 - entry_sidx = i; 1238 - entry -= sechdrs[i].sh_addr; 1239 - break; 1240 - } 1241 - } 1242 - 1243 - /* Determine how much memory is needed to load relocatable object. */ 1244 - buf_align = 1; 1245 - bss_align = 1; 1246 - buf_sz = 0; 1247 - bss_sz = 0; 1248 - 1249 - for (i = 0; i < pi->ehdr->e_shnum; i++) { 1250 - if (!(sechdrs[i].sh_flags & SHF_ALLOC)) 1251 - continue; 1252 - 1253 - align = sechdrs[i].sh_addralign; 1254 - if (sechdrs[i].sh_type != SHT_NOBITS) { 1255 - if (buf_align < align) 1256 - buf_align = align; 1257 - buf_sz = ALIGN(buf_sz, align); 1258 - buf_sz += sechdrs[i].sh_size; 1259 - } else { 1260 - /* bss section */ 1261 - if (bss_align < align) 1262 - bss_align = align; 1263 - bss_sz = ALIGN(bss_sz, align); 1264 - bss_sz += sechdrs[i].sh_size; 1265 - } 1266 - } 1267 - 1268 - /* Determine the bss padding required to align bss properly */ 1269 - bss_pad = 0; 1270 - if (buf_sz & (bss_align - 1)) 1271 - bss_pad = bss_align - (buf_sz & (bss_align - 1)); 1272 - 1273 - memsz = buf_sz + bss_pad + bss_sz; 1274 - 1275 - /* Allocate buffer for purgatory */ 1276 - purgatory_buf = vzalloc(buf_sz); 1277 - if (!purgatory_buf) { 1278 - ret = -ENOMEM; 1279 - goto out; 1280 - } 1281 - 1282 - if (buf_align < bss_align) 1283 - buf_align = bss_align; 1284 - 1285 - /* Add buffer to segment list */ 1286 - ret = kexec_add_buffer(image, purgatory_buf, buf_sz, memsz, 1287 - buf_align, min, max, top_down, 1288 - &pi->purgatory_load_addr); 1289 - if (ret) 1290 - goto out; 1291 - 1292 - /* Load SHF_ALLOC sections */ 1293 - buf_addr = purgatory_buf; 1294 - load_addr = 
curr_load_addr = pi->purgatory_load_addr; 1295 - bss_addr = load_addr + buf_sz + bss_pad; 1296 - 1297 - for (i = 0; i < pi->ehdr->e_shnum; i++) { 1298 - if (!(sechdrs[i].sh_flags & SHF_ALLOC)) 1299 - continue; 1300 - 1301 - align = sechdrs[i].sh_addralign; 1302 - if (sechdrs[i].sh_type != SHT_NOBITS) { 1303 - curr_load_addr = ALIGN(curr_load_addr, align); 1304 - offset = curr_load_addr - load_addr; 1305 - /* We already modifed ->sh_offset to keep src addr */ 1306 - src = (char *) sechdrs[i].sh_offset; 1307 - memcpy(buf_addr + offset, src, sechdrs[i].sh_size); 1308 - 1309 - /* Store load address and source address of section */ 1310 - sechdrs[i].sh_addr = curr_load_addr; 1311 - 1312 - /* 1313 - * This section got copied to temporary buffer. Update 1314 - * ->sh_offset accordingly. 1315 - */ 1316 - sechdrs[i].sh_offset = (unsigned long)(buf_addr + offset); 1317 - 1318 - /* Advance to the next address */ 1319 - curr_load_addr += sechdrs[i].sh_size; 1320 - } else { 1321 - bss_addr = ALIGN(bss_addr, align); 1322 - sechdrs[i].sh_addr = bss_addr; 1323 - bss_addr += sechdrs[i].sh_size; 1324 - } 1325 - } 1326 - 1327 - /* Update entry point based on load address of text section */ 1328 - if (entry_sidx >= 0) 1329 - entry += sechdrs[entry_sidx].sh_addr; 1330 - 1331 - /* Make kernel jump to purgatory after shutdown */ 1332 - image->start = entry; 1333 - 1334 - /* Used later to get/set symbol values */ 1335 - pi->sechdrs = sechdrs; 1336 - 1337 - /* 1338 - * Used later to identify which section is purgatory and skip it 1339 - * from checksumming. 
1340 - */ 1341 - pi->purgatory_buf = purgatory_buf; 1342 - return ret; 1343 - out: 1344 - vfree(sechdrs); 1345 - vfree(purgatory_buf); 1346 - return ret; 1347 - } 1348 - 1349 - static int kexec_apply_relocations(struct kimage *image) 1350 - { 1351 - int i, ret; 1352 - struct purgatory_info *pi = &image->purgatory_info; 1353 - Elf_Shdr *sechdrs = pi->sechdrs; 1354 - 1355 - /* Apply relocations */ 1356 - for (i = 0; i < pi->ehdr->e_shnum; i++) { 1357 - Elf_Shdr *section, *symtab; 1358 - 1359 - if (sechdrs[i].sh_type != SHT_RELA && 1360 - sechdrs[i].sh_type != SHT_REL) 1361 - continue; 1362 - 1363 - /* 1364 - * For section of type SHT_RELA/SHT_REL, 1365 - * ->sh_link contains section header index of associated 1366 - * symbol table. And ->sh_info contains section header 1367 - * index of section to which relocations apply. 1368 - */ 1369 - if (sechdrs[i].sh_info >= pi->ehdr->e_shnum || 1370 - sechdrs[i].sh_link >= pi->ehdr->e_shnum) 1371 - return -ENOEXEC; 1372 - 1373 - section = &sechdrs[sechdrs[i].sh_info]; 1374 - symtab = &sechdrs[sechdrs[i].sh_link]; 1375 - 1376 - if (!(section->sh_flags & SHF_ALLOC)) 1377 - continue; 1378 - 1379 - /* 1380 - * symtab->sh_link contain section header index of associated 1381 - * string table. 1382 - */ 1383 - if (symtab->sh_link >= pi->ehdr->e_shnum) 1384 - /* Invalid section number? */ 1385 - continue; 1386 - 1387 - /* 1388 - * Respective architecture needs to provide support for applying 1389 - * relocations of type SHT_RELA/SHT_REL. 
1390 - */ 1391 - if (sechdrs[i].sh_type == SHT_RELA) 1392 - ret = arch_kexec_apply_relocations_add(pi->ehdr, 1393 - sechdrs, i); 1394 - else if (sechdrs[i].sh_type == SHT_REL) 1395 - ret = arch_kexec_apply_relocations(pi->ehdr, 1396 - sechdrs, i); 1397 - if (ret) 1398 - return ret; 1399 - } 1400 - 1401 - return 0; 1402 - } 1403 - 1404 - /* Load relocatable purgatory object and relocate it appropriately */ 1405 - int kexec_load_purgatory(struct kimage *image, unsigned long min, 1406 - unsigned long max, int top_down, 1407 - unsigned long *load_addr) 1408 - { 1409 - struct purgatory_info *pi = &image->purgatory_info; 1410 - int ret; 1411 - 1412 - if (kexec_purgatory_size <= 0) 1413 - return -EINVAL; 1414 - 1415 - if (kexec_purgatory_size < sizeof(Elf_Ehdr)) 1416 - return -ENOEXEC; 1417 - 1418 - pi->ehdr = (Elf_Ehdr *)kexec_purgatory; 1419 - 1420 - if (memcmp(pi->ehdr->e_ident, ELFMAG, SELFMAG) != 0 1421 - || pi->ehdr->e_type != ET_REL 1422 - || !elf_check_arch(pi->ehdr) 1423 - || pi->ehdr->e_shentsize != sizeof(Elf_Shdr)) 1424 - return -ENOEXEC; 1425 - 1426 - if (pi->ehdr->e_shoff >= kexec_purgatory_size 1427 - || (pi->ehdr->e_shnum * sizeof(Elf_Shdr) > 1428 - kexec_purgatory_size - pi->ehdr->e_shoff)) 1429 - return -ENOEXEC; 1430 - 1431 - ret = __kexec_load_purgatory(image, min, max, top_down); 1432 - if (ret) 1433 - return ret; 1434 - 1435 - ret = kexec_apply_relocations(image); 1436 - if (ret) 1437 - goto out; 1438 - 1439 - *load_addr = pi->purgatory_load_addr; 1440 - return 0; 1441 - out: 1442 - vfree(pi->sechdrs); 1443 - vfree(pi->purgatory_buf); 1444 - return ret; 1445 - } 1446 - 1447 - static Elf_Sym *kexec_purgatory_find_symbol(struct purgatory_info *pi, 1448 - const char *name) 1449 - { 1450 - Elf_Sym *syms; 1451 - Elf_Shdr *sechdrs; 1452 - Elf_Ehdr *ehdr; 1453 - int i, k; 1454 - const char *strtab; 1455 - 1456 - if (!pi->sechdrs || !pi->ehdr) 1457 - return NULL; 1458 - 1459 - sechdrs = pi->sechdrs; 1460 - ehdr = pi->ehdr; 1461 - 1462 - for (i = 0; i < 
ehdr->e_shnum; i++) { 1463 - if (sechdrs[i].sh_type != SHT_SYMTAB) 1464 - continue; 1465 - 1466 - if (sechdrs[i].sh_link >= ehdr->e_shnum) 1467 - /* Invalid strtab section number */ 1468 - continue; 1469 - strtab = (char *)sechdrs[sechdrs[i].sh_link].sh_offset; 1470 - syms = (Elf_Sym *)sechdrs[i].sh_offset; 1471 - 1472 - /* Go through symbols for a match */ 1473 - for (k = 0; k < sechdrs[i].sh_size/sizeof(Elf_Sym); k++) { 1474 - if (ELF_ST_BIND(syms[k].st_info) != STB_GLOBAL) 1475 - continue; 1476 - 1477 - if (strcmp(strtab + syms[k].st_name, name) != 0) 1478 - continue; 1479 - 1480 - if (syms[k].st_shndx == SHN_UNDEF || 1481 - syms[k].st_shndx >= ehdr->e_shnum) { 1482 - pr_debug("Symbol: %s has bad section index %d.\n", 1483 - name, syms[k].st_shndx); 1484 - return NULL; 1485 - } 1486 - 1487 - /* Found the symbol we are looking for */ 1488 - return &syms[k]; 1489 - } 1490 - } 1491 - 1492 - return NULL; 1493 - } 1494 - 1495 - void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name) 1496 - { 1497 - struct purgatory_info *pi = &image->purgatory_info; 1498 - Elf_Sym *sym; 1499 - Elf_Shdr *sechdr; 1500 - 1501 - sym = kexec_purgatory_find_symbol(pi, name); 1502 - if (!sym) 1503 - return ERR_PTR(-EINVAL); 1504 - 1505 - sechdr = &pi->sechdrs[sym->st_shndx]; 1506 - 1507 - /* 1508 - * Returns the address where symbol will finally be loaded after 1509 - * kexec_load_segment() 1510 - */ 1511 - return (void *)(sechdr->sh_addr + sym->st_value); 1512 - } 1513 - 1514 - /* 1515 - * Get or set value of a symbol. If "get_value" is true, symbol value is 1516 - * returned in buf otherwise symbol value is set based on value in buf. 
1517 - */ 1518 - int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name, 1519 - void *buf, unsigned int size, bool get_value) 1520 - { 1521 - Elf_Sym *sym; 1522 - Elf_Shdr *sechdrs; 1523 - struct purgatory_info *pi = &image->purgatory_info; 1524 - char *sym_buf; 1525 - 1526 - sym = kexec_purgatory_find_symbol(pi, name); 1527 - if (!sym) 1528 - return -EINVAL; 1529 - 1530 - if (sym->st_size != size) { 1531 - pr_err("symbol %s size mismatch: expected %lu actual %u\n", 1532 - name, (unsigned long)sym->st_size, size); 1533 - return -EINVAL; 1534 - } 1535 - 1536 - sechdrs = pi->sechdrs; 1537 - 1538 - if (sechdrs[sym->st_shndx].sh_type == SHT_NOBITS) { 1539 - pr_err("symbol %s is in a bss section. Cannot %s\n", name, 1540 - get_value ? "get" : "set"); 1541 - return -EINVAL; 1542 - } 1543 - 1544 - sym_buf = (unsigned char *)sechdrs[sym->st_shndx].sh_offset + 1545 - sym->st_value; 1546 - 1547 - if (get_value) 1548 - memcpy((void *)buf, sym_buf, size); 1549 - else 1550 - memcpy((void *)sym_buf, buf, size); 1551 - 1552 - return 0; 1553 - } 1554 - #endif /* CONFIG_KEXEC_FILE */ 1555 - 1556 - /* 1557 - * Move into place and start executing a preloaded standalone 1558 - * executable. If nothing was preloaded return an error. 1559 - */ 1560 - int kernel_kexec(void) 1561 - { 1562 - int error = 0; 1563 - 1564 - if (!mutex_trylock(&kexec_mutex)) 1565 - return -EBUSY; 1566 - if (!kexec_image) { 1567 - error = -EINVAL; 1568 - goto Unlock; 1569 - } 1570 - 1571 - #ifdef CONFIG_KEXEC_JUMP 1572 - if (kexec_image->preserve_context) { 1573 - lock_system_sleep(); 1574 - pm_prepare_console(); 1575 - error = freeze_processes(); 1576 - if (error) { 1577 - error = -EBUSY; 1578 - goto Restore_console; 1579 - } 1580 - suspend_console(); 1581 - error = dpm_suspend_start(PMSG_FREEZE); 1582 - if (error) 1583 - goto Resume_console; 1584 - /* At this point, dpm_suspend_start() has been called, 1585 - * but *not* dpm_suspend_end(). We *must* call 1586 - * dpm_suspend_end() now. 
Otherwise, drivers for 1587 - * some devices (e.g. interrupt controllers) become 1588 - * desynchronized with the actual state of the 1589 - * hardware at resume time, and evil weirdness ensues. 1590 - */ 1591 - error = dpm_suspend_end(PMSG_FREEZE); 1592 - if (error) 1593 - goto Resume_devices; 1594 - error = disable_nonboot_cpus(); 1595 - if (error) 1596 - goto Enable_cpus; 1597 - local_irq_disable(); 1598 - error = syscore_suspend(); 1599 - if (error) 1600 - goto Enable_irqs; 1601 - } else 1602 - #endif 1603 - { 1604 - kexec_in_progress = true; 1605 - kernel_restart_prepare(NULL); 1606 - migrate_to_reboot_cpu(); 1607 - 1608 - /* 1609 - * migrate_to_reboot_cpu() disables CPU hotplug assuming that 1610 - * no further code needs to use CPU hotplug (which is true in 1611 - * the reboot case). However, the kexec path depends on using 1612 - * CPU hotplug again; so re-enable it here. 1613 - */ 1614 - cpu_hotplug_enable(); 1615 - pr_emerg("Starting new kernel\n"); 1616 - machine_shutdown(); 1617 - } 1618 - 1619 - machine_kexec(kexec_image); 1620 - 1621 - #ifdef CONFIG_KEXEC_JUMP 1622 - if (kexec_image->preserve_context) { 1623 - syscore_resume(); 1624 - Enable_irqs: 1625 - local_irq_enable(); 1626 - Enable_cpus: 1627 - enable_nonboot_cpus(); 1628 - dpm_resume_start(PMSG_RESTORE); 1629 - Resume_devices: 1630 - dpm_resume_end(PMSG_RESTORE); 1631 - Resume_console: 1632 - resume_console(); 1633 - thaw_processes(); 1634 - Restore_console: 1635 - pm_restore_console(); 1636 - unlock_system_sleep(); 1637 - } 1638 - #endif 1639 - 1640 - Unlock: 1641 - mutex_unlock(&kexec_mutex); 1642 - return error; 1643 - }
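The CONFIG_KEXEC_JUMP path in kernel_kexec() above relies on a goto-label ladder: each setup step that fails jumps to the label that undoes exactly the steps completed so far, so teardown always runs in reverse order of setup. Below is a minimal userspace sketch of that idiom, not kernel code — the stage names freeze/suspend/irq_off are invented for illustration and stand in for freeze_processes(), dpm_suspend_start(), local_irq_disable() and their counterparts.

```c
/* A minimal sketch (not kernel code) of the reverse-order error
 * unwinding kernel_kexec() uses for its CONFIG_KEXEC_JUMP path:
 * each setup step that fails jumps to the label that undoes only
 * the steps already completed.
 */
#include <string.h>

static char trace[64];          /* records the order steps ran in */

static int step(const char *name, int fail)
{
	strcat(trace, name);
	strcat(trace, ";");
	return fail ? -1 : 0;
}

static void undo(const char *name)
{
	strcat(trace, name);
	strcat(trace, ";");
}

/* fail_at picks which stage reports an error (0 = none fail). */
static int do_transition(int fail_at)
{
	int error = 0;

	error = step("freeze", fail_at == 1);
	if (error)
		goto Out;
	error = step("suspend", fail_at == 2);
	if (error)
		goto Thaw;
	error = step("irq_off", fail_at == 3);
	if (error)
		goto Resume;

	/* ...machine_kexec() would jump to the new kernel here... */

	undo("irq_on");
Resume:
	undo("resume");
Thaw:
	undo("thaw");
Out:
	return error;
}
```

If the second stage fails, only the labels at and below its goto target run (here just "thaw"), mirroring how kernel_kexec() resumes only the devices it actually suspended.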
+1534
kernel/kexec_core.c
··· 1 + /* 2 + * kexec.c - kexec system call core code. 3 + * Copyright (C) 2002-2004 Eric Biederman <ebiederm@xmission.com> 4 + * 5 + * This source code is licensed under the GNU General Public License, 6 + * Version 2. See the file COPYING for more details. 7 + */ 8 + 9 + #define pr_fmt(fmt) "kexec: " fmt 10 + 11 + #include <linux/capability.h> 12 + #include <linux/mm.h> 13 + #include <linux/file.h> 14 + #include <linux/slab.h> 15 + #include <linux/fs.h> 16 + #include <linux/kexec.h> 17 + #include <linux/mutex.h> 18 + #include <linux/list.h> 19 + #include <linux/highmem.h> 20 + #include <linux/syscalls.h> 21 + #include <linux/reboot.h> 22 + #include <linux/ioport.h> 23 + #include <linux/hardirq.h> 24 + #include <linux/elf.h> 25 + #include <linux/elfcore.h> 26 + #include <linux/utsname.h> 27 + #include <linux/numa.h> 28 + #include <linux/suspend.h> 29 + #include <linux/device.h> 30 + #include <linux/freezer.h> 31 + #include <linux/pm.h> 32 + #include <linux/cpu.h> 33 + #include <linux/uaccess.h> 34 + #include <linux/io.h> 35 + #include <linux/console.h> 36 + #include <linux/vmalloc.h> 37 + #include <linux/swap.h> 38 + #include <linux/syscore_ops.h> 39 + #include <linux/compiler.h> 40 + #include <linux/hugetlb.h> 41 + 42 + #include <asm/page.h> 43 + #include <asm/sections.h> 44 + 45 + #include <crypto/hash.h> 46 + #include <crypto/sha.h> 47 + #include "kexec_internal.h" 48 + 49 + DEFINE_MUTEX(kexec_mutex); 50 + 51 + /* Per cpu memory for storing cpu states in case of system crash. 
*/ 52 + note_buf_t __percpu *crash_notes; 53 + 54 + /* vmcoreinfo stuff */ 55 + static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES]; 56 + u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; 57 + size_t vmcoreinfo_size; 58 + size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); 59 + 60 + /* Flag to indicate we are going to kexec a new kernel */ 61 + bool kexec_in_progress = false; 62 + 63 + 64 + /* Location of the reserved area for the crash kernel */ 65 + struct resource crashk_res = { 66 + .name = "Crash kernel", 67 + .start = 0, 68 + .end = 0, 69 + .flags = IORESOURCE_BUSY | IORESOURCE_MEM 70 + }; 71 + struct resource crashk_low_res = { 72 + .name = "Crash kernel", 73 + .start = 0, 74 + .end = 0, 75 + .flags = IORESOURCE_BUSY | IORESOURCE_MEM 76 + }; 77 + 78 + int kexec_should_crash(struct task_struct *p) 79 + { 80 + /* 81 + * If crash_kexec_post_notifiers is enabled, don't run 82 + * crash_kexec() here yet, which must be run after panic 83 + * notifiers in panic(). 84 + */ 85 + if (crash_kexec_post_notifiers) 86 + return 0; 87 + /* 88 + * There are 4 panic() calls in do_exit() path, each of which 89 + * corresponds to each of these 4 conditions. 90 + */ 91 + if (in_interrupt() || !p->pid || is_global_init(p) || panic_on_oops) 92 + return 1; 93 + return 0; 94 + } 95 + 96 + /* 97 + * When kexec transitions to the new kernel there is a one-to-one 98 + * mapping between physical and virtual addresses. On processors 99 + * where you can disable the MMU this is trivial, and easy. For 100 + * others it is still a simple predictable page table to setup. 101 + * 102 + * In that environment kexec copies the new kernel to its final 103 + * resting place. This means I can only support memory whose 104 + * physical address can fit in an unsigned long. In particular 105 + * addresses where (pfn << PAGE_SHIFT) > ULONG_MAX cannot be handled. 
106 + * If the assembly stub has more restrictive requirements 107 + * KEXEC_SOURCE_MEMORY_LIMIT and KEXEC_DEST_MEMORY_LIMIT can be 108 + * defined more restrictively in <asm/kexec.h>. 109 + * 110 + * The code for the transition from the current kernel to the 111 + * new kernel is placed in the control_code_buffer, whose size 112 + * is given by KEXEC_CONTROL_PAGE_SIZE. In the best case only a single 113 + * page of memory is necessary, but some architectures require more. 114 + * Because this memory must be identity mapped in the transition from 115 + * virtual to physical addresses it must live in the range 116 + * 0 - TASK_SIZE, as only the user space mappings are arbitrarily 117 + * modifiable. 118 + * 119 + * The assembly stub in the control code buffer is passed a linked list 120 + * of descriptor pages detailing the source pages of the new kernel, 121 + * and the destination addresses of those source pages. As this data 122 + * structure is not used in the context of the current OS, it must 123 + * be self-contained. 124 + * 125 + * The code has been made to work with highmem pages and will use a 126 + * destination page in its final resting place (if it happens 127 + * to allocate it). The end product of this is that most of the 128 + * physical address space, and most of RAM can be used. 129 + * 130 + * Future directions include: 131 + * - allocating a page table with the control code buffer identity 132 + * mapped, to simplify machine_kexec and make kexec_on_panic more 133 + * reliable. 134 + */ 135 + 136 + /* 137 + * KIMAGE_NO_DEST is an impossible destination address..., for 138 + * allocating pages whose destination address we do not care about. 
139 + */ 140 + #define KIMAGE_NO_DEST (-1UL) 141 + 142 + static struct page *kimage_alloc_page(struct kimage *image, 143 + gfp_t gfp_mask, 144 + unsigned long dest); 145 + 146 + int sanity_check_segment_list(struct kimage *image) 147 + { 148 + int result, i; 149 + unsigned long nr_segments = image->nr_segments; 150 + 151 + /* 152 + * Verify we have good destination addresses. The caller is 153 + * responsible for making certain we don't attempt to load 154 + * the new image into invalid or reserved areas of RAM. This 155 + * just verifies it is an address we can use. 156 + * 157 + * Since the kernel does everything in page size chunks ensure 158 + * the destination addresses are page aligned. Too many 159 + * special cases crop up when we don't do this. The most 160 + * insidious is getting overlapping destination addresses 161 + * simply because addresses are changed to page size 162 + * granularity. 163 + */ 164 + result = -EADDRNOTAVAIL; 165 + for (i = 0; i < nr_segments; i++) { 166 + unsigned long mstart, mend; 167 + 168 + mstart = image->segment[i].mem; 169 + mend = mstart + image->segment[i].memsz; 170 + if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK)) 171 + return result; 172 + if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT) 173 + return result; 174 + } 175 + 176 + /* Verify our destination addresses do not overlap. 177 + * If we allowed overlapping destination addresses 178 + * through, very weird things can happen with no 179 + * easy explanation as one segment stops on another. 180 + */ 181 + result = -EINVAL; 182 + for (i = 0; i < nr_segments; i++) { 183 + unsigned long mstart, mend; 184 + unsigned long j; 185 + 186 + mstart = image->segment[i].mem; 187 + mend = mstart + image->segment[i].memsz; 188 + for (j = 0; j < i; j++) { 189 + unsigned long pstart, pend; 190 + 191 + pstart = image->segment[j].mem; 192 + pend = pstart + image->segment[j].memsz; 193 + /* Do the segments overlap ? 
*/ 194 + if ((mend > pstart) && (mstart < pend)) 195 + return result; 196 + } 197 + } 198 + 199 + /* Ensure our buffer sizes are strictly less than 200 + * our memory sizes. This should always be the case, 201 + * and it is easier to check up front than to be surprised 202 + * later on. 203 + */ 204 + result = -EINVAL; 205 + for (i = 0; i < nr_segments; i++) { 206 + if (image->segment[i].bufsz > image->segment[i].memsz) 207 + return result; 208 + } 209 + 210 + /* 211 + * Verify we have good destination addresses. Normally 212 + * the caller is responsible for making certain we don't 213 + * attempt to load the new image into invalid or reserved 214 + * areas of RAM. But crash kernels are preloaded into a 215 + * reserved area of ram. We must ensure the addresses 216 + * are in the reserved area otherwise preloading the 217 + * kernel could corrupt things. 218 + */ 219 + 220 + if (image->type == KEXEC_TYPE_CRASH) { 221 + result = -EADDRNOTAVAIL; 222 + for (i = 0; i < nr_segments; i++) { 223 + unsigned long mstart, mend; 224 + 225 + mstart = image->segment[i].mem; 226 + mend = mstart + image->segment[i].memsz - 1; 227 + /* Ensure we are within the crash kernel limits */ 228 + if ((mstart < crashk_res.start) || 229 + (mend > crashk_res.end)) 230 + return result; 231 + } 232 + } 233 + 234 + return 0; 235 + } 236 + 237 + struct kimage *do_kimage_alloc_init(void) 238 + { 239 + struct kimage *image; 240 + 241 + /* Allocate a controlling structure */ 242 + image = kzalloc(sizeof(*image), GFP_KERNEL); 243 + if (!image) 244 + return NULL; 245 + 246 + image->head = 0; 247 + image->entry = &image->head; 248 + image->last_entry = &image->head; 249 + image->control_page = ~0; /* By default this does not apply */ 250 + image->type = KEXEC_TYPE_DEFAULT; 251 + 252 + /* Initialize the list of control pages */ 253 + INIT_LIST_HEAD(&image->control_pages); 254 + 255 + /* Initialize the list of destination pages */ 256 + INIT_LIST_HEAD(&image->dest_pages); 257 + 258 + /* Initialize the 
list of unusable pages */ 259 + INIT_LIST_HEAD(&image->unusable_pages); 260 + 261 + return image; 262 + } 263 + 264 + int kimage_is_destination_range(struct kimage *image, 265 + unsigned long start, 266 + unsigned long end) 267 + { 268 + unsigned long i; 269 + 270 + for (i = 0; i < image->nr_segments; i++) { 271 + unsigned long mstart, mend; 272 + 273 + mstart = image->segment[i].mem; 274 + mend = mstart + image->segment[i].memsz; 275 + if ((end > mstart) && (start < mend)) 276 + return 1; 277 + } 278 + 279 + return 0; 280 + } 281 + 282 + static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order) 283 + { 284 + struct page *pages; 285 + 286 + pages = alloc_pages(gfp_mask, order); 287 + if (pages) { 288 + unsigned int count, i; 289 + 290 + pages->mapping = NULL; 291 + set_page_private(pages, order); 292 + count = 1 << order; 293 + for (i = 0; i < count; i++) 294 + SetPageReserved(pages + i); 295 + } 296 + 297 + return pages; 298 + } 299 + 300 + static void kimage_free_pages(struct page *page) 301 + { 302 + unsigned int order, count, i; 303 + 304 + order = page_private(page); 305 + count = 1 << order; 306 + for (i = 0; i < count; i++) 307 + ClearPageReserved(page + i); 308 + __free_pages(page, order); 309 + } 310 + 311 + void kimage_free_page_list(struct list_head *list) 312 + { 313 + struct list_head *pos, *next; 314 + 315 + list_for_each_safe(pos, next, list) { 316 + struct page *page; 317 + 318 + page = list_entry(pos, struct page, lru); 319 + list_del(&page->lru); 320 + kimage_free_pages(page); 321 + } 322 + } 323 + 324 + static struct page *kimage_alloc_normal_control_pages(struct kimage *image, 325 + unsigned int order) 326 + { 327 + /* Control pages are special, they are the intermediaries 328 + * that are needed while we copy the rest of the pages 329 + * to their final resting place. As such they must 330 + * not conflict with either the destination addresses 331 + * or memory the kernel is already using. 
332 + * 333 + * The only case where we really need more than one of 334 + * these is for architectures where we cannot disable 335 + * the MMU and must instead generate an identity mapped 336 + * page table for all of the memory. 337 + * 338 + * At worst this runs in O(N) of the image size. 339 + */ 340 + struct list_head extra_pages; 341 + struct page *pages; 342 + unsigned int count; 343 + 344 + count = 1 << order; 345 + INIT_LIST_HEAD(&extra_pages); 346 + 347 + /* Loop while I can allocate a page and the page allocated 348 + * is a destination page. 349 + */ 350 + do { 351 + unsigned long pfn, epfn, addr, eaddr; 352 + 353 + pages = kimage_alloc_pages(KEXEC_CONTROL_MEMORY_GFP, order); 354 + if (!pages) 355 + break; 356 + pfn = page_to_pfn(pages); 357 + epfn = pfn + count; 358 + addr = pfn << PAGE_SHIFT; 359 + eaddr = epfn << PAGE_SHIFT; 360 + if ((epfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) || 361 + kimage_is_destination_range(image, addr, eaddr)) { 362 + list_add(&pages->lru, &extra_pages); 363 + pages = NULL; 364 + } 365 + } while (!pages); 366 + 367 + if (pages) { 368 + /* Remember the allocated page... */ 369 + list_add(&pages->lru, &image->control_pages); 370 + 371 + /* Because the page is already in its destination 372 + * location we will never allocate another page at 373 + * that address. Therefore kimage_alloc_pages 374 + * will not return it (again) and we don't need 375 + * to give it an entry in image->segment[]. 376 + */ 377 + } 378 + /* Deal with the destination pages I have inadvertently allocated. 379 + * 380 + * Ideally I would convert multi-page allocations into single 381 + * page allocations, and add everything to image->dest_pages. 382 + * 383 + * For now it is simpler to just free the pages. 
384 + */ 385 + kimage_free_page_list(&extra_pages); 386 + 387 + return pages; 388 + } 389 + 390 + static struct page *kimage_alloc_crash_control_pages(struct kimage *image, 391 + unsigned int order) 392 + { 393 + /* Control pages are special, they are the intermediaries 394 + * that are needed while we copy the rest of the pages 395 + * to their final resting place. As such they must 396 + * not conflict with either the destination addresses 397 + * or memory the kernel is already using. 398 + * 399 + * Control pages are also the only pages we must allocate 400 + * when loading a crash kernel. All of the other pages 401 + * are specified by the segments and we just memcpy 402 + * into them directly. 403 + * 404 + * The only case where we really need more than one of 405 + * these is for architectures where we cannot disable 406 + * the MMU and must instead generate an identity mapped 407 + * page table for all of the memory. 408 + * 409 + * Given the low demand this implements a very simple 410 + * allocator that finds the first hole of the appropriate 411 + * size in the reserved memory region, and allocates all 412 + * of the memory up to and including the hole. 
413 + */ 414 + unsigned long hole_start, hole_end, size; 415 + struct page *pages; 416 + 417 + pages = NULL; 418 + size = (1 << order) << PAGE_SHIFT; 419 + hole_start = (image->control_page + (size - 1)) & ~(size - 1); 420 + hole_end = hole_start + size - 1; 421 + while (hole_end <= crashk_res.end) { 422 + unsigned long i; 423 + 424 + if (hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT) 425 + break; 426 + /* See if I overlap any of the segments */ 427 + for (i = 0; i < image->nr_segments; i++) { 428 + unsigned long mstart, mend; 429 + 430 + mstart = image->segment[i].mem; 431 + mend = mstart + image->segment[i].memsz - 1; 432 + if ((hole_end >= mstart) && (hole_start <= mend)) { 433 + /* Advance the hole to the end of the segment */ 434 + hole_start = (mend + (size - 1)) & ~(size - 1); 435 + hole_end = hole_start + size - 1; 436 + break; 437 + } 438 + } 439 + /* If I don't overlap any segments I have found my hole! */ 440 + if (i == image->nr_segments) { 441 + pages = pfn_to_page(hole_start >> PAGE_SHIFT); 442 + image->control_page = hole_end; 443 + break; 444 + } 445 + } 446 + 447 + return pages; 448 + } 449 + 450 + 451 + struct page *kimage_alloc_control_pages(struct kimage *image, 452 + unsigned int order) 453 + { 454 + struct page *pages = NULL; 455 + 456 + switch (image->type) { 457 + case KEXEC_TYPE_DEFAULT: 458 + pages = kimage_alloc_normal_control_pages(image, order); 459 + break; 460 + case KEXEC_TYPE_CRASH: 461 + pages = kimage_alloc_crash_control_pages(image, order); 462 + break; 463 + } 464 + 465 + return pages; 466 + } 467 + 468 + static int kimage_add_entry(struct kimage *image, kimage_entry_t entry) 469 + { 470 + if (*image->entry != 0) 471 + image->entry++; 472 + 473 + if (image->entry == image->last_entry) { 474 + kimage_entry_t *ind_page; 475 + struct page *page; 476 + 477 + page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST); 478 + if (!page) 479 + return -ENOMEM; 480 + 481 + ind_page = page_address(page); 482 + *image->entry = 
virt_to_phys(ind_page) | IND_INDIRECTION; 483 + image->entry = ind_page; 484 + image->last_entry = ind_page + 485 + ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1); 486 + } 487 + *image->entry = entry; 488 + image->entry++; 489 + *image->entry = 0; 490 + 491 + return 0; 492 + } 493 + 494 + static int kimage_set_destination(struct kimage *image, 495 + unsigned long destination) 496 + { 497 + int result; 498 + 499 + destination &= PAGE_MASK; 500 + result = kimage_add_entry(image, destination | IND_DESTINATION); 501 + 502 + return result; 503 + } 504 + 505 + 506 + static int kimage_add_page(struct kimage *image, unsigned long page) 507 + { 508 + int result; 509 + 510 + page &= PAGE_MASK; 511 + result = kimage_add_entry(image, page | IND_SOURCE); 512 + 513 + return result; 514 + } 515 + 516 + 517 + static void kimage_free_extra_pages(struct kimage *image) 518 + { 519 + /* Walk through and free any extra destination pages I may have */ 520 + kimage_free_page_list(&image->dest_pages); 521 + 522 + /* Walk through and free any unusable pages I have cached */ 523 + kimage_free_page_list(&image->unusable_pages); 524 + 525 + } 526 + void kimage_terminate(struct kimage *image) 527 + { 528 + if (*image->entry != 0) 529 + image->entry++; 530 + 531 + *image->entry = IND_DONE; 532 + } 533 + 534 + #define for_each_kimage_entry(image, ptr, entry) \ 535 + for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \ 536 + ptr = (entry & IND_INDIRECTION) ? 
\ 537 + phys_to_virt((entry & PAGE_MASK)) : ptr + 1) 538 + 539 + static void kimage_free_entry(kimage_entry_t entry) 540 + { 541 + struct page *page; 542 + 543 + page = pfn_to_page(entry >> PAGE_SHIFT); 544 + kimage_free_pages(page); 545 + } 546 + 547 + void kimage_free(struct kimage *image) 548 + { 549 + kimage_entry_t *ptr, entry; 550 + kimage_entry_t ind = 0; 551 + 552 + if (!image) 553 + return; 554 + 555 + kimage_free_extra_pages(image); 556 + for_each_kimage_entry(image, ptr, entry) { 557 + if (entry & IND_INDIRECTION) { 558 + /* Free the previous indirection page */ 559 + if (ind & IND_INDIRECTION) 560 + kimage_free_entry(ind); 561 + /* Save this indirection page until we are 562 + * done with it. 563 + */ 564 + ind = entry; 565 + } else if (entry & IND_SOURCE) 566 + kimage_free_entry(entry); 567 + } 568 + /* Free the final indirection page */ 569 + if (ind & IND_INDIRECTION) 570 + kimage_free_entry(ind); 571 + 572 + /* Handle any machine specific cleanup */ 573 + machine_kexec_cleanup(image); 574 + 575 + /* Free the kexec control pages... */ 576 + kimage_free_page_list(&image->control_pages); 577 + 578 + /* 579 + * Free up any temporary buffers allocated. This might be hit if 580 + * an error occurred much later, after buffer allocation. 
581 + */ 582 + if (image->file_mode) 583 + kimage_file_post_load_cleanup(image); 584 + 585 + kfree(image); 586 + } 587 + 588 + static kimage_entry_t *kimage_dst_used(struct kimage *image, 589 + unsigned long page) 590 + { 591 + kimage_entry_t *ptr, entry; 592 + unsigned long destination = 0; 593 + 594 + for_each_kimage_entry(image, ptr, entry) { 595 + if (entry & IND_DESTINATION) 596 + destination = entry & PAGE_MASK; 597 + else if (entry & IND_SOURCE) { 598 + if (page == destination) 599 + return ptr; 600 + destination += PAGE_SIZE; 601 + } 602 + } 603 + 604 + return NULL; 605 + } 606 + 607 + static struct page *kimage_alloc_page(struct kimage *image, 608 + gfp_t gfp_mask, 609 + unsigned long destination) 610 + { 611 + /* 612 + * Here we implement safeguards to ensure that a source page 613 + * is not copied to its destination page before the data on 614 + * the destination page is no longer useful. 615 + * 616 + * To do this we maintain the invariant that a source page is 617 + * either its own destination page, or it is not a 618 + * destination page at all. 619 + * 620 + * That is slightly stronger than required, but the proof 621 + * that no problems will occur is trivial, and the 622 + * implementation is simple to verify. 623 + * 624 + * When allocating all pages normally this algorithm will run 625 + * in O(N) time, but in the worst case it will run in O(N^2) 626 + * time. If the runtime is a problem the data structures can 627 + * be fixed. 628 + */ 629 + struct page *page; 630 + unsigned long addr; 631 + 632 + /* 633 + * Walk through the list of destination pages, and see if I 634 + * have a match. 
635 + */ 636 + list_for_each_entry(page, &image->dest_pages, lru) { 637 + addr = page_to_pfn(page) << PAGE_SHIFT; 638 + if (addr == destination) { 639 + list_del(&page->lru); 640 + return page; 641 + } 642 + } 643 + page = NULL; 644 + while (1) { 645 + kimage_entry_t *old; 646 + 647 + /* Allocate a page, if we run out of memory give up */ 648 + page = kimage_alloc_pages(gfp_mask, 0); 649 + if (!page) 650 + return NULL; 651 + /* If the page cannot be used file it away */ 652 + if (page_to_pfn(page) > 653 + (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) { 654 + list_add(&page->lru, &image->unusable_pages); 655 + continue; 656 + } 657 + addr = page_to_pfn(page) << PAGE_SHIFT; 658 + 659 + /* If it is the destination page we want, use it */ 660 + if (addr == destination) 661 + break; 662 + 663 + /* If the page is not a destination page use it */ 664 + if (!kimage_is_destination_range(image, addr, 665 + addr + PAGE_SIZE)) 666 + break; 667 + 668 + /* 669 + * I know that the page is someone's destination page. 670 + * See if there is already a source page for this 671 + * destination page. And if so swap the source pages. 672 + */ 673 + old = kimage_dst_used(image, addr); 674 + if (old) { 675 + /* If so move it */ 676 + unsigned long old_addr; 677 + struct page *old_page; 678 + 679 + old_addr = *old & PAGE_MASK; 680 + old_page = pfn_to_page(old_addr >> PAGE_SHIFT); 681 + copy_highpage(page, old_page); 682 + *old = addr | (*old & ~PAGE_MASK); 683 + 684 + /* The old page I have found cannot be a 685 + * destination page, so return it if its 686 + * gfp_flags honor the ones passed in. 
	 */
			if (!(gfp_mask & __GFP_HIGHMEM) &&
			    PageHighMem(old_page)) {
				kimage_free_pages(old_page);
				continue;
			}
			addr = old_addr;
			page = old_page;
			break;
		}
		/* Place the page on the destination list, to be used later */
		list_add(&page->lru, &image->dest_pages);
	}

	return page;
}

static int kimage_load_normal_segment(struct kimage *image,
				      struct kexec_segment *segment)
{
	unsigned long maddr;
	size_t ubytes, mbytes;
	int result;
	unsigned char __user *buf = NULL;
	unsigned char *kbuf = NULL;

	result = 0;
	if (image->file_mode)
		kbuf = segment->kbuf;
	else
		buf = segment->buf;
	ubytes = segment->bufsz;
	mbytes = segment->memsz;
	maddr = segment->mem;

	result = kimage_set_destination(image, maddr);
	if (result < 0)
		goto out;

	while (mbytes) {
		struct page *page;
		char *ptr;
		size_t uchunk, mchunk;

		page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
		if (!page) {
			result = -ENOMEM;
			goto out;
		}
		result = kimage_add_page(image, page_to_pfn(page)
								<< PAGE_SHIFT);
		if (result < 0)
			goto out;

		ptr = kmap(page);
		/* Start with a clear page */
		clear_page(ptr);
		ptr += maddr & ~PAGE_MASK;
		mchunk = min_t(size_t, mbytes,
				PAGE_SIZE - (maddr & ~PAGE_MASK));
		uchunk = min(ubytes, mchunk);

		/* For file based kexec, source pages are in kernel memory */
		if (image->file_mode)
			memcpy(ptr, kbuf, uchunk);
		else
			result = copy_from_user(ptr, buf, uchunk);
		kunmap(page);
		if (result) {
			result = -EFAULT;
			goto out;
		}
		ubytes -= uchunk;
		maddr += mchunk;
		if (image->file_mode)
			kbuf += mchunk;
		else
			buf += mchunk;
		mbytes -= mchunk;
	}
out:
	return result;
}

static int kimage_load_crash_segment(struct kimage *image,
				     struct kexec_segment *segment)
{
	/* For crash dump kernels we simply copy the data from
	 * user space to its destination.
	 * We do things a page at a time for the sake of kmap.
	 */
	unsigned long maddr;
	size_t ubytes, mbytes;
	int result;
	unsigned char __user *buf = NULL;
	unsigned char *kbuf = NULL;

	result = 0;
	if (image->file_mode)
		kbuf = segment->kbuf;
	else
		buf = segment->buf;
	ubytes = segment->bufsz;
	mbytes = segment->memsz;
	maddr = segment->mem;
	while (mbytes) {
		struct page *page;
		char *ptr;
		size_t uchunk, mchunk;

		page = pfn_to_page(maddr >> PAGE_SHIFT);
		if (!page) {
			result = -ENOMEM;
			goto out;
		}
		ptr = kmap(page);
		ptr += maddr & ~PAGE_MASK;
		mchunk = min_t(size_t, mbytes,
				PAGE_SIZE - (maddr & ~PAGE_MASK));
		uchunk = min(ubytes, mchunk);
		if (mchunk > uchunk) {
			/* Zero the trailing part of the page */
			memset(ptr + uchunk, 0, mchunk - uchunk);
		}

		/* For file based kexec, source pages are in kernel memory */
		if (image->file_mode)
			memcpy(ptr, kbuf, uchunk);
		else
			result = copy_from_user(ptr, buf, uchunk);
		kexec_flush_icache_page(page);
		kunmap(page);
		if (result) {
			result = -EFAULT;
			goto out;
		}
		ubytes -= uchunk;
		maddr += mchunk;
		if (image->file_mode)
			kbuf += mchunk;
		else
			buf += mchunk;
		mbytes -= mchunk;
	}
out:
	return result;
}

int kimage_load_segment(struct kimage *image,
			struct kexec_segment *segment)
{
	int result = -ENOMEM;

	switch (image->type) {
	case KEXEC_TYPE_DEFAULT:
		result = kimage_load_normal_segment(image, segment);
		break;
	case KEXEC_TYPE_CRASH:
		result = kimage_load_crash_segment(image, segment);
		break;
	}

	return result;
}

struct kimage *kexec_image;
struct kimage *kexec_crash_image;
int kexec_load_disabled;

void crash_kexec(struct pt_regs *regs)
{
	/* Take the kexec_mutex here to prevent sys_kexec_load
	 * running on one cpu from replacing the crash kernel
	 * we are using after a panic on a different cpu.
	 *
	 * If the crash kernel was not located in a fixed area
	 * of memory the xchg(&kexec_crash_image) would be
	 * sufficient.  But since I reuse the memory...
	 */
	if (mutex_trylock(&kexec_mutex)) {
		if (kexec_crash_image) {
			struct pt_regs fixed_regs;

			crash_setup_regs(&fixed_regs, regs);
			crash_save_vmcoreinfo();
			machine_crash_shutdown(&fixed_regs);
			machine_kexec(kexec_crash_image);
		}
		mutex_unlock(&kexec_mutex);
	}
}

size_t crash_get_memory_size(void)
{
	size_t size = 0;

	mutex_lock(&kexec_mutex);
	if (crashk_res.end != crashk_res.start)
		size = resource_size(&crashk_res);
	mutex_unlock(&kexec_mutex);
	return size;
}

void __weak crash_free_reserved_phys_range(unsigned long begin,
					   unsigned long end)
{
	unsigned long addr;

	for (addr = begin; addr < end; addr += PAGE_SIZE)
		free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
}

int crash_shrink_memory(unsigned long new_size)
{
	int ret = 0;
	unsigned long start, end;
	unsigned long old_size;
	struct resource *ram_res;

	mutex_lock(&kexec_mutex);

	if (kexec_crash_image) {
		ret = -ENOENT;
		goto unlock;
	}
	start = crashk_res.start;
	end = crashk_res.end;
	old_size = (end == 0) ? 0 : end - start + 1;
	if (new_size >= old_size) {
		ret = (new_size == old_size) ? 0 : -EINVAL;
		goto unlock;
	}

	ram_res = kzalloc(sizeof(*ram_res), GFP_KERNEL);
	if (!ram_res) {
		ret = -ENOMEM;
		goto unlock;
	}

	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);

	crash_map_reserved_pages();
	crash_free_reserved_phys_range(end, crashk_res.end);

	if ((start == end) && (crashk_res.parent != NULL))
		release_resource(&crashk_res);

	ram_res->start = end;
	ram_res->end = crashk_res.end;
	ram_res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
	ram_res->name = "System RAM";

	crashk_res.end = end - 1;

	insert_resource(&iomem_resource, ram_res);
	crash_unmap_reserved_pages();

unlock:
	mutex_unlock(&kexec_mutex);
	return ret;
}

static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
			    size_t data_len)
{
	struct elf_note note;

	note.n_namesz = strlen(name) + 1;
	note.n_descsz = data_len;
	note.n_type = type;
	memcpy(buf, &note, sizeof(note));
	buf += (sizeof(note) + 3)/4;
	memcpy(buf, name, note.n_namesz);
	buf += (note.n_namesz + 3)/4;
	memcpy(buf, data, note.n_descsz);
	buf += (note.n_descsz + 3)/4;

	return buf;
}

static void final_note(u32 *buf)
{
	struct elf_note note;

	note.n_namesz = 0;
	note.n_descsz = 0;
	note.n_type = 0;
	memcpy(buf, &note, sizeof(note));
}

void crash_save_cpu(struct pt_regs *regs, int cpu)
{
	struct elf_prstatus prstatus;
	u32 *buf;

	if ((cpu < 0) || (cpu >= nr_cpu_ids))
		return;

	/* Using ELF notes here is opportunistic.
	 * I need a well defined structure format
	 * for the data I pass, and I need tags
	 * on the data to indicate what information I have
	 * squirrelled away.  ELF notes happen to provide
	 * all of that, so there is no need to invent something new.
	 */
	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
	if (!buf)
		return;
	memset(&prstatus, 0, sizeof(prstatus));
	prstatus.pr_pid = current->pid;
	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
	buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
			      &prstatus, sizeof(prstatus));
	final_note(buf);
}

static int __init crash_notes_memory_init(void)
{
	/* Allocate memory for saving cpu registers. */
	size_t size, align;

	/*
	 * crash_notes could be allocated across 2 vmalloc pages when percpu
	 * is vmalloc based.  vmalloc doesn't guarantee that 2 contiguous
	 * vmalloc pages are also on 2 contiguous physical pages.  In that
	 * case the 2nd part of crash_notes in the 2nd page could be lost,
	 * since only the starting address and size of crash_notes are
	 * exported through sysfs.  Here round up the size of crash_notes
	 * to the nearest power of two and pass it to __alloc_percpu as the
	 * align value.  This makes sure crash_notes is allocated inside one
	 * physical page.
	 */
	size = sizeof(note_buf_t);
	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);

	/*
	 * Break the compile if size is bigger than PAGE_SIZE, since
	 * crash_notes would definitely span 2 pages in that case.
	 */
	BUILD_BUG_ON(size > PAGE_SIZE);

	crash_notes = __alloc_percpu(size, align);
	if (!crash_notes) {
		pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
		return -ENOMEM;
	}
	return 0;
}
subsys_initcall(crash_notes_memory_init);


/*
 * parsing the "crashkernel" commandline
 *
 * this code is intended to be called from architecture-specific code
 */


/*
 * This function parses command lines in the format
 *
 *	crashkernel=ramsize-range:size[,...][@offset]
 *
 * The function returns 0 on success and -EINVAL on failure.
 */
static int __init parse_crashkernel_mem(char *cmdline,
					unsigned long long system_ram,
					unsigned long long *crash_size,
					unsigned long long *crash_base)
{
	char *cur = cmdline, *tmp;

	/* for each entry of the comma-separated list */
	do {
		unsigned long long start, end = ULLONG_MAX, size;

		/* get the start of the range */
		start = memparse(cur, &tmp);
		if (cur == tmp) {
			pr_warn("crashkernel: Memory value expected\n");
			return -EINVAL;
		}
		cur = tmp;
		if (*cur != '-') {
			pr_warn("crashkernel: '-' expected\n");
			return -EINVAL;
		}
		cur++;

		/* if no ':' is here, then we read the end */
		if (*cur != ':') {
			end = memparse(cur, &tmp);
			if (cur == tmp) {
				pr_warn("crashkernel: Memory value expected\n");
				return -EINVAL;
			}
			cur = tmp;
			if (end <= start) {
				pr_warn("crashkernel: end <= start\n");
				return -EINVAL;
			}
		}

		if (*cur != ':') {
			pr_warn("crashkernel: ':' expected\n");
			return -EINVAL;
		}
		cur++;

		size = memparse(cur, &tmp);
		if (cur == tmp) {
			pr_warn("Memory value expected\n");
			return -EINVAL;
		}
		cur = tmp;
		if (size >= system_ram) {
			pr_warn("crashkernel: invalid size\n");
			return -EINVAL;
		}

		/* match ? */
		if (system_ram >= start && system_ram < end) {
			*crash_size = size;
			break;
		}
	} while (*cur++ == ',');

	if (*crash_size > 0) {
		while (*cur && *cur != ' ' && *cur != '@')
			cur++;
		if (*cur == '@') {
			cur++;
			*crash_base = memparse(cur, &tmp);
			if (cur == tmp) {
				pr_warn("Memory value expected after '@'\n");
				return -EINVAL;
			}
		}
	}

	return 0;
}

/*
 * This function parses "simple" (old) crashkernel command lines like
 *
 *	crashkernel=size[@offset]
 *
 * It returns 0 on success and -EINVAL on failure.
 */
static int __init parse_crashkernel_simple(char *cmdline,
					   unsigned long long *crash_size,
					   unsigned long long *crash_base)
{
	char *cur = cmdline;

	*crash_size = memparse(cmdline, &cur);
	if (cmdline == cur) {
		pr_warn("crashkernel: memory value expected\n");
		return -EINVAL;
	}

	if (*cur == '@')
		*crash_base = memparse(cur+1, &cur);
	else if (*cur != ' ' && *cur != '\0') {
		pr_warn("crashkernel: unrecognized char\n");
		return -EINVAL;
	}

	return 0;
}

#define SUFFIX_HIGH 0
#define SUFFIX_LOW  1
#define SUFFIX_NULL 2
static __initdata char *suffix_tbl[] = {
	[SUFFIX_HIGH] = ",high",
	[SUFFIX_LOW]  = ",low",
	[SUFFIX_NULL] = NULL,
};

/*
 * This function parses "suffix" crashkernel command lines like
 *
 *	crashkernel=size,[high|low]
 *
 * It returns 0 on success and -EINVAL on failure.
 */
static int __init parse_crashkernel_suffix(char *cmdline,
					   unsigned long long *crash_size,
					   const char *suffix)
{
	char *cur = cmdline;

	*crash_size = memparse(cmdline, &cur);
	if (cmdline == cur) {
		pr_warn("crashkernel: memory value expected\n");
		return -EINVAL;
	}

	/* check with suffix */
	if (strncmp(cur, suffix, strlen(suffix))) {
		pr_warn("crashkernel: unrecognized char\n");
		return -EINVAL;
	}
	cur += strlen(suffix);
	if (*cur != ' ' && *cur != '\0') {
		pr_warn("crashkernel: unrecognized char\n");
		return -EINVAL;
	}

	return 0;
}

static __init char *get_last_crashkernel(char *cmdline,
					 const char *name,
					 const char *suffix)
{
	char *p = cmdline, *ck_cmdline = NULL;

	/* find crashkernel and use the last one if there are more */
	p = strstr(p, name);
	while (p) {
		char *end_p = strchr(p, ' ');
		char *q;

		if (!end_p)
			end_p = p + strlen(p);

		if (!suffix) {
			int i;

			/* skip the one with any known suffix */
			for (i = 0; suffix_tbl[i]; i++) {
				q = end_p - strlen(suffix_tbl[i]);
				if (!strncmp(q, suffix_tbl[i],
					     strlen(suffix_tbl[i])))
					goto next;
			}
			ck_cmdline = p;
		} else {
			q = end_p - strlen(suffix);
			if (!strncmp(q, suffix, strlen(suffix)))
				ck_cmdline = p;
		}
next:
		p = strstr(p+1, name);
	}

	if (!ck_cmdline)
		return NULL;

	return ck_cmdline;
}

static int __init __parse_crashkernel(char *cmdline,
				      unsigned long long system_ram,
				      unsigned long long *crash_size,
				      unsigned long long *crash_base,
				      const char *name,
				      const char *suffix)
{
	char *first_colon, *first_space;
	char *ck_cmdline;

	BUG_ON(!crash_size || !crash_base);
	*crash_size = 0;
	*crash_base = 0;

	ck_cmdline = get_last_crashkernel(cmdline, name, suffix);

	if (!ck_cmdline)
		return -EINVAL;

	ck_cmdline += strlen(name);

	if (suffix)
		return parse_crashkernel_suffix(ck_cmdline, crash_size,
				suffix);
	/*
	 * if the commandline contains a ':', then that's the extended
	 * syntax -- if not, it must be the classic syntax
	 */
	first_colon = strchr(ck_cmdline, ':');
	first_space = strchr(ck_cmdline, ' ');
	if (first_colon && (!first_space || first_colon < first_space))
		return parse_crashkernel_mem(ck_cmdline, system_ram,
				crash_size, crash_base);

	return parse_crashkernel_simple(ck_cmdline, crash_size, crash_base);
}

/*
 * This function is the entry point for command line parsing and should be
 * called from the arch-specific code.
 */
int __init parse_crashkernel(char *cmdline,
			     unsigned long long system_ram,
			     unsigned long long *crash_size,
			     unsigned long long *crash_base)
{
	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
					"crashkernel=", NULL);
}

int __init parse_crashkernel_high(char *cmdline,
				  unsigned long long system_ram,
				  unsigned long long *crash_size,
				  unsigned long long *crash_base)
{
	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
				"crashkernel=", suffix_tbl[SUFFIX_HIGH]);
}

int __init parse_crashkernel_low(char *cmdline,
				 unsigned long long system_ram,
				 unsigned long long *crash_size,
				 unsigned long long *crash_base)
{
	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
}

static void update_vmcoreinfo_note(void)
{
	u32 *buf = vmcoreinfo_note;

	if (!vmcoreinfo_size)
		return;
	buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data,
			      vmcoreinfo_size);
	final_note(buf);
}

void crash_save_vmcoreinfo(void)
{
	vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
	update_vmcoreinfo_note();
}

void vmcoreinfo_append_str(const char *fmt, ...)
{
	va_list args;
	char buf[0x50];
	size_t r;

	va_start(args, fmt);
	r = vscnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);

	r = min(r, vmcoreinfo_max_size - vmcoreinfo_size);

	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);

	vmcoreinfo_size += r;
}

/*
 * provide an empty default implementation here -- architecture
 * code may override this
 */
void __weak arch_crash_save_vmcoreinfo(void)
{}

unsigned long __weak paddr_vmcoreinfo_note(void)
{
	return __pa((unsigned long)(char *)&vmcoreinfo_note);
}

static int __init crash_save_vmcoreinfo_init(void)
{
	VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
	VMCOREINFO_PAGESIZE(PAGE_SIZE);

	VMCOREINFO_SYMBOL(init_uts_ns);
	VMCOREINFO_SYMBOL(node_online_map);
#ifdef CONFIG_MMU
	VMCOREINFO_SYMBOL(swapper_pg_dir);
#endif
	VMCOREINFO_SYMBOL(_stext);
	VMCOREINFO_SYMBOL(vmap_area_list);

#ifndef CONFIG_NEED_MULTIPLE_NODES
	VMCOREINFO_SYMBOL(mem_map);
	VMCOREINFO_SYMBOL(contig_page_data);
#endif
#ifdef CONFIG_SPARSEMEM
	VMCOREINFO_SYMBOL(mem_section);
	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
	VMCOREINFO_STRUCT_SIZE(mem_section);
	VMCOREINFO_OFFSET(mem_section, section_mem_map);
#endif
	VMCOREINFO_STRUCT_SIZE(page);
	VMCOREINFO_STRUCT_SIZE(pglist_data);
	VMCOREINFO_STRUCT_SIZE(zone);
	VMCOREINFO_STRUCT_SIZE(free_area);
	VMCOREINFO_STRUCT_SIZE(list_head);
	VMCOREINFO_SIZE(nodemask_t);
	VMCOREINFO_OFFSET(page, flags);
	VMCOREINFO_OFFSET(page, _count);
	VMCOREINFO_OFFSET(page, mapping);
	VMCOREINFO_OFFSET(page, lru);
	VMCOREINFO_OFFSET(page, _mapcount);
	VMCOREINFO_OFFSET(page, private);
	VMCOREINFO_OFFSET(pglist_data, node_zones);
	VMCOREINFO_OFFSET(pglist_data, nr_zones);
#ifdef CONFIG_FLAT_NODE_MEM_MAP
	VMCOREINFO_OFFSET(pglist_data, node_mem_map);
#endif
	VMCOREINFO_OFFSET(pglist_data, node_start_pfn);
	VMCOREINFO_OFFSET(pglist_data, node_spanned_pages);
	VMCOREINFO_OFFSET(pglist_data, node_id);
	VMCOREINFO_OFFSET(zone, free_area);
	VMCOREINFO_OFFSET(zone, vm_stat);
	VMCOREINFO_OFFSET(zone, spanned_pages);
	VMCOREINFO_OFFSET(free_area, free_list);
	VMCOREINFO_OFFSET(list_head, next);
	VMCOREINFO_OFFSET(list_head, prev);
	VMCOREINFO_OFFSET(vmap_area, va_start);
	VMCOREINFO_OFFSET(vmap_area, list);
	VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER);
	log_buf_kexec_setup();
	VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES);
	VMCOREINFO_NUMBER(NR_FREE_PAGES);
	VMCOREINFO_NUMBER(PG_lru);
	VMCOREINFO_NUMBER(PG_private);
	VMCOREINFO_NUMBER(PG_swapcache);
	VMCOREINFO_NUMBER(PG_slab);
#ifdef CONFIG_MEMORY_FAILURE
	VMCOREINFO_NUMBER(PG_hwpoison);
#endif
	VMCOREINFO_NUMBER(PG_head_mask);
	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
#ifdef CONFIG_X86
	VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
#endif
#ifdef CONFIG_HUGETLBFS
	VMCOREINFO_SYMBOL(free_huge_page);
#endif

	arch_crash_save_vmcoreinfo();
	update_vmcoreinfo_note();

	return 0;
}

subsys_initcall(crash_save_vmcoreinfo_init);

/*
 * Move into place and start executing a preloaded standalone
 * executable.  If nothing was preloaded return an error.
 */
int kernel_kexec(void)
{
	int error = 0;

	if (!mutex_trylock(&kexec_mutex))
		return -EBUSY;
	if (!kexec_image) {
		error = -EINVAL;
		goto Unlock;
	}

#ifdef CONFIG_KEXEC_JUMP
	if (kexec_image->preserve_context) {
		lock_system_sleep();
		pm_prepare_console();
		error = freeze_processes();
		if (error) {
			error = -EBUSY;
			goto Restore_console;
		}
		suspend_console();
		error = dpm_suspend_start(PMSG_FREEZE);
		if (error)
			goto Resume_console;
		/* At this point, dpm_suspend_start() has been called,
		 * but *not* dpm_suspend_end().  We *must* call
		 * dpm_suspend_end() now.  Otherwise, drivers for
		 * some devices (e.g. interrupt controllers) become
		 * desynchronized with the actual state of the
		 * hardware at resume time, and evil weirdness ensues.
		 */
		error = dpm_suspend_end(PMSG_FREEZE);
		if (error)
			goto Resume_devices;
		error = disable_nonboot_cpus();
		if (error)
			goto Enable_cpus;
		local_irq_disable();
		error = syscore_suspend();
		if (error)
			goto Enable_irqs;
	} else
#endif
	{
		kexec_in_progress = true;
		kernel_restart_prepare(NULL);
		migrate_to_reboot_cpu();

		/*
		 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
		 * no further code needs to use CPU hotplug (which is true in
		 * the reboot case).  However, the kexec path depends on using
		 * CPU hotplug again; so re-enable it here.
		 */
		cpu_hotplug_enable();
		pr_emerg("Starting new kernel\n");
		machine_shutdown();
	}

	machine_kexec(kexec_image);

#ifdef CONFIG_KEXEC_JUMP
	if (kexec_image->preserve_context) {
		syscore_resume();
 Enable_irqs:
		local_irq_enable();
 Enable_cpus:
		enable_nonboot_cpus();
		dpm_resume_start(PMSG_RESTORE);
 Resume_devices:
		dpm_resume_end(PMSG_RESTORE);
 Resume_console:
		resume_console();
		thaw_processes();
 Restore_console:
		pm_restore_console();
		unlock_system_sleep();
	}
#endif

 Unlock:
	mutex_unlock(&kexec_mutex);
	return error;
}

/*
 * Add and remove page tables for crashkernel memory
 *
 * Provide an empty default implementation here -- architecture
 * code may override this
 */
void __weak crash_map_reserved_pages(void)
{}

void __weak crash_unmap_reserved_pages(void)
{}
+1045
kernel/kexec_file.c
···
/*
 * kexec: kexec_file_load system call
 *
 * Copyright (C) 2014 Red Hat Inc.
 * Authors:
 *      Vivek Goyal <vgoyal@redhat.com>
 *
 * This source code is licensed under the GNU General Public License,
 * Version 2.  See the file COPYING for more details.
 */

#include <linux/capability.h>
#include <linux/mm.h>
#include <linux/file.h>
#include <linux/slab.h>
#include <linux/kexec.h>
#include <linux/mutex.h>
#include <linux/list.h>
#include <crypto/hash.h>
#include <crypto/sha.h>
#include <linux/syscalls.h>
#include <linux/vmalloc.h>
#include "kexec_internal.h"

/*
 * Declare these symbols weak so that if an architecture provides a
 * purgatory, these will be overridden.
 */
char __weak kexec_purgatory[0];
size_t __weak kexec_purgatory_size = 0;

static int kexec_calculate_store_digests(struct kimage *image);

static int copy_file_from_fd(int fd, void **buf, unsigned long *buf_len)
{
	struct fd f = fdget(fd);
	int ret;
	struct kstat stat;
	loff_t pos;
	ssize_t bytes = 0;

	if (!f.file)
		return -EBADF;

	ret = vfs_getattr(&f.file->f_path, &stat);
	if (ret)
		goto out;

	if (stat.size > INT_MAX) {
		ret = -EFBIG;
		goto out;
	}

	/* Don't hand 0 to vmalloc, it whines. */
	if (stat.size == 0) {
		ret = -EINVAL;
		goto out;
	}

	*buf = vmalloc(stat.size);
	if (!*buf) {
		ret = -ENOMEM;
		goto out;
	}

	pos = 0;
	while (pos < stat.size) {
		bytes = kernel_read(f.file, pos, (char *)(*buf) + pos,
				    stat.size - pos);
		if (bytes < 0) {
			vfree(*buf);
			ret = bytes;
			goto out;
		}

		if (bytes == 0)
			break;
		pos += bytes;
	}

	if (pos != stat.size) {
		ret = -EBADF;
		vfree(*buf);
		goto out;
	}

	*buf_len = pos;
out:
	fdput(f);
	return ret;
}

/* Architectures can provide this probe function */
int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
					 unsigned long buf_len)
{
	return -ENOEXEC;
}

void * __weak arch_kexec_kernel_image_load(struct kimage *image)
{
	return ERR_PTR(-ENOEXEC);
}

int __weak arch_kimage_file_post_load_cleanup(struct kimage *image)
{
	return -EINVAL;
}

int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
					unsigned long buf_len)
{
	return -EKEYREJECTED;
}

/* Apply relocations of type RELA */
int __weak
arch_kexec_apply_relocations_add(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
				 unsigned int relsec)
{
	pr_err("RELA relocation unsupported.\n");
	return -ENOEXEC;
}

/* Apply relocations of type REL */
int __weak
arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
			     unsigned int relsec)
{
	pr_err("REL relocation unsupported.\n");
	return -ENOEXEC;
}

/*
 * Free up memory used by kernel, initrd, and command line.  These are
 * temporary memory allocations which are no longer needed after the
 * buffers have been loaded into separate segments and copied elsewhere.
 */
void kimage_file_post_load_cleanup(struct kimage *image)
{
	struct purgatory_info *pi = &image->purgatory_info;

	vfree(image->kernel_buf);
	image->kernel_buf = NULL;

	vfree(image->initrd_buf);
	image->initrd_buf = NULL;

	kfree(image->cmdline_buf);
	image->cmdline_buf = NULL;

	vfree(pi->purgatory_buf);
	pi->purgatory_buf = NULL;

	vfree(pi->sechdrs);
	pi->sechdrs = NULL;

	/* See if architecture has anything to cleanup post load */
	arch_kimage_file_post_load_cleanup(image);

	/*
	 * The above call should have called into the bootloader to free up
	 * any data stored in kimage->image_loader_data.  It should be ok
	 * to free it now.
	 */
	kfree(image->image_loader_data);
	image->image_loader_data = NULL;
}

/*
 * In file mode, the list of segments is prepared by the kernel.  Copy
 * relevant data from user space, do error checking, and prepare the
 * segment list.
 */
static int
kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
			     const char __user *cmdline_ptr,
			     unsigned long cmdline_len, unsigned flags)
{
	int ret = 0;
	void *ldata;

	ret = copy_file_from_fd(kernel_fd, &image->kernel_buf,
				&image->kernel_buf_len);
	if (ret)
		return ret;

	/* Call arch image probe handlers */
	ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
					    image->kernel_buf_len);

	if (ret)
		goto out;

#ifdef CONFIG_KEXEC_VERIFY_SIG
	ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
					   image->kernel_buf_len);
	if (ret) {
		pr_debug("kernel signature verification failed.\n");
		goto out;
	}
	pr_debug("kernel signature verification successful.\n");
#endif
	/* It is possible that no initramfs is being loaded */
	if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
		ret = copy_file_from_fd(initrd_fd, &image->initrd_buf,
					&image->initrd_buf_len);
		if (ret)
			goto out;
	}

	if (cmdline_len) {
		image->cmdline_buf = kzalloc(cmdline_len, GFP_KERNEL);
		if (!image->cmdline_buf) {
			ret = -ENOMEM;
			goto out;
		}

		ret = copy_from_user(image->cmdline_buf, cmdline_ptr,
				     cmdline_len);
		if (ret) {
			ret = -EFAULT;
			goto out;
		}

		image->cmdline_buf_len = cmdline_len;

		/* command line should be a string with last byte null */
		if (image->cmdline_buf[cmdline_len - 1] != '\0') {
			ret = -EINVAL;
			goto out;
		}
	}

	/* Call arch image load handlers */
	ldata = arch_kexec_kernel_image_load(image);

	if (IS_ERR(ldata)) {
		ret = PTR_ERR(ldata);
		goto out;
	}

	image->image_loader_data = ldata;
out:
	/* In case of error, free up all allocated memory in this function */
	if (ret)
		kimage_file_post_load_cleanup(image);
	return ret;
}

static int
kimage_file_alloc_init(struct kimage **rimage, int kernel_fd,
		       int initrd_fd, const char __user *cmdline_ptr,
		       unsigned long cmdline_len, unsigned long flags)
{
	int ret;
	struct kimage *image;
	bool kexec_on_panic = flags & KEXEC_FILE_ON_CRASH;

	image = do_kimage_alloc_init();
	if (!image)
		return -ENOMEM;

	image->file_mode = 1;

	if (kexec_on_panic) {
		/* Enable special crash kernel control page alloc policy. */
		image->control_page = crashk_res.start;
		image->type = KEXEC_TYPE_CRASH;
	}

	ret = kimage_file_prepare_segments(image, kernel_fd, initrd_fd,
					   cmdline_ptr, cmdline_len, flags);
	if (ret)
		goto out_free_image;

	ret = sanity_check_segment_list(image);
	if (ret)
		goto out_free_post_load_bufs;

	ret = -ENOMEM;
	image->control_code_page = kimage_alloc_control_pages(image,
					   get_order(KEXEC_CONTROL_PAGE_SIZE));
	if (!image->control_code_page) {
		pr_err("Could not allocate control_code_buffer\n");
		goto out_free_post_load_bufs;
	}

	if (!kexec_on_panic) {
		image->swap_page = kimage_alloc_control_pages(image, 0);
		if (!image->swap_page) {
			pr_err("Could not allocate swap buffer\n");
			goto out_free_control_pages;
		}
	}

	*rimage = image;
	return 0;
out_free_control_pages:
	kimage_free_page_list(&image->control_pages);
out_free_post_load_bufs:
	kimage_file_post_load_cleanup(image);
out_free_image:
	kfree(image);
	return ret;
}

SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
		unsigned long, cmdline_len, const char __user *, cmdline_ptr,
		unsigned long, flags)
{
	int ret = 0, i;
	struct kimage **dest_image, *image;

	/* We only trust the superuser with rebooting the system. */
	if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
		return -EPERM;

	/* Make sure we have a legal set of flags */
	if (flags != (flags & KEXEC_FILE_FLAGS))
		return -EINVAL;

	image = NULL;

	if (!mutex_trylock(&kexec_mutex))
		return -EBUSY;

	dest_image = &kexec_image;
	if (flags & KEXEC_FILE_ON_CRASH)
		dest_image = &kexec_crash_image;

	if (flags & KEXEC_FILE_UNLOAD)
		goto exchange;

	/*
	 * In case of crash, the new kernel gets loaded in the reserved
	 * region.  This is the same memory where an old crash kernel might
	 * already be loaded.  Free any current crash dump kernel before we
	 * corrupt it.
	 */
	if (flags & KEXEC_FILE_ON_CRASH)
		kimage_free(xchg(&kexec_crash_image, NULL));

	ret = kimage_file_alloc_init(&image, kernel_fd, initrd_fd, cmdline_ptr,
				     cmdline_len, flags);
	if (ret)
		goto out;

	ret = machine_kexec_prepare(image);
	if (ret)
		goto out;

	ret = kexec_calculate_store_digests(image);
	if (ret)
		goto out;

	for (i = 0; i < image->nr_segments; i++) {
		struct kexec_segment *ksegment;

		ksegment = &image->segment[i];
		pr_debug("Loading segment %d: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n",
			 i, ksegment->buf, ksegment->bufsz, ksegment->mem,
			 ksegment->memsz);

		ret = kimage_load_segment(image, &image->segment[i]);
		if (ret)
			goto out;
	}

	kimage_terminate(image);

	/*
	 * Free up any temporary buffers allocated which are not needed
	 * after the image has been loaded
	 */
	kimage_file_post_load_cleanup(image);
exchange:
	image = xchg(dest_image, image);
out:
	mutex_unlock(&kexec_mutex);
	kimage_free(image);
	return ret;
}

static int locate_mem_hole_top_down(unsigned long start, unsigned long end,
				    struct kexec_buf *kbuf)
{
	struct kimage *image = kbuf->image;
	unsigned long temp_start, temp_end;

	temp_end = min(end, kbuf->buf_max);
	temp_start = temp_end - kbuf->memsz;

	do {
		/* align down start */
		temp_start = temp_start & (~(kbuf->buf_align - 1));

		if (temp_start < start || temp_start < kbuf->buf_min)
			return 0;

		temp_end = temp_start + kbuf->memsz - 1;

		/*
		 * Make sure this does not conflict with any existing
		 * segments
		 */
		if (kimage_is_destination_range(image, temp_start, temp_end)) {
			temp_start = temp_start - PAGE_SIZE;
			continue;
		}

		/* We found a suitable memory range */
		break;
	} while (1);

	/* If we are here, we found a suitable memory range */
	kbuf->mem = temp_start;

	/* Success, stop navigating through remaining System RAM ranges */
	return 1;
}

static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end,
				     struct kexec_buf *kbuf)
{
	struct kimage *image = kbuf->image;
	unsigned long temp_start, temp_end;

	temp_start = max(start, kbuf->buf_min);

	do {
		temp_start = ALIGN(temp_start, kbuf->buf_align);
		temp_end = temp_start + kbuf->memsz - 1;

		if (temp_end > end || temp_end > kbuf->buf_max)
			return 0;
		/*
		 * Make sure this does not conflict with any existing
		 * segments
		 */
		if (kimage_is_destination_range(image, temp_start, temp_end)) {
			temp_start = temp_start + PAGE_SIZE;
			continue;
		}

		/* We found a suitable memory range */
		break;
	} while (1);

	/* If we are here, we found a suitable memory range */
	kbuf->mem = temp_start;

	/* Success, stop navigating through remaining System RAM ranges */
	return 1;
}

static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
{
+ struct kexec_buf *kbuf = (struct kexec_buf *)arg; 458 + unsigned long sz = end - start + 1; 459 + 460 + /* Returning 0 will take to next memory range */ 461 + if (sz < kbuf->memsz) 462 + return 0; 463 + 464 + if (end < kbuf->buf_min || start > kbuf->buf_max) 465 + return 0; 466 + 467 + /* 468 + * Allocate memory top down with-in ram range. Otherwise bottom up 469 + * allocation. 470 + */ 471 + if (kbuf->top_down) 472 + return locate_mem_hole_top_down(start, end, kbuf); 473 + return locate_mem_hole_bottom_up(start, end, kbuf); 474 + } 475 + 476 + /* 477 + * Helper function for placing a buffer in a kexec segment. This assumes 478 + * that kexec_mutex is held. 479 + */ 480 + int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz, 481 + unsigned long memsz, unsigned long buf_align, 482 + unsigned long buf_min, unsigned long buf_max, 483 + bool top_down, unsigned long *load_addr) 484 + { 485 + 486 + struct kexec_segment *ksegment; 487 + struct kexec_buf buf, *kbuf; 488 + int ret; 489 + 490 + /* Currently adding segment this way is allowed only in file mode */ 491 + if (!image->file_mode) 492 + return -EINVAL; 493 + 494 + if (image->nr_segments >= KEXEC_SEGMENT_MAX) 495 + return -EINVAL; 496 + 497 + /* 498 + * Make sure we are not trying to add buffer after allocating 499 + * control pages. All segments need to be placed first before 500 + * any control pages are allocated. As control page allocation 501 + * logic goes through list of segments to make sure there are 502 + * no destination overlaps. 
503 + */ 504 + if (!list_empty(&image->control_pages)) { 505 + WARN_ON(1); 506 + return -EINVAL; 507 + } 508 + 509 + memset(&buf, 0, sizeof(struct kexec_buf)); 510 + kbuf = &buf; 511 + kbuf->image = image; 512 + kbuf->buffer = buffer; 513 + kbuf->bufsz = bufsz; 514 + 515 + kbuf->memsz = ALIGN(memsz, PAGE_SIZE); 516 + kbuf->buf_align = max(buf_align, PAGE_SIZE); 517 + kbuf->buf_min = buf_min; 518 + kbuf->buf_max = buf_max; 519 + kbuf->top_down = top_down; 520 + 521 + /* Walk the RAM ranges and allocate a suitable range for the buffer */ 522 + if (image->type == KEXEC_TYPE_CRASH) 523 + ret = walk_iomem_res("Crash kernel", 524 + IORESOURCE_MEM | IORESOURCE_BUSY, 525 + crashk_res.start, crashk_res.end, kbuf, 526 + locate_mem_hole_callback); 527 + else 528 + ret = walk_system_ram_res(0, -1, kbuf, 529 + locate_mem_hole_callback); 530 + if (ret != 1) { 531 + /* A suitable memory range could not be found for buffer */ 532 + return -EADDRNOTAVAIL; 533 + } 534 + 535 + /* Found a suitable memory range */ 536 + ksegment = &image->segment[image->nr_segments]; 537 + ksegment->kbuf = kbuf->buffer; 538 + ksegment->bufsz = kbuf->bufsz; 539 + ksegment->mem = kbuf->mem; 540 + ksegment->memsz = kbuf->memsz; 541 + image->nr_segments++; 542 + *load_addr = ksegment->mem; 543 + return 0; 544 + } 545 + 546 + /* Calculate and store the digest of segments */ 547 + static int kexec_calculate_store_digests(struct kimage *image) 548 + { 549 + struct crypto_shash *tfm; 550 + struct shash_desc *desc; 551 + int ret = 0, i, j, zero_buf_sz, sha_region_sz; 552 + size_t desc_size, nullsz; 553 + char *digest; 554 + void *zero_buf; 555 + struct kexec_sha_region *sha_regions; 556 + struct purgatory_info *pi = &image->purgatory_info; 557 + 558 + zero_buf = __va(page_to_pfn(ZERO_PAGE(0)) << PAGE_SHIFT); 559 + zero_buf_sz = PAGE_SIZE; 560 + 561 + tfm = crypto_alloc_shash("sha256", 0, 0); 562 + if (IS_ERR(tfm)) { 563 + ret = PTR_ERR(tfm); 564 + goto out; 565 + } 566 + 567 + desc_size = 
crypto_shash_descsize(tfm) + sizeof(*desc); 568 + desc = kzalloc(desc_size, GFP_KERNEL); 569 + if (!desc) { 570 + ret = -ENOMEM; 571 + goto out_free_tfm; 572 + } 573 + 574 + sha_region_sz = KEXEC_SEGMENT_MAX * sizeof(struct kexec_sha_region); 575 + sha_regions = vzalloc(sha_region_sz); 576 + if (!sha_regions) 577 + goto out_free_desc; 578 + 579 + desc->tfm = tfm; 580 + desc->flags = 0; 581 + 582 + ret = crypto_shash_init(desc); 583 + if (ret < 0) 584 + goto out_free_sha_regions; 585 + 586 + digest = kzalloc(SHA256_DIGEST_SIZE, GFP_KERNEL); 587 + if (!digest) { 588 + ret = -ENOMEM; 589 + goto out_free_sha_regions; 590 + } 591 + 592 + for (j = i = 0; i < image->nr_segments; i++) { 593 + struct kexec_segment *ksegment; 594 + 595 + ksegment = &image->segment[i]; 596 + /* 597 + * Skip purgatory as it will be modified once we put digest 598 + * info in purgatory. 599 + */ 600 + if (ksegment->kbuf == pi->purgatory_buf) 601 + continue; 602 + 603 + ret = crypto_shash_update(desc, ksegment->kbuf, 604 + ksegment->bufsz); 605 + if (ret) 606 + break; 607 + 608 + /* 609 + * Assume rest of the buffer is filled with zero and 610 + * update digest accordingly. 
611 + */ 612 + nullsz = ksegment->memsz - ksegment->bufsz; 613 + while (nullsz) { 614 + unsigned long bytes = nullsz; 615 + 616 + if (bytes > zero_buf_sz) 617 + bytes = zero_buf_sz; 618 + ret = crypto_shash_update(desc, zero_buf, bytes); 619 + if (ret) 620 + break; 621 + nullsz -= bytes; 622 + } 623 + 624 + if (ret) 625 + break; 626 + 627 + sha_regions[j].start = ksegment->mem; 628 + sha_regions[j].len = ksegment->memsz; 629 + j++; 630 + } 631 + 632 + if (!ret) { 633 + ret = crypto_shash_final(desc, digest); 634 + if (ret) 635 + goto out_free_digest; 636 + ret = kexec_purgatory_get_set_symbol(image, "sha_regions", 637 + sha_regions, sha_region_sz, 0); 638 + if (ret) 639 + goto out_free_digest; 640 + 641 + ret = kexec_purgatory_get_set_symbol(image, "sha256_digest", 642 + digest, SHA256_DIGEST_SIZE, 0); 643 + if (ret) 644 + goto out_free_digest; 645 + } 646 + 647 + out_free_digest: 648 + kfree(digest); 649 + out_free_sha_regions: 650 + vfree(sha_regions); 651 + out_free_desc: 652 + kfree(desc); 653 + out_free_tfm: 654 + kfree(tfm); 655 + out: 656 + return ret; 657 + } 658 + 659 + /* Actually load purgatory. Lot of code taken from kexec-tools */ 660 + static int __kexec_load_purgatory(struct kimage *image, unsigned long min, 661 + unsigned long max, int top_down) 662 + { 663 + struct purgatory_info *pi = &image->purgatory_info; 664 + unsigned long align, buf_align, bss_align, buf_sz, bss_sz, bss_pad; 665 + unsigned long memsz, entry, load_addr, curr_load_addr, bss_addr, offset; 666 + unsigned char *buf_addr, *src; 667 + int i, ret = 0, entry_sidx = -1; 668 + const Elf_Shdr *sechdrs_c; 669 + Elf_Shdr *sechdrs = NULL; 670 + void *purgatory_buf = NULL; 671 + 672 + /* 673 + * sechdrs_c points to section headers in purgatory and are read 674 + * only. No modifications allowed. 675 + */ 676 + sechdrs_c = (void *)pi->ehdr + pi->ehdr->e_shoff; 677 + 678 + /* 679 + * We can not modify sechdrs_c[] and its fields. It is read only. 
680 + * Copy it over to a local copy where one can store some temporary 681 + * data and free it at the end. We need to modify ->sh_addr and 682 + * ->sh_offset fields to keep track of permanent and temporary 683 + * locations of sections. 684 + */ 685 + sechdrs = vzalloc(pi->ehdr->e_shnum * sizeof(Elf_Shdr)); 686 + if (!sechdrs) 687 + return -ENOMEM; 688 + 689 + memcpy(sechdrs, sechdrs_c, pi->ehdr->e_shnum * sizeof(Elf_Shdr)); 690 + 691 + /* 692 + * We seem to have multiple copies of sections. First copy is which 693 + * is embedded in kernel in read only section. Some of these sections 694 + * will be copied to a temporary buffer and relocated. And these 695 + * sections will finally be copied to their final destination at 696 + * segment load time. 697 + * 698 + * Use ->sh_offset to reflect section address in memory. It will 699 + * point to original read only copy if section is not allocatable. 700 + * Otherwise it will point to temporary copy which will be relocated. 701 + * 702 + * Use ->sh_addr to contain final address of the section where it 703 + * will go during execution time. 704 + */ 705 + for (i = 0; i < pi->ehdr->e_shnum; i++) { 706 + if (sechdrs[i].sh_type == SHT_NOBITS) 707 + continue; 708 + 709 + sechdrs[i].sh_offset = (unsigned long)pi->ehdr + 710 + sechdrs[i].sh_offset; 711 + } 712 + 713 + /* 714 + * Identify entry point section and make entry relative to section 715 + * start. 716 + */ 717 + entry = pi->ehdr->e_entry; 718 + for (i = 0; i < pi->ehdr->e_shnum; i++) { 719 + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) 720 + continue; 721 + 722 + if (!(sechdrs[i].sh_flags & SHF_EXECINSTR)) 723 + continue; 724 + 725 + /* Make entry section relative */ 726 + if (sechdrs[i].sh_addr <= pi->ehdr->e_entry && 727 + ((sechdrs[i].sh_addr + sechdrs[i].sh_size) > 728 + pi->ehdr->e_entry)) { 729 + entry_sidx = i; 730 + entry -= sechdrs[i].sh_addr; 731 + break; 732 + } 733 + } 734 + 735 + /* Determine how much memory is needed to load relocatable object. 
*/ 736 + buf_align = 1; 737 + bss_align = 1; 738 + buf_sz = 0; 739 + bss_sz = 0; 740 + 741 + for (i = 0; i < pi->ehdr->e_shnum; i++) { 742 + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) 743 + continue; 744 + 745 + align = sechdrs[i].sh_addralign; 746 + if (sechdrs[i].sh_type != SHT_NOBITS) { 747 + if (buf_align < align) 748 + buf_align = align; 749 + buf_sz = ALIGN(buf_sz, align); 750 + buf_sz += sechdrs[i].sh_size; 751 + } else { 752 + /* bss section */ 753 + if (bss_align < align) 754 + bss_align = align; 755 + bss_sz = ALIGN(bss_sz, align); 756 + bss_sz += sechdrs[i].sh_size; 757 + } 758 + } 759 + 760 + /* Determine the bss padding required to align bss properly */ 761 + bss_pad = 0; 762 + if (buf_sz & (bss_align - 1)) 763 + bss_pad = bss_align - (buf_sz & (bss_align - 1)); 764 + 765 + memsz = buf_sz + bss_pad + bss_sz; 766 + 767 + /* Allocate buffer for purgatory */ 768 + purgatory_buf = vzalloc(buf_sz); 769 + if (!purgatory_buf) { 770 + ret = -ENOMEM; 771 + goto out; 772 + } 773 + 774 + if (buf_align < bss_align) 775 + buf_align = bss_align; 776 + 777 + /* Add buffer to segment list */ 778 + ret = kexec_add_buffer(image, purgatory_buf, buf_sz, memsz, 779 + buf_align, min, max, top_down, 780 + &pi->purgatory_load_addr); 781 + if (ret) 782 + goto out; 783 + 784 + /* Load SHF_ALLOC sections */ 785 + buf_addr = purgatory_buf; 786 + load_addr = curr_load_addr = pi->purgatory_load_addr; 787 + bss_addr = load_addr + buf_sz + bss_pad; 788 + 789 + for (i = 0; i < pi->ehdr->e_shnum; i++) { 790 + if (!(sechdrs[i].sh_flags & SHF_ALLOC)) 791 + continue; 792 + 793 + align = sechdrs[i].sh_addralign; 794 + if (sechdrs[i].sh_type != SHT_NOBITS) { 795 + curr_load_addr = ALIGN(curr_load_addr, align); 796 + offset = curr_load_addr - load_addr; 797 + /* We already modifed ->sh_offset to keep src addr */ 798 + src = (char *) sechdrs[i].sh_offset; 799 + memcpy(buf_addr + offset, src, sechdrs[i].sh_size); 800 + 801 + /* Store load address and source address of section */ 802 + 
sechdrs[i].sh_addr = curr_load_addr; 803 + 804 + /* 805 + * This section got copied to temporary buffer. Update 806 + * ->sh_offset accordingly. 807 + */ 808 + sechdrs[i].sh_offset = (unsigned long)(buf_addr + offset); 809 + 810 + /* Advance to the next address */ 811 + curr_load_addr += sechdrs[i].sh_size; 812 + } else { 813 + bss_addr = ALIGN(bss_addr, align); 814 + sechdrs[i].sh_addr = bss_addr; 815 + bss_addr += sechdrs[i].sh_size; 816 + } 817 + } 818 + 819 + /* Update entry point based on load address of text section */ 820 + if (entry_sidx >= 0) 821 + entry += sechdrs[entry_sidx].sh_addr; 822 + 823 + /* Make kernel jump to purgatory after shutdown */ 824 + image->start = entry; 825 + 826 + /* Used later to get/set symbol values */ 827 + pi->sechdrs = sechdrs; 828 + 829 + /* 830 + * Used later to identify which section is purgatory and skip it 831 + * from checksumming. 832 + */ 833 + pi->purgatory_buf = purgatory_buf; 834 + return ret; 835 + out: 836 + vfree(sechdrs); 837 + vfree(purgatory_buf); 838 + return ret; 839 + } 840 + 841 + static int kexec_apply_relocations(struct kimage *image) 842 + { 843 + int i, ret; 844 + struct purgatory_info *pi = &image->purgatory_info; 845 + Elf_Shdr *sechdrs = pi->sechdrs; 846 + 847 + /* Apply relocations */ 848 + for (i = 0; i < pi->ehdr->e_shnum; i++) { 849 + Elf_Shdr *section, *symtab; 850 + 851 + if (sechdrs[i].sh_type != SHT_RELA && 852 + sechdrs[i].sh_type != SHT_REL) 853 + continue; 854 + 855 + /* 856 + * For section of type SHT_RELA/SHT_REL, 857 + * ->sh_link contains section header index of associated 858 + * symbol table. And ->sh_info contains section header 859 + * index of section to which relocations apply. 
860 + */ 861 + if (sechdrs[i].sh_info >= pi->ehdr->e_shnum || 862 + sechdrs[i].sh_link >= pi->ehdr->e_shnum) 863 + return -ENOEXEC; 864 + 865 + section = &sechdrs[sechdrs[i].sh_info]; 866 + symtab = &sechdrs[sechdrs[i].sh_link]; 867 + 868 + if (!(section->sh_flags & SHF_ALLOC)) 869 + continue; 870 + 871 + /* 872 + * symtab->sh_link contain section header index of associated 873 + * string table. 874 + */ 875 + if (symtab->sh_link >= pi->ehdr->e_shnum) 876 + /* Invalid section number? */ 877 + continue; 878 + 879 + /* 880 + * Respective architecture needs to provide support for applying 881 + * relocations of type SHT_RELA/SHT_REL. 882 + */ 883 + if (sechdrs[i].sh_type == SHT_RELA) 884 + ret = arch_kexec_apply_relocations_add(pi->ehdr, 885 + sechdrs, i); 886 + else if (sechdrs[i].sh_type == SHT_REL) 887 + ret = arch_kexec_apply_relocations(pi->ehdr, 888 + sechdrs, i); 889 + if (ret) 890 + return ret; 891 + } 892 + 893 + return 0; 894 + } 895 + 896 + /* Load relocatable purgatory object and relocate it appropriately */ 897 + int kexec_load_purgatory(struct kimage *image, unsigned long min, 898 + unsigned long max, int top_down, 899 + unsigned long *load_addr) 900 + { 901 + struct purgatory_info *pi = &image->purgatory_info; 902 + int ret; 903 + 904 + if (kexec_purgatory_size <= 0) 905 + return -EINVAL; 906 + 907 + if (kexec_purgatory_size < sizeof(Elf_Ehdr)) 908 + return -ENOEXEC; 909 + 910 + pi->ehdr = (Elf_Ehdr *)kexec_purgatory; 911 + 912 + if (memcmp(pi->ehdr->e_ident, ELFMAG, SELFMAG) != 0 913 + || pi->ehdr->e_type != ET_REL 914 + || !elf_check_arch(pi->ehdr) 915 + || pi->ehdr->e_shentsize != sizeof(Elf_Shdr)) 916 + return -ENOEXEC; 917 + 918 + if (pi->ehdr->e_shoff >= kexec_purgatory_size 919 + || (pi->ehdr->e_shnum * sizeof(Elf_Shdr) > 920 + kexec_purgatory_size - pi->ehdr->e_shoff)) 921 + return -ENOEXEC; 922 + 923 + ret = __kexec_load_purgatory(image, min, max, top_down); 924 + if (ret) 925 + return ret; 926 + 927 + ret = kexec_apply_relocations(image); 928 
+ if (ret) 929 + goto out; 930 + 931 + *load_addr = pi->purgatory_load_addr; 932 + return 0; 933 + out: 934 + vfree(pi->sechdrs); 935 + vfree(pi->purgatory_buf); 936 + return ret; 937 + } 938 + 939 + static Elf_Sym *kexec_purgatory_find_symbol(struct purgatory_info *pi, 940 + const char *name) 941 + { 942 + Elf_Sym *syms; 943 + Elf_Shdr *sechdrs; 944 + Elf_Ehdr *ehdr; 945 + int i, k; 946 + const char *strtab; 947 + 948 + if (!pi->sechdrs || !pi->ehdr) 949 + return NULL; 950 + 951 + sechdrs = pi->sechdrs; 952 + ehdr = pi->ehdr; 953 + 954 + for (i = 0; i < ehdr->e_shnum; i++) { 955 + if (sechdrs[i].sh_type != SHT_SYMTAB) 956 + continue; 957 + 958 + if (sechdrs[i].sh_link >= ehdr->e_shnum) 959 + /* Invalid strtab section number */ 960 + continue; 961 + strtab = (char *)sechdrs[sechdrs[i].sh_link].sh_offset; 962 + syms = (Elf_Sym *)sechdrs[i].sh_offset; 963 + 964 + /* Go through symbols for a match */ 965 + for (k = 0; k < sechdrs[i].sh_size/sizeof(Elf_Sym); k++) { 966 + if (ELF_ST_BIND(syms[k].st_info) != STB_GLOBAL) 967 + continue; 968 + 969 + if (strcmp(strtab + syms[k].st_name, name) != 0) 970 + continue; 971 + 972 + if (syms[k].st_shndx == SHN_UNDEF || 973 + syms[k].st_shndx >= ehdr->e_shnum) { 974 + pr_debug("Symbol: %s has bad section index %d.\n", 975 + name, syms[k].st_shndx); 976 + return NULL; 977 + } 978 + 979 + /* Found the symbol we are looking for */ 980 + return &syms[k]; 981 + } 982 + } 983 + 984 + return NULL; 985 + } 986 + 987 + void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name) 988 + { 989 + struct purgatory_info *pi = &image->purgatory_info; 990 + Elf_Sym *sym; 991 + Elf_Shdr *sechdr; 992 + 993 + sym = kexec_purgatory_find_symbol(pi, name); 994 + if (!sym) 995 + return ERR_PTR(-EINVAL); 996 + 997 + sechdr = &pi->sechdrs[sym->st_shndx]; 998 + 999 + /* 1000 + * Returns the address where symbol will finally be loaded after 1001 + * kexec_load_segment() 1002 + */ 1003 + return (void *)(sechdr->sh_addr + sym->st_value); 1004 + 
} 1005 + 1006 + /* 1007 + * Get or set value of a symbol. If "get_value" is true, symbol value is 1008 + * returned in buf otherwise symbol value is set based on value in buf. 1009 + */ 1010 + int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name, 1011 + void *buf, unsigned int size, bool get_value) 1012 + { 1013 + Elf_Sym *sym; 1014 + Elf_Shdr *sechdrs; 1015 + struct purgatory_info *pi = &image->purgatory_info; 1016 + char *sym_buf; 1017 + 1018 + sym = kexec_purgatory_find_symbol(pi, name); 1019 + if (!sym) 1020 + return -EINVAL; 1021 + 1022 + if (sym->st_size != size) { 1023 + pr_err("symbol %s size mismatch: expected %lu actual %u\n", 1024 + name, (unsigned long)sym->st_size, size); 1025 + return -EINVAL; 1026 + } 1027 + 1028 + sechdrs = pi->sechdrs; 1029 + 1030 + if (sechdrs[sym->st_shndx].sh_type == SHT_NOBITS) { 1031 + pr_err("symbol %s is in a bss section. Cannot %s\n", name, 1032 + get_value ? "get" : "set"); 1033 + return -EINVAL; 1034 + } 1035 + 1036 + sym_buf = (unsigned char *)sechdrs[sym->st_shndx].sh_offset + 1037 + sym->st_value; 1038 + 1039 + if (get_value) 1040 + memcpy((void *)buf, sym_buf, size); 1041 + else 1042 + memcpy((void *)sym_buf, buf, size); 1043 + 1044 + return 0; 1045 + }
+22
kernel/kexec_internal.h
··· 1 + #ifndef LINUX_KEXEC_INTERNAL_H 2 + #define LINUX_KEXEC_INTERNAL_H 3 + 4 + #include <linux/kexec.h> 5 + 6 + struct kimage *do_kimage_alloc_init(void); 7 + int sanity_check_segment_list(struct kimage *image); 8 + void kimage_free_page_list(struct list_head *list); 9 + void kimage_free(struct kimage *image); 10 + int kimage_load_segment(struct kimage *image, struct kexec_segment *segment); 11 + void kimage_terminate(struct kimage *image); 12 + int kimage_is_destination_range(struct kimage *image, 13 + unsigned long start, unsigned long end); 14 + 15 + extern struct mutex kexec_mutex; 16 + 17 + #ifdef CONFIG_KEXEC_FILE 18 + void kimage_file_post_load_cleanup(struct kimage *image); 19 + #else /* CONFIG_KEXEC_FILE */ 20 + static inline void kimage_file_post_load_cleanup(struct kimage *image) { } 21 + #endif /* CONFIG_KEXEC_FILE */ 22 + #endif /* LINUX_KEXEC_INTERNAL_H */
+54 -46
kernel/kmod.c
··· 45 45 46 46 extern int max_threads; 47 47 48 - static struct workqueue_struct *khelper_wq; 49 - 50 48 #define CAP_BSET (void *)1 51 49 #define CAP_PI (void *)2 52 50 ··· 112 114 * @...: arguments as specified in the format string 113 115 * 114 116 * Load a module using the user mode module loader. The function returns 115 - * zero on success or a negative errno code on failure. Note that a 116 - * successful module load does not mean the module did not then unload 117 - * and exit on an error of its own. Callers must check that the service 118 - * they requested is now available not blindly invoke it. 117 + * zero on success or a negative errno code or positive exit code from 118 + * "modprobe" on failure. Note that a successful module load does not mean 119 + * the module did not then unload and exit on an error of its own. Callers 120 + * must check that the service they requested is now available not blindly 121 + * invoke it. 119 122 * 120 123 * If module auto-loading support is disabled then this function 121 124 * becomes a no-operation. ··· 212 213 /* 213 214 * This is the task which runs the usermode application 214 215 */ 215 - static int ____call_usermodehelper(void *data) 216 + static int call_usermodehelper_exec_async(void *data) 216 217 { 217 218 struct subprocess_info *sub_info = data; 218 219 struct cred *new; ··· 222 223 flush_signal_handlers(current, 1); 223 224 spin_unlock_irq(&current->sighand->siglock); 224 225 225 - /* We can run anywhere, unlike our parent keventd(). */ 226 - set_cpus_allowed_ptr(current, cpu_all_mask); 227 - 228 226 /* 229 - * Our parent is keventd, which runs with elevated scheduling priority. 230 - * Avoid propagating that into the userspace child. 227 + * Our parent (unbound workqueue) runs with elevated scheduling 228 + * priority. Avoid propagating that into the userspace child. 
231 229 */ 232 230 set_user_nice(current, 0); 233 231 ··· 254 258 (const char __user *const __user *)sub_info->envp); 255 259 out: 256 260 sub_info->retval = retval; 257 - /* wait_for_helper() will call umh_complete if UHM_WAIT_PROC. */ 261 + /* 262 + * call_usermodehelper_exec_sync() will call umh_complete 263 + * if UHM_WAIT_PROC. 264 + */ 258 265 if (!(sub_info->wait & UMH_WAIT_PROC)) 259 266 umh_complete(sub_info); 260 267 if (!retval) ··· 265 266 do_exit(0); 266 267 } 267 268 268 - /* Keventd can't block, but this (a child) can. */ 269 - static int wait_for_helper(void *data) 269 + /* Handles UMH_WAIT_PROC. */ 270 + static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) 270 271 { 271 - struct subprocess_info *sub_info = data; 272 272 pid_t pid; 273 273 274 274 /* If SIGCLD is ignored sys_wait4 won't populate the status. */ 275 275 kernel_sigaction(SIGCHLD, SIG_DFL); 276 - pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD); 276 + pid = kernel_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD); 277 277 if (pid < 0) { 278 278 sub_info->retval = pid; 279 279 } else { ··· 280 282 /* 281 283 * Normally it is bogus to call wait4() from in-kernel because 282 284 * wait4() wants to write the exit code to a userspace address. 283 - * But wait_for_helper() always runs as keventd, and put_user() 284 - * to a kernel address works OK for kernel threads, due to their 285 - * having an mm_segment_t which spans the entire address space. 285 + * But call_usermodehelper_exec_sync() always runs as kernel 286 + * thread (workqueue) and put_user() to a kernel address works 287 + * OK for kernel threads, due to their having an mm_segment_t 288 + * which spans the entire address space. 286 289 * 287 290 * Thus the __user pointer cast is valid here. 
288 291 */ 289 292 sys_wait4(pid, (int __user *)&ret, 0, NULL); 290 293 291 294 /* 292 - * If ret is 0, either ____call_usermodehelper failed and the 293 - * real error code is already in sub_info->retval or 295 + * If ret is 0, either call_usermodehelper_exec_async failed and 296 + * the real error code is already in sub_info->retval or 294 297 * sub_info->retval is 0 anyway, so don't mess with it then. 295 298 */ 296 299 if (ret) 297 300 sub_info->retval = ret; 298 301 } 299 302 303 + /* Restore default kernel sig handler */ 304 + kernel_sigaction(SIGCHLD, SIG_IGN); 305 + 300 306 umh_complete(sub_info); 301 - do_exit(0); 302 307 } 303 308 304 - /* This is run by khelper thread */ 305 - static void __call_usermodehelper(struct work_struct *work) 309 + /* 310 + * We need to create the usermodehelper kernel thread from a task that is affine 311 + * to an optimized set of CPUs (or nohz housekeeping ones) such that they 312 + * inherit a widest affinity irrespective of call_usermodehelper() callers with 313 + * possibly reduced affinity (eg: per-cpu workqueues). We don't want 314 + * usermodehelper targets to contend a busy CPU. 315 + * 316 + * Unbound workqueues provide such wide affinity and allow to block on 317 + * UMH_WAIT_PROC requests without blocking pending request (up to some limit). 318 + * 319 + * Besides, workqueues provide the privilege level that caller might not have 320 + * to perform the usermodehelper request. 
321 + * 322 + */ 323 + static void call_usermodehelper_exec_work(struct work_struct *work) 306 324 { 307 325 struct subprocess_info *sub_info = 308 326 container_of(work, struct subprocess_info, work); 309 - pid_t pid; 310 327 311 - if (sub_info->wait & UMH_WAIT_PROC) 312 - pid = kernel_thread(wait_for_helper, sub_info, 313 - CLONE_FS | CLONE_FILES | SIGCHLD); 314 - else 315 - pid = kernel_thread(____call_usermodehelper, sub_info, 328 + if (sub_info->wait & UMH_WAIT_PROC) { 329 + call_usermodehelper_exec_sync(sub_info); 330 + } else { 331 + pid_t pid; 332 + 333 + pid = kernel_thread(call_usermodehelper_exec_async, sub_info, 316 334 SIGCHLD); 317 - 318 - if (pid < 0) { 319 - sub_info->retval = pid; 320 - umh_complete(sub_info); 335 + if (pid < 0) { 336 + sub_info->retval = pid; 337 + umh_complete(sub_info); 338 + } 321 339 } 322 340 } 323 341 ··· 523 509 if (!sub_info) 524 510 goto out; 525 511 526 - INIT_WORK(&sub_info->work, __call_usermodehelper); 512 + INIT_WORK(&sub_info->work, call_usermodehelper_exec_work); 527 513 sub_info->path = path; 528 514 sub_info->argv = argv; 529 515 sub_info->envp = envp; ··· 545 531 * from interrupt context. 546 532 * 547 533 * Runs a user-space application. The application is started 548 - * asynchronously if wait is not set, and runs as a child of keventd. 549 - * (ie. it runs with full root capabilities). 534 + * asynchronously if wait is not set, and runs as a child of system workqueues. 535 + * (ie. it runs with full root capabilities and optimized affinity). 550 536 */ 551 537 int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) 552 538 { ··· 558 544 return -EINVAL; 559 545 } 560 546 helper_lock(); 561 - if (!khelper_wq || usermodehelper_disabled) { 547 + if (usermodehelper_disabled) { 562 548 retval = -EBUSY; 563 549 goto out; 564 550 } ··· 570 556 sub_info->complete = (wait == UMH_NO_WAIT) ? 
NULL : &done; 571 557 sub_info->wait = wait; 572 558 573 - queue_work(khelper_wq, &sub_info->work); 559 + queue_work(system_unbound_wq, &sub_info->work); 574 560 if (wait == UMH_NO_WAIT) /* task has freed sub_info */ 575 561 goto unlock; 576 562 ··· 700 686 }, 701 687 { } 702 688 }; 703 - 704 - void __init usermodehelper_init(void) 705 - { 706 - khelper_wq = create_singlethread_workqueue("khelper"); 707 - BUG_ON(!khelper_wq); 708 - }
+3 -3
kernel/ksysfs.c
··· 90 90 KERNEL_ATTR_RW(profiling); 91 91 #endif 92 92 93 - #ifdef CONFIG_KEXEC 93 + #ifdef CONFIG_KEXEC_CORE 94 94 static ssize_t kexec_loaded_show(struct kobject *kobj, 95 95 struct kobj_attribute *attr, char *buf) 96 96 { ··· 134 134 } 135 135 KERNEL_ATTR_RO(vmcoreinfo); 136 136 137 - #endif /* CONFIG_KEXEC */ 137 + #endif /* CONFIG_KEXEC_CORE */ 138 138 139 139 /* whether file capabilities are enabled */ 140 140 static ssize_t fscaps_show(struct kobject *kobj, ··· 196 196 #ifdef CONFIG_PROFILING 197 197 &profiling_attr.attr, 198 198 #endif 199 - #ifdef CONFIG_KEXEC 199 + #ifdef CONFIG_KEXEC_CORE 200 200 &kexec_loaded_attr.attr, 201 201 &kexec_crash_loaded_attr.attr, 202 202 &kexec_crash_size_attr.attr,
+1 -1
kernel/printk/printk.c
··· 835 835 .release = devkmsg_release, 836 836 }; 837 837 838 - #ifdef CONFIG_KEXEC 838 + #ifdef CONFIG_KEXEC_CORE 839 839 /* 840 840 * This appends the listed symbols to /proc/vmcore 841 841 *
+1 -1
kernel/reboot.c
··· 346 346 kernel_restart(buffer); 347 347 break; 348 348 349 - #ifdef CONFIG_KEXEC 349 + #ifdef CONFIG_KEXEC_CORE 350 350 case LINUX_REBOOT_CMD_KEXEC: 351 351 ret = kernel_kexec(); 352 352 break;
+6 -6
kernel/sysctl.c
··· 621 621 .proc_handler = proc_dointvec, 622 622 }, 623 623 #endif 624 - #ifdef CONFIG_KEXEC 624 + #ifdef CONFIG_KEXEC_CORE 625 625 { 626 626 .procname = "kexec_load_disabled", 627 627 .data = &kexec_load_disabled, ··· 1995 1995 int val = *valp; 1996 1996 if (val < 0) { 1997 1997 *negp = true; 1998 - *lvalp = (unsigned long)-val; 1998 + *lvalp = -(unsigned long)val; 1999 1999 } else { 2000 2000 *negp = false; 2001 2001 *lvalp = (unsigned long)val; ··· 2201 2201 int val = *valp; 2202 2202 if (val < 0) { 2203 2203 *negp = true; 2204 - *lvalp = (unsigned long)-val; 2204 + *lvalp = -(unsigned long)val; 2205 2205 } else { 2206 2206 *negp = false; 2207 2207 *lvalp = (unsigned long)val; ··· 2436 2436 unsigned long lval; 2437 2437 if (val < 0) { 2438 2438 *negp = true; 2439 - lval = (unsigned long)-val; 2439 + lval = -(unsigned long)val; 2440 2440 } else { 2441 2441 *negp = false; 2442 2442 lval = (unsigned long)val; ··· 2459 2459 unsigned long lval; 2460 2460 if (val < 0) { 2461 2461 *negp = true; 2462 - lval = (unsigned long)-val; 2462 + lval = -(unsigned long)val; 2463 2463 } else { 2464 2464 *negp = false; 2465 2465 lval = (unsigned long)val; ··· 2484 2484 unsigned long lval; 2485 2485 if (val < 0) { 2486 2486 *negp = true; 2487 - lval = (unsigned long)-val; 2487 + lval = -(unsigned long)val; 2488 2488 } else { 2489 2489 *negp = false; 2490 2490 lval = (unsigned long)val;
+26 -17
lib/bitmap.c
··· 367 367 368 368 nchunks = nbits = totaldigits = c = 0; 369 369 do { 370 - chunk = ndigits = 0; 370 + chunk = 0; 371 + ndigits = totaldigits; 371 372 372 373 /* Get the next chunk of the bitmap */ 373 374 while (buflen) { ··· 407 406 return -EOVERFLOW; 408 407 409 408 chunk = (chunk << 4) | hex_to_bin(c); 410 - ndigits++; totaldigits++; 409 + totaldigits++; 411 410 } 412 - if (ndigits == 0) 411 + if (ndigits == totaldigits) 413 412 return -EINVAL; 414 413 if (nchunks == 0 && chunk == 0) 415 414 continue; ··· 506 505 int nmaskbits) 507 506 { 508 507 unsigned a, b; 509 - int c, old_c, totaldigits; 508 + int c, old_c, totaldigits, ndigits; 510 509 const char __user __force *ubuf = (const char __user __force *)buf; 511 510 int at_start, in_range; 512 511 ··· 516 515 at_start = 1; 517 516 in_range = 0; 518 517 a = b = 0; 518 + ndigits = totaldigits; 519 519 520 520 /* Get the next cpu# or a range of cpu#'s */ 521 521 while (buflen) { ··· 530 528 if (isspace(c)) 531 529 continue; 532 530 533 - /* 534 - * If the last character was a space and the current 535 - * character isn't '\0', we've got embedded whitespace. 536 - * This is a no-no, so throw an error. 537 - */ 538 - if (totaldigits && c && isspace(old_c)) 539 - return -EINVAL; 540 - 541 531 /* A '\0' or a ',' signal the end of a cpu# or range */ 542 532 if (c == '\0' || c == ',') 543 533 break; 534 + /* 535 + * whitespaces between digits are not allowed, 536 + * but it's ok if whitespaces are on head or tail. 537 + * when old_c is whilespace, 538 + * if totaldigits == ndigits, whitespace is on head. 539 + * if whitespace is on tail, it should not run here. 540 + * as c was ',' or '\0', 541 + * the last code line has broken the current loop. 
542 + */ 543 + if ((totaldigits != ndigits) && isspace(old_c)) 544 + return -EINVAL; 544 545 545 546 if (c == '-') { 546 547 if (at_start || in_range) 547 548 return -EINVAL; 548 549 b = 0; 549 550 in_range = 1; 551 + at_start = 1; 550 552 continue; 551 553 } 552 554 ··· 563 557 at_start = 0; 564 558 totaldigits++; 565 559 } 560 + if (ndigits == totaldigits) 561 + continue; 562 + /* if no digit follows '-', it is wrong */ 563 + if (at_start && in_range) 564 + return -EINVAL; 566 565 if (!(a <= b)) 567 566 return -EINVAL; 568 567 if (b >= nmaskbits) 569 568 return -ERANGE; 570 - if (!at_start) { 571 - while (a <= b) { 572 - set_bit(a, maskp); 573 - a++; 574 - } 569 + while (a <= b) { 570 + set_bit(a, maskp); 571 + a++; 575 572 } 576 573 } while (buflen && c == ','); 577 574 return 0;
+3 -3
lib/decompress_bunzip2.c
··· 743 743 } 744 744 745 745 #ifdef PREBOOT 746 - STATIC int INIT decompress(unsigned char *buf, long len, 746 + STATIC int INIT __decompress(unsigned char *buf, long len, 747 747 long (*fill)(void*, unsigned long), 748 748 long (*flush)(void*, unsigned long), 749 - unsigned char *outbuf, 749 + unsigned char *outbuf, long olen, 750 750 long *pos, 751 - void(*error)(char *x)) 751 + void (*error)(char *x)) 752 752 { 753 753 return bunzip2(buf, len - 4, fill, flush, outbuf, pos, error); 754 754 }
+26 -5
lib/decompress_inflate.c
··· 1 1 #ifdef STATIC 2 + #define PREBOOT 2 3 /* Pre-boot environment: included */ 3 4 4 5 /* prevent inclusion of _LINUX_KERNEL_H in pre-boot environment: lots ··· 34 33 } 35 34 36 35 /* Included from initramfs et al code */ 37 - STATIC int INIT gunzip(unsigned char *buf, long len, 36 + STATIC int INIT __gunzip(unsigned char *buf, long len, 38 37 long (*fill)(void*, unsigned long), 39 38 long (*flush)(void*, unsigned long), 40 - unsigned char *out_buf, 39 + unsigned char *out_buf, long out_len, 41 40 long *pos, 42 41 void(*error)(char *x)) { 43 42 u8 *zbuf; 44 43 struct z_stream_s *strm; 45 44 int rc; 46 - size_t out_len; 47 45 48 46 rc = -1; 49 47 if (flush) { 50 48 out_len = 0x8000; /* 32 K */ 51 49 out_buf = malloc(out_len); 52 50 } else { 53 - out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */ 51 + if (!out_len) 52 + out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */ 54 53 } 55 54 if (!out_buf) { 56 55 error("Out of memory while allocating output buffer"); ··· 182 181 return rc; /* returns Z_OK (0) if successful */ 183 182 } 184 183 185 - #define decompress gunzip 184 + #ifndef PREBOOT 185 + STATIC int INIT gunzip(unsigned char *buf, long len, 186 + long (*fill)(void*, unsigned long), 187 + long (*flush)(void*, unsigned long), 188 + unsigned char *out_buf, 189 + long *pos, 190 + void (*error)(char *x)) 191 + { 192 + return __gunzip(buf, len, fill, flush, out_buf, 0, pos, error); 193 + } 194 + #else 195 + STATIC int INIT __decompress(unsigned char *buf, long len, 196 + long (*fill)(void*, unsigned long), 197 + long (*flush)(void*, unsigned long), 198 + unsigned char *out_buf, long out_len, 199 + long *pos, 200 + void (*error)(char *x)) 201 + { 202 + return __gunzip(buf, len, fill, flush, out_buf, out_len, pos, error); 203 + } 204 + #endif
+3 -3
lib/decompress_unlz4.c
··· 196 196 } 197 197 198 198 #ifdef PREBOOT 199 - STATIC int INIT decompress(unsigned char *buf, long in_len, 199 + STATIC int INIT __decompress(unsigned char *buf, long in_len, 200 200 long (*fill)(void*, unsigned long), 201 201 long (*flush)(void*, unsigned long), 202 - unsigned char *output, 202 + unsigned char *output, long out_len, 203 203 long *posp, 204 - void(*error)(char *x) 204 + void (*error)(char *x) 205 205 ) 206 206 { 207 207 return unlz4(buf, in_len - 4, fill, flush, output, posp, error);
+4 -5
lib/decompress_unlzma.c
··· 620 620 621 621 num_probs = LZMA_BASE_SIZE + (LZMA_LIT_SIZE << (lc + lp)); 622 622 p = (uint16_t *) large_malloc(num_probs * sizeof(*p)); 623 - if (p == 0) 623 + if (p == NULL) 624 624 goto exit_2; 625 625 num_probs = LZMA_LITERAL + (LZMA_LIT_SIZE << (lc + lp)); 626 626 for (i = 0; i < num_probs; i++) ··· 667 667 } 668 668 669 669 #ifdef PREBOOT 670 - STATIC int INIT decompress(unsigned char *buf, long in_len, 670 + STATIC int INIT __decompress(unsigned char *buf, long in_len, 671 671 long (*fill)(void*, unsigned long), 672 672 long (*flush)(void*, unsigned long), 673 - unsigned char *output, 673 + unsigned char *output, long out_len, 674 674 long *posp, 675 - void(*error)(char *x) 676 - ) 675 + void (*error)(char *x)) 677 676 { 678 677 return unlzma(buf, in_len - 4, fill, flush, output, posp, error); 679 678 }
+12 -1
lib/decompress_unlzo.c
··· 31 31 */ 32 32 33 33 #ifdef STATIC 34 + #define PREBOOT 34 35 #include "lzo/lzo1x_decompress_safe.c" 35 36 #else 36 37 #include <linux/decompress/unlzo.h> ··· 288 287 return ret; 289 288 } 290 289 291 - #define decompress unlzo 290 + #ifdef PREBOOT 291 + STATIC int INIT __decompress(unsigned char *buf, long len, 292 + long (*fill)(void*, unsigned long), 293 + long (*flush)(void*, unsigned long), 294 + unsigned char *out_buf, long olen, 295 + long *pos, 296 + void (*error)(char *x)) 297 + { 298 + return unlzo(buf, len, fill, flush, out_buf, pos, error); 299 + } 300 + #endif
+11 -1
lib/decompress_unxz.c
··· 394 394 * This macro is used by architecture-specific files to decompress 395 395 * the kernel image. 396 396 */ 397 - #define decompress unxz 397 + #ifdef XZ_PREBOOT 398 + STATIC int INIT __decompress(unsigned char *buf, long len, 399 + long (*fill)(void*, unsigned long), 400 + long (*flush)(void*, unsigned long), 401 + unsigned char *out_buf, long olen, 402 + long *pos, 403 + void (*error)(char *x)) 404 + { 405 + return unxz(buf, len, fill, flush, out_buf, pos, error); 406 + } 407 + #endif
+1 -1
lib/kstrtox.c
··· 152 152 rv = _kstrtoull(s + 1, base, &tmp); 153 153 if (rv < 0) 154 154 return rv; 155 - if ((long long)(-tmp) >= 0) 155 + if ((long long)-tmp > 0) 156 156 return -ERANGE; 157 157 *res = -tmp; 158 158 } else {
+11 -9
lib/string_helpers.c
··· 410 410 * @dst: destination buffer (escaped) 411 411 * @osz: destination buffer size 412 412 * @flags: combination of the flags (bitwise OR): 413 - * %ESCAPE_SPACE: 413 + * %ESCAPE_SPACE: (special white space, not space itself) 414 414 * '\f' - form feed 415 415 * '\n' - new line 416 416 * '\r' - carriage return ··· 432 432 * all previous together 433 433 * %ESCAPE_HEX: 434 434 * '\xHH' - byte with hexadecimal value HH (2 digits) 435 - * @esc: NULL-terminated string of characters any of which, if found in 436 - * the source, has to be escaped 435 + * @only: NULL-terminated string containing characters used to limit 436 + * the selected escape class. If characters are included in @only 437 + * that would not normally be escaped by the classes selected 438 + * in @flags, they will be copied to @dst unescaped. 437 439 * 438 440 * Description: 439 441 * The process of escaping byte buffer includes several parts. They are applied 440 442 * in the following sequence. 441 443 * 1. The character is matched to the printable class, if asked, and in 442 444 * case of match it passes through to the output. 443 - * 2. The character is not matched to the one from @esc string and thus 444 - * must go as is to the output. 445 + * 2. The character is not matched to the one from @only string and thus 446 + * must go as-is to the output. 445 447 * 3. The character is checked if it falls into the class given by @flags. 446 448 * %ESCAPE_OCTAL and %ESCAPE_HEX are going last since they cover any 447 449 * character. Note that they actually can't go together, otherwise ··· 460 458 * dst for a '\0' terminator if and only if ret < osz. 
461 459 */ 462 460 int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, 463 - unsigned int flags, const char *esc) 461 + unsigned int flags, const char *only) 464 462 { 465 463 char *p = dst; 466 464 char *end = p + osz; 467 - bool is_dict = esc && *esc; 465 + bool is_dict = only && *only; 468 466 469 467 while (isz--) { 470 468 unsigned char c = *src++; ··· 473 471 * Apply rules in the following sequence: 474 472 * - the character is printable, when @flags has 475 473 * %ESCAPE_NP bit set 476 - * - the @esc string is supplied and does not contain a 474 + * - the @only string is supplied and does not contain a 477 475 * character under question 478 476 * - the character doesn't fall into a class of symbols 479 477 * defined by given @flags ··· 481 479 * output buffer. 482 480 */ 483 481 if ((flags & ESCAPE_NP && isprint(c)) || 484 - (is_dict && !strchr(esc, c))) { 482 + (is_dict && !strchr(only, c))) { 485 483 /* do nothing */ 486 484 } else { 487 485 if (flags & ESCAPE_SPACE && escape_space(c, &p, end))
+1 -5
lib/test-kstrtox.c
··· 260 260 {"4294967297", 10, 4294967297LL}, 261 261 {"9223372036854775807", 10, 9223372036854775807LL}, 262 262 263 + {"-0", 10, 0LL}, 263 264 {"-1", 10, -1LL}, 264 265 {"-2", 10, -2LL}, 265 266 {"-9223372036854775808", 10, LLONG_MIN}, ··· 278 277 {"-9223372036854775809", 10}, 279 278 {"-18446744073709551614", 10}, 280 279 {"-18446744073709551615", 10}, 281 - /* negative zero isn't an integer in Linux */ 282 - {"-0", 0}, 283 - {"-0", 8}, 284 - {"-0", 10}, 285 - {"-0", 16}, 286 280 /* sign is first character if any */ 287 281 {"-+1", 0}, 288 282 {"-+1", 8},
+3 -3
lib/test_kasan.c
··· 65 65 kfree(ptr); 66 66 } 67 67 68 - static noinline void __init kmalloc_large_oob_rigth(void) 68 + static noinline void __init kmalloc_large_oob_right(void) 69 69 { 70 70 char *ptr; 71 71 size_t size = KMALLOC_MAX_CACHE_SIZE + 10; ··· 114 114 kfree(ptr1); 115 115 return; 116 116 } 117 - ptr2[size1] = 'x'; 117 + ptr2[size2] = 'x'; 118 118 kfree(ptr2); 119 119 } 120 120 ··· 259 259 kmalloc_oob_right(); 260 260 kmalloc_oob_left(); 261 261 kmalloc_node_oob_right(); 262 - kmalloc_large_oob_rigth(); 262 + kmalloc_large_oob_right(); 263 263 kmalloc_oob_krealloc_more(); 264 264 kmalloc_oob_krealloc_less(); 265 265 kmalloc_oob_16();
+3 -3
lib/zlib_deflate/deftree.c
··· 35 35 /* #include "deflate.h" */ 36 36 37 37 #include <linux/zutil.h> 38 + #include <linux/bitrev.h> 38 39 #include "defutil.h" 39 40 40 41 #ifdef DEBUG_ZLIB ··· 147 146 static void compress_block (deflate_state *s, ct_data *ltree, 148 147 ct_data *dtree); 149 148 static void set_data_type (deflate_state *s); 150 - static unsigned bi_reverse (unsigned value, int length); 151 149 static void bi_windup (deflate_state *s); 152 150 static void bi_flush (deflate_state *s); 153 151 static void copy_block (deflate_state *s, char *buf, unsigned len, ··· 284 284 /* The static distance tree is trivial: */ 285 285 for (n = 0; n < D_CODES; n++) { 286 286 static_dtree[n].Len = 5; 287 - static_dtree[n].Code = bi_reverse((unsigned)n, 5); 287 + static_dtree[n].Code = bitrev32((u32)n) >> (32 - 5); 288 288 } 289 289 static_init_done = 1; 290 290 } ··· 520 520 int len = tree[n].Len; 521 521 if (len == 0) continue; 522 522 /* Now reverse the bits */ 523 - tree[n].Code = bi_reverse(next_code[len]++, len); 523 + tree[n].Code = bitrev32((u32)(next_code[len]++)) >> (32 - len); 524 524 525 525 Tracecv(tree != static_ltree, (stderr,"\nn %3d %c l %2d c %4x (%x) ", 526 526 n, (isgraph(n) ? n : ' '), len, tree[n].Code, next_code[len]-1));
-16
lib/zlib_deflate/defutil.h
··· 293 293 } 294 294 295 295 /* =========================================================================== 296 - * Reverse the first len bits of a code, using straightforward code (a faster 297 - * method would use a table) 298 - * IN assertion: 1 <= len <= 15 299 - */ 300 - static inline unsigned bi_reverse(unsigned code, /* the value to invert */ 301 - int len) /* its bit length */ 302 - { 303 - register unsigned res = 0; 304 - do { 305 - res |= code & 1; 306 - code >>= 1, res <<= 1; 307 - } while (--len > 0); 308 - return res >> 1; 309 - } 310 - 311 - /* =========================================================================== 312 296 * Flush the bit buffer, keeping at most 7 bits in it. 313 297 */ 314 298 static inline void bi_flush(deflate_state *s)
+12
mm/Kconfig
··· 649 649 processes running early in the lifetime of the system until kswapd 650 650 finishes the initialisation. 651 651 652 + config IDLE_PAGE_TRACKING 653 + bool "Enable idle page tracking" 654 + depends on SYSFS && MMU 655 + select PAGE_EXTENSION if !64BIT 656 + help 657 + This feature allows estimating the amount of user pages that have 658 + not been touched during a given period of time. This information can 659 + be useful to tune memory cgroup limits and/or for job placement 660 + within a compute cluster. 661 + 662 + See Documentation/vm/idle_page_tracking.txt for more details. 663 + 652 664 config ZONE_DEVICE 653 665 bool "Device memory (pmem, etc...) hotplug support" if EXPERT 654 666 default !ZONE_DMA
+1
mm/Makefile
··· 79 79 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o 80 80 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o 81 81 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o 82 + obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
+4
mm/debug.c
··· 48 48 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 49 49 {1UL << PG_compound_lock, "compound_lock" }, 50 50 #endif 51 + #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) 52 + {1UL << PG_young, "young" }, 53 + {1UL << PG_idle, "idle" }, 54 + #endif 51 55 }; 52 56 53 57 static void dump_flags(unsigned long flags,
+10 -2
mm/huge_memory.c
··· 25 25 #include <linux/migrate.h> 26 26 #include <linux/hashtable.h> 27 27 #include <linux/userfaultfd_k.h> 28 + #include <linux/page_idle.h> 28 29 29 30 #include <asm/tlb.h> 30 31 #include <asm/pgalloc.h> ··· 1758 1757 /* clear PageTail before overwriting first_page */ 1759 1758 smp_wmb(); 1760 1759 1760 + if (page_is_young(page)) 1761 + set_page_young(page_tail); 1762 + if (page_is_idle(page)) 1763 + set_page_idle(page_tail); 1764 + 1761 1765 /* 1762 1766 * __split_huge_page_splitting() already set the 1763 1767 * splitting bit in all pmd that could map this ··· 2268 2262 VM_BUG_ON_PAGE(PageLRU(page), page); 2269 2263 2270 2264 /* If there is no mapped pte young don't collapse the page */ 2271 - if (pte_young(pteval) || PageReferenced(page) || 2265 + if (pte_young(pteval) || 2266 + page_is_young(page) || PageReferenced(page) || 2272 2267 mmu_notifier_test_young(vma->vm_mm, address)) 2273 2268 referenced = true; 2274 2269 } ··· 2700 2693 */ 2701 2694 if (page_count(page) != 1 + !!PageSwapCache(page)) 2702 2695 goto out_unmap; 2703 - if (pte_young(pteval) || PageReferenced(page) || 2696 + if (pte_young(pteval) || 2697 + page_is_young(page) || PageReferenced(page) || 2704 2698 mmu_notifier_test_young(vma->vm_mm, address)) 2705 2699 referenced = true; 2706 2700 }
+1 -4
mm/hwpoison-inject.c
··· 45 45 /* 46 46 * do a racy check with elevated page count, to make sure PG_hwpoison 47 47 * will only be set for the targeted owner (or on a free page). 48 - * We temporarily take page lock for try_get_mem_cgroup_from_page(). 49 48 * memory_failure() will redo the check reliably inside page lock. 50 49 */ 51 - lock_page(hpage); 52 50 err = hwpoison_filter(hpage); 53 - unlock_page(hpage); 54 51 if (err) 55 52 goto put_out; 56 53 ··· 123 126 if (!dentry) 124 127 goto fail; 125 128 126 - #ifdef CONFIG_MEMCG_SWAP 129 + #ifdef CONFIG_MEMCG 127 130 dentry = debugfs_create_u64("corrupt-filter-memcg", 0600, 128 131 hwpoison_dir, &hwpoison_filter_memcg); 129 132 if (!dentry)
+5 -14
mm/kmemleak.c
··· 302 302 struct kmemleak_object *object) 303 303 { 304 304 const u8 *ptr = (const u8 *)object->pointer; 305 - int i, len, remaining; 306 - unsigned char linebuf[HEX_ROW_SIZE * 5]; 305 + size_t len; 307 306 308 307 /* limit the number of lines to HEX_MAX_LINES */ 309 - remaining = len = 310 - min(object->size, (size_t)(HEX_MAX_LINES * HEX_ROW_SIZE)); 308 + len = min_t(size_t, object->size, HEX_MAX_LINES * HEX_ROW_SIZE); 311 309 312 - seq_printf(seq, " hex dump (first %d bytes):\n", len); 313 - for (i = 0; i < len; i += HEX_ROW_SIZE) { 314 - int linelen = min(remaining, HEX_ROW_SIZE); 315 - 316 - remaining -= HEX_ROW_SIZE; 317 - hex_dump_to_buffer(ptr + i, linelen, HEX_ROW_SIZE, 318 - HEX_GROUP_SIZE, linebuf, sizeof(linebuf), 319 - HEX_ASCII); 320 - seq_printf(seq, " %s\n", linebuf); 321 - } 310 + seq_printf(seq, " hex dump (first %zu bytes):\n", len); 311 + seq_hex_dump(seq, " ", DUMP_PREFIX_NONE, HEX_ROW_SIZE, 312 + HEX_GROUP_SIZE, ptr, len, HEX_ASCII); 322 313 } 323 314 324 315 /*
+40 -36
mm/memcontrol.c
··· 441 441 return &memcg->css; 442 442 } 443 443 444 + /** 445 + * page_cgroup_ino - return inode number of the memcg a page is charged to 446 + * @page: the page 447 + * 448 + * Look up the closest online ancestor of the memory cgroup @page is charged to 449 + * and return its inode number or 0 if @page is not charged to any cgroup. It 450 + * is safe to call this function without holding a reference to @page. 451 + * 452 + * Note, this function is inherently racy, because there is nothing to prevent 453 + * the cgroup inode from getting torn down and potentially reallocated a moment 454 + * after page_cgroup_ino() returns, so it only should be used by callers that 455 + * do not care (such as procfs interfaces). 456 + */ 457 + ino_t page_cgroup_ino(struct page *page) 458 + { 459 + struct mem_cgroup *memcg; 460 + unsigned long ino = 0; 461 + 462 + rcu_read_lock(); 463 + memcg = READ_ONCE(page->mem_cgroup); 464 + while (memcg && !(memcg->css.flags & CSS_ONLINE)) 465 + memcg = parent_mem_cgroup(memcg); 466 + if (memcg) 467 + ino = cgroup_ino(memcg->css.cgroup); 468 + rcu_read_unlock(); 469 + return ino; 470 + } 471 + 444 472 static struct mem_cgroup_per_zone * 445 473 mem_cgroup_page_zoneinfo(struct mem_cgroup *memcg, struct page *page) 446 474 { ··· 2097 2069 page_counter_uncharge(&memcg->memsw, nr_pages); 2098 2070 2099 2071 css_put_many(&memcg->css, nr_pages); 2100 - } 2101 - 2102 - /* 2103 - * try_get_mem_cgroup_from_page - look up page's memcg association 2104 - * @page: the page 2105 - * 2106 - * Look up, get a css reference, and return the memcg that owns @page. 2107 - * 2108 - * The page must be locked to prevent racing with swap-in and page 2109 - * cache charges. If coming from an unlocked page table, the caller 2110 - * must ensure the page is on the LRU or this can race with charging. 
2111 - */ 2112 - struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page) 2113 - { 2114 - struct mem_cgroup *memcg; 2115 - unsigned short id; 2116 - swp_entry_t ent; 2117 - 2118 - VM_BUG_ON_PAGE(!PageLocked(page), page); 2119 - 2120 - memcg = page->mem_cgroup; 2121 - if (memcg) { 2122 - if (!css_tryget_online(&memcg->css)) 2123 - memcg = NULL; 2124 - } else if (PageSwapCache(page)) { 2125 - ent.val = page_private(page); 2126 - id = lookup_swap_cgroup_id(ent); 2127 - rcu_read_lock(); 2128 - memcg = mem_cgroup_from_id(id); 2129 - if (memcg && !css_tryget_online(&memcg->css)) 2130 - memcg = NULL; 2131 - rcu_read_unlock(); 2132 - } 2133 - return memcg; 2134 2072 } 2135 2073 2136 2074 static void lock_page_lru(struct page *page, int *isolated) ··· 5295 5301 * the page lock, which serializes swap cache removal, which 5296 5302 * in turn serializes uncharging. 5297 5303 */ 5304 + VM_BUG_ON_PAGE(!PageLocked(page), page); 5298 5305 if (page->mem_cgroup) 5299 5306 goto out; 5307 + 5308 + if (do_swap_account) { 5309 + swp_entry_t ent = { .val = page_private(page), }; 5310 + unsigned short id = lookup_swap_cgroup_id(ent); 5311 + 5312 + rcu_read_lock(); 5313 + memcg = mem_cgroup_from_id(id); 5314 + if (memcg && !css_tryget_online(&memcg->css)) 5315 + memcg = NULL; 5316 + rcu_read_unlock(); 5317 + } 5300 5318 } 5301 5319 5302 5320 if (PageTransHuge(page)) { ··· 5316 5310 VM_BUG_ON_PAGE(!PageTransHuge(page), page); 5317 5311 } 5318 5312 5319 - if (do_swap_account && PageSwapCache(page)) 5320 - memcg = try_get_mem_cgroup_from_page(page); 5321 5313 if (!memcg) 5322 5314 memcg = get_mem_cgroup_from_mm(mm); 5323 5315
+2 -14
mm/memory-failure.c
··· 130 130 * can only guarantee that the page either belongs to the memcg tasks, or is 131 131 * a freed page. 132 132 */ 133 - #ifdef CONFIG_MEMCG_SWAP 133 + #ifdef CONFIG_MEMCG 134 134 u64 hwpoison_filter_memcg; 135 135 EXPORT_SYMBOL_GPL(hwpoison_filter_memcg); 136 136 static int hwpoison_filter_task(struct page *p) 137 137 { 138 - struct mem_cgroup *mem; 139 - struct cgroup_subsys_state *css; 140 - unsigned long ino; 141 - 142 138 if (!hwpoison_filter_memcg) 143 139 return 0; 144 140 145 - mem = try_get_mem_cgroup_from_page(p); 146 - if (!mem) 147 - return -EINVAL; 148 - 149 - css = &mem->css; 150 - ino = cgroup_ino(css->cgroup); 151 - css_put(css); 152 - 153 - if (ino != hwpoison_filter_memcg) 141 + if (page_cgroup_ino(p) != hwpoison_filter_memcg) 154 142 return -EINVAL; 155 143 156 144 return 0;
+2 -2
mm/memory.c
··· 3233 3233 static int create_huge_pmd(struct mm_struct *mm, struct vm_area_struct *vma, 3234 3234 unsigned long address, pmd_t *pmd, unsigned int flags) 3235 3235 { 3236 - if (!vma->vm_ops) 3236 + if (vma_is_anonymous(vma)) 3237 3237 return do_huge_pmd_anonymous_page(mm, vma, address, pmd, flags); 3238 3238 if (vma->vm_ops->pmd_fault) 3239 3239 return vma->vm_ops->pmd_fault(vma, address, pmd, flags); ··· 3244 3244 unsigned long address, pmd_t *pmd, pmd_t orig_pmd, 3245 3245 unsigned int flags) 3246 3246 { 3247 - if (!vma->vm_ops) 3247 + if (vma_is_anonymous(vma)) 3248 3248 return do_huge_pmd_wp_page(mm, vma, address, pmd, orig_pmd); 3249 3249 if (vma->vm_ops->pmd_fault) 3250 3250 return vma->vm_ops->pmd_fault(vma, address, pmd, flags);
+6
mm/migrate.c
··· 37 37 #include <linux/gfp.h> 38 38 #include <linux/balloon_compaction.h> 39 39 #include <linux/mmu_notifier.h> 40 + #include <linux/page_idle.h> 40 41 41 42 #include <asm/tlbflush.h> 42 43 ··· 524 523 else 525 524 __set_page_dirty_nobuffers(newpage); 526 525 } 526 + 527 + if (page_is_young(page)) 528 + set_page_young(newpage); 529 + if (page_is_idle(page)) 530 + set_page_idle(newpage); 527 531 528 532 /* 529 533 * Copy NUMA information to the new page, to prevent over-eager
+12 -6
mm/mmap.c
··· 612 612 void __vma_link_rb(struct mm_struct *mm, struct vm_area_struct *vma, 613 613 struct rb_node **rb_link, struct rb_node *rb_parent) 614 614 { 615 + WARN_ONCE(vma->vm_file && !vma->vm_ops, "missing vma->vm_ops"); 616 + 615 617 /* Update tracking information for the gap following the new vma. */ 616 618 if (vma->vm_next) 617 619 vma_gap_update(vma->vm_next); ··· 1262 1260 /* 1263 1261 * The caller must hold down_write(&current->mm->mmap_sem). 1264 1262 */ 1265 - 1266 - unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, 1263 + unsigned long do_mmap(struct file *file, unsigned long addr, 1267 1264 unsigned long len, unsigned long prot, 1268 - unsigned long flags, unsigned long pgoff, 1269 - unsigned long *populate) 1265 + unsigned long flags, vm_flags_t vm_flags, 1266 + unsigned long pgoff, unsigned long *populate) 1270 1267 { 1271 1268 struct mm_struct *mm = current->mm; 1272 - vm_flags_t vm_flags; 1273 1269 1274 1270 *populate = 0; 1275 1271 ··· 1311 1311 * to. we assume access permissions have been handled by the open 1312 1312 * of the memory object, so we don't do any here. 1313 1313 */ 1314 - vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | 1314 + vm_flags |= calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) | 1315 1315 mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; 1316 1316 1317 1317 if (flags & MAP_LOCKED) ··· 1637 1637 * be updated for vma_link() 1638 1638 */ 1639 1639 WARN_ON_ONCE(addr != vma->vm_start); 1640 + 1641 + /* All file mapping must have ->vm_ops set */ 1642 + if (!vma->vm_ops) { 1643 + static const struct vm_operations_struct dummy_ops = {}; 1644 + vma->vm_ops = &dummy_ops; 1645 + } 1640 1646 1641 1647 addr = vma->vm_start; 1642 1648 vm_flags = vma->vm_flags;
+17
mm/mmu_notifier.c
··· 123 123 return young; 124 124 } 125 125 126 + int __mmu_notifier_clear_young(struct mm_struct *mm, 127 + unsigned long start, 128 + unsigned long end) 129 + { 130 + struct mmu_notifier *mn; 131 + int young = 0, id; 132 + 133 + id = srcu_read_lock(&srcu); 134 + hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) { 135 + if (mn->ops->clear_young) 136 + young |= mn->ops->clear_young(mn, mm, start, end); 137 + } 138 + srcu_read_unlock(&srcu, id); 139 + 140 + return young; 141 + } 142 + 126 143 int __mmu_notifier_test_young(struct mm_struct *mm, 127 144 unsigned long address) 128 145 {
+10 -9
mm/nommu.c
··· 1233 1233 /* 1234 1234 * handle mapping creation for uClinux 1235 1235 */ 1236 - unsigned long do_mmap_pgoff(struct file *file, 1237 - unsigned long addr, 1238 - unsigned long len, 1239 - unsigned long prot, 1240 - unsigned long flags, 1241 - unsigned long pgoff, 1242 - unsigned long *populate) 1236 + unsigned long do_mmap(struct file *file, 1237 + unsigned long addr, 1238 + unsigned long len, 1239 + unsigned long prot, 1240 + unsigned long flags, 1241 + vm_flags_t vm_flags, 1242 + unsigned long pgoff, 1243 + unsigned long *populate) 1243 1244 { 1244 1245 struct vm_area_struct *vma; 1245 1246 struct vm_region *region; 1246 1247 struct rb_node *rb; 1247 - unsigned long capabilities, vm_flags, result; 1248 + unsigned long capabilities, result; 1248 1249 int ret; 1249 1250 1250 1251 *populate = 0; ··· 1263 1262 1264 1263 /* we've determined that we can make the mapping, now translate what we 1265 1264 * now know into VMA flags */ 1266 - vm_flags = determine_vm_flags(file, prot, flags, capabilities); 1265 + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); 1267 1266 1268 1267 /* we're going to need to record the mapping */ 1269 1268 region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL);
+4
mm/page_ext.c
··· 6 6 #include <linux/vmalloc.h> 7 7 #include <linux/kmemleak.h> 8 8 #include <linux/page_owner.h> 9 + #include <linux/page_idle.h> 9 10 10 11 /* 11 12 * struct page extension ··· 59 58 #endif 60 59 #ifdef CONFIG_PAGE_OWNER 61 60 &page_owner_ops, 61 + #endif 62 + #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) 63 + &page_idle_ops, 62 64 #endif 63 65 }; 64 66
+232
mm/page_idle.c
··· 1 + #include <linux/init.h> 2 + #include <linux/bootmem.h> 3 + #include <linux/fs.h> 4 + #include <linux/sysfs.h> 5 + #include <linux/kobject.h> 6 + #include <linux/mm.h> 7 + #include <linux/mmzone.h> 8 + #include <linux/pagemap.h> 9 + #include <linux/rmap.h> 10 + #include <linux/mmu_notifier.h> 11 + #include <linux/page_ext.h> 12 + #include <linux/page_idle.h> 13 + 14 + #define BITMAP_CHUNK_SIZE sizeof(u64) 15 + #define BITMAP_CHUNK_BITS (BITMAP_CHUNK_SIZE * BITS_PER_BYTE) 16 + 17 + /* 18 + * Idle page tracking only considers user memory pages, for other types of 19 + * pages the idle flag is always unset and an attempt to set it is silently 20 + * ignored. 21 + * 22 + * We treat a page as a user memory page if it is on an LRU list, because it is 23 + * always safe to pass such a page to rmap_walk(), which is essential for idle 24 + * page tracking. With such an indicator of user pages we can skip isolated 25 + * pages, but since there are not usually many of them, it will hardly affect 26 + * the overall result. 27 + * 28 + * This function tries to get a user memory page by pfn as described above. 
29 + */ 30 + static struct page *page_idle_get_page(unsigned long pfn) 31 + { 32 + struct page *page; 33 + struct zone *zone; 34 + 35 + if (!pfn_valid(pfn)) 36 + return NULL; 37 + 38 + page = pfn_to_page(pfn); 39 + if (!page || !PageLRU(page) || 40 + !get_page_unless_zero(page)) 41 + return NULL; 42 + 43 + zone = page_zone(page); 44 + spin_lock_irq(&zone->lru_lock); 45 + if (unlikely(!PageLRU(page))) { 46 + put_page(page); 47 + page = NULL; 48 + } 49 + spin_unlock_irq(&zone->lru_lock); 50 + return page; 51 + } 52 + 53 + static int page_idle_clear_pte_refs_one(struct page *page, 54 + struct vm_area_struct *vma, 55 + unsigned long addr, void *arg) 56 + { 57 + struct mm_struct *mm = vma->vm_mm; 58 + spinlock_t *ptl; 59 + pmd_t *pmd; 60 + pte_t *pte; 61 + bool referenced = false; 62 + 63 + if (unlikely(PageTransHuge(page))) { 64 + pmd = page_check_address_pmd(page, mm, addr, 65 + PAGE_CHECK_ADDRESS_PMD_FLAG, &ptl); 66 + if (pmd) { 67 + referenced = pmdp_clear_young_notify(vma, addr, pmd); 68 + spin_unlock(ptl); 69 + } 70 + } else { 71 + pte = page_check_address(page, mm, addr, &ptl, 0); 72 + if (pte) { 73 + referenced = ptep_clear_young_notify(vma, addr, pte); 74 + pte_unmap_unlock(pte, ptl); 75 + } 76 + } 77 + if (referenced) { 78 + clear_page_idle(page); 79 + /* 80 + * We cleared the referenced bit in a mapping to this page. To 81 + * avoid interference with page reclaim, mark it young so that 82 + * page_referenced() will return > 0. 83 + */ 84 + set_page_young(page); 85 + } 86 + return SWAP_AGAIN; 87 + } 88 + 89 + static void page_idle_clear_pte_refs(struct page *page) 90 + { 91 + /* 92 + * Since rwc.arg is unused, rwc is effectively immutable, so we 93 + * can make it static const to save some cycles and stack. 
94 + */ 95 + static const struct rmap_walk_control rwc = { 96 + .rmap_one = page_idle_clear_pte_refs_one, 97 + .anon_lock = page_lock_anon_vma_read, 98 + }; 99 + bool need_lock; 100 + 101 + if (!page_mapped(page) || 102 + !page_rmapping(page)) 103 + return; 104 + 105 + need_lock = !PageAnon(page) || PageKsm(page); 106 + if (need_lock && !trylock_page(page)) 107 + return; 108 + 109 + rmap_walk(page, (struct rmap_walk_control *)&rwc); 110 + 111 + if (need_lock) 112 + unlock_page(page); 113 + } 114 + 115 + static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj, 116 + struct bin_attribute *attr, char *buf, 117 + loff_t pos, size_t count) 118 + { 119 + u64 *out = (u64 *)buf; 120 + struct page *page; 121 + unsigned long pfn, end_pfn; 122 + int bit; 123 + 124 + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE) 125 + return -EINVAL; 126 + 127 + pfn = pos * BITS_PER_BYTE; 128 + if (pfn >= max_pfn) 129 + return 0; 130 + 131 + end_pfn = pfn + count * BITS_PER_BYTE; 132 + if (end_pfn > max_pfn) 133 + end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS); 134 + 135 + for (; pfn < end_pfn; pfn++) { 136 + bit = pfn % BITMAP_CHUNK_BITS; 137 + if (!bit) 138 + *out = 0ULL; 139 + page = page_idle_get_page(pfn); 140 + if (page) { 141 + if (page_is_idle(page)) { 142 + /* 143 + * The page might have been referenced via a 144 + * pte, in which case it is not idle. Clear 145 + * refs and recheck. 
146 + */ 147 + page_idle_clear_pte_refs(page); 148 + if (page_is_idle(page)) 149 + *out |= 1ULL << bit; 150 + } 151 + put_page(page); 152 + } 153 + if (bit == BITMAP_CHUNK_BITS - 1) 154 + out++; 155 + cond_resched(); 156 + } 157 + return (char *)out - buf; 158 + } 159 + 160 + static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj, 161 + struct bin_attribute *attr, char *buf, 162 + loff_t pos, size_t count) 163 + { 164 + const u64 *in = (u64 *)buf; 165 + struct page *page; 166 + unsigned long pfn, end_pfn; 167 + int bit; 168 + 169 + if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE) 170 + return -EINVAL; 171 + 172 + pfn = pos * BITS_PER_BYTE; 173 + if (pfn >= max_pfn) 174 + return -ENXIO; 175 + 176 + end_pfn = pfn + count * BITS_PER_BYTE; 177 + if (end_pfn > max_pfn) 178 + end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS); 179 + 180 + for (; pfn < end_pfn; pfn++) { 181 + bit = pfn % BITMAP_CHUNK_BITS; 182 + if ((*in >> bit) & 1) { 183 + page = page_idle_get_page(pfn); 184 + if (page) { 185 + page_idle_clear_pte_refs(page); 186 + set_page_idle(page); 187 + put_page(page); 188 + } 189 + } 190 + if (bit == BITMAP_CHUNK_BITS - 1) 191 + in++; 192 + cond_resched(); 193 + } 194 + return (char *)in - buf; 195 + } 196 + 197 + static struct bin_attribute page_idle_bitmap_attr = 198 + __BIN_ATTR(bitmap, S_IRUSR | S_IWUSR, 199 + page_idle_bitmap_read, page_idle_bitmap_write, 0); 200 + 201 + static struct bin_attribute *page_idle_bin_attrs[] = { 202 + &page_idle_bitmap_attr, 203 + NULL, 204 + }; 205 + 206 + static struct attribute_group page_idle_attr_group = { 207 + .bin_attrs = page_idle_bin_attrs, 208 + .name = "page_idle", 209 + }; 210 + 211 + #ifndef CONFIG_64BIT 212 + static bool need_page_idle(void) 213 + { 214 + return true; 215 + } 216 + struct page_ext_operations page_idle_ops = { 217 + .need = need_page_idle, 218 + }; 219 + #endif 220 + 221 + static int __init page_idle_init(void) 222 + { 223 + int err; 224 + 225 + err = 
sysfs_create_group(mm_kobj, &page_idle_attr_group); 226 + if (err) { 227 + pr_err("page_idle: register sysfs failed\n"); 228 + return err; 229 + } 230 + return 0; 231 + } 232 + subsys_initcall(page_idle_init);
+6
mm/rmap.c
··· 59 59 #include <linux/migrate.h> 60 60 #include <linux/hugetlb.h> 61 61 #include <linux/backing-dev.h> 62 + #include <linux/page_idle.h> 62 63 63 64 #include <asm/tlbflush.h> 64 65 ··· 886 885 } 887 886 pte_unmap_unlock(pte, ptl); 888 887 } 888 + 889 + if (referenced) 890 + clear_page_idle(page); 891 + if (test_and_clear_page_young(page)) 892 + referenced++; 889 893 890 894 if (referenced) { 891 895 pra->referenced++;
+3
mm/swap.c
··· 32 32 #include <linux/gfp.h> 33 33 #include <linux/uio.h> 34 34 #include <linux/hugetlb.h> 35 + #include <linux/page_idle.h> 35 36 36 37 #include "internal.h" 37 38 ··· 623 622 } else if (!PageReferenced(page)) { 624 623 SetPageReferenced(page); 625 624 } 625 + if (page_is_idle(page)) 626 + clear_page_idle(page); 626 627 } 627 628 EXPORT_SYMBOL(mark_page_accessed); 628 629
+33
mm/zpool.c
··· 100 100 } 101 101 102 102 /** 103 + * zpool_has_pool() - Check if the pool driver is available 104 + * @type The type of the zpool to check (e.g. zbud, zsmalloc) 105 + * 106 + * This checks if the @type pool driver is available. This will try to load 107 + * the requested module, if needed, but there is no guarantee the module will 108 + * still be loaded and available immediately after calling. If this returns 109 + * true, the caller should assume the pool is available, but must be prepared 110 + * to handle the @zpool_create_pool() returning failure. However if this 111 + * returns false, the caller should assume the requested pool type is not 112 + * available; either the requested pool type module does not exist, or could 113 + * not be loaded, and calling @zpool_create_pool() with the pool type will 114 + * fail. 115 + * 116 + * Returns: true if @type pool is available, false if not 117 + */ 118 + bool zpool_has_pool(char *type) 119 + { 120 + struct zpool_driver *driver = zpool_get_driver(type); 121 + 122 + if (!driver) { 123 + request_module("zpool-%s", type); 124 + driver = zpool_get_driver(type); 125 + } 126 + 127 + if (!driver) 128 + return false; 129 + 130 + zpool_put_driver(driver); 131 + return true; 132 + } 133 + EXPORT_SYMBOL(zpool_has_pool); 134 + 135 + /** 103 136 * zpool_create_pool() - Create a new zpool 104 137 * @type The type of the zpool to create (e.g. zbud, zsmalloc) 105 138 * @name The name of the zpool (e.g. zram0, zswap)
+541 -163
mm/zswap.c
··· 80 80 static bool zswap_enabled; 81 81 module_param_named(enabled, zswap_enabled, bool, 0644); 82 82 83 - /* Compressor to be used by zswap (fixed at boot for now) */ 83 + /* Crypto compressor to use */ 84 84 #define ZSWAP_COMPRESSOR_DEFAULT "lzo" 85 - static char *zswap_compressor = ZSWAP_COMPRESSOR_DEFAULT; 86 - module_param_named(compressor, zswap_compressor, charp, 0444); 85 + static char zswap_compressor[CRYPTO_MAX_ALG_NAME] = ZSWAP_COMPRESSOR_DEFAULT; 86 + static struct kparam_string zswap_compressor_kparam = { 87 + .string = zswap_compressor, 88 + .maxlen = sizeof(zswap_compressor), 89 + }; 90 + static int zswap_compressor_param_set(const char *, 91 + const struct kernel_param *); 92 + static struct kernel_param_ops zswap_compressor_param_ops = { 93 + .set = zswap_compressor_param_set, 94 + .get = param_get_string, 95 + }; 96 + module_param_cb(compressor, &zswap_compressor_param_ops, 97 + &zswap_compressor_kparam, 0644); 98 + 99 + /* Compressed storage zpool to use */ 100 + #define ZSWAP_ZPOOL_DEFAULT "zbud" 101 + static char zswap_zpool_type[32 /* arbitrary */] = ZSWAP_ZPOOL_DEFAULT; 102 + static struct kparam_string zswap_zpool_kparam = { 103 + .string = zswap_zpool_type, 104 + .maxlen = sizeof(zswap_zpool_type), 105 + }; 106 + static int zswap_zpool_param_set(const char *, const struct kernel_param *); 107 + static struct kernel_param_ops zswap_zpool_param_ops = { 108 + .set = zswap_zpool_param_set, 109 + .get = param_get_string, 110 + }; 111 + module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_kparam, 0644); 87 112 88 113 /* The maximum percentage of memory that the compressed pool can occupy */ 89 114 static unsigned int zswap_max_pool_percent = 20; 90 - module_param_named(max_pool_percent, 91 - zswap_max_pool_percent, uint, 0644); 92 - 93 - /* Compressed storage to use */ 94 - #define ZSWAP_ZPOOL_DEFAULT "zbud" 95 - static char *zswap_zpool_type = ZSWAP_ZPOOL_DEFAULT; 96 - module_param_named(zpool, zswap_zpool_type, charp, 0444); 97 - 98 - 
/* zpool is shared by all of zswap backend */ 99 - static struct zpool *zswap_pool; 100 - 101 - /********************************* 102 - * compression functions 103 - **********************************/ 104 - /* per-cpu compression transforms */ 105 - static struct crypto_comp * __percpu *zswap_comp_pcpu_tfms; 106 - 107 - enum comp_op { 108 - ZSWAP_COMPOP_COMPRESS, 109 - ZSWAP_COMPOP_DECOMPRESS 110 - }; 111 - 112 - static int zswap_comp_op(enum comp_op op, const u8 *src, unsigned int slen, 113 - u8 *dst, unsigned int *dlen) 114 - { 115 - struct crypto_comp *tfm; 116 - int ret; 117 - 118 - tfm = *per_cpu_ptr(zswap_comp_pcpu_tfms, get_cpu()); 119 - switch (op) { 120 - case ZSWAP_COMPOP_COMPRESS: 121 - ret = crypto_comp_compress(tfm, src, slen, dst, dlen); 122 - break; 123 - case ZSWAP_COMPOP_DECOMPRESS: 124 - ret = crypto_comp_decompress(tfm, src, slen, dst, dlen); 125 - break; 126 - default: 127 - ret = -EINVAL; 128 - } 129 - 130 - put_cpu(); 131 - return ret; 132 - } 133 - 134 - static int __init zswap_comp_init(void) 135 - { 136 - if (!crypto_has_comp(zswap_compressor, 0, 0)) { 137 - pr_info("%s compressor not available\n", zswap_compressor); 138 - /* fall back to default compressor */ 139 - zswap_compressor = ZSWAP_COMPRESSOR_DEFAULT; 140 - if (!crypto_has_comp(zswap_compressor, 0, 0)) 141 - /* can't even load the default compressor */ 142 - return -ENODEV; 143 - } 144 - pr_info("using %s compressor\n", zswap_compressor); 145 - 146 - /* alloc percpu transforms */ 147 - zswap_comp_pcpu_tfms = alloc_percpu(struct crypto_comp *); 148 - if (!zswap_comp_pcpu_tfms) 149 - return -ENOMEM; 150 - return 0; 151 - } 152 - 153 - static void __init zswap_comp_exit(void) 154 - { 155 - /* free percpu transforms */ 156 - free_percpu(zswap_comp_pcpu_tfms); 157 - } 115 + module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644); 158 116 159 117 /********************************* 160 118 * data structures 161 119 **********************************/ 120 + 121 + struct 
zswap_pool { 122 + struct zpool *zpool; 123 + struct crypto_comp * __percpu *tfm; 124 + struct kref kref; 125 + struct list_head list; 126 + struct rcu_head rcu_head; 127 + struct notifier_block notifier; 128 + char tfm_name[CRYPTO_MAX_ALG_NAME]; 129 + }; 130 + 162 131 /* 163 132 * struct zswap_entry 164 133 * ··· 135 166 * page within zswap. 136 167 * 137 168 * rbnode - links the entry into red-black tree for the appropriate swap type 169 + * offset - the swap offset for the entry. Index into the red-black tree. 138 170 * refcount - the number of outstanding reference to the entry. This is needed 139 171 * to protect against premature freeing of the entry by code 140 172 * concurrent calls to load, invalidate, and writeback. The lock 141 173 * for the zswap_tree structure that contains the entry must 142 174 * be held while changing the refcount. Since the lock must 143 175 * be held, there is no reason to also make refcount atomic. 144 - * offset - the swap offset for the entry. Index into the red-black tree. 145 - * handle - zpool allocation handle that stores the compressed page data 146 176 * length - the length in bytes of the compressed page data. 
Needed during 147 177 * decompression 178 + * pool - the zswap_pool the entry's data is in 179 + * handle - zpool allocation handle that stores the compressed page data 148 180 */ 149 181 struct zswap_entry { 150 182 struct rb_node rbnode; 151 183 pgoff_t offset; 152 184 int refcount; 153 185 unsigned int length; 186 + struct zswap_pool *pool; 154 187 unsigned long handle; 155 188 }; 156 189 ··· 171 200 }; 172 201 173 202 static struct zswap_tree *zswap_trees[MAX_SWAPFILES]; 203 + 204 + /* RCU-protected iteration */ 205 + static LIST_HEAD(zswap_pools); 206 + /* protects zswap_pools list modification */ 207 + static DEFINE_SPINLOCK(zswap_pools_lock); 208 + 209 + /* used by param callback function */ 210 + static bool zswap_init_started; 211 + 212 + /********************************* 213 + * helpers and fwd declarations 214 + **********************************/ 215 + 216 + #define zswap_pool_debug(msg, p) \ 217 + pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ 218 + zpool_get_type((p)->zpool)) 219 + 220 + static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); 221 + static int zswap_pool_get(struct zswap_pool *pool); 222 + static void zswap_pool_put(struct zswap_pool *pool); 223 + 224 + static const struct zpool_ops zswap_zpool_ops = { 225 + .evict = zswap_writeback_entry 226 + }; 227 + 228 + static bool zswap_is_full(void) 229 + { 230 + return totalram_pages * zswap_max_pool_percent / 100 < 231 + DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE); 232 + } 233 + 234 + static void zswap_update_total_size(void) 235 + { 236 + struct zswap_pool *pool; 237 + u64 total = 0; 238 + 239 + rcu_read_lock(); 240 + 241 + list_for_each_entry_rcu(pool, &zswap_pools, list) 242 + total += zpool_get_total_size(pool->zpool); 243 + 244 + rcu_read_unlock(); 245 + 246 + zswap_pool_total_size = total; 247 + } 174 248 175 249 /********************************* 176 250 * zswap entry functions ··· 310 294 */ 311 295 static void zswap_free_entry(struct zswap_entry *entry) 312 
296 { 313 - zpool_free(zswap_pool, entry->handle); 297 + zpool_free(entry->pool->zpool, entry->handle); 298 + zswap_pool_put(entry->pool); 314 299 zswap_entry_cache_free(entry); 315 300 atomic_dec(&zswap_stored_pages); 316 - zswap_pool_total_size = zpool_get_total_size(zswap_pool); 301 + zswap_update_total_size(); 317 302 } 318 303 319 304 /* caller must hold the tree lock */ ··· 356 339 **********************************/ 357 340 static DEFINE_PER_CPU(u8 *, zswap_dstmem); 358 341 359 - static int __zswap_cpu_notifier(unsigned long action, unsigned long cpu) 342 + static int __zswap_cpu_dstmem_notifier(unsigned long action, unsigned long cpu) 360 343 { 361 - struct crypto_comp *tfm; 362 344 u8 *dst; 363 345 364 346 switch (action) { 365 347 case CPU_UP_PREPARE: 366 - tfm = crypto_alloc_comp(zswap_compressor, 0, 0); 367 - if (IS_ERR(tfm)) { 368 - pr_err("can't allocate compressor transform\n"); 369 - return NOTIFY_BAD; 370 - } 371 - *per_cpu_ptr(zswap_comp_pcpu_tfms, cpu) = tfm; 372 348 dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu)); 373 349 if (!dst) { 374 350 pr_err("can't allocate compressor buffer\n"); 375 - crypto_free_comp(tfm); 376 - *per_cpu_ptr(zswap_comp_pcpu_tfms, cpu) = NULL; 377 351 return NOTIFY_BAD; 378 352 } 379 353 per_cpu(zswap_dstmem, cpu) = dst; 380 354 break; 381 355 case CPU_DEAD: 382 356 case CPU_UP_CANCELED: 383 - tfm = *per_cpu_ptr(zswap_comp_pcpu_tfms, cpu); 384 - if (tfm) { 385 - crypto_free_comp(tfm); 386 - *per_cpu_ptr(zswap_comp_pcpu_tfms, cpu) = NULL; 387 - } 388 357 dst = per_cpu(zswap_dstmem, cpu); 389 358 kfree(dst); 390 359 per_cpu(zswap_dstmem, cpu) = NULL; ··· 381 378 return NOTIFY_OK; 382 379 } 383 380 384 - static int zswap_cpu_notifier(struct notifier_block *nb, 385 - unsigned long action, void *pcpu) 381 + static int zswap_cpu_dstmem_notifier(struct notifier_block *nb, 382 + unsigned long action, void *pcpu) 386 383 { 387 - unsigned long cpu = (unsigned long)pcpu; 388 - return __zswap_cpu_notifier(action, 
cpu); 384 + return __zswap_cpu_dstmem_notifier(action, (unsigned long)pcpu); 389 385 } 390 386 391 - static struct notifier_block zswap_cpu_notifier_block = { 392 - .notifier_call = zswap_cpu_notifier 387 + static struct notifier_block zswap_dstmem_notifier = { 388 + .notifier_call = zswap_cpu_dstmem_notifier, 393 389 }; 394 390 395 - static int __init zswap_cpu_init(void) 391 + static int __init zswap_cpu_dstmem_init(void) 396 392 { 397 393 unsigned long cpu; 398 394 399 395 cpu_notifier_register_begin(); 400 396 for_each_online_cpu(cpu) 401 - if (__zswap_cpu_notifier(CPU_UP_PREPARE, cpu) != NOTIFY_OK) 397 + if (__zswap_cpu_dstmem_notifier(CPU_UP_PREPARE, cpu) == 398 + NOTIFY_BAD) 402 399 goto cleanup; 403 - __register_cpu_notifier(&zswap_cpu_notifier_block); 400 + __register_cpu_notifier(&zswap_dstmem_notifier); 404 401 cpu_notifier_register_done(); 405 402 return 0; 406 403 407 404 cleanup: 408 405 for_each_online_cpu(cpu) 409 - __zswap_cpu_notifier(CPU_UP_CANCELED, cpu); 406 + __zswap_cpu_dstmem_notifier(CPU_UP_CANCELED, cpu); 410 407 cpu_notifier_register_done(); 411 408 return -ENOMEM; 412 409 } 413 410 414 - /********************************* 415 - * helpers 416 - **********************************/ 417 - static bool zswap_is_full(void) 411 + static void zswap_cpu_dstmem_destroy(void) 418 412 { 419 - return totalram_pages * zswap_max_pool_percent / 100 < 420 - DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE); 413 + unsigned long cpu; 414 + 415 + cpu_notifier_register_begin(); 416 + for_each_online_cpu(cpu) 417 + __zswap_cpu_dstmem_notifier(CPU_UP_CANCELED, cpu); 418 + __unregister_cpu_notifier(&zswap_dstmem_notifier); 419 + cpu_notifier_register_done(); 420 + } 421 + 422 + static int __zswap_cpu_comp_notifier(struct zswap_pool *pool, 423 + unsigned long action, unsigned long cpu) 424 + { 425 + struct crypto_comp *tfm; 426 + 427 + switch (action) { 428 + case CPU_UP_PREPARE: 429 + if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu))) 430 + break; 431 + tfm = 
crypto_alloc_comp(pool->tfm_name, 0, 0); 432 + if (IS_ERR_OR_NULL(tfm)) { 433 + pr_err("could not alloc crypto comp %s : %ld\n", 434 + pool->tfm_name, PTR_ERR(tfm)); 435 + return NOTIFY_BAD; 436 + } 437 + *per_cpu_ptr(pool->tfm, cpu) = tfm; 438 + break; 439 + case CPU_DEAD: 440 + case CPU_UP_CANCELED: 441 + tfm = *per_cpu_ptr(pool->tfm, cpu); 442 + if (!IS_ERR_OR_NULL(tfm)) 443 + crypto_free_comp(tfm); 444 + *per_cpu_ptr(pool->tfm, cpu) = NULL; 445 + break; 446 + default: 447 + break; 448 + } 449 + return NOTIFY_OK; 450 + } 451 + 452 + static int zswap_cpu_comp_notifier(struct notifier_block *nb, 453 + unsigned long action, void *pcpu) 454 + { 455 + unsigned long cpu = (unsigned long)pcpu; 456 + struct zswap_pool *pool = container_of(nb, typeof(*pool), notifier); 457 + 458 + return __zswap_cpu_comp_notifier(pool, action, cpu); 459 + } 460 + 461 + static int zswap_cpu_comp_init(struct zswap_pool *pool) 462 + { 463 + unsigned long cpu; 464 + 465 + memset(&pool->notifier, 0, sizeof(pool->notifier)); 466 + pool->notifier.notifier_call = zswap_cpu_comp_notifier; 467 + 468 + cpu_notifier_register_begin(); 469 + for_each_online_cpu(cpu) 470 + if (__zswap_cpu_comp_notifier(pool, CPU_UP_PREPARE, cpu) == 471 + NOTIFY_BAD) 472 + goto cleanup; 473 + __register_cpu_notifier(&pool->notifier); 474 + cpu_notifier_register_done(); 475 + return 0; 476 + 477 + cleanup: 478 + for_each_online_cpu(cpu) 479 + __zswap_cpu_comp_notifier(pool, CPU_UP_CANCELED, cpu); 480 + cpu_notifier_register_done(); 481 + return -ENOMEM; 482 + } 483 + 484 + static void zswap_cpu_comp_destroy(struct zswap_pool *pool) 485 + { 486 + unsigned long cpu; 487 + 488 + cpu_notifier_register_begin(); 489 + for_each_online_cpu(cpu) 490 + __zswap_cpu_comp_notifier(pool, CPU_UP_CANCELED, cpu); 491 + __unregister_cpu_notifier(&pool->notifier); 492 + cpu_notifier_register_done(); 493 + } 494 + 495 + /********************************* 496 + * pool functions 497 + **********************************/ 498 + 499 + static 
struct zswap_pool *__zswap_pool_current(void) 500 + { 501 + struct zswap_pool *pool; 502 + 503 + pool = list_first_or_null_rcu(&zswap_pools, typeof(*pool), list); 504 + WARN_ON(!pool); 505 + 506 + return pool; 507 + } 508 + 509 + static struct zswap_pool *zswap_pool_current(void) 510 + { 511 + assert_spin_locked(&zswap_pools_lock); 512 + 513 + return __zswap_pool_current(); 514 + } 515 + 516 + static struct zswap_pool *zswap_pool_current_get(void) 517 + { 518 + struct zswap_pool *pool; 519 + 520 + rcu_read_lock(); 521 + 522 + pool = __zswap_pool_current(); 523 + if (!pool || !zswap_pool_get(pool)) 524 + pool = NULL; 525 + 526 + rcu_read_unlock(); 527 + 528 + return pool; 529 + } 530 + 531 + static struct zswap_pool *zswap_pool_last_get(void) 532 + { 533 + struct zswap_pool *pool, *last = NULL; 534 + 535 + rcu_read_lock(); 536 + 537 + list_for_each_entry_rcu(pool, &zswap_pools, list) 538 + last = pool; 539 + if (!WARN_ON(!last) && !zswap_pool_get(last)) 540 + last = NULL; 541 + 542 + rcu_read_unlock(); 543 + 544 + return last; 545 + } 546 + 547 + static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) 548 + { 549 + struct zswap_pool *pool; 550 + 551 + assert_spin_locked(&zswap_pools_lock); 552 + 553 + list_for_each_entry_rcu(pool, &zswap_pools, list) { 554 + if (strncmp(pool->tfm_name, compressor, sizeof(pool->tfm_name))) 555 + continue; 556 + if (strncmp(zpool_get_type(pool->zpool), type, 557 + sizeof(zswap_zpool_type))) 558 + continue; 559 + /* if we can't get it, it's about to be destroyed */ 560 + if (!zswap_pool_get(pool)) 561 + continue; 562 + return pool; 563 + } 564 + 565 + return NULL; 566 + } 567 + 568 + static struct zswap_pool *zswap_pool_create(char *type, char *compressor) 569 + { 570 + struct zswap_pool *pool; 571 + gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN; 572 + 573 + pool = kzalloc(sizeof(*pool), GFP_KERNEL); 574 + if (!pool) { 575 + pr_err("pool alloc failed\n"); 576 + return NULL; 577 + } 578 + 579 + pool->zpool = 
zpool_create_pool(type, "zswap", gfp, &zswap_zpool_ops); 580 + if (!pool->zpool) { 581 + pr_err("%s zpool not available\n", type); 582 + goto error; 583 + } 584 + pr_debug("using %s zpool\n", zpool_get_type(pool->zpool)); 585 + 586 + strlcpy(pool->tfm_name, compressor, sizeof(pool->tfm_name)); 587 + pool->tfm = alloc_percpu(struct crypto_comp *); 588 + if (!pool->tfm) { 589 + pr_err("percpu alloc failed\n"); 590 + goto error; 591 + } 592 + 593 + if (zswap_cpu_comp_init(pool)) 594 + goto error; 595 + pr_debug("using %s compressor\n", pool->tfm_name); 596 + 597 + /* being the current pool takes 1 ref; this func expects the 598 + * caller to always add the new pool as the current pool 599 + */ 600 + kref_init(&pool->kref); 601 + INIT_LIST_HEAD(&pool->list); 602 + 603 + zswap_pool_debug("created", pool); 604 + 605 + return pool; 606 + 607 + error: 608 + free_percpu(pool->tfm); 609 + if (pool->zpool) 610 + zpool_destroy_pool(pool->zpool); 611 + kfree(pool); 612 + return NULL; 613 + } 614 + 615 + static struct zswap_pool *__zswap_pool_create_fallback(void) 616 + { 617 + if (!crypto_has_comp(zswap_compressor, 0, 0)) { 618 + pr_err("compressor %s not available, using default %s\n", 619 + zswap_compressor, ZSWAP_COMPRESSOR_DEFAULT); 620 + strncpy(zswap_compressor, ZSWAP_COMPRESSOR_DEFAULT, 621 + sizeof(zswap_compressor)); 622 + } 623 + if (!zpool_has_pool(zswap_zpool_type)) { 624 + pr_err("zpool %s not available, using default %s\n", 625 + zswap_zpool_type, ZSWAP_ZPOOL_DEFAULT); 626 + strncpy(zswap_zpool_type, ZSWAP_ZPOOL_DEFAULT, 627 + sizeof(zswap_zpool_type)); 628 + } 629 + 630 + return zswap_pool_create(zswap_zpool_type, zswap_compressor); 631 + } 632 + 633 + static void zswap_pool_destroy(struct zswap_pool *pool) 634 + { 635 + zswap_pool_debug("destroying", pool); 636 + 637 + zswap_cpu_comp_destroy(pool); 638 + free_percpu(pool->tfm); 639 + zpool_destroy_pool(pool->zpool); 640 + kfree(pool); 641 + } 642 + 643 + static int __must_check zswap_pool_get(struct zswap_pool 
*pool) 644 + { 645 + return kref_get_unless_zero(&pool->kref); 646 + } 647 + 648 + static void __zswap_pool_release(struct rcu_head *head) 649 + { 650 + struct zswap_pool *pool = container_of(head, typeof(*pool), rcu_head); 651 + 652 + /* nobody should have been able to get a kref... */ 653 + WARN_ON(kref_get_unless_zero(&pool->kref)); 654 + 655 + /* pool is now off zswap_pools list and has no references. */ 656 + zswap_pool_destroy(pool); 657 + } 658 + 659 + static void __zswap_pool_empty(struct kref *kref) 660 + { 661 + struct zswap_pool *pool; 662 + 663 + pool = container_of(kref, typeof(*pool), kref); 664 + 665 + spin_lock(&zswap_pools_lock); 666 + 667 + WARN_ON(pool == zswap_pool_current()); 668 + 669 + list_del_rcu(&pool->list); 670 + call_rcu(&pool->rcu_head, __zswap_pool_release); 671 + 672 + spin_unlock(&zswap_pools_lock); 673 + } 674 + 675 + static void zswap_pool_put(struct zswap_pool *pool) 676 + { 677 + kref_put(&pool->kref, __zswap_pool_empty); 678 + } 679 + 680 + /********************************* 681 + * param callbacks 682 + **********************************/ 683 + 684 + static int __zswap_param_set(const char *val, const struct kernel_param *kp, 685 + char *type, char *compressor) 686 + { 687 + struct zswap_pool *pool, *put_pool = NULL; 688 + char str[kp->str->maxlen], *s; 689 + int ret; 690 + 691 + /* 692 + * kp is either zswap_zpool_kparam or zswap_compressor_kparam, defined 693 + * at the top of this file, so maxlen is CRYPTO_MAX_ALG_NAME (64) or 694 + * 32 (arbitrary). 695 + */ 696 + strlcpy(str, val, kp->str->maxlen); 697 + s = strim(str); 698 + 699 + /* if this is load-time (pre-init) param setting, 700 + * don't create a pool; that's done during init. 
701 + */ 702 + if (!zswap_init_started) 703 + return param_set_copystring(s, kp); 704 + 705 + /* no change required */ 706 + if (!strncmp(kp->str->string, s, kp->str->maxlen)) 707 + return 0; 708 + 709 + if (!type) { 710 + type = s; 711 + if (!zpool_has_pool(type)) { 712 + pr_err("zpool %s not available\n", type); 713 + return -ENOENT; 714 + } 715 + } else if (!compressor) { 716 + compressor = s; 717 + if (!crypto_has_comp(compressor, 0, 0)) { 718 + pr_err("compressor %s not available\n", compressor); 719 + return -ENOENT; 720 + } 721 + } 722 + 723 + spin_lock(&zswap_pools_lock); 724 + 725 + pool = zswap_pool_find_get(type, compressor); 726 + if (pool) { 727 + zswap_pool_debug("using existing", pool); 728 + list_del_rcu(&pool->list); 729 + } else { 730 + spin_unlock(&zswap_pools_lock); 731 + pool = zswap_pool_create(type, compressor); 732 + spin_lock(&zswap_pools_lock); 733 + } 734 + 735 + if (pool) 736 + ret = param_set_copystring(s, kp); 737 + else 738 + ret = -EINVAL; 739 + 740 + if (!ret) { 741 + put_pool = zswap_pool_current(); 742 + list_add_rcu(&pool->list, &zswap_pools); 743 + } else if (pool) { 744 + /* add the possibly pre-existing pool to the end of the pools 745 + * list; if it's new (and empty) then it'll be removed and 746 + * destroyed by the put after we drop the lock 747 + */ 748 + list_add_tail_rcu(&pool->list, &zswap_pools); 749 + put_pool = pool; 750 + } 751 + 752 + spin_unlock(&zswap_pools_lock); 753 + 754 + /* drop the ref from either the old current pool, 755 + * or the new pool we failed to add 756 + */ 757 + if (put_pool) 758 + zswap_pool_put(put_pool); 759 + 760 + return ret; 761 + } 762 + 763 + static int zswap_compressor_param_set(const char *val, 764 + const struct kernel_param *kp) 765 + { 766 + return __zswap_param_set(val, kp, zswap_zpool_type, NULL); 767 + } 768 + 769 + static int zswap_zpool_param_set(const char *val, 770 + const struct kernel_param *kp) 771 + { 772 + return __zswap_param_set(val, kp, NULL, zswap_compressor); 421 
773 } 422 774 423 775 /********************************* ··· 835 477 pgoff_t offset; 836 478 struct zswap_entry *entry; 837 479 struct page *page; 480 + struct crypto_comp *tfm; 838 481 u8 *src, *dst; 839 482 unsigned int dlen; 840 483 int ret; ··· 876 517 case ZSWAP_SWAPCACHE_NEW: /* page is locked */ 877 518 /* decompress */ 878 519 dlen = PAGE_SIZE; 879 - src = (u8 *)zpool_map_handle(zswap_pool, entry->handle, 520 + src = (u8 *)zpool_map_handle(entry->pool->zpool, entry->handle, 880 521 ZPOOL_MM_RO) + sizeof(struct zswap_header); 881 522 dst = kmap_atomic(page); 882 - ret = zswap_comp_op(ZSWAP_COMPOP_DECOMPRESS, src, 883 - entry->length, dst, &dlen); 523 + tfm = *get_cpu_ptr(entry->pool->tfm); 524 + ret = crypto_comp_decompress(tfm, src, entry->length, 525 + dst, &dlen); 526 + put_cpu_ptr(entry->pool->tfm); 884 527 kunmap_atomic(dst); 885 - zpool_unmap_handle(zswap_pool, entry->handle); 528 + zpool_unmap_handle(entry->pool->zpool, entry->handle); 886 529 BUG_ON(ret); 887 530 BUG_ON(dlen != PAGE_SIZE); 888 531 ··· 933 572 return ret; 934 573 } 935 574 575 + static int zswap_shrink(void) 576 + { 577 + struct zswap_pool *pool; 578 + int ret; 579 + 580 + pool = zswap_pool_last_get(); 581 + if (!pool) 582 + return -ENOENT; 583 + 584 + ret = zpool_shrink(pool->zpool, 1, NULL); 585 + 586 + zswap_pool_put(pool); 587 + 588 + return ret; 589 + } 590 + 936 591 /********************************* 937 592 * frontswap hooks 938 593 **********************************/ ··· 958 581 { 959 582 struct zswap_tree *tree = zswap_trees[type]; 960 583 struct zswap_entry *entry, *dupentry; 584 + struct crypto_comp *tfm; 961 585 int ret; 962 586 unsigned int dlen = PAGE_SIZE, len; 963 587 unsigned long handle; ··· 974 596 /* reclaim space if needed */ 975 597 if (zswap_is_full()) { 976 598 zswap_pool_limit_hit++; 977 - if (zpool_shrink(zswap_pool, 1, NULL)) { 599 + if (zswap_shrink()) { 978 600 zswap_reject_reclaim_fail++; 979 601 ret = -ENOMEM; 980 602 goto reject; ··· 989 611 goto 
reject; 990 612 } 991 613 992 - /* compress */ 993 - dst = get_cpu_var(zswap_dstmem); 994 - src = kmap_atomic(page); 995 - ret = zswap_comp_op(ZSWAP_COMPOP_COMPRESS, src, PAGE_SIZE, dst, &dlen); 996 - kunmap_atomic(src); 997 - if (ret) { 614 + /* if entry is successfully added, it keeps the reference */ 615 + entry->pool = zswap_pool_current_get(); 616 + if (!entry->pool) { 998 617 ret = -EINVAL; 999 618 goto freepage; 1000 619 } 1001 620 621 + /* compress */ 622 + dst = get_cpu_var(zswap_dstmem); 623 + tfm = *get_cpu_ptr(entry->pool->tfm); 624 + src = kmap_atomic(page); 625 + ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen); 626 + kunmap_atomic(src); 627 + put_cpu_ptr(entry->pool->tfm); 628 + if (ret) { 629 + ret = -EINVAL; 630 + goto put_dstmem; 631 + } 632 + 1002 633 /* store */ 1003 634 len = dlen + sizeof(struct zswap_header); 1004 - ret = zpool_malloc(zswap_pool, len, __GFP_NORETRY | __GFP_NOWARN, 1005 - &handle); 635 + ret = zpool_malloc(entry->pool->zpool, len, 636 + __GFP_NORETRY | __GFP_NOWARN, &handle); 1006 637 if (ret == -ENOSPC) { 1007 638 zswap_reject_compress_poor++; 1008 - goto freepage; 639 + goto put_dstmem; 1009 640 } 1010 641 if (ret) { 1011 642 zswap_reject_alloc_fail++; 1012 - goto freepage; 643 + goto put_dstmem; 1013 644 } 1014 - zhdr = zpool_map_handle(zswap_pool, handle, ZPOOL_MM_RW); 645 + zhdr = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_RW); 1015 646 zhdr->swpentry = swp_entry(type, offset); 1016 647 buf = (u8 *)(zhdr + 1); 1017 648 memcpy(buf, dst, dlen); 1018 - zpool_unmap_handle(zswap_pool, handle); 649 + zpool_unmap_handle(entry->pool->zpool, handle); 1019 650 put_cpu_var(zswap_dstmem); 1020 651 1021 652 /* populate entry */ ··· 1047 660 1048 661 /* update stats */ 1049 662 atomic_inc(&zswap_stored_pages); 1050 - zswap_pool_total_size = zpool_get_total_size(zswap_pool); 663 + zswap_update_total_size(); 1051 664 1052 665 return 0; 1053 666 1054 - freepage: 667 + put_dstmem: 1055 668 put_cpu_var(zswap_dstmem); 
669 + zswap_pool_put(entry->pool); 670 + freepage: 1056 671 zswap_entry_cache_free(entry); 1057 672 reject: 1058 673 return ret; ··· 1069 680 { 1070 681 struct zswap_tree *tree = zswap_trees[type]; 1071 682 struct zswap_entry *entry; 683 + struct crypto_comp *tfm; 1072 684 u8 *src, *dst; 1073 685 unsigned int dlen; 1074 686 int ret; ··· 1086 696 1087 697 /* decompress */ 1088 698 dlen = PAGE_SIZE; 1089 - src = (u8 *)zpool_map_handle(zswap_pool, entry->handle, 699 + src = (u8 *)zpool_map_handle(entry->pool->zpool, entry->handle, 1090 700 ZPOOL_MM_RO) + sizeof(struct zswap_header); 1091 701 dst = kmap_atomic(page); 1092 - ret = zswap_comp_op(ZSWAP_COMPOP_DECOMPRESS, src, entry->length, 1093 - dst, &dlen); 702 + tfm = *get_cpu_ptr(entry->pool->tfm); 703 + ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen); 704 + put_cpu_ptr(entry->pool->tfm); 1094 705 kunmap_atomic(dst); 1095 - zpool_unmap_handle(zswap_pool, entry->handle); 706 + zpool_unmap_handle(entry->pool->zpool, entry->handle); 1096 707 BUG_ON(ret); 1097 708 1098 709 spin_lock(&tree->lock); ··· 1145 754 kfree(tree); 1146 755 zswap_trees[type] = NULL; 1147 756 } 1148 - 1149 - static const struct zpool_ops zswap_zpool_ops = { 1150 - .evict = zswap_writeback_entry 1151 - }; 1152 757 1153 758 static void zswap_frontswap_init(unsigned type) 1154 759 { ··· 1226 839 **********************************/ 1227 840 static int __init init_zswap(void) 1228 841 { 1229 - gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN; 842 + struct zswap_pool *pool; 1230 843 1231 - pr_info("loading zswap\n"); 1232 - 1233 - zswap_pool = zpool_create_pool(zswap_zpool_type, "zswap", gfp, 1234 - &zswap_zpool_ops); 1235 - if (!zswap_pool && strcmp(zswap_zpool_type, ZSWAP_ZPOOL_DEFAULT)) { 1236 - pr_info("%s zpool not available\n", zswap_zpool_type); 1237 - zswap_zpool_type = ZSWAP_ZPOOL_DEFAULT; 1238 - zswap_pool = zpool_create_pool(zswap_zpool_type, "zswap", gfp, 1239 - &zswap_zpool_ops); 1240 - } 1241 - if (!zswap_pool) { 1242 - pr_err("%s 
zpool not available\n", zswap_zpool_type); 1243 - pr_err("zpool creation failed\n"); 1244 - goto error; 1245 - } 1246 - pr_info("using %s pool\n", zswap_zpool_type); 844 + zswap_init_started = true; 1247 845 1248 846 if (zswap_entry_cache_create()) { 1249 847 pr_err("entry cache creation failed\n"); 1250 - goto cachefail; 848 + goto cache_fail; 1251 849 } 1252 - if (zswap_comp_init()) { 1253 - pr_err("compressor initialization failed\n"); 1254 - goto compfail; 850 + 851 + if (zswap_cpu_dstmem_init()) { 852 + pr_err("dstmem alloc failed\n"); 853 + goto dstmem_fail; 1255 854 } 1256 - if (zswap_cpu_init()) { 1257 - pr_err("per-cpu initialization failed\n"); 1258 - goto pcpufail; 855 + 856 + pool = __zswap_pool_create_fallback(); 857 + if (!pool) { 858 + pr_err("pool creation failed\n"); 859 + goto pool_fail; 1259 860 } 861 + pr_info("loaded using pool %s/%s\n", pool->tfm_name, 862 + zpool_get_type(pool->zpool)); 863 + 864 + list_add(&pool->list, &zswap_pools); 1260 865 1261 866 frontswap_register_ops(&zswap_frontswap_ops); 1262 867 if (zswap_debugfs_init()) 1263 868 pr_warn("debugfs initialization failed\n"); 1264 869 return 0; 1265 - pcpufail: 1266 - zswap_comp_exit(); 1267 - compfail: 870 + 871 + pool_fail: 872 + zswap_cpu_dstmem_destroy(); 873 + dstmem_fail: 1268 874 zswap_entry_cache_destroy(); 1269 - cachefail: 1270 - zpool_destroy_pool(zswap_pool); 1271 - error: 875 + cache_fail: 1272 876 return -ENOMEM; 1273 877 } 1274 878 /* must be late so crypto has time to come up */
+147 -38
scripts/checkpatch.pl
··· 264 264 __kernel| 265 265 __force| 266 266 __iomem| 267 + __pmem| 267 268 __must_check| 268 269 __init_refok| 269 270 __kprobes| ··· 585 584 our $FuncArg = qr{$Typecast{0,1}($LvalOrFunc|$Constant|$String)}; 586 585 587 586 our $declaration_macros = qr{(?x: 588 - (?:$Storage\s+)?(?:[A-Z_][A-Z0-9]*_){0,2}(?:DEFINE|DECLARE)(?:_[A-Z0-9]+){1,2}\s*\(| 587 + (?:$Storage\s+)?(?:[A-Z_][A-Z0-9]*_){0,2}(?:DEFINE|DECLARE)(?:_[A-Z0-9]+){1,6}\s*\(| 589 588 (?:$Storage\s+)?LIST_HEAD\s*\(| 590 589 (?:$Storage\s+)?${Type}\s+uninitialized_var\s*\( 591 590 )}; ··· 1954 1953 our $clean = 1; 1955 1954 my $signoff = 0; 1956 1955 my $is_patch = 0; 1957 - 1958 1956 my $in_header_lines = $file ? 0 : 1; 1959 1957 my $in_commit_log = 0; #Scanning lines before patch 1958 + my $commit_log_possible_stack_dump = 0; 1960 1959 my $commit_log_long_line = 0; 1961 1960 my $commit_log_has_diff = 0; 1962 1961 my $reported_maintainer_file = 0; ··· 2167 2166 if ($showfile) { 2168 2167 $prefix = "$realfile:$realline: " 2169 2168 } elsif ($emacs) { 2170 - $prefix = "$filename:$linenr: "; 2169 + if ($file) { 2170 + $prefix = "$filename:$realline: "; 2171 + } else { 2172 + $prefix = "$filename:$linenr: "; 2173 + } 2171 2174 } 2172 2175 2173 2176 if ($found_file) { 2174 - if ($realfile =~ m@^(drivers/net/|net/)@) { 2177 + if ($realfile =~ m@^(?:drivers/net/|net/|drivers/staging/)@) { 2175 2178 $check = 1; 2176 2179 } else { 2177 2180 $check = $check_orig; ··· 2315 2310 2316 2311 # Check for line lengths > 75 in commit log, warn once 2317 2312 if ($in_commit_log && !$commit_log_long_line && 2318 - length($line) > 75) { 2313 + length($line) > 75 && 2314 + !($line =~ /^\s*[a-zA-Z0-9_\/\.]+\s+\|\s+\d+/ || 2315 + # file delta changes 2316 + $line =~ /^\s*(?:[\w\.\-]+\/)++[\w\.\-]+:/ || 2317 + # filename then : 2318 + $line =~ /^\s*(?:Fixes:|Link:)/i || 2319 + # A Fixes: or Link: line 2320 + $commit_log_possible_stack_dump)) { 2319 2321 WARN("COMMIT_LOG_LONG_LINE", 2320 2322 "Possible unwrapped commit 
description (prefer a maximum 75 chars per line)\n" . $herecurr); 2321 2323 $commit_log_long_line = 1; 2322 2324 } 2323 2325 2326 + # Check if the commit log is in a possible stack dump 2327 + if ($in_commit_log && !$commit_log_possible_stack_dump && 2328 + ($line =~ /^\s*(?:WARNING:|BUG:)/ || 2329 + $line =~ /^\s*\[\s*\d+\.\d{6,6}\s*\]/ || 2330 + # timestamp 2331 + $line =~ /^\s*\[\<[0-9a-fA-F]{8,}\>\]/)) { 2332 + # stack dump address 2333 + $commit_log_possible_stack_dump = 1; 2334 + } 2335 + 2336 + # Reset possible stack dump if a blank line is found 2337 + if ($in_commit_log && $commit_log_possible_stack_dump && 2338 + $line =~ /^\s*$/) { 2339 + $commit_log_possible_stack_dump = 0; 2340 + } 2341 + 2324 2342 # Check for git id commit length and improperly formed commit descriptions 2325 - if ($in_commit_log && $line =~ /\b(c)ommit\s+([0-9a-f]{5,})/i) { 2326 - my $init_char = $1; 2327 - my $orig_commit = lc($2); 2343 + if ($in_commit_log && 2344 + ($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i || 2345 + ($line =~ /\b[0-9a-f]{12,40}\b/i && 2346 + $line !~ /\bfixes:\s*[0-9a-f]{12,40}/i))) { 2347 + my $init_char = "c"; 2348 + my $orig_commit = ""; 2328 2349 my $short = 1; 2329 2350 my $long = 0; 2330 2351 my $case = 1; ··· 2360 2329 my $id = '0123456789ab'; 2361 2330 my $orig_desc = "commit description"; 2362 2331 my $description = ""; 2332 + 2333 + if ($line =~ /\b(c)ommit\s+([0-9a-f]{5,})\b/i) { 2334 + $init_char = $1; 2335 + $orig_commit = lc($2); 2336 + } elsif ($line =~ /\b([0-9a-f]{12,40})\b/i) { 2337 + $orig_commit = lc($1); 2338 + } 2363 2339 2364 2340 $short = 0 if ($line =~ /\bcommit\s+[0-9a-f]{12,40}/i); 2365 2341 $long = 1 if ($line =~ /\bcommit\s+[0-9a-f]{41,}/i); ··· 2776 2738 } 2777 2739 } 2778 2740 2741 + # Block comment styles 2742 + # Networking with an initial /* 2779 2743 if ($realfile =~ m@^(drivers/net/|net/)@ && 2780 2744 $prevrawline =~ /^\+[ \t]*\/\*[ \t]*$/ && 2781 2745 $rawline =~ /^\+[ \t]*\*/ && ··· 2786 2746 "networking block comments don't use 
an empty /* line, use /* Comment...\n" . $hereprev); 2787 2747 } 2788 2748 2789 - if ($realfile =~ m@^(drivers/net/|net/)@ && 2790 - $prevrawline =~ /^\+[ \t]*\/\*/ && #starting /* 2749 + # Block comments use * on subsequent lines 2750 + if ($prevline =~ /$;[ \t]*$/ && #ends in comment 2751 + $prevrawline =~ /^\+.*?\/\*/ && #starting /* 2791 2752 $prevrawline !~ /\*\/[ \t]*$/ && #no trailing */ 2792 2753 $rawline =~ /^\+/ && #line is new 2793 2754 $rawline !~ /^\+[ \t]*\*/) { #no leading * 2794 - WARN("NETWORKING_BLOCK_COMMENT_STYLE", 2795 - "networking block comments start with * on subsequent lines\n" . $hereprev); 2755 + WARN("BLOCK_COMMENT_STYLE", 2756 + "Block comments use * on subsequent lines\n" . $hereprev); 2796 2757 } 2797 2758 2798 - if ($realfile =~ m@^(drivers/net/|net/)@ && 2799 - $rawline !~ m@^\+[ \t]*\*/[ \t]*$@ && #trailing */ 2759 + # Block comments use */ on trailing lines 2760 + if ($rawline !~ m@^\+[ \t]*\*/[ \t]*$@ && #trailing */ 2800 2761 $rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ && #inline /*...*/ 2801 2762 $rawline !~ m@^\+.*\*{2,}/[ \t]*$@ && #trailing **/ 2802 2763 $rawline =~ m@^\+[ \t]*.+\*\/[ \t]*$@) { #non blank */ 2803 - WARN("NETWORKING_BLOCK_COMMENT_STYLE", 2804 - "networking block comments put the trailing */ on a separate line\n" . $herecurr); 2764 + WARN("BLOCK_COMMENT_STYLE", 2765 + "Block comments use a trailing */ on a separate line\n" . $herecurr); 2805 2766 } 2806 2767 2807 2768 # check for missing blank lines after struct/union declarations ··· 3108 3067 3109 3068 substr($s, 0, length($c), ''); 3110 3069 3111 - # Make sure we remove the line prefixes as we have 3112 - # none on the first line, and are going to readd them 3113 - # where necessary. 3114 - $s =~ s/\n./\n/gs; 3070 + # remove inline comments 3071 + $s =~ s/$;/ /g; 3072 + $c =~ s/$;/ /g; 3115 3073 3116 3074 # Find out how long the conditional actually is. 
3117 3075 my @newlines = ($c =~ /\n/gs); 3118 3076 my $cond_lines = 1 + $#newlines; 3077 + 3078 + # Make sure we remove the line prefixes as we have 3079 + # none on the first line, and are going to readd them 3080 + # where necessary. 3081 + $s =~ s/\n./\n/gs; 3082 + while ($s =~ /\n\s+\\\n/) { 3083 + $cond_lines += $s =~ s/\n\s+\\\n/\n/g; 3084 + } 3119 3085 3120 3086 # We want to check the first line inside the block 3121 3087 # starting at the end of the conditional, so remove: ··· 3189 3141 3190 3142 #print "line<$line> prevline<$prevline> indent<$indent> sindent<$sindent> check<$check> continuation<$continuation> s<$s> cond_lines<$cond_lines> stat_real<$stat_real> stat<$stat>\n"; 3191 3143 3192 - if ($check && (($sindent % 8) != 0 || 3193 - ($sindent <= $indent && $s ne ''))) { 3144 + if ($check && $s ne '' && 3145 + (($sindent % 8) != 0 || 3146 + ($sindent < $indent) || 3147 + ($sindent > $indent + 8))) { 3194 3148 WARN("SUSPECT_CODE_INDENT", 3195 3149 "suspect code indent for conditional statements ($indent, $sindent)\n" . $herecurr . "$stat_real\n"); 3196 3150 } ··· 3489 3439 } 3490 3440 } 3491 3441 3492 - # # no BUG() or BUG_ON() 3493 - # if ($line =~ /\b(BUG|BUG_ON)\b/) { 3494 - # print "Try to use WARN_ON & Recovery code rather than BUG() or BUG_ON()\n"; 3495 - # print "$herecurr"; 3496 - # $clean = 0; 3497 - # } 3442 + # avoid BUG() or BUG_ON() 3443 + if ($line =~ /\b(?:BUG|BUG_ON)\b/) { 3444 + my $msg_type = \&WARN; 3445 + $msg_type = \&CHK if ($file); 3446 + &{$msg_type}("AVOID_BUG", 3447 + "Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" . $herecurr); 3448 + } 3498 3449 3450 + # avoid LINUX_VERSION_CODE 3499 3451 if ($line =~ /\bLINUX_VERSION_CODE\b/) { 3500 3452 WARN("LINUX_VERSION_CODE", 3501 3453 "LINUX_VERSION_CODE should be avoided, code should be for the version to which it is merged\n" . 
$herecurr); ··· 3572 3520 # function brace can't be on same line, except for #defines of do while, 3573 3521 # or if closed on same line 3574 3522 if (($line=~/$Type\s*$Ident\(.*\).*\s*{/) and 3575 - !($line=~/\#\s*define.*do\s{/) and !($line=~/}/)) { 3523 + !($line=~/\#\s*define.*do\s\{/) and !($line=~/}/)) { 3576 3524 if (ERROR("OPEN_BRACE", 3577 3525 "open brace '{' following function declarations go on the next line\n" . $herecurr) && 3578 3526 $fix) { ··· 4084 4032 ## } 4085 4033 4086 4034 #need space before brace following if, while, etc 4087 - if (($line =~ /\(.*\){/ && $line !~ /\($Type\){/) || 4088 - $line =~ /do{/) { 4035 + if (($line =~ /\(.*\)\{/ && $line !~ /\($Type\){/) || 4036 + $line =~ /do\{/) { 4089 4037 if (ERROR("SPACING", 4090 4038 "space required before the open brace '{'\n" . $herecurr) && 4091 4039 $fix) { ··· 4228 4176 $msg = " - maybe == should be = ?" if ($comp eq "=="); 4229 4177 WARN("UNNECESSARY_PARENTHESES", 4230 4178 "Unnecessary parentheses$msg\n" . $herecurr); 4179 + } 4180 + } 4181 + 4182 + # comparisons with a constant or upper case identifier on the left 4183 + # avoid cases like "foo + BAR < baz" 4184 + # only fix matches surrounded by parentheses to avoid incorrect 4185 + # conversions like "FOO < baz() + 5" being "misfixed" to "baz() > FOO + 5" 4186 + if ($^V && $^V ge 5.10.0 && 4187 + $line =~ /^\+(.*)\b($Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/) { 4188 + my $lead = $1; 4189 + my $const = $2; 4190 + my $comp = $3; 4191 + my $to = $4; 4192 + my $newcomp = $comp; 4193 + if ($lead !~ /$Operators\s*$/ && 4194 + $to !~ /^(?:Constant|[A-Z_][A-Z0-9_]*)$/ && 4195 + WARN("CONSTANT_COMPARISON", 4196 + "Comparisons should place the constant on the right side of the test\n" . 
$herecurr) && 4197 + $fix) { 4198 + if ($comp eq "<") { 4199 + $newcomp = ">"; 4200 + } elsif ($comp eq "<=") { 4201 + $newcomp = ">="; 4202 + } elsif ($comp eq ">") { 4203 + $newcomp = "<"; 4204 + } elsif ($comp eq ">=") { 4205 + $newcomp = "<="; 4206 + } 4207 + $fixed[$fixlinenr] =~ s/\(\s*\Q$const\E\s*$Compare\s*\Q$to\E\s*\)/($to $newcomp $const)/; 4231 4208 } 4232 4209 } 4233 4210 ··· 4561 4480 $dstat !~ /^for\s*$Constant$/ && # for (...) 4562 4481 $dstat !~ /^for\s*$Constant\s+(?:$Ident|-?$Constant)$/ && # for (...) bar() 4563 4482 $dstat !~ /^do\s*{/ && # do {... 4564 - $dstat !~ /^\({/ && # ({... 4483 + $dstat !~ /^\(\{/ && # ({... 4565 4484 $ctx !~ /^.\s*#\s*define\s+TRACE_(?:SYSTEM|INCLUDE_FILE|INCLUDE_PATH)\b/) 4566 4485 { 4567 4486 $ctx =~ s/\n*$//; ··· 4870 4789 "Consecutive strings are generally better as a single string\n" . $herecurr); 4871 4790 } 4872 4791 4873 - # check for %L{u,d,i} in strings 4792 + # check for %L{u,d,i} and 0x%[udi] in strings 4874 4793 my $string; 4875 4794 while ($line =~ /(?:^|")([X\t]*)(?:"|$)/g) { 4876 4795 $string = substr($rawline, $-[1], $+[1] - $-[1]); 4877 4796 $string =~ s/%%/__/g; 4878 - if ($string =~ /(?<!%)%L[udi]/) { 4797 + if ($string =~ /(?<!%)%[\*\d\.\$]*L[udi]/) { 4879 4798 WARN("PRINTF_L", 4880 4799 "\%Ld/%Lu are not-standard C, use %lld/%llu\n" . $herecurr); 4881 4800 last; 4801 + } 4802 + if ($string =~ /0x%[\*\d\.\$\Llzth]*[udi]/) { 4803 + ERROR("PRINTF_0xDECIMAL", 4804 + "Prefixing 0x with decimal output is defective\n" . $herecurr); 4882 4805 } 4883 4806 } 4884 4807 ··· 4901 4816 4902 4817 # check for needless "if (<foo>) fn(<foo>)" uses 4903 4818 if ($prevline =~ /\bif\s*\(\s*($Lval)\s*\)/) { 4904 - my $expr = '\s*\(\s*' . quotemeta($1) . '\s*\)\s*;'; 4905 - if ($line =~ /\b(kfree|usb_free_urb|debugfs_remove(?:_recursive)?)$expr/) { 4906 - WARN('NEEDLESS_IF', 4907 - "$1(NULL) is safe and this check is probably not required\n" . 
$hereprev); 4819 + my $tested = quotemeta($1); 4820 + my $expr = '\s*\(\s*' . $tested . '\s*\)\s*;'; 4821 + if ($line =~ /\b(kfree|usb_free_urb|debugfs_remove(?:_recursive)?|(?:kmem_cache|mempool|dma_pool)_destroy)$expr/) { 4822 + my $func = $1; 4823 + if (WARN('NEEDLESS_IF', 4824 + "$func(NULL) is safe and this check is probably not required\n" . $hereprev) && 4825 + $fix) { 4826 + my $do_fix = 1; 4827 + my $leading_tabs = ""; 4828 + my $new_leading_tabs = ""; 4829 + if ($lines[$linenr - 2] =~ /^\+(\t*)if\s*\(\s*$tested\s*\)\s*$/) { 4830 + $leading_tabs = $1; 4831 + } else { 4832 + $do_fix = 0; 4833 + } 4834 + if ($lines[$linenr - 1] =~ /^\+(\t+)$func\s*\(\s*$tested\s*\)\s*;\s*$/) { 4835 + $new_leading_tabs = $1; 4836 + if (length($leading_tabs) + 1 ne length($new_leading_tabs)) { 4837 + $do_fix = 0; 4838 + } 4839 + } else { 4840 + $do_fix = 0; 4841 + } 4842 + if ($do_fix) { 4843 + fix_delete_line($fixlinenr - 1, $prevrawline); 4844 + $fixed[$fixlinenr] =~ s/^\+$new_leading_tabs/\+$leading_tabs/; 4845 + } 4846 + } 4908 4847 } 4909 4848 } 4910 4849
+1 -1
security/selinux/selinuxfs.c
··· 472 472 return 0; 473 473 } 474 474 475 - static struct vm_operations_struct sel_mmap_policy_ops = { 475 + static const struct vm_operations_struct sel_mmap_policy_ops = { 476 476 .fault = sel_mmap_policy_fault, 477 477 .page_mkwrite = sel_mmap_policy_fault, 478 478 };
+31
virt/kvm/kvm_main.c
··· 397 397 return young; 398 398 } 399 399 400 + static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, 401 + struct mm_struct *mm, 402 + unsigned long start, 403 + unsigned long end) 404 + { 405 + struct kvm *kvm = mmu_notifier_to_kvm(mn); 406 + int young, idx; 407 + 408 + idx = srcu_read_lock(&kvm->srcu); 409 + spin_lock(&kvm->mmu_lock); 410 + /* 411 + * Even though we do not flush TLB, this will still adversely 412 + * affect performance on pre-Haswell Intel EPT, where there is 413 + * no EPT Access Bit to clear so that we have to tear down EPT 414 + * tables instead. If we find this unacceptable, we can always 415 + * add a parameter to kvm_age_hva so that it effectively doesn't 416 + * do anything on clear_young. 417 + * 418 + * Also note that currently we never issue secondary TLB flushes 419 + * from clear_young, leaving this job up to the regular system 420 + * cadence. If we find this inaccurate, we might come up with a 421 + * more sophisticated heuristic later. 422 + */ 423 + young = kvm_age_hva(kvm, start, end); 424 + spin_unlock(&kvm->mmu_lock); 425 + srcu_read_unlock(&kvm->srcu, idx); 426 + 427 + return young; 428 + } 429 + 400 430 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, 401 431 struct mm_struct *mm, 402 432 unsigned long address) ··· 459 429 .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, 460 430 .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, 461 431 .clear_flush_young = kvm_mmu_notifier_clear_flush_young, 432 + .clear_young = kvm_mmu_notifier_clear_young, 462 433 .test_young = kvm_mmu_notifier_test_young, 463 434 .change_pte = kvm_mmu_notifier_change_pte, 464 435 .release = kvm_mmu_notifier_release,