Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull CMA and ARM DMA-mapping updates from Marek Szyprowski:
"These patches contain two major updates for the DMA-mapping subsystem
(mainly for the ARM architecture). The first is the Contiguous Memory
Allocator (CMA), which makes it possible for device drivers to allocate
big contiguous chunks of memory after the system has booted.

The main difference from similar frameworks is that CMA allows the
memory region reserved for big chunk allocations to be transparently
reused as ordinary system memory, so no memory is wasted while no big
chunk is allocated. Once an allocation request is issued, the framework
migrates the system pages away to create space for the required big
chunk of physically contiguous memory.
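
From a driver's point of view nothing new is required: a large physically contiguous buffer is still requested through the usual coherent DMA API, and CMA satisfies it by migrating pages out of the shared region. A minimal sketch of such an allocation (the driver name and buffer size below are made up for illustration, not part of this series):

```c
/* Hypothetical capture driver: request a 4 MiB physically contiguous
 * DMA buffer.  On a CMA-enabled kernel this can be served from the
 * shared CMA region instead of memory carved out at boot. */
#include <linux/dma-mapping.h>
#include <linux/platform_device.h>

#define CAPTURE_BUF_SIZE	SZ_4M	/* illustrative size */

static int capture_probe(struct platform_device *pdev)
{
	dma_addr_t dma_handle;
	void *cpu_addr;

	cpu_addr = dma_alloc_coherent(&pdev->dev, CAPTURE_BUF_SIZE,
				      &dma_handle, GFP_KERNEL);
	if (!cpu_addr)
		return -ENOMEM;

	/* program dma_handle into the device, use cpu_addr from the CPU */
	return 0;
}
```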

For more information one can refer to these LWN articles:

- 'A reworked contiguous memory allocator':
http://lwn.net/Articles/447405/

- 'CMA and ARM':
http://lwn.net/Articles/450286/

- 'A deep dive into CMA':
http://lwn.net/Articles/486301/

- and the following thread with the patches and links to all previous
versions:
https://lkml.org/lkml/2012/4/3/204

The main client for this new framework is ARM DMA-mapping subsystem.

The second part provides a complete redesign of the ARM DMA-mapping
subsystem. The core implementation has been changed to use the common
struct dma_map_ops based infrastructure, together with the recent
updates for the new DMA attributes merged in v3.4-rc2. This makes it
possible to have more than one implementation of the dma-mapping calls
and to change/select them on a per-struct-device basis. The first
client of this new infrastructure is the dmabounce implementation,
which has been completely cut out of the core, common code.

The last patch of this redesign update introduces a new, experimental
implementation of the dma-mapping calls on top of the generic IOMMU
framework. This lets an ARM sub-platform transparently use an IOMMU for
DMA-mapping calls, provided the required IOMMU hardware is present.

For more information please refer to the following thread:
http://www.spinics.net/lists/arm-kernel/msg175729.html
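
A sub-platform opts in through the new interfaces this series adds in arch/arm/include/asm/dma-iommu.h: create an IO virtual address space and attach the device to it, after which ordinary dma-mapping calls go through the IOMMU. A sketch (the base address, window size, and allocation order below are illustrative values, not taken from the series):

```c
/* Hypothetical platform setup: give a device a 128 MiB IOVA window
 * and route its dma-mapping calls through the IOMMU. */
#include <asm/dma-iommu.h>

static int platform_setup_iommu(struct device *dev)
{
	struct dma_iommu_mapping *mapping;

	mapping = arm_iommu_create_mapping(&platform_bus_type,
					   0x80000000, SZ_128M, 4);
	if (IS_ERR(mapping))
		return PTR_ERR(mapping);

	return arm_iommu_attach_device(dev, mapping);
}
```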

The last patch merges changes from both updates and resolves the
conflicts that cannot be avoided when patches touch the same files
(mainly arch/arm/mm/dma-mapping.c)."

Acked by Andrew Morton <akpm@linux-foundation.org>:
"Yup, this one please. It's had much work, plenty of review and I
think even Russell is happy with it."

* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: (28 commits)
ARM: dma-mapping: use PMD size for section unmap
cma: fix migration mode
ARM: integrate CMA with DMA-mapping subsystem
X86: integrate CMA with DMA-mapping subsystem
drivers: add Contiguous Memory Allocator
mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
mm: extract reclaim code from __alloc_pages_direct_reclaim()
mm: Serialize access to min_free_kbytes
mm: page_isolation: MIGRATE_CMA isolation functions added
mm: mmzone: MIGRATE_CMA migration type added
mm: page_alloc: change fallbacks array handling
mm: page_alloc: introduce alloc_contig_range()
mm: compaction: export some of the functions
mm: compaction: introduce isolate_freepages_range()
mm: compaction: introduce map_pages()
mm: compaction: introduce isolate_migratepages_range()
mm: page_alloc: remove trailing whitespace
ARM: dma-mapping: add support for IOMMU mapper
ARM: dma-mapping: use alloc, mmap, free from dma_ops
ARM: dma-mapping: remove redundant code and do the cleanup
...

Conflicts:
arch/x86/include/asm/dma-mapping.h

+2899 -781
+9
Documentation/kernel-parameters.txt
 			Also note the kernel might malfunction if you disable
 			some critical bits.
 
+	cma=nn[MG]	[ARM,KNL]
+			Sets the size of kernel global memory area for contiguous
+			memory allocations. For more information, see
+			include/linux/dma-contiguous.h
+
 	cmo_free_hint=	[PPC] Format: { yes | no }
 			Specify whether pages are marked as being inactive
 			when they are freed.  This is used in CMO environments
 			to determine OS memory pressure for page stealing by
 			a hypervisor.
 			Default: yes
+
+	coherent_pool=nn[KMG]	[ARM,KNL]
+			Sets the size of memory pool for coherent, atomic dma
+			allocations if Contiguous Memory Allocator (CMA) is used.
 
 	code_bytes	[X86] How many bytes of object code to print
 			in an oops report.
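Both of the new parameters go on the kernel command line; for example, a board could reserve a 64 MiB CMA area and a 1 MiB atomic coherent pool with (sizes and the other parameters are illustrative):

```
console=ttyS0,115200 root=/dev/mmcblk0p2 cma=64M coherent_pool=1M
```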
+3
arch/Kconfig
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
+11
arch/arm/Kconfig
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
 	select HAVE_IDE if PCI || ISA || PCMCIA
+	select HAVE_DMA_ATTRS
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_MEMBLOCK
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
···
 	  <http://www.arm.linux.org.uk/>.
 
 config ARM_HAS_SG_CHAIN
+	bool
+
+config NEED_SG_DMA_LENGTH
+	bool
+
+config ARM_DMA_USE_IOMMU
+	select NEED_SG_DMA_LENGTH
+	select ARM_HAS_SG_CHAIN
 	bool
 
 config HAVE_PWM
+65 -19
arch/arm/common/dmabounce.c
 	read_lock_irqsave(&device_info->lock, flags);
 
 	list_for_each_entry(b, &device_info->safe_buffers, node)
-		if (b->safe_dma_addr == safe_dma_addr) {
+		if (b->safe_dma_addr <= safe_dma_addr &&
+		    b->safe_dma_addr + b->size > safe_dma_addr) {
 			rb = b;
 			break;
 		}
···
 	if (buf == NULL) {
 		dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
 			__func__, ptr);
-		return ~0;
+		return DMA_ERROR_CODE;
 	}
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
···
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir)
+static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		struct dma_attrs *attrs)
 {
 	dma_addr_t dma_addr;
 	int ret;
···
 
 	ret = needs_bounce(dev, dma_addr, size);
 	if (ret < 0)
-		return ~0;
+		return DMA_ERROR_CODE;
 
 	if (ret == 0) {
-		__dma_page_cpu_to_dev(page, offset, size, dir);
+		arm_dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
 		return dma_addr;
 	}
 
 	if (PageHighMem(page)) {
 		dev_err(dev, "DMA buffer bouncing of HIGHMEM pages is not supported\n");
-		return ~0;
+		return DMA_ERROR_CODE;
 	}
 
 	return map_single(dev, page_address(page) + offset, size, dir);
 }
-EXPORT_SYMBOL(__dma_map_page);
 
 /*
  * see if a mapped address was really a "safe" buffer and if so, copy
···
  * the safe buffer.  (basically return things back to the way they
  * should be)
  */
-void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
+static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct safe_buffer *buf;
 
···
 	buf = find_safe_buffer_dev(dev, dma_addr, __func__);
 	if (!buf) {
-		__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, dma_addr)),
-			dma_addr & ~PAGE_MASK, size, dir);
+		arm_dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir);
 		return;
 	}
 
 	unmap_single(dev, buf, size, dir);
 }
-EXPORT_SYMBOL(__dma_unmap_page);
 
-int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
···
 	buf = find_safe_buffer_dev(dev, addr, __func__);
 	if (!buf)
 		return 1;
+
+	off = addr - buf->safe_dma_addr;
 
 	BUG_ON(buf->direction != dir);
 
···
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
-int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+static void dmabounce_sync_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	arm_dma_ops.sync_single_for_cpu(dev, handle, size, dir);
+}
+
+static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
···
 	buf = find_safe_buffer_dev(dev, addr, __func__);
 	if (!buf)
 		return 1;
+
+	off = addr - buf->safe_dma_addr;
 
 	BUG_ON(buf->direction != dir);
 
···
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_device);
+
+static void dmabounce_sync_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	arm_dma_ops.sync_single_for_device(dev, handle, size, dir);
+}
+
+static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
+{
+	if (dev->archdata.dmabounce)
+		return 0;
+
+	return arm_dma_ops.set_dma_mask(dev, dma_mask);
+}
+
+static struct dma_map_ops dmabounce_ops = {
+	.alloc			= arm_dma_alloc,
+	.free			= arm_dma_free,
+	.mmap			= arm_dma_mmap,
+	.map_page		= dmabounce_map_page,
+	.unmap_page		= dmabounce_unmap_page,
+	.sync_single_for_cpu	= dmabounce_sync_for_cpu,
+	.sync_single_for_device	= dmabounce_sync_for_device,
+	.map_sg			= arm_dma_map_sg,
+	.unmap_sg		= arm_dma_unmap_sg,
+	.sync_sg_for_cpu	= arm_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= arm_dma_sync_sg_for_device,
+	.set_dma_mask		= dmabounce_set_mask,
+};
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
 		const char *name, unsigned long size)
···
 #endif
 
 	dev->archdata.dmabounce = device_info;
+	set_dma_ops(dev, &dmabounce_ops);
 
 	dev_info(dev, "dmabounce: registered device\n");
 
···
 	struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
 
 	dev->archdata.dmabounce = NULL;
+	set_dma_ops(dev, NULL);
 
 	if (!device_info) {
 		dev_warn(dev,
+4
arch/arm/include/asm/device.h
 #define ASMARM_DEVICE_H
 
 struct dev_archdata {
+	struct dma_map_ops	*dma_ops;
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
+#endif
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
+	struct dma_iommu_mapping	*mapping;
 #endif
 };
 
+15
arch/arm/include/asm/dma-contiguous.h
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+#ifdef CONFIG_CMA
+
+#include <linux/types.h>
+#include <asm-generic/dma-contiguous.h>
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+#endif
+#endif
+
+#endif
+34
arch/arm/include/asm/dma-iommu.h
+#ifndef ASMARM_DMA_IOMMU_H
+#define ASMARM_DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-debug.h>
+#include <linux/kmemcheck.h>
+
+struct dma_iommu_mapping {
+	/* iommu specific data */
+	struct iommu_domain	*domain;
+
+	void			*bitmap;
+	size_t			bits;
+	unsigned int		order;
+	dma_addr_t		base;
+
+	spinlock_t		lock;
+	struct kref		kref;
+};
+
+struct dma_iommu_mapping *
+arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
+			 int order);
+
+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
+
+int arm_iommu_attach_device(struct device *dev,
+			    struct dma_iommu_mapping *mapping);
+
+#endif /* __KERNEL__ */
+#endif
+109 -298
arch/arm/include/asm/dma-mapping.h
 
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
+#include <linux/dma-attrs.h>
 #include <linux/dma-debug.h>
 
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>
+
+#define DMA_ERROR_CODE	(~0)
+extern struct dma_map_ops arm_dma_ops;
+
+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+	if (dev && dev->archdata.dma_ops)
+		return dev->archdata.dma_ops;
+	return &arm_dma_ops;
+}
+
+static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+{
+	BUG_ON(!dev);
+	dev->archdata.dma_ops = ops;
+}
+
+#include <asm-generic/dma-mapping-common.h>
+
+static inline int dma_set_mask(struct device *dev, u64 mask)
+{
+	return get_dma_ops(dev)->set_dma_mask(dev, mask);
+}
 
 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
···
 #endif
 
 /*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- * Private support functions: these are not part of the API and are
- * liable to change.  Drivers must not use these.
- */
-static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_cpu_to_dev(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_cpu_to_dev(kaddr, size, dir);
-}
-
-static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_dev_to_cpu(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_dev_to_cpu(kaddr, size, dir);
-}
-
-static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_cpu_to_dev(page, off, size, dir);
-}
-
-static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_dev_to_cpu(page, off, size, dir);
-}
-
-extern int dma_supported(struct device *, u64);
-extern int dma_set_mask(struct device *, u64);
-
-/*
  * DMA errors are defined by all-bits-set in the DMA address.
  */
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
-	return dma_addr == ~0;
+	return dma_addr == DMA_ERROR_CODE;
 }
 
 /*
···
 {
 }
 
+extern int dma_supported(struct device *dev, u64 mask);
+
 /**
- * dma_alloc_coherent - allocate consistent memory for DMA
+ * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: required memory size
  * @handle: bus-specific DMA address
+ * @attrs: optinal attributes that specific mapping properties
  *
- * Allocate some uncached, unbuffered memory for a device for
- * performing DMA.  This function allocates pages, and will
- * return the CPU-viewed address, and sets @handle to be the
- * device-viewed address.
+ * Allocate some memory for a device for performing DMA.  This function
+ * allocates pages, and will return the CPU-viewed address, and sets @handle
+ * to be the device-viewed address.
  */
-extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t);
+extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			   gfp_t gfp, struct dma_attrs *attrs);
+
+#define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL)
+
+static inline void *dma_alloc_attrs(struct device *dev, size_t size,
+				    dma_addr_t *dma_handle, gfp_t flag,
+				    struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	void *cpu_addr;
+	BUG_ON(!ops);
+
+	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
+	return cpu_addr;
+}
 
 /**
- * dma_free_coherent - free memory allocated by dma_alloc_coherent
+ * arm_dma_free - free memory allocated by arm_dma_alloc
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: size of memory originally requested in dma_alloc_coherent
  * @cpu_addr: CPU-view address returned from dma_alloc_coherent
  * @handle: device-view address returned from dma_alloc_coherent
+ * @attrs: optinal attributes that specific mapping properties
  *
  * Free (and unmap) a DMA buffer previously allocated by
- * dma_alloc_coherent().
+ * arm_dma_alloc().
  *
  * References to memory and mappings associated with cpu_addr/handle
  * during and after this call executing are illegal.
  */
-extern void dma_free_coherent(struct device *, size_t, void *, dma_addr_t);
+extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
+			 dma_addr_t handle, struct dma_attrs *attrs);
+
+#define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL)
+
+static inline void dma_free_attrs(struct device *dev, size_t size,
+				  void *cpu_addr, dma_addr_t dma_handle,
+				  struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+
+	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+	ops->free(dev, size, cpu_addr, dma_handle, attrs);
+}
 
 /**
- * dma_mmap_coherent - map a coherent DMA allocation into user space
+ * arm_dma_mmap - map a coherent DMA allocation into user space
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @vma: vm_area_struct describing requested user mapping
  * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent
  * @handle: device-view address returned from dma_alloc_coherent
  * @size: size of memory originally requested in dma_alloc_coherent
+ * @attrs: optinal attributes that specific mapping properties
  *
  * Map a coherent DMA buffer previously allocated by dma_alloc_coherent
  * into user space.  The coherent DMA buffer must not be freed by the
  * driver until the user space mapping has been released.
  */
-int dma_mmap_coherent(struct device *, struct vm_area_struct *,
-		void *, dma_addr_t, size_t);
+extern int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+			void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			struct dma_attrs *attrs);
 
+#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, NULL)
 
-/**
- * dma_alloc_writecombine - allocate writecombining memory for DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: required memory size
- * @handle: bus-specific DMA address
- *
- * Allocate some uncached, buffered memory for a device for
- * performing DMA.  This function allocates pages, and will
- * return the CPU-viewed address, and sets @handle to be the
- * device-viewed address.
- */
-extern void *dma_alloc_writecombine(struct device *, size_t, dma_addr_t *,
-		gfp_t);
+static inline int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+				 void *cpu_addr, dma_addr_t dma_addr,
+				 size_t size, struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+	return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
+}
 
-#define dma_free_writecombine(dev,size,cpu_addr,handle) \
-	dma_free_coherent(dev,size,cpu_addr,handle)
+static inline void *dma_alloc_writecombine(struct device *dev, size_t size,
+					   dma_addr_t *dma_handle, gfp_t flag)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_alloc_attrs(dev, size, dma_handle, flag, &attrs);
+}
 
-int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
-		void *, dma_addr_t, size_t);
+static inline void dma_free_writecombine(struct device *dev, size_t size,
+					 void *cpu_addr, dma_addr_t dma_handle)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs);
+}
+
+static inline int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
+					void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, &attrs);
+}
 
 /*
  * This can be called during boot to increase the size of the consistent
···
 */
 extern void __init init_consistent_dma_size(unsigned long size);
 
-
-#ifdef CONFIG_DMABOUNCE
 /*
  * For SA-1111, IXP425, and ADI systems the dma-mapping functions are "magic"
  * and utilize bounce buffers as needed to work around limited DMA windows.
···
 */
 extern void dmabounce_unregister_dev(struct device *);
 
-/*
- * The DMA API, implemented by dmabounce.c.  See below for descriptions.
- */
-extern dma_addr_t __dma_map_page(struct device *, struct page *,
-		unsigned long, size_t, enum dma_data_direction);
-extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
 
-/*
- * Private functions
- */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
-#else
-static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-
-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
-}
-
-static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
-		handle & ~PAGE_MASK, size, dir);
-}
-#endif /* CONFIG_DMABOUNCE */
-
-/**
- * dma_map_single - map a single buffer for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @cpu_addr: CPU direct mapped address of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_single() or
- * dma_sync_single_for_cpu().
- */
-static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	unsigned long offset;
-	struct page *page;
-	dma_addr_t addr;
-
-	BUG_ON(!virt_addr_valid(cpu_addr));
-	BUG_ON(!virt_addr_valid(cpu_addr + size - 1));
-	BUG_ON(!valid_dma_direction(dir));
-
-	page = virt_to_page(cpu_addr);
-	offset = (unsigned long)cpu_addr & ~PAGE_MASK;
-	addr = __dma_map_page(dev, page, offset, size, dir);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, true);
-
-	return addr;
-}
-
-/**
- * dma_map_page - map a portion of a page for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_page().
- */
-static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	dma_addr_t addr;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	addr = __dma_map_page(dev, page, offset, size, dir);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
-
-	return addr;
-}
-
-/**
- * dma_unmap_single - unmap a single buffer previously mapped
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_single)
- * @dir: DMA transfer direction (same as passed to dma_map_single)
- *
- * Unmap a single streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_single() call.
- * All other usages are undefined.
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	debug_dma_unmap_page(dev, handle, size, dir, true);
-	__dma_unmap_page(dev, handle, size, dir);
-}
-
-/**
- * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * Unmap a page streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_page() call.
- * All other usages are undefined.
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	debug_dma_unmap_page(dev, handle, size, dir, false);
-	__dma_unmap_page(dev, handle, size, dir);
-}
-
-/**
- * dma_sync_single_range_for_cpu
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @offset: offset of region to start sync
- * @size: size of region to sync
- * @dir: DMA transfer direction (same as passed to dma_map_single)
- *
- * Make physical memory consistent for a single streaming mode DMA
- * translation after a transfer.
- *
- * If you perform a dma_map_single() but wish to interrogate the
- * buffer using the cpu, yet do not wish to teardown the PCI dma
- * mapping, you must call this function before doing so.  At the
- * next point you give the PCI dma address back to the card, you
- * must first the perform a dma_sync_for_device, and then the
- * device again owns the buffer.
- */
-static inline void dma_sync_single_range_for_cpu(struct device *dev,
-		dma_addr_t handle, unsigned long offset, size_t size,
-		enum dma_data_direction dir)
-{
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
-}
-
-static inline void dma_sync_single_range_for_device(struct device *dev,
-		dma_addr_t handle, unsigned long offset, size_t size,
-		enum dma_data_direction dir)
-{
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_cpu(dev, handle, 0, size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_device(dev, handle, 0, size, dir);
-}
 
 /*
  * The scatter list versions of the above methods.
  */
-extern int dma_map_sg(struct device *, struct scatterlist *, int,
+extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
+		enum dma_data_direction, struct dma_attrs *attrs);
+extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
+		enum dma_data_direction, struct dma_attrs *attrs);
+extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void dma_unmap_sg(struct device *, struct scatterlist *, int,
+extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
-		enum dma_data_direction);
-extern void dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
-		enum dma_data_direction);
-
 
 #endif /* __KERNEL__ */
 #endif
arch/arm/include/asm/mach/map.h (+1)
···
  #define MT_MEMORY_DTCM		12
  #define MT_MEMORY_ITCM		13
  #define MT_MEMORY_SO		14
+ #define MT_MEMORY_DMA_READY	15
 
  #ifdef CONFIG_MMU
  extern void iotable_init(struct map_desc *, int);
arch/arm/kernel/setup.c (+3 -6)
···
  extern void paging_init(struct machine_desc *desc);
  extern void sanity_check_meminfo(void);
  extern void reboot_setup(char *str);
+ extern void setup_dma_zone(struct machine_desc *desc);
 
  unsigned int processor_id;
  EXPORT_SYMBOL(processor_id);
···
  	machine_desc = mdesc;
  	machine_name = mdesc->name;
 
- #ifdef CONFIG_ZONE_DMA
- 	if (mdesc->dma_zone_size) {
- 		extern unsigned long arm_dma_zone_size;
- 		arm_dma_zone_size = mdesc->dma_zone_size;
- 	}
- #endif
+ 	setup_dma_zone(mdesc);
+
  	if (mdesc->restart_mode)
  		reboot_setup(&mdesc->restart_mode);
arch/arm/mm/dma-mapping.c (+1149 -199)
···
  #include <linux/init.h>
  #include <linux/device.h>
  #include <linux/dma-mapping.h>
+ #include <linux/dma-contiguous.h>
  #include <linux/highmem.h>
+ #include <linux/memblock.h>
  #include <linux/slab.h>
+ #include <linux/iommu.h>
+ #include <linux/vmalloc.h>
 
  #include <asm/memory.h>
  #include <asm/highmem.h>
···
  #include <asm/tlbflush.h>
  #include <asm/sizes.h>
  #include <asm/mach/arch.h>
+ #include <asm/dma-iommu.h>
+ #include <asm/mach/map.h>
+ #include <asm/system_info.h>
+ #include <asm/dma-contiguous.h>
 
  #include "mm.h"
+
+ /*
+  * The DMA API is built upon the notion of "buffer ownership".  A buffer
+  * is either exclusively owned by the CPU (and therefore may be accessed
+  * by it) or exclusively owned by the DMA device.  These helper functions
+  * represent the transitions between these two ownership states.
+  *
+  * Note, however, that on later ARMs, this notion does not work due to
+  * speculative prefetches.  We model our approach on the assumption that
+  * the CPU does do speculative prefetches, which means we clean caches
+  * before transfers and delay cache invalidation until transfer completion.
+  *
+  */
+ static void __dma_page_cpu_to_dev(struct page *, unsigned long,
+ 		size_t, enum dma_data_direction);
+ static void __dma_page_dev_to_cpu(struct page *, unsigned long,
+ 		size_t, enum dma_data_direction);
+
+ /**
+  * arm_dma_map_page - map a portion of a page for streaming DMA
+  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+  * @page: page that buffer resides in
+  * @offset: offset into page for start of buffer
+  * @size: size of buffer to map
+  * @dir: DMA transfer direction
+  *
+  * Ensure that any data held in the cache is appropriately discarded
+  * or written back.
+  *
+  * The device owns this memory once this call has completed.  The CPU
+  * can regain ownership by calling dma_unmap_page().
+  */
+ static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
+ 	     unsigned long offset, size_t size, enum dma_data_direction dir,
+ 	     struct dma_attrs *attrs)
+ {
+ 	if (!arch_is_coherent())
+ 		__dma_page_cpu_to_dev(page, offset, size, dir);
+ 	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+ }
+
+ /**
+  * arm_dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+  * @handle: DMA address of buffer
+  * @size: size of buffer (same as passed to dma_map_page)
+  * @dir: DMA transfer direction (same as passed to dma_map_page)
+  *
+  * Unmap a page streaming mode DMA translation.  The handle and size
+  * must match what was provided in the previous dma_map_page() call.
+  * All other usages are undefined.
+  *
+  * After this call, reads by the CPU to the buffer are guaranteed to see
+  * whatever the device wrote there.
+  */
+ static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
+ 		size_t size, enum dma_data_direction dir,
+ 		struct dma_attrs *attrs)
+ {
+ 	if (!arch_is_coherent())
+ 		__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+ 				      handle & ~PAGE_MASK, size, dir);
+ }
+
+ static void arm_dma_sync_single_for_cpu(struct device *dev,
+ 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+ {
+ 	unsigned int offset = handle & (PAGE_SIZE - 1);
+ 	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+ 	if (!arch_is_coherent())
+ 		__dma_page_dev_to_cpu(page, offset, size, dir);
+ }
+
+ static void arm_dma_sync_single_for_device(struct device *dev,
+ 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+ {
+ 	unsigned int offset = handle & (PAGE_SIZE - 1);
+ 	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+ 	if (!arch_is_coherent())
+ 		__dma_page_cpu_to_dev(page, offset, size, dir);
+ }
+
+ static int arm_dma_set_mask(struct device *dev, u64 dma_mask);
+
+ struct dma_map_ops arm_dma_ops = {
+ 	.alloc			= arm_dma_alloc,
+ 	.free			= arm_dma_free,
+ 	.mmap			= arm_dma_mmap,
+ 	.map_page		= arm_dma_map_page,
+ 	.unmap_page		= arm_dma_unmap_page,
+ 	.map_sg			= arm_dma_map_sg,
+ 	.unmap_sg		= arm_dma_unmap_sg,
+ 	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
+ 	.sync_single_for_device	= arm_dma_sync_single_for_device,
+ 	.sync_sg_for_cpu	= arm_dma_sync_sg_for_cpu,
+ 	.sync_sg_for_device	= arm_dma_sync_sg_for_device,
+ 	.set_dma_mask		= arm_dma_set_mask,
+ };
+ EXPORT_SYMBOL(arm_dma_ops);
 
  static u64 get_coherent_dma_mask(struct device *dev)
  {
···
  	return mask;
  }
 
+ static void __dma_clear_buffer(struct page *page, size_t size)
+ {
+ 	void *ptr;
+ 	/*
+ 	 * Ensure that the allocated pages are zeroed, and that any data
+ 	 * lurking in the kernel direct-mapped region is invalidated.
+ 	 */
+ 	ptr = page_address(page);
+ 	if (ptr) {
+ 		memset(ptr, 0, size);
+ 		dmac_flush_range(ptr, ptr + size);
+ 		outer_flush_range(__pa(ptr), __pa(ptr) + size);
+ 	}
+ }
+
  /*
   * Allocate a DMA buffer for 'dev' of size 'size' using the
   * specified gfp mask.  Note that 'size' must be page aligned.
···
  {
  	unsigned long order = get_order(size);
  	struct page *page, *p, *e;
- 	void *ptr;
- 	u64 mask = get_coherent_dma_mask(dev);
-
- #ifdef CONFIG_DMA_API_DEBUG
- 	u64 limit = (mask + 1) & ~mask;
- 	if (limit && size >= limit) {
- 		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
- 			size, mask);
- 		return NULL;
- 	}
- #endif
-
- 	if (!mask)
- 		return NULL;
-
- 	if (mask < 0xffffffffULL)
- 		gfp |= GFP_DMA;
 
  	page = alloc_pages(gfp, order);
  	if (!page)
···
  	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
  		__free_page(p);
 
- 	/*
- 	 * Ensure that the allocated pages are zeroed, and that any data
- 	 * lurking in the kernel direct-mapped region is invalidated.
- 	 */
- 	ptr = page_address(page);
- 	memset(ptr, 0, size);
- 	dmac_flush_range(ptr, ptr + size);
- 	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+ 	__dma_clear_buffer(page, size);
 
  	return page;
  }
···
  	unsigned long base = consistent_base;
  	unsigned long num_ptes = (CONSISTENT_END - base) >> PMD_SHIFT;
 
+ #ifndef CONFIG_ARM_DMA_USE_IOMMU
+ 	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+ 		return 0;
+ #endif
+
  	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
  	if (!consistent_pte) {
  		pr_err("%s: no memory\n", __func__);
···
  		pud = pud_alloc(&init_mm, pgd, base);
  		if (!pud) {
- 			printk(KERN_ERR "%s: no pud tables\n", __func__);
+ 			pr_err("%s: no pud tables\n", __func__);
  			ret = -ENOMEM;
  			break;
  		}
 
  		pmd = pmd_alloc(&init_mm, pud, base);
  		if (!pmd) {
- 			printk(KERN_ERR "%s: no pmd tables\n", __func__);
+ 			pr_err("%s: no pmd tables\n", __func__);
  			ret = -ENOMEM;
  			break;
  		}
···
  		pte = pte_alloc_kernel(pmd, base);
  		if (!pte) {
- 			printk(KERN_ERR "%s: no pte tables\n", __func__);
+ 			pr_err("%s: no pte tables\n", __func__);
  			ret = -ENOMEM;
  			break;
  		}
···
 
  	return ret;
  }
-
  core_initcall(consistent_init);
+
+ static void *__alloc_from_contiguous(struct device *dev, size_t size,
+ 				     pgprot_t prot, struct page **ret_page);
+
+ static struct arm_vmregion_head coherent_head = {
+ 	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+ 	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+ };
+
+ size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+ static int __init early_coherent_pool(char *p)
+ {
+ 	coherent_pool_size = memparse(p, &p);
+ 	return 0;
+ }
+ early_param("coherent_pool", early_coherent_pool);
+
+ /*
+  * Initialise the coherent pool for atomic allocations.
+  */
+ static int __init coherent_init(void)
+ {
+ 	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+ 	size_t size = coherent_pool_size;
+ 	struct page *page;
+ 	void *ptr;
+
+ 	if (cpu_architecture() < CPU_ARCH_ARMv6)
+ 		return 0;
+
+ 	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+ 	if (ptr) {
+ 		coherent_head.vm_start = (unsigned long) ptr;
+ 		coherent_head.vm_end = (unsigned long) ptr + size;
+ 		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+ 		       (unsigned)size / 1024);
+ 		return 0;
+ 	}
+ 	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+ 	       (unsigned)size / 1024);
+ 	return -ENOMEM;
+ }
+ /*
+  * CMA is activated by core_initcall, so we must be called after it.
+  */
+ postcore_initcall(coherent_init);
+
+ struct dma_contig_early_reserve {
+ 	phys_addr_t base;
+ 	unsigned long size;
+ };
+
+ static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+ static int dma_mmu_remap_num __initdata;
+
+ void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+ {
+ 	dma_mmu_remap[dma_mmu_remap_num].base = base;
+ 	dma_mmu_remap[dma_mmu_remap_num].size = size;
+ 	dma_mmu_remap_num++;
+ }
+
+ void __init dma_contiguous_remap(void)
+ {
+ 	int i;
+ 	for (i = 0; i < dma_mmu_remap_num; i++) {
+ 		phys_addr_t start = dma_mmu_remap[i].base;
+ 		phys_addr_t end = start + dma_mmu_remap[i].size;
+ 		struct map_desc map;
+ 		unsigned long addr;
+
+ 		if (end > arm_lowmem_limit)
+ 			end = arm_lowmem_limit;
+ 		if (start >= end)
+ 			return;
+
+ 		map.pfn = __phys_to_pfn(start);
+ 		map.virtual = __phys_to_virt(start);
+ 		map.length = end - start;
+ 		map.type = MT_MEMORY_DMA_READY;
+
+ 		/*
+ 		 * Clear previous low-memory mapping
+ 		 */
+ 		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+ 		     addr += PMD_SIZE)
+ 			pmd_clear(pmd_off_k(addr));
+
+ 		iotable_init(&map, 1);
+ 	}
+ }
 
  static void *
  __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
···
  	int bit;
 
  	if (!consistent_pte) {
- 		printk(KERN_ERR "%s: not initialised\n", __func__);
+ 		pr_err("%s: not initialised\n", __func__);
  		dump_stack();
  		return NULL;
  	}
···
  		u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
 
  		pte = consistent_pte[idx] + off;
- 		c->vm_pages = page;
+ 		c->priv = page;
 
  		do {
  			BUG_ON(!pte_none(*pte));
···
 
  	c = arm_vmregion_find_remove(&consistent_head, (unsigned long)cpu_addr);
  	if (!c) {
- 		printk(KERN_ERR "%s: trying to free invalid coherent area: %p\n",
+ 		pr_err("%s: trying to free invalid coherent area: %p\n",
  		       __func__, cpu_addr);
  		dump_stack();
  		return;
  	}
 
  	if ((c->vm_end - c->vm_start) != size) {
- 		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+ 		pr_err("%s: freeing wrong coherent size (%ld != %d)\n",
  		       __func__, c->vm_end - c->vm_start, size);
  		dump_stack();
  		size = c->vm_end - c->vm_start;
···
  	}
 
  	if (pte_none(pte) || !pte_present(pte))
- 		printk(KERN_CRIT "%s: bad page in kernel page table\n",
- 		       __func__);
+ 		pr_crit("%s: bad page in kernel page table\n",
+ 			__func__);
  	} while (size -= PAGE_SIZE);
 
  	flush_tlb_kernel_range(c->vm_start, c->vm_end);
···
  	arm_vmregion_free(&consistent_head, c);
  }
 
+ static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+ 			    void *data)
+ {
+ 	struct page *page = virt_to_page(addr);
+ 	pgprot_t prot = *(pgprot_t *)data;
+
+ 	set_pte_ext(pte, mk_pte(page, prot), 0);
+ 	return 0;
+ }
+
+ static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+ {
+ 	unsigned long start = (unsigned long) page_address(page);
+ 	unsigned end = start + size;
+
+ 	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+ 	dsb();
+ 	flush_tlb_kernel_range(start, end);
+ }
+
+ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+ 				 pgprot_t prot, struct page **ret_page,
+ 				 const void *caller)
+ {
+ 	struct page *page;
+ 	void *ptr;
+ 	page = __dma_alloc_buffer(dev, size, gfp);
+ 	if (!page)
+ 		return NULL;
+
+ 	ptr = __dma_alloc_remap(page, size, gfp, prot, caller);
+ 	if (!ptr) {
+ 		__dma_free_buffer(page, size);
+ 		return NULL;
+ 	}
+
+ 	*ret_page = page;
+ 	return ptr;
+ }
+
+ static void *__alloc_from_pool(struct device *dev, size_t size,
+ 			      struct page **ret_page, const void *caller)
+ {
+ 	struct arm_vmregion *c;
+ 	size_t align;
+
+ 	if (!coherent_head.vm_start) {
+ 		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+ 		       __func__);
+ 		dump_stack();
+ 		return NULL;
+ 	}
+
+ 	/*
+ 	 * Align the region allocation - allocations from pool are rather
+ 	 * small, so align them to their order in pages, minimum is a page
+ 	 * size. This helps reduce fragmentation of the DMA space.
+ 	 */
+ 	align = PAGE_SIZE << get_order(size);
+ 	c = arm_vmregion_alloc(&coherent_head, align, size, 0, caller);
+ 	if (c) {
+ 		void *ptr = (void *)c->vm_start;
+ 		struct page *page = virt_to_page(ptr);
+ 		*ret_page = page;
+ 		return ptr;
+ 	}
+ 	return NULL;
+ }
+
+ static int __free_from_pool(void *cpu_addr, size_t size)
+ {
+ 	unsigned long start = (unsigned long)cpu_addr;
+ 	unsigned long end = start + size;
+ 	struct arm_vmregion *c;
+
+ 	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+ 		return 0;
+
+ 	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+ 	if ((c->vm_end - c->vm_start) != size) {
+ 		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+ 		       __func__, c->vm_end - c->vm_start, size);
+ 		dump_stack();
+ 		size = c->vm_end - c->vm_start;
+ 	}
+
+ 	arm_vmregion_free(&coherent_head, c);
+ 	return 1;
+ }
+
+ static void *__alloc_from_contiguous(struct device *dev, size_t size,
+ 				    pgprot_t prot, struct page **ret_page)
+ {
+ 	unsigned long order = get_order(size);
+ 	size_t count = size >> PAGE_SHIFT;
+ 	struct page *page;
+
+ 	page = dma_alloc_from_contiguous(dev, count, order);
+ 	if (!page)
+ 		return NULL;
+
+ 	__dma_clear_buffer(page, size);
+ 	__dma_remap(page, size, prot);
+
+ 	*ret_page = page;
+ 	return page_address(page);
+ }
+
+ static void __free_from_contiguous(struct device *dev, struct page *page,
+ 				   size_t size)
+ {
+ 	__dma_remap(page, size, pgprot_kernel);
+ 	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+ }
+
+ static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
+ {
+ 	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+ 			pgprot_writecombine(prot) :
+ 			pgprot_dmacoherent(prot);
+ 	return prot;
+ }
+
+ #define nommu()	0
+
  #else	/* !CONFIG_MMU */
 
- #define __dma_alloc_remap(page, size, gfp, prot, c)	page_address(page)
- #define __dma_free_remap(addr, size)			do { } while (0)
+ #define nommu()	1
+
+ #define __get_dma_pgprot(attrs, prot)	__pgprot(0)
+ #define __alloc_remap_buffer(dev, size, gfp, prot, ret, c)	NULL
+ #define __alloc_from_pool(dev, size, ret_page, c)	NULL
+ #define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+ #define __free_from_pool(cpu_addr, size)	0
+ #define __free_from_contiguous(dev, page, size)	do { } while (0)
+ #define __dma_free_remap(cpu_addr, size)	do { } while (0)
 
  #endif	/* CONFIG_MMU */
 
- static void *
- __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
- 	    pgprot_t prot, const void *caller)
+ static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+ 				   struct page **ret_page)
  {
  	struct page *page;
+ 	page = __dma_alloc_buffer(dev, size, gfp);
+ 	if (!page)
+ 		return NULL;
+
+ 	*ret_page = page;
+ 	return page_address(page);
+ }
+
+
+
+ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+ 			 gfp_t gfp, pgprot_t prot, const void *caller)
+ {
+ 	u64 mask = get_coherent_dma_mask(dev);
+ 	struct page *page;
  	void *addr;
+
+ #ifdef CONFIG_DMA_API_DEBUG
+ 	u64 limit = (mask + 1) & ~mask;
+ 	if (limit && size >= limit) {
+ 		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+ 			size, mask);
+ 		return NULL;
+ 	}
+ #endif
+
+ 	if (!mask)
+ 		return NULL;
+
+ 	if (mask < 0xffffffffULL)
+ 		gfp |= GFP_DMA;
 
  	/*
  	 * Following is a work-around (a.k.a. hack) to prevent pages
···
  	 */
  	gfp &= ~(__GFP_COMP);
 
- 	*handle = ~0;
+ 	*handle = DMA_ERROR_CODE;
  	size = PAGE_ALIGN(size);
 
- 	page = __dma_alloc_buffer(dev, size, gfp);
- 	if (!page)
- 		return NULL;
-
- 	if (!arch_is_coherent())
- 		addr = __dma_alloc_remap(page, size, gfp, prot, caller);
+ 	if (arch_is_coherent() || nommu())
+ 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+ 	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+ 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
+ 	else if (gfp & GFP_ATOMIC)
+ 		addr = __alloc_from_pool(dev, size, &page, caller);
  	else
- 		addr = page_address(page);
+ 		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
  	if (addr)
  		*handle = pfn_to_dma(dev, page_to_pfn(page));
- 	else
- 		__dma_free_buffer(page, size);
 
  	return addr;
  }
···
   * Allocate DMA-coherent memory space and return both the kernel remapped
   * virtual and bus address for that space.
   */
- void *
- dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+ void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+ 		    gfp_t gfp, struct dma_attrs *attrs)
  {
+ 	pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
  	void *memory;
 
  	if (dma_alloc_from_coherent(dev, size, handle, &memory))
  		return memory;
 
- 	return __dma_alloc(dev, size, handle, gfp,
- 			   pgprot_dmacoherent(pgprot_kernel),
+ 	return __dma_alloc(dev, size, handle, gfp, prot,
  			   __builtin_return_address(0));
  }
- EXPORT_SYMBOL(dma_alloc_coherent);
 
  /*
- * Allocate a writecombining region, in much the same way as
- * dma_alloc_coherent above.
+ * Create userspace mapping for the DMA-coherent memory.
  */
- void *
- dma_alloc_writecombine(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
- {
- 	return __dma_alloc(dev, size, handle, gfp,
- 			   pgprot_writecombine(pgprot_kernel),
- 			   __builtin_return_address(0));
- }
- EXPORT_SYMBOL(dma_alloc_writecombine);
-
- static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
- 		    void *cpu_addr, dma_addr_t dma_addr, size_t size)
+ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+ 		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
+ 		 struct dma_attrs *attrs)
  {
  	int ret = -ENXIO;
  #ifdef CONFIG_MMU
- 	unsigned long user_size, kern_size;
- 	struct arm_vmregion *c;
+ 	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+ 	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
 
- 	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+ 	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+ 		return ret;
 
- 	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
- 	if (c) {
- 		unsigned long off = vma->vm_pgoff;
-
- 		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
- 		if (off < kern_size &&
- 		    user_size <= (kern_size - off)) {
- 			ret = remap_pfn_range(vma, vma->vm_start,
- 					      page_to_pfn(c->vm_pages) + off,
- 					      user_size << PAGE_SHIFT,
- 					      vma->vm_page_prot);
- 		}
- 	}
+ 	ret = remap_pfn_range(vma, vma->vm_start,
+ 			      pfn + vma->vm_pgoff,
+ 			      vma->vm_end - vma->vm_start,
+ 			      vma->vm_page_prot);
  #endif	/* CONFIG_MMU */
 
  	return ret;
  }
 
- int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
- 		      void *cpu_addr, dma_addr_t dma_addr, size_t size)
- {
- 	vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
- 	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
- }
- EXPORT_SYMBOL(dma_mmap_coherent);
-
- int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
- 			  void *cpu_addr, dma_addr_t dma_addr, size_t size)
- {
- 	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
- 	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
- }
- EXPORT_SYMBOL(dma_mmap_writecombine);
-
  /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
- void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
+ void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
+ 		  dma_addr_t handle, struct dma_attrs *attrs)
  {
- 	WARN_ON(irqs_disabled());
+ 	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
  	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
  		return;
 
  	size = PAGE_ALIGN(size);
 
- 	if (!arch_is_coherent())
+ 	if (arch_is_coherent() || nommu()) {
+ 		__dma_free_buffer(page, size);
+ 	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
  		__dma_free_remap(cpu_addr, size);
-
- 	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
- }
- EXPORT_SYMBOL(dma_free_coherent);
-
- /*
- * Make an area consistent for devices.
- * Note: Drivers should NOT use this function directly, as it will break
- * platforms with CONFIG_DMABOUNCE.
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
- */
- void ___dma_single_cpu_to_dev(const void *kaddr, size_t size,
- 	enum dma_data_direction dir)
- {
- 	unsigned long paddr;
-
- 	BUG_ON(!virt_addr_valid(kaddr) || !virt_addr_valid(kaddr + size - 1));
-
- 	dmac_map_area(kaddr, size, dir);
-
- 	paddr = __pa(kaddr);
- 	if (dir == DMA_FROM_DEVICE) {
- 		outer_inv_range(paddr, paddr + size);
- 	} else {
- 		outer_clean_range(paddr, paddr + size);
+ 		__dma_free_buffer(page, size);
+ 	} else {
+ 		if (__free_from_pool(cpu_addr, size))
+ 			return;
+ 		/*
+ 		 * Non-atomic allocations cannot be freed with IRQs disabled
+ 		 */
+ 		WARN_ON(irqs_disabled());
+ 		__free_from_contiguous(dev, page, size);
  	}
- 	/* FIXME: non-speculating: flush on bidirectional mappings? */
  }
- EXPORT_SYMBOL(___dma_single_cpu_to_dev);
-
- void ___dma_single_dev_to_cpu(const void *kaddr, size_t size,
- 	enum dma_data_direction dir)
- {
- 	BUG_ON(!virt_addr_valid(kaddr) || !virt_addr_valid(kaddr + size - 1));
-
- 	/* FIXME: non-speculating: not required */
- 	/* don't bother invalidating if DMA to device */
- 	if (dir != DMA_TO_DEVICE) {
- 		unsigned long paddr = __pa(kaddr);
- 		outer_inv_range(paddr, paddr + size);
- 	}
-
- 	dmac_unmap_area(kaddr, size, dir);
- }
- EXPORT_SYMBOL(___dma_single_dev_to_cpu);
 
  static void dma_cache_maint_page(struct page *page, unsigned long offset,
  	size_t size, enum dma_data_direction dir,
···
  	} while (left);
  }
 
- void ___dma_page_cpu_to_dev(struct page *page, unsigned long off,
+ /*
+  * Make an area consistent for devices.
+  * Note: Drivers should NOT use this function directly, as it will break
+  * platforms with CONFIG_DMABOUNCE.
+  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
+  */
+ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
  	size_t size, enum dma_data_direction dir)
  {
  	unsigned long paddr;
···
  	}
  	/* FIXME: non-speculating: flush on bidirectional mappings? */
  }
- EXPORT_SYMBOL(___dma_page_cpu_to_dev);
 
- void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
+ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
  	size_t size, enum dma_data_direction dir)
  {
  	unsigned long paddr = page_to_phys(page) + off;
···
  	if (dir != DMA_TO_DEVICE && off == 0 && size >= PAGE_SIZE)
  		set_bit(PG_dcache_clean, &page->flags);
  }
- EXPORT_SYMBOL(___dma_page_dev_to_cpu);
 
  /**
- * dma_map_sg - map a set of SG buffers for streaming mode DMA
+ * arm_dma_map_sg - map a set of SG buffers for streaming mode DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to map
···
  * Device ownership issues as mentioned for dma_map_single are the same
  * here.
  */
- int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
- 		enum dma_data_direction dir)
+ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ 		enum dma_data_direction dir, struct dma_attrs *attrs)
  {
+ 	struct dma_map_ops *ops = get_dma_ops(dev);
  	struct scatterlist *s;
  	int i, j;
 
- 	BUG_ON(!valid_dma_direction(dir));
-
  	for_each_sg(sg, s, nents, i) {
- 		s->dma_address = __dma_map_page(dev, sg_page(s), s->offset,
- 						s->length, dir);
+ #ifdef CONFIG_NEED_SG_DMA_LENGTH
+ 		s->dma_length = s->length;
+ #endif
+ 		s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
+ 						s->length, dir, attrs);
  		if (dma_mapping_error(dev, s->dma_address))
  			goto bad_mapping;
  	}
- 	debug_dma_map_sg(dev, sg, nents, nents, dir);
  	return nents;
 
  bad_mapping:
  	for_each_sg(sg, s, i, j)
- 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+ 		ops->unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, attrs);
  	return 0;
  }
- EXPORT_SYMBOL(dma_map_sg);
 
  /**
- * dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
+ * arm_dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
···
  * Unmap a set of streaming mode DMA translations.  Again, CPU access
  * rules concerning calls here are the same as for dma_unmap_single().
  */
- void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
- 		enum dma_data_direction dir)
+ void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+ 		enum dma_data_direction dir, struct dma_attrs *attrs)
  {
+ 	struct dma_map_ops *ops = get_dma_ops(dev);
  	struct scatterlist *s;
- 	int i;
 
- 	debug_dma_unmap_sg(dev, sg, nents, dir);
+ 	int i;
 
  	for_each_sg(sg, s, nents, i)
- 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+ 		ops->unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, attrs);
  }
- EXPORT_SYMBOL(dma_unmap_sg);
 
  /**
- * dma_sync_sg_for_cpu
+ * arm_dma_sync_sg_for_cpu
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
- void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+ void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
  			int nents, enum dma_data_direction dir)
  {
+ 	struct dma_map_ops *ops = get_dma_ops(dev);
  	struct scatterlist *s;
  	int i;
 
- 	for_each_sg(sg, s, nents, i) {
- 		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s), 0,
- 					    sg_dma_len(s), dir))
- 			continue;
-
- 		__dma_page_dev_to_cpu(sg_page(s), s->offset,
- 				      s->length, dir);
- 	}
-
- 	debug_dma_sync_sg_for_cpu(dev, sg, nents, dir);
+ 	for_each_sg(sg, s, nents, i)
+ 		ops->sync_single_for_cpu(dev, sg_dma_address(s), s->length,
+ 					 dir);
  }
- EXPORT_SYMBOL(dma_sync_sg_for_cpu);
 
  /**
- * dma_sync_sg_for_device
+ * arm_dma_sync_sg_for_device
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
- void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+ void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
  			int nents, enum dma_data_direction dir)
  {
+ 	struct dma_map_ops *ops = get_dma_ops(dev);
  	struct scatterlist *s;
  	int i;
 
- 	for_each_sg(sg, s, nents, i) {
- 		if (!dmabounce_sync_for_device(dev, sg_dma_address(s), 0,
- 					       sg_dma_len(s), dir))
- 			continue;
-
- 		__dma_page_cpu_to_dev(sg_page(s), s->offset,
- 				      s->length, dir);
- 	}
-
- 	debug_dma_sync_sg_for_device(dev, sg, nents, dir);
+ 	for_each_sg(sg, s, nents, i)
+ 		ops->sync_single_for_device(dev, sg_dma_address(s), s->length,
+ 					    dir);
  }
- EXPORT_SYMBOL(dma_sync_sg_for_device);
 
  /*
   * Return whether the given device DMA address mask can be supported
···
  }
  EXPORT_SYMBOL(dma_supported);
 
- int dma_set_mask(struct device *dev, u64 dma_mask)
+ static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
  {
  	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
  		return -EIO;
 
- #ifndef CONFIG_DMABOUNCE
  	*dev->dma_mask = dma_mask;
- #endif
 
  	return 0;
  }
- EXPORT_SYMBOL(dma_set_mask);
 
  #define PREALLOC_DMA_DEBUG_ENTRIES	4096
···
  	return 0;
  }
  fs_initcall(dma_debug_do_init);
+
+ #ifdef CONFIG_ARM_DMA_USE_IOMMU
+
+ /* IOMMU */
+
+ static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
+ 				      size_t size)
+ {
+ 	unsigned int order = get_order(size);
+ 	unsigned int align = 0;
+ 	unsigned int count, start;
+ 	unsigned long flags;
+
+ 	count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
+ 		 (1 <<
mapping->order) - 1) >> mapping->order; 751 + 752 + if (order > mapping->order) 753 + align = (1 << (order - mapping->order)) - 1; 754 + 755 + spin_lock_irqsave(&mapping->lock, flags); 756 + start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits, 0, 757 + count, align); 758 + if (start > mapping->bits) { 759 + spin_unlock_irqrestore(&mapping->lock, flags); 760 + return DMA_ERROR_CODE; 761 + } 762 + 763 + bitmap_set(mapping->bitmap, start, count); 764 + spin_unlock_irqrestore(&mapping->lock, flags); 765 + 766 + return mapping->base + (start << (mapping->order + PAGE_SHIFT)); 767 + } 768 + 769 + static inline void __free_iova(struct dma_iommu_mapping *mapping, 770 + dma_addr_t addr, size_t size) 771 + { 772 + unsigned int start = (addr - mapping->base) >> 773 + (mapping->order + PAGE_SHIFT); 774 + unsigned int count = ((size >> PAGE_SHIFT) + 775 + (1 << mapping->order) - 1) >> mapping->order; 776 + unsigned long flags; 777 + 778 + spin_lock_irqsave(&mapping->lock, flags); 779 + bitmap_clear(mapping->bitmap, start, count); 780 + spin_unlock_irqrestore(&mapping->lock, flags); 781 + } 782 + 783 + static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp) 784 + { 785 + struct page **pages; 786 + int count = size >> PAGE_SHIFT; 787 + int array_size = count * sizeof(struct page *); 788 + int i = 0; 789 + 790 + if (array_size <= PAGE_SIZE) 791 + pages = kzalloc(array_size, gfp); 792 + else 793 + pages = vzalloc(array_size); 794 + if (!pages) 795 + return NULL; 796 + 797 + while (count) { 798 + int j, order = __ffs(count); 799 + 800 + pages[i] = alloc_pages(gfp | __GFP_NOWARN, order); 801 + while (!pages[i] && order) 802 + pages[i] = alloc_pages(gfp | __GFP_NOWARN, --order); 803 + if (!pages[i]) 804 + goto error; 805 + 806 + if (order) 807 + split_page(pages[i], order); 808 + j = 1 << order; 809 + while (--j) 810 + pages[i + j] = pages[i] + j; 811 + 812 + __dma_clear_buffer(pages[i], PAGE_SIZE << order); 813 + i += 1 << order; 814 + 
count -= 1 << order; 815 + } 816 + 817 + return pages; 818 + error: 819 + while (--i) 820 + if (pages[i]) 821 + __free_pages(pages[i], 0); 822 + if (array_size < PAGE_SIZE) 823 + kfree(pages); 824 + else 825 + vfree(pages); 826 + return NULL; 827 + } 828 + 829 + static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size) 830 + { 831 + int count = size >> PAGE_SHIFT; 832 + int array_size = count * sizeof(struct page *); 833 + int i; 834 + for (i = 0; i < count; i++) 835 + if (pages[i]) 836 + __free_pages(pages[i], 0); 837 + if (array_size < PAGE_SIZE) 838 + kfree(pages); 839 + else 840 + vfree(pages); 841 + return 0; 842 + } 843 + 844 + /* 845 + * Create a CPU mapping for a specified pages 846 + */ 847 + static void * 848 + __iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot) 849 + { 850 + struct arm_vmregion *c; 851 + size_t align; 852 + size_t count = size >> PAGE_SHIFT; 853 + int bit; 854 + 855 + if (!consistent_pte[0]) { 856 + pr_err("%s: not initialised\n", __func__); 857 + dump_stack(); 858 + return NULL; 859 + } 860 + 861 + /* 862 + * Align the virtual region allocation - maximum alignment is 863 + * a section size, minimum is a page size. This helps reduce 864 + * fragmentation of the DMA space, and also prevents allocations 865 + * smaller than a section from crossing a section boundary. 866 + */ 867 + bit = fls(size - 1); 868 + if (bit > SECTION_SHIFT) 869 + bit = SECTION_SHIFT; 870 + align = 1 << bit; 871 + 872 + /* 873 + * Allocate a virtual address in the consistent mapping region. 
874 + */ 875 + c = arm_vmregion_alloc(&consistent_head, align, size, 876 + gfp & ~(__GFP_DMA | __GFP_HIGHMEM), NULL); 877 + if (c) { 878 + pte_t *pte; 879 + int idx = CONSISTENT_PTE_INDEX(c->vm_start); 880 + int i = 0; 881 + u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1); 882 + 883 + pte = consistent_pte[idx] + off; 884 + c->priv = pages; 885 + 886 + do { 887 + BUG_ON(!pte_none(*pte)); 888 + 889 + set_pte_ext(pte, mk_pte(pages[i], prot), 0); 890 + pte++; 891 + off++; 892 + i++; 893 + if (off >= PTRS_PER_PTE) { 894 + off = 0; 895 + pte = consistent_pte[++idx]; 896 + } 897 + } while (i < count); 898 + 899 + dsb(); 900 + 901 + return (void *)c->vm_start; 902 + } 903 + return NULL; 904 + } 905 + 906 + /* 907 + * Create a mapping in device IO address space for specified pages 908 + */ 909 + static dma_addr_t 910 + __iommu_create_mapping(struct device *dev, struct page **pages, size_t size) 911 + { 912 + struct dma_iommu_mapping *mapping = dev->archdata.mapping; 913 + unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; 914 + dma_addr_t dma_addr, iova; 915 + int i, ret = DMA_ERROR_CODE; 916 + 917 + dma_addr = __alloc_iova(mapping, size); 918 + if (dma_addr == DMA_ERROR_CODE) 919 + return dma_addr; 920 + 921 + iova = dma_addr; 922 + for (i = 0; i < count; ) { 923 + unsigned int next_pfn = page_to_pfn(pages[i]) + 1; 924 + phys_addr_t phys = page_to_phys(pages[i]); 925 + unsigned int len, j; 926 + 927 + for (j = i + 1; j < count; j++, next_pfn++) 928 + if (page_to_pfn(pages[j]) != next_pfn) 929 + break; 930 + 931 + len = (j - i) << PAGE_SHIFT; 932 + ret = iommu_map(mapping->domain, iova, phys, len, 0); 933 + if (ret < 0) 934 + goto fail; 935 + iova += len; 936 + i = j; 937 + } 938 + return dma_addr; 939 + fail: 940 + iommu_unmap(mapping->domain, dma_addr, iova-dma_addr); 941 + __free_iova(mapping, dma_addr, size); 942 + return DMA_ERROR_CODE; 943 + } 944 + 945 + static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size) 946 + { 947 + 
struct dma_iommu_mapping *mapping = dev->archdata.mapping; 948 + 949 + /* 950 + * add optional in-page offset from iova to size and align 951 + * result to page size 952 + */ 953 + size = PAGE_ALIGN((iova & ~PAGE_MASK) + size); 954 + iova &= PAGE_MASK; 955 + 956 + iommu_unmap(mapping->domain, iova, size); 957 + __free_iova(mapping, iova, size); 958 + return 0; 959 + } 960 + 961 + static void *arm_iommu_alloc_attrs(struct device *dev, size_t size, 962 + dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs) 963 + { 964 + pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel); 965 + struct page **pages; 966 + void *addr = NULL; 967 + 968 + *handle = DMA_ERROR_CODE; 969 + size = PAGE_ALIGN(size); 970 + 971 + pages = __iommu_alloc_buffer(dev, size, gfp); 972 + if (!pages) 973 + return NULL; 974 + 975 + *handle = __iommu_create_mapping(dev, pages, size); 976 + if (*handle == DMA_ERROR_CODE) 977 + goto err_buffer; 978 + 979 + addr = __iommu_alloc_remap(pages, size, gfp, prot); 980 + if (!addr) 981 + goto err_mapping; 982 + 983 + return addr; 984 + 985 + err_mapping: 986 + __iommu_remove_mapping(dev, *handle, size); 987 + err_buffer: 988 + __iommu_free_buffer(dev, pages, size); 989 + return NULL; 990 + } 991 + 992 + static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma, 993 + void *cpu_addr, dma_addr_t dma_addr, size_t size, 994 + struct dma_attrs *attrs) 995 + { 996 + struct arm_vmregion *c; 997 + 998 + vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot); 999 + c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); 1000 + 1001 + if (c) { 1002 + struct page **pages = c->priv; 1003 + 1004 + unsigned long uaddr = vma->vm_start; 1005 + unsigned long usize = vma->vm_end - vma->vm_start; 1006 + int i = 0; 1007 + 1008 + do { 1009 + int ret; 1010 + 1011 + ret = vm_insert_page(vma, uaddr, pages[i++]); 1012 + if (ret) { 1013 + pr_err("Remapping memory, error: %d\n", ret); 1014 + return ret; 1015 + } 1016 + 1017 + uaddr += PAGE_SIZE; 
1018 + usize -= PAGE_SIZE; 1019 + } while (usize > 0); 1020 + } 1021 + return 0; 1022 + } 1023 + 1024 + /* 1025 + * free a page as defined by the above mapping. 1026 + * Must not be called with IRQs disabled. 1027 + */ 1028 + void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr, 1029 + dma_addr_t handle, struct dma_attrs *attrs) 1030 + { 1031 + struct arm_vmregion *c; 1032 + size = PAGE_ALIGN(size); 1033 + 1034 + c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); 1035 + if (c) { 1036 + struct page **pages = c->priv; 1037 + __dma_free_remap(cpu_addr, size); 1038 + __iommu_remove_mapping(dev, handle, size); 1039 + __iommu_free_buffer(dev, pages, size); 1040 + } 1041 + } 1042 + 1043 + /* 1044 + * Map a part of the scatter-gather list into contiguous io address space 1045 + */ 1046 + static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, 1047 + size_t size, dma_addr_t *handle, 1048 + enum dma_data_direction dir) 1049 + { 1050 + struct dma_iommu_mapping *mapping = dev->archdata.mapping; 1051 + dma_addr_t iova, iova_base; 1052 + int ret = 0; 1053 + unsigned int count; 1054 + struct scatterlist *s; 1055 + 1056 + size = PAGE_ALIGN(size); 1057 + *handle = DMA_ERROR_CODE; 1058 + 1059 + iova_base = iova = __alloc_iova(mapping, size); 1060 + if (iova == DMA_ERROR_CODE) 1061 + return -ENOMEM; 1062 + 1063 + for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) { 1064 + phys_addr_t phys = page_to_phys(sg_page(s)); 1065 + unsigned int len = PAGE_ALIGN(s->offset + s->length); 1066 + 1067 + if (!arch_is_coherent()) 1068 + __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); 1069 + 1070 + ret = iommu_map(mapping->domain, iova, phys, len, 0); 1071 + if (ret < 0) 1072 + goto fail; 1073 + count += len >> PAGE_SHIFT; 1074 + iova += len; 1075 + } 1076 + *handle = iova_base; 1077 + 1078 + return 0; 1079 + fail: 1080 + iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE); 1081 + __free_iova(mapping, iova_base, 
size); 1082 + return ret; 1083 + } 1084 + 1085 + /** 1086 + * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA 1087 + * @dev: valid struct device pointer 1088 + * @sg: list of buffers 1089 + * @nents: number of buffers to map 1090 + * @dir: DMA transfer direction 1091 + * 1092 + * Map a set of buffers described by scatterlist in streaming mode for DMA. 1093 + * The scatter gather list elements are merged together (if possible) and 1094 + * tagged with the appropriate dma address and length. They are obtained via 1095 + * sg_dma_{address,length}. 1096 + */ 1097 + int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents, 1098 + enum dma_data_direction dir, struct dma_attrs *attrs) 1099 + { 1100 + struct scatterlist *s = sg, *dma = sg, *start = sg; 1101 + int i, count = 0; 1102 + unsigned int offset = s->offset; 1103 + unsigned int size = s->offset + s->length; 1104 + unsigned int max = dma_get_max_seg_size(dev); 1105 + 1106 + for (i = 1; i < nents; i++) { 1107 + s = sg_next(s); 1108 + 1109 + s->dma_address = DMA_ERROR_CODE; 1110 + s->dma_length = 0; 1111 + 1112 + if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) { 1113 + if (__map_sg_chunk(dev, start, size, &dma->dma_address, 1114 + dir) < 0) 1115 + goto bad_mapping; 1116 + 1117 + dma->dma_address += offset; 1118 + dma->dma_length = size - offset; 1119 + 1120 + size = offset = s->offset; 1121 + start = s; 1122 + dma = sg_next(dma); 1123 + count += 1; 1124 + } 1125 + size += s->length; 1126 + } 1127 + if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0) 1128 + goto bad_mapping; 1129 + 1130 + dma->dma_address += offset; 1131 + dma->dma_length = size - offset; 1132 + 1133 + return count+1; 1134 + 1135 + bad_mapping: 1136 + for_each_sg(sg, s, count, i) 1137 + __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s)); 1138 + return 0; 1139 + } 1140 + 1141 + /** 1142 + * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg 1143 + * @dev: valid 
struct device pointer 1144 + * @sg: list of buffers 1145 + * @nents: number of buffers to unmap (same as was passed to dma_map_sg) 1146 + * @dir: DMA transfer direction (same as was passed to dma_map_sg) 1147 + * 1148 + * Unmap a set of streaming mode DMA translations. Again, CPU access 1149 + * rules concerning calls here are the same as for dma_unmap_single(). 1150 + */ 1151 + void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, 1152 + enum dma_data_direction dir, struct dma_attrs *attrs) 1153 + { 1154 + struct scatterlist *s; 1155 + int i; 1156 + 1157 + for_each_sg(sg, s, nents, i) { 1158 + if (sg_dma_len(s)) 1159 + __iommu_remove_mapping(dev, sg_dma_address(s), 1160 + sg_dma_len(s)); 1161 + if (!arch_is_coherent()) 1162 + __dma_page_dev_to_cpu(sg_page(s), s->offset, 1163 + s->length, dir); 1164 + } 1165 + } 1166 + 1167 + /** 1168 + * arm_iommu_sync_sg_for_cpu 1169 + * @dev: valid struct device pointer 1170 + * @sg: list of buffers 1171 + * @nents: number of buffers to map (returned from dma_map_sg) 1172 + * @dir: DMA transfer direction (same as was passed to dma_map_sg) 1173 + */ 1174 + void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, 1175 + int nents, enum dma_data_direction dir) 1176 + { 1177 + struct scatterlist *s; 1178 + int i; 1179 + 1180 + for_each_sg(sg, s, nents, i) 1181 + if (!arch_is_coherent()) 1182 + __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir); 1183 + 1184 + } 1185 + 1186 + /** 1187 + * arm_iommu_sync_sg_for_device 1188 + * @dev: valid struct device pointer 1189 + * @sg: list of buffers 1190 + * @nents: number of buffers to map (returned from dma_map_sg) 1191 + * @dir: DMA transfer direction (same as was passed to dma_map_sg) 1192 + */ 1193 + void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg, 1194 + int nents, enum dma_data_direction dir) 1195 + { 1196 + struct scatterlist *s; 1197 + int i; 1198 + 1199 + for_each_sg(sg, s, nents, i) 1200 + if 
(!arch_is_coherent()) 1201 + __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); 1202 + } 1203 + 1204 + 1205 + /** 1206 + * arm_iommu_map_page 1207 + * @dev: valid struct device pointer 1208 + * @page: page that buffer resides in 1209 + * @offset: offset into page for start of buffer 1210 + * @size: size of buffer to map 1211 + * @dir: DMA transfer direction 1212 + * 1213 + * IOMMU aware version of arm_dma_map_page() 1214 + */ 1215 + static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page, 1216 + unsigned long offset, size_t size, enum dma_data_direction dir, 1217 + struct dma_attrs *attrs) 1218 + { 1219 + struct dma_iommu_mapping *mapping = dev->archdata.mapping; 1220 + dma_addr_t dma_addr; 1221 + int ret, len = PAGE_ALIGN(size + offset); 1222 + 1223 + if (!arch_is_coherent()) 1224 + __dma_page_cpu_to_dev(page, offset, size, dir); 1225 + 1226 + dma_addr = __alloc_iova(mapping, len); 1227 + if (dma_addr == DMA_ERROR_CODE) 1228 + return dma_addr; 1229 + 1230 + ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0); 1231 + if (ret < 0) 1232 + goto fail; 1233 + 1234 + return dma_addr + offset; 1235 + fail: 1236 + __free_iova(mapping, dma_addr, len); 1237 + return DMA_ERROR_CODE; 1238 + } 1239 + 1240 + /** 1241 + * arm_iommu_unmap_page 1242 + * @dev: valid struct device pointer 1243 + * @handle: DMA address of buffer 1244 + * @size: size of buffer (same as passed to dma_map_page) 1245 + * @dir: DMA transfer direction (same as passed to dma_map_page) 1246 + * 1247 + * IOMMU aware version of arm_dma_unmap_page() 1248 + */ 1249 + static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, 1250 + size_t size, enum dma_data_direction dir, 1251 + struct dma_attrs *attrs) 1252 + { 1253 + struct dma_iommu_mapping *mapping = dev->archdata.mapping; 1254 + dma_addr_t iova = handle & PAGE_MASK; 1255 + struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); 1256 + int offset = handle & ~PAGE_MASK; 1257 + 
int len = PAGE_ALIGN(size + offset); 1258 + 1259 + if (!iova) 1260 + return; 1261 + 1262 + if (!arch_is_coherent()) 1263 + __dma_page_dev_to_cpu(page, offset, size, dir); 1264 + 1265 + iommu_unmap(mapping->domain, iova, len); 1266 + __free_iova(mapping, iova, len); 1267 + } 1268 + 1269 + static void arm_iommu_sync_single_for_cpu(struct device *dev, 1270 + dma_addr_t handle, size_t size, enum dma_data_direction dir) 1271 + { 1272 + struct dma_iommu_mapping *mapping = dev->archdata.mapping; 1273 + dma_addr_t iova = handle & PAGE_MASK; 1274 + struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); 1275 + unsigned int offset = handle & ~PAGE_MASK; 1276 + 1277 + if (!iova) 1278 + return; 1279 + 1280 + if (!arch_is_coherent()) 1281 + __dma_page_dev_to_cpu(page, offset, size, dir); 1282 + } 1283 + 1284 + static void arm_iommu_sync_single_for_device(struct device *dev, 1285 + dma_addr_t handle, size_t size, enum dma_data_direction dir) 1286 + { 1287 + struct dma_iommu_mapping *mapping = dev->archdata.mapping; 1288 + dma_addr_t iova = handle & PAGE_MASK; 1289 + struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); 1290 + unsigned int offset = handle & ~PAGE_MASK; 1291 + 1292 + if (!iova) 1293 + return; 1294 + 1295 + __dma_page_cpu_to_dev(page, offset, size, dir); 1296 + } 1297 + 1298 + struct dma_map_ops iommu_ops = { 1299 + .alloc = arm_iommu_alloc_attrs, 1300 + .free = arm_iommu_free_attrs, 1301 + .mmap = arm_iommu_mmap_attrs, 1302 + 1303 + .map_page = arm_iommu_map_page, 1304 + .unmap_page = arm_iommu_unmap_page, 1305 + .sync_single_for_cpu = arm_iommu_sync_single_for_cpu, 1306 + .sync_single_for_device = arm_iommu_sync_single_for_device, 1307 + 1308 + .map_sg = arm_iommu_map_sg, 1309 + .unmap_sg = arm_iommu_unmap_sg, 1310 + .sync_sg_for_cpu = arm_iommu_sync_sg_for_cpu, 1311 + .sync_sg_for_device = arm_iommu_sync_sg_for_device, 1312 + }; 1313 + 1314 + /** 1315 + * arm_iommu_create_mapping 1316 + * @bus: pointer to the bus 
holding the client device (for IOMMU calls) 1317 + * @base: start address of the valid IO address space 1318 + * @size: size of the valid IO address space 1319 + * @order: accuracy of the IO addresses allocations 1320 + * 1321 + * Creates a mapping structure which holds information about used/unused 1322 + * IO address ranges, which is required to perform memory allocation and 1323 + * mapping with IOMMU aware functions. 1324 + * 1325 + * The client device need to be attached to the mapping with 1326 + * arm_iommu_attach_device function. 1327 + */ 1328 + struct dma_iommu_mapping * 1329 + arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size, 1330 + int order) 1331 + { 1332 + unsigned int count = size >> (PAGE_SHIFT + order); 1333 + unsigned int bitmap_size = BITS_TO_LONGS(count) * sizeof(long); 1334 + struct dma_iommu_mapping *mapping; 1335 + int err = -ENOMEM; 1336 + 1337 + if (!count) 1338 + return ERR_PTR(-EINVAL); 1339 + 1340 + mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL); 1341 + if (!mapping) 1342 + goto err; 1343 + 1344 + mapping->bitmap = kzalloc(bitmap_size, GFP_KERNEL); 1345 + if (!mapping->bitmap) 1346 + goto err2; 1347 + 1348 + mapping->base = base; 1349 + mapping->bits = BITS_PER_BYTE * bitmap_size; 1350 + mapping->order = order; 1351 + spin_lock_init(&mapping->lock); 1352 + 1353 + mapping->domain = iommu_domain_alloc(bus); 1354 + if (!mapping->domain) 1355 + goto err3; 1356 + 1357 + kref_init(&mapping->kref); 1358 + return mapping; 1359 + err3: 1360 + kfree(mapping->bitmap); 1361 + err2: 1362 + kfree(mapping); 1363 + err: 1364 + return ERR_PTR(err); 1365 + } 1366 + 1367 + static void release_iommu_mapping(struct kref *kref) 1368 + { 1369 + struct dma_iommu_mapping *mapping = 1370 + container_of(kref, struct dma_iommu_mapping, kref); 1371 + 1372 + iommu_domain_free(mapping->domain); 1373 + kfree(mapping->bitmap); 1374 + kfree(mapping); 1375 + } 1376 + 1377 + void arm_iommu_release_mapping(struct dma_iommu_mapping 
*mapping) 1378 + { 1379 + if (mapping) 1380 + kref_put(&mapping->kref, release_iommu_mapping); 1381 + } 1382 + 1383 + /** 1384 + * arm_iommu_attach_device 1385 + * @dev: valid struct device pointer 1386 + * @mapping: io address space mapping structure (returned from 1387 + * arm_iommu_create_mapping) 1388 + * 1389 + * Attaches specified io address space mapping to the provided device, 1390 + * this replaces the dma operations (dma_map_ops pointer) with the 1391 + * IOMMU aware version. More than one client might be attached to 1392 + * the same io address space mapping. 1393 + */ 1394 + int arm_iommu_attach_device(struct device *dev, 1395 + struct dma_iommu_mapping *mapping) 1396 + { 1397 + int err; 1398 + 1399 + err = iommu_attach_device(mapping->domain, dev); 1400 + if (err) 1401 + return err; 1402 + 1403 + kref_get(&mapping->kref); 1404 + dev->archdata.mapping = mapping; 1405 + set_dma_ops(dev, &iommu_ops); 1406 + 1407 + pr_info("Attached IOMMU controller to %s device.\n", dev_name(dev)); 1408 + return 0; 1409 + } 1410 + 1411 + #endif
arch/arm/mm/init.c | +19 -4
···
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/memblock.h>
···
 }
 #endif
 
+void __init setup_dma_zone(struct machine_desc *mdesc)
+{
+#ifdef CONFIG_ZONE_DMA
+	if (mdesc->dma_zone_size) {
+		arm_dma_zone_size = mdesc->dma_zone_size;
+		arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
+	} else
+		arm_dma_limit = 0xffffffff;
+#endif
+}
+
 static void __init arm_bootmem_free(unsigned long min, unsigned long max_low,
 	unsigned long max_high)
 {
···
 	 * Adjust the sizes according to any special requirements for
 	 * this machine type.
 	 */
-	if (arm_dma_zone_size) {
+	if (arm_dma_zone_size)
 		arm_adjust_dma_zone(zone_size, zhole_size,
 			arm_dma_zone_size >> PAGE_SHIFT);
-		arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
-	} else
-		arm_dma_limit = 0xffffffff;
 #endif
 
 	free_area_init_node(0, zone_size, min, zhole_size);
···
 	/* reserve any platform specific memblock areas */
 	if (mdesc->reserve)
 		mdesc->reserve();
+
+	/*
+	 * reserve memory for DMA contigouos allocations,
+	 * must come from DMA area inside low memory
+	 */
+	dma_contiguous_reserve(min(arm_dma_limit, arm_lowmem_limit));
 
 	arm_memblock_steal_permitted = false;
 	memblock_allow_resize();
arch/arm/mm/mm.h | +3
···
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
arch/arm/mm/mmu.c | +20 -11
···
 			  PMD_SECT_UNCACHED | PMD_SECT_XN,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
···
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
···
 		mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
···
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
···
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 #ifndef CONFIG_ARM_LPAE
···
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
···
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
···
 	}
 #endif
 	meminfo.nr_banks = j;
-	high_memory = __va(lowmem_limit - 1) + 1;
-	memblock_set_current_limit(lowmem_limit);
+	high_memory = __va(arm_lowmem_limit - 1) + 1;
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
···
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
···
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
···
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
arch/arm/mm/vmregion.h | +1 -1
···
 	struct list_head	vm_list;
 	unsigned long		vm_start;
 	unsigned long		vm_end;
-	struct page		*vm_pages;
+	void			*priv;
 	int			vm_active;
 	const void		*caller;
 };
arch/x86/Kconfig | +1
···
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_DMA_ATTRS
+	select HAVE_DMA_CONTIGUOUS if !SWIOTLB
 	select HAVE_KRETPROBES
 	select HAVE_OPTPROBES
 	select HAVE_FTRACE_MCOUNT_RECORD
arch/x86/include/asm/dma-contiguous.h | +13
···
+#ifndef ASMX86_DMA_CONTIGUOUS_H
+#define ASMX86_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <asm-generic/dma-contiguous.h>
+
+static inline void
+dma_contiguous_early_fixup(phys_addr_t base, unsigned long size) { }
+
+#endif
+#endif
arch/x86/include/asm/dma-mapping.h | +5
···
 #include <asm/io.h>
 #include <asm/swiotlb.h>
 #include <asm-generic/dma-coherent.h>
+#include <linux/dma-contiguous.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
···
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 					dma_addr_t *dma_addr, gfp_t flag,
 					struct dma_attrs *attrs);
+
+extern void dma_generic_free_coherent(struct device *dev, size_t size,
+				      void *vaddr, dma_addr_t dma_addr,
+				      struct dma_attrs *attrs);
 
 #ifdef CONFIG_X86_DMA_REMAP /* Platform code defines bridge-specific code */
 extern bool dma_capable(struct device *dev, dma_addr_t addr, size_t size);
arch/x86/kernel/pci-dma.c | +16 -2
···
 				 struct dma_attrs *attrs)
 {
 	unsigned long dma_mask;
-	struct page *page;
+	struct page *page = NULL;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	dma_addr_t addr;
 
 	dma_mask = dma_alloc_coherent_mask(dev, flag);
 
 	flag |= __GFP_ZERO;
again:
-	page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+	if (!(flag & GFP_ATOMIC))
+		page = dma_alloc_from_contiguous(dev, count, get_order(size));
+	if (!page)
+		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
 	if (!page)
 		return NULL;
···
 
 	*dma_addr = addr;
 	return page_address(page);
+}
+
+void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
+			       dma_addr_t dma_addr, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct page *page = virt_to_page(vaddr);
+
+	if (!dma_release_from_contiguous(dev, page, count))
+		free_pages((unsigned long)vaddr, get_order(size));
+}
 
 /*
arch/x86/kernel/pci-nommu.c | +1 -7
···
 	return nents;
 }
 
-static void nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
-				dma_addr_t dma_addr, struct dma_attrs *attrs)
-{
-	free_pages((unsigned long)vaddr, get_order(size));
-}
-
 static void nommu_sync_single_for_device(struct device *dev,
 			dma_addr_t addr, size_t size,
 			enum dma_data_direction dir)
···
 
 struct dma_map_ops nommu_dma_ops = {
 	.alloc			= dma_generic_alloc_coherent,
-	.free			= nommu_free_coherent,
+	.free			= dma_generic_free_coherent,
 	.map_sg			= nommu_map_sg,
 	.map_page		= nommu_map_page,
 	.sync_single_for_device = nommu_sync_single_for_device,
arch/x86/kernel/setup.c | +2
···
 #include <asm/pci-direct.h>
 #include <linux/init_ohci1394_dma.h>
 #include <linux/kvm_para.h>
+#include <linux/dma-contiguous.h>
 
 #include <linux/errno.h>
 #include <linux/kernel.h>
···
 	}
 #endif
 	memblock.current_limit = get_max_mapped();
+	dma_contiguous_reserve(0);
 
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
drivers/base/Kconfig | +89
···
 	  APIs extension; the file's descriptor can then be passed on to other
 	  driver.
 
+config CMA
+	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
+	select MIGRATION
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	depends on DEBUG_KERNEL
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_MBYTES
+	int "Size in Mega Bytes"
+	depends on !CMA_SIZE_SEL_PERCENTAGE
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	depends on !CMA_SIZE_SEL_MBYTES
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_MBYTES
+	bool "Use mega bytes value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundreds kilobytes, but
+	  for larger buffers it just a memory waste. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+config CMA_AREAS
+	int "Maximum count of the CMA device-private areas"
+	default 7
+	help
+	  CMA allows to create CMA areas for particular devices. This parameter
+	  sets the maximum number of such device private CMA areas in the
+	  system.
+
+	  If unsure, leave the default value "7".
+
+endif
+
 endmenu
+1
drivers/base/Makefile
··· 6 6 attribute_container.o transport_class.o \ 7 7 topology.o 8 8 obj-$(CONFIG_DEVTMPFS) += devtmpfs.o 9 + obj-$(CONFIG_CMA) += dma-contiguous.o 9 10 obj-y += power/ 10 11 obj-$(CONFIG_HAS_DMA) += dma-mapping.o 11 12 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
+42
drivers/base/dma-coherent.c
··· 10 10 struct dma_coherent_mem { 11 11 void *virt_base; 12 12 dma_addr_t device_base; 13 + phys_addr_t pfn_base; 13 14 int size; 14 15 int flags; 15 16 unsigned long *bitmap; ··· 45 44 46 45 dev->dma_mem->virt_base = mem_base; 47 46 dev->dma_mem->device_base = device_addr; 47 + dev->dma_mem->pfn_base = PFN_DOWN(bus_addr); 48 48 dev->dma_mem->size = pages; 49 49 dev->dma_mem->flags = flags; 50 50 ··· 178 176 return 0; 179 177 } 180 178 EXPORT_SYMBOL(dma_release_from_coherent); 179 + 180 + /** 181 + * dma_mmap_from_coherent() - try to mmap the memory allocated from 182 + * per-device coherent memory pool to userspace 183 + * @dev: device from which the memory was allocated 184 + * @vma: vm_area for the userspace memory 185 + * @vaddr: cpu address returned by dma_alloc_from_coherent 186 + * @size: size of the memory buffer allocated by dma_alloc_from_coherent 187 + * 188 + * This checks whether the memory was allocated from the per-device 189 + * coherent memory pool and if so, maps that memory to the provided vma. 190 + * 191 + * Returns 1 if we correctly mapped the memory, or 0 if 192 + * dma_release_coherent() should proceed with mapping memory from 193 + * generic pools. 194 + */ 195 + int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma, 196 + void *vaddr, size_t size, int *ret) 197 + { 198 + struct dma_coherent_mem *mem = dev ? 
dev->dma_mem : NULL; 199 + 200 + if (mem && vaddr >= mem->virt_base && vaddr + size <= 201 + (mem->virt_base + (mem->size << PAGE_SHIFT))) { 202 + unsigned long off = vma->vm_pgoff; 203 + int start = (vaddr - mem->virt_base) >> PAGE_SHIFT; 204 + int user_count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; 205 + int count = size >> PAGE_SHIFT; 206 + 207 + *ret = -ENXIO; 208 + if (off < count && user_count <= count - off) { 209 + unsigned pfn = mem->pfn_base + start + off; 210 + *ret = remap_pfn_range(vma, vma->vm_start, pfn, 211 + user_count << PAGE_SHIFT, 212 + vma->vm_page_prot); 213 + } 214 + return 1; 215 + } 216 + return 0; 217 + } 218 + EXPORT_SYMBOL(dma_mmap_from_coherent);
+401
drivers/base/dma-contiguous.c
··· 1 + /* 2 + * Contiguous Memory Allocator for DMA mapping framework 3 + * Copyright (c) 2010-2011 by Samsung Electronics. 4 + * Written by: 5 + * Marek Szyprowski <m.szyprowski@samsung.com> 6 + * Michal Nazarewicz <mina86@mina86.com> 7 + * 8 + * This program is free software; you can redistribute it and/or 9 + * modify it under the terms of the GNU General Public License as 10 + * published by the Free Software Foundation; either version 2 of the 11 + * License, or (at your option) any later version. 12 + */ 13 + 14 + #define pr_fmt(fmt) "cma: " fmt 15 + 16 + #ifdef CONFIG_CMA_DEBUG 17 + #ifndef DEBUG 18 + # define DEBUG 19 + #endif 20 + #endif 21 + 22 + #include <asm/page.h> 23 + #include <asm/dma-contiguous.h> 24 + 25 + #include <linux/memblock.h> 26 + #include <linux/err.h> 27 + #include <linux/mm.h> 28 + #include <linux/mutex.h> 29 + #include <linux/page-isolation.h> 30 + #include <linux/slab.h> 31 + #include <linux/swap.h> 32 + #include <linux/mm_types.h> 33 + #include <linux/dma-contiguous.h> 34 + 35 + #ifndef SZ_1M 36 + #define SZ_1M (1 << 20) 37 + #endif 38 + 39 + struct cma { 40 + unsigned long base_pfn; 41 + unsigned long count; 42 + unsigned long *bitmap; 43 + }; 44 + 45 + struct cma *dma_contiguous_default_area; 46 + 47 + #ifdef CONFIG_CMA_SIZE_MBYTES 48 + #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES 49 + #else 50 + #define CMA_SIZE_MBYTES 0 51 + #endif 52 + 53 + /* 54 + * Default global CMA area size can be defined in kernel's .config. 55 + * This is useful mainly for distro maintainers to create a kernel 56 + * that works correctly for most supported systems. 57 + * The size can be set in bytes or as a percentage of the total memory 58 + * in the system. 59 + * 60 + * Users who want to set the size of the global CMA area for their system 61 + * should use the cma= kernel parameter. 
62 + */ 63 + static const unsigned long size_bytes = CMA_SIZE_MBYTES * SZ_1M; 64 + static long size_cmdline = -1; 65 + 66 + static int __init early_cma(char *p) 67 + { 68 + pr_debug("%s(%s)\n", __func__, p); 69 + size_cmdline = memparse(p, &p); 70 + return 0; 71 + } 72 + early_param("cma", early_cma); 73 + 74 + #ifdef CONFIG_CMA_SIZE_PERCENTAGE 75 + 76 + static unsigned long __init __maybe_unused cma_early_percent_memory(void) 77 + { 78 + struct memblock_region *reg; 79 + unsigned long total_pages = 0; 80 + 81 + /* 82 + * We cannot use memblock_phys_mem_size() here, because 83 + * memblock_analyze() has not been called yet. 84 + */ 85 + for_each_memblock(memory, reg) 86 + total_pages += memblock_region_memory_end_pfn(reg) - 87 + memblock_region_memory_base_pfn(reg); 88 + 89 + return (total_pages * CONFIG_CMA_SIZE_PERCENTAGE / 100) << PAGE_SHIFT; 90 + } 91 + 92 + #else 93 + 94 + static inline __maybe_unused unsigned long cma_early_percent_memory(void) 95 + { 96 + return 0; 97 + } 98 + 99 + #endif 100 + 101 + /** 102 + * dma_contiguous_reserve() - reserve area for contiguous memory handling 103 + * @limit: End address of the reserved memory (optional, 0 for any). 104 + * 105 + * This function reserves memory from early allocator. It should be 106 + * called by arch specific code once the early allocator (memblock or bootmem) 107 + * has been activated and all other subsystems have already allocated/reserved 108 + * memory. 
109 + */ 110 + void __init dma_contiguous_reserve(phys_addr_t limit) 111 + { 112 + unsigned long selected_size = 0; 113 + 114 + pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit); 115 + 116 + if (size_cmdline != -1) { 117 + selected_size = size_cmdline; 118 + } else { 119 + #ifdef CONFIG_CMA_SIZE_SEL_MBYTES 120 + selected_size = size_bytes; 121 + #elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE) 122 + selected_size = cma_early_percent_memory(); 123 + #elif defined(CONFIG_CMA_SIZE_SEL_MIN) 124 + selected_size = min(size_bytes, cma_early_percent_memory()); 125 + #elif defined(CONFIG_CMA_SIZE_SEL_MAX) 126 + selected_size = max(size_bytes, cma_early_percent_memory()); 127 + #endif 128 + } 129 + 130 + if (selected_size) { 131 + pr_debug("%s: reserving %ld MiB for global area\n", __func__, 132 + selected_size / SZ_1M); 133 + 134 + dma_declare_contiguous(NULL, selected_size, 0, limit); 135 + } 136 + }; 137 + 138 + static DEFINE_MUTEX(cma_mutex); 139 + 140 + static __init int cma_activate_area(unsigned long base_pfn, unsigned long count) 141 + { 142 + unsigned long pfn = base_pfn; 143 + unsigned i = count >> pageblock_order; 144 + struct zone *zone; 145 + 146 + WARN_ON_ONCE(!pfn_valid(pfn)); 147 + zone = page_zone(pfn_to_page(pfn)); 148 + 149 + do { 150 + unsigned j; 151 + base_pfn = pfn; 152 + for (j = pageblock_nr_pages; j; --j, pfn++) { 153 + WARN_ON_ONCE(!pfn_valid(pfn)); 154 + if (page_zone(pfn_to_page(pfn)) != zone) 155 + return -EINVAL; 156 + } 157 + init_cma_reserved_pageblock(pfn_to_page(base_pfn)); 158 + } while (--i); 159 + return 0; 160 + } 161 + 162 + static __init struct cma *cma_create_area(unsigned long base_pfn, 163 + unsigned long count) 164 + { 165 + int bitmap_size = BITS_TO_LONGS(count) * sizeof(long); 166 + struct cma *cma; 167 + int ret = -ENOMEM; 168 + 169 + pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count); 170 + 171 + cma = kmalloc(sizeof *cma, GFP_KERNEL); 172 + if (!cma) 173 + return ERR_PTR(-ENOMEM); 174 + 175 + 
cma->base_pfn = base_pfn; 176 + cma->count = count; 177 + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); 178 + 179 + if (!cma->bitmap) 180 + goto no_mem; 181 + 182 + ret = cma_activate_area(base_pfn, count); 183 + if (ret) 184 + goto error; 185 + 186 + pr_debug("%s: returned %p\n", __func__, (void *)cma); 187 + return cma; 188 + 189 + error: 190 + kfree(cma->bitmap); 191 + no_mem: 192 + kfree(cma); 193 + return ERR_PTR(ret); 194 + } 195 + 196 + static struct cma_reserved { 197 + phys_addr_t start; 198 + unsigned long size; 199 + struct device *dev; 200 + } cma_reserved[MAX_CMA_AREAS] __initdata; 201 + static unsigned cma_reserved_count __initdata; 202 + 203 + static int __init cma_init_reserved_areas(void) 204 + { 205 + struct cma_reserved *r = cma_reserved; 206 + unsigned i = cma_reserved_count; 207 + 208 + pr_debug("%s()\n", __func__); 209 + 210 + for (; i; --i, ++r) { 211 + struct cma *cma; 212 + cma = cma_create_area(PFN_DOWN(r->start), 213 + r->size >> PAGE_SHIFT); 214 + if (!IS_ERR(cma)) 215 + dev_set_cma_area(r->dev, cma); 216 + } 217 + return 0; 218 + } 219 + core_initcall(cma_init_reserved_areas); 220 + 221 + /** 222 + * dma_declare_contiguous() - reserve area for contiguous memory handling 223 + * for particular device 224 + * @dev: Pointer to device structure. 225 + * @size: Size of the reserved memory. 226 + * @base: Start address of the reserved memory (optional, 0 for any). 227 + * @limit: End address of the reserved memory (optional, 0 for any). 228 + * 229 + * This function reserves memory for the specified device. It should be 230 + * called by board-specific code while the early allocator (memblock or bootmem) 231 + * is still active. 
232 + */ 233 + int __init dma_declare_contiguous(struct device *dev, unsigned long size, 234 + phys_addr_t base, phys_addr_t limit) 235 + { 236 + struct cma_reserved *r = &cma_reserved[cma_reserved_count]; 237 + unsigned long alignment; 238 + 239 + pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, 240 + (unsigned long)size, (unsigned long)base, 241 + (unsigned long)limit); 242 + 243 + /* Sanity checks */ 244 + if (cma_reserved_count == ARRAY_SIZE(cma_reserved)) { 245 + pr_err("Not enough slots for CMA reserved regions!\n"); 246 + return -ENOSPC; 247 + } 248 + 249 + if (!size) 250 + return -EINVAL; 251 + 252 + /* Sanitise input arguments */ 253 + alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order); 254 + base = ALIGN(base, alignment); 255 + size = ALIGN(size, alignment); 256 + limit &= ~(alignment - 1); 257 + 258 + /* Reserve memory */ 259 + if (base) { 260 + if (memblock_is_region_reserved(base, size) || 261 + memblock_reserve(base, size) < 0) { 262 + base = -EBUSY; 263 + goto err; 264 + } 265 + } else { 266 + /* 267 + * Use __memblock_alloc_base() since 268 + * memblock_alloc_base() panic()s. 269 + */ 270 + phys_addr_t addr = __memblock_alloc_base(size, alignment, limit); 271 + if (!addr) { 272 + base = -ENOMEM; 273 + goto err; 274 + } else if (addr + size > ~(unsigned long)0) { 275 + memblock_free(addr, size); 276 + base = -EINVAL; 277 + goto err; 278 + } else { 279 + base = addr; 280 + } 281 + } 282 + 283 + /* 284 + * Each reserved area must be initialised later, when more kernel 285 + * subsystems (like slab allocator) are available. 286 + */ 287 + r->start = base; 288 + r->size = size; 289 + r->dev = dev; 290 + cma_reserved_count++; 291 + pr_info("CMA: reserved %ld MiB at %08lx\n", size / SZ_1M, 292 + (unsigned long)base); 293 + 294 + /* Architecture specific contiguous memory fixup. 
*/ 295 + dma_contiguous_early_fixup(base, size); 296 + return 0; 297 + err: 298 + pr_err("CMA: failed to reserve %ld MiB\n", size / SZ_1M); 299 + return base; 300 + } 301 + 302 + /** 303 + * dma_alloc_from_contiguous() - allocate pages from contiguous area 304 + * @dev: Pointer to device for which the allocation is performed. 305 + * @count: Requested number of pages. 306 + * @align: Requested alignment of pages (in PAGE_SIZE order). 307 + * 308 + * This function allocates memory buffer for specified device. It uses 309 + * device specific contiguous memory area if available or the default 310 + * global one. Requires architecture specific get_dev_cma_area() helper 311 + * function. 312 + */ 313 + struct page *dma_alloc_from_contiguous(struct device *dev, int count, 314 + unsigned int align) 315 + { 316 + unsigned long mask, pfn, pageno, start = 0; 317 + struct cma *cma = dev_get_cma_area(dev); 318 + int ret; 319 + 320 + if (!cma || !cma->count) 321 + return NULL; 322 + 323 + if (align > CONFIG_CMA_ALIGNMENT) 324 + align = CONFIG_CMA_ALIGNMENT; 325 + 326 + pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, 327 + count, align); 328 + 329 + if (!count) 330 + return NULL; 331 + 332 + mask = (1 << align) - 1; 333 + 334 + mutex_lock(&cma_mutex); 335 + 336 + for (;;) { 337 + pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 338 + start, count, mask); 339 + if (pageno >= cma->count) { 340 + ret = -ENOMEM; 341 + goto error; 342 + } 343 + 344 + pfn = cma->base_pfn + pageno; 345 + ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA); 346 + if (ret == 0) { 347 + bitmap_set(cma->bitmap, pageno, count); 348 + break; 349 + } else if (ret != -EBUSY) { 350 + goto error; 351 + } 352 + pr_debug("%s(): memory range at %p is busy, retrying\n", 353 + __func__, pfn_to_page(pfn)); 354 + /* try again with a bit different memory target */ 355 + start = pageno + mask + 1; 356 + } 357 + 358 + mutex_unlock(&cma_mutex); 359 + 360 + pr_debug("%s(): returned %p\n", 
__func__, pfn_to_page(pfn)); 361 + return pfn_to_page(pfn); 362 + error: 363 + mutex_unlock(&cma_mutex); 364 + return NULL; 365 + } 366 + 367 + /** 368 + * dma_release_from_contiguous() - release allocated pages 369 + * @dev: Pointer to device for which the pages were allocated. 370 + * @pages: Allocated pages. 371 + * @count: Number of allocated pages. 372 + * 373 + * This function releases memory allocated by dma_alloc_from_contiguous(). 374 + * It returns false when provided pages do not belong to contiguous area and 375 + * true otherwise. 376 + */ 377 + bool dma_release_from_contiguous(struct device *dev, struct page *pages, 378 + int count) 379 + { 380 + struct cma *cma = dev_get_cma_area(dev); 381 + unsigned long pfn; 382 + 383 + if (!cma || !pages) 384 + return false; 385 + 386 + pr_debug("%s(page %p)\n", __func__, (void *)pages); 387 + 388 + pfn = page_to_pfn(pages); 389 + 390 + if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count) 391 + return false; 392 + 393 + VM_BUG_ON(pfn + count > cma->base_pfn + cma->count); 394 + 395 + mutex_lock(&cma_mutex); 396 + bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count); 397 + free_contig_range(pfn, count); 398 + mutex_unlock(&cma_mutex); 399 + 400 + return true; 401 + }
+3 -1
include/asm-generic/dma-coherent.h
··· 3 3 4 4 #ifdef CONFIG_HAVE_GENERIC_DMA_COHERENT 5 5 /* 6 - * These two functions are only for dma allocator. 6 + * These three functions are only for dma allocator. 7 7 * Don't use them in device drivers. 8 8 */ 9 9 int dma_alloc_from_coherent(struct device *dev, ssize_t size, 10 10 dma_addr_t *dma_handle, void **ret); 11 11 int dma_release_from_coherent(struct device *dev, int order, void *vaddr); 12 12 13 + int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma, 14 + void *cpu_addr, size_t size, int *ret); 13 15 /* 14 16 * Standard interface 15 17 */
+28
include/asm-generic/dma-contiguous.h
··· 1 + #ifndef ASM_DMA_CONTIGUOUS_H 2 + #define ASM_DMA_CONTIGUOUS_H 3 + 4 + #ifdef __KERNEL__ 5 + #ifdef CONFIG_CMA 6 + 7 + #include <linux/device.h> 8 + #include <linux/dma-contiguous.h> 9 + 10 + static inline struct cma *dev_get_cma_area(struct device *dev) 11 + { 12 + if (dev && dev->cma_area) 13 + return dev->cma_area; 14 + return dma_contiguous_default_area; 15 + } 16 + 17 + static inline void dev_set_cma_area(struct device *dev, struct cma *cma) 18 + { 19 + if (dev) 20 + dev->cma_area = cma; 21 + if (!dev || !dma_contiguous_default_area) 22 + dma_contiguous_default_area = cma; 23 + } 24 + 25 + #endif 26 + #endif 27 + 28 + #endif
+4
include/linux/device.h
··· 667 667 668 668 struct dma_coherent_mem *dma_mem; /* internal for coherent mem 669 669 override */ 670 + #ifdef CONFIG_CMA 671 + struct cma *cma_area; /* contiguous memory area for dma 672 + allocations */ 673 + #endif 670 674 /* arch specific additions */ 671 675 struct dev_archdata archdata; 672 676
+110
include/linux/dma-contiguous.h
··· 1 + #ifndef __LINUX_CMA_H 2 + #define __LINUX_CMA_H 3 + 4 + /* 5 + * Contiguous Memory Allocator for DMA mapping framework 6 + * Copyright (c) 2010-2011 by Samsung Electronics. 7 + * Written by: 8 + * Marek Szyprowski <m.szyprowski@samsung.com> 9 + * Michal Nazarewicz <mina86@mina86.com> 10 + * 11 + * This program is free software; you can redistribute it and/or 12 + * modify it under the terms of the GNU General Public License as 13 + * published by the Free Software Foundation; either version 2 of the 14 + * License, or (at your option) any later version. 15 + */ 16 + 17 + /* 18 + * Contiguous Memory Allocator 19 + * 20 + * The Contiguous Memory Allocator (CMA) makes it possible to 21 + * allocate big contiguous chunks of memory after the system has 22 + * booted. 23 + * 24 + * Why is it needed? 25 + * 26 + * Various devices on embedded systems have no scatter-gather and/or 27 + * I/O mapping support and require contiguous blocks of memory to 28 + * operate. They include devices such as cameras, hardware video 29 + * encoders, etc. 30 + * 31 + * Such devices often require big memory buffers (a full HD frame 32 + * is, for instance, more than 2 megapixels, i.e. more than 6 33 + * MB of memory), which makes mechanisms such as kmalloc() or 34 + * alloc_page() ineffective. 35 + * 36 + * At the same time, a solution where a big memory region is 37 + * reserved for a device is suboptimal since often more memory is 38 + * reserved than strictly required and, moreover, the memory is 39 + * inaccessible to the page allocator even if device drivers don't use it. 40 + * 41 + * CMA tries to solve this issue by operating on memory regions 42 + * from which only movable pages can be allocated. This way, the kernel 43 + * can use the memory for the pagecache and, when a device driver requests 44 + * it, the allocated pages can be migrated. 45 + * 46 + * Driver usage 47 + * 48 + * CMA should not be used by device drivers directly. 
It is 49 + * only a helper framework for dma-mapping subsystem. 50 + * 51 + * For more information, see kernel-docs in drivers/base/dma-contiguous.c 52 + */ 53 + 54 + #ifdef __KERNEL__ 55 + 56 + struct cma; 57 + struct page; 58 + struct device; 59 + 60 + #ifdef CONFIG_CMA 61 + 62 + /* 63 + * There is always at least global CMA area and a few optional device 64 + * private areas configured in kernel .config. 65 + */ 66 + #define MAX_CMA_AREAS (1 + CONFIG_CMA_AREAS) 67 + 68 + extern struct cma *dma_contiguous_default_area; 69 + 70 + void dma_contiguous_reserve(phys_addr_t addr_limit); 71 + int dma_declare_contiguous(struct device *dev, unsigned long size, 72 + phys_addr_t base, phys_addr_t limit); 73 + 74 + struct page *dma_alloc_from_contiguous(struct device *dev, int count, 75 + unsigned int order); 76 + bool dma_release_from_contiguous(struct device *dev, struct page *pages, 77 + int count); 78 + 79 + #else 80 + 81 + #define MAX_CMA_AREAS (0) 82 + 83 + static inline void dma_contiguous_reserve(phys_addr_t limit) { } 84 + 85 + static inline 86 + int dma_declare_contiguous(struct device *dev, unsigned long size, 87 + phys_addr_t base, phys_addr_t limit) 88 + { 89 + return -ENOSYS; 90 + } 91 + 92 + static inline 93 + struct page *dma_alloc_from_contiguous(struct device *dev, int count, 94 + unsigned int order) 95 + { 96 + return NULL; 97 + } 98 + 99 + static inline 100 + bool dma_release_from_contiguous(struct device *dev, struct page *pages, 101 + int count) 102 + { 103 + return false; 104 + } 105 + 106 + #endif 107 + 108 + #endif 109 + 110 + #endif
+12
include/linux/gfp.h
··· 391 391 } 392 392 #endif /* CONFIG_PM_SLEEP */ 393 393 394 + #ifdef CONFIG_CMA 395 + 396 + /* The below functions must be run on a range from a single zone. */ 397 + extern int alloc_contig_range(unsigned long start, unsigned long end, 398 + unsigned migratetype); 399 + extern void free_contig_range(unsigned long pfn, unsigned nr_pages); 400 + 401 + /* CMA stuff */ 402 + extern void init_cma_reserved_pageblock(struct page *page); 403 + 404 + #endif 405 + 394 406 #endif /* __LINUX_GFP_H */
+40 -7
include/linux/mmzone.h
··· 35 35 */ 36 36 #define PAGE_ALLOC_COSTLY_ORDER 3 37 37 38 - #define MIGRATE_UNMOVABLE 0 39 - #define MIGRATE_RECLAIMABLE 1 40 - #define MIGRATE_MOVABLE 2 41 - #define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ 42 - #define MIGRATE_RESERVE 3 43 - #define MIGRATE_ISOLATE 4 /* can't allocate from here */ 44 - #define MIGRATE_TYPES 5 38 + enum { 39 + MIGRATE_UNMOVABLE, 40 + MIGRATE_RECLAIMABLE, 41 + MIGRATE_MOVABLE, 42 + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ 43 + MIGRATE_RESERVE = MIGRATE_PCPTYPES, 44 + #ifdef CONFIG_CMA 45 + /* 46 + * The MIGRATE_CMA migration type is designed to mimic the way 47 + * ZONE_MOVABLE works. Only movable pages can be allocated 48 + * from MIGRATE_CMA pageblocks and the page allocator never 49 + * implicitly changes the migration type of a MIGRATE_CMA pageblock. 50 + * 51 + * The way to use it is to change the migratetype of a range of 52 + * pageblocks to MIGRATE_CMA, which can be done with the 53 + * __free_pageblock_cma() function. What is important though 54 + * is that the range of pageblocks must be aligned to 55 + * MAX_ORDER_NR_PAGES should the biggest page be bigger than 56 + * a single pageblock. 57 + */ 58 + MIGRATE_CMA, 59 + #endif 60 + MIGRATE_ISOLATE, /* can't allocate from here */ 61 + MIGRATE_TYPES 62 + }; 63 + 64 + #ifdef CONFIG_CMA 65 + # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) 66 + # define cma_wmark_pages(zone) zone->min_cma_pages 67 + #else 68 + # define is_migrate_cma(migratetype) false 69 + # define cma_wmark_pages(zone) 0 70 + #endif 45 71 46 72 #define for_each_migratetype_order(order, type) \ 47 73 for (order = 0; order < MAX_ORDER; order++) \ ··· 372 346 #ifdef CONFIG_MEMORY_HOTPLUG 373 347 /* see spanned/present_pages for more description */ 374 348 seqlock_t span_seqlock; 349 + #endif 350 + #ifdef CONFIG_CMA 351 + /* 352 + * CMA needs to increase watermark levels during the allocation 353 + * process to make sure that the system is not starved. 
354 + */ 355 + unsigned long min_cma_pages; 375 356 #endif 376 357 struct free_area free_area[MAX_ORDER]; 377 358
+9 -9
include/linux/page-isolation.h
··· 3 3 4 4 /* 5 5 * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. 6 - * If specified range includes migrate types other than MOVABLE, 6 + * If specified range includes migrate types other than MOVABLE or CMA, 7 7 * this will fail with -EBUSY. 8 8 * 9 9 * For isolating all pages in the range finally, the caller have to ··· 11 11 * test it. 12 12 */ 13 13 extern int 14 - start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); 14 + start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, 15 + unsigned migratetype); 15 16 16 17 /* 17 18 * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. 18 19 * target range is [start_pfn, end_pfn) 19 20 */ 20 21 extern int 21 - undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); 22 + undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, 23 + unsigned migratetype); 22 24 23 25 /* 24 - * test all pages in [start_pfn, end_pfn)are isolated or not. 26 + * Test all pages in [start_pfn, end_pfn) are isolated or not. 25 27 */ 26 - extern int 27 - test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); 28 + int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); 28 29 29 30 /* 30 - * Internal funcs.Changes pageblock's migrate type. 31 - * Please use make_pagetype_isolated()/make_pagetype_movable(). 31 + * Internal functions. Changes pageblock's migrate type. 32 32 */ 33 33 extern int set_migratetype_isolate(struct page *page); 34 - extern void unset_migratetype_isolate(struct page *page); 34 + extern void unset_migratetype_isolate(struct page *page, unsigned migratetype); 35 35 36 36 37 37 #endif
+1 -1
mm/Kconfig
··· 198 198 config MIGRATION 199 199 bool "Page migration" 200 200 def_bool y 201 - depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION 201 + depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA 202 202 help 203 203 Allows the migration of the physical location of pages of processes 204 204 while the virtual addresses are not changed. This is useful in
+1 -2
mm/Makefile
··· 13 13 readahead.o swap.o truncate.o vmscan.o shmem.o \ 14 14 prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \ 15 15 page_isolation.o mm_init.o mmu_context.o percpu.o \ 16 - $(mmu-y) 16 + compaction.o $(mmu-y) 17 17 obj-y += init-mm.o 18 18 19 19 ifdef CONFIG_NO_BOOTMEM ··· 32 32 obj-$(CONFIG_SPARSEMEM) += sparse.o 33 33 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o 34 34 obj-$(CONFIG_SLOB) += slob.o 35 - obj-$(CONFIG_COMPACTION) += compaction.o 36 35 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o 37 36 obj-$(CONFIG_KSM) += ksm.o 38 37 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
+263 -157
mm/compaction.c
··· 16 16 #include <linux/sysfs.h> 17 17 #include "internal.h" 18 18 19 + #if defined CONFIG_COMPACTION || defined CONFIG_CMA 20 + 19 21 #define CREATE_TRACE_POINTS 20 22 #include <trace/events/compaction.h> 21 - 22 - /* 23 - * compact_control is used to track pages being migrated and the free pages 24 - * they are being migrated to during memory compaction. The free_pfn starts 25 - * at the end of a zone and migrate_pfn begins at the start. Movable pages 26 - * are moved to the end of a zone during a compaction run and the run 27 - * completes when free_pfn <= migrate_pfn 28 - */ 29 - struct compact_control { 30 - struct list_head freepages; /* List of free pages to migrate to */ 31 - struct list_head migratepages; /* List of pages being migrated */ 32 - unsigned long nr_freepages; /* Number of isolated free pages */ 33 - unsigned long nr_migratepages; /* Number of pages to migrate */ 34 - unsigned long free_pfn; /* isolate_freepages search base */ 35 - unsigned long migrate_pfn; /* isolate_migratepages search base */ 36 - bool sync; /* Synchronous migration */ 37 - 38 - int order; /* order a direct compactor needs */ 39 - int migratetype; /* MOVABLE, RECLAIMABLE etc */ 40 - struct zone *zone; 41 - }; 42 23 43 24 static unsigned long release_freepages(struct list_head *freelist) 44 25 { ··· 35 54 return count; 36 55 } 37 56 38 - /* Isolate free pages onto a private freelist. 
Must hold zone->lock */ 39 - static unsigned long isolate_freepages_block(struct zone *zone, 40 - unsigned long blockpfn, 41 - struct list_head *freelist) 57 + static void map_pages(struct list_head *list) 42 58 { 43 - unsigned long zone_end_pfn, end_pfn; 59 + struct page *page; 60 + 61 + list_for_each_entry(page, list, lru) { 62 + arch_alloc_page(page, 0); 63 + kernel_map_pages(page, 1, 1); 64 + } 65 + } 66 + 67 + static inline bool migrate_async_suitable(int migratetype) 68 + { 69 + return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE; 70 + } 71 + 72 + /* 73 + * Isolate free pages onto a private freelist. Caller must hold zone->lock. 74 + * If @strict is true, will abort returning 0 on any invalid PFNs or non-free 75 + * pages inside of the pageblock (even though it may still end up isolating 76 + * some pages). 77 + */ 78 + static unsigned long isolate_freepages_block(unsigned long blockpfn, 79 + unsigned long end_pfn, 80 + struct list_head *freelist, 81 + bool strict) 82 + { 44 83 int nr_scanned = 0, total_isolated = 0; 45 84 struct page *cursor; 46 85 47 - /* Get the last PFN we should scan for free pages at */ 48 - zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages; 49 - end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn); 50 - 51 - /* Find the first usable PFN in the block to initialse page cursor */ 52 - for (; blockpfn < end_pfn; blockpfn++) { 53 - if (pfn_valid_within(blockpfn)) 54 - break; 55 - } 56 86 cursor = pfn_to_page(blockpfn); 57 87 58 88 /* Isolate free pages. 
This assumes the block is valid */ ··· 71 79 int isolated, i; 72 80 struct page *page = cursor; 73 81 74 - if (!pfn_valid_within(blockpfn)) 82 + if (!pfn_valid_within(blockpfn)) { 83 + if (strict) 84 + return 0; 75 85 continue; 86 + } 76 87 nr_scanned++; 77 88 78 - if (!PageBuddy(page)) 89 + if (!PageBuddy(page)) { 90 + if (strict) 91 + return 0; 79 92 continue; 93 + } 80 94 81 95 /* Found a free page, break it into order-0 pages */ 82 96 isolated = split_free_page(page); 97 + if (!isolated && strict) 98 + return 0; 83 99 total_isolated += isolated; 84 100 for (i = 0; i < isolated; i++) { 85 101 list_add(&page->lru, freelist); ··· 105 105 return total_isolated; 106 106 } 107 107 108 - /* Returns true if the page is within a block suitable for migration to */ 109 - static bool suitable_migration_target(struct page *page) 110 - { 111 - 112 - int migratetype = get_pageblock_migratetype(page); 113 - 114 - /* Don't interfere with memory hot-remove or the min_free_kbytes blocks */ 115 - if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE) 116 - return false; 117 - 118 - /* If the page is a large free page, then allow migration */ 119 - if (PageBuddy(page) && page_order(page) >= pageblock_order) 120 - return true; 121 - 122 - /* If the block is MIGRATE_MOVABLE, allow migration */ 123 - if (migratetype == MIGRATE_MOVABLE) 124 - return true; 125 - 126 - /* Otherwise skip the block */ 127 - return false; 128 - } 129 - 130 - /* 131 - * Based on information in the current compact_control, find blocks 132 - * suitable for isolating free pages from and then isolate them. 108 + /** 109 + * isolate_freepages_range() - isolate free pages. 110 + * @start_pfn: The first PFN to start isolating. 111 + * @end_pfn: The one-past-the-last PFN. 112 + * 113 + * Non-free pages, invalid PFNs, or zone boundaries within the 114 + * [start_pfn, end_pfn) range are considered errors, and cause the function to 115 + * undo its actions and return zero. 
116 + * 117 + * Otherwise, the function returns the one-past-the-last PFN of the isolated pages 118 + * (which may be greater than end_pfn if the end fell in the middle of 119 + * a free page). 133 120 */ 134 - static void isolate_freepages(struct zone *zone, 135 - struct compact_control *cc) 121 + unsigned long 122 + isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) 136 123 { 137 - struct page *page; 138 - unsigned long high_pfn, low_pfn, pfn; 139 - unsigned long flags; 140 - int nr_freepages = cc->nr_freepages; 141 - struct list_head *freelist = &cc->freepages; 124 + unsigned long isolated, pfn, block_end_pfn, flags; 125 + struct zone *zone = NULL; 126 + LIST_HEAD(freelist); 142 127 143 - /* 144 - * Initialise the free scanner. The starting point is where we last 145 - * scanned from (or the end of the zone if starting). The low point 146 - * is the end of the pageblock the migration scanner is using. 147 - */ 148 - pfn = cc->free_pfn; 149 - low_pfn = cc->migrate_pfn + pageblock_nr_pages; 128 + if (pfn_valid(start_pfn)) 129 + zone = page_zone(pfn_to_page(start_pfn)); 150 130 151 - /* 152 - * Take care that if the migration scanner is at the end of the zone 153 - * that the free scanner does not accidentally move to the next zone 154 - * in the next isolation cycle. 155 - */ 156 - high_pfn = min(low_pfn, pfn); 157 - 158 - /* 159 - * Isolate free pages until enough are available to migrate the 160 - * pages on cc->migratepages. We stop searching if the migrate 161 - * and free page scanners meet or enough free pages are isolated. 162 - */ 163 - for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages; 164 - pfn -= pageblock_nr_pages) { 165 - unsigned long isolated; 166 - 167 - if (!pfn_valid(pfn)) 168 - continue; 131 + for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { 132 + if (!pfn_valid(pfn) || zone != page_zone(pfn_to_page(pfn))) 133 + break; 169 134 170 135 /* 171 - * Check for overlapping nodes/zones. 
It's possible on some 172 - * configurations to have a setup like 173 - * node0 node1 node0 174 - * i.e. it's possible that all pages within a zones range of 175 - * pages do not belong to a single zone. 136 + * On subsequent iterations ALIGN() is actually not needed, 137 + * but we keep it so as not to complicate the code. 176 138 */ 177 - page = pfn_to_page(pfn); 178 - if (page_zone(page) != zone) 179 - continue; 139 + block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); 140 + block_end_pfn = min(block_end_pfn, end_pfn); 180 141 181 - /* Check the block is suitable for migration */ 182 - if (!suitable_migration_target(page)) 183 - continue; 184 - 185 - /* 186 - * Found a block suitable for isolating free pages from. Now 187 - * we disabled interrupts, double check things are ok and 188 - * isolate the pages. This is to minimise the time IRQs 189 - * are disabled 190 - */ 191 - isolated = 0; 192 142 spin_lock_irqsave(&zone->lock, flags); 193 - if (suitable_migration_target(page)) { 194 - isolated = isolate_freepages_block(zone, pfn, freelist); 195 - nr_freepages += isolated; 196 - } 143 + isolated = isolate_freepages_block(pfn, block_end_pfn, 144 + &freelist, true); 197 145 spin_unlock_irqrestore(&zone->lock, flags); 198 146 199 147 /* 200 - * Record the highest PFN we isolated pages from. When next 201 - * looking for free pages, the search will restart here as 202 - * page migration may have returned some pages to the allocator 148 + * In strict mode, isolate_freepages_block() returns 0 if 149 + * there are any holes in the block (ie. invalid PFNs or 150 + * non-free pages). 203 151 */ 204 - if (isolated) 205 - high_pfn = max(high_pfn, pfn); 152 + if (!isolated) 153 + break; 154 + 155 + /* 156 + * If we managed to isolate pages, it is always (1 << n) * 157 + * pageblock_nr_pages for some non-negative n. (Max order 158 + * page may span two pageblocks). 
159 + */ 206 160 } 207 161 208 162 /* split_free_page does not map the pages */ 209 - list_for_each_entry(page, freelist, lru) { 210 - arch_alloc_page(page, 0); 211 - kernel_map_pages(page, 1, 1); 163 + map_pages(&freelist); 164 + 165 + if (pfn < end_pfn) { 166 + /* Loop terminated early, cleanup. */ 167 + release_freepages(&freelist); 168 + return 0; 212 169 } 213 170 214 - cc->free_pfn = high_pfn; 215 - cc->nr_freepages = nr_freepages; 171 + /* We don't use freelists for anything. */ 172 + return pfn; 216 173 } 217 174 218 175 /* Update the number of anon and file isolated pages in the zone */ ··· 200 243 return isolated > (inactive + active) / 2; 201 244 } 202 245 203 - /* possible outcome of isolate_migratepages */ 204 - typedef enum { 205 - ISOLATE_ABORT, /* Abort compaction now */ 206 - ISOLATE_NONE, /* No pages isolated, continue scanning */ 207 - ISOLATE_SUCCESS, /* Pages isolated, migrate */ 208 - } isolate_migrate_t; 209 - 210 - /* 211 - * Isolate all pages that can be migrated from the block pointed to by 212 - * the migrate scanner within compact_control. 246 + /** 247 + * isolate_migratepages_range() - isolate all migrate-able pages in range. 248 + * @zone: Zone pages are in. 249 + * @cc: Compaction control structure. 250 + * @low_pfn: The first PFN of the range. 251 + * @end_pfn: The one-past-the-last PFN of the range. 252 + * 253 + * Isolate all pages that can be migrated from the range specified by 254 + * [low_pfn, end_pfn). Returns zero if there is a fatal signal 255 + * pending), otherwise PFN of the first page that was not scanned 256 + * (which may be both less, equal to or more then end_pfn). 257 + * 258 + * Assumes that cc->migratepages is empty and cc->nr_migratepages is 259 + * zero. 260 + * 261 + * Apart from cc->migratepages and cc->nr_migratetypes this function 262 + * does not modify any cc's fields, in particular it does not modify 263 + * (or read for that matter) cc->migrate_pfn. 
213 264 */ 214 - static isolate_migrate_t isolate_migratepages(struct zone *zone, 215 - struct compact_control *cc) 265 + unsigned long 266 + isolate_migratepages_range(struct zone *zone, struct compact_control *cc, 267 + unsigned long low_pfn, unsigned long end_pfn) 216 268 { 217 - unsigned long low_pfn, end_pfn; 218 269 unsigned long last_pageblock_nr = 0, pageblock_nr; 219 270 unsigned long nr_scanned = 0, nr_isolated = 0; 220 271 struct list_head *migratelist = &cc->migratepages; 221 272 isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE; 222 - 223 - /* Do not scan outside zone boundaries */ 224 - low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn); 225 - 226 - /* Only scan within a pageblock boundary */ 227 - end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages); 228 - 229 - /* Do not cross the free scanner or scan within a memory hole */ 230 - if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) { 231 - cc->migrate_pfn = end_pfn; 232 - return ISOLATE_NONE; 233 - } 234 273 235 274 /* 236 275 * Ensure that there are not too many pages isolated from the LRU ··· 236 283 while (unlikely(too_many_isolated(zone))) { 237 284 /* async migration should just abort */ 238 285 if (!cc->sync) 239 - return ISOLATE_ABORT; 286 + return 0; 240 287 241 288 congestion_wait(BLK_RW_ASYNC, HZ/10); 242 289 243 290 if (fatal_signal_pending(current)) 244 - return ISOLATE_ABORT; 291 + return 0; 245 292 } 246 293 247 294 /* Time to isolate some pages for migration */ ··· 304 351 */ 305 352 pageblock_nr = low_pfn >> pageblock_order; 306 353 if (!cc->sync && last_pageblock_nr != pageblock_nr && 307 - get_pageblock_migratetype(page) != MIGRATE_MOVABLE) { 354 + !migrate_async_suitable(get_pageblock_migratetype(page))) { 308 355 low_pfn += pageblock_nr_pages; 309 356 low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; 310 357 last_pageblock_nr = pageblock_nr; ··· 349 396 acct_isolated(zone, cc); 350 397 351 398 spin_unlock_irq(&zone->lru_lock); 352 - cc->migrate_pfn = low_pfn; 
353 399 354 400 trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated); 355 401 356 - return ISOLATE_SUCCESS; 402 + return low_pfn; 403 + } 404 + 405 + #endif /* CONFIG_COMPACTION || CONFIG_CMA */ 406 + #ifdef CONFIG_COMPACTION 407 + 408 + /* Returns true if the page is within a block suitable for migration to */ 409 + static bool suitable_migration_target(struct page *page) 410 + { 411 + 412 + int migratetype = get_pageblock_migratetype(page); 413 + 414 + /* Don't interfere with memory hot-remove or the min_free_kbytes blocks */ 415 + if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE) 416 + return false; 417 + 418 + /* If the page is a large free page, then allow migration */ 419 + if (PageBuddy(page) && page_order(page) >= pageblock_order) 420 + return true; 421 + 422 + /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */ 423 + if (migrate_async_suitable(migratetype)) 424 + return true; 425 + 426 + /* Otherwise skip the block */ 427 + return false; 428 + } 429 + 430 + /* 431 + * Based on information in the current compact_control, find blocks 432 + * suitable for isolating free pages from and then isolate them. 433 + */ 434 + static void isolate_freepages(struct zone *zone, 435 + struct compact_control *cc) 436 + { 437 + struct page *page; 438 + unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn; 439 + unsigned long flags; 440 + int nr_freepages = cc->nr_freepages; 441 + struct list_head *freelist = &cc->freepages; 442 + 443 + /* 444 + * Initialise the free scanner. The starting point is where we last 445 + * scanned from (or the end of the zone if starting). The low point 446 + * is the end of the pageblock the migration scanner is using. 
447 + */ 448 + pfn = cc->free_pfn; 449 + low_pfn = cc->migrate_pfn + pageblock_nr_pages; 450 + 451 + /* 452 + * Take care that if the migration scanner is at the end of the zone 453 + * that the free scanner does not accidentally move to the next zone 454 + * in the next isolation cycle. 455 + */ 456 + high_pfn = min(low_pfn, pfn); 457 + 458 + zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages; 459 + 460 + /* 461 + * Isolate free pages until enough are available to migrate the 462 + * pages on cc->migratepages. We stop searching if the migrate 463 + * and free page scanners meet or enough free pages are isolated. 464 + */ 465 + for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages; 466 + pfn -= pageblock_nr_pages) { 467 + unsigned long isolated; 468 + 469 + if (!pfn_valid(pfn)) 470 + continue; 471 + 472 + /* 473 + * Check for overlapping nodes/zones. It's possible on some 474 + * configurations to have a setup like 475 + * node0 node1 node0 476 + * i.e. it's possible that all pages within a zones range of 477 + * pages do not belong to a single zone. 478 + */ 479 + page = pfn_to_page(pfn); 480 + if (page_zone(page) != zone) 481 + continue; 482 + 483 + /* Check the block is suitable for migration */ 484 + if (!suitable_migration_target(page)) 485 + continue; 486 + 487 + /* 488 + * Found a block suitable for isolating free pages from. Now 489 + * we disabled interrupts, double check things are ok and 490 + * isolate the pages. This is to minimise the time IRQs 491 + * are disabled 492 + */ 493 + isolated = 0; 494 + spin_lock_irqsave(&zone->lock, flags); 495 + if (suitable_migration_target(page)) { 496 + end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); 497 + isolated = isolate_freepages_block(pfn, end_pfn, 498 + freelist, false); 499 + nr_freepages += isolated; 500 + } 501 + spin_unlock_irqrestore(&zone->lock, flags); 502 + 503 + /* 504 + * Record the highest PFN we isolated pages from. 
When next 505 + * looking for free pages, the search will restart here as 506 + * page migration may have returned some pages to the allocator 507 + */ 508 + if (isolated) 509 + high_pfn = max(high_pfn, pfn); 510 + } 511 + 512 + /* split_free_page does not map the pages */ 513 + map_pages(freelist); 514 + 515 + cc->free_pfn = high_pfn; 516 + cc->nr_freepages = nr_freepages; 357 517 } 358 518 359 519 /* ··· 513 447 514 448 cc->nr_migratepages = nr_migratepages; 515 449 cc->nr_freepages = nr_freepages; 450 + } 451 + 452 + /* possible outcome of isolate_migratepages */ 453 + typedef enum { 454 + ISOLATE_ABORT, /* Abort compaction now */ 455 + ISOLATE_NONE, /* No pages isolated, continue scanning */ 456 + ISOLATE_SUCCESS, /* Pages isolated, migrate */ 457 + } isolate_migrate_t; 458 + 459 + /* 460 + * Isolate all pages that can be migrated from the block pointed to by 461 + * the migrate scanner within compact_control. 462 + */ 463 + static isolate_migrate_t isolate_migratepages(struct zone *zone, 464 + struct compact_control *cc) 465 + { 466 + unsigned long low_pfn, end_pfn; 467 + 468 + /* Do not scan outside zone boundaries */ 469 + low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn); 470 + 471 + /* Only scan within a pageblock boundary */ 472 + end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages); 473 + 474 + /* Do not cross the free scanner or scan within a memory hole */ 475 + if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) { 476 + cc->migrate_pfn = end_pfn; 477 + return ISOLATE_NONE; 478 + } 479 + 480 + /* Perform the isolation */ 481 + low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); 482 + if (!low_pfn) 483 + return ISOLATE_ABORT; 484 + 485 + cc->migrate_pfn = low_pfn; 486 + 487 + return ISOLATE_SUCCESS; 516 488 } 517 489 518 490 static int compact_finished(struct zone *zone, ··· 899 795 return device_remove_file(&node->dev, &dev_attr_compact); 900 796 } 901 797 #endif /* CONFIG_SYSFS && CONFIG_NUMA */ 798 + 799 + #endif /* 
CONFIG_COMPACTION */
+33
mm/internal.h
··· 100 100 extern bool is_free_buddy_page(struct page *page);
101 101 #endif
102 102
103 + #if defined CONFIG_COMPACTION || defined CONFIG_CMA
104 +
105 + /*
106 +  * in mm/compaction.c
107 +  */
108 + /*
109 +  * compact_control is used to track pages being migrated and the free pages
110 +  * they are being migrated to during memory compaction. The free_pfn starts
111 +  * at the end of a zone and migrate_pfn begins at the start. Movable pages
112 +  * are moved to the end of a zone during a compaction run and the run
113 +  * completes when free_pfn <= migrate_pfn
114 +  */
115 + struct compact_control {
116 +     struct list_head freepages;   /* List of free pages to migrate to */
117 +     struct list_head migratepages; /* List of pages being migrated */
118 +     unsigned long nr_freepages;   /* Number of isolated free pages */
119 +     unsigned long nr_migratepages; /* Number of pages to migrate */
120 +     unsigned long free_pfn;       /* isolate_freepages search base */
121 +     unsigned long migrate_pfn;    /* isolate_migratepages search base */
122 +     bool sync;                    /* Synchronous migration */
123 +
124 +     int order;                    /* order a direct compactor needs */
125 +     int migratetype;              /* MOVABLE, RECLAIMABLE etc */
126 +     struct zone *zone;
127 + };
128 +
129 + unsigned long
130 + isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn);
131 + unsigned long
132 + isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
133 +                            unsigned long low_pfn, unsigned long end_pfn);
134 +
135 + #endif
103 136
104 137 /*
105 138  * function for dealing with page's order in buddy system.
+1 -1
mm/memory-failure.c
··· 1404 1404 /* Not a free page */
1405 1405 ret = 1;
1406 1406 }
1407      - unset_migratetype_isolate(p);
     1407 + unset_migratetype_isolate(p, MIGRATE_MOVABLE);
1408 1408 unlock_memory_hotplug();
1409 1409 return ret;
1410 1410 }
+3 -3
mm/memory_hotplug.c
··· 891 891 nr_pages = end_pfn - start_pfn;
892 892
893 893 /* set above range as isolated */
894      - ret = start_isolate_page_range(start_pfn, end_pfn);
     894 + ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
895 895 if (ret)
896 896 goto out;
897 897
···
956 956 We cannot do rollback at this point. */
957 957 offline_isolated_pages(start_pfn, end_pfn);
958 958 /* reset pagetype flags and makes migrate type to be MOVABLE */
959      - undo_isolate_page_range(start_pfn, end_pfn);
     959 + undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
960 960 /* removal success */
961 961 zone->present_pages -= offlined_pages;
962 962 zone->zone_pgdat->node_present_pages -= offlined_pages;
···
981 981 start_pfn, end_pfn);
982 982 memory_notify(MEM_CANCEL_OFFLINE, &arg);
983 983 /* pushback to free area */
984      - undo_isolate_page_range(start_pfn, end_pfn);
     984 + undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
985 985
986 986 out:
987 987 unlock_memory_hotplug();
+363 -46
mm/page_alloc.c
··· 57 57 #include <linux/ftrace_event.h> 58 58 #include <linux/memcontrol.h> 59 59 #include <linux/prefetch.h> 60 + #include <linux/migrate.h> 60 61 #include <linux/page-debug-flags.h> 61 62 62 63 #include <asm/tlbflush.h> ··· 514 513 * free pages of length of (1 << order) and marked with _mapcount -2. Page's 515 514 * order is recorded in page_private(page) field. 516 515 * So when we are allocating or freeing one, we can derive the state of the 517 - * other. That is, if we allocate a small block, and both were 518 - * free, the remainder of the region must be split into blocks. 516 + * other. That is, if we allocate a small block, and both were 517 + * free, the remainder of the region must be split into blocks. 519 518 * If a block is freed, and its buddy is also free, then this 520 - * triggers coalescing into a block of larger size. 519 + * triggers coalescing into a block of larger size. 521 520 * 522 521 * -- wli 523 522 */ ··· 750 749 __free_pages(page, order); 751 750 } 752 751 752 + #ifdef CONFIG_CMA 753 + /* Free whole pageblock and set it's migration type to MIGRATE_CMA. */ 754 + void __init init_cma_reserved_pageblock(struct page *page) 755 + { 756 + unsigned i = pageblock_nr_pages; 757 + struct page *p = page; 758 + 759 + do { 760 + __ClearPageReserved(p); 761 + set_page_count(p, 0); 762 + } while (++p, --i); 763 + 764 + set_page_refcounted(page); 765 + set_pageblock_migratetype(page, MIGRATE_CMA); 766 + __free_pages(page, pageblock_order); 767 + totalram_pages += pageblock_nr_pages; 768 + } 769 + #endif 753 770 754 771 /* 755 772 * The order of subdivision here is critical for the IO subsystem. 
··· 893 874 * This array describes the order lists are fallen back to when 894 875 * the free lists for the desirable migrate type are depleted 895 876 */ 896 - static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = { 897 - [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, 898 - [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, 899 - [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, 900 - [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */ 877 + static int fallbacks[MIGRATE_TYPES][4] = { 878 + [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, 879 + [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, 880 + #ifdef CONFIG_CMA 881 + [MIGRATE_MOVABLE] = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, 882 + [MIGRATE_CMA] = { MIGRATE_RESERVE }, /* Never used */ 883 + #else 884 + [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, 885 + #endif 886 + [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */ 887 + [MIGRATE_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */ 901 888 }; 902 889 903 890 /* ··· 998 973 /* Find the largest possible block of pages in the other list */ 999 974 for (current_order = MAX_ORDER-1; current_order >= order; 1000 975 --current_order) { 1001 - for (i = 0; i < MIGRATE_TYPES - 1; i++) { 976 + for (i = 0;; i++) { 1002 977 migratetype = fallbacks[start_migratetype][i]; 1003 978 1004 979 /* MIGRATE_RESERVE handled later if necessary */ 1005 980 if (migratetype == MIGRATE_RESERVE) 1006 - continue; 981 + break; 1007 982 1008 983 area = &(zone->free_area[current_order]); 1009 984 if (list_empty(&area->free_list[migratetype])) ··· 1018 993 * pages to the preferred allocation list. 
If falling 1019 994 * back for a reclaimable kernel allocation, be more 1020 995 * aggressive about taking ownership of free pages 996 + * 997 + * On the other hand, never change migration 998 + * type of MIGRATE_CMA pageblocks nor move CMA 999 + * pages on different free lists. We don't 1000 + * want unmovable pages to be allocated from 1001 + * MIGRATE_CMA areas. 1021 1002 */ 1022 - if (unlikely(current_order >= (pageblock_order >> 1)) || 1023 - start_migratetype == MIGRATE_RECLAIMABLE || 1024 - page_group_by_mobility_disabled) { 1025 - unsigned long pages; 1003 + if (!is_migrate_cma(migratetype) && 1004 + (unlikely(current_order >= pageblock_order / 2) || 1005 + start_migratetype == MIGRATE_RECLAIMABLE || 1006 + page_group_by_mobility_disabled)) { 1007 + int pages; 1026 1008 pages = move_freepages_block(zone, page, 1027 1009 start_migratetype); 1028 1010 ··· 1047 1015 rmv_page_order(page); 1048 1016 1049 1017 /* Take ownership for orders >= pageblock_order */ 1050 - if (current_order >= pageblock_order) 1018 + if (current_order >= pageblock_order && 1019 + !is_migrate_cma(migratetype)) 1051 1020 change_pageblock_range(page, current_order, 1052 1021 start_migratetype); 1053 1022 1054 - expand(zone, page, order, current_order, area, migratetype); 1023 + expand(zone, page, order, current_order, area, 1024 + is_migrate_cma(migratetype) 1025 + ? migratetype : start_migratetype); 1055 1026 1056 1027 trace_mm_page_alloc_extfrag(page, order, current_order, 1057 1028 start_migratetype, migratetype); ··· 1096 1061 return page; 1097 1062 } 1098 1063 1099 - /* 1064 + /* 1100 1065 * Obtain a specified number of elements from the buddy allocator, all under 1101 1066 * a single hold of the lock, for efficiency. Add them to the supplied list. 1102 1067 * Returns the number of new pages which were placed at *list. 
1103 1068 */ 1104 - static int rmqueue_bulk(struct zone *zone, unsigned int order, 1069 + static int rmqueue_bulk(struct zone *zone, unsigned int order, 1105 1070 unsigned long count, struct list_head *list, 1106 1071 int migratetype, int cold) 1107 1072 { 1108 - int i; 1109 - 1073 + int mt = migratetype, i; 1074 + 1110 1075 spin_lock(&zone->lock); 1111 1076 for (i = 0; i < count; ++i) { 1112 1077 struct page *page = __rmqueue(zone, order, migratetype); ··· 1126 1091 list_add(&page->lru, list); 1127 1092 else 1128 1093 list_add_tail(&page->lru, list); 1129 - set_page_private(page, migratetype); 1094 + if (IS_ENABLED(CONFIG_CMA)) { 1095 + mt = get_pageblock_migratetype(page); 1096 + if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE) 1097 + mt = migratetype; 1098 + } 1099 + set_page_private(page, mt); 1130 1100 list = &page->lru; 1131 1101 } 1132 1102 __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); ··· 1411 1371 1412 1372 if (order >= pageblock_order - 1) { 1413 1373 struct page *endpage = page + (1 << order) - 1; 1414 - for (; page < endpage; page += pageblock_nr_pages) 1415 - set_pageblock_migratetype(page, MIGRATE_MOVABLE); 1374 + for (; page < endpage; page += pageblock_nr_pages) { 1375 + int mt = get_pageblock_migratetype(page); 1376 + if (mt != MIGRATE_ISOLATE && !is_migrate_cma(mt)) 1377 + set_pageblock_migratetype(page, 1378 + MIGRATE_MOVABLE); 1379 + } 1416 1380 } 1417 1381 1418 1382 return 1 << order; ··· 2130 2086 } 2131 2087 #endif /* CONFIG_COMPACTION */ 2132 2088 2133 - /* The really slow allocator path where we enter direct reclaim */ 2134 - static inline struct page * 2135 - __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, 2136 - struct zonelist *zonelist, enum zone_type high_zoneidx, 2137 - nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, 2138 - int migratetype, unsigned long *did_some_progress) 2089 + /* Perform direct synchronous page reclaim */ 2090 + static int 2091 + __perform_reclaim(gfp_t gfp_mask, 
unsigned int order, struct zonelist *zonelist, 2092 + nodemask_t *nodemask) 2139 2093 { 2140 - struct page *page = NULL; 2141 2094 struct reclaim_state reclaim_state; 2142 - bool drained = false; 2095 + int progress; 2143 2096 2144 2097 cond_resched(); 2145 2098 ··· 2147 2106 reclaim_state.reclaimed_slab = 0; 2148 2107 current->reclaim_state = &reclaim_state; 2149 2108 2150 - *did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask); 2109 + progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask); 2151 2110 2152 2111 current->reclaim_state = NULL; 2153 2112 lockdep_clear_current_reclaim_state(); ··· 2155 2114 2156 2115 cond_resched(); 2157 2116 2117 + return progress; 2118 + } 2119 + 2120 + /* The really slow allocator path where we enter direct reclaim */ 2121 + static inline struct page * 2122 + __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order, 2123 + struct zonelist *zonelist, enum zone_type high_zoneidx, 2124 + nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone, 2125 + int migratetype, unsigned long *did_some_progress) 2126 + { 2127 + struct page *page = NULL; 2128 + bool drained = false; 2129 + 2130 + *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, 2131 + nodemask); 2158 2132 if (unlikely(!(*did_some_progress))) 2159 2133 return NULL; 2160 2134 ··· 4357 4301 init_waitqueue_head(&pgdat->kswapd_wait); 4358 4302 pgdat->kswapd_max_order = 0; 4359 4303 pgdat_page_cgroup_init(pgdat); 4360 - 4304 + 4361 4305 for (j = 0; j < MAX_NR_ZONES; j++) { 4362 4306 struct zone *zone = pgdat->node_zones + j; 4363 4307 unsigned long size, realsize, memmap_pages; ··· 5032 4976 calculate_totalreserve_pages(); 5033 4977 } 5034 4978 5035 - /** 5036 - * setup_per_zone_wmarks - called when min_free_kbytes changes 5037 - * or when memory is hot-{added|removed} 5038 - * 5039 - * Ensures that the watermark[min,low,high] values for each zone are set 5040 - * correctly with respect to min_free_kbytes. 
5041 - */ 5042 - void setup_per_zone_wmarks(void) 4979 + static void __setup_per_zone_wmarks(void) 5043 4980 { 5044 4981 unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10); 5045 4982 unsigned long lowmem_pages = 0; ··· 5079 5030 5080 5031 zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp >> 2); 5081 5032 zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1); 5033 + 5034 + zone->watermark[WMARK_MIN] += cma_wmark_pages(zone); 5035 + zone->watermark[WMARK_LOW] += cma_wmark_pages(zone); 5036 + zone->watermark[WMARK_HIGH] += cma_wmark_pages(zone); 5037 + 5082 5038 setup_zone_migrate_reserve(zone); 5083 5039 spin_unlock_irqrestore(&zone->lock, flags); 5084 5040 } 5085 5041 5086 5042 /* update totalreserve_pages */ 5087 5043 calculate_totalreserve_pages(); 5044 + } 5045 + 5046 + /** 5047 + * setup_per_zone_wmarks - called when min_free_kbytes changes 5048 + * or when memory is hot-{added|removed} 5049 + * 5050 + * Ensures that the watermark[min,low,high] values for each zone are set 5051 + * correctly with respect to min_free_kbytes. 
5052 + */ 5053 + void setup_per_zone_wmarks(void) 5054 + { 5055 + mutex_lock(&zonelists_mutex); 5056 + __setup_per_zone_wmarks(); 5057 + mutex_unlock(&zonelists_mutex); 5088 5058 } 5089 5059 5090 5060 /* ··· 5483 5415 __count_immobile_pages(struct zone *zone, struct page *page, int count) 5484 5416 { 5485 5417 unsigned long pfn, iter, found; 5418 + int mt; 5419 + 5486 5420 /* 5487 5421 * For avoiding noise data, lru_add_drain_all() should be called 5488 5422 * If ZONE_MOVABLE, the zone never contains immobile pages 5489 5423 */ 5490 5424 if (zone_idx(zone) == ZONE_MOVABLE) 5491 5425 return true; 5492 - 5493 - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE) 5426 + mt = get_pageblock_migratetype(page); 5427 + if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt)) 5494 5428 return true; 5495 5429 5496 5430 pfn = page_to_pfn(page); ··· 5609 5539 return ret; 5610 5540 } 5611 5541 5612 - void unset_migratetype_isolate(struct page *page) 5542 + void unset_migratetype_isolate(struct page *page, unsigned migratetype) 5613 5543 { 5614 5544 struct zone *zone; 5615 5545 unsigned long flags; ··· 5617 5547 spin_lock_irqsave(&zone->lock, flags); 5618 5548 if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) 5619 5549 goto out; 5620 - set_pageblock_migratetype(page, MIGRATE_MOVABLE); 5621 - move_freepages_block(zone, page, MIGRATE_MOVABLE); 5550 + set_pageblock_migratetype(page, migratetype); 5551 + move_freepages_block(zone, page, migratetype); 5622 5552 out: 5623 5553 spin_unlock_irqrestore(&zone->lock, flags); 5624 5554 } 5555 + 5556 + #ifdef CONFIG_CMA 5557 + 5558 + static unsigned long pfn_max_align_down(unsigned long pfn) 5559 + { 5560 + return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES, 5561 + pageblock_nr_pages) - 1); 5562 + } 5563 + 5564 + static unsigned long pfn_max_align_up(unsigned long pfn) 5565 + { 5566 + return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES, 5567 + pageblock_nr_pages)); 5568 + } 5569 + 5570 + static struct page * 5571 + 
__alloc_contig_migrate_alloc(struct page *page, unsigned long private, 5572 + int **resultp) 5573 + { 5574 + return alloc_page(GFP_HIGHUSER_MOVABLE); 5575 + } 5576 + 5577 + /* [start, end) must belong to a single zone. */ 5578 + static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) 5579 + { 5580 + /* This function is based on compact_zone() from compaction.c. */ 5581 + 5582 + unsigned long pfn = start; 5583 + unsigned int tries = 0; 5584 + int ret = 0; 5585 + 5586 + struct compact_control cc = { 5587 + .nr_migratepages = 0, 5588 + .order = -1, 5589 + .zone = page_zone(pfn_to_page(start)), 5590 + .sync = true, 5591 + }; 5592 + INIT_LIST_HEAD(&cc.migratepages); 5593 + 5594 + migrate_prep_local(); 5595 + 5596 + while (pfn < end || !list_empty(&cc.migratepages)) { 5597 + if (fatal_signal_pending(current)) { 5598 + ret = -EINTR; 5599 + break; 5600 + } 5601 + 5602 + if (list_empty(&cc.migratepages)) { 5603 + cc.nr_migratepages = 0; 5604 + pfn = isolate_migratepages_range(cc.zone, &cc, 5605 + pfn, end); 5606 + if (!pfn) { 5607 + ret = -EINTR; 5608 + break; 5609 + } 5610 + tries = 0; 5611 + } else if (++tries == 5) { 5612 + ret = ret < 0 ? ret : -EBUSY; 5613 + break; 5614 + } 5615 + 5616 + ret = migrate_pages(&cc.migratepages, 5617 + __alloc_contig_migrate_alloc, 5618 + 0, false, MIGRATE_SYNC); 5619 + } 5620 + 5621 + putback_lru_pages(&cc.migratepages); 5622 + return ret > 0 ? 0 : ret; 5623 + } 5624 + 5625 + /* 5626 + * Update zone's cma pages counter used for watermark level calculation. 5627 + */ 5628 + static inline void __update_cma_watermarks(struct zone *zone, int count) 5629 + { 5630 + unsigned long flags; 5631 + spin_lock_irqsave(&zone->lock, flags); 5632 + zone->min_cma_pages += count; 5633 + spin_unlock_irqrestore(&zone->lock, flags); 5634 + setup_per_zone_wmarks(); 5635 + } 5636 + 5637 + /* 5638 + * Trigger memory pressure bump to reclaim some pages in order to be able to 5639 + * allocate 'count' pages in single page units. 
Does similar work as 5640 + *__alloc_pages_slowpath() function. 5641 + */ 5642 + static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count) 5643 + { 5644 + enum zone_type high_zoneidx = gfp_zone(gfp_mask); 5645 + struct zonelist *zonelist = node_zonelist(0, gfp_mask); 5646 + int did_some_progress = 0; 5647 + int order = 1; 5648 + 5649 + /* 5650 + * Increase level of watermarks to force kswapd do his job 5651 + * to stabilise at new watermark level. 5652 + */ 5653 + __update_cma_watermarks(zone, count); 5654 + 5655 + /* Obey watermarks as if the page was being allocated */ 5656 + while (!zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0)) { 5657 + wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone)); 5658 + 5659 + did_some_progress = __perform_reclaim(gfp_mask, order, zonelist, 5660 + NULL); 5661 + if (!did_some_progress) { 5662 + /* Exhausted what can be done so it's blamo time */ 5663 + out_of_memory(zonelist, gfp_mask, order, NULL, false); 5664 + } 5665 + } 5666 + 5667 + /* Restore original watermark levels. */ 5668 + __update_cma_watermarks(zone, -count); 5669 + 5670 + return count; 5671 + } 5672 + 5673 + /** 5674 + * alloc_contig_range() -- tries to allocate given range of pages 5675 + * @start: start PFN to allocate 5676 + * @end: one-past-the-last PFN to allocate 5677 + * @migratetype: migratetype of the underlaying pageblocks (either 5678 + * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks 5679 + * in range must have the same migratetype and it must 5680 + * be either of the two. 5681 + * 5682 + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES 5683 + * aligned, however it's the caller's responsibility to guarantee that 5684 + * we are the only thread that changes migrate type of pageblocks the 5685 + * pages fall in. 5686 + * 5687 + * The PFN range must belong to a single zone. 5688 + * 5689 + * Returns zero on success or negative error code. 
···
+ * On success all
+ * pages which PFN is in [start, end) are allocated for the caller and
+ * need to be freed with free_contig_range().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       unsigned migratetype)
+{
+	struct zone *zone = page_zone(pfn_to_page(start));
+	unsigned long outer_start, outer_end;
+	int ret = 0, order;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
+	 * have different sizes, and due to the way page allocator
+	 * work, we align the range to biggest of the two pages so
+	 * that page allocator won't try to merge buddies from
+	 * different pageblocks and change MIGRATE_ISOLATE to some
+	 * other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * aligned range but not in the unaligned, original range are
+	 * put back to page allocator so that buddy can use them.
+	 */
+
+	ret = start_isolate_page_range(pfn_max_align_down(start),
+				       pfn_max_align_up(end), migratetype);
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is remove them from page allocator).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of interesting range may be not aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 *
+	 * We don't have to hold zone->lock here because the pages are
+	 * isolated thus they won't get removed from buddy.
+	 */
+
+	lru_add_drain_all();
+	drain_all_pages();
+
+	order = 0;
+	outer_start = start;
+	while (!PageBuddy(pfn_to_page(outer_start))) {
+		if (++order >= MAX_ORDER) {
+			ret = -EBUSY;
+			goto done;
+		}
+		outer_start &= ~0UL << order;
+	}
+
+	/* Make sure the range is really isolated. */
+	if (test_pages_isolated(outer_start, end)) {
+		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
+			outer_start, end);
+		ret = -EBUSY;
+		goto done;
+	}
+
+	/*
+	 * Reclaim enough pages to make sure that contiguous allocation
+	 * will not starve the system.
+	 */
+	__reclaim_pages(zone, GFP_HIGHUSER_MOVABLE, end-start);
+
+	/* Grab isolated pages from freelists. */
+	outer_end = isolate_freepages_range(outer_start, end);
+	if (!outer_end) {
+		ret = -EBUSY;
+		goto done;
+	}
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_range(outer_start, start - outer_start);
+	if (end != outer_end)
+		free_contig_range(end, outer_end - end);
+
+done:
+	undo_isolate_page_range(pfn_max_align_down(start),
+				pfn_max_align_up(end), migratetype);
+	return ret;
+}
+
+void free_contig_range(unsigned long pfn, unsigned nr_pages)
+{
+	for (; nr_pages--; ++pfn)
+		__free_page(pfn_to_page(pfn));
+}
+#endif
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
+8 -7
mm/page_isolation.c
···
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
···
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
···
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
···
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			    unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
···
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
···
  * all pages in [start_pfn...end_pfn) must be in the same zone.
  * zone->lock must be held before call this.
  *
- * Returns 1 if all pages in the range is isolated.
+ * Returns 1 if all pages in the range are isolated.
  */
 static int
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
+3
mm/vmstat.c
···
 	"Reclaimable",
 	"Movable",
 	"Reserve",
+#ifdef CONFIG_CMA
+	"CMA",
+#endif
 	"Isolate",
 };