Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

NOMMU: Make VMAs per MM as for MMU-mode linux

Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

(1) In SYSV SHM, nattch for a segment does not reflect the number of
shmat() calls (and forks) performed on it.

(2) In mmap(), the VMA's vm_mm is set to point to the parent mm by an
exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
that the VMA might be shared and already have its vm_mm assigned to another
process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared; the vm_list_struct structure is
discarded as it is no longer required.

This patch makes the following additional changes:

(1) Regions are now allocated with alloc_pages() rather than kmalloc() and
with no recourse to __GFP_COMP, so the pages are not composite. Instead,
each page has a reference on it held by the region. Anything else that is
interested in such a page will have to get a reference on it to retain it.
When the pages are released due to unmapping, each page is passed to
put_page() and will be freed when the page usage count reaches zero.

(2) Excess pages are trimmed after an allocation, since the allocation must
be made as a power-of-2 quantity of pages.

(3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
end up with overlapping VMAs within the tree, the VMA struct address is
appended to the sort key.

(4) Non-anonymous VMAs are now added to the backing inode's prio list.

(5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
the backing region. The VMA and region structs will be split if
necessary.

(6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
segment instead of all the attachments at that address. Multiple
shmat() calls return the same address under NOMMU-mode instead of different
virtual addresses as under MMU-mode.

(7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

(8) /proc/maps is now the global list of mapped regions, and may list bits
that aren't actually mapped anywhere.

(9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
of RAM currently allocated by mmap to hold mappable regions that can't be
mapped directly. These are copies of the backing device or file if not
anonymous.
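
The per-page referencing scheme in point (1) can be sketched in miniature. This is a hypothetical user-space analogue, not the kernel's real struct page or put_page(): the region holds one reference on each page it allocated, any other interested party takes its own, and the page only goes back to the allocator when the last reference is dropped.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted page */
struct page_ref {
	int count;	/* usage count; the region's own reference included */
	int freed;	/* set once the page goes back to the allocator */
};

/* take an extra reference on a page */
static void get_page(struct page_ref *page)
{
	page->count++;
}

/* drop a reference; free the page when the count reaches zero */
static void put_page(struct page_ref *page)
{
	if (--page->count == 0)
		page->freed = 1;
}
```

So a page pinned by both a region and another user survives the region being unmapped, and disappears only when the second user also calls put_page().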
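
The trimming in point (2) amounts to simple arithmetic, sketched here in a hedged user-space form; get_order() and excess_bytes() are simplified stand-ins, not the kernel helpers. alloc_pages() only hands out 2^order pages, so a 5-page request costs 8 pages and the 3-page tail can then be released back page by page.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* smallest order such that 2^order pages cover len bytes */
static unsigned int get_order(unsigned long len)
{
	unsigned long pages = (len + PAGE_SIZE - 1) / PAGE_SIZE;
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/* bytes beyond the rounded-up request that the power-of-2
 * allocation wasted, i.e. what the trimming step hands back */
static unsigned long excess_bytes(unsigned long len, unsigned int order)
{
	unsigned long total = PAGE_SIZE << order;
	unsigned long want = ((len + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;

	return total - want;
}
```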
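
The three-level sort key from point (3) can be sketched as a comparison function. The struct below is a hypothetical simplified key, not the kernel's vm_area_struct: compare by start address, then end address, and finally by the object's own address, so two VMAs covering an identical range (e.g. two attachments of the same region) still get distinct slots in the red-black tree.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical simplified VMA sort key */
struct vma_key {
	unsigned long start;
	unsigned long end;
};

static int vma_cmp(const struct vma_key *a, const struct vma_key *b)
{
	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	if (a->end != b->end)
		return a->end < b->end ? -1 : 1;
	/* identical ranges: fall back to the object addresses so the
	 * ordering stays total and both entries fit in the tree */
	if (a != b)
		return (uintptr_t)a < (uintptr_t)b ? -1 : 1;
	return 0;
}
```

Appending the struct address is what keeps the comparison total, which is what lets overlapping VMAs coexist in one MM tree.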

These changes make NOMMU mode more similar to MMU mode. The downside is that
NOMMU mode now requires some extra memory for tracking compared to NOMMU mode
without this patch (VMAs are no longer shared, and there are now region
structs).

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>

+885 -461
+11 -5
Documentation/nommu-mmap.txt
··· 109 109 FURTHER NOTES ON NO-MMU MMAP 110 110 ============================ 111 111 112 - (*) A request for a private mapping of less than a page in size may not return 113 - a page-aligned buffer. This is because the kernel calls kmalloc() to 114 - allocate the buffer, not get_free_page(). 112 + (*) A request for a private mapping of a file may return a buffer that is not 113 + page-aligned. This is because XIP may take place, and the data may not be 114 + paged aligned in the backing store. 115 115 116 - (*) A list of all the mappings on the system is visible through /proc/maps in 117 - no-MMU mode. 116 + (*) A request for an anonymous mapping will always be page aligned. If 117 + possible the size of the request should be a power of two otherwise some 118 + of the space may be wasted as the kernel must allocate a power-of-2 119 + granule but will only discard the excess if appropriately configured as 120 + this has an effect on fragmentation. 121 + 122 + (*) A list of all the private copy and anonymous mappings on the system is 123 + visible through /proc/maps in no-MMU mode. 118 124 119 125 (*) A list of all the mappings in use by a process is visible through 120 126 /proc/<pid>/maps in no-MMU mode.
-1
arch/arm/include/asm/mmu.h
··· 24 24 * modified for 2.6 by Hyok S. Choi <hyok.choi@samsung.com> 25 25 */ 26 26 typedef struct { 27 - struct vm_list_struct *vmlist; 28 27 unsigned long end_brk; 29 28 } mm_context_t; 30 29
-1
arch/blackfin/include/asm/mmu.h
··· 10 10 }; 11 11 12 12 typedef struct { 13 - struct vm_list_struct *vmlist; 14 13 unsigned long end_brk; 15 14 unsigned long stack_start; 16 15
+3 -3
arch/blackfin/kernel/ptrace.c
··· 160 160 static inline int is_user_addr_valid(struct task_struct *child, 161 161 unsigned long start, unsigned long len) 162 162 { 163 - struct vm_list_struct *vml; 163 + struct vm_area_struct *vma; 164 164 struct sram_list_struct *sraml; 165 165 166 166 /* overflow */ 167 167 if (start + len < start) 168 168 return -EIO; 169 169 170 - for (vml = child->mm->context.vmlist; vml; vml = vml->next) 171 - if (start >= vml->vma->vm_start && start + len < vml->vma->vm_end) 170 + vma = find_vma(child->mm, start); 171 + if (vma && start >= vma->vm_start && start + len <= vma->vm_end) 172 172 return 0; 173 173 174 174 for (sraml = child->mm->context.sram_list; sraml; sraml = sraml->next)
+6 -5
arch/blackfin/kernel/traps.c
··· 32 32 #include <linux/module.h> 33 33 #include <linux/kallsyms.h> 34 34 #include <linux/fs.h> 35 + #include <linux/rbtree.h> 35 36 #include <asm/traps.h> 36 37 #include <asm/cacheflush.h> 37 38 #include <asm/cplb.h> ··· 84 83 struct mm_struct *mm; 85 84 unsigned long flags, offset; 86 85 unsigned char in_atomic = (bfin_read_IPEND() & 0x10) || in_atomic(); 86 + struct rb_node *n; 87 87 88 88 #ifdef CONFIG_KALLSYMS 89 89 unsigned long symsize; ··· 130 128 if (!mm) 131 129 continue; 132 130 133 - vml = mm->context.vmlist; 134 - while (vml) { 135 - struct vm_area_struct *vma = vml->vma; 131 + for (n = rb_first(&mm->mm_rb); n; n = rb_next(n)) { 132 + struct vm_area_struct *vma; 133 + 134 + vma = rb_entry(n, struct vm_area_struct, vm_rb); 136 135 137 136 if (address >= vma->vm_start && address < vma->vm_end) { 138 137 char _tmpbuf[256]; ··· 179 176 180 177 goto done; 181 178 } 182 - 183 - vml = vml->next; 184 179 } 185 180 if (!in_atomic) 186 181 mmput(mm);
+6 -5
arch/frv/kernel/ptrace.c
··· 69 69 } 70 70 71 71 /* 72 - * check that an address falls within the bounds of the target process's memory mappings 72 + * check that an address falls within the bounds of the target process's memory 73 + * mappings 73 74 */ 74 75 static inline int is_user_addr_valid(struct task_struct *child, 75 76 unsigned long start, unsigned long len) ··· 80 79 return -EIO; 81 80 return 0; 82 81 #else 83 - struct vm_list_struct *vml; 82 + struct vm_area_struct *vma; 84 83 85 - for (vml = child->mm->context.vmlist; vml; vml = vml->next) 86 - if (start >= vml->vma->vm_start && start + len <= vml->vma->vm_end) 87 - return 0; 84 + vma = find_vma(child->mm, start); 85 + if (vma && start >= vma->vm_start && start + len <= vma->vm_end) 86 + return 0; 88 87 89 88 return -EIO; 90 89 #endif
-1
arch/h8300/include/asm/mmu.h
··· 4 4 /* Copyright (C) 2002, David McCullough <davidm@snapgear.com> */ 5 5 6 6 typedef struct { 7 - struct vm_list_struct *vmlist; 8 7 unsigned long end_brk; 9 8 } mm_context_t; 10 9
-1
arch/m68knommu/include/asm/mmu.h
··· 4 4 /* Copyright (C) 2002, David McCullough <davidm@snapgear.com> */ 5 5 6 6 typedef struct { 7 - struct vm_list_struct *vmlist; 8 7 unsigned long end_brk; 9 8 } mm_context_t; 10 9
-1
arch/sh/include/asm/mmu.h
··· 9 9 mm_context_id_t id; 10 10 void *vdso; 11 11 #else 12 - struct vm_list_struct *vmlist; 13 12 unsigned long end_brk; 14 13 #endif 15 14 #ifdef CONFIG_BINFMT_ELF_FDPIC
+3 -24
fs/binfmt_elf_fdpic.c
··· 1567 1567 static int elf_fdpic_dump_segments(struct file *file, size_t *size, 1568 1568 unsigned long *limit, unsigned long mm_flags) 1569 1569 { 1570 - struct vm_list_struct *vml; 1570 + struct vm_area_struct *vma; 1571 1571 1572 - for (vml = current->mm->context.vmlist; vml; vml = vml->next) { 1573 - struct vm_area_struct *vma = vml->vma; 1574 - 1572 + for (vma = current->mm->mmap; vma; vma = vma->vm_next) { 1575 1573 if (!maydump(vma, mm_flags)) 1576 1574 continue; 1577 1575 ··· 1615 1617 elf_fpxregset_t *xfpu = NULL; 1616 1618 #endif 1617 1619 int thread_status_size = 0; 1618 - #ifndef CONFIG_MMU 1619 - struct vm_list_struct *vml; 1620 - #endif 1621 1620 elf_addr_t *auxv; 1622 1621 unsigned long mm_flags; 1623 1622 ··· 1680 1685 fill_prstatus(prstatus, current, signr); 1681 1686 elf_core_copy_regs(&prstatus->pr_reg, regs); 1682 1687 1683 - #ifdef CONFIG_MMU 1684 1688 segs = current->mm->map_count; 1685 - #else 1686 - segs = 0; 1687 - for (vml = current->mm->context.vmlist; vml; vml = vml->next) 1688 - segs++; 1689 - #endif 1690 1689 #ifdef ELF_CORE_EXTRA_PHDRS 1691 1690 segs += ELF_CORE_EXTRA_PHDRS; 1692 1691 #endif ··· 1755 1766 mm_flags = current->mm->flags; 1756 1767 1757 1768 /* write program headers for segments dump */ 1758 - for ( 1759 - #ifdef CONFIG_MMU 1760 - vma = current->mm->mmap; vma; vma = vma->vm_next 1761 - #else 1762 - vml = current->mm->context.vmlist; vml; vml = vml->next 1763 - #endif 1764 - ) { 1769 + for (vma = current->mm->mmap; vma; vma = vma->vm_next) { 1765 1770 struct elf_phdr phdr; 1766 1771 size_t sz; 1767 - 1768 - #ifndef CONFIG_MMU 1769 - vma = vml->vma; 1770 - #endif 1771 1772 1772 1773 sz = vma->vm_end - vma->vm_start; 1773 1774
-2
fs/proc/internal.h
··· 41 41 (vmi)->used = 0; \ 42 42 (vmi)->largest_chunk = 0; \ 43 43 } while(0) 44 - 45 - extern int nommu_vma_show(struct seq_file *, struct vm_area_struct *); 46 44 #endif 47 45 48 46 extern int proc_tid_stat(struct seq_file *m, struct pid_namespace *ns,
+6
fs/proc/meminfo.c
··· 74 74 "LowTotal: %8lu kB\n" 75 75 "LowFree: %8lu kB\n" 76 76 #endif 77 + #ifndef CONFIG_MMU 78 + "MmapCopy: %8lu kB\n" 79 + #endif 77 80 "SwapTotal: %8lu kB\n" 78 81 "SwapFree: %8lu kB\n" 79 82 "Dirty: %8lu kB\n" ··· 118 115 K(i.freehigh), 119 116 K(i.totalram-i.totalhigh), 120 117 K(i.freeram-i.freehigh), 118 + #endif 119 + #ifndef CONFIG_MMU 120 + K((unsigned long) atomic_read(&mmap_pages_allocated)), 121 121 #endif 122 122 K(i.totalswap), 123 123 K(i.freeswap),
+32 -39
fs/proc/nommu.c
··· 33 33 #include "internal.h" 34 34 35 35 /* 36 - * display a single VMA to a sequenced file 36 + * display a single region to a sequenced file 37 37 */ 38 - int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) 38 + static int nommu_region_show(struct seq_file *m, struct vm_region *region) 39 39 { 40 40 unsigned long ino = 0; 41 41 struct file *file; 42 42 dev_t dev = 0; 43 43 int flags, len; 44 44 45 - flags = vma->vm_flags; 46 - file = vma->vm_file; 45 + flags = region->vm_flags; 46 + file = region->vm_file; 47 47 48 48 if (file) { 49 - struct inode *inode = vma->vm_file->f_path.dentry->d_inode; 49 + struct inode *inode = region->vm_file->f_path.dentry->d_inode; 50 50 dev = inode->i_sb->s_dev; 51 51 ino = inode->i_ino; 52 52 } 53 53 54 54 seq_printf(m, 55 55 "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", 56 - vma->vm_start, 57 - vma->vm_end, 56 + region->vm_start, 57 + region->vm_end, 58 58 flags & VM_READ ? 'r' : '-', 59 59 flags & VM_WRITE ? 'w' : '-', 60 60 flags & VM_EXEC ? 'x' : '-', 61 61 flags & VM_MAYSHARE ? flags & VM_SHARED ? 
'S' : 's' : 'p', 62 - ((loff_t)vma->vm_pgoff) << PAGE_SHIFT, 62 + ((loff_t)region->vm_pgoff) << PAGE_SHIFT, 63 63 MAJOR(dev), MINOR(dev), ino, &len); 64 64 65 65 if (file) { ··· 75 75 } 76 76 77 77 /* 78 - * display a list of all the VMAs the kernel knows about 78 + * display a list of all the REGIONs the kernel knows about 79 79 * - nommu kernals have a single flat list 80 80 */ 81 - static int nommu_vma_list_show(struct seq_file *m, void *v) 81 + static int nommu_region_list_show(struct seq_file *m, void *_p) 82 82 { 83 - struct vm_area_struct *vma; 83 + struct rb_node *p = _p; 84 84 85 - vma = rb_entry((struct rb_node *) v, struct vm_area_struct, vm_rb); 86 - return nommu_vma_show(m, vma); 85 + return nommu_region_show(m, rb_entry(p, struct vm_region, vm_rb)); 87 86 } 88 87 89 - static void *nommu_vma_list_start(struct seq_file *m, loff_t *_pos) 88 + static void *nommu_region_list_start(struct seq_file *m, loff_t *_pos) 90 89 { 91 - struct rb_node *_rb; 90 + struct rb_node *p; 92 91 loff_t pos = *_pos; 93 - void *next = NULL; 94 92 95 - down_read(&nommu_vma_sem); 93 + down_read(&nommu_region_sem); 96 94 97 - for (_rb = rb_first(&nommu_vma_tree); _rb; _rb = rb_next(_rb)) { 98 - if (pos == 0) { 99 - next = _rb; 100 - break; 101 - } 102 - pos--; 103 - } 104 - 105 - return next; 95 + for (p = rb_first(&nommu_region_tree); p; p = rb_next(p)) 96 + if (pos-- == 0) 97 + return p; 98 + return NULL; 106 99 } 107 100 108 - static void nommu_vma_list_stop(struct seq_file *m, void *v) 101 + static void nommu_region_list_stop(struct seq_file *m, void *v) 109 102 { 110 - up_read(&nommu_vma_sem); 103 + up_read(&nommu_region_sem); 111 104 } 112 105 113 - static void *nommu_vma_list_next(struct seq_file *m, void *v, loff_t *pos) 106 + static void *nommu_region_list_next(struct seq_file *m, void *v, loff_t *pos) 114 107 { 115 108 (*pos)++; 116 109 return rb_next((struct rb_node *) v); 117 110 } 118 111 119 - static const struct seq_operations proc_nommu_vma_list_seqop = { 120 - 
.start = nommu_vma_list_start, 121 - .next = nommu_vma_list_next, 122 - .stop = nommu_vma_list_stop, 123 - .show = nommu_vma_list_show 112 + static struct seq_operations proc_nommu_region_list_seqop = { 113 + .start = nommu_region_list_start, 114 + .next = nommu_region_list_next, 115 + .stop = nommu_region_list_stop, 116 + .show = nommu_region_list_show 124 117 }; 125 118 126 - static int proc_nommu_vma_list_open(struct inode *inode, struct file *file) 119 + static int proc_nommu_region_list_open(struct inode *inode, struct file *file) 127 120 { 128 - return seq_open(file, &proc_nommu_vma_list_seqop); 121 + return seq_open(file, &proc_nommu_region_list_seqop); 129 122 } 130 123 131 - static const struct file_operations proc_nommu_vma_list_operations = { 132 - .open = proc_nommu_vma_list_open, 124 + static const struct file_operations proc_nommu_region_list_operations = { 125 + .open = proc_nommu_region_list_open, 133 126 .read = seq_read, 134 127 .llseek = seq_lseek, 135 128 .release = seq_release, ··· 130 137 131 138 static int __init proc_nommu_init(void) 132 139 { 133 - proc_create("maps", S_IRUGO, NULL, &proc_nommu_vma_list_operations); 140 + proc_create("maps", S_IRUGO, NULL, &proc_nommu_region_list_operations); 134 141 return 0; 135 142 } 136 143
+75 -33
fs/proc/task_nommu.c
··· 15 15 */ 16 16 void task_mem(struct seq_file *m, struct mm_struct *mm) 17 17 { 18 - struct vm_list_struct *vml; 18 + struct vm_area_struct *vma; 19 + struct rb_node *p; 19 20 unsigned long bytes = 0, sbytes = 0, slack = 0; 20 21 21 22 down_read(&mm->mmap_sem); 22 - for (vml = mm->context.vmlist; vml; vml = vml->next) { 23 - if (!vml->vma) 24 - continue; 23 + for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) { 24 + vma = rb_entry(p, struct vm_area_struct, vm_rb); 25 25 26 - bytes += kobjsize(vml); 26 + bytes += kobjsize(vma); 27 27 if (atomic_read(&mm->mm_count) > 1 || 28 - atomic_read(&vml->vma->vm_usage) > 1 29 - ) { 30 - sbytes += kobjsize((void *) vml->vma->vm_start); 31 - sbytes += kobjsize(vml->vma); 28 + vma->vm_region || 29 + vma->vm_flags & VM_MAYSHARE) { 30 + sbytes += kobjsize((void *) vma->vm_start); 31 + if (vma->vm_region) 32 + sbytes += kobjsize(vma->vm_region); 32 33 } else { 33 - bytes += kobjsize((void *) vml->vma->vm_start); 34 - bytes += kobjsize(vml->vma); 35 - slack += kobjsize((void *) vml->vma->vm_start) - 36 - (vml->vma->vm_end - vml->vma->vm_start); 34 + bytes += kobjsize((void *) vma->vm_start); 35 + slack += kobjsize((void *) vma->vm_start) - 36 + (vma->vm_end - vma->vm_start); 37 37 } 38 38 } 39 39 ··· 70 70 71 71 unsigned long task_vsize(struct mm_struct *mm) 72 72 { 73 - struct vm_list_struct *tbp; 73 + struct vm_area_struct *vma; 74 + struct rb_node *p; 74 75 unsigned long vsize = 0; 75 76 76 77 down_read(&mm->mmap_sem); 77 - for (tbp = mm->context.vmlist; tbp; tbp = tbp->next) { 78 - if (tbp->vma) 79 - vsize += kobjsize((void *) tbp->vma->vm_start); 78 + for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) { 79 + vma = rb_entry(p, struct vm_area_struct, vm_rb); 80 + vsize += vma->vm_region->vm_end - vma->vm_region->vm_start; 80 81 } 81 82 up_read(&mm->mmap_sem); 82 83 return vsize; ··· 86 85 int task_statm(struct mm_struct *mm, int *shared, int *text, 87 86 int *data, int *resident) 88 87 { 89 - struct vm_list_struct *tbp; 88 + 
struct vm_area_struct *vma; 89 + struct rb_node *p; 90 90 int size = kobjsize(mm); 91 91 92 92 down_read(&mm->mmap_sem); 93 - for (tbp = mm->context.vmlist; tbp; tbp = tbp->next) { 94 - size += kobjsize(tbp); 95 - if (tbp->vma) { 96 - size += kobjsize(tbp->vma); 97 - size += kobjsize((void *) tbp->vma->vm_start); 98 - } 93 + for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) { 94 + vma = rb_entry(p, struct vm_area_struct, vm_rb); 95 + size += kobjsize(vma); 96 + size += kobjsize((void *) vma->vm_start); 99 97 } 100 98 101 99 size += (*text = mm->end_code - mm->start_code); ··· 105 105 } 106 106 107 107 /* 108 + * display a single VMA to a sequenced file 109 + */ 110 + static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) 111 + { 112 + unsigned long ino = 0; 113 + struct file *file; 114 + dev_t dev = 0; 115 + int flags, len; 116 + 117 + flags = vma->vm_flags; 118 + file = vma->vm_file; 119 + 120 + if (file) { 121 + struct inode *inode = vma->vm_file->f_path.dentry->d_inode; 122 + dev = inode->i_sb->s_dev; 123 + ino = inode->i_ino; 124 + } 125 + 126 + seq_printf(m, 127 + "%08lx-%08lx %c%c%c%c %08lx %02x:%02x %lu %n", 128 + vma->vm_start, 129 + vma->vm_end, 130 + flags & VM_READ ? 'r' : '-', 131 + flags & VM_WRITE ? 'w' : '-', 132 + flags & VM_EXEC ? 'x' : '-', 133 + flags & VM_MAYSHARE ? flags & VM_SHARED ? 
'S' : 's' : 'p', 134 + vma->vm_pgoff << PAGE_SHIFT, 135 + MAJOR(dev), MINOR(dev), ino, &len); 136 + 137 + if (file) { 138 + len = 25 + sizeof(void *) * 6 - len; 139 + if (len < 1) 140 + len = 1; 141 + seq_printf(m, "%*c", len, ' '); 142 + seq_path(m, &file->f_path, ""); 143 + } 144 + 145 + seq_putc(m, '\n'); 146 + return 0; 147 + } 148 + 149 + /* 108 150 * display mapping lines for a particular process's /proc/pid/maps 109 151 */ 110 - static int show_map(struct seq_file *m, void *_vml) 152 + static int show_map(struct seq_file *m, void *_p) 111 153 { 112 - struct vm_list_struct *vml = _vml; 154 + struct rb_node *p = _p; 113 155 114 - return nommu_vma_show(m, vml->vma); 156 + return nommu_vma_show(m, rb_entry(p, struct vm_area_struct, vm_rb)); 115 157 } 116 158 117 159 static void *m_start(struct seq_file *m, loff_t *pos) 118 160 { 119 161 struct proc_maps_private *priv = m->private; 120 - struct vm_list_struct *vml; 121 162 struct mm_struct *mm; 163 + struct rb_node *p; 122 164 loff_t n = *pos; 123 165 124 166 /* pin the task and mm whilst we play with them */ ··· 176 134 } 177 135 178 136 /* start from the Nth VMA */ 179 - for (vml = mm->context.vmlist; vml; vml = vml->next) 137 + for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) 180 138 if (n-- == 0) 181 - return vml; 139 + return p; 182 140 return NULL; 183 141 } 184 142 ··· 194 152 } 195 153 } 196 154 197 - static void *m_next(struct seq_file *m, void *_vml, loff_t *pos) 155 + static void *m_next(struct seq_file *m, void *_p, loff_t *pos) 198 156 { 199 - struct vm_list_struct *vml = _vml; 157 + struct rb_node *p = _p; 200 158 201 159 (*pos)++; 202 - return vml ? vml->next : NULL; 160 + return p ? rb_next(p) : NULL; 203 161 } 204 162 205 163 static const struct seq_operations proc_pid_maps_ops = {
-1
include/asm-frv/mmu.h
··· 22 22 unsigned long dtlb_ptd_mapping; /* [DAMR5] PTD mapping for dtlb cached PGE */ 23 23 24 24 #else 25 - struct vm_list_struct *vmlist; 26 25 unsigned long end_brk; 27 26 28 27 #endif
-1
include/asm-m32r/mmu.h
··· 4 4 #if !defined(CONFIG_MMU) 5 5 6 6 typedef struct { 7 - struct vm_list_struct *vmlist; 8 7 unsigned long end_brk; 9 8 } mm_context_t; 10 9
+6 -12
include/linux/mm.h
··· 56 56 57 57 extern struct kmem_cache *vm_area_cachep; 58 58 59 - /* 60 - * This struct defines the per-mm list of VMAs for uClinux. If CONFIG_MMU is 61 - * disabled, then there's a single shared list of VMAs maintained by the 62 - * system, and mm's subscribe to these individually 63 - */ 64 - struct vm_list_struct { 65 - struct vm_list_struct *next; 66 - struct vm_area_struct *vma; 67 - }; 68 - 69 59 #ifndef CONFIG_MMU 70 - extern struct rb_root nommu_vma_tree; 71 - extern struct rw_semaphore nommu_vma_sem; 60 + extern struct rb_root nommu_region_tree; 61 + extern struct rw_semaphore nommu_region_sem; 72 62 73 63 extern unsigned int kobjsize(const void *objp); 74 64 #endif ··· 1051 1061 unsigned long, enum memmap_context); 1052 1062 extern void setup_per_zone_pages_min(void); 1053 1063 extern void mem_init(void); 1064 + extern void __init mmap_init(void); 1054 1065 extern void show_mem(void); 1055 1066 extern void si_meminfo(struct sysinfo * val); 1056 1067 extern void si_meminfo_node(struct sysinfo *val, int nid); ··· 1062 1071 #else 1063 1072 static inline void setup_per_cpu_pageset(void) {} 1064 1073 #endif 1074 + 1075 + /* nommu.c */ 1076 + extern atomic_t mmap_pages_allocated; 1065 1077 1066 1078 /* prio_tree.c */ 1067 1079 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
+17 -1
include/linux/mm_types.h
··· 97 97 }; 98 98 99 99 /* 100 + * A region containing a mapping of a non-memory backed file under NOMMU 101 + * conditions. These are held in a global tree and are pinned by the VMAs that 102 + * map parts of them. 103 + */ 104 + struct vm_region { 105 + struct rb_node vm_rb; /* link in global region tree */ 106 + unsigned long vm_flags; /* VMA vm_flags */ 107 + unsigned long vm_start; /* start address of region */ 108 + unsigned long vm_end; /* region initialised to here */ 109 + unsigned long vm_pgoff; /* the offset in vm_file corresponding to vm_start */ 110 + struct file *vm_file; /* the backing file or NULL */ 111 + 112 + atomic_t vm_usage; /* region usage count */ 113 + }; 114 + 115 + /* 100 116 * This struct defines a memory VMM memory area. There is one of these 101 117 * per VM-area/task. A VM area is any part of the process virtual memory 102 118 * space that has a special rule for the page-fault handlers (ie a shared ··· 168 152 unsigned long vm_truncate_count;/* truncate_count or restart_addr */ 169 153 170 154 #ifndef CONFIG_MMU 171 - atomic_t vm_usage; /* refcount (VMAs shared if !MMU) */ 155 + struct vm_region *vm_region; /* NOMMU mapping region */ 172 156 #endif 173 157 #ifdef CONFIG_NUMA 174 158 struct mempolicy *vm_policy; /* NUMA policy for the VMA */
+12
ipc/shm.c
··· 990 990 */ 991 991 vma = find_vma(mm, addr); 992 992 993 + #ifdef CONFIG_MMU 993 994 while (vma) { 994 995 next = vma->vm_next; 995 996 ··· 1034 1033 do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start); 1035 1034 vma = next; 1036 1035 } 1036 + 1037 + #else /* CONFIG_MMU */ 1038 + /* under NOMMU conditions, the exact address to be destroyed must be 1039 + * given */ 1040 + retval = -EINVAL; 1041 + if (vma->vm_start == addr && vma->vm_ops == &shm_vm_ops) { 1042 + do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start); 1043 + retval = 0; 1044 + } 1045 + 1046 + #endif 1037 1047 1038 1048 up_write(&mm->mmap_sem); 1039 1049 return retval;
+1 -3
kernel/fork.c
··· 1481 1481 fs_cachep = kmem_cache_create("fs_cache", 1482 1482 sizeof(struct fs_struct), 0, 1483 1483 SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); 1484 - vm_area_cachep = kmem_cache_create("vm_area_struct", 1485 - sizeof(struct vm_area_struct), 0, 1486 - SLAB_PANIC, NULL); 1487 1484 mm_cachep = kmem_cache_create("mm_struct", 1488 1485 sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN, 1489 1486 SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); 1487 + mmap_init(); 1490 1488 } 1491 1489 1492 1490 /*
+7
lib/Kconfig.debug
··· 512 512 513 513 If unsure, say N. 514 514 515 + config DEBUG_NOMMU_REGIONS 516 + bool "Debug the global anon/private NOMMU mapping region tree" 517 + depends on DEBUG_KERNEL && !MMU 518 + help 519 + This option causes the global tree of anonymous and private mapping 520 + regions to be regularly checked for invalid topology. 521 + 515 522 config DEBUG_WRITECOUNT 516 523 bool "Debug filesystem writers count" 517 524 depends on DEBUG_KERNEL
+10
mm/mmap.c
··· 2472 2472 2473 2473 mutex_unlock(&mm_all_locks_mutex); 2474 2474 } 2475 + 2476 + /* 2477 + * initialise the VMA slab 2478 + */ 2479 + void __init mmap_init(void) 2480 + { 2481 + vm_area_cachep = kmem_cache_create("vm_area_struct", 2482 + sizeof(struct vm_area_struct), 0, 2483 + SLAB_PANIC, NULL); 2484 + }
+690 -322
mm/nommu.c
··· 6 6 * 7 7 * See Documentation/nommu-mmap.txt 8 8 * 9 - * Copyright (c) 2004-2005 David Howells <dhowells@redhat.com> 9 + * Copyright (c) 2004-2008 David Howells <dhowells@redhat.com> 10 10 * Copyright (c) 2000-2003 David McCullough <davidm@snapgear.com> 11 11 * Copyright (c) 2000-2001 D Jeff Dionne <jeff@uClinux.org> 12 12 * Copyright (c) 2002 Greg Ungerer <gerg@snapgear.com> ··· 33 33 #include <asm/uaccess.h> 34 34 #include <asm/tlb.h> 35 35 #include <asm/tlbflush.h> 36 + #include "internal.h" 37 + 38 + static inline __attribute__((format(printf, 1, 2))) 39 + void no_printk(const char *fmt, ...) 40 + { 41 + } 42 + 43 + #if 0 44 + #define kenter(FMT, ...) \ 45 + printk(KERN_DEBUG "==> %s("FMT")\n", __func__, ##__VA_ARGS__) 46 + #define kleave(FMT, ...) \ 47 + printk(KERN_DEBUG "<== %s()"FMT"\n", __func__, ##__VA_ARGS__) 48 + #define kdebug(FMT, ...) \ 49 + printk(KERN_DEBUG "xxx" FMT"yyy\n", ##__VA_ARGS__) 50 + #else 51 + #define kenter(FMT, ...) \ 52 + no_printk(KERN_DEBUG "==> %s("FMT")\n", __func__, ##__VA_ARGS__) 53 + #define kleave(FMT, ...) \ 54 + no_printk(KERN_DEBUG "<== %s()"FMT"\n", __func__, ##__VA_ARGS__) 55 + #define kdebug(FMT, ...) 
\ 56 + no_printk(KERN_DEBUG FMT"\n", ##__VA_ARGS__) 57 + #endif 36 58 37 59 #include "internal.h" 38 60 ··· 68 46 int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT; 69 47 int heap_stack_gap = 0; 70 48 49 + atomic_t mmap_pages_allocated; 50 + 71 51 EXPORT_SYMBOL(mem_map); 72 52 EXPORT_SYMBOL(num_physpages); 73 53 74 - /* list of shareable VMAs */ 75 - struct rb_root nommu_vma_tree = RB_ROOT; 76 - DECLARE_RWSEM(nommu_vma_sem); 54 + /* list of mapped, potentially shareable regions */ 55 + static struct kmem_cache *vm_region_jar; 56 + struct rb_root nommu_region_tree = RB_ROOT; 57 + DECLARE_RWSEM(nommu_region_sem); 77 58 78 59 struct vm_operations_struct generic_file_vm_ops = { 79 60 }; ··· 425 400 return mm->brk = brk; 426 401 } 427 402 428 - #ifdef DEBUG 429 - static void show_process_blocks(void) 403 + /* 404 + * initialise the VMA and region record slabs 405 + */ 406 + void __init mmap_init(void) 430 407 { 431 - struct vm_list_struct *vml; 408 + vm_region_jar = kmem_cache_create("vm_region_jar", 409 + sizeof(struct vm_region), 0, 410 + SLAB_PANIC, NULL); 411 + vm_area_cachep = kmem_cache_create("vm_area_struct", 412 + sizeof(struct vm_area_struct), 0, 413 + SLAB_PANIC, NULL); 414 + } 432 415 433 - printk("Process blocks %d:", current->pid); 416 + /* 417 + * validate the region tree 418 + * - the caller must hold the region lock 419 + */ 420 + #ifdef CONFIG_DEBUG_NOMMU_REGIONS 421 + static noinline void validate_nommu_regions(void) 422 + { 423 + struct vm_region *region, *last; 424 + struct rb_node *p, *lastp; 434 425 435 - for (vml = &current->mm->context.vmlist; vml; vml = vml->next) { 436 - printk(" %p: %p", vml, vml->vma); 437 - if (vml->vma) 438 - printk(" (%d @%lx #%d)", 439 - kobjsize((void *) vml->vma->vm_start), 440 - vml->vma->vm_start, 441 - atomic_read(&vml->vma->vm_usage)); 442 - printk(vml->next ? 
" ->" : ".\n"); 426 + lastp = rb_first(&nommu_region_tree); 427 + if (!lastp) 428 + return; 429 + 430 + last = rb_entry(lastp, struct vm_region, vm_rb); 431 + if (unlikely(last->vm_end <= last->vm_start)) 432 + BUG(); 433 + 434 + while ((p = rb_next(lastp))) { 435 + region = rb_entry(p, struct vm_region, vm_rb); 436 + last = rb_entry(lastp, struct vm_region, vm_rb); 437 + 438 + if (unlikely(region->vm_end <= region->vm_start)) 439 + BUG(); 440 + if (unlikely(region->vm_start < last->vm_end)) 441 + BUG(); 442 + 443 + lastp = p; 443 444 } 444 445 } 445 - #endif /* DEBUG */ 446 + #else 447 + #define validate_nommu_regions() do {} while(0) 448 + #endif 449 + 450 + /* 451 + * add a region into the global tree 452 + */ 453 + static void add_nommu_region(struct vm_region *region) 454 + { 455 + struct vm_region *pregion; 456 + struct rb_node **p, *parent; 457 + 458 + validate_nommu_regions(); 459 + 460 + BUG_ON(region->vm_start & ~PAGE_MASK); 461 + 462 + parent = NULL; 463 + p = &nommu_region_tree.rb_node; 464 + while (*p) { 465 + parent = *p; 466 + pregion = rb_entry(parent, struct vm_region, vm_rb); 467 + if (region->vm_start < pregion->vm_start) 468 + p = &(*p)->rb_left; 469 + else if (region->vm_start > pregion->vm_start) 470 + p = &(*p)->rb_right; 471 + else if (pregion == region) 472 + return; 473 + else 474 + BUG(); 475 + } 476 + 477 + rb_link_node(&region->vm_rb, parent, p); 478 + rb_insert_color(&region->vm_rb, &nommu_region_tree); 479 + 480 + validate_nommu_regions(); 481 + } 482 + 483 + /* 484 + * delete a region from the global tree 485 + */ 486 + static void delete_nommu_region(struct vm_region *region) 487 + { 488 + BUG_ON(!nommu_region_tree.rb_node); 489 + 490 + validate_nommu_regions(); 491 + rb_erase(&region->vm_rb, &nommu_region_tree); 492 + validate_nommu_regions(); 493 + } 494 + 495 + /* 496 + * free a contiguous series of pages 497 + */ 498 + static void free_page_series(unsigned long from, unsigned long to) 499 + { 500 + for (; from < to; from += 
PAGE_SIZE) { 501 + struct page *page = virt_to_page(from); 502 + 503 + kdebug("- free %lx", from); 504 + atomic_dec(&mmap_pages_allocated); 505 + if (page_count(page) != 1) 506 + kdebug("free page %p [%d]", page, page_count(page)); 507 + put_page(page); 508 + } 509 + } 510 + 511 + /* 512 + * release a reference to a region 513 + * - the caller must hold the region semaphore, which this releases 514 + * - the region may not have been added to the tree yet, in which case vm_end 515 + * will equal vm_start 516 + */ 517 + static void __put_nommu_region(struct vm_region *region) 518 + __releases(nommu_region_sem) 519 + { 520 + kenter("%p{%d}", region, atomic_read(&region->vm_usage)); 521 + 522 + BUG_ON(!nommu_region_tree.rb_node); 523 + 524 + if (atomic_dec_and_test(&region->vm_usage)) { 525 + if (region->vm_end > region->vm_start) 526 + delete_nommu_region(region); 527 + up_write(&nommu_region_sem); 528 + 529 + if (region->vm_file) 530 + fput(region->vm_file); 531 + 532 + /* IO memory and memory shared directly out of the pagecache 533 + * from ramfs/tmpfs mustn't be released here */ 534 + if (region->vm_flags & VM_MAPPED_COPY) { 535 + kdebug("free series"); 536 + free_page_series(region->vm_start, region->vm_end); 537 + } 538 + kmem_cache_free(vm_region_jar, region); 539 + } else { 540 + up_write(&nommu_region_sem); 541 + } 542 + } 543 + 544 + /* 545 + * release a reference to a region 546 + */ 547 + static void put_nommu_region(struct vm_region *region) 548 + { 549 + down_write(&nommu_region_sem); 550 + __put_nommu_region(region); 551 + } 446 552 447 553 /* 448 554 * add a VMA into a process's mm_struct in the appropriate place in the list 555 + * and tree and add to the address space's page tree also if not an anonymous 556 + * page 449 557 * - should be called with mm->mmap_sem held writelocked 450 558 */ 451 - static void add_vma_to_mm(struct mm_struct *mm, struct vm_list_struct *vml) 559 + static void add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct 
*vma)
 {
-	struct vm_list_struct **ppv;
+	struct vm_area_struct *pvma, **pp;
+	struct address_space *mapping;
+	struct rb_node **p, *parent;
 
-	for (ppv = &current->mm->context.vmlist; *ppv; ppv = &(*ppv)->next)
-		if ((*ppv)->vma->vm_start > vml->vma->vm_start)
+	kenter(",%p", vma);
+
+	BUG_ON(!vma->vm_region);
+
+	mm->map_count++;
+	vma->vm_mm = mm;
+
+	/* add the VMA to the mapping */
+	if (vma->vm_file) {
+		mapping = vma->vm_file->f_mapping;
+
+		flush_dcache_mmap_lock(mapping);
+		vma_prio_tree_insert(vma, &mapping->i_mmap);
+		flush_dcache_mmap_unlock(mapping);
+	}
+
+	/* add the VMA to the tree */
+	parent = NULL;
+	p = &mm->mm_rb.rb_node;
+	while (*p) {
+		parent = *p;
+		pvma = rb_entry(parent, struct vm_area_struct, vm_rb);
+
+		/* sort by: start addr, end addr, VMA struct addr in that order
+		 * (the latter is necessary as we may get identical VMAs) */
+		if (vma->vm_start < pvma->vm_start)
+			p = &(*p)->rb_left;
+		else if (vma->vm_start > pvma->vm_start)
+			p = &(*p)->rb_right;
+		else if (vma->vm_end < pvma->vm_end)
+			p = &(*p)->rb_left;
+		else if (vma->vm_end > pvma->vm_end)
+			p = &(*p)->rb_right;
+		else if (vma < pvma)
+			p = &(*p)->rb_left;
+		else if (vma > pvma)
+			p = &(*p)->rb_right;
+		else
+			BUG();
+	}
+
+	rb_link_node(&vma->vm_rb, parent, p);
+	rb_insert_color(&vma->vm_rb, &mm->mm_rb);
+
+	/* add VMA to the VMA list also */
+	for (pp = &mm->mmap; (pvma = *pp); pp = &(*pp)->vm_next) {
+		if (pvma->vm_start > vma->vm_start)
 			break;
+		if (pvma->vm_start < vma->vm_start)
+			continue;
+		if (pvma->vm_end < vma->vm_end)
+			break;
+	}
 
-	vml->next = *ppv;
-	*ppv = vml;
+	vma->vm_next = *pp;
+	*pp = vma;
+}
+
+/*
+ * delete a VMA from its owning mm_struct and address space
+ */
+static void delete_vma_from_mm(struct vm_area_struct *vma)
+{
+	struct vm_area_struct **pp;
+	struct address_space *mapping;
+	struct mm_struct *mm = vma->vm_mm;
+
+	kenter("%p", vma);
+
+	mm->map_count--;
+	if (mm->mmap_cache == vma)
+		mm->mmap_cache = NULL;
+
+	/* remove the VMA from the mapping */
+	if (vma->vm_file) {
+		mapping = vma->vm_file->f_mapping;
+
+		flush_dcache_mmap_lock(mapping);
+		vma_prio_tree_remove(vma, &mapping->i_mmap);
+		flush_dcache_mmap_unlock(mapping);
+	}
+
+	/* remove from the MM's tree and list */
+	rb_erase(&vma->vm_rb, &mm->mm_rb);
+	for (pp = &mm->mmap; *pp; pp = &(*pp)->vm_next) {
+		if (*pp == vma) {
+			*pp = vma->vm_next;
+			break;
+		}
+	}
+
+	vma->vm_mm = NULL;
+}
+
+/*
+ * destroy a VMA record
+ */
+static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	kenter("%p", vma);
+	if (vma->vm_ops && vma->vm_ops->close)
+		vma->vm_ops->close(vma);
+	if (vma->vm_file) {
+		fput(vma->vm_file);
+		if (vma->vm_flags & VM_EXECUTABLE)
+			removed_exe_file_vma(mm);
+	}
+	put_nommu_region(vma->vm_region);
+	kmem_cache_free(vm_area_cachep, vma);
 }
 
 /*
···
  */
 struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 {
-	struct vm_list_struct *loop, *vml;
+	struct vm_area_struct *vma;
+	struct rb_node *n = mm->mm_rb.rb_node;
 
-	/* search the vm_start ordered list */
-	vml = NULL;
-	for (loop = mm->context.vmlist; loop; loop = loop->next) {
-		if (loop->vma->vm_start > addr)
-			break;
-		vml = loop;
+	/* check the cache first */
+	vma = mm->mmap_cache;
+	if (vma && vma->vm_start <= addr && vma->vm_end > addr)
+		return vma;
+
+	/* trawl the tree (there may be multiple mappings in which addr
+	 * resides) */
+	for (n = rb_first(&mm->mm_rb); n; n = rb_next(n)) {
+		vma = rb_entry(n, struct vm_area_struct, vm_rb);
+		if (vma->vm_start > addr)
+			return NULL;
+		if (vma->vm_end > addr) {
+			mm->mmap_cache = vma;
+			return vma;
+		}
 	}
-
-	if (vml && vml->vma->vm_end > addr)
-		return vml->vma;
 
 	return NULL;
 }
···
 	return find_vma(mm, addr);
 }
 
+/*
+ * expand a stack to a given address
+ * - not supported under NOMMU conditions
+ */
 int expand_stack(struct vm_area_struct *vma, unsigned long address)
 {
 	return -ENOMEM;
···
  * look up the first VMA exactly that exactly matches addr
  * - should be called with mm->mmap_sem at least held readlocked
  */
-static inline struct vm_area_struct *find_vma_exact(struct mm_struct *mm,
-						    unsigned long addr)
-{
-	struct vm_list_struct *vml;
-
-	/* search the vm_start ordered list */
-	for (vml = mm->context.vmlist; vml; vml = vml->next) {
-		if (vml->vma->vm_start == addr)
-			return vml->vma;
-		if (vml->vma->vm_start > addr)
-			break;
-	}
-
-	return NULL;
-}
-
-/*
- * find a VMA in the global tree
- */
-static inline struct vm_area_struct *find_nommu_vma(unsigned long start)
+static struct vm_area_struct *find_vma_exact(struct mm_struct *mm,
+					     unsigned long addr,
+					     unsigned long len)
 {
 	struct vm_area_struct *vma;
-	struct rb_node *n = nommu_vma_tree.rb_node;
+	struct rb_node *n = mm->mm_rb.rb_node;
+	unsigned long end = addr + len;
 
-	while (n) {
+	/* check the cache first */
+	vma = mm->mmap_cache;
+	if (vma && vma->vm_start == addr && vma->vm_end == end)
+		return vma;
+
+	/* trawl the tree (there may be multiple mappings in which addr
+	 * resides) */
+	for (n = rb_first(&mm->mm_rb); n; n = rb_next(n)) {
 		vma = rb_entry(n, struct vm_area_struct, vm_rb);
-
-		if (start < vma->vm_start)
-			n = n->rb_left;
-		else if (start > vma->vm_start)
-			n = n->rb_right;
-		else
+		if (vma->vm_start < addr)
+			continue;
+		if (vma->vm_start > addr)
+			return NULL;
+		if (vma->vm_end == end) {
+			mm->mmap_cache = vma;
 			return vma;
+		}
 	}
 
 	return NULL;
-}
-
-/*
- * add a VMA in the global tree
- */
-static void add_nommu_vma(struct vm_area_struct *vma)
-{
-	struct vm_area_struct *pvma;
-	struct address_space *mapping;
-	struct rb_node **p = &nommu_vma_tree.rb_node;
-	struct rb_node *parent = NULL;
-
-	/* add the VMA to the mapping */
-	if (vma->vm_file) {
-		mapping = vma->vm_file->f_mapping;
-
-		flush_dcache_mmap_lock(mapping);
-		vma_prio_tree_insert(vma, &mapping->i_mmap);
-		flush_dcache_mmap_unlock(mapping);
-	}
-
-	/* add the VMA to the master list */
-	while (*p) {
-		parent = *p;
-		pvma = rb_entry(parent, struct vm_area_struct, vm_rb);
-
-		if (vma->vm_start < pvma->vm_start) {
-			p = &(*p)->rb_left;
-		}
-		else if (vma->vm_start > pvma->vm_start) {
-			p = &(*p)->rb_right;
-		}
-		else {
-			/* mappings are at the same address - this can only
-			 * happen for shared-mem chardevs and shared file
-			 * mappings backed by ramfs/tmpfs */
-			BUG_ON(!(pvma->vm_flags & VM_SHARED));
-
-			if (vma < pvma)
-				p = &(*p)->rb_left;
-			else if (vma > pvma)
-				p = &(*p)->rb_right;
-			else
-				BUG();
-		}
-	}
-
-	rb_link_node(&vma->vm_rb, parent, p);
-	rb_insert_color(&vma->vm_rb, &nommu_vma_tree);
-}
-
-/*
- * delete a VMA from the global list
- */
-static void delete_nommu_vma(struct vm_area_struct *vma)
-{
-	struct address_space *mapping;
-
-	/* remove the VMA from the mapping */
-	if (vma->vm_file) {
-		mapping = vma->vm_file->f_mapping;
-
-		flush_dcache_mmap_lock(mapping);
-		vma_prio_tree_remove(vma, &mapping->i_mmap);
-		flush_dcache_mmap_unlock(mapping);
-	}
-
-	/* remove from the master list */
-	rb_erase(&vma->vm_rb, &nommu_vma_tree);
 }
 
 /*
···
 			 unsigned long pgoff,
 			 unsigned long *_capabilities)
 {
-	unsigned long capabilities;
+	unsigned long capabilities, rlen;
 	unsigned long reqprot = prot;
 	int ret;
 
···
 		return -EINVAL;
 
 	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
-	if (!len || len > TASK_SIZE)
+	rlen = PAGE_ALIGN(len);
+	if (!rlen || rlen > TASK_SIZE)
 		return -ENOMEM;
 
 	/* offset overflow? */
-	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
+	if ((pgoff + (rlen >> PAGE_SHIFT)) < pgoff)
 		return -EOVERFLOW;
 
 	if (file) {
···
 }
 
 /*
- * set up a shared mapping on a file
+ * set up a shared mapping on a file (the driver or filesystem provides and
+ * pins the storage)
  */
-static int do_mmap_shared_file(struct vm_area_struct *vma, unsigned long len)
+static int do_mmap_shared_file(struct vm_area_struct *vma)
 {
 	int ret;
 
···
 /*
  * set up a private mapping or an anonymous shared mapping
  */
-static int do_mmap_private(struct vm_area_struct *vma, unsigned long len)
+static int do_mmap_private(struct vm_area_struct *vma,
+			   struct vm_region *region,
+			   unsigned long len)
 {
+	struct page *pages;
+	unsigned long total, point, n, rlen;
 	void *base;
-	int ret;
+	int ret, order;
 
 	/* invoke the file's mapping function so that it can keep track of
 	 * shared mappings on devices or memory
···
 	 * make a private copy of the data and map that instead */
 	}
 
+	rlen = PAGE_ALIGN(len);
+
 	/* allocate some memory to hold the mapping
 	 * - note that this may not return a page-aligned address if the object
 	 *   we're allocating is smaller than a page
 	 */
-	base = kmalloc(len, GFP_KERNEL|__GFP_COMP);
-	if (!base)
+	order = get_order(rlen);
+	kdebug("alloc order %d for %lx", order, len);
+
+	pages = alloc_pages(GFP_KERNEL, order);
+	if (!pages)
 		goto enomem;
 
-	vma->vm_start = (unsigned long) base;
-	vma->vm_end = vma->vm_start + len;
-	vma->vm_flags |= VM_MAPPED_COPY;
+	/* we allocated a power-of-2 sized page set, so we need to trim off the
+	 * excess */
+	total = 1 << order;
+	atomic_add(total, &mmap_pages_allocated);
 
-#ifdef WARN_ON_SLACK
-	if (len + WARN_ON_SLACK <= kobjsize(result))
-		printk("Allocation of %lu bytes from process %d has %lu bytes of slack\n",
-		       len, current->pid, kobjsize(result) - len);
-#endif
+	point = rlen >> PAGE_SHIFT;
+	while (total > point) {
+		order = ilog2(total - point);
+		n = 1 << order;
+		kdebug("shave %lu/%lu @%lu", n, total - point, total);
+		atomic_sub(n, &mmap_pages_allocated);
+		total -= n;
+		set_page_refcounted(pages + total);
+		__free_pages(pages + total, order);
+	}
+
+	total = rlen >> PAGE_SHIFT;
+	for (point = 1; point < total; point++)
+		set_page_refcounted(&pages[point]);
+
+	base = page_address(pages);
+	region->vm_flags = vma->vm_flags |= VM_MAPPED_COPY;
+	region->vm_start = (unsigned long) base;
+	region->vm_end = region->vm_start + rlen;
+
+	vma->vm_start = region->vm_start;
+	vma->vm_end = region->vm_start + len;
 
 	if (vma->vm_file) {
 		/* read the contents of a file into the copy */
···
 		old_fs = get_fs();
 		set_fs(KERNEL_DS);
-		ret = vma->vm_file->f_op->read(vma->vm_file, base, len, &fpos);
+		ret = vma->vm_file->f_op->read(vma->vm_file, base, rlen, &fpos);
 		set_fs(old_fs);
 
 		if (ret < 0)
 			goto error_free;
 
 		/* clear the last little bit */
-		if (ret < len)
-			memset(base + ret, 0, len - ret);
+		if (ret < rlen)
+			memset(base + ret, 0, rlen - ret);
 
 	} else {
 		/* if it's an anonymous mapping, then just clear it */
-		memset(base, 0, len);
+		memset(base, 0, rlen);
 	}
 
 	return 0;
 
 error_free:
-	kfree(base);
-	vma->vm_start = 0;
+	free_page_series(region->vm_start, region->vm_end);
+	region->vm_start = vma->vm_start = 0;
+	region->vm_end = vma->vm_end = 0;
 	return ret;
 
 enomem:
···
 	    unsigned long flags,
 	    unsigned long pgoff)
 {
-	struct vm_list_struct *vml = NULL;
-	struct vm_area_struct *vma = NULL;
+	struct vm_area_struct *vma;
+	struct vm_region *region;
 	struct rb_node *rb;
-	unsigned long capabilities, vm_flags;
-	void *result;
+	unsigned long capabilities, vm_flags, result;
 	int ret;
+
+	kenter(",%lx,%lx,%lx,%lx,%lx", addr, len, prot, flags, pgoff);
 
 	if (!(flags & MAP_FIXED))
 		addr = round_hint_to_min(addr);
···
 	 * mapping */
 	ret = validate_mmap_request(file, addr, len, prot, flags, pgoff,
 				    &capabilities);
-	if (ret < 0)
+	if (ret < 0) {
+		kleave(" = %d [val]", ret);
 		return ret;
+	}
 
 	/* we've determined that we can make the mapping, now translate what we
 	 * now know into VMA flags */
 	vm_flags = determine_vm_flags(file, prot, flags, capabilities);
 
-	/* we're going to need to record the mapping if it works */
-	vml = kzalloc(sizeof(struct vm_list_struct), GFP_KERNEL);
-	if (!vml)
-		goto error_getting_vml;
+	/* we're going to need to record the mapping */
+	region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL);
+	if (!region)
+		goto error_getting_region;
 
-	down_write(&nommu_vma_sem);
+	vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+	if (!vma)
+		goto error_getting_vma;
 
-	/* if we want to share, we need to check for VMAs created by other
+	atomic_set(&region->vm_usage, 1);
+	region->vm_flags = vm_flags;
+	region->vm_pgoff = pgoff;
+
+	INIT_LIST_HEAD(&vma->anon_vma_node);
+	vma->vm_flags = vm_flags;
+	vma->vm_pgoff = pgoff;
+
+	if (file) {
+		region->vm_file = file;
+		get_file(file);
+		vma->vm_file = file;
+		get_file(file);
+		if (vm_flags & VM_EXECUTABLE) {
+			added_exe_file_vma(current->mm);
+			vma->vm_mm = current->mm;
+		}
+	}
+
+	down_write(&nommu_region_sem);
+
+	/* if we want to share, we need to check for regions created by other
 	 * mmap() calls that overlap with our proposed mapping
-	 * - we can only share with an exact match on most regular files
+	 * - we can only share with a superset match on most regular files
 	 * - shared mappings on character devices and memory backed files are
 	 *   permitted to overlap inexactly as far as we are concerned for in
 	 *   these cases, sharing is handled in the driver or filesystem rather
 	 *   than here
 	 */
 	if (vm_flags & VM_MAYSHARE) {
-		unsigned long pglen = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		unsigned long vmpglen;
+		struct vm_region *pregion;
+		unsigned long pglen, rpglen, pgend, rpgend, start;
 
-		/* suppress VMA sharing for shared regions */
-		if (vm_flags & VM_SHARED &&
-		    capabilities & BDI_CAP_MAP_DIRECT)
-			goto dont_share_VMAs;
+		pglen = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		pgend = pgoff + pglen;
 
-		for (rb = rb_first(&nommu_vma_tree); rb; rb = rb_next(rb)) {
-			vma = rb_entry(rb, struct vm_area_struct, vm_rb);
+		for (rb = rb_first(&nommu_region_tree); rb; rb = rb_next(rb)) {
+			pregion = rb_entry(rb, struct vm_region, vm_rb);
 
-			if (!(vma->vm_flags & VM_MAYSHARE))
+			if (!(pregion->vm_flags & VM_MAYSHARE))
 				continue;
 
 			/* search for overlapping mappings on the same file */
-			if (vma->vm_file->f_path.dentry->d_inode != file->f_path.dentry->d_inode)
+			if (pregion->vm_file->f_path.dentry->d_inode !=
+			    file->f_path.dentry->d_inode)
 				continue;
 
-			if (vma->vm_pgoff >= pgoff + pglen)
+			if (pregion->vm_pgoff >= pgend)
 				continue;
 
-			vmpglen = vma->vm_end - vma->vm_start + PAGE_SIZE - 1;
-			vmpglen >>= PAGE_SHIFT;
-			if (pgoff >= vma->vm_pgoff + vmpglen)
+			rpglen = pregion->vm_end - pregion->vm_start;
+			rpglen = (rpglen + PAGE_SIZE - 1) >> PAGE_SHIFT;
+			rpgend = pregion->vm_pgoff + rpglen;
+			if (pgoff >= rpgend)
 				continue;
 
-			/* handle inexactly overlapping matches between mappings */
-			if (vma->vm_pgoff != pgoff || vmpglen != pglen) {
+			/* handle inexactly overlapping matches between
+			 * mappings */
+			if ((pregion->vm_pgoff != pgoff || rpglen != pglen) &&
+			    !(pgoff >= pregion->vm_pgoff && pgend <= rpgend)) {
+				/* new mapping is not a subset of the region */
 				if (!(capabilities & BDI_CAP_MAP_DIRECT))
 					goto sharing_violation;
 				continue;
 			}
 
-			/* we've found a VMA we can share */
-			atomic_inc(&vma->vm_usage);
+			/* we've found a region we can share */
+			atomic_inc(&pregion->vm_usage);
+			vma->vm_region = pregion;
+			start = pregion->vm_start;
+			start += (pgoff - pregion->vm_pgoff) << PAGE_SHIFT;
+			vma->vm_start = start;
+			vma->vm_end = start + len;
 
-			vml->vma = vma;
-			result = (void *) vma->vm_start;
-			goto shared;
+			if (pregion->vm_flags & VM_MAPPED_COPY) {
+				kdebug("share copy");
+				vma->vm_flags |= VM_MAPPED_COPY;
+			} else {
+				kdebug("share mmap");
+				ret = do_mmap_shared_file(vma);
+				if (ret < 0) {
+					vma->vm_region = NULL;
+					vma->vm_start = 0;
+					vma->vm_end = 0;
+					atomic_dec(&pregion->vm_usage);
+					pregion = NULL;
+					goto error_just_free;
+				}
+			}
+			fput(region->vm_file);
+			kmem_cache_free(vm_region_jar, region);
+			region = pregion;
+			result = start;
+			goto share;
 		}
-
-	dont_share_VMAs:
-		vma = NULL;
 
 		/* obtain the address at which to make a shared mapping
 		 * - this is the hook for quasi-memory character devices to
···
 		if (IS_ERR((void *) addr)) {
 			ret = addr;
 			if (ret != (unsigned long) -ENOSYS)
-				goto error;
+				goto error_just_free;
 
 			/* the driver refused to tell us where to site
 			 * the mapping so we'll have to attempt to copy
 			 * it */
 			ret = (unsigned long) -ENODEV;
 			if (!(capabilities & BDI_CAP_MAP_COPY))
-				goto error;
+				goto error_just_free;
 
 			capabilities &= ~BDI_CAP_MAP_DIRECT;
+		} else {
+			vma->vm_start = region->vm_start = addr;
+			vma->vm_end = region->vm_end = addr + len;
 		}
 		}
 	}
 
-	/* we're going to need a VMA struct as well */
-	vma = kzalloc(sizeof(struct vm_area_struct), GFP_KERNEL);
-	if (!vma)
-		goto error_getting_vma;
-
-	INIT_LIST_HEAD(&vma->anon_vma_node);
-	atomic_set(&vma->vm_usage, 1);
-	if (file) {
-		get_file(file);
-		if (vm_flags & VM_EXECUTABLE) {
-			added_exe_file_vma(current->mm);
-			vma->vm_mm = current->mm;
-		}
-	}
-	vma->vm_file = file;
-	vma->vm_flags = vm_flags;
-	vma->vm_start = addr;
-	vma->vm_end = addr + len;
-	vma->vm_pgoff = pgoff;
-
-	vml->vma = vma;
+	vma->vm_region = region;
 
 	/* set up the mapping */
 	if (file && vma->vm_flags & VM_SHARED)
-		ret = do_mmap_shared_file(vma, len);
+		ret = do_mmap_shared_file(vma);
 	else
-		ret = do_mmap_private(vma, len);
+		ret = do_mmap_private(vma, region, len);
 	if (ret < 0)
-		goto error;
+		goto error_put_region;
+
+	add_nommu_region(region);
 
 	/* okay... we have a mapping; now we have to register it */
-	result = (void *) vma->vm_start;
+	result = vma->vm_start;
 
 	current->mm->total_vm += len >> PAGE_SHIFT;
 
-	add_nommu_vma(vma);
+share:
+	add_vma_to_mm(current->mm, vma);
 
-shared:
-	add_vma_to_mm(current->mm, vml);
-
-	up_write(&nommu_vma_sem);
+	up_write(&nommu_region_sem);
 
 	if (prot & PROT_EXEC)
-		flush_icache_range((unsigned long) result,
-				   (unsigned long) result + len);
+		flush_icache_range(result, result + len);
 
-#ifdef DEBUG
-	printk("do_mmap:\n");
-	show_process_blocks();
-#endif
+	kleave(" = %lx", result);
+	return result;
 
-	return (unsigned long) result;
-
-error:
-	up_write(&nommu_vma_sem);
-	kfree(vml);
+error_put_region:
+	__put_nommu_region(region);
 	if (vma) {
 		if (vma->vm_file) {
 			fput(vma->vm_file);
 			if (vma->vm_flags & VM_EXECUTABLE)
 				removed_exe_file_vma(vma->vm_mm);
 		}
-		kfree(vma);
+		kmem_cache_free(vm_area_cachep, vma);
 	}
+	kleave(" = %d [pr]", ret);
 	return ret;
 
-sharing_violation:
-	up_write(&nommu_vma_sem);
-	printk("Attempt to share mismatched mappings\n");
-	kfree(vml);
-	return -EINVAL;
+error_just_free:
+	up_write(&nommu_region_sem);
+error:
+	fput(region->vm_file);
+	kmem_cache_free(vm_region_jar, region);
+	fput(vma->vm_file);
+	if (vma->vm_flags & VM_EXECUTABLE)
+		removed_exe_file_vma(vma->vm_mm);
+	kmem_cache_free(vm_area_cachep, vma);
+	kleave(" = %d", ret);
+	return ret;
 
-error_getting_vma:
-	up_write(&nommu_vma_sem);
-	kfree(vml);
-	printk("Allocation of vma for %lu byte allocation from process %d failed\n",
+sharing_violation:
+	up_write(&nommu_region_sem);
+	printk(KERN_WARNING "Attempt to share mismatched mappings\n");
+	ret = -EINVAL;
+	goto error;
+
+error_getting_vma:
+	kmem_cache_free(vm_region_jar, region);
+	printk(KERN_WARNING "Allocation of vma for %lu byte allocation"
+	       " from process %d failed\n",
 	       len, current->pid);
 	show_free_areas();
 	return -ENOMEM;
 
-error_getting_vml:
-	printk("Allocation of vml for %lu byte allocation from process %d failed\n",
+error_getting_region:
+	printk(KERN_WARNING "Allocation of vm region for %lu byte allocation"
+	       " from process %d failed\n",
 	       len, current->pid);
 	show_free_areas();
 	return -ENOMEM;
···
 EXPORT_SYMBOL(do_mmap_pgoff);
 
 /*
- * handle mapping disposal for uClinux
+ * split a vma into two pieces at address 'addr', a new vma is allocated either
+ * for the first part or the tail.
  */
-static void put_vma(struct mm_struct *mm, struct vm_area_struct *vma)
+int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+	      unsigned long addr, int new_below)
 {
-	if (vma) {
-		down_write(&nommu_vma_sem);
+	struct vm_area_struct *new;
+	struct vm_region *region;
+	unsigned long npages;
 
-		if (atomic_dec_and_test(&vma->vm_usage)) {
-			delete_nommu_vma(vma);
+	kenter("");
 
-			if (vma->vm_ops && vma->vm_ops->close)
-				vma->vm_ops->close(vma);
+	/* we're only permitted to split anonymous regions that have a single
+	 * owner */
+	if (vma->vm_file ||
+	    atomic_read(&vma->vm_region->vm_usage) != 1)
+		return -ENOMEM;
 
-			/* IO memory and memory shared directly out of the pagecache from
-			 * ramfs/tmpfs mustn't be released here */
-			if (vma->vm_flags & VM_MAPPED_COPY)
-				kfree((void *) vma->vm_start);
+	if (mm->map_count >= sysctl_max_map_count)
+		return -ENOMEM;
 
-			if (vma->vm_file) {
-				fput(vma->vm_file);
-				if (vma->vm_flags & VM_EXECUTABLE)
-					removed_exe_file_vma(mm);
-			}
-			kfree(vma);
-		}
+	region = kmem_cache_alloc(vm_region_jar, GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
 
-		up_write(&nommu_vma_sem);
+	new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+	if (!new) {
+		kmem_cache_free(vm_region_jar, region);
+		return -ENOMEM;
 	}
+
+	/* most fields are the same, copy all, and then fixup */
+	*new = *vma;
+	*region = *vma->vm_region;
+	new->vm_region = region;
+
+	npages = (addr - vma->vm_start) >> PAGE_SHIFT;
+
+	if (new_below) {
+		region->vm_end = new->vm_end = addr;
+	} else {
+		region->vm_start = new->vm_start = addr;
+		region->vm_pgoff = new->vm_pgoff += npages;
+	}
+
+	if (new->vm_ops && new->vm_ops->open)
+		new->vm_ops->open(new);
+
+	delete_vma_from_mm(vma);
+	down_write(&nommu_region_sem);
+	delete_nommu_region(vma->vm_region);
+	if (new_below) {
+		vma->vm_region->vm_start = vma->vm_start = addr;
+		vma->vm_region->vm_pgoff = vma->vm_pgoff += npages;
+	} else {
+		vma->vm_region->vm_end = vma->vm_end = addr;
+	}
+	add_nommu_region(vma->vm_region);
+	add_nommu_region(new->vm_region);
+	up_write(&nommu_region_sem);
+	add_vma_to_mm(mm, vma);
+	add_vma_to_mm(mm, new);
+	return 0;
+}
+
+/*
+ * shrink a VMA by removing the specified chunk from either the beginning or
+ * the end
+ */
+static int shrink_vma(struct mm_struct *mm,
+		      struct vm_area_struct *vma,
+		      unsigned long from, unsigned long to)
+{
+	struct vm_region *region;
+
+	kenter("");
+
+	/* adjust the VMA's pointers, which may reposition it in the MM's tree
+	 * and list */
+	delete_vma_from_mm(vma);
+	if (from > vma->vm_start)
+		vma->vm_end = from;
+	else
+		vma->vm_start = to;
+	add_vma_to_mm(mm, vma);
+
+	/* cut the backing region down to size */
+	region = vma->vm_region;
+	BUG_ON(atomic_read(&region->vm_usage) != 1);
+
+	down_write(&nommu_region_sem);
+	delete_nommu_region(region);
+	if (from > region->vm_start)
+		region->vm_end = from;
+	else
+		region->vm_start = to;
+	add_nommu_region(region);
+	up_write(&nommu_region_sem);
+
+	free_page_series(from, to);
+	return 0;
 }
 
 /*
  * release a mapping
- * - under NOMMU conditions the parameters must match exactly to the mapping to
- *   be removed
+ * - under NOMMU conditions the chunk to be unmapped must be backed by a single
+ *   VMA, though it need not cover the whole VMA
  */
-int do_munmap(struct mm_struct *mm, unsigned long addr, size_t len)
+int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 {
-	struct vm_list_struct *vml, **parent;
-	unsigned long end = addr + len;
+	struct vm_area_struct *vma;
+	struct rb_node *rb;
+	unsigned long end = start + len;
+	int ret;
 
-#ifdef DEBUG
-	printk("do_munmap:\n");
-#endif
+	kenter(",%lx,%zx", start, len);
 
-	for (parent = &mm->context.vmlist; *parent; parent = &(*parent)->next) {
-		if ((*parent)->vma->vm_start > addr)
-			break;
-		if ((*parent)->vma->vm_start == addr &&
-		    ((len == 0) || ((*parent)->vma->vm_end == end)))
-			goto found;
+	if (len == 0)
+		return -EINVAL;
+
+	/* find the first potentially overlapping VMA */
+	vma = find_vma(mm, start);
+	if (!vma) {
+		printk(KERN_WARNING
+		       "munmap of memory not mmapped by process %d (%s):"
+		       " 0x%lx-0x%lx\n",
+		       current->pid, current->comm, start, start + len - 1);
+		return -EINVAL;
 	}
 
-	printk("munmap of non-mmaped memory by process %d (%s): %p\n",
-	       current->pid, current->comm, (void *) addr);
-	return -EINVAL;
+	/* we're allowed to split an anonymous VMA but not a file-backed one */
+	if (vma->vm_file) {
+		do {
+			if (start > vma->vm_start) {
+				kleave(" = -EINVAL [miss]");
+				return -EINVAL;
+			}
+			if (end == vma->vm_end)
+				goto erase_whole_vma;
+			rb = rb_next(&vma->vm_rb);
+			vma = rb_entry(rb, struct vm_area_struct, vm_rb);
+		} while (rb);
+		kleave(" = -EINVAL [split file]");
+		return -EINVAL;
+	} else {
+		/* the chunk must be a subset of the VMA found */
+		if (start == vma->vm_start && end == vma->vm_end)
+			goto erase_whole_vma;
+		if (start < vma->vm_start || end > vma->vm_end) {
+			kleave(" = -EINVAL [superset]");
+			return -EINVAL;
+		}
+		if (start & ~PAGE_MASK) {
+			kleave(" = -EINVAL [unaligned start]");
+			return -EINVAL;
+		}
+		if (end != vma->vm_end && end & ~PAGE_MASK) {
+			kleave(" = -EINVAL [unaligned split]");
+			return -EINVAL;
+		}
+		if (start != vma->vm_start && end != vma->vm_end) {
+			ret = split_vma(mm, vma, start, 1);
+			if (ret < 0) {
+				kleave(" = %d [split]", ret);
+				return ret;
+			}
+		}
+		return shrink_vma(mm, vma, start, end);
+	}
 
-found:
-	vml = *parent;
-
-	put_vma(mm, vml->vma);
-
-	*parent = vml->next;
-	kfree(vml);
-
-	update_hiwater_vm(mm);
-	mm->total_vm -= len >> PAGE_SHIFT;
-
-#ifdef DEBUG
-	show_process_blocks();
-#endif
-
+erase_whole_vma:
+	delete_vma_from_mm(vma);
+	delete_vma(mm, vma);
+	kleave(" = 0");
 	return 0;
 }
 EXPORT_SYMBOL(do_munmap);
···
 }
 
 /*
- * Release all mappings
+ * release all the mappings made in a process's VM space
  */
-void exit_mmap(struct mm_struct * mm)
+void exit_mmap(struct mm_struct *mm)
 {
-	struct vm_list_struct *tmp;
+	struct vm_area_struct *vma;
 
-	if (mm) {
-#ifdef DEBUG
-		printk("Exit_mmap:\n");
-#endif
+	if (!mm)
+		return;
 
-		mm->total_vm = 0;
+	kenter("");
 
-		while ((tmp = mm->context.vmlist)) {
-			mm->context.vmlist = tmp->next;
-			put_vma(mm, tmp->vma);
-			kfree(tmp);
-		}
+	mm->total_vm = 0;
 
-#ifdef DEBUG
-		show_process_blocks();
-#endif
+	while ((vma = mm->mmap)) {
+		mm->mmap = vma->vm_next;
+		delete_vma_from_mm(vma);
+		delete_vma(mm, vma);
 	}
+
+	kleave("");
 }
 
 unsigned long do_brk(unsigned long addr, unsigned long len)
···
  * time (controlled by the MREMAP_MAYMOVE flag and available VM space)
  *
  * under NOMMU conditions, we only permit changing a mapping's size, and only
- * as long as it stays within the hole allocated by the kmalloc() call in
- * do_mmap_pgoff() and the block is not shareable
+ * as long as it stays within the region allocated by do_mmap_private() and the
+ * block is not shareable
  *
  * MREMAP_FIXED is not supported under NOMMU conditions
  */
···
 	struct vm_area_struct *vma;
 
 	/* insanity checks first */
-	if (new_len == 0)
+	if (old_len == 0 || new_len == 0)
 		return (unsigned long) -EINVAL;
+
+	if (addr & ~PAGE_MASK)
+		return -EINVAL;
 
 	if (flags & MREMAP_FIXED && new_addr != addr)
 		return (unsigned long) -EINVAL;
 
-	vma = find_vma_exact(current->mm, addr);
+	vma = find_vma_exact(current->mm, addr, old_len);
 	if (!vma)
 		return (unsigned long) -EINVAL;
···
 	if (vma->vm_flags & VM_MAYSHARE)
 		return (unsigned long) -EPERM;
 
-	if (new_len > kobjsize((void *) addr))
+	if (new_len > vma->vm_region->vm_end - vma->vm_region->vm_start)
 		return (unsigned long) -ENOMEM;
 
 	/* all checks complete - do it */
 	vma->vm_end = vma->vm_start + new_len;
-
 	return vma->vm_start;
 }
 EXPORT_SYMBOL(do_mremap);
 
-asmlinkage unsigned long sys_mremap(unsigned long addr,
-	unsigned long old_len, unsigned long new_len,
-	unsigned long flags, unsigned long new_addr)
+asmlinkage
+unsigned long sys_mremap(unsigned long addr,
+			 unsigned long old_len, unsigned long new_len,
+			 unsigned long flags, unsigned long new_addr)
 {
 	unsigned long ret;
 