Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hugetlbfs: allow the creation of files suitable for MAP_PRIVATE on the vfs internal mount

This patchset adds a flag to mmap that allows the user to request that an
anonymous mapping be backed with huge pages. This mapping will borrow
functionality from the huge page shm code to create a file on the kernel
internal mount and use it to approximate an anonymous mapping. The
MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
both flags being present.

A new flag is necessary because there is no other way to hook into huge
pages without creating a file on a hugetlbfs mount which wouldn't be
MAP_ANONYMOUS.

To userspace, this mapping will behave just like an anonymous mapping
because the file is not accessible outside of the kernel.
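
From userspace, the intended usage can be sketched as below. This is a minimal example, assuming a Linux system whose headers define MAP_HUGETLB (the flag this series introduces) and a request length that is a multiple of the huge page size. The helper name map_anon_prefer_huge() and the fallback to normal pages are illustrative choices, not part of the patchset.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: request an anonymous private mapping backed by
 * huge pages. If the kernel cannot satisfy it (no huge pages reserved,
 * or len is not a multiple of the huge page size), fall back to an
 * ordinary anonymous mapping.
 *
 * *used_huge is set to 1 when the huge page request succeeded, else 0.
 * Returns the mapping, or MAP_FAILED if both attempts fail.
 */
static void *map_anon_prefer_huge(size_t len, int *used_huge)
{
	/* MAP_HUGETLB only modifies MAP_ANONYMOUS here; no fd is needed. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	*used_huge = (p != MAP_FAILED);
	if (p == MAP_FAILED)
		p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p;
}
```

A caller would pass, for example, 2 MiB on a system with 2 MiB huge pages and release the region with munmap() using the same length.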

This patchset is meant to simplify the programming model. Presently there
is a large chunk of boilerplate code, contained in libhugetlbfs, required
to create private, hugepage-backed mappings. This patchset would allow
use of hugepages without linking to libhugetlbfs or having hugetlbfs
mounted.
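
For comparison, the file-based route that this boilerplate has to cover looks roughly like the sketch below. It is an illustration only, not libhugetlbfs code: the helper name map_private_via_hugetlbfs() and the "private-XXXXXX" file name are hypothetical, and the caller is assumed to know where hugetlbfs is mounted and to pass a length that is a multiple of the huge page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary file under a hugetlbfs mount
 * (mount_dir is supplied by the caller), unlink it so it disappears
 * when the mapping goes away, and map it MAP_PRIVATE.
 *
 * Returns the mapping, or MAP_FAILED on any error.
 */
static void *map_private_via_hugetlbfs(const char *mount_dir, size_t len)
{
	char path[4096];
	void *p;
	int fd;
	int n;

	n = snprintf(path, sizeof(path), "%s/private-XXXXXX", mount_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return MAP_FAILED;

	fd = mkstemp(path);	/* fails if mount_dir does not exist */
	if (fd < 0)
		return MAP_FAILED;
	unlink(path);		/* the name is no longer needed */

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping keeps the file alive */
	return p;
}
```

Every step here depends on a hugetlbfs mount being present and writable, which is the requirement the MAP_HUGETLB flag is meant to remove.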

Unification of the VM code would provide these same benefits, but it has
been resisted each time it has been suggested, for several reasons: it
would break PAGE_SIZE assumptions across the kernel, it makes page-table
abstractions really expensive, and on architectures that do not support
huge pages it incurs fast-path penalties without providing any benefit.

This patch:

There are two means of creating mappings backed by huge pages:

1. mmap() a file created on hugetlbfs
2. Use shm, which creates a file on an internal mount and essentially
maps it MAP_SHARED

The internal mount is only used for shared mappings but there is very
little that stops it being used for private mappings. This patch extends
hugetlb_file_setup() to deal with the creation of files that will be
mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.

Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Eric B Munson and committed by Linus Torvalds
6bfde05b f8dbf0a7

+28 -7
+17 -4
fs/hugetlbfs/inode.c
···
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 		INIT_LIST_HEAD(&inode->i_mapping->private_list);
 		info = HUGETLBFS_I(inode);
+		/*
+		 * The policy is initialized here even if we are creating a
+		 * private inode because initialization simply creates an
+		 * an empty rb tree and calls spin_lock_init(), later when we
+		 * call mpol_free_shared_policy() it will just return because
+		 * the rb tree will still be empty.
+		 */
 		mpol_shared_policy_init(&info->policy, NULL);
 		switch (mode & S_IFMT) {
 		default:
···

 static struct vfsmount *hugetlbfs_vfsmount;

-static int can_do_hugetlb_shm(void)
+static int can_do_hugetlb_shm(int creat_flags)
 {
-	return capable(CAP_IPC_LOCK) || in_group_p(sysctl_hugetlb_shm_group);
+	if (creat_flags != HUGETLB_SHMFS_INODE)
+		return 0;
+	if (capable(CAP_IPC_LOCK))
+		return 1;
+	if (in_group_p(sysctl_hugetlb_shm_group))
+		return 1;
+	return 0;
 }

 struct file *hugetlb_file_setup(const char *name, size_t size, int acctflag,
-				struct user_struct **user)
+				struct user_struct **user, int creat_flags)
 {
 	int error = -ENOMEM;
 	struct file *file;
···
 	if (!hugetlbfs_vfsmount)
 		return ERR_PTR(-ENOENT);

-	if (!can_do_hugetlb_shm()) {
+	if (!can_do_hugetlb_shm(creat_flags)) {
 		*user = current_user();
 		if (user_shm_lock(size, *user)) {
 			WARN_ONCE(1,
+10 -2
include/linux/hugetlb.h
···

 #endif /* !CONFIG_HUGETLB_PAGE */

+enum {
+	/*
+	 * The file will be used as an shm file so shmfs accounting rules
+	 * apply
+	 */
+	HUGETLB_SHMFS_INODE = 1,
+};
+
 #ifdef CONFIG_HUGETLBFS
 struct hugetlbfs_config {
 	uid_t uid;
···
 extern const struct file_operations hugetlbfs_file_operations;
 extern struct vm_operations_struct hugetlb_vm_ops;
 struct file *hugetlb_file_setup(const char *name, size_t size, int acct,
-				struct user_struct **user);
+				struct user_struct **user, int creat_flags);
 int hugetlb_get_quota(struct address_space *mapping, long delta);
 void hugetlb_put_quota(struct address_space *mapping, long delta);

···

 #define is_file_hugepages(file)		0
 #define set_file_hugepages(file)	BUG()
-#define hugetlb_file_setup(name,size,acct,user)	ERR_PTR(-ENOSYS)
+#define hugetlb_file_setup(name,size,acct,user,creat)	ERR_PTR(-ENOSYS)

 #endif /* !CONFIG_HUGETLBFS */

+1 -1
ipc/shm.c
···
 		if (shmflg & SHM_NORESERVE)
 			acctflag = VM_NORESERVE;
 		file = hugetlb_file_setup(name, size, acctflag,
-					&shp->mlock_user);
+					&shp->mlock_user, HUGETLB_SHMFS_INODE);
 	} else {
 		/*
 		 * Do not allow no accounting for OVERCOMMIT_NEVER, even
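
The reworked permission check in fs/hugetlbfs/inode.c can be modelled in userspace to make its decision table explicit. The sketch below is not kernel code: can_do_hugetlb_shm_model() is a hypothetical stand-in in which the results of capable(CAP_IPC_LOCK) and in_group_p(sysctl_hugetlb_shm_group) are passed in as booleans. Only HUGETLB_SHMFS_INODE and the order of the checks are taken from the diff.

```c
#include <stdbool.h>

enum {
	/* Same value as the enum added to include/linux/hugetlb.h. */
	HUGETLB_SHMFS_INODE = 1,
};

/*
 * Userspace model of the new can_do_hugetlb_shm(). The two boolean
 * parameters stand in for the kernel's capability and group checks.
 */
static int can_do_hugetlb_shm_model(int creat_flags, bool has_cap_ipc_lock,
				    bool in_shm_group)
{
	/* Anything other than an shm inode never gets the shm shortcut. */
	if (creat_flags != HUGETLB_SHMFS_INODE)
		return 0;
	if (has_cap_ipc_lock)
		return 1;
	if (in_shm_group)
		return 1;
	return 0;
}
```

Because the model returns 0 for every value other than HUGETLB_SHMFS_INODE, a caller creating a file for a private mapping always reaches the user_shm_lock() branch of hugetlb_file_setup(), regardless of capabilities or group membership.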