Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

coredump_filter: add hugepage dumping

Presently a hugepage's vma has the VM_RESERVED flag set so that it is not
swapped. But a VM_RESERVED vma isn't core dumped, because this flag is
often used for kernel vmas (e.g. vmalloc, sound related).

Thus hugepages are never dumped, which makes them hard to debug. Many
developers want hugepages to be included in the core dump.

However, we can't read a generic VM_RESERVED area, because such an area is
often an I/O mapping and reading it may change device state. That is
definitely an undesirable side effect.

So adding a hugepage-specific bit to the coredump filter is better. It
enables hugepage core dumping and doesn't cause any side effects on
I/O devices.

In addition, libhugetlbfs uses hugetlb private mapping pages as anonymous
pages. Therefore, hugepage private mapping pages should be core dumped by
default.

As a result, /proc/[pid]/coredump_filter has two new bits.

- bit 5 controls whether hugetlb private mapping pages are dumped. (default: yes)
- bit 6 controls whether hugetlb shared mapping pages are dumped. (default: no)
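The bit numbering above can be sketched as a small userspace helper that builds a coredump_filter value from a list of bits. This is only an illustration: the `sk_` names are hypothetical and are not part of the patch or of any kernel header. Only the bit positions are taken from the patch.

```c
#include <stddef.h>

/* Bit positions of /proc/[pid]/coredump_filter as listed in this patch.
 * The enum and function names are hypothetical, for illustration only. */
enum sk_coredump_bit {
	SK_ANON_PRIVATE    = 0,
	SK_ANON_SHARED     = 1,
	SK_MAPPED_PRIVATE  = 2,
	SK_MAPPED_SHARED   = 3,
	SK_ELF_HEADERS     = 4,
	SK_HUGETLB_PRIVATE = 5,	/* new in this patch, set by default */
	SK_HUGETLB_SHARED  = 6,	/* new in this patch, clear by default */
};

/* OR together the requested bits into a filter value. */
unsigned int sk_coredump_mask(const enum sk_coredump_bit *bits, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= 1u << bits[i];
	return mask;
}

/* Return 1 if the given bit is set in the filter value, else 0. */
int sk_bit_is_set(unsigned int mask, enum sk_coredump_bit bit)
{
	return (mask >> bit) & 1u;
}
```

With bits 0, 1 and 5 the helper yields the new default of 0x23. With bits 0, 1 and 6 it yields 0x43, the value used in the test below, which dumps shared hugetlb mappings and skips private ones.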

I tested with the following method.

% ulimit -c unlimited
% ./crash_hugepage 50
% ./crash_hugepage 50 -p
% ls -lh
% gdb ./crash_hugepage core
%
% echo 0x43 > /proc/self/coredump_filter
% ./crash_hugepage 50
% ./crash_hugepage 50 -p
% ls -lh
% gdb ./crash_hugepage core

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

#include "hugetlbfs.h"

int main(int argc, char** argv){
	char* p;
	int ch;
	int mmap_flags = MAP_SHARED;
	int fd;
	int nr_pages;

	/* -p selects a private mapping instead of the default shared one */
	while((ch = getopt(argc, argv, "p")) != -1) {
		switch (ch) {
		case 'p':
			mmap_flags &= ~MAP_SHARED;
			mmap_flags |= MAP_PRIVATE;
			break;
		default:
			/* nothing*/
			break;
		}
	}
	argc -= optind;
	argv += optind;

	if (argc == 0){
		printf("need # of pages\n");
		exit(1);
	}

	nr_pages = atoi(argv[0]);
	if (nr_pages < 2) {
		printf("nr_pages must be >= 2\n");
		exit(1);
	}

	fd = hugetlbfs_unlinked_fd();
	if (fd < 0) {
		perror("hugetlbfs_unlinked_fd");
		exit(1);
	}

	p = mmap(NULL, nr_pages * gethugepagesize(),
		 PROT_READ|PROT_WRITE, mmap_flags, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	sleep(2);

	/* touch the second hugepage */
	*(p + gethugepagesize()) = 1; /* COW */
	sleep(2);

	/* crash! */
	*(int*)0 = 1;

	return 0;
}

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: William Irwin <wli@holomorphy.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by KOSAKI Motohiro and committed by Linus Torvalds
e575f111 d903ef9f

+25 -9
+10 -5
Documentation/filesystems/proc.txt
@@ -2412,24 +2412,29 @@
 of memory types. If a bit of the bitmask is set, memory segments of the
 corresponding memory type are dumped, otherwise they are not dumped.

-The following 4 memory types are supported:
+The following 7 memory types are supported:
   - (bit 0) anonymous private memory
   - (bit 1) anonymous shared memory
   - (bit 2) file-backed private memory
   - (bit 3) file-backed shared memory
   - (bit 4) ELF header pages in file-backed private memory areas (it is
     effective only if the bit 2 is cleared)
+  - (bit 5) hugetlb private memory
+  - (bit 6) hugetlb shared memory

 Note that MMIO pages such as frame buffer are never dumped and vDSO pages
 are always dumped regardless of the bitmask status.

-Default value of coredump_filter is 0x3; this means all anonymous memory
-segments are dumped.
+Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
+effected by bit 5-6.
+
+Default value of coredump_filter is 0x23; this means all anonymous memory
+segments and hugetlb private memory are dumped.

 If you don't want to dump all shared memory segments attached to pid 1234,
-write 1 to the process's proc file.
+write 0x21 to the process's proc file.

-  $ echo 0x1 > /proc/1234/coredump_filter
+  $ echo 0x21 > /proc/1234/coredump_filter

 When a new process is created, the process inherits the bitmask status from its
 parent. It is useful to set up coredump_filter before the program runs.
+10 -2
fs/binfmt_elf.c
@@ -1156,15 +1156,23 @@
 static unsigned long vma_dump_size(struct vm_area_struct *vma,
 				   unsigned long mm_flags)
 {
+#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))
+
 	/* The vma can be set up to tell us the answer directly. */
 	if (vma->vm_flags & VM_ALWAYSDUMP)
 		goto whole;

+	/* Hugetlb memory check */
+	if (vma->vm_flags & VM_HUGETLB) {
+		if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
+			goto whole;
+		if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
+			goto whole;
+	}
+
 	/* Do not dump I/O mapped devices or special mappings */
 	if (vma->vm_flags & (VM_IO | VM_RESERVED))
 		return 0;
-
-#define FILTER(type)	(mm_flags & (1UL << MMF_DUMP_##type))

 	/* By default, dump shared memory if mapped from an anonymous file. */
 	if (vma->vm_flags & VM_SHARED) {
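
The ordering of the checks in this hunk matters: the hugetlb test runs before the VM_IO | VM_RESERVED test, so a hugetlb vma (which carries VM_RESERVED) can still be dumped when its filter bit is set, and falls through to "not dumped" otherwise. A minimal userspace model of that ordering is sketched below. The `SK_` flag values and names are made up for illustration and do not match the kernel's definitions, and the filter argument uses the user-visible bit numbers (5 and 6) rather than the internal MMF_DUMP_* positions.

```c
/* Hypothetical stand-ins for vm_flags bits; values are NOT the kernel's. */
#define SK_VM_SHARED		(1UL << 0)
#define SK_VM_HUGETLB		(1UL << 1)
#define SK_VM_IO		(1UL << 2)
#define SK_VM_RESERVED		(1UL << 3)
#define SK_VM_ALWAYSDUMP	(1UL << 4)

/* User-visible coredump_filter bits added by this patch. */
#define SK_FILTER_HUGETLB_PRIVATE	(1UL << 5)
#define SK_FILTER_HUGETLB_SHARED	(1UL << 6)

enum sk_verdict {
	SK_DUMP_WHOLE,		/* corresponds to "goto whole" */
	SK_DUMP_NONE,		/* corresponds to "return 0" */
	SK_FALL_THROUGH,	/* later checks in vma_dump_size() decide */
};

/* Model of the early checks, in the same order as the patched function. */
enum sk_verdict sk_early_checks(unsigned long vm_flags, unsigned long filter)
{
	if (vm_flags & SK_VM_ALWAYSDUMP)
		return SK_DUMP_WHOLE;

	/* Hugetlb check comes first, before VM_RESERVED can reject the vma. */
	if (vm_flags & SK_VM_HUGETLB) {
		if ((vm_flags & SK_VM_SHARED) &&
		    (filter & SK_FILTER_HUGETLB_SHARED))
			return SK_DUMP_WHOLE;
		if (!(vm_flags & SK_VM_SHARED) &&
		    (filter & SK_FILTER_HUGETLB_PRIVATE))
			return SK_DUMP_WHOLE;
	}

	/* I/O mappings and other reserved vmas are never dumped. */
	if (vm_flags & (SK_VM_IO | SK_VM_RESERVED))
		return SK_DUMP_NONE;

	return SK_FALL_THROUGH;
}
```

Under this model, a private hugetlb vma with the default filter 0x23 is dumped whole, while a shared one is rejected by the VM_RESERVED check unless bit 6 is set.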
+5 -2
include/linux/sched.h
@@ -403,12 +403,15 @@
 #define MMF_DUMP_MAPPED_PRIVATE 4
 #define MMF_DUMP_MAPPED_SHARED 5
 #define MMF_DUMP_ELF_HEADERS 6
+#define MMF_DUMP_HUGETLB_PRIVATE 7
+#define MMF_DUMP_HUGETLB_SHARED 8
 #define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
-#define MMF_DUMP_FILTER_BITS 5
+#define MMF_DUMP_FILTER_BITS 7
 #define MMF_DUMP_FILTER_MASK \
 	(((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
 #define MMF_DUMP_FILTER_DEFAULT \
-	((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED))
+	((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) |\
+	 (1 << MMF_DUMP_HUGETLB_PRIVATE))

 struct sighand_struct {
 	atomic_t count;
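
The relation between these internal bit positions and the user-visible coredump_filter value can be checked with a little arithmetic. The sketch below assumes MMF_DUMPABLE_BITS is 2 and that MMF_DUMP_ANON_PRIVATE and MMF_DUMP_ANON_SHARED are 2 and 3. None of these three values appear in the hunk; they are inferred from MMF_DUMP_MAPPED_PRIVATE being 4 while the documentation calls file-backed private memory "bit 2". The `SK_` and `sk_` names are hypothetical.

```c
/* Assumed, not shown in the hunk: inferred from the documented bit order. */
#define SK_DUMPABLE_BITS	2
#define SK_DUMP_ANON_PRIVATE	2
#define SK_DUMP_ANON_SHARED	3

/* Values taken from the hunk above. */
#define SK_DUMP_HUGETLB_PRIVATE	7
#define SK_DUMP_FILTER_SHIFT	SK_DUMPABLE_BITS
#define SK_DUMP_FILTER_BITS	7

#define SK_DUMP_FILTER_MASK \
	(((1UL << SK_DUMP_FILTER_BITS) - 1) << SK_DUMP_FILTER_SHIFT)
#define SK_DUMP_FILTER_DEFAULT \
	((1UL << SK_DUMP_ANON_PRIVATE) | (1UL << SK_DUMP_ANON_SHARED) | \
	 (1UL << SK_DUMP_HUGETLB_PRIVATE))

/* Convert a user-visible coredump_filter value to internal flag bits. */
unsigned long sk_user_to_mm_flags(unsigned long user_mask)
{
	return (user_mask << SK_DUMP_FILTER_SHIFT) & SK_DUMP_FILTER_MASK;
}

/* Convert internal flag bits back to the user-visible filter value. */
unsigned long sk_mm_flags_to_user(unsigned long mm_flags)
{
	return (mm_flags & SK_DUMP_FILTER_MASK) >> SK_DUMP_FILTER_SHIFT;
}
```

Under those assumptions the internal default is 0x8c, which shifts down to the documented user-visible default of 0x23, and widening the filter from 5 to 7 bits grows the mask so that the two hugetlb bits survive the conversion.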