Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/crash: add crash memory hotplug support

Extend the arch crash hotplug handler, introduced by the patch titled
"powerpc: add crash CPU hotplug support", to also support memory
add/remove events.

The elfcorehdr describes the memory of the crashed kernel to the capture
kernel; hence, it needs to be updated if memory resources change due to
memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
is updated to recreate the elfcorehdr and replace the previous one with
it on memory add/remove events.
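
The replace step described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: struct demo_image, demo_crash_image and demo_replace_elfcorehdr() are hypothetical names, and C11 atomic_exchange() stands in for the kernel's xchg(). It shows the size check against the segment, then the invalidate, copy, revalidate sequence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the elfcorehdr kexec segment of a loaded image. */
struct demo_image {
	unsigned char *seg;	/* memory backing the elfcorehdr segment */
	size_t memsz;		/* space reserved for it at load time */
};

/* Hypothetical stand-in for the kernel's kexec_crash_image pointer. */
static _Atomic(struct demo_image *) demo_crash_image;

/*
 * Copy a freshly built header over the old one.
 * Returns 0 on success, -1 if the new header does not fit.
 */
static int demo_replace_elfcorehdr(struct demo_image *image,
				   const void *elfbuf, size_t elfsz)
{
	assert(image != NULL && elfbuf != NULL);

	/* Refuse to write past the space reserved when the image was loaded. */
	if (elfsz > image->memsz)
		return -1;

	/* Temporarily unpublish the image so nobody uses a half-written header. */
	atomic_exchange(&demo_crash_image, (struct demo_image *)NULL);

	memcpy(image->seg, elfbuf, elfsz);

	/* Publish the image again now that the header is consistent. */
	atomic_exchange(&demo_crash_image, image);
	return 0;
}
```

A caller would build the new header into a temporary buffer first and only then call demo_replace_elfcorehdr(), so a failure while building never disturbs the header already in place.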

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot remove, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges
to ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
 Initiate memory notify call <----> crash hotplug handler
 chain for MEM_OFFLINE event
          |
          v
 Update memblock list

          Figure 1
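
Removing the hot-removed range from the crash memory ranges comes down to four cases: the range matches an entry exactly, sits at the start of an entry, sits at the end of an entry, or falls in the middle and splits the entry in two. The following is a user-space sketch under stated assumptions: struct demo_range, struct demo_ranges and demo_remove_range() are hypothetical stand-ins for the kernel's crash_mem handling, and a fixed-size array replaces the kernel's reallocation. Unlike the split branch as it appears in this commit's diff, the sketch builds the tail entry from the saved original end before shrinking the head entry; that ordering is a choice made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RANGES 16

/* Hypothetical range entry; both start and end are inclusive addresses. */
struct demo_range {
	uint64_t start, end;
};

/* Hypothetical sorted range list with fixed capacity. */
struct demo_ranges {
	size_t nr;
	struct demo_range r[DEMO_MAX_RANGES];
};

/*
 * Remove [base, base + size - 1] from the entry that fully contains it.
 * Returns 0 on success or when no entry contains the range,
 * -1 when a split is needed but the array is full.
 */
static int demo_remove_range(struct demo_ranges *rs, uint64_t base, uint64_t size)
{
	uint64_t end;
	size_t i, j;

	assert(rs != NULL);
	if (!size)
		return 0;
	end = base + size - 1;

	for (i = 0; i < rs->nr; i++) {
		uint64_t mstart = rs->r[i].start;
		uint64_t mend = rs->r[i].end;

		/* Not fully inside this entry: try the next one. */
		if (!(base >= mstart && end <= mend))
			continue;

		if (base == mstart && end == mend) {
			/* Exact match: drop the entry and close the gap. */
			for (j = i; j + 1 < rs->nr; j++)
				rs->r[j] = rs->r[j + 1];
			rs->nr--;
		} else if (base == mstart) {
			/* Trim the front of the entry. */
			rs->r[i].start = end + 1;
		} else if (end == mend) {
			/* Trim the back of the entry. */
			rs->r[i].end = base - 1;
		} else {
			/* Middle: split into [mstart, base - 1] and [end + 1, mend]. */
			if (rs->nr == DEMO_MAX_RANGES)
				return -1;
			for (j = rs->nr; j > i + 1; j--)
				rs->r[j] = rs->r[j - 1];
			rs->r[i + 1].start = end + 1;
			rs->r[i + 1].end = mend;
			rs->r[i].end = base - 1;
			rs->nr++;
		}
		return 0;
	}
	return 0;
}
```

For example, removing 0x1000 bytes at 0x4000 from a single entry covering 0x0 to 0xffff leaves two entries, 0x0 to 0x3fff and 0x5000 to 0xffff.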

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the
kernel. To support an increasing number of memory regions, the
elfcorehdr is built with extra buffer space to ensure that it can
accommodate additional memory ranges in the future.
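
The sizing arithmetic behind that extra space can be sketched in user space with the ELF types from <elf.h>. The function names and parameters below are hypothetical; in the kernel the inputs come from num_possible_cpus() and CONFIG_CRASH_MAX_MEMORY_RANGES. The first helper gives the total header size (one program header per possible CPU, one for vmcoreinfo, plus the maximum number of memory ranges); the second gives the slack to reserve beyond the ranges present at load time.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Total elfcorehdr size for a given CPU count and memory range limit.
 * Both parameters are assumptions standing in for kernel-side values.
 */
static size_t demo_elfcorehdr_size(unsigned int possible_cpus,
				   unsigned int max_mem_ranges)
{
	size_t phdr_cnt;

	assert(possible_cpus > 0);

	/* One program header per possible CPU, one for vmcoreinfo. */
	phdr_cnt = (size_t)possible_cpus + 1;
	phdr_cnt += max_mem_ranges;

	return sizeof(Elf64_Ehdr) + phdr_cnt * sizeof(Elf64_Phdr);
}

/*
 * Extra bytes to reserve so the header can grow from nr_ranges
 * up to max_mem_ranges memory program headers.
 */
static size_t demo_extra_elfcorehdr_size(unsigned int nr_ranges,
					 unsigned int max_mem_ranges)
{
	/* An ELF header cannot describe more than PN_XNUM program headers. */
	if (max_mem_ranges > PN_XNUM)
		return 0;
	/* Already at or past the limit: nothing extra can be promised. */
	if (nr_ranges >= max_mem_ranges)
		return 0;

	return (size_t)(max_mem_ranges - nr_ranges) * sizeof(Elf64_Phdr);
}
```

With 3 ranges present and a limit of 10, the reserved slack is seven program headers' worth of bytes; once the range count reaches the limit, no slack remains and a later update may no longer fit.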

For the kexec_load syscall, the elfcorehdr is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
kexec tool. Passing this flag to the kernel indicates that the
elfcorehdr is built to accommodate additional memory ranges and the
elfcorehdr segment is not considered for SHA calculation, making it safe
to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240326055413.186534-7-sourabhjain@linux.ibm.com

authored by Sourabh Jain and committed by Michael Ellerman
849599b7 b741092d

+202 -2
+3
arch/powerpc/include/asm/kexec.h
···

 int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags);
 #define arch_crash_hotplug_support arch_crash_hotplug_support
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
 #endif /* CONFIG_CRASH_HOTPLUG */

 extern int crashing_cpu;
+1
arch/powerpc/include/asm/kexec_ranges.h
···
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
 int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
 int get_reserved_memory_ranges(struct crash_mem **mem_ranges);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+94 -1
arch/powerpc/kexec/crash.c
···
 #include <linux/irq.h>
 #include <linux/types.h>
 #include <linux/libfdt.h>
+#include <linux/memory.h>

 #include <asm/processor.h>
 #include <asm/machdep.h>
···
 #include <asm/setjmp.h>
 #include <asm/debug.h>
 #include <asm/interrupt.h>
+#include <asm/kexec_ranges.h>

 /*
  * The primary CPU waits a while for all secondary CPUs to enter. This is to
···
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt

+/*
+ * Advertise preferred elfcorehdr size to userspace via
+ * /sys/kernel/crash_elfcorehdr_size sysfs interface.
+ */
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+	unsigned long phdr_cnt;
+
+	/* A program header for possible CPUs + vmcoreinfo */
+	phdr_cnt = num_possible_cpus() + 1;
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		phdr_cnt += CONFIG_CRASH_MAX_MEMORY_RANGES;
+
+	return sizeof(struct elfhdr) + (phdr_cnt * sizeof(Elf64_Phdr));
+}
+
+/**
+ * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
+ *			       elfcorehdr in the kexec segment array.
+ * @image: the active struct kimage
+ * @mn: struct memory_notify data handler
+ */
+static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
+{
+	int ret;
+	struct crash_mem *cmem = NULL;
+	struct kexec_segment *ksegment;
+	void *ptr, *mem, *elfbuf = NULL;
+	unsigned long elfsz, memsz, base_addr, size;
+
+	ksegment = &image->segment[image->elfcorehdr_index];
+	mem = (void *) ksegment->mem;
+	memsz = ksegment->memsz;
+
+	ret = get_crash_memory_ranges(&cmem);
+	if (ret) {
+		pr_err("Failed to get crash mem range\n");
+		return;
+	}
+
+	/*
+	 * The hot unplugged memory is part of crash memory ranges,
+	 * remove it here.
+	 */
+	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
+		base_addr = PFN_PHYS(mn->start_pfn);
+		size = mn->nr_pages * PAGE_SIZE;
+		ret = remove_mem_range(&cmem, base_addr, size);
+		if (ret) {
+			pr_err("Failed to remove hot-unplugged memory from crash memory ranges\n");
+			goto out;
+		}
+	}
+
+	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
+	if (ret) {
+		pr_err("Failed to prepare elf header\n");
+		goto out;
+	}
+
+	/*
+	 * It is unlikely that kernel hit this because elfcorehdr kexec
+	 * segment (memsz) is built with addition space to accommodate growing
+	 * number of crash memory ranges while loading the kdump kernel. It is
+	 * Just to avoid any unforeseen case.
+	 */
+	if (elfsz > memsz) {
+		pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu", elfsz, memsz);
+		goto out;
+	}
+
+	ptr = __va(mem);
+	if (ptr) {
+		/* Temporarily invalidate the crash image while it is replaced */
+		xchg(&kexec_crash_image, NULL);
+
+		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
+		memcpy((void *)ptr, elfbuf, elfsz);
+
+		/* The crash image is now valid once again */
+		xchg(&kexec_crash_image, image);
+	}
+out:
+	kvfree(cmem);
+	if (elfbuf)
+		kvfree(elfbuf);
+}
+
 /**
  * get_fdt_index - Loop through the kexec segment array and find
  *		   the index of the FDT segment.
···
  */
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
+	struct memory_notify *mn;
+
 	switch (image->hp_action) {
 	case KEXEC_CRASH_HP_REMOVE_CPU:
 		return;
···

 	case KEXEC_CRASH_HP_REMOVE_MEMORY:
 	case KEXEC_CRASH_HP_ADD_MEMORY:
-		pr_info_once("Crash update is not supported for memory hotplug\n");
+		mn = (struct memory_notify *)arg;
+		update_crash_elfcorehdr(image, mn);
 		return;
 	default:
 		pr_warn_once("Unknown hotplug action\n");
+19 -1
arch/powerpc/kexec/file_load_64.c
···
 	}
 }

+static unsigned int kdump_extra_elfcorehdr_size(struct crash_mem *cmem)
+{
+#if defined(CONFIG_CRASH_HOTPLUG) && defined(CONFIG_MEMORY_HOTPLUG)
+	unsigned int extra_sz = 0;
+
+	if (CONFIG_CRASH_MAX_MEMORY_RANGES > (unsigned int)PN_XNUM)
+		pr_warn("Number of Phdrs %u exceeds max\n", CONFIG_CRASH_MAX_MEMORY_RANGES);
+	else if (cmem->nr_ranges >= CONFIG_CRASH_MAX_MEMORY_RANGES)
+		pr_warn("Configured crash mem ranges may not be enough\n");
+	else
+		extra_sz = (CONFIG_CRASH_MAX_MEMORY_RANGES - cmem->nr_ranges) * sizeof(Elf64_Phdr);
+
+	return extra_sz;
+#endif
+	return 0;
+}
+
 /**
  * load_elfcorehdr_segment - Setup crash memory ranges and initialize elfcorehdr
  *                           segment needed to load kdump kernel.
···

 	kbuf->buffer = headers;
 	kbuf->mem = KEXEC_BUF_MEM_UNKNOWN;
-	kbuf->bufsz = kbuf->memsz = headers_sz;
+	kbuf->bufsz = headers_sz;
+	kbuf->memsz = headers_sz + kdump_extra_elfcorehdr_size(cmem);
 	kbuf->top_down = false;

 	ret = kexec_add_buffer(kbuf);
+85
arch/powerpc/kexec/ranges.c
···
 		pr_err("Failed to setup crash memory ranges\n");
 	return ret;
 }
+
+/**
+ * remove_mem_range - Removes the given memory range from the range list.
+ * @mem_ranges:    Range list to remove the memory range to.
+ * @base:          Base address of the range to remove.
+ * @size:          Size of the memory range to remove.
+ *
+ * (Re)allocates memory, if needed.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
+{
+	u64 end;
+	int ret = 0;
+	unsigned int i;
+	u64 mstart, mend;
+	struct crash_mem *mem_rngs = *mem_ranges;
+
+	if (!size)
+		return 0;
+
+	/*
+	 * Memory range are stored as start and end address, use
+	 * the same format to do remove operation.
+	 */
+	end = base + size - 1;
+
+	for (i = 0; i < mem_rngs->nr_ranges; i++) {
+		mstart = mem_rngs->ranges[i].start;
+		mend = mem_rngs->ranges[i].end;
+
+		/*
+		 * Memory range to remove is not part of this range entry
+		 * in the memory range list
+		 */
+		if (!(base >= mstart && end <= mend))
+			continue;
+
+		/*
+		 * Memory range to remove is equivalent to this entry in the
+		 * memory range list. Remove the range entry from the list.
+		 */
+		if (base == mstart && end == mend) {
+			for (; i < mem_rngs->nr_ranges - 1; i++) {
+				mem_rngs->ranges[i].start = mem_rngs->ranges[i+1].start;
+				mem_rngs->ranges[i].end = mem_rngs->ranges[i+1].end;
+			}
+			mem_rngs->nr_ranges--;
+			goto out;
+		}
+		/*
+		 * Start address of the memory range to remove and the
+		 * current memory range entry in the list is same. Just
+		 * move the start address of the current memory range
+		 * entry in the list to end + 1.
+		 */
+		else if (base == mstart) {
+			mem_rngs->ranges[i].start = end + 1;
+			goto out;
+		}
+		/*
+		 * End address of the memory range to remove and the
+		 * current memory range entry in the list is same.
+		 * Just move the end address of the current memory
+		 * range entry in the list to base - 1.
+		 */
+		else if (end == mend) {
+			mem_rngs->ranges[i].end = base - 1;
+			goto out;
+		}
+		/*
+		 * Memory range to remove is not at the edge of current
+		 * memory range entry. Split the current memory entry into
+		 * two half.
+		 */
+		else {
+			mem_rngs->ranges[i].end = base - 1;
+			size = mem_rngs->ranges[i].end - end;
+			ret = add_mem_range(mem_ranges, end + 1, size);
+		}
+	}
+out:
+	return ret;
+}
 #endif /* CONFIG_CRASH_DUMP */