Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

- Store AP Query Configuration Information in a static buffer

- Rework the AP initialization and add missing cleanups to the error
path

- Swap IRQ and AP bus/device registration to avoid race conditions

- Export prot_virt_guest symbol

- Introduce AP configuration changes notifier interface to facilitate
modularization of the AP bus

- Add CONFIG_AP kernel configuration option to allow modularization of
the AP bus

- Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description
and dependency and rename it to CONFIG_AP_DEBUG

- Convert sprintf() and snprintf() to sysfs_emit() in CIO code

- Adjust indentation of RELOCS command build step

- Make crypto performance counters upward compatible

- Convert make_page_secure() and gmap_make_secure() to use folio

- Rework channel-utilization-block (CUB) handling in preparation for
introducing additional CUBs

- Use attribute groups to simplify registration, removal and extension
of measurement-related channel-path sysfs attributes

- Add a per-channel-path binary "ext_measurement" sysfs attribute that
provides access to extended channel-path measurement data

- Export measurement data for all channel-measurement-groups (CMG), not
only for specific ones. This enables support of new CMG data
formats in userspace without the need for kernel changes

- Add a per-channel-path sysfs attribute "speed_bps" that provides the
operating speed in bits per second or 0 if the operating speed is not
available

- The CIO tracepoint subchannel-type field "st" is incorrectly set to
the value of subchannel-enabled SCHIB "ena" field. Fix that

- Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS

- Consider the maximum physical address available to a DCSS segment
(512GB) when memory layout is set up

- Simplify the virtual memory layout setup by reducing the size of
identity mapping vs vmemmap overlap

- Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
This allows the kernel image to be placed next to kernel modules

- Move everything KASLR related from <asm/setup.h> to <asm/page.h>

- Put virtual memory layout information into a structure to improve
code generation

- Currently __kaslr_offset is the kernel offset in both physical and
virtual memory spaces. Uncouple these offsets to allow uncoupling of
the address spaces

- Currently the identity mapping base address is implicit and is always
set to zero. Make it explicit by putting it into the __identity_base
persistent boot variable and use it in the proper context

- Introduce .amode31 section start and end macros AMODE31_START and
AMODE31_END

- Introduce OS_INFO entries that do not reference any data in memory,
but rather provide only values

- Store virtual memory layout in OS_INFO. It is read out by
makedumpfile, crash and other tools

- Store virtual memory layout in VMCORE_INFO. It is read out by crash
and other tools when /proc/kcore device is used

- Create additional PT_LOAD ELF program header that covers kernel image
only, so that vmcore tools could locate kernel text and data when
virtual and physical memory spaces are uncoupled

- Uncouple physical and virtual address spaces

- Map kernel at fixed location when KASLR mode is disabled. The
location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration
value.

- Rework deployment of kernel image for both compressed and
uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel
configuration value

- Move .vmlinux.relocs section in front of the compressed kernel. The
interim section rescue step is avoided as a result

- Correct modules thunk offset calculation when branch target is more
than 2GB away

- Kernel modules contain their own set of expoline thunks. Now that the
kernel modules area is less than 4GB away from kernel expoline
thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN
the default if the compiler supports it

- userfaultfd can insert shared zeropages into processes running VMs,
but that is not allowed for s390. Fall back to allocating a fresh
zeroed anonymous folio and inserting that instead

- Re-enable shared zeropages for non-PV and non-skeys KVM guests

- Rename hex2bitmap() to ap_hex2bitmap() and export it for external use

- Add ap_config sysfs attribute to provide the means for setting or
displaying adapters, domains and control domains assigned to a
vfio-ap mediated device in a single operation

- Make vfio_ap_mdev_link_queue() ignore duplicate link requests

- Add write support to ap_config sysfs attribute to allow atomic update
of a vfio-ap mediated device state

- Document ap_config sysfs attribute

- Function os_info_old_init() is expected to be called only from a
regular kdump kernel. Enable it to be called from a stand-alone dump
kernel

- Address gcc -Warray-bounds warning and fix array size in struct
os_info

- s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks

- Use unwinder instead of __builtin_return_address() with ftrace to
prevent returning of undefined values

- Sections .hash and .gnu.hash are only created when the CONFIG_PIE_BUILD
kernel option is enabled. Drop these for the case CONFIG_PIE_BUILD is
disabled

- Compile kernel with -fPIC and link with -no-pie to allow the kpatch
feature to always succeed and drop the whole CONFIG_PIE_BUILD
option-enabled code

- Add missing virt_to_phys() converter for VSIE facility and crypto
control blocks

* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390: Relocate vmlinux ELF data to virtual address space"
KVM: s390: vsie: Use virt_to_phys for crypto control block
s390: Relocate vmlinux ELF data to virtual address space
s390: Compile kernel with -fPIC and link with -no-pie
s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
s390/ftrace: Use unwinder instead of __builtin_return_address()
s390/pci: Drop unneeded reference to CONFIG_DMI
s390/os_info: Fix array size in struct os_info
s390/os_info: Initialize old os_info in standalone dump kernel
docs: Update s390 vfio-ap doc for ap_config sysfs attribute
s390/vfio-ap: Add write support to sysfs attr ap_config
s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
s390/ap: Externalize AP bus specific bitmap reading function
s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
mm/userfaultfd: Do not place zeropages when zeropages are disallowed
s390/expoline: Make modules use kernel expolines
s390/nospec: Correct modules thunk offset calculation
s390/boot: Do not rescue .vmlinux.relocs section
s390/boot: Rework deployment of the kernel image
...

+1453 -703
+3 -1
Documentation/admin-guide/kernel-parameters.txt
@@ -4785,7 +4785,9 @@
 
 	prot_virt=	[S390] enable hosting protected virtual machines
 			isolated from the hypervisor (if hardware supports
-			that).
+			that). If enabled, the default kernel base address
+			might be overridden even when Kernel Address Space
+			Layout Randomization is disabled.
 			Format: <bool>
 
 	psi=		[KNL] Enable or disable pressure stall information
+1
Documentation/arch/s390/index.rst
@@ -8,6 +8,7 @@
     cds
     3270
     driver-model
+    mm
     monreader
     qeth
     s390dbf
+111
Documentation/arch/s390/mm.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ================= 4 + Memory Management 5 + ================= 6 + 7 + Virtual memory layout 8 + ===================== 9 + 10 + .. note:: 11 + 12 + - Some aspects of the virtual memory layout setup are not 13 + clarified (number of page levels, alignment, DMA memory). 14 + 15 + - Unused gaps in the virtual memory layout could be present 16 + or not - depending on how partucular system is configured. 17 + No page tables are created for the unused gaps. 18 + 19 + - The virtual memory regions are tracked or untracked by KASAN 20 + instrumentation, as well as the KASAN shadow memory itself is 21 + created only when CONFIG_KASAN configuration option is enabled. 22 + 23 + :: 24 + 25 + ============================================================================= 26 + | Physical | Virtual | VM area description 27 + ============================================================================= 28 + +- 0 --------------+- 0 --------------+ 29 + | | S390_lowcore | Low-address memory 30 + | +- 8 KB -----------+ 31 + | | | 32 + | | | 33 + | | ... unused gap | KASAN untracked 34 + | | | 35 + +- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start 36 + |.amode31 text/data|.amode31 text/data| KASAN untracked 37 + +- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt end (<2GB) 38 + | | | 39 + | | | 40 + +- __kaslr_offset_phys | kernel rand. phys start 41 + | | | 42 + | kernel text/data | | 43 + | | | 44 + +------------------+ | kernel phys end 45 + | | | 46 + | | | 47 + | | | 48 + | | | 49 + +- ident_map_size -+ | 50 + | | 51 + | ... 
unused gap | KASAN untracked 52 + | | 53 + +- __identity_base + identity mapping start (>= 2GB) 54 + | | 55 + | identity | phys == virt - __identity_base 56 + | mapping | virt == phys + __identity_base 57 + | | 58 + | | KASAN tracked 59 + | | 60 + | | 61 + | | 62 + | | 63 + | | 64 + | | 65 + | | 66 + | | 67 + | | 68 + | | 69 + | | 70 + | | 71 + | | 72 + | | 73 + | | 74 + +---- vmemmap -----+ 'struct page' array start 75 + | | 76 + | virtually mapped | 77 + | memory map | KASAN untracked 78 + | | 79 + +- __abs_lowcore --+ 80 + | | 81 + | Absolute Lowcore | KASAN untracked 82 + | | 83 + +- __memcpy_real_area 84 + | | 85 + | Real Memory Copy| KASAN untracked 86 + | | 87 + +- VMALLOC_START --+ vmalloc area start 88 + | | KASAN untracked or 89 + | vmalloc area | KASAN shallowly populated in case 90 + | | CONFIG_KASAN_VMALLOC=y 91 + +- MODULES_VADDR --+ modules area start 92 + | | KASAN allocated per module or 93 + | modules area | KASAN shallowly populated in case 94 + | | CONFIG_KASAN_VMALLOC=y 95 + +- __kaslr_offset -+ kernel rand. virt start 96 + | | KASAN tracked 97 + | kernel text/data | phys == (kvirt - __kaslr_offset) + 98 + | | __kaslr_offset_phys 99 + +- kernel .bss end + kernel rand. virt end 100 + | | 101 + | ... unused gap | KASAN untracked 102 + | | 103 + +------------------+ UltraVisor Secure Storage limit 104 + | | 105 + | ... unused gap | KASAN untracked 106 + | | 107 + +KASAN_SHADOW_START+ KASAN shadow memory start 108 + | | 109 + | KASAN shadow | KASAN untracked 110 + | | 111 + +------------------+ ASCE limit
+31 -1
Documentation/arch/s390/vfio-ap.rst
··· 380 380 control_domains: 381 381 A read-only file for displaying the control domain numbers assigned to the 382 382 vfio_ap mediated device. 383 + ap_config: 384 + A read/write file that, when written to, allows all three of the 385 + vfio_ap mediated device's ap matrix masks to be replaced in one shot. 386 + Three masks are given, one for adapters, one for domains, and one for 387 + control domains. If the given state cannot be set then no changes are 388 + made to the vfio-ap mediated device. 389 + 390 + The format of the data written to ap_config is as follows: 391 + {amask},{dmask},{cmask}\n 392 + 393 + \n is a newline character. 394 + 395 + amask, dmask, and cmask are masks identifying which adapters, domains, 396 + and control domains should be assigned to the mediated device. 397 + 398 + The format of a mask is as follows: 399 + 0xNN..NN 400 + 401 + Where NN..NN is 64 hexadecimal characters representing a 256-bit value. 402 + The leftmost (highest order) bit represents adapter/domain 0. 403 + 404 + For an example set of masks that represent your mdev's current 405 + configuration, simply cat ap_config. 406 + 407 + Setting an adapter or domain number greater than the maximum allowed for 408 + the system will result in an error. 409 + 410 + This attribute is intended to be used by automation. End users would be 411 + better served using the respective assign/unassign attributes for 412 + adapters, domains, and control domains. 383 413 384 414 * functions: 385 415 ··· 580 550 following Kconfig elements selected: 581 551 * IOMMU_SUPPORT 582 552 * S390 583 - * ZCRYPT 553 + * AP 584 554 * VFIO 585 555 * KVM 586 556
+51 -14
arch/s390/Kconfig
··· 17 17 config ARCH_HAS_ILOG2_U64 18 18 def_bool n 19 19 20 + config ARCH_PROC_KCORE_TEXT 21 + def_bool y 22 + 20 23 config GENERIC_HWEIGHT 21 24 def_bool y 22 25 ··· 555 552 If unsure, say N. 556 553 557 554 config EXPOLINE_EXTERN 558 - def_bool n 555 + def_bool y if EXPOLINE 559 556 depends on EXPOLINE 560 557 depends on CC_IS_GCC && GCC_VERSION >= 110200 561 558 depends on $(success,$(srctree)/arch/s390/tools/gcc-thunk-extern.sh $(CC)) ··· 593 590 Note: this option exists only for documentation purposes, please do 594 591 not remove it. 595 592 596 - config PIE_BUILD 597 - def_bool CC_IS_CLANG && !$(cc-option,-munaligned-symbols) 598 - help 599 - If the compiler is unable to generate code that can manage unaligned 600 - symbols, the kernel is linked as a position-independent executable 601 - (PIE) and includes dynamic relocations that are processed early 602 - during bootup. 603 - 604 - For kpatch functionality, it is recommended to build the kernel 605 - without the PIE_BUILD option. PIE_BUILD is only enabled when the 606 - compiler lacks proper support for handling unaligned symbols. 607 - 608 593 config RANDOMIZE_BASE 609 594 bool "Randomize the address of the kernel image (KASLR)" 610 595 default y ··· 601 610 this randomizes the address at which the kernel image is loaded, 602 611 as a security feature that deters exploit attempts relying on 603 612 knowledge of the location of kernel internals. 613 + 614 + config KERNEL_IMAGE_BASE 615 + hex "Kernel image base address" 616 + range 0x100000 0x1FFFFFE0000000 if !KASAN 617 + range 0x100000 0x1BFFFFE0000000 if KASAN 618 + default 0x3FFE0000000 if !KASAN 619 + default 0x7FFFE0000000 if KASAN 620 + help 621 + This is the address at which the kernel image is loaded in case 622 + Kernel Address Space Layout Randomization (KASLR) is disabled. 623 + 624 + In case the Protected virtualization guest support is enabled the 625 + Ultravisor imposes a virtual address limit. 
If the value of this 626 + option leads to the kernel image exceeding the Ultravisor limit, 627 + this option is ignored and the image is loaded below the limit. 628 + 629 + If the value of this option leads to the kernel image overlapping 630 + the virtual memory where other data structures are located, this 631 + option is ignored and the image is loaded above the structures. 604 632 605 633 endmenu 606 634 ··· 734 724 To compile this driver as a module, choose M here: the 735 725 module will be called eadm_sch. 736 726 727 + config AP 728 + def_tristate y 729 + prompt "Support for Adjunct Processors (ap)" 730 + help 731 + This driver allows usage to Adjunct Processor (AP) devices via 732 + the ap bus, cards and queues. Supported Adjunct Processors are 733 + the CryptoExpress Cards (CEX). 734 + 735 + To compile this driver as a module, choose M here: the 736 + module will be called ap. 737 + 738 + If unsure, say Y (default). 739 + 740 + config AP_DEBUG 741 + def_bool n 742 + prompt "Enable debug features for Adjunct Processor (ap) devices" 743 + depends on AP 744 + help 745 + Say 'Y' here to enable some additional debug features for Adjunct 746 + Processor (ap) devices. 747 + 748 + There will be some more sysfs attributes displayed for ap queues. 749 + 750 + Do not enable on production level kernel build. 751 + 752 + If unsure, say N. 753 + 737 754 config VFIO_CCW 738 755 def_tristate n 739 756 prompt "Support for VFIO-CCW subchannels" ··· 777 740 prompt "VFIO support for AP devices" 778 741 depends on KVM 779 742 depends on VFIO 780 - depends on ZCRYPT 743 + depends on AP 781 744 select VFIO_MDEV 782 745 help 783 746 This driver grants access to Adjunct Processor (AP) devices
+2 -13
arch/s390/Makefile
··· 14 14 KBUILD_CFLAGS_MODULE += -fPIC 15 15 KBUILD_AFLAGS += -m64 16 16 KBUILD_CFLAGS += -m64 17 - ifdef CONFIG_PIE_BUILD 18 - KBUILD_CFLAGS += -fPIE 19 - LDFLAGS_vmlinux := -pie -z notext 20 - else 21 - KBUILD_CFLAGS += $(call cc-option,-munaligned-symbols,) 22 - LDFLAGS_vmlinux := --emit-relocs --discard-none 17 + KBUILD_CFLAGS += -fPIC 18 + LDFLAGS_vmlinux := -no-pie --emit-relocs --discard-none 23 19 extra_tools := relocs 24 - endif 25 20 aflags_dwarf := -Wa,-gdwarf-2 26 21 KBUILD_AFLAGS_DECOMPRESSOR := $(CLANG_FLAGS) -m64 -D__ASSEMBLY__ 27 22 ifndef CONFIG_AS_IS_LLVM ··· 83 88 84 89 ifdef CONFIG_EXPOLINE 85 90 ifdef CONFIG_EXPOLINE_EXTERN 86 - KBUILD_LDFLAGS_MODULE += arch/s390/lib/expoline/expoline.o 87 91 CC_FLAGS_EXPOLINE := -mindirect-branch=thunk-extern 88 92 CC_FLAGS_EXPOLINE += -mfunction-return=thunk-extern 89 93 else ··· 161 167 vdso-install-y += arch/s390/kernel/vdso64/vdso64.so.dbg 162 168 vdso-install-$(CONFIG_COMPAT) += arch/s390/kernel/vdso32/vdso32.so.dbg 163 169 164 - ifdef CONFIG_EXPOLINE_EXTERN 165 - modules_prepare: expoline_prepare 166 - expoline_prepare: scripts 167 - $(Q)$(MAKE) $(build)=arch/s390/lib/expoline arch/s390/lib/expoline/expoline.o 168 - endif 169 170 endif 170 171 171 172 # Don't use tabs in echo arguments
+2 -7
arch/s390/boot/Makefile
··· 37 37 38 38 obj-y := head.o als.o startup.o physmem_info.o ipl_parm.o ipl_report.o vmem.o 39 39 obj-y += string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o 40 - obj-y += version.o pgm_check_info.o ctype.o ipl_data.o 41 - obj-y += $(if $(CONFIG_PIE_BUILD),machine_kexec_reloc.o,relocs.o) 40 + obj-y += version.o pgm_check_info.o ctype.o ipl_data.o relocs.o 42 41 obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE)) += uv.o 43 42 obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o 44 43 obj-y += $(if $(CONFIG_KERNEL_UNCOMPRESSED),,decompressor.o) info.o ··· 48 49 targets += vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 49 50 targets += vmlinux.bin.xz vmlinux.bin.lzma vmlinux.bin.lzo vmlinux.bin.lz4 50 51 targets += vmlinux.bin.zst info.bin syms.bin vmlinux.syms $(obj-all) 51 - ifndef CONFIG_PIE_BUILD 52 52 targets += relocs.S 53 - endif 54 53 55 54 OBJECTS := $(addprefix $(obj)/,$(obj-y)) 56 55 OBJECTS_ALL := $(addprefix $(obj)/,$(obj-all)) ··· 107 110 $(obj)/vmlinux.bin: vmlinux FORCE 108 111 $(call if_changed,objcopy) 109 112 110 - ifndef CONFIG_PIE_BUILD 111 113 CMD_RELOCS=arch/s390/tools/relocs 112 - quiet_cmd_relocs = RELOCS $@ 114 + quiet_cmd_relocs = RELOCS $@ 113 115 cmd_relocs = $(CMD_RELOCS) $< > $@ 114 116 $(obj)/relocs.S: vmlinux FORCE 115 117 $(call if_changed,relocs) 116 - endif 117 118 118 119 suffix-$(CONFIG_KERNEL_GZIP) := .gz 119 120 suffix-$(CONFIG_KERNEL_BZIP2) := .bz2
+6 -8
arch/s390/boot/boot.h
··· 17 17 }; 18 18 19 19 struct vmlinux_info { 20 - unsigned long default_lma; 21 20 unsigned long entry; 22 21 unsigned long image_size; /* does not include .bss */ 23 22 unsigned long bss_size; /* uncompressed image .bss size */ ··· 24 25 unsigned long bootdata_size; 25 26 unsigned long bootdata_preserved_off; 26 27 unsigned long bootdata_preserved_size; 27 - #ifdef CONFIG_PIE_BUILD 28 - unsigned long dynsym_start; 29 - unsigned long rela_dyn_start; 30 - unsigned long rela_dyn_end; 31 - #else 32 28 unsigned long got_start; 33 29 unsigned long got_end; 34 - #endif 35 30 unsigned long amode31_size; 36 31 unsigned long init_mm_off; 37 32 unsigned long swapper_pg_dir_off; ··· 67 74 void print_pgm_check_info(void); 68 75 unsigned long randomize_within_range(unsigned long size, unsigned long align, 69 76 unsigned long min, unsigned long max); 70 - void setup_vmem(unsigned long asce_limit); 77 + void setup_vmem(unsigned long kernel_start, unsigned long kernel_end, unsigned long asce_limit); 71 78 void __printf(1, 2) decompressor_printk(const char *fmt, ...); 72 79 void print_stacktrace(unsigned long sp); 73 80 void error(char *m); 81 + int get_random(unsigned long limit, unsigned long *value); 74 82 75 83 extern struct machine_info machine; 76 84 ··· 92 98 #define vmlinux _vmlinux_info 93 99 94 100 #define __abs_lowcore_pa(x) (((unsigned long)(x) - __abs_lowcore) % sizeof(struct lowcore)) 101 + #define __kernel_va(x) ((void *)((unsigned long)(x) - __kaslr_offset_phys + __kaslr_offset)) 102 + #define __kernel_pa(x) ((unsigned long)(x) - __kaslr_offset + __kaslr_offset_phys) 103 + #define __identity_va(x) ((void *)((unsigned long)(x) + __identity_base)) 104 + #define __identity_pa(x) ((unsigned long)(x) - __identity_base) 95 105 96 106 static inline bool intersects(unsigned long addr0, unsigned long size0, 97 107 unsigned long addr1, unsigned long size1)
+2 -13
arch/s390/boot/decompressor.c
··· 63 63 #include "../../../../lib/decompress_unzstd.c" 64 64 #endif 65 65 66 - #define decompress_offset ALIGN((unsigned long)_end + BOOT_HEAP_SIZE, PAGE_SIZE) 67 - 68 66 unsigned long mem_safe_offset(void) 69 67 { 70 - /* 71 - * due to 4MB HEAD_SIZE for bzip2 72 - * 'decompress_offset + vmlinux.image_size' could be larger than 73 - * kernel at final position + its .bss, so take the larger of two 74 - */ 75 - return max(decompress_offset + vmlinux.image_size, 76 - vmlinux.default_lma + vmlinux.image_size + vmlinux.bss_size); 68 + return ALIGN(free_mem_end_ptr, PAGE_SIZE); 77 69 } 78 70 79 - void *decompress_kernel(void) 71 + void deploy_kernel(void *output) 80 72 { 81 - void *output = (void *)decompress_offset; 82 - 83 73 __decompress(_compressed_start, _compressed_end - _compressed_start, 84 74 NULL, NULL, output, vmlinux.image_size, NULL, error); 85 - return output; 86 75 }
+3 -5
arch/s390/boot/decompressor.h
@@ -2,11 +2,9 @@
 #ifndef BOOT_COMPRESSED_DECOMPRESSOR_H
 #define BOOT_COMPRESSED_DECOMPRESSOR_H
 
-#ifdef CONFIG_KERNEL_UNCOMPRESSED
-static inline void *decompress_kernel(void) { return NULL; }
-#else
-void *decompress_kernel(void);
-#endif
+#ifndef CONFIG_KERNEL_UNCOMPRESSED
 unsigned long mem_safe_offset(void);
+void deploy_kernel(void *output);
+#endif
 
 #endif /* BOOT_COMPRESSED_DECOMPRESSOR_H */
+1 -1
arch/s390/boot/kaslr.c
@@ -43,7 +43,7 @@
 	return PRNG_MODE_TDES;
 }
 
-static int get_random(unsigned long limit, unsigned long *value)
+int get_random(unsigned long limit, unsigned long *value)
 {
 	struct prng_parm prng = {
 		/* initial parameter block for tdes mode, copied from libica */
+3 -1
arch/s390/boot/pgm_check_info.c
@@ -153,8 +153,10 @@
 	decompressor_printk("Kernel command line: %s\n", early_command_line);
 	decompressor_printk("Kernel fault: interruption code %04x ilc:%x\n",
 			    S390_lowcore.pgm_code, S390_lowcore.pgm_ilc >> 1);
-	if (kaslr_enabled())
+	if (kaslr_enabled()) {
 		decompressor_printk("Kernel random base: %lx\n", __kaslr_offset);
+		decompressor_printk("Kernel random base phys: %lx\n", __kaslr_offset_phys);
+	}
 	decompressor_printk("PSW : %016lx %016lx (%pS)\n",
 			    S390_lowcore.psw_save_area.mask,
 			    S390_lowcore.psw_save_area.addr,
+136 -127
arch/s390/boot/startup.c
··· 3 3 #include <linux/elf.h> 4 4 #include <asm/page-states.h> 5 5 #include <asm/boot_data.h> 6 + #include <asm/extmem.h> 6 7 #include <asm/sections.h> 7 8 #include <asm/maccess.h> 8 9 #include <asm/cpu_mf.h> ··· 19 18 #include "boot.h" 20 19 #include "uv.h" 21 20 22 - unsigned long __bootdata_preserved(__kaslr_offset); 21 + struct vm_layout __bootdata_preserved(vm_layout); 23 22 unsigned long __bootdata_preserved(__abs_lowcore); 24 23 unsigned long __bootdata_preserved(__memcpy_real_area); 25 24 pte_t *__bootdata_preserved(memcpy_real_ptep); ··· 30 29 unsigned long __bootdata_preserved(MODULES_VADDR); 31 30 unsigned long __bootdata_preserved(MODULES_END); 32 31 unsigned long __bootdata_preserved(max_mappable); 33 - unsigned long __bootdata(ident_map_size); 34 32 35 33 u64 __bootdata_preserved(stfle_fac_list[16]); 36 34 u64 __bootdata_preserved(alt_stfle_fac_list[16]); ··· 109 109 } 110 110 111 111 #ifdef CONFIG_KERNEL_UNCOMPRESSED 112 - unsigned long mem_safe_offset(void) 112 + static unsigned long mem_safe_offset(void) 113 113 { 114 - return vmlinux.default_lma + vmlinux.image_size + vmlinux.bss_size; 114 + return (unsigned long)_compressed_start; 115 + } 116 + 117 + static void deploy_kernel(void *output) 118 + { 119 + void *uncompressed_start = (void *)_compressed_start; 120 + 121 + if (output == uncompressed_start) 122 + return; 123 + memmove(output, uncompressed_start, vmlinux.image_size); 124 + memset(uncompressed_start, 0, vmlinux.image_size); 115 125 } 116 126 #endif 117 127 ··· 151 141 memcpy((void *)vmlinux.bootdata_preserved_off, __boot_data_preserved_start, vmlinux.bootdata_preserved_size); 152 142 } 153 143 154 - #ifdef CONFIG_PIE_BUILD 155 - static void kaslr_adjust_relocs(unsigned long min_addr, unsigned long max_addr, unsigned long offset) 156 - { 157 - Elf64_Rela *rela_start, *rela_end, *rela; 158 - int r_type, r_sym, rc; 159 - Elf64_Addr loc, val; 160 - Elf64_Sym *dynsym; 161 - 162 - rela_start = (Elf64_Rela *) vmlinux.rela_dyn_start; 163 - 
rela_end = (Elf64_Rela *) vmlinux.rela_dyn_end; 164 - dynsym = (Elf64_Sym *) vmlinux.dynsym_start; 165 - for (rela = rela_start; rela < rela_end; rela++) { 166 - loc = rela->r_offset + offset; 167 - val = rela->r_addend; 168 - r_sym = ELF64_R_SYM(rela->r_info); 169 - if (r_sym) { 170 - if (dynsym[r_sym].st_shndx != SHN_UNDEF) 171 - val += dynsym[r_sym].st_value + offset; 172 - } else { 173 - /* 174 - * 0 == undefined symbol table index (STN_UNDEF), 175 - * used for R_390_RELATIVE, only add KASLR offset 176 - */ 177 - val += offset; 178 - } 179 - r_type = ELF64_R_TYPE(rela->r_info); 180 - rc = arch_kexec_do_relocs(r_type, (void *) loc, val, 0); 181 - if (rc) 182 - error("Unknown relocation type"); 183 - } 184 - } 185 - 186 - static void kaslr_adjust_got(unsigned long offset) {} 187 - static void rescue_relocs(void) {} 188 - static void free_relocs(void) {} 189 - #else 190 - static int *vmlinux_relocs_64_start; 191 - static int *vmlinux_relocs_64_end; 192 - 193 - static void rescue_relocs(void) 194 - { 195 - unsigned long size = __vmlinux_relocs_64_end - __vmlinux_relocs_64_start; 196 - 197 - vmlinux_relocs_64_start = (void *)physmem_alloc_top_down(RR_RELOC, size, 0); 198 - vmlinux_relocs_64_end = (void *)vmlinux_relocs_64_start + size; 199 - memmove(vmlinux_relocs_64_start, __vmlinux_relocs_64_start, size); 200 - } 201 - 202 - static void free_relocs(void) 203 - { 204 - physmem_free(RR_RELOC); 205 - } 206 - 207 - static void kaslr_adjust_relocs(unsigned long min_addr, unsigned long max_addr, unsigned long offset) 144 + static void kaslr_adjust_relocs(unsigned long min_addr, unsigned long max_addr, 145 + unsigned long offset, unsigned long phys_offset) 208 146 { 209 147 int *reloc; 210 148 long loc; 211 149 212 150 /* Adjust R_390_64 relocations */ 213 - for (reloc = vmlinux_relocs_64_start; reloc < vmlinux_relocs_64_end; reloc++) { 214 - loc = (long)*reloc + offset; 151 + for (reloc = (int *)__vmlinux_relocs_64_start; reloc < (int *)__vmlinux_relocs_64_end; reloc++) 
{ 152 + loc = (long)*reloc + phys_offset; 215 153 if (loc < min_addr || loc > max_addr) 216 154 error("64-bit relocation outside of kernel!\n"); 217 - *(u64 *)loc += offset; 155 + *(u64 *)loc += offset - __START_KERNEL; 218 156 } 219 157 } 220 158 ··· 175 217 * reason. Adjust the GOT entries. 176 218 */ 177 219 for (entry = (u64 *)vmlinux.got_start; entry < (u64 *)vmlinux.got_end; entry++) 178 - *entry += offset; 220 + *entry += offset - __START_KERNEL; 179 221 } 180 - #endif 181 222 182 223 /* 183 224 * Merge information from several sources into a single ident_map_size value. ··· 218 261 #endif 219 262 } 220 263 221 - static unsigned long setup_kernel_memory_layout(void) 264 + #define FIXMAP_SIZE round_up(MEMCPY_REAL_SIZE + ABS_LOWCORE_MAP_SIZE, sizeof(struct lowcore)) 265 + 266 + static unsigned long get_vmem_size(unsigned long identity_size, 267 + unsigned long vmemmap_size, 268 + unsigned long vmalloc_size, 269 + unsigned long rte_size) 270 + { 271 + unsigned long max_mappable, vsize; 272 + 273 + max_mappable = max(identity_size, MAX_DCSS_ADDR); 274 + vsize = round_up(SZ_2G + max_mappable, rte_size) + 275 + round_up(vmemmap_size, rte_size) + 276 + FIXMAP_SIZE + MODULES_LEN + KASLR_LEN; 277 + return size_add(vsize, vmalloc_size); 278 + } 279 + 280 + static unsigned long setup_kernel_memory_layout(unsigned long kernel_size) 222 281 { 223 282 unsigned long vmemmap_start; 283 + unsigned long kernel_start; 224 284 unsigned long asce_limit; 225 285 unsigned long rte_size; 226 286 unsigned long pages; ··· 249 275 vmemmap_size = SECTION_ALIGN_UP(pages) * sizeof(struct page); 250 276 251 277 /* choose kernel address space layout: 4 or 3 levels. 
*/ 252 - vsize = round_up(ident_map_size, _REGION3_SIZE) + vmemmap_size + 253 - MODULES_LEN + MEMCPY_REAL_SIZE + ABS_LOWCORE_MAP_SIZE; 254 - vsize = size_add(vsize, vmalloc_size); 255 - if (IS_ENABLED(CONFIG_KASAN) || (vsize > _REGION2_SIZE)) { 278 + BUILD_BUG_ON(!IS_ALIGNED(__START_KERNEL, THREAD_SIZE)); 279 + BUILD_BUG_ON(!IS_ALIGNED(__NO_KASLR_START_KERNEL, THREAD_SIZE)); 280 + BUILD_BUG_ON(__NO_KASLR_END_KERNEL > _REGION1_SIZE); 281 + vsize = get_vmem_size(ident_map_size, vmemmap_size, vmalloc_size, _REGION3_SIZE); 282 + if (IS_ENABLED(CONFIG_KASAN) || __NO_KASLR_END_KERNEL > _REGION2_SIZE || 283 + (vsize > _REGION2_SIZE && kaslr_enabled())) { 256 284 asce_limit = _REGION1_SIZE; 257 - rte_size = _REGION2_SIZE; 285 + if (__NO_KASLR_END_KERNEL > _REGION2_SIZE) { 286 + rte_size = _REGION2_SIZE; 287 + vsize = get_vmem_size(ident_map_size, vmemmap_size, vmalloc_size, _REGION2_SIZE); 288 + } else { 289 + rte_size = _REGION3_SIZE; 290 + } 258 291 } else { 259 292 asce_limit = _REGION2_SIZE; 260 293 rte_size = _REGION3_SIZE; ··· 271 290 * Forcing modules and vmalloc area under the ultravisor 272 291 * secure storage limit, so that any vmalloc allocation 273 292 * we do could be used to back secure guest storage. 293 + * 294 + * Assume the secure storage limit always exceeds _REGION2_SIZE, 295 + * otherwise asce_limit and rte_size would have been adjusted. 
274 296 */ 275 297 vmax = adjust_to_uv_max(asce_limit); 276 298 #ifdef CONFIG_KASAN 299 + BUILD_BUG_ON(__NO_KASLR_END_KERNEL > KASAN_SHADOW_START); 277 300 /* force vmalloc and modules below kasan shadow */ 278 301 vmax = min(vmax, KASAN_SHADOW_START); 279 302 #endif 280 - __memcpy_real_area = round_down(vmax - MEMCPY_REAL_SIZE, PAGE_SIZE); 281 - __abs_lowcore = round_down(__memcpy_real_area - ABS_LOWCORE_MAP_SIZE, 282 - sizeof(struct lowcore)); 283 - MODULES_END = round_down(__abs_lowcore, _SEGMENT_SIZE); 303 + vsize = min(vsize, vmax); 304 + if (kaslr_enabled()) { 305 + unsigned long kernel_end, kaslr_len, slots, pos; 306 + 307 + kaslr_len = max(KASLR_LEN, vmax - vsize); 308 + slots = DIV_ROUND_UP(kaslr_len - kernel_size, THREAD_SIZE); 309 + if (get_random(slots, &pos)) 310 + pos = 0; 311 + kernel_end = vmax - pos * THREAD_SIZE; 312 + kernel_start = round_down(kernel_end - kernel_size, THREAD_SIZE); 313 + } else if (vmax < __NO_KASLR_END_KERNEL || vsize > __NO_KASLR_END_KERNEL) { 314 + kernel_start = round_down(vmax - kernel_size, THREAD_SIZE); 315 + decompressor_printk("The kernel base address is forced to %lx\n", kernel_start); 316 + } else { 317 + kernel_start = __NO_KASLR_START_KERNEL; 318 + } 319 + __kaslr_offset = kernel_start; 320 + 321 + MODULES_END = round_down(kernel_start, _SEGMENT_SIZE); 284 322 MODULES_VADDR = MODULES_END - MODULES_LEN; 285 323 VMALLOC_END = MODULES_VADDR; 286 324 287 325 /* allow vmalloc area to occupy up to about 1/2 of the rest virtual space left */ 288 - vsize = round_down(VMALLOC_END / 2, _SEGMENT_SIZE); 326 + vsize = (VMALLOC_END - FIXMAP_SIZE) / 2; 327 + vsize = round_down(vsize, _SEGMENT_SIZE); 289 328 vmalloc_size = min(vmalloc_size, vsize); 290 329 VMALLOC_START = VMALLOC_END - vmalloc_size; 291 330 331 + __memcpy_real_area = round_down(VMALLOC_START - MEMCPY_REAL_SIZE, PAGE_SIZE); 332 + __abs_lowcore = round_down(__memcpy_real_area - ABS_LOWCORE_MAP_SIZE, 333 + sizeof(struct lowcore)); 334 + 292 335 /* split remaining 
virtual space between 1:1 mapping & vmemmap array */ 293 - pages = VMALLOC_START / (PAGE_SIZE + sizeof(struct page)); 336 + pages = __abs_lowcore / (PAGE_SIZE + sizeof(struct page)); 294 337 pages = SECTION_ALIGN_UP(pages); 295 338 /* keep vmemmap_start aligned to a top level region table entry */ 296 - vmemmap_start = round_down(VMALLOC_START - pages * sizeof(struct page), rte_size); 297 - vmemmap_start = min(vmemmap_start, 1UL << MAX_PHYSMEM_BITS); 298 - /* maximum mappable address as seen by arch_get_mappable_range() */ 299 - max_mappable = vmemmap_start; 339 + vmemmap_start = round_down(__abs_lowcore - pages * sizeof(struct page), rte_size); 300 340 /* make sure identity map doesn't overlay with vmemmap */ 301 341 ident_map_size = min(ident_map_size, vmemmap_start); 302 342 vmemmap_size = SECTION_ALIGN_UP(ident_map_size / PAGE_SIZE) * sizeof(struct page); 303 - /* make sure vmemmap doesn't overlay with vmalloc area */ 304 - VMALLOC_START = max(vmemmap_start + vmemmap_size, VMALLOC_START); 343 + /* make sure vmemmap doesn't overlay with absolute lowcore area */ 344 + if (vmemmap_start + vmemmap_size > __abs_lowcore) { 345 + vmemmap_size = SECTION_ALIGN_DOWN(ident_map_size / PAGE_SIZE) * sizeof(struct page); 346 + ident_map_size = vmemmap_size / sizeof(struct page) * PAGE_SIZE; 347 + } 305 348 vmemmap = (struct page *)vmemmap_start; 349 + /* maximum address for which linear mapping could be created (DCSS, memory) */ 350 + BUILD_BUG_ON(MAX_DCSS_ADDR > (1UL << MAX_PHYSMEM_BITS)); 351 + max_mappable = max(ident_map_size, MAX_DCSS_ADDR); 352 + max_mappable = min(max_mappable, vmemmap_start); 353 + __identity_base = round_down(vmemmap_start - max_mappable, rte_size); 306 354 307 355 return asce_limit; 308 356 } ··· 339 329 /* 340 330 * This function clears the BSS section of the decompressed Linux kernel and NOT the decompressor's. 
341 331 */ 342 - static void clear_bss_section(unsigned long vmlinux_lma) 332 + static void clear_bss_section(unsigned long kernel_start) 343 333 { 344 - memset((void *)vmlinux_lma + vmlinux.image_size, 0, vmlinux.bss_size); 334 + memset((void *)kernel_start + vmlinux.image_size, 0, vmlinux.bss_size); 345 335 } 346 336 347 337 /* ··· 358 348 vmalloc_size = max(size, vmalloc_size); 359 349 } 360 350 361 - static void kaslr_adjust_vmlinux_info(unsigned long offset) 351 + static void kaslr_adjust_vmlinux_info(long offset) 362 352 { 363 - *(unsigned long *)(&vmlinux.entry) += offset; 364 353 vmlinux.bootdata_off += offset; 365 354 vmlinux.bootdata_preserved_off += offset; 366 - #ifdef CONFIG_PIE_BUILD 367 - vmlinux.rela_dyn_start += offset; 368 - vmlinux.rela_dyn_end += offset; 369 - vmlinux.dynsym_start += offset; 370 - #else 371 355 vmlinux.got_start += offset; 372 356 vmlinux.got_end += offset; 373 - #endif 374 357 vmlinux.init_mm_off += offset; 375 358 vmlinux.swapper_pg_dir_off += offset; 376 359 vmlinux.invalid_pg_dir_off += offset; ··· 376 373 #endif 377 374 } 378 375 376 + static void fixup_vmlinux_info(void) 377 + { 378 + vmlinux.entry -= __START_KERNEL; 379 + kaslr_adjust_vmlinux_info(-__START_KERNEL); 380 + } 381 + 379 382 void startup_kernel(void) 380 383 { 381 - unsigned long max_physmem_end; 382 - unsigned long vmlinux_lma = 0; 384 + unsigned long kernel_size = vmlinux.image_size + vmlinux.bss_size; 385 + unsigned long nokaslr_offset_phys = mem_safe_offset(); 383 386 unsigned long amode31_lma = 0; 387 + unsigned long max_physmem_end; 384 388 unsigned long asce_limit; 385 389 unsigned long safe_addr; 386 - void *img; 387 390 psw_t psw; 388 391 392 + fixup_vmlinux_info(); 389 393 setup_lpp(); 390 - safe_addr = mem_safe_offset(); 394 + safe_addr = PAGE_ALIGN(nokaslr_offset_phys + kernel_size); 391 395 392 396 /* 393 - * Reserve decompressor memory together with decompression heap, buffer and 394 - * memory which might be occupied by uncompressed kernel at 
default 1Mb 395 - * position (if KASLR is off or failed). 397 + * Reserve decompressor memory together with decompression heap, 398 + * buffer and memory which might be occupied by uncompressed kernel 399 + * (if KASLR is off or failed). 396 400 */ 397 401 physmem_reserve(RR_DECOMPRESSOR, 0, safe_addr); 398 402 if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && parmarea.initrd_size) ··· 419 409 max_physmem_end = detect_max_physmem_end(); 420 410 setup_ident_map_size(max_physmem_end); 421 411 setup_vmalloc_size(); 422 - asce_limit = setup_kernel_memory_layout(); 412 + asce_limit = setup_kernel_memory_layout(kernel_size); 423 413 /* got final ident_map_size, physmem allocations could be performed now */ 424 414 physmem_set_usable_limit(ident_map_size); 425 415 detect_physmem_online_ranges(max_physmem_end); 426 416 save_ipl_cert_comp_list(); 427 417 rescue_initrd(safe_addr, ident_map_size); 428 - rescue_relocs(); 429 418 430 - if (kaslr_enabled()) { 431 - vmlinux_lma = randomize_within_range(vmlinux.image_size + vmlinux.bss_size, 432 - THREAD_SIZE, vmlinux.default_lma, 433 - ident_map_size); 434 - if (vmlinux_lma) { 435 - __kaslr_offset = vmlinux_lma - vmlinux.default_lma; 436 - kaslr_adjust_vmlinux_info(__kaslr_offset); 437 - } 438 - } 439 - vmlinux_lma = vmlinux_lma ?: vmlinux.default_lma; 440 - physmem_reserve(RR_VMLINUX, vmlinux_lma, vmlinux.image_size + vmlinux.bss_size); 441 - 442 - if (!IS_ENABLED(CONFIG_KERNEL_UNCOMPRESSED)) { 443 - img = decompress_kernel(); 444 - memmove((void *)vmlinux_lma, img, vmlinux.image_size); 445 - } else if (__kaslr_offset) { 446 - img = (void *)vmlinux.default_lma; 447 - memmove((void *)vmlinux_lma, img, vmlinux.image_size); 448 - memset(img, 0, vmlinux.image_size); 449 - } 419 + if (kaslr_enabled()) 420 + __kaslr_offset_phys = randomize_within_range(kernel_size, THREAD_SIZE, 0, ident_map_size); 421 + if (!__kaslr_offset_phys) 422 + __kaslr_offset_phys = nokaslr_offset_phys; 423 + kaslr_adjust_vmlinux_info(__kaslr_offset_phys); 424 + 
physmem_reserve(RR_VMLINUX, __kaslr_offset_phys, kernel_size); 425 + deploy_kernel((void *)__kaslr_offset_phys); 450 426 451 427 /* vmlinux decompression is done, shrink reserved low memory */ 452 428 physmem_reserve(RR_DECOMPRESSOR, 0, (unsigned long)_decompressor_end); 429 + 430 + /* 431 + * In case KASLR is enabled the randomized location of .amode31 432 + * section might overlap with .vmlinux.relocs section. To avoid that 433 + * the below randomize_within_range() could have been called with 434 + * __vmlinux_relocs_64_end as the lower range address. However, 435 + * .amode31 section is written to by the decompressed kernel - at 436 + * that time the contents of .vmlinux.relocs is not needed anymore. 437 + * Conversly, .vmlinux.relocs is read only by the decompressor, even 438 + * before the kernel started. Therefore, in case the two sections 439 + * overlap there is no risk of corrupting any data. 440 + */ 453 441 if (kaslr_enabled()) 454 442 amode31_lma = randomize_within_range(vmlinux.amode31_size, PAGE_SIZE, 0, SZ_2G); 455 - amode31_lma = amode31_lma ?: vmlinux.default_lma - vmlinux.amode31_size; 443 + if (!amode31_lma) 444 + amode31_lma = __kaslr_offset_phys - vmlinux.amode31_size; 456 445 physmem_reserve(RR_AMODE31, amode31_lma, vmlinux.amode31_size); 457 446 458 447 /* ··· 467 458 * - copy_bootdata() must follow setup_vmem() to propagate changes 468 459 * to bootdata made by setup_vmem() 469 460 */ 470 - clear_bss_section(vmlinux_lma); 471 - kaslr_adjust_relocs(vmlinux_lma, vmlinux_lma + vmlinux.image_size, __kaslr_offset); 461 + clear_bss_section(__kaslr_offset_phys); 462 + kaslr_adjust_relocs(__kaslr_offset_phys, __kaslr_offset_phys + vmlinux.image_size, 463 + __kaslr_offset, __kaslr_offset_phys); 472 464 kaslr_adjust_got(__kaslr_offset); 473 - free_relocs(); 474 - setup_vmem(asce_limit); 465 + setup_vmem(__kaslr_offset, __kaslr_offset + kernel_size, asce_limit); 475 466 copy_bootdata(); 476 467 477 468 /* 478 469 * Save KASLR offset for early dumps, 
before vmcore_info is set. 479 470 * Mark as uneven to distinguish from real vmcore_info pointer. 480 471 */ 481 - S390_lowcore.vmcore_info = __kaslr_offset ? __kaslr_offset | 0x1UL : 0; 472 + S390_lowcore.vmcore_info = __kaslr_offset_phys ? __kaslr_offset_phys | 0x1UL : 0; 482 473 483 474 /* 484 475 * Jump to the decompressed kernel entry point and switch DAT mode on. 485 476 */ 486 - psw.addr = vmlinux.entry; 477 + psw.addr = __kaslr_offset + vmlinux.entry; 487 478 psw.mask = PSW_KERNEL_BITS; 488 479 __load_psw(psw); 489 480 }
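The virtual KASLR placement added to setup_kernel_memory_layout() above can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the boot code: THREAD_SIZE is assumed to be 16 KiB, the random slot index `pos` is passed in by the caller instead of coming from get_random(), and `kaslr_slots()` and `pick_kernel_start()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 KiB kernel stacks; the real THREAD_SIZE depends on the kernel config. */
#define THREAD_SIZE (UINT64_C(16) * 1024)
/* KASLR_LEN as the patch defines it when CONFIG_RANDOMIZE_BASE is set. */
#define KASLR_LEN (UINT64_C(1) << 31)

/* Round x down to a power-of-two alignment. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Hypothetical helper: number of THREAD_SIZE slots the kernel end may be shifted by. */
uint64_t kaslr_slots(uint64_t vmax, uint64_t vsize, uint64_t kernel_size)
{
	uint64_t kaslr_len;

	assert(vsize <= vmax);	/* the patch clamps vsize to vmax first */
	kaslr_len = vmax - vsize > KASLR_LEN ? vmax - vsize : KASLR_LEN;
	assert(kernel_size <= kaslr_len);
	return (kaslr_len - kernel_size + THREAD_SIZE - 1) / THREAD_SIZE;
}

/* Hypothetical helper: virtual start of the kernel image for slot index pos. */
uint64_t pick_kernel_start(uint64_t vmax, uint64_t kernel_size, uint64_t pos)
{
	uint64_t kernel_end = vmax - pos * THREAD_SIZE;

	return round_down_pow2(kernel_end - kernel_size, THREAD_SIZE);
}
```

With `pos == 0` the image ends at `vmax`; larger slot indices move it downwards in THREAD_SIZE steps, and the start address stays THREAD_SIZE aligned in every case.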
+58 -50
arch/s390/boot/vmem.c
··· 27 27 POPULATE_NONE, 28 28 POPULATE_DIRECT, 29 29 POPULATE_ABS_LOWCORE, 30 + POPULATE_IDENTITY, 31 + POPULATE_KERNEL, 30 32 #ifdef CONFIG_KASAN 31 33 POPULATE_KASAN_MAP_SHADOW, 32 34 POPULATE_KASAN_ZERO_SHADOW, ··· 56 54 pgtable_populate(start, end, mode); 57 55 } 58 56 59 - static void kasan_populate_shadow(void) 57 + static void kasan_populate_shadow(unsigned long kernel_start, unsigned long kernel_end) 60 58 { 61 59 pmd_t pmd_z = __pmd(__pa(kasan_early_shadow_pte) | _SEGMENT_ENTRY); 62 60 pud_t pud_z = __pud(__pa(kasan_early_shadow_pmd) | _REGION3_ENTRY); ··· 78 76 __arch_set_page_dat(kasan_early_shadow_pmd, 1UL << CRST_ALLOC_ORDER); 79 77 __arch_set_page_dat(kasan_early_shadow_pte, 1); 80 78 81 - /* 82 - * Current memory layout: 83 - * +- 0 -------------+ +- shadow start -+ 84 - * |1:1 ident mapping| /|1/8 of ident map| 85 - * | | / | | 86 - * +-end of ident map+ / +----------------+ 87 - * | ... gap ... | / | kasan | 88 - * | | / | zero page | 89 - * +- vmalloc area -+ / | mapping | 90 - * | vmalloc_size | / | (untracked) | 91 - * +- modules vaddr -+ / +----------------+ 92 - * | 2Gb |/ | unmapped | allocated per module 93 - * +- shadow start -+ +----------------+ 94 - * | 1/8 addr space | | zero pg mapping| (untracked) 95 - * +- shadow end ----+---------+- shadow end ---+ 96 - * 97 - * Current memory layout (KASAN_VMALLOC): 98 - * +- 0 -------------+ +- shadow start -+ 99 - * |1:1 ident mapping| /|1/8 of ident map| 100 - * | | / | | 101 - * +-end of ident map+ / +----------------+ 102 - * | ... gap ... 
| / | kasan zero page| (untracked) 103 - * | | / | mapping | 104 - * +- vmalloc area -+ / +----------------+ 105 - * | vmalloc_size | / |shallow populate| 106 - * +- modules vaddr -+ / +----------------+ 107 - * | 2Gb |/ |shallow populate| 108 - * +- shadow start -+ +----------------+ 109 - * | 1/8 addr space | | zero pg mapping| (untracked) 110 - * +- shadow end ----+---------+- shadow end ---+ 111 - */ 112 - 113 79 for_each_physmem_usable_range(i, &start, &end) { 114 - kasan_populate(start, end, POPULATE_KASAN_MAP_SHADOW); 115 - if (memgap_start && physmem_info.info_source == MEM_DETECT_DIAG260) 116 - kasan_populate(memgap_start, start, POPULATE_KASAN_ZERO_SHADOW); 80 + kasan_populate((unsigned long)__identity_va(start), 81 + (unsigned long)__identity_va(end), 82 + POPULATE_KASAN_MAP_SHADOW); 83 + if (memgap_start && physmem_info.info_source == MEM_DETECT_DIAG260) { 84 + kasan_populate((unsigned long)__identity_va(memgap_start), 85 + (unsigned long)__identity_va(start), 86 + POPULATE_KASAN_ZERO_SHADOW); 87 + } 117 88 memgap_start = end; 118 89 } 90 + kasan_populate(kernel_start, kernel_end, POPULATE_KASAN_MAP_SHADOW); 91 + kasan_populate(0, (unsigned long)__identity_va(0), POPULATE_KASAN_ZERO_SHADOW); 92 + kasan_populate(AMODE31_START, AMODE31_END, POPULATE_KASAN_ZERO_SHADOW); 119 93 if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) { 120 94 untracked_end = VMALLOC_START; 121 95 /* shallowly populate kasan shadow for vmalloc and modules */ ··· 100 122 untracked_end = MODULES_VADDR; 101 123 } 102 124 /* populate kasan shadow for untracked memory */ 103 - kasan_populate(ident_map_size, untracked_end, POPULATE_KASAN_ZERO_SHADOW); 104 - kasan_populate(MODULES_END, _REGION1_SIZE, POPULATE_KASAN_ZERO_SHADOW); 125 + kasan_populate((unsigned long)__identity_va(ident_map_size), untracked_end, 126 + POPULATE_KASAN_ZERO_SHADOW); 127 + kasan_populate(kernel_end, _REGION1_SIZE, POPULATE_KASAN_ZERO_SHADOW); 105 128 } 106 129 107 130 static bool kasan_pgd_populate_zero_shadow(pgd_t *pgd, 
unsigned long addr, ··· 159 180 } 160 181 #else 161 182 162 - static inline void kasan_populate_shadow(void) {} 183 + static inline void kasan_populate_shadow(unsigned long kernel_start, unsigned long kernel_end) 184 + { 185 + } 163 186 164 187 static inline bool kasan_pgd_populate_zero_shadow(pgd_t *pgd, unsigned long addr, 165 188 unsigned long end, enum populate_mode mode) ··· 244 263 return addr; 245 264 case POPULATE_ABS_LOWCORE: 246 265 return __abs_lowcore_pa(addr); 266 + case POPULATE_KERNEL: 267 + return __kernel_pa(addr); 268 + case POPULATE_IDENTITY: 269 + return __identity_pa(addr); 247 270 #ifdef CONFIG_KASAN 248 271 case POPULATE_KASAN_MAP_SHADOW: 249 272 addr = physmem_alloc_top_down(RR_VMEM, size, size); ··· 259 274 } 260 275 } 261 276 262 - static bool can_large_pud(pud_t *pu_dir, unsigned long addr, unsigned long end) 277 + static bool large_allowed(enum populate_mode mode) 263 278 { 264 - return machine.has_edat2 && 279 + return (mode == POPULATE_DIRECT) || (mode == POPULATE_IDENTITY); 280 + } 281 + 282 + static bool can_large_pud(pud_t *pu_dir, unsigned long addr, unsigned long end, 283 + enum populate_mode mode) 284 + { 285 + return machine.has_edat2 && large_allowed(mode) && 265 286 IS_ALIGNED(addr, PUD_SIZE) && (end - addr) >= PUD_SIZE; 266 287 } 267 288 268 - static bool can_large_pmd(pmd_t *pm_dir, unsigned long addr, unsigned long end) 289 + static bool can_large_pmd(pmd_t *pm_dir, unsigned long addr, unsigned long end, 290 + enum populate_mode mode) 269 291 { 270 - return machine.has_edat1 && 292 + return machine.has_edat1 && large_allowed(mode) && 271 293 IS_ALIGNED(addr, PMD_SIZE) && (end - addr) >= PMD_SIZE; 272 294 } 273 295 ··· 314 322 if (pmd_none(*pmd)) { 315 323 if (kasan_pmd_populate_zero_shadow(pmd, addr, next, mode)) 316 324 continue; 317 - if (can_large_pmd(pmd, addr, next)) { 325 + if (can_large_pmd(pmd, addr, next, mode)) { 318 326 entry = __pmd(_pa(addr, _SEGMENT_SIZE, mode)); 319 327 entry = set_pmd_bit(entry, 
SEGMENT_KERNEL); 320 328 if (!machine.has_nx) ··· 347 355 if (pud_none(*pud)) { 348 356 if (kasan_pud_populate_zero_shadow(pud, addr, next, mode)) 349 357 continue; 350 - if (can_large_pud(pud, addr, next)) { 358 + if (can_large_pud(pud, addr, next, mode)) { 351 359 entry = __pud(_pa(addr, _REGION3_SIZE, mode)); 352 360 entry = set_pud_bit(entry, REGION3_KERNEL); 353 361 if (!machine.has_nx) ··· 410 418 } 411 419 } 412 420 413 - void setup_vmem(unsigned long asce_limit) 421 + void setup_vmem(unsigned long kernel_start, unsigned long kernel_end, unsigned long asce_limit) 414 422 { 415 423 unsigned long start, end; 416 424 unsigned long asce_type; 417 425 unsigned long asce_bits; 426 + pgd_t *init_mm_pgd; 418 427 int i; 419 428 420 429 /* ··· 425 432 */ 426 433 for_each_physmem_online_range(i, &start, &end) 427 434 __arch_set_page_nodat((void *)start, (end - start) >> PAGE_SHIFT); 435 + 436 + /* 437 + * init_mm->pgd contains virtual address of swapper_pg_dir. 438 + * It is unusable at this stage since DAT is yet off. Swap 439 + * it for physical address of swapper_pg_dir and restore 440 + * the virtual address after all page tables are created. 441 + */ 442 + init_mm_pgd = init_mm.pgd; 443 + init_mm.pgd = (pgd_t *)swapper_pg_dir; 428 444 429 445 if (asce_limit == _REGION1_SIZE) { 430 446 asce_type = _REGION2_ENTRY_EMPTY; ··· 455 453 * the lowcore and create the identity mapping only afterwards. 
456 454 */ 457 455 pgtable_populate(0, sizeof(struct lowcore), POPULATE_DIRECT); 458 - for_each_physmem_usable_range(i, &start, &end) 459 - pgtable_populate(start, end, POPULATE_DIRECT); 456 + for_each_physmem_usable_range(i, &start, &end) { 457 + pgtable_populate((unsigned long)__identity_va(start), 458 + (unsigned long)__identity_va(end), 459 + POPULATE_IDENTITY); 460 + } 461 + pgtable_populate(kernel_start, kernel_end, POPULATE_KERNEL); 462 + pgtable_populate(AMODE31_START, AMODE31_END, POPULATE_DIRECT); 460 463 pgtable_populate(__abs_lowcore, __abs_lowcore + sizeof(struct lowcore), 461 464 POPULATE_ABS_LOWCORE); 462 465 pgtable_populate(__memcpy_real_area, __memcpy_real_area + PAGE_SIZE, 463 466 POPULATE_NONE); 464 - memcpy_real_ptep = __virt_to_kpte(__memcpy_real_area); 467 + memcpy_real_ptep = __identity_va(__virt_to_kpte(__memcpy_real_area)); 465 468 466 - kasan_populate_shadow(); 469 + kasan_populate_shadow(kernel_start, kernel_end); 467 470 468 471 S390_lowcore.kernel_asce.val = swapper_pg_dir | asce_bits; 469 472 S390_lowcore.user_asce = s390_invalid_asce; ··· 478 471 local_ctl_load(13, &S390_lowcore.kernel_asce); 479 472 480 473 init_mm.context.asce = S390_lowcore.kernel_asce.val; 474 + init_mm.pgd = init_mm_pgd; 481 475 }
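The large-page gating introduced in vmem.c above can be illustrated outside the kernel. This is a hedged sketch: the enum mirrors the modes visible in the hunk (the KASAN modes are left out), the machine facility flag is passed as a plain `bool` parameter instead of being read from `machine.has_edat1`, and PMD_SIZE is assumed to be the 1 MiB s390 segment size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum populate_mode {
	POPULATE_NONE,
	POPULATE_DIRECT,
	POPULATE_ABS_LOWCORE,
	POPULATE_IDENTITY,
	POPULATE_KERNEL,
};

/* Assumption: one segment (PMD) covers 1 MiB. */
#define PMD_SIZE (UINT64_C(1) << 20)

/* Only direct and identity mappings may use large pages, as in the patch. */
static bool large_allowed(enum populate_mode mode)
{
	return mode == POPULATE_DIRECT || mode == POPULATE_IDENTITY;
}

/* Userspace stand-in for can_large_pmd(); has_edat1 replaces machine.has_edat1. */
bool can_large_pmd(bool has_edat1, uint64_t addr, uint64_t end,
		   enum populate_mode mode)
{
	assert(end >= addr);
	return has_edat1 && large_allowed(mode) &&
	       (addr % PMD_SIZE) == 0 && (end - addr) >= PMD_SIZE;
}
```

The effect is that the kernel image range (POPULATE_KERNEL) is always populated with 4 KiB pages at this stage, even when it is segment aligned, while the identity mapping can still use large pages.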
+9 -19
arch/s390/boot/vmlinux.lds.S
··· 99 99 100 100 _decompressor_end = .; 101 101 102 + . = ALIGN(4); 103 + .vmlinux.relocs : { 104 + __vmlinux_relocs_64_start = .; 105 + *(.vmlinux.relocs_64) 106 + __vmlinux_relocs_64_end = .; 107 + } 108 + 102 109 #ifdef CONFIG_KERNEL_UNCOMPRESSED 103 - . = 0x100000; 110 + . = ALIGN(PAGE_SIZE); 111 + . += AMODE31_SIZE; /* .amode31 section */ 104 112 #else 105 113 . = ALIGN(8); 106 114 #endif ··· 117 109 *(.vmlinux.bin.compressed) 118 110 _compressed_end = .; 119 111 } 120 - 121 - #ifndef CONFIG_PIE_BUILD 122 - /* 123 - * When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire 124 - * uncompressed vmlinux.bin is positioned in the bzImage decompressor 125 - * image at the default kernel LMA of 0x100000, enabling it to be 126 - * executed in-place. However, the size of .vmlinux.relocs could be 127 - * large enough to cause an overlap with the uncompressed kernel at the 128 - * address 0x100000. To address this issue, .vmlinux.relocs is 129 - * positioned after the .rodata.compressed. 130 - */ 131 - . = ALIGN(4); 132 - .vmlinux.relocs : { 133 - __vmlinux_relocs_64_start = .; 134 - *(.vmlinux.relocs_64) 135 - __vmlinux_relocs_64_end = .; 136 - } 137 - #endif 138 112 139 113 #define SB_TRAILER_SIZE 32 140 114 /* Trailer needed for Secure Boot */
+12 -18
arch/s390/include/asm/ap.h
···
  * config info as returned by the ap_qci() function.
  */
 struct ap_config_info {
-	unsigned int apsc : 1; /* S bit */
-	unsigned int apxa : 1; /* N bit */
-	unsigned int qact : 1; /* C bit */
-	unsigned int rc8a : 1; /* R bit */
-	unsigned int : 4;
-	unsigned int apsb : 1; /* B bit */
-	unsigned int : 23;
+	union {
+		unsigned int flags;
+		struct {
+			unsigned int apsc : 1; /* S bit */
+			unsigned int apxa : 1; /* N bit */
+			unsigned int qact : 1; /* C bit */
+			unsigned int rc8a : 1; /* R bit */
+			unsigned int : 4;
+			unsigned int apsb : 1; /* B bit */
+			unsigned int : 23;
+		};
+	};
 	unsigned char na; /* max # of APs - 1 */
 	unsigned char nd; /* max # of Domains - 1 */
 	unsigned char _reserved0[10];
···

 	return reg1.status;
 }
-
-/*
- * Interface to tell the AP bus code that a configuration
- * change has happened. The bus code should at least do
- * an ap bus resource rescan.
- */
-#if IS_ENABLED(CONFIG_ZCRYPT)
-void ap_bus_cfg_chg(void);
-#else
-static inline void ap_bus_cfg_chg(void){}
-#endif

 #endif /* _ASM_S390_AP_H_ */
+1
arch/s390/include/asm/asm-prototypes.h
···
 #include <linux/kvm_host.h>
 #include <linux/ftrace.h>
 #include <asm/fpu.h>
+#include <asm/nospec-branch.h>
 #include <asm-generic/asm-prototypes.h>

 __int128_t __ashlti3(__int128_t a, int b);
+15
arch/s390/include/asm/chsc.h
···

 #include <uapi/asm/chsc.h>

+/* struct from linux/notifier.h */
+struct notifier_block;
+
 /**
  * Operation codes for CHSC PNSO:
  * PNSO_OC_NET_BRIDGE_INFO - only addresses that are visible to a bridgeport
···
 	struct chsc_pnso_naihdr naihdr;
 	struct chsc_pnso_naid_l2 entries[];
 } __packed __aligned(PAGE_SIZE);
+
+/*
+ * notifier interface - registered notifiers gets called on
+ * the following events:
+ * - ap config changed (CHSC_NOTIFY_AP_CFG)
+ */
+enum chsc_notify_type {
+	CHSC_NOTIFY_AP_CFG = 3,
+};
+
+int chsc_notifier_register(struct notifier_block *nb);
+int chsc_notifier_unregister(struct notifier_block *nb);

 #endif /* _ASM_S390_CHSC_H */
+7
arch/s390/include/asm/extmem.h
···
 #define _ASM_S390X_DCSS_H
 #ifndef __ASSEMBLY__

+/*
+ * DCSS segment is defined as a contiguous range of pages using DEFSEG command.
+ * The range start and end is a page number with a value less than or equal to
+ * 0x7ffffff (see CP Commands and Utilities Reference).
+ */
+#define MAX_DCSS_ADDR (512UL * SZ_1G)
+
 /* possible values for segment type as returned by segment_info */
 #define SEG_TYPE_SW 0
 #define SEG_TYPE_EW 1
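The relation between the DEFSEG page-number limit quoted in the comment and the 512 GiB value of MAX_DCSS_ADDR can be checked with a few lines of C. This is a worked arithmetic sketch; it assumes 4 KiB pages and reads MAX_DCSS_ADDR as the exclusive end address of the highest addressable page, and `dcss_end_addr()` and `DEFSEG_MAX_PAGE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE       UINT64_C(4096)		/* assumption: 4 KiB pages */
#define SZ_1G           (UINT64_C(1) << 30)
#define MAX_DCSS_ADDR   (UINT64_C(512) * SZ_1G)	/* value from the patch */
#define DEFSEG_MAX_PAGE UINT64_C(0x7ffffff)	/* limit quoted in the comment */

/* Hypothetical helper: first byte address past the given page number. */
uint64_t dcss_end_addr(uint64_t last_page)
{
	assert(last_page <= DEFSEG_MAX_PAGE);
	return (last_page + 1) * PAGE_SIZE;
}
```

For the largest page number, (0x7ffffff + 1) * 4096 = 2^27 * 2^12 = 2^39 bytes, which is exactly 512 GiB.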
+2 -6
arch/s390/include/asm/ftrace.h
···

 #ifndef __ASSEMBLY__

-#ifdef CONFIG_CC_IS_CLANG
-/* https://llvm.org/pr41424 */
-#define ftrace_return_address(n) 0UL
-#else
-#define ftrace_return_address(n) __builtin_return_address(n)
-#endif
+unsigned long return_address(unsigned int n);
+#define ftrace_return_address(n) return_address(n)

 void ftrace_caller(void);
+1 -1
arch/s390/include/asm/gmap.h
···

 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
-int gmap_mark_unmergeable(void);
+int s390_disable_cow_sharing(void);
 void s390_unlist_old_asce(struct gmap *gmap);
 int s390_replace_asce(struct gmap *gmap);
 void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
+5
arch/s390/include/asm/mmu.h
···
 	unsigned int uses_skeys:1;
 	/* The mmu context uses CMM. */
 	unsigned int uses_cmm:1;
+	/*
+	 * The mmu context allows COW-sharing of memory pages (KSM, zeropage).
+	 * Note that COW-sharing during fork() is currently always allowed.
+	 */
+	unsigned int allow_cow_sharing:1;
 	/* The gmaps associated with this context are allowed to use huge pages. */
 	unsigned int allow_gmap_hpage_1m:1;
 } mm_context_t;
+1
arch/s390/include/asm/mmu_context.h
···
 	mm->context.has_pgste = 0;
 	mm->context.uses_skeys = 0;
 	mm->context.uses_cmm = 0;
+	mm->context.allow_cow_sharing = 1;
 	mm->context.allow_gmap_hpage_1m = 0;
 #endif
 	switch (mm->context.asce_limit) {
+20
arch/s390/include/asm/nospec-branch.h
···
 	return __is_defined(CC_USING_EXPOLINE) && !nospec_disable;
 }

+#ifdef CONFIG_EXPOLINE_EXTERN
+
+void __s390_indirect_jump_r1(void);
+void __s390_indirect_jump_r2(void);
+void __s390_indirect_jump_r3(void);
+void __s390_indirect_jump_r4(void);
+void __s390_indirect_jump_r5(void);
+void __s390_indirect_jump_r6(void);
+void __s390_indirect_jump_r7(void);
+void __s390_indirect_jump_r8(void);
+void __s390_indirect_jump_r9(void);
+void __s390_indirect_jump_r10(void);
+void __s390_indirect_jump_r11(void);
+void __s390_indirect_jump_r12(void);
+void __s390_indirect_jump_r13(void);
+void __s390_indirect_jump_r14(void);
+void __s390_indirect_jump_r15(void);
+
+#endif
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_S390_EXPOLINE_H */
+7 -6
arch/s390/include/asm/nospec-insn.h
··· 16 16 */ 17 17 .macro __THUNK_PROLOG_NAME name 18 18 #ifdef CONFIG_EXPOLINE_EXTERN 19 - .pushsection .text,"ax",@progbits 20 - __ALIGN 19 + SYM_CODE_START(\name) 21 20 #else 22 21 .pushsection .text.\name,"axG",@progbits,\name,comdat 23 - #endif 24 22 .globl \name 25 23 .hidden \name 26 24 .type \name,@function 27 25 \name: 28 26 CFI_STARTPROC 27 + #endif 29 28 .endm 30 29 31 30 .macro __THUNK_EPILOG_NAME name 32 - CFI_ENDPROC 33 31 #ifdef CONFIG_EXPOLINE_EXTERN 34 - .size \name, .-\name 35 - #endif 32 + SYM_CODE_END(\name) 33 + EXPORT_SYMBOL(\name) 34 + #else 35 + CFI_ENDPROC 36 36 .popsection 37 + #endif 37 38 .endm 38 39 39 40 .macro __THUNK_PROLOG_BR r1
+25 -4
arch/s390/include/asm/os_info.h
··· 17 17 #define OS_INFO_VMCOREINFO 0 18 18 #define OS_INFO_REIPL_BLOCK 1 19 19 #define OS_INFO_FLAGS_ENTRY 2 20 + #define OS_INFO_RESERVED 3 21 + #define OS_INFO_IDENTITY_BASE 4 22 + #define OS_INFO_KASLR_OFFSET 5 23 + #define OS_INFO_KASLR_OFF_PHYS 6 24 + #define OS_INFO_VMEMMAP 7 25 + #define OS_INFO_AMODE31_START 8 26 + #define OS_INFO_AMODE31_END 9 27 + #define OS_INFO_IMAGE_START 10 28 + #define OS_INFO_IMAGE_END 11 29 + #define OS_INFO_IMAGE_PHYS 12 30 + #define OS_INFO_MAX 13 20 31 21 32 #define OS_INFO_FLAG_REIPL_CLEAR (1UL << 0) 22 33 23 34 struct os_info_entry { 24 - u64 addr; 35 + union { 36 + u64 addr; 37 + u64 val; 38 + }; 25 39 u64 size; 26 40 u32 csum; 27 41 } __packed; ··· 47 33 u16 version_minor; 48 34 u64 crashkernel_addr; 49 35 u64 crashkernel_size; 50 - struct os_info_entry entry[3]; 51 - u8 reserved[4004]; 36 + struct os_info_entry entry[OS_INFO_MAX]; 37 + u8 reserved[3804]; 52 38 } __packed; 53 39 54 40 void os_info_init(void); 55 - void os_info_entry_add(int nr, void *ptr, u64 len); 41 + void os_info_entry_add_data(int nr, void *ptr, u64 len); 42 + void os_info_entry_add_val(int nr, u64 val); 56 43 void os_info_crashkernel_add(unsigned long base, unsigned long size); 57 44 u32 os_info_csum(struct os_info *os_info); 58 45 59 46 #ifdef CONFIG_CRASH_DUMP 60 47 void *os_info_old_entry(int nr, unsigned long *size); 48 + static inline unsigned long os_info_old_value(int nr) 49 + { 50 + unsigned long size; 51 + 52 + return (unsigned long)os_info_old_entry(nr, &size); 53 + } 61 54 #else 62 55 static inline void *os_info_old_entry(int nr, unsigned long *size) 63 56 {
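The change of the `reserved` array from 4004 to 3804 bytes follows from the ten additional packed entries. The sketch below re-declares `struct os_info_entry` with fixed-width userspace types to check that arithmetic; `reserved_after()` is a hypothetical helper, and the GCC/Clang `packed` attribute stands in for the kernel's `__packed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the entry layout shown in the hunk above. */
struct os_info_entry {
	union {
		uint64_t addr;
		uint64_t val;
	};
	uint64_t size;
	uint32_t csum;
} __attribute__((packed));

/* Hypothetical helper: padding left once the entry array has grown. */
size_t reserved_after(size_t reserved_before, size_t old_entries,
		      size_t new_entries)
{
	size_t grown;

	assert(new_entries >= old_entries);
	grown = (new_entries - old_entries) * sizeof(struct os_info_entry);
	assert(grown <= reserved_before);
	return reserved_before - grown;
}
```

Each packed entry takes 8 + 8 + 4 = 20 bytes, so growing the array from 3 to OS_INFO_MAX (13) entries consumes 200 bytes of the former padding and the overall size of `struct os_info` stays the same.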
+45 -5
arch/s390/include/asm/page.h
··· 178 178 #define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE 179 179 #endif 180 180 181 - #define __PAGE_OFFSET 0x0UL 182 - #define PAGE_OFFSET 0x0UL 181 + struct vm_layout { 182 + unsigned long kaslr_offset; 183 + unsigned long kaslr_offset_phys; 184 + unsigned long identity_base; 185 + unsigned long identity_size; 186 + }; 183 187 184 - #define __pa_nodebug(x) ((unsigned long)(x)) 188 + extern struct vm_layout vm_layout; 189 + 190 + #define __kaslr_offset vm_layout.kaslr_offset 191 + #define __kaslr_offset_phys vm_layout.kaslr_offset_phys 192 + #define __identity_base vm_layout.identity_base 193 + #define ident_map_size vm_layout.identity_size 194 + 195 + static inline unsigned long kaslr_offset(void) 196 + { 197 + return __kaslr_offset; 198 + } 199 + 200 + extern int __kaslr_enabled; 201 + static inline int kaslr_enabled(void) 202 + { 203 + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) 204 + return __kaslr_enabled; 205 + return 0; 206 + } 207 + 208 + #define __PAGE_OFFSET __identity_base 209 + #define PAGE_OFFSET __PAGE_OFFSET 185 210 186 211 #ifdef __DECOMPRESSOR 187 212 213 + #define __pa_nodebug(x) ((unsigned long)(x)) 188 214 #define __pa(x) __pa_nodebug(x) 189 215 #define __pa32(x) __pa(x) 190 216 #define __va(x) ((void *)(unsigned long)(x)) 191 217 192 218 #else /* __DECOMPRESSOR */ 219 + 220 + static inline unsigned long __pa_nodebug(unsigned long x) 221 + { 222 + if (x < __kaslr_offset) 223 + return x - __identity_base; 224 + return x - __kaslr_offset + __kaslr_offset_phys; 225 + } 193 226 194 227 #ifdef CONFIG_DEBUG_VIRTUAL 195 228 ··· 239 206 240 207 #define __pa(x) __phys_addr((unsigned long)(x), false) 241 208 #define __pa32(x) __phys_addr((unsigned long)(x), true) 242 - #define __va(x) ((void *)(unsigned long)(x)) 209 + #define __va(x) ((void *)((unsigned long)(x) + __identity_base)) 243 210 244 211 #endif /* __DECOMPRESSOR */ 245 212 ··· 264 231 #define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr)) 265 232 #define page_to_virt(page) 
pfn_to_virt(page_to_pfn(page)) 266 233 267 - #define virt_addr_valid(kaddr) pfn_valid(phys_to_pfn(__pa_nodebug(kaddr))) 234 + #define virt_addr_valid(kaddr) pfn_valid(phys_to_pfn(__pa_nodebug((unsigned long)(kaddr)))) 268 235 269 236 #define VM_DATA_DEFAULT_FLAGS VM_DATA_FLAGS_NON_EXEC 270 237 ··· 272 239 273 240 #include <asm-generic/memory_model.h> 274 241 #include <asm-generic/getorder.h> 242 + 243 + #define AMODE31_SIZE (3 * PAGE_SIZE) 244 + 245 + #define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) 246 + #define __START_KERNEL 0x100000 247 + #define __NO_KASLR_START_KERNEL CONFIG_KERNEL_IMAGE_BASE 248 + #define __NO_KASLR_END_KERNEL (__NO_KASLR_START_KERNEL + KERNEL_IMAGE_SIZE) 275 249 276 250 #endif /* _S390_PAGE_H */
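The address translation introduced in page.h above distinguishes two virtual ranges: addresses below the kernel image go through the identity mapping, addresses inside the image go through the KASLR offsets. A userspace sketch of that logic follows. The layout values returned by `example_layout()` are made up for illustration, and `layout_pa()` and `layout_va()` are hypothetical stand-ins for `__pa_nodebug()` and `__va()`.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the fields of struct vm_layout from the patch. */
struct vm_layout {
	uint64_t kaslr_offset;		/* virtual start of the kernel image */
	uint64_t kaslr_offset_phys;	/* physical start of the kernel image */
	uint64_t identity_base;		/* virtual start of the identity mapping */
	uint64_t identity_size;
};

/* Made-up example values; a real layout is computed by the decompressor. */
struct vm_layout example_layout(void)
{
	struct vm_layout l = {
		.kaslr_offset      = UINT64_C(0x3ffe0000000),
		.kaslr_offset_phys = UINT64_C(0x2000000),
		.identity_base     = UINT64_C(0x10000000000),
		.identity_size     = UINT64_C(0x100000000),
	};

	assert(l.identity_base < l.kaslr_offset);
	return l;
}

/* Stand-in for __pa_nodebug(): identity range first, kernel image otherwise. */
uint64_t layout_pa(struct vm_layout l, uint64_t x)
{
	if (x < l.kaslr_offset)
		return x - l.identity_base;
	return x - l.kaslr_offset + l.kaslr_offset_phys;
}

/* Stand-in for __va(): physical addresses map into the identity range. */
uint64_t layout_va(struct vm_layout l, uint64_t phys)
{
	return phys + l.identity_base;
}
```

Note that the two functions are only inverses for identity-mapped addresses: a kernel image virtual address translates to a physical address, but `layout_va()` of that physical address yields its alias in the identity mapping, not the image address.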
+19 -3
arch/s390/include/asm/pgtable.h
··· 107 107 return 1; 108 108 } 109 109 110 + #ifdef CONFIG_RANDOMIZE_BASE 111 + #define KASLR_LEN (1UL << 31) 112 + #else 113 + #define KASLR_LEN 0UL 114 + #endif 115 + 110 116 /* 111 117 * A 64 bit pagetable entry of S390 has following format: 112 118 * | PFRA |0IPC| OS | ··· 572 566 } 573 567 574 568 /* 575 - * In the case that a guest uses storage keys 576 - * faults should no longer be backed by zero pages 569 + * As soon as the guest uses storage keys or enables PV, we deduplicate all 570 + * mapped shared zeropages and prevent new shared zeropages from getting 571 + * mapped. 577 572 */ 578 - #define mm_forbids_zeropage mm_has_pgste 573 + #define mm_forbids_zeropage mm_forbids_zeropage 574 + static inline int mm_forbids_zeropage(struct mm_struct *mm) 575 + { 576 + #ifdef CONFIG_PGSTE 577 + if (!mm->context.allow_cow_sharing) 578 + return 1; 579 + #endif 580 + return 0; 581 + } 582 + 579 583 static inline int mm_uses_skeys(struct mm_struct *mm) 580 584 { 581 585 #ifdef CONFIG_PGSTE
+3 -1
arch/s390/include/asm/physmem_info.h
···
 	RR_DECOMPRESSOR,
 	RR_INITRD,
 	RR_VMLINUX,
-	RR_RELOC,
 	RR_AMODE31,
 	RR_IPLREPORT,
 	RR_CERT_COMP_LIST,
···
 	*size = physmem_info.reserved[type].end - physmem_info.reserved[type].start;
 	return *size;
 }
+
+#define AMODE31_START (physmem_info.reserved[RR_AMODE31].start)
+#define AMODE31_END (physmem_info.reserved[RR_AMODE31].end)

 #endif
-14
arch/s390/include/asm/setup.h
···
 extern void (*_machine_halt)(void);
 extern void (*_machine_power_off)(void);

-extern unsigned long __kaslr_offset;
-static inline unsigned long kaslr_offset(void)
-{
-	return __kaslr_offset;
-}
-
-extern int __kaslr_enabled;
-static inline int kaslr_enabled(void)
-{
-	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
-		return __kaslr_enabled;
-	return 0;
-}
-
 struct oldmem_data {
 	unsigned long start;
 	unsigned long size;
+2
arch/s390/kernel/Makefile
···
 # Do not trace early setup code
 CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_rethook.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_stacktrace.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_unwind_bc.o = $(CC_FLAGS_FTRACE)

 endif
+36 -5
arch/s390/kernel/crash_dump.c
···
 	ehdr->e_phoff = sizeof(Elf64_Ehdr);
 	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
 	ehdr->e_phentsize = sizeof(Elf64_Phdr);
-	ehdr->e_phnum = mem_chunk_cnt + 1;
+	/*
+	 * Number of memory chunk PT_LOAD program headers plus one kernel
+	 * image PT_LOAD program header plus one PT_NOTE program header.
+	 */
+	ehdr->e_phnum = mem_chunk_cnt + 1 + 1;
 	return ehdr + 1;
 }

···
  */
 static void loads_init(Elf64_Phdr *phdr)
 {
+	unsigned long old_identity_base = os_info_old_value(OS_INFO_IDENTITY_BASE);
 	phys_addr_t start, end;
 	u64 idx;

 	for_each_physmem_range(idx, &oldmem_type, &start, &end) {
-		phdr->p_filesz = end - start;
 		phdr->p_type = PT_LOAD;
+		phdr->p_vaddr = old_identity_base + start;
 		phdr->p_offset = start;
-		phdr->p_vaddr = (unsigned long)__va(start);
 		phdr->p_paddr = start;
+		phdr->p_filesz = end - start;
 		phdr->p_memsz = end - start;
 		phdr->p_flags = PF_R | PF_W | PF_X;
 		phdr->p_align = PAGE_SIZE;
 		phdr++;
 	}
+}
+
+/*
+ * Prepare PT_LOAD type program header for kernel image region
+ */
+static void text_init(Elf64_Phdr *phdr)
+{
+	unsigned long start_phys = os_info_old_value(OS_INFO_IMAGE_PHYS);
+	unsigned long start = os_info_old_value(OS_INFO_IMAGE_START);
+	unsigned long end = os_info_old_value(OS_INFO_IMAGE_END);
+
+	phdr->p_type = PT_LOAD;
+	phdr->p_vaddr = start;
+	phdr->p_filesz = end - start;
+	phdr->p_memsz = end - start;
+	phdr->p_offset = start_phys;
+	phdr->p_paddr = start_phys;
+	phdr->p_flags = PF_R | PF_W | PF_X;
+	phdr->p_align = PAGE_SIZE;
 }

 /*
···
 	size += nt_vmcoreinfo_size();
 	/* nt_final */
 	size += sizeof(Elf64_Nhdr);
+	/* PT_LOAD type program header for kernel text region */
+	size += sizeof(Elf64_Phdr);
 	/* PT_LOADS */
 	size += mem_chunk_cnt * sizeof(Elf64_Phdr);

···
  */
 int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
 {
-	Elf64_Phdr *phdr_notes, *phdr_loads;
+	Elf64_Phdr *phdr_notes, *phdr_loads, *phdr_text;
 	size_t alloc_size;
 	int mem_chunk_cnt;
 	void *ptr, *hdr;
···
 	/* Init program headers */
 	phdr_notes = ptr;
 	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr));
+	phdr_text = ptr;
+	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr));
 	phdr_loads = ptr;
 	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr) * mem_chunk_cnt);
 	/* Init notes */
 	hdr_off = PTR_DIFF(ptr, hdr);
 	ptr = notes_init(phdr_notes, ptr, ((unsigned long) hdr) + hdr_off);
+	/* Init kernel text program header */
+	text_init(phdr_text);
 	/* Init loads */
-	hdr_off = PTR_DIFF(ptr, hdr);
 	loads_init(phdr_loads);
+	/* Finalize program headers */
+	hdr_off = PTR_DIFF(ptr, hdr);
 	*addr = (unsigned long long) hdr;
 	*size = (unsigned long long) hdr_off;
 	BUG_ON(elfcorehdr_size > alloc_size);
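The crash_dump.c change above adds one more program header (the kernel image PT_LOAD) between the PT_NOTE header and the per-chunk PT_LOAD headers. A minimal user-space sketch of the resulting header arithmetic follows. The helper names `phdr_count()` and `phdr_offset()` are hypothetical and do not exist in the kernel; the sizes 64 and 56 are the standard sizes of `Elf64_Ehdr` and `Elf64_Phdr`, and the sketch assumes the program headers directly follow the ELF header, as `e_phoff = sizeof(Elf64_Ehdr)` in the diff suggests.

```c
#include <stddef.h>

/* Standard ELF64 structure sizes (Elf64_Ehdr and Elf64_Phdr). */
#define ELF64_EHDR_SIZE 64u
#define ELF64_PHDR_SIZE 56u

/*
 * Hypothetical helper: number of program headers after the change,
 * i.e. one PT_NOTE, one kernel image PT_LOAD, and one PT_LOAD per
 * memory chunk.
 */
static size_t phdr_count(size_t mem_chunk_cnt)
{
	return mem_chunk_cnt + 1 + 1;
}

/*
 * Hypothetical helper: byte offset of program header number idx from
 * the start of the ELF header. In the layout built by the diff, idx 0
 * is PT_NOTE, idx 1 is the kernel image PT_LOAD, and idx 2 onward are
 * the memory chunk PT_LOADs.
 */
static size_t phdr_offset(size_t idx)
{
	return ELF64_EHDR_SIZE + idx * ELF64_PHDR_SIZE;
}
```

With three memory chunks this gives five program headers, and the first memory chunk PT_LOAD sits two header slots after the ELF header.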
+3 -3
arch/s390/kernel/ipl.c
···

 void set_os_info_reipl_block(void)
 {
-	os_info_entry_add(OS_INFO_REIPL_BLOCK, reipl_block_actual,
-			  reipl_block_actual->hdr.len);
+	os_info_entry_add_data(OS_INFO_REIPL_BLOCK, reipl_block_actual,
+			       reipl_block_actual->hdr.len);
 }

 /* reipl type */
···
 	    reipl_type == IPL_TYPE_NSS ||
 	    reipl_type == IPL_TYPE_UNKNOWN)
 		os_info_flags |= OS_INFO_FLAG_REIPL_CLEAR;
-	os_info_entry_add(OS_INFO_FLAGS_ENTRY, &os_info_flags, sizeof(os_info_flags));
+	os_info_entry_add_data(OS_INFO_FLAGS_ENTRY, &os_info_flags, sizeof(os_info_flags));
 	csum = (__force unsigned int)cksm(reipl_block_actual, reipl_block_actual->hdr.len, 0);
 	abs_lc = get_abs_lowcore();
 	abs_lc->ipib = __pa(reipl_block_actual);
+2 -2
arch/s390/kernel/nospec-branch.c
···
 			type = BRASL_EXPOLINE;	/* brasl instruction */
 		else
 			continue;
-		thunk = instr + (*(int *)(instr + 2)) * 2;
+		thunk = instr + (long)(*(int *)(instr + 2)) * 2;
 		if (thunk[0] == 0xc6 && thunk[1] == 0x00)
 			/* exrl %r0,<target-br> */
-			br = thunk + (*(int *)(thunk + 2)) * 2;
+			br = thunk + (long)(*(int *)(thunk + 2)) * 2;
 		else
 			continue;
 		if (br[0] != 0x07 || (br[1] & 0xf0) != 0xf0)
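The `(long)` cast added above widens the 32-bit halfword displacement before it is doubled, so the multiplication can no longer overflow `int` for distant targets. A minimal user-space sketch of the same arithmetic follows. It assumes the displacement is the signed 32-bit big-endian field at bytes 2..5 of a six-byte RIL-format instruction such as `brasl` or `exrl`; the function names are hypothetical, and `int64_t` stands in for the kernel's 64-bit `long`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read the signed 32-bit displacement (counted in
 * halfwords) stored big-endian at bytes 2..5 of a RIL-format instruction.
 */
static int32_t ril_halfword_disp(const uint8_t *insn)
{
	uint32_t raw = ((uint32_t)insn[2] << 24) | ((uint32_t)insn[3] << 16) |
		       ((uint32_t)insn[4] << 8) | (uint32_t)insn[5];

	return (int32_t)raw;
}

/*
 * Hypothetical helper: byte offset of the branch target relative to the
 * instruction. Widening to 64 bits before multiplying by 2 mirrors the
 * (long) cast in the diff; a plain 32-bit multiplication would overflow
 * once the displacement reaches 2^30 halfwords.
 */
static int64_t ril_target_offset(const uint8_t *insn)
{
	return (int64_t)ril_halfword_disp(insn) * 2;
}
```

A displacement of 0x40000000 halfwords is the smallest positive value for which the widened and the 32-bit results differ.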
+26 -3
arch/s390/kernel/os_info.c
···
 #include <asm/checksum.h>
 #include <asm/abs_lowcore.h>
 #include <asm/os_info.h>
+#include <asm/physmem_info.h>
 #include <asm/maccess.h>
 #include <asm/asm-offsets.h>
+#include <asm/ipl.h>

 /*
  * OS info structure has to be page aligned
···
 }

 /*
- * Add OS info entry and update checksum
+ * Add OS info data entry and update checksum
  */
-void os_info_entry_add(int nr, void *ptr, u64 size)
+void os_info_entry_add_data(int nr, void *ptr, u64 size)
 {
 	os_info.entry[nr].addr = __pa(ptr);
 	os_info.entry[nr].size = size;
 	os_info.entry[nr].csum = (__force u32)cksm(ptr, size, 0);
+	os_info.csum = os_info_csum(&os_info);
+}
+
+/*
+ * Add OS info value entry and update checksum
+ */
+void os_info_entry_add_val(int nr, u64 value)
+{
+	os_info.entry[nr].val = value;
+	os_info.entry[nr].size = 0;
+	os_info.entry[nr].csum = 0;
 	os_info.csum = os_info_csum(&os_info);
 }

···
 {
 	struct lowcore *abs_lc;

+	BUILD_BUG_ON(sizeof(struct os_info) != PAGE_SIZE);
 	os_info.version_major = OS_INFO_VERSION_MAJOR;
 	os_info.version_minor = OS_INFO_VERSION_MINOR;
 	os_info.magic = OS_INFO_MAGIC;
+	os_info_entry_add_val(OS_INFO_IDENTITY_BASE, __identity_base);
+	os_info_entry_add_val(OS_INFO_KASLR_OFFSET, kaslr_offset());
+	os_info_entry_add_val(OS_INFO_KASLR_OFF_PHYS, __kaslr_offset_phys);
+	os_info_entry_add_val(OS_INFO_VMEMMAP, (unsigned long)vmemmap);
+	os_info_entry_add_val(OS_INFO_AMODE31_START, AMODE31_START);
+	os_info_entry_add_val(OS_INFO_AMODE31_END, AMODE31_END);
+	os_info_entry_add_val(OS_INFO_IMAGE_START, (unsigned long)_stext);
+	os_info_entry_add_val(OS_INFO_IMAGE_END, (unsigned long)_end);
+	os_info_entry_add_val(OS_INFO_IMAGE_PHYS, __pa_symbol(_stext));
 	os_info.csum = os_info_csum(&os_info);
 	abs_lc = get_abs_lowcore();
 	abs_lc->os_info = __pa(&os_info);
···

 	if (os_info_init)
 		return;
-	if (!oldmem_data.start)
+	if (!oldmem_data.start && !is_ipl_type_dump())
 		goto fail;
 	if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
 		goto fail;
+1 -1
arch/s390/kernel/perf_cpum_cf.c
···
 	case CPUMF_CTR_SET_CRYPTO:
 		if (cpumf_ctr_info.csvn >= 1 && cpumf_ctr_info.csvn <= 5)
 			ctrset_size = 16;
-		else if (cpumf_ctr_info.csvn == 6 || cpumf_ctr_info.csvn == 7)
+		else if (cpumf_ctr_info.csvn >= 6)
 			ctrset_size = 20;
 		break;
 	case CPUMF_CTR_SET_EXT:
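The change above makes the crypto counter set size upward compatible: any counter second version number of 6 or higher now maps to the 20-counter layout instead of only versions 6 and 7. A minimal sketch of the mapping follows. The function name `crypto_ctrset_size()` is hypothetical, and returning 0 for an unknown version is an assumption about the default outside this hunk.

```c
/*
 * Hypothetical helper mirroring the updated CPUMF_CTR_SET_CRYPTO case:
 * versions 1..5 have 16 crypto counters, versions 6 and above have 20.
 * The 0 fallback for version 0 is an assumption, not shown in the diff.
 */
static unsigned int crypto_ctrset_size(unsigned int csvn)
{
	if (csvn >= 1 && csvn <= 5)
		return 16;
	if (csvn >= 6)
		return 20;
	return 0;
}
```

Under the old `csvn == 6 || csvn == 7` test, a machine reporting version 8 would have fallen through without a size; with the open-ended range it is treated like version 7.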
+3 -8
arch/s390/kernel/perf_cpum_cf_events.c
···
 	}

 	/* Determine version specific crypto set */
-	switch (ci.csvn) {
-	case 1 ... 5:
+	csvn = none;
+	if (ci.csvn >= 1 && ci.csvn <= 5)
 		csvn = cpumcf_svn_12345_pmu_event_attr;
-		break;
-	case 6 ... 7:
+	else if (ci.csvn >= 6)
 		csvn = cpumcf_svn_67_pmu_event_attr;
-		break;
-	default:
-		csvn = none;
-	}

 	/* Determine model-specific counter set(s) */
 	get_cpu_id(&cpu_id);
+3 -3
arch/s390/kernel/setup.c
···
 static u32 __amode31_ref *__ctl_duct = __ctl_duct_amode31;

 unsigned long __bootdata_preserved(max_mappable);
-unsigned long __bootdata(ident_map_size);
 struct physmem_info __bootdata(physmem_info);

-unsigned long __bootdata_preserved(__kaslr_offset);
+struct vm_layout __bootdata_preserved(vm_layout);
+EXPORT_SYMBOL_GPL(vm_layout);
 int __bootdata_preserved(__kaslr_enabled);
 unsigned int __bootdata_preserved(zlib_dfltcc_support);
 EXPORT_SYMBOL(zlib_dfltcc_support);
···
 	unsigned long amode31_size = __eamode31 - __samode31;
 	long amode31_offset, *ptr;

-	amode31_offset = physmem_info.reserved[RR_AMODE31].start - (unsigned long)__samode31;
+	amode31_offset = AMODE31_START - (unsigned long)__samode31;
 	pr_info("Relocating AMODE31 section of size 0x%08lx\n", amode31_size);

 	/* Move original AMODE31 section to the new one */
+19
arch/s390/kernel/stacktrace.c
···
 	}
 	pagefault_enable();
 }
+
+unsigned long return_address(unsigned int n)
+{
+	struct unwind_state state;
+	unsigned long addr;
+
+	/* Increment to skip current stack entry */
+	n++;
+
+	unwind_for_each_frame(&state, NULL, NULL, 0) {
+		addr = unwind_get_return_address(&state);
+		if (!addr)
+			break;
+		if (!n--)
+			return addr;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(return_address);
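The new `return_address()` walks the stack and returns the n-th caller's return address, with `n++` skipping its own frame and `if (!n--)` counting down to the requested one. A minimal sketch of that counting logic follows, with the unwinder replaced by a plain array of addresses. The function name `nth_return_address()` and the array-based walk are assumptions for illustration only; the kernel uses `unwind_for_each_frame()`.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the stack walk: frames[0] plays the role of
 * the helper's own stack entry, frames[1] the immediate caller, and so on.
 * A zero address ends the walk, as in the kernel function.
 */
static unsigned long nth_return_address(const unsigned long *frames,
					size_t count, unsigned int n)
{
	size_t i;

	/* Increment to skip the current stack entry */
	n++;

	for (i = 0; i < count; i++) {
		unsigned long addr = frames[i];

		if (!addr)
			break;
		/* Test first, then decrement: fires once n frames were skipped */
		if (!n--)
			return addr;
	}
	return 0;
}
```

Asking for a level deeper than the recorded stack, or hitting a zero address first, yields 0.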
+28 -23
arch/s390/kernel/uv.c
···
 /* the bootdata_preserved fields come from ones in arch/s390/boot/uv.c */
 #ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
 int __bootdata_preserved(prot_virt_guest);
+EXPORT_SYMBOL(prot_virt_guest);
 #endif

 /*
···
 }

 /*
- * Calculate the expected ref_count for a page that would otherwise have no
+ * Calculate the expected ref_count for a folio that would otherwise have no
  * further pins. This was cribbed from similar functions in other places in
  * the kernel, but with some slight modifications. We know that a secure
- * page can not be a huge page for example.
+ * folio can not be a large folio, for example.
  */
-static int expected_page_refs(struct page *page)
+static int expected_folio_refs(struct folio *folio)
 {
 	int res;

-	res = page_mapcount(page);
-	if (PageSwapCache(page)) {
+	res = folio_mapcount(folio);
+	if (folio_test_swapcache(folio)) {
 		res++;
-	} else if (page_mapping(page)) {
+	} else if (folio_mapping(folio)) {
 		res++;
-		if (page_has_private(page))
+		if (folio->private)
 			res++;
 	}
 	return res;
 }

-static int make_page_secure(struct page *page, struct uv_cb_header *uvcb)
+static int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
 {
 	int expected, cc = 0;

-	if (PageWriteback(page))
+	if (folio_test_writeback(folio))
 		return -EAGAIN;
-	expected = expected_page_refs(page);
-	if (!page_ref_freeze(page, expected))
+	expected = expected_folio_refs(folio);
+	if (!folio_ref_freeze(folio, expected))
 		return -EBUSY;
-	set_bit(PG_arch_1, &page->flags);
+	set_bit(PG_arch_1, &folio->flags);
 	/*
 	 * If the UVC does not succeed or fail immediately, we don't want to
 	 * loop for long, or we might get stall notifications.
···
 	 * -EAGAIN and we let the callers deal with it.
 	 */
 	cc = __uv_call(0, (u64)uvcb);
-	page_ref_unfreeze(page, expected);
+	folio_ref_unfreeze(folio, expected);
 	/*
-	 * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
+	 * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
 	 * If busy or partially completed, return -EAGAIN.
 	 */
 	if (cc == UVC_CC_OK)
···
 	bool local_drain = false;
 	spinlock_t *ptelock;
 	unsigned long uaddr;
-	struct page *page;
+	struct folio *folio;
 	pte_t *ptep;
 	int rc;

···
 	if (!ptep)
 		goto out;
 	if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) {
-		page = pte_page(*ptep);
+		folio = page_folio(pte_page(*ptep));
+		rc = -EINVAL;
+		if (folio_test_large(folio))
+			goto unlock;
 		rc = -EAGAIN;
-		if (trylock_page(page)) {
+		if (folio_trylock(folio)) {
 			if (should_export_before_import(uvcb, gmap->mm))
-				uv_convert_from_secure(page_to_phys(page));
-			rc = make_page_secure(page, uvcb);
-			unlock_page(page);
+				uv_convert_from_secure(PFN_PHYS(folio_pfn(folio)));
+			rc = make_folio_secure(folio, uvcb);
+			folio_unlock(folio);
 		}
 	}
+unlock:
 	pte_unmap_unlock(ptep, ptelock);
 out:
 	mmap_read_unlock(gmap->mm);
···
 		 * If we are here because the UVC returned busy or partial
 		 * completion, this is just a useless check, but it is safe.
 		 */
-		wait_on_page_writeback(page);
+		folio_wait_writeback(folio);
 	} else if (rc == -EBUSY) {
 		/*
-		 * If we have tried a local drain and the page refcount
+		 * If we have tried a local drain and the folio refcount
 		 * still does not match our expected safe value, try with a
 		 * system wide drain. This is needed if the pagevecs holding
 		 * the page are on a different CPU.
···
 			return -EAGAIN;
 		}
 		/*
-		 * We are here if the page refcount does not match the
+		 * We are here if the folio refcount does not match the
 		 * expected safe value. The main culprits are usually
 		 * pagevecs. With lru_add_drain() we drain the pagevecs
 		 * on the local CPU so that hopefully the refcount will
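The reference count that `make_folio_secure()` tries to freeze is computed by `expected_folio_refs()`: one reference per mapping, plus one for the swap cache or the page cache, plus one more if private data is attached. A minimal sketch of that rule follows, with the folio predicates replaced by plain fields. The struct `folio_state` and its field names are hypothetical stand-ins for `folio_mapcount()`, `folio_test_swapcache()`, `folio_mapping()` and `folio->private`.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the folio properties the diff inspects. */
struct folio_state {
	int mapcount;		/* number of page table mappings */
	bool in_swapcache;	/* folio is in the swap cache */
	bool has_mapping;	/* folio belongs to an address space */
	bool has_private;	/* private data is attached */
};

/*
 * Mirrors the branch structure of expected_folio_refs(): the swap cache
 * and the page cache are mutually exclusive here, and private data is
 * only counted for page cache folios.
 */
static int expected_refs(const struct folio_state *f)
{
	int res = f->mapcount;

	if (f->in_swapcache) {
		res++;
	} else if (f->has_mapping) {
		res++;
		if (f->has_private)
			res++;
	}
	return res;
}
```

If the actual reference count differs from this value, the freeze fails and the caller sees -EBUSY, which is the case the drain logic at the end of the diff handles.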
+2
arch/s390/kernel/vmcore_info.c
···
 	VMCOREINFO_LENGTH(lowcore_ptr, NR_CPUS);
 	vmcoreinfo_append_str("SAMODE31=%lx\n", (unsigned long)__samode31);
 	vmcoreinfo_append_str("EAMODE31=%lx\n", (unsigned long)__eamode31);
+	vmcoreinfo_append_str("IDENTITYBASE=%lx\n", __identity_base);
 	vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
+	vmcoreinfo_append_str("KERNELOFFPHYS=%lx\n", __kaslr_offset_phys);
 	abs_lc = get_abs_lowcore();
 	abs_lc->vmcore_info = paddr_vmcoreinfo_note();
 	put_abs_lowcore(abs_lc);
+2 -36
arch/s390/kernel/vmlinux.lds.S
···

 SECTIONS
 {
-	. = 0x100000;
+	. = __START_KERNEL;
 	.text : {
 		_stext = .;		/* Start of text section */
 		_text = .;		/* Text and read-only data */
···
 	.amode31.data : {
 		*(.amode31.data)
 	}
-	. = ALIGN(PAGE_SIZE);
+	. = _samode31 + AMODE31_SIZE;
 	_eamode31 = .;

 	/* early.c uses stsi, which requires page aligned data. */
···
 	INIT_DATA_SECTION(0x100)

 	PERCPU_SECTION(0x100)
-
-#ifdef CONFIG_PIE_BUILD
-	.dynsym ALIGN(8) : {
-		__dynsym_start = .;
-		*(.dynsym)
-		__dynsym_end = .;
-	}
-	.rela.dyn ALIGN(8) : {
-		__rela_dyn_start = .;
-		*(.rela*)
-		__rela_dyn_end = .;
-	}
-	.dynamic ALIGN(8) : {
-		*(.dynamic)
-	}
-	.dynstr ALIGN(8) : {
-		*(.dynstr)
-	}
-#endif
-	.hash ALIGN(8) : {
-		*(.hash)
-	}
-	.gnu.hash ALIGN(8) : {
-		*(.gnu.hash)
-	}

 	. = ALIGN(PAGE_SIZE);
 	__init_end = .;		/* freed after init ends here */
···
 	 * it should match struct vmlinux_info
 	 */
 	.vmlinux.info 0 (INFO) : {
-		QUAD(_stext)					/* default_lma */
 		QUAD(startup_continue)				/* entry */
 		QUAD(__bss_start - _stext)			/* image_size */
 		QUAD(__bss_stop - __bss_start)			/* bss_size */
···
 		QUAD(__boot_data_preserved_start)		/* bootdata_preserved_off */
 		QUAD(__boot_data_preserved_end -
 		     __boot_data_preserved_start)		/* bootdata_preserved_size */
-#ifdef CONFIG_PIE_BUILD
-		QUAD(__dynsym_start)				/* dynsym_start */
-		QUAD(__rela_dyn_start)				/* rela_dyn_start */
-		QUAD(__rela_dyn_end)				/* rela_dyn_end */
-#else
 		QUAD(__got_start)				/* got_start */
 		QUAD(__got_end)					/* got_end */
-#endif
 		QUAD(_eamode31 - _samode31)			/* amode31_size */
 		QUAD(init_mm)
 		QUAD(swapper_pg_dir)
···
 		*(.plt) *(.plt.*) *(.iplt) *(.igot .igot.plt)
 	}
 	ASSERT(SIZEOF(.plt) == 0, "Unexpected run-time procedure linkages detected!")
-#ifndef CONFIG_PIE_BUILD
 	.rela.dyn : {
 		*(.rela.*) *(.rela_*)
 	}
 	ASSERT(SIZEOF(.rela.dyn) == 0, "Unexpected run-time relocations (.rela) detected!")
-#endif

 	/* Sections to be discarded */
 	DISCARDS
+1 -3
arch/s390/kvm/kvm-s390.c
···
 		if (r)
 			break;

-		mmap_write_lock(current->mm);
-		r = gmap_mark_unmergeable();
-		mmap_write_unlock(current->mm);
+		r = s390_disable_cow_sharing();
 		if (r)
 			break;

+3 -2
arch/s390/kvm/vsie.c
···
 #include <linux/list.h>
 #include <linux/bitmap.h>
 #include <linux/sched/signal.h>
+#include <linux/io.h>

 #include <asm/gmap.h>
 #include <asm/mmu_context.h>
···
 	case -EACCES:
 		return set_validity_icpt(scb_s, 0x003CU);
 	}
-	scb_s->crycbd = ((__u32)(__u64) &vsie_page->crycb) | CRYCB_FORMAT2;
+	scb_s->crycbd = (u32)virt_to_phys(&vsie_page->crycb) | CRYCB_FORMAT2;
 	return 0;
 }

···
 		if (read_guest_real(vcpu, fac, &vsie_page->fac,
 				    stfle_size() * sizeof(u64)))
 			return set_validity_icpt(scb_s, 0x1090U);
-		scb_s->fac = (__u32)(__u64) &vsie_page->fac;
+		scb_s->fac = (u32)virt_to_phys(&vsie_page->fac);
 	}
 	return 0;
 }
+1 -1
arch/s390/lib/Makefile
···

 lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o

-obj-$(CONFIG_EXPOLINE_EXTERN) += expoline/
+obj-$(CONFIG_EXPOLINE_EXTERN) += expoline.o
-3
arch/s390/lib/expoline/Makefile
···
-# SPDX-License-Identifier: GPL-2.0
-
-obj-y += expoline.o
arch/s390/lib/expoline/expoline.S → arch/s390/lib/expoline.S
+125 -40
arch/s390/mm/gmap.c
···
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

 /*
- * Remove all empty zero pages from the mapping for lazy refaulting
- * - This must be called after mm->context.has_pgste is set, to avoid
- *   future creation of zero pages
- * - This must be called after THP was disabled.
- *
- * mm contracts with s390, that even if mm were to remove a page table,
- * racing with the loop below and so causing pte_offset_map_lock() to fail,
- * it will never insert a page table containing empty zero pages once
- * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set.
- */
-static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
-			    unsigned long end, struct mm_walk *walk)
-{
-	unsigned long addr;
-
-	for (addr = start; addr != end; addr += PAGE_SIZE) {
-		pte_t *ptep;
-		spinlock_t *ptl;
-
-		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
-		if (!ptep)
-			break;
-		if (is_zero_pfn(pte_pfn(*ptep)))
-			ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID));
-		pte_unmap_unlock(ptep, ptl);
-	}
-	return 0;
-}
-
-static const struct mm_walk_ops zap_zero_walk_ops = {
-	.pmd_entry	= __zap_zero_pages,
-	.walk_lock	= PGWALK_WRLOCK,
-};
-
-/*
  * switch on pgstes for its userspace process (for kvm)
  */
 int s390_enable_sie(void)
···
 	mm->context.has_pgste = 1;
 	/* split thp mappings and disable thp for future mappings */
 	thp_split_mm(mm);
-	walk_page_range(mm, 0, TASK_SIZE, &zap_zero_walk_ops, NULL);
 	mmap_write_unlock(mm);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(s390_enable_sie);

-int gmap_mark_unmergeable(void)
+static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr,
+				   unsigned long end, struct mm_walk *walk)
 {
+	unsigned long *found_addr = walk->private;
+
+	/* Return 1 of the page is a zeropage. */
+	if (is_zero_pfn(pte_pfn(*pte))) {
+		/*
+		 * Shared zeropage in e.g., a FS DAX mapping? We cannot do the
+		 * right thing and likely don't care: FAULT_FLAG_UNSHARE
+		 * currently only works in COW mappings, which is also where
+		 * mm_forbids_zeropage() is checked.
+		 */
+		if (!is_cow_mapping(walk->vma->vm_flags))
+			return -EFAULT;
+
+		*found_addr = addr;
+		return 1;
+	}
+	return 0;
+}
+
+static const struct mm_walk_ops find_zeropage_ops = {
+	.pte_entry	= find_zeropage_pte_entry,
+	.walk_lock	= PGWALK_WRLOCK,
+};
+
+/*
+ * Unshare all shared zeropages, replacing them by anonymous pages. Note that
+ * we cannot simply zap all shared zeropages, because this could later
+ * trigger unexpected userfaultfd missing events.
+ *
+ * This must be called after mm->context.allow_cow_sharing was
+ * set to 0, to avoid future mappings of shared zeropages.
+ *
+ * mm contracts with s390, that even if mm were to remove a page table,
+ * and racing with walk_page_range_vma() calling pte_offset_map_lock()
+ * would fail, it will never insert a page table containing empty zero
+ * pages once mm_forbids_zeropage(mm) i.e.
+ * mm->context.allow_cow_sharing is set to 0.
+ */
+static int __s390_unshare_zeropages(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+	VMA_ITERATOR(vmi, mm, 0);
+	unsigned long addr;
+	vm_fault_t fault;
+	int rc;
+
+	for_each_vma(vmi, vma) {
+		/*
+		 * We could only look at COW mappings, but it's more future
+		 * proof to catch unexpected zeropages in other mappings and
+		 * fail.
+		 */
+		if ((vma->vm_flags & VM_PFNMAP) || is_vm_hugetlb_page(vma))
+			continue;
+		addr = vma->vm_start;
+
+retry:
+		rc = walk_page_range_vma(vma, addr, vma->vm_end,
+					 &find_zeropage_ops, &addr);
+		if (rc < 0)
+			return rc;
+		else if (!rc)
+			continue;
+
+		/* addr was updated by find_zeropage_pte_entry() */
+		fault = handle_mm_fault(vma, addr,
+					FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
+					NULL);
+		if (fault & VM_FAULT_OOM)
+			return -ENOMEM;
+		/*
+		 * See break_ksm(): even after handle_mm_fault() returned 0, we
+		 * must start the lookup from the current address, because
+		 * handle_mm_fault() may back out if there's any difficulty.
+		 *
+		 * VM_FAULT_SIGBUS and VM_FAULT_SIGSEGV are unexpected but
+		 * maybe they could trigger in the future on concurrent
+		 * truncation. In that case, the shared zeropage would be gone
+		 * and we can simply retry and make progress.
+		 */
+		cond_resched();
+		goto retry;
+	}
+
+	return 0;
+}
+
+static int __s390_disable_cow_sharing(struct mm_struct *mm)
+{
+	int rc;
+
+	if (!mm->context.allow_cow_sharing)
+		return 0;
+
+	mm->context.allow_cow_sharing = 0;
+
+	/* Replace all shared zeropages by anonymous pages. */
+	rc = __s390_unshare_zeropages(mm);
 	/*
 	 * Make sure to disable KSM (if enabled for the whole process or
 	 * individual VMAs). Note that nothing currently hinders user space
 	 * from re-enabling it.
 	 */
-	return ksm_disable(current->mm);
+	if (!rc)
+		rc = ksm_disable(mm);
+	if (rc)
+		mm->context.allow_cow_sharing = 1;
+	return rc;
 }
-EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
+
+/*
+ * Disable most COW-sharing of memory pages for the whole process:
+ * (1) Disable KSM and unmerge/unshare any KSM pages.
+ * (2) Disallow shared zeropages and unshare any zerpages that are mapped.
+ *
+ * Not that we currently don't bother with COW-shared pages that are shared
+ * with parent/child processes due to fork().
+ */
+int s390_disable_cow_sharing(void)
+{
+	int rc;
+
+	mmap_write_lock(current->mm);
+	rc = __s390_disable_cow_sharing(current->mm);
+	mmap_write_unlock(current->mm);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(s390_disable_cow_sharing);

 /*
  * Enable storage key handling from now on and initialize the storage
···
 		goto out_up;

 	mm->context.uses_skeys = 1;
-	rc = gmap_mark_unmergeable();
+	rc = __s390_disable_cow_sharing(mm);
 	if (rc) {
 		mm->context.uses_skeys = 0;
 		goto out_up;
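The retry loop in `__s390_unshare_zeropages()` finds one shared zeropage at a time, faults it out, and then restarts the lookup from the same address rather than the next one, because the fault may not have taken effect. A minimal user-space sketch of that control flow follows. The array of flags, `find_shared()` and `unshare_all()` are hypothetical stand-ins for the page table walk and for `handle_mm_fault()` with `FAULT_FLAG_UNSHARE`; nothing here touches real memory management.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for walk_page_range_vma() with find_zeropage_ops:
 * return the index of the first shared entry at or after start, or -1 if
 * none is left.
 */
static long find_shared(const bool *shared, size_t n, size_t start)
{
	size_t i;

	for (i = start; i < n; i++) {
		if (shared[i])
			return (long)i;
	}
	return -1;
}

/*
 * Unshare every entry, one per iteration. After each simulated fault the
 * lookup restarts at the same position, as the kernel loop does with
 * "goto retry". Returns the number of simulated faults.
 */
static size_t unshare_all(bool *shared, size_t n)
{
	size_t pos = 0, faults = 0;

	for (;;) {
		long hit = find_shared(shared, n, pos);

		if (hit < 0)
			break;
		pos = (size_t)hit;
		/* Stand-in for handle_mm_fault(..., FAULT_FLAG_UNSHARE, ...) */
		shared[pos] = false;
		faults++;
	}
	return faults;
}
```

In the sketch the simulated fault always succeeds, so restarting at the same position costs one extra check per entry; in the kernel the same restart is what guarantees progress when the fault backs out.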
+4 -1
arch/s390/mm/vmem.c
···
 #include <linux/slab.h>
 #include <linux/sort.h>
 #include <asm/page-states.h>
+#include <asm/abs_lowcore.h>
 #include <asm/cacheflush.h>
+#include <asm/maccess.h>
 #include <asm/nospec-branch.h>
 #include <asm/ctlreg.h>
 #include <asm/pgalloc.h>
···
 #include <asm/tlbflush.h>
 #include <asm/sections.h>
 #include <asm/set_memory.h>
+#include <asm/physmem_info.h>

 static DEFINE_MUTEX(vmem_mutex);

···
 	if (WARN_ON_ONCE(!PAGE_ALIGNED(start | end)))
 		return -EINVAL;
 	/* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
-	if (WARN_ON_ONCE(end > VMALLOC_START))
+	if (WARN_ON_ONCE(end > __abs_lowcore))
 		return -EINVAL;
 	for (addr = start; addr < end; addr = next) {
 		next = pgd_addr_end(addr, end);
-4
arch/s390/pci/pci_sysfs.c
···
 }
 static DEVICE_ATTR_RO(uid_is_unique);

-#ifndef CONFIG_DMI
 /* analogous to smbios index */
 static ssize_t index_show(struct device *dev,
 			  struct device_attribute *attr, char *buf)
···
 	.attrs = zpci_ident_attrs,
 	.is_visible = zpci_index_is_visible,
 };
-#endif

 static struct bin_attribute *zpci_bin_attrs[] = {
 	&bin_attr_util_string,
···
 const struct attribute_group *zpci_attr_groups[] = {
 	&zpci_attr_group,
 	&pfip_attr_group,
-#ifndef CONFIG_DMI
 	&zpci_ident_attr_group,
-#endif
 	NULL,
 };
+1 -1
arch/s390/tools/relocs.c
···
 	case R_390_GOTOFF64:
 		break;
 	case R_390_64:
-		add_reloc(&relocs64, offset);
+		add_reloc(&relocs64, offset - ehdr.e_entry);
 		break;
 	default:
 		die("Unsupported relocation type: %d\n", r_type);
+1 -17
drivers/crypto/Kconfig
···
 config ZCRYPT
 	tristate "Support for s390 cryptographic adapters"
 	depends on S390
+	depends on AP
 	select HW_RANDOM
 	help
 	  Select this option if you want to enable support for
 	  s390 cryptographic adapters like Crypto Express 4 up
 	  to 8 in Coprocessor (CEXxC), EP11 Coprocessor (CEXxP)
 	  or Accelerator (CEXxA) mode.
-
-config ZCRYPT_DEBUG
-	bool "Enable debug features for s390 cryptographic adapters"
-	default n
-	depends on DEBUG_KERNEL
-	depends on ZCRYPT
-	help
-	  Say 'Y' here to enable some additional debug features on the
-	  s390 cryptographic adapters driver.
-
-	  There will be some more sysfs attributes displayed for ap cards
-	  and queues and some flags on crypto requests are interpreted as
-	  debugging messages to force error injection.
-
-	  Do not enable on production level kernel build.
-
-	  If unsure, say N.

 config PKEY
 	tristate "Kernel API for protected key handling"
+1 -1
drivers/s390/char/Makefile
···

 obj-$(CONFIG_PCI) += sclp_pci.o

-obj-$(subst m,y,$(CONFIG_ZCRYPT)) += sclp_ap.o
+obj-$(subst m,y,$(CONFIG_AP)) += sclp_ap.o

 obj-$(CONFIG_VMLOGRDR) += vmlogrdr.o
 obj-$(CONFIG_VMCP) += vmcp.o
+81 -60
drivers/s390/cio/chp.c
···
 /*
  * Channel measurement related functions
  */
-static ssize_t chp_measurement_chars_read(struct file *filp,
-					  struct kobject *kobj,
-					  struct bin_attribute *bin_attr,
-					  char *buf, loff_t off, size_t count)
+static ssize_t measurement_chars_read(struct file *filp, struct kobject *kobj,
+				      struct bin_attribute *bin_attr,
+				      char *buf, loff_t off, size_t count)
 {
 	struct channel_path *chp;
 	struct device *device;
···
 	return memory_read_from_buffer(buf, count, &off, &chp->cmg_chars,
 				       sizeof(chp->cmg_chars));
 }
+static BIN_ATTR_ADMIN_RO(measurement_chars, sizeof(struct cmg_chars));

-static const struct bin_attribute chp_measurement_chars_attr = {
-	.attr = {
-		.name = "measurement_chars",
-		.mode = S_IRUSR,
-	},
-	.size = sizeof(struct cmg_chars),
-	.read = chp_measurement_chars_read,
-};
-
-static void chp_measurement_copy_block(struct cmg_entry *buf,
-				       struct channel_subsystem *css,
-				       struct chp_id chpid)
-{
-	void *area;
-	struct cmg_entry *entry, reference_buf;
-	int idx;
-
-	if (chpid.id < 128) {
-		area = css->cub_addr1;
-		idx = chpid.id;
-	} else {
-		area = css->cub_addr2;
-		idx = chpid.id - 128;
-	}
-	entry = area + (idx * sizeof(struct cmg_entry));
-	do {
-		memcpy(buf, entry, sizeof(*entry));
-		memcpy(&reference_buf, entry, sizeof(*entry));
-	} while (reference_buf.values[0] != buf->values[0]);
-}
-
-static ssize_t chp_measurement_read(struct file *filp, struct kobject *kobj,
-				    struct bin_attribute *bin_attr,
-				    char *buf, loff_t off, size_t count)
+static ssize_t chp_measurement_copy_block(void *buf, loff_t off, size_t count,
+					  struct kobject *kobj, bool extended)
 {
 	struct channel_path *chp;
 	struct channel_subsystem *css;
 	struct device *device;
 	unsigned int size;
+	void *area, *entry;
+	int id, idx;

 	device = kobj_to_dev(kobj);
 	chp = to_channelpath(device);
 	css = to_css(chp->dev.parent);
+	id = chp->chpid.id;

-	size = sizeof(struct cmg_entry);
+	if (extended) {
+		/* Check if extended measurement data is available. */
+		if (!chp->extended)
+			return 0;
+
+		size = sizeof(struct cmg_ext_entry);
+		area = css->ecub[id / CSS_ECUES_PER_PAGE];
+		idx = id % CSS_ECUES_PER_PAGE;
+	} else {
+		size = sizeof(struct cmg_entry);
+		area = css->cub[id / CSS_CUES_PER_PAGE];
+		idx = id % CSS_CUES_PER_PAGE;
+	}
+	entry = area + (idx * size);

 	/* Only allow single reads. */
 	if (off || count < size)
 		return 0;
-	chp_measurement_copy_block((struct cmg_entry *)buf, css, chp->chpid);
-	count = size;
-	return count;
+
+	memcpy(buf, entry, size);
+
+	return size;
 }

-static const struct bin_attribute chp_measurement_attr = {
-	.attr = {
-		.name = "measurement",
-		.mode = S_IRUSR,
-	},
-	.size = sizeof(struct cmg_entry),
-	.read = chp_measurement_read,
+static ssize_t measurement_read(struct file *filp, struct kobject *kobj,
+				struct bin_attribute *bin_attr,
+				char *buf, loff_t off, size_t count)
+{
+	return chp_measurement_copy_block(buf, off, count, kobj, false);
+}
+static BIN_ATTR_ADMIN_RO(measurement, sizeof(struct cmg_entry));
+
+static ssize_t ext_measurement_read(struct file *filp, struct kobject *kobj,
+				    struct bin_attribute *bin_attr,
+				    char *buf, loff_t off, size_t count)
+{
+	return chp_measurement_copy_block(buf, off, count, kobj, true);
+}
+static BIN_ATTR_ADMIN_RO(ext_measurement, sizeof(struct cmg_ext_entry));
+
+static struct bin_attribute *measurement_attrs[] = {
+	&bin_attr_measurement_chars,
+	&bin_attr_measurement,
+	&bin_attr_ext_measurement,
+	NULL,
 };
+BIN_ATTRIBUTE_GROUPS(measurement);

 void chp_remove_cmg_attr(struct channel_path *chp)
 {
-	device_remove_bin_file(&chp->dev, &chp_measurement_chars_attr);
-	device_remove_bin_file(&chp->dev, &chp_measurement_attr);
+	device_remove_groups(&chp->dev, measurement_groups);
 }

 int chp_add_cmg_attr(struct channel_path *chp)
 {
-	int ret;
-
-	ret = device_create_bin_file(&chp->dev, &chp_measurement_chars_attr);
-	if (ret)
-		return ret;
-	ret = device_create_bin_file(&chp->dev, &chp_measurement_attr);
-	if (ret)
-		device_remove_bin_file(&chp->dev, &chp_measurement_chars_attr);
-	return ret;
+	return device_add_groups(&chp->dev, measurement_groups);
 }

 /*
···
 }
 static DEVICE_ATTR(esc, 0444, chp_esc_show, NULL);

+static char apply_max_suffix(unsigned long *value, unsigned long base)
+{
+	static char suffixes[] = { 0, 'K', 'M', 'G', 'T' };
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(suffixes) - 1; i++) {
+		if (*value < base || *value % base != 0)
+			break;
+		*value /= base;
+	}
+
+	return suffixes[i];
+}
+
+static ssize_t speed_bps_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	struct channel_path *chp = to_channelpath(dev);
+	unsigned long speed = chp->speed;
+	char suffix;
+
+	suffix = apply_max_suffix(&speed, 1000);
+
+	return suffix ? sysfs_emit(buf, "%lu%c\n", speed, suffix) :
+			sysfs_emit(buf, "%lu\n", speed);
+}
+
+static DEVICE_ATTR_RO(speed_bps);
+
 static ssize_t util_string_read(struct file *filp, struct kobject *kobj,
 				struct bin_attribute *attr, char *buf,
 				loff_t off, size_t count)
···
 	&dev_attr_chid.attr,
 	&dev_attr_chid_external.attr,
 	&dev_attr_esc.attr,
+	&dev_attr_speed_bps.attr,
 	NULL,
 };
 static struct attribute_group chp_attr_group = {
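The new `speed_bps` attribute prints the channel-path speed with the largest decimal suffix that divides it exactly, which `apply_max_suffix()` computes by repeated division. A user-space sketch of the same logic follows, so the output format can be tried outside the kernel. It assumes `unsigned long long` in place of the kernel's `unsigned long`, a local `ARRAY_SIZE` macro, and `snprintf()` in place of `sysfs_emit()`; `format_speed()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Same algorithm as apply_max_suffix() in the diff: divide by base as long
 * as the value is an exact multiple, and return the matching suffix
 * (0 means no suffix).
 */
static char apply_max_suffix(unsigned long long *value, unsigned long long base)
{
	static const char suffixes[] = { 0, 'K', 'M', 'G', 'T' };
	size_t i;

	for (i = 0; i < ARRAY_SIZE(suffixes) - 1; i++) {
		if (*value < base || *value % base != 0)
			break;
		*value /= base;
	}

	return suffixes[i];
}

/* Hypothetical user-space counterpart of speed_bps_show(). */
static int format_speed(char *buf, size_t len, unsigned long long speed)
{
	char suffix = apply_max_suffix(&speed, 1000);

	return suffix ? snprintf(buf, len, "%llu%c\n", speed, suffix) :
			snprintf(buf, len, "%llu\n", speed);
}
```

A speed of 8000000000 is shown as "8G", while 1500 stays "1500" because it is not an exact multiple of 1000, and an unavailable speed of 0 is shown as "0".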
+2
drivers/s390/cio/chp.h
···
 	/* Channel-measurement related stuff: */
 	int cmg;
 	int shared;
+	int extended;
+	unsigned long speed;
 	struct cmg_chars cmg_chars;
 };

+92 -30
drivers/s390/cio/chsc.c
···
 #include <asm/crw.h>
 #include <asm/isc.h>
 #include <asm/ebcdic.h>
-#include <asm/ap.h>

 #include "css.h"
 #include "cio.h"
···

 #define SEI_VF_FLA 0xc0 /* VF flag for Full Link Address */
 #define SEI_RS_CHPID 0x4 /* 4 in RS field indicates CHPID */
+
+static BLOCKING_NOTIFIER_HEAD(chsc_notifiers);
+
+int chsc_notifier_register(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&chsc_notifiers, nb);
+}
+EXPORT_SYMBOL(chsc_notifier_register);
+
+int chsc_notifier_unregister(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&chsc_notifiers, nb);
+}
+EXPORT_SYMBOL(chsc_notifier_unregister);

 /**
  * chsc_error_from_response() - convert a chsc response to an error
···
 	if (sei_area->rs != 5)
 		return;

-	ap_bus_cfg_chg();
+	blocking_notifier_call_chain(&chsc_notifiers,
+				     CHSC_NOTIFY_AP_CFG, NULL);
 }

 static void chsc_process_sei_fces_event(struct chsc_sei_nt0_area *sei_area)
···
 	struct {
 		struct chsc_header request;
 		u32 operation_code : 2;
-		u32 : 30;
+		u32 : 1;
+		u32 e : 1;
+		u32 : 28;
 		u32 key : 4;
 		u32 : 28;
-		u32 zeroes1;
-		dma32_t cub_addr1;
-		u32 zeroes2;
-		dma32_t cub_addr2;
-		u32 reserved[13];
+		dma64_t cub[CSS_NUM_CUB_PAGES];
+		dma64_t ecub[CSS_NUM_ECUB_PAGES];
+		u32 reserved[5];
 		struct chsc_header response;
 		u32 status : 8;
 		u32 : 4;
 		u32 fmt : 4;
 		u32 : 16;
-	} *secm_area;
+	} __packed *secm_area;
 	unsigned long flags;
-	int ret, ccode;
+	int ret, ccode, i;

 	spin_lock_irqsave(&chsc_page_lock, flags);
 	memset(chsc_page, 0, PAGE_SIZE);
···
 	secm_area->request.code = 0x0016;

 	secm_area->key = PAGE_DEFAULT_KEY >> 4;
-	secm_area->cub_addr1 = virt_to_dma32(css->cub_addr1);
-	secm_area->cub_addr2 = virt_to_dma32(css->cub_addr2);
+	secm_area->e = 1;
+
+	for (i = 0; i < CSS_NUM_CUB_PAGES; i++)
+		secm_area->cub[i] = (__force dma64_t)virt_to_dma32(css->cub[i]);
+	for (i = 0; i < CSS_NUM_ECUB_PAGES; i++)
+		secm_area->ecub[i] = virt_to_dma64(css->ecub[i]);

 	secm_area->operation_code = enable ? 0 : 1;

···
 	return ret;
 }

+static int cub_alloc(struct channel_subsystem *css)
+{
+	int i;
+
+	for (i = 0; i < CSS_NUM_CUB_PAGES; i++) {
+		css->cub[i] = (void *)get_zeroed_page(GFP_KERNEL | GFP_DMA);
+		if (!css->cub[i])
+			return -ENOMEM;
+	}
+	for (i = 0; i < CSS_NUM_ECUB_PAGES; i++) {
+		css->ecub[i] = (void *)get_zeroed_page(GFP_KERNEL);
+		if (!css->ecub[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void cub_free(struct channel_subsystem *css)
+{
+	int i;
+
+	for (i = 0; i < CSS_NUM_CUB_PAGES; i++) {
+		free_page((unsigned long)css->cub[i]);
+		css->cub[i] = NULL;
+	}
+	for (i = 0; i < CSS_NUM_ECUB_PAGES; i++) {
+		free_page((unsigned long)css->ecub[i]);
+		css->ecub[i] = NULL;
+	}
+}
+
 int
 chsc_secm(struct channel_subsystem *css, int enable)
 {
 	int ret;

 	if (enable && !css->cm_enabled) {
-		css->cub_addr1 = (void *)get_zeroed_page(GFP_KERNEL | GFP_DMA);
-		css->cub_addr2 = (void *)get_zeroed_page(GFP_KERNEL | GFP_DMA);
-		if (!css->cub_addr1 || !css->cub_addr2) {
-			free_page((unsigned long)css->cub_addr1);
-			free_page((unsigned long)css->cub_addr2);
-			return -ENOMEM;
-		}
+		ret = cub_alloc(css);
+		if (ret)
+			goto out;
 	}
 	ret = __chsc_do_secm(css, enable);
 	if (!ret) {
···
 		} else
 			chsc_remove_cmg_attr(css);
 	}
-	if (!css->cm_enabled) {
-		free_page((unsigned long)css->cub_addr1);
-		free_page((unsigned long)css->cub_addr2);
-	}
+
+out:
+	if (!css->cm_enabled)
+		cub_free(css);
+
 	return ret;
 }

···
 	}
 }

+static unsigned long scmc_get_speed(u32 s, u32 p)
+{
+	unsigned long speed = s;
+
+	if (!p)
+		p = 8;
+	while (p--)
+		speed *= 10;
+
+	return speed;
+}
+
 int chsc_get_channel_measurement_chars(struct channel_path *chp)
 {
 	unsigned long flags;
···
 		u32 zeroes2;
 		u32 not_valid : 1;
 		u32 shared : 1;
-		u32 : 22;
+		u32 extended : 1;
+		u32 : 21;
 		u32 chpid : 8;
 		u32 cmcv : 5;
-		u32 : 11;
+		u32 : 7;
+		u32 cmgp : 4;
 		u32 cmgq : 8;
 		u32 cmg : 8;
-		u32 zeroes3;
+		u32 : 16;
+		u32 cmgs : 16;
 		u32 data[NR_MEASUREMENT_CHARS];
 	} *scmc_area;

 	chp->shared = -1;
 	chp->cmg = -1;
+	chp->extended = 0;
+	chp->speed = 0;

 	if (!css_chsc_characteristics.scmc || !css_chsc_characteristics.secm)
 		return -EINVAL;
···

 	chp->cmg = scmc_area->cmg;
 	chp->shared = scmc_area->shared;
-	if (chp->cmg != 2 && chp->cmg != 3) {
-		/* No cmg-dependent data. */
-		goto out;
-	}
+	chp->extended = scmc_area->extended;
+	chp->speed = scmc_get_speed(scmc_area->cmgs, scmc_area->cmgp);
 	chsc_initialize_cmg_chars(chp, scmc_area->cmcv,
 				  (struct cmg_chars *) &scmc_area->data);
 out:
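As the diff reads, scmc_get_speed() treats the cmgs field as a value and the cmgp field as a power of ten, with a zero exponent falling back to 8. A minimal userspace sketch of that arithmetic follows, assuming fixed-width stdint types in place of the kernel's u32. The assert on the exponent is an addition for illustration, based on cmgp being declared as a 4-bit field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace restatement of the speed decoding in the diff:
 * speed = s * 10^p, where p == 0 is replaced by 8.
 * Note: large s and p can wrap an unsigned long; like the original,
 * this sketch does not guard against that.
 */
static unsigned long scmc_get_speed(uint32_t s, uint32_t p)
{
	unsigned long speed = s;

	assert(p <= 15);	/* cmgp is a 4-bit field in the diff */
	if (!p)
		p = 8;
	while (p--)
		speed *= 10;

	return speed;
}
```

With s = 16 and p = 0 the result is 16 * 10^8, which the speed_bps attribute would then reduce to a suffixed value.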
+5
drivers/s390/cio/chsc.h
···
 	u32 values[NR_MEASUREMENT_ENTRIES];
 };

+#define NR_EXT_MEASUREMENT_ENTRIES 16
+struct cmg_ext_entry {
+	u32 values[NR_EXT_MEASUREMENT_ENTRIES];
+};
+
 struct channel_path_desc_fmt1 {
 	u8 flags;
 	u8 lsn;
+7 -7
drivers/s390/cio/css.c
···
 {
 	struct subchannel *sch = to_subchannel(dev);

-	return sprintf(buf, "%01x\n", sch->st);
+	return sysfs_emit(buf, "%01x\n", sch->st);
 }

 static DEVICE_ATTR_RO(type);
···
 {
 	struct subchannel *sch = to_subchannel(dev);

-	return sprintf(buf, "css:t%01X\n", sch->st);
+	return sysfs_emit(buf, "css:t%01X\n", sch->st);
 }

 static DEVICE_ATTR_RO(modalias);
···
 	ssize_t len;

 	device_lock(dev);
-	len = snprintf(buf, PAGE_SIZE, "%s\n", sch->driver_override);
+	len = sysfs_emit(buf, "%s\n", sch->driver_override);
 	device_unlock(dev);
 	return len;
 }
···
 	struct subchannel *sch = to_subchannel(dev);
 	struct pmcw *pmcw = &sch->schib.pmcw;

-	return sprintf(buf, "%02x %02x %02x\n",
-		       pmcw->pim, pmcw->pam, pmcw->pom);
+	return sysfs_emit(buf, "%02x %02x %02x\n",
+			  pmcw->pim, pmcw->pam, pmcw->pom);
 }
 static DEVICE_ATTR_RO(pimpampom);

···
 	if (!css->id_valid)
 		return -EINVAL;

-	return sprintf(buf, "%x\n", css->cssid);
+	return sysfs_emit(buf, "%x\n", css->cssid);
 }
 static DEVICE_ATTR_RO(real_cssid);

···
 	int ret;

 	mutex_lock(&css->mutex);
-	ret = sprintf(buf, "%x\n", css->cm_enabled);
+	ret = sysfs_emit(buf, "%x\n", css->cm_enabled);
 	mutex_unlock(&css->mutex);
 	return ret;
 }
+11 -2
drivers/s390/cio/css.h
···
 #define SNID_STATE3_SINGLE_PATH 0

 /*
+ * Miscellaneous constants
+ */
+
+#define CSS_NUM_CUB_PAGES 2
+#define CSS_CUES_PER_PAGE 128
+#define CSS_NUM_ECUB_PAGES 4
+#define CSS_ECUES_PER_PAGE 64
+
+/*
  * Conditions used to specify which subchannels need evaluation
  */
 enum css_eval_cond {
···
 	struct mutex mutex;
 	/* channel measurement related */
 	int cm_enabled;
-	void *cub_addr1;
-	void *cub_addr2;
+	void *cub[CSS_NUM_CUB_PAGES];
+	void *ecub[CSS_NUM_ECUB_PAGES];
 	/* for orphaned ccw devices */
 	struct subchannel *pseudo_subchannel;
 };
+1 -1
drivers/s390/cio/trace.h
···
 		__entry->devno = schib->pmcw.dev;
 		__entry->schib = *schib;
 		__entry->pmcw_ena = schib->pmcw.ena;
-		__entry->pmcw_st = schib->pmcw.ena;
+		__entry->pmcw_st = schib->pmcw.st;
 		__entry->pmcw_dnv = schib->pmcw.dnv;
 		__entry->pmcw_dev = schib->pmcw.dev;
 		__entry->pmcw_lpm = schib->pmcw.lpm;
+1 -1
drivers/s390/crypto/Makefile
···
 #

 ap-objs := ap_bus.o ap_card.o ap_queue.o
-obj-$(subst m,y,$(CONFIG_ZCRYPT)) += ap.o
+obj-$(CONFIG_AP) += ap.o
 # zcrypt_api.o and zcrypt_msgtype*.o depend on ap.o
 zcrypt-objs := zcrypt_api.o zcrypt_card.o zcrypt_queue.o
 zcrypt-objs += zcrypt_msgtype6.o zcrypt_msgtype50.o
+137 -101
drivers/s390/crypto/ap_bus.c
···
 #include <linux/ctype.h>
 #include <linux/module.h>
 #include <asm/uv.h>
+#include <asm/chsc.h>

 #include "ap_bus.h"
 #include "ap_debug.h"

-/*
- * Module parameters; note though this file itself isn't modular.
- */
+MODULE_AUTHOR("IBM Corporation");
+MODULE_DESCRIPTION("Adjunct Processor Bus driver");
+MODULE_LICENSE("GPL");
+
 int ap_domain_index = -1; /* Adjunct Processor Domain Index */
 static DEFINE_SPINLOCK(ap_domain_lock);
 module_param_named(domain, ap_domain_index, int, 0440);
···
 /* completion for APQN bindings complete */
 static DECLARE_COMPLETION(ap_apqn_bindings_complete);

-static struct ap_config_info *ap_qci_info;
-static struct ap_config_info *ap_qci_info_old;
+static struct ap_config_info qci[2];
+static struct ap_config_info *const ap_qci_info = &qci[0];
+static struct ap_config_info *const ap_qci_info_old = &qci[1];

 /*
  * AP bus related debug feature things.
···
  */
 static inline int ap_qact_available(void)
 {
-	if (ap_qci_info)
-		return ap_qci_info->qact;
-	return 0;
+	return ap_qci_info->qact;
 }

 /*
···
  */
 int ap_sb_available(void)
 {
-	if (ap_qci_info)
-		return ap_qci_info->apsb;
-	return 0;
+	return ap_qci_info->apsb;
 }

 /*
···
 }
 EXPORT_SYMBOL(ap_is_se_guest);

-/*
- * ap_fetch_qci_info(): Fetch cryptographic config info
- *
- * Returns the ap configuration info fetched via PQAP(QCI).
- * On success 0 is returned, on failure a negative errno
- * is returned, e.g. if the PQAP(QCI) instruction is not
- * available, the return value will be -EOPNOTSUPP.
- */
-static inline int ap_fetch_qci_info(struct ap_config_info *info)
-{
-	if (!ap_qci_available())
-		return -EOPNOTSUPP;
-	if (!info)
-		return -EINVAL;
-	return ap_qci(info);
-}
-
 /**
  * ap_init_qci_info(): Allocate and query qci config info.
  * Does also update the static variables ap_max_domain_id
···
  */
 static void __init ap_init_qci_info(void)
 {
-	if (!ap_qci_available()) {
+	if (!ap_qci_available() ||
+	    ap_qci(ap_qci_info)) {
 		AP_DBF_INFO("%s QCI not supported\n", __func__);
 		return;
 	}
-
-	ap_qci_info = kzalloc(sizeof(*ap_qci_info), GFP_KERNEL);
-	if (!ap_qci_info)
-		return;
-	ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old), GFP_KERNEL);
-	if (!ap_qci_info_old) {
-		kfree(ap_qci_info);
-		ap_qci_info = NULL;
-		return;
-	}
-	if (ap_fetch_qci_info(ap_qci_info) != 0) {
-		kfree(ap_qci_info);
-		kfree(ap_qci_info_old);
-		ap_qci_info = NULL;
-		ap_qci_info_old = NULL;
-		return;
-	}
+	memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
 	AP_DBF_INFO("%s successful fetched initial qci info\n", __func__);

 	if (ap_qci_info->apxa) {
···
 				    __func__, ap_max_domain_id);
 		}
 	}
-
-	memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
 }

 /*
···
 {
 	if (id > ap_max_adapter_id)
 		return 0;
-	if (ap_qci_info)
+	if (ap_qci_info->flags)
 		return ap_test_config(ap_qci_info->apm, id);
 	return 1;
 }
···
 {
 	if (domain > ap_max_domain_id)
 		return 0;
-	if (ap_qci_info)
+	if (ap_qci_info->flags)
 		return ap_test_config(ap_qci_info->aqm, domain);
 	return 1;
 }
···
 /*
  * A config change has happened, force an ap bus rescan.
  */
-void ap_bus_cfg_chg(void)
+static int ap_bus_cfg_chg(struct notifier_block *nb,
+			  unsigned long action, void *data)
 {
+	if (action != CHSC_NOTIFY_AP_CFG)
+		return NOTIFY_DONE;
+
 	pr_debug("%s config change, forcing bus rescan\n", __func__);

 	ap_bus_force_rescan();
+
+	return NOTIFY_OK;
 }

-/*
- * hex2bitmap() - parse hex mask string and set bitmap.
- * Valid strings are "0x012345678" with at least one valid hex number.
- * Rest of the bitmap to the right is padded with 0. No spaces allowed
- * within the string, the leading 0x may be omitted.
- * Returns the bitmask with exactly the bits set as given by the hex
- * string (both in big endian order).
- */
-static int hex2bitmap(const char *str, unsigned long *bitmap, int bits)
+static struct notifier_block ap_bus_nb = {
+	.notifier_call = ap_bus_cfg_chg,
+};
+
+int ap_hex2bitmap(const char *str, unsigned long *bitmap, int bits)
 {
 	int i, n, b;

···
 		return -EINVAL;
 	return 0;
 }
+EXPORT_SYMBOL(ap_hex2bitmap);

 /*
  * modify_bitmap() - parse bitmask argument and modify an existing
···
 		rc = modify_bitmap(str, newmap, bits);
 	} else {
 		memset(newmap, 0, size);
-		rc = hex2bitmap(str, newmap, bits);
+		rc = ap_hex2bitmap(str, newmap, bits);
 	}
 	return rc;
 }
···

 static ssize_t ap_control_domain_mask_show(const struct bus_type *bus, char *buf)
 {
-	if (!ap_qci_info) /* QCI not supported */
+	if (!ap_qci_info->flags) /* QCI not supported */
 		return sysfs_emit(buf, "not supported\n");

 	return sysfs_emit(buf, "0x%08x%08x%08x%08x%08x%08x%08x%08x\n",
···

 static ssize_t ap_usage_domain_mask_show(const struct bus_type *bus, char *buf)
 {
-	if (!ap_qci_info) /* QCI not supported */
+	if (!ap_qci_info->flags) /* QCI not supported */
 		return sysfs_emit(buf, "not supported\n");

 	return sysfs_emit(buf, "0x%08x%08x%08x%08x%08x%08x%08x%08x\n",
···

 static ssize_t ap_adapter_mask_show(const struct bus_type *bus, char *buf)
 {
-	if (!ap_qci_info) /* QCI not supported */
+	if (!ap_qci_info->flags) /* QCI not supported */
 		return sysfs_emit(buf, "not supported\n");

 	return sysfs_emit(buf, "0x%08x%08x%08x%08x%08x%08x%08x%08x\n",
···
 {
 	int n = 0;

-	if (!ap_qci_info) /* QCI not supported */
+	if (!ap_qci_info->flags) /* QCI not supported */
 		return sysfs_emit(buf, "-\n");

 	if (ap_qci_info->apsc)
···
  */
 static bool ap_get_configuration(void)
 {
-	if (!ap_qci_info) /* QCI not supported */
+	if (!ap_qci_info->flags) /* QCI not supported */
 		return false;

 	memcpy(ap_qci_info_old, ap_qci_info, sizeof(*ap_qci_info));
-	ap_fetch_qci_info(ap_qci_info);
+	ap_qci(ap_qci_info);

 	return memcmp(ap_qci_info, ap_qci_info_old,
 		      sizeof(struct ap_config_info)) != 0;
···

 	unsigned long m[BITS_TO_LONGS(AP_DEVICES)];

-	if (!ap_qci_info)
+	if (!ap_qci_info->flags)
 		return false;

 	bitmap_andnot(m, (unsigned long *)ap_qci_info->apm,
···
 {
 	unsigned long m[BITS_TO_LONGS(AP_DOMAINS)];

-	if (!ap_qci_info)
+	if (!ap_qci_info->flags)
 		return false;

 	bitmap_andnot(m, (unsigned long *)ap_qci_info->aqm,
···
 	}
 }

-static int __init ap_debug_init(void)
+static inline void __exit ap_async_exit(void)
+{
+	if (ap_thread_flag)
+		ap_poll_thread_stop();
+	chsc_notifier_unregister(&ap_bus_nb);
+	cancel_work(&ap_scan_bus_work);
+	hrtimer_cancel(&ap_poll_timer);
+	timer_delete(&ap_scan_bus_timer);
+}
+
+static inline int __init ap_async_init(void)
+{
+	int rc;
+
+	/* Setup the AP bus rescan timer. */
+	timer_setup(&ap_scan_bus_timer, ap_scan_bus_timer_callback, 0);
+
+	/*
+	 * Setup the high resolution poll timer.
+	 * If we are running under z/VM adjust polling to z/VM polling rate.
+	 */
+	if (MACHINE_IS_VM)
+		poll_high_timeout = 1500000;
+	hrtimer_init(&ap_poll_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	ap_poll_timer.function = ap_poll_timeout;
+
+	queue_work(system_long_wq, &ap_scan_bus_work);
+
+	rc = chsc_notifier_register(&ap_bus_nb);
+	if (rc)
+		goto out;
+
+	/* Start the low priority AP bus poll thread. */
+	if (!ap_thread_flag)
+		return 0;
+
+	rc = ap_poll_thread_start();
+	if (rc)
+		goto out_notifier;
+
+	return 0;
+
+out_notifier:
+	chsc_notifier_unregister(&ap_bus_nb);
+out:
+	cancel_work(&ap_scan_bus_work);
+	hrtimer_cancel(&ap_poll_timer);
+	timer_delete(&ap_scan_bus_timer);
+	return rc;
+}
+
+static inline void ap_irq_exit(void)
+{
+	if (ap_irq_flag)
+		unregister_adapter_interrupt(&ap_airq);
+}
+
+static inline int __init ap_irq_init(void)
+{
+	int rc;
+
+	if (!ap_interrupts_available() || !ap_useirq)
+		return 0;
+
+	rc = register_adapter_interrupt(&ap_airq);
+	ap_irq_flag = (rc == 0);
+
+	return rc;
+}
+
+static inline void ap_debug_exit(void)
+{
+	debug_unregister(ap_dbf_info);
+}
+
+static inline int __init ap_debug_init(void)
 {
 	ap_dbf_info = debug_register("ap", 2, 1,
 				     AP_DBF_MAX_SPRINTF_ARGS * sizeof(long));
···
 		ap_domain_index = -1;
 	}

-	/* enable interrupts if available */
-	if (ap_interrupts_available() && ap_useirq) {
-		rc = register_adapter_interrupt(&ap_airq);
-		ap_irq_flag = (rc == 0);
-	}
-
 	/* Create /sys/bus/ap. */
 	rc = bus_register(&ap_bus_type);
 	if (rc)
···
 		goto out_bus;
 	ap_root_device->bus = &ap_bus_type;

-	/* Setup the AP bus rescan timer. */
-	timer_setup(&ap_scan_bus_timer, ap_scan_bus_timer_callback, 0);
+	/* enable interrupts if available */
+	rc = ap_irq_init();
+	if (rc)
+		goto out_device;

-	/*
-	 * Setup the high resolution poll timer.
-	 * If we are running under z/VM adjust polling to z/VM polling rate.
-	 */
-	if (MACHINE_IS_VM)
-		poll_high_timeout = 1500000;
-	hrtimer_init(&ap_poll_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
-	ap_poll_timer.function = ap_poll_timeout;
-
-	/* Start the low priority AP bus poll thread. */
-	if (ap_thread_flag) {
-		rc = ap_poll_thread_start();
-		if (rc)
-			goto out_work;
-	}
-
-	queue_work(system_long_wq, &ap_scan_bus_work);
+	/* Setup asynchronous work (timers, workqueue, etc). */
+	rc = ap_async_init();
+	if (rc)
+		goto out_irq;

 	return 0;

-out_work:
-	hrtimer_cancel(&ap_poll_timer);
+out_irq:
+	ap_irq_exit();
+out_device:
 	root_device_unregister(ap_root_device);
 out_bus:
 	bus_unregister(&ap_bus_type);
 out:
-	if (ap_irq_flag)
-		unregister_adapter_interrupt(&ap_airq);
-	kfree(ap_qci_info);
+	ap_debug_exit();
 	return rc;
 }
-device_initcall(ap_module_init);
+
+static void __exit ap_module_exit(void)
+{
+	ap_async_exit();
+	ap_irq_exit();
+	root_device_unregister(ap_root_device);
+	bus_unregister(&ap_bus_type);
+	ap_debug_exit();
+}
+
+module_init(ap_module_init);
+module_exit(ap_module_exit);
+22
drivers/s390/crypto/ap_bus.h
···
 			 struct mutex *lock);

 /*
+ * ap_hex2bitmap() - Convert a string containing a hexadecimal number (str)
+ * into a bitmap (bitmap) with bits set that correspond to the bits represented
+ * by the hex string. Input and output data is in big endian order.
+ *
+ * str - Input hex string of format "0x1234abcd". The leading "0x" is optional.
+ * At least one digit is required. Must be large enough to hold the number of
+ * bits represented by the bits parameter.
+ *
+ * bitmap - Pointer to a bitmap. Upon successful completion of this function,
+ * this bitmap will have bits set to match the value of str. If bitmap is longer
+ * than str, then the rightmost bits of bitmap are padded with zeros. Must be
+ * large enough to hold the number of bits represented by the bits parameter.
+ *
+ * bits - Length, in bits, of the bitmap represented by str. Must be a multiple
+ * of 8.
+ *
+ * Returns: 0 On success
+ * -EINVAL If str format is invalid or bits is not a multiple of 8.
+ */
+int ap_hex2bitmap(const char *str, unsigned long *bitmap, int bits);
+
+/*
  * Interface to wait for the AP bus to have done one initial ap bus
  * scan and all detected APQNs have been bound to device drivers.
  * If these both conditions are not fulfilled, this function blocks
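The contract documented for ap_hex2bitmap() can be illustrated with a small userspace analogue. This is a sketch under stated assumptions, not the kernel implementation: hex_to_bitmap() is a hypothetical name, and it writes into a plain byte array (bit 0 is the most significant bit of out[0]) instead of an unsigned long bitmap with inverted bit numbering.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of the documented contract: parse an
 * optional "0x" prefix plus hex digits into a big-endian bit array.
 * Bits not covered by the string stay zero (right-hand padding).
 */
static int hex_to_bitmap(const char *str, unsigned char *out, int bits)
{
	int i = 0, digits = 0;

	assert(str && out);
	if (bits <= 0 || (bits & 0x07))
		return -EINVAL;		/* bits must be a multiple of 8 */

	memset(out, 0, bits / 8);
	if (str[0] == '0' && str[1] == 'x')
		str += 2;

	for (; isxdigit((unsigned char)*str) && i < bits; str++, i += 4) {
		int c = tolower((unsigned char)*str);
		int nibble = isdigit(c) ? c - '0' : c - 'a' + 10;

		/* even digit index fills the high nibble, odd the low one */
		out[i / 8] |= (i % 8) ? nibble : nibble << 4;
		digits++;
	}

	if (*str == '\n')
		str++;
	if (*str || !digits)
		return -EINVAL;		/* trailing garbage or no digit */
	return 0;
}
```

Parsing "0x8001" into a 32-bit map with this sketch sets the first and the sixteenth bit and leaves the remaining two bytes zero, matching the padding rule in the comment above.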
+2 -2
drivers/s390/crypto/ap_queue.c
···

 static DEVICE_ATTR_RO(ap_functions);

-#ifdef CONFIG_ZCRYPT_DEBUG
+#ifdef CONFIG_AP_DEBUG
 static ssize_t states_show(struct device *dev,
 			   struct device_attribute *attr, char *buf)
 {
···
 	&dev_attr_config.attr,
 	&dev_attr_chkstop.attr,
 	&dev_attr_ap_functions.attr,
-#ifdef CONFIG_ZCRYPT_DEBUG
+#ifdef CONFIG_AP_DEBUG
 	&dev_attr_states.attr,
 	&dev_attr_last_err_rc.attr,
 #endif
+208 -16
drivers/s390/crypto/vfio_ap_ops.c
···
 static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
 				    struct vfio_ap_queue *q)
 {
-	if (q) {
-		q->matrix_mdev = matrix_mdev;
-		hash_add(matrix_mdev->qtable.queues, &q->mdev_qnode, q->apqn);
-	}
+	if (!q || vfio_ap_mdev_get_queue(matrix_mdev, q->apqn))
+		return;
+
+	q->matrix_mdev = matrix_mdev;
+	hash_add(matrix_mdev->qtable.queues, &q->mdev_qnode, q->apqn);
 }

 static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int apqn)
···
 	}
 }

-static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
-					    unsigned long apid)
+static void vfio_ap_mdev_hot_unplug_adapters(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long *apids)
 {
 	struct vfio_ap_queue *q, *tmpq;
 	struct list_head qlist;
+	unsigned long apid;
+	bool apcb_update = false;

 	INIT_LIST_HEAD(&qlist);
-	vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, &qlist);

-	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
-		clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
-		vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+	for_each_set_bit_inv(apid, apids, AP_DEVICES) {
+		vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, &qlist);
+
+		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
+			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+			apcb_update = true;
+		}
 	}
+
+	/* Only update apcb if needed to avoid impacting guest */
+	if (apcb_update)
+		vfio_ap_mdev_update_guest_apcb(matrix_mdev);

 	vfio_ap_mdev_reset_qlist(&qlist);

···
 		vfio_ap_unlink_mdev_fr_queue(q);
 		list_del(&q->reset_qnode);
 	}
+}
+
+static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
+					    unsigned long apid)
+{
+	DECLARE_BITMAP(apids, AP_DEVICES);
+
+	bitmap_zero(apids, AP_DEVICES);
+	set_bit_inv(apid, apids);
+	vfio_ap_mdev_hot_unplug_adapters(matrix_mdev, apids);
 }

 /**
···
 	}
 }

-static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
-					   unsigned long apqi)
+static void vfio_ap_mdev_hot_unplug_domains(struct ap_matrix_mdev *matrix_mdev,
+					    unsigned long *apqis)
 {
 	struct vfio_ap_queue *q, *tmpq;
 	struct list_head qlist;
+	unsigned long apqi;
+	bool apcb_update = false;

 	INIT_LIST_HEAD(&qlist);
-	vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, &qlist);

-	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
-		clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
-		vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+	for_each_set_bit_inv(apqi, apqis, AP_DOMAINS) {
+		vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, &qlist);
+
+		if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+			clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+			apcb_update = true;
+		}
 	}
+
+	/* Only update apcb if needed to avoid impacting guest */
+	if (apcb_update)
+		vfio_ap_mdev_update_guest_apcb(matrix_mdev);

 	vfio_ap_mdev_reset_qlist(&qlist);

···
 		vfio_ap_unlink_mdev_fr_queue(q);
 		list_del(&q->reset_qnode);
 	}
+}
+
+static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
+					   unsigned long apqi)
+{
+	DECLARE_BITMAP(apqis, AP_DOMAINS);
+
+	bitmap_zero(apqis, AP_DEVICES);
+	set_bit_inv(apqi, apqis);
+	vfio_ap_mdev_hot_unplug_domains(matrix_mdev, apqis);
 }

 /**
···
 }
 static DEVICE_ATTR_RO(guest_matrix);

+static ssize_t write_ap_bitmap(unsigned long *bitmap, char *buf, int offset, char sep)
+{
+	return sysfs_emit_at(buf, offset, "0x%016lx%016lx%016lx%016lx%c",
+			     bitmap[0], bitmap[1], bitmap[2], bitmap[3], sep);
+}
+
+static ssize_t ap_config_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
+	int idx = 0;
+
+	idx += write_ap_bitmap(matrix_mdev->matrix.apm, buf, idx, ',');
+	idx += write_ap_bitmap(matrix_mdev->matrix.aqm, buf, idx, ',');
+	idx += write_ap_bitmap(matrix_mdev->matrix.adm, buf, idx, '\n');
+
+	return idx;
+}
+
+/* Number of characters needed for a complete hex mask representing the bits in .. */
+#define AP_DEVICES_STRLEN (AP_DEVICES / 4 + 3)
+#define AP_DOMAINS_STRLEN (AP_DOMAINS / 4 + 3)
+#define AP_CONFIG_STRLEN (AP_DEVICES_STRLEN + 2 * AP_DOMAINS_STRLEN)
+
+static int parse_bitmap(char **strbufptr, unsigned long *bitmap, int nbits)
+{
+	char *curmask;
+
+	curmask = strsep(strbufptr, ",\n");
+	if (!curmask)
+		return -EINVAL;
+
+	bitmap_clear(bitmap, 0, nbits);
+	return ap_hex2bitmap(curmask, bitmap, nbits);
+}
+
+static int ap_matrix_overflow_check(struct ap_matrix_mdev *matrix_mdev)
+{
+	unsigned long bit;
+
+	for_each_set_bit_inv(bit, matrix_mdev->matrix.apm, AP_DEVICES) {
+		if (bit > matrix_mdev->matrix.apm_max)
+			return -ENODEV;
+	}
+
+	for_each_set_bit_inv(bit, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+		if (bit > matrix_mdev->matrix.aqm_max)
+			return -ENODEV;
+	}
+
+	for_each_set_bit_inv(bit, matrix_mdev->matrix.adm, AP_DOMAINS) {
+		if (bit > matrix_mdev->matrix.adm_max)
+			return -ENODEV;
+	}
+
+	return 0;
+}
+
+static void ap_matrix_copy(struct ap_matrix *dst, struct ap_matrix *src)
+{
+	/* This check works around false positive gcc -Wstringop-overread */
+	if (!src)
+		return;
+
+	bitmap_copy(dst->apm, src->apm, AP_DEVICES);
+	bitmap_copy(dst->aqm, src->aqm, AP_DOMAINS);
+	bitmap_copy(dst->adm, src->adm, AP_DOMAINS);
+}
+
+static ssize_t ap_config_store(struct device *dev, struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
+	struct ap_matrix m_new, m_old, m_added, m_removed;
+	DECLARE_BITMAP(apm_filtered, AP_DEVICES);
+	unsigned long newbit;
+	char *newbuf, *rest;
+	int rc = count;
+	bool do_update;
+
+	newbuf = kstrndup(buf, AP_CONFIG_STRLEN, GFP_KERNEL);
+	if (!newbuf)
+		return -ENOMEM;
+	rest = newbuf;
+
+	mutex_lock(&ap_perms_mutex);
+	get_update_locks_for_mdev(matrix_mdev);
+
+	/* Save old state */
+	ap_matrix_copy(&m_old, &matrix_mdev->matrix);
+	if (parse_bitmap(&rest, m_new.apm, AP_DEVICES) ||
+	    parse_bitmap(&rest, m_new.aqm, AP_DOMAINS) ||
+	    parse_bitmap(&rest, m_new.adm, AP_DOMAINS)) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	bitmap_andnot(m_removed.apm, m_old.apm, m_new.apm, AP_DEVICES);
+	bitmap_andnot(m_removed.aqm, m_old.aqm, m_new.aqm, AP_DOMAINS);
+	bitmap_andnot(m_added.apm, m_new.apm, m_old.apm, AP_DEVICES);
+	bitmap_andnot(m_added.aqm, m_new.aqm, m_old.aqm, AP_DOMAINS);
+
+	/* Need new bitmaps in matrix_mdev for validation */
+	ap_matrix_copy(&matrix_mdev->matrix, &m_new);
+
+	/* Ensure new state is valid, else undo new state */
+	rc = vfio_ap_mdev_validate_masks(matrix_mdev);
+	if (rc) {
+		ap_matrix_copy(&matrix_mdev->matrix, &m_old);
+		goto out;
+	}
+	rc = ap_matrix_overflow_check(matrix_mdev);
+	if (rc) {
+		ap_matrix_copy(&matrix_mdev->matrix, &m_old);
+		goto out;
+	}
+	rc = count;
+
+	/* Need old bitmaps in matrix_mdev for unplug/unlink */
+	ap_matrix_copy(&matrix_mdev->matrix, &m_old);
+
+	/* Unlink removed adapters/domains */
+	vfio_ap_mdev_hot_unplug_adapters(matrix_mdev, m_removed.apm);
+	vfio_ap_mdev_hot_unplug_domains(matrix_mdev, m_removed.aqm);
+
+	/* Need new bitmaps in matrix_mdev for linking new adapters/domains */
+	ap_matrix_copy(&matrix_mdev->matrix, &m_new);
+
+	/* Link newly added adapters */
+	for_each_set_bit_inv(newbit, m_added.apm, AP_DEVICES)
+		vfio_ap_mdev_link_adapter(matrix_mdev, newbit);
+
+	for_each_set_bit_inv(newbit, m_added.aqm, AP_DOMAINS)
+		vfio_ap_mdev_link_domain(matrix_mdev, newbit);
+
+	/* filter resources not bound to vfio-ap */
+	do_update = vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered);
+	do_update |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
+
+	/* Apply changes to shadow apbc if things changed */
+	if (do_update) {
+		vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+		reset_queues_for_apids(matrix_mdev, apm_filtered);
+	}
+out:
+	release_update_locks_for_mdev(matrix_mdev);
+	mutex_unlock(&ap_perms_mutex);
+	kfree(newbuf);
+	return rc;
+}
+static DEVICE_ATTR_RW(ap_config);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_assign_adapter.attr,
 	&dev_attr_unassign_adapter.attr,
···
 	&dev_attr_unassign_domain.attr,
 	&dev_attr_assign_control_domain.attr,
 	&dev_attr_unassign_control_domain.attr,
+	&dev_attr_ap_config.attr,
 	&dev_attr_control_domains.attr,
 	&dev_attr_matrix.attr,
 	&dev_attr_guest_matrix.attr,
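ap_config_store() derives which adapters and domains to unplug or link from two bitmap_andnot() calls per mask: removed = old AND NOT new, added = new AND NOT old. The sketch below shows the same set arithmetic on a single 64-bit word, as a simplification of the 256-bit kernel bitmaps; struct mask_delta and mask_diff() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two deltas of one 64-bit mask. */
struct mask_delta {
	uint64_t added;		/* set in the new mask, clear in the old one */
	uint64_t removed;	/* set in the old mask, clear in the new one */
};

/* Same set arithmetic as the bitmap_andnot() pairs in the diff. */
static struct mask_delta mask_diff(uint64_t old_mask, uint64_t new_mask)
{
	struct mask_delta d = {
		.added = new_mask & ~old_mask,
		.removed = old_mask & ~new_mask,
	};

	/* A bit can never be both added and removed. */
	assert((d.added & d.removed) == 0);
	return d;
}
```

Bits present in both masks appear in neither delta, which is why the store handler can leave already-assigned resources alone and touch only the ones that changed.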
+3 -3
drivers/s390/crypto/vfio_ap_private.h
···
  */
 struct ap_matrix {
 	unsigned long apm_max;
-	DECLARE_BITMAP(apm, 256);
+	DECLARE_BITMAP(apm, AP_DEVICES);
 	unsigned long aqm_max;
-	DECLARE_BITMAP(aqm, 256);
+	DECLARE_BITMAP(aqm, AP_DOMAINS);
 	unsigned long adm_max;
-	DECLARE_BITMAP(adm, 256);
+	DECLARE_BITMAP(adm, AP_DOMAINS);
 };
 
 /**
+35
mm/userfaultfd.c
···
 	goto out;
 }
 
+static int mfill_atomic_pte_zeroed_folio(pmd_t *dst_pmd,
+					 struct vm_area_struct *dst_vma,
+					 unsigned long dst_addr)
+{
+	struct folio *folio;
+	int ret = -ENOMEM;
+
+	folio = vma_alloc_zeroed_movable_folio(dst_vma, dst_addr);
+	if (!folio)
+		return ret;
+
+	if (mem_cgroup_charge(folio, dst_vma->vm_mm, GFP_KERNEL))
+		goto out_put;
+
+	/*
+	 * The memory barrier inside __folio_mark_uptodate makes sure that
+	 * zeroing out the folio becomes visible before mapping the page
+	 * using set_pte_at(). See do_anonymous_page().
+	 */
+	__folio_mark_uptodate(folio);
+
+	ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
+				       &folio->page, true, 0);
+	if (ret)
+		goto out_put;
+
+	return 0;
+out_put:
+	folio_put(folio);
+	return ret;
+}
+
 static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
 				     struct vm_area_struct *dst_vma,
 				     unsigned long dst_addr)
···
 	pte_t _dst_pte, *dst_pte;
 	spinlock_t *ptl;
 	int ret;
+
+	if (mm_forbids_zeropage(dst_vma->vm_mm))
+		return mfill_atomic_pte_zeroed_folio(dst_pmd, dst_vma, dst_addr);
 
 	_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
 					 dst_vma->vm_page_prot));
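The hunk above makes mfill_atomic_pte_zeropage() fall back to a freshly allocated zeroed folio when the mm forbids the shared zero page, and releases that folio through a single out_put label on any failure. A rough userspace analogy of that control flow might look like the sketch below. Everything in it is hypothetical (struct sketch_slot, fill_zero(), fill_private_zeroed(), the forbid_shared flag, and the choice of -EEXIST for an already populated slot are assumptions for illustration); it models neither page tables nor memcg charging.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Shared, read-only page of zeroes: stand-in for the kernel's zero page. */
static const unsigned char shared_zero_page[SKETCH_PAGE_SIZE];

/* Hypothetical "mapping slot"; NULL page means nothing is mapped yet. */
struct sketch_slot {
	const unsigned char *page;	/* what the slot currently maps */
	bool owns_page;			/* true if the owner must free page */
};

/* Fallback path: give the slot its own private zero-filled page. */
static int fill_private_zeroed(struct sketch_slot *slot)
{
	unsigned char *page;
	int ret = -ENOMEM;

	page = calloc(1, SKETCH_PAGE_SIZE);
	if (!page)
		return ret;

	/* Assumed failure mode for this sketch: slot already populated. */
	if (slot->page) {
		ret = -EEXIST;
		goto out_free;
	}

	slot->page = page;
	slot->owns_page = true;
	return 0;
out_free:
	/* Single cleanup point, mirroring the out_put label in the hunk. */
	free(page);
	return ret;
}

/* Map zeroes into the slot: shared page unless sharing is forbidden. */
static int fill_zero(struct sketch_slot *slot, bool forbid_shared)
{
	if (forbid_shared)
		return fill_private_zeroed(slot);

	if (slot->page)
		return -EEXIST;

	slot->page = shared_zero_page;
	slot->owns_page = false;
	return 0;
}
```

The early return at the top of fill_zero() corresponds to the three added lines in mfill_atomic_pte_zeropage(): the decision is made once, before any shared-page state is built, so the two paths never mix.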
-5
scripts/mod/modpost.c
···
 		    strstarts(symname, "_savevr_") ||
 		    strcmp(symname, ".TOC.") == 0)
 			return 1;
-
-	if (info->hdr->e_machine == EM_S390)
-		/* Expoline thunks are linked on all kernel modules during final link of .ko */
-		if (strstarts(symname, "__s390_indirect_jump_r"))
-			return 1;
 	/* Do not ignore this symbol */
 	return 0;
 }