
Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
"A bit of a small release, I suspect in part due to me travelling for
KS. But my backlog of patches to review is smaller than usual, so I
think in part folks just didn't send as much this cycle.

Non-highlights:

- Five fixes for the >128T address space handling, both to fix bugs
in our implementation and to bring the semantics exactly into line
with x86.

Highlights:

- Support for a new OPAL call on bare metal machines which gives us a
true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.

- Support for Power9 DD2 in the CXL driver.

- Improvements to machine check handling so that uncorrectable errors
can be reported into the generic memory_failure() machinery.

- Some fixes and improvements for VPHN, which is used under PowerVM
to notify the Linux partition of topology changes.

- Plumbing to enable TM (transactional memory) without suspend on
some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).

- Support for emulating vector loads from cache-inhibited memory, on
some Power9 revisions.

- Disable the fast-endian switch "syscall" by default (behind a
CONFIG); we believe it has never had any users.

- A major rework of the API drivers use when initiating and waiting
for long running operations performed by OPAL firmware, and changes
to the powernv_flash driver to use the new API.

- Several fixes for the handling of FP/VMX/VSX while processes are
using transactional memory.

- Optimisations of TLB range flushes when using the radix MMU on
Power9.

- Improvements to the VAS facility used to access coprocessors on
Power9, and related improvements to the way the NX crypto driver
handles requests.

- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
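
The TM-without-suspend plumbing above is advertised to userspace as an ELF hwcap bit. As a sketch only: the mask value below is an assumption taken from the uapi cputable.h of this era, and the helper name is invented; on a kernel without the feature the bit is simply absent.

```c
#include <sys/auxv.h>

/* PPC_FEATURE2_HTM_NO_SUSPEND; mask value assumed from
 * arch/powerpc/include/uapi/asm/cputable.h of this era. */
#define PPC_FEATURE2_HTM_NO_SUSPEND_ASSUMED 0x00080000UL

/* Returns 1 if the running kernel advertises TM-without-suspend,
 * else 0. On non-powerpc hosts the AT_HWCAP2 bit is never set. */
static int htm_no_suspend_available(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	return (hwcap2 & PPC_FEATURE2_HTM_NO_SUSPEND_ASSUMED) ? 1 : 0;
}
```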

Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
Kennington III"

* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
powerpc/64s: Fix masking of SRR1 bits on instruction fault
powerpc/64s: mm_context.addr_limit is only used on hash
powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
powerpc/64s/hash: Fix fork() with 512TB process address space
powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Fix 512T hint detection to use >= 128T
powerpc: Fix DABR match on hash based systems
powerpc/signal: Properly handle return value from uprobe_deny_signal()
powerpc/fadump: use kstrtoint to handle sysfs store
powerpc/lib: Implement UACCESS_FLUSHCACHE API
powerpc/lib: Implement PMEM API
powerpc/powernv/npu: Don't explicitly flush nmmu tlb
powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
powerpc/powernv/idle: Round up latency and residency values
powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
...

+3705 -1153
+4
Documentation/admin-guide/kernel-parameters.txt
@@ -3204,6 +3204,10 @@
 			allowed (eg kernel_enable_fpu()/kernel_disable_fpu()).
 			There is some performance impact when enabling this.
 
+	ppc_tm=		[PPC]
+			Format: {"off"}
+			Disable Hardware Transactional Memory
+
 	print-fatal-signals=
 			[KNL] debug: print fatal signals
 
+5 -3
arch/powerpc/Kconfig
@@ -139,9 +139,11 @@
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_PMEM_API		if PPC64
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
+	select ARCH_HAS_UACCESS_FLUSHCACHE	if PPC64
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_ZONE_DEVICE		if PPC_BOOK3S_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
@@ -337,7 +335,7 @@
 	default n
 
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
-	depends on PPC32 || PPC_STD_MMU_64
+	depends on PPC32 || PPC_BOOK3S_64
 	def_bool y
 
 config ARCH_SUPPORTS_UPROBES
@@ -724,7 +722,7 @@
 
 config PPC_64K_PAGES
 	bool "64k page size"
-	depends on !PPC_FSL_BOOK3E && (44x || PPC_STD_MMU_64 || PPC_BOOK3E_64)
+	depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
 	select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
 
 config PPC_256K_PAGES
@@ -783,7 +781,7 @@
 
 config PPC_SUBPAGE_PROT
 	bool "Support setting protections for 4k subpages"
-	depends on PPC_STD_MMU_64 && PPC_64K_PAGES
+	depends on PPC_BOOK3S_64 && PPC_64K_PAGES
 	help
 	  This option adds support for a system call to allow user programs
 	  to set access permissions (read/write, readonly, or no access)
+6
arch/powerpc/Kconfig.debug
@@ -370,4 +370,10 @@
 	def_bool y
 	depends on PPC_PTDUMP && PPC_BOOK3S
 
+config PPC_FAST_ENDIAN_SWITCH
+	bool "Deprecated fast endian-switch syscall"
+	depends on DEBUG_KERNEL && PPC_BOOK3S_64
+	help
+	  If you're unsure what this is, say N.
+
 endmenu
+1 -1
arch/powerpc/boot/dts/acadia.dts
@@ -183,7 +183,7 @@
 		usb@ef603000 {
 			compatible = "ohci-be";
 			reg = <0xef603000 0x80>;
-			interrupts-parent = <&UIC0>;
+			interrupt-parent = <&UIC0>;
 			interrupts = <0xd 0x4 0xe 0x4>;
 		};
 
+2
arch/powerpc/configs/powernv_defconfig
@@ -192,6 +192,7 @@
 CONFIG_IPMI_POWERNV=y
 CONFIG_RAW_DRIVER=y
 CONFIG_MAX_RAW_DEVS=1024
+CONFIG_I2C_CHARDEV=y
 CONFIG_DRM=y
 CONFIG_DRM_AST=y
 CONFIG_FIRMWARE_EDID=y
@@ -295,6 +296,7 @@
 CONFIG_SCHED_TRACER=y
 CONFIG_FTRACE_SYSCALLS=y
 CONFIG_BLK_DEV_IO_TRACE=y
+CONFIG_PPC_EMULATED_STATS=y
 CONFIG_CODE_PATCHING_SELFTEST=y
 CONFIG_FTR_FIXUP_SELFTEST=y
 CONFIG_MSI_BITMAP_SELFTEST=y
+1
arch/powerpc/configs/pseries_defconfig
@@ -193,6 +193,7 @@
 CONFIG_IBM_BSR=m
 CONFIG_RAW_DRIVER=y
 CONFIG_MAX_RAW_DEVS=1024
+CONFIG_I2C_CHARDEV=y
 CONFIG_FB=y
 CONFIG_FIRMWARE_EDID=y
 CONFIG_FB_OF=y
+232
arch/powerpc/configs/skiroot_defconfig
@@ -0,0 +1,232 @@
+CONFIG_PPC64=y
+CONFIG_ALTIVEC=y
+CONFIG_VSX=y
+CONFIG_NR_CPUS=2048
+CONFIG_CPU_LITTLE_ENDIAN=y
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=20
+CONFIG_RELAY=y
+CONFIG_BLK_DEV_INITRD=y
+# CONFIG_RD_GZIP is not set
+# CONFIG_RD_BZIP2 is not set
+# CONFIG_RD_LZMA is not set
+# CONFIG_RD_LZO is not set
+# CONFIG_RD_LZ4 is not set
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_PERF_EVENTS=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_JUMP_LABEL=y
+CONFIG_STRICT_KERNEL_RWX=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULE_SIG=y
+CONFIG_MODULE_SIG_FORCE=y
+CONFIG_MODULE_SIG_SHA512=y
+CONFIG_PARTITION_ADVANCED=y
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_PPC_PSERIES is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+CONFIG_CPU_IDLE=y
+CONFIG_HZ_100=y
+CONFIG_KEXEC=y
+CONFIG_IRQ_ALL_CPUS=y
+CONFIG_NUMA=y
+# CONFIG_COMPACTION is not set
+# CONFIG_MIGRATION is not set
+# CONFIG_BOUNCE is not set
+CONFIG_PPC_64K_PAGES=y
+CONFIG_SCHED_SMT=y
+CONFIG_CMDLINE_BOOL=y
+CONFIG_CMDLINE="console=tty0 console=hvc0 powersave=off"
+# CONFIG_SECCOMP is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_NET_IPIP=y
+CONFIG_SYN_COOKIES=y
+# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
+# CONFIG_INET_XFRM_MODE_TUNNEL is not set
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_IPV6 is not set
+CONFIG_DNS_RESOLVER=y
+# CONFIG_WIRELESS is not set
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_MTD=m
+CONFIG_MTD_POWERNV_FLASH=m
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=65536
+CONFIG_VIRTIO_BLK=m
+CONFIG_BLK_DEV_NVME=m
+CONFIG_EEPROM_AT24=y
+# CONFIG_CXL is not set
+CONFIG_BLK_DEV_SD=m
+CONFIG_BLK_DEV_SR=m
+CONFIG_BLK_DEV_SR_VENDOR=y
+CONFIG_CHR_DEV_SG=m
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_SCAN_ASYNC=y
+CONFIG_SCSI_FC_ATTRS=y
+CONFIG_SCSI_CXGB3_ISCSI=m
+CONFIG_SCSI_CXGB4_ISCSI=m
+CONFIG_SCSI_BNX2_ISCSI=m
+CONFIG_BE2ISCSI=m
+CONFIG_SCSI_AACRAID=m
+CONFIG_MEGARAID_NEWGEN=y
+CONFIG_MEGARAID_MM=m
+CONFIG_MEGARAID_MAILBOX=m
+CONFIG_MEGARAID_SAS=m
+CONFIG_SCSI_MPT2SAS=m
+CONFIG_SCSI_IPR=m
+# CONFIG_SCSI_IPR_TRACE is not set
+# CONFIG_SCSI_IPR_DUMP is not set
+CONFIG_SCSI_QLA_FC=m
+CONFIG_SCSI_QLA_ISCSI=m
+CONFIG_SCSI_LPFC=m
+CONFIG_SCSI_VIRTIO=m
+CONFIG_SCSI_DH=y
+CONFIG_SCSI_DH_ALUA=m
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+# CONFIG_ATA_SFF is not set
+CONFIG_MD=y
+CONFIG_BLK_DEV_MD=m
+CONFIG_MD_LINEAR=m
+CONFIG_MD_RAID0=m
+CONFIG_MD_RAID1=m
+CONFIG_MD_RAID10=m
+CONFIG_MD_RAID456=m
+CONFIG_MD_MULTIPATH=m
+CONFIG_MD_FAULTY=m
+CONFIG_BLK_DEV_DM=m
+CONFIG_DM_CRYPT=m
+CONFIG_DM_SNAPSHOT=m
+CONFIG_DM_MIRROR=m
+CONFIG_DM_ZERO=m
+CONFIG_DM_MULTIPATH=m
+CONFIG_ACENIC=m
+CONFIG_ACENIC_OMIT_TIGON_I=y
+CONFIG_TIGON3=y
+CONFIG_BNX2X=m
+CONFIG_CHELSIO_T1=y
+CONFIG_BE2NET=m
+CONFIG_S2IO=m
+CONFIG_E100=m
+CONFIG_E1000=m
+CONFIG_E1000E=m
+CONFIG_IXGB=m
+CONFIG_IXGBE=m
+CONFIG_MLX4_EN=m
+CONFIG_MLX5_CORE=m
+CONFIG_MLX5_CORE_EN=y
+CONFIG_MYRI10GE=m
+CONFIG_QLGE=m
+CONFIG_NETXEN_NIC=m
+CONFIG_SFC=m
+# CONFIG_USB_NET_DRIVERS is not set
+# CONFIG_WLAN is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_MISC=y
+# CONFIG_SERIO_SERPORT is not set
+# CONFIG_DEVMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_IPMI_HANDLER=y
+CONFIG_IPMI_DEVICE_INTERFACE=y
+CONFIG_IPMI_POWERNV=y
+CONFIG_HW_RANDOM=y
+CONFIG_TCG_TIS_I2C_NUVOTON=y
+# CONFIG_I2C_COMPAT is not set
+CONFIG_I2C_CHARDEV=y
+# CONFIG_I2C_HELPER_AUTO is not set
+CONFIG_DRM=y
+CONFIG_DRM_RADEON=y
+CONFIG_DRM_AST=m
+CONFIG_FIRMWARE_EDID=y
+CONFIG_FB_MODE_HELPERS=y
+CONFIG_FB_OF=y
+CONFIG_FB_MATROX=y
+CONFIG_FB_MATROX_MILLENIUM=y
+CONFIG_FB_MATROX_MYSTIQUE=y
+CONFIG_FB_MATROX_G=y
+# CONFIG_LCD_CLASS_DEVICE is not set
+# CONFIG_BACKLIGHT_GENERIC is not set
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+CONFIG_USB_HIDDEV=y
+CONFIG_USB=y
+CONFIG_USB_MON=y
+CONFIG_USB_XHCI_HCD=y
+CONFIG_USB_EHCI_HCD=y
+# CONFIG_USB_EHCI_HCD_PPC_OF is not set
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_STORAGE=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_GENERIC=m
+CONFIG_VIRT_DRIVERS=y
+CONFIG_VIRTIO_PCI=y
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT4_FS=m
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_XFS_FS=m
+CONFIG_XFS_POSIX_ACL=y
+CONFIG_BTRFS_FS=m
+CONFIG_BTRFS_FS_POSIX_ACL=y
+CONFIG_ISO9660_FS=m
+CONFIG_UDF_FS=m
+CONFIG_MSDOS_FS=m
+CONFIG_VFAT_FS=m
+CONFIG_PROC_KCORE=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+# CONFIG_MISC_FILESYSTEMS is not set
+# CONFIG_NETWORK_FILESYSTEMS is not set
+CONFIG_NLS_DEFAULT="utf8"
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_UTF8=y
+CONFIG_CRC16=y
+CONFIG_CRC_ITU_T=y
+CONFIG_LIBCRC32C=y
+CONFIG_PRINTK_TIME=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_DEBUG_STACKOVERFLOW=y
+CONFIG_SOFTLOCKUP_DETECTOR=y
+CONFIG_HARDLOCKUP_DETECTOR=y
+CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
+CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
+CONFIG_WQ_WATCHDOG=y
+CONFIG_SCHEDSTATS=y
+# CONFIG_FTRACE is not set
+CONFIG_XMON=y
+CONFIG_XMON_DEFAULT=y
+CONFIG_SECURITY=y
+CONFIG_IMA=y
+CONFIG_EVM=y
+# CONFIG_CRYPTO_ECHAINIV is not set
+CONFIG_CRYPTO_ECB=y
+CONFIG_CRYPTO_CMAC=y
+CONFIG_CRYPTO_MD4=y
+CONFIG_CRYPTO_ARC4=y
+CONFIG_CRYPTO_DES=y
+# CONFIG_CRYPTO_HW is not set
+1 -1
arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -606,7 +606,7 @@
 
 /* 4 bits per slice and we have one slice per 1TB */
 #define SLICE_ARRAY_SIZE	(H_PGTABLE_RANGE >> 41)
-#define TASK_SLICE_ARRAY_SZ(x)	((x)->context.addr_limit >> 41)
+#define TASK_SLICE_ARRAY_SZ(x)	((x)->context.slb_addr_limit >> 41)
 
 #ifndef __ASSEMBLY__
 
+1 -1
arch/powerpc/include/asm/book3s/64/mmu.h
@@ -93,7 +93,7 @@
 #ifdef CONFIG_PPC_MM_SLICES
 	u64 low_slices_psize;	/* SLB page size encodings */
 	unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-	unsigned long addr_limit;
+	unsigned long slb_addr_limit;
 #else
 	u16 sllp;		/* SLB page size encoding */
 #endif
+22
arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -66,6 +66,28 @@
 {
 }
 
+static inline void hash__local_flush_all_mm(struct mm_struct *mm)
+{
+	/*
+	 * There's no Page Walk Cache for hash, so what is needed is
+	 * the same as flush_tlb_mm(), which doesn't really make sense
+	 * with hash. So the only thing we could do is flush the
+	 * entire LPID! Punt for now, as it's not being used.
+	 */
+	WARN_ON_ONCE(1);
+}
+
+static inline void hash__flush_all_mm(struct mm_struct *mm)
+{
+	/*
+	 * There's no Page Walk Cache for hash, so what is needed is
+	 * the same as flush_tlb_mm(), which doesn't really make sense
+	 * with hash. So the only thing we could do is flush the
+	 * entire LPID! Punt for now, as it's not being used.
+	 */
+	WARN_ON_ONCE(1);
+}
+
 static inline void hash__local_flush_tlb_page(struct vm_area_struct *vma,
 					      unsigned long vmaddr)
 {
+3
arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -22,17 +22,20 @@
 extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
+extern void radix__local_flush_all_mm(struct mm_struct *mm);
 extern void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 					      int psize);
 extern void radix__tlb_flush(struct mmu_gather *tlb);
 #ifdef CONFIG_SMP
 extern void radix__flush_tlb_mm(struct mm_struct *mm);
+extern void radix__flush_all_mm(struct mm_struct *mm);
 extern void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 extern void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
 					int psize);
 #else
 #define radix__flush_tlb_mm(mm)		radix__local_flush_tlb_mm(mm)
+#define radix__flush_all_mm(mm)		radix__local_flush_all_mm(mm)
 #define radix__flush_tlb_page(vma,addr)	radix__local_flush_tlb_page(vma,addr)
 #define radix__flush_tlb_page_psize(mm,addr,p) radix__local_flush_tlb_page_psize(mm,addr,p)
 #endif
+15
arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -58,6 +58,13 @@
 	return hash__local_flush_tlb_page(vma, vmaddr);
 }
 
+static inline void local_flush_all_mm(struct mm_struct *mm)
+{
+	if (radix_enabled())
+		return radix__local_flush_all_mm(mm);
+	return hash__local_flush_all_mm(mm);
+}
+
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
 	if (radix_enabled())
@@ -80,9 +87,18 @@
 		return radix__flush_tlb_page(vma, vmaddr);
 	return hash__flush_tlb_page(vma, vmaddr);
 }
+
+static inline void flush_all_mm(struct mm_struct *mm)
+{
+	if (radix_enabled())
+		return radix__flush_all_mm(mm);
+	return hash__flush_all_mm(mm);
+}
 #else
 #define flush_tlb_mm(mm)		local_flush_tlb_mm(mm)
 #define flush_tlb_page(vma, addr)	local_flush_tlb_page(vma, addr)
+#define flush_all_mm(mm)		local_flush_all_mm(mm)
 #endif /* CONFIG_SMP */
 /*
  * flush the page walk cache for the address
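
The new flush_all_mm() entry point follows the same radix/hash dispatch pattern as the existing flush_tlb_mm() wrappers: a static inline that picks the MMU-specific implementation at runtime. A standalone model of that pattern (the stub flush functions and the fake radix_enabled() switch below are all invented for illustration):

```c
#include <stdbool.h>

/* Standalone model of the radix/hash dispatch used by flush_all_mm();
 * every name here is a stand-in, not the kernel's implementation. */
static bool model_radix_enabled;

enum flush_path { FLUSH_NONE, FLUSH_RADIX, FLUSH_HASH };
static enum flush_path last_flush;

static void model_radix__flush_all_mm(void) { last_flush = FLUSH_RADIX; }
static void model_hash__flush_all_mm(void)  { last_flush = FLUSH_HASH; }

/* Mirrors the static inline wrapper: pick the MMU-specific flush. */
static void model_flush_all_mm(void)
{
	if (model_radix_enabled) {
		model_radix__flush_all_mm();
		return;
	}
	model_hash__flush_all_mm();
}
```

Flipping the switch selects the path, which is exactly the property the wrapper exists to provide.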
+8 -4
arch/powerpc/include/asm/cputable.h
@@ -207,7 +207,7 @@
 #define CPU_FTR_STCX_CHECKS_ADDRESS	LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_POPCNTB			LONG_ASM_CONST(0x0008000000000000)
 #define CPU_FTR_POPCNTD			LONG_ASM_CONST(0x0010000000000000)
-#define CPU_FTR_ICSWX			LONG_ASM_CONST(0x0020000000000000)
+/* Free					LONG_ASM_CONST(0x0020000000000000) */
 #define CPU_FTR_VMX_COPY		LONG_ASM_CONST(0x0040000000000000)
 #define CPU_FTR_TM			LONG_ASM_CONST(0x0080000000000000)
 #define CPU_FTR_CFAR			LONG_ASM_CONST(0x0100000000000000)
@@ -216,6 +216,7 @@
 #define CPU_FTR_DABRX			LONG_ASM_CONST(0x0800000000000000)
 #define CPU_FTR_PMAO_BUG		LONG_ASM_CONST(0x1000000000000000)
 #define CPU_FTR_POWER9_DD1		LONG_ASM_CONST(0x4000000000000000)
+#define CPU_FTR_POWER9_DD2_1		LONG_ASM_CONST(0x8000000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -453,14 +452,14 @@
	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
	    CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \
	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
-	    CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | \
+	    CPU_FTR_CFAR | CPU_FTR_HVMODE | \
	    CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX)
 #define CPU_FTRS_POWER8 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
	    CPU_FTR_DSCR | CPU_FTR_SAO | \
	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
-	    CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
+	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
	    CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
@@ -479,5 +478,7 @@
	    CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
 #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
			     (~CPU_FTR_SAO))
+#define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
+#define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
 #define CPU_FTRS_CELL	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -499,5 +496,6 @@
	    (CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
	     CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
	     CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
-	     CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | CPU_FTRS_POWER9_DD1)
+	     CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9 | \
+	     CPU_FTRS_POWER9_DD1 | CPU_FTRS_POWER9_DD2_1)
 #endif
 #else
 enum {
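
The DD2.1 support claims the last free bit of the 64-bit feature word, and a revision's feature set is the OR of its bits. The masking scheme can be sketched in isolation (the constants and names below are reduced stand-ins, not the kernel's):

```c
#include <stdint.h>

/* Reduced model of the CPU_FTR_* scheme: each feature is one bit in a
 * 64-bit word, and a CPU revision's mask is the OR of its features. */
#define MODEL_FTR_POWER9_DD1   (1ULL << 62)
#define MODEL_FTR_POWER9_DD2_1 (1ULL << 63)

/* cpu_has_feature()-style test against a given feature word. */
static int model_cpu_has_feature(uint64_t cpu_ftrs, uint64_t feature)
{
	return (cpu_ftrs & feature) != 0;
}
```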
+3 -7
arch/powerpc/include/asm/eeh.h
@@ -93,7 +93,7 @@
 	struct pci_bus *bus;	/* Top PCI bus for bus PE	*/
 	int check_count;	/* Times of ignored error	*/
 	int freeze_count;	/* Times of froze up		*/
-	struct timeval tstamp;	/* Time on first-time freeze	*/
+	time64_t tstamp;	/* Time on first-time freeze	*/
 	int false_positives;	/* Times of reported #ff's	*/
 	atomic_t pass_dev_cnt;	/* Count of passed through devs	*/
 	struct eeh_pe *parent;	/* Parent PE			*/
@@ -200,7 +200,6 @@
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
-	int (*post_init)(void);
 	void* (*probe)(struct pci_dn *pdn, void *data);
 	int (*set_option)(struct eeh_pe *pe, int option);
 	int (*get_pe_addr)(struct eeh_pe *pe);
@@ -274,7 +275,7 @@
 
 struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
-int eeh_init(void);
+void eeh_probe_devices(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
@@ -320,10 +321,7 @@
 	return false;
 }
 
-static inline int eeh_init(void)
-{
-	return 0;
-}
+static inline void eeh_probe_devices(void) { }
 
 static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
 {
+4
arch/powerpc/include/asm/emulated_ops.h
@@ -55,6 +55,10 @@
 	struct ppc_emulated_entry mfdscr;
 	struct ppc_emulated_entry mtdscr;
 	struct ppc_emulated_entry lq_stq;
+	struct ppc_emulated_entry lxvw4x;
+	struct ppc_emulated_entry lxvh8x;
+	struct ppc_emulated_entry lxvd2x;
+	struct ppc_emulated_entry lxvb16x;
 #endif
 } ppc_emulated;
 
+6 -6
arch/powerpc/include/asm/epapr_hcalls.h
@@ -508,7 +508,7 @@
 
 static inline long epapr_hypercall0_1(unsigned int nr, unsigned long *r2)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 	unsigned long r;
 
@@ -520,7 +520,7 @@
 
 static inline long epapr_hypercall0(unsigned int nr)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	return epapr_hypercall(in, out, nr);
@@ -528,7 +528,7 @@
 
 static inline long epapr_hypercall1(unsigned int nr, unsigned long p1)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
@@ -538,7 +538,7 @@
 static inline long epapr_hypercall2(unsigned int nr, unsigned long p1,
 				    unsigned long p2)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
@@ -549,7 +549,7 @@
 static inline long epapr_hypercall3(unsigned int nr, unsigned long p1,
 				    unsigned long p2, unsigned long p3)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
@@ -562,7 +562,7 @@
 				    unsigned long p2, unsigned long p3,
 				    unsigned long p4)
 {
-	unsigned long in[8];
+	unsigned long in[8] = {0};
 	unsigned long out[8];
 
 	in[0] = p1;
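
The change above zero-initializes the hypercall input array, so unused slots are passed as well-defined zeros rather than whatever happened to be on the stack. The effect is easy to see in a self-contained sketch (all names invented; the real trampoline passes the array to firmware rather than summing it):

```c
/* Sketch: a callee that, like the hypercall trampoline, consumes all
 * eight input slots even when the caller only set the first one. */
static unsigned long model_consume_all(const unsigned long in[8])
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < 8; i++)
		sum += in[i];
	return sum;
}

/* With "= {0}" the unset slots are guaranteed zeros, so a call that
 * only fills in[0] yields a deterministic result. */
static unsigned long model_hypercall1(unsigned long p1)
{
	unsigned long in[8] = {0};	/* the fix: no uninitialized reads */

	in[0] = p1;
	return model_consume_all(in);
}
```

Without the initializer, reading the seven untouched slots would be undefined behavior in C, which is precisely what the patch removes.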
+5
arch/powerpc/include/asm/exception-64s.h
@@ -55,6 +55,11 @@
 #endif
 
 /*
+ * maximum recursive depth of MCE exceptions
+ */
+#define MAX_MCE_DEPTH	4
+
+/*
 * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR
 * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole
 * in the save area so it's not necessary to overlap them. Could be used
-6
arch/powerpc/include/asm/hugetlb.h
@@ -41,12 +41,6 @@
 	return radix__flush_hugetlb_page(vma, vmaddr);
 }
 
-static inline void __local_flush_hugetlb_page(struct vm_area_struct *vma,
-					      unsigned long vmaddr)
-{
-	if (radix_enabled())
-		return radix__local_flush_hugetlb_page(vma, vmaddr);
-}
 #else
 
 static inline pte_t *hugepd_page(hugepd_t hpd)
+1
arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
 
 #ifndef __ASSEMBLY__
 
+extern void replay_system_reset(void);
 extern void __replay_interrupt(unsigned int vector);
 
 extern void timer_interrupt(struct pt_regs *);
+1 -1
arch/powerpc/include/asm/kprobes.h
@@ -103,8 +103,8 @@
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_handler(struct pt_regs *regs);
 extern int kprobe_post_handler(struct pt_regs *regs);
-extern int is_current_kprobe_addr(unsigned long addr);
 #ifdef CONFIG_KPROBES_ON_FTRACE
+extern int __is_active_jprobe(unsigned long addr);
 extern int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
			   struct kprobe_ctlblk *kcb);
 #else
-4
arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -104,10 +104,6 @@
 	u8 napping;
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	/*
-	 * hwthread_req/hwthread_state pair is used to pull sibling threads
-	 * out of guest on pre-ISAv3.0B CPUs where threads share MMU.
-	 */
 	u8 hwthread_req;
 	u8 hwthread_state;
 	u8 host_ipi;
+1 -3
arch/powerpc/include/asm/mce.h
@@ -204,10 +204,8 @@
 
 extern void save_mce_event(struct pt_regs *regs, long handled,
			   struct mce_error_info *mce_err, uint64_t nip,
-			   uint64_t addr);
+			   uint64_t addr, uint64_t phys_addr);
 extern int get_mce_event(struct machine_check_event *mce, bool release);
 extern void release_mce_event(void);
 extern void machine_check_queue_event(void);
 extern void machine_check_print_event_info(struct machine_check_event *evt,
					    bool user_mode);
-extern uint64_t get_mce_fault_addr(struct machine_check_event *evt);
-
 #endif /* __ASM_PPC64_MCE_H__ */
+50
arch/powerpc/include/asm/mmu_context.h
@@ -78,6 +78,52 @@
 extern int use_cop(unsigned long acop, struct mm_struct *mm);
 extern void drop_cop(unsigned long acop, struct mm_struct *mm);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline void inc_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_inc(&mm->context.active_cpus);
+}
+
+static inline void dec_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_dec(&mm->context.active_cpus);
+}
+
+static inline void mm_context_add_copro(struct mm_struct *mm)
+{
+	/*
+	 * On hash, should only be called once over the lifetime of
+	 * the context, as we can't decrement the active cpus count
+	 * and flush properly for the time being.
+	 */
+	inc_mm_active_cpus(mm);
+}
+
+static inline void mm_context_remove_copro(struct mm_struct *mm)
+{
+	/*
+	 * Need to broadcast a global flush of the full mm before
+	 * decrementing active_cpus count, as the next TLBI may be
+	 * local and the nMMU and/or PSL need to be cleaned up.
+	 * Should be rare enough so that it's acceptable.
+	 *
+	 * Skip on hash, as we don't know how to do the proper flush
+	 * for the time being. Invalidations will remain global if
+	 * used on hash.
+	 */
+	if (radix_enabled()) {
+		flush_all_mm(mm);
+		dec_mm_active_cpus(mm);
+	}
+}
+#else
+static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
+static inline void dec_mm_active_cpus(struct mm_struct *mm) { }
+static inline void mm_context_add_copro(struct mm_struct *mm) { }
+static inline void mm_context_remove_copro(struct mm_struct *mm) { }
+#endif
+
+
 extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
			       struct task_struct *tsk);
 
@@ -119,9 +165,13 @@
 {
 }
 
+#ifndef CONFIG_PPC_BOOK3S_64
 static inline void arch_exit_mmap(struct mm_struct *mm)
 {
 }
+#else
+extern void arch_exit_mmap(struct mm_struct *mm);
+#endif
 
 static inline void arch_unmap(struct mm_struct *mm,
			      struct vm_area_struct *vma,
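
mm_context_add_copro()/mm_context_remove_copro() amount to reference counting plus an ordering rule: the full-mm flush must happen before the count drops, so no CPU or coprocessor sees stale translations after it stops being counted. A userspace model of that pairing (a plain int instead of atomic_t, and all names invented):

```c
#include <stdbool.h>

/* Model of the copro accounting: a count of agents using the mm and a
 * flag recording that the flush ran before the final decrement. */
static int model_active_cpus;
static bool model_flushed_before_drop;

static void model_flush_full_mm(void) { model_flushed_before_drop = true; }

/* Counterpart of mm_context_add_copro(): take a reference. */
static void model_add_copro(void)
{
	model_active_cpus++;
}

/* Counterpart of mm_context_remove_copro(): flush first, then drop
 * the reference, mirroring the ordering comment in the real code. */
static void model_remove_copro(void)
{
	model_flush_full_mm();
	model_active_cpus--;
}
```

In the kernel the count is an atomic_t and the flush is the broadcast flush_all_mm(); the model only captures the add/remove pairing and the flush-before-decrement order.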
+1 -1
arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -204,7 +204,7 @@
 	if (!huge)
		assert_pte_locked(mm, addr);
 
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
 	if (old & _PAGE_HASHPTE)
		hpte_need_flush(mm, addr, ptep, old, huge);
 #endif
+3
arch/powerpc/include/asm/opal-api.h
@@ -188,6 +188,7 @@
 #define OPAL_XIVE_DUMP				142
 #define OPAL_XIVE_RESERVED3			143
 #define OPAL_XIVE_RESERVED4			144
+#define OPAL_SIGNAL_SYSTEM_RESET		145
 #define OPAL_NPU_INIT_CONTEXT			146
 #define OPAL_NPU_DESTROY_CONTEXT		147
 #define OPAL_NPU_MAP_LPAR			148
@@ -895,6 +896,8 @@
	 */
	OPAL_REINIT_CPUS_MMU_HASH	= (1 << 2),
	OPAL_REINIT_CPUS_MMU_RADIX	= (1 << 3),
+
+	OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED = (1 << 4),
 };
 
 typedef struct oppanel_line {
+4 -2
arch/powerpc/include/asm/opal.h
@@ -281,6 +281,8 @@
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
 
+s64 opal_signal_system_reset(s32 cpu);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
				   int depth, void *data);
@@ -304,11 +306,11 @@
 extern void opal_notifier_disable(void);
 extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 
-extern int __opal_async_get_token(void);
 extern int opal_async_get_token_interruptible(void);
-extern int __opal_async_release_token(int token);
 extern int opal_async_release_token(int token);
 extern int opal_async_wait_response(uint64_t token, struct opal_msg *msg);
+extern int opal_async_wait_response_interruptible(uint64_t token,
+						  struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 
 struct rtc_time;
+7 -6
arch/powerpc/include/asm/paca.h
@@ -91,14 +91,14 @@
 	u8 cpu_start;		/* At startup, processor spins until */
				/* this becomes non-zero. */
 	u8 kexec_state;		/* set when kexec down has irqs off */
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
 	struct slb_shadow *slb_shadow_ptr;
 	struct dtl_entry *dispatch_log;
 	struct dtl_entry *dispatch_log_end;
-#endif /* CONFIG_PPC_STD_MMU_64 */
+#endif
 	u64 dscr_default;	/* per-CPU default DSCR */
 
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
	/*
	 * Now, starting in cacheline 2, the exception save areas
	 */
@@ -110,7 +110,7 @@
 	u16 vmalloc_sllp;
 	u16 slb_cache_ptr;
 	u32 slb_cache[SLB_CACHE_ENTRIES];
-#endif /* CONFIG_PPC_STD_MMU_64 */
+#endif /* CONFIG_PPC_BOOK3S_64 */
 
 #ifdef CONFIG_PPC_BOOK3E
 	u64 exgen[8] __aligned(0x40);
@@ -143,7 +143,7 @@
 #ifdef CONFIG_PPC_MM_SLICES
 	u64 mm_ctx_low_slices_psize;
 	unsigned char mm_ctx_high_slices_psize[SLICE_ARRAY_SIZE];
-	unsigned long addr_limit;
+	unsigned long mm_ctx_slb_addr_limit;
 #else
 	u16 mm_ctx_user_psize;
 	u16 mm_ctx_sllp;
@@ -192,7 +192,7 @@
 	struct stop_sprs stop_sprs;
 #endif
 
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
	/* Non-maskable exceptions that are not performance critical */
 	u64 exnmi[EX_SIZE];	/* used for system reset (nmi) */
 	u64 exmc[EX_SIZE];	/* used for machine checks */
@@ -210,6 +210,7 @@
	 */
 	u16 in_mce;
 	u8 hmi_event_available;		/* HMI event is available */
+	u8 hmi_p9_special_emu;		/* HMI P9 special emulation */
 #endif
 
	/* Stuff for accurate time accounting */
+3 -3
arch/powerpc/include/asm/page_64.h
··· 117 117 #endif /* __ASSEMBLY__ */ 118 118 #else 119 119 #define slice_init() 120 - #ifdef CONFIG_PPC_STD_MMU_64 120 + #ifdef CONFIG_PPC_BOOK3S_64 121 121 #define get_slice_psize(mm, addr) ((mm)->context.user_psize) 122 122 #define slice_set_user_psize(mm, psize) \ 123 123 do { \ 124 124 (mm)->context.user_psize = (psize); \ 125 125 (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \ 126 126 } while (0) 127 - #else /* CONFIG_PPC_STD_MMU_64 */ 127 + #else /* !CONFIG_PPC_BOOK3S_64 */ 128 128 #ifdef CONFIG_PPC_64K_PAGES 129 129 #define get_slice_psize(mm, addr) MMU_PAGE_64K 130 130 #else /* CONFIG_PPC_64K_PAGES */ 131 131 #define get_slice_psize(mm, addr) MMU_PAGE_4K 132 132 #endif /* !CONFIG_PPC_64K_PAGES */ 133 133 #define slice_set_user_psize(mm, psize) do { BUG(); } while(0) 134 - #endif /* !CONFIG_PPC_STD_MMU_64 */ 134 + #endif /* CONFIG_PPC_BOOK3S_64 */ 135 135 136 136 #define slice_set_range_psize(mm, start, len, psize) \ 137 137 slice_set_user_psize((mm), (psize))
+1
arch/powerpc/include/asm/pci-bridge.h
··· 218 218 #endif 219 219 struct list_head child_list; 220 220 struct list_head list; 221 + struct resource holes[PCI_SRIOV_NUM_BARS]; 221 222 }; 222 223 223 224 /* Get the pointer to a device_node's pci_dn */
+1 -1
arch/powerpc/include/asm/pgtable-be-types.h
··· 77 77 * With hash config 64k pages additionally define a bigger "real PTE" type that 78 78 * gathers the "second half" part of the PTE for pseudo 64k pages 79 79 */ 80 - #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64) 80 + #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_BOOK3S_64) 81 81 typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; 82 82 #else 83 83 typedef struct { pte_t pte; } real_pte_t;
+2 -2
arch/powerpc/include/asm/pgtable-types.h
··· 50 50 * With hash config 64k pages additionally define a bigger "real PTE" type that 51 51 * gathers the "second half" part of the PTE for pseudo 64k pages 52 52 */ 53 - #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64) 53 + #if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_BOOK3S_64) 54 54 typedef struct { pte_t pte; unsigned long hidx; } real_pte_t; 55 55 #else 56 56 typedef struct { pte_t pte; } real_pte_t; 57 57 #endif 58 58 59 - #ifdef CONFIG_PPC_STD_MMU_64 59 + #ifdef CONFIG_PPC_BOOK3S_64 60 60 #include <asm/cmpxchg.h> 61 61 62 62 static inline bool pte_xchg(pte_t *ptep, pte_t old, pte_t new)
+4
arch/powerpc/include/asm/powernv.h
··· 22 22 extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea, 23 23 unsigned long *flags, unsigned long *status, 24 24 int count); 25 + 26 + void pnv_tm_init(void); 25 27 #else 26 28 static inline void powernv_set_nmmu_ptcr(unsigned long ptcr) { } 27 29 static inline struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev, ··· 38 36 unsigned long *status, int count) { 39 37 return -ENODEV; 40 38 } 39 + 40 + static inline void pnv_tm_init(void) { } 41 41 #endif 42 42 43 43 #endif /* _ASM_POWERNV_H */
+25 -2
arch/powerpc/include/asm/ppc_asm.h
··· 774 774 #ifdef CONFIG_PPC_BOOK3E 775 775 #define FIXUP_ENDIAN 776 776 #else 777 + /* 778 + * This version may be used in HV or non-HV context. 779 + * MSR[EE] must be disabled. 780 + */ 777 781 #define FIXUP_ENDIAN \ 778 782 tdi 0,0,0x48; /* Reverse endian of b . + 8 */ \ 779 - b $+44; /* Skip trampoline if endian is good */ \ 783 + b 191f; /* Skip trampoline if endian is good */ \ 780 784 .long 0xa600607d; /* mfmsr r11 */ \ 781 785 .long 0x01006b69; /* xori r11,r11,1 */ \ 782 786 .long 0x00004039; /* li r10,0 */ \ ··· 790 786 .long 0x14004a39; /* addi r10,r10,20 */ \ 791 787 .long 0xa6035a7d; /* mtsrr0 r10 */ \ 792 788 .long 0xa6037b7d; /* mtsrr1 r11 */ \ 793 - .long 0x2400004c /* rfid */ 789 + .long 0x2400004c; /* rfid */ \ 790 + 191: 791 + 792 + /* 793 + * This version may only be used with MSR[HV]=1 794 + * - Does not clear MSR[RI], so more robust. 795 + * - Slightly smaller and faster. 796 + */ 797 + #define FIXUP_ENDIAN_HV \ 798 + tdi 0,0,0x48; /* Reverse endian of b . + 8 */ \ 799 + b 191f; /* Skip trampoline if endian is good */ \ 800 + .long 0xa600607d; /* mfmsr r11 */ \ 801 + .long 0x01006b69; /* xori r11,r11,1 */ \ 802 + .long 0x05009f42; /* bcl 20,31,$+4 */ \ 803 + .long 0xa602487d; /* mflr r10 */ \ 804 + .long 0x14004a39; /* addi r10,r10,20 */ \ 805 + .long 0xa64b5a7d; /* mthsrr0 r10 */ \ 806 + .long 0xa64b7b7d; /* mthsrr1 r11 */ \ 807 + .long 0x2402004c; /* hrfid */ \ 808 + 191: 794 809 795 810 #endif /* !CONFIG_PPC_BOOK3E */ 796 811
+3
arch/powerpc/include/asm/processor.h
··· 329 329 */ 330 330 int dscr_inherit; 331 331 unsigned long ppr; /* used to save/restore SMT priority */ 332 + unsigned long tidr; 332 333 #endif 333 334 #ifdef CONFIG_PPC_BOOK3S_64 334 335 unsigned long tar; ··· 341 340 unsigned long sier; 342 341 unsigned long mmcr2; 343 342 unsigned mmcr0; 343 + 344 344 unsigned used_ebb; 345 + unsigned int used_vas; 345 346 #endif 346 347 }; 347 348
+2
arch/powerpc/include/asm/string.h
··· 12 12 #define __HAVE_ARCH_MEMCMP 13 13 #define __HAVE_ARCH_MEMCHR 14 14 #define __HAVE_ARCH_MEMSET16 15 + #define __HAVE_ARCH_MEMCPY_FLUSHCACHE 15 16 16 17 extern char * strcpy(char *,const char *); 17 18 extern char * strncpy(char *,const char *, __kernel_size_t); ··· 25 24 extern void * memmove(void *,const void *,__kernel_size_t); 26 25 extern int memcmp(const void *,const void *,__kernel_size_t); 27 26 extern void * memchr(const void *,int,__kernel_size_t); 27 + extern void * memcpy_flushcache(void *,const void *,__kernel_size_t); 28 28 29 29 #ifdef CONFIG_PPC64 30 30 #define __HAVE_ARCH_MEMSET32
+5
arch/powerpc/include/asm/switch_to.h
··· 92 92 #endif 93 93 } 94 94 95 + extern int set_thread_uses_vas(void); 96 + 97 + extern int set_thread_tidr(struct task_struct *t); 98 + extern void clear_thread_tidr(struct task_struct *t); 99 + 95 100 #endif /* _ASM_POWERPC_SWITCH_TO_H */
+1 -1
arch/powerpc/include/asm/tlbflush.h
··· 77 77 flush_tlb_mm(mm); 78 78 } 79 79 80 - #elif defined(CONFIG_PPC_STD_MMU_64) 80 + #elif defined(CONFIG_PPC_BOOK3S_64) 81 81 #include <asm/book3s/64/tlbflush.h> 82 82 #else 83 83 #error Unsupported MMU type
+4 -3
arch/powerpc/include/asm/tm.h
··· 12 12 13 13 extern void tm_enable(void); 14 14 extern void tm_reclaim(struct thread_struct *thread, 15 - unsigned long orig_msr, uint8_t cause); 15 + uint8_t cause); 16 16 extern void tm_reclaim_current(uint8_t cause); 17 - extern void tm_recheckpoint(struct thread_struct *thread, 18 - unsigned long orig_msr); 17 + extern void tm_recheckpoint(struct thread_struct *thread); 19 18 extern void tm_abort(uint8_t cause); 20 19 extern void tm_save_sprs(struct thread_struct *thread); 21 20 extern void tm_restore_sprs(struct thread_struct *thread); 21 + 22 + extern bool tm_suspend_disabled; 22 23 23 24 #endif /* __ASSEMBLY__ */
+8
arch/powerpc/include/asm/topology.h
··· 97 97 } 98 98 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */ 99 99 100 + #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_NEED_MULTIPLE_NODES) 101 + #if defined(CONFIG_PPC_SPLPAR) 102 + extern int timed_topology_update(int nsecs); 103 + #else 104 + #define timed_topology_update(nsecs) 105 + #endif /* CONFIG_PPC_SPLPAR */ 106 + #endif /* CONFIG_HOTPLUG_CPU || CONFIG_NEED_MULTIPLE_NODES */ 107 + 100 108 #include <asm-generic/topology.h> 101 109 102 110 #ifdef CONFIG_SMP
+22
arch/powerpc/include/asm/uaccess.h
··· 174 174 175 175 extern long __get_user_bad(void); 176 176 177 + /* 178 + * This does an atomic 128 byte aligned load from userspace. 179 + * Upto caller to do enable_kernel_vmx() before calling! 180 + */ 181 + #define __get_user_atomic_128_aligned(kaddr, uaddr, err) \ 182 + __asm__ __volatile__( \ 183 + "1: lvx 0,0,%1 # get user\n" \ 184 + " stvx 0,0,%2 # put kernel\n" \ 185 + "2:\n" \ 186 + ".section .fixup,\"ax\"\n" \ 187 + "3: li %0,%3\n" \ 188 + " b 2b\n" \ 189 + ".previous\n" \ 190 + EX_TABLE(1b, 3b) \ 191 + : "=r" (err) \ 192 + : "b" (uaddr), "b" (kaddr), "i" (-EFAULT), "0" (err)) 193 + 177 194 #define __get_user_asm(x, addr, err, op) \ 178 195 __asm__ __volatile__( \ 179 196 "1: "op" %1,0(%2) # get_user\n" \ ··· 356 339 357 340 extern long strncpy_from_user(char *dst, const char __user *src, long count); 358 341 extern __must_check long strnlen_user(const char __user *str, long n); 342 + 343 + extern long __copy_from_user_flushcache(void *dst, const void __user *src, 344 + unsigned size); 345 + extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset, 346 + size_t len); 359 347 360 348 #endif /* _ARCH_POWERPC_UACCESS_H */
+21
arch/powerpc/include/asm/vas.h
··· 10 10 #ifndef _ASM_POWERPC_VAS_H 11 11 #define _ASM_POWERPC_VAS_H 12 12 13 + struct vas_window; 14 + 13 15 /* 14 16 * Min and max FIFO sizes are based on Version 1.05 Section 3.1.4.25 15 17 * (Local FIFO Size Register) of the VAS workbook. ··· 106 104 }; 107 105 108 106 /* 107 + * Helper to map a chip id to VAS id. 108 + * For POWER9, this is a 1:1 mapping. In the future this maybe a 1:N 109 + * mapping in which case, we will need to update this helper. 110 + * 111 + * Return the VAS id or -1 if no matching vasid is found. 112 + */ 113 + int chip_to_vas_id(int chipid); 114 + 115 + /* 109 116 * Helper to initialize receive window attributes to defaults for an 110 117 * NX window. 111 118 */ ··· 167 156 */ 168 157 int vas_paste_crb(struct vas_window *win, int offset, bool re); 169 158 159 + /* 160 + * Return a system-wide unique id for the VAS window @win. 161 + */ 162 + extern u32 vas_win_id(struct vas_window *win); 163 + 164 + /* 165 + * Return the power bus paste address associated with @win so the caller 166 + * can map that address into their address space. 167 + */ 168 + extern u64 vas_win_paste_addr(struct vas_window *win); 170 169 #endif /* __ASM_POWERPC_VAS_H */
+1
arch/powerpc/include/uapi/asm/cputable.h
··· 49 49 #define PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ 50 50 #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */ 51 51 #define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */ 52 + #define PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000 /* TM w/out suspended state */ 52 53 53 54 /* 54 55 * IMPORTANT!
+1 -1
arch/powerpc/kernel/Makefile
··· 129 129 obj-$(CONFIG_PPC64) += $(obj64-y) 130 130 obj-$(CONFIG_PPC32) += $(obj32-y) 131 131 132 - ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE),) 132 + ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC_CORE)$(CONFIG_PPC_BOOK3S),) 133 133 obj-y += ppc_save_regs.o 134 134 endif 135 135
+3 -3
arch/powerpc/kernel/asm-offsets.c
··· 185 185 #ifdef CONFIG_PPC_MM_SLICES 186 186 OFFSET(PACALOWSLICESPSIZE, paca_struct, mm_ctx_low_slices_psize); 187 187 OFFSET(PACAHIGHSLICEPSIZE, paca_struct, mm_ctx_high_slices_psize); 188 - DEFINE(PACA_ADDR_LIMIT, offsetof(struct paca_struct, addr_limit)); 188 + OFFSET(PACA_SLB_ADDR_LIMIT, paca_struct, mm_ctx_slb_addr_limit); 189 189 DEFINE(MMUPSIZEDEFSIZE, sizeof(struct mmu_psize_def)); 190 190 #endif /* CONFIG_PPC_MM_SLICES */ 191 191 #endif ··· 208 208 OFFSET(TCD_ESEL_FIRST, tlb_core_data, esel_first); 209 209 #endif /* CONFIG_PPC_BOOK3E */ 210 210 211 - #ifdef CONFIG_PPC_STD_MMU_64 211 + #ifdef CONFIG_PPC_BOOK3S_64 212 212 OFFSET(PACASLBCACHE, paca_struct, slb_cache); 213 213 OFFSET(PACASLBCACHEPTR, paca_struct, slb_cache_ptr); 214 214 OFFSET(PACAVMALLOCSLLP, paca_struct, vmalloc_sllp); ··· 230 230 OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx); 231 231 OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count); 232 232 OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx); 233 - #endif /* CONFIG_PPC_STD_MMU_64 */ 233 + #endif /* CONFIG_PPC_BOOK3S_64 */ 234 234 OFFSET(PACAEMERGSP, paca_struct, emergency_sp); 235 235 #ifdef CONFIG_PPC_BOOK3S_64 236 236 OFFSET(PACAMCEMERGSP, paca_struct, mc_emergency_sp);
+22 -2
arch/powerpc/kernel/cputable.c
··· 547 547 .machine_check_early = __machine_check_early_realmode_p9, 548 548 .platform = "power9", 549 549 }, 550 - { /* Power9 */ 550 + { /* Power9 DD2.0 */ 551 + .pvr_mask = 0xffffefff, 552 + .pvr_value = 0x004e0200, 553 + .cpu_name = "POWER9 (raw)", 554 + .cpu_features = CPU_FTRS_POWER9_DD2_0, 555 + .cpu_user_features = COMMON_USER_POWER9, 556 + .cpu_user_features2 = COMMON_USER2_POWER9, 557 + .mmu_features = MMU_FTRS_POWER9, 558 + .icache_bsize = 128, 559 + .dcache_bsize = 128, 560 + .num_pmcs = 6, 561 + .pmc_type = PPC_PMC_IBM, 562 + .oprofile_cpu_type = "ppc64/power9", 563 + .oprofile_type = PPC_OPROFILE_INVALID, 564 + .cpu_setup = __setup_cpu_power9, 565 + .cpu_restore = __restore_cpu_power9, 566 + .flush_tlb = __flush_tlb_power9, 567 + .machine_check_early = __machine_check_early_realmode_p9, 568 + .platform = "power9", 569 + }, 570 + { /* Power9 DD 2.1 or later (see DD2.0 above) */ 551 571 .pvr_mask = 0xffff0000, 552 572 .pvr_value = 0x004e0000, 553 573 .cpu_name = "POWER9 (raw)", 554 - .cpu_features = CPU_FTRS_POWER9, 574 + .cpu_features = CPU_FTRS_POWER9_DD2_1, 555 575 .cpu_user_features = COMMON_USER_POWER9, 556 576 .cpu_user_features2 = COMMON_USER2_POWER9, 557 577 .mmu_features = MMU_FTRS_POWER9,
+3 -1
arch/powerpc/kernel/dt_cpu_ftrs.c
··· 634 634 {"no-execute", feat_enable, 0}, 635 635 {"strong-access-ordering", feat_enable, CPU_FTR_SAO}, 636 636 {"cache-inhibited-large-page", feat_enable_large_ci, 0}, 637 - {"coprocessor-icswx", feat_enable, CPU_FTR_ICSWX}, 637 + {"coprocessor-icswx", feat_enable, 0}, 638 638 {"hypervisor-virtualization-interrupt", feat_enable_hvi, 0}, 639 639 {"program-priority-register", feat_enable, CPU_FTR_HAS_PPR}, 640 640 {"wait", feat_enable, 0}, ··· 735 735 */ 736 736 if ((version & 0xffffff00) == 0x004e0100) 737 737 cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD1; 738 + else if ((version & 0xffffefff) == 0x004e0200) 739 + cur_cpu_spec->cpu_features &= ~CPU_FTR_POWER9_DD2_1; 738 740 } 739 741 740 742 static void __init cpufeatures_setup_finished(void)
+14 -32
arch/powerpc/kernel/eeh.c
··· 972 972 .notifier_call = eeh_reboot_notifier, 973 973 }; 974 974 975 + void eeh_probe_devices(void) 976 + { 977 + struct pci_controller *hose, *tmp; 978 + struct pci_dn *pdn; 979 + 980 + /* Enable EEH for all adapters */ 981 + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { 982 + pdn = hose->pci_data; 983 + traverse_pci_dn(pdn, eeh_ops->probe, NULL); 984 + } 985 + } 986 + 975 987 /** 976 988 * eeh_init - EEH initialization 977 989 * ··· 999 987 * Even if force-off is set, the EEH hardware is still enabled, so that 1000 988 * newer systems can boot. 1001 989 */ 1002 - int eeh_init(void) 990 + static int eeh_init(void) 1003 991 { 1004 992 struct pci_controller *hose, *tmp; 1005 - struct pci_dn *pdn; 1006 - static int cnt = 0; 1007 993 int ret = 0; 1008 - 1009 - /* 1010 - * We have to delay the initialization on PowerNV after 1011 - * the PCI hierarchy tree has been built because the PEs 1012 - * are figured out based on PCI devices instead of device 1013 - * tree nodes 1014 - */ 1015 - if (machine_is(powernv) && cnt++ <= 0) 1016 - return ret; 1017 994 1018 995 /* Register reboot notifier */ 1019 996 ret = register_reboot_notifier(&eeh_reboot_nb); ··· 1029 1028 if (ret) 1030 1029 return ret; 1031 1030 1032 - /* Enable EEH for all adapters */ 1033 - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { 1034 - pdn = hose->pci_data; 1035 - traverse_pci_dn(pdn, eeh_ops->probe, NULL); 1036 - } 1037 - 1038 - /* 1039 - * Call platform post-initialization. Actually, It's good chance 1040 - * to inform platform that EEH is ready to supply service if the 1041 - * I/O cache stuff has been built up. 1042 - */ 1043 - if (eeh_ops->post_init) { 1044 - ret = eeh_ops->post_init(); 1045 - if (ret) 1046 - return ret; 1047 - } 1031 + eeh_probe_devices(); 1048 1032 1049 1033 if (eeh_enabled()) 1050 1034 pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n"); ··· 1742 1756 eeh_clear_flag(EEH_FORCE_DISABLED); 1743 1757 else 1744 1758 eeh_add_flag(EEH_FORCE_DISABLED); 1745 - 1746 - /* Notify the backend */ 1747 - if (eeh_ops->post_init) 1748 - eeh_ops->post_init(); 1749 1759 1750 1760 return 0; 1751 1761 }
+1 -1
arch/powerpc/kernel/eeh_driver.c
··· 623 623 struct eeh_rmv_data *rmv_data) 624 624 { 625 625 struct pci_bus *frozen_bus = eeh_pe_bus_get(pe); 626 - struct timeval tstamp; 626 + time64_t tstamp; 627 627 int cnt, rc; 628 628 struct eeh_dev *edev; 629 629
+4 -4
arch/powerpc/kernel/eeh_pe.c
··· 526 526 */ 527 527 void eeh_pe_update_time_stamp(struct eeh_pe *pe) 528 528 { 529 - struct timeval tstamp; 529 + time64_t tstamp; 530 530 531 531 if (!pe) return; 532 532 533 533 if (pe->freeze_count <= 0) { 534 534 pe->freeze_count = 0; 535 - do_gettimeofday(&pe->tstamp); 535 + pe->tstamp = ktime_get_seconds(); 536 536 } else { 537 - do_gettimeofday(&tstamp); 538 - if (tstamp.tv_sec - pe->tstamp.tv_sec > 3600) { 537 + tstamp = ktime_get_seconds(); 538 + if (tstamp - pe->tstamp > 3600) { 539 539 pe->tstamp = tstamp; 540 540 pe->freeze_count = 0; 541 541 }
+2 -2
arch/powerpc/kernel/entry_64.S
··· 539 539 std r6,PACACURRENT(r13) /* Set new 'current' */ 540 540 541 541 ld r8,KSP(r4) /* new stack pointer */ 542 - #ifdef CONFIG_PPC_STD_MMU_64 542 + #ifdef CONFIG_PPC_BOOK3S_64 543 543 BEGIN_MMU_FTR_SECTION 544 544 b 2f 545 545 END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX) ··· 588 588 slbmte r7,r0 589 589 isync 590 590 2: 591 - #endif /* CONFIG_PPC_STD_MMU_64 */ 591 + #endif /* CONFIG_PPC_BOOK3S_64 */ 592 592 593 593 CURRENT_THREAD_INFO(r7, r8) /* base of new stack */ 594 594 /* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
+32 -17
arch/powerpc/kernel/exceptions-64s.S
··· 114 114 cmpwi cr3,r10,2 ; \ 115 115 BRANCH_TO_C000(r10, system_reset_idle_common) ; \ 116 116 1: \ 117 + KVMTEST_PR(n) ; \ 117 118 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) 118 119 #else 119 120 #define IDLETEST NOTEST ··· 131 130 132 131 EXC_REAL_END(system_reset, 0x100, 0x100) 133 132 EXC_VIRT_NONE(0x4100, 0x100) 133 + TRAMP_KVM(PACA_EXNMI, 0x100) 134 134 135 135 #ifdef CONFIG_PPC_P7_NAP 136 136 EXC_COMMON_BEGIN(system_reset_idle_common) ··· 235 233 addi r10,r10,1 /* increment paca->in_mce */ 236 234 sth r10,PACA_IN_MCE(r13) 237 235 /* Limit nested MCE to level 4 to avoid stack overflow */ 238 - cmpwi r10,4 236 + cmpwi r10,MAX_MCE_DEPTH 239 237 bgt 2f /* Check if we hit limit of 4 */ 240 238 std r11,GPR1(r1) /* Save r1 on the stack. */ 241 239 std r11,0(r1) /* make stack chain pointer */ ··· 544 542 RECONCILE_IRQ_STATE(r10, r11) 545 543 ld r12,_MSR(r1) 546 544 ld r3,_NIP(r1) 547 - andis. r4,r12,DSISR_BAD_FAULT_64S@h 545 + andis. r4,r12,DSISR_SRR1_MATCH_64S@h 548 546 li r5,0x400 549 547 std r3,_DAR(r1) 550 548 std r4,_DSISR(r1) ··· 608 606 cmpdi cr5,r11,MSR_RI 609 607 610 608 crset 4*cr0+eq 611 - #ifdef CONFIG_PPC_STD_MMU_64 609 + #ifdef CONFIG_PPC_BOOK3S_64 612 610 BEGIN_MMU_FTR_SECTION 613 611 bl slb_allocate 614 612 END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX) ··· 890 888 #define LOAD_SYSCALL_HANDLER(reg) \ 891 889 __LOAD_HANDLER(reg, system_call_common) 892 890 893 - #define SYSCALL_FASTENDIAN_TEST \ 894 - BEGIN_FTR_SECTION \ 895 - cmpdi r0,0x1ebe ; \ 896 - beq- 1f ; \ 897 - END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \ 898 - 899 891 /* 900 892 * After SYSCALL_KVMTEST, we reach here with PACA in r13, r13 in r9, 901 893 * and HMT_MEDIUM. ··· 904 908 rfid ; \ 905 909 b . ; /* prevent speculative execution */ 906 910 911 + #ifdef CONFIG_PPC_FAST_ENDIAN_SWITCH 912 + #define SYSCALL_FASTENDIAN_TEST \ 913 + BEGIN_FTR_SECTION \ 914 + cmpdi r0,0x1ebe ; \ 915 + beq- 1f ; \ 916 + END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE) \ 917 + 907 918 #define SYSCALL_FASTENDIAN \ 908 919 /* Fast LE/BE switch system call */ \ 909 920 1: mfspr r12,SPRN_SRR1 ; \ ··· 919 916 mr r13,r9 ; \ 920 917 rfid ; /* return to userspace */ \ 921 918 b . ; /* prevent speculative execution */ 919 + #else 920 + #define SYSCALL_FASTENDIAN_TEST 921 + #define SYSCALL_FASTENDIAN 922 + #endif /* CONFIG_PPC_FAST_ENDIAN_SWITCH */ 922 923 923 924 #if defined(CONFIG_RELOCATABLE) 924 925 /* ··· 1040 1033 EXCEPTION_PROLOG_COMMON_3(0xe60) 1041 1034 addi r3,r1,STACK_FRAME_OVERHEAD 1042 1035 BRANCH_LINK_TO_FAR(hmi_exception_realmode) /* Function call ABI */ 1036 + cmpdi cr0,r3,0 1037 + 1043 1038 /* Windup the stack. */ 1044 1039 /* Move original HSRR0 and HSRR1 into the respective regs */ 1045 1040 ld r9,_MSR(r1) ··· 1058 1049 REST_8GPRS(2, r1) 1059 1050 REST_GPR(10, r1) 1060 1051 ld r11,_CCR(r1) 1052 + REST_2GPRS(12, r1) 1053 + bne 1f 1061 1054 mtcr r11 1062 1055 REST_GPR(11, r1) 1063 - REST_2GPRS(12, r1) 1064 - /* restore original r1. */ 1056 + ld r1,GPR1(r1) 1057 + hrfid 1058 + 1059 + 1: mtcr r11 1060 + REST_GPR(11, r1) 1065 1061 ld r1,GPR1(r1) 1066 1062 1067 1063 /* ··· 1079 1065 EXCEPTION_PROLOG_0(PACA_EXGEN) 1080 1066 b tramp_real_hmi_exception 1081 1067 1082 - EXC_COMMON_ASYNC(hmi_exception_common, 0xe60, handle_hmi_exception) 1083 - 1068 + EXC_COMMON_BEGIN(hmi_exception_common) 1069 + EXCEPTION_COMMON(PACA_EXGEN, 0xe60, hmi_exception_common, handle_hmi_exception, 1070 + ret_from_except, FINISH_NAP;ADD_NVGPRS;ADD_RECONCILE;RUNLATCH_ON) 1084 1071 1085 1072 EXC_REAL_OOL_MASKABLE_HV(h_doorbell, 0xe80, 0x20) 1086 1073 EXC_VIRT_OOL_MASKABLE_HV(h_doorbell, 0x4e80, 0x20, 0xe80) ··· 1520 1505 */ 1521 1506 .balign IFETCH_ALIGN_BYTES 1522 1507 do_hash_page: 1523 - #ifdef CONFIG_PPC_STD_MMU_64 1524 - lis r0,DSISR_BAD_FAULT_64S@h 1508 + #ifdef CONFIG_PPC_BOOK3S_64 1509 + lis r0,(DSISR_BAD_FAULT_64S|DSISR_DABRMATCH)@h 1525 1510 ori r0,r0,DSISR_BAD_FAULT_64S@l 1526 1511 and. r0,r4,r0 /* weird error? */ 1527 1512 bne- handle_page_fault /* if not, try to insert a HPTE */ ··· 1551 1536 1552 1537 /* Reload DSISR into r4 for the DABR check below */ 1553 1538 ld r4,_DSISR(r1) 1554 - #endif /* CONFIG_PPC_STD_MMU_64 */ 1539 + #endif /* CONFIG_PPC_BOOK3S_64 */ 1555 1540 1556 1541 /* Here we have a page fault that hash_page can't handle. */ 1557 1542 handle_page_fault: ··· 1580 1565 12: b ret_from_except_lite 1581 1566 1582 1567 1583 - #ifdef CONFIG_PPC_STD_MMU_64 1568 + #ifdef CONFIG_PPC_BOOK3S_64 1584 1569 /* We have a page fault that hash_page could handle but HV refused 1585 1570 * the PTE insertion 1586 1571 */
+13 -4
arch/powerpc/kernel/fadump.c
··· 1270 1270 struct kobj_attribute *attr, 1271 1271 const char *buf, size_t count) 1272 1272 { 1273 + int input = -1; 1274 + 1273 1275 if (!fw_dump.dump_active) 1274 1276 return -EPERM; 1275 1277 1276 - if (buf[0] == '1') { 1278 + if (kstrtoint(buf, 0, &input)) 1279 + return -EINVAL; 1280 + 1281 + if (input == 1) { 1277 1282 /* 1278 1283 * Take away the '/proc/vmcore'. We are releasing the dump 1279 1284 * memory, hence it will not be valid anymore. ··· 1312 1307 const char *buf, size_t count) 1313 1308 { 1314 1309 int ret = 0; 1310 + int input = -1; 1315 1311 1316 1312 if (!fw_dump.fadump_enabled || fdm_active) 1317 1313 return -EPERM; 1318 1314 1315 + if (kstrtoint(buf, 0, &input)) 1316 + return -EINVAL; 1317 + 1319 1318 mutex_lock(&fadump_mutex); 1320 1319 1321 - switch (buf[0]) { 1322 - case '0': 1320 + switch (input) { 1321 + case 0: 1323 1322 if (fw_dump.dump_registered == 0) { 1324 1323 goto unlock_out; 1325 1324 } 1326 1325 /* Un-register Firmware-assisted dump */ 1327 1326 fadump_unregister_dump(&fdm); 1328 1327 break; 1329 - case '1': 1328 + case 1: 1330 1329 if (fw_dump.dump_registered == 1) { 1331 1330 ret = -EEXIST; 1332 1331 goto unlock_out;
+1 -1
arch/powerpc/kernel/head_32.S
··· 388 388 EXCEPTION_PROLOG 389 389 mfspr r10,SPRN_DSISR 390 390 stw r10,_DSISR(r11) 391 - andis. r0,r10,DSISR_BAD_FAULT_32S@h 391 + andis. r0,r10,(DSISR_BAD_FAULT_32S|DSISR_DABRMATCH)@h 392 392 bne 1f /* if not, try to put a PTE */ 393 393 mfspr r4,SPRN_DAR /* into the hash table */ 394 394 rlwinm r3,r10,32-15,21,21 /* DSISR_STORE -> _PAGE_RW */
+11 -5
arch/powerpc/kernel/head_64.S
··· 55 55 * 56 56 * For pSeries or server processors: 57 57 * 1. The MMU is off & open firmware is running in real mode. 58 - * 2. The kernel is entered at __start 58 + * 2. The primary CPU enters at __start. 59 + * 3. If the RTAS supports "query-cpu-stopped-state", then secondary 60 + * CPUs will enter as directed by "start-cpu" RTAS call, which is 61 + * generic_secondary_smp_init, with PIR in r3. 62 + * 4. Else the secondary CPUs will enter at secondary_hold (0x60) as 63 + * directed by the "start-cpu" RTAS call, with PIR in r3. 59 64 * -or- For OPAL entry: 60 65 * 1. The MMU is off, processor in HV mode. 61 66 * 2. The primary CPU enters at 0 with device-tree in r3, OPAL base 62 67 * in r8, and entry in r9 for debugging purposes. 63 68 * 3. Secondary CPUs enter as directed by OPAL_START_CPU call, which 64 69 * is at generic_secondary_smp_init, with PIR in r3. 64 70 * 65 71 * For Book3E processors: 66 72 * 1. The MMU is on running in AS0 in a state defined in ePAPR
+33 -37
arch/powerpc/kernel/idle_book3s.S
··· 112 112 std r4, STOP_HFSCR(r13) 113 113 114 114 mfspr r3, SPRN_MMCRA 115 - mfspr r4, SPRN_MMCR1 115 + mfspr r4, SPRN_MMCR0 116 116 std r3, STOP_MMCRA(r13) 117 - std r4, STOP_MMCR1(r13) 117 + std r4, _MMCR0(r1) 118 118 119 - mfspr r3, SPRN_MMCR2 120 - std r3, STOP_MMCR2(r13) 119 + mfspr r3, SPRN_MMCR1 120 + mfspr r4, SPRN_MMCR2 121 + std r3, STOP_MMCR1(r13) 122 + std r4, STOP_MMCR2(r13) 121 123 blr 122 124 123 125 power9_restore_additional_sprs: ··· 137 135 ld r4, STOP_MMCRA(r13) 138 136 mtspr SPRN_HFSCR, r3 139 137 mtspr SPRN_MMCRA, r4 140 - /* We have already restored PACA_MMCR0 */ 141 - ld r3, STOP_MMCR1(r13) 142 - ld r4, STOP_MMCR2(r13) 143 - mtspr SPRN_MMCR1, r3 144 - mtspr SPRN_MMCR2, r4 138 + 139 + ld r3, _MMCR0(r1) 140 + ld r4, STOP_MMCR1(r13) 141 + mtspr SPRN_MMCR0, r3 142 + mtspr SPRN_MMCR1, r4 143 + 144 + ld r3, STOP_MMCR2(r13) 145 + mtspr SPRN_MMCR2, r3 145 146 blr 146 147 147 148 /* ··· 324 319 /* 325 320 * r3 - PSSCR value corresponding to the requested stop state. 326 321 */ 322 + power_enter_stop: 327 323 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 328 - power_enter_stop_kvm_rm: 329 - /* 330 - * This is currently unused because POWER9 KVM does not have to 331 - * gather secondary threads into sibling mode, but the code is 332 - * here in case that function is required. 333 - * 334 - * Tell KVM we're entering idle. 335 - */ 324 + /* Tell KVM we're entering idle */ 336 325 li r4,KVM_HWTHREAD_IN_IDLE 337 326 /* DO THIS IN REAL MODE! See comment above. */ 338 327 stb r4,HSTATE_HWTHREAD_STATE(r13) 339 328 #endif 340 - power_enter_stop: 341 329 /* 342 330 * Check if we are executing the lite variant with ESL=EC=0 343 331 */ ··· 355 357 b pnv_wakeup_noloss 356 358 357 359 .Lhandle_esl_ec_set: 360 + BEGIN_FTR_SECTION 358 361 /* 359 - * POWER9 DD2 can incorrectly set PMAO when waking up after a 360 - * state-loss idle. Saving and restoring MMCR0 over idle is a 362 + * POWER9 DD2.0 or earlier can incorrectly set PMAO when waking up after 363 + * a state-loss idle. Saving and restoring MMCR0 over idle is a 361 364 * workaround. 362 365 */ 363 366 mfspr r4,SPRN_MMCR0 364 367 std r4,_MMCR0(r1) 368 + END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1) 365 369 366 370 /* 367 371 * Check if the requested state is a deep idle state. ··· 496 496 497 497 b pnv_powersave_wakeup 498 498 499 - #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 500 - kvm_start_guest_check: 501 - li r0,KVM_HWTHREAD_IN_KERNEL 502 - stb r0,HSTATE_HWTHREAD_STATE(r13) 503 - /* Order setting hwthread_state vs. testing hwthread_req */ 504 - sync 505 - lbz r0,HSTATE_HWTHREAD_REQ(r13) 506 - cmpwi r0,0 507 - beqlr 508 - b kvm_start_guest 509 - #endif 510 - 511 499 /* 512 500 * Called from reset vector for powersave wakeups. 513 501 * cr3 - set to gt if waking up with partial/complete hypervisor state loss ··· 520 532 mr r3,r12 521 533 522 534 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 523 - BEGIN_FTR_SECTION 524 - bl kvm_start_guest_check 525 - END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) 535 + li r0,KVM_HWTHREAD_IN_KERNEL 536 + stb r0,HSTATE_HWTHREAD_STATE(r13) 537 + /* Order setting hwthread_state vs. testing hwthread_req */ 538 + sync 539 + lbz r0,HSTATE_HWTHREAD_REQ(r13) 540 + cmpwi r0,0 541 + beq 1f 542 + b kvm_start_guest 543 + 1: 526 544 #endif 527 545 528 546 /* Return SRR1 from power7_nap() */ ··· 549 555 * then clear bit 60 in MMCRA to ensure the PMU starts running. 550 556 */ 551 557 blt cr3,1f 558 + BEGIN_FTR_SECTION 552 559 PPC_INVALIDATE_ERAT 553 560 ld r1,PACAR1(r13) 561 + ld r4,_MMCR0(r1) 562 + mtspr SPRN_MMCR0,r4 563 + END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1) 554 564 mfspr r4,SPRN_MMCRA 555 565 ori r4,r4,(1 << (63-60)) 556 566 mtspr SPRN_MMCRA,r4 557 567 xori r4,r4,(1 << (63-60)) 558 568 mtspr SPRN_MMCRA,r4 559 569 1: 560 570 /*
 * POWER ISA 3. Use PSSCR to determine if we
+48 -3
arch/powerpc/kernel/irq.c
··· 143 143 */ 144 144 unsigned char happened = local_paca->irq_happened; 145 145 146 + /* 147 + * We are responding to the next interrupt, so interrupt-off 148 + * latencies should be reset here. 149 + */ 150 + trace_hardirqs_on(); 151 + trace_hardirqs_off(); 152 + 146 153 if (happened & PACA_IRQ_HARD_DIS) { 147 154 /* Clear bit 0 which we wouldn't clear otherwise */ 148 155 local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS; ··· 277 270 #endif /* CONFIG_TRACE_IRQFLAGS */ 278 271 279 272 set_soft_enabled(0); 273 + trace_hardirqs_off(); 280 274 281 275 /* 282 276 * Check if anything needs to be re-emitted. We haven't ··· 287 279 replay = __check_irq_replay(); 288 280 289 281 /* We can soft-enable now */ 282 + trace_hardirqs_on(); 290 283 set_soft_enabled(1); 291 284 292 285 /* ··· 403 394 /* 404 395 * Take the SRR1 wakeup reason, index into this table to find the 405 396 * appropriate irq_happened bit. 397 + * 398 + * System reset exceptions taken in idle state also come through here, 399 + * but they are NMI interrupts so do not need to wait for IRQs to be 400 + * restored, and should be taken as early as practical. These are marked 401 + * with 0xff in the table. The Power ISA specifies 0100b as the system 402 + * reset interrupt reason. 406 403 */ 404 + #define IRQ_SYSTEM_RESET 0xff 405 + 407 406 static const u8 srr1_to_lazyirq[0x10] = { 408 407 0, 0, 0, 409 408 PACA_IRQ_DBELL, 410 - 0, 409 + IRQ_SYSTEM_RESET, 411 410 PACA_IRQ_DBELL, 412 411 PACA_IRQ_DEC, 413 412 0, ··· 424 407 PACA_IRQ_HMI, 425 408 0, 0, 0, 0, 0 }; 426 409 410 + void replay_system_reset(void) 411 + { 412 + struct pt_regs regs; 413 + 414 + ppc_save_regs(&regs); 415 + regs.trap = 0x100; 416 + get_paca()->in_nmi = 1; 417 + system_reset_exception(&regs); 418 + get_paca()->in_nmi = 0; 419 + } 420 + EXPORT_SYMBOL_GPL(replay_system_reset); 421 + 427 422 void irq_set_pending_from_srr1(unsigned long srr1) 428 423 { 429 424 unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> 18; 425 + u8 reason = srr1_to_lazyirq[idx]; 426 + 427 + /* 428 + * Take the system reset now, which is immediately after registers 429 + * are restored from idle. It's an NMI, so interrupts need not be 430 + * re-enabled before it is taken. 431 + */ 432 + if (unlikely(reason == IRQ_SYSTEM_RESET)) { 433 + replay_system_reset(); 434 + return; 435 + } 430 436 431 437 /* 432 438 * The 0 index (SRR1[42:45]=b0000) must always evaluate to 0, 439 + * so this can be called unconditionally with the SRR1 wake 440 + * reason as returned by the idle code, which uses 0 to mean no 441 + * interrupt. 442 + * 443 + * If a future CPU was to designate this as an interrupt reason, 444 + * then a new index for no interrupt must be assigned. 434 445 */ 435 - local_paca->irq_happened |= srr1_to_lazyirq[idx]; 446 + local_paca->irq_happened |= reason; 436 447 } 437 448 #endif /* CONFIG_PPC_BOOK3S */ 438 449
+25 -9
arch/powerpc/kernel/kprobes-ftrace.c
··· 25 25 #include <linux/preempt.h> 26 26 #include <linux/ftrace.h> 27 27 28 + /* 29 + * This is called from ftrace code after invoking registered handlers to 30 + * disambiguate regs->nip changes done by jprobes and livepatch. We check if 31 + * there is an active jprobe at the provided address (mcount location). 32 + */ 33 + int __is_active_jprobe(unsigned long addr) 34 + { 35 + if (!preemptible()) { 36 + struct kprobe *p = raw_cpu_read(current_kprobe); 37 + return (p && (unsigned long)p->addr == addr) ? 1 : 0; 38 + } 39 + 40 + return 0; 41 + } 42 + 28 43 static nokprobe_inline 29 44 int __skip_singlestep(struct kprobe *p, struct pt_regs *regs, 30 45 struct kprobe_ctlblk *kcb, unsigned long orig_nip) ··· 75 60 { 76 61 struct kprobe *p; 77 62 struct kprobe_ctlblk *kcb; 78 - unsigned long flags; 79 63 80 - /* Disable irq for emulating a breakpoint and avoiding preempt */ 81 - local_irq_save(flags); 82 - hard_irq_disable(); 64 + preempt_disable(); 83 65 84 66 p = get_kprobe((kprobe_opcode_t *)nip); 85 67 if (unlikely(!p) || kprobe_disabled(p)) ··· 98 86 kcb->kprobe_status = KPROBE_HIT_ACTIVE; 99 87 if (!p->pre_handler || !p->pre_handler(p, regs)) 100 88 __skip_singlestep(p, regs, kcb, orig_nip); 101 - /* 102 - * If pre_handler returns !0, it sets regs->nip and 103 - * resets current kprobe. 104 - */ 89 + else { 90 + /* 91 + * If pre_handler returns !0, it sets regs->nip and 92 + * resets current kprobe. In this case, we should not 93 + * re-enable preemption. 94 + */ 95 + return; 96 + } 105 97 } 106 98 end: 107 - local_irq_restore(flags); 99 + preempt_enable_no_resched(); 108 100 } 109 101 NOKPROBE_SYMBOL(kprobe_ftrace_handler); 110 102
+44 -48
arch/powerpc/kernel/kprobes.c
··· 43 43 44 44 struct kretprobe_blackpoint kretprobe_blacklist[] = {{NULL, NULL}}; 45 45 46 - int is_current_kprobe_addr(unsigned long addr) 47 - { 48 - struct kprobe *p = kprobe_running(); 49 - return (p && (unsigned long)p->addr == addr) ? 1 : 0; 50 - } 51 - 52 46 bool arch_within_kprobe_blacklist(unsigned long addr) 53 47 { 54 48 return (addr >= (unsigned long)__kprobes_text_start && ··· 53 59 54 60 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset) 55 61 { 56 - kprobe_opcode_t *addr; 62 + kprobe_opcode_t *addr = NULL; 57 63 58 64 #ifdef PPC64_ELF_ABI_v2 59 65 /* PPC64 ABIv2 needs local entry point */ ··· 85 91 * Also handle <module:symbol> format. 86 92 */ 87 93 char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN]; 88 - const char *modsym; 89 94 bool dot_appended = false; 90 - if ((modsym = strchr(name, ':')) != NULL) { 91 - modsym++; 92 - if (*modsym != '\0' && *modsym != '.') { 93 - /* Convert to <module:.symbol> */ 94 - strncpy(dot_name, name, modsym - name); 95 - dot_name[modsym - name] = '.'; 96 - dot_name[modsym - name + 1] = '\0'; 97 - strncat(dot_name, modsym, 98 - sizeof(dot_name) - (modsym - name) - 2); 99 - dot_appended = true; 100 - } else { 101 - dot_name[0] = '\0'; 102 - strncat(dot_name, name, sizeof(dot_name) - 1); 103 - } 104 - } else if (name[0] != '.') { 105 - dot_name[0] = '.'; 106 - dot_name[1] = '\0'; 107 - strncat(dot_name, name, KSYM_NAME_LEN - 2); 95 + const char *c; 96 + ssize_t ret = 0; 97 + int len = 0; 98 + 99 + if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) { 100 + c++; 101 + len = c - name; 102 + memcpy(dot_name, name, len); 103 + } else 104 + c = name; 105 + 106 + if (*c != '\0' && *c != '.') { 107 + dot_name[len++] = '.'; 108 108 dot_appended = true; 109 - } else { 110 - dot_name[0] = '\0'; 111 - strncat(dot_name, name, KSYM_NAME_LEN - 1); 112 109 } 113 - addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name); 114 - if (!addr && dot_appended) { 115 - /* Let's try the original non-dot symbol 
lookup */ 110 + ret = strscpy(dot_name + len, c, KSYM_NAME_LEN); 111 + if (ret > 0) 112 + addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name); 113 + 114 + /* Fallback to the original non-dot symbol lookup */ 115 + if (!addr && dot_appended) 116 116 addr = (kprobe_opcode_t *)kallsyms_lookup_name(name); 117 - } 118 117 #else 119 118 addr = (kprobe_opcode_t *)kallsyms_lookup_name(name); 120 119 #endif ··· 226 239 } 227 240 NOKPROBE_SYMBOL(arch_prepare_kretprobe); 228 241 229 - int try_to_emulate(struct kprobe *p, struct pt_regs *regs) 242 + static int try_to_emulate(struct kprobe *p, struct pt_regs *regs) 230 243 { 231 244 int ret; 232 245 unsigned int insn = *p->ainsn.insn; ··· 248 261 */ 249 262 printk("Can't step on instruction %x\n", insn); 250 263 BUG(); 251 - } else if (ret == 0) 252 - /* This instruction can't be boosted */ 253 - p->ainsn.boostable = -1; 264 + } else { 265 + /* 266 + * If we haven't previously emulated this instruction, then it 267 + * can't be boosted. Note it down so we don't try to do so again. 268 + * 269 + * If, however, we had emulated this instruction in the past, 270 + * then this is just an error with the current run (for 271 + * instance, exceptions due to a load/store). We return 0 so 272 + * that this is now single-stepped, but continue to try 273 + * emulating it in subsequent probe hits. 
274 + */ 275 + if (unlikely(p->ainsn.boostable != 1)) 276 + p->ainsn.boostable = -1; 277 + } 254 278 255 279 return ret; 256 280 } ··· 637 639 638 640 void __used jprobe_return(void) 639 641 { 640 - asm volatile("trap" ::: "memory"); 642 + asm volatile("jprobe_return_trap:\n" 643 + "trap\n" 644 + ::: "memory"); 641 645 } 642 646 NOKPROBE_SYMBOL(jprobe_return); 643 - 644 - static void __used jprobe_return_end(void) 645 - { 646 - } 647 - NOKPROBE_SYMBOL(jprobe_return_end); 648 647 649 648 int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs) 650 649 { 651 650 struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); 652 651 653 - /* 654 - * FIXME - we should ideally be validating that we got here 'cos 655 - * of the "trap" in jprobe_return() above, before restoring the 656 - * saved regs... 657 - */ 652 + if (regs->nip != ppc_kallsyms_lookup_name("jprobe_return_trap")) { 653 + pr_debug("longjmp_break_handler NIP (0x%lx) does not match jprobe_return_trap (0x%lx)\n", 654 + regs->nip, ppc_kallsyms_lookup_name("jprobe_return_trap")); 655 + return 0; 656 + } 657 + 658 658 memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs)); 659 659 /* It's OK to start function graph tracing again */ 660 660 unpause_graph_tracing();
+2 -2
arch/powerpc/kernel/machine_kexec_64.c
··· 360 360 /* NOTREACHED */ 361 361 } 362 362 363 - #ifdef CONFIG_PPC_STD_MMU_64 363 + #ifdef CONFIG_PPC_BOOK3S_64 364 364 /* Values we need to export to the second kernel via the device tree. */ 365 365 static unsigned long htab_base; 366 366 static unsigned long htab_size; ··· 402 402 return 0; 403 403 } 404 404 late_initcall(export_htab_values); 405 - #endif /* CONFIG_PPC_STD_MMU_64 */ 405 + #endif /* CONFIG_PPC_BOOK3S_64 */
+101 -44
arch/powerpc/kernel/mce.c
··· 39 39 static DEFINE_PER_CPU(int, mce_queue_count); 40 40 static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue); 41 41 42 + /* Queue for delayed MCE UE events. */ 43 + static DEFINE_PER_CPU(int, mce_ue_count); 44 + static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], 45 + mce_ue_event_queue); 46 + 42 47 static void machine_check_process_queued_event(struct irq_work *work); 48 + void machine_check_ue_event(struct machine_check_event *evt); 49 + static void machine_process_ue_event(struct work_struct *work); 50 + 43 51 static struct irq_work mce_event_process_work = { 44 52 .func = machine_check_process_queued_event, 45 53 }; 54 + 55 + DECLARE_WORK(mce_ue_event_work, machine_process_ue_event); 46 56 47 57 static void mce_set_error_info(struct machine_check_event *mce, 48 58 struct mce_error_info *mce_err) ··· 92 82 */ 93 83 void save_mce_event(struct pt_regs *regs, long handled, 94 84 struct mce_error_info *mce_err, 95 - uint64_t nip, uint64_t addr) 85 + uint64_t nip, uint64_t addr, uint64_t phys_addr) 96 86 { 97 87 int index = __this_cpu_inc_return(mce_nest_count) - 1; 98 88 struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]); ··· 150 140 } else if (mce->error_type == MCE_ERROR_TYPE_UE) { 151 141 mce->u.ue_error.effective_address_provided = true; 152 142 mce->u.ue_error.effective_address = addr; 143 + if (phys_addr != ULONG_MAX) { 144 + mce->u.ue_error.physical_address_provided = true; 145 + mce->u.ue_error.physical_address = phys_addr; 146 + machine_check_ue_event(mce); 147 + } 153 148 } 154 149 return; 155 150 } ··· 208 193 get_mce_event(NULL, true); 209 194 } 210 195 196 + 197 + /* 198 + * Queue up the MCE event which then can be handled later. 199 + */ 200 + void machine_check_ue_event(struct machine_check_event *evt) 201 + { 202 + int index; 203 + 204 + index = __this_cpu_inc_return(mce_ue_count) - 1; 205 + /* If queue is full, just return for now. 
*/ 206 + if (index >= MAX_MC_EVT) { 207 + __this_cpu_dec(mce_ue_count); 208 + return; 209 + } 210 + memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt)); 211 + 212 + /* Queue work to process this event later. */ 213 + schedule_work(&mce_ue_event_work); 214 + } 215 + 211 216 /* 212 217 * Queue up the MCE event which then can be handled later. 213 218 */ ··· 250 215 /* Queue irq work to process this event later. */ 251 216 irq_work_queue(&mce_event_process_work); 252 217 } 218 + /* 219 + * process pending MCE event from the mce event queue. This function will be 220 + * called during syscall exit. 221 + */ 222 + static void machine_process_ue_event(struct work_struct *work) 223 + { 224 + int index; 225 + struct machine_check_event *evt; 253 226 227 + while (__this_cpu_read(mce_ue_count) > 0) { 228 + index = __this_cpu_read(mce_ue_count) - 1; 229 + evt = this_cpu_ptr(&mce_ue_event_queue[index]); 230 + #ifdef CONFIG_MEMORY_FAILURE 231 + /* 232 + * This should probably queued elsewhere, but 233 + * oh! well 234 + */ 235 + if (evt->error_type == MCE_ERROR_TYPE_UE) { 236 + if (evt->u.ue_error.physical_address_provided) { 237 + unsigned long pfn; 238 + 239 + pfn = evt->u.ue_error.physical_address >> 240 + PAGE_SHIFT; 241 + memory_failure(pfn, SIGBUS, 0); 242 + } else 243 + pr_warn("Failed to identify bad address from " 244 + "where the uncorrectable error (UE) " 245 + "was generated\n"); 246 + } 247 + #endif 248 + __this_cpu_dec(mce_ue_count); 249 + } 250 + } 254 251 /* 255 252 * process pending MCE event from the mce event queue. This function will be 256 253 * called during syscall exit. 
··· 290 223 static void machine_check_process_queued_event(struct irq_work *work) 291 224 { 292 225 int index; 226 + struct machine_check_event *evt; 293 227 294 228 add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); 295 229 ··· 300 232 */ 301 233 while (__this_cpu_read(mce_queue_count) > 0) { 302 234 index = __this_cpu_read(mce_queue_count) - 1; 303 - machine_check_print_event_info( 304 - this_cpu_ptr(&mce_event_queue[index]), false); 235 + evt = this_cpu_ptr(&mce_event_queue[index]); 236 + machine_check_print_event_info(evt, false); 305 237 __this_cpu_dec(mce_queue_count); 306 238 } 307 239 } ··· 408 340 printk("%s Effective address: %016llx\n", 409 341 level, evt->u.ue_error.effective_address); 410 342 if (evt->u.ue_error.physical_address_provided) 411 - printk("%s Physical address: %016llx\n", 343 + printk("%s Physical address: %016llx\n", 412 344 level, evt->u.ue_error.physical_address); 413 345 break; 414 346 case MCE_ERROR_TYPE_SLB: ··· 479 411 } 480 412 EXPORT_SYMBOL_GPL(machine_check_print_event_info); 481 413 482 - uint64_t get_mce_fault_addr(struct machine_check_event *evt) 483 - { 484 - switch (evt->error_type) { 485 - case MCE_ERROR_TYPE_UE: 486 - if (evt->u.ue_error.effective_address_provided) 487 - return evt->u.ue_error.effective_address; 488 - break; 489 - case MCE_ERROR_TYPE_SLB: 490 - if (evt->u.slb_error.effective_address_provided) 491 - return evt->u.slb_error.effective_address; 492 - break; 493 - case MCE_ERROR_TYPE_ERAT: 494 - if (evt->u.erat_error.effective_address_provided) 495 - return evt->u.erat_error.effective_address; 496 - break; 497 - case MCE_ERROR_TYPE_TLB: 498 - if (evt->u.tlb_error.effective_address_provided) 499 - return evt->u.tlb_error.effective_address; 500 - break; 501 - case MCE_ERROR_TYPE_USER: 502 - if (evt->u.user_error.effective_address_provided) 503 - return evt->u.user_error.effective_address; 504 - break; 505 - case MCE_ERROR_TYPE_RA: 506 - if (evt->u.ra_error.effective_address_provided) 507 - return 
evt->u.ra_error.effective_address; 508 - break; 509 - case MCE_ERROR_TYPE_LINK: 510 - if (evt->u.link_error.effective_address_provided) 511 - return evt->u.link_error.effective_address; 512 - break; 513 - default: 514 - case MCE_ERROR_TYPE_UNKNOWN: 515 - break; 516 - } 517 - return 0; 518 - } 519 - EXPORT_SYMBOL(get_mce_fault_addr); 520 - 521 414 /* 522 415 * This function is called in real mode. Strictly no printk's please. 523 416 * ··· 499 470 { 500 471 __this_cpu_inc(irq_stat.hmi_exceptions); 501 472 473 + #ifdef CONFIG_PPC_BOOK3S_64 474 + /* Workaround for P9 vector CI loads (see p9_hmi_special_emu) */ 475 + if (pvr_version_is(PVR_POWER9)) { 476 + unsigned long hmer = mfspr(SPRN_HMER); 477 + 478 + /* Do we have the debug bit set */ 479 + if (hmer & PPC_BIT(17)) { 480 + hmer &= ~PPC_BIT(17); 481 + mtspr(SPRN_HMER, hmer); 482 + 483 + /* 484 + * Now to avoid problems with soft-disable we 485 + * only do the emulation if we are coming from 486 + * user space 487 + */ 488 + if (user_mode(regs)) 489 + local_paca->hmi_p9_special_emu = 1; 490 + 491 + /* 492 + * Don't bother going to OPAL if that's the 493 + * only relevant bit. 494 + */ 495 + if (!(hmer & mfspr(SPRN_HMEER))) 496 + return local_paca->hmi_p9_special_emu; 497 + } 498 + } 499 + #endif /* CONFIG_PPC_BOOK3S_64 */ 500 + 502 501 wait_for_subcore_guest_exit(); 503 502 504 503 if (ppc_md.hmi_exception_early) ··· 534 477 535 478 wait_for_tb_resync(); 536 479 537 - return 0; 480 + return 1; 538 481 }
+104 -11
arch/powerpc/kernel/mce_power.c
··· 27 27 #include <asm/mmu.h> 28 28 #include <asm/mce.h> 29 29 #include <asm/machdep.h> 30 + #include <asm/pgtable.h> 31 + #include <asm/pte-walk.h> 32 + #include <asm/sstep.h> 33 + #include <asm/exception-64s.h> 34 + 35 + /* 36 + * Convert an address related to an mm to a PFN. NOTE: we are in real 37 + * mode, we could potentially race with page table updates. 38 + */ 39 + static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) 40 + { 41 + pte_t *ptep; 42 + unsigned long flags; 43 + struct mm_struct *mm; 44 + 45 + if (user_mode(regs)) 46 + mm = current->mm; 47 + else 48 + mm = &init_mm; 49 + 50 + local_irq_save(flags); 51 + if (mm == current->mm) 52 + ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); 53 + else 54 + ptep = find_init_mm_pte(addr, NULL); 55 + local_irq_restore(flags); 56 + if (!ptep || pte_special(*ptep)) 57 + return ULONG_MAX; 58 + return pte_pfn(*ptep); 59 + } 30 60 31 61 static void flush_tlb_206(unsigned int num_sets, unsigned int action) 32 62 { ··· 158 128 { 159 129 unsigned int num_sets; 160 130 161 - if (radix_enabled()) 131 + if (early_radix_enabled()) 162 132 num_sets = POWER9_TLB_SETS_RADIX; 163 133 else 164 134 num_sets = POWER9_TLB_SETS_HASH; ··· 168 138 169 139 170 140 /* flush SLBs and reload */ 171 - #ifdef CONFIG_PPC_STD_MMU_64 141 + #ifdef CONFIG_PPC_BOOK3S_64 172 142 static void flush_and_reload_slb(void) 173 143 { 174 144 struct slb_shadow *slb; ··· 215 185 216 186 static int mce_flush(int what) 217 187 { 218 - #ifdef CONFIG_PPC_STD_MMU_64 188 + #ifdef CONFIG_PPC_BOOK3S_64 219 189 if (what == MCE_FLUSH_SLB) { 220 190 flush_and_reload_slb(); 221 191 return 1; ··· 451 421 MCE_INITIATOR_CPU, MCE_SEV_ERROR_SYNC, }, 452 422 { 0, false, 0, 0, 0, 0 } }; 453 423 424 + static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, 425 + uint64_t *phys_addr) 426 + { 427 + /* 428 + * Carefully look at the NIP to determine 429 + * the instruction to analyse. 
Reading the NIP 430 + * in real-mode is tricky and can lead to recursive 431 + * faults 432 + */ 433 + int instr; 434 + unsigned long pfn, instr_addr; 435 + struct instruction_op op; 436 + struct pt_regs tmp = *regs; 437 + 438 + pfn = addr_to_pfn(regs, regs->nip); 439 + if (pfn != ULONG_MAX) { 440 + instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK); 441 + instr = *(unsigned int *)(instr_addr); 442 + if (!analyse_instr(&op, &tmp, instr)) { 443 + pfn = addr_to_pfn(regs, op.ea); 444 + *addr = op.ea; 445 + *phys_addr = (pfn << PAGE_SHIFT); 446 + return 0; 447 + } 448 + /* 449 + * analyse_instr() might fail if the instruction 450 + * is not a load/store, although this is unexpected 451 + * for load/store errors or if we got the NIP 452 + * wrong 453 + */ 454 + } 455 + *addr = 0; 456 + return -1; 457 + } 458 + 454 459 static int mce_handle_ierror(struct pt_regs *regs, 455 460 const struct mce_ierror_table table[], 456 - struct mce_error_info *mce_err, uint64_t *addr) 461 + struct mce_error_info *mce_err, uint64_t *addr, 462 + uint64_t *phys_addr) 457 463 { 458 464 uint64_t srr1 = regs->msr; 459 465 int handled = 0; ··· 541 475 } 542 476 mce_err->severity = table[i].severity; 543 477 mce_err->initiator = table[i].initiator; 544 - if (table[i].nip_valid) 478 + if (table[i].nip_valid) { 545 479 *addr = regs->nip; 480 + if (mce_err->severity == MCE_SEV_ERROR_SYNC && 481 + table[i].error_type == MCE_ERROR_TYPE_UE) { 482 + unsigned long pfn; 483 + 484 + if (get_paca()->in_mce < MAX_MCE_DEPTH) { 485 + pfn = addr_to_pfn(regs, regs->nip); 486 + if (pfn != ULONG_MAX) { 487 + *phys_addr = 488 + (pfn << PAGE_SHIFT); 489 + handled = 1; 490 + } 491 + } 492 + } 493 + } 546 494 return handled; 547 495 } 548 496 ··· 569 489 570 490 static int mce_handle_derror(struct pt_regs *regs, 571 491 const struct mce_derror_table table[], 572 - struct mce_error_info *mce_err, uint64_t *addr) 492 + struct mce_error_info *mce_err, uint64_t *addr, 493 + uint64_t *phys_addr) 573 494 { 574 495 
uint64_t dsisr = regs->dsisr; 575 496 int handled = 0; ··· 636 555 mce_err->initiator = table[i].initiator; 637 556 if (table[i].dar_valid) 638 557 *addr = regs->dar; 639 - 558 + else if (mce_err->severity == MCE_SEV_ERROR_SYNC && 559 + table[i].error_type == MCE_ERROR_TYPE_UE) { 560 + /* 561 + * We do a maximum of 4 nested MCE calls, see 562 + * kernel/exception-64s.h 563 + */ 564 + if (get_paca()->in_mce < MAX_MCE_DEPTH) 565 + if (!mce_find_instr_ea_and_pfn(regs, addr, 566 + phys_addr)) 567 + handled = 1; 568 + } 640 569 found = 1; 641 570 } 642 571 ··· 683 592 const struct mce_ierror_table itable[]) 684 593 { 685 594 struct mce_error_info mce_err = { 0 }; 686 - uint64_t addr; 595 + uint64_t addr, phys_addr; 687 596 uint64_t srr1 = regs->msr; 688 597 long handled; 689 598 690 599 if (SRR1_MC_LOADSTORE(srr1)) 691 - handled = mce_handle_derror(regs, dtable, &mce_err, &addr); 600 + handled = mce_handle_derror(regs, dtable, &mce_err, &addr, 601 + &phys_addr); 692 602 else 693 - handled = mce_handle_ierror(regs, itable, &mce_err, &addr); 603 + handled = mce_handle_ierror(regs, itable, &mce_err, &addr, 604 + &phys_addr); 694 605 695 606 if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE) 696 607 handled = mce_handle_ue_error(regs); 697 608 698 - save_mce_event(regs, handled, &mce_err, regs->nip, addr); 609 + save_mce_event(regs, handled, &mce_err, regs->nip, addr, phys_addr); 699 610 700 611 return handled; 701 612 }
+2 -1
arch/powerpc/kernel/module_64.c
··· 429 429 /* Find this stub, or if that fails, the next avail. entry */ 430 430 stubs = (void *)sechdrs[me->arch.stubs_section].sh_addr; 431 431 for (i = 0; stub_func_addr(stubs[i].funcdata); i++) { 432 - BUG_ON(i >= num_stubs); 432 + if (WARN_ON(i >= num_stubs)) 433 + return 0; 433 434 434 435 if (stub_func_addr(stubs[i].funcdata) == func_addr(addr)) 435 436 return (unsigned long)&stubs[i];
+3 -12
arch/powerpc/kernel/optprobes.c
··· 115 115 static void optimized_callback(struct optimized_kprobe *op, 116 116 struct pt_regs *regs) 117 117 { 118 - struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); 119 - unsigned long flags; 120 - 121 118 /* This is possible if op is under delayed unoptimizing */ 122 119 if (kprobe_disabled(&op->kp)) 123 120 return; 124 121 125 - local_irq_save(flags); 126 - hard_irq_disable(); 122 + preempt_disable(); 127 123 128 124 if (kprobe_running()) { 129 125 kprobes_inc_nmissed_count(&op->kp); 130 126 } else { 131 127 __this_cpu_write(current_kprobe, &op->kp); 132 128 regs->nip = (unsigned long)op->kp.addr; 133 - kcb->kprobe_status = KPROBE_HIT_ACTIVE; 129 + get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE; 134 130 opt_pre_handler(&op->kp, regs); 135 131 __this_cpu_write(current_kprobe, NULL); 136 132 } 137 133 138 - /* 139 - * No need for an explicit __hard_irq_enable() here. 140 - * local_irq_restore() will re-enable interrupts, 141 - * if they were hard disabled. 142 - */ 143 - local_irq_restore(flags); 134 + preempt_enable_no_resched(); 144 135 } 145 136 NOKPROBE_SYMBOL(optimized_callback); 146 137
+8 -8
arch/powerpc/kernel/paca.c
··· 90 90 91 91 #endif /* CONFIG_PPC_BOOK3S */ 92 92 93 - #ifdef CONFIG_PPC_STD_MMU_64 93 + #ifdef CONFIG_PPC_BOOK3S_64 94 94 95 95 /* 96 96 * 3 persistent SLBs are registered here. The buffer will be zero ··· 135 135 return s; 136 136 } 137 137 138 - #else /* CONFIG_PPC_STD_MMU_64 */ 138 + #else /* !CONFIG_PPC_BOOK3S_64 */ 139 139 140 140 static void __init allocate_slb_shadows(int nr_cpus, int limit) { } 141 141 142 - #endif /* CONFIG_PPC_STD_MMU_64 */ 142 + #endif /* CONFIG_PPC_BOOK3S_64 */ 143 143 144 144 /* The Paca is an array with one entry per processor. Each contains an 145 145 * lppaca, which contains the information shared between the ··· 170 170 new_paca->kexec_state = KEXEC_STATE_NONE; 171 171 new_paca->__current = &init_task; 172 172 new_paca->data_offset = 0xfeeeeeeeeeeeeeeeULL; 173 - #ifdef CONFIG_PPC_STD_MMU_64 173 + #ifdef CONFIG_PPC_BOOK3S_64 174 174 new_paca->slb_shadow_ptr = init_slb_shadow(cpu); 175 - #endif /* CONFIG_PPC_STD_MMU_64 */ 175 + #endif 176 176 177 177 #ifdef CONFIG_PPC_BOOK3E 178 178 /* For now -- if we have threads this will be adjusted later */ ··· 262 262 263 263 get_paca()->mm_ctx_id = context->id; 264 264 #ifdef CONFIG_PPC_MM_SLICES 265 - VM_BUG_ON(!mm->context.addr_limit); 266 - get_paca()->addr_limit = mm->context.addr_limit; 265 + VM_BUG_ON(!mm->context.slb_addr_limit); 266 + get_paca()->mm_ctx_slb_addr_limit = mm->context.slb_addr_limit; 267 267 get_paca()->mm_ctx_low_slices_psize = context->low_slices_psize; 268 268 memcpy(&get_paca()->mm_ctx_high_slices_psize, 269 269 &context->high_slices_psize, TASK_SLICE_ARRAY_SZ(mm)); ··· 271 271 get_paca()->mm_ctx_user_psize = context->user_psize; 272 272 get_paca()->mm_ctx_sllp = context->sllp; 273 273 #endif 274 - #else /* CONFIG_PPC_BOOK3S */ 274 + #else /* !CONFIG_PPC_BOOK3S */ 275 275 return; 276 276 #endif 277 277 }
+2 -2
arch/powerpc/kernel/pci_64.c
··· 90 90 * to do an appropriate TLB flush here too 91 91 */ 92 92 if (bus->self) { 93 - #ifdef CONFIG_PPC_STD_MMU_64 93 + #ifdef CONFIG_PPC_BOOK3S_64 94 94 struct resource *res = bus->resource[0]; 95 95 #endif 96 96 97 97 pr_debug("IO unmapping for PCI-PCI bridge %s\n", 98 98 pci_name(bus->self)); 99 99 100 - #ifdef CONFIG_PPC_STD_MMU_64 100 + #ifdef CONFIG_PPC_BOOK3S_64 101 101 __flush_hash_table_range(&init_mm, res->start + _IO_BASE, 102 102 res->end + _IO_BASE + 1); 103 103 #endif
+190 -35
arch/powerpc/kernel/process.c
··· 77 77 extern unsigned long _get_SP(void); 78 78 79 79 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 80 + /* 81 + * Are we running in "Suspend disabled" mode? If so we have to block any 82 + * sigreturn that would get us into suspended state, and we also warn in some 83 + * other paths that we should never reach with suspend disabled. 84 + */ 85 + bool tm_suspend_disabled __ro_after_init = false; 86 + 80 87 static void check_if_tm_restore_required(struct task_struct *tsk) 81 88 { 82 89 /* ··· 104 97 { 105 98 return MSR_TM_ACTIVE(msr); 106 99 } 100 + 101 + static bool tm_active_with_fp(struct task_struct *tsk) 102 + { 103 + return msr_tm_active(tsk->thread.regs->msr) && 104 + (tsk->thread.ckpt_regs.msr & MSR_FP); 105 + } 106 + 107 + static bool tm_active_with_altivec(struct task_struct *tsk) 108 + { 109 + return msr_tm_active(tsk->thread.regs->msr) && 110 + (tsk->thread.ckpt_regs.msr & MSR_VEC); 111 + } 107 112 #else 108 113 static inline bool msr_tm_active(unsigned long msr) { return false; } 109 114 static inline void check_if_tm_restore_required(struct task_struct *tsk) { } 115 + static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; } 116 + static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; } 110 117 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ 111 118 112 119 bool strict_msr_control; ··· 253 232 254 233 static int restore_fp(struct task_struct *tsk) 255 234 { 256 - if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) { 235 + if (tsk->thread.load_fp || tm_active_with_fp(tsk)) { 257 236 load_fp_state(&current->thread.fp_state); 258 237 current->thread.load_fp++; 259 238 return 1; ··· 335 314 static int restore_altivec(struct task_struct *tsk) 336 315 { 337 316 if (cpu_has_feature(CPU_FTR_ALTIVEC) && 338 - (tsk->thread.load_vec || msr_tm_active(tsk->thread.regs->msr))) { 317 + (tsk->thread.load_vec || tm_active_with_altivec(tsk))) { 339 318 load_vr_state(&tsk->thread.vr_state); 340 319 tsk->thread.used_vr = 
1; 341 320 tsk->thread.load_vec++; ··· 874 853 if (!MSR_TM_SUSPENDED(mfmsr())) 875 854 return; 876 855 856 + giveup_all(container_of(thr, struct task_struct, thread)); 857 + 858 + tm_reclaim(thr, cause); 859 + 877 860 /* 878 861 * If we are in a transaction and FP is off then we can't have 879 862 * used FP inside that transaction. Hence the checkpointed ··· 896 871 if ((thr->ckpt_regs.msr & MSR_VEC) == 0) 897 872 memcpy(&thr->ckvr_state, &thr->vr_state, 898 873 sizeof(struct thread_vr_state)); 899 - 900 - giveup_all(container_of(thr, struct task_struct, thread)); 901 - 902 - tm_reclaim(thr, thr->ckpt_regs.msr, cause); 903 874 } 904 875 905 876 void tm_reclaim_current(uint8_t cause) ··· 924 903 if (!MSR_TM_ACTIVE(thr->regs->msr)) 925 904 goto out_and_saveregs; 926 905 906 + WARN_ON(tm_suspend_disabled); 907 + 927 908 TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, " 928 909 "ccr=%lx, msr=%lx, trap=%lx)\n", 929 910 tsk->pid, thr->regs->nip, ··· 946 923 tm_save_sprs(thr); 947 924 } 948 925 949 - extern void __tm_recheckpoint(struct thread_struct *thread, 950 - unsigned long orig_msr); 926 + extern void __tm_recheckpoint(struct thread_struct *thread); 951 927 952 - void tm_recheckpoint(struct thread_struct *thread, 953 - unsigned long orig_msr) 928 + void tm_recheckpoint(struct thread_struct *thread) 954 929 { 955 930 unsigned long flags; 956 931 ··· 967 946 */ 968 947 tm_restore_sprs(thread); 969 948 970 - __tm_recheckpoint(thread, orig_msr); 949 + __tm_recheckpoint(thread); 971 950 972 951 local_irq_restore(flags); 973 952 } 974 953 975 954 static inline void tm_recheckpoint_new_task(struct task_struct *new) 976 955 { 977 - unsigned long msr; 978 - 979 956 if (!cpu_has_feature(CPU_FTR_TM)) 980 957 return; 981 958 ··· 992 973 tm_restore_sprs(&new->thread); 993 974 return; 994 975 } 995 - msr = new->thread.ckpt_regs.msr; 996 976 /* Recheckpoint to restore original checkpointed register state. 
*/ 997 - TM_DEBUG("*** tm_recheckpoint of pid %d " 998 - "(new->msr 0x%lx, new->origmsr 0x%lx)\n", 999 - new->pid, new->thread.regs->msr, msr); 977 + TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n", 978 + new->pid, new->thread.regs->msr); 1000 979 1001 - tm_recheckpoint(&new->thread, msr); 980 + tm_recheckpoint(&new->thread); 1002 981 1003 982 /* 1004 983 * The checkpointed state has been restored but the live state has ··· 1136 1119 if (old_thread->tar != new_thread->tar) 1137 1120 mtspr(SPRN_TAR, new_thread->tar); 1138 1121 } 1122 + 1123 + if (cpu_has_feature(CPU_FTR_ARCH_300) && 1124 + old_thread->tidr != new_thread->tidr) 1125 + mtspr(SPRN_TIDR, new_thread->tidr); 1139 1126 #endif 1140 1127 } 1141 1128 ··· 1176 1155 } 1177 1156 #endif /* CONFIG_PPC64 */ 1178 1157 1179 - #ifdef CONFIG_PPC_STD_MMU_64 1158 + #ifdef CONFIG_PPC_BOOK3S_64 1180 1159 batch = this_cpu_ptr(&ppc64_tlb_batch); 1181 1160 if (batch->active) { 1182 1161 current_thread_info()->local_flags |= _TLF_LAZY_MMU; ··· 1184 1163 __flush_tlb_pending(batch); 1185 1164 batch->active = 0; 1186 1165 } 1187 - #endif /* CONFIG_PPC_STD_MMU_64 */ 1166 + #endif /* CONFIG_PPC_BOOK3S_64 */ 1188 1167 1189 1168 #ifdef CONFIG_PPC_ADV_DEBUG_REGS 1190 1169 switch_booke_debug_regs(&new->thread.debug); ··· 1230 1209 1231 1210 last = _switch(old_thread, new_thread); 1232 1211 1233 - #ifdef CONFIG_PPC_STD_MMU_64 1212 + #ifdef CONFIG_PPC_BOOK3S_64 1234 1213 if (current_thread_info()->local_flags & _TLF_LAZY_MMU) { 1235 1214 current_thread_info()->local_flags &= ~_TLF_LAZY_MMU; 1236 1215 batch = this_cpu_ptr(&ppc64_tlb_batch); ··· 1244 1223 * The copy-paste buffer can only store into foreign real 1245 1224 * addresses, so unprivileged processes can not see the 1246 1225 * data or use it in any way unless they have foreign real 1247 - * mappings. We don't have a VAS driver that allocates those 1248 - * yet, so no cpabort is required. 1226 + * mappings. 
If the new process has the foreign real address 1227 + * mappings, we must issue a cp_abort to clear any state and 1228 + * prevent snooping, corruption or a covert channel. 1229 + * 1230 + * DD1 allows paste into normal system memory so we do an 1231 + * unpaired copy, rather than cp_abort, to clear the buffer, 1232 + * since cp_abort is quite expensive. 1249 1233 */ 1250 - if (cpu_has_feature(CPU_FTR_POWER9_DD1)) { 1251 - /* 1252 - * DD1 allows paste into normal system memory, so we 1253 - * do an unpaired copy here to clear the buffer and 1254 - * prevent a covert channel being set up. 1255 - * 1256 - * cpabort is not used because it is quite expensive. 1257 - */ 1234 + if (current_thread_info()->task->thread.used_vas) { 1235 + asm volatile(PPC_CP_ABORT); 1236 + } else if (cpu_has_feature(CPU_FTR_POWER9_DD1)) { 1258 1237 asm volatile(PPC_COPY(%0, %1) 1259 1238 : : "r"(dummy_copy_buffer), "r"(0)); 1260 1239 } 1261 1240 } 1262 - #endif /* CONFIG_PPC_STD_MMU_64 */ 1241 + #endif /* CONFIG_PPC_BOOK3S_64 */ 1263 1242 1264 1243 return last; 1265 1244 } ··· 1455 1434 #endif /* CONFIG_HAVE_HW_BREAKPOINT */ 1456 1435 } 1457 1436 1437 + int set_thread_uses_vas(void) 1438 + { 1439 + #ifdef CONFIG_PPC_BOOK3S_64 1440 + if (!cpu_has_feature(CPU_FTR_ARCH_300)) 1441 + return -EINVAL; 1442 + 1443 + current->thread.used_vas = 1; 1444 + 1445 + /* 1446 + * Even a process that has no foreign real address mapping can use 1447 + * an unpaired COPY instruction (to no real effect). Issue CP_ABORT 1448 + * to clear any pending COPY and prevent a covert channel. 1449 + * 1450 + * __switch_to() will issue CP_ABORT on future context switches. 1451 + */ 1452 + asm volatile(PPC_CP_ABORT); 1453 + 1454 + #endif /* CONFIG_PPC_BOOK3S_64 */ 1455 + return 0; 1456 + } 1457 + 1458 + #ifdef CONFIG_PPC64 1459 + static DEFINE_SPINLOCK(vas_thread_id_lock); 1460 + static DEFINE_IDA(vas_thread_ida); 1461 + 1462 + /* 1463 + * We need to assign a unique thread id to each thread in a process. 
1464 + * 1465 + * This thread id, referred to as TIDR, and separate from the Linux's tgid, 1466 + * is intended to be used to direct an ASB_Notify from the hardware to the 1467 + * thread, when a suitable event occurs in the system. 1468 + * 1469 + * One such event is a "paste" instruction in the context of Fast Thread 1470 + * Wakeup (aka Core-to-core wake up in the Virtual Accelerator Switchboard 1471 + * (VAS) in POWER9. 1472 + * 1473 + * To get a unique TIDR per process we could simply reuse task_pid_nr() but 1474 + * the problem is that task_pid_nr() is not yet available copy_thread() is 1475 + * called. Fixing that would require changing more intrusive arch-neutral 1476 + * code in code path in copy_process()?. 1477 + * 1478 + * Further, to assign unique TIDRs within each process, we need an atomic 1479 + * field (or an IDR) in task_struct, which again intrudes into the arch- 1480 + * neutral code. So try to assign globally unique TIDRs for now. 1481 + * 1482 + * NOTE: TIDR 0 indicates that the thread does not need a TIDR value. 1483 + * For now, only threads that expect to be notified by the VAS 1484 + * hardware need a TIDR value and we assign values > 0 for those. 
1485 + */ 1486 + #define MAX_THREAD_CONTEXT ((1 << 16) - 1) 1487 + static int assign_thread_tidr(void) 1488 + { 1489 + int index; 1490 + int err; 1491 + 1492 + again: 1493 + if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL)) 1494 + return -ENOMEM; 1495 + 1496 + spin_lock(&vas_thread_id_lock); 1497 + err = ida_get_new_above(&vas_thread_ida, 1, &index); 1498 + spin_unlock(&vas_thread_id_lock); 1499 + 1500 + if (err == -EAGAIN) 1501 + goto again; 1502 + else if (err) 1503 + return err; 1504 + 1505 + if (index > MAX_THREAD_CONTEXT) { 1506 + spin_lock(&vas_thread_id_lock); 1507 + ida_remove(&vas_thread_ida, index); 1508 + spin_unlock(&vas_thread_id_lock); 1509 + return -ENOMEM; 1510 + } 1511 + 1512 + return index; 1513 + } 1514 + 1515 + static void free_thread_tidr(int id) 1516 + { 1517 + spin_lock(&vas_thread_id_lock); 1518 + ida_remove(&vas_thread_ida, id); 1519 + spin_unlock(&vas_thread_id_lock); 1520 + } 1521 + 1522 + /* 1523 + * Clear any TIDR value assigned to this thread. 1524 + */ 1525 + void clear_thread_tidr(struct task_struct *t) 1526 + { 1527 + if (!t->thread.tidr) 1528 + return; 1529 + 1530 + if (!cpu_has_feature(CPU_FTR_ARCH_300)) { 1531 + WARN_ON_ONCE(1); 1532 + return; 1533 + } 1534 + 1535 + mtspr(SPRN_TIDR, 0); 1536 + free_thread_tidr(t->thread.tidr); 1537 + t->thread.tidr = 0; 1538 + } 1539 + 1540 + void arch_release_task_struct(struct task_struct *t) 1541 + { 1542 + clear_thread_tidr(t); 1543 + } 1544 + 1545 + /* 1546 + * Assign a unique TIDR (thread id) for task @t and set it in the thread 1547 + * structure. For now, we only support setting TIDR for 'current' task. 
1548 + */ 1549 + int set_thread_tidr(struct task_struct *t) 1550 + { 1551 + if (!cpu_has_feature(CPU_FTR_ARCH_300)) 1552 + return -EINVAL; 1553 + 1554 + if (t != current) 1555 + return -EINVAL; 1556 + 1557 + t->thread.tidr = assign_thread_tidr(); 1558 + if (t->thread.tidr < 0) 1559 + return t->thread.tidr; 1560 + 1561 + mtspr(SPRN_TIDR, t->thread.tidr); 1562 + 1563 + return 0; 1564 + } 1565 + 1566 + #endif /* CONFIG_PPC64 */ 1567 + 1458 1568 void 1459 1569 release_thread(struct task_struct *t) 1460 1570 { ··· 1619 1467 1620 1468 static void setup_ksp_vsid(struct task_struct *p, unsigned long sp) 1621 1469 { 1622 - #ifdef CONFIG_PPC_STD_MMU_64 1470 + #ifdef CONFIG_PPC_BOOK3S_64 1623 1471 unsigned long sp_vsid; 1624 1472 unsigned long llp = mmu_psize_defs[mmu_linear_psize].sllp; 1625 1473 ··· 1732 1580 } 1733 1581 if (cpu_has_feature(CPU_FTR_HAS_PPR)) 1734 1582 p->thread.ppr = INIT_PPR; 1583 + 1584 + p->thread.tidr = 0; 1735 1585 #endif 1736 1586 kregs->nip = ppc_function_entry(f); 1737 1587 return 0; ··· 2052 1898 2053 1899 do { 2054 1900 sp = *(unsigned long *)sp; 2055 - if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD)) 1901 + if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) || 1902 + p->state == TASK_RUNNING) 2056 1903 return 0; 2057 1904 if (count > 0) { 2058 1905 ip = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE]; ··· 2201 2046 unsigned long base = mm->brk; 2202 2047 unsigned long ret; 2203 2048 2204 - #ifdef CONFIG_PPC_STD_MMU_64 2049 + #ifdef CONFIG_PPC_BOOK3S_64 2205 2050 /* 2206 2051 * If we are using 1TB segments and we are allowed to randomise 2207 2052 * the heap, we can put it above 1TB so it is backed by a 1TB
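The TIDR code above allocates bounded, globally unique ids: it retries `ida_get_new_above()` under a spinlock and frees the id again if it landed above MAX_THREAD_CONTEXT. The same bounded unique-id pattern can be sketched outside the kernel with a trivial bitmap (hypothetical userspace stand-in; the kernel's IDA is far more scalable and this sketch is not locked):

```c
#define MAX_THREAD_CONTEXT ((1 << 16) - 1)

/* Hypothetical userspace stand-in for the kernel IDA: one bit per id. */
static unsigned char tidr_map[(MAX_THREAD_CONTEXT + 1) / 8 + 1];

/* Allocate the lowest free id >= 1, mirroring ida_get_new_above(.., 1, ..). */
static int assign_thread_tidr(void)
{
	int id;

	for (id = 1; id <= MAX_THREAD_CONTEXT; id++) {
		if (!(tidr_map[id / 8] & (1 << (id % 8)))) {
			tidr_map[id / 8] |= 1 << (id % 8);
			return id;
		}
	}
	return -1;	/* id space exhausted, like index > MAX_THREAD_CONTEXT */
}

/* Return an id to the pool, mirroring ida_remove(). */
static void free_thread_tidr(int id)
{
	tidr_map[id / 8] &= ~(1 << (id % 8));
}
```

As in the kernel code, id 0 is never handed out, so a zero `thread.tidr` can keep meaning "no TIDR assigned".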
+36 -1
arch/powerpc/kernel/prom.c
··· 47 47 #include <asm/mmu.h> 48 48 #include <asm/paca.h> 49 49 #include <asm/pgtable.h> 50 + #include <asm/powernv.h> 50 51 #include <asm/iommu.h> 51 52 #include <asm/btext.h> 52 53 #include <asm/sections.h> ··· 229 228 ibm_pa_features, ARRAY_SIZE(ibm_pa_features)); 230 229 } 231 230 232 - #ifdef CONFIG_PPC_STD_MMU_64 231 + #ifdef CONFIG_PPC_BOOK3S_64 233 232 static void __init init_mmu_slb_size(unsigned long node) 234 233 { 235 234 const __be32 *slb_size_ptr; ··· 659 658 #endif 660 659 } 661 660 661 + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 662 + static bool tm_disabled __initdata; 663 + 664 + static int __init parse_ppc_tm(char *str) 665 + { 666 + bool res; 667 + 668 + if (kstrtobool(str, &res)) 669 + return -EINVAL; 670 + 671 + tm_disabled = !res; 672 + 673 + return 0; 674 + } 675 + early_param("ppc_tm", parse_ppc_tm); 676 + 677 + static void __init tm_init(void) 678 + { 679 + if (tm_disabled) { 680 + pr_info("Disabling hardware transactional memory (HTM)\n"); 681 + cur_cpu_spec->cpu_user_features2 &= 682 + ~(PPC_FEATURE2_HTM_NOSC | PPC_FEATURE2_HTM); 683 + cur_cpu_spec->cpu_features &= ~CPU_FTR_TM; 684 + return; 685 + } 686 + 687 + pnv_tm_init(); 688 + } 689 + #else 690 + static void tm_init(void) { } 691 + #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ 692 + 662 693 void __init early_init_devtree(void *params) 663 694 { 664 695 phys_addr_t limit; ··· 799 766 if (of_flat_dt_is_compatible(of_get_flat_dt_root(), "sony,ps3")) 800 767 powerpc_firmware_features |= FW_FEATURE_PS3_POSSIBLE; 801 768 #endif 769 + 770 + tm_init(); 802 771 803 772 DBG(" <- early_init_devtree()\n"); 804 773 }
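The new `ppc_tm=` early parameter above is parsed with `kstrtobool()`, so `ppc_tm=off` sets `tm_disabled` and causes `tm_init()` to strip the HTM feature bits. A rough userspace approximation of that parse path (sketch only; the real `kstrtobool()` lives in lib/kstrtox.c and this simplified copy only handles the common y/n/1/0/on/off spellings):

```c
#include <stdbool.h>

/* Simplified sketch of kstrtobool(): y/Y/1/on -> true, n/N/0/off -> false. */
static int kstrtobool_sketch(const char *s, bool *res)
{
	if (!s)
		return -1;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') { *res = true;  return 0; }
		if (s[1] == 'f' || s[1] == 'F') { *res = false; return 0; }
		return -1;
	default:
		return -1;
	}
}

static bool tm_disabled;

/* Mirrors parse_ppc_tm(): "ppc_tm=off" ends up setting tm_disabled. */
static int parse_ppc_tm(const char *str)
{
	bool res;

	if (kstrtobool_sketch(str, &res))
		return -1;
	tm_disabled = !res;
	return 0;
}
```

Note the inversion: the parameter expresses "TM enabled?", while the flag records "TM disabled?", which is why `tm_disabled = !res`.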
+4 -3
arch/powerpc/kernel/setup-common.c
··· 773 773 static __init void print_system_info(void) 774 774 { 775 775 pr_info("-----------------------------------------------------\n"); 776 - #ifdef CONFIG_PPC_STD_MMU_64 776 + #ifdef CONFIG_PPC_BOOK3S_64 777 777 pr_info("ppc64_pft_size = 0x%llx\n", ppc64_pft_size); 778 778 #endif 779 779 #ifdef CONFIG_PPC_STD_MMU_32 ··· 800 800 pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features); 801 801 #endif 802 802 803 - #ifdef CONFIG_PPC_STD_MMU_64 803 + #ifdef CONFIG_PPC_BOOK3S_64 804 804 if (htab_address) 805 805 pr_info("htab_address = 0x%p\n", htab_address); 806 806 if (htab_hash_mask) ··· 898 898 899 899 #ifdef CONFIG_PPC_MM_SLICES 900 900 #ifdef CONFIG_PPC64 901 - init_mm.context.addr_limit = DEFAULT_MAP_WINDOW_USER64; 901 + if (!radix_enabled()) 902 + init_mm.context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64; 902 903 #else 903 904 #error "context.addr_limit not initialized." 904 905 #endif
+6
arch/powerpc/kernel/setup.h
··· 45 45 static inline void emergency_stack_init(void) { }; 46 46 #endif 47 47 48 + #ifdef CONFIG_PPC64 49 + void record_spr_defaults(void); 50 + #else 51 + static inline void record_spr_defaults(void) { }; 52 + #endif 53 + 48 54 /* 49 55 * Having this in kvm_ppc.h makes include dependencies too 50 56 * tricky to solve for setup-common.c so have it here.
+18 -1
arch/powerpc/kernel/setup_64.c
··· 69 69 #include <asm/opal.h> 70 70 #include <asm/cputhreads.h> 71 71 72 + #include "setup.h" 73 + 72 74 #ifdef DEBUG 73 75 #define DBG(fmt...) udbg_printf(fmt) 74 76 #else ··· 319 317 early_init_mmu(); 320 318 321 319 /* 320 + * After firmware and early platform setup code has set things up, 321 + * we note the SPR values for configurable control/performance 322 + * registers, and use those as initial defaults. 323 + */ 324 + record_spr_defaults(); 325 + 326 + /* 322 327 * At this point, we can let interrupts switch to virtual mode 323 328 * (the MMU has been setup), so adjust the MSR in the PACA to 324 329 * have IR and DR set and enable AIL if it exists ··· 369 360 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE) 370 361 static bool use_spinloop(void) 371 362 { 372 - if (!IS_ENABLED(CONFIG_PPC_BOOK3E)) 363 + if (IS_ENABLED(CONFIG_PPC_BOOK3S)) { 364 + /* 365 + * See comments in head_64.S -- not all platforms insert 366 + * secondaries at __secondary_hold and wait at the spin 367 + * loop. 368 + */ 369 + if (firmware_has_feature(FW_FEATURE_OPAL)) 370 + return false; 373 371 return true; 372 + } 374 373 375 374 /* 376 375 * When book3e boots from kexec, the ePAPR spin table does
+1 -1
arch/powerpc/kernel/signal.c
··· 103 103 static void do_signal(struct task_struct *tsk) 104 104 { 105 105 sigset_t *oldset = sigmask_to_save(); 106 - struct ksignal ksig; 106 + struct ksignal ksig = { .sig = 0 }; 107 107 int ret; 108 108 int is32 = is_32bit_task(); 109 109
+5 -1
arch/powerpc/kernel/signal_32.c
··· 519 519 { 520 520 unsigned long msr = regs->msr; 521 521 522 + WARN_ON(tm_suspend_disabled); 523 + 522 524 /* Remove TM bits from thread's MSR. The MSR in the sigcontext 523 525 * just indicates to userland that we were doing a transaction, but we 524 526 * don't want to return in transactional state. This also ensures ··· 771 769 int i; 772 770 #endif 773 771 772 + if (tm_suspend_disabled) 773 + return 1; 774 774 /* 775 775 * restore general registers but not including MSR or SOFTE. Also 776 776 * take care of keeping r2 (TLS) intact if not a signal. ··· 880 876 /* Make sure the transaction is marked as failed */ 881 877 current->thread.tm_texasr |= TEXASR_FS; 882 878 /* This loads the checkpointed FP/VEC state, if used */ 883 - tm_recheckpoint(&current->thread, msr); 879 + tm_recheckpoint(&current->thread); 884 880 885 881 /* This loads the speculative FP/VEC state, if used */ 886 882 msr_check_and_set(msr & (MSR_FP | MSR_VEC));
+6 -1
arch/powerpc/kernel/signal_64.c
··· 214 214 215 215 BUG_ON(!MSR_TM_ACTIVE(regs->msr)); 216 216 217 + WARN_ON(tm_suspend_disabled); 218 + 217 219 /* Remove TM bits from thread's MSR. The MSR in the sigcontext 218 220 * just indicates to userland that we were doing a transaction, but we 219 221 * don't want to return in transactional state. This also ensures ··· 432 430 433 431 BUG_ON(tsk != current); 434 432 433 + if (tm_suspend_disabled) 434 + return -EINVAL; 435 + 435 436 /* copy the GPRs */ 436 437 err |= __copy_from_user(regs->gpr, tm_sc->gp_regs, sizeof(regs->gpr)); 437 438 err |= __copy_from_user(&tsk->thread.ckpt_regs, sc->gp_regs, ··· 563 558 /* Make sure the transaction is marked as failed */ 564 559 tsk->thread.tm_texasr |= TEXASR_FS; 565 560 /* This loads the checkpointed FP/VEC state, if used */ 566 - tm_recheckpoint(&tsk->thread, msr); 561 + tm_recheckpoint(&tsk->thread); 567 562 568 563 msr_check_and_set(msr & (MSR_FP | MSR_VEC)); 569 564 if (msr & MSR_FP) {
+11
arch/powerpc/kernel/sysfs.c
··· 590 590 if (cpu_has_feature(CPU_FTR_DSCR)) 591 591 err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default); 592 592 } 593 + 594 + void __init record_spr_defaults(void) 595 + { 596 + int cpu; 597 + 598 + if (cpu_has_feature(CPU_FTR_DSCR)) { 599 + dscr_default = mfspr(SPRN_DSCR); 600 + for (cpu = 0; cpu < nr_cpu_ids; cpu++) 601 + paca[cpu].dscr_default = dscr_default; 602 + } 603 + } 593 604 #endif /* CONFIG_PPC64 */ 594 605 595 606 #ifdef HAS_PPC_PMC_PA6T
+1 -2
arch/powerpc/kernel/tau_6xx.c
··· 230 230 231 231 232 232 /* first, set up the window shrinking timer */ 233 - init_timer(&tau_timer); 234 - tau_timer.function = tau_timeout_smp; 233 + setup_timer(&tau_timer, tau_timeout_smp, 0UL); 235 234 tau_timer.expires = jiffies + shrink_timer; 236 235 add_timer(&tau_timer); 237 236
+17 -42
arch/powerpc/kernel/tm.S
··· 80 80 blr 81 81 82 82 /* void tm_reclaim(struct thread_struct *thread, 83 - * unsigned long orig_msr, 84 83 * uint8_t cause) 85 84 * 86 85 * - Performs a full reclaim. This destroys outstanding 87 86 * transactions and updates thread->regs.tm_ckpt_* with the 88 87 * original checkpointed state. Note that thread->regs is 89 88 * unchanged. 90 - * - FP regs are written back to thread->transact_fpr before 91 - * reclaiming. These are the transactional (current) versions. 92 89 * 93 90 * Purpose is to both abort transactions of, and preserve the state of, 94 91 * a transactions at a context switch. We preserve/restore both sets of process ··· 96 99 * Call with IRQs off, stacks get all out of sync for some periods in here! 97 100 */ 98 101 _GLOBAL(tm_reclaim) 99 - mfcr r6 102 + mfcr r5 100 103 mflr r0 101 - stw r6, 8(r1) 104 + stw r5, 8(r1) 102 105 std r0, 16(r1) 103 106 std r2, STK_GOT(r1) 104 107 stdu r1, -TM_FRAME_SIZE(r1) ··· 106 109 /* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. */ 107 110 108 111 std r3, STK_PARAM(R3)(r1) 109 - std r4, STK_PARAM(R4)(r1) 110 112 SAVE_NVGPRS(r1) 111 113 112 114 /* We need to setup MSR for VSX register save instructions. */ ··· 135 139 std r1, PACAR1(r13) 136 140 137 141 /* Clear MSR RI since we are about to change r1, EE is already off. */ 138 - li r4, 0 139 - mtmsrd r4, 1 142 + li r5, 0 143 + mtmsrd r5, 1 140 144 141 145 /* 142 146 * BE CAREFUL HERE: ··· 148 152 * to user register state. (FPRs, CCR etc. also!) 149 153 * Use an sprg and a tm_scratch in the PACA to shuffle. 150 154 */ 151 - TRECLAIM(R5) /* Cause in r5 */ 155 + TRECLAIM(R4) /* Cause in r4 */ 152 156 153 157 /* ******************** GPRs ******************** */ 154 158 /* Stash the checkpointed r13 away in the scratch SPR and get the real ··· 239 243 240 244 241 245 /* ******************** FPR/VR/VSRs ************ 242 - * After reclaiming, capture the checkpointed FPRs/VRs /if used/. 243 - * 244 - * (If VSX used, FP and VMX are implied. 
Or, we don't need to look 245 - * at MSR.VSX as copying FP regs if .FP, vector regs if .VMX covers it.) 246 - * 247 - * We're passed the thread's MSR as the second parameter 246 + * After reclaiming, capture the checkpointed FPRs/VRs. 248 247 * 249 248 * We enabled VEC/FP/VSX in the msr above, so we can execute these 250 249 * instructions! 251 250 */ 252 - ld r4, STK_PARAM(R4)(r1) /* Second parameter, MSR * */ 253 251 mr r3, r12 254 - andis. r0, r4, MSR_VEC@h 255 - beq dont_backup_vec 256 252 253 + /* Altivec (VEC/VMX/VR)*/ 257 254 addi r7, r3, THREAD_CKVRSTATE 258 255 SAVE_32VRS(0, r6, r7) /* r6 scratch, r7 transact vr state */ 259 256 mfvscr v0 260 257 li r6, VRSTATE_VSCR 261 258 stvx v0, r7, r6 262 - dont_backup_vec: 259 + 260 + /* VRSAVE */ 263 261 mfspr r0, SPRN_VRSAVE 264 262 std r0, THREAD_CKVRSAVE(r3) 265 263 266 - andi. r0, r4, MSR_FP 267 - beq dont_backup_fp 268 - 264 + /* Floating Point (FP) */ 269 265 addi r7, r3, THREAD_CKFPSTATE 270 266 SAVE_32FPRS_VSRS(0, R6, R7) /* r6 scratch, r7 transact fp state */ 271 - 272 267 mffs fr0 273 268 stfd fr0,FPSTATE_FPSCR(r7) 274 269 275 - dont_backup_fp: 276 270 277 271 /* TM regs, incl TEXASR -- these live in thread_struct. Note they've 278 272 * been updated by the treclaim, to explain to userland the failure ··· 330 344 */ 331 345 subi r7, r7, STACK_FRAME_OVERHEAD 332 346 347 + /* We need to setup MSR for FP/VMX/VSX register save instructions. */ 333 348 mfmsr r6 334 - /* R4 = original MSR to indicate whether thread used FP/Vector etc. */ 335 - 336 - /* Enable FP/vec in MSR if necessary! */ 337 - lis r5, MSR_VEC@h 349 + mr r5, r6 338 350 ori r5, r5, MSR_FP 339 - and. 
r5, r4, r5 340 - beq restore_gprs /* if neither, skip both */ 341 - 351 + #ifdef CONFIG_ALTIVEC 352 + oris r5, r5, MSR_VEC@h 353 + #endif 342 354 #ifdef CONFIG_VSX 343 355 BEGIN_FTR_SECTION 344 - oris r5, r5, MSR_VSX@h 356 + oris r5,r5, MSR_VSX@h 345 357 END_FTR_SECTION_IFSET(CPU_FTR_VSX) 346 358 #endif 347 - or r5, r6, r5 /* Set MSR.FP+.VSX/.VEC */ 348 - mtmsr r5 359 + mtmsrd r5 349 360 350 361 #ifdef CONFIG_ALTIVEC 351 362 /* ··· 351 368 * thread.fp_state[] version holds the 'live' (transactional) 352 369 * and will be loaded subsequently by any FPUnavailable trap. 353 370 */ 354 - andis. r0, r4, MSR_VEC@h 355 - beq dont_restore_vec 356 - 357 371 addi r8, r3, THREAD_CKVRSTATE 358 372 li r5, VRSTATE_VSCR 359 373 lvx v0, r8, r5 360 374 mtvscr v0 361 375 REST_32VRS(0, r5, r8) /* r5 scratch, r8 ptr */ 362 - dont_restore_vec: 363 376 ld r5, THREAD_CKVRSAVE(r3) 364 377 mtspr SPRN_VRSAVE, r5 365 378 #endif 366 - 367 - andi. r0, r4, MSR_FP 368 - beq dont_restore_fp 369 379 370 380 addi r8, r3, THREAD_CKFPSTATE 371 381 lfd fr0, FPSTATE_FPSCR(r8) 372 382 MTFSF_L(fr0) 373 383 REST_32FPRS_VSRS(0, R4, R8) 374 384 375 - dont_restore_fp: 376 385 mtmsr r6 /* FP/Vec off again! */ 377 386 378 387 restore_gprs:
+2 -2
arch/powerpc/kernel/trace/ftrace_64_mprofile.S
··· 110 110 /* NIP has not been altered, skip over further checks */ 111 111 beq 1f 112 112 113 - /* Check if there is an active kprobe on us */ 113 + /* Check if there is an active jprobe on us */ 114 114 subi r3, r14, 4 115 - bl is_current_kprobe_addr 115 + bl __is_active_jprobe 116 116 nop 117 117 118 118 /*
+210 -46
arch/powerpc/kernel/traps.c
··· 37 37 #include <linux/kdebug.h> 38 38 #include <linux/ratelimit.h> 39 39 #include <linux/context_tracking.h> 40 + #include <linux/smp.h> 40 41 41 42 #include <asm/emulated_ops.h> 42 43 #include <asm/pgtable.h> ··· 700 699 die("System Management Interrupt", regs, SIGABRT); 701 700 } 702 701 702 + #ifdef CONFIG_VSX 703 + static void p9_hmi_special_emu(struct pt_regs *regs) 704 + { 705 + unsigned int ra, rb, t, i, sel, instr, rc; 706 + const void __user *addr; 707 + u8 vbuf[16], *vdst; 708 + unsigned long ea, msr, msr_mask; 709 + bool swap; 710 + 711 + if (__get_user_inatomic(instr, (unsigned int __user *)regs->nip)) 712 + return; 713 + 714 + /* 715 + * lxvb16x opcode: 0x7c0006d8 716 + * lxvd2x opcode: 0x7c000698 717 + * lxvh8x opcode: 0x7c000658 718 + * lxvw4x opcode: 0x7c000618 719 + */ 720 + if ((instr & 0xfc00073e) != 0x7c000618) { 721 + pr_devel("HMI vec emu: not vector CI %i:%s[%d] nip=%016lx" 722 + " instr=%08x\n", 723 + smp_processor_id(), current->comm, current->pid, 724 + regs->nip, instr); 725 + return; 726 + } 727 + 728 + /* Grab vector registers into the task struct */ 729 + msr = regs->msr; /* Grab msr before we flush the bits */ 730 + flush_vsx_to_thread(current); 731 + enable_kernel_altivec(); 732 + 733 + /* 734 + * Is userspace running with a different endian (this is rare but 735 + * not impossible) 736 + */ 737 + swap = (msr & MSR_LE) != (MSR_KERNEL & MSR_LE); 738 + 739 + /* Decode the instruction */ 740 + ra = (instr >> 16) & 0x1f; 741 + rb = (instr >> 11) & 0x1f; 742 + t = (instr >> 21) & 0x1f; 743 + if (instr & 1) 744 + vdst = (u8 *)&current->thread.vr_state.vr[t]; 745 + else 746 + vdst = (u8 *)&current->thread.fp_state.fpr[t][0]; 747 + 748 + /* Grab the vector address */ 749 + ea = regs->gpr[rb] + (ra ? 
regs->gpr[ra] : 0); 750 + if (is_32bit_task()) 751 + ea &= 0xfffffffful; 752 + addr = (__force const void __user *)ea; 753 + 754 + /* Check it */ 755 + if (!access_ok(VERIFY_READ, addr, 16)) { 756 + pr_devel("HMI vec emu: bad access %i:%s[%d] nip=%016lx" 757 + " instr=%08x addr=%016lx\n", 758 + smp_processor_id(), current->comm, current->pid, 759 + regs->nip, instr, (unsigned long)addr); 760 + return; 761 + } 762 + 763 + /* Read the vector */ 764 + rc = 0; 765 + if ((unsigned long)addr & 0xfUL) 766 + /* unaligned case */ 767 + rc = __copy_from_user_inatomic(vbuf, addr, 16); 768 + else 769 + __get_user_atomic_128_aligned(vbuf, addr, rc); 770 + if (rc) { 771 + pr_devel("HMI vec emu: page fault %i:%s[%d] nip=%016lx" 772 + " instr=%08x addr=%016lx\n", 773 + smp_processor_id(), current->comm, current->pid, 774 + regs->nip, instr, (unsigned long)addr); 775 + return; 776 + } 777 + 778 + pr_devel("HMI vec emu: emulated vector CI %i:%s[%d] nip=%016lx" 779 + " instr=%08x addr=%016lx\n", 780 + smp_processor_id(), current->comm, current->pid, regs->nip, 781 + instr, (unsigned long) addr); 782 + 783 + /* Grab instruction "selector" */ 784 + sel = (instr >> 6) & 3; 785 + 786 + /* 787 + * Check to make sure the facility is actually enabled. This 788 + * could happen if we get a false positive hit. 
789 + * 790 + * lxvd2x/lxvw4x always check MSR VSX sel = 0,2 791 + * lxvh8x/lxvb16x check MSR VSX or VEC depending on VSR used sel = 1,3 792 + */ 793 + msr_mask = MSR_VSX; 794 + if ((sel & 1) && (instr & 1)) /* lxvh8x & lxvb16x + VSR >= 32 */ 795 + msr_mask = MSR_VEC; 796 + if (!(msr & msr_mask)) { 797 + pr_devel("HMI vec emu: MSR fac clear %i:%s[%d] nip=%016lx" 798 + " instr=%08x msr:%016lx\n", 799 + smp_processor_id(), current->comm, current->pid, 800 + regs->nip, instr, msr); 801 + return; 802 + } 803 + 804 + /* Do logging here before we modify sel based on endian */ 805 + switch (sel) { 806 + case 0: /* lxvw4x */ 807 + PPC_WARN_EMULATED(lxvw4x, regs); 808 + break; 809 + case 1: /* lxvh8x */ 810 + PPC_WARN_EMULATED(lxvh8x, regs); 811 + break; 812 + case 2: /* lxvd2x */ 813 + PPC_WARN_EMULATED(lxvd2x, regs); 814 + break; 815 + case 3: /* lxvb16x */ 816 + PPC_WARN_EMULATED(lxvb16x, regs); 817 + break; 818 + } 819 + 820 + #ifdef __LITTLE_ENDIAN__ 821 + /* 822 + * An LE kernel stores the vector in the task struct as an LE 823 + * byte array (effectively swapping both the components and 824 + * the content of the components). Those instructions expect 825 + * the components to remain in ascending address order, so we 826 + * swap them back. 827 + * 828 + * If we are running a BE user space, the expectation is that 829 + * of a simple memcpy, so forcing the emulation to look like 830 + * a lxvb16x should do the trick. 
831 + */ 832 + if (swap) 833 + sel = 3; 834 + 835 + switch (sel) { 836 + case 0: /* lxvw4x */ 837 + for (i = 0; i < 4; i++) 838 + ((u32 *)vdst)[i] = ((u32 *)vbuf)[3-i]; 839 + break; 840 + case 1: /* lxvh8x */ 841 + for (i = 0; i < 8; i++) 842 + ((u16 *)vdst)[i] = ((u16 *)vbuf)[7-i]; 843 + break; 844 + case 2: /* lxvd2x */ 845 + for (i = 0; i < 2; i++) 846 + ((u64 *)vdst)[i] = ((u64 *)vbuf)[1-i]; 847 + break; 848 + case 3: /* lxvb16x */ 849 + for (i = 0; i < 16; i++) 850 + vdst[i] = vbuf[15-i]; 851 + break; 852 + } 853 + #else /* __LITTLE_ENDIAN__ */ 854 + /* On a big endian kernel, a BE userspace only needs a memcpy */ 855 + if (!swap) 856 + sel = 3; 857 + 858 + /* Otherwise, we need to swap the content of the components */ 859 + switch (sel) { 860 + case 0: /* lxvw4x */ 861 + for (i = 0; i < 4; i++) 862 + ((u32 *)vdst)[i] = cpu_to_le32(((u32 *)vbuf)[i]); 863 + break; 864 + case 1: /* lxvh8x */ 865 + for (i = 0; i < 8; i++) 866 + ((u16 *)vdst)[i] = cpu_to_le16(((u16 *)vbuf)[i]); 867 + break; 868 + case 2: /* lxvd2x */ 869 + for (i = 0; i < 2; i++) 870 + ((u64 *)vdst)[i] = cpu_to_le64(((u64 *)vbuf)[i]); 871 + break; 872 + case 3: /* lxvb16x */ 873 + memcpy(vdst, vbuf, 16); 874 + break; 875 + } 876 + #endif /* !__LITTLE_ENDIAN__ */ 877 + 878 + /* Go to next instruction */ 879 + regs->nip += 4; 880 + } 881 + #endif /* CONFIG_VSX */ 882 + 703 883 void handle_hmi_exception(struct pt_regs *regs) 704 884 { 705 885 struct pt_regs *old_regs; 706 886 707 887 old_regs = set_irq_regs(regs); 708 888 irq_enter(); 889 + 890 + #ifdef CONFIG_VSX 891 + /* Real mode flagged P9 special emu is needed */ 892 + if (local_paca->hmi_p9_special_emu) { 893 + local_paca->hmi_p9_special_emu = 0; 894 + 895 + /* 896 + * We don't want to take page faults while doing the 897 + * emulation, we just replay the instruction if necessary. 
898 + */ 899 + pagefault_disable(); 900 + p9_hmi_special_emu(regs); 901 + pagefault_enable(); 902 + } 903 + #endif /* CONFIG_VSX */ 709 904 710 905 if (ppc_md.handle_hmi_exception) 711 906 ppc_md.handle_hmi_exception(regs); ··· 1337 1140 * - A treclaim is attempted when non transactional. 1338 1141 * - A tend is illegally attempted. 1339 1142 * - writing a TM SPR when transactional. 1340 - */ 1341 - if (!user_mode(regs) && 1342 - report_bug(regs->nip, regs) == BUG_TRAP_TYPE_WARN) { 1343 - regs->nip += 4; 1344 - goto bail; 1345 - } 1346 - /* If usermode caused this, it's done something illegal and 1143 + * 1144 + * If usermode caused this, it's done something illegal and 1347 1145 * gets a SIGILL slap on the wrist. We call it an illegal 1348 1146 * operand to distinguish from the instruction just being bad 1349 1147 * (e.g. executing a 'tend' on a CPU without TM!); it's an ··· 1679 1487 /* Reclaim didn't save out any FPRs to transact_fprs. */ 1680 1488 1681 1489 /* Enable FP for the task: */ 1682 - regs->msr |= (MSR_FP | current->thread.fpexc_mode); 1490 + current->thread.load_fp = 1; 1683 1491 1684 1492 /* This loads and recheckpoints the FP registers from 1685 1493 * thread.fpr[]. They will remain in registers after the ··· 1687 1495 * If VMX is in use, the VRs now hold checkpointed values, 1688 1496 * so we don't want to load the VRs from the thread_struct. 
1689 1497 */ 1690 - tm_recheckpoint(&current->thread, MSR_FP); 1691 - 1692 - /* If VMX is in use, get the transactional values back */ 1693 - if (regs->msr & MSR_VEC) { 1694 - msr_check_and_set(MSR_VEC); 1695 - load_vr_state(&current->thread.vr_state); 1696 - /* At this point all the VSX state is loaded, so enable it */ 1697 - regs->msr |= MSR_VSX; 1698 - } 1498 + tm_recheckpoint(&current->thread); 1699 1499 } 1700 1500 1701 1501 void altivec_unavailable_tm(struct pt_regs *regs) ··· 1700 1516 "MSR=%lx\n", 1701 1517 regs->nip, regs->msr); 1702 1518 tm_reclaim_current(TM_CAUSE_FAC_UNAV); 1703 - regs->msr |= MSR_VEC; 1704 - tm_recheckpoint(&current->thread, MSR_VEC); 1519 + current->thread.load_vec = 1; 1520 + tm_recheckpoint(&current->thread); 1705 1521 current->thread.used_vr = 1; 1706 - 1707 - if (regs->msr & MSR_FP) { 1708 - msr_check_and_set(MSR_FP); 1709 - load_fp_state(&current->thread.fp_state); 1710 - regs->msr |= MSR_VSX; 1711 - } 1712 1522 } 1713 1523 1714 1524 void vsx_unavailable_tm(struct pt_regs *regs) 1715 1525 { 1716 - unsigned long orig_msr = regs->msr; 1717 - 1718 1526 /* See the comments in fp_unavailable_tm(). This works similarly, 1719 1527 * though we're loading both FP and VEC registers in here. 1720 1528 * ··· 1720 1544 1721 1545 current->thread.used_vsr = 1; 1722 1546 1723 - /* If FP and VMX are already loaded, we have all the state we need */ 1724 - if ((orig_msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) { 1725 - regs->msr |= MSR_VSX; 1726 - return; 1727 - } 1728 - 1729 1547 /* This reclaims FP and/or VR regs if they're already enabled */ 1730 1548 tm_reclaim_current(TM_CAUSE_FAC_UNAV); 1731 1549 1732 - regs->msr |= MSR_VEC | MSR_FP | current->thread.fpexc_mode | 1733 - MSR_VSX; 1550 + current->thread.load_vec = 1; 1551 + current->thread.load_fp = 1; 1734 1552 1735 - /* This loads & recheckpoints FP and VRs; but we have 1736 - * to be sure not to overwrite previously-valid state. 
1737 - */ 1738 - tm_recheckpoint(&current->thread, regs->msr & ~orig_msr); 1739 - 1740 - msr_check_and_set(orig_msr & (MSR_FP | MSR_VEC)); 1741 - 1742 - if (orig_msr & MSR_FP) 1743 - load_fp_state(&current->thread.fp_state); 1744 - if (orig_msr & MSR_VEC) 1745 - load_vr_state(&current->thread.vr_state); 1553 + tm_recheckpoint(&current->thread); 1746 1554 } 1747 1555 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ 1748 1556 ··· 2084 1924 WARN_EMULATED_SETUP(mfdscr), 2085 1925 WARN_EMULATED_SETUP(mtdscr), 2086 1926 WARN_EMULATED_SETUP(lq_stq), 1927 + WARN_EMULATED_SETUP(lxvw4x), 1928 + WARN_EMULATED_SETUP(lxvh8x), 1929 + WARN_EMULATED_SETUP(lxvd2x), 1930 + WARN_EMULATED_SETUP(lxvb16x), 2087 1931 #endif 2088 1932 }; 2089 1933
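The cache-inhibited vector load emulation above reads 16 bytes into `vbuf` and then, on a little-endian kernel, reverses the element order before writing the target VSR: whole 32-bit words for lxvw4x, all 16 bytes for lxvb16x, and so on. The element reversal itself is plain C and can be exercised standalone (userspace sketch of two of the four cases, not the kernel code itself):

```c
#include <stdint.h>
#include <string.h>

/* Reverse 32-bit element order, as the LE-kernel lxvw4x path does (sel == 0). */
static void emu_lxvw4x_le(uint8_t *vdst, const uint8_t *vbuf)
{
	uint32_t src[4], dst[4];
	int i;

	memcpy(src, vbuf, 16);		/* avoid aliasing/alignment pitfalls */
	for (i = 0; i < 4; i++)
		dst[i] = src[3 - i];
	memcpy(vdst, dst, 16);
}

/* Reverse the order of all 16 bytes, the lxvb16x case (sel == 3). */
static void emu_lxvb16x_le(uint8_t *vdst, const uint8_t *vbuf)
{
	int i;

	for (i = 0; i < 16; i++)
		vdst[i] = vbuf[15 - i];
}
```

This also illustrates why the kernel forces `sel = 3` for a BE process on an LE kernel: reversing all 16 bytes of an LE in-memory image is equivalent to the straight memcpy the BE process expects.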
+17 -12
arch/powerpc/kernel/watchdog.c
··· 98 98 else 99 99 dump_stack(); 100 100 101 - if (hardlockup_panic) 102 - nmi_panic(regs, "Hard LOCKUP"); 101 + /* Do not panic from here because that can recurse into NMI IPI layer */ 103 102 } 104 103 105 104 static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb) ··· 134 135 pr_emerg("Watchdog CPU:%d detected Hard LOCKUP other CPUS:%*pbl\n", 135 136 cpu, cpumask_pr_args(&wd_smp_cpus_pending)); 136 137 137 - /* 138 - * Try to trigger the stuck CPUs. 139 - */ 140 - for_each_cpu(c, &wd_smp_cpus_pending) { 141 - if (c == cpu) 142 - continue; 143 - smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000); 138 + if (!sysctl_hardlockup_all_cpu_backtrace) { 139 + /* 140 + * Try to trigger the stuck CPUs, unless we are going to 141 + * get a backtrace on all of them anyway. 142 + */ 143 + for_each_cpu(c, &wd_smp_cpus_pending) { 144 + if (c == cpu) 145 + continue; 146 + smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000); 147 + } 148 + smp_flush_nmi_ipi(1000000); 144 149 } 145 - smp_flush_nmi_ipi(1000000); 146 150 147 151 /* Take the stuck CPUs out of the watch group */ 148 152 set_cpumask_stuck(&wd_smp_cpus_pending, tb); ··· 277 275 { 278 276 unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000; 279 277 int cpu = smp_processor_id(); 278 + u64 tb = get_tb(); 280 279 281 - if (get_tb() - per_cpu(wd_timer_tb, cpu) >= ticks) 282 - watchdog_timer_interrupt(cpu); 280 + if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) { 281 + per_cpu(wd_timer_tb, cpu) = tb; 282 + wd_smp_clear_cpu_pending(cpu, tb); 283 + } 283 284 } 284 285 EXPORT_SYMBOL(arch_touch_nmi_watchdog); 285 286
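The reworked `arch_touch_nmi_watchdog()` above only does real work once the timebase has advanced past the threshold, so code that touches the watchdog in a tight loop stays cheap. The underlying pattern — compare a monotonic counter against a saved per-CPU timestamp and refresh lazily — can be outlined like this (userspace sketch with a fake timebase; names are illustrative, not the kernel's):

```c
#include <stdint.h>

#define NCPUS 4

static uint64_t fake_tb;		/* stand-in for get_tb() */
static uint64_t wd_timer_tb[NCPUS];	/* per-CPU last refresh time */
static int wd_cleared[NCPUS];		/* counts actual clear operations */

/* Touch is a no-op until 'ticks' timebase units have elapsed on this CPU. */
static void touch_nmi_watchdog_sketch(int cpu, uint64_t ticks)
{
	uint64_t tb = fake_tb;

	if (tb - wd_timer_tb[cpu] >= ticks) {
		wd_timer_tb[cpu] = tb;
		wd_cleared[cpu]++;	/* models wd_smp_clear_cpu_pending() */
	}
}
```

The patch's point is visible here: the expensive clear runs at most once per threshold interval per CPU, however often the touch helper is called.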
+7 -13
arch/powerpc/kvm/book3s_hv.c
··· 47 47 48 48 #include <asm/reg.h> 49 49 #include <asm/ppc-opcode.h> 50 + #include <asm/asm-prototypes.h> 50 51 #include <asm/disassemble.h> 51 52 #include <asm/cputable.h> 52 53 #include <asm/cacheflush.h> ··· 1090 1089 vcpu->stat.ext_intr_exits++; 1091 1090 r = RESUME_GUEST; 1092 1091 break; 1093 - /* HMI is hypervisor interrupt and host has handled it. Resume guest.*/ 1092 + /* SR/HMI/PMI are HV interrupts that host has handled. Resume guest.*/ 1094 1093 case BOOK3S_INTERRUPT_HMI: 1095 1094 case BOOK3S_INTERRUPT_PERFMON: 1095 + case BOOK3S_INTERRUPT_SYSTEM_RESET: 1096 1096 r = RESUME_GUEST; 1097 1097 break; 1098 1098 case BOOK3S_INTERRUPT_MACHINE_CHECK: ··· 2119 2117 struct paca_struct *tpaca; 2120 2118 long timeout = 10000; 2121 2119 2122 - /* 2123 - * ISA v3.0 idle routines do not set hwthread_state or test 2124 - * hwthread_req, so they can not grab idle threads. 2125 - */ 2126 - if (cpu_has_feature(CPU_FTR_ARCH_300)) { 2127 - WARN(1, "KVM: can not control sibling threads\n"); 2128 - return -EBUSY; 2129 - } 2130 - 2131 2120 tpaca = &paca[cpu]; 2132 2121 2133 2122 /* Ensure the thread won't go into the kernel if it wakes */ ··· 2153 2160 struct paca_struct *tpaca; 2154 2161 2155 2162 tpaca = &paca[cpu]; 2163 + tpaca->kvm_hstate.hwthread_req = 0; 2156 2164 tpaca->kvm_hstate.kvm_vcpu = NULL; 2157 2165 tpaca->kvm_hstate.kvm_vcore = NULL; 2158 2166 tpaca->kvm_hstate.kvm_split_mode = NULL; 2159 - if (!cpu_has_feature(CPU_FTR_ARCH_300)) 2160 - tpaca->kvm_hstate.hwthread_req = 0; 2161 - 2162 2167 } 2163 2168 2164 2169 static void radix_flush_cpu(struct kvm *kvm, int cpu, struct kvm_vcpu *vcpu) ··· 2605 2614 break; 2606 2615 case BOOK3S_INTERRUPT_HMI: 2607 2616 local_paca->irq_happened |= PACA_IRQ_HMI; 2617 + break; 2618 + case BOOK3S_INTERRUPT_SYSTEM_RESET: 2619 + replay_system_reset(); 2608 2620 break; 2609 2621 } 2610 2622 }
-8
arch/powerpc/kvm/book3s_hv_rmhandlers.S
··· 149 149 subf r4, r4, r3 150 150 mtspr SPRN_DEC, r4 151 151 152 - BEGIN_FTR_SECTION 153 152 /* hwthread_req may have got set by cede or no vcpu, so clear it */ 154 153 li r0, 0 155 154 stb r0, HSTATE_HWTHREAD_REQ(r13) 156 - END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) 157 155 158 156 /* 159 157 * For external interrupts we need to call the Linux ··· 314 316 * Relocation is off and most register values are lost. 315 317 * r13 points to the PACA. 316 318 * r3 contains the SRR1 wakeup value, SRR1 is trashed. 317 - * This is not used by ISAv3.0B processors. 318 319 */ 319 320 .globl kvm_start_guest 320 321 kvm_start_guest: ··· 432 435 * While waiting we also need to check if we get given a vcpu to run. 433 436 */ 434 437 kvm_no_guest: 435 - BEGIN_FTR_SECTION 436 - twi 31,0,0 437 - END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) 438 438 lbz r3, HSTATE_HWTHREAD_REQ(r13) 439 439 cmpwi r3, 0 440 440 bne 53f ··· 2540 2546 clrrdi r0, r0, 1 2541 2547 mtspr SPRN_CTRLT, r0 2542 2548 2543 - BEGIN_FTR_SECTION 2544 2549 li r0,1 2545 2550 stb r0,HSTATE_HWTHREAD_REQ(r13) 2546 - END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) 2547 2551 mfspr r5,SPRN_LPCR 2548 2552 ori r5,r5,LPCR_PECE0 | LPCR_PECE1 2549 2553 BEGIN_FTR_SECTION
+2 -1
arch/powerpc/kvm/powerpc.c
··· 644 644 break; 645 645 #endif 646 646 case KVM_CAP_PPC_HTM: 647 - r = cpu_has_feature(CPU_FTR_TM_COMP) && hv_enabled; 647 + r = hv_enabled && 648 + (cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM_COMP); 648 649 break; 649 650 default: 650 651 r = 0;
+1 -1
arch/powerpc/lib/Makefile
··· 24 24 25 25 obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \ 26 26 copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \ 27 - memcpy_64.o memcmp_64.o 27 + memcpy_64.o memcmp_64.o pmem.o 28 28 29 29 obj64-$(CONFIG_SMP) += locks.o 30 30 obj64-$(CONFIG_ALTIVEC) += vmx-helper.o
+67
arch/powerpc/lib/pmem.c
···
+/*
+ * Copyright(c) 2017 IBM Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/string.h>
+#include <linux/export.h>
+#include <linux/uaccess.h>
+
+#include <asm/cacheflush.h>
+
+/*
+ * CONFIG_ARCH_HAS_PMEM_API symbols
+ */
+void arch_wb_cache_pmem(void *addr, size_t size)
+{
+    unsigned long start = (unsigned long) addr;
+    flush_inval_dcache_range(start, start + size);
+}
+EXPORT_SYMBOL(arch_wb_cache_pmem);
+
+void arch_invalidate_pmem(void *addr, size_t size)
+{
+    unsigned long start = (unsigned long) addr;
+    flush_inval_dcache_range(start, start + size);
+}
+EXPORT_SYMBOL(arch_invalidate_pmem);
+
+/*
+ * CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE symbols
+ */
+long __copy_from_user_flushcache(void *dest, const void __user *src,
+        unsigned size)
+{
+    unsigned long copied, start = (unsigned long) dest;
+
+    copied = __copy_from_user(dest, src, size);
+    flush_inval_dcache_range(start, start + size);
+
+    return copied;
+}
+
+void *memcpy_flushcache(void *dest, const void *src, size_t size)
+{
+    unsigned long start = (unsigned long) dest;
+
+    memcpy(dest, src, size);
+    flush_inval_dcache_range(start, start + size);
+
+    return dest;
+}
+EXPORT_SYMBOL(memcpy_flushcache);
+
+void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+        size_t len)
+{
+    memcpy_flushcache(to, page_to_virt(page) + offset, len);
+}
+EXPORT_SYMBOL(memcpy_page_flushcache);
+20
arch/powerpc/lib/sstep.c
···
 #define XER_SO		0x80000000U
 #define XER_OV		0x40000000U
 #define XER_CA		0x20000000U
+#define XER_OV32	0x00080000U
+#define XER_CA32	0x00040000U

 #ifdef CONFIG_PPC_FPU
 /*
···
         op->ccval |= 0x20000000;
 }

+static nokprobe_inline void set_ca32(struct instruction_op *op, bool val)
+{
+    if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+        if (val)
+            op->xerval |= XER_CA32;
+        else
+            op->xerval &= ~XER_CA32;
+    }
+}
+
 static nokprobe_inline void add_with_carry(const struct pt_regs *regs,
             struct instruction_op *op, int rd,
             unsigned long val1, unsigned long val2,
···
         op->xerval |= XER_CA;
     else
         op->xerval &= ~XER_CA;
+
+    set_ca32(op, (unsigned int)val < (unsigned int)val1 ||
+            (carry_in && (unsigned int)val == (unsigned int)val1));
 }

 static nokprobe_inline void do_cmp_signed(const struct pt_regs *regs,
···
                 op->xerval |= XER_CA;
             else
                 op->xerval &= ~XER_CA;
+            set_ca32(op, op->xerval & XER_CA);
             goto logical_done;

         case 824:	/* srawi */
···
                 op->xerval |= XER_CA;
             else
                 op->xerval &= ~XER_CA;
+            set_ca32(op, op->xerval & XER_CA);
             goto logical_done;

 #ifdef __powerpc64__
···
                 op->xerval |= XER_CA;
             else
                 op->xerval &= ~XER_CA;
+            set_ca32(op, op->xerval & XER_CA);
             goto logical_done;

         case 826:	/* sradi with sh_5 = 0 */
···
                 op->xerval |= XER_CA;
             else
                 op->xerval &= ~XER_CA;
+            set_ca32(op, op->xerval & XER_CA);
             goto logical_done;
 #endif /* __powerpc64__ */
···
     }
     regs->nip = next_pc;
 }
+NOKPROBE_SYMBOL(emulate_update_regs);

 /*
  * Emulate a previously-analysed load or store instruction.
+3 -3
arch/powerpc/mm/Makefile
···
 obj-$(CONFIG_PPC_BOOK3E)	+= tlb_low_$(BITS)e.o
 hash64-$(CONFIG_PPC_NATIVE)	:= hash_native_64.o
 obj-$(CONFIG_PPC_BOOK3E_64)	+= pgtable-book3e.o
-obj-$(CONFIG_PPC_STD_MMU_64)	+= pgtable-hash64.o hash_utils_64.o slb_low.o slb.o $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o
+obj-$(CONFIG_PPC_BOOK3S_64)	+= pgtable-hash64.o hash_utils_64.o slb_low.o slb.o $(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o
 obj-$(CONFIG_PPC_RADIX_MMU)	+= pgtable-radix.o tlb-radix.o
 obj-$(CONFIG_PPC_STD_MMU_32)	+= ppc_mmu_32.o hash_low_32.o mmu_context_hash32.o
 obj-$(CONFIG_PPC_STD_MMU)	+= tlb_hash$(BITS).o
-ifeq ($(CONFIG_PPC_STD_MMU_64),y)
+ifeq ($(CONFIG_PPC_BOOK3S_64),y)
 obj-$(CONFIG_PPC_4K_PAGES)	+= hash64_4k.o
 obj-$(CONFIG_PPC_64K_PAGES)	+= hash64_64k.o
 endif
···
 obj-$(CONFIG_PPC_MM_SLICES)	+= slice.o
 obj-y				+= hugetlbpage.o
 ifeq ($(CONFIG_HUGETLB_PAGE),y)
-obj-$(CONFIG_PPC_STD_MMU_64)	+= hugetlbpage-hash64.o
+obj-$(CONFIG_PPC_BOOK3S_64)	+= hugetlbpage-hash64.o
 obj-$(CONFIG_PPC_RADIX_MMU)	+= hugetlbpage-radix.o
 obj-$(CONFIG_PPC_BOOK3E_MMU)	+= hugetlbpage-book3e.o
 endif
+1 -1
arch/powerpc/mm/dump_hashpagetable.c
···
     address_markers[6].start_address = PHB_IO_END;
     address_markers[7].start_address = IOREMAP_BASE;
     address_markers[8].start_address = IOREMAP_END;
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
     address_markers[9].start_address = H_VMEMMAP_BASE;
 #else
     address_markers[9].start_address = VMEMMAP_BASE;
+5 -5
arch/powerpc/mm/dump_linuxpagetables.c
···
 static const struct flag_info flag_array[] = {
     {
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
         .mask	= _PAGE_PRIVILEGED,
         .val	= 0,
 #else
···
         .set	= "present",
         .clear	= " ",
     }, {
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
         .mask	= H_PAGE_HASHPTE,
         .val	= H_PAGE_HASHPTE,
 #else
···
         .set	= "hpte",
         .clear	= " ",
     }, {
-#ifndef CONFIG_PPC_STD_MMU_64
+#ifndef CONFIG_PPC_BOOK3S_64
         .mask	= _PAGE_GUARDED,
         .val	= _PAGE_GUARDED,
         .set	= "guarded",
···
         .set	= "accessed",
         .clear	= " ",
     }, {
-#ifndef CONFIG_PPC_STD_MMU_64
+#ifndef CONFIG_PPC_BOOK3S_64
         .mask	= _PAGE_WRITETHRU,
         .val	= _PAGE_WRITETHRU,
         .set	= "write through",
···
     address_markers[i++].start_address = PHB_IO_END;
     address_markers[i++].start_address = IOREMAP_BASE;
     address_markers[i++].start_address = IOREMAP_END;
-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
     address_markers[i++].start_address = H_VMEMMAP_BASE;
 #else
     address_markers[i++].start_address = VMEMMAP_BASE;
+1
arch/powerpc/mm/hash_utils_64.c
···
 #undef DEBUG
 #undef DEBUG_LOW

+#define pr_fmt(fmt) "hash-mmu: " fmt
 #include <linux/spinlock.h>
 #include <linux/errno.h>
 #include <linux/sched/mm.h>
+11 -9
arch/powerpc/mm/hugetlbpage-radix.c
···
     struct mm_struct *mm = current->mm;
     struct vm_area_struct *vma;
     struct hstate *h = hstate_file(file);
+    int fixed = (flags & MAP_FIXED);
+    unsigned long high_limit;
     struct vm_unmapped_area_info info;

-    if (unlikely(addr > mm->context.addr_limit && addr < TASK_SIZE))
-        mm->context.addr_limit = TASK_SIZE;
+    high_limit = DEFAULT_MAP_WINDOW;
+    if (addr >= high_limit || (fixed && (addr + len > high_limit)))
+        high_limit = TASK_SIZE;

     if (len & ~huge_page_mask(h))
         return -EINVAL;
-    if (len > mm->task_size)
+    if (len > high_limit)
         return -ENOMEM;

-    if (flags & MAP_FIXED) {
+    if (fixed) {
+        if (addr > high_limit - len)
+            return -ENOMEM;
         if (prepare_hugepage_range(file, addr, len))
             return -EINVAL;
         return addr;
···
     if (addr) {
         addr = ALIGN(addr, huge_page_size(h));
         vma = find_vma(mm, addr);
-        if (mm->task_size - len >= addr &&
+        if (high_limit - len >= addr &&
             (!vma || addr + len <= vm_start_gap(vma)))
             return addr;
     }
···
     info.flags = VM_UNMAPPED_AREA_TOPDOWN;
     info.length = len;
     info.low_limit = PAGE_SIZE;
-    info.high_limit = current->mm->mmap_base;
+    info.high_limit = mm->mmap_base + (high_limit - DEFAULT_MAP_WINDOW);
     info.align_mask = PAGE_MASK & ~huge_page_mask(h);
     info.align_offset = 0;
-
-    if (addr > DEFAULT_MAP_WINDOW)
-        info.high_limit += mm->context.addr_limit - DEFAULT_MAP_WINDOW;

     return vm_unmapped_area(&info);
 }
+15 -6
arch/powerpc/mm/init_64.c
···
 #include "mmu_decl.h"

-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
 #if H_PGTABLE_RANGE > USER_VSID_RANGE
 #warning Limited user VSID range means pagetable space is wasted
 #endif
-#endif /* CONFIG_PPC_STD_MMU_64 */
+#endif /* CONFIG_PPC_BOOK3S_64 */

 phys_addr_t memstart_addr = ~0;
 EXPORT_SYMBOL_GPL(memstart_addr);
···
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */

-#ifdef CONFIG_PPC_STD_MMU_64
-static bool disable_radix;
+#ifdef CONFIG_PPC_BOOK3S_64
+static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
+
 static int __init parse_disable_radix(char *p)
 {
-    disable_radix = true;
+    bool val;
+
+    if (strlen(p) == 0)
+        val = true;
+    else if (kstrtobool(p, &val))
+        return -EINVAL;
+
+    disable_radix = val;
+
     return 0;
 }
 early_param("disable_radix", parse_disable_radix);
···
     else
         hash__early_init_devtree();
 }
-#endif /* CONFIG_PPC_STD_MMU_64 */
+#endif /* CONFIG_PPC_BOOK3S_64 */
+25 -24
arch/powerpc/mm/mmap.c
···
 {
     struct mm_struct *mm = current->mm;
     struct vm_area_struct *vma;
+    int fixed = (flags & MAP_FIXED);
+    unsigned long high_limit;
     struct vm_unmapped_area_info info;

-    if (unlikely(addr > mm->context.addr_limit &&
-             mm->context.addr_limit != TASK_SIZE))
-        mm->context.addr_limit = TASK_SIZE;
+    high_limit = DEFAULT_MAP_WINDOW;
+    if (addr >= high_limit || (fixed && (addr + len > high_limit)))
+        high_limit = TASK_SIZE;

-    if (len > mm->task_size - mmap_min_addr)
+    if (len > high_limit)
         return -ENOMEM;

-    if (flags & MAP_FIXED)
+    if (fixed) {
+        if (addr > high_limit - len)
+            return -ENOMEM;
         return addr;
+    }

     if (addr) {
         addr = PAGE_ALIGN(addr);
         vma = find_vma(mm, addr);
-        if (mm->task_size - len >= addr && addr >= mmap_min_addr &&
+        if (high_limit - len >= addr && addr >= mmap_min_addr &&
             (!vma || addr + len <= vm_start_gap(vma)))
             return addr;
     }
···
     info.flags = 0;
     info.length = len;
     info.low_limit = mm->mmap_base;
+    info.high_limit = high_limit;
     info.align_mask = 0;
-
-    if (unlikely(addr > DEFAULT_MAP_WINDOW))
-        info.high_limit = mm->context.addr_limit;
-    else
-        info.high_limit = DEFAULT_MAP_WINDOW;

     return vm_unmapped_area(&info);
 }
···
     struct vm_area_struct *vma;
     struct mm_struct *mm = current->mm;
     unsigned long addr = addr0;
+    int fixed = (flags & MAP_FIXED);
+    unsigned long high_limit;
     struct vm_unmapped_area_info info;

-    if (unlikely(addr > mm->context.addr_limit &&
-             mm->context.addr_limit != TASK_SIZE))
-        mm->context.addr_limit = TASK_SIZE;
+    high_limit = DEFAULT_MAP_WINDOW;
+    if (addr >= high_limit || (fixed && (addr + len > high_limit)))
+        high_limit = TASK_SIZE;

-    /* requested length too big for entire address space */
-    if (len > mm->task_size - mmap_min_addr)
+    if (len > high_limit)
         return -ENOMEM;

-    if (flags & MAP_FIXED)
+    if (fixed) {
+        if (addr > high_limit - len)
+            return -ENOMEM;
         return addr;
+    }

-    /* requesting a specific address */
     if (addr) {
         addr = PAGE_ALIGN(addr);
         vma = find_vma(mm, addr);
-        if (mm->task_size - len >= addr && addr >= mmap_min_addr &&
-            (!vma || addr + len <= vm_start_gap(vma)))
+        if (high_limit - len >= addr && addr >= mmap_min_addr &&
+            (!vma || addr + len <= vm_start_gap(vma)))
             return addr;
     }

     info.flags = VM_UNMAPPED_AREA_TOPDOWN;
     info.length = len;
     info.low_limit = max(PAGE_SIZE, mmap_min_addr);
-    info.high_limit = mm->mmap_base;
+    info.high_limit = mm->mmap_base + (high_limit - DEFAULT_MAP_WINDOW);
     info.align_mask = 0;
-
-    if (addr > DEFAULT_MAP_WINDOW)
-        info.high_limit += mm->context.addr_limit - DEFAULT_MAP_WINDOW;

     addr = vm_unmapped_area(&info);
     if (!(addr & ~PAGE_MASK))
-9
arch/powerpc/mm/mmu_context.c
···
                   struct mm_struct *mm) { }
 #endif

-#ifdef CONFIG_PPC_BOOK3S_64
-static inline void inc_mm_active_cpus(struct mm_struct *mm)
-{
-    atomic_inc(&mm->context.active_cpus);
-}
-#else
-static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
-#endif
-
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
             struct task_struct *tsk)
 {
+24 -9
arch/powerpc/mm/mmu_context_book3s64.c
···
         return index;

     /*
-     * We do switch_slb() early in fork, even before we setup the
-     * mm->context.addr_limit. Default to max task size so that we copy the
-     * default values to paca which will help us to handle slb miss early.
+     * In the case of exec, use the default limit,
+     * otherwise inherit it from the mm we are duplicating.
      */
-    mm->context.addr_limit = DEFAULT_MAP_WINDOW_USER64;
+    if (!mm->context.slb_addr_limit)
+        mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW_USER64;

     /*
      * The old code would re-promote on fork, we don't do that when using
···
 #ifdef CONFIG_SPAPR_TCE_IOMMU
     WARN_ON_ONCE(!list_empty(&mm->context.iommu_group_mem_list));
 #endif
+    if (radix_enabled())
+        WARN_ON(process_tb[mm->context.id].prtb0 != 0);
+    else
+        subpage_prot_free(mm);
+    destroy_pagetable_page(mm);
+    __destroy_context(mm->context.id);
+    mm->context.id = MMU_NO_CONTEXT;
+}
+
+void arch_exit_mmap(struct mm_struct *mm)
+{
     if (radix_enabled()) {
         /*
          * Radix doesn't have a valid bit in the process table
          * entries. However we know that at least P9 implementation
          * will avoid caching an entry with an invalid RTS field,
          * and 0 is invalid. So this will do.
+         *
+         * This runs before the "fullmm" tlb flush in exit_mmap,
+         * which does a RIC=2 tlbie to clear the process table
+         * entry. See the "fullmm" comments in tlb-radix.c.
+         *
+         * No barrier required here after the store because
+         * this process will do the invalidate, which starts with
+         * ptesync.
          */
         process_tb[mm->context.id].prtb0 = 0;
-    } else
-        subpage_prot_free(mm);
-    destroy_pagetable_page(mm);
-    __destroy_context(mm->context.id);
-    mm->context.id = MMU_NO_CONTEXT;
+    }
 }

 #ifdef CONFIG_PPC_RADIX_MMU
+55 -8
arch/powerpc/mm/numa.c
···
     int new_nid;
 };

+#define TOPOLOGY_DEF_TIMER_SECS	60
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
 static int prrn_enabled;
 static void reset_topology_timer(void);
+static int topology_timer_secs = 1;
+static int topology_inited;
+static int topology_update_needed;
+
+/*
+ * Change polling interval for associativity changes.
+ */
+int timed_topology_update(int nsecs)
+{
+    if (vphn_enabled) {
+        if (nsecs > 0)
+            topology_timer_secs = nsecs;
+        else
+            topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
+
+        reset_topology_timer();
+    }
+
+    return 0;
+}

 /*
  * Store the current values of the associativity change counters in the
···
             "hcall_vphn() experienced a hardware fault "
             "preventing VPHN. Disabling polling...\n");
         stop_topology_update();
+        break;
+    case H_SUCCESS:
+        dbg("VPHN hcall succeeded. Reset polling...\n");
+        timed_topology_update(0);
+        break;
     }

     return rc;
···
     struct device *dev;
     int weight, new_nid, i = 0;

-    if (!prrn_enabled && !vphn_enabled)
+    if (!prrn_enabled && !vphn_enabled) {
+        if (!topology_inited)
+            topology_update_needed = 1;
         return 0;
+    }

     weight = cpumask_weight(&cpu_associativity_changes_mask);
     if (!weight)
···
             cpumask_andnot(&cpu_associativity_changes_mask,
                     &cpu_associativity_changes_mask,
                     cpu_sibling_mask(cpu));
+            dbg("Assoc chg gives same node %d for cpu%d\n",
+                    new_nid, cpu);
             cpu = cpu_last_thread_sibling(cpu);
             continue;
         }

         for_each_cpu(sibling, cpu_sibling_mask(cpu)) {
             ud = &updates[i++];
+            ud->next = &updates[i];
             ud->cpu = sibling;
             ud->new_nid = new_nid;
             ud->old_nid = numa_cpu_lookup_table[sibling];
             cpumask_set_cpu(sibling, &updated_cpus);
-            if (i < weight)
-                ud->next = &updates[i];
         }
         cpu = cpu_last_thread_sibling(cpu);
     }
+
+    /*
+     * Prevent processing of 'updates' from overflowing array
+     * where last entry filled in a 'next' pointer.
+     */
+    if (i)
+        updates[i-1].next = NULL;

     pr_debug("Topology update for the following CPUs:\n");
     if (cpumask_weight(&updated_cpus)) {
···
 out:
     kfree(updates);
+    topology_update_needed = 0;
     return changed;
 }
···
 static void reset_topology_timer(void)
 {
-    mod_timer(&topology_timer, jiffies + 60 * HZ);
+    mod_timer(&topology_timer, jiffies + topology_timer_secs * HZ);
 }

 #ifdef CONFIG_SMP
···
     if (firmware_has_feature(FW_FEATURE_PRRN)) {
         if (!prrn_enabled) {
             prrn_enabled = 1;
-            vphn_enabled = 0;
 #ifdef CONFIG_SMP
             rc = of_reconfig_notifier_register(&dt_update_nb);
 #endif
         }
-    } else if (firmware_has_feature(FW_FEATURE_VPHN) &&
+    }
+    if (firmware_has_feature(FW_FEATURE_VPHN) &&
            lppaca_shared_proc(get_lppaca())) {
         if (!vphn_enabled) {
-            prrn_enabled = 0;
             vphn_enabled = 1;
             setup_cpu_associativity_change_counters();
             timer_setup(&topology_timer, topology_timer_fn,
···
 #ifdef CONFIG_SMP
     rc = of_reconfig_notifier_unregister(&dt_update_nb);
 #endif
-    } else if (vphn_enabled) {
+    }
+    if (vphn_enabled) {
         vphn_enabled = 0;
         rc = del_timer_sync(&topology_timer);
     }
···
     if (topology_updates_enabled)
         start_topology_update();

+    if (vphn_enabled)
+        topology_schedule_update();
+
     if (!proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops))
         return -ENOMEM;
+
+    topology_inited = 1;
+    if (topology_update_needed)
+        bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
+                nr_cpumask_bits);

     return 0;
 }
+10
arch/powerpc/mm/pgtable-radix.c
···
 {
     unsigned long start, end;

+    /*
+     * mark_rodata_ro() will mark itself as !writable at some point.
+     * Due to DD1 workaround in radix__pte_update(), we'll end up with
+     * an invalid pte and the system will crash quite severly.
+     */
+    if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+        pr_warn("Warning: Unable to mark rodata read only on P9 DD1\n");
+        return;
+    }
+
     start = (unsigned long)_stext;
     end = (unsigned long)__init_begin;
+1 -1
arch/powerpc/mm/pgtable_64.c
···
 #include "mmu_decl.h"

-#ifdef CONFIG_PPC_STD_MMU_64
+#ifdef CONFIG_PPC_BOOK3S_64
 #if TASK_SIZE_USER64 > (1UL << (ESID_BITS + SID_SHIFT))
 #error TASK_SIZE_USER64 exceeds user VSID range
 #endif
+1 -5
arch/powerpc/mm/slb_low.S
···
     /*
      * user space make sure we are within the allowed limit
      */
-    ld	r11,PACA_ADDR_LIMIT(r13)
+    ld	r11,PACA_SLB_ADDR_LIMIT(r13)
     cmpld	r3,r11
     bge-	8f
···
     srdi	r10,r10,(SID_SHIFT_1T - SID_SHIFT)	/* get 1T ESID */
     rldimi	r10,r9,ESID_BITS_1T,0
     ASM_VSID_SCRAMBLE(r10,r9,r11,1T)
-    /*
-     * bits above VSID_BITS_1T need to be ignored from r10
-     * also combine VSID and flags
-     */

     li	r10,MMU_SEGSIZE_1T
     rldimi	r11,r10,SLB_VSID_SSIZE_SHIFT,0	/* insert segment size */
+30 -32
arch/powerpc/mm/slice.c
···
 {
     struct vm_area_struct *vma;

-    if ((mm->task_size - len) < addr)
+    if ((mm->context.slb_addr_limit - len) < addr)
         return 0;
     vma = find_vma(mm, addr);
     return (!vma || (addr + len) <= vm_start_gap(vma));
···
     if (!slice_low_has_vma(mm, i))
         ret->low_slices |= 1u << i;

-    if (mm->task_size <= SLICE_LOW_TOP)
+    if (mm->context.slb_addr_limit <= SLICE_LOW_TOP)
         return;

-    for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.addr_limit); i++)
+    for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++)
         if (!slice_high_has_vma(mm, i))
             __set_bit(i, ret->high_slices);
 }
···
         ret->low_slices |= 1u << i;

     hpsizes = mm->context.high_slices_psize;
-    for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.addr_limit); i++) {
+    for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
         mask_index = i & 0x1;
         index = i >> 1;
         if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
···
         struct slice_mask mask, struct slice_mask available)
 {
     DECLARE_BITMAP(result, SLICE_NUM_HIGH);
-    unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.addr_limit);
+    unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);

     bitmap_and(result, mask.high_slices,
            available.high_slices, slice_count);
···
     mm->context.low_slices_psize = lpsizes;

     hpsizes = mm->context.high_slices_psize;
-    for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.addr_limit); i++) {
+    for (i = 0; i < GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit); i++) {
         mask_index = i & 0x1;
         index = i >> 1;
         if (test_bit(i, mask.high_slices))
···
      * Only for that request for which high_limit is above
      * DEFAULT_MAP_WINDOW we should apply this.
      */
-    if (high_limit > DEFAULT_MAP_WINDOW)
-        addr += mm->context.addr_limit - DEFAULT_MAP_WINDOW;
+    if (high_limit > DEFAULT_MAP_WINDOW)
+        addr += mm->context.slb_addr_limit - DEFAULT_MAP_WINDOW;

     while (addr > PAGE_SIZE) {
         info.high_limit = addr;
···
     struct slice_mask compat_mask;
     int fixed = (flags & MAP_FIXED);
     int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
+    unsigned long page_size = 1UL << pshift;
     struct mm_struct *mm = current->mm;
     unsigned long newaddr;
     unsigned long high_limit;

-    /*
-     * Check if we need to expland slice area.
-     */
-    if (unlikely(addr > mm->context.addr_limit &&
-             mm->context.addr_limit != TASK_SIZE)) {
-        mm->context.addr_limit = TASK_SIZE;
+    high_limit = DEFAULT_MAP_WINDOW;
+    if (addr >= high_limit || (fixed && (addr + len > high_limit)))
+        high_limit = TASK_SIZE;
+
+    if (len > high_limit)
+        return -ENOMEM;
+    if (len & (page_size - 1))
+        return -EINVAL;
+    if (fixed) {
+        if (addr & (page_size - 1))
+            return -EINVAL;
+        if (addr > high_limit - len)
+            return -ENOMEM;
+    }
+
+    if (high_limit > mm->context.slb_addr_limit) {
+        mm->context.slb_addr_limit = high_limit;
         on_each_cpu(slice_flush_segments, mm, 1);
     }
-    /*
-     * This mmap request can allocate upt to 512TB
-     */
-    if (addr > DEFAULT_MAP_WINDOW)
-        high_limit = mm->context.addr_limit;
-    else
-        high_limit = DEFAULT_MAP_WINDOW;
+
     /*
      * init different masks
      */
···
     /* Sanity checks */
     BUG_ON(mm->task_size == 0);
+    BUG_ON(mm->context.slb_addr_limit == 0);
     VM_BUG_ON(radix_enabled());

     slice_dbg("slice_get_unmapped_area(mm=%p, psize=%d...\n", mm, psize);
     slice_dbg(" addr=%lx, len=%lx, flags=%lx, topdown=%d\n",
           addr, len, flags, topdown);

-    if (len > mm->task_size)
-        return -ENOMEM;
-    if (len & ((1ul << pshift) - 1))
-        return -EINVAL;
-    if (fixed && (addr & ((1ul << pshift) - 1)))
-        return -EINVAL;
-    if (fixed && addr > (mm->task_size - len))
-        return -ENOMEM;
-
     /* If hint, make sure it matches our alignment restrictions */
     if (!fixed && addr) {
-        addr = _ALIGN_UP(addr, 1ul << pshift);
+        addr = _ALIGN_UP(addr, page_size);
         slice_dbg(" aligned addr=%lx\n", addr);
         /* Ignore hint if it's too large or overlaps a VMA */
-        if (addr > mm->task_size - len ||
+        if (addr > high_limit - len ||
             !slice_area_is_free(mm, addr, len))
             addr = 0;
     }
+254 -107
arch/powerpc/mm/tlb-radix.c
···
     trace_tlbie(0, 1, rb, rs, ric, prs, r);
 }

+static inline void __tlbie_pid(unsigned long pid, unsigned long ric)
+{
+    unsigned long rb,rs,prs,r;
+
+    rb = PPC_BIT(53); /* IS = 1 */
+    rs = pid << PPC_BITLSHIFT(31);
+    prs = 1; /* process scoped */
+    r = 1;   /* raidx format */
+
+    asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+             : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+    trace_tlbie(0, 0, rb, rs, ric, prs, r);
+}
+
 /*
  * We use 128 set in radix mode and 256 set in hpt mode.
  */
···
 static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
 {
-    unsigned long rb,rs,prs,r;
-
-    rb = PPC_BIT(53); /* IS = 1 */
-    rs = pid << PPC_BITLSHIFT(31);
-    prs = 1; /* process scoped */
-    r = 1;   /* raidx format */
-
     asm volatile("ptesync": : :"memory");
-    asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
-             : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+    __tlbie_pid(pid, ric);
     asm volatile("eieio; tlbsync; ptesync": : :"memory");
-    trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }

-static inline void _tlbiel_va(unsigned long va, unsigned long pid,
-                  unsigned long ap, unsigned long ric)
+static inline void __tlbiel_va(unsigned long va, unsigned long pid,
+                   unsigned long ap, unsigned long ric)
 {
     unsigned long rb,rs,prs,r;
···
     prs = 1; /* process scoped */
     r = 1;   /* raidx format */

-    asm volatile("ptesync": : :"memory");
     asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
              : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
-    asm volatile("ptesync": : :"memory");
     trace_tlbie(0, 1, rb, rs, ric, prs, r);
 }

-static inline void _tlbie_va(unsigned long va, unsigned long pid,
+static inline void __tlbiel_va_range(unsigned long start, unsigned long end,
+                     unsigned long pid, unsigned long page_size,
+                     unsigned long psize)
+{
+    unsigned long addr;
+    unsigned long ap = mmu_get_ap(psize);
+
+    for (addr = start; addr < end; addr += page_size)
+        __tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
+}
+
+static inline void _tlbiel_va(unsigned long va, unsigned long pid,
+                  unsigned long psize, unsigned long ric)
+{
+    unsigned long ap = mmu_get_ap(psize);
+
+    asm volatile("ptesync": : :"memory");
+    __tlbiel_va(va, pid, ap, ric);
+    asm volatile("ptesync": : :"memory");
+}
+
+static inline void _tlbiel_va_range(unsigned long start, unsigned long end,
+                    unsigned long pid, unsigned long page_size,
+                    unsigned long psize, bool also_pwc)
+{
+    asm volatile("ptesync": : :"memory");
+    if (also_pwc)
+        __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
+    __tlbiel_va_range(start, end, pid, page_size, psize);
+    asm volatile("ptesync": : :"memory");
+}
+
+static inline void __tlbie_va(unsigned long va, unsigned long pid,
                   unsigned long ap, unsigned long ric)
 {
     unsigned long rb,rs,prs,r;
···
     prs = 1; /* process scoped */
     r = 1;   /* raidx format */

-    asm volatile("ptesync": : :"memory");
     asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
              : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
-    asm volatile("eieio; tlbsync; ptesync": : :"memory");
     trace_tlbie(0, 0, rb, rs, ric, prs, r);
+}
+
+static inline void __tlbie_va_range(unsigned long start, unsigned long end,
+                    unsigned long pid, unsigned long page_size,
+                    unsigned long psize)
+{
+    unsigned long addr;
+    unsigned long ap = mmu_get_ap(psize);
+
+    for (addr = start; addr < end; addr += page_size)
+        __tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);
+}
+
+static inline void _tlbie_va(unsigned long va, unsigned long pid,
+                 unsigned long psize, unsigned long ric)
+{
+    unsigned long ap = mmu_get_ap(psize);
+
+    asm volatile("ptesync": : :"memory");
+    __tlbie_va(va, pid, ap, ric);
+    asm volatile("eieio; tlbsync; ptesync": : :"memory");
+}
+
+static inline void _tlbie_va_range(unsigned long start, unsigned long end,
+                   unsigned long pid, unsigned long page_size,
+                   unsigned long psize, bool also_pwc)
+{
+    asm volatile("ptesync": : :"memory");
+    if (also_pwc)
+        __tlbie_pid(pid, RIC_FLUSH_PWC);
+    __tlbie_va_range(start, end, pid, page_size, psize);
+    asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }

 /*
···
 EXPORT_SYMBOL(radix__local_flush_tlb_mm);

 #ifndef CONFIG_SMP
-static void radix__local_flush_all_mm(struct mm_struct *mm)
+void radix__local_flush_all_mm(struct mm_struct *mm)
 {
     unsigned long pid;
···
     _tlbiel_pid(pid, RIC_FLUSH_ALL);
     preempt_enable();
 }
+EXPORT_SYMBOL(radix__local_flush_all_mm);
 #endif /* CONFIG_SMP */

 void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
                        int psize)
 {
     unsigned long pid;
-    unsigned long ap = mmu_get_ap(psize);

     preempt_disable();
-    pid = mm ? mm->context.id : 0;
+    pid = mm->context.id;
     if (pid != MMU_NO_CONTEXT)
-        _tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
+        _tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
     preempt_enable();
 }
···
 {
 #ifdef CONFIG_HUGETLB_PAGE
     /* need the return fix for nohash.c */
-    if (vma && is_vm_hugetlb_page(vma))
-        return __local_flush_hugetlb_page(vma, vmaddr);
+    if (is_vm_hugetlb_page(vma))
+        return radix__local_flush_hugetlb_page(vma, vmaddr);
 #endif
-    radix__local_flush_tlb_page_psize(vma ? vma->vm_mm : NULL, vmaddr,
-                      mmu_virtual_psize);
+    radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
 }
 EXPORT_SYMBOL(radix__local_flush_tlb_page);
···
 {
     unsigned long pid;

-    preempt_disable();
     pid = mm->context.id;
     if (unlikely(pid == MMU_NO_CONTEXT))
-        goto no_context;
+        return;

+    preempt_disable();
     if (!mm_is_thread_local(mm))
         _tlbie_pid(pid, RIC_FLUSH_TLB);
     else
         _tlbiel_pid(pid, RIC_FLUSH_TLB);
-no_context:
     preempt_enable();
 }
 EXPORT_SYMBOL(radix__flush_tlb_mm);

-static void radix__flush_all_mm(struct mm_struct *mm)
+void radix__flush_all_mm(struct mm_struct *mm)
 {
     unsigned long pid;

-    preempt_disable();
     pid = mm->context.id;
     if (unlikely(pid == MMU_NO_CONTEXT))
-        goto no_context;
+        return;

+    preempt_disable();
     if (!mm_is_thread_local(mm))
         _tlbie_pid(pid, RIC_FLUSH_ALL);
     else
         _tlbiel_pid(pid, RIC_FLUSH_ALL);
-no_context:
     preempt_enable();
 }
+EXPORT_SYMBOL(radix__flush_all_mm);

 void radix__flush_tlb_pwc(struct mmu_gather *tlb, unsigned long addr)
 {
···
                  int psize)
 {
     unsigned long pid;
-    unsigned long ap = mmu_get_ap(psize);
+
+    pid = mm->context.id;
+    if (unlikely(pid == MMU_NO_CONTEXT))
+        return;

     preempt_disable();
-    pid = mm ? mm->context.id : 0;
-    if (unlikely(pid == MMU_NO_CONTEXT))
-        goto bail;
     if (!mm_is_thread_local(mm))
-        _tlbie_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
+        _tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
     else
-        _tlbiel_va(vmaddr, pid, ap, RIC_FLUSH_TLB);
-bail:
+        _tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
     preempt_enable();
 }

 void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 {
 #ifdef CONFIG_HUGETLB_PAGE
-    if (vma && is_vm_hugetlb_page(vma))
-        return flush_hugetlb_page(vma, vmaddr);
+    if (is_vm_hugetlb_page(vma))
+        return radix__flush_hugetlb_page(vma, vmaddr);
 #endif
-    radix__flush_tlb_page_psize(vma ? vma->vm_mm : NULL, vmaddr,
-                    mmu_virtual_psize);
+    radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
 }
 EXPORT_SYMBOL(radix__flush_tlb_page);
···
 }
 EXPORT_SYMBOL(radix__flush_tlb_kernel_range);

+#define TLB_FLUSH_ALL -1UL
+
 /*
- * Currently, for range flushing, we just do a full mm flush. Because
- * we use this in code path where we don' track the page size.
+ * Number of pages above which we invalidate the entire PID rather than
+ * flush individual pages, for local and global flushes respectively.
+ *
+ * tlbie goes out to the interconnect and individual ops are more costly.
+ * It also does not iterate over sets like the local tlbiel variant when
+ * invalidating a full PID, so it has a far lower threshold to change from
+ * individual page flushes to full-pid flushes.
  */
+static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
+static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
+
 void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
                 unsigned long end)

 {
     struct mm_struct *mm = vma->vm_mm;
+    unsigned long pid;
+    unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift;
+    unsigned long page_size = 1UL << page_shift;
+    unsigned long nr_pages = (end - start) >> page_shift;
+    bool local, full;

-    radix__flush_tlb_mm(mm);
+#ifdef CONFIG_HUGETLB_PAGE
+    if (is_vm_hugetlb_page(vma))
+        return radix__flush_hugetlb_tlb_range(vma, start, end);
+#endif
+
+    pid = mm->context.id;
+    if (unlikely(pid == MMU_NO_CONTEXT))
+        return;
+
+    preempt_disable();
+    if (mm_is_thread_local(mm)) {
+        local = true;
+        full = (end == TLB_FLUSH_ALL ||
+                nr_pages > tlb_local_single_page_flush_ceiling);
+    } else {
+        local = false;
+        full = (end == TLB_FLUSH_ALL ||
+                nr_pages > tlb_single_page_flush_ceiling);
+    }
+
+    if (full) {
+        if (local)
+            _tlbiel_pid(pid, RIC_FLUSH_TLB);
+        else
+            _tlbie_pid(pid, RIC_FLUSH_TLB);
+    } else {
+        bool hflush = false;
+        unsigned long hstart, hend;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+        hstart = (start + HPAGE_PMD_SIZE - 1) >> HPAGE_PMD_SHIFT;
+        hend = end >> HPAGE_PMD_SHIFT;
+        if (hstart < hend) {
+            hstart <<= HPAGE_PMD_SHIFT;
+            hend <<= HPAGE_PMD_SHIFT;
+            hflush = true;
+        }
+#endif
+
+        asm volatile("ptesync": : :"memory");
+        if (local) {
+            __tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
+            if (hflush)
+                __tlbiel_va_range(hstart, hend, pid,
+                        HPAGE_PMD_SIZE, MMU_PAGE_2M);
+            asm volatile("ptesync": : :"memory");
+        } else {
+            __tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
+            if (hflush)
+                __tlbie_va_range(hstart, hend, pid,
+                        HPAGE_PMD_SIZE, MMU_PAGE_2M);
+            asm volatile("eieio; tlbsync; ptesync": : :"memory");
+        }
+    }
+    preempt_enable();
 }
 EXPORT_SYMBOL(radix__flush_tlb_range);
···
     return psize;
 }

+static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+                 unsigned long end, int psize);
+
 void radix__tlb_flush(struct mmu_gather *tlb)
 {
     int psize = 0;
     struct mm_struct *mm = tlb->mm;
     int page_size = tlb->page_size;

-    psize = radix_get_mmu_psize(page_size);
     /*
      * if page size is not something we understand, do a full mm flush
+     *
+     * A "fullmm" flush must always do a flush_all_mm (RIC=2) flush
+     * that flushes the process table entry cache upon process teardown.
+     * See the comment for radix in arch_exit_mmap().
      */
-    if (psize != -1 && !tlb->fullmm && !tlb->need_flush_all)
-        radix__flush_tlb_range_psize(mm, tlb->start, tlb->end, psize);
-    else if (tlb->need_flush_all) {
-        tlb->need_flush_all = 0;
+    if (tlb->fullmm) {
         radix__flush_all_mm(mm);
-    } else
-        radix__flush_tlb_mm(mm);
+    } else if ( (psize = radix_get_mmu_psize(page_size)) == -1) {
+        if (!tlb->need_flush_all)
+            radix__flush_tlb_mm(mm);
+        else
+            radix__flush_all_mm(mm);
+    } else {
+        unsigned long start = tlb->start;
+        unsigned long end = tlb->end;
+
+        if (!tlb->need_flush_all)
+            radix__flush_tlb_range_psize(mm, start, end, psize);
+        else
+            radix__flush_tlb_pwc_range_psize(mm, start, end, psize);
+    }
+    tlb->need_flush_all = 0;
 }

-#define TLB_FLUSH_ALL -1UL
-/*
- * Number of pages above which we will do a bcast tlbie.
Just a 446 - * number at this point copied from x86 447 - */ 448 - static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33; 329 + static inline void __radix__flush_tlb_range_psize(struct mm_struct *mm, 330 + unsigned long start, unsigned long end, 331 + int psize, bool also_pwc) 332 + { 333 + unsigned long pid; 334 + unsigned int page_shift = mmu_psize_defs[psize].shift; 335 + unsigned long page_size = 1UL << page_shift; 336 + unsigned long nr_pages = (end - start) >> page_shift; 337 + bool local, full; 338 + 339 + pid = mm->context.id; 340 + if (unlikely(pid == MMU_NO_CONTEXT)) 341 + return; 342 + 343 + preempt_disable(); 344 + if (mm_is_thread_local(mm)) { 345 + local = true; 346 + full = (end == TLB_FLUSH_ALL || 347 + nr_pages > tlb_local_single_page_flush_ceiling); 348 + } else { 349 + local = false; 350 + full = (end == TLB_FLUSH_ALL || 351 + nr_pages > tlb_single_page_flush_ceiling); 352 + } 353 + 354 + if (full) { 355 + if (local) 356 + _tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB); 357 + else 358 + _tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL: RIC_FLUSH_TLB); 359 + } else { 360 + if (local) 361 + _tlbiel_va_range(start, end, pid, page_size, psize, also_pwc); 362 + else 363 + _tlbie_va_range(start, end, pid, page_size, psize, also_pwc); 364 + } 365 + preempt_enable(); 366 + } 449 367 450 368 void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start, 451 369 unsigned long end, int psize) 452 370 { 453 - unsigned long pid; 454 - unsigned long addr; 455 - int local = mm_is_thread_local(mm); 456 - unsigned long ap = mmu_get_ap(psize); 457 - unsigned long page_size = 1UL << mmu_psize_defs[psize].shift; 371 + return __radix__flush_tlb_range_psize(mm, start, end, psize, false); 372 + } 458 373 459 - 460 - preempt_disable(); 461 - pid = mm ? 
mm->context.id : 0; 462 - if (unlikely(pid == MMU_NO_CONTEXT)) 463 - goto err_out; 464 - 465 - if (end == TLB_FLUSH_ALL || 466 - (end - start) > tlb_single_page_flush_ceiling * page_size) { 467 - if (local) 468 - _tlbiel_pid(pid, RIC_FLUSH_TLB); 469 - else 470 - _tlbie_pid(pid, RIC_FLUSH_TLB); 471 - goto err_out; 472 - } 473 - for (addr = start; addr < end; addr += page_size) { 474 - 475 - if (local) 476 - _tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB); 477 - else 478 - _tlbie_va(addr, pid, ap, RIC_FLUSH_TLB); 479 - } 480 - err_out: 481 - preempt_enable(); 374 + static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start, 375 + unsigned long end, int psize) 376 + { 377 + __radix__flush_tlb_range_psize(mm, start, end, psize, true); 482 378 } 483 379 484 380 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 485 381 void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr) 486 382 { 487 - int local = mm_is_thread_local(mm); 488 - unsigned long ap = mmu_get_ap(mmu_virtual_psize); 489 383 unsigned long pid, end; 490 384 491 - 492 - pid = mm ? mm->context.id : 0; 493 - preempt_disable(); 385 + pid = mm->context.id; 494 386 if (unlikely(pid == MMU_NO_CONTEXT)) 495 - goto no_context; 387 + return; 496 388 497 389 /* 4k page size, just blow the world */ 498 390 if (PAGE_SIZE == 0x1000) { 499 391 radix__flush_all_mm(mm); 500 - preempt_enable(); 501 392 return; 502 393 } 503 394 504 - /* Otherwise first do the PWC */ 505 - if (local) 506 - _tlbiel_pid(pid, RIC_FLUSH_PWC); 507 - else 508 - _tlbie_pid(pid, RIC_FLUSH_PWC); 509 - 510 - /* Then iterate the pages */ 511 395 end = addr + HPAGE_PMD_SIZE; 512 - for (; addr < end; addr += PAGE_SIZE) { 513 - if (local) 514 - _tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB); 515 - else 516 - _tlbie_va(addr, pid, ap, RIC_FLUSH_TLB); 396 + 397 + /* Otherwise first do the PWC, then iterate the pages. 
*/ 398 + preempt_disable(); 399 + 400 + if (mm_is_thread_local(mm)) { 401 + _tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true); 402 + } else { 403 + _tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true); 517 404 } 518 - no_context: 405 + 519 406 preempt_enable(); 520 407 } 521 408 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+4 -3
arch/powerpc/net/bpf_jit64.h
···
  * [ nv gpr save area ] 8*8 |
  * [ tail_call_cnt ] 8 |
  * [ local_tmp_var ] 8 |
- * fp (r31) --> [ ebpf stack space ] 512 |
+ * fp (r31) --> [ ebpf stack space ] upto 512 |
  * [ frame header ] 32/112 |
  * sp (r1) ---> [ stack pointer ] --------------
  */
···
 #define BPF_PPC_STACK_SAVE (8*8)
 /* for bpf JIT code internal usage */
 #define BPF_PPC_STACK_LOCALS 16
-/* Ensure this is quadword aligned */
-#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + MAX_BPF_STACK + \
+/* stack frame excluding BPF stack, ensure this is quadword aligned */
+#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
 			    BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
 
 #ifndef __ASSEMBLY__
···
 	 */
 	unsigned int seen;
 	unsigned int idx;
+	unsigned int stack_size;
 };
 
 #endif /* !__ASSEMBLY__ */
+10 -6
arch/powerpc/net/bpf_jit_comp64.c
···
 static int bpf_jit_stack_local(struct codegen_context *ctx)
 {
 	if (bpf_has_stack_frame(ctx))
-		return STACK_FRAME_MIN_SIZE + MAX_BPF_STACK;
+		return STACK_FRAME_MIN_SIZE + ctx->stack_size;
 	else
 		return -(BPF_PPC_STACK_SAVE + 16);
 }
···
 static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 {
 	if (reg >= BPF_PPC_NVR_MIN && reg < 32)
-		return (bpf_has_stack_frame(ctx) ? BPF_PPC_STACKFRAME : 0)
-			- (8 * (32 - reg));
+		return (bpf_has_stack_frame(ctx) ?
+			(BPF_PPC_STACKFRAME + ctx->stack_size) : 0)
+				- (8 * (32 - reg));
 
 	pr_err("BPF JIT is asking about unknown registers");
 	BUG();
···
 		PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
 	}
 
-	PPC_BPF_STLU(1, 1, -BPF_PPC_STACKFRAME);
+	PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
 }
 
 /*
···
 	/* Setup frame pointer to point to the bpf stack area */
 	if (bpf_is_seen_register(ctx, BPF_REG_FP))
 		PPC_ADDI(b2p[BPF_REG_FP], 1,
-				STACK_FRAME_MIN_SIZE + MAX_BPF_STACK);
+				STACK_FRAME_MIN_SIZE + ctx->stack_size);
 }
 
 static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx)
···
 
 	/* Tear down our stack frame */
 	if (bpf_has_stack_frame(ctx)) {
-		PPC_ADDI(1, 1, BPF_PPC_STACKFRAME);
+		PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
 		if (ctx->seen & SEEN_FUNC) {
 			PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
 			PPC_MTLR(0);
···
 	}
 
 	memset(&cgctx, 0, sizeof(struct codegen_context));
+
+	/* Make sure that the stack is quadword aligned. */
+	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
 	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+2 -6
arch/powerpc/oprofile/op_model_cell.c
···
 
 static void start_virt_cntrs(void)
 {
-	init_timer(&timer_virt_cntr);
-	timer_virt_cntr.function = cell_virtual_cntr;
-	timer_virt_cntr.data = 0UL;
+	setup_timer(&timer_virt_cntr, cell_virtual_cntr, 0UL);
 	timer_virt_cntr.expires = jiffies + HZ / 10;
 	add_timer(&timer_virt_cntr);
 }
···
 
 static void start_spu_event_swap(void)
 {
-	init_timer(&timer_spu_event_swap);
-	timer_spu_event_swap.function = spu_evnt_swap;
-	timer_spu_event_swap.data = 0UL;
+	setup_timer(&timer_spu_event_swap, spu_evnt_swap, 0UL);
 	timer_spu_event_swap.expires = jiffies + HZ / 25;
 	add_timer(&timer_spu_event_swap);
 }
+1 -1
arch/powerpc/perf/hv-24x7.c
···
 {
 	if (s1 < s2)
 		return 1;
-	if (s2 > s1)
+	if (s1 > s2)
 		return -1;
 
 	return memcmp(d1, d2, s1);
+14 -5
arch/powerpc/platforms/Kconfig.cputype
···
 	def_bool y
 	depends on PPC_STD_MMU && PPC32
 
-config PPC_STD_MMU_64
-	def_bool y
-	depends on PPC_STD_MMU && PPC64
-
 config PPC_RADIX_MMU
 	bool "Radix MMU Support"
 	depends on PPC_BOOK3S_64
···
 	  Enable support for the Power ISA 3.0 Radix style MMU. Currently this
 	  is only implemented by IBM Power9 CPUs, if you don't have one of them
 	  you can probably disable this.
+
+config PPC_RADIX_MMU_DEFAULT
+	bool "Default to using the Radix MMU when possible"
+	depends on PPC_RADIX_MMU
+	default y
+	help
+	  When the hardware supports the Radix MMU, default to using it unless
+	  "disable_radix[=yes]" is specified on the kernel command line.
+
+	  If this option is disabled, the Hash MMU will be used by default,
+	  unless "disable_radix=no" is specified on the kernel command line.
+
+	  If you're unsure, say Y.
 
 config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	def_bool y
···
 
 config PPC_MM_SLICES
 	bool
-	default y if PPC_STD_MMU_64
+	default y if PPC_BOOK3S_64
 	default n
 
 config PPC_HAVE_PMU_SUPPORT
+1 -3
arch/powerpc/platforms/powermac/low_i2c.c
···
 	mutex_init(&host->mutex);
 	init_completion(&host->complete);
 	spin_lock_init(&host->lock);
-	init_timer(&host->timeout_timer);
-	host->timeout_timer.function = kw_i2c_timeout;
-	host->timeout_timer.data = (unsigned long)host;
+	setup_timer(&host->timeout_timer, kw_i2c_timeout, (unsigned long)host);
 
 	psteps = of_get_property(np, "AAPL,address-step", NULL);
 	steps = psteps ? (*psteps) : 0x10;
+2 -1
arch/powerpc/platforms/powernv/Makefile
···
 obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
 obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE)	+= memtrace.o
-obj-$(CONFIG_PPC_VAS)	+= vas.o vas-window.o
+obj-$(CONFIG_PPC_VAS)	+= vas.o vas-window.o vas-debug.o
+obj-$(CONFIG_PPC_FTW)	+= nx-ftw.o
+22 -20
arch/powerpc/platforms/powernv/eeh-powernv.c
··· 41 41 #include "powernv.h" 42 42 #include "pci.h" 43 43 44 - static bool pnv_eeh_nb_init = false; 45 44 static int eeh_event_irq = -EINVAL; 46 45 47 46 static int pnv_eeh_init(void) ··· 196 197 * been built. If the I/O cache staff has been built, EEH is 197 198 * ready to supply service. 198 199 */ 199 - static int pnv_eeh_post_init(void) 200 + int pnv_eeh_post_init(void) 200 201 { 201 202 struct pci_controller *hose; 202 203 struct pnv_phb *phb; 203 204 int ret = 0; 204 205 206 + /* Probe devices & build address cache */ 207 + eeh_probe_devices(); 208 + eeh_addr_cache_build(); 209 + 205 210 /* Register OPAL event notifier */ 206 - if (!pnv_eeh_nb_init) { 207 - eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR)); 208 - if (eeh_event_irq < 0) { 209 - pr_err("%s: Can't register OPAL event interrupt (%d)\n", 210 - __func__, eeh_event_irq); 211 - return eeh_event_irq; 212 - } 211 + eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR)); 212 + if (eeh_event_irq < 0) { 213 + pr_err("%s: Can't register OPAL event interrupt (%d)\n", 214 + __func__, eeh_event_irq); 215 + return eeh_event_irq; 216 + } 213 217 214 - ret = request_irq(eeh_event_irq, pnv_eeh_event, 215 - IRQ_TYPE_LEVEL_HIGH, "opal-eeh", NULL); 216 - if (ret < 0) { 217 - irq_dispose_mapping(eeh_event_irq); 218 - pr_err("%s: Can't request OPAL event interrupt (%d)\n", 219 - __func__, eeh_event_irq); 220 - return ret; 221 - } 222 - 223 - pnv_eeh_nb_init = true; 218 + ret = request_irq(eeh_event_irq, pnv_eeh_event, 219 + IRQ_TYPE_LEVEL_HIGH, "opal-eeh", NULL); 220 + if (ret < 0) { 221 + irq_dispose_mapping(eeh_event_irq); 222 + pr_err("%s: Can't request OPAL event interrupt (%d)\n", 223 + __func__, eeh_event_irq); 224 + return ret; 224 225 } 225 226 226 227 if (!eeh_enabled()) ··· 364 365 365 366 /* Skip for PCI-ISA bridge */ 366 367 if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_ISA) 368 + return NULL; 369 + 370 + /* Skip if we haven't probed yet */ 371 + if (phb->ioda.pe_rmap[config_addr] 
== IODA_INVALID_PE) 367 372 return NULL; 368 373 369 374 /* Initialize eeh device */ ··· 1734 1731 static struct eeh_ops pnv_eeh_ops = { 1735 1732 .name = "powernv", 1736 1733 .init = pnv_eeh_init, 1737 - .post_init = pnv_eeh_post_init, 1738 1734 .probe = pnv_eeh_probe, 1739 1735 .set_option = pnv_eeh_set_option, 1740 1736 .get_pe_addr = pnv_eeh_get_pe_addr,
+23 -5
arch/powerpc/platforms/powernv/npu-dma.c
··· 395 395 struct pci_dev *npdev[NV_MAX_NPUS][NV_MAX_LINKS]; 396 396 struct mmu_notifier mn; 397 397 struct kref kref; 398 + bool nmmu_flush; 398 399 399 400 /* Callback to stop translation requests on a given GPU */ 400 401 struct npu_context *(*release_cb)(struct npu_context *, void *); ··· 546 545 struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS]; 547 546 unsigned long pid = npu_context->mm->context.id; 548 547 549 - /* 550 - * Unfortunately the nest mmu does not support flushing specific 551 - * addresses so we have to flush the whole mm. 552 - */ 553 - flush_tlb_mm(npu_context->mm); 548 + if (npu_context->nmmu_flush) 549 + /* 550 + * Unfortunately the nest mmu does not support flushing specific 551 + * addresses so we have to flush the whole mm once before 552 + * shooting down the GPU translation. 553 + */ 554 + flush_all_mm(npu_context->mm); 554 555 555 556 /* 556 557 * Loop over all the NPUs this process is active on and launch ··· 725 722 return ERR_PTR(-ENODEV); 726 723 npu_context->npdev[npu->index][nvlink_index] = npdev; 727 724 725 + if (!nphb->npu.nmmu_flush) { 726 + /* 727 + * If we're not explicitly flushing ourselves we need to mark 728 + * the thread for global flushes 729 + */ 730 + npu_context->nmmu_flush = false; 731 + mm_context_add_copro(mm); 732 + } else 733 + npu_context->nmmu_flush = true; 734 + 728 735 return npu_context; 729 736 } 730 737 EXPORT_SYMBOL(pnv_npu2_init_context); ··· 743 730 { 744 731 struct npu_context *npu_context = 745 732 container_of(kref, struct npu_context, kref); 733 + 734 + if (!npu_context->nmmu_flush) 735 + mm_context_remove_copro(npu_context->mm); 746 736 747 737 npu_context->mm->context.npu_context = NULL; 748 738 mmu_notifier_unregister(&npu_context->mn, ··· 835 819 static int npu_index; 836 820 uint64_t rc = 0; 837 821 822 + phb->npu.nmmu_flush = 823 + of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush"); 838 824 for_each_child_of_node(phb->hose->dn, dn) { 839 825 gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn)); 
840 826 if (gpdev) {
+132 -50
arch/powerpc/platforms/powernv/opal-async.c
··· 1 1 /* 2 2 * PowerNV OPAL asynchronous completion interfaces 3 3 * 4 - * Copyright 2013 IBM Corp. 4 + * Copyright 2013-2017 IBM Corp. 5 5 * 6 6 * This program is free software; you can redistribute it and/or 7 7 * modify it under the terms of the GNU General Public License ··· 23 23 #include <asm/machdep.h> 24 24 #include <asm/opal.h> 25 25 26 - #define N_ASYNC_COMPLETIONS 64 26 + enum opal_async_token_state { 27 + ASYNC_TOKEN_UNALLOCATED = 0, 28 + ASYNC_TOKEN_ALLOCATED, 29 + ASYNC_TOKEN_DISPATCHED, 30 + ASYNC_TOKEN_ABANDONED, 31 + ASYNC_TOKEN_COMPLETED 32 + }; 27 33 28 - static DECLARE_BITMAP(opal_async_complete_map, N_ASYNC_COMPLETIONS) = {~0UL}; 29 - static DECLARE_BITMAP(opal_async_token_map, N_ASYNC_COMPLETIONS); 34 + struct opal_async_token { 35 + enum opal_async_token_state state; 36 + struct opal_msg response; 37 + }; 38 + 30 39 static DECLARE_WAIT_QUEUE_HEAD(opal_async_wait); 31 40 static DEFINE_SPINLOCK(opal_async_comp_lock); 32 41 static struct semaphore opal_async_sem; 33 - static struct opal_msg *opal_async_responses; 34 42 static unsigned int opal_max_async_tokens; 43 + static struct opal_async_token *opal_async_tokens; 35 44 36 - int __opal_async_get_token(void) 45 + static int __opal_async_get_token(void) 37 46 { 38 47 unsigned long flags; 39 - int token; 48 + int i, token = -EBUSY; 40 49 41 50 spin_lock_irqsave(&opal_async_comp_lock, flags); 42 - token = find_first_bit(opal_async_complete_map, opal_max_async_tokens); 43 - if (token >= opal_max_async_tokens) { 44 - token = -EBUSY; 45 - goto out; 51 + 52 + for (i = 0; i < opal_max_async_tokens; i++) { 53 + if (opal_async_tokens[i].state == ASYNC_TOKEN_UNALLOCATED) { 54 + opal_async_tokens[i].state = ASYNC_TOKEN_ALLOCATED; 55 + token = i; 56 + break; 57 + } 46 58 } 47 59 48 - if (__test_and_set_bit(token, opal_async_token_map)) { 49 - token = -EBUSY; 50 - goto out; 51 - } 52 - 53 - __clear_bit(token, opal_async_complete_map); 54 - 55 - out: 56 60 spin_unlock_irqrestore(&opal_async_comp_lock, 
flags); 57 61 return token; 58 62 } 59 63 64 + /* 65 + * Note: If the returned token is used in an opal call and opal returns 66 + * OPAL_ASYNC_COMPLETION you MUST call one of opal_async_wait_response() or 67 + * opal_async_wait_response_interruptible() at least once before calling another 68 + * opal_async_* function 69 + */ 60 70 int opal_async_get_token_interruptible(void) 61 71 { 62 72 int token; ··· 83 73 } 84 74 EXPORT_SYMBOL_GPL(opal_async_get_token_interruptible); 85 75 86 - int __opal_async_release_token(int token) 76 + static int __opal_async_release_token(int token) 87 77 { 88 78 unsigned long flags; 79 + int rc; 89 80 90 81 if (token < 0 || token >= opal_max_async_tokens) { 91 82 pr_err("%s: Passed token is out of range, token %d\n", ··· 95 84 } 96 85 97 86 spin_lock_irqsave(&opal_async_comp_lock, flags); 98 - __set_bit(token, opal_async_complete_map); 99 - __clear_bit(token, opal_async_token_map); 87 + switch (opal_async_tokens[token].state) { 88 + case ASYNC_TOKEN_COMPLETED: 89 + case ASYNC_TOKEN_ALLOCATED: 90 + opal_async_tokens[token].state = ASYNC_TOKEN_UNALLOCATED; 91 + rc = 0; 92 + break; 93 + /* 94 + * DISPATCHED and ABANDONED tokens must wait for OPAL to respond. 95 + * Mark a DISPATCHED token as ABANDONED so that the response handling 96 + * code knows no one cares and that it can free it then. 
97 + */ 98 + case ASYNC_TOKEN_DISPATCHED: 99 + opal_async_tokens[token].state = ASYNC_TOKEN_ABANDONED; 100 + /* Fall through */ 101 + default: 102 + rc = 1; 103 + } 100 104 spin_unlock_irqrestore(&opal_async_comp_lock, flags); 101 105 102 - return 0; 106 + return rc; 103 107 } 104 108 105 109 int opal_async_release_token(int token) ··· 122 96 int ret; 123 97 124 98 ret = __opal_async_release_token(token); 125 - if (ret) 126 - return ret; 99 + if (!ret) 100 + up(&opal_async_sem); 127 101 128 - up(&opal_async_sem); 129 - 130 - return 0; 102 + return ret; 131 103 } 132 104 EXPORT_SYMBOL_GPL(opal_async_release_token); 133 105 ··· 141 117 return -EINVAL; 142 118 } 143 119 144 - /* Wakeup the poller before we wait for events to speed things 120 + /* 121 + * There is no need to mark the token as dispatched, wait_event() 122 + * will block until the token completes. 123 + * 124 + * Wakeup the poller before we wait for events to speed things 145 125 * up on platforms or simulators where the interrupts aren't 146 126 * functional. 
147 127 */ 148 128 opal_wake_poller(); 149 - wait_event(opal_async_wait, test_bit(token, opal_async_complete_map)); 150 - memcpy(msg, &opal_async_responses[token], sizeof(*msg)); 129 + wait_event(opal_async_wait, opal_async_tokens[token].state 130 + == ASYNC_TOKEN_COMPLETED); 131 + memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg)); 151 132 152 133 return 0; 153 134 } 154 135 EXPORT_SYMBOL_GPL(opal_async_wait_response); 155 136 137 + int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg) 138 + { 139 + unsigned long flags; 140 + int ret; 141 + 142 + if (token >= opal_max_async_tokens) { 143 + pr_err("%s: Invalid token passed\n", __func__); 144 + return -EINVAL; 145 + } 146 + 147 + if (!msg) { 148 + pr_err("%s: Invalid message pointer passed\n", __func__); 149 + return -EINVAL; 150 + } 151 + 152 + /* 153 + * The first time this gets called we mark the token as DISPATCHED 154 + * so that if wait_event_interruptible() returns not zero and the 155 + * caller frees the token, we know not to actually free the token 156 + * until the response comes. 157 + * 158 + * Only change if the token is ALLOCATED - it may have been 159 + * completed even before the caller gets around to calling this 160 + * the first time. 161 + * 162 + * There is also a dirty great comment at the token allocation 163 + * function that if the opal call returns OPAL_ASYNC_COMPLETION to 164 + * the caller then the caller *must* call this or the not 165 + * interruptible version before doing anything else with the 166 + * token. 
167 + */ 168 + if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED) { 169 + spin_lock_irqsave(&opal_async_comp_lock, flags); 170 + if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED) 171 + opal_async_tokens[token].state = ASYNC_TOKEN_DISPATCHED; 172 + spin_unlock_irqrestore(&opal_async_comp_lock, flags); 173 + } 174 + 175 + /* 176 + * Wakeup the poller before we wait for events to speed things 177 + * up on platforms or simulators where the interrupts aren't 178 + * functional. 179 + */ 180 + opal_wake_poller(); 181 + ret = wait_event_interruptible(opal_async_wait, 182 + opal_async_tokens[token].state == 183 + ASYNC_TOKEN_COMPLETED); 184 + if (!ret) 185 + memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg)); 186 + 187 + return ret; 188 + } 189 + EXPORT_SYMBOL_GPL(opal_async_wait_response_interruptible); 190 + 191 + /* Called from interrupt context */ 156 192 static int opal_async_comp_event(struct notifier_block *nb, 157 193 unsigned long msg_type, void *msg) 158 194 { 159 195 struct opal_msg *comp_msg = msg; 196 + enum opal_async_token_state state; 160 197 unsigned long flags; 161 198 uint64_t token; 162 199 ··· 225 140 return 0; 226 141 227 142 token = be64_to_cpu(comp_msg->params[0]); 228 - memcpy(&opal_async_responses[token], comp_msg, sizeof(*comp_msg)); 229 143 spin_lock_irqsave(&opal_async_comp_lock, flags); 230 - __set_bit(token, opal_async_complete_map); 144 + state = opal_async_tokens[token].state; 145 + opal_async_tokens[token].state = ASYNC_TOKEN_COMPLETED; 231 146 spin_unlock_irqrestore(&opal_async_comp_lock, flags); 232 147 148 + if (state == ASYNC_TOKEN_ABANDONED) { 149 + /* Free the token, no one else will */ 150 + opal_async_release_token(token); 151 + return 0; 152 + } 153 + memcpy(&opal_async_tokens[token].response, comp_msg, sizeof(*comp_msg)); 233 154 wake_up(&opal_async_wait); 234 155 235 156 return 0; ··· 269 178 } 270 179 271 180 opal_max_async_tokens = be32_to_cpup(async); 272 - if (opal_max_async_tokens > 
N_ASYNC_COMPLETIONS) 273 - opal_max_async_tokens = N_ASYNC_COMPLETIONS; 181 + opal_async_tokens = kcalloc(opal_max_async_tokens, 182 + sizeof(*opal_async_tokens), GFP_KERNEL); 183 + if (!opal_async_tokens) { 184 + err = -ENOMEM; 185 + goto out_opal_node; 186 + } 274 187 275 188 err = opal_message_notifier_register(OPAL_MSG_ASYNC_COMP, 276 189 &opal_async_comp_nb); 277 190 if (err) { 278 191 pr_err("%s: Can't register OPAL event notifier (%d)\n", 279 192 __func__, err); 193 + kfree(opal_async_tokens); 280 194 goto out_opal_node; 281 195 } 282 196 283 - opal_async_responses = kzalloc( 284 - sizeof(*opal_async_responses) * opal_max_async_tokens, 285 - GFP_KERNEL); 286 - if (!opal_async_responses) { 287 - pr_err("%s: Out of memory, failed to do asynchronous " 288 - "completion init\n", __func__); 289 - err = -ENOMEM; 290 - goto out_opal_node; 291 - } 292 - 293 - /* Initialize to 1 less than the maximum tokens available, as we may 294 - * require to pop one during emergency through synchronous call to 295 - * __opal_async_get_token() 296 - */ 297 - sema_init(&opal_async_sem, opal_max_async_tokens - 1); 197 + sema_init(&opal_async_sem, opal_max_async_tokens); 298 198 299 199 out_opal_node: 300 200 of_node_put(opal_node);
+1 -1
arch/powerpc/platforms/powernv/opal-hmi.c
···
 /*
- * OPAL hypervisor Maintenance interrupt handling support in PowreNV.
+ * OPAL hypervisor Maintenance interrupt handling support in PowerNV.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
+7 -1
arch/powerpc/platforms/powernv/opal-irqchip.c
···
 
 	/* First free interrupts, which will also mask them */
 	for (i = 0; i < opal_irq_count; i++) {
-		if (opal_irqs[i])
+		if (!opal_irqs[i])
+			continue;
+
+		if (in_interrupt())
+			disable_irq_nosync(opal_irqs[i]);
+		else
 			free_irq(opal_irqs[i], NULL);
+
 		opal_irqs[i] = 0;
 	}
 }
+1 -1
arch/powerpc/platforms/powernv/opal-memory-errors.c
···
 /*
- * OPAL asynchronus Memory error handling support in PowreNV.
+ * OPAL asynchronus Memory error handling support in PowerNV.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
+4 -13
arch/powerpc/platforms/powernv/opal-sensor.c
···
  */
 
 #include <linux/delay.h>
-#include <linux/mutex.h>
 #include <linux/of_platform.h>
 #include <asm/opal.h>
 #include <asm/machdep.h>
-
-static DEFINE_MUTEX(opal_sensor_mutex);
 
 /*
  * This will return sensor information to driver based on the requested sensor
···
 	__be32 data;
 
 	token = opal_async_get_token_interruptible();
-	if (token < 0) {
-		pr_err("%s: Couldn't get the token, returning\n", __func__);
-		ret = token;
-		goto out;
-	}
+	if (token < 0)
+		return token;
 
-	mutex_lock(&opal_sensor_mutex);
 	ret = opal_sensor_read(sensor_hndl, token, &data);
 	switch (ret) {
 	case OPAL_ASYNC_COMPLETION:
···
 		if (ret) {
 			pr_err("%s: Failed to wait for the async response, %d\n",
 			       __func__, ret);
-			goto out_token;
+			goto out;
 		}
 
 		ret = opal_error_code(opal_get_async_rc(msg));
···
 		break;
 	}
 
-out_token:
-	mutex_unlock(&opal_sensor_mutex);
-	opal_async_release_token(token);
 out:
+	opal_async_release_token(token);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(opal_get_sensor_data);
+3 -2
arch/powerpc/platforms/powernv/opal-wrappers.S
···
 	 * bytes (always BE) since MSR:LE will end up fixed up as a side
 	 * effect of the rfid.
 	 */
-	FIXUP_ENDIAN
+	FIXUP_ENDIAN_HV
 	ld	r2,PACATOC(r13);
 	lwz	r4,8(r1);
 	ld	r5,PPC_LR_STKOFF(r1);
···
 	hrfid
 
 opal_return_realmode:
-	FIXUP_ENDIAN
+	FIXUP_ENDIAN_HV
 	ld	r2,PACATOC(r13);
 	lwz	r11,8(r1);
 	ld	r12,PPC_LR_STKOFF(r1)
···
 OPAL_CALL(opal_xive_set_vp_info,	OPAL_XIVE_SET_VP_INFO);
 OPAL_CALL(opal_xive_sync,		OPAL_XIVE_SYNC);
 OPAL_CALL(opal_xive_dump,		OPAL_XIVE_DUMP);
+OPAL_CALL(opal_signal_system_reset,	OPAL_SIGNAL_SYSTEM_RESET);
 OPAL_CALL(opal_npu_init_context,	OPAL_NPU_INIT_CONTEXT);
 OPAL_CALL(opal_npu_destroy_context,	OPAL_NPU_DESTROY_CONTEXT);
 OPAL_CALL(opal_npu_map_lpar,		OPAL_NPU_MAP_LPAR);
+2
arch/powerpc/platforms/powernv/opal.c
···
 
 	case OPAL_PARAMETER:		return -EINVAL;
 	case OPAL_ASYNC_COMPLETION:	return -EINPROGRESS;
+	case OPAL_BUSY:
 	case OPAL_BUSY_EVENT:		return -EBUSY;
 	case OPAL_NO_MEM:		return -ENOMEM;
 	case OPAL_PERMISSION:		return -EPERM;
···
 /* Export this for KVM */
 EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
 EXPORT_SYMBOL_GPL(opal_int_eoi);
+EXPORT_SYMBOL_GPL(opal_error_code);
+23 -6
arch/powerpc/platforms/powernv/pci-ioda.c
···
 	}
 
 	/*
-	 * After doing so, there would be a "hole" in the /proc/iomem when
-	 * offset is a positive value. It looks like the device return some
-	 * mmio back to the system, which actually no one could use it.
+	 * Since M64 BAR shares segments among all possible 256 PEs,
+	 * we have to shift the beginning of PF IOV BAR to make it start from
+	 * the segment which belongs to the PE number assigned to the first VF.
+	 * This creates a "hole" in the /proc/iomem which could be used for
+	 * allocating other resources so we reserve this area below and
+	 * release when IOV is released.
 	 */
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &dev->resource[i + PCI_IOV_RESOURCES];
···
 		dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (%sabling %d VFs shifted by %d)\n",
 			 i, &res2, res, (offset > 0) ? "En" : "Dis",
 			 num_vfs, offset);
+
+		if (offset < 0) {
+			devm_release_resource(&dev->dev, &pdn->holes[i]);
+			memset(&pdn->holes[i], 0, sizeof(pdn->holes[i]));
+		}
+
 		pci_update_resource(dev, i + PCI_IOV_RESOURCES);
+
+		if (offset > 0) {
+			pdn->holes[i].start = res2.start;
+			pdn->holes[i].end = res2.start + size * offset - 1;
+			pdn->holes[i].flags = IORESOURCE_BUS;
+			pdn->holes[i].name = "pnv_iov_reserved";
+			devm_request_resource(&dev->dev, res->parent,
+					&pdn->holes[i]);
+		}
 	}
 	return 0;
 }
···
 	if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
 		return -EINVAL;
 
-	if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
+	if (!is_power_of_2(window_size))
 		return -EINVAL;
 
 	/* Adjust direct table size from window_size and levels */
···
 	pnv_pci_ioda_create_dbgfs();
 
 #ifdef CONFIG_EEH
-	eeh_init();
-	eeh_addr_cache_build();
+	pnv_eeh_post_init();
 #endif
 }
+4
arch/powerpc/platforms/powernv/pci.h
··· 188 188 189 189 /* Bitmask for MMIO register usage */ 190 190 unsigned long mmio_atsd_usage; 191 + 192 + /* Do we need to explicitly flush the nest mmu? */ 193 + bool nmmu_flush; 191 194 } npu; 192 195 193 196 #ifdef CONFIG_CXL_BASE ··· 238 235 extern void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq); 239 236 extern bool pnv_pci_enable_device_hook(struct pci_dev *dev); 240 237 extern void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable); 238 + extern int pnv_eeh_post_init(void); 241 239 242 240 extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level, 243 241 const char *fmt, ...);
+25 -1
arch/powerpc/platforms/powernv/setup.c
··· 36 36 #include <asm/opal.h> 37 37 #include <asm/kexec.h> 38 38 #include <asm/smp.h> 39 + #include <asm/tm.h> 39 40 40 41 #include "powernv.h" 41 42 ··· 291 290 ppc_md.restart = pnv_restart; 292 291 pm_power_off = pnv_power_off; 293 292 ppc_md.halt = pnv_halt; 293 + /* ppc_md.system_reset_exception gets filled in by pnv_smp_init() */ 294 294 ppc_md.machine_check_exception = opal_machine_check; 295 295 ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery; 296 296 ppc_md.hmi_exception_early = opal_hmi_exception_early; ··· 313 311 return 1; 314 312 } 315 313 314 + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 315 + void __init pnv_tm_init(void) 316 + { 317 + if (!firmware_has_feature(FW_FEATURE_OPAL) || 318 + !pvr_version_is(PVR_POWER9) || 319 + early_cpu_has_feature(CPU_FTR_TM)) 320 + return; 321 + 322 + if (opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) != OPAL_SUCCESS) 323 + return; 324 + 325 + pr_info("Enabling TM (Transactional Memory) with Suspend Disabled\n"); 326 + cur_cpu_spec->cpu_features |= CPU_FTR_TM; 327 + /* Make sure "normal" HTM is off (it should be) */ 328 + cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_HTM; 329 + /* Turn on no suspend mode, and HTM no SC */ 330 + cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_HTM_NO_SUSPEND | \ 331 + PPC_FEATURE2_HTM_NOSC; 332 + tm_suspend_disabled = true; 333 + } 334 + #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ 335 + 316 336 /* 317 337 * Returns the cpu frequency for 'cpu' in Hz. This is used by 318 338 * /proc/cpuinfo ··· 343 319 { 344 320 unsigned long ret_freq; 345 321 346 - ret_freq = cpufreq_quick_get(cpu) * 1000ul; 322 + ret_freq = cpufreq_get(cpu) * 1000ul; 347 323 348 324 /* 349 325 * If the backend cpufreq driver does not exist,
+59
arch/powerpc/platforms/powernv/smp.c
··· 49 49 50 50 static void pnv_smp_setup_cpu(int cpu) 51 51 { 52 + /* 53 + * P9 workaround for CI vector load (see traps.c), 54 + * enable the corresponding HMI interrupt 55 + */ 56 + if (pvr_version_is(PVR_POWER9)) 57 + mtspr(SPRN_HMEER, mfspr(SPRN_HMEER) | PPC_BIT(17)); 58 + 52 59 if (xive_enabled()) 53 60 xive_smp_setup_cpu(); 54 61 else if (cpu != boot_cpuid) ··· 297 290 } 298 291 } 299 292 293 + static int pnv_system_reset_exception(struct pt_regs *regs) 294 + { 295 + if (smp_handle_nmi_ipi(regs)) 296 + return 1; 297 + return 0; 298 + } 299 + 300 + static int pnv_cause_nmi_ipi(int cpu) 301 + { 302 + int64_t rc; 303 + 304 + if (cpu >= 0) { 305 + rc = opal_signal_system_reset(get_hard_smp_processor_id(cpu)); 306 + if (rc != OPAL_SUCCESS) 307 + return 0; 308 + return 1; 309 + 310 + } else if (cpu == NMI_IPI_ALL_OTHERS) { 311 + bool success = true; 312 + int c; 313 + 314 + 315 + /* 316 + * We do not use broadcasts (yet), because it's not clear 317 + * exactly what semantics Linux wants or the firmware should 318 + * provide. 319 + */ 320 + for_each_online_cpu(c) { 321 + if (c == smp_processor_id()) 322 + continue; 323 + 324 + rc = opal_signal_system_reset( 325 + get_hard_smp_processor_id(c)); 326 + if (rc != OPAL_SUCCESS) 327 + success = false; 328 + } 329 + if (success) 330 + return 1; 331 + 332 + /* 333 + * Caller will fall back to doorbells, which may pick 334 + * up the remainders. 335 + */ 336 + } 337 + 338 + return 0; 339 + } 340 + 300 341 static struct smp_ops_t pnv_smp_ops = { 301 342 .message_pass = NULL, /* Use smp_muxed_ipi_message_pass */ 302 343 .cause_ipi = NULL, /* Filled at runtime by pnv_smp_probe() */ ··· 363 308 /* This is called very early during platform setup_arch */ 364 309 void __init pnv_smp_init(void) 365 310 { 311 + if (opal_check_token(OPAL_SIGNAL_SYSTEM_RESET)) { 312 + ppc_md.system_reset_exception = pnv_system_reset_exception; 313 + pnv_smp_ops.cause_nmi_ipi = pnv_cause_nmi_ipi; 314 + } 366 315 smp_ops = &pnv_smp_ops; 367 316 368 317 #ifdef CONFIG_HOTPLUG_CPU
+209
arch/powerpc/platforms/powernv/vas-debug.c
··· 1 + /* 2 + * Copyright 2016-17 IBM Corp. 3 + * 4 + * This program is free software; you can redistribute it and/or 5 + * modify it under the terms of the GNU General Public License 6 + * as published by the Free Software Foundation; either version 7 + * 2 of the License, or (at your option) any later version. 8 + */ 9 + 10 + #define pr_fmt(fmt) "vas: " fmt 11 + 12 + #include <linux/types.h> 13 + #include <linux/slab.h> 14 + #include <linux/debugfs.h> 15 + #include <linux/seq_file.h> 16 + #include "vas.h" 17 + 18 + static struct dentry *vas_debugfs; 19 + 20 + static char *cop_to_str(int cop) 21 + { 22 + switch (cop) { 23 + case VAS_COP_TYPE_FAULT: return "Fault"; 24 + case VAS_COP_TYPE_842: return "NX-842 Normal Priority"; 25 + case VAS_COP_TYPE_842_HIPRI: return "NX-842 High Priority"; 26 + case VAS_COP_TYPE_GZIP: return "NX-GZIP Normal Priority"; 27 + case VAS_COP_TYPE_GZIP_HIPRI: return "NX-GZIP High Priority"; 28 + case VAS_COP_TYPE_FTW: return "Fast Thread-wakeup"; 29 + default: return "Unknown"; 30 + } 31 + } 32 + 33 + static int info_dbg_show(struct seq_file *s, void *private) 34 + { 35 + struct vas_window *window = s->private; 36 + 37 + mutex_lock(&vas_mutex); 38 + 39 + /* ensure window is not unmapped */ 40 + if (!window->hvwc_map) 41 + goto unlock; 42 + 43 + seq_printf(s, "Type: %s, %s\n", cop_to_str(window->cop), 44 + window->tx_win ? "Send" : "Receive"); 45 + seq_printf(s, "Pid : %d\n", window->pid); 46 + 47 + unlock: 48 + mutex_unlock(&vas_mutex); 49 + return 0; 50 + } 51 + 52 + static int info_dbg_open(struct inode *inode, struct file *file) 53 + { 54 + return single_open(file, info_dbg_show, inode->i_private); 55 + } 56 + 57 + static const struct file_operations info_fops = { 58 + .open = info_dbg_open, 59 + .read = seq_read, 60 + .llseek = seq_lseek, 61 + .release = single_release, 62 + }; 63 + 64 + static inline void print_reg(struct seq_file *s, struct vas_window *win, 65 + char *name, u32 reg) 66 + { 67 + seq_printf(s, "0x%016llx %s\n", read_hvwc_reg(win, name, reg), name); 68 + } 69 + 70 + static int hvwc_dbg_show(struct seq_file *s, void *private) 71 + { 72 + struct vas_window *window = s->private; 73 + 74 + mutex_lock(&vas_mutex); 75 + 76 + /* ensure window is not unmapped */ 77 + if (!window->hvwc_map) 78 + goto unlock; 79 + 80 + print_reg(s, window, VREG(LPID)); 81 + print_reg(s, window, VREG(PID)); 82 + print_reg(s, window, VREG(XLATE_MSR)); 83 + print_reg(s, window, VREG(XLATE_LPCR)); 84 + print_reg(s, window, VREG(XLATE_CTL)); 85 + print_reg(s, window, VREG(AMR)); 86 + print_reg(s, window, VREG(SEIDR)); 87 + print_reg(s, window, VREG(FAULT_TX_WIN)); 88 + print_reg(s, window, VREG(OSU_INTR_SRC_RA)); 89 + print_reg(s, window, VREG(HV_INTR_SRC_RA)); 90 + print_reg(s, window, VREG(PSWID)); 91 + print_reg(s, window, VREG(LFIFO_BAR)); 92 + print_reg(s, window, VREG(LDATA_STAMP_CTL)); 93 + print_reg(s, window, VREG(LDMA_CACHE_CTL)); 94 + print_reg(s, window, VREG(LRFIFO_PUSH)); 95 + print_reg(s, window, VREG(CURR_MSG_COUNT)); 96 + print_reg(s, window, VREG(LNOTIFY_AFTER_COUNT)); 97 + print_reg(s, window, VREG(LRX_WCRED)); 98 + print_reg(s, window, VREG(LRX_WCRED_ADDER)); 99 + print_reg(s, window, VREG(TX_WCRED)); 100 + print_reg(s, window, VREG(TX_WCRED_ADDER)); 101 + print_reg(s, window, VREG(LFIFO_SIZE)); 102 + print_reg(s, window, VREG(WINCTL)); 103 + print_reg(s, window, VREG(WIN_STATUS)); 104 + print_reg(s, window, VREG(WIN_CTX_CACHING_CTL)); 105 + print_reg(s, window, VREG(TX_RSVD_BUF_COUNT)); 106 + print_reg(s, window, VREG(LRFIFO_WIN_PTR)); 107 + print_reg(s, window, VREG(LNOTIFY_CTL)); 108 + print_reg(s, window, VREG(LNOTIFY_PID)); 109 + print_reg(s, window, VREG(LNOTIFY_LPID)); 110 + print_reg(s, window, VREG(LNOTIFY_TID)); 111 + print_reg(s, window, VREG(LNOTIFY_SCOPE)); 112 + print_reg(s, window, VREG(NX_UTIL_ADDER)); 113 + unlock: 114 + mutex_unlock(&vas_mutex); 115 + return 0; 116 + } 117 + 118 + static int hvwc_dbg_open(struct inode *inode, struct file *file) 119 + { 120 + return single_open(file, hvwc_dbg_show, inode->i_private); 121 + } 122 + 123 + static const struct file_operations hvwc_fops = { 124 + .open = hvwc_dbg_open, 125 + .read = seq_read, 126 + .llseek = seq_lseek, 127 + .release = single_release, 128 + }; 129 + 130 + void vas_window_free_dbgdir(struct vas_window *window) 131 + { 132 + if (window->dbgdir) { 133 + debugfs_remove_recursive(window->dbgdir); 134 + kfree(window->dbgname); 135 + window->dbgdir = NULL; 136 + window->dbgname = NULL; 137 + } 138 + } 139 + 140 + void vas_window_init_dbgdir(struct vas_window *window) 141 + { 142 + struct dentry *f, *d; 143 + 144 + if (!window->vinst->dbgdir) 145 + return; 146 + 147 + window->dbgname = kzalloc(16, GFP_KERNEL); 148 + if (!window->dbgname) 149 + return; 150 + 151 + snprintf(window->dbgname, 16, "w%d", window->winid); 152 + 153 + d = debugfs_create_dir(window->dbgname, window->vinst->dbgdir); 154 + if (IS_ERR(d)) 155 + goto free_name; 156 + 157 + window->dbgdir = d; 158 + 159 + f = debugfs_create_file("info", 0444, d, window, &info_fops); 160 + if (IS_ERR(f)) 161 + goto remove_dir; 162 + 163 + f = debugfs_create_file("hvwc", 0444, d, window, &hvwc_fops); 164 + if (IS_ERR(f)) 165 + goto remove_dir; 166 + 167 + return; 168 + 169 + free_name: 170 + kfree(window->dbgname); 171 + window->dbgname = NULL; 172 + 173 + remove_dir: 174 + debugfs_remove_recursive(window->dbgdir); 175 + window->dbgdir = NULL; 176 + } 177 + 178 + void vas_instance_init_dbgdir(struct vas_instance *vinst) 179 + { 180 + struct dentry *d; 181 + 182 + if (!vas_debugfs) 183 + return; 184 + 185 + vinst->dbgname = kzalloc(16, GFP_KERNEL); 186 + if (!vinst->dbgname) 187 + return; 188 + 189 + snprintf(vinst->dbgname, 16, "v%d", vinst->vas_id); 190 + 191 + d = debugfs_create_dir(vinst->dbgname, vas_debugfs); 192 + if (IS_ERR(d)) 193 + goto free_name; 194 + 195 + vinst->dbgdir = d; 196 + return; 197 + 198 + free_name: 199 + kfree(vinst->dbgname); 200 + vinst->dbgname = NULL; 201 + vinst->dbgdir = NULL; 202 + } 203 + 204 + void vas_init_dbgdir(void) 205 + { 206 + vas_debugfs = debugfs_create_dir("vas", NULL); 207 + if (IS_ERR(vas_debugfs)) 208 + vas_debugfs = NULL; 209 + }
+198 -44
arch/powerpc/platforms/powernv/vas-window.c
··· 16 16 #include <linux/log2.h> 17 17 #include <linux/rcupdate.h> 18 18 #include <linux/cred.h> 19 - 19 + #include <asm/switch_to.h> 20 + #include <asm/ppc-opcode.h> 20 21 #include "vas.h" 21 22 #include "copy-paste.h" 22 23 ··· 40 39 41 40 pr_debug("Txwin #%d: Paste addr 0x%llx\n", winid, *addr); 42 41 } 42 + 43 + u64 vas_win_paste_addr(struct vas_window *win) 44 + { 45 + u64 addr; 46 + 47 + compute_paste_address(win, &addr, NULL); 48 + 49 + return addr; 50 + } 51 + EXPORT_SYMBOL(vas_win_paste_addr); 43 52 44 53 static inline void get_hvwc_mmio_bar(struct vas_window *window, 45 54 u64 *start, int *len) ··· 156 145 } 157 146 158 147 /* 159 - * Unmap the MMIO regions for a window. 148 + * Unmap the MMIO regions for a window. Hold the vas_mutex so we don't 149 + * unmap when the window's debugfs dir is in use. This serializes close 150 + * of a window even on another VAS instance but since its not a critical 151 + * path, just minimize the time we hold the mutex for now. We can add 152 + * a per-instance mutex later if necessary. 160 153 */ 161 154 static void unmap_winctx_mmio_bars(struct vas_window *window) 162 155 { 163 156 int len; 157 + void *uwc_map; 158 + void *hvwc_map; 164 159 u64 busaddr_start; 165 160 166 - if (window->hvwc_map) { 161 + mutex_lock(&vas_mutex); 162 + 163 + hvwc_map = window->hvwc_map; 164 + window->hvwc_map = NULL; 165 + 166 + uwc_map = window->uwc_map; 167 + window->uwc_map = NULL; 168 + 169 + mutex_unlock(&vas_mutex); 170 + 171 + if (hvwc_map) { 167 172 get_hvwc_mmio_bar(window, &busaddr_start, &len); 168 - unmap_region(window->hvwc_map, busaddr_start, len); 169 - window->hvwc_map = NULL; 173 + unmap_region(hvwc_map, busaddr_start, len); 170 174 } 171 175 172 - if (window->uwc_map) { 176 + if (uwc_map) { 173 177 get_uwc_mmio_bar(window, &busaddr_start, &len); 174 - unmap_region(window->uwc_map, busaddr_start, len); 175 - window->uwc_map = NULL; 178 + unmap_region(uwc_map, busaddr_start, len); 176 179 } 177 180 } 178 181 ··· 553 528 struct vas_instance *vinst = window->vinst; 554 529 555 530 unmap_winctx_mmio_bars(window); 531 + 532 + vas_window_free_dbgdir(window); 533 + 556 534 kfree(window); 557 535 558 536 vas_release_window_id(&vinst->ida, winid); ··· 580 552 if (map_winctx_mmio_bars(window)) 581 553 goto out_free; 582 554 555 + vas_window_init_dbgdir(window); 556 + 583 557 return window; 584 558 585 559 out_free: ··· 599 569 } 600 570 601 571 /* 572 + * Find the user space receive window given the @pswid. 573 + * - We must have a valid vasid and it must belong to this instance. 574 + * (so both send and receive windows are on the same VAS instance) 575 + * - The window must refer to an OPEN, FTW, RECEIVE window. 576 + * 577 + * NOTE: We access ->windows[] table and assume that vinst->mutex is held. 578 + */ 579 + static struct vas_window *get_user_rxwin(struct vas_instance *vinst, u32 pswid) 580 + { 581 + int vasid, winid; 582 + struct vas_window *rxwin; 583 + 584 + decode_pswid(pswid, &vasid, &winid); 585 + 586 + if (vinst->vas_id != vasid) 587 + return ERR_PTR(-EINVAL); 588 + 589 + rxwin = vinst->windows[winid]; 590 + 591 + if (!rxwin || rxwin->tx_win || rxwin->cop != VAS_COP_TYPE_FTW) 592 + return ERR_PTR(-EINVAL); 593 + 594 + return rxwin; 595 + } 596 + 597 + /* 602 598 * Get the VAS receive window associated with NX engine identified 603 599 * by @cop and if applicable, @pswid. 604 600 * ··· 637 581 638 582 mutex_lock(&vinst->mutex); 639 583 640 - if (cop == VAS_COP_TYPE_842 || cop == VAS_COP_TYPE_842_HIPRI) 641 - rxwin = vinst->rxwin[cop] ?: ERR_PTR(-EINVAL); 584 + if (cop == VAS_COP_TYPE_FTW) 585 + rxwin = get_user_rxwin(vinst, pswid); 642 586 else 643 - rxwin = ERR_PTR(-EINVAL); 587 + rxwin = vinst->rxwin[cop] ?: ERR_PTR(-EINVAL); 644 588 645 589 if (!IS_ERR(rxwin)) 646 590 atomic_inc(&rxwin->num_txwins); ··· 730 674 731 675 winctx->rx_fifo = rxattr->rx_fifo; 732 676 winctx->rx_fifo_size = rxattr->rx_fifo_size; 733 - winctx->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT; 677 + winctx->wcreds_max = rxwin->wcreds_max; 734 678 winctx->pin_win = rxattr->pin_win; 735 679 736 680 winctx->nx_win = rxattr->nx_win; 737 681 winctx->fault_win = rxattr->fault_win; 682 + winctx->user_win = rxattr->user_win; 683 + winctx->rej_no_credit = rxattr->rej_no_credit; 738 684 winctx->rx_word_mode = rxattr->rx_win_ord_mode; 739 685 winctx->tx_word_mode = rxattr->tx_win_ord_mode; 740 686 winctx->rx_wcred_mode = rxattr->rx_wcred_mode; 741 687 winctx->tx_wcred_mode = rxattr->tx_wcred_mode; 688 + winctx->notify_early = rxattr->notify_early; 742 689 743 690 if (winctx->nx_win) { 744 691 winctx->data_stamp = true; ··· 782 723 static bool rx_win_args_valid(enum vas_cop_type cop, 783 724 struct vas_rx_win_attr *attr) 784 725 { 785 - dump_rx_win_attr(attr); 726 + pr_debug("Rxattr: fault %d, notify %d, intr %d, early %d, fifo %d\n", 727 + attr->fault_win, attr->notify_disable, 728 + attr->intr_disable, attr->notify_early, 729 + attr->rx_fifo_size); 786 730 787 731 if (cop >= VAS_COP_TYPE_MAX) 788 732 return false; ··· 795 733 return false; 796 734 797 735 if (attr->rx_fifo_size > VAS_RX_FIFO_SIZE_MAX) 736 + return false; 737 + 738 + if (attr->wcreds_max > VAS_RX_WCREDS_MAX) 798 739 return false; 799 740 800 741 if (attr->nx_win) { ··· 900 835 rxwin->nx_win = rxattr->nx_win; 901 836 rxwin->user_win = rxattr->user_win; 902 837 rxwin->cop = cop; 838 + rxwin->wcreds_max = rxattr->wcreds_max ?: VAS_WCREDS_DEFAULT; 903 839 if (rxattr->user_win) 904 840 rxwin->pid = task_pid_vnr(current); 905 841 ··· 950 884 */ 951 885 memset(winctx, 0, sizeof(struct vas_winctx)); 952 886 953 - winctx->wcreds_max = txattr->wcreds_max ?: VAS_WCREDS_DEFAULT; 887 + winctx->wcreds_max = txwin->wcreds_max; 954 888 955 889 winctx->user_win = txattr->user_win; 956 890 winctx->nx_win = txwin->rxwin->nx_win; 957 891 winctx->pin_win = txattr->pin_win; 892 + winctx->rej_no_credit = txattr->rej_no_credit; 893 + winctx->rsvd_txbuf_enable = txattr->rsvd_txbuf_enable; 958 894 959 895 winctx->rx_wcred_mode = txattr->rx_wcred_mode; 960 896 winctx->tx_wcred_mode = txattr->tx_wcred_mode; 961 897 winctx->rx_word_mode = txattr->rx_win_ord_mode; 962 898 winctx->tx_word_mode = txattr->tx_win_ord_mode; 899 + winctx->rsvd_txbuf_count = txattr->rsvd_txbuf_count; 963 900 964 - if (winctx->nx_win) { 901 + winctx->intr_disable = true; 902 + if (winctx->nx_win) 965 903 winctx->data_stamp = true; 966 - winctx->intr_disable = true; 967 - } 968 904 969 905 winctx->lpid = txattr->lpid; 970 906 winctx->pidr = txattr->pidr; ··· 989 921 if (cop > VAS_COP_TYPE_MAX) 990 922 return false; 991 923 924 + if (attr->wcreds_max > VAS_TX_WCREDS_MAX) 925 + return false; 926 + 992 927 if (attr->user_win && 993 928 (cop != VAS_COP_TYPE_FTW || attr->rsvd_txbuf_count)) 994 929 return false; ··· 1011 940 if (!tx_win_args_valid(cop, attr)) 1012 941 return ERR_PTR(-EINVAL); 1013 942 943 + /* 944 + * If caller did not specify a vasid but specified the PSWID of a 945 + * receive window (applicable only to FTW windows), use the vasid 946 + * from that receive window. 947 + */ 948 + if (vasid == -1 && attr->pswid) 949 + decode_pswid(attr->pswid, &vasid, NULL); 950 + 1014 951 vinst = find_vas_instance(vasid); 1015 952 if (!vinst) { 1016 953 pr_devel("vasid %d not found!\n", vasid); ··· 1037 958 goto put_rxwin; 1038 959 } 1039 960 961 + txwin->cop = cop; 1040 962 txwin->tx_win = 1; 1041 963 txwin->rxwin = rxwin; 1042 964 txwin->nx_win = txwin->rxwin->nx_win; 1043 965 txwin->pid = attr->pid; 1044 966 txwin->user_win = attr->user_win; 967 + txwin->wcreds_max = attr->wcreds_max ?: VAS_WCREDS_DEFAULT; 1045 968 1046 969 init_winctx_for_txwin(txwin, attr, &winctx); 1047 970 ··· 1064 983 goto free_window; 1065 984 } 1066 985 } 986 + 987 + /* 988 + * Now that we have a send window, ensure context switch issues 989 + * CP_ABORT for this thread. 990 + */ 991 + rc = -EINVAL; 992 + if (set_thread_uses_vas() < 0) 993 + goto free_window; 1067 994 1068 995 set_vinst_win(vinst, txwin); 1069 996 ··· 1127 1038 else 1128 1039 rc = -EINVAL; 1129 1040 1130 - print_fifo_msg_count(txwin); 1041 + pr_debug("Txwin #%d: Msg count %llu\n", txwin->winid, 1042 + read_hvwc_reg(txwin, VREG(LRFIFO_PUSH))); 1131 1043 1132 1044 return rc; 1133 1045 } 1134 1046 EXPORT_SYMBOL_GPL(vas_paste_crb); 1135 1047 1048 + /* 1049 + * If credit checking is enabled for this window, poll for the return 1050 + * of window credits (i.e for NX engines to process any outstanding CRBs). 1051 + * Since NX-842 waits for the CRBs to be processed before closing the 1052 + * window, we should not have to wait for too long. 1053 + * 1054 + * TODO: We retry in 10ms intervals now. We could/should probably peek at 1055 + * the VAS_LRFIFO_PUSH_OFFSET register to get an estimate of pending 1056 + * CRBs on the FIFO and compute the delay dynamically on each retry. 1057 + * But that is not really needed until we support NX-GZIP access from 1058 + * user space. (NX-842 driver waits for CSB and Fast thread-wakeup 1059 + * doesn't use credit checking). 1060 + */ 1061 + static void poll_window_credits(struct vas_window *window) 1062 + { 1063 + u64 val; 1064 + int creds, mode; 1065 + 1066 + val = read_hvwc_reg(window, VREG(WINCTL)); 1067 + if (window->tx_win) 1068 + mode = GET_FIELD(VAS_WINCTL_TX_WCRED_MODE, val); 1069 + else 1070 + mode = GET_FIELD(VAS_WINCTL_RX_WCRED_MODE, val); 1071 + 1072 + if (!mode) 1073 + return; 1074 + retry: 1075 + if (window->tx_win) { 1076 + val = read_hvwc_reg(window, VREG(TX_WCRED)); 1077 + creds = GET_FIELD(VAS_TX_WCRED, val); 1078 + } else { 1079 + val = read_hvwc_reg(window, VREG(LRX_WCRED)); 1080 + creds = GET_FIELD(VAS_LRX_WCRED, val); 1081 + } 1082 + 1083 + if (creds < window->wcreds_max) { 1084 + val = 0; 1085 + set_current_state(TASK_UNINTERRUPTIBLE); 1086 + schedule_timeout(msecs_to_jiffies(10)); 1087 + goto retry; 1088 + } 1089 + } 1090 + 1091 + /* 1092 + * Wait for the window to go to "not-busy" state. It should only take a 1093 + * short time to queue a CRB, so window should not be busy for too long. 1094 + * Trying 5ms intervals. 1095 + */ 1136 1096 static void poll_window_busy_state(struct vas_window *window) 1137 1097 { 1138 1098 int busy; 1139 1099 u64 val; 1140 1100 1141 1101 retry: 1142 - /* 1143 - * Poll Window Busy flag 1144 - */ 1145 1102 val = read_hvwc_reg(window, VREG(WIN_STATUS)); 1146 1103 busy = GET_FIELD(VAS_WIN_BUSY, val); 1147 1104 if (busy) { 1148 1105 val = 0; 1149 1106 set_current_state(TASK_UNINTERRUPTIBLE); 1150 - schedule_timeout(HZ); 1107 + schedule_timeout(msecs_to_jiffies(5)); 1151 1108 goto retry; 1152 1109 } 1153 1110 } 1154 1111 1112 + /* 1113 + * Have the hardware cast a window out of cache and wait for it to 1114 + * be completed. 1115 + * 1116 + * NOTE: It can take a relatively long time to cast the window context 1117 + * out of the cache. It is not strictly necessary to cast out if: 1118 + * 1119 + * - we clear the "Pin Window" bit (so hardware is free to evict) 1120 + * 1121 + * - we re-initialize the window context when it is reassigned. 1122 + * 1123 + * We do the former in vas_win_close() and latter in vas_win_open(). 1124 + * So, ignoring the cast-out for now. We can add it as needed. If 1125 + * casting out becomes necessary we should consider offloading the 1126 + * job to a worker thread, so the window close can proceed quickly. 1127 + */ 1155 1128 static void poll_window_castout(struct vas_window *window) 1156 1129 { 1157 - int cached; 1130 + /* stub for now */ 1131 + } 1132 + 1133 + /* 1134 + * Unpin and close a window so no new requests are accepted and the 1135 + * hardware can evict this window from cache if necessary. 1136 + */ 1137 + static void unpin_close_window(struct vas_window *window) 1138 + { 1158 1139 u64 val; 1159 1140 1160 - /* Cast window context out of the cache */ 1161 - retry: 1162 - val = read_hvwc_reg(window, VREG(WIN_CTX_CACHING_CTL)); 1163 - cached = GET_FIELD(VAS_WIN_CACHE_STATUS, val); 1164 - if (cached) { 1165 - val = 0ULL; 1166 - val = SET_FIELD(VAS_CASTOUT_REQ, val, 1); 1167 - val = SET_FIELD(VAS_PUSH_TO_MEM, val, 0); 1168 - write_hvwc_reg(window, VREG(WIN_CTX_CACHING_CTL), val); 1169 - 1170 - set_current_state(TASK_UNINTERRUPTIBLE); 1171 - schedule_timeout(HZ); 1172 - goto retry; 1173 - } 1141 + val = read_hvwc_reg(window, VREG(WINCTL)); 1142 + val = SET_FIELD(VAS_WINCTL_PIN, val, 0); 1143 + val = SET_FIELD(VAS_WINCTL_OPEN, val, 0); 1144 + write_hvwc_reg(window, VREG(WINCTL), val); 1174 1145 } 1175 1146 1176 1147 /* ··· 1247 1098 */ 1248 1099 int vas_win_close(struct vas_window *window) 1249 1100 { 1250 - u64 val; 1251 - 1252 1101 if (!window) 1253 1102 return 0; 1254 1103 ··· 1262 1115 1263 1116 poll_window_busy_state(window); 1264 1117 1265 - /* Unpin window from cache and close it */ 1266 - val = read_hvwc_reg(window, VREG(WINCTL)); 1267 - val = SET_FIELD(VAS_WINCTL_PIN, val, 0); 1268 - val = SET_FIELD(VAS_WINCTL_OPEN, val, 0); 1269 - write_hvwc_reg(window, VREG(WINCTL), val); 1118 + unpin_close_window(window); 1119 + 1120 + poll_window_credits(window); 1270 1121 1271 1122 poll_window_castout(window); 1272 1123 ··· 1277 1132 return 0; 1278 1133 } 1279 1134 EXPORT_SYMBOL_GPL(vas_win_close); 1135 + 1136 + /* 1137 + * Return a system-wide unique window id for the window @win. 1138 + */ 1139 + u32 vas_win_id(struct vas_window *win) 1140 + { 1141 + return encode_pswid(win->vinst->vas_id, win->winid); 1142 + } 1143 + EXPORT_SYMBOL_GPL(vas_win_id);
+29 -2
arch/powerpc/platforms/powernv/vas.c
··· 18 18 #include <linux/of_platform.h> 19 19 #include <linux/of_address.h> 20 20 #include <linux/of.h> 21 + #include <asm/prom.h> 21 22 22 23 #include "vas.h" 23 24 24 - static DEFINE_MUTEX(vas_mutex); 25 + DEFINE_MUTEX(vas_mutex); 25 26 static LIST_HEAD(vas_instances); 27 + 28 + static DEFINE_PER_CPU(int, cpu_vas_id); 26 29 27 30 static int init_vas_instance(struct platform_device *pdev) 28 31 { 29 - int rc, vasid; 32 + int rc, cpu, vasid; 30 33 struct resource *res; 31 34 struct vas_instance *vinst; 32 35 struct device_node *dn = pdev->dev.of_node; ··· 77 74 "paste_win_id_shift 0x%llx\n", pdev->name, vasid, 78 75 vinst->paste_base_addr, vinst->paste_win_id_shift); 79 76 77 + for_each_possible_cpu(cpu) { 78 + if (cpu_to_chip_id(cpu) == of_get_ibm_chip_id(dn)) 79 + per_cpu(cpu_vas_id, cpu) = vasid; 80 + } 81 + 80 82 mutex_lock(&vas_mutex); 81 83 list_add(&vinst->node, &vas_instances); 82 84 mutex_unlock(&vas_mutex); 85 + 86 + vas_instance_init_dbgdir(vinst); 83 87 84 88 dev_set_drvdata(&pdev->dev, vinst); 85 89 ··· 108 98 struct vas_instance *vinst; 109 99 110 100 mutex_lock(&vas_mutex); 101 + 102 + if (vasid == -1) 103 + vasid = per_cpu(cpu_vas_id, smp_processor_id()); 104 + 111 105 list_for_each(ent, &vas_instances) { 112 106 vinst = list_entry(ent, struct vas_instance, node); 113 107 if (vinst->vas_id == vasid) { ··· 123 109 124 110 pr_devel("Instance %d not found\n", vasid); 125 111 return NULL; 112 + } 113 + 114 + int chip_to_vas_id(int chipid) 115 + { 116 + int cpu; 117 + 118 + for_each_possible_cpu(cpu) { 119 + if (cpu_to_chip_id(cpu) == chipid) 120 + return per_cpu(cpu_vas_id, cpu); 121 + } 122 + return -1; 126 123 } 127 124 128 125 static int vas_probe(struct platform_device *pdev) ··· 158 133 { 159 134 int found = 0; 160 135 struct device_node *dn; 136 + 137 + vas_init_dbgdir(); 161 138 162 139 platform_driver_register(&vas_driver); 163 140
+52 -41
arch/powerpc/platforms/powernv/vas.h
··· 13 13 #include <linux/idr.h> 14 14 #include <asm/vas.h> 15 15 #include <linux/io.h> 16 + #include <linux/dcache.h> 17 + #include <linux/mutex.h> 16 18 17 19 /* 18 20 * Overview of Virtual Accelerator Switchboard (VAS). ··· 108 106 * 109 107 * TODO: Needs tuning for per-process credits 110 108 */ 111 - #define VAS_WCREDS_MIN 16 112 - #define VAS_WCREDS_MAX ((64 << 10) - 1) 109 + #define VAS_RX_WCREDS_MAX ((64 << 10) - 1) 110 + #define VAS_TX_WCREDS_MAX ((4 << 10) - 1) 113 111 #define VAS_WCREDS_DEFAULT (1 << 10) 114 112 115 113 /* ··· 261 259 #define VAS_NX_UTIL_ADDER PPC_BITMASK(32, 63) 262 260 263 261 /* 262 + * VREG(x): 263 + * Expand a register's short name (eg: LPID) into two parameters: 264 + * - the register's short name in string form ("LPID"), and 265 + * - the name of the macro (eg: VAS_LPID_OFFSET), defining the 266 + * register's offset in the window context 267 + */ 268 + #define VREG_SFX(n, s) __stringify(n), VAS_##n##s 269 + #define VREG(r) VREG_SFX(r, _OFFSET) 270 + 271 + /* 264 272 * Local Notify Scope Control Register. (Receive windows only). 265 273 */ 266 274 enum vas_notify_scope { ··· 319 307 struct mutex mutex; 320 308 struct vas_window *rxwin[VAS_COP_TYPE_MAX]; 321 309 struct vas_window *windows[VAS_WINDOWS_PER_CHIP]; 310 + 311 + char *dbgname; 312 + struct dentry *dbgdir; 322 313 }; 323 314 324 315 /* ··· 337 322 void *hvwc_map; /* HV window context */ 338 323 void *uwc_map; /* OS/User window context */ 339 324 pid_t pid; /* Linux process id of owner */ 325 + int wcreds_max; /* Window credits */ 326 + 327 + char *dbgname; 328 + struct dentry *dbgdir; 340 329 341 330 /* Fields applicable only to send windows */ 342 331 void *paste_kaddr; ··· 402 383 enum vas_notify_after_count notify_after_count; 403 384 }; 404 385 386 + extern struct mutex vas_mutex; 387 + 405 388 extern struct vas_instance *find_vas_instance(int vasid); 406 - 407 - /* 408 - * VREG(x): 409 - * Expand a register's short name (eg: LPID) into two parameters: 410 - * - the register's short name in string form ("LPID"), and 411 - * - the name of the macro (eg: VAS_LPID_OFFSET), defining the 412 - * register's offset in the window context 413 - */ 414 - #define VREG_SFX(n, s) __stringify(n), VAS_##n##s 415 - #define VREG(r) VREG_SFX(r, _OFFSET) 416 - 417 - #ifdef vas_debug 418 - static inline void dump_rx_win_attr(struct vas_rx_win_attr *attr) 419 - { 420 - pr_err("fault %d, notify %d, intr %d early %d\n", 421 - attr->fault_win, attr->notify_disable, 422 - attr->intr_disable, attr->notify_early); 423 - 424 - pr_err("rx_fifo_size %d, max value %d\n", 425 - attr->rx_fifo_size, VAS_RX_FIFO_SIZE_MAX); 426 - } 389 + extern void vas_init_dbgdir(void); 390 + extern void vas_instance_init_dbgdir(struct vas_instance *vinst); 391 + extern void vas_window_init_dbgdir(struct vas_window *win); 392 + extern void vas_window_free_dbgdir(struct vas_window *win); 427 393 428 394 static inline void vas_log_write(struct vas_window *win, char *name, 429 395 void *regptr, u64 val) 430 396 { 431 397 if (val) 432 - pr_err("%swin #%d: %s reg %p, val 0x%016llx\n", 398 + pr_debug("%swin #%d: %s reg %p, val 0x%016llx\n", 433 399 win->tx_win ? "Tx" : "Rx", win->winid, name, 434 400 regptr, val); 435 401 } 436 - 437 - #else /* vas_debug */ 438 - 439 - #define vas_log_write(win, name, reg, val) 440 - #define dump_rx_win_attr(attr) 441 - 442 - #endif /* vas_debug */ 443 402 444 403 static inline void write_uwc_reg(struct vas_window *win, char *name, 445 404 s32 reg, u64 val) ··· 447 450 return in_be64(win->hvwc_map+reg); 448 451 } 449 452 450 - #ifdef vas_debug 451 - 452 - static void print_fifo_msg_count(struct vas_window *txwin) 453 + /* 454 + * Encode/decode the Partition Send Window ID (PSWID) for a window in 455 + * a way that we can uniquely identify any window in the system. i.e. 456 + * we should be able to locate the 'struct vas_window' given the PSWID. 457 + * 458 + * Bits Usage 459 + * 0:7 VAS id (8 bits) 460 + * 8:15 Unused, 0 (3 bits) 461 + * 16:31 Window id (16 bits) 462 + */ 463 + static inline u32 encode_pswid(int vasid, int winid) 453 464 { 454 - uint64_t read_hvwc_reg(struct vas_window *w, char *n, uint64_t o); 455 - pr_devel("Winid %d, Msg count %llu\n", txwin->winid, 456 - (uint64_t)read_hvwc_reg(txwin, VREG(LRFIFO_PUSH))); 465 + u32 pswid = 0; 466 + 467 + pswid |= vasid << (31 - 7); 468 + pswid |= winid; 469 + 470 + return pswid; 457 471 } 458 - #else /* vas_debug */ 459 472 460 - #define print_fifo_msg_count(window) 473 + static inline void decode_pswid(u32 pswid, int *vasid, int *winid) 474 + { 475 + if (vasid) 476 + *vasid = pswid >> (31 - 7) & 0xFF; 461 477 462 - #endif /* vas_debug */ 463 - 478 + if (winid) 479 + *winid = pswid & 0xFFFF; 480 + } 464 481 #endif /* _VAS_H */
+2
arch/powerpc/platforms/pseries/hotplug-cpu.c
··· 363 363 BUG_ON(get_cpu_current_state(cpu) 364 364 != CPU_STATE_OFFLINE); 365 365 cpu_maps_update_done(); 366 + timed_topology_update(1); 366 367 rc = device_online(get_cpu_device(cpu)); 367 368 if (rc) 368 369 goto out; ··· 534 533 set_preferred_offline_state(cpu, 535 534 CPU_STATE_OFFLINE); 536 535 cpu_maps_update_done(); 536 + timed_topology_update(1); 537 537 rc = device_offline(get_cpu_device(cpu)); 538 538 if (rc) 539 539 goto out;
+9 -10
arch/powerpc/platforms/pseries/iommu.c
··· 55 55 56 56 static struct iommu_table_group *iommu_pseries_alloc_group(int node) 57 57 { 58 - struct iommu_table_group *table_group = NULL; 59 - struct iommu_table *tbl = NULL; 60 - struct iommu_table_group_link *tgl = NULL; 58 + struct iommu_table_group *table_group; 59 + struct iommu_table *tbl; 60 + struct iommu_table_group_link *tgl; 61 61 62 62 table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL, 63 63 node); 64 64 if (!table_group) 65 - goto fail_exit; 65 + return NULL; 66 66 67 67 tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node); 68 68 if (!tbl) 69 - goto fail_exit; 69 + goto free_group; 70 70 71 71 tgl = kzalloc_node(sizeof(struct iommu_table_group_link), GFP_KERNEL, 72 72 node); 73 73 if (!tgl) 74 - goto fail_exit; 74 + goto free_table; 75 75 76 76 INIT_LIST_HEAD_RCU(&tbl->it_group_list); 77 77 kref_init(&tbl->it_kref); ··· 82 82 83 83 return table_group; 84 84 85 - fail_exit: 86 - kfree(tgl); 87 - kfree(table_group); 85 + free_table: 88 86 kfree(tbl); 89 - 87 + free_group: 88 + kfree(table_group); 90 89 return NULL; 91 90 } 92 91
+4 -4
arch/powerpc/platforms/pseries/lpar.c
··· 93 93 return; 94 94 } 95 95 96 - #ifdef CONFIG_PPC_STD_MMU_64 96 + #ifdef CONFIG_PPC_BOOK3S_64 97 97 /* 98 98 * PAPR says this feature is SLB-Buffer but firmware never 99 99 * reports that. All SPLPAR support SLB shadow buffer. ··· 106 106 "cpu %d (hw %d) of area %lx failed with %ld\n", 107 107 cpu, hwcpu, addr, ret); 108 108 } 109 - #endif /* CONFIG_PPC_STD_MMU_64 */ 109 + #endif /* CONFIG_PPC_BOOK3S_64 */ 110 110 111 111 /* 112 112 * Register dispatch trace log, if one has been allocated. ··· 129 129 } 130 130 } 131 131 132 - #ifdef CONFIG_PPC_STD_MMU_64 132 + #ifdef CONFIG_PPC_BOOK3S_64 133 133 134 134 static long pSeries_lpar_hpte_insert(unsigned long hpte_group, 135 135 unsigned long vpn, unsigned long pa, ··· 824 824 EXPORT_SYMBOL(arch_free_page); 825 825 826 826 #endif /* CONFIG_PPC_SMLPAR */ 827 - #endif /* CONFIG_PPC_STD_MMU_64 */ 827 + #endif /* CONFIG_PPC_BOOK3S_64 */ 828 828 829 829 #ifdef CONFIG_TRACEPOINTS 830 830 #ifdef HAVE_JUMP_LABEL
+1 -1
arch/powerpc/platforms/pseries/lparcfg.c
··· 485 485 seq_printf(m, "shared_processor_mode=%d\n", 486 486 lppaca_shared_proc(get_lppaca())); 487 487 488 - #ifdef CONFIG_PPC_STD_MMU_64 488 + #ifdef CONFIG_PPC_BOOK3S_64 489 489 seq_printf(m, "slb_size=%d\n", mmu_slb_size); 490 490 #endif 491 491 parse_em_data(m);
+2
arch/powerpc/platforms/pseries/vio.c
··· 1592 1592 void vio_unregister_device(struct vio_dev *viodev) 1593 1593 { 1594 1594 device_unregister(&viodev->dev); 1595 + if (viodev->family == VDEVICE) 1596 + irq_dispose_mapping(viodev->irq); 1595 1597 } 1596 1598 EXPORT_SYMBOL(vio_unregister_device); 1597 1599
+1 -1
arch/powerpc/sysdev/axonram.c
··· 184 184 static int axon_ram_bank_id = -1; 185 185 struct axon_ram_bank *bank; 186 186 struct resource resource; 187 - int rc = 0; 187 + int rc; 188 188 189 189 axon_ram_bank_id++; 190 190
+2 -2
arch/powerpc/sysdev/ipic.c
··· 846 846 847 847 u32 ipic_get_mcp_status(void) 848 848 { 849 - return ipic_read(primary_ipic->regs, IPIC_SERMR); 849 + return ipic_read(primary_ipic->regs, IPIC_SERSR); 850 850 } 851 851 852 852 void ipic_clear_mcp_status(u32 mask) 853 853 { 854 - ipic_write(primary_ipic->regs, IPIC_SERMR, mask); 854 + ipic_write(primary_ipic->regs, IPIC_SERSR, mask); 855 855 } 856 856 857 857 /* Return an interrupt vector or 0 if no interrupt is pending. */
+162 -7
arch/powerpc/xmon/xmon.c
··· 28 28 #include <linux/bug.h> 29 29 #include <linux/nmi.h> 30 30 #include <linux/ctype.h> 31 + #include <linux/highmem.h> 31 32 32 33 #include <asm/debugfs.h> 33 34 #include <asm/ptrace.h> ··· 128 127 static void memex(void); 129 128 static int bsesc(void); 130 129 static void dump(void); 130 + static void show_pte(unsigned long); 131 131 static void prdump(unsigned long, long); 132 132 static int ppc_inst_dump(unsigned long, long, int); 133 133 static void dump_log_buf(void); ··· 236 234 #endif 237 235 "\ 238 236 dr dump stream of raw bytes\n\ 237 + dv dump virtual address translation \n\ 239 238 dt dump the tracing buffers (uses printk)\n\ 240 239 dtc dump the tracing buffers for current CPU (uses printk)\n\ 241 240 " ··· 281 278 #elif defined(CONFIG_44x) || defined(CONFIG_PPC_BOOK3E) 282 279 " u dump TLB\n" 283 280 #endif 281 + " U show uptime information\n" 284 282 " ? help\n" 285 283 " # n limit output to n lines per page (for dp, dpa, dl)\n" 286 284 " zr reboot\n\ ··· 534 530 535 531 waiting: 536 532 secondary = 1; 533 + spin_begin(); 537 534 while (secondary && !xmon_gate) { 538 535 if (in_xmon == 0) { 539 - if (fromipi) 536 + if (fromipi) { 537 + spin_end(); 540 538 goto leave; 539 + } 541 540 secondary = test_and_set_bit(0, &in_xmon); 542 541 } 543 - barrier(); 542 + spin_cpu_relax(); 543 + touch_nmi_watchdog(); 544 544 } 545 + spin_end(); 545 546 546 547 if (!secondary && !xmon_gate) { 547 548 /* we are the first cpu to come in */ ··· 577 568 mb(); 578 569 xmon_gate = 1; 579 570 barrier(); 571 + touch_nmi_watchdog(); 580 572 } 581 573 582 574 cmdloop: 583 575 while (in_xmon) { 584 576 if (secondary) { 577 + spin_begin(); 585 578 if (cpu == xmon_owner) { 586 579 if (!test_and_set_bit(0, &xmon_taken)) { 587 580 secondary = 0; 581 + spin_end(); 588 582 continue; 589 583 } 590 584 /* missed it */ 591 585 while (cpu == xmon_owner) 592 - barrier(); 586 + spin_cpu_relax(); 593 587 } 594 - barrier(); 588 + spin_cpu_relax(); 589 + touch_nmi_watchdog(); 595 590 
} else { 596 591 cmd = cmds(regs); 597 592 if (cmd != 0) { ··· 909 896 write_ciabr(0); 910 897 } 911 898 899 + /* Based on uptime_proc_show(). */ 900 + static void 901 + show_uptime(void) 902 + { 903 + struct timespec uptime; 904 + 905 + if (setjmp(bus_error_jmp) == 0) { 906 + catch_memory_errors = 1; 907 + sync(); 908 + 909 + get_monotonic_boottime(&uptime); 910 + printf("Uptime: %lu.%.2lu seconds\n", (unsigned long)uptime.tv_sec, 911 + ((unsigned long)uptime.tv_nsec / (NSEC_PER_SEC/100))); 912 + 913 + sync(); 914 + __delay(200); \ 915 + } 916 + catch_memory_errors = 0; 917 + } 918 + 912 919 static void set_lpp_cmd(void) 913 920 { 914 921 unsigned long lpp; ··· 1064 1031 dump_tlb_book3e(); 1065 1032 break; 1066 1033 #endif 1034 + case 'U': 1035 + show_uptime(); 1036 + break; 1067 1037 default: 1068 1038 printf("Unrecognized command: "); 1069 1039 do { ··· 2315 2279 static void dump_one_paca(int cpu) 2316 2280 { 2317 2281 struct paca_struct *p; 2318 - #ifdef CONFIG_PPC_STD_MMU_64 2282 + #ifdef CONFIG_PPC_BOOK3S_64 2319 2283 int i = 0; 2320 2284 #endif 2321 2285 ··· 2356 2320 DUMP(p, hw_cpu_id, "x"); 2357 2321 DUMP(p, cpu_start, "x"); 2358 2322 DUMP(p, kexec_state, "x"); 2359 - #ifdef CONFIG_PPC_STD_MMU_64 2323 + #ifdef CONFIG_PPC_BOOK3S_64 2360 2324 for (i = 0; i < SLB_NUM_BOLTED; i++) { 2361 2325 u64 esid, vsid; 2362 2326 ··· 2387 2351 #endif 2388 2352 DUMP(p, __current, "p"); 2389 2353 DUMP(p, kstack, "lx"); 2354 + printf(" kstack_base = 0x%016lx\n", p->kstack & ~(THREAD_SIZE - 1)); 2390 2355 DUMP(p, stab_rr, "lx"); 2391 2356 DUMP(p, saved_r1, "lx"); 2392 2357 DUMP(p, trap_save, "x"); ··· 2512 2475 unsigned long num; 2513 2476 int c; 2514 2477 2478 + if (!xive_enabled()) { 2479 + printf("Xive disabled on this system\n"); 2480 + return; 2481 + } 2482 + 2515 2483 c = inchar(); 2516 2484 if (c == 'a') { 2517 2485 dump_all_xives(); ··· 2616 2574 dump_log_buf(); 2617 2575 } else if (c == 'o') { 2618 2576 dump_opal_msglog(); 2577 + } else if (c == 'v') { 2578 + /* dump 
virtual to physical translation */ 2579 + show_pte(adrs); 2619 2580 } else if (c == 'r') { 2620 2581 scanhex(&ndump); 2621 2582 if (ndump == 0) ··· 2952 2907 tsk->comm); 2953 2908 } 2954 2909 2910 + #ifdef CONFIG_PPC_BOOK3S_64 2911 + void format_pte(void *ptep, unsigned long pte) 2912 + { 2913 + printf("ptep @ 0x%016lx = 0x%016lx\n", (unsigned long)ptep, pte); 2914 + printf("Maps physical address = 0x%016lx\n", pte & PTE_RPN_MASK); 2915 + 2916 + printf("Flags = %s%s%s%s%s\n", 2917 + (pte & _PAGE_ACCESSED) ? "Accessed " : "", 2918 + (pte & _PAGE_DIRTY) ? "Dirty " : "", 2919 + (pte & _PAGE_READ) ? "Read " : "", 2920 + (pte & _PAGE_WRITE) ? "Write " : "", 2921 + (pte & _PAGE_EXEC) ? "Exec " : ""); 2922 + } 2923 + 2924 + static void show_pte(unsigned long addr) 2925 + { 2926 + unsigned long tskv = 0; 2927 + struct task_struct *tsk = NULL; 2928 + struct mm_struct *mm; 2929 + pgd_t *pgdp, *pgdir; 2930 + pud_t *pudp; 2931 + pmd_t *pmdp; 2932 + pte_t *ptep; 2933 + 2934 + if (!scanhex(&tskv)) 2935 + mm = &init_mm; 2936 + else 2937 + tsk = (struct task_struct *)tskv; 2938 + 2939 + if (tsk == NULL) 2940 + mm = &init_mm; 2941 + else 2942 + mm = tsk->active_mm; 2943 + 2944 + if (setjmp(bus_error_jmp) != 0) { 2945 + catch_memory_errors = 0; 2946 + printf("*** Error dumping pte for task %p\n", tsk); 2947 + return; 2948 + } 2949 + 2950 + catch_memory_errors = 1; 2951 + sync(); 2952 + 2953 + if (mm == &init_mm) { 2954 + pgdp = pgd_offset_k(addr); 2955 + pgdir = pgd_offset_k(0); 2956 + } else { 2957 + pgdp = pgd_offset(mm, addr); 2958 + pgdir = pgd_offset(mm, 0); 2959 + } 2960 + 2961 + if (pgd_none(*pgdp)) { 2962 + printf("no linux page table for address\n"); 2963 + return; 2964 + } 2965 + 2966 + printf("pgd @ 0x%016lx\n", pgdir); 2967 + 2968 + if (pgd_huge(*pgdp)) { 2969 + format_pte(pgdp, pgd_val(*pgdp)); 2970 + return; 2971 + } 2972 + printf("pgdp @ 0x%016lx = 0x%016lx\n", pgdp, pgd_val(*pgdp)); 2973 + 2974 + pudp = pud_offset(pgdp, addr); 2975 + 2976 + if (pud_none(*pudp)) { 
2977 + printf("No valid PUD\n"); 2978 + return; 2979 + } 2980 + 2981 + if (pud_huge(*pudp)) { 2982 + format_pte(pudp, pud_val(*pudp)); 2983 + return; 2984 + } 2985 + 2986 + printf("pudp @ 0x%016lx = 0x%016lx\n", pudp, pud_val(*pudp)); 2987 + 2988 + pmdp = pmd_offset(pudp, addr); 2989 + 2990 + if (pmd_none(*pmdp)) { 2991 + printf("No valid PMD\n"); 2992 + return; 2993 + } 2994 + 2995 + if (pmd_huge(*pmdp)) { 2996 + format_pte(pmdp, pmd_val(*pmdp)); 2997 + return; 2998 + } 2999 + printf("pmdp @ 0x%016lx = 0x%016lx\n", pmdp, pmd_val(*pmdp)); 3000 + 3001 + ptep = pte_offset_map(pmdp, addr); 3002 + if (pte_none(*ptep)) { 3003 + printf("no valid PTE\n"); 3004 + return; 3005 + } 3006 + 3007 + format_pte(ptep, pte_val(*ptep)); 3008 + 3009 + sync(); 3010 + __delay(200); 3011 + catch_memory_errors = 0; 3012 + } 3013 + #else 3014 + static void show_pte(unsigned long addr) 3015 + { 3016 + printf("show_pte not yet implemented\n"); 3017 + } 3018 + #endif /* CONFIG_PPC_BOOK3S_64 */ 3019 + 2955 3020 static void show_tasks(void) 2956 3021 { 2957 3022 unsigned long tskv; ··· 3379 3224 printf("%s", after); 3380 3225 } 3381 3226 3382 - #ifdef CONFIG_PPC_STD_MMU_64 3227 + #ifdef CONFIG_PPC_BOOK3S_64 3383 3228 void dump_segments(void) 3384 3229 { 3385 3230 int i;
+2 -2
drivers/cpuidle/cpuidle-powernv.c
··· 384 384 * Firmware passes residency and latency values in ns. 385 385 * cpuidle expects it in us. 386 386 */ 387 - exit_latency = latency_ns[i] / 1000; 387 + exit_latency = DIV_ROUND_UP(latency_ns[i], 1000); 388 388 if (!rc) 389 - target_residency = residency_ns[i] / 1000; 389 + target_residency = DIV_ROUND_UP(residency_ns[i], 1000); 390 390 else 391 391 target_residency = 0; 392 392
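The cpuidle change above swaps truncating division for DIV_ROUND_UP, so a sub-microsecond firmware latency is no longer reported to cpuidle as 0us. A standalone sketch of the effect (the macro body reproduced here for illustration matches the usual kernel definition):

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Convert a latency in ns to us without ever rounding a non-zero
 * latency down to zero. */
static inline uint64_t ns_to_us_roundup(uint64_t ns)
{
	return DIV_ROUND_UP(ns, 1000);
}
```

With plain division, a 100ns exit latency becomes 0us and the state looks free to enter; rounding up keeps the estimate conservative.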
+71 -96
drivers/crypto/nx/nx-842-powernv.c
··· 46 46 47 47 ktime_t start; 48 48 49 - struct vas_window *txwin; /* Used with VAS function */ 50 49 char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */ 51 50 } __packed __aligned(WORKMEM_ALIGN); 52 51 ··· 64 65 * Send the request to NX engine on the chip for the corresponding CPU 65 66 * where the process is executing. Use with VAS function. 66 67 */ 67 - static DEFINE_PER_CPU(struct nx842_coproc *, coproc_inst); 68 + static DEFINE_PER_CPU(struct vas_window *, cpu_txwin); 68 69 69 70 /* no cpu hotplug on powernv, so this list never changes after init */ 70 71 static LIST_HEAD(nx842_coprocs); ··· 585 586 ccw = SET_FIELD(CCW_FC_842, ccw, fc); 586 587 crb->ccw = cpu_to_be32(ccw); 587 588 588 - txwin = wmem->txwin; 589 - /* shoudn't happen, we don't load without a coproc */ 590 - if (!txwin) { 591 - pr_err_ratelimited("NX-842 coprocessor is not available"); 592 - return -ENODEV; 593 - } 594 - 595 589 do { 596 590 wmem->start = ktime_get(); 597 591 preempt_disable(); 592 + txwin = this_cpu_read(cpu_txwin); 593 + 598 594 /* 599 595 * VAS copy CRB into L2 cache. Refer <asm/vas.h>. 600 596 * @crb and @offset. ··· 683 689 list_add(&coproc->list, &nx842_coprocs); 684 690 } 685 691 686 - /* 687 - * Identify chip ID for each CPU and save coprocesor adddress for the 688 - * corresponding NX engine in percpu coproc_inst. 689 - * coproc_inst is used in crypto_init to open send window on the NX instance 690 - * for the corresponding CPU / chip where the open request is executed. 
691 - */ 692 - static void nx842_set_per_cpu_coproc(struct nx842_coproc *coproc) 693 - { 694 - unsigned int i, chip_id; 695 - 696 - for_each_possible_cpu(i) { 697 - chip_id = cpu_to_chip_id(i); 698 - 699 - if (coproc->chip_id == chip_id) 700 - per_cpu(coproc_inst, i) = coproc; 701 - } 702 - } 703 - 704 - 705 692 static struct vas_window *nx842_alloc_txwin(struct nx842_coproc *coproc) 706 693 { 707 694 struct vas_window *txwin = NULL; ··· 700 725 * Open a VAS send window which is used to send request to NX. 701 726 */ 702 727 txwin = vas_tx_win_open(coproc->vas.id, coproc->ct, &txattr); 703 - if (IS_ERR(txwin)) { 728 + if (IS_ERR(txwin)) 704 729 pr_err("ibm,nx-842: Can not open TX window: %ld\n", 705 730 PTR_ERR(txwin)); 706 - return NULL; 707 - } 708 731 709 732 return txwin; 733 + } 734 + 735 + /* 736 + * Identify chip ID for each CPU, open send window for the corresponding NX 737 + * engine and save txwin in percpu cpu_txwin. 738 + * cpu_txwin is used in copy/paste operation for each compression / 739 + * decompression request. 740 + */ 741 + static int nx842_open_percpu_txwins(void) 742 + { 743 + struct nx842_coproc *coproc, *n; 744 + unsigned int i, chip_id; 745 + 746 + for_each_possible_cpu(i) { 747 + struct vas_window *txwin = NULL; 748 + 749 + chip_id = cpu_to_chip_id(i); 750 + 751 + list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) { 752 + /* 753 + * Kernel requests use only high priority FIFOs. So 754 + * open send windows for these FIFOs. 
755 + */ 756 + 757 + if (coproc->ct != VAS_COP_TYPE_842_HIPRI) 758 + continue; 759 + 760 + if (coproc->chip_id == chip_id) { 761 + txwin = nx842_alloc_txwin(coproc); 762 + if (IS_ERR(txwin)) 763 + return PTR_ERR(txwin); 764 + 765 + per_cpu(cpu_txwin, i) = txwin; 766 + break; 767 + } 768 + } 769 + 770 + if (!per_cpu(cpu_txwin, i)) { 771 + /* shouldn't happen, each chip will have an NX engine */ 772 + pr_err("NX engine is not available for CPU %d\n", i); 773 + return -EINVAL; 774 + } 775 + } 776 + 777 + return 0; 710 778 } 711 779 712 780 static int __init vas_cfg_coproc_info(struct device_node *dn, int chip_id, ··· 837 819 coproc->vas.id = vasid; 838 820 nx842_add_coprocs_list(coproc, chip_id); 839 821 840 - /* 841 - * Kernel requests use only high priority FIFOs. So save coproc 842 - * info in percpu coproc_inst which will be used to open send 843 - * windows for crypto open requests later. 844 - */ 845 - if (coproc->ct == VAS_COP_TYPE_842_HIPRI) 846 - nx842_set_per_cpu_coproc(coproc); 847 - 848 822 return 0; 849 823 850 824 err_out: ··· 857 847 return -EINVAL; 858 848 } 859 849 860 - for_each_compatible_node(dn, NULL, "ibm,power9-vas-x") { 861 - if (of_get_ibm_chip_id(dn) == chip_id) 862 - break; 863 - } 864 - 865 - if (!dn) { 866 - pr_err("Missing VAS device node\n") 850 + vasid = chip_to_vas_id(chip_id); 851 + if (vasid < 0) { 852 + pr_err("Unable to map chip_id %d to vasid\n", chip_id); 867 853 return -EINVAL; 868 854 } 869 - 870 - if (of_property_read_u32(dn, "ibm,vas-id", &vasid)) { 871 - pr_err("Missing ibm,vas-id device property\n"); 872 - of_node_put(dn); 873 - return -EINVAL; 874 - } 875 - 876 - of_node_put(dn); 877 855 878 856 for_each_child_of_node(pn, dn) { 879 857 if (of_device_is_compatible(dn, "ibm,p9-nx-842")) {
936 + */ 937 + for_each_possible_cpu(i) { 938 + txwin = per_cpu(cpu_txwin, i); 939 + if (txwin) 940 + vas_win_close(txwin); 941 + 942 + per_cpu(cpu_txwin, i) = 0; 943 + } 929 944 930 945 list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) { 931 946 if (coproc->vas.rxwin) ··· 964 953 .compress = nx842_powernv_compress, 965 954 .decompress = nx842_powernv_decompress, 966 955 }; 967 - 968 - static int nx842_powernv_crypto_init_vas(struct crypto_tfm *tfm) 969 - { 970 - struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm); 971 - struct nx842_workmem *wmem; 972 - struct nx842_coproc *coproc; 973 - int ret; 974 - 975 - ret = nx842_crypto_init(tfm, &nx842_powernv_driver); 976 - 977 - if (ret) 978 - return ret; 979 - 980 - wmem = PTR_ALIGN((struct nx842_workmem *)ctx->wmem, WORKMEM_ALIGN); 981 - coproc = per_cpu(coproc_inst, smp_processor_id()); 982 - 983 - ret = -EINVAL; 984 - if (coproc && coproc->vas.rxwin) { 985 - wmem->txwin = nx842_alloc_txwin(coproc); 986 - if (!IS_ERR(wmem->txwin)) 987 - return 0; 988 - 989 - ret = PTR_ERR(wmem->txwin); 990 - } 991 - 992 - return ret; 993 - } 994 - 995 - void nx842_powernv_crypto_exit_vas(struct crypto_tfm *tfm) 996 - { 997 - struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm); 998 - struct nx842_workmem *wmem; 999 - 1000 - wmem = PTR_ALIGN((struct nx842_workmem *)ctx->wmem, WORKMEM_ALIGN); 1001 - 1002 - if (wmem && wmem->txwin) 1003 - vas_win_close(wmem->txwin); 1004 - 1005 - nx842_crypto_exit(tfm); 1006 - } 1007 956 1008 957 static int nx842_powernv_crypto_init(struct crypto_tfm *tfm) 1009 958 { ··· 1015 1044 1016 1045 nx842_powernv_exec = nx842_exec_icswx; 1017 1046 } else { 1047 + ret = nx842_open_percpu_txwins(); 1048 + if (ret) { 1049 + nx842_delete_coprocs(); 1050 + return ret; 1051 + } 1052 + 1018 1053 nx842_powernv_exec = nx842_exec_vas; 1019 - nx842_powernv_alg.cra_init = nx842_powernv_crypto_init_vas; 1020 - nx842_powernv_alg.cra_exit = nx842_powernv_crypto_exit_vas; 1021 1054 } 1022 1055 1023 1056 ret = 
crypto_register_alg(&nx842_powernv_alg);
+1 -1
drivers/crypto/nx/nx-842.c
··· 116 116 117 117 spin_lock_init(&ctx->lock); 118 118 ctx->driver = driver; 119 - ctx->wmem = kzalloc(driver->workmem_size, GFP_KERNEL); 119 + ctx->wmem = kmalloc(driver->workmem_size, GFP_KERNEL); 120 120 ctx->sbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER); 121 121 ctx->dbounce = (u8 *)__get_free_pages(GFP_KERNEL, BOUNCE_BUFFER_ORDER); 122 122 if (!ctx->wmem || !ctx->sbounce || !ctx->dbounce) {
+13 -3
drivers/misc/cxl/api.c
··· 15 15 #include <linux/module.h> 16 16 #include <linux/mount.h> 17 17 #include <linux/sched/mm.h> 18 + #include <linux/mmu_context.h> 18 19 19 20 #include "cxl.h" 20 21 ··· 332 331 /* ensure this mm_struct can't be freed */ 333 332 cxl_context_mm_count_get(ctx); 334 333 335 - /* decrement the use count */ 336 - if (ctx->mm) 334 + if (ctx->mm) { 335 + /* decrement the use count from above */ 337 336 mmput(ctx->mm); 337 + /* make TLBIs for this context global */ 338 + mm_context_add_copro(ctx->mm); 339 + } 338 340 } 339 341 340 342 /* ··· 346 342 */ 347 343 cxl_ctx_get(); 348 344 345 + /* See the comment in afu_ioctl_start_work() */ 346 + smp_mb(); 347 + 349 348 if ((rc = cxl_ops->attach_process(ctx, kernel, wed, 0))) { 350 349 put_pid(ctx->pid); 351 350 ctx->pid = NULL; 352 351 cxl_adapter_context_put(ctx->afu->adapter); 353 352 cxl_ctx_put(); 354 - if (task) 353 + if (task) { 355 354 cxl_context_mm_count_put(ctx); 355 + if (ctx->mm) 356 + mm_context_remove_copro(ctx->mm); 357 + } 356 358 goto out; 357 359 } 358 360
+3
drivers/misc/cxl/context.c
··· 18 18 #include <linux/slab.h> 19 19 #include <linux/idr.h> 20 20 #include <linux/sched/mm.h> 21 + #include <linux/mmu_context.h> 21 22 #include <asm/cputable.h> 22 23 #include <asm/current.h> 23 24 #include <asm/copro.h> ··· 268 267 269 268 /* Decrease the mm count on the context */ 270 269 cxl_context_mm_count_put(ctx); 270 + if (ctx->mm) 271 + mm_context_remove_copro(ctx->mm); 271 272 ctx->mm = NULL; 272 273 273 274 return 0;
+10 -12
drivers/misc/cxl/cxl.h
··· 100 100 static const cxl_p1_reg_t CXL_XSL_DSNCTL = {0x0168}; 101 101 /* PSL registers - CAIA 2 */ 102 102 static const cxl_p1_reg_t CXL_PSL9_CONTROL = {0x0020}; 103 + static const cxl_p1_reg_t CXL_XSL9_INV = {0x0110}; 104 + static const cxl_p1_reg_t CXL_XSL9_DBG = {0x0130}; 105 + static const cxl_p1_reg_t CXL_XSL9_DEF = {0x0140}; 103 106 static const cxl_p1_reg_t CXL_XSL9_DSNCTL = {0x0168}; 104 107 static const cxl_p1_reg_t CXL_PSL9_FIR1 = {0x0300}; 105 - static const cxl_p1_reg_t CXL_PSL9_FIR2 = {0x0308}; 108 + static const cxl_p1_reg_t CXL_PSL9_FIR_MASK = {0x0308}; 106 109 static const cxl_p1_reg_t CXL_PSL9_Timebase = {0x0310}; 107 110 static const cxl_p1_reg_t CXL_PSL9_DEBUG = {0x0320}; 108 111 static const cxl_p1_reg_t CXL_PSL9_FIR_CNTL = {0x0348}; ··· 115 112 static const cxl_p1_reg_t CXL_PSL9_APCDEDALLOC = {0x0378}; 116 113 static const cxl_p1_reg_t CXL_PSL9_APCDEDTYPE = {0x0380}; 117 114 static const cxl_p1_reg_t CXL_PSL9_TNR_ADDR = {0x0388}; 115 + static const cxl_p1_reg_t CXL_PSL9_CTCCFG = {0x0390}; 118 116 static const cxl_p1_reg_t CXL_PSL9_GP_CT = {0x0398}; 119 117 static const cxl_p1_reg_t CXL_XSL9_IERAT = {0x0588}; 120 118 static const cxl_p1_reg_t CXL_XSL9_ILPP = {0x0590}; ··· 417 413 #define CXL_DEV_MINORS 13 /* 1 control + 4 AFUs * 3 (dedicated/master/shared) */ 418 414 #define CXL_CARD_MINOR(adapter) (adapter->adapter_num * CXL_DEV_MINORS) 419 415 #define CXL_DEVT_ADAPTER(dev) (MINOR(dev) / CXL_DEV_MINORS) 416 + 417 + #define CXL_PSL9_TRACEID_MAX 0xAU 418 + #define CXL_PSL9_TRACESTATE_FIN 0x3U 420 419 421 420 enum cxl_context_status { 422 421 CLOSED, ··· 945 938 void cxl_debugfs_adapter_remove(struct cxl *adapter); 946 939 int cxl_debugfs_afu_add(struct cxl_afu *afu); 947 940 void cxl_debugfs_afu_remove(struct cxl_afu *afu); 948 - void cxl_stop_trace_psl9(struct cxl *cxl); 949 - void cxl_stop_trace_psl8(struct cxl *cxl); 950 941 void cxl_debugfs_add_adapter_regs_psl9(struct cxl *adapter, struct dentry *dir); 951 942 void 
cxl_debugfs_add_adapter_regs_psl8(struct cxl *adapter, struct dentry *dir); 952 943 void cxl_debugfs_add_adapter_regs_xsl(struct cxl *adapter, struct dentry *dir); ··· 977 972 } 978 973 979 974 static inline void cxl_debugfs_afu_remove(struct cxl_afu *afu) 980 - { 981 - } 982 - 983 - static inline void cxl_stop_trace_psl9(struct cxl *cxl) 984 - { 985 - } 986 - 987 - static inline void cxl_stop_trace_psl8(struct cxl *cxl) 988 975 { 989 976 } 990 977 ··· 1067 1070 1068 1071 void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx); 1069 1072 void cxl_native_irq_dump_regs_psl8(struct cxl_context *ctx); 1070 - void cxl_native_err_irq_dump_regs(struct cxl *adapter); 1073 + void cxl_native_err_irq_dump_regs_psl8(struct cxl *adapter); 1074 + void cxl_native_err_irq_dump_regs_psl9(struct cxl *adapter); 1071 1075 int cxl_pci_vphb_add(struct cxl_afu *afu); 1072 1076 void cxl_pci_vphb_remove(struct cxl_afu *afu); 1073 1077 void cxl_release_mapping(struct cxl_context *ctx);
+6 -23
drivers/misc/cxl/debugfs.c
··· 15 15 16 16 static struct dentry *cxl_debugfs; 17 17 18 - void cxl_stop_trace_psl9(struct cxl *adapter) 19 - { 20 - /* Stop the trace */ 21 - cxl_p1_write(adapter, CXL_PSL9_TRACECFG, 0x4480000000000000ULL); 22 - } 23 - 24 - void cxl_stop_trace_psl8(struct cxl *adapter) 25 - { 26 - int slice; 27 - 28 - /* Stop the trace */ 29 - cxl_p1_write(adapter, CXL_PSL_TRACE, 0x8000000000000017LL); 30 - 31 - /* Stop the slice traces */ 32 - spin_lock(&adapter->afu_list_lock); 33 - for (slice = 0; slice < adapter->slices; slice++) { 34 - if (adapter->afu[slice]) 35 - cxl_p1n_write(adapter->afu[slice], CXL_PSL_SLICE_TRACE, 0x8000000000000000LL); 36 - } 37 - spin_unlock(&adapter->afu_list_lock); 38 - } 39 - 40 18 /* Helpers to export CXL mmaped IO registers via debugfs */ 41 19 static int debugfs_io_u64_get(void *data, u64 *val) 42 20 { ··· 40 62 void cxl_debugfs_add_adapter_regs_psl9(struct cxl *adapter, struct dentry *dir) 41 63 { 42 64 debugfs_create_io_x64("fir1", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL9_FIR1)); 43 - debugfs_create_io_x64("fir2", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL9_FIR2)); 65 + debugfs_create_io_x64("fir_mask", 0400, dir, 66 + _cxl_p1_addr(adapter, CXL_PSL9_FIR_MASK)); 44 67 debugfs_create_io_x64("fir_cntl", S_IRUSR, dir, _cxl_p1_addr(adapter, CXL_PSL9_FIR_CNTL)); 45 68 debugfs_create_io_x64("trace", S_IRUSR | S_IWUSR, dir, _cxl_p1_addr(adapter, CXL_PSL9_TRACECFG)); 69 + debugfs_create_io_x64("debug", 0600, dir, 70 + _cxl_p1_addr(adapter, CXL_PSL9_DEBUG)); 71 + debugfs_create_io_x64("xsl-debug", 0600, dir, 72 + _cxl_p1_addr(adapter, CXL_XSL9_DBG)); 46 73 } 47 74 48 75 void cxl_debugfs_add_adapter_regs_psl8(struct cxl *adapter, struct dentry *dir)
+2 -13
drivers/misc/cxl/fault.c
··· 220 220 221 221 static bool cxl_is_page_fault(struct cxl_context *ctx, u64 dsisr) 222 222 { 223 - u64 crs; /* Translation Checkout Response Status */ 224 - 225 223 if ((cxl_is_power8()) && (dsisr & CXL_PSL_DSISR_An_DM)) 226 224 return true; 227 225 228 - if (cxl_is_power9()) { 229 - crs = (dsisr & CXL_PSL9_DSISR_An_CO_MASK); 230 - if ((crs == CXL_PSL9_DSISR_An_PF_SLR) || 231 - (crs == CXL_PSL9_DSISR_An_PF_RGC) || 232 - (crs == CXL_PSL9_DSISR_An_PF_RGP) || 233 - (crs == CXL_PSL9_DSISR_An_PF_HRH) || 234 - (crs == CXL_PSL9_DSISR_An_PF_STEG) || 235 - (crs == CXL_PSL9_DSISR_An_URTCH)) { 236 - return true; 237 - } 238 - } 226 + if (cxl_is_power9()) 227 + return true; 239 228 240 229 return false; 241 230 }
+22 -2
drivers/misc/cxl/file.c
··· 19 19 #include <linux/mm.h> 20 20 #include <linux/slab.h> 21 21 #include <linux/sched/mm.h> 22 + #include <linux/mmu_context.h> 22 23 #include <asm/cputable.h> 23 24 #include <asm/current.h> 24 25 #include <asm/copro.h> ··· 221 220 /* ensure this mm_struct can't be freed */ 222 221 cxl_context_mm_count_get(ctx); 223 222 224 - /* decrement the use count */ 225 - if (ctx->mm) 223 + if (ctx->mm) { 224 + /* decrement the use count from above */ 226 225 mmput(ctx->mm); 226 + /* make TLBIs for this context global */ 227 + mm_context_add_copro(ctx->mm); 228 + } 227 229 228 230 /* 229 231 * Increment driver use count. Enables global TLBIs for hash 230 232 * and callbacks to handle the segment table 231 233 */ 232 234 cxl_ctx_get(); 235 + 236 + /* 237 + * A barrier is needed to make sure all TLBIs are global 238 + * before we attach and the context starts being used by the 239 + * adapter. 240 + * 241 + * Needed after mm_context_add_copro() for radix and 242 + * cxl_ctx_get() for hash/p8. 243 + * 244 + * The barrier should really be mb(), since it involves a 245 + * device. However, it's only useful when we have local 246 + * vs. global TLBIs, i.e SMP=y. So keep smp_mb(). 247 + */ 248 + smp_mb(); 233 249 234 250 trace_cxl_attach(ctx, work.work_element_descriptor, work.num_interrupts, amr); 235 251 ··· 258 240 ctx->pid = NULL; 259 241 cxl_ctx_put(); 260 242 cxl_context_mm_count_put(ctx); 243 + if (ctx->mm) 244 + mm_context_remove_copro(ctx->mm); 261 245 goto out; 262 246 } 263 247
+21 -6
drivers/misc/cxl/native.c
··· 897 897 if (ctx->afu->adapter->native->sl_ops->update_dedicated_ivtes) 898 898 afu->adapter->native->sl_ops->update_dedicated_ivtes(ctx); 899 899 900 + ctx->elem->software_state = cpu_to_be32(CXL_PE_SOFTWARE_STATE_V); 901 + /* 902 + * Ideally we should do a wmb() here to make sure the changes to the 903 + * PE are visible to the card before we call afu_enable. 904 + * On ppc64 though all mmios are preceded by a 'sync' instruction hence 905 + * we don't need one here. 906 + */ 907 + 900 908 result = cxl_ops->afu_reset(afu); 901 909 if (result) 902 910 return result; ··· 1085 1077 1086 1078 void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx) 1087 1079 { 1088 - u64 fir1, fir2, serr; 1080 + u64 fir1, serr; 1089 1081 1090 1082 fir1 = cxl_p1_read(ctx->afu->adapter, CXL_PSL9_FIR1); 1091 - fir2 = cxl_p1_read(ctx->afu->adapter, CXL_PSL9_FIR2); 1092 1083 1093 1084 dev_crit(&ctx->afu->dev, "PSL_FIR1: 0x%016llx\n", fir1); 1094 - dev_crit(&ctx->afu->dev, "PSL_FIR2: 0x%016llx\n", fir2); 1095 1085 if (ctx->afu->adapter->native->sl_ops->register_serr_irq) { 1096 1086 serr = cxl_p1n_read(ctx->afu, CXL_PSL_SERR_An); 1097 1087 cxl_afu_decode_psl_serr(ctx->afu, serr); ··· 1263 1257 return IRQ_HANDLED; 1264 1258 } 1265 1259 1266 - void cxl_native_err_irq_dump_regs(struct cxl *adapter) 1260 + void cxl_native_err_irq_dump_regs_psl9(struct cxl *adapter) 1261 + { 1262 + u64 fir1; 1263 + 1264 + fir1 = cxl_p1_read(adapter, CXL_PSL9_FIR1); 1265 + dev_crit(&adapter->dev, "PSL_FIR: 0x%016llx\n", fir1); 1266 + } 1267 + 1268 + void cxl_native_err_irq_dump_regs_psl8(struct cxl *adapter) 1267 1269 { 1268 1270 u64 fir1, fir2; 1269 1271 1270 1272 fir1 = cxl_p1_read(adapter, CXL_PSL_FIR1); 1271 1273 fir2 = cxl_p1_read(adapter, CXL_PSL_FIR2); 1272 - 1273 - dev_crit(&adapter->dev, "PSL_FIR1: 0x%016llx\nPSL_FIR2: 0x%016llx\n", fir1, fir2); 1274 + dev_crit(&adapter->dev, 1275 + "PSL_FIR1: 0x%016llx\nPSL_FIR2: 0x%016llx\n", 1276 + fir1, fir2); 1274 1277 } 1275 1278 1276 1279 static 
irqreturn_t native_irq_err(int irq, void *data)
+64 -24
drivers/misc/cxl/pci.c
··· 401 401 *capp_unit_id = get_capp_unit_id(np, *phb_index); 402 402 of_node_put(np); 403 403 if (!*capp_unit_id) { 404 - pr_err("cxl: invalid capp unit id\n"); 404 + pr_err("cxl: invalid capp unit id (phb_index: %d)\n", 405 + *phb_index); 405 406 return -ENODEV; 406 407 } 407 408 ··· 476 475 psl_fircntl |= 0x1ULL; /* ce_thresh */ 477 476 cxl_p1_write(adapter, CXL_PSL9_FIR_CNTL, psl_fircntl); 478 477 479 - /* vccredits=0x1 pcklat=0x4 */ 480 - cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x0000000000001810ULL); 481 - 482 - /* 483 - * For debugging with trace arrays. 484 - * Configure RX trace 0 segmented mode. 485 - * Configure CT trace 0 segmented mode. 486 - * Configure LA0 trace 0 segmented mode. 487 - * Configure LA1 trace 0 segmented mode. 478 + /* Setup the PSL to transmit packets on the PCIe before the 479 + * CAPP is enabled 488 480 */ 489 - cxl_p1_write(adapter, CXL_PSL9_TRACECFG, 0x8040800080000000ULL); 490 - cxl_p1_write(adapter, CXL_PSL9_TRACECFG, 0x8040800080000003ULL); 491 - cxl_p1_write(adapter, CXL_PSL9_TRACECFG, 0x8040800080000005ULL); 492 - cxl_p1_write(adapter, CXL_PSL9_TRACECFG, 0x8040800080000006ULL); 481 + cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x0001001000002A10ULL); 493 482 494 483 /* 495 484 * A response to an ASB_Notify request is returned by the 496 485 * system as an MMIO write to the address defined in 497 - * the PSL_TNR_ADDR register 486 + * the PSL_TNR_ADDR register. 
487 + * keep the Reset Value: 0x00020000E0000000 498 488 */ 499 489 500 - /* PSL_TNR_ADDR */ 501 490 502 - /* NORST */ 503 - cxl_p1_write(adapter, CXL_PSL9_DEBUG, 0x8000000000000000ULL); 490 + /* Enable XSL rty limit */ 491 + cxl_p1_write(adapter, CXL_XSL9_DEF, 0x51F8000000000005ULL); 504 492 505 - /* allocate the apc machines */ 506 - cxl_p1_write(adapter, CXL_PSL9_APCDEDTYPE, 0x40000003FFFF0000ULL); 493 + /* Change XSL_INV dummy read threshold */ 494 + cxl_p1_write(adapter, CXL_XSL9_INV, 0x0000040007FFC200ULL); 507 495 508 - /* Disable vc dd1 fix */ 509 - if (cxl_is_power9_dd1()) 510 - cxl_p1_write(adapter, CXL_PSL9_GP_CT, 0x0400000000000001ULL); 496 + if (phb_index == 3) { 497 + /* disable machines 31-47 and 20-27 for DMA */ 498 + cxl_p1_write(adapter, CXL_PSL9_APCDEDTYPE, 0x40000FF3FFFF0000ULL); 499 + } 500 + 501 + /* Snoop machines */ 502 + cxl_p1_write(adapter, CXL_PSL9_APCDEDALLOC, 0x800F000200000000ULL); 503 + 504 + if (cxl_is_power9_dd1()) { 505 + /* Disabling deadlock counter CAR */ 506 + cxl_p1_write(adapter, CXL_PSL9_GP_CT, 0x0020000000000001ULL); 507 + } else 508 + cxl_p1_write(adapter, CXL_PSL9_DEBUG, 0x4000000000000000ULL); 510 509 511 510 return 0; 512 511 } ··· 1747 1746 pci_disable_device(pdev); 1748 1747 } 1749 1748 1749 + static void cxl_stop_trace_psl9(struct cxl *adapter) 1750 + { 1751 + int traceid; 1752 + u64 trace_state, trace_mask; 1753 + struct pci_dev *dev = to_pci_dev(adapter->dev.parent); 1754 + 1755 + /* read each trace array state and issue an mmio to stop them if needed */ 1756 + for (traceid = 0; traceid <= CXL_PSL9_TRACEID_MAX; ++traceid) { 1757 + trace_state = cxl_p1_read(adapter, CXL_PSL9_CTCCFG); 1758 + trace_mask = (0x3ULL << (62 - traceid * 2)); 1759 + trace_state = (trace_state & trace_mask) >> (62 - traceid * 2); 1760 + dev_dbg(&dev->dev, "cxl: Traceid-%d trace_state=0x%0llX\n", 1761 + traceid, trace_state); 1762 + 1763 + /* issue mmio if the trace array isn't in FIN state */ 1764 + if (trace_state != CXL_PSL9_TRACESTATE_FIN) 1765 + 
cxl_p1_write(adapter, CXL_PSL9_TRACECFG, 1766 + 0x8400000000000000ULL | traceid); 1767 + } 1768 + } 1769 + 1770 + static void cxl_stop_trace_psl8(struct cxl *adapter) 1771 + { 1772 + int slice; 1773 + 1774 + /* Stop the trace */ 1775 + cxl_p1_write(adapter, CXL_PSL_TRACE, 0x8000000000000017LL); 1776 + 1777 + /* Stop the slice traces */ 1778 + spin_lock(&adapter->afu_list_lock); 1779 + for (slice = 0; slice < adapter->slices; slice++) { 1780 + if (adapter->afu[slice]) 1781 + cxl_p1n_write(adapter->afu[slice], CXL_PSL_SLICE_TRACE, 1782 + 0x8000000000000000LL); 1783 + } 1784 + spin_unlock(&adapter->afu_list_lock); 1785 + } 1786 + 1750 1787 static const struct cxl_service_layer_ops psl9_ops = { 1751 1788 .adapter_regs_init = init_implementation_adapter_regs_psl9, 1752 1789 .invalidate_all = cxl_invalidate_all_psl9, ··· 1801 1762 .debugfs_add_adapter_regs = cxl_debugfs_add_adapter_regs_psl9, 1802 1763 .debugfs_add_afu_regs = cxl_debugfs_add_afu_regs_psl9, 1803 1764 .psl_irq_dump_registers = cxl_native_irq_dump_regs_psl9, 1765 + .err_irq_dump_registers = cxl_native_err_irq_dump_regs_psl9, 1804 1766 .debugfs_stop_trace = cxl_stop_trace_psl9, 1805 1767 .write_timebase_ctrl = write_timebase_ctrl_psl9, 1806 1768 .timebase_read = timebase_read_psl9, ··· 1825 1785 .debugfs_add_adapter_regs = cxl_debugfs_add_adapter_regs_psl8, 1826 1786 .debugfs_add_afu_regs = cxl_debugfs_add_afu_regs_psl8, 1827 1787 .psl_irq_dump_registers = cxl_native_irq_dump_regs_psl8, 1828 - .err_irq_dump_registers = cxl_native_err_irq_dump_regs, 1788 + .err_irq_dump_registers = cxl_native_err_irq_dump_regs_psl8, 1829 1789 .debugfs_stop_trace = cxl_stop_trace_psl8, 1830 1790 .write_timebase_ctrl = write_timebase_ctrl_psl8, 1831 1791 .timebase_read = timebase_read_psl8,
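cxl_stop_trace_psl9() above decodes a 2-bit state per trace array out of the 64-bit CTCCFG register: array 0 lives in the top two bits, array 1 in the next two, and so on. The extraction step in isolation (a standalone sketch with plain C types, not the cxl driver code):

```c
#include <stdint.h>

/* Value of the FIN state, mirroring CXL_PSL9_TRACESTATE_FIN above. */
#define TRACESTATE_FIN 0x3u

/* Pull the 2-bit state for one trace array out of the 64-bit
 * config word: array 0 in bits 63:62, array 1 in bits 61:60, ... */
static inline unsigned int trace_array_state(uint64_t ctccfg, int traceid)
{
	return (unsigned int)((ctccfg >> (62 - traceid * 2)) & 0x3u);
}
```

With the maximum traceid of 0xA (matching CXL_PSL9_TRACEID_MAX), the smallest shift used is 62 - 20 = 42, so every field fits comfortably in the register.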
+55 -32
drivers/mtd/devices/powernv_flash.c
··· 47 47 FLASH_OP_ERASE, 48 48 }; 49 49 50 + /* 51 + * Don't return -ERESTARTSYS if we can't get a token, the MTD core 52 + * might have split up the call from userspace and called into the 53 + * driver more than once, we'll already have done some amount of work. 54 + */ 50 55 static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op, 51 56 loff_t offset, size_t len, size_t *retlen, u_char *buf) 52 57 { ··· 68 63 if (token < 0) { 69 64 if (token != -ERESTARTSYS) 70 65 dev_err(dev, "Failed to get an async token\n"); 71 - 66 + else 67 + token = -EINTR; 72 68 return token; 73 69 } 74 70 ··· 84 78 rc = opal_flash_erase(info->id, offset, len, token); 85 79 break; 86 80 default: 87 - BUG_ON(1); 88 - } 89 - 90 - if (rc != OPAL_ASYNC_COMPLETION) { 91 - dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n", 92 - op, rc); 81 + WARN_ON_ONCE(1); 93 82 opal_async_release_token(token); 94 83 return -EIO; 95 84 } 96 85 97 - rc = opal_async_wait_response(token, &msg); 86 + if (rc == OPAL_ASYNC_COMPLETION) { 87 + rc = opal_async_wait_response_interruptible(token, &msg); 88 + if (rc) { 89 + /* 90 + * If we return the mtd core will free the 91 + * buffer we've just passed to OPAL but OPAL 92 + * will continue to read or write from that 93 + * memory. 94 + * It may be tempting to ultimately return 0 95 + * if we're doing a read or a write since we 96 + * are going to end up waiting until OPAL is 97 + * done. However, because the MTD core sends 98 + * us the userspace request in chunks, we need 99 + * it to know we've been interrupted. 100 + */ 101 + rc = -EINTR; 102 + if (opal_async_wait_response(token, &msg)) 103 + dev_err(dev, "opal_async_wait_response() failed\n"); 104 + goto out; 105 + } 106 + rc = opal_get_async_rc(msg); 107 + } 108 + 109 + /* 110 + * OPAL does mutual exclusion on the flash, it will return 111 + * OPAL_BUSY. 
112 + * During firmware updates by the service processor OPAL may 113 + * be (temporarily) prevented from accessing the flash, in 114 + * this case OPAL will also return OPAL_BUSY. 115 + * Both cases aren't errors exactly but the flash could have 116 + * changed, userspace should be informed. 117 + */ 118 + if (rc != OPAL_SUCCESS && rc != OPAL_BUSY) 119 + dev_err(dev, "opal_flash_async_op(op=%d) failed (rc %d)\n", 120 + op, rc); 121 + 122 + if (rc == OPAL_SUCCESS && retlen) 123 + *retlen = len; 124 + 125 + rc = opal_error_code(rc); 126 + out: 98 127 opal_async_release_token(token); 99 - if (rc) { 100 - dev_err(dev, "opal async wait failed (rc %d)\n", rc); 101 - return -EIO; 102 - } 103 - 104 - rc = opal_get_async_rc(msg); 105 - if (rc == OPAL_SUCCESS) { 106 - rc = 0; 107 - if (retlen) 108 - *retlen = len; 109 - } else { 110 - rc = -EIO; 111 - } 112 - 113 128 return rc; 114 129 } 115 130 ··· 247 220 int ret; 248 221 249 222 data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL); 250 - if (!data) { 251 - ret = -ENOMEM; 252 - goto out; 253 - } 223 + if (!data) 224 + return -ENOMEM; 225 + 254 226 data->mtd.priv = data; 255 227 256 228 ret = of_property_read_u32(dev->of_node, "ibm,opal-id", &(data->id)); 257 229 if (ret) { 258 230 dev_err(dev, "no device property 'ibm,opal-id'\n"); 259 - goto out; 231 + return ret; 260 232 } 261 233 262 234 ret = powernv_flash_set_driver_info(dev, &data->mtd); 263 235 if (ret) 264 - goto out; 236 + return ret; 265 237 266 238 dev_set_drvdata(dev, data); 267 239 ··· 269 243 * with an ffs partition at the start, it should prove easier for users 270 244 * to deal with partitions or not as they see fit 271 245 */ 272 - ret = mtd_device_register(&data->mtd, NULL, 0); 273 - 274 - out: 275 - return ret; 246 + return mtd_device_register(&data->mtd, NULL, 0); 276 247 } 277 248 278 249 /**
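The rework stops collapsing every OPAL status into -EIO and instead funnels it through opal_error_code(), so that OPAL_BUSY (OPAL holding the flash lock, or a service-processor firmware update in progress) is distinguishable from a hard failure. A rough userspace sketch of that translation idea; the OPAL status values and the mapping below are illustrative assumptions, not the kernel's exact table:

```c
#include <errno.h>

/* Illustrative OPAL status codes (values assumed for this sketch). */
enum {
	OPAL_OK = 0,
	OPAL_FLASH_BUSY = -2,
	OPAL_HW_ERROR = -6,
};

/* One translation point in the spirit of opal_error_code(): BUSY maps
 * to -EBUSY so userspace can tell "try again, the flash may have
 * changed" apart from a hard -EIO failure. */
static int opal_status_to_errno(int opal_rc)
{
	switch (opal_rc) {
	case OPAL_OK:
		return 0;
	case OPAL_FLASH_BUSY:
		return -EBUSY;
	case OPAL_HW_ERROR:
	default:
		return -EIO;
	}
}
```

This is why the patch only dev_err()s for statuses other than OPAL_SUCCESS and OPAL_BUSY: both of those are expected outcomes, but userspace still needs the distinct return code.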
+14 -3
tools/testing/selftests/powerpc/benchmarks/context_switch.c
··· 10 10 */ 11 11 12 12 #define _GNU_SOURCE 13 + #include <errno.h> 13 14 #include <sched.h> 14 15 #include <string.h> 15 16 #include <stdio.h> ··· 76 75 77 76 static void start_thread_on(void *(*fn)(void *), void *arg, unsigned long cpu) 78 77 { 78 + int rc; 79 79 pthread_t tid; 80 80 cpu_set_t cpuset; 81 81 pthread_attr_t attr; ··· 84 82 CPU_ZERO(&cpuset); 85 83 CPU_SET(cpu, &cpuset); 86 84 87 - pthread_attr_init(&attr); 85 + rc = pthread_attr_init(&attr); 86 + if (rc) { 87 + errno = rc; 88 + perror("pthread_attr_init"); 89 + exit(1); 90 + } 88 91 89 - if (pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset)) { 92 + rc = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset); 93 + if (rc) { 94 + errno = rc; 90 95 perror("pthread_attr_setaffinity_np"); 91 96 exit(1); 92 97 } 93 98 94 - if (pthread_create(&tid, &attr, fn, arg)) { 99 + rc = pthread_create(&tid, &attr, fn, arg); 100 + if (rc) { 101 + errno = rc; 95 102 perror("pthread_create"); 96 103 exit(1); 97 104 }
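The context_switch.c fix exists because pthread_* functions report failure through their return value and leave errno untouched, so a bare perror() would print whatever stale errno happened to be set. A standalone illustration of the corrected pattern (the helper name is ours, not part of the patch):

```c
#include <errno.h>

/* pthread_* calls return an error number directly instead of setting
 * errno; copy the code into errno so perror()/strerror() report the
 * right message, exactly as the patched start_thread_on() now does. */
static int pthread_errno(int rc)
{
	if (rc)
		errno = rc;
	return rc;
}
```

Usage follows the patch's shape: `if (pthread_errno(pthread_create(&tid, &attr, fn, arg))) { perror("pthread_create"); exit(1); }`.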
+5 -1
tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
··· 53 53 } 54 54 55 55 while ((dp = readdir(sysfs))) { 56 + int len; 57 + 56 58 if (!(dp->d_type & DT_DIR)) 57 59 continue; 58 60 if (!strcmp(dp->d_name, "cpuidle")) ··· 62 60 if (!strstr(dp->d_name, "cpu")) 63 61 continue; 64 62 65 - sprintf(file, "%s%s/dscr", CPU_PATH, dp->d_name); 63 + len = snprintf(file, LEN_MAX, "%s%s/dscr", CPU_PATH, dp->d_name); 64 + if (len >= LEN_MAX) 65 + continue; 66 66 if (access(file, F_OK)) 67 67 continue; 68 68
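The sprintf()-to-snprintf() change guards against a sysfs path overflowing the destination buffer: snprintf() returns the length the untruncated string would have needed, so `len >= LEN_MAX` detects truncation and the entry is skipped rather than probed with a mangled path. A self-contained sketch of the same check (the path prefix is the CPU_PATH value from the test; the function name is ours):

```c
#include <stdio.h>

/* Build "<CPU_PATH><name>/dscr" and report truncation the way the
 * fixed selftest does: snprintf() returns the would-be length, so any
 * value >= the buffer size means the path did not fit. */
static int build_dscr_path(char *buf, size_t buflen, const char *name)
{
	int len = snprintf(buf, buflen, "/sys/devices/system/cpu/%s/dscr",
			   name);

	return (len < 0 || (size_t)len >= buflen) ? -1 : 0;
}
```

A 64-byte buffer comfortably holds "/sys/devices/system/cpu/cpu0/dscr", while a 16-byte one trips the truncation check, which is exactly the case the original sprintf() would have silently overrun.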
+1
tools/testing/selftests/powerpc/tm/.gitignore
··· 12 12 tm-signal-context-chk-vmx 13 13 tm-signal-context-chk-vsx 14 14 tm-vmx-unavail 15 + tm-unavailable
+2 -1
tools/testing/selftests/powerpc/tm/Makefile
··· 3 3 tm-signal-context-chk-vmx tm-signal-context-chk-vsx 4 4 5 5 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ 6 - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail \ 6 + tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable \ 7 7 $(SIGNAL_CONTEXT_CHK_TESTS) 8 8 9 9 include ../../lib.mk ··· 17 17 $(OUTPUT)/tm-tmspr: CFLAGS += -pthread 18 18 $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 19 19 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.o 20 + $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx 20 21 21 22 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) 22 23 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
+371
tools/testing/selftests/powerpc/tm/tm-unavailable.c
··· 1 + /* 2 + * Copyright 2017, Gustavo Romero, Breno Leitao, Cyril Bur, IBM Corp. 3 + * Licensed under GPLv2. 4 + * 5 + * Force FP, VEC and VSX unavailable exception during transaction in all 6 + * possible scenarios regarding the MSR.FP and MSR.VEC state, e.g. when FP 7 + * is enable and VEC is disable, when FP is disable and VEC is enable, and 8 + * so on. Then we check if the restored state is correctly set for the 9 + * FP and VEC registers to the previous state we set just before we entered 10 + * in TM, i.e. we check if it corrupts somehow the recheckpointed FP and 11 + * VEC/Altivec registers on abortion due to an unavailable exception in TM. 12 + * N.B. In this test we do not test all the FP/Altivec/VSX registers for 13 + * corruption, but only for registers vs0 and vs32, which are respectively 14 + * representatives of FP and VEC/Altivec reg sets. 15 + */ 16 + 17 + #define _GNU_SOURCE 18 + #include <stdio.h> 19 + #include <stdlib.h> 20 + #include <unistd.h> 21 + #include <inttypes.h> 22 + #include <stdbool.h> 23 + #include <pthread.h> 24 + #include <sched.h> 25 + 26 + #include "tm.h" 27 + 28 + #define DEBUG 0 29 + 30 + /* Unavailable exceptions to test in HTM */ 31 + #define FP_UNA_EXCEPTION 0 32 + #define VEC_UNA_EXCEPTION 1 33 + #define VSX_UNA_EXCEPTION 2 34 + 35 + #define NUM_EXCEPTIONS 3 36 + 37 + struct Flags { 38 + int touch_fp; 39 + int touch_vec; 40 + int result; 41 + int exception; 42 + } flags; 43 + 44 + bool expecting_failure(void) 45 + { 46 + if (flags.touch_fp && flags.exception == FP_UNA_EXCEPTION) 47 + return false; 48 + 49 + if (flags.touch_vec && flags.exception == VEC_UNA_EXCEPTION) 50 + return false; 51 + 52 + /* 53 + * If both FP and VEC are touched it does not mean that touching VSX 54 + * won't raise an exception. However since FP and VEC state are already 55 + * correctly loaded, the transaction is not aborted (i.e. 
56 + * treclaimed/trecheckpointed) and MSR.VSX is just set as 1, so a TM 57 + * failure is not expected also in this case. 58 + */ 59 + if ((flags.touch_fp && flags.touch_vec) && 60 + flags.exception == VSX_UNA_EXCEPTION) 61 + return false; 62 + 63 + return true; 64 + } 65 + 66 + /* Check if failure occurred whilst in transaction. */ 67 + bool is_failure(uint64_t condition_reg) 68 + { 69 + /* 70 + * When failure handling occurs, CR0 is set to 0b1010 (0xa). Otherwise 71 + * transaction completes without failure and hence reaches out 'tend.' 72 + * that sets CR0 to 0b0100 (0x4). 73 + */ 74 + return ((condition_reg >> 28) & 0xa) == 0xa; 75 + } 76 + 77 + void *ping(void *input) 78 + { 79 + 80 + /* 81 + * Expected values for vs0 and vs32 after a TM failure. They must never 82 + * change, otherwise they got corrupted. 83 + */ 84 + uint64_t high_vs0 = 0x5555555555555555; 85 + uint64_t low_vs0 = 0xffffffffffffffff; 86 + uint64_t high_vs32 = 0x5555555555555555; 87 + uint64_t low_vs32 = 0xffffffffffffffff; 88 + 89 + /* Counter for busy wait */ 90 + uint64_t counter = 0x1ff000000; 91 + 92 + /* 93 + * Variable to keep a copy of CR register content taken just after we 94 + * leave the transactional state. 95 + */ 96 + uint64_t cr_ = 0; 97 + 98 + /* 99 + * Wait a bit so thread can get its name "ping". This is not important 100 + * to reproduce the issue but it's nice to have for systemtap debugging. 101 + */ 102 + if (DEBUG) 103 + sleep(1); 104 + 105 + printf("If MSR.FP=%d MSR.VEC=%d: ", flags.touch_fp, flags.touch_vec); 106 + 107 + if (flags.exception != FP_UNA_EXCEPTION && 108 + flags.exception != VEC_UNA_EXCEPTION && 109 + flags.exception != VSX_UNA_EXCEPTION) { 110 + printf("No valid exception specified to test.\n"); 111 + return NULL; 112 + } 113 + 114 + asm ( 115 + /* Prepare to merge low and high. */ 116 + " mtvsrd 33, %[high_vs0] ;" 117 + " mtvsrd 34, %[low_vs0] ;" 118 + 119 + /* 120 + * Adjust VS0 expected value after an TM failure, 121 + * i.e. 
vs0 = 0x5555555555555555555FFFFFFFFFFFFFFFF 122 + */ 123 + " xxmrghd 0, 33, 34 ;" 124 + 125 + /* 126 + * Adjust VS32 expected value after an TM failure, 127 + * i.e. vs32 = 0x5555555555555555555FFFFFFFFFFFFFFFF 128 + */ 129 + " xxmrghd 32, 33, 34 ;" 130 + 131 + /* 132 + * Wait an amount of context switches so load_fp and load_vec 133 + * overflow and MSR.FP, MSR.VEC, and MSR.VSX become zero (off). 134 + */ 135 + " mtctr %[counter] ;" 136 + 137 + /* Decrement CTR branch if CTR non zero. */ 138 + "1: bdnz 1b ;" 139 + 140 + /* 141 + * Check if we want to touch FP prior to the test in order 142 + * to set MSR.FP = 1 before provoking an unavailable 143 + * exception in TM. 144 + */ 145 + " cmpldi %[touch_fp], 0 ;" 146 + " beq no_fp ;" 147 + " fadd 10, 10, 10 ;" 148 + "no_fp: ;" 149 + 150 + /* 151 + * Check if we want to touch VEC prior to the test in order 152 + * to set MSR.VEC = 1 before provoking an unavailable 153 + * exception in TM. 154 + */ 155 + " cmpldi %[touch_vec], 0 ;" 156 + " beq no_vec ;" 157 + " vaddcuw 10, 10, 10 ;" 158 + "no_vec: ;" 159 + 160 + /* 161 + * Perhaps it would be a better idea to do the 162 + * compares outside transactional context and simply 163 + * duplicate code. 164 + */ 165 + " tbegin. ;" 166 + " beq trans_fail ;" 167 + 168 + /* Do we do FP Unavailable? */ 169 + " cmpldi %[exception], %[ex_fp] ;" 170 + " bne 1f ;" 171 + " fadd 10, 10, 10 ;" 172 + " b done ;" 173 + 174 + /* Do we do VEC Unavailable? */ 175 + "1: cmpldi %[exception], %[ex_vec] ;" 176 + " bne 2f ;" 177 + " vaddcuw 10, 10, 10 ;" 178 + " b done ;" 179 + 180 + /* 181 + * Not FP or VEC, therefore VSX. Ensure this 182 + * instruction always generates a VSX Unavailable. 183 + * ISA 3.0 is tricky here. 184 + * (xxmrghd will on ISA 2.07 and ISA 3.0) 185 + */ 186 + "2: xxmrghd 10, 10, 10 ;" 187 + 188 + "done: tend. ;" 189 + 190 + "trans_fail: ;" 191 + 192 + /* Give values back to C. 
*/ 193 + " mfvsrd %[high_vs0], 0 ;" 194 + " xxsldwi 3, 0, 0, 2 ;" 195 + " mfvsrd %[low_vs0], 3 ;" 196 + " mfvsrd %[high_vs32], 32 ;" 197 + " xxsldwi 3, 32, 32, 2 ;" 198 + " mfvsrd %[low_vs32], 3 ;" 199 + 200 + /* Give CR back to C so that it can check what happened. */ 201 + " mfcr %[cr_] ;" 202 + 203 + : [high_vs0] "+r" (high_vs0), 204 + [low_vs0] "+r" (low_vs0), 205 + [high_vs32] "=r" (high_vs32), 206 + [low_vs32] "=r" (low_vs32), 207 + [cr_] "+r" (cr_) 208 + : [touch_fp] "r" (flags.touch_fp), 209 + [touch_vec] "r" (flags.touch_vec), 210 + [exception] "r" (flags.exception), 211 + [ex_fp] "i" (FP_UNA_EXCEPTION), 212 + [ex_vec] "i" (VEC_UNA_EXCEPTION), 213 + [ex_vsx] "i" (VSX_UNA_EXCEPTION), 214 + [counter] "r" (counter) 215 + 216 + : "cr0", "ctr", "v10", "vs0", "vs10", "vs3", "vs32", "vs33", 217 + "vs34", "fr10" 218 + 219 + ); 220 + 221 + /* 222 + * Check if we were expecting a failure and it did not occur by checking 223 + * CR0 state just after we leave the transaction. Either way we check if 224 + * vs0 or vs32 got corrupted. 225 + */ 226 + if (expecting_failure() && !is_failure(cr_)) { 227 + printf("\n\tExpecting the transaction to fail, %s", 228 + "but it didn't\n\t"); 229 + flags.result++; 230 + } 231 + 232 + /* Check if we were not expecting a failure and a it occurred. */ 233 + if (!expecting_failure() && is_failure(cr_)) { 234 + printf("\n\tUnexpected transaction failure 0x%02lx\n\t", 235 + failure_code()); 236 + return (void *) -1; 237 + } 238 + 239 + /* 240 + * Check if TM failed due to the cause we were expecting. 0xda is a 241 + * TM_CAUSE_FAC_UNAV cause, otherwise it's an unexpected cause. 242 + */ 243 + if (is_failure(cr_) && !failure_is_unavailable()) { 244 + printf("\n\tUnexpected failure cause 0x%02lx\n\t", 245 + failure_code()); 246 + return (void *) -1; 247 + } 248 + 249 + /* 0x4 is a success and 0xa is a fail. See comment in is_failure(). 
*/ 250 + if (DEBUG) 251 + printf("CR0: 0x%1lx ", cr_ >> 28); 252 + 253 + /* Check FP (vs0) for the expected value. */ 254 + if (high_vs0 != 0x5555555555555555 || low_vs0 != 0xFFFFFFFFFFFFFFFF) { 255 + printf("FP corrupted!"); 256 + printf(" high = %#16" PRIx64 " low = %#16" PRIx64 " ", 257 + high_vs0, low_vs0); 258 + flags.result++; 259 + } else 260 + printf("FP ok "); 261 + 262 + /* Check VEC (vs32) for the expected value. */ 263 + if (high_vs32 != 0x5555555555555555 || low_vs32 != 0xFFFFFFFFFFFFFFFF) { 264 + printf("VEC corrupted!"); 265 + printf(" high = %#16" PRIx64 " low = %#16" PRIx64, 266 + high_vs32, low_vs32); 267 + flags.result++; 268 + } else 269 + printf("VEC ok"); 270 + 271 + putchar('\n'); 272 + 273 + return NULL; 274 + } 275 + 276 + /* Thread to force context switch */ 277 + void *pong(void *not_used) 278 + { 279 + /* Wait thread get its name "pong". */ 280 + if (DEBUG) 281 + sleep(1); 282 + 283 + /* Classed as an interactive-like thread. */ 284 + while (1) 285 + sched_yield(); 286 + } 287 + 288 + /* Function that creates a thread and launches the "ping" task. */ 289 + void test_fp_vec(int fp, int vec, pthread_attr_t *attr) 290 + { 291 + int retries = 2; 292 + void *ret_value; 293 + pthread_t t0; 294 + 295 + flags.touch_fp = fp; 296 + flags.touch_vec = vec; 297 + 298 + /* 299 + * Without luck it's possible that the transaction is aborted not due to 300 + * the unavailable exception caught in the middle as we expect but also, 301 + * for instance, due to a context switch or due to a KVM reschedule (if 302 + * it's running on a VM). Thus we try a few times before giving up, 303 + * checking if the failure cause is the one we expect. 304 + */ 305 + do { 306 + /* Bind 'ping' to CPU 0, as specified in 'attr'. 
*/ 307 + pthread_create(&t0, attr, ping, (void *) &flags); 308 + pthread_setname_np(t0, "ping"); 309 + pthread_join(t0, &ret_value); 310 + retries--; 311 + } while (ret_value != NULL && retries); 312 + 313 + if (!retries) { 314 + flags.result = 1; 315 + if (DEBUG) 316 + printf("All transactions failed unexpectedly\n"); 317 + 318 + } 319 + } 320 + 321 + int main(int argc, char **argv) 322 + { 323 + int exception; /* FP = 0, VEC = 1, VSX = 2 */ 324 + pthread_t t1; 325 + pthread_attr_t attr; 326 + cpu_set_t cpuset; 327 + 328 + /* Set only CPU 0 in the mask. Both threads will be bound to CPU 0. */ 329 + CPU_ZERO(&cpuset); 330 + CPU_SET(0, &cpuset); 331 + 332 + /* Init pthread attribute. */ 333 + pthread_attr_init(&attr); 334 + 335 + /* Set CPU 0 mask into the pthread attribute. */ 336 + pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset); 337 + 338 + pthread_create(&t1, &attr /* Bind 'pong' to CPU 0 */, pong, NULL); 339 + pthread_setname_np(t1, "pong"); /* Name it for systemtap convenience */ 340 + 341 + flags.result = 0; 342 + 343 + for (exception = 0; exception < NUM_EXCEPTIONS; exception++) { 344 + printf("Checking if FP/VEC registers are sane after"); 345 + 346 + if (exception == FP_UNA_EXCEPTION) 347 + printf(" a FP unavailable exception...\n"); 348 + 349 + else if (exception == VEC_UNA_EXCEPTION) 350 + printf(" a VEC unavailable exception...\n"); 351 + 352 + else 353 + printf(" a VSX unavailable exception...\n"); 354 + 355 + flags.exception = exception; 356 + 357 + test_fp_vec(0, 0, &attr); 358 + test_fp_vec(1, 0, &attr); 359 + test_fp_vec(0, 1, &attr); 360 + test_fp_vec(1, 1, &attr); 361 + 362 + } 363 + 364 + if (flags.result > 0) { 365 + printf("result: failed!\n"); 366 + exit(1); 367 + } else { 368 + printf("result: success\n"); 369 + exit(0); 370 + } 371 + }
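The new test's pass/fail decision hinges on is_failure(): after leaving the transaction, CR0 holds 0b1010 (0xa) when the failure handler ran and 0b0100 (0x4) when tend. completed, so the high nibble of the saved CR is masked against 0xa. The predicate in isolation:

```c
#include <stdint.h>

/* Same arithmetic as is_failure() in tm-unavailable.c: CR0 occupies
 * bits 31:28 of the CR image; a failed transaction leaves 0b1010
 * there, a cleanly completed one 0b0100. */
static int is_tm_failure(uint64_t condition_reg)
{
	return ((condition_reg >> 28) & 0xaULL) == 0xa;
}
```

Masking with 0xa rather than 0xf means only the two "failure" bits of CR0 are consulted, so the 0x4 set by a successful tend. can never be mistaken for a failure.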
+5
tools/testing/selftests/powerpc/tm/tm.h
··· 47 47 return (failure_code() & TM_CAUSE_SYSCALL) == TM_CAUSE_SYSCALL; 48 48 } 49 49 50 + static inline bool failure_is_unavailable(void) 51 + { 52 + return (failure_code() & TM_CAUSE_FAC_UNAV) == TM_CAUSE_FAC_UNAV; 53 + } 54 + 50 55 static inline bool failure_is_nesting(void) 51 56 { 52 57 return (__builtin_get_texasru() & 0x400000);
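The new failure_is_unavailable() helper follows the same shape as its neighbours in tm.h: mask the TEXASR failure code with the cause constant and require all of its bits. A sketch of that check with TM_CAUSE_FAC_UNAV (0xda, the value quoted in the tm-unavailable.c comment) defined locally so it stands alone:

```c
/* TM_CAUSE_FAC_UNAV is 0xda per the comment in tm-unavailable.c; a
 * failure code counts as "facility unavailable" only when every one of
 * those cause bits is set, mirroring failure_is_unavailable(). */
#define TM_CAUSE_FAC_UNAV 0xda

static int cause_is_unavailable(unsigned long failure_code)
{
	return (failure_code & TM_CAUSE_FAC_UNAV) == TM_CAUSE_FAC_UNAV;
}
```

This is the check ping() relies on to retry when a transaction aborts for an unrelated reason such as a context switch or KVM reschedule.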