Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc updates from Ben Herrenschmidt:
"These are the powerpc changes for the 3.11 merge window. In addition to
the usual bug fixes and small updates, the main highlights are:

- Support for transparent huge pages by Aneesh Kumar for 64-bit
server processors. This allows the use of 16M pages as transparent
huge pages on kernels compiled with a 64K base page size.

- Base VFIO support for KVM on power by Alexey Kardashevskiy

- Wiring up of our nvram to the pstore infrastructure, including
putting compressed oopses in there by Aruna Balakrishnaiah

- Move, rework and improve our "EEH" (basically PCI error handling
and recovery) infrastructure. It is no longer specific to pseries
but is now usable by the new "powernv" platform as well (no
hypervisor) by Gavin Shan.

- I fixed some bugs in our math-emu instruction decoding and made it
usable to emulate some optional FP instructions on processors with
hard FP that lack them (such as fsqrt on Freescale embedded
processors).

- Support for Power8 "Event Based Branch" facility by Michael
Ellerman. This facility allows what is basically "userspace
interrupts" for performance monitor events.

- A bunch of Transactional Memory vs. Signals bug fixes and HW
breakpoint/watchpoint fixes by Michael Neuling.

And more ... I apologize in advance if I've failed to highlight
something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
pstore: Add hsize argument in write_buf call of pstore_ftrace_call
powerpc/fsl: add MPIC timer wakeup support
powerpc/mpic: create mpic subsystem object
powerpc/mpic: add global timer support
powerpc/mpic: add irq_set_wake support
powerpc/85xx: enable coreint for all the 64bit boards
powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
powerpc/mpic: Add get_version API both for internal and external use
powerpc: Handle both new style and old style reserve maps
powerpc/hw_brk: Fix off by one error when validating DAWR region end
powerpc/pseries: Support compression of oops text via pstore
powerpc/pseries: Re-organise the oops compression code
pstore: Pass header size in the pstore write callback
powerpc/powernv: Fix iommu initialization again
powerpc/pseries: Inform the hypervisor we are using EBB regs
powerpc/perf: Add power8 EBB support
powerpc/perf: Core EBB support for 64-bit book3s
powerpc/perf: Drop MMCRA from thread_struct
powerpc/perf: Don't enable if we have zero events
...

+7681 -1225
+309
Documentation/devicetree/bindings/powerpc/fsl/interlaken-lac.txt
===============================================================================
Freescale Interlaken Look-Aside Controller Device Bindings
Copyright 2012 Freescale Semiconductor Inc.

CONTENTS
	- Interlaken Look-Aside Controller (LAC) Node
	- Example LAC Node
	- Interlaken Look-Aside Controller (LAC) Software Portal Node
	- Interlaken Look-Aside Controller (LAC) Software Portal Child Nodes
	- Example LAC SWP Node with Child Nodes

==============================================================================
Interlaken Look-Aside Controller (LAC) Node

DESCRIPTION

Interlaken is a narrow, high-speed, channelized chip-to-chip interface. To
facilitate interoperability between a data path device and a look-aside
co-processor, the Interlaken Look-Aside protocol is defined for short
transaction-related transfers. Although based on the Interlaken protocol,
Interlaken Look-Aside is not directly compatible with Interlaken and can be
considered a different operation mode.

The Interlaken LA controller connects the internal platform to the Interlaken
serial interface. It accepts LA commands through software portals, which are
system-memory-mapped 4 KB spaces. The LA commands are then translated into
Interlaken control words and data words, which are sent on the TX side to the
TCAM through SerDes lanes.

There are two 4 KB spaces defined within the LAC global register memory map:
a full register set at 0x0000-0x0FFF (also known as the "hypervisor" version),
and a subset at 0x1000-0x1FFF. The former is a superset of the latter and
includes certain registers that should not be accessible to partitioned
software. Separate nodes are used for each region, with a phandle linking the
hypervisor node to the normal operating node.

PROPERTIES

- compatible
	Usage: required
	Value type: <string>
	Definition: Must include "fsl,interlaken-lac". This represents only
		those LAC CCSR registers not protected in partitioned
		software. The version of the device is determined by the LAC
		IP Block Revision Register (IPBRR0) at offset 0x0BF8.

		Table of correspondences between IPBRR0 values and example
		chips:
			Value		Device
			-----------	-------
			0x02000100	T4240

	The hypervisor node has a different compatible. It must include
	"fsl,interlaken-lac-hv". This node represents the protected
	LAC register space and is required except inside a partition
	where access to the hypervisor node is to be denied.

- fsl,non-hv-node
	Usage: required in "fsl,interlaken-lac-hv"
	Value type: <phandle>
	Definition: Points to the non-protected LAC CCSR-mapped register
		space node.

- reg
	Usage: required
	Value type: <prop-encoded-array>
	Definition: A standard property. The first resource represents the
		Interlaken LAC configuration registers.

- interrupts
	Usage: required in non-hv node only
	Value type: <prop-encoded-array>
	Definition: Interrupt mapping for the Interlaken LAC error IRQ.

EXAMPLE
	lac: lac@229000 {
		compatible = "fsl,interlaken-lac";
		reg = <0x229000 0x1000>;
		interrupts = <16 2 1 18>;
	};

	lac-hv@228000 {
		compatible = "fsl,interlaken-lac-hv";
		reg = <0x228000 0x1000>;
		fsl,non-hv-node = <&lac>;
	};

===============================================================================
Interlaken Look-Aside Controller (LAC) Software Portal Container Node

DESCRIPTION
The Interlaken Look-Aside Controller (LAC) uses software portals to accept
Interlaken Look-Aside (ILA) commands. The Interlaken LAC software portal
memory map occupies 128 KB of memory space. The software portal memory space
is intended to be cache-enabled. WIMG for each software space is required to
be 0010 if stashing is enabled; otherwise, WIMG can be 0000 or 0010.

PROPERTIES

- #address-cells
	Usage: required
	Value type: <u32>
	Definition: A standard property. Must have a value of 1.

- #size-cells
	Usage: required
	Value type: <u32>
	Definition: A standard property. Must have a value of 1.

- compatible
	Usage: required
	Value type: <string>
	Definition: Must include "fsl,interlaken-lac-portals".

- ranges
	Usage: required
	Value type: <prop-encoded-array>
	Definition: A standard property. Specifies the address and length
		of the LAC portal memory space.

===============================================================================
Interlaken Look-Aside Controller (LAC) Software Portal Child Nodes

DESCRIPTION
There are up to 24 available software portals, each requiring 4 KB of
consecutive memory within the software portal memory-mapped space.

PROPERTIES

- compatible
	Usage: required
	Value type: <string>
	Definition: Must include "fsl,interlaken-lac-portal-vX.Y", where X is
		the major version (IP_MJ) found in the LAC IP Block Revision
		Register (IPBRR0) at offset 0x0BF8, and Y is the minor version
		(IP_MN).

		Table of correspondences between version values and example
		chips:
			Value	Device
			------	-------
			1.0	T4240

- reg
	Usage: required
	Value type: <prop-encoded-array>
	Definition: A standard property. The first resource represents the
		Interlaken LAC software portal registers.

- fsl,liodn
	Value type: <u32>
	Definition: The logical I/O device number (LIODN) for this device.
		The LIODN is a number expressed by this device and used to
		perform look-ups in the IOMMU (PAMU) address table when
		performing DMAs. This property is automatically added by
		u-boot.

===============================================================================
EXAMPLE

lac-portals {
	#address-cells = <0x1>;
	#size-cells = <0x1>;
	compatible = "fsl,interlaken-lac-portals";
	ranges = <0x0 0xf 0xf4400000 0x20000>;

	lportal0: lac-portal@0 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x204>;
		reg = <0x0 0x1000>;
	};

	lportal1: lac-portal@1000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x205>;
		reg = <0x1000 0x1000>;
	};

	lportal2: lac-portal@2000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x206>;
		reg = <0x2000 0x1000>;
	};

	lportal3: lac-portal@3000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x207>;
		reg = <0x3000 0x1000>;
	};

	lportal4: lac-portal@4000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x208>;
		reg = <0x4000 0x1000>;
	};

	lportal5: lac-portal@5000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x209>;
		reg = <0x5000 0x1000>;
	};

	lportal6: lac-portal@6000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20A>;
		reg = <0x6000 0x1000>;
	};

	lportal7: lac-portal@7000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20B>;
		reg = <0x7000 0x1000>;
	};

	lportal8: lac-portal@8000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20C>;
		reg = <0x8000 0x1000>;
	};

	lportal9: lac-portal@9000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20D>;
		reg = <0x9000 0x1000>;
	};

	lportal10: lac-portal@A000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20E>;
		reg = <0xA000 0x1000>;
	};

	lportal11: lac-portal@B000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20F>;
		reg = <0xB000 0x1000>;
	};

	lportal12: lac-portal@C000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x210>;
		reg = <0xC000 0x1000>;
	};

	lportal13: lac-portal@D000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x211>;
		reg = <0xD000 0x1000>;
	};

	lportal14: lac-portal@E000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x212>;
		reg = <0xE000 0x1000>;
	};

	lportal15: lac-portal@F000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x213>;
		reg = <0xF000 0x1000>;
	};

	lportal16: lac-portal@10000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x214>;
		reg = <0x10000 0x1000>;
	};

	lportal17: lac-portal@11000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x215>;
		reg = <0x11000 0x1000>;
	};

	lportal18: lac-portal@12000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x216>;
		reg = <0x12000 0x1000>;
	};

	lportal19: lac-portal@13000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x217>;
		reg = <0x13000 0x1000>;
	};

	lportal20: lac-portal@14000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x218>;
		reg = <0x14000 0x1000>;
	};

	lportal21: lac-portal@15000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x219>;
		reg = <0x15000 0x1000>;
	};

	lportal22: lac-portal@16000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x21A>;
		reg = <0x16000 0x1000>;
	};

	lportal23: lac-portal@17000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x21B>;
		reg = <0x17000 0x1000>;
	};
};
+2
Documentation/powerpc/00-INDEX
···
   - IBM "Hypervisor Virtual Console Server" Installation Guide
  mpc52xx.txt
   - Linux 2.6.x on MPC52xx family
+ pmu-ebb.txt
+  - Description of the API for using the PMU with Event Based Branches.
  qe_firmware.txt
   - describes the layout of firmware binaries for the Freescale QUICC
     Engine and the code that parses and uploads the microcode therein.
+137
Documentation/powerpc/pmu-ebb.txt
PMU Event Based Branches
========================

Event Based Branches (EBBs) are a feature which allows the hardware to
branch directly to a specified user space address when certain events occur.

The full specification is available in Power ISA v2.07:

https://www.power.org/documentation/power-isa-version-2-07/

One type of event for which EBBs can be configured is PMU exceptions. This
document describes the API for configuring the Power PMU to generate EBBs,
using the Linux perf_events API.


Terminology
-----------

Throughout this document we will refer to an "EBB event" or "EBB events".
This just refers to a struct perf_event which has set the "EBB" flag in its
attr.config. All events which can be configured on the hardware PMU are
possible "EBB events".


Background
----------

When a PMU EBB occurs it is delivered to the currently running process. As
such, EBBs can only sensibly be used by programs for self-monitoring.

It is a feature of the perf_events API that events can be created on other
processes, subject to standard permission checks. This is also true of EBB
events; however, unless the target process enables EBBs (via mtspr(BESCR)),
no EBBs will ever be delivered.

This makes it possible for a process to enable EBBs for itself, but not
actually configure any events. At a later time another process can come along
and attach an EBB event to the process, which will then cause EBBs to be
delivered to the first process. It's not clear if this is actually useful.

When the PMU is configured for EBBs, all PMU interrupts are delivered to the
user process. This means that once an EBB event is scheduled on the PMU, no
non-EBB events can be configured, so EBB events cannot be run concurrently
with regular 'perf' commands, or any other perf events.

It is, however, safe to run 'perf' commands on a process which is using EBBs.
The kernel will in general schedule the EBB event, and perf will be notified
that its events could not run.

The exclusion between EBB events and regular events is implemented using the
existing "pinned" and "exclusive" attributes of perf_events. This means EBB
events will be given priority over other events, unless they are also pinned.
If an EBB event and a regular event are both pinned, then whichever is
enabled first will be scheduled and the other will be put in error state. See
the section below titled "Enabling an EBB event" for more information.


Creating an EBB event
---------------------

To request that an event is counted using EBB, the event code should have bit
63 set.

EBB events must be created with a particular, and restrictive, set of
attributes - this is so that they interoperate correctly with the rest of the
perf_events subsystem.

An EBB event must be created with the "pinned" and "exclusive" attributes
set. Note that if you are creating a group of EBB events, only the leader can
have these attributes set.

An EBB event must NOT set any of the "inherit", "sample_period", "freq" or
"enable_on_exec" attributes.

An EBB event must be attached to a task. This is specified to
perf_event_open() by passing a pid value, typically 0 indicating the current
task.

All events in a group must agree on whether they want EBB; that is, either
all events must request EBB, or none may request it.

EBB events must specify the PMC they are to be counted on. This ensures
userspace is able to reliably determine which PMC the event is scheduled on.


Enabling an EBB event
---------------------

Once an EBB event has been successfully opened, it must be enabled with the
perf_events API. This can be achieved either via the ioctl() interface or the
prctl() interface.

However, due to the design of the perf_events API, enabling an event does not
guarantee that it has been scheduled on the PMU. To ensure that the EBB event
has been scheduled on the PMU, you must perform a read() on the event. If the
read() returns EOF, then the event has not been scheduled and EBBs are not
enabled.

This behaviour occurs because the EBB event is pinned and exclusive. When the
EBB event is enabled it will force all other non-pinned events off the PMU;
in this case the enable will be successful. However, if there is already an
event pinned on the PMU, then the enable will not be successful.


Reading an EBB event
--------------------

It is possible to read() from an EBB event; however, the results are
meaningless. Because interrupts are being delivered to the user process, the
kernel is not able to count the event, and so will return a junk value.


Closing an EBB event
--------------------

When you are finished with an EBB event, you can close it using close() as
for any regular event. If this is the last EBB event, the PMU will be
deconfigured and no further PMU EBBs will be delivered.


EBB Handler
-----------

The EBB handler is just regular userspace code; however, it must be written
in the style of an interrupt handler. When the handler is entered all
registers are (possibly) live and so must be saved somehow before the handler
can invoke other code.

It's up to the program how to handle this. For C programs a relatively simple
option is to create an interrupt frame on the stack and save registers there.


Fork
----

EBB events are not inherited across fork. If the child process wishes to use
EBBs it should open a new event for itself. Similarly, the EBB state in
BESCR/EBBHR/EBBRR is cleared across fork().
+63
Documentation/vfio.txt
···
  interfaces implement the device region access defined by the device's
  own VFIO_DEVICE_GET_REGION_INFO ioctl.

+ PPC64 sPAPR implementation note
+ -------------------------------------------------------------------------------
+
+ This implementation has some specifics:
+
+ 1) Only one IOMMU group per container is supported, as an IOMMU group
+    represents the minimal entity for which isolation can be guaranteed,
+    and groups are allocated statically, one per Partitionable Endpoint
+    (PE) (a PE is often a PCI domain, but not always).
+
+ 2) The hardware supports so-called DMA windows - the PCI address range
+    within which DMA transfer is allowed; any attempt to access address
+    space outside the window leads to isolation of the whole PE.
+
+ 3) PPC64 guests are paravirtualized but not fully emulated. There is an
+    API to map/unmap pages for DMA; it normally maps 1..32 pages per call,
+    and currently there is no way to reduce the number of calls. To make
+    things faster, the map/unmap handling has been implemented in real
+    mode, which provides excellent performance but has limitations such as
+    the inability to do locked-pages accounting in real time.
+
+ So 3 additional ioctls have been added:
+
+	VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
+		of the DMA window on the PCI bus.
+
+	VFIO_IOMMU_ENABLE - enables the container. The locked pages
+		accounting is done at this point. This lets the user first
+		learn what the DMA window is and adjust rlimit before doing
+		any real job.
+
+	VFIO_IOMMU_DISABLE - disables the container.
+
+ The code flow from the example above should be slightly changed:
+
+	.....
+	/* Add the group to the container */
+	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
+
+	/* Enable the IOMMU model we want */
+	ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+
+	/* Get additional sPAPR IOMMU info */
+	struct vfio_iommu_spapr_tce_info spapr_iommu_info;
+	ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);
+
+	if (ioctl(container, VFIO_IOMMU_ENABLE))
+		/* Cannot enable container, may be low rlimit */
+
+	/* Allocate some space and setup a DMA mapping */
+	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+
+	dma_map.size = 1024 * 1024;
+	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
+	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+	/* Check here whether .iova/.size are within the DMA window from
+	   spapr_iommu_info */
+
+	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
+	.....

  -------------------------------------------------------------------------------

  [1] VFIO was originally an acronym for "Virtual Function I/O" in its
+7 -1
MAINTAINERS
···
  S:	Maintained
  F:	drivers/media/rc/ene_ir.*

+ ENHANCED ERROR HANDLING (EEH)
+ M:	Gavin Shan <shangw@linux.vnet.ibm.com>
+ L:	linuxppc-dev@lists.ozlabs.org
+ S:	Supported
+ F:	Documentation/powerpc/eeh-pci-error-recovery.txt
+ F:	arch/powerpc/kernel/eeh*.c

  EPSON S1D13XXX FRAMEBUFFER DRIVER
  M:	Kristoffer Ericson <kristoffer.ericson@gmail.com>
  S:	Maintained
···
  L:	linux-pci@vger.kernel.org
  S:	Supported
  F:	Documentation/PCI/pci-error-recovery.txt
- F:	Documentation/powerpc/eeh-pci-error-recovery.txt

  PCI SUBSYSTEM
  M:	Bjorn Helgaas <bhelgaas@google.com>
+5 -12
arch/powerpc/Kconfig
···
  config MATH_EMULATION
  	bool "Math emulation"
- 	depends on 4xx || 8xx || E200 || PPC_MPC832x || E500
+ 	depends on 4xx || 8xx || PPC_MPC832x || BOOKE
  	---help---
  	  Some PowerPC chips designed for embedded applications do not have
  	  a floating-point unit and therefore do not implement the
···
  	  unit, which will allow programs that use floating-point
  	  instructions to run.

+ 	  This is also useful to emulate missing (optional) instructions
+ 	  such as fsqrt on cores that do have an FPU but do not implement
+ 	  them (such as Freescale BookE).
+
  config PPC_TRANSACTIONAL_MEM
  	bool "Transactional Memory support for POWERPC"
  	depends on PPC_BOOK3S_64
···
  	default n
  	---help---
  	  Support user-mode Transactional Memory on POWERPC.
-
- config 8XX_MINIMAL_FPEMU
- 	bool "Minimal math emulation for 8xx"
- 	depends on 8xx && !MATH_EMULATION
- 	help
- 	  Older arch/ppc kernels still emulated a few floating point
- 	  instructions such as load and store, even when full math
- 	  emulation is disabled. Say "Y" here if you want to preserve
- 	  this behavior.
-
- 	  It is recommended that you build a soft-float userspace instead.

  config IOMMU_HELPER
  	def_bool PPC64
+7
arch/powerpc/Kconfig.debug
···
  	  enable debugging for the wrong type of machine your kernel
  	  _will not boot_.

+ config PPC_EARLY_DEBUG_BOOTX
+ 	bool "BootX or OpenFirmware"
+ 	depends on BOOTX_TEXT
+ 	help
+ 	  Select this to enable early debugging for a machine using BootX
+ 	  or OpenFirmware.
+
  config PPC_EARLY_DEBUG_LPAR
  	bool "LPAR HV Console"
  	depends on PPC_PSERIES
+5
arch/powerpc/boot/dts/currituck.dts
···
  		interrupts = <34 2>;
  	};

+ 	FPGA0: fpga@50000000 {
+ 		compatible = "ibm,currituck-fpga";
+ 		reg = <0x50000000 0x4>;
+ 	};
+
  	IIC0: i2c@00000000 {
  		compatible = "ibm,iic-currituck", "ibm,iic";
  		reg = <0x0 0x00000014>;
+156
arch/powerpc/boot/dts/fsl/interlaken-lac-portals.dtsi
/*
 * T4240 Interlaken LAC Portal device tree stub with 24 portals.
 *
 * Copyright 2012 Freescale Semiconductor Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Freescale Semiconductor nor the
 *       names of its contributors may be used to endorse or promote products
 *       derived from this software without specific prior written permission.
 *
 *
 * ALTERNATIVELY, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") as published by the Free Software
 * Foundation, either version 2 of that License or (at your option) any
 * later version.
 *
 * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#address-cells = <0x1>;
#size-cells = <0x1>;
compatible = "fsl,interlaken-lac-portals";

lportal0: lac-portal@0 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x0 0x1000>;
};

lportal1: lac-portal@1000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x1000 0x1000>;
};

lportal2: lac-portal@2000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x2000 0x1000>;
};

lportal3: lac-portal@3000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x3000 0x1000>;
};

lportal4: lac-portal@4000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x4000 0x1000>;
};

lportal5: lac-portal@5000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x5000 0x1000>;
};

lportal6: lac-portal@6000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x6000 0x1000>;
};

lportal7: lac-portal@7000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x7000 0x1000>;
};

lportal8: lac-portal@8000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x8000 0x1000>;
};

lportal9: lac-portal@9000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x9000 0x1000>;
};

lportal10: lac-portal@A000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xA000 0x1000>;
};

lportal11: lac-portal@B000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xB000 0x1000>;
};

lportal12: lac-portal@C000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xC000 0x1000>;
};

lportal13: lac-portal@D000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xD000 0x1000>;
};

lportal14: lac-portal@E000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xE000 0x1000>;
};

lportal15: lac-portal@F000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xF000 0x1000>;
};

lportal16: lac-portal@10000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x10000 0x1000>;
};

lportal17: lac-portal@11000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x11000 0x1000>;
};

lportal18: lac-portal@12000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x12000 0x1000>;
};

lportal19: lac-portal@13000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x13000 0x1000>;
};

lportal20: lac-portal@14000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x14000 0x1000>;
};

lportal21: lac-portal@15000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x15000 0x1000>;
};

lportal22: lac-portal@16000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x16000 0x1000>;
};

lportal23: lac-portal@17000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x17000 0x1000>;
};
+45
arch/powerpc/boot/dts/fsl/interlaken-lac.dtsi
/*
 * T4 Interlaken Look-aside Controller (LAC) device tree stub
 *
 * Copyright 2012 Freescale Semiconductor Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Freescale Semiconductor nor the
 *       names of its contributors may be used to endorse or promote products
 *       derived from this software without specific prior written permission.
 *
 *
 * ALTERNATIVELY, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") as published by the Free Software
 * Foundation, either version 2 of that License or (at your option) any
 * later version.
 *
 * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

lac: lac@229000 {
	compatible = "fsl,interlaken-lac";
	reg = <0x229000 0x1000>;
	interrupts = <16 2 1 18>;
};

lac-hv@228000 {
	compatible = "fsl,interlaken-lac-hv";
	reg = <0x228000 0x1000>;
	fsl,non-hv-node = <&lac>;
};
+2
arch/powerpc/configs/c2k_defconfig
···
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_KEYS=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
 CONFIG_SECURITY=y
+2
arch/powerpc/configs/g5_defconfig
···
 CONFIG_LATENCYTOP=y
 CONFIG_SYSCTL_SYSCALL_CHECK=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_NULL=m
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_ECB=m
+2
arch/powerpc/configs/maple_defconfig
···
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_ECB=m
 CONFIG_CRYPTO_PCBC=m
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
+10 -17
arch/powerpc/configs/mpc512x_defconfig
···
-CONFIG_EXPERIMENTAL=y
 # CONFIG_SWAP is not set
 CONFIG_SYSVIPC=y
-CONFIG_SPARSE_IRQ=y
+CONFIG_NO_HZ=y
 CONFIG_LOG_BUF_SHIFT=16
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_COMPAT_BRK is not set
···
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
 # CONFIG_IOSCHED_CFQ is not set
 # CONFIG_PPC_CHRP is not set
 CONFIG_PPC_MPC512x=y
···
 CONFIG_MPC512x_GENERIC=y
 CONFIG_PDM360NG=y
 # CONFIG_PPC_PMAC is not set
-CONFIG_NO_HZ=y
 CONFIG_HZ_1000=y
-# CONFIG_MIGRATION is not set
 # CONFIG_SECCOMP is not set
 # CONFIG_PCI is not set
 CONFIG_NET=y
···
 # CONFIG_INET_DIAG is not set
 # CONFIG_IPV6 is not set
 CONFIG_CAN=y
-CONFIG_CAN_RAW=y
-CONFIG_CAN_BCM=y
 CONFIG_CAN_VCAN=y
 CONFIG_CAN_MSCAN=y
 CONFIG_CAN_DEBUG_DEVICES=y
···
 # CONFIG_FIRMWARE_IN_KERNEL is not set
 CONFIG_MTD=y
 CONFIG_MTD_CMDLINE_PARTS=y
-CONFIG_MTD_CHAR=y
 CONFIG_MTD_BLOCK=y
 CONFIG_MTD_CFI=y
 CONFIG_MTD_CFI_AMDSTD=y
···
 CONFIG_BLK_DEV_RAM_COUNT=1
 CONFIG_BLK_DEV_RAM_SIZE=8192
 CONFIG_BLK_DEV_XIP=y
-CONFIG_MISC_DEVICES=y
 CONFIG_EEPROM_AT24=y
 CONFIG_EEPROM_AT25=y
 CONFIG_SCSI=y
···
 CONFIG_BLK_DEV_SD=y
 CONFIG_CHR_DEV_SG=y
 CONFIG_NETDEVICES=y
+CONFIG_FS_ENET=y
 CONFIG_MARVELL_PHY=y
 CONFIG_DAVICOM_PHY=y
 CONFIG_QSEMI_PHY=y
···
 CONFIG_LSI_ET1011C_PHY=y
 CONFIG_FIXED_PHY=y
 CONFIG_MDIO_BITBANG=y
-CONFIG_NET_ETHERNET=y
-CONFIG_FS_ENET=y
-# CONFIG_NETDEV_1000 is not set
-# CONFIG_NETDEV_10000 is not set
 # CONFIG_WLAN is not set
 # CONFIG_INPUT_MOUSEDEV_PSAUX is not set
 CONFIG_INPUT_EVDEV=y
···
 CONFIG_GPIO_MPC8XXX=y
 # CONFIG_HWMON is not set
 CONFIG_MEDIA_SUPPORT=y
-CONFIG_VIDEO_DEV=y
 CONFIG_VIDEO_ADV_DEBUG=y
-# CONFIG_VIDEO_HELPER_CHIPS_AUTO is not set
-CONFIG_VIDEO_SAA711X=y
 CONFIG_FB=y
 CONFIG_FB_FSL_DIU=y
 # CONFIG_VGA_CONSOLE is not set
 CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_USB=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_FSL=y
+# CONFIG_USB_EHCI_HCD_PPC_OF is not set
+CONFIG_USB_STORAGE=y
+CONFIG_USB_GADGET=y
+CONFIG_USB_FSL_USB2=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_M41T80=y
 CONFIG_RTC_DRV_MPC5121=y
···
 CONFIG_JFFS2_FS=y
 CONFIG_UBIFS_FS=y
 CONFIG_NFS_FS=y
-CONFIG_NFS_V3=y
 CONFIG_ROOT_NFS=y
-CONFIG_PARTITION_ADVANCED=y
 CONFIG_NLS_CODEPAGE_437=y
 CONFIG_NLS_ISO8859_1=y
 # CONFIG_ENABLE_WARN_DEPRECATED is not set
+1
arch/powerpc/configs/mpc85xx_smp_defconfig
···
 CONFIG_FS_ENET=y
 CONFIG_UCC_GETH=y
 CONFIG_GIANFAR=y
+CONFIG_E1000E=y
 CONFIG_MARVELL_PHY=y
 CONFIG_DAVICOM_PHY=y
 CONFIG_CICADA_PHY=y
+2
arch/powerpc/configs/pmac32_defconfig
···
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_NULL=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_MD4=m
+2
arch/powerpc/configs/ppc64_defconfig
···
 CONFIG_MSI_BITMAP_SELFTEST=y
 CONFIG_XMON=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_NULL=m
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
+2
arch/powerpc/configs/ppc6xx_defconfig
···
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_XMON=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_KEYS=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
 CONFIG_SECURITY=y
+1
arch/powerpc/configs/pseries_defconfig
···
 CONFIG_SQUASHFS_XATTR=y
 CONFIG_SQUASHFS_LZO=y
 CONFIG_SQUASHFS_XZ=y
+CONFIG_PSTORE=y
 CONFIG_NFS_FS=y
 CONFIG_NFS_V3_ACL=y
 CONFIG_NFS_V4=y
+24 -12
arch/powerpc/include/asm/eeh.h
···
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/string.h>
+#include <linux/time.h>
 
 struct pci_dev;
 struct pci_bus;
···
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE */
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE */
+#define EEH_PE_PHB_DEAD		(1 << 2)	/* Dead PHB */
 
 struct eeh_pe {
 	int type;			/* PE type: PHB/Bus/Device */
···
 	int config_addr;		/* Traditional PCI address */
 	int addr;			/* PE configuration address */
 	struct pci_controller *phb;	/* Associated PHB */
+	struct pci_bus *bus;		/* Top PCI bus for bus PE */
 	int check_count;		/* Times of ignored error */
 	int freeze_count;		/* Times of froze up */
+	struct timeval tstamp;		/* Time on first-time freeze */
 	int false_positives;		/* Times of reported #ff's */
 	struct eeh_pe *parent;		/* Parent PE */
 	struct list_head child_list;	/* Link PE to the child list */
···
 
 static inline struct device_node *eeh_dev_to_of_node(struct eeh_dev *edev)
 {
-	return edev->dn;
+	return edev ? edev->dn : NULL;
 }
 
 static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 {
-	return edev->pdev;
+	return edev ? edev->pdev : NULL;
 }
 
 /*
···
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
+	int (*post_init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
-	void* (*dev_probe)(struct pci_dev *dev, void *flag);
+	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
 	int (*get_pe_addr)(struct eeh_pe *pe);
 	int (*get_state)(struct eeh_pe *pe, int *state);
···
 	int (*configure_bridge)(struct eeh_pe *pe);
 	int (*read_config)(struct device_node *dn, int where, int size, u32 *val);
 	int (*write_config)(struct device_node *dn, int where, int size, u32 val);
+	int (*next_error)(struct eeh_pe **pe);
 };
 
 extern struct eeh_ops *eeh_ops;
 extern int eeh_subsystem_enabled;
-extern struct mutex eeh_mutex;
+extern raw_spinlock_t confirm_error_lock;
 extern int eeh_probe_mode;
 
 #define EEH_PROBE_MODE_DEV	(1<<0)	/* From PCI device */
···
 	return (eeh_probe_mode == EEH_PROBE_MODE_DEV);
 }
 
-static inline void eeh_lock(void)
+static inline void eeh_serialize_lock(unsigned long *flags)
 {
-	mutex_lock(&eeh_mutex);
+	raw_spin_lock_irqsave(&confirm_error_lock, *flags);
 }
 
-static inline void eeh_unlock(void)
+static inline void eeh_serialize_unlock(unsigned long flags)
 {
-	mutex_unlock(&eeh_mutex);
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
 }
 
 /*
···
 
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
+void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
 		eeh_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
···
 
 void *eeh_dev_init(struct device_node *dn, void *data);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
+int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
 				unsigned long val);
 int eeh_dev_check_failure(struct eeh_dev *edev);
-void __init eeh_addr_cache_build(void);
+void eeh_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
···
 #define EEH_IO_ERROR_VALUE(size)	(~0U >> ((4 - (size)) * 8))
 
 #else /* !CONFIG_EEH */
+
+static inline int eeh_init(void)
+{
+	return 0;
+}
 
 static inline void *eeh_dev_init(struct device_node *dn, void *data)
 {
···
 static inline void eeh_add_sysfs_files(struct pci_bus *bus) { }
 
 static inline void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe) { }
-
-static inline void eeh_lock(void) { }
-static inline void eeh_unlock(void) { }
 
 #define EEH_POSSIBLE_ERROR(val, type) (0)
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
+2
arch/powerpc/include/asm/eeh_event.h
···
 	struct eeh_pe		*pe;	/* EEH PE */
 };
 
+int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
+void eeh_remove_event(struct eeh_pe *pe);
 void eeh_handle_event(struct eeh_pe *pe);
 
 #endif /* __KERNEL__ */
+4 -4
arch/powerpc/include/asm/exception-64s.h
···
 	/* No guest interrupts come through here */		\
 	SET_SCRATCH0(r13);		/* save r13 */		\
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common,	\
-				       EXC_STD, KVMTEST_PR, vec)
+				       EXC_STD, NOTEST, vec)
 
 #define STD_RELON_EXCEPTION_PSERIES_OOL(vec, label)		\
 	.globl label##_relon_pSeries;				\
 label##_relon_pSeries:						\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);		\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_STD)
 
 #define STD_RELON_EXCEPTION_HV(loc, vec, label)		\
···
 	/* No guest interrupts come through here */	\
 	SET_SCRATCH0(r13);	/* save r13 */		\
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
-				       EXC_HV, KVMTEST, vec)
+				       EXC_HV, NOTEST, vec)
 
 #define STD_RELON_EXCEPTION_HV_OOL(vec, label)		\
 	.globl label##_relon_hv;			\
 label##_relon_hv:					\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);	\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_HV)
 
 /* This associate vector numbers with bits in paca->irq_happened */
+7 -1
arch/powerpc/include/asm/hugetlb.h
···
 					    unsigned long vmaddr)
 {
 }
-#endif /* CONFIG_HUGETLB_PAGE */
 
+#define hugepd_shift(x) 0
+static inline pte_t *hugepte_offset(hugepd_t *hpdp, unsigned long addr,
+				    unsigned pdshift)
+{
+	return 0;
+}
+#endif /* CONFIG_HUGETLB_PAGE */
 
 /*
  * FSL Book3E platforms require special gpage handling - the gpages
+26 -7
arch/powerpc/include/asm/iommu.h
···
 	struct iommu_pool large_pool;
 	struct iommu_pool pools[IOMMU_NR_POOLS];
 	unsigned long *it_map;       /* A simple allocation bitmap for now */
+#ifdef CONFIG_IOMMU_API
+	struct iommu_group *it_group;
+#endif
 };
 
 struct scatterlist;
···
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
 					    int nid);
+extern void iommu_register_group(struct iommu_table *tbl,
+				 int pci_domain_number, unsigned long pe_num);
 
 extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
 			struct scatterlist *sglist, int nelems,
···
 extern void iommu_init_early_dart(void);
 extern void iommu_init_early_pasemi(void);
 
-#ifdef CONFIG_PCI
-extern void pci_iommu_init(void);
-extern void pci_direct_iommu_init(void);
-#else
-static inline void pci_iommu_init(void) { }
-#endif
-
 extern void alloc_dart_table(void);
 #if defined(CONFIG_PPC64) && defined(CONFIG_PM)
 static inline void iommu_save(void)
···
 		ppc_md.iommu_restore();
 }
 #endif
+
+/* The API to support IOMMU operations for VFIO */
+extern int iommu_tce_clear_param_check(struct iommu_table *tbl,
+		unsigned long ioba, unsigned long tce_value,
+		unsigned long npages);
+extern int iommu_tce_put_param_check(struct iommu_table *tbl,
+		unsigned long ioba, unsigned long tce);
+extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
+		unsigned long hwaddr, enum dma_data_direction direction);
+extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
+		unsigned long entry);
+extern int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
+		unsigned long entry, unsigned long pages);
+extern int iommu_put_tce_user_mode(struct iommu_table *tbl,
+		unsigned long entry, unsigned long tce);
+extern void iommu_flush_tce(struct iommu_table *tbl);
+extern int iommu_take_ownership(struct iommu_table *tbl);
+extern void iommu_release_ownership(struct iommu_table *tbl);
+
+extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
+33 -23
arch/powerpc/include/asm/kvm_book3s_64.h
···
 }
 
 /*
- * Lock and read a linux PTE.  If it's present and writable, atomically
- * set dirty and referenced bits and return the PTE, otherwise return 0.
+ * If it's present and writable, atomically set dirty and referenced bits and
+ * return the PTE, otherwise return 0. If we find a transparent hugepage
+ * and if it is marked splitting we return 0;
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
+						 unsigned int hugepage)
 {
-	pte_t pte, tmp;
+	pte_t old_pte, new_pte = __pte(0);
 
-	/* wait until _PAGE_BUSY is clear then set it atomically */
-	__asm__ __volatile__ (
-		"1:	ldarx	%0,0,%3\n"
-		"	andi.	%1,%0,%4\n"
-		"	bne-	1b\n"
-		"	ori	%1,%0,%4\n"
-		"	stdcx.	%1,0,%3\n"
-		"	bne-	1b"
-		: "=&r" (pte), "=&r" (tmp), "=m" (*p)
-		: "r" (p), "i" (_PAGE_BUSY)
-		: "cc");
+	while (1) {
+		old_pte = pte_val(*ptep);
+		/*
+		 * wait until _PAGE_BUSY is clear then set it atomically
+		 */
+		if (unlikely(old_pte & _PAGE_BUSY)) {
+			cpu_relax();
+			continue;
+		}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		/* If hugepage and is trans splitting return None */
+		if (unlikely(hugepage &&
+			     pmd_trans_splitting(pte_pmd(old_pte))))
+			return __pte(0);
+#endif
+		/* If pte is not present return None */
+		if (unlikely(!(old_pte & _PAGE_PRESENT)))
+			return __pte(0);
 
-	if (pte_present(pte)) {
-		pte = pte_mkyoung(pte);
-		if (writing && pte_write(pte))
-			pte = pte_mkdirty(pte);
+		new_pte = pte_mkyoung(old_pte);
+		if (writing && pte_write(old_pte))
+			new_pte = pte_mkdirty(new_pte);
+
+		if (old_pte == __cmpxchg_u64((unsigned long *)ptep, old_pte,
+					     new_pte))
+			break;
 	}
-
-	*p = pte;	/* clears _PAGE_BUSY */
-
-	return pte;
+	return new_pte;
 }
 
 /* Return HPTE cache control bits corresponding to Linux pte bits */
 static inline unsigned long hpte_cache_bits(unsigned long pte_val)
+2 -1
arch/powerpc/include/asm/lppaca.h
···
 
 	u8	reserved6[48];
 	u8	cede_latency_hint;
-	u8	reserved7[7];
+	u8	ebb_regs_in_use;
+	u8	reserved7[6];
 	u8	dtl_enable_mask;	/* Dispatch Trace Log mask */
 	u8	donate_dedicated_cpu;	/* Donate dedicated CPU cycles */
 	u8	fpregs_in_use;
+7 -4
arch/powerpc/include/asm/machdep.h
···
 #ifdef CONFIG_PPC64
 	void		(*hpte_invalidate)(unsigned long slot,
 					   unsigned long vpn,
-					   int psize, int ssize,
-					   int local);
+					   int bpsize, int apsize,
+					   int ssize, int local);
 	long		(*hpte_updatepp)(unsigned long slot,
 					 unsigned long newpp,
 					 unsigned long vpn,
-					 int psize, int ssize,
-					 int local);
+					 int bpsize, int apsize,
+					 int ssize, int local);
 	void		(*hpte_updateboltedpp)(unsigned long newpp,
 					       unsigned long ea,
 					       int psize, int ssize);
···
 	void		(*hpte_removebolted)(unsigned long ea,
 					     int psize, int ssize);
 	void		(*flush_hash_range)(unsigned long number, int local);
+	void		(*hugepage_invalidate)(struct mm_struct *mm,
+					       unsigned char *hpte_slot_array,
+					       unsigned long addr, int psize);
 
 	/* special for kexec, to be called in real mode, linear mapping is
 	 * destroyed as well */
+14
arch/powerpc/include/asm/mmu-hash64.h
···
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
 		     unsigned int shift, unsigned int mmu_psize);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern int __hash_page_thp(unsigned long ea, unsigned long access,
+			   unsigned long vsid, pmd_t *pmdp, unsigned long trap,
+			   int local, int ssize, unsigned int psize);
+#else
+static inline int __hash_page_thp(unsigned long ea, unsigned long access,
+				  unsigned long vsid, pmd_t *pmdp,
+				  unsigned long trap, int local,
+				  int ssize, unsigned int psize)
+{
+	BUG();
+	return -1;
+}
+#endif
 extern void hash_failure_debug(unsigned long ea, unsigned long access,
 			       unsigned long vsid, unsigned long trap,
 			       int ssize, int psize, int lpsize,
-1
arch/powerpc/include/asm/mpc5121.h
···
 };
 
 int mpc512x_cs_config(unsigned int cs, u32 val);
-int __init mpc5121_clk_init(void);
 
 #endif /* __ASM_POWERPC_MPC5121_H__ */
+5
arch/powerpc/include/asm/mpic.h
···
 #endif
 };
 
+extern struct bus_type mpic_subsys;
+
 /*
  * MPIC flags (passed to mpic_alloc)
  *
···
 
 #define	MPIC_REGSET_STANDARD	MPIC_REGSET(0)	/* Original MPIC */
 #define	MPIC_REGSET_TSI108	MPIC_REGSET(1)	/* Tsi108/109 PIC */
+
+/* Get the version of primary MPIC */
+extern u32 fsl_mpic_primary_get_version(void);
 
 /* Allocate the controller structure and setup the linux irq descs
  * for the range if interrupts passed in. No HW initialization is
+46
arch/powerpc/include/asm/mpic_timer.h
···
+/*
+ * arch/powerpc/include/asm/mpic_timer.h
+ *
+ * Header file for Mpic Global Timer
+ *
+ * Copyright 2013 Freescale Semiconductor, Inc.
+ *
+ * Author: Wang Dongsheng <Dongsheng.Wang@freescale.com>
+ *	   Li Yang <leoli@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#ifndef __MPIC_TIMER__
+#define __MPIC_TIMER__
+
+#include <linux/interrupt.h>
+#include <linux/time.h>
+
+struct mpic_timer {
+	void			*dev;
+	struct cascade_priv	*cascade_handle;
+	unsigned int		num;
+	unsigned int		irq;
+};
+
+#ifdef CONFIG_MPIC_TIMER
+struct mpic_timer *mpic_request_timer(irq_handler_t fn, void *dev,
+				      const struct timeval *time);
+void mpic_start_timer(struct mpic_timer *handle);
+void mpic_stop_timer(struct mpic_timer *handle);
+void mpic_get_remain_time(struct mpic_timer *handle, struct timeval *time);
+void mpic_free_timer(struct mpic_timer *handle);
+#else
+/* Stubs must be static inline: plain definitions in a header would cause
+ * multiple-definition link errors when included from more than one file. */
+static inline struct mpic_timer *mpic_request_timer(irq_handler_t fn,
+		void *dev, const struct timeval *time) { return NULL; }
+static inline void mpic_start_timer(struct mpic_timer *handle) { }
+static inline void mpic_stop_timer(struct mpic_timer *handle) { }
+static inline void mpic_get_remain_time(struct mpic_timer *handle,
+		struct timeval *time) { }
+static inline void mpic_free_timer(struct mpic_timer *handle) { }
+#endif
+
+#endif
+119 -21
arch/powerpc/include/asm/opal.h
···
 #define OPAL_SET_SLOT_LED_STATUS		55
 #define OPAL_GET_EPOW_STATUS			56
 #define OPAL_SET_SYSTEM_ATTENTION_LED		57
+#define OPAL_RESERVED1				58
+#define OPAL_RESERVED2				59
+#define OPAL_PCI_NEXT_ERROR			60
+#define OPAL_PCI_EEH_FREEZE_STATUS2		61
+#define OPAL_PCI_POLL				62
 #define OPAL_PCI_MSI_EOI			63
+#define OPAL_PCI_GET_PHB_DIAG_DATA2		64
 
 #ifndef __ASSEMBLY__
 
···
 enum OpalVendorApiTokens {
 	OPAL_START_VENDOR_API_RANGE = 1000, OPAL_END_VENDOR_API_RANGE = 1999
 };
+
 enum OpalFreezeState {
 	OPAL_EEH_STOPPED_NOT_FROZEN = 0,
 	OPAL_EEH_STOPPED_MMIO_FREEZE = 1,
···
 	OPAL_EEH_STOPPED_TEMP_UNAVAIL = 5,
 	OPAL_EEH_STOPPED_PERM_UNAVAIL = 6
 };
+
 enum OpalEehFreezeActionToken {
 	OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO = 1,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_DMA = 2,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_ALL = 3
 };
+
 enum OpalPciStatusToken {
-	OPAL_EEH_PHB_NO_ERROR = 0,
-	OPAL_EEH_PHB_FATAL = 1,
-	OPAL_EEH_PHB_RECOVERABLE = 2,
-	OPAL_EEH_PHB_BUS_ERROR = 3,
-	OPAL_EEH_PCI_NO_DEVSEL = 4,
-	OPAL_EEH_PCI_TA = 5,
-	OPAL_EEH_PCIEX_UR = 6,
-	OPAL_EEH_PCIEX_CA = 7,
-	OPAL_EEH_PCI_MMIO_ERROR = 8,
-	OPAL_EEH_PCI_DMA_ERROR = 9
+	OPAL_EEH_NO_ERROR	= 0,
+	OPAL_EEH_IOC_ERROR	= 1,
+	OPAL_EEH_PHB_ERROR	= 2,
+	OPAL_EEH_PE_ERROR	= 3,
+	OPAL_EEH_PE_MMIO_ERROR	= 4,
+	OPAL_EEH_PE_DMA_ERROR	= 5
 };
+
+enum OpalPciErrorSeverity {
+	OPAL_EEH_SEV_NO_ERROR	= 0,
+	OPAL_EEH_SEV_IOC_DEAD	= 1,
+	OPAL_EEH_SEV_PHB_DEAD	= 2,
+	OPAL_EEH_SEV_PHB_FENCED	= 3,
+	OPAL_EEH_SEV_PE_ER	= 4,
+	OPAL_EEH_SEV_INF	= 5
+};
+
 enum OpalShpcAction {
 	OPAL_SHPC_GET_LINK_STATE = 0,
 	OPAL_SHPC_GET_SLOT_STATE = 1
 };
+
 enum OpalShpcLinkState {
 	OPAL_SHPC_LINK_DOWN = 0,
 	OPAL_SHPC_LINK_UP = 1
 };
+
 enum OpalMmioWindowType {
 	OPAL_M32_WINDOW_TYPE = 1,
 	OPAL_M64_WINDOW_TYPE = 2,
 	OPAL_IO_WINDOW_TYPE = 3
 };
+
 enum OpalShpcSlotState {
 	OPAL_SHPC_DEV_NOT_PRESENT = 0,
 	OPAL_SHPC_DEV_PRESENT = 1
 };
+
 enum OpalExceptionHandler {
 	OPAL_MACHINE_CHECK_HANDLER = 1,
 	OPAL_HYPERVISOR_MAINTENANCE_HANDLER = 2,
 	OPAL_SOFTPATCH_HANDLER = 3
 };
+
 enum OpalPendingState {
-	OPAL_EVENT_OPAL_INTERNAL = 0x1,
-	OPAL_EVENT_NVRAM = 0x2,
-	OPAL_EVENT_RTC = 0x4,
-	OPAL_EVENT_CONSOLE_OUTPUT = 0x8,
-	OPAL_EVENT_CONSOLE_INPUT = 0x10,
-	OPAL_EVENT_ERROR_LOG_AVAIL = 0x20,
-	OPAL_EVENT_ERROR_LOG = 0x40,
-	OPAL_EVENT_EPOW = 0x80,
-	OPAL_EVENT_LED_STATUS = 0x100
+	OPAL_EVENT_OPAL_INTERNAL	= 0x1,
+	OPAL_EVENT_NVRAM		= 0x2,
+	OPAL_EVENT_RTC			= 0x4,
+	OPAL_EVENT_CONSOLE_OUTPUT	= 0x8,
+	OPAL_EVENT_CONSOLE_INPUT	= 0x10,
+	OPAL_EVENT_ERROR_LOG_AVAIL	= 0x20,
+	OPAL_EVENT_ERROR_LOG		= 0x40,
+	OPAL_EVENT_EPOW			= 0x80,
+	OPAL_EVENT_LED_STATUS		= 0x100,
+	OPAL_EVENT_PCI_ERROR		= 0x200
 };
 
 /* Machine check related definitions */
···
 	} u;
 };
 
+enum {
+	OPAL_P7IOC_DIAG_TYPE_NONE	= 0,
+	OPAL_P7IOC_DIAG_TYPE_RGC	= 1,
+	OPAL_P7IOC_DIAG_TYPE_BI		= 2,
+	OPAL_P7IOC_DIAG_TYPE_CI		= 3,
+	OPAL_P7IOC_DIAG_TYPE_MISC	= 4,
+	OPAL_P7IOC_DIAG_TYPE_I2C	= 5,
+	OPAL_P7IOC_DIAG_TYPE_LAST	= 6
+};
+
+struct OpalIoP7IOCErrorData {
+	uint16_t type;
+
+	/* GEM */
+	uint64_t gemXfir;
+	uint64_t gemRfir;
+	uint64_t gemRirqfir;
+	uint64_t gemMask;
+	uint64_t gemRwof;
+
+	/* LEM */
+	uint64_t lemFir;
+	uint64_t lemErrMask;
+	uint64_t lemAction0;
+	uint64_t lemAction1;
+	uint64_t lemWof;
+
+	union {
+		struct OpalIoP7IOCRgcErrorData {
+			uint64_t rgcStatus;	/* 3E1C10 */
+			uint64_t rgcLdcp;	/* 3E1C18 */
+		}rgc;
+		struct OpalIoP7IOCBiErrorData {
+			uint64_t biLdcp0;	/* 3C0100, 3C0118 */
+			uint64_t biLdcp1;	/* 3C0108, 3C0120 */
+			uint64_t biLdcp2;	/* 3C0110, 3C0128 */
+			uint64_t biFenceStatus;	/* 3C0130, 3C0130 */
+
+			uint8_t  biDownbound;	/* BI Downbound or Upbound */
+		}bi;
+		struct OpalIoP7IOCCiErrorData {
+			uint64_t ciPortStatus;	/* 3Dn008 */
+			uint64_t ciPortLdcp;	/* 3Dn010 */
+
+			uint8_t  ciPort;	/* Index of CI port: 0/1 */
+		}ci;
+	};
+};
+
 /**
  * This structure defines the overlay which will be used to store PHB error
  * data upon request.
  */
 enum {
+	OPAL_PHB_ERROR_DATA_VERSION_1 = 1,
+};
+
+enum {
+	OPAL_PHB_ERROR_DATA_TYPE_P7IOC = 1,
+};
+
+enum {
 	OPAL_P7IOC_NUM_PEST_REGS = 128,
 };
 
+struct OpalIoPhbErrorCommon {
+	uint32_t version;
+	uint32_t ioType;
+	uint32_t len;
+};
+
 struct OpalIoP7IOCPhbErrorData {
+	struct OpalIoPhbErrorCommon common;
+
 	uint32_t brdgCtl;
 
 	// P7IOC utl regs
···
 			    uint64_t pci_mem_size);
 int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
 
-int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer, uint64_t diag_buffer_len);
-int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer, uint64_t diag_buffer_len);
+int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id, void *diag_buffer,
+				    uint64_t diag_buffer_len);
 int64_t opal_pci_fence_phb(uint64_t phb_id);
 int64_t opal_pci_reinit(uint64_t phb_id, uint8_t reinit_scope);
 int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action);
 int64_t opal_set_slot_led_status(uint64_t phb_id, uint64_t slot_id, uint8_t led_type, uint8_t led_action);
 int64_t opal_get_epow_status(uint64_t *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
+int64_t opal_pci_next_error(uint64_t phb_id, uint64_t *first_frozen_pe,
+			    uint16_t *pci_error_type, uint16_t *severity);
+int64_t opal_pci_poll(uint64_t phb_id);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
···
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
+
+extern int opal_notifier_register(struct notifier_block *nb);
+extern void opal_notifier_enable(void);
+extern void opal_notifier_disable(void);
+extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
+6
arch/powerpc/include/asm/perf_event_server.h
···
 #define PPMU_HAS_SSLOT		0x00000020 /* Has sampled slot in MMCRA */
 #define PPMU_HAS_SIER		0x00000040 /* Has SIER */
 #define PPMU_BHRB		0x00000080 /* has BHRB feature enabled */
+#define PPMU_EBB		0x00000100 /* supports event based branch */
 
 /*
  * Values for flags to get_alternatives()
···
 #define PPMU_LIMITED_PMC_OK	1	/* can put this on a limited PMC */
 #define PPMU_LIMITED_PMC_REQD	2	/* have to put this on a limited PMC */
 #define PPMU_ONLY_COUNT_RUN	4	/* only counting in run state */
+
+/*
+ * We use the event config bit 63 as a flag to request EBB.
+ */
+#define EVENT_CONFIG_EBB_SHIFT	63
 
 extern int register_power_pmu(struct power_pmu *);
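The header reserves bit 63 of the perf event config word as the EBB request flag. A tiny sketch of how a tool might set and test that flag (the helper names are illustrative; only the shift value comes from the header):

```python
# Bit 63 of the 64-bit event config requests Event Based Branches (EBB).
EVENT_CONFIG_EBB_SHIFT = 63

def request_ebb(config):
    # Set the EBB flag on an otherwise normal event config word.
    return config | (1 << EVENT_CONFIG_EBB_SHIFT)

def wants_ebb(config):
    # Test whether an event config word has the EBB flag set.
    return (config >> EVENT_CONFIG_EBB_SHIFT) & 1
```

Using the top bit keeps the flag out of the way of the raw event code, which occupies the low bits of the config word.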
+3 -3
arch/powerpc/include/asm/pgalloc-64.h
···
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(PGT_CACHE(PMD_INDEX_SIZE),
+	return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
 				GFP_KERNEL|__GFP_REPEAT);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
-	kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
+	kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
 #define __pmd_free_tlb(tlb, pmd, addr)		      \
-	pgtable_free_tlb(tlb, pmd, PMD_INDEX_SIZE)
+	pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
 #ifndef CONFIG_PPC_64K_PAGES
 #define __pud_free_tlb(tlb, pud, addr)		      \
 	pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
+2 -1
arch/powerpc/include/asm/pgtable-ppc64-64k.h
···
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
 /* Bits to mask out from a PMD to get to the PTE page */
-#define PMD_MASKED_BITS		0x1ff
+/* PMDs point to PTE table fragments which are 4K aligned. */
+#define PMD_MASKED_BITS		0xfff
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS		0x1ff
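The mask widening from 0x1ff to 0xfff reflects the new comment: PTE tables are now 4K-aligned fragments, so the low 12 bits of a PMD value carry non-address bits and must be stripped to recover the fragment address. A quick arithmetic sketch (the sample PMD value is made up for illustration):

```python
# With 4K-aligned PTE table fragments, the low 12 bits of a PMD are
# metadata, not address bits.
PMD_MASKED_BITS = 0xfff

def pmd_page_vaddr(pmd_val):
    # Mirror of the header's pmd_page_vaddr(): clear the masked-out bits
    # to get the 4K-aligned virtual address of the PTE fragment.
    return pmd_val & ~PMD_MASKED_BITS
```

With the old 0x1ff mask, bits 9-11 of the metadata would have leaked into the recovered address; 0xfff clears the full 4K-alignment range.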
+218 -39
arch/powerpc/include/asm/pgtable-ppc64.h
···
 #else
 #include <asm/pgtable-ppc64-4k.h>
 #endif
+#include <asm/barrier.h>
 
 #define FIRST_USER_ADDRESS	0
 
···
 			    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
 #define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
 
-
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_CACHE_INDEX	(PMD_INDEX_SIZE + 1)
+#else
+#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
+#endif
 /*
  * Define the address range of the kernel non-linear virtual area
  */
···
 #define	pmd_present(pmd)	(pmd_val(pmd) != 0)
 #define	pmd_clear(pmdp)		(pmd_val(*(pmdp)) = 0)
 #define pmd_page_vaddr(pmd)	(pmd_val(pmd) & ~PMD_MASKED_BITS)
-#define pmd_page(pmd)		virt_to_page(pmd_page_vaddr(pmd))
+extern struct page *pmd_page(pmd_t pmd);
 
 #define pud_set(pudp, pudval)	(pud_val(*(pudp)) = (pudval))
 #define pud_none(pud)		(!pud_val(pud))
···
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
-
-/*
- * find_linux_pte returns the address of a linux pte for a given
- * effective address and directory.  If not found, it returns zero.
- */
-static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-	pte_t *pt = NULL;
-
-	pg = pgdir + pgd_index(ea);
-	if (!pgd_none(*pg)) {
-		pu = pud_offset(pg, ea);
-		if (!pud_none(*pu)) {
-			pm = pmd_offset(pu, ea);
-			if (pmd_present(*pm))
-				pt = pte_offset_kernel(pm, ea);
-		}
-	}
-	return pt;
-}
-
-#ifdef CONFIG_HUGETLB_PAGE
-pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
-				 unsigned *shift);
-#else
-static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
-					       unsigned *shift)
-{
-	if (shift)
-		*shift = 0;
-	return find_linux_pte(pgdir, ea);
-}
-#endif /* !CONFIG_HUGETLB_PAGE */
-
 #endif /* __ASSEMBLY__ */
 
+/*
+ * THP pages can't be special. So use the _PAGE_SPECIAL
+ */
+#define _PAGE_SPLITTING _PAGE_SPECIAL
+
+/*
+ * We need to differentiate between explicit huge page and THP huge
+ * page, since THP huge page also need to track real subpage details
+ */
+#define _PAGE_THP_HUGE  _PAGE_4K_PFN
+
+/*
+ * set of bits not changed in pmd_modify.
+ */
+#define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | \
+			 _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \
+			 _PAGE_THP_HUGE)
+
+#ifndef __ASSEMBLY__
+/*
+ * The linux hugepage PMD now include the pmd entries followed by the address
+ * to the stashed pgtable_t. The stashed pgtable_t contains the hpte bits.
+ * [ 1 bit secondary | 3 bit hidx | 1 bit valid | 000]. We use one byte per
+ * each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and
+ * with 4K HPTE we need 4096 entries. Both will fit in a 4K pgtable_t.
+ *
+ * The last three bits are intentionally left to zero. This memory location
+ * are also used as normal page PTE pointers. So if we have any pointers
+ * left around while we collapse a hugepage, we need to make sure
+ * _PAGE_PRESENT and _PAGE_FILE bits of that are zero when we look at them
+ */
+static inline unsigned int hpte_valid(unsigned char *hpte_slot_array, int index)
+{
+	return (hpte_slot_array[index] >> 3) & 0x1;
+}
+
+static inline unsigned int hpte_hash_index(unsigned char *hpte_slot_array,
+					   int index)
+{
+	return hpte_slot_array[index] >> 4;
+}
+
+static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
+					unsigned int index, unsigned int hidx)
+{
+	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
+}
+
+static inline char *get_hpte_slot_array(pmd_t *pmdp)
+{
+	/*
+	 * The hpte hindex is stored in the pgtable whose address is in the
+	 * second half of the PMD
+	 *
+	 * Order this load with the test for pmd_trans_huge in the caller
+	 */
+	smp_rmb();
+	return *(char **)(pmdp + PTRS_PER_PMD);
+}
+
+extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
+				   pmd_t *pmdp);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
+extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
+extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+		       pmd_t *pmdp, pmd_t pmd);
+extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
+				 pmd_t *pmd);
+
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+	/*
+	 * leaf pte for huge page, bottom two bits != 00
+	 */
+	return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE);
+}
+
+static inline int pmd_large(pmd_t pmd)
+{
+	/*
+	 * leaf pte for huge page, bottom two bits != 00
+	 */
+	if (pmd_trans_huge(pmd))
+		return pmd_val(pmd) & _PAGE_PRESENT;
+	return 0;
+}
+
+static inline int pmd_trans_splitting(pmd_t pmd)
+{
+	if (pmd_trans_huge(pmd))
+		return pmd_val(pmd) & _PAGE_SPLITTING;
+	return 0;
+}
+
+extern int has_transparent_hugepage(void);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static inline pte_t pmd_pte(pmd_t pmd)
+{
+	return __pte(pmd_val(pmd));
+}
+
+static inline pmd_t pte_pmd(pte_t pte)
+{
+	return __pmd(pte_val(pte));
+}
+
+static inline pte_t *pmdp_ptep(pmd_t *pmd)
+{
+	return (pte_t *)pmd;
+}
+
+#define pmd_pfn(pmd)		pte_pfn(pmd_pte(pmd))
+#define pmd_young(pmd)		pte_young(pmd_pte(pmd))
+#define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
+#define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
+#define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
+#define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
+#define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+
+#define __HAVE_ARCH_PMD_WRITE
+#define pmd_write(pmd)		pte_write(pmd_pte(pmd))
+
+static inline pmd_t pmd_mkhuge(pmd_t pmd)
+{
+	/* Do nothing, mk_pmd() does this part.
*/ 474 + return pmd; 475 + } 476 + 477 + static inline pmd_t pmd_mknotpresent(pmd_t pmd) 478 + { 479 + pmd_val(pmd) &= ~_PAGE_PRESENT; 480 + return pmd; 481 + } 482 + 483 + static inline pmd_t pmd_mksplitting(pmd_t pmd) 484 + { 485 + pmd_val(pmd) |= _PAGE_SPLITTING; 486 + return pmd; 487 + } 488 + 489 + #define __HAVE_ARCH_PMD_SAME 490 + static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) 491 + { 492 + return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~_PAGE_HPTEFLAGS) == 0); 493 + } 494 + 495 + #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS 496 + extern int pmdp_set_access_flags(struct vm_area_struct *vma, 497 + unsigned long address, pmd_t *pmdp, 498 + pmd_t entry, int dirty); 499 + 500 + extern unsigned long pmd_hugepage_update(struct mm_struct *mm, 501 + unsigned long addr, 502 + pmd_t *pmdp, unsigned long clr); 503 + 504 + static inline int __pmdp_test_and_clear_young(struct mm_struct *mm, 505 + unsigned long addr, pmd_t *pmdp) 506 + { 507 + unsigned long old; 508 + 509 + if ((pmd_val(*pmdp) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0) 510 + return 0; 511 + old = pmd_hugepage_update(mm, addr, pmdp, _PAGE_ACCESSED); 512 + return ((old & _PAGE_ACCESSED) != 0); 513 + } 514 + 515 + #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG 516 + extern int pmdp_test_and_clear_young(struct vm_area_struct *vma, 517 + unsigned long address, pmd_t *pmdp); 518 + #define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH 519 + extern int pmdp_clear_flush_young(struct vm_area_struct *vma, 520 + unsigned long address, pmd_t *pmdp); 521 + 522 + #define __HAVE_ARCH_PMDP_GET_AND_CLEAR 523 + extern pmd_t pmdp_get_and_clear(struct mm_struct *mm, 524 + unsigned long addr, pmd_t *pmdp); 525 + 526 + #define __HAVE_ARCH_PMDP_CLEAR_FLUSH 527 + extern pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address, 528 + pmd_t *pmdp); 529 + 530 + #define __HAVE_ARCH_PMDP_SET_WRPROTECT 531 + static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, 532 + pmd_t *pmdp) 533 + { 534 + 535 + if 
((pmd_val(*pmdp) & _PAGE_RW) == 0) 536 + return; 537 + 538 + pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW); 539 + } 540 + 541 + #define __HAVE_ARCH_PMDP_SPLITTING_FLUSH 542 + extern void pmdp_splitting_flush(struct vm_area_struct *vma, 543 + unsigned long address, pmd_t *pmdp); 544 + 545 + #define __HAVE_ARCH_PGTABLE_DEPOSIT 546 + extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 547 + pgtable_t pgtable); 548 + #define __HAVE_ARCH_PGTABLE_WITHDRAW 549 + extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); 550 + 551 + #define __HAVE_ARCH_PMDP_INVALIDATE 552 + extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, 553 + pmd_t *pmdp); 554 + #endif /* __ASSEMBLY__ */ 386 555 #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
+6
arch/powerpc/include/asm/pgtable.h
··· 217 217 218 218 extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, 219 219 unsigned long end, int write, struct page **pages, int *nr); 220 + #ifndef CONFIG_TRANSPARENT_HUGEPAGE 221 + #define pmd_large(pmd) 0 222 + #define has_transparent_hugepage() 0 223 + #endif 224 + pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, 225 + unsigned *shift); 220 226 #endif /* __ASSEMBLY__ */ 221 227 222 228 #endif /* __KERNEL__ */
+25
arch/powerpc/include/asm/probes.h
··· 38 38 #define is_trap(instr) (IS_TW(instr) || IS_TWI(instr)) 39 39 #endif /* CONFIG_PPC64 */ 40 40 41 + #ifdef CONFIG_PPC_ADV_DEBUG_REGS 42 + #define MSR_SINGLESTEP (MSR_DE) 43 + #else 44 + #define MSR_SINGLESTEP (MSR_SE) 45 + #endif 46 + 47 + /* Enable single stepping for the current task */ 48 + static inline void enable_single_step(struct pt_regs *regs) 49 + { 50 + regs->msr |= MSR_SINGLESTEP; 51 + #ifdef CONFIG_PPC_ADV_DEBUG_REGS 52 + /* 53 + * We turn off Critical Input Exception(CE) to ensure that the single 54 + * step will be for the instruction we have the probe on; if we don't, 55 + * it is possible we'd get the single step reported for CE. 56 + */ 57 + regs->msr &= ~MSR_CE; 58 + mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM); 59 + #ifdef CONFIG_PPC_47x 60 + isync(); 61 + #endif 62 + #endif 63 + } 64 + 65 + 41 66 #endif /* __KERNEL__ */ 42 67 #endif /* _ASM_POWERPC_PROBES_H */
+7 -9
arch/powerpc/include/asm/processor.h
··· 168 168 * The following help to manage the use of Debug Control Registers 169 169 * om the BookE platforms. 170 170 */ 171 - unsigned long dbcr0; 172 - unsigned long dbcr1; 171 + uint32_t dbcr0; 172 + uint32_t dbcr1; 173 173 #ifdef CONFIG_BOOKE 174 - unsigned long dbcr2; 174 + uint32_t dbcr2; 175 175 #endif 176 176 /* 177 177 * The stored value of the DBSR register will be the value at the ··· 179 179 * user (will never be written to) and has value while helping to 180 180 * describe the reason for the last debug trap. Torez 181 181 */ 182 - unsigned long dbsr; 182 + uint32_t dbsr; 183 183 /* 184 184 * The following will contain addresses used by debug applications 185 185 * to help trace and trap on particular address locations. ··· 200 200 #endif 201 201 #endif 202 202 /* FP and VSX 0-31 register set */ 203 - double fpr[32][TS_FPRWIDTH]; 203 + double fpr[32][TS_FPRWIDTH] __attribute__((aligned(16))); 204 204 struct { 205 205 206 206 unsigned int pad; ··· 287 287 unsigned long siar; 288 288 unsigned long sdar; 289 289 unsigned long sier; 290 - unsigned long mmcr0; 291 290 unsigned long mmcr2; 292 - unsigned long mmcra; 291 + unsigned mmcr0; 292 + unsigned used_ebb; 293 293 #endif 294 294 }; 295 295 ··· 404 404 405 405 #define spin_lock_prefetch(x) prefetchw(x) 406 406 407 - #ifdef CONFIG_PPC64 408 407 #define HAVE_ARCH_PICK_MMAP_LAYOUT 409 - #endif 410 408 411 409 #ifdef CONFIG_PPC64 412 410 static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
+9
arch/powerpc/include/asm/reg.h
··· 621 621 #define MMCR0_PMXE 0x04000000UL /* performance monitor exception enable */ 622 622 #define MMCR0_FCECE 0x02000000UL /* freeze ctrs on enabled cond or event */ 623 623 #define MMCR0_TBEE 0x00400000UL /* time base exception enable */ 624 + #define MMCR0_EBE 0x00100000UL /* Event based branch enable */ 625 + #define MMCR0_PMCC 0x000c0000UL /* PMC control */ 626 + #define MMCR0_PMCC_U6 0x00080000UL /* PMC1-6 are R/W by user (PR) */ 624 627 #define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/ 625 628 #define MMCR0_PMCjCE 0x00004000UL /* PMCj count enable*/ 626 629 #define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */ 627 630 #define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */ 628 631 #define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */ 632 + #define MMCR0_FC56 0x00000010UL /* freeze counters 5 and 6 */ 629 633 #define MMCR0_FCTI 0x00000008UL /* freeze counters in tags inactive mode */ 630 634 #define MMCR0_FCTA 0x00000004UL /* freeze counters in tags active mode */ 631 635 #define MMCR0_FCWAIT 0x00000002UL /* freeze counter in WAIT state */ ··· 676 672 #define SIER_SIHV 0x1000000 /* Sampled MSR_HV */ 677 673 #define SIER_SIAR_VALID 0x0400000 /* SIAR contents valid */ 678 674 #define SIER_SDAR_VALID 0x0200000 /* SDAR contents valid */ 675 + 676 + /* When EBB is enabled, some of MMCR0/MMCR2/SIER are user accessible */ 677 + #define MMCR0_USER_MASK (MMCR0_FC | MMCR0_PMXE | MMCR0_PMAO) 678 + #define MMCR2_USER_MASK 0x4020100804020000UL /* (FC1P|FC2P|FC3P|FC4P|FC5P|FC6P) */ 679 + #define SIER_USER_MASK 0x7fffffUL 679 680 680 681 #define SPRN_PA6T_MMCR0 795 681 682 #define PA6T_MMCR0_EN0 0x0000000000000001UL
+2 -2
arch/powerpc/include/asm/rtas.h
··· 350 350 (devfn << 8) | (reg & 0xff); 351 351 } 352 352 353 - extern void __cpuinit rtas_give_timebase(void); 354 - extern void __cpuinit rtas_take_timebase(void); 353 + extern void rtas_give_timebase(void); 354 + extern void rtas_take_timebase(void); 355 355 356 356 #ifdef CONFIG_PPC_RTAS 357 357 static inline int page_is_rtas_user_buf(unsigned long pfn)
+14
arch/powerpc/include/asm/switch_to.h
··· 67 67 } 68 68 #endif 69 69 70 + static inline void clear_task_ebb(struct task_struct *t) 71 + { 72 + #ifdef CONFIG_PPC_BOOK3S_64 73 + /* EBB perf events are not inherited, so clear all EBB state. */ 74 + t->thread.bescr = 0; 75 + t->thread.mmcr2 = 0; 76 + t->thread.mmcr0 = 0; 77 + t->thread.siar = 0; 78 + t->thread.sdar = 0; 79 + t->thread.sier = 0; 80 + t->thread.used_ebb = 0; 81 + #endif 82 + } 83 + 70 84 #endif /* _ASM_POWERPC_SWITCH_TO_H */
+2 -1
arch/powerpc/include/asm/tlbflush.h
··· 165 165 /* Private function for use by PCI IO mapping code */ 166 166 extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start, 167 167 unsigned long end); 168 - 168 + extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd, 169 + unsigned long addr); 169 170 #else 170 171 #error Unsupported MMU type 171 172 #endif
+1 -1
arch/powerpc/include/asm/vdso.h
··· 22 22 extern unsigned long vdso32_sigtramp; 23 23 extern unsigned long vdso32_rt_sigtramp; 24 24 25 - int __cpuinit vdso_getcpu_init(void); 25 + int vdso_getcpu_init(void); 26 26 27 27 #else /* __ASSEMBLY__ */ 28 28
+3 -1
arch/powerpc/kernel/Makefile
··· 58 58 obj-$(CONFIG_LPARCFG) += lparcfg.o 59 59 obj-$(CONFIG_IBMVIO) += vio.o 60 60 obj-$(CONFIG_IBMEBUS) += ibmebus.o 61 + obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \ 62 + eeh_driver.o eeh_event.o eeh_sysfs.o 61 63 obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o 62 64 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o 63 65 obj-$(CONFIG_FA_DUMP) += fadump.o ··· 102 100 obj-$(CONFIG_STACKTRACE) += stacktrace.o 103 101 obj-$(CONFIG_SWIOTLB) += dma-swiotlb.o 104 102 105 - pci64-$(CONFIG_PPC64) += pci_dn.o isa-bridge.o 103 + pci64-$(CONFIG_PPC64) += pci_dn.o pci-hotplug.o isa-bridge.o 106 104 obj-$(CONFIG_PCI) += pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \ 107 105 pci-common.o pci_of_scan.o 108 106 obj-$(CONFIG_PCI_MSI) += msi.o
+3 -4
arch/powerpc/kernel/asm-offsets.c
··· 105 105 DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid)); 106 106 #else /* CONFIG_PPC64 */ 107 107 DEFINE(PGDIR, offsetof(struct thread_struct, pgdir)); 108 - #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) 109 - DEFINE(THREAD_DBCR0, offsetof(struct thread_struct, dbcr0)); 110 - #endif 111 108 #ifdef CONFIG_SPE 112 109 DEFINE(THREAD_EVR0, offsetof(struct thread_struct, evr[0])); 113 110 DEFINE(THREAD_ACC, offsetof(struct thread_struct, acc)); ··· 112 115 DEFINE(THREAD_USED_SPE, offsetof(struct thread_struct, used_spe)); 113 116 #endif /* CONFIG_SPE */ 114 117 #endif /* CONFIG_PPC64 */ 118 + #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) 119 + DEFINE(THREAD_DBCR0, offsetof(struct thread_struct, dbcr0)); 120 + #endif 115 121 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER 116 122 DEFINE(THREAD_KVM_SVCPU, offsetof(struct thread_struct, kvm_shadow_vcpu)); 117 123 #endif ··· 132 132 DEFINE(THREAD_SIER, offsetof(struct thread_struct, sier)); 133 133 DEFINE(THREAD_MMCR0, offsetof(struct thread_struct, mmcr0)); 134 134 DEFINE(THREAD_MMCR2, offsetof(struct thread_struct, mmcr2)); 135 - DEFINE(THREAD_MMCRA, offsetof(struct thread_struct, mmcra)); 136 135 #endif 137 136 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 138 137 DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch));
+21 -15
arch/powerpc/kernel/cacheinfo.c
··· 131 131 return cache_type_info[cache->type].name; 132 132 } 133 133 134 - static void __cpuinit cache_init(struct cache *cache, int type, int level, struct device_node *ofnode) 134 + static void cache_init(struct cache *cache, int type, int level, 135 + struct device_node *ofnode) 135 136 { 136 137 cache->type = type; 137 138 cache->level = level; ··· 141 140 list_add(&cache->list, &cache_list); 142 141 } 143 142 144 - static struct cache *__cpuinit new_cache(int type, int level, struct device_node *ofnode) 143 + static struct cache *new_cache(int type, int level, struct device_node *ofnode) 145 144 { 146 145 struct cache *cache; 147 146 ··· 325 324 return of_get_property(np, "cache-unified", NULL); 326 325 } 327 326 328 - static struct cache *__cpuinit cache_do_one_devnode_unified(struct device_node *node, int level) 327 + static struct cache *cache_do_one_devnode_unified(struct device_node *node, 328 + int level) 329 329 { 330 330 struct cache *cache; 331 331 ··· 337 335 return cache; 338 336 } 339 337 340 - static struct cache *__cpuinit cache_do_one_devnode_split(struct device_node *node, int level) 338 + static struct cache *cache_do_one_devnode_split(struct device_node *node, 339 + int level) 341 340 { 342 341 struct cache *dcache, *icache; 343 342 ··· 360 357 return NULL; 361 358 } 362 359 363 - static struct cache *__cpuinit cache_do_one_devnode(struct device_node *node, int level) 360 + static struct cache *cache_do_one_devnode(struct device_node *node, int level) 364 361 { 365 362 struct cache *cache; 366 363 ··· 372 369 return cache; 373 370 } 374 371 375 - static struct cache *__cpuinit cache_lookup_or_instantiate(struct device_node *node, int level) 372 + static struct cache *cache_lookup_or_instantiate(struct device_node *node, 373 + int level) 376 374 { 377 375 struct cache *cache; 378 376 ··· 389 385 return cache; 390 386 } 391 387 392 - static void __cpuinit link_cache_lists(struct cache *smaller, struct cache *bigger) 388 + static void 
link_cache_lists(struct cache *smaller, struct cache *bigger) 393 389 { 394 390 while (smaller->next_local) { 395 391 if (smaller->next_local == bigger) ··· 400 396 smaller->next_local = bigger; 401 397 } 402 398 403 - static void __cpuinit do_subsidiary_caches_debugcheck(struct cache *cache) 399 + static void do_subsidiary_caches_debugcheck(struct cache *cache) 404 400 { 405 401 WARN_ON_ONCE(cache->level != 1); 406 402 WARN_ON_ONCE(strcmp(cache->ofnode->type, "cpu")); 407 403 } 408 404 409 - static void __cpuinit do_subsidiary_caches(struct cache *cache) 405 + static void do_subsidiary_caches(struct cache *cache) 410 406 { 411 407 struct device_node *subcache_node; 412 408 int level = cache->level; ··· 427 423 } 428 424 } 429 425 430 - static struct cache *__cpuinit cache_chain_instantiate(unsigned int cpu_id) 426 + static struct cache *cache_chain_instantiate(unsigned int cpu_id) 431 427 { 432 428 struct device_node *cpu_node; 433 429 struct cache *cpu_cache = NULL; ··· 452 448 return cpu_cache; 453 449 } 454 450 455 - static struct cache_dir *__cpuinit cacheinfo_create_cache_dir(unsigned int cpu_id) 451 + static struct cache_dir *cacheinfo_create_cache_dir(unsigned int cpu_id) 456 452 { 457 453 struct cache_dir *cache_dir; 458 454 struct device *dev; ··· 657 653 .default_attrs = cache_index_default_attrs, 658 654 }; 659 655 660 - static void __cpuinit cacheinfo_create_index_opt_attrs(struct cache_index_dir *dir) 656 + static void cacheinfo_create_index_opt_attrs(struct cache_index_dir *dir) 661 657 { 662 658 const char *cache_name; 663 659 const char *cache_type; ··· 700 696 kfree(buf); 701 697 } 702 698 703 - static void __cpuinit cacheinfo_create_index_dir(struct cache *cache, int index, struct cache_dir *cache_dir) 699 + static void cacheinfo_create_index_dir(struct cache *cache, int index, 700 + struct cache_dir *cache_dir) 704 701 { 705 702 struct cache_index_dir *index_dir; 706 703 int rc; ··· 727 722 kfree(index_dir); 728 723 } 729 724 730 - static void 
__cpuinit cacheinfo_sysfs_populate(unsigned int cpu_id, struct cache *cache_list) 725 + static void cacheinfo_sysfs_populate(unsigned int cpu_id, 726 + struct cache *cache_list) 731 727 { 732 728 struct cache_dir *cache_dir; 733 729 struct cache *cache; ··· 746 740 } 747 741 } 748 742 749 - void __cpuinit cacheinfo_cpu_online(unsigned int cpu_id) 743 + void cacheinfo_cpu_online(unsigned int cpu_id) 750 744 { 751 745 struct cache *cache; 752 746
+26 -4
arch/powerpc/kernel/entry_64.S
··· 629 629 630 630 CURRENT_THREAD_INFO(r9, r1) 631 631 ld r3,_MSR(r1) 632 + #ifdef CONFIG_PPC_BOOK3E 633 + ld r10,PACACURRENT(r13) 634 + #endif /* CONFIG_PPC_BOOK3E */ 632 635 ld r4,TI_FLAGS(r9) 633 636 andi. r3,r3,MSR_PR 634 637 beq resume_kernel 638 + #ifdef CONFIG_PPC_BOOK3E 639 + lwz r3,(THREAD+THREAD_DBCR0)(r10) 640 + #endif /* CONFIG_PPC_BOOK3E */ 635 641 636 642 /* Check current_thread_info()->flags */ 637 643 andi. r0,r4,_TIF_USER_WORK_MASK 644 + #ifdef CONFIG_PPC_BOOK3E 645 + bne 1f 646 + /* 647 + * Check to see if the dbcr0 register is set up to debug. 648 + * Use the internal debug mode bit to do this. 649 + */ 650 + andis. r0,r3,DBCR0_IDM@h 638 651 beq restore 639 - 640 - andi. r0,r4,_TIF_NEED_RESCHED 641 - beq 1f 652 + mfmsr r0 653 + rlwinm r0,r0,0,~MSR_DE /* Clear MSR.DE */ 654 + mtmsr r0 655 + mtspr SPRN_DBCR0,r3 656 + li r10, -1 657 + mtspr SPRN_DBSR,r10 658 + b restore 659 + #else 660 + beq restore 661 + #endif 662 + 1: andi. r0,r4,_TIF_NEED_RESCHED 663 + beq 2f 642 664 bl .restore_interrupts 643 665 SCHEDULE_USER 644 666 b .ret_from_except_lite 645 667 646 - 1: bl .save_nvgprs 668 + 2: bl .save_nvgprs 647 669 bl .restore_interrupts 648 670 addi r3,r1,STACK_FRAME_OVERHEAD 649 671 bl .do_notify_resume
+25 -31
arch/powerpc/kernel/exceptions-64s.S
··· 341 341 EXCEPTION_PROLOG_0(PACA_EXGEN) 342 342 b vsx_unavailable_pSeries 343 343 344 + facility_unavailable_trampoline: 344 345 . = 0xf60 345 346 SET_SCRATCH0(r13) 346 347 EXCEPTION_PROLOG_0(PACA_EXGEN) 347 - b tm_unavailable_pSeries 348 + b facility_unavailable_pSeries 349 + 350 + hv_facility_unavailable_trampoline: 351 + . = 0xf80 352 + SET_SCRATCH0(r13) 353 + EXCEPTION_PROLOG_0(PACA_EXGEN) 354 + b facility_unavailable_hv 348 355 349 356 #ifdef CONFIG_CBE_RAS 350 357 STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error) ··· 529 522 KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf20) 530 523 STD_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable) 531 524 KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40) 532 - STD_EXCEPTION_PSERIES_OOL(0xf60, tm_unavailable) 525 + STD_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable) 533 526 KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf60) 527 + STD_EXCEPTION_HV_OOL(0xf82, facility_unavailable) 528 + KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xf82) 534 529 535 530 /* 536 531 * An interrupt came in while soft-disabled. We set paca->irq_happened, then: ··· 802 793 STD_RELON_EXCEPTION_PSERIES(0x4d00, 0xd00, single_step) 803 794 804 795 . = 0x4e00 805 - SET_SCRATCH0(r13) 806 - EXCEPTION_PROLOG_0(PACA_EXGEN) 807 - b h_data_storage_relon_hv 796 + b . /* Can't happen, see v2.07 Book III-S section 6.5 */ 808 797 809 798 . = 0x4e20 810 - SET_SCRATCH0(r13) 811 - EXCEPTION_PROLOG_0(PACA_EXGEN) 812 - b h_instr_storage_relon_hv 799 + b . /* Can't happen, see v2.07 Book III-S section 6.5 */ 813 800 814 801 . = 0x4e40 815 802 SET_SCRATCH0(r13) ··· 813 808 b emulation_assist_relon_hv 814 809 815 810 . = 0x4e60 816 - SET_SCRATCH0(r13) 817 - EXCEPTION_PROLOG_0(PACA_EXGEN) 818 - b hmi_exception_relon_hv 811 + b . /* Can't happen, see v2.07 Book III-S section 6.5 */ 819 812 820 813 . 
= 0x4e80 821 814 SET_SCRATCH0(r13) ··· 838 835 EXCEPTION_PROLOG_0(PACA_EXGEN) 839 836 b vsx_unavailable_relon_pSeries 840 837 841 - tm_unavailable_relon_pSeries_1: 838 + facility_unavailable_relon_trampoline: 842 839 . = 0x4f60 843 840 SET_SCRATCH0(r13) 844 841 EXCEPTION_PROLOG_0(PACA_EXGEN) 845 - b tm_unavailable_relon_pSeries 842 + b facility_unavailable_relon_pSeries 843 + 844 + hv_facility_unavailable_relon_trampoline: 845 + . = 0x4f80 846 + SET_SCRATCH0(r13) 847 + EXCEPTION_PROLOG_0(PACA_EXGEN) 848 + b facility_unavailable_relon_hv 846 849 847 850 STD_RELON_EXCEPTION_PSERIES(0x5300, 0x1300, instruction_breakpoint) 848 851 #ifdef CONFIG_PPC_DENORMALISATION ··· 1174 1165 bl .vsx_unavailable_exception 1175 1166 b .ret_from_except 1176 1167 1177 - .align 7 1178 - .globl tm_unavailable_common 1179 - tm_unavailable_common: 1180 - EXCEPTION_PROLOG_COMMON(0xf60, PACA_EXGEN) 1181 - bl .save_nvgprs 1182 - DISABLE_INTS 1183 - addi r3,r1,STACK_FRAME_OVERHEAD 1184 - bl .tm_unavailable_exception 1185 - b .ret_from_except 1168 + STD_EXCEPTION_COMMON(0xf60, facility_unavailable, .facility_unavailable_exception) 1186 1169 1187 1170 .align 7 1188 1171 .globl __end_handlers 1189 1172 __end_handlers: 1190 1173 1191 1174 /* Equivalents to the above handlers for relocation-on interrupt vectors */ 1192 - STD_RELON_EXCEPTION_HV_OOL(0xe00, h_data_storage) 1193 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe00) 1194 - STD_RELON_EXCEPTION_HV_OOL(0xe20, h_instr_storage) 1195 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe20) 1196 1175 STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist) 1197 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe40) 1198 - STD_RELON_EXCEPTION_HV_OOL(0xe60, hmi_exception) 1199 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe60) 1200 1176 MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell) 1201 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe80) 1202 1177 1203 1178 STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor) 1204 1179 STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable) 1205 1180 
STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable) 1206 - STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, tm_unavailable) 1181 + STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable) 1182 + STD_RELON_EXCEPTION_HV_OOL(0xf80, facility_unavailable) 1207 1183 1208 1184 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) 1209 1185 /*
+2 -1
arch/powerpc/kernel/hw_breakpoint.c
··· 176 176 length_max = 512 ; /* 64 doublewords */ 177 177 /* DAWR region can't cross 512 boundary */ 178 178 if ((bp->attr.bp_addr >> 10) != 179 - ((bp->attr.bp_addr + bp->attr.bp_len) >> 10)) 179 + ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10)) 180 180 return -EINVAL; 181 181 } 182 182 if (info->len > ··· 250 250 * we still need to single-step the instruction, but we don't 251 251 * generate an event. 252 252 */ 253 + info->type &= ~HW_BRK_TYPE_EXTRANEOUS_IRQ; 253 254 if (!((bp->attr.bp_addr <= dar) && 254 255 (dar - bp->attr.bp_addr < bp->attr.bp_len))) 255 256 info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+2 -2
arch/powerpc/kernel/idle.c
··· 85 85 /* 86 86 * Register the sysctl to set/clear powersave_nap. 87 87 */ 88 - static ctl_table powersave_nap_ctl_table[]={ 88 + static struct ctl_table powersave_nap_ctl_table[] = { 89 89 { 90 90 .procname = "powersave-nap", 91 91 .data = &powersave_nap, ··· 95 95 }, 96 96 {} 97 97 }; 98 - static ctl_table powersave_nap_sysctl_root[] = { 98 + static struct ctl_table powersave_nap_sysctl_root[] = { 99 99 { 100 100 .procname = "kernel", 101 101 .mode = 0555,
+9 -2
arch/powerpc/kernel/io-workarounds.c
··· 55 55 56 56 struct iowa_bus *iowa_mem_find_bus(const PCI_IO_ADDR addr) 57 57 { 58 + unsigned hugepage_shift; 58 59 struct iowa_bus *bus; 59 60 int token; 60 61 ··· 71 70 if (vaddr < PHB_IO_BASE || vaddr >= PHB_IO_END) 72 71 return NULL; 73 72 74 - ptep = find_linux_pte(init_mm.pgd, vaddr); 73 + ptep = find_linux_pte_or_hugepte(init_mm.pgd, vaddr, 74 + &hugepage_shift); 75 75 if (ptep == NULL) 76 76 paddr = 0; 77 - else 77 + else { 78 + /* 79 + * we don't have hugepages backing iomem 80 + */ 81 + WARN_ON(hugepage_shift); 78 82 paddr = pte_pfn(*ptep) << PAGE_SHIFT; 83 + } 79 84 bus = iowa_pci_find(vaddr, paddr); 80 85 81 86 if (bus == NULL)
+323
arch/powerpc/kernel/iommu.c
··· 36 36 #include <linux/hash.h> 37 37 #include <linux/fault-inject.h> 38 38 #include <linux/pci.h> 39 + #include <linux/iommu.h> 40 + #include <linux/sched.h> 39 41 #include <asm/io.h> 40 42 #include <asm/prom.h> 41 43 #include <asm/iommu.h> ··· 46 44 #include <asm/kdump.h> 47 45 #include <asm/fadump.h> 48 46 #include <asm/vio.h> 47 + #include <asm/tce.h> 49 48 50 49 #define DBG(...) 51 50 ··· 727 724 if (tbl->it_offset == 0) 728 725 clear_bit(0, tbl->it_map); 729 726 727 + #ifdef CONFIG_IOMMU_API 728 + if (tbl->it_group) { 729 + iommu_group_put(tbl->it_group); 730 + BUG_ON(tbl->it_group); 731 + } 732 + #endif 733 + 730 734 /* verify that table contains no entries */ 731 735 if (!bitmap_empty(tbl->it_map, tbl->it_size)) 732 736 pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name); ··· 870 860 free_pages((unsigned long)vaddr, get_order(size)); 871 861 } 872 862 } 863 + 864 + #ifdef CONFIG_IOMMU_API 865 + /* 866 + * SPAPR TCE API 867 + */ 868 + static void group_release(void *iommu_data) 869 + { 870 + struct iommu_table *tbl = iommu_data; 871 + tbl->it_group = NULL; 872 + } 873 + 874 + void iommu_register_group(struct iommu_table *tbl, 875 + int pci_domain_number, unsigned long pe_num) 876 + { 877 + struct iommu_group *grp; 878 + char *name; 879 + 880 + grp = iommu_group_alloc(); 881 + if (IS_ERR(grp)) { 882 + pr_warn("powerpc iommu api: cannot create new group, err=%ld\n", 883 + PTR_ERR(grp)); 884 + return; 885 + } 886 + tbl->it_group = grp; 887 + iommu_group_set_iommudata(grp, tbl, group_release); 888 + name = kasprintf(GFP_KERNEL, "domain%d-pe%lx", 889 + pci_domain_number, pe_num); 890 + if (!name) 891 + return; 892 + iommu_group_set_name(grp, name); 893 + kfree(name); 894 + } 895 + 896 + enum dma_data_direction iommu_tce_direction(unsigned long tce) 897 + { 898 + if ((tce & TCE_PCI_READ) && (tce & TCE_PCI_WRITE)) 899 + return DMA_BIDIRECTIONAL; 900 + else if (tce & TCE_PCI_READ) 901 + return DMA_TO_DEVICE; 902 + else if (tce & TCE_PCI_WRITE) 903 + 
return DMA_FROM_DEVICE; 904 + else 905 + return DMA_NONE; 906 + } 907 + EXPORT_SYMBOL_GPL(iommu_tce_direction); 908 + 909 + void iommu_flush_tce(struct iommu_table *tbl) 910 + { 911 + /* Flush/invalidate TLB caches if necessary */ 912 + if (ppc_md.tce_flush) 913 + ppc_md.tce_flush(tbl); 914 + 915 + /* Make sure updates are seen by hardware */ 916 + mb(); 917 + } 918 + EXPORT_SYMBOL_GPL(iommu_flush_tce); 919 + 920 + int iommu_tce_clear_param_check(struct iommu_table *tbl, 921 + unsigned long ioba, unsigned long tce_value, 922 + unsigned long npages) 923 + { 924 + /* ppc_md.tce_free() does not support any value but 0 */ 925 + if (tce_value) 926 + return -EINVAL; 927 + 928 + if (ioba & ~IOMMU_PAGE_MASK) 929 + return -EINVAL; 930 + 931 + ioba >>= IOMMU_PAGE_SHIFT; 932 + if (ioba < tbl->it_offset) 933 + return -EINVAL; 934 + 935 + if ((ioba + npages) > (tbl->it_offset + tbl->it_size)) 936 + return -EINVAL; 937 + 938 + return 0; 939 + } 940 + EXPORT_SYMBOL_GPL(iommu_tce_clear_param_check); 941 + 942 + int iommu_tce_put_param_check(struct iommu_table *tbl, 943 + unsigned long ioba, unsigned long tce) 944 + { 945 + if (!(tce & (TCE_PCI_WRITE | TCE_PCI_READ))) 946 + return -EINVAL; 947 + 948 + if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ)) 949 + return -EINVAL; 950 + 951 + if (ioba & ~IOMMU_PAGE_MASK) 952 + return -EINVAL; 953 + 954 + ioba >>= IOMMU_PAGE_SHIFT; 955 + if (ioba < tbl->it_offset) 956 + return -EINVAL; 957 + 958 + if ((ioba + 1) > (tbl->it_offset + tbl->it_size)) 959 + return -EINVAL; 960 + 961 + return 0; 962 + } 963 + EXPORT_SYMBOL_GPL(iommu_tce_put_param_check); 964 + 965 + unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry) 966 + { 967 + unsigned long oldtce; 968 + struct iommu_pool *pool = get_pool(tbl, entry); 969 + 970 + spin_lock(&(pool->lock)); 971 + 972 + oldtce = ppc_md.tce_get(tbl, entry); 973 + if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)) 974 + ppc_md.tce_free(tbl, entry, 1); 975 + else 976 + oldtce = 0; 977 + 
978 + spin_unlock(&(pool->lock)); 979 + 980 + return oldtce; 981 + } 982 + EXPORT_SYMBOL_GPL(iommu_clear_tce); 983 + 984 + int iommu_clear_tces_and_put_pages(struct iommu_table *tbl, 985 + unsigned long entry, unsigned long pages) 986 + { 987 + unsigned long oldtce; 988 + struct page *page; 989 + 990 + for ( ; pages; --pages, ++entry) { 991 + oldtce = iommu_clear_tce(tbl, entry); 992 + if (!oldtce) 993 + continue; 994 + 995 + page = pfn_to_page(oldtce >> PAGE_SHIFT); 996 + WARN_ON(!page); 997 + if (page) { 998 + if (oldtce & TCE_PCI_WRITE) 999 + SetPageDirty(page); 1000 + put_page(page); 1001 + } 1002 + } 1003 + 1004 + return 0; 1005 + } 1006 + EXPORT_SYMBOL_GPL(iommu_clear_tces_and_put_pages); 1007 + 1008 + /* 1009 + * hwaddr is a kernel virtual address here (0xc... bazillion), 1010 + * tce_build converts it to a physical address. 1011 + */ 1012 + int iommu_tce_build(struct iommu_table *tbl, unsigned long entry, 1013 + unsigned long hwaddr, enum dma_data_direction direction) 1014 + { 1015 + int ret = -EBUSY; 1016 + unsigned long oldtce; 1017 + struct iommu_pool *pool = get_pool(tbl, entry); 1018 + 1019 + spin_lock(&(pool->lock)); 1020 + 1021 + oldtce = ppc_md.tce_get(tbl, entry); 1022 + /* Add new entry if it is not busy */ 1023 + if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))) 1024 + ret = ppc_md.tce_build(tbl, entry, 1, hwaddr, direction, NULL); 1025 + 1026 + spin_unlock(&(pool->lock)); 1027 + 1028 + /* if (unlikely(ret)) 1029 + pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n", 1030 + __func__, hwaddr, entry << IOMMU_PAGE_SHIFT, 1031 + hwaddr, ret); */ 1032 + 1033 + return ret; 1034 + } 1035 + EXPORT_SYMBOL_GPL(iommu_tce_build); 1036 + 1037 + int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry, 1038 + unsigned long tce) 1039 + { 1040 + int ret; 1041 + struct page *page = NULL; 1042 + unsigned long hwaddr, offset = tce & IOMMU_PAGE_MASK & ~PAGE_MASK; 1043 + enum dma_data_direction direction = iommu_tce_direction(tce); 
1044 + 1045 + ret = get_user_pages_fast(tce & PAGE_MASK, 1, 1046 + direction != DMA_TO_DEVICE, &page); 1047 + if (unlikely(ret != 1)) { 1048 + /* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx ioba=%lx ret=%d\n", 1049 + tce, entry << IOMMU_PAGE_SHIFT, ret); */ 1050 + return -EFAULT; 1051 + } 1052 + hwaddr = (unsigned long) page_address(page) + offset; 1053 + 1054 + ret = iommu_tce_build(tbl, entry, hwaddr, direction); 1055 + if (ret) 1056 + put_page(page); 1057 + 1058 + if (ret < 0) 1059 + pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%d\n", 1060 + __func__, entry << IOMMU_PAGE_SHIFT, tce, ret); 1061 + 1062 + return ret; 1063 + } 1064 + EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode); 1065 + 1066 + int iommu_take_ownership(struct iommu_table *tbl) 1067 + { 1068 + unsigned long sz = (tbl->it_size + 7) >> 3; 1069 + 1070 + if (tbl->it_offset == 0) 1071 + clear_bit(0, tbl->it_map); 1072 + 1073 + if (!bitmap_empty(tbl->it_map, tbl->it_size)) { 1074 + pr_err("iommu_tce: it_map is not empty"); 1075 + return -EBUSY; 1076 + } 1077 + 1078 + memset(tbl->it_map, 0xff, sz); 1079 + iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size); 1080 + 1081 + return 0; 1082 + } 1083 + EXPORT_SYMBOL_GPL(iommu_take_ownership); 1084 + 1085 + void iommu_release_ownership(struct iommu_table *tbl) 1086 + { 1087 + unsigned long sz = (tbl->it_size + 7) >> 3; 1088 + 1089 + iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size); 1090 + memset(tbl->it_map, 0, sz); 1091 + 1092 + /* Restore bit#0 set by iommu_init_table() */ 1093 + if (tbl->it_offset == 0) 1094 + set_bit(0, tbl->it_map); 1095 + } 1096 + EXPORT_SYMBOL_GPL(iommu_release_ownership); 1097 + 1098 + static int iommu_add_device(struct device *dev) 1099 + { 1100 + struct iommu_table *tbl; 1101 + int ret = 0; 1102 + 1103 + if (WARN_ON(dev->iommu_group)) { 1104 + pr_warn("iommu_tce: device %s is already in iommu group %d, skipping\n", 1105 + dev_name(dev), 1106 + iommu_group_id(dev->iommu_group)); 1107 + return 
-EBUSY; 1108 + } 1109 + 1110 + tbl = get_iommu_table_base(dev); 1111 + if (!tbl || !tbl->it_group) { 1112 + pr_debug("iommu_tce: skipping device %s with no tbl\n", 1113 + dev_name(dev)); 1114 + return 0; 1115 + } 1116 + 1117 + pr_debug("iommu_tce: adding %s to iommu group %d\n", 1118 + dev_name(dev), iommu_group_id(tbl->it_group)); 1119 + 1120 + ret = iommu_group_add_device(tbl->it_group, dev); 1121 + if (ret < 0) 1122 + pr_err("iommu_tce: %s has not been added, ret=%d\n", 1123 + dev_name(dev), ret); 1124 + 1125 + return ret; 1126 + } 1127 + 1128 + static void iommu_del_device(struct device *dev) 1129 + { 1130 + iommu_group_remove_device(dev); 1131 + } 1132 + 1133 + static int iommu_bus_notifier(struct notifier_block *nb, 1134 + unsigned long action, void *data) 1135 + { 1136 + struct device *dev = data; 1137 + 1138 + switch (action) { 1139 + case BUS_NOTIFY_ADD_DEVICE: 1140 + return iommu_add_device(dev); 1141 + case BUS_NOTIFY_DEL_DEVICE: 1142 + iommu_del_device(dev); 1143 + return 0; 1144 + default: 1145 + return 0; 1146 + } 1147 + } 1148 + 1149 + static struct notifier_block tce_iommu_bus_nb = { 1150 + .notifier_call = iommu_bus_notifier, 1151 + }; 1152 + 1153 + static int __init tce_iommu_init(void) 1154 + { 1155 + struct pci_dev *pdev = NULL; 1156 + 1157 + BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE); 1158 + 1159 + for_each_pci_dev(pdev) 1160 + iommu_add_device(&pdev->dev); 1161 + 1162 + bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb); 1163 + return 0; 1164 + } 1165 + 1166 + subsys_initcall_sync(tce_iommu_init); 1167 + 1168 + #else 1169 + 1170 + void iommu_register_group(struct iommu_table *tbl, 1171 + int pci_domain_number, unsigned long pe_num) 1172 + { 1173 + } 1174 + 1175 + #endif /* CONFIG_IOMMU_API */
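The new iommu_tce_clear_param_check()/iommu_tce_put_param_check() helpers above both reduce to the same window check: the bus address must be IOMMU-page aligned, and every page in the range must sit inside [it_offset, it_offset + it_size). A minimal user-space sketch of that check — the 4K page shift and the struct are illustrative stand-ins, not the kernel's definitions:

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's IOMMU page geometry. */
#define IOMMU_PAGE_SHIFT 12
#define IOMMU_PAGE_MASK (~((1UL << IOMMU_PAGE_SHIFT) - 1))

struct iommu_window {
    unsigned long it_offset; /* first page index of the window */
    unsigned long it_size;   /* window size in IOMMU pages */
};

/* Mirrors the range validation in iommu_tce_clear_param_check():
 * the bus address must be page-aligned and the whole page range
 * [ioba, ioba + npages) must fall inside the table's window. */
int check_ioba_range(const struct iommu_window *w,
                     unsigned long ioba, unsigned long npages)
{
    if (ioba & ~IOMMU_PAGE_MASK)
        return -EINVAL;          /* not IOMMU-page aligned */

    ioba >>= IOMMU_PAGE_SHIFT;
    if (ioba < w->it_offset)
        return -EINVAL;          /* starts below the window */

    if (ioba + npages > w->it_offset + w->it_size)
        return -EINVAL;          /* runs past the window */

    return 0;
}
```

The put variant is the same check with npages fixed at 1, plus the extra test that the TCE carries at least one of the read/write permission bits.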
-2
arch/powerpc/kernel/irq.c
··· 116 116 u64 now = get_tb_or_rtc(); 117 117 u64 *next_tb = &__get_cpu_var(decrementers_next_tb); 118 118 119 - if (now >= *next_tb) 120 - set_dec(1); 121 119 return now >= *next_tb; 122 120 } 123 121
+1 -19
arch/powerpc/kernel/kprobes.c
··· 36 36 #include <asm/sstep.h> 37 37 #include <asm/uaccess.h> 38 38 39 - #ifdef CONFIG_PPC_ADV_DEBUG_REGS 40 - #define MSR_SINGLESTEP (MSR_DE) 41 - #else 42 - #define MSR_SINGLESTEP (MSR_SE) 43 - #endif 44 - 45 39 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; 46 40 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); 47 41 ··· 98 104 99 105 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs) 100 106 { 101 - /* We turn off async exceptions to ensure that the single step will 102 - * be for the instruction we have the kprobe on, if we dont its 103 - * possible we'd get the single step reported for an exception handler 104 - * like Decrementer or External Interrupt */ 105 - regs->msr &= ~MSR_EE; 106 - regs->msr |= MSR_SINGLESTEP; 107 - #ifdef CONFIG_PPC_ADV_DEBUG_REGS 108 - regs->msr &= ~MSR_CE; 109 - mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM); 110 - #ifdef CONFIG_PPC_47x 111 - isync(); 112 - #endif 113 - #endif 107 + enable_single_step(regs); 114 108 115 109 /* 116 110 * On powerpc we should single step on the original
+14 -6
arch/powerpc/kernel/nvram_64.c
··· 84 84 char *tmp = NULL; 85 85 ssize_t size; 86 86 87 - ret = -ENODEV; 88 - if (!ppc_md.nvram_size) 87 + if (!ppc_md.nvram_size) { 88 + ret = -ENODEV; 89 89 goto out; 90 + } 90 91 91 - ret = 0; 92 92 size = ppc_md.nvram_size(); 93 - if (*ppos >= size || size < 0) 93 + if (size < 0) { 94 + ret = size; 94 95 goto out; 96 + } 97 + 98 + if (*ppos >= size) { 99 + ret = 0; 100 + goto out; 101 + } 95 102 96 103 count = min_t(size_t, count, size - *ppos); 97 104 count = min(count, PAGE_SIZE); 98 105 99 - ret = -ENOMEM; 100 106 tmp = kmalloc(count, GFP_KERNEL); 101 - if (!tmp) 107 + if (!tmp) { 108 + ret = -ENOMEM; 102 109 goto out; 110 + } 103 111 104 112 ret = ppc_md.nvram_read(tmp, count, ppos); 105 113 if (ret <= 0)
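The reworked dev_nvram_read() error paths above keep the existing double clamp: a request is limited both to what remains of the device past *ppos and to one page per call. A user-space sketch of just that clamping (the page-size constant is an assumption standing in for PAGE_SIZE; the caller is presumed to have already rejected pos >= dev_size, as the kernel code does):

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Mirrors the clamping in dev_nvram_read(): never read past the end
 * of the device, and never hand back more than one page per call. */
size_t clamp_read(size_t count, long long pos, long long dev_size)
{
    size_t remaining = (size_t)(dev_size - pos); /* pos < dev_size assumed */

    if (count > remaining)
        count = remaining;
    if (count > SKETCH_PAGE_SIZE)
        count = SKETCH_PAGE_SIZE;
    return count;
}
```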
+111
arch/powerpc/kernel/pci-hotplug.c
··· 1 + /* 2 + * Derived from "arch/powerpc/platforms/pseries/pci_dlpar.c" 3 + * 4 + * Copyright (C) 2003 Linda Xie <lxie@us.ibm.com> 5 + * Copyright (C) 2005 International Business Machines 6 + * 7 + * Updates, 2005, John Rose <johnrose@austin.ibm.com> 8 + * Updates, 2005, Linas Vepstas <linas@austin.ibm.com> 9 + * Updates, 2013, Gavin Shan <shangw@linux.vnet.ibm.com> 10 + * 11 + * This program is free software; you can redistribute it and/or modify 12 + * it under the terms of the GNU General Public License as published by 13 + * the Free Software Foundation; either version 2 of the License, or 14 + * (at your option) any later version. 15 + */ 16 + 17 + #include <linux/pci.h> 18 + #include <linux/export.h> 19 + #include <asm/pci-bridge.h> 20 + #include <asm/ppc-pci.h> 21 + #include <asm/firmware.h> 22 + #include <asm/eeh.h> 23 + 24 + /** 25 + * __pcibios_remove_pci_devices - remove all devices under this bus 26 + * @bus: the indicated PCI bus 27 + * @purge_pe: destroy the PE on removal of PCI devices 28 + * 29 + * Remove all of the PCI devices under this bus both from the 30 + * linux pci device tree, and from the powerpc EEH address cache. 31 + * By default, the corresponding PE will be destroyed during the 32 + * normal PCI hotplug path. For PCI hotplug during EEH recovery, 33 + * the corresponding PE won't be destroyed and deallocated. 
34 + */ 35 + void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe) 36 + { 37 + struct pci_dev *dev, *tmp; 38 + struct pci_bus *child_bus; 39 + 40 + /* First go down child busses */ 41 + list_for_each_entry(child_bus, &bus->children, node) 42 + __pcibios_remove_pci_devices(child_bus, purge_pe); 43 + 44 + pr_debug("PCI: Removing devices on bus %04x:%02x\n", 45 + pci_domain_nr(bus), bus->number); 46 + list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { 47 + pr_debug(" * Removing %s...\n", pci_name(dev)); 48 + eeh_remove_bus_device(dev, purge_pe); 49 + pci_stop_and_remove_bus_device(dev); 50 + } 51 + } 52 + 53 + /** 54 + * pcibios_remove_pci_devices - remove all devices under this bus 55 + * @bus: the indicated PCI bus 56 + * 57 + * Remove all of the PCI devices under this bus both from the 58 + * linux pci device tree, and from the powerpc EEH address cache. 59 + */ 60 + void pcibios_remove_pci_devices(struct pci_bus *bus) 61 + { 62 + __pcibios_remove_pci_devices(bus, 1); 63 + } 64 + EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices); 65 + 66 + /** 67 + * pcibios_add_pci_devices - adds new pci devices to bus 68 + * @bus: the indicated PCI bus 69 + * 70 + * This routine will find and fixup new pci devices under 71 + * the indicated bus. This routine presumes that there 72 + * might already be some devices under this bridge, so 73 + * it carefully tries to add only new devices. (And that 74 + * is how this routine differs from other, similar pcibios 75 + * routines.) 
76 + */ 77 + void pcibios_add_pci_devices(struct pci_bus * bus) 78 + { 79 + int slotno, num, mode, pass, max; 80 + struct pci_dev *dev; 81 + struct device_node *dn = pci_bus_to_OF_node(bus); 82 + 83 + eeh_add_device_tree_early(dn); 84 + 85 + mode = PCI_PROBE_NORMAL; 86 + if (ppc_md.pci_probe_mode) 87 + mode = ppc_md.pci_probe_mode(bus); 88 + 89 + if (mode == PCI_PROBE_DEVTREE) { 90 + /* use ofdt-based probe */ 91 + of_rescan_bus(dn, bus); 92 + } else if (mode == PCI_PROBE_NORMAL) { 93 + /* use legacy probe */ 94 + slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); 95 + num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); 96 + if (!num) 97 + return; 98 + pcibios_setup_bus_devices(bus); 99 + max = bus->busn_res.start; 100 + for (pass = 0; pass < 2; pass++) { 101 + list_for_each_entry(dev, &bus->devices, bus_list) { 102 + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE || 103 + dev->hdr_type == PCI_HEADER_TYPE_CARDBUS) 104 + max = pci_scan_bridge(bus, dev, 105 + max, pass); 106 + } 107 + } 108 + } 109 + pcibios_finish_adding_to_bus(bus); 110 + } 111 + EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
+4
arch/powerpc/kernel/process.c
··· 916 916 flush_altivec_to_thread(src); 917 917 flush_vsx_to_thread(src); 918 918 flush_spe_to_thread(src); 919 + 919 920 *dst = *src; 921 + 922 + clear_task_ebb(dst); 923 + 920 924 return 0; 921 925 } 922 926
+40 -2
arch/powerpc/kernel/prom.c
··· 559 559 } 560 560 #endif 561 561 562 + static void __init early_reserve_mem_dt(void) 563 + { 564 + unsigned long i, len, dt_root; 565 + const __be32 *prop; 566 + 567 + dt_root = of_get_flat_dt_root(); 568 + 569 + prop = of_get_flat_dt_prop(dt_root, "reserved-ranges", &len); 570 + 571 + if (!prop) 572 + return; 573 + 574 + DBG("Found new-style reserved-ranges\n"); 575 + 576 + /* Each reserved range is an (address,size) pair, 2 cells each, 577 + * totalling 4 cells per range. */ 578 + for (i = 0; i < len / (sizeof(*prop) * 4); i++) { 579 + u64 base, size; 580 + 581 + base = of_read_number(prop + (i * 4) + 0, 2); 582 + size = of_read_number(prop + (i * 4) + 2, 2); 583 + 584 + if (size) { 585 + DBG("reserving: %llx -> %llx\n", base, size); 586 + memblock_reserve(base, size); 587 + } 588 + } 589 + } 590 + 562 591 static void __init early_reserve_mem(void) 563 592 { 564 593 u64 base, size; ··· 603 574 self_size = initial_boot_params->totalsize; 604 575 memblock_reserve(self_base, self_size); 605 576 577 + /* Look for the new "reserved-regions" property in the DT */ 578 + early_reserve_mem_dt(); 579 + 606 580 #ifdef CONFIG_BLK_DEV_INITRD 607 - /* then reserve the initrd, if any */ 608 - if (initrd_start && (initrd_end > initrd_start)) 581 + /* Then reserve the initrd, if any */ 582 + if (initrd_start && (initrd_end > initrd_start)) { 609 583 memblock_reserve(_ALIGN_DOWN(__pa(initrd_start), PAGE_SIZE), 610 584 _ALIGN_UP(initrd_end, PAGE_SIZE) - 611 585 _ALIGN_DOWN(initrd_start, PAGE_SIZE)); 586 + } 612 587 #endif /* CONFIG_BLK_DEV_INITRD */ 613 588 614 589 #ifdef CONFIG_PPC32 ··· 623 590 if (*reserve_map > 0xffffffffull) { 624 591 u32 base_32, size_32; 625 592 u32 *reserve_map_32 = (u32 *)reserve_map; 593 + 594 + DBG("Found old 32-bit reserve map\n"); 626 595 627 596 while (1) { 628 597 base_32 = *(reserve_map_32++); ··· 640 605 return; 641 606 } 642 607 #endif 608 + DBG("Processing reserve map\n"); 609 + 610 + /* Handle the reserve map in the fdt blob if it exists */ 
643 611 while (1) { 644 612 base = *(reserve_map++); 645 613 size = *(reserve_map++);
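early_reserve_mem_dt() above walks the flat device tree's "reserved-ranges" property as (address, size) pairs of two 32-bit cells each, four cells per range, skipping zero-size entries. A user-space sketch of that walk, assuming the cells have already been converted to host byte order (in the kernel, of_read_number() does the big-endian folding):

```c
#include <stdint.h>
#include <stddef.h>

/* of_read_number(prop, 2) folds two 32-bit cells into one u64:
 * (hi << 32) | lo. Host byte order is assumed here. */
static uint64_t read_cells2(const uint32_t *cells)
{
    return ((uint64_t)cells[0] << 32) | cells[1];
}

/* Mirrors early_reserve_mem_dt(): each reserved range is an
 * (address, size) pair, 2 cells each, 4 cells per range; ranges
 * with size 0 are ignored. Returns how many ranges were kept. */
size_t parse_reserved_ranges(const uint32_t *prop, size_t len_bytes,
                             uint64_t *base_out, uint64_t *size_out,
                             size_t max_ranges)
{
    size_t n = 0, nranges = len_bytes / (sizeof(uint32_t) * 4);

    for (size_t i = 0; i < nranges && n < max_ranges; i++) {
        uint64_t base = read_cells2(prop + i * 4 + 0);
        uint64_t size = read_cells2(prop + i * 4 + 2);

        if (size) {            /* zero-size entries are skipped */
            base_out[n] = base;
            size_out[n] = size;
            n++;
        }
    }
    return n;
}
```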
+3 -1
arch/powerpc/kernel/ptrace.c
··· 1449 1449 */ 1450 1450 if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE) { 1451 1451 len = bp_info->addr2 - bp_info->addr; 1452 - } else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) { 1452 + } else if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_EXACT) 1453 + len = 1; 1454 + else { 1453 1455 ptrace_put_breakpoints(child); 1454 1456 return -EINVAL; 1455 1457 }
+2 -1
arch/powerpc/kernel/reloc_32.S
··· 166 166 /* R_PPC_ADDR16_LO */ 167 167 lo16: 168 168 cmpwi r4, R_PPC_ADDR16_LO 169 - bne nxtrela 169 + bne unknown_type 170 170 lwz r4, 0(r9) /* r_offset */ 171 171 lwz r0, 8(r9) /* r_addend */ 172 172 add r0, r0, r3 ··· 191 191 dcbst r4,r7 192 192 sync /* Ensure the data is flushed before icbi */ 193 193 icbi r4,r7 194 + unknown_type: 194 195 cmpwi r8, 0 /* relasz = 0 ? */ 195 196 ble done 196 197 add r9, r9, r6 /* move to next entry in the .rela table */
+2 -2
arch/powerpc/kernel/rtas.c
··· 1172 1172 static arch_spinlock_t timebase_lock; 1173 1173 static u64 timebase = 0; 1174 1174 1175 - void __cpuinit rtas_give_timebase(void) 1175 + void rtas_give_timebase(void) 1176 1176 { 1177 1177 unsigned long flags; 1178 1178 ··· 1189 1189 local_irq_restore(flags); 1190 1190 } 1191 1191 1192 - void __cpuinit rtas_take_timebase(void) 1192 + void rtas_take_timebase(void) 1193 1193 { 1194 1194 while (!timebase) 1195 1195 barrier();
+1 -1
arch/powerpc/kernel/setup_64.c
··· 76 76 #endif 77 77 78 78 int boot_cpuid = 0; 79 - int __initdata spinning_secondaries; 79 + int spinning_secondaries; 80 80 u64 ppc64_pft_size; 81 81 82 82 /* Pick defaults since we might want to patch instructions
+53 -17
arch/powerpc/kernel/signal_32.c
··· 407 407 * altivec/spe instructions at some point. 408 408 */ 409 409 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame, 410 - int sigret, int ctx_has_vsx_region) 410 + struct mcontext __user *tm_frame, int sigret, 411 + int ctx_has_vsx_region) 411 412 { 412 413 unsigned long msr = regs->msr; 413 414 ··· 476 475 477 476 if (__put_user(msr, &frame->mc_gregs[PT_MSR])) 478 477 return 1; 478 + /* We need to write 0 the MSR top 32 bits in the tm frame so that we 479 + * can check it on the restore to see if TM is active 480 + */ 481 + if (tm_frame && __put_user(0, &tm_frame->mc_gregs[PT_MSR])) 482 + return 1; 483 + 479 484 if (sigret) { 480 485 /* Set up the sigreturn trampoline: li r0,sigret; sc */ 481 486 if (__put_user(0x38000000UL + sigret, &frame->tramp[0]) ··· 754 747 struct mcontext __user *tm_sr) 755 748 { 756 749 long err; 757 - unsigned long msr; 750 + unsigned long msr, msr_hi; 758 751 #ifdef CONFIG_VSX 759 752 int i; 760 753 #endif ··· 859 852 tm_enable(); 860 853 /* This loads the checkpointed FP/VEC state, if used */ 861 854 tm_recheckpoint(&current->thread, msr); 862 - /* The task has moved into TM state S, so ensure MSR reflects this */ 863 - regs->msr = (regs->msr & ~MSR_TS_MASK) | MSR_TS_S; 855 + /* Get the top half of the MSR */ 856 + if (__get_user(msr_hi, &tm_sr->mc_gregs[PT_MSR])) 857 + return 1; 858 + /* Pull in MSR TM from user context */ 859 + regs->msr = (regs->msr & ~MSR_TS_MASK) | ((msr_hi<<32) & MSR_TS_MASK); 864 860 865 861 /* This loads the speculative FP/VEC state, if used */ 866 862 if (msr & MSR_FP) { ··· 962 952 { 963 953 struct rt_sigframe __user *rt_sf; 964 954 struct mcontext __user *frame; 955 + struct mcontext __user *tm_frame = NULL; 965 956 void __user *addr; 966 957 unsigned long newsp = 0; 967 958 int sigret; ··· 996 985 } 997 986 998 987 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 988 + tm_frame = &rt_sf->uc_transact.uc_mcontext; 999 989 if (MSR_TM_ACTIVE(regs->msr)) { 1000 - if (save_tm_user_regs(regs, 
&rt_sf->uc.uc_mcontext, 1001 - &rt_sf->uc_transact.uc_mcontext, sigret)) 990 + if (save_tm_user_regs(regs, frame, tm_frame, sigret)) 1002 991 goto badframe; 1003 992 } 1004 993 else 1005 994 #endif 1006 - if (save_user_regs(regs, frame, sigret, 1)) 995 + { 996 + if (save_user_regs(regs, frame, tm_frame, sigret, 1)) 1007 997 goto badframe; 998 + } 1008 999 regs->link = tramp; 1009 1000 1010 1001 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1011 1002 if (MSR_TM_ACTIVE(regs->msr)) { 1012 1003 if (__put_user((unsigned long)&rt_sf->uc_transact, 1013 1004 &rt_sf->uc.uc_link) 1014 - || __put_user(to_user_ptr(&rt_sf->uc_transact.uc_mcontext), 1015 - &rt_sf->uc_transact.uc_regs)) 1005 + || __put_user((unsigned long)tm_frame, &rt_sf->uc_transact.uc_regs)) 1016 1006 goto badframe; 1017 1007 } 1018 1008 else ··· 1182 1170 mctx = (struct mcontext __user *) 1183 1171 ((unsigned long) &old_ctx->uc_mcontext & ~0xfUL); 1184 1172 if (!access_ok(VERIFY_WRITE, old_ctx, ctx_size) 1185 - || save_user_regs(regs, mctx, 0, ctx_has_vsx_region) 1173 + || save_user_regs(regs, mctx, NULL, 0, ctx_has_vsx_region) 1186 1174 || put_sigset_t(&old_ctx->uc_sigmask, &current->blocked) 1187 1175 || __put_user(to_user_ptr(mctx), &old_ctx->uc_regs)) 1188 1176 return -EFAULT; ··· 1245 1233 if (__get_user(msr_hi, &mcp->mc_gregs[PT_MSR])) 1246 1234 goto bad; 1247 1235 1248 - if (MSR_TM_SUSPENDED(msr_hi<<32)) { 1236 + if (MSR_TM_ACTIVE(msr_hi<<32)) { 1249 1237 /* We only recheckpoint on return if we're 1250 1238 * transaction. 
1251 1239 */ ··· 1404 1392 { 1405 1393 struct sigcontext __user *sc; 1406 1394 struct sigframe __user *frame; 1395 + struct mcontext __user *tm_mctx = NULL; 1407 1396 unsigned long newsp = 0; 1408 1397 int sigret; 1409 1398 unsigned long tramp; ··· 1438 1425 } 1439 1426 1440 1427 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1428 + tm_mctx = &frame->mctx_transact; 1441 1429 if (MSR_TM_ACTIVE(regs->msr)) { 1442 1430 if (save_tm_user_regs(regs, &frame->mctx, &frame->mctx_transact, 1443 1431 sigret)) ··· 1446 1432 } 1447 1433 else 1448 1434 #endif 1449 - if (save_user_regs(regs, &frame->mctx, sigret, 1)) 1435 + { 1436 + if (save_user_regs(regs, &frame->mctx, tm_mctx, sigret, 1)) 1450 1437 goto badframe; 1438 + } 1451 1439 1452 1440 regs->link = tramp; 1453 1441 ··· 1497 1481 long sys_sigreturn(int r3, int r4, int r5, int r6, int r7, int r8, 1498 1482 struct pt_regs *regs) 1499 1483 { 1484 + struct sigframe __user *sf; 1500 1485 struct sigcontext __user *sc; 1501 1486 struct sigcontext sigctx; 1502 1487 struct mcontext __user *sr; 1503 1488 void __user *addr; 1504 1489 sigset_t set; 1490 + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1491 + struct mcontext __user *mcp, *tm_mcp; 1492 + unsigned long msr_hi; 1493 + #endif 1505 1494 1506 1495 /* Always make any pending restarted system calls return -EINTR */ 1507 1496 current_thread_info()->restart_block.fn = do_no_restart_syscall; 1508 1497 1509 - sc = (struct sigcontext __user *)(regs->gpr[1] + __SIGNAL_FRAMESIZE); 1498 + sf = (struct sigframe __user *)(regs->gpr[1] + __SIGNAL_FRAMESIZE); 1499 + sc = &sf->sctx; 1510 1500 addr = sc; 1511 1501 if (copy_from_user(&sigctx, sc, sizeof(sigctx))) 1512 1502 goto badframe; ··· 1529 1507 #endif 1530 1508 set_current_blocked(&set); 1531 1509 1532 - sr = (struct mcontext __user *)from_user_ptr(sigctx.regs); 1533 - addr = sr; 1534 - if (!access_ok(VERIFY_READ, sr, sizeof(*sr)) 1535 - || restore_user_regs(regs, sr, 1)) 1510 + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1511 + mcp = (struct mcontext __user 
*)&sf->mctx; 1512 + tm_mcp = (struct mcontext __user *)&sf->mctx_transact; 1513 + if (__get_user(msr_hi, &tm_mcp->mc_gregs[PT_MSR])) 1536 1514 goto badframe; 1515 + if (MSR_TM_ACTIVE(msr_hi<<32)) { 1516 + if (!cpu_has_feature(CPU_FTR_TM)) 1517 + goto badframe; 1518 + if (restore_tm_user_regs(regs, mcp, tm_mcp)) 1519 + goto badframe; 1520 + } else 1521 + #endif 1522 + { 1523 + sr = (struct mcontext __user *)from_user_ptr(sigctx.regs); 1524 + addr = sr; 1525 + if (!access_ok(VERIFY_READ, sr, sizeof(*sr)) 1526 + || restore_user_regs(regs, sr, 1)) 1527 + goto badframe; 1528 + } 1537 1529 1538 1530 set_thread_flag(TIF_RESTOREALL); 1539 1531 return 0;
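Both reworked sigreturn paths above decide whether to recheckpoint by shifting the saved top half of the MSR back up and testing the transaction-state field with MSR_TM_ACTIVE(msr_hi<<32), instead of assuming state S. A sketch of that test, using the 64-bit server MSR TS bit positions (bits 33-34, matching the __MASK(33) the old signal_64.c code used for MSR_TS_S):

```c
#include <stdint.h>

/* MSR[TS]: two bits, suspended (S) and transactional (T). */
#define MSR_TS_MASK (3ULL << 33)
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0)

/* The 32-bit signal frame stores only the top half of the 64-bit MSR
 * in the transactional context; shifting it back up recovers the TS
 * field, which tells sigreturn whether a recheckpoint is needed. */
int tm_frame_active(uint32_t msr_hi)
{
    return MSR_TM_ACTIVE((uint64_t)msr_hi << 32);
}
```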
+5 -3
arch/powerpc/kernel/signal_64.c
··· 410 410 411 411 /* get MSR separately, transfer the LE bit if doing signal return */ 412 412 err |= __get_user(msr, &sc->gp_regs[PT_MSR]); 413 + /* pull in MSR TM from user context */ 414 + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); 415 + 416 + /* pull in MSR LE from user context */ 413 417 regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); 414 418 415 419 /* The following non-GPR non-FPR non-VR state is also checkpointed: */ ··· 509 505 tm_enable(); 510 506 /* This loads the checkpointed FP/VEC state, if used */ 511 507 tm_recheckpoint(&current->thread, msr); 512 - /* The task has moved into TM state S, so ensure MSR reflects this: */ 513 - regs->msr = (regs->msr & ~MSR_TS_MASK) | __MASK(33); 514 508 515 509 /* This loads the speculative FP/VEC state, if used */ 516 510 if (msr & MSR_FP) { ··· 656 654 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 657 655 if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) 658 656 goto badframe; 659 - if (MSR_TM_SUSPENDED(msr)) { 657 + if (MSR_TM_ACTIVE(msr)) { 660 658 /* We recheckpoint on return. */ 661 659 struct ucontext __user *uc_transact; 662 660 if (__get_user(uc_transact, &uc->uc_link))
+7 -5
arch/powerpc/kernel/smp.c
··· 480 480 secondary_ti = current_set[cpu] = ti; 481 481 } 482 482 483 - int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle) 483 + int __cpu_up(unsigned int cpu, struct task_struct *tidle) 484 484 { 485 485 int rc, c; 486 486 ··· 610 610 } 611 611 612 612 /* Activate a secondary processor. */ 613 - __cpuinit void start_secondary(void *unused) 613 + void start_secondary(void *unused) 614 614 { 615 615 unsigned int cpu = smp_processor_id(); 616 616 struct device_node *l2_cache; ··· 637 637 638 638 vdso_getcpu_init(); 639 639 #endif 640 - notify_cpu_starting(cpu); 641 - set_cpu_online(cpu, true); 642 640 /* Update sibling maps */ 643 641 base = cpu_first_thread_sibling(cpu); 644 642 for (i = 0; i < threads_per_core; i++) { 645 - if (cpu_is_offline(base + i)) 643 + if (cpu_is_offline(base + i) && (cpu != base + i)) 646 644 continue; 647 645 cpumask_set_cpu(cpu, cpu_sibling_mask(base + i)); 648 646 cpumask_set_cpu(base + i, cpu_sibling_mask(cpu)); ··· 664 666 of_node_put(np); 665 667 } 666 668 of_node_put(l2_cache); 669 + 670 + smp_wmb(); 671 + notify_cpu_starting(cpu); 672 + set_cpu_online(cpu, true); 667 673 668 674 local_irq_enable(); 669 675
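The start_secondary() change above delays notify_cpu_starting()/set_cpu_online() until after the sibling and core maps are built, with an smp_wmb() in between, so no other CPU can observe the online flag before the maps are complete. A user-space analogue of that init-then-publish pattern using C11 release/acquire ordering (names are illustrative, not kernel API):

```c
#include <stdatomic.h>

/* Analogue of the reordering in start_secondary(): fill in all
 * per-CPU state first, then publish with release semantics so any
 * reader that sees "online" also sees the completed state. */
struct cpu_state {
    int sibling_map;            /* stands in for the sibling masks */
    atomic_int online;
};

void bring_online(struct cpu_state *c)
{
    c->sibling_map = 0xff;                       /* init first... */
    atomic_store_explicit(&c->online, 1,
                          memory_order_release); /* ...then publish */
}

/* Returns the sibling map if the CPU is visibly online, else -1. */
int read_if_online(struct cpu_state *c)
{
    if (atomic_load_explicit(&c->online, memory_order_acquire))
        return c->sibling_map;
    return -1;
}
```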
+3 -3
arch/powerpc/kernel/sysfs.c
··· 341 341 #endif /* HAS_PPC_PMC_PA6T */ 342 342 #endif /* HAS_PPC_PMC_CLASSIC */ 343 343 344 - static void __cpuinit register_cpu_online(unsigned int cpu) 344 + static void register_cpu_online(unsigned int cpu) 345 345 { 346 346 struct cpu *c = &per_cpu(cpu_devices, cpu); 347 347 struct device *s = &c->dev; ··· 502 502 503 503 #endif /* CONFIG_HOTPLUG_CPU */ 504 504 505 - static int __cpuinit sysfs_cpu_notify(struct notifier_block *self, 505 + static int sysfs_cpu_notify(struct notifier_block *self, 506 506 unsigned long action, void *hcpu) 507 507 { 508 508 unsigned int cpu = (unsigned int)(long)hcpu; ··· 522 522 return NOTIFY_OK; 523 523 } 524 524 525 - static struct notifier_block __cpuinitdata sysfs_cpu_nb = { 525 + static struct notifier_block sysfs_cpu_nb = { 526 526 .notifier_call = sysfs_cpu_notify, 527 527 }; 528 528
-1
arch/powerpc/kernel/time.c
··· 631 631 return found; 632 632 } 633 633 634 - /* should become __cpuinit when secondary_cpu_time_init also is */ 635 634 void start_cpu_decrementer(void) 636 635 { 637 636 #if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
+16 -2
arch/powerpc/kernel/tm.S
··· 112 112 std r3, STACK_PARAM(0)(r1) 113 113 SAVE_NVGPRS(r1) 114 114 115 + /* We need to set up the MSR for VSX register save instructions. Here we 116 + * also clear the MSR RI since when we do the treclaim, we won't have a 117 + * valid kernel pointer for a while. We clear RI here as it avoids 118 + * adding another mtmsr closer to the treclaim. This makes the region 119 + * marked as non-recoverable wider than it needs to be but it saves on 120 + * inserting another mtmsrd later. 121 + */ 115 122 mfmsr r14 116 123 mr r15, r14 117 124 ori r15, r15, MSR_FP 125 + li r16, MSR_RI 126 + andc r15, r15, r16 118 127 oris r15, r15, MSR_VEC@h 119 128 #ifdef CONFIG_VSX 120 129 BEGIN_FTR_SECTION ··· 358 349 mtcr r5 359 350 mtxer r6 360 351 361 - /* MSR and flags: We don't change CRs, and we don't need to alter 362 - * MSR. 352 + /* Clear the MSR RI since we are about to change R1. EE is already off 363 353 */ 354 + li r4, 0 355 + mtmsrd r4, 1 364 356 365 357 REST_4GPRS(0, r7) /* GPR0-3 */ 366 358 REST_GPR(4, r7) /* GPR4-6 */ ··· 386 376 387 377 GET_PACA(r13) 388 378 GET_SCRATCH0(r1) 379 + 380 + /* R1 is restored, so we are recoverable again. EE is still off */ 381 + li r4, MSR_RI 382 + mtmsrd r4, 1 389 383 390 384 REST_NVGPRS(r1) 391 385
+49 -30
arch/powerpc/kernel/traps.c
··· 866 866 u8 val; 867 867 u32 shift = 8 * (3 - (pos & 0x3)); 868 868 869 + /* if process is 32-bit, clear upper 32 bits of EA */ 870 + if ((regs->msr & MSR_64BIT) == 0) 871 + EA &= 0xFFFFFFFF; 872 + 869 873 switch ((instword & PPC_INST_STRING_MASK)) { 870 874 case PPC_INST_LSWX: 871 875 case PPC_INST_LSWI: ··· 1129 1125 * ESR_DST (!?) or 0. In the process of chasing this with the 1130 1126 * hardware people - not sure if it can happen on any illegal 1131 1127 * instruction or only on FP instructions, whether there is a 1132 - * pattern to occurrences etc. -dgibson 31/Mar/2003 */ 1128 + * pattern to occurrences etc. -dgibson 31/Mar/2003 1129 + */ 1130 + 1131 + /* 1132 + * If we support a HW FPU, we need to ensure the FP state 1133 + * if flushed into the thread_struct before attempting 1134 + * emulation 1135 + */ 1136 + #ifdef CONFIG_PPC_FPU 1137 + flush_fp_to_thread(current); 1138 + #endif 1133 1139 switch (do_mathemu(regs)) { 1134 1140 case 0: 1135 1141 emulate_single_step(regs); ··· 1296 1282 die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT); 1297 1283 } 1298 1284 1299 - void tm_unavailable_exception(struct pt_regs *regs) 1285 + void facility_unavailable_exception(struct pt_regs *regs) 1300 1286 { 1287 + static char *facility_strings[] = { 1288 + "FPU", 1289 + "VMX/VSX", 1290 + "DSCR", 1291 + "PMU SPRs", 1292 + "BHRB", 1293 + "TM", 1294 + "AT", 1295 + "EBB", 1296 + "TAR", 1297 + }; 1298 + char *facility, *prefix; 1299 + u64 value; 1300 + 1301 + if (regs->trap == 0xf60) { 1302 + value = mfspr(SPRN_FSCR); 1303 + prefix = ""; 1304 + } else { 1305 + value = mfspr(SPRN_HFSCR); 1306 + prefix = "Hypervisor "; 1307 + } 1308 + 1309 + value = value >> 56; 1310 + 1301 1311 /* We restore the interrupt state now */ 1302 1312 if (!arch_irq_disabled_regs(regs)) 1303 1313 local_irq_enable(); 1304 1314 1305 - /* Currently we never expect a TMU exception. Catch 1306 - * this and kill the process! 
1307 - */ 1308 - printk(KERN_EMERG "Unexpected TM unavailable exception at %lx " 1309 - "(msr %lx)\n", 1310 - regs->nip, regs->msr); 1315 + if (value < ARRAY_SIZE(facility_strings)) 1316 + facility = facility_strings[value]; 1317 + else 1318 + facility = "unknown"; 1319 + 1320 + pr_err("%sFacility '%s' unavailable, exception at 0x%lx, MSR=%lx\n", 1321 + prefix, facility, regs->nip, regs->msr); 1311 1322 1312 1323 if (user_mode(regs)) { 1313 1324 _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); 1314 1325 return; 1315 1326 } 1316 1327 1317 - die("Unexpected TM unavailable exception", regs, SIGABRT); 1328 + die("Unexpected facility unavailable exception", regs, SIGABRT); 1318 1329 } 1319 1330 1320 1331 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM ··· 1435 1396 void SoftwareEmulation(struct pt_regs *regs) 1436 1397 { 1437 1398 extern int do_mathemu(struct pt_regs *); 1438 - extern int Soft_emulate_8xx(struct pt_regs *); 1439 - #if defined(CONFIG_MATH_EMULATION) || defined(CONFIG_8XX_MINIMAL_FPEMU) 1399 + #if defined(CONFIG_MATH_EMULATION) 1440 1400 int errcode; 1441 1401 #endif 1442 1402 ··· 1466 1428 return; 1467 1429 default: 1468 1430 _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); 1469 - return; 1470 - } 1471 - 1472 - #elif defined(CONFIG_8XX_MINIMAL_FPEMU) 1473 - errcode = Soft_emulate_8xx(regs); 1474 - if (errcode >= 0) 1475 - PPC_WARN_EMULATED(8xx, regs); 1476 - 1477 - switch (errcode) { 1478 - case 0: 1479 - emulate_single_step(regs); 1480 - return; 1481 - case 1: 1482 - _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); 1483 - return; 1484 - case -EFAULT: 1485 - _exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip); 1486 1431 return; 1487 1432 } 1488 1433 #else ··· 1817 1796 WARN_EMULATED_SETUP(unaligned), 1818 1797 #ifdef CONFIG_MATH_EMULATION 1819 1798 WARN_EMULATED_SETUP(math), 1820 - #elif defined(CONFIG_8XX_MINIMAL_FPEMU) 1821 - WARN_EMULATED_SETUP(8xx), 1822 1799 #endif 1823 1800 #ifdef CONFIG_VSX 1824 1801 WARN_EMULATED_SETUP(vsx),
+1 -1
arch/powerpc/kernel/udbg.c
··· 50 50 udbg_init_debug_beat(); 51 51 #elif defined(CONFIG_PPC_EARLY_DEBUG_PAS_REALMODE) 52 52 udbg_init_pas_realmode(); 53 - #elif defined(CONFIG_BOOTX_TEXT) 53 + #elif defined(CONFIG_PPC_EARLY_DEBUG_BOOTX) 54 54 udbg_init_btext(); 55 55 #elif defined(CONFIG_PPC_EARLY_DEBUG_44x) 56 56 /* PPC44x debug */
+1 -1
arch/powerpc/kernel/vdso.c
··· 711 711 } 712 712 713 713 #ifdef CONFIG_PPC64 714 - int __cpuinit vdso_getcpu_init(void) 714 + int vdso_getcpu_init(void) 715 715 { 716 716 unsigned long cpu, node, val; 717 717
+1 -1
arch/powerpc/kvm/book3s_64_mmu_host.c
··· 34 34 void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) 35 35 { 36 36 ppc_md.hpte_invalidate(pte->slot, pte->host_vpn, 37 - MMU_PAGE_4K, MMU_SEGSIZE_256M, 37 + MMU_PAGE_4K, MMU_PAGE_4K, MMU_SEGSIZE_256M, 38 38 false); 39 39 } 40 40
+5 -3
arch/powerpc/kvm/book3s_64_mmu_hv.c
··· 675 675 } 676 676 /* if the guest wants write access, see if that is OK */ 677 677 if (!writing && hpte_is_writable(r)) { 678 + unsigned int hugepage_shift; 678 679 pte_t *ptep, pte; 679 680 680 681 /* ··· 684 683 */ 685 684 rcu_read_lock_sched(); 686 685 ptep = find_linux_pte_or_hugepte(current->mm->pgd, 687 - hva, NULL); 688 - if (ptep && pte_present(*ptep)) { 689 - pte = kvmppc_read_update_linux_pte(ptep, 1); 686 + hva, &hugepage_shift); 687 + if (ptep) { 688 + pte = kvmppc_read_update_linux_pte(ptep, 1, 689 + hugepage_shift); 690 690 if (pte_write(pte)) 691 691 write_ok = 1; 692 692 }
+6 -8
arch/powerpc/kvm/book3s_hv_rm_mmu.c
··· 27 27 unsigned long addr = (unsigned long) x; 28 28 pte_t *p; 29 29 30 - p = find_linux_pte(swapper_pg_dir, addr); 30 + p = find_linux_pte_or_hugepte(swapper_pg_dir, addr, NULL); 31 31 if (!p || !pte_present(*p)) 32 32 return NULL; 33 33 /* assume we don't have huge pages in vmalloc space... */ ··· 139 139 { 140 140 pte_t *ptep; 141 141 unsigned long ps = *pte_sizep; 142 - unsigned int shift; 142 + unsigned int hugepage_shift; 143 143 144 - ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift); 144 + ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift); 145 145 if (!ptep) 146 146 return __pte(0); 147 - if (shift) 148 - *pte_sizep = 1ul << shift; 147 + if (hugepage_shift) 148 + *pte_sizep = 1ul << hugepage_shift; 149 149 else 150 150 *pte_sizep = PAGE_SIZE; 151 151 if (ps > *pte_sizep) 152 152 return __pte(0); 153 - if (!pte_present(*ptep)) 154 - return __pte(0); 155 - return kvmppc_read_update_linux_pte(ptep, writing); 153 + return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift); 156 154 } 157 155 158 156 static inline void unlock_hpte(unsigned long *hpte, unsigned long hpte_v)
+1 -1
arch/powerpc/lib/sstep.c
··· 580 580 if (instr & 1) 581 581 regs->link = regs->nip; 582 582 if (branch_taken(instr, regs)) 583 - regs->nip = imm; 583 + regs->nip = truncate_if_32bit(regs->msr, imm); 584 584 return 1; 585 585 #ifdef CONFIG_PPC64 586 586 case 17: /* sc */
+2 -1
arch/powerpc/math-emu/Makefile
··· 4 4 fmadd.o fmadds.o fmsub.o fmsubs.o \ 5 5 fmul.o fmuls.o fnabs.o fneg.o \ 6 6 fnmadd.o fnmadds.o fnmsub.o fnmsubs.o \ 7 - fres.o frsp.o frsqrte.o fsel.o lfs.o \ 7 + fres.o fre.o frsp.o fsel.o lfs.o \ 8 + frsqrte.o frsqrtes.o \ 8 9 fsqrt.o fsqrts.o fsub.o fsubs.o \ 9 10 mcrfs.o mffs.o mtfsb0.o mtfsb1.o \ 10 11 mtfsf.o mtfsfi.o stfiwx.o stfs.o \
+11
arch/powerpc/math-emu/fre.c
··· 1 + #include <linux/types.h> 2 + #include <linux/errno.h> 3 + #include <asm/uaccess.h> 4 + 5 + int fre(void *frD, void *frB) 6 + { 7 + #ifdef DEBUG 8 + printk("%s: %p %p\n", __func__, frD, frB); 9 + #endif 10 + return -ENOSYS; 11 + }
+11
arch/powerpc/math-emu/frsqrtes.c
··· 1 + #include <linux/types.h> 2 + #include <linux/errno.h> 3 + #include <asm/uaccess.h> 4 + 5 + int frsqrtes(void *frD, void *frB) 6 + { 7 + #ifdef DEBUG 8 + printk("%s: %p %p\n", __func__, frD, frB); 9 + #endif 10 + return 0; 11 + }
+10 -4
arch/powerpc/math-emu/math.c
··· 58 58 FLOATFUNC(fneg); 59 59 60 60 /* Optional */ 61 + FLOATFUNC(fre); 61 62 FLOATFUNC(fres); 62 63 FLOATFUNC(frsqrte); 64 + FLOATFUNC(frsqrtes); 63 65 FLOATFUNC(fsel); 64 66 FLOATFUNC(fsqrt); 65 67 FLOATFUNC(fsqrts); ··· 99 97 #define FSQRTS 0x016 /* 22 */ 100 98 #define FRES 0x018 /* 24 */ 101 99 #define FMULS 0x019 /* 25 */ 100 + #define FRSQRTES 0x01a /* 26 */ 102 101 #define FMSUBS 0x01c /* 28 */ 103 102 #define FMADDS 0x01d /* 29 */ 104 103 #define FNMSUBS 0x01e /* 30 */ ··· 112 109 #define FADD 0x015 /* 21 */ 113 110 #define FSQRT 0x016 /* 22 */ 114 111 #define FSEL 0x017 /* 23 */ 112 + #define FRE 0x018 /* 24 */ 115 113 #define FMUL 0x019 /* 25 */ 116 114 #define FRSQRTE 0x01a /* 26 */ 117 115 #define FMSUB 0x01c /* 28 */ ··· 303 299 case FDIVS: func = fdivs; type = AB; break; 304 300 case FSUBS: func = fsubs; type = AB; break; 305 301 case FADDS: func = fadds; type = AB; break; 306 - case FSQRTS: func = fsqrts; type = AB; break; 307 - case FRES: func = fres; type = AB; break; 302 + case FSQRTS: func = fsqrts; type = XB; break; 303 + case FRES: func = fres; type = XB; break; 308 304 case FMULS: func = fmuls; type = AC; break; 305 + case FRSQRTES: func = frsqrtes;type = XB; break; 309 306 case FMSUBS: func = fmsubs; type = ABC; break; 310 307 case FMADDS: func = fmadds; type = ABC; break; 311 308 case FNMSUBS: func = fnmsubs; type = ABC; break; ··· 322 317 case FDIV: func = fdiv; type = AB; break; 323 318 case FSUB: func = fsub; type = AB; break; 324 319 case FADD: func = fadd; type = AB; break; 325 - case FSQRT: func = fsqrt; type = AB; break; 320 + case FSQRT: func = fsqrt; type = XB; break; 321 + case FRE: func = fre; type = XB; break; 326 322 case FSEL: func = fsel; type = ABC; break; 327 323 case FMUL: func = fmul; type = AC; break; 328 - case FRSQRTE: func = frsqrte; type = AB; break; 324 + case FRSQRTE: func = frsqrte; type = XB; break; 329 325 case FMSUB: func = fmsub; type = ABC; break; 330 326 case FMADD: func = fmadd; type = ABC; break; 331 327 case FNMSUB: func = fnmsub; type = ABC; break;
+3 -3
arch/powerpc/mm/44x_mmu.c
··· 41 41 42 42 unsigned long tlb_47x_boltmap[1024/8]; 43 43 44 - static void __cpuinit ppc44x_update_tlb_hwater(void) 44 + static void ppc44x_update_tlb_hwater(void) 45 45 { 46 46 extern unsigned int tlb_44x_patch_hwater_D[]; 47 47 extern unsigned int tlb_44x_patch_hwater_I[]; ··· 134 134 /* 135 135 * "Pins" a 256MB TLB entry in AS0 for kernel lowmem for 47x type MMU 136 136 */ 137 - static void __cpuinit ppc47x_pin_tlb(unsigned int virt, unsigned int phys) 137 + static void ppc47x_pin_tlb(unsigned int virt, unsigned int phys) 138 138 { 139 139 unsigned int rA; 140 140 int bolted; ··· 229 229 } 230 230 231 231 #ifdef CONFIG_SMP 232 - void __cpuinit mmu_init_secondary(int cpu) 232 + void mmu_init_secondary(int cpu) 233 233 { 234 234 unsigned long addr; 235 235 unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1);
+4 -4
arch/powerpc/mm/Makefile
··· 6 6 7 7 ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) 8 8 9 - obj-y := fault.o mem.o pgtable.o gup.o \ 9 + obj-y := fault.o mem.o pgtable.o gup.o mmap.o \ 10 10 init_$(CONFIG_WORD_SIZE).o \ 11 11 pgtable_$(CONFIG_WORD_SIZE).o 12 12 obj-$(CONFIG_PPC_MMU_NOHASH) += mmu_context_nohash.o tlb_nohash.o \ 13 13 tlb_nohash_low.o 14 14 obj-$(CONFIG_PPC_BOOK3E) += tlb_low_$(CONFIG_WORD_SIZE)e.o 15 - obj-$(CONFIG_PPC64) += mmap_64.o 16 15 hash64-$(CONFIG_PPC_NATIVE) := hash_native_64.o 17 16 obj-$(CONFIG_PPC_STD_MMU_64) += hash_utils_64.o \ 18 17 slb_low.o slb.o stab.o \ 19 - mmap_64.o $(hash64-y) 18 + $(hash64-y) 20 19 obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o 21 20 obj-$(CONFIG_PPC_STD_MMU) += hash_low_$(CONFIG_WORD_SIZE).o \ 22 21 tlb_hash$(CONFIG_WORD_SIZE).o \ ··· 27 28 obj-$(CONFIG_PPC_FSL_BOOK3E) += fsl_booke_mmu.o 28 29 obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o 29 30 obj-$(CONFIG_PPC_MM_SLICES) += slice.o 30 - ifeq ($(CONFIG_HUGETLB_PAGE),y) 31 31 obj-y += hugetlbpage.o 32 + ifeq ($(CONFIG_HUGETLB_PAGE),y) 32 33 obj-$(CONFIG_PPC_STD_MMU_64) += hugetlbpage-hash64.o 33 34 obj-$(CONFIG_PPC_BOOK3E_MMU) += hugetlbpage-book3e.o 34 35 endif 36 + obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o 35 37 obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o 36 38 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o 37 39 obj-$(CONFIG_HIGHMEM) += highmem.o
+12 -6
arch/powerpc/mm/gup.c
··· 34 34 35 35 ptep = pte_offset_kernel(&pmd, addr); 36 36 do { 37 - pte_t pte = *ptep; 37 + pte_t pte = ACCESS_ONCE(*ptep); 38 38 struct page *page; 39 39 40 40 if ((pte_val(pte) & mask) != result) ··· 63 63 64 64 pmdp = pmd_offset(&pud, addr); 65 65 do { 66 - pmd_t pmd = *pmdp; 66 + pmd_t pmd = ACCESS_ONCE(*pmdp); 67 67 68 68 next = pmd_addr_end(addr, end); 69 - if (pmd_none(pmd)) 69 + /* 70 + * If we find a splitting transparent hugepage we 71 + * return zero. That will result in taking the slow 72 + * path which will call wait_split_huge_page() 73 + * if the pmd is still in splitting state 74 + */ 75 + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) 70 76 return 0; 71 - if (pmd_huge(pmd)) { 77 + if (pmd_huge(pmd) || pmd_large(pmd)) { 72 78 if (!gup_hugepte((pte_t *)pmdp, PMD_SIZE, addr, next, 73 79 write, pages, nr)) 74 80 return 0; ··· 97 91 98 92 pudp = pud_offset(&pgd, addr); 99 93 do { 100 - pud_t pud = *pudp; 94 + pud_t pud = ACCESS_ONCE(*pudp); 101 95 102 96 next = pud_addr_end(addr, end); 103 97 if (pud_none(pud)) ··· 160 154 161 155 pgdp = pgd_offset(mm, addr); 162 156 do { 163 - pgd_t pgd = *pgdp; 157 + pgd_t pgd = ACCESS_ONCE(*pgdp); 164 158 165 159 pr_devel(" %016lx: normal pgd %p\n", addr, 166 160 (void *)pgd_val(pgd));
+12 -9
arch/powerpc/mm/hash_low_64.S
··· 289 289 290 290 /* Call ppc_md.hpte_updatepp */ 291 291 mr r5,r29 /* vpn */ 292 - li r6,MMU_PAGE_4K /* page size */ 293 - ld r7,STK_PARAM(R9)(r1) /* segment size */ 294 - ld r8,STK_PARAM(R8)(r1) /* get "local" param */ 292 + li r6,MMU_PAGE_4K /* base page size */ 293 + li r7,MMU_PAGE_4K /* actual page size */ 294 + ld r8,STK_PARAM(R9)(r1) /* segment size */ 295 + ld r9,STK_PARAM(R8)(r1) /* get "local" param */ 295 296 _GLOBAL(htab_call_hpte_updatepp) 296 297 bl . /* Patched by htab_finish_init() */ 297 298 ··· 650 649 651 650 /* Call ppc_md.hpte_updatepp */ 652 651 mr r5,r29 /* vpn */ 653 - li r6,MMU_PAGE_4K /* page size */ 654 - ld r7,STK_PARAM(R9)(r1) /* segment size */ 655 - ld r8,STK_PARAM(R8)(r1) /* get "local" param */ 652 + li r6,MMU_PAGE_4K /* base page size */ 653 + li r7,MMU_PAGE_4K /* actual page size */ 654 + ld r8,STK_PARAM(R9)(r1) /* segment size */ 655 + ld r9,STK_PARAM(R8)(r1) /* get "local" param */ 656 656 _GLOBAL(htab_call_hpte_updatepp) 657 657 bl . /* patched by htab_finish_init() */ 658 658 ··· 939 937 940 938 /* Call ppc_md.hpte_updatepp */ 941 939 mr r5,r29 /* vpn */ 942 - li r6,MMU_PAGE_64K 943 - ld r7,STK_PARAM(R9)(r1) /* segment size */ 944 - ld r8,STK_PARAM(R8)(r1) /* get "local" param */ 940 + li r6,MMU_PAGE_64K /* base page size */ 941 + li r7,MMU_PAGE_64K /* actual page size */ 942 + ld r8,STK_PARAM(R9)(r1) /* segment size */ 943 + ld r9,STK_PARAM(R8)(r1) /* get "local" param */ 945 944 _GLOBAL(ht64_call_hpte_updatepp) 946 945 bl . /* patched by htab_finish_init() */ 947 946
+118 -77
arch/powerpc/mm/hash_native_64.c
··· 273 273 return i; 274 274 } 275 275 276 - static inline int __hpte_actual_psize(unsigned int lp, int psize) 277 - { 278 - int i, shift; 279 - unsigned int mask; 280 - 281 - /* start from 1 ignoring MMU_PAGE_4K */ 282 - for (i = 1; i < MMU_PAGE_COUNT; i++) { 283 - 284 - /* invalid penc */ 285 - if (mmu_psize_defs[psize].penc[i] == -1) 286 - continue; 287 - /* 288 - * encoding bits per actual page size 289 - * PTE LP actual page size 290 - * rrrr rrrz >=8KB 291 - * rrrr rrzz >=16KB 292 - * rrrr rzzz >=32KB 293 - * rrrr zzzz >=64KB 294 - * ....... 295 - */ 296 - shift = mmu_psize_defs[i].shift - LP_SHIFT; 297 - if (shift > LP_BITS) 298 - shift = LP_BITS; 299 - mask = (1 << shift) - 1; 300 - if ((lp & mask) == mmu_psize_defs[psize].penc[i]) 301 - return i; 302 - } 303 - return -1; 304 - } 305 - 306 - static inline int hpte_actual_psize(struct hash_pte *hptep, int psize) 307 - { 308 - /* Look at the 8 bit LP value */ 309 - unsigned int lp = (hptep->r >> LP_SHIFT) & ((1 << LP_BITS) - 1); 310 - 311 - if (!(hptep->v & HPTE_V_VALID)) 312 - return -1; 313 - 314 - /* First check if it is large page */ 315 - if (!(hptep->v & HPTE_V_LARGE)) 316 - return MMU_PAGE_4K; 317 - 318 - return __hpte_actual_psize(lp, psize); 319 - } 320 - 321 276 static long native_hpte_updatepp(unsigned long slot, unsigned long newpp, 322 - unsigned long vpn, int psize, int ssize, 323 - int local) 277 + unsigned long vpn, int bpsize, 278 + int apsize, int ssize, int local) 324 279 { 325 280 struct hash_pte *hptep = htab_address + slot; 326 281 unsigned long hpte_v, want_v; 327 282 int ret = 0; 328 - int actual_psize; 329 283 330 - want_v = hpte_encode_avpn(vpn, psize, ssize); 284 + want_v = hpte_encode_avpn(vpn, bpsize, ssize); 331 285 332 286 DBG_LOW(" update(vpn=%016lx, avpnv=%016lx, group=%lx, newpp=%lx)", 333 287 vpn, want_v & HPTE_V_AVPN, slot, newpp); ··· 289 335 native_lock_hpte(hptep); 290 336 291 337 hpte_v = hptep->v; 292 - actual_psize = hpte_actual_psize(hptep, psize); 293 338 /* 294 339 * We need to invalidate the TLB always because hpte_remove doesn't do 295 340 * a tlb invalidate. If a hash bucket gets full, we "evict" a more/less ··· 296 343 * (hpte_remove) because we assume the old translation is still 297 344 * technically "valid". 298 345 */ 299 - if (actual_psize < 0) { 300 - actual_psize = psize; 301 - ret = -1; 302 - goto err_out; 303 - } 304 - if (!HPTE_V_COMPARE(hpte_v, want_v)) { 346 + if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) { 305 347 DBG_LOW(" -> miss\n"); 306 348 ret = -1; 307 349 } else { ··· 305 357 hptep->r = (hptep->r & ~(HPTE_R_PP | HPTE_R_N)) | 306 358 (newpp & (HPTE_R_PP | HPTE_R_N | HPTE_R_C)); 307 359 } 308 - err_out: 309 360 native_unlock_hpte(hptep); 310 361 311 362 /* Ensure it is out of the tlb too. */ 312 - tlbie(vpn, psize, actual_psize, ssize, local); 363 + tlbie(vpn, bpsize, apsize, ssize, local); 313 364 314 365 return ret; 315 366 } ··· 349 402 static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea, 350 403 int psize, int ssize) 351 404 { 352 - int actual_psize; 353 405 unsigned long vpn; 354 406 unsigned long vsid; 355 407 long slot; ··· 361 415 if (slot == -1) 362 416 panic("could not find page to bolt\n"); 363 417 hptep = htab_address + slot; 364 - actual_psize = hpte_actual_psize(hptep, psize); 365 - if (actual_psize < 0) 366 - actual_psize = psize; 367 418 368 419 /* Update the HPTE */ 369 420 hptep->r = (hptep->r & ~(HPTE_R_PP | HPTE_R_N)) | 370 421 (newpp & (HPTE_R_PP | HPTE_R_N)); 371 - 372 - /* Ensure it is out of the tlb too. */ 373 - tlbie(vpn, psize, actual_psize, ssize, 0); 422 + /* 423 + * Ensure it is out of the tlb too. Bolted entries base and 424 + * actual page size will be same. 425 + */ 426 + tlbie(vpn, psize, psize, ssize, 0); 374 427 } 375 428 376 429 static void native_hpte_invalidate(unsigned long slot, unsigned long vpn, 377 - int psize, int ssize, int local) 430 + int bpsize, int apsize, int ssize, int local) 378 431 { 379 432 struct hash_pte *hptep = htab_address + slot; 380 433 unsigned long hpte_v; 381 434 unsigned long want_v; 382 435 unsigned long flags; 383 - int actual_psize; 384 436 385 437 local_irq_save(flags); 386 438 387 439 DBG_LOW(" invalidate(vpn=%016lx, hash: %lx)\n", vpn, slot); 388 440 389 - want_v = hpte_encode_avpn(vpn, psize, ssize); 441 + want_v = hpte_encode_avpn(vpn, bpsize, ssize); 390 442 native_lock_hpte(hptep); 391 443 hpte_v = hptep->v; 392 444 393 - actual_psize = hpte_actual_psize(hptep, psize); 394 445 /* 395 446 * We need to invalidate the TLB always because hpte_remove doesn't do 396 447 * a tlb invalidate. If a hash bucket gets full, we "evict" a more/less ··· 395 452 * (hpte_remove) because we assume the old translation is still 396 453 * technically "valid". 397 454 */ 398 - if (actual_psize < 0) { 399 - actual_psize = psize; 400 - native_unlock_hpte(hptep); 401 - goto err_out; 402 - } 403 - if (!HPTE_V_COMPARE(hpte_v, want_v)) 455 + if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) 404 456 native_unlock_hpte(hptep); 405 457 else 406 458 /* Invalidate the hpte. NOTE: this also unlocks it */ 407 459 hptep->v = 0; 408 460 409 - err_out: 410 461 /* Invalidate the TLB */ 411 - tlbie(vpn, psize, actual_psize, ssize, local); 462 + tlbie(vpn, bpsize, apsize, ssize, local); 463 + 412 464 local_irq_restore(flags); 465 + } 466 + 467 + static void native_hugepage_invalidate(struct mm_struct *mm, 468 + unsigned char *hpte_slot_array, 469 + unsigned long addr, int psize) 470 + { 471 + int ssize = 0, i; 472 + int lock_tlbie; 473 + struct hash_pte *hptep; 474 + int actual_psize = MMU_PAGE_16M; 475 + unsigned int max_hpte_count, valid; 476 + unsigned long flags, s_addr = addr; 477 + unsigned long hpte_v, want_v, shift; 478 + unsigned long hidx, vpn = 0, vsid, hash, slot; 479 + 480 + shift = mmu_psize_defs[psize].shift; 481 + max_hpte_count = 1U << (PMD_SHIFT - shift); 482 + 483 + local_irq_save(flags); 484 + for (i = 0; i < max_hpte_count; i++) { 485 + valid = hpte_valid(hpte_slot_array, i); 486 + if (!valid) 487 + continue; 488 + hidx = hpte_hash_index(hpte_slot_array, i); 489 + 490 + /* get the vpn */ 491 + addr = s_addr + (i * (1ul << shift)); 492 + if (!is_kernel_addr(addr)) { 493 + ssize = user_segment_size(addr); 494 + vsid = get_vsid(mm->context.id, addr, ssize); 495 + WARN_ON(vsid == 0); 496 + } else { 497 + vsid = get_kernel_vsid(addr, mmu_kernel_ssize); 498 + ssize = mmu_kernel_ssize; 499 + } 500 + 501 + vpn = hpt_vpn(addr, vsid, ssize); 502 + hash = hpt_hash(vpn, shift, ssize); 503 + if (hidx & _PTEIDX_SECONDARY) 504 + hash = ~hash; 505 + 506 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 507 + slot += hidx & _PTEIDX_GROUP_IX; 508 + 509 + hptep = htab_address + slot; 510 + want_v = hpte_encode_avpn(vpn, psize, ssize); 511 + native_lock_hpte(hptep); 512 + hpte_v = hptep->v; 513 + 514 + /* Even if we miss, we need to invalidate the TLB */ 515 + if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) 516 + native_unlock_hpte(hptep); 517 + else 518 + /* Invalidate the hpte. NOTE: this also unlocks it */ 519 + hptep->v = 0; 520 + } 521 + /* 522 + * Since this is a hugepage, we just need a single tlbie. 523 + * use the last vpn. 524 + */ 525 + lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); 526 + if (lock_tlbie) 527 + raw_spin_lock(&native_tlbie_lock); 528 + 529 + asm volatile("ptesync":::"memory"); 530 + __tlbie(vpn, psize, actual_psize, ssize); 531 + asm volatile("eieio; tlbsync; ptesync":::"memory"); 532 + 533 + if (lock_tlbie) 534 + raw_spin_unlock(&native_tlbie_lock); 535 + 536 + local_irq_restore(flags); 537 + } 538 + 539 + static inline int __hpte_actual_psize(unsigned int lp, int psize) 540 + { 541 + int i, shift; 542 + unsigned int mask; 543 + 544 + /* start from 1 ignoring MMU_PAGE_4K */ 545 + for (i = 1; i < MMU_PAGE_COUNT; i++) { 546 + 547 + /* invalid penc */ 548 + if (mmu_psize_defs[psize].penc[i] == -1) 549 + continue; 550 + /* 551 + * encoding bits per actual page size 552 + * PTE LP actual page size 553 + * rrrr rrrz >=8KB 554 + * rrrr rrzz >=16KB 555 + * rrrr rzzz >=32KB 556 + * rrrr zzzz >=64KB 557 + * ....... 558 + */ 559 + shift = mmu_psize_defs[i].shift - LP_SHIFT; 560 + if (shift > LP_BITS) 561 + shift = LP_BITS; 562 + mask = (1 << shift) - 1; 563 + if ((lp & mask) == mmu_psize_defs[psize].penc[i]) 564 + return i; 565 + } 566 + return -1; 413 567 } 414 568 415 569 static void hpte_decode(struct hash_pte *hpte, unsigned long slot, ··· 712 672 ppc_md.hpte_remove = native_hpte_remove; 713 673 ppc_md.hpte_clear_all = native_hpte_clear; 714 674 ppc_md.flush_hash_range = native_flush_hash_range; 675 + ppc_md.hugepage_invalidate = native_hugepage_invalidate; 715 676 }
+48 -21
arch/powerpc/mm/hash_utils_64.c
··· 807 807 } 808 808 809 809 #ifdef CONFIG_SMP 810 - void __cpuinit early_init_mmu_secondary(void) 810 + void early_init_mmu_secondary(void) 811 811 { 812 812 /* Initialize hash table for that CPU */ 813 813 if (!firmware_has_feature(FW_FEATURE_LPAR)) ··· 1050 1050 goto bail; 1051 1051 } 1052 1052 1053 - #ifdef CONFIG_HUGETLB_PAGE 1054 1053 if (hugeshift) { 1055 - rc = __hash_page_huge(ea, access, vsid, ptep, trap, local, 1056 - ssize, hugeshift, psize); 1054 + if (pmd_trans_huge(*(pmd_t *)ptep)) 1055 + rc = __hash_page_thp(ea, access, vsid, (pmd_t *)ptep, 1056 + trap, local, ssize, psize); 1057 + #ifdef CONFIG_HUGETLB_PAGE 1058 + else 1059 + rc = __hash_page_huge(ea, access, vsid, ptep, trap, 1060 + local, ssize, hugeshift, psize); 1061 + #else 1062 + else { 1063 + /* 1064 + * if we have hugeshift, and is not transhuge with 1065 + * hugetlb disabled, something is really wrong. 1066 + */ 1067 + rc = 1; 1068 + WARN_ON(1); 1069 + } 1070 + #endif 1057 1071 goto bail; 1058 1072 } 1059 - #endif /* CONFIG_HUGETLB_PAGE */ 1060 1073 1061 1074 #ifndef CONFIG_PPC_64K_PAGES 1062 1075 DBG_LOW(" i-pte: %016lx\n", pte_val(*ptep)); ··· 1158 1145 void hash_preload(struct mm_struct *mm, unsigned long ea, 1159 1146 unsigned long access, unsigned long trap) 1160 1147 { 1148 + int hugepage_shift; 1161 1149 unsigned long vsid; 1162 1150 pgd_t *pgdir; 1163 1151 pte_t *ptep; ··· 1180 1166 pgdir = mm->pgd; 1181 1167 if (pgdir == NULL) 1182 1168 return; 1183 - ptep = find_linux_pte(pgdir, ea); 1184 - if (!ptep) 1185 - return; 1186 1169 1170 + /* Get VSID */ 1171 + ssize = user_segment_size(ea); 1172 + vsid = get_vsid(mm->context.id, ea, ssize); 1173 + if (!vsid) 1174 + return; 1175 + /* 1176 + * Hash doesn't like irqs. Walking linux page table with irq disabled 1177 + * saves us from holding multiple locks. 1178 + */ 1179 + local_irq_save(flags); 1180 + 1181 + /* 1182 + * THP pages use update_mmu_cache_pmd. We don't do 1183 + * hash preload there. Hence can ignore THP here 1184 + */ 1185 + ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugepage_shift); 1186 + if (!ptep) 1187 + goto out_exit; 1188 + 1189 + WARN_ON(hugepage_shift); 1187 1190 #ifdef CONFIG_PPC_64K_PAGES 1188 1191 /* If either _PAGE_4K_PFN or _PAGE_NO_CACHE is set (and we are on 1189 1192 * a 64K kernel), then we don't preload, hash_page() will take ··· 1209 1178 * page size demotion here 1210 1179 */ 1211 1180 if (pte_val(*ptep) & (_PAGE_4K_PFN | _PAGE_NO_CACHE)) 1212 - return; 1181 + goto out_exit; 1213 1182 #endif /* CONFIG_PPC_64K_PAGES */ 1214 - 1215 - /* Get VSID */ 1216 - ssize = user_segment_size(ea); 1217 - vsid = get_vsid(mm->context.id, ea, ssize); 1218 - if (!vsid) 1219 - return; 1220 - 1221 - /* Hash doesn't like irqs */ 1222 - local_irq_save(flags); 1223 1183 1224 1184 /* Is that local to this CPU ? */ 1225 1185 if (cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id()))) ··· 1233 1211 mm->context.user_psize, 1234 1212 mm->context.user_psize, 1235 1213 pte_val(*ptep)); 1236 - 1214 + out_exit: 1237 1215 local_irq_restore(flags); 1238 1216 } ··· 1254 1232 slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 1255 1233 slot += hidx & _PTEIDX_GROUP_IX; 1256 1234 DBG_LOW(" sub %ld: hash=%lx, hidx=%lx\n", index, slot, hidx); 1257 - ppc_md.hpte_invalidate(slot, vpn, psize, ssize, local); 1235 + /* 1236 + * We use same base page size and actual psize, because we don't 1237 + * use these functions for hugepage 1238 + */ 1239 + ppc_md.hpte_invalidate(slot, vpn, psize, psize, ssize, local); 1258 1240 } pte_iterate_hashed_end(); 1259 1241 1260 1242 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM ··· 1391 1365 hash = ~hash; 1392 1366 slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 1393 1367 slot += hidx & _PTEIDX_GROUP_IX; 1394 - ppc_md.hpte_invalidate(slot, vpn, mmu_linear_psize, mmu_kernel_ssize, 0); 1368 + ppc_md.hpte_invalidate(slot, vpn, mmu_linear_psize, mmu_linear_psize, 1369 + mmu_kernel_ssize, 0); 1395 1370 } 1396 1371 1397 1372 void kernel_map_pages(struct page *page, int numpages, int enable)
+175
arch/powerpc/mm/hugepage-hash64.c
··· 1 + /* 2 + * Copyright IBM Corporation, 2013 3 + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> 4 + * 5 + * This program is free software; you can redistribute it and/or modify it 6 + * under the terms of version 2.1 of the GNU Lesser General Public License 7 + * as published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it would be useful, but 10 + * WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 12 + * 13 + */ 14 + 15 + /* 16 + * PPC64 THP Support for hash based MMUs 17 + */ 18 + #include <linux/mm.h> 19 + #include <asm/machdep.h> 20 + 21 + int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid, 22 + pmd_t *pmdp, unsigned long trap, int local, int ssize, 23 + unsigned int psize) 24 + { 25 + unsigned int index, valid; 26 + unsigned char *hpte_slot_array; 27 + unsigned long rflags, pa, hidx; 28 + unsigned long old_pmd, new_pmd; 29 + int ret, lpsize = MMU_PAGE_16M; 30 + unsigned long vpn, hash, shift, slot; 31 + 32 + /* 33 + * atomically mark the linux large page PMD busy and dirty 34 + */ 35 + do { 36 + old_pmd = pmd_val(*pmdp); 37 + /* If PMD busy, retry the access */ 38 + if (unlikely(old_pmd & _PAGE_BUSY)) 39 + return 0; 40 + /* If PMD is trans splitting retry the access */ 41 + if (unlikely(old_pmd & _PAGE_SPLITTING)) 42 + return 0; 43 + /* If PMD permissions don't match, take page fault */ 44 + if (unlikely(access & ~old_pmd)) 45 + return 1; 46 + /* 47 + * Try to lock the PTE, add ACCESSED and DIRTY if it was 48 + * a write access 49 + */ 50 + new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED; 51 + if (access & _PAGE_RW) 52 + new_pmd |= _PAGE_DIRTY; 53 + } while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp, 54 + old_pmd, new_pmd)); 55 + /* 56 + * PP bits. _PAGE_USER is already PP bit 0x2, so we only 57 + * need to add in 0x1 if it's a read-only user page 58 + */ 59 + rflags = new_pmd & _PAGE_USER; 60 + if ((new_pmd & _PAGE_USER) && !((new_pmd & _PAGE_RW) && 61 + (new_pmd & _PAGE_DIRTY))) 62 + rflags |= 0x1; 63 + /* 64 + * _PAGE_EXEC -> HW_NO_EXEC since it's inverted 65 + */ 66 + rflags |= ((new_pmd & _PAGE_EXEC) ? 0 : HPTE_R_N); 67 + 68 + #if 0 69 + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) { 70 + 71 + /* 72 + * No CPU has hugepages but lacks no execute, so we 73 + * don't need to worry about that case 74 + */ 75 + rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap); 76 + } 77 + #endif 78 + /* 79 + * Find the slot index details for this ea, using base page size. 80 + */ 81 + shift = mmu_psize_defs[psize].shift; 82 + index = (ea & ~HPAGE_PMD_MASK) >> shift; 83 + BUG_ON(index >= 4096); 84 + 85 + vpn = hpt_vpn(ea, vsid, ssize); 86 + hash = hpt_hash(vpn, shift, ssize); 87 + hpte_slot_array = get_hpte_slot_array(pmdp); 88 + 89 + valid = hpte_valid(hpte_slot_array, index); 90 + if (valid) { 91 + /* update the hpte bits */ 92 + hidx = hpte_hash_index(hpte_slot_array, index); 93 + if (hidx & _PTEIDX_SECONDARY) 94 + hash = ~hash; 95 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 96 + slot += hidx & _PTEIDX_GROUP_IX; 97 + 98 + ret = ppc_md.hpte_updatepp(slot, rflags, vpn, 99 + psize, lpsize, ssize, local); 100 + /* 101 + * We failed to update, try to insert a new entry. 102 + */ 103 + if (ret == -1) { 104 + /* 105 + * large pte is marked busy, so we can be sure 106 + * nobody is looking at hpte_slot_array. hence we can 107 + * safely update this here. 108 + */ 109 + valid = 0; 110 + new_pmd &= ~_PAGE_HPTEFLAGS; 111 + hpte_slot_array[index] = 0; 112 + } else 113 + /* clear the busy bits and set the hash pte bits */ 114 + new_pmd = (new_pmd & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE; 115 + } 116 + 117 + if (!valid) { 118 + unsigned long hpte_group; 119 + 120 + /* insert new entry */ 121 + pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT; 122 + repeat: 123 + hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; 124 + 125 + /* clear the busy bits and set the hash pte bits */ 126 + new_pmd = (new_pmd & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE; 127 + 128 + /* Add in WIMG bits */ 129 + rflags |= (new_pmd & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | 130 + _PAGE_COHERENT | _PAGE_GUARDED)); 131 + 132 + /* Insert into the hash table, primary slot */ 133 + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, 134 + psize, lpsize, ssize); 135 + /* 136 + * Primary is full, try the secondary 137 + */ 138 + if (unlikely(slot == -1)) { 139 + hpte_group = ((~hash & htab_hash_mask) * 140 + HPTES_PER_GROUP) & ~0x7UL; 141 + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, 142 + rflags, HPTE_V_SECONDARY, 143 + psize, lpsize, ssize); 144 + if (slot == -1) { 145 + if (mftb() & 0x1) 146 + hpte_group = ((hash & htab_hash_mask) * 147 + HPTES_PER_GROUP) & ~0x7UL; 148 + 149 + ppc_md.hpte_remove(hpte_group); 150 + goto repeat; 151 + } 152 + } 153 + /* 154 + * Hypervisor failure. Restore old pmd and return -1 155 + * similar to __hash_page_* 156 + */ 157 + if (unlikely(slot == -2)) { 158 + *pmdp = __pmd(old_pmd); 159 + hash_failure_debug(ea, access, vsid, trap, ssize, 160 + psize, lpsize, old_pmd); 161 + return -1; 162 + } 163 + /* 164 + * large pte is marked busy, so we can be sure 165 + * nobody is looking at hpte_slot_array. hence we can 166 + * safely update this here. 167 + */ 168 + mark_hpte_slot_valid(hpte_slot_array, index, slot); 169 + } 170 + /* 171 + * No need to use ldarx/stdcx here 172 + */ 173 + *pmdp = __pmd(new_pmd & ~_PAGE_BUSY); 174 + return 0; 175 + }
+1 -1
arch/powerpc/mm/hugetlbpage-hash64.c
··· 81 81 slot += (old_pte & _PAGE_F_GIX) >> 12; 82 82 83 83 if (ppc_md.hpte_updatepp(slot, rflags, vpn, mmu_psize, 84 - ssize, local) == -1) 84 + mmu_psize, ssize, local) == -1) 85 85 old_pte &= ~_PAGE_HPTEFLAGS; 86 86 } 87 87
+174 -125
arch/powerpc/mm/hugetlbpage.c
··· 21 21 #include <asm/pgalloc.h> 22 22 #include <asm/tlb.h> 23 23 #include <asm/setup.h> 24 + #include <asm/hugetlb.h> 25 + 26 + #ifdef CONFIG_HUGETLB_PAGE 24 27 25 28 #define PAGE_SHIFT_64K 16 26 29 #define PAGE_SHIFT_16M 24 ··· 103 100 } 104 101 #endif 105 102 106 - /* 107 - * We have 4 cases for pgds and pmds: 108 - * (1) invalid (all zeroes) 109 - * (2) pointer to next table, as normal; bottom 6 bits == 0 110 - * (3) leaf pte for huge page, bottom two bits != 00 111 - * (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table 112 - */ 113 - pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift) 114 - { 115 - pgd_t *pg; 116 - pud_t *pu; 117 - pmd_t *pm; 118 - pte_t *ret_pte; 119 - hugepd_t *hpdp = NULL; 120 - unsigned pdshift = PGDIR_SHIFT; 121 - 122 - if (shift) 123 - *shift = 0; 124 - 125 - pg = pgdir + pgd_index(ea); 126 - 127 - if (pgd_huge(*pg)) { 128 - ret_pte = (pte_t *) pg; 129 - goto out; 130 - } else if (is_hugepd(pg)) 131 - hpdp = (hugepd_t *)pg; 132 - else if (!pgd_none(*pg)) { 133 - pdshift = PUD_SHIFT; 134 - pu = pud_offset(pg, ea); 135 - 136 - if (pud_huge(*pu)) { 137 - ret_pte = (pte_t *) pu; 138 - goto out; 139 - } else if (is_hugepd(pu)) 140 - hpdp = (hugepd_t *)pu; 141 - else if (!pud_none(*pu)) { 142 - pdshift = PMD_SHIFT; 143 - pm = pmd_offset(pu, ea); 144 - 145 - if (pmd_huge(*pm)) { 146 - ret_pte = (pte_t *) pm; 147 - goto out; 148 - } else if (is_hugepd(pm)) 149 - hpdp = (hugepd_t *)pm; 150 - else if (!pmd_none(*pm)) 151 - return pte_offset_kernel(pm, ea); 152 - } 153 - } 154 - if (!hpdp) 155 - return NULL; 156 - 157 - ret_pte = hugepte_offset(hpdp, ea, pdshift); 158 - pdshift = hugepd_shift(*hpdp); 159 - out: 160 - if (shift) 161 - *shift = pdshift; 162 - return ret_pte; 163 - } 164 - EXPORT_SYMBOL_GPL(find_linux_pte_or_hugepte); 165 - 166 103 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) 167 104 { 105 + /* Only called for hugetlbfs pages, hence can ignore THP */ 168 
106 return find_linux_pte_or_hugepte(mm->pgd, addr, NULL); 169 107 } 170 108 ··· 680 736 struct page *page; 681 737 unsigned shift; 682 738 unsigned long mask; 683 - 739 + /* 740 + * Transparent hugepages are handled by generic code. We can skip them 741 + * here. 742 + */ 684 743 ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift); 685 744 686 745 /* Verify it is a huge page else bail. */ 687 - if (!ptep || !shift) 746 + if (!ptep || !shift || pmd_trans_huge(*(pmd_t *)ptep)) 688 747 return ERR_PTR(-EINVAL); 689 748 690 749 mask = (1UL << shift) - 1; ··· 704 757 { 705 758 BUG(); 706 759 return NULL; 707 - } 708 - 709 - int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, 710 - unsigned long end, int write, struct page **pages, int *nr) 711 - { 712 - unsigned long mask; 713 - unsigned long pte_end; 714 - struct page *head, *page, *tail; 715 - pte_t pte; 716 - int refs; 717 - 718 - pte_end = (addr + sz) & ~(sz-1); 719 - if (pte_end < end) 720 - end = pte_end; 721 - 722 - pte = *ptep; 723 - mask = _PAGE_PRESENT | _PAGE_USER; 724 - if (write) 725 - mask |= _PAGE_RW; 726 - 727 - if ((pte_val(pte) & mask) != mask) 728 - return 0; 729 - 730 - /* hugepages are never "special" */ 731 - VM_BUG_ON(!pfn_valid(pte_pfn(pte))); 732 - 733 - refs = 0; 734 - head = pte_page(pte); 735 - 736 - page = head + ((addr & (sz-1)) >> PAGE_SHIFT); 737 - tail = page; 738 - do { 739 - VM_BUG_ON(compound_head(page) != head); 740 - pages[*nr] = page; 741 - (*nr)++; 742 - page++; 743 - refs++; 744 - } while (addr += PAGE_SIZE, addr != end); 745 - 746 - if (!page_cache_add_speculative(head, refs)) { 747 - *nr -= refs; 748 - return 0; 749 - } 750 - 751 - if (unlikely(pte_val(pte) != pte_val(*ptep))) { 752 - /* Could be optimized better */ 753 - *nr -= refs; 754 - while (refs--) 755 - put_page(head); 756 - return 0; 757 - } 758 - 759 - /* 760 - * Any tail page need their mapcount reference taken before we 761 - * return. 
762 - */ 763 - while (refs--) { 764 - if (PageTail(tail)) 765 - get_huge_page_tail(tail); 766 - tail++; 767 - } 768 - 769 - return 1; 770 760 } 771 761 772 762 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, ··· 921 1037 kunmap_atomic(start); 922 1038 } 923 1039 } 1040 + } 1041 + 1042 + #endif /* CONFIG_HUGETLB_PAGE */ 1043 + 1044 + /* 1045 + * We have 4 cases for pgds and pmds: 1046 + * (1) invalid (all zeroes) 1047 + * (2) pointer to next table, as normal; bottom 6 bits == 0 1048 + * (3) leaf pte for huge page, bottom two bits != 00 1049 + * (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table 1050 + * 1051 + * So long as we atomically load page table pointers we are safe against teardown, 1052 + * we can follow the address down to the the page and take a ref on it. 1053 + */ 1054 + 1055 + pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift) 1056 + { 1057 + pgd_t pgd, *pgdp; 1058 + pud_t pud, *pudp; 1059 + pmd_t pmd, *pmdp; 1060 + pte_t *ret_pte; 1061 + hugepd_t *hpdp = NULL; 1062 + unsigned pdshift = PGDIR_SHIFT; 1063 + 1064 + if (shift) 1065 + *shift = 0; 1066 + 1067 + pgdp = pgdir + pgd_index(ea); 1068 + pgd = ACCESS_ONCE(*pgdp); 1069 + /* 1070 + * Always operate on the local stack value. This make sure the 1071 + * value don't get updated by a parallel THP split/collapse, 1072 + * page fault or a page unmap. The return pte_t * is still not 1073 + * stable. So should be checked there for above conditions. 
1074 + */
1075 + if (pgd_none(pgd))
1076 + return NULL;
1077 + else if (pgd_huge(pgd)) {
1078 + ret_pte = (pte_t *) pgdp;
1079 + goto out;
1080 + } else if (is_hugepd(&pgd))
1081 + hpdp = (hugepd_t *)&pgd;
1082 + else {
1083 + /*
1084 + * Even if we end up with an unmap, the pgtable will not
1085 + * be freed, because we do an RCU free and we are called
1086 + * here with IRQs disabled
1087 + */
1088 + pdshift = PUD_SHIFT;
1089 + pudp = pud_offset(&pgd, ea);
1090 + pud = ACCESS_ONCE(*pudp);
1091 +
1092 + if (pud_none(pud))
1093 + return NULL;
1094 + else if (pud_huge(pud)) {
1095 + ret_pte = (pte_t *) pudp;
1096 + goto out;
1097 + } else if (is_hugepd(&pud))
1098 + hpdp = (hugepd_t *)&pud;
1099 + else {
1100 + pdshift = PMD_SHIFT;
1101 + pmdp = pmd_offset(&pud, ea);
1102 + pmd = ACCESS_ONCE(*pmdp);
1103 + /*
1104 + * A hugepage collapse is captured by pmd_none, because
1105 + * it marks the pmd none and does a hpte invalidate.
1106 + *
1107 + * A hugepage split is captured by pmd_trans_splitting,
1108 + * because we mark the pmd trans splitting and do a
1109 + * hpte invalidate
1110 + *
1111 + */
1112 + if (pmd_none(pmd) || pmd_trans_splitting(pmd))
1113 + return NULL;
1114 +
1115 + if (pmd_huge(pmd) || pmd_large(pmd)) {
1116 + ret_pte = (pte_t *) pmdp;
1117 + goto out;
1118 + } else if (is_hugepd(&pmd))
1119 + hpdp = (hugepd_t *)&pmd;
1120 + else
1121 + return pte_offset_kernel(&pmd, ea);
1122 + }
1123 + }
1124 + if (!hpdp)
1125 + return NULL;
1126 +
1127 + ret_pte = hugepte_offset(hpdp, ea, pdshift);
1128 + pdshift = hugepd_shift(*hpdp);
1129 + out:
1130 + if (shift)
1131 + *shift = pdshift;
1132 + return ret_pte;
1133 + }
1134 + EXPORT_SYMBOL_GPL(find_linux_pte_or_hugepte);
1135 +
1136 + int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
1137 + unsigned long end, int write, struct page **pages, int *nr)
1138 + {
1139 + unsigned long mask;
1140 + unsigned long pte_end;
1141 + struct page *head, *page, *tail;
1142 + pte_t pte;
1143 + int refs;
1144 +
1145 + pte_end = (addr + sz) & ~(sz-1);
1146 + if (pte_end < end)
1147 + end = pte_end;
1148 +
1149 + pte = ACCESS_ONCE(*ptep);
1150 + mask = _PAGE_PRESENT | _PAGE_USER;
1151 + if (write)
1152 + mask |= _PAGE_RW;
1153 +
1154 + if ((pte_val(pte) & mask) != mask)
1155 + return 0;
1156 +
1157 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
1158 + /*
1159 + * check for splitting here
1160 + */
1161 + if (pmd_trans_splitting(pte_pmd(pte)))
1162 + return 0;
1163 + #endif
1164 +
1165 + /* hugepages are never "special" */
1166 + VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
1167 +
1168 + refs = 0;
1169 + head = pte_page(pte);
1170 +
1171 + page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
1172 + tail = page;
1173 + do {
1174 + VM_BUG_ON(compound_head(page) != head);
1175 + pages[*nr] = page;
1176 + (*nr)++;
1177 + page++;
1178 + refs++;
1179 + } while (addr += PAGE_SIZE, addr != end);
1180 +
1181 + if (!page_cache_add_speculative(head, refs)) {
1182 + *nr -= refs;
1183 + return 0;
1184 + }
1185 +
1186 + if (unlikely(pte_val(pte) != pte_val(*ptep))) {
1187 + /* Could be optimized better */
1188 + *nr -= refs;
1189 + while (refs--)
1190 + put_page(head);
1191 + return 0;
1192 + }
1193 +
1194 + /*
1195 + * Any tail pages need their mapcount reference taken before we
1196 + * return.
1197 + */
1198 + while (refs--) {
1199 + if (PageTail(tail))
1200 + get_huge_page_tail(tail);
1201 + tail++;
1202 + }
1203 +
1204 + return 1;
924 1205 }
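The descent above stops at the first leaf it finds and reports the page-size shift of that level, so a "huge" entry high in the tree is returned directly instead of being walked through. As a rough illustration of that idea (a toy model with invented names and sizes, not the kernel's page-table code), a two-level walk that returns a pointer to the leaf plus the shift of the level where it stopped might look like:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy two-level table: an entry is either a "leaf" (possibly a large
 * mapping at the upper level) or a pointer to a lower-level table.
 * Mirrors the shape of find_linux_pte_or_hugepte(): descend until a
 * leaf, report the shift of the level where the walk stopped. */
#define TOY_IDX_BITS   4                    /* 16 entries per level */
#define TOY_PAGE_SHIFT 12                   /* 4K base pages        */

struct toy_entry {
	uint64_t leaf;                      /* non-zero => leaf          */
	struct toy_entry *next;             /* lower table when not leaf */
};

/* Walk 'root' for effective address 'ea'; on success store the shift
 * of the mapping level in *shift and return a pointer to the leaf. */
static uint64_t *toy_walk(struct toy_entry *root, uint64_t ea,
			  unsigned int *shift)
{
	unsigned int top_shift = TOY_PAGE_SHIFT + TOY_IDX_BITS;
	struct toy_entry *e;

	e = &root[(ea >> top_shift) & ((1u << TOY_IDX_BITS) - 1)];
	if (e->leaf) {                      /* "huge" leaf at top level */
		*shift = top_shift;
		return &e->leaf;
	}
	if (!e->next)
		return NULL;                /* nothing mapped here */
	e = &e->next[(ea >> TOY_PAGE_SHIFT) & ((1u << TOY_IDX_BITS) - 1)];
	if (!e->leaf)
		return NULL;
	*shift = TOY_PAGE_SHIFT;            /* normal base page */
	return &e->leaf;
}
```

The caller can then use the returned shift to decide whether it is looking at a base page or a larger mapping, which is exactly how the `hugepage_shift` out-parameter is used by the callers in this series.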
+6 -3
arch/powerpc/mm/init_64.c
···
88 88
89 89 static void pmd_ctor(void *addr)
90 90 {
91 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
92 + memset(addr, 0, PMD_TABLE_SIZE * 2);
93 + #else
91 94 memset(addr, 0, PMD_TABLE_SIZE);
95 + #endif
92 96 }
93 97
94 98 struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
···
141 137 void pgtable_cache_init(void)
142 138 {
143 139 pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
144 - pgtable_cache_add(PMD_INDEX_SIZE, pmd_ctor);
145 - if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_INDEX_SIZE))
140 + pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
141 + if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_CACHE_INDEX))
146 142 panic("Couldn't allocate pgtable caches");
147 -
148 143 /* In all current configs, when the PUD index exists it's the
149 144 * same size as either the pgd or pmd index. Verify that the
150 145 * initialization above has also created a PUD cache. This
+4
arch/powerpc/mm/mem.c
···
461 461 pte_t *ptep)
462 462 {
463 463 #ifdef CONFIG_PPC_STD_MMU
464 + /*
465 + * We don't need to worry about _PAGE_PRESENT here because we are
466 + * called with either mm->page_table_lock held or ptl lock held
467 + */
464 468 unsigned long access = 0, trap;
465 469
466 470 /* We only want HPTEs for linux PTEs that have _PAGE_ACCESSED set */
arch/powerpc/mm/mmap_64.c → arch/powerpc/mm/mmap.c
+9 -6
arch/powerpc/mm/mmu_context_nohash.c
···
112 112 */
113 113 for_each_cpu(cpu, mm_cpumask(mm)) {
114 114 for (i = cpu_first_thread_sibling(cpu);
115 - i <= cpu_last_thread_sibling(cpu); i++)
116 - __set_bit(id, stale_map[i]);
115 + i <= cpu_last_thread_sibling(cpu); i++) {
116 + if (stale_map[i])
117 + __set_bit(id, stale_map[i]);
118 + }
117 119 cpu = i - 1;
118 120 }
119 121 return id;
···
274 272 /* XXX This clear should ultimately be part of local_flush_tlb_mm */
275 273 for (i = cpu_first_thread_sibling(cpu);
276 274 i <= cpu_last_thread_sibling(cpu); i++) {
277 - __clear_bit(id, stale_map[i]);
275 + if (stale_map[i])
276 + __clear_bit(id, stale_map[i]);
278 277 }
279 278 }
280 279
···
332 329
333 330 #ifdef CONFIG_SMP
334 331
335 - static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
336 - unsigned long action, void *hcpu)
332 + static int mmu_context_cpu_notify(struct notifier_block *self,
333 + unsigned long action, void *hcpu)
337 334 {
338 335 unsigned int cpu = (unsigned int)(long)hcpu;
339 336
···
366 363 return NOTIFY_OK;
367 364 }
368 365
369 - static struct notifier_block __cpuinitdata mmu_context_cpu_nb = {
366 + static struct notifier_block mmu_context_cpu_nb = {
370 367 .notifier_call = mmu_context_cpu_notify,
371 368 };
372 369
+6 -6
arch/powerpc/mm/numa.c
···
516 516 * Figure out to which domain a cpu belongs and stick it there.
517 517 * Return the id of the domain used.
518 518 */
519 - static int __cpuinit numa_setup_cpu(unsigned long lcpu)
519 + static int numa_setup_cpu(unsigned long lcpu)
520 520 {
521 521 int nid = 0;
522 522 struct device_node *cpu = of_get_cpu_node(lcpu, NULL);
···
538 538 return nid;
539 539 }
540 540
541 - static int __cpuinit cpu_numa_callback(struct notifier_block *nfb,
542 - unsigned long action,
541 + static int cpu_numa_callback(struct notifier_block *nfb, unsigned long action,
543 542 void *hcpu)
544 543 {
545 544 unsigned long lcpu = (unsigned long)hcpu;
···
918 919 return ret;
919 920 }
920 921
921 - static struct notifier_block __cpuinitdata ppc64_numa_nb = {
922 + static struct notifier_block ppc64_numa_nb = {
922 923 .notifier_call = cpu_numa_callback,
923 924 .priority = 1 /* Must run before sched domains notifier. */
924 925 };
···
1432 1433 if (cpu != update->cpu)
1433 1434 continue;
1434 1435
1435 - unregister_cpu_under_node(update->cpu, update->old_nid);
1436 1436 unmap_cpu_from_node(update->cpu);
1437 1437 map_cpu_to_node(update->cpu, update->new_nid);
1438 1438 vdso_getcpu_init();
1439 - register_cpu_under_node(update->cpu, update->new_nid);
1440 1439 }
1441 1440
1442 1441 return 0;
···
1482 1485 stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
1483 1486
1484 1487 for (ud = &updates[0]; ud; ud = ud->next) {
1488 + unregister_cpu_under_node(ud->cpu, ud->old_nid);
1489 + register_cpu_under_node(ud->cpu, ud->new_nid);
1490 +
1485 1491 dev = get_cpu_device(ud->cpu);
1486 1492 if (dev)
1487 1493 kobject_uevent(&dev->kobj, KOBJ_CHANGE);
+8
arch/powerpc/mm/pgtable.c
···
235 235 pud = pud_offset(pgd, addr);
236 236 BUG_ON(pud_none(*pud));
237 237 pmd = pmd_offset(pud, addr);
238 + /*
239 + * For khugepaged to collapse normal pages to a hugepage, it first
240 + * sets the pmd to none to force page fault/gup to take mmap_sem.
241 + * After the pmd is set to none, it does a pte_clear which ends up
242 + * in this assertion, so if we find the pmd none, return.
243 + */
244 + if (pmd_none(*pmd))
245 + return;
238 246 BUG_ON(!pmd_present(*pmd));
239 247 assert_spin_locked(pte_lockptr(mm, pmd));
240 248 }
+414
arch/powerpc/mm/pgtable_64.c
···
338 338 EXPORT_SYMBOL(__iounmap);
339 339 EXPORT_SYMBOL(__iounmap_at);
340 340
341 + /*
342 + * For hugepage we have pfn in the pmd, we use PTE_RPN_SHIFT bits for flags
343 + * For PTE page, we have a PTE_FRAG_SIZE (4K) aligned virtual address.
344 + */
345 + struct page *pmd_page(pmd_t pmd)
346 + {
347 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
348 + if (pmd_trans_huge(pmd))
349 + return pfn_to_page(pmd_pfn(pmd));
350 + #endif
351 + return virt_to_page(pmd_page_vaddr(pmd));
352 + }
353 +
341 354 #ifdef CONFIG_PPC_64K_PAGES
342 355 static pte_t *get_from_cache(struct mm_struct *mm)
343 356 {
···
468 455 }
469 456 #endif
470 457 #endif /* CONFIG_PPC_64K_PAGES */
458 +
459 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
460 +
461 + /*
462 + * This is called when relaxing access to a hugepage. It's also called in the page
463 + * fault path when we don't hit any of the major fault cases, ie, a minor
464 + * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc... The generic code will have
465 + * handled those two for us, we additionally deal with missing execute
466 + * permission here on some processors
467 + */
468 + int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
469 + pmd_t *pmdp, pmd_t entry, int dirty)
470 + {
471 + int changed;
472 + #ifdef CONFIG_DEBUG_VM
473 + WARN_ON(!pmd_trans_huge(*pmdp));
474 + assert_spin_locked(&vma->vm_mm->page_table_lock);
475 + #endif
476 + changed = !pmd_same(*(pmdp), entry);
477 + if (changed) {
478 + __ptep_set_access_flags(pmdp_ptep(pmdp), pmd_pte(entry));
479 + /*
480 + * Since we are not supporting SW TLB systems, we don't
481 + * have anything similar to flush_tlb_page_nohash()
482 + */
483 + }
484 + return changed;
485 + }
486 +
487 + unsigned long pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
488 + pmd_t *pmdp, unsigned long clr)
489 + {
490 +
491 + unsigned long old, tmp;
492 +
493 + #ifdef CONFIG_DEBUG_VM
494 + WARN_ON(!pmd_trans_huge(*pmdp));
495 + assert_spin_locked(&mm->page_table_lock);
496 + #endif
497 +
498 + #ifdef PTE_ATOMIC_UPDATES
499 + __asm__ __volatile__(
500 + "1: ldarx %0,0,%3\n\
501 + andi. %1,%0,%6\n\
502 + bne- 1b \n\
503 + andc %1,%0,%4 \n\
504 + stdcx. %1,0,%3 \n\
505 + bne- 1b"
506 + : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
507 + : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (_PAGE_BUSY)
508 + : "cc" );
509 + #else
510 + old = pmd_val(*pmdp);
511 + *pmdp = __pmd(old & ~clr);
512 + #endif
513 + if (old & _PAGE_HASHPTE)
514 + hpte_do_hugepage_flush(mm, addr, pmdp);
515 + return old;
516 + }
517 +
518 + pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
519 + pmd_t *pmdp)
520 + {
521 + pmd_t pmd;
522 +
523 + VM_BUG_ON(address & ~HPAGE_PMD_MASK);
524 + if (pmd_trans_huge(*pmdp)) {
525 + pmd = pmdp_get_and_clear(vma->vm_mm, address, pmdp);
526 + } else {
527 + /*
528 + * khugepaged calls this for normal pmd
529 + */
530 + pmd = *pmdp;
531 + pmd_clear(pmdp);
532 + /*
533 + * Wait for all pending hash_page to finish. This is needed
534 + * in case of subpage collapse. When we collapse normal pages
535 + * to hugepage, we first clear the pmd, then invalidate all
536 + * the PTE entries. The assumption here is that any low level
537 + * page fault will see a none pmd and take the slow path that
538 + * will wait on mmap_sem. But we could very well be in a
539 + * hash_page with local ptep pointer value. Such a hash page
540 + * can result in adding new HPTE entries for normal subpages.
541 + * That means we could be modifying the page content as we
542 + * copy them to a huge page. So wait for parallel hash_page
543 + * to finish before invalidating HPTE entries. We can do this
544 + * by sending an IPI to all the cpus and executing a dummy
545 + * function there.
546 + */
547 + kick_all_cpus_sync();
548 + /*
549 + * Now invalidate the hpte entries in the range
550 + * covered by pmd. This makes sure we take a
551 + * fault and will find the pmd as none, which will
552 + * result in a major fault which takes mmap_sem and
553 + * hence wait for collapse to complete. Without this
554 + * the __collapse_huge_page_copy can result in copying
555 + * the old content.
556 + */
557 + flush_tlb_pmd_range(vma->vm_mm, &pmd, address);
558 + }
559 + return pmd;
560 + }
561 +
562 + int pmdp_test_and_clear_young(struct vm_area_struct *vma,
563 + unsigned long address, pmd_t *pmdp)
564 + {
565 + return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
566 + }
567 +
568 + /*
569 + * We currently remove entries from the hashtable regardless of whether
570 + * the entry was young or dirty. The generic routines only flush if the
571 + * entry was young or dirty which is not good enough.
572 + *
573 + * We should be more intelligent about this but for the moment we override
574 + * these functions and force a tlb flush unconditionally
575 + */
576 + int pmdp_clear_flush_young(struct vm_area_struct *vma,
577 + unsigned long address, pmd_t *pmdp)
578 + {
579 + return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
580 + }
581 +
582 + /*
583 + * We mark the pmd splitting and invalidate all the hpte
584 + * entries for this hugepage.
585 + */
586 + void pmdp_splitting_flush(struct vm_area_struct *vma,
587 + unsigned long address, pmd_t *pmdp)
588 + {
589 + unsigned long old, tmp;
590 +
591 + VM_BUG_ON(address & ~HPAGE_PMD_MASK);
592 +
593 + #ifdef CONFIG_DEBUG_VM
594 + WARN_ON(!pmd_trans_huge(*pmdp));
595 + assert_spin_locked(&vma->vm_mm->page_table_lock);
596 + #endif
597 +
598 + #ifdef PTE_ATOMIC_UPDATES
599 +
600 + __asm__ __volatile__(
601 + "1: ldarx %0,0,%3\n\
602 + andi. %1,%0,%6\n\
603 + bne- 1b \n\
604 + ori %1,%0,%4 \n\
605 + stdcx. %1,0,%3 \n\
606 + bne- 1b"
607 + : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
608 + : "r" (pmdp), "i" (_PAGE_SPLITTING), "m" (*pmdp), "i" (_PAGE_BUSY)
609 + : "cc" );
610 + #else
611 + old = pmd_val(*pmdp);
612 + *pmdp = __pmd(old | _PAGE_SPLITTING);
613 + #endif
614 + /*
615 + * If we didn't have the splitting flag set, go and flush the
616 + * HPTE entries.
617 + */
618 + if (!(old & _PAGE_SPLITTING)) {
619 + /* We need to flush the hpte */
620 + if (old & _PAGE_HASHPTE)
621 + hpte_do_hugepage_flush(vma->vm_mm, address, pmdp);
622 + }
623 + }
624 +
625 + /*
626 + * We want to put the pgtable in pmd and use pgtable for tracking
627 + * the base page size hptes
628 + */
629 + void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
630 + pgtable_t pgtable)
631 + {
632 + pgtable_t *pgtable_slot;
633 + assert_spin_locked(&mm->page_table_lock);
634 + /*
635 + * we store the pgtable in the second half of PMD
636 + */
637 + pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
638 + *pgtable_slot = pgtable;
639 + /*
640 + * expose the deposited pgtable to other cpus
641 + * before we set the hugepage PTE at pmd level;
642 + * hash fault code looks at the deposited pgtable
643 + * to store hash index values.
644 + */
645 + smp_wmb();
646 + }
647 +
648 + pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
649 + {
650 + pgtable_t pgtable;
651 + pgtable_t *pgtable_slot;
652 +
653 + assert_spin_locked(&mm->page_table_lock);
654 + pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
655 + pgtable = *pgtable_slot;
656 + /*
657 + * Once we withdraw, mark the entry NULL.
658 + */
659 + *pgtable_slot = NULL;
660 + /*
661 + * We store HPTE information in the deposited PTE fragment;
662 + * zero out the content on withdraw.
663 + */
664 + memset(pgtable, 0, PTE_FRAG_SIZE);
665 + return pgtable;
666 + }
667 +
668 + /*
669 + * set a new huge pmd. We should not be called for updating
670 + * an existing pmd entry. That should go via pmd_hugepage_update.
671 + */
672 + void set_pmd_at(struct mm_struct *mm, unsigned long addr,
673 + pmd_t *pmdp, pmd_t pmd)
674 + {
675 + #ifdef CONFIG_DEBUG_VM
676 + WARN_ON(!pmd_none(*pmdp));
677 + assert_spin_locked(&mm->page_table_lock);
678 + WARN_ON(!pmd_trans_huge(pmd));
679 + #endif
680 + return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
681 + }
682 +
683 + void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
684 + pmd_t *pmdp)
685 + {
686 + pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT);
687 + }
688 +
689 + /*
690 + * A linux hugepage PMD was changed and the corresponding hash table entries
691 + * need to be flushed.
692 + */
693 + void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
694 + pmd_t *pmdp)
695 + {
696 + int ssize, i;
697 + unsigned long s_addr;
698 + int max_hpte_count;
699 + unsigned int psize, valid;
700 + unsigned char *hpte_slot_array;
701 + unsigned long hidx, vpn, vsid, hash, shift, slot;
702 +
703 + /*
704 + * Flush all the hptes mapping this hugepage
705 + */
706 + s_addr = addr & HPAGE_PMD_MASK;
707 + hpte_slot_array = get_hpte_slot_array(pmdp);
708 + /*
709 + * If we try to do a HUGE PTE update after a withdraw is done,
710 + * we will find the below NULL. This happens when we do
711 + * split_huge_page_pmd
712 + */
713 + if (!hpte_slot_array)
714 + return;
715 +
716 + /* get the base page size */
717 + psize = get_slice_psize(mm, s_addr);
718 +
719 + if (ppc_md.hugepage_invalidate)
720 + return ppc_md.hugepage_invalidate(mm, hpte_slot_array,
721 + s_addr, psize);
722 + /*
723 + * No bulk hpte removal support, invalidate each entry
724 + */
725 + shift = mmu_psize_defs[psize].shift;
726 + max_hpte_count = HPAGE_PMD_SIZE >> shift;
727 + for (i = 0; i < max_hpte_count; i++) {
728 + /*
729 + * 8 bits per hpte entry
730 + * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
731 + */
732 + valid = hpte_valid(hpte_slot_array, i);
733 + if (!valid)
734 + continue;
735 + hidx = hpte_hash_index(hpte_slot_array, i);
736 +
737 + /* get the vpn */
738 + addr = s_addr + (i * (1ul << shift));
739 + if (!is_kernel_addr(addr)) {
740 + ssize = user_segment_size(addr);
741 + vsid = get_vsid(mm->context.id, addr, ssize);
742 + WARN_ON(vsid == 0);
743 + } else {
744 + vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
745 + ssize = mmu_kernel_ssize;
746 + }
747 +
748 + vpn = hpt_vpn(addr, vsid, ssize);
749 + hash = hpt_hash(vpn, shift, ssize);
750 + if (hidx & _PTEIDX_SECONDARY)
751 + hash = ~hash;
752 +
753 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
754 + slot += hidx & _PTEIDX_GROUP_IX;
755 + ppc_md.hpte_invalidate(slot, vpn, psize,
756 + MMU_PAGE_16M, ssize, 0);
757 + }
758 + }
759 +
760 + static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
761 + {
762 + pmd_val(pmd) |= pgprot_val(pgprot);
763 + return pmd;
764 + }
765 +
766 + pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
767 + {
768 + pmd_t pmd;
769 + /*
770 + * For a valid pte, we would have _PAGE_PRESENT or _PAGE_FILE always
771 + * set. We use this to check THP page at pmd level.
772 + * leaf pte for huge page, bottom two bits != 00
773 + */
774 + pmd_val(pmd) = pfn << PTE_RPN_SHIFT;
775 + pmd_val(pmd) |= _PAGE_THP_HUGE;
776 + pmd = pmd_set_protbits(pmd, pgprot);
777 + return pmd;
778 + }
779 +
780 + pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
781 + {
782 + return pfn_pmd(page_to_pfn(page), pgprot);
783 + }
784 +
785 + pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
786 + {
787 +
788 + pmd_val(pmd) &= _HPAGE_CHG_MASK;
789 + pmd = pmd_set_protbits(pmd, newprot);
790 + return pmd;
791 + }
792 +
793 + /*
794 + * This is called at the end of handling a user page fault, when the
795 + * fault has been handled by updating a HUGE PMD entry in the linux page tables.
796 + * We use it to preload an HPTE into the hash table corresponding to
797 + * the updated linux HUGE PMD entry.
798 + */
799 + void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
800 + pmd_t *pmd)
801 + {
802 + return;
803 + }
804 +
805 + pmd_t pmdp_get_and_clear(struct mm_struct *mm,
806 + unsigned long addr, pmd_t *pmdp)
807 + {
808 + pmd_t old_pmd;
809 + pgtable_t pgtable;
810 + unsigned long old;
811 + pgtable_t *pgtable_slot;
812 +
813 + old = pmd_hugepage_update(mm, addr, pmdp, ~0UL);
814 + old_pmd = __pmd(old);
815 + /*
816 + * We have pmd == none and we are holding page_table_lock.
817 + * So we can safely go and clear the pgtable hash
818 + * index info.
819 + */
820 + pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
821 + pgtable = *pgtable_slot;
822 + /*
823 + * Let's zero out old valid and hash index details;
824 + * hash fault looks at them.
825 + */
826 + memset(pgtable, 0, PTE_FRAG_SIZE);
827 + return old_pmd;
828 + }
829 +
830 + int has_transparent_hugepage(void)
831 + {
832 + if (!mmu_has_feature(MMU_FTR_16M_PAGE))
833 + return 0;
834 + /*
835 + * We support THP only if PMD_SIZE is 16MB.
836 + */
837 + if (mmu_psize_defs[MMU_PAGE_16M].shift != PMD_SHIFT)
838 + return 0;
839 + /*
840 + * We need to make sure that we support 16MB hugepage in a segment
841 + * with base page size 64K or 4K. We only enable THP with a PAGE_SIZE
842 + * of 64K.
843 + */
844 + /*
845 + * If we have 64K HPTE, we will be using that by default
846 + */
847 + if (mmu_psize_defs[MMU_PAGE_64K].shift &&
848 + (mmu_psize_defs[MMU_PAGE_64K].penc[MMU_PAGE_16M] == -1))
849 + return 0;
850 + /*
851 + * Ok we only have 4K HPTE
852 + */
853 + if (mmu_psize_defs[MMU_PAGE_4K].penc[MMU_PAGE_16M] == -1)
854 + return 0;
855 +
856 + return 1;
857 + }
858 + #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
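The `has_transparent_hugepage()` gate above boils down to a few table lookups: the 16M page size must exist, the PMD must map exactly 16M, and the preferred base page size must have a hash-PTE encoding (`penc`) for 16M pages. A slightly simplified toy restatement of that policy (invented struct and names, not the kernel's `mmu_psize_defs` layout) makes the decision tree easy to test in isolation:

```c
/* Toy, simplified restatement of the has_transparent_hugepage() policy:
 * shift == 0 means the page size is unsupported, penc_16M == -1 means
 * there is no 16M hash-PTE encoding inside that base page size. */
struct toy_psize {
	int shift;       /* 0 => page size unsupported          */
	int penc_16M;    /* -1 => no 16M encoding for this size */
};

static int toy_has_thp(int pmd_shift,
		       const struct toy_psize *p4k,
		       const struct toy_psize *p64k,
		       const struct toy_psize *p16m)
{
	if (!p16m->shift)                /* no 16M pages at all       */
		return 0;
	if (p16m->shift != pmd_shift)    /* PMD must map exactly 16M  */
		return 0;
	if (p64k->shift && p64k->penc_16M == -1)
		return 0;                /* 64K HPTEs, no 16M encoding */
	if (!p64k->shift && p4k->penc_16M == -1)
		return 0;                /* only 4K HPTEs, no encoding */
	return 1;
}
```

The real function consults the global `mmu_psize_defs[]` array filled in from firmware; the toy just takes the same facts as parameters so each branch can be exercised directly.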
+48
arch/powerpc/mm/subpage-prot.c
···
130 130 up_write(&mm->mmap_sem);
131 131 }
132 132
133 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
134 + static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
135 + unsigned long end, struct mm_walk *walk)
136 + {
137 + struct vm_area_struct *vma = walk->private;
138 + split_huge_page_pmd(vma, addr, pmd);
139 + return 0;
140 + }
141 +
142 + static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
143 + unsigned long len)
144 + {
145 + struct vm_area_struct *vma;
146 + struct mm_walk subpage_proto_walk = {
147 + .mm = mm,
148 + .pmd_entry = subpage_walk_pmd_entry,
149 + };
150 +
151 + /*
152 + * We don't try too hard, we just mark all the vmas in that range
153 + * VM_NOHUGEPAGE and split them.
154 + */
155 + vma = find_vma(mm, addr);
156 + /*
157 + * If the range is entirely unmapped, just return
158 + */
159 + if (vma && ((addr + len) <= vma->vm_start))
160 + return;
161 +
162 + while (vma) {
163 + if (vma->vm_start >= (addr + len))
164 + break;
165 + vma->vm_flags |= VM_NOHUGEPAGE;
166 + subpage_proto_walk.private = vma;
167 + walk_page_range(vma->vm_start, vma->vm_end,
168 + &subpage_proto_walk);
169 + vma = vma->vm_next;
170 + }
171 + }
172 + #else
173 + static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
174 + unsigned long len)
175 + {
176 + return;
177 + }
178 + #endif
179 +
133 180 /*
134 181 * Copy in a subpage protection map for an address range.
135 182 * The map has 2 bits per 4k subpage, so 32 bits per 64k page.
···
215 168 return -EFAULT;
216 169
217 170 down_write(&mm->mmap_sem);
171 + subpage_mark_vma_nohuge(mm, addr, len);
218 172 for (limit = addr + len; addr < limit; addr = next) {
219 173 next = pmd_addr_end(addr, limit);
220 174 err = -ENOMEM;
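The `subpage_mark_vma_nohuge()` loop above is a classic interval-overlap scan: walk a sorted, singly linked list of address ranges, bail out early if the target range falls in a hole before the first candidate, stop once a range starts past the end, and flag everything that overlaps. A self-contained toy version of just that scan (invented struct and flag names, not the kernel's `vm_area_struct`) looks like:

```c
#include <stddef.h>

/* Toy VMA list: sorted, non-overlapping [vm_start, vm_end) ranges. */
struct toy_vma {
	unsigned long vm_start, vm_end;
	unsigned long vm_flags;
	struct toy_vma *vm_next;
};
#define TOY_NOHUGEPAGE 0x1ul

/* Set TOY_NOHUGEPAGE on every range overlapping [addr, addr + len). */
static void toy_mark_nohuge(struct toy_vma *vma,
			    unsigned long addr, unsigned long len)
{
	/* Range lies entirely in a hole before the first vma: done. */
	if (vma && addr + len <= vma->vm_start)
		return;
	while (vma) {
		if (vma->vm_start >= addr + len)
			break;                  /* past the range: stop */
		if (vma->vm_end > addr)         /* genuine overlap      */
			vma->vm_flags |= TOY_NOHUGEPAGE;
		vma = vma->vm_next;
	}
}
```

One difference worth noting: the kernel code starts from `find_vma()`, which already skips VMAs that end at or before `addr`, so the toy adds an explicit `vm_end > addr` overlap test to get the same effect when starting from the list head.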
+34 -2
arch/powerpc/mm/tlb_hash64.c
···
189 189 void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
190 190 unsigned long end)
191 191 {
192 + int hugepage_shift;
192 193 unsigned long flags;
193 194
194 195 start = _ALIGN_DOWN(start, PAGE_SIZE);
···
207 206 local_irq_save(flags);
208 207 arch_enter_lazy_mmu_mode();
209 208 for (; start < end; start += PAGE_SIZE) {
210 - pte_t *ptep = find_linux_pte(mm->pgd, start);
209 + pte_t *ptep = find_linux_pte_or_hugepte(mm->pgd, start,
210 + &hugepage_shift);
211 211 unsigned long pte;
212 212
213 213 if (ptep == NULL)
···
216 214 pte = pte_val(*ptep);
217 215 if (!(pte & _PAGE_HASHPTE))
218 216 continue;
219 - hpte_need_flush(mm, start, ptep, pte, 0);
217 + if (unlikely(hugepage_shift && pmd_trans_huge(*(pmd_t *)pte)))
218 + hpte_do_hugepage_flush(mm, start, (pmd_t *)pte);
219 + else
220 + hpte_need_flush(mm, start, ptep, pte, 0);
221 + }
222 + arch_leave_lazy_mmu_mode();
223 + local_irq_restore(flags);
224 + }
225 +
226 + void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
227 + {
228 + pte_t *pte;
229 + pte_t *start_pte;
230 + unsigned long flags;
231 +
232 + addr = _ALIGN_DOWN(addr, PMD_SIZE);
233 + /* Note: Normally, we should only ever use a batch within a
234 + * PTE locked section. This violates the rule, but will work
235 + * since we don't actually modify the PTEs, we just flush the
236 + * hash while leaving the PTEs intact (including their reference
237 + * to being hashed). This is not the most performance oriented
238 + * way to do things but is fine for our needs here.
239 + */
240 + local_irq_save(flags);
241 + arch_enter_lazy_mmu_mode();
242 + start_pte = pte_offset_map(pmd, addr);
243 + for (pte = start_pte; pte < start_pte + PTRS_PER_PTE; pte++) {
244 + unsigned long pteval = pte_val(*pte);
245 + if (pteval & _PAGE_HASHPTE)
246 + hpte_need_flush(mm, addr, pte, pteval, 0);
247 + addr += PAGE_SIZE;
220 248 }
221 249 arch_leave_lazy_mmu_mode();
222 250 local_irq_restore(flags);
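The inner loop of `flush_tlb_pmd_range()` is a straightforward flag scan: visit every PTE slot under the PMD and act only on entries whose `_PAGE_HASHPTE` bit says they have a shadow in the hash table. That filtering pattern can be sketched on its own with a toy flag and a plain array standing in for the PTE page (names invented for the sketch):

```c
/* Toy version of the per-PMD scan: collect every software "PTE" whose
 * HASHPTE-style flag is set, i.e. the entries that would need a hash
 * flush in the real code. */
#define TOY_PTES_PER_PMD 8
#define TOY_PAGE_HASHPTE 0x1ul

static int toy_collect_flushes(const unsigned long *ptes,
			       unsigned long *to_flush, int max)
{
	int n = 0;
	for (int i = 0; i < TOY_PTES_PER_PMD && n < max; i++) {
		if (ptes[i] & TOY_PAGE_HASHPTE)  /* entry was hashed */
			to_flush[n++] = ptes[i];
	}
	return n;
}
```

The real loop flushes each matching entry immediately inside a lazy-MMU batch rather than collecting them, but the selection criterion is the same single flag test.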
+1 -1
arch/powerpc/mm/tlb_nohash.c
···
648 648 __early_init_mmu(1);
649 649 }
650 650
651 - void __cpuinit early_init_mmu_secondary(void)
651 + void early_init_mmu_secondary(void)
652 652 {
653 653 __early_init_mmu(0);
654 654 }
+174 -27
arch/powerpc/perf/core-book3s.c
··· 75 75 76 76 #define MMCR0_FCHV 0 77 77 #define MMCR0_PMCjCE MMCR0_PMCnCE 78 + #define MMCR0_FC56 0 79 + #define MMCR0_PMAO 0 80 + #define MMCR0_EBE 0 81 + #define MMCR0_PMCC 0 82 + #define MMCR0_PMCC_U6 0 78 83 79 84 #define SPRN_MMCRA SPRN_MMCR2 80 85 #define MMCRA_SAMPLE_ENABLE 0 ··· 105 100 static inline int siar_valid(struct pt_regs *regs) 106 101 { 107 102 return 1; 103 + } 104 + 105 + static bool is_ebb_event(struct perf_event *event) { return false; } 106 + static int ebb_event_check(struct perf_event *event) { return 0; } 107 + static void ebb_event_add(struct perf_event *event) { } 108 + static void ebb_switch_out(unsigned long mmcr0) { } 109 + static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0) 110 + { 111 + return mmcr0; 108 112 } 109 113 110 114 static inline void power_pmu_bhrb_enable(struct perf_event *event) {} ··· 476 462 return; 477 463 } 478 464 465 + static bool is_ebb_event(struct perf_event *event) 466 + { 467 + /* 468 + * This could be a per-PMU callback, but we'd rather avoid the cost. We 469 + * check that the PMU supports EBB, meaning those that don't can still 470 + * use bit 63 of the event code for something else if they wish. 
471 + */ 472 + return (ppmu->flags & PPMU_EBB) && 473 + ((event->attr.config >> EVENT_CONFIG_EBB_SHIFT) & 1); 474 + } 475 + 476 + static int ebb_event_check(struct perf_event *event) 477 + { 478 + struct perf_event *leader = event->group_leader; 479 + 480 + /* Event and group leader must agree on EBB */ 481 + if (is_ebb_event(leader) != is_ebb_event(event)) 482 + return -EINVAL; 483 + 484 + if (is_ebb_event(event)) { 485 + if (!(event->attach_state & PERF_ATTACH_TASK)) 486 + return -EINVAL; 487 + 488 + if (!leader->attr.pinned || !leader->attr.exclusive) 489 + return -EINVAL; 490 + 491 + if (event->attr.inherit || event->attr.sample_period || 492 + event->attr.enable_on_exec || event->attr.freq) 493 + return -EINVAL; 494 + } 495 + 496 + return 0; 497 + } 498 + 499 + static void ebb_event_add(struct perf_event *event) 500 + { 501 + if (!is_ebb_event(event) || current->thread.used_ebb) 502 + return; 503 + 504 + /* 505 + * IFF this is the first time we've added an EBB event, set 506 + * PMXE in the user MMCR0 so we can detect when it's cleared by 507 + * userspace. We need this so that we can context switch while 508 + * userspace is in the EBB handler (where PMXE is 0). 
509 + */ 510 + current->thread.used_ebb = 1; 511 + current->thread.mmcr0 |= MMCR0_PMXE; 512 + } 513 + 514 + static void ebb_switch_out(unsigned long mmcr0) 515 + { 516 + if (!(mmcr0 & MMCR0_EBE)) 517 + return; 518 + 519 + current->thread.siar = mfspr(SPRN_SIAR); 520 + current->thread.sier = mfspr(SPRN_SIER); 521 + current->thread.sdar = mfspr(SPRN_SDAR); 522 + current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK; 523 + current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK; 524 + } 525 + 526 + static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0) 527 + { 528 + if (!ebb) 529 + goto out; 530 + 531 + /* Enable EBB and read/write to all 6 PMCs for userspace */ 532 + mmcr0 |= MMCR0_EBE | MMCR0_PMCC_U6; 533 + 534 + /* Add any bits from the user reg, FC or PMAO */ 535 + mmcr0 |= current->thread.mmcr0; 536 + 537 + /* Be careful not to set PMXE if userspace had it cleared */ 538 + if (!(current->thread.mmcr0 & MMCR0_PMXE)) 539 + mmcr0 &= ~MMCR0_PMXE; 540 + 541 + mtspr(SPRN_SIAR, current->thread.siar); 542 + mtspr(SPRN_SIER, current->thread.sier); 543 + mtspr(SPRN_SDAR, current->thread.sdar); 544 + mtspr(SPRN_MMCR2, current->thread.mmcr2); 545 + out: 546 + return mmcr0; 547 + } 479 548 #endif /* CONFIG_PPC64 */ 480 549 481 550 static void perf_event_interrupt(struct pt_regs *regs); ··· 829 732 830 733 if (!event->hw.idx) 831 734 return; 735 + 736 + if (is_ebb_event(event)) { 737 + val = read_pmc(event->hw.idx); 738 + local64_set(&event->hw.prev_count, val); 739 + return; 740 + } 741 + 832 742 /* 833 743 * Performance monitor interrupts come even when interrupts 834 744 * are soft-disabled, as long as interrupts are hard-enabled. 
··· 956 852 static void power_pmu_disable(struct pmu *pmu) 957 853 { 958 854 struct cpu_hw_events *cpuhw; 959 - unsigned long flags; 855 + unsigned long flags, mmcr0, val; 960 856 961 857 if (!ppmu) 962 858 return; ··· 964 860 cpuhw = &__get_cpu_var(cpu_hw_events); 965 861 966 862 if (!cpuhw->disabled) { 967 - cpuhw->disabled = 1; 968 - cpuhw->n_added = 0; 969 - 970 863 /* 971 864 * Check if we ever enabled the PMU on this cpu. 972 865 */ ··· 971 870 ppc_enable_pmcs(); 972 871 cpuhw->pmcs_enabled = 1; 973 872 } 873 + 874 + /* 875 + * Set the 'freeze counters' bit, clear EBE/PMCC/PMAO/FC56. 876 + */ 877 + val = mmcr0 = mfspr(SPRN_MMCR0); 878 + val |= MMCR0_FC; 879 + val &= ~(MMCR0_EBE | MMCR0_PMCC | MMCR0_PMAO | MMCR0_FC56); 880 + 881 + /* 882 + * The barrier is to make sure the mtspr has been 883 + * executed and the PMU has frozen the events etc. 884 + * before we return. 885 + */ 886 + write_mmcr0(cpuhw, val); 887 + mb(); 974 888 975 889 /* 976 890 * Disable instruction sampling if it was enabled ··· 996 880 mb(); 997 881 } 998 882 999 - /* 1000 - * Set the 'freeze counters' bit. 1001 - * The barrier is to make sure the mtspr has been 1002 - * executed and the PMU has frozen the events 1003 - * before we return. 
1004 - */ 1005 - write_mmcr0(cpuhw, mfspr(SPRN_MMCR0) | MMCR0_FC); 1006 - mb(); 883 + cpuhw->disabled = 1; 884 + cpuhw->n_added = 0; 885 + 886 + ebb_switch_out(mmcr0); 1007 887 } 888 + 1008 889 local_irq_restore(flags); 1009 890 } 1010 891 ··· 1016 903 struct cpu_hw_events *cpuhw; 1017 904 unsigned long flags; 1018 905 long i; 1019 - unsigned long val; 906 + unsigned long val, mmcr0; 1020 907 s64 left; 1021 908 unsigned int hwc_index[MAX_HWEVENTS]; 1022 909 int n_lim; 1023 910 int idx; 911 + bool ebb; 1024 912 1025 913 if (!ppmu) 1026 914 return; 1027 915 local_irq_save(flags); 916 + 1028 917 cpuhw = &__get_cpu_var(cpu_hw_events); 1029 - if (!cpuhw->disabled) { 1030 - local_irq_restore(flags); 1031 - return; 918 + if (!cpuhw->disabled) 919 + goto out; 920 + 921 + if (cpuhw->n_events == 0) { 922 + ppc_set_pmu_inuse(0); 923 + goto out; 1032 924 } 925 + 1033 926 cpuhw->disabled = 0; 927 + 928 + /* 929 + * EBB requires an exclusive group and all events must have the EBB 930 + * flag set, or not set, so we can just check a single event. Also we 931 + * know we have at least one event. 
932 + */ 933 + ebb = is_ebb_event(cpuhw->event[0]); 1034 934 1035 935 /* 1036 936 * If we didn't change anything, or only removed events, ··· 1054 928 if (!cpuhw->n_added) { 1055 929 mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE); 1056 930 mtspr(SPRN_MMCR1, cpuhw->mmcr[1]); 1057 - if (cpuhw->n_events == 0) 1058 - ppc_set_pmu_inuse(0); 1059 931 goto out_enable; 1060 932 } 1061 933 ··· 1120 996 ++n_lim; 1121 997 continue; 1122 998 } 1123 - val = 0; 1124 - if (event->hw.sample_period) { 1125 - left = local64_read(&event->hw.period_left); 1126 - if (left < 0x80000000L) 1127 - val = 0x80000000L - left; 999 + 1000 + if (ebb) 1001 + val = local64_read(&event->hw.prev_count); 1002 + else { 1003 + val = 0; 1004 + if (event->hw.sample_period) { 1005 + left = local64_read(&event->hw.period_left); 1006 + if (left < 0x80000000L) 1007 + val = 0x80000000L - left; 1008 + } 1009 + local64_set(&event->hw.prev_count, val); 1128 1010 } 1129 - local64_set(&event->hw.prev_count, val); 1011 + 1130 1012 event->hw.idx = idx; 1131 1013 if (event->hw.state & PERF_HES_STOPPED) 1132 1014 val = 0; 1133 1015 write_pmc(idx, val); 1016 + 1134 1017 perf_event_update_userpage(event); 1135 1018 } 1136 1019 cpuhw->n_limited = n_lim; 1137 1020 cpuhw->mmcr[0] |= MMCR0_PMXE | MMCR0_FCECE; 1138 1021 1139 1022 out_enable: 1023 + mmcr0 = ebb_switch_in(ebb, cpuhw->mmcr[0]); 1024 + 1140 1025 mb(); 1141 - write_mmcr0(cpuhw, cpuhw->mmcr[0]); 1026 + write_mmcr0(cpuhw, mmcr0); 1142 1027 1143 1028 /* 1144 1029 * Enable instruction sampling if necessary ··· 1245 1112 event->hw.config = cpuhw->events[n0]; 1246 1113 1247 1114 nocheck: 1115 + ebb_event_add(event); 1116 + 1248 1117 ++cpuhw->n_events; 1249 1118 ++cpuhw->n_added; 1250 1119 ··· 1607 1472 } 1608 1473 } 1609 1474 1475 + /* Extra checks for EBB */ 1476 + err = ebb_event_check(event); 1477 + if (err) 1478 + return err; 1479 + 1610 1480 /* 1611 1481 * If this is in a group, check if it can go on with all the 1612 1482 * other hardware events in the 
group. We assume the event ··· 1649 1509 event->hw.event_base = cflags[n]; 1650 1510 event->hw.last_period = event->hw.sample_period; 1651 1511 local64_set(&event->hw.period_left, event->hw.last_period); 1512 + 1513 + /* 1514 + * For EBB events we just context switch the PMC value, we don't do any 1515 + * of the sample_period logic. We use hw.prev_count for this. 1516 + */ 1517 + if (is_ebb_event(event)) 1518 + local64_set(&event->hw.prev_count, 0); 1652 1519 1653 1520 /* 1654 1521 * See if we need to reserve the PMU. ··· 1933 1786 cpuhw->mmcr[0] = MMCR0_FC; 1934 1787 } 1935 1788 1936 - static int __cpuinit 1789 + static int 1937 1790 power_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu) 1938 1791 { 1939 1792 unsigned int cpu = (long)hcpu; ··· 1950 1803 return NOTIFY_OK; 1951 1804 } 1952 1805 1953 - int __cpuinit register_power_pmu(struct power_pmu *pmu) 1806 + int register_power_pmu(struct power_pmu *pmu) 1954 1807 { 1955 1808 if (ppmu) 1956 1809 return -EBUSY; /* something's already registered */
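The subtle part of `ebb_switch_in()` above is how it merges the kernel's desired MMCR0 with the bits userspace owns: EBB and PMC-access bits are forced on, the user's saved FC/PMAO bits are OR-ed in, but PMXE must never be re-enabled if userspace had cleared it (that is how the kernel detects it is inside the user's EBB handler). The bit juggling can be modeled with made-up bit values (the real POWER8 MMCR0 layout is not reproduced here):

```c
/* Toy model of the ebb_switch_in() merge; bit positions are invented
 * for the sketch, only the merging logic mirrors the code above. */
#define T_MMCR0_EBE	0x1ul	/* event-based branches enabled  */
#define T_MMCR0_PMCC_U6	0x2ul	/* user access to all 6 PMCs     */
#define T_MMCR0_PMXE	0x4ul	/* PMU exception enable          */

static unsigned long toy_ebb_switch_in(int ebb, unsigned long mmcr0,
				       unsigned long user_mmcr0)
{
	if (!ebb)
		return mmcr0;			/* no EBB: nothing to do */

	/* Enable EBB and userspace PMC access */
	mmcr0 |= T_MMCR0_EBE | T_MMCR0_PMCC_U6;

	/* Add any bits the user reg carries (FC, PMAO, ...) */
	mmcr0 |= user_mmcr0;

	/* Be careful not to set PMXE if userspace had it cleared */
	if (!(user_mmcr0 & T_MMCR0_PMXE))
		mmcr0 &= ~T_MMCR0_PMXE;

	return mmcr0;
}
```

The corresponding `ebb_switch_out()` direction is the mirror image: snapshot SIAR/SIER/SDAR and the user-visible MMCR0/MMCR2 bits into the thread struct so they survive a context switch.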
+50 -12
arch/powerpc/perf/power8-pmu.c
···
  *
  *        60        56        52        48        44        40        36        32
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *                     [      thresh_cmp     ]   [  thresh_ctl   ]
- *                                                       |
- *                              thresh start/stop OR FAB match -*
+ *   |                 [      thresh_cmp     ]   [  thresh_ctl   ]
+ *   |                                                   |
+ *   *- EBB (Linux)            thresh start/stop OR FAB match -*
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
···
  *
  */
 
+#define EVENT_EBB_MASK		1ull
 #define EVENT_THR_CMP_SHIFT	40	/* Threshold CMP value */
 #define EVENT_THR_CMP_MASK	0x3ff
 #define EVENT_THR_CTL_SHIFT	32	/* Threshold control value (start/stop) */
···
 #define EVENT_IS_MARKED		(EVENT_MARKED_MASK << EVENT_MARKED_SHIFT)
 #define EVENT_PSEL_MASK		0xff	/* PMCxSEL value */
 
+#define EVENT_VALID_MASK	\
+	((EVENT_THRESH_MASK    << EVENT_THRESH_SHIFT)		| \
+	 (EVENT_SAMPLE_MASK    << EVENT_SAMPLE_SHIFT)		| \
+	 (EVENT_CACHE_SEL_MASK << EVENT_CACHE_SEL_SHIFT)	| \
+	 (EVENT_PMC_MASK       << EVENT_PMC_SHIFT)		| \
+	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		| \
+	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		| \
+	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		| \
+	 (EVENT_EBB_MASK       << EVENT_CONFIG_EBB_SHIFT)	| \
+	  EVENT_PSEL_MASK)
+
 /* MMCRA IFM bits - POWER8 */
 #define	POWER8_MMCRA_IFM1	0x0000000040000000UL
 #define	POWER8_MMCRA_IFM2	0x0000000080000000UL
···
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *               [ ]   [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
- *                |                     |
- *      L1 I/D qualifier -*             |      Count of events for each PMC.
- *                                      |        p1, p2, p3, p4, p5, p6.
+ *           |   [ ]   [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
+ *     EBB -*    |                     |
+ *               |                     |      Count of events for each PMC.
+ *      L1 I/D qualifier -*            |        p1, p2, p3, p4, p5, p6.
  *                     nc - number of counters -*
  *
  * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints
···
 /* We just throw all the threshold bits into the constraint */
 #define CNST_THRESH_VAL(v)	(((v) & EVENT_THRESH_MASK) << 32)
 #define CNST_THRESH_MASK	CNST_THRESH_VAL(EVENT_THRESH_MASK)
+
+#define CNST_EBB_VAL(v)		(((v) & EVENT_EBB_MASK) << 24)
+#define CNST_EBB_MASK		CNST_EBB_VAL(EVENT_EBB_MASK)
 
 #define CNST_L1_QUAL_VAL(v)	(((v) & 3) << 22)
 #define CNST_L1_QUAL_MASK	CNST_L1_QUAL_VAL(3)
···
 
 static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
-	unsigned int unit, pmc, cache;
+	unsigned int unit, pmc, cache, ebb;
 	unsigned long mask, value;
 
 	mask = value = 0;
 
-	pmc   = (event >> EVENT_PMC_SHIFT)       & EVENT_PMC_MASK;
-	unit  = (event >> EVENT_UNIT_SHIFT)      & EVENT_UNIT_MASK;
-	cache = (event >> EVENT_CACHE_SEL_SHIFT) & EVENT_CACHE_SEL_MASK;
+	if (event & ~EVENT_VALID_MASK)
+		return -1;
+
+	pmc   = (event >> EVENT_PMC_SHIFT)        & EVENT_PMC_MASK;
+	unit  = (event >> EVENT_UNIT_SHIFT)       & EVENT_UNIT_MASK;
+	cache = (event >> EVENT_CACHE_SEL_SHIFT)  & EVENT_CACHE_SEL_MASK;
+	ebb   = (event >> EVENT_CONFIG_EBB_SHIFT) & EVENT_EBB_MASK;
+
+	/* Clear the EBB bit in the event, so event checks work below */
+	event &= ~(EVENT_EBB_MASK << EVENT_CONFIG_EBB_SHIFT);
 
 	if (pmc) {
 		if (pmc > 6)
···
 		mask  |= CNST_THRESH_MASK;
 		value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
 	}
+
+	if (!pmc && ebb)
+		/* EBB events must specify the PMC */
+		return -1;
+
+	/*
+	 * All events must agree on EBB, either all request it or none.
+	 * EBB events are pinned & exclusive, so this should never actually
+	 * hit, but we leave it as a fallback in case.
+	 */
+	mask  |= CNST_EBB_VAL(ebb);
+	value |= CNST_EBB_MASK;
 
 	*maskp = mask;
 	*valp = value;
···
 
 	if (pmc_inuse & 0x7c)
 		mmcr[0] |= MMCR0_PMCjCE;
+
+	/* If we're not using PMC 5 or 6, freeze them */
+	if (!(pmc_inuse & 0x60))
+		mmcr[0] |= MMCR0_FC56;
 
 	mmcr[1] = mmcr1;
 	mmcr[2] = mmcra;
···
 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
-	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB,
+	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
 	.attr_groups		= power8_pmu_attr_groups,
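The comments in power8-pmu.c describe the (mask, value) constraint scheme: each event contributes a bitmask of the fields it constrains plus the values it needs in them, and events can only co-schedule if they agree. The new EBB constraint is a single bit, so EBB and non-EBB events can never share the PMU. A simplified agreement check (this is not the kernel's full adder-field algorithm; the shift mirrors the patch, the helper is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit EBB constraint field, as in the patch above */
#define CNST_EBB_SHIFT  24
#define CNST_EBB_VAL(v) (((uint64_t)(v) & 1) << CNST_EBB_SHIFT)
#define CNST_EBB_MASK   CNST_EBB_VAL(1)

/*
 * Two events are compatible only if every field constrained by both
 * carries the same value in both. Value fields (like EBB) in the real
 * code are combined this way; adder fields (P1..P6, NC) are summed.
 */
static int constraints_agree(uint64_t mask1, uint64_t value1,
                             uint64_t mask2, uint64_t value2)
{
    uint64_t common = mask1 & mask2;   /* fields constrained by both events */

    return (value1 & common) == (value2 & common);
}
```

With both events setting CNST_EBB_MASK, an EBB and a non-EBB event disagree on the EBB bit and are rejected, which is exactly the isolation the patch wants.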
+39 -4
arch/powerpc/platforms/44x/currituck.c
···
 }
 
 #ifdef CONFIG_SMP
-static void __cpuinit smp_ppc47x_setup_cpu(int cpu)
+static void smp_ppc47x_setup_cpu(int cpu)
 {
 	mpic_setup_this_cpu();
 }
 
-static int __cpuinit smp_ppc47x_kick_cpu(int cpu)
+static int smp_ppc47x_kick_cpu(int cpu)
 {
 	struct device_node *cpunode = of_get_cpu_node(cpu, NULL);
 	const u64 *spin_table_addr_prop;
···
 	return 1;
 }
 
+static int board_rev = -1;
+static int __init ppc47x_get_board_rev(void)
+{
+	u8 fpga_reg0;
+	void *fpga;
+	struct device_node *np;
+
+	np = of_find_compatible_node(NULL, NULL, "ibm,currituck-fpga");
+	if (!np)
+		goto fail;
+
+	fpga = of_iomap(np, 0);
+	of_node_put(np);
+	if (!fpga)
+		goto fail;
+
+	fpga_reg0 = ioread8(fpga);
+	board_rev = fpga_reg0 & 0x03;
+	pr_info("%s: Found board revision %d\n", __func__, board_rev);
+	iounmap(fpga);
+	return 0;
+
+fail:
+	pr_info("%s: Unable to find board revision\n", __func__);
+	return 0;
+}
+machine_arch_initcall(ppc47x, ppc47x_get_board_rev);
+
 /* Use USB controller should have been hardware swizzled but it wasn't :( */
 static void ppc47x_pci_irq_fixup(struct pci_dev *dev)
 {
 	if (dev->vendor == 0x1033 && (dev->device == 0x0035 ||
 	    dev->device == 0x00e0)) {
-		dev->irq = irq_create_mapping(NULL, 47);
-		pr_info("%s: Mapping irq 47 %d\n", __func__, dev->irq);
+		if (board_rev == 0) {
+			dev->irq = irq_create_mapping(NULL, 47);
+			pr_info("%s: Mapping irq %d\n", __func__, dev->irq);
+		} else if (board_rev == 2) {
+			dev->irq = irq_create_mapping(NULL, 49);
+			pr_info("%s: Mapping irq %d\n", __func__, dev->irq);
+		} else {
+			pr_alert("%s: Unknown board revision\n", __func__);
+		}
 	}
 }
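The Currituck change above reads the board revision from the low two bits of FPGA register 0 and picks the USB controller's IRQ accordingly: 47 on rev 0 boards, 49 on rev 2. A user-space model of that decode (the helper name and -1 sentinel are illustrative only; the masks and IRQ numbers mirror the patch):

```c
#include <assert.h>

/* board_rev = fpga_reg0 & 0x03, then rev -> IRQ as in ppc47x_pci_irq_fixup() */
static int usb_irq_for_board(unsigned char fpga_reg0)
{
    switch (fpga_reg0 & 0x03) {
    case 0:
        return 47;      /* rev 0 boards */
    case 2:
        return 49;      /* rev 2 boards */
    default:
        return -1;      /* unknown board revision */
    }
}
```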
+2 -2
arch/powerpc/platforms/44x/iss4xx.c
···
 }
 
 #ifdef CONFIG_SMP
-static void __cpuinit smp_iss4xx_setup_cpu(int cpu)
+static void smp_iss4xx_setup_cpu(int cpu)
 {
 	mpic_setup_this_cpu();
 }
 
-static int __cpuinit smp_iss4xx_kick_cpu(int cpu)
+static int smp_iss4xx_kick_cpu(int cpu)
 {
 	struct device_node *cpunode = of_get_cpu_node(cpu, NULL);
 	const u64 *spin_table_addr_prop;
+2 -4
arch/powerpc/platforms/512x/mpc5121_ads.c
···
 	mpc83xx_add_bridge(np);
 #endif
 
-#if defined(CONFIG_FB_FSL_DIU) || defined(CONFIG_FB_FSL_DIU_MODULE)
-	mpc512x_setup_diu();
-#endif
+	mpc512x_setup_arch();
 }
 
 static void __init mpc5121_ads_init_IRQ(void)
···
 	.probe = mpc5121_ads_probe,
 	.setup_arch = mpc5121_ads_setup_arch,
 	.init = mpc512x_init,
-	.init_early = mpc512x_init_diu,
+	.init_early = mpc512x_init_early,
 	.init_IRQ = mpc5121_ads_init_IRQ,
 	.get_irq = ipic_get_irq,
 	.calibrate_decr = generic_calibrate_decr,
+3 -9
arch/powerpc/platforms/512x/mpc512x.h
···
 #ifndef __MPC512X_H__
 #define __MPC512X_H__
 extern void __init mpc512x_init_IRQ(void);
+extern void __init mpc512x_init_early(void);
 extern void __init mpc512x_init(void);
+extern void __init mpc512x_setup_arch(void);
 extern int __init mpc5121_clk_init(void);
-void __init mpc512x_declare_of_platform_devices(void);
 extern const char *mpc512x_select_psc_compat(void);
+extern const char *mpc512x_select_reset_compat(void);
 extern void mpc512x_restart(char *cmd);
-
-#if defined(CONFIG_FB_FSL_DIU) || defined(CONFIG_FB_FSL_DIU_MODULE)
-void mpc512x_init_diu(void);
-void mpc512x_setup_diu(void);
-#else
-#define mpc512x_init_diu NULL
-#define mpc512x_setup_diu NULL
-#endif
 
 #endif /* __MPC512X_H__ */
+2 -2
arch/powerpc/platforms/512x/mpc512x_generic.c
···
 	.name = "MPC512x generic",
 	.probe = mpc512x_generic_probe,
 	.init = mpc512x_init,
-	.init_early = mpc512x_init_diu,
-	.setup_arch = mpc512x_setup_diu,
+	.init_early = mpc512x_init_early,
+	.setup_arch = mpc512x_setup_arch,
 	.init_IRQ = mpc512x_init_IRQ,
 	.get_irq = ipic_get_irq,
 	.calibrate_decr = generic_calibrate_decr,
+28 -3
arch/powerpc/platforms/512x/mpc512x_shared.c
···
 static void __init mpc512x_restart_init(void)
 {
 	struct device_node *np;
+	const char *reset_compat;
 
-	np = of_find_compatible_node(NULL, NULL, "fsl,mpc5121-reset");
+	reset_compat = mpc512x_select_reset_compat();
+	np = of_find_compatible_node(NULL, NULL, reset_compat);
 	if (!np)
 		return;
···
 		;
 }
 
-#if defined(CONFIG_FB_FSL_DIU) || defined(CONFIG_FB_FSL_DIU_MODULE)
+#if IS_ENABLED(CONFIG_FB_FSL_DIU)
 
 struct fsl_diu_shared_fb {
 	u8		gamma[0x300];	/* 32-bit aligned! */
···
 	return NULL;
 }
 
+const char *mpc512x_select_reset_compat(void)
+{
+	if (of_machine_is_compatible("fsl,mpc5121"))
+		return "fsl,mpc5121-reset";
+
+	if (of_machine_is_compatible("fsl,mpc5125"))
+		return "fsl,mpc5125-reset";
+
+	return NULL;
+}
+
 static unsigned int __init get_fifo_size(struct device_node *np,
 					 char *prop_name)
 {
···
 	}
 }
 
+void __init mpc512x_init_early(void)
+{
+	mpc512x_restart_init();
+	if (IS_ENABLED(CONFIG_FB_FSL_DIU))
+		mpc512x_init_diu();
+}
+
 void __init mpc512x_init(void)
 {
 	mpc5121_clk_init();
 	mpc512x_declare_of_platform_devices();
-	mpc512x_restart_init();
 	mpc512x_psc_fifo_init();
+}
+
+void __init mpc512x_setup_arch(void)
+{
+	if (IS_ENABLED(CONFIG_FB_FSL_DIU))
+		mpc512x_setup_diu();
 }
 
 /**
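The mpc512x_shared.c rework replaces `#if defined(CONFIG_FB_FSL_DIU) ...` preprocessor blocks with `IS_ENABLED()`, which keeps the call site compiled in every configuration and relies on dead-code elimination to drop the disabled branch. A minimal user-space model of that pattern (the kernel's IS_ENABLED() inspects config symbols; here it is just a constant, and the stub stands in for mpc512x_init_diu()):

```c
#include <assert.h>

#define IS_ENABLED(option) (option)   /* model only; not the kernel macro */
#define CONFIG_FB_FSL_DIU 1

static int diu_initialized;

static void mpc512x_init_diu_stub(void)
{
    diu_initialized = 1;
}

/* Mirrors the shape of mpc512x_init_early() in the patch above */
static void init_early_model(void)
{
    if (IS_ENABLED(CONFIG_FB_FSL_DIU))
        mpc512x_init_diu_stub();
}
```

The advantage over `#ifdef` is that both branches are always parsed and type-checked, so a disabled config can no longer hide bit-rot.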
+2 -2
arch/powerpc/platforms/512x/pdm360ng.c
···
 define_machine(pdm360ng) {
 	.name = "PDM360NG",
 	.probe = pdm360ng_probe,
-	.setup_arch = mpc512x_setup_diu,
+	.setup_arch = mpc512x_setup_arch,
 	.init = pdm360ng_init,
-	.init_early = mpc512x_init_diu,
+	.init_early = mpc512x_init_early,
 	.init_IRQ = mpc512x_init_IRQ,
 	.get_irq = ipic_get_irq,
 	.calibrate_decr = generic_calibrate_decr,
+1 -11
arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
···
 	.id_table = mcu_ids,
 };
 
-static int __init mcu_init(void)
-{
-	return i2c_add_driver(&mcu_driver);
-}
-module_init(mcu_init);
-
-static void __exit mcu_exit(void)
-{
-	i2c_del_driver(&mcu_driver);
-}
-module_exit(mcu_exit);
+module_i2c_driver(mcu_driver);
 
 MODULE_DESCRIPTION("Power Management and GPIO expander driver for "
 		   "MPC8349E-mITX-compatible MCU");
-5
arch/powerpc/platforms/85xx/p5020_ds.c
···
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
 #endif
-/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
-#ifdef CONFIG_PPC64
-	.get_irq		= mpic_get_irq,
-#else
 	.get_irq		= mpic_get_coreint_irq,
-#endif
 	.restart		= fsl_rstcr_restart,
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= udbg_progress,
-5
arch/powerpc/platforms/85xx/p5040_ds.c
···
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
 #endif
-/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
-#ifdef CONFIG_PPC64
-	.get_irq		= mpic_get_irq,
-#else
 	.get_irq		= mpic_get_coreint_irq,
-#endif
 	.restart		= fsl_rstcr_restart,
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= udbg_progress,
+3 -3
arch/powerpc/platforms/85xx/smp.c
···
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static void __cpuinit smp_85xx_mach_cpu_die(void)
+static void smp_85xx_mach_cpu_die(void)
 {
 	unsigned int cpu = smp_processor_id();
 	u32 tmp;
···
 	return in_be32(&((struct epapr_spin_table *)spin_table)->addr_l);
 }
 
-static int __cpuinit smp_85xx_kick_cpu(int nr)
+static int smp_85xx_kick_cpu(int nr)
 {
 	unsigned long flags;
 	const u64 *cpu_rel_addr;
···
 }
 #endif /* CONFIG_KEXEC */
 
-static void __cpuinit smp_85xx_setup_cpu(int cpu_nr)
+static void smp_85xx_setup_cpu(int cpu_nr)
 {
 	if (smp_85xx_ops.probe == smp_mpic_probe)
 		mpic_setup_this_cpu();
-5
arch/powerpc/platforms/85xx/t4240_qds.c
···
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
 #endif
-/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
-#ifdef CONFIG_PPC64
-	.get_irq		= mpic_get_irq,
-#else
 	.get_irq		= mpic_get_coreint_irq,
-#endif
 	.restart		= fsl_rstcr_restart,
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= udbg_progress,
+4 -10
arch/powerpc/platforms/8xx/m8xx_setup.c
···
 
 static struct irqaction tbint_irqaction = {
 	.handler = timebase_interrupt,
+	.flags = IRQF_NO_THREAD,
 	.name = "tbint",
 };
···
 
 static void cpm_cascade(unsigned int irq, struct irq_desc *desc)
 {
-	struct irq_chip *chip;
-	int cascade_irq;
+	struct irq_chip *chip = irq_desc_get_chip(desc);
+	int cascade_irq = cpm_get_irq();
 
-	if ((cascade_irq = cpm_get_irq()) >= 0) {
-		struct irq_desc *cdesc = irq_to_desc(cascade_irq);
-
+	if (cascade_irq >= 0)
 		generic_handle_irq(cascade_irq);
 
-		chip = irq_desc_get_chip(cdesc);
-		chip->irq_eoi(&cdesc->irq_data);
-	}
-
-	chip = irq_desc_get_chip(desc);
 	chip->irq_eoi(&desc->irq_data);
 }
+26
arch/powerpc/platforms/Kconfig
···
 	bool
 	default n
 
+config MPIC_TIMER
+	bool "MPIC Global Timer"
+	depends on MPIC && FSL_SOC
+	default n
+	help
+	  The MPIC global timer is a hardware timer inside the
+	  Freescale PIC complying with OpenPIC standard. When the
+	  specified interval times out, the hardware timer generates
+	  an interrupt. The driver currently is only tested on fsl
+	  chip, but it can potentially support other global timers
+	  complying with the OpenPIC standard.
+
+config FSL_MPIC_TIMER_WAKEUP
+	tristate "Freescale MPIC global timer wakeup driver"
+	depends on FSL_SOC && MPIC_TIMER && PM
+	default n
+	help
+	  The driver provides a way to wake up the system by MPIC
+	  timer.
+	  e.g. "echo 5 > /sys/devices/system/mpic/timer_wakeup"
+
 config PPC_EPAPR_HV_PIC
 	bool
 	default n
···
 	bool "Support for GX bus based adapters"
 	help
 	  Bus device driver for GX bus based adapters.
+
+config EEH
+	bool
+	depends on (PPC_POWERNV || PPC_PSERIES) && PCI
+	default y
 
 config PPC_MPC106
 	bool
+1
arch/powerpc/platforms/Kconfig.cputype
···
 	select PPC_FPU
 	select PPC_HAVE_PMU_SUPPORT
 	select SYS_SUPPORTS_HUGETLBFS
+	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
 
 config PPC_BOOK3E_64
 	bool "Embedded processors"
+10 -6
arch/powerpc/platforms/cell/beat_htab.c
···
 static long beat_lpar_hpte_updatepp(unsigned long slot,
 				    unsigned long newpp,
 				    unsigned long vpn,
-				    int psize, int ssize, int local)
+				    int psize, int apsize,
+				    int ssize, int local)
 {
 	unsigned long lpar_rc;
 	u64 dummy0, dummy1;
···
 }
 
 static void beat_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
-				      int psize, int ssize, int local)
+				      int psize, int apsize,
+				      int ssize, int local)
 {
 	unsigned long want_v;
 	unsigned long lpar_rc;
···
  * already zero. For now I am paranoid.
  */
 static long beat_lpar_hpte_updatepp_v3(unsigned long slot,
 				       unsigned long newpp,
 				       unsigned long vpn,
-				       int psize, int ssize, int local)
+				       int psize, int apsize,
+				       int ssize, int local)
 {
 	unsigned long lpar_rc;
 	unsigned long want_v;
···
 }
 
 static void beat_lpar_hpte_invalidate_v3(unsigned long slot, unsigned long vpn,
-					 int psize, int ssize, int local)
+					 int psize, int apsize,
+					 int ssize, int local)
 {
 	unsigned long want_v;
 	unsigned long lpar_rc;
+1 -1
arch/powerpc/platforms/cell/smp.c
···
 	 * during boot if the user requests it.  Odd-numbered
 	 * cpus are assumed to be secondary threads.
 	 */
-	if (system_state < SYSTEM_RUNNING &&
+	if (system_state == SYSTEM_BOOTING &&
 	    cpu_has_feature(CPU_FTR_SMT) &&
 	    !smt_enabled_at_boot && cpu_thread_in_core(nr) != 0)
 		return 0;
+1 -1
arch/powerpc/platforms/powermac/smp.c
···
 	return NOTIFY_OK;
 }
 
-static struct notifier_block __cpuinitdata smp_core99_cpu_nb = {
+static struct notifier_block smp_core99_cpu_nb = {
 	.notifier_call	= smp_core99_cpu_notify,
 };
 #endif /* CONFIG_HOTPLUG_CPU */
+1
arch/powerpc/platforms/powernv/Makefile
···
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+916
arch/powerpc/platforms/powernv/eeh-ioda.c
··· 1 + /* 2 + * The file intends to implement the functions needed by EEH, which is 3 + * built on IODA compliant chip. Actually, lots of functions related 4 + * to EEH would be built based on the OPAL APIs. 5 + * 6 + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013. 7 + * 8 + * This program is free software; you can redistribute it and/or modify 9 + * it under the terms of the GNU General Public License as published by 10 + * the Free Software Foundation; either version 2 of the License, or 11 + * (at your option) any later version. 12 + */ 13 + 14 + #include <linux/bootmem.h> 15 + #include <linux/debugfs.h> 16 + #include <linux/delay.h> 17 + #include <linux/init.h> 18 + #include <linux/io.h> 19 + #include <linux/irq.h> 20 + #include <linux/kernel.h> 21 + #include <linux/msi.h> 22 + #include <linux/notifier.h> 23 + #include <linux/pci.h> 24 + #include <linux/string.h> 25 + 26 + #include <asm/eeh.h> 27 + #include <asm/eeh_event.h> 28 + #include <asm/io.h> 29 + #include <asm/iommu.h> 30 + #include <asm/msi_bitmap.h> 31 + #include <asm/opal.h> 32 + #include <asm/pci-bridge.h> 33 + #include <asm/ppc-pci.h> 34 + #include <asm/tce.h> 35 + 36 + #include "powernv.h" 37 + #include "pci.h" 38 + 39 + /* Debugging option */ 40 + #ifdef IODA_EEH_DBG_ON 41 + #define IODA_EEH_DBG(args...) pr_info(args) 42 + #else 43 + #define IODA_EEH_DBG(args...) 
44 + #endif 45 + 46 + static char *hub_diag = NULL; 47 + static int ioda_eeh_nb_init = 0; 48 + 49 + static int ioda_eeh_event(struct notifier_block *nb, 50 + unsigned long events, void *change) 51 + { 52 + uint64_t changed_evts = (uint64_t)change; 53 + 54 + /* We simply send special EEH event */ 55 + if ((changed_evts & OPAL_EVENT_PCI_ERROR) && 56 + (events & OPAL_EVENT_PCI_ERROR)) 57 + eeh_send_failure_event(NULL); 58 + 59 + return 0; 60 + } 61 + 62 + static struct notifier_block ioda_eeh_nb = { 63 + .notifier_call = ioda_eeh_event, 64 + .next = NULL, 65 + .priority = 0 66 + }; 67 + 68 + #ifdef CONFIG_DEBUG_FS 69 + static int ioda_eeh_dbgfs_set(void *data, u64 val) 70 + { 71 + struct pci_controller *hose = data; 72 + struct pnv_phb *phb = hose->private_data; 73 + 74 + out_be64(phb->regs + 0xD10, val); 75 + return 0; 76 + } 77 + 78 + static int ioda_eeh_dbgfs_get(void *data, u64 *val) 79 + { 80 + struct pci_controller *hose = data; 81 + struct pnv_phb *phb = hose->private_data; 82 + 83 + *val = in_be64(phb->regs + 0xD10); 84 + return 0; 85 + } 86 + 87 + DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_dbgfs_ops, ioda_eeh_dbgfs_get, 88 + ioda_eeh_dbgfs_set, "0x%llx\n"); 89 + #endif /* CONFIG_DEBUG_FS */ 90 + 91 + /** 92 + * ioda_eeh_post_init - Chip dependent post initialization 93 + * @hose: PCI controller 94 + * 95 + * The function will be called after eeh PEs and devices 96 + * have been built. That means the EEH is ready to supply 97 + * service with I/O cache. 
98 + */ 99 + static int ioda_eeh_post_init(struct pci_controller *hose) 100 + { 101 + struct pnv_phb *phb = hose->private_data; 102 + int ret; 103 + 104 + /* Register OPAL event notifier */ 105 + if (!ioda_eeh_nb_init) { 106 + ret = opal_notifier_register(&ioda_eeh_nb); 107 + if (ret) { 108 + pr_err("%s: Can't register OPAL event notifier (%d)\n", 109 + __func__, ret); 110 + return ret; 111 + } 112 + 113 + ioda_eeh_nb_init = 1; 114 + } 115 + 116 + /* FIXME: Enable it for PHB3 later */ 117 + if (phb->type == PNV_PHB_IODA1) { 118 + if (!hub_diag) { 119 + hub_diag = (char *)__get_free_page(GFP_KERNEL | 120 + __GFP_ZERO); 121 + if (!hub_diag) { 122 + pr_err("%s: Out of memory !\n", 123 + __func__); 124 + return -ENOMEM; 125 + } 126 + } 127 + 128 + #ifdef CONFIG_DEBUG_FS 129 + if (phb->dbgfs) 130 + debugfs_create_file("err_injct", 0600, 131 + phb->dbgfs, hose, 132 + &ioda_eeh_dbgfs_ops); 133 + #endif 134 + 135 + phb->eeh_state |= PNV_EEH_STATE_ENABLED; 136 + } 137 + 138 + return 0; 139 + } 140 + 141 + /** 142 + * ioda_eeh_set_option - Set EEH operation or I/O setting 143 + * @pe: EEH PE 144 + * @option: options 145 + * 146 + * Enable or disable EEH option for the indicated PE. The 147 + * function also can be used to enable I/O or DMA for the 148 + * PE. 
149 + */ 150 + static int ioda_eeh_set_option(struct eeh_pe *pe, int option) 151 + { 152 + s64 ret; 153 + u32 pe_no; 154 + struct pci_controller *hose = pe->phb; 155 + struct pnv_phb *phb = hose->private_data; 156 + 157 + /* Check on PE number */ 158 + if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) { 159 + pr_err("%s: PE address %x out of range [0, %x] " 160 + "on PHB#%x\n", 161 + __func__, pe->addr, phb->ioda.total_pe, 162 + hose->global_number); 163 + return -EINVAL; 164 + } 165 + 166 + pe_no = pe->addr; 167 + switch (option) { 168 + case EEH_OPT_DISABLE: 169 + ret = -EEXIST; 170 + break; 171 + case EEH_OPT_ENABLE: 172 + ret = 0; 173 + break; 174 + case EEH_OPT_THAW_MMIO: 175 + ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, 176 + OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO); 177 + if (ret) { 178 + pr_warning("%s: Failed to enable MMIO for " 179 + "PHB#%x-PE#%x, err=%lld\n", 180 + __func__, hose->global_number, pe_no, ret); 181 + return -EIO; 182 + } 183 + 184 + break; 185 + case EEH_OPT_THAW_DMA: 186 + ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, 187 + OPAL_EEH_ACTION_CLEAR_FREEZE_DMA); 188 + if (ret) { 189 + pr_warning("%s: Failed to enable DMA for " 190 + "PHB#%x-PE#%x, err=%lld\n", 191 + __func__, hose->global_number, pe_no, ret); 192 + return -EIO; 193 + } 194 + 195 + break; 196 + default: 197 + pr_warning("%s: Invalid option %d\n", __func__, option); 198 + return -EINVAL; 199 + } 200 + 201 + return ret; 202 + } 203 + 204 + /** 205 + * ioda_eeh_get_state - Retrieve the state of PE 206 + * @pe: EEH PE 207 + * 208 + * The PE's state should be retrieved from the PEEV, PEST 209 + * IODA tables. Since the OPAL has exported the function 210 + * to do it, it'd better to use that. 
211 + */ 212 + static int ioda_eeh_get_state(struct eeh_pe *pe) 213 + { 214 + s64 ret = 0; 215 + u8 fstate; 216 + u16 pcierr; 217 + u32 pe_no; 218 + int result; 219 + struct pci_controller *hose = pe->phb; 220 + struct pnv_phb *phb = hose->private_data; 221 + 222 + /* 223 + * Sanity check on PE address. The PHB PE address should 224 + * be zero. 225 + */ 226 + if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) { 227 + pr_err("%s: PE address %x out of range [0, %x] " 228 + "on PHB#%x\n", 229 + __func__, pe->addr, phb->ioda.total_pe, 230 + hose->global_number); 231 + return EEH_STATE_NOT_SUPPORT; 232 + } 233 + 234 + /* Retrieve PE status through OPAL */ 235 + pe_no = pe->addr; 236 + ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, 237 + &fstate, &pcierr, NULL); 238 + if (ret) { 239 + pr_err("%s: Failed to get EEH status on " 240 + "PHB#%x-PE#%x\n, err=%lld\n", 241 + __func__, hose->global_number, pe_no, ret); 242 + return EEH_STATE_NOT_SUPPORT; 243 + } 244 + 245 + /* Check PHB status */ 246 + if (pe->type & EEH_PE_PHB) { 247 + result = 0; 248 + result &= ~EEH_STATE_RESET_ACTIVE; 249 + 250 + if (pcierr != OPAL_EEH_PHB_ERROR) { 251 + result |= EEH_STATE_MMIO_ACTIVE; 252 + result |= EEH_STATE_DMA_ACTIVE; 253 + result |= EEH_STATE_MMIO_ENABLED; 254 + result |= EEH_STATE_DMA_ENABLED; 255 + } 256 + 257 + return result; 258 + } 259 + 260 + /* Parse result out */ 261 + result = 0; 262 + switch (fstate) { 263 + case OPAL_EEH_STOPPED_NOT_FROZEN: 264 + result &= ~EEH_STATE_RESET_ACTIVE; 265 + result |= EEH_STATE_MMIO_ACTIVE; 266 + result |= EEH_STATE_DMA_ACTIVE; 267 + result |= EEH_STATE_MMIO_ENABLED; 268 + result |= EEH_STATE_DMA_ENABLED; 269 + break; 270 + case OPAL_EEH_STOPPED_MMIO_FREEZE: 271 + result &= ~EEH_STATE_RESET_ACTIVE; 272 + result |= EEH_STATE_DMA_ACTIVE; 273 + result |= EEH_STATE_DMA_ENABLED; 274 + break; 275 + case OPAL_EEH_STOPPED_DMA_FREEZE: 276 + result &= ~EEH_STATE_RESET_ACTIVE; 277 + result |= EEH_STATE_MMIO_ACTIVE; 278 + result |= 
EEH_STATE_MMIO_ENABLED; 279 + break; 280 + case OPAL_EEH_STOPPED_MMIO_DMA_FREEZE: 281 + result &= ~EEH_STATE_RESET_ACTIVE; 282 + break; 283 + case OPAL_EEH_STOPPED_RESET: 284 + result |= EEH_STATE_RESET_ACTIVE; 285 + break; 286 + case OPAL_EEH_STOPPED_TEMP_UNAVAIL: 287 + result |= EEH_STATE_UNAVAILABLE; 288 + break; 289 + case OPAL_EEH_STOPPED_PERM_UNAVAIL: 290 + result |= EEH_STATE_NOT_SUPPORT; 291 + break; 292 + default: 293 + pr_warning("%s: Unexpected EEH status 0x%x " 294 + "on PHB#%x-PE#%x\n", 295 + __func__, fstate, hose->global_number, pe_no); 296 + } 297 + 298 + return result; 299 + } 300 + 301 + static int ioda_eeh_pe_clear(struct eeh_pe *pe) 302 + { 303 + struct pci_controller *hose; 304 + struct pnv_phb *phb; 305 + u32 pe_no; 306 + u8 fstate; 307 + u16 pcierr; 308 + s64 ret; 309 + 310 + pe_no = pe->addr; 311 + hose = pe->phb; 312 + phb = pe->phb->private_data; 313 + 314 + /* Clear the EEH error on the PE */ 315 + ret = opal_pci_eeh_freeze_clear(phb->opal_id, 316 + pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); 317 + if (ret) { 318 + pr_err("%s: Failed to clear EEH error for " 319 + "PHB#%x-PE#%x, err=%lld\n", 320 + __func__, hose->global_number, pe_no, ret); 321 + return -EIO; 322 + } 323 + 324 + /* 325 + * Read the PE state back and verify that the frozen 326 + * state has been removed. 
327 + */ 328 + ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, 329 + &fstate, &pcierr, NULL); 330 + if (ret) { 331 + pr_err("%s: Failed to get EEH status on " 332 + "PHB#%x-PE#%x\n, err=%lld\n", 333 + __func__, hose->global_number, pe_no, ret); 334 + return -EIO; 335 + } 336 + 337 + if (fstate != OPAL_EEH_STOPPED_NOT_FROZEN) { 338 + pr_err("%s: Frozen state not cleared on " 339 + "PHB#%x-PE#%x, sts=%x\n", 340 + __func__, hose->global_number, pe_no, fstate); 341 + return -EIO; 342 + } 343 + 344 + return 0; 345 + } 346 + 347 + static s64 ioda_eeh_phb_poll(struct pnv_phb *phb) 348 + { 349 + s64 rc = OPAL_HARDWARE; 350 + 351 + while (1) { 352 + rc = opal_pci_poll(phb->opal_id); 353 + if (rc <= 0) 354 + break; 355 + 356 + msleep(rc); 357 + } 358 + 359 + return rc; 360 + } 361 + 362 + static int ioda_eeh_phb_reset(struct pci_controller *hose, int option) 363 + { 364 + struct pnv_phb *phb = hose->private_data; 365 + s64 rc = OPAL_HARDWARE; 366 + 367 + pr_debug("%s: Reset PHB#%x, option=%d\n", 368 + __func__, hose->global_number, option); 369 + 370 + /* Issue PHB complete reset request */ 371 + if (option == EEH_RESET_FUNDAMENTAL || 372 + option == EEH_RESET_HOT) 373 + rc = opal_pci_reset(phb->opal_id, 374 + OPAL_PHB_COMPLETE, 375 + OPAL_ASSERT_RESET); 376 + else if (option == EEH_RESET_DEACTIVATE) 377 + rc = opal_pci_reset(phb->opal_id, 378 + OPAL_PHB_COMPLETE, 379 + OPAL_DEASSERT_RESET); 380 + if (rc < 0) 381 + goto out; 382 + 383 + /* 384 + * Poll state of the PHB until the request is done 385 + * successfully. 
386 + */ 387 + rc = ioda_eeh_phb_poll(phb); 388 + out: 389 + if (rc != OPAL_SUCCESS) 390 + return -EIO; 391 + 392 + return 0; 393 + } 394 + 395 + static int ioda_eeh_root_reset(struct pci_controller *hose, int option) 396 + { 397 + struct pnv_phb *phb = hose->private_data; 398 + s64 rc = OPAL_SUCCESS; 399 + 400 + pr_debug("%s: Reset PHB#%x, option=%d\n", 401 + __func__, hose->global_number, option); 402 + 403 + /* 404 + * During the reset deassert time, we needn't care 405 + * the reset scope because the firmware does nothing 406 + * for fundamental or hot reset during deassert phase. 407 + */ 408 + if (option == EEH_RESET_FUNDAMENTAL) 409 + rc = opal_pci_reset(phb->opal_id, 410 + OPAL_PCI_FUNDAMENTAL_RESET, 411 + OPAL_ASSERT_RESET); 412 + else if (option == EEH_RESET_HOT) 413 + rc = opal_pci_reset(phb->opal_id, 414 + OPAL_PCI_HOT_RESET, 415 + OPAL_ASSERT_RESET); 416 + else if (option == EEH_RESET_DEACTIVATE) 417 + rc = opal_pci_reset(phb->opal_id, 418 + OPAL_PCI_HOT_RESET, 419 + OPAL_DEASSERT_RESET); 420 + if (rc < 0) 421 + goto out; 422 + 423 + /* Poll state of the PHB until the request is done */ 424 + rc = ioda_eeh_phb_poll(phb); 425 + out: 426 + if (rc != OPAL_SUCCESS) 427 + return -EIO; 428 + 429 + return 0; 430 + } 431 + 432 + static int ioda_eeh_bridge_reset(struct pci_controller *hose, 433 + struct pci_dev *dev, int option) 434 + { 435 + u16 ctrl; 436 + 437 + pr_debug("%s: Reset device %04x:%02x:%02x.%01x with option %d\n", 438 + __func__, hose->global_number, dev->bus->number, 439 + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), option); 440 + 441 + switch (option) { 442 + case EEH_RESET_FUNDAMENTAL: 443 + case EEH_RESET_HOT: 444 + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); 445 + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; 446 + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); 447 + break; 448 + case EEH_RESET_DEACTIVATE: 449 + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); 450 + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; 451 + 
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); 452 + break; 453 + } 454 + 455 + return 0; 456 + } 457 + 458 + /** 459 + * ioda_eeh_reset - Reset the indicated PE 460 + * @pe: EEH PE 461 + * @option: reset option 462 + * 463 + * Do reset on the indicated PE. For PCI bus sensitive PE, 464 + * we need to reset the parent p2p bridge. The PHB has to 465 + * be reinitialized if the p2p bridge is root bridge. For 466 + * PCI device sensitive PE, we will try to reset the device 467 + * through FLR. For now, we don't have OPAL APIs to do HARD 468 + * reset yet, so all reset would be SOFT (HOT) reset. 469 + */ 470 + static int ioda_eeh_reset(struct eeh_pe *pe, int option) 471 + { 472 + struct pci_controller *hose = pe->phb; 473 + struct eeh_dev *edev; 474 + struct pci_dev *dev; 475 + int ret; 476 + 477 + /* 478 + * Anyway, we have to clear the problematic state for the 479 + * corresponding PE. However, we needn't do it if the PE 480 + * is PHB associated. That means the PHB is having fatal 481 + * errors and it needs reset. Further more, the AIB interface 482 + * isn't reliable any more. 483 + */ 484 + if (!(pe->type & EEH_PE_PHB) && 485 + (option == EEH_RESET_HOT || 486 + option == EEH_RESET_FUNDAMENTAL)) { 487 + ret = ioda_eeh_pe_clear(pe); 488 + if (ret) 489 + return -EIO; 490 + } 491 + 492 + /* 493 + * The rules applied to reset, either fundamental or hot reset: 494 + * 495 + * We always reset the direct upstream bridge of the PE. If the 496 + * direct upstream bridge isn't root bridge, we always take hot 497 + * reset no matter what option (fundamental or hot) is. Otherwise, 498 + * we should do the reset according to the required option. 499 + */ 500 + if (pe->type & EEH_PE_PHB) { 501 + ret = ioda_eeh_phb_reset(hose, option); 502 + } else { 503 + if (pe->type & EEH_PE_DEVICE) { 504 + /* 505 + * If it's device PE, we didn't refer to the parent 506 + * PCI bus yet. So we have to figure it out indirectly. 
507 + */ 508 + edev = list_first_entry(&pe->edevs, 509 + struct eeh_dev, list); 510 + dev = eeh_dev_to_pci_dev(edev); 511 + dev = dev->bus->self; 512 + } else { 513 + /* 514 + * If it's bus PE, the parent PCI bus is already there 515 + * and just pick it up. 516 + */ 517 + dev = pe->bus->self; 518 + } 519 + 520 + /* 521 + * Do reset based on the fact that the direct upstream bridge 522 + * is root bridge (port) or not. 523 + */ 524 + if (dev->bus->number == 0) 525 + ret = ioda_eeh_root_reset(hose, option); 526 + else 527 + ret = ioda_eeh_bridge_reset(hose, dev, option); 528 + } 529 + 530 + return ret; 531 + } 532 + 533 + /** 534 + * ioda_eeh_get_log - Retrieve error log 535 + * @pe: EEH PE 536 + * @severity: Severity level of the log 537 + * @drv_log: buffer to store the log 538 + * @len: space of the log buffer 539 + * 540 + * The function is used to retrieve error log from P7IOC. 541 + */ 542 + static int ioda_eeh_get_log(struct eeh_pe *pe, int severity, 543 + char *drv_log, unsigned long len) 544 + { 545 + s64 ret; 546 + unsigned long flags; 547 + struct pci_controller *hose = pe->phb; 548 + struct pnv_phb *phb = hose->private_data; 549 + 550 + spin_lock_irqsave(&phb->lock, flags); 551 + 552 + ret = opal_pci_get_phb_diag_data2(phb->opal_id, 553 + phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE); 554 + if (ret) { 555 + spin_unlock_irqrestore(&phb->lock, flags); 556 + pr_warning("%s: Failed to get log for PHB#%x-PE#%x\n", 557 + __func__, hose->global_number, pe->addr); 558 + return -EIO; 559 + } 560 + 561 + /* 562 + * FIXME: We probably need log the error in somewhere. 563 + * Lets make it up in future. 564 + */ 565 + /* pr_info("%s", phb->diag.blob); */ 566 + 567 + spin_unlock_irqrestore(&phb->lock, flags); 568 + 569 + return 0; 570 + } 571 + 572 + /** 573 + * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE 574 + * @pe: EEH PE 575 + * 576 + * For particular PE, it might have included PCI bridges. 
In order 577 + to make the PE work properly, those PCI bridges should be configured 578 + correctly. However, nothing needs to be done on P7IOC since the reset 579 + function will do everything that should be covered by the function. 580 + */ 581 + static int ioda_eeh_configure_bridge(struct eeh_pe *pe) 582 + { 583 + return 0; 584 + } 585 + 586 + static void ioda_eeh_hub_diag_common(struct OpalIoP7IOCErrorData *data) 587 + { 588 + /* GEM */ 589 + pr_info(" GEM XFIR: %016llx\n", data->gemXfir); 590 + pr_info(" GEM RFIR: %016llx\n", data->gemRfir); 591 + pr_info(" GEM RIRQFIR: %016llx\n", data->gemRirqfir); 592 + pr_info(" GEM Mask: %016llx\n", data->gemMask); 593 + pr_info(" GEM RWOF: %016llx\n", data->gemRwof); 594 + 595 + /* LEM */ 596 + pr_info(" LEM FIR: %016llx\n", data->lemFir); 597 + pr_info(" LEM Error Mask: %016llx\n", data->lemErrMask); 598 + pr_info(" LEM Action 0: %016llx\n", data->lemAction0); 599 + pr_info(" LEM Action 1: %016llx\n", data->lemAction1); 600 + pr_info(" LEM WOF: %016llx\n", data->lemWof); 601 + } 602 + 603 + static void ioda_eeh_hub_diag(struct pci_controller *hose) 604 + { 605 + struct pnv_phb *phb = hose->private_data; 606 + struct OpalIoP7IOCErrorData *data; 607 + long rc; 608 + 609 + data = (struct OpalIoP7IOCErrorData *)phb->diag.blob; 610 + rc = opal_pci_get_hub_diag_data(phb->hub_id, data, PNV_PCI_DIAG_BUF_SIZE); 611 + if (rc != OPAL_SUCCESS) { 612 + pr_warning("%s: Failed to get HUB#%llx diag-data (%ld)\n", 613 + __func__, phb->hub_id, rc); 614 + return; 615 + } 616 + 617 + switch (data->type) { 618 + case OPAL_P7IOC_DIAG_TYPE_RGC: 619 + pr_info("P7IOC diag-data for RGC\n\n"); 620 + ioda_eeh_hub_diag_common(data); 621 + pr_info(" RGC Status: %016llx\n", data->rgc.rgcStatus); 622 + pr_info(" RGC LDCP: %016llx\n", data->rgc.rgcLdcp); 623 + break; 624 + case OPAL_P7IOC_DIAG_TYPE_BI: 625 + pr_info("P7IOC diag-data for BI %s\n\n", 626 + data->bi.biDownbound ? 
"Downbound" : "Upbound"); 627 + ioda_eeh_hub_diag_common(data); 628 + pr_info(" BI LDCP 0: %016llx\n", data->bi.biLdcp0); 629 + pr_info(" BI LDCP 1: %016llx\n", data->bi.biLdcp1); 630 + pr_info(" BI LDCP 2: %016llx\n", data->bi.biLdcp2); 631 + pr_info(" BI Fence Status: %016llx\n", data->bi.biFenceStatus); 632 + break; 633 + case OPAL_P7IOC_DIAG_TYPE_CI: 634 + pr_info("P7IOC diag-data for CI Port %d\\nn", 635 + data->ci.ciPort); 636 + ioda_eeh_hub_diag_common(data); 637 + pr_info(" CI Port Status: %016llx\n", data->ci.ciPortStatus); 638 + pr_info(" CI Port LDCP: %016llx\n", data->ci.ciPortLdcp); 639 + break; 640 + case OPAL_P7IOC_DIAG_TYPE_MISC: 641 + pr_info("P7IOC diag-data for MISC\n\n"); 642 + ioda_eeh_hub_diag_common(data); 643 + break; 644 + case OPAL_P7IOC_DIAG_TYPE_I2C: 645 + pr_info("P7IOC diag-data for I2C\n\n"); 646 + ioda_eeh_hub_diag_common(data); 647 + break; 648 + default: 649 + pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n", 650 + __func__, phb->hub_id, data->type); 651 + } 652 + } 653 + 654 + static void ioda_eeh_p7ioc_phb_diag(struct pci_controller *hose, 655 + struct OpalIoPhbErrorCommon *common) 656 + { 657 + struct OpalIoP7IOCPhbErrorData *data; 658 + int i; 659 + 660 + data = (struct OpalIoP7IOCPhbErrorData *)common; 661 + 662 + pr_info("P7IOC PHB#%x Diag-data (Version: %d)\n\n", 663 + hose->global_number, common->version); 664 + 665 + pr_info(" brdgCtl: %08x\n", data->brdgCtl); 666 + 667 + pr_info(" portStatusReg: %08x\n", data->portStatusReg); 668 + pr_info(" rootCmplxStatus: %08x\n", data->rootCmplxStatus); 669 + pr_info(" busAgentStatus: %08x\n", data->busAgentStatus); 670 + 671 + pr_info(" deviceStatus: %08x\n", data->deviceStatus); 672 + pr_info(" slotStatus: %08x\n", data->slotStatus); 673 + pr_info(" linkStatus: %08x\n", data->linkStatus); 674 + pr_info(" devCmdStatus: %08x\n", data->devCmdStatus); 675 + pr_info(" devSecStatus: %08x\n", data->devSecStatus); 676 + 677 + pr_info(" rootErrorStatus: %08x\n", 
data->rootErrorStatus); 678 + pr_info(" uncorrErrorStatus: %08x\n", data->uncorrErrorStatus); 679 + pr_info(" corrErrorStatus: %08x\n", data->corrErrorStatus); 680 + pr_info(" tlpHdr1: %08x\n", data->tlpHdr1); 681 + pr_info(" tlpHdr2: %08x\n", data->tlpHdr2); 682 + pr_info(" tlpHdr3: %08x\n", data->tlpHdr3); 683 + pr_info(" tlpHdr4: %08x\n", data->tlpHdr4); 684 + pr_info(" sourceId: %08x\n", data->sourceId); 685 + 686 + pr_info(" errorClass: %016llx\n", data->errorClass); 687 + pr_info(" correlator: %016llx\n", data->correlator); 688 + pr_info(" p7iocPlssr: %016llx\n", data->p7iocPlssr); 689 + pr_info(" p7iocCsr: %016llx\n", data->p7iocCsr); 690 + pr_info(" lemFir: %016llx\n", data->lemFir); 691 + pr_info(" lemErrorMask: %016llx\n", data->lemErrorMask); 692 + pr_info(" lemWOF: %016llx\n", data->lemWOF); 693 + pr_info(" phbErrorStatus: %016llx\n", data->phbErrorStatus); 694 + pr_info(" phbFirstErrorStatus: %016llx\n", data->phbFirstErrorStatus); 695 + pr_info(" phbErrorLog0: %016llx\n", data->phbErrorLog0); 696 + pr_info(" phbErrorLog1: %016llx\n", data->phbErrorLog1); 697 + pr_info(" mmioErrorStatus: %016llx\n", data->mmioErrorStatus); 698 + pr_info(" mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus); 699 + pr_info(" mmioErrorLog0: %016llx\n", data->mmioErrorLog0); 700 + pr_info(" mmioErrorLog1: %016llx\n", data->mmioErrorLog1); 701 + pr_info(" dma0ErrorStatus: %016llx\n", data->dma0ErrorStatus); 702 + pr_info(" dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus); 703 + pr_info(" dma0ErrorLog0: %016llx\n", data->dma0ErrorLog0); 704 + pr_info(" dma0ErrorLog1: %016llx\n", data->dma0ErrorLog1); 705 + pr_info(" dma1ErrorStatus: %016llx\n", data->dma1ErrorStatus); 706 + pr_info(" dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus); 707 + pr_info(" dma1ErrorLog0: %016llx\n", data->dma1ErrorLog0); 708 + pr_info(" dma1ErrorLog1: %016llx\n", data->dma1ErrorLog1); 709 + 710 + for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) { 711 + if 
((data->pestA[i] >> 63) == 0 && 712 + (data->pestB[i] >> 63) == 0) 713 + continue; 714 + 715 + pr_info(" PE[%3d] PESTA: %016llx\n", i, data->pestA[i]); 716 + pr_info(" PESTB: %016llx\n", data->pestB[i]); 717 + } 718 + } 719 + 720 + static void ioda_eeh_phb_diag(struct pci_controller *hose) 721 + { 722 + struct pnv_phb *phb = hose->private_data; 723 + struct OpalIoPhbErrorCommon *common; 724 + long rc; 725 + 726 + common = (struct OpalIoPhbErrorCommon *)phb->diag.blob; 727 + rc = opal_pci_get_phb_diag_data2(phb->opal_id, common, PAGE_SIZE); 728 + if (rc != OPAL_SUCCESS) { 729 + pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n", 730 + __func__, hose->global_number, rc); 731 + return; 732 + } 733 + 734 + switch (common->ioType) { 735 + case OPAL_PHB_ERROR_DATA_TYPE_P7IOC: 736 + ioda_eeh_p7ioc_phb_diag(hose, common); 737 + break; 738 + default: 739 + pr_warning("%s: Unrecognized I/O chip %d\n", 740 + __func__, common->ioType); 741 + } 742 + } 743 + 744 + static int ioda_eeh_get_phb_pe(struct pci_controller *hose, 745 + struct eeh_pe **pe) 746 + { 747 + struct eeh_pe *phb_pe; 748 + 749 + phb_pe = eeh_phb_pe_get(hose); 750 + if (!phb_pe) { 751 + pr_warning("%s Can't find PE for PHB#%d\n", 752 + __func__, hose->global_number); 753 + return -EEXIST; 754 + } 755 + 756 + *pe = phb_pe; 757 + return 0; 758 + } 759 + 760 + static int ioda_eeh_get_pe(struct pci_controller *hose, 761 + u16 pe_no, struct eeh_pe **pe) 762 + { 763 + struct eeh_pe *phb_pe, *dev_pe; 764 + struct eeh_dev dev; 765 + 766 + /* Find the PHB PE */ 767 + if (ioda_eeh_get_phb_pe(hose, &phb_pe)) 768 + return -EEXIST; 769 + 770 + /* Find the PE according to PE# */ 771 + memset(&dev, 0, sizeof(struct eeh_dev)); 772 + dev.phb = hose; 773 + dev.pe_config_addr = pe_no; 774 + dev_pe = eeh_pe_get(&dev); 775 + if (!dev_pe) { 776 + pr_warning("%s: Can't find PE for PHB#%x - PE#%x\n", 777 + __func__, hose->global_number, pe_no); 778 + return -EEXIST; 779 + } 780 + 781 + *pe = dev_pe; 782 + return 0; 783 + } 
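The PEST dump loop above only reports entries whose most significant bit (bit 63) is set in either the PESTA or PESTB word; entries with both top bits clear carry nothing of interest. The filter can be sketched in isolation as plain userspace C (the `pest_entry_interesting` helper is an illustrative name, not part of the kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the kernel's filter: a PEST entry is
 * worth printing only when bit 63 of PESTA or PESTB is set. */
static bool pest_entry_interesting(uint64_t pesta, uint64_t pestb)
{
	return (pesta >> 63) != 0 || (pestb >> 63) != 0;
}
```

Shifting an unsigned 64-bit value right by 63 leaves exactly the top bit, so the comparison against zero is the same test as the kernel's `(data->pestA[i] >> 63) == 0` continue-condition, inverted.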
784 + 785 + /** 786 + * ioda_eeh_next_error - Retrieve next error for EEH core to handle 787 + * @pe: The affected PE 788 + * 789 + * The function is expected to be called by EEH core while it gets 790 + * special EEH event (without binding PE). The function calls 791 + * OPAL APIs to retrieve the next error to handle. The informational error is 792 + * handled internally by the platform. However, the dead IOC, dead PHB, 793 + * fenced PHB and frozen PE should be handled by EEH core eventually. 794 + */ 795 + static int ioda_eeh_next_error(struct eeh_pe **pe) 796 + { 797 + struct pci_controller *hose, *tmp; 798 + struct pnv_phb *phb; 799 + u64 frozen_pe_no; 800 + u16 err_type, severity; 801 + long rc; 802 + int ret = 1; 803 + 804 + /* 805 + * While running here, it's safe to purge the event queue. 806 + * And we should keep the cached OPAL notifier event synchronized 807 + * between the kernel and firmware. 808 + */ 809 + eeh_remove_event(NULL); 810 + opal_notifier_update_evt(OPAL_EVENT_PCI_ERROR, 0x0ul); 811 + 812 + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { 813 + /* 814 + * If the subordinate PCI buses of the PHB have been 815 + * removed, we needn't take care of it any more. 816 + */ 817 + phb = hose->private_data; 818 + if (phb->eeh_state & PNV_EEH_STATE_REMOVED) 819 + continue; 820 + 821 + rc = opal_pci_next_error(phb->opal_id, 822 + &frozen_pe_no, &err_type, &severity); 823 + 824 + /* If the OPAL API returns an error, we needn't proceed */ 825 + if (rc != OPAL_SUCCESS) { 826 + IODA_EEH_DBG("%s: Invalid return value on " 827 + "PHB#%x (0x%lx) from opal_pci_next_error", 828 + __func__, hose->global_number, rc); 829 + continue; 830 + } 831 + 832 + /* If the PHB doesn't have an error, stop processing */ 833 + if (err_type == OPAL_EEH_NO_ERROR || 834 + severity == OPAL_EEH_SEV_NO_ERROR) { 835 + IODA_EEH_DBG("%s: No error found on PHB#%x\n", 836 + __func__, hose->global_number); 837 + continue; 838 + } 839 + 840 + /* 841 + * Processing the error. 
We're expecting the error with 842 + * highest priority reported upon multiple errors on the 843 + * specific PHB. 844 + */ 845 + IODA_EEH_DBG("%s: Error (%d, %d, %llx) on PHB#%x\n", 846 + __func__, err_type, severity, frozen_pe_no, hose->global_number); 847 + switch (err_type) { 848 + case OPAL_EEH_IOC_ERROR: 849 + if (severity == OPAL_EEH_SEV_IOC_DEAD) { 850 + list_for_each_entry_safe(hose, tmp, 851 + &hose_list, list_node) { 852 + phb = hose->private_data; 853 + phb->eeh_state |= PNV_EEH_STATE_REMOVED; 854 + } 855 + 856 + pr_err("EEH: dead IOC detected\n"); 857 + ret = 4; 858 + goto out; 859 + } else if (severity == OPAL_EEH_SEV_INF) { 860 + pr_info("EEH: IOC informative error " 861 + "detected\n"); 862 + ioda_eeh_hub_diag(hose); 863 + } 864 + 865 + break; 866 + case OPAL_EEH_PHB_ERROR: 867 + if (severity == OPAL_EEH_SEV_PHB_DEAD) { 868 + if (ioda_eeh_get_phb_pe(hose, pe)) 869 + break; 870 + 871 + pr_err("EEH: dead PHB#%x detected\n", 872 + hose->global_number); 873 + phb->eeh_state |= PNV_EEH_STATE_REMOVED; 874 + ret = 3; 875 + goto out; 876 + } else if (severity == OPAL_EEH_SEV_PHB_FENCED) { 877 + if (ioda_eeh_get_phb_pe(hose, pe)) 878 + break; 879 + 880 + pr_err("EEH: fenced PHB#%x detected\n", 881 + hose->global_number); 882 + ret = 2; 883 + goto out; 884 + } else if (severity == OPAL_EEH_SEV_INF) { 885 + pr_info("EEH: PHB#%x informative error " 886 + "detected\n", 887 + hose->global_number); 888 + ioda_eeh_phb_diag(hose); 889 + } 890 + 891 + break; 892 + case OPAL_EEH_PE_ERROR: 893 + if (ioda_eeh_get_pe(hose, frozen_pe_no, pe)) 894 + break; 895 + 896 + pr_err("EEH: Frozen PE#%x on PHB#%x detected\n", 897 + (*pe)->addr, (*pe)->phb->global_number); 898 + ret = 1; 899 + goto out; 900 + } 901 + } 902 + 903 + ret = 0; 904 + out: 905 + return ret; 906 + } 907 + 908 + struct pnv_eeh_ops ioda_eeh_ops = { 909 + .post_init = ioda_eeh_post_init, 910 + .set_option = ioda_eeh_set_option, 911 + .get_state = ioda_eeh_get_state, 912 + .reset = ioda_eeh_reset, 913 + .get_log = 
ioda_eeh_get_log, 914 + .configure_bridge = ioda_eeh_configure_bridge, 915 + .next_error = ioda_eeh_next_error 916 + };
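The `ioda_eeh_ops` table above is the kernel's usual ops-vector pattern: a backend fills in function pointers, and generic code NULL-checks each hook before dispatching through it (falling back to an error code when the hook is absent, as the `powernv_eeh_*` wrappers below do with `-EEXIST`). A standalone sketch of the pattern, with invented names rather than the kernel's:

```c
#include <stddef.h>

/* Illustrative ops vector, mirroring how pnv_eeh_ops is consumed. */
struct demo_eeh_ops {
	int (*reset)(int pe, int option);
	int (*configure_bridge)(int pe);
};

static int demo_reset(int pe, int option)
{
	return pe + option;	/* stand-in for real reset work */
}

static const struct demo_eeh_ops demo_ops = {
	.reset = demo_reset,
	.configure_bridge = NULL,	/* optional hook left unset */
};

/* Dispatch helper: return -1 (cf. -EEXIST in the kernel wrappers)
 * when the backend doesn't provide the hook. */
static int demo_do_reset(const struct demo_eeh_ops *ops, int pe, int option)
{
	if (ops && ops->reset)
		return ops->reset(pe, option);
	return -1;
}
```

The NULL fallback is what lets the P5IOC2 PHBs, which never set `phb->eeh_ops`, coexist with the IODA PHBs that do.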
+379
arch/powerpc/platforms/powernv/eeh-powernv.c
··· 1 + /* 2 + * The file intends to implement the platform dependent EEH operations on 3 + * the powernv platform, which runs natively on the hardware without a 4 + * hypervisor. 5 + * 6 + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013. 7 + * 8 + * This program is free software; you can redistribute it and/or modify 9 + * it under the terms of the GNU General Public License as published by 10 + * the Free Software Foundation; either version 2 of the License, or 11 + * (at your option) any later version. 12 + */ 13 + 14 + #include <linux/atomic.h> 15 + #include <linux/delay.h> 16 + #include <linux/export.h> 17 + #include <linux/init.h> 18 + #include <linux/list.h> 19 + #include <linux/msi.h> 20 + #include <linux/of.h> 21 + #include <linux/pci.h> 22 + #include <linux/proc_fs.h> 23 + #include <linux/rbtree.h> 24 + #include <linux/sched.h> 25 + #include <linux/seq_file.h> 26 + #include <linux/spinlock.h> 27 + 28 + #include <asm/eeh.h> 29 + #include <asm/eeh_event.h> 30 + #include <asm/firmware.h> 31 + #include <asm/io.h> 32 + #include <asm/iommu.h> 33 + #include <asm/machdep.h> 34 + #include <asm/msi_bitmap.h> 35 + #include <asm/opal.h> 36 + #include <asm/ppc-pci.h> 37 + 38 + #include "powernv.h" 39 + #include "pci.h" 40 + 41 + /** 42 + * powernv_eeh_init - EEH platform dependent initialization 43 + * 44 + * EEH platform dependent initialization on powernv 45 + */ 46 + static int powernv_eeh_init(void) 47 + { 48 + /* We require OPALv3 */ 49 + if (!firmware_has_feature(FW_FEATURE_OPALv3)) { 50 + pr_warning("%s: OPALv3 is required!\n", __func__); 51 + return -EINVAL; 52 + } 53 + 54 + /* Set EEH probe mode */ 55 + eeh_probe_mode_set(EEH_PROBE_MODE_DEV); 56 + 57 + return 0; 58 + } 59 + 60 + /** 61 + * powernv_eeh_post_init - EEH platform dependent post initialization 62 + * 63 + * EEH platform dependent post initialization on powernv. When 64 + * the function is called, the EEH PEs and devices should have 65 + * been built. 
If the I/O cache stuff has been built, EEH is 66 + * ready to supply service. 67 + */ 68 + static int powernv_eeh_post_init(void) 69 + { 70 + struct pci_controller *hose; 71 + struct pnv_phb *phb; 72 + int ret = 0; 73 + 74 + list_for_each_entry(hose, &hose_list, list_node) { 75 + phb = hose->private_data; 76 + 77 + if (phb->eeh_ops && phb->eeh_ops->post_init) { 78 + ret = phb->eeh_ops->post_init(hose); 79 + if (ret) 80 + break; 81 + } 82 + } 83 + 84 + return ret; 85 + } 86 + 87 + /** 88 + * powernv_eeh_dev_probe - Do probe on PCI device 89 + * @dev: PCI device 90 + * @flag: unused 91 + * 92 + * When EEH module is installed during system boot, all PCI devices 93 + * are checked one by one to see if they support EEH. The function 94 + * is introduced for the purpose. By default, EEH has been enabled 95 + * on all PCI devices. That's to say, we only need to do the necessary 96 + * initialization on the corresponding eeh device and create PE 97 + * accordingly. 98 + * 99 + * Note that it's unsafe to retrieve the EEH device through 100 + * the corresponding PCI device. During the PCI device hotplug, which 101 + * was possibly triggered by EEH core, the binding between EEH device 102 + * and the PCI device isn't built yet. 103 + */ 104 + static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag) 105 + { 106 + struct pci_controller *hose = pci_bus_to_host(dev->bus); 107 + struct pnv_phb *phb = hose->private_data; 108 + struct device_node *dn = pci_device_to_OF_node(dev); 109 + struct eeh_dev *edev = of_node_to_eeh_dev(dn); 110 + 111 + /* 112 + * When probing the root bridge, which doesn't have any 113 + * subordinate PCI devices, we don't have an OF node for 114 + * the root bridge. So it's not reasonable to continue 115 + * the probing. 
116 + */ 117 + if (!dn || !edev) 118 + return 0; 119 + 120 + /* Skip for PCI-ISA bridge */ 121 + if ((dev->class >> 8) == PCI_CLASS_BRIDGE_ISA) 122 + return 0; 123 + 124 + /* Initialize eeh device */ 125 + edev->class_code = dev->class; 126 + edev->mode = 0; 127 + edev->config_addr = ((dev->bus->number << 8) | dev->devfn); 128 + edev->pe_config_addr = phb->bdfn_to_pe(phb, dev->bus, dev->devfn & 0xff); 129 + 130 + /* Create PE */ 131 + eeh_add_to_parent_pe(edev); 132 + 133 + /* 134 + * Enable EEH explicitly so that we will do EEH check 135 + * while accessing I/O stuff 136 + * 137 + * FIXME: Enable that for PHB3 later 138 + */ 139 + if (phb->type == PNV_PHB_IODA1) 140 + eeh_subsystem_enabled = 1; 141 + 142 + /* Save memory bars */ 143 + eeh_save_bars(edev); 144 + 145 + return 0; 146 + } 147 + 148 + /** 149 + * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable 150 + * @pe: EEH PE 151 + * @option: operation to be issued 152 + * 153 + * The function is used to control the EEH functionality globally. 154 + * Currently, the following options are supported according to PAPR: 155 + * Enable EEH, Disable EEH, Enable MMIO and Enable DMA 156 + */ 157 + static int powernv_eeh_set_option(struct eeh_pe *pe, int option) 158 + { 159 + struct pci_controller *hose = pe->phb; 160 + struct pnv_phb *phb = hose->private_data; 161 + int ret = -EEXIST; 162 + 163 + /* 164 + * What we need to do is pass it down for the hardware 165 + * implementation to handle. 166 + */ 167 + if (phb->eeh_ops && phb->eeh_ops->set_option) 168 + ret = phb->eeh_ops->set_option(pe, option); 169 + 170 + return ret; 171 + } 172 + 173 + /** 174 + * powernv_eeh_get_pe_addr - Retrieve PE address 175 + * @pe: EEH PE 176 + * 177 + * Retrieve the PE address according to the given traditional 178 + * PCI BDF (Bus/Device/Function) address. 
179 + */ 180 + static int powernv_eeh_get_pe_addr(struct eeh_pe *pe) 181 + { 182 + return pe->addr; 183 + } 184 + 185 + /** 186 + * powernv_eeh_get_state - Retrieve PE state 187 + * @pe: EEH PE 188 + * @delay: delay while PE state is temporarily unavailable 189 + * 190 + * Retrieve the state of the specified PE. For IODA-compitable 191 + * platform, it should be retrieved from IODA table. Therefore, 192 + * we prefer passing down to hardware implementation to handle 193 + * it. 194 + */ 195 + static int powernv_eeh_get_state(struct eeh_pe *pe, int *delay) 196 + { 197 + struct pci_controller *hose = pe->phb; 198 + struct pnv_phb *phb = hose->private_data; 199 + int ret = EEH_STATE_NOT_SUPPORT; 200 + 201 + if (phb->eeh_ops && phb->eeh_ops->get_state) { 202 + ret = phb->eeh_ops->get_state(pe); 203 + 204 + /* 205 + * If the PE state is temporarily unavailable, 206 + * to inform the EEH core delay for default 207 + * period (1 second) 208 + */ 209 + if (delay) { 210 + *delay = 0; 211 + if (ret & EEH_STATE_UNAVAILABLE) 212 + *delay = 1000; 213 + } 214 + } 215 + 216 + return ret; 217 + } 218 + 219 + /** 220 + * powernv_eeh_reset - Reset the specified PE 221 + * @pe: EEH PE 222 + * @option: reset option 223 + * 224 + * Reset the specified PE 225 + */ 226 + static int powernv_eeh_reset(struct eeh_pe *pe, int option) 227 + { 228 + struct pci_controller *hose = pe->phb; 229 + struct pnv_phb *phb = hose->private_data; 230 + int ret = -EEXIST; 231 + 232 + if (phb->eeh_ops && phb->eeh_ops->reset) 233 + ret = phb->eeh_ops->reset(pe, option); 234 + 235 + return ret; 236 + } 237 + 238 + /** 239 + * powernv_eeh_wait_state - Wait for PE state 240 + * @pe: EEH PE 241 + * @max_wait: maximal period in microsecond 242 + * 243 + * Wait for the state of associated PE. It might take some time 244 + * to retrieve the PE's state. 
245 + */ 246 + static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait) 247 + { 248 + int ret; 249 + int mwait; 250 + 251 + while (1) { 252 + ret = powernv_eeh_get_state(pe, &mwait); 253 + 254 + /* 255 + * If the PE's state is temporarily unavailable, 256 + * we have to wait for the specified time. Otherwise, 257 + * the PE's state will be returned immediately. 258 + */ 259 + if (ret != EEH_STATE_UNAVAILABLE) 260 + return ret; 261 + 262 + max_wait -= mwait; 263 + if (max_wait <= 0) { 264 + pr_warning("%s: Timeout getting PE#%x's state (%d)\n", 265 + __func__, pe->addr, max_wait); 266 + return EEH_STATE_NOT_SUPPORT; 267 + } 268 + 269 + msleep(mwait); 270 + } 271 + 272 + return EEH_STATE_NOT_SUPPORT; 273 + } 274 + 275 + /** 276 + * powernv_eeh_get_log - Retrieve error log 277 + * @pe: EEH PE 278 + * @severity: temporary or permanent error log 279 + * @drv_log: driver log to be combined with retrieved error log 280 + * @len: length of driver log 281 + * 282 + * Retrieve the temporary or permanent error from the PE. 283 + */ 284 + static int powernv_eeh_get_log(struct eeh_pe *pe, int severity, 285 + char *drv_log, unsigned long len) 286 + { 287 + struct pci_controller *hose = pe->phb; 288 + struct pnv_phb *phb = hose->private_data; 289 + int ret = -EEXIST; 290 + 291 + if (phb->eeh_ops && phb->eeh_ops->get_log) 292 + ret = phb->eeh_ops->get_log(pe, severity, drv_log, len); 293 + 294 + return ret; 295 + } 296 + 297 + /** 298 + * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE 299 + * @pe: EEH PE 300 + * 301 + * The function will be called to reconfigure the bridges included 302 + * in the specified PE so that the mulfunctional PE would be recovered 303 + * again. 
304 + */ 305 + static int powernv_eeh_configure_bridge(struct eeh_pe *pe) 306 + { 307 + struct pci_controller *hose = pe->phb; 308 + struct pnv_phb *phb = hose->private_data; 309 + int ret = 0; 310 + 311 + if (phb->eeh_ops && phb->eeh_ops->configure_bridge) 312 + ret = phb->eeh_ops->configure_bridge(pe); 313 + 314 + return ret; 315 + } 316 + 317 + /** 318 + * powernv_eeh_next_error - Retrieve next EEH error to handle 319 + * @pe: Affected PE 320 + * 321 + * Using OPAL API, to retrieve next EEH error for EEH core to handle 322 + */ 323 + static int powernv_eeh_next_error(struct eeh_pe **pe) 324 + { 325 + struct pci_controller *hose; 326 + struct pnv_phb *phb = NULL; 327 + 328 + list_for_each_entry(hose, &hose_list, list_node) { 329 + phb = hose->private_data; 330 + break; 331 + } 332 + 333 + if (phb && phb->eeh_ops->next_error) 334 + return phb->eeh_ops->next_error(pe); 335 + 336 + return -EEXIST; 337 + } 338 + 339 + static struct eeh_ops powernv_eeh_ops = { 340 + .name = "powernv", 341 + .init = powernv_eeh_init, 342 + .post_init = powernv_eeh_post_init, 343 + .of_probe = NULL, 344 + .dev_probe = powernv_eeh_dev_probe, 345 + .set_option = powernv_eeh_set_option, 346 + .get_pe_addr = powernv_eeh_get_pe_addr, 347 + .get_state = powernv_eeh_get_state, 348 + .reset = powernv_eeh_reset, 349 + .wait_state = powernv_eeh_wait_state, 350 + .get_log = powernv_eeh_get_log, 351 + .configure_bridge = powernv_eeh_configure_bridge, 352 + .read_config = pnv_pci_cfg_read, 353 + .write_config = pnv_pci_cfg_write, 354 + .next_error = powernv_eeh_next_error 355 + }; 356 + 357 + /** 358 + * eeh_powernv_init - Register platform dependent EEH operations 359 + * 360 + * EEH initialization on powernv platform. This function should be 361 + * called before any EEH related functions. 
362 + */ 363 + static int __init eeh_powernv_init(void) 364 + { 365 + int ret = -EINVAL; 366 + 367 + if (!machine_is(powernv)) 368 + return ret; 369 + 370 + ret = eeh_ops_register(&powernv_eeh_ops); 371 + if (!ret) 372 + pr_info("EEH: PowerNV platform initialized\n"); 373 + else 374 + pr_info("EEH: Failed to initialize PowerNV platform (%d)\n", ret); 375 + 376 + return ret; 377 + } 378 + 379 + early_initcall(eeh_powernv_init);
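Earlier in this file, `powernv_eeh_dev_probe` encodes the EEH config address as `(bus << 8) | devfn`, the traditional 16-bit PCI packing: an 8-bit bus number above an 8-bit devfn, where devfn itself holds a 5-bit slot and a 3-bit function (what the kernel's `PCI_SLOT()`/`PCI_FUNC()` macros unpack). A standalone sketch of the encoding, with illustrative helper names:

```c
#include <stdint.h>

/* Pack bus/devfn as in edev->config_addr = (bus << 8) | devfn. */
static uint16_t bdf_pack(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((uint16_t)bus << 8 | devfn);
}

/* devfn packs a 5-bit slot and a 3-bit function, mirroring the
 * layout PCI_SLOT()/PCI_FUNC() assume. */
static uint8_t devfn_pack(uint8_t slot, uint8_t func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x7));
}
```

With this layout, bus 0x01 / devfn 0x20 (slot 4, function 0) yields the config address 0x0120.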
+3
arch/powerpc/platforms/powernv/opal-wrappers.S
··· 107 107 OPAL_CALL(opal_set_slot_led_status, OPAL_SET_SLOT_LED_STATUS); 108 108 OPAL_CALL(opal_get_epow_status, OPAL_GET_EPOW_STATUS); 109 109 OPAL_CALL(opal_set_system_attention_led, OPAL_SET_SYSTEM_ATTENTION_LED); 110 + OPAL_CALL(opal_pci_next_error, OPAL_PCI_NEXT_ERROR); 111 + OPAL_CALL(opal_pci_poll, OPAL_PCI_POLL); 110 112 OPAL_CALL(opal_pci_msi_eoi, OPAL_PCI_MSI_EOI); 113 + OPAL_CALL(opal_pci_get_phb_diag_data2, OPAL_PCI_GET_PHB_DIAG_DATA2);
+68 -1
arch/powerpc/platforms/powernv/opal.c
··· 15 15 #include <linux/of.h> 16 16 #include <linux/of_platform.h> 17 17 #include <linux/interrupt.h> 18 + #include <linux/notifier.h> 18 19 #include <linux/slab.h> 19 20 #include <asm/opal.h> 20 21 #include <asm/firmware.h> ··· 32 31 extern u64 opal_mc_secondary_handler[]; 33 32 static unsigned int *opal_irqs; 34 33 static unsigned int opal_irq_count; 34 + static ATOMIC_NOTIFIER_HEAD(opal_notifier_head); 35 + static DEFINE_SPINLOCK(opal_notifier_lock); 36 + static uint64_t last_notified_mask = 0x0ul; 37 + static atomic_t opal_notifier_hold = ATOMIC_INIT(0); 35 38 36 39 int __init early_init_dt_scan_opal(unsigned long node, 37 40 const char *uname, int depth, void *data) ··· 99 94 } 100 95 101 96 early_initcall(opal_register_exception_handlers); 97 + 98 + int opal_notifier_register(struct notifier_block *nb) 99 + { 100 + if (!nb) { 101 + pr_warning("%s: Invalid argument (%p)\n", 102 + __func__, nb); 103 + return -EINVAL; 104 + } 105 + 106 + atomic_notifier_chain_register(&opal_notifier_head, nb); 107 + return 0; 108 + } 109 + 110 + static void opal_do_notifier(uint64_t events) 111 + { 112 + unsigned long flags; 113 + uint64_t changed_mask; 114 + 115 + if (atomic_read(&opal_notifier_hold)) 116 + return; 117 + 118 + spin_lock_irqsave(&opal_notifier_lock, flags); 119 + changed_mask = last_notified_mask ^ events; 120 + last_notified_mask = events; 121 + spin_unlock_irqrestore(&opal_notifier_lock, flags); 122 + 123 + /* 124 + * We feed with the event bits and changed bits for 125 + * enough information to the callback. 
126 + */ 127 + atomic_notifier_call_chain(&opal_notifier_head, 128 + events, (void *)changed_mask); 129 + } 130 + 131 + void opal_notifier_update_evt(uint64_t evt_mask, 132 + uint64_t evt_val) 133 + { 134 + unsigned long flags; 135 + 136 + spin_lock_irqsave(&opal_notifier_lock, flags); 137 + last_notified_mask &= ~evt_mask; 138 + last_notified_mask |= evt_val; 139 + spin_unlock_irqrestore(&opal_notifier_lock, flags); 140 + } 141 + 142 + void opal_notifier_enable(void) 143 + { 144 + int64_t rc; 145 + uint64_t evt = 0; 146 + 147 + atomic_set(&opal_notifier_hold, 0); 148 + 149 + /* Process pending events */ 150 + rc = opal_poll_events(&evt); 151 + if (rc == OPAL_SUCCESS && evt) 152 + opal_do_notifier(evt); 153 + } 154 + 155 + void opal_notifier_disable(void) 156 + { 157 + atomic_set(&opal_notifier_hold, 1); 158 + } 102 159 103 160 int opal_get_chars(uint32_t vtermno, char *buf, int count) 104 161 { ··· 364 297 365 298 opal_handle_interrupt(virq_to_hw(irq), &events); 366 299 367 - /* XXX TODO: Do something with the events */ 300 + opal_do_notifier(events); 368 301 369 302 return IRQ_HANDLED; 370 303 }
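`opal_do_notifier` above computes the "changed" bits by XOR-ing the freshly reported event word against the last mask it delivered, then caches the new word; callbacks thus receive both the current events and exactly which bits flipped since the previous notification. The bit logic can be sketched in isolation as plain userspace C (standalone model, not the kernel function, and without the spinlock that protects the real cache):

```c
#include <stdint.h>

/* Mirror of the kernel's changed-mask logic: given the previously
 * notified mask and a fresh event word, return which bits changed
 * and remember the new word for the next round. */
static uint64_t notifier_changed_bits(uint64_t *last_notified, uint64_t events)
{
	uint64_t changed = *last_notified ^ events;

	*last_notified = events;
	return changed;
}
```

A repeated identical event word therefore produces a changed mask of zero, which is how listeners can ignore duplicate notifications.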
+59 -3
arch/powerpc/platforms/powernv/pci-ioda.c
··· 13 13 14 14 #include <linux/kernel.h> 15 15 #include <linux/pci.h> 16 + #include <linux/debugfs.h> 16 17 #include <linux/delay.h> 17 18 #include <linux/string.h> 18 19 #include <linux/init.h> ··· 33 32 #include <asm/iommu.h> 34 33 #include <asm/tce.h> 35 34 #include <asm/xics.h> 35 + #include <asm/debug.h> 36 36 37 37 #include "powernv.h" 38 38 #include "pci.h" ··· 443 441 set_iommu_table_base(&pdev->dev, &pe->tce32_table); 444 442 } 445 443 444 + static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus) 445 + { 446 + struct pci_dev *dev; 447 + 448 + list_for_each_entry(dev, &bus->devices, bus_list) { 449 + set_iommu_table_base(&dev->dev, &pe->tce32_table); 450 + if (dev->subordinate) 451 + pnv_ioda_setup_bus_dma(pe, dev->subordinate); 452 + } 453 + } 454 + 446 455 static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl, 447 456 u64 *startp, u64 *endp) 448 457 { ··· 608 595 TCE_PCI_SWINV_PAIR; 609 596 } 610 597 iommu_init_table(tbl, phb->hose->node); 598 + iommu_register_group(tbl, pci_domain_nr(pe->pbus), pe->pe_number); 599 + 600 + if (pe->pdev) 601 + set_iommu_table_base(&pe->pdev->dev, tbl); 602 + else 603 + pnv_ioda_setup_bus_dma(pe, pe->pbus); 611 604 612 605 return; 613 606 fail: ··· 685 666 tbl->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE; 686 667 } 687 668 iommu_init_table(tbl, phb->hose->node); 669 + 670 + if (pe->pdev) 671 + set_iommu_table_base(&pe->pdev->dev, tbl); 672 + else 673 + pnv_ioda_setup_bus_dma(pe, pe->pbus); 688 674 689 675 return; 690 676 fail: ··· 992 968 } 993 969 } 994 970 971 + static void pnv_pci_ioda_create_dbgfs(void) 972 + { 973 + #ifdef CONFIG_DEBUG_FS 974 + struct pci_controller *hose, *tmp; 975 + struct pnv_phb *phb; 976 + char name[16]; 977 + 978 + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { 979 + phb = hose->private_data; 980 + 981 + sprintf(name, "PCI%04x", hose->global_number); 982 + phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root); 983 + if (!phb->dbgfs) 
984 + pr_warning("%s: Error on creating debugfs on PHB#%x\n", 985 + __func__, hose->global_number); 986 + } 987 + #endif /* CONFIG_DEBUG_FS */ 988 + } 989 + 995 990 static void pnv_pci_ioda_fixup(void) 996 991 { 997 992 pnv_pci_ioda_setup_PEs(); 998 993 pnv_pci_ioda_setup_seg(); 999 994 pnv_pci_ioda_setup_DMA(); 995 + 996 + pnv_pci_ioda_create_dbgfs(); 997 + 998 + #ifdef CONFIG_EEH 999 + eeh_probe_mode_set(EEH_PROBE_MODE_DEV); 1000 + eeh_addr_cache_build(); 1001 + eeh_init(); 1002 + #endif 1000 1003 } 1001 1004 1002 1005 /* ··· 1100 1049 OPAL_ASSERT_RESET); 1101 1050 } 1102 1051 1103 - void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type) 1052 + void __init pnv_pci_init_ioda_phb(struct device_node *np, 1053 + u64 hub_id, int ioda_type) 1104 1054 { 1105 1055 struct pci_controller *hose; 1106 1056 static int primary = 1; ··· 1139 1087 hose->first_busno = 0; 1140 1088 hose->last_busno = 0xff; 1141 1089 hose->private_data = phb; 1090 + phb->hub_id = hub_id; 1142 1091 phb->opal_id = phb_id; 1143 1092 phb->type = ioda_type; 1144 1093 ··· 1225 1172 phb->ioda.io_size, phb->ioda.io_segsize); 1226 1173 1227 1174 phb->hose->ops = &pnv_pci_ops; 1175 + #ifdef CONFIG_EEH 1176 + phb->eeh_ops = &ioda_eeh_ops; 1177 + #endif 1228 1178 1229 1179 /* Setup RID -> PE mapping function */ 1230 1180 phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe; ··· 1268 1212 1269 1213 void pnv_pci_init_ioda2_phb(struct device_node *np) 1270 1214 { 1271 - pnv_pci_init_ioda_phb(np, PNV_PHB_IODA2); 1215 + pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2); 1272 1216 } 1273 1217 1274 1218 void __init pnv_pci_init_ioda_hub(struct device_node *np) ··· 1291 1235 for_each_child_of_node(np, phbn) { 1292 1236 /* Look for IODA1 PHBs */ 1293 1237 if (of_device_is_compatible(phbn, "ibm,ioda-phb")) 1294 - pnv_pci_init_ioda_phb(phbn, PNV_PHB_IODA1); 1238 + pnv_pci_init_ioda_phb(phbn, hub_id, PNV_PHB_IODA1); 1295 1239 } 1296 1240 }
+8 -3
arch/powerpc/platforms/powernv/pci-p5ioc2.c
···
86 86 static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
87 87 					 struct pci_dev *pdev)
88 88 {
89 - 	if (phb->p5ioc2.iommu_table.it_map == NULL)
89 + 	if (phb->p5ioc2.iommu_table.it_map == NULL) {
90 90 		iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node);
91 + 		iommu_register_group(&phb->p5ioc2.iommu_table,
92 + 				pci_domain_nr(phb->hose->bus), phb->opal_id);
93 + 	}
91 94 
92 95 	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
93 96 }
94 97 
95 - static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
98 + static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
96 99 					 void *tce_mem, u64 tce_size)
97 100 {
98 101 	struct pnv_phb *phb;
···
136 133 	phb->hose->first_busno = 0;
137 134 	phb->hose->last_busno = 0xff;
138 135 	phb->hose->private_data = phb;
136 + 	phb->hub_id = hub_id;
139 137 	phb->opal_id = phb_id;
140 138 	phb->type = PNV_PHB_P5IOC2;
141 139 	phb->model = PNV_PHB_MODEL_P5IOC2;
···
230 226 	for_each_child_of_node(np, phbn) {
231 227 		if (of_device_is_compatible(phbn, "ibm,p5ioc2-pcix") ||
232 228 		    of_device_is_compatible(phbn, "ibm,p5ioc2-pciex")) {
233 - 			pnv_pci_init_p5ioc2_phb(phbn, tce_mem, tce_per_phb);
229 + 			pnv_pci_init_p5ioc2_phb(phbn, hub_id,
230 + 						tce_mem, tce_per_phb);
234 231 			tce_mem += tce_per_phb;
235 232 		}
236 233 	}
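The p5ioc2 hunk keeps its single shared TCE table lazily initialized: only the first device to go through DMA setup triggers `iommu_init_table()` (and, after this patch, `iommu_register_group()`); every later device just attaches to the existing table. A hedged sketch of that first-use guard, with invented `toy_*` names:

```c
#include <assert.h>
#include <stddef.h>

/* Toy mirror of the shared table guarded by it_map == NULL. */
struct toy_table {
	int *it_map;		/* NULL until the table has been initialized */
	int init_calls;		/* how many times "iommu_init_table" ran */
};

static void toy_init_table(struct toy_table *tbl)
{
	static int map;		/* stands in for the real allocation map */
	tbl->it_map = &map;
	tbl->init_calls++;
}

/* Per-device setup: initialize the shared table once, then point the
 * device at it (the set_iommu_table_base() step in the real code). */
static const struct toy_table *toy_dma_dev_setup(struct toy_table *tbl)
{
	if (tbl->it_map == NULL)
		toy_init_table(tbl);	/* group registration would go here too */
	return tbl;
}

/* Two devices share one table; the init path must run exactly once. */
static int toy_demo(void)
{
	struct toy_table t = { NULL, 0 };

	toy_dma_dev_setup(&t);		/* first device triggers the init */
	toy_dma_dev_setup(&t);		/* second device reuses the table */
	return t.init_calls;
}
```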
+102 -35
arch/powerpc/platforms/powernv/pci.c
··· 20 20 #include <linux/irq.h> 21 21 #include <linux/io.h> 22 22 #include <linux/msi.h> 23 + #include <linux/iommu.h> 23 24 24 25 #include <asm/sections.h> 25 26 #include <asm/io.h> ··· 33 32 #include <asm/iommu.h> 34 33 #include <asm/tce.h> 35 34 #include <asm/firmware.h> 35 + #include <asm/eeh_event.h> 36 + #include <asm/eeh.h> 36 37 37 38 #include "powernv.h" 38 39 #include "pci.h" ··· 205 202 206 203 spin_lock_irqsave(&phb->lock, flags); 207 204 208 - rc = opal_pci_get_phb_diag_data(phb->opal_id, phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE); 205 + rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob, 206 + PNV_PCI_DIAG_BUF_SIZE); 209 207 has_diag = (rc == OPAL_SUCCESS); 210 208 211 209 rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, ··· 231 227 spin_unlock_irqrestore(&phb->lock, flags); 232 228 } 233 229 234 - static void pnv_pci_config_check_eeh(struct pnv_phb *phb, struct pci_bus *bus, 235 - u32 bdfn) 230 + static void pnv_pci_config_check_eeh(struct pnv_phb *phb, 231 + struct device_node *dn) 236 232 { 237 233 s64 rc; 238 234 u8 fstate; 239 235 u16 pcierr; 240 236 u32 pe_no; 241 237 242 - /* Get PE# if we support IODA */ 243 - pe_no = phb->bdfn_to_pe ? phb->bdfn_to_pe(phb, bus, bdfn & 0xff) : 0; 238 + /* 239 + * Get the PE#. During the PCI probe stage, we might not 240 + * setup that yet. 
So all ER errors should be mapped to 241 + * PE#0 242 + */ 243 + pe_no = PCI_DN(dn)->pe_number; 244 + if (pe_no == IODA_INVALID_PE) 245 + pe_no = 0; 244 246 245 247 /* Read freeze status */ 246 248 rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, &fstate, &pcierr, 247 249 NULL); 248 250 if (rc) { 249 - pr_warning("PCI %d: Failed to read EEH status for PE#%d," 250 - " err %lld\n", phb->hose->global_number, pe_no, rc); 251 + pr_warning("%s: Can't read EEH status (PE#%d) for " 252 + "%s, err %lld\n", 253 + __func__, pe_no, dn->full_name, rc); 251 254 return; 252 255 } 253 - cfg_dbg(" -> EEH check, bdfn=%04x PE%d fstate=%x\n", 254 - bdfn, pe_no, fstate); 256 + cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n", 257 + (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn), 258 + pe_no, fstate); 255 259 if (fstate != 0) 256 260 pnv_pci_handle_eeh_config(phb, pe_no); 257 261 } 258 262 259 - static int pnv_pci_read_config(struct pci_bus *bus, 260 - unsigned int devfn, 261 - int where, int size, u32 *val) 263 + int pnv_pci_cfg_read(struct device_node *dn, 264 + int where, int size, u32 *val) 262 265 { 263 - struct pci_controller *hose = pci_bus_to_host(bus); 264 - struct pnv_phb *phb = hose->private_data; 265 - u32 bdfn = (((uint64_t)bus->number) << 8) | devfn; 266 + struct pci_dn *pdn = PCI_DN(dn); 267 + struct pnv_phb *phb = pdn->phb->private_data; 268 + u32 bdfn = (pdn->busno << 8) | pdn->devfn; 269 + #ifdef CONFIG_EEH 270 + struct eeh_pe *phb_pe = NULL; 271 + #endif 266 272 s64 rc; 267 - 268 - if (hose == NULL) 269 - return PCIBIOS_DEVICE_NOT_FOUND; 270 273 271 274 switch (size) { 272 275 case 1: { ··· 298 287 default: 299 288 return PCIBIOS_FUNC_NOT_SUPPORTED; 300 289 } 301 - cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n", 302 - bus->number, devfn, where, size, *val); 290 + cfg_dbg("%s: bus: %x devfn: %x +%x/%x -> %08x\n", 291 + __func__, pdn->busno, pdn->devfn, where, size, *val); 303 292 304 - /* Check if the PHB got frozen due to an error (no 
response) */ 305 - pnv_pci_config_check_eeh(phb, bus, bdfn); 293 + /* 294 + * Check if the specified PE has been put into frozen 295 + * state. On the other hand, we needn't do that while 296 + * the PHB has been put into frozen state because of 297 + * PHB-fatal errors. 298 + */ 299 + #ifdef CONFIG_EEH 300 + phb_pe = eeh_phb_pe_get(pdn->phb); 301 + if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED)) 302 + return PCIBIOS_SUCCESSFUL; 303 + 304 + if (phb->eeh_state & PNV_EEH_STATE_ENABLED) { 305 + if (*val == EEH_IO_ERROR_VALUE(size) && 306 + eeh_dev_check_failure(of_node_to_eeh_dev(dn))) 307 + return PCIBIOS_DEVICE_NOT_FOUND; 308 + } else { 309 + pnv_pci_config_check_eeh(phb, dn); 310 + } 311 + #else 312 + pnv_pci_config_check_eeh(phb, dn); 313 + #endif 306 314 307 315 return PCIBIOS_SUCCESSFUL; 308 316 } 309 317 310 - static int pnv_pci_write_config(struct pci_bus *bus, 311 - unsigned int devfn, 312 - int where, int size, u32 val) 318 + int pnv_pci_cfg_write(struct device_node *dn, 319 + int where, int size, u32 val) 313 320 { 314 - struct pci_controller *hose = pci_bus_to_host(bus); 315 - struct pnv_phb *phb = hose->private_data; 316 - u32 bdfn = (((uint64_t)bus->number) << 8) | devfn; 321 + struct pci_dn *pdn = PCI_DN(dn); 322 + struct pnv_phb *phb = pdn->phb->private_data; 323 + u32 bdfn = (pdn->busno << 8) | pdn->devfn; 317 324 318 - if (hose == NULL) 319 - return PCIBIOS_DEVICE_NOT_FOUND; 320 - 321 - cfg_dbg("pnv_pci_write_config bus: %x devfn: %x +%x/%x -> %08x\n", 322 - bus->number, devfn, where, size, val); 325 + cfg_dbg("%s: bus: %x devfn: %x +%x/%x -> %08x\n", 326 + pdn->busno, pdn->devfn, where, size, val); 323 327 switch (size) { 324 328 case 1: 325 329 opal_pci_config_write_byte(phb->opal_id, bdfn, where, val); ··· 348 322 default: 349 323 return PCIBIOS_FUNC_NOT_SUPPORTED; 350 324 } 325 + 351 326 /* Check if the PHB got frozen due to an error (no response) */ 352 - pnv_pci_config_check_eeh(phb, bus, bdfn); 327 + #ifdef CONFIG_EEH 328 + if 
(!(phb->eeh_state & PNV_EEH_STATE_ENABLED)) 329 + pnv_pci_config_check_eeh(phb, dn); 330 + #else 331 + pnv_pci_config_check_eeh(phb, dn); 332 + #endif 353 333 354 334 return PCIBIOS_SUCCESSFUL; 355 335 } 356 336 337 + static int pnv_pci_read_config(struct pci_bus *bus, 338 + unsigned int devfn, 339 + int where, int size, u32 *val) 340 + { 341 + struct device_node *dn, *busdn = pci_bus_to_OF_node(bus); 342 + struct pci_dn *pdn; 343 + 344 + for (dn = busdn->child; dn; dn = dn->sibling) { 345 + pdn = PCI_DN(dn); 346 + if (pdn && pdn->devfn == devfn) 347 + return pnv_pci_cfg_read(dn, where, size, val); 348 + } 349 + 350 + *val = 0xFFFFFFFF; 351 + return PCIBIOS_DEVICE_NOT_FOUND; 352 + 353 + } 354 + 355 + static int pnv_pci_write_config(struct pci_bus *bus, 356 + unsigned int devfn, 357 + int where, int size, u32 val) 358 + { 359 + struct device_node *dn, *busdn = pci_bus_to_OF_node(bus); 360 + struct pci_dn *pdn; 361 + 362 + for (dn = busdn->child; dn; dn = dn->sibling) { 363 + pdn = PCI_DN(dn); 364 + if (pdn && pdn->devfn == devfn) 365 + return pnv_pci_cfg_write(dn, where, size, val); 366 + } 367 + 368 + return PCIBIOS_DEVICE_NOT_FOUND; 369 + } 370 + 357 371 struct pci_ops pnv_pci_ops = { 358 - .read = pnv_pci_read_config, 372 + .read = pnv_pci_read_config, 359 373 .write = pnv_pci_write_config, 360 374 }; 361 375 ··· 478 412 pnv_pci_setup_iommu_table(tbl, __va(be64_to_cpup(basep)), 479 413 be32_to_cpup(sizep), 0); 480 414 iommu_init_table(tbl, hose->node); 415 + iommu_register_group(tbl, pci_domain_nr(hose->bus), 0); 481 416 482 417 /* Deal with SW invalidated TCEs when needed (BML way) */ 483 418 swinvp = of_get_property(hose->dn, "linux,tce-sw-invalidate-info",
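The rewritten config accessors above work from the device node's `pci_dn`, packing bus number and devfn into a BDFN, and the EEH-aware read path treats an all-ones result of the access size (`EEH_IO_ERROR_VALUE(size)`) as a possible frozen-PE symptom worth checking. Two small helpers sketching those conventions (illustrative stand-ins, not the kernel's own macros):

```c
#include <assert.h>
#include <stdint.h>

/* Bus/devfn packed into one number, as in pnv_pci_cfg_read()'s
 * "(pdn->busno << 8) | pdn->devfn". */
static uint32_t toy_bdfn(uint32_t busno, uint32_t devfn)
{
	return (busno << 8) | devfn;
}

/* All-ones value for a 1-, 2- or 4-byte config read: the value a frozen
 * PE (or absent device) returns, and what EEH_IO_ERROR_VALUE(size)
 * compares against before calling eeh_dev_check_failure(). */
static uint32_t toy_io_error_value(int size)
{
	return (uint32_t)((1ULL << (8 * size)) - 1);
}
```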
+35
arch/powerpc/platforms/powernv/pci.h
···
66 66 	struct list_head list;
67 67 };
68 68 
69 + /* IOC dependent EEH operations */
70 + #ifdef CONFIG_EEH
71 + struct pnv_eeh_ops {
72 + 	int (*post_init)(struct pci_controller *hose);
73 + 	int (*set_option)(struct eeh_pe *pe, int option);
74 + 	int (*get_state)(struct eeh_pe *pe);
75 + 	int (*reset)(struct eeh_pe *pe, int option);
76 + 	int (*get_log)(struct eeh_pe *pe, int severity,
77 + 		       char *drv_log, unsigned long len);
78 + 	int (*configure_bridge)(struct eeh_pe *pe);
79 + 	int (*next_error)(struct eeh_pe **pe);
80 + };
81 + 
82 + #define PNV_EEH_STATE_ENABLED	(1 << 0)	/* EEH enabled */
83 + #define PNV_EEH_STATE_REMOVED	(1 << 1)	/* PHB removed */
84 + 
85 + #endif /* CONFIG_EEH */
86 + 
69 87 struct pnv_phb {
70 88 	struct pci_controller *hose;
71 89 	enum pnv_phb_type type;
72 90 	enum pnv_phb_model model;
91 + 	u64 hub_id;
73 92 	u64 opal_id;
74 93 	void __iomem *regs;
75 94 	int initialized;
76 95 	spinlock_t lock;
96 + 
97 + #ifdef CONFIG_EEH
98 + 	struct pnv_eeh_ops *eeh_ops;
99 + 	int eeh_state;
100 + #endif
101 + 
102 + #ifdef CONFIG_DEBUG_FS
103 + 	struct dentry *dbgfs;
104 + #endif
77 105 
78 106 #ifdef CONFIG_PCI_MSI
79 107 	unsigned int msi_base;
···
178 150 };
179 151 
180 152 extern struct pci_ops pnv_pci_ops;
153 + #ifdef CONFIG_EEH
154 + extern struct pnv_eeh_ops ioda_eeh_ops;
155 + #endif
181 156 
157 + int pnv_pci_cfg_read(struct device_node *dn,
158 + 		     int where, int size, u32 *val);
159 + int pnv_pci_cfg_write(struct device_node *dn,
160 + 		      int where, int size, u32 val);
182 161 extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
183 162 				      void *tce_mem, u64 tce_size,
184 163 				      u64 dma_offset);
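The new `struct pnv_eeh_ops` gives each PowerNV IOC type its own EEH backend behind a table of function pointers carried by the PHB (`phb->eeh_ops`), with `ioda_eeh_ops` as the first implementation. A minimal mock of that dispatch pattern, with invented `toy_*` names:

```c
#include <assert.h>
#include <stddef.h>

struct toy_pe { int state; };

/* Cut-down version of the ops table: just state query and reset. */
struct toy_eeh_ops {
	int (*get_state)(struct toy_pe *pe);
	int (*reset)(struct toy_pe *pe, int option);
};

static int toy_get_state(struct toy_pe *pe) { return pe->state; }

static int toy_reset(struct toy_pe *pe, int option)
{
	pe->state = option;	/* pretend the reset moved the PE to 'option' */
	return 0;
}

/* A "PHB" carries its backend, as struct pnv_phb now carries eeh_ops. */
struct toy_phb { const struct toy_eeh_ops *eeh_ops; };

static const struct toy_eeh_ops toy_ioda_eeh_ops = {
	.get_state = toy_get_state,
	.reset     = toy_reset,
};

/* Callers never name the backend; they go through the pointer table. */
static int toy_demo(void)
{
	struct toy_pe pe = { 1 };
	struct toy_phb phb = { &toy_ioda_eeh_ops };

	phb.eeh_ops->reset(&pe, 5);
	return phb.eeh_ops->get_state(&pe);
}
```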
+4
arch/powerpc/platforms/powernv/setup.c
···
93 93 {
94 94 	long rc = OPAL_BUSY;
95 95 
96 + 	opal_notifier_disable();
97 + 
96 98 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
97 99 		rc = opal_cec_reboot();
98 100 		if (rc == OPAL_BUSY_EVENT)
···
109 107 static void __noreturn pnv_power_off(void)
110 108 {
111 109 	long rc = OPAL_BUSY;
110 + 
111 + 	opal_notifier_disable();
112 112 
113 113 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
114 114 		rc = opal_cec_power_down(0);
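Both `pnv_restart()` and `pnv_power_off()` spin on the OPAL call while firmware reports `OPAL_BUSY`/`OPAL_BUSY_EVENT` (the real loop also polls and handles events between attempts). A userspace mock of that retry loop; the `TOY_OPAL_*` codes are placeholders, not the values from `asm/opal.h`:

```c
#include <assert.h>

/* Made-up return codes standing in for the OPAL ones. */
enum { TOY_OPAL_SUCCESS = 0, TOY_OPAL_BUSY = -2, TOY_OPAL_BUSY_EVENT = -12 };

static int toy_busy_left;	/* how many times the mock call stays busy */
static int toy_calls;

/* Mock firmware call: busy a configurable number of times, then accept. */
static long toy_opal_cec_reboot(void)
{
	toy_calls++;
	return toy_busy_left-- > 0 ? TOY_OPAL_BUSY : TOY_OPAL_SUCCESS;
}

/* Retry until the firmware accepts; returns the number of attempts. */
static int toy_restart(int busy_times)
{
	long rc = TOY_OPAL_BUSY;

	toy_busy_left = busy_times;
	toy_calls = 0;
	while (rc == TOY_OPAL_BUSY || rc == TOY_OPAL_BUSY_EVENT)
		rc = toy_opal_cec_reboot();	/* real loop also polls events */
	return toy_calls;
}
```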
+2 -2
arch/powerpc/platforms/powernv/smp.c
···
40 40 #define DBG(fmt...)
41 41 #endif
42 42 
43 - static void __cpuinit pnv_smp_setup_cpu(int cpu)
43 + static void pnv_smp_setup_cpu(int cpu)
44 44 {
45 45 	if (cpu != boot_cpuid)
46 46 		xics_setup_cpu();
···
51 51 	/* Special case - we inhibit secondary thread startup
52 52 	 * during boot if the user requests it.
53 53 	 */
54 - 	if (system_state < SYSTEM_RUNNING && cpu_has_feature(CPU_FTR_SMT)) {
54 + 	if (system_state == SYSTEM_BOOTING && cpu_has_feature(CPU_FTR_SMT)) {
55 55 		if (!smt_enabled_at_boot && cpu_thread_in_core(nr) != 0)
56 56 			return 0;
57 57 		if (smt_enabled_at_boot
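The smp.c hunk narrows the boot-time SMT gate to `system_state == SYSTEM_BOOTING` and keeps inhibiting secondary threads when SMT was limited on the command line; `cpu_thread_in_core()` is just the CPU's thread index within its core. A rough sketch of that bootable check, assuming 8 threads per core and simplifying the (truncated) second condition to a plain threshold; all names here are invented:

```c
#include <assert.h>

#define TOY_THREADS_PER_CORE 8	/* assumed core geometry for the sketch */

/* Thread index within the core, as cpu_thread_in_core() computes it. */
static int toy_thread_in_core(int cpu)
{
	return cpu % TOY_THREADS_PER_CORE;
}

/* During boot a CPU may start only if it is thread 0 of its core, or if
 * its thread index is below the smt_enabled_at_boot limit. */
static int toy_cpu_bootable(int cpu, int smt_enabled_at_boot)
{
	if (toy_thread_in_core(cpu) == 0)
		return 1;
	return toy_thread_in_core(cpu) < smt_enabled_at_boot;
}
```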
+3 -2
arch/powerpc/platforms/ps3/htab.c
···
109 109 }
110 110 
111 111 static long ps3_hpte_updatepp(unsigned long slot, unsigned long newpp,
112 - 	unsigned long vpn, int psize, int ssize, int local)
112 + 	unsigned long vpn, int psize, int apsize,
113 + 	int ssize, int local)
113 114 {
114 115 	int result;
115 116 	u64 hpte_v, want_v, hpte_rs;
···
163 162 }
164 163 
165 164 static void ps3_hpte_invalidate(unsigned long slot, unsigned long vpn,
166 - 	int psize, int ssize, int local)
165 + 	int psize, int apsize, int ssize, int local)
167 166 {
168 167 	unsigned long flags;
169 168 	int result;
-5
arch/powerpc/platforms/pseries/Kconfig
···
33 33 	  processors, that is, which share physical processors between
34 34 	  two or more partitions.
35 35 
36 - config EEH
37 - 	bool
38 - 	depends on PPC_PSERIES && PCI
39 - 	default y
40 - 
41 36 config PSERIES_MSI
42 37 	bool
43 38 	depends on PCI_MSI && EEH
+1 -3
arch/powerpc/platforms/pseries/Makefile
···
6 6 			   firmware.o power.o dlpar.o mobility.o
7 7 obj-$(CONFIG_SMP)	+= smp.o
8 8 obj-$(CONFIG_SCANLOG)	+= scanlog.o
9 - obj-$(CONFIG_EEH)	+= eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
10 - 			   eeh_driver.o eeh_event.o eeh_sysfs.o \
11 - 			   eeh_pseries.o
9 + obj-$(CONFIG_EEH)	+= eeh_pseries.o
12 10 obj-$(CONFIG_KEXEC)	+= kexec.o
13 11 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
14 12 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
+161 -33
arch/powerpc/platforms/pseries/eeh.c arch/powerpc/kernel/eeh.c
··· 103 103 */ 104 104 int eeh_probe_mode; 105 105 106 - /* Global EEH mutex */ 107 - DEFINE_MUTEX(eeh_mutex); 108 - 109 106 /* Lock to avoid races due to multiple reports of an error */ 110 - static DEFINE_RAW_SPINLOCK(confirm_error_lock); 107 + DEFINE_RAW_SPINLOCK(confirm_error_lock); 111 108 112 109 /* Buffer for reporting pci register dumps. Its here in BSS, and 113 110 * not dynamically alloced, so that it ends up in RMO where RTAS ··· 232 235 { 233 236 size_t loglen = 0; 234 237 struct eeh_dev *edev; 238 + bool valid_cfg_log = true; 235 239 236 - eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); 237 - eeh_ops->configure_bridge(pe); 238 - eeh_pe_restore_bars(pe); 240 + /* 241 + * When the PHB is fenced or dead, it's pointless to collect 242 + * the data from PCI config space because it should return 243 + * 0xFF's. For ER, we still retrieve the data from the PCI 244 + * config space. 245 + */ 246 + if (eeh_probe_mode_dev() && 247 + (pe->type & EEH_PE_PHB) && 248 + (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD))) 249 + valid_cfg_log = false; 239 250 240 - pci_regs_buf[0] = 0; 241 - eeh_pe_for_each_dev(pe, edev) { 242 - loglen += eeh_gather_pci_data(edev, pci_regs_buf, 243 - EEH_PCI_REGS_LOG_LEN); 244 - } 251 + if (valid_cfg_log) { 252 + eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); 253 + eeh_ops->configure_bridge(pe); 254 + eeh_pe_restore_bars(pe); 255 + 256 + pci_regs_buf[0] = 0; 257 + eeh_pe_for_each_dev(pe, edev) { 258 + loglen += eeh_gather_pci_data(edev, pci_regs_buf + loglen, 259 + EEH_PCI_REGS_LOG_LEN - loglen); 260 + } 261 + } 245 262 246 263 eeh_ops->get_log(pe, severity, pci_regs_buf, loglen); 247 264 } ··· 271 260 { 272 261 pte_t *ptep; 273 262 unsigned long pa; 263 + int hugepage_shift; 274 264 275 - ptep = find_linux_pte(init_mm.pgd, token); 265 + /* 266 + * We won't find hugepages here, iomem 267 + */ 268 + ptep = find_linux_pte_or_hugepte(init_mm.pgd, token, &hugepage_shift); 276 269 if (!ptep) 277 270 return token; 271 + WARN_ON(hugepage_shift); 278 272 pa = 
pte_pfn(*ptep) << PAGE_SHIFT; 279 273 280 274 return pa | (token & (PAGE_SIZE-1)); 275 + } 276 + 277 + /* 278 + * On PowerNV platform, we might already have fenced PHB there. 279 + * For that case, it's meaningless to recover frozen PE. Intead, 280 + * We have to handle fenced PHB firstly. 281 + */ 282 + static int eeh_phb_check_failure(struct eeh_pe *pe) 283 + { 284 + struct eeh_pe *phb_pe; 285 + unsigned long flags; 286 + int ret; 287 + 288 + if (!eeh_probe_mode_dev()) 289 + return -EPERM; 290 + 291 + /* Find the PHB PE */ 292 + phb_pe = eeh_phb_pe_get(pe->phb); 293 + if (!phb_pe) { 294 + pr_warning("%s Can't find PE for PHB#%d\n", 295 + __func__, pe->phb->global_number); 296 + return -EEXIST; 297 + } 298 + 299 + /* If the PHB has been in problematic state */ 300 + eeh_serialize_lock(&flags); 301 + if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) { 302 + ret = 0; 303 + goto out; 304 + } 305 + 306 + /* Check PHB state */ 307 + ret = eeh_ops->get_state(phb_pe, NULL); 308 + if ((ret < 0) || 309 + (ret == EEH_STATE_NOT_SUPPORT) || 310 + (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) == 311 + (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) { 312 + ret = 0; 313 + goto out; 314 + } 315 + 316 + /* Isolate the PHB and send event */ 317 + eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED); 318 + eeh_serialize_unlock(flags); 319 + eeh_send_failure_event(phb_pe); 320 + 321 + pr_err("EEH: PHB#%x failure detected\n", 322 + phb_pe->phb->global_number); 323 + dump_stack(); 324 + 325 + return 1; 326 + out: 327 + eeh_serialize_unlock(flags); 328 + return ret; 281 329 } 282 330 283 331 /** ··· 389 319 return 0; 390 320 } 391 321 322 + /* 323 + * On PowerNV platform, we might already have fenced PHB 324 + * there and we need take care of that firstly. 325 + */ 326 + ret = eeh_phb_check_failure(pe); 327 + if (ret > 0) 328 + return ret; 329 + 392 330 /* If we already have a pending isolation event for this 393 331 * slot, we know it's bad already, we don't need to check. 
394 332 * Do this checking under a lock; as multiple PCI devices 395 333 * in one slot might report errors simultaneously, and we 396 334 * only want one error recovery routine running. 397 335 */ 398 - raw_spin_lock_irqsave(&confirm_error_lock, flags); 336 + eeh_serialize_lock(&flags); 399 337 rc = 1; 400 338 if (pe->state & EEH_PE_ISOLATED) { 401 339 pe->check_count++; ··· 446 368 } 447 369 448 370 eeh_stats.slot_resets++; 449 - 371 + 450 372 /* Avoid repeated reports of this failure, including problems 451 373 * with other functions on this device, and functions under 452 374 * bridges. 453 375 */ 454 376 eeh_pe_state_mark(pe, EEH_PE_ISOLATED); 455 - raw_spin_unlock_irqrestore(&confirm_error_lock, flags); 377 + eeh_serialize_unlock(flags); 456 378 457 379 eeh_send_failure_event(pe); 458 380 ··· 460 382 * a stack trace will help the device-driver authors figure 461 383 * out what happened. So print that out. 462 384 */ 463 - WARN(1, "EEH: failure detected\n"); 385 + pr_err("EEH: Frozen PE#%x detected on PHB#%x\n", 386 + pe->addr, pe->phb->global_number); 387 + dump_stack(); 388 + 464 389 return 1; 465 390 466 391 dn_unlock: 467 - raw_spin_unlock_irqrestore(&confirm_error_lock, flags); 392 + eeh_serialize_unlock(flags); 468 393 return rc; 469 394 } 470 395 ··· 606 525 * or a fundamental reset (3). 607 526 * A fundamental reset required by any device under 608 527 * Partitionable Endpoint trumps hot-reset. 609 - */ 528 + */ 610 529 eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset); 611 530 612 531 if (freset) ··· 619 538 */ 620 539 #define PCI_BUS_RST_HOLD_TIME_MSEC 250 621 540 msleep(PCI_BUS_RST_HOLD_TIME_MSEC); 622 - 623 - /* We might get hit with another EEH freeze as soon as the 541 + 542 + /* We might get hit with another EEH freeze as soon as the 624 543 * pci slot reset line is dropped. Make sure we don't miss 625 544 * these, and clear the flag now. 
626 545 */ ··· 646 565 */ 647 566 int eeh_reset_pe(struct eeh_pe *pe) 648 567 { 568 + int flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); 649 569 int i, rc; 650 570 651 571 /* Take three shots at resetting the bus */ ··· 654 572 eeh_reset_pe_once(pe); 655 573 656 574 rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); 657 - if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) 575 + if ((rc & flags) == flags) 658 576 return 0; 659 577 660 578 if (rc < 0) { ··· 686 604 if (!edev) 687 605 return; 688 606 dn = eeh_dev_to_of_node(edev); 689 - 607 + 690 608 for (i = 0; i < 16; i++) 691 609 eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]); 692 610 } ··· 756 674 * Even if force-off is set, the EEH hardware is still enabled, so that 757 675 * newer systems can boot. 758 676 */ 759 - static int __init eeh_init(void) 677 + int eeh_init(void) 760 678 { 761 679 struct pci_controller *hose, *tmp; 762 680 struct device_node *phb; 763 - int ret; 681 + static int cnt = 0; 682 + int ret = 0; 683 + 684 + /* 685 + * We have to delay the initialization on PowerNV after 686 + * the PCI hierarchy tree has been built because the PEs 687 + * are figured out based on PCI devices instead of device 688 + * tree nodes 689 + */ 690 + if (machine_is(powernv) && cnt++ <= 0) 691 + return ret; 764 692 765 693 /* call platform initialization function */ 766 694 if (!eeh_ops) { ··· 783 691 return ret; 784 692 } 785 693 786 - raw_spin_lock_init(&confirm_error_lock); 694 + /* Initialize EEH event */ 695 + ret = eeh_event_init(); 696 + if (ret) 697 + return ret; 787 698 788 699 /* Enable EEH for all adapters */ 789 700 if (eeh_probe_mode_devtree()) { ··· 795 700 phb = hose->dn; 796 701 traverse_pci_devices(phb, eeh_ops->of_probe, NULL); 797 702 } 703 + } else if (eeh_probe_mode_dev()) { 704 + list_for_each_entry_safe(hose, tmp, 705 + &hose_list, list_node) 706 + pci_walk_bus(hose->bus, eeh_ops->dev_probe, NULL); 707 + } else { 708 + pr_warning("%s: Invalid probe mode %d\n", 709 
+ __func__, eeh_probe_mode); 710 + return -EINVAL; 711 + } 712 + 713 + /* 714 + * Call platform post-initialization. Actually, It's good chance 715 + * to inform platform that EEH is ready to supply service if the 716 + * I/O cache stuff has been built up. 717 + */ 718 + if (eeh_ops->post_init) { 719 + ret = eeh_ops->post_init(); 720 + if (ret) 721 + return ret; 798 722 } 799 723 800 724 if (eeh_subsystem_enabled) ··· 842 728 { 843 729 struct pci_controller *phb; 844 730 731 + /* 732 + * If we're doing EEH probe based on PCI device, we 733 + * would delay the probe until late stage because 734 + * the PCI device isn't available this moment. 735 + */ 736 + if (!eeh_probe_mode_devtree()) 737 + return; 738 + 845 739 if (!of_node_to_eeh_dev(dn)) 846 740 return; 847 741 phb = of_node_to_eeh_dev(dn)->phb; ··· 858 736 if (NULL == phb || 0 == phb->buid) 859 737 return; 860 738 861 - /* FIXME: hotplug support on POWERNV */ 862 739 eeh_ops->of_probe(dn, NULL); 863 740 } 864 741 ··· 908 787 edev->pdev = dev; 909 788 dev->dev.archdata.edev = edev; 910 789 790 + /* 791 + * We have to do the EEH probe here because the PCI device 792 + * hasn't been created yet in the early stage. 793 + */ 794 + if (eeh_probe_mode_dev()) 795 + eeh_ops->dev_probe(dev, NULL); 796 + 911 797 eeh_addr_cache_insert_dev(dev); 912 798 } 913 799 ··· 931 803 struct pci_dev *dev; 932 804 933 805 list_for_each_entry(dev, &bus->devices, bus_list) { 934 - eeh_add_device_late(dev); 935 - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { 936 - struct pci_bus *subbus = dev->subordinate; 937 - if (subbus) 938 - eeh_add_device_tree_late(subbus); 939 - } 806 + eeh_add_device_late(dev); 807 + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { 808 + struct pci_bus *subbus = dev->subordinate; 809 + if (subbus) 810 + eeh_add_device_tree_late(subbus); 811 + } 940 812 } 941 813 } 942 814 EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
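One small but important fix in eeh.c above: `eeh_reset_pe()` now masks the state returned by `wait_state()` before comparing, so extra status bits no longer make a successfully recovered PE look failed. The before/after difference in miniature (bit values here are illustrative, not the kernel's):

```c
#include <assert.h>

/* Illustrative state bits; the real ones are the EEH_STATE_* flags. */
#define TOY_STATE_MMIO_ACTIVE	(1 << 0)
#define TOY_STATE_DMA_ACTIVE	(1 << 1)
#define TOY_STATE_MMIO_ENABLED	(1 << 2)  /* an extra bit wait_state may add */

/* Recovered when both "active" bits are set, regardless of extra bits. */
static int toy_pe_recovered(int rc)
{
	int flags = TOY_STATE_MMIO_ACTIVE | TOY_STATE_DMA_ACTIVE;

	/* The old "rc == flags" test failed whenever any extra bit was
	 * set; masking first checks only the two bits we care about. */
	return (rc & flags) == flags;
}
```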
+2 -3
arch/powerpc/platforms/pseries/eeh_cache.c arch/powerpc/kernel/eeh_cache.c
···
194 194 	}
195 195 
196 196 	/* Skip any devices for which EEH is not enabled. */
197 - 	if (!edev->pe) {
197 + 	if (!eeh_probe_mode_dev() && !edev->pe) {
198 198 #ifdef DEBUG
199 199 		pr_info("PCI: skip building address cache for=%s - %s\n",
200 200 			pci_name(dev), dn->full_name);
···
285 285  * Must be run late in boot process, after the pci controllers
286 286  * have been scanned for devices (after all device resources are known).
287 287  */
288 - void __init eeh_addr_cache_build(void)
288 + void eeh_addr_cache_build(void)
289 289 {
290 290 	struct device_node *dn;
291 291 	struct eeh_dev *edev;
···
316 316 	eeh_addr_cache_print(&pci_io_addr_cache_root);
317 317 #endif
318 318 }
319 - 
arch/powerpc/platforms/pseries/eeh_dev.c arch/powerpc/kernel/eeh_dev.c
+139 -30
arch/powerpc/platforms/pseries/eeh_driver.c arch/powerpc/kernel/eeh_driver.c
··· 154 154 * eeh_report_error - Report pci error to each device driver 155 155 * @data: eeh device 156 156 * @userdata: return value 157 - * 158 - * Report an EEH error to each device driver, collect up and 159 - * merge the device driver responses. Cumulative response 157 + * 158 + * Report an EEH error to each device driver, collect up and 159 + * merge the device driver responses. Cumulative response 160 160 * passed back in "userdata". 161 161 */ 162 162 static void *eeh_report_error(void *data, void *userdata) ··· 349 349 */ 350 350 static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus) 351 351 { 352 + struct timeval tstamp; 352 353 int cnt, rc; 353 354 354 355 /* pcibios will clear the counter; save the value */ 355 356 cnt = pe->freeze_count; 357 + tstamp = pe->tstamp; 356 358 357 359 /* 358 360 * We don't remove the corresponding PE instances because ··· 378 376 eeh_pe_restore_bars(pe); 379 377 380 378 /* Give the system 5 seconds to finish running the user-space 381 - * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, 382 - * this is a hack, but if we don't do this, and try to bring 383 - * the device up before the scripts have taken it down, 379 + * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, 380 + * this is a hack, but if we don't do this, and try to bring 381 + * the device up before the scripts have taken it down, 384 382 * potentially weird things happen. 385 383 */ 386 384 if (bus) { 387 385 ssleep(5); 388 386 pcibios_add_pci_devices(bus); 389 387 } 388 + 389 + pe->tstamp = tstamp; 390 390 pe->freeze_count = cnt; 391 391 392 392 return 0; ··· 399 395 */ 400 396 #define MAX_WAIT_FOR_RECOVERY 150 401 397 402 - /** 403 - * eeh_handle_event - Reset a PCI device after hard lockup. 404 - * @pe: EEH PE 405 - * 406 - * While PHB detects address or data parity errors on particular PCI 407 - * slot, the associated PE will be frozen. 
Besides, DMA's occurring 408 - * to wild addresses (which usually happen due to bugs in device 409 - * drivers or in PCI adapter firmware) can cause EEH error. #SERR, 410 - * #PERR or other misc PCI-related errors also can trigger EEH errors. 411 - * 412 - * Recovery process consists of unplugging the device driver (which 413 - * generated hotplug events to userspace), then issuing a PCI #RST to 414 - * the device, then reconfiguring the PCI config space for all bridges 415 - * & devices under this slot, and then finally restarting the device 416 - * drivers (which cause a second set of hotplug events to go out to 417 - * userspace). 418 - */ 419 - void eeh_handle_event(struct eeh_pe *pe) 398 + static void eeh_handle_normal_event(struct eeh_pe *pe) 420 399 { 421 400 struct pci_bus *frozen_bus; 422 401 int rc = 0; ··· 412 425 return; 413 426 } 414 427 428 + eeh_pe_update_time_stamp(pe); 415 429 pe->freeze_count++; 416 430 if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES) 417 431 goto excess_failures; ··· 425 437 * status ... if any child can't handle the reset, then the entire 426 438 * slot is dlpar removed and added. 427 439 */ 440 + pr_info("EEH: Notify device drivers to shutdown\n"); 428 441 eeh_pe_dev_traverse(pe, eeh_report_error, &result); 429 442 430 443 /* Get the current PCI slot state. This can take a long time, ··· 433 444 */ 434 445 rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000); 435 446 if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) { 436 - printk(KERN_WARNING "EEH: Permanent failure\n"); 447 + pr_warning("EEH: Permanent failure\n"); 437 448 goto hard_fail; 438 449 } 439 450 ··· 441 452 * don't post the error log until after all dev drivers 442 453 * have been informed. 443 454 */ 455 + pr_info("EEH: Collect temporary log\n"); 444 456 eeh_slot_error_detail(pe, EEH_LOG_TEMP); 445 457 446 458 /* If all device drivers were EEH-unaware, then shut ··· 449 459 * go down willingly, without panicing the system. 
450 460 */ 451 461 if (result == PCI_ERS_RESULT_NONE) { 462 + pr_info("EEH: Reset with hotplug activity\n"); 452 463 rc = eeh_reset_device(pe, frozen_bus); 453 464 if (rc) { 454 - printk(KERN_WARNING "EEH: Unable to reset, rc=%d\n", rc); 465 + pr_warning("%s: Unable to reset, err=%d\n", 466 + __func__, rc); 455 467 goto hard_fail; 456 468 } 457 469 } 458 470 459 471 /* If all devices reported they can proceed, then re-enable MMIO */ 460 472 if (result == PCI_ERS_RESULT_CAN_RECOVER) { 473 + pr_info("EEH: Enable I/O for affected devices\n"); 461 474 rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); 462 475 463 476 if (rc < 0) ··· 468 475 if (rc) { 469 476 result = PCI_ERS_RESULT_NEED_RESET; 470 477 } else { 478 + pr_info("EEH: Notify device drivers to resume I/O\n"); 471 479 result = PCI_ERS_RESULT_NONE; 472 480 eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result); 473 481 } ··· 476 482 477 483 /* If all devices reported they can proceed, then re-enable DMA */ 478 484 if (result == PCI_ERS_RESULT_CAN_RECOVER) { 485 + pr_info("EEH: Enabled DMA for affected devices\n"); 479 486 rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA); 480 487 481 488 if (rc < 0) ··· 489 494 490 495 /* If any device has a hard failure, then shut off everything. 
*/ 491 496 if (result == PCI_ERS_RESULT_DISCONNECT) { 492 - printk(KERN_WARNING "EEH: Device driver gave up\n"); 497 + pr_warning("EEH: Device driver gave up\n"); 493 498 goto hard_fail; 494 499 } 495 500 496 501 /* If any device called out for a reset, then reset the slot */ 497 502 if (result == PCI_ERS_RESULT_NEED_RESET) { 503 + pr_info("EEH: Reset without hotplug activity\n"); 498 504 rc = eeh_reset_device(pe, NULL); 499 505 if (rc) { 500 - printk(KERN_WARNING "EEH: Cannot reset, rc=%d\n", rc); 506 + pr_warning("%s: Cannot reset, err=%d\n", 507 + __func__, rc); 501 508 goto hard_fail; 502 509 } 510 + 511 + pr_info("EEH: Notify device drivers " 512 + "the completion of reset\n"); 503 513 result = PCI_ERS_RESULT_NONE; 504 514 eeh_pe_dev_traverse(pe, eeh_report_reset, &result); 505 515 } ··· 512 512 /* All devices should claim they have recovered by now. */ 513 513 if ((result != PCI_ERS_RESULT_RECOVERED) && 514 514 (result != PCI_ERS_RESULT_NONE)) { 515 - printk(KERN_WARNING "EEH: Not recovered\n"); 515 + pr_warning("EEH: Not recovered\n"); 516 516 goto hard_fail; 517 517 } 518 518 519 519 /* Tell all device drivers that they can resume operations */ 520 + pr_info("EEH: Notify device driver to resume\n"); 520 521 eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); 521 522 522 523 return; 523 - 524 + 524 525 excess_failures: 525 526 /* 526 527 * About 90% of all real-life EEH failures in the field ··· 551 550 pcibios_remove_pci_devices(frozen_bus); 552 551 } 553 552 553 + static void eeh_handle_special_event(void) 554 + { 555 + struct eeh_pe *pe, *phb_pe; 556 + struct pci_bus *bus; 557 + struct pci_controller *hose, *tmp; 558 + unsigned long flags; 559 + int rc = 0; 560 + 561 + /* 562 + * The return value from next_error() has been classified as follows. 563 + * It might be good to enumerate them. However, next_error() is only 564 + * supported by PowerNV platform for now. 
So it would be fine to use 565 + * integer directly: 566 + * 567 + * 4 - Dead IOC 3 - Dead PHB 568 + * 2 - Fenced PHB 1 - Frozen PE 569 + * 0 - No error found 570 + * 571 + */ 572 + rc = eeh_ops->next_error(&pe); 573 + if (rc <= 0) 574 + return; 575 + 576 + switch (rc) { 577 + case 4: 578 + /* Mark all PHBs in dead state */ 579 + eeh_serialize_lock(&flags); 580 + list_for_each_entry_safe(hose, tmp, 581 + &hose_list, list_node) { 582 + phb_pe = eeh_phb_pe_get(hose); 583 + if (!phb_pe) continue; 584 + 585 + eeh_pe_state_mark(phb_pe, 586 + EEH_PE_ISOLATED | EEH_PE_PHB_DEAD); 587 + } 588 + eeh_serialize_unlock(flags); 589 + 590 + /* Purge all events */ 591 + eeh_remove_event(NULL); 592 + break; 593 + case 3: 594 + case 2: 595 + case 1: 596 + /* Mark the PE in fenced state */ 597 + eeh_serialize_lock(&flags); 598 + if (rc == 3) 599 + eeh_pe_state_mark(pe, 600 + EEH_PE_ISOLATED | EEH_PE_PHB_DEAD); 601 + else 602 + eeh_pe_state_mark(pe, 603 + EEH_PE_ISOLATED | EEH_PE_RECOVERING); 604 + eeh_serialize_unlock(flags); 605 + 606 + /* Purge all events of the PHB */ 607 + eeh_remove_event(pe); 608 + break; 609 + default: 610 + pr_err("%s: Invalid value %d from next_error()\n", 611 + __func__, rc); 612 + return; 613 + } 614 + 615 + /* 616 + * For fenced PHB and frozen PE, it's handled as normal 617 + * event. We have to remove the affected PHBs for dead 618 + * PHB and IOC 619 + */ 620 + if (rc == 2 || rc == 1) 621 + eeh_handle_normal_event(pe); 622 + else { 623 + list_for_each_entry_safe(hose, tmp, 624 + &hose_list, list_node) { 625 + phb_pe = eeh_phb_pe_get(hose); 626 + if (!phb_pe || !(phb_pe->state & EEH_PE_PHB_DEAD)) 627 + continue; 628 + 629 + bus = eeh_pe_bus_get(phb_pe); 630 + /* Notify all devices that they're about to go down. */ 631 + eeh_pe_dev_traverse(pe, eeh_report_failure, NULL); 632 + pcibios_remove_pci_devices(bus); 633 + } 634 + } 635 + } 636 + 637 + /** 638 + * eeh_handle_event - Reset a PCI device after hard lockup. 
639 + * @pe: EEH PE 640 + * 641 + * When the PHB detects address or data parity errors on a particular 642 + * PCI slot, the associated PE is frozen. Besides, DMAs to wild 643 + * addresses (which usually happen due to bugs in device drivers or 644 + * in PCI adapter firmware) can cause EEH errors. #SERR, 645 + * #PERR or other misc PCI-related errors can also trigger EEH errors. 646 + * 647 + * The recovery process consists of unplugging the device driver (which 648 + * generates hotplug events to userspace), then issuing a PCI #RST to 649 + * the device, then reconfiguring the PCI config space for all bridges 650 + * & devices under this slot, and finally restarting the device 651 + * drivers (which causes a second set of hotplug events to go out to 652 + * userspace). 653 + */ 654 + void eeh_handle_event(struct eeh_pe *pe) 655 + { 656 + if (pe) 657 + eeh_handle_normal_event(pe); 658 + else 659 + eeh_handle_special_event(); 660 + }
+84 -44
arch/powerpc/platforms/pseries/eeh_event.c arch/powerpc/kernel/eeh_event.c
··· 18 18 19 19 #include <linux/delay.h> 20 20 #include <linux/list.h> 21 - #include <linux/mutex.h> 22 21 #include <linux/sched.h> 22 + #include <linux/semaphore.h> 23 23 #include <linux/pci.h> 24 24 #include <linux/slab.h> 25 - #include <linux/workqueue.h> 26 25 #include <linux/kthread.h> 27 26 #include <asm/eeh_event.h> 28 27 #include <asm/ppc-pci.h> ··· 34 35 * work-queue, where a worker thread can drive recovery. 35 36 */ 36 37 37 - /* EEH event workqueue setup. */ 38 38 static DEFINE_SPINLOCK(eeh_eventlist_lock); 39 + static struct semaphore eeh_eventlist_sem; 39 40 LIST_HEAD(eeh_eventlist); 40 - static void eeh_thread_launcher(struct work_struct *); 41 - DECLARE_WORK(eeh_event_wq, eeh_thread_launcher); 42 - 43 - /* Serialize reset sequences for a given pci device */ 44 - DEFINE_MUTEX(eeh_event_mutex); 45 41 46 42 /** 47 43 * eeh_event_handler - Dispatch EEH events. ··· 54 60 struct eeh_event *event; 55 61 struct eeh_pe *pe; 56 62 57 - spin_lock_irqsave(&eeh_eventlist_lock, flags); 58 - event = NULL; 63 + while (!kthread_should_stop()) { 64 + if (down_interruptible(&eeh_eventlist_sem)) 65 + break; 59 66 60 - /* Unqueue the event, get ready to process. 
*/ 61 - if (!list_empty(&eeh_eventlist)) { 62 - event = list_entry(eeh_eventlist.next, struct eeh_event, list); 63 - list_del(&event->list); 64 - } 65 - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 67 + /* Fetch EEH event from the queue */ 68 + spin_lock_irqsave(&eeh_eventlist_lock, flags); 69 + event = NULL; 70 + if (!list_empty(&eeh_eventlist)) { 71 + event = list_entry(eeh_eventlist.next, 72 + struct eeh_event, list); 73 + list_del(&event->list); 74 + } 75 + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 76 + if (!event) 77 + continue; 66 78 67 - if (event == NULL) 68 - return 0; 79 + /* We might have event without binding PE */ 80 + pe = event->pe; 81 + if (pe) { 82 + eeh_pe_state_mark(pe, EEH_PE_RECOVERING); 83 + pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n", 84 + pe->phb->global_number, pe->addr); 85 + eeh_handle_event(pe); 86 + eeh_pe_state_clear(pe, EEH_PE_RECOVERING); 87 + } else { 88 + eeh_handle_event(NULL); 89 + } 69 90 70 - /* Serialize processing of EEH events */ 71 - mutex_lock(&eeh_event_mutex); 72 - pe = event->pe; 73 - eeh_pe_state_mark(pe, EEH_PE_RECOVERING); 74 - pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n", 75 - pe->phb->global_number, pe->addr); 76 - 77 - set_current_state(TASK_INTERRUPTIBLE); /* Don't add to load average */ 78 - eeh_handle_event(pe); 79 - eeh_pe_state_clear(pe, EEH_PE_RECOVERING); 80 - 81 - kfree(event); 82 - mutex_unlock(&eeh_event_mutex); 83 - 84 - /* If there are no new errors after an hour, clear the counter. */ 85 - if (pe && pe->freeze_count > 0) { 86 - msleep_interruptible(3600*1000); 87 - if (pe->freeze_count > 0) 88 - pe->freeze_count--; 89 - 91 + kfree(event); 90 92 } 91 93 92 94 return 0; 93 95 } 94 96 95 97 /** 96 - * eeh_thread_launcher - Start kernel thread to handle EEH events 97 - * @dummy - unused 98 + * eeh_event_init - Start kernel thread to handle EEH events 98 99 * 99 100 * This routine is called to start the kernel thread for processing 100 101 * EEH event. 
101 102 */ 102 - static void eeh_thread_launcher(struct work_struct *dummy) 103 + int eeh_event_init(void) 103 104 { 104 - if (IS_ERR(kthread_run(eeh_event_handler, NULL, "eehd"))) 105 - printk(KERN_ERR "Failed to start EEH daemon\n"); 105 + struct task_struct *t; 106 + int ret = 0; 107 + 108 + /* Initialize semaphore */ 109 + sema_init(&eeh_eventlist_sem, 0); 110 + 111 + t = kthread_run(eeh_event_handler, NULL, "eehd"); 112 + if (IS_ERR(t)) { 113 + ret = PTR_ERR(t); 114 + pr_err("%s: Failed to start EEH daemon (%d)\n", 115 + __func__, ret); 116 + return ret; 117 + } 118 + 119 + return 0; 106 120 } 107 121 108 122 /** ··· 138 136 list_add(&event->list, &eeh_eventlist); 139 137 spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 140 138 141 - schedule_work(&eeh_event_wq); 139 + /* For the EEH daemon to kick in */ 140 + up(&eeh_eventlist_sem); 142 141 143 142 return 0; 143 + } 144 + 145 + /** 146 + * eeh_remove_event - Remove EEH event from the queue 147 + * @pe: Event binding to the PE 148 + * 149 + * On the PowerNV platform, subsequent events might be part 150 + * of an earlier one. In that case, those subsequent events 151 + * are duplicates and unnecessary, thus 152 + * they should be removed. 153 + */ 154 + void eeh_remove_event(struct eeh_pe *pe) 155 + { 156 + unsigned long flags; 157 + struct eeh_event *event, *tmp; 158 + 159 + spin_lock_irqsave(&eeh_eventlist_lock, flags); 160 + list_for_each_entry_safe(event, tmp, &eeh_eventlist, list) { 161 + /* 162 + * If we don't have a valid PE passed in, that means 163 + * we already have an event corresponding to a dead IOC 164 + * and all events should be purged.
165 + */ 166 + if (!pe) { 167 + list_del(&event->list); 168 + kfree(event); 169 + } else if (pe->type & EEH_PE_PHB) { 170 + if (event->pe && event->pe->phb == pe->phb) { 171 + list_del(&event->list); 172 + kfree(event); 173 + } 174 + } else if (event->pe == pe) { 175 + list_del(&event->list); 176 + kfree(event); 177 + } 178 + } 179 + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 144 180 }
+192 -45
arch/powerpc/platforms/pseries/eeh_pe.c arch/powerpc/kernel/eeh_pe.c
··· 22 22 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 23 23 */ 24 24 25 + #include <linux/delay.h> 25 26 #include <linux/export.h> 26 27 #include <linux/gfp.h> 27 28 #include <linux/init.h> ··· 79 78 } 80 79 81 80 /* Put it into the list */ 82 - eeh_lock(); 83 81 list_add_tail(&pe->child, &eeh_phb_pe); 84 - eeh_unlock(); 85 82 86 83 pr_debug("EEH: Add PE for PHB#%d\n", phb->global_number); 87 84 ··· 94 95 * hierarchy tree is composed of PHB PEs. The function is used 95 96 * to retrieve the corresponding PHB PE according to the given PHB. 96 97 */ 97 - static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb) 98 + struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb) 98 99 { 99 100 struct eeh_pe *pe; 100 101 ··· 184 185 return NULL; 185 186 } 186 187 187 - eeh_lock(); 188 - 189 188 /* Traverse root PE */ 190 189 for (pe = root; pe; pe = eeh_pe_next(pe, root)) { 191 190 eeh_pe_for_each_dev(pe, edev) { 192 191 ret = fn(edev, flag); 193 - if (ret) { 194 - eeh_unlock(); 192 + if (ret) 195 193 return ret; 196 - } 197 194 } 198 195 } 199 - 200 - eeh_unlock(); 201 196 202 197 return NULL; 203 198 } ··· 221 228 return pe; 222 229 223 230 /* Try BDF address */ 224 - if (edev->pe_config_addr && 231 + if (edev->config_addr && 225 232 (edev->config_addr == pe->config_addr)) 226 233 return pe; 227 234 ··· 239 246 * which is composed of PCI bus/device/function number, or unified 240 247 * PE address. 241 248 */ 242 - static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev) 249 + struct eeh_pe *eeh_pe_get(struct eeh_dev *edev) 243 250 { 244 251 struct eeh_pe *root = eeh_phb_pe_get(edev->phb); 245 252 struct eeh_pe *pe; ··· 298 305 { 299 306 struct eeh_pe *pe, *parent; 300 307 301 - eeh_lock(); 302 - 303 308 /* 304 309 * Search the PE has been existing or not according 305 310 * to the PE address. 
If that has been existing, the ··· 307 316 pe = eeh_pe_get(edev); 308 317 if (pe && !(pe->type & EEH_PE_INVALID)) { 309 318 if (!edev->pe_config_addr) { 310 - eeh_unlock(); 311 319 pr_err("%s: PE with addr 0x%x already exists\n", 312 320 __func__, edev->config_addr); 313 321 return -EEXIST; ··· 318 328 319 329 /* Put the edev to PE */ 320 330 list_add_tail(&edev->list, &pe->edevs); 321 - eeh_unlock(); 322 331 pr_debug("EEH: Add %s to Bus PE#%x\n", 323 332 edev->dn->full_name, pe->addr); 324 333 ··· 336 347 parent->type &= ~EEH_PE_INVALID; 337 348 parent = parent->parent; 338 349 } 339 - eeh_unlock(); 340 350 pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n", 341 351 edev->dn->full_name, pe->addr, pe->parent->addr); 342 352 ··· 345 357 /* Create a new EEH PE */ 346 358 pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE); 347 359 if (!pe) { 348 - eeh_unlock(); 349 360 pr_err("%s: out of memory!\n", __func__); 350 361 return -ENOMEM; 351 362 } 352 363 pe->addr = edev->pe_config_addr; 353 364 pe->config_addr = edev->config_addr; 365 + 366 + /* 367 + * While doing PE reset, we probably hot-reset the 368 + * upstream bridge. However, the PCI devices including 369 + * the associated EEH devices might be removed when EEH 370 + * core is doing recovery. So that won't safe to retrieve 371 + * the bridge through downstream EEH device. We have to 372 + * trace the parent PCI bus, then the upstream bridge. 373 + */ 374 + if (eeh_probe_mode_dev()) 375 + pe->bus = eeh_dev_to_pci_dev(edev)->bus; 354 376 355 377 /* 356 378 * Put the new EEH PE into hierarchy tree. 
If the parent ··· 372 374 if (!parent) { 373 375 parent = eeh_phb_pe_get(edev->phb); 374 376 if (!parent) { 375 - eeh_unlock(); 376 377 pr_err("%s: No PHB PE is found (PHB Domain=%d)\n", 377 378 __func__, edev->phb->global_number); 378 379 edev->pe = NULL; ··· 388 391 list_add_tail(&pe->child, &parent->child_list); 389 392 list_add_tail(&edev->list, &pe->edevs); 390 393 edev->pe = pe; 391 - eeh_unlock(); 392 394 pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n", 393 395 edev->dn->full_name, pe->addr, pe->parent->addr); 394 396 ··· 414 418 __func__, edev->dn->full_name); 415 419 return -EEXIST; 416 420 } 417 - 418 - eeh_lock(); 419 421 420 422 /* Remove the EEH device */ 421 423 pe = edev->pe; ··· 459 465 pe = parent; 460 466 } 461 467 462 - eeh_unlock(); 463 - 464 468 return 0; 469 + } 470 + 471 + /** 472 + * eeh_pe_update_time_stamp - Update PE's frozen time stamp 473 + * @pe: EEH PE 474 + * 475 + * We keep a time stamp for each PE to record when it was last 476 + * frozen. The function should be called to update the time stamp 477 + * on the first error of the specific PE; errors that happened 478 + * more than an hour ago needn't be accounted for.
479 + */ 480 + void eeh_pe_update_time_stamp(struct eeh_pe *pe) 481 + { 482 + struct timeval tstamp; 483 + 484 + if (!pe) return; 485 + 486 + if (pe->freeze_count <= 0) { 487 + pe->freeze_count = 0; 488 + do_gettimeofday(&pe->tstamp); 489 + } else { 490 + do_gettimeofday(&tstamp); 491 + if (tstamp.tv_sec - pe->tstamp.tv_sec > 3600) { 492 + pe->tstamp = tstamp; 493 + pe->freeze_count = 0; 494 + } 495 + } 465 496 } 466 497 467 498 /** ··· 531 512 */ 532 513 void eeh_pe_state_mark(struct eeh_pe *pe, int state) 533 514 { 534 - eeh_lock(); 535 515 eeh_pe_traverse(pe, __eeh_pe_state_mark, &state); 536 - eeh_unlock(); 537 516 } 538 517 539 518 /** ··· 565 548 */ 566 549 void eeh_pe_state_clear(struct eeh_pe *pe, int state) 567 550 { 568 - eeh_lock(); 569 551 eeh_pe_traverse(pe, __eeh_pe_state_clear, &state); 570 - eeh_unlock(); 571 552 } 572 553 573 - /** 574 - * eeh_restore_one_device_bars - Restore the Base Address Registers for one device 575 - * @data: EEH device 576 - * @flag: Unused 554 + /* 555 + * Some PCI bridges (e.g. PLX bridges) have primary/secondary 556 + * buses assigned explicitly by firmware, and we probably have 557 + * lost that after reset. So we have to delay the check until 558 + * the PCI-CFG registers have been restored for the parent 559 + * bridge. 577 560 * 578 - * Loads the PCI configuration space base address registers, 579 - * the expansion ROM base address, the latency timer, and etc. 580 - * from the saved values in the device node. 561 + * Don't use normal PCI-CFG accessors, which probably has been 562 + * blocked on normal path during the stage. So we need utilize 563 + * eeh operations, which is always permitted. 
581 564 */ 582 - static void *eeh_restore_one_device_bars(void *data, void *flag) 565 + static void eeh_bridge_check_link(struct pci_dev *pdev, 566 + struct device_node *dn) 567 + { 568 + int cap; 569 + uint32_t val; 570 + int timeout = 0; 571 + 572 + /* 573 + * We only check root port and downstream ports of 574 + * PCIe switches 575 + */ 576 + if (!pci_is_pcie(pdev) || 577 + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT && 578 + pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM)) 579 + return; 580 + 581 + pr_debug("%s: Check PCIe link for %s ...\n", 582 + __func__, pci_name(pdev)); 583 + 584 + /* Check slot status */ 585 + cap = pdev->pcie_cap; 586 + eeh_ops->read_config(dn, cap + PCI_EXP_SLTSTA, 2, &val); 587 + if (!(val & PCI_EXP_SLTSTA_PDS)) { 588 + pr_debug(" No card in the slot (0x%04x) !\n", val); 589 + return; 590 + } 591 + 592 + /* Check power status if we have the capability */ 593 + eeh_ops->read_config(dn, cap + PCI_EXP_SLTCAP, 2, &val); 594 + if (val & PCI_EXP_SLTCAP_PCP) { 595 + eeh_ops->read_config(dn, cap + PCI_EXP_SLTCTL, 2, &val); 596 + if (val & PCI_EXP_SLTCTL_PCC) { 597 + pr_debug(" In power-off state, power it on ...\n"); 598 + val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC); 599 + val |= (0x0100 & PCI_EXP_SLTCTL_PIC); 600 + eeh_ops->write_config(dn, cap + PCI_EXP_SLTCTL, 2, val); 601 + msleep(2 * 1000); 602 + } 603 + } 604 + 605 + /* Enable link */ 606 + eeh_ops->read_config(dn, cap + PCI_EXP_LNKCTL, 2, &val); 607 + val &= ~PCI_EXP_LNKCTL_LD; 608 + eeh_ops->write_config(dn, cap + PCI_EXP_LNKCTL, 2, val); 609 + 610 + /* Check link */ 611 + eeh_ops->read_config(dn, cap + PCI_EXP_LNKCAP, 4, &val); 612 + if (!(val & PCI_EXP_LNKCAP_DLLLARC)) { 613 + pr_debug(" No link reporting capability (0x%08x) \n", val); 614 + msleep(1000); 615 + return; 616 + } 617 + 618 + /* Wait the link is up until timeout (5s) */ 619 + timeout = 0; 620 + while (timeout < 5000) { 621 + msleep(20); 622 + timeout += 20; 623 + 624 + eeh_ops->read_config(dn, cap + 
PCI_EXP_LNKSTA, 2, &val); 625 + if (val & PCI_EXP_LNKSTA_DLLLA) 626 + break; 627 + } 628 + 629 + if (val & PCI_EXP_LNKSTA_DLLLA) 630 + pr_debug(" Link up (%s)\n", 631 + (val & PCI_EXP_LNKSTA_CLS_2_5GB) ? "2.5GB" : "5GB"); 632 + else 633 + pr_debug(" Link not ready (0x%04x)\n", val); 634 + } 635 + 636 + #define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) 637 + #define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)]) 638 + 639 + static void eeh_restore_bridge_bars(struct pci_dev *pdev, 640 + struct eeh_dev *edev, 641 + struct device_node *dn) 642 + { 643 + int i; 644 + 645 + /* 646 + * Device BARs: 0x10 - 0x18 647 + * Bus numbers and windows: 0x18 - 0x30 648 + */ 649 + for (i = 4; i < 13; i++) 650 + eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]); 651 + /* Rom: 0x38 */ 652 + eeh_ops->write_config(dn, 14*4, 4, edev->config_space[14]); 653 + 654 + /* Cache line & Latency timer: 0xC 0xD */ 655 + eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1, 656 + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); 657 + eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1, 658 + SAVED_BYTE(PCI_LATENCY_TIMER)); 659 + /* Max latency, min grant, interrupt ping and line: 0x3C */ 660 + eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]); 661 + 662 + /* PCI Command: 0x4 */ 663 + eeh_ops->write_config(dn, PCI_COMMAND, 4, edev->config_space[1]); 664 + 665 + /* Check the PCIe link is ready */ 666 + eeh_bridge_check_link(pdev, dn); 667 + } 668 + 669 + static void eeh_restore_device_bars(struct eeh_dev *edev, 670 + struct device_node *dn) 583 671 { 584 672 int i; 585 673 u32 cmd; 586 - struct eeh_dev *edev = (struct eeh_dev *)data; 587 - struct device_node *dn = eeh_dev_to_of_node(edev); 588 674 589 675 for (i = 4; i < 10; i++) 590 676 eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]); 591 677 /* 12 == Expansion ROM Address */ 592 678 eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]); 593 - 594 - #define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) 595 - #define SAVED_BYTE(OFF) (((u8 
*)(edev->config_space))[BYTE_SWAP(OFF)]) 596 679 597 680 eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1, 598 681 SAVED_BYTE(PCI_CACHE_LINE_SIZE)); ··· 716 599 else 717 600 cmd &= ~PCI_COMMAND_SERR; 718 601 eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd); 602 + } 603 + 604 + /** 605 + * eeh_restore_one_device_bars - Restore the Base Address Registers for one device 606 + * @data: EEH device 607 + * @flag: Unused 608 + * 609 + * Loads the PCI configuration space base address registers, 610 + * the expansion ROM base address, the latency timer, and etc. 611 + * from the saved values in the device node. 612 + */ 613 + static void *eeh_restore_one_device_bars(void *data, void *flag) 614 + { 615 + struct pci_dev *pdev = NULL; 616 + struct eeh_dev *edev = (struct eeh_dev *)data; 617 + struct device_node *dn = eeh_dev_to_of_node(edev); 618 + 619 + /* Trace the PCI bridge */ 620 + if (eeh_probe_mode_dev()) { 621 + pdev = eeh_dev_to_pci_dev(edev); 622 + if (pdev->hdr_type != PCI_HEADER_TYPE_BRIDGE) 623 + pdev = NULL; 624 + } 625 + 626 + if (pdev) 627 + eeh_restore_bridge_bars(pdev, edev, dn); 628 + else 629 + eeh_restore_device_bars(edev, dn); 719 630 720 631 return NULL; 721 632 } ··· 780 635 struct eeh_dev *edev; 781 636 struct pci_dev *pdev; 782 637 783 - eeh_lock(); 784 - 785 638 if (pe->type & EEH_PE_PHB) { 786 639 bus = pe->phb->bus; 787 640 } else if (pe->type & EEH_PE_BUS || 788 641 pe->type & EEH_PE_DEVICE) { 642 + if (pe->bus) { 643 + bus = pe->bus; 644 + goto out; 645 + } 646 + 789 647 edev = list_first_entry(&pe->edevs, struct eeh_dev, list); 790 648 pdev = eeh_dev_to_pci_dev(edev); 791 649 if (pdev) 792 650 bus = pdev->bus; 793 651 } 794 652 795 - eeh_unlock(); 796 - 653 + out: 797 654 return bus; 798 655 }
-1
arch/powerpc/platforms/pseries/eeh_sysfs.c arch/powerpc/kernel/eeh_sysfs.c
··· 72 72 device_remove_file(&pdev->dev, &dev_attr_eeh_config_addr); 73 73 device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr); 74 74 } 75 -
+1 -1
arch/powerpc/platforms/pseries/io_event_irq.c
··· 115 115 * by scope or event type alone. For example, Torrent ISR route change 116 116 * event is reported with scope 0x00 (Not Applicatable) rather than 117 117 * 0x3B (Torrent-hub). It is better to let the clients to identify 118 - * who owns the the event. 118 + * who owns the event. 119 119 */ 120 120 121 121 static irqreturn_t ioei_interrupt(int irq, void *dev_id)
+4
arch/powerpc/platforms/pseries/iommu.c
··· 614 614 615 615 iommu_table_setparms(pci->phb, dn, tbl); 616 616 pci->iommu_table = iommu_init_table(tbl, pci->phb->node); 617 + iommu_register_group(tbl, pci_domain_nr(bus), 0); 617 618 618 619 /* Divide the rest (1.75GB) among the children */ 619 620 pci->phb->dma_window_size = 0x80000000ul; ··· 659 658 ppci->phb->node); 660 659 iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window); 661 660 ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node); 661 + iommu_register_group(tbl, pci_domain_nr(bus), 0); 662 662 pr_debug(" created table: %p\n", ppci->iommu_table); 663 663 } 664 664 } ··· 686 684 phb->node); 687 685 iommu_table_setparms(phb, dn, tbl); 688 686 PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node); 687 + iommu_register_group(tbl, pci_domain_nr(phb->bus), 0); 689 688 set_iommu_table_base(&dev->dev, PCI_DN(dn)->iommu_table); 690 689 return; 691 690 } ··· 1187 1184 pci->phb->node); 1188 1185 iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window); 1189 1186 pci->iommu_table = iommu_init_table(tbl, pci->phb->node); 1187 + iommu_register_group(tbl, pci_domain_nr(pci->phb->bus), 0); 1190 1188 pr_debug(" created table: %p\n", pci->iommu_table); 1191 1189 } else { 1192 1190 pr_debug(" found DMA window, table: %p\n", pci->iommu_table);
+130 -12
arch/powerpc/platforms/pseries/lpar.c
··· 45 45 #include "plpar_wrappers.h" 46 46 #include "pseries.h" 47 47 48 + /* Flag bits for H_BULK_REMOVE */ 49 + #define HBR_REQUEST 0x4000000000000000UL 50 + #define HBR_RESPONSE 0x8000000000000000UL 51 + #define HBR_END 0xc000000000000000UL 52 + #define HBR_AVPN 0x0200000000000000UL 53 + #define HBR_ANDCOND 0x0100000000000000UL 54 + 48 55 49 56 /* in hvCall.S */ 50 57 EXPORT_SYMBOL(plpar_hcall); ··· 70 63 71 64 if (cpu_has_feature(CPU_FTR_ALTIVEC)) 72 65 lppaca_of(cpu).vmxregs_in_use = 1; 66 + 67 + if (cpu_has_feature(CPU_FTR_ARCH_207S)) 68 + lppaca_of(cpu).ebb_regs_in_use = 1; 73 69 74 70 addr = __pa(&lppaca_of(cpu)); 75 71 ret = register_vpa(hwcpu, addr); ··· 250 240 static long pSeries_lpar_hpte_updatepp(unsigned long slot, 251 241 unsigned long newpp, 252 242 unsigned long vpn, 253 - int psize, int ssize, int local) 243 + int psize, int apsize, 244 + int ssize, int local) 254 245 { 255 246 unsigned long lpar_rc; 256 247 unsigned long flags = (newpp & 7) | H_AVPN; ··· 339 328 } 340 329 341 330 static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn, 342 - int psize, int ssize, int local) 331 + int psize, int apsize, 332 + int ssize, int local) 343 333 { 344 334 unsigned long want_v; 345 335 unsigned long lpar_rc; ··· 357 345 BUG_ON(lpar_rc != H_SUCCESS); 358 346 } 359 347 348 + /* 349 + * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need 350 + * to make sure that we avoid bouncing the hypervisor tlbie lock. 
351 + */ 352 + #define PPC64_HUGE_HPTE_BATCH 12 353 + 354 + static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot, 355 + unsigned long *vpn, int count, 356 + int psize, int ssize) 357 + { 358 + unsigned long param[8]; 359 + int i = 0, pix = 0, rc; 360 + unsigned long flags = 0; 361 + int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); 362 + 363 + if (lock_tlbie) 364 + spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); 365 + 366 + for (i = 0; i < count; i++) { 367 + 368 + if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) { 369 + pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0, 370 + ssize, 0); 371 + } else { 372 + param[pix] = HBR_REQUEST | HBR_AVPN | slot[i]; 373 + param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize); 374 + pix += 2; 375 + if (pix == 8) { 376 + rc = plpar_hcall9(H_BULK_REMOVE, param, 377 + param[0], param[1], param[2], 378 + param[3], param[4], param[5], 379 + param[6], param[7]); 380 + BUG_ON(rc != H_SUCCESS); 381 + pix = 0; 382 + } 383 + } 384 + } 385 + if (pix) { 386 + param[pix] = HBR_END; 387 + rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1], 388 + param[2], param[3], param[4], param[5], 389 + param[6], param[7]); 390 + BUG_ON(rc != H_SUCCESS); 391 + } 392 + 393 + if (lock_tlbie) 394 + spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags); 395 + } 396 + 397 + static void pSeries_lpar_hugepage_invalidate(struct mm_struct *mm, 398 + unsigned char *hpte_slot_array, 399 + unsigned long addr, int psize) 400 + { 401 + int ssize = 0, i, index = 0; 402 + unsigned long s_addr = addr; 403 + unsigned int max_hpte_count, valid; 404 + unsigned long vpn_array[PPC64_HUGE_HPTE_BATCH]; 405 + unsigned long slot_array[PPC64_HUGE_HPTE_BATCH]; 406 + unsigned long shift, hidx, vpn = 0, vsid, hash, slot; 407 + 408 + shift = mmu_psize_defs[psize].shift; 409 + max_hpte_count = 1U << (PMD_SHIFT - shift); 410 + 411 + for (i = 0; i < max_hpte_count; i++) { 412 + valid = hpte_valid(hpte_slot_array, i); 413 + if (!valid) 414 + 
continue; 415 + hidx = hpte_hash_index(hpte_slot_array, i); 416 + 417 + /* get the vpn */ 418 + addr = s_addr + (i * (1ul << shift)); 419 + if (!is_kernel_addr(addr)) { 420 + ssize = user_segment_size(addr); 421 + vsid = get_vsid(mm->context.id, addr, ssize); 422 + WARN_ON(vsid == 0); 423 + } else { 424 + vsid = get_kernel_vsid(addr, mmu_kernel_ssize); 425 + ssize = mmu_kernel_ssize; 426 + } 427 + 428 + vpn = hpt_vpn(addr, vsid, ssize); 429 + hash = hpt_hash(vpn, shift, ssize); 430 + if (hidx & _PTEIDX_SECONDARY) 431 + hash = ~hash; 432 + 433 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 434 + slot += hidx & _PTEIDX_GROUP_IX; 435 + 436 + slot_array[index] = slot; 437 + vpn_array[index] = vpn; 438 + if (index == PPC64_HUGE_HPTE_BATCH - 1) { 439 + /* 440 + * Now do a bulk invalidate 441 + */ 442 + __pSeries_lpar_hugepage_invalidate(slot_array, 443 + vpn_array, 444 + PPC64_HUGE_HPTE_BATCH, 445 + psize, ssize); 446 + index = 0; 447 + } else 448 + index++; 449 + } 450 + if (index) 451 + __pSeries_lpar_hugepage_invalidate(slot_array, vpn_array, 452 + index, psize, ssize); 453 + } 454 + 360 455 static void pSeries_lpar_hpte_removebolted(unsigned long ea, 361 456 int psize, int ssize) 362 457 { ··· 475 356 476 357 slot = pSeries_lpar_hpte_find(vpn, psize, ssize); 477 358 BUG_ON(slot == -1); 478 - 479 - pSeries_lpar_hpte_invalidate(slot, vpn, psize, ssize, 0); 359 + /* 360 + * lpar doesn't use the passed actual page size 361 + */ 362 + pSeries_lpar_hpte_invalidate(slot, vpn, psize, 0, ssize, 0); 480 363 } 481 - 482 - /* Flag bits for H_BULK_REMOVE */ 483 - #define HBR_REQUEST 0x4000000000000000UL 484 - #define HBR_RESPONSE 0x8000000000000000UL 485 - #define HBR_END 0xc000000000000000UL 486 - #define HBR_AVPN 0x0200000000000000UL 487 - #define HBR_ANDCOND 0x0100000000000000UL 488 364 489 365 /* 490 366 * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
_PTEIDX_GROUP_IX; 516 402 if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) { 403 + /* 404 + * lpar doesn't use the passed actual page size 405 + */ 517 406 pSeries_lpar_hpte_invalidate(slot, vpn, psize, 518 - ssize, local); 407 + 0, ssize, local); 519 408 } else { 520 409 param[pix] = HBR_REQUEST | HBR_AVPN | slot; 521 410 param[pix+1] = hpte_encode_avpn(vpn, psize, ··· 569 452 ppc_md.hpte_removebolted = pSeries_lpar_hpte_removebolted; 570 453 ppc_md.flush_hash_range = pSeries_lpar_flush_hash_range; 571 454 ppc_md.hpte_clear_all = pSeries_lpar_hptab_clear; 455 + ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate; 572 456 } 573 457 574 458 #ifdef CONFIG_PPC_SMLPAR
+498 -146
arch/powerpc/platforms/pseries/nvram.c
··· 18 18 #include <linux/spinlock.h> 19 19 #include <linux/slab.h> 20 20 #include <linux/kmsg_dump.h> 21 + #include <linux/pstore.h> 21 22 #include <linux/ctype.h> 22 23 #include <linux/zlib.h> 23 24 #include <asm/uaccess.h> ··· 29 28 30 29 /* Max bytes to read/write in one go */ 31 30 #define NVRW_CNT 0x20 31 + 32 + /* 33 + * Set oops header version to distinguish between old and new format headers. 34 + * lnx,oops-log partition max size is 4000, header version > 4000 will 35 + * help in identifying new header. 36 + */ 37 + #define OOPS_HDR_VERSION 5000 32 38 static unsigned int nvram_size; 34 40 static int nvram_fetch, nvram_store; ··· 53 45 int min_size; /* minimum acceptable size (0 means req_size) */ 54 46 long size; /* size of data portion (excluding err_log_info) */ 55 47 long index; /* offset of data portion of partition */ 48 + bool os_partition; /* partition initialized by OS, not FW */ 56 49 }; 57 50 58 51 static struct nvram_os_partition rtas_log_partition = { 59 52 .name = "ibm,rtas-log", 60 53 .req_size = 2079, 61 54 .min_size = 1055, 62 - .index = -1 55 + .index = -1, 56 + .os_partition = true 63 57 }; 64 58 65 59 static struct nvram_os_partition oops_log_partition = { 66 60 .name = "lnx,oops-log", 67 61 .req_size = 4000, 68 62 .min_size = 2000, 69 - .index = -1 63 + .index = -1, 64 + .os_partition = true 70 65 }; 71 66 72 67 static const char *pseries_nvram_os_partitions[] = { ··· 77 66 "lnx,oops-log", 78 67 NULL 79 68 }; 69 + 70 + struct oops_log_info { 71 + u16 version; 72 + u16 report_length; 73 + u64 timestamp; 74 + } __attribute__((packed)); 80 75 81 76 static void oops_to_nvram(struct kmsg_dumper *dumper, 82 77 enum kmsg_dump_reason reason); ··· 100 83 101 84 * big_oops_buf[] holds the uncompressed text we're capturing. 102 85 103 - * oops_buf[] holds the compressed text, preceded by a prefix. 104 - * The prefix is just a u16 holding the length of the compressed* text. 105 - * (*Or uncompressed, if compression fails.)
oops_buf[] gets written 106 - * to NVRAM. 86 + * oops_buf[] holds the compressed text, preceded by a oops header. 87 + * oops header has u16 holding the version of oops header (to differentiate 88 + * between old and new format header) followed by u16 holding the length of 89 + * the compressed* text (*Or uncompressed, if compression fails.) and u64 90 + * holding the timestamp. oops_buf[] gets written to NVRAM. 107 91 * 108 - * oops_len points to the prefix. oops_data points to the compressed text. 92 + * oops_log_info points to the header. oops_data points to the compressed text. 109 93 * 110 94 * +- oops_buf 111 - * | +- oops_data 112 - * v v 113 - * +------------+-----------------------------------------------+ 114 - * | length | text | 115 - * | (2 bytes) | (oops_data_sz bytes) | 116 - * +------------+-----------------------------------------------+ 95 + * | +- oops_data 96 + * v v 97 + * +-----------+-----------+-----------+------------------------+ 98 + * | version | length | timestamp | text | 99 + * | (2 bytes) | (2 bytes) | (8 bytes) | (oops_data_sz bytes) | 100 + * +-----------+-----------+-----------+------------------------+ 117 101 * ^ 118 - * +- oops_len 102 + * +- oops_log_info 119 103 * 120 104 * We preallocate these buffers during init to avoid kmalloc during oops/panic. 
121 105 */ 122 106 static size_t big_oops_buf_sz; 123 107 static char *big_oops_buf, *oops_buf; 124 - static u16 *oops_len; 125 108 static char *oops_data; 126 109 static size_t oops_data_sz; 127 110 ··· 130 113 #define WINDOW_BITS 12 131 114 #define MEM_LEVEL 4 132 115 static struct z_stream_s stream; 116 + 117 + #ifdef CONFIG_PSTORE 118 + static struct nvram_os_partition of_config_partition = { 119 + .name = "of-config", 120 + .index = -1, 121 + .os_partition = false 122 + }; 123 + 124 + static struct nvram_os_partition common_partition = { 125 + .name = "common", 126 + .index = -1, 127 + .os_partition = false 128 + }; 129 + 130 + static enum pstore_type_id nvram_type_ids[] = { 131 + PSTORE_TYPE_DMESG, 132 + PSTORE_TYPE_PPC_RTAS, 133 + PSTORE_TYPE_PPC_OF, 134 + PSTORE_TYPE_PPC_COMMON, 135 + -1 136 + }; 137 + static int read_type; 138 + static unsigned long last_rtas_event; 139 + #endif 133 140 134 141 static ssize_t pSeries_nvram_read(char *buf, size_t count, loff_t *index) 135 142 { ··· 316 275 { 317 276 int rc = nvram_write_os_partition(&rtas_log_partition, buff, length, 318 277 err_type, error_log_cnt); 319 - if (!rc) 278 + if (!rc) { 320 279 last_unread_rtas_event = get_seconds(); 280 + #ifdef CONFIG_PSTORE 281 + last_rtas_event = get_seconds(); 282 + #endif 283 + } 284 + 321 285 return rc; 286 + } 287 + 288 + /* nvram_read_partition 289 + * 290 + * Reads nvram partition for at most 'length' 291 + */ 292 + int nvram_read_partition(struct nvram_os_partition *part, char *buff, 293 + int length, unsigned int *err_type, 294 + unsigned int *error_log_cnt) 295 + { 296 + int rc; 297 + loff_t tmp_index; 298 + struct err_log_info info; 299 + 300 + if (part->index == -1) 301 + return -1; 302 + 303 + if (length > part->size) 304 + length = part->size; 305 + 306 + tmp_index = part->index; 307 + 308 + if (part->os_partition) { 309 + rc = ppc_md.nvram_read((char *)&info, 310 + sizeof(struct err_log_info), 311 + &tmp_index); 312 + if (rc <= 0) { 313 + pr_err("%s: Failed 
nvram_read (%d)\n", __FUNCTION__, 314 + rc); 315 + return rc; 316 + } 317 + } 318 + 319 + rc = ppc_md.nvram_read(buff, length, &tmp_index); 320 + if (rc <= 0) { 321 + pr_err("%s: Failed nvram_read (%d)\n", __FUNCTION__, rc); 322 + return rc; 323 + } 324 + 325 + if (part->os_partition) { 326 + *error_log_cnt = info.seq_num; 327 + *err_type = info.error_type; 328 + } 329 + 330 + return 0; 322 331 } 323 332 324 333 /* nvram_read_error_log 325 334 * 326 335 * Reads nvram for error log for at most 'length' 327 336 */ 328 - int nvram_read_error_log(char * buff, int length, 329 - unsigned int * err_type, unsigned int * error_log_cnt) 337 + int nvram_read_error_log(char *buff, int length, 338 + unsigned int *err_type, unsigned int *error_log_cnt) 330 339 { 331 - int rc; 332 - loff_t tmp_index; 333 - struct err_log_info info; 334 - 335 - if (rtas_log_partition.index == -1) 336 - return -1; 337 - 338 - if (length > rtas_log_partition.size) 339 - length = rtas_log_partition.size; 340 - 341 - tmp_index = rtas_log_partition.index; 342 - 343 - rc = ppc_md.nvram_read((char *)&info, sizeof(struct err_log_info), &tmp_index); 344 - if (rc <= 0) { 345 - printk(KERN_ERR "nvram_read_error_log: Failed nvram_read (%d)\n", rc); 346 - return rc; 347 - } 348 - 349 - rc = ppc_md.nvram_read(buff, length, &tmp_index); 350 - if (rc <= 0) { 351 - printk(KERN_ERR "nvram_read_error_log: Failed nvram_read (%d)\n", rc); 352 - return rc; 353 - } 354 - 355 - *error_log_cnt = info.seq_num; 356 - *err_type = info.error_type; 357 - 358 - return 0; 340 + return nvram_read_partition(&rtas_log_partition, buff, length, 341 + err_type, error_log_cnt); 359 342 } 360 343 361 344 /* This doesn't actually zero anything, but it sets the event_logged ··· 470 405 return 0; 471 406 } 472 407 473 - static void __init nvram_init_oops_partition(int rtas_partition_exists) 474 - { 475 - int rc; 476 - 477 - rc = pseries_nvram_init_os_partition(&oops_log_partition); 478 - if (rc != 0) { 479 - if (!rtas_partition_exists) 480 
- return; 481 - pr_notice("nvram: Using %s partition to log both" 482 - " RTAS errors and oops/panic reports\n", 483 - rtas_log_partition.name); 484 - memcpy(&oops_log_partition, &rtas_log_partition, 485 - sizeof(rtas_log_partition)); 486 - } 487 - oops_buf = kmalloc(oops_log_partition.size, GFP_KERNEL); 488 - if (!oops_buf) { 489 - pr_err("nvram: No memory for %s partition\n", 490 - oops_log_partition.name); 491 - return; 492 - } 493 - oops_len = (u16*) oops_buf; 494 - oops_data = oops_buf + sizeof(u16); 495 - oops_data_sz = oops_log_partition.size - sizeof(u16); 496 - 497 - /* 498 - * Figure compression (preceded by elimination of each line's <n> 499 - * severity prefix) will reduce the oops/panic report to at most 500 - * 45% of its original size. 501 - */ 502 - big_oops_buf_sz = (oops_data_sz * 100) / 45; 503 - big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL); 504 - if (big_oops_buf) { 505 - stream.workspace = kmalloc(zlib_deflate_workspacesize( 506 - WINDOW_BITS, MEM_LEVEL), GFP_KERNEL); 507 - if (!stream.workspace) { 508 - pr_err("nvram: No memory for compression workspace; " 509 - "skipping compression of %s partition data\n", 510 - oops_log_partition.name); 511 - kfree(big_oops_buf); 512 - big_oops_buf = NULL; 513 - } 514 - } else { 515 - pr_err("No memory for uncompressed %s data; " 516 - "skipping compression\n", oops_log_partition.name); 517 - stream.workspace = NULL; 518 - } 519 - 520 - rc = kmsg_dump_register(&nvram_kmsg_dumper); 521 - if (rc != 0) { 522 - pr_err("nvram: kmsg_dump_register() failed; returned %d\n", rc); 523 - kfree(oops_buf); 524 - kfree(big_oops_buf); 525 - kfree(stream.workspace); 526 - } 527 - } 528 - 529 - static int __init pseries_nvram_init_log_partitions(void) 530 - { 531 - int rc; 532 - 533 - rc = pseries_nvram_init_os_partition(&rtas_log_partition); 534 - nvram_init_oops_partition(rc == 0); 535 - return 0; 536 - } 537 - machine_arch_initcall(pseries, pseries_nvram_init_log_partitions); 538 - 539 - int __init 
pSeries_nvram_init(void) 540 - { 541 - struct device_node *nvram; 542 - const unsigned int *nbytes_p; 543 - unsigned int proplen; 544 - 545 - nvram = of_find_node_by_type(NULL, "nvram"); 546 - if (nvram == NULL) 547 - return -ENODEV; 548 - 549 - nbytes_p = of_get_property(nvram, "#bytes", &proplen); 550 - if (nbytes_p == NULL || proplen != sizeof(unsigned int)) { 551 - of_node_put(nvram); 552 - return -EIO; 553 - } 554 - 555 - nvram_size = *nbytes_p; 556 - 557 - nvram_fetch = rtas_token("nvram-fetch"); 558 - nvram_store = rtas_token("nvram-store"); 559 - printk(KERN_INFO "PPC64 nvram contains %d bytes\n", nvram_size); 560 - of_node_put(nvram); 561 - 562 - ppc_md.nvram_read = pSeries_nvram_read; 563 - ppc_md.nvram_write = pSeries_nvram_write; 564 - ppc_md.nvram_size = pSeries_nvram_get_size; 565 - 566 - return 0; 567 - } 568 - 569 408 /* 570 409 * Are we using the ibm,rtas-log for oops/panic reports? And if so, 571 410 * would logging this oops/panic overwrite an RTAS event that rtas_errd ··· 524 555 /* Compress the text from big_oops_buf into oops_buf. 
*/ 525 556 static int zip_oops(size_t text_len) 526 557 { 558 + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf; 527 559 int zipped_len = nvram_compress(big_oops_buf, oops_data, text_len, 528 560 oops_data_sz); 529 561 if (zipped_len < 0) { ··· 532 562 pr_err("nvram: logging uncompressed oops/panic report\n"); 533 563 return -1; 534 564 } 535 - *oops_len = (u16) zipped_len; 565 + oops_hdr->version = OOPS_HDR_VERSION; 566 + oops_hdr->report_length = (u16) zipped_len; 567 + oops_hdr->timestamp = get_seconds(); 536 568 return 0; 537 569 } 570 + 571 + #ifdef CONFIG_PSTORE 572 + /* Derived from logfs_uncompress */ 573 + int nvram_decompress(void *in, void *out, size_t inlen, size_t outlen) 574 + { 575 + int err, ret; 576 + 577 + ret = -EIO; 578 + err = zlib_inflateInit(&stream); 579 + if (err != Z_OK) 580 + goto error; 581 + 582 + stream.next_in = in; 583 + stream.avail_in = inlen; 584 + stream.total_in = 0; 585 + stream.next_out = out; 586 + stream.avail_out = outlen; 587 + stream.total_out = 0; 588 + 589 + err = zlib_inflate(&stream, Z_FINISH); 590 + if (err != Z_STREAM_END) 591 + goto error; 592 + 593 + err = zlib_inflateEnd(&stream); 594 + if (err != Z_OK) 595 + goto error; 596 + 597 + ret = stream.total_out; 598 + error: 599 + return ret; 600 + } 601 + 602 + static int unzip_oops(char *oops_buf, char *big_buf) 603 + { 604 + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf; 605 + u64 timestamp = oops_hdr->timestamp; 606 + char *big_oops_data = NULL; 607 + char *oops_data_buf = NULL; 608 + size_t big_oops_data_sz; 609 + int unzipped_len; 610 + 611 + big_oops_data = big_buf + sizeof(struct oops_log_info); 612 + big_oops_data_sz = big_oops_buf_sz - sizeof(struct oops_log_info); 613 + oops_data_buf = oops_buf + sizeof(struct oops_log_info); 614 + 615 + unzipped_len = nvram_decompress(oops_data_buf, big_oops_data, 616 + oops_hdr->report_length, 617 + big_oops_data_sz); 618 + 619 + if (unzipped_len < 0) { 620 + pr_err("nvram: 
decompression failed; returned %d\n", 621 + unzipped_len); 622 + return -1; 623 + } 624 + oops_hdr = (struct oops_log_info *)big_buf; 625 + oops_hdr->version = OOPS_HDR_VERSION; 626 + oops_hdr->report_length = (u16) unzipped_len; 627 + oops_hdr->timestamp = timestamp; 628 + return 0; 629 + } 630 + 631 + static int nvram_pstore_open(struct pstore_info *psi) 632 + { 633 + /* Reset the iterator to start reading partitions again */ 634 + read_type = -1; 635 + return 0; 636 + } 637 + 638 + /** 639 + * nvram_pstore_write - pstore write callback for nvram 640 + * @type: Type of message logged 641 + * @reason: reason behind dump (oops/panic) 642 + * @id: identifier to indicate the write performed 643 + * @part: pstore writes data to registered buffer in parts, 644 + * part number will indicate the same. 645 + * @count: Indicates oops count 646 + * @hsize: Size of header added by pstore 647 + * @size: number of bytes written to the registered buffer 648 + * @psi: registered pstore_info structure 649 + * 650 + * Called by pstore_dump() when an oops or panic report is logged in the 651 + * printk buffer. 652 + * Returns 0 on successful write. 653 + */ 654 + static int nvram_pstore_write(enum pstore_type_id type, 655 + enum kmsg_dump_reason reason, 656 + u64 *id, unsigned int part, int count, 657 + size_t hsize, size_t size, 658 + struct pstore_info *psi) 659 + { 660 + int rc; 661 + unsigned int err_type = ERR_TYPE_KERNEL_PANIC; 662 + struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf; 663 + 664 + /* part 1 has the recent messages from printk buffer */ 665 + if (part > 1 || type != PSTORE_TYPE_DMESG || 666 + clobbering_unread_rtas_event()) 667 + return -1; 668 + 669 + oops_hdr->version = OOPS_HDR_VERSION; 670 + oops_hdr->report_length = (u16) size; 671 + oops_hdr->timestamp = get_seconds(); 672 + 673 + if (big_oops_buf) { 674 + rc = zip_oops(size); 675 + /* 676 + * If compression fails copy recent log messages from 677 + * big_oops_buf to oops_data. 
678 + */ 679 + if (rc != 0) { 680 + size_t diff = size - oops_data_sz + hsize; 681 + 682 + if (size > oops_data_sz) { 683 + memcpy(oops_data, big_oops_buf, hsize); 684 + memcpy(oops_data + hsize, big_oops_buf + diff, 685 + oops_data_sz - hsize); 686 + 687 + oops_hdr->report_length = (u16) oops_data_sz; 688 + } else 689 + memcpy(oops_data, big_oops_buf, size); 690 + } else 691 + err_type = ERR_TYPE_KERNEL_PANIC_GZ; 692 + } 693 + 694 + rc = nvram_write_os_partition(&oops_log_partition, oops_buf, 695 + (int) (sizeof(*oops_hdr) + oops_hdr->report_length), err_type, 696 + count); 697 + 698 + if (rc != 0) 699 + return rc; 700 + 701 + *id = part; 702 + return 0; 703 + } 704 + 705 + /* 706 + * Reads the oops/panic report, rtas, of-config and common partition. 707 + * Returns the length of the data we read from each partition. 708 + * Returns 0 if we've been called before. 709 + */ 710 + static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type, 711 + int *count, struct timespec *time, char **buf, 712 + struct pstore_info *psi) 713 + { 714 + struct oops_log_info *oops_hdr; 715 + unsigned int err_type, id_no, size = 0; 716 + struct nvram_os_partition *part = NULL; 717 + char *buff = NULL, *big_buff = NULL; 718 + int rc, sig = 0; 719 + loff_t p; 720 + 721 + read_partition: 722 + read_type++; 723 + 724 + switch (nvram_type_ids[read_type]) { 725 + case PSTORE_TYPE_DMESG: 726 + part = &oops_log_partition; 727 + *type = PSTORE_TYPE_DMESG; 728 + break; 729 + case PSTORE_TYPE_PPC_RTAS: 730 + part = &rtas_log_partition; 731 + *type = PSTORE_TYPE_PPC_RTAS; 732 + time->tv_sec = last_rtas_event; 733 + time->tv_nsec = 0; 734 + break; 735 + case PSTORE_TYPE_PPC_OF: 736 + sig = NVRAM_SIG_OF; 737 + part = &of_config_partition; 738 + *type = PSTORE_TYPE_PPC_OF; 739 + *id = PSTORE_TYPE_PPC_OF; 740 + time->tv_sec = 0; 741 + time->tv_nsec = 0; 742 + break; 743 + case PSTORE_TYPE_PPC_COMMON: 744 + sig = NVRAM_SIG_SYS; 745 + part = &common_partition; 746 + *type = 
PSTORE_TYPE_PPC_COMMON; 747 + *id = PSTORE_TYPE_PPC_COMMON; 748 + time->tv_sec = 0; 749 + time->tv_nsec = 0; 750 + break; 751 + default: 752 + return 0; 753 + } 754 + 755 + if (!part->os_partition) { 756 + p = nvram_find_partition(part->name, sig, &size); 757 + if (p <= 0) { 758 + pr_err("nvram: Failed to find partition %s, " 759 + "err %d\n", part->name, (int)p); 760 + return 0; 761 + } 762 + part->index = p; 763 + part->size = size; 764 + } 765 + 766 + buff = kmalloc(part->size, GFP_KERNEL); 767 + 768 + if (!buff) 769 + return -ENOMEM; 770 + 771 + if (nvram_read_partition(part, buff, part->size, &err_type, &id_no)) { 772 + kfree(buff); 773 + return 0; 774 + } 775 + 776 + *count = 0; 777 + 778 + if (part->os_partition) 779 + *id = id_no; 780 + 781 + if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) { 782 + oops_hdr = (struct oops_log_info *)buff; 783 + *buf = buff + sizeof(*oops_hdr); 784 + 785 + if (err_type == ERR_TYPE_KERNEL_PANIC_GZ) { 786 + big_buff = kmalloc(big_oops_buf_sz, GFP_KERNEL); 787 + if (!big_buff) 788 + return -ENOMEM; 789 + 790 + rc = unzip_oops(buff, big_buff); 791 + 792 + if (rc != 0) { 793 + kfree(buff); 794 + kfree(big_buff); 795 + goto read_partition; 796 + } 797 + 798 + oops_hdr = (struct oops_log_info *)big_buff; 799 + *buf = big_buff + sizeof(*oops_hdr); 800 + kfree(buff); 801 + } 802 + 803 + time->tv_sec = oops_hdr->timestamp; 804 + time->tv_nsec = 0; 805 + return oops_hdr->report_length; 806 + } 807 + 808 + *buf = buff; 809 + return part->size; 810 + } 811 + 812 + static struct pstore_info nvram_pstore_info = { 813 + .owner = THIS_MODULE, 814 + .name = "nvram", 815 + .open = nvram_pstore_open, 816 + .read = nvram_pstore_read, 817 + .write = nvram_pstore_write, 818 + }; 819 + 820 + static int nvram_pstore_init(void) 821 + { 822 + int rc = 0; 823 + 824 + if (big_oops_buf) { 825 + nvram_pstore_info.buf = big_oops_buf; 826 + nvram_pstore_info.bufsize = big_oops_buf_sz; 827 + } else { 828 + nvram_pstore_info.buf = oops_data; 829 + 
nvram_pstore_info.bufsize = oops_data_sz; 830 + } 831 + 832 + rc = pstore_register(&nvram_pstore_info); 833 + if (rc != 0) 834 + pr_err("nvram: pstore_register() failed, defaults to " 835 + "kmsg_dump; returned %d\n", rc); 836 + 837 + return rc; 838 + } 839 + #else 840 + static int nvram_pstore_init(void) 841 + { 842 + return -1; 843 + } 844 + #endif 845 + 846 + static void __init nvram_init_oops_partition(int rtas_partition_exists) 847 + { 848 + int rc; 849 + 850 + rc = pseries_nvram_init_os_partition(&oops_log_partition); 851 + if (rc != 0) { 852 + if (!rtas_partition_exists) 853 + return; 854 + pr_notice("nvram: Using %s partition to log both" 855 + " RTAS errors and oops/panic reports\n", 856 + rtas_log_partition.name); 857 + memcpy(&oops_log_partition, &rtas_log_partition, 858 + sizeof(rtas_log_partition)); 859 + } 860 + oops_buf = kmalloc(oops_log_partition.size, GFP_KERNEL); 861 + if (!oops_buf) { 862 + pr_err("nvram: No memory for %s partition\n", 863 + oops_log_partition.name); 864 + return; 865 + } 866 + oops_data = oops_buf + sizeof(struct oops_log_info); 867 + oops_data_sz = oops_log_partition.size - sizeof(struct oops_log_info); 868 + 869 + /* 870 + * Figure compression (preceded by elimination of each line's <n> 871 + * severity prefix) will reduce the oops/panic report to at most 872 + * 45% of its original size. 
873 + */ 874 + big_oops_buf_sz = (oops_data_sz * 100) / 45; 875 + big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL); 876 + if (big_oops_buf) { 877 + stream.workspace = kmalloc(zlib_deflate_workspacesize( 878 + WINDOW_BITS, MEM_LEVEL), GFP_KERNEL); 879 + if (!stream.workspace) { 880 + pr_err("nvram: No memory for compression workspace; " 881 + "skipping compression of %s partition data\n", 882 + oops_log_partition.name); 883 + kfree(big_oops_buf); 884 + big_oops_buf = NULL; 885 + } 886 + } else { 887 + pr_err("No memory for uncompressed %s data; " 888 + "skipping compression\n", oops_log_partition.name); 889 + stream.workspace = NULL; 890 + } 891 + 892 + rc = nvram_pstore_init(); 893 + 894 + if (!rc) 895 + return; 896 + 897 + rc = kmsg_dump_register(&nvram_kmsg_dumper); 898 + if (rc != 0) { 899 + pr_err("nvram: kmsg_dump_register() failed; returned %d\n", rc); 900 + kfree(oops_buf); 901 + kfree(big_oops_buf); 902 + kfree(stream.workspace); 903 + } 904 + } 905 + 906 + static int __init pseries_nvram_init_log_partitions(void) 907 + { 908 + int rc; 909 + 910 + rc = pseries_nvram_init_os_partition(&rtas_log_partition); 911 + nvram_init_oops_partition(rc == 0); 912 + return 0; 913 + } 914 + machine_arch_initcall(pseries, pseries_nvram_init_log_partitions); 915 + 916 + int __init pSeries_nvram_init(void) 917 + { 918 + struct device_node *nvram; 919 + const unsigned int *nbytes_p; 920 + unsigned int proplen; 921 + 922 + nvram = of_find_node_by_type(NULL, "nvram"); 923 + if (nvram == NULL) 924 + return -ENODEV; 925 + 926 + nbytes_p = of_get_property(nvram, "#bytes", &proplen); 927 + if (nbytes_p == NULL || proplen != sizeof(unsigned int)) { 928 + of_node_put(nvram); 929 + return -EIO; 930 + } 931 + 932 + nvram_size = *nbytes_p; 933 + 934 + nvram_fetch = rtas_token("nvram-fetch"); 935 + nvram_store = rtas_token("nvram-store"); 936 + printk(KERN_INFO "PPC64 nvram contains %d bytes\n", nvram_size); 937 + of_node_put(nvram); 938 + 939 + ppc_md.nvram_read = pSeries_nvram_read; 
940 + ppc_md.nvram_write = pSeries_nvram_write; 941 + ppc_md.nvram_size = pSeries_nvram_get_size; 942 + 943 + return 0; 944 + } 945 + 538 946 539 947 /* 540 948 * This is our kmsg_dump callback, called after an oops or panic report ··· 924 576 static void oops_to_nvram(struct kmsg_dumper *dumper, 925 577 enum kmsg_dump_reason reason) 926 578 { 579 + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf; 927 580 static unsigned int oops_count = 0; 928 581 static bool panicking = false; 929 582 static DEFINE_SPINLOCK(lock); ··· 968 619 } 969 620 if (rc != 0) { 970 621 kmsg_dump_rewind(dumper); 971 - kmsg_dump_get_buffer(dumper, true, 622 + kmsg_dump_get_buffer(dumper, false, 972 623 oops_data, oops_data_sz, &text_len); 973 624 err_type = ERR_TYPE_KERNEL_PANIC; 974 - *oops_len = (u16) text_len; 625 + oops_hdr->version = OOPS_HDR_VERSION; 626 + oops_hdr->report_length = (u16) text_len; 627 + oops_hdr->timestamp = get_seconds(); 975 628 } 976 629 977 630 (void) nvram_write_os_partition(&oops_log_partition, oops_buf, 978 - (int) (sizeof(*oops_len) + *oops_len), err_type, ++oops_count); 631 + (int) (sizeof(*oops_hdr) + oops_hdr->report_length), err_type, 632 + ++oops_count); 979 633 980 634 spin_unlock_irqrestore(&lock, flags); 981 635 }
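Editor's note on the sizing logic above: the oops staging buffer is sized on the assumption that zlib compression (after stripping severity prefixes) shrinks a report to at most 45% of its original size, hence `big_oops_buf_sz = (oops_data_sz * 100) / 45`. A minimal userspace sketch of that arithmetic, using a hypothetical header layout that only mirrors the fields visible in this diff (the kernel's actual `struct oops_log_info` layout and endianness may differ):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the on-NVRAM oops header introduced by this
 * series: a version, the (possibly compressed) report length, and a
 * timestamp.  Field widths follow the casts visible in the diff. */
struct oops_log_info {
    uint16_t version;
    uint16_t report_length;
    uint64_t timestamp;
};

/* Payload bytes left in the partition once the header is carved out. */
static size_t oops_data_size(size_t partition_size)
{
    return partition_size - sizeof(struct oops_log_info);
}

/* Staging buffer size: large enough that compressing it at the assumed
 * worst-case 45% ratio still fits within the partition payload. */
static size_t big_oops_buf_size(size_t oops_data_sz)
{
    return (oops_data_sz * 100) / 45;
}
```

If compression fails at dump time, the write path above falls back to copying the most recent `oops_data_sz` bytes of the uncompressed report, which is why both buffers are kept around.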
arch/powerpc/platforms/pseries/pci_dlpar.c (-85)
··· 64 64 } 65 65 EXPORT_SYMBOL_GPL(pcibios_find_pci_bus); 66 66 67 - /** 68 - * __pcibios_remove_pci_devices - remove all devices under this bus 69 - * @bus: the indicated PCI bus 70 - * @purge_pe: destroy the PE on removal of PCI devices 71 - * 72 - * Remove all of the PCI devices under this bus both from the 73 - * linux pci device tree, and from the powerpc EEH address cache. 74 - * By default, the corresponding PE will be destroied during the 75 - * normal PCI hotplug path. For PCI hotplug during EEH recovery, 76 - * the corresponding PE won't be destroied and deallocated. 77 - */ 78 - void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe) 79 - { 80 - struct pci_dev *dev, *tmp; 81 - struct pci_bus *child_bus; 82 - 83 - /* First go down child busses */ 84 - list_for_each_entry(child_bus, &bus->children, node) 85 - __pcibios_remove_pci_devices(child_bus, purge_pe); 86 - 87 - pr_debug("PCI: Removing devices on bus %04x:%02x\n", 88 - pci_domain_nr(bus), bus->number); 89 - list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { 90 - pr_debug(" * Removing %s...\n", pci_name(dev)); 91 - eeh_remove_bus_device(dev, purge_pe); 92 - pci_stop_and_remove_bus_device(dev); 93 - } 94 - } 95 - 96 - /** 97 - * pcibios_remove_pci_devices - remove all devices under this bus 98 - * 99 - * Remove all of the PCI devices under this bus both from the 100 - * linux pci device tree, and from the powerpc EEH address cache. 101 - */ 102 - void pcibios_remove_pci_devices(struct pci_bus *bus) 103 - { 104 - __pcibios_remove_pci_devices(bus, 1); 105 - } 106 - EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices); 107 - 108 - /** 109 - * pcibios_add_pci_devices - adds new pci devices to bus 110 - * 111 - * This routine will find and fixup new pci devices under 112 - * the indicated bus. This routine presumes that there 113 - * might already be some devices under this bridge, so 114 - * it carefully tries to add only new devices. 
(And that 115 - * is how this routine differs from other, similar pcibios 116 - * routines.) 117 - */ 118 - void pcibios_add_pci_devices(struct pci_bus * bus) 119 - { 120 - int slotno, num, mode, pass, max; 121 - struct pci_dev *dev; 122 - struct device_node *dn = pci_bus_to_OF_node(bus); 123 - 124 - eeh_add_device_tree_early(dn); 125 - 126 - mode = PCI_PROBE_NORMAL; 127 - if (ppc_md.pci_probe_mode) 128 - mode = ppc_md.pci_probe_mode(bus); 129 - 130 - if (mode == PCI_PROBE_DEVTREE) { 131 - /* use ofdt-based probe */ 132 - of_rescan_bus(dn, bus); 133 - } else if (mode == PCI_PROBE_NORMAL) { 134 - /* use legacy probe */ 135 - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); 136 - num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); 137 - if (!num) 138 - return; 139 - pcibios_setup_bus_devices(bus); 140 - max = bus->busn_res.start; 141 - for (pass=0; pass < 2; pass++) 142 - list_for_each_entry(dev, &bus->devices, bus_list) { 143 - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE || 144 - dev->hdr_type == PCI_HEADER_TYPE_CARDBUS) 145 - max = pci_scan_bridge(bus, dev, max, pass); 146 - } 147 - } 148 - pcibios_finish_adding_to_bus(bus); 149 - } 150 - EXPORT_SYMBOL_GPL(pcibios_add_pci_devices); 151 - 152 67 struct pci_controller *init_phb_dynamic(struct device_node *dn) 153 68 { 154 69 struct pci_controller *phb;
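Editor's note: the removed `__pcibios_remove_pci_devices()` recursed into every child bus before removing devices on the current bus, so no device was torn down while buses below it still existed. A toy sketch of that child-first traversal order, with hypothetical stand-in types (not the kernel's `struct pci_bus`):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PCI bus: child buses plus bookkeeping. */
struct toy_bus {
    struct toy_bus *children[4];
    int nchildren;
};

/* Child-first removal, mirroring the shape of the removed helper:
 * descend into all child buses, then "remove devices" on this bus.
 * Visited buses are appended to 'log' so the order can be observed. */
static void remove_child_first(struct toy_bus *bus, struct toy_bus **log, int *n)
{
    for (int i = 0; i < bus->nchildren; i++)
        remove_child_first(bus->children[i], log, n);
    log[(*n)++] = bus;
}
```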
arch/powerpc/platforms/pseries/ras.c (+4 -4)
··· 83 83 switch (event_modifier) { 84 84 case EPOW_SHUTDOWN_NORMAL: 85 85 pr_emerg("Firmware initiated power off"); 86 - orderly_poweroff(1); 86 + orderly_poweroff(true); 87 87 break; 88 88 89 89 case EPOW_SHUTDOWN_ON_UPS: ··· 95 95 pr_emerg("Loss of system critical functions reported by " 96 96 "firmware"); 97 97 pr_emerg("Check RTAS error log for details"); 98 - orderly_poweroff(1); 98 + orderly_poweroff(true); 99 99 break; 100 100 101 101 case EPOW_SHUTDOWN_AMBIENT_TEMPERATURE_TOO_HIGH: 102 102 pr_emerg("Ambient temperature too high reported by firmware"); 103 103 pr_emerg("Check RTAS error log for details"); 104 - orderly_poweroff(1); 104 + orderly_poweroff(true); 105 105 break; 106 106 107 107 default: ··· 162 162 163 163 case EPOW_SYSTEM_HALT: 164 164 pr_emerg("Firmware initiated power off"); 165 - orderly_poweroff(1); 165 + orderly_poweroff(true); 166 166 break; 167 167 168 168 case EPOW_MAIN_ENCLOSURE:
arch/powerpc/platforms/pseries/smp.c (+1 -1)
··· 192 192 /* Special case - we inhibit secondary thread startup 193 193 * during boot if the user requests it. 194 194 */ 195 - if (system_state < SYSTEM_RUNNING && cpu_has_feature(CPU_FTR_SMT)) { 195 + if (system_state == SYSTEM_BOOTING && cpu_has_feature(CPU_FTR_SMT)) { 196 196 if (!smt_enabled_at_boot && cpu_thread_in_core(nr) != 0) 197 197 return 0; 198 198 if (smt_enabled_at_boot
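Editor's note: the one-line smp.c change narrows `system_state < SYSTEM_RUNNING` to `system_state == SYSTEM_BOOTING`. With an ordered enum, the `<` form silently matches every state that happens to sort before `SYSTEM_RUNNING`, not just boot. A sketch of the distinction with a hypothetical enum (the real kernel `system_state` has more values than shown):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the relevant system_state ordering; an extra
 * state between booting and running illustrates the hazard. */
enum toy_system_state {
    TOY_BOOTING,
    TOY_INTERMEDIATE,   /* any state inserted before "running" */
    TOY_RUNNING,
};

/* Old predicate: also true for TOY_INTERMEDIATE. */
static bool inhibit_old(enum toy_system_state s)
{
    return s < TOY_RUNNING;
}

/* Fixed predicate: only the actual boot phase inhibits secondary
 * thread startup. */
static bool inhibit_new(enum toy_system_state s)
{
    return s == TOY_BOOTING;
}
```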
arch/powerpc/sysdev/Makefile (+2)
··· 4 4 5 5 mpic-msi-obj-$(CONFIG_PCI_MSI) += mpic_msi.o mpic_u3msi.o mpic_pasemi_msi.o 6 6 obj-$(CONFIG_MPIC) += mpic.o $(mpic-msi-obj-y) 7 + obj-$(CONFIG_MPIC_TIMER) += mpic_timer.o 8 + obj-$(CONFIG_FSL_MPIC_TIMER_WAKEUP) += fsl_mpic_timer_wakeup.o 7 9 mpic-msgr-obj-$(CONFIG_MPIC_MSGR) += mpic_msgr.o 8 10 obj-$(CONFIG_MPIC) += mpic.o $(mpic-msi-obj-y) $(mpic-msgr-obj-y) 9 11 obj-$(CONFIG_PPC_EPAPR_HV_PIC) += ehv_pic.o
arch/powerpc/sysdev/cpm1.c (+1)
··· 120 120 121 121 static struct irqaction cpm_error_irqaction = { 122 122 .handler = cpm_error_interrupt, 123 + .flags = IRQF_NO_THREAD, 123 124 .name = "error", 124 125 }; 125 126
arch/powerpc/sysdev/fsl_mpic_timer_wakeup.c (+161)
··· 1 + /* 2 + * MPIC timer wakeup driver 3 + * 4 + * Copyright 2013 Freescale Semiconductor, Inc. 5 + * 6 + * This program is free software; you can redistribute it and/or modify it 7 + * under the terms of the GNU General Public License as published by the 8 + * Free Software Foundation; either version 2 of the License, or (at your 9 + * option) any later version. 10 + */ 11 + 12 + #include <linux/kernel.h> 13 + #include <linux/slab.h> 14 + #include <linux/errno.h> 15 + #include <linux/module.h> 16 + #include <linux/interrupt.h> 17 + #include <linux/device.h> 18 + 19 + #include <asm/mpic_timer.h> 20 + #include <asm/mpic.h> 21 + 22 + struct fsl_mpic_timer_wakeup { 23 + struct mpic_timer *timer; 24 + struct work_struct free_work; 25 + }; 26 + 27 + static struct fsl_mpic_timer_wakeup *fsl_wakeup; 28 + static DEFINE_MUTEX(sysfs_lock); 29 + 30 + static void fsl_free_resource(struct work_struct *ws) 31 + { 32 + struct fsl_mpic_timer_wakeup *wakeup = 33 + container_of(ws, struct fsl_mpic_timer_wakeup, free_work); 34 + 35 + mutex_lock(&sysfs_lock); 36 + 37 + if (wakeup->timer) { 38 + disable_irq_wake(wakeup->timer->irq); 39 + mpic_free_timer(wakeup->timer); 40 + } 41 + 42 + wakeup->timer = NULL; 43 + mutex_unlock(&sysfs_lock); 44 + } 45 + 46 + static irqreturn_t fsl_mpic_timer_irq(int irq, void *dev_id) 47 + { 48 + struct fsl_mpic_timer_wakeup *wakeup = dev_id; 49 + 50 + schedule_work(&wakeup->free_work); 51 + 52 + return wakeup->timer ? 
IRQ_HANDLED : IRQ_NONE; 53 + } 54 + 55 + static ssize_t fsl_timer_wakeup_show(struct device *dev, 56 + struct device_attribute *attr, 57 + char *buf) 58 + { 59 + struct timeval interval; 60 + int val = 0; 61 + 62 + mutex_lock(&sysfs_lock); 63 + if (fsl_wakeup->timer) { 64 + mpic_get_remain_time(fsl_wakeup->timer, &interval); 65 + val = interval.tv_sec + 1; 66 + } 67 + mutex_unlock(&sysfs_lock); 68 + 69 + return sprintf(buf, "%d\n", val); 70 + } 71 + 72 + static ssize_t fsl_timer_wakeup_store(struct device *dev, 73 + struct device_attribute *attr, 74 + const char *buf, 75 + size_t count) 76 + { 77 + struct timeval interval; 78 + int ret; 79 + 80 + interval.tv_usec = 0; 81 + if (kstrtol(buf, 0, &interval.tv_sec)) 82 + return -EINVAL; 83 + 84 + mutex_lock(&sysfs_lock); 85 + 86 + if (fsl_wakeup->timer) { 87 + disable_irq_wake(fsl_wakeup->timer->irq); 88 + mpic_free_timer(fsl_wakeup->timer); 89 + fsl_wakeup->timer = NULL; 90 + } 91 + 92 + if (!interval.tv_sec) { 93 + mutex_unlock(&sysfs_lock); 94 + return count; 95 + } 96 + 97 + fsl_wakeup->timer = mpic_request_timer(fsl_mpic_timer_irq, 98 + fsl_wakeup, &interval); 99 + if (!fsl_wakeup->timer) { 100 + mutex_unlock(&sysfs_lock); 101 + return -EINVAL; 102 + } 103 + 104 + ret = enable_irq_wake(fsl_wakeup->timer->irq); 105 + if (ret) { 106 + mpic_free_timer(fsl_wakeup->timer); 107 + fsl_wakeup->timer = NULL; 108 + mutex_unlock(&sysfs_lock); 109 + 110 + return ret; 111 + } 112 + 113 + mpic_start_timer(fsl_wakeup->timer); 114 + 115 + mutex_unlock(&sysfs_lock); 116 + 117 + return count; 118 + } 119 + 120 + static struct device_attribute mpic_attributes = __ATTR(timer_wakeup, 0644, 121 + fsl_timer_wakeup_show, fsl_timer_wakeup_store); 122 + 123 + static int __init fsl_wakeup_sys_init(void) 124 + { 125 + int ret; 126 + 127 + fsl_wakeup = kzalloc(sizeof(struct fsl_mpic_timer_wakeup), GFP_KERNEL); 128 + if (!fsl_wakeup) 129 + return -ENOMEM; 130 + 131 + INIT_WORK(&fsl_wakeup->free_work, fsl_free_resource); 132 + 133 + ret = 
device_create_file(mpic_subsys.dev_root, &mpic_attributes); 134 + if (ret) 135 + kfree(fsl_wakeup); 136 + 137 + return ret; 138 + } 139 + 140 + static void __exit fsl_wakeup_sys_exit(void) 141 + { 142 + device_remove_file(mpic_subsys.dev_root, &mpic_attributes); 143 + 144 + mutex_lock(&sysfs_lock); 145 + 146 + if (fsl_wakeup->timer) { 147 + disable_irq_wake(fsl_wakeup->timer->irq); 148 + mpic_free_timer(fsl_wakeup->timer); 149 + } 150 + 151 + kfree(fsl_wakeup); 152 + 153 + mutex_unlock(&sysfs_lock); 154 + } 155 + 156 + module_init(fsl_wakeup_sys_init); 157 + module_exit(fsl_wakeup_sys_exit); 158 + 159 + MODULE_DESCRIPTION("Freescale MPIC global timer wakeup driver"); 160 + MODULE_LICENSE("GPL v2"); 161 + MODULE_AUTHOR("Wang Dongsheng <dongsheng.wang@freescale.com>");
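Editor's note on the sysfs store handler above: a written value is parsed as a second count, where a nonzero value arms the wakeup timer and zero cancels any armed timer. A userspace sketch of that parse-and-dispatch decision, with `strtol` standing in for `kstrtol` (the enum and function names here are hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Possible outcomes of writing to a timer_wakeup-style attribute. */
enum wakeup_action { WAKEUP_INVALID, WAKEUP_CANCEL, WAKEUP_ARM };

/* Mirror of the store handler's decision logic: strict numeric parse
 * (base 0, as with kstrtol), trailing newline tolerated; 0 cancels,
 * anything else arms the timer for that many seconds. */
static enum wakeup_action parse_wakeup(const char *buf, long *seconds)
{
    char *end;
    errno = 0;
    long val = strtol(buf, &end, 0);
    if (errno || end == buf || (*end != '\0' && *end != '\n'))
        return WAKEUP_INVALID;
    *seconds = val;
    return val ? WAKEUP_ARM : WAKEUP_CANCEL;
}
```

The kernel version additionally frees the old timer before arming a new one and takes `sysfs_lock` around the whole sequence, which this sketch omits.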
arch/powerpc/sysdev/mpic.c (+51 -7)
··· 48 48 #define DBG(fmt...) 49 49 #endif 50 50 51 + struct bus_type mpic_subsys = { 52 + .name = "mpic", 53 + .dev_name = "mpic", 54 + }; 55 + EXPORT_SYMBOL_GPL(mpic_subsys); 56 + 51 57 static struct mpic *mpics; 52 58 static struct mpic *mpic_primary; 53 59 static DEFINE_RAW_SPINLOCK(mpic_lock); ··· 926 920 return IRQ_SET_MASK_OK_NOCOPY; 927 921 } 928 922 923 + static int mpic_irq_set_wake(struct irq_data *d, unsigned int on) 924 + { 925 + struct irq_desc *desc = container_of(d, struct irq_desc, irq_data); 926 + struct mpic *mpic = mpic_from_irq_data(d); 927 + 928 + if (!(mpic->flags & MPIC_FSL)) 929 + return -ENXIO; 930 + 931 + if (on) 932 + desc->action->flags |= IRQF_NO_SUSPEND; 933 + else 934 + desc->action->flags &= ~IRQF_NO_SUSPEND; 935 + 936 + return 0; 937 + } 938 + 929 939 void mpic_set_vector(unsigned int virq, unsigned int vector) 930 940 { 931 941 struct mpic *mpic = mpic_from_irq(virq); ··· 979 957 .irq_unmask = mpic_unmask_irq, 980 958 .irq_eoi = mpic_end_irq, 981 959 .irq_set_type = mpic_set_irq_type, 960 + .irq_set_wake = mpic_irq_set_wake, 982 961 }; 983 962 984 963 #ifdef CONFIG_SMP ··· 994 971 .irq_mask = mpic_mask_tm, 995 972 .irq_unmask = mpic_unmask_tm, 996 973 .irq_eoi = mpic_end_irq, 974 + .irq_set_wake = mpic_irq_set_wake, 997 975 }; 998 976 999 977 #ifdef CONFIG_MPIC_U3_HT_IRQS ··· 1197 1173 .xlate = mpic_host_xlate, 1198 1174 }; 1199 1175 1176 + static u32 fsl_mpic_get_version(struct mpic *mpic) 1177 + { 1178 + u32 brr1; 1179 + 1180 + if (!(mpic->flags & MPIC_FSL)) 1181 + return 0; 1182 + 1183 + brr1 = _mpic_read(mpic->reg_type, &mpic->thiscpuregs, 1184 + MPIC_FSL_BRR1); 1185 + 1186 + return brr1 & MPIC_FSL_BRR1_VER; 1187 + } 1188 + 1200 1189 /* 1201 1190 * Exported functions 1202 1191 */ 1192 + 1193 + u32 fsl_mpic_primary_get_version(void) 1194 + { 1195 + struct mpic *mpic = mpic_primary; 1196 + 1197 + if (mpic) 1198 + return fsl_mpic_get_version(mpic); 1199 + 1200 + return 0; 1201 + } 1203 1202 1204 1203 struct mpic * __init 
mpic_alloc(struct device_node *node, 1205 1204 phys_addr_t phys_addr, ··· 1370 1323 mpic_map(mpic, mpic->paddr, &mpic->tmregs, MPIC_INFO(TIMER_BASE), 0x1000); 1371 1324 1372 1325 if (mpic->flags & MPIC_FSL) { 1373 - u32 brr1; 1374 1326 int ret; 1375 1327 1376 1328 /* ··· 1380 1334 mpic_map(mpic, mpic->paddr, &mpic->thiscpuregs, 1381 1335 MPIC_CPU_THISBASE, 0x1000); 1382 1336 1383 - brr1 = _mpic_read(mpic->reg_type, &mpic->thiscpuregs, 1384 - MPIC_FSL_BRR1); 1385 - fsl_version = brr1 & MPIC_FSL_BRR1_VER; 1337 + fsl_version = fsl_mpic_get_version(mpic); 1386 1338 1387 1339 /* Error interrupt mask register (EIMR) is required for 1388 1340 * handling individual device error interrupts. EIMR ··· 1570 1526 mpic_cpu_write(MPIC_INFO(CPU_CURRENT_TASK_PRI), 0xf); 1571 1527 1572 1528 if (mpic->flags & MPIC_FSL) { 1573 - u32 brr1 = _mpic_read(mpic->reg_type, &mpic->thiscpuregs, 1574 - MPIC_FSL_BRR1); 1575 - u32 version = brr1 & MPIC_FSL_BRR1_VER; 1529 + u32 version = fsl_mpic_get_version(mpic); 1576 1530 1577 1531 /* 1578 1532 * Timer group B is present at the latest in MPIC 3.1 (e.g. ··· 2041 1999 static int mpic_init_sys(void) 2042 2000 { 2043 2001 register_syscore_ops(&mpic_syscore_ops); 2002 + subsys_system_register(&mpic_subsys, NULL); 2003 + 2044 2004 return 0; 2045 2005 } 2046 2006
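Editor's note: the mpic.c refactor consolidates the repeated "read BRR1, mask off the version field" sequence into `fsl_mpic_get_version()`, which returns 0 for controllers without the `MPIC_FSL` flag. A sketch of that guard-then-mask pattern; the flag bit and mask values here are hypothetical, not the kernel's actual `MPIC_FSL_BRR1_VER` definition:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MPIC_FSL      0x1u      /* hypothetical flag bit */
#define TOY_BRR1_VER_MASK 0xffffu   /* hypothetical version mask */

struct toy_mpic {
    unsigned int flags;
    uint32_t brr1;   /* stands in for the memory-mapped BRR1 register */
};

/* Mirror of fsl_mpic_get_version(): non-FSL controllers report
 * version 0; FSL controllers report the masked version field. */
static uint32_t toy_get_version(const struct toy_mpic *mpic)
{
    if (!(mpic->flags & TOY_MPIC_FSL))
        return 0;
    return mpic->brr1 & TOY_BRR1_VER_MASK;
}
```

Centralizing the read also lets `fsl_mpic_primary_get_version()` expose the primary controller's version (used by the new timer-wakeup driver's Kconfig-level consumers) without duplicating register access.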
arch/powerpc/sysdev/mpic_timer.c (+593)
··· 1 + /* 2 + * MPIC timer driver 3 + * 4 + * Copyright 2013 Freescale Semiconductor, Inc. 5 + * Author: Dongsheng Wang <Dongsheng.Wang@freescale.com> 6 + * Li Yang <leoli@freescale.com> 7 + * 8 + * This program is free software; you can redistribute it and/or modify it 9 + * under the terms of the GNU General Public License as published by the 10 + * Free Software Foundation; either version 2 of the License, or (at your 11 + * option) any later version. 12 + */ 13 + 14 + #include <linux/kernel.h> 15 + #include <linux/init.h> 16 + #include <linux/module.h> 17 + #include <linux/errno.h> 18 + #include <linux/mm.h> 19 + #include <linux/interrupt.h> 20 + #include <linux/slab.h> 21 + #include <linux/of.h> 22 + #include <linux/of_device.h> 23 + #include <linux/syscore_ops.h> 24 + #include <sysdev/fsl_soc.h> 25 + #include <asm/io.h> 26 + 27 + #include <asm/mpic_timer.h> 28 + 29 + #define FSL_GLOBAL_TIMER 0x1 30 + 31 + /* Clock Ratio 32 + * Divide by 64 0x00000300 33 + * Divide by 32 0x00000200 34 + * Divide by 16 0x00000100 35 + * Divide by 8 0x00000000 (Hardware default div) 36 + */ 37 + #define MPIC_TIMER_TCR_CLKDIV 0x00000300 38 + 39 + #define MPIC_TIMER_TCR_ROVR_OFFSET 24 40 + 41 + #define TIMER_STOP 0x80000000 42 + #define TIMERS_PER_GROUP 4 43 + #define MAX_TICKS (~0U >> 1) 44 + #define MAX_TICKS_CASCADE (~0U) 45 + #define TIMER_OFFSET(num) (1 << (TIMERS_PER_GROUP - 1 - num)) 46 + 47 + /* tv_usec should be less than ONE_SECOND, otherwise use tv_sec */ 48 + #define ONE_SECOND 1000000 49 + 50 + struct timer_regs { 51 + u32 gtccr; 52 + u32 res0[3]; 53 + u32 gtbcr; 54 + u32 res1[3]; 55 + u32 gtvpr; 56 + u32 res2[3]; 57 + u32 gtdr; 58 + u32 res3[3]; 59 + }; 60 + 61 + struct cascade_priv { 62 + u32 tcr_value; /* TCR register: CASC & ROVR value */ 63 + unsigned int cascade_map; /* cascade map */ 64 + unsigned int timer_num; /* cascade control timer */ 65 + }; 66 + 67 + struct timer_group_priv { 68 + struct timer_regs __iomem *regs; 69 + struct mpic_timer 
timer[TIMERS_PER_GROUP]; 70 + struct list_head node; 71 + unsigned int timerfreq; 72 + unsigned int idle; 73 + unsigned int flags; 74 + spinlock_t lock; 75 + void __iomem *group_tcr; 76 + }; 77 + 78 + static struct cascade_priv cascade_timer[] = { 79 + /* cascade timer 0 and 1 */ 80 + {0x1, 0xc, 0x1}, 81 + /* cascade timer 1 and 2 */ 82 + {0x2, 0x6, 0x2}, 83 + /* cascade timer 2 and 3 */ 84 + {0x4, 0x3, 0x3} 85 + }; 86 + 87 + static LIST_HEAD(timer_group_list); 88 + 89 + static void convert_ticks_to_time(struct timer_group_priv *priv, 90 + const u64 ticks, struct timeval *time) 91 + { 92 + u64 tmp_sec; 93 + 94 + time->tv_sec = (__kernel_time_t)div_u64(ticks, priv->timerfreq); 95 + tmp_sec = (u64)time->tv_sec * (u64)priv->timerfreq; 96 + 97 + time->tv_usec = (__kernel_suseconds_t) 98 + div_u64((ticks - tmp_sec) * 1000000, priv->timerfreq); 99 + 100 + return; 101 + } 102 + 103 + /* the time set by the user is converted to "ticks" */ 104 + static int convert_time_to_ticks(struct timer_group_priv *priv, 105 + const struct timeval *time, u64 *ticks) 106 + { 107 + u64 max_value; /* prevent u64 overflow */ 108 + u64 tmp = 0; 109 + 110 + u64 tmp_sec; 111 + u64 tmp_ms; 112 + u64 tmp_us; 113 + 114 + max_value = div_u64(ULLONG_MAX, priv->timerfreq); 115 + 116 + if (time->tv_sec > max_value || 117 + (time->tv_sec == max_value && time->tv_usec > 0)) 118 + return -EINVAL; 119 + 120 + tmp_sec = (u64)time->tv_sec * (u64)priv->timerfreq; 121 + tmp += tmp_sec; 122 + 123 + tmp_ms = time->tv_usec / 1000; 124 + tmp_ms = div_u64((u64)tmp_ms * (u64)priv->timerfreq, 1000); 125 + tmp += tmp_ms; 126 + 127 + tmp_us = time->tv_usec % 1000; 128 + tmp_us = div_u64((u64)tmp_us * (u64)priv->timerfreq, 1000000); 129 + tmp += tmp_us; 130 + 131 + *ticks = tmp; 132 + 133 + return 0; 134 + } 135 + 136 + /* detect whether there is a cascade timer available */ 137 + static struct mpic_timer *detect_idle_cascade_timer( 138 + struct timer_group_priv *priv) 139 + { 140 + struct cascade_priv *casc_priv; 141 
+ unsigned int map; 142 + unsigned int array_size = ARRAY_SIZE(cascade_timer); 143 + unsigned int num; 144 + unsigned int i; 145 + unsigned long flags; 146 + 147 + casc_priv = cascade_timer; 148 + for (i = 0; i < array_size; i++) { 149 + spin_lock_irqsave(&priv->lock, flags); 150 + map = casc_priv->cascade_map & priv->idle; 151 + if (map == casc_priv->cascade_map) { 152 + num = casc_priv->timer_num; 153 + priv->timer[num].cascade_handle = casc_priv; 154 + 155 + /* set timer busy */ 156 + priv->idle &= ~casc_priv->cascade_map; 157 + spin_unlock_irqrestore(&priv->lock, flags); 158 + return &priv->timer[num]; 159 + } 160 + spin_unlock_irqrestore(&priv->lock, flags); 161 + casc_priv++; 162 + } 163 + 164 + return NULL; 165 + } 166 + 167 + static int set_cascade_timer(struct timer_group_priv *priv, u64 ticks, 168 + unsigned int num) 169 + { 170 + struct cascade_priv *casc_priv; 171 + u32 tcr; 172 + u32 tmp_ticks; 173 + u32 rem_ticks; 174 + 175 + /* set group tcr reg for cascade */ 176 + casc_priv = priv->timer[num].cascade_handle; 177 + if (!casc_priv) 178 + return -EINVAL; 179 + 180 + tcr = casc_priv->tcr_value | 181 + (casc_priv->tcr_value << MPIC_TIMER_TCR_ROVR_OFFSET); 182 + setbits32(priv->group_tcr, tcr); 183 + 184 + tmp_ticks = div_u64_rem(ticks, MAX_TICKS_CASCADE, &rem_ticks); 185 + 186 + out_be32(&priv->regs[num].gtccr, 0); 187 + out_be32(&priv->regs[num].gtbcr, tmp_ticks | TIMER_STOP); 188 + 189 + out_be32(&priv->regs[num - 1].gtccr, 0); 190 + out_be32(&priv->regs[num - 1].gtbcr, rem_ticks); 191 + 192 + return 0; 193 + } 194 + 195 + static struct mpic_timer *get_cascade_timer(struct timer_group_priv *priv, 196 + u64 ticks) 197 + { 198 + struct mpic_timer *allocated_timer; 199 + 200 + /* Two cascade timers: Support the maximum time */ 201 + const u64 max_ticks = (u64)MAX_TICKS * (u64)MAX_TICKS_CASCADE; 202 + int ret; 203 + 204 + if (ticks > max_ticks) 205 + return NULL; 206 + 207 + /* detect idle timer */ 208 + allocated_timer = detect_idle_cascade_timer(priv); 
209 + if (!allocated_timer) 210 + return NULL; 211 + 212 + /* set ticks to timer */ 213 + ret = set_cascade_timer(priv, ticks, allocated_timer->num); 214 + if (ret < 0) 215 + return NULL; 216 + 217 + return allocated_timer; 218 + } 219 + 220 + static struct mpic_timer *get_timer(const struct timeval *time) 221 + { 222 + struct timer_group_priv *priv; 223 + struct mpic_timer *timer; 224 + 225 + u64 ticks; 226 + unsigned int num; 227 + unsigned int i; 228 + unsigned long flags; 229 + int ret; 230 + 231 + list_for_each_entry(priv, &timer_group_list, node) { 232 + ret = convert_time_to_ticks(priv, time, &ticks); 233 + if (ret < 0) 234 + return NULL; 235 + 236 + if (ticks > MAX_TICKS) { 237 + if (!(priv->flags & FSL_GLOBAL_TIMER)) 238 + return NULL; 239 + 240 + timer = get_cascade_timer(priv, ticks); 241 + if (!timer) 242 + continue; 243 + 244 + return timer; 245 + } 246 + 247 + for (i = 0; i < TIMERS_PER_GROUP; i++) { 248 + /* one timer: Reverse allocation */ 249 + num = TIMERS_PER_GROUP - 1 - i; 250 + spin_lock_irqsave(&priv->lock, flags); 251 + if (priv->idle & (1 << i)) { 252 + /* set timer busy */ 253 + priv->idle &= ~(1 << i); 254 + /* set ticks & stop timer */ 255 + out_be32(&priv->regs[num].gtbcr, 256 + ticks | TIMER_STOP); 257 + out_be32(&priv->regs[num].gtccr, 0); 258 + priv->timer[num].cascade_handle = NULL; 259 + spin_unlock_irqrestore(&priv->lock, flags); 260 + return &priv->timer[num]; 261 + } 262 + spin_unlock_irqrestore(&priv->lock, flags); 263 + } 264 + } 265 + 266 + return NULL; 267 + } 268 + 269 + /** 270 + * mpic_start_timer - start hardware timer 271 + * @handle: the timer to be started. 272 + * 273 + * It will do ->fn(->dev) callback from the hardware interrupt at 274 + * the ->timeval point in the future. 
275 + */ 276 + void mpic_start_timer(struct mpic_timer *handle) 277 + { 278 + struct timer_group_priv *priv = container_of(handle, 279 + struct timer_group_priv, timer[handle->num]); 280 + 281 + clrbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP); 282 + } 283 + EXPORT_SYMBOL(mpic_start_timer); 284 + 285 + /** 286 + * mpic_stop_timer - stop hardware timer 287 + * @handle: the timer to be stopped 288 + * 289 + * The timer periodically generates an interrupt until the user stops it. 290 + */ 291 + void mpic_stop_timer(struct mpic_timer *handle) 292 + { 293 + struct timer_group_priv *priv = container_of(handle, 294 + struct timer_group_priv, timer[handle->num]); 295 + struct cascade_priv *casc_priv; 296 + 297 + setbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP); 298 + 299 + casc_priv = priv->timer[handle->num].cascade_handle; 300 + if (casc_priv) { 301 + out_be32(&priv->regs[handle->num].gtccr, 0); 302 + out_be32(&priv->regs[handle->num - 1].gtccr, 0); 303 + } else { 304 + out_be32(&priv->regs[handle->num].gtccr, 0); 305 + } 306 + } 307 + EXPORT_SYMBOL(mpic_stop_timer); 308 + 309 + /** 310 + * mpic_get_remain_time - get timer remaining time 311 + * @handle: the timer to be selected. 312 + * @time: returns the remaining time of the timer 313 + * 314 + * Query timer remaining time.
315 + */ 316 + void mpic_get_remain_time(struct mpic_timer *handle, struct timeval *time) 317 + { 318 + struct timer_group_priv *priv = container_of(handle, 319 + struct timer_group_priv, timer[handle->num]); 320 + struct cascade_priv *casc_priv; 321 + 322 + u64 ticks; 323 + u32 tmp_ticks; 324 + 325 + casc_priv = priv->timer[handle->num].cascade_handle; 326 + if (casc_priv) { 327 + tmp_ticks = in_be32(&priv->regs[handle->num].gtccr); 328 + ticks = ((u64)tmp_ticks & UINT_MAX) * (u64)MAX_TICKS_CASCADE; 329 + tmp_ticks = in_be32(&priv->regs[handle->num - 1].gtccr); 330 + ticks += tmp_ticks; 331 + } else { 332 + ticks = in_be32(&priv->regs[handle->num].gtccr); 333 + } 334 + 335 + convert_ticks_to_time(priv, ticks, time); 336 + } 337 + EXPORT_SYMBOL(mpic_get_remain_time); 338 + 339 + /** 340 + * mpic_free_timer - free hardware timer 341 + * @handle: the timer to be removed. 342 + * 343 + * Free the timer. 344 + * 345 + * Note: can not be used in interrupt context. 346 + */ 347 + void mpic_free_timer(struct mpic_timer *handle) 348 + { 349 + struct timer_group_priv *priv = container_of(handle, 350 + struct timer_group_priv, timer[handle->num]); 351 + 352 + struct cascade_priv *casc_priv; 353 + unsigned long flags; 354 + 355 + mpic_stop_timer(handle); 356 + 357 + casc_priv = priv->timer[handle->num].cascade_handle; 358 + 359 + free_irq(priv->timer[handle->num].irq, priv->timer[handle->num].dev); 360 + 361 + spin_lock_irqsave(&priv->lock, flags); 362 + if (casc_priv) { 363 + u32 tcr; 364 + tcr = casc_priv->tcr_value | (casc_priv->tcr_value << 365 + MPIC_TIMER_TCR_ROVR_OFFSET); 366 + clrbits32(priv->group_tcr, tcr); 367 + priv->idle |= casc_priv->cascade_map; 368 + priv->timer[handle->num].cascade_handle = NULL; 369 + } else { 370 + priv->idle |= TIMER_OFFSET(handle->num); 371 + } 372 + spin_unlock_irqrestore(&priv->lock, flags); 373 + } 374 + EXPORT_SYMBOL(mpic_free_timer); 375 + 376 + /** 377 + * mpic_request_timer - get a hardware timer 378 + * @fn: interrupt handler 
function 379 + * @dev: callback function of the data 380 + * @time: time for timer 381 + * 382 + * This executes the "request_irq", returning NULL 383 + * else "handle" on success. 384 + */ 385 + struct mpic_timer *mpic_request_timer(irq_handler_t fn, void *dev, 386 + const struct timeval *time) 387 + { 388 + struct mpic_timer *allocated_timer; 389 + int ret; 390 + 391 + if (list_empty(&timer_group_list)) 392 + return NULL; 393 + 394 + if (!(time->tv_sec + time->tv_usec) || 395 + time->tv_sec < 0 || time->tv_usec < 0) 396 + return NULL; 397 + 398 + if (time->tv_usec > ONE_SECOND) 399 + return NULL; 400 + 401 + allocated_timer = get_timer(time); 402 + if (!allocated_timer) 403 + return NULL; 404 + 405 + ret = request_irq(allocated_timer->irq, fn, 406 + IRQF_TRIGGER_LOW, "global-timer", dev); 407 + if (ret) { 408 + mpic_free_timer(allocated_timer); 409 + return NULL; 410 + } 411 + 412 + allocated_timer->dev = dev; 413 + 414 + return allocated_timer; 415 + } 416 + EXPORT_SYMBOL(mpic_request_timer); 417 + 418 + static int timer_group_get_freq(struct device_node *np, 419 + struct timer_group_priv *priv) 420 + { 421 + u32 div; 422 + 423 + if (priv->flags & FSL_GLOBAL_TIMER) { 424 + struct device_node *dn; 425 + 426 + dn = of_find_compatible_node(NULL, NULL, "fsl,mpic"); 427 + if (dn) { 428 + of_property_read_u32(dn, "clock-frequency", 429 + &priv->timerfreq); 430 + of_node_put(dn); 431 + } 432 + } 433 + 434 + if (priv->timerfreq <= 0) 435 + return -EINVAL; 436 + 437 + if (priv->flags & FSL_GLOBAL_TIMER) { 438 + div = (1 << (MPIC_TIMER_TCR_CLKDIV >> 8)) * 8; 439 + priv->timerfreq /= div; 440 + } 441 + 442 + return 0; 443 + } 444 + 445 + static int timer_group_get_irq(struct device_node *np, 446 + struct timer_group_priv *priv) 447 + { 448 + const u32 all_timer[] = { 0, TIMERS_PER_GROUP }; 449 + const u32 *p; 450 + u32 offset; 451 + u32 count; 452 + 453 + unsigned int i; 454 + unsigned int j; 455 + unsigned int irq_index = 0; 456 + unsigned int irq; 457 + int len; 458 + 
459 + p = of_get_property(np, "fsl,available-ranges", &len); 460 + if (p && len % (2 * sizeof(u32)) != 0) { 461 + pr_err("%s: malformed available-ranges property.\n", 462 + np->full_name); 463 + return -EINVAL; 464 + } 465 + 466 + if (!p) { 467 + p = all_timer; 468 + len = sizeof(all_timer); 469 + } 470 + 471 + len /= 2 * sizeof(u32); 472 + 473 + for (i = 0; i < len; i++) { 474 + offset = p[i * 2]; 475 + count = p[i * 2 + 1]; 476 + for (j = 0; j < count; j++) { 477 + irq = irq_of_parse_and_map(np, irq_index); 478 + if (!irq) { 479 + pr_err("%s: irq parse and map failed.\n", 480 + np->full_name); 481 + return -EINVAL; 482 + } 483 + 484 + /* Set timer idle */ 485 + priv->idle |= TIMER_OFFSET((offset + j)); 486 + priv->timer[offset + j].irq = irq; 487 + priv->timer[offset + j].num = offset + j; 488 + irq_index++; 489 + } 490 + } 491 + 492 + return 0; 493 + } 494 + 495 + static void timer_group_init(struct device_node *np) 496 + { 497 + struct timer_group_priv *priv; 498 + unsigned int i = 0; 499 + int ret; 500 + 501 + priv = kzalloc(sizeof(struct timer_group_priv), GFP_KERNEL); 502 + if (!priv) { 503 + pr_err("%s: cannot allocate memory for group.\n", 504 + np->full_name); 505 + return; 506 + } 507 + 508 + if (of_device_is_compatible(np, "fsl,mpic-global-timer")) 509 + priv->flags |= FSL_GLOBAL_TIMER; 510 + 511 + priv->regs = of_iomap(np, i++); 512 + if (!priv->regs) { 513 + pr_err("%s: cannot ioremap timer register address.\n", 514 + np->full_name); 515 + goto out; 516 + } 517 + 518 + if (priv->flags & FSL_GLOBAL_TIMER) { 519 + priv->group_tcr = of_iomap(np, i++); 520 + if (!priv->group_tcr) { 521 + pr_err("%s: cannot ioremap tcr address.\n", 522 + np->full_name); 523 + goto out; 524 + } 525 + } 526 + 527 + ret = timer_group_get_freq(np, priv); 528 + if (ret < 0) { 529 + pr_err("%s: cannot get timer frequency.\n", np->full_name); 530 + goto out; 531 + } 532 + 533 + ret = timer_group_get_irq(np, priv); 534 + if (ret < 0) { 535 + pr_err("%s: cannot get timer irqs.\n", 
np->full_name); 536 + goto out; 537 + } 538 + 539 + spin_lock_init(&priv->lock); 540 + 541 + /* Init FSL timer hardware */ 542 + if (priv->flags & FSL_GLOBAL_TIMER) 543 + setbits32(priv->group_tcr, MPIC_TIMER_TCR_CLKDIV); 544 + 545 + list_add_tail(&priv->node, &timer_group_list); 546 + 547 + return; 548 + 549 + out: 550 + if (priv->regs) 551 + iounmap(priv->regs); 552 + 553 + if (priv->group_tcr) 554 + iounmap(priv->group_tcr); 555 + 556 + kfree(priv); 557 + } 558 + 559 + static void mpic_timer_resume(void) 560 + { 561 + struct timer_group_priv *priv; 562 + 563 + list_for_each_entry(priv, &timer_group_list, node) { 564 + /* Init FSL timer hardware */ 565 + if (priv->flags & FSL_GLOBAL_TIMER) 566 + setbits32(priv->group_tcr, MPIC_TIMER_TCR_CLKDIV); 567 + } 568 + } 569 + 570 + static const struct of_device_id mpic_timer_ids[] = { 571 + { .compatible = "fsl,mpic-global-timer", }, 572 + {}, 573 + }; 574 + 575 + static struct syscore_ops mpic_timer_syscore_ops = { 576 + .resume = mpic_timer_resume, 577 + }; 578 + 579 + static int __init mpic_timer_init(void) 580 + { 581 + struct device_node *np = NULL; 582 + 583 + for_each_matching_node(np, mpic_timer_ids) 584 + timer_group_init(np); 585 + 586 + register_syscore_ops(&mpic_timer_syscore_ops); 587 + 588 + if (list_empty(&timer_group_list)) 589 + return -ENODEV; 590 + 591 + return 0; 592 + } 593 + subsys_initcall(mpic_timer_init);
+3 -2
arch/s390/include/asm/pgtable.h
··· 1364 1364 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 1365 1365 1366 1366 #define __HAVE_ARCH_PGTABLE_DEPOSIT 1367 - extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable); 1367 + extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 1368 + pgtable_t pgtable); 1368 1369 1369 1370 #define __HAVE_ARCH_PGTABLE_WITHDRAW 1370 - extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm); 1371 + extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); 1371 1372 1372 1373 static inline int pmd_trans_splitting(pmd_t pmd) 1373 1374 {
+3 -2
arch/s390/mm/pgtable.c
··· 1165 1165 } 1166 1166 } 1167 1167 1168 - void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable) 1168 + void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 1169 + pgtable_t pgtable) 1169 1170 { 1170 1171 struct list_head *lh = (struct list_head *) pgtable; 1171 1172 ··· 1180 1179 mm->pmd_huge_pte = pgtable; 1181 1180 } 1182 1181 1183 - pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm) 1182 + pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) 1184 1183 { 1185 1184 struct list_head *lh; 1186 1185 pgtable_t pgtable;
+3 -2
arch/sparc/include/asm/pgtable_64.h
··· 853 853 pmd_t *pmd); 854 854 855 855 #define __HAVE_ARCH_PGTABLE_DEPOSIT 856 - extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable); 856 + extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 857 + pgtable_t pgtable); 857 858 858 859 #define __HAVE_ARCH_PGTABLE_WITHDRAW 859 - extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm); 860 + extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); 860 861 #endif 861 862 862 863 /* Encode and de-code a swap entry */
+3 -2
arch/sparc/mm/tlb.c
··· 188 188 } 189 189 } 190 190 191 - void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable) 191 + void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 192 + pgtable_t pgtable) 192 193 { 193 194 struct list_head *lh = (struct list_head *) pgtable; 194 195 ··· 203 202 mm->pmd_huge_pte = pgtable; 204 203 } 205 204 206 - pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm) 205 + pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) 207 206 { 208 207 struct list_head *lh; 209 208 pgtable_t pgtable;
+2 -2
drivers/acpi/apei/erst.c
··· 935 935 struct timespec *time, char **buf, 936 936 struct pstore_info *psi); 937 937 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, 938 - u64 *id, unsigned int part, int count, 938 + u64 *id, unsigned int part, int count, size_t hsize, 939 939 size_t size, struct pstore_info *psi); 940 940 static int erst_clearer(enum pstore_type_id type, u64 id, int count, 941 941 struct timespec time, struct pstore_info *psi); ··· 1055 1055 } 1056 1056 1057 1057 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, 1058 - u64 *id, unsigned int part, int count, 1058 + u64 *id, unsigned int part, int count, size_t hsize, 1059 1059 size_t size, struct pstore_info *psi) 1060 1060 { 1061 1061 struct cper_pstore_record *rcd = (struct cper_pstore_record *)
+1 -1
drivers/firmware/efi/efi-pstore.c
··· 103 103 104 104 static int efi_pstore_write(enum pstore_type_id type, 105 105 enum kmsg_dump_reason reason, u64 *id, 106 - unsigned int part, int count, size_t size, 106 + unsigned int part, int count, size_t hsize, size_t size, 107 107 struct pstore_info *psi) 108 108 { 109 109 char name[DUMP_NAME_LEN];
+8
drivers/i2c/busses/i2c-cpm.c
··· 338 338 tptr = 0; 339 339 rptr = 0; 340 340 341 + /* 342 + * If there was a collision in the last i2c transaction, 343 + * Set I2COM_MASTER as it was cleared during collision. 344 + */ 345 + if (in_be16(&tbdf->cbd_sc) & BD_SC_CL) { 346 + out_8(&cpm->i2c_reg->i2com, I2COM_MASTER); 347 + } 348 + 341 349 while (tptr < num) { 342 350 pmsg = &msgs[tptr]; 343 351 dev_dbg(&adap->dev, "R: %d T: %d\n", rptr, tptr);
+8
drivers/iommu/Kconfig
··· 261 261 default 256 if SHMOBILE_IOMMU_ADDRSIZE_64MB 262 262 default 128 if SHMOBILE_IOMMU_ADDRSIZE_32MB 263 263 264 + config SPAPR_TCE_IOMMU 265 + bool "sPAPR TCE IOMMU Support" 266 + depends on PPC_POWERNV || PPC_PSERIES 267 + select IOMMU_API 268 + help 269 + Enables bits of IOMMU API required by VFIO. The iommu_ops 270 + is not implemented as it is not necessary for VFIO. 271 + 264 272 endif # IOMMU_SUPPORT
+1 -1
drivers/macintosh/adb.c
··· 697 697 int ret = 0; 698 698 struct adbdev_state *state = file->private_data; 699 699 struct adb_request *req; 700 - wait_queue_t wait = __WAITQUEUE_INITIALIZER(wait,current); 700 + DECLARE_WAITQUEUE(wait,current); 701 701 unsigned long flags; 702 702 703 703 if (count < 2)
+4 -4
drivers/macintosh/mac_hid.c
··· 181 181 mac_hid_destroy_emumouse(); 182 182 } 183 183 184 - static int mac_hid_toggle_emumouse(ctl_table *table, int write, 184 + static int mac_hid_toggle_emumouse(struct ctl_table *table, int write, 185 185 void __user *buffer, size_t *lenp, 186 186 loff_t *ppos) 187 187 { ··· 214 214 } 215 215 216 216 /* file(s) in /proc/sys/dev/mac_hid */ 217 - static ctl_table mac_hid_files[] = { 217 + static struct ctl_table mac_hid_files[] = { 218 218 { 219 219 .procname = "mouse_button_emulation", 220 220 .data = &mouse_emulate_buttons, ··· 240 240 }; 241 241 242 242 /* dir in /proc/sys/dev */ 243 - static ctl_table mac_hid_dir[] = { 243 + static struct ctl_table mac_hid_dir[] = { 244 244 { 245 245 .procname = "mac_hid", 246 246 .maxlen = 0, ··· 251 251 }; 252 252 253 253 /* /proc/sys/dev itself, in case that is not there yet */ 254 - static ctl_table mac_hid_root_dir[] = { 254 + static struct ctl_table mac_hid_root_dir[] = { 255 255 { 256 256 .procname = "dev", 257 257 .maxlen = 0,
+1 -1
drivers/macintosh/via-cuda.c
··· 259 259 } while (0) 260 260 261 261 static int 262 - cuda_init_via(void) 262 + __init cuda_init_via(void) 263 263 { 264 264 out_8(&via[DIRB], (in_8(&via[DIRB]) | TACK | TIP) & ~TREQ); /* TACK & TIP out */ 265 265 out_8(&via[B], in_8(&via[B]) | TACK | TIP); /* negate them */
+5 -1
drivers/macintosh/windfarm_pm121.c
··· 276 276 277 277 static unsigned int pm121_failure_state; 278 278 static int pm121_readjust, pm121_skipping; 279 + static bool pm121_overtemp; 279 280 static s32 average_power; 280 281 281 282 struct pm121_correction { ··· 848 847 if (new_failure & FAILURE_OVERTEMP) { 849 848 wf_set_overtemp(); 850 849 pm121_skipping = 2; 850 + pm121_overtemp = true; 851 851 } 852 852 853 853 /* We only clear the overtemp condition if overtemp is cleared ··· 857 855 * the control loop levels, but we don't want to keep it clear 858 856 * here in this case 859 857 */ 860 - if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) 858 + if (!pm121_failure_state && pm121_overtemp) { 861 859 wf_clear_overtemp(); 860 + pm121_overtemp = false; 861 + } 862 862 } 863 863 864 864
+5 -1
drivers/macintosh/windfarm_pm81.c
··· 149 149 150 150 static unsigned int wf_smu_failure_state; 151 151 static int wf_smu_readjust, wf_smu_skipping; 152 + static bool wf_smu_overtemp; 152 153 153 154 /* 154 155 * ****** System Fans Control Loop ****** ··· 594 593 if (new_failure & FAILURE_OVERTEMP) { 595 594 wf_set_overtemp(); 596 595 wf_smu_skipping = 2; 596 + wf_smu_overtemp = true; 597 597 } 598 598 599 599 /* We only clear the overtemp condition if overtemp is cleared ··· 603 601 * the control loop levels, but we don't want to keep it clear 604 602 * here in this case 605 603 */ 606 - if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) 604 + if (!wf_smu_failure_state && wf_smu_overtemp) { 607 605 wf_clear_overtemp(); 606 + wf_smu_overtemp = false; 607 + } 608 608 } 609 609 610 610 static void wf_smu_new_control(struct wf_control *ct)
+5 -1
drivers/macintosh/windfarm_pm91.c
··· 76 76 77 77 /* Set to kick the control loop into life */ 78 78 static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started; 79 + static bool wf_smu_overtemp; 79 80 80 81 /* Failure handling.. could be nicer */ 81 82 #define FAILURE_FAN 0x01 ··· 518 517 if (new_failure & FAILURE_OVERTEMP) { 519 518 wf_set_overtemp(); 520 519 wf_smu_skipping = 2; 520 + wf_smu_overtemp = true; 521 521 } 522 522 523 523 /* We only clear the overtemp condition if overtemp is cleared ··· 527 525 * the control loop levels, but we don't want to keep it clear 528 526 * here in this case 529 527 */ 530 - if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) 528 + if (!wf_smu_failure_state && wf_smu_overtemp) { 531 529 wf_clear_overtemp(); 530 + wf_smu_overtemp = false; 531 + } 532 532 } 533 533 534 534
-1
drivers/macintosh/windfarm_smu_sat.c
··· 343 343 wf_unregister_sensor(&sens->sens); 344 344 } 345 345 sat->i2c = NULL; 346 - i2c_set_clientdata(client, NULL); 347 346 kref_put(&sat->ref, wf_sat_release); 348 347 349 348 return 0;
+6
drivers/vfio/Kconfig
··· 3 3 depends on VFIO 4 4 default n 5 5 6 + config VFIO_IOMMU_SPAPR_TCE 7 + tristate 8 + depends on VFIO && SPAPR_TCE_IOMMU 9 + default n 10 + 6 11 menuconfig VFIO 7 12 tristate "VFIO Non-Privileged userspace driver framework" 8 13 depends on IOMMU_API 9 14 select VFIO_IOMMU_TYPE1 if X86 15 + select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES) 10 16 help 11 17 VFIO provides a framework for secure userspace device drivers. 12 18 See Documentation/vfio.txt for more details.
+1
drivers/vfio/Makefile
··· 1 1 obj-$(CONFIG_VFIO) += vfio.o 2 2 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o 3 + obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o 3 4 obj-$(CONFIG_VFIO_PCI) += pci/
+1
drivers/vfio/vfio.c
··· 1415 1415 * drivers. 1416 1416 */ 1417 1417 request_module_nowait("vfio_iommu_type1"); 1418 + request_module_nowait("vfio_iommu_spapr_tce"); 1418 1419 1419 1420 return 0; 1420 1421
+377
drivers/vfio/vfio_iommu_spapr_tce.c
··· 1 + /* 2 + * VFIO: IOMMU DMA mapping support for TCE on POWER 3 + * 4 + * Copyright (C) 2013 IBM Corp. All rights reserved. 5 + * Author: Alexey Kardashevskiy <aik@ozlabs.ru> 6 + * 7 + * This program is free software; you can redistribute it and/or modify 8 + * it under the terms of the GNU General Public License version 2 as 9 + * published by the Free Software Foundation. 10 + * 11 + * Derived from original vfio_iommu_type1.c: 12 + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. 13 + * Author: Alex Williamson <alex.williamson@redhat.com> 14 + */ 15 + 16 + #include <linux/module.h> 17 + #include <linux/pci.h> 18 + #include <linux/slab.h> 19 + #include <linux/uaccess.h> 20 + #include <linux/err.h> 21 + #include <linux/vfio.h> 22 + #include <asm/iommu.h> 23 + #include <asm/tce.h> 24 + 25 + #define DRIVER_VERSION "0.1" 26 + #define DRIVER_AUTHOR "aik@ozlabs.ru" 27 + #define DRIVER_DESC "VFIO IOMMU SPAPR TCE" 28 + 29 + static void tce_iommu_detach_group(void *iommu_data, 30 + struct iommu_group *iommu_group); 31 + 32 + /* 33 + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation 34 + * 35 + * This code handles mapping and unmapping of user data buffers 36 + * into DMA'ble space using the IOMMU 37 + */ 38 + 39 + /* 40 + * The container descriptor supports only a single group per container. 41 + * Required by the API as the container is not supplied with the IOMMU group 42 + * at the moment of initialization. 
43 + */ 44 + struct tce_container { 45 + struct mutex lock; 46 + struct iommu_table *tbl; 47 + bool enabled; 48 + }; 49 + 50 + static int tce_iommu_enable(struct tce_container *container) 51 + { 52 + int ret = 0; 53 + unsigned long locked, lock_limit, npages; 54 + struct iommu_table *tbl = container->tbl; 55 + 56 + if (!container->tbl) 57 + return -ENXIO; 58 + 59 + if (!current->mm) 60 + return -ESRCH; /* process exited */ 61 + 62 + if (container->enabled) 63 + return -EBUSY; 64 + 65 + /* 66 + * When userspace pages are mapped into the IOMMU, they are effectively 67 + * locked memory, so, theoretically, we need to update the accounting 68 + * of locked pages on each map and unmap. For powerpc, the map unmap 69 + * paths can be very hot, though, and the accounting would kill 70 + * performance, especially since it would be difficult or impossible 71 + * to handle the accounting in real mode only. 72 + * 73 + * To address that, rather than precisely accounting every page, we 74 + * instead account for a worst case on locked memory when the iommu is 75 + * enabled and disabled. The worst case upper bound on locked memory 76 + * is the size of the whole iommu window, which is usually relatively 77 + * small (compared to total memory sizes) on POWER hardware. 78 + * 79 + * Also, we don't have a nice way to fail on H_PUT_TCE due to ulimits; 80 + * that would effectively kill the guest at random points, so it is 81 + * better to enforce the limit based on the maximum that the guest can map.
82 + */ 83 + down_write(&current->mm->mmap_sem); 84 + npages = (tbl->it_size << IOMMU_PAGE_SHIFT) >> PAGE_SHIFT; 85 + locked = current->mm->locked_vm + npages; 86 + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; 87 + if (locked > lock_limit && !capable(CAP_IPC_LOCK)) { 88 + pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n", 89 + rlimit(RLIMIT_MEMLOCK)); 90 + ret = -ENOMEM; 91 + } else { 92 + 93 + current->mm->locked_vm += npages; 94 + container->enabled = true; 95 + } 96 + up_write(&current->mm->mmap_sem); 97 + 98 + return ret; 99 + } 100 + 101 + static void tce_iommu_disable(struct tce_container *container) 102 + { 103 + if (!container->enabled) 104 + return; 105 + 106 + container->enabled = false; 107 + 108 + if (!container->tbl || !current->mm) 109 + return; 110 + 111 + down_write(&current->mm->mmap_sem); 112 + current->mm->locked_vm -= (container->tbl->it_size << 113 + IOMMU_PAGE_SHIFT) >> PAGE_SHIFT; 114 + up_write(&current->mm->mmap_sem); 115 + } 116 + 117 + static void *tce_iommu_open(unsigned long arg) 118 + { 119 + struct tce_container *container; 120 + 121 + if (arg != VFIO_SPAPR_TCE_IOMMU) { 122 + pr_err("tce_vfio: Wrong IOMMU type\n"); 123 + return ERR_PTR(-EINVAL); 124 + } 125 + 126 + container = kzalloc(sizeof(*container), GFP_KERNEL); 127 + if (!container) 128 + return ERR_PTR(-ENOMEM); 129 + 130 + mutex_init(&container->lock); 131 + 132 + return container; 133 + } 134 + 135 + static void tce_iommu_release(void *iommu_data) 136 + { 137 + struct tce_container *container = iommu_data; 138 + 139 + WARN_ON(container->tbl && !container->tbl->it_group); 140 + tce_iommu_disable(container); 141 + 142 + if (container->tbl && container->tbl->it_group) 143 + tce_iommu_detach_group(iommu_data, container->tbl->it_group); 144 + 145 + mutex_destroy(&container->lock); 146 + 147 + kfree(container); 148 + } 149 + 150 + static long tce_iommu_ioctl(void *iommu_data, 151 + unsigned int cmd, unsigned long arg) 152 + { 153 + struct tce_container *container = iommu_data; 154 + 
unsigned long minsz; 155 + long ret; 156 + 157 + switch (cmd) { 158 + case VFIO_CHECK_EXTENSION: 159 + return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0; 160 + 161 + case VFIO_IOMMU_SPAPR_TCE_GET_INFO: { 162 + struct vfio_iommu_spapr_tce_info info; 163 + struct iommu_table *tbl = container->tbl; 164 + 165 + if (WARN_ON(!tbl)) 166 + return -ENXIO; 167 + 168 + minsz = offsetofend(struct vfio_iommu_spapr_tce_info, 169 + dma32_window_size); 170 + 171 + if (copy_from_user(&info, (void __user *)arg, minsz)) 172 + return -EFAULT; 173 + 174 + if (info.argsz < minsz) 175 + return -EINVAL; 176 + 177 + info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT; 178 + info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT; 179 + info.flags = 0; 180 + 181 + if (copy_to_user((void __user *)arg, &info, minsz)) 182 + return -EFAULT; 183 + 184 + return 0; 185 + } 186 + case VFIO_IOMMU_MAP_DMA: { 187 + struct vfio_iommu_type1_dma_map param; 188 + struct iommu_table *tbl = container->tbl; 189 + unsigned long tce, i; 190 + 191 + if (!tbl) 192 + return -ENXIO; 193 + 194 + BUG_ON(!tbl->it_group); 195 + 196 + minsz = offsetofend(struct vfio_iommu_type1_dma_map, size); 197 + 198 + if (copy_from_user(&param, (void __user *)arg, minsz)) 199 + return -EFAULT; 200 + 201 + if (param.argsz < minsz) 202 + return -EINVAL; 203 + 204 + if (param.flags & ~(VFIO_DMA_MAP_FLAG_READ | 205 + VFIO_DMA_MAP_FLAG_WRITE)) 206 + return -EINVAL; 207 + 208 + if ((param.size & ~IOMMU_PAGE_MASK) || 209 + (param.vaddr & ~IOMMU_PAGE_MASK)) 210 + return -EINVAL; 211 + 212 + /* iova is checked by the IOMMU API */ 213 + tce = param.vaddr; 214 + if (param.flags & VFIO_DMA_MAP_FLAG_READ) 215 + tce |= TCE_PCI_READ; 216 + if (param.flags & VFIO_DMA_MAP_FLAG_WRITE) 217 + tce |= TCE_PCI_WRITE; 218 + 219 + ret = iommu_tce_put_param_check(tbl, param.iova, tce); 220 + if (ret) 221 + return ret; 222 + 223 + for (i = 0; i < (param.size >> IOMMU_PAGE_SHIFT); ++i) { 224 + ret = iommu_put_tce_user_mode(tbl, 225 + (param.iova >> 
+					IOMMU_PAGE_SHIFT) + i,
+					tce);
+			if (ret)
+				break;
+			tce += IOMMU_PAGE_SIZE;
+		}
+		if (ret)
+			iommu_clear_tces_and_put_pages(tbl,
+					param.iova >> IOMMU_PAGE_SHIFT, i);
+
+		iommu_flush_tce(tbl);
+
+		return ret;
+	}
+	case VFIO_IOMMU_UNMAP_DMA: {
+		struct vfio_iommu_type1_dma_unmap param;
+		struct iommu_table *tbl = container->tbl;
+
+		if (WARN_ON(!tbl))
+			return -ENXIO;
+
+		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap,
+				size);
+
+		if (copy_from_user(&param, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (param.argsz < minsz)
+			return -EINVAL;
+
+		/* No flag is supported now */
+		if (param.flags)
+			return -EINVAL;
+
+		if (param.size & ~IOMMU_PAGE_MASK)
+			return -EINVAL;
+
+		ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
+				param.size >> IOMMU_PAGE_SHIFT);
+		if (ret)
+			return ret;
+
+		ret = iommu_clear_tces_and_put_pages(tbl,
+				param.iova >> IOMMU_PAGE_SHIFT,
+				param.size >> IOMMU_PAGE_SHIFT);
+		iommu_flush_tce(tbl);
+
+		return ret;
+	}
+	case VFIO_IOMMU_ENABLE:
+		mutex_lock(&container->lock);
+		ret = tce_iommu_enable(container);
+		mutex_unlock(&container->lock);
+		return ret;
+
+
+	case VFIO_IOMMU_DISABLE:
+		mutex_lock(&container->lock);
+		tce_iommu_disable(container);
+		mutex_unlock(&container->lock);
+		return 0;
+	}
+
+	return -ENOTTY;
+}
+
+static int tce_iommu_attach_group(void *iommu_data,
+		struct iommu_group *iommu_group)
+{
+	int ret;
+	struct tce_container *container = iommu_data;
+	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+
+	BUG_ON(!tbl);
+	mutex_lock(&container->lock);
+
+	/* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
+			iommu_group_id(iommu_group), iommu_group); */
+	if (container->tbl) {
+		pr_warn("tce_vfio: Only one group per IOMMU container is allowed, existing id=%d, attaching id=%d\n",
+				iommu_group_id(container->tbl->it_group),
+				iommu_group_id(iommu_group));
+		ret = -EBUSY;
+	} else if (container->enabled) {
+		pr_err("tce_vfio: attaching group #%u to enabled container\n",
+				iommu_group_id(iommu_group));
+		ret = -EBUSY;
+	} else {
+		ret = iommu_take_ownership(tbl);
+		if (!ret)
+			container->tbl = tbl;
+	}
+
+	mutex_unlock(&container->lock);
+
+	return ret;
+}
+
+static void tce_iommu_detach_group(void *iommu_data,
+		struct iommu_group *iommu_group)
+{
+	struct tce_container *container = iommu_data;
+	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+
+	BUG_ON(!tbl);
+	mutex_lock(&container->lock);
+	if (tbl != container->tbl) {
+		pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
+				iommu_group_id(iommu_group),
+				iommu_group_id(tbl->it_group));
+	} else {
+		if (container->enabled) {
+			pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
+					iommu_group_id(tbl->it_group));
+			tce_iommu_disable(container);
+		}
+
+		/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
+				iommu_group_id(iommu_group), iommu_group); */
+		container->tbl = NULL;
+		iommu_release_ownership(tbl);
+	}
+	mutex_unlock(&container->lock);
+}
+
+const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
+	.name		= "iommu-vfio-powerpc",
+	.owner		= THIS_MODULE,
+	.open		= tce_iommu_open,
+	.release	= tce_iommu_release,
+	.ioctl		= tce_iommu_ioctl,
+	.attach_group	= tce_iommu_attach_group,
+	.detach_group	= tce_iommu_detach_group,
+};
+
+static int __init tce_iommu_init(void)
+{
+	return vfio_register_iommu_driver(&tce_iommu_driver_ops);
+}
+
+static void __exit tce_iommu_cleanup(void)
+{
+	vfio_unregister_iommu_driver(&tce_iommu_driver_ops);
+}
+
+module_init(tce_iommu_init);
+module_exit(tce_iommu_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+
+8
drivers/watchdog/booke_wdt.c
···
 	val &= ~WDTP_MASK;
 	val |= (TCR_WIE|TCR_WRC(WRC_CHIP)|WDTP(booke_wdt_period));
 
+#ifdef CONFIG_PPC_BOOK3E_64
+	/*
+	 * Crit ints are currently broken on PPC64 Book-E, so
+	 * just disable them for now.
+	 */
+	val &= ~TCR_WIE;
+#endif
+
 	mtspr(SPRN_TCR, val);
 }
 
+1 -1
fs/pstore/ftrace.c
···
 	rec.parent_ip = parent_ip;
 	pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id());
 	psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec,
-			  sizeof(rec), psinfo);
+			  0, sizeof(rec), psinfo);
 
 	local_irq_restore(flags);
 }
+9
fs/pstore/inode.c
···
 	case PSTORE_TYPE_MCE:
 		sprintf(name, "mce-%s-%lld", psname, id);
 		break;
+	case PSTORE_TYPE_PPC_RTAS:
+		sprintf(name, "rtas-%s-%lld", psname, id);
+		break;
+	case PSTORE_TYPE_PPC_OF:
+		sprintf(name, "powerpc-ofw-%s-%lld", psname, id);
+		break;
+	case PSTORE_TYPE_PPC_COMMON:
+		sprintf(name, "powerpc-common-%s-%lld", psname, id);
+		break;
 	case PSTORE_TYPE_UNKNOWN:
 		sprintf(name, "unknown-%s-%lld", psname, id);
 		break;
+6 -4
fs/pstore/platform.c
···
 		break;
 
 	ret = psinfo->write(PSTORE_TYPE_DMESG, reason, &id, part,
-			    oopscount, hsize + len, psinfo);
+			    oopscount, hsize, hsize + len, psinfo);
 	if (ret == 0 && reason == KMSG_DUMP_OOPS && pstore_is_mounted())
 		pstore_new_entry = 1;
···
 		spin_lock_irqsave(&psinfo->buf_lock, flags);
 	}
 	memcpy(psinfo->buf, s, c);
-	psinfo->write(PSTORE_TYPE_CONSOLE, 0, &id, 0, 0, c, psinfo);
+	psinfo->write(PSTORE_TYPE_CONSOLE, 0, &id, 0, 0, 0, c, psinfo);
 	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 	s += c;
 	c = e - s;
···
 static int pstore_write_compat(enum pstore_type_id type,
 			       enum kmsg_dump_reason reason,
 			       u64 *id, unsigned int part, int count,
-			       size_t size, struct pstore_info *psi)
+			       size_t hsize, size_t size,
+			       struct pstore_info *psi)
 {
-	return psi->write_buf(type, reason, id, part, psinfo->buf, size, psi);
+	return psi->write_buf(type, reason, id, part, psinfo->buf, hsize,
+			      size, psi);
 }
 
 /*
+2 -1
fs/pstore/ram.c
···
 static int notrace ramoops_pstore_write_buf(enum pstore_type_id type,
 					    enum kmsg_dump_reason reason,
 					    u64 *id, unsigned int part,
-					    const char *buf, size_t size,
+					    const char *buf,
+					    size_t hsize, size_t size,
 					    struct pstore_info *psi)
 {
 	struct ramoops_context *cxt = psi->data;
+3 -2
include/asm-generic/pgtable.h
···
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
-extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable);
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+				       pgtable_t pgtable);
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
-extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm);
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #endif
 
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
+3 -3
include/linux/huge_mm.h
···
 #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define HPAGE_PMD_SHIFT HPAGE_SHIFT
-#define HPAGE_PMD_MASK HPAGE_MASK
-#define HPAGE_PMD_SIZE HPAGE_SIZE
+#define HPAGE_PMD_SHIFT PMD_SHIFT
+#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)
+#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
 
 extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
 
+8 -4
include/linux/pstore.h
···
 	PSTORE_TYPE_MCE		= 1,
 	PSTORE_TYPE_CONSOLE	= 2,
 	PSTORE_TYPE_FTRACE	= 3,
+	/* PPC64 partition types */
+	PSTORE_TYPE_PPC_RTAS	= 4,
+	PSTORE_TYPE_PPC_OF	= 5,
+	PSTORE_TYPE_PPC_COMMON	= 6,
 	PSTORE_TYPE_UNKNOWN	= 255
 };
 
···
 			struct pstore_info *psi);
 	int		(*write)(enum pstore_type_id type,
 			enum kmsg_dump_reason reason, u64 *id,
-			unsigned int part, int count, size_t size,
-			struct pstore_info *psi);
+			unsigned int part, int count, size_t hsize,
+			size_t size, struct pstore_info *psi);
 	int		(*write_buf)(enum pstore_type_id type,
 			enum kmsg_dump_reason reason, u64 *id,
-			unsigned int part, const char *buf, size_t size,
-			struct pstore_info *psi);
+			unsigned int part, const char *buf, size_t hsize,
+			size_t size, struct pstore_info *psi);
 	int		(*erase)(enum pstore_type_id type, u64 id,
 			int count, struct timespec time,
 			struct pstore_info *psi);
+34
include/uapi/linux/vfio.h
···
 /* Extensions */
 
 #define VFIO_TYPE1_IOMMU		1
+#define VFIO_SPAPR_TCE_IOMMU		2
 
 /*
  * The IOCTL interface is designed for extensibility by embedding the
···
 };
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
+
+/*
+ * IOCTLs to enable/disable IOMMU container usage.
+ * No parameters are supported.
+ */
+#define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
+#define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
+
+/* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
+
+/*
+ * The SPAPR TCE info struct provides the information about the PCI bus
+ * address ranges available for DMA, these values are programmed into
+ * the hardware so the guest has to know that information.
+ *
+ * The DMA 32 bit window start is an absolute PCI bus address.
+ * The IOVA address passed via map/unmap ioctls are absolute PCI bus
+ * addresses too so the window works as a filter rather than an offset
+ * for IOVA addresses.
+ *
+ * A flag will need to be added if other page sizes are supported,
+ * so as defined here, it is always 4k.
+ */
+struct vfio_iommu_spapr_tce_info {
+	__u32 argsz;
+	__u32 flags;			/* reserved for future use */
+	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
+	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
+};
+
+#define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
+
+/* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
+18 -10
mm/huge_memory.c
···
 		pmd_t entry;
 		entry = mk_huge_pmd(page, vma);
 		page_add_new_anon_rmap(page, vma, haddr);
+		pgtable_trans_huge_deposit(mm, pmd, pgtable);
 		set_pmd_at(mm, haddr, pmd, entry);
-		pgtable_trans_huge_deposit(mm, pgtable);
 		add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm->nr_ptes++;
 		spin_unlock(&mm->page_table_lock);
···
 	entry = mk_pmd(zero_page, vma->vm_page_prot);
 	entry = pmd_wrprotect(entry);
 	entry = pmd_mkhuge(entry);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
-	pgtable_trans_huge_deposit(mm, pgtable);
 	mm->nr_ptes++;
 	return true;
 }
···
 
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
+	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
-	pgtable_trans_huge_deposit(dst_mm, pgtable);
 	dst_mm->nr_ptes++;
 
 	ret = 0;
···
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(mm);
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
···
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(mm);
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
···
 		 * young bit, instead of the current set_pmd_at.
 		 */
 		_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
-		set_pmd_at(mm, addr & HPAGE_PMD_MASK, pmd, _pmd);
+		if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
+					  pmd, _pmd,  1))
+			update_mmu_cache_pmd(vma, addr, pmd);
 	}
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		if (page->mapping && trylock_page(page)) {
···
 		struct page *page;
 		pgtable_t pgtable;
 		pmd_t orig_pmd;
-		pgtable = pgtable_trans_huge_withdraw(tlb->mm);
+		/*
+		 * For architectures like ppc64 we look at deposited pgtable
+		 * when calling pmdp_get_and_clear. So do the
+		 * pgtable_trans_huge_withdraw after finishing pmdp related
+		 * operations.
+		 */
 		orig_pmd = pmdp_get_and_clear(tlb->mm, addr, pmd);
 		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
 		if (is_huge_zero_pmd(orig_pmd)) {
 			tlb->mm->nr_ptes--;
 			spin_unlock(&tlb->mm->page_table_lock);
···
 	pmd = page_check_address_pmd(page, mm, address,
 				     PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
 	if (pmd) {
-		pgtable = pgtable_trans_huge_withdraw(mm);
+		pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 		pmd_populate(mm, &_pmd, pgtable);
 
 		haddr = address;
···
 	spin_lock(&mm->page_table_lock);
 	BUG_ON(!pmd_none(*pmd));
 	page_add_new_anon_rmap(new_page, vma, address);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
-	pgtable_trans_huge_deposit(mm, pgtable);
 	spin_unlock(&mm->page_table_lock);
 
 	*hpage = NULL;
···
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(mm);
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+3 -2
mm/pgtable-generic.c
···
 
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+				pgtable_t pgtable)
 {
 	assert_spin_locked(&mm->page_table_lock);
 
···
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /* no "address" argument so destroys page coloring of some arch */
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm)
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
 	pgtable_t pgtable;
 