Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc updates from Ben Herrenschmidt:
"These are the powerpc changes for the 3.11 merge window. In addition to
the usual bug fixes and small updates, the main highlights are:

- Support for transparent huge pages by Aneesh Kumar for 64-bit
server processors. This allows the use of 16M pages as transparent
huge pages on kernels compiled with a 64K base page size.

- Base VFIO support for KVM on power by Alexey Kardashevskiy

- Wiring up of our nvram to the pstore infrastructure, including
putting compressed oopses in there by Aruna Balakrishnaiah

- Move, rework and improve our "EEH" (basically PCI error handling
and recovery) infrastructure. It is no longer specific to pseries
but is now usable by the new "powernv" platform as well (no
hypervisor) by Gavin Shan.

- I fixed some bugs in our math-emu instruction decoding and made it
usable to emulate some optional FP instructions on processors with
hard FP that lack them (such as fsqrt on Freescale embedded
processors).

- Support for Power8 "Event Based Branch" facility by Michael
Ellerman. This facility allows what is basically "userspace
interrupts" for performance monitor events.

- A bunch of Transactional Memory vs. Signals bug fixes and HW
breakpoint/watchpoint fixes by Michael Neuling.

And more ... I apologize in advance if I've failed to highlight
something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
pstore: Add hsize argument in write_buf call of pstore_ftrace_call
powerpc/fsl: add MPIC timer wakeup support
powerpc/mpic: create mpic subsystem object
powerpc/mpic: add global timer support
powerpc/mpic: add irq_set_wake support
powerpc/85xx: enable coreint for all the 64bit boards
powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
powerpc/mpic: Add get_version API both for internal and external use
powerpc: Handle both new style and old style reserve maps
powerpc/hw_brk: Fix off by one error when validating DAWR region end
powerpc/pseries: Support compression of oops text via pstore
powerpc/pseries: Re-organise the oops compression code
pstore: Pass header size in the pstore write callback
powerpc/powernv: Fix iommu initialization again
powerpc/pseries: Inform the hypervisor we are using EBB regs
powerpc/perf: Add power8 EBB support
powerpc/perf: Core EBB support for 64-bit book3s
powerpc/perf: Drop MMCRA from thread_struct
powerpc/perf: Don't enable if we have zero events
...

+7681 -1225
+309
Documentation/devicetree/bindings/powerpc/fsl/interlaken-lac.txt
===============================================================================
Freescale Interlaken Look-Aside Controller Device Bindings
Copyright 2012 Freescale Semiconductor Inc.

CONTENTS
	- Interlaken Look-Aside Controller (LAC) Node
	- Example LAC Node
	- Interlaken Look-Aside Controller (LAC) Software Portal Node
	- Interlaken Look-Aside Controller (LAC) Software Portal Child Nodes
	- Example LAC SWP Node with Child Nodes

==============================================================================
Interlaken Look-Aside Controller (LAC) Node

DESCRIPTION

Interlaken is a narrow, high-speed, channelized chip-to-chip interface. To
facilitate interoperability between a data path device and a look-aside
co-processor, the Interlaken Look-Aside protocol is defined for short
transaction-related transfers. Although based on the Interlaken protocol,
Interlaken Look-Aside is not directly compatible with Interlaken and can be
considered a different operation mode.

The Interlaken LA controller connects the internal platform to the Interlaken
serial interface. It accepts LA commands through software portals, which are
system-memory-mapped 4 KB spaces. The LA commands are then translated into
Interlaken control words and data words, which are sent on the TX side to the
TCAM through SerDes lanes.

There are two 4 KB spaces defined within the LAC global register memory map:
a full register set at 0x0000-0x0FFF (also known as the "hypervisor" version),
and a subset at 0x1000-0x1FFF. The former is a superset of the latter and
includes certain registers that should not be accessible to partitioned
software. Separate nodes are used for each region, with a phandle linking the
hypervisor node to the normal operating node.

PROPERTIES

- compatible
	Usage: required
	Value type: <string>
	Definition: Must include "fsl,interlaken-lac". This represents only
		those LAC CCSR registers not protected in partitioned
		software. The version of the device is determined by the LAC
		IP Block Revision Register (IPBRR0) at offset 0x0BF8.

		Table of correspondences between IPBRR0 values and example
		chips:
			Value		Device
			-----------	-------
			0x02000100	T4240

	The hypervisor node has a different compatible. It must include
	"fsl,interlaken-lac-hv". This node represents the protected
	LAC register space and is required except inside a partition
	where access to the hypervisor node is to be denied.

- fsl,non-hv-node
	Usage: required in "fsl,interlaken-lac-hv"
	Value type: <phandle>
	Definition: Points to the non-protected LAC CCSR-mapped register
		space node.

- reg
	Usage: required
	Value type: <prop-encoded-array>
	Definition: A standard property. The first resource represents the
		Interlaken LAC configuration registers.

- interrupts
	Usage: required in non-hv node only
	Value type: <prop-encoded-array>
	Definition: Interrupt mapping for the Interlaken LAC error IRQ.

EXAMPLE
	lac: lac@229000 {
		compatible = "fsl,interlaken-lac";
		reg = <0x229000 0x1000>;
		interrupts = <16 2 1 18>;
	};

	lac-hv@228000 {
		compatible = "fsl,interlaken-lac-hv";
		reg = <0x228000 0x1000>;
		fsl,non-hv-node = <&lac>;
	};

===============================================================================
Interlaken Look-Aside Controller (LAC) Software Portal Container Node

DESCRIPTION
The Interlaken Look-Aside Controller (LAC) uses software portals to accept
Interlaken Look-Aside (ILA) commands. The Interlaken LAC software portal
memory map occupies 128 KB of memory space. The software portal memory space
is intended to be cache-enabled. WIMG for each software space is required to
be 0010 if stashing is enabled; otherwise, WIMG can be 0000 or 0010.

PROPERTIES

- #address-cells
	Usage: required
	Value type: <u32>
	Definition: A standard property. Must have a value of 1.

- #size-cells
	Usage: required
	Value type: <u32>
	Definition: A standard property. Must have a value of 1.

- compatible
	Usage: required
	Value type: <string>
	Definition: Must include "fsl,interlaken-lac-portals".

- ranges
	Usage: required
	Value type: <prop-encoded-array>
	Definition: A standard property. Specifies the address and length
		of the LAC portal memory space.

===============================================================================
Interlaken Look-Aside Controller (LAC) Software Portal Child Nodes

DESCRIPTION
There are up to 24 available software portals, each requiring 4 KB of
consecutive memory within the software portal memory-mapped space.

PROPERTIES

- compatible
	Usage: required
	Value type: <string>
	Definition: Must include "fsl,interlaken-lac-portal-vX.Y", where X is
		the major version (IP_MJ) found in the LAC IP Block Revision
		Register (IPBRR0) at offset 0x0BF8, and Y is the minor version
		(IP_MN).

		Table of correspondences between version values and example
		chips:
			Value	Device
			------	-------
			1.0	T4240

- reg
	Usage: required
	Value type: <prop-encoded-array>
	Definition: A standard property. The first resource represents the
		Interlaken LAC software portal registers.

- fsl,liodn
	Value type: <u32>
	Definition: The logical I/O device number (LIODN) for this device.
		The LIODN is a number expressed by this device and used to
		perform look-ups in the IOMMU (PAMU) address table when
		performing DMAs. This property is automatically added by
		u-boot.

===============================================================================
EXAMPLE

lac-portals {
	#address-cells = <0x1>;
	#size-cells = <0x1>;
	compatible = "fsl,interlaken-lac-portals";
	ranges = <0x0 0xf 0xf4400000 0x20000>;

	lportal0: lac-portal@0 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x204>;
		reg = <0x0 0x1000>;
	};

	lportal1: lac-portal@1000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x205>;
		reg = <0x1000 0x1000>;
	};

	lportal2: lac-portal@2000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x206>;
		reg = <0x2000 0x1000>;
	};

	lportal3: lac-portal@3000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x207>;
		reg = <0x3000 0x1000>;
	};

	lportal4: lac-portal@4000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x208>;
		reg = <0x4000 0x1000>;
	};

	lportal5: lac-portal@5000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x209>;
		reg = <0x5000 0x1000>;
	};

	lportal6: lac-portal@6000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20A>;
		reg = <0x6000 0x1000>;
	};

	lportal7: lac-portal@7000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20B>;
		reg = <0x7000 0x1000>;
	};

	lportal8: lac-portal@8000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20C>;
		reg = <0x8000 0x1000>;
	};

	lportal9: lac-portal@9000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20D>;
		reg = <0x9000 0x1000>;
	};

	lportal10: lac-portal@A000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20E>;
		reg = <0xA000 0x1000>;
	};

	lportal11: lac-portal@B000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x20F>;
		reg = <0xB000 0x1000>;
	};

	lportal12: lac-portal@C000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x210>;
		reg = <0xC000 0x1000>;
	};

	lportal13: lac-portal@D000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x211>;
		reg = <0xD000 0x1000>;
	};

	lportal14: lac-portal@E000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x212>;
		reg = <0xE000 0x1000>;
	};

	lportal15: lac-portal@F000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x213>;
		reg = <0xF000 0x1000>;
	};

	lportal16: lac-portal@10000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x214>;
		reg = <0x10000 0x1000>;
	};

	lportal17: lac-portal@11000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x215>;
		reg = <0x11000 0x1000>;
	};

	lportal18: lac-portal@12000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x216>;
		reg = <0x12000 0x1000>;
	};

	lportal19: lac-portal@13000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x217>;
		reg = <0x13000 0x1000>;
	};

	lportal20: lac-portal@14000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x218>;
		reg = <0x14000 0x1000>;
	};

	lportal21: lac-portal@15000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x219>;
		reg = <0x15000 0x1000>;
	};

	lportal22: lac-portal@16000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x21A>;
		reg = <0x16000 0x1000>;
	};

	lportal23: lac-portal@17000 {
		compatible = "fsl,interlaken-lac-portal-v1.0";
		fsl,liodn = <0x21B>;
		reg = <0x17000 0x1000>;
	};
};
+2
Documentation/powerpc/00-INDEX
···
   - IBM "Hypervisor Virtual Console Server" Installation Guide
  mpc52xx.txt
   - Linux 2.6.x on MPC52xx family
+ pmu-ebb.txt
+  - Description of the API for using the PMU with Event Based Branches.
  qe_firmware.txt
   - describes the layout of firmware binaries for the Freescale QUICC
     Engine and the code that parses and uploads the microcode therein.
+137
Documentation/powerpc/pmu-ebb.txt
PMU Event Based Branches
========================

Event Based Branches (EBBs) are a feature which allows the hardware to
branch directly to a specified user space address when certain events occur.

The full specification is available in Power ISA v2.07:

https://www.power.org/documentation/power-isa-version-2-07/

One type of event for which EBBs can be configured is PMU exceptions. This
document describes the API for configuring the Power PMU to generate EBBs,
using the Linux perf_events API.


Terminology
-----------

Throughout this document we will refer to an "EBB event" or "EBB events".
This just refers to a struct perf_event which has set the "EBB" flag in its
attr.config. All events which can be configured on the hardware PMU are
possible "EBB events".


Background
----------

When a PMU EBB occurs it is delivered to the currently running process. As
such, EBBs can only sensibly be used by programs for self-monitoring.

It is a feature of the perf_events API that events can be created on other
processes, subject to standard permission checks. This is also true of EBB
events; however, unless the target process enables EBBs (via mtspr(BESCR)),
no EBBs will ever be delivered.

This makes it possible for a process to enable EBBs for itself, but not
actually configure any events. At a later time another process can come along
and attach an EBB event to the process, which will then cause EBBs to be
delivered to the first process. It's not clear if this is actually useful.

When the PMU is configured for EBBs, all PMU interrupts are delivered to the
user process. This means that once an EBB event is scheduled on the PMU, no
non-EBB events can be configured, so EBB events cannot be run concurrently
with regular 'perf' commands, or any other perf events.

It is, however, safe to run 'perf' commands on a process which is using EBBs.
The kernel will in general schedule the EBB event, and perf will be notified
that its events could not run.

The exclusion between EBB events and regular events is implemented using the
existing "pinned" and "exclusive" attributes of perf_events. This means EBB
events will be given priority over other events, unless they are also pinned.
If an EBB event and a regular event are both pinned, then whichever is
enabled first will be scheduled and the other will be put in error state. See
the section below titled "Enabling an EBB event" for more information.


Creating an EBB event
---------------------

To request that an event is counted using EBB, the event code should have bit
63 set.

EBB events must be created with a particular, and restrictive, set of
attributes - this is so that they interoperate correctly with the rest of the
perf_events subsystem.

An EBB event must be created with the "pinned" and "exclusive" attributes
set. Note that if you are creating a group of EBB events, only the leader can
have these attributes set.

An EBB event must NOT set any of the "inherit", "sample_period", "freq" or
"enable_on_exec" attributes.

An EBB event must be attached to a task. This is specified to
perf_event_open() by passing a pid value, typically 0 indicating the current
task.

All events in a group must agree on whether they want EBB; that is, either
all events must request EBB, or none may request it.

EBB events must specify the PMC they are to be counted on. This ensures
userspace is able to reliably determine which PMC the event is scheduled on.


Enabling an EBB event
---------------------

Once an EBB event has been successfully opened, it must be enabled with the
perf_events API. This can be achieved either via the ioctl() interface or the
prctl() interface.

However, due to the design of the perf_events API, enabling an event does not
guarantee that it has been scheduled on the PMU. To ensure that the EBB event
has been scheduled on the PMU, you must perform a read() on the event. If the
read() returns EOF, then the event has not been scheduled and EBBs are not
enabled.

This behaviour occurs because the EBB event is pinned and exclusive. When the
EBB event is enabled it will force all other non-pinned events off the PMU;
in this case the enable will be successful. However, if there is already an
event pinned on the PMU, then the enable will not be successful.


Reading an EBB event
--------------------

It is possible to read() from an EBB event; however, the results are
meaningless. Because interrupts are being delivered to the user process, the
kernel is not able to count the event, and so will return a junk value.


Closing an EBB event
--------------------

When you are finished with an EBB event, you can close it using close() as
for any regular event. If this is the last EBB event, the PMU will be
deconfigured and no further PMU EBBs will be delivered.


EBB Handler
-----------

The EBB handler is just regular userspace code; however, it must be written
in the style of an interrupt handler. When the handler is entered all
registers are (possibly) live and so must be saved somehow before the handler
can invoke other code.

It's up to the program how to handle this. For C programs a relatively simple
option is to create an interrupt frame on the stack and save registers there.


Fork
----

EBB events are not inherited across fork. If the child process wishes to use
EBBs it should open a new event for itself. Similarly, the EBB state in
BESCR/EBBHR/EBBRR is cleared across fork().
+63
Documentation/vfio.txt
···
  interfaces implement the device region access defined by the device's
  own VFIO_DEVICE_GET_REGION_INFO ioctl.

+ PPC64 sPAPR implementation note
+ -------------------------------------------------------------------------------
+
+ This implementation has some specifics:
+
+ 1) Only one IOMMU group per container is supported, as an IOMMU group
+    represents the minimal entity for which isolation can be guaranteed,
+    and groups are allocated statically, one per Partitionable Endpoint
+    (PE) (a PE is often a PCI domain, but not always).
+
+ 2) The hardware supports so-called DMA windows - the PCI address range
+    within which DMA transfer is allowed; any attempt to access address
+    space outside the window leads to isolation of the whole PE.
+
+ 3) PPC64 guests are paravirtualized but not fully emulated. There is an
+    API to map/unmap pages for DMA; it normally maps 1..32 pages per call,
+    and currently there is no way to reduce the number of calls. To make
+    things faster, the map/unmap handling has been implemented in real
+    mode, which provides excellent performance but has limitations such as
+    the inability to do locked-pages accounting in real time.
+
+ So 3 additional ioctls have been added:
+
+	VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
+		of the DMA window on the PCI bus.
+
+	VFIO_IOMMU_ENABLE - enables the container. The locked pages
+		accounting is done at this point. This lets the user first
+		learn what the DMA window is and adjust rlimit before doing
+		any real job.
+
+	VFIO_IOMMU_DISABLE - disables the container.
+
+ The code flow from the example above should be slightly changed:
+
+	.....
+	/* Add the group to the container */
+	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
+
+	/* Enable the IOMMU model we want */
+	ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+
+	/* Get additional sPAPR IOMMU info */
+	struct vfio_iommu_spapr_tce_info spapr_iommu_info;
+	ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);
+
+	if (ioctl(container, VFIO_IOMMU_ENABLE))
+		/* Cannot enable container, may be low rlimit */
+
+	/* Allocate some space and setup a DMA mapping */
+	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+
+	dma_map.size = 1024 * 1024;
+	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
+	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+	/* Check here whether .iova/.size are within the DMA window from
+	   spapr_iommu_info */
+
+	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
+	.....

  -------------------------------------------------------------------------------

  [1] VFIO was originally an acronym for "Virtual Function I/O" in its
+7 -1
MAINTAINERS
···
  S:	Maintained
  F:	drivers/media/rc/ene_ir.*

+ ENHANCED ERROR HANDLING (EEH)
+ M:	Gavin Shan <shangw@linux.vnet.ibm.com>
+ L:	linuxppc-dev@lists.ozlabs.org
+ S:	Supported
+ F:	Documentation/powerpc/eeh-pci-error-recovery.txt
+ F:	arch/powerpc/kernel/eeh*.c

  EPSON S1D13XXX FRAMEBUFFER DRIVER
  M:	Kristoffer Ericson <kristoffer.ericson@gmail.com>
  S:	Maintained
···
  L:	linux-pci@vger.kernel.org
  S:	Supported
  F:	Documentation/PCI/pci-error-recovery.txt
- F:	Documentation/powerpc/eeh-pci-error-recovery.txt

  PCI SUBSYSTEM
  M:	Bjorn Helgaas <bhelgaas@google.com>
+5 -12
arch/powerpc/Kconfig
···
  config MATH_EMULATION
  	bool "Math emulation"
- 	depends on 4xx || 8xx || E200 || PPC_MPC832x || E500
+ 	depends on 4xx || 8xx || PPC_MPC832x || BOOKE
  	---help---
  	  Some PowerPC chips designed for embedded applications do not have
  	  a floating-point unit and therefore do not implement the
···
  	  unit, which will allow programs that use floating-point
  	  instructions to run.

+ 	  This is also useful to emulate missing (optional) instructions
+ 	  such as fsqrt on cores that do have an FPU but do not implement
+ 	  them (such as Freescale BookE).
+
  config PPC_TRANSACTIONAL_MEM
  	bool "Transactional Memory support for POWERPC"
  	depends on PPC_BOOK3S_64
···
  	default n
  	---help---
  	  Support user-mode Transactional Memory on POWERPC.
-
- config 8XX_MINIMAL_FPEMU
- 	bool "Minimal math emulation for 8xx"
- 	depends on 8xx && !MATH_EMULATION
- 	help
- 	  Older arch/ppc kernels still emulated a few floating point
- 	  instructions such as load and store, even when full math
- 	  emulation is disabled. Say "Y" here if you want to preserve
- 	  this behavior.
-
- 	  It is recommended that you build a soft-float userspace instead.

  config IOMMU_HELPER
  	def_bool PPC64
+7
arch/powerpc/Kconfig.debug
···
  	  enable debugging for the wrong type of machine your kernel
  	  _will not boot_.

+ config PPC_EARLY_DEBUG_BOOTX
+ 	bool "BootX or OpenFirmware"
+ 	depends on BOOTX_TEXT
+ 	help
+ 	  Select this to enable early debugging for a machine using BootX
+ 	  or OpenFirmware.
+
  config PPC_EARLY_DEBUG_LPAR
  	bool "LPAR HV Console"
  	depends on PPC_PSERIES
+5
arch/powerpc/boot/dts/currituck.dts
···
  		interrupts = <34 2>;
  	};

+ 	FPGA0: fpga@50000000 {
+ 		compatible = "ibm,currituck-fpga";
+ 		reg = <0x50000000 0x4>;
+ 	};
+
  	IIC0: i2c@00000000 {
  		compatible = "ibm,iic-currituck", "ibm,iic";
  		reg = <0x0 0x00000014>;
+156
arch/powerpc/boot/dts/fsl/interlaken-lac-portals.dtsi
/*
 * T4240 Interlaken LAC Portal device tree stub with 24 portals.
 *
 * Copyright 2012 Freescale Semiconductor Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Freescale Semiconductor nor the
 *       names of its contributors may be used to endorse or promote products
 *       derived from this software without specific prior written permission.
 *
 *
 * ALTERNATIVELY, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") as published by the Free Software
 * Foundation, either version 2 of that License or (at your option) any
 * later version.
 *
 * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#address-cells = <0x1>;
#size-cells = <0x1>;
compatible = "fsl,interlaken-lac-portals";

lportal0: lac-portal@0 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x0 0x1000>;
};

lportal1: lac-portal@1000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x1000 0x1000>;
};

lportal2: lac-portal@2000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x2000 0x1000>;
};

lportal3: lac-portal@3000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x3000 0x1000>;
};

lportal4: lac-portal@4000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x4000 0x1000>;
};

lportal5: lac-portal@5000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x5000 0x1000>;
};

lportal6: lac-portal@6000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x6000 0x1000>;
};

lportal7: lac-portal@7000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x7000 0x1000>;
};

lportal8: lac-portal@8000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x8000 0x1000>;
};

lportal9: lac-portal@9000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x9000 0x1000>;
};

lportal10: lac-portal@A000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xA000 0x1000>;
};

lportal11: lac-portal@B000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xB000 0x1000>;
};

lportal12: lac-portal@C000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xC000 0x1000>;
};

lportal13: lac-portal@D000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xD000 0x1000>;
};

lportal14: lac-portal@E000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xE000 0x1000>;
};

lportal15: lac-portal@F000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0xF000 0x1000>;
};

lportal16: lac-portal@10000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x10000 0x1000>;
};

lportal17: lac-portal@11000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x11000 0x1000>;
};

lportal18: lac-portal@12000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x12000 0x1000>;
};

lportal19: lac-portal@13000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x13000 0x1000>;
};

lportal20: lac-portal@14000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x14000 0x1000>;
};

lportal21: lac-portal@15000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x15000 0x1000>;
};

lportal22: lac-portal@16000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x16000 0x1000>;
};

lportal23: lac-portal@17000 {
	compatible = "fsl,interlaken-lac-portal-v1.0";
	reg = <0x17000 0x1000>;
};
+45
arch/powerpc/boot/dts/fsl/interlaken-lac.dtsi
/*
 * T4 Interlaken Look-aside Controller (LAC) device tree stub
 *
 * Copyright 2012 Freescale Semiconductor Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Freescale Semiconductor nor the
 *       names of its contributors may be used to endorse or promote products
 *       derived from this software without specific prior written permission.
 *
 *
 * ALTERNATIVELY, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") as published by the Free Software
 * Foundation, either version 2 of that License or (at your option) any
 * later version.
 *
 * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

lac: lac@229000 {
	compatible = "fsl,interlaken-lac";
	reg = <0x229000 0x1000>;
	interrupts = <16 2 1 18>;
};

lac-hv@228000 {
	compatible = "fsl,interlaken-lac-hv";
	reg = <0x228000 0x1000>;
	fsl,non-hv-node = <&lac>;
};
+2
arch/powerpc/configs/c2k_defconfig
···
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_KEYS=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
 CONFIG_SECURITY=y
+2
arch/powerpc/configs/g5_defconfig
···
 CONFIG_LATENCYTOP=y
 CONFIG_SYSCTL_SYSCALL_CHECK=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_NULL=m
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_ECB=m
+2
arch/powerpc/configs/maple_defconfig
···
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_ECB=m
 CONFIG_CRYPTO_PCBC=m
 # CONFIG_CRYPTO_ANSI_CPRNG is not set
+10 -17
arch/powerpc/configs/mpc512x_defconfig
···
-CONFIG_EXPERIMENTAL=y
 # CONFIG_SWAP is not set
 CONFIG_SYSVIPC=y
-CONFIG_SPARSE_IRQ=y
+CONFIG_NO_HZ=y
 CONFIG_LOG_BUF_SHIFT=16
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_COMPAT_BRK is not set
···
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
+CONFIG_PARTITION_ADVANCED=y
 # CONFIG_IOSCHED_CFQ is not set
 # CONFIG_PPC_CHRP is not set
 CONFIG_PPC_MPC512x=y
···
 CONFIG_MPC512x_GENERIC=y
 CONFIG_PDM360NG=y
 # CONFIG_PPC_PMAC is not set
-CONFIG_NO_HZ=y
 CONFIG_HZ_1000=y
-# CONFIG_MIGRATION is not set
 # CONFIG_SECCOMP is not set
 # CONFIG_PCI is not set
 CONFIG_NET=y
···
 # CONFIG_INET_DIAG is not set
 # CONFIG_IPV6 is not set
 CONFIG_CAN=y
-CONFIG_CAN_RAW=y
-CONFIG_CAN_BCM=y
 CONFIG_CAN_VCAN=y
 CONFIG_CAN_MSCAN=y
 CONFIG_CAN_DEBUG_DEVICES=y
···
 # CONFIG_FIRMWARE_IN_KERNEL is not set
 CONFIG_MTD=y
 CONFIG_MTD_CMDLINE_PARTS=y
-CONFIG_MTD_CHAR=y
 CONFIG_MTD_BLOCK=y
 CONFIG_MTD_CFI=y
 CONFIG_MTD_CFI_AMDSTD=y
···
 CONFIG_BLK_DEV_RAM_COUNT=1
 CONFIG_BLK_DEV_RAM_SIZE=8192
 CONFIG_BLK_DEV_XIP=y
-CONFIG_MISC_DEVICES=y
 CONFIG_EEPROM_AT24=y
 CONFIG_EEPROM_AT25=y
 CONFIG_SCSI=y
···
 CONFIG_BLK_DEV_SD=y
 CONFIG_CHR_DEV_SG=y
 CONFIG_NETDEVICES=y
+CONFIG_FS_ENET=y
 CONFIG_MARVELL_PHY=y
 CONFIG_DAVICOM_PHY=y
 CONFIG_QSEMI_PHY=y
···
 CONFIG_LSI_ET1011C_PHY=y
 CONFIG_FIXED_PHY=y
 CONFIG_MDIO_BITBANG=y
-CONFIG_NET_ETHERNET=y
-CONFIG_FS_ENET=y
-# CONFIG_NETDEV_1000 is not set
-# CONFIG_NETDEV_10000 is not set
 # CONFIG_WLAN is not set
 # CONFIG_INPUT_MOUSEDEV_PSAUX is not set
 CONFIG_INPUT_EVDEV=y
···
 CONFIG_GPIO_MPC8XXX=y
 # CONFIG_HWMON is not set
 CONFIG_MEDIA_SUPPORT=y
-CONFIG_VIDEO_DEV=y
 CONFIG_VIDEO_ADV_DEBUG=y
-# CONFIG_VIDEO_HELPER_CHIPS_AUTO is not set
-CONFIG_VIDEO_SAA711X=y
 CONFIG_FB=y
 CONFIG_FB_FSL_DIU=y
 # CONFIG_VGA_CONSOLE is not set
 CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_USB=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_FSL=y
+# CONFIG_USB_EHCI_HCD_PPC_OF is not set
+CONFIG_USB_STORAGE=y
+CONFIG_USB_GADGET=y
+CONFIG_USB_FSL_USB2=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_M41T80=y
 CONFIG_RTC_DRV_MPC5121=y
···
 CONFIG_JFFS2_FS=y
 CONFIG_UBIFS_FS=y
 CONFIG_NFS_FS=y
-CONFIG_NFS_V3=y
 CONFIG_ROOT_NFS=y
-CONFIG_PARTITION_ADVANCED=y
 CONFIG_NLS_CODEPAGE_437=y
 CONFIG_NLS_ISO8859_1=y
 # CONFIG_ENABLE_WARN_DEPRECATED is not set
+1
arch/powerpc/configs/mpc85xx_smp_defconfig
···
 CONFIG_FS_ENET=y
 CONFIG_UCC_GETH=y
 CONFIG_GIANFAR=y
+CONFIG_E1000E=y
 CONFIG_MARVELL_PHY=y
 CONFIG_DAVICOM_PHY=y
 CONFIG_CICADA_PHY=y
+2
arch/powerpc/configs/pmac32_defconfig
···
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_NULL=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_MD4=m
+2
arch/powerpc/configs/ppc64_defconfig
···
 CONFIG_MSI_BITMAP_SELFTEST=y
 CONFIG_XMON=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_CRYPTO_NULL=m
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
+2
arch/powerpc/configs/ppc6xx_defconfig
···
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_XMON=y
 CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
+CONFIG_PPC_EARLY_DEBUG_BOOTX=y
 CONFIG_KEYS=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
 CONFIG_SECURITY=y
+1
arch/powerpc/configs/pseries_defconfig
···
 CONFIG_SQUASHFS_XATTR=y
 CONFIG_SQUASHFS_LZO=y
 CONFIG_SQUASHFS_XZ=y
+CONFIG_PSTORE=y
 CONFIG_NFS_FS=y
 CONFIG_NFS_V3_ACL=y
 CONFIG_NFS_V4=y
+24 -12
arch/powerpc/include/asm/eeh.h
···
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/string.h>
+#include <linux/time.h>
 
 struct pci_dev;
 struct pci_bus;
···
 
 #define EEH_PE_ISOLATED		(1 << 0)	/* Isolated PE */
 #define EEH_PE_RECOVERING	(1 << 1)	/* Recovering PE */
+#define EEH_PE_PHB_DEAD		(1 << 2)	/* Dead PHB */
 
 struct eeh_pe {
 	int type;			/* PE type: PHB/Bus/Device */
···
 	int config_addr;		/* Traditional PCI address */
 	int addr;			/* PE configuration address */
 	struct pci_controller *phb;	/* Associated PHB */
+	struct pci_bus *bus;		/* Top PCI bus for bus PE */
 	int check_count;		/* Times of ignored error */
 	int freeze_count;		/* Times of froze up */
+	struct timeval tstamp;		/* Time on first-time freeze */
 	int false_positives;		/* Times of reported #ff's */
 	struct eeh_pe *parent;		/* Parent PE */
 	struct list_head child_list;	/* Link PE to the child list */
···
 
 static inline struct device_node *eeh_dev_to_of_node(struct eeh_dev *edev)
 {
-	return edev->dn;
+	return edev ? edev->dn : NULL;
 }
 
 static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 {
-	return edev->pdev;
+	return edev ? edev->pdev : NULL;
 }
 
 /*
···
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
+	int (*post_init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
-	void* (*dev_probe)(struct pci_dev *dev, void *flag);
+	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
 	int (*get_pe_addr)(struct eeh_pe *pe);
 	int (*get_state)(struct eeh_pe *pe, int *state);
···
 	int (*configure_bridge)(struct eeh_pe *pe);
 	int (*read_config)(struct device_node *dn, int where, int size, u32 *val);
 	int (*write_config)(struct device_node *dn, int where, int size, u32 val);
+	int (*next_error)(struct eeh_pe **pe);
 };
 
 extern struct eeh_ops *eeh_ops;
 extern int eeh_subsystem_enabled;
-extern struct mutex eeh_mutex;
+extern raw_spinlock_t confirm_error_lock;
 extern int eeh_probe_mode;
 
 #define EEH_PROBE_MODE_DEV	(1<<0)	/* From PCI device */
···
 	return (eeh_probe_mode == EEH_PROBE_MODE_DEV);
 }
 
-static inline void eeh_lock(void)
+static inline void eeh_serialize_lock(unsigned long *flags)
 {
-	mutex_lock(&eeh_mutex);
+	raw_spin_lock_irqsave(&confirm_error_lock, *flags);
 }
 
-static inline void eeh_unlock(void)
+static inline void eeh_serialize_unlock(unsigned long flags)
 {
-	mutex_unlock(&eeh_mutex);
+	raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
 }
 
 /*
···
 
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
+void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
 		eeh_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
···
 
 void *eeh_dev_init(struct device_node *dn, void *data);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
+int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
 				unsigned long val);
 int eeh_dev_check_failure(struct eeh_dev *edev);
-void __init eeh_addr_cache_build(void);
+void eeh_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
 void eeh_add_sysfs_files(struct pci_bus *);
···
 #define EEH_IO_ERROR_VALUE(size)	(~0U >> ((4 - (size)) * 8))
 
 #else /* !CONFIG_EEH */
+
+static inline int eeh_init(void)
+{
+	return 0;
+}
 
 static inline void *eeh_dev_init(struct device_node *dn, void *data)
 {
···
 static inline void eeh_add_sysfs_files(struct pci_bus *bus) { }
 
 static inline void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe) { }
-
-static inline void eeh_lock(void) { }
-static inline void eeh_unlock(void) { }
 
 #define EEH_POSSIBLE_ERROR(val, type) (0)
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
+2
arch/powerpc/include/asm/eeh_event.h
···
 	struct eeh_pe		*pe;	/* EEH PE */
 };
 
+int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
+void eeh_remove_event(struct eeh_pe *pe);
 void eeh_handle_event(struct eeh_pe *pe);
 
 #endif /* __KERNEL__ */
+4 -4
arch/powerpc/include/asm/exception-64s.h
···
 	/* No guest interrupts come through here */		\
 	SET_SCRATCH0(r13);		/* save r13 */		\
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common,	\
-				       EXC_STD, KVMTEST_PR, vec)
+				       EXC_STD, NOTEST, vec)
 
 #define STD_RELON_EXCEPTION_PSERIES_OOL(vec, label)		\
 	.globl label##_relon_pSeries;				\
 label##_relon_pSeries:						\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);		\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_STD)
 
 #define STD_RELON_EXCEPTION_HV(loc, vec, label)		\
···
 	/* No guest interrupts come through here */	\
 	SET_SCRATCH0(r13);	/* save r13 */		\
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
-				       EXC_HV, KVMTEST, vec)
+				       EXC_HV, NOTEST, vec)
 
 #define STD_RELON_EXCEPTION_HV_OOL(vec, label)		\
 	.globl label##_relon_hv;			\
 label##_relon_hv:					\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);	\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label##_common, EXC_HV)
 
 /* This associate vector numbers with bits in paca->irq_happened */
+7 -1
arch/powerpc/include/asm/hugetlb.h
···
 					    unsigned long vmaddr)
 {
 }
-#endif /* CONFIG_HUGETLB_PAGE */
 
+#define hugepd_shift(x) 0
+static inline pte_t *hugepte_offset(hugepd_t *hpdp, unsigned long addr,
+				    unsigned pdshift)
+{
+	return 0;
+}
+#endif /* CONFIG_HUGETLB_PAGE */
 
 /*
  * FSL Book3E platforms require special gpage handling - the gpages
+26 -7
arch/powerpc/include/asm/iommu.h
···
 	struct iommu_pool large_pool;
 	struct iommu_pool pools[IOMMU_NR_POOLS];
 	unsigned long *it_map;       /* A simple allocation bitmap for now */
+#ifdef CONFIG_IOMMU_API
+	struct iommu_group *it_group;
+#endif
 };
 
 struct scatterlist;
···
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
 					    int nid);
+extern void iommu_register_group(struct iommu_table *tbl,
+				 int pci_domain_number, unsigned long pe_num);
 
 extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
 			struct scatterlist *sglist, int nelems,
···
 extern void iommu_init_early_dart(void);
 extern void iommu_init_early_pasemi(void);
 
-#ifdef CONFIG_PCI
-extern void pci_iommu_init(void);
-extern void pci_direct_iommu_init(void);
-#else
-static inline void pci_iommu_init(void) { }
-#endif
-
 extern void alloc_dart_table(void);
 #if defined(CONFIG_PPC64) && defined(CONFIG_PM)
 static inline void iommu_save(void)
···
 		ppc_md.iommu_restore();
 }
 #endif
+
+/* The API to support IOMMU operations for VFIO */
+extern int iommu_tce_clear_param_check(struct iommu_table *tbl,
+		unsigned long ioba, unsigned long tce_value,
+		unsigned long npages);
+extern int iommu_tce_put_param_check(struct iommu_table *tbl,
+		unsigned long ioba, unsigned long tce);
+extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
+		unsigned long hwaddr, enum dma_data_direction direction);
+extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
+		unsigned long entry);
+extern int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
+		unsigned long entry, unsigned long pages);
+extern int iommu_put_tce_user_mode(struct iommu_table *tbl,
+		unsigned long entry, unsigned long tce);
+extern void iommu_flush_tce(struct iommu_table *tbl);
+extern int iommu_take_ownership(struct iommu_table *tbl);
+extern void iommu_release_ownership(struct iommu_table *tbl);
+
+extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
+33 -23
arch/powerpc/include/asm/kvm_book3s_64.h
···
 }
 
 /*
- * Lock and read a linux PTE.  If it's present and writable, atomically
- * set dirty and referenced bits and return the PTE, otherwise return 0.
+ * If it's present and writable, atomically set dirty and referenced bits and
+ * return the PTE, otherwise return 0. If we find a transparent hugepage
+ * and if it is marked splitting we return 0;
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
+						 unsigned int hugepage)
 {
-	pte_t pte, tmp;
+	pte_t old_pte, new_pte = __pte(0);
 
-	/* wait until _PAGE_BUSY is clear then set it atomically */
-	__asm__ __volatile__ (
-		"1:	ldarx	%0,0,%3\n"
-		"	andi.	%1,%0,%4\n"
-		"	bne-	1b\n"
-		"	ori	%1,%0,%4\n"
-		"	stdcx.	%1,0,%3\n"
-		"	bne-	1b"
-		: "=&r" (pte), "=&r" (tmp), "=m" (*p)
-		: "r" (p), "i" (_PAGE_BUSY)
-		: "cc");
+	while (1) {
+		old_pte = pte_val(*ptep);
+		/*
+		 * wait until _PAGE_BUSY is clear then set it atomically
+		 */
+		if (unlikely(old_pte & _PAGE_BUSY)) {
+			cpu_relax();
+			continue;
+		}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		/* If hugepage and is trans splitting return None */
+		if (unlikely(hugepage &&
+			     pmd_trans_splitting(pte_pmd(old_pte))))
+			return __pte(0);
+#endif
+		/* If pte is not present return None */
+		if (unlikely(!(old_pte & _PAGE_PRESENT)))
+			return __pte(0);
 
-	if (pte_present(pte)) {
-		pte = pte_mkyoung(pte);
-		if (writing && pte_write(pte))
-			pte = pte_mkdirty(pte);
+		new_pte = pte_mkyoung(old_pte);
+		if (writing && pte_write(old_pte))
+			new_pte = pte_mkdirty(new_pte);
+
+		if (old_pte == __cmpxchg_u64((unsigned long *)ptep, old_pte,
+					     new_pte))
+			break;
 	}
-
-	*p = pte;	/* clears _PAGE_BUSY */
-
-	return pte;
+	return new_pte;
 }
 
 /* Return HPTE cache control bits corresponding to Linux pte bits */
 static inline unsigned long hpte_cache_bits(unsigned long pte_val)
+2 -1
arch/powerpc/include/asm/lppaca.h
···
 
 	u8	reserved6[48];
 	u8	cede_latency_hint;
-	u8	reserved7[7];
+	u8	ebb_regs_in_use;
+	u8	reserved7[6];
 	u8	dtl_enable_mask;	/* Dispatch Trace Log mask */
 	u8	donate_dedicated_cpu;	/* Donate dedicated CPU cycles */
 	u8	fpregs_in_use;
+7 -4
arch/powerpc/include/asm/machdep.h
···
 #ifdef CONFIG_PPC64
 	void		(*hpte_invalidate)(unsigned long slot,
 					   unsigned long vpn,
-					   int psize, int ssize,
-					   int local);
+					   int bpsize, int apsize,
+					   int ssize, int local);
 	long		(*hpte_updatepp)(unsigned long slot,
 					 unsigned long newpp,
 					 unsigned long vpn,
-					 int psize, int ssize,
-					 int local);
+					 int bpsize, int apsize,
+					 int ssize, int local);
 	void		(*hpte_updateboltedpp)(unsigned long newpp,
 					       unsigned long ea,
 					       int psize, int ssize);
···
 	void		(*hpte_removebolted)(unsigned long ea,
 					     int psize, int ssize);
 	void		(*flush_hash_range)(unsigned long number, int local);
+	void		(*hugepage_invalidate)(struct mm_struct *mm,
+					       unsigned char *hpte_slot_array,
+					       unsigned long addr, int psize);
 
 	/* special for kexec, to be called in real mode, linear mapping is
 	 * destroyed as well */
+14
arch/powerpc/include/asm/mmu-hash64.h
···
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
 		     unsigned int shift, unsigned int mmu_psize);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern int __hash_page_thp(unsigned long ea, unsigned long access,
+			   unsigned long vsid, pmd_t *pmdp, unsigned long trap,
+			   int local, int ssize, unsigned int psize);
+#else
+static inline int __hash_page_thp(unsigned long ea, unsigned long access,
+				  unsigned long vsid, pmd_t *pmdp,
+				  unsigned long trap, int local,
+				  int ssize, unsigned int psize)
+{
+	BUG();
+	return -1;
+}
+#endif
 extern void hash_failure_debug(unsigned long ea, unsigned long access,
 			       unsigned long vsid, unsigned long trap,
 			       int ssize, int psize, int lpsize,
-1
arch/powerpc/include/asm/mpc5121.h
···
 };
 
 int mpc512x_cs_config(unsigned int cs, u32 val);
-int __init mpc5121_clk_init(void);
 
 #endif /* __ASM_POWERPC_MPC5121_H__ */
+5
arch/powerpc/include/asm/mpic.h
···
 #endif
 };
 
+extern struct bus_type mpic_subsys;
+
 /*
  * MPIC flags (passed to mpic_alloc)
  *
···
 
 #define	MPIC_REGSET_STANDARD	MPIC_REGSET(0)	/* Original MPIC */
 #define	MPIC_REGSET_TSI108	MPIC_REGSET(1)	/* Tsi108/109 PIC */
+
+/* Get the version of primary MPIC */
+extern u32 fsl_mpic_primary_get_version(void);
 
 /* Allocate the controller structure and setup the linux irq descs
  * for the range if interrupts passed in. No HW initialization is
+46
arch/powerpc/include/asm/mpic_timer.h
···
+/*
+ * arch/powerpc/include/asm/mpic_timer.h
+ *
+ * Header file for Mpic Global Timer
+ *
+ * Copyright 2013 Freescale Semiconductor, Inc.
+ *
+ * Author: Wang Dongsheng <Dongsheng.Wang@freescale.com>
+ *	   Li Yang <leoli@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#ifndef __MPIC_TIMER__
+#define __MPIC_TIMER__
+
+#include <linux/interrupt.h>
+#include <linux/time.h>
+
+struct mpic_timer {
+	void			*dev;
+	struct cascade_priv	*cascade_handle;
+	unsigned int		num;
+	unsigned int		irq;
+};
+
+#ifdef CONFIG_MPIC_TIMER
+struct mpic_timer *mpic_request_timer(irq_handler_t fn, void *dev,
+				      const struct timeval *time);
+void mpic_start_timer(struct mpic_timer *handle);
+void mpic_stop_timer(struct mpic_timer *handle);
+void mpic_get_remain_time(struct mpic_timer *handle, struct timeval *time);
+void mpic_free_timer(struct mpic_timer *handle);
+#else
+/* Stubs must be static inline: plain definitions in a header would cause
+ * multiple-definition link errors when included from more than one file. */
+static inline struct mpic_timer *mpic_request_timer(irq_handler_t fn,
+		void *dev, const struct timeval *time) { return NULL; }
+static inline void mpic_start_timer(struct mpic_timer *handle) { }
+static inline void mpic_stop_timer(struct mpic_timer *handle) { }
+static inline void mpic_get_remain_time(struct mpic_timer *handle,
+		struct timeval *time) { }
+static inline void mpic_free_timer(struct mpic_timer *handle) { }
+#endif
+
+#endif
+119 -21
arch/powerpc/include/asm/opal.h
···
 #define OPAL_SET_SLOT_LED_STATUS		55
 #define OPAL_GET_EPOW_STATUS			56
 #define OPAL_SET_SYSTEM_ATTENTION_LED		57
+#define OPAL_RESERVED1				58
+#define OPAL_RESERVED2				59
+#define OPAL_PCI_NEXT_ERROR			60
+#define OPAL_PCI_EEH_FREEZE_STATUS2		61
+#define OPAL_PCI_POLL				62
 #define OPAL_PCI_MSI_EOI			63
+#define OPAL_PCI_GET_PHB_DIAG_DATA2		64
 
 #ifndef __ASSEMBLY__
 
···
 enum OpalVendorApiTokens {
 	OPAL_START_VENDOR_API_RANGE = 1000, OPAL_END_VENDOR_API_RANGE = 1999
 };
+
 enum OpalFreezeState {
 	OPAL_EEH_STOPPED_NOT_FROZEN = 0,
 	OPAL_EEH_STOPPED_MMIO_FREEZE = 1,
···
 	OPAL_EEH_STOPPED_TEMP_UNAVAIL = 5,
 	OPAL_EEH_STOPPED_PERM_UNAVAIL = 6
 };
+
 enum OpalEehFreezeActionToken {
 	OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO = 1,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_DMA = 2,
 	OPAL_EEH_ACTION_CLEAR_FREEZE_ALL = 3
 };
+
 enum OpalPciStatusToken {
-	OPAL_EEH_PHB_NO_ERROR = 0,
-	OPAL_EEH_PHB_FATAL = 1,
-	OPAL_EEH_PHB_RECOVERABLE = 2,
-	OPAL_EEH_PHB_BUS_ERROR = 3,
-	OPAL_EEH_PCI_NO_DEVSEL = 4,
-	OPAL_EEH_PCI_TA = 5,
-	OPAL_EEH_PCIEX_UR = 6,
-	OPAL_EEH_PCIEX_CA = 7,
-	OPAL_EEH_PCI_MMIO_ERROR = 8,
-	OPAL_EEH_PCI_DMA_ERROR = 9
+	OPAL_EEH_NO_ERROR	= 0,
+	OPAL_EEH_IOC_ERROR	= 1,
+	OPAL_EEH_PHB_ERROR	= 2,
+	OPAL_EEH_PE_ERROR	= 3,
+	OPAL_EEH_PE_MMIO_ERROR	= 4,
+	OPAL_EEH_PE_DMA_ERROR	= 5
 };
+
+enum OpalPciErrorSeverity {
+	OPAL_EEH_SEV_NO_ERROR	= 0,
+	OPAL_EEH_SEV_IOC_DEAD	= 1,
+	OPAL_EEH_SEV_PHB_DEAD	= 2,
+	OPAL_EEH_SEV_PHB_FENCED	= 3,
+	OPAL_EEH_SEV_PE_ER	= 4,
+	OPAL_EEH_SEV_INF	= 5
+};
+
 enum OpalShpcAction {
 	OPAL_SHPC_GET_LINK_STATE = 0,
 	OPAL_SHPC_GET_SLOT_STATE = 1
 };
+
 enum OpalShpcLinkState {
 	OPAL_SHPC_LINK_DOWN = 0,
 	OPAL_SHPC_LINK_UP = 1
 };
+
 enum OpalMmioWindowType {
 	OPAL_M32_WINDOW_TYPE = 1,
 	OPAL_M64_WINDOW_TYPE = 2,
 	OPAL_IO_WINDOW_TYPE = 3
 };
+
 enum OpalShpcSlotState {
 	OPAL_SHPC_DEV_NOT_PRESENT = 0,
 	OPAL_SHPC_DEV_PRESENT = 1
 };
+
 enum OpalExceptionHandler {
 	OPAL_MACHINE_CHECK_HANDLER = 1,
 	OPAL_HYPERVISOR_MAINTENANCE_HANDLER = 2,
 	OPAL_SOFTPATCH_HANDLER = 3
 };
+
 enum OpalPendingState {
-	OPAL_EVENT_OPAL_INTERNAL = 0x1,
-	OPAL_EVENT_NVRAM = 0x2,
-	OPAL_EVENT_RTC = 0x4,
-	OPAL_EVENT_CONSOLE_OUTPUT = 0x8,
-	OPAL_EVENT_CONSOLE_INPUT = 0x10,
-	OPAL_EVENT_ERROR_LOG_AVAIL = 0x20,
-	OPAL_EVENT_ERROR_LOG = 0x40,
-	OPAL_EVENT_EPOW = 0x80,
-	OPAL_EVENT_LED_STATUS = 0x100
+	OPAL_EVENT_OPAL_INTERNAL	= 0x1,
+	OPAL_EVENT_NVRAM		= 0x2,
+	OPAL_EVENT_RTC			= 0x4,
+	OPAL_EVENT_CONSOLE_OUTPUT	= 0x8,
+	OPAL_EVENT_CONSOLE_INPUT	= 0x10,
+	OPAL_EVENT_ERROR_LOG_AVAIL	= 0x20,
+	OPAL_EVENT_ERROR_LOG		= 0x40,
+	OPAL_EVENT_EPOW			= 0x80,
+	OPAL_EVENT_LED_STATUS		= 0x100,
+	OPAL_EVENT_PCI_ERROR		= 0x200
 };
 
 /* Machine check related definitions */
···
 	} u;
 };
 
+enum {
+	OPAL_P7IOC_DIAG_TYPE_NONE	= 0,
+	OPAL_P7IOC_DIAG_TYPE_RGC	= 1,
+	OPAL_P7IOC_DIAG_TYPE_BI		= 2,
+	OPAL_P7IOC_DIAG_TYPE_CI		= 3,
+	OPAL_P7IOC_DIAG_TYPE_MISC	= 4,
+	OPAL_P7IOC_DIAG_TYPE_I2C	= 5,
+	OPAL_P7IOC_DIAG_TYPE_LAST	= 6
+};
+
+struct OpalIoP7IOCErrorData {
+	uint16_t type;
+
+	/* GEM */
+	uint64_t gemXfir;
+	uint64_t gemRfir;
+	uint64_t gemRirqfir;
+	uint64_t gemMask;
+	uint64_t gemRwof;
+
+	/* LEM */
+	uint64_t lemFir;
+	uint64_t lemErrMask;
+	uint64_t lemAction0;
+	uint64_t lemAction1;
+	uint64_t lemWof;
+
+	union {
+		struct OpalIoP7IOCRgcErrorData {
+			uint64_t rgcStatus;	/* 3E1C10 */
+			uint64_t rgcLdcp;	/* 3E1C18 */
+		}rgc;
+		struct OpalIoP7IOCBiErrorData {
+			uint64_t biLdcp0;	/* 3C0100, 3C0118 */
+			uint64_t biLdcp1;	/* 3C0108, 3C0120 */
+			uint64_t biLdcp2;	/* 3C0110, 3C0128 */
+			uint64_t biFenceStatus;	/* 3C0130, 3C0130 */
+
+			uint8_t  biDownbound;	/* BI Downbound or Upbound */
+		}bi;
+		struct OpalIoP7IOCCiErrorData {
+			uint64_t ciPortStatus;	/* 3Dn008 */
+			uint64_t ciPortLdcp;	/* 3Dn010 */
+
+			uint8_t  ciPort;	/* Index of CI port: 0/1 */
+		}ci;
+	};
+};
+
 /**
  * This structure defines the overlay which will be used to store PHB error
  * data upon request.
  */
 enum {
+	OPAL_PHB_ERROR_DATA_VERSION_1 = 1,
+};
+
+enum {
+	OPAL_PHB_ERROR_DATA_TYPE_P7IOC = 1,
+};
+
+enum {
 	OPAL_P7IOC_NUM_PEST_REGS = 128,
 };
 
+struct OpalIoPhbErrorCommon {
+	uint32_t version;
+	uint32_t ioType;
+	uint32_t len;
+};
+
 struct OpalIoP7IOCPhbErrorData {
+	struct OpalIoPhbErrorCommon common;
+
 	uint32_t brdgCtl;
 
 	// P7IOC utl regs
···
 			    uint64_t pci_mem_size);
 int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
 
-int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer, uint64_t diag_buffer_len);
-int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer, uint64_t diag_buffer_len);
+int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data(uint64_t phb_id, void *diag_buffer,
+				   uint64_t diag_buffer_len);
+int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id, void *diag_buffer,
+				    uint64_t diag_buffer_len);
 int64_t opal_pci_fence_phb(uint64_t phb_id);
 int64_t opal_pci_reinit(uint64_t phb_id, uint8_t reinit_scope);
 int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action);
 int64_t opal_set_slot_led_status(uint64_t phb_id, uint64_t slot_id, uint8_t led_type, uint8_t led_action);
 int64_t opal_get_epow_status(uint64_t *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
+int64_t opal_pci_next_error(uint64_t phb_id, uint64_t *first_frozen_pe,
+			    uint16_t *pci_error_type, uint16_t *severity);
+int64_t opal_pci_poll(uint64_t phb_id);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname, int depth, void *data);
···
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
+
+extern int opal_notifier_register(struct notifier_block *nb);
+extern void opal_notifier_enable(void);
+extern void opal_notifier_disable(void);
+extern void opal_notifier_update_evt(uint64_t evt_mask, uint64_t evt_val);
 
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
+6
arch/powerpc/include/asm/perf_event_server.h
···
 #define PPMU_HAS_SSLOT		0x00000020 /* Has sampled slot in MMCRA */
 #define PPMU_HAS_SIER		0x00000040 /* Has SIER */
 #define PPMU_BHRB		0x00000080 /* has BHRB feature enabled */
+#define PPMU_EBB		0x00000100 /* supports event based branch */
 
 /*
  * Values for flags to get_alternatives()
···
 #define PPMU_LIMITED_PMC_OK	1	/* can put this on a limited PMC */
 #define PPMU_LIMITED_PMC_REQD	2	/* have to put this on a limited PMC */
 #define PPMU_ONLY_COUNT_RUN	4	/* only counting in run state */
+
+/*
+ * We use the event config bit 63 as a flag to request EBB.
+ */
+#define EVENT_CONFIG_EBB_SHIFT	63
 
 extern int register_power_pmu(struct power_pmu *);
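The header reserves bit 63 of the perf event config word as the EBB request flag. A tiny sketch of how a tool might set and test that flag (the helper names are illustrative; only the shift value comes from the header):

```python
# Bit 63 of the 64-bit event config requests Event Based Branches (EBB).
EVENT_CONFIG_EBB_SHIFT = 63

def request_ebb(config):
    # Set the EBB flag on an otherwise normal event config word.
    return config | (1 << EVENT_CONFIG_EBB_SHIFT)

def wants_ebb(config):
    # Test whether an event config word has the EBB flag set.
    return (config >> EVENT_CONFIG_EBB_SHIFT) & 1
```

Using the top bit keeps the flag out of the way of the raw event code, which occupies the low bits of the config word.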
+3 -3
arch/powerpc/include/asm/pgalloc-64.h
···
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return kmem_cache_alloc(PGT_CACHE(PMD_INDEX_SIZE),
+	return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
 				GFP_KERNEL|__GFP_REPEAT);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
-	kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
+	kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
 #define __pmd_free_tlb(tlb, pmd, addr)		      \
-	pgtable_free_tlb(tlb, pmd, PMD_INDEX_SIZE)
+	pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
 #ifndef CONFIG_PPC_64K_PAGES
 #define __pud_free_tlb(tlb, pud, addr)		      \
 	pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
+2 -1
arch/powerpc/include/asm/pgtable-ppc64-64k.h
···
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
 /* Bits to mask out from a PMD to get to the PTE page */
-#define PMD_MASKED_BITS		0x1ff
+/* PMDs point to PTE table fragments which are 4K aligned. */
+#define PMD_MASKED_BITS		0xfff
 /* Bits to mask out from a PGD/PUD to get to the PMD page */
 #define PUD_MASKED_BITS		0x1ff
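The mask widening from 0x1ff to 0xfff reflects the new comment: PTE tables are now 4K-aligned fragments, so the low 12 bits of a PMD value carry non-address bits and must be stripped to recover the fragment address. A quick arithmetic sketch (the sample PMD value is made up for illustration):

```python
# With 4K-aligned PTE table fragments, the low 12 bits of a PMD are
# metadata, not address bits.
PMD_MASKED_BITS = 0xfff

def pmd_page_vaddr(pmd_val):
    # Mirror of the header's pmd_page_vaddr(): clear the masked-out bits
    # to get the 4K-aligned virtual address of the PTE fragment.
    return pmd_val & ~PMD_MASKED_BITS
```

With the old 0x1ff mask, bits 9-11 of the metadata would have leaked into the recovered address; 0xfff clears the full 4K-alignment range.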
+218 -39
arch/powerpc/include/asm/pgtable-ppc64.h
···
 #else
 #include <asm/pgtable-ppc64-4k.h>
 #endif
+#include <asm/barrier.h>
 
 #define FIRST_USER_ADDRESS	0
 
···
 			    PUD_INDEX_SIZE + PGD_INDEX_SIZE + PAGE_SHIFT)
 #define PGTABLE_RANGE (ASM_CONST(1) << PGTABLE_EADDR_SIZE)
 
-
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PMD_CACHE_INDEX	(PMD_INDEX_SIZE + 1)
+#else
+#define PMD_CACHE_INDEX	PMD_INDEX_SIZE
+#endif
 /*
  * Define the address range of the kernel non-linear virtual area
  */
···
 #define	pmd_present(pmd)	(pmd_val(pmd) != 0)
 #define	pmd_clear(pmdp)		(pmd_val(*(pmdp)) = 0)
 #define pmd_page_vaddr(pmd)	(pmd_val(pmd) & ~PMD_MASKED_BITS)
-#define pmd_page(pmd)		virt_to_page(pmd_page_vaddr(pmd))
+extern struct page *pmd_page(pmd_t pmd);
 
 #define pud_set(pudp, pudval)	(pud_val(*(pudp)) = (pudval))
 #define pud_none(pud)		(!pud_val(pud))
···
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
-
-/*
- * find_linux_pte returns the address of a linux pte for a given
- * effective address and directory.  If not found, it returns zero.
- */
-static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-	pte_t *pt = NULL;
-
-	pg = pgdir + pgd_index(ea);
-	if (!pgd_none(*pg)) {
-		pu = pud_offset(pg, ea);
-		if (!pud_none(*pu)) {
-			pm = pmd_offset(pu, ea);
-			if (pmd_present(*pm))
-				pt = pte_offset_kernel(pm, ea);
-		}
-	}
-	return pt;
-}
-
-#ifdef CONFIG_HUGETLB_PAGE
-pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
-				 unsigned *shift);
-#else
-static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
-					       unsigned *shift)
-{
-	if (shift)
-		*shift = 0;
-	return find_linux_pte(pgdir, ea);
-}
-#endif /* !CONFIG_HUGETLB_PAGE */
-
 #endif /* __ASSEMBLY__ */
 
+/*
+ * THP pages can't be special. So use the _PAGE_SPECIAL
+ */
+#define _PAGE_SPLITTING _PAGE_SPECIAL
+
+/*
+ * We need to differentiate between explicit huge page and THP huge
+ * page, since THP huge page also need to track real subpage details
+ */
+#define _PAGE_THP_HUGE  _PAGE_4K_PFN
+
+/*
+ * set of bits not changed in pmd_modify.
+ */
+#define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | \
+			 _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_SPLITTING | \
+			 _PAGE_THP_HUGE)
+
+#ifndef __ASSEMBLY__
+/*
+ * The linux hugepage PMD now include the pmd entries followed by the address
+ * to the stashed pgtable_t. The stashed pgtable_t contains the hpte bits.
+ * [ 1 bit secondary | 3 bit hidx | 1 bit valid | 000]. We use one byte per
+ * each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and
+ * with 4K HPTE we need 4096 entries. Both will fit in a 4K pgtable_t.
+ *
+ * The last three bits are intentionally left to zero. This memory location
+ * are also used as normal page PTE pointers. So if we have any pointers
+ * left around while we collapse a hugepage, we need to make sure
+ * _PAGE_PRESENT and _PAGE_FILE bits of that are zero when we look at them
+ */
+static inline unsigned int hpte_valid(unsigned char *hpte_slot_array, int index)
+{
+	return (hpte_slot_array[index] >> 3) & 0x1;
+}
+
+static inline unsigned int hpte_hash_index(unsigned char *hpte_slot_array,
+					   int index)
+{
+	return hpte_slot_array[index] >> 4;
+}
+
+static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
+					unsigned int index, unsigned int hidx)
+{
+	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
+}
+
+static inline char *get_hpte_slot_array(pmd_t *pmdp)
+{
+	/*
+	 * The hpte hindex is stored in the pgtable whose address is in the
+	 * second half of the PMD
+	 *
+	 * Order this load with the test for pmd_trans_huge in the caller
+	 */
+	smp_rmb();
+	return *(char **)(pmdp + PTRS_PER_PMD);
+}
+
+extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
+				   pmd_t *pmdp);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
+extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
+extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
+		       pmd_t *pmdp, pmd_t pmd);
+extern void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
+				 pmd_t *pmd);
+
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+	/*
+	 * leaf pte for huge page, bottom two bits != 00
+	 */
+	return (pmd_val(pmd) & 0x3) && (pmd_val(pmd) & _PAGE_THP_HUGE);
+}
+
+static inline int pmd_large(pmd_t pmd)
+{
+	/*
+	 * leaf pte for huge page, bottom two bits != 00
+	 */
+	if (pmd_trans_huge(pmd))
+		return pmd_val(pmd) & _PAGE_PRESENT;
+	return 0;
+}
+
+static inline int pmd_trans_splitting(pmd_t pmd)
+{
+	if (pmd_trans_huge(pmd))
+		return pmd_val(pmd) & _PAGE_SPLITTING;
+	return 0;
+}
+
+extern int has_transparent_hugepage(void);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static inline pte_t pmd_pte(pmd_t pmd)
+{
+	return __pte(pmd_val(pmd));
+}
+
+static inline pmd_t pte_pmd(pte_t pte)
+{
+	return __pmd(pte_val(pte));
+}
+
+static inline pte_t *pmdp_ptep(pmd_t *pmd)
+{
+	return (pte_t *)pmd;
+}
+
+#define pmd_pfn(pmd)		pte_pfn(pmd_pte(pmd))
+#define pmd_young(pmd)		pte_young(pmd_pte(pmd))
+#define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
+#define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
+#define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
+#define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
+#define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+
+#define __HAVE_ARCH_PMD_WRITE
+#define pmd_write(pmd)		pte_write(pmd_pte(pmd))
+
+static inline pmd_t pmd_mkhuge(pmd_t pmd)
+{
+	/* Do nothing, mk_pmd() does this part.
*/ 474 + return pmd; 475 + } 476 + 477 + static inline pmd_t pmd_mknotpresent(pmd_t pmd) 478 + { 479 + pmd_val(pmd) &= ~_PAGE_PRESENT; 480 + return pmd; 481 + } 482 + 483 + static inline pmd_t pmd_mksplitting(pmd_t pmd) 484 + { 485 + pmd_val(pmd) |= _PAGE_SPLITTING; 486 + return pmd; 487 + } 488 + 489 + #define __HAVE_ARCH_PMD_SAME 490 + static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) 491 + { 492 + return (((pmd_val(pmd_a) ^ pmd_val(pmd_b)) & ~_PAGE_HPTEFLAGS) == 0); 493 + } 494 + 495 + #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS 496 + extern int pmdp_set_access_flags(struct vm_area_struct *vma, 497 + unsigned long address, pmd_t *pmdp, 498 + pmd_t entry, int dirty); 499 + 500 + extern unsigned long pmd_hugepage_update(struct mm_struct *mm, 501 + unsigned long addr, 502 + pmd_t *pmdp, unsigned long clr); 503 + 504 + static inline int __pmdp_test_and_clear_young(struct mm_struct *mm, 505 + unsigned long addr, pmd_t *pmdp) 506 + { 507 + unsigned long old; 508 + 509 + if ((pmd_val(*pmdp) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0) 510 + return 0; 511 + old = pmd_hugepage_update(mm, addr, pmdp, _PAGE_ACCESSED); 512 + return ((old & _PAGE_ACCESSED) != 0); 513 + } 514 + 515 + #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG 516 + extern int pmdp_test_and_clear_young(struct vm_area_struct *vma, 517 + unsigned long address, pmd_t *pmdp); 518 + #define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH 519 + extern int pmdp_clear_flush_young(struct vm_area_struct *vma, 520 + unsigned long address, pmd_t *pmdp); 521 + 522 + #define __HAVE_ARCH_PMDP_GET_AND_CLEAR 523 + extern pmd_t pmdp_get_and_clear(struct mm_struct *mm, 524 + unsigned long addr, pmd_t *pmdp); 525 + 526 + #define __HAVE_ARCH_PMDP_CLEAR_FLUSH 527 + extern pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address, 528 + pmd_t *pmdp); 529 + 530 + #define __HAVE_ARCH_PMDP_SET_WRPROTECT 531 + static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, 532 + pmd_t *pmdp) 533 + { 534 + 535 + if 
((pmd_val(*pmdp) & _PAGE_RW) == 0) 536 + return; 537 + 538 + pmd_hugepage_update(mm, addr, pmdp, _PAGE_RW); 539 + } 540 + 541 + #define __HAVE_ARCH_PMDP_SPLITTING_FLUSH 542 + extern void pmdp_splitting_flush(struct vm_area_struct *vma, 543 + unsigned long address, pmd_t *pmdp); 544 + 545 + #define __HAVE_ARCH_PGTABLE_DEPOSIT 546 + extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 547 + pgtable_t pgtable); 548 + #define __HAVE_ARCH_PGTABLE_WITHDRAW 549 + extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); 550 + 551 + #define __HAVE_ARCH_PMDP_INVALIDATE 552 + extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, 553 + pmd_t *pmdp); 554 + #endif /* __ASSEMBLY__ */ 386 555 #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
+6
arch/powerpc/include/asm/pgtable.h
··· 217 217 218 218 extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, 219 219 unsigned long end, int write, struct page **pages, int *nr); 220 + #ifndef CONFIG_TRANSPARENT_HUGEPAGE 221 + #define pmd_large(pmd) 0 222 + #define has_transparent_hugepage() 0 223 + #endif 224 + pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, 225 + unsigned *shift); 220 226 #endif /* __ASSEMBLY__ */ 221 227 222 228 #endif /* __KERNEL__ */
+25
arch/powerpc/include/asm/probes.h
··· 38 38 #define is_trap(instr) (IS_TW(instr) || IS_TWI(instr)) 39 39 #endif /* CONFIG_PPC64 */ 40 40 41 + #ifdef CONFIG_PPC_ADV_DEBUG_REGS 42 + #define MSR_SINGLESTEP (MSR_DE) 43 + #else 44 + #define MSR_SINGLESTEP (MSR_SE) 45 + #endif 46 + 47 + /* Enable single stepping for the current task */ 48 + static inline void enable_single_step(struct pt_regs *regs) 49 + { 50 + regs->msr |= MSR_SINGLESTEP; 51 + #ifdef CONFIG_PPC_ADV_DEBUG_REGS 52 + /* 53 + * We turn off Critical Input Exception(CE) to ensure that the single 54 + * step will be for the instruction we have the probe on; if we don't, 55 + * it is possible we'd get the single step reported for CE. 56 + */ 57 + regs->msr &= ~MSR_CE; 58 + mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM); 59 + #ifdef CONFIG_PPC_47x 60 + isync(); 61 + #endif 62 + #endif 63 + } 64 + 65 + 41 66 #endif /* __KERNEL__ */ 42 67 #endif /* _ASM_POWERPC_PROBES_H */
+7 -9
arch/powerpc/include/asm/processor.h
··· 168 168 * The following help to manage the use of Debug Control Registers 169 169 * om the BookE platforms. 170 170 */ 171 - unsigned long dbcr0; 172 - unsigned long dbcr1; 171 + uint32_t dbcr0; 172 + uint32_t dbcr1; 173 173 #ifdef CONFIG_BOOKE 174 - unsigned long dbcr2; 174 + uint32_t dbcr2; 175 175 #endif 176 176 /* 177 177 * The stored value of the DBSR register will be the value at the ··· 179 179 * user (will never be written to) and has value while helping to 180 180 * describe the reason for the last debug trap. Torez 181 181 */ 182 - unsigned long dbsr; 182 + uint32_t dbsr; 183 183 /* 184 184 * The following will contain addresses used by debug applications 185 185 * to help trace and trap on particular address locations. ··· 200 200 #endif 201 201 #endif 202 202 /* FP and VSX 0-31 register set */ 203 - double fpr[32][TS_FPRWIDTH]; 203 + double fpr[32][TS_FPRWIDTH] __attribute__((aligned(16))); 204 204 struct { 205 205 206 206 unsigned int pad; ··· 287 287 unsigned long siar; 288 288 unsigned long sdar; 289 289 unsigned long sier; 290 - unsigned long mmcr0; 291 290 unsigned long mmcr2; 292 - unsigned long mmcra; 291 + unsigned mmcr0; 292 + unsigned used_ebb; 293 293 #endif 294 294 }; 295 295 ··· 404 404 405 405 #define spin_lock_prefetch(x) prefetchw(x) 406 406 407 - #ifdef CONFIG_PPC64 408 407 #define HAVE_ARCH_PICK_MMAP_LAYOUT 409 - #endif 410 408 411 409 #ifdef CONFIG_PPC64 412 410 static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
+9
arch/powerpc/include/asm/reg.h
··· 621 621 #define MMCR0_PMXE 0x04000000UL /* performance monitor exception enable */ 622 622 #define MMCR0_FCECE 0x02000000UL /* freeze ctrs on enabled cond or event */ 623 623 #define MMCR0_TBEE 0x00400000UL /* time base exception enable */ 624 + #define MMCR0_EBE 0x00100000UL /* Event based branch enable */ 625 + #define MMCR0_PMCC 0x000c0000UL /* PMC control */ 626 + #define MMCR0_PMCC_U6 0x00080000UL /* PMC1-6 are R/W by user (PR) */ 624 627 #define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/ 625 628 #define MMCR0_PMCjCE 0x00004000UL /* PMCj count enable*/ 626 629 #define MMCR0_TRIGGER 0x00002000UL /* TRIGGER enable */ 627 630 #define MMCR0_PMAO 0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */ 628 631 #define MMCR0_SHRFC 0x00000040UL /* SHRre freeze conditions between threads */ 632 + #define MMCR0_FC56 0x00000010UL /* freeze counters 5 and 6 */ 629 633 #define MMCR0_FCTI 0x00000008UL /* freeze counters in tags inactive mode */ 630 634 #define MMCR0_FCTA 0x00000004UL /* freeze counters in tags active mode */ 631 635 #define MMCR0_FCWAIT 0x00000002UL /* freeze counter in WAIT state */ ··· 676 672 #define SIER_SIHV 0x1000000 /* Sampled MSR_HV */ 677 673 #define SIER_SIAR_VALID 0x0400000 /* SIAR contents valid */ 678 674 #define SIER_SDAR_VALID 0x0200000 /* SDAR contents valid */ 675 + 676 + /* When EBB is enabled, some of MMCR0/MMCR2/SIER are user accessible */ 677 + #define MMCR0_USER_MASK (MMCR0_FC | MMCR0_PMXE | MMCR0_PMAO) 678 + #define MMCR2_USER_MASK 0x4020100804020000UL /* (FC1P|FC2P|FC3P|FC4P|FC5P|FC6P) */ 679 + #define SIER_USER_MASK 0x7fffffUL 679 680 680 681 #define SPRN_PA6T_MMCR0 795 681 682 #define PA6T_MMCR0_EN0 0x0000000000000001UL
+2 -2
arch/powerpc/include/asm/rtas.h
··· 350 350 (devfn << 8) | (reg & 0xff); 351 351 } 352 352 353 - extern void __cpuinit rtas_give_timebase(void); 354 - extern void __cpuinit rtas_take_timebase(void); 353 + extern void rtas_give_timebase(void); 354 + extern void rtas_take_timebase(void); 355 355 356 356 #ifdef CONFIG_PPC_RTAS 357 357 static inline int page_is_rtas_user_buf(unsigned long pfn)
+14
arch/powerpc/include/asm/switch_to.h
··· 67 67 } 68 68 #endif 69 69 70 + static inline void clear_task_ebb(struct task_struct *t) 71 + { 72 + #ifdef CONFIG_PPC_BOOK3S_64 73 + /* EBB perf events are not inherited, so clear all EBB state. */ 74 + t->thread.bescr = 0; 75 + t->thread.mmcr2 = 0; 76 + t->thread.mmcr0 = 0; 77 + t->thread.siar = 0; 78 + t->thread.sdar = 0; 79 + t->thread.sier = 0; 80 + t->thread.used_ebb = 0; 81 + #endif 82 + } 83 + 70 84 #endif /* _ASM_POWERPC_SWITCH_TO_H */
+2 -1
arch/powerpc/include/asm/tlbflush.h
··· 165 165 /* Private function for use by PCI IO mapping code */ 166 166 extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start, 167 167 unsigned long end); 168 - 168 + extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd, 169 + unsigned long addr); 169 170 #else 170 171 #error Unsupported MMU type 171 172 #endif
+1 -1
arch/powerpc/include/asm/vdso.h
··· 22 22 extern unsigned long vdso32_sigtramp; 23 23 extern unsigned long vdso32_rt_sigtramp; 24 24 25 - int __cpuinit vdso_getcpu_init(void); 25 + int vdso_getcpu_init(void); 26 26 27 27 #else /* __ASSEMBLY__ */ 28 28
+3 -1
arch/powerpc/kernel/Makefile
··· 58 58 obj-$(CONFIG_LPARCFG) += lparcfg.o 59 59 obj-$(CONFIG_IBMVIO) += vio.o 60 60 obj-$(CONFIG_IBMEBUS) += ibmebus.o 61 + obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \ 62 + eeh_driver.o eeh_event.o eeh_sysfs.o 61 63 obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o 62 64 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o 63 65 obj-$(CONFIG_FA_DUMP) += fadump.o ··· 102 100 obj-$(CONFIG_STACKTRACE) += stacktrace.o 103 101 obj-$(CONFIG_SWIOTLB) += dma-swiotlb.o 104 102 105 - pci64-$(CONFIG_PPC64) += pci_dn.o isa-bridge.o 103 + pci64-$(CONFIG_PPC64) += pci_dn.o pci-hotplug.o isa-bridge.o 106 104 obj-$(CONFIG_PCI) += pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \ 107 105 pci-common.o pci_of_scan.o 108 106 obj-$(CONFIG_PCI_MSI) += msi.o
+3 -4
arch/powerpc/kernel/asm-offsets.c
··· 105 105 DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid)); 106 106 #else /* CONFIG_PPC64 */ 107 107 DEFINE(PGDIR, offsetof(struct thread_struct, pgdir)); 108 - #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) 109 - DEFINE(THREAD_DBCR0, offsetof(struct thread_struct, dbcr0)); 110 - #endif 111 108 #ifdef CONFIG_SPE 112 109 DEFINE(THREAD_EVR0, offsetof(struct thread_struct, evr[0])); 113 110 DEFINE(THREAD_ACC, offsetof(struct thread_struct, acc)); ··· 112 115 DEFINE(THREAD_USED_SPE, offsetof(struct thread_struct, used_spe)); 113 116 #endif /* CONFIG_SPE */ 114 117 #endif /* CONFIG_PPC64 */ 118 + #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) 119 + DEFINE(THREAD_DBCR0, offsetof(struct thread_struct, dbcr0)); 120 + #endif 115 121 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER 116 122 DEFINE(THREAD_KVM_SVCPU, offsetof(struct thread_struct, kvm_shadow_vcpu)); 117 123 #endif ··· 132 132 DEFINE(THREAD_SIER, offsetof(struct thread_struct, sier)); 133 133 DEFINE(THREAD_MMCR0, offsetof(struct thread_struct, mmcr0)); 134 134 DEFINE(THREAD_MMCR2, offsetof(struct thread_struct, mmcr2)); 135 - DEFINE(THREAD_MMCRA, offsetof(struct thread_struct, mmcra)); 136 135 #endif 137 136 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 138 137 DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch));
+21 -15
arch/powerpc/kernel/cacheinfo.c
··· 131 131 return cache_type_info[cache->type].name; 132 132 } 133 133 134 - static void __cpuinit cache_init(struct cache *cache, int type, int level, struct device_node *ofnode) 134 + static void cache_init(struct cache *cache, int type, int level, 135 + struct device_node *ofnode) 135 136 { 136 137 cache->type = type; 137 138 cache->level = level; ··· 141 140 list_add(&cache->list, &cache_list); 142 141 } 143 142 144 - static struct cache *__cpuinit new_cache(int type, int level, struct device_node *ofnode) 143 + static struct cache *new_cache(int type, int level, struct device_node *ofnode) 145 144 { 146 145 struct cache *cache; 147 146 ··· 325 324 return of_get_property(np, "cache-unified", NULL); 326 325 } 327 326 328 - static struct cache *__cpuinit cache_do_one_devnode_unified(struct device_node *node, int level) 327 + static struct cache *cache_do_one_devnode_unified(struct device_node *node, 328 + int level) 329 329 { 330 330 struct cache *cache; 331 331 ··· 337 335 return cache; 338 336 } 339 337 340 - static struct cache *__cpuinit cache_do_one_devnode_split(struct device_node *node, int level) 338 + static struct cache *cache_do_one_devnode_split(struct device_node *node, 339 + int level) 341 340 { 342 341 struct cache *dcache, *icache; 343 342 ··· 360 357 return NULL; 361 358 } 362 359 363 - static struct cache *__cpuinit cache_do_one_devnode(struct device_node *node, int level) 360 + static struct cache *cache_do_one_devnode(struct device_node *node, int level) 364 361 { 365 362 struct cache *cache; 366 363 ··· 372 369 return cache; 373 370 } 374 371 375 - static struct cache *__cpuinit cache_lookup_or_instantiate(struct device_node *node, int level) 372 + static struct cache *cache_lookup_or_instantiate(struct device_node *node, 373 + int level) 376 374 { 377 375 struct cache *cache; 378 376 ··· 389 385 return cache; 390 386 } 391 387 392 - static void __cpuinit link_cache_lists(struct cache *smaller, struct cache *bigger) 388 + static void 
link_cache_lists(struct cache *smaller, struct cache *bigger) 393 389 { 394 390 while (smaller->next_local) { 395 391 if (smaller->next_local == bigger) ··· 400 396 smaller->next_local = bigger; 401 397 } 402 398 403 - static void __cpuinit do_subsidiary_caches_debugcheck(struct cache *cache) 399 + static void do_subsidiary_caches_debugcheck(struct cache *cache) 404 400 { 405 401 WARN_ON_ONCE(cache->level != 1); 406 402 WARN_ON_ONCE(strcmp(cache->ofnode->type, "cpu")); 407 403 } 408 404 409 - static void __cpuinit do_subsidiary_caches(struct cache *cache) 405 + static void do_subsidiary_caches(struct cache *cache) 410 406 { 411 407 struct device_node *subcache_node; 412 408 int level = cache->level; ··· 427 423 } 428 424 } 429 425 430 - static struct cache *__cpuinit cache_chain_instantiate(unsigned int cpu_id) 426 + static struct cache *cache_chain_instantiate(unsigned int cpu_id) 431 427 { 432 428 struct device_node *cpu_node; 433 429 struct cache *cpu_cache = NULL; ··· 452 448 return cpu_cache; 453 449 } 454 450 455 - static struct cache_dir *__cpuinit cacheinfo_create_cache_dir(unsigned int cpu_id) 451 + static struct cache_dir *cacheinfo_create_cache_dir(unsigned int cpu_id) 456 452 { 457 453 struct cache_dir *cache_dir; 458 454 struct device *dev; ··· 657 653 .default_attrs = cache_index_default_attrs, 658 654 }; 659 655 660 - static void __cpuinit cacheinfo_create_index_opt_attrs(struct cache_index_dir *dir) 656 + static void cacheinfo_create_index_opt_attrs(struct cache_index_dir *dir) 661 657 { 662 658 const char *cache_name; 663 659 const char *cache_type; ··· 700 696 kfree(buf); 701 697 } 702 698 703 - static void __cpuinit cacheinfo_create_index_dir(struct cache *cache, int index, struct cache_dir *cache_dir) 699 + static void cacheinfo_create_index_dir(struct cache *cache, int index, 700 + struct cache_dir *cache_dir) 704 701 { 705 702 struct cache_index_dir *index_dir; 706 703 int rc; ··· 727 722 kfree(index_dir); 728 723 } 729 724 730 - static void 
__cpuinit cacheinfo_sysfs_populate(unsigned int cpu_id, struct cache *cache_list) 725 + static void cacheinfo_sysfs_populate(unsigned int cpu_id, 726 + struct cache *cache_list) 731 727 { 732 728 struct cache_dir *cache_dir; 733 729 struct cache *cache; ··· 746 740 } 747 741 } 748 742 749 - void __cpuinit cacheinfo_cpu_online(unsigned int cpu_id) 743 + void cacheinfo_cpu_online(unsigned int cpu_id) 750 744 { 751 745 struct cache *cache; 752 746
+26 -4
arch/powerpc/kernel/entry_64.S
··· 629 629 630 630 CURRENT_THREAD_INFO(r9, r1) 631 631 ld r3,_MSR(r1) 632 + #ifdef CONFIG_PPC_BOOK3E 633 + ld r10,PACACURRENT(r13) 634 + #endif /* CONFIG_PPC_BOOK3E */ 632 635 ld r4,TI_FLAGS(r9) 633 636 andi. r3,r3,MSR_PR 634 637 beq resume_kernel 638 + #ifdef CONFIG_PPC_BOOK3E 639 + lwz r3,(THREAD+THREAD_DBCR0)(r10) 640 + #endif /* CONFIG_PPC_BOOK3E */ 635 641 636 642 /* Check current_thread_info()->flags */ 637 643 andi. r0,r4,_TIF_USER_WORK_MASK 644 + #ifdef CONFIG_PPC_BOOK3E 645 + bne 1f 646 + /* 647 + * Check to see if the dbcr0 register is set up to debug. 648 + * Use the internal debug mode bit to do this. 649 + */ 650 + andis. r0,r3,DBCR0_IDM@h 638 651 beq restore 639 - 640 - andi. r0,r4,_TIF_NEED_RESCHED 641 - beq 1f 652 + mfmsr r0 653 + rlwinm r0,r0,0,~MSR_DE /* Clear MSR.DE */ 654 + mtmsr r0 655 + mtspr SPRN_DBCR0,r3 656 + li r10, -1 657 + mtspr SPRN_DBSR,r10 658 + b restore 659 + #else 660 + beq restore 661 + #endif 662 + 1: andi. r0,r4,_TIF_NEED_RESCHED 663 + beq 2f 642 664 bl .restore_interrupts 643 665 SCHEDULE_USER 644 666 b .ret_from_except_lite 645 667 646 - 1: bl .save_nvgprs 668 + 2: bl .save_nvgprs 647 669 bl .restore_interrupts 648 670 addi r3,r1,STACK_FRAME_OVERHEAD 649 671 bl .do_notify_resume
+25 -31
arch/powerpc/kernel/exceptions-64s.S
··· 341 341 EXCEPTION_PROLOG_0(PACA_EXGEN) 342 342 b vsx_unavailable_pSeries 343 343 344 + facility_unavailable_trampoline: 344 345 . = 0xf60 345 346 SET_SCRATCH0(r13) 346 347 EXCEPTION_PROLOG_0(PACA_EXGEN) 347 - b tm_unavailable_pSeries 348 + b facility_unavailable_pSeries 349 + 350 + hv_facility_unavailable_trampoline: 351 + . = 0xf80 352 + SET_SCRATCH0(r13) 353 + EXCEPTION_PROLOG_0(PACA_EXGEN) 354 + b facility_unavailable_hv 348 355 349 356 #ifdef CONFIG_CBE_RAS 350 357 STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error) ··· 529 522 KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf20) 530 523 STD_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable) 531 524 KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40) 532 - STD_EXCEPTION_PSERIES_OOL(0xf60, tm_unavailable) 525 + STD_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable) 533 526 KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf60) 527 + STD_EXCEPTION_HV_OOL(0xf82, facility_unavailable) 528 + KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xf82) 534 529 535 530 /* 536 531 * An interrupt came in while soft-disabled. We set paca->irq_happened, then: ··· 802 793 STD_RELON_EXCEPTION_PSERIES(0x4d00, 0xd00, single_step) 803 794 804 795 . = 0x4e00 805 - SET_SCRATCH0(r13) 806 - EXCEPTION_PROLOG_0(PACA_EXGEN) 807 - b h_data_storage_relon_hv 796 + b . /* Can't happen, see v2.07 Book III-S section 6.5 */ 808 797 809 798 . = 0x4e20 810 - SET_SCRATCH0(r13) 811 - EXCEPTION_PROLOG_0(PACA_EXGEN) 812 - b h_instr_storage_relon_hv 799 + b . /* Can't happen, see v2.07 Book III-S section 6.5 */ 813 800 814 801 . = 0x4e40 815 802 SET_SCRATCH0(r13) ··· 813 808 b emulation_assist_relon_hv 814 809 815 810 . = 0x4e60 816 - SET_SCRATCH0(r13) 817 - EXCEPTION_PROLOG_0(PACA_EXGEN) 818 - b hmi_exception_relon_hv 811 + b . /* Can't happen, see v2.07 Book III-S section 6.5 */ 819 812 820 813 . 
= 0x4e80 821 814 SET_SCRATCH0(r13) ··· 838 835 EXCEPTION_PROLOG_0(PACA_EXGEN) 839 836 b vsx_unavailable_relon_pSeries 840 837 841 - tm_unavailable_relon_pSeries_1: 838 + facility_unavailable_relon_trampoline: 842 839 . = 0x4f60 843 840 SET_SCRATCH0(r13) 844 841 EXCEPTION_PROLOG_0(PACA_EXGEN) 845 - b tm_unavailable_relon_pSeries 842 + b facility_unavailable_relon_pSeries 843 + 844 + hv_facility_unavailable_relon_trampoline: 845 + . = 0x4f80 846 + SET_SCRATCH0(r13) 847 + EXCEPTION_PROLOG_0(PACA_EXGEN) 848 + b facility_unavailable_relon_hv 846 849 847 850 STD_RELON_EXCEPTION_PSERIES(0x5300, 0x1300, instruction_breakpoint) 848 851 #ifdef CONFIG_PPC_DENORMALISATION ··· 1174 1165 bl .vsx_unavailable_exception 1175 1166 b .ret_from_except 1176 1167 1177 - .align 7 1178 - .globl tm_unavailable_common 1179 - tm_unavailable_common: 1180 - EXCEPTION_PROLOG_COMMON(0xf60, PACA_EXGEN) 1181 - bl .save_nvgprs 1182 - DISABLE_INTS 1183 - addi r3,r1,STACK_FRAME_OVERHEAD 1184 - bl .tm_unavailable_exception 1185 - b .ret_from_except 1168 + STD_EXCEPTION_COMMON(0xf60, facility_unavailable, .facility_unavailable_exception) 1186 1169 1187 1170 .align 7 1188 1171 .globl __end_handlers 1189 1172 __end_handlers: 1190 1173 1191 1174 /* Equivalents to the above handlers for relocation-on interrupt vectors */ 1192 - STD_RELON_EXCEPTION_HV_OOL(0xe00, h_data_storage) 1193 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe00) 1194 - STD_RELON_EXCEPTION_HV_OOL(0xe20, h_instr_storage) 1195 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe20) 1196 1175 STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist) 1197 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe40) 1198 - STD_RELON_EXCEPTION_HV_OOL(0xe60, hmi_exception) 1199 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe60) 1200 1176 MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell) 1201 - KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe80) 1202 1177 1203 1178 STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor) 1204 1179 STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable) 1205 1180 
STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable) 1206 - STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, tm_unavailable) 1181 + STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable) 1182 + STD_RELON_EXCEPTION_HV_OOL(0xf80, facility_unavailable) 1207 1183 1208 1184 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) 1209 1185 /*
+2 -1
arch/powerpc/kernel/hw_breakpoint.c
··· 176 176 length_max = 512 ; /* 64 doublewords */ 177 177 /* DAWR region can't cross 512 boundary */ 178 178 if ((bp->attr.bp_addr >> 10) != 179 - ((bp->attr.bp_addr + bp->attr.bp_len) >> 10)) 179 + ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10)) 180 180 return -EINVAL; 181 181 } 182 182 if (info->len > ··· 250 250 * we still need to single-step the instruction, but we don't 251 251 * generate an event. 252 252 */ 253 + info->type &= ~HW_BRK_TYPE_EXTRANEOUS_IRQ; 253 254 if (!((bp->attr.bp_addr <= dar) && 254 255 (dar - bp->attr.bp_addr < bp->attr.bp_len))) 255 256 info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+2 -2
arch/powerpc/kernel/idle.c
··· 85 85 /* 86 86 * Register the sysctl to set/clear powersave_nap. 87 87 */ 88 - static ctl_table powersave_nap_ctl_table[]={ 88 + static struct ctl_table powersave_nap_ctl_table[] = { 89 89 { 90 90 .procname = "powersave-nap", 91 91 .data = &powersave_nap, ··· 95 95 }, 96 96 {} 97 97 }; 98 - static ctl_table powersave_nap_sysctl_root[] = { 98 + static struct ctl_table powersave_nap_sysctl_root[] = { 99 99 { 100 100 .procname = "kernel", 101 101 .mode = 0555,
+9 -2
arch/powerpc/kernel/io-workarounds.c
··· 55 55 56 56 struct iowa_bus *iowa_mem_find_bus(const PCI_IO_ADDR addr) 57 57 { 58 + unsigned hugepage_shift; 58 59 struct iowa_bus *bus; 59 60 int token; 60 61 ··· 71 70 if (vaddr < PHB_IO_BASE || vaddr >= PHB_IO_END) 72 71 return NULL; 73 72 74 - ptep = find_linux_pte(init_mm.pgd, vaddr); 73 + ptep = find_linux_pte_or_hugepte(init_mm.pgd, vaddr, 74 + &hugepage_shift); 75 75 if (ptep == NULL) 76 76 paddr = 0; 77 - else 77 + else { 78 + /* 79 + * we don't have hugepages backing iomem 80 + */ 81 + WARN_ON(hugepage_shift); 78 82 paddr = pte_pfn(*ptep) << PAGE_SHIFT; 83 + } 79 84 bus = iowa_pci_find(vaddr, paddr); 80 85 81 86 if (bus == NULL)
+323
arch/powerpc/kernel/iommu.c
··· 36 36 #include <linux/hash.h> 37 37 #include <linux/fault-inject.h> 38 38 #include <linux/pci.h> 39 + #include <linux/iommu.h> 40 + #include <linux/sched.h> 39 41 #include <asm/io.h> 40 42 #include <asm/prom.h> 41 43 #include <asm/iommu.h> ··· 46 44 #include <asm/kdump.h> 47 45 #include <asm/fadump.h> 48 46 #include <asm/vio.h> 47 + #include <asm/tce.h> 49 48 50 49 #define DBG(...) 51 50 ··· 727 724 if (tbl->it_offset == 0) 728 725 clear_bit(0, tbl->it_map); 729 726 727 + #ifdef CONFIG_IOMMU_API 728 + if (tbl->it_group) { 729 + iommu_group_put(tbl->it_group); 730 + BUG_ON(tbl->it_group); 731 + } 732 + #endif 733 + 730 734 /* verify that table contains no entries */ 731 735 if (!bitmap_empty(tbl->it_map, tbl->it_size)) 732 736 pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name); ··· 870 860 free_pages((unsigned long)vaddr, get_order(size)); 871 861 } 872 862 } 863 + 864 + #ifdef CONFIG_IOMMU_API 865 + /* 866 + * SPAPR TCE API 867 + */ 868 + static void group_release(void *iommu_data) 869 + { 870 + struct iommu_table *tbl = iommu_data; 871 + tbl->it_group = NULL; 872 + } 873 + 874 + void iommu_register_group(struct iommu_table *tbl, 875 + int pci_domain_number, unsigned long pe_num) 876 + { 877 + struct iommu_group *grp; 878 + char *name; 879 + 880 + grp = iommu_group_alloc(); 881 + if (IS_ERR(grp)) { 882 + pr_warn("powerpc iommu api: cannot create new group, err=%ld\n", 883 + PTR_ERR(grp)); 884 + return; 885 + } 886 + tbl->it_group = grp; 887 + iommu_group_set_iommudata(grp, tbl, group_release); 888 + name = kasprintf(GFP_KERNEL, "domain%d-pe%lx", 889 + pci_domain_number, pe_num); 890 + if (!name) 891 + return; 892 + iommu_group_set_name(grp, name); 893 + kfree(name); 894 + } 895 + 896 + enum dma_data_direction iommu_tce_direction(unsigned long tce) 897 + { 898 + if ((tce & TCE_PCI_READ) && (tce & TCE_PCI_WRITE)) 899 + return DMA_BIDIRECTIONAL; 900 + else if (tce & TCE_PCI_READ) 901 + return DMA_TO_DEVICE; 902 + else if (tce & TCE_PCI_WRITE) 903 + 
return DMA_FROM_DEVICE; 904 + else 905 + return DMA_NONE; 906 + } 907 + EXPORT_SYMBOL_GPL(iommu_tce_direction); 908 + 909 + void iommu_flush_tce(struct iommu_table *tbl) 910 + { 911 + /* Flush/invalidate TLB caches if necessary */ 912 + if (ppc_md.tce_flush) 913 + ppc_md.tce_flush(tbl); 914 + 915 + /* Make sure updates are seen by hardware */ 916 + mb(); 917 + } 918 + EXPORT_SYMBOL_GPL(iommu_flush_tce); 919 + 920 + int iommu_tce_clear_param_check(struct iommu_table *tbl, 921 + unsigned long ioba, unsigned long tce_value, 922 + unsigned long npages) 923 + { 924 + /* ppc_md.tce_free() does not support any value but 0 */ 925 + if (tce_value) 926 + return -EINVAL; 927 + 928 + if (ioba & ~IOMMU_PAGE_MASK) 929 + return -EINVAL; 930 + 931 + ioba >>= IOMMU_PAGE_SHIFT; 932 + if (ioba < tbl->it_offset) 933 + return -EINVAL; 934 + 935 + if ((ioba + npages) > (tbl->it_offset + tbl->it_size)) 936 + return -EINVAL; 937 + 938 + return 0; 939 + } 940 + EXPORT_SYMBOL_GPL(iommu_tce_clear_param_check); 941 + 942 + int iommu_tce_put_param_check(struct iommu_table *tbl, 943 + unsigned long ioba, unsigned long tce) 944 + { 945 + if (!(tce & (TCE_PCI_WRITE | TCE_PCI_READ))) 946 + return -EINVAL; 947 + 948 + if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ)) 949 + return -EINVAL; 950 + 951 + if (ioba & ~IOMMU_PAGE_MASK) 952 + return -EINVAL; 953 + 954 + ioba >>= IOMMU_PAGE_SHIFT; 955 + if (ioba < tbl->it_offset) 956 + return -EINVAL; 957 + 958 + if ((ioba + 1) > (tbl->it_offset + tbl->it_size)) 959 + return -EINVAL; 960 + 961 + return 0; 962 + } 963 + EXPORT_SYMBOL_GPL(iommu_tce_put_param_check); 964 + 965 + unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry) 966 + { 967 + unsigned long oldtce; 968 + struct iommu_pool *pool = get_pool(tbl, entry); 969 + 970 + spin_lock(&(pool->lock)); 971 + 972 + oldtce = ppc_md.tce_get(tbl, entry); 973 + if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)) 974 + ppc_md.tce_free(tbl, entry, 1); 975 + else 976 + oldtce = 0; 977 + 
978 + spin_unlock(&(pool->lock)); 979 + 980 + return oldtce; 981 + } 982 + EXPORT_SYMBOL_GPL(iommu_clear_tce); 983 + 984 + int iommu_clear_tces_and_put_pages(struct iommu_table *tbl, 985 + unsigned long entry, unsigned long pages) 986 + { 987 + unsigned long oldtce; 988 + struct page *page; 989 + 990 + for ( ; pages; --pages, ++entry) { 991 + oldtce = iommu_clear_tce(tbl, entry); 992 + if (!oldtce) 993 + continue; 994 + 995 + page = pfn_to_page(oldtce >> PAGE_SHIFT); 996 + WARN_ON(!page); 997 + if (page) { 998 + if (oldtce & TCE_PCI_WRITE) 999 + SetPageDirty(page); 1000 + put_page(page); 1001 + } 1002 + } 1003 + 1004 + return 0; 1005 + } 1006 + EXPORT_SYMBOL_GPL(iommu_clear_tces_and_put_pages); 1007 + 1008 + /* 1009 + * hwaddr is a kernel virtual address here (0xc... bazillion), 1010 + * tce_build converts it to a physical address. 1011 + */ 1012 + int iommu_tce_build(struct iommu_table *tbl, unsigned long entry, 1013 + unsigned long hwaddr, enum dma_data_direction direction) 1014 + { 1015 + int ret = -EBUSY; 1016 + unsigned long oldtce; 1017 + struct iommu_pool *pool = get_pool(tbl, entry); 1018 + 1019 + spin_lock(&(pool->lock)); 1020 + 1021 + oldtce = ppc_md.tce_get(tbl, entry); 1022 + /* Add new entry if it is not busy */ 1023 + if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))) 1024 + ret = ppc_md.tce_build(tbl, entry, 1, hwaddr, direction, NULL); 1025 + 1026 + spin_unlock(&(pool->lock)); 1027 + 1028 + /* if (unlikely(ret)) 1029 + pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n", 1030 + __func__, hwaddr, entry << IOMMU_PAGE_SHIFT, 1031 + hwaddr, ret); */ 1032 + 1033 + return ret; 1034 + } 1035 + EXPORT_SYMBOL_GPL(iommu_tce_build); 1036 + 1037 + int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry, 1038 + unsigned long tce) 1039 + { 1040 + int ret; 1041 + struct page *page = NULL; 1042 + unsigned long hwaddr, offset = tce & IOMMU_PAGE_MASK & ~PAGE_MASK; 1043 + enum dma_data_direction direction = iommu_tce_direction(tce); 
1044 + 1045 + ret = get_user_pages_fast(tce & PAGE_MASK, 1, 1046 + direction != DMA_TO_DEVICE, &page); 1047 + if (unlikely(ret != 1)) { 1048 + /* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx ioba=%lx ret=%d\n", 1049 + tce, entry << IOMMU_PAGE_SHIFT, ret); */ 1050 + return -EFAULT; 1051 + } 1052 + hwaddr = (unsigned long) page_address(page) + offset; 1053 + 1054 + ret = iommu_tce_build(tbl, entry, hwaddr, direction); 1055 + if (ret) 1056 + put_page(page); 1057 + 1058 + if (ret < 0) 1059 + pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%d\n", 1060 + __func__, entry << IOMMU_PAGE_SHIFT, tce, ret); 1061 + 1062 + return ret; 1063 + } 1064 + EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode); 1065 + 1066 + int iommu_take_ownership(struct iommu_table *tbl) 1067 + { 1068 + unsigned long sz = (tbl->it_size + 7) >> 3; 1069 + 1070 + if (tbl->it_offset == 0) 1071 + clear_bit(0, tbl->it_map); 1072 + 1073 + if (!bitmap_empty(tbl->it_map, tbl->it_size)) { 1074 + pr_err("iommu_tce: it_map is not empty"); 1075 + return -EBUSY; 1076 + } 1077 + 1078 + memset(tbl->it_map, 0xff, sz); 1079 + iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size); 1080 + 1081 + return 0; 1082 + } 1083 + EXPORT_SYMBOL_GPL(iommu_take_ownership); 1084 + 1085 + void iommu_release_ownership(struct iommu_table *tbl) 1086 + { 1087 + unsigned long sz = (tbl->it_size + 7) >> 3; 1088 + 1089 + iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size); 1090 + memset(tbl->it_map, 0, sz); 1091 + 1092 + /* Restore bit#0 set by iommu_init_table() */ 1093 + if (tbl->it_offset == 0) 1094 + set_bit(0, tbl->it_map); 1095 + } 1096 + EXPORT_SYMBOL_GPL(iommu_release_ownership); 1097 + 1098 + static int iommu_add_device(struct device *dev) 1099 + { 1100 + struct iommu_table *tbl; 1101 + int ret = 0; 1102 + 1103 + if (WARN_ON(dev->iommu_group)) { 1104 + pr_warn("iommu_tce: device %s is already in iommu group %d, skipping\n", 1105 + dev_name(dev), 1106 + iommu_group_id(dev->iommu_group)); 1107 + return 
-EBUSY; 1108 + } 1109 + 1110 + tbl = get_iommu_table_base(dev); 1111 + if (!tbl || !tbl->it_group) { 1112 + pr_debug("iommu_tce: skipping device %s with no tbl\n", 1113 + dev_name(dev)); 1114 + return 0; 1115 + } 1116 + 1117 + pr_debug("iommu_tce: adding %s to iommu group %d\n", 1118 + dev_name(dev), iommu_group_id(tbl->it_group)); 1119 + 1120 + ret = iommu_group_add_device(tbl->it_group, dev); 1121 + if (ret < 0) 1122 + pr_err("iommu_tce: %s has not been added, ret=%d\n", 1123 + dev_name(dev), ret); 1124 + 1125 + return ret; 1126 + } 1127 + 1128 + static void iommu_del_device(struct device *dev) 1129 + { 1130 + iommu_group_remove_device(dev); 1131 + } 1132 + 1133 + static int iommu_bus_notifier(struct notifier_block *nb, 1134 + unsigned long action, void *data) 1135 + { 1136 + struct device *dev = data; 1137 + 1138 + switch (action) { 1139 + case BUS_NOTIFY_ADD_DEVICE: 1140 + return iommu_add_device(dev); 1141 + case BUS_NOTIFY_DEL_DEVICE: 1142 + iommu_del_device(dev); 1143 + return 0; 1144 + default: 1145 + return 0; 1146 + } 1147 + } 1148 + 1149 + static struct notifier_block tce_iommu_bus_nb = { 1150 + .notifier_call = iommu_bus_notifier, 1151 + }; 1152 + 1153 + static int __init tce_iommu_init(void) 1154 + { 1155 + struct pci_dev *pdev = NULL; 1156 + 1157 + BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE); 1158 + 1159 + for_each_pci_dev(pdev) 1160 + iommu_add_device(&pdev->dev); 1161 + 1162 + bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb); 1163 + return 0; 1164 + } 1165 + 1166 + subsys_initcall_sync(tce_iommu_init); 1167 + 1168 + #else 1169 + 1170 + void iommu_register_group(struct iommu_table *tbl, 1171 + int pci_domain_number, unsigned long pe_num) 1172 + { 1173 + } 1174 + 1175 + #endif /* CONFIG_IOMMU_API */
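The new iommu_tce_clear_param_check()/iommu_tce_put_param_check() helpers above both reduce to the same window check: the bus address must be IOMMU-page aligned, and every page in the range must sit inside [it_offset, it_offset + it_size). A minimal user-space sketch of that check — the 4K page shift and the struct are illustrative stand-ins, not the kernel's definitions:

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's IOMMU page geometry. */
#define IOMMU_PAGE_SHIFT 12
#define IOMMU_PAGE_MASK (~((1UL << IOMMU_PAGE_SHIFT) - 1))

struct iommu_window {
    unsigned long it_offset; /* first page index of the window */
    unsigned long it_size;   /* window size in IOMMU pages */
};

/* Mirrors the range validation in iommu_tce_clear_param_check():
 * the bus address must be page-aligned and the whole page range
 * [ioba, ioba + npages) must fall inside the table's window. */
int check_ioba_range(const struct iommu_window *w,
                     unsigned long ioba, unsigned long npages)
{
    if (ioba & ~IOMMU_PAGE_MASK)
        return -EINVAL;          /* not IOMMU-page aligned */

    ioba >>= IOMMU_PAGE_SHIFT;
    if (ioba < w->it_offset)
        return -EINVAL;          /* starts below the window */

    if (ioba + npages > w->it_offset + w->it_size)
        return -EINVAL;          /* runs past the window */

    return 0;
}
```

The put variant is the same check with npages fixed at 1, plus the extra test that the TCE carries at least one of the read/write permission bits.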
-2
arch/powerpc/kernel/irq.c
··· 116 116 u64 now = get_tb_or_rtc(); 117 117 u64 *next_tb = &__get_cpu_var(decrementers_next_tb); 118 118 119 - if (now >= *next_tb) 120 - set_dec(1); 121 119 return now >= *next_tb; 122 120 } 123 121
+1 -19
arch/powerpc/kernel/kprobes.c
··· 36 36 #include <asm/sstep.h> 37 37 #include <asm/uaccess.h> 38 38 39 - #ifdef CONFIG_PPC_ADV_DEBUG_REGS 40 - #define MSR_SINGLESTEP (MSR_DE) 41 - #else 42 - #define MSR_SINGLESTEP (MSR_SE) 43 - #endif 44 - 45 39 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; 46 40 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); 47 41 ··· 98 104 99 105 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs) 100 106 { 101 - /* We turn off async exceptions to ensure that the single step will 102 - * be for the instruction we have the kprobe on, if we dont its 103 - * possible we'd get the single step reported for an exception handler 104 - * like Decrementer or External Interrupt */ 105 - regs->msr &= ~MSR_EE; 106 - regs->msr |= MSR_SINGLESTEP; 107 - #ifdef CONFIG_PPC_ADV_DEBUG_REGS 108 - regs->msr &= ~MSR_CE; 109 - mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM); 110 - #ifdef CONFIG_PPC_47x 111 - isync(); 112 - #endif 113 - #endif 107 + enable_single_step(regs); 114 108 115 109 /* 116 110 * On powerpc we should single step on the original
+14 -6
arch/powerpc/kernel/nvram_64.c
··· 84 84 char *tmp = NULL; 85 85 ssize_t size; 86 86 87 - ret = -ENODEV; 88 - if (!ppc_md.nvram_size) 87 + if (!ppc_md.nvram_size) { 88 + ret = -ENODEV; 89 89 goto out; 90 + } 90 91 91 - ret = 0; 92 92 size = ppc_md.nvram_size(); 93 - if (*ppos >= size || size < 0) 93 + if (size < 0) { 94 + ret = size; 94 95 goto out; 96 + } 97 + 98 + if (*ppos >= size) { 99 + ret = 0; 100 + goto out; 101 + } 95 102 96 103 count = min_t(size_t, count, size - *ppos); 97 104 count = min(count, PAGE_SIZE); 98 105 99 - ret = -ENOMEM; 100 106 tmp = kmalloc(count, GFP_KERNEL); 101 - if (!tmp) 107 + if (!tmp) { 108 + ret = -ENOMEM; 102 109 goto out; 110 + } 103 111 104 112 ret = ppc_md.nvram_read(tmp, count, ppos); 105 113 if (ret <= 0)
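The reworked dev_nvram_read() error paths above keep the existing double clamp: a request is limited both to what remains of the device past *ppos and to one page per call. A user-space sketch of just that clamping (the page-size constant is an assumption standing in for PAGE_SIZE; the caller is presumed to have already rejected pos >= dev_size, as the kernel code does):

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Mirrors the clamping in dev_nvram_read(): never read past the end
 * of the device, and never hand back more than one page per call. */
size_t clamp_read(size_t count, long long pos, long long dev_size)
{
    size_t remaining = (size_t)(dev_size - pos); /* pos < dev_size assumed */

    if (count > remaining)
        count = remaining;
    if (count > SKETCH_PAGE_SIZE)
        count = SKETCH_PAGE_SIZE;
    return count;
}
```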
+111
arch/powerpc/kernel/pci-hotplug.c
··· 1 + /* 2 + * Derived from "arch/powerpc/platforms/pseries/pci_dlpar.c" 3 + * 4 + * Copyright (C) 2003 Linda Xie <lxie@us.ibm.com> 5 + * Copyright (C) 2005 International Business Machines 6 + * 7 + * Updates, 2005, John Rose <johnrose@austin.ibm.com> 8 + * Updates, 2005, Linas Vepstas <linas@austin.ibm.com> 9 + * Updates, 2013, Gavin Shan <shangw@linux.vnet.ibm.com> 10 + * 11 + * This program is free software; you can redistribute it and/or modify 12 + * it under the terms of the GNU General Public License as published by 13 + * the Free Software Foundation; either version 2 of the License, or 14 + * (at your option) any later version. 15 + */ 16 + 17 + #include <linux/pci.h> 18 + #include <linux/export.h> 19 + #include <asm/pci-bridge.h> 20 + #include <asm/ppc-pci.h> 21 + #include <asm/firmware.h> 22 + #include <asm/eeh.h> 23 + 24 + /** 25 + * __pcibios_remove_pci_devices - remove all devices under this bus 26 + * @bus: the indicated PCI bus 27 + * @purge_pe: destroy the PE on removal of PCI devices 28 + * 29 + * Remove all of the PCI devices under this bus both from the 30 + * linux pci device tree, and from the powerpc EEH address cache. 31 + * By default, the corresponding PE will be destroyed during the 32 + * normal PCI hotplug path. For PCI hotplug during EEH recovery, 33 + * the corresponding PE won't be destroyed and deallocated. 
34 + */ 35 + void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe) 36 + { 37 + struct pci_dev *dev, *tmp; 38 + struct pci_bus *child_bus; 39 + 40 + /* First go down child busses */ 41 + list_for_each_entry(child_bus, &bus->children, node) 42 + __pcibios_remove_pci_devices(child_bus, purge_pe); 43 + 44 + pr_debug("PCI: Removing devices on bus %04x:%02x\n", 45 + pci_domain_nr(bus), bus->number); 46 + list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { 47 + pr_debug(" * Removing %s...\n", pci_name(dev)); 48 + eeh_remove_bus_device(dev, purge_pe); 49 + pci_stop_and_remove_bus_device(dev); 50 + } 51 + } 52 + 53 + /** 54 + * pcibios_remove_pci_devices - remove all devices under this bus 55 + * @bus: the indicated PCI bus 56 + * 57 + * Remove all of the PCI devices under this bus both from the 58 + * linux pci device tree, and from the powerpc EEH address cache. 59 + */ 60 + void pcibios_remove_pci_devices(struct pci_bus *bus) 61 + { 62 + __pcibios_remove_pci_devices(bus, 1); 63 + } 64 + EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices); 65 + 66 + /** 67 + * pcibios_add_pci_devices - adds new pci devices to bus 68 + * @bus: the indicated PCI bus 69 + * 70 + * This routine will find and fixup new pci devices under 71 + * the indicated bus. This routine presumes that there 72 + * might already be some devices under this bridge, so 73 + * it carefully tries to add only new devices. (And that 74 + * is how this routine differs from other, similar pcibios 75 + * routines.) 
76 + */ 77 + void pcibios_add_pci_devices(struct pci_bus * bus) 78 + { 79 + int slotno, num, mode, pass, max; 80 + struct pci_dev *dev; 81 + struct device_node *dn = pci_bus_to_OF_node(bus); 82 + 83 + eeh_add_device_tree_early(dn); 84 + 85 + mode = PCI_PROBE_NORMAL; 86 + if (ppc_md.pci_probe_mode) 87 + mode = ppc_md.pci_probe_mode(bus); 88 + 89 + if (mode == PCI_PROBE_DEVTREE) { 90 + /* use ofdt-based probe */ 91 + of_rescan_bus(dn, bus); 92 + } else if (mode == PCI_PROBE_NORMAL) { 93 + /* use legacy probe */ 94 + slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); 95 + num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); 96 + if (!num) 97 + return; 98 + pcibios_setup_bus_devices(bus); 99 + max = bus->busn_res.start; 100 + for (pass = 0; pass < 2; pass++) { 101 + list_for_each_entry(dev, &bus->devices, bus_list) { 102 + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE || 103 + dev->hdr_type == PCI_HEADER_TYPE_CARDBUS) 104 + max = pci_scan_bridge(bus, dev, 105 + max, pass); 106 + } 107 + } 108 + } 109 + pcibios_finish_adding_to_bus(bus); 110 + } 111 + EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);
+4
arch/powerpc/kernel/process.c
··· 916 916 flush_altivec_to_thread(src); 917 917 flush_vsx_to_thread(src); 918 918 flush_spe_to_thread(src); 919 + 919 920 *dst = *src; 921 + 922 + clear_task_ebb(dst); 923 + 920 924 return 0; 921 925 } 922 926
+40 -2
arch/powerpc/kernel/prom.c
··· 559 559 } 560 560 #endif 561 561 562 + static void __init early_reserve_mem_dt(void) 563 + { 564 + unsigned long i, len, dt_root; 565 + const __be32 *prop; 566 + 567 + dt_root = of_get_flat_dt_root(); 568 + 569 + prop = of_get_flat_dt_prop(dt_root, "reserved-ranges", &len); 570 + 571 + if (!prop) 572 + return; 573 + 574 + DBG("Found new-style reserved-ranges\n"); 575 + 576 + /* Each reserved range is an (address,size) pair, 2 cells each, 577 + * totalling 4 cells per range. */ 578 + for (i = 0; i < len / (sizeof(*prop) * 4); i++) { 579 + u64 base, size; 580 + 581 + base = of_read_number(prop + (i * 4) + 0, 2); 582 + size = of_read_number(prop + (i * 4) + 2, 2); 583 + 584 + if (size) { 585 + DBG("reserving: %llx -> %llx\n", base, size); 586 + memblock_reserve(base, size); 587 + } 588 + } 589 + } 590 + 562 591 static void __init early_reserve_mem(void) 563 592 { 564 593 u64 base, size; ··· 603 574 self_size = initial_boot_params->totalsize; 604 575 memblock_reserve(self_base, self_size); 605 576 577 + /* Look for the new "reserved-regions" property in the DT */ 578 + early_reserve_mem_dt(); 579 + 606 580 #ifdef CONFIG_BLK_DEV_INITRD 607 - /* then reserve the initrd, if any */ 608 - if (initrd_start && (initrd_end > initrd_start)) 581 + /* Then reserve the initrd, if any */ 582 + if (initrd_start && (initrd_end > initrd_start)) { 609 583 memblock_reserve(_ALIGN_DOWN(__pa(initrd_start), PAGE_SIZE), 610 584 _ALIGN_UP(initrd_end, PAGE_SIZE) - 611 585 _ALIGN_DOWN(initrd_start, PAGE_SIZE)); 586 + } 612 587 #endif /* CONFIG_BLK_DEV_INITRD */ 613 588 614 589 #ifdef CONFIG_PPC32 ··· 623 590 if (*reserve_map > 0xffffffffull) { 624 591 u32 base_32, size_32; 625 592 u32 *reserve_map_32 = (u32 *)reserve_map; 593 + 594 + DBG("Found old 32-bit reserve map\n"); 626 595 627 596 while (1) { 628 597 base_32 = *(reserve_map_32++); ··· 640 605 return; 641 606 } 642 607 #endif 608 + DBG("Processing reserve map\n"); 609 + 610 + /* Handle the reserve map in the fdt blob if it exists */ 
643 611 while (1) { 644 612 base = *(reserve_map++); 645 613 size = *(reserve_map++);
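early_reserve_mem_dt() above walks the flat device tree's "reserved-ranges" property as (address, size) pairs of two 32-bit cells each, four cells per range, skipping zero-size entries. A user-space sketch of that walk, assuming the cells have already been converted to host byte order (in the kernel, of_read_number() does the big-endian folding):

```c
#include <stdint.h>
#include <stddef.h>

/* of_read_number(prop, 2) folds two 32-bit cells into one u64:
 * (hi << 32) | lo. Host byte order is assumed here. */
static uint64_t read_cells2(const uint32_t *cells)
{
    return ((uint64_t)cells[0] << 32) | cells[1];
}

/* Mirrors early_reserve_mem_dt(): each reserved range is an
 * (address, size) pair, 2 cells each, 4 cells per range; ranges
 * with size 0 are ignored. Returns how many ranges were kept. */
size_t parse_reserved_ranges(const uint32_t *prop, size_t len_bytes,
                             uint64_t *base_out, uint64_t *size_out,
                             size_t max_ranges)
{
    size_t n = 0, nranges = len_bytes / (sizeof(uint32_t) * 4);

    for (size_t i = 0; i < nranges && n < max_ranges; i++) {
        uint64_t base = read_cells2(prop + i * 4 + 0);
        uint64_t size = read_cells2(prop + i * 4 + 2);

        if (size) {            /* zero-size entries are skipped */
            base_out[n] = base;
            size_out[n] = size;
            n++;
        }
    }
    return n;
}
```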
+3 -1
arch/powerpc/kernel/ptrace.c
··· 1449 1449 */ 1450 1450 if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE) { 1451 1451 len = bp_info->addr2 - bp_info->addr; 1452 - } else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) { 1452 + } else if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_EXACT) 1453 + len = 1; 1454 + else { 1453 1455 ptrace_put_breakpoints(child); 1454 1456 return -EINVAL; 1455 1457 }
+2 -1
arch/powerpc/kernel/reloc_32.S
··· 166 166 /* R_PPC_ADDR16_LO */ 167 167 lo16: 168 168 cmpwi r4, R_PPC_ADDR16_LO 169 - bne nxtrela 169 + bne unknown_type 170 170 lwz r4, 0(r9) /* r_offset */ 171 171 lwz r0, 8(r9) /* r_addend */ 172 172 add r0, r0, r3 ··· 191 191 dcbst r4,r7 192 192 sync /* Ensure the data is flushed before icbi */ 193 193 icbi r4,r7 194 + unknown_type: 194 195 cmpwi r8, 0 /* relasz = 0 ? */ 195 196 ble done 196 197 add r9, r9, r6 /* move to next entry in the .rela table */
+2 -2
arch/powerpc/kernel/rtas.c
··· 1172 1172 static arch_spinlock_t timebase_lock; 1173 1173 static u64 timebase = 0; 1174 1174 1175 - void __cpuinit rtas_give_timebase(void) 1175 + void rtas_give_timebase(void) 1176 1176 { 1177 1177 unsigned long flags; 1178 1178 ··· 1189 1189 local_irq_restore(flags); 1190 1190 } 1191 1191 1192 - void __cpuinit rtas_take_timebase(void) 1192 + void rtas_take_timebase(void) 1193 1193 { 1194 1194 while (!timebase) 1195 1195 barrier();
+1 -1
arch/powerpc/kernel/setup_64.c
··· 76 76 #endif 77 77 78 78 int boot_cpuid = 0; 79 - int __initdata spinning_secondaries; 79 + int spinning_secondaries; 80 80 u64 ppc64_pft_size; 81 81 82 82 /* Pick defaults since we might want to patch instructions
+53 -17
arch/powerpc/kernel/signal_32.c
··· 407 407 * altivec/spe instructions at some point. 408 408 */ 409 409 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame, 410 - int sigret, int ctx_has_vsx_region) 410 + struct mcontext __user *tm_frame, int sigret, 411 + int ctx_has_vsx_region) 411 412 { 412 413 unsigned long msr = regs->msr; 413 414 ··· 476 475 477 476 if (__put_user(msr, &frame->mc_gregs[PT_MSR])) 478 477 return 1; 478 + /* We need to write 0 the MSR top 32 bits in the tm frame so that we 479 + * can check it on the restore to see if TM is active 480 + */ 481 + if (tm_frame && __put_user(0, &tm_frame->mc_gregs[PT_MSR])) 482 + return 1; 483 + 479 484 if (sigret) { 480 485 /* Set up the sigreturn trampoline: li r0,sigret; sc */ 481 486 if (__put_user(0x38000000UL + sigret, &frame->tramp[0]) ··· 754 747 struct mcontext __user *tm_sr) 755 748 { 756 749 long err; 757 - unsigned long msr; 750 + unsigned long msr, msr_hi; 758 751 #ifdef CONFIG_VSX 759 752 int i; 760 753 #endif ··· 859 852 tm_enable(); 860 853 /* This loads the checkpointed FP/VEC state, if used */ 861 854 tm_recheckpoint(&current->thread, msr); 862 - /* The task has moved into TM state S, so ensure MSR reflects this */ 863 - regs->msr = (regs->msr & ~MSR_TS_MASK) | MSR_TS_S; 855 + /* Get the top half of the MSR */ 856 + if (__get_user(msr_hi, &tm_sr->mc_gregs[PT_MSR])) 857 + return 1; 858 + /* Pull in MSR TM from user context */ 859 + regs->msr = (regs->msr & ~MSR_TS_MASK) | ((msr_hi<<32) & MSR_TS_MASK); 864 860 865 861 /* This loads the speculative FP/VEC state, if used */ 866 862 if (msr & MSR_FP) { ··· 962 952 { 963 953 struct rt_sigframe __user *rt_sf; 964 954 struct mcontext __user *frame; 955 + struct mcontext __user *tm_frame = NULL; 965 956 void __user *addr; 966 957 unsigned long newsp = 0; 967 958 int sigret; ··· 996 985 } 997 986 998 987 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 988 + tm_frame = &rt_sf->uc_transact.uc_mcontext; 999 989 if (MSR_TM_ACTIVE(regs->msr)) { 1000 - if (save_tm_user_regs(regs, 
&rt_sf->uc.uc_mcontext, 1001 - &rt_sf->uc_transact.uc_mcontext, sigret)) 990 + if (save_tm_user_regs(regs, frame, tm_frame, sigret)) 1002 991 goto badframe; 1003 992 } 1004 993 else 1005 994 #endif 1006 - if (save_user_regs(regs, frame, sigret, 1)) 995 + { 996 + if (save_user_regs(regs, frame, tm_frame, sigret, 1)) 1007 997 goto badframe; 998 + } 1008 999 regs->link = tramp; 1009 1000 1010 1001 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1011 1002 if (MSR_TM_ACTIVE(regs->msr)) { 1012 1003 if (__put_user((unsigned long)&rt_sf->uc_transact, 1013 1004 &rt_sf->uc.uc_link) 1014 - || __put_user(to_user_ptr(&rt_sf->uc_transact.uc_mcontext), 1015 - &rt_sf->uc_transact.uc_regs)) 1005 + || __put_user((unsigned long)tm_frame, &rt_sf->uc_transact.uc_regs)) 1016 1006 goto badframe; 1017 1007 } 1018 1008 else ··· 1182 1170 mctx = (struct mcontext __user *) 1183 1171 ((unsigned long) &old_ctx->uc_mcontext & ~0xfUL); 1184 1172 if (!access_ok(VERIFY_WRITE, old_ctx, ctx_size) 1185 - || save_user_regs(regs, mctx, 0, ctx_has_vsx_region) 1173 + || save_user_regs(regs, mctx, NULL, 0, ctx_has_vsx_region) 1186 1174 || put_sigset_t(&old_ctx->uc_sigmask, &current->blocked) 1187 1175 || __put_user(to_user_ptr(mctx), &old_ctx->uc_regs)) 1188 1176 return -EFAULT; ··· 1245 1233 if (__get_user(msr_hi, &mcp->mc_gregs[PT_MSR])) 1246 1234 goto bad; 1247 1235 1248 - if (MSR_TM_SUSPENDED(msr_hi<<32)) { 1236 + if (MSR_TM_ACTIVE(msr_hi<<32)) { 1249 1237 /* We only recheckpoint on return if we're 1250 1238 * transaction. 
1251 1239 */ ··· 1404 1392 { 1405 1393 struct sigcontext __user *sc; 1406 1394 struct sigframe __user *frame; 1395 + struct mcontext __user *tm_mctx = NULL; 1407 1396 unsigned long newsp = 0; 1408 1397 int sigret; 1409 1398 unsigned long tramp; ··· 1438 1425 } 1439 1426 1440 1427 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1428 + tm_mctx = &frame->mctx_transact; 1441 1429 if (MSR_TM_ACTIVE(regs->msr)) { 1442 1430 if (save_tm_user_regs(regs, &frame->mctx, &frame->mctx_transact, 1443 1431 sigret)) ··· 1446 1432 } 1447 1433 else 1448 1434 #endif 1449 - if (save_user_regs(regs, &frame->mctx, sigret, 1)) 1435 + { 1436 + if (save_user_regs(regs, &frame->mctx, tm_mctx, sigret, 1)) 1450 1437 goto badframe; 1438 + } 1451 1439 1452 1440 regs->link = tramp; 1453 1441 ··· 1497 1481 long sys_sigreturn(int r3, int r4, int r5, int r6, int r7, int r8, 1498 1482 struct pt_regs *regs) 1499 1483 { 1484 + struct sigframe __user *sf; 1500 1485 struct sigcontext __user *sc; 1501 1486 struct sigcontext sigctx; 1502 1487 struct mcontext __user *sr; 1503 1488 void __user *addr; 1504 1489 sigset_t set; 1490 + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1491 + struct mcontext __user *mcp, *tm_mcp; 1492 + unsigned long msr_hi; 1493 + #endif 1505 1494 1506 1495 /* Always make any pending restarted system calls return -EINTR */ 1507 1496 current_thread_info()->restart_block.fn = do_no_restart_syscall; 1508 1497 1509 - sc = (struct sigcontext __user *)(regs->gpr[1] + __SIGNAL_FRAMESIZE); 1498 + sf = (struct sigframe __user *)(regs->gpr[1] + __SIGNAL_FRAMESIZE); 1499 + sc = &sf->sctx; 1510 1500 addr = sc; 1511 1501 if (copy_from_user(&sigctx, sc, sizeof(sigctx))) 1512 1502 goto badframe; ··· 1529 1507 #endif 1530 1508 set_current_blocked(&set); 1531 1509 1532 - sr = (struct mcontext __user *)from_user_ptr(sigctx.regs); 1533 - addr = sr; 1534 - if (!access_ok(VERIFY_READ, sr, sizeof(*sr)) 1535 - || restore_user_regs(regs, sr, 1)) 1510 + #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 1511 + mcp = (struct mcontext __user 
*)&sf->mctx; 1512 + tm_mcp = (struct mcontext __user *)&sf->mctx_transact; 1513 + if (__get_user(msr_hi, &tm_mcp->mc_gregs[PT_MSR])) 1536 1514 goto badframe; 1515 + if (MSR_TM_ACTIVE(msr_hi<<32)) { 1516 + if (!cpu_has_feature(CPU_FTR_TM)) 1517 + goto badframe; 1518 + if (restore_tm_user_regs(regs, mcp, tm_mcp)) 1519 + goto badframe; 1520 + } else 1521 + #endif 1522 + { 1523 + sr = (struct mcontext __user *)from_user_ptr(sigctx.regs); 1524 + addr = sr; 1525 + if (!access_ok(VERIFY_READ, sr, sizeof(*sr)) 1526 + || restore_user_regs(regs, sr, 1)) 1527 + goto badframe; 1528 + } 1537 1529 1538 1530 set_thread_flag(TIF_RESTOREALL); 1539 1531 return 0;
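Both reworked sigreturn paths above decide whether to recheckpoint by shifting the saved top half of the MSR back up and testing the transaction-state field with MSR_TM_ACTIVE(msr_hi<<32), instead of assuming state S. A sketch of that test, using the 64-bit server MSR TS bit positions (bits 33-34, matching the __MASK(33) the old signal_64.c code used for MSR_TS_S):

```c
#include <stdint.h>

/* MSR[TS]: two bits, suspended (S) and transactional (T). */
#define MSR_TS_MASK (3ULL << 33)
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0)

/* The 32-bit signal frame stores only the top half of the 64-bit MSR
 * in the transactional context; shifting it back up recovers the TS
 * field, which tells sigreturn whether a recheckpoint is needed. */
int tm_frame_active(uint32_t msr_hi)
{
    return MSR_TM_ACTIVE((uint64_t)msr_hi << 32);
}
```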
+5 -3
arch/powerpc/kernel/signal_64.c
··· 410 410 411 411 /* get MSR separately, transfer the LE bit if doing signal return */ 412 412 err |= __get_user(msr, &sc->gp_regs[PT_MSR]); 413 + /* pull in MSR TM from user context */ 414 + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); 415 + 416 + /* pull in MSR LE from user context */ 413 417 regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); 414 418 415 419 /* The following non-GPR non-FPR non-VR state is also checkpointed: */ ··· 509 505 tm_enable(); 510 506 /* This loads the checkpointed FP/VEC state, if used */ 511 507 tm_recheckpoint(&current->thread, msr); 512 - /* The task has moved into TM state S, so ensure MSR reflects this: */ 513 - regs->msr = (regs->msr & ~MSR_TS_MASK) | __MASK(33); 514 508 515 509 /* This loads the speculative FP/VEC state, if used */ 516 510 if (msr & MSR_FP) { ··· 656 654 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM 657 655 if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) 658 656 goto badframe; 659 - if (MSR_TM_SUSPENDED(msr)) { 657 + if (MSR_TM_ACTIVE(msr)) { 660 658 /* We recheckpoint on return. */ 661 659 struct ucontext __user *uc_transact; 662 660 if (__get_user(uc_transact, &uc->uc_link))
+7 -5
arch/powerpc/kernel/smp.c
··· 480 480 secondary_ti = current_set[cpu] = ti; 481 481 } 482 482 483 - int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle) 483 + int __cpu_up(unsigned int cpu, struct task_struct *tidle) 484 484 { 485 485 int rc, c; 486 486 ··· 610 610 } 611 611 612 612 /* Activate a secondary processor. */ 613 - __cpuinit void start_secondary(void *unused) 613 + void start_secondary(void *unused) 614 614 { 615 615 unsigned int cpu = smp_processor_id(); 616 616 struct device_node *l2_cache; ··· 637 637 638 638 vdso_getcpu_init(); 639 639 #endif 640 - notify_cpu_starting(cpu); 641 - set_cpu_online(cpu, true); 642 640 /* Update sibling maps */ 643 641 base = cpu_first_thread_sibling(cpu); 644 642 for (i = 0; i < threads_per_core; i++) { 645 - if (cpu_is_offline(base + i)) 643 + if (cpu_is_offline(base + i) && (cpu != base + i)) 646 644 continue; 647 645 cpumask_set_cpu(cpu, cpu_sibling_mask(base + i)); 648 646 cpumask_set_cpu(base + i, cpu_sibling_mask(cpu)); ··· 664 666 of_node_put(np); 665 667 } 666 668 of_node_put(l2_cache); 669 + 670 + smp_wmb(); 671 + notify_cpu_starting(cpu); 672 + set_cpu_online(cpu, true); 667 673 668 674 local_irq_enable(); 669 675
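The start_secondary() change above delays notify_cpu_starting()/set_cpu_online() until after the sibling and core maps are built, with an smp_wmb() in between, so no other CPU can observe the online flag before the maps are complete. A user-space analogue of that init-then-publish pattern using C11 release/acquire ordering (names are illustrative, not kernel API):

```c
#include <stdatomic.h>

/* Analogue of the reordering in start_secondary(): fill in all
 * per-CPU state first, then publish with release semantics so any
 * reader that sees "online" also sees the completed state. */
struct cpu_state {
    int sibling_map;            /* stands in for the sibling masks */
    atomic_int online;
};

void bring_online(struct cpu_state *c)
{
    c->sibling_map = 0xff;                       /* init first... */
    atomic_store_explicit(&c->online, 1,
                          memory_order_release); /* ...then publish */
}

/* Returns the sibling map if the CPU is visibly online, else -1. */
int read_if_online(struct cpu_state *c)
{
    if (atomic_load_explicit(&c->online, memory_order_acquire))
        return c->sibling_map;
    return -1;
}
```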
+3 -3
arch/powerpc/kernel/sysfs.c
··· 341 341 #endif /* HAS_PPC_PMC_PA6T */ 342 342 #endif /* HAS_PPC_PMC_CLASSIC */ 343 343 344 - static void __cpuinit register_cpu_online(unsigned int cpu) 344 + static void register_cpu_online(unsigned int cpu) 345 345 { 346 346 struct cpu *c = &per_cpu(cpu_devices, cpu); 347 347 struct device *s = &c->dev; ··· 502 502 503 503 #endif /* CONFIG_HOTPLUG_CPU */ 504 504 505 - static int __cpuinit sysfs_cpu_notify(struct notifier_block *self, 505 + static int sysfs_cpu_notify(struct notifier_block *self, 506 506 unsigned long action, void *hcpu) 507 507 { 508 508 unsigned int cpu = (unsigned int)(long)hcpu; ··· 522 522 return NOTIFY_OK; 523 523 } 524 524 525 - static struct notifier_block __cpuinitdata sysfs_cpu_nb = { 525 + static struct notifier_block sysfs_cpu_nb = { 526 526 .notifier_call = sysfs_cpu_notify, 527 527 }; 528 528
-1
arch/powerpc/kernel/time.c
··· 631 631 return found; 632 632 } 633 633 634 - /* should become __cpuinit when secondary_cpu_time_init also is */ 635 634 void start_cpu_decrementer(void) 636 635 { 637 636 #if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
+16 -2
arch/powerpc/kernel/tm.S
··· 112 112 std r3, STACK_PARAM(0)(r1) 113 113 SAVE_NVGPRS(r1) 114 114 115 + /* We need to set up the MSR for VSX register save instructions. Here we 116 + * also clear the MSR RI since when we do the treclaim, we won't have a 117 + * valid kernel pointer for a while. We clear RI here as it avoids 118 + * adding another mtmsr closer to the treclaim. This makes the region 119 + * marked as non-recoverable wider than it needs to be but it saves on 120 + * inserting another mtmsrd later. 121 + */ 115 122 mfmsr r14 116 123 mr r15, r14 117 124 ori r15, r15, MSR_FP 125 + li r16, MSR_RI 126 + andc r15, r15, r16 118 127 oris r15, r15, MSR_VEC@h 119 128 #ifdef CONFIG_VSX 120 129 BEGIN_FTR_SECTION ··· 358 349 mtcr r5 359 350 mtxer r6 360 351 361 - /* MSR and flags: We don't change CRs, and we don't need to alter 362 - * MSR. 352 + /* Clear the MSR RI since we are about to change R1. EE is already off 363 353 */ 354 + li r4, 0 355 + mtmsrd r4, 1 364 356 365 357 REST_4GPRS(0, r7) /* GPR0-3 */ 366 358 REST_GPR(4, r7) /* GPR4-6 */ ··· 386 376 387 377 GET_PACA(r13) 388 378 GET_SCRATCH0(r1) 379 + 380 + /* R1 is restored, so we are recoverable again. EE is still off */ 381 + li r4, MSR_RI 382 + mtmsrd r4, 1 389 383 390 384 REST_NVGPRS(r1) 391 385
+49 -30
arch/powerpc/kernel/traps.c
··· 866 866 u8 val; 867 867 u32 shift = 8 * (3 - (pos & 0x3)); 868 868 869 + /* if process is 32-bit, clear upper 32 bits of EA */ 870 + if ((regs->msr & MSR_64BIT) == 0) 871 + EA &= 0xFFFFFFFF; 872 + 869 873 switch ((instword & PPC_INST_STRING_MASK)) { 870 874 case PPC_INST_LSWX: 871 875 case PPC_INST_LSWI: ··· 1129 1125 * ESR_DST (!?) or 0. In the process of chasing this with the 1130 1126 * hardware people - not sure if it can happen on any illegal 1131 1127 * instruction or only on FP instructions, whether there is a 1132 - * pattern to occurrences etc. -dgibson 31/Mar/2003 */ 1128 + * pattern to occurrences etc. -dgibson 31/Mar/2003 1129 + */ 1130 + 1131 + /* 1132 + * If we support a HW FPU, we need to ensure the FP state 1133 + * if flushed into the thread_struct before attempting 1134 + * emulation 1135 + */ 1136 + #ifdef CONFIG_PPC_FPU 1137 + flush_fp_to_thread(current); 1138 + #endif 1133 1139 switch (do_mathemu(regs)) { 1134 1140 case 0: 1135 1141 emulate_single_step(regs); ··· 1296 1282 die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT); 1297 1283 } 1298 1284 1299 - void tm_unavailable_exception(struct pt_regs *regs) 1285 + void facility_unavailable_exception(struct pt_regs *regs) 1300 1286 { 1287 + static char *facility_strings[] = { 1288 + "FPU", 1289 + "VMX/VSX", 1290 + "DSCR", 1291 + "PMU SPRs", 1292 + "BHRB", 1293 + "TM", 1294 + "AT", 1295 + "EBB", 1296 + "TAR", 1297 + }; 1298 + char *facility, *prefix; 1299 + u64 value; 1300 + 1301 + if (regs->trap == 0xf60) { 1302 + value = mfspr(SPRN_FSCR); 1303 + prefix = ""; 1304 + } else { 1305 + value = mfspr(SPRN_HFSCR); 1306 + prefix = "Hypervisor "; 1307 + } 1308 + 1309 + value = value >> 56; 1310 + 1301 1311 /* We restore the interrupt state now */ 1302 1312 if (!arch_irq_disabled_regs(regs)) 1303 1313 local_irq_enable(); 1304 1314 1305 - /* Currently we never expect a TMU exception. Catch 1306 - * this and kill the process! 
1307 - */ 1308 - printk(KERN_EMERG "Unexpected TM unavailable exception at %lx " 1309 - "(msr %lx)\n", 1310 - regs->nip, regs->msr); 1315 + if (value < ARRAY_SIZE(facility_strings)) 1316 + facility = facility_strings[value]; 1317 + else 1318 + facility = "unknown"; 1319 + 1320 + pr_err("%sFacility '%s' unavailable, exception at 0x%lx, MSR=%lx\n", 1321 + prefix, facility, regs->nip, regs->msr); 1311 1322 1312 1323 if (user_mode(regs)) { 1313 1324 _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); 1314 1325 return; 1315 1326 } 1316 1327 1317 - die("Unexpected TM unavailable exception", regs, SIGABRT); 1328 + die("Unexpected facility unavailable exception", regs, SIGABRT); 1318 1329 } 1319 1330 1320 1331 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM ··· 1435 1396 void SoftwareEmulation(struct pt_regs *regs) 1436 1397 { 1437 1398 extern int do_mathemu(struct pt_regs *); 1438 - extern int Soft_emulate_8xx(struct pt_regs *); 1439 - #if defined(CONFIG_MATH_EMULATION) || defined(CONFIG_8XX_MINIMAL_FPEMU) 1399 + #if defined(CONFIG_MATH_EMULATION) 1440 1400 int errcode; 1441 1401 #endif 1442 1402 ··· 1466 1428 return; 1467 1429 default: 1468 1430 _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); 1469 - return; 1470 - } 1471 - 1472 - #elif defined(CONFIG_8XX_MINIMAL_FPEMU) 1473 - errcode = Soft_emulate_8xx(regs); 1474 - if (errcode >= 0) 1475 - PPC_WARN_EMULATED(8xx, regs); 1476 - 1477 - switch (errcode) { 1478 - case 0: 1479 - emulate_single_step(regs); 1480 - return; 1481 - case 1: 1482 - _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); 1483 - return; 1484 - case -EFAULT: 1485 - _exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip); 1486 1431 return; 1487 1432 } 1488 1433 #else ··· 1817 1796 WARN_EMULATED_SETUP(unaligned), 1818 1797 #ifdef CONFIG_MATH_EMULATION 1819 1798 WARN_EMULATED_SETUP(math), 1820 - #elif defined(CONFIG_8XX_MINIMAL_FPEMU) 1821 - WARN_EMULATED_SETUP(8xx), 1822 1799 #endif 1823 1800 #ifdef CONFIG_VSX 1824 1801 WARN_EMULATED_SETUP(vsx),
+1 -1
arch/powerpc/kernel/udbg.c
··· 50 50 udbg_init_debug_beat(); 51 51 #elif defined(CONFIG_PPC_EARLY_DEBUG_PAS_REALMODE) 52 52 udbg_init_pas_realmode(); 53 - #elif defined(CONFIG_BOOTX_TEXT) 53 + #elif defined(CONFIG_PPC_EARLY_DEBUG_BOOTX) 54 54 udbg_init_btext(); 55 55 #elif defined(CONFIG_PPC_EARLY_DEBUG_44x) 56 56 /* PPC44x debug */
+1 -1
arch/powerpc/kernel/vdso.c
··· 711 711 } 712 712 713 713 #ifdef CONFIG_PPC64 714 - int __cpuinit vdso_getcpu_init(void) 714 + int vdso_getcpu_init(void) 715 715 { 716 716 unsigned long cpu, node, val; 717 717
+1 -1
arch/powerpc/kvm/book3s_64_mmu_host.c
··· 34 34 void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) 35 35 { 36 36 ppc_md.hpte_invalidate(pte->slot, pte->host_vpn, 37 - MMU_PAGE_4K, MMU_SEGSIZE_256M, 37 + MMU_PAGE_4K, MMU_PAGE_4K, MMU_SEGSIZE_256M, 38 38 false); 39 39 } 40 40
+5 -3
arch/powerpc/kvm/book3s_64_mmu_hv.c
··· 675 675 } 676 676 /* if the guest wants write access, see if that is OK */ 677 677 if (!writing && hpte_is_writable(r)) { 678 + unsigned int hugepage_shift; 678 679 pte_t *ptep, pte; 679 680 680 681 /* ··· 684 683 */ 685 684 rcu_read_lock_sched(); 686 685 ptep = find_linux_pte_or_hugepte(current->mm->pgd, 687 - hva, NULL); 688 - if (ptep && pte_present(*ptep)) { 689 - pte = kvmppc_read_update_linux_pte(ptep, 1); 686 + hva, &hugepage_shift); 687 + if (ptep) { 688 + pte = kvmppc_read_update_linux_pte(ptep, 1, 689 + hugepage_shift); 690 690 if (pte_write(pte)) 691 691 write_ok = 1; 692 692 }
+6 -8
arch/powerpc/kvm/book3s_hv_rm_mmu.c
··· 27 27 unsigned long addr = (unsigned long) x; 28 28 pte_t *p; 29 29 30 - p = find_linux_pte(swapper_pg_dir, addr); 30 + p = find_linux_pte_or_hugepte(swapper_pg_dir, addr, NULL); 31 31 if (!p || !pte_present(*p)) 32 32 return NULL; 33 33 /* assume we don't have huge pages in vmalloc space... */ ··· 139 139 { 140 140 pte_t *ptep; 141 141 unsigned long ps = *pte_sizep; 142 - unsigned int shift; 142 + unsigned int hugepage_shift; 143 143 144 - ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift); 144 + ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift); 145 145 if (!ptep) 146 146 return __pte(0); 147 - if (shift) 148 - *pte_sizep = 1ul << shift; 147 + if (hugepage_shift) 148 + *pte_sizep = 1ul << hugepage_shift; 149 149 else 150 150 *pte_sizep = PAGE_SIZE; 151 151 if (ps > *pte_sizep) 152 152 return __pte(0); 153 - if (!pte_present(*ptep)) 154 - return __pte(0); 155 - return kvmppc_read_update_linux_pte(ptep, writing); 153 + return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift); 156 154 } 157 155 158 156 static inline void unlock_hpte(unsigned long *hpte, unsigned long hpte_v)
+1 -1
arch/powerpc/lib/sstep.c
··· 580 580 if (instr & 1) 581 581 regs->link = regs->nip; 582 582 if (branch_taken(instr, regs)) 583 - regs->nip = imm; 583 + regs->nip = truncate_if_32bit(regs->msr, imm); 584 584 return 1; 585 585 #ifdef CONFIG_PPC64 586 586 case 17: /* sc */
+2 -1
arch/powerpc/math-emu/Makefile
··· 4 4 fmadd.o fmadds.o fmsub.o fmsubs.o \ 5 5 fmul.o fmuls.o fnabs.o fneg.o \ 6 6 fnmadd.o fnmadds.o fnmsub.o fnmsubs.o \ 7 - fres.o frsp.o frsqrte.o fsel.o lfs.o \ 7 + fres.o fre.o frsp.o fsel.o lfs.o \ 8 + frsqrte.o frsqrtes.o \ 8 9 fsqrt.o fsqrts.o fsub.o fsubs.o \ 9 10 mcrfs.o mffs.o mtfsb0.o mtfsb1.o \ 10 11 mtfsf.o mtfsfi.o stfiwx.o stfs.o \
+11
arch/powerpc/math-emu/fre.c
··· 1 + #include <linux/types.h> 2 + #include <linux/errno.h> 3 + #include <asm/uaccess.h> 4 + 5 + int fre(void *frD, void *frB) 6 + { 7 + #ifdef DEBUG 8 + printk("%s: %p %p\n", __func__, frD, frB); 9 + #endif 10 + return -ENOSYS; 11 + }
+11
arch/powerpc/math-emu/frsqrtes.c
··· 1 + #include <linux/types.h> 2 + #include <linux/errno.h> 3 + #include <asm/uaccess.h> 4 + 5 + int frsqrtes(void *frD, void *frB) 6 + { 7 + #ifdef DEBUG 8 + printk("%s: %p %p\n", __func__, frD, frB); 9 + #endif 10 + return 0; 11 + }
+10 -4
arch/powerpc/math-emu/math.c
··· 58 58 FLOATFUNC(fneg); 59 59 60 60 /* Optional */ 61 + FLOATFUNC(fre); 61 62 FLOATFUNC(fres); 62 63 FLOATFUNC(frsqrte); 64 + FLOATFUNC(frsqrtes); 63 65 FLOATFUNC(fsel); 64 66 FLOATFUNC(fsqrt); 65 67 FLOATFUNC(fsqrts); ··· 99 97 #define FSQRTS 0x016 /* 22 */ 100 98 #define FRES 0x018 /* 24 */ 101 99 #define FMULS 0x019 /* 25 */ 100 + #define FRSQRTES 0x01a /* 26 */ 102 101 #define FMSUBS 0x01c /* 28 */ 103 102 #define FMADDS 0x01d /* 29 */ 104 103 #define FNMSUBS 0x01e /* 30 */ ··· 112 109 #define FADD 0x015 /* 21 */ 113 110 #define FSQRT 0x016 /* 22 */ 114 111 #define FSEL 0x017 /* 23 */ 112 + #define FRE 0x018 /* 24 */ 115 113 #define FMUL 0x019 /* 25 */ 116 114 #define FRSQRTE 0x01a /* 26 */ 117 115 #define FMSUB 0x01c /* 28 */ ··· 303 299 case FDIVS: func = fdivs; type = AB; break; 304 300 case FSUBS: func = fsubs; type = AB; break; 305 301 case FADDS: func = fadds; type = AB; break; 306 - case FSQRTS: func = fsqrts; type = AB; break; 307 - case FRES: func = fres; type = AB; break; 302 + case FSQRTS: func = fsqrts; type = XB; break; 303 + case FRES: func = fres; type = XB; break; 308 304 case FMULS: func = fmuls; type = AC; break; 305 + case FRSQRTES: func = frsqrtes;type = XB; break; 309 306 case FMSUBS: func = fmsubs; type = ABC; break; 310 307 case FMADDS: func = fmadds; type = ABC; break; 311 308 case FNMSUBS: func = fnmsubs; type = ABC; break; ··· 322 317 case FDIV: func = fdiv; type = AB; break; 323 318 case FSUB: func = fsub; type = AB; break; 324 319 case FADD: func = fadd; type = AB; break; 325 - case FSQRT: func = fsqrt; type = AB; break; 320 + case FSQRT: func = fsqrt; type = XB; break; 321 + case FRE: func = fre; type = XB; break; 326 322 case FSEL: func = fsel; type = ABC; break; 327 323 case FMUL: func = fmul; type = AC; break; 328 - case FRSQRTE: func = frsqrte; type = AB; break; 324 + case FRSQRTE: func = frsqrte; type = XB; break; 329 325 case FMSUB: func = fmsub; type = ABC; break; 330 326 case FMADD: func = fmadd; type = ABC; break; 331 327 case FNMSUB: func = fnmsub; type = ABC; break;
+3 -3
arch/powerpc/mm/44x_mmu.c
··· 41 41 42 42 unsigned long tlb_47x_boltmap[1024/8]; 43 43 44 - static void __cpuinit ppc44x_update_tlb_hwater(void) 44 + static void ppc44x_update_tlb_hwater(void) 45 45 { 46 46 extern unsigned int tlb_44x_patch_hwater_D[]; 47 47 extern unsigned int tlb_44x_patch_hwater_I[]; ··· 134 134 /* 135 135 * "Pins" a 256MB TLB entry in AS0 for kernel lowmem for 47x type MMU 136 136 */ 137 - static void __cpuinit ppc47x_pin_tlb(unsigned int virt, unsigned int phys) 137 + static void ppc47x_pin_tlb(unsigned int virt, unsigned int phys) 138 138 { 139 139 unsigned int rA; 140 140 int bolted; ··· 229 229 } 230 230 231 231 #ifdef CONFIG_SMP 232 - void __cpuinit mmu_init_secondary(int cpu) 232 + void mmu_init_secondary(int cpu) 233 233 { 234 234 unsigned long addr; 235 235 unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1);
+4 -4
arch/powerpc/mm/Makefile
··· 6 6 7 7 ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) 8 8 9 - obj-y := fault.o mem.o pgtable.o gup.o \ 9 + obj-y := fault.o mem.o pgtable.o gup.o mmap.o \ 10 10 init_$(CONFIG_WORD_SIZE).o \ 11 11 pgtable_$(CONFIG_WORD_SIZE).o 12 12 obj-$(CONFIG_PPC_MMU_NOHASH) += mmu_context_nohash.o tlb_nohash.o \ 13 13 tlb_nohash_low.o 14 14 obj-$(CONFIG_PPC_BOOK3E) += tlb_low_$(CONFIG_WORD_SIZE)e.o 15 - obj-$(CONFIG_PPC64) += mmap_64.o 16 15 hash64-$(CONFIG_PPC_NATIVE) := hash_native_64.o 17 16 obj-$(CONFIG_PPC_STD_MMU_64) += hash_utils_64.o \ 18 17 slb_low.o slb.o stab.o \ 19 - mmap_64.o $(hash64-y) 18 + $(hash64-y) 20 19 obj-$(CONFIG_PPC_STD_MMU_32) += ppc_mmu_32.o 21 20 obj-$(CONFIG_PPC_STD_MMU) += hash_low_$(CONFIG_WORD_SIZE).o \ 22 21 tlb_hash$(CONFIG_WORD_SIZE).o \ ··· 27 28 obj-$(CONFIG_PPC_FSL_BOOK3E) += fsl_booke_mmu.o 28 29 obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o 29 30 obj-$(CONFIG_PPC_MM_SLICES) += slice.o 30 - ifeq ($(CONFIG_HUGETLB_PAGE),y) 31 31 obj-y += hugetlbpage.o 32 + ifeq ($(CONFIG_HUGETLB_PAGE),y) 32 33 obj-$(CONFIG_PPC_STD_MMU_64) += hugetlbpage-hash64.o 33 34 obj-$(CONFIG_PPC_BOOK3E_MMU) += hugetlbpage-book3e.o 34 35 endif 36 + obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hugepage-hash64.o 35 37 obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o 36 38 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o 37 39 obj-$(CONFIG_HIGHMEM) += highmem.o
+12 -6
arch/powerpc/mm/gup.c
··· 34 34 35 35 ptep = pte_offset_kernel(&pmd, addr); 36 36 do { 37 - pte_t pte = *ptep; 37 + pte_t pte = ACCESS_ONCE(*ptep); 38 38 struct page *page; 39 39 40 40 if ((pte_val(pte) & mask) != result) ··· 63 63 64 64 pmdp = pmd_offset(&pud, addr); 65 65 do { 66 - pmd_t pmd = *pmdp; 66 + pmd_t pmd = ACCESS_ONCE(*pmdp); 67 67 68 68 next = pmd_addr_end(addr, end); 69 - if (pmd_none(pmd)) 69 + /* 70 + * If we find a splitting transparent hugepage we 71 + * return zero. That will result in taking the slow 72 + * path which will call wait_split_huge_page() 73 + * if the pmd is still in splitting state 74 + */ 75 + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) 70 76 return 0; 71 - if (pmd_huge(pmd)) { 77 + if (pmd_huge(pmd) || pmd_large(pmd)) { 72 78 if (!gup_hugepte((pte_t *)pmdp, PMD_SIZE, addr, next, 73 79 write, pages, nr)) 74 80 return 0; ··· 97 91 98 92 pudp = pud_offset(&pgd, addr); 99 93 do { 100 - pud_t pud = *pudp; 94 + pud_t pud = ACCESS_ONCE(*pudp); 101 95 102 96 next = pud_addr_end(addr, end); 103 97 if (pud_none(pud)) ··· 160 154 161 155 pgdp = pgd_offset(mm, addr); 162 156 do { 163 - pgd_t pgd = *pgdp; 157 + pgd_t pgd = ACCESS_ONCE(*pgdp); 164 158 165 159 pr_devel(" %016lx: normal pgd %p\n", addr, 166 160 (void *)pgd_val(pgd));
+12 -9
arch/powerpc/mm/hash_low_64.S
··· 289 289 290 290 /* Call ppc_md.hpte_updatepp */ 291 291 mr r5,r29 /* vpn */ 292 - li r6,MMU_PAGE_4K /* page size */ 293 - ld r7,STK_PARAM(R9)(r1) /* segment size */ 294 - ld r8,STK_PARAM(R8)(r1) /* get "local" param */ 292 + li r6,MMU_PAGE_4K /* base page size */ 293 + li r7,MMU_PAGE_4K /* actual page size */ 294 + ld r8,STK_PARAM(R9)(r1) /* segment size */ 295 + ld r9,STK_PARAM(R8)(r1) /* get "local" param */ 295 296 _GLOBAL(htab_call_hpte_updatepp) 296 297 bl . /* Patched by htab_finish_init() */ 297 298 ··· 650 649 651 650 /* Call ppc_md.hpte_updatepp */ 652 651 mr r5,r29 /* vpn */ 653 - li r6,MMU_PAGE_4K /* page size */ 654 - ld r7,STK_PARAM(R9)(r1) /* segment size */ 655 - ld r8,STK_PARAM(R8)(r1) /* get "local" param */ 652 + li r6,MMU_PAGE_4K /* base page size */ 653 + li r7,MMU_PAGE_4K /* actual page size */ 654 + ld r8,STK_PARAM(R9)(r1) /* segment size */ 655 + ld r9,STK_PARAM(R8)(r1) /* get "local" param */ 656 656 _GLOBAL(htab_call_hpte_updatepp) 657 657 bl . /* patched by htab_finish_init() */ 658 658 ··· 939 937 940 938 /* Call ppc_md.hpte_updatepp */ 941 939 mr r5,r29 /* vpn */ 942 - li r6,MMU_PAGE_64K 943 - ld r7,STK_PARAM(R9)(r1) /* segment size */ 944 - ld r8,STK_PARAM(R8)(r1) /* get "local" param */ 940 + li r6,MMU_PAGE_64K /* base page size */ 941 + li r7,MMU_PAGE_64K /* actual page size */ 942 + ld r8,STK_PARAM(R9)(r1) /* segment size */ 943 + ld r9,STK_PARAM(R8)(r1) /* get "local" param */ 945 944 _GLOBAL(ht64_call_hpte_updatepp) 946 945 bl . /* patched by htab_finish_init() */ 947 946
+118 -77
arch/powerpc/mm/hash_native_64.c
··· 273 273 return i; 274 274 } 275 275 276 - static inline int __hpte_actual_psize(unsigned int lp, int psize) 277 - { 278 - int i, shift; 279 - unsigned int mask; 280 - 281 - /* start from 1 ignoring MMU_PAGE_4K */ 282 - for (i = 1; i < MMU_PAGE_COUNT; i++) { 283 - 284 - /* invalid penc */ 285 - if (mmu_psize_defs[psize].penc[i] == -1) 286 - continue; 287 - /* 288 - * encoding bits per actual page size 289 - * PTE LP actual page size 290 - * rrrr rrrz >=8KB 291 - * rrrr rrzz >=16KB 292 - * rrrr rzzz >=32KB 293 - * rrrr zzzz >=64KB 294 - * ....... 295 - */ 296 - shift = mmu_psize_defs[i].shift - LP_SHIFT; 297 - if (shift > LP_BITS) 298 - shift = LP_BITS; 299 - mask = (1 << shift) - 1; 300 - if ((lp & mask) == mmu_psize_defs[psize].penc[i]) 301 - return i; 302 - } 303 - return -1; 304 - } 305 - 306 - static inline int hpte_actual_psize(struct hash_pte *hptep, int psize) 307 - { 308 - /* Look at the 8 bit LP value */ 309 - unsigned int lp = (hptep->r >> LP_SHIFT) & ((1 << LP_BITS) - 1); 310 - 311 - if (!(hptep->v & HPTE_V_VALID)) 312 - return -1; 313 - 314 - /* First check if it is large page */ 315 - if (!(hptep->v & HPTE_V_LARGE)) 316 - return MMU_PAGE_4K; 317 - 318 - return __hpte_actual_psize(lp, psize); 319 - } 320 - 321 276 static long native_hpte_updatepp(unsigned long slot, unsigned long newpp, 322 - unsigned long vpn, int psize, int ssize, 323 - int local) 277 + unsigned long vpn, int bpsize, 278 + int apsize, int ssize, int local) 324 279 { 325 280 struct hash_pte *hptep = htab_address + slot; 326 281 unsigned long hpte_v, want_v; 327 282 int ret = 0; 328 - int actual_psize; 329 283 330 - want_v = hpte_encode_avpn(vpn, psize, ssize); 284 + want_v = hpte_encode_avpn(vpn, bpsize, ssize); 331 285 332 286 DBG_LOW(" update(vpn=%016lx, avpnv=%016lx, group=%lx, newpp=%lx)", 333 287 vpn, want_v & HPTE_V_AVPN, slot, newpp); ··· 289 335 native_lock_hpte(hptep); 290 336 291 337 hpte_v = hptep->v; 292 - actual_psize = hpte_actual_psize(hptep, psize); 293 338 /* 294 339 * We need to invalidate the TLB always because hpte_remove doesn't do 295 340 * a tlb invalidate. If a hash bucket gets full, we "evict" a more/less ··· 296 343 * (hpte_remove) because we assume the old translation is still 297 344 * technically "valid". 298 345 */ 299 - if (actual_psize < 0) { 300 - actual_psize = psize; 301 - ret = -1; 302 - goto err_out; 303 - } 304 - if (!HPTE_V_COMPARE(hpte_v, want_v)) { 346 + if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) { 305 347 DBG_LOW(" -> miss\n"); 306 348 ret = -1; 307 349 } else { ··· 305 357 hptep->r = (hptep->r & ~(HPTE_R_PP | HPTE_R_N)) | 306 358 (newpp & (HPTE_R_PP | HPTE_R_N | HPTE_R_C)); 307 359 } 308 - err_out: 309 360 native_unlock_hpte(hptep); 310 361 311 362 /* Ensure it is out of the tlb too. */ 312 - tlbie(vpn, psize, actual_psize, ssize, local); 363 + tlbie(vpn, bpsize, apsize, ssize, local); 313 364 314 365 return ret; 315 366 } ··· 349 402 static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea, 350 403 int psize, int ssize) 351 404 { 352 - int actual_psize; 353 405 unsigned long vpn; 354 406 unsigned long vsid; 355 407 long slot; ··· 361 415 if (slot == -1) 362 416 panic("could not find page to bolt\n"); 363 417 hptep = htab_address + slot; 364 - actual_psize = hpte_actual_psize(hptep, psize); 365 - if (actual_psize < 0) 366 - actual_psize = psize; 367 418 368 419 /* Update the HPTE */ 369 420 hptep->r = (hptep->r & ~(HPTE_R_PP | HPTE_R_N)) | 370 421 (newpp & (HPTE_R_PP | HPTE_R_N)); 371 - 372 - /* Ensure it is out of the tlb too. */ 373 - tlbie(vpn, psize, actual_psize, ssize, 0); 422 + /* 423 + * Ensure it is out of the tlb too. Bolted entries base and 424 + * actual page size will be same. 425 + */ 426 + tlbie(vpn, psize, psize, ssize, 0); 374 427 } 375 428 376 429 static void native_hpte_invalidate(unsigned long slot, unsigned long vpn, 377 - int psize, int ssize, int local) 430 + int bpsize, int apsize, int ssize, int local) 378 431 { 379 432 struct hash_pte *hptep = htab_address + slot; 380 433 unsigned long hpte_v; 381 434 unsigned long want_v; 382 435 unsigned long flags; 383 - int actual_psize; 384 436 385 437 local_irq_save(flags); 386 438 387 439 DBG_LOW(" invalidate(vpn=%016lx, hash: %lx)\n", vpn, slot); 388 440 389 - want_v = hpte_encode_avpn(vpn, psize, ssize); 441 + want_v = hpte_encode_avpn(vpn, bpsize, ssize); 390 442 native_lock_hpte(hptep); 391 443 hpte_v = hptep->v; 392 444 393 - actual_psize = hpte_actual_psize(hptep, psize); 394 445 /* 395 446 * We need to invalidate the TLB always because hpte_remove doesn't do 396 447 * a tlb invalidate. If a hash bucket gets full, we "evict" a more/less ··· 395 452 * (hpte_remove) because we assume the old translation is still 396 453 * technically "valid". 397 454 */ 398 - if (actual_psize < 0) { 399 - actual_psize = psize; 400 - native_unlock_hpte(hptep); 401 - goto err_out; 402 - } 403 - if (!HPTE_V_COMPARE(hpte_v, want_v)) 455 + if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) 404 456 native_unlock_hpte(hptep); 405 457 else 406 458 /* Invalidate the hpte. NOTE: this also unlocks it */ 407 459 hptep->v = 0; 408 460 409 - err_out: 410 461 /* Invalidate the TLB */ 411 - tlbie(vpn, psize, actual_psize, ssize, local); 462 + tlbie(vpn, bpsize, apsize, ssize, local); 463 + 412 464 local_irq_restore(flags); 465 + } 466 + 467 + static void native_hugepage_invalidate(struct mm_struct *mm, 468 + unsigned char *hpte_slot_array, 469 + unsigned long addr, int psize) 470 + { 471 + int ssize = 0, i; 472 + int lock_tlbie; 473 + struct hash_pte *hptep; 474 + int actual_psize = MMU_PAGE_16M; 475 + unsigned int max_hpte_count, valid; 476 + unsigned long flags, s_addr = addr; 477 + unsigned long hpte_v, want_v, shift; 478 + unsigned long hidx, vpn = 0, vsid, hash, slot; 479 + 480 + shift = mmu_psize_defs[psize].shift; 481 + max_hpte_count = 1U << (PMD_SHIFT - shift); 482 + 483 + local_irq_save(flags); 484 + for (i = 0; i < max_hpte_count; i++) { 485 + valid = hpte_valid(hpte_slot_array, i); 486 + if (!valid) 487 + continue; 488 + hidx = hpte_hash_index(hpte_slot_array, i); 489 + 490 + /* get the vpn */ 491 + addr = s_addr + (i * (1ul << shift)); 492 + if (!is_kernel_addr(addr)) { 493 + ssize = user_segment_size(addr); 494 + vsid = get_vsid(mm->context.id, addr, ssize); 495 + WARN_ON(vsid == 0); 496 + } else { 497 + vsid = get_kernel_vsid(addr, mmu_kernel_ssize); 498 + ssize = mmu_kernel_ssize; 499 + } 500 + 501 + vpn = hpt_vpn(addr, vsid, ssize); 502 + hash = hpt_hash(vpn, shift, ssize); 503 + if (hidx & _PTEIDX_SECONDARY) 504 + hash = ~hash; 505 + 506 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 507 + slot += hidx & _PTEIDX_GROUP_IX; 508 + 509 + hptep = htab_address + slot; 510 + want_v = hpte_encode_avpn(vpn, psize, ssize); 511 + native_lock_hpte(hptep); 512 + hpte_v = hptep->v; 513 + 514 + /* Even if we miss, we need to invalidate the TLB */ 515 + if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID)) 516 + native_unlock_hpte(hptep); 517 + else 518 + /* Invalidate the hpte. NOTE: this also unlocks it */ 519 + hptep->v = 0; 520 + } 521 + /* 522 + * Since this is a hugepage, we just need a single tlbie. 523 + * use the last vpn. 524 + */ 525 + lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); 526 + if (lock_tlbie) 527 + raw_spin_lock(&native_tlbie_lock); 528 + 529 + asm volatile("ptesync":::"memory"); 530 + __tlbie(vpn, psize, actual_psize, ssize); 531 + asm volatile("eieio; tlbsync; ptesync":::"memory"); 532 + 533 + if (lock_tlbie) 534 + raw_spin_unlock(&native_tlbie_lock); 535 + 536 + local_irq_restore(flags); 537 + } 538 + 539 + static inline int __hpte_actual_psize(unsigned int lp, int psize) 540 + { 541 + int i, shift; 542 + unsigned int mask; 543 + 544 + /* start from 1 ignoring MMU_PAGE_4K */ 545 + for (i = 1; i < MMU_PAGE_COUNT; i++) { 546 + 547 + /* invalid penc */ 548 + if (mmu_psize_defs[psize].penc[i] == -1) 549 + continue; 550 + /* 551 + * encoding bits per actual page size 552 + * PTE LP actual page size 553 + * rrrr rrrz >=8KB 554 + * rrrr rrzz >=16KB 555 + * rrrr rzzz >=32KB 556 + * rrrr zzzz >=64KB 557 + * ....... 558 + */ 559 + shift = mmu_psize_defs[i].shift - LP_SHIFT; 560 + if (shift > LP_BITS) 561 + shift = LP_BITS; 562 + mask = (1 << shift) - 1; 563 + if ((lp & mask) == mmu_psize_defs[psize].penc[i]) 564 + return i; 565 + } 566 + return -1; 413 567 } 414 568 415 569 static void hpte_decode(struct hash_pte *hpte, unsigned long slot, ··· 712 672 ppc_md.hpte_remove = native_hpte_remove; 713 673 ppc_md.hpte_clear_all = native_hpte_clear; 714 674 ppc_md.flush_hash_range = native_flush_hash_range; 675 + ppc_md.hugepage_invalidate = native_hugepage_invalidate; 715 676 }
+48 -21
arch/powerpc/mm/hash_utils_64.c
··· 807 807 } 808 808 809 809 #ifdef CONFIG_SMP 810 - void __cpuinit early_init_mmu_secondary(void) 810 + void early_init_mmu_secondary(void) 811 811 { 812 812 /* Initialize hash table for that CPU */ 813 813 if (!firmware_has_feature(FW_FEATURE_LPAR)) ··· 1050 1050 goto bail; 1051 1051 } 1052 1052 1053 - #ifdef CONFIG_HUGETLB_PAGE 1054 1053 if (hugeshift) { 1055 - rc = __hash_page_huge(ea, access, vsid, ptep, trap, local, 1056 - ssize, hugeshift, psize); 1054 + if (pmd_trans_huge(*(pmd_t *)ptep)) 1055 + rc = __hash_page_thp(ea, access, vsid, (pmd_t *)ptep, 1056 + trap, local, ssize, psize); 1057 + #ifdef CONFIG_HUGETLB_PAGE 1058 + else 1059 + rc = __hash_page_huge(ea, access, vsid, ptep, trap, 1060 + local, ssize, hugeshift, psize); 1061 + #else 1062 + else { 1063 + /* 1064 + * if we have hugeshift, and is not transhuge with 1065 + * hugetlb disabled, something is really wrong. 1066 + */ 1067 + rc = 1; 1068 + WARN_ON(1); 1069 + } 1070 + #endif 1057 1071 goto bail; 1058 1072 } 1059 - #endif /* CONFIG_HUGETLB_PAGE */ 1060 1073 1061 1074 #ifndef CONFIG_PPC_64K_PAGES 1062 1075 DBG_LOW(" i-pte: %016lx\n", pte_val(*ptep)); ··· 1158 1145 void hash_preload(struct mm_struct *mm, unsigned long ea, 1159 1146 unsigned long access, unsigned long trap) 1160 1147 { 1148 + int hugepage_shift; 1161 1149 unsigned long vsid; 1162 1150 pgd_t *pgdir; 1163 1151 pte_t *ptep; ··· 1180 1166 pgdir = mm->pgd; 1181 1167 if (pgdir == NULL) 1182 1168 return; 1183 - ptep = find_linux_pte(pgdir, ea); 1184 - if (!ptep) 1185 - return; 1186 1169 1170 + /* Get VSID */ 1171 + ssize = user_segment_size(ea); 1172 + vsid = get_vsid(mm->context.id, ea, ssize); 1173 + if (!vsid) 1174 + return; 1175 + /* 1176 + * Hash doesn't like irqs. Walking linux page table with irq disabled 1177 + * saves us from holding multiple locks. 1178 + */ 1179 + local_irq_save(flags); 1180 + 1181 + /* 1182 + * THP pages use update_mmu_cache_pmd. We don't do 1183 + * hash preload there. Hence can ignore THP here 1184 + */ 1185 + ptep = find_linux_pte_or_hugepte(pgdir, ea, &hugepage_shift); 1186 + if (!ptep) 1187 + goto out_exit; 1188 + 1189 + WARN_ON(hugepage_shift); 1187 1190 #ifdef CONFIG_PPC_64K_PAGES 1188 1191 /* If either _PAGE_4K_PFN or _PAGE_NO_CACHE is set (and we are on 1189 1192 * a 64K kernel), then we don't preload, hash_page() will take ··· 1209 1178 * page size demotion here 1210 1179 */ 1211 1180 if (pte_val(*ptep) & (_PAGE_4K_PFN | _PAGE_NO_CACHE)) 1212 - return; 1181 + goto out_exit; 1213 1182 #endif /* CONFIG_PPC_64K_PAGES */ 1214 - 1215 - /* Get VSID */ 1216 - ssize = user_segment_size(ea); 1217 - vsid = get_vsid(mm->context.id, ea, ssize); 1218 - if (!vsid) 1219 - return; 1220 - 1221 - /* Hash doesn't like irqs */ 1222 - local_irq_save(flags); 1223 1183 1224 1184 /* Is that local to this CPU ? */ 1225 1185 if (cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id()))) ··· 1233 1211 mm->context.user_psize, 1234 1212 mm->context.user_psize, 1235 1213 pte_val(*ptep)); 1236 - 1214 + out_exit: 1237 1215 local_irq_restore(flags); 1238 1216 } ··· 1254 1232 slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 1255 1233 slot += hidx & _PTEIDX_GROUP_IX; 1256 1234 DBG_LOW(" sub %ld: hash=%lx, hidx=%lx\n", index, slot, hidx); 1257 - ppc_md.hpte_invalidate(slot, vpn, psize, ssize, local); 1235 + /* 1236 + * We use same base page size and actual psize, because we don't 1237 + * use these functions for hugepage 1238 + */ 1239 + ppc_md.hpte_invalidate(slot, vpn, psize, psize, ssize, local); 1258 1240 } pte_iterate_hashed_end(); 1259 1241 1260 1242 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM ··· 1391 1365 hash = ~hash; 1392 1366 slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 1393 1367 slot += hidx & _PTEIDX_GROUP_IX; 1394 - ppc_md.hpte_invalidate(slot, vpn, mmu_linear_psize, mmu_kernel_ssize, 0); 1368 + ppc_md.hpte_invalidate(slot, vpn, mmu_linear_psize, mmu_linear_psize, 1369 + mmu_kernel_ssize, 0); 1395 1370 } 1396 1371 1397 1372 void kernel_map_pages(struct page *page, int numpages, int enable)
+175
arch/powerpc/mm/hugepage-hash64.c
··· 1 + /* 2 + * Copyright IBM Corporation, 2013 3 + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> 4 + * 5 + * This program is free software; you can redistribute it and/or modify it 6 + * under the terms of version 2.1 of the GNU Lesser General Public License 7 + * as published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it would be useful, but 10 + * WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 12 + * 13 + */ 14 + 15 + /* 16 + * PPC64 THP Support for hash based MMUs 17 + */ 18 + #include <linux/mm.h> 19 + #include <asm/machdep.h> 20 + 21 + int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid, 22 + pmd_t *pmdp, unsigned long trap, int local, int ssize, 23 + unsigned int psize) 24 + { 25 + unsigned int index, valid; 26 + unsigned char *hpte_slot_array; 27 + unsigned long rflags, pa, hidx; 28 + unsigned long old_pmd, new_pmd; 29 + int ret, lpsize = MMU_PAGE_16M; 30 + unsigned long vpn, hash, shift, slot; 31 + 32 + /* 33 + * atomically mark the linux large page PMD busy and dirty 34 + */ 35 + do { 36 + old_pmd = pmd_val(*pmdp); 37 + /* If PMD busy, retry the access */ 38 + if (unlikely(old_pmd & _PAGE_BUSY)) 39 + return 0; 40 + /* If PMD is trans splitting retry the access */ 41 + if (unlikely(old_pmd & _PAGE_SPLITTING)) 42 + return 0; 43 + /* If PMD permissions don't match, take page fault */ 44 + if (unlikely(access & ~old_pmd)) 45 + return 1; 46 + /* 47 + * Try to lock the PTE, add ACCESSED and DIRTY if it was 48 + * a write access 49 + */ 50 + new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED; 51 + if (access & _PAGE_RW) 52 + new_pmd |= _PAGE_DIRTY; 53 + } while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp, 54 + old_pmd, new_pmd)); 55 + /* 56 + * PP bits. _PAGE_USER is already PP bit 0x2, so we only 57 + * need to add in 0x1 if it's a read-only user page 58 + */ 59 + rflags = new_pmd & _PAGE_USER; 60 + if ((new_pmd & _PAGE_USER) && !((new_pmd & _PAGE_RW) && 61 + (new_pmd & _PAGE_DIRTY))) 62 + rflags |= 0x1; 63 + /* 64 + * _PAGE_EXEC -> HW_NO_EXEC since it's inverted 65 + */ 66 + rflags |= ((new_pmd & _PAGE_EXEC) ? 0 : HPTE_R_N); 67 + 68 + #if 0 69 + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) { 70 + 71 + /* 72 + * No CPU has hugepages but lacks no execute, so we 73 + * don't need to worry about that case 74 + */ 75 + rflags = hash_page_do_lazy_icache(rflags, __pte(old_pte), trap); 76 + } 77 + #endif 78 + /* 79 + * Find the slot index details for this ea, using base page size. 80 + */ 81 + shift = mmu_psize_defs[psize].shift; 82 + index = (ea & ~HPAGE_PMD_MASK) >> shift; 83 + BUG_ON(index >= 4096); 84 + 85 + vpn = hpt_vpn(ea, vsid, ssize); 86 + hash = hpt_hash(vpn, shift, ssize); 87 + hpte_slot_array = get_hpte_slot_array(pmdp); 88 + 89 + valid = hpte_valid(hpte_slot_array, index); 90 + if (valid) { 91 + /* update the hpte bits */ 92 + hidx = hpte_hash_index(hpte_slot_array, index); 93 + if (hidx & _PTEIDX_SECONDARY) 94 + hash = ~hash; 95 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 96 + slot += hidx & _PTEIDX_GROUP_IX; 97 + 98 + ret = ppc_md.hpte_updatepp(slot, rflags, vpn, 99 + psize, lpsize, ssize, local); 100 + /* 101 + * We failed to update, try to insert a new entry. 102 + */ 103 + if (ret == -1) { 104 + /* 105 + * large pte is marked busy, so we can be sure 106 + * nobody is looking at hpte_slot_array. hence we can 107 + * safely update this here. 108 + */ 109 + valid = 0; 110 + new_pmd &= ~_PAGE_HPTEFLAGS; 111 + hpte_slot_array[index] = 0; 112 + } else 113 + /* clear the busy bits and set the hash pte bits */ 114 + new_pmd = (new_pmd & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE; 115 + } 116 + 117 + if (!valid) { 118 + unsigned long hpte_group; 119 + 120 + /* insert new entry */ 121 + pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT; 122 + repeat: 123 + hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; 124 + 125 + /* clear the busy bits and set the hash pte bits */ 126 + new_pmd = (new_pmd & ~_PAGE_HPTEFLAGS) | _PAGE_HASHPTE; 127 + 128 + /* Add in WIMG bits */ 129 + rflags |= (new_pmd & (_PAGE_WRITETHRU | _PAGE_NO_CACHE | 130 + _PAGE_COHERENT | _PAGE_GUARDED)); 131 + 132 + /* Insert into the hash table, primary slot */ 133 + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0, 134 + psize, lpsize, ssize); 135 + /* 136 + * Primary is full, try the secondary 137 + */ 138 + if (unlikely(slot == -1)) { 139 + hpte_group = ((~hash & htab_hash_mask) * 140 + HPTES_PER_GROUP) & ~0x7UL; 141 + slot = ppc_md.hpte_insert(hpte_group, vpn, pa, 142 + rflags, HPTE_V_SECONDARY, 143 + psize, lpsize, ssize); 144 + if (slot == -1) { 145 + if (mftb() & 0x1) 146 + hpte_group = ((hash & htab_hash_mask) * 147 + HPTES_PER_GROUP) & ~0x7UL; 148 + 149 + ppc_md.hpte_remove(hpte_group); 150 + goto repeat; 151 + } 152 + } 153 + /* 154 + * Hypervisor failure. Restore old pmd and return -1 155 + * similar to __hash_page_* 156 + */ 157 + if (unlikely(slot == -2)) { 158 + *pmdp = __pmd(old_pmd); 159 + hash_failure_debug(ea, access, vsid, trap, ssize, 160 + psize, lpsize, old_pmd); 161 + return -1; 162 + } 163 + /* 164 + * large pte is marked busy, so we can be sure 165 + * nobody is looking at hpte_slot_array. hence we can 166 + * safely update this here. 167 + */ 168 + mark_hpte_slot_valid(hpte_slot_array, index, slot); 169 + } 170 + /* 171 + * No need to use ldarx/stdcx here 172 + */ 173 + *pmdp = __pmd(new_pmd & ~_PAGE_BUSY); 174 + return 0; 175 + }
+1 -1
arch/powerpc/mm/hugetlbpage-hash64.c
··· 81 81 slot += (old_pte & _PAGE_F_GIX) >> 12; 82 82 83 83 if (ppc_md.hpte_updatepp(slot, rflags, vpn, mmu_psize, 84 - ssize, local) == -1) 84 + mmu_psize, ssize, local) == -1) 85 85 old_pte &= ~_PAGE_HPTEFLAGS; 86 86 } 87 87
+174 -125
arch/powerpc/mm/hugetlbpage.c
··· 21 21 #include <asm/pgalloc.h> 22 22 #include <asm/tlb.h> 23 23 #include <asm/setup.h> 24 + #include <asm/hugetlb.h> 25 + 26 + #ifdef CONFIG_HUGETLB_PAGE 24 27 25 28 #define PAGE_SHIFT_64K 16 26 29 #define PAGE_SHIFT_16M 24 ··· 103 100 } 104 101 #endif 105 102 106 - /* 107 - * We have 4 cases for pgds and pmds: 108 - * (1) invalid (all zeroes) 109 - * (2) pointer to next table, as normal; bottom 6 bits == 0 110 - * (3) leaf pte for huge page, bottom two bits != 00 111 - * (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table 112 - */ 113 - pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift) 114 - { 115 - pgd_t *pg; 116 - pud_t *pu; 117 - pmd_t *pm; 118 - pte_t *ret_pte; 119 - hugepd_t *hpdp = NULL; 120 - unsigned pdshift = PGDIR_SHIFT; 121 - 122 - if (shift) 123 - *shift = 0; 124 - 125 - pg = pgdir + pgd_index(ea); 126 - 127 - if (pgd_huge(*pg)) { 128 - ret_pte = (pte_t *) pg; 129 - goto out; 130 - } else if (is_hugepd(pg)) 131 - hpdp = (hugepd_t *)pg; 132 - else if (!pgd_none(*pg)) { 133 - pdshift = PUD_SHIFT; 134 - pu = pud_offset(pg, ea); 135 - 136 - if (pud_huge(*pu)) { 137 - ret_pte = (pte_t *) pu; 138 - goto out; 139 - } else if (is_hugepd(pu)) 140 - hpdp = (hugepd_t *)pu; 141 - else if (!pud_none(*pu)) { 142 - pdshift = PMD_SHIFT; 143 - pm = pmd_offset(pu, ea); 144 - 145 - if (pmd_huge(*pm)) { 146 - ret_pte = (pte_t *) pm; 147 - goto out; 148 - } else if (is_hugepd(pm)) 149 - hpdp = (hugepd_t *)pm; 150 - else if (!pmd_none(*pm)) 151 - return pte_offset_kernel(pm, ea); 152 - } 153 - } 154 - if (!hpdp) 155 - return NULL; 156 - 157 - ret_pte = hugepte_offset(hpdp, ea, pdshift); 158 - pdshift = hugepd_shift(*hpdp); 159 - out: 160 - if (shift) 161 - *shift = pdshift; 162 - return ret_pte; 163 - } 164 - EXPORT_SYMBOL_GPL(find_linux_pte_or_hugepte); 165 - 166 103 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) 167 104 { 105 + /* Only called for hugetlbfs pages, hence can ignore THP */ 168 
106 return find_linux_pte_or_hugepte(mm->pgd, addr, NULL); 169 107 } 170 108 ··· 680 736 struct page *page; 681 737 unsigned shift; 682 738 unsigned long mask; 683 - 739 + /* 740 + * Transparent hugepages are handled by generic code. We can skip them 741 + * here. 742 + */ 684 743 ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift); 685 744 686 745 /* Verify it is a huge page else bail. */ 687 - if (!ptep || !shift) 746 + if (!ptep || !shift || pmd_trans_huge(*(pmd_t *)ptep)) 688 747 return ERR_PTR(-EINVAL); 689 748 690 749 mask = (1UL << shift) - 1; ··· 704 757 { 705 758 BUG(); 706 759 return NULL; 707 - } 708 - 709 - int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, 710 - unsigned long end, int write, struct page **pages, int *nr) 711 - { 712 - unsigned long mask; 713 - unsigned long pte_end; 714 - struct page *head, *page, *tail; 715 - pte_t pte; 716 - int refs; 717 - 718 - pte_end = (addr + sz) & ~(sz-1); 719 - if (pte_end < end) 720 - end = pte_end; 721 - 722 - pte = *ptep; 723 - mask = _PAGE_PRESENT | _PAGE_USER; 724 - if (write) 725 - mask |= _PAGE_RW; 726 - 727 - if ((pte_val(pte) & mask) != mask) 728 - return 0; 729 - 730 - /* hugepages are never "special" */ 731 - VM_BUG_ON(!pfn_valid(pte_pfn(pte))); 732 - 733 - refs = 0; 734 - head = pte_page(pte); 735 - 736 - page = head + ((addr & (sz-1)) >> PAGE_SHIFT); 737 - tail = page; 738 - do { 739 - VM_BUG_ON(compound_head(page) != head); 740 - pages[*nr] = page; 741 - (*nr)++; 742 - page++; 743 - refs++; 744 - } while (addr += PAGE_SIZE, addr != end); 745 - 746 - if (!page_cache_add_speculative(head, refs)) { 747 - *nr -= refs; 748 - return 0; 749 - } 750 - 751 - if (unlikely(pte_val(pte) != pte_val(*ptep))) { 752 - /* Could be optimized better */ 753 - *nr -= refs; 754 - while (refs--) 755 - put_page(head); 756 - return 0; 757 - } 758 - 759 - /* 760 - * Any tail page need their mapcount reference taken before we 761 - * return. 
762 - */ 763 - while (refs--) { 764 - if (PageTail(tail)) 765 - get_huge_page_tail(tail); 766 - tail++; 767 - } 768 - 769 - return 1; 770 760 } 771 761 772 762 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, ··· 921 1037 kunmap_atomic(start); 922 1038 } 923 1039 } 1040 + } 1041 + 1042 + #endif /* CONFIG_HUGETLB_PAGE */ 1043 + 1044 + /* 1045 + * We have 4 cases for pgds and pmds: 1046 + * (1) invalid (all zeroes) 1047 + * (2) pointer to next table, as normal; bottom 6 bits == 0 1048 + * (3) leaf pte for huge page, bottom two bits != 00 1049 + * (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table 1050 + * 1051 + * So long as we atomically load page table pointers we are safe against teardown, 1052 + * we can follow the address down to the the page and take a ref on it. 1053 + */ 1054 + 1055 + pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift) 1056 + { 1057 + pgd_t pgd, *pgdp; 1058 + pud_t pud, *pudp; 1059 + pmd_t pmd, *pmdp; 1060 + pte_t *ret_pte; 1061 + hugepd_t *hpdp = NULL; 1062 + unsigned pdshift = PGDIR_SHIFT; 1063 + 1064 + if (shift) 1065 + *shift = 0; 1066 + 1067 + pgdp = pgdir + pgd_index(ea); 1068 + pgd = ACCESS_ONCE(*pgdp); 1069 + /* 1070 + * Always operate on the local stack value. This make sure the 1071 + * value don't get updated by a parallel THP split/collapse, 1072 + * page fault or a page unmap. The return pte_t * is still not 1073 + * stable. So should be checked there for above conditions. 
1074 + */
1075 + if (pgd_none(pgd))
1076 + return NULL;
1077 + else if (pgd_huge(pgd)) {
1078 + ret_pte = (pte_t *) pgdp;
1079 + goto out;
1080 + } else if (is_hugepd(&pgd))
1081 + hpdp = (hugepd_t *)&pgd;
1082 + else {
1083 + /*
1084 + * Even if we end up with an unmap, the pgtable will not
1085 + * be freed, because we do an RCU free and we are called
1086 + * here with IRQs disabled
1087 + */
1088 + pdshift = PUD_SHIFT;
1089 + pudp = pud_offset(&pgd, ea);
1090 + pud = ACCESS_ONCE(*pudp);
1091 +
1092 + if (pud_none(pud))
1093 + return NULL;
1094 + else if (pud_huge(pud)) {
1095 + ret_pte = (pte_t *) pudp;
1096 + goto out;
1097 + } else if (is_hugepd(&pud))
1098 + hpdp = (hugepd_t *)&pud;
1099 + else {
1100 + pdshift = PMD_SHIFT;
1101 + pmdp = pmd_offset(&pud, ea);
1102 + pmd = ACCESS_ONCE(*pmdp);
1103 + /*
1104 + * A hugepage collapse is captured by pmd_none, because
1105 + * it marks the pmd none and does a hpte invalidate.
1106 + *
1107 + * A hugepage split is captured by pmd_trans_splitting,
1108 + * because we mark the pmd trans splitting and do a
1109 + * hpte invalidate
1110 + *
1111 + */
1112 + if (pmd_none(pmd) || pmd_trans_splitting(pmd))
1113 + return NULL;
1114 +
1115 + if (pmd_huge(pmd) || pmd_large(pmd)) {
1116 + ret_pte = (pte_t *) pmdp;
1117 + goto out;
1118 + } else if (is_hugepd(&pmd))
1119 + hpdp = (hugepd_t *)&pmd;
1120 + else
1121 + return pte_offset_kernel(&pmd, ea);
1122 + }
1123 + }
1124 + if (!hpdp)
1125 + return NULL;
1126 +
1127 + ret_pte = hugepte_offset(hpdp, ea, pdshift);
1128 + pdshift = hugepd_shift(*hpdp);
1129 + out:
1130 + if (shift)
1131 + *shift = pdshift;
1132 + return ret_pte;
1133 + }
1134 + EXPORT_SYMBOL_GPL(find_linux_pte_or_hugepte);
1135 +
1136 + int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
1137 + unsigned long end, int write, struct page **pages, int *nr)
1138 + {
1139 + unsigned long mask;
1140 + unsigned long pte_end;
1141 + struct page *head, *page, *tail;
1142 + pte_t pte;
1143 + int refs;
1144 +
1145 + pte_end = (addr + sz) & ~(sz-1);
1146 + if (pte_end < end)
1147 + end = pte_end;
1148 +
1149 + pte = ACCESS_ONCE(*ptep);
1150 + mask = _PAGE_PRESENT | _PAGE_USER;
1151 + if (write)
1152 + mask |= _PAGE_RW;
1153 +
1154 + if ((pte_val(pte) & mask) != mask)
1155 + return 0;
1156 +
1157 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
1158 + /*
1159 + * check for splitting here
1160 + */
1161 + if (pmd_trans_splitting(pte_pmd(pte)))
1162 + return 0;
1163 + #endif
1164 +
1165 + /* hugepages are never "special" */
1166 + VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
1167 +
1168 + refs = 0;
1169 + head = pte_page(pte);
1170 +
1171 + page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
1172 + tail = page;
1173 + do {
1174 + VM_BUG_ON(compound_head(page) != head);
1175 + pages[*nr] = page;
1176 + (*nr)++;
1177 + page++;
1178 + refs++;
1179 + } while (addr += PAGE_SIZE, addr != end);
1180 +
1181 + if (!page_cache_add_speculative(head, refs)) {
1182 + *nr -= refs;
1183 + return 0;
1184 + }
1185 +
1186 + if (unlikely(pte_val(pte) != pte_val(*ptep))) {
1187 + /* Could be optimized better */
1188 + *nr -= refs;
1189 + while (refs--)
1190 + put_page(head);
1191 + return 0;
1192 + }
1193 +
1194 + /*
1195 + * Any tail pages need their mapcount reference taken before we
1196 + * return.
1197 + */
1198 + while (refs--) {
1199 + if (PageTail(tail))
1200 + get_huge_page_tail(tail);
1201 + tail++;
1202 + }
1203 +
1204 + return 1;
924 1205 }
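The descent above stops at the first leaf it finds and reports the page-size shift of that level, so a "huge" entry high in the tree is returned directly instead of being walked through. As a rough illustration of that idea (a toy model with invented names and sizes, not the kernel's page-table code), a two-level walk that returns a pointer to the leaf plus the shift of the level where it stopped might look like:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy two-level table: an entry is either a "leaf" (possibly a large
 * mapping at the upper level) or a pointer to a lower-level table.
 * Mirrors the shape of find_linux_pte_or_hugepte(): descend until a
 * leaf, report the shift of the level where the walk stopped. */
#define TOY_IDX_BITS   4                    /* 16 entries per level */
#define TOY_PAGE_SHIFT 12                   /* 4K base pages        */

struct toy_entry {
	uint64_t leaf;                      /* non-zero => leaf          */
	struct toy_entry *next;             /* lower table when not leaf */
};

/* Walk 'root' for effective address 'ea'; on success store the shift
 * of the mapping level in *shift and return a pointer to the leaf. */
static uint64_t *toy_walk(struct toy_entry *root, uint64_t ea,
			  unsigned int *shift)
{
	unsigned int top_shift = TOY_PAGE_SHIFT + TOY_IDX_BITS;
	struct toy_entry *e;

	e = &root[(ea >> top_shift) & ((1u << TOY_IDX_BITS) - 1)];
	if (e->leaf) {                      /* "huge" leaf at top level */
		*shift = top_shift;
		return &e->leaf;
	}
	if (!e->next)
		return NULL;                /* nothing mapped here */
	e = &e->next[(ea >> TOY_PAGE_SHIFT) & ((1u << TOY_IDX_BITS) - 1)];
	if (!e->leaf)
		return NULL;
	*shift = TOY_PAGE_SHIFT;            /* normal base page */
	return &e->leaf;
}
```

The caller can then use the returned shift to decide whether it is looking at a base page or a larger mapping, which is exactly how the `hugepage_shift` out-parameter is used by the callers in this series.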
+6 -3
arch/powerpc/mm/init_64.c
···
88 88
89 89 static void pmd_ctor(void *addr)
90 90 {
91 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
92 + memset(addr, 0, PMD_TABLE_SIZE * 2);
93 + #else
91 94 memset(addr, 0, PMD_TABLE_SIZE);
95 + #endif
92 96 }
93 97
94 98 struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
···
141 137 void pgtable_cache_init(void)
142 138 {
143 139 pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);
144 - pgtable_cache_add(PMD_INDEX_SIZE, pmd_ctor);
145 - if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_INDEX_SIZE))
140 + pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);
141 + if (!PGT_CACHE(PGD_INDEX_SIZE) || !PGT_CACHE(PMD_CACHE_INDEX))
146 142 panic("Couldn't allocate pgtable caches");
147 -
148 143 /* In all current configs, when the PUD index exists it's the
149 144 * same size as either the pgd or pmd index. Verify that the
150 145 * initialization above has also created a PUD cache. This
+4
arch/powerpc/mm/mem.c
···
461 461 pte_t *ptep)
462 462 {
463 463 #ifdef CONFIG_PPC_STD_MMU
464 + /*
465 + * We don't need to worry about _PAGE_PRESENT here because we are
466 + * called with either mm->page_table_lock held or ptl lock held
467 + */
464 468 unsigned long access = 0, trap;
465 469
466 470 /* We only want HPTEs for linux PTEs that have _PAGE_ACCESSED set */
arch/powerpc/mm/mmap_64.c → arch/powerpc/mm/mmap.c
+9 -6
arch/powerpc/mm/mmu_context_nohash.c
···
112 112 */
113 113 for_each_cpu(cpu, mm_cpumask(mm)) {
114 114 for (i = cpu_first_thread_sibling(cpu);
115 - i <= cpu_last_thread_sibling(cpu); i++)
116 - __set_bit(id, stale_map[i]);
115 + i <= cpu_last_thread_sibling(cpu); i++) {
116 + if (stale_map[i])
117 + __set_bit(id, stale_map[i]);
118 + }
117 119 cpu = i - 1;
118 120 }
119 121 return id;
···
274 272 /* XXX This clear should ultimately be part of local_flush_tlb_mm */
275 273 for (i = cpu_first_thread_sibling(cpu);
276 274 i <= cpu_last_thread_sibling(cpu); i++) {
277 - __clear_bit(id, stale_map[i]);
275 + if (stale_map[i])
276 + __clear_bit(id, stale_map[i]);
278 277 }
279 278 }
280 279
···
332 329
333 330 #ifdef CONFIG_SMP
334 331
335 - static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
336 - unsigned long action, void *hcpu)
332 + static int mmu_context_cpu_notify(struct notifier_block *self,
333 + unsigned long action, void *hcpu)
337 334 {
338 335 unsigned int cpu = (unsigned int)(long)hcpu;
339 336
···
366 363 return NOTIFY_OK;
367 364 }
368 365
369 - static struct notifier_block __cpuinitdata mmu_context_cpu_nb = {
366 + static struct notifier_block mmu_context_cpu_nb = {
370 367 .notifier_call = mmu_context_cpu_notify,
371 368 };
372 369
+6 -6
arch/powerpc/mm/numa.c
···
516 516 * Figure out to which domain a cpu belongs and stick it there.
517 517 * Return the id of the domain used.
518 518 */
519 - static int __cpuinit numa_setup_cpu(unsigned long lcpu)
519 + static int numa_setup_cpu(unsigned long lcpu)
520 520 {
521 521 int nid = 0;
522 522 struct device_node *cpu = of_get_cpu_node(lcpu, NULL);
···
538 538 return nid;
539 539 }
540 540
541 - static int __cpuinit cpu_numa_callback(struct notifier_block *nfb,
542 - unsigned long action,
541 + static int cpu_numa_callback(struct notifier_block *nfb, unsigned long action,
543 542 void *hcpu)
544 543 {
545 544 unsigned long lcpu = (unsigned long)hcpu;
···
918 919 return ret;
919 920 }
920 921
921 - static struct notifier_block __cpuinitdata ppc64_numa_nb = {
922 + static struct notifier_block ppc64_numa_nb = {
922 923 .notifier_call = cpu_numa_callback,
923 924 .priority = 1 /* Must run before sched domains notifier. */
924 925 };
···
1432 1433 if (cpu != update->cpu)
1433 1434 continue;
1434 1435
1435 - unregister_cpu_under_node(update->cpu, update->old_nid);
1436 1436 unmap_cpu_from_node(update->cpu);
1437 1437 map_cpu_to_node(update->cpu, update->new_nid);
1438 1438 vdso_getcpu_init();
1439 - register_cpu_under_node(update->cpu, update->new_nid);
1440 1439 }
1441 1440
1442 1441 return 0;
···
1482 1485 stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
1483 1486
1484 1487 for (ud = &updates[0]; ud; ud = ud->next) {
1488 + unregister_cpu_under_node(ud->cpu, ud->old_nid);
1489 + register_cpu_under_node(ud->cpu, ud->new_nid);
1490 +
1485 1491 dev = get_cpu_device(ud->cpu);
1486 1492 if (dev)
1487 1493 kobject_uevent(&dev->kobj, KOBJ_CHANGE);
+8
arch/powerpc/mm/pgtable.c
···
235 235 pud = pud_offset(pgd, addr);
236 236 BUG_ON(pud_none(*pud));
237 237 pmd = pmd_offset(pud, addr);
238 + /*
239 + * For khugepaged to collapse normal pages to a hugepage, it first
240 + * sets the pmd to none to force page fault/gup to take mmap_sem.
241 + * After the pmd is set to none, it does a pte_clear which ends up
242 + * in this assertion, so if we find the pmd none, return.
243 + */
244 + if (pmd_none(*pmd))
245 + return;
238 246 BUG_ON(!pmd_present(*pmd));
239 247 assert_spin_locked(pte_lockptr(mm, pmd));
240 248 }
+414
arch/powerpc/mm/pgtable_64.c
···
338 338 EXPORT_SYMBOL(__iounmap);
339 339 EXPORT_SYMBOL(__iounmap_at);
340 340
341 + /*
342 + * For hugepage we have pfn in the pmd, we use PTE_RPN_SHIFT bits for flags
343 + * For PTE page, we have a PTE_FRAG_SIZE (4K) aligned virtual address.
344 + */
345 + struct page *pmd_page(pmd_t pmd)
346 + {
347 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
348 + if (pmd_trans_huge(pmd))
349 + return pfn_to_page(pmd_pfn(pmd));
350 + #endif
351 + return virt_to_page(pmd_page_vaddr(pmd));
352 + }
353 +
341 354 #ifdef CONFIG_PPC_64K_PAGES
342 355 static pte_t *get_from_cache(struct mm_struct *mm)
343 356 {
···
468 455 }
469 456 #endif
470 457 #endif /* CONFIG_PPC_64K_PAGES */
458 +
459 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
460 +
461 + /*
462 + * This is called when relaxing access to a hugepage. It's also called in the page
463 + * fault path when we don't hit any of the major fault cases, ie, a minor
464 + * update of _PAGE_ACCESSED, _PAGE_DIRTY, etc... The generic code will have
465 + * handled those two for us, we additionally deal with missing execute
466 + * permission here on some processors
467 + */
468 + int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
469 + pmd_t *pmdp, pmd_t entry, int dirty)
470 + {
471 + int changed;
472 + #ifdef CONFIG_DEBUG_VM
473 + WARN_ON(!pmd_trans_huge(*pmdp));
474 + assert_spin_locked(&vma->vm_mm->page_table_lock);
475 + #endif
476 + changed = !pmd_same(*(pmdp), entry);
477 + if (changed) {
478 + __ptep_set_access_flags(pmdp_ptep(pmdp), pmd_pte(entry));
479 + /*
480 + * Since we are not supporting SW TLB systems, we don't
481 + * have anything similar to flush_tlb_page_nohash()
482 + */
483 + }
484 + return changed;
485 + }
486 +
487 + unsigned long pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
488 + pmd_t *pmdp, unsigned long clr)
489 + {
490 +
491 + unsigned long old, tmp;
492 +
493 + #ifdef CONFIG_DEBUG_VM
494 + WARN_ON(!pmd_trans_huge(*pmdp));
495 + assert_spin_locked(&mm->page_table_lock);
496 + #endif
497 +
498 + #ifdef PTE_ATOMIC_UPDATES
499 + __asm__ __volatile__(
500 + "1: ldarx %0,0,%3\n\
501 + andi. %1,%0,%6\n\
502 + bne- 1b \n\
503 + andc %1,%0,%4 \n\
504 + stdcx. %1,0,%3 \n\
505 + bne- 1b"
506 + : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
507 + : "r" (pmdp), "r" (clr), "m" (*pmdp), "i" (_PAGE_BUSY)
508 + : "cc" );
509 + #else
510 + old = pmd_val(*pmdp);
511 + *pmdp = __pmd(old & ~clr);
512 + #endif
513 + if (old & _PAGE_HASHPTE)
514 + hpte_do_hugepage_flush(mm, addr, pmdp);
515 + return old;
516 + }
517 +
518 + pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
519 + pmd_t *pmdp)
520 + {
521 + pmd_t pmd;
522 +
523 + VM_BUG_ON(address & ~HPAGE_PMD_MASK);
524 + if (pmd_trans_huge(*pmdp)) {
525 + pmd = pmdp_get_and_clear(vma->vm_mm, address, pmdp);
526 + } else {
527 + /*
528 + * khugepaged calls this for normal pmd
529 + */
530 + pmd = *pmdp;
531 + pmd_clear(pmdp);
532 + /*
533 + * Wait for all pending hash_page to finish. This is needed
534 + * in case of subpage collapse. When we collapse normal pages
535 + * to hugepage, we first clear the pmd, then invalidate all
536 + * the PTE entries. The assumption here is that any low level
537 + * page fault will see a none pmd and take the slow path that
538 + * will wait on mmap_sem. But we could very well be in a
539 + * hash_page with local ptep pointer value. Such a hash page
540 + * can result in adding new HPTE entries for normal subpages.
541 + * That means we could be modifying the page content as we
542 + * copy them to a huge page. So wait for parallel hash_page
543 + * to finish before invalidating HPTE entries. We can do this
544 + * by sending an IPI to all the cpus and executing a dummy
545 + * function there.
546 + */
547 + kick_all_cpus_sync();
548 + /*
549 + * Now invalidate the hpte entries in the range
550 + * covered by pmd. This makes sure we take a
551 + * fault and will find the pmd as none, which will
552 + * result in a major fault which takes mmap_sem and
553 + * hence wait for collapse to complete. Without this
554 + * the __collapse_huge_page_copy can result in copying
555 + * the old content.
556 + */
557 + flush_tlb_pmd_range(vma->vm_mm, &pmd, address);
558 + }
559 + return pmd;
560 + }
561 +
562 + int pmdp_test_and_clear_young(struct vm_area_struct *vma,
563 + unsigned long address, pmd_t *pmdp)
564 + {
565 + return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
566 + }
567 +
568 + /*
569 + * We currently remove entries from the hashtable regardless of whether
570 + * the entry was young or dirty. The generic routines only flush if the
571 + * entry was young or dirty which is not good enough.
572 + *
573 + * We should be more intelligent about this but for the moment we override
574 + * these functions and force a tlb flush unconditionally
575 + */
576 + int pmdp_clear_flush_young(struct vm_area_struct *vma,
577 + unsigned long address, pmd_t *pmdp)
578 + {
579 + return __pmdp_test_and_clear_young(vma->vm_mm, address, pmdp);
580 + }
581 +
582 + /*
583 + * We mark the pmd splitting and invalidate all the hpte
584 + * entries for this hugepage.
585 + */
586 + void pmdp_splitting_flush(struct vm_area_struct *vma,
587 + unsigned long address, pmd_t *pmdp)
588 + {
589 + unsigned long old, tmp;
590 +
591 + VM_BUG_ON(address & ~HPAGE_PMD_MASK);
592 +
593 + #ifdef CONFIG_DEBUG_VM
594 + WARN_ON(!pmd_trans_huge(*pmdp));
595 + assert_spin_locked(&vma->vm_mm->page_table_lock);
596 + #endif
597 +
598 + #ifdef PTE_ATOMIC_UPDATES
599 +
600 + __asm__ __volatile__(
601 + "1: ldarx %0,0,%3\n\
602 + andi. %1,%0,%6\n\
603 + bne- 1b \n\
604 + ori %1,%0,%4 \n\
605 + stdcx. %1,0,%3 \n\
606 + bne- 1b"
607 + : "=&r" (old), "=&r" (tmp), "=m" (*pmdp)
608 + : "r" (pmdp), "i" (_PAGE_SPLITTING), "m" (*pmdp), "i" (_PAGE_BUSY)
609 + : "cc" );
610 + #else
611 + old = pmd_val(*pmdp);
612 + *pmdp = __pmd(old | _PAGE_SPLITTING);
613 + #endif
614 + /*
615 + * If we didn't have the splitting flag set, go and flush the
616 + * HPTE entries.
617 + */
618 + if (!(old & _PAGE_SPLITTING)) {
619 + /* We need to flush the hpte */
620 + if (old & _PAGE_HASHPTE)
621 + hpte_do_hugepage_flush(vma->vm_mm, address, pmdp);
622 + }
623 + }
624 +
625 + /*
626 + * We want to put the pgtable in pmd and use pgtable for tracking
627 + * the base page size hptes
628 + */
629 + void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
630 + pgtable_t pgtable)
631 + {
632 + pgtable_t *pgtable_slot;
633 + assert_spin_locked(&mm->page_table_lock);
634 + /*
635 + * we store the pgtable in the second half of PMD
636 + */
637 + pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
638 + *pgtable_slot = pgtable;
639 + /*
640 + * expose the deposited pgtable to other cpus
641 + * before we set the hugepage PTE at pmd level;
642 + * hash fault code looks at the deposited pgtable
643 + * to store hash index values.
644 + */
645 + smp_wmb();
646 + }
647 +
648 + pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
649 + {
650 + pgtable_t pgtable;
651 + pgtable_t *pgtable_slot;
652 +
653 + assert_spin_locked(&mm->page_table_lock);
654 + pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
655 + pgtable = *pgtable_slot;
656 + /*
657 + * Once we withdraw, mark the entry NULL.
658 + */
659 + *pgtable_slot = NULL;
660 + /*
661 + * We store HPTE information in the deposited PTE fragment;
662 + * zero out the content on withdraw.
663 + */
664 + memset(pgtable, 0, PTE_FRAG_SIZE);
665 + return pgtable;
666 + }
667 +
668 + /*
669 + * set a new huge pmd. We should not be called for updating
670 + * an existing pmd entry. That should go via pmd_hugepage_update.
671 + */
672 + void set_pmd_at(struct mm_struct *mm, unsigned long addr,
673 + pmd_t *pmdp, pmd_t pmd)
674 + {
675 + #ifdef CONFIG_DEBUG_VM
676 + WARN_ON(!pmd_none(*pmdp));
677 + assert_spin_locked(&mm->page_table_lock);
678 + WARN_ON(!pmd_trans_huge(pmd));
679 + #endif
680 + return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
681 + }
682 +
683 + void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
684 + pmd_t *pmdp)
685 + {
686 + pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT);
687 + }
688 +
689 + /*
690 + * A linux hugepage PMD was changed and the corresponding hash table entries
691 + * need to be flushed.
692 + */
693 + void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
694 + pmd_t *pmdp)
695 + {
696 + int ssize, i;
697 + unsigned long s_addr;
698 + int max_hpte_count;
699 + unsigned int psize, valid;
700 + unsigned char *hpte_slot_array;
701 + unsigned long hidx, vpn, vsid, hash, shift, slot;
702 +
703 + /*
704 + * Flush all the hptes mapping this hugepage
705 + */
706 + s_addr = addr & HPAGE_PMD_MASK;
707 + hpte_slot_array = get_hpte_slot_array(pmdp);
708 + /*
709 + * If we try to do a HUGE PTE update after a withdraw is done,
710 + * we will find the below NULL. This happens when we do
711 + * split_huge_page_pmd
712 + */
713 + if (!hpte_slot_array)
714 + return;
715 +
716 + /* get the base page size */
717 + psize = get_slice_psize(mm, s_addr);
718 +
719 + if (ppc_md.hugepage_invalidate)
720 + return ppc_md.hugepage_invalidate(mm, hpte_slot_array,
721 + s_addr, psize);
722 + /*
723 + * No bulk hpte removal support, invalidate each entry
724 + */
725 + shift = mmu_psize_defs[psize].shift;
726 + max_hpte_count = HPAGE_PMD_SIZE >> shift;
727 + for (i = 0; i < max_hpte_count; i++) {
728 + /*
729 + * 8 bits per hpte entry
730 + * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
731 + */
732 + valid = hpte_valid(hpte_slot_array, i);
733 + if (!valid)
734 + continue;
735 + hidx = hpte_hash_index(hpte_slot_array, i);
736 +
737 + /* get the vpn */
738 + addr = s_addr + (i * (1ul << shift));
739 + if (!is_kernel_addr(addr)) {
740 + ssize = user_segment_size(addr);
741 + vsid = get_vsid(mm->context.id, addr, ssize);
742 + WARN_ON(vsid == 0);
743 + } else {
744 + vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
745 + ssize = mmu_kernel_ssize;
746 + }
747 +
748 + vpn = hpt_vpn(addr, vsid, ssize);
749 + hash = hpt_hash(vpn, shift, ssize);
750 + if (hidx & _PTEIDX_SECONDARY)
751 + hash = ~hash;
752 +
753 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
754 + slot += hidx & _PTEIDX_GROUP_IX;
755 + ppc_md.hpte_invalidate(slot, vpn, psize,
756 + MMU_PAGE_16M, ssize, 0);
757 + }
758 + }
759 +
760 + static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
761 + {
762 + pmd_val(pmd) |= pgprot_val(pgprot);
763 + return pmd;
764 + }
765 +
766 + pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
767 + {
768 + pmd_t pmd;
769 + /*
770 + * For a valid pte, we would have _PAGE_PRESENT or _PAGE_FILE always
771 + * set. We use this to check THP page at pmd level.
772 + * leaf pte for huge page, bottom two bits != 00
773 + */
774 + pmd_val(pmd) = pfn << PTE_RPN_SHIFT;
775 + pmd_val(pmd) |= _PAGE_THP_HUGE;
776 + pmd = pmd_set_protbits(pmd, pgprot);
777 + return pmd;
778 + }
779 +
780 + pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
781 + {
782 + return pfn_pmd(page_to_pfn(page), pgprot);
783 + }
784 +
785 + pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
786 + {
787 +
788 + pmd_val(pmd) &= _HPAGE_CHG_MASK;
789 + pmd = pmd_set_protbits(pmd, newprot);
790 + return pmd;
791 + }
792 +
793 + /*
794 + * This is called at the end of handling a user page fault, when the
795 + * fault has been handled by updating a HUGE PMD entry in the linux page tables.
796 + * We use it to preload an HPTE into the hash table corresponding to
797 + * the updated linux HUGE PMD entry.
798 + */
799 + void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
800 + pmd_t *pmd)
801 + {
802 + return;
803 + }
804 +
805 + pmd_t pmdp_get_and_clear(struct mm_struct *mm,
806 + unsigned long addr, pmd_t *pmdp)
807 + {
808 + pmd_t old_pmd;
809 + pgtable_t pgtable;
810 + unsigned long old;
811 + pgtable_t *pgtable_slot;
812 +
813 + old = pmd_hugepage_update(mm, addr, pmdp, ~0UL);
814 + old_pmd = __pmd(old);
815 + /*
816 + * We have pmd == none and we are holding page_table_lock.
817 + * So we can safely go and clear the pgtable hash
818 + * index info.
819 + */
820 + pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
821 + pgtable = *pgtable_slot;
822 + /*
823 + * Let's zero out old valid and hash index details;
824 + * hash fault looks at them.
825 + */
826 + memset(pgtable, 0, PTE_FRAG_SIZE);
827 + return old_pmd;
828 + }
829 +
830 + int has_transparent_hugepage(void)
831 + {
832 + if (!mmu_has_feature(MMU_FTR_16M_PAGE))
833 + return 0;
834 + /*
835 + * We support THP only if PMD_SIZE is 16MB.
836 + */
837 + if (mmu_psize_defs[MMU_PAGE_16M].shift != PMD_SHIFT)
838 + return 0;
839 + /*
840 + * We need to make sure that we support 16MB hugepage in a segment
841 + * with base page size 64K or 4K. We only enable THP with a PAGE_SIZE
842 + * of 64K.
843 + */
844 + /*
845 + * If we have 64K HPTE, we will be using that by default
846 + */
847 + if (mmu_psize_defs[MMU_PAGE_64K].shift &&
848 + (mmu_psize_defs[MMU_PAGE_64K].penc[MMU_PAGE_16M] == -1))
849 + return 0;
850 + /*
851 + * Ok we only have 4K HPTE
852 + */
853 + if (mmu_psize_defs[MMU_PAGE_4K].penc[MMU_PAGE_16M] == -1)
854 + return 0;
855 +
856 + return 1;
857 + }
858 + #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
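The `has_transparent_hugepage()` gate above boils down to a few table lookups: the 16M page size must exist, the PMD must map exactly 16M, and the preferred base page size must have a hash-PTE encoding (`penc`) for 16M pages. A slightly simplified toy restatement of that policy (invented struct and names, not the kernel's `mmu_psize_defs` layout) makes the decision tree easy to test in isolation:

```c
/* Toy, simplified restatement of the has_transparent_hugepage() policy:
 * shift == 0 means the page size is unsupported, penc_16M == -1 means
 * there is no 16M hash-PTE encoding inside that base page size. */
struct toy_psize {
	int shift;       /* 0 => page size unsupported          */
	int penc_16M;    /* -1 => no 16M encoding for this size */
};

static int toy_has_thp(int pmd_shift,
		       const struct toy_psize *p4k,
		       const struct toy_psize *p64k,
		       const struct toy_psize *p16m)
{
	if (!p16m->shift)                /* no 16M pages at all       */
		return 0;
	if (p16m->shift != pmd_shift)    /* PMD must map exactly 16M  */
		return 0;
	if (p64k->shift && p64k->penc_16M == -1)
		return 0;                /* 64K HPTEs, no 16M encoding */
	if (!p64k->shift && p4k->penc_16M == -1)
		return 0;                /* only 4K HPTEs, no encoding */
	return 1;
}
```

The real function consults the global `mmu_psize_defs[]` array filled in from firmware; the toy just takes the same facts as parameters so each branch can be exercised directly.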
+48
arch/powerpc/mm/subpage-prot.c
···
130 130 up_write(&mm->mmap_sem);
131 131 }
132 132
133 + #ifdef CONFIG_TRANSPARENT_HUGEPAGE
134 + static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
135 + unsigned long end, struct mm_walk *walk)
136 + {
137 + struct vm_area_struct *vma = walk->private;
138 + split_huge_page_pmd(vma, addr, pmd);
139 + return 0;
140 + }
141 +
142 + static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
143 + unsigned long len)
144 + {
145 + struct vm_area_struct *vma;
146 + struct mm_walk subpage_proto_walk = {
147 + .mm = mm,
148 + .pmd_entry = subpage_walk_pmd_entry,
149 + };
150 +
151 + /*
152 + * We don't try too hard, we just mark all the vmas in that range
153 + * VM_NOHUGEPAGE and split them.
154 + */
155 + vma = find_vma(mm, addr);
156 + /*
157 + * If the range is entirely unmapped, just return
158 + */
159 + if (vma && ((addr + len) <= vma->vm_start))
160 + return;
161 +
162 + while (vma) {
163 + if (vma->vm_start >= (addr + len))
164 + break;
165 + vma->vm_flags |= VM_NOHUGEPAGE;
166 + subpage_proto_walk.private = vma;
167 + walk_page_range(vma->vm_start, vma->vm_end,
168 + &subpage_proto_walk);
169 + vma = vma->vm_next;
170 + }
171 + }
172 + #else
173 + static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
174 + unsigned long len)
175 + {
176 + return;
177 + }
178 + #endif
179 +
133 180 /*
134 181 * Copy in a subpage protection map for an address range.
135 182 * The map has 2 bits per 4k subpage, so 32 bits per 64k page.
···
215 168 return -EFAULT;
216 169
217 170 down_write(&mm->mmap_sem);
171 + subpage_mark_vma_nohuge(mm, addr, len);
218 172 for (limit = addr + len; addr < limit; addr = next) {
219 173 next = pmd_addr_end(addr, limit);
220 174 err = -ENOMEM;
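The `subpage_mark_vma_nohuge()` loop above is a classic interval-overlap scan: walk a sorted, singly linked list of address ranges, bail out early if the target range falls in a hole before the first candidate, stop once a range starts past the end, and flag everything that overlaps. A self-contained toy version of just that scan (invented struct and flag names, not the kernel's `vm_area_struct`) looks like:

```c
#include <stddef.h>

/* Toy VMA list: sorted, non-overlapping [vm_start, vm_end) ranges. */
struct toy_vma {
	unsigned long vm_start, vm_end;
	unsigned long vm_flags;
	struct toy_vma *vm_next;
};
#define TOY_NOHUGEPAGE 0x1ul

/* Set TOY_NOHUGEPAGE on every range overlapping [addr, addr + len). */
static void toy_mark_nohuge(struct toy_vma *vma,
			    unsigned long addr, unsigned long len)
{
	/* Range lies entirely in a hole before the first vma: done. */
	if (vma && addr + len <= vma->vm_start)
		return;
	while (vma) {
		if (vma->vm_start >= addr + len)
			break;                  /* past the range: stop */
		if (vma->vm_end > addr)         /* genuine overlap      */
			vma->vm_flags |= TOY_NOHUGEPAGE;
		vma = vma->vm_next;
	}
}
```

One difference worth noting: the kernel code starts from `find_vma()`, which already skips VMAs that end at or before `addr`, so the toy adds an explicit `vm_end > addr` overlap test to get the same effect when starting from the list head.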
+34 -2
arch/powerpc/mm/tlb_hash64.c
···
189 189 void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
190 190 unsigned long end)
191 191 {
192 + int hugepage_shift;
192 193 unsigned long flags;
193 194
194 195 start = _ALIGN_DOWN(start, PAGE_SIZE);
···
207 206 local_irq_save(flags);
208 207 arch_enter_lazy_mmu_mode();
209 208 for (; start < end; start += PAGE_SIZE) {
210 - pte_t *ptep = find_linux_pte(mm->pgd, start);
209 + pte_t *ptep = find_linux_pte_or_hugepte(mm->pgd, start,
210 + &hugepage_shift);
211 211 unsigned long pte;
212 212
213 213 if (ptep == NULL)
···
216 214 pte = pte_val(*ptep);
217 215 if (!(pte & _PAGE_HASHPTE))
218 216 continue;
219 - hpte_need_flush(mm, start, ptep, pte, 0);
217 + if (unlikely(hugepage_shift && pmd_trans_huge(*(pmd_t *)pte)))
218 + hpte_do_hugepage_flush(mm, start, (pmd_t *)pte);
219 + else
220 + hpte_need_flush(mm, start, ptep, pte, 0);
221 + }
222 + arch_leave_lazy_mmu_mode();
223 + local_irq_restore(flags);
224 + }
225 +
226 + void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
227 + {
228 + pte_t *pte;
229 + pte_t *start_pte;
230 + unsigned long flags;
231 +
232 + addr = _ALIGN_DOWN(addr, PMD_SIZE);
233 + /* Note: Normally, we should only ever use a batch within a
234 + * PTE locked section. This violates the rule, but will work
235 + * since we don't actually modify the PTEs, we just flush the
236 + * hash while leaving the PTEs intact (including their reference
237 + * to being hashed). This is not the most performance oriented
238 + * way to do things but is fine for our needs here.
239 + */
240 + local_irq_save(flags);
241 + arch_enter_lazy_mmu_mode();
242 + start_pte = pte_offset_map(pmd, addr);
243 + for (pte = start_pte; pte < start_pte + PTRS_PER_PTE; pte++) {
244 + unsigned long pteval = pte_val(*pte);
245 + if (pteval & _PAGE_HASHPTE)
246 + hpte_need_flush(mm, addr, pte, pteval, 0);
247 + addr += PAGE_SIZE;
220 248 }
221 249 arch_leave_lazy_mmu_mode();
222 250 local_irq_restore(flags);
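The inner loop of `flush_tlb_pmd_range()` is a straightforward flag scan: visit every PTE slot under the PMD and act only on entries whose `_PAGE_HASHPTE` bit says they have a shadow in the hash table. That filtering pattern can be sketched on its own with a toy flag and a plain array standing in for the PTE page (names invented for the sketch):

```c
/* Toy version of the per-PMD scan: collect every software "PTE" whose
 * HASHPTE-style flag is set, i.e. the entries that would need a hash
 * flush in the real code. */
#define TOY_PTES_PER_PMD 8
#define TOY_PAGE_HASHPTE 0x1ul

static int toy_collect_flushes(const unsigned long *ptes,
			       unsigned long *to_flush, int max)
{
	int n = 0;
	for (int i = 0; i < TOY_PTES_PER_PMD && n < max; i++) {
		if (ptes[i] & TOY_PAGE_HASHPTE)  /* entry was hashed */
			to_flush[n++] = ptes[i];
	}
	return n;
}
```

The real loop flushes each matching entry immediately inside a lazy-MMU batch rather than collecting them, but the selection criterion is the same single flag test.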
+1 -1
arch/powerpc/mm/tlb_nohash.c
···
648 648 __early_init_mmu(1);
649 649 }
650 650
651 - void __cpuinit early_init_mmu_secondary(void)
651 + void early_init_mmu_secondary(void)
652 652 {
653 653 __early_init_mmu(0);
654 654 }
+174 -27
arch/powerpc/perf/core-book3s.c
··· 75 75 76 76 #define MMCR0_FCHV 0 77 77 #define MMCR0_PMCjCE MMCR0_PMCnCE 78 + #define MMCR0_FC56 0 79 + #define MMCR0_PMAO 0 80 + #define MMCR0_EBE 0 81 + #define MMCR0_PMCC 0 82 + #define MMCR0_PMCC_U6 0 78 83 79 84 #define SPRN_MMCRA SPRN_MMCR2 80 85 #define MMCRA_SAMPLE_ENABLE 0 ··· 105 100 static inline int siar_valid(struct pt_regs *regs) 106 101 { 107 102 return 1; 103 + } 104 + 105 + static bool is_ebb_event(struct perf_event *event) { return false; } 106 + static int ebb_event_check(struct perf_event *event) { return 0; } 107 + static void ebb_event_add(struct perf_event *event) { } 108 + static void ebb_switch_out(unsigned long mmcr0) { } 109 + static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0) 110 + { 111 + return mmcr0; 108 112 } 109 113 110 114 static inline void power_pmu_bhrb_enable(struct perf_event *event) {} ··· 476 462 return; 477 463 } 478 464 465 + static bool is_ebb_event(struct perf_event *event) 466 + { 467 + /* 468 + * This could be a per-PMU callback, but we'd rather avoid the cost. We 469 + * check that the PMU supports EBB, meaning those that don't can still 470 + * use bit 63 of the event code for something else if they wish. 
471 + */ 472 + return (ppmu->flags & PPMU_EBB) && 473 + ((event->attr.config >> EVENT_CONFIG_EBB_SHIFT) & 1); 474 + } 475 + 476 + static int ebb_event_check(struct perf_event *event) 477 + { 478 + struct perf_event *leader = event->group_leader; 479 + 480 + /* Event and group leader must agree on EBB */ 481 + if (is_ebb_event(leader) != is_ebb_event(event)) 482 + return -EINVAL; 483 + 484 + if (is_ebb_event(event)) { 485 + if (!(event->attach_state & PERF_ATTACH_TASK)) 486 + return -EINVAL; 487 + 488 + if (!leader->attr.pinned || !leader->attr.exclusive) 489 + return -EINVAL; 490 + 491 + if (event->attr.inherit || event->attr.sample_period || 492 + event->attr.enable_on_exec || event->attr.freq) 493 + return -EINVAL; 494 + } 495 + 496 + return 0; 497 + } 498 + 499 + static void ebb_event_add(struct perf_event *event) 500 + { 501 + if (!is_ebb_event(event) || current->thread.used_ebb) 502 + return; 503 + 504 + /* 505 + * IFF this is the first time we've added an EBB event, set 506 + * PMXE in the user MMCR0 so we can detect when it's cleared by 507 + * userspace. We need this so that we can context switch while 508 + * userspace is in the EBB handler (where PMXE is 0). 
509 + */ 510 + current->thread.used_ebb = 1; 511 + current->thread.mmcr0 |= MMCR0_PMXE; 512 + } 513 + 514 + static void ebb_switch_out(unsigned long mmcr0) 515 + { 516 + if (!(mmcr0 & MMCR0_EBE)) 517 + return; 518 + 519 + current->thread.siar = mfspr(SPRN_SIAR); 520 + current->thread.sier = mfspr(SPRN_SIER); 521 + current->thread.sdar = mfspr(SPRN_SDAR); 522 + current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK; 523 + current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK; 524 + } 525 + 526 + static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0) 527 + { 528 + if (!ebb) 529 + goto out; 530 + 531 + /* Enable EBB and read/write to all 6 PMCs for userspace */ 532 + mmcr0 |= MMCR0_EBE | MMCR0_PMCC_U6; 533 + 534 + /* Add any bits from the user reg, FC or PMAO */ 535 + mmcr0 |= current->thread.mmcr0; 536 + 537 + /* Be careful not to set PMXE if userspace had it cleared */ 538 + if (!(current->thread.mmcr0 & MMCR0_PMXE)) 539 + mmcr0 &= ~MMCR0_PMXE; 540 + 541 + mtspr(SPRN_SIAR, current->thread.siar); 542 + mtspr(SPRN_SIER, current->thread.sier); 543 + mtspr(SPRN_SDAR, current->thread.sdar); 544 + mtspr(SPRN_MMCR2, current->thread.mmcr2); 545 + out: 546 + return mmcr0; 547 + } 479 548 #endif /* CONFIG_PPC64 */ 480 549 481 550 static void perf_event_interrupt(struct pt_regs *regs); ··· 829 732 830 733 if (!event->hw.idx) 831 734 return; 735 + 736 + if (is_ebb_event(event)) { 737 + val = read_pmc(event->hw.idx); 738 + local64_set(&event->hw.prev_count, val); 739 + return; 740 + } 741 + 832 742 /* 833 743 * Performance monitor interrupts come even when interrupts 834 744 * are soft-disabled, as long as interrupts are hard-enabled. 
··· 956 852 static void power_pmu_disable(struct pmu *pmu) 957 853 { 958 854 struct cpu_hw_events *cpuhw; 959 - unsigned long flags; 855 + unsigned long flags, mmcr0, val; 960 856 961 857 if (!ppmu) 962 858 return; ··· 964 860 cpuhw = &__get_cpu_var(cpu_hw_events); 965 861 966 862 if (!cpuhw->disabled) { 967 - cpuhw->disabled = 1; 968 - cpuhw->n_added = 0; 969 - 970 863 /* 971 864 * Check if we ever enabled the PMU on this cpu. 972 865 */ ··· 971 870 ppc_enable_pmcs(); 972 871 cpuhw->pmcs_enabled = 1; 973 872 } 873 + 874 + /* 875 + * Set the 'freeze counters' bit, clear EBE/PMCC/PMAO/FC56. 876 + */ 877 + val = mmcr0 = mfspr(SPRN_MMCR0); 878 + val |= MMCR0_FC; 879 + val &= ~(MMCR0_EBE | MMCR0_PMCC | MMCR0_PMAO | MMCR0_FC56); 880 + 881 + /* 882 + * The barrier is to make sure the mtspr has been 883 + * executed and the PMU has frozen the events etc. 884 + * before we return. 885 + */ 886 + write_mmcr0(cpuhw, val); 887 + mb(); 974 888 975 889 /* 976 890 * Disable instruction sampling if it was enabled ··· 996 880 mb(); 997 881 } 998 882 999 - /* 1000 - * Set the 'freeze counters' bit. 1001 - * The barrier is to make sure the mtspr has been 1002 - * executed and the PMU has frozen the events 1003 - * before we return. 
1004 - */ 1005 - write_mmcr0(cpuhw, mfspr(SPRN_MMCR0) | MMCR0_FC); 1006 - mb(); 883 + cpuhw->disabled = 1; 884 + cpuhw->n_added = 0; 885 + 886 + ebb_switch_out(mmcr0); 1007 887 } 888 + 1008 889 local_irq_restore(flags); 1009 890 } 1010 891 ··· 1016 903 struct cpu_hw_events *cpuhw; 1017 904 unsigned long flags; 1018 905 long i; 1019 - unsigned long val; 906 + unsigned long val, mmcr0; 1020 907 s64 left; 1021 908 unsigned int hwc_index[MAX_HWEVENTS]; 1022 909 int n_lim; 1023 910 int idx; 911 + bool ebb; 1024 912 1025 913 if (!ppmu) 1026 914 return; 1027 915 local_irq_save(flags); 916 + 1028 917 cpuhw = &__get_cpu_var(cpu_hw_events); 1029 - if (!cpuhw->disabled) { 1030 - local_irq_restore(flags); 1031 - return; 918 + if (!cpuhw->disabled) 919 + goto out; 920 + 921 + if (cpuhw->n_events == 0) { 922 + ppc_set_pmu_inuse(0); 923 + goto out; 1032 924 } 925 + 1033 926 cpuhw->disabled = 0; 927 + 928 + /* 929 + * EBB requires an exclusive group and all events must have the EBB 930 + * flag set, or not set, so we can just check a single event. Also we 931 + * know we have at least one event. 
932 + */ 933 + ebb = is_ebb_event(cpuhw->event[0]); 1034 934 1035 935 /* 1036 936 * If we didn't change anything, or only removed events, ··· 1054 928 if (!cpuhw->n_added) { 1055 929 mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE); 1056 930 mtspr(SPRN_MMCR1, cpuhw->mmcr[1]); 1057 - if (cpuhw->n_events == 0) 1058 - ppc_set_pmu_inuse(0); 1059 931 goto out_enable; 1060 932 } 1061 933 ··· 1120 996 ++n_lim; 1121 997 continue; 1122 998 } 1123 - val = 0; 1124 - if (event->hw.sample_period) { 1125 - left = local64_read(&event->hw.period_left); 1126 - if (left < 0x80000000L) 1127 - val = 0x80000000L - left; 999 + 1000 + if (ebb) 1001 + val = local64_read(&event->hw.prev_count); 1002 + else { 1003 + val = 0; 1004 + if (event->hw.sample_period) { 1005 + left = local64_read(&event->hw.period_left); 1006 + if (left < 0x80000000L) 1007 + val = 0x80000000L - left; 1008 + } 1009 + local64_set(&event->hw.prev_count, val); 1128 1010 } 1129 - local64_set(&event->hw.prev_count, val); 1011 + 1130 1012 event->hw.idx = idx; 1131 1013 if (event->hw.state & PERF_HES_STOPPED) 1132 1014 val = 0; 1133 1015 write_pmc(idx, val); 1016 + 1134 1017 perf_event_update_userpage(event); 1135 1018 } 1136 1019 cpuhw->n_limited = n_lim; 1137 1020 cpuhw->mmcr[0] |= MMCR0_PMXE | MMCR0_FCECE; 1138 1021 1139 1022 out_enable: 1023 + mmcr0 = ebb_switch_in(ebb, cpuhw->mmcr[0]); 1024 + 1140 1025 mb(); 1141 - write_mmcr0(cpuhw, cpuhw->mmcr[0]); 1026 + write_mmcr0(cpuhw, mmcr0); 1142 1027 1143 1028 /* 1144 1029 * Enable instruction sampling if necessary ··· 1245 1112 event->hw.config = cpuhw->events[n0]; 1246 1113 1247 1114 nocheck: 1115 + ebb_event_add(event); 1116 + 1248 1117 ++cpuhw->n_events; 1249 1118 ++cpuhw->n_added; 1250 1119 ··· 1607 1472 } 1608 1473 } 1609 1474 1475 + /* Extra checks for EBB */ 1476 + err = ebb_event_check(event); 1477 + if (err) 1478 + return err; 1479 + 1610 1480 /* 1611 1481 * If this is in a group, check if it can go on with all the 1612 1482 * other hardware events in the 
group. We assume the event ··· 1649 1509 event->hw.event_base = cflags[n]; 1650 1510 event->hw.last_period = event->hw.sample_period; 1651 1511 local64_set(&event->hw.period_left, event->hw.last_period); 1512 + 1513 + /* 1514 + * For EBB events we just context switch the PMC value, we don't do any 1515 + * of the sample_period logic. We use hw.prev_count for this. 1516 + */ 1517 + if (is_ebb_event(event)) 1518 + local64_set(&event->hw.prev_count, 0); 1652 1519 1653 1520 /* 1654 1521 * See if we need to reserve the PMU. ··· 1933 1786 cpuhw->mmcr[0] = MMCR0_FC; 1934 1787 } 1935 1788 1936 - static int __cpuinit 1789 + static int 1937 1790 power_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu) 1938 1791 { 1939 1792 unsigned int cpu = (long)hcpu; ··· 1950 1803 return NOTIFY_OK; 1951 1804 } 1952 1805 1953 - int __cpuinit register_power_pmu(struct power_pmu *pmu) 1806 + int register_power_pmu(struct power_pmu *pmu) 1954 1807 { 1955 1808 if (ppmu) 1956 1809 return -EBUSY; /* something's already registered */
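The subtle part of `ebb_switch_in()` above is how it merges the kernel's desired MMCR0 with the bits userspace owns: EBB and PMC-access bits are forced on, the user's saved FC/PMAO bits are OR-ed in, but PMXE must never be re-enabled if userspace had cleared it (that is how the kernel detects it is inside the user's EBB handler). The bit juggling can be modeled with made-up bit values (the real POWER8 MMCR0 layout is not reproduced here):

```c
/* Toy model of the ebb_switch_in() merge; bit positions are invented
 * for the sketch, only the merging logic mirrors the code above. */
#define T_MMCR0_EBE	0x1ul	/* event-based branches enabled  */
#define T_MMCR0_PMCC_U6	0x2ul	/* user access to all 6 PMCs     */
#define T_MMCR0_PMXE	0x4ul	/* PMU exception enable          */

static unsigned long toy_ebb_switch_in(int ebb, unsigned long mmcr0,
				       unsigned long user_mmcr0)
{
	if (!ebb)
		return mmcr0;			/* no EBB: nothing to do */

	/* Enable EBB and userspace PMC access */
	mmcr0 |= T_MMCR0_EBE | T_MMCR0_PMCC_U6;

	/* Add any bits the user reg carries (FC, PMAO, ...) */
	mmcr0 |= user_mmcr0;

	/* Be careful not to set PMXE if userspace had it cleared */
	if (!(user_mmcr0 & T_MMCR0_PMXE))
		mmcr0 &= ~T_MMCR0_PMXE;

	return mmcr0;
}
```

The corresponding `ebb_switch_out()` direction is the mirror image: snapshot SIAR/SIER/SDAR and the user-visible MMCR0/MMCR2 bits into the thread struct so they survive a context switch.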
+50 -12
arch/powerpc/perf/power8-pmu.c
···
  *
  *        60        56        52        48        44        40        36        32
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *                     [      thresh_cmp     ]   [  thresh_ctl   ]
- *                                                       |
- *                              thresh start/stop OR FAB match -*
+ *   |                 [      thresh_cmp     ]   [  thresh_ctl   ]
+ *   |                                                   |
+ *   *- EBB (Linux)            thresh start/stop OR FAB match -*
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
···
  *
  */
 
+#define EVENT_EBB_MASK		1ull
 #define EVENT_THR_CMP_SHIFT	40	/* Threshold CMP value */
 #define EVENT_THR_CMP_MASK	0x3ff
 #define EVENT_THR_CTL_SHIFT	32	/* Threshold control value (start/stop) */
···
 #define EVENT_IS_MARKED		(EVENT_MARKED_MASK << EVENT_MARKED_SHIFT)
 #define EVENT_PSEL_MASK		0xff	/* PMCxSEL value */
 
+#define EVENT_VALID_MASK	\
+	((EVENT_THRESH_MASK    << EVENT_THRESH_SHIFT)		| \
+	 (EVENT_SAMPLE_MASK    << EVENT_SAMPLE_SHIFT)		| \
+	 (EVENT_CACHE_SEL_MASK << EVENT_CACHE_SEL_SHIFT)	| \
+	 (EVENT_PMC_MASK       << EVENT_PMC_SHIFT)		| \
+	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		| \
+	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		| \
+	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		| \
+	 (EVENT_EBB_MASK       << EVENT_CONFIG_EBB_SHIFT)	| \
+	  EVENT_PSEL_MASK)
+
 /* MMCRA IFM bits - POWER8 */
 #define	POWER8_MMCRA_IFM1	0x0000000040000000UL
 #define	POWER8_MMCRA_IFM2	0x0000000080000000UL
···
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *               [ ]   [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
- *                |                     |
- *      L1 I/D qualifier -*             |      Count of events for each PMC.
- *                                      |        p1, p2, p3, p4, p5, p6.
+ *           |   [ ]   [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
+ *     EBB -*    |                     |
+ *               |                     |      Count of events for each PMC.
+ *      L1 I/D qualifier -*            |        p1, p2, p3, p4, p5, p6.
  *                     nc - number of counters -*
  *
  * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints
···
 /* We just throw all the threshold bits into the constraint */
 #define CNST_THRESH_VAL(v)	(((v) & EVENT_THRESH_MASK) << 32)
 #define CNST_THRESH_MASK	CNST_THRESH_VAL(EVENT_THRESH_MASK)
+
+#define CNST_EBB_VAL(v)		(((v) & EVENT_EBB_MASK) << 24)
+#define CNST_EBB_MASK		CNST_EBB_VAL(EVENT_EBB_MASK)
 
 #define CNST_L1_QUAL_VAL(v)	(((v) & 3) << 22)
 #define CNST_L1_QUAL_MASK	CNST_L1_QUAL_VAL(3)
···
 
 static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
-	unsigned int unit, pmc, cache;
+	unsigned int unit, pmc, cache, ebb;
 	unsigned long mask, value;
 
 	mask = value = 0;
 
-	pmc   = (event >> EVENT_PMC_SHIFT)       & EVENT_PMC_MASK;
-	unit  = (event >> EVENT_UNIT_SHIFT)      & EVENT_UNIT_MASK;
-	cache = (event >> EVENT_CACHE_SEL_SHIFT) & EVENT_CACHE_SEL_MASK;
+	if (event & ~EVENT_VALID_MASK)
+		return -1;
+
+	pmc   = (event >> EVENT_PMC_SHIFT)        & EVENT_PMC_MASK;
+	unit  = (event >> EVENT_UNIT_SHIFT)       & EVENT_UNIT_MASK;
+	cache = (event >> EVENT_CACHE_SEL_SHIFT)  & EVENT_CACHE_SEL_MASK;
+	ebb   = (event >> EVENT_CONFIG_EBB_SHIFT) & EVENT_EBB_MASK;
+
+	/* Clear the EBB bit in the event, so event checks work below */
+	event &= ~(EVENT_EBB_MASK << EVENT_CONFIG_EBB_SHIFT);
 
 	if (pmc) {
 		if (pmc > 6)
···
 		mask  |= CNST_THRESH_MASK;
 		value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
 	}
+
+	if (!pmc && ebb)
+		/* EBB events must specify the PMC */
+		return -1;
+
+	/*
+	 * All events must agree on EBB, either all request it or none.
+	 * EBB events are pinned & exclusive, so this should never actually
+	 * hit, but we leave it as a fallback in case.
+	 */
+	mask  |= CNST_EBB_VAL(ebb);
+	value |= CNST_EBB_MASK;
 
 	*maskp = mask;
 	*valp = value;
···
 
 	if (pmc_inuse & 0x7c)
 		mmcr[0] |= MMCR0_PMCjCE;
+
+	/* If we're not using PMC 5 or 6, freeze them */
+	if (!(pmc_inuse & 0x60))
+		mmcr[0] |= MMCR0_FC56;
 
 	mmcr[1] = mmcr1;
 	mmcr[2] = mmcra;
···
 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
-	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB,
+	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
 	.attr_groups		= power8_pmu_attr_groups,
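The comments in power8-pmu.c describe the (mask, value) constraint scheme: each event contributes a bitmask of the fields it constrains plus the values it needs in them, and events can only co-schedule if they agree. The new EBB constraint is a single bit, so EBB and non-EBB events can never share the PMU. A simplified agreement check (this is not the kernel's full adder-field algorithm; the shift mirrors the patch, the helper is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit EBB constraint field, as in the patch above */
#define CNST_EBB_SHIFT  24
#define CNST_EBB_VAL(v) (((uint64_t)(v) & 1) << CNST_EBB_SHIFT)
#define CNST_EBB_MASK   CNST_EBB_VAL(1)

/*
 * Two events are compatible only if every field constrained by both
 * carries the same value in both. Value fields (like EBB) in the real
 * code are combined this way; adder fields (P1..P6, NC) are summed.
 */
static int constraints_agree(uint64_t mask1, uint64_t value1,
                             uint64_t mask2, uint64_t value2)
{
    uint64_t common = mask1 & mask2;   /* fields constrained by both events */

    return (value1 & common) == (value2 & common);
}
```

With both events setting CNST_EBB_MASK, an EBB and a non-EBB event disagree on the EBB bit and are rejected, which is exactly the isolation the patch wants.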
+39 -4
arch/powerpc/platforms/44x/currituck.c
···
 }
 
 #ifdef CONFIG_SMP
-static void __cpuinit smp_ppc47x_setup_cpu(int cpu)
+static void smp_ppc47x_setup_cpu(int cpu)
 {
 	mpic_setup_this_cpu();
 }
 
-static int __cpuinit smp_ppc47x_kick_cpu(int cpu)
+static int smp_ppc47x_kick_cpu(int cpu)
 {
 	struct device_node *cpunode = of_get_cpu_node(cpu, NULL);
 	const u64 *spin_table_addr_prop;
···
 	return 1;
 }
 
+static int board_rev = -1;
+static int __init ppc47x_get_board_rev(void)
+{
+	u8 fpga_reg0;
+	void *fpga;
+	struct device_node *np;
+
+	np = of_find_compatible_node(NULL, NULL, "ibm,currituck-fpga");
+	if (!np)
+		goto fail;
+
+	fpga = of_iomap(np, 0);
+	of_node_put(np);
+	if (!fpga)
+		goto fail;
+
+	fpga_reg0 = ioread8(fpga);
+	board_rev = fpga_reg0 & 0x03;
+	pr_info("%s: Found board revision %d\n", __func__, board_rev);
+	iounmap(fpga);
+	return 0;
+
+fail:
+	pr_info("%s: Unable to find board revision\n", __func__);
+	return 0;
+}
+machine_arch_initcall(ppc47x, ppc47x_get_board_rev);
+
 /* Use USB controller should have been hardware swizzled but it wasn't :( */
 static void ppc47x_pci_irq_fixup(struct pci_dev *dev)
 {
 	if (dev->vendor == 0x1033 && (dev->device == 0x0035 ||
 	    dev->device == 0x00e0)) {
-		dev->irq = irq_create_mapping(NULL, 47);
-		pr_info("%s: Mapping irq 47 %d\n", __func__, dev->irq);
+		if (board_rev == 0) {
+			dev->irq = irq_create_mapping(NULL, 47);
+			pr_info("%s: Mapping irq %d\n", __func__, dev->irq);
+		} else if (board_rev == 2) {
+			dev->irq = irq_create_mapping(NULL, 49);
+			pr_info("%s: Mapping irq %d\n", __func__, dev->irq);
+		} else {
+			pr_alert("%s: Unknown board revision\n", __func__);
+		}
 	}
 }
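The Currituck change above reads the board revision from the low two bits of FPGA register 0 and picks the USB controller's IRQ accordingly: 47 on rev 0 boards, 49 on rev 2. A user-space model of that decode (the helper name and -1 sentinel are illustrative only; the masks and IRQ numbers mirror the patch):

```c
#include <assert.h>

/* board_rev = fpga_reg0 & 0x03, then rev -> IRQ as in ppc47x_pci_irq_fixup() */
static int usb_irq_for_board(unsigned char fpga_reg0)
{
    switch (fpga_reg0 & 0x03) {
    case 0:
        return 47;      /* rev 0 boards */
    case 2:
        return 49;      /* rev 2 boards */
    default:
        return -1;      /* unknown board revision */
    }
}
```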
+2 -2
arch/powerpc/platforms/44x/iss4xx.c
···
 }
 
 #ifdef CONFIG_SMP
-static void __cpuinit smp_iss4xx_setup_cpu(int cpu)
+static void smp_iss4xx_setup_cpu(int cpu)
 {
 	mpic_setup_this_cpu();
 }
 
-static int __cpuinit smp_iss4xx_kick_cpu(int cpu)
+static int smp_iss4xx_kick_cpu(int cpu)
 {
 	struct device_node *cpunode = of_get_cpu_node(cpu, NULL);
 	const u64 *spin_table_addr_prop;
+2 -4
arch/powerpc/platforms/512x/mpc5121_ads.c
···
 	mpc83xx_add_bridge(np);
 #endif
 
-#if defined(CONFIG_FB_FSL_DIU) || defined(CONFIG_FB_FSL_DIU_MODULE)
-	mpc512x_setup_diu();
-#endif
+	mpc512x_setup_arch();
 }
 
 static void __init mpc5121_ads_init_IRQ(void)
···
 	.probe = mpc5121_ads_probe,
 	.setup_arch = mpc5121_ads_setup_arch,
 	.init = mpc512x_init,
-	.init_early = mpc512x_init_diu,
+	.init_early = mpc512x_init_early,
 	.init_IRQ = mpc5121_ads_init_IRQ,
 	.get_irq = ipic_get_irq,
 	.calibrate_decr = generic_calibrate_decr,
+3 -9
arch/powerpc/platforms/512x/mpc512x.h
···
 #ifndef __MPC512X_H__
 #define __MPC512X_H__
 extern void __init mpc512x_init_IRQ(void);
+extern void __init mpc512x_init_early(void);
 extern void __init mpc512x_init(void);
+extern void __init mpc512x_setup_arch(void);
 extern int __init mpc5121_clk_init(void);
-void __init mpc512x_declare_of_platform_devices(void);
 extern const char *mpc512x_select_psc_compat(void);
+extern const char *mpc512x_select_reset_compat(void);
 extern void mpc512x_restart(char *cmd);
-
-#if defined(CONFIG_FB_FSL_DIU) || defined(CONFIG_FB_FSL_DIU_MODULE)
-void mpc512x_init_diu(void);
-void mpc512x_setup_diu(void);
-#else
-#define mpc512x_init_diu NULL
-#define mpc512x_setup_diu NULL
-#endif
 
 #endif /* __MPC512X_H__ */
+2 -2
arch/powerpc/platforms/512x/mpc512x_generic.c
···
 	.name = "MPC512x generic",
 	.probe = mpc512x_generic_probe,
 	.init = mpc512x_init,
-	.init_early = mpc512x_init_diu,
-	.setup_arch = mpc512x_setup_diu,
+	.init_early = mpc512x_init_early,
+	.setup_arch = mpc512x_setup_arch,
 	.init_IRQ = mpc512x_init_IRQ,
 	.get_irq = ipic_get_irq,
 	.calibrate_decr = generic_calibrate_decr,
+28 -3
arch/powerpc/platforms/512x/mpc512x_shared.c
···
 static void __init mpc512x_restart_init(void)
 {
 	struct device_node *np;
+	const char *reset_compat;
 
-	np = of_find_compatible_node(NULL, NULL, "fsl,mpc5121-reset");
+	reset_compat = mpc512x_select_reset_compat();
+	np = of_find_compatible_node(NULL, NULL, reset_compat);
 	if (!np)
 		return;
···
 		;
 }
 
-#if defined(CONFIG_FB_FSL_DIU) || defined(CONFIG_FB_FSL_DIU_MODULE)
+#if IS_ENABLED(CONFIG_FB_FSL_DIU)
 
 struct fsl_diu_shared_fb {
 	u8		gamma[0x300];	/* 32-bit aligned! */
···
 	return NULL;
 }
 
+const char *mpc512x_select_reset_compat(void)
+{
+	if (of_machine_is_compatible("fsl,mpc5121"))
+		return "fsl,mpc5121-reset";
+
+	if (of_machine_is_compatible("fsl,mpc5125"))
+		return "fsl,mpc5125-reset";
+
+	return NULL;
+}
+
 static unsigned int __init get_fifo_size(struct device_node *np,
 					 char *prop_name)
 {
···
 	}
 }
 
+void __init mpc512x_init_early(void)
+{
+	mpc512x_restart_init();
+	if (IS_ENABLED(CONFIG_FB_FSL_DIU))
+		mpc512x_init_diu();
+}
+
 void __init mpc512x_init(void)
 {
 	mpc5121_clk_init();
 	mpc512x_declare_of_platform_devices();
-	mpc512x_restart_init();
 	mpc512x_psc_fifo_init();
+}
+
+void __init mpc512x_setup_arch(void)
+{
+	if (IS_ENABLED(CONFIG_FB_FSL_DIU))
+		mpc512x_setup_diu();
 }
 
 /**
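The mpc512x_shared.c rework replaces `#if defined(CONFIG_FB_FSL_DIU) ...` preprocessor blocks with `IS_ENABLED()`, which keeps the call site compiled in every configuration and relies on dead-code elimination to drop the disabled branch. A minimal user-space model of that pattern (the kernel's IS_ENABLED() inspects config symbols; here it is just a constant, and the stub stands in for mpc512x_init_diu()):

```c
#include <assert.h>

#define IS_ENABLED(option) (option)   /* model only; not the kernel macro */
#define CONFIG_FB_FSL_DIU 1

static int diu_initialized;

static void mpc512x_init_diu_stub(void)
{
    diu_initialized = 1;
}

/* Mirrors the shape of mpc512x_init_early() in the patch above */
static void init_early_model(void)
{
    if (IS_ENABLED(CONFIG_FB_FSL_DIU))
        mpc512x_init_diu_stub();
}
```

The advantage over `#ifdef` is that both branches are always parsed and type-checked, so a disabled config can no longer hide bit-rot.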
+2 -2
arch/powerpc/platforms/512x/pdm360ng.c
···
 define_machine(pdm360ng) {
 	.name = "PDM360NG",
 	.probe = pdm360ng_probe,
-	.setup_arch = mpc512x_setup_diu,
+	.setup_arch = mpc512x_setup_arch,
 	.init = pdm360ng_init,
-	.init_early = mpc512x_init_diu,
+	.init_early = mpc512x_init_early,
 	.init_IRQ = mpc512x_init_IRQ,
 	.get_irq = ipic_get_irq,
 	.calibrate_decr = generic_calibrate_decr,
+1 -11
arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
···
 	.id_table = mcu_ids,
 };
 
-static int __init mcu_init(void)
-{
-	return i2c_add_driver(&mcu_driver);
-}
-module_init(mcu_init);
-
-static void __exit mcu_exit(void)
-{
-	i2c_del_driver(&mcu_driver);
-}
-module_exit(mcu_exit);
+module_i2c_driver(mcu_driver);
 
 MODULE_DESCRIPTION("Power Management and GPIO expander driver for "
 		   "MPC8349E-mITX-compatible MCU");
-5
arch/powerpc/platforms/85xx/p5020_ds.c
···
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
 #endif
-/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
-#ifdef CONFIG_PPC64
-	.get_irq		= mpic_get_irq,
-#else
 	.get_irq		= mpic_get_coreint_irq,
-#endif
 	.restart		= fsl_rstcr_restart,
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= udbg_progress,
-5
arch/powerpc/platforms/85xx/p5040_ds.c
···
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
 #endif
-/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
-#ifdef CONFIG_PPC64
-	.get_irq		= mpic_get_irq,
-#else
 	.get_irq		= mpic_get_coreint_irq,
-#endif
 	.restart		= fsl_rstcr_restart,
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= udbg_progress,
+3 -3
arch/powerpc/platforms/85xx/smp.c
···
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static void __cpuinit smp_85xx_mach_cpu_die(void)
+static void smp_85xx_mach_cpu_die(void)
 {
 	unsigned int cpu = smp_processor_id();
 	u32 tmp;
···
 	return in_be32(&((struct epapr_spin_table *)spin_table)->addr_l);
 }
 
-static int __cpuinit smp_85xx_kick_cpu(int nr)
+static int smp_85xx_kick_cpu(int nr)
 {
 	unsigned long flags;
 	const u64 *cpu_rel_addr;
···
 }
 #endif /* CONFIG_KEXEC */
 
-static void __cpuinit smp_85xx_setup_cpu(int cpu_nr)
+static void smp_85xx_setup_cpu(int cpu_nr)
 {
 	if (smp_85xx_ops.probe == smp_mpic_probe)
 		mpic_setup_this_cpu();
-5
arch/powerpc/platforms/85xx/t4240_qds.c
···
 #ifdef CONFIG_PCI
 	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
 #endif
-/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
-#ifdef CONFIG_PPC64
-	.get_irq		= mpic_get_irq,
-#else
 	.get_irq		= mpic_get_coreint_irq,
-#endif
 	.restart		= fsl_rstcr_restart,
 	.calibrate_decr		= generic_calibrate_decr,
 	.progress		= udbg_progress,
+4 -10
arch/powerpc/platforms/8xx/m8xx_setup.c
···
 
 static struct irqaction tbint_irqaction = {
 	.handler = timebase_interrupt,
+	.flags = IRQF_NO_THREAD,
 	.name = "tbint",
 };
···
 
 static void cpm_cascade(unsigned int irq, struct irq_desc *desc)
 {
-	struct irq_chip *chip;
-	int cascade_irq;
+	struct irq_chip *chip = irq_desc_get_chip(desc);
+	int cascade_irq = cpm_get_irq();
 
-	if ((cascade_irq = cpm_get_irq()) >= 0) {
-		struct irq_desc *cdesc = irq_to_desc(cascade_irq);
-
+	if (cascade_irq >= 0)
 		generic_handle_irq(cascade_irq);
 
-		chip = irq_desc_get_chip(cdesc);
-		chip->irq_eoi(&cdesc->irq_data);
-	}
-
-	chip = irq_desc_get_chip(desc);
 	chip->irq_eoi(&desc->irq_data);
 }
+26
arch/powerpc/platforms/Kconfig
···
 	bool
 	default n
 
+config MPIC_TIMER
+	bool "MPIC Global Timer"
+	depends on MPIC && FSL_SOC
+	default n
+	help
+	  The MPIC global timer is a hardware timer inside the
+	  Freescale PIC complying with OpenPIC standard. When the
+	  specified interval times out, the hardware timer generates
+	  an interrupt. The driver currently is only tested on fsl
+	  chip, but it can potentially support other global timers
+	  complying with the OpenPIC standard.
+
+config FSL_MPIC_TIMER_WAKEUP
+	tristate "Freescale MPIC global timer wakeup driver"
+	depends on FSL_SOC && MPIC_TIMER && PM
+	default n
+	help
+	  The driver provides a way to wake up the system by MPIC
+	  timer.
+	  e.g. "echo 5 > /sys/devices/system/mpic/timer_wakeup"
+
 config PPC_EPAPR_HV_PIC
 	bool
 	default n
···
 	bool "Support for GX bus based adapters"
 	help
 	  Bus device driver for GX bus based adapters.
+
+config EEH
+	bool
+	depends on (PPC_POWERNV || PPC_PSERIES) && PCI
+	default y
 
 config PPC_MPC106
 	bool
+1
arch/powerpc/platforms/Kconfig.cputype
···
 	select PPC_FPU
 	select PPC_HAVE_PMU_SUPPORT
 	select SYS_SUPPORTS_HUGETLBFS
+	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if PPC_64K_PAGES
 
 config PPC_BOOK3E_64
 	bool "Embedded processors"
+10 -6
arch/powerpc/platforms/cell/beat_htab.c
···
 static long beat_lpar_hpte_updatepp(unsigned long slot,
 				    unsigned long newpp,
 				    unsigned long vpn,
-				    int psize, int ssize, int local)
+				    int psize, int apsize,
+				    int ssize, int local)
 {
 	unsigned long lpar_rc;
 	u64 dummy0, dummy1;
···
 }
 
 static void beat_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
-				      int psize, int ssize, int local)
+				      int psize, int apsize,
+				      int ssize, int local)
 {
 	unsigned long want_v;
 	unsigned long lpar_rc;
···
  * already zero. For now I am paranoid.
  */
 static long beat_lpar_hpte_updatepp_v3(unsigned long slot,
 				       unsigned long newpp,
 				       unsigned long vpn,
-				       int psize, int ssize, int local)
+				       int psize, int apsize,
+				       int ssize, int local)
 {
 	unsigned long lpar_rc;
 	unsigned long want_v;
···
 }
 
 static void beat_lpar_hpte_invalidate_v3(unsigned long slot, unsigned long vpn,
-					 int psize, int ssize, int local)
+					 int psize, int apsize,
+					 int ssize, int local)
 {
 	unsigned long want_v;
 	unsigned long lpar_rc;
+1 -1
arch/powerpc/platforms/cell/smp.c
···
 	 * during boot if the user requests it.  Odd-numbered
 	 * cpus are assumed to be secondary threads.
 	 */
-	if (system_state < SYSTEM_RUNNING &&
+	if (system_state == SYSTEM_BOOTING &&
 	    cpu_has_feature(CPU_FTR_SMT) &&
 	    !smt_enabled_at_boot && cpu_thread_in_core(nr) != 0)
 		return 0;
+1 -1
arch/powerpc/platforms/powermac/smp.c
···
 	return NOTIFY_OK;
 }
 
-static struct notifier_block __cpuinitdata smp_core99_cpu_nb = {
+static struct notifier_block smp_core99_cpu_nb = {
 	.notifier_call	= smp_core99_cpu_notify,
 };
 #endif /* CONFIG_HOTPLUG_CPU */
+1
arch/powerpc/platforms/powernv/Makefile
···
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_EEH)	+= eeh-ioda.o eeh-powernv.o
+916
arch/powerpc/platforms/powernv/eeh-ioda.c
··· 1 + /* 2 + * The file intends to implement the functions needed by EEH, which is 3 + * built on IODA compliant chip. Actually, lots of functions related 4 + * to EEH would be built based on the OPAL APIs. 5 + * 6 + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013. 7 + * 8 + * This program is free software; you can redistribute it and/or modify 9 + * it under the terms of the GNU General Public License as published by 10 + * the Free Software Foundation; either version 2 of the License, or 11 + * (at your option) any later version. 12 + */ 13 + 14 + #include <linux/bootmem.h> 15 + #include <linux/debugfs.h> 16 + #include <linux/delay.h> 17 + #include <linux/init.h> 18 + #include <linux/io.h> 19 + #include <linux/irq.h> 20 + #include <linux/kernel.h> 21 + #include <linux/msi.h> 22 + #include <linux/notifier.h> 23 + #include <linux/pci.h> 24 + #include <linux/string.h> 25 + 26 + #include <asm/eeh.h> 27 + #include <asm/eeh_event.h> 28 + #include <asm/io.h> 29 + #include <asm/iommu.h> 30 + #include <asm/msi_bitmap.h> 31 + #include <asm/opal.h> 32 + #include <asm/pci-bridge.h> 33 + #include <asm/ppc-pci.h> 34 + #include <asm/tce.h> 35 + 36 + #include "powernv.h" 37 + #include "pci.h" 38 + 39 + /* Debugging option */ 40 + #ifdef IODA_EEH_DBG_ON 41 + #define IODA_EEH_DBG(args...) pr_info(args) 42 + #else 43 + #define IODA_EEH_DBG(args...) 
44 + #endif 45 + 46 + static char *hub_diag = NULL; 47 + static int ioda_eeh_nb_init = 0; 48 + 49 + static int ioda_eeh_event(struct notifier_block *nb, 50 + unsigned long events, void *change) 51 + { 52 + uint64_t changed_evts = (uint64_t)change; 53 + 54 + /* We simply send special EEH event */ 55 + if ((changed_evts & OPAL_EVENT_PCI_ERROR) && 56 + (events & OPAL_EVENT_PCI_ERROR)) 57 + eeh_send_failure_event(NULL); 58 + 59 + return 0; 60 + } 61 + 62 + static struct notifier_block ioda_eeh_nb = { 63 + .notifier_call = ioda_eeh_event, 64 + .next = NULL, 65 + .priority = 0 66 + }; 67 + 68 + #ifdef CONFIG_DEBUG_FS 69 + static int ioda_eeh_dbgfs_set(void *data, u64 val) 70 + { 71 + struct pci_controller *hose = data; 72 + struct pnv_phb *phb = hose->private_data; 73 + 74 + out_be64(phb->regs + 0xD10, val); 75 + return 0; 76 + } 77 + 78 + static int ioda_eeh_dbgfs_get(void *data, u64 *val) 79 + { 80 + struct pci_controller *hose = data; 81 + struct pnv_phb *phb = hose->private_data; 82 + 83 + *val = in_be64(phb->regs + 0xD10); 84 + return 0; 85 + } 86 + 87 + DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_dbgfs_ops, ioda_eeh_dbgfs_get, 88 + ioda_eeh_dbgfs_set, "0x%llx\n"); 89 + #endif /* CONFIG_DEBUG_FS */ 90 + 91 + /** 92 + * ioda_eeh_post_init - Chip dependent post initialization 93 + * @hose: PCI controller 94 + * 95 + * The function will be called after eeh PEs and devices 96 + * have been built. That means the EEH is ready to supply 97 + * service with I/O cache. 
98 + */ 99 + static int ioda_eeh_post_init(struct pci_controller *hose) 100 + { 101 + struct pnv_phb *phb = hose->private_data; 102 + int ret; 103 + 104 + /* Register OPAL event notifier */ 105 + if (!ioda_eeh_nb_init) { 106 + ret = opal_notifier_register(&ioda_eeh_nb); 107 + if (ret) { 108 + pr_err("%s: Can't register OPAL event notifier (%d)\n", 109 + __func__, ret); 110 + return ret; 111 + } 112 + 113 + ioda_eeh_nb_init = 1; 114 + } 115 + 116 + /* FIXME: Enable it for PHB3 later */ 117 + if (phb->type == PNV_PHB_IODA1) { 118 + if (!hub_diag) { 119 + hub_diag = (char *)__get_free_page(GFP_KERNEL | 120 + __GFP_ZERO); 121 + if (!hub_diag) { 122 + pr_err("%s: Out of memory !\n", 123 + __func__); 124 + return -ENOMEM; 125 + } 126 + } 127 + 128 + #ifdef CONFIG_DEBUG_FS 129 + if (phb->dbgfs) 130 + debugfs_create_file("err_injct", 0600, 131 + phb->dbgfs, hose, 132 + &ioda_eeh_dbgfs_ops); 133 + #endif 134 + 135 + phb->eeh_state |= PNV_EEH_STATE_ENABLED; 136 + } 137 + 138 + return 0; 139 + } 140 + 141 + /** 142 + * ioda_eeh_set_option - Set EEH operation or I/O setting 143 + * @pe: EEH PE 144 + * @option: options 145 + * 146 + * Enable or disable EEH option for the indicated PE. The 147 + * function also can be used to enable I/O or DMA for the 148 + * PE. 
149 + */ 150 + static int ioda_eeh_set_option(struct eeh_pe *pe, int option) 151 + { 152 + s64 ret; 153 + u32 pe_no; 154 + struct pci_controller *hose = pe->phb; 155 + struct pnv_phb *phb = hose->private_data; 156 + 157 + /* Check on PE number */ 158 + if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) { 159 + pr_err("%s: PE address %x out of range [0, %x] " 160 + "on PHB#%x\n", 161 + __func__, pe->addr, phb->ioda.total_pe, 162 + hose->global_number); 163 + return -EINVAL; 164 + } 165 + 166 + pe_no = pe->addr; 167 + switch (option) { 168 + case EEH_OPT_DISABLE: 169 + ret = -EEXIST; 170 + break; 171 + case EEH_OPT_ENABLE: 172 + ret = 0; 173 + break; 174 + case EEH_OPT_THAW_MMIO: 175 + ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, 176 + OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO); 177 + if (ret) { 178 + pr_warning("%s: Failed to enable MMIO for " 179 + "PHB#%x-PE#%x, err=%lld\n", 180 + __func__, hose->global_number, pe_no, ret); 181 + return -EIO; 182 + } 183 + 184 + break; 185 + case EEH_OPT_THAW_DMA: 186 + ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, 187 + OPAL_EEH_ACTION_CLEAR_FREEZE_DMA); 188 + if (ret) { 189 + pr_warning("%s: Failed to enable DMA for " 190 + "PHB#%x-PE#%x, err=%lld\n", 191 + __func__, hose->global_number, pe_no, ret); 192 + return -EIO; 193 + } 194 + 195 + break; 196 + default: 197 + pr_warning("%s: Invalid option %d\n", __func__, option); 198 + return -EINVAL; 199 + } 200 + 201 + return ret; 202 + } 203 + 204 + /** 205 + * ioda_eeh_get_state - Retrieve the state of PE 206 + * @pe: EEH PE 207 + * 208 + * The PE's state should be retrieved from the PEEV, PEST 209 + * IODA tables. Since the OPAL has exported the function 210 + * to do it, it'd better to use that. 
211 + */ 212 + static int ioda_eeh_get_state(struct eeh_pe *pe) 213 + { 214 + s64 ret = 0; 215 + u8 fstate; 216 + u16 pcierr; 217 + u32 pe_no; 218 + int result; 219 + struct pci_controller *hose = pe->phb; 220 + struct pnv_phb *phb = hose->private_data; 221 + 222 + /* 223 + * Sanity check on PE address. The PHB PE address should 224 + * be zero. 225 + */ 226 + if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) { 227 + pr_err("%s: PE address %x out of range [0, %x] " 228 + "on PHB#%x\n", 229 + __func__, pe->addr, phb->ioda.total_pe, 230 + hose->global_number); 231 + return EEH_STATE_NOT_SUPPORT; 232 + } 233 + 234 + /* Retrieve PE status through OPAL */ 235 + pe_no = pe->addr; 236 + ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, 237 + &fstate, &pcierr, NULL); 238 + if (ret) { 239 + pr_err("%s: Failed to get EEH status on " 240 + "PHB#%x-PE#%x\n, err=%lld\n", 241 + __func__, hose->global_number, pe_no, ret); 242 + return EEH_STATE_NOT_SUPPORT; 243 + } 244 + 245 + /* Check PHB status */ 246 + if (pe->type & EEH_PE_PHB) { 247 + result = 0; 248 + result &= ~EEH_STATE_RESET_ACTIVE; 249 + 250 + if (pcierr != OPAL_EEH_PHB_ERROR) { 251 + result |= EEH_STATE_MMIO_ACTIVE; 252 + result |= EEH_STATE_DMA_ACTIVE; 253 + result |= EEH_STATE_MMIO_ENABLED; 254 + result |= EEH_STATE_DMA_ENABLED; 255 + } 256 + 257 + return result; 258 + } 259 + 260 + /* Parse result out */ 261 + result = 0; 262 + switch (fstate) { 263 + case OPAL_EEH_STOPPED_NOT_FROZEN: 264 + result &= ~EEH_STATE_RESET_ACTIVE; 265 + result |= EEH_STATE_MMIO_ACTIVE; 266 + result |= EEH_STATE_DMA_ACTIVE; 267 + result |= EEH_STATE_MMIO_ENABLED; 268 + result |= EEH_STATE_DMA_ENABLED; 269 + break; 270 + case OPAL_EEH_STOPPED_MMIO_FREEZE: 271 + result &= ~EEH_STATE_RESET_ACTIVE; 272 + result |= EEH_STATE_DMA_ACTIVE; 273 + result |= EEH_STATE_DMA_ENABLED; 274 + break; 275 + case OPAL_EEH_STOPPED_DMA_FREEZE: 276 + result &= ~EEH_STATE_RESET_ACTIVE; 277 + result |= EEH_STATE_MMIO_ACTIVE; 278 + result |= 
EEH_STATE_MMIO_ENABLED; 279 + break; 280 + case OPAL_EEH_STOPPED_MMIO_DMA_FREEZE: 281 + result &= ~EEH_STATE_RESET_ACTIVE; 282 + break; 283 + case OPAL_EEH_STOPPED_RESET: 284 + result |= EEH_STATE_RESET_ACTIVE; 285 + break; 286 + case OPAL_EEH_STOPPED_TEMP_UNAVAIL: 287 + result |= EEH_STATE_UNAVAILABLE; 288 + break; 289 + case OPAL_EEH_STOPPED_PERM_UNAVAIL: 290 + result |= EEH_STATE_NOT_SUPPORT; 291 + break; 292 + default: 293 + pr_warning("%s: Unexpected EEH status 0x%x " 294 + "on PHB#%x-PE#%x\n", 295 + __func__, fstate, hose->global_number, pe_no); 296 + } 297 + 298 + return result; 299 + } 300 + 301 + static int ioda_eeh_pe_clear(struct eeh_pe *pe) 302 + { 303 + struct pci_controller *hose; 304 + struct pnv_phb *phb; 305 + u32 pe_no; 306 + u8 fstate; 307 + u16 pcierr; 308 + s64 ret; 309 + 310 + pe_no = pe->addr; 311 + hose = pe->phb; 312 + phb = pe->phb->private_data; 313 + 314 + /* Clear the EEH error on the PE */ 315 + ret = opal_pci_eeh_freeze_clear(phb->opal_id, 316 + pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL); 317 + if (ret) { 318 + pr_err("%s: Failed to clear EEH error for " 319 + "PHB#%x-PE#%x, err=%lld\n", 320 + __func__, hose->global_number, pe_no, ret); 321 + return -EIO; 322 + } 323 + 324 + /* 325 + * Read the PE state back and verify that the frozen 326 + * state has been removed. 
327 + */ 328 + ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, 329 + &fstate, &pcierr, NULL); 330 + if (ret) { 331 + pr_err("%s: Failed to get EEH status on " 332 + "PHB#%x-PE#%x\n, err=%lld\n", 333 + __func__, hose->global_number, pe_no, ret); 334 + return -EIO; 335 + } 336 + 337 + if (fstate != OPAL_EEH_STOPPED_NOT_FROZEN) { 338 + pr_err("%s: Frozen state not cleared on " 339 + "PHB#%x-PE#%x, sts=%x\n", 340 + __func__, hose->global_number, pe_no, fstate); 341 + return -EIO; 342 + } 343 + 344 + return 0; 345 + } 346 + 347 + static s64 ioda_eeh_phb_poll(struct pnv_phb *phb) 348 + { 349 + s64 rc = OPAL_HARDWARE; 350 + 351 + while (1) { 352 + rc = opal_pci_poll(phb->opal_id); 353 + if (rc <= 0) 354 + break; 355 + 356 + msleep(rc); 357 + } 358 + 359 + return rc; 360 + } 361 + 362 + static int ioda_eeh_phb_reset(struct pci_controller *hose, int option) 363 + { 364 + struct pnv_phb *phb = hose->private_data; 365 + s64 rc = OPAL_HARDWARE; 366 + 367 + pr_debug("%s: Reset PHB#%x, option=%d\n", 368 + __func__, hose->global_number, option); 369 + 370 + /* Issue PHB complete reset request */ 371 + if (option == EEH_RESET_FUNDAMENTAL || 372 + option == EEH_RESET_HOT) 373 + rc = opal_pci_reset(phb->opal_id, 374 + OPAL_PHB_COMPLETE, 375 + OPAL_ASSERT_RESET); 376 + else if (option == EEH_RESET_DEACTIVATE) 377 + rc = opal_pci_reset(phb->opal_id, 378 + OPAL_PHB_COMPLETE, 379 + OPAL_DEASSERT_RESET); 380 + if (rc < 0) 381 + goto out; 382 + 383 + /* 384 + * Poll state of the PHB until the request is done 385 + * successfully. 
386 + */ 387 + rc = ioda_eeh_phb_poll(phb); 388 + out: 389 + if (rc != OPAL_SUCCESS) 390 + return -EIO; 391 + 392 + return 0; 393 + } 394 + 395 + static int ioda_eeh_root_reset(struct pci_controller *hose, int option) 396 + { 397 + struct pnv_phb *phb = hose->private_data; 398 + s64 rc = OPAL_SUCCESS; 399 + 400 + pr_debug("%s: Reset PHB#%x, option=%d\n", 401 + __func__, hose->global_number, option); 402 + 403 + /* 404 + * During the reset deassert time, we needn't care 405 + * the reset scope because the firmware does nothing 406 + * for fundamental or hot reset during deassert phase. 407 + */ 408 + if (option == EEH_RESET_FUNDAMENTAL) 409 + rc = opal_pci_reset(phb->opal_id, 410 + OPAL_PCI_FUNDAMENTAL_RESET, 411 + OPAL_ASSERT_RESET); 412 + else if (option == EEH_RESET_HOT) 413 + rc = opal_pci_reset(phb->opal_id, 414 + OPAL_PCI_HOT_RESET, 415 + OPAL_ASSERT_RESET); 416 + else if (option == EEH_RESET_DEACTIVATE) 417 + rc = opal_pci_reset(phb->opal_id, 418 + OPAL_PCI_HOT_RESET, 419 + OPAL_DEASSERT_RESET); 420 + if (rc < 0) 421 + goto out; 422 + 423 + /* Poll state of the PHB until the request is done */ 424 + rc = ioda_eeh_phb_poll(phb); 425 + out: 426 + if (rc != OPAL_SUCCESS) 427 + return -EIO; 428 + 429 + return 0; 430 + } 431 + 432 + static int ioda_eeh_bridge_reset(struct pci_controller *hose, 433 + struct pci_dev *dev, int option) 434 + { 435 + u16 ctrl; 436 + 437 + pr_debug("%s: Reset device %04x:%02x:%02x.%01x with option %d\n", 438 + __func__, hose->global_number, dev->bus->number, 439 + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), option); 440 + 441 + switch (option) { 442 + case EEH_RESET_FUNDAMENTAL: 443 + case EEH_RESET_HOT: 444 + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); 445 + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; 446 + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); 447 + break; 448 + case EEH_RESET_DEACTIVATE: 449 + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); 450 + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; 451 + 
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); 452 + break; 453 + } 454 + 455 + return 0; 456 + } 457 + 458 + /** 459 + * ioda_eeh_reset - Reset the indicated PE 460 + * @pe: EEH PE 461 + * @option: reset option 462 + * 463 + * Do reset on the indicated PE. For PCI bus sensitive PE, 464 + * we need to reset the parent p2p bridge. The PHB has to 465 + * be reinitialized if the p2p bridge is root bridge. For 466 + * PCI device sensitive PE, we will try to reset the device 467 + * through FLR. For now, we don't have OPAL APIs to do HARD 468 + * reset yet, so all reset would be SOFT (HOT) reset. 469 + */ 470 + static int ioda_eeh_reset(struct eeh_pe *pe, int option) 471 + { 472 + struct pci_controller *hose = pe->phb; 473 + struct eeh_dev *edev; 474 + struct pci_dev *dev; 475 + int ret; 476 + 477 + /* 478 + * Anyway, we have to clear the problematic state for the 479 + * corresponding PE. However, we needn't do it if the PE 480 + * is PHB associated. That means the PHB is having fatal 481 + * errors and it needs reset. Further more, the AIB interface 482 + * isn't reliable any more. 483 + */ 484 + if (!(pe->type & EEH_PE_PHB) && 485 + (option == EEH_RESET_HOT || 486 + option == EEH_RESET_FUNDAMENTAL)) { 487 + ret = ioda_eeh_pe_clear(pe); 488 + if (ret) 489 + return -EIO; 490 + } 491 + 492 + /* 493 + * The rules applied to reset, either fundamental or hot reset: 494 + * 495 + * We always reset the direct upstream bridge of the PE. If the 496 + * direct upstream bridge isn't root bridge, we always take hot 497 + * reset no matter what option (fundamental or hot) is. Otherwise, 498 + * we should do the reset according to the required option. 499 + */ 500 + if (pe->type & EEH_PE_PHB) { 501 + ret = ioda_eeh_phb_reset(hose, option); 502 + } else { 503 + if (pe->type & EEH_PE_DEVICE) { 504 + /* 505 + * If it's device PE, we didn't refer to the parent 506 + * PCI bus yet. So we have to figure it out indirectly. 
507 + */ 508 + edev = list_first_entry(&pe->edevs, 509 + struct eeh_dev, list); 510 + dev = eeh_dev_to_pci_dev(edev); 511 + dev = dev->bus->self; 512 + } else { 513 + /* 514 + * If it's bus PE, the parent PCI bus is already there 515 + * and just pick it up. 516 + */ 517 + dev = pe->bus->self; 518 + } 519 + 520 + /* 521 + * Do reset based on the fact that the direct upstream bridge 522 + * is root bridge (port) or not. 523 + */ 524 + if (dev->bus->number == 0) 525 + ret = ioda_eeh_root_reset(hose, option); 526 + else 527 + ret = ioda_eeh_bridge_reset(hose, dev, option); 528 + } 529 + 530 + return ret; 531 + } 532 + 533 + /** 534 + * ioda_eeh_get_log - Retrieve error log 535 + * @pe: EEH PE 536 + * @severity: Severity level of the log 537 + * @drv_log: buffer to store the log 538 + * @len: space of the log buffer 539 + * 540 + * The function is used to retrieve error log from P7IOC. 541 + */ 542 + static int ioda_eeh_get_log(struct eeh_pe *pe, int severity, 543 + char *drv_log, unsigned long len) 544 + { 545 + s64 ret; 546 + unsigned long flags; 547 + struct pci_controller *hose = pe->phb; 548 + struct pnv_phb *phb = hose->private_data; 549 + 550 + spin_lock_irqsave(&phb->lock, flags); 551 + 552 + ret = opal_pci_get_phb_diag_data2(phb->opal_id, 553 + phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE); 554 + if (ret) { 555 + spin_unlock_irqrestore(&phb->lock, flags); 556 + pr_warning("%s: Failed to get log for PHB#%x-PE#%x\n", 557 + __func__, hose->global_number, pe->addr); 558 + return -EIO; 559 + } 560 + 561 + /* 562 + * FIXME: We probably need log the error in somewhere. 563 + * Lets make it up in future. 564 + */ 565 + /* pr_info("%s", phb->diag.blob); */ 566 + 567 + spin_unlock_irqrestore(&phb->lock, flags); 568 + 569 + return 0; 570 + } 571 + 572 + /** 573 + * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE 574 + * @pe: EEH PE 575 + * 576 + * For particular PE, it might have included PCI bridges. 
In order 577 + to make the PE work properly, those PCI bridges should be configured 578 + correctly. However, nothing needs to be done on P7IOC since the reset 579 + function will do everything that should be covered by the function. 580 + */ 581 + static int ioda_eeh_configure_bridge(struct eeh_pe *pe) 582 + { 583 + return 0; 584 + } 585 + 586 + static void ioda_eeh_hub_diag_common(struct OpalIoP7IOCErrorData *data) 587 + { 588 + /* GEM */ 589 + pr_info(" GEM XFIR: %016llx\n", data->gemXfir); 590 + pr_info(" GEM RFIR: %016llx\n", data->gemRfir); 591 + pr_info(" GEM RIRQFIR: %016llx\n", data->gemRirqfir); 592 + pr_info(" GEM Mask: %016llx\n", data->gemMask); 593 + pr_info(" GEM RWOF: %016llx\n", data->gemRwof); 594 + 595 + /* LEM */ 596 + pr_info(" LEM FIR: %016llx\n", data->lemFir); 597 + pr_info(" LEM Error Mask: %016llx\n", data->lemErrMask); 598 + pr_info(" LEM Action 0: %016llx\n", data->lemAction0); 599 + pr_info(" LEM Action 1: %016llx\n", data->lemAction1); 600 + pr_info(" LEM WOF: %016llx\n", data->lemWof); 601 + } 602 + 603 + static void ioda_eeh_hub_diag(struct pci_controller *hose) 604 + { 605 + struct pnv_phb *phb = hose->private_data; 606 + struct OpalIoP7IOCErrorData *data; 607 + long rc; 608 + 609 + data = (struct OpalIoP7IOCErrorData *)phb->diag.blob; 610 + rc = opal_pci_get_hub_diag_data(phb->hub_id, data, PNV_PCI_DIAG_BUF_SIZE); 611 + if (rc != OPAL_SUCCESS) { 612 + pr_warning("%s: Failed to get HUB#%llx diag-data (%ld)\n", 613 + __func__, phb->hub_id, rc); 614 + return; 615 + } 616 + 617 + switch (data->type) { 618 + case OPAL_P7IOC_DIAG_TYPE_RGC: 619 + pr_info("P7IOC diag-data for RGC\n\n"); 620 + ioda_eeh_hub_diag_common(data); 621 + pr_info(" RGC Status: %016llx\n", data->rgc.rgcStatus); 622 + pr_info(" RGC LDCP: %016llx\n", data->rgc.rgcLdcp); 623 + break; 624 + case OPAL_P7IOC_DIAG_TYPE_BI: 625 + pr_info("P7IOC diag-data for BI %s\n\n", 626 + data->bi.biDownbound ? 
"Downbound" : "Upbound"); 627 + ioda_eeh_hub_diag_common(data); 628 + pr_info(" BI LDCP 0: %016llx\n", data->bi.biLdcp0); 629 + pr_info(" BI LDCP 1: %016llx\n", data->bi.biLdcp1); 630 + pr_info(" BI LDCP 2: %016llx\n", data->bi.biLdcp2); 631 + pr_info(" BI Fence Status: %016llx\n", data->bi.biFenceStatus); 632 + break; 633 + case OPAL_P7IOC_DIAG_TYPE_CI: 634 + pr_info("P7IOC diag-data for CI Port %d\\nn", 635 + data->ci.ciPort); 636 + ioda_eeh_hub_diag_common(data); 637 + pr_info(" CI Port Status: %016llx\n", data->ci.ciPortStatus); 638 + pr_info(" CI Port LDCP: %016llx\n", data->ci.ciPortLdcp); 639 + break; 640 + case OPAL_P7IOC_DIAG_TYPE_MISC: 641 + pr_info("P7IOC diag-data for MISC\n\n"); 642 + ioda_eeh_hub_diag_common(data); 643 + break; 644 + case OPAL_P7IOC_DIAG_TYPE_I2C: 645 + pr_info("P7IOC diag-data for I2C\n\n"); 646 + ioda_eeh_hub_diag_common(data); 647 + break; 648 + default: 649 + pr_warning("%s: Invalid type of HUB#%llx diag-data (%d)\n", 650 + __func__, phb->hub_id, data->type); 651 + } 652 + } 653 + 654 + static void ioda_eeh_p7ioc_phb_diag(struct pci_controller *hose, 655 + struct OpalIoPhbErrorCommon *common) 656 + { 657 + struct OpalIoP7IOCPhbErrorData *data; 658 + int i; 659 + 660 + data = (struct OpalIoP7IOCPhbErrorData *)common; 661 + 662 + pr_info("P7IOC PHB#%x Diag-data (Version: %d)\n\n", 663 + hose->global_number, common->version); 664 + 665 + pr_info(" brdgCtl: %08x\n", data->brdgCtl); 666 + 667 + pr_info(" portStatusReg: %08x\n", data->portStatusReg); 668 + pr_info(" rootCmplxStatus: %08x\n", data->rootCmplxStatus); 669 + pr_info(" busAgentStatus: %08x\n", data->busAgentStatus); 670 + 671 + pr_info(" deviceStatus: %08x\n", data->deviceStatus); 672 + pr_info(" slotStatus: %08x\n", data->slotStatus); 673 + pr_info(" linkStatus: %08x\n", data->linkStatus); 674 + pr_info(" devCmdStatus: %08x\n", data->devCmdStatus); 675 + pr_info(" devSecStatus: %08x\n", data->devSecStatus); 676 + 677 + pr_info(" rootErrorStatus: %08x\n", 
data->rootErrorStatus); 678 + pr_info(" uncorrErrorStatus: %08x\n", data->uncorrErrorStatus); 679 + pr_info(" corrErrorStatus: %08x\n", data->corrErrorStatus); 680 + pr_info(" tlpHdr1: %08x\n", data->tlpHdr1); 681 + pr_info(" tlpHdr2: %08x\n", data->tlpHdr2); 682 + pr_info(" tlpHdr3: %08x\n", data->tlpHdr3); 683 + pr_info(" tlpHdr4: %08x\n", data->tlpHdr4); 684 + pr_info(" sourceId: %08x\n", data->sourceId); 685 + 686 + pr_info(" errorClass: %016llx\n", data->errorClass); 687 + pr_info(" correlator: %016llx\n", data->correlator); 688 + pr_info(" p7iocPlssr: %016llx\n", data->p7iocPlssr); 689 + pr_info(" p7iocCsr: %016llx\n", data->p7iocCsr); 690 + pr_info(" lemFir: %016llx\n", data->lemFir); 691 + pr_info(" lemErrorMask: %016llx\n", data->lemErrorMask); 692 + pr_info(" lemWOF: %016llx\n", data->lemWOF); 693 + pr_info(" phbErrorStatus: %016llx\n", data->phbErrorStatus); 694 + pr_info(" phbFirstErrorStatus: %016llx\n", data->phbFirstErrorStatus); 695 + pr_info(" phbErrorLog0: %016llx\n", data->phbErrorLog0); 696 + pr_info(" phbErrorLog1: %016llx\n", data->phbErrorLog1); 697 + pr_info(" mmioErrorStatus: %016llx\n", data->mmioErrorStatus); 698 + pr_info(" mmioFirstErrorStatus: %016llx\n", data->mmioFirstErrorStatus); 699 + pr_info(" mmioErrorLog0: %016llx\n", data->mmioErrorLog0); 700 + pr_info(" mmioErrorLog1: %016llx\n", data->mmioErrorLog1); 701 + pr_info(" dma0ErrorStatus: %016llx\n", data->dma0ErrorStatus); 702 + pr_info(" dma0FirstErrorStatus: %016llx\n", data->dma0FirstErrorStatus); 703 + pr_info(" dma0ErrorLog0: %016llx\n", data->dma0ErrorLog0); 704 + pr_info(" dma0ErrorLog1: %016llx\n", data->dma0ErrorLog1); 705 + pr_info(" dma1ErrorStatus: %016llx\n", data->dma1ErrorStatus); 706 + pr_info(" dma1FirstErrorStatus: %016llx\n", data->dma1FirstErrorStatus); 707 + pr_info(" dma1ErrorLog0: %016llx\n", data->dma1ErrorLog0); 708 + pr_info(" dma1ErrorLog1: %016llx\n", data->dma1ErrorLog1); 709 + 710 + for (i = 0; i < OPAL_P7IOC_NUM_PEST_REGS; i++) { 711 + if 
((data->pestA[i] >> 63) == 0 && 712 + (data->pestB[i] >> 63) == 0) 713 + continue; 714 + 715 + pr_info(" PE[%3d] PESTA: %016llx\n", i, data->pestA[i]); 716 + pr_info(" PESTB: %016llx\n", data->pestB[i]); 717 + } 718 + } 719 + 720 + static void ioda_eeh_phb_diag(struct pci_controller *hose) 721 + { 722 + struct pnv_phb *phb = hose->private_data; 723 + struct OpalIoPhbErrorCommon *common; 724 + long rc; 725 + 726 + common = (struct OpalIoPhbErrorCommon *)phb->diag.blob; 727 + rc = opal_pci_get_phb_diag_data2(phb->opal_id, common, PAGE_SIZE); 728 + if (rc != OPAL_SUCCESS) { 729 + pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n", 730 + __func__, hose->global_number, rc); 731 + return; 732 + } 733 + 734 + switch (common->ioType) { 735 + case OPAL_PHB_ERROR_DATA_TYPE_P7IOC: 736 + ioda_eeh_p7ioc_phb_diag(hose, common); 737 + break; 738 + default: 739 + pr_warning("%s: Unrecognized I/O chip %d\n", 740 + __func__, common->ioType); 741 + } 742 + } 743 + 744 + static int ioda_eeh_get_phb_pe(struct pci_controller *hose, 745 + struct eeh_pe **pe) 746 + { 747 + struct eeh_pe *phb_pe; 748 + 749 + phb_pe = eeh_phb_pe_get(hose); 750 + if (!phb_pe) { 751 + pr_warning("%s Can't find PE for PHB#%d\n", 752 + __func__, hose->global_number); 753 + return -EEXIST; 754 + } 755 + 756 + *pe = phb_pe; 757 + return 0; 758 + } 759 + 760 + static int ioda_eeh_get_pe(struct pci_controller *hose, 761 + u16 pe_no, struct eeh_pe **pe) 762 + { 763 + struct eeh_pe *phb_pe, *dev_pe; 764 + struct eeh_dev dev; 765 + 766 + /* Find the PHB PE */ 767 + if (ioda_eeh_get_phb_pe(hose, &phb_pe)) 768 + return -EEXIST; 769 + 770 + /* Find the PE according to PE# */ 771 + memset(&dev, 0, sizeof(struct eeh_dev)); 772 + dev.phb = hose; 773 + dev.pe_config_addr = pe_no; 774 + dev_pe = eeh_pe_get(&dev); 775 + if (!dev_pe) { 776 + pr_warning("%s: Can't find PE for PHB#%x - PE#%x\n", 777 + __func__, hose->global_number, pe_no); 778 + return -EEXIST; 779 + } 780 + 781 + *pe = dev_pe; 782 + return 0; 783 + } 
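The PEST dump loop above only reports entries whose most significant bit (bit 63) is set in either the PESTA or PESTB word; entries with both top bits clear carry nothing of interest. The filter can be sketched in isolation as plain userspace C (the `pest_entry_interesting` helper is an illustrative name, not part of the kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the kernel's filter: a PEST entry is
 * worth printing only when bit 63 of PESTA or PESTB is set. */
static bool pest_entry_interesting(uint64_t pesta, uint64_t pestb)
{
	return (pesta >> 63) != 0 || (pestb >> 63) != 0;
}
```

Shifting an unsigned 64-bit value right by 63 leaves exactly the top bit, so the comparison against zero is the same test as the kernel's `(data->pestA[i] >> 63) == 0` continue-condition, inverted.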
784 + 785 + /** 786 + * ioda_eeh_next_error - Retrieve next error for EEH core to handle 787 + * @pe: The affected PE 788 + * 789 + * The function is expected to be called by EEH core while it gets 790 + * special EEH event (without binding PE). The function calls 791 + * OPAL APIs to retrieve the next error to handle. The informational error is 792 + * handled internally by the platform. However, the dead IOC, dead PHB, 793 + * fenced PHB and frozen PE should be handled by EEH core eventually. 794 + */ 795 + static int ioda_eeh_next_error(struct eeh_pe **pe) 796 + { 797 + struct pci_controller *hose, *tmp; 798 + struct pnv_phb *phb; 799 + u64 frozen_pe_no; 800 + u16 err_type, severity; 801 + long rc; 802 + int ret = 1; 803 + 804 + /* 805 + * While running here, it's safe to purge the event queue. 806 + * And we should keep the cached OPAL notifier event synchronized 807 + * between the kernel and firmware. 808 + */ 809 + eeh_remove_event(NULL); 810 + opal_notifier_update_evt(OPAL_EVENT_PCI_ERROR, 0x0ul); 811 + 812 + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { 813 + /* 814 + * If the subordinate PCI buses of the PHB have been 815 + * removed, we needn't take care of it any more. 816 + */ 817 + phb = hose->private_data; 818 + if (phb->eeh_state & PNV_EEH_STATE_REMOVED) 819 + continue; 820 + 821 + rc = opal_pci_next_error(phb->opal_id, 822 + &frozen_pe_no, &err_type, &severity); 823 + 824 + /* If the OPAL API returns an error, we needn't proceed */ 825 + if (rc != OPAL_SUCCESS) { 826 + IODA_EEH_DBG("%s: Invalid return value on " 827 + "PHB#%x (0x%lx) from opal_pci_next_error", 828 + __func__, hose->global_number, rc); 829 + continue; 830 + } 831 + 832 + /* If the PHB doesn't have an error, stop processing */ 833 + if (err_type == OPAL_EEH_NO_ERROR || 834 + severity == OPAL_EEH_SEV_NO_ERROR) { 835 + IODA_EEH_DBG("%s: No error found on PHB#%x\n", 836 + __func__, hose->global_number); 837 + continue; 838 + } 839 + 840 + /* 841 + * Processing the error. 
We're expecting the error with 842 + * highest priority reported upon multiple errors on the 843 + * specific PHB. 844 + */ 845 + IODA_EEH_DBG("%s: Error (%d, %d, %llx) on PHB#%x\n", 846 + __func__, err_type, severity, frozen_pe_no, hose->global_number); 847 + switch (err_type) { 848 + case OPAL_EEH_IOC_ERROR: 849 + if (severity == OPAL_EEH_SEV_IOC_DEAD) { 850 + list_for_each_entry_safe(hose, tmp, 851 + &hose_list, list_node) { 852 + phb = hose->private_data; 853 + phb->eeh_state |= PNV_EEH_STATE_REMOVED; 854 + } 855 + 856 + pr_err("EEH: dead IOC detected\n"); 857 + ret = 4; 858 + goto out; 859 + } else if (severity == OPAL_EEH_SEV_INF) { 860 + pr_info("EEH: IOC informative error " 861 + "detected\n"); 862 + ioda_eeh_hub_diag(hose); 863 + } 864 + 865 + break; 866 + case OPAL_EEH_PHB_ERROR: 867 + if (severity == OPAL_EEH_SEV_PHB_DEAD) { 868 + if (ioda_eeh_get_phb_pe(hose, pe)) 869 + break; 870 + 871 + pr_err("EEH: dead PHB#%x detected\n", 872 + hose->global_number); 873 + phb->eeh_state |= PNV_EEH_STATE_REMOVED; 874 + ret = 3; 875 + goto out; 876 + } else if (severity == OPAL_EEH_SEV_PHB_FENCED) { 877 + if (ioda_eeh_get_phb_pe(hose, pe)) 878 + break; 879 + 880 + pr_err("EEH: fenced PHB#%x detected\n", 881 + hose->global_number); 882 + ret = 2; 883 + goto out; 884 + } else if (severity == OPAL_EEH_SEV_INF) { 885 + pr_info("EEH: PHB#%x informative error " 886 + "detected\n", 887 + hose->global_number); 888 + ioda_eeh_phb_diag(hose); 889 + } 890 + 891 + break; 892 + case OPAL_EEH_PE_ERROR: 893 + if (ioda_eeh_get_pe(hose, frozen_pe_no, pe)) 894 + break; 895 + 896 + pr_err("EEH: Frozen PE#%x on PHB#%x detected\n", 897 + (*pe)->addr, (*pe)->phb->global_number); 898 + ret = 1; 899 + goto out; 900 + } 901 + } 902 + 903 + ret = 0; 904 + out: 905 + return ret; 906 + } 907 + 908 + struct pnv_eeh_ops ioda_eeh_ops = { 909 + .post_init = ioda_eeh_post_init, 910 + .set_option = ioda_eeh_set_option, 911 + .get_state = ioda_eeh_get_state, 912 + .reset = ioda_eeh_reset, 913 + .get_log = 
ioda_eeh_get_log, 914 + .configure_bridge = ioda_eeh_configure_bridge, 915 + .next_error = ioda_eeh_next_error 916 + };
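The `ioda_eeh_ops` table above is the kernel's usual ops-vector pattern: a backend fills in function pointers, and generic code NULL-checks each hook before dispatching through it (falling back to an error code when the hook is absent, as the `powernv_eeh_*` wrappers below do with `-EEXIST`). A standalone sketch of the pattern, with invented names rather than the kernel's:

```c
#include <stddef.h>

/* Illustrative ops vector, mirroring how pnv_eeh_ops is consumed. */
struct demo_eeh_ops {
	int (*reset)(int pe, int option);
	int (*configure_bridge)(int pe);
};

static int demo_reset(int pe, int option)
{
	return pe + option;	/* stand-in for real reset work */
}

static const struct demo_eeh_ops demo_ops = {
	.reset = demo_reset,
	.configure_bridge = NULL,	/* optional hook left unset */
};

/* Dispatch helper: return -1 (cf. -EEXIST in the kernel wrappers)
 * when the backend doesn't provide the hook. */
static int demo_do_reset(const struct demo_eeh_ops *ops, int pe, int option)
{
	if (ops && ops->reset)
		return ops->reset(pe, option);
	return -1;
}
```

The NULL fallback is what lets the P5IOC2 PHBs, which never set `phb->eeh_ops`, coexist with the IODA PHBs that do.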
+379
arch/powerpc/platforms/powernv/eeh-powernv.c
··· 1 + /* 2 + * The file intends to implement the platform dependent EEH operations on 3 + * the powernv platform, which runs natively on the hardware without a 4 + * hypervisor. 5 + * 6 + * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013. 7 + * 8 + * This program is free software; you can redistribute it and/or modify 9 + * it under the terms of the GNU General Public License as published by 10 + * the Free Software Foundation; either version 2 of the License, or 11 + * (at your option) any later version. 12 + */ 13 + 14 + #include <linux/atomic.h> 15 + #include <linux/delay.h> 16 + #include <linux/export.h> 17 + #include <linux/init.h> 18 + #include <linux/list.h> 19 + #include <linux/msi.h> 20 + #include <linux/of.h> 21 + #include <linux/pci.h> 22 + #include <linux/proc_fs.h> 23 + #include <linux/rbtree.h> 24 + #include <linux/sched.h> 25 + #include <linux/seq_file.h> 26 + #include <linux/spinlock.h> 27 + 28 + #include <asm/eeh.h> 29 + #include <asm/eeh_event.h> 30 + #include <asm/firmware.h> 31 + #include <asm/io.h> 32 + #include <asm/iommu.h> 33 + #include <asm/machdep.h> 34 + #include <asm/msi_bitmap.h> 35 + #include <asm/opal.h> 36 + #include <asm/ppc-pci.h> 37 + 38 + #include "powernv.h" 39 + #include "pci.h" 40 + 41 + /** 42 + * powernv_eeh_init - EEH platform dependent initialization 43 + * 44 + * EEH platform dependent initialization on powernv 45 + */ 46 + static int powernv_eeh_init(void) 47 + { 48 + /* We require OPALv3 */ 49 + if (!firmware_has_feature(FW_FEATURE_OPALv3)) { 50 + pr_warning("%s: OPALv3 is required!\n", __func__); 51 + return -EINVAL; 52 + } 53 + 54 + /* Set EEH probe mode */ 55 + eeh_probe_mode_set(EEH_PROBE_MODE_DEV); 56 + 57 + return 0; 58 + } 59 + 60 + /** 61 + * powernv_eeh_post_init - EEH platform dependent post initialization 62 + * 63 + * EEH platform dependent post initialization on powernv. When 64 + * the function is called, the EEH PEs and devices should have 65 + * been built. 
If the I/O cache stuff has been built, EEH is 66 + * ready to supply service. 67 + */ 68 + static int powernv_eeh_post_init(void) 69 + { 70 + struct pci_controller *hose; 71 + struct pnv_phb *phb; 72 + int ret = 0; 73 + 74 + list_for_each_entry(hose, &hose_list, list_node) { 75 + phb = hose->private_data; 76 + 77 + if (phb->eeh_ops && phb->eeh_ops->post_init) { 78 + ret = phb->eeh_ops->post_init(hose); 79 + if (ret) 80 + break; 81 + } 82 + } 83 + 84 + return ret; 85 + } 86 + 87 + /** 88 + * powernv_eeh_dev_probe - Do probe on PCI device 89 + * @dev: PCI device 90 + * @flag: unused 91 + * 92 + * When EEH module is installed during system boot, all PCI devices 93 + * are checked one by one to see if they support EEH. The function 94 + * is introduced for the purpose. By default, EEH has been enabled 95 + * on all PCI devices. That's to say, we only need to do the necessary 96 + * initialization on the corresponding eeh device and create PE 97 + * accordingly. 98 + * 99 + * Note that it's unsafe to retrieve the EEH device through 100 + * the corresponding PCI device. During the PCI device hotplug, which 101 + * was possibly triggered by EEH core, the binding between EEH device 102 + * and the PCI device isn't built yet. 103 + */ 104 + static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag) 105 + { 106 + struct pci_controller *hose = pci_bus_to_host(dev->bus); 107 + struct pnv_phb *phb = hose->private_data; 108 + struct device_node *dn = pci_device_to_OF_node(dev); 109 + struct eeh_dev *edev = of_node_to_eeh_dev(dn); 110 + 111 + /* 112 + * When probing the root bridge, which doesn't have any 113 + * subordinate PCI devices, we don't have an OF node for 114 + * the root bridge. So it's not reasonable to continue 115 + * the probing. 
116 + */ 117 + if (!dn || !edev) 118 + return 0; 119 + 120 + /* Skip for PCI-ISA bridge */ 121 + if ((dev->class >> 8) == PCI_CLASS_BRIDGE_ISA) 122 + return 0; 123 + 124 + /* Initialize eeh device */ 125 + edev->class_code = dev->class; 126 + edev->mode = 0; 127 + edev->config_addr = ((dev->bus->number << 8) | dev->devfn); 128 + edev->pe_config_addr = phb->bdfn_to_pe(phb, dev->bus, dev->devfn & 0xff); 129 + 130 + /* Create PE */ 131 + eeh_add_to_parent_pe(edev); 132 + 133 + /* 134 + * Enable EEH explicitly so that we will do EEH check 135 + * while accessing I/O stuff 136 + * 137 + * FIXME: Enable that for PHB3 later 138 + */ 139 + if (phb->type == PNV_PHB_IODA1) 140 + eeh_subsystem_enabled = 1; 141 + 142 + /* Save memory bars */ 143 + eeh_save_bars(edev); 144 + 145 + return 0; 146 + } 147 + 148 + /** 149 + * powernv_eeh_set_option - Initialize EEH or MMIO/DMA reenable 150 + * @pe: EEH PE 151 + * @option: operation to be issued 152 + * 153 + * The function is used to control the EEH functionality globally. 154 + * Currently, the following options are supported according to PAPR: 155 + * Enable EEH, Disable EEH, Enable MMIO and Enable DMA 156 + */ 157 + static int powernv_eeh_set_option(struct eeh_pe *pe, int option) 158 + { 159 + struct pci_controller *hose = pe->phb; 160 + struct pnv_phb *phb = hose->private_data; 161 + int ret = -EEXIST; 162 + 163 + /* 164 + * What we need to do is pass it down for the hardware 165 + * implementation to handle. 166 + */ 167 + if (phb->eeh_ops && phb->eeh_ops->set_option) 168 + ret = phb->eeh_ops->set_option(pe, option); 169 + 170 + return ret; 171 + } 172 + 173 + /** 174 + * powernv_eeh_get_pe_addr - Retrieve PE address 175 + * @pe: EEH PE 176 + * 177 + * Retrieve the PE address according to the given traditional 178 + * PCI BDF (Bus/Device/Function) address. 
179 + */ 180 + static int powernv_eeh_get_pe_addr(struct eeh_pe *pe) 181 + { 182 + return pe->addr; 183 + } 184 + 185 + /** 186 + * powernv_eeh_get_state - Retrieve PE state 187 + * @pe: EEH PE 188 + * @delay: delay while PE state is temporarily unavailable 189 + * 190 + * Retrieve the state of the specified PE. For IODA-compitable 191 + * platform, it should be retrieved from IODA table. Therefore, 192 + * we prefer passing down to hardware implementation to handle 193 + * it. 194 + */ 195 + static int powernv_eeh_get_state(struct eeh_pe *pe, int *delay) 196 + { 197 + struct pci_controller *hose = pe->phb; 198 + struct pnv_phb *phb = hose->private_data; 199 + int ret = EEH_STATE_NOT_SUPPORT; 200 + 201 + if (phb->eeh_ops && phb->eeh_ops->get_state) { 202 + ret = phb->eeh_ops->get_state(pe); 203 + 204 + /* 205 + * If the PE state is temporarily unavailable, 206 + * to inform the EEH core delay for default 207 + * period (1 second) 208 + */ 209 + if (delay) { 210 + *delay = 0; 211 + if (ret & EEH_STATE_UNAVAILABLE) 212 + *delay = 1000; 213 + } 214 + } 215 + 216 + return ret; 217 + } 218 + 219 + /** 220 + * powernv_eeh_reset - Reset the specified PE 221 + * @pe: EEH PE 222 + * @option: reset option 223 + * 224 + * Reset the specified PE 225 + */ 226 + static int powernv_eeh_reset(struct eeh_pe *pe, int option) 227 + { 228 + struct pci_controller *hose = pe->phb; 229 + struct pnv_phb *phb = hose->private_data; 230 + int ret = -EEXIST; 231 + 232 + if (phb->eeh_ops && phb->eeh_ops->reset) 233 + ret = phb->eeh_ops->reset(pe, option); 234 + 235 + return ret; 236 + } 237 + 238 + /** 239 + * powernv_eeh_wait_state - Wait for PE state 240 + * @pe: EEH PE 241 + * @max_wait: maximal period in microsecond 242 + * 243 + * Wait for the state of associated PE. It might take some time 244 + * to retrieve the PE's state. 
245 + */ 246 + static int powernv_eeh_wait_state(struct eeh_pe *pe, int max_wait) 247 + { 248 + int ret; 249 + int mwait; 250 + 251 + while (1) { 252 + ret = powernv_eeh_get_state(pe, &mwait); 253 + 254 + /* 255 + * If the PE's state is temporarily unavailable, 256 + * we have to wait for the specified time. Otherwise, 257 + * the PE's state will be returned immediately. 258 + */ 259 + if (ret != EEH_STATE_UNAVAILABLE) 260 + return ret; 261 + 262 + max_wait -= mwait; 263 + if (max_wait <= 0) { 264 + pr_warning("%s: Timeout getting PE#%x's state (%d)\n", 265 + __func__, pe->addr, max_wait); 266 + return EEH_STATE_NOT_SUPPORT; 267 + } 268 + 269 + msleep(mwait); 270 + } 271 + 272 + return EEH_STATE_NOT_SUPPORT; 273 + } 274 + 275 + /** 276 + * powernv_eeh_get_log - Retrieve error log 277 + * @pe: EEH PE 278 + * @severity: temporary or permanent error log 279 + * @drv_log: driver log to be combined with retrieved error log 280 + * @len: length of driver log 281 + * 282 + * Retrieve the temporary or permanent error from the PE. 283 + */ 284 + static int powernv_eeh_get_log(struct eeh_pe *pe, int severity, 285 + char *drv_log, unsigned long len) 286 + { 287 + struct pci_controller *hose = pe->phb; 288 + struct pnv_phb *phb = hose->private_data; 289 + int ret = -EEXIST; 290 + 291 + if (phb->eeh_ops && phb->eeh_ops->get_log) 292 + ret = phb->eeh_ops->get_log(pe, severity, drv_log, len); 293 + 294 + return ret; 295 + } 296 + 297 + /** 298 + * powernv_eeh_configure_bridge - Configure PCI bridges in the indicated PE 299 + * @pe: EEH PE 300 + * 301 + * The function will be called to reconfigure the bridges included 302 + * in the specified PE so that the mulfunctional PE would be recovered 303 + * again. 
304 + */ 305 + static int powernv_eeh_configure_bridge(struct eeh_pe *pe) 306 + { 307 + struct pci_controller *hose = pe->phb; 308 + struct pnv_phb *phb = hose->private_data; 309 + int ret = 0; 310 + 311 + if (phb->eeh_ops && phb->eeh_ops->configure_bridge) 312 + ret = phb->eeh_ops->configure_bridge(pe); 313 + 314 + return ret; 315 + } 316 + 317 + /** 318 + * powernv_eeh_next_error - Retrieve next EEH error to handle 319 + * @pe: Affected PE 320 + * 321 + * Using OPAL API, to retrieve next EEH error for EEH core to handle 322 + */ 323 + static int powernv_eeh_next_error(struct eeh_pe **pe) 324 + { 325 + struct pci_controller *hose; 326 + struct pnv_phb *phb = NULL; 327 + 328 + list_for_each_entry(hose, &hose_list, list_node) { 329 + phb = hose->private_data; 330 + break; 331 + } 332 + 333 + if (phb && phb->eeh_ops->next_error) 334 + return phb->eeh_ops->next_error(pe); 335 + 336 + return -EEXIST; 337 + } 338 + 339 + static struct eeh_ops powernv_eeh_ops = { 340 + .name = "powernv", 341 + .init = powernv_eeh_init, 342 + .post_init = powernv_eeh_post_init, 343 + .of_probe = NULL, 344 + .dev_probe = powernv_eeh_dev_probe, 345 + .set_option = powernv_eeh_set_option, 346 + .get_pe_addr = powernv_eeh_get_pe_addr, 347 + .get_state = powernv_eeh_get_state, 348 + .reset = powernv_eeh_reset, 349 + .wait_state = powernv_eeh_wait_state, 350 + .get_log = powernv_eeh_get_log, 351 + .configure_bridge = powernv_eeh_configure_bridge, 352 + .read_config = pnv_pci_cfg_read, 353 + .write_config = pnv_pci_cfg_write, 354 + .next_error = powernv_eeh_next_error 355 + }; 356 + 357 + /** 358 + * eeh_powernv_init - Register platform dependent EEH operations 359 + * 360 + * EEH initialization on powernv platform. This function should be 361 + * called before any EEH related functions. 
362 + */ 363 + static int __init eeh_powernv_init(void) 364 + { 365 + int ret = -EINVAL; 366 + 367 + if (!machine_is(powernv)) 368 + return ret; 369 + 370 + ret = eeh_ops_register(&powernv_eeh_ops); 371 + if (!ret) 372 + pr_info("EEH: PowerNV platform initialized\n"); 373 + else 374 + pr_info("EEH: Failed to initialize PowerNV platform (%d)\n", ret); 375 + 376 + return ret; 377 + } 378 + 379 + early_initcall(eeh_powernv_init);
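Earlier in this file, `powernv_eeh_dev_probe` encodes the EEH config address as `(bus << 8) | devfn`, the traditional 16-bit PCI packing: an 8-bit bus number above an 8-bit devfn, where devfn itself holds a 5-bit slot and a 3-bit function (what the kernel's `PCI_SLOT()`/`PCI_FUNC()` macros unpack). A standalone sketch of the encoding, with illustrative helper names:

```c
#include <stdint.h>

/* Pack bus/devfn as in edev->config_addr = (bus << 8) | devfn. */
static uint16_t bdf_pack(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((uint16_t)bus << 8 | devfn);
}

/* devfn packs a 5-bit slot and a 3-bit function, mirroring the
 * layout PCI_SLOT()/PCI_FUNC() assume. */
static uint8_t devfn_pack(uint8_t slot, uint8_t func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x7));
}
```

With this layout, bus 0x01 / devfn 0x20 (slot 4, function 0) yields the config address 0x0120.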
+3
arch/powerpc/platforms/powernv/opal-wrappers.S
··· 107 107 OPAL_CALL(opal_set_slot_led_status, OPAL_SET_SLOT_LED_STATUS); 108 108 OPAL_CALL(opal_get_epow_status, OPAL_GET_EPOW_STATUS); 109 109 OPAL_CALL(opal_set_system_attention_led, OPAL_SET_SYSTEM_ATTENTION_LED); 110 + OPAL_CALL(opal_pci_next_error, OPAL_PCI_NEXT_ERROR); 111 + OPAL_CALL(opal_pci_poll, OPAL_PCI_POLL); 110 112 OPAL_CALL(opal_pci_msi_eoi, OPAL_PCI_MSI_EOI); 113 + OPAL_CALL(opal_pci_get_phb_diag_data2, OPAL_PCI_GET_PHB_DIAG_DATA2);
+68 -1
arch/powerpc/platforms/powernv/opal.c
··· 15 15 #include <linux/of.h> 16 16 #include <linux/of_platform.h> 17 17 #include <linux/interrupt.h> 18 + #include <linux/notifier.h> 18 19 #include <linux/slab.h> 19 20 #include <asm/opal.h> 20 21 #include <asm/firmware.h> ··· 32 31 extern u64 opal_mc_secondary_handler[]; 33 32 static unsigned int *opal_irqs; 34 33 static unsigned int opal_irq_count; 34 + static ATOMIC_NOTIFIER_HEAD(opal_notifier_head); 35 + static DEFINE_SPINLOCK(opal_notifier_lock); 36 + static uint64_t last_notified_mask = 0x0ul; 37 + static atomic_t opal_notifier_hold = ATOMIC_INIT(0); 35 38 36 39 int __init early_init_dt_scan_opal(unsigned long node, 37 40 const char *uname, int depth, void *data) ··· 99 94 } 100 95 101 96 early_initcall(opal_register_exception_handlers); 97 + 98 + int opal_notifier_register(struct notifier_block *nb) 99 + { 100 + if (!nb) { 101 + pr_warning("%s: Invalid argument (%p)\n", 102 + __func__, nb); 103 + return -EINVAL; 104 + } 105 + 106 + atomic_notifier_chain_register(&opal_notifier_head, nb); 107 + return 0; 108 + } 109 + 110 + static void opal_do_notifier(uint64_t events) 111 + { 112 + unsigned long flags; 113 + uint64_t changed_mask; 114 + 115 + if (atomic_read(&opal_notifier_hold)) 116 + return; 117 + 118 + spin_lock_irqsave(&opal_notifier_lock, flags); 119 + changed_mask = last_notified_mask ^ events; 120 + last_notified_mask = events; 121 + spin_unlock_irqrestore(&opal_notifier_lock, flags); 122 + 123 + /* 124 + * We feed with the event bits and changed bits for 125 + * enough information to the callback. 
126 + */ 127 + atomic_notifier_call_chain(&opal_notifier_head, 128 + events, (void *)changed_mask); 129 + } 130 + 131 + void opal_notifier_update_evt(uint64_t evt_mask, 132 + uint64_t evt_val) 133 + { 134 + unsigned long flags; 135 + 136 + spin_lock_irqsave(&opal_notifier_lock, flags); 137 + last_notified_mask &= ~evt_mask; 138 + last_notified_mask |= evt_val; 139 + spin_unlock_irqrestore(&opal_notifier_lock, flags); 140 + } 141 + 142 + void opal_notifier_enable(void) 143 + { 144 + int64_t rc; 145 + uint64_t evt = 0; 146 + 147 + atomic_set(&opal_notifier_hold, 0); 148 + 149 + /* Process pending events */ 150 + rc = opal_poll_events(&evt); 151 + if (rc == OPAL_SUCCESS && evt) 152 + opal_do_notifier(evt); 153 + } 154 + 155 + void opal_notifier_disable(void) 156 + { 157 + atomic_set(&opal_notifier_hold, 1); 158 + } 102 159 103 160 int opal_get_chars(uint32_t vtermno, char *buf, int count) 104 161 { ··· 364 297 365 298 opal_handle_interrupt(virq_to_hw(irq), &events); 366 299 367 - /* XXX TODO: Do something with the events */ 300 + opal_do_notifier(events); 368 301 369 302 return IRQ_HANDLED; 370 303 }
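`opal_do_notifier` above computes the "changed" bits by XOR-ing the freshly reported event word against the last mask it delivered, then caches the new word; callbacks thus receive both the current events and exactly which bits flipped since the previous notification. The bit logic can be sketched in isolation as plain userspace C (standalone model, not the kernel function, and without the spinlock that protects the real cache):

```c
#include <stdint.h>

/* Mirror of the kernel's changed-mask logic: given the previously
 * notified mask and a fresh event word, return which bits changed
 * and remember the new word for the next round. */
static uint64_t notifier_changed_bits(uint64_t *last_notified, uint64_t events)
{
	uint64_t changed = *last_notified ^ events;

	*last_notified = events;
	return changed;
}
```

A repeated identical event word therefore produces a changed mask of zero, which is how listeners can ignore duplicate notifications.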
+59 -3
arch/powerpc/platforms/powernv/pci-ioda.c
··· 13 13 14 14 #include <linux/kernel.h> 15 15 #include <linux/pci.h> 16 + #include <linux/debugfs.h> 16 17 #include <linux/delay.h> 17 18 #include <linux/string.h> 18 19 #include <linux/init.h> ··· 33 32 #include <asm/iommu.h> 34 33 #include <asm/tce.h> 35 34 #include <asm/xics.h> 35 + #include <asm/debug.h> 36 36 37 37 #include "powernv.h" 38 38 #include "pci.h" ··· 443 441 set_iommu_table_base(&pdev->dev, &pe->tce32_table); 444 442 } 445 443 444 + static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus) 445 + { 446 + struct pci_dev *dev; 447 + 448 + list_for_each_entry(dev, &bus->devices, bus_list) { 449 + set_iommu_table_base(&dev->dev, &pe->tce32_table); 450 + if (dev->subordinate) 451 + pnv_ioda_setup_bus_dma(pe, dev->subordinate); 452 + } 453 + } 454 + 446 455 static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl, 447 456 u64 *startp, u64 *endp) 448 457 { ··· 608 595 TCE_PCI_SWINV_PAIR; 609 596 } 610 597 iommu_init_table(tbl, phb->hose->node); 598 + iommu_register_group(tbl, pci_domain_nr(pe->pbus), pe->pe_number); 599 + 600 + if (pe->pdev) 601 + set_iommu_table_base(&pe->pdev->dev, tbl); 602 + else 603 + pnv_ioda_setup_bus_dma(pe, pe->pbus); 611 604 612 605 return; 613 606 fail: ··· 685 666 tbl->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE; 686 667 } 687 668 iommu_init_table(tbl, phb->hose->node); 669 + 670 + if (pe->pdev) 671 + set_iommu_table_base(&pe->pdev->dev, tbl); 672 + else 673 + pnv_ioda_setup_bus_dma(pe, pe->pbus); 688 674 689 675 return; 690 676 fail: ··· 992 968 } 993 969 } 994 970 971 + static void pnv_pci_ioda_create_dbgfs(void) 972 + { 973 + #ifdef CONFIG_DEBUG_FS 974 + struct pci_controller *hose, *tmp; 975 + struct pnv_phb *phb; 976 + char name[16]; 977 + 978 + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { 979 + phb = hose->private_data; 980 + 981 + sprintf(name, "PCI%04x", hose->global_number); 982 + phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root); 983 + if (!phb->dbgfs) 
984 + pr_warning("%s: Error on creating debugfs on PHB#%x\n", 985 + __func__, hose->global_number); 986 + } 987 + #endif /* CONFIG_DEBUG_FS */ 988 + } 989 + 995 990 static void pnv_pci_ioda_fixup(void) 996 991 { 997 992 pnv_pci_ioda_setup_PEs(); 998 993 pnv_pci_ioda_setup_seg(); 999 994 pnv_pci_ioda_setup_DMA(); 995 + 996 + pnv_pci_ioda_create_dbgfs(); 997 + 998 + #ifdef CONFIG_EEH 999 + eeh_probe_mode_set(EEH_PROBE_MODE_DEV); 1000 + eeh_addr_cache_build(); 1001 + eeh_init(); 1002 + #endif 1000 1003 } 1001 1004 1002 1005 /* ··· 1100 1049 OPAL_ASSERT_RESET); 1101 1050 } 1102 1051 1103 - void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type) 1052 + void __init pnv_pci_init_ioda_phb(struct device_node *np, 1053 + u64 hub_id, int ioda_type) 1104 1054 { 1105 1055 struct pci_controller *hose; 1106 1056 static int primary = 1; ··· 1139 1087 hose->first_busno = 0; 1140 1088 hose->last_busno = 0xff; 1141 1089 hose->private_data = phb; 1090 + phb->hub_id = hub_id; 1142 1091 phb->opal_id = phb_id; 1143 1092 phb->type = ioda_type; 1144 1093 ··· 1225 1172 phb->ioda.io_size, phb->ioda.io_segsize); 1226 1173 1227 1174 phb->hose->ops = &pnv_pci_ops; 1175 + #ifdef CONFIG_EEH 1176 + phb->eeh_ops = &ioda_eeh_ops; 1177 + #endif 1228 1178 1229 1179 /* Setup RID -> PE mapping function */ 1230 1180 phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe; ··· 1268 1212 1269 1213 void pnv_pci_init_ioda2_phb(struct device_node *np) 1270 1214 { 1271 - pnv_pci_init_ioda_phb(np, PNV_PHB_IODA2); 1215 + pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2); 1272 1216 } 1273 1217 1274 1218 void __init pnv_pci_init_ioda_hub(struct device_node *np) ··· 1291 1235 for_each_child_of_node(np, phbn) { 1292 1236 /* Look for IODA1 PHBs */ 1293 1237 if (of_device_is_compatible(phbn, "ibm,ioda-phb")) 1294 - pnv_pci_init_ioda_phb(phbn, PNV_PHB_IODA1); 1238 + pnv_pci_init_ioda_phb(phbn, hub_id, PNV_PHB_IODA1); 1295 1239 } 1296 1240 }
+8 -3
arch/powerpc/platforms/powernv/pci-p5ioc2.c
···
86 86 static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
87 87 					 struct pci_dev *pdev)
88 88 {
89 - 	if (phb->p5ioc2.iommu_table.it_map == NULL)
89 + 	if (phb->p5ioc2.iommu_table.it_map == NULL) {
90 90 		iommu_init_table(&phb->p5ioc2.iommu_table, phb->hose->node);
91 + 		iommu_register_group(&phb->p5ioc2.iommu_table,
92 + 				pci_domain_nr(phb->hose->bus), phb->opal_id);
93 + 	}
91 94 
92 95 	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
93 96 }
94 97 
95 - static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
98 + static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
96 99 					 void *tce_mem, u64 tce_size)
97 100 {
98 101 	struct pnv_phb *phb;
···
136 133 	phb->hose->first_busno = 0;
137 134 	phb->hose->last_busno = 0xff;
138 135 	phb->hose->private_data = phb;
136 + 	phb->hub_id = hub_id;
139 137 	phb->opal_id = phb_id;
140 138 	phb->type = PNV_PHB_P5IOC2;
141 139 	phb->model = PNV_PHB_MODEL_P5IOC2;
···
230 226 	for_each_child_of_node(np, phbn) {
231 227 		if (of_device_is_compatible(phbn, "ibm,p5ioc2-pcix") ||
232 228 		    of_device_is_compatible(phbn, "ibm,p5ioc2-pciex")) {
233 - 			pnv_pci_init_p5ioc2_phb(phbn, tce_mem, tce_per_phb);
229 + 			pnv_pci_init_p5ioc2_phb(phbn, hub_id,
230 + 						tce_mem, tce_per_phb);
234 231 			tce_mem += tce_per_phb;
235 232 		}
236 233 	}
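The p5ioc2 hunk keeps its single shared TCE table lazily initialized: only the first device to go through DMA setup triggers `iommu_init_table()` (and, after this patch, `iommu_register_group()`); every later device just attaches to the existing table. A hedged sketch of that first-use guard, with invented `toy_*` names:

```c
#include <assert.h>
#include <stddef.h>

/* Toy mirror of the shared table guarded by it_map == NULL. */
struct toy_table {
	int *it_map;		/* NULL until the table has been initialized */
	int init_calls;		/* how many times "iommu_init_table" ran */
};

static void toy_init_table(struct toy_table *tbl)
{
	static int map;		/* stands in for the real allocation map */
	tbl->it_map = &map;
	tbl->init_calls++;
}

/* Per-device setup: initialize the shared table once, then point the
 * device at it (the set_iommu_table_base() step in the real code). */
static const struct toy_table *toy_dma_dev_setup(struct toy_table *tbl)
{
	if (tbl->it_map == NULL)
		toy_init_table(tbl);	/* group registration would go here too */
	return tbl;
}

/* Two devices share one table; the init path must run exactly once. */
static int toy_demo(void)
{
	struct toy_table t = { NULL, 0 };

	toy_dma_dev_setup(&t);		/* first device triggers the init */
	toy_dma_dev_setup(&t);		/* second device reuses the table */
	return t.init_calls;
}
```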
+102 -35
arch/powerpc/platforms/powernv/pci.c
··· 20 20 #include <linux/irq.h> 21 21 #include <linux/io.h> 22 22 #include <linux/msi.h> 23 + #include <linux/iommu.h> 23 24 24 25 #include <asm/sections.h> 25 26 #include <asm/io.h> ··· 33 32 #include <asm/iommu.h> 34 33 #include <asm/tce.h> 35 34 #include <asm/firmware.h> 35 + #include <asm/eeh_event.h> 36 + #include <asm/eeh.h> 36 37 37 38 #include "powernv.h" 38 39 #include "pci.h" ··· 205 202 206 203 spin_lock_irqsave(&phb->lock, flags); 207 204 208 - rc = opal_pci_get_phb_diag_data(phb->opal_id, phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE); 205 + rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob, 206 + PNV_PCI_DIAG_BUF_SIZE); 209 207 has_diag = (rc == OPAL_SUCCESS); 210 208 211 209 rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no, ··· 231 227 spin_unlock_irqrestore(&phb->lock, flags); 232 228 } 233 229 234 - static void pnv_pci_config_check_eeh(struct pnv_phb *phb, struct pci_bus *bus, 235 - u32 bdfn) 230 + static void pnv_pci_config_check_eeh(struct pnv_phb *phb, 231 + struct device_node *dn) 236 232 { 237 233 s64 rc; 238 234 u8 fstate; 239 235 u16 pcierr; 240 236 u32 pe_no; 241 237 242 - /* Get PE# if we support IODA */ 243 - pe_no = phb->bdfn_to_pe ? phb->bdfn_to_pe(phb, bus, bdfn & 0xff) : 0; 238 + /* 239 + * Get the PE#. During the PCI probe stage, we might not 240 + * setup that yet. 
So all ER errors should be mapped to 241 + * PE#0 242 + */ 243 + pe_no = PCI_DN(dn)->pe_number; 244 + if (pe_no == IODA_INVALID_PE) 245 + pe_no = 0; 244 246 245 247 /* Read freeze status */ 246 248 rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no, &fstate, &pcierr, 247 249 NULL); 248 250 if (rc) { 249 - pr_warning("PCI %d: Failed to read EEH status for PE#%d," 250 - " err %lld\n", phb->hose->global_number, pe_no, rc); 251 + pr_warning("%s: Can't read EEH status (PE#%d) for " 252 + "%s, err %lld\n", 253 + __func__, pe_no, dn->full_name, rc); 251 254 return; 252 255 } 253 - cfg_dbg(" -> EEH check, bdfn=%04x PE%d fstate=%x\n", 254 - bdfn, pe_no, fstate); 256 + cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n", 257 + (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn), 258 + pe_no, fstate); 255 259 if (fstate != 0) 256 260 pnv_pci_handle_eeh_config(phb, pe_no); 257 261 } 258 262 259 - static int pnv_pci_read_config(struct pci_bus *bus, 260 - unsigned int devfn, 261 - int where, int size, u32 *val) 263 + int pnv_pci_cfg_read(struct device_node *dn, 264 + int where, int size, u32 *val) 262 265 { 263 - struct pci_controller *hose = pci_bus_to_host(bus); 264 - struct pnv_phb *phb = hose->private_data; 265 - u32 bdfn = (((uint64_t)bus->number) << 8) | devfn; 266 + struct pci_dn *pdn = PCI_DN(dn); 267 + struct pnv_phb *phb = pdn->phb->private_data; 268 + u32 bdfn = (pdn->busno << 8) | pdn->devfn; 269 + #ifdef CONFIG_EEH 270 + struct eeh_pe *phb_pe = NULL; 271 + #endif 266 272 s64 rc; 267 - 268 - if (hose == NULL) 269 - return PCIBIOS_DEVICE_NOT_FOUND; 270 273 271 274 switch (size) { 272 275 case 1: { ··· 298 287 default: 299 288 return PCIBIOS_FUNC_NOT_SUPPORTED; 300 289 } 301 - cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n", 302 - bus->number, devfn, where, size, *val); 290 + cfg_dbg("%s: bus: %x devfn: %x +%x/%x -> %08x\n", 291 + __func__, pdn->busno, pdn->devfn, where, size, *val); 303 292 304 - /* Check if the PHB got frozen due to an error (no 
response) */ 305 - pnv_pci_config_check_eeh(phb, bus, bdfn); 293 + /* 294 + * Check if the specified PE has been put into frozen 295 + * state. On the other hand, we needn't do that while 296 + * the PHB has been put into frozen state because of 297 + * PHB-fatal errors. 298 + */ 299 + #ifdef CONFIG_EEH 300 + phb_pe = eeh_phb_pe_get(pdn->phb); 301 + if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED)) 302 + return PCIBIOS_SUCCESSFUL; 303 + 304 + if (phb->eeh_state & PNV_EEH_STATE_ENABLED) { 305 + if (*val == EEH_IO_ERROR_VALUE(size) && 306 + eeh_dev_check_failure(of_node_to_eeh_dev(dn))) 307 + return PCIBIOS_DEVICE_NOT_FOUND; 308 + } else { 309 + pnv_pci_config_check_eeh(phb, dn); 310 + } 311 + #else 312 + pnv_pci_config_check_eeh(phb, dn); 313 + #endif 306 314 307 315 return PCIBIOS_SUCCESSFUL; 308 316 } 309 317 310 - static int pnv_pci_write_config(struct pci_bus *bus, 311 - unsigned int devfn, 312 - int where, int size, u32 val) 318 + int pnv_pci_cfg_write(struct device_node *dn, 319 + int where, int size, u32 val) 313 320 { 314 - struct pci_controller *hose = pci_bus_to_host(bus); 315 - struct pnv_phb *phb = hose->private_data; 316 - u32 bdfn = (((uint64_t)bus->number) << 8) | devfn; 321 + struct pci_dn *pdn = PCI_DN(dn); 322 + struct pnv_phb *phb = pdn->phb->private_data; 323 + u32 bdfn = (pdn->busno << 8) | pdn->devfn; 317 324 318 - if (hose == NULL) 319 - return PCIBIOS_DEVICE_NOT_FOUND; 320 - 321 - cfg_dbg("pnv_pci_write_config bus: %x devfn: %x +%x/%x -> %08x\n", 322 - bus->number, devfn, where, size, val); 325 + cfg_dbg("%s: bus: %x devfn: %x +%x/%x -> %08x\n", 326 + pdn->busno, pdn->devfn, where, size, val); 323 327 switch (size) { 324 328 case 1: 325 329 opal_pci_config_write_byte(phb->opal_id, bdfn, where, val); ··· 348 322 default: 349 323 return PCIBIOS_FUNC_NOT_SUPPORTED; 350 324 } 325 + 351 326 /* Check if the PHB got frozen due to an error (no response) */ 352 - pnv_pci_config_check_eeh(phb, bus, bdfn); 327 + #ifdef CONFIG_EEH 328 + if 
(!(phb->eeh_state & PNV_EEH_STATE_ENABLED)) 329 + pnv_pci_config_check_eeh(phb, dn); 330 + #else 331 + pnv_pci_config_check_eeh(phb, dn); 332 + #endif 353 333 354 334 return PCIBIOS_SUCCESSFUL; 355 335 } 356 336 337 + static int pnv_pci_read_config(struct pci_bus *bus, 338 + unsigned int devfn, 339 + int where, int size, u32 *val) 340 + { 341 + struct device_node *dn, *busdn = pci_bus_to_OF_node(bus); 342 + struct pci_dn *pdn; 343 + 344 + for (dn = busdn->child; dn; dn = dn->sibling) { 345 + pdn = PCI_DN(dn); 346 + if (pdn && pdn->devfn == devfn) 347 + return pnv_pci_cfg_read(dn, where, size, val); 348 + } 349 + 350 + *val = 0xFFFFFFFF; 351 + return PCIBIOS_DEVICE_NOT_FOUND; 352 + 353 + } 354 + 355 + static int pnv_pci_write_config(struct pci_bus *bus, 356 + unsigned int devfn, 357 + int where, int size, u32 val) 358 + { 359 + struct device_node *dn, *busdn = pci_bus_to_OF_node(bus); 360 + struct pci_dn *pdn; 361 + 362 + for (dn = busdn->child; dn; dn = dn->sibling) { 363 + pdn = PCI_DN(dn); 364 + if (pdn && pdn->devfn == devfn) 365 + return pnv_pci_cfg_write(dn, where, size, val); 366 + } 367 + 368 + return PCIBIOS_DEVICE_NOT_FOUND; 369 + } 370 + 357 371 struct pci_ops pnv_pci_ops = { 358 - .read = pnv_pci_read_config, 372 + .read = pnv_pci_read_config, 359 373 .write = pnv_pci_write_config, 360 374 }; 361 375 ··· 478 412 pnv_pci_setup_iommu_table(tbl, __va(be64_to_cpup(basep)), 479 413 be32_to_cpup(sizep), 0); 480 414 iommu_init_table(tbl, hose->node); 415 + iommu_register_group(tbl, pci_domain_nr(hose->bus), 0); 481 416 482 417 /* Deal with SW invalidated TCEs when needed (BML way) */ 483 418 swinvp = of_get_property(hose->dn, "linux,tce-sw-invalidate-info",
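The rewritten config accessors above work from the device node's `pci_dn`, packing bus number and devfn into a BDFN, and the EEH-aware read path treats an all-ones result of the access size (`EEH_IO_ERROR_VALUE(size)`) as a possible frozen-PE symptom worth checking. Two small helpers sketching those conventions (illustrative stand-ins, not the kernel's own macros):

```c
#include <assert.h>
#include <stdint.h>

/* Bus/devfn packed into one number, as in pnv_pci_cfg_read()'s
 * "(pdn->busno << 8) | pdn->devfn". */
static uint32_t toy_bdfn(uint32_t busno, uint32_t devfn)
{
	return (busno << 8) | devfn;
}

/* All-ones value for a 1-, 2- or 4-byte config read: the value a frozen
 * PE (or absent device) returns, and what EEH_IO_ERROR_VALUE(size)
 * compares against before calling eeh_dev_check_failure(). */
static uint32_t toy_io_error_value(int size)
{
	return (uint32_t)((1ULL << (8 * size)) - 1);
}
```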
+35
arch/powerpc/platforms/powernv/pci.h
···
66 66 	struct list_head list;
67 67 };
68 68 
69 + /* IOC dependent EEH operations */
70 + #ifdef CONFIG_EEH
71 + struct pnv_eeh_ops {
72 + 	int (*post_init)(struct pci_controller *hose);
73 + 	int (*set_option)(struct eeh_pe *pe, int option);
74 + 	int (*get_state)(struct eeh_pe *pe);
75 + 	int (*reset)(struct eeh_pe *pe, int option);
76 + 	int (*get_log)(struct eeh_pe *pe, int severity,
77 + 		       char *drv_log, unsigned long len);
78 + 	int (*configure_bridge)(struct eeh_pe *pe);
79 + 	int (*next_error)(struct eeh_pe **pe);
80 + };
81 + 
82 + #define PNV_EEH_STATE_ENABLED	(1 << 0)	/* EEH enabled */
83 + #define PNV_EEH_STATE_REMOVED	(1 << 1)	/* PHB removed */
84 + 
85 + #endif /* CONFIG_EEH */
86 + 
69 87 struct pnv_phb {
70 88 	struct pci_controller *hose;
71 89 	enum pnv_phb_type type;
72 90 	enum pnv_phb_model model;
91 + 	u64 hub_id;
73 92 	u64 opal_id;
74 93 	void __iomem *regs;
75 94 	int initialized;
76 95 	spinlock_t lock;
96 + 
97 + #ifdef CONFIG_EEH
98 + 	struct pnv_eeh_ops *eeh_ops;
99 + 	int eeh_state;
100 + #endif
101 + 
102 + #ifdef CONFIG_DEBUG_FS
103 + 	struct dentry *dbgfs;
104 + #endif
77 105 
78 106 #ifdef CONFIG_PCI_MSI
79 107 	unsigned int msi_base;
···
178 150 };
179 151 
180 152 extern struct pci_ops pnv_pci_ops;
153 + #ifdef CONFIG_EEH
154 + extern struct pnv_eeh_ops ioda_eeh_ops;
155 + #endif
181 156 
157 + int pnv_pci_cfg_read(struct device_node *dn,
158 + 		     int where, int size, u32 *val);
159 + int pnv_pci_cfg_write(struct device_node *dn,
160 + 		      int where, int size, u32 val);
182 161 extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
183 162 				      void *tce_mem, u64 tce_size,
184 163 				      u64 dma_offset);
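The new `struct pnv_eeh_ops` gives each PowerNV IOC type its own EEH backend behind a table of function pointers carried by the PHB (`phb->eeh_ops`), with `ioda_eeh_ops` as the first implementation. A minimal mock of that dispatch pattern, with invented `toy_*` names:

```c
#include <assert.h>
#include <stddef.h>

struct toy_pe { int state; };

/* Cut-down version of the ops table: just state query and reset. */
struct toy_eeh_ops {
	int (*get_state)(struct toy_pe *pe);
	int (*reset)(struct toy_pe *pe, int option);
};

static int toy_get_state(struct toy_pe *pe) { return pe->state; }

static int toy_reset(struct toy_pe *pe, int option)
{
	pe->state = option;	/* pretend the reset moved the PE to 'option' */
	return 0;
}

/* A "PHB" carries its backend, as struct pnv_phb now carries eeh_ops. */
struct toy_phb { const struct toy_eeh_ops *eeh_ops; };

static const struct toy_eeh_ops toy_ioda_eeh_ops = {
	.get_state = toy_get_state,
	.reset     = toy_reset,
};

/* Callers never name the backend; they go through the pointer table. */
static int toy_demo(void)
{
	struct toy_pe pe = { 1 };
	struct toy_phb phb = { &toy_ioda_eeh_ops };

	phb.eeh_ops->reset(&pe, 5);
	return phb.eeh_ops->get_state(&pe);
}
```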
+4
arch/powerpc/platforms/powernv/setup.c
···
93 93 {
94 94 	long rc = OPAL_BUSY;
95 95 
96 + 	opal_notifier_disable();
97 + 
96 98 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
97 99 		rc = opal_cec_reboot();
98 100 		if (rc == OPAL_BUSY_EVENT)
···
109 107 static void __noreturn pnv_power_off(void)
110 108 {
111 109 	long rc = OPAL_BUSY;
110 + 
111 + 	opal_notifier_disable();
112 112 
113 113 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
114 114 		rc = opal_cec_power_down(0);
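Both `pnv_restart()` and `pnv_power_off()` spin on the OPAL call while firmware reports `OPAL_BUSY`/`OPAL_BUSY_EVENT` (the real loop also polls and handles events between attempts). A userspace mock of that retry loop; the `TOY_OPAL_*` codes are placeholders, not the values from `asm/opal.h`:

```c
#include <assert.h>

/* Made-up return codes standing in for the OPAL ones. */
enum { TOY_OPAL_SUCCESS = 0, TOY_OPAL_BUSY = -2, TOY_OPAL_BUSY_EVENT = -12 };

static int toy_busy_left;	/* how many times the mock call stays busy */
static int toy_calls;

/* Mock firmware call: busy a configurable number of times, then accept. */
static long toy_opal_cec_reboot(void)
{
	toy_calls++;
	return toy_busy_left-- > 0 ? TOY_OPAL_BUSY : TOY_OPAL_SUCCESS;
}

/* Retry until the firmware accepts; returns the number of attempts. */
static int toy_restart(int busy_times)
{
	long rc = TOY_OPAL_BUSY;

	toy_busy_left = busy_times;
	toy_calls = 0;
	while (rc == TOY_OPAL_BUSY || rc == TOY_OPAL_BUSY_EVENT)
		rc = toy_opal_cec_reboot();	/* real loop also polls events */
	return toy_calls;
}
```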
+2 -2
arch/powerpc/platforms/powernv/smp.c
···
40 40 #define DBG(fmt...)
41 41 #endif
42 42 
43 - static void __cpuinit pnv_smp_setup_cpu(int cpu)
43 + static void pnv_smp_setup_cpu(int cpu)
44 44 {
45 45 	if (cpu != boot_cpuid)
46 46 		xics_setup_cpu();
···
51 51 	/* Special case - we inhibit secondary thread startup
52 52 	 * during boot if the user requests it.
53 53 	 */
54 - 	if (system_state < SYSTEM_RUNNING && cpu_has_feature(CPU_FTR_SMT)) {
54 + 	if (system_state == SYSTEM_BOOTING && cpu_has_feature(CPU_FTR_SMT)) {
55 55 		if (!smt_enabled_at_boot && cpu_thread_in_core(nr) != 0)
56 56 			return 0;
57 57 		if (smt_enabled_at_boot
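The smp.c hunk narrows the boot-time SMT gate to `system_state == SYSTEM_BOOTING` and keeps inhibiting secondary threads when SMT was limited on the command line; `cpu_thread_in_core()` is just the CPU's thread index within its core. A rough sketch of that bootable check, assuming 8 threads per core and simplifying the (truncated) second condition to a plain threshold; all names here are invented:

```c
#include <assert.h>

#define TOY_THREADS_PER_CORE 8	/* assumed core geometry for the sketch */

/* Thread index within the core, as cpu_thread_in_core() computes it. */
static int toy_thread_in_core(int cpu)
{
	return cpu % TOY_THREADS_PER_CORE;
}

/* During boot a CPU may start only if it is thread 0 of its core, or if
 * its thread index is below the smt_enabled_at_boot limit. */
static int toy_cpu_bootable(int cpu, int smt_enabled_at_boot)
{
	if (toy_thread_in_core(cpu) == 0)
		return 1;
	return toy_thread_in_core(cpu) < smt_enabled_at_boot;
}
```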
+3 -2
arch/powerpc/platforms/ps3/htab.c
···
109 109 }
110 110 
111 111 static long ps3_hpte_updatepp(unsigned long slot, unsigned long newpp,
112 - 	unsigned long vpn, int psize, int ssize, int local)
112 + 	unsigned long vpn, int psize, int apsize,
113 + 	int ssize, int local)
113 114 {
114 115 	int result;
115 116 	u64 hpte_v, want_v, hpte_rs;
···
163 162 }
164 163 
165 164 static void ps3_hpte_invalidate(unsigned long slot, unsigned long vpn,
166 - 	int psize, int ssize, int local)
165 + 	int psize, int apsize, int ssize, int local)
167 166 {
168 167 	unsigned long flags;
169 168 	int result;
-5
arch/powerpc/platforms/pseries/Kconfig
···
33 33 	  processors, that is, which share physical processors between
34 34 	  two or more partitions.
35 35 
36 - config EEH
37 - 	bool
38 - 	depends on PPC_PSERIES && PCI
39 - 	default y
40 - 
41 36 config PSERIES_MSI
42 37 	bool
43 38 	depends on PCI_MSI && EEH
+1 -3
arch/powerpc/platforms/pseries/Makefile
···
6 6 			   firmware.o power.o dlpar.o mobility.o
7 7 obj-$(CONFIG_SMP)	+= smp.o
8 8 obj-$(CONFIG_SCANLOG)	+= scanlog.o
9 - obj-$(CONFIG_EEH)	+= eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
10 - 			   eeh_driver.o eeh_event.o eeh_sysfs.o \
11 - 			   eeh_pseries.o
9 + obj-$(CONFIG_EEH)	+= eeh_pseries.o
12 10 obj-$(CONFIG_KEXEC)	+= kexec.o
13 11 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
14 12 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
+161 -33
arch/powerpc/platforms/pseries/eeh.c arch/powerpc/kernel/eeh.c
··· 103 103 */ 104 104 int eeh_probe_mode; 105 105 106 - /* Global EEH mutex */ 107 - DEFINE_MUTEX(eeh_mutex); 108 - 109 106 /* Lock to avoid races due to multiple reports of an error */ 110 - static DEFINE_RAW_SPINLOCK(confirm_error_lock); 107 + DEFINE_RAW_SPINLOCK(confirm_error_lock); 111 108 112 109 /* Buffer for reporting pci register dumps. Its here in BSS, and 113 110 * not dynamically alloced, so that it ends up in RMO where RTAS ··· 232 235 { 233 236 size_t loglen = 0; 234 237 struct eeh_dev *edev; 238 + bool valid_cfg_log = true; 235 239 236 - eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); 237 - eeh_ops->configure_bridge(pe); 238 - eeh_pe_restore_bars(pe); 240 + /* 241 + * When the PHB is fenced or dead, it's pointless to collect 242 + * the data from PCI config space because it should return 243 + * 0xFF's. For ER, we still retrieve the data from the PCI 244 + * config space. 245 + */ 246 + if (eeh_probe_mode_dev() && 247 + (pe->type & EEH_PE_PHB) && 248 + (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD))) 249 + valid_cfg_log = false; 239 250 240 - pci_regs_buf[0] = 0; 241 - eeh_pe_for_each_dev(pe, edev) { 242 - loglen += eeh_gather_pci_data(edev, pci_regs_buf, 243 - EEH_PCI_REGS_LOG_LEN); 244 - } 251 + if (valid_cfg_log) { 252 + eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); 253 + eeh_ops->configure_bridge(pe); 254 + eeh_pe_restore_bars(pe); 255 + 256 + pci_regs_buf[0] = 0; 257 + eeh_pe_for_each_dev(pe, edev) { 258 + loglen += eeh_gather_pci_data(edev, pci_regs_buf + loglen, 259 + EEH_PCI_REGS_LOG_LEN - loglen); 260 + } 261 + } 245 262 246 263 eeh_ops->get_log(pe, severity, pci_regs_buf, loglen); 247 264 } ··· 271 260 { 272 261 pte_t *ptep; 273 262 unsigned long pa; 263 + int hugepage_shift; 274 264 275 - ptep = find_linux_pte(init_mm.pgd, token); 265 + /* 266 + * We won't find hugepages here, iomem 267 + */ 268 + ptep = find_linux_pte_or_hugepte(init_mm.pgd, token, &hugepage_shift); 276 269 if (!ptep) 277 270 return token; 271 + WARN_ON(hugepage_shift); 278 272 pa = 
pte_pfn(*ptep) << PAGE_SHIFT; 279 273 280 274 return pa | (token & (PAGE_SIZE-1)); 275 + } 276 + 277 + /* 278 + * On PowerNV platform, we might already have fenced PHB there. 279 + * For that case, it's meaningless to recover frozen PE. Intead, 280 + * We have to handle fenced PHB firstly. 281 + */ 282 + static int eeh_phb_check_failure(struct eeh_pe *pe) 283 + { 284 + struct eeh_pe *phb_pe; 285 + unsigned long flags; 286 + int ret; 287 + 288 + if (!eeh_probe_mode_dev()) 289 + return -EPERM; 290 + 291 + /* Find the PHB PE */ 292 + phb_pe = eeh_phb_pe_get(pe->phb); 293 + if (!phb_pe) { 294 + pr_warning("%s Can't find PE for PHB#%d\n", 295 + __func__, pe->phb->global_number); 296 + return -EEXIST; 297 + } 298 + 299 + /* If the PHB has been in problematic state */ 300 + eeh_serialize_lock(&flags); 301 + if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) { 302 + ret = 0; 303 + goto out; 304 + } 305 + 306 + /* Check PHB state */ 307 + ret = eeh_ops->get_state(phb_pe, NULL); 308 + if ((ret < 0) || 309 + (ret == EEH_STATE_NOT_SUPPORT) || 310 + (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) == 311 + (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) { 312 + ret = 0; 313 + goto out; 314 + } 315 + 316 + /* Isolate the PHB and send event */ 317 + eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED); 318 + eeh_serialize_unlock(flags); 319 + eeh_send_failure_event(phb_pe); 320 + 321 + pr_err("EEH: PHB#%x failure detected\n", 322 + phb_pe->phb->global_number); 323 + dump_stack(); 324 + 325 + return 1; 326 + out: 327 + eeh_serialize_unlock(flags); 328 + return ret; 281 329 } 282 330 283 331 /** ··· 389 319 return 0; 390 320 } 391 321 322 + /* 323 + * On PowerNV platform, we might already have fenced PHB 324 + * there and we need take care of that firstly. 325 + */ 326 + ret = eeh_phb_check_failure(pe); 327 + if (ret > 0) 328 + return ret; 329 + 392 330 /* If we already have a pending isolation event for this 393 331 * slot, we know it's bad already, we don't need to check. 
394 332 * Do this checking under a lock; as multiple PCI devices 395 333 * in one slot might report errors simultaneously, and we 396 334 * only want one error recovery routine running. 397 335 */ 398 - raw_spin_lock_irqsave(&confirm_error_lock, flags); 336 + eeh_serialize_lock(&flags); 399 337 rc = 1; 400 338 if (pe->state & EEH_PE_ISOLATED) { 401 339 pe->check_count++; ··· 446 368 } 447 369 448 370 eeh_stats.slot_resets++; 449 - 371 + 450 372 /* Avoid repeated reports of this failure, including problems 451 373 * with other functions on this device, and functions under 452 374 * bridges. 453 375 */ 454 376 eeh_pe_state_mark(pe, EEH_PE_ISOLATED); 455 - raw_spin_unlock_irqrestore(&confirm_error_lock, flags); 377 + eeh_serialize_unlock(flags); 456 378 457 379 eeh_send_failure_event(pe); 458 380 ··· 460 382 * a stack trace will help the device-driver authors figure 461 383 * out what happened. So print that out. 462 384 */ 463 - WARN(1, "EEH: failure detected\n"); 385 + pr_err("EEH: Frozen PE#%x detected on PHB#%x\n", 386 + pe->addr, pe->phb->global_number); 387 + dump_stack(); 388 + 464 389 return 1; 465 390 466 391 dn_unlock: 467 - raw_spin_unlock_irqrestore(&confirm_error_lock, flags); 392 + eeh_serialize_unlock(flags); 468 393 return rc; 469 394 } 470 395 ··· 606 525 * or a fundamental reset (3). 607 526 * A fundamental reset required by any device under 608 527 * Partitionable Endpoint trumps hot-reset. 609 - */ 528 + */ 610 529 eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset); 611 530 612 531 if (freset) ··· 619 538 */ 620 539 #define PCI_BUS_RST_HOLD_TIME_MSEC 250 621 540 msleep(PCI_BUS_RST_HOLD_TIME_MSEC); 622 - 623 - /* We might get hit with another EEH freeze as soon as the 541 + 542 + /* We might get hit with another EEH freeze as soon as the 624 543 * pci slot reset line is dropped. Make sure we don't miss 625 544 * these, and clear the flag now. 
626 545 */ ··· 646 565 */ 647 566 int eeh_reset_pe(struct eeh_pe *pe) 648 567 { 568 + int flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); 649 569 int i, rc; 650 570 651 571 /* Take three shots at resetting the bus */ ··· 654 572 eeh_reset_pe_once(pe); 655 573 656 574 rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); 657 - if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) 575 + if ((rc & flags) == flags) 658 576 return 0; 659 577 660 578 if (rc < 0) { ··· 686 604 if (!edev) 687 605 return; 688 606 dn = eeh_dev_to_of_node(edev); 689 - 607 + 690 608 for (i = 0; i < 16; i++) 691 609 eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]); 692 610 } ··· 756 674 * Even if force-off is set, the EEH hardware is still enabled, so that 757 675 * newer systems can boot. 758 676 */ 759 - static int __init eeh_init(void) 677 + int eeh_init(void) 760 678 { 761 679 struct pci_controller *hose, *tmp; 762 680 struct device_node *phb; 763 - int ret; 681 + static int cnt = 0; 682 + int ret = 0; 683 + 684 + /* 685 + * We have to delay the initialization on PowerNV after 686 + * the PCI hierarchy tree has been built because the PEs 687 + * are figured out based on PCI devices instead of device 688 + * tree nodes 689 + */ 690 + if (machine_is(powernv) && cnt++ <= 0) 691 + return ret; 764 692 765 693 /* call platform initialization function */ 766 694 if (!eeh_ops) { ··· 783 691 return ret; 784 692 } 785 693 786 - raw_spin_lock_init(&confirm_error_lock); 694 + /* Initialize EEH event */ 695 + ret = eeh_event_init(); 696 + if (ret) 697 + return ret; 787 698 788 699 /* Enable EEH for all adapters */ 789 700 if (eeh_probe_mode_devtree()) { ··· 795 700 phb = hose->dn; 796 701 traverse_pci_devices(phb, eeh_ops->of_probe, NULL); 797 702 } 703 + } else if (eeh_probe_mode_dev()) { 704 + list_for_each_entry_safe(hose, tmp, 705 + &hose_list, list_node) 706 + pci_walk_bus(hose->bus, eeh_ops->dev_probe, NULL); 707 + } else { 708 + pr_warning("%s: Invalid probe mode %d\n", 709 
+ __func__, eeh_probe_mode); 710 + return -EINVAL; 711 + } 712 + 713 + /* 714 + * Call platform post-initialization. Actually, It's good chance 715 + * to inform platform that EEH is ready to supply service if the 716 + * I/O cache stuff has been built up. 717 + */ 718 + if (eeh_ops->post_init) { 719 + ret = eeh_ops->post_init(); 720 + if (ret) 721 + return ret; 798 722 } 799 723 800 724 if (eeh_subsystem_enabled) ··· 842 728 { 843 729 struct pci_controller *phb; 844 730 731 + /* 732 + * If we're doing EEH probe based on PCI device, we 733 + * would delay the probe until late stage because 734 + * the PCI device isn't available this moment. 735 + */ 736 + if (!eeh_probe_mode_devtree()) 737 + return; 738 + 845 739 if (!of_node_to_eeh_dev(dn)) 846 740 return; 847 741 phb = of_node_to_eeh_dev(dn)->phb; ··· 858 736 if (NULL == phb || 0 == phb->buid) 859 737 return; 860 738 861 - /* FIXME: hotplug support on POWERNV */ 862 739 eeh_ops->of_probe(dn, NULL); 863 740 } 864 741 ··· 908 787 edev->pdev = dev; 909 788 dev->dev.archdata.edev = edev; 910 789 790 + /* 791 + * We have to do the EEH probe here because the PCI device 792 + * hasn't been created yet in the early stage. 793 + */ 794 + if (eeh_probe_mode_dev()) 795 + eeh_ops->dev_probe(dev, NULL); 796 + 911 797 eeh_addr_cache_insert_dev(dev); 912 798 } 913 799 ··· 931 803 struct pci_dev *dev; 932 804 933 805 list_for_each_entry(dev, &bus->devices, bus_list) { 934 - eeh_add_device_late(dev); 935 - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { 936 - struct pci_bus *subbus = dev->subordinate; 937 - if (subbus) 938 - eeh_add_device_tree_late(subbus); 939 - } 806 + eeh_add_device_late(dev); 807 + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { 808 + struct pci_bus *subbus = dev->subordinate; 809 + if (subbus) 810 + eeh_add_device_tree_late(subbus); 811 + } 940 812 } 941 813 } 942 814 EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
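One small but important fix in eeh.c above: `eeh_reset_pe()` now masks the state returned by `wait_state()` before comparing, so extra status bits no longer make a successfully recovered PE look failed. The before/after difference in miniature (bit values here are illustrative, not the kernel's):

```c
#include <assert.h>

/* Illustrative state bits; the real ones are the EEH_STATE_* flags. */
#define TOY_STATE_MMIO_ACTIVE	(1 << 0)
#define TOY_STATE_DMA_ACTIVE	(1 << 1)
#define TOY_STATE_MMIO_ENABLED	(1 << 2)  /* an extra bit wait_state may add */

/* Recovered when both "active" bits are set, regardless of extra bits. */
static int toy_pe_recovered(int rc)
{
	int flags = TOY_STATE_MMIO_ACTIVE | TOY_STATE_DMA_ACTIVE;

	/* The old "rc == flags" test failed whenever any extra bit was
	 * set; masking first checks only the two bits we care about. */
	return (rc & flags) == flags;
}
```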
+2 -3
arch/powerpc/platforms/pseries/eeh_cache.c arch/powerpc/kernel/eeh_cache.c
···
194 194 	}
195 195 
196 196 	/* Skip any devices for which EEH is not enabled. */
197 - 	if (!edev->pe) {
197 + 	if (!eeh_probe_mode_dev() && !edev->pe) {
198 198 #ifdef DEBUG
199 199 		pr_info("PCI: skip building address cache for=%s - %s\n",
200 200 			pci_name(dev), dn->full_name);
···
285 285  * Must be run late in boot process, after the pci controllers
286 286  * have been scanned for devices (after all device resources are known).
287 287  */
288 - void __init eeh_addr_cache_build(void)
288 + void eeh_addr_cache_build(void)
289 289 {
290 290 	struct device_node *dn;
291 291 	struct eeh_dev *edev;
···
316 316 	eeh_addr_cache_print(&pci_io_addr_cache_root);
317 317 #endif
318 318 }
319 - 
arch/powerpc/platforms/pseries/eeh_dev.c arch/powerpc/kernel/eeh_dev.c
+139 -30
arch/powerpc/platforms/pseries/eeh_driver.c arch/powerpc/kernel/eeh_driver.c
··· 154 154 * eeh_report_error - Report pci error to each device driver 155 155 * @data: eeh device 156 156 * @userdata: return value 157 - * 158 - * Report an EEH error to each device driver, collect up and 159 - * merge the device driver responses. Cumulative response 157 + * 158 + * Report an EEH error to each device driver, collect up and 159 + * merge the device driver responses. Cumulative response 160 160 * passed back in "userdata". 161 161 */ 162 162 static void *eeh_report_error(void *data, void *userdata) ··· 349 349 */ 350 350 static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus) 351 351 { 352 + struct timeval tstamp; 352 353 int cnt, rc; 353 354 354 355 /* pcibios will clear the counter; save the value */ 355 356 cnt = pe->freeze_count; 357 + tstamp = pe->tstamp; 356 358 357 359 /* 358 360 * We don't remove the corresponding PE instances because ··· 378 376 eeh_pe_restore_bars(pe); 379 377 380 378 /* Give the system 5 seconds to finish running the user-space 381 - * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, 382 - * this is a hack, but if we don't do this, and try to bring 383 - * the device up before the scripts have taken it down, 379 + * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, 380 + * this is a hack, but if we don't do this, and try to bring 381 + * the device up before the scripts have taken it down, 384 382 * potentially weird things happen. 385 383 */ 386 384 if (bus) { 387 385 ssleep(5); 388 386 pcibios_add_pci_devices(bus); 389 387 } 388 + 389 + pe->tstamp = tstamp; 390 390 pe->freeze_count = cnt; 391 391 392 392 return 0; ··· 399 395 */ 400 396 #define MAX_WAIT_FOR_RECOVERY 150 401 397 402 - /** 403 - * eeh_handle_event - Reset a PCI device after hard lockup. 404 - * @pe: EEH PE 405 - * 406 - * While PHB detects address or data parity errors on particular PCI 407 - * slot, the associated PE will be frozen. 
Besides, DMA's occurring 408 - * to wild addresses (which usually happen due to bugs in device 409 - * drivers or in PCI adapter firmware) can cause EEH error. #SERR, 410 - * #PERR or other misc PCI-related errors also can trigger EEH errors. 411 - * 412 - * Recovery process consists of unplugging the device driver (which 413 - * generated hotplug events to userspace), then issuing a PCI #RST to 414 - * the device, then reconfiguring the PCI config space for all bridges 415 - * & devices under this slot, and then finally restarting the device 416 - * drivers (which cause a second set of hotplug events to go out to 417 - * userspace). 418 - */ 419 - void eeh_handle_event(struct eeh_pe *pe) 398 + static void eeh_handle_normal_event(struct eeh_pe *pe) 420 399 { 421 400 struct pci_bus *frozen_bus; 422 401 int rc = 0; ··· 412 425 return; 413 426 } 414 427 428 + eeh_pe_update_time_stamp(pe); 415 429 pe->freeze_count++; 416 430 if (pe->freeze_count > EEH_MAX_ALLOWED_FREEZES) 417 431 goto excess_failures; ··· 425 437 * status ... if any child can't handle the reset, then the entire 426 438 * slot is dlpar removed and added. 427 439 */ 440 + pr_info("EEH: Notify device drivers to shutdown\n"); 428 441 eeh_pe_dev_traverse(pe, eeh_report_error, &result); 429 442 430 443 /* Get the current PCI slot state. This can take a long time, ··· 433 444 */ 434 445 rc = eeh_ops->wait_state(pe, MAX_WAIT_FOR_RECOVERY*1000); 435 446 if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) { 436 - printk(KERN_WARNING "EEH: Permanent failure\n"); 447 + pr_warning("EEH: Permanent failure\n"); 437 448 goto hard_fail; 438 449 } 439 450 ··· 441 452 * don't post the error log until after all dev drivers 442 453 * have been informed. 443 454 */ 455 + pr_info("EEH: Collect temporary log\n"); 444 456 eeh_slot_error_detail(pe, EEH_LOG_TEMP); 445 457 446 458 /* If all device drivers were EEH-unaware, then shut ··· 449 459 * go down willingly, without panicing the system. 
450 460 */ 451 461 if (result == PCI_ERS_RESULT_NONE) { 462 + pr_info("EEH: Reset with hotplug activity\n"); 452 463 rc = eeh_reset_device(pe, frozen_bus); 453 464 if (rc) { 454 - printk(KERN_WARNING "EEH: Unable to reset, rc=%d\n", rc); 465 + pr_warning("%s: Unable to reset, err=%d\n", 466 + __func__, rc); 455 467 goto hard_fail; 456 468 } 457 469 } 458 470 459 471 /* If all devices reported they can proceed, then re-enable MMIO */ 460 472 if (result == PCI_ERS_RESULT_CAN_RECOVER) { 473 + pr_info("EEH: Enable I/O for affected devices\n"); 461 474 rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); 462 475 463 476 if (rc < 0) ··· 468 475 if (rc) { 469 476 result = PCI_ERS_RESULT_NEED_RESET; 470 477 } else { 478 + pr_info("EEH: Notify device drivers to resume I/O\n"); 471 479 result = PCI_ERS_RESULT_NONE; 472 480 eeh_pe_dev_traverse(pe, eeh_report_mmio_enabled, &result); 473 481 } ··· 476 482 477 483 /* If all devices reported they can proceed, then re-enable DMA */ 478 484 if (result == PCI_ERS_RESULT_CAN_RECOVER) { 485 + pr_info("EEH: Enabled DMA for affected devices\n"); 479 486 rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA); 480 487 481 488 if (rc < 0) ··· 489 494 490 495 /* If any device has a hard failure, then shut off everything. 
*/ 491 496 if (result == PCI_ERS_RESULT_DISCONNECT) { 492 - printk(KERN_WARNING "EEH: Device driver gave up\n"); 497 + pr_warning("EEH: Device driver gave up\n"); 493 498 goto hard_fail; 494 499 } 495 500 496 501 /* If any device called out for a reset, then reset the slot */ 497 502 if (result == PCI_ERS_RESULT_NEED_RESET) { 503 + pr_info("EEH: Reset without hotplug activity\n"); 498 504 rc = eeh_reset_device(pe, NULL); 499 505 if (rc) { 500 - printk(KERN_WARNING "EEH: Cannot reset, rc=%d\n", rc); 506 + pr_warning("%s: Cannot reset, err=%d\n", 507 + __func__, rc); 501 508 goto hard_fail; 502 509 } 510 + 511 + pr_info("EEH: Notify device drivers " 512 + "the completion of reset\n"); 503 513 result = PCI_ERS_RESULT_NONE; 504 514 eeh_pe_dev_traverse(pe, eeh_report_reset, &result); 505 515 } ··· 512 512 /* All devices should claim they have recovered by now. */ 513 513 if ((result != PCI_ERS_RESULT_RECOVERED) && 514 514 (result != PCI_ERS_RESULT_NONE)) { 515 - printk(KERN_WARNING "EEH: Not recovered\n"); 515 + pr_warning("EEH: Not recovered\n"); 516 516 goto hard_fail; 517 517 } 518 518 519 519 /* Tell all device drivers that they can resume operations */ 520 + pr_info("EEH: Notify device driver to resume\n"); 520 521 eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); 521 522 522 523 return; 523 - 524 + 524 525 excess_failures: 525 526 /* 526 527 * About 90% of all real-life EEH failures in the field ··· 551 550 pcibios_remove_pci_devices(frozen_bus); 552 551 } 553 552 553 + static void eeh_handle_special_event(void) 554 + { 555 + struct eeh_pe *pe, *phb_pe; 556 + struct pci_bus *bus; 557 + struct pci_controller *hose, *tmp; 558 + unsigned long flags; 559 + int rc = 0; 560 + 561 + /* 562 + * The return value from next_error() has been classified as follows. 563 + * It might be good to enumerate them. However, next_error() is only 564 + * supported by PowerNV platform for now. 
So it would be fine to use 565 + * integer directly: 566 + * 567 + * 4 - Dead IOC 3 - Dead PHB 568 + * 2 - Fenced PHB 1 - Frozen PE 569 + * 0 - No error found 570 + * 571 + */ 572 + rc = eeh_ops->next_error(&pe); 573 + if (rc <= 0) 574 + return; 575 + 576 + switch (rc) { 577 + case 4: 578 + /* Mark all PHBs in dead state */ 579 + eeh_serialize_lock(&flags); 580 + list_for_each_entry_safe(hose, tmp, 581 + &hose_list, list_node) { 582 + phb_pe = eeh_phb_pe_get(hose); 583 + if (!phb_pe) continue; 584 + 585 + eeh_pe_state_mark(phb_pe, 586 + EEH_PE_ISOLATED | EEH_PE_PHB_DEAD); 587 + } 588 + eeh_serialize_unlock(flags); 589 + 590 + /* Purge all events */ 591 + eeh_remove_event(NULL); 592 + break; 593 + case 3: 594 + case 2: 595 + case 1: 596 + /* Mark the PE in fenced state */ 597 + eeh_serialize_lock(&flags); 598 + if (rc == 3) 599 + eeh_pe_state_mark(pe, 600 + EEH_PE_ISOLATED | EEH_PE_PHB_DEAD); 601 + else 602 + eeh_pe_state_mark(pe, 603 + EEH_PE_ISOLATED | EEH_PE_RECOVERING); 604 + eeh_serialize_unlock(flags); 605 + 606 + /* Purge all events of the PHB */ 607 + eeh_remove_event(pe); 608 + break; 609 + default: 610 + pr_err("%s: Invalid value %d from next_error()\n", 611 + __func__, rc); 612 + return; 613 + } 614 + 615 + /* 616 + * For fenced PHB and frozen PE, it's handled as normal 617 + * event. We have to remove the affected PHBs for dead 618 + * PHB and IOC 619 + */ 620 + if (rc == 2 || rc == 1) 621 + eeh_handle_normal_event(pe); 622 + else { 623 + list_for_each_entry_safe(hose, tmp, 624 + &hose_list, list_node) { 625 + phb_pe = eeh_phb_pe_get(hose); 626 + if (!phb_pe || !(phb_pe->state & EEH_PE_PHB_DEAD)) 627 + continue; 628 + 629 + bus = eeh_pe_bus_get(phb_pe); 630 + /* Notify all devices that they're about to go down. */ 631 + eeh_pe_dev_traverse(pe, eeh_report_failure, NULL); 632 + pcibios_remove_pci_devices(bus); 633 + } 634 + } 635 + } 636 + 637 + /** 638 + * eeh_handle_event - Reset a PCI device after hard lockup. 
639 + * @pe: EEH PE 640 + * 641 + * When the PHB detects address or data parity errors on a particular 642 + * PCI slot, the associated PE is frozen. Besides, DMAs to wild 643 + * addresses (which usually happen due to bugs in device drivers or 644 + * in PCI adapter firmware) can cause EEH errors. #SERR, 645 + * #PERR or other misc PCI-related errors can also trigger EEH errors. 646 + * 647 + * The recovery process consists of unplugging the device driver (which 648 + * generates hotplug events to userspace), then issuing a PCI #RST to 649 + * the device, then reconfiguring the PCI config space for all bridges 650 + * & devices under this slot, and finally restarting the device 651 + * drivers (which causes a second set of hotplug events to go out to 652 + * userspace). 653 + */ 654 + void eeh_handle_event(struct eeh_pe *pe) 655 + { 656 + if (pe) 657 + eeh_handle_normal_event(pe); 658 + else 659 + eeh_handle_special_event(); 660 + }
+84 -44
arch/powerpc/platforms/pseries/eeh_event.c arch/powerpc/kernel/eeh_event.c
··· 18 18 19 19 #include <linux/delay.h> 20 20 #include <linux/list.h> 21 - #include <linux/mutex.h> 22 21 #include <linux/sched.h> 22 + #include <linux/semaphore.h> 23 23 #include <linux/pci.h> 24 24 #include <linux/slab.h> 25 - #include <linux/workqueue.h> 26 25 #include <linux/kthread.h> 27 26 #include <asm/eeh_event.h> 28 27 #include <asm/ppc-pci.h> ··· 34 35 * work-queue, where a worker thread can drive recovery. 35 36 */ 36 37 37 - /* EEH event workqueue setup. */ 38 38 static DEFINE_SPINLOCK(eeh_eventlist_lock); 39 + static struct semaphore eeh_eventlist_sem; 39 40 LIST_HEAD(eeh_eventlist); 40 - static void eeh_thread_launcher(struct work_struct *); 41 - DECLARE_WORK(eeh_event_wq, eeh_thread_launcher); 42 - 43 - /* Serialize reset sequences for a given pci device */ 44 - DEFINE_MUTEX(eeh_event_mutex); 45 41 46 42 /** 47 43 * eeh_event_handler - Dispatch EEH events. ··· 54 60 struct eeh_event *event; 55 61 struct eeh_pe *pe; 56 62 57 - spin_lock_irqsave(&eeh_eventlist_lock, flags); 58 - event = NULL; 63 + while (!kthread_should_stop()) { 64 + if (down_interruptible(&eeh_eventlist_sem)) 65 + break; 59 66 60 - /* Unqueue the event, get ready to process. 
*/ 61 - if (!list_empty(&eeh_eventlist)) { 62 - event = list_entry(eeh_eventlist.next, struct eeh_event, list); 63 - list_del(&event->list); 64 - } 65 - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 67 + /* Fetch EEH event from the queue */ 68 + spin_lock_irqsave(&eeh_eventlist_lock, flags); 69 + event = NULL; 70 + if (!list_empty(&eeh_eventlist)) { 71 + event = list_entry(eeh_eventlist.next, 72 + struct eeh_event, list); 73 + list_del(&event->list); 74 + } 75 + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 76 + if (!event) 77 + continue; 66 78 67 - if (event == NULL) 68 - return 0; 79 + /* We might have event without binding PE */ 80 + pe = event->pe; 81 + if (pe) { 82 + eeh_pe_state_mark(pe, EEH_PE_RECOVERING); 83 + pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n", 84 + pe->phb->global_number, pe->addr); 85 + eeh_handle_event(pe); 86 + eeh_pe_state_clear(pe, EEH_PE_RECOVERING); 87 + } else { 88 + eeh_handle_event(NULL); 89 + } 69 90 70 - /* Serialize processing of EEH events */ 71 - mutex_lock(&eeh_event_mutex); 72 - pe = event->pe; 73 - eeh_pe_state_mark(pe, EEH_PE_RECOVERING); 74 - pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n", 75 - pe->phb->global_number, pe->addr); 76 - 77 - set_current_state(TASK_INTERRUPTIBLE); /* Don't add to load average */ 78 - eeh_handle_event(pe); 79 - eeh_pe_state_clear(pe, EEH_PE_RECOVERING); 80 - 81 - kfree(event); 82 - mutex_unlock(&eeh_event_mutex); 83 - 84 - /* If there are no new errors after an hour, clear the counter. */ 85 - if (pe && pe->freeze_count > 0) { 86 - msleep_interruptible(3600*1000); 87 - if (pe->freeze_count > 0) 88 - pe->freeze_count--; 89 - 91 + kfree(event); 90 92 } 91 93 92 94 return 0; 93 95 } 94 96 95 97 /** 96 - * eeh_thread_launcher - Start kernel thread to handle EEH events 97 - * @dummy - unused 98 + * eeh_event_init - Start kernel thread to handle EEH events 98 99 * 99 100 * This routine is called to start the kernel thread for processing 100 101 * EEH event. 
101 102 */ 102 - static void eeh_thread_launcher(struct work_struct *dummy) 103 + int eeh_event_init(void) 103 104 { 104 - if (IS_ERR(kthread_run(eeh_event_handler, NULL, "eehd"))) 105 - printk(KERN_ERR "Failed to start EEH daemon\n"); 105 + struct task_struct *t; 106 + int ret = 0; 107 + 108 + /* Initialize semaphore */ 109 + sema_init(&eeh_eventlist_sem, 0); 110 + 111 + t = kthread_run(eeh_event_handler, NULL, "eehd"); 112 + if (IS_ERR(t)) { 113 + ret = PTR_ERR(t); 114 + pr_err("%s: Failed to start EEH daemon (%d)\n", 115 + __func__, ret); 116 + return ret; 117 + } 118 + 119 + return 0; 106 120 } 107 121 108 122 /** ··· 138 136 list_add(&event->list, &eeh_eventlist); 139 137 spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 140 138 141 - schedule_work(&eeh_event_wq); 139 + /* For the EEH daemon to kick in */ 140 + up(&eeh_eventlist_sem); 142 141 143 142 return 0; 143 + } 144 + 145 + /** 146 + * eeh_remove_event - Remove EEH event from the queue 147 + * @pe: Event binding to the PE 148 + * 149 + * On the PowerNV platform, subsequent events might be part 150 + * of an earlier one. In that case, those subsequent events 151 + * are duplicates and unnecessary, thus 152 + * they should be removed. 153 + */ 154 + void eeh_remove_event(struct eeh_pe *pe) 155 + { 156 + unsigned long flags; 157 + struct eeh_event *event, *tmp; 158 + 159 + spin_lock_irqsave(&eeh_eventlist_lock, flags); 160 + list_for_each_entry_safe(event, tmp, &eeh_eventlist, list) { 161 + /* 162 + * If we don't have a valid PE passed in, that means 163 + * we already have an event corresponding to a dead IOC 164 + * and all events should be purged.
165 + */ 166 + if (!pe) { 167 + list_del(&event->list); 168 + kfree(event); 169 + } else if (pe->type & EEH_PE_PHB) { 170 + if (event->pe && event->pe->phb == pe->phb) { 171 + list_del(&event->list); 172 + kfree(event); 173 + } 174 + } else if (event->pe == pe) { 175 + list_del(&event->list); 176 + kfree(event); 177 + } 178 + } 179 + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); 144 180 }
+192 -45
arch/powerpc/platforms/pseries/eeh_pe.c arch/powerpc/kernel/eeh_pe.c
··· 22 22 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 23 23 */ 24 24 25 + #include <linux/delay.h> 25 26 #include <linux/export.h> 26 27 #include <linux/gfp.h> 27 28 #include <linux/init.h> ··· 79 78 } 80 79 81 80 /* Put it into the list */ 82 - eeh_lock(); 83 81 list_add_tail(&pe->child, &eeh_phb_pe); 84 - eeh_unlock(); 85 82 86 83 pr_debug("EEH: Add PE for PHB#%d\n", phb->global_number); 87 84 ··· 94 95 * hierarchy tree is composed of PHB PEs. The function is used 95 96 * to retrieve the corresponding PHB PE according to the given PHB. 96 97 */ 97 - static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb) 98 + struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb) 98 99 { 99 100 struct eeh_pe *pe; 100 101 ··· 184 185 return NULL; 185 186 } 186 187 187 - eeh_lock(); 188 - 189 188 /* Traverse root PE */ 190 189 for (pe = root; pe; pe = eeh_pe_next(pe, root)) { 191 190 eeh_pe_for_each_dev(pe, edev) { 192 191 ret = fn(edev, flag); 193 - if (ret) { 194 - eeh_unlock(); 192 + if (ret) 195 193 return ret; 196 - } 197 194 } 198 195 } 199 - 200 - eeh_unlock(); 201 196 202 197 return NULL; 203 198 } ··· 221 228 return pe; 222 229 223 230 /* Try BDF address */ 224 - if (edev->pe_config_addr && 231 + if (edev->config_addr && 225 232 (edev->config_addr == pe->config_addr)) 226 233 return pe; 227 234 ··· 239 246 * which is composed of PCI bus/device/function number, or unified 240 247 * PE address. 241 248 */ 242 - static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev) 249 + struct eeh_pe *eeh_pe_get(struct eeh_dev *edev) 243 250 { 244 251 struct eeh_pe *root = eeh_phb_pe_get(edev->phb); 245 252 struct eeh_pe *pe; ··· 298 305 { 299 306 struct eeh_pe *pe, *parent; 300 307 301 - eeh_lock(); 302 - 303 308 /* 304 309 * Search the PE has been existing or not according 305 310 * to the PE address. 
If that has been existing, the ··· 307 316 pe = eeh_pe_get(edev); 308 317 if (pe && !(pe->type & EEH_PE_INVALID)) { 309 318 if (!edev->pe_config_addr) { 310 - eeh_unlock(); 311 319 pr_err("%s: PE with addr 0x%x already exists\n", 312 320 __func__, edev->config_addr); 313 321 return -EEXIST; ··· 318 328 319 329 /* Put the edev to PE */ 320 330 list_add_tail(&edev->list, &pe->edevs); 321 - eeh_unlock(); 322 331 pr_debug("EEH: Add %s to Bus PE#%x\n", 323 332 edev->dn->full_name, pe->addr); 324 333 ··· 336 347 parent->type &= ~EEH_PE_INVALID; 337 348 parent = parent->parent; 338 349 } 339 - eeh_unlock(); 340 350 pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n", 341 351 edev->dn->full_name, pe->addr, pe->parent->addr); 342 352 ··· 345 357 /* Create a new EEH PE */ 346 358 pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE); 347 359 if (!pe) { 348 - eeh_unlock(); 349 360 pr_err("%s: out of memory!\n", __func__); 350 361 return -ENOMEM; 351 362 } 352 363 pe->addr = edev->pe_config_addr; 353 364 pe->config_addr = edev->config_addr; 365 + 366 + /* 367 + * While doing PE reset, we probably hot-reset the 368 + * upstream bridge. However, the PCI devices including 369 + * the associated EEH devices might be removed when EEH 370 + * core is doing recovery. So that won't safe to retrieve 371 + * the bridge through downstream EEH device. We have to 372 + * trace the parent PCI bus, then the upstream bridge. 373 + */ 374 + if (eeh_probe_mode_dev()) 375 + pe->bus = eeh_dev_to_pci_dev(edev)->bus; 354 376 355 377 /* 356 378 * Put the new EEH PE into hierarchy tree. 
If the parent ··· 372 374 if (!parent) { 373 375 parent = eeh_phb_pe_get(edev->phb); 374 376 if (!parent) { 375 - eeh_unlock(); 376 377 pr_err("%s: No PHB PE is found (PHB Domain=%d)\n", 377 378 __func__, edev->phb->global_number); 378 379 edev->pe = NULL; ··· 388 391 list_add_tail(&pe->child, &parent->child_list); 389 392 list_add_tail(&edev->list, &pe->edevs); 390 393 edev->pe = pe; 391 - eeh_unlock(); 392 394 pr_debug("EEH: Add %s to Device PE#%x, Parent PE#%x\n", 393 395 edev->dn->full_name, pe->addr, pe->parent->addr); 394 396 ··· 414 418 __func__, edev->dn->full_name); 415 419 return -EEXIST; 416 420 } 417 - 418 - eeh_lock(); 419 421 420 422 /* Remove the EEH device */ 421 423 pe = edev->pe; ··· 459 465 pe = parent; 460 466 } 461 467 462 - eeh_unlock(); 463 - 464 468 return 0; 469 + } 470 + 471 + /** 472 + * eeh_pe_update_time_stamp - Update PE's frozen time stamp 473 + * @pe: EEH PE 474 + * 475 + * We keep a time stamp for each PE to record when it was last 476 + * frozen. The function should be called to update the time stamp 477 + * on the first error of the specific PE; errors that happened 478 + * more than an hour ago needn't be accounted for.
479 + */ 480 + void eeh_pe_update_time_stamp(struct eeh_pe *pe) 481 + { 482 + struct timeval tstamp; 483 + 484 + if (!pe) return; 485 + 486 + if (pe->freeze_count <= 0) { 487 + pe->freeze_count = 0; 488 + do_gettimeofday(&pe->tstamp); 489 + } else { 490 + do_gettimeofday(&tstamp); 491 + if (tstamp.tv_sec - pe->tstamp.tv_sec > 3600) { 492 + pe->tstamp = tstamp; 493 + pe->freeze_count = 0; 494 + } 495 + } 465 496 } 466 497 467 498 /** ··· 531 512 */ 532 513 void eeh_pe_state_mark(struct eeh_pe *pe, int state) 533 514 { 534 - eeh_lock(); 535 515 eeh_pe_traverse(pe, __eeh_pe_state_mark, &state); 536 - eeh_unlock(); 537 516 } 538 517 539 518 /** ··· 565 548 */ 566 549 void eeh_pe_state_clear(struct eeh_pe *pe, int state) 567 550 { 568 - eeh_lock(); 569 551 eeh_pe_traverse(pe, __eeh_pe_state_clear, &state); 570 - eeh_unlock(); 571 552 } 572 553 573 - /** 574 - * eeh_restore_one_device_bars - Restore the Base Address Registers for one device 575 - * @data: EEH device 576 - * @flag: Unused 554 + /* 555 + * Some PCI bridges (e.g. PLX bridges) have primary/secondary 556 + * buses assigned explicitly by firmware, and we probably have 557 + * lost that after reset. So we have to delay the check until 558 + * the PCI-CFG registers have been restored for the parent 559 + * bridge. 577 560 * 578 - * Loads the PCI configuration space base address registers, 579 - * the expansion ROM base address, the latency timer, and etc. 580 - * from the saved values in the device node. 561 + * Don't use normal PCI-CFG accessors, which probably has been 562 + * blocked on normal path during the stage. So we need utilize 563 + * eeh operations, which is always permitted. 
581 564 */ 582 - static void *eeh_restore_one_device_bars(void *data, void *flag) 565 + static void eeh_bridge_check_link(struct pci_dev *pdev, 566 + struct device_node *dn) 567 + { 568 + int cap; 569 + uint32_t val; 570 + int timeout = 0; 571 + 572 + /* 573 + * We only check root port and downstream ports of 574 + * PCIe switches 575 + */ 576 + if (!pci_is_pcie(pdev) || 577 + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT && 578 + pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM)) 579 + return; 580 + 581 + pr_debug("%s: Check PCIe link for %s ...\n", 582 + __func__, pci_name(pdev)); 583 + 584 + /* Check slot status */ 585 + cap = pdev->pcie_cap; 586 + eeh_ops->read_config(dn, cap + PCI_EXP_SLTSTA, 2, &val); 587 + if (!(val & PCI_EXP_SLTSTA_PDS)) { 588 + pr_debug(" No card in the slot (0x%04x) !\n", val); 589 + return; 590 + } 591 + 592 + /* Check power status if we have the capability */ 593 + eeh_ops->read_config(dn, cap + PCI_EXP_SLTCAP, 2, &val); 594 + if (val & PCI_EXP_SLTCAP_PCP) { 595 + eeh_ops->read_config(dn, cap + PCI_EXP_SLTCTL, 2, &val); 596 + if (val & PCI_EXP_SLTCTL_PCC) { 597 + pr_debug(" In power-off state, power it on ...\n"); 598 + val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC); 599 + val |= (0x0100 & PCI_EXP_SLTCTL_PIC); 600 + eeh_ops->write_config(dn, cap + PCI_EXP_SLTCTL, 2, val); 601 + msleep(2 * 1000); 602 + } 603 + } 604 + 605 + /* Enable link */ 606 + eeh_ops->read_config(dn, cap + PCI_EXP_LNKCTL, 2, &val); 607 + val &= ~PCI_EXP_LNKCTL_LD; 608 + eeh_ops->write_config(dn, cap + PCI_EXP_LNKCTL, 2, val); 609 + 610 + /* Check link */ 611 + eeh_ops->read_config(dn, cap + PCI_EXP_LNKCAP, 4, &val); 612 + if (!(val & PCI_EXP_LNKCAP_DLLLARC)) { 613 + pr_debug(" No link reporting capability (0x%08x) \n", val); 614 + msleep(1000); 615 + return; 616 + } 617 + 618 + /* Wait the link is up until timeout (5s) */ 619 + timeout = 0; 620 + while (timeout < 5000) { 621 + msleep(20); 622 + timeout += 20; 623 + 624 + eeh_ops->read_config(dn, cap + 
PCI_EXP_LNKSTA, 2, &val); 625 + if (val & PCI_EXP_LNKSTA_DLLLA) 626 + break; 627 + } 628 + 629 + if (val & PCI_EXP_LNKSTA_DLLLA) 630 + pr_debug(" Link up (%s)\n", 631 + (val & PCI_EXP_LNKSTA_CLS_2_5GB) ? "2.5GB" : "5GB"); 632 + else 633 + pr_debug(" Link not ready (0x%04x)\n", val); 634 + } 635 + 636 + #define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) 637 + #define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)]) 638 + 639 + static void eeh_restore_bridge_bars(struct pci_dev *pdev, 640 + struct eeh_dev *edev, 641 + struct device_node *dn) 642 + { 643 + int i; 644 + 645 + /* 646 + * Device BARs: 0x10 - 0x18 647 + * Bus numbers and windows: 0x18 - 0x30 648 + */ 649 + for (i = 4; i < 13; i++) 650 + eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]); 651 + /* Rom: 0x38 */ 652 + eeh_ops->write_config(dn, 14*4, 4, edev->config_space[14]); 653 + 654 + /* Cache line & Latency timer: 0xC 0xD */ 655 + eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1, 656 + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); 657 + eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1, 658 + SAVED_BYTE(PCI_LATENCY_TIMER)); 659 + /* Max latency, min grant, interrupt ping and line: 0x3C */ 660 + eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]); 661 + 662 + /* PCI Command: 0x4 */ 663 + eeh_ops->write_config(dn, PCI_COMMAND, 4, edev->config_space[1]); 664 + 665 + /* Check the PCIe link is ready */ 666 + eeh_bridge_check_link(pdev, dn); 667 + } 668 + 669 + static void eeh_restore_device_bars(struct eeh_dev *edev, 670 + struct device_node *dn) 583 671 { 584 672 int i; 585 673 u32 cmd; 586 - struct eeh_dev *edev = (struct eeh_dev *)data; 587 - struct device_node *dn = eeh_dev_to_of_node(edev); 588 674 589 675 for (i = 4; i < 10; i++) 590 676 eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]); 591 677 /* 12 == Expansion ROM Address */ 592 678 eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]); 593 - 594 - #define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) 595 - #define SAVED_BYTE(OFF) (((u8 
*)(edev->config_space))[BYTE_SWAP(OFF)]) 596 679 597 680 eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1, 598 681 SAVED_BYTE(PCI_CACHE_LINE_SIZE)); ··· 716 599 else 717 600 cmd &= ~PCI_COMMAND_SERR; 718 601 eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd); 602 + } 603 + 604 + /** 605 + * eeh_restore_one_device_bars - Restore the Base Address Registers for one device 606 + * @data: EEH device 607 + * @flag: Unused 608 + * 609 + * Loads the PCI configuration space base address registers, 610 + * the expansion ROM base address, the latency timer, and etc. 611 + * from the saved values in the device node. 612 + */ 613 + static void *eeh_restore_one_device_bars(void *data, void *flag) 614 + { 615 + struct pci_dev *pdev = NULL; 616 + struct eeh_dev *edev = (struct eeh_dev *)data; 617 + struct device_node *dn = eeh_dev_to_of_node(edev); 618 + 619 + /* Trace the PCI bridge */ 620 + if (eeh_probe_mode_dev()) { 621 + pdev = eeh_dev_to_pci_dev(edev); 622 + if (pdev->hdr_type != PCI_HEADER_TYPE_BRIDGE) 623 + pdev = NULL; 624 + } 625 + 626 + if (pdev) 627 + eeh_restore_bridge_bars(pdev, edev, dn); 628 + else 629 + eeh_restore_device_bars(edev, dn); 719 630 720 631 return NULL; 721 632 } ··· 780 635 struct eeh_dev *edev; 781 636 struct pci_dev *pdev; 782 637 783 - eeh_lock(); 784 - 785 638 if (pe->type & EEH_PE_PHB) { 786 639 bus = pe->phb->bus; 787 640 } else if (pe->type & EEH_PE_BUS || 788 641 pe->type & EEH_PE_DEVICE) { 642 + if (pe->bus) { 643 + bus = pe->bus; 644 + goto out; 645 + } 646 + 789 647 edev = list_first_entry(&pe->edevs, struct eeh_dev, list); 790 648 pdev = eeh_dev_to_pci_dev(edev); 791 649 if (pdev) 792 650 bus = pdev->bus; 793 651 } 794 652 795 - eeh_unlock(); 796 - 653 + out: 797 654 return bus; 798 655 }
-1
arch/powerpc/platforms/pseries/eeh_sysfs.c arch/powerpc/kernel/eeh_sysfs.c
··· 72 72 device_remove_file(&pdev->dev, &dev_attr_eeh_config_addr); 73 73 device_remove_file(&pdev->dev, &dev_attr_eeh_pe_config_addr); 74 74 } 75 -
+1 -1
arch/powerpc/platforms/pseries/io_event_irq.c
··· 115 115 * by scope or event type alone. For example, Torrent ISR route change 116 116 * event is reported with scope 0x00 (Not Applicatable) rather than 117 117 * 0x3B (Torrent-hub). It is better to let the clients to identify 118 - * who owns the the event. 118 + * who owns the event. 119 119 */ 120 120 121 121 static irqreturn_t ioei_interrupt(int irq, void *dev_id)
+4
arch/powerpc/platforms/pseries/iommu.c
··· 614 614 615 615 iommu_table_setparms(pci->phb, dn, tbl); 616 616 pci->iommu_table = iommu_init_table(tbl, pci->phb->node); 617 + iommu_register_group(tbl, pci_domain_nr(bus), 0); 617 618 618 619 /* Divide the rest (1.75GB) among the children */ 619 620 pci->phb->dma_window_size = 0x80000000ul; ··· 659 658 ppci->phb->node); 660 659 iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window); 661 660 ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node); 661 + iommu_register_group(tbl, pci_domain_nr(bus), 0); 662 662 pr_debug(" created table: %p\n", ppci->iommu_table); 663 663 } 664 664 } ··· 686 684 phb->node); 687 685 iommu_table_setparms(phb, dn, tbl); 688 686 PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node); 687 + iommu_register_group(tbl, pci_domain_nr(phb->bus), 0); 689 688 set_iommu_table_base(&dev->dev, PCI_DN(dn)->iommu_table); 690 689 return; 691 690 } ··· 1187 1184 pci->phb->node); 1188 1185 iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window); 1189 1186 pci->iommu_table = iommu_init_table(tbl, pci->phb->node); 1187 + iommu_register_group(tbl, pci_domain_nr(pci->phb->bus), 0); 1190 1188 pr_debug(" created table: %p\n", pci->iommu_table); 1191 1189 } else { 1192 1190 pr_debug(" found DMA window, table: %p\n", pci->iommu_table);
+130 -12
arch/powerpc/platforms/pseries/lpar.c
··· 45 45 #include "plpar_wrappers.h" 46 46 #include "pseries.h" 47 47 48 + /* Flag bits for H_BULK_REMOVE */ 49 + #define HBR_REQUEST 0x4000000000000000UL 50 + #define HBR_RESPONSE 0x8000000000000000UL 51 + #define HBR_END 0xc000000000000000UL 52 + #define HBR_AVPN 0x0200000000000000UL 53 + #define HBR_ANDCOND 0x0100000000000000UL 54 + 48 55 49 56 /* in hvCall.S */ 50 57 EXPORT_SYMBOL(plpar_hcall); ··· 70 63 71 64 if (cpu_has_feature(CPU_FTR_ALTIVEC)) 72 65 lppaca_of(cpu).vmxregs_in_use = 1; 66 + 67 + if (cpu_has_feature(CPU_FTR_ARCH_207S)) 68 + lppaca_of(cpu).ebb_regs_in_use = 1; 73 69 74 70 addr = __pa(&lppaca_of(cpu)); 75 71 ret = register_vpa(hwcpu, addr); ··· 250 240 static long pSeries_lpar_hpte_updatepp(unsigned long slot, 251 241 unsigned long newpp, 252 242 unsigned long vpn, 253 - int psize, int ssize, int local) 243 + int psize, int apsize, 244 + int ssize, int local) 254 245 { 255 246 unsigned long lpar_rc; 256 247 unsigned long flags = (newpp & 7) | H_AVPN; ··· 339 328 } 340 329 341 330 static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn, 342 - int psize, int ssize, int local) 331 + int psize, int apsize, 332 + int ssize, int local) 343 333 { 344 334 unsigned long want_v; 345 335 unsigned long lpar_rc; ··· 357 345 BUG_ON(lpar_rc != H_SUCCESS); 358 346 } 359 347 348 + /* 349 + * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need 350 + * to make sure that we avoid bouncing the hypervisor tlbie lock. 
351 + */ 352 + #define PPC64_HUGE_HPTE_BATCH 12 353 + 354 + static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot, 355 + unsigned long *vpn, int count, 356 + int psize, int ssize) 357 + { 358 + unsigned long param[8]; 359 + int i = 0, pix = 0, rc; 360 + unsigned long flags = 0; 361 + int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE); 362 + 363 + if (lock_tlbie) 364 + spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); 365 + 366 + for (i = 0; i < count; i++) { 367 + 368 + if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) { 369 + pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0, 370 + ssize, 0); 371 + } else { 372 + param[pix] = HBR_REQUEST | HBR_AVPN | slot[i]; 373 + param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize); 374 + pix += 2; 375 + if (pix == 8) { 376 + rc = plpar_hcall9(H_BULK_REMOVE, param, 377 + param[0], param[1], param[2], 378 + param[3], param[4], param[5], 379 + param[6], param[7]); 380 + BUG_ON(rc != H_SUCCESS); 381 + pix = 0; 382 + } 383 + } 384 + } 385 + if (pix) { 386 + param[pix] = HBR_END; 387 + rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1], 388 + param[2], param[3], param[4], param[5], 389 + param[6], param[7]); 390 + BUG_ON(rc != H_SUCCESS); 391 + } 392 + 393 + if (lock_tlbie) 394 + spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags); 395 + } 396 + 397 + static void pSeries_lpar_hugepage_invalidate(struct mm_struct *mm, 398 + unsigned char *hpte_slot_array, 399 + unsigned long addr, int psize) 400 + { 401 + int ssize = 0, i, index = 0; 402 + unsigned long s_addr = addr; 403 + unsigned int max_hpte_count, valid; 404 + unsigned long vpn_array[PPC64_HUGE_HPTE_BATCH]; 405 + unsigned long slot_array[PPC64_HUGE_HPTE_BATCH]; 406 + unsigned long shift, hidx, vpn = 0, vsid, hash, slot; 407 + 408 + shift = mmu_psize_defs[psize].shift; 409 + max_hpte_count = 1U << (PMD_SHIFT - shift); 410 + 411 + for (i = 0; i < max_hpte_count; i++) { 412 + valid = hpte_valid(hpte_slot_array, i); 413 + if (!valid) 414 + 
continue; 415 + hidx = hpte_hash_index(hpte_slot_array, i); 416 + 417 + /* get the vpn */ 418 + addr = s_addr + (i * (1ul << shift)); 419 + if (!is_kernel_addr(addr)) { 420 + ssize = user_segment_size(addr); 421 + vsid = get_vsid(mm->context.id, addr, ssize); 422 + WARN_ON(vsid == 0); 423 + } else { 424 + vsid = get_kernel_vsid(addr, mmu_kernel_ssize); 425 + ssize = mmu_kernel_ssize; 426 + } 427 + 428 + vpn = hpt_vpn(addr, vsid, ssize); 429 + hash = hpt_hash(vpn, shift, ssize); 430 + if (hidx & _PTEIDX_SECONDARY) 431 + hash = ~hash; 432 + 433 + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP; 434 + slot += hidx & _PTEIDX_GROUP_IX; 435 + 436 + slot_array[index] = slot; 437 + vpn_array[index] = vpn; 438 + if (index == PPC64_HUGE_HPTE_BATCH - 1) { 439 + /* 440 + * Now do a bulk invalidate 441 + */ 442 + __pSeries_lpar_hugepage_invalidate(slot_array, 443 + vpn_array, 444 + PPC64_HUGE_HPTE_BATCH, 445 + psize, ssize); 446 + index = 0; 447 + } else 448 + index++; 449 + } 450 + if (index) 451 + __pSeries_lpar_hugepage_invalidate(slot_array, vpn_array, 452 + index, psize, ssize); 453 + } 454 + 360 455 static void pSeries_lpar_hpte_removebolted(unsigned long ea, 361 456 int psize, int ssize) 362 457 { ··· 475 356 476 357 slot = pSeries_lpar_hpte_find(vpn, psize, ssize); 477 358 BUG_ON(slot == -1); 478 - 479 - pSeries_lpar_hpte_invalidate(slot, vpn, psize, ssize, 0); 359 + /* 360 + * lpar doesn't use the passed actual page size 361 + */ 362 + pSeries_lpar_hpte_invalidate(slot, vpn, psize, 0, ssize, 0); 480 363 } 481 - 482 - /* Flag bits for H_BULK_REMOVE */ 483 - #define HBR_REQUEST 0x4000000000000000UL 484 - #define HBR_RESPONSE 0x8000000000000000UL 485 - #define HBR_END 0xc000000000000000UL 486 - #define HBR_AVPN 0x0200000000000000UL 487 - #define HBR_ANDCOND 0x0100000000000000UL 488 364 489 365 /* 490 366 * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
_PTEIDX_GROUP_IX; 516 402 if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) { 403 + /* 404 + * lpar doesn't use the passed actual page size 405 + */ 517 406 pSeries_lpar_hpte_invalidate(slot, vpn, psize, 518 - ssize, local); 407 + 0, ssize, local); 519 408 } else { 520 409 param[pix] = HBR_REQUEST | HBR_AVPN | slot; 521 410 param[pix+1] = hpte_encode_avpn(vpn, psize, ··· 569 452 ppc_md.hpte_removebolted = pSeries_lpar_hpte_removebolted; 570 453 ppc_md.flush_hash_range = pSeries_lpar_flush_hash_range; 571 454 ppc_md.hpte_clear_all = pSeries_lpar_hptab_clear; 455 + ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate; 572 456 } 573 457 574 458 #ifdef CONFIG_PPC_SMLPAR
+498 -146
arch/powerpc/platforms/pseries/nvram.c
··· 18 18 #include <linux/spinlock.h> 19 19 #include <linux/slab.h> 20 20 #include <linux/kmsg_dump.h> 21 + #include <linux/pstore.h> 21 22 #include <linux/ctype.h> 22 23 #include <linux/zlib.h> 23 24 #include <asm/uaccess.h> ··· 29 28 30 29 /* Max bytes to read/write in one go */ 31 30 #define NVRW_CNT 0x20 31 + 32 + /* 33 + * Set oops header version to distinguish between old and new format headers. 34 + * lnx,oops-log partition max size is 4000, header version > 4000 will 35 + * help in identifying new header. 36 + */ 37 + #define OOPS_HDR_VERSION 5000 32 38 static unsigned int nvram_size; 34 40 static int nvram_fetch, nvram_store; ··· 53 45 int min_size; /* minimum acceptable size (0 means req_size) */ 54 46 long size; /* size of data portion (excluding err_log_info) */ 55 47 long index; /* offset of data portion of partition */ 48 + bool os_partition; /* partition initialized by OS, not FW */ 56 49 }; 57 50 58 51 static struct nvram_os_partition rtas_log_partition = { 59 52 .name = "ibm,rtas-log", 60 53 .req_size = 2079, 61 54 .min_size = 1055, 62 - .index = -1 55 + .index = -1, 56 + .os_partition = true 63 57 }; 64 58 65 59 static struct nvram_os_partition oops_log_partition = { 66 60 .name = "lnx,oops-log", 67 61 .req_size = 4000, 68 62 .min_size = 2000, 69 - .index = -1 63 + .index = -1, 64 + .os_partition = true 70 65 }; 71 66 72 67 static const char *pseries_nvram_os_partitions[] = { ··· 77 66 "lnx,oops-log", 78 67 NULL 79 68 }; 69 + 70 + struct oops_log_info { 71 + u16 version; 72 + u16 report_length; 73 + u64 timestamp; 74 + } __attribute__((packed)); 80 75 81 76 static void oops_to_nvram(struct kmsg_dumper *dumper, 82 77 enum kmsg_dump_reason reason); ··· 100 83 101 84 * big_oops_buf[] holds the uncompressed text we're capturing. 102 85 103 - * oops_buf[] holds the compressed text, preceded by a prefix. 104 - * The prefix is just a u16 holding the length of the compressed* text. 105 - * (*Or uncompressed, if compression fails.)
oops_buf[] gets written 106 - * to NVRAM. 86 + * oops_buf[] holds the compressed text, preceded by a oops header. 87 + * oops header has u16 holding the version of oops header (to differentiate 88 + * between old and new format header) followed by u16 holding the length of 89 + * the compressed* text (*Or uncompressed, if compression fails.) and u64 90 + * holding the timestamp. oops_buf[] gets written to NVRAM. 107 91 * 108 - * oops_len points to the prefix. oops_data points to the compressed text. 92 + * oops_log_info points to the header. oops_data points to the compressed text. 109 93 * 110 94 * +- oops_buf 111 - * | +- oops_data 112 - * v v 113 - * +------------+-----------------------------------------------+ 114 - * | length | text | 115 - * | (2 bytes) | (oops_data_sz bytes) | 116 - * +------------+-----------------------------------------------+ 95 + * | +- oops_data 96 + * v v 97 + * +-----------+-----------+-----------+------------------------+ 98 + * | version | length | timestamp | text | 99 + * | (2 bytes) | (2 bytes) | (8 bytes) | (oops_data_sz bytes) | 100 + * +-----------+-----------+-----------+------------------------+ 117 101 * ^ 118 - * +- oops_len 102 + * +- oops_log_info 119 103 * 120 104 * We preallocate these buffers during init to avoid kmalloc during oops/panic. 
121 105 */ 122 106 static size_t big_oops_buf_sz; 123 107 static char *big_oops_buf, *oops_buf; 124 - static u16 *oops_len; 125 108 static char *oops_data; 126 109 static size_t oops_data_sz; 127 110 ··· 130 113 #define WINDOW_BITS 12 131 114 #define MEM_LEVEL 4 132 115 static struct z_stream_s stream; 116 + 117 + #ifdef CONFIG_PSTORE 118 + static struct nvram_os_partition of_config_partition = { 119 + .name = "of-config", 120 + .index = -1, 121 + .os_partition = false 122 + }; 123 + 124 + static struct nvram_os_partition common_partition = { 125 + .name = "common", 126 + .index = -1, 127 + .os_partition = false 128 + }; 129 + 130 + static enum pstore_type_id nvram_type_ids[] = { 131 + PSTORE_TYPE_DMESG, 132 + PSTORE_TYPE_PPC_RTAS, 133 + PSTORE_TYPE_PPC_OF, 134 + PSTORE_TYPE_PPC_COMMON, 135 + -1 136 + }; 137 + static int read_type; 138 + static unsigned long last_rtas_event; 139 + #endif 133 140 134 141 static ssize_t pSeries_nvram_read(char *buf, size_t count, loff_t *index) 135 142 { ··· 316 275 { 317 276 int rc = nvram_write_os_partition(&rtas_log_partition, buff, length, 318 277 err_type, error_log_cnt); 319 - if (!rc) 278 + if (!rc) { 320 279 last_unread_rtas_event = get_seconds(); 280 + #ifdef CONFIG_PSTORE 281 + last_rtas_event = get_seconds(); 282 + #endif 283 + } 284 + 321 285 return rc; 286 + } 287 + 288 + /* nvram_read_partition 289 + * 290 + * Reads nvram partition for at most 'length' 291 + */ 292 + int nvram_read_partition(struct nvram_os_partition *part, char *buff, 293 + int length, unsigned int *err_type, 294 + unsigned int *error_log_cnt) 295 + { 296 + int rc; 297 + loff_t tmp_index; 298 + struct err_log_info info; 299 + 300 + if (part->index == -1) 301 + return -1; 302 + 303 + if (length > part->size) 304 + length = part->size; 305 + 306 + tmp_index = part->index; 307 + 308 + if (part->os_partition) { 309 + rc = ppc_md.nvram_read((char *)&info, 310 + sizeof(struct err_log_info), 311 + &tmp_index); 312 + if (rc <= 0) { 313 + pr_err("%s: Failed 
nvram_read (%d)\n", __FUNCTION__, 314 + rc); 315 + return rc; 316 + } 317 + } 318 + 319 + rc = ppc_md.nvram_read(buff, length, &tmp_index); 320 + if (rc <= 0) { 321 + pr_err("%s: Failed nvram_read (%d)\n", __FUNCTION__, rc); 322 + return rc; 323 + } 324 + 325 + if (part->os_partition) { 326 + *error_log_cnt = info.seq_num; 327 + *err_type = info.error_type; 328 + } 329 + 330 + return 0; 322 331 } 323 332 324 333 /* nvram_read_error_log 325 334 * 326 335 * Reads nvram for error log for at most 'length' 327 336 */ 328 - int nvram_read_error_log(char * buff, int length, 329 - unsigned int * err_type, unsigned int * error_log_cnt) 337 + int nvram_read_error_log(char *buff, int length, 338 + unsigned int *err_type, unsigned int *error_log_cnt) 330 339 { 331 - int rc; 332 - loff_t tmp_index; 333 - struct err_log_info info; 334 - 335 - if (rtas_log_partition.index == -1) 336 - return -1; 337 - 338 - if (length > rtas_log_partition.size) 339 - length = rtas_log_partition.size; 340 - 341 - tmp_index = rtas_log_partition.index; 342 - 343 - rc = ppc_md.nvram_read((char *)&info, sizeof(struct err_log_info), &tmp_index); 344 - if (rc <= 0) { 345 - printk(KERN_ERR "nvram_read_error_log: Failed nvram_read (%d)\n", rc); 346 - return rc; 347 - } 348 - 349 - rc = ppc_md.nvram_read(buff, length, &tmp_index); 350 - if (rc <= 0) { 351 - printk(KERN_ERR "nvram_read_error_log: Failed nvram_read (%d)\n", rc); 352 - return rc; 353 - } 354 - 355 - *error_log_cnt = info.seq_num; 356 - *err_type = info.error_type; 357 - 358 - return 0; 340 + return nvram_read_partition(&rtas_log_partition, buff, length, 341 + err_type, error_log_cnt); 359 342 } 360 343 361 344 /* This doesn't actually zero anything, but it sets the event_logged ··· 470 405 return 0; 471 406 } 472 407 473 - static void __init nvram_init_oops_partition(int rtas_partition_exists) 474 - { 475 - int rc; 476 - 477 - rc = pseries_nvram_init_os_partition(&oops_log_partition); 478 - if (rc != 0) { 479 - if (!rtas_partition_exists) 480 
- return; 481 - pr_notice("nvram: Using %s partition to log both" 482 - " RTAS errors and oops/panic reports\n", 483 - rtas_log_partition.name); 484 - memcpy(&oops_log_partition, &rtas_log_partition, 485 - sizeof(rtas_log_partition)); 486 - } 487 - oops_buf = kmalloc(oops_log_partition.size, GFP_KERNEL); 488 - if (!oops_buf) { 489 - pr_err("nvram: No memory for %s partition\n", 490 - oops_log_partition.name); 491 - return; 492 - } 493 - oops_len = (u16*) oops_buf; 494 - oops_data = oops_buf + sizeof(u16); 495 - oops_data_sz = oops_log_partition.size - sizeof(u16); 496 - 497 - /* 498 - * Figure compression (preceded by elimination of each line's <n> 499 - * severity prefix) will reduce the oops/panic report to at most 500 - * 45% of its original size. 501 - */ 502 - big_oops_buf_sz = (oops_data_sz * 100) / 45; 503 - big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL); 504 - if (big_oops_buf) { 505 - stream.workspace = kmalloc(zlib_deflate_workspacesize( 506 - WINDOW_BITS, MEM_LEVEL), GFP_KERNEL); 507 - if (!stream.workspace) { 508 - pr_err("nvram: No memory for compression workspace; " 509 - "skipping compression of %s partition data\n", 510 - oops_log_partition.name); 511 - kfree(big_oops_buf); 512 - big_oops_buf = NULL; 513 - } 514 - } else { 515 - pr_err("No memory for uncompressed %s data; " 516 - "skipping compression\n", oops_log_partition.name); 517 - stream.workspace = NULL; 518 - } 519 - 520 - rc = kmsg_dump_register(&nvram_kmsg_dumper); 521 - if (rc != 0) { 522 - pr_err("nvram: kmsg_dump_register() failed; returned %d\n", rc); 523 - kfree(oops_buf); 524 - kfree(big_oops_buf); 525 - kfree(stream.workspace); 526 - } 527 - } 528 - 529 - static int __init pseries_nvram_init_log_partitions(void) 530 - { 531 - int rc; 532 - 533 - rc = pseries_nvram_init_os_partition(&rtas_log_partition); 534 - nvram_init_oops_partition(rc == 0); 535 - return 0; 536 - } 537 - machine_arch_initcall(pseries, pseries_nvram_init_log_partitions); 538 - 539 - int __init 
pSeries_nvram_init(void) 540 - { 541 - struct device_node *nvram; 542 - const unsigned int *nbytes_p; 543 - unsigned int proplen; 544 - 545 - nvram = of_find_node_by_type(NULL, "nvram"); 546 - if (nvram == NULL) 547 - return -ENODEV; 548 - 549 - nbytes_p = of_get_property(nvram, "#bytes", &proplen); 550 - if (nbytes_p == NULL || proplen != sizeof(unsigned int)) { 551 - of_node_put(nvram); 552 - return -EIO; 553 - } 554 - 555 - nvram_size = *nbytes_p; 556 - 557 - nvram_fetch = rtas_token("nvram-fetch"); 558 - nvram_store = rtas_token("nvram-store"); 559 - printk(KERN_INFO "PPC64 nvram contains %d bytes\n", nvram_size); 560 - of_node_put(nvram); 561 - 562 - ppc_md.nvram_read = pSeries_nvram_read; 563 - ppc_md.nvram_write = pSeries_nvram_write; 564 - ppc_md.nvram_size = pSeries_nvram_get_size; 565 - 566 - return 0; 567 - } 568 - 569 408 /* 570 409 * Are we using the ibm,rtas-log for oops/panic reports? And if so, 571 410 * would logging this oops/panic overwrite an RTAS event that rtas_errd ··· 524 555 /* Compress the text from big_oops_buf into oops_buf. 
*/ 525 556 static int zip_oops(size_t text_len) 526 557 { 558 + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf; 527 559 int zipped_len = nvram_compress(big_oops_buf, oops_data, text_len, 528 560 oops_data_sz); 529 561 if (zipped_len < 0) { ··· 532 562 pr_err("nvram: logging uncompressed oops/panic report\n"); 533 563 return -1; 534 564 } 535 - *oops_len = (u16) zipped_len; 565 + oops_hdr->version = OOPS_HDR_VERSION; 566 + oops_hdr->report_length = (u16) zipped_len; 567 + oops_hdr->timestamp = get_seconds(); 536 568 return 0; 537 569 } 570 + 571 + #ifdef CONFIG_PSTORE 572 + /* Derived from logfs_uncompress */ 573 + int nvram_decompress(void *in, void *out, size_t inlen, size_t outlen) 574 + { 575 + int err, ret; 576 + 577 + ret = -EIO; 578 + err = zlib_inflateInit(&stream); 579 + if (err != Z_OK) 580 + goto error; 581 + 582 + stream.next_in = in; 583 + stream.avail_in = inlen; 584 + stream.total_in = 0; 585 + stream.next_out = out; 586 + stream.avail_out = outlen; 587 + stream.total_out = 0; 588 + 589 + err = zlib_inflate(&stream, Z_FINISH); 590 + if (err != Z_STREAM_END) 591 + goto error; 592 + 593 + err = zlib_inflateEnd(&stream); 594 + if (err != Z_OK) 595 + goto error; 596 + 597 + ret = stream.total_out; 598 + error: 599 + return ret; 600 + } 601 + 602 + static int unzip_oops(char *oops_buf, char *big_buf) 603 + { 604 + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf; 605 + u64 timestamp = oops_hdr->timestamp; 606 + char *big_oops_data = NULL; 607 + char *oops_data_buf = NULL; 608 + size_t big_oops_data_sz; 609 + int unzipped_len; 610 + 611 + big_oops_data = big_buf + sizeof(struct oops_log_info); 612 + big_oops_data_sz = big_oops_buf_sz - sizeof(struct oops_log_info); 613 + oops_data_buf = oops_buf + sizeof(struct oops_log_info); 614 + 615 + unzipped_len = nvram_decompress(oops_data_buf, big_oops_data, 616 + oops_hdr->report_length, 617 + big_oops_data_sz); 618 + 619 + if (unzipped_len < 0) { 620 + pr_err("nvram: 
decompression failed; returned %d\n", 621 + unzipped_len); 622 + return -1; 623 + } 624 + oops_hdr = (struct oops_log_info *)big_buf; 625 + oops_hdr->version = OOPS_HDR_VERSION; 626 + oops_hdr->report_length = (u16) unzipped_len; 627 + oops_hdr->timestamp = timestamp; 628 + return 0; 629 + } 630 + 631 + static int nvram_pstore_open(struct pstore_info *psi) 632 + { 633 + /* Reset the iterator to start reading partitions again */ 634 + read_type = -1; 635 + return 0; 636 + } 637 + 638 + /** 639 + * nvram_pstore_write - pstore write callback for nvram 640 + * @type: Type of message logged 641 + * @reason: reason behind dump (oops/panic) 642 + * @id: identifier to indicate the write performed 643 + * @part: pstore writes data to registered buffer in parts, 644 + * part number will indicate the same. 645 + * @count: Indicates oops count 646 + * @hsize: Size of header added by pstore 647 + * @size: number of bytes written to the registered buffer 648 + * @psi: registered pstore_info structure 649 + * 650 + * Called by pstore_dump() when an oops or panic report is logged in the 651 + * printk buffer. 652 + * Returns 0 on successful write. 653 + */ 654 + static int nvram_pstore_write(enum pstore_type_id type, 655 + enum kmsg_dump_reason reason, 656 + u64 *id, unsigned int part, int count, 657 + size_t hsize, size_t size, 658 + struct pstore_info *psi) 659 + { 660 + int rc; 661 + unsigned int err_type = ERR_TYPE_KERNEL_PANIC; 662 + struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf; 663 + 664 + /* part 1 has the recent messages from printk buffer */ 665 + if (part > 1 || type != PSTORE_TYPE_DMESG || 666 + clobbering_unread_rtas_event()) 667 + return -1; 668 + 669 + oops_hdr->version = OOPS_HDR_VERSION; 670 + oops_hdr->report_length = (u16) size; 671 + oops_hdr->timestamp = get_seconds(); 672 + 673 + if (big_oops_buf) { 674 + rc = zip_oops(size); 675 + /* 676 + * If compression fails copy recent log messages from 677 + * big_oops_buf to oops_data. 
678 + */ 679 + if (rc != 0) { 680 + size_t diff = size - oops_data_sz + hsize; 681 + 682 + if (size > oops_data_sz) { 683 + memcpy(oops_data, big_oops_buf, hsize); 684 + memcpy(oops_data + hsize, big_oops_buf + diff, 685 + oops_data_sz - hsize); 686 + 687 + oops_hdr->report_length = (u16) oops_data_sz; 688 + } else 689 + memcpy(oops_data, big_oops_buf, size); 690 + } else 691 + err_type = ERR_TYPE_KERNEL_PANIC_GZ; 692 + } 693 + 694 + rc = nvram_write_os_partition(&oops_log_partition, oops_buf, 695 + (int) (sizeof(*oops_hdr) + oops_hdr->report_length), err_type, 696 + count); 697 + 698 + if (rc != 0) 699 + return rc; 700 + 701 + *id = part; 702 + return 0; 703 + } 704 + 705 + /* 706 + * Reads the oops/panic report, rtas, of-config and common partition. 707 + * Returns the length of the data we read from each partition. 708 + * Returns 0 if we've been called before. 709 + */ 710 + static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type, 711 + int *count, struct timespec *time, char **buf, 712 + struct pstore_info *psi) 713 + { 714 + struct oops_log_info *oops_hdr; 715 + unsigned int err_type, id_no, size = 0; 716 + struct nvram_os_partition *part = NULL; 717 + char *buff = NULL, *big_buff = NULL; 718 + int rc, sig = 0; 719 + loff_t p; 720 + 721 + read_partition: 722 + read_type++; 723 + 724 + switch (nvram_type_ids[read_type]) { 725 + case PSTORE_TYPE_DMESG: 726 + part = &oops_log_partition; 727 + *type = PSTORE_TYPE_DMESG; 728 + break; 729 + case PSTORE_TYPE_PPC_RTAS: 730 + part = &rtas_log_partition; 731 + *type = PSTORE_TYPE_PPC_RTAS; 732 + time->tv_sec = last_rtas_event; 733 + time->tv_nsec = 0; 734 + break; 735 + case PSTORE_TYPE_PPC_OF: 736 + sig = NVRAM_SIG_OF; 737 + part = &of_config_partition; 738 + *type = PSTORE_TYPE_PPC_OF; 739 + *id = PSTORE_TYPE_PPC_OF; 740 + time->tv_sec = 0; 741 + time->tv_nsec = 0; 742 + break; 743 + case PSTORE_TYPE_PPC_COMMON: 744 + sig = NVRAM_SIG_SYS; 745 + part = &common_partition; 746 + *type = 
PSTORE_TYPE_PPC_COMMON; 747 + *id = PSTORE_TYPE_PPC_COMMON; 748 + time->tv_sec = 0; 749 + time->tv_nsec = 0; 750 + break; 751 + default: 752 + return 0; 753 + } 754 + 755 + if (!part->os_partition) { 756 + p = nvram_find_partition(part->name, sig, &size); 757 + if (p <= 0) { 758 + pr_err("nvram: Failed to find partition %s, " 759 + "err %d\n", part->name, (int)p); 760 + return 0; 761 + } 762 + part->index = p; 763 + part->size = size; 764 + } 765 + 766 + buff = kmalloc(part->size, GFP_KERNEL); 767 + 768 + if (!buff) 769 + return -ENOMEM; 770 + 771 + if (nvram_read_partition(part, buff, part->size, &err_type, &id_no)) { 772 + kfree(buff); 773 + return 0; 774 + } 775 + 776 + *count = 0; 777 + 778 + if (part->os_partition) 779 + *id = id_no; 780 + 781 + if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) { 782 + oops_hdr = (struct oops_log_info *)buff; 783 + *buf = buff + sizeof(*oops_hdr); 784 + 785 + if (err_type == ERR_TYPE_KERNEL_PANIC_GZ) { 786 + big_buff = kmalloc(big_oops_buf_sz, GFP_KERNEL); 787 + if (!big_buff) 788 + return -ENOMEM; 789 + 790 + rc = unzip_oops(buff, big_buff); 791 + 792 + if (rc != 0) { 793 + kfree(buff); 794 + kfree(big_buff); 795 + goto read_partition; 796 + } 797 + 798 + oops_hdr = (struct oops_log_info *)big_buff; 799 + *buf = big_buff + sizeof(*oops_hdr); 800 + kfree(buff); 801 + } 802 + 803 + time->tv_sec = oops_hdr->timestamp; 804 + time->tv_nsec = 0; 805 + return oops_hdr->report_length; 806 + } 807 + 808 + *buf = buff; 809 + return part->size; 810 + } 811 + 812 + static struct pstore_info nvram_pstore_info = { 813 + .owner = THIS_MODULE, 814 + .name = "nvram", 815 + .open = nvram_pstore_open, 816 + .read = nvram_pstore_read, 817 + .write = nvram_pstore_write, 818 + }; 819 + 820 + static int nvram_pstore_init(void) 821 + { 822 + int rc = 0; 823 + 824 + if (big_oops_buf) { 825 + nvram_pstore_info.buf = big_oops_buf; 826 + nvram_pstore_info.bufsize = big_oops_buf_sz; 827 + } else { 828 + nvram_pstore_info.buf = oops_data; 829 + 
nvram_pstore_info.bufsize = oops_data_sz; 830 + } 831 + 832 + rc = pstore_register(&nvram_pstore_info); 833 + if (rc != 0) 834 + pr_err("nvram: pstore_register() failed, defaults to " 835 + "kmsg_dump; returned %d\n", rc); 836 + 837 + return rc; 838 + } 839 + #else 840 + static int nvram_pstore_init(void) 841 + { 842 + return -1; 843 + } 844 + #endif 845 + 846 + static void __init nvram_init_oops_partition(int rtas_partition_exists) 847 + { 848 + int rc; 849 + 850 + rc = pseries_nvram_init_os_partition(&oops_log_partition); 851 + if (rc != 0) { 852 + if (!rtas_partition_exists) 853 + return; 854 + pr_notice("nvram: Using %s partition to log both" 855 + " RTAS errors and oops/panic reports\n", 856 + rtas_log_partition.name); 857 + memcpy(&oops_log_partition, &rtas_log_partition, 858 + sizeof(rtas_log_partition)); 859 + } 860 + oops_buf = kmalloc(oops_log_partition.size, GFP_KERNEL); 861 + if (!oops_buf) { 862 + pr_err("nvram: No memory for %s partition\n", 863 + oops_log_partition.name); 864 + return; 865 + } 866 + oops_data = oops_buf + sizeof(struct oops_log_info); 867 + oops_data_sz = oops_log_partition.size - sizeof(struct oops_log_info); 868 + 869 + /* 870 + * Figure compression (preceded by elimination of each line's <n> 871 + * severity prefix) will reduce the oops/panic report to at most 872 + * 45% of its original size. 
873 + */ 874 + big_oops_buf_sz = (oops_data_sz * 100) / 45; 875 + big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL); 876 + if (big_oops_buf) { 877 + stream.workspace = kmalloc(zlib_deflate_workspacesize( 878 + WINDOW_BITS, MEM_LEVEL), GFP_KERNEL); 879 + if (!stream.workspace) { 880 + pr_err("nvram: No memory for compression workspace; " 881 + "skipping compression of %s partition data\n", 882 + oops_log_partition.name); 883 + kfree(big_oops_buf); 884 + big_oops_buf = NULL; 885 + } 886 + } else { 887 + pr_err("No memory for uncompressed %s data; " 888 + "skipping compression\n", oops_log_partition.name); 889 + stream.workspace = NULL; 890 + } 891 + 892 + rc = nvram_pstore_init(); 893 + 894 + if (!rc) 895 + return; 896 + 897 + rc = kmsg_dump_register(&nvram_kmsg_dumper); 898 + if (rc != 0) { 899 + pr_err("nvram: kmsg_dump_register() failed; returned %d\n", rc); 900 + kfree(oops_buf); 901 + kfree(big_oops_buf); 902 + kfree(stream.workspace); 903 + } 904 + } 905 + 906 + static int __init pseries_nvram_init_log_partitions(void) 907 + { 908 + int rc; 909 + 910 + rc = pseries_nvram_init_os_partition(&rtas_log_partition); 911 + nvram_init_oops_partition(rc == 0); 912 + return 0; 913 + } 914 + machine_arch_initcall(pseries, pseries_nvram_init_log_partitions); 915 + 916 + int __init pSeries_nvram_init(void) 917 + { 918 + struct device_node *nvram; 919 + const unsigned int *nbytes_p; 920 + unsigned int proplen; 921 + 922 + nvram = of_find_node_by_type(NULL, "nvram"); 923 + if (nvram == NULL) 924 + return -ENODEV; 925 + 926 + nbytes_p = of_get_property(nvram, "#bytes", &proplen); 927 + if (nbytes_p == NULL || proplen != sizeof(unsigned int)) { 928 + of_node_put(nvram); 929 + return -EIO; 930 + } 931 + 932 + nvram_size = *nbytes_p; 933 + 934 + nvram_fetch = rtas_token("nvram-fetch"); 935 + nvram_store = rtas_token("nvram-store"); 936 + printk(KERN_INFO "PPC64 nvram contains %d bytes\n", nvram_size); 937 + of_node_put(nvram); 938 + 939 + ppc_md.nvram_read = pSeries_nvram_read; 
940 + ppc_md.nvram_write = pSeries_nvram_write; 941 + ppc_md.nvram_size = pSeries_nvram_get_size; 942 + 943 + return 0; 944 + } 945 + 538 946 539 947 /* 540 948 * This is our kmsg_dump callback, called after an oops or panic report ··· 924 576 static void oops_to_nvram(struct kmsg_dumper *dumper, 925 577 enum kmsg_dump_reason reason) 926 578 { 579 + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf; 927 580 static unsigned int oops_count = 0; 928 581 static bool panicking = false; 929 582 static DEFINE_SPINLOCK(lock); ··· 968 619 } 969 620 if (rc != 0) { 970 621 kmsg_dump_rewind(dumper); 971 - kmsg_dump_get_buffer(dumper, true, 622 + kmsg_dump_get_buffer(dumper, false, 972 623 oops_data, oops_data_sz, &text_len); 973 624 err_type = ERR_TYPE_KERNEL_PANIC; 974 - *oops_len = (u16) text_len; 625 + oops_hdr->version = OOPS_HDR_VERSION; 626 + oops_hdr->report_length = (u16) text_len; 627 + oops_hdr->timestamp = get_seconds(); 975 628 } 976 629 977 630 (void) nvram_write_os_partition(&oops_log_partition, oops_buf, 978 - (int) (sizeof(*oops_len) + *oops_len), err_type, ++oops_count); 631 + (int) (sizeof(*oops_hdr) + oops_hdr->report_length), err_type, 632 + ++oops_count); 979 633 980 634 spin_unlock_irqrestore(&lock, flags); 981 635 }
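Editor's note on the sizing logic above: the oops staging buffer is sized on the assumption that zlib compression (after stripping severity prefixes) shrinks a report to at most 45% of its original size, hence `big_oops_buf_sz = (oops_data_sz * 100) / 45`. A minimal userspace sketch of that arithmetic, using a hypothetical header layout that only mirrors the fields visible in this diff (the kernel's actual `struct oops_log_info` layout and endianness may differ):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the on-NVRAM oops header introduced by this
 * series: a version, the (possibly compressed) report length, and a
 * timestamp.  Field widths follow the casts visible in the diff. */
struct oops_log_info {
    uint16_t version;
    uint16_t report_length;
    uint64_t timestamp;
};

/* Payload bytes left in the partition once the header is carved out. */
static size_t oops_data_size(size_t partition_size)
{
    return partition_size - sizeof(struct oops_log_info);
}

/* Staging buffer size: large enough that compressing it at the assumed
 * worst-case 45% ratio still fits within the partition payload. */
static size_t big_oops_buf_size(size_t oops_data_sz)
{
    return (oops_data_sz * 100) / 45;
}
```

If compression fails at dump time, the write path above falls back to copying the most recent `oops_data_sz` bytes of the uncompressed report, which is why both buffers are kept around.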
arch/powerpc/platforms/pseries/pci_dlpar.c (-85)
··· 64 64 } 65 65 EXPORT_SYMBOL_GPL(pcibios_find_pci_bus); 66 66 67 - /** 68 - * __pcibios_remove_pci_devices - remove all devices under this bus 69 - * @bus: the indicated PCI bus 70 - * @purge_pe: destroy the PE on removal of PCI devices 71 - * 72 - * Remove all of the PCI devices under this bus both from the 73 - * linux pci device tree, and from the powerpc EEH address cache. 74 - * By default, the corresponding PE will be destroied during the 75 - * normal PCI hotplug path. For PCI hotplug during EEH recovery, 76 - * the corresponding PE won't be destroied and deallocated. 77 - */ 78 - void __pcibios_remove_pci_devices(struct pci_bus *bus, int purge_pe) 79 - { 80 - struct pci_dev *dev, *tmp; 81 - struct pci_bus *child_bus; 82 - 83 - /* First go down child busses */ 84 - list_for_each_entry(child_bus, &bus->children, node) 85 - __pcibios_remove_pci_devices(child_bus, purge_pe); 86 - 87 - pr_debug("PCI: Removing devices on bus %04x:%02x\n", 88 - pci_domain_nr(bus), bus->number); 89 - list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { 90 - pr_debug(" * Removing %s...\n", pci_name(dev)); 91 - eeh_remove_bus_device(dev, purge_pe); 92 - pci_stop_and_remove_bus_device(dev); 93 - } 94 - } 95 - 96 - /** 97 - * pcibios_remove_pci_devices - remove all devices under this bus 98 - * 99 - * Remove all of the PCI devices under this bus both from the 100 - * linux pci device tree, and from the powerpc EEH address cache. 101 - */ 102 - void pcibios_remove_pci_devices(struct pci_bus *bus) 103 - { 104 - __pcibios_remove_pci_devices(bus, 1); 105 - } 106 - EXPORT_SYMBOL_GPL(pcibios_remove_pci_devices); 107 - 108 - /** 109 - * pcibios_add_pci_devices - adds new pci devices to bus 110 - * 111 - * This routine will find and fixup new pci devices under 112 - * the indicated bus. This routine presumes that there 113 - * might already be some devices under this bridge, so 114 - * it carefully tries to add only new devices. 
(And that 115 - * is how this routine differs from other, similar pcibios 116 - * routines.) 117 - */ 118 - void pcibios_add_pci_devices(struct pci_bus * bus) 119 - { 120 - int slotno, num, mode, pass, max; 121 - struct pci_dev *dev; 122 - struct device_node *dn = pci_bus_to_OF_node(bus); 123 - 124 - eeh_add_device_tree_early(dn); 125 - 126 - mode = PCI_PROBE_NORMAL; 127 - if (ppc_md.pci_probe_mode) 128 - mode = ppc_md.pci_probe_mode(bus); 129 - 130 - if (mode == PCI_PROBE_DEVTREE) { 131 - /* use ofdt-based probe */ 132 - of_rescan_bus(dn, bus); 133 - } else if (mode == PCI_PROBE_NORMAL) { 134 - /* use legacy probe */ 135 - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); 136 - num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); 137 - if (!num) 138 - return; 139 - pcibios_setup_bus_devices(bus); 140 - max = bus->busn_res.start; 141 - for (pass=0; pass < 2; pass++) 142 - list_for_each_entry(dev, &bus->devices, bus_list) { 143 - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE || 144 - dev->hdr_type == PCI_HEADER_TYPE_CARDBUS) 145 - max = pci_scan_bridge(bus, dev, max, pass); 146 - } 147 - } 148 - pcibios_finish_adding_to_bus(bus); 149 - } 150 - EXPORT_SYMBOL_GPL(pcibios_add_pci_devices); 151 - 152 67 struct pci_controller *init_phb_dynamic(struct device_node *dn) 153 68 { 154 69 struct pci_controller *phb;
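Editor's note: the removed `__pcibios_remove_pci_devices()` recursed into every child bus before removing devices on the current bus, so no device was torn down while buses below it still existed. A toy sketch of that child-first traversal order, with hypothetical stand-in types (not the kernel's `struct pci_bus`):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PCI bus: child buses plus bookkeeping. */
struct toy_bus {
    struct toy_bus *children[4];
    int nchildren;
};

/* Child-first removal, mirroring the shape of the removed helper:
 * descend into all child buses, then "remove devices" on this bus.
 * Visited buses are appended to 'log' so the order can be observed. */
static void remove_child_first(struct toy_bus *bus, struct toy_bus **log, int *n)
{
    for (int i = 0; i < bus->nchildren; i++)
        remove_child_first(bus->children[i], log, n);
    log[(*n)++] = bus;
}
```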
arch/powerpc/platforms/pseries/ras.c (+4 -4)
··· 83 83 switch (event_modifier) { 84 84 case EPOW_SHUTDOWN_NORMAL: 85 85 pr_emerg("Firmware initiated power off"); 86 - orderly_poweroff(1); 86 + orderly_poweroff(true); 87 87 break; 88 88 89 89 case EPOW_SHUTDOWN_ON_UPS: ··· 95 95 pr_emerg("Loss of system critical functions reported by " 96 96 "firmware"); 97 97 pr_emerg("Check RTAS error log for details"); 98 - orderly_poweroff(1); 98 + orderly_poweroff(true); 99 99 break; 100 100 101 101 case EPOW_SHUTDOWN_AMBIENT_TEMPERATURE_TOO_HIGH: 102 102 pr_emerg("Ambient temperature too high reported by firmware"); 103 103 pr_emerg("Check RTAS error log for details"); 104 - orderly_poweroff(1); 104 + orderly_poweroff(true); 105 105 break; 106 106 107 107 default: ··· 162 162 163 163 case EPOW_SYSTEM_HALT: 164 164 pr_emerg("Firmware initiated power off"); 165 - orderly_poweroff(1); 165 + orderly_poweroff(true); 166 166 break; 167 167 168 168 case EPOW_MAIN_ENCLOSURE:
arch/powerpc/platforms/pseries/smp.c (+1 -1)
··· 192 192 /* Special case - we inhibit secondary thread startup 193 193 * during boot if the user requests it. 194 194 */ 195 - if (system_state < SYSTEM_RUNNING && cpu_has_feature(CPU_FTR_SMT)) { 195 + if (system_state == SYSTEM_BOOTING && cpu_has_feature(CPU_FTR_SMT)) { 196 196 if (!smt_enabled_at_boot && cpu_thread_in_core(nr) != 0) 197 197 return 0; 198 198 if (smt_enabled_at_boot
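Editor's note: the one-line smp.c change narrows `system_state < SYSTEM_RUNNING` to `system_state == SYSTEM_BOOTING`. With an ordered enum, the `<` form silently matches every state that happens to sort before `SYSTEM_RUNNING`, not just boot. A sketch of the distinction with a hypothetical enum (the real kernel `system_state` has more values than shown):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the relevant system_state ordering; an extra
 * state between booting and running illustrates the hazard. */
enum toy_system_state {
    TOY_BOOTING,
    TOY_INTERMEDIATE,   /* any state inserted before "running" */
    TOY_RUNNING,
};

/* Old predicate: also true for TOY_INTERMEDIATE. */
static bool inhibit_old(enum toy_system_state s)
{
    return s < TOY_RUNNING;
}

/* Fixed predicate: only the actual boot phase inhibits secondary
 * thread startup. */
static bool inhibit_new(enum toy_system_state s)
{
    return s == TOY_BOOTING;
}
```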
arch/powerpc/sysdev/Makefile (+2)
··· 4 4 5 5 mpic-msi-obj-$(CONFIG_PCI_MSI) += mpic_msi.o mpic_u3msi.o mpic_pasemi_msi.o 6 6 obj-$(CONFIG_MPIC) += mpic.o $(mpic-msi-obj-y) 7 + obj-$(CONFIG_MPIC_TIMER) += mpic_timer.o 8 + obj-$(CONFIG_FSL_MPIC_TIMER_WAKEUP) += fsl_mpic_timer_wakeup.o 7 9 mpic-msgr-obj-$(CONFIG_MPIC_MSGR) += mpic_msgr.o 8 10 obj-$(CONFIG_MPIC) += mpic.o $(mpic-msi-obj-y) $(mpic-msgr-obj-y) 9 11 obj-$(CONFIG_PPC_EPAPR_HV_PIC) += ehv_pic.o
arch/powerpc/sysdev/cpm1.c (+1)
··· 120 120 121 121 static struct irqaction cpm_error_irqaction = { 122 122 .handler = cpm_error_interrupt, 123 + .flags = IRQF_NO_THREAD, 123 124 .name = "error", 124 125 }; 125 126
arch/powerpc/sysdev/fsl_mpic_timer_wakeup.c (+161)
··· 1 + /* 2 + * MPIC timer wakeup driver 3 + * 4 + * Copyright 2013 Freescale Semiconductor, Inc. 5 + * 6 + * This program is free software; you can redistribute it and/or modify it 7 + * under the terms of the GNU General Public License as published by the 8 + * Free Software Foundation; either version 2 of the License, or (at your 9 + * option) any later version. 10 + */ 11 + 12 + #include <linux/kernel.h> 13 + #include <linux/slab.h> 14 + #include <linux/errno.h> 15 + #include <linux/module.h> 16 + #include <linux/interrupt.h> 17 + #include <linux/device.h> 18 + 19 + #include <asm/mpic_timer.h> 20 + #include <asm/mpic.h> 21 + 22 + struct fsl_mpic_timer_wakeup { 23 + struct mpic_timer *timer; 24 + struct work_struct free_work; 25 + }; 26 + 27 + static struct fsl_mpic_timer_wakeup *fsl_wakeup; 28 + static DEFINE_MUTEX(sysfs_lock); 29 + 30 + static void fsl_free_resource(struct work_struct *ws) 31 + { 32 + struct fsl_mpic_timer_wakeup *wakeup = 33 + container_of(ws, struct fsl_mpic_timer_wakeup, free_work); 34 + 35 + mutex_lock(&sysfs_lock); 36 + 37 + if (wakeup->timer) { 38 + disable_irq_wake(wakeup->timer->irq); 39 + mpic_free_timer(wakeup->timer); 40 + } 41 + 42 + wakeup->timer = NULL; 43 + mutex_unlock(&sysfs_lock); 44 + } 45 + 46 + static irqreturn_t fsl_mpic_timer_irq(int irq, void *dev_id) 47 + { 48 + struct fsl_mpic_timer_wakeup *wakeup = dev_id; 49 + 50 + schedule_work(&wakeup->free_work); 51 + 52 + return wakeup->timer ? 
IRQ_HANDLED : IRQ_NONE; 53 + } 54 + 55 + static ssize_t fsl_timer_wakeup_show(struct device *dev, 56 + struct device_attribute *attr, 57 + char *buf) 58 + { 59 + struct timeval interval; 60 + int val = 0; 61 + 62 + mutex_lock(&sysfs_lock); 63 + if (fsl_wakeup->timer) { 64 + mpic_get_remain_time(fsl_wakeup->timer, &interval); 65 + val = interval.tv_sec + 1; 66 + } 67 + mutex_unlock(&sysfs_lock); 68 + 69 + return sprintf(buf, "%d\n", val); 70 + } 71 + 72 + static ssize_t fsl_timer_wakeup_store(struct device *dev, 73 + struct device_attribute *attr, 74 + const char *buf, 75 + size_t count) 76 + { 77 + struct timeval interval; 78 + int ret; 79 + 80 + interval.tv_usec = 0; 81 + if (kstrtol(buf, 0, &interval.tv_sec)) 82 + return -EINVAL; 83 + 84 + mutex_lock(&sysfs_lock); 85 + 86 + if (fsl_wakeup->timer) { 87 + disable_irq_wake(fsl_wakeup->timer->irq); 88 + mpic_free_timer(fsl_wakeup->timer); 89 + fsl_wakeup->timer = NULL; 90 + } 91 + 92 + if (!interval.tv_sec) { 93 + mutex_unlock(&sysfs_lock); 94 + return count; 95 + } 96 + 97 + fsl_wakeup->timer = mpic_request_timer(fsl_mpic_timer_irq, 98 + fsl_wakeup, &interval); 99 + if (!fsl_wakeup->timer) { 100 + mutex_unlock(&sysfs_lock); 101 + return -EINVAL; 102 + } 103 + 104 + ret = enable_irq_wake(fsl_wakeup->timer->irq); 105 + if (ret) { 106 + mpic_free_timer(fsl_wakeup->timer); 107 + fsl_wakeup->timer = NULL; 108 + mutex_unlock(&sysfs_lock); 109 + 110 + return ret; 111 + } 112 + 113 + mpic_start_timer(fsl_wakeup->timer); 114 + 115 + mutex_unlock(&sysfs_lock); 116 + 117 + return count; 118 + } 119 + 120 + static struct device_attribute mpic_attributes = __ATTR(timer_wakeup, 0644, 121 + fsl_timer_wakeup_show, fsl_timer_wakeup_store); 122 + 123 + static int __init fsl_wakeup_sys_init(void) 124 + { 125 + int ret; 126 + 127 + fsl_wakeup = kzalloc(sizeof(struct fsl_mpic_timer_wakeup), GFP_KERNEL); 128 + if (!fsl_wakeup) 129 + return -ENOMEM; 130 + 131 + INIT_WORK(&fsl_wakeup->free_work, fsl_free_resource); 132 + 133 + ret = 
device_create_file(mpic_subsys.dev_root, &mpic_attributes); 134 + if (ret) 135 + kfree(fsl_wakeup); 136 + 137 + return ret; 138 + } 139 + 140 + static void __exit fsl_wakeup_sys_exit(void) 141 + { 142 + device_remove_file(mpic_subsys.dev_root, &mpic_attributes); 143 + 144 + mutex_lock(&sysfs_lock); 145 + 146 + if (fsl_wakeup->timer) { 147 + disable_irq_wake(fsl_wakeup->timer->irq); 148 + mpic_free_timer(fsl_wakeup->timer); 149 + } 150 + 151 + kfree(fsl_wakeup); 152 + 153 + mutex_unlock(&sysfs_lock); 154 + } 155 + 156 + module_init(fsl_wakeup_sys_init); 157 + module_exit(fsl_wakeup_sys_exit); 158 + 159 + MODULE_DESCRIPTION("Freescale MPIC global timer wakeup driver"); 160 + MODULE_LICENSE("GPL v2"); 161 + MODULE_AUTHOR("Wang Dongsheng <dongsheng.wang@freescale.com>");
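Editor's note on the sysfs store handler above: a written value is parsed as a second count, where a nonzero value arms the wakeup timer and zero cancels any armed timer. A userspace sketch of that parse-and-dispatch decision, with `strtol` standing in for `kstrtol` (the enum and function names here are hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Possible outcomes of writing to a timer_wakeup-style attribute. */
enum wakeup_action { WAKEUP_INVALID, WAKEUP_CANCEL, WAKEUP_ARM };

/* Mirror of the store handler's decision logic: strict numeric parse
 * (base 0, as with kstrtol), trailing newline tolerated; 0 cancels,
 * anything else arms the timer for that many seconds. */
static enum wakeup_action parse_wakeup(const char *buf, long *seconds)
{
    char *end;
    errno = 0;
    long val = strtol(buf, &end, 0);
    if (errno || end == buf || (*end != '\0' && *end != '\n'))
        return WAKEUP_INVALID;
    *seconds = val;
    return val ? WAKEUP_ARM : WAKEUP_CANCEL;
}
```

The kernel version additionally frees the old timer before arming a new one and takes `sysfs_lock` around the whole sequence, which this sketch omits.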
arch/powerpc/sysdev/mpic.c (+51 -7)
··· 48 48 #define DBG(fmt...) 49 49 #endif 50 50 51 + struct bus_type mpic_subsys = { 52 + .name = "mpic", 53 + .dev_name = "mpic", 54 + }; 55 + EXPORT_SYMBOL_GPL(mpic_subsys); 56 + 51 57 static struct mpic *mpics; 52 58 static struct mpic *mpic_primary; 53 59 static DEFINE_RAW_SPINLOCK(mpic_lock); ··· 926 920 return IRQ_SET_MASK_OK_NOCOPY; 927 921 } 928 922 923 + static int mpic_irq_set_wake(struct irq_data *d, unsigned int on) 924 + { 925 + struct irq_desc *desc = container_of(d, struct irq_desc, irq_data); 926 + struct mpic *mpic = mpic_from_irq_data(d); 927 + 928 + if (!(mpic->flags & MPIC_FSL)) 929 + return -ENXIO; 930 + 931 + if (on) 932 + desc->action->flags |= IRQF_NO_SUSPEND; 933 + else 934 + desc->action->flags &= ~IRQF_NO_SUSPEND; 935 + 936 + return 0; 937 + } 938 + 929 939 void mpic_set_vector(unsigned int virq, unsigned int vector) 930 940 { 931 941 struct mpic *mpic = mpic_from_irq(virq); ··· 979 957 .irq_unmask = mpic_unmask_irq, 980 958 .irq_eoi = mpic_end_irq, 981 959 .irq_set_type = mpic_set_irq_type, 960 + .irq_set_wake = mpic_irq_set_wake, 982 961 }; 983 962 984 963 #ifdef CONFIG_SMP ··· 994 971 .irq_mask = mpic_mask_tm, 995 972 .irq_unmask = mpic_unmask_tm, 996 973 .irq_eoi = mpic_end_irq, 974 + .irq_set_wake = mpic_irq_set_wake, 997 975 }; 998 976 999 977 #ifdef CONFIG_MPIC_U3_HT_IRQS ··· 1197 1173 .xlate = mpic_host_xlate, 1198 1174 }; 1199 1175 1176 + static u32 fsl_mpic_get_version(struct mpic *mpic) 1177 + { 1178 + u32 brr1; 1179 + 1180 + if (!(mpic->flags & MPIC_FSL)) 1181 + return 0; 1182 + 1183 + brr1 = _mpic_read(mpic->reg_type, &mpic->thiscpuregs, 1184 + MPIC_FSL_BRR1); 1185 + 1186 + return brr1 & MPIC_FSL_BRR1_VER; 1187 + } 1188 + 1200 1189 /* 1201 1190 * Exported functions 1202 1191 */ 1192 + 1193 + u32 fsl_mpic_primary_get_version(void) 1194 + { 1195 + struct mpic *mpic = mpic_primary; 1196 + 1197 + if (mpic) 1198 + return fsl_mpic_get_version(mpic); 1199 + 1200 + return 0; 1201 + } 1203 1202 1204 1203 struct mpic * __init 
mpic_alloc(struct device_node *node, 1205 1204 phys_addr_t phys_addr, ··· 1370 1323 mpic_map(mpic, mpic->paddr, &mpic->tmregs, MPIC_INFO(TIMER_BASE), 0x1000); 1371 1324 1372 1325 if (mpic->flags & MPIC_FSL) { 1373 - u32 brr1; 1374 1326 int ret; 1375 1327 1376 1328 /* ··· 1380 1334 mpic_map(mpic, mpic->paddr, &mpic->thiscpuregs, 1381 1335 MPIC_CPU_THISBASE, 0x1000); 1382 1336 1383 - brr1 = _mpic_read(mpic->reg_type, &mpic->thiscpuregs, 1384 - MPIC_FSL_BRR1); 1385 - fsl_version = brr1 & MPIC_FSL_BRR1_VER; 1337 + fsl_version = fsl_mpic_get_version(mpic); 1386 1338 1387 1339 /* Error interrupt mask register (EIMR) is required for 1388 1340 * handling individual device error interrupts. EIMR ··· 1570 1526 mpic_cpu_write(MPIC_INFO(CPU_CURRENT_TASK_PRI), 0xf); 1571 1527 1572 1528 if (mpic->flags & MPIC_FSL) { 1573 - u32 brr1 = _mpic_read(mpic->reg_type, &mpic->thiscpuregs, 1574 - MPIC_FSL_BRR1); 1575 - u32 version = brr1 & MPIC_FSL_BRR1_VER; 1529 + u32 version = fsl_mpic_get_version(mpic); 1576 1530 1577 1531 /* 1578 1532 * Timer group B is present at the latest in MPIC 3.1 (e.g. ··· 2041 1999 static int mpic_init_sys(void) 2042 2000 { 2043 2001 register_syscore_ops(&mpic_syscore_ops); 2002 + subsys_system_register(&mpic_subsys, NULL); 2003 + 2044 2004 return 0; 2045 2005 } 2046 2006
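Editor's note: the mpic.c refactor consolidates the repeated "read BRR1, mask off the version field" sequence into `fsl_mpic_get_version()`, which returns 0 for controllers without the `MPIC_FSL` flag. A sketch of that guard-then-mask pattern; the flag bit and mask values here are hypothetical, not the kernel's actual `MPIC_FSL_BRR1_VER` definition:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MPIC_FSL      0x1u      /* hypothetical flag bit */
#define TOY_BRR1_VER_MASK 0xffffu   /* hypothetical version mask */

struct toy_mpic {
    unsigned int flags;
    uint32_t brr1;   /* stands in for the memory-mapped BRR1 register */
};

/* Mirror of fsl_mpic_get_version(): non-FSL controllers report
 * version 0; FSL controllers report the masked version field. */
static uint32_t toy_get_version(const struct toy_mpic *mpic)
{
    if (!(mpic->flags & TOY_MPIC_FSL))
        return 0;
    return mpic->brr1 & TOY_BRR1_VER_MASK;
}
```

Centralizing the read also lets `fsl_mpic_primary_get_version()` expose the primary controller's version (used by the new timer-wakeup driver's Kconfig-level consumers) without duplicating register access.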
arch/powerpc/sysdev/mpic_timer.c (+593)
··· 1 + /* 2 + * MPIC timer driver 3 + * 4 + * Copyright 2013 Freescale Semiconductor, Inc. 5 + * Author: Dongsheng Wang <Dongsheng.Wang@freescale.com> 6 + * Li Yang <leoli@freescale.com> 7 + * 8 + * This program is free software; you can redistribute it and/or modify it 9 + * under the terms of the GNU General Public License as published by the 10 + * Free Software Foundation; either version 2 of the License, or (at your 11 + * option) any later version. 12 + */ 13 + 14 + #include <linux/kernel.h> 15 + #include <linux/init.h> 16 + #include <linux/module.h> 17 + #include <linux/errno.h> 18 + #include <linux/mm.h> 19 + #include <linux/interrupt.h> 20 + #include <linux/slab.h> 21 + #include <linux/of.h> 22 + #include <linux/of_device.h> 23 + #include <linux/syscore_ops.h> 24 + #include <sysdev/fsl_soc.h> 25 + #include <asm/io.h> 26 + 27 + #include <asm/mpic_timer.h> 28 + 29 + #define FSL_GLOBAL_TIMER 0x1 30 + 31 + /* Clock Ratio 32 + * Divide by 64 0x00000300 33 + * Divide by 32 0x00000200 34 + * Divide by 16 0x00000100 35 + * Divide by 8 0x00000000 (Hardware default div) 36 + */ 37 + #define MPIC_TIMER_TCR_CLKDIV 0x00000300 38 + 39 + #define MPIC_TIMER_TCR_ROVR_OFFSET 24 40 + 41 + #define TIMER_STOP 0x80000000 42 + #define TIMERS_PER_GROUP 4 43 + #define MAX_TICKS (~0U >> 1) 44 + #define MAX_TICKS_CASCADE (~0U) 45 + #define TIMER_OFFSET(num) (1 << (TIMERS_PER_GROUP - 1 - num)) 46 + 47 + /* tv_usec should be less than ONE_SECOND, otherwise use tv_sec */ 48 + #define ONE_SECOND 1000000 49 + 50 + struct timer_regs { 51 + u32 gtccr; 52 + u32 res0[3]; 53 + u32 gtbcr; 54 + u32 res1[3]; 55 + u32 gtvpr; 56 + u32 res2[3]; 57 + u32 gtdr; 58 + u32 res3[3]; 59 + }; 60 + 61 + struct cascade_priv { 62 + u32 tcr_value; /* TCR register: CASC & ROVR value */ 63 + unsigned int cascade_map; /* cascade map */ 64 + unsigned int timer_num; /* cascade control timer */ 65 + }; 66 + 67 + struct timer_group_priv { 68 + struct timer_regs __iomem *regs; 69 + struct mpic_timer 
timer[TIMERS_PER_GROUP]; 70 + struct list_head node; 71 + unsigned int timerfreq; 72 + unsigned int idle; 73 + unsigned int flags; 74 + spinlock_t lock; 75 + void __iomem *group_tcr; 76 + }; 77 + 78 + static struct cascade_priv cascade_timer[] = { 79 + /* cascade timer 0 and 1 */ 80 + {0x1, 0xc, 0x1}, 81 + /* cascade timer 1 and 2 */ 82 + {0x2, 0x6, 0x2}, 83 + /* cascade timer 2 and 3 */ 84 + {0x4, 0x3, 0x3} 85 + }; 86 + 87 + static LIST_HEAD(timer_group_list); 88 + 89 + static void convert_ticks_to_time(struct timer_group_priv *priv, 90 + const u64 ticks, struct timeval *time) 91 + { 92 + u64 tmp_sec; 93 + 94 + time->tv_sec = (__kernel_time_t)div_u64(ticks, priv->timerfreq); 95 + tmp_sec = (u64)time->tv_sec * (u64)priv->timerfreq; 96 + 97 + time->tv_usec = (__kernel_suseconds_t) 98 + div_u64((ticks - tmp_sec) * 1000000, priv->timerfreq); 99 + 100 + return; 101 + } 102 + 103 + /* the time set by the user is converted to "ticks" */ 104 + static int convert_time_to_ticks(struct timer_group_priv *priv, 105 + const struct timeval *time, u64 *ticks) 106 + { 107 + u64 max_value; /* prevent u64 overflow */ 108 + u64 tmp = 0; 109 + 110 + u64 tmp_sec; 111 + u64 tmp_ms; 112 + u64 tmp_us; 113 + 114 + max_value = div_u64(ULLONG_MAX, priv->timerfreq); 115 + 116 + if (time->tv_sec > max_value || 117 + (time->tv_sec == max_value && time->tv_usec > 0)) 118 + return -EINVAL; 119 + 120 + tmp_sec = (u64)time->tv_sec * (u64)priv->timerfreq; 121 + tmp += tmp_sec; 122 + 123 + tmp_ms = time->tv_usec / 1000; 124 + tmp_ms = div_u64((u64)tmp_ms * (u64)priv->timerfreq, 1000); 125 + tmp += tmp_ms; 126 + 127 + tmp_us = time->tv_usec % 1000; 128 + tmp_us = div_u64((u64)tmp_us * (u64)priv->timerfreq, 1000000); 129 + tmp += tmp_us; 130 + 131 + *ticks = tmp; 132 + 133 + return 0; 134 + } 135 + 136 + /* detect whether there is a cascade timer available */ 137 + static struct mpic_timer *detect_idle_cascade_timer( 138 + struct timer_group_priv *priv) 139 + { 140 + struct cascade_priv *casc_priv; 141 
+ unsigned int map; 142 + unsigned int array_size = ARRAY_SIZE(cascade_timer); 143 + unsigned int num; 144 + unsigned int i; 145 + unsigned long flags; 146 + 147 + casc_priv = cascade_timer; 148 + for (i = 0; i < array_size; i++) { 149 + spin_lock_irqsave(&priv->lock, flags); 150 + map = casc_priv->cascade_map & priv->idle; 151 + if (map == casc_priv->cascade_map) { 152 + num = casc_priv->timer_num; 153 + priv->timer[num].cascade_handle = casc_priv; 154 + 155 + /* set timer busy */ 156 + priv->idle &= ~casc_priv->cascade_map; 157 + spin_unlock_irqrestore(&priv->lock, flags); 158 + return &priv->timer[num]; 159 + } 160 + spin_unlock_irqrestore(&priv->lock, flags); 161 + casc_priv++; 162 + } 163 + 164 + return NULL; 165 + } 166 + 167 + static int set_cascade_timer(struct timer_group_priv *priv, u64 ticks, 168 + unsigned int num) 169 + { 170 + struct cascade_priv *casc_priv; 171 + u32 tcr; 172 + u32 tmp_ticks; 173 + u32 rem_ticks; 174 + 175 + /* set group tcr reg for cascade */ 176 + casc_priv = priv->timer[num].cascade_handle; 177 + if (!casc_priv) 178 + return -EINVAL; 179 + 180 + tcr = casc_priv->tcr_value | 181 + (casc_priv->tcr_value << MPIC_TIMER_TCR_ROVR_OFFSET); 182 + setbits32(priv->group_tcr, tcr); 183 + 184 + tmp_ticks = div_u64_rem(ticks, MAX_TICKS_CASCADE, &rem_ticks); 185 + 186 + out_be32(&priv->regs[num].gtccr, 0); 187 + out_be32(&priv->regs[num].gtbcr, tmp_ticks | TIMER_STOP); 188 + 189 + out_be32(&priv->regs[num - 1].gtccr, 0); 190 + out_be32(&priv->regs[num - 1].gtbcr, rem_ticks); 191 + 192 + return 0; 193 + } 194 + 195 + static struct mpic_timer *get_cascade_timer(struct timer_group_priv *priv, 196 + u64 ticks) 197 + { 198 + struct mpic_timer *allocated_timer; 199 + 200 + /* Two cascade timers: Support the maximum time */ 201 + const u64 max_ticks = (u64)MAX_TICKS * (u64)MAX_TICKS_CASCADE; 202 + int ret; 203 + 204 + if (ticks > max_ticks) 205 + return NULL; 206 + 207 + /* detect idle timer */ 208 + allocated_timer = detect_idle_cascade_timer(priv); 
209 + if (!allocated_timer) 210 + return NULL; 211 + 212 + /* set ticks to timer */ 213 + ret = set_cascade_timer(priv, ticks, allocated_timer->num); 214 + if (ret < 0) 215 + return NULL; 216 + 217 + return allocated_timer; 218 + } 219 + 220 + static struct mpic_timer *get_timer(const struct timeval *time) 221 + { 222 + struct timer_group_priv *priv; 223 + struct mpic_timer *timer; 224 + 225 + u64 ticks; 226 + unsigned int num; 227 + unsigned int i; 228 + unsigned long flags; 229 + int ret; 230 + 231 + list_for_each_entry(priv, &timer_group_list, node) { 232 + ret = convert_time_to_ticks(priv, time, &ticks); 233 + if (ret < 0) 234 + return NULL; 235 + 236 + if (ticks > MAX_TICKS) { 237 + if (!(priv->flags & FSL_GLOBAL_TIMER)) 238 + return NULL; 239 + 240 + timer = get_cascade_timer(priv, ticks); 241 + if (!timer) 242 + continue; 243 + 244 + return timer; 245 + } 246 + 247 + for (i = 0; i < TIMERS_PER_GROUP; i++) { 248 + /* one timer: Reverse allocation */ 249 + num = TIMERS_PER_GROUP - 1 - i; 250 + spin_lock_irqsave(&priv->lock, flags); 251 + if (priv->idle & (1 << i)) { 252 + /* set timer busy */ 253 + priv->idle &= ~(1 << i); 254 + /* set ticks & stop timer */ 255 + out_be32(&priv->regs[num].gtbcr, 256 + ticks | TIMER_STOP); 257 + out_be32(&priv->regs[num].gtccr, 0); 258 + priv->timer[num].cascade_handle = NULL; 259 + spin_unlock_irqrestore(&priv->lock, flags); 260 + return &priv->timer[num]; 261 + } 262 + spin_unlock_irqrestore(&priv->lock, flags); 263 + } 264 + } 265 + 266 + return NULL; 267 + } 268 + 269 + /** 270 + * mpic_start_timer - start hardware timer 271 + * @handle: the timer to be started. 272 + * 273 + * It will do ->fn(->dev) callback from the hardware interrupt at 274 + * the ->timeval point in the future. 
275 + */ 276 + void mpic_start_timer(struct mpic_timer *handle) 277 + { 278 + struct timer_group_priv *priv = container_of(handle, 279 + struct timer_group_priv, timer[handle->num]); 280 + 281 + clrbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP); 282 + } 283 + EXPORT_SYMBOL(mpic_start_timer); 284 + 285 + /** 286 + * mpic_stop_timer - stop hardware timer 287 + * @handle: the timer to be stopped 288 + * 289 + * The timer periodically generates an interrupt until the user stops it. 290 + */ 291 + void mpic_stop_timer(struct mpic_timer *handle) 292 + { 293 + struct timer_group_priv *priv = container_of(handle, 294 + struct timer_group_priv, timer[handle->num]); 295 + struct cascade_priv *casc_priv; 296 + 297 + setbits32(&priv->regs[handle->num].gtbcr, TIMER_STOP); 298 + 299 + casc_priv = priv->timer[handle->num].cascade_handle; 300 + if (casc_priv) { 301 + out_be32(&priv->regs[handle->num].gtccr, 0); 302 + out_be32(&priv->regs[handle->num - 1].gtccr, 0); 303 + } else { 304 + out_be32(&priv->regs[handle->num].gtccr, 0); 305 + } 306 + } 307 + EXPORT_SYMBOL(mpic_stop_timer); 308 + 309 + /** 310 + * mpic_get_remain_time - get timer remaining time 311 + * @handle: the timer to be selected. 312 + * @time: returns the remaining time of the timer 313 + * 314 + * Query timer remaining time.
315 + */ 316 + void mpic_get_remain_time(struct mpic_timer *handle, struct timeval *time) 317 + { 318 + struct timer_group_priv *priv = container_of(handle, 319 + struct timer_group_priv, timer[handle->num]); 320 + struct cascade_priv *casc_priv; 321 + 322 + u64 ticks; 323 + u32 tmp_ticks; 324 + 325 + casc_priv = priv->timer[handle->num].cascade_handle; 326 + if (casc_priv) { 327 + tmp_ticks = in_be32(&priv->regs[handle->num].gtccr); 328 + ticks = ((u64)tmp_ticks & UINT_MAX) * (u64)MAX_TICKS_CASCADE; 329 + tmp_ticks = in_be32(&priv->regs[handle->num - 1].gtccr); 330 + ticks += tmp_ticks; 331 + } else { 332 + ticks = in_be32(&priv->regs[handle->num].gtccr); 333 + } 334 + 335 + convert_ticks_to_time(priv, ticks, time); 336 + } 337 + EXPORT_SYMBOL(mpic_get_remain_time); 338 + 339 + /** 340 + * mpic_free_timer - free hardware timer 341 + * @handle: the timer to be removed. 342 + * 343 + * Free the timer. 344 + * 345 + * Note: can not be used in interrupt context. 346 + */ 347 + void mpic_free_timer(struct mpic_timer *handle) 348 + { 349 + struct timer_group_priv *priv = container_of(handle, 350 + struct timer_group_priv, timer[handle->num]); 351 + 352 + struct cascade_priv *casc_priv; 353 + unsigned long flags; 354 + 355 + mpic_stop_timer(handle); 356 + 357 + casc_priv = priv->timer[handle->num].cascade_handle; 358 + 359 + free_irq(priv->timer[handle->num].irq, priv->timer[handle->num].dev); 360 + 361 + spin_lock_irqsave(&priv->lock, flags); 362 + if (casc_priv) { 363 + u32 tcr; 364 + tcr = casc_priv->tcr_value | (casc_priv->tcr_value << 365 + MPIC_TIMER_TCR_ROVR_OFFSET); 366 + clrbits32(priv->group_tcr, tcr); 367 + priv->idle |= casc_priv->cascade_map; 368 + priv->timer[handle->num].cascade_handle = NULL; 369 + } else { 370 + priv->idle |= TIMER_OFFSET(handle->num); 371 + } 372 + spin_unlock_irqrestore(&priv->lock, flags); 373 + } 374 + EXPORT_SYMBOL(mpic_free_timer); 375 + 376 + /** 377 + * mpic_request_timer - get a hardware timer 378 + * @fn: interrupt handler 
function 379 + * @dev: callback function of the data 380 + * @time: time for timer 381 + * 382 + * This executes the "request_irq", returning NULL 383 + * else "handle" on success. 384 + */ 385 + struct mpic_timer *mpic_request_timer(irq_handler_t fn, void *dev, 386 + const struct timeval *time) 387 + { 388 + struct mpic_timer *allocated_timer; 389 + int ret; 390 + 391 + if (list_empty(&timer_group_list)) 392 + return NULL; 393 + 394 + if (!(time->tv_sec + time->tv_usec) || 395 + time->tv_sec < 0 || time->tv_usec < 0) 396 + return NULL; 397 + 398 + if (time->tv_usec > ONE_SECOND) 399 + return NULL; 400 + 401 + allocated_timer = get_timer(time); 402 + if (!allocated_timer) 403 + return NULL; 404 + 405 + ret = request_irq(allocated_timer->irq, fn, 406 + IRQF_TRIGGER_LOW, "global-timer", dev); 407 + if (ret) { 408 + mpic_free_timer(allocated_timer); 409 + return NULL; 410 + } 411 + 412 + allocated_timer->dev = dev; 413 + 414 + return allocated_timer; 415 + } 416 + EXPORT_SYMBOL(mpic_request_timer); 417 + 418 + static int timer_group_get_freq(struct device_node *np, 419 + struct timer_group_priv *priv) 420 + { 421 + u32 div; 422 + 423 + if (priv->flags & FSL_GLOBAL_TIMER) { 424 + struct device_node *dn; 425 + 426 + dn = of_find_compatible_node(NULL, NULL, "fsl,mpic"); 427 + if (dn) { 428 + of_property_read_u32(dn, "clock-frequency", 429 + &priv->timerfreq); 430 + of_node_put(dn); 431 + } 432 + } 433 + 434 + if (priv->timerfreq <= 0) 435 + return -EINVAL; 436 + 437 + if (priv->flags & FSL_GLOBAL_TIMER) { 438 + div = (1 << (MPIC_TIMER_TCR_CLKDIV >> 8)) * 8; 439 + priv->timerfreq /= div; 440 + } 441 + 442 + return 0; 443 + } 444 + 445 + static int timer_group_get_irq(struct device_node *np, 446 + struct timer_group_priv *priv) 447 + { 448 + const u32 all_timer[] = { 0, TIMERS_PER_GROUP }; 449 + const u32 *p; 450 + u32 offset; 451 + u32 count; 452 + 453 + unsigned int i; 454 + unsigned int j; 455 + unsigned int irq_index = 0; 456 + unsigned int irq; 457 + int len; 458 + 
459 + p = of_get_property(np, "fsl,available-ranges", &len); 460 + if (p && len % (2 * sizeof(u32)) != 0) { 461 + pr_err("%s: malformed available-ranges property.\n", 462 + np->full_name); 463 + return -EINVAL; 464 + } 465 + 466 + if (!p) { 467 + p = all_timer; 468 + len = sizeof(all_timer); 469 + } 470 + 471 + len /= 2 * sizeof(u32); 472 + 473 + for (i = 0; i < len; i++) { 474 + offset = p[i * 2]; 475 + count = p[i * 2 + 1]; 476 + for (j = 0; j < count; j++) { 477 + irq = irq_of_parse_and_map(np, irq_index); 478 + if (!irq) { 479 + pr_err("%s: irq parse and map failed.\n", 480 + np->full_name); 481 + return -EINVAL; 482 + } 483 + 484 + /* Set timer idle */ 485 + priv->idle |= TIMER_OFFSET((offset + j)); 486 + priv->timer[offset + j].irq = irq; 487 + priv->timer[offset + j].num = offset + j; 488 + irq_index++; 489 + } 490 + } 491 + 492 + return 0; 493 + } 494 + 495 + static void timer_group_init(struct device_node *np) 496 + { 497 + struct timer_group_priv *priv; 498 + unsigned int i = 0; 499 + int ret; 500 + 501 + priv = kzalloc(sizeof(struct timer_group_priv), GFP_KERNEL); 502 + if (!priv) { 503 + pr_err("%s: cannot allocate memory for group.\n", 504 + np->full_name); 505 + return; 506 + } 507 + 508 + if (of_device_is_compatible(np, "fsl,mpic-global-timer")) 509 + priv->flags |= FSL_GLOBAL_TIMER; 510 + 511 + priv->regs = of_iomap(np, i++); 512 + if (!priv->regs) { 513 + pr_err("%s: cannot ioremap timer register address.\n", 514 + np->full_name); 515 + goto out; 516 + } 517 + 518 + if (priv->flags & FSL_GLOBAL_TIMER) { 519 + priv->group_tcr = of_iomap(np, i++); 520 + if (!priv->group_tcr) { 521 + pr_err("%s: cannot ioremap tcr address.\n", 522 + np->full_name); 523 + goto out; 524 + } 525 + } 526 + 527 + ret = timer_group_get_freq(np, priv); 528 + if (ret < 0) { 529 + pr_err("%s: cannot get timer frequency.\n", np->full_name); 530 + goto out; 531 + } 532 + 533 + ret = timer_group_get_irq(np, priv); 534 + if (ret < 0) { 535 + pr_err("%s: cannot get timer irqs.\n", 
np->full_name); 536 + goto out; 537 + } 538 + 539 + spin_lock_init(&priv->lock); 540 + 541 + /* Init FSL timer hardware */ 542 + if (priv->flags & FSL_GLOBAL_TIMER) 543 + setbits32(priv->group_tcr, MPIC_TIMER_TCR_CLKDIV); 544 + 545 + list_add_tail(&priv->node, &timer_group_list); 546 + 547 + return; 548 + 549 + out: 550 + if (priv->regs) 551 + iounmap(priv->regs); 552 + 553 + if (priv->group_tcr) 554 + iounmap(priv->group_tcr); 555 + 556 + kfree(priv); 557 + } 558 + 559 + static void mpic_timer_resume(void) 560 + { 561 + struct timer_group_priv *priv; 562 + 563 + list_for_each_entry(priv, &timer_group_list, node) { 564 + /* Init FSL timer hardware */ 565 + if (priv->flags & FSL_GLOBAL_TIMER) 566 + setbits32(priv->group_tcr, MPIC_TIMER_TCR_CLKDIV); 567 + } 568 + } 569 + 570 + static const struct of_device_id mpic_timer_ids[] = { 571 + { .compatible = "fsl,mpic-global-timer", }, 572 + {}, 573 + }; 574 + 575 + static struct syscore_ops mpic_timer_syscore_ops = { 576 + .resume = mpic_timer_resume, 577 + }; 578 + 579 + static int __init mpic_timer_init(void) 580 + { 581 + struct device_node *np = NULL; 582 + 583 + for_each_matching_node(np, mpic_timer_ids) 584 + timer_group_init(np); 585 + 586 + register_syscore_ops(&mpic_timer_syscore_ops); 587 + 588 + if (list_empty(&timer_group_list)) 589 + return -ENODEV; 590 + 591 + return 0; 592 + } 593 + subsys_initcall(mpic_timer_init);
+3 -2
arch/s390/include/asm/pgtable.h
··· 1364 1364 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 1365 1365 1366 1366 #define __HAVE_ARCH_PGTABLE_DEPOSIT 1367 - extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable); 1367 + extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 1368 + pgtable_t pgtable); 1368 1369 1369 1370 #define __HAVE_ARCH_PGTABLE_WITHDRAW 1370 - extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm); 1371 + extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); 1371 1372 1372 1373 static inline int pmd_trans_splitting(pmd_t pmd) 1373 1374 {
+3 -2
arch/s390/mm/pgtable.c
··· 1165 1165 } 1166 1166 } 1167 1167 1168 - void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable) 1168 + void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 1169 + pgtable_t pgtable) 1169 1170 { 1170 1171 struct list_head *lh = (struct list_head *) pgtable; 1171 1172 ··· 1180 1179 mm->pmd_huge_pte = pgtable; 1181 1180 } 1182 1181 1183 - pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm) 1182 + pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) 1184 1183 { 1185 1184 struct list_head *lh; 1186 1185 pgtable_t pgtable;
+3 -2
arch/sparc/include/asm/pgtable_64.h
··· 853 853 pmd_t *pmd); 854 854 855 855 #define __HAVE_ARCH_PGTABLE_DEPOSIT 856 - extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable); 856 + extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 857 + pgtable_t pgtable); 857 858 858 859 #define __HAVE_ARCH_PGTABLE_WITHDRAW 859 - extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm); 860 + extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); 860 861 #endif 861 862 862 863 /* Encode and de-code a swap entry */
+3 -2
arch/sparc/mm/tlb.c
··· 188 188 } 189 189 } 190 190 191 - void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable) 191 + void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, 192 + pgtable_t pgtable) 192 193 { 193 194 struct list_head *lh = (struct list_head *) pgtable; 194 195 ··· 203 202 mm->pmd_huge_pte = pgtable; 204 203 } 205 204 206 - pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm) 205 + pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) 207 206 { 208 207 struct list_head *lh; 209 208 pgtable_t pgtable;
+2 -2
drivers/acpi/apei/erst.c
··· 935 935 struct timespec *time, char **buf, 936 936 struct pstore_info *psi); 937 937 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, 938 - u64 *id, unsigned int part, int count, 938 + u64 *id, unsigned int part, int count, size_t hsize, 939 939 size_t size, struct pstore_info *psi); 940 940 static int erst_clearer(enum pstore_type_id type, u64 id, int count, 941 941 struct timespec time, struct pstore_info *psi); ··· 1055 1055 } 1056 1056 1057 1057 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason, 1058 - u64 *id, unsigned int part, int count, 1058 + u64 *id, unsigned int part, int count, size_t hsize, 1059 1059 size_t size, struct pstore_info *psi) 1060 1060 { 1061 1061 struct cper_pstore_record *rcd = (struct cper_pstore_record *)
+1 -1
drivers/firmware/efi/efi-pstore.c
··· 103 103 104 104 static int efi_pstore_write(enum pstore_type_id type, 105 105 enum kmsg_dump_reason reason, u64 *id, 106 - unsigned int part, int count, size_t size, 106 + unsigned int part, int count, size_t hsize, size_t size, 107 107 struct pstore_info *psi) 108 108 { 109 109 char name[DUMP_NAME_LEN];
+8
drivers/i2c/busses/i2c-cpm.c
··· 338 338 tptr = 0; 339 339 rptr = 0; 340 340 341 + /* 342 + * If there was a collision in the last i2c transaction, 343 + * Set I2COM_MASTER as it was cleared during collision. 344 + */ 345 + if (in_be16(&tbdf->cbd_sc) & BD_SC_CL) { 346 + out_8(&cpm->i2c_reg->i2com, I2COM_MASTER); 347 + } 348 + 341 349 while (tptr < num) { 342 350 pmsg = &msgs[tptr]; 343 351 dev_dbg(&adap->dev, "R: %d T: %d\n", rptr, tptr);
+8
drivers/iommu/Kconfig
··· 261 261 default 256 if SHMOBILE_IOMMU_ADDRSIZE_64MB 262 262 default 128 if SHMOBILE_IOMMU_ADDRSIZE_32MB 263 263 264 + config SPAPR_TCE_IOMMU 265 + bool "sPAPR TCE IOMMU Support" 266 + depends on PPC_POWERNV || PPC_PSERIES 267 + select IOMMU_API 268 + help 269 + Enables bits of IOMMU API required by VFIO. The iommu_ops 270 + is not implemented as it is not necessary for VFIO. 271 + 264 272 endif # IOMMU_SUPPORT
+1 -1
drivers/macintosh/adb.c
··· 697 697 int ret = 0; 698 698 struct adbdev_state *state = file->private_data; 699 699 struct adb_request *req; 700 - wait_queue_t wait = __WAITQUEUE_INITIALIZER(wait,current); 700 + DECLARE_WAITQUEUE(wait,current); 701 701 unsigned long flags; 702 702 703 703 if (count < 2)
+4 -4
drivers/macintosh/mac_hid.c
··· 181 181 mac_hid_destroy_emumouse(); 182 182 } 183 183 184 - static int mac_hid_toggle_emumouse(ctl_table *table, int write, 184 + static int mac_hid_toggle_emumouse(struct ctl_table *table, int write, 185 185 void __user *buffer, size_t *lenp, 186 186 loff_t *ppos) 187 187 { ··· 214 214 } 215 215 216 216 /* file(s) in /proc/sys/dev/mac_hid */ 217 - static ctl_table mac_hid_files[] = { 217 + static struct ctl_table mac_hid_files[] = { 218 218 { 219 219 .procname = "mouse_button_emulation", 220 220 .data = &mouse_emulate_buttons, ··· 240 240 }; 241 241 242 242 /* dir in /proc/sys/dev */ 243 - static ctl_table mac_hid_dir[] = { 243 + static struct ctl_table mac_hid_dir[] = { 244 244 { 245 245 .procname = "mac_hid", 246 246 .maxlen = 0, ··· 251 251 }; 252 252 253 253 /* /proc/sys/dev itself, in case that is not there yet */ 254 - static ctl_table mac_hid_root_dir[] = { 254 + static struct ctl_table mac_hid_root_dir[] = { 255 255 { 256 256 .procname = "dev", 257 257 .maxlen = 0,
+1 -1
drivers/macintosh/via-cuda.c
··· 259 259 } while (0) 260 260 261 261 static int 262 - cuda_init_via(void) 262 + __init cuda_init_via(void) 263 263 { 264 264 out_8(&via[DIRB], (in_8(&via[DIRB]) | TACK | TIP) & ~TREQ); /* TACK & TIP out */ 265 265 out_8(&via[B], in_8(&via[B]) | TACK | TIP); /* negate them */
+5 -1
drivers/macintosh/windfarm_pm121.c
··· 276 276 277 277 static unsigned int pm121_failure_state; 278 278 static int pm121_readjust, pm121_skipping; 279 + static bool pm121_overtemp; 279 280 static s32 average_power; 280 281 281 282 struct pm121_correction { ··· 848 847 if (new_failure & FAILURE_OVERTEMP) { 849 848 wf_set_overtemp(); 850 849 pm121_skipping = 2; 850 + pm121_overtemp = true; 851 851 } 852 852 853 853 /* We only clear the overtemp condition if overtemp is cleared ··· 857 855 * the control loop levels, but we don't want to keep it clear 858 856 * here in this case 859 857 */ 860 - if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) 858 + if (!pm121_failure_state && pm121_overtemp) { 861 859 wf_clear_overtemp(); 860 + pm121_overtemp = false; 861 + } 862 862 } 863 863 864 864
+5 -1
drivers/macintosh/windfarm_pm81.c
··· 149 149 150 150 static unsigned int wf_smu_failure_state; 151 151 static int wf_smu_readjust, wf_smu_skipping; 152 + static bool wf_smu_overtemp; 152 153 153 154 /* 154 155 * ****** System Fans Control Loop ****** ··· 594 593 if (new_failure & FAILURE_OVERTEMP) { 595 594 wf_set_overtemp(); 596 595 wf_smu_skipping = 2; 596 + wf_smu_overtemp = true; 597 597 } 598 598 599 599 /* We only clear the overtemp condition if overtemp is cleared ··· 603 601 * the control loop levels, but we don't want to keep it clear 604 602 * here in this case 605 603 */ 606 - if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) 604 + if (!wf_smu_failure_state && wf_smu_overtemp) { 607 605 wf_clear_overtemp(); 606 + wf_smu_overtemp = false; 607 + } 608 608 } 609 609 610 610 static void wf_smu_new_control(struct wf_control *ct)
+5 -1
drivers/macintosh/windfarm_pm91.c
··· 76 76 77 77 /* Set to kick the control loop into life */ 78 78 static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started; 79 + static bool wf_smu_overtemp; 79 80 80 81 /* Failure handling.. could be nicer */ 81 82 #define FAILURE_FAN 0x01 ··· 518 517 if (new_failure & FAILURE_OVERTEMP) { 519 518 wf_set_overtemp(); 520 519 wf_smu_skipping = 2; 520 + wf_smu_overtemp = true; 521 521 } 522 522 523 523 /* We only clear the overtemp condition if overtemp is cleared ··· 527 525 * the control loop levels, but we don't want to keep it clear 528 526 * here in this case 529 527 */ 530 - if (new_failure == 0 && last_failure & FAILURE_OVERTEMP) 528 + if (!wf_smu_failure_state && wf_smu_overtemp) { 531 529 wf_clear_overtemp(); 530 + wf_smu_overtemp = false; 531 + } 532 532 } 533 533 534 534
-1
drivers/macintosh/windfarm_smu_sat.c
··· 343 343 wf_unregister_sensor(&sens->sens); 344 344 } 345 345 sat->i2c = NULL; 346 - i2c_set_clientdata(client, NULL); 347 346 kref_put(&sat->ref, wf_sat_release); 348 347 349 348 return 0;
+6
drivers/vfio/Kconfig
··· 3 3 depends on VFIO 4 4 default n 5 5 6 + config VFIO_IOMMU_SPAPR_TCE 7 + tristate 8 + depends on VFIO && SPAPR_TCE_IOMMU 9 + default n 10 + 6 11 menuconfig VFIO 7 12 tristate "VFIO Non-Privileged userspace driver framework" 8 13 depends on IOMMU_API 9 14 select VFIO_IOMMU_TYPE1 if X86 15 + select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES) 10 16 help 11 17 VFIO provides a framework for secure userspace device drivers. 12 18 See Documentation/vfio.txt for more details.
+1
drivers/vfio/Makefile
··· 1 1 obj-$(CONFIG_VFIO) += vfio.o 2 2 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o 3 + obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o 3 4 obj-$(CONFIG_VFIO_PCI) += pci/
+1
drivers/vfio/vfio.c
··· 1415 1415 * drivers. 1416 1416 */ 1417 1417 request_module_nowait("vfio_iommu_type1"); 1418 + request_module_nowait("vfio_iommu_spapr_tce"); 1418 1419 1419 1420 return 0; 1420 1421
+377
drivers/vfio/vfio_iommu_spapr_tce.c
··· 1 + /* 2 + * VFIO: IOMMU DMA mapping support for TCE on POWER 3 + * 4 + * Copyright (C) 2013 IBM Corp. All rights reserved. 5 + * Author: Alexey Kardashevskiy <aik@ozlabs.ru> 6 + * 7 + * This program is free software; you can redistribute it and/or modify 8 + * it under the terms of the GNU General Public License version 2 as 9 + * published by the Free Software Foundation. 10 + * 11 + * Derived from original vfio_iommu_type1.c: 12 + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. 13 + * Author: Alex Williamson <alex.williamson@redhat.com> 14 + */ 15 + 16 + #include <linux/module.h> 17 + #include <linux/pci.h> 18 + #include <linux/slab.h> 19 + #include <linux/uaccess.h> 20 + #include <linux/err.h> 21 + #include <linux/vfio.h> 22 + #include <asm/iommu.h> 23 + #include <asm/tce.h> 24 + 25 + #define DRIVER_VERSION "0.1" 26 + #define DRIVER_AUTHOR "aik@ozlabs.ru" 27 + #define DRIVER_DESC "VFIO IOMMU SPAPR TCE" 28 + 29 + static void tce_iommu_detach_group(void *iommu_data, 30 + struct iommu_group *iommu_group); 31 + 32 + /* 33 + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation 34 + * 35 + * This code handles mapping and unmapping of user data buffers 36 + * into DMA'ble space using the IOMMU 37 + */ 38 + 39 + /* 40 + * The container descriptor supports only a single group per container. 41 + * Required by the API as the container is not supplied with the IOMMU group 42 + * at the moment of initialization. 
43 + */ 44 + struct tce_container { 45 + struct mutex lock; 46 + struct iommu_table *tbl; 47 + bool enabled; 48 + }; 49 + 50 + static int tce_iommu_enable(struct tce_container *container) 51 + { 52 + int ret = 0; 53 + unsigned long locked, lock_limit, npages; 54 + struct iommu_table *tbl = container->tbl; 55 + 56 + if (!container->tbl) 57 + return -ENXIO; 58 + 59 + if (!current->mm) 60 + return -ESRCH; /* process exited */ 61 + 62 + if (container->enabled) 63 + return -EBUSY; 64 + 65 + /* 66 + * When userspace pages are mapped into the IOMMU, they are effectively 67 + * locked memory, so, theoretically, we need to update the accounting 68 + * of locked pages on each map and unmap. For powerpc, the map unmap 69 + * paths can be very hot, though, and the accounting would kill 70 + * performance, especially since it would be difficult or impossible 71 + * to handle the accounting in real mode only. 72 + * 73 + * To address that, rather than precisely accounting every page, we 74 + * instead account for a worst case on locked memory when the iommu is 75 + * enabled and disabled. The worst case upper bound on locked memory 76 + * is the size of the whole iommu window, which is usually relatively 77 + * small (compared to total memory sizes) on POWER hardware. 78 + * 79 + * Also, we don't have a nice way to fail on H_PUT_TCE due to ulimits; 80 + * that would effectively kill the guest at random points, so it is 81 + * better to enforce the limit based on the maximum that the guest can map.
82 + */ 83 + down_write(&current->mm->mmap_sem); 84 + npages = (tbl->it_size << IOMMU_PAGE_SHIFT) >> PAGE_SHIFT; 85 + locked = current->mm->locked_vm + npages; 86 + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; 87 + if (locked > lock_limit && !capable(CAP_IPC_LOCK)) { 88 + pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n", 89 + rlimit(RLIMIT_MEMLOCK)); 90 + ret = -ENOMEM; 91 + } else { 92 + 93 + current->mm->locked_vm += npages; 94 + container->enabled = true; 95 + } 96 + up_write(&current->mm->mmap_sem); 97 + 98 + return ret; 99 + } 100 + 101 + static void tce_iommu_disable(struct tce_container *container) 102 + { 103 + if (!container->enabled) 104 + return; 105 + 106 + container->enabled = false; 107 + 108 + if (!container->tbl || !current->mm) 109 + return; 110 + 111 + down_write(&current->mm->mmap_sem); 112 + current->mm->locked_vm -= (container->tbl->it_size << 113 + IOMMU_PAGE_SHIFT) >> PAGE_SHIFT; 114 + up_write(&current->mm->mmap_sem); 115 + } 116 + 117 + static void *tce_iommu_open(unsigned long arg) 118 + { 119 + struct tce_container *container; 120 + 121 + if (arg != VFIO_SPAPR_TCE_IOMMU) { 122 + pr_err("tce_vfio: Wrong IOMMU type\n"); 123 + return ERR_PTR(-EINVAL); 124 + } 125 + 126 + container = kzalloc(sizeof(*container), GFP_KERNEL); 127 + if (!container) 128 + return ERR_PTR(-ENOMEM); 129 + 130 + mutex_init(&container->lock); 131 + 132 + return container; 133 + } 134 + 135 + static void tce_iommu_release(void *iommu_data) 136 + { 137 + struct tce_container *container = iommu_data; 138 + 139 + WARN_ON(container->tbl && !container->tbl->it_group); 140 + tce_iommu_disable(container); 141 + 142 + if (container->tbl && container->tbl->it_group) 143 + tce_iommu_detach_group(iommu_data, container->tbl->it_group); 144 + 145 + mutex_destroy(&container->lock); 146 + 147 + kfree(container); 148 + } 149 + 150 + static long tce_iommu_ioctl(void *iommu_data, 151 + unsigned int cmd, unsigned long arg) 152 + { 153 + struct tce_container *container = iommu_data; 154 + 
unsigned long minsz; 155 + long ret; 156 + 157 + switch (cmd) { 158 + case VFIO_CHECK_EXTENSION: 159 + return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0; 160 + 161 + case VFIO_IOMMU_SPAPR_TCE_GET_INFO: { 162 + struct vfio_iommu_spapr_tce_info info; 163 + struct iommu_table *tbl = container->tbl; 164 + 165 + if (WARN_ON(!tbl)) 166 + return -ENXIO; 167 + 168 + minsz = offsetofend(struct vfio_iommu_spapr_tce_info, 169 + dma32_window_size); 170 + 171 + if (copy_from_user(&info, (void __user *)arg, minsz)) 172 + return -EFAULT; 173 + 174 + if (info.argsz < minsz) 175 + return -EINVAL; 176 + 177 + info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT; 178 + info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT; 179 + info.flags = 0; 180 + 181 + if (copy_to_user((void __user *)arg, &info, minsz)) 182 + return -EFAULT; 183 + 184 + return 0; 185 + } 186 + case VFIO_IOMMU_MAP_DMA: { 187 + struct vfio_iommu_type1_dma_map param; 188 + struct iommu_table *tbl = container->tbl; 189 + unsigned long tce, i; 190 + 191 + if (!tbl) 192 + return -ENXIO; 193 + 194 + BUG_ON(!tbl->it_group); 195 + 196 + minsz = offsetofend(struct vfio_iommu_type1_dma_map, size); 197 + 198 + if (copy_from_user(&param, (void __user *)arg, minsz)) 199 + return -EFAULT; 200 + 201 + if (param.argsz < minsz) 202 + return -EINVAL; 203 + 204 + if (param.flags & ~(VFIO_DMA_MAP_FLAG_READ | 205 + VFIO_DMA_MAP_FLAG_WRITE)) 206 + return -EINVAL; 207 + 208 + if ((param.size & ~IOMMU_PAGE_MASK) || 209 + (param.vaddr & ~IOMMU_PAGE_MASK)) 210 + return -EINVAL; 211 + 212 + /* iova is checked by the IOMMU API */ 213 + tce = param.vaddr; 214 + if (param.flags & VFIO_DMA_MAP_FLAG_READ) 215 + tce |= TCE_PCI_READ; 216 + if (param.flags & VFIO_DMA_MAP_FLAG_WRITE) 217 + tce |= TCE_PCI_WRITE; 218 + 219 + ret = iommu_tce_put_param_check(tbl, param.iova, tce); 220 + if (ret) 221 + return ret; 222 + 223 + for (i = 0; i < (param.size >> IOMMU_PAGE_SHIFT); ++i) { 224 + ret = iommu_put_tce_user_mode(tbl, 225 + (param.iova >> 
+					IOMMU_PAGE_SHIFT) + i,
+					tce);
+			if (ret)
+				break;
+			tce += IOMMU_PAGE_SIZE;
+		}
+		if (ret)
+			iommu_clear_tces_and_put_pages(tbl,
+					param.iova >> IOMMU_PAGE_SHIFT, i);
+
+		iommu_flush_tce(tbl);
+
+		return ret;
+	}
+	case VFIO_IOMMU_UNMAP_DMA: {
+		struct vfio_iommu_type1_dma_unmap param;
+		struct iommu_table *tbl = container->tbl;
+
+		if (WARN_ON(!tbl))
+			return -ENXIO;
+
+		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap,
+				size);
+
+		if (copy_from_user(&param, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (param.argsz < minsz)
+			return -EINVAL;
+
+		/* No flag is supported now */
+		if (param.flags)
+			return -EINVAL;
+
+		if (param.size & ~IOMMU_PAGE_MASK)
+			return -EINVAL;
+
+		ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
+				param.size >> IOMMU_PAGE_SHIFT);
+		if (ret)
+			return ret;
+
+		ret = iommu_clear_tces_and_put_pages(tbl,
+				param.iova >> IOMMU_PAGE_SHIFT,
+				param.size >> IOMMU_PAGE_SHIFT);
+		iommu_flush_tce(tbl);
+
+		return ret;
+	}
+	case VFIO_IOMMU_ENABLE:
+		mutex_lock(&container->lock);
+		ret = tce_iommu_enable(container);
+		mutex_unlock(&container->lock);
+		return ret;
+
+
+	case VFIO_IOMMU_DISABLE:
+		mutex_lock(&container->lock);
+		tce_iommu_disable(container);
+		mutex_unlock(&container->lock);
+		return 0;
+	}
+
+	return -ENOTTY;
+}
+
+static int tce_iommu_attach_group(void *iommu_data,
+		struct iommu_group *iommu_group)
+{
+	int ret;
+	struct tce_container *container = iommu_data;
+	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+
+	BUG_ON(!tbl);
+	mutex_lock(&container->lock);
+
+	/* pr_debug("tce_vfio: Attaching group #%u to iommu %p\n",
+			iommu_group_id(iommu_group), iommu_group); */
+	if (container->tbl) {
+		pr_warn("tce_vfio: Only one group per IOMMU container is allowed, existing id=%d, attaching id=%d\n",
+				iommu_group_id(container->tbl->it_group),
+				iommu_group_id(iommu_group));
+		ret = -EBUSY;
+	} else if (container->enabled) {
+		pr_err("tce_vfio: attaching group #%u to enabled container\n",
+				iommu_group_id(iommu_group));
+		ret = -EBUSY;
+	} else {
+		ret = iommu_take_ownership(tbl);
+		if (!ret)
+			container->tbl = tbl;
+	}
+
+	mutex_unlock(&container->lock);
+
+	return ret;
+}
+
+static void tce_iommu_detach_group(void *iommu_data,
+		struct iommu_group *iommu_group)
+{
+	struct tce_container *container = iommu_data;
+	struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
+
+	BUG_ON(!tbl);
+	mutex_lock(&container->lock);
+	if (tbl != container->tbl) {
+		pr_warn("tce_vfio: detaching group #%u, expected group is #%u\n",
+				iommu_group_id(iommu_group),
+				iommu_group_id(tbl->it_group));
+	} else {
+		if (container->enabled) {
+			pr_warn("tce_vfio: detaching group #%u from enabled container, forcing disable\n",
+					iommu_group_id(tbl->it_group));
+			tce_iommu_disable(container);
+		}
+
+		/* pr_debug("tce_vfio: detaching group #%u from iommu %p\n",
+				iommu_group_id(iommu_group), iommu_group); */
+		container->tbl = NULL;
+		iommu_release_ownership(tbl);
+	}
+	mutex_unlock(&container->lock);
+}
+
+const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
+	.name		= "iommu-vfio-powerpc",
+	.owner		= THIS_MODULE,
+	.open		= tce_iommu_open,
+	.release	= tce_iommu_release,
+	.ioctl		= tce_iommu_ioctl,
+	.attach_group	= tce_iommu_attach_group,
+	.detach_group	= tce_iommu_detach_group,
+};
+
+static int __init tce_iommu_init(void)
+{
+	return vfio_register_iommu_driver(&tce_iommu_driver_ops);
+}
+
+static void __exit tce_iommu_cleanup(void)
+{
+	vfio_unregister_iommu_driver(&tce_iommu_driver_ops);
+}
+
+module_init(tce_iommu_init);
+module_exit(tce_iommu_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+
+8
drivers/watchdog/booke_wdt.c
···
 	val &= ~WDTP_MASK;
 	val |= (TCR_WIE|TCR_WRC(WRC_CHIP)|WDTP(booke_wdt_period));
 
+#ifdef CONFIG_PPC_BOOK3E_64
+	/*
+	 * Crit ints are currently broken on PPC64 Book-E, so
+	 * just disable them for now.
+	 */
+	val &= ~TCR_WIE;
+#endif
+
 	mtspr(SPRN_TCR, val);
 }
 
+1 -1
fs/pstore/ftrace.c
···
 	rec.parent_ip = parent_ip;
 	pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id());
 	psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec,
-			  sizeof(rec), psinfo);
+			  0, sizeof(rec), psinfo);
 
 	local_irq_restore(flags);
 }
+9
fs/pstore/inode.c
···
 	case PSTORE_TYPE_MCE:
 		sprintf(name, "mce-%s-%lld", psname, id);
 		break;
+	case PSTORE_TYPE_PPC_RTAS:
+		sprintf(name, "rtas-%s-%lld", psname, id);
+		break;
+	case PSTORE_TYPE_PPC_OF:
+		sprintf(name, "powerpc-ofw-%s-%lld", psname, id);
+		break;
+	case PSTORE_TYPE_PPC_COMMON:
+		sprintf(name, "powerpc-common-%s-%lld", psname, id);
+		break;
 	case PSTORE_TYPE_UNKNOWN:
 		sprintf(name, "unknown-%s-%lld", psname, id);
 		break;
+6 -4
fs/pstore/platform.c
···
 		break;
 
 	ret = psinfo->write(PSTORE_TYPE_DMESG, reason, &id, part,
-			    oopscount, hsize + len, psinfo);
+			    oopscount, hsize, hsize + len, psinfo);
 	if (ret == 0 && reason == KMSG_DUMP_OOPS && pstore_is_mounted())
 		pstore_new_entry = 1;
···
 		spin_lock_irqsave(&psinfo->buf_lock, flags);
 	}
 	memcpy(psinfo->buf, s, c);
-	psinfo->write(PSTORE_TYPE_CONSOLE, 0, &id, 0, 0, c, psinfo);
+	psinfo->write(PSTORE_TYPE_CONSOLE, 0, &id, 0, 0, 0, c, psinfo);
 	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 	s += c;
 	c = e - s;
···
 static int pstore_write_compat(enum pstore_type_id type,
 			       enum kmsg_dump_reason reason,
 			       u64 *id, unsigned int part, int count,
-			       size_t size, struct pstore_info *psi)
+			       size_t hsize, size_t size,
+			       struct pstore_info *psi)
 {
-	return psi->write_buf(type, reason, id, part, psinfo->buf, size, psi);
+	return psi->write_buf(type, reason, id, part, psinfo->buf, hsize,
+			      size, psi);
 }
 
 /*
+2 -1
fs/pstore/ram.c
···
 static int notrace ramoops_pstore_write_buf(enum pstore_type_id type,
 					    enum kmsg_dump_reason reason,
 					    u64 *id, unsigned int part,
-					    const char *buf, size_t size,
+					    const char *buf,
+					    size_t hsize, size_t size,
 					    struct pstore_info *psi)
 {
 	struct ramoops_context *cxt = psi->data;
+3 -2
include/asm-generic/pgtable.h
···
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
-extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable);
+extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+				       pgtable_t pgtable);
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
-extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm);
+extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #endif
 
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
+3 -3
include/linux/huge_mm.h
···
 #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define HPAGE_PMD_SHIFT HPAGE_SHIFT
-#define HPAGE_PMD_MASK HPAGE_MASK
-#define HPAGE_PMD_SIZE HPAGE_SIZE
+#define HPAGE_PMD_SHIFT PMD_SHIFT
+#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)
+#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
 
 extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
 
+8 -4
include/linux/pstore.h
···
 	PSTORE_TYPE_MCE		= 1,
 	PSTORE_TYPE_CONSOLE	= 2,
 	PSTORE_TYPE_FTRACE	= 3,
+	/* PPC64 partition types */
+	PSTORE_TYPE_PPC_RTAS	= 4,
+	PSTORE_TYPE_PPC_OF	= 5,
+	PSTORE_TYPE_PPC_COMMON	= 6,
 	PSTORE_TYPE_UNKNOWN	= 255
 };
 
···
 			struct pstore_info *psi);
 	int		(*write)(enum pstore_type_id type,
 			enum kmsg_dump_reason reason, u64 *id,
-			unsigned int part, int count, size_t size,
-			struct pstore_info *psi);
+			unsigned int part, int count, size_t hsize,
+			size_t size, struct pstore_info *psi);
 	int		(*write_buf)(enum pstore_type_id type,
 			enum kmsg_dump_reason reason, u64 *id,
-			unsigned int part, const char *buf, size_t size,
-			struct pstore_info *psi);
+			unsigned int part, const char *buf, size_t hsize,
+			size_t size, struct pstore_info *psi);
 	int		(*erase)(enum pstore_type_id type, u64 id,
 			int count, struct timespec time,
 			struct pstore_info *psi);
+34
include/uapi/linux/vfio.h
···
 /* Extensions */
 
 #define VFIO_TYPE1_IOMMU		1
+#define VFIO_SPAPR_TCE_IOMMU		2
 
 /*
  * The IOCTL interface is designed for extensibility by embedding the
···
 };
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
+
+/*
+ * IOCTLs to enable/disable IOMMU container usage.
+ * No parameters are supported.
+ */
+#define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
+#define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
+
+/* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
+
+/*
+ * The SPAPR TCE info struct provides the information about the PCI bus
+ * address ranges available for DMA, these values are programmed into
+ * the hardware so the guest has to know that information.
+ *
+ * The DMA 32 bit window start is an absolute PCI bus address.
+ * The IOVA address passed via map/unmap ioctls are absolute PCI bus
+ * addresses too so the window works as a filter rather than an offset
+ * for IOVA addresses.
+ *
+ * A flag will need to be added if other page sizes are supported,
+ * so as defined here, it is always 4k.
+ */
+struct vfio_iommu_spapr_tce_info {
+	__u32 argsz;
+	__u32 flags;			/* reserved for future use */
+	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
+	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
+};
+
+#define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
+
+/* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
+18 -10
mm/huge_memory.c
···
 		pmd_t entry;
 		entry = mk_huge_pmd(page, vma);
 		page_add_new_anon_rmap(page, vma, haddr);
+		pgtable_trans_huge_deposit(mm, pmd, pgtable);
 		set_pmd_at(mm, haddr, pmd, entry);
-		pgtable_trans_huge_deposit(mm, pgtable);
 		add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm->nr_ptes++;
 		spin_unlock(&mm->page_table_lock);
···
 	entry = mk_pmd(zero_page, vma->vm_page_prot);
 	entry = pmd_wrprotect(entry);
 	entry = pmd_mkhuge(entry);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
-	pgtable_trans_huge_deposit(mm, pgtable);
 	mm->nr_ptes++;
 	return true;
 }
···
 
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
+	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
-	pgtable_trans_huge_deposit(dst_mm, pgtable);
 	dst_mm->nr_ptes++;
 
 	ret = 0;
···
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(mm);
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
···
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(mm);
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
···
 		 * young bit, instead of the current set_pmd_at.
 		 */
 		_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
-		set_pmd_at(mm, addr & HPAGE_PMD_MASK, pmd, _pmd);
+		if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
+					  pmd, _pmd,  1))
+			update_mmu_cache_pmd(vma, addr, pmd);
 	}
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		if (page->mapping && trylock_page(page)) {
···
 		struct page *page;
 		pgtable_t pgtable;
 		pmd_t orig_pmd;
-		pgtable = pgtable_trans_huge_withdraw(tlb->mm);
+		/*
+		 * For architectures like ppc64 we look at deposited pgtable
+		 * when calling pmdp_get_and_clear. So do the
+		 * pgtable_trans_huge_withdraw after finishing pmdp related
+		 * operations.
+		 */
 		orig_pmd = pmdp_get_and_clear(tlb->mm, addr, pmd);
 		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
 		if (is_huge_zero_pmd(orig_pmd)) {
 			tlb->mm->nr_ptes--;
 			spin_unlock(&tlb->mm->page_table_lock);
···
 	pmd = page_check_address_pmd(page, mm, address,
 				     PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
 	if (pmd) {
-		pgtable = pgtable_trans_huge_withdraw(mm);
+		pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 		pmd_populate(mm, &_pmd, pgtable);
 
 		haddr = address;
···
 	spin_lock(&mm->page_table_lock);
 	BUG_ON(!pmd_none(*pmd));
 	page_add_new_anon_rmap(new_page, vma, address);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
-	pgtable_trans_huge_deposit(mm, pgtable);
 	spin_unlock(&mm->page_table_lock);
 
 	*hpage = NULL;
···
 	pmdp_clear_flush(vma, haddr, pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(mm);
+	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+3 -2
mm/pgtable-generic.c
···
 
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
+void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
+				pgtable_t pgtable)
 {
 	assert_spin_locked(&mm->page_table_lock);
 
···
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /* no "address" argument so destroys page coloring of some arch */
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm)
+pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
 	pgtable_t pgtable;
 