Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.


Hardware overview
=================

          POWER8               FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). This is managed
    by Linux by calls into OPAL. Linux doesn't directly program the
    CAPP.

    The FPGA (or coherently attached device) consists of two parts.
    The POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU is used to implement specific functionality behind
    the PSL. The PSL, among other things, provides memory address
    translation services to allow each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (eg. the compression,
    crypto etc function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs. This is what the kernel interacts with.
    For example, if the AFU needs to read a particular effective
    address, it sends that address to the PSL, the PSL then translates
    it, fetches the data from memory and returns it to the AFU. If the
    PSL has a translation miss, it interrupts the kernel and the kernel
    services the fault. The context in which this fault is serviced is
    based on who owns that acceleration function.


AFU Modes
=========

    There are two programming modes supported by the AFU: dedicated
    and AFU directed. An AFU may support one or both modes.

    When using dedicated mode only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at
    a time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16 bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the ID can also be accessed by the kernel so it can
    determine the userspace context associated with an operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per context portion. The hardware is self describing, hence
    the kernel can determine the offset and size of the per context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace by a read syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is up to the AFU hence the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.




User API
========

1. AFU character devices

    For AFUs operating in AFU directed mode, two character device
    files will be created. /dev/cxl/afu0.0m will correspond to a
    master context and /dev/cxl/afu0.0s will correspond to a slave
    context. Master contexts have access to the full MMIO space an
    AFU provides. Slave contexts have access to only the per process
    MMIO space an AFU provides.

    For AFUs operating in dedicated process mode, the driver will
    only create a single character device per AFU called
    /dev/cxl/afu0.0d. This will have access to the entire MMIO space
    that the AFU provides (like master contexts in AFU directed).

    The types described below are defined in include/uapi/misc/cxl.h

    The following file operations are supported on both slave and
    master devices.

    A userspace library libcxl is available here:
        https://github.com/ibm-capi/libcxl
    This provides a C interface to this kernel API.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated the open call will fail
    and return -ENOSPC.
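
    As an illustration only, the open behaviour and the IRQ budget
    quoted in the note in this section could be wrapped as in the
    sketch below. The helper names afu_open() and
    max_contexts_for_irqs() are hypothetical and not part of the
    kernel API or libcxl; the figures 2040 and 3 are taken from this
    document. From userspace, the -ENOSPC case is seen as open(2)
    returning -1 with errno set to ENOSPC.

```c
#include <errno.h>
#include <fcntl.h>

/* Figures quoted in this document for the POWER8 CAPP. */
#define CAPP_TOTAL_IRQS   2040
#define CAPP_KERNEL_IRQS  3

/*
 * Hypothetical helper: open an AFU device node for read/write.
 * Returns a file descriptor (>= 0) on success or a negative errno
 * value on failure, e.g. -ENOSPC when no context is left.
 */
int afu_open(const char *path)
{
    int fd = open(path, O_RDWR);

    return fd < 0 ? -errno : fd;
}

/*
 * Hypothetical helper: upper bound on the number of contexts that the
 * IRQ budget allows when every context requests irqs_per_context
 * interrupts. Non-positive counts are rejected, since the IRQ budget
 * is then not the limiting factor.
 */
int max_contexts_for_irqs(int irqs_per_context)
{
    if (irqs_per_context <= 0)
        return -EINVAL;
    return (CAPP_TOTAL_IRQS - CAPP_KERNEL_IRQS) / irqs_per_context;
}
```

    A caller might use afu_open("/dev/cxl/afu0.0s") and treat a
    return value of -ENOSPC as "all contexts in use, retry later".
    Other limits (for example AFUs that support fewer contexts) can
    apply before the IRQ bound is reached.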

    Note: IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037/4 = 509 contexts can be allocated.


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work:

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request.
                This field is only used by the kernel when the
                corresponding CXL_START_WORK_NUM_IRQS value is
                specified in flags. If not specified the minimum
                number required by the AFU will be allocated. The
                min and max number can be obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Get the current context id, also known as the process element.
        The value is returned from the kernel as a __u32.


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the per
    process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32 and 64-bit
    accesses are supported by POWER8. Also, the AFU will be designed
    with a specific endianness, so all MMIO accesses should consider
    endianness (the endian(3) variants such as le64toh() and
    be64toh() are recommended). These endian issues equally apply to
    shared memory queues the WED may describe.


read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K bytes.

    The result of the read will be a buffer of one or more events;
    each event is of type struct cxl_event, of varying size.

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as:

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as:

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as:

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as:

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding


2. Card character device (PowerVM guest only)

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image on the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

CXL_IOCTL_DOWNLOAD_IMAGE:
CXL_IOCTL_VALIDATE_IMAGE:
    Starts and controls flashing a new FPGA image. Partial
    reconfiguration is not supported (yet), so the image must contain
    a copy of the PSL and AFU(s). Since an image can be quite large,
    the caller may have to iterate, splitting the image in smaller
    chunks.

    Takes a pointer to a struct cxl_adapter_image:
        struct cxl_adapter_image {
            __u64 flags;
            __u64 data;
            __u64 len_data;
            __u64 len_image;
            __u64 reserved1;
            __u64 reserved2;
            __u64 reserved3;
            __u64 reserved4;
        };

    flags:
        These flags indicate which optional fields are present in
        this struct. Currently all fields are mandatory.

    data:
        Pointer to a buffer with part of the image to write to the
        card.

    len_data:
        Size of the buffer pointed to by data.

    len_image:
        Full size of the image.


Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/testing/sysfs-class-cxl


Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for afu directed), since the API is virtually
    identical for each:

	SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
	SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
	          KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"
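

    Returning to the read interface described under User API: since
    each event carries its own size in the header, a buffer returned
    by read() can be walked event by event. The sketch below is one
    possible way to do that and is not taken from the driver or from
    libcxl. struct ev_header is a local mirror of struct
    cxl_event_header as laid out in this document, the EV_* values
    are assumed numeric values for enum cxl_event_type, and
    count_events() is a hypothetical helper. Real code should include
    the definitions from include/uapi/misc/cxl.h instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header as laid out in this document. */
struct ev_header {
    uint16_t type;
    uint16_t size;             /* bytes, including this header */
    uint16_t process_element;
    uint16_t reserved1;
};

/* Assumed numeric values; real code should use enum cxl_event_type. */
enum {
    EV_AFU_INTERRUPT = 1,
    EV_DATA_STORAGE  = 2,
    EV_AFU_ERROR     = 3,
    EV_TYPE_MAX      = 4
};

/*
 * Walk len bytes as returned by read() and count the events of each
 * type. Returns the number of events seen, or -1 if the buffer does
 * not hold an integral number of well-formed events.
 */
int count_events(const unsigned char *buf, size_t len,
                 unsigned counts[EV_TYPE_MAX])
{
    size_t off = 0;
    int n = 0;

    memset(counts, 0, EV_TYPE_MAX * sizeof(counts[0]));
    while (off < len) {
        struct ev_header h;

        if (len - off < sizeof(h))
            return -1;
        /* Copy the header out to avoid unaligned accesses. */
        memcpy(&h, buf + off, sizeof(h));
        if (h.size < sizeof(h) || h.size > len - off)
            return -1;
        if (h.type < EV_TYPE_MAX)
            counts[h.type]++;
        /* The next event starts h.size bytes after this one. */
        off += h.size;
        n++;
    }
    return n;
}
```

    In a real program buf would be the (at least 4K byte) buffer
    passed to read() and len its return value. The bounds checks are
    defensive: the document states that read() always returns an
    integral number of events, so they should not trigger in practice.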