
Documentation: Document the NVMe PCI endpoint target driver

Add a documentation file
(Documentation/nvme/nvme-pci-endpoint-target.rst) for the new NVMe PCI
endpoint target driver. This provides an overview of the driver
requirements, capabilities and limitations. A user guide describing how
to set up an NVMe PCI endpoint device using this driver is also provided.

This document is also made accessible from the PCI endpoint
documentation using a link. Furthermore, since the existing nvme
documentation was not accessible from the top documentation index, an
index file is added to Documentation/nvme and this index is listed as
"NVMe Subsystem" in the "Storage interfaces" section of the subsystem
API index.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>

Authored by Damien Le Moal and committed by Keith Busch
002ec8f1 0faa0fe6

5 files changed, 395 insertions(+)

Documentation/PCI/endpoint/index.rst (+1 line)

  pci-ntb-howto
  pci-vntb-function
  pci-vntb-howto
+ pci-nvme-function

  function/binding/pci-test
  function/binding/pci-ntb

Documentation/PCI/endpoint/pci-nvme-function.rst (new file, +13 lines)

.. SPDX-License-Identifier: GPL-2.0

=================
PCI NVMe Function
=================

:Author: Damien Le Moal <dlemoal@kernel.org>

The PCI NVMe endpoint function implements a PCI NVMe controller using the NVMe
subsystem target core code. The driver for this function resides with the NVMe
subsystem as drivers/nvme/target/nvmet-pciep.c.

See Documentation/nvme/nvme-pci-endpoint-target.rst for more details.

Documentation/nvme/index.rst (new file, +12 lines)

.. SPDX-License-Identifier: GPL-2.0

==============
NVMe Subsystem
==============

.. toctree::
   :maxdepth: 2
   :numbered:

   feature-and-quirk-policy
   nvme-pci-endpoint-target

Documentation/nvme/nvme-pci-endpoint-target.rst (new file, +368 lines)

.. SPDX-License-Identifier: GPL-2.0

=================================
NVMe PCI Endpoint Function Target
=================================

:Author: Damien Le Moal <dlemoal@kernel.org>

The NVMe PCI endpoint function target driver implements an NVMe PCIe controller
using an NVMe fabrics target controller configured with the PCI transport type.

Overview
========

The NVMe PCI endpoint function target driver allows exposing an NVMe target
controller over a PCIe link, thus implementing an NVMe PCIe device similar to a
regular M.2 SSD. The target controller is created in the same manner as when
using NVMe over fabrics: the controller represents the interface to an NVMe
subsystem using a port. The port transfer type must be configured to be "pci".
The subsystem can be configured to have namespaces backed by regular files or
block devices, or can use NVMe passthrough to expose to the PCI host an
existing physical NVMe device or an NVMe fabrics host controller (e.g. an NVMe
TCP host controller).

The NVMe PCI endpoint function target driver relies as much as possible on the
NVMe target core code to parse and execute NVMe commands submitted by the PCIe
host. However, using the PCI endpoint framework API and DMA API, the driver is
also responsible for managing all data transfers over the PCIe link. This
implies that the NVMe PCI endpoint function target driver implements the
management of several NVMe data structures and some NVMe command parsing.

1) The driver manages the retrieval of NVMe commands from submission queues
   using DMA if supported, or MMIO otherwise. Each command retrieved is then
   executed using a work item to maximize performance with the parallel
   execution of multiple commands on different CPUs. The driver uses a work
   item to constantly poll the doorbells of all submission queues to detect
   command submissions from the PCIe host.

2) The driver transfers the completion queue entries of completed commands to
   the PCIe host using an MMIO copy of the entries into the host completion
   queue. After posting completion entries in a completion queue, the driver
   uses the PCI endpoint framework API to raise an interrupt to the host to
   signal command completion.

3) For any command that has a data buffer, the NVMe PCI endpoint target driver
   parses the command PRP or SGL lists to create a list of PCI address
   segments representing the mapping of the command data buffer on the host.
   The command data buffer is transferred over the PCIe link using this list
   of PCI address segments using DMA, if supported. If DMA is not supported,
   MMIO is used, which results in poor performance. For write commands, the
   command data buffer is transferred from the host into a local memory buffer
   before executing the command using the target core code. For read commands,
   a local memory buffer is allocated to execute the command and the content
   of that buffer is transferred to the host once the command completes.

Controller Capabilities
-----------------------

The NVMe capabilities exposed to the PCIe host through the BAR 0 registers are
almost identical to the capabilities of the NVMe target controller implemented
by the target core code. There are some exceptions:

1) The NVMe PCI endpoint target driver always sets the controller capability
   CQR bit to request "Contiguous Queues Required". This is to facilitate the
   mapping of a queue PCI address range to the local CPU address space.

2) The doorbell stride (DSTRD) is always set to 4 B.

3) Since the PCI endpoint framework does not provide a way to handle PCI-level
   resets, the controller capability NSSR bit (NVM Subsystem Reset Supported)
   is always cleared.

4) The boot partition support (BPS), Persistent Memory Region Supported (PMRS)
   and Controller Memory Buffer Supported (CMBS) capabilities are never
   reported.

Supported Features
------------------

The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
The driver also implements IRQ vector coalescing and submission queue
arbitration burst.

The maximum number of queues and the maximum data transfer size (MDTS) are
configurable through configfs before starting the controller. To avoid issues
with excessive local memory usage for executing commands, MDTS defaults to
512 KB and is limited to a maximum of 2 MB (arbitrary limit).

Minimum Number of PCI Address Mapping Windows Required
-------------------------------------------------------

Most PCI endpoint controllers provide a limited number of mapping windows for
mapping a PCI address range to local CPU memory addresses. The NVMe PCI
endpoint target controller uses mapping windows for the following:

1) One memory window for raising MSI or MSI-X interrupts
2) One memory window for MMIO transfers
3) One memory window for each completion queue

Given the highly asynchronous nature of the NVMe PCI endpoint target driver
operation, the memory windows as described above will generally not be used
simultaneously, but that may happen. So a safe maximum number of completion
queues that can be supported is equal to the total number of memory mapping
windows of the PCI endpoint controller minus two. E.g. for an endpoint PCI
controller with 32 outbound memory windows available, up to 30 completion
queues can be safely operated without any risk of getting PCI address mapping
errors due to the lack of memory windows.

Maximum Number of Queue Pairs
-----------------------------

Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
controller, BAR 0 is allocated with enough space to accommodate the admin
queue and multiple I/O queues. The maximum number of I/O queue pairs that can
be supported is limited by several factors:

1) The NVMe target core code limits the maximum number of I/O queues to the
   number of online CPUs.
2) The total number of queue pairs, including the admin queue, cannot exceed
   the number of MSI-X or MSI vectors available.
3) The total number of completion queues must not exceed the total number of
   PCI mapping windows minus 2 (see above).

The NVMe endpoint function driver allows configuring the maximum number of
queue pairs through configfs, as shown in the example below.
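
For example, limiting a controller to 8 I/O queue pairs could be done as
follows before starting the controller (a sketch using the subsystem and
endpoint function names from the User Guide below; the value of 8 is an
arbitrary choice)::

    # echo 8 > /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn/attr_qid_max
    # echo 9 > /sys/kernel/config/pci_ep/functions/nvmet_pci_epf/nvmepf.0/msix_interrupts

The number of MSI-X vectors is set to 9 here to account for the admin queue in
addition to the 8 I/O queue pairs, per the constraints listed above.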

Limitations and NVMe Specification Non-Compliance
-------------------------------------------------

Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
not support multiple submission queues using the same completion queue. All
submission queues must specify a unique completion queue.


User Guide
==========

This section describes the hardware requirements and how to set up an NVMe PCI
endpoint target device.

Kernel Requirements
-------------------

The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EPF enabled.
CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
(obviously).

In addition to this, at least one PCI endpoint controller driver should be
available for the endpoint hardware used.

To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
is also recommended. With this, a simple setup using a null_blk block device
as a subsystem namespace can be used.

Hardware Requirements
---------------------

To use the NVMe PCI endpoint target driver, at least one endpoint controller
device is required.

To find the list of endpoint controller devices in the system::

    # ls /sys/class/pci_epc/
    a40000000.pcie-ep

If PCI_ENDPOINT_CONFIGFS is enabled::

    # ls /sys/kernel/config/pci_ep/controllers
    a40000000.pcie-ep

The endpoint board must of course also be connected to a host with a PCI cable
with RX-TX signal swapped. If the host PCI slot used does not have
plug-and-play capabilities, the host should be powered off when the NVMe PCI
endpoint device is configured.

NVMe Endpoint Device
--------------------

Creating an NVMe endpoint device is a two-step process. First, an NVMe target
subsystem and port must be defined. Second, the NVMe PCI endpoint device must
be set up and bound to the subsystem and port created.

Creating an NVMe Subsystem and Port
-----------------------------------

Details about how to configure an NVMe target subsystem and port are outside
the scope of this document. The following only provides a simple example of a
port and subsystem with a single namespace backed by a null_blk device.

First, make sure that configfs is enabled::

    # mount -t configfs none /sys/kernel/config

Next, create a null_blk device (default settings give a 250 GB device without
memory backing). The block device created will be /dev/nullb0 by default::

    # modprobe null_blk
    # ls /dev/nullb0
    /dev/nullb0

The NVMe PCI endpoint function target driver must be loaded::

    # modprobe nvmet_pci_epf
    # lsmod | grep nvmet
    nvmet_pci_epf          32768  0
    nvmet                 118784  1 nvmet_pci_epf
    nvme_core             131072  2 nvmet_pci_epf,nvmet

Now, create a subsystem and a port that we will use to create a PCI target
controller when setting up the NVMe PCI endpoint target device.
In this example, the subsystem is created with a maximum of 4 I/O queue
pairs::

    # cd /sys/kernel/config/nvmet/subsystems
    # mkdir nvmepf.0.nqn
    # echo -n "Linux-pci-epf" > nvmepf.0.nqn/attr_model
    # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
    # echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
    # echo 1 > nvmepf.0.nqn/attr_allow_any_host
    # echo 4 > nvmepf.0.nqn/attr_qid_max

Next, create and enable the subsystem namespace using the null_blk block
device::

    # mkdir nvmepf.0.nqn/namespaces/1
    # echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
    # echo 1 > "nvmepf.0.nqn/namespaces/1/enable"

Finally, create the target port and link it to the subsystem::

    # cd /sys/kernel/config/nvmet/ports
    # mkdir 1
    # echo -n "pci" > 1/addr_trtype
    # ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
            /sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn

Creating an NVMe PCI Endpoint Device
------------------------------------

With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
device can now be created and enabled. The NVMe PCI endpoint target driver
should already be loaded (that is done automatically when the port is
created)::

    # ls /sys/kernel/config/pci_ep/functions
    nvmet_pci_epf

Next, create function 0::

    # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
    # mkdir nvmepf.0
    # ls nvmepf.0/
    baseclass_code    msix_interrupts   secondary
    cache_line_size   nvme              subclass_code
    deviceid          primary           subsys_id
    interrupt_pin     progif_code       subsys_vendor_id
    msi_interrupts    revid             vendorid

Configure the function using any device ID (the vendor ID for the device will
be automatically set to the same value as the NVMe target subsystem vendor
ID)::

    # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
    # echo 0xBEEF > nvmepf.0/deviceid
    # echo 32 > nvmepf.0/msix_interrupts

If the PCI endpoint controller used does not support MSI-X, MSI can be
configured instead::

    # echo 32 > nvmepf.0/msi_interrupts

Next, let's bind our endpoint device with the target subsystem and port that
we created::

    # echo 1 > nvmepf.0/nvme/portid
    # echo "nvmepf.0.nqn" > nvmepf.0/nvme/subsysnqn

The endpoint function can then be bound to the endpoint controller and the
controller started::

    # cd /sys/kernel/config/pci_ep
    # ln -s functions/nvmet_pci_epf/nvmepf.0 controllers/a40000000.pcie-ep/
    # echo 1 > controllers/a40000000.pcie-ep/start

On the endpoint machine, kernel messages will show information as the NVMe
target device and endpoint device are created and connected.

.. code-block:: text

    null_blk: disk nullb0 created
    null_blk: module loaded
    nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
    nvmet_pci_epf nvmet_pci_epf.0: PCI endpoint controller supports MSI-X, 32 vectors
    nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ab90791-2246-4fbb-961d-4c3d5a5a0176.
    nvmet_pci_epf nvmet_pci_epf.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B
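
At this point, the endpoint side is fully configured. If the setup needs to be
torn down later, the controller can be stopped and the function unbound by
reversing the last steps (a sketch, assuming the same controller and function
names as above)::

    # cd /sys/kernel/config/pci_ep
    # echo 0 > controllers/a40000000.pcie-ep/start
    # rm controllers/a40000000.pcie-ep/nvmepf.0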

PCI Root-Complex Host
---------------------

Booting the PCI host will result in the initialization of the PCIe link (this
may be signaled by the PCI endpoint driver with a kernel message). A kernel
message on the endpoint will also signal when the host NVMe driver enables the
device controller::

    nvmet_pci_epf nvmet_pci_epf.0: Enabling controller

On the host side, the NVMe PCI endpoint function target device is discoverable
as a PCI device, with the vendor ID and device ID as configured::

    # lspci -n
    0000:01:00.0 0108: 1b96:beef

And this device will be recognized as an NVMe device with a single namespace::

    # lsblk
    NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
    nvme0n1     259:0    0   250G  0 disk

The NVMe endpoint block device can then be used as any other regular NVMe
namespace block device. The *nvme* command line utility can be used to get
more detailed information about the endpoint device::

    # nvme id-ctrl /dev/nvme0
    NVME Identify Controller:
    vid       : 0x1b96
    ssvid     : 0x1b96
    sn        : 94993c85650ef7bcd625
    mn        : Linux-pci-epf
    fr        : 6.13.0-r
    rab       : 6
    ieee      : 000000
    cmic      : 0xb
    mdts      : 7
    cntlid    : 0x1
    ver       : 0x20100
    ...


Endpoint Bindings
=================

The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
attributes as follows.

================ ===========================================================
vendorid         Ignored (the vendor id of the NVMe target subsystem is used)
deviceid         Anything is OK (e.g. PCI_ANY_ID)
revid            Do not care
progif_code      Must be 0x02 (NVM Express)
baseclass_code   Must be 0x01 (PCI_BASE_CLASS_STORAGE)
subclass_code    Must be 0x08 (Non-Volatile Memory controller)
cache_line_size  Do not care
subsys_vendor_id Ignored (the subsystem vendor id of the NVMe target subsystem
                 is used)
subsys_id        Anything is OK (e.g. PCI_ANY_ID)
msi_interrupts   At least equal to the number of queue pairs desired
msix_interrupts  At least equal to the number of queue pairs desired
interrupt_pin    Interrupt PIN to use if MSI and MSI-X are not supported
================ ===========================================================

The NVMe PCI endpoint target function also has some specific configurable
fields defined in the *nvme* subdirectory of the function directory. These
fields are as follows.

================ ===========================================================
mdts_kb          Maximum data transfer size in KiB (default: 512)
portid           The ID of the target port to use
subsysnqn        The NQN of the target subsystem to use
================ ===========================================================
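
For example, the default 512 KiB maximum data transfer size can be changed
before starting the controller (a sketch using the function directory created
in the User Guide above; the 1024 KiB value is an arbitrary choice within the
2 MB limit)::

    # echo 1024 > /sys/kernel/config/pci_ep/functions/nvmet_pci_epf/nvmepf.0/nvme/mdts_kb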

Documentation/subsystem-apis.rst (+1 line)

  cdrom/index
  scsi/index
  target/index
+ nvme/index

  Other subsystems
  ----------------