.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE    POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host, and the guest OS should support the XIVE
native exploitation interrupt mode. If it does not, it should run
using the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.
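
  As an illustration only, the four views can be modeled as page
  indices in one mapping, of which only the last two are reachable by
  the guest. The enum and the function below are hypothetical and are
  not taken from the kernel sources; the page order simply follows the
  description above.

  .. code-block:: c

      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical page indices of the four TIMA views, in the order
       * described above (index 0 is the most privileged view). */
      enum tima_view {
          TIMA_VIEW_PHYS = 0, /* physical thread context */
          TIMA_VIEW_HV   = 1, /* hypervisor */
          TIMA_VIEW_OS   = 2, /* operating system */
          TIMA_VIEW_USER = 3, /* user level */
      };

      /* Returns true if the given TIMA view is exposed to the guest. */
      static bool tima_view_exposed_to_guest(enum tima_view view)
      {
          assert((unsigned int)view <= TIMA_VIEW_USER);
          return view == TIMA_VIEW_OS || view == TIMA_VIEW_USER;
      }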

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) with a
  pair of even/odd pages, which provide commands to manage the source:
  for instance, to trigger it, to EOI it, or to turn it off.
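
  A minimal sketch of how a fault handler could find the ESB page
  behind a faulting offset, assuming the ESB pages of all sources are
  laid out contiguously in one mapping, two pages per source, with the
  even page used to trigger and the odd page used for EOI and the other
  management commands. This layout and all the names are assumptions
  made for illustration.

  .. code-block:: c

      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical page kinds within a source's even/odd ESB pair. */
      enum esb_page_kind {
          ESB_TRIGGER_PAGE, /* assumed: even page */
          ESB_EOI_PAGE,     /* assumed: odd page (EOI/management) */
      };

      struct esb_location {
          uint64_t irq;            /* guest interrupt source number */
          enum esb_page_kind kind; /* page of the pair being accessed */
      };

      /* Translates a page offset in the (hypothetical) ESB mapping into
       * a source number and page kind. Returns false when the offset is
       * beyond the last source, where a real handler would fail the
       * access. */
      static bool esb_locate(uint64_t page_offset, uint64_t nr_sources,
                             struct esb_location *loc)
      {
          if (page_offset / 2 >= nr_sources)
              return false;
          loc->irq = page_offset / 2;
          loc->kind = (page_offset % 2) ? ESB_EOI_PAGE : ESB_TRIGGER_PAGE;
          return true;
      }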

  3. Device pass-through

  When a device is passed through to the guest, the source interrupts
  come from a different HW controller (PHB4), and the ESB pages exposed
  to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them. The handler inserts the ESB page corresponding to the HW
  interrupt of the device being passed through, or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.
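
  The remapping flow can be modeled in a few lines of userspace C. This
  is a hypothetical model, not the kernel implementation: the struct,
  the helpers and the numeric page identifiers (standing in for real
  pages) are invented for illustration. Mapping or unmapping a HW
  interrupt only updates the state and drops the populated page; the
  next guest access "faults" and picks the right page.

  .. code-block:: c

      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical per-guest-IRQ state; page ids stand in for pages. */
      struct guest_irq {
          uint64_t ipi_esb_page; /* initial IPI ESB page */
          uint64_t hw_esb_page;  /* ESB page of the passed-through HW irq */
          bool     hw_mapped;    /* true while a device HW irq is mapped */
          bool     page_present; /* models a populated VMA entry */
          uint64_t present_page; /* page currently inserted in the VMA */
      };

      /* Models clearing the ESB pages: the next access will fault. */
      static void esb_invalidate(struct guest_irq *irq)
      {
          irq->page_present = false;
      }

      /* Hypothetical analogues of the set_mapped/clr_mapped extensions. */
      static void irq_set_mapped(struct guest_irq *irq, uint64_t hw_page)
      {
          irq->hw_esb_page = hw_page;
          irq->hw_mapped = true;
          esb_invalidate(irq);
      }

      static void irq_clr_mapped(struct guest_irq *irq)
      {
          irq->hw_mapped = false;
          esb_invalidate(irq);
      }

      /* Models a guest access: on a fault, insert the HW ESB page if a
       * device interrupt is mapped, otherwise the initial IPI ESB page. */
      static uint64_t esb_access(struct guest_irq *irq)
      {
          if (!irq->page_present) {
              irq->present_page = irq->hw_mapped ? irq->hw_esb_page
                                                 : irq->ipi_esb_page;
              irq->page_present = true;
          }
          return irq->present_page;
      }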

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Syncs all the sources and queues and marks the EQ pages dirty. This
    ensures that a consistent memory state is captured when migrating
    the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., the highest possible vCPU id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_IDS.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================
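
    A sketch of the documented checks as a userspace model.
    HYP_MAX_VCPU_IDS is a hypothetical placeholder for
    KVM_MAX_VCPU_IDS, whose real value is defined by the kernel; the
    function names and the order of the two checks are assumptions,
    and the -EFAULT case (a bad user pointer) is not modeled.

    .. code-block:: c

        #include <assert.h>
        #include <errno.h>
        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical placeholder; the real KVM_MAX_VCPU_IDS comes
         * from the kernel. */
        #define HYP_MAX_VCPU_IDS 2048u

        /* The value to write: the highest possible vCPU id plus one. */
        static uint32_t nr_servers_for(uint32_t max_vcpu_id)
        {
            assert(max_vcpu_id < UINT32_MAX); /* avoid wrapping to 0 */
            return max_vcpu_id + 1;
        }

        /* Models the -EINVAL and -EBUSY cases of the table above. */
        static int check_nr_servers(uint32_t nr_servers, bool vcpu_connected)
        {
            if (nr_servers > HYP_MAX_VCPU_IDS)
                return -EINVAL;
            if (vcpu_connected)
                return -EBUSY;
            return 0;
        }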

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level |  type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================
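
  As a sketch, userspace could build the value pointed to by
  kvm_device_attr.addr as follows, while the source number itself goes
  in the attribute. The macro and function names are hypothetical;
  only the bit positions come from the layout above.

  .. code-block:: c

      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Bit positions from the layout above; the names are hypothetical. */
      #define SRC_TYPE_LSI (1ULL << 0) /* bit 0: 0 = MSI, 1 = LSI */
      #define SRC_LEVEL    (1ULL << 1) /* bit 1: assertion level of an LSI */

      /* Builds the __u64 source value. The level bit is only meaningful
       * for an LSI, so this sketch leaves it clear for an MSI. */
      static uint64_t source_attr_encode(bool lsi, bool asserted)
      {
          uint64_t val = 0;

          if (lsi) {
              val |= SRC_TYPE_LSI;
              if (asserted)
                  val |= SRC_LEVEL;
          }
          return val;
      }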

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |     eisn      | mask | server  | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================
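
  A sketch of packing and unpacking this value in userspace. The masks
  follow the bit ranges above (29 bits of server, 31 bits of EISN); the
  macro and function names are hypothetical.

  .. code-block:: c

      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Field layout from the table above; the names are hypothetical. */
      #define CFG_PRIO_MASK    0x7ULL                /* bits 2..0   */
      #define CFG_SERVER_SHIFT 3
      #define CFG_SERVER_MASK  0xfffffff8ULL         /* bits 31..3  */
      #define CFG_MASKED       (1ULL << 32)          /* bit 32      */
      #define CFG_EISN_SHIFT   33
      #define CFG_EISN_MASK    0xfffffffe00000000ULL /* bits 63..33 */

      /* Packs the targeting of one source into the __u64 value. */
      static uint64_t source_config_encode(uint32_t server, uint8_t priority,
                                           bool masked, uint32_t eisn)
      {
          assert(priority <= 7);       /* 3-bit field  */
          assert(server < (1u << 29)); /* 29-bit field */
          assert(eisn < (1u << 31));   /* 31-bit field */
          return ((uint64_t)eisn << CFG_EISN_SHIFT) |
                 (masked ? CFG_MASKED : 0) |
                 ((uint64_t)server << CFG_SERVER_SHIFT) |
                 priority;
      }

      /* Unpacks the same value (the inverse of the encoder). */
      static void source_config_decode(uint64_t val, uint32_t *server,
                                       uint8_t *priority, bool *masked,
                                       uint32_t *eisn)
      {
          *priority = (uint8_t)(val & CFG_PRIO_MASK);
          *server   = (uint32_t)((val & CFG_SERVER_MASK) >> CFG_SERVER_SHIFT);
          *masked   = (val & CFG_MASKED) != 0;
          *eisn     = (uint32_t)((val & CFG_EISN_MASK) >> CFG_EISN_SHIFT);
      }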

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |     unused    |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
            __u32 flags;
            __u32 qshift;
            __u64 qaddr;
            __u32 qtoggle;
            __u32 qindex;
            __u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
        forces notification without using the coalescing mechanism
        provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================
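
  A userspace sketch of the identifier encoding, of the queue size
  encoded by qshift, and of the "Invalid flags" check. The structure
  mirrors kvm_ppc_xive_eq with <stdint.h> types so that the example
  builds without kernel headers; HYP_EQ_ALWAYS_NOTIFY is a hypothetical
  stand-in for the value of KVM_XIVE_EQ_ALWAYS_NOTIFY, and the function
  names are illustrative.

  .. code-block:: c

      #include <assert.h>
      #include <errno.h>
      #include <stdint.h>

      /* Mirror of struct kvm_ppc_xive_eq using <stdint.h> types. */
      struct xive_eq_sketch {
          uint32_t flags;
          uint32_t qshift;
          uint64_t qaddr;
          uint32_t qtoggle;
          uint32_t qindex;
          uint8_t  pad[40];
      };

      /* 4 + 4 + 8 + 4 + 4 + 40 bytes, with no implicit padding. */
      _Static_assert(sizeof(struct xive_eq_sketch) == 64, "unexpected size");

      /* Hypothetical stand-in for the value of KVM_XIVE_EQ_ALWAYS_NOTIFY. */
      #define HYP_EQ_ALWAYS_NOTIFY 0x1u

      /* EQ descriptor identifier: server in bits 31..3, priority in 2..0. */
      static uint64_t eq_identifier(uint32_t server, uint8_t priority)
      {
          assert(priority <= 7 && server < (1u << 29));
          return ((uint64_t)server << 3) | priority;
      }

      /* Queue size in bytes encoded by qshift (a power of 2). */
      static uint64_t eq_size_bytes(const struct xive_eq_sketch *eq)
      {
          assert(eq->qshift < 64);
          return 1ULL << eq->qshift;
      }

      /* Models the "Invalid flags" case: ALWAYS_NOTIFY is required. */
      static int eq_check_flags(const struct xive_eq_sketch *eq)
      {
          return (eq->flags & HYP_EQ_ALWAYS_NOTIFY) ? 0 : -EINVAL;
      }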

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronizes the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |
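
  A value-level sketch of how the first 64 bits of the register hold
  the two TIMA words. The struct and helper names are hypothetical, and
  how the register is transferred to and from KVM (for instance its
  byte order) is not modeled here.

  .. code-block:: c

      #include <assert.h>
      #include <stdint.h>

      /* Value-level view of KVM_REG_PPC_VP_STATE (2 * 64 bits). */
      struct vp_state_sketch {
          uint64_t w01;    /* TIMA word0 in bits 63..32, word1 in 31..0 */
          uint64_t unused; /* bits 127..64 */
      };

      static struct vp_state_sketch vp_state_pack(uint32_t word0,
                                                  uint32_t word1)
      {
          struct vp_state_sketch s = {
              .w01 = ((uint64_t)word0 << 32) | word1,
              .unused = 0,
          };
          return s;
      }

      static uint32_t vp_state_word0(const struct vp_state_sketch *s)
      {
          return (uint32_t)(s->w01 >> 32);
      }

      static uint32_t vp_state_word1(const struct vp_state_sketch *s)
      {
          return (uint32_t)(s->w01 & 0xffffffffu);
      }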

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
     flush any in-flight event notification and to stabilize the EQs. At
     this stage, the EQ pages are marked dirty to make sure they are
     transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQ configuration
     and the state of the thread interrupt context registers.
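
  Step 1 can be sketched on an in-memory model of the sources, where
  each source's PQ state is a 2-bit value. This is a hypothetical
  model: a real implementation changes the PQ bits through the ESB
  pages rather than through an array, and the function name is
  illustrative. Keeping the previous PQ values is a choice of the
  sketch, so that the source states are available later.

  .. code-block:: c

      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* 2-bit PQ value used by this model; PQ=01 turns the source off. */
      #define PQ_01 0x1u

      /* Step 1 on a hypothetical in-memory model of the sources: record
       * each source's PQ state, then mask it with PQ=01 to stop the flow
       * of events. */
      static void mask_all_sources(uint8_t *pq, uint8_t *saved_pq,
                                   size_t nr_sources)
      {
          for (size_t i = 0; i < nr_sources; i++) {
              assert(pq[i] <= 0x3); /* only 2 bits are meaningful */
              saved_pq[i] = pq[i];
              pq[i] = PQ_01;
          }
      }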

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting.
  3. Restore the thread interrupt contexts.
  4. Restore the source states.
  5. Let the vCPU run.
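
  A small sketch that encodes the restore order above and checks that a
  planned sequence follows it, for instance in a test of a migration
  tool. The enum and the function are hypothetical.

  .. code-block:: c

      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Restore steps, in the order listed above. */
      enum restore_step {
          RESTORE_EQ_CONFIG,       /* targeting depends on the EQs */
          RESTORE_TARGETING,
          RESTORE_THREAD_CONTEXTS,
          RESTORE_SOURCE_STATES,
          RESTORE_RUN_VCPUS,
          RESTORE_NR_STEPS,        /* number of steps, not a step */
      };

      /* Returns true if seq contains every step exactly once, in order. */
      static bool restore_order_ok(const enum restore_step *seq, size_t n)
      {
          if (n != (size_t)RESTORE_NR_STEPS)
              return false;
          for (size_t i = 0; i < n; i++) {
              if (seq[i] != (enum restore_step)i)
                  return false;
          }
          return true;
      }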