.. SPDX-License-Identifier: GPL-2.0

===================
ice devlink support
===================

This document describes the devlink features implemented by the ``ice``
device driver.

Parameters
==========

.. list-table:: Generic parameters implemented

   * - Name
     - Mode
     - Notes
   * - ``enable_roce``
     - runtime
     - mutually exclusive with ``enable_iwarp``
   * - ``enable_iwarp``
     - runtime
     - mutually exclusive with ``enable_roce``

Info versions
=============

The ``ice`` driver reports the following versions:

.. list-table:: devlink info versions implemented
   :widths: 5 5 5 90

   * - Name
     - Type
     - Example
     - Description
   * - ``board.id``
     - fixed
     - K65390-000
     - The Product Board Assembly (PBA) identifier of the board.
   * - ``fw.mgmt``
     - running
     - 2.1.7
     - 3-digit version number of the management firmware running on the
       Embedded Management Processor of the device. It controls the PHY,
       link, access to device resources, etc. Intel documentation refers to
       this as the EMP firmware.
   * - ``fw.mgmt.api``
     - running
     - 1.5.1
     - 3-digit version number (major.minor.patch) of the API exported over
       the AdminQ by the management firmware. Used by the driver to
       identify what commands are supported. Historical versions of the
       kernel only displayed a 2-digit version number (major.minor).
   * - ``fw.mgmt.build``
     - running
     - 0x305d955f
     - Unique identifier of the source for the management firmware.
   * - ``fw.undi``
     - running
     - 1.2581.0
     - Version of the Option ROM containing the UEFI driver. The version is
       reported in ``major.minor.patch`` format. The major version is
       incremented whenever a major breaking change occurs, or when the
       minor version would overflow. The minor version is incremented for
       non-breaking changes and reset to 1 when the major version is
       incremented. The patch version is normally 0 but is incremented when
       a fix is delivered as a patch against an older base Option ROM.
   * - ``fw.psid.api``
     - running
     - 0.80
     - Version defining the format of the flash contents.
   * - ``fw.bundle_id``
     - running
     - 0x80002ec0
     - Unique identifier of the firmware image file that was loaded onto
       the device. Also referred to as the EETRACK identifier of the NVM.
   * - ``fw.app.name``
     - running
     - ICE OS Default Package
     - The name of the DDP package that is active in the device. The DDP
       package is loaded by the driver during initialization. Each
       variation of the DDP package has a unique name.
   * - ``fw.app``
     - running
     - 1.3.1.0
     - The version of the DDP package that is active in the device. Note
       that both the name (as reported by ``fw.app.name``) and version are
       required to uniquely identify the package.
   * - ``fw.app.bundle_id``
     - running
     - 0xc0000001
     - Unique identifier for the DDP package loaded in the device. Also
       referred to as the DDP Track ID. Can be used to uniquely identify
       the specific DDP package.
   * - ``fw.netlist``
     - running
     - 1.1.2000-6.7.0
     - The version of the netlist module. This module defines the device's
       Ethernet capabilities and default settings, and is used by the
       management firmware as part of managing link and device
       connectivity.
   * - ``fw.netlist.build``
     - running
     - 0xee16ced7
     - The first 4 bytes of the hash of the netlist module contents.
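
Dotted versions such as ``fw.mgmt.api`` are sequences of integers, so
comparing them as plain strings gives wrong answers (``1.10.0`` would sort
before ``1.9.0``). As a minimal sketch, a script consuming the reported value
could parse it into a tuple before comparing. The helper name
``parse_api_version`` is hypothetical, and padding the historical 2-digit
form with a patch level of 0 is an assumption made for illustration, not
documented driver behavior.

.. code:: python

    def parse_api_version(text):
        """Parse a dotted fw.mgmt.api value into (major, minor, patch)."""
        parts = [int(part) for part in text.split(".")]
        if len(parts) == 2:
            # Historical kernels reported only major.minor; padding the
            # patch level with 0 is an assumption made for this sketch.
            parts.append(0)
        if len(parts) != 3:
            raise ValueError(f"unexpected version format: {text!r}")
        return tuple(parts)

Tuples compare element by element, so
``parse_api_version("1.10.0") > parse_api_version("1.9.0")`` holds even though
the same comparison on the raw strings does not.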

Flash Update
============

The ``ice`` driver implements support for flash update using the
``devlink-flash`` interface. It supports updating the device flash using a
combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and
``fw.netlist`` components.

.. list-table:: List of supported overwrite modes
   :widths: 5 95

   * - Bits
     - Behavior
   * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
     - Do not preserve settings stored in the flash components being
       updated. This includes overwriting the port configuration that
       determines the number of physical functions the device will
       initialize with.
   * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
     - Do not preserve either settings or identifiers. Overwrite everything
       in the flash with the contents from the provided image, without
       performing any preservation. This includes overwriting device
       identifying fields such as the MAC address, VPD area, and device
       serial number. It is expected that this combination be used with an
       image customized for the specific device.

The ice hardware does not support overwriting only identifiers while
preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its
own will be rejected. If no overwrite mask is provided, the firmware will be
instructed to preserve all settings and identifying fields when updating.
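
The accepted combinations can be summarized as a small decision function. The
sketch below is an illustrative model of the rules stated above, not the
driver's code. The function name ``describe_overwrite_mask`` is hypothetical,
and the bit positions (settings in bit 0, identifiers in bit 1) are an
assumption about the devlink uAPI header that should be checked against the
kernel headers in use.

.. code:: python

    # Assumed bit positions of the devlink overwrite flags.
    DEVLINK_FLASH_OVERWRITE_SETTINGS = 1 << 0
    DEVLINK_FLASH_OVERWRITE_IDENTIFIERS = 1 << 1


    def describe_overwrite_mask(mask):
        """Return the preservation behavior for a mask, or raise ValueError."""
        both = (DEVLINK_FLASH_OVERWRITE_SETTINGS
                | DEVLINK_FLASH_OVERWRITE_IDENTIFIERS)
        if mask == 0:
            return "preserve settings and identifiers"
        if mask == DEVLINK_FLASH_OVERWRITE_SETTINGS:
            return "overwrite settings, preserve identifiers"
        if mask == both:
            return "overwrite settings and identifiers"
        # Identifiers alone (or any unknown bit) is not a supported mode.
        raise ValueError(f"unsupported overwrite mask: {mask:#x}")

Passing ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` alone raises ``ValueError``,
mirroring the rejection described above.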

Reload
======

The ``ice`` driver supports activating new firmware after a flash update
using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE``
action.

.. code:: shell

    $ devlink dev reload pci/0000:01:00.0 action fw_activate

The new firmware is activated by issuing a device specific Embedded
Management Processor reset which requests the device to reset and reload the
EMP firmware image.

The driver does not currently support reloading the driver via
``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``.

Port split
==========

The ``ice`` driver supports port splitting only for port 0, as the FW has
a predefined set of available port split options for the whole device.

A system reboot is required for port split to be applied.

The following command will select the port split option with 4 ports:

.. code:: shell

    $ devlink port split pci/0000:16:00.0/0 count 4

The list of all available port options will be printed to dynamic debug after
each ``split`` and ``unsplit`` command. The first option is the default.

.. code:: shell

    ice 0000:16:00.0: Available port split options and max port speeds (Gbps):
    ice 0000:16:00.0: Status  Split      Quad 0          Quad 1
    ice 0000:16:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
    ice 0000:16:00.0: Active  2     100   -   -   - 100   -   -   -
    ice 0000:16:00.0:         2      50   -  50   -   -   -   -   -
    ice 0000:16:00.0: Pending 4      25  25  25  25   -   -   -   -
    ice 0000:16:00.0:         4      25  25   -   -  25  25   -   -
    ice 0000:16:00.0:         8      10  10  10  10  10  10  10  10
    ice 0000:16:00.0:         1     100   -   -   -   -   -   -   -

There could be multiple FW port options with the same port split count. When
the same port split count request is issued again, the next FW port option with
the same port split count will be selected.

``devlink port unsplit`` will select the option with a split count of 1. If
there is no FW option available with split count 1, you will receive an error.
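
The selection rule can be illustrated with a short model. This is a sketch
under stated assumptions rather than the driver's implementation: the name
``next_option`` is hypothetical, the option list is copied from the example
log above, and wrapping around to the start of the list is an assumption.

.. code:: python

    # Split counts of the firmware port options, in the order shown in the
    # example log (the per-lane speeds are omitted).
    OPTION_COUNTS = [2, 2, 4, 4, 8, 1]


    def next_option(counts, current, requested):
        """Return the index of the next option with split count `requested`.

        The search starts just after `current` and wraps around (assumption),
        so repeating a request steps through all options with that count.
        """
        total = len(counts)
        for step in range(1, total + 1):
            index = (current + step) % total
            if counts[index] == requested:
                return index
        raise ValueError(f"no port option with split count {requested}")

With option 0 active, ``next_option(OPTION_COUNTS, 0, 4)`` returns 2, which
matches the ``Pending`` row in the log, and repeating the request from index 2
returns 3. An unsplit corresponds to requesting a count of 1, which raises
``ValueError`` when no such option exists.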

Regions
=======

The ``ice`` driver implements the following regions for accessing internal
device data.

.. list-table:: regions implemented
   :widths: 15 85

   * - Name
     - Description
   * - ``nvm-flash``
     - The contents of the entire flash chip, sometimes referred to as
       the device's Non Volatile Memory.
   * - ``shadow-ram``
     - The contents of the Shadow RAM, which is loaded from the beginning
       of the flash. Although the contents are primarily from the flash,
       this area also contains data generated during device boot which is
       not stored in flash.
   * - ``device-caps``
     - The contents of the device firmware's capabilities buffer. Useful to
       determine the current state and configuration of the device.

Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
snapshot. The ``device-caps`` region requires a snapshot as the contents are
sent by firmware and can't be split into separate reads.

Users can request an immediate capture of a snapshot for all three regions
via the ``DEVLINK_CMD_REGION_NEW`` command.

.. code:: shell

    $ devlink region show
    pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1
    pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10

    $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5

    $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30

    $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region new pci/0000:01:00.0/device-caps snapshot 1
    $ devlink region dump pci/0000:01:00.0/device-caps snapshot 1
    0000000000000000 01 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00
    0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000020 02 00 02 01 32 03 00 00 0a 00 00 00 25 00 00 00
    0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000040 04 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000060 05 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00
    0000000000000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000080 06 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000a0 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000c0 12 00 01 00 01 00 00 00 01 00 01 00 00 00 00 00
    00000000000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000e0 13 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00
    00000000000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000100 14 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000120 15 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000140 16 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000160 17 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00
    0000000000000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000180 18 00 01 00 01 00 00 00 01 00 00 00 08 00 00 00
    0000000000000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001a0 22 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    00000000000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001c0 40 00 01 00 00 08 00 00 08 00 00 00 00 00 00 00
    00000000000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001e0 41 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    00000000000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000200 42 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1

Devlink Rate
============

The ``ice`` driver implements the devlink-rate API, which allows
Hierarchical QoS to be offloaded to the hardware. It enables the user to
group Virtual Functions in a tree structure and assign the supported
parameters ``tx_share``, ``tx_max``, ``tx_priority`` and ``tx_weight`` to
each node in the tree. This effectively lets the user control how much
bandwidth is allocated to each VF group. The allocation is then enforced by
the HW.

It is assumed that this feature is mutually exclusive with DCB performed
in FW and with ADQ, or with any driver feature that would trigger changes in
QoS, for example the creation of a new traffic class. The driver will
prevent DCB or ADQ configuration once the user has started making changes to
the nodes using the devlink-rate API. To configure those features, a driver
reload is necessary. Correspondingly, if ADQ or DCB is configured, the driver
won't export the hierarchy at all, or will remove the untouched hierarchy if
those features are enabled after the hierarchy is exported but before any
changes are made.

This feature also depends on switchdev being enabled in the system. This is
required because devlink-rate requires devlink-port objects to be present,
and those objects are only created in switchdev mode.

If the driver is set to switchdev mode, it will export the internal
hierarchy the moment VFs are created. The root of the tree is always
represented by ``node_0``. This node can't be deleted by the user. Leaf
nodes and nodes with children also can't be deleted.
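
The deletion rules can be expressed as a small check over a parent map. This
is a sketch for illustration only: the function name ``can_delete`` and the
tree below (including ``node_spare`` and the leaf name ``vf2``) are
hypothetical, and the check models just the three rules stated above, not
any additional restrictions the driver may apply.

.. code:: python

    def can_delete(name, parents, leaves):
        """Apply the documented deletion rules to one node of the tree."""
        if name in leaves:
            return False  # leaf nodes can't be deleted
        if parents[name] is None:
            return False  # the root (node_0) can't be deleted
        # A node that is still the parent of another entry has children.
        return all(parent != name for parent in parents.values())


    # Hypothetical tree: child -> parent (None marks the root).
    PARENTS = {
        "node_0": None,
        "node_custom": "node_0",
        "node_custom_1": "node_custom",
        "node_spare": "node_0",
        "vf2": "node_custom_1",
    }
    LEAVES = {"vf2"}

In this tree only ``node_spare`` passes the check: ``node_custom`` and
``node_custom_1`` still have children, ``vf2`` is a leaf, and ``node_0`` is
the root.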

.. list-table:: Attributes supported
   :widths: 15 85

   * - Name
     - Description
   * - ``tx_max``
     - Maximum bandwidth that the tree node may consume. The rate limit is
       an absolute number specifying the maximum number of bytes a node may
       consume in one second. The rate limit guarantees that a link will
       not oversaturate the receiver on the remote end, and it also
       enforces an SLA between the subscriber and the network provider.
   * - ``tx_share``
     - Minimum bandwidth allocated to a tree node when it is not blocked.
       It specifies an absolute BW. While ``tx_max`` defines the maximum
       bandwidth the node may consume, ``tx_share`` marks the committed BW
       for the node.
   * - ``tx_priority``
     - Allows a strict priority arbiter to be used among siblings. This
       arbitration scheme attempts to schedule nodes based on their
       priority as long as the nodes remain within their bandwidth limit.
       Range 0-7. Nodes with priority 7 have the highest priority and are
       selected first, while nodes with priority 0 have the lowest
       priority. Nodes that have the same priority are treated equally.
   * - ``tx_weight``
     - Allows the Weighted Fair Queuing arbitration scheme to be used among
       siblings. This arbitration scheme can be used simultaneously with
       strict priority. Range 1-200. Only relative values matter for
       arbitration.

``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
nodes with the same priority form a WFQ subgroup in the sibling group
and arbitration among them is based on assigned weights.
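
To make the combined scheme concrete, the sketch below computes how
bandwidth would be divided inside the highest-priority WFQ subgroup. It is a
simplified model, not the hardware scheduler: the name
``arbitration_shares`` is hypothetical, and it assumes every node in the top
subgroup always has traffic to send and stays within its ``tx_max``, so
lower-priority siblings get nothing.

.. code:: python

    def arbitration_shares(siblings):
        """Return the bandwidth fraction of each top-priority sibling.

        `siblings` maps a node name to a (tx_priority, tx_weight) pair.
        """
        for name, (priority, weight) in siblings.items():
            if not 0 <= priority <= 7:
                raise ValueError(f"{name}: tx_priority must be in 0-7")
            if not 1 <= weight <= 200:
                raise ValueError(f"{name}: tx_weight must be in 1-200")
        # Strict priority: only the highest priority present is served.
        top = max(priority for priority, _ in siblings.values())
        group = {name: weight
                 for name, (priority, weight) in siblings.items()
                 if priority == top}
        # WFQ inside the subgroup: shares are proportional to the weights.
        total = sum(group.values())
        return {name: weight / total for name, weight in group.items()}

For example, ``arbitration_shares({"vf1": (7, 5), "vf2": (7, 15), "vf3": (0, 200)})``
returns ``{"vf1": 0.25, "vf2": 0.75}``. The large weight of ``vf3`` has no
effect because its priority is lower, and only the 5:15 ratio matters, not
the absolute weight values.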

.. code:: shell

    # enable switchdev
    $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev

    # at this point driver should export internal hierarchy
    $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs

    $ devlink port function rate show
    pci/0000:4b:00.0/node_25: type node parent node_24
    pci/0000:4b:00.0/node_24: type node parent node_0
    pci/0000:4b:00.0/node_32: type node parent node_31
    pci/0000:4b:00.0/node_31: type node parent node_30
    pci/0000:4b:00.0/node_30: type node parent node_16
    pci/0000:4b:00.0/node_19: type node parent node_18
    pci/0000:4b:00.0/node_18: type node parent node_17
    pci/0000:4b:00.0/node_17: type node parent node_16
    pci/0000:4b:00.0/node_14: type node parent node_5
    pci/0000:4b:00.0/node_5: type node parent node_3
    pci/0000:4b:00.0/node_13: type node parent node_4
    pci/0000:4b:00.0/node_12: type node parent node_4
    pci/0000:4b:00.0/node_11: type node parent node_4
    pci/0000:4b:00.0/node_10: type node parent node_4
    pci/0000:4b:00.0/node_9: type node parent node_4
    pci/0000:4b:00.0/node_8: type node parent node_4
    pci/0000:4b:00.0/node_7: type node parent node_4
    pci/0000:4b:00.0/node_6: type node parent node_4
    pci/0000:4b:00.0/node_4: type node parent node_3
    pci/0000:4b:00.0/node_3: type node parent node_16
    pci/0000:4b:00.0/node_16: type node parent node_15
    pci/0000:4b:00.0/node_15: type node parent node_0
    pci/0000:4b:00.0/node_2: type node parent node_1
    pci/0000:4b:00.0/node_1: type node parent node_0
    pci/0000:4b:00.0/node_0: type node
    pci/0000:4b:00.0/1: type leaf parent node_25
    pci/0000:4b:00.0/2: type leaf parent node_25

    # let's create some custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0

    # second custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom

    # reassign second VF to newly created branch
    $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1

    # assign tx_weight to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5

    # assign tx_share to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps