Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs: net: dsa: update information about multiple CPU ports

DSA now supports multiple CPU ports, explain the use cases that are
covered, the new UAPI, the permitted degrees of freedom, the driver API,
and remove some old "hanging fruits".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Authored by Vladimir Oltean and committed by Paolo Abeni.
Commit 0773e3a8, parent acc43b7b.

2 files changed, +128 -6

Documentation/networking/dsa/configuration.rst (+96)
···
 *eth0*
   the master interface
 
+*eth1*
+  another master interface
+
 *lan1*
   a slave interface
 
···
 
 Script writers are therefore encouraged to use the ``master static`` set of
 flags when working with bridge FDB entries on DSA switch interfaces.
+
+Affinity of user ports to CPU ports
+-----------------------------------
+
+Typically, DSA switches are attached to the host via a single Ethernet
+interface, but in cases where the switch chip is discrete, the hardware design
+may permit the use of 2 or more ports connected to the host, for an increase in
+termination throughput.
+
+DSA can make use of multiple CPU ports in two ways. First, it is possible to
+statically assign the termination traffic associated with a certain user port
+to be processed by a certain CPU port. This way, user space can implement
+custom policies of static load balancing between user ports, by spreading the
+affinities according to the available CPU ports.
+
+Secondly, it is possible to perform load balancing between CPU ports on a per
+packet basis, rather than statically assigning user ports to CPU ports.
+This can be achieved by placing the DSA masters under a LAG interface (bonding
+or team). DSA monitors this operation and creates a mirror of this software LAG
+on the CPU ports facing the physical DSA masters that constitute the LAG slave
+devices.
+
+To make use of multiple CPU ports, the firmware (device tree) description of
+the switch must mark all the links between CPU ports and their DSA masters
+using the ``ethernet`` reference/phandle. At startup, only a single CPU port
+and DSA master will be used - the numerically first port from the firmware
+description which has an ``ethernet`` property. It is up to the user to
+configure the system for the switch to use other masters.
+
+DSA uses the ``rtnl_link_ops`` mechanism (with a "dsa" ``kind``) to allow
+changing the DSA master of a user port. The ``IFLA_DSA_MASTER`` u32 netlink
+attribute contains the ifindex of the master device that handles each slave
+device. The DSA master must be a valid candidate based on firmware node
+information, or a LAG interface which contains only slaves which are valid
+candidates.
+
+Using iproute2, the following manipulations are possible:
+
+  .. code-block:: sh
+
+    # See the DSA master in current use
+    ip -d link show dev swp0
+        (...)
+        dsa master eth0
+
+    # Static CPU port distribution
+    ip link set swp0 type dsa master eth1
+    ip link set swp1 type dsa master eth0
+    ip link set swp2 type dsa master eth1
+    ip link set swp3 type dsa master eth0
+
+    # CPU ports in LAG, using explicit assignment of the DSA master
+    ip link add bond0 type bond mode balance-xor && ip link set bond0 up
+    ip link set eth1 down && ip link set eth1 master bond0
+    ip link set swp0 type dsa master bond0
+    ip link set swp1 type dsa master bond0
+    ip link set swp2 type dsa master bond0
+    ip link set swp3 type dsa master bond0
+    ip link set eth0 down && ip link set eth0 master bond0
+    ip -d link show dev swp0
+        (...)
+        dsa master bond0
+
+    # CPU ports in LAG, relying on implicit migration of the DSA master
+    ip link add bond0 type bond mode balance-xor && ip link set bond0 up
+    ip link set eth0 down && ip link set eth0 master bond0
+    ip link set eth1 down && ip link set eth1 master bond0
+    ip -d link show dev swp0
+        (...)
+        dsa master bond0
+
+Notice that in the case of CPU ports under a LAG, the use of the
+``IFLA_DSA_MASTER`` netlink attribute is not strictly needed, but rather, DSA
+reacts to the ``IFLA_MASTER`` attribute change of its present master (``eth0``)
+and migrates all user ports to the new upper of ``eth0``, ``bond0``. Similarly,
+when ``bond0`` is destroyed using ``RTM_DELLINK``, DSA migrates the user ports
+that were assigned to this interface to the first physical DSA master which is
+eligible, based on the firmware description (it effectively reverts to the
+startup configuration).
+
+In a setup with more than 2 physical CPU ports, it is therefore possible to mix
+static user to CPU port assignment with LAG between DSA masters. It is not
+possible to statically assign a user port towards a DSA master that has any
+upper interfaces (this includes LAG devices - the master must always be the LAG
+in this case).
+
+Live changing of the DSA master (and thus CPU port) affinity of a user port is
+permitted, in order to allow dynamic redistribution in response to traffic.
+
+Physical DSA masters are allowed to join and leave a LAG interface used as a
+DSA master at any time; however, DSA will reject a LAG interface as a valid
+candidate for being a DSA master unless it has at least one physical DSA master
+as a slave device.
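Before choosing masters for the iproute2 commands, a script needs to know which interfaces DSA has set up as masters. A minimal sketch, assuming that DSA exposes the ``dsa/tagging`` sysfs attribute (described in the kernel's sysfs ABI documentation for DSA) on every physical DSA master it sets up from the firmware description. ``dsa_list_masters`` and the ``SYSFS_NET`` override are hypothetical names used for illustration and testing.

```sh
#!/bin/sh
# Hypothetical helper: print the names of interfaces that carry the
# dsa/tagging sysfs attribute, i.e. (by assumption) the physical DSA masters.
# SYSFS_NET is a hypothetical override for testing; default /sys/class/net.
dsa_list_masters() {
	for tagging in "${SYSFS_NET:-/sys/class/net}"/*/dsa/tagging; do
		[ -e "$tagging" ] || continue   # glob matched nothing
		dev=${tagging%/dsa/tagging}     # strip the attribute path
		echo "${dev##*/}"               # keep only the interface name
	done
}
```

The glob expands in sorted order, so the output lists masters alphabetically, which is not necessarily the firmware (CPU port) order.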
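The static load-balancing policy described under "Affinity of user ports to CPU ports" can be as simple as a round-robin spread of user ports over the available masters. The helper below is hypothetical (it is not part of iproute2 or DSA). It only prints the ``ip link set ... type dsa master ...`` commands from the documentation, so they can be reviewed before being executed.

```sh
#!/bin/sh
# Hypothetical helper: statically spread user ports over DSA masters in
# round-robin order. $1 is a whitespace-separated list of DSA masters and
# the remaining arguments are user ports. Commands are printed, not run.
dsa_spread_round_robin() {
	masters=$1
	shift
	nmasters=$(echo "$masters" | awk '{ print NF }')
	[ "$nmasters" -gt 0 ] || return 1
	i=0
	for port in "$@"; do
		# Pick field number (i mod nmasters) + 1 of the master list
		idx=$((i % nmasters + 1))
		master=$(echo "$masters" | awk -v n="$idx" '{ print $n }')
		echo "ip link set $port type dsa master $master"
		i=$((i + 1))
	done
}
```

For example, ``dsa_spread_round_robin "eth1 eth0" swp0 swp1 swp2 swp3`` prints the same static distribution as the iproute2 example, and appending ``| sh`` applies it.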
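A script may want to pre-check the documented rule before assigning a LAG as DSA master: the LAG must have at least one slave, and every slave must be an eligible physical DSA master. A minimal sketch, assuming a bonding (not team) device, whose slaves the bonding driver lists in ``/sys/class/net/<bond>/bonding/slaves``. The function name and the ``SYSFS_NET`` override are hypothetical, and the kernel still performs the authoritative check when ``IFLA_DSA_MASTER`` is set.

```sh
#!/bin/sh
# Hypothetical pre-check mirroring the documented rule: a bonding device is
# only offered as DSA master if it has at least one slave and every slave is
# one of the physical DSA masters given as arguments. SYSFS_NET is a
# hypothetical override for testing; it defaults to /sys/class/net.
dsa_lag_is_candidate() {
	bond=$1
	shift
	slaves_file=${SYSFS_NET:-/sys/class/net}/$bond/bonding/slaves
	[ -r "$slaves_file" ] || return 1   # missing, or not a bonding device
	slaves=$(cat "$slaves_file")
	[ -n "$slaves" ] || return 1        # no slaves at all: reject
	for slave in $slaves; do
		found=0
		for master in "$@"; do
			if [ "$slave" = "$master" ]; then
				found=1
			fi
		done
		[ "$found" -eq 1 ] || return 1  # slave is not a DSA master: reject
	done
	return 0
}
```

Usage: ``dsa_lag_is_candidate bond0 eth0 eth1 && ip link set swp0 type dsa master bond0``.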
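Because live changes of the DSA master affinity are permitted, a traffic-aware redistribution can be planned in user space. The sketch below uses a greedy "busiest port to the least loaded master" heuristic. This is one possible policy chosen for illustration, not something DSA implements, and ``dsa_plan_by_load`` is a hypothetical name. It reads ``<port> <bytes>`` pairs on standard input and prints commands.

```sh
#!/bin/sh
# Hypothetical planner: reads "<user port> <bytes>" lines on stdin and
# assigns the busiest ports first, each to the DSA master (positional
# arguments) with the lowest load accumulated so far. Commands are printed.
dsa_plan_by_load() {
	[ $# -gt 0 ] || return 1
	sort -k2,2nr -k1,1 | awk -v masters="$*" '
	BEGIN {
		n = split(masters, m, " ")
		for (i = 1; i <= n; i++)
			load[i] = 0
	}
	{
		# find the least loaded master; ties go to the earliest one
		best = 1
		for (i = 2; i <= n; i++)
			if (load[i] < load[best])
				best = i
		load[best] += $2
		printf "ip link set %s type dsa master %s\n", $1, m[best]
	}'
}
```

The input could come from the per-interface counters, e.g. ``for p in swp0 swp1 swp2 swp3; do echo "$p $(cat /sys/class/net/$p/statistics/rx_bytes)"; done | dsa_plan_by_load eth0 eth1``. These counters are cumulative, so a real policy would sample twice and feed the differences.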

Documentation/networking/dsa/dsa.rst (+32 -6)
···
 Ethernet switch will be able to process these incoming frames from the
 management interface and deliver them to the physical switch port.
 
+When using multiple CPU ports, it is possible to stack a LAG (bonding/team)
+device between the DSA slave devices and the physical DSA masters. The LAG
+device is thus also a DSA master, but the LAG slave devices continue to be DSA
+masters as well (just with no user port assigned to them; this is needed for
+recovery in case the LAG DSA master disappears). Thus, the data path of the LAG
+DSA master is used asymmetrically. On RX, the ``ETH_P_XDSA`` handler, which
+calls ``dsa_switch_rcv()``, is invoked early (on the physical DSA master;
+LAG slave). Therefore, the RX data path of the LAG DSA master is not used.
+On the other hand, TX takes place linearly: ``dsa_slave_xmit`` calls
+``dsa_enqueue_skb``, which calls ``dev_queue_xmit`` towards the LAG DSA master.
+The latter calls ``dev_queue_xmit`` towards one physical DSA master or the
+other, and in both cases, the packet exits the system through a hardware path
+towards the switch.
+
 Graphical representation
 ------------------------
 
···
 probing only to be torn down immediately afterwards, for example in case its
 PHY cannot be found. In this case, probing of the DSA switch continues
 without that particular port.
+
+- ``port_change_master``: method through which the affinity (association used
+  for traffic termination purposes) between a user port and a CPU port can be
+  changed. By default all user ports from a tree are assigned to the first
+  available CPU port that makes sense for them (most of the time this means
+  the user ports of a tree are all assigned to the same CPU port, except for H
+  topologies as described in commit 2c0b03258b8b). The ``port`` argument
+  represents the index of the user port, and the ``master`` argument represents
+  the new DSA master ``net_device``. The CPU port associated with the new
+  master can be retrieved by looking at ``struct dsa_port *cpu_dp =
+  master->dsa_ptr``. Additionally, the master can also be a LAG device where
+  all the slave devices are physical DSA masters. LAG DSA masters also have a
+  valid ``master->dsa_ptr`` pointer, however this is not unique, but rather a
+  duplicate of the first physical DSA master's (LAG slave) ``dsa_ptr``. In case
+  of a LAG DSA master, a further call to ``port_lag_join`` will be emitted
+  separately for the physical CPU ports associated with the physical DSA
+  masters, requesting them to create a hardware LAG associated with the LAG
+  interface.
 
 PHY devices and link management
 -------------------------------
···
 the other DSA enforces a fairly strict device driver model, and deals with most
 of the switch specific. At some point we should envision a merger between these
 two subsystems and get the best of both worlds.
-
-Other hanging fruits
---------------------
-
-- allowing more than one CPU/management interface:
-  http://comments.gmane.org/gmane.linux.network/365657