Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: networking: add Twisted Pair Ethernet diagnostics at OSI Layer 1

This patch introduces a diagnostic guide for troubleshooting Twisted
Pair Ethernet variants at OSI Layer 1. It provides detailed steps for
detecting and resolving common link issues, such as incorrect wiring,
cable damage, and power delivery problems. The guide also includes
interface verification steps and PHY-specific diagnostics.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241004121824.1716303-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Authored by Oleksij Rempel and committed by Paolo Abeni
e793b86a a17b9b3a

+785
+17
Documentation/networking/diagnostic/index.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Networking Diagnostics
+======================
+
+.. toctree::
+   :maxdepth: 2
+
+   twisted_pair_layer1_diagnostics.rst
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
+767
Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+Diagnostic Concept for Investigating Twisted Pair Ethernet Variants at OSI Layer 1
+==================================================================================
+
+Introduction
+------------
+
+This documentation is designed for two primary audiences:
+
+1. **Users and System Administrators**: For those dealing with real-world
+   Ethernet issues, this guide provides a practical, step-by-step
+   troubleshooting flow to help identify and resolve common problems in Twisted
+   Pair Ethernet at OSI Layer 1. If you're facing unstable links, speed drops,
+   or mysterious network issues, jump right into the step-by-step guide and
+   follow it through to find your solution.
+
+2. **Kernel Developers**: For developers working with network drivers and PHY
+   support, this documentation outlines the diagnostic process and highlights
+   areas where the Linux kernel’s diagnostic interfaces could be extended or
+   improved. By understanding the diagnostic flow, developers can better
+   prioritize future enhancements.
+
+Step-by-Step Diagnostic Guide from Linux (General Ethernet)
+-----------------------------------------------------------
+
+This diagnostic guide covers common Ethernet troubleshooting scenarios,
+focusing on **link stability and detection** across different Ethernet
+environments, including **Single-Pair Ethernet (SPE)** and **Multi-Pair
+Ethernet (MPE)**, as well as power delivery technologies like **PoDL** (Power
+over Data Line) and **PoE** (Clause 33 PSE).
+
+The guide is designed to help users diagnose physical layer (Layer 1) issues on
+systems running **Linux kernel version 6.11 or newer**, utilizing **ethtool
+version 6.10 or later** and **iproute2 version 6.4.0 or later**.
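The version requirements stated above can be checked with a small script. This is a minimal sketch and not part of the patch: the helper names `version_ge` and `check` are hypothetical, and it assumes a `sort` that supports `-V` (as in GNU coreutils) plus the usual `ethtool version X.Y` and `iproute2-X.Y.Z` version strings.

```shell
#!/bin/sh
# Sketch only: compare installed versions with the minimums named in the guide.

# version_ge A B: succeed when version A is at least version B.
# Assumes a `sort` with -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check NAME HAVE WANT: print a one-line verdict for one component.
check() {
    if [ -z "$2" ]; then
        echo "$1: version unknown (need >= $3)"
    elif version_ge "$2" "$3"; then
        echo "$1 $2: ok (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# A kernel release such as "6.11.0-rc3-foo" is cut down to "6.11.0".
kernel=$(uname -r | sed 's/[^0-9.].*//')
# A tool that is missing, or prints an unexpected format, yields an empty string.
ethtool_v=$(ethtool --version 2>/dev/null | awk '{print $3}')
iproute_v=$(ip -V 2>/dev/null | sed -n 's/.*iproute2-\([0-9][0-9.]*\).*/\1/p')

check kernel "$kernel" 6.11
check ethtool "$ethtool_v" 6.10
check iproute2 "$iproute_v" 6.4.0
```

A "version unknown" line means the tool is not installed or reports its version in a format this sketch does not parse, so check it by hand.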
+
+In this guide, we assume that users may have **limited or no access to the link
+partner** and will focus on diagnosing issues locally.
+
+Diagnostic Scenarios
+~~~~~~~~~~~~~~~~~~~~
+
+- **Link is up and stable, but no data transfer**: If the link is stable but
+  there are issues with data transmission, refer to the **OSI Layer 2
+  Troubleshooting Guide**.
+
+- **Link is unstable**: Link resets, speed drops, or other fluctuations
+  indicate potential issues at the hardware or physical layer.
+
+- **No link detected**: The interface is up, but no link is established.
+
+Verify Interface Status
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Begin by verifying the status of the Ethernet interface to check if it is
+administratively up. `ethtool` provides information on the link and PHY
+status, but it does not show the **administrative state** of the interface.
+To check this, use the `ip` command, which lists the interface state flags
+within the angle brackets `"<>"` in its output.
+
+For example, in the output `<NO-CARRIER,BROADCAST,MULTICAST,UP>`, the important
+keywords are:
+
+- **UP**: The interface is in the administrative "UP" state.
+- **NO-CARRIER**: The interface is administratively up, but no physical link is
+  detected.
+
+If the output shows `<BROADCAST,MULTICAST>`, this indicates the interface is in
+the administrative "DOWN" state.
+
+- **Command:** `ip link show dev <interface>`
+
+- **Expected Output:**
+
+  .. code-block:: bash
+
+     4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
+          link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff
+
+- **Interpreting the Output:**
+
+  - **Administrative UP State**:
+
+    - If the output contains **"UP"**, the interface is administratively up,
+      and the system is trying to establish a physical link.
+
+    - If you also see **"NO-CARRIER"**, it means the physical link has not been
+      detected, indicating potential Layer 1 issues like a cable fault,
+      misconfiguration, or no connection at the link partner. In this case,
+      proceed to the **Inspect Link Status and PHY Configuration** section.
+
+  - **Administrative DOWN State**:
+
+    - If the output lacks **"UP"** and shows only states like
+      **"<BROADCAST,MULTICAST>"**, it means the interface is administratively
+      down. In this case, bring the interface up using the following command:
+
+      .. code-block:: bash
+
+         ip link set dev <interface> up
+
+- **Next Steps**:
+
+  - If the interface is **administratively up** but shows **NO-CARRIER**,
+    proceed to the **Inspect Link Status and PHY Configuration** section to
+    troubleshoot potential physical layer issues.
+
+  - If the interface was **administratively down** and you have brought it up,
+    be sure to **repeat this verification step** to confirm the new state of
+    the interface before proceeding.
+
+  - **If the interface is up and the link is detected**:
+
+    - If the output shows **"UP"** and there is **no `NO-CARRIER`**, the
+      interface is administratively up, and the physical link has been
+      successfully established. If everything is working as expected, the Layer
+      1 diagnostics are complete, and no further action is needed.
+
+    - If the interface is up and the link is detected but **no data is being
+      transferred**, the issue is likely beyond Layer 1, and you should proceed
+      with diagnosing the higher layers of the OSI model. This may involve
+      checking Layer 2 configurations (such as VLANs or MAC address issues),
+      Layer 3 settings (like IP addresses, routing, or ARP), or Layer 4 and
+      above (firewalls, services, etc.).
+
+    - If the **link is unstable** or **frequently resetting or dropping**, this
+      may indicate a physical layer issue such as a faulty cable, interference,
+      or power delivery problems. In this case, proceed with the next step in
+      this guide.
+
+Inspect Link Status and PHY Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use `ethtool -I` to check the link status, PHY configuration, supported link
+modes, and additional statistics such as the **Link Down Events** counter. This
+step is essential for diagnosing Layer 1 problems such as speed mismatches,
+duplex issues, and link instability.
+
+For both **Single-Pair Ethernet (SPE)** and **Multi-Pair Ethernet (MPE)**
+devices, you will use this step to gather key details about the link. **SPE**
+links generally support a single speed and mode without autonegotiation (with
+the exception of **10BaseT1L**), while **MPE** devices typically support
+multiple link modes and autonegotiation.
+
+- **Command:** `ethtool -I <interface>`
+
+- **Example Output for SPE Interface (Non-autonegotiation)**:
+
+  .. code-block:: bash
+
+     Settings for spe4:
+             Supported ports: [ TP ]
+             Supported link modes:   100baseT1/Full
+             Supported pause frame use: No
+             Supports auto-negotiation: No
+             Supported FEC modes: Not reported
+             Advertised link modes:  Not applicable
+             Advertised pause frame use: No
+             Advertised auto-negotiation: No
+             Advertised FEC modes: Not reported
+             Speed: 100Mb/s
+             Duplex: Full
+             Auto-negotiation: off
+             master-slave cfg: forced slave
+             master-slave status: slave
+             Port: Twisted Pair
+             PHYAD: 6
+             Transceiver: external
+             MDI-X: Unknown
+             Supports Wake-on: d
+             Wake-on: d
+             Link detected: yes
+             SQI: 7/7
+             Link Down Events: 2
+
+- **Example Output for MPE Interface (Autonegotiation)**:
+
+  .. code-block:: bash
+
+     Settings for eth1:
+             Supported ports: [ TP    MII ]
+             Supported link modes:   10baseT/Half 10baseT/Full
+                                     100baseT/Half 100baseT/Full
+             Supported pause frame use: Symmetric Receive-only
+             Supports auto-negotiation: Yes
+             Supported FEC modes: Not reported
+             Advertised link modes:  10baseT/Half 10baseT/Full
+                                     100baseT/Half 100baseT/Full
+             Advertised pause frame use: Symmetric Receive-only
+             Advertised auto-negotiation: Yes
+             Advertised FEC modes: Not reported
+             Link partner advertised link modes:  10baseT/Half 10baseT/Full
+                                                  100baseT/Half 100baseT/Full
+             Link partner advertised pause frame use: Symmetric Receive-only
+             Link partner advertised auto-negotiation: Yes
+             Link partner advertised FEC modes: Not reported
+             Speed: 100Mb/s
+             Duplex: Full
+             Auto-negotiation: on
+             Port: Twisted Pair
+             PHYAD: 10
+             Transceiver: internal
+             MDI-X: Unknown
+             Supports Wake-on: pg
+             Wake-on: p
+             Link detected: yes
+             Link Down Events: 1
+
+- **Next Steps**:
+
+  - Record the output provided by `ethtool`, particularly noting the
+    **master-slave status**, **speed**, **duplex**, and other relevant fields.
+    This information will be useful for further analysis or troubleshooting.
+    Once the **ethtool** output has been collected and stored, move on to the
+    next diagnostic step.
+
+Check Power Delivery (PoDL or PoE)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If it is known that **PoDL** or **PoE** is **not implemented** on the system,
+or the **PSE** (Power Sourcing Equipment) is managed by proprietary user-space
+software or external tools, you can skip this step. In such cases, verify power
+delivery through alternative methods, such as checking hardware indicators
+(LEDs), using multimeters, or consulting vendor-specific software for
+monitoring power status.
+
+If **PoDL** or **PoE** is implemented and managed directly by Linux, follow
+these steps to ensure power is being delivered correctly:
+
+- **Command:** `ethtool --show-pse <interface>`
+
+- **Expected Output Examples**:
+
+  1. **PSE Not Supported**:
+
+     If no PSE is attached or the interface does not support PSE, the following
+     output is expected:
+
+     .. code-block:: bash
+
+        netlink error: No PSE is attached
+        netlink error: Operation not supported
+
+  2. **PoDL (Single-Pair Ethernet)**:
+
+     When PoDL is implemented, you might see the following attributes:
+
+     .. code-block:: bash
+
+        PSE attributes for eth1:
+        PoDL PSE Admin State: enabled
+        PoDL PSE Power Detection Status: delivering power
+
+  3. **PoE (Clause 33 PSE)**:
+
+     For standard PoE, the output may look like this:
+
+     .. code-block:: bash
+
+        PSE attributes for eth1:
+        Clause 33 PSE Admin State: enabled
+        Clause 33 PSE Power Detection Status: delivering power
+        Clause 33 PSE Available Power Limit: 18000
+
+- **Adjust Power Limit (if needed)**:
+
+  - Sometimes, the available power limit may not be sufficient for the link
+    partner. You can increase the power limit as needed.
+
+  - **Command:** `ethtool --set-pse <interface> c33-pse-avail-pw-limit <limit>`
+
+    Example:
+
+    .. code-block:: bash
+
+       ethtool --set-pse eth1 c33-pse-avail-pw-limit 18000
+       ethtool --show-pse eth1
+
+    **Expected Output** after adjusting the power limit:
+
+    .. code-block:: bash
+
+       Clause 33 PSE Available Power Limit: 18000
+
+
+- **Next Steps**:
+
+  - **PoE or PoDL Not Used**: If **PoE** or **PoDL** is not implemented or used
+    on the system, proceed to the next diagnostic step, as power delivery is
+    not relevant for this setup.
+
+  - **PoE or PoDL Controlled Externally**: If **PoE** or **PoDL** is used but
+    is not managed by the Linux kernel's **PSE-PD** framework (i.e., it is
+    controlled by proprietary user-space software or external tools), this part
+    is out of scope for this documentation. Please consult vendor-specific
+    documentation or external tools for monitoring and managing power delivery.
+
+  - **PSE Admin State Disabled**:
+
+    - If the `PSE Admin State:` is **disabled**, enable it by running one of
+      the following commands:
+
+      .. code-block:: bash
+
+         ethtool --set-pse <devname> podl-pse-admin-control enable
+
+         # or, for Clause 33 PSE (PoE):
+
+         ethtool --set-pse <devname> c33-pse-admin-control enable
+
+    - After enabling the PSE Admin State, return to the start of the **Check
+      Power Delivery (PoDL or PoE)** step to recheck the power delivery status.
+
+  - **Power Not Delivered**: If the `Power Detection Status` shows something
+    other than "delivering power" (e.g., `over current`), troubleshoot the
+    **PSE**. Check for potential issues such as a short circuit in the cable,
+    insufficient power delivery, or a fault in the PSE itself.
+
+  - **Power Delivered but No Link**: If power is being delivered but no link is
+    established, proceed with further diagnostics by performing **Cable
+    Diagnostics** or reviewing the **Inspect Link Status and PHY
+    Configuration** steps to identify any underlying issues with the physical
+    link or settings.
+
+Cable Diagnostics
+~~~~~~~~~~~~~~~~~
+
+Use `ethtool` to test for physical layer issues such as cable faults. The test
+results can vary depending on the cable's condition, the technology in use, and
+the state of the link partner. The results from the cable test will help in
+diagnosing issues like open circuits, shorts, impedance mismatches, and
+noise-related problems.
+
+- **Command:** `ethtool --cable-test <interface>`
+
+The following are the typical outputs for **Single-Pair Ethernet (SPE)** and
+**Multi-Pair Ethernet (MPE)**:
+
+- **For Single-Pair Ethernet (SPE)**:
+  - **Expected Output (SPE)**:
+
+  .. code-block:: bash
+
+    Cable test completed for device eth1.
+    Pair A, fault length: 25.00m
+    Pair A code Open Circuit
+
+  This indicates an open circuit or cable fault at the reported distance, but
+  results can be influenced by the link partner's state. Refer to the
+  **"Troubleshooting Based on Cable Test Results"** section for further
+  interpretation of these results.
+
+- **For Multi-Pair Ethernet (MPE)**:
+  - **Expected Output (MPE)**:
+
+  .. code-block:: bash
+
+    Cable test completed for device eth0.
+    Pair A code OK
+    Pair B code OK
+    Pair C code Open Circuit
+
+  Here, Pair C is reported as having an open circuit, while Pairs A and B are
+  functioning correctly. However, if autonegotiation is in use on Pairs A and
+  B, the cable test may be disrupted. Refer to the **"Troubleshooting Based on
+  Cable Test Results"** section for a detailed explanation of these issues and
+  how to resolve them.
+
+For detailed descriptions of the different possible cable test results, please
+refer to the **"Troubleshooting Based on Cable Test Results"** section.
+
+Troubleshooting Based on Cable Test Results
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+After running the cable test, the results can help identify specific issues in
+the physical connection. However, it is important to note that **cable testing
+results heavily depend on the capabilities and characteristics of both the
+local hardware and the link partner**. The accuracy and reliability of the
+results can vary significantly between different hardware implementations.
+
+In some cases, this can introduce **blind spots** in the current cable testing
+implementation, where certain results may not accurately reflect the actual
+physical state of the cable. For example:
+
+- An **Open Circuit** result might not only indicate a damaged or disconnected
+  cable but also occur if the cable is properly attached to a powered-down link
+  partner.
+
+- Some PHYs may report a **Short within Pair** if the link partner is in
+  **forced slave mode**, even though there is no actual short in the cable.
+
+To help users interpret the results more effectively, it could be beneficial to
+extend the **kernel UAPI** (User API) to provide additional context or
+**possible variants** of issues based on the hardware’s characteristics. Since
+these quirks are often hardware-specific, the **kernel driver** would be an
+ideal source of such information. By providing flags or hints related to
+potential false positives for each test result, users would have a better
+understanding of what to verify and where to investigate further.
+
+Until such improvements are made, users should be aware of these limitations
+and manually verify cable issues as needed. Physical inspections may help
+resolve uncertainties related to false positive results.
+
+The results can be one of the following:
+
+- **OK**:
+
+  - The cable is functioning correctly, and no issues were detected.
+
+  - **Next Steps**: If you are still experiencing issues, it might be related
+    to higher-layer problems, such as duplex mismatches or speed negotiation,
+    which are not physical-layer issues.
+
+  - **Special Case for `BaseT1` (1000/100/10BaseT1)**: In `BaseT1` systems, an
+    "OK" result typically also means that the link is up and likely in **slave
+    mode**, since cable tests usually only pass in this mode. For some
+    **10BaseT1L** PHYs, an "OK" result may occur even if the cable is too long
+    for the PHY's configured range (for example, when the range is configured
+    for short-distance mode).
+
+- **Open Circuit**:
+
+  - An **Open Circuit** result typically indicates that the cable is damaged or
+    disconnected at the reported fault length. Consider these possibilities:
+
+    - If the link partner is in **admin down** state or powered off, you might
+      still get an "Open Circuit" result even if the cable is functional.
+
+  - **Next Steps**: Inspect the cable at the fault length for visible damage
+    or loose connections. Verify the link partner is powered on and in the
+    correct mode.
+
+- **Short within Pair**:
+
+  - A **Short within Pair** indicates an unintended connection within the same
+    pair of wires, typically caused by physical damage to the cable.
+
+  - **Next Steps**: Replace or repair the cable and check for any physical
+    damage or improperly crimped connectors.
+
+- **Short to Another Pair**:
+
+  - A **Short to Another Pair** means the wires from different pairs are
+    shorted, which could occur due to physical damage or incorrect wiring.
+
+  - **Next Steps**: Replace or repair the damaged cable. Inspect the cable for
+    incorrect terminations or pinched wiring.
+
+- **Impedance Mismatch**:
+
+  - **Impedance Mismatch** indicates a reflection caused by an impedance
+    discontinuity in the cable. This can happen when a part of the cable has
+    abnormal impedance (e.g., when different cable types are spliced together
+    or when there is a defect in the cable).
+
+  - **Next Steps**: Check the cable quality and ensure consistent impedance
+    throughout its length. Replace any sections of the cable that do not meet
+    specifications.
+
+- **Noise**:
+
+  - **Noise** means that the Time Domain Reflectometry (TDR) test could not
+    complete due to excessive noise on the cable, which can be caused by
+    interference from electromagnetic sources.
+
+  - **Next Steps**: Identify and eliminate sources of electromagnetic
+    interference (EMI) near the cable. Consider using shielded cables or
+    rerouting the cable away from noise sources.
+
+- **Resolution Not Possible**:
+
+  - **Resolution Not Possible** means that the TDR test could not detect the
+    issue due to the resolution limitations of the test or because the fault is
+    beyond the distance that the test can measure.
+
+  - **Next Steps**: Inspect the cable manually if possible, or use alternative
+    diagnostic tools that can handle greater distances or higher resolution.
+
+- **Unknown**:
+
+  - An **Unknown** result may occur when the test cannot classify the fault or
+    when a specific issue is outside the scope of the tool's detection
+    capabilities.
+
+  - **Next Steps**: Re-run the test, verify the link partner's state, and inspect
+    the cable manually if necessary.
+
+Verify Link Partner PHY Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the cable test passes but the link is still not functioning correctly, it’s
+essential to verify the configuration of the link partner’s PHY. Mismatches in
+speed, duplex settings, or master-slave roles can cause connection issues.
+
+Autonegotiation Mismatch
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- If both link partners support autonegotiation, ensure that autonegotiation is
+  enabled on both sides and that all supported link modes are advertised. A
+  mismatch can lead to connectivity problems or suboptimal performance.
+
+- **Quick Fix:** Reset autonegotiation to the default settings, which will
+  advertise all default link modes:
+
+  .. code-block:: bash
+
+     ethtool -s <interface> autoneg on
+
+- **Command to check configuration:** `ethtool <interface>`
+
+- **Expected Output:** Ensure that both sides advertise compatible link modes.
+  If autonegotiation is off, verify that both link partners are configured for
+  the same speed and duplex.
+
+  The following example shows a case where the local PHY advertises fewer link
+  modes than it supports. This will reduce the number of overlapping link modes
+  with the link partner. In the worst case, there will be no common link modes,
+  and the link will not be created:
+
+  .. code-block:: bash
+
+     Settings for eth0:
+        Supported link modes:  1000baseT/Full, 100baseT/Full
+        Advertised link modes: 1000baseT/Full
+        Speed: 1000Mb/s
+        Duplex: Full
+        Auto-negotiation: on
+
+Combined Mode Mismatch (Autonegotiation on One Side, Forced on the Other)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- One possible issue occurs when one side is using **autonegotiation** (as in
+  most modern systems), and the other side is set to a **forced link mode**
+  (e.g., older hardware with single-speed hubs). In such cases, modern PHYs
+  will attempt to detect the forced mode on the other side. If the link is
+  established, you may notice:
+
+  - **No or empty "Link partner advertised link modes"**.
+
+  - **"Link partner advertised auto-negotiation:"** will be **"no"** or not
+    present.
+
+- This type of detection does not always work reliably:
+
+  - Typically, the modern PHY will default to **Half Duplex**, even if the link
+    partner is actually configured for **Full Duplex**.
+
+  - Some PHYs may not work reliably if the link partner switches from one
+    forced mode to another. In this case, only a down/up cycle may help.
+
+- **Next Steps**: Set both sides to the same fixed speed and duplex mode to
+  avoid potential detection issues.
+
+  .. code-block:: bash
+
+     ethtool -s <interface> speed 1000 duplex full autoneg off
+
+Master/Slave Role Mismatch (BaseT1 and 1000BaseT PHYs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- In **BaseT1** systems (e.g., 1000BaseT1, 100BaseT1), link establishment
+  requires that one device is configured as **master** and the other as
+  **slave**. A mismatch in this master-slave configuration can prevent the link
+  from being established. However, **1000BaseT** also supports configurable
+  master/slave roles and can face similar issues.
+
+- **Role Preference in 1000BaseT**: The **1000BaseT** specification allows link
+  partners to negotiate master-slave roles or role preferences during
+  autonegotiation. Some PHYs have hardware limitations or bugs that prevent
+  them from functioning properly in certain roles. In such cases, drivers may
+  force these PHYs into a specific role (e.g., **forced master** or **forced
+  slave**) or try a weaker option by setting preferences. If both link partners
+  have the same issue and are forced into the same mode (e.g., both forced into
+  master mode), they will not be able to establish a link.
+
+- **Next Steps**: Ensure that one side is configured as **master** and the
+  other as **slave** to avoid this issue, particularly when hardware
+  limitations are involved, or try the weaker **preferred** option instead of
+  **forced**. Check for any driver-related restrictions or forced modes.
+
+- **Command to force master/slave mode**:
+
+  .. code-block:: bash
+
+     ethtool -s <interface> master-slave forced-master
+
+  or:
+
+  .. code-block:: bash
+
+     ethtool -s <interface> master-slave forced-master speed 1000 duplex full autoneg off
+
+
+- **Check the current master/slave status**:
+
+  .. code-block:: bash
+
+     ethtool <interface>
+
+  Example Output:
+
+  .. code-block:: bash
+
+     master-slave cfg: forced master
+     master-slave status: master
+
+- **Hardware Bugs and Driver Forcing**: If a known hardware issue forces the
+  PHY into a specific mode, it’s essential to check the driver source code or
+  hardware documentation for details. Ensure that the roles are compatible
+  across both link partners, and if both PHYs are forced into the same mode,
+  adjust one side accordingly to resolve the mismatch.
+
+Monitor Link Resets and Speed Drops
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the link is unstable, showing frequent resets or speed drops, this may
+indicate issues with the cable, PHY configuration, or environmental factors.
+While there is still no completely unified way in Linux to directly monitor
+downshift events or link speed changes via user space tools, both the Linux
+kernel logs and `ethtool` can provide valuable insights, especially if the
+driver supports reporting such events.
+
+- **Monitor Kernel Logs for Link Resets and Speed Drops**:
+
+  - The Linux kernel will print link status changes, including downshift
+    events, in the system logs. These messages typically include speed changes,
+    duplex mode, and downshifted link speed (if the driver supports it).
+
+  - **Command to monitor kernel logs in real-time:**
+
+    .. code-block:: bash
+
+       dmesg -w | grep "Link is Up\|Link is Down"
+
+  - Example Output (if a downshift occurs):
+
+    .. code-block:: bash
+
+       eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx
+       eth0: Link is Down
+
+    This indicates that the link has been established but has downshifted from
+    a higher speed.
+
+  - **Note**: Not all drivers or PHYs support downshift reporting, so you may
+    not see this information for all devices.
+
+- **Monitor Link Down Events Using `ethtool`**:
+
+  - Starting with the latest kernel and `ethtool` versions, you can track
+    **Link Down Events** using the `ethtool -I` command. This will provide
+    counters for link drops, helping to diagnose link instability issues if
+    supported by the driver.
+
+  - **Command to monitor link down events:**
+
+    .. code-block:: bash
+
+       ethtool -I <interface>
+
+  - Example Output (if supported):
+
+    .. code-block:: bash
+
+       Settings for eth1:
+       Link Down Events: 5
+
+    This indicates that the link has dropped 5 times. Frequent link down events
+    may indicate cable or environmental issues that require further
+    investigation.
+
+- **Check Link Status and Speed**:
+
+  - Even though downshift counts or events are not easily tracked, you can
+    still use `ethtool` to manually check the current link speed and status.
+
+  - **Command:** `ethtool <interface>`
+
+  - **Expected Output:**
+
+    .. code-block:: bash
+
+       Speed: 1000Mb/s
+       Duplex: Full
+       Auto-negotiation: on
+       Link detected: yes
+
+    Any inconsistencies in the expected speed or duplex setting could indicate
+    an issue.
+
+- **Disable Energy-Efficient Ethernet (EEE) for Diagnostics**:
+
+  - **EEE** (Energy-Efficient Ethernet) can be a source of link instability due
+    to transitions in and out of low-power states. For diagnostic purposes, it
+    may be useful to **temporarily** disable EEE to determine if it is
+    contributing to link instability. This is **not a generic recommendation**
+    for disabling power management.
+
+  - **Next Steps**: Disable EEE and monitor if the link becomes stable. If
+    disabling EEE resolves the issue, report the bug so that the driver can be
+    fixed.
+
+  - **Command:**
+
+    .. code-block:: bash
+
+       ethtool --set-eee <interface> eee off
+
+  - **Important**: If disabling EEE resolves the instability, the issue should
+    be reported to the maintainers as a bug, and the driver should be corrected
+    to handle EEE properly without causing instability. Disabling EEE
+    permanently should not be seen as a solution.
+
+- **Monitor Error Counters**:
+
+  - While some NIC drivers and PHYs provide error counters, there is no unified
+    set of PHY-specific counters across all hardware. Additionally, not all
+    PHYs provide useful information related to errors like CRC errors, frame
+    drops, or link flaps. Therefore, this step is dependent on the specific
+    hardware and driver support.
+
+  - **Next Steps**: Use `ethtool -S <interface>` to check if your driver
+    provides useful error counters. In some cases, counters may provide
+    information about errors like link flaps or physical layer problems (e.g.,
+    excessive CRC errors), but results can vary significantly depending on the
+    PHY.
+
+  - **Command:** `ethtool -S <interface>`
+
+  - **Example Output (if supported)**:
+
+    .. code-block:: bash
+
+       rx_crc_errors: 123
+       tx_errors: 45
+       rx_frame_errors: 78
+
+  - **Note**: If no meaningful error counters are available or if counters are
+    not supported, you may need to rely on physical inspections (e.g., cable
+    condition) or kernel log messages (e.g., link up/down events) to further
+    diagnose the issue.
+
+When All Else Fails...
+~~~~~~~~~~~~~~~~~~~~~~
+
+So you've checked the cables, monitored the logs, disabled EEE, and still...
+nothing? Don’t worry, you’re not alone. Sometimes, Ethernet gremlins just don’t
+want to cooperate.
+
+But before you throw in the towel (or the Ethernet cable), take a deep breath.
+It’s always possible that:
+
+1. Your PHY has a unique, undocumented personality.
+
+2. The problem is lying dormant, waiting for just the right moment to magically
+   resolve itself (hey, it happens!).
+
+3. Or, it could be that the ultimate solution simply hasn’t been invented yet.
+
+If none of the above bring you comfort, there’s one final step: contribute! If
+you've uncovered new or unusual issues, or have creative diagnostic methods,
+feel free to share your findings and extend this documentation. Together, we
+can hunt down every elusive network issue - one twisted pair at a time.
+
+Remember: sometimes the solution is just a reboot away, but if not, it’s time to
+dig deeper - or report that bug!
+
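The interface check from the "Verify Interface Status" step of the guide can be scripted. The following is a minimal sketch under stated assumptions, not part of the patch: `classify_link` and the `IFACE` variable are hypothetical names, and the helper looks only at the flags between the angle brackets of `ip link show` output, mapping them onto the three cases the guide distinguishes.

```shell
#!/bin/sh
# Sketch only: classify the first line of `ip link show dev <interface>`.

classify_link() {
    # Extract the comma-separated flag list between "<" and ">".
    # A line without angle brackets yields an empty list (reported as admin-down).
    flags=$(printf '%s\n' "$1" | sed -n 's/.*<\([^>]*\)>.*/\1/p')
    # Wrap the list in commas so "UP" does not match inside "LOWER_UP".
    case ",$flags," in
        *,UP,*) ;;
        *) echo "admin-down"; return 0 ;;
    esac
    case ",$flags," in
        *,NO-CARRIER,*) echo "no-carrier" ;;
        *) echo "link-up" ;;
    esac
}

# Runs against a real interface only when IFACE is set, e.g. IFACE=eth0.
if [ -n "${IFACE:-}" ]; then
    classify_link "$(ip link show dev "$IFACE" | head -n 1)"
fi
```

In the guide's terms, `admin-down` means bringing the interface up and repeating the check, `no-carrier` leads to the "Inspect Link Status and PHY Configuration" step, and `link-up` means Layer 1 link detection has succeeded.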
+1
Documentation/networking/index.rst
···
    can
    can_ucan_protocol
    device_drivers/index
+   diagnostic/index
    dsa/index
    devlink/index
    caif/index