Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: enetc: fix the deadlock of enetc_mdio_lock

After applying the workaround for err050089, the LS1028A platform
experiences RCU stalls on RT kernel. This issue is caused by the
recursive acquisition of the read lock enetc_mdio_lock. Listed below are
some of the call stacks identified under the enetc_poll path that may
lead to a deadlock:

enetc_poll
  -> enetc_lock_mdio
  -> enetc_clean_rx_ring OR napi_complete_done
     -> napi_gro_receive
        -> enetc_start_xmit
           -> enetc_lock_mdio
           -> enetc_map_tx_buffs
           -> enetc_unlock_mdio
  -> enetc_unlock_mdio

After enetc_poll acquires the read lock, a higher-priority writer attempts
to acquire the lock, causing preemption. The writer detects that a
read lock is already held and is scheduled out. However, readers under
enetc_poll cannot acquire the read lock again because a writer is already
waiting, leading to a thread hang.
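The failure mode above can be reproduced in miniature with any
writer-preferring reader-writer lock. The sketch below is a userspace
analogy in plain Python threading, not the kernel primitive; the class
name, timeouts, and sleeps are illustrative. A thread holding the read
lock is refused a second, recursive read acquisition as soon as a writer
is queued:

```python
import threading
import time

class WriterPreferringRWLock:
    """Illustrative reader-writer lock with writer preference, analogous
    to how RT-kernel rwlocks queue new readers behind a waiting writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self, timeout=None):
        with self._cond:
            # New readers must wait while a writer holds *or is waiting
            # for* the lock -- this is what makes recursive read
            # acquisition deadlock-prone.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._writers_waiting == 0,
                timeout=timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            self._cond.wait_for(
                lambda: not self._writer and self._readers == 0)
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = WriterPreferringRWLock()

assert lock.acquire_read()      # outer read lock, like enetc_poll's

def writer():
    lock.acquire_write()        # blocks behind the outer reader
    lock.release_write()

t = threading.Thread(target=writer, daemon=True)
t.start()
time.sleep(0.2)                 # let the writer reach acquire_write()

# Recursive read acquisition now times out: the writer is queued, so the
# second read is refused even though this thread already holds the lock
# as a reader -- with a real blocking lock this is the hang.
assert not lock.acquire_read(timeout=0.5)

lock.release_read()             # dropping the outer read unblocks the writer
t.join(timeout=2)
assert not t.is_alive()
print("recursive read blocked behind waiting writer")
```

Dropping the read lock before any path that may take it again, which is
what the patch does around napi_gro_receive(), breaks this cycle.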

The deadlock is avoided by adjusting the enetc_lock_mdio() and
enetc_unlock_mdio() call sites so that the read lock is never acquired
recursively: the lock is released before calls that can take it again
(napi_gro_receive, xdp_do_redirect, xdp_do_flush) and re-acquired
afterwards.

Fixes: 6d36ecdbc441 ("net: enetc: take the MDIO lock only once per NAPI poll cycle")
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
Acked-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251015021427.180757-1-jianpeng.chang.cn@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Jianpeng Chang and committed by Jakub Kicinski
50bd33f6 7f864458

+21 -4
drivers/net/ethernet/freescale/enetc/enetc.c
···
 	/* next descriptor to process */
 	i = rx_ring->next_to_clean;
 
+	enetc_lock_mdio();
+
 	while (likely(rx_frm_cnt < work_limit)) {
 		union enetc_rx_bd *rxbd;
 		struct sk_buff *skb;
···
 		rx_byte_cnt += skb->len + ETH_HLEN;
 		rx_frm_cnt++;
 
+		enetc_unlock_mdio();
 		napi_gro_receive(napi, skb);
+		enetc_lock_mdio();
 	}
 
 	rx_ring->next_to_clean = i;
 
 	rx_ring->stats.packets += rx_frm_cnt;
 	rx_ring->stats.bytes += rx_byte_cnt;
+
+	enetc_unlock_mdio();
 
 	return rx_frm_cnt;
 }
···
 	/* next descriptor to process */
 	i = rx_ring->next_to_clean;
 
+	enetc_lock_mdio();
+
 	while (likely(rx_frm_cnt < work_limit)) {
 		union enetc_rx_bd *rxbd, *orig_rxbd;
 		struct xdp_buff xdp_buff;
···
 			 */
 			enetc_bulk_flip_buff(rx_ring, orig_i, i);
 
+			enetc_unlock_mdio();
 			napi_gro_receive(napi, skb);
+			enetc_lock_mdio();
 			break;
 		case XDP_TX:
 			tx_ring = priv->xdp_tx_ring[rx_ring->index];
···
 			}
 			break;
 		case XDP_REDIRECT:
+			enetc_unlock_mdio();
 			err = xdp_do_redirect(rx_ring->ndev, &xdp_buff, prog);
+			enetc_lock_mdio();
 			if (unlikely(err)) {
 				enetc_xdp_drop(rx_ring, orig_i, i);
 				rx_ring->stats.xdp_redirect_failures++;
···
 	rx_ring->stats.packets += rx_frm_cnt;
 	rx_ring->stats.bytes += rx_byte_cnt;
 
-	if (xdp_redirect_frm_cnt)
+	if (xdp_redirect_frm_cnt) {
+		enetc_unlock_mdio();
 		xdp_do_flush();
+		enetc_lock_mdio();
+	}
 
 	if (xdp_tx_frm_cnt)
 		enetc_update_tx_ring_tail(tx_ring);
···
 	if (cleaned_cnt > rx_ring->xdp.xdp_tx_in_flight)
 		enetc_refill_rx_ring(rx_ring, enetc_bd_unused(rx_ring) -
 				     rx_ring->xdp.xdp_tx_in_flight);
+
+	enetc_unlock_mdio();
 
 	return rx_frm_cnt;
 }
···
 	for (i = 0; i < v->count_tx_rings; i++)
 		if (!enetc_clean_tx_ring(&v->tx_ring[i], budget))
 			complete = false;
+	enetc_unlock_mdio();
 
 	prog = rx_ring->xdp.prog;
 	if (prog)
···
 	if (work_done)
 		v->rx_napi_work = true;
 
-	if (!complete) {
-		enetc_unlock_mdio();
+	if (!complete)
 		return budget;
-	}
 
 	napi_complete_done(napi, work_done);
 
···
 
 	v->rx_napi_work = false;
 
+	enetc_lock_mdio();
 	/* enable interrupts */
 	enetc_wr_reg_hot(v->rbier, ENETC_RBIER_RXTIE);
 