Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xhci: Fix perceived dead host due to runtime suspend race with event handler

Don't rely on the event interrupt (EINT) bit alone to detect a pending port
change in resume. If no change event is detected the host may be suspended
again, otherwise the roothubs are resumed.

There is a lag in the xHC setting EINT. If we don't notice the pending change
in resume, and the controller is runtime suspended again, the event handler
assumes the host is dead, because it fails to read xHC registers once PCI
has put the controller into D3 state.

[ 268.520969] xhci_hcd: xhci_resume: starting port polling.
[ 268.520985] xhci_hcd: xhci_hub_status_data: stopping port polling.
[ 268.521030] xhci_hcd: xhci_suspend: stopping port polling.
[ 268.521040] xhci_hcd: // Setting command ring address to 0x349bd001
[ 268.521139] xhci_hcd: Port Status Change Event for port 3
[ 268.521149] xhci_hcd: resume root hub
[ 268.521163] xhci_hcd: port resume event for port 3
[ 268.521168] xhci_hcd: xHC is not running.
[ 268.521174] xhci_hcd: handle_port_status: starting port polling.
[ 268.596322] xhci_hcd: xhci_hc_died: xHCI host controller not responding, assume dead

The EINT lag is described in an additional note in the xHCI specification,
section 4.19.2:

"Due to internal xHC scheduling and system delays, there will be a lag
between a change bit being set and the Port Status Change Event that it
generated being written to the Event Ring. If SW reads the PORTSC and
sees a change bit set, there is no guarantee that the corresponding Port
Status Change Event has already been written into the Event Ring."
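The difference between the old and the new resume check can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct model_xhc`, `eint_only_check()` and `pending_portevent()` are hypothetical names that do not exist in the kernel, the register contents are plain integers rather than MMIO reads, and the model uses a single flat port array instead of separate USB2 and USB3 roothubs. The bit definitions are copied from drivers/usb/host/xhci.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit definitions as found in drivers/usb/host/xhci.h. */
#define STS_EINT	(1u << 3)
#define PORT_PLS_MASK	(0xfu << 5)
#define XDEV_RESUME	(0xfu << 5)
#define PORT_CSC	(1u << 17)
#define PORT_PEC	(1u << 18)
#define PORT_WRC	(1u << 19)
#define PORT_OCC	(1u << 20)
#define PORT_RC		(1u << 21)
#define PORT_PLC	(1u << 22)
#define PORT_CEC	(1u << 23)
#define PORT_CHANGE_MASK	(PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \
				 PORT_RC | PORT_PLC | PORT_CEC)

/* Hypothetical stand-in for controller state; not a kernel structure. */
struct model_xhc {
	uint32_t usbsts;	/* snapshot of the USBSTS register */
	const uint32_t *portsc;	/* snapshot of every PORTSC register */
	size_t num_ports;
};

/* Old behaviour: only the EINT bit decides whether roothubs are resumed. */
static bool eint_only_check(const struct model_xhc *xhc)
{
	return (xhc->usbsts & STS_EINT) != 0;
}

/* New behaviour: also look for change bits or a port in Resume state. */
static bool pending_portevent(const struct model_xhc *xhc)
{
	size_t i;

	assert(xhc->portsc != NULL || xhc->num_ports == 0);

	if (xhc->usbsts & STS_EINT)
		return true;

	for (i = 0; i < xhc->num_ports; i++) {
		uint32_t portsc = xhc->portsc[i];

		if ((portsc & PORT_CHANGE_MASK) ||
		    (portsc & PORT_PLS_MASK) == XDEV_RESUME)
			return true;
	}
	return false;
}
```

With EINT still clear and PORT_PLC already set on one port, `eint_only_check()` reports nothing pending while `pending_portevent()` reports a pending change, which is the window described in the spec note.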

Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Authored by Mathias Nyman and committed by Greg Kroah-Hartman
229bc19f 9d1a68c4

+41 -3
+37 -3
drivers/usb/host/xhci.c
···
 	spin_unlock_irqrestore(&xhci->lock, flags);
 }
 
+static bool xhci_pending_portevent(struct xhci_hcd *xhci)
+{
+	struct xhci_port	**ports;
+	int			port_index;
+	u32			status;
+	u32			portsc;
+
+	status = readl(&xhci->op_regs->status);
+	if (status & STS_EINT)
+		return true;
+	/*
+	 * Checking STS_EINT is not enough as there is a lag between a change
+	 * bit being set and the Port Status Change Event that it generated
+	 * being written to the Event Ring. See note in xhci 1.1 section 4.19.2.
+	 */
+
+	port_index = xhci->usb2_rhub.num_ports;
+	ports = xhci->usb2_rhub.ports;
+	while (port_index--) {
+		portsc = readl(ports[port_index]->addr);
+		if (portsc & PORT_CHANGE_MASK ||
+		    (portsc & PORT_PLS_MASK) == XDEV_RESUME)
+			return true;
+	}
+	port_index = xhci->usb3_rhub.num_ports;
+	ports = xhci->usb3_rhub.ports;
+	while (port_index--) {
+		portsc = readl(ports[port_index]->addr);
+		if (portsc & PORT_CHANGE_MASK ||
+		    (portsc & PORT_PLS_MASK) == XDEV_RESUME)
+			return true;
+	}
+	return false;
+}
+
 /*
  * Stop HC (not bus-specific)
  *
···
  */
 int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
 {
-	u32			command, temp = 0, status;
+	u32			command, temp = 0;
 	struct usb_hcd		*hcd = xhci_to_hcd(xhci);
 	struct usb_hcd		*secondary_hcd;
 	int			retval = 0;
···
  done:
 	if (retval == 0) {
 		/* Resume root hubs only when have pending events. */
-		status = readl(&xhci->op_regs->status);
-		if (status & STS_EINT) {
+		if (xhci_pending_portevent(xhci)) {
 			usb_hcd_resume_root_hub(xhci->shared_hcd);
 			usb_hcd_resume_root_hub(hcd);
 		}
+4
drivers/usb/host/xhci.h
···
 #define PORT_PLC	(1 << 22)
 /* port configure error change - port failed to configure its link partner */
 #define PORT_CEC	(1 << 23)
+#define PORT_CHANGE_MASK	(PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \
+				 PORT_RC | PORT_PLC | PORT_CEC)
+
+
 /* Cold Attach Status - xHC can set this bit to report device attached during
  * Sx state. Warm port reset should be perfomed to clear this bit and move port
  * to connected state.
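
To see which change bits PORT_CHANGE_MASK covers in a given PORTSC value, a debugging helper along the following lines could be used. It is a sketch only: `count_port_changes()` and the `change_bits` table are hypothetical and not part of the xhci driver; the bit positions are the ones defined in drivers/usb/host/xhci.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PORTSC change bits as defined in drivers/usb/host/xhci.h. */
#define PORT_CSC	(1u << 17)
#define PORT_PEC	(1u << 18)
#define PORT_WRC	(1u << 19)
#define PORT_OCC	(1u << 20)
#define PORT_RC		(1u << 21)
#define PORT_PLC	(1u << 22)
#define PORT_CEC	(1u << 23)

/* Hypothetical lookup table, for illustration only. */
struct change_bit {
	uint32_t mask;
	const char *name;
};

static const struct change_bit change_bits[] = {
	{ PORT_CSC, "CSC" }, { PORT_PEC, "PEC" }, { PORT_WRC, "WRC" },
	{ PORT_OCC, "OCC" }, { PORT_RC, "RC" }, { PORT_PLC, "PLC" },
	{ PORT_CEC, "CEC" },
};

/*
 * Write the space-separated names of the change bits set in portsc into
 * buf (truncated to len) and return how many change bits were set.
 */
static int count_port_changes(uint32_t portsc, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int count = 0;

	assert(buf != NULL || len == 0);
	if (len > 0)
		buf[0] = '\0';

	for (i = 0; i < sizeof(change_bits) / sizeof(change_bits[0]); i++) {
		if (!(portsc & change_bits[i].mask))
			continue;
		count++;
		if (used < len) {
			int n = snprintf(buf + used, len - used, "%s%s",
					 used ? " " : "", change_bits[i].name);
			if (n > 0)
				used += (size_t)n;
		}
	}
	return count;
}
```

A non-zero return value corresponds to `portsc & PORT_CHANGE_MASK` being non-zero in the patch above.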