Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

USB: EHCI: add a delay when unlinking an active QH

Michael Reutman reports that an AMD/ATI EHCI host controller on one of
his computers does not stop transferring data when an active bulk QH
is unlinked from the async schedule. Apparently that host controller
fails to implement the IAA mechanism correctly when an active QH is
unlinked. This leads to data corruption, because the controller
continues to update the QH in memory when the driver doesn't expect
it. As a result, the next URB submitted for that QH can hang, because
the link pointers for the TD queue have been messed up. This
misbehavior is observed quite regularly.

To be fair, the EHCI spec (section 4.8.2) says that active QHs should
not be unlinked. It goes on to recommend a procedure that involves
waiting for the QH to go inactive before unlinking it. In the real
world this is impractical, not least because the QH may _never_ go
inactive. (What were they thinking?) Sometimes we have no choice but
to unlink an active QH.

In an attempt to avoid the problems that can ensue, this patch changes
how the driver decides when the unlink is complete. In addition to
waiting through two IAA cycles, in cases where the QH was not known to
be inactive beforehand we now wait until a 2-ms period has elapsed
with the host controller making no change to the QH data structure
(the hw_current and hw_token fields in particular). The intuition
here is that after such a long period, the endpoint must be NAKing and
hopefully the QH has been dropped from the host controller's internal
cache. There's no way to know if this reasoning is really valid --
the spec is no help in this regard -- but at least this approach fixes
Michael's problem.

The test for whether the QH is already known to be inactive involves
the reason for unlinking the QH originally. If it was unlinked
because it had halted, or it stopped in response to a short read, or
it overlaid a dummy TD (a silicon bug), then it certainly is inactive.
If it was unlinked because the TD queue was empty and no TDs have been
added to the queue in the meantime, then it must be inactive. Or if
the hardware status indicates that the QH is currently halted (even if
that wasn't the reason for unlinking it), then it is inactive.
Otherwise, if none of those checks apply, we go through the 2-ms
delay.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Reutman <mreutman@epiqsolutions.com>
Tested-by: Michael Reutman <mreutman@epiqsolutions.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

authored by

Alan Stern and committed by
Greg Kroah-Hartman
87d61912 f96fba0d

+59 -5
+3
drivers/usb/host/ehci-hcd.c
··· 566 566 /* Accept arbitrarily long scatter-gather lists */ 567 567 if (!(hcd->driver->flags & HCD_LOCAL_MEM)) 568 568 hcd->self.sg_tablesize = ~0; 569 + 570 + /* Prepare for unlinking active QHs */ 571 + ehci->old_current = ~0; 569 572 return 0; 570 573 } 571 574
+51 -5
drivers/usb/host/ehci-q.c
··· 1341 1341 * after the IAA interrupt occurs. In self-defense, always go 1342 1342 * through two IAA cycles for each QH. 1343 1343 */ 1344 - else if (qh->qh_state == QH_STATE_UNLINK_WAIT) { 1344 + else if (qh->qh_state == QH_STATE_UNLINK) { 1345 + /* 1346 + * Second IAA cycle has finished. Process only the first 1347 + * waiting QH (NVIDIA (?) bug). 1348 + */ 1349 + list_move_tail(&qh->unlink_node, &ehci->async_idle); 1350 + } 1351 + 1352 + /* 1353 + * AMD/ATI (?) bug: The HC can continue to use an active QH long 1354 + * after the IAA interrupt occurs. To prevent problems, QHs that 1355 + * may still be active will wait until 2 ms have passed with no 1356 + * change to the hw_current and hw_token fields (this delay occurs 1357 + * between the two IAA cycles). 1358 + * 1359 + * The EHCI spec (4.8.2) says that active QHs must not be removed 1360 + * from the async schedule and recommends waiting until the QH 1361 + * goes inactive. This is ridiculous because the QH will _never_ 1362 + * become inactive if the endpoint NAKs indefinitely. 1363 + */ 1364 + 1365 + /* Some reasons for unlinking guarantee the QH can't be active */ 1366 + else if (qh->unlink_reason & (QH_UNLINK_HALTED | 1367 + QH_UNLINK_SHORT_READ | QH_UNLINK_DUMMY_OVERLAY)) 1368 + goto DelayDone; 1369 + 1370 + /* The QH can't be active if the queue was and still is empty... */ 1371 + else if ((qh->unlink_reason & QH_UNLINK_QUEUE_EMPTY) && 1372 + list_empty(&qh->qtd_list)) 1373 + goto DelayDone; 1374 + 1375 + /* ... or if the QH has halted */ 1376 + else if (qh->hw->hw_token & cpu_to_hc32(ehci, QTD_STS_HALT)) 1377 + goto DelayDone; 1378 + 1379 + /* Otherwise we have to wait until the QH stops changing */ 1380 + else { 1381 + __hc32 qh_current, qh_token; 1382 + 1383 + qh_current = qh->hw->hw_current; 1384 + qh_token = qh->hw->hw_token; 1385 + if (qh_current != ehci->old_current || 1386 + qh_token != ehci->old_token) { 1387 + ehci->old_current = qh_current; 1388 + ehci->old_token = qh_token; 1389 + ehci_enable_event(ehci, 1390 + EHCI_HRTIMER_ACTIVE_UNLINK, true); 1391 + return; 1392 + } 1393 + DelayDone: 1345 1394 qh->qh_state = QH_STATE_UNLINK; 1346 1395 early_exit = true; 1347 1396 } 1348 - 1349 - /* Otherwise process only the first waiting QH (NVIDIA bug?) */ 1350 - else 1351 - list_move_tail(&qh->unlink_node, &ehci->async_idle); 1397 + ehci->old_current = ~0; /* Prepare for next QH */ 1352 1398 1353 1399 /* Start a new IAA cycle if any QHs are waiting for it */ 1354 1400 if (!list_empty(&ehci->async_unlink))
+2
drivers/usb/host/ehci-timer.c
··· 72 72 1 * NSEC_PER_MSEC, /* EHCI_HRTIMER_POLL_DEAD */ 73 73 1125 * NSEC_PER_USEC, /* EHCI_HRTIMER_UNLINK_INTR */ 74 74 2 * NSEC_PER_MSEC, /* EHCI_HRTIMER_FREE_ITDS */ 75 + 2 * NSEC_PER_MSEC, /* EHCI_HRTIMER_ACTIVE_UNLINK */ 75 76 5 * NSEC_PER_MSEC, /* EHCI_HRTIMER_START_UNLINK_INTR */ 76 77 6 * NSEC_PER_MSEC, /* EHCI_HRTIMER_ASYNC_UNLINKS */ 77 78 10 * NSEC_PER_MSEC, /* EHCI_HRTIMER_IAA_WATCHDOG */ ··· 396 395 ehci_handle_controller_death, /* EHCI_HRTIMER_POLL_DEAD */ 397 396 ehci_handle_intr_unlinks, /* EHCI_HRTIMER_UNLINK_INTR */ 398 397 end_free_itds, /* EHCI_HRTIMER_FREE_ITDS */ 398 + end_unlink_async, /* EHCI_HRTIMER_ACTIVE_UNLINK */ 399 399 ehci_handle_start_intr_unlinks, /* EHCI_HRTIMER_START_UNLINK_INTR */ 400 400 unlink_empty_async, /* EHCI_HRTIMER_ASYNC_UNLINKS */ 401 401 ehci_iaa_watchdog, /* EHCI_HRTIMER_IAA_WATCHDOG */
+3
drivers/usb/host/ehci.h
··· 110 110 EHCI_HRTIMER_POLL_DEAD, /* Wait for dead controller to stop */ 111 111 EHCI_HRTIMER_UNLINK_INTR, /* Wait for interrupt QH unlink */ 112 112 EHCI_HRTIMER_FREE_ITDS, /* Wait for unused iTDs and siTDs */ 113 + EHCI_HRTIMER_ACTIVE_UNLINK, /* Wait while unlinking an active QH */ 113 114 EHCI_HRTIMER_START_UNLINK_INTR, /* Unlink empty interrupt QHs */ 114 115 EHCI_HRTIMER_ASYNC_UNLINKS, /* Unlink empty async QHs */ 115 116 EHCI_HRTIMER_IAA_WATCHDOG, /* Handle lost IAA interrupts */ ··· 157 156 struct list_head async_idle; 158 157 unsigned async_unlink_cycle; 159 158 unsigned async_count; /* async activity count */ 159 + __hc32 old_current; /* Test for QH becoming */ 160 + __hc32 old_token; /* inactive during unlink */ 160 161 161 162 /* periodic schedule support */ 162 163 #define DEFAULT_I_TDPS 1024 /* some HCs can do less */