Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

USB: ehci-hcd unlink speedups

This patch fixes some performance bugs observed with some workloads
when unlinking EHCI queue header (QH) descriptors from the async ring
(control/bulk schedule).

The mechanism intended to defer unlinking an empty QH (so there is no
penalty in common cases where it's quickly reused) was not working as
intended. Sometimes the unlink was scheduled:

  - too quickly ... which can be a *strong* negative effect, since
    that QH becomes unavailable for immediate re-use;

  - too slowly ... wasting DMA cycles, usually a minor issue except
    for increased bus contention and power usage;

Plus there was an extreme case of "too slowly": a logical error in the
IAA watchdog-timer conversion meant that sometimes the unlink never
got scheduled.

The fix replaces a simple counter with a timestamp derived from the
controller's 8 KHz microframe counter, and adjusts the timer usage
for some issues associated with HZ being less than 8K.

(Based on a patch originally by Alan Stern, and good troubleshooting
from Leonid.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Leonid <leonidv11@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Authored by David Brownell and committed by Greg Kroah-Hartman
b9638011 38f3ad5e

+14 -10
+1 -1
drivers/usb/host/ehci-hcd.c
···
 #define EHCI_IAA_MSECS		10		/* arbitrary */
 #define EHCI_IO_JIFFIES		(HZ/10)		/* io watchdog > irq_thresh */
 #define EHCI_ASYNC_JIFFIES	(HZ/20)		/* async idle timeout */
-#define EHCI_SHRINK_JIFFIES	(HZ/200)	/* async qh unlink delay */
+#define EHCI_SHRINK_FRAMES	5		/* async qh unlink delay */
 
 /* Initial IRQ latency: faster than hw default */
 static int log2_irq_thresh = 0;		// 0 to 6
+9 -8
drivers/usb/host/ehci-q.c
···
 	struct ehci_qh		*qh;
 	enum ehci_timer_action	action = TIMER_IO_WATCHDOG;
 
-	if (!++(ehci->stamp))
-		ehci->stamp++;
+	ehci->stamp = ehci_readl(ehci, &ehci->regs->frame_index);
 	timer_action_done (ehci, TIMER_ASYNC_SHRINK);
 rescan:
 	qh = ehci->async->qh_next.qh;
···
 				}
 			}
 
-			/* unlink idle entries, reducing HC PCI usage as well
+			/* unlink idle entries, reducing DMA usage as well
 			 * as HCD schedule-scanning costs. delay for any qh
 			 * we just scanned, there's a not-unusual case that it
 			 * doesn't stay idle for long.
 			 * (plus, avoids some kind of re-activation race.)
 			 */
-			if (list_empty (&qh->qtd_list)) {
-				if (qh->stamp == ehci->stamp)
+			if (list_empty(&qh->qtd_list)
+					&& qh->qh_state == QH_STATE_LINKED) {
+				if (!ehci->reclaim
+					&& ((ehci->stamp - qh->stamp) & 0x1fff)
+						>= (EHCI_SHRINK_FRAMES * 8))
+					start_unlink_async(ehci, qh);
+				else
 					action = TIMER_ASYNC_SHRINK;
-				else if (!ehci->reclaim
-						&& qh->qh_state == QH_STATE_LINKED)
-					start_unlink_async (ehci, qh);
 			}
 
 			qh = qh->qh_next.qh;
+4 -1
drivers/usb/host/ehci.h
···
 			break;
 		// case TIMER_ASYNC_SHRINK:
 		default:
-			t = EHCI_SHRINK_JIFFIES;
+			/* add a jiffie since we synch against the
+			 * 8 KHz uframe counter.
+			 */
+			t = DIV_ROUND_UP(EHCI_SHRINK_FRAMES * HZ, 1000) + 1;
 			break;
 		}
 		mod_timer(&ehci->watchdog, t + jiffies);