[PATCH] lock validator: fix ns83820.c irq-flags bug

Barry K. Nathan reported the following lockdep warning:

[ 197.343948] BUG: warning at kernel/lockdep.c:1856/trace_hardirqs_on()
[ 197.345928] [<c010329b>] show_trace_log_lvl+0x5b/0x105
[ 197.346359] [<c0103896>] show_trace+0x1b/0x20
[ 197.346759] [<c01038ed>] dump_stack+0x1f/0x24
[ 197.347159] [<c012efa2>] trace_hardirqs_on+0xfb/0x185
[ 197.348873] [<c029b009>] _spin_unlock_irq+0x24/0x2d
[ 197.350620] [<e09034e8>] do_tx_done+0x171/0x179 [ns83820]
[ 197.350895] [<e090445c>] ns83820_irq+0x149/0x20b [ns83820]
[ 197.351166] [<c013b4b8>] handle_IRQ_event+0x1d/0x52
[ 197.353216] [<c013c6c2>] handle_level_irq+0x97/0xe1
[ 197.355157] [<c01048c3>] do_IRQ+0x8b/0xac
[ 197.355612] [<c0102d9d>] common_interrupt+0x25/0x2c

This is caused by the ns83820 driver re-enabling interrupts (irq flags)
while in hardirq context.

While legal in theory, in practice this should only be done if the
hardware is really old and has very high overhead in its ISR
(such as PIO IDE).

For modern hardware, running ISRs with irqs enabled is discouraged,
because:

 1) new hardware is fast enough not to cause latency problems;

 2) allowing hardware interrupts to nest only 'spreads out' the handling
    of the current ISR, causing extra cache misses that would otherwise
    not happen.  Furthermore, on architectures where ISRs share the
    kernel stack, enabling interrupts in ISRs increases kernel-stack
    nesting and thus the risk of kernel-stack overflow;

 3) not managing irq flags via the _irqsave / _irqrestore variants is
    dangerous: it is easy to forget whether one function nests inside
    another, and irq flags can end up mismanaged.

In the few cases where re-enabling interrupts in an ISR is considered
useful (and unavoidable), it has to be taught to the lock validator
explicitly (because the lock validator needs the "no ISR ever enables
hardirqs" artificial simplification to keep the IRQ/softirq locking
dependencies manageable).

This teaching is done via the explicit use of local_irq_enable_in_hardirq().
On a stock kernel this maps to local_irq_enable(); if the lock validator
is enabled, it does not enable interrupts.
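
For illustration only (not part of this patch): a driver that genuinely
wants to run part of its ISR with interrupts enabled would use the API
roughly as in the sketch below. The handler name and signature are made
up; only local_irq_enable_in_hardirq() is the real interface.

#include <linux/interrupt.h>    /* irqreturn_t, local_irq_enable_in_hardirq() */

/* hypothetical ISR of a slow legacy device -- not ns83820 code */
static irqreturn_t slow_legacy_isr(int irq, void *dev_id)
{
        /*
         * Tell the lock validator that interrupts are re-enabled here
         * on purpose.  On a stock kernel this is local_irq_enable();
         * with the lock validator enabled it is a no-op.
         */
        local_irq_enable_in_hardirq();

        /* ... long-running PIO work would go here ... */

        return IRQ_HANDLED;
}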

Now, the analysis of drivers/net/ns83820.c's irq-flags use: the
irq-enabling in irq context seems intentional, but I don't think it is
justified. Furthermore, the driver also suffers from problem #3 above:
ns83820_tx_timeout() disables irqs via local_irq_save(), but then calls
do_tx_done(), which does a spin_unlock_irq(), re-enabling interrupts in
a context that does not expect it! While this bug currently seems
harmless (only some debug printout appears to be affected by it), it is
nevertheless something to be fixed.
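
Distilled into a standalone sketch (the function and lock names below
are illustrative, not the actual driver code), the problematic pattern
looks like this:

#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(example_tx_lock);        /* stands in for dev->tx_lock */

/* callee: uses the unconditional _irq variants */
static void example_tx_done(void)
{
        spin_lock_irq(&example_tx_lock);
        /* ... reclaim tx descriptors ... */
        spin_unlock_irq(&example_tx_lock);      /* unconditionally re-enables irqs */
}

/* caller: assumes interrupts stay disabled across the whole section */
static void example_tx_timeout(void)
{
        unsigned long flags;

        local_irq_save(flags);
        example_tx_done();              /* irqs are silently back on from here */
        /* ... code that still believes irqs are off ... */
        local_irq_restore(flags);
}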

So this patch makes the ns83820 ISR irq-flags-safe, and cleans up the
do_tx_done() locking to avoid the ns83820_tx_timeout() bug.

From: Arjan van de Ven <arjan@linux.intel.com>

ns83820_mib_isr() takes the misc_lock in IRQ context. All other places
that do this in the ISR already use the _irqsave variants, so make this
consistent at least. At some point someone should audit the driver to
see whether all of the _irqsave's in the ISR can go away; that is
generally an iffy/fragile proposition, though. For now, get it safe,
simple and consistent.
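
As a minimal sketch of the pattern being made consistent (the lock and
function names here are illustrative):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_misc_lock);      /* stands in for dev->misc_lock */

/* safe from both process and irq context: saves and restores the
 * caller's irq state instead of unconditionally enabling irqs */
static void example_update_stats(void)
{
        unsigned long flags;

        spin_lock_irqsave(&example_misc_lock, flags);
        /* ... read/accumulate the hardware statistics counters ... */
        spin_unlock_irqrestore(&example_misc_lock, flags);
}

This is what the ns83820_mib_isr() hunk in the patch below does for
dev->misc_lock.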

From: Arjan van de Ven <arjan@linux.intel.com>

OK, this is a real driver deadlock:

The ns83820 driver enabled interrupts (by unlocking the misc_lock with
_irq) while still holding the rx_info.lock, which is required to be
irq-safe since it is also used in the ISR. The offending sequence is:
writel(1, dev->base + IER);
spin_unlock_irq(&dev->misc_lock);
kick_rx(ndev);
spin_unlock_irq(&dev->rx_info.lock);

This can cause a deadlock if an irq was already pending at the first
spin_unlock_irq, or if one hits during kick_rx(). Simply removing the
first _irq solves this.
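
With that fix (the first hunk of the patch below), the sequence quoted
above becomes:

writel(1, dev->base + IER);
spin_unlock(&dev->misc_lock);           /* plain unlock: irqs stay disabled */
kick_rx(ndev);
spin_unlock_irq(&dev->rx_info.lock);    /* irqs are re-enabled only here */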

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

authored by Ingo Molnar and committed by Jeff Garzik 3a10cceb ac5bfe40

+17 -14
--- a/drivers/net/ns83820.c
+++ b/drivers/net/ns83820.c
@@ -803,7 +803,7 @@
 
         writel(dev->IMR_cache, dev->base + IMR);
         writel(1, dev->base + IER);
-        spin_unlock_irq(&dev->misc_lock);
+        spin_unlock(&dev->misc_lock);
 
         kick_rx(ndev);
 
@@ -1012,8 +1012,6 @@
         struct ns83820 *dev = PRIV(ndev);
         u32 cmdsts, tx_done_idx, *desc;
 
-        spin_lock_irq(&dev->tx_lock);
-
         dprintk("do_tx_done(%p)\n", ndev);
         tx_done_idx = dev->tx_done_idx;
         desc = dev->tx_descs + (tx_done_idx * DESC_SIZE);
@@ -1067,7 +1069,6 @@
                 netif_start_queue(ndev);
                 netif_wake_queue(ndev);
         }
-        spin_unlock_irq(&dev->tx_lock);
 }
 
 static void ns83820_cleanup_tx(struct ns83820 *dev)
@@ -1278,11 +1281,13 @@
         .get_link = ns83820_get_link
 };
 
+/* this function is called in irq context from the ISR */
 static void ns83820_mib_isr(struct ns83820 *dev)
 {
-        spin_lock(&dev->misc_lock);
+        unsigned long flags;
+        spin_lock_irqsave(&dev->misc_lock, flags);
         ns83820_update_stats(dev);
-        spin_unlock(&dev->misc_lock);
+        spin_unlock_irqrestore(&dev->misc_lock, flags);
 }
 
 static void ns83820_do_isr(struct net_device *ndev, u32 isr);
@@ -1306,6 +1307,8 @@
 static void ns83820_do_isr(struct net_device *ndev, u32 isr)
 {
         struct ns83820 *dev = PRIV(ndev);
+        unsigned long flags;
+
 #ifdef DEBUG
         if (isr & ~(ISR_PHY | ISR_RXDESC | ISR_RXEARLY | ISR_RXOK | ISR_RXERR | ISR_TXIDLE | ISR_TXOK | ISR_TXDESC))
                 Dprintk("odd isr? 0x%08x\n", isr);
@@ -1322,10 +1321,10 @@
         if ((ISR_RXDESC | ISR_RXOK) & isr) {
                 prefetch(dev->rx_info.next_rx_desc);
 
-                spin_lock_irq(&dev->misc_lock);
+                spin_lock_irqsave(&dev->misc_lock, flags);
                 dev->IMR_cache &= ~(ISR_RXDESC | ISR_RXOK);
                 writel(dev->IMR_cache, dev->base + IMR);
-                spin_unlock_irq(&dev->misc_lock);
+                spin_unlock_irqrestore(&dev->misc_lock, flags);
 
                 tasklet_schedule(&dev->rx_tasklet);
                 //rx_irq(ndev);
@@ -1371,16 +1370,18 @@
          * work has accumulated
          */
         if ((ISR_TXDESC | ISR_TXIDLE | ISR_TXOK | ISR_TXERR) & isr) {
+                spin_lock_irqsave(&dev->tx_lock, flags);
                 do_tx_done(ndev);
+                spin_unlock_irqrestore(&dev->tx_lock, flags);
 
                 /* Disable TxOk if there are no outstanding tx packets.
                  */
                 if ((dev->tx_done_idx == dev->tx_free_idx) &&
                     (dev->IMR_cache & ISR_TXOK)) {
-                        spin_lock_irq(&dev->misc_lock);
+                        spin_lock_irqsave(&dev->misc_lock, flags);
                         dev->IMR_cache &= ~ISR_TXOK;
                         writel(dev->IMR_cache, dev->base + IMR);
-                        spin_unlock_irq(&dev->misc_lock);
+                        spin_unlock_irqrestore(&dev->misc_lock, flags);
                 }
         }
 
@@ -1393,10 +1390,10 @@
          * nature are expected, we must enable TxOk.
          */
         if ((ISR_TXIDLE & isr) && (dev->tx_done_idx != dev->tx_free_idx)) {
-                spin_lock_irq(&dev->misc_lock);
+                spin_lock_irqsave(&dev->misc_lock, flags);
                 dev->IMR_cache |= ISR_TXOK;
                 writel(dev->IMR_cache, dev->base + IMR);
-                spin_unlock_irq(&dev->misc_lock);
+                spin_unlock_irqrestore(&dev->misc_lock, flags);
         }
 
         /* MIB interrupt: one of the statistics counters is about to overflow */
@@ -1458,7 +1455,7 @@
         u32 tx_done_idx, *desc;
         unsigned long flags;
 
-        local_irq_save(flags);
+        spin_lock_irqsave(&dev->tx_lock, flags);
 
         tx_done_idx = dev->tx_done_idx;
         desc = dev->tx_descs + (tx_done_idx * DESC_SIZE);
@@ -1485,7 +1482,7 @@
                 ndev->name,
                 tx_done_idx, dev->tx_free_idx, le32_to_cpu(desc[DESC_CMDSTS]));
 
-        local_irq_restore(flags);
+        spin_unlock_irqrestore(&dev->tx_lock, flags);
 }
 
 static void ns83820_tx_watch(unsigned long data)