Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

can: mcp251x: Fix race condition on receive interrupt

The mcp251x driver uses both receiving mailboxes of the CAN controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled and will retrieve the
messages accordingly.

This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.

This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.

For reproducing the bug I created the following setup:
- Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
- Setup CAN to 1 MHz
- Spam bursts of 5 CAN-messages with increasing CAN-ids
- Continue sending the bursts while sleeping a second between the bursts
- Check on the RPi whether the received messages have increasing CAN-ids
- Without this patch, every burst of messages will contain a flipped pair

v3: https://lore.kernel.org/all/20220804075914.67569-1-sebastian.wuerl@ororatech.com
v2: https://lore.kernel.org/all/20220804064803.63157-1-sebastian.wuerl@ororatech.com
v1: https://lore.kernel.org/all/20220803153300.58732-1-sebastian.wuerl@ororatech.com

Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Sebastian Würl <sebastian.wuerl@ororatech.com>
Link: https://lore.kernel.org/all/20220804081411.68567-1-sebastian.wuerl@ororatech.com
[mkl: reduce scope of intf1, eflag1]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

authored by

Sebastian Würl and committed by
Marc Kleine-Budde
d80d60b0 a4cb6e62

+15 -3
+15 -3
drivers/net/can/spi/mcp251x.c
··· 1070 1070 1071 1071 mcp251x_read_2regs(spi, CANINTF, &intf, &eflag); 1072 1072 1073 - /* mask out flags we don't care about */ 1074 - intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR; 1075 - 1076 1073 /* receive buffer 0 */ 1077 1074 if (intf & CANINTF_RX0IF) { 1078 1075 mcp251x_hw_rx(spi, 0); ··· 1079 1082 if (mcp251x_is_2510(spi)) 1080 1083 mcp251x_write_bits(spi, CANINTF, 1081 1084 CANINTF_RX0IF, 0x00); 1085 + 1086 + /* check if buffer 1 is already known to be full, no need to re-read */ 1087 + if (!(intf & CANINTF_RX1IF)) { 1088 + u8 intf1, eflag1; 1089 + 1090 + /* intf needs to be read again to avoid a race condition */ 1091 + mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1); 1092 + 1093 + /* combine flags from both operations for error handling */ 1094 + intf |= intf1; 1095 + eflag |= eflag1; 1096 + } 1082 1097 } 1083 1098 1084 1099 /* receive buffer 1 */ ··· 1100 1091 if (mcp251x_is_2510(spi)) 1101 1092 clear_intf |= CANINTF_RX1IF; 1102 1093 } 1094 + 1095 + /* mask out flags we don't care about */ 1096 + intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR; 1103 1097 1104 1098 /* any error or tx interrupt we need to clear? */ 1105 1099 if (intf & (CANINTF_ERR | CANINTF_TX))