Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

spi: cadence-quadspi: Prevent indirect read

Merge series from Mateusz Litwin <mateusz.litwin@nokia.com>:

On the Stratix10 platform, indirect reads can become very slow due to lost
interrupts and/or missed `complete()` calls, causing
`wait_for_completion_timeout()` to expire.

Three issues were identified:
1) A race condition exists between the read loop and IRQ `complete()`
call:
An IRQ can call `complete()` after the inner loop ends, but before
`reinit_completion()`, losing the completion event and leading to
`wait_for_completion_timeout()` expire. This function will not return
an error because `bytes_to_read` > 0 (indicating data is already in the
FIFO) and the final `ret` value is overwritten by
`cqspi_wait_for_bit()` return value (indicating request completion),
masking the timeout.

For test purpose, logging was added to print the count of timeouts and
the outer loop count.
$ dd if=/dev/mtd0 of=/dev/null bs=64M count=1
[ 2232.925219] cadence-qspi ff8d2000.spi: Indirect read error timeout
(1) loop (12472)
[ 2236.200391] cadence-qspi ff8d2000.spi: Indirect read error timeout
(1) loop (12460)
[ 2239.482836] cadence-qspi ff8d2000.spi: Indirect read error timeout
(5) loop (12450)
This indicates that such an event is rare, but possible.
Tested on the Stratix10 platform.

2) The quirk assumes the indirect read path never leaves the inner loop on
SoCFPGA. This assumption is incorrect when using slow flash. Disabling
IRQs in the inner loop can cause lost interrupts.

3) The `CQSPI_SLOW_SRAM` quirk disables `CQSPI_REG_IRQ_IND_COMP` (indirect
completion) interrupt, relying solely on the `CQSPI_REG_IRQ_WATERMARK`
(FIFO watermark) interrupt. For small transfers sizes, the final data
read might not fill the FIFO sufficiently to trigger the watermark,
preventing completion and leading to wait_for_completion_timeout()
expiration.

Two patches have been prepared to resolve these issues.
- [1/2] spi: cadence-quadspi: Prevent lost complete() call during
indirect read
Moving `reinit_completion()` before the inner loop prevents a race
condition. This might cause a premature IRQ complete() call to occur;
however, in the worst case, this will result in a spurious wakeup and
another wait cycle, which is preferable to waiting for a timeout.

- [2/2] spi: cadence-quadspi: Improve CQSPI_SLOW_SRAM quirk if flash is
slow
Re-enabling `CQSPI_REG_IRQ_IND_COMP` interrupt resolves the problem for
small reads and removes the disabling of interrupts, addressing the
issue with lost interrupts. This marginally increases the IRQ count.

Test:
$ dd if=/dev/mtd0 of=/dev/null bs=1M count=64
Results from the Stratix10 platform with mt25qu02g flash.
FIFO size in all tests: 128

Serviced interrupt call counts:
Without `CQSPI_SLOW_SRAM` quirk: 16 668 850
With `CQSPI_SLOW_SRAM` quirk: 204 176
With `CQSPI_SLOW_SRAM` and this patch: 224 528

Patch 2/2: Delivers a substantial read‑performance improvement for the
Cadence QSPI controller on the Stratix10 platform. Patch 1/2: Applies to
all platforms and should yield a modest performance gain, most noticeable
with large `CQSPI_READ_TIMEOUT_MS` values and workloads dominated by many
small reads.

+11 -12
+11 -12
drivers/spi/spi-cadence-quadspi.c
··· 300 300 CQSPI_REG_IRQ_IND_SRAM_FULL | \ 301 301 CQSPI_REG_IRQ_IND_COMP) 302 302 303 + #define CQSPI_IRQ_MASK_RD_SLOW_SRAM (CQSPI_REG_IRQ_WATERMARK | \ 304 + CQSPI_REG_IRQ_IND_COMP) 305 + 303 306 #define CQSPI_IRQ_MASK_WR (CQSPI_REG_IRQ_IND_COMP | \ 304 307 CQSPI_REG_IRQ_WATERMARK | \ 305 308 CQSPI_REG_IRQ_UNDERFLOW) ··· 384 381 else if (!cqspi->slow_sram) 385 382 irq_status &= CQSPI_IRQ_MASK_RD | CQSPI_IRQ_MASK_WR; 386 383 else 387 - irq_status &= CQSPI_REG_IRQ_WATERMARK | CQSPI_IRQ_MASK_WR; 384 + irq_status &= CQSPI_IRQ_MASK_RD_SLOW_SRAM | CQSPI_IRQ_MASK_WR; 388 385 389 386 if (irq_status) 390 387 complete(&cqspi->transfer_complete); ··· 760 757 */ 761 758 762 759 if (use_irq && cqspi->slow_sram) 763 - writel(CQSPI_REG_IRQ_WATERMARK, reg_base + CQSPI_REG_IRQMASK); 760 + writel(CQSPI_IRQ_MASK_RD_SLOW_SRAM, reg_base + CQSPI_REG_IRQMASK); 764 761 else if (use_irq) 765 762 writel(CQSPI_IRQ_MASK_RD, reg_base + CQSPI_REG_IRQMASK); 766 763 else ··· 772 769 readl(reg_base + CQSPI_REG_INDIRECTRD); /* Flush posted write. */ 773 770 774 771 while (remaining > 0) { 772 + ret = 0; 775 773 if (use_irq && 776 774 !wait_for_completion_timeout(&cqspi->transfer_complete, 777 775 msecs_to_jiffies(CQSPI_READ_TIMEOUT_MS))) 778 776 ret = -ETIMEDOUT; 779 777 780 778 /* 781 - * Disable all read interrupts until 782 - * we are out of "bytes to read" 779 + * Prevent lost interrupt and race condition by reinitializing early. 780 + * A spurious wakeup and another wait cycle can occur here, 781 + * which is preferable to waiting until timeout if interrupt is lost. 783 782 */ 784 - if (cqspi->slow_sram) 785 - writel(0x0, reg_base + CQSPI_REG_IRQMASK); 783 + if (use_irq) 784 + reinit_completion(&cqspi->transfer_complete); 786 785 787 786 bytes_to_read = cqspi_get_rd_sram_level(cqspi); 788 787 ··· 815 810 rxbuf += bytes_to_read; 816 811 remaining -= bytes_to_read; 817 812 bytes_to_read = cqspi_get_rd_sram_level(cqspi); 818 - } 819 - 820 - if (use_irq && remaining > 0) { 821 - reinit_completion(&cqspi->transfer_complete); 822 - if (cqspi->slow_sram) 823 - writel(CQSPI_REG_IRQ_WATERMARK, reg_base + CQSPI_REG_IRQMASK); 824 813 } 825 814 } 826 815