Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cxlflash: Fix to resolve cmd leak after host reset

After a few iterations of resetting the card, either during EEH
recovery or a host_reset, the following is seen in the logs:

  cxlflash 0008:00: cxlflash_queuecommand: could not get a free command

At every reset of the card, the commands that are outstanding are being
leaked. No effort is being made to reap these commands. A few more
resets later, the above error message floods the logs and the card is
rendered totally unusable as no free commands are available.

Iterated through the 'cmd' queue and printed out the 'free' counter and
found that on each reset certain commands were in-use and stayed in-use
through subsequent resets.
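
The diagnostic described above can be approximated in user space. This is a minimal sketch, not the driver's code: `struct sketch_cmd`, `pool_init()`, `count_in_use()`, `report_in_use()` and `NUM_CMDS` are hypothetical names, and C11 `atomic_int` stands in for the kernel's `atomic_t`. It assumes, as the patch's `!atomic_read(&cmd->free)` test suggests, that a zero 'free' value means the command is in use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_CMDS 8	/* hypothetical pool size; the driver uses CXLFLASH_NUM_CMDS */

/* Simplified, hypothetical stand-in for the driver's struct afu_cmd. */
struct sketch_cmd {
	atomic_int free;	/* 1 = available, 0 = in use */
};

/* Mark every command in the pool as available. */
static void pool_init(struct sketch_cmd *pool, int n)
{
	assert(pool != NULL);
	for (int i = 0; i < n; i++)
		atomic_init(&pool[i].free, 1);
}

/* Count the commands whose 'free' flag is clear, i.e. still checked out. */
static int count_in_use(struct sketch_cmd *pool, int n)
{
	int in_use = 0;

	for (int i = 0; i < n; i++)
		if (!atomic_load(&pool[i].free))
			in_use++;
	return in_use;
}

/* Print one line per reset; a count that never drops suggests a leak. */
static void report_in_use(struct sketch_cmd *pool, int n, int reset_no)
{
	printf("reset %d: %d of %d commands in use\n",
	       reset_no, count_in_use(pool, n), n);
}
```

Calling `report_in_use()` after each simulated reset would show the same commands staying in use from one reset to the next, which is the symptom the commit message describes.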

To resolve this issue, when the card is reset, reap all the commands
that are active/outstanding.
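
The reaping idea can be illustrated with a small user-space sketch. It is written under stated assumptions and is not the driver's implementation: `struct sketch_cmd`, `pool_init()`, `cmd_checkout()` and `reap_outstanding()` are hypothetical, the `cmd_checkin()` here is a simplified stand-in that only shares its name with the driver function, and C11 atomics replace the kernel's `atomic_t` helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_CMDS 4	/* hypothetical pool size; the driver uses CXLFLASH_NUM_CMDS */

/* Simplified, hypothetical stand-in for the driver's struct afu_cmd. */
struct sketch_cmd {
	atomic_int free;	/* 1 = available, 0 = in use */
};

/* Mark every command in the pool as available. */
static void pool_init(struct sketch_cmd *pool, int n)
{
	assert(pool != NULL);
	for (int i = 0; i < n; i++)
		atomic_init(&pool[i].free, 1);
}

/* Claim the first available command, or return NULL if the pool is empty. */
static struct sketch_cmd *cmd_checkout(struct sketch_cmd *pool, int n)
{
	for (int i = 0; i < n; i++) {
		int expected = 1;

		/* Atomically flip free from 1 to 0 so two callers cannot share a command. */
		if (atomic_compare_exchange_strong(&pool[i].free, &expected, 0))
			return &pool[i];
	}
	return NULL;
}

/* Return a command to the pool (simplified stand-in for the driver function). */
static void cmd_checkin(struct sketch_cmd *cmd)
{
	assert(cmd != NULL);
	atomic_store(&cmd->free, 1);
}

/* On reset, check in every command that is still outstanding. */
static int reap_outstanding(struct sketch_cmd *pool, int n)
{
	int reaped = 0;

	for (int i = 0; i < n; i++) {
		if (!atomic_load(&pool[i].free)) {
			cmd_checkin(&pool[i]);
			reaped++;
		}
	}
	return reaped;
}
```

Without the `reap_outstanding()` step, every command that is outstanding at reset time stays checked out, so `cmd_checkout()` eventually returns NULL for every caller. That corresponds to the "could not get a free command" condition in the log.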

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Authored by Manoj Kumar and committed by Martin K. Petersen
ee91e332 85599218

+17 -2
drivers/scsi/cxlflash/main.c
···
  * @cfg:	Internal structure associated with the host.
  *
  * Safe to call with AFU in a partially allocated/initialized state.
+ *
+ * Cleans up all state associated with the command queue, and unmaps
+ * the MMIO space.
+ *
+ * - complete() will take care of commands we initiated (they'll be checked
+ *   in as part of the cleanup that occurs after the completion)
+ *
+ * - cmd_checkin() will take care of entries that we did not initiate and that
+ *   have not (and will not) complete because they are sitting on a [now stale]
+ *   hardware queue
  */
 static void stop_afu(struct cxlflash_cfg *cfg)
 {
 	int i;
 	struct afu *afu = cfg->afu;
+	struct afu_cmd *cmd;
 
 	if (likely(afu)) {
-		for (i = 0; i < CXLFLASH_NUM_CMDS; i++)
-			complete(&afu->cmd[i].cevent);
+		for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
+			cmd = &afu->cmd[i];
+			complete(&cmd->cevent);
+			if (!atomic_read(&cmd->free))
+				cmd_checkin(cmd);
+		}
 
 		if (likely(afu->afu_map)) {
 			cxl_psa_unmap((void __iomem *)afu->afu_map);