Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

[SCSI] libfc: possible race could panic system due to NULL fsp->cmd

It is unlikely but in case if it hits then it would cause panic
due to null cmd ptr, so far only one instance seen recently with
ESX though this was introduced long ago with this commit:-

commit c1ecb90a66c5afc7cc5c9349f9c3714eef4a5cfb
Author: Chris Leech <christopher.leech@intel.com>
Date: Thu Dec 10 09:59:26 2009 -0800
[SCSI] libfc: reduce hold time on SCSI host lock

Currently fsp->cmd is set to NULL w/o scsi_queue_lock before
dequeuing from scsi_pkt_queue and that could cause NULL
fsp->cmd in fc_fcp_cleanup_each_cmd for cmd completing
with fsp->cmd = NULL after fc_fcp_cleanup_each_cmd taken
reference. No need to set fsp->cmd to NULL as this is also
protected by fc_fcp_lock_pkt(), for above race the
fc_fcp_lock_pkt() in fc_fcp_cleanup_each_cmd() will fail
as that cmd is already done.

Mike mentioned same issue at
http://www.open-fcoe.org/pipermail/devel/2010-September/010533.html

Similarly moved sc_cmd->SCp.ptr = NULL under scsi_queue_lock so
that scsi abort error handler won't abort on completed cmds.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

authored by

Vasu Dev and committed by
James Bottomley
8b7ac2bb 3067817a

+6 -12
+6 -12
drivers/scsi/libfc/fc_fcp.c
··· 58 58 #define FC_SRB_WRITE (1 << 0) 59 59 60 60 /* 61 - * The SCp.ptr should be tested and set under the host lock. NULL indicates 62 - * that the command has been retruned to the scsi layer. 61 + * The SCp.ptr should be tested and set under the scsi_pkt_queue lock 63 62 */ 64 63 #define CMD_SP(Cmnd) ((struct fc_fcp_pkt *)(Cmnd)->SCp.ptr) 65 64 #define CMD_ENTRY_STATUS(Cmnd) ((Cmnd)->SCp.have_data_in) ··· 1879 1880 1880 1881 lport = fsp->lp; 1881 1882 si = fc_get_scsi_internal(lport); 1882 - if (!fsp->cmd) 1883 - return; 1884 1883 1885 1884 /* 1886 1885 * if can_queue ramp down is done then try can_queue ramp up ··· 1888 1891 fc_fcp_can_queue_ramp_up(lport); 1889 1892 1890 1893 sc_cmd = fsp->cmd; 1891 - fsp->cmd = NULL; 1892 - 1893 - if (!sc_cmd->SCp.ptr) 1894 - return; 1895 - 1896 1894 CMD_SCSI_STATUS(sc_cmd) = fsp->cdb_status; 1897 1895 switch (fsp->status_code) { 1898 1896 case FC_COMPLETE: ··· 1968 1976 1969 1977 spin_lock_irqsave(&si->scsi_queue_lock, flags); 1970 1978 list_del(&fsp->list); 1971 - spin_unlock_irqrestore(&si->scsi_queue_lock, flags); 1972 1979 sc_cmd->SCp.ptr = NULL; 1980 + spin_unlock_irqrestore(&si->scsi_queue_lock, flags); 1973 1981 sc_cmd->scsi_done(sc_cmd); 1974 1982 1975 1983 /* release ref from initial allocation in queue command */ ··· 1987 1995 { 1988 1996 struct fc_fcp_pkt *fsp; 1989 1997 struct fc_lport *lport; 1998 + struct fc_fcp_internal *si; 1990 1999 int rc = FAILED; 1991 2000 unsigned long flags; 1992 2001 ··· 1997 2004 else if (!lport->link_up) 1998 2005 return rc; 1999 2006 2000 - spin_lock_irqsave(lport->host->host_lock, flags); 2007 + si = fc_get_scsi_internal(lport); 2008 + spin_lock_irqsave(&si->scsi_queue_lock, flags); 2001 2009 fsp = CMD_SP(sc_cmd); 2002 2010 if (!fsp) { 2003 2011 /* command completed while scsi eh was setting up */ ··· 2007 2013 } 2008 2014 /* grab a ref so the fsp and sc_cmd cannot be relased from under us */ 2009 2015 fc_fcp_pkt_hold(fsp); 2010 - spin_unlock_irqrestore(lport->host->host_lock, flags); 2016 + spin_unlock_irqrestore(&si->scsi_queue_lock, flags); 2011 2017 2012 2018 if (fc_fcp_lock_pkt(fsp)) { 2013 2019 /* completed while we were waiting for timer to be deleted */