Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[SCSI] sd: fix array cache flushing bug causing performance problems

Some arrays synchronize their full non-volatile cache when the sd driver
sends a SYNCHRONIZE CACHE command. Unfortunately, they can have terabytes
of this cache, and we send a SYNCHRONIZE CACHE for every barrier if an
array reports that it has a writeback cache. This leads to massive
slowdowns on journalled filesystems.

The fix is to allow userspace to turn off the writeback cache setting as
a temporary measure (i.e. without doing the MODE SELECT to write it back
to the device). Even though the device reported a writeback cache, a
user who knows that the cache is non-volatile, and who cares only about
filesystem correctness, can then turn that bit off in the kernel and
avoid the performance-ruinous (and safety-irrelevant) SYNCHRONIZE CACHE
commands.

The way you do this is to add a 'temporary' prefix when performing the
usual cache-setting operations, for example:

echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type
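For tools that set this from C rather than from the shell, a minimal POSIX sketch follows. `write_cache_type()` is a hypothetical helper, not a kernel or libc API. It assumes the attribute should receive exactly the bytes that `echo` writes, including the trailing newline, and hands them over in a single write() call.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or libc API): write a cache mode to an
 * sd cache_type attribute using the same bytes that
 *     echo [temporary ]<mode> > /sys/class/scsi_disk/<disk>/cache_type
 * would produce, including echo's trailing newline, in one write() call.
 * Returns 0 on success, -1 on any failure.
 */
static int write_cache_type(const char *path, const char *mode, int temporary)
{
	char buf[128];
	int len;
	int fd;
	ssize_t n;

	assert(path != NULL && mode != NULL);

	len = snprintf(buf, sizeof(buf), "%s%s\n",
		       temporary ? "temporary " : "", mode);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;	/* formatting error or mode string too long */

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* A store the kernel rejects (e.g. -EINVAL) makes write() fail. */
	n = write(fd, buf, (size_t)len);
	if (close(fd) != 0 || n != len)
		return -1;
	return 0;
}
```

On a real system the call would be something like `write_cache_type("/sys/class/scsi_disk/<disk>/cache_type", "write through", 1)`, which typically requires root. A mode string the driver does not recognize surfaces as a failed write() and therefore as -1.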

Reported-by: Ric Wheeler <rwheeler@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

+21
+20
drivers/scsi/sd.c
···
 	char *buffer_data;
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
+	const char *temp = "temporary ";
 	int len;
 
 	if (sdp->type != TYPE_DISK)
···
 		 * can do it, but there's probably so many exceptions
 		 * it's not worth the risk */
 		return -EINVAL;
+
+	if (strncmp(buf, temp, sizeof(temp) - 1) == 0) {
+		buf += sizeof(temp) - 1;
+		sdkp->cache_override = 1;
+	} else {
+		sdkp->cache_override = 0;
+	}
 
 	for (i = 0; i < ARRAY_SIZE(sd_cache_types); i++) {
 		len = strlen(sd_cache_types[i]);
···
 		return -EINVAL;
 	rcd = ct & 0x01 ? 1 : 0;
 	wce = ct & 0x02 ? 1 : 0;
+
+	if (sdkp->cache_override) {
+		sdkp->WCE = wce;
+		sdkp->RCD = rcd;
+		return count;
+	}
+
 	if (scsi_mode_sense(sdp, 0x08, 8, buffer, sizeof(buffer), SD_TIMEOUT,
 			    SD_MAX_RETRIES, &data, NULL))
 		return -EINVAL;
···
 	int old_rcd = sdkp->RCD;
 	int old_dpofua = sdkp->DPOFUA;
 
+
+	if (sdkp->cache_override)
+		return;
+
 	first_len = 4;
 	if (sdp->skip_ms_page_8) {
 		if (sdp->type == TYPE_RBC)
···
 	sdkp->capacity = 0;
 	sdkp->media_present = 1;
 	sdkp->write_prot = 0;
+	sdkp->cache_override = 0;
 	sdkp->WCE = 0;
 	sdkp->RCD = 0;
 	sdkp->ATO = 0;
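The prefix handling and the RCD/WCE mapping in the store path can be exercised outside the kernel. Below is a minimal userspace sketch. The table contents and their order are an assumption modeled on the driver's sd_cache_types[], with index bit 0 giving RCD and bit 1 giving WCE as in the `ct & 0x01` / `ct & 0x02` lines. One detail differs on purpose. In the diff, `temp` is declared `const char *`, so `sizeof(temp) - 1` evaluates to the size of a pointer minus one (7 on a 64-bit build), not the length of "temporary ". The sketch therefore declares `temp` as an array, so that sizeof covers the string itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed table, modeled on sd_cache_types[] in sd.c: the index encodes
 * the requested bits, bit 0 -> RCD and bit 1 -> WCE. */
static const char *const cache_types[] = {
	"write through",              /* RCD=0, WCE=0 */
	"none",                       /* RCD=1, WCE=0 */
	"write back",                 /* RCD=0, WCE=1 */
	"write back, no read (daft)", /* RCD=1, WCE=1 */
};

struct cache_request {
	int temporary;	/* 1 if the "temporary " prefix was present */
	int rcd;	/* requested RCD (read cache disable) bit */
	int wce;	/* requested WCE (write cache enable) bit */
};

/* Parse "[temporary ]<type>\n". Returns 0 and fills *req on success,
 * -1 if the cache type is not recognized. */
static int parse_cache_type(const char *buf, struct cache_request *req)
{
	/* An array, not a pointer: sizeof(temp) is strlen + 1 here, whereas
	 * for "const char *temp" it would be sizeof(char *). */
	static const char temp[] = "temporary ";
	size_t i;

	assert(buf != NULL && req != NULL);

	req->temporary = strncmp(buf, temp, sizeof(temp) - 1) == 0;
	if (req->temporary)
		buf += sizeof(temp) - 1;

	for (i = 0; i < sizeof(cache_types) / sizeof(cache_types[0]); i++) {
		size_t len = strlen(cache_types[i]);

		/* Exact match terminated by echo's newline, so "write back"
		 * does not match "write back, no read (daft)". */
		if (strncmp(cache_types[i], buf, len) == 0 && buf[len] == '\n') {
			req->rcd = i & 0x01 ? 1 : 0;
			req->wce = i & 0x02 ? 1 : 0;
			return 0;
		}
	}
	return -1;
}
```

Fed the bytes written by the echo example, `"temporary write through\n"`, the sketch reports temporary = 1, rcd = 0 and wce = 0.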
+1
drivers/scsi/sd.h
···
 	u8 protection_type;/* Data Integrity Field */
 	u8 provisioning_mode;
 	unsigned ATO : 1;	/* state of disk ATO bit */
+	unsigned cache_override : 1; /* temp override of WCE,RCD */
 	unsigned WCE : 1;	/* state of disk WCE bit */
 	unsigned RCD : 1;	/* state of disk RCD bit, unused */
 	unsigned DPOFUA : 1;	/* state of disk DPOFUA bit */