Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Improve setting of "events_cleared" for write-intent bitmaps.

When an array is degraded, bits in the write-intent bitmap are not
cleared, so that if the missing device is re-added, it can be synced
by updating only those parts of the device that have changed since
it was removed.

To enable this, an 'events_cleared' value is stored. It is the event
counter for the array the last time that any bits were cleared.

Sometimes - if a device disappears from an array while it is 'clean' -
the events_cleared value gets updated incorrectly (there are subtle
ordering issues between updating the event count in the main metadata
and in the bitmap metadata), resulting in the missing device appearing
to require a full resync when it is re-added.

With this patch, we update events_cleared precisely when we are about
to clear a bit in the bitmap. We record events_cleared when we clear
the bit internally, and copy that value to the superblock, which is
written out before the cleared bit reaches storage. This makes the
behaviour more "obviously correct".

We also need to update events_cleared when the event_count is going
backwards (as happens on a dirty->clean transition of a non-degraded
array).

Thanks to Mike Snitzer for identifying this problem and testing early
"fixes".

Cc: "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>

 drivers/md/bitmap.c         | 29 +++++++++++++++++++-------
 include/linux/raid/bitmap.h |  1 +
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -454,8 +454,11 @@
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 	sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
 	sb->events = cpu_to_le64(bitmap->mddev->events);
-	if (!bitmap->mddev->degraded)
-		sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
+	if (bitmap->mddev->events < bitmap->events_cleared) {
+		/* rocking back to read-only */
+		bitmap->events_cleared = bitmap->mddev->events;
+		sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
+	}
 	kunmap_atomic(sb, KM_USER0);
 	write_page(bitmap, bitmap->sb_page, 1);
 }
@@ -1088,9 +1085,19 @@
 	} else
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 	lastpage = page;
-	/*
-	printk("bitmap clean at page %lu\n", j);
-	*/
+
+	/* We are possibly going to clear some bits, so make
+	 * sure that events_cleared is up-to-date.
+	 */
+	if (bitmap->need_sync) {
+		bitmap_super_t *sb;
+		bitmap->need_sync = 0;
+		sb = kmap_atomic(bitmap->sb_page, KM_USER0);
+		sb->events_cleared =
+			cpu_to_le64(bitmap->events_cleared);
+		kunmap_atomic(sb, KM_USER0);
+		write_page(bitmap, bitmap->sb_page, 1);
+	}
 	spin_lock_irqsave(&bitmap->lock, flags);
 	clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
 }
@@ -1268,6 +1255,12 @@
 	if (!bmc) {
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 		return;
+	}
+
+	if (success &&
+	    bitmap->events_cleared < bitmap->mddev->events) {
+		bitmap->events_cleared = bitmap->mddev->events;
+		bitmap->need_sync = 1;
 	}
 
 	if (!success && ! (*bmc & NEEDED_MASK))
diff --git a/include/linux/raid/bitmap.h b/include/linux/raid/bitmap.h
--- a/include/linux/raid/bitmap.h
+++ b/include/linux/raid/bitmap.h
@@ -221,6 +221,7 @@
 	unsigned long syncchunk;
 
 	__u64 events_cleared;
+	int need_sync;
 
 	/* bitmap spinlock */
 	spinlock_t lock;