Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

writeback: merge for_kupdate and !for_kupdate cases

Unify the logic for kupdate and non-kupdate cases. There won't be
starvation because the inodes requeued into b_more_io will later be
spliced _after_ the remaining inodes in b_io, hence won't stand in the way
of other inodes in the next run.
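
The ordering argument above can be sketched in user space. This is only
an illustrative model, not the kernel code: the queues are plain arrays
served front to back, and build_next_run() is a hypothetical helper name
that does not exist in fs/fs-writeback.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: each queue is an array of inode numbers served
 * front to back. The kernel uses linked lists; the arrays here only
 * illustrate the ordering described in the commit message.
 *
 * Builds the queue for the next run: the inodes still waiting in b_io
 * come first, the inodes requeued into b_more_io are placed after them.
 * Returns the number of entries written to next (at most cap).
 */
size_t build_next_run(const int *b_io, size_t n_io,
                      const int *b_more_io, size_t n_more,
                      int *next, size_t cap)
{
    size_t n = 0;
    size_t i;

    assert(next != NULL);

    /* Remaining b_io inodes keep their place at the front. */
    for (i = 0; i < n_io && n < cap; i++)
        next[n++] = b_io[i];

    /* Requeued inodes follow, so they cannot starve the others. */
    for (i = 0; i < n_more && n < cap; i++)
        next[n++] = b_more_io[i];

    return n;
}
```

In this model an inode that used up its slice and was requeued always
ends up behind the inodes that have not been served yet.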

This avoids unnecessary redirty_tail() calls, and hence unnecessary
updates of i_dirtied_when. The timestamp update is undesirable because it
could later delay the inode's periodic writeback, or may exclude the inode
from the data integrity sync operation (which checks the timestamp to
avoid extra work and livelock).
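
A minimal user-space sketch of the unified decision follows. It assumes
a simplified inode with only a queue tag and a timestamp; struct
inode_model, the Q_* names and requeue_after_writeback() are
hypothetical. The model also refreshes the timestamp unconditionally on
the redirty path, which is a simplification and may not match the exact
conditions used by the kernel's redirty_tail().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue tags standing in for b_io, b_more_io and b_dirty. */
enum queue_model { Q_IO, Q_MORE_IO, Q_DIRTY };

/* Hypothetical, heavily simplified stand-in for struct inode. */
struct inode_model {
    unsigned long dirtied_when; /* models i_dirtied_when */
    enum queue_model queue;     /* which list the inode sits on */
};

/*
 * Models the merged branch for an inode that still has dirty pages
 * after a writeback attempt, regardless of for_kupdate:
 *  - slice used up (nr_to_write <= 0): requeue, timestamp untouched;
 *  - otherwise blocked: redirty at the tail, timestamp refreshed
 *    (unconditionally in this model, an assumption).
 */
void requeue_after_writeback(struct inode_model *inode,
                             long nr_to_write, unsigned long now)
{
    assert(inode != NULL);

    if (nr_to_write <= 0) {
        inode->queue = Q_MORE_IO;   /* like requeue_io() */
    } else {
        inode->dirtied_when = now;  /* like redirty_tail() */
        inode->queue = Q_DIRTY;
    }
}
```

Only the second branch touches the timestamp, which is why the requeue
path is preferred whenever the slice was merely used up.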

===
How the redirty_tail() came about:

It is a long story. This redirty_tail() was introduced with
wbc.more_io. The initial patch for more_io actually did not have the
redirty_tail(), and when it was merged, several 100% iowait bug reports
arose:

reiserfs:
http://lkml.org/lkml/2007/10/23/93

jfs:
commit 29a424f28390752a4ca2349633aaacc6be494db5
JFS: clear PAGECACHE_TAG_DIRTY for no-write pages

ext2:
http://www.spinics.net/linux/lists/linux-ext4/msg04762.html

They were all old bugs hidden in various filesystems that became "visible"
with the more_io patch. At the time, the ext2 bug was thought to be
"trivial", so it was not fixed. Instead the following updated more_io
patch with redirty_tail() was merged:

http://www.spinics.net/linux/lists/linux-ext4/msg04507.html

This will in general prevent 100% iowait caused by the ext2 bug and
possibly other unknown FS bugs.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Wu Fengguang and committed by Linus Torvalds
a50aeb40 4ea879b9

+10 -33
fs/fs-writeback.c
@@ -374,45 +374,22 @@
 		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
 			 * We didn't write back all the pages. nfs_writepages()
-			 * sometimes bales out without doing anything. Redirty
-			 * the inode; Move it from b_io onto b_more_io/b_dirty.
+			 * sometimes bales out without doing anything.
 			 */
-			/*
-			 * akpm: if the caller was the kupdate function we put
-			 * this inode at the head of b_dirty so it gets first
-			 * consideration. Otherwise, move it to the tail, for
-			 * the reasons described there. I'm not really sure
-			 * how much sense this makes. Presumably I had a good
-			 * reasons for doing it this way, and I'd rather not
-			 * muck with it at present.
-			 */
-			if (wbc->for_kupdate) {
+			inode->i_state |= I_DIRTY_PAGES;
+			if (wbc->nr_to_write <= 0) {
 				/*
-				 * For the kupdate function we move the inode
-				 * to b_more_io so it will get more writeout as
-				 * soon as the queue becomes uncongested.
+				 * slice used up: queue for next turn
 				 */
-				inode->i_state |= I_DIRTY_PAGES;
-				if (wbc->nr_to_write <= 0) {
-					/*
-					 * slice used up: queue for next turn
-					 */
-					requeue_io(inode);
-				} else {
-					/*
-					 * somehow blocked: retry later
-					 */
-					redirty_tail(inode);
-				}
+				requeue_io(inode);
 			} else {
 				/*
-				 * Otherwise fully redirty the inode so that
-				 * other inodes on this superblock will get some
-				 * writeout. Otherwise heavy writing to one
-				 * file would indefinitely suspend writeout of
-				 * all the other files.
+				 * Writeback blocked by something other than
+				 * congestion. Delay the inode for some time to
+				 * avoid spinning on the CPU (100% iowait)
+				 * retrying writeback of the dirty page/inode
+				 * that cannot be performed immediately.
 				 */
-				inode->i_state |= I_DIRTY_PAGES;
 				redirty_tail(inode);
 			}
 		} else if (inode->i_state & I_DIRTY) {