Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

jbd: fix race between free buffer and commit transaction

journal_try_to_free_buffers() could race with jbd commit transaction when
the later is holding the buffer reference while waiting for the data
buffer to flush to disk. If the caller of journal_try_to_free_buffers()
request tries hard to release the buffers, it will treat the failure as
error and return back to the caller. We have seen the directo IO failed
due to this race. Some of the caller of releasepage() also expecting the
buffer to be dropped when passed with GFP_KERNEL mask to the
releasepage()->journal_try_to_free_buffers().

With this patch, if the caller is passing the __GFP_WAIT and __GFP_FS to
indicating this call could wait, in case of try_to_free_buffers() failed,
let's waiting for journal_commit_transaction() to finish commit the
current committing transaction, then try to free those buffers again.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

Mingming Cao and committed by
Linus Torvalds
3f31fddf 9ebfbe9f

+56 -4
+55 -2
fs/jbd/transaction.c
··· 1648 1648 return; 1649 1649 } 1650 1650 1651 + /* 1652 + * journal_try_to_free_buffers() could race with journal_commit_transaction() 1653 + * The latter might still hold the a count on buffers when inspecting 1654 + * them on t_syncdata_list or t_locked_list. 1655 + * 1656 + * journal_try_to_free_buffers() will call this function to 1657 + * wait for the current transaction to finish syncing data buffers, before 1658 + * tryinf to free that buffer. 1659 + * 1660 + * Called with journal->j_state_lock held. 1661 + */ 1662 + static void journal_wait_for_transaction_sync_data(journal_t *journal) 1663 + { 1664 + transaction_t *transaction = NULL; 1665 + tid_t tid; 1666 + 1667 + spin_lock(&journal->j_state_lock); 1668 + transaction = journal->j_committing_transaction; 1669 + 1670 + if (!transaction) { 1671 + spin_unlock(&journal->j_state_lock); 1672 + return; 1673 + } 1674 + 1675 + tid = transaction->t_tid; 1676 + spin_unlock(&journal->j_state_lock); 1677 + log_wait_commit(journal, tid); 1678 + } 1651 1679 1652 1680 /** 1653 1681 * int journal_try_to_free_buffers() - try to free page buffers. 1654 1682 * @journal: journal for operation 1655 1683 * @page: to try and free 1656 - * @unused_gfp_mask: unused 1684 + * @gfp_mask: we use the mask to detect how hard should we try to release 1685 + * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to 1686 + * release the buffers. 1657 1687 * 1658 1688 * 1659 1689 * For all the buffers on this page, ··· 1712 1682 * journal_try_to_free_buffer() is changing its state. But that 1713 1683 * cannot happen because we never reallocate freed data as metadata 1714 1684 * while the data is part of a transaction. Yes? 1685 + * 1686 + * Return 0 on failure, 1 on success 1715 1687 */ 1716 1688 int journal_try_to_free_buffers(journal_t *journal, 1717 - struct page *page, gfp_t unused_gfp_mask) 1689 + struct page *page, gfp_t gfp_mask) 1718 1690 { 1719 1691 struct buffer_head *head; 1720 1692 struct buffer_head *bh; ··· 1745 1713 if (buffer_jbd(bh)) 1746 1714 goto busy; 1747 1715 } while ((bh = bh->b_this_page) != head); 1716 + 1748 1717 ret = try_to_free_buffers(page); 1718 + 1719 + /* 1720 + * There are a number of places where journal_try_to_free_buffers() 1721 + * could race with journal_commit_transaction(), the later still 1722 + * holds the reference to the buffers to free while processing them. 1723 + * try_to_free_buffers() failed to free those buffers. Some of the 1724 + * caller of releasepage() request page buffers to be dropped, otherwise 1725 + * treat the fail-to-free as errors (such as generic_file_direct_IO()) 1726 + * 1727 + * So, if the caller of try_to_release_page() wants the synchronous 1728 + * behaviour(i.e make sure buffers are dropped upon return), 1729 + * let's wait for the current transaction to finish flush of 1730 + * dirty data buffers, then try to free those buffers again, 1731 + * with the journal locked. 1732 + */ 1733 + if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) { 1734 + journal_wait_for_transaction_sync_data(journal); 1735 + ret = try_to_free_buffers(page); 1736 + } 1737 + 1749 1738 busy: 1750 1739 return ret; 1751 1740 }
+1 -2
mm/filemap.c
··· 2563 2563 * Otherwise return zero. 2564 2564 * 2565 2565 * The @gfp_mask argument specifies whether I/O may be performed to release 2566 - * this page (__GFP_IO), and whether the call may block (__GFP_WAIT). 2566 + * this page (__GFP_IO), and whether the call may block (__GFP_WAIT & __GFP_FS). 2567 2567 * 2568 - * NOTE: @gfp_mask may go away, and this function may become non-blocking. 2569 2568 */ 2570 2569 int try_to_release_page(struct page *page, gfp_t gfp_mask) 2571 2570 {