Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

xfs: fix efi/efd error handling to avoid fs shutdown hangs

Freeing an extent in XFS involves logging an EFI (extent free
intention), freeing the actual extent, and logging an EFD (extent
free done). The EFI object is created with a reference count of 2:
one for the current transaction and one for the subsequently created
EFD. Under normal circumstances, the first reference is dropped when
the EFI is unpinned and the second reference is dropped when the EFD
is committed to the on-disk log.

In event of errors or filesystem shutdown, there are various
potential cleanup scenarios depending on the state of the EFI/EFD.
The cleanup scenarios are confusing and racy, as demonstrated by the
following test sequence:

# mount $dev $mnt
# fsstress -d $mnt -n 99999 -p 16 -z -f fallocate=1 \
-f punch=1 -f creat=1 -f unlink=1 &
# sleep 5
# killall -9 fsstress; wait
# godown -f $mnt
# umount

... in which the final umount can hang due to the AIL being pinned
indefinitely by one or more EFI items. This can occur due to several
conditions. For example, if the shutdown occurs after the EFI is
committed to the on-disk log and the EFD committed to the CIL, but
before the EFD committed to the log, the EFD iop_committed() abort
handler does not drop its reference to the EFI. Alternatively,
manual error injection in the xfs_bmap_finish() codepath shows that
if an error occurs after the EFI transaction is committed but before
the EFD is constructed and logged, the EFI is never released from
the AIL.

Update the EFI/EFD item handling code to use a more straightforward
and reliable approach to error handling. If an error occurs after
the EFI transaction is committed and before the EFD is constructed,
release the EFI explicitly from xfs_bmap_finish(). If the EFI
transaction is cancelled, release the EFI in the unlock handler.

Once the EFD is constructed, it is responsible for releasing the EFI
under any circumstances (including whether the EFI item aborts due
to log I/O error). Update the EFD item handlers to release the EFI
if the transaction is cancelled or aborts due to log I/O error.
Finally, update xfs_bmap_finish() to log at least one EFD extent to
the transaction before xfs_free_extent() errors are handled to
ensure the transaction is dirty and EFD item error handling is
triggered.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

authored by

Brian Foster and committed by
Dave Chinner
8d99fe92 d43ac29b

+111 -67
+49 -35
fs/xfs/xfs_bmap_util.c
··· 67 67 */ 68 68 int /* error */ 69 69 xfs_bmap_finish( 70 - xfs_trans_t **tp, /* transaction pointer addr */ 71 - xfs_bmap_free_t *flist, /* i/o: list extents to free */ 72 - int *committed) /* xact committed or not */ 70 + struct xfs_trans **tp, /* transaction pointer addr */ 71 + struct xfs_bmap_free *flist, /* i/o: list extents to free */ 72 + int *committed)/* xact committed or not */ 73 73 { 74 - xfs_efd_log_item_t *efd; /* extent free data */ 75 - xfs_efi_log_item_t *efi; /* extent free intention */ 76 - int error; /* error return value */ 77 - xfs_bmap_free_item_t *free; /* free extent item */ 78 - xfs_mount_t *mp; /* filesystem mount structure */ 79 - xfs_bmap_free_item_t *next; /* next item on free list */ 74 + struct xfs_efd_log_item *efd; /* extent free data */ 75 + struct xfs_efi_log_item *efi; /* extent free intention */ 76 + int error; /* error return value */ 77 + struct xfs_bmap_free_item *free; /* free extent item */ 78 + struct xfs_bmap_free_item *next; /* next item on free list */ 80 79 81 80 ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES); 82 81 if (flist->xbf_count == 0) { ··· 87 88 xfs_trans_log_efi_extent(*tp, efi, free->xbfi_startblock, 88 89 free->xbfi_blockcount); 89 90 90 - error = xfs_trans_roll(tp, NULL); 91 - *committed = 1; 92 - /* 93 - * We have a new transaction, so we should return committed=1, 94 - * even though we're returning an error. 95 - */ 96 - if (error) 91 + error = __xfs_trans_roll(tp, NULL, committed); 92 + if (error) { 93 + /* 94 + * If the transaction was committed, drop the EFD reference 95 + * since we're bailing out of here. The other reference is 96 + * dropped when the EFI hits the AIL. 97 + * 98 + * If the transaction was not committed, the EFI is freed by the 99 + * EFI item unlock handler on abort. Also, we have a new 100 + * transaction so we should return committed=1 even though we're 101 + * returning an error. 102 + */ 103 + if (*committed) { 104 + xfs_efi_release(efi); 105 + xfs_force_shutdown((*tp)->t_mountp, 106 + (error == -EFSCORRUPTED) ? 107 + SHUTDOWN_CORRUPT_INCORE : 108 + SHUTDOWN_META_IO_ERROR); 109 + } else { 110 + *committed = 1; 111 + } 112 + 97 113 return error; 114 + } 98 115 99 116 efd = xfs_trans_get_efd(*tp, efi, flist->xbf_count); 100 117 for (free = flist->xbf_first; free != NULL; free = next) { 101 118 next = free->xbfi_next; 102 - if ((error = xfs_free_extent(*tp, free->xbfi_startblock, 103 - free->xbfi_blockcount))) { 104 - /* 105 - * The bmap free list will be cleaned up at a 106 - * higher level. The EFI will be canceled when 107 - * this transaction is aborted. 108 - * Need to force shutdown here to make sure it 109 - * happens, since this transaction may not be 110 - * dirty yet. 111 - */ 112 - mp = (*tp)->t_mountp; 113 - if (!XFS_FORCED_SHUTDOWN(mp)) 114 - xfs_force_shutdown(mp, 115 - (error == -EFSCORRUPTED) ? 116 - SHUTDOWN_CORRUPT_INCORE : 117 - SHUTDOWN_META_IO_ERROR); 118 - return error; 119 - } 119 + 120 + /* 121 + * Free the extent and log the EFD to dirty the transaction 122 + * before handling errors. This ensures that the transaction is 123 + * aborted, which: 124 + * 125 + * 1.) releases the EFI and frees the EFD 126 + * 2.) shuts down the filesystem 127 + * 128 + * The bmap free list is cleaned up at a higher level. 129 + */ 130 + error = xfs_free_extent(*tp, free->xbfi_startblock, 131 + free->xbfi_blockcount); 120 132 xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock, 121 - free->xbfi_blockcount); 133 + free->xbfi_blockcount); 134 + if (error) 135 + return error; 136 + 122 137 xfs_bmap_del_free(flist, NULL, free); 123 138 } 139 + 124 140 return 0; 125 141 } 126 142
+40 -29
fs/xfs/xfs_extfree_item.c
··· 61 61 62 62 if (atomic_dec_and_test(&efip->efi_refcount)) { 63 63 spin_lock(&ailp->xa_lock); 64 - /* xfs_trans_ail_delete() drops the AIL lock. */ 65 - xfs_trans_ail_delete(ailp, &efip->efi_item, 66 - SHUTDOWN_LOG_IO_ERROR); 64 + /* 65 + * We don't know whether the EFI made it to the AIL. Remove it 66 + * if so. Note that xfs_trans_ail_delete() drops the AIL lock. 67 + */ 68 + if (efip->efi_item.li_flags & XFS_LI_IN_AIL) 69 + xfs_trans_ail_delete(ailp, &efip->efi_item, 70 + SHUTDOWN_LOG_IO_ERROR); 71 + else 72 + spin_unlock(&ailp->xa_lock); 67 73 xfs_efi_item_free(efip); 68 74 } 69 75 } ··· 134 128 } 135 129 136 130 /* 137 - * While EFIs cannot really be pinned, the unpin operation is the last place at 138 - * which the EFI is manipulated during a transaction. If we are being asked to 139 - * remove the EFI it's because the transaction has been cancelled and by 140 - * definition that means the EFI cannot be in the AIL so remove it from the 141 - * transaction and free it. Otherwise coordinate with xfs_efi_release() 142 - * to determine who gets to free the EFI. 131 + * The unpin operation is the last place an EFI is manipulated in the log. It is 132 + * either inserted in the AIL or aborted in the event of a log I/O error. In 133 + * either case, the EFI transaction has been successfully committed to make it 134 + * this far. Therefore, we expect whoever committed the EFI to either construct 135 + * and commit the EFD or drop the EFD's reference in the event of error. Simply 136 + * drop the log's EFI reference now that the log is done with it. 143 137 */ 144 138 STATIC void 145 139 xfs_efi_item_unpin( ··· 147 141 int remove) 148 142 { 149 143 struct xfs_efi_log_item *efip = EFI_ITEM(lip); 150 - 151 - if (remove) { 152 - ASSERT(!(lip->li_flags & XFS_LI_IN_AIL)); 153 - if (lip->li_desc) 154 - xfs_trans_del_item(lip); 155 - xfs_efi_item_free(efip); 156 - return; 157 - } 158 144 xfs_efi_release(efip); 159 145 } 160 146 ··· 165 167 return XFS_ITEM_PINNED; 166 168 } 167 169 170 + /* 171 + * The EFI has been either committed or aborted if the transaction has been 172 + * cancelled. If the transaction was cancelled, an EFD isn't going to be 173 + * constructed and thus we free the EFI here directly. 174 + */ 168 175 STATIC void 169 176 xfs_efi_item_unlock( 170 177 struct xfs_log_item *lip) ··· 415 412 return XFS_ITEM_PINNED; 416 413 } 417 414 415 + /* 416 + * The EFD is either committed or aborted if the transaction is cancelled. If 417 + * the transaction is cancelled, drop our reference to the EFI and free the EFD. 418 + */ 418 419 STATIC void 419 420 xfs_efd_item_unlock( 420 421 struct xfs_log_item *lip) 421 422 { 422 - if (lip->li_flags & XFS_LI_ABORTED) 423 - xfs_efd_item_free(EFD_ITEM(lip)); 423 + struct xfs_efd_log_item *efdp = EFD_ITEM(lip); 424 + 425 + if (lip->li_flags & XFS_LI_ABORTED) { 426 + xfs_efi_release(efdp->efd_efip); 427 + xfs_efd_item_free(efdp); 428 + } 424 429 } 425 430 426 431 /* 427 - * When the efd item is committed to disk, all we need to do 428 - * is delete our reference to our partner efi item and then 429 - * free ourselves. Since we're freeing ourselves we must 430 - * return -1 to keep the transaction code from further referencing 431 - * this item. 432 + * When the efd item is committed to disk, all we need to do is delete our 433 + * reference to our partner efi item and then free ourselves. Since we're 434 + * freeing ourselves we must return -1 to keep the transaction code from further 435 + * referencing this item. 432 436 */ 433 437 STATIC xfs_lsn_t 434 438 xfs_efd_item_committed( ··· 445 435 struct xfs_efd_log_item *efdp = EFD_ITEM(lip); 446 436 447 437 /* 448 - * If we got a log I/O error, it's always the case that the LR with the 449 - * EFI got unpinned and freed before the EFD got aborted. 438 + * Drop the EFI reference regardless of whether the EFD has been 439 + * aborted. Once the EFD transaction is constructed, it is the sole 440 + * responsibility of the EFD to release the EFI (even if the EFI is 441 + * aborted due to log I/O error). 450 442 */ 451 - if (!(lip->li_flags & XFS_LI_ABORTED)) 452 - xfs_efi_release(efdp->efd_efip); 453 - 443 + xfs_efi_release(efdp->efd_efip); 454 444 xfs_efd_item_free(efdp); 445 + 455 446 return (xfs_lsn_t)-1; 456 447 } 457 448
+22 -3
fs/xfs/xfs_extfree_item.h
··· 39 39 * "extent free done" log item described below. 40 40 * 41 41 * The EFI is reference counted so that it is not freed prior to both the EFI 42 - * and EFD being committed and unpinned. This ensures that when the last 43 - * reference goes away the EFI will always be in the AIL as it has been 44 - * unpinned, regardless of whether the EFD is processed before or after the EFI. 42 + * and EFD being committed and unpinned. This ensures the EFI is inserted into 43 + * the AIL even in the event of out of order EFI/EFD processing. In other words, 44 + * an EFI is born with two references: 45 + * 46 + * 1.) an EFI held reference to track EFI AIL insertion 47 + * 2.) an EFD held reference to track EFD commit 48 + * 49 + * On allocation, both references are the responsibility of the caller. Once the 50 + * EFI is added to and dirtied in a transaction, ownership of reference one 51 + * transfers to the transaction. The reference is dropped once the EFI is 52 + * inserted to the AIL or in the event of failure along the way (e.g., commit 53 + * failure, log I/O error, etc.). Note that the caller remains responsible for 54 + * the EFD reference under all circumstances to this point. The caller has no 55 + * means to detect failure once the transaction is committed, however. 56 + * Therefore, an EFD is required after this point, even in the event of 57 + * unrelated failure. 58 + * 59 + * Once an EFD is allocated and dirtied in a transaction, reference two 60 + * transfers to the transaction. The EFD reference is dropped once it reaches 61 + * the unpin handler. Similar to the EFI, the reference also drops in the event 62 + * of commit failure or log I/O errors. Note that the EFD is not inserted in the 63 + * AIL, so at this point both the EFI and EFD are freed. 45 64 */ 46 65 typedef struct xfs_efi_log_item { 47 66 xfs_log_item_t efi_item;