Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

btrfs: avoid transaction commit on any fsync after subvolume creation

As of commit 1b53e51a4a8f ("btrfs: don't commit transaction for every
subvol create"), any fsync performed in the same transaction that was
used to create a subvolume falls back to a transaction commit. This
happens because of the following at ioctl.c:create_subvol():

$ cat fs/btrfs/ioctl.c
(...)
/* Tree log can't currently deal with an inode which is a new root. */
btrfs_set_log_full_commit(trans);
(...)

Note that the comment is misleading: the problem is not that fsync cannot
deal with the root inode of a new root, but that we cannot log any inode
that belongs to a root that was not yet persisted, because log replay
would fail since the root doesn't exist at log replay time.
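The replay failure described above can be illustrated with a small userspace toy model. This is only a sketch, not kernel code: the toy_* types and functions are hypothetical, and the model assumes that replay resolves each log entry's root among the roots known to the last committed transaction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model, not kernel code: the set of roots that were
 * persisted by the last committed transaction.
 */
struct toy_fs {
	const uint64_t *persisted_roots;
	size_t nr_persisted;
};

/* A log entry records which root (subvolume) the logged inode belongs to. */
struct toy_log_entry {
	uint64_t root_id;
	uint64_t ino;
};

static bool toy_root_persisted(const struct toy_fs *fs, uint64_t root_id)
{
	for (size_t i = 0; i < fs->nr_persisted; i++)
		if (fs->persisted_roots[i] == root_id)
			return true;
	return false;
}

/*
 * Returns 0 on success, or -1 if the entry refers to a root that replay
 * cannot find (assumption: this models the failure the text describes).
 */
static int toy_replay_entry(const struct toy_fs *fs,
			    const struct toy_log_entry *entry)
{
	if (!toy_root_persisted(fs, entry->root_id))
		return -1;
	return 0;
}
```

In this model, an entry logged for a root created in a still uncommitted transaction is rejected at replay time, which is why such inodes must not be logged at all.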

The above simply makes any fsync fall back to a full transaction commit if
it happens in the same transaction used to create the subvolume - even if
the inode belongs to any other subvolume. This is a brute force solution
and it doesn't necessarily improve performance for every workload out
there - it just moves a full transaction commit from one place, the
subvolume creation, to another - an fsync for any inode.

Improve on this by falling back to a transaction commit only for an fsync
against an inode of the new subvolume, or against the directory that
contains the dentry that points to the new subvolume (in case anyone
attempts to fsync the directory in the same transaction).
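As a rough illustration of the difference between the two policies, here is a userspace sketch. All toy_* names are hypothetical stand-ins rather than kernel structures, and the handling of last_unlink_trans is simplified to an equality check with the current transaction id, which is an assumption made only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_root {
	uint64_t generation;		/* transaction that created the root */
};

struct toy_inode {
	const struct toy_root *root;	/* subvolume the inode belongs to */
	uint64_t last_unlink_trans;	/* set on the parent directory */
};

struct toy_trans {
	uint64_t transid;
	bool log_full_commit;		/* old approach: set at subvolume creation */
};

/* Old policy: one transaction-wide flag, so every inode is affected. */
static bool toy_old_must_commit(const struct toy_trans *trans,
				const struct toy_inode *inode)
{
	(void)inode;
	return trans->log_full_commit;
}

/* New policy: only inodes tied to the new subvolume force a commit. */
static bool toy_new_must_commit(const struct toy_trans *trans,
				const struct toy_inode *inode)
{
	/* Inode belongs to a root created in the current transaction. */
	if (inode->root->generation == trans->transid)
		return true;
	/* Directory holding the dentry that points to the new subvolume. */
	if (inode->last_unlink_trans == trans->transid)
		return true;
	return false;
}
```

Under these assumptions, an fsync of an inode in an unrelated, already persisted subvolume forces a commit with the old policy but not with the new one, while inodes of the new subvolume and its parent directory still do.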

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Authored by Filipe Manana and committed by David Sterba
45c4102f ebc7c767

+31 -2
fs/btrfs/ioctl.c (+2 -2)
···
 	qgroup_reserved = 0;
 	trans->block_rsv = &block_rsv;
 	trans->bytes_reserved = block_rsv.size;
-	/* Tree log can't currently deal with an inode which is a new root. */
-	btrfs_set_log_full_commit(trans);
 
 	ret = btrfs_qgroup_inherit(trans, 0, objectid, btrfs_root_id(root), inherit);
 	if (ret)
···
 		btrfs_abort_transaction(trans, ret);
 		goto out;
 	}
+
+	btrfs_record_new_subvolume(trans, BTRFS_I(dir));
 
 	d_instantiate_new(dentry, new_inode_args.inode);
 	new_inode_args.inode = NULL;
fs/btrfs/tree-log.c (+27)
···
 	}
 
 	/*
+	 * If we're logging an inode from a subvolume created in the current
+	 * transaction we must force a commit since the root is not persisted.
+	 */
+	if (btrfs_root_generation(&root->root_item) == trans->transid) {
+		ret = BTRFS_LOG_FORCE_COMMIT;
+		goto end_no_trans;
+	}
+
+	/*
 	 * Skip already logged inodes or inodes corresponding to tmpfiles
 	 * (since logging them is pointless, a link count of 0 means they
 	 * will never be accessible).
···
  */
 void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans,
 				   struct btrfs_inode *dir)
+{
+	mutex_lock(&dir->log_mutex);
+	dir->last_unlink_trans = trans->transid;
+	mutex_unlock(&dir->log_mutex);
+}
+
+/*
+ * Call this when creating a subvolume in a directory.
+ * Because we don't commit a transaction when creating a subvolume, we can't
+ * allow the directory pointing to the subvolume to be logged with an entry that
+ * points to an unpersisted root if we are still in the transaction used to
+ * create the subvolume, so make any attempt to log the directory to result in a
+ * full log sync.
+ * Also we don't need to worry with renames, since btrfs_rename() marks the log
+ * for full commit when renaming a subvolume.
+ */
+void btrfs_record_new_subvolume(const struct btrfs_trans_handle *trans,
+				struct btrfs_inode *dir)
 {
 	mutex_lock(&dir->log_mutex);
 	dir->last_unlink_trans = trans->transid;
fs/btrfs/tree-log.h (+2)
···
 			bool for_rename);
 void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans,
 				   struct btrfs_inode *dir);
+void btrfs_record_new_subvolume(const struct btrfs_trans_handle *trans,
+				struct btrfs_inode *dir);
 void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct dentry *old_dentry, struct btrfs_inode *old_dir,
 			u64 old_dir_index, struct dentry *parent);