Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull DAX updates part one from Darrick Wong:
"After many years of LKML-wrangling about how to enable programs to
query and influence the file data access mode (DAX) when a filesystem
resides on storage devices such as persistent memory, Ira Weiny has
emerged with a proposed set of standard behaviors that has not been
shot down by anyone! We're more or less standardizing on the current
XFS behavior and adapting ext4 to do the same.

This is the first of a handful of pull requests that will make ext4 and
XFS present a consistent interface for user programs that care about
DAX. We add a statx attribute that programs can check to see if DAX is
enabled on a particular file. Then, we update the DAX documentation to
spell out the user-visible behaviors that filesystems will guarantee
(until the next storage industry shakeup). The on-disk inode flag has
been in XFS for a few years now.

Summary:

- Clean up io_is_direct.

- Add a new statx flag to indicate when file data access is being
done via DAX (as opposed to the page cache).

- Update the documentation for how system administrators and
application programmers can take advantage of the (still
experimental) DAX feature"

Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/

* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
Documentation/dax: Update Usage section
fs/stat: Define DAX statx attribute
fs: Remove unneeded IS_DAX() check in io_is_direct()

+147 -12
+139 -3
Documentation/filesystems/dax.txt
···
 If you have a block device which supports DAX, you can make a filesystem
 on it as usual. The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem. When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the filesystem.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab. This works to enable DAX on all files
+within the filesystem. It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+    the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details
+    about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+    files and directories. This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+    be inherited by all regular files and subdirectories that are subsequently
+    created in this directory. Files and subdirectories that exist at the time
+    this flag is set or cleared on the parent directory are not modified by
+    this modification of the parent directory.
+
+ 4. There exist dax mount options which can override FS_XFLAG_DAX in the
+    setting of the S_DAX flag. Given underlying storage which supports DAX the
+    following hold:
+
+    "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default.
+
+    "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+    "-o dax" is a legacy option which is an alias for "dax=always".
+    This may be removed in the future so "-o dax=always" is
+    the preferred method for specifying this behavior.
+
+    NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
+    the same even when the filesystem is mounted with a dax option. However,
+    in-core inode state (S_DAX) will be overridden until the filesystem is
+    remounted with dax=inode and the inode is evicted from kernel memory.
+
+ 5. The S_DAX policy can be changed via:
+
+    a) Setting the parent directory FS_XFLAG_DAX as needed before files are
+       created
+
+    b) Setting the appropriate dax="foo" mount option
+
+    c) Changing the FS_XFLAG_DAX flag on existing regular files and
+       directories. This has runtime constraints and limitations that are
+       described in 6) below.
+
+ 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+    the change in behaviour for existing regular files may not occur
+    immediately. If the change must take effect immediately, the administrator
+    needs to:
+
+    a) stop the application so there are no active references to the data set
+       the policy change will affect
+
+    b) evict the data set from kernel caches so it will be re-instantiated when
+       the application is restarted. This can be achieved by:
+
+       i. drop-caches
+       ii. a filesystem unmount and mount cycle
+       iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the filesystem. This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel. It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the filesystem's dax mount option.
+
+statx can be used to query S_DAX. NOTE that only regular files will ever have
+S_DAX set and therefore statx will never indicate that S_DAX is set on
+directories.
+
+Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even
+if the underlying media does not support dax and/or the filesystem is
+overridden with a mount option.
+
 
 
 Implementation Tips for Block Driver Writers
···
 redundancy in the following ways:
 
 1. Delete the affected file, and restore from a backup (sysadmin route):
-   This will free the file system blocks that were being used by the file,
+   This will free the filesystem blocks that were being used by the file,
    and the next time they're allocated, they will be zeroed first, which
    happens through the driver, and will clear bad sectors.
 
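The documentation above says FS_XFLAG_DAX can be queried, set and cleared with the FS_IOC_FS[GS]ETXATTR ioctl. A minimal userspace sketch of that read-modify-write sequence might look as follows. The helper names get_dax_xflag() and set_dax_xflag() are hypothetical and not part of this series; struct fsxattr, the two ioctl numbers and FS_XFLAG_DAX are assumed to come from the installed <linux/fs.h>.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_DAX */

/*
 * Hypothetical helper: report the persistent FS_XFLAG_DAX flag of an open
 * file or directory. Returns 1 if set, 0 if clear, -1 on error (errno set
 * by ioctl). This reads the on-disk policy flag, not the active S_DAX state.
 */
int get_dax_xflag(int fd)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) != 0)
		return -1;
	return (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}

/*
 * Hypothetical helper: set or clear FS_XFLAG_DAX while leaving every other
 * xflag untouched. Returns 0 on success, -1 on error (errno set by ioctl).
 */
int set_dax_xflag(int fd, int enable)
{
	struct fsxattr fsx;

	/* Fetch the current attributes first so unrelated flags survive. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) != 0)
		return -1;

	if (enable)
		fsx.fsx_xflags |= FS_XFLAG_DAX;
	else
		fsx.fsx_xflags &= ~(__u32)FS_XFLAG_DAX;

	return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

Calling set_dax_xflag() on a directory file descriptor before creating files in it corresponds to Example B above. As the documentation notes, changing the flag on an existing regular file does not change its S_DAX state until the inode is evicted and re-instantiated.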
+3 -3
drivers/block/loop.c
···
 
 static inline void loop_update_dio(struct loop_device *lo)
 {
-	__loop_update_dio(lo, io_is_direct(lo->lo_backing_file) |
-			lo->use_dio);
+	__loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
+				lo->use_dio);
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
···
 
 	if (config->block_size)
 		bsize = config->block_size;
-	else if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev)
+	else if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev)
 		/* In case of direct I/O, match underlying block size */
 		bsize = bdev_logical_block_size(inode->i_sb->s_bdev);
 	else
+3
fs/stat.c
···
 	if (IS_AUTOMOUNT(inode))
 		stat->attributes |= STATX_ATTR_AUTOMOUNT;
 
+	if (IS_DAX(inode))
+		stat->attributes |= STATX_ATTR_DAX;
+
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(path, stat, request_mask,
 					    query_flags);
+1 -6
include/linux/fs.h
···
 
 extern int file_update_time(struct file *file);
 
-static inline bool io_is_direct(struct file *filp)
-{
-	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
-}
-
 static inline bool vma_is_dax(const struct vm_area_struct *vma)
 {
 	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
···
 	int res = 0;
 	if (file->f_flags & O_APPEND)
 		res |= IOCB_APPEND;
-	if (io_is_direct(file))
+	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
 		res |= IOCB_DSYNC;
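The effect of dropping io_is_direct() from the flag computation can be illustrated with a small userspace model. Everything here is a mock for illustration only: struct mock_file, MOCK_O_DIRECT, MOCK_IOCB_DIRECT and the two functions are hypothetical stand-ins for struct file, O_DIRECT, IOCB_DIRECT and the direct-I/O branch of the kernel helper shown in the hunk above, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's O_DIRECT and IOCB_DIRECT bits. */
#define MOCK_O_DIRECT		0x4000u
#define MOCK_IOCB_DIRECT	0x1

/* Hypothetical, reduced model of the fields the removed helper consulted. */
struct mock_file {
	unsigned int f_flags;	/* open(2) flags */
	bool inode_is_dax;	/* models IS_DAX(filp->f_mapping->host) */
};

/* Models the removed io_is_direct(): O_DIRECT *or* a DAX inode. */
static bool mock_io_is_direct(const struct mock_file *f)
{
	return (f->f_flags & MOCK_O_DIRECT) || f->inode_is_dax;
}

/* Direct-I/O branch as it behaved before this series. */
int mock_iocb_flags_before(const struct mock_file *f)
{
	return mock_io_is_direct(f) ? MOCK_IOCB_DIRECT : 0;
}

/* Direct-I/O branch after this series: only O_DIRECT matters. */
int mock_iocb_flags_after(const struct mock_file *f)
{
	return (f->f_flags & MOCK_O_DIRECT) ? MOCK_IOCB_DIRECT : 0;
}
```

In this model, a file on a DAX inode opened without O_DIRECT gets the direct bit from mock_iocb_flags_before() but not from mock_iocb_flags_after(). A file opened with O_DIRECT gets it from both.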
+1
include/uapi/linux/stat.h
···
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 #define STATX_ATTR_MOUNT_ROOT		0x00002000 /* Root of a mount */
 #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
+#define STATX_ATTR_DAX			0x00002000 /* [I] File is DAX */
 
 
 #endif /* _UAPI_LINUX_STAT_H */
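With the attribute exported through the uapi header, a userspace program could check whether a file is currently accessed via DAX roughly as sketched below. The helper name file_is_dax() is hypothetical. The sketch assumes a glibc that provides the statx() wrapper and headers that define STATX_ATTR_DAX. Because the value shown in this diff coincides with STATX_ATTR_MOUNT_ROOT, the sketch takes the bit from the installed header rather than hard-coding it, and reports an error if the macro is missing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <sys/stat.h>	/* statx(), struct statx */

/*
 * Hypothetical helper: returns 1 if the kernel reports STATX_ATTR_DAX for
 * the file at path, 0 if it does not, and -1 on error (errno set).
 * Directories are expected to report 0, since only regular files ever
 * have S_DAX set.
 */
int file_is_dax(const char *path)
{
#ifdef STATX_ATTR_DAX
	struct statx stx;

	/* The attribute bits are returned regardless of the request mask. */
	if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
		return -1;
	return (stx.stx_attributes & STATX_ATTR_DAX) ? 1 : 0;
#else
	/* Installed headers predate the attribute; nothing to test against. */
	(void)path;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would typically treat a return of 1 as "file data access bypasses the page cache" and fall back to its non-DAX path for 0 or -1.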