Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fs: clarify when the i_version counter must be updated

The i_version field in the kernel has had different semantics over
the decades, but NFSv4 has certain expectations. Update the comments
in iversion.h to describe when the i_version must change.

Cc: Colin Walters <walters@verbum.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Cc: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>

+20 -2
include/linux/iversion.h
···
  * ---------------------------
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
- * appear different to observers if there was a change to the inode's data or
- * metadata since it was last queried.
+ * appear larger to observers if there was an explicit change to the inode's
+ * data or metadata since it was last queried.
+ *
+ * An explicit change is one that would ordinarily result in a change to the
+ * inode status change time (aka ctime). i_version must appear to change, even
+ * if the ctime does not (since the whole point is to avoid missing updates due
+ * to timestamp granularity). If POSIX or other relevant spec mandates that the
+ * ctime must change due to an operation, then the i_version counter must be
+ * incremented as well.
+ *
+ * Making the i_version update completely atomic with the operation itself would
+ * be prohibitively expensive. Traditionally the kernel has updated the times on
+ * directories after an operation that changes its contents. For regular files,
+ * the ctime is usually updated before the data is copied into the cache for a
+ * write. This means that there is a window of time when an observer can
+ * associate a new timestamp with old file contents. Since the purpose of the
+ * i_version is to allow for better cache coherency, the i_version must always
+ * be updated after the results of the operation are visible. Updating it before
+ * and after a change is also permitted. (Note that no filesystems currently do
+ * this. Fixing that is a work-in-progress).
  *
  * Observers see the i_version as a 64-bit number that never decreases. If it
  * remains the same since it was last checked, then nothing has changed in the
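
The ordering rule in the new comment (make the change visible first, then bump the counter) can be illustrated with a small userspace sketch. This is not kernel code: `struct toy_inode`, `toy_write()`, `toy_query_version()` and `toy_cache_valid()` are hypothetical names invented for this illustration, and the real helpers in iversion.h work differently. The sketch assumes a single writer and only shows the order of operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for an inode; NOT the kernel's struct inode. */
struct toy_inode {
	char data[64];
	size_t size;
	_Atomic uint64_t version;	/* plays the role of i_version */
};

/*
 * Apply the change first, then bump the counter. An observer that reads
 * the new version is then guaranteed to be able to see the new data, so
 * it never caches old contents under a new version number.
 */
static void toy_write(struct toy_inode *ino, const char *buf, size_t len)
{
	assert(len <= sizeof(ino->data));
	memcpy(ino->data, buf, len);
	ino->size = len;
	/* Release ordering: the data stores above happen before the bump. */
	atomic_fetch_add_explicit(&ino->version, 1, memory_order_release);
}

/* What an observer samples when it queries the change attribute. */
static uint64_t toy_query_version(struct toy_inode *ino)
{
	return atomic_load_explicit(&ino->version, memory_order_acquire);
}

/* A cached copy stays valid only while the version is unchanged. */
static bool toy_cache_valid(struct toy_inode *ino, uint64_t cached_version)
{
	return toy_query_version(ino) == cached_version;
}
```

If the counter were bumped before the copy instead, an observer could sample the new version while still reading old data and would then have no later signal to drop its stale cache. Bumping afterwards can at worst pair new data with an old version, which the next query corrects.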