Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'errseq-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull errseq infrastructure fix from Jeff Layton:
"The PostgreSQL developers recently had a spirited discussion about the
writeback error handling in Linux, and reached out to us about a
behavior change to the code that bit them when the errseq_t changes
were merged.

When we changed to using errseq_t for tracking writeback errors, we
lost the ability for an application to see a writeback error that
occurred before the open on which the fsync was issued. This was
problematic for PostgreSQL which offloads fsync calls to a completely
separate process from the DB writers.

This patch restores that ability. If the errseq_t value in the inode
does not have the SEEN flag set, then we just return 0 for the sample.
That ensures that any recorded error is always delivered at least
once.

Note that we might still lose the error if the inode gets evicted from
the cache before anything can reopen it, but that was the case before
errseq_t was merged. At LSF/MM we had some discussion about keeping
inodes with unreported writeback errors around in the cache for longer
(possibly indefinitely), but that's really a separate problem"

* tag 'errseq-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
errseq: Always report a writeback error once

+9 -14
lib/errseq.c
@@ -111,27 +111,22 @@
  * errseq_sample() - Grab current errseq_t value.
  * @eseq: Pointer to errseq_t to be sampled.
  *
- * This function allows callers to sample an errseq_t value, marking it as
- * "seen" if required.
+ * This function allows callers to initialise their errseq_t variable.
+ * If the error has been "seen", new callers will not see an old error.
+ * If there is an unseen error in @eseq, the caller of this function will
+ * see it the next time it checks for an error.
  *
+ * Context: Any context.
  * Return: The current errseq value.
  */
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;
 
-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;
 }
 EXPORT_SYMBOL(errseq_sample);
 
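
To make the behavior change concrete, here is a rough single-threaded userspace model of the sampling logic before and after this patch. It is only a sketch, not the kernel code: every model_* name is hypothetical, plain loads and stores stand in for READ_ONCE() and cmpxchg(), and the bit layout (errno in the low 12 bits, the SEEN flag above it, a counter above that) is an assumption based on how lib/errseq.c is generally understood to pack its value. model_check_and_advance() stands in for the check that an fsync() would perform.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace model only; not atomic, not the kernel implementation. */
typedef uint32_t model_errseq_t;

#define MODEL_MAX_ERRNO 4095u       /* assumed: low 12 bits hold the errno */
#define MODEL_SEEN      (1u << 12)  /* assumed: flag set once the error is seen */
#define MODEL_CTR_INC   (1u << 13)  /* assumed: counter sits above the flag */

/* Record an error; bump the counter only if the previous value was seen. */
static void model_set(model_errseq_t *eseq, int err)
{
	model_errseq_t old = *eseq;
	model_errseq_t next;

	next = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (model_errseq_t)(-err);
	if (old & MODEL_SEEN)
		next += MODEL_CTR_INC;
	*eseq = next;
}

/* Sampling before the patch: mark any error as seen and return it. */
static model_errseq_t model_sample_old(model_errseq_t *eseq)
{
	if (*eseq != 0)
		*eseq |= MODEL_SEEN;
	return *eseq;
}

/* Sampling after the patch: an unseen error is hidden behind a 0 sample. */
static model_errseq_t model_sample_new(const model_errseq_t *eseq)
{
	model_errseq_t old = *eseq;

	if (!(old & MODEL_SEEN))
		old = 0;
	return old;
}

/* Report an error if the value changed since the sample, then mark it seen. */
static int model_check_and_advance(model_errseq_t *eseq, model_errseq_t *since)
{
	int err = 0;
	model_errseq_t cur = *eseq;

	if (cur != *since) {
		cur |= MODEL_SEEN;
		*eseq = cur;
		*since = cur;
		err = -(int)(cur & MODEL_MAX_ERRNO);
	}
	return err;
}

/* Writeback fails before the file is opened; the late opener then checks. */
static int model_late_open_result(int use_new_sample)
{
	model_errseq_t inode_err = 0;
	model_errseq_t since;

	model_set(&inode_err, -EIO);
	since = use_new_sample ? model_sample_new(&inode_err)
			       : model_sample_old(&inode_err);
	return model_check_and_advance(&inode_err, &since);
}
```

In this model, model_late_open_result(0) returns 0, because the old sampling marks the error as seen at open time and the later check finds nothing new. model_late_open_result(1) returns -EIO, because the unseen error is sampled as 0 and therefore differs from the stored value when checked. That mirrors the PostgreSQL case described in the message above, where the process issuing fsync opens the file only after another process's write has already failed.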