Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs: *-regressions.rst: explain how quickly issues should be handled

Add a section with a few rules of thumb about how
quickly developers should address regressions to
Documentation/process/handling-regressions.rst; also add a short
paragraph about this to the companion document
Documentation/admin-guide/reporting-regressions.rst.

The rules of thumb were written after studying the quotes from Linus
found in handling-regressions.rst and especially influenced by
statements like "Users are literally the _only_ thing that matters" and
"without users, your program is not a program, it's a pointless piece of
code that you might as well throw away". The author interpreted those in
light of how the various Linux kernel series are currently maintained
and what those practices might mean for users running into a regression
on a small or big kernel update.

That, for example, led to the paragraph starting with "Aim to get fixes
for regressions mainlined within one week after identifying the culprit,
if the regression was introduced in a stable/longterm release or the
devel cycle for the latest mainline release". Some might see this as a
pretty high bar, but something like that is needed to avoid leaving
users out in the cold for too long -- which can quickly happen when
updating to the latest stable series, as the previous one is normally
stamped "End of Life" about three or four weeks after a new mainline
release. This makes a lot of users switch during this timeframe. Any of
them thus risks running into regressions not promptly fixed; even worse,
once the previous stable series is EOLed for real, users that face a
regression might be left with only three options:

(1) continue running an outdated and thus potentially insecure kernel
version from an abandoned stable series

(2) run the kernel with the regression

(3) downgrade to an earlier longterm series still supported

This is better avoided, as (1) puts users and their data in danger, (2)
will only be possible if it's a minor regression that doesn't interfere
with booting or serious usage, and (3) might be a regression itself or
impossible on the particular machine, as the users might require drivers
or features only introduced after the latest longterm series branched
off.

In the end this led to the aforementioned "Aim to fix regression within
one week" part. It's also the reason for the "Try to resolve any
regressions introduced in the current development cycle before its
end" part.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/a7b717b52c0d54cdec9b6daf56ed6669feddee2c.1644994117.git.linux@leemhuis.info
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Authored by Thorsten Leemhuis and committed by Jonathan Corbet
d2b40ba2 1ecf393f

+99
Documentation/admin-guide/reporting-regressions.rst (+12)
···
 the regressions mailing list, ideally with a "regzbot command" in your mail to
 get it tracked immediately.

+How quickly are regressions normally fixed?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Developers should fix any reported regression as quickly as possible, to provide
+affected users with a solution in a timely manner and prevent more users from
+running into the issue; nevertheless developers need to take enough time and
+care to ensure regression fixes do not cause additional damage.
+
+The answer thus depends on various factors like the impact of a regression, its
+age, or the Linux series in which it occurs. In the end though, most regressions
+should be fixed within two weeks.
+
 Is it a regression, if the issue can be avoided by updating some software?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Documentation/process/handling-regressions.rst (+87)
···
    mandated by Documentation/process/submitting-patches.rst and
    :ref:`Documentation/process/5.Posting.rst <development_posting>`.

+#. Try to fix regressions quickly once the culprit has been identified; fixes
+   for most regressions should be merged within two weeks, but some need to be
+   resolved within two or three days.
+

 All the details on Linux kernel regressions relevant for developers
 ===================================================================
···
 tools and scripts used by other kernel developers or Linux distributions; one of
 these tools is regzbot, which heavily relies on the "Link:" tags to associate
 reports for regression with changes resolving them.
+
+Prioritize work on fixing regressions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You should fix any reported regression as quickly as possible, to provide
+affected users with a solution in a timely manner and prevent more users from
+running into the issue; nevertheless developers need to take enough time and
+care to ensure regression fixes do not cause additional damage.
+
+In the end though, developers should give their best to prevent users from
+running into situations where a regression leaves them only three options: "run
+a kernel with a regression that seriously impacts usage", "continue running an
+outdated and thus potentially insecure kernel version for more than two weeks
+after a regression's culprit was identified", and "downgrade to a still
+supported kernel series that lack required features".
+
+How to realize this depends a lot on the situation. Here are a few rules of
+thumb for you, in order or importance:
+
+ * Prioritize work on handling regression reports and fixing regression over all
+   other Linux kernel work, unless the latter concerns acute security issues or
+   bugs causing data loss or damage.
+
+ * Always consider reverting the culprit commits and reapplying them later
+   together with necessary fixes, as this might be the least dangerous and
+   quickest way to fix a regression.
+
+ * Developers should handle regressions in all supported kernel series, but are
+   free to delegate the work to the stable team, if the issue probably at no
+   point in time occurred with mainline.
+
+ * Try to resolve any regressions introduced in the current development before
+   its end. If you fear a fix might be too risky to apply only days before a new
+   mainline release, let Linus decide: submit the fix separately to him as soon
+   as possible with the explanation of the situation. He then can make a call
+   and postpone the release if necessary, for example if multiple such changes
+   show up in his inbox.
+
+ * Address regressions in stable, longterm, or proper mainline releases with
+   more urgency than regressions in mainline pre-releases. That changes after
+   the release of the fifth pre-release, aka "-rc5": mainline then becomes as
+   important, to ensure all the improvements and fixes are ideally tested
+   together for at least one week before Linus releases a new mainline version.
+
+ * Fix regressions within two or three days, if they are critical for some
+   reason -- for example, if the issue is likely to affect many users of the
+   kernel series in question on all or certain architectures. Note, this
+   includes mainline, as issues like compile errors otherwise might prevent many
+   testers or continuous integration systems from testing the series.
+
+ * Aim to fix regressions within one week after the culprit was identified, if
+   the issue was introduced in either:
+
+    * a recent stable/longterm release
+
+    * the development cycle of the latest proper mainline release
+
+   In the latter case (say Linux v5.14), try to address regressions even
+   quicker, if the stable series for the predecessor (v5.13) will be abandoned
+   soon or already was stamped "End-of-Life" (EOL) -- this usually happens about
+   three to four weeks after a new mainline release.
+
+ * Try to fix all other regressions within two weeks after the culprit was
+   found. Two or three additional weeks are acceptable for performance
+   regressions and other issues which are annoying, but don't prevent anyone
+   from running Linux (unless it's an issue in the current development cycle,
+   as those should ideally be addressed before the release). A few weeks in
+   total are acceptable if a regression can only be fixed with a risky change
+   and at the same time is affecting only a few users; as much time is
+   also okay if the regression is already present in the second newest longterm
+   kernel series.
+
+Note: The aforementioned time frames for resolving regressions are meant to
+include getting the fix tested, reviewed, and merged into mainline, ideally with
+the fix being in linux-next at least briefly. This leads to delays you need to
+account for.
+
+Subsystem maintainers are expected to assist in reaching those periods by doing
+timely reviews and quick handling of accepted patches. They thus might have to
+send git-pull requests earlier or more often than usual; depending on the fix,
+it might even be acceptable to skip testing in linux-next. Especially fixes for
+regressions in stable and longterm kernels need to be handled quickly, as fixes
+need to be merged in mainline before they can be backported to older series.


 More aspects regarding regressions developers should be aware of
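
The time frames in the rules of thumb added to handling-regressions.rst can be
read as a small decision procedure. The Python sketch below is one possible
reading of them and is not part of the commit. The names Origin, Regression,
target_days, and target_date are hypothetical. It assumes the upper bound of
each stated range (three days, three additional weeks) and assumes every period
starts when the culprit is identified. Cases the text describes only loosely
("a few weeks in total", "even quicker" near an EOL date, "before its end" for
the current development cycle) are deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Origin(Enum):
    """Where the regression was introduced (hypothetical classification)."""
    RECENT_STABLE_OR_LONGTERM = "a recent stable/longterm release"
    LATEST_MAINLINE_CYCLE = "development cycle of the latest proper mainline release"
    OTHER = "anything else"


@dataclass(frozen=True)
class Regression:
    culprit_identified: date       # assumption: all periods count from this day
    origin: Origin
    critical: bool = False         # e.g. likely to affect many users, compile errors
    merely_annoying: bool = False  # e.g. performance issue, Linux still runs


def target_days(reg: Regression) -> int:
    """Return the upper bound, in days, for getting a fix into mainline."""
    if reg.critical:
        return 3                   # "within two or three days"
    if reg.origin in (Origin.RECENT_STABLE_OR_LONGTERM,
                      Origin.LATEST_MAINLINE_CYCLE):
        return 7                   # "within one week"
    if reg.merely_annoying:
        return 14 + 21             # two weeks plus "two or three additional weeks"
    return 14                      # "all other regressions within two weeks"


def target_date(reg: Regression) -> date:
    """Return the calendar day by which the fix should be mainlined."""
    return reg.culprit_identified + timedelta(days=target_days(reg))


if __name__ == "__main__":
    # Illustrative date only; not taken from any real regression report.
    example = Regression(date(2022, 2, 16), Origin.RECENT_STABLE_OR_LONGTERM)
    print(target_days(example), target_date(example).isoformat())
```

Because the text says these periods include testing, review, and merging into
mainline, the returned date is a deadline for the merge, not for posting a
first patch.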