docs: add two documents about regression handling

+1

Documentation/admin-guide/index.rst

    :maxdepth: 1

    reporting-issues
+   reporting-regressions
    security-bugs
    bug-hunting
    bug-bisect

+439

Documentation/admin-guide/reporting-regressions.rst

.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. [see the bottom of this file for redistribution information]

Reporting regressions
+++++++++++++++++++++

"*We don't cause regressions*" is the first rule of Linux kernel development;
Linux founder and lead developer Linus Torvalds established it himself and
ensures it's obeyed.

This document describes what the rule means for users and how the Linux kernel's
development model ensures all reported regressions get addressed; aspects
relevant for kernel developers are left to
Documentation/process/handling-regressions.rst.


The important bits (aka "TL;DR")
================================

#. It's a regression if something running fine with one Linux kernel works worse
   or not at all with a newer version. Note: the newer kernel has to be compiled
   using a similar configuration; the detailed explanations below describe this
   and other fine print.

#. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst;
   it already covers all aspects important for regressions, which are repeated
   below for convenience. Two of them are especially important: start your
   report's subject with "[REGRESSION]" and CC or forward it to `the regression
   mailing list <https://lore.kernel.org/regressions/>`_
   (regressions@lists.linux.dev).

#. Optional, but recommended: when sending or forwarding your report, make the
   Linux kernel regression tracking bot "regzbot" track the issue by specifying
   when the regression started like this::

       #regzbot introduced: v5.13..v5.14-rc1


All the details on Linux kernel regressions relevant for users
==============================================================


The important basics
--------------------


What is a "regression" and what is the "no regressions rule"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's a regression if some application or practical use case running fine with
one Linux kernel works worse or not at all with a newer version compiled using a
similar configuration. The "no regressions rule" forbids this from happening; if
it happens by accident, developers who caused it are expected to quickly fix
the issue.

It's thus a regression when a WiFi driver from Linux 5.13 works fine, but with
5.14 doesn't work at all, works significantly slower, or misbehaves somehow.
It's also a regression if a perfectly working application suddenly shows erratic
behavior with a newer kernel version; such issues can be caused by changes in
procfs, sysfs, or one of the many other interfaces Linux provides to userland
software. But keep in mind, as mentioned earlier: 5.14 in this example needs to
be built from a configuration similar to the one from 5.13. This can be achieved
using ``make olddefconfig``, as explained in more detail below.

Note the "practical use case" in the first sentence of this section: despite the
"no regressions" rule, developers are free to change any aspect of the kernel
and even APIs or ABIs to userland, as long as no existing application or use
case breaks.

Also be aware that the "no regressions" rule covers only interfaces the kernel
provides to userland. It thus does not apply to kernel-internal interfaces like
the module API, which some externally developed drivers use to hook into the
kernel.

How do I report a regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Just report the issue as outlined in
Documentation/admin-guide/reporting-issues.rst; it already describes the
important points.
The following aspects outlined there are especially relevant
for regressions:

* When checking for existing reports to join, also search the `archives of the
  Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and
  `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.

* Start your report's subject with "[REGRESSION]".

* In your report, clearly mention the last kernel version that worked fine and
  the first broken one. Ideally try to find the exact change causing the
  regression using a bisection, as explained below in more detail.

* Remember to let the Linux regressions mailing list
  (regressions@lists.linux.dev) know about your report:

  * If you report the regression by mail, CC the regressions list.

  * If you report your regression to some bug tracker, forward the submitted
    report by mail to the regressions list while CCing the maintainer and the
    mailing list for the subsystem in question.

  If it's a regression within a stable or longterm series (e.g.
  v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list
  <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org).

  In case you performed a successful bisection, add everyone the culprit's
  commit message mentions in lines starting with "Signed-off-by:" to the CC.

When CCing or forwarding your report to the list, consider directly telling the
aforementioned Linux kernel regression tracking bot about your report. To do
that, include a paragraph like this in your mail::

    #regzbot introduced: v5.13..v5.14-rc1

Regzbot will then consider your mail a report for a regression introduced in the
specified version range. In the above case, Linux v5.13 still worked fine and
Linux v5.14-rc1 was the first version where you encountered the issue.
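
As an illustration of the format, the following Python sketch assembles such a paragraph from the last working and the first broken version. The helper names are hypothetical and not part of regzbot. The accepted tag pattern ("v5.13", "v5.14-rc1", "v5.15.3") is also an assumption, chosen to match the version numbers used in this document.

```python
import re

# Assumed shape of version tags: "v5.13", "v5.14-rc1", or "v5.15.3".
_VERSION_TAG = re.compile(r"^v\d+\.\d+(\.\d+)?(-rc\d+)?$")


def regzbot_introduced(last_good: str, first_bad: str) -> str:
    """Return a '#regzbot introduced:' line for a version range (hypothetical helper)."""
    for tag in (last_good, first_bad):
        if not _VERSION_TAG.match(tag):
            raise ValueError(f"unexpected version tag: {tag!r}")
    return f"#regzbot introduced: {last_good}..{first_bad}"


def append_paragraph(mail_body: str, command: str) -> str:
    """Append the command after a blank line, so it forms a paragraph of its own."""
    return mail_body.rstrip("\n") + "\n\n" + command + "\n"


body = append_paragraph("Hi! WiFi stopped working with v5.14-rc1.",
                        regzbot_introduced("v5.13", "v5.14-rc1"))
print(body)
```

The blank line before the command is deliberate, as regzbot commands are meant to stand in a paragraph of their own.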

If you performed a bisection to find the commit that caused the regression,
specify the culprit's commit-id instead::

    #regzbot introduced: 1f2e3d4c5d

Placing such a "regzbot command" is in your interest, as it will ensure the
report won't fall through the cracks unnoticed. If you omit this, the Linux
kernel's regression tracker will take care of telling regzbot about your
regression, as long as you send a copy to the regressions mailing list. But the
regression tracker is just one human who sometimes has to rest or occasionally
might even enjoy some time away from computers (as crazy as that might sound).
Relying on this person thus will result in an unnecessary delay before the
regression is mentioned `on the list of tracked and unresolved Linux kernel
regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and in the
weekly regression reports sent by regzbot. Such delays can result in Linus
Torvalds being unaware of important regressions when deciding between "continue
development or call this finished and release the final?".

Are all regressions really fixed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Nearly all of them are, as long as the change causing the regression (the
"culprit commit") is reliably identified. Some regressions can be fixed without
this, but often it's required.

Who needs to find the root cause of a regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developers of the affected code area should try to locate the culprit on their
own. But for them that's often impossible to do with reasonable effort, as quite
a lot of issues only occur in a particular environment outside the developer's
reach -- for example, a specific hardware platform, firmware, Linux distro,
system's configuration, or application.
That's why in the end it's often up to the reporter to locate the culprit
commit; sometimes users might even need to run additional tests afterwards to
pinpoint the exact root cause. Developers should offer advice and reasonably
help where they can, to make this process relatively easy and achievable for
typical users.

How can I find the culprit?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perform a bisection, as roughly outlined in
Documentation/admin-guide/reporting-issues.rst and described in more detail by
Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but
in many cases it finds the culprit relatively quickly, as each test roughly
halves the range of commits still in question. If it's hard or time-consuming
to reliably reproduce the issue, consider teaming up with other affected users
to narrow down the search range together.

Who can I ask for advice when it comes to regressions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Send a mail to the regressions mailing list (regressions@lists.linux.dev) while
CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the
issue might better be dealt with in private, feel free to omit the list.


Additional details about regressions
------------------------------------


What is the goal of the "no regressions rule"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users should feel safe when updating kernel versions and not have to worry
that something might break. The kernel developers want this to make updating
attractive: they don't want users to stay on stable or longterm Linux series
that are either abandoned or more than one and a half years old.
That's in everybody's interest, as `those series might have known bugs, security
issues, or other problematic aspects already fixed in later versions
<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_.
Additionally, the kernel developers want to make it simple and appealing for
users to test the latest pre-release or regular release. That's also in
everybody's interest, as it's a lot easier to track down and fix problems if
they are reported shortly after being introduced.

Is the "no regressions" rule really adhered to in practice?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's taken really seriously, as can be seen in many mailing list posts from
Linux creator and lead developer Linus Torvalds, some of which are quoted in
Documentation/process/handling-regressions.rst.

Exceptions to this rule are extremely rare; in the past, developers almost always
turned out to be wrong when they assumed a particular situation warranted an
exception.

Who ensures the "no regressions" rule is actually followed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The subsystem maintainers should take care of that; they are watched and
supported by the tree maintainers -- e.g. Linus Torvalds for mainline and
Greg Kroah-Hartman et al. for various stable/longterm series.

All of them are helped by people trying to ensure no regression report falls
through the cracks. One of them is Thorsten Leemhuis, who's currently acting as
the Linux kernel's "regression tracker"; to facilitate this work he relies on
regzbot, the Linux kernel regression tracking bot.
That's why you want to bring your report to the attention of these people by
CCing or forwarding each report to the regressions mailing list, ideally with a
"regzbot command" in your mail to get it tracked immediately.

Is it a regression, if the issue can be avoided by updating some software?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Almost always: yes. If a developer tells you otherwise, ask the regression
tracker for advice as outlined above.

Is it a regression, if a newer kernel works slower or consumes more energy?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, but the difference has to be significant. A five percent slow-down in a
micro-benchmark thus is unlikely to qualify as a regression, unless it also
influences the results of a broad benchmark by more than one percent. If in
doubt, ask for advice.

Is it a regression, if an external kernel module breaks when updating Linux?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No, as the "no regressions" rule is about interfaces and services the Linux
kernel provides to userland. It thus does not cover building or running
externally developed kernel modules, as they run in kernel-space and hook into
the kernel using internal interfaces that occasionally change.

How are regressions handled that are caused by security fixes?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In extremely rare situations, security issues can't be fixed without causing
regressions; those fixes are accepted nonetheless, as they are the lesser evil
in the end. Luckily this dilemma almost always can be avoided, as key developers
for the affected area and often Linus Torvalds himself try very hard to fix
security issues without causing regressions.

If you nevertheless face such a case, check the mailing list archives to see if
people tried their best to avoid the regression. If not, report it; if in doubt,
ask for advice as outlined above.

What happens if fixing a regression is impossible without causing another?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sadly these things happen, but luckily not very often; if they occur, expert
developers of the affected code area should look into the issue to find a fix
that avoids regressions or at least minimizes their impact. If you run into such
a situation, do what was already outlined for regressions caused by security
fixes: check earlier discussions to see if people already tried their best, and
ask for advice if in doubt.

A quick note while at it: many of these situations could be avoided if people
regularly gave mainline pre-releases (say v5.15-rc1 or -rc3) from each
development cycle a test run. This is best explained by imagining a change
integrated between Linux v5.14 and v5.15-rc1 which causes a regression, but at
the same time is a hard requirement for some other improvement applied for
5.15-rc1. All these changes often can simply be reverted and the regression thus
solved, if someone finds and reports it before 5.15 is released. A few days or
weeks later this solution can become impossible, as some software might have
started to rely on aspects introduced by one of the follow-up changes: reverting
all changes would then cause a regression for users of said software and thus is
out of the question.

Is it a regression, if some feature I relied on was removed months ago?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is, but often it's hard to fix such regressions due to the aspects outlined
in the previous section.
It hence needs to be dealt with on a case-by-case
basis. This is another reason why it's in everybody's interest to regularly test
mainline pre-releases.

Does the "no regressions" rule apply if I seem to be the only affected person?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It does, but only for practical usage: the Linux developers want to be free to
remove support for hardware that nowadays is only found in attics and museums.

Note that sometimes regressions can't be avoided to make progress -- and the
latter is needed to prevent Linux from stagnating. Hence, if only very few users
seem to be affected by a regression, it might be in their and everyone else's
interest to let things pass for the greater good. This is especially true if
there is an easy way to circumvent the regression, for example by updating some
software or using a kernel parameter created just for this purpose.

Does the regression rule apply to code in the staging tree as well?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Not according to the `help text for the configuration option covering all
staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_,
which since its early days states::

    Please note that these drivers are under heavy development, may or
    may not work, and may contain userspace interfaces that most likely
    will be changed in the near future.

The staging developers nevertheless often adhere to the "no regressions" rule,
but sometimes bend it to make progress. That's for example why some users had to
deal with (often negligible) regressions when a WiFi driver from the staging
tree was replaced by a totally different one written from scratch.

Why do later versions have to be "compiled with a similar configuration"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because the Linux kernel developers sometimes integrate changes known to cause
regressions, but make them optional and disable them in the kernel's default
configuration. This trick allows progress, as the "no regressions" rule
otherwise would lead to stagnation.

Consider for example a new security feature blocking access to some kernel
interfaces often abused by malware, which at the same time are required to run a
few rarely used applications. The outlined approach makes both camps happy:
people using these applications can leave the new security feature off, while
everyone else can enable it without running into trouble.

How to create a configuration similar to the one of an older kernel?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start your machine with a known-good kernel and configure the newer Linux
version with ``make olddefconfig``. This makes the kernel's build scripts pick
up the configuration file (the ".config" file) from the running kernel as the
base for the new one you are about to compile; afterwards they set all new
configuration options to their default value, which should disable new features
that might cause regressions.

Can I report a regression I found with pre-compiled vanilla kernels?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You need to ensure the newer kernel was compiled with a configuration file
similar to the one used for the older kernel (see above), as whoever built them
might have enabled some feature for the newer kernel that is known to be
incompatible. If in doubt, report the matter to the kernel's provider and ask
for advice.
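
It can help to check how much the two configurations actually differ. The following Python sketch compares two ".config" files. It is a hypothetical helper written for illustration and assumes the usual "CONFIG_FOO=value" and "# CONFIG_FOO is not set" line formats. The kernel sources ship ``scripts/diffconfig`` for a similar purpose, which is likely the better choice in practice.

```python
import re
import sys
from typing import Dict, Iterable, List, Optional, Tuple

# "CONFIG_FOO=y" style assignments and "# CONFIG_FOO is not set" markers.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

Changes = Dict[str, Tuple[Optional[str], Optional[str]]]


def parse_config(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each option to its value; None means explicitly "is not set"."""
    options: Dict[str, Optional[str]] = {}
    for raw in lines:
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
        # Other comments and blank lines are ignored.
    return options


def diff_configs(old_lines: Iterable[str],
                 new_lines: Iterable[str]) -> Tuple[Changes, List[str], List[str]]:
    """Return (changed, only_in_old, only_in_new) for two configurations."""
    old, new = parse_config(old_lines), parse_config(new_lines)
    changed = {name: (old[name], new[name])
               for name in sorted(old.keys() & new.keys())
               if old[name] != new[name]}
    return changed, sorted(old.keys() - new.keys()), sorted(new.keys() - old.keys())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 compare_configs.py OLD_CONFIG NEW_CONFIG
    with open(sys.argv[1]) as old_file, open(sys.argv[2]) as new_file:
        changed, gone, added = diff_configs(old_file, new_file)
    for name, (before, after) in changed.items():
        print(f"{name}: {before} -> {after}")
    print(f"{len(gone)} options only in the old, {len(added)} only in the new config")
```

Options present only in the newer file are expected, because ``make olddefconfig`` sets new options to their default values. Options whose value changed between the two files deserve a closer look.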


More about regression tracking with "regzbot"
---------------------------------------------

What is regression tracking and why should I care about it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rules like "no regressions" need someone to ensure they are followed, otherwise
they are broken either accidentally or on purpose. History has shown this to be
true for Linux kernel development as well. That's why Thorsten Leemhuis, the
Linux kernel's regression tracker, and a few other people try to ensure all
regressions are fixed by keeping an eye on them until they are resolved. None of
them are paid for this, which is why the work is done on a best-effort basis.

Why and how are Linux kernel regressions tracked using a bot?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tracking regressions completely manually has proven to be quite hard due to the
distributed and loosely structured nature of the Linux kernel development
process. That's why the Linux kernel's regression tracker developed regzbot to
facilitate the work, with the long-term goal to automate regression tracking as
much as possible for everyone involved.

Regzbot works by watching for replies to reports of tracked regressions.
Additionally, it's looking out for posted or committed patches referencing such
reports with "Link:" tags; replies to such patch postings are tracked as well.
Combined, this data provides good insights into the current state of the fixing
process.

How to see which regressions regzbot tracks currently?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.

What kind of issues are supposed to be tracked by regzbot?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bot is meant to track regressions, hence please don't involve regzbot for
regular issues. But it's okay for the Linux kernel's regression tracker if you
involve regzbot to track severe issues, like reports about hangs, corrupted
data, or internal errors (Panic, Oops, BUG(), warning, ...).

How to change aspects of a tracked regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By using a "regzbot command" in a direct or indirect reply to the mail with the
report. The easiest way to do that: find the report in your "Sent" folder or the
mailing list archive and reply to it using your mailer's "Reply-all" function.
In that mail, use one of the following commands in a stand-alone paragraph (IOW:
use blank lines to separate one or multiple of these commands from the rest of
the mail's text).

* Update when the regression started to happen, for example after performing a
  bisection::

      #regzbot introduced: 1f2e3d4c5d

* Set or update the title::

      #regzbot title: foo

* Monitor a discussion or bugzilla.kernel.org ticket where additional aspects of
  the issue or a fix are discussed::

      #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
      #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789

* Point to a place with further details of interest, like a mailing list post
  or a ticket in a bug tracker that is slightly related, but about a different
  topic::

      #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789

* Mark a regression as invalid::

      #regzbot invalid: wasn't a regression, problem has always existed

Regzbot supports a few other commands primarily used by developers or people
tracking
regressions. Those commands, as well as more details about the aforementioned
ones, can be found in the `getting started guide
<https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and
the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_
for regzbot.

..
   end-of-content
..
   This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
   of the file. If you want to distribute this text under CC-BY-4.0 only,
   please use "The Linux kernel developers" for author attribution and link
   this as source:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst
..
   Note: Only the content of this RST file as found in the Linux kernel sources
   is available under CC-BY-4.0, as versions of this text that were processed
   (for example by the kernel's build system) might contain content taken from
   files which use a more restrictive license.

+659

Documentation/process/handling-regressions.rst

.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. See the bottom of this file for additional redistribution information.

Handling regressions
++++++++++++++++++++

*We don't cause regressions* -- this document describes what this "first rule of
Linux kernel development" means in practice for developers. It complements
Documentation/admin-guide/reporting-regressions.rst, which covers the topic from a
user's point of view; if you have never read that text, at least skim it before
continuing here.

The important bits (aka "The TL;DR")
====================================

#. Ensure subscribers of the `regression mailing list <https://lore.kernel.org/regressions/>`_
   (regressions@lists.linux.dev) quickly become aware of any new regression
   report:

   * When receiving a mailed report that did not CC the list, bring it into the
     loop by immediately sending at least a brief "Reply-all" with the list
     CCed.

   * Forward or bounce any reports submitted in bug trackers to the list.

#. Make the Linux kernel regression tracking bot "regzbot" track the issue (this
   is optional, but recommended):

   * For mailed reports, check if the reporter included a line like
     ``#regzbot introduced: v5.13..v5.14-rc1``. If not, send a reply (with the
     regressions list in CC) containing a paragraph like the following, which
     tells regzbot when the issue started to happen::

         #regzbot ^introduced: 1f2e3d4c5b6a

   * When forwarding reports from a bug tracker to the regressions list (see
     above), include a paragraph like the following::

         #regzbot introduced: v5.13..v5.14-rc1
         #regzbot from: Some N. Ice Human <some.human@example.com>
         #regzbot monitor: http://some.bugtracker.example.com/ticket?id=123456789

#. When submitting fixes for regressions, add "Link:" tags to the patch
   description pointing to all places where the issue was reported, as
   mandated by Documentation/process/submitting-patches.rst and
   :ref:`Documentation/process/5.Posting.rst <development_posting>`.


All the details on Linux kernel regressions relevant for developers
===================================================================


The important basics in more detail
-----------------------------------


What to do when receiving regression reports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ensure the Linux kernel's regression tracker and other subscribers of the
`regression mailing list <https://lore.kernel.org/regressions/>`_
(regressions@lists.linux.dev) become aware of any newly reported regression:

* When you receive a report by mail that did not CC the list, immediately bring
  it into the loop by sending at least a brief "Reply-all" with the list CCed;
  try to ensure it gets CCed again in case you reply to a reply that omitted
  the list.

* If a report submitted in a bug tracker hits your Inbox, forward or bounce it
  to the list. Consider checking the list archives beforehand to see if the
  reporter already forwarded the report as instructed by
  Documentation/admin-guide/reporting-issues.rst.

When doing either, consider making the Linux kernel regression tracking bot
"regzbot" immediately start tracking the issue:

* For mailed reports, check if the reporter included a "regzbot command" like
  ``#regzbot introduced: 1f2e3d4c5b6a``.
  If not, send a reply (with the
  regressions list in CC) with a paragraph like the following::

      #regzbot ^introduced: v5.13..v5.14-rc1

  This tells regzbot the version range in which the issue started to happen;
  you can specify a range using commit-ids as well or state a single commit-id
  in case the reporter bisected the culprit.

  Note the caret (^) before the "introduced": it tells regzbot to treat the
  parent mail (the one you reply to) as the initial report for the regression
  you want to see tracked; that's important, as regzbot will later look out
  for patches with "Link:" tags pointing to the report in the archives on
  lore.kernel.org.

* When forwarding a regression reported to a bug tracker, include a paragraph
  with these regzbot commands::

      #regzbot introduced: 1f2e3d4c5b6a
      #regzbot from: Some N. Ice Human <some.human@example.com>
      #regzbot monitor: http://some.bugtracker.example.com/ticket?id=123456789

  Regzbot will then automatically associate with the report any patches that
  contain "Link:" tags pointing to your mail or the mentioned ticket.

What's important when fixing regressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You don't need to do anything special when submitting fixes for regressions;
just remember to do what Documentation/process/submitting-patches.rst,
:ref:`Documentation/process/5.Posting.rst <development_posting>`, and
Documentation/process/stable-kernel-rules.rst already explain in more detail:

* Point to all places where the issue was reported using "Link:" tags::

      Link: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=1234567890

* Add a "Fixes:" tag to specify the commit causing the regression.

* If the culprit was merged in an earlier development cycle, explicitly mark
  the fix for backporting using the ``Cc: stable@vger.kernel.org`` tag.

All this is expected from you and important when it comes to regressions, as
these tags are of great value for everyone (you included) who might be looking
into the issue weeks, months, or years later. These tags are also crucial for
tools and scripts used by other kernel developers or Linux distributions; one of
these tools is regzbot, which heavily relies on the "Link:" tags to associate
reports about regressions with the changes resolving them.


More aspects regarding regressions developers should be aware of
----------------------------------------------------------------


How to deal with changes where a risk of regression is known
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Evaluate how big the risk of regressions is, for example by performing a code
search in Linux distributions and Git forges. Also consider asking other
developers or projects likely to be affected to evaluate or even test the
proposed change; if problems surface, maybe some solution acceptable for all
can be found.

If the risk of regressions in the end seems to be relatively small, go ahead
with the change, but let all involved parties know about the risk. In
particular, make sure your patch description makes this aspect obvious. Once the
change is merged, tell the Linux kernel's regression tracker and the regressions
mailing list about the risk, so everyone has the change on the radar in case
reports trickle in. Depending on the risk, you also might want to ask the
subsystem maintainer to mention the issue in their mainline pull request.

What else is there to know about regressions?
151 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 152 + 153 + Check out Documentation/admin-guide/reporting-regressions.rst; it covers a lot 154 + of other aspects you might want to be aware of: 155 + 156 + * the purpose of the "no regressions rule" 157 + 158 + * what issues actually qualify as regressions 159 + 160 + * who's in charge of finding the root cause of a regression 161 + 162 + * how to handle tricky situations, e.g. when a regression is caused by a 163 + security fix or when fixing a regression might cause another one 164 + 165 + Whom to ask for advice when it comes to regressions 166 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 167 + 168 + Send a mail to the regressions mailing list (regressions@lists.linux.dev) while 169 + CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the 170 + issue might better be dealt with in private, feel free to omit the list. 171 + 172 + 173 + More about regression tracking and regzbot 174 + ------------------------------------------ 175 + 176 + 177 + Why does the Linux kernel have a regression tracker, and why is regzbot used? 178 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 179 + 180 + Rules like "no regressions" need someone to ensure they are followed; otherwise, 181 + they get broken either accidentally or on purpose. History has shown this to be 182 + true for the Linux kernel as well. That's why Thorsten Leemhuis volunteered to 183 + keep an eye on things as the Linux kernel's regression tracker; he's 184 + occasionally helped by other people. None of them is paid to do this, 185 + which is why regression tracking is done on a best-effort basis. 186 + 187 + Earlier attempts to manually track regressions have shown it's exhausting and 188 + frustrating work, which is why they were abandoned after a while.
To prevent 189 + this from happening again, Thorsten developed regzbot to facilitate the work, 190 + with the long-term goal of automating regression tracking as much as possible for 191 + everyone involved. 192 + 193 + How does regression tracking work with regzbot? 194 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 195 + 196 + The bot watches for replies to reports of tracked regressions. Additionally, 197 + it looks out for posted or committed patches referencing such reports 198 + with "Link:" tags; replies to such patch postings are tracked as well. 199 + Combined, this data provides good insight into the current state of the fixing 200 + process. 201 + 202 + Regzbot tries to do its job with as little overhead as possible for both 203 + reporters and developers. In fact, only reporters are burdened with an extra 204 + duty: they need to tell regzbot about the regression report using the ``#regzbot 205 + introduced`` command outlined above; if they don't do that, someone else can 206 + take care of that using ``#regzbot ^introduced``. 207 + 208 + For developers, there normally is no extra work involved; they just need to make 209 + sure to do something that was expected long before regzbot came into existence: add 210 + "Link:" tags to the patch description pointing to all reports about the issue 211 + fixed. 212 + 213 + Do I have to use regzbot? 214 + ~~~~~~~~~~~~~~~~~~~~~~~~~ 215 + 216 + It's in the interest of everyone if you do, as kernel maintainers like Linus 217 + Torvalds partly rely on regzbot's tracking in their work -- for example when 218 + deciding to release a new version or extend the development phase. For this, they 219 + need to be aware of all unfixed regressions; to that end, Linus is known to look 220 + into the weekly reports sent by regzbot. 221 + 222 + Do I have to tell regzbot about every regression I stumble upon?
223 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 224 + 225 + Ideally yes: we are all human and easily forget problems when something more 226 + important unexpectedly comes up -- for example a bigger problem in the Linux 227 + kernel or something in real life that's keeping us away from keyboards for a 228 + while. Hence, it's best to tell regzbot about every regression, except when you 229 + immediately write a fix and commit it to a tree regularly merged into the affected 230 + kernel series. 231 + 232 + How to see which regressions regzbot tracks currently? 233 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 234 + 235 + Check `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_ 236 + for the latest info; alternatively, `search for the latest regression report 237 + <https://lore.kernel.org/lkml/?q=%22Linux+regressions+report%22+f%3Aregzbot>`_, 238 + which regzbot normally sends out once a week on Sunday evening (UTC), a 239 + few hours before Linus usually publishes new (pre-)releases. 240 + 241 + What places does regzbot monitor? 242 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 243 + 244 + Regzbot watches the most important Linux mailing lists as well as the git 245 + repositories of linux-next, mainline, and stable/longterm. 246 + 247 + What kind of issues are supposed to be tracked by regzbot? 248 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 249 + 250 + The bot is meant to track regressions, so please don't involve regzbot for 251 + regular issues. But it's okay for the Linux kernel's regression tracker if you 252 + use regzbot to track severe issues, like reports about hangs, corrupted data, 253 + or internal errors (Panic, Oops, BUG(), warning, ...). 254 + 255 + Can I add regressions found by CI systems to regzbot's tracking?
256 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 257 + 258 + Feel free to do so if the particular regression likely has an impact on practical 259 + use cases and thus might be noticed by users; hence, please don't involve 260 + regzbot for theoretical regressions unlikely to show themselves in real-world 261 + usage. 262 + 263 + How to interact with regzbot? 264 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 265 + 266 + By using a 'regzbot command' in a direct or indirect reply to the mail with the 267 + regression report. These commands need to be in their own paragraph (IOW: they 268 + need to be separated from the rest of the mail using blank lines). 269 + 270 + One such command is ``#regzbot introduced <version or commit>``, which makes 271 + regzbot consider your mail as a regression report added to the tracking, as 272 + already described above; ``#regzbot ^introduced <version or commit>`` is another 273 + such command, which makes regzbot consider the parent mail as a report of a 274 + regression that it then starts to track. 275 + 276 + Once one of those two commands has been issued, other regzbot commands can be 277 + used in direct or indirect replies to the report. You can write them below one 278 + of the ``introduced`` commands or in replies to the mail that used one of them 279 + (or to any mail that is itself a reply to that mail): 280 + 281 + * Set or update the title:: 282 + 283 + #regzbot title: foo 284 + 285 + * Monitor a discussion or bugzilla.kernel.org ticket where additional aspects of 286 + the issue or a fix are discussed -- for example the posting of a patch fixing 287 + the regression:: 288 + 289 + #regzbot monitor: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/ 290 + 291 + Monitoring only works for lore.kernel.org and bugzilla.kernel.org; regzbot 292 + will consider all messages in that thread or ticket as related to the fixing 293 + process.
294 + 295 + * Point to a place with further details of interest, like a mailing list post 296 + or a ticket in a bug tracker that is slightly related but about a different 297 + topic:: 298 + 299 + #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789 300 + 301 + * Mark a regression as fixed by a commit that is heading upstream or has already 302 + landed:: 303 + 304 + #regzbot fixed-by: 1f2e3d4c5d 305 + 306 + * Mark a regression as a duplicate of another one already tracked by regzbot:: 307 + 308 + #regzbot dup-of: https://lore.kernel.org/all/30th.anniversary.repost@klaava.Helsinki.FI/ 309 + 310 + * Mark a regression as invalid:: 311 + 312 + #regzbot invalid: wasn't a regression, problem has always existed 313 + 314 + Is there more to tell about regzbot and its commands? 315 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 316 + 317 + More detailed and up-to-date information about the Linux 318 + kernel's regression tracking bot can be found on its 319 + `project page <https://gitlab.com/knurd42/regzbot>`_, which among other things 320 + contains a `getting started guide <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ 321 + and `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_, 322 + both of which cover more details than the above section. 323 + 324 + Quotes from Linus about regressions 325 + ----------------------------------- 326 + 327 + Find below a few real-life examples of how Linus Torvalds expects regressions to 328 + be handled: 329 + 330 + * From `2017-10-26 (1/2) 331 + <https://lore.kernel.org/lkml/CA+55aFwiiQYJ+YoLKCXjN_beDVfu38mg=Ggg5LFOcqHE8Qi7Zw@mail.gmail.com/>`_:: 332 + 333 + If you break existing user space setups THAT IS A REGRESSION. 334 + 335 + It's not ok to say "but we'll fix the user space setup". 336 + 337 + Really. NOT OK. 338 + 339 + [...]
340 + 341 + The first rule is: 342 + 343 + - we don't cause regressions 344 + 345 + and the corollary is that when regressions *do* occur, we admit to 346 + them and fix them, instead of blaming user space. 347 + 348 + The fact that you have apparently been denying the regression now for 349 + three weeks means that I will revert, and I will stop pulling apparmor 350 + requests until the people involved understand how kernel development 351 + is done. 352 + 353 + * From `2017-10-26 (2/2) 354 + <https://lore.kernel.org/lkml/CA+55aFxW7NMAMvYhkvz1UPbUTUJewRt6Yb51QAx5RtrWOwjebg@mail.gmail.com/>`_:: 355 + 356 + People should basically always feel like they can update their kernel 357 + and simply not have to worry about it. 358 + 359 + I refuse to introduce "you can only update the kernel if you also 360 + update that other program" kind of limitations. If the kernel used to 361 + work for you, the rule is that it continues to work for you. 362 + 363 + There have been exceptions, but they are few and far between, and they 364 + generally have some major and fundamental reasons for having happened, 365 + that were basically entirely unavoidable, and people _tried_hard_ to 366 + avoid them. Maybe we can't practically support the hardware any more 367 + after it is decades old and nobody uses it with modern kernels any 368 + more. Maybe there's a serious security issue with how we did things, 369 + and people actually depended on that fundamentally broken model. Maybe 370 + there was some fundamental other breakage that just _had_ to have a 371 + flag day for very core and fundamental reasons. 372 + 373 + And notice that this is very much about *breaking* peoples environments. 374 + 375 + Behavioral changes happen, and maybe we don't even support some 376 + feature any more. 
There's a number of fields in /proc/<pid>/stat that 377 + are printed out as zeroes, simply because they don't even *exist* in 378 + the kernel any more, or because showing them was a mistake (typically 379 + an information leak). But the numbers got replaced by zeroes, so that 380 + the code that used to parse the fields still works. The user might not 381 + see everything they used to see, and so behavior is clearly different, 382 + but things still _work_, even if they might no longer show sensitive 383 + (or no longer relevant) information. 384 + 385 + But if something actually breaks, then the change must get fixed or 386 + reverted. And it gets fixed in the *kernel*. Not by saying "well, fix 387 + your user space then". It was a kernel change that exposed the 388 + problem, it needs to be the kernel that corrects for it, because we 389 + have a "upgrade in place" model. We don't have a "upgrade with new 390 + user space". 391 + 392 + And I seriously will refuse to take code from people who do not 393 + understand and honor this very simple rule. 394 + 395 + This rule is also not going to change. 396 + 397 + And yes, I realize that the kernel is "special" in this respect. I'm 398 + proud of it. 399 + 400 + I have seen, and can point to, lots of projects that go "We need to 401 + break that use case in order to make progress" or "you relied on 402 + undocumented behavior, it sucks to be you" or "there's a better way to 403 + do what you want to do, and you have to change to that new better 404 + way", and I simply don't think that's acceptable outside of very early 405 + alpha releases that have experimental users that know what they signed 406 + up for. The kernel hasn't been in that situation for the last two 407 + decades. 408 + 409 + We do API breakage _inside_ the kernel all the time. 
We will fix 410 + internal problems by saying "you now need to do XYZ", but then it's 411 + about internal kernel API's, and the people who do that then also 412 + obviously have to fix up all the in-kernel users of that API. Nobody 413 + can say "I now broke the API you used, and now _you_ need to fix it 414 + up". Whoever broke something gets to fix it too. 415 + 416 + And we simply do not break user space. 417 + 418 + * From `2020-05-21 419 + <https://lore.kernel.org/all/CAHk-=wiVi7mSrsMP=fLXQrXK_UimybW=ziLOwSzFTtoXUacWVQ@mail.gmail.com/>`_:: 420 + 421 + The rules about regressions have never been about any kind of 422 + documented behavior, or where the code lives. 423 + 424 + The rules about regressions are always about "breaks user workflow". 425 + 426 + Users are literally the _only_ thing that matters. 427 + 428 + No amount of "you shouldn't have used this" or "that behavior was 429 + undefined, it's your own fault your app broke" or "that used to work 430 + simply because of a kernel bug" is at all relevant. 431 + 432 + Now, reality is never entirely black-and-white. So we've had things 433 + like "serious security issue" etc that just forces us to make changes 434 + that may break user space. But even then the rule is that we don't 435 + really have other options that would allow things to continue. 436 + 437 + And obviously, if users take years to even notice that something 438 + broke, or if we have sane ways to work around the breakage that 439 + doesn't make for too much trouble for users (ie "ok, there are a 440 + handful of users, and they can use a kernel command line to work 441 + around it" kind of things) we've also been a bit less strict. 442 + 443 + But no, "that was documented to be broken" (whether it's because the 444 + code was in staging or because the man-page said something else) is 445 + irrelevant. 
If staging code is so useful that people end up using it, 446 + that means that it's basically regular kernel code with a flag saying 447 + "please clean this up". 448 + 449 + The other side of the coin is that people who talk about "API 450 + stability" are entirely wrong. API's don't matter either. You can make 451 + any changes to an API you like - as long as nobody notices. 452 + 453 + Again, the regression rule is not about documentation, not about 454 + API's, and not about the phase of the moon. 455 + 456 + It's entirely about "we caused problems for user space that used to work". 457 + 458 + * From `2017-11-05 459 + <https://lore.kernel.org/all/CA+55aFzUvbGjD8nQ-+3oiMBx14c_6zOj2n7KLN3UsJ-qsd4Dcw@mail.gmail.com/>`_:: 460 + 461 + And our regression rule has never been "behavior doesn't change". 462 + That would mean that we could never make any changes at all. 463 + 464 + For example, we do things like add new error handling etc all the 465 + time, which we then sometimes even add tests for in our kselftest 466 + directory. 467 + 468 + So clearly behavior changes all the time and we don't consider that a 469 + regression per se. 470 + 471 + The rule for a regression for the kernel is that some real user 472 + workflow breaks. Not some test. Not a "look, I used to be able to do 473 + X, now I can't". 474 + 475 + * From `2018-08-03 476 + <https://lore.kernel.org/all/CA+55aFwWZX=CXmWDTkDGb36kf12XmTehmQjbiMPCqCRG2hi9kw@mail.gmail.com/>`_:: 477 + 478 + YOU ARE MISSING THE #1 KERNEL RULE. 479 + 480 + We do not regress, and we do not regress exactly because your are 100% wrong. 481 + 482 + And the reason you state for your opinion is in fact exactly *WHY* you 483 + are wrong. 484 + 485 + Your "good reasons" are pure and utter garbage. 486 + 487 + The whole point of "we do not regress" is so that people can upgrade 488 + the kernel and never have to worry about it. 489 + 490 + > Kernel had a bug which has been fixed 491 + 492 + That is *ENTIRELY* immaterial. 
493 + 494 + Guys, whether something was buggy or not DOES NOT MATTER. 495 + 496 + Why? 497 + 498 + Bugs happen. That's a fact of life. Arguing that "we had to break 499 + something because we were fixing a bug" is completely insane. We fix 500 + tens of bugs every single day, thinking that "fixing a bug" means that 501 + we can break something is simply NOT TRUE. 502 + 503 + So bugs simply aren't even relevant to the discussion. They happen, 504 + they get found, they get fixed, and it has nothing to do with "we 505 + break users". 506 + 507 + Because the only thing that matters IS THE USER. 508 + 509 + How hard is that to understand? 510 + 511 + Anybody who uses "but it was buggy" as an argument is entirely missing 512 + the point. As far as the USER was concerned, it wasn't buggy - it 513 + worked for him/her. 514 + 515 + Maybe it worked *because* the user had taken the bug into account, 516 + maybe it worked because the user didn't notice - again, it doesn't 517 + matter. It worked for the user. 518 + 519 + Breaking a user workflow for a "bug" is absolutely the WORST reason 520 + for breakage you can imagine. 521 + 522 + It's basically saying "I took something that worked, and I broke it, 523 + but now it's better". Do you not see how f*cking insane that statement 524 + is? 525 + 526 + And without users, your program is not a program, it's a pointless 527 + piece of code that you might as well throw away. 528 + 529 + Seriously. This is *why* the #1 rule for kernel development is "we 530 + don't break users". Because "I fixed a bug" is absolutely NOT AN 531 + ARGUMENT if that bug fix broke a user setup. You actually introduced a 532 + MUCH BIGGER bug by "fixing" something that the user clearly didn't 533 + even care about. 534 + 535 + And dammit, we upgrade the kernel ALL THE TIME without upgrading any 536 + other programs at all. It is absolutely required, because flag-days 537 + and dependencies are horribly bad. 
538 + 539 + And it is also required simply because I as a kernel developer do not 540 + upgrade random other tools that I don't even care about as I develop 541 + the kernel, and I want any of my users to feel safe doing the same 542 + time. 543 + 544 + So no. Your rule is COMPLETELY wrong. If you cannot upgrade a kernel 545 + without upgrading some other random binary, then we have a problem. 546 + 547 + * From `2021-06-05 548 + <https://lore.kernel.org/all/CAHk-=wiUVqHN76YUwhkjZzwTdjMMJf_zN4+u7vEJjmEGh3recw@mail.gmail.com/>`_:: 549 + 550 + THERE ARE NO VALID ARGUMENTS FOR REGRESSIONS. 551 + 552 + Honestly, security people need to understand that "not working" is not 553 + a success case of security. It's a failure case. 554 + 555 + Yes, "not working" may be secure. But security in that case is *pointless*. 556 + 557 + * From `2011-05-06 (1/3) 558 + <https://lore.kernel.org/all/BANLkTim9YvResB+PwRp7QTK-a5VNg2PvmQ@mail.gmail.com/>`_:: 559 + 560 + Binary compatibility is more important. 561 + 562 + And if binaries don't use the interface to parse the format (or just 563 + parse it wrongly - see the fairly recent example of adding uuid's to 564 + /proc/self/mountinfo), then it's a regression. 565 + 566 + And regressions get reverted, unless there are security issues or 567 + similar that makes us go "Oh Gods, we really have to break things". 568 + 569 + I don't understand why this simple logic is so hard for some kernel 570 + developers to understand. Reality matters. Your personal wishes matter 571 + NOT AT ALL. 572 + 573 + If you made an interface that can be used without parsing the 574 + interface description, then we're stuck with the interface. Theory 575 + simply doesn't matter. 576 + 577 + You could help fix the tools, and try to avoid the compatibility 578 + issues that way. There aren't that many of them. 
579 + 580 + From `2011-05-06 (2/3) 581 + <https://lore.kernel.org/all/BANLkTi=KVXjKR82sqsz4gwjr+E0vtqCmvA@mail.gmail.com/>`_:: 582 + 583 + it's clearly NOT an internal tracepoint. By definition. It's being 584 + used by powertop. 585 + 586 + From `2011-05-06 (3/3) 587 + <https://lore.kernel.org/all/BANLkTinazaXRdGovYL7rRVp+j6HbJ7pzhg@mail.gmail.com/>`_:: 588 + 589 + We have programs that use that ABI and thus it's a regression if they break. 590 + 591 + * From `2012-07-06 <https://lore.kernel.org/all/CA+55aFwnLJ+0sjx92EGREGTWOx84wwKaraSzpTNJwPVV8edw8g@mail.gmail.com/>`_:: 592 + 593 + > Now this got me wondering if Debian _unstable_ actually qualifies as a 594 + > standard distro userspace. 595 + 596 + Oh, if the kernel breaks some standard user space, that counts. Tons 597 + of people run Debian unstable 598 + 599 + * From `2019-09-15 600 + <https://lore.kernel.org/lkml/CAHk-=wiP4K8DRJWsCo=20hn_6054xBamGKF2kPgUzpB5aMaofA@mail.gmail.com/>`_:: 601 + 602 + One _particularly_ last-minute revert is the top-most commit (ignoring 603 + the version change itself) done just before the release, and while 604 + it's very annoying, it's perhaps also instructive. 605 + 606 + What's instructive about it is that I reverted a commit that wasn't 607 + actually buggy. In fact, it was doing exactly what it set out to do, 608 + and did it very well. In fact it did it _so_ well that the much 609 + improved IO patterns it caused then ended up revealing a user-visible 610 + regression due to a real bug in a completely unrelated area. 611 + 612 + The actual details of that regression are not the reason I point that 613 + revert out as instructive, though. It's more that it's an instructive 614 + example of what counts as a regression, and what the whole "no 615 + regressions" kernel rule means. The reverted commit didn't change any 616 + API's, and it didn't introduce any new bugs. 
But it ended up exposing 617 + another problem, and as such caused a kernel upgrade to fail for a 618 + user. So it got reverted. 619 + 620 + The point here being that we revert based on user-reported _behavior_, 621 + not based on some "it changes the ABI" or "it caused a bug" concept. 622 + The problem was really pre-existing, and it just didn't happen to 623 + trigger before. The better IO patterns introduced by the change just 624 + happened to expose an old bug, and people had grown to depend on the 625 + previously benign behavior of that old issue. 626 + 627 + And never fear, we'll re-introduce the fix that improved on the IO 628 + patterns once we've decided just how to handle the fact that we had a 629 + bad interaction with an interface that people had then just happened 630 + to rely on incidental behavior for before. It's just that we'll have 631 + to hash through how to do that (there are no less than three different 632 + patches by three different developers being discussed, and there might 633 + be more coming...). In the meantime, I reverted the thing that exposed 634 + the problem to users for this release, even if I hope it will be 635 + re-introduced (perhaps even backported as a stable patch) once we have 636 + consensus about the issue it exposed. 637 + 638 + Take-away from the whole thing: it's not about whether you change the 639 + kernel-userspace ABI, or fix a bug, or about whether the old code 640 + "should never have worked in the first place". It's about whether 641 + something breaks existing users' workflow. 642 + 643 + Anyway, that was my little aside on the whole regression thing. Since 644 + it's that "first rule of kernel programming", I felt it is perhaps 645 + worth just bringing it up every once in a while 646 + 647 + .. 648 + end-of-content 649 + .. 650 + This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top 651 + of the file. 
If you want to distribute this text under CC-BY-4.0 only, 652 + please use "The Linux kernel developers" for author attribution and link 653 + this as source: 654 + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/process/handling-regressions.rst 655 + .. 656 + Note: Only the content of this RST file as found in the Linux kernel sources 657 + is available under CC-BY-4.0, as versions of this text that were processed 658 + (for example by the kernel's build system) might contain content taken from 659 + files which use a more restrictive license.
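The rule in handling-regressions.rst above that regzbot commands must sit in their own paragraph, separated from the rest of the mail by blank lines, can be illustrated with a small Python sketch. This is a hypothetical helper written for illustration, not regzbot's actual parser (the regzbot project page linked above documents the real behavior). The names RegzbotCommand, COMMAND_RE, and extract_commands are invented, and the sketch assumes that a command paragraph contains nothing but lines starting with "#regzbot", that the colon after the command name is optional (the text uses both "#regzbot introduced v5.13..." and "#regzbot ^introduced: v5.13..."), and that a caret marks commands referring to the parent mail.

```python
import re
from dataclasses import dataclass


@dataclass
class RegzbotCommand:
    """One parsed command (hypothetical model, not regzbot's own)."""
    name: str        # e.g. "introduced", "title", "monitor"
    args: str        # everything after the command name
    on_parent: bool  # True for caret commands such as "^introduced"


# "#regzbot", an optional caret, the command name, an optional colon, arguments.
COMMAND_RE = re.compile(r"^#regzbot\s+(\^?)([a-z-]+):?\s*(.*)$")


def extract_commands(body: str) -> list[RegzbotCommand]:
    """Collect commands from paragraphs made up only of #regzbot lines."""
    commands = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", body):
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        # Skip prose, and commands mixed into prose, which this sketch ignores.
        if not lines or not all(line.startswith("#regzbot") for line in lines):
            continue
        for line in lines:
            match = COMMAND_RE.match(line)
            if match:
                caret, name, args = match.groups()
                commands.append(RegzbotCommand(name, args, caret == "^"))
    return commands


reply = """Thanks for the report.

#regzbot ^introduced: v5.13..v5.14-rc1
#regzbot title: foo

Ciao"""
for command in extract_commands(reply):
    print(command)
```

Running it prints the two commands from the middle paragraph; the greeting and the sign-off are skipped because they are not command paragraphs.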

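The three tagging rules for regression fixes in handling-regressions.rst above (a "Link:" tag for every place the issue was reported, a "Fixes:" tag naming the culprit, and "Cc: stable@vger.kernel.org" when the culprit was merged in an earlier development cycle) can be sketched as a tiny Python helper. The function name and its parameters are hypothetical. The "Fixes:" format, with the first 12 characters of the commit id followed by the quoted subject line, follows the convention in Documentation/process/submitting-patches.rst; the order in which the tags are emitted is only illustrative.

```python
def regression_fix_tags(report_urls, culprit_commit, culprit_subject,
                        culprit_from_earlier_cycle):
    """Return the tag block for a regression fix (hypothetical helper)."""
    if not report_urls:
        raise ValueError("point to at least one report with a Link: tag")
    if len(culprit_commit) < 12:
        raise ValueError("use at least 12 characters of the culprit's commit id")
    # One "Fixes:" tag naming the commit that caused the regression.
    tags = [f'Fixes: {culprit_commit[:12]} ("{culprit_subject}")']
    # Culprits merged in an earlier development cycle need a backport.
    if culprit_from_earlier_cycle:
        tags.append("Cc: stable@vger.kernel.org")
    # One "Link:" tag per place where the issue was reported.
    tags.extend(f"Link: {url}" for url in report_urls)
    return "\n".join(tags)


print(regression_fix_tags(
    ["https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/",
     "https://bugzilla.kernel.org/show_bug.cgi?id=1234567890"],
    "1f2e3d4c5b6a", "foo: hypothetical culprit subject", True))
```

Raising an error when no report URL is given mirrors the text's point that regzbot relies on "Link:" tags to connect fixes with reports.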
+1

Documentation/process/index.rst

··· 25 25 code-of-conduct-interpretation 26 26 development-process 27 27 submitting-patches 28 + handling-regressions 28 29 programming-language 29 30 coding-style 30 31 maintainer-handbooks

+2

MAINTAINERS

··· 10438 10438 M: Thorsten Leemhuis <linux@leemhuis.info> 10439 10439 L: regressions@lists.linux.dev 10440 10440 S: Supported 10441 + F: Documentation/admin-guide/reporting-regressions.rst 10442 + F: Documentation/process/handling-regressions.rst 10441 10443 10442 10444 KERNEL SELFTEST FRAMEWORK 10443 10445 M: Shuah Khan <shuah@kernel.org>