doc: improve fetchers overview, deduplicate readme content, follow doc conventions (#297654)

* doc: improve fetchers overview, deduplicate readme content

* Improve caveat explanation and some fetchurl content

* move out consumer docs on source fetching

* move note on mirror URLs to the relevant section

This may be better suited to the `fetchurl` reference, but it is probably better to
just render that information into the manual. For now we leave it where it is, because
- contributor documentation encourages mirrors
- we can expect contributors to dig into the source
- linking source files is trivial in in-code documentation

* move instructions for updating hashes to the manual

* Add more clarity on text, reorganise source hash methods

---------

Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io>
Co-authored-by: Dominic Mills-Howell <dominic.millz27@gmail.com>
Co-authored-by: lolbinarycat <dogedoge61+github@gmail.com>

+152 -105
+132 -37
doc/build-helpers/fetchers.chapter.md
···
 # Fetchers {#chap-pkgs-fetchers}

 Building software with Nix often requires downloading source code and other files from the internet.
-To this end, Nixpkgs provides *fetchers*: functions to obtain remote sources via various protocols and services.
+To this end, we use functions that we call _fetchers_, which obtain remote sources via various protocols and services.

-Nixpkgs fetchers differ from built-in fetchers such as [`builtins.fetchTarball`](https://nixos.org/manual/nix/stable/language/builtins.html#builtins-fetchTarball):
+Nix provides built-in fetchers such as [`builtins.fetchTarball`](https://nixos.org/manual/nix/stable/language/builtins.html#builtins-fetchTarball).
+Nixpkgs provides its own fetchers, which work differently:
+
 - A built-in fetcher will download and cache files at evaluation time and produce a [store path](https://nixos.org/manual/nix/stable/glossary#gloss-store-path).
-  A Nixpkgs fetcher will create a ([fixed-output](https://nixos.org/manual/nix/stable/glossary#gloss-fixed-output-derivation)) [derivation](https://nixos.org/manual/nix/stable/language/derivations), and files are downloaded at build time.
+  A Nixpkgs fetcher will create a ([fixed-output](https://nixos.org/manual/nix/stable/glossary#gloss-fixed-output-derivation)) [derivation](https://nixos.org/manual/nix/stable/glossary#gloss-derivation), and files are downloaded at build time.
 - Built-in fetchers will invalidate their cache after [`tarball-ttl`](https://nixos.org/manual/nix/stable/command-ref/conf-file#conf-tarball-ttl) expires, and will require network activity to check if the cache entry is up to date.
-  Nixpkgs fetchers only re-download if the specified hash changes or the store object is not otherwise available.
+  Nixpkgs fetchers only re-download if the specified hash changes or the store object is not available.
 - Built-in fetchers do not use [substituters](https://nixos.org/manual/nix/stable/command-ref/conf-file#conf-substituters).
   Derivations produced by Nixpkgs fetchers will use any configured binary cache transparently.

-This significantly reduces the time needed to evaluate the entirety of Nixpkgs, and allows [Hydra](https://nixos.org/hydra) to retain and re-distribute sources used by Nixpkgs in the [public binary cache](https://cache.nixos.org).
-For these reasons, built-in fetchers are not allowed in Nixpkgs source code.
+This significantly reduces the time needed to evaluate Nixpkgs, and allows [Hydra](https://nixos.org/hydra) to retain and re-distribute sources used by Nixpkgs in the [public binary cache](https://cache.nixos.org).
+For these reasons, Nix's built-in fetchers are not allowed in Nixpkgs.

-The following table shows an overview of the differences:
+The following table summarises the differences:

 | Fetchers | Download | Output | Cache | Re-download when |
 |-|-|-|-|-|
 | `builtins.fetch*` | evaluation time | store path | `/nix/store`, `~/.cache/nix` | `tarball-ttl` expires, cache miss in `~/.cache/nix`, output store object not in local store |
 | `pkgs.fetch*` | build time | derivation | `/nix/store`, substituters | output store object not available |

+:::{.tip}
+`pkgs.fetchFrom*` helpers retrieve _snapshots_ of version-controlled sources, as opposed to the entire version history, which is more efficient.
+`pkgs.fetchgit` by default also has the same behaviour, but can be changed through specific attributes given to it.
+:::
+
 ## Caveats {#chap-pkgs-fetchers-caveats}

-The fact that the hash belongs to the Nix derivation output and not the file itself can lead to confusion.
-For example, consider the following fetcher:
+Because Nixpkgs fetchers are fixed-output derivations, an [output hash](https://nixos.org/manual/nix/stable/language/advanced-attributes#adv-attr-outputHash) has to be specified, usually indirectly through a `hash` attribute.
+This hash refers to the derivation output, which can be different from the remote source itself!
+
+This has the following implications that you should be aware of:
+
+- Use Nix (or Nix-aware) tooling to produce the output hash.
+
+- When changing any fetcher parameters, always update the output hash.
+  Use one of the methods from [](#sec-pkgs-fetchers-updating-source-hashes).
+  Otherwise, existing store objects that match the output hash will be re-used rather than fetching new content.
+
+  :::{.note}
+  A similar problem arises while testing changes to a fetcher's implementation.
+  If the output of the derivation already exists in the Nix store, test failures can go undetected.
+  The [`invalidateFetcherByDrvHash`](#tester-invalidateFetcherByDrvHash) function helps prevent reusing cached derivations.
+  :::
+
+## Updating source hashes {#sec-pkgs-fetchers-updating-source-hashes}
+
+There are several ways to obtain the hash corresponding to a remote source.
+Unless you understand how the fetcher you're using calculates the hash from the downloaded contents, you should use [the fake hash method](#sec-pkgs-fetchers-updating-source-hashes-fakehash-method).
+
+1. []{#sec-pkgs-fetchers-updating-source-hashes-fakehash-method} The fake hash method: In your package recipe, set the hash to one of
+
+   - `""`
+   - `lib.fakeHash`
+   - `lib.fakeSha256`
+   - `lib.fakeSha512`
+
+   Attempt to build, extract the calculated hashes from error messages, and put them into the recipe.
+
+   :::{.warning}
+   You must use one of these four fake hashes and not some arbitrarily-chosen hash.
+   See [](#sec-pkgs-fetchers-secure-hashes) for details.
+   :::
+
+   :::{.example #ex-fetchers-update-fod-hash}
+   # Update source hash with the fake hash method
+
+   Consider the following recipe that produces a plain file:
+
+   ```nix
+   { fetchurl }:
+   fetchurl {
+     url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.05/.version";
+     hash = "sha256-ZHl1emidXVojm83LCVrwULpwIzKE/mYwfztVkvpruOM=";
+   }
+   ```
+
+   A common mistake is to update a fetcher parameter, such as `url`, without updating the hash:
+
+   ```nix
+   { fetchurl }:
+   fetchurl {
+     url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version";
+     hash = "sha256-ZHl1emidXVojm83LCVrwULpwIzKE/mYwfztVkvpruOM=";
+   }
+   ```
+
+   **This will produce the same output as before!**
+   Set the hash to an empty string:
+
+   ```nix
+   { fetchurl }:
+   fetchurl {
+     url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version";
+     hash = "";
+   }
+   ```

-```nix
-fetchurl {
-  url = "http://www.example.org/hello-1.0.tar.gz";
-  hash = "sha256-lTeyxzJNQeMdu1IVdovNMtgn77jRIhSybLdMbTkf2Ww=";
-}
-```
+   When building the package, use the error message to determine the correct hash:

-A common mistake is to update a fetcher’s URL, or a version parameter, without updating the hash.
+   ```shell
+   $ nix-build
+   (some output removed for clarity)
+   error: hash mismatch in fixed-output derivation '/nix/store/7yynn53jpc93l76z9zdjj4xdxgynawcw-version.drv':
+            specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
+               got:    sha256-BZqI7r0MNP29yGH5+yW2tjU9OOpOCEvwWKrWCv5CQ0I=
+   error: build of '/nix/store/bqdjcw5ij5ymfbm41dq230chk9hdhqff-version.drv' failed
+   ```
+   :::

-```nix
-fetchurl {
-  url = "http://www.example.org/hello-1.1.tar.gz";
-  hash = "sha256-lTeyxzJNQeMdu1IVdovNMtgn77jRIhSybLdMbTkf2Ww=";
-}
-```
+2. Prefetch the source with [`nix-prefetch-<type> <URL>`](https://search.nixos.org/packages?buckets={%22package_attr_set%22%3A[%22No%20package%20set%22]%2C%22package_license_set%22%3A[]%2C%22package_maintainers_set%22%3A[]%2C%22package_platforms%22%3A[]}&query=nix-prefetch), where `<type>` is one of

-**This will reuse the old contents**.
-Remember to invalidate the hash argument, in this case by setting the `hash` attribute to an empty string.
+   - `url`
+   - `git`
+   - `hg`
+   - `cvs`
+   - `bzr`
+   - `svn`

-```nix
-fetchurl {
-  url = "http://www.example.org/hello-1.1.tar.gz";
-  hash = "";
-}
-```
+   The hash is printed to stdout.
+
+3. Prefetch by package source (with `nix-prefetch-url '<nixpkgs>' -A <package>.src`, where `<package>` is package attribute name).
+   The hash is printed to stdout.
+
+   This works well when you've upgraded the existing package version and want to find out new hash, but is useless if the package can't be accessed by attribute or the package has multiple sources (`.srcs`, architecture-dependent sources, etc).

-Use the resulting error message to determine the correct hash.
+4. Upstream hash: use it when upstream provides `sha256` or `sha512`.
+   Don't use it when upstream provides `md5`, compute `sha256` instead.

-```
-error: hash mismatch in fixed-output derivation '/path/to/my.drv':
-         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
-            got:    sha256-lTeyxzJNQeMdu1IVdovNMtgn77jRIhSybLdMbTkf2Ww=
-```
+   A little nuance is that `nix-prefetch-*` tools produce hashes with the `nix32` encoding (a Nix-specific base32 adaptation), but upstream usually provides hexadecimal (`base16`) encoding.
+   Fetchers understand both formats.
+   Nixpkgs does not standardise on any one format.

-A similar problem arises while testing changes to a fetcher's implementation. If the output of the derivation already exists in the Nix store, test failures can go undetected. The [`invalidateFetcherByDrvHash`](#tester-invalidateFetcherByDrvHash) function helps prevent reusing cached derivations.
+   You can convert between hash formats with [`nix-hash`](https://nixos.org/manual/nix/stable/command-ref/nix-hash).
+
+5. Extract the hash from a local source archive with `sha256sum`.
+   Use `nix-prefetch-url file:///path/to/archive` if you want the custom Nix `base32` hash.
+
+## Obtaining hashes securely {#sec-pkgs-fetchers-secure-hashes}
+
+It's always a good idea to avoid Man-in-the-Middle (MITM) attacks when downloading source contents.
+Otherwise, you could unknowingly download malware instead of the intended source, and instead of the actual source hash, you'll end up using the hash of malware.
+Here are security considerations for this scenario:
+
+- `http://` URLs are not secure to prefetch hashes.
+
+- Upstream hashes should be obtained via a secure protocol.
+
+- `https://` URLs give you more protections when using `nix-prefetch-*` or for upstream hashes.
+
+- `https://` URLs are secure when using the [fake hash method](#sec-pkgs-fetchers-updating-source-hashes-fakehash-method) *only if* you use one of the listed fake hashes.
+  If you use any other hash, the download will be exposed to MITM attacks even if you use HTTPS URLs.
+
+  In more concrete terms, if you use any other hash, the [`--insecure` flag](https://curl.se/docs/manpage.html#-k) will be passed to the underlying call to `curl` when downloading content.

 ## `fetchurl` and `fetchzip` {#fetchurl}

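As an illustration of the fake hash method described in the diff above, here is a minimal sketch applying it to `fetchFromGitHub` rather than `fetchurl`. The owner, repository, and revision are borrowed from the `nix-prefetch-github` command quoted in the old `pkgs/README.md` text; choosing them here is an assumption made purely for illustration, and the real hash is deliberately not filled in.

```nix
# Sketch only: a source pinned with a placeholder hash.
# Intended to be called with `callPackage`, which supplies `lib` and `fetchFromGitHub`.
{ lib, fetchFromGitHub }:

fetchFromGitHub {
  owner = "NixOS";
  repo = "nix";
  # Use the full commit hash, never an abbreviated one.
  rev = "1f795f9f44607cc5bec70d1300150bfefcef2aae";
  # Placeholder: the first build is expected to fail with a hash mismatch,
  # and the value on the "got:" line of the error then replaces this attribute.
  hash = lib.fakeHash;
}
```

After any later change to `owner`, `repo`, or `rev`, the `hash` attribute would be reset to the placeholder and the build repeated. Otherwise an existing store object matching the old hash would be reused.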
+3 -1
doc/manpage-urls.json
···
   "login.defs(5)": "https://man.archlinux.org/man/login.defs.5",
   "unshare(1)": "https://man.archlinux.org/man/unshare.1.en",
   "nix-shell(1)": "https://nixos.org/manual/nix/stable/command-ref/nix-shell.html",
-  "mksquashfs(1)": "https://man.archlinux.org/man/extra/squashfs-tools/mksquashfs.1.en"
+  "mksquashfs(1)": "https://man.archlinux.org/man/extra/squashfs-tools/mksquashfs.1.en",
+  "curl(1)": "https://curl.se/docs/manpage.html",
+  "netrc(5)": "https://man.cx/netrc"
 }
+17 -67
pkgs/README.md
···

    - All other [`meta`](https://nixos.org/manual/nixpkgs/stable/#chap-meta) attributes are optional, but it’s still a good idea to provide at least the `description`, `homepage` and [`license`](https://nixos.org/manual/nixpkgs/stable/#sec-meta-license).

-   - You can use `nix-prefetch-url url` to get the SHA-256 hash of source distributions. There are similar commands as `nix-prefetch-git` and `nix-prefetch-hg` available in `nix-prefetch-scripts` package.
-
-   - A list of schemes for `mirror://` URLs can be found in [`pkgs/build-support/fetchurl/mirrors.nix`](build-support/fetchurl/mirrors.nix).
-
-   The exact syntax and semantics of the Nix expression language, including the built-in function, are [described in the Nix manual](https://nixos.org/manual/nix/stable/language/).
+   - The exact syntax and semantics of the Nix expression language, including the built-in functions, are [Nix language reference](https://nixos.org/manual/nix/stable/language/).

 5. To test whether the package builds, run the following command from the root of the nixpkgs source tree:

···

 See the Nixpkgs manual for more details on [standard meta-attributes](https://nixos.org/nixpkgs/manual/#sec-standard-meta-attributes).

-### Import From Derivation
+## Import From Derivation

 [Import From Derivation](https://nixos.org/manual/nix/unstable/language/import-from-derivation) (IFD) is disallowed in Nixpkgs for performance reasons:
 [Hydra](https://github.com/NixOS/hydra) evaluates the entire package set, and sequential builds during evaluation would increase evaluation times to become impractical.
···

 ## Sources

-### Fetching Sources
+Always fetch source files using [Nixpkgs fetchers](https://nixos.org/manual/nixpkgs/unstable/#chap-pkgs-fetchers).
+Use reproducible sources with a high degree of availability.
+Prefer protocols that support proxies.

-There are multiple ways to fetch a package source in nixpkgs. The general guideline is that you should package reproducible sources with a high degree of availability. Right now there is only one fetcher which has mirroring support and that is `fetchurl`. Note that you should also prefer protocols which have a corresponding proxy environment variable.
+A list of schemes for `mirror://` URLs can be found in [`pkgs/build-support/fetchurl/mirrors.nix`](build-support/fetchurl/mirrors.nix), and is supported by [`fetchurl`](https://nixos.org/manual/nixpkgs/unstable/#fetchurl).
+Other fetchers which end up relying on `fetchurl` may also support mirroring.

-You can find many source fetch helpers in `pkgs/build-support/fetch*`.
+The preferred source hash type is `sha256`.

-In the file `pkgs/top-level/all-packages.nix` you can find fetch helpers, these have names on the form `fetchFrom*`. The intention of these are to provide snapshot fetches but using the same api as some of the version controlled fetchers from `pkgs/build-support/`. As an example going from bad to good:
+Examples going from bad to best practices:

 - Bad: Uses `git://` which won't be proxied.

···
   }
   ```

-- Best: Fetches a snapshot archive and you get the rev you want.
+- Best: Fetches a snapshot archive for the given revision.

   ```nix
   {
···
   }
   ```

-When fetching from GitHub, commits must always be referenced by their full commit hash. This is because GitHub shares commit hashes among all forks and returns `404 Not Found` when a short commit hash is ambiguous. It already happens for some short, 6-character commit hashes in `nixpkgs`.
-It is a practical vector for a denial-of-service attack by pushing large amounts of auto generated commits into forks and was already [demonstrated against GitHub Actions Beta](https://blog.teddykatz.com/2019/11/12/github-actions-dos.html).
-
-Find the value to put as `hash` by running `nix-shell -p nix-prefetch-github --run "nix-prefetch-github --rev 1f795f9f44607cc5bec70d1300150bfefcef2aae NixOS nix"`.
-
-#### Obtaining source hash
-
-Preferred source hash type is sha256. There are several ways to get it.
-
-1. Prefetch URL (with `nix-prefetch-XXX URL`, where `XXX` is one of `url`, `git`, `hg`, `cvs`, `bzr`, `svn`). Hash is printed to stdout.
-
-2. Prefetch by package source (with `nix-prefetch-url '<nixpkgs>' -A PACKAGE.src`, where `PACKAGE` is package attribute name). Hash is printed to stdout.
-
-    This works well when you've upgraded existing package version and want to find out new hash, but is useless if package can't be accessed by attribute or package has multiple sources (`.srcs`, architecture-dependent sources, etc).
-
-3. Upstream provided hash: use it when upstream provides `sha256` or `sha512` (when upstream provides `md5`, don't use it, compute `sha256` instead).
-
-    A little nuance is that `nix-prefetch-*` tools produce hash encoded with `base32`, but upstream usually provides hexadecimal (`base16`) encoding. Fetchers understand both formats. Nixpkgs does not standardize on any one format.
-
-    You can convert between formats with nix-hash, for example:
-
-    ```ShellSession
-    $ nix-hash --type sha256 --to-base32 HASH
-    ```
-
-4. Extracting hash from local source tarball can be done with `sha256sum`. Use `nix-prefetch-url file:///path/to/tarball` if you want base32 hash.
-
-5. Fake hash: set the hash to one of
-
-    - `""`
-    - `lib.fakeHash`
-    - `lib.fakeSha256`
-    - `lib.fakeSha512`
-
-    in the package expression, attempt build and extract correct hash from error messages.
+> [!Note]
+> When fetching from GitHub, always reference revisions by their full commit hash.
+> GitHub shares commit hashes among all forks and returns `404 Not Found` when a short commit hash is ambiguous.
+> It already happened in Nixpkgs for short, 6-character commit hashes.
+>
+> Pushing large amounts of auto generated commits into forks is a practical vector for a denial-of-service attack, and was already [demonstrated against GitHub Actions Beta](https://blog.teddykatz.com/2019/11/12/github-actions-dos.html).

-    > [!Warning]
-    > You must use one of these four fake hashes and not some arbitrarily-chosen hash.
-    > See [here][secure-hashes]
-
-    This is last resort method when reconstructing source URL is non-trivial and `nix-prefetch-url -A` isn’t applicable (for example, [one of `kodi` dependencies](https://github.com/NixOS/nixpkgs/blob/d2ab091dd308b99e4912b805a5eb088dd536adb9/pkgs/applications/video/kodi/default.nix#L73)). The easiest way then would be replace hash with a fake one and rebuild. Nix build will fail and error message will contain desired hash.
-
-
-#### Obtaining hashes securely
-[secure-hashes]: #obtaining-hashes-securely
-
-Let's say Man-in-the-Middle (MITM) sits close to your network. Then instead of fetching source you can fetch malware, and instead of source hash you get hash of malware. Here are security considerations for this scenario:
-
-- `http://` URLs are not secure to prefetch hash from;
-
-- hashes from upstream (in method 3) should be obtained via secure protocol;
-
-- `https://` URLs are secure in methods 1, 2, 3;
-
-- `https://` URLs are secure in method 5 *only if* you use one of the listed fake hashes. If you use any other hash, `fetchurl` will pass `--insecure` to `curl` and may then degrade to HTTP in case of TLS certificate expiration.
-
-### Patches
+## Patches

 Patches available online should be retrieved using `fetchpatch`.

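The `mirror://` scheme mentioned in the Sources hunk above might be used roughly as follows. This is a sketch under assumptions: it relies on `gnu` being one of the mirror names defined in `pkgs/build-support/fetchurl/mirrors.nix`, the tarball file name is only an illustrative choice, and the hash is a placeholder to be replaced using the fake hash method.

```nix
# Sketch only: fetching a release tarball through a mirror list.
{ lib, fetchurl }:

fetchurl {
  # `mirror://gnu/` stands for the list of GNU mirrors,
  # so several servers can be tried for the same relative path.
  url = "mirror://gnu/hello/hello-2.12.1.tar.gz";
  # Placeholder, not the real hash: take the real value from the
  # "got:" line of the hash mismatch error on the first build.
  hash = lib.fakeHash;
}
```

Compared with a single hard-coded `https://` URL, a mirror URL keeps the source available when one server is down, which matches the guideline to prefer sources with a high degree of availability.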