# CUDA {#cuda}

Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) model created by NVIDIA. It's commonly used to accelerate computationally intensive problems and has been widely adopted for high-performance computing (HPC) and machine learning (ML) applications.

## User Guide {#cuda-user-guide}

Packages provided by NVIDIA which require CUDA are typically stored in CUDA package sets.

Nixpkgs provides a number of CUDA package sets, each based on a different CUDA release. Top-level attributes that provide access to CUDA package sets follow these naming conventions:

- `cudaPackages_x_y`: A major-minor-versioned package set for a specific CUDA release, where `x` and `y` are the major and minor versions of the CUDA release.
- `cudaPackages_x`: A major-versioned alias to the latest widely supported major-minor-versioned package set within the `x` release series.
- `cudaPackages`: An unversioned alias to the major-versioned alias for the latest widely supported CUDA release. The package set referenced by this alias is also referred to as the "default" CUDA package set.

It is recommended to use the unversioned `cudaPackages` attribute. While versioned package sets are available (e.g., `cudaPackages_12_8`), they are periodically removed.

Here are two examples to illustrate the naming conventions:

- If `cudaPackages_12_9` is the latest release in the 12.x series, but core libraries like OpenCV or ONNX Runtime fail to build with it, `cudaPackages_12` may alias `cudaPackages_12_8` instead of `cudaPackages_12_9`.
- If `cudaPackages_13_1` is the latest release, but core libraries like PyTorch or Torch Vision fail to build with it, `cudaPackages` may alias `cudaPackages_12` instead of `cudaPackages_13`.

All CUDA package sets include common CUDA packages like `libcublas`, `cudnn`, `tensorrt`, and `nccl`.
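For example, individual members can be referenced by attribute path (a sketch; whether these attributes evaluate depends on your platform and `config`):

```nix
{ pkgs }:
{
  # Members of the default CUDA package set:
  inherit (pkgs.cudaPackages) cudnn nccl;
  # The same attribute names are available in versioned sets:
  inherit (pkgs.cudaPackages_12) libcublas;
}
```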

### Configuring Nixpkgs for CUDA {#cuda-configuring-nixpkgs-for-cuda}

CUDA support is not enabled by default in Nixpkgs. To enable CUDA support, make sure Nixpkgs is imported with a configuration similar to the following:

```nix
{ pkgs }:
{
  allowUnfreePredicate = pkgs._cuda.lib.allowUnfreeCudaPredicate;
  cudaCapabilities = [ <target-architectures> ];
  cudaForwardCompat = true;
  cudaSupport = true;
}
```

The majority of CUDA packages are unfree, so either `allowUnfreePredicate` or `allowUnfree` should be set.

The `cudaSupport` configuration option is used by packages to conditionally enable CUDA-specific functionality. This configuration option is commonly used by packages which can be built with or without CUDA support.

The `cudaCapabilities` configuration option specifies a list of CUDA capabilities. Packages may use this option to control device code generation to take advantage of architecture-specific functionality, speed up compile times by producing less device code, or slim package closures. For example, you can build for Ada Lovelace GPUs with `cudaCapabilities = [ "8.9" ];`. If `cudaCapabilities` is not provided, the default value is calculated per package set, derived from the list of GPUs supported by that CUDA version. Please consult [supported GPUs](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) for specific cards. Library maintainers should consult the [NVCC docs](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/) and its release notes.

::: {.caution}
Certain CUDA capabilities are not targeted by default, including capabilities belonging to the Jetson family of devices (e.g. `8.7`, which corresponds to the Jetson Orin) or non-baseline feature sets (e.g. `9.0a`, which corresponds to the Hopper-exclusive feature set). If you need to target these capabilities, you must explicitly set `cudaCapabilities` to include them.
:::

The `cudaForwardCompat` boolean configuration option determines whether PTX support for future hardware is enabled.

### Modifying CUDA package sets {#cuda-modifying-cuda-package-sets}

CUDA package sets are defined in `pkgs/top-level/cuda-packages.nix`. A CUDA package set is created by `callPackage`-ing `pkgs/development/cuda-modules/default.nix` with an attribute set `manifests`, containing NVIDIA manifests for each redistributable. The manifests for supported redistributables are available through `_cuda.manifests` and live in `pkgs/development/cuda-modules/_cuda/manifests`.

The majority of the CUDA package set tooling is available through the top-level attribute set `_cuda`, a fixed point defined outside the CUDA package sets. As a fixed point, `_cuda` should be modified through its `extend` attribute.

::: {.caution}
As indicated by the underscore prefix, `_cuda` is an implementation detail and no guarantees are provided with respect to its stability or API. The `_cuda` attribute set is exposed only to ease creation or modification of CUDA package sets by expert, out-of-tree users.
:::

Out-of-tree modifications of packages should use `overrideAttrs` to make any necessary modifications to the package expression.
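For example, an out-of-tree overlay might adjust a single package with `overrideAttrs` (a sketch; the chosen package and the change itself are purely illustrative):

```nix
final: prev: {
  cudaPackages = prev.cudaPackages.overrideScope (
    _: prevCuda: {
      nccl = prevCuda.nccl.overrideAttrs (prevAttrs: {
        # Illustrative change: run an extra command before configuring.
        preConfigure = (prevAttrs.preConfigure or "") + ''
          echo "applying out-of-tree tweak"
        '';
      });
    }
  );
}
```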

::: {.note}
The `_cuda` attribute set previously exposed `fixups`, an attribute set mapping from package name (`pname`) to a `callPackage`-compatible expression which was provided to `overrideAttrs` on the result of a generic redistributable builder. This functionality has been removed in favor of including full package expressions for each redistributable package to ensure consistent attribute set membership across supported CUDA releases, platforms, and configurations.
:::

### Extending CUDA package sets {#cuda-extending-cuda-package-sets}

CUDA package sets are scopes and provide the usual `overrideScope` attribute for overriding package attributes (see the note about `_cuda` in [Modifying CUDA package sets](#cuda-modifying-cuda-package-sets)).

Inspired by `pythonPackagesExtensions`, the `_cuda.extensions` attribute is a list of extensions applied to every version of the CUDA package set, allowing modification of all versions of the CUDA package set without needing to know their names or explicitly enumerate and modify them. As an example, disabling `cuda_compat` across all CUDA package sets can be accomplished with this overlay:

```nix
final: prev: {
  _cuda = prev._cuda.extend (
    _: prevAttrs: {
      extensions = prevAttrs.extensions ++ [ (_: _: { cuda_compat = null; }) ];
    }
  );
}
```

Redistributable packages are constructed by the `buildRedist` helper; see `pkgs/development/cuda-modules/buildRedist/default.nix` for the implementation.

### Using `cudaPackages` {#cuda-using-cudapackages}

::: {.caution}
A non-trivial amount of CUDA package discoverability and usability relies on the various setup hooks used by a CUDA package set. As a result, users will likely encounter issues trying to perform builds within a `devShell` without manually invoking phases.
:::

To use one or more CUDA packages in an expression, give the expression a `cudaPackages` parameter, and in case CUDA support is optional, add `config` and `cudaSupport` parameters:

```nix
{
  config,
  cudaSupport ? config.cudaSupport,
  cudaPackages,
}:
<package-expression>
```
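Filling in the skeleton above, a package with optional CUDA support might look like the following sketch (the package name and its build steps are hypothetical; `cuda_nvcc` and `cuda_cudart` are members of the CUDA package sets):

```nix
{
  lib,
  stdenv,
  config,
  cudaSupport ? config.cudaSupport,
  cudaPackages,
}:

stdenv.mkDerivation {
  pname = "myapp"; # hypothetical package
  version = "1.0";
  src = ./.;

  __structuredAttrs = true;
  strictDeps = true;

  # The CUDA compiler (and its setup hooks) belongs in nativeBuildInputs.
  nativeBuildInputs = lib.optionals cudaSupport [ cudaPackages.cuda_nvcc ];

  # Runtime libraries such as the CUDA runtime belong in buildInputs.
  buildInputs = lib.optionals cudaSupport [ cudaPackages.cuda_cudart ];
}
```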

In your package's derivation arguments, it is _strongly_ recommended that the following are set:

```nix
{
  __structuredAttrs = true;
  strictDeps = true;
}
```

These settings ensure that the CUDA setup hooks function as intended.

When using `callPackage`, you can choose to pass in a different variant, e.g. when a package requires a specific version of CUDA:

```nix
{ mypkg = callPackage ./mypkg.nix { cudaPackages = cudaPackages_12_6; }; }
```

::: {.caution}
Overriding the CUDA package set for a package may cause inconsistencies, because the override does not affect its direct or transitive dependencies. As a result, it is easy to end up with a package that uses a different CUDA package set than its dependencies. If possible, it is recommended that you change the default CUDA package set globally, to ensure a consistent environment.
:::

### Nixpkgs CUDA variants {#cuda-nixpkgs-cuda-variants}

Nixpkgs CUDA variants are provided primarily for the convenience of selecting CUDA-enabled packages by attribute path. As an example, the `pkgsForCudaArch` collection of CUDA Nixpkgs variants allows you to access an instantiation of OpenCV with CUDA support for an Ada Lovelace GPU with the attribute path `pkgsForCudaArch.sm_89.opencv`, without needing to modify the `config` provided when importing Nixpkgs.

::: {.caution}
Nixpkgs variants are not free: they require re-evaluating Nixpkgs. Where possible, import Nixpkgs once, with the desired configuration.
:::

#### Using `cudaPackages.pkgs` {#cuda-using-cudapackages-pkgs}

Each CUDA package set has a `pkgs` attribute, which is a variant of Nixpkgs in which the enclosing CUDA package set becomes the default. This was done primarily to avoid package set leakage, wherein a member of a non-default CUDA package set has a (potentially transitive) dependency on a member of the default CUDA package set.

::: {.note}
Package set leakage is a common problem in Nixpkgs and is not limited to CUDA package sets.
:::

As an added benefit of `pkgs` being configured this way, building a package with a non-default version of CUDA is as simple as accessing an attribute. As an example, `cudaPackages_12_8.pkgs.opencv` provides OpenCV built against CUDA 12.8.

#### Using `pkgsCuda` {#cuda-using-pkgscuda}

The `pkgsCuda` attribute set is a variant of Nixpkgs configured with `cudaSupport = true` and `rocmSupport = false`. It is a convenient way to access a variant of Nixpkgs configured with the default set of CUDA capabilities.

#### Using `pkgsForCudaArch` {#cuda-using-pkgsforcudaarch}

The `pkgsForCudaArch` attribute set maps CUDA architectures (e.g., `sm_89` for Ada Lovelace or `sm_90a` for architecture-specific Hopper) to Nixpkgs variants configured to support exactly that architecture. As an example, `pkgsForCudaArch.sm_89` is a Nixpkgs variant extending `pkgs` and setting the following values in `config`:

```nix
{
  cudaSupport = true;
  cudaCapabilities = [ "8.9" ];
  cudaForwardCompat = false;
}
```

::: {.note}
In `pkgsForCudaArch`, the `cudaForwardCompat` option is set to `false` because exactly one CUDA architecture is supported by the corresponding Nixpkgs variant. Furthermore, some architectures, including architecture-specific feature sets like `sm_90a`, cannot be built with forward compatibility.
:::

::: {.caution}
Not every version of CUDA supports every architecture!

To illustrate: support for Blackwell (e.g., `sm_100`) was added in CUDA 12.8. Assume our Nixpkgs' default CUDA package set is CUDA 12.6. Then the Nixpkgs variant available through `pkgsForCudaArch.sm_100` is useless, since packages like `pkgsForCudaArch.sm_100.opencv` and `pkgsForCudaArch.sm_100.python3Packages.torch` will try to generate code for `sm_100`, an architecture unknown to CUDA 12.6. In that case, you should use `pkgsForCudaArch.sm_100.cudaPackages_12_8.pkgs` instead (see [Using `cudaPackages.pkgs`](#cuda-using-cudapackages-pkgs) for more details).
:::

The `pkgsForCudaArch` attribute set makes it possible to access packages built for a specific architecture without needing to manually call `pkgs.extend` and supply a new `config`. As an example, `pkgsForCudaArch.sm_89.python3Packages.torch` provides PyTorch built for Ada Lovelace GPUs.

### Running Docker or Podman containers with CUDA support {#cuda-docker-podman}

It is possible to run Docker or Podman containers with CUDA support. The recommended mechanism to perform this task is the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html).

The NVIDIA Container Toolkit can be enabled in NixOS as follows:

```nix
{ hardware.nvidia-container-toolkit.enable = true; }
```

This will automatically enable a service that generates a CDI specification (located at `/var/run/cdi/nvidia-container-toolkit.json`) based on the auto-detected hardware of your machine. You can check this service by running:

```ShellSession
$ systemctl status nvidia-container-toolkit-cdi-generator.service
```

::: {.note}
Depending on what settings you had already enabled in your system, you might need to restart your machine in order for the NVIDIA Container Toolkit to generate a valid CDI specification for your machine.
:::

Once a valid CDI specification has been generated for your machine at boot time, both Podman and Docker (> 25) will use it if you provide them with the `--device` flag:

```ShellSession
$ podman run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)
```

```ShellSession
$ docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)
```

You can check all the identifiers that have been generated for your auto-detected hardware by inspecting the contents of the `/var/run/cdi/nvidia-container-toolkit.json` file:

```ShellSession
$ nix run nixpkgs#jq -- -r '.devices[].name' < /var/run/cdi/nvidia-container-toolkit.json
0
1
all
```

#### Specifying what devices to expose to the container {#cuda-specifying-what-devices-to-expose-to-the-container}

You can choose which devices are exposed to your containers by using the identifiers from the generated CDI specification, as follows:

```ShellSession
$ podman run --rm -it --device=nvidia.com/gpu=0 ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
```

You can repeat the `--device` argument as many times as necessary if you have multiple GPUs and want to pick which ones to expose to the container:

```ShellSession
$ podman run --rm -it --device=nvidia.com/gpu=0 --device=nvidia.com/gpu=1 ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)
```

::: {.note}
By default, the NVIDIA Container Toolkit will use the GPU index to identify specific devices. You can change how devices are identified by setting the `hardware.nvidia-container-toolkit.device-name-strategy` NixOS option.
:::

#### Using docker-compose {#cuda-using-docker-compose}

It's possible to expose GPUs to a `docker-compose` environment as well, with a `docker-compose.yaml` file like the following:

```yaml
services:
  some-service:
    image: ubuntu:latest
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all
```

In the same manner, you can pick specific devices that will be exposed to the container:

```yaml
services:
  some-service:
    image: ubuntu:latest
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=0
                - nvidia.com/gpu=1
```

## Contributing {#cuda-contributing}

::: {.warning}
This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues tagging @NixOS/cuda-maintainers or on [Matrix](https://matrix.to/#/#cuda:nixos.org).
:::

### Package set maintenance {#cuda-package-set-maintenance}

The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer. From CUDA 11.4 onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.

While the monolithic CUDA Toolkit runfile installer is no longer provided, [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) provides a `symlinkJoin`-ed approximation which bundles common libraries. The use of [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) is discouraged: all new projects should use the CUDA redistributables available in [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) instead, as they are much easier to maintain and update.

#### Updating redistributables {#cuda-updating-redistributables}

Whenever a new version of a redistributable manifest is made available:

1. Check the corresponding README.md in `pkgs/development/cuda-modules/_cuda/manifests` for the URL to use when vendoring manifests.
2. Update the manifest version used in construction of each CUDA package set in `pkgs/top-level/cuda-packages.nix`.
3. Update package expressions in `pkgs/development/cuda-modules/packages`.

Updating package expressions amounts to:

- adding fixes conditioned on newer releases, like added or removed dependencies
- adding package expressions for new packages
- updating `passthru.brokenConditions` and `passthru.badPlatformsConditions` with various constraints (e.g., new releases removing support for various architectures)
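As a sketch of the last point (the messages and conditions are illustrative; each attribute maps a human-readable reason to a boolean, and a `true` value marks the package broken or the platform bad), a package expression might declare:

```nix
{
  passthru.brokenConditions = {
    # Hypothetical condition: this component was dropped in a newer release.
    "Component was removed in CUDA 12.9" = cudaAtLeast "12.9";
  };
  passthru.badPlatformsConditions = {
    # Hypothetical condition: a release drops support for an architecture.
    "Support for aarch64-linux was removed" =
      stdenv.hostPlatform.system == "aarch64-linux";
  };
}
```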

#### Updating supported compilers and GPUs {#cuda-updating-supported-compilers-and-gpus}

1. Update `nvccCompatibilities` in `pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix` to include the newest release of NVCC, as well as any newly supported host compilers.
2. Update `cudaCapabilityToInfo` in `pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix` to include any new GPUs supported by the new release of CUDA.

#### Updating the CUDA package set {#cuda-updating-the-cuda-package-set}

::: {.note}
Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
:::

::: {.warning}
As described in [Using `cudaPackages.pkgs`](#cuda-using-cudapackages-pkgs), the current fix for package set leakage involves creating a new Nixpkgs instance for each non-default CUDA package set. As such, we should limit the number of CUDA package sets which have `recurseForDerivations` set to `true`: `lib.recurseIntoAttrs` should only be applied to the default CUDA package set.
:::

1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/cuda-packages.nix` and inherit it in `pkgs/top-level/all-packages.nix`.
2. Successfully build the closure of the new package set, updating expressions in `pkgs/development/cuda-modules/packages` as needed. Below are some common failures:

| Unable to ...  | During ...                       | Reason                                           | Solution                   | Note                                                          |
| -------------- | -------------------------------- | ------------------------------------------------ | -------------------------- | ------------------------------------------------------------- |
| Find headers   | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output             | Add the missing dependency | The `dev` output typically contains the headers               |
| Find libraries | `configurePhase`                 | Missing dependency on a `dev` output             | Add the missing dependency | The `dev` output typically contains CMake configuration files |
| Find libraries | `buildPhase` or `patchelf`       | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contains the libraries |

::: {.note}
Two utility derivations ease testing updates to the package set:

- `cudaPackages.tests.redists-unpacked`: the `src` of each redistributable package unpacked and `symlinkJoin`-ed
- `cudaPackages.tests.redists-installed`: each output of each redistributable package `symlinkJoin`-ed
:::

Failure to run the resulting binary is typically the most challenging to diagnose, as it may involve a combination of the aforementioned issues. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its `DT_NEEDED` section. Try the following debugging steps:

1. First ensure that dependencies are patched with [`autoAddDriverRunpath`](https://search.nixos.org/packages?channel=unstable&type=packages&query=autoAddDriverRunpath).
2. Failing that, try running the application with [`nixGL`](https://github.com/guibou/nixGL) or a similar wrapper tool.
3. If that works, it likely means that the application is attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
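When diagnosing such failures, it can also help to inspect a binary's declared dependencies and runpath directly, for example with `patchelf` (illustrative session; substitute the path to your own binary):

```ShellSession
$ patchelf --print-needed ./result/bin/myprogram
$ patchelf --print-rpath ./result/bin/myprogram
```

Note that a library loaded via `dlopen` at runtime will not appear among the `DT_NEEDED` entries printed by `--print-needed`, which is part of what makes these failures hard to spot.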

### Writing tests {#cuda-writing-tests}

::: {.caution}
The existence of `passthru.testers` and `passthru.tests` should be considered an implementation detail; they are not meant to be a public or stable interface.
:::

In general, there are two attribute sets in `passthru` that are used to build and run tests for CUDA packages: `passthru.testers` and `passthru.tests`. Each attribute set may contain an attribute set named `cuda`, which contains CUDA-specific derivations. The `cuda` attribute set is used to separate CUDA-specific derivations from those which support multiple implementations (e.g., OpenCL, ROCm, etc.) or have different licenses. For an example of such generic derivations, see the `magma` package.

::: {.note}
Derivations are nested under the `cuda` attribute due to an OfBorg quirk: if evaluation fails (e.g., because of unfree licenses), the entire enclosing attribute set is discarded. This prevents other attributes in the set from being discovered, evaluated, or built.
:::

#### `passthru.testers` {#cuda-passthru-testers}

Attributes added to `passthru.testers` are derivations which produce an executable which runs a test. The produced executable should:

- Take care to set up the environment, make temporary directories, and so on.
- Be registered as the derivation's `meta.mainProgram` so that it can be run directly.

::: {.note}
Testers which always require CUDA should be placed in `passthru.testers.cuda`, while those which are generic should be placed in `passthru.testers`.
:::

The `passthru.testers` attribute set allows running tests outside the Nix sandbox. This is useful because such a test:

- Can be run on non-NixOS systems, when wrapped with utilities like `nixGL` or `nix-gl-host`.
- Has network access patterns which are difficult or impossible to sandbox.
- Is free to produce output which is not deterministic, such as timing information.
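A minimal tester might be sketched as follows (the attribute name and the tested command are hypothetical; `writeShellApplication` sets `meta.mainProgram` to its `name`, satisfying the second requirement above):

```nix
{
  passthru.testers.cuda.smoke = writeShellApplication {
    name = "myapp-smoke-test"; # hypothetical
    text = ''
      # Set up a scratch directory, then exercise the program.
      tmp="$(mktemp -d)"
      trap 'rm -rf "$tmp"' EXIT
      cd "$tmp"
      myapp --version
    '';
  };
}
```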

#### `passthru.tests` {#cuda-passthru-tests}

Attributes added to `passthru.tests` are derivations which run tests inside the Nix sandbox. Tests should:

- Use the executables produced by `passthru.testers`, where possible, to avoid duplication of test logic.
- Include `requiredSystemFeatures = [ "cuda" ];`, possibly conditioned on the value of `cudaSupport` if they are generic, to ensure that they are only run on systems exposing a CUDA-capable GPU.

::: {.note}
Tests which always require CUDA should be placed in `passthru.tests.cuda`, while those which are generic should be placed in `passthru.tests`.
:::

This is useful for tests which are deterministic (e.g., checking exit codes) and which can be provided with all necessary resources in the sandbox.
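
Combining the two points above, a sandboxed test might be sketched like so (the package `myapp` and its tester are hypothetical):

```nix
{
  passthru.tests.cuda.smoke = runCommand "myapp-smoke" {
    # Only schedule this test on builders exposing a CUDA-capable GPU.
    requiredSystemFeatures = [ "cuda" ];
  } ''
    # Reuse the executable produced by the tester to avoid duplicating logic.
    ${lib.getExe myapp.passthru.testers.cuda.smoke}
    touch $out
  '';
}
```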