# CUDA Modules

> [!NOTE]
> This document is meant to help CUDA maintainers understand the structure of
> the CUDA packages in Nixpkgs. It is not meant to be a user-facing document.
> For a user-facing document, see [the CUDA section of the manual](../../../doc/languages-frameworks/cuda.section.md).

The files in this directory are added (in some way) to the `cudaPackages`
package set by [cuda-packages.nix](../../top-level/cuda-packages.nix).

## Top-level directories

- `_cuda`: Fixed-point used to configure, construct, and extend the CUDA package
    set. This includes NVIDIA manifests.
- `buildRedist`: Contains the logic to build packages using NVIDIA's manifests.
- `packages`: Contains packages which exist in every instance of the CUDA
    package set. These packages are built in a `by-name` fashion.
- `tests`: Contains tests which can be run against the CUDA package set.

Many redistributable packages are in the `packages` directory. Their presence
ensures that the attribute for a package still exists (referring to a broken
package) even when the CUDA package set being constructed no longer includes
that package. This prevents missing attribute errors as the package set
evolves.

## Distinguished packages

Some packages are purposefully not in the `packages` directory: they do not make
sense for Nixpkgs, require further investigation, or are otherwise not
straightforward to include. These packages are:

- `cuda`:
  - `collectx_bringup`: missing `libssl.so.1.1` and `libcrypto.so.1.1`; not sure how
    to provide them or what the package does.
  - `cuda_sandbox_dev`: purpose unclear.
  - `driver_assistant`: we don't use the drivers from the CUDA releases; irrelevant.
  - `mft_autocomplete`: purpose unclear; contains FHS paths.
  - `mft_oem`: purpose unclear; contains FHS paths.
  - `mft`: purpose unclear; contains FHS paths.
  - `nvidia_driver`: we don't use the drivers from the CUDA releases; irrelevant.
  - `nvlsm`: NVSwitch and NVLink software; contains FHS paths.
  - `libnvidia_nscq`: NVSwitch software.
  - `libnvsdm`: NVSwitch software.
- `cublasmp`:
  - `libcublasmp`: `nvshmem` isn't packaged.
- `cudnn`:
  - `cudnn_samples`: requires FreeImage, which is abandoned and not packaged.

> [!NOTE]
>
> When packaging redistributables, prefer `autoPatchelfIgnoreMissingDeps` to providing
> paths to stubs with `extraAutoPatchelfLibs`; the stubs are meant to be used for
> projects where linking against libraries available only at runtime is unavoidable.

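As a rough sketch of the preferred option, a package patched by `autoPatchelfHook` can list the libraries it expects to find only at runtime. Everything specific here is an assumption made for illustration: the `example-redist` name, its `src` path, the `lib` directory layout, and the choice of `libcuda.so.1`. Redistributables in this directory would normally be built through `buildRedist` rather than a bare `stdenv.mkDerivation`.

```nix
# Illustrative only: `example-redist` and `./example-redist` are hypothetical.
{
  stdenv,
  autoPatchelfHook,
}:

stdenv.mkDerivation {
  pname = "example-redist";
  version = "0.0.0";

  # Placeholder source: a directory of prebuilt binaries with a `lib` folder.
  src = ./example-redist;

  nativeBuildInputs = [ autoPatchelfHook ];

  # The driver library only exists at runtime, so ask autoPatchelfHook not to
  # fail when it cannot resolve it during the build.
  autoPatchelfIgnoreMissingDeps = [ "libcuda.so.1" ];

  installPhase = ''
    runHook preInstall
    mkdir -p "$out"
    cp -r lib "$out/lib"
    runHook postInstall
  '';
}
```

Listing individual library names, rather than ignoring all missing dependencies, keeps genuinely missing libraries visible as build failures.
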
### CUDA Compatibility

[CUDA Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/),
available as `cudaPackages.cuda_compat`, is a component which makes it possible
to run applications built against a newer CUDA toolkit (for example CUDA 12) on
a machine with an older CUDA driver (for example CUDA 11), which isn't possible
out of the box. At the time of writing, CUDA Compatibility is only available on
the NVIDIA Jetson architecture, but NVIDIA might release support for more
architectures in the future.

As CUDA Compatibility strictly increases the range of supported applications, we
try our best to enable it by default on supported platforms.

#### Functioning

`cuda_compat` simply provides a new `libcuda.so` (and associated variants) that
needs to be used in place of the default CUDA driver's `libcuda.so`. However,
the other shared libraries of the default driver must still be accessible:
`cuda_compat` isn't a complete drop-in replacement for the driver (that's the
point; otherwise it would just be a newer driver).

NVIDIA's recommendation is to set `LD_LIBRARY_PATH` to point to `cuda_compat`'s
driver. This is fine for manual, one-shot usage, but in general setting
`LD_LIBRARY_PATH` is a red flag. It is global state which short-circuits most
other dynamic library resolution mechanisms and can break things in
non-obvious ways, especially with other Nix-built software.

#### CUDA Compat with Nix

Since `cuda_compat` is a known derivation, the easy way to do this in Nix would
be to add `cuda_compat` as a dependency of CUDA libraries and applications and
let Nix do its magic by filling the `DT_RUNPATH` fields. However,
`cuda_compat` itself depends on `libnvrm_mem` and `libnvrm_gpu`, which are loaded
dynamically at runtime from `/run/opengl-driver`. That path is not available
inside the Nix build sandbox, so the build cannot find those libraries. A second,
minor issue is that `addOpenGLRunpathHook` prepends the `/run/opengl-driver`
path, so that path would still take precedence over `cuda_compat`.

The current solution is to do something similar to `addOpenGLRunpathHook`: the
`addCudaCompatRunpathHook` prepends the path to `cuda_compat`'s `libcuda.so`
to the `DT_RUNPATH` of whichever package includes the hook as a dependency, and
we include the hook by default for packages in `cudaPackages` (by adding it as an
input in `genericManifestBuilder`). We also make sure it runs after
`addOpenGLRunpathHook`, so that its entry appears _before_ `/run/opengl-driver`
in the `DT_RUNPATH` and takes precedence.
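
The ordering argument can be checked with a toy model in plain Nix. This only illustrates prepend semantics and is not the hook's implementation: the `prepend` helper, the list representation, and all three paths (including the `<hash>` placeholders and the `compat` subdirectory) are assumptions made up for the example.

```nix
# Toy model of DT_RUNPATH ordering; not the actual hook implementation.
# Evaluate with: nix-instantiate --eval --strict ./runpath-order.nix
let
  # Each hook is modelled as prepending one directory to the runpath list.
  prepend = dir: runpath: [ dir ] ++ runpath;

  # Illustrative paths; the store paths are placeholders.
  appLib = "/nix/store/<hash>-app/lib";
  openglDriver = "/run/opengl-driver/lib";
  cudaCompat = "/nix/store/<hash>-cuda_compat/compat";

  # Hooks run in order: the OpenGL hook first, the compat hook second.
  afterOpenGL = prepend openglDriver [ appLib ];
  afterCompat = prepend cudaCompat afterOpenGL;
in
# The dynamic loader searches runpath entries left to right, so the entry
# prepended last (cuda_compat) is searched first.
builtins.concatStringsSep ":" afterCompat
```

Evaluating this yields the compat directory first, followed by `/run/opengl-driver/lib` and then the application's own library directory, which is the precedence the hook ordering is meant to produce.
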