# Gradle Setup Hook

## Introduction

Gradle build scripts are written in a DSL, so computing the list of Gradle
dependencies is a Turing-complete task, not just in theory but also in
practice. Fetching all of the dependencies often requires building some
native code, running some commands to check the host platform, or just
fetching some files using either JVM code or commands like `curl` or
`wget`.

This practice is widespread and isn't considered a bad practice in the
Java world, so all we can do is run Gradle to check what dependencies
end up being fetched, and allow derivation authors to apply workarounds
so they can run the code necessary for fetching the dependencies our
script doesn't fetch.

"Run Gradle to check what dependencies end up being fetched" isn't a
straightforward task. For example, Gradle usually uses Maven
repositories, which have features such as "snapshots", a way to always
use the latest version of a dependency as opposed to a fixed version.
Obviously, this is horrible for reproducibility. Additionally, Gradle
doesn't offer a way to export the list of dependency URLs and hashes (it
does in a way, but it's far from being complete, and as such is useless
for Nixpkgs). Even if it did, it would be annoying to use considering
fetching non-Gradle dependencies in Gradle scripts is commonplace.

That's why the setup hook uses mitm-cache, a program designed for
intercepting all HTTP requests, recording all the files that were
accessed, creating a Nix derivation with all of them, and then allowing
the Gradle derivation to access these files.

## Maven Repositories

(Reference: [Repository
Layout](https://cwiki.apache.org/confluence/display/MAVENOLD/Repository+Layout+-+Final))

Most Gradle dependencies are fetched from Maven repositories. For
each dependency, Gradle finds the first repo where it can successfully
fetch that dependency, and uses that repo for it.
Different repos might
actually return different files for the same artifact because of e.g.
pom normalization. Different repos may be used for the same artifact
even within a single package (for example, if two build scripts define
repositories in a different order).

The artifact metadata is specified in a .pom file, and the artifacts
themselves are typically .jar files. The URL format is as follows:

`<repo>/<group-id>/<artifact-id>/<base-version>/<artifact-id>-<version>[-<classifier>].<ext>`

For example:

- `https://repo.maven.apache.org/maven2/org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.pom`
- `https://oss.sonatype.org/content/groups/public/com/tobiasdiez/easybind/2.2.1-SNAPSHOT/easybind-2.2.1-20230117.075740-16.pom`

Where:

- `<repo>` is the repo base (`https://repo.maven.apache.org/maven2`)
- `<group-id>` is the group ID with dots replaced with slashes
  (`org.slf4j` -> `org/slf4j`)
- `<artifact-id>` is the artifact ID (`slf4j-api`)
- `<base-version>` is the artifact version (`2.0.9` for normal
  artifacts, `2.2.1-SNAPSHOT` for snapshots)
- `<version>` is the artifact version - can be either `<base-version>`
  or `<version-base>-<timestamp>-<build-num>` (`2.0.9` for normal
  artifacts, and either `2.2.1-SNAPSHOT` or `2.2.1-20230117.075740-16`
  for snapshots)
  - `<version-base>` - `<base-version>` without the `-SNAPSHOT` suffix
  - `<timestamp>` - artifact build timestamp in the `YYYYMMDD.HHMMSS`
    format (UTC)
  - `<build-num>` - a counter that's incremented by 1 for each new
    snapshot build
- `<classifier>` is an optional classifier for allowing a single .pom to
  refer to multiple .jar files. .pom files don't have classifiers, as
  they describe metadata.
- `<ext>` is the file extension (`pom` in the examples above, typically
  `jar` for the artifacts themselves)

Note that the artifact ID can contain `-`, so you can't extract the
artifact ID and version from just the file name.
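
The URL layout above can be sketched in a few lines of Python. This is only an
illustration of the format, not code used by the setup hook; the function name
`maven_url` and its parameters are hypothetical.

```python
from typing import Optional


def maven_url(
    repo: str,
    group_id: str,
    artifact_id: str,
    base_version: str,
    ext: str,
    version: Optional[str] = None,
    classifier: Optional[str] = None,
) -> str:
    """Build an artifact URL following the Maven repository layout.

    Hypothetical helper for illustration only.
    """
    # For normal artifacts <version> equals <base-version>.
    version = version or base_version
    # The group ID becomes a path: dots are replaced with slashes.
    group_path = group_id.replace(".", "/")
    # The classifier is optional and is appended after the version.
    suffix = f"-{classifier}" if classifier else ""
    return (
        f"{repo}/{group_path}/{artifact_id}/{base_version}/"
        f"{artifact_id}-{version}{suffix}.{ext}"
    )


# The two example URLs from the list above.
release = maven_url(
    "https://repo.maven.apache.org/maven2", "org.slf4j", "slf4j-api", "2.0.9", "pom"
)
snapshot = maven_url(
    "https://oss.sonatype.org/content/groups/public",
    "com.tobiasdiez",
    "easybind",
    "2.2.1-SNAPSHOT",
    "pom",
    version="2.2.1-20230117.075740-16",
)
```

Going the other way is ambiguous: a file name such as `slf4j-api-2.0.9.pom`
doesn't say where the artifact ID ends, which is why the sketch takes the
artifact ID and the versions as separate arguments.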

Additionally, the files in the repository may have associated signature
files, formed by appending `.asc` to the filename, and hashsum files,
formed by appending `.md5` or `.sha1` to the filename. The signatures
are harmless, but the `.md5`/`.sha1` files are rejected.

The reasoning is as follows: consider two files `a.jar` and `b.jar`
that have the same hash. Gradle will fetch `a.jar.sha1`, find out that
it hasn't yet downloaded a file with this hash, and then fetch `a.jar`,
and finally download `b.jar.sha1`, locate it in its cache, and then
*not* download `b.jar`. This means `b.jar` won't be stored in the MITM
cache. Then, consider that on a later invocation, the fetching order
changed, whether it was because of running on a different system,
changed behavior after a Gradle update, or any other source of
nondeterminism, so that `b.jar` is fetched before `a.jar`. Gradle will first
fetch `b.jar.sha1`, not find it in its cache, attempt to fetch `b.jar`,
and fail, as the cache doesn't have that file.

For the same reason, the proxy strips all checksum/etag headers. An
alternative would be to make the proxy remember previous checksums and
etags, but that would complicate the implementation. Such a
feature can be implemented if necessary. Note that checksum/etag header
stripping is hardcoded, but `.md5`/`.sha1` file rejection is configured
via CLI arguments.

**Caveat**: Gradle .module files also contain file hashes, in md5, sha1,
sha256, sha512 formats. This has posed no problem as of yet, but it might in
the future. If it does pose problems, the deps derivation code can be
extended to find all checksums in .module files and copy existing files
there if their hash matches.

## Snapshots

Snapshots are a way to publish the very latest, unstable version of a
dependency that constantly changes.
Any project that depends on a
snapshot will depend on this rolling version, rather than a fixed
version. It's easy to understand why this is a bad idea for reproducible
builds. Still, they can be dealt with by the logic in `gradle.fetchDeps`
and `gradle.updateDeps`.

First, as you can see above, while normal artifacts have the same
`base-version` and `version`, for snapshots they usually (but not
necessarily) differ.

Second, for figuring out where to download the snapshot, Gradle consults
`maven-metadata.xml`. With that in mind...

## Maven Metadata

(Reference: [Maven
Metadata](https://maven.apache.org/repositories/metadata.html),
[Metadata](https://maven.apache.org/ref/3.9.8/maven-repository-metadata/repository-metadata.html))

Maven metadata files are called `maven-metadata.xml`.

There are three levels of metadata: "G level", "A level", "V level",
representing group, artifact, or version metadata.

G level metadata is currently unsupported. It's only used for Maven
plugins, which Gradle presumably doesn't use.

A level metadata is used for getting the version list for an artifact.
It's an XML file with the following items:

- `<groupId>` - group ID
- `<artifactId>` - artifact ID
- `<versioning>`
  - `<latest>` - the very latest base version (e.g. `2.2.1-SNAPSHOT`)
  - `<release>` - the latest non-snapshot version
  - `<versions>` - the version list, each in a `<version>` tag
  - `<lastUpdated>` - the metadata update timestamp (UTC,
    `YYYYMMDDHHMMSS`)

V level metadata is used for listing the snapshot versions.
It has the
following items:

- `<groupId>` - group ID
- `<artifactId>` - artifact ID
- `<versioning>`
  - `<lastUpdated>` - the metadata update timestamp (UTC,
    `YYYYMMDDHHMMSS`)
  - `<snapshot>` - info about the latest snapshot version
    - `<timestamp>` - build timestamp (UTC, `YYYYMMDD.HHMMSS`)
    - `<buildNumber>` - build number
  - `<snapshotVersions>` - the list of all available snapshot file info,
    each info is enclosed in a `<snapshotVersion>`
    - `<classifier>` - classifier (optional)
    - `<extension>` - file extension
    - `<value>` - snapshot version (as opposed to base version)
    - `<updated>` - snapshot build timestamp (UTC, `YYYYMMDDHHMMSS`)

## Lockfile Format

The mitm-cache lockfile format is described in the [mitm-cache
README](https://github.com/chayleaf/mitm-cache#readme).

The Nixpkgs Gradle lockfile format is more complicated:

```json
{
  "!comment": "This is a Nixpkgs Gradle dependency lockfile. For more details, refer to the Gradle section in the Nixpkgs manual.",
  "!version": 1,
  "https://oss.sonatype.org/content/repositories/snapshots/com/badlogicgames/gdx-controllers": {
    "gdx-controllers#gdx-controllers-core/2.2.4-20231021.200112-6/SNAPSHOT": {
      "jar": "sha256-Gdz2J1IvDJFktUD2XeGNS0SIrOyym19X/+dCbbbe3/U=",
      "pom": "sha256-90QW/Mtz1jbDUhKjdJ88ekhulZR2a7eCaEJoswmeny4="
    },
    "gdx-controllers-core/2.2.4-SNAPSHOT/maven-metadata": {
      "xml": {
        "groupId": "com.badlogicgames.gdx-controllers"
      }
    }
  },
  "https://repo.maven.apache.org/maven2": {
    "com/badlogicgames/gdx#gdx-backend-lwjgl3/1.12.1": {
      "jar": "sha256-B3OwjHfBoHcJPFlyy4u2WJuRe4ZF/+tKh7gKsDg41o0=",
      "module": "sha256-9O7d2ip5+E6OiwN47WWxC8XqSX/mT+b0iDioCRTTyqc=",
      "pom": "sha256-IRSihaCUPC2d0QzB0MVDoOWM1DXjcisTYtnaaxR9SRo="
    }
  }
}
```

`!comment` is a human-readable description explaining what the file is, and
`!version` is the lockfile version
(note that while it shares the name
with mitm-cache's `!version`, they don't actually have to be in sync and
can be bumped separately).

The other keys are parts of a URL. Each URL is split into three parts.
They are joined like this: `<part1>/<part2>.<part3>`.

Some URLs may have a `#` in them. In that case, the part after `#` is
parsed as `#<artifact-id>/<version>[/SNAPSHOT][/<classifier>].<ext>` and
expanded into
`<artifact-id>/<base-version>/<artifact-id>-<version>[-<classifier>].<ext>`.

Each URL has a value associated with it. The value may be:

- an SRI hash (string)
- for `maven-metadata.xml` - an attrset containing the parts of the
  metadata that can't be generated in Nix code (e.g. `groupId`, which is
  challenging to parse from a URL because it's not always possible to
  discern where the repo base ends and the group ID begins).

`compress-deps-json.py` converts the JSON from mitm-cache format into
Nixpkgs Gradle lockfile format. `fetch.nix` does the opposite.

## Security Considerations

Lockfiles won't be human-reviewed, so they must be tamper-resistant.
That's why it's imperative that nobody can inject their own contents
into the lockfiles.

This is achieved in a very simple way - the `deps.json` only contains
the following:

- `maven-metadata.xml` URLs and small pieces of the contained metadata
  (most of it will be generated in Nix, i.e. the area of injection is
  minimal, and the parts that aren't generated in Nix are validated).
- artifact/other file URLs and associated hashes (Nix will complain if
  the hash doesn't match, and Gradle won't even access the URL if it
  doesn't match)

Please be mindful of the above when working on Gradle support for
Nixpkgs.
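
To make concrete how little a lockfile carries, the following Python sketch
expands lockfile entries into plain URL-to-value pairs using the key rules from
the Lockfile Format section. It is a hypothetical rendition, not the logic in
`fetch.nix`. Two details are assumptions: the part before `#` is joined to the
expanded part with a `/`, and the base version of a snapshot is derived by
dropping the trailing `-<timestamp>-<build-num>` pair and appending
`-SNAPSHOT`. The names `expand_key` and `expand_lockfile` are made up for this
example.

```python
def expand_key(repo: str, key: str, ext: str) -> str:
    """Expand one (repo, key, extension) triple into a full URL."""
    if "#" not in key:
        # Plain keys are joined verbatim: <part1>/<part2>.<part3>
        return f"{repo}/{key}.{ext}"
    prefix, spec = key.split("#", 1)
    artifact, version, *rest = spec.split("/")
    base_version = version
    if rest and rest[0] == "SNAPSHOT":
        rest = rest[1:]
        # Assumption: strip "-<timestamp>-<build-num>", then add "-SNAPSHOT".
        base_version = version.rsplit("-", 2)[0] + "-SNAPSHOT"
    classifier = f"-{rest[0]}" if rest else ""
    # Assumption: the prefix before "#" is joined with a slash.
    return (
        f"{repo}/{prefix}/{artifact}/{base_version}/"
        f"{artifact}-{version}{classifier}.{ext}"
    )


def expand_lockfile(lock: dict) -> dict:
    """Map every URL in a lockfile to its SRI hash or metadata attrset."""
    urls = {}
    for repo, entries in lock.items():
        if repo.startswith("!"):
            continue  # "!comment" and "!version" are not URLs
        for key, files in entries.items():
            for ext, value in files.items():
                urls[expand_key(repo, key, ext)] = value
    return urls


example = {
    "!version": 1,
    "https://repo.maven.apache.org/maven2": {
        "com/badlogicgames/gdx#gdx-backend-lwjgl3/1.12.1": {
            "jar": "sha256-B3OwjHfBoHcJPFlyy4u2WJuRe4ZF/+tKh7gKsDg41o0=",
        },
    },
}
expanded = expand_lockfile(example)
```

Everything in the result is either a URL or a value attached to one, which is
the property the considerations above rely on.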