git-pages
=========

_git-pages_ is a static site server for use with Git forges (i.e. a GitHub Pages replacement). It is written with efficiency in mind, scaling horizontally to any number of machines and serving sites up to multiple gigabytes in size, while being equally suitable for small single-user deployments.

It is implemented in Go and has no other mandatory dependencies, although it is designed to be used together with the [Caddy server][caddy] for TLS termination. Site data may be stored on the filesystem or in an [Amazon S3](https://aws.amazon.com/s3/) compatible object store.

The included Docker container provides everything needed to deploy a Pages service, including zero-configuration on-demand provisioning of TLS certificates from [Let's Encrypt](https://letsencrypt.org/), and runs on any commodity cloud infrastructure.

> [!TIP]
> If you want to publish a site using _git-pages_ to an existing service like [Codeberg Pages][codeberg-pages] or [Grebedoc][grebedoc], consider using the [CLI tool][git-pages-cli] or the [Forgejo Action][git-pages-action].

[caddy]: https://caddyserver.com/
[git-pages-cli]: https://codeberg.org/git-pages/git-pages-cli
[git-pages-action]: https://codeberg.org/git-pages/action
[codeberg-pages]: https://codeberg.page
[grebedoc]: https://grebedoc.dev


Quickstart
----------

You will need [Go](https://go.dev/) 1.25 or newer. Run:

```console
$ mkdir -p data
$ cp conf/config.example.toml config.toml
$ PAGES_INSECURE=1 go run .
```

These commands start an HTTP server on `0.0.0.0:3000` and use the `data` directory for persistence. **Authentication is disabled via `PAGES_INSECURE=1`** to avoid the need to set up a DNS server as well; never enable `PAGES_INSECURE=1` in production.

To publish a site, run the following commands (consider also using the [git-pages-cli] tool):

```console
$ curl http://localhost:3000/ -X PUT --data https://codeberg.org/git-pages/git-pages.git
b70644b523c4aaf4efd206a588087a1d406cb047
```

The `pages` branch of the repository is now available at http://localhost:3000/!
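
The same publish request can be issued from Go. The following is a minimal sketch rather than part of _git-pages_ itself: the helper names `newPublishRequest` and `publish` are hypothetical, and the 10-second timeout is an arbitrary choice. It sends the repository URL as the request body, as `curl --data` does, and sets the optional `Branch` header described under Features.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// newPublishRequest builds the PUT request that asks the server to clone
// repoURL and deploy it at siteURL. An empty branch leaves the header out,
// so the server falls back to the `pages` branch.
func newPublishRequest(siteURL, repoURL, branch string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, siteURL, strings.NewReader(repoURL))
	if err != nil {
		return nil, err
	}
	// This is the content type that `curl --data` sends by default.
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	if branch != "" {
		req.Header.Set("Branch", branch)
	}
	return req, nil
}

// publish sends the request and returns the trimmed response body.
func publish(client *http.Client, siteURL, repoURL, branch string) (string, error) {
	req, err := newPublishRequest(siteURL, repoURL, branch)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	text := strings.TrimSpace(string(body))
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return "", fmt.Errorf("publish failed: %s: %s", resp.Status, text)
	}
	return text, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	out, err := publish(client, "http://localhost:3000/",
		"https://codeberg.org/git-pages/git-pages.git", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Run it against the development server started above; without a running server it only reports a connection error.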


Deployment
----------

The first-party container supports running _git-pages_ either standalone or together with [Caddy][].

To run _git-pages_ standalone and use the filesystem to store site data:

```console
$ docker run -u $(id -u):$(id -g) --mount type=bind,src=$(pwd)/data,dst=/app/data -p 3000:3000 codeberg.org/git-pages/git-pages:latest
```

To run _git-pages_ with Caddy and use an S3-compatible endpoint to store site data and TLS key material:

```console
$ docker run -e PAGES_STORAGE_TYPE -e PAGES_STORAGE_S3_ENDPOINT -e PAGES_STORAGE_S3_REGION -e PAGES_STORAGE_S3_ACCESS_KEY_ID -e PAGES_STORAGE_S3_SECRET_ACCESS_KEY -e PAGES_STORAGE_S3_BUCKET -e ACME_EMAIL -p 80:80 -p 443:443 codeberg.org/git-pages/git-pages:latest supervisord
```


Features
--------

* In response to a `GET` or `HEAD` request, the server selects an appropriate site and responds with files from it. A site is a combination of the hostname and (optionally) the project name.
    - The site is selected as follows:
        - If the URL matches `https://<hostname>/<project-name>/...` and a site was published at `<project-name>`, this project-specific site is selected.
        - If the URL matches `https://<hostname>/...` and the previous rule did not apply, the index site is selected.
    - Site URLs that have a path starting with `.git-pages/...` are reserved for _git-pages_ itself.
        - The `.git-pages/health` URL returns `ok` with the `Last-Modified:` header set to the manifest modification time.
        - The `.git-pages/manifest.json` URL returns a [ProtoJSON](https://protobuf.dev/programming-guides/json/) representation of the deployed site manifest with the `Last-Modified:` header set to the manifest modification time. It enumerates site structure, redirect rules, and errors that were not severe enough to abort publishing. Note that **the manifest JSON format is not stable and will change without notice**.
        - The `.git-pages/archive.tar` URL returns a tar archive of all site contents, including `_redirects` and `_headers` files (reconstructed from the manifest), with the `Last-Modified:` header set to the manifest modification time. Compression can be enabled using the `Accept-Encoding:` HTTP header (only).
* In response to a `PUT` or `POST` request, the server updates a site with new content. The URL of the request must be the root URL of the site that is being published.
    - If the `PUT` method receives an `application/x-www-form-urlencoded` body, it contains a repository URL to be shallowly cloned. The `Branch` header contains the branch to be checked out; the `pages` branch is used if the header is absent.
    - If the `PUT` method receives an `application/x-tar`, `application/x-tar+gzip`, `application/x-tar+zstd`, or `application/zip` body, it contains an archive to be extracted.
    - The `POST` method requires an `application/json` body containing a Forgejo/Gitea/Gogs/GitHub webhook event payload. Requests where the `ref` key contains anything other than `refs/heads/pages` are ignored, and only the `pages` branch is used. The `repository.clone_url` key contains a repository URL to be shallowly cloned.
    - If the received content is empty, the server performs the same action as `DELETE`.
* In response to a `PATCH` request, the server partially updates a site with new content. The URL of the request must be the root URL of the site that is being published.
    - The request must have an `application/x-tar`, `application/x-tar+gzip`, or `application/x-tar+zstd` body, whose contents are *merged* with the existing site contents as follows:
        - A character device entry with major 0 and minor 0 is treated as a "whiteout marker" (following [overlayfs][whiteout]): it causes any existing file or directory with the same name to be deleted.
        - A directory entry replaces any existing file or directory with the same name (if any), recursively removing the old contents.
        - A file or symlink entry replaces any existing file or directory with the same name (if any).
        - If there is no `Create-Parents:` header or a `Create-Parents: no` header is present, the parent path of an entry must exist and refer to a directory.
        - If a `Create-Parents: yes` header is present, any missing segments in the parent path of an entry will be created (like `mkdir -p`). Any existing segments must refer to directories.
    - The request must have an `Atomic: yes` or `Atomic: no` header. Not every backend configuration makes it possible to perform atomic compare-and-swap operations; on backends without atomic CAS support, `Atomic: yes` requests will fail, while `Atomic: no` requests will provide a best-effort approximation.
    - If a `PATCH` request loses a race against another content update request, it may return `409 Conflict`. This is true regardless of the `Atomic:` header value. Whenever this happens, resubmit the request as-is.
    - If the site has no contents after the update is applied, the server performs the same action as `DELETE`.
* In response to a `DELETE` request, the server unpublishes a site. The URL of the request must be the root URL of the site that is being unpublished. Site data remains stored for an indeterminate period of time, but becomes completely inaccessible.
* If a `Dry-Run: yes` header is provided with a `PUT`, `PATCH`, `DELETE`, or `POST` request, only the authorization checks are run; no destructive updates are made.
* All updates to site content are atomic (subject to consistency guarantees of the storage backend). That is, there is an instantaneous moment during an update before which the server will return the old content and after which it will return the new content.
* Files with a certain name, when placed in the root of a site, have special functions:
    - [Netlify `_redirects`][_redirects] file can be used to specify HTTP redirect and rewrite rules. The _git-pages_ implementation currently does not support placeholders, query parameters, or conditions, and may differ from Netlify in other minor ways. If you find that a supported `_redirects` file feature does not work the same as on Netlify, please file an issue. (Note that _git-pages_ does not perform URL normalization; `/foo` and `/foo/` are *not* the same, unlike with Netlify.)
    - [Netlify `_headers`][_headers] file can be used to specify custom HTTP response headers (if allowlisted by configuration). In particular, this is useful to enable [CORS requests][cors]. The _git-pages_ implementation may differ from Netlify in minor ways; if you find that a `_headers` file feature does not work the same as on Netlify, please file an issue.
* Incremental updates can be made using `PUT` or `PATCH` requests where the body contains an archive (both tar and zip are supported).
    - Any archive entry that is a symlink to `/git/pages/<git-sha256>` is replaced with an existing manifest entry for the same site whose git blob hash matches `<git-sha256>`. If there is no existing manifest entry with the specified git hash, the update fails with a `422 Unprocessable Entity`.
        - For this error response only, if the negotiated content type is `application/vnd.git-pages.unresolved`, the response will contain the `<git-sha256>` of each unresolved reference, one per line.
* Support for SHA-256 Git hashes is [limited by go-git][go-git-sha256]; once go-git implements the required features, _git-pages_ will automatically gain support for SHA-256 Git hashes. Note that shallow clones (used by _git-pages_ to conserve bandwidth if available) aren't supported yet in the Git protocol as of 2025.
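
To illustrate the `PATCH` merge rules listed above, here is a minimal Go sketch that builds a tar archive containing one regular file (which replaces any existing entry with the same name) and one whiteout marker (which deletes an existing entry). The function names `buildPatch` and `listEntries` are hypothetical helpers for this example, not part of _git-pages_.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// buildPatch returns an uncompressed tar archive that writes addData to
// addName and marks deleteName for deletion.
func buildPatch(addName string, addData []byte, deleteName string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)

	// A regular file entry replaces whatever exists under the same name.
	err := tw.WriteHeader(&tar.Header{
		Typeflag: tar.TypeReg,
		Name:     addName,
		Mode:     0o644,
		Size:     int64(len(addData)),
	})
	if err != nil {
		return nil, err
	}
	if _, err := tw.Write(addData); err != nil {
		return nil, err
	}

	// A character device with major 0 and minor 0 is the whiteout marker.
	err = tw.WriteHeader(&tar.Header{
		Typeflag: tar.TypeChar,
		Name:     deleteName,
		Mode:     0o644,
		Devmajor: 0,
		Devminor: 0,
	})
	if err != nil {
		return nil, err
	}

	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listEntries reads the archive back and labels each entry, which is handy
// for checking a patch before sending it.
func listEntries(archive []byte) ([]string, error) {
	var out []string
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		kind := "other"
		switch {
		case hdr.Typeflag == tar.TypeReg:
			kind = "file"
		case hdr.Typeflag == tar.TypeChar && hdr.Devmajor == 0 && hdr.Devminor == 0:
			kind = "whiteout"
		}
		out = append(out, kind+" "+hdr.Name)
	}
}

func main() {
	archive, err := buildPatch("index.html", []byte("<h1>Hello</h1>\n"), "old.html")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	entries, err := listEntries(archive)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e)
	}
}
```

The resulting bytes would be sent as the body of a `PATCH` request with an `application/x-tar` content type and an explicit `Atomic:` header, as described above.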

[_redirects]: https://docs.netlify.com/manage/routing/redirects/overview/
[_headers]: https://docs.netlify.com/manage/routing/headers/
[cors]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS
[go-git-sha256]: https://github.com/go-git/go-git/issues/706
[whiteout]: https://docs.kernel.org/filesystems/overlayfs.html#whiteouts-and-opaque-directories


Authorization
-------------

DNS is the primary authorization method, using either TXT records or wildcard matching. In certain cases, git forge authorization is used in addition to DNS.

The authorization flow for content updates (`PUT`, `PATCH`, `DELETE`, `POST` requests) proceeds sequentially in the following order, with the first of multiple applicable rules taking precedence:

1. **Development Mode:** If the environment variable `PAGES_INSECURE` is set to a truthy value like `1`, the request is authorized.
2. **DNS Challenge:** If the method is `PUT`, `PATCH`, `DELETE`, or `POST`, and a well-formed `Authorization:` header is provided containing a `<token>`, and a TXT record lookup at `_git-pages-challenge.<host>` returns a record whose concatenated value equals `SHA256("<host> <token>")`, the request is authorized.
    - **`Pages` scheme:** Request includes an `Authorization: Pages <token>` header.
    - **`Basic` scheme:** Request includes an `Authorization: Basic <basic>` header, where `<basic>` is equal to `Base64("Pages:<token>")`. (Useful for non-Forgejo forges.)
3. **DNS Allowlist:** If the method is `PUT` or `POST`, and the request URL is `scheme://<user>.<host>/`, and a TXT record lookup at `_git-pages-repository.<host>` returns a set of well-formed absolute URLs, and (for `PUT` requests) the body contains a repository URL, and the requested clone URL is contained in this set of URLs, the request is authorized.
4. **Wildcard Match (content):** If the method is `PUT` or `POST`, and a `[[wildcard]]` configuration section exists where the suffix of a hostname (compared label-wise) is equal to `[[wildcard]].domain`, and (for `PUT` requests) the body contains a repository URL, and the requested clone URL is a *matching* clone URL, the request is authorized.
    - **Index repository:** If the request URL is `scheme://<user>.<host>/`, a *matching* clone URL is computed by templating `[[wildcard]].clone-url` with `<user>` and `<project>`, where `<project>` is computed by templating each element of `[[wildcard]].index-repos` with `<user>`, and `[[wildcard]]` is the section where the match occurred.
    - **Project repository:** If the request URL is `scheme://<user>.<host>/<project>/`, a *matching* clone URL is computed by templating `[[wildcard]].clone-url` with `<user>` and `<project>`, and `[[wildcard]]` is the section where the match occurred.
5. **Forge Authorization:** If the method is `PUT` or `PATCH`, and the body contains an archive, and a `[[wildcard]]` configuration section exists where the suffix of a hostname (compared label-wise) is equal to `[[wildcard]].domain`, and `[[wildcard]].authorization` is non-empty, and the request includes a `Forge-Authorization:` header, and the header (when forwarded as `Authorization:`) grants push permissions to a repository at the *matching* clone URL (as defined above) as determined by an API call to the forge, the request is authorized. (This enables publishing a site for a private repository.)
6. **Default Deny:** Otherwise, the request is not authorized.
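
As an illustration of the DNS Challenge rule, the sketch below computes the value to publish in the TXT record and the two accepted `Authorization:` header forms. It assumes the SHA-256 digest is encoded as lowercase hexadecimal, which the text above does not specify; the function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// challengeValue computes SHA256("<host> <token>").
// Assumption: the TXT record holds the digest as lowercase hex.
func challengeValue(host, token string) string {
	sum := sha256.Sum256([]byte(host + " " + token))
	return hex.EncodeToString(sum[:])
}

// pagesAuthorization returns the header value for the `Pages` scheme.
func pagesAuthorization(token string) string {
	return "Pages " + token
}

// basicAuthorization returns the header value for the `Basic` scheme,
// i.e. Base64("Pages:<token>").
func basicAuthorization(token string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte("Pages:"+token))
}

func main() {
	host, token := "example.org", "secret"
	fmt.Printf("_git-pages-challenge.%s TXT %q\n", host, challengeValue(host, token))
	fmt.Println("Authorization:", pagesAuthorization(token))
	fmt.Println("Authorization:", basicAuthorization(token))
}
```

The token never appears in DNS; only its hash together with the hostname does, so the same token yields a different record value for each host.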

The authorization flow for metadata retrieval (`GET` requests with site paths starting with `.git-pages/`) proceeds sequentially in the following order, with the first of multiple applicable rules taking precedence:

1. **Development Mode:** Same as for content updates.
2. **DNS Challenge:** Same as for content updates.
3. **Wildcard Match (metadata):** If a `[[wildcard]]` configuration section exists where the suffix of a hostname (compared label-wise) is equal to `[[wildcard]].domain`, the request is authorized.
4. **Default Deny:** Otherwise, the request is not authorized.
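
The wildcard rules compare hostname suffixes label by label rather than as raw strings. A minimal sketch of such a comparison follows; `matchWildcard` is a hypothetical name, and requiring exactly one extra label (the `<user>` part) is an assumption based on the `scheme://<user>.<host>/` URL shape used above.

```go
package main

import (
	"fmt"
	"strings"
)

// matchWildcard reports whether host is exactly one label below domain and,
// if so, returns that leading label. Labels are compared case-insensitively.
func matchWildcard(host, domain string) (string, bool) {
	hostLabels := strings.Split(strings.ToLower(strings.TrimSuffix(host, ".")), ".")
	domainLabels := strings.Split(strings.ToLower(strings.TrimSuffix(domain, ".")), ".")
	if len(hostLabels) != len(domainLabels)+1 {
		return "", false
	}
	for i, label := range domainLabels {
		if hostLabels[i+1] != label {
			return "", false
		}
	}
	if hostLabels[0] == "" {
		return "", false
	}
	return hostLabels[0], true
}

func main() {
	for _, host := range []string{"alice.example.org", "alice.notexample.org", "example.org"} {
		user, ok := matchWildcard(host, "example.org")
		fmt.Printf("%-22s user=%q ok=%v\n", host, user, ok)
	}
}
```

Comparing whole labels is what keeps `alice.notexample.org` from matching `example.org`, which a plain string suffix check would accept.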


Observability
-------------

_git-pages_ has robust observability features built in:
* The metrics endpoint (bound to `:3002` by default) returns Go, pages server, and storage backend metrics in the [Prometheus](https://prometheus.io/) format.
* Optional [Sentry](https://sentry.io/) integration allows greater visibility into the application. The `ENVIRONMENT` environment variable configures the deploy environment name (`development` by default).
    * If `SENTRY_DSN` environment variable is set, panics are reported to Sentry.
    * If `SENTRY_DSN` and `SENTRY_LOGS=1` environment variables are set, logs are uploaded to Sentry.
    * If `SENTRY_DSN` and `SENTRY_TRACING=1` environment variables are set, traces are uploaded to Sentry.
* Optional syslog integration allows transmitting application logs to a syslog daemon. When present, the `SYSLOG_ADDR` environment variable enables the integration, and the value is used to configure the syslog destination. The value must follow the format `family/address` and is usually one of the following:
    * a Unix datagram socket: `unixgram//dev/log`;
    * TLS over TCP: `tcp+tls/host:port`;
    * plain TCP: `tcp/host:port`;
    * UDP: `udp/host:port`.
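
The `family/address` format splits at the first slash, which is why the Unix socket form contains two slashes in a row. A small sketch of that split is shown below; `parseSyslogAddr` is a hypothetical helper and is not claimed to match the parsing code in _git-pages_.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSyslogAddr splits a SYSLOG_ADDR-style value into family and address
// at the first slash; everything after it belongs to the address.
func parseSyslogAddr(value string) (string, string, error) {
	family, address, found := strings.Cut(value, "/")
	if !found || family == "" || address == "" {
		return "", "", fmt.Errorf("invalid syslog address %q: want family/address", value)
	}
	return family, address, nil
}

func main() {
	for _, value := range []string{"unixgram//dev/log", "tcp+tls/logs.example.org:6514", "nonsense"} {
		family, address, err := parseSyslogAddr(value)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("family=%s address=%s\n", family, address)
	}
}
```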


Architecture (v2)
-----------------

An object store (filesystem, S3, ...) is used as the sole mechanism for state storage. The object store is expected to provide atomic operations; where necessary, the backend adapter ensures this.

- Repositories themselves never reach the object store; they are cloned to an ephemeral location and discarded immediately after their contents are extracted.
- The `blob/` prefix contains file data organized by hash of their contents (regardless of the repository they belong to).
    - Very small files are stored inline in the manifest.
- The `site/` prefix contains site manifests organized by domain and project name (e.g. `site/example.org/myproject` or `site/example.org/.index`).
    - The manifest is a Protobuf object containing a flat mapping of paths to entries. An entry consists of a type (file, directory, symlink, etc.) and data, which may be stored inline or refer to a blob.
    - A small amount of internal metadata within a manifest allows attributing deployments to their source and computing quotas.
- Additionally, the object store contains *staged manifests*, representing an in-progress update operation.
    - An update first creates a staged manifest, then uploads blobs, then replaces the deployed manifest with the staged one. This avoids TOCTTOU race conditions during garbage collection.
    - Stable marshalling allows addressing staged manifests by the hash of their contents.

This approach, unlike the v1 one, cannot be easily introspected with normal Unix commands, but is very friendly to S3-style object storage services, as it does not rely on operations these services cannot support (subtree rename, directory stat, symlink/readlink).

The S3 backend, intended for (relatively) high latency connections, caches both manifests and blobs in memory. Since a manifest is necessary and sufficient to return `304 Not Modified` responses for a matching `ETag`, this drastically reduces navigation latency. Blobs are content-addressed and are an obvious target for a last level cache.
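
The storage layout described in this section can be sketched with an in-memory map standing in for the object store. Everything specific here is an assumption made for illustration: SHA-256 as the content hash, JSON instead of Protobuf for the manifest, the `staged/` key prefix, the 16-byte inline threshold, and all type and function names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// store stands in for the object store: a flat mapping of keys to objects.
type store map[string][]byte

// entry is a simplified manifest entry: data is inline or refers to a blob.
type entry struct {
	Inline []byte `json:"inline,omitempty"`
	Blob   string `json:"blob,omitempty"`
}

// inlineLimit is a hypothetical threshold for "very small" files.
const inlineLimit = 16

func hashHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// siteKey maps a host and optional project name to a manifest key.
func siteKey(host, project string) string {
	if project == "" {
		project = ".index"
	}
	return "site/" + host + "/" + project
}

// deploy follows the update order from the text: stage the manifest,
// upload blobs, then replace the deployed manifest.
func deploy(s store, host, project string, files map[string][]byte) error {
	manifest := make(map[string]entry)
	blobs := make(map[string][]byte)
	for path, data := range files {
		if len(data) <= inlineLimit {
			manifest[path] = entry{Inline: data}
			continue
		}
		key := "blob/" + hashHex(data)
		manifest[path] = entry{Blob: key}
		blobs[key] = data // identical contents collapse into one blob
	}
	// json.Marshal sorts map keys, so equal manifests encode identically.
	encoded, err := json.Marshal(manifest)
	if err != nil {
		return err
	}
	s["staged/"+hashHex(encoded)] = encoded // 1. stage the manifest
	for key, data := range blobs {          // 2. upload blobs
		s[key] = data
	}
	s[siteKey(host, project)] = encoded // 3. replace the deployed manifest
	return nil
}

// countPrefix counts stored objects whose key starts with prefix.
func countPrefix(s store, prefix string) int {
	n := 0
	for key := range s {
		if strings.HasPrefix(key, prefix) {
			n++
		}
	}
	return n
}

func main() {
	s := store{}
	page := []byte(strings.Repeat("<p>hello</p>", 10))
	err := deploy(s, "example.org", "", map[string][]byte{
		"index.html": page,
		"copy.html":  page,
		"robots.txt": []byte("User-agent: *"),
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("blobs stored:", countPrefix(s, "blob/"))
	fmt.Println("manifest key:", siteKey("example.org", ""))
}
```

Cleanup of staged manifests and garbage collection of unreferenced blobs are left out of the sketch.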


Architecture (v1)
-----------------

*This was the original architecture and it is no longer used. Migration to v2 was last available in commit [7e9cd17b](https://codeberg.org/git-pages/git-pages/commit/7e9cd17b70717bea2fe240eb6a784cb206243690).*

The filesystem is used as the sole mechanism for state storage.

- The `data/tree/` directory contains working trees organized by commit hash (regardless of the repository they belong to). Repositories themselves are never stored on disk; they are cloned in-memory and discarded immediately after their contents are extracted.
    - The presence of a working tree directory under the appropriate commit hash is considered an indicator of its completeness. Checkouts are first done into a temporary directory and then atomically moved into place.
    - Currently a working tree is never removed, but a practical system would need to have a way to discard orphaned ones.
- The `data/www/` directory contains symlinks to working trees organized by domain and project name (e.g. `data/www/example.org/myproject` or `data/www/example.org/.index`).
    - The presence of a symlink at the appropriate location is considered an indicator of completeness as well. Updating to a new content version is done by creating a new symlink at a temporary location and then atomically moving it into place.
    - This structure is simple enough that it may be served by e.g. Nginx instead of the Go application.
- `openat2(RESOLVE_IN_ROOT)` is used to confine GET requests strictly under the `data/` directory.

This approach has the benefits of being easy to explore and debug, but places a lot of faith in the filesystem implementation; partial data loss, write reordering, or incomplete journalling *will* result in confusing and persistent caching issues. This is probably fine, but needs to be understood.

The specific arrangement used is clearly not optimal; at a minimum it is likely worth it to deduplicate files under `data/tree/` using hardlinks, or perhaps to put objects in a flat, content-addressed store with `data/www/` linking to each individual file. The key practical constraint will likely be the need to attribute excessively large trees to repositories they were built from (and to perform GC), which suggests adding structure and not removing it.
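
The v1 symlink update ("create at a temporary location, then atomically move into place") is a common Unix pattern. The sketch below shows it with hypothetical names (`swapSymlink`, `runDemo`, the `.tmp` suffix) and assumes a Unix-like system with a single writer per link; it is not taken from the v1 code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// swapSymlink points link at target by creating a temporary symlink next to
// it and renaming it over link. On Unix, rename replaces the destination in
// one step, so readers see either the old target or the new one.
func swapSymlink(target, link string) error {
	tmp := link + ".tmp" // hypothetical naming; assumes one writer per link
	if err := os.Remove(tmp); err != nil && !os.IsNotExist(err) {
		return err
	}
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	if err := os.Rename(tmp, link); err != nil {
		os.Remove(tmp)
		return err
	}
	return nil
}

// runDemo deploys two versions in a scratch directory and returns the name
// of the working tree the link points at afterwards.
func runDemo() (string, error) {
	root, err := os.MkdirTemp("", "pages-v1-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	link := filepath.Join(root, "www")
	for _, version := range []string{"tree-a", "tree-b"} {
		dir := filepath.Join(root, version)
		if err := os.Mkdir(dir, 0o755); err != nil {
			return "", err
		}
		if err := swapSymlink(dir, link); err != nil {
			return "", err
		}
	}
	target, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	return filepath.Base(target), nil
}

func main() {
	current, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("link now points at:", current)
}
```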


License
-------

[0-clause BSD](LICENSE.txt)