commits
Signed-off-by: oppiliappan <me@oppi.li>
Without this, some of the slog lines end in `\n` and some do not, which
I find deeply irritating.
There was never a particularly good reason to tie the fallback handler
to a wildcard domain; most importantly, this prevented it from being
used for custom domains, which is required for migrating custom domains
from the Codeberg Pages v2 server.
The following script may be used to handle abusive sites:
cd $(mktemp -d)
echo "<h1>Gone</h1>" >index.html
echo "/* /index.html 410" >_redirects
tar cf site.tar index.html _redirects
git-pages -update-site $1 site.tar
git-pages -freeze-domain $1
This commit changes the git fetch algorithm to retrieve only those blobs
that aren't included in the previously deployed site manifest, provided
git filters are supported by the remote.
It also changes how manifest entry sizes are represented, such that
both decompressed and compressed sizes are stored. This enables
computing accurate (and repeatable) sizes even after incremental
updates.
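A rough sketch of the idea, with hypothetical type and function names
that don't necessarily match the actual code:

// Hypothetical shapes, for illustration only.
type ManifestEntry struct {
	Hash           string // git blob hash
	Size           int64  // decompressed size
	CompressedSize int64  // size as stored in the backend
}

// blobsToFetch returns the hashes of blobs referenced by the new tree
// (path -> blob hash) that are not already present in the previously
// deployed manifest (path -> entry). With a blob filter, only these
// need to be requested from the remote.
func blobsToFetch(previous map[string]ManifestEntry, tree map[string]string) []string {
	have := make(map[string]bool, len(previous))
	for _, entry := range previous {
		have[entry.Hash] = true
	}
	seen := make(map[string]bool)
	var missing []string
	for _, hash := range tree {
		if !have[hash] && !seen[hash] {
			seen[hash] = true
			missing = append(missing, hash)
		}
	}
	return missing
}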
Co-authored-by: David Leadbeater <dgl@dgl.cx>
Forcing the server to repeatedly decompress a large blob is a potential
DoS vector, so having a metric for this is essential.
The former metric was misnamed: it only counted NoSuchKey errors.
Also, it was applied *after* the cache, meaning it was just a count
of every request that got a successful 404 from the S3 backend.
It also pooled blob and manifest requests together.
The new metric corresponds one-to-one with S3 requests and distinguishes
between different kinds of errors as well as different kinds of requests.
Example output:
git_pages_s3_get_object_responses_count{code="NoSuchKey",kind="manifest"} 1
git_pages_s3_get_object_responses_count{code="OK",kind="blob"} 1
git_pages_s3_get_object_responses_count{code="OK",kind="manifest"} 1
Related to 4cca8abaf0.
They've fixed it in https://github.com/getsentry/sentry-go/issues/1142
Via the `sentry-telemetry-buffer` feature.
I think this causes high CPU use on Grebedoc.
Previously, a <512-byte file without an extension resulted in:
internal server error: runtime error: slice bounds out of range [:512] with capacity 8
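A minimal sketch of the clamp that avoids the panic, assuming content
type sniffing goes through net/http's DetectContentType (the names here
are illustrative, not the actual git-pages code):

import "net/http"

// DetectContentType only considers the first 512 bytes; slicing
// data[:512] directly panics when the file is shorter than that.
func sniffContentType(data []byte) string {
	n := len(data)
	if n > 512 {
		n = 512
	}
	return http.DetectContentType(data[:n])
}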
Otherwise, an undesired degree of freedom permits a third party to
deny access to index site URLs by publishing projects with the same
name.
In the future, the _git-pages-repository TXT record format may be
extended to allow non-index sites to be specified without introducing
undesired degrees of freedom.
Introduced in commit dd168186.
This commit also moves all of the globals into `main.go`.
On Windows, there is no way to reload configuration at runtime.
The HTTP endpoint is `/.git-pages/archive.tar` and it is gated behind
a feature flag `archive-site`. It serially downloads every blob and
writes it to the client in a chunked response, optionally compressed
with gzip or zstd as per `Accept-Encoding:`. It is authorized the same
as `/.git-pages/manifest.json`, for the same reasons.
The CLI operation is `-get-archive <site-name>` and it writes a tar
archive to stdout. This could be useful for an administrator to review
the contents of a site in response to a report.
Both `_headers` and `_redirects` files are present in the output,
reconstituted from the manifest.
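A rough sketch of the serial streaming approach, with hypothetical
helpers rather than the actual handler; gzip or zstd compression would
be applied by wrapping the writer before calling it:

import (
	"archive/tar"
	"io"
)

// writeArchive fetches every blob in turn and writes it as a tar entry.
// files maps site paths to blob hashes; fetch is whatever retrieves a
// blob from the storage backend.
func writeArchive(w io.Writer, files map[string]string, fetch func(hash string) ([]byte, error)) error {
	tw := tar.NewWriter(w)
	for path, hash := range files {
		data, err := fetch(hash)
		if err != nil {
			return err
		}
		hdr := &tar.Header{Name: path, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(data); err != nil {
			return err
		}
	}
	return tw.Close()
}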
I think this doesn't affect anything, but prevents an embarrassing
message from appearing in the log:
compress: saved NaN percent (0 B to 0 B)
This is both to reduce the number of loose variables in the code and to
make it closer to `S3Backend.GetBlob`.
This is an attempt to stop OOMing Codeberg's Forgejo Actions runners,
which count disk and RAM against the same quota.
This is to match the behavior of GitHub, and also because it isn't
particularly useful (and is quite confusing) to serve a file from the
index repo with the same path segment as the project name.
This size is not used by git-pages itself, and is not representative of
storage needs, but may be used for estimating how large a site would
be if downloaded in its entirety.
Previously, this method would match only hosts of the form:
user.host.com
This changeset allows matches on hosts of the form:
user.org.host.com
user.organization.com.host.com
This will potentially be the pattern that tangled.org uses for its hosted
instance of git-pages.
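A minimal sketch of the relaxed matching, with hypothetical names:

import "strings"

// subdomainFor extracts the part of the request host that precedes the
// configured pages host, allowing it to contain more than one label:
//
//   user.host.com                  -> "user"
//   user.org.host.com              -> "user.org"
//   user.organization.com.host.com -> "user.organization.com"
func subdomainFor(host, pagesHost string) (string, bool) {
	suffix := "." + pagesHost
	if !strings.HasSuffix(host, suffix) {
		return "", false
	}
	sub := strings.TrimSuffix(host, suffix)
	return sub, sub != ""
}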
Signed-off-by: oppiliappan <me@oppi.li>
Done for uniformity and to make the git-pages-cli implementation nicer.
Same rationale as in 9d0a3ac6ad15218d452cdc48a6a798ff5d0492a8.