commits
* A missing `Accept:` header should be treated the same as `Accept: */*`.
* For an unresolved-reference error, `text/plain` should take priority.
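The intended negotiation can be sketched like this (an illustrative helper only, not git-pages source; the function name and the `text/html` fallback are assumptions):

```python
# Hypothetical sketch of the intended behavior, not git-pages source.
def error_content_type(accept_header):
    # No Accept header means the client accepts anything, same as "*/*".
    accept = accept_header if accept_header is not None else "*/*"
    offered = {part.split(";")[0].strip() for part in accept.split(",")}
    # For the unresolved-reference error, text/plain takes priority
    # whenever the client can accept it.
    if offered & {"text/plain", "text/*", "*/*"}:
        return "text/plain"
    return "text/html"

print(error_content_type(None))                     # text/plain
print(error_content_type("text/html, text/plain"))  # text/plain
print(error_content_type("text/html"))              # text/html
```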
This will be used for incremental archive updates.
This will be used for incremental archive uploads.
Unfortunately this is still not enough to fit into codeberg-medium :(
It has been tested on Grebedoc (Fly.io servers) and found to work
satisfactorily, though without any apparent benefit. It requires client
opt-in, so enabling it at all times is benign.
The PATCH method has been tested both by me and on Codeberg and found
to work satisfactorily.
Because using PATCH causes the git-pages server to store state that
is not necessarily easily reproducible from any single specific source
(i.e. it stores a composition of many disparate requests), it may be
necessary to back it up. For this, the feature `archive-site` is also
stabilized. It has not seen much use, but not providing a backup method
would be a disservice.
This helps when debugging slow scripts (e.g. those using ClamAV).
To use this function, configure git-pages with e.g.:
[audit]
collect = true
notify-url = "http://localhost:3004/"
and run an audit server with e.g.:
git-pages -audit-server tcp/:3004 python $(pwd)/process.py
The provided command line is executed after appending two arguments
(audit record ID and event type), and runs in a temporary directory
with the audit record extracted into it. The following files will
be present in this directory:
* `$1-event.json` (always)
* `$1-manifest.json` (if type is `CommitManifest`)
* `$1-archive.tar` (if type is `CommitManifest`)
The script must complete successfully for the event processing to
finish. The notification will keep being re-sent (by the worker) with
exponential backoff until it does.
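A minimal `process.py` along these lines might look as follows. This is only a sketch: the file names, arguments, and exit-code semantics come from the description above, but the JSON structure and the handler's behavior are assumptions.

```python
# Hypothetical process.py sketch for `git-pages -audit-server`.
# The two appended arguments are the audit record ID and the event type;
# the record's files have been extracted into the current directory.
import json
import sys

def handle(record_id, event_type):
    # `$1-event.json` is always present.
    with open(f"{record_id}-event.json") as f:
        event = json.load(f)
    if event_type == "CommitManifest":
        # `$1-manifest.json` and `$1-archive.tar` are also present;
        # e.g. the archive could be scanned here.
        with open(f"{record_id}-manifest.json") as f:
            manifest = json.load(f)
    return event

if __name__ == "__main__" and len(sys.argv) == 3:
    # Exiting non-zero makes the worker re-send the notification
    # with exponential backoff.
    handle(sys.argv[1], sys.argv[2])
```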
This acts like `mkdir -p`, making it much less annoying to deploy
e.g. documentation preview generators that use deep paths.
Like before, the site must already exist: we cannot do a CAS on
a non-existent manifest at the moment.
Neither of these names is self-explanatory, and it is better to have
fewer distinct identifiers for the same concept.
This makes it *much* faster.
Also, record the principal of `git-pages -{freeze,unfreeze}-domain`
and `git-pages -update-site` as the CLI administrator.
Currently this doesn't handle `X-Forwarded-For` and as such isn't very
useful. It is surprisingly difficult to find a high-quality library for
parsing `X-Forwarded-For` and a solution will have to be found.
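The difficulty can be illustrated with a sketch (hypothetical helper; the trusted-proxy set is an assumption): the client controls the leftmost entries, so a parser must walk from the right and believe only the hops appended by proxies it trusts.

```python
# Illustrative sketch of why naive X-Forwarded-For parsing is unsafe:
# the client can put anything in the leftmost entries.
def real_client_ip(xff, trusted_proxies):
    hops = [h.strip() for h in xff.split(",")]
    # Walk from the right, skipping proxies we trust; the first
    # untrusted hop is the best candidate for the real client address.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return hops[0]

print(real_client_ip("1.2.3.4, 10.0.0.1", {"10.0.0.1"}))           # 1.2.3.4
print(real_client_ip("6.6.6.6, 1.2.3.4, 10.0.0.1", {"10.0.0.1"}))  # 1.2.3.4
```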
On Linux and macOS, two file descriptors opened by the same process are
treated as if they were different processes for the purpose of locking.
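This can be demonstrated with `flock(2)` (assuming BSD-style locks, which behave this way because each `open()` creates a separate open file description; POSIX `fcntl` record locks behave differently):

```python
# Two descriptors from separate open() calls have separate open file
# descriptions, so they contend for the flock even within one process.
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()
fd2 = os.open(path, os.O_RDWR)

fcntl.flock(fd, fcntl.LOCK_EX)  # first descriptor takes the lock
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    conflicted = False
except BlockingIOError:
    # The second descriptor is refused, as if it were another process.
    conflicted = True

print(conflicted)  # True
```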
`Last-Modified` does not have enough resolution to be fully reliable;
`ETag` does. This test now passes on both filesystem and MinIO:
$ go run ./test/stresspatch -count 100
...
written: 100 of 100
Other S3 implementations haven't been tested.
Gated behind the `patch` feature.
This is only a breaking change if you've enabled the `audit` feature.
All past audit reports should be removed once this commit is deployed,
as both the Protobuf schema and the Snowflake epoch have changed.
This avoids spreading URL parse error handling code all over
the codebase. It's not even easy to trigger that error!
This caused a crash when setting `PAGES_AUDIT_NODE_ID` or using
`-print-config-env-vars`.
The list of audit events is:
- `CommitManifest`
- `DeleteManifest`
- `FreezeDomain`
- `UnfreezeDomain`
Currently these are the main abuse/moderation-relevant actions.
If collection is enabled, these events will be logged to the `audit/...`
storage hierarchy; a way to examine audit logs will be added in
the future.
The auditing interposer backend is enabled with feature `audit`.
Without this, some of the slog lines end in `\n` and some do not, which
I find deeply irritating.
There was never a particularly good reason to tie the fallback handler
to a wildcard domain; most importantly, this prevented it from being
used for custom domains, which is required for migrating custom domains
from the Codeberg Pages v2 server.
The following script may be used to handle abusive sites:
cd $(mktemp -d)
echo "<h1>Gone</h1>" >index.html
echo "/* /index.html 410" >_redirects
tar cf site.tar index.html _redirects
git-pages -update-site $1 site.tar
git-pages -freeze-domain $1
This commit changes the git fetch algorithm to only retrieve blobs
that aren't included in the previously deployed site manifest, if
git filters are supported by the remote.
It also changes how manifest entry sizes are represented, such that
both decompressed and compressed sizes are stored. This enables
computing accurate (and repeatable) sizes even after incremental
updates.
Co-authored-by: David Leadbeater <dgl@dgl.cx>
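The selection step of the incremental fetch described above amounts to a set difference (illustrative only; names are made up, and the actual implementation operates on git object IDs via filters):

```python
# Illustrative sketch of incremental fetch selection (not git-pages code):
# only blobs absent from the previously deployed manifest are fetched.
def blobs_to_fetch(new_tree_blobs, previous_manifest_blobs):
    return set(new_tree_blobs) - set(previous_manifest_blobs)

previous = {"a1f3", "9bc0", "77de"}  # blobs in the deployed manifest
new = {"a1f3", "9bc0", "5e12"}       # blobs referenced by the new commit
print(sorted(blobs_to_fetch(new, previous)))  # ['5e12']
```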
Forcing the server to repeatedly decompress a large blob is a potential
DoS vector, so having a metric for this is essential.
The former metric was misnamed: it only counted NoSuchKey errors.
Also, it was applied *after* the cache, meaning it was just a count
of every request that got a successful 404 from the S3 backend.
Also, it pooled blob and manifest requests together.
The new metric corresponds 1-to-1 to S3 requests and distinguishes
between different kinds of errors. It also distinguishes between kinds
of requests. Example output:
git_pages_s3_get_object_responses_count{code="NoSuchKey",kind="manifest"} 1
git_pages_s3_get_object_responses_count{code="OK",kind="blob"} 1
git_pages_s3_get_object_responses_count{code="OK",kind="manifest"} 1
Related to 4cca8abaf0.
They've fixed it in https://github.com/getsentry/sentry-go/issues/1142
Via the `sentry-telemetry-buffer` feature.
I think this causes high CPU use on Grebedoc.