[mirror] Scalable static site server for Git forges (like GitHub Pages)

Allow `PATCH` method to apply partial updates.

Gated behind the `patch` feature.

+14 -4
README.md
···
     - If the `PUT` method receives an `application/x-tar`, `application/x-tar+gzip`, `application/x-tar+zstd`, or `application/zip` body, it contains an archive to be extracted.
     - The `POST` method requires an `application/json` body containing a Forgejo/Gitea/Gogs/GitHub webhook event payload. Requests where the `ref` key contains anything other than `refs/heads/pages` are ignored, and only the `pages` branch is used. The `repository.clone_url` key contains a repository URL to be shallowly cloned.
     - If the received contents is empty, performs the same action as `DELETE`.
+* **With feature `patch`:** In response to a `PATCH` request, the server partially updates a site with new content. The URL of the request must be the root URL of the site that is being published.
+    - The request must have an `application/x-tar`, `application/x-tar+gzip`, or `application/x-tar+zstd` body, whose contents is *merged* with the existing site contents as follows:
+        - A character device entry with major 0 and minor 0 is treated as a "whiteout marker" (following [unionfs][whiteout]): it causes any existing file or directory with the same name to be deleted.
+        - A directory entry replaces any existing file or directory with the same name (if any), recursively removing the old contents.
+        - A file or symlink entry replaces any existing file or directory with the same name (if any).
+        - In any case, the parent of an entry must exist and be a directory.
+    - The request must have a `Race-Free: yes` or `Race-Free: no` header. Not every backend configuration makes it possible to perform atomic compare-and-swap operations; on backends without atomic CAS support, `Race-Free: yes` requests will fail, while `Race-Free: no` requests will provide a best-effort approximation.
+    - If a `PATCH` request loses a race against another content update request, it may return `409 Conflict`. This is true regardless of the `Race-Free:` header value. Whenever this happens, resubmit the request as-is.
+    - If the site has no contents after the update is applied, performs the same action as `DELETE`.
 * In response to a `DELETE` request, the server unpublishes a site. The URL of the request must be the root URL of the site that is being unpublished. Site data remains stored for an indeterminate period of time, but becomes completely inaccessible.
-* If a `Dry-Run: yes` header is provided with a `PUT`, `DELETE`, or `POST` request, only the authorization checks are run; no destructive updates are made. Note that this functionality was added in _git-pages_ v0.2.0.
+* If a `Dry-Run: yes` header is provided with a `PUT`, `PATCH`, `DELETE`, or `POST` request, only the authorization checks are run; no destructive updates are made. Note that this functionality was added in _git-pages_ v0.2.0.
 * All updates to site content are atomic (subject to consistency guarantees of the storage backend). That is, there is an instantaneous moment during an update before which the server will return the old content and after which it will return the new content.
 * Files with a certain name, when placed in the root of a site, have special functions:
     - [Netlify `_redirects`][_redirects] file can be used to specify HTTP redirect and rewrite rules. The _git-pages_ implementation currently does not support placeholders, query parameters, or conditions, and may differ from Netlify in other minor ways. If you find that a supported `_redirects` file feature does not work the same as on Netlify, please file an issue. (Note that _git-pages_ does not perform URL normalization; `/foo` and `/foo/` are *not* the same, unlike with Netlify.)
···
 [_headers]: https://docs.netlify.com/manage/routing/headers/
 [cors]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS
 [go-git-sha256]: https://github.com/go-git/go-git/issues/706
+[whiteout]: https://docs.kernel.org/filesystems/overlayfs.html#whiteouts-and-opaque-directories
 
 
 Authorization
···
 
 DNS is the primary authorization method, using either TXT records or wildcard matching. In certain cases, git forge authorization is used in addition to DNS.
 
-The authorization flow for content updates (`PUT`, `DELETE`, `POST` requests) proceeds sequentially in the following order, with the first of multiple applicable rule taking precedence:
+The authorization flow for content updates (`PUT`, `PATCH`, `DELETE`, `POST` requests) proceeds sequentially in the following order, with the first of multiple applicable rule taking precedence:
 
 1. **Development Mode:** If the environment variable `PAGES_INSECURE` is set to a truthful value like `1`, the request is authorized.
-2. **DNS Challenge:** If the method is `PUT`, `DELETE`, `POST`, and a well-formed `Authorization:` header is provided containing a `<token>`, and a TXT record lookup at `_git-pages-challenge.<host>` returns a record whose concatenated value equals `SHA256("<host> <token>")`, the request is authorized.
+2. **DNS Challenge:** If the method is `PUT`, `PATCH`, `DELETE`, `POST`, and a well-formed `Authorization:` header is provided containing a `<token>`, and a TXT record lookup at `_git-pages-challenge.<host>` returns a record whose concatenated value equals `SHA256("<host> <token>")`, the request is authorized.
     - **`Pages` scheme:** Request includes an `Authorization: Pages <token>` header.
     - **`Basic` scheme:** Request includes an `Authorization: Basic <basic>` header, where `<basic>` is equal to `Base64("Pages:<token>")`. (Useful for non-Forgejo forges.)
 3. **DNS Allowlist:** If the method is `PUT` or `POST`, and the request URL is `scheme://<user>.<host>/`, and a TXT record lookup at `_git-pages-repository.<host>` returns a set of well-formed absolute URLs, and (for `PUT` requests) the body contains a repository URL, and the requested clone URLs is contained in this set of URLs, the request is authorized.
 4. **Wildcard Match (content):** If the method is `POST`, and a `[[wildcard]]` configuration section exists where the suffix of a hostname (compared label-wise) is equal to `[[wildcard]].domain`, and (for `PUT` requests) the body contains a repository URL, and the requested clone URL is a *matching* clone URL, the request is authorized.
     - **Index repository:** If the request URL is `scheme://<user>.<host>/`, a *matching* clone URL is computed by templating `[[wildcard]].clone-url` with `<user>` and `<project>`, where `<project>` is computed by templating each element of `[[wildcard]].index-repos` with `<user>`, and `[[wildcard]]` is the section where the match occurred.
     - **Project repository:** If the request URL is `scheme://<user>.<host>/<project>/`, a *matching* clone URL is computed by templating `[[wildcard]].clone-url` with `<user>` and `<project>`, and `[[wildcard]]` is the section where the match occurred.
-5. **Forge Authorization:** If the method is `PUT`, and the body contains an archive, and a `[[wildcard]]` configuration section exists where the suffix of a hostname (compared label-wise) is equal to `[[wildcard]].domain`, and `[[wildcard]].authorization` is non-empty, and the request includes a `Forge-Authorization:` header, and the header (when forwarded as `Authorization:`) grants push permissions to a repository at the *matching* clone URL (as defined above) as determined by an API call to the forge, the request is authorized. (This enables publishing a site for a private repository.)
+5. **Forge Authorization:** If the method is `PUT` or `PATCH`, and the body contains an archive, and a `[[wildcard]]` configuration section exists where the suffix of a hostname (compared label-wise) is equal to `[[wildcard]].domain`, and `[[wildcard]].authorization` is non-empty, and the request includes a `Forge-Authorization:` header, and the header (when forwarded as `Authorization:`) grants push permissions to a repository at the *matching* clone URL (as defined above) as determined by an API call to the forge, the request is authorized. (This enables publishing a site for a private repository.)
 5. **Default Deny:** Otherwise, the request is not authorized.
 
 The authorization flow for metadata retrieval (`GET` requests with site paths starting with `.git-pages/`) in the following order, with the first of multiple applicable rule taking precedence:
+8 -4
src/audit.go
···
 	}
 }
 
-func (audited *auditedBackend) CommitManifest(ctx context.Context, name string, manifest *Manifest) (err error) {
+func (audited *auditedBackend) CommitManifest(
+	ctx context.Context, name string, manifest *Manifest, opts ModifyManifestOptions,
+) (err error) {
 	domain, project, ok := strings.Cut(name, "/")
 	if !ok {
 		panic("malformed manifest name")
···
 		Manifest: manifest,
 	})
 
-	return audited.Backend.CommitManifest(ctx, name, manifest)
+	return audited.Backend.CommitManifest(ctx, name, manifest, opts)
 }
 
-func (audited *auditedBackend) DeleteManifest(ctx context.Context, name string) (err error) {
+func (audited *auditedBackend) DeleteManifest(
+	ctx context.Context, name string, opts ModifyManifestOptions,
+) (err error) {
 	domain, project, ok := strings.Cut(name, "/")
 	if !ok {
 		panic("malformed manifest name")
···
 		Project: proto.String(project),
 	})
 
-	return audited.Backend.DeleteManifest(ctx, name)
+	return audited.Backend.DeleteManifest(ctx, name, opts)
 }
 
 func (audited *auditedBackend) FreezeDomain(ctx context.Context, domain string, freeze bool) (err error) {
+16 -3
src/backend.go
···
 )
 
 var ErrObjectNotFound = errors.New("not found")
+var ErrPreconditionFailed = errors.New("precondition failed")
+var ErrWriteConflict = errors.New("write conflict")
 var ErrDomainFrozen = errors.New("domain administratively frozen")
 
 func splitBlobName(name string) []string {
···
 	// If true and the manifest is past the cache `MaxAge`, `GetManifest` blocks and returns
 	// a fresh object instead of revalidating in background and returning a stale object.
 	BypassCache bool
+}
+
+type ModifyManifestOptions struct {
+	// If non-zero, the request will only succeed if the manifest hasn't been changed since
+	// the given time. Whether this is racy or not can be determined via `HasAtomicCAS()`.
+	IfUnmodifiedSince time.Time
 }
 
 type QueryAuditLogOptions struct {
···
 	// effects.
 	StageManifest(ctx context.Context, manifest *Manifest) error
 
+	// Whether a compare-and-swap operation on a manifest is truly race-free, or only best-effort
+	// atomic with a small but non-zero window where two requests may race where the one committing
+	// first will have its update lost. (Plain swap operations are always guaranteed to be atomic.)
+	HasAtomicCAS(ctx context.Context) bool
+
 	// Commit a manifest. This is an atomic operation; `GetManifest` calls will return either
 	// the old version or the new version of the manifest, never anything else.
-	CommitManifest(ctx context.Context, name string, manifest *Manifest) error
+	CommitManifest(ctx context.Context, name string, manifest *Manifest, opts ModifyManifestOptions) error
 
 	// Delete a manifest.
-	DeleteManifest(ctx context.Context, name string) error
+	DeleteManifest(ctx context.Context, name string, opts ModifyManifestOptions) error
 
 	// List all manifests.
 	ListManifests(ctx context.Context) (manifests []string, err error)
···
 func CreateBackend(config *StorageConfig) (backend Backend, err error) {
 	switch config.Type {
 	case "fs":
-		if backend, err = NewFSBackend(&config.FS); err != nil {
+		if backend, err = NewFSBackend(context.Background(), &config.FS); err != nil {
 			err = fmt.Errorf("fs backend: %w", err)
 		}
 	case "s3":
+110 -7
src/backend_fs.go
···
 	"os"
 	"path/filepath"
 	"strings"
+	"sync"
 	"time"
 )
 
 type FSBackend struct {
-	blobRoot  *os.Root
-	siteRoot  *os.Root
-	auditRoot *os.Root
+	blobRoot     *os.Root
+	siteRoot     *os.Root
+	auditRoot    *os.Root
+	hasAtomicCAS bool
 }
 
 var _ Backend = (*FSBackend)(nil)
···
 	return tempPath, nil
 }
 
-func NewFSBackend(config *FSConfig) (*FSBackend, error) {
+func checkAtomicCAS(root *os.Root) bool {
+	fileName := ".hasAtomicCAS"
+	file, err := root.Create(fileName)
+	if err != nil {
+		panic(err)
+	}
+	root.Remove(fileName)
+	defer file.Close()
+
+	flockErr := FileLock(file)
+	funlockErr := FileUnlock(file)
+	return (flockErr == nil && funlockErr == nil)
+}
+
+func NewFSBackend(ctx context.Context, config *FSConfig) (*FSBackend, error) {
 	blobRoot, err := maybeCreateOpenRoot(config.Root, "blob")
 	if err != nil {
 		return nil, fmt.Errorf("blob: %w", err)
···
 	if err != nil {
 		return nil, fmt.Errorf("audit: %w", err)
 	}
-	return &FSBackend{blobRoot, siteRoot, auditRoot}, nil
+	hasAtomicCAS := checkAtomicCAS(siteRoot)
+	if hasAtomicCAS {
+		logc.Println(ctx, "fs: has atomic CAS")
+	} else {
+		logc.Println(ctx, "fs: has best-effort CAS")
+	}
+	return &FSBackend{blobRoot, siteRoot, auditRoot, hasAtomicCAS}, nil
 }
 
 func (fs *FSBackend) Backend() Backend {
···
 	}
 }
 
-func (fs *FSBackend) CommitManifest(ctx context.Context, name string, manifest *Manifest) error {
+func (fs *FSBackend) HasAtomicCAS(ctx context.Context) bool {
+	// On a suitable filesystem, POSIX advisory locks can be used to implement atomic CAS.
+	// An implementation consists of two parts:
+	// - Intra-process mutex set (one per manifest), to prevent races between goroutines;
+	// - Inter-process POSIX advisory locks (one per manifest), to prevent races between
+	//   different git-pages instances.
+	return fs.hasAtomicCAS
+}
+
+// Right now updates aren't very common, so this lock is essentially entirely uncontended.
+// If it ever becomes a bottleneck it should be replaced with a per-manifest lock.
+var sharedManifestLock = sync.Mutex{}
+
+type manifestLockGuard struct {
+	file *os.File
+}
+
+func lockManifest(fs *os.Root, name string) (*manifestLockGuard, error) {
+	file, err := fs.Open(name)
+	if errors.Is(err, os.ErrNotExist) {
+		return &manifestLockGuard{nil}, nil
+	} else if err != nil {
+		return nil, fmt.Errorf("open: %w", err)
+	}
+	if err := FileLock(file); err != nil {
+		file.Close()
+		return nil, fmt.Errorf("flock(LOCK_EX): %w", err)
+	}
+	sharedManifestLock.Lock()
+	return &manifestLockGuard{file}, nil
+}
+
+func (guard *manifestLockGuard) Unlock() {
+	if guard.file != nil {
+		FileUnlock(guard.file)
+		guard.file.Close()
+		sharedManifestLock.Unlock()
+	}
+}
+
+func (fs *FSBackend) checkManifestPrecondition(
+	ctx context.Context, name string, opts ModifyManifestOptions,
+) error {
+	if !opts.IfUnmodifiedSince.IsZero() {
+		stat, err := fs.siteRoot.Stat(name)
+		if err != nil {
+			return fmt.Errorf("stat: %w", err)
+		}
+
+		if stat.ModTime().Compare(opts.IfUnmodifiedSince) > 0 {
+			return fmt.Errorf("%w: If-Unmodified-Since", ErrPreconditionFailed)
+		}
+	}
+
+	return nil
+}
+
+func (fs *FSBackend) CommitManifest(
+	ctx context.Context, name string, manifest *Manifest, opts ModifyManifestOptions,
+) error {
+	if guard, err := lockManifest(fs.siteRoot, name); err != nil {
+		return err
+	} else {
+		defer guard.Unlock()
+	}
+
 	domain := filepath.Dir(name)
 	if err := fs.checkDomainFrozen(ctx, domain); err != nil {
+		return err
+	}
+
+	if err := fs.checkManifestPrecondition(ctx, name, opts); err != nil {
 		return err
 	}
 
···
 	return nil
 }
 
-func (fs *FSBackend) DeleteManifest(ctx context.Context, name string) error {
+func (fs *FSBackend) DeleteManifest(
+	ctx context.Context, name string, opts ModifyManifestOptions,
+) error {
+	if guard, err := lockManifest(fs.siteRoot, name); err != nil {
+		return err
+	} else {
+		defer guard.Unlock()
+	}
+
 	domain := filepath.Dir(name)
 	if err := fs.checkDomainFrozen(ctx, domain); err != nil {
+		return err
+	}
+
+	if err := fs.checkManifestPrecondition(ctx, name, opts); err != nil {
 		return err
 	}
 
+54 -3
src/backend_s3.go
···
 	}
 }
 
-func (s3 *S3Backend) CommitManifest(ctx context.Context, name string, manifest *Manifest) error {
+func (s3 *S3Backend) HasAtomicCAS(ctx context.Context) bool {
+	// Support for `If-Unmodified-Since:` or `If-Match:` for PutObject requests is very spotty:
+	// - AWS supports only `If-Match:`:
+	//   https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html
+	// - Minio supports `If-Match:`:
+	//   https://blog.min.io/leading-the-way-minios-conditional-write-feature-for-modern-data-workloads/
+	// - Tigris supports `If-Unmodified-Since:` and `If-Match:`, but only with `X-Tigris-Consistent: true`;
+	//   https://www.tigrisdata.com/docs/objects/conditionals/
+	//   Note that the `X-Tigris-Consistent: true` header must be present on *every* transaction
+	//   touching the object, not just on the CAS transactions.
+	// - Wasabi does not support either one and docs seem to suggest that the headers are ignored;
+	// - Garage does not support either one and source code suggests the headers are ignored.
+	// It seems that the only safe option is to not claim support for atomic CAS, and only do
+	// best-effort CAS implementation using HeadObject and PutObject/DeleteObject.
+	return false
+}
+
+func (s3 *S3Backend) checkManifestPrecondition(
+	ctx context.Context, name string, opts ModifyManifestOptions,
+) error {
+	if !opts.IfUnmodifiedSince.IsZero() {
+		stat, err := s3.client.StatObject(ctx, s3.bucket, manifestObjectName(name),
+			minio.GetObjectOptions{})
+		if err != nil {
+			return fmt.Errorf("stat: %w", err)
+		}
+
+		if stat.LastModified.Compare(opts.IfUnmodifiedSince) > 0 {
+			return fmt.Errorf("%w: If-Unmodified-Since", ErrPreconditionFailed)
+		}
+	}
+
+	return nil
+}
+
+func (s3 *S3Backend) CommitManifest(
+	ctx context.Context, name string, manifest *Manifest, opts ModifyManifestOptions,
+) error {
 	data := EncodeManifest(manifest)
 	logc.Printf(ctx, "s3: commit manifest %x -> %s", sha256.Sum256(data), name)
 
 	_, domain, _ := strings.Cut(name, "/")
 	if err := s3.checkDomainFrozen(ctx, domain); err != nil {
+		return err
+	}
+
+	if err := s3.checkManifestPrecondition(ctx, name, opts); err != nil {
 		return err
 	}
 
···
 		minio.RemoveObjectOptions{})
 	s3.siteCache.Cache.Invalidate(name)
 	if putErr != nil {
-		return putErr
+		if errResp := minio.ToErrorResponse(putErr); errResp.Code == "PreconditionFailed" {
+			return ErrPreconditionFailed
+		} else {
+			return putErr
+		}
 	} else if removeErr != nil {
 		return removeErr
 	} else {
···
 	}
 }
 
-func (s3 *S3Backend) DeleteManifest(ctx context.Context, name string) error {
+func (s3 *S3Backend) DeleteManifest(
+	ctx context.Context, name string, opts ModifyManifestOptions,
+) error {
 	logc.Printf(ctx, "s3: delete manifest %s\n", name)
 
 	_, domain, _ := strings.Cut(name, "/")
 	if err := s3.checkDomainFrozen(ctx, domain); err != nil {
+		return err
+	}
+
+	if err := s3.checkManifestPrecondition(ctx, name, opts); err != nil {
 		return err
 	}
 
+1 -1
src/extract.go
···
 		case tar.TypeDir:
 			AddDirectory(manifest, fileName)
 		default:
-			AddProblem(manifest, fileName, "unsupported type '%c'", header.Typeflag)
+			AddProblem(manifest, fileName, "tar: unsupported type '%c'", header.Typeflag)
 			continue
 		}
 	}
+16
src/flock_other.go
···
+//go:build !unix
+
+package git_pages
+
+import (
+	"fmt"
+	"os"
+)
+
+func FileLock(file *os.File) error {
+	return fmt.Errorf("unimplemented")
+}
+
+func FileUnlock(file *os.File) error {
+	return fmt.Errorf("unimplemented")
+}
+16
src/flock_posix.go
···
+//go:build unix
+
+package git_pages
+
+import (
+	"os"
+	"syscall"
+)
+
+func FileLock(file *os.File) error {
+	return syscall.Flock(int(file.Fd()), syscall.LOCK_EX)
+}
+
+func FileUnlock(file *os.File) error {
+	return syscall.Flock(int(file.Fd()), syscall.LOCK_UN)
+}
+4 -2
src/manifest.go
···
 
 // Uploads inline file data over certain size to the storage backend. Returns a copy of
 // the manifest updated to refer to an external content-addressable store.
-func StoreManifest(ctx context.Context, name string, manifest *Manifest) (*Manifest, error) {
+func StoreManifest(
+	ctx context.Context, name string, manifest *Manifest, opts ModifyManifestOptions,
+) (*Manifest, error) {
 	span, ctx := ObserveFunction(ctx, "StoreManifest", "manifest.name", name)
 	defer span.Finish()
 
···
 		return nil, err // currently ignores all but 1st error
 	}
 
-	if err := backend.CommitManifest(ctx, name, &extManifest); err != nil {
+	if err := backend.CommitManifest(ctx, name, &extManifest, opts); err != nil {
 		if errors.Is(err, ErrDomainFrozen) {
 			return nil, err
 		} else {
+8 -4
src/observe.go
···
 	return
 }
 
-func (backend *observedBackend) CommitManifest(ctx context.Context, name string, manifest *Manifest) (err error) {
+func (backend *observedBackend) HasAtomicCAS(ctx context.Context) bool {
+	return backend.inner.HasAtomicCAS(ctx)
+}
+
+func (backend *observedBackend) CommitManifest(ctx context.Context, name string, manifest *Manifest, opts ModifyManifestOptions) (err error) {
 	span, ctx := ObserveFunction(ctx, "CommitManifest", "manifest.name", name)
-	err = backend.inner.CommitManifest(ctx, name, manifest)
+	err = backend.inner.CommitManifest(ctx, name, manifest, opts)
 	span.Finish()
 	return
 }
 
-func (backend *observedBackend) DeleteManifest(ctx context.Context, name string) (err error) {
+func (backend *observedBackend) DeleteManifest(ctx context.Context, name string, opts ModifyManifestOptions) (err error) {
 	span, ctx := ObserveFunction(ctx, "DeleteManifest", "manifest.name", name)
-	err = backend.inner.DeleteManifest(ctx, name)
+	err = backend.inner.DeleteManifest(ctx, name, opts)
 	span.Finish()
 	return
 }
+92 -7
src/pages.go
···
 	}, []string{"cause"})
 )
 
-func reportSiteUpdate(via string, result *UpdateResult) {
+func observeSiteUpdate(via string, result *UpdateResult) {
 	siteUpdatesCount.With(prometheus.Labels{"via": via}).Inc()
-
 	switch result.outcome {
 	case UpdateError:
 		siteUpdateErrorCount.With(prometheus.Labels{"cause": "other"}).Inc()
···
 	}
 	if !negotiatedEncoding {
 		w.WriteHeader(http.StatusNotAcceptable)
-		return fmt.Errorf("no supported content encodings (accept-encoding: %q)",
+		return fmt.Errorf("no supported content encodings (Accept-Encoding: %q)",
 			r.Header.Get("Accept-Encoding"))
 	}
 
···
 func putPage(w http.ResponseWriter, r *http.Request) error {
 	var result UpdateResult
 
+	for _, header := range []string{
+		"If-Modified-Since", "If-Unmodified-Since", "If-Match", "If-None-Match",
+	} {
+		if r.Header.Get(header) != "" {
+			http.Error(w, fmt.Sprintf("unsupported precondition %s", header), http.StatusBadRequest)
+			return nil
+		}
+	}
+
 	host, err := GetHost(r)
 	if err != nil {
 		return err
···
 		result = UpdateFromArchive(updateCtx, webRoot, contentType, reader)
 	}
 
+	return reportUpdateResult(w, result)
+}
+
+func patchPage(w http.ResponseWriter, r *http.Request) error {
+	if !config.Feature("patch") {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return nil
+	}
+
+	for _, header := range []string{
+		"If-Modified-Since", "If-Unmodified-Since", "If-Match", "If-None-Match",
+	} {
+		if r.Header.Get(header) != "" {
+			http.Error(w, fmt.Sprintf("unsupported precondition %s", header), http.StatusBadRequest)
+			return nil
+		}
+	}
+
+	host, err := GetHost(r)
+	if err != nil {
+		return err
+	}
+
+	projectName, err := GetProjectName(r)
+	if err != nil {
+		return err
+	}
+
+	webRoot := makeWebRoot(host, projectName)
+
+	updateCtx, cancel := context.WithTimeout(r.Context(), time.Duration(config.Limits.UpdateTimeout))
+	defer cancel()
+
+	if _, err = AuthorizeUpdateFromArchive(r); err != nil {
+		return err
+	}
+
+	// Providing atomic compare-and-swap operations might be difficult or impossible depending
+	// on the backend in use and its configuration, but for applications where a mostly-atomic
+	// compare-and-swap operation is good enough (e.g. generating page previews) we don't want
+	// to prevent the use of partial updates.
+	wantRaceFree := r.Header.Get("Race-Free")
+	hasAtomicCAS := backend.HasAtomicCAS(r.Context())
+	switch {
+	case wantRaceFree == "yes" && hasAtomicCAS || wantRaceFree == "no":
+		// all good
+	case wantRaceFree == "yes":
+		http.Error(w, "race free partial updates unsupported", http.StatusPreconditionFailed)
+		return nil
+	case wantRaceFree == "":
+		http.Error(w, "must provide \"Race-Free: yes|no\" header", http.StatusPreconditionRequired)
+		return nil
+	default:
+		http.Error(w, "malformed Race-Free: header", http.StatusBadRequest)
+		return nil
+	}
+
+	if checkDryRun(w, r) {
+		return nil
+	}
+
+	contentType := getMediaType(r.Header.Get("Content-Type"))
+	reader := http.MaxBytesReader(w, r.Body, int64(config.Limits.MaxSiteSize.Bytes()))
+	result := PartialUpdateFromArchive(updateCtx, webRoot, contentType, reader)
+	return reportUpdateResult(w, result)
+}
+
+func reportUpdateResult(w http.ResponseWriter, result UpdateResult) error {
 	switch result.outcome {
 	case UpdateError:
 		if errors.Is(result.err, ErrManifestTooLarge) {
···
 			w.WriteHeader(http.StatusUnsupportedMediaType)
 		} else if errors.Is(result.err, ErrArchiveTooLarge) {
 			w.WriteHeader(http.StatusRequestEntityTooLarge)
+		} else if errors.Is(result.err, ErrMalformedPatch) {
+			w.WriteHeader(http.StatusUnprocessableEntity)
+		} else if errors.Is(result.err, ErrPreconditionFailed) {
+			w.WriteHeader(http.StatusPreconditionFailed)
+		} else if errors.Is(result.err, ErrWriteConflict) {
+			w.WriteHeader(http.StatusConflict)
 		} else if errors.Is(result.err, ErrDomainFrozen) {
 			w.WriteHeader(http.StatusForbidden)
 		} else {
···
 	} else {
 		fmt.Fprintln(w, "internal error")
 	}
-	reportSiteUpdate("rest", &result)
+	observeSiteUpdate("rest", &result)
 	return nil
 }
 
···
 		return nil
 	}
 
-	err = backend.DeleteManifest(r.Context(), makeWebRoot(host, projectName))
+	err = backend.DeleteManifest(r.Context(), makeWebRoot(host, projectName),
+		ModifyManifestOptions{})
 	if err != nil {
 		w.WriteHeader(http.StatusInternalServerError)
 	} else {
···
 
 		result := UpdateFromRepository(ctx, webRoot, repoURL, auth.branch)
 		resultChan <- result
-		reportSiteUpdate("webhook", &result)
+		observeSiteUpdate("webhook", &result)
 	}(context.Background())
 
 	var result UpdateResult
···
 			}
 		}
 	}
-	allowedMethods := []string{"OPTIONS", "HEAD", "GET", "PUT", "DELETE", "POST"}
+	allowedMethods := []string{"OPTIONS", "HEAD", "GET", "PUT", "PATCH", "DELETE", "POST"}
 	if r.Method == "OPTIONS" || !slices.Contains(allowedMethods, r.Method) {
 		w.Header().Add("Allow", strings.Join(allowedMethods, ", "))
 	}
···
 		err = getPage(w, r)
 	case http.MethodPut:
 		err = putPage(w, r)
+	case http.MethodPatch:
+		err = patchPage(w, r)
 	case http.MethodDelete:
 		err = deletePage(w, r)
 	// webhook API
+128
src/patch.go
··· 1 + package git_pages 2 + 3 + import ( 4 + "archive/tar" 5 + "errors" 6 + "fmt" 7 + "io" 8 + "maps" 9 + "slices" 10 + "strings" 11 + ) 12 + 13 + var ErrMalformedPatch = errors.New("malformed patch") 14 + 15 + // Mutates `manifest` according to a tar stream and the following rules: 16 + // - A character device with major 0 and minor 0 is a "whiteout marker". When placed 17 + // at a given path, this path and its entire subtree (if any) are removed from the manifest. 18 + // - When a directory is placed at a given path, this path and its entire subtree (if any) are 19 + // removed from the manifest and replaced with the contents of the directory. 20 + func ApplyTarPatch(manifest *Manifest, reader io.Reader) error { 21 + type Node struct { 22 + entry *Entry 23 + children map[string]*Node 24 + } 25 + 26 + // Extract the manifest contents (which is using a flat hash map) into a directory tree 27 + // so that recursive delete operations have O(1) complexity. s 28 + var root *Node 29 + sortedNames := slices.Sorted(maps.Keys(manifest.GetContents())) 30 + for _, name := range sortedNames { 31 + entry := manifest.Contents[name] 32 + node := &Node{entry: entry} 33 + if entry.GetType() == Type_Directory { 34 + node.children = map[string]*Node{} 35 + } 36 + if name == "" { 37 + root = node 38 + } else { 39 + segments := strings.Split(name, "/") 40 + fileName := segments[len(segments)-1] 41 + iter := root 42 + for _, segment := range segments[:len(segments)-1] { 43 + if iter.children == nil { 44 + panic("malformed manifest") 45 + } else if _, exists := iter.children[segment]; !exists { 46 + panic("malformed manifest") 47 + } else { 48 + iter = iter.children[segment] 49 + } 50 + } 51 + iter.children[fileName] = node 52 + } 53 + } 54 + manifest.Contents = map[string]*Entry{} 55 + 56 + // Process the archive as a patch operation. 
57 + archive := tar.NewReader(reader) 58 + for { 59 + header, err := archive.Next() 60 + if err == io.EOF { 61 + break 62 + } else if err != nil { 63 + return err 64 + } 65 + 66 + segments := strings.Split(strings.TrimRight(header.Name, "/"), "/") 67 + fileName := segments[len(segments)-1] 68 + node := root 69 + for index, segment := range segments[:len(segments)-1] { 70 + if node.children == nil { 71 + dirName := strings.Join(segments[:index], "/") 72 + return fmt.Errorf("%w: %s: not a directory", ErrMalformedPatch, dirName) 73 + } 74 + if _, exists := node.children[segment]; !exists { 75 + nodeName := strings.Join(segments[:index+1], "/") 76 + return fmt.Errorf("%w: %s: path not found", ErrMalformedPatch, nodeName) 77 + } else { 78 + node = node.children[segment] 79 + } 80 + } 81 + if node.children == nil { 82 + dirName := strings.Join(segments[:len(segments)-1], "/") 83 + return fmt.Errorf("%w: %s: not a directory", ErrMalformedPatch, dirName) 84 + } 85 + 86 + switch header.Typeflag { 87 + case tar.TypeReg: 88 + fileData, err := io.ReadAll(archive) 89 + if err != nil { 90 + return fmt.Errorf("tar: %s: %w", header.Name, err) 91 + } 92 + node.children[fileName] = &Node{ 93 + entry: NewManifestEntry(Type_InlineFile, fileData), 94 + } 95 + case tar.TypeSymlink: 96 + node.children[fileName] = &Node{ 97 + entry: NewManifestEntry(Type_Symlink, []byte(header.Linkname)), 98 + } 99 + case tar.TypeDir: 100 + node.children[fileName] = &Node{ 101 + entry: NewManifestEntry(Type_Directory, nil), 102 + children: map[string]*Node{}, 103 + } 104 + case tar.TypeChar: 105 + if header.Devmajor == 0 && header.Devminor == 0 { 106 + delete(node.children, fileName) 107 + } else { 108 + AddProblem(manifest, header.Name, 109 + "tar: unsupported chardev %d,%d", header.Devmajor, header.Devminor) 110 + } 111 + default: 112 + AddProblem(manifest, header.Name, 113 + "tar: unsupported type '%c'", header.Typeflag) 114 + continue 115 + } 116 + } 117 + 118 + // Repopulate manifest contents with 
the updated directory tree. 119 + var traverse func([]string, *Node) 120 + traverse = func(segments []string, node *Node) { 121 + manifest.Contents[strings.Join(segments, "/")] = node.entry 122 + for fileName, childNode := range node.children { 123 + traverse(append(segments, fileName), childNode) 124 + } 125 + } 126 + traverse([]string{}, root) 127 + return nil 128 + }
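The rules that `ApplyTarPatch` implements are easier to follow with a concrete patch in hand. The following is a minimal client-side sketch, not part of this change: it builds an uncompressed tar stream that replaces one file and deletes another via a whiteout marker (a character device with major 0 and minor 0), then reads the stream back to show how each entry would be classified. The helper names `buildPatch` and `describePatch` and the file names `index.html` and `old.txt` are hypothetical; only the standard `archive/tar` package is used.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// buildPatch (hypothetical helper) returns a tar stream that writes
// `index.html` and removes `old.txt` through a whiteout marker.
func buildPatch() ([]byte, error) {
	var buf bytes.Buffer
	archive := tar.NewWriter(&buf)

	body := []byte("<h1>updated</h1>\n")
	if err := archive.WriteHeader(&tar.Header{
		Typeflag: tar.TypeReg,
		Name:     "index.html",
		Mode:     0o644,
		Size:     int64(len(body)),
	}); err != nil {
		return nil, err
	}
	if _, err := archive.Write(body); err != nil {
		return nil, err
	}

	// Whiteout marker: character device with major 0 and minor 0.
	if err := archive.WriteHeader(&tar.Header{
		Typeflag: tar.TypeChar,
		Name:     "old.txt",
		Mode:     0o644,
		Devmajor: 0,
		Devminor: 0,
	}); err != nil {
		return nil, err
	}

	// Close writes the end-of-archive trailer.
	if err := archive.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// describePatch (hypothetical helper) classifies entries the same way the
// patch rules do: whiteout markers delete, regular files write.
func describePatch(data []byte) ([]string, error) {
	var ops []string
	archive := tar.NewReader(bytes.NewReader(data))
	for {
		header, err := archive.Next()
		if err == io.EOF {
			return ops, nil
		} else if err != nil {
			return nil, err
		}
		switch {
		case header.Typeflag == tar.TypeChar &&
			header.Devmajor == 0 && header.Devminor == 0:
			ops = append(ops, "delete "+header.Name)
		case header.Typeflag == tar.TypeReg:
			ops = append(ops, fmt.Sprintf("write %s (%d bytes)", header.Name, header.Size))
		default:
			ops = append(ops, "other "+header.Name)
		}
	}
}

func main() {
	data, err := buildPatch()
	if err != nil {
		panic(err)
	}
	ops, err := describePatch(data)
	if err != nil {
		panic(err)
	}
	for _, op := range ops {
		fmt.Println(op)
	}
}
```

Since the parent of every entry must already exist and be a directory, a patch that adds `a/b.txt` to a site without `a/` would presumably need a directory entry for `a` earlier in the same archive.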
+90 -20
src/update.go
··· 6 6 "fmt" 7 7 "io" 8 8 "strings" 9 + 10 + "google.golang.org/protobuf/proto" 9 11 ) 10 12 11 13 type UpdateOutcome int ··· 25 27 err error 26 28 } 27 29 28 - func Update(ctx context.Context, webRoot string, manifest *Manifest) UpdateResult { 29 - var oldManifest, newManifest *Manifest 30 + func Update( 31 + ctx context.Context, webRoot string, oldManifest, newManifest *Manifest, 32 + opts ModifyManifestOptions, 33 + ) UpdateResult { 30 34 var err error 35 + var storedManifest *Manifest 31 36 32 37 outcome := UpdateError 33 - oldManifest, _, _ = backend.GetManifest(ctx, webRoot, GetManifestOptions{}) 34 - if IsManifestEmpty(manifest) { 35 - newManifest, err = manifest, backend.DeleteManifest(ctx, webRoot) 38 + if IsManifestEmpty(newManifest) { 39 + storedManifest, err = newManifest, backend.DeleteManifest(ctx, webRoot, opts) 36 40 if err == nil { 37 41 if oldManifest == nil { 38 42 outcome = UpdateNoChange ··· 40 44 outcome = UpdateDeleted 41 45 } 42 46 } 43 - } else if err = PrepareManifest(ctx, manifest); err == nil { 44 - newManifest, err = StoreManifest(ctx, webRoot, manifest) 47 + } else if err = PrepareManifest(ctx, newManifest); err == nil { 48 + storedManifest, err = StoreManifest(ctx, webRoot, newManifest, opts) 45 49 if err == nil { 46 50 domain, _, _ := strings.Cut(webRoot, "/") 47 51 err = backend.CreateDomain(ctx, domain) ··· 49 53 if err == nil { 50 54 if oldManifest == nil { 51 55 outcome = UpdateCreated 52 - } else if CompareManifest(oldManifest, newManifest) { 56 + } else if CompareManifest(oldManifest, storedManifest) { 53 57 outcome = UpdateNoChange 54 58 } else { 55 59 outcome = UpdateReplaced ··· 69 73 case UpdateNoChange: 70 74 status = "unchanged" 71 75 } 72 - if newManifest.Commit != nil { 73 - logc.Printf(ctx, "update %s ok: %s %s", webRoot, status, *newManifest.Commit) 76 + if storedManifest.Commit != nil { 77 + logc.Printf(ctx, "update %s ok: %s %s", webRoot, status, *storedManifest.Commit) 74 78 } else { 75 79 logc.Printf(ctx, "update 
%s ok: %s", webRoot, status) 76 80 } ··· 78 82 logc.Printf(ctx, "update %s err: %s", webRoot, err) 79 83 } 80 84 81 - return UpdateResult{outcome, newManifest, err} 85 + return UpdateResult{outcome, storedManifest, err} 82 86 } 83 87 84 88 func UpdateFromRepository( ··· 92 96 93 97 logc.Printf(ctx, "update %s: %s %s\n", webRoot, repoURL, branch) 94 98 95 - oldManifest, _, _ := backend.GetManifest(ctx, webRoot, GetManifestOptions{}) 96 99 // Ignore errors; worst case we have to re-fetch all of the blobs. 100 + oldManifest, _, _ := backend.GetManifest(ctx, webRoot, GetManifestOptions{}) 97 101 98 - manifest, err := FetchRepository(ctx, repoURL, branch, oldManifest) 102 + newManifest, err := FetchRepository(ctx, repoURL, branch, oldManifest) 99 103 if errors.Is(err, context.DeadlineExceeded) { 100 104 result = UpdateResult{UpdateTimeout, nil, fmt.Errorf("update timeout")} 101 105 } else if err != nil { 102 106 result = UpdateResult{UpdateError, nil, err} 103 107 } else { 104 - result = Update(ctx, webRoot, manifest) 108 + result = Update(ctx, webRoot, oldManifest, newManifest, ModifyManifestOptions{}) 105 109 } 106 110 107 111 observeUpdateResult(result) ··· 116 120 contentType string, 117 121 reader io.Reader, 118 122 ) (result UpdateResult) { 119 - var manifest *Manifest 120 123 var err error 121 124 125 + // Ignore errors; here the old manifest is used only to determine the update outcome. 126 + oldManifest, _, _ := backend.GetManifest(ctx, webRoot, GetManifestOptions{}) 127 + 128 + var newManifest *Manifest 122 129 switch contentType { 123 130 case "application/x-tar": 124 131 logc.Printf(ctx, "update %s: (tar)", webRoot) 125 - manifest, err = ExtractTar(reader) // yellow? 132 + newManifest, err = ExtractTar(reader) // yellow? 126 133 case "application/x-tar+gzip": 127 134 logc.Printf(ctx, "update %s: (tar.gz)", webRoot) 128 - manifest, err = ExtractGzip(reader, ExtractTar) // definitely yellow. 
135 + newManifest, err = ExtractGzip(reader, ExtractTar) // definitely yellow. 129 136 case "application/x-tar+zstd": 130 137 logc.Printf(ctx, "update %s: (tar.zst)", webRoot) 131 - manifest, err = ExtractZstd(reader, ExtractTar) 138 + newManifest, err = ExtractZstd(reader, ExtractTar) 132 139 case "application/zip": 133 140 logc.Printf(ctx, "update %s: (zip)", webRoot) 134 - manifest, err = ExtractZip(reader) 141 + newManifest, err = ExtractZip(reader) 135 142 default: 136 143 err = errArchiveFormat 137 144 } ··· 140 147 logc.Printf(ctx, "update %s err: %s", webRoot, err) 141 148 result = UpdateResult{UpdateError, nil, err} 142 149 } else { 143 - result = Update(ctx, webRoot, manifest) 150 + result = Update(ctx, webRoot, oldManifest, newManifest, ModifyManifestOptions{}) 151 + } 152 + 153 + observeUpdateResult(result) 154 + return 155 + } 156 + 157 + func PartialUpdateFromArchive( 158 + ctx context.Context, 159 + webRoot string, 160 + contentType string, 161 + reader io.Reader, 162 + ) (result UpdateResult) { 163 + var err error 164 + 165 + // Here the old manifest is used both as a substrate to which a patch is applied, as well 166 + // as a "load linked" operation for a future "store conditional" update which, taken together, 167 + // create an atomic compare-and-swap operation. 168 + oldManifest, oldManifestMtime, err := backend.GetManifest(ctx, webRoot, 169 + GetManifestOptions{BypassCache: true}) 170 + if err != nil { 171 + logc.Printf(ctx, "patch %s err: %s", webRoot, err) 172 + return UpdateResult{UpdateError, nil, err} 173 + } 174 + 175 + applyTarPatch := func(reader io.Reader) (*Manifest, error) { 176 + // Clone the manifest before starting to mutate it. `GetManifest` may return cached 177 + // `*Manifest` objects, which should never be mutated. 
178 + newManifest := &Manifest{} 179 + proto.Merge(newManifest, oldManifest) 180 + if err := ApplyTarPatch(newManifest, reader); err != nil { 181 + return nil, err 182 + } else { 183 + return newManifest, nil 184 + } 185 + } 186 + 187 + var newManifest *Manifest 188 + switch contentType { 189 + case "application/x-tar": 190 + logc.Printf(ctx, "patch %s: (tar)", webRoot) 191 + newManifest, err = applyTarPatch(reader) 192 + case "application/x-tar+gzip": 193 + logc.Printf(ctx, "patch %s: (tar.gz)", webRoot) 194 + newManifest, err = ExtractGzip(reader, applyTarPatch) 195 + case "application/x-tar+zstd": 196 + logc.Printf(ctx, "patch %s: (tar.zst)", webRoot) 197 + newManifest, err = ExtractZstd(reader, applyTarPatch) 198 + default: 199 + err = errArchiveFormat 200 + } 201 + 202 + if err != nil { 203 + logc.Printf(ctx, "patch %s err: %s", webRoot, err) 204 + result = UpdateResult{UpdateError, nil, err} 205 + } else { 206 + result = Update(ctx, webRoot, oldManifest, newManifest, 207 + ModifyManifestOptions{IfUnmodifiedSince: oldManifestMtime}) 208 + // The `If-Unmodified-Since` precondition is internally generated here, which means its 209 + // failure shouldn't be surfaced as-is in the HTTP response. If we also accepted options 210 + // from the client, then that precondition failure should surface in the response. 211 + if errors.Is(result.err, ErrPreconditionFailed) { 212 + result.err = ErrWriteConflict 213 + } 144 214 } 145 215 146 216 observeUpdateResult(result)
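The "load linked" / "store conditional" pairing described in `PartialUpdateFromArchive` can be illustrated without the real backend. The sketch below is an assumption-laden model, not the project's storage code: `store`, `load`, and `storeIfUnmodified` are hypothetical names, and an integer version counter stands in for the manifest modification time used by `IfUnmodifiedSince`. A writer that loses the race gets a precondition failure and starts over from a fresh snapshot, which is roughly what a client does after receiving `409 Conflict`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errPreconditionFailed = errors.New("precondition failed")

// store is a hypothetical in-memory stand-in for a backend. The version
// counter plays the role of the manifest modification time.
type store struct {
	mu      sync.Mutex
	files   map[string]string
	version int
}

// load returns a private copy of the contents plus the version it was read at
// ("load linked").
func (s *store) load() (map[string]string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	snapshot := make(map[string]string, len(s.files))
	for name, data := range s.files {
		snapshot[name] = data
	}
	return snapshot, s.version
}

// storeIfUnmodified replaces the contents only if nobody else has written
// since `version` was observed ("store conditional").
func (s *store) storeIfUnmodified(files map[string]string, version int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != version {
		return errPreconditionFailed
	}
	s.files = files
	s.version++
	return nil
}

// patch adds one file, retrying from a fresh snapshot whenever it loses a
// race. It returns the number of attempts that were needed.
func patch(s *store, name, data string) int {
	for attempt := 1; ; attempt++ {
		files, version := s.load()
		files[name] = data
		err := s.storeIfUnmodified(files, version)
		if err == nil {
			return attempt
		} else if !errors.Is(err, errPreconditionFailed) {
			panic(err)
		}
	}
}

// runWriters starts `count` concurrent writers against a site that initially
// holds a single file and returns the final number of files.
func runWriters(count int) int {
	s := &store{files: map[string]string{"index.html": ""}}
	var wg sync.WaitGroup
	for n := 0; n < count; n++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			patch(s, fmt.Sprintf("%d.txt", n), "")
		}(n)
	}
	wg.Wait()
	files, _ := s.load()
	return len(files)
}

func main() {
	fmt.Println("files after concurrent patches:", runWriters(10))
}
```

Without the version check, two writers that loaded the same snapshot would each store a copy missing the other's file; with it, the later writer is rejected and no update is silently lost.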
+112
test/stresspatch/main.go
··· 1 + package main 2 + 3 + import ( 4 + "archive/tar" 5 + "bytes" 6 + "flag" 7 + "fmt" 8 + "io" 9 + "net/http" 10 + "sync" 11 + "time" 12 + ) 13 + 14 + func makeInit() []byte { 15 + writer := bytes.NewBuffer(nil) 16 + archive := tar.NewWriter(writer) 17 + archive.WriteHeader(&tar.Header{ 18 + Typeflag: tar.TypeReg, 19 + Name: "index.html", 20 + }) 21 + archive.Write([]byte{}) 22 + archive.Flush() 23 + return writer.Bytes() 24 + } 25 + 26 + func initSite() { 27 + req, err := http.NewRequest(http.MethodPut, "http://localhost:3000", 28 + bytes.NewReader(makeInit())) 29 + if err != nil { 30 + panic(err) 31 + } 32 + 33 + req.Header.Add("Content-Type", "application/x-tar") 34 + resp, err := http.DefaultClient.Do(req) 35 + if err != nil { 36 + panic(err) 37 + } 38 + defer resp.Body.Close() 39 + } 40 + 41 + func makePatch(n int) []byte { 42 + writer := bytes.NewBuffer(nil) 43 + archive := tar.NewWriter(writer) 44 + archive.WriteHeader(&tar.Header{ 45 + Typeflag: tar.TypeReg, 46 + Name: fmt.Sprintf("%d.txt", n), 47 + }) 48 + archive.Write([]byte{}) 49 + archive.Flush() 50 + return writer.Bytes() 51 + } 52 + 53 + func patchRequest(n int) int { 54 + req, err := http.NewRequest(http.MethodPatch, "http://localhost:3000", 55 + bytes.NewReader(makePatch(n))) 56 + if err != nil { 57 + panic(err) 58 + } 59 + 60 + req.Header.Add("Race-Free", "no") 61 + req.Header.Add("Content-Type", "application/x-tar") 62 + resp, err := http.DefaultClient.Do(req) 63 + if err != nil { 64 + panic(err) 65 + } 66 + defer resp.Body.Close() 67 + 68 + data, err := io.ReadAll(resp.Body) 69 + if err != nil { 70 + panic(err) 71 + } 72 + 73 + fmt.Printf("%d: %s %q\n", n, resp.Status, string(data)) 74 + return resp.StatusCode 75 + } 76 + 77 + func concurrentWriter(wg *sync.WaitGroup, n int) { 78 + for { 79 + if patchRequest(n) == 200 { 80 + break 81 + } 82 + } 83 + wg.Done() 84 + } 85 + 86 + var count = flag.Int("count", 10, "request count") 87 + 88 + func main() { 89 + flag.Parse() 90 + 91 + initSite() 92 + 
time.Sleep(time.Second) 93 + 94 + wg := &sync.WaitGroup{} 95 + for n := range *count { 96 + wg.Add(1) 97 + go concurrentWriter(wg, n) 98 + } 99 + wg.Wait() 100 + 101 + success := 0 102 + for n := range *count { 103 + resp, err := http.Get(fmt.Sprintf("http://localhost:3000/%d.txt", n)) 104 + if err != nil { 105 + panic(err) 106 + } 107 + if resp.StatusCode == 200 { 108 + success++ 109 + } 110 + } 111 + fmt.Printf("written: %d of %d\n", success, *count) 112 + }
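The stress test above retries on every non-200 status and does not close the bodies of its final `GET` responses. A more conservative client loop might retry only on `409 Conflict`, bound the number of attempts, and always drain and close the response body. The following sketch shows that shape under stated assumptions: `sendPatch`, `conflictThenOK`, and `demo` are hypothetical names, and the canned `http.RoundTripper` replaces the real server so that the example runs without a network connection.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// sendPatch issues a PATCH request and retries only while the server answers
// 409 Conflict, up to maxAttempts. It returns the last status code and the
// number of attempts made.
func sendPatch(client *http.Client, url string, body []byte, maxAttempts int) (int, int, error) {
	status := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(body))
		if err != nil {
			return 0, attempt, err
		}
		req.Header.Set("Race-Free", "no")
		req.Header.Set("Content-Type", "application/x-tar")

		resp, err := client.Do(req)
		if err != nil {
			return 0, attempt, err
		}
		// Drain and close the body so the connection can be reused.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()

		status = resp.StatusCode
		if status != http.StatusConflict {
			return status, attempt, nil
		}
	}
	return status, maxAttempts, nil
}

// conflictThenOK is a hypothetical transport that answers 409 Conflict a fixed
// number of times and 200 OK afterwards, standing in for a contended server.
type conflictThenOK struct {
	remaining int
}

func (t *conflictThenOK) RoundTrip(req *http.Request) (*http.Response, error) {
	status := http.StatusOK
	if t.remaining > 0 {
		t.remaining--
		status = http.StatusConflict
	}
	return &http.Response{
		StatusCode: status,
		Status:     fmt.Sprintf("%d %s", status, http.StatusText(status)),
		Header:     http.Header{},
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// demo runs sendPatch against the canned transport.
func demo(conflicts, maxAttempts int) (int, int, error) {
	client := &http.Client{Transport: &conflictThenOK{remaining: conflicts}}
	return sendPatch(client, "http://localhost:3000/", nil, maxAttempts)
}

func main() {
	status, attempts, err := demo(2, 5)
	fmt.Println(status, attempts, err)
}
```

Against a real server, the `*http.Client` would simply be `http.DefaultClient`, and the body would be a tar stream such as the one produced by `makePatch`.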