infrastructure
Single-tenant AT Protocol infrastructure on Hetzner Cloud. Runs a PDS (sans-self.org), a Tangled knot (knot.sans-self.org), and a Zot container registry (zot.sans-self.org) on a k3s cluster.
Stack
- Infra: OpenTofu (kube-hetzner module v2.18.5) + Hetzner Cloud
- Cluster: k3s with 3× CAX21 ARM nodes (HA embedded etcd, nbg1)
- Storage: JuiceFS CSI backed by Hetzner Object Storage (S3), Redis metadata
- Blobs: PDS blob storage on S3 (native, not filesystem)
- State: OpenTofu remote backend on S3
- Secrets: git-crypt
- Manifests: Kustomize
- DNS: Hetzner DNS (sans-self.org apex + wildcard → Traefik LB)
Secrets
All files matching k8s/**/*.secret are git-crypt encrypted. Unlock before building:
git-crypt unlock
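To confirm the unlock took effect, one option is to look for files that still carry git-crypt's encrypted header. This is a minimal sketch: the function names are hypothetical, and it assumes (to my understanding of git-crypt's file format) that an encrypted blob starts with a NUL byte, the text GITCRYPT, and another NUL byte.

```shell
# Return 0 when a file still begins with git-crypt's encrypted-blob header
# (assumed to be: NUL byte, the ASCII text GITCRYPT, NUL byte).
is_git_crypt_locked() {
  [ "$(head -c 10 "$1" | tr -d '\000')" = "GITCRYPT" ]
}

# Print every *.secret file under a directory that still looks encrypted.
list_locked_secrets() {
  find "$1" -type f -name '*.secret' | while IFS= read -r f; do
    if is_git_crypt_locked "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage from the repository root (k8s is the path this README uses):
#   list_locked_secrets k8s
```

If the listing is empty after unlocking, the secret files are readable as plaintext. git-crypt's own status subcommand is the authoritative check.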
k8s/juicefs/metaurl.secret is derived from k8s/juicefs/redis-password.secret — it embeds the Redis password in a connection URI. If you rotate the password, regenerate it:
make secrets
The password file must not have a trailing newline. The Makefile handles this correctly.
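The trailing-newline rule and the derivation can be illustrated by hand. This is only a sketch of the idea, not the Makefile's actual recipe: it works in a scratch directory, and the Redis host, port, database index, and password are placeholders.

```shell
# Illustration only: host, port, DB index, and password are placeholders,
# and the real derivation lives in the Makefile.
workdir="$(mktemp -d)"
pw_file="$workdir/redis-password.secret"
url_file="$workdir/metaurl.secret"

# printf '%s' writes the value without a trailing newline (echo would add one).
printf '%s' 'example-password' > "$pw_file"

# Command substitution strips trailing newlines, so an empty result here
# means the last byte is a newline (or the file is empty).
if [ -z "$(tail -c 1 "$pw_file")" ]; then
  echo "password file is empty or ends with a newline" >&2
fi

# Embed the password in a Redis connection URI (hypothetical host and DB).
printf 'redis://:%s@redis.example.internal:6379/1' "$(cat "$pw_file")" > "$url_file"
cat "$url_file"
```

A password containing characters that are special in URIs (such as @ or /) would need percent-encoding before being embedded. The sketch does not handle that.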
Deploy
Manifests:
kubectl apply -k k8s/
The JuiceFS CSI driver is managed by OpenTofu as a helm_release resource:
tofu apply
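Before applying, it can help to preview both halves of the deploy. The sketch below wraps standard kubectl and tofu subcommands in two helper functions. The function names are made up for illustration, and it assumes kubectl is pointed at this cluster and tofu is run from the directory holding the OpenTofu configuration.

```shell
# Hypothetical helpers; nothing runs until a function is called.

# Diff the rendered Kustomize output against the live cluster.
# kubectl diff exits 0 when nothing would change, 1 when differences
# exist, and greater than 1 on error.
preview_manifests() {
  kubectl diff -k k8s/
}

# Show what OpenTofu would change, including the JuiceFS CSI helm_release.
preview_infra() {
  tofu plan
}

# Usage:
#   preview_manifests; echo "kubectl diff exit code: $?"
#   preview_infra
```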
Cluster rebuild
If the cluster needs to be rebuilt from scratch (new nodes, not just config changes):
- The first control plane node must bootstrap with cluster-init: true in /etc/rancher/k3s/config.yaml. kube-hetzner doesn't handle this automatically when all nodes are new.
- Hetzner can't shrink disks. Switching to a smaller server type (e.g. cx33 → cax11) requires tainting the server resource:
  tofu taint 'module.kube-hetzner.module.control_planes["0-0-control-plane"].hcloud_server.server'
- After bootstrap, fetch a fresh kubeconfig from the node. The one in tofu state will have the wrong CA.
- JuiceFS CSI on SELinux (MicroOS) requires sidecarPrivileged: true under the node: key in juicefs-csi-values.yaml. Without it, the CSI socket has a label mismatch and sidecars can't connect.
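Three of the rebuild steps come down to small file edits. The sketch below writes the two YAML fragments into a scratch directory rather than the real paths, and shows one way to point a kubeconfig copied from the node at a reachable address. It assumes the node's kubeconfig is the k3s default /etc/rancher/k3s/k3s.yaml with a loopback server address. The IP 203.0.113.10, the SSH user, and the function name are placeholders.

```shell
scratch="$(mktemp -d)"

# 1. Key needed in the first control plane node's k3s config.
#    On the node this belongs in /etc/rancher/k3s/config.yaml alongside the
#    existing keys; it is written to a scratch file here so nothing is overwritten.
cat > "$scratch/config.yaml" <<'EOF'
cluster-init: true
EOF

# 2. JuiceFS CSI Helm values fragment (in the repo: juicefs-csi-values.yaml).
cat > "$scratch/juicefs-csi-values.yaml" <<'EOF'
node:
  sidecarPrivileged: true
EOF

# 3. Rewrite the loopback API server address in a kubeconfig read from stdin.
rewrite_kubeconfig_server() {
  node_ip="$1"
  sed "s#https://127\.0\.0\.1:6443#https://${node_ip}:6443#"
}

# Hypothetical usage against a real node (NODE_IP and root SSH are assumptions):
#   ssh root@"$NODE_IP" cat /etc/rancher/k3s/k3s.yaml \
#     | rewrite_kubeconfig_server "$NODE_IP" > kubeconfig.yaml

# Self-contained demonstration on a single sample line:
printf 'server: https://127.0.0.1:6443\n' | rewrite_kubeconfig_server 203.0.113.10
```

Whether the node IP or a load balancer address is the right target depends on how the cluster exposes the API server. The rewrite only changes the address, not the certificates embedded in the file.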
Backups
Daily S3 backups via CronJobs (02:00 PDS, 02:30 knot). See RESTORE.md for recovery procedures.
After a PDS restore, the sequencer autoincrement must be bumped past the relay's cursor — see RESTORE.md section "Fix sequencer cursor".