`relay`: atproto relay reference implementation
===============================================

*NOTE: "relays" used to be called "Big Graph Servers", or "BGS", or "bigsky". Many variables and packages still reference "bgs".*

This is a reference implementation of an atproto relay, written and operated by Bluesky.

In [atproto](https://atproto.com), a relay subscribes to multiple PDS hosts and outputs a combined "firehose" event stream. Downstream services can subscribe to this single firehose and get all relevant events for the entire network, or a specific sub-graph of the network. The relay verifies repo data structure integrity and identity signatures. It is application-agnostic, and does not validate data records against atproto Lexicon schemas.
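
As a quick, non-authoritative illustration: any WebSocket client can connect to a relay's firehose endpoint (the host below is Bluesky's public relay, mentioned again in the bootstrapping section; the `websocat` usage is an assumption, and the frames are binary DAG-CBOR, so the `goat` tool shown later is more practical for actually reading events):

    # assumes the websocat tool is installed; '-b' requests binary mode
    websocat -b "wss://relay1.us-west.bsky.network/xrpc/com.atproto.sync.subscribeRepos"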

This relay implementation is designed to subscribe to the entire global network. The current state of the codebase is informally expected to scale to around 100 million accounts in the network, and tens of thousands of repo events per second (peak).

Features and design decisions:

- runs on a single server (not a distributed system)
- upstream host and account state is stored in a SQL database
- SQL driver: [gorm](https://gorm.io), supporting PostgreSQL in production and sqlite for testing
- highly concurrent; not particularly CPU intensive
- single golang binary for easy deployment
- observability: logging, prometheus metrics, OTEL traces
- admin web interface: configure limits, add upstream PDS instances, etc

This daemon is relatively simple to self-host, though it isn't as well documented or supported as the PDS reference implementation (see details below).

See `./HACKING.md` for more documentation of specific behaviors of this implementation.


## Development Tips

The README and Makefile at the top level of this git repo have some generic helpers for testing, linting, formatting code, etc.

To build the admin web interface, and then build and run the relay locally:

    make build-relay-admin-ui
    make run-dev-relay

You can run the command directly to get a list of configuration flags and environment variables. The environment will be loaded from a `.env` file if one exists:

    go run ./cmd/relay/ --help

You can also build and run the command directly:

    go build ./cmd/relay
    ./relay serve

By default, the daemon will use sqlite for databases (in the directory `./data/relay/`), and the HTTP API will be bound to localhost port 2470.
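
With the daemon running, a quick liveness check against the default port (using the health check endpoint described under "Configuration and Operation" below):

    curl http://localhost:2470/xrpc/_health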

When the daemon isn't running, sqlite database files can be inspected with:

    sqlite3 data/relay/relay.sqlite
    [...]
    sqlite> .schema

To wipe all local data (careful!):

    # double-check before running this destructive command
    rm -rf ./data/relay/*

There is a basic web dashboard, though it will not be included unless built and copied to the local directory `./public/`. Run `make build-relay-admin-ui`, and then when running the daemon the dashboard will be available at <http://localhost:2470/dash/>. Paste in the admin key, eg `dummy`.

The local admin routes can also be accessed by passing the admin password using HTTP Basic auth (with username `admin`), for example:

    http get :2470/admin/pds/list -a admin:dummy
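
The equivalent request with `curl`, if you don't have httpie installed:

    curl -u admin:dummy http://localhost:2470/admin/pds/list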

Request crawl of an individual PDS instance like:

    http post :2470/admin/pds/requestCrawl -a admin:dummy hostname=pds.example.com

The `goat` command line tool (also part of the indigo git repository) includes helpers for administering, inspecting, and debugging relays:

    RELAY_HOST=http://localhost:2470 goat firehose --verify-mst
    RELAY_HOST=http://localhost:2470 goat relay admin host list

## API Endpoints

This relay implements the core atproto "sync" API endpoints:

- `GET /xrpc/com.atproto.sync.subscribeRepos` (WebSocket)
- `GET /xrpc/com.atproto.sync.getRepo` (HTTP redirect to account's PDS)
- `GET /xrpc/com.atproto.sync.getRepoStatus`
- `GET /xrpc/com.atproto.sync.listRepos` (optional)
- `GET /xrpc/com.atproto.sync.getLatestCommit` (optional)

It also implements some relay-specific endpoints (example calls follow the list):

- `POST /xrpc/com.atproto.sync.requestCrawl`
- `GET /xrpc/com.atproto.sync.listHosts`
- `GET /xrpc/com.atproto.sync.getHostStatus`
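
For example, against a local instance on the default port (`pds.example.com` is a placeholder hostname):

    # list upstream hosts the relay knows about
    curl http://localhost:2470/xrpc/com.atproto.sync.listHosts

    # ask the relay to subscribe to a PDS host
    curl -X POST http://localhost:2470/xrpc/com.atproto.sync.requestCrawl \
        -H "Content-Type: application/json" \
        -d '{"hostname": "pds.example.com"}'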

Documentation can be found in the [atproto specifications](https://atproto.com/specs/sync) for repository synchronization, event streams, data formats, account status, etc.

This implementation also has some off-protocol admin endpoints under `/admin/`. These have legacy schemas from an earlier implementation, are not well documented, and should not be considered a stable API to build upon. The intention is to refactor them into Lexicon-specified APIs.

## Configuration and Operation

*NOTE: this document is not a complete guide to operating a relay as a public service. That requires planning around acceptable use policies, financial sustainability, infrastructure selection, etc. This is just a quick overview of the mechanics of getting a relay up and running.*

Some notable configuration env vars (an example `.env` file follows the list):

- `RELAY_ADMIN_PASSWORD`
- `DATABASE_URL`: eg, `postgres://relay:CHANGEME@localhost:5432/relay`
- `RELAY_PERSIST_DIR`: storage location for "backfill" events, eg `/data/relay/persist`
- `RELAY_REPLAY_WINDOW`: the duration of the output "backfill window", eg `24h`
- `RELAY_LENIENT_SYNC_VALIDATION`: if `true`, allow legacy upstreams which don't implement atproto sync v1.1
- `RELAY_TRUSTED_DOMAINS`: patterns of PDS hosts which get larger quotas by default, eg `*.host.bsky.network`
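
A minimal example `.env` for a PostgreSQL-backed deployment might look like this (values are illustrative placeholders, not defaults):

    RELAY_ADMIN_PASSWORD=CHANGEME
    DATABASE_URL=postgres://relay:CHANGEME@localhost:5432/relay
    RELAY_PERSIST_DIR=/data/relay/persist
    RELAY_REPLAY_WINDOW=24h
    RELAY_TRUSTED_DOMAINS=*.host.bsky.network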

There is a health check endpoint at `/xrpc/_health`. Prometheus metrics are exposed by default on port 2471, path `/metrics`. The service logs fairly verbosely to stdout; use `LOG_LEVEL` to control log volume (`warn`, `info`, etc).
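
For example, to spot-check metrics on a local instance (assuming the default metrics port):

    curl http://localhost:2471/metrics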

Be sure to double-check bandwidth usage and pricing if running a public relay! Bandwidth prices can vary widely between providers, and popular cloud services (AWS, Google Cloud, Azure) are very expensive compared to alternatives like OVH or Hetzner.

The relay admin interface has flexibility for many situations, but in some operational incidents it may be necessary to run SQL commands to do cleanups. This should be done when the relay is not actively operating. It is also recommended to run SQL commands in a transaction that can be rolled back in case of a typo or mistake.
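
For example, a manual cleanup session in `psql` can be wrapped in a transaction like this (a sketch only; the connection string matches the earlier `DATABASE_URL` example, and the actual cleanup statements depend on the incident):

    psql "postgres://relay:CHANGEME@localhost:5432/relay"
    relay=> BEGIN;
    relay=> -- run and double-check cleanup statements here
    relay=> COMMIT;  -- or ROLLBACK; if anything looks wrong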

On the public web, you should probably run the relay behind a load-balancer or reverse proxy like `haproxy` or `caddy`, which manages TLS and can have various HTTP limits and behaviors configured. Remember that WebSocket support is required.
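
As a minimal sketch (the hostname is a placeholder), caddy's one-liner reverse proxy mode terminates TLS automatically and proxies WebSockets by default:

    caddy reverse-proxy --from relay.example.com --to localhost:2470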

The relay does not resolve atproto handles, but it does do DNS resolution of hostnames, and may do a burst of resolutions at startup. Note that the go runtime may have an internal DNS implementation enabled (this is the default for the Dockerfile). The relay *will* do a large number of DID resolutions, particularly calls to the PLC directory, and particularly after a process restart when the in-process identity cache is warming up.

### PostgreSQL

PostgreSQL is recommended for any non-trivial relay deployment. Database configuration is passed via the `DATABASE_URL` environment variable, or the corresponding CLI arg.

The user and database must already be configured. For example:

    CREATE DATABASE relay;

    CREATE USER ${username} WITH PASSWORD '${password}';
    GRANT ALL PRIVILEGES ON DATABASE relay TO ${username};

This service currently uses `gorm` to automatically run database migrations as the regular user. There is no support for running database migrations separately under a more privileged database user.

### Docker

The relay is relatively easy to build and operate as a simple executable, but there is also a Dockerfile in this directory. It can be used to build customized/patched versions of the relay as a container, republish them, run locally, deploy to servers, deploy to an orchestrated cluster, etc.

Relays process a lot of packets, so we strongly recommend running docker in "host networking" mode when operating a full-network relay. You may also want to use something other than the default docker log management (eg, `svlogd`), to handle large log volumes.
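
A minimal sketch of building and running the container with host networking (the image tag is arbitrary, and the build context may need adjusting to match the Dockerfile; `.env` is assumed to hold your configuration):

    docker build -t relay .
    docker run --network host --env-file .env relay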

### Bootstrapping Host List

Before bulk-adding hosts, you should probably increase the "new-hosts-per-day" limit, at least temporarily.

The relay comes with a helper command to pull a list of hosts from an existing relay. You should shut the relay down first and run this as a separate command:

    ./relay pull-hosts

An alternative method, using `goat` and `parallel`, which is more gentle and may be better for small servers:

    # dump a host list using goat
    # 'rg' is ripgrep
    RELAY_HOST=https://relay1.us-west.bsky.network goat relay host list | rg '\tactive' | cut -f1 > hosts.txt

    # assuming that .env contains local relay configuration and admin credential
    shuf hosts.txt | parallel goat relay admin host add {}