Rust AppView - highly experimental!
# Parakeet

Parakeet is a [Bluesky](https://bsky.app) [AppView](https://atproto.wiki/en/wiki/reference/core-architecture/appview)
aiming to implement most of the functionality required to support the Bluesky client. Notably not implemented is a CDN.

## Status and Roadmap
Most common functionality works. Notable omissions: like/repost/follow statuses are missing, blocks and mutes don't get
applied, labels might not track CIDs properly, and label redaction doesn't work at all (beware!).

Future work is tracked in issues, but the highlights are below. Help would be highly appreciated.
- Notifications
- Search
- Pinned Posts
- The Timeline
- Monitoring: metrics, tracing, and health checks.

## The Code
Parakeet is implemented in Rust, using Postgres as a database, Redis for caching and queue processing, RocksDB for
aggregation, and Diesel for migrations and querying.

This repo is one big Rust workspace, containing nearly everything required to run and support the AppView.

### Packages
- consumer: Relay indexer, Label consumer, Backfiller. Takes raw records in from repos and stores them.
- dataloader-rs: a vendored fork of https://github.com/cksac/dataloader-rs, with some tweaks to fit caching requirements.
- did-resolver: A did:plc and did:web resolver using hickory and reqwest. Supports custom PLC directories.
- lexica: Rust types for the relevant lexicons[sic] for Bluesky.
- parakeet: The core AppView server code. Using Axum and Diesel.
- parakeet-db: Database types and models, also the Diesel schema.
- parakeet-index: Stats aggregator based on RocksDB. Uses gRPC with tonic.
- parakeet-lexgen: A WIP code generator for Lexicon in Rust. Not in use.

There is also a dependency on a fork of [jsonwebtoken](https://gitlab.com/parakeet-social/jsonwebtoken) until upstream
supports ES256K.

## Running
Prebuilt docker images are published (semi) automatically by GitLab CI at https://gitlab.com/parakeet-social/parakeet.
Use `registry.gitlab.com/parakeet-social/parakeet/[package]:[branch]` in your docker-compose.yml. There is currently no
versioning; that will wait until the project is more stable (sorry).
You can also just build with cargo.

To run, you'll need Postgres (version 16 or higher), Redis or a Redis-compatible server, consumer, parakeet, and
parakeet-index.

### Configuring
There are quite a lot of environment variables, although sensible defaults are provided when possible. Variables are
prefixed by `PK`, `PKC`, or `PKI` depending on whether they're used in Parakeet, Consumer, or parakeet-index,
respectively. Some are common to two or three parts, and are marked accordingly.

| Variable | Default | Description |
|----------|---------|-------------|
| (PK/PKC)_INDEX_URI | n/a | Required. URI of the parakeet-index instance in format `[host]:[port]` |
| (PK/PKC)_REDIS_URI | n/a | Required. URI of Redis (or compatible) in format `redis://[host]:[port]` |
| (PK/PKC)_PLC_DIRECTORY | `https://plc.directory` | Optional. A PLC mirror or different instance to use when resolving did:plc. |
| PKC_DATABASE__URL | n/a | Required. URI of Postgres in format `postgres://[user]:[pass]@[host]:[port]/[db]` |
| PKC_UA_CONTACT | n/a | Recommended. Some contact details (email / bluesky handle / website) to add to User-Agent. |
| PKC_LABEL_SOURCE | n/a | Required if consuming Labels. A labeler or label relay to consume. |
| PKC_RESUME_PATH | n/a | Required if consuming relay or label firehose. Where to store the cursor data. |
| PKC_INDEXER__RELAY_SOURCE | n/a | Required if consuming relay. Relay to consume from. |
| PKC_INDEXER__HISTORY_MODE | n/a | Required if consuming relay. `backfill_history` or `realtime` depending on whether you plan to backfill when consuming record data from a relay. |
| PKC_INDEXER__INDEXER_WORKERS | 4 | How many workers to spread indexing work between. 4 or 6 usually works depending on load. Ensure you have enough DB connections available. |
| PKC_INDEXER__START_COMMIT_SEQ | n/a | Optionally, the relay sequence to start consuming from. Overridden by the data in PKC_RESUME_PATH, so clear that first if you reset. |
| PKC_INDEXER__SKIP_HANDLE_VALIDATION | false | Should the indexer SKIP validating handles from `#identity` events. |
| PKC_INDEXER__REQUEST_BACKFILL | false | Should the indexer request backfill when relevant. Only applies when `backfill_history` is set. You likely want TRUE, unless you're manually controlling backfill queues. |
| PKC_BACKFILL__WORKERS | 4 | How many workers to use when backfilling into the DB. Ensure you have enough DB connections available as one is created per worker. |
| PKC_BACKFILL__SKIP_AGGREGATION | false | Whether to skip sending aggregation to parakeet-index. Does not remove the index requirement. Useful when developing. |
| PKC_BACKFILL__DOWNLOAD_WORKERS | 25 | How many workers to use to download repos for backfilling. |
| PKC_BACKFILL__DOWNLOAD_BUFFER | 25000 | How many repos to download and queue. |
| PKC_BACKFILL__DOWNLOAD_TMP_DIR | n/a | Where to download repos to. Ensure there is enough space. |
| (PK/PKI)_SERVER__BIND_ADDRESS | `0.0.0.0` | Address for the server to bind to. For index outside of docker, you probably want loopback as there is no auth. |
| (PK/PKI)_SERVER__PORT | PK: 6000, PKI: 6001 | Port for the server to bind to. |
| (PK/PKI)_DATABASE_URL | n/a | Required. URI of Postgres in format `postgres://[user]:[pass]@[host]:[port]/[db]` |
| PK_SERVICE__DID | n/a | DID for the AppView in did:web. (did:plc is possible but untested) |
| PK_SERVICE__PUBLIC_KEY | n/a | Public key for the AppView. Unsure if actually used, but may be required by PDS. |
| PK_SERVICE__ENDPOINT | n/a | HTTPS publicly accessible endpoint for the AppView. |
| PK_TRUSTED_VERIFIERS | n/a | Optionally, trusted verifiers to use. For many, join with `,`. |
| PK_CDN__BASE | `https://cdn.bsky.app` | Optionally, base URL for a Bluesky compatible CDN |
| PK_CDN__VIDEO_BASE | `https://video.bsky.app` | Optionally, base URL for a Bluesky compatible video CDN |
| PK_DID_ALLOWLIST | n/a | Optional. If set, controls which DIDs can access the AppView. For many, join with `,` |
| PK_MIGRATE | false | Set to TRUE to run database migrations automatically on start. |
| PKI_INDEX_DB_PATH | n/a | Required. Location to store the index database. |
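
Since several variables are marked as unconditionally required, a small preflight check can catch a missing one before a service starts. The following is only a sketch and is not part of Parakeet: the `Component` enum and the `required_vars` and `missing_required` helpers are hypothetical names, and the variable lists are read directly off the table above (expanding `(PK/PKC)_` and `(PK/PKI)_` into one name per prefix). Conditionally required variables, such as `PKC_RESUME_PATH`, are deliberately left out.

```rust
/// Hypothetical helper type: which component we are checking configuration for.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Component {
    Parakeet,
    Consumer,
    Index,
}

/// Variables the README table marks as unconditionally "Required".
/// These lists are an interpretation of the table, not taken from Parakeet's source.
fn required_vars(component: Component) -> &'static [&'static str] {
    match component {
        Component::Parakeet => &["PK_INDEX_URI", "PK_REDIS_URI", "PK_DATABASE_URL"],
        Component::Consumer => &["PKC_INDEX_URI", "PKC_REDIS_URI", "PKC_DATABASE__URL"],
        Component::Index => &["PKI_DATABASE_URL", "PKI_INDEX_DB_PATH"],
    }
}

/// Returns the required variables that are unset or blank.
/// `lookup` abstracts over the environment so the check can run without touching it.
fn missing_required<F>(component: Component, lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    required_vars(component)
        .iter()
        .copied()
        .filter(|&name| lookup(name).map_or(true, |value| value.trim().is_empty()))
        .collect()
}

fn main() {
    for component in [Component::Parakeet, Component::Consumer, Component::Index] {
        // Read from the real process environment.
        let missing = missing_required(component, |name| std::env::var(name).ok());
        if missing.is_empty() {
            println!("{component:?}: all required variables are set");
        } else {
            println!("{component:?}: missing {}", missing.join(", "));
        }
    }
}
```

Passing the lookup as a closure keeps the check independent of the process environment, so the same function could be pointed at a parsed `.env` file or a compose service definition instead.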