# Parakeet

Parakeet is a [Bluesky](https://bsky.app) [AppView](https://atproto.wiki/en/wiki/reference/core-architecture/appview)
aiming to implement most of the functionality required to support the Bluesky client. Notably, it does not implement a CDN.

## The Code
Parakeet is implemented in Rust, using Postgres as the database, Redis for caching and queue processing, RocksDB for
aggregation, and Diesel for migrations and querying.

This repo is one big Rust workspace containing nearly everything required to run and support the AppView.

### Packages
- consumer: Relay indexer, label consumer, and backfiller. Takes raw records in from repos and stores them.
- dataloader-rs: A vendored fork of https://github.com/cksac/dataloader-rs, with some tweaks to fit caching requirements.
- did-resolver: A did:plc and did:web resolver using hickory and reqwest. Supports custom PLC directories.
- lexica: Rust types for the relevant lexicons[sic] for Bluesky.
- parakeet: The core AppView server code, using Axum and Diesel.
- parakeet-db: Database types and models, plus the Diesel schema.
- parakeet-index: Stats aggregator based on RocksDB. Uses gRPC via tonic.
- parakeet-lexgen: A WIP Lexicon code generator for Rust. Not in use.

There is also a dependency on a fork of [jsonwebtoken](https://gitlab.com/parakeet-social/jsonwebtoken) until upstream
supports ES256K.

## Running
Prebuilt Docker images are published (semi-)automatically by GitLab CI at https://gitlab.com/parakeet-social/parakeet.
Use `registry.gitlab.com/parakeet-social/parakeet/[package]:[branch]` in your docker-compose.yml. Images are not
versioned yet, and won't be until the project is more stable (sorry).
You can also just build with cargo.

To run, you'll need Postgres (version 16 or higher), Redis or a Redis-compatible server, and the consumer, parakeet, and
parakeet-index services.
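The requirements above can be sketched as a small preflight helper. This is a hypothetical illustration, not code from Parakeet: the function and constant names are made up, the assumption that each image is named after its workspace package (`consumer`, `parakeet`, `parakeet-index`) is unverified, and the version string is assumed to come from Postgres's `SHOW server_version;`.

```rust
// Hypothetical preflight helpers for the "Running" requirements.
// Not part of Parakeet; names and image layout are assumptions.

/// Registry path from the README; `[package]` and `[branch]` are filled in below.
const REGISTRY: &str = "registry.gitlab.com/parakeet-social/parakeet";

/// Services that must run alongside Postgres and Redis.
/// Assumption: each image is named after its workspace package.
const SERVICES: [&str; 3] = ["consumer", "parakeet", "parakeet-index"];

/// Minimum supported Postgres major version, per the README.
const MIN_POSTGRES_MAJOR: u32 = 16;

/// Fills the `[package]:[branch]` template.
fn image_ref(package: &str, branch: &str) -> String {
    format!("{}/{}:{}", REGISTRY, package, branch)
}

/// Image references for every required Parakeet service on one branch.
fn service_images(branch: &str) -> Vec<String> {
    SERVICES
        .iter()
        .map(|package| image_ref(package, branch))
        .collect()
}

/// Parses the leading major version from a string such as the output of
/// `SHOW server_version;` (e.g. "16.2" or "16.2 (Debian 16.2-1)").
fn postgres_major(server_version: &str) -> Option<u32> {
    let digits: String = server_version
        .trim()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// True if the server meets the "version 16 or higher" requirement.
fn postgres_supported(server_version: &str) -> bool {
    postgres_major(server_version).map_or(false, |major| major >= MIN_POSTGRES_MAJOR)
}
```

With this sketch, `16.2` passes the version check and `15.6` does not; an unparseable string is treated as unsupported rather than guessed at.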
### Configuring
There are quite a lot of environment variables, although sensible defaults are provided where possible. Variables are
prefixed by `PK`, `PKC`, or `PKI` depending on whether they're used by parakeet, consumer, or parakeet-index, respectively.
Some are shared by two or three components and are marked accordingly: for example, `(PK/PKC)_INDEX_URI` means both
`PK_INDEX_URI` and `PKC_INDEX_URI`.

| Variable | Default | Description |
|---|---|---|
| (PK/PKC)_INDEX_URI | n/a | Required. URI of the parakeet-index instance in format `[host]:[port]`. |
| (PK/PKC)_REDIS_URI | n/a | Required. URI of Redis (or compatible) in format `redis://[host]:[port]`. |
| (PK/PKC)_PLC_DIRECTORY | `https://plc.directory` | Optional. A PLC mirror or alternative instance to use when resolving did:plc. |
| PKC_DATABASE__URL | n/a | Required. URI of Postgres in format `postgres://[user]:[pass]@[host]:[port]/[db]`. |
| PKC_UA_CONTACT | n/a | Recommended. Contact details (email, Bluesky handle, or website) to add to the User-Agent. |
| PKC_LABEL_SOURCE | n/a | Required if consuming labels. A labeler or label relay to consume from. |
| PKC_RESUME_PATH | n/a | Required if consuming the relay or label firehose. Where to store the cursor data. |
| PKC_INDEXER__RELAY_SOURCE | n/a | Required if consuming a relay. The relay to consume from. |
| PKC_INDEXER__HISTORY_MODE | n/a | Required if consuming a relay. `backfill_history` or `realtime`, depending on whether you plan to backfill when consuming record data from a relay. |
| PKC_INDEXER__INDEXER_WORKERS | 4 | How many workers to spread indexing work between. 4 or 6 usually works, depending on load. Ensure you have enough DB connections available. |
| PKC_INDEXER__START_COMMIT_SEQ | n/a | Optional. The relay sequence number to start consuming from. Overridden by the data in `PKC_RESUME_PATH`, so clear that first if you reset. |
| PKC_INDEXER__SKIP_HANDLE_VALIDATION | false | Whether the indexer should SKIP validating handles from `#identity` events. |
| PKC_INDEXER__REQUEST_BACKFILL | false | Whether the indexer should request backfill when relevant. Only applies when `backfill_history` is set. You likely want TRUE unless you're manually controlling backfill queues. |
| PKC_BACKFILL__WORKERS | 4 | How many workers to use when backfilling into the DB. Ensure you have enough DB connections available, as one is created per worker. |
| PKC_BACKFILL__SKIP_AGGREGATION | false | Whether to skip sending aggregation to parakeet-index. Does not remove the index requirement. Useful when developing. |
| PKC_BACKFILL__DOWNLOAD_WORKERS | 25 | How many workers to use to download repos for backfilling. |
| PKC_BACKFILL__DOWNLOAD_BUFFER | 25000 | How many repos to download and queue. |
| PKC_BACKFILL__DOWNLOAD_TMP_DIR | n/a | Where to download repos to. Ensure there is enough space. |
| (PK/PKI)_SERVER__BIND_ADDRESS | `0.0.0.0` | Address for the server to bind to. For the index outside of Docker, you probably want loopback, as it has no authentication. |
| (PK/PKI)_SERVER__PORT | PK: 6000, PKI: 6001 | Port for the server to bind to. |
| (PK/PKI)_DATABASE_URL | n/a | Required. URI of Postgres in format `postgres://[user]:[pass]@[host]:[port]/[db]`. |
| PK_SERVICE__DID | n/a | DID for the AppView, as a did:web (did:plc is possible but untested). |
| PK_SERVICE__PUBLIC_KEY | n/a | Public key for the AppView. Unsure if actually used, but may be required by the PDS. |
| PK_SERVICE__ENDPOINT | n/a | Publicly accessible HTTPS endpoint for the AppView. |
| PK_TRUSTED_VERIFIERS | n/a | Optional. Trusted verifiers to use. For multiple, join with `,`. |
| PK_CDN__BASE | `https://cdn.bsky.app` | Optional. Base URL for a Bluesky-compatible CDN. |
| PK_CDN__VIDEO_BASE | `https://video.bsky.app` | Optional. Base URL for a Bluesky-compatible video CDN. |
| PK_DID_ALLOWLIST | n/a | Optional. If set, controls which DIDs can access the AppView. For multiple, join with `,`. |
| PK_MIGRATE | false | Set to TRUE to run database migrations automatically on start. |
| PKI_INDEX_DB_PATH | n/a | Required. Location to store the index database. |
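To make the naming scheme in the table concrete, here is a hypothetical Rust sketch. It is not Parakeet's actual config loader. It expands a shared name such as `(PK/PKC)_INDEX_URI` into one variable per component, and it maps a variable to a lowercase `section.key` path. The mapping assumes that `__` separates a config section from its key, which is what names like `PKC_INDEXER__RELAY_SOURCE` suggest. All function and constant names are made up for illustration.

```rust
// Hypothetical helpers illustrating the environment variable naming scheme.
// Not Parakeet's real config loader; the `__` mapping is an assumption.

/// Component prefixes from the README: consumer, parakeet-index, parakeet.
const PREFIXES: [&str; 3] = ["PKC_", "PKI_", "PK_"];

/// Expands a shared name like `(PK/PKC)_INDEX_URI` into one variable per
/// component (`PK_INDEX_URI`, `PKC_INDEX_URI`). Other names pass through.
fn expand_shared(name: &str) -> Vec<String> {
    if let Some(rest) = name.strip_prefix('(') {
        if let Some((prefixes, suffix)) = rest.split_once(')') {
            return prefixes
                .split('/')
                .map(|prefix| format!("{}{}", prefix, suffix))
                .collect();
        }
    }
    vec![name.to_string()]
}

/// Maps a variable to a lowercase `section.key` path, assuming `__` separates
/// a config section from its key (e.g. `PKC_INDEXER__RELAY_SOURCE`).
/// Returns `None` for variables without a known component prefix.
fn config_path(var: &str) -> Option<String> {
    let rest = PREFIXES
        .iter()
        .find_map(|prefix| var.strip_prefix(*prefix))?;
    let parts: Vec<String> = rest.split("__").map(str::to_lowercase).collect();
    Some(parts.join("."))
}
```

Under that assumption, `PKC_INDEXER__RELAY_SOURCE` maps to `indexer.relay_source`, while single-underscore names such as `PK_MIGRATE` stay flat (`migrate`). Check the config code in each package before relying on this mapping.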