# Parakeet

Parakeet is a Bluesky AppView that aims to implement most of the functionality required to support the Bluesky client. Notably, a CDN is not implemented.
## Status and Roadmap

Most common functionality works. Notable omissions: like/repost/follow viewer states are missing, blocks and mutes are not applied, labels might not track CIDs properly, and label redaction doesn't work at all (beware!).
Future work is tracked in issues, but the highlights are below. Help would be highly appreciated.
- Notifications
- Search
- Pinned Posts
- The Timeline
- Monitoring: metrics, tracing, and health checks.
## The Code
Parakeet is implemented in Rust, using Postgres as a database, Redis for caching and queue processing, RocksDB for aggregation, and Diesel for migrations and querying.
This repo is one big Rust workspace, containing nearly everything required to run and support the AppView.
### Packages
- consumer: Relay indexer, Label consumer, Backfiller. Takes raw records in from repos and stores them.
- dataloader-rs: A vendored fork of https://github.com/cksac/dataloader-rs, with some tweaks to fit caching requirements.
- did-resolver: A did:plc and did:web resolver using hickory and reqwest. Supports custom PLC directories.
- lexica: Rust types for the relevant Bluesky lexicons.
- parakeet: The core AppView server code. Using Axum and Diesel.
- parakeet-db: Database types and models, also the Diesel schema.
- parakeet-index: Stats aggregator based on RocksDB. Uses gRPC with tonic.
- parakeet-lexgen: A WIP code generator for Lexicon in Rust. Not in use.
There is also a dependency on a fork of jsonwebtoken until upstream supports ES256K.
## Running
Prebuilt docker images are published (semi) automatically by GitLab CI at https://gitlab.com/parakeet-social/parakeet.
Use registry.gitlab.com/parakeet-social/parakeet/[package]:[branch] in your docker-compose.yml. There is currently no versioning until the project is more stable (sorry).
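For example, pulling the prebuilt images might look like this. The branch tag (`main` here) is an assumption; substitute whichever branch you want to track.

```shell
# Pull the three runtime images from the GitLab registry.
# "main" is an assumed branch name -- use the branch you actually track.
docker pull registry.gitlab.com/parakeet-social/parakeet/consumer:main
docker pull registry.gitlab.com/parakeet-social/parakeet/parakeet:main
docker pull registry.gitlab.com/parakeet-social/parakeet/parakeet-index:main
```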
You can also just build with cargo.
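Building from source is a standard cargo workspace build; a minimal sketch, assuming you build the three runtime packages named above from the repo root:

```shell
# Build the runtime binaries in release mode from the workspace root.
cargo build --release -p consumer -p parakeet -p parakeet-index
# The resulting binaries land in target/release/.
```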
To run, you'll need Postgres (version 16 or higher), Redis (or a Redis-compatible server), plus the consumer, parakeet, and parakeet-index services.
## Configuring

There are quite a lot of environment variables, although sensible defaults are provided when possible. Variables are prefixed by PK, PKC, or PKI depending on whether they're used in Parakeet, Consumer, or parakeet-index, respectively. Some are common to two or three parts, and are marked accordingly.
| Variable | Default | Description |
|---|---|---|
| (PK/PKC)_INDEX_URI | n/a | Required. URI of the parakeet-index instance in format [host]:[port] |
| (PK/PKC)_REDIS_URI | n/a | Required. URI of Redis (or compatible) in format redis://[host]:[port] |
| (PK/PKC)_PLC_DIRECTORY | https://plc.directory | Optional. A PLC mirror or different instance to use when resolving did:plc. |
| PKC_DATABASE__URL | n/a | Required. URI of Postgres in format postgres://[user]:[pass]@[host]:[port]/[db] |
| PKC_UA_CONTACT | n/a | Recommended. Some contact details (email / bluesky handle / website) to add to User-Agent. |
| PKC_LABEL_SOURCE | n/a | Required if consuming Labels. A labeler or label relay to consume. |
| PKC_RESUME_PATH | n/a | Required if consuming relay or label firehose. Where to store the cursor data. |
| PKC_INDEXER__RELAY_SOURCE | n/a | Required if consuming relay. Relay to consume from. |
| PKC_INDEXER__HISTORY_MODE | n/a | Required if consuming relay. backfill_history or realtime, depending on whether you plan to backfill when consuming record data from a relay. |
| PKC_INDEXER__INDEXER_WORKERS | 4 | How many workers to spread indexing work between. 4 or 6 usually works depending on load. Ensure you have enough DB connections available. |
| PKC_INDEXER__START_COMMIT_SEQ | n/a | Optionally, the relay sequence to start consuming from. Overridden by the data in PKC_RESUME_PATH, so clear that first if you reset. |
| PKC_INDEXER__SKIP_HANDLE_VALIDATION | false | Whether the indexer should SKIP validating handles from #identity events. |
| PKC_INDEXER__REQUEST_BACKFILL | false | Whether the indexer should request backfill when relevant. Only applies when HISTORY_MODE is backfill_history. You likely want TRUE, unless you're manually controlling backfill queues. |
| PKC_BACKFILL__WORKERS | 4 | How many workers to use when backfilling into the DB. Ensure you have enough DB connections available as one is created per worker. |
| PKC_BACKFILL__SKIP_AGGREGATION | false | Whether to skip sending aggregation to parakeet-index. Does not remove the index requirement. Useful when developing. |
| PKC_BACKFILL__DOWNLOAD_WORKERS | 25 | How many workers to use to download repos for backfilling. |
| PKC_BACKFILL__DOWNLOAD_BUFFER | 25000 | How many repos to download and queue. |
| PKC_BACKFILL__DOWNLOAD_TMP_DIR | n/a | Where to download repos to. Ensure there is enough space. |
| (PK/PKI)_SERVER__BIND_ADDRESS | 0.0.0.0 | Address for the server to bind to. For index outside of Docker, you probably want loopback as there is no auth. |
| (PK/PKI)_SERVER__PORT | PK: 6000, PKI: 6001 | Port for the server to bind to. |
| (PK/PKI)_DATABASE_URL | n/a | Required. URI of Postgres in format postgres://[user]:[pass]@[host]:[port]/[db] |
| PK_SERVICE__DID | n/a | DID for the AppView, as did:web (did:plc is possible but untested). |
| PK_SERVICE__PUBLIC_KEY | n/a | Public key for the AppView. Unsure if actually used, but may be required by PDS. |
| PK_SERVICE__ENDPOINT | n/a | HTTPS publicly accessible endpoint for the AppView. |
| PK_TRUSTED_VERIFIERS | n/a | Optional. Trusted verifiers to use. To pass multiple values, join them with commas. |
| PK_CDN__BASE | https://cdn.bsky.app | Optional. Base URL for a Bluesky-compatible CDN. |
| PK_CDN__VIDEO_BASE | https://video.bsky.app | Optional. Base URL for a Bluesky-compatible video CDN. |
| PK_DID_ALLOWLIST | n/a | Optional. If set, controls which DIDs can access the AppView. To pass multiple values, join them with commas. |
| PK_MIGRATE | false | Set to TRUE to run database migrations automatically on start. |
| PKI_INDEX_DB_PATH | n/a | Required. Location to store the index database. |
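Putting the table together, a minimal single-host configuration might look like the sketch below. Hostnames, credentials, paths, and the relay endpoint are placeholders, not recommendations; adjust them for your deployment.

```shell
# --- consumer ---
export PKC_DATABASE__URL=postgres://parakeet:secret@localhost:5432/parakeet
export PKC_REDIS_URI=redis://localhost:6379
export PKC_INDEX_URI=localhost:6001
export PKC_RESUME_PATH=/var/lib/parakeet/cursor
export PKC_INDEXER__RELAY_SOURCE=wss://bsky.network   # placeholder relay endpoint
export PKC_INDEXER__HISTORY_MODE=realtime

# --- parakeet (AppView server) ---
export PK_DATABASE_URL=postgres://parakeet:secret@localhost:5432/parakeet
export PK_REDIS_URI=redis://localhost:6379
export PK_INDEX_URI=localhost:6001
export PK_SERVICE__DID=did:web:appview.example.com
export PK_SERVICE__ENDPOINT=https://appview.example.com
export PK_MIGRATE=true   # run migrations automatically on start

# --- parakeet-index ---
export PKI_SERVER__BIND_ADDRESS=127.0.0.1   # loopback: index has no auth
export PKI_INDEX_DB_PATH=/var/lib/parakeet/index
```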