# Parakeet

Parakeet is a [Bluesky](https://bsky.app) [AppView](https://atproto.wiki/en/wiki/reference/core-architecture/appview)
aiming to implement most of the functionality required to support the Bluesky client. Notably not implemented is a CDN.

## The Code
Parakeet is implemented in Rust, using Postgres as a database, Redis for caching and queue processing, RocksDB for
aggregation, and Diesel for migrations and querying.

This repo is one big Rust workspace, containing nearly everything required to run and support the AppView.

### Packages
- consumer: Relay indexer, Label consumer, Backfiller. Takes raw records in from repos and stores them.
- dataloader-rs: A vendored fork of https://github.com/cksac/dataloader-rs, with some tweaks to fit caching requirements.
- did-resolver: A did:plc and did:web resolver using hickory and reqwest. Supports custom PLC directories.
- lexica: Rust types for the relevant lexicons[sic] for Bluesky.
- parakeet: The core AppView server code, using Axum and Diesel.
- parakeet-db: Database types and models, plus the Diesel schema.
- parakeet-index: Stats aggregator based on RocksDB. Uses gRPC with tonic.
- parakeet-lexgen: A WIP code generator for Lexicon in Rust. Not in use.

There is also a dependency on a fork of [jsonwebtoken](https://gitlab.com/parakeet-social/jsonwebtoken) until upstream
supports ES256K.

## Running
Prebuilt Docker images are published (semi-)automatically by GitLab CI at https://gitlab.com/parakeet-social/parakeet.
Use `registry.gitlab.com/parakeet-social/parakeet/[package]:[branch]` in your docker-compose.yml. There will be no
versioning until the project is more stable (sorry).
You can also just build with cargo.

To run, you'll need Postgres (version 16 or higher), Redis or a Redis-compatible server, consumer, parakeet, and
parakeet-index.

### Configuring
There are quite a lot of environment variables, although sensible defaults are provided when possible. Variables are
prefixed by `PK`, `PKC`, or `PKI` depending on whether they're used in parakeet, consumer, or parakeet-index, respectively.
Some are common to two or three parts, and are marked accordingly.
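
As a rough illustration of the prefix scheme, the sketch below splits a variable name into the component it targets and a lowercase key path. It assumes that a double underscore (as in `PKC_INDEXER__RELAY_SOURCE`) separates a config section from its key; the variable names suggest this, but it is an assumption rather than documented behaviour. `Component` and `parse_var` are hypothetical names and are not part of Parakeet.

```rust
/// Which component a variable belongs to, inferred from its prefix.
#[derive(Debug, PartialEq)]
enum Component {
    Parakeet,      // PK_
    Consumer,      // PKC_
    ParakeetIndex, // PKI_
}

/// Splits a name such as `PKC_INDEXER__RELAY_SOURCE` into its component and
/// a lowercase key path (`["indexer", "relay_source"]`).
/// Returns `None` if the name carries none of the three prefixes.
fn parse_var(name: &str) -> Option<(Component, Vec<String>)> {
    // Everything before the first underscore is the prefix.
    let (prefix, rest) = name.split_once('_')?;
    let component = match prefix {
        "PK" => Component::Parakeet,
        "PKC" => Component::Consumer,
        "PKI" => Component::ParakeetIndex,
        _ => return None,
    };
    // Assumption: `__` separates a config section from the key inside it.
    let path: Vec<String> = rest.split("__").map(|s| s.to_lowercase()).collect();
    Some((component, path))
}
```

Under that assumption, `parse_var("PK_MIGRATE")` yields the Parakeet component with the single-element path `["migrate"]`, while a name without one of the three prefixes yields `None`.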

| Variable | Default | Description |
|----------|---------|-------------|
| (PK/PKC)_INDEX_URI | n/a | Required. URI of the parakeet-index instance in format `[host]:[port]` |
| (PK/PKC)_REDIS_URI | n/a | Required. URI of Redis (or compatible) in format `redis://[host]:[port]` |
| (PK/PKC)_PLC_DIRECTORY | `https://plc.directory` | Optional. A PLC mirror or different instance to use when resolving did:plc. |
| PKC_DATABASE__URL | n/a | Required. URI of Postgres in format `postgres://[user]:[pass]@[host]:[port]/[db]` |
| PKC_UA_CONTACT | n/a | Recommended. Some contact details (email / bluesky handle / website) to add to User-Agent. |
| PKC_LABEL_SOURCE | n/a | Required if consuming Labels. A labeler or label relay to consume. |
| PKC_RESUME_PATH | n/a | Required if consuming relay or label firehose. Where to store the cursor data. |
| PKC_INDEXER__RELAY_SOURCE | n/a | Required if consuming relay. Relay to consume from. |
| PKC_INDEXER__HISTORY_MODE | n/a | Required if consuming relay. `backfill_history` or `realtime` depending on whether you plan to backfill when consuming record data from a relay. |
| PKC_INDEXER__INDEXER_WORKERS | 4 | How many workers to spread indexing work between. 4 or 6 usually works depending on load. Ensure you have enough DB connections available. |
| PKC_INDEXER__START_COMMIT_SEQ | n/a | Optionally, the relay sequence to start consuming from. Overridden by the data in PKC_RESUME_PATH, so clear that first if you reset. |
| PKC_INDEXER__SKIP_HANDLE_VALIDATION | false | Whether the indexer should skip validating handles from `#identity` events. |
| PKC_INDEXER__REQUEST_BACKFILL | false | Whether the indexer should request backfill when relevant. Only applies when `backfill_history` is set. You likely want TRUE, unless you're manually controlling backfill queues. |
| PKC_BACKFILL__WORKERS | 4 | How many workers to use when backfilling into the DB. Ensure you have enough DB connections available as one is created per worker. |
| PKC_BACKFILL__SKIP_AGGREGATION | false | Whether to skip sending aggregation to parakeet-index. Does not remove the index requirement. Useful when developing. |
| PKC_BACKFILL__DOWNLOAD_WORKERS | 25 | How many workers to use to download repos for backfilling. |
| PKC_BACKFILL__DOWNLOAD_BUFFER | 25000 | How many repos to download and queue. |
| PKC_BACKFILL__DOWNLOAD_TMP_DIR | n/a | Where to download repos to. Ensure there is enough space. |
| (PK/PKI)_SERVER__BIND_ADDRESS | `0.0.0.0` | Address for the server to bind to. For index outside of docker, you probably want loopback as there is no auth. |
| (PK/PKI)_SERVER__PORT | PK: 6000, PKI: 6001 | Port for the server to bind to. |
| (PK/PKI)_DATABASE_URL | n/a | Required. URI of Postgres in format `postgres://[user]:[pass]@[host]:[port]/[db]` |
| PK_SERVICE__DID | n/a | DID for the AppView in did:web. (did:plc is possible but untested) |
| PK_SERVICE__PUBLIC_KEY | n/a | Public key for the AppView. Unsure if actually used, but may be required by PDS. |
| PK_SERVICE__ENDPOINT | n/a | HTTPS publicly accessible endpoint for the AppView. |
| PK_TRUSTED_VERIFIERS | n/a | Optionally, trusted verifiers to use. For many, join with `,`. |
| PK_CDN__BASE | `https://cdn.bsky.app` | Optionally, base URL for a Bluesky compatible CDN. |
| PK_CDN__VIDEO_BASE | `https://video.bsky.app` | Optionally, base URL for a Bluesky compatible video CDN. |
| PK_DID_ALLOWLIST | n/a | Optional. If set, controls which DIDs can access the AppView. For many, join with `,`. |
| PK_MIGRATE | false | Set to TRUE to run database migrations automatically on start. |
| PKI_INDEX_DB_PATH | n/a | Required. Location to store the index database. |
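
Before starting a component, it can help to check that the variables marked as unconditionally required in the table are actually set. The following is a minimal sketch of such a preflight check. `required_vars` and `missing_vars` are hypothetical helpers, the component names are simply the package names, and the lists are copied from the table rather than from Parakeet's source, so treat them as assumptions. Conditionally required variables (such as `PKC_RESUME_PATH`) are deliberately left out.

```rust
use std::collections::HashMap;

/// Variables the table marks as unconditionally required, per component.
/// Assumption: these lists mirror the README table, not Parakeet's code.
fn required_vars(component: &str) -> &'static [&'static str] {
    match component {
        "parakeet" => &["PK_INDEX_URI", "PK_REDIS_URI", "PK_DATABASE_URL"],
        "consumer" => &["PKC_INDEX_URI", "PKC_REDIS_URI", "PKC_DATABASE__URL"],
        "parakeet-index" => &["PKI_DATABASE_URL", "PKI_INDEX_DB_PATH"],
        // Unknown component names have no requirements in this sketch.
        _ => &[],
    }
}

/// Returns the required variables that are unset or blank in `env`,
/// in the order they are listed in `required_vars`.
fn missing_vars(component: &str, env: &HashMap<String, String>) -> Vec<&'static str> {
    required_vars(component)
        .iter()
        .copied()
        .filter(|name| env.get(*name).map_or(true, |v| v.trim().is_empty()))
        .collect()
}
```

One possible use is to call `missing_vars("consumer", &std::env::vars().collect())` at startup and print any names it returns before exiting.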