Custom Bluesky Feed#
Create Bluesky feeds about ANY topic using:
- Simple rules for capturing posts (regular expressions, accounts to include or ignore)
- An internal labeller (both on the CLI and in a web UI)
- A simple random forest classifier trained on labeled posts (aim for at least one thousand labeled posts)
- One server, multiple feeds
This is serving a triathlon feed, live from an old PC at home!
Requirements#
just is optional, but handy.
Usage#
Settings#
| Environment variable | Required | Description | Example |
|---|---|---|---|
| TRI_BSKY_FEED_CONFIG | Yes | Comma-separated paths or URLs to config files (details below) | feed.toml or triathlon.toml,maracatu.toml |
| TRI_BSKY_FEED_FDB_CLUSTER_FILE | Yes | How to connect to the FoundationDB cluster | /etc/foundationdb/fdb.cluster |
| TRI_BSKY_FEED_ORT_LIB_PATH | Yes | Path to the ONNX Runtime shared library (.so, or .dylib on macOS) | /usr/local/lib/libonnxruntime.so |
| TRI_BSKY_FEED_MODEL_DIR | No | Path to the model/ directory with the ONNX model (see below) | Defaults to ./model/ relative to the CLI entrypoint |
| TRI_BSKY_FEED_USER | No | Username to protect the web labeller UI | admin |
| TRI_BSKY_FEED_PASSWORD | No | Password to protect the web labeller UI | s3cr3t |
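As a rough illustration of how the settings above fit together, here is a minimal Go sketch that reads them from the environment. The `Settings` struct and `loadSettings` function are hypothetical names made up for this example, and the way the default model directory is resolved is simplified; the project's real loading code may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Settings is a hypothetical container for the variables in the table above.
type Settings struct {
	Configs     []string // from TRI_BSKY_FEED_CONFIG, split on commas
	ClusterFile string
	OrtLibPath  string
	ModelDir    string
	User        string
	Password    string
}

// loadSettings reads the settings through getenv (os.Getenv in production,
// a fake in tests) and reports every missing required variable at once.
func loadSettings(getenv func(string) string) (Settings, error) {
	s := Settings{
		ClusterFile: getenv("TRI_BSKY_FEED_FDB_CLUSTER_FILE"),
		OrtLibPath:  getenv("TRI_BSKY_FEED_ORT_LIB_PATH"),
		ModelDir:    getenv("TRI_BSKY_FEED_MODEL_DIR"),
		User:        getenv("TRI_BSKY_FEED_USER"),
		Password:    getenv("TRI_BSKY_FEED_PASSWORD"),
	}
	for _, p := range strings.Split(getenv("TRI_BSKY_FEED_CONFIG"), ",") {
		if p = strings.TrimSpace(p); p != "" {
			s.Configs = append(s.Configs, p)
		}
	}

	var missing []string
	if len(s.Configs) == 0 {
		missing = append(missing, "TRI_BSKY_FEED_CONFIG")
	}
	if s.ClusterFile == "" {
		missing = append(missing, "TRI_BSKY_FEED_FDB_CLUSTER_FILE")
	}
	if s.OrtLibPath == "" {
		missing = append(missing, "TRI_BSKY_FEED_ORT_LIB_PATH")
	}
	if len(missing) > 0 {
		return Settings{}, fmt.Errorf("missing required settings: %s", strings.Join(missing, ", "))
	}

	// Assumption: the default is written literally here; the README says it is
	// resolved relative to the CLI entrypoint.
	if s.ModelDir == "" {
		s.ModelDir = "./model/"
	}
	return s, nil
}

func main() {
	s, err := loadSettings(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d config file(s), model dir %s\n", len(s.Configs), s.ModelDir)
}
```

Passing `getenv` as a parameter keeps the sketch testable without touching the real environment.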
Config file#
Feed identity, filter rules, and ranker parameters are controlled by a TOML config file. The file has three sections:
- [feed]: unique name, did and rkey identifying the feed
- [filter]: trusted_accounts (DIDs always included), ignored_accounts (DIDs always ignored), patterns (Go regexps), and exclude (substrings to reject)
- [ranker]: scoring weights, candidate pool size, cutoff window, and a disabled flag for chronological order
See feed.toml in the repository for a fully annotated example (the triathlon feed with current values).
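To make the `[filter]` rules more concrete, the following Go sketch shows one way the four keys could combine into a keep-or-drop decision. The `Filter` type, the `Matches` method, the example pattern, and especially the order of precedence (ignored accounts first, then trusted accounts, then exclusions, then patterns) are assumptions made for illustration; check the annotated feed.toml and the source for the actual behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Filter mirrors the [filter] keys named in the README. The struct itself is
// hypothetical and not taken from the project's code.
type Filter struct {
	TrustedAccounts map[string]bool
	IgnoredAccounts map[string]bool
	Patterns        []*regexp.Regexp
	Exclude         []string
}

// Matches reports whether a post would be kept. Assumed order: ignored
// accounts are dropped, trusted accounts are kept, any exclude substring
// rejects the post, and otherwise at least one pattern must match.
func (f Filter) Matches(authorDID, text string) bool {
	if f.IgnoredAccounts[authorDID] {
		return false
	}
	if f.TrustedAccounts[authorDID] {
		return true
	}
	for _, s := range f.Exclude {
		if strings.Contains(text, s) {
			return false
		}
	}
	for _, p := range f.Patterns {
		if p.MatchString(text) {
			return true
		}
	}
	return false
}

// newExampleFilter builds a filter with made-up DIDs and rules.
func newExampleFilter() Filter {
	return Filter{
		TrustedAccounts: map[string]bool{"did:plc:abc123": true},
		IgnoredAccounts: map[string]bool{"did:plc:xyz456": true},
		Patterns:        []*regexp.Regexp{regexp.MustCompile(`(?i)\btriathl(on|ete)s?\b`)},
		Exclude:         []string{"giveaway"},
	}
}

func main() {
	f := newExampleFilter()
	fmt.Println(f.Matches("did:plc:other", "Signed up for my first Triathlon!")) // true
	fmt.Println(f.Matches("did:plc:other", "Triathlon gear giveaway"))           // false
	fmt.Println(f.Matches("did:plc:abc123", "good morning"))                     // true
}
```

Go regexps use RE2 syntax, so inline flags such as `(?i)` and word boundaries (`\b`) are available, while lookaheads and backreferences are not.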
Compile#
Requires GOEXPERIMENT=jsonv2:
$ just build
Running the feed#
Requires an ONNX model (see how to get started below).
$ tri-bsky-feed run
- Consumes from the firehose, saving relevant posts to the database
- Serves the feed with saved posts that were not excluded by manual labeling (see below)
Getting started#
The tooling in this repo lets you prepare your environment to run the feed yourself. It consists of the three steps below. When multiple feeds are configured, pass --feed <name> to any command to target a specific feed.
Collecting posts#
Just by running the feed as described above, you will start to capture new posts as they appear in the firehose. To add older posts to your database, you can backfill using Tap. Start an instance capturing posts from the entire network:
$ just tap
Or start an instance capturing posts from specific accounts only, adding the desired DIDs after the instance is up:
$ just tap false
$ just tap-add did:plc:abc123 did:plc:xyz456
Then backfill from it. The --since flag controls how far back to look, using Go duration syntax (e.g. 72h for three days). It defaults to 168h (one week).
$ tri-bsky-feed backfill --since 24h ws://localhost:2480
Labeling posts manually#
$ tri-bsky-feed label
Displays unlabeled posts from the database, one at a time, for manual labeling as related or unrelated to each feed.
Pass --web to label posts in a browser UI instead.
Also, you can check statistics from the database:
$ tri-bsky-feed stats
Generating the ONNX model#
First, we need the contents of the database as a CSV for training:
$ tri-bsky-feed export data.csv
You can use tri-bsky-feed import to import data back into the database if needed.
Then, we use Python to train and export an ONNX model:
$ just train data.csv
You can compare different models with just compare data.csv.
After generating your first model, the labeller interfaces above will show you the probability of a given post being related or not.
Developer tooling#
Check tri-bsky-feed --help as well. These commands accept either a post URL or an AT URI.
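The two accepted forms refer to the same record: a bsky.app post URL carries the account and the record key that an AT URI needs. The sketch below shows the mapping under stated assumptions; `toATURI` is a hypothetical helper, not the project's function. It keeps the handle as the AT URI authority instead of resolving it to a DID, since that resolution needs a network call.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toATURI converts a bsky.app post URL into an AT URI. Values that are
// already AT URIs pass through unchanged. Hypothetical helper for illustration.
func toATURI(ref string) (string, error) {
	if strings.HasPrefix(ref, "at://") {
		return ref, nil
	}
	u, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	// Expected path shape: /profile/<handle-or-did>/post/<rkey>
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if u.Host != "bsky.app" || len(parts) != 4 || parts[0] != "profile" || parts[2] != "post" {
		return "", fmt.Errorf("not a Bluesky post URL: %q", ref)
	}
	return "at://" + parts[1] + "/app.bsky.feed.post/" + parts[3], nil
}

func main() {
	uri, err := toATURI("https://bsky.app/profile/cuducos.bsky.social/post/3mejpnaf5ns2a")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(uri) // at://cuducos.bsky.social/app.bsky.feed.post/3mejpnaf5ns2a
}
```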
Saving a specific post#
$ tri-bsky-feed save https://bsky.app/profile/cuducos.bsky.social/post/3mejpnaf5ns2a
Fetches a post and saves it to the database. Use --related or --no-related to label it, or omit both to save as pending.
Testing the classification model for a specific post#
$ tri-bsky-feed classify at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3lpr3nl6jtg2j
Checking a post status in the database#
$ tri-bsky-feed debug at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3lpr3nl6jtg2j
Contributing#
Lint, format and tests:
$ just check