A triathlon feed for Bluesky


Custom Bluesky Feed

Create Bluesky feeds about ANY topic using:

  1. Simple rules for capturing posts (regular expressions, accounts to include or ignore)
  2. An internal labeller (both on the CLI and in a web UI)
  3. A simple random forest classifier based on labeled posts (aim for at least one thousand labeled posts)
  4. One server, multiple feeds

This code currently serves a triathlon feed, live from an old PC at home!

Requirements

  • Go 1.26
  • uv (for generating the ONNX model)

just is optional, but handy.

Usage

Settings

| Environment variable | Required | Description | Example |
|---|---|---|---|
| TRI_BSKY_FEED_CONFIG | Yes | Comma-separated paths or URLs to config files (details below) | feed.toml or triathlon.toml,maracatu.toml |
| TRI_BSKY_FEED_FDB_CLUSTER_FILE | Yes | How to connect to the FoundationDB cluster | /etc/foundationdb/fdb.cluster |
| TRI_BSKY_FEED_ORT_LIB_PATH | Yes | Path to the ONNX runtime shared library (.so, or .dylib for macOS) | /usr/local/lib/libonnxruntime.so |
| TRI_BSKY_FEED_MODEL_DIR | No | Path to the model/ directory with the ONNX model (see below) | Defaults to ./model/ relative to the CLI entrypoint |
| TRI_BSKY_FEED_USER | No | Username to protect the web labeller UI | admin |
| TRI_BSKY_FEED_PASSWORD | No | Password to protect the web labeller UI | s3cr3t |
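
To make the required/optional split concrete, here is a minimal sketch of how these settings could be read and validated. It is not the project's actual code: the Settings struct, its field names, and loadSettings are hypothetical, and only the environment variable names come from the settings above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Settings mirrors the environment variables listed above.
// The struct and its field names are hypothetical.
type Settings struct {
	ConfigSources  []string // paths or URLs, split on commas
	FDBClusterFile string
	ORTLibPath     string
	ModelDir       string // empty means "use the default"
	User, Password string // optional credentials for the web labeller UI
}

// loadSettings reads values through getenv so it can be exercised
// without touching the real process environment.
func loadSettings(getenv func(string) string) (Settings, error) {
	var missing []string
	required := func(key string) string {
		v := strings.TrimSpace(getenv(key))
		if v == "" {
			missing = append(missing, key)
		}
		return v
	}

	s := Settings{
		FDBClusterFile: required("TRI_BSKY_FEED_FDB_CLUSTER_FILE"),
		ORTLibPath:     required("TRI_BSKY_FEED_ORT_LIB_PATH"),
		ModelDir:       getenv("TRI_BSKY_FEED_MODEL_DIR"),
		User:           getenv("TRI_BSKY_FEED_USER"),
		Password:       getenv("TRI_BSKY_FEED_PASSWORD"),
	}
	// One entry per config file: "triathlon.toml,maracatu.toml" yields two.
	for _, src := range strings.Split(required("TRI_BSKY_FEED_CONFIG"), ",") {
		if src = strings.TrimSpace(src); src != "" {
			s.ConfigSources = append(s.ConfigSources, src)
		}
	}
	if len(missing) > 0 {
		return Settings{}, errors.New("missing required settings: " + strings.Join(missing, ", "))
	}
	return s, nil
}

func main() {
	s, err := loadSettings(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d config source(s): %v\n", len(s.ConfigSources), s.ConfigSources)
}
```

Reporting every missing variable at once, rather than failing on the first, saves a restart cycle when setting up a new machine.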

Config file

Feed identity, filter rules, and ranker parameters are controlled by a TOML config file. The file has three sections:

  • [feed]: unique name, did and rkey identifying the feed
  • [filter]: trusted_accounts (DIDs always included), ignored_accounts (DIDs always ignored), patterns (Go regexps), and exclude (substrings to reject)
  • [ranker]: scoring weights, candidate pool size, cutoff window, and a disabled flag for chronological order

See feed.toml in the repository for a fully annotated example (the triathlon feed with current values).

Compile

Requires GOEXPERIMENT=jsonv2:

$ just build

Running the feed

Requires an ONNX model (see how to get started below).

$ tri-bsky-feed run

This command:

  • Consumes from the firehose, saving relevant posts to the database
  • Serves the feed with saved posts that were not excluded by manual labeling (see below)
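
For context on the serving side: Bluesky clients fetch custom feeds through the app.bsky.feed.getFeedSkeleton XRPC endpoint, which returns a list of post AT URIs plus an optional cursor. The sketch below shows what a minimal handler for that endpoint could look like. It is not the project's implementation: the in-memory post list, the offset-based cursor, and the page and handler functions are assumptions, it ignores the feed query parameter, and it uses the standard encoding/json package rather than the jsonv2 experiment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// skeleton follows the shape of the getFeedSkeleton response.
type skeleton struct {
	Cursor string         `json:"cursor,omitempty"`
	Feed   []skeletonItem `json:"feed"`
}

type skeletonItem struct {
	Post string `json:"post"` // AT URI of the post
}

// page returns one page of posts. Using the offset as the cursor is an
// assumption; any opaque string works as long as the server understands it.
func page(posts []string, cursor string, limit int) skeleton {
	start, err := strconv.Atoi(cursor)
	if err != nil || start < 0 {
		start = 0
	}
	if start > len(posts) {
		start = len(posts)
	}
	end := start + limit
	if end > len(posts) {
		end = len(posts)
	}
	out := skeleton{Feed: make([]skeletonItem, 0, end-start)}
	for _, uri := range posts[start:end] {
		out.Feed = append(out.Feed, skeletonItem{Post: uri})
	}
	if end < len(posts) {
		out.Cursor = strconv.Itoa(end)
	}
	return out
}

// handler serves a fixed list of posts, newest first.
func handler(posts []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		limit, err := strconv.Atoi(q.Get("limit"))
		if err != nil || limit < 1 || limit > 100 {
			limit = 50 // the lexicon's default; valid values are 1 to 100
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(page(posts, q.Get("cursor"), limit))
	}
}

func main() {
	posts := []string{
		"at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3lpr3nl6jtg2j",
	}
	// Exercise the handler in-process instead of opening a port.
	req := httptest.NewRequest(http.MethodGet, "/xrpc/app.bsky.feed.getFeedSkeleton?limit=10", nil)
	rec := httptest.NewRecorder()
	handler(posts).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```

A real server would register the handler with http.Handle and select the post list based on the feed parameter, since one server can host multiple feeds.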

Getting started

The tooling in this repo lets you prepare your environment to run the feed yourself, in the three steps below. When multiple feeds are configured, pass --feed <name> to any command to target a specific feed.

Collecting posts

Just by running the feed as described above, you will start capturing new posts as they appear in the firehose. To grow your database faster, you can backfill using Tap. Start an instance capturing posts from the entire network:

$ just tap

Or start an instance capturing posts from specific accounts only, adding the desired DIDs after the instance is up:

$ just tap false
$ just tap-add did:plc:abc123 did:plc:xyz456

Then backfill from it. The --since flag controls how far back to look, using Go duration syntax (e.g. 72h for three days). It defaults to 168h (one week).

$ tri-bsky-feed backfill --since 24h ws://localhost:2480

Labeling posts manually

$ tri-bsky-feed label

Displays unlabeled posts from the database, one at a time, for manual labeling as related or unrelated to each feed.

Pass --web to label in a browser UI instead.

Also, you can check statistics from the database:

$ tri-bsky-feed stats

Generating the ONNX model

First, we need the contents of the database as a CSV for training:

$ tri-bsky-feed export data.csv

You can use tri-bsky-feed import to load the data back into the database if needed.

Then, we use Python to train and export an ONNX model:

$ just train data.csv

You can compare different models with just compare data.csv.

After generating your first model, the labeller interfaces above will show you the probability of a given post being related or not.

Developer tooling

Check tri-bsky-feed --help as well. These commands accept either a post URL or an AT URI.
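
The two accepted forms carry the same information: an actor (a handle or a DID) and a record key. The sketch below extracts both from either form. The parsePostRef function is hypothetical and is not the project's parser. It assumes bsky.app as the only web host, and it does not resolve a handle to a DID, which a canonical AT URI would require.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parsePostRef returns the actor (handle or DID) and record key from
// either a bsky.app post URL or an at:// URI.
func parsePostRef(s string) (string, string, error) {
	if rest, ok := strings.CutPrefix(s, "at://"); ok {
		parts := strings.Split(rest, "/")
		if len(parts) != 3 || parts[0] == "" || parts[1] != "app.bsky.feed.post" || parts[2] == "" {
			return "", "", fmt.Errorf("not a post AT URI: %q", s)
		}
		return parts[0], parts[2], nil
	}

	u, err := url.Parse(s)
	if err != nil {
		return "", "", fmt.Errorf("invalid URL %q: %w", s, err)
	}
	// Expected path: /profile/<actor>/post/<rkey>
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if u.Host != "bsky.app" || len(parts) != 4 || parts[0] != "profile" || parts[2] != "post" {
		return "", "", fmt.Errorf("not a post URL: %q", s)
	}
	return parts[1], parts[3], nil
}

func main() {
	refs := []string{
		"https://bsky.app/profile/cuducos.bsky.social/post/3mejpnaf5ns2a",
		"at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3lpr3nl6jtg2j",
	}
	for _, ref := range refs {
		actor, rkey, err := parsePostRef(ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("actor=%s rkey=%s\n", actor, rkey)
	}
}
```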

Saving a specific post

$ tri-bsky-feed save https://bsky.app/profile/cuducos.bsky.social/post/3mejpnaf5ns2a

Fetches a post and saves it to the database. Use --related or --no-related to label it, or omit both to save as pending.

Testing the classification model for a specific post

$ tri-bsky-feed classify at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3lpr3nl6jtg2j

Checking a post's status in the database

$ tri-bsky-feed debug at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3lpr3nl6jtg2j

Contributing

Lint, format and tests:

$ just check