Star Spangled

This is the script used for the US Politics Labeler @uspol.bluesky.bot. It has been genericized and published for the benefit of others who may wish to operate similar labelers.

A guide will be published sometime, maybe.

It was developed for use in Docker, on x86_64 Linux. It should work outside of Docker, but it may be more difficult to manage, and there might not be guides for it. I have not tested it on Windows or macOS, so you're on your own if you decide to run it there, which you really shouldn't, since it has to run on an always-on server machine anyway.

Installation

In short, clone this repository (ideally into a sub-folder of where your config files will be), copy out the configuration files and Compose file, and run, either through Docker or Deno.

WARNING

Before running Star Spangled (the final command below), make sure you're running an Ozone instance. You can un-comment a section in the compose.yaml file to run an Ozone instance right next to the automation script. Setup of the Ozone instance is out of scope of this guide; I'll have to write a more fully fledged guide soon.

In long, with recent Docker or Podman already installed:

# Everything after and including # is a comment.
# Everything after the $ is the actual command.
# Omit comments and everything up to and including the $ (the working directory and prompt) when typing into a terminal.
/srv/labeling-service $ git clone https://codeberg.org/bleonard252/star-spangled
[...git clone output...]
/srv/labeling-service $ cp star-spangled/.env.example ./.env
/srv/labeling-service $ cp star-spangled/keywords.tsv.example ./keywords.tsv
/srv/labeling-service $ cp star-spangled/overrides.tsv.example ./overrides.tsv
/srv/labeling-service $ cp star-spangled/compose.yaml.example ./compose.yaml
# HERE: Edit the 4 files you just copied.
/srv/labeling-service $ docker compose up -d
# or use podman instead of docker

TIP

I use the sub-folder method here so that I can run Ozone and Redis with the same Compose file, and store their configuration separately from the source code. You don't necessarily have to do this, since the config files are already listed in .gitignore.

I also recommend doing it this way if you want to run multiple instances, e.g. to use different settings for different situations (such as a lower Escalate And Label threshold for reports, or different polling intervals). This is because I recommend a different login.json for each instance, just to make sure nothing breaks with the login/token mechanism. You could do this with named volumes, but I don't, because if I need to tamper with the file (such as to force a logout), it's a lot more difficult when it's not in an easy, well-known place: next to the Compose file.

Basically it's a lot easier to manage this way.

If all is well, it should now be running in the background. You can check on it with:

docker compose logs

How it works

The script periodically polls (or watches) every source you specify on the command line (lists, feeds, tags) and checks each post for the keywords you specify in keywords.tsv. Posts by an account marked "skip" in overrides.tsv are skipped. Bonus points are then applied (see below), and the score for each label is checked against three thresholds, in order:

If any label's score passes the LABEL_THRESHOLD, all of the labels determined are automatically added to the post (and acknowledged in Ozone).
Otherwise, if any label's score passes the ESCALATE_AND_LABEL_THRESHOLD, all of the labels determined are automatically added to the post, and it is escalated in Ozone for manual review.
Otherwise, if any label's score passes the ESCALATE_THRESHOLD, the labels on the post, if any, are not changed, and the post is escalated in Ozone for manual review.
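The threshold cascade described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the type names and the decideAction function are hypothetical, only the three threshold names come from this README, and it assumes "passes" means "greater than or equal to".

```typescript
// Hypothetical sketch of the threshold cascade; not the project's real code.
type Action = "label" | "escalate-and-label" | "escalate" | "none";

interface Thresholds {
  LABEL_THRESHOLD: number;
  ESCALATE_AND_LABEL_THRESHOLD: number;
  ESCALATE_THRESHOLD: number;
}

function decideAction(scores: Record<string, number>, t: Thresholds): Action {
  // "Any label's score passes X" is the same as "the highest score passes X".
  // Math.max() of an empty list is -Infinity, so a post with no scores yields "none".
  const top = Math.max(...Object.values(scores));

  // Assumption: "passes" is treated as >=.
  if (top >= t.LABEL_THRESHOLD) return "label"; // add labels, acknowledge in Ozone
  if (top >= t.ESCALATE_AND_LABEL_THRESHOLD) return "escalate-and-label"; // add labels, escalate
  if (top >= t.ESCALATE_THRESHOLD) return "escalate"; // leave labels alone, escalate
  return "none";
}
```

The checks run from the strictest threshold down, so a post that clears LABEL_THRESHOLD is never also escalated.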

WARNING

Only Bluesky posts, with the record type app.bsky.feed.post, are currently considered. Whitewind, for example, doesn't check for labels, so it's not very useful there. It's oriented towards accurately blocking content, rather than casting a wide net, so checking non-post material for matches doesn't make much sense.

Possible sources for bonus points:

The or: or bo: label prefixes in keywords.tsv
The "bonus-points-only" special label (which is removed before labels are applied!)
Reported posts (+25 points)
Posts where no other bonus points apply, but multiple labels have positive scores (+25 points; this is called CATEGORY_SHARE_POINTS in the code)
Discovered by crawling the reply thread of a matched post (+15 points)
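As a rough illustration of how the fixed-value bonuses in this list might combine, here is a minimal sketch. It rests on several assumptions that this README does not confirm: that bonuses add together, that they are applied to every label that already has a positive score, and that keyword-derived bonus points (from or:, bo: and "bonus-points-only") arrive as a single pre-computed number. Apart from CATEGORY_SHARE_POINTS, every name here is hypothetical.

```typescript
// Hypothetical sketch; only CATEGORY_SHARE_POINTS is a name taken from the README.
const REPORT_POINTS = 25; // reported posts
const CATEGORY_SHARE_POINTS = 25; // multiple positive labels, no other bonus
const THREAD_CRAWL_POINTS = 15; // found by crawling a matched post's reply thread

interface PostContext {
  reported: boolean;
  foundViaThreadCrawl: boolean;
  keywordBonus: number; // assumed: bonus already derived from keywords.tsv
}

function applyBonus(
  scores: Record<string, number>,
  ctx: PostContext,
): Record<string, number> {
  let bonus = ctx.keywordBonus;
  if (ctx.reported) bonus += REPORT_POINTS;
  if (ctx.foundViaThreadCrawl) bonus += THREAD_CRAWL_POINTS;

  const positive = Object.keys(scores).filter((label) => scores[label] > 0);

  // The category-share bonus only applies when nothing else did.
  if (bonus === 0 && positive.length > 1) bonus = CATEGORY_SHARE_POINTS;

  // Assumption: bonus points go to every label that already has points.
  const out = { ...scores };
  for (const label of positive) out[label] += bonus;
  return out;
}
```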

How reviews are handled

If the reported post has never been seen before, it gets checked for keywords by the script. (A post counts as seen if the report has the Ozone tag "auto-handled" or if it is present in the "alreadyHandled" key in Redis.)
- If no keywords match, the post is escalated anyway. The bot comment in Ozone will tell you this.
If it has been seen before, the post just gets escalated immediately. The bot's comment in Ozone will tell you what happened.
If the reported subject is not a post, it also just gets escalated immediately. The comment reflects the unknown type issue (or that it's an account).
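The report-handling rules above amount to a small decision function. The sketch below is only an illustration: triageReport, its types, and the exact comment wording are hypothetical, and only the "[Automated]" prefix is taken from this README. A result of "score" means the post goes on to the normal threshold checks.

```typescript
// Hypothetical sketch of report triage; comment wording is invented for illustration.
type SubjectKind = "post" | "account" | "unknown";

interface Triage {
  route: "escalate" | "score";
  comment: string;
}

function triageReport(
  kind: SubjectKind,
  seenBefore: boolean,
  matchedKeywords: string[],
): Triage {
  const tag = "[Automated]";

  // Non-post subjects are escalated immediately.
  if (kind !== "post") {
    const why = kind === "account" ? "subject is an account" : "unknown subject type";
    return { route: "escalate", comment: `${tag} Escalated: ${why}.` };
  }

  // Posts that were already handled once are escalated immediately.
  if (seenBefore) {
    return { route: "escalate", comment: `${tag} Escalated: already handled before.` };
  }

  // New posts are keyword-checked; no match still means escalation.
  if (matchedKeywords.length === 0) {
    return { route: "escalate", comment: `${tag} Escalated: no keywords matched.` };
  }

  return { route: "score", comment: `${tag} Matched keywords: ${matchedKeywords.join(", ")}` };
}
```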

Info that's included in the automated comment

Bot comments in Ozone will include:

[Automated], which is ALWAYS there.
What keywords match, if any.
Why a subject was escalated, if it wasn't for the keywords.

Configuration

Star Spangled is configured through three files, plus command-line switches:

.env, which contains fixed settings such as credentials and thresholds.
keywords.tsv, which is a three-column tab-separated values file (which uses # as a line-comment character) holding keywords to match against, the labels they match for, and the score (a type of weight) to apply to the label when a keyword matches a post.
- This has the special pseudo-label "bonus-points-only", which applies points to every label that has points, and the prefixes or: (which applies bonus points to the labels that have points, or to the prefixed label if no other label has a positive score) and bo: (which applies bonus points to only that prefixed label, and only if it already has a positive score).
overrides.tsv, which is formatted similarly to keywords.tsv, except with two columns: the identifier (DIDs only) of the account to override, and either the word "skip" or the bonus points to apply to all posts by that account.
- This feature is useful for accounts that you know do not post things your labeler should cover but that frequently produce false positives (or accounts that you know do post relevant content that frequently produces false negatives). One example skipped entry in the US Politics Labeler is the Auschwitz Museum.
Various command-line switches, which indicate search modes:

  --poll-feed                <feedAtUri>  - A feed to check periodically. You'll need the feed's AT URI.
  --poll-chronological-feed  <feedAtUri>  - A feed to check periodically. You'll need the feed's AT URI. Automatically paginates.
  --poll-list                <listAtUri>  - A list to check periodically. This may point to an AT URI of a Bluesky list, or to a file where each line is the DID or handle of the account to crawl.
  --hashtag                  <hashtag>    - A hashtag (without the hash) or hidden tag to check periodically.
  --firehose                              - Listen on the firehose.
  --poll-reports                          - Poll for reports.
  --crawl-thread                          - Crawl other posts in a thread.
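To make the keywords.tsv format described above more concrete, here is a minimal parsing sketch. It assumes the columns appear in the order keyword, label, score, and that a comment is any line whose first non-blank character is #; neither detail is confirmed beyond the description above. The KeywordRule type and parseKeywordsTsv function are hypothetical and are not part of Star Spangled.

```typescript
// Hypothetical parser for a keywords.tsv-style file; not the project's real loader.
interface KeywordRule {
  keyword: string;
  label: string; // may carry an or: or bo: prefix, or be "bonus-points-only"
  score: number;
}

function parseKeywordsTsv(text: string): KeywordRule[] {
  const rules: KeywordRule[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const trimmed = rawLine.trim();
    // Assumption: only whole-line comments are supported.
    if (trimmed === "" || trimmed.startsWith("#")) continue;

    // Assumed column order: keyword <TAB> label <TAB> score.
    const [keyword, label, scoreText] = rawLine.split("\t");
    const score = Number(scoreText);
    if (!keyword || !label || !Number.isFinite(score)) continue; // skip malformed rows

    rules.push({ keyword: keyword.trim(), label: label.trim(), score });
  }
  return rules;
}
```

Malformed rows are silently skipped here to keep the sketch short; a real loader would probably want to report them.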
