A tool for parsing traffic on the Jetstream and applying a moderation workstream based on regexp-based rules.

16: Update readme.md

Changed files
+81 -21
README.md
- # skywatch-tools
+ # Skywatch Automod
+
+ This project provides tools for automating moderation of the Bluesky social network. It listens to the Bluesky firehose stream, analyzes various types of content against user-defined rules, and performs moderation actions such as applying labels, reporting content, or leaving comments.
+
+ ## Features
+
+ - **Real-time Moderation:** Monitors the Bluesky firehose in real-time.
+ - **Content-Aware Analysis:** Analyzes posts, user profiles (display names, descriptions), and handles
+ - **Flexible Rule Engine:** Uses regular expressions for defining moderation checks.
+ - **Variety of Actions:** Can apply labels, create reports (for posts or accounts), and post comments on accounts.
+ - **Configurable:** Highly configurable through environment variables and a central constants file.
+ - **Allowlisting:** Supports allowlisting for DIDs and text patterns to reduce false positives.
+ - **URL Unshortening:** Automatically resolves shortened URLs in posts before checking them.
+ - **Monitoring:** Exposes a Prometheus metrics endpoint to monitor its activity. (untested)
+ - **Resilient:** Persists the firehose cursor to gracefully handle restarts without missing events.
+
+ ## How It Works
+
+ The application connects to the Bluesky firehose and subscribes to a set of collections (e.g., posts, profiles). When a new event is received, it is passed through a series of checks defined in `src/constants.ts`. These checks are categorized by content type:
+
+ - `POST_CHECKS`: For post content and links.
+ - `HANDLE_CHECKS`: For user handles.
+ - `PROFILE_CHECKS`: For user display names and descriptions.
+
+ If the content matches a check's criteria (and is not excluded by an allowlist), a corresponding moderation action is triggered. These actions (labeling, reporting, etc.) are performed using the Bluesky API.
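
The check flow described under "How It Works" can be sketched roughly as follows. This is only an illustration under stated assumptions: the diff does not show the real shape of a check in `src/constants.ts`, so the `Check` fields, the `matchChecks` helper, the sample pattern, and the sample DID are all hypothetical.

```typescript
// Hypothetical check shape; the real fields in src/constants.ts may differ.
interface Check {
  label: string; // label or report reason associated with a match
  check: RegExp; // pattern that triggers the check
  allowText?: RegExp; // text pattern that suppresses a match (allowlist)
  allowDids?: string[]; // accounts exempt from this check (allowlist)
  action: "label" | "report" | "comment";
}

// Invented example entry, named after the README's POST_CHECKS category.
const POST_CHECKS: Check[] = [
  {
    label: "example-spam",
    check: /free\s+crypto/i,
    allowText: /not\s+free\s+crypto/i,
    allowDids: ["did:plc:trusted-example"],
    action: "label",
  },
];

// Returns the checks that fire for a piece of text posted by a given account.
// Allowlists are consulted first, so an allowlisted DID or phrase never matches.
function matchChecks(checks: Check[], did: string, text: string): Check[] {
  return checks.filter((c) => {
    if (c.allowDids?.includes(did)) return false;
    if (c.allowText?.test(text)) return false;
    return c.check.test(text);
  });
}
```

Each returned check would then be handed to whatever code performs the labeling, reporting, or commenting. The patterns deliberately omit the `g` flag, because a global `RegExp` keeps state between `test` calls and would give inconsistent results when reused across events.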
+
+ ## Getting Started
+
+ ### Prerequisites

- This is a rewrite of the original skywatch-tools project in TypeScript. The original project was written in Bash. The purpose of this project is to automate the moderation by the Bluesky independent labeler skywatch.blue
+ - Node.js (v20 or higher recommended)
+ - `bun` package manager
+ - A Bluesky account for the bot.
+ - A Bluesky labeler account

- ## Installation and Setup
+ ### 1. Installation

- To install dependencies:
+ Clone the repository and install the dependencies:

  ```bash
- bun i
+ git clone <repository-url>
+ cd skywatch-automod-public
+ bun install
  ```

- Modify .env.example with your own values and rename it to .env
+ ### Configuration
+
+ There are two main configuration files you need to set up:
+
+ - **Checks (`src/constants.ts`):**
+   This file defines the rules for your automod. You need to create it by copying the example file:
+
+   ```bash
+   cp src/constants.ts.example src/constants.ts
+   ```
+
+   Then, edit `src/constants.ts` to define your own checks. For detailed instructions on how to create checks, please see [developing_checks.md](./src/developing_checks.md).
+
+ - **Environment Variables (`.env`):**
+   This file contains credentials and other runtime configuration. You will need to create a `.env` file and populate it with your specific values. You can use `.env.example` as a reference if it exists in the
+
+ ### 3. Running the Application
+
+ Once configured, you can start the automod with:

  ```bash
  bun run start
  ```

- To run in docker:
+ ### 4. Running with Docker
+
+ You can also build and run the application as a Docker container.

  ```bash
+ # Build the container
  docker build -pull -t skywatch-automod .
+
+ # Run the container
  docker run -d -p 4101:4101 skywatch-automod
  ```

- ## Brief overview
+ Make sure your `.env` file is present when building the Docker image, as it will be copied into the container.

- Currently this tooling does one thing. It monitors the bluesky firehose and analyzes content for phrases which fit Skywatch's criteria for moderation. If the criteria is met, it can automatically label the content with the appropriate label.
+ #### Configuration Variables

- In certain cases, where regexp will create too many false positives, it will flag content as a report against related to the account, so that it can be reviewed later.
+ The following environment variables are used for configuration:

- For information on how to set-up your own checks, please see the [developing_checks.md](./src/developing_checks.md) file.
-
- _TODO_:
-
- - [ ] Remove unused types
- - [ ] Update the types needed to be more specific to the checks rather than bluesky content types
- - [ ] Consider how to write directly to OzonePDS database rather than using the API. May require running the same instance as Ozone to allow for direct database access.
- - [ ] Add compose.yaml for easy deployment
- - [ ] Make the metrics server work (or remove it)
-
- Create a seperate program to watch OZONE_PDS firehose labels, and update the lists as needed. This will remove dependency on broken ruby tools created by aegis.
+ | Variable | Description | Default |
+ | ------------------------ | ---------------------------------------------------------------- | ----------------------------------------- |
+ | `DID` | The DID of your moderation service for atproto-proxy headers. | `""` |
+ | `OZONE_URL` | The URL of the Ozone service. | `""` |
+ | `OZONE_PDS` | The Public Downstream Service for Ozone. | `""` |
+ | `BSKY_HANDLE` | The handle (username) of the bot's Bluesky account. | `""` |
+ | `BSKY_PASSWORD` | The app password for the bot's Bluesky account. | `""` |
+ | `HOST` | The host on which the server runs. | `127.0.0.1` |
+ | `PORT` | The port for the main application (currently unused). | `4100` |
+ | `METRICS_PORT` | The port for the Prometheus metrics server. | `4101` |
+ | `FIREHOSE_URL` | The WebSocket URL for the Bluesky firehose. | `wss://jetstream.atproto.tools/subscribe` |
+ | `CURSOR_UPDATE_INTERVAL` | How often to save the firehose cursor to disk (in milliseconds). | `60000` |
+ | `LABEL_LIMIT` | (Optional) API call limit for labeling. | `undefined` |
+ | `LABEL_LIMIT_WAIT` | (Optional) Wait time when label limit is hit. | `undefined` |
+ | `LOG_LEVEL` | The logging level. | `info` |
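
As a rough illustration of how the variables and defaults in the table above could be read, here is a minimal sketch. The variable names and default values come from the table. The `Config` interface, the `loadConfig` and `toInt` helpers, and the decision to keep `LABEL_LIMIT` and `LABEL_LIMIT_WAIT` as raw strings are assumptions, since the diff does not show the project's actual configuration code.

```typescript
// Hypothetical typed view of the environment variables in the table.
interface Config {
  did: string;
  ozoneUrl: string;
  ozonePds: string;
  bskyHandle: string;
  bskyPassword: string;
  host: string;
  port: number;
  metricsPort: number;
  firehoseUrl: string;
  cursorUpdateInterval: number; // milliseconds
  labelLimit?: string; // format is not documented, so kept as a raw string
  labelLimitWait?: string; // format is not documented, so kept as a raw string
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Parses a base-10 integer, falling back when the value is missing or invalid.
function toInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Builds a Config from an environment map, applying the documented defaults.
function loadConfig(env: Env): Config {
  return {
    did: env.DID ?? "",
    ozoneUrl: env.OZONE_URL ?? "",
    ozonePds: env.OZONE_PDS ?? "",
    bskyHandle: env.BSKY_HANDLE ?? "",
    bskyPassword: env.BSKY_PASSWORD ?? "",
    host: env.HOST ?? "127.0.0.1",
    port: toInt(env.PORT, 4100),
    metricsPort: toInt(env.METRICS_PORT, 4101),
    firehoseUrl: env.FIREHOSE_URL ?? "wss://jetstream.atproto.tools/subscribe",
    cursorUpdateInterval: toInt(env.CURSOR_UPDATE_INTERVAL, 60000),
    labelLimit: env.LABEL_LIMIT,
    labelLimitWait: env.LABEL_LIMIT_WAIT,
    logLevel: env.LOG_LEVEL ?? "info",
  };
}
```

In a Node.js or Bun process this would typically be called once at startup as `loadConfig(process.env)`. Passing the environment in as an argument keeps the function easy to test with a plain object.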