A Rust implementation of skywatch-phash
# skywatch-phash-rs

Rust implementation of Bluesky image moderation service using perceptual hashing (aHash/average hash algorithm).

Monitors Bluesky's Jetstream for posts with images, computes perceptual hashes, matches against known bad images, and takes automated moderation actions (label/report posts and accounts).

## Features

- Real-time Jetstream subscription for post monitoring
- Perceptual hash (aHash) computation for images
- Configurable hamming distance thresholds per rule
- Redis-backed job queue and phash caching
- Concurrent worker pool for parallel processing
- Automatic retry with dead letter queue
- Metrics tracking and logging
- Graceful shutdown handling

## Prerequisites

- **For Docker deployment:**
  - Docker and Docker Compose
- **For local development:**
  - Nix with flakes enabled (recommended), OR
  - Rust 1.83+
- **Required for all:**
  - Bluesky labeler account with app password

## Quick Start

1. **Clone and setup:**
   ```bash
   git clone <repo-url>
   cd skywatch-phash-rs
   ```

2. **Configure environment:**
   ```bash
   cp .env.example .env
   # Edit .env and fill in your automod account credentials:
   # - AUTOMOD_HANDLE
   # - AUTOMOD_PASSWORD
   ```

3. **Start the service:**
   ```bash
   docker compose up --build
   ```

4. **Monitor logs:**
   ```bash
   docker compose logs -f app
   ```

5. **Stop the service:**
   ```bash
   docker compose down
   ```

## Phash CLI Tool

Compute perceptual hash for a single image:

```bash
# Using cargo
cargo run --bin phash-cli path/to/image.jpg

# Or build and run
cargo build --release --bin phash-cli
./target/release/phash-cli image.png
```

Output is a 16-character hex string (64-bit hash):
```
e0e0e0e0e0fcfefe
```

Use this to generate hashes for your blob check rules.
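
To make the 16-character hex format more concrete, here is a minimal sketch of how an average hash can be derived and compared. It is not the code this project ships: the function names (`average_hash`, `to_hex`, `parse_hex`, `hamming_distance`, `is_match`) are hypothetical, the input is assumed to be an image already converted to grayscale and downscaled to 8x8 pixels, and the bit order (first pixel becomes the most significant bit) is an assumption that may differ from what `phash-cli` produces.

```rust
/// Compute a 64-bit average hash from an 8x8 grayscale grid.
/// Assumption: decoding, grayscale conversion and downscaling happen elsewhere.
fn average_hash(pixels: &[u8; 64]) -> u64 {
    // Mean brightness of the 64 pixels (integer division).
    let sum: u32 = pixels.iter().map(|&p| u32::from(p)).sum();
    let mean = sum / 64;

    // One bit per pixel: 1 if brighter than the mean, 0 otherwise.
    // Assumed bit order: the first pixel ends up in the most significant bit.
    let mut hash = 0u64;
    for &p in pixels.iter() {
        hash = (hash << 1) | u64::from(u32::from(p) > mean);
    }
    hash
}

/// Format a hash as a 16-character lowercase hex string.
fn to_hex(hash: u64) -> String {
    format!("{:016x}", hash)
}

/// Parse a 16-character hex string back into a 64-bit hash.
/// Returns None for a wrong length or non-hex characters.
fn parse_hex(s: &str) -> Option<u64> {
    if s.len() != 16 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u64::from_str_radix(s, 16).ok()
}

/// Number of bit positions in which two hashes differ.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Assumed matching rule: a distance at or below the threshold is a match.
fn is_match(a: u64, b: u64, threshold: u32) -> bool {
    hamming_distance(a, b) <= threshold
}
```

With these helpers, a hash printed by the CLI could be parsed with `parse_hex` and compared against a candidate using `is_match(known, candidate, 3)`. A small hamming distance means the two images differ in only a few of the 64 brightness comparisons, which is why near-duplicates such as resized or recompressed copies tend to match.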

## Configuration

All configuration is via environment variables (see `.env.example`):

### Required Variables

- `AUTOMOD_HANDLE` - Your automod account handle (e.g., automod.bsky.social)
- `AUTOMOD_PASSWORD` - App password for automod account
- `LABELER_DID` - DID of your main labeler account (e.g., skywatch.blue)
- `OZONE_URL` - Ozone moderation service URL
- `OZONE_PDS` - Ozone PDS endpoint (for authentication)

### Optional Variables

- `PROCESSING_CONCURRENCY` (default: 4) - Max parallel job processing
- `PHASH_HAMMING_THRESHOLD` (default: 5) - Global hamming distance threshold
- `CACHE_ENABLED` (default: true) - Enable Redis phash caching
- `CACHE_TTL_SECONDS` (default: 86400) - Cache TTL (24 hours)
- `RETRY_ATTEMPTS` (default: 3) - Max retry attempts for failed jobs
- `JETSTREAM_URL` - Jetstream websocket URL
- `REDIS_URL` - Redis connection string

## Blob Check Rules

Rules are defined in `rules/blobs.json`:

```json
[
  {
    "phashes": ["e0e0e0e0e0fcfefe", "9b9e00008f8fffff"],
    "label": "spam",
    "comment": "Known spam image detected",
    "reportAcct": false,
    "labelAcct": true,
    "reportPost": true,
    "toLabel": true,
    "hammingThreshold": 3,
    "description": "Optional description",
    "ignoreDID": ["did:plc:exempted-user"]
  }
]
```

### Rule Fields

- `phashes` - Array of 16-char hex hashes to match against
- `label` - Label to apply (e.g., "spam", "csam", "troll")
- `comment` - Comment for reports
- `reportAcct` - Report the account
- `labelAcct` - Label the account
- `reportPost` - Report the post
- `toLabel` - Label the post
- `hammingThreshold` - Max hamming distance for match (overrides global)
- `description` - Optional description (not used by system)
- `ignoreDID` - Optional array of DIDs to skip

## Architecture

```
Jetstream WebSocket
    ↓
Job Channel (mpsc)
    ↓
Redis Queue (FIFO)
    ↓
Worker Pool (semaphore-controlled concurrency)
    ↓
┌─────────────────┐
│ For each job:   │
│ 1. Check cache  │
│ 2. Download blob│
│ 3. Compute phash│
│ 4. Match rules  │
│ 5. Take actions │
└─────────────────┘
    ↓
Metrics Tracking
```

### Components

- **Jetstream Client** - Subscribes to Bluesky firehose, filters posts with images
- **Job Queue** - Redis-backed FIFO queue with retry logic
- **Worker Pool** - Configurable concurrency with semaphore control
- **Phash Cache** - Redis-backed cache for computed hashes (reduces redundant work)
- **Agent Session** - Authenticated session with automatic token refresh
- **Metrics** - Lock-free atomic counters for monitoring

## Development

### Using Nix (Recommended)

If you have Nix with flakes enabled:

```bash
# Enter dev shell with all dependencies
nix develop

# Or use direnv for automatic environment loading
direnv allow

# Build the project
nix build

# Run the binary
nix run
```

The Nix flake provides:
- Rust toolchain (stable latest)
- Native dependencies (OpenSSL, pkg-config)
- Development tools (cargo-watch, redis)
- Reproducible builds across Linux and macOS

### Without Nix

Run locally without Docker:

```bash
# Start Redis
docker run -d -p 6379:6379 redis

# Create .env
cp .env.example .env
# Edit .env with your credentials

# Run service
cargo run

# Run tests
cargo test

# Run specific binary
cargo run --bin phash-cli image.jpg
```

## Metrics

Logged every 60 seconds:

- **Jobs**: received, processed, failed, retried
- **Blobs**: processed, downloaded
- **Matches**: found
- **Cache**: hits, misses, hit rate
- **Moderation**: posts/accounts labeled and reported

Final metrics are logged on graceful shutdown (Ctrl+C).

## License

See LICENSE file.