A rust implementation of skywatch-phash
1# skywatch-phash-rs
2
3Rust implementation of Bluesky image moderation service using perceptual hashing (aHash/average hash algorithm).
4
5Monitors Bluesky's Jetstream for posts with images, computes perceptual hashes, matches against known bad images, and takes automated moderation actions (label/report posts and accounts).
6
7## Features
8
9- Real-time Jetstream subscription for post monitoring
10- Perceptual hash (aHash) computation for images
11- Configurable hamming distance thresholds per rule
12- Redis-backed job queue and phash caching
13- Concurrent worker pool for parallel processing
14- Automatic retry with dead letter queue
15- Metrics tracking and logging
16- Graceful shutdown handling
17
18## Prerequisites
19
20- **For Docker deployment:**
21 - Docker and Docker Compose
22- **For local development:**
23 - Nix with flakes enabled (recommended), OR
24 - Rust 1.83+
25- **Required for all:**
26 - Bluesky labeler account with app password
27
28## Quick Start
29
301. **Clone and setup:**
31 ```bash
32 git clone <repo-url>
33 cd skywatch-phash-rs
34 ```
35
362. **Configure environment:**
37 ```bash
38 cp .env.example .env
39 # Edit .env and fill in your automod account credentials:
40 # - AUTOMOD_HANDLE
41 # - AUTOMOD_PASSWORD
42 ```
43
443. **Start the service:**
45 ```bash
46 docker compose up --build
47 ```
48
494. **Monitor logs:**
50 ```bash
51 docker compose logs -f app
52 ```
53
545. **Stop the service:**
55 ```bash
56 docker compose down
57 ```
58
59## Phash CLI Tool
60
61Compute perceptual hash for a single image:
62
63```bash
64# Using cargo
65cargo run --bin phash-cli path/to/image.jpg
66
67# Or build and run
68cargo build --release --bin phash-cli
69./target/release/phash-cli image.png
70```
71
72Output is a 16-character hex string (64-bit hash):
73```
74e0e0e0e0e0fcfefe
75```
76
77Use this to generate hashes for your blob check rules.
78
79## Configuration
80
81All configuration is via environment variables (see `.env.example`):
82
83### Required Variables
84
85- `AUTOMOD_HANDLE` - Your automod account handle (e.g., automod.bsky.social)
86- `AUTOMOD_PASSWORD` - App password for automod account
87- `LABELER_DID` - DID of your main labeler account (e.g., skywatch.blue)
88- `OZONE_URL` - Ozone moderation service URL
89- `OZONE_PDS` - Ozone PDS endpoint (for authentication)
90
91### Optional Variables
92
93- `PROCESSING_CONCURRENCY` (default: 4) - Max parallel job processing
94- `PHASH_HAMMING_THRESHOLD` (default: 5) - Global hamming distance threshold
95- `CACHE_ENABLED` (default: true) - Enable Redis phash caching
96- `CACHE_TTL_SECONDS` (default: 86400) - Cache TTL (24 hours)
97- `RETRY_ATTEMPTS` (default: 3) - Max retry attempts for failed jobs
98- `JETSTREAM_URL` - Jetstream websocket URL
99- `REDIS_URL` - Redis connection string
100
101## Blob Check Rules
102
103Rules are defined in `rules/blobs.json`:
104
105```json
106[
107 {
108 "phashes": ["e0e0e0e0e0fcfefe", "9b9e00008f8fffff"],
109 "label": "spam",
110 "comment": "Known spam image detected",
111 "reportAcct": false,
112 "labelAcct": true,
113 "reportPost": true,
114 "toLabel": true,
115 "hammingThreshold": 3,
116 "description": "Optional description",
117 "ignoreDID": ["did:plc:exempted-user"]
118 }
119]
120```
121
122### Rule Fields
123
124- `phashes` - Array of 16-char hex hashes to match against
125- `label` - Label to apply (e.g., "spam", "csam", "troll")
126- `comment` - Comment for reports
127- `reportAcct` - Report the account
128- `labelAcct` - Label the account
129- `reportPost` - Report the post
130- `toLabel` - Label the post
131- `hammingThreshold` - Max hamming distance for match (overrides global)
132- `description` - Optional description (not used by system)
133- `ignoreDID` - Optional array of DIDs to skip
134
135## Architecture
136
137```
138Jetstream WebSocket
139 ↓
140 Job Channel (mpsc)
141 ↓
142 Redis Queue (FIFO)
143 ↓
144 Worker Pool (semaphore-controlled concurrency)
145 ↓
146 ┌─────────────────┐
147 │ For each job: │
148 │ 1. Check cache │
149 │ 2. Download blob│
150 │ 3. Compute phash│
151 │ 4. Match rules │
152 │ 5. Take actions │
153 └─────────────────┘
154 ↓
155 Metrics Tracking
156```
157
158### Components
159
160- **Jetstream Client** - Subscribes to Bluesky firehose, filters posts with images
161- **Job Queue** - Redis-backed FIFO queue with retry logic
162- **Worker Pool** - Configurable concurrency with semaphore control
163- **Phash Cache** - Redis-backed cache for computed hashes (reduces redundant work)
164- **Agent Session** - Authenticated session with automatic token refresh
165- **Metrics** - Lock-free atomic counters for monitoring
166
167## Development
168
169### Using Nix (Recommended)
170
171If you have Nix with flakes enabled:
172
173```bash
174# Enter dev shell with all dependencies
175nix develop
176
177# Or use direnv for automatic environment loading
178direnv allow
179
180# Build the project
181nix build
182
183# Run the binary
184nix run
185```
186
187The Nix flake provides:
188- Rust toolchain (stable latest)
189- Native dependencies (OpenSSL, pkg-config)
190- Development tools (cargo-watch, redis)
191- Reproducible builds across Linux and macOS
192
193### Without Nix
194
195Run locally without Docker:
196
197```bash
198# Start Redis
199docker run -d -p 6379:6379 redis
200
201# Create .env
202cp .env.example .env
203# Edit .env with your credentials
204
205# Run service
206cargo run
207
208# Run tests
209cargo test
210
211# Run specific binary
212cargo run --bin phash-cli image.jpg
213```
214
215## Metrics
216
217Logged every 60 seconds:
218
219- **Jobs**: received, processed, failed, retried
220- **Blobs**: processed, downloaded
221- **Matches**: found
222- **Cache**: hits, misses, hit rate
223- **Moderation**: posts/accounts labeled and reported
224
225Final metrics are logged on graceful shutdown (Ctrl+C).
226
227## License
228
229See LICENSE file.