skywatch-phash-rs#
A Rust implementation of a Bluesky image moderation service that uses perceptual hashing (the aHash, or average hash, algorithm).
It monitors Bluesky's Jetstream for posts with images, computes a perceptual hash for each image, matches the hashes against those of known bad images, and takes automated moderation actions (labeling or reporting posts and accounts).
Features#
- Real-time Jetstream subscription for post monitoring
- Perceptual hash (aHash) computation for images
- Configurable hamming distance thresholds per rule
- Redis-backed job queue and phash caching
- Concurrent worker pool for parallel processing
- Automatic retry with dead letter queue
- Metrics tracking and logging
- Graceful shutdown handling
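The average hash idea behind these features can be sketched in a few lines. This is a minimal illustration, not the service's actual code: it assumes the image has already been decoded, converted to grayscale, and downscaled to 8x8 pixels. The function name `average_hash`, the bit order (first pixel is the most significant bit), and the `>=` comparison against the mean are all assumptions made for the sketch.

```rust
/// Average hash (aHash) over an 8x8 grayscale thumbnail.
///
/// Assumptions (not taken from the project): `pixels` holds 64 grayscale
/// values in row-major order, the first pixel maps to the most significant
/// bit, and a pixel sets its bit when it is >= the mean brightness.
fn average_hash(pixels: &[u8; 64]) -> u64 {
    // Mean brightness of the thumbnail (integer division).
    let sum: u32 = pixels.iter().map(|&p| u32::from(p)).sum();
    let mean = sum / 64;

    // One bit per pixel: 1 if the pixel is at least as bright as the mean.
    let mut hash = 0u64;
    for &p in pixels.iter() {
        hash <<= 1;
        if u32::from(p) >= mean {
            hash |= 1;
        }
    }
    hash
}
```

Formatting the result with `format!("{:016x}", hash)` gives a 16-character hex string of the same shape the CLI tool prints.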
Prerequisites#
- For Docker deployment:
  - Docker and Docker Compose
- For local development:
  - Nix with flakes enabled (recommended), OR
  - Rust 1.83+
- Required for all:
  - Bluesky labeler account with app password
Quick Start#
- Clone and setup:
  git clone <repo-url>
  cd skywatch-phash-rs
- Configure environment:
  cp .env.example .env
  # Edit .env and fill in your automod account credentials:
  # - AUTOMOD_HANDLE
  # - AUTOMOD_PASSWORD
- Start the service:
  docker compose up --build
- Monitor logs:
  docker compose logs -f app
- Stop the service:
  docker compose down
Phash CLI Tool#
Compute perceptual hash for a single image:
# Using cargo
cargo run --bin phash-cli path/to/image.jpg
# Or build and run
cargo build --release --bin phash-cli
./target/release/phash-cli image.png
Output is a 16-character hex string (64-bit hash):
e0e0e0e0e0fcfefe
Use this to generate hashes for your blob check rules.
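When handling these hashes in your own tooling, it can help to convert between the hex string and a `u64`. The helpers below are a small sketch using only the standard library; the names `parse_phash` and `format_phash` are hypothetical and are not part of the project.

```rust
/// Parse a 16-character hex string into a 64-bit hash.
/// Returns None for the wrong length or for non-hex characters.
/// (Hypothetical helper, not the project's API.)
fn parse_phash(hex: &str) -> Option<u64> {
    // Check the characters explicitly: from_str_radix alone would also
    // accept a leading '+' sign.
    if hex.len() != 16 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u64::from_str_radix(hex, 16).ok()
}

/// Format a 64-bit hash as a zero-padded, lowercase, 16-character hex string.
fn format_phash(hash: u64) -> String {
    format!("{:016x}", hash)
}
```

Zero padding matters here: a hash whose leading bits are zero would otherwise print fewer than 16 characters.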
Configuration#
All configuration is via environment variables (see .env.example):
Required Variables#
- AUTOMOD_HANDLE - Your automod account handle (e.g., automod.bsky.social)
- AUTOMOD_PASSWORD - App password for automod account
- LABELER_DID - DID of your main labeler account (e.g., skywatch.blue)
- OZONE_URL - Ozone moderation service URL
- OZONE_PDS - Ozone PDS endpoint (for authentication)
Optional Variables#
- PROCESSING_CONCURRENCY (default: 4) - Max parallel job processing
- PHASH_HAMMING_THRESHOLD (default: 5) - Global hamming distance threshold
- CACHE_ENABLED (default: true) - Enable Redis phash caching
- CACHE_TTL_SECONDS (default: 86400) - Cache TTL (24 hours)
- RETRY_ATTEMPTS (default: 3) - Max retry attempts for failed jobs
- JETSTREAM_URL - Jetstream websocket URL
- REDIS_URL - Redis connection string
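As an illustration of how optional variables with defaults can be read, here is a standard-library sketch. The struct `OptionalConfig` and the helper `env_or` are hypothetical names, and the choice to fall back to the default when a value is missing or fails to parse is an assumption; the actual service may reject invalid values instead.

```rust
use std::env;
use std::str::FromStr;

/// Read an environment variable and parse it, falling back to `default`
/// when the variable is unset or cannot be parsed.
/// (Assumed behavior for this sketch, not confirmed by the project.)
fn env_or<T: FromStr>(name: &str, default: T) -> T {
    env::var(name)
        .ok()
        .and_then(|v| v.trim().parse::<T>().ok())
        .unwrap_or(default)
}

/// Hypothetical container for the optional settings that have defaults.
#[derive(Debug)]
struct OptionalConfig {
    processing_concurrency: usize,
    phash_hamming_threshold: u32,
    cache_enabled: bool,
    cache_ttl_seconds: u64,
    retry_attempts: u32,
}

impl OptionalConfig {
    fn from_env() -> Self {
        OptionalConfig {
            processing_concurrency: env_or("PROCESSING_CONCURRENCY", 4),
            phash_hamming_threshold: env_or("PHASH_HAMMING_THRESHOLD", 5),
            // bool parsing accepts only "true" or "false".
            cache_enabled: env_or("CACHE_ENABLED", true),
            cache_ttl_seconds: env_or("CACHE_TTL_SECONDS", 86_400),
            retry_attempts: env_or("RETRY_ATTEMPTS", 3),
        }
    }
}
```

Calling `OptionalConfig::from_env()` once at startup keeps parsing in one place, so the rest of the code works with typed values.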
Blob Check Rules#
Rules are defined in rules/blobs.json:
[
{
"phashes": ["e0e0e0e0e0fcfefe", "9b9e00008f8fffff"],
"label": "spam",
"comment": "Known spam image detected",
"reportAcct": false,
"labelAcct": true,
"reportPost": true,
"toLabel": true,
"hammingThreshold": 3,
"description": "Optional description",
"ignoreDID": ["did:plc:exempted-user"]
}
]
Rule Fields#
- phashes - Array of 16-char hex hashes to match against
- label - Label to apply (e.g., "spam", "csam", "troll")
- comment - Comment for reports
- reportAcct - Report the account
- labelAcct - Label the account
- reportPost - Report the post
- toLabel - Label the post
- hammingThreshold - Max hamming distance for match (overrides global)
- description - Optional description (not used by system)
- ignoreDID - Optional array of DIDs to skip
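To make the interaction between `phashes`, `hammingThreshold`, and `ignoreDID` concrete, the sketch below shows one plausible matching routine. It is not the project's implementation: the struct `BlobCheck`, its field types, and the function names are assumptions, and hashes are taken to be already parsed into `u64`.

```rust
/// Hypothetical in-memory form of one rule (a subset of the JSON fields).
struct BlobCheck {
    phashes: Vec<u64>,
    label: String,
    /// Per-rule override; None means "use the global threshold".
    hamming_threshold: Option<u32>,
    ignore_did: Vec<String>,
}

/// Number of bit positions in which two 64-bit hashes differ.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// True if `hash` is within the effective threshold of any hash in the rule
/// and the author is not listed in `ignore_did`.
fn rule_matches(rule: &BlobCheck, hash: u64, author_did: &str, global_threshold: u32) -> bool {
    if rule.ignore_did.iter().any(|did| did == author_did) {
        return false;
    }
    // The per-rule threshold overrides the global one when present.
    let threshold = rule.hamming_threshold.unwrap_or(global_threshold);
    rule.phashes
        .iter()
        .any(|&known| hamming_distance(known, hash) <= threshold)
}
```

A lower threshold makes a rule stricter: with `hammingThreshold` set to 3, a hash that differs in 4 bits no longer matches, even though it would under the global default of 5.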
Architecture#
Jetstream WebSocket
↓
Job Channel (mpsc)
↓
Redis Queue (FIFO)
↓
Worker Pool (semaphore-controlled concurrency)
↓
┌─────────────────┐
│ For each job: │
│ 1. Check cache │
│ 2. Download blob│
│ 3. Compute phash│
│ 4. Match rules │
│ 5. Take actions │
└─────────────────┘
↓
Metrics Tracking
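The first three per-job steps in the diagram (check cache, download blob, compute phash) follow a common get-or-compute pattern. The sketch below illustrates it with an in-memory `HashMap` standing in for Redis. The type `PhashCache`, keying the cache by blob CID, and the absence of a TTL are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the Redis-backed phash cache (assumption:
/// entries are keyed by blob CID; TTL handling is omitted).
struct PhashCache {
    entries: HashMap<String, u64>,
    hits: u64,
    misses: u64,
}

impl PhashCache {
    fn new() -> Self {
        PhashCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached hash for `cid`, or run `compute` (which stands in
    /// for "download blob, then compute phash") and store its result.
    fn get_or_compute<F>(&mut self, cid: &str, compute: F) -> u64
    where
        F: FnOnce() -> u64,
    {
        if let Some(&hash) = self.entries.get(cid) {
            self.hits += 1;
            return hash;
        }
        self.misses += 1;
        let hash = compute();
        self.entries.insert(cid.to_string(), hash);
        hash
    }
}
```

On a cache hit the closure never runs, which is how a cache of this kind avoids repeating the download and hashing work for an image it has already seen.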
Components#
- Jetstream Client - Subscribes to Bluesky's Jetstream and filters for posts with images
- Job Queue - Redis-backed FIFO queue with retry logic
- Worker Pool - Configurable concurrency with semaphore control
- Phash Cache - Redis-backed cache for computed hashes (reduces redundant work)
- Agent Session - Authenticated session with automatic token refresh
- Metrics - Lock-free atomic counters for monitoring
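The Job Queue's retry behavior can be pictured with a small in-memory model. This is only a sketch of the general retry-then-dead-letter pattern: the types `Job` and `JobQueue` are hypothetical, a `VecDeque` stands in for the Redis queue, and the rule that a job is dead-lettered once its failure count reaches `max_attempts` is an assumption about how `RETRY_ATTEMPTS` is interpreted.

```rust
use std::collections::VecDeque;

/// Hypothetical job record: which post to check and how often it has failed.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    post_uri: String,
    attempts: u32,
}

/// In-memory model of a FIFO queue with retries and a dead letter queue.
struct JobQueue {
    pending: VecDeque<Job>,
    dead_letter: Vec<Job>,
    max_attempts: u32,
}

impl JobQueue {
    fn new(max_attempts: u32) -> Self {
        JobQueue { pending: VecDeque::new(), dead_letter: Vec::new(), max_attempts }
    }

    /// Enqueue a new job at the back of the queue.
    fn push(&mut self, post_uri: &str) {
        self.pending.push_back(Job { post_uri: post_uri.to_string(), attempts: 0 });
    }

    /// Take the oldest pending job (FIFO order).
    fn pop(&mut self) -> Option<Job> {
        self.pending.pop_front()
    }

    /// Record a failure: requeue the job, or move it to the dead letter
    /// queue once it has failed `max_attempts` times (assumed semantics).
    fn fail(&mut self, mut job: Job) {
        job.attempts += 1;
        if job.attempts >= self.max_attempts {
            self.dead_letter.push(job);
        } else {
            self.pending.push_back(job);
        }
    }
}
```

Keeping exhausted jobs in a separate dead letter queue, rather than dropping them, leaves them available for later inspection without blocking new work.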
Development#
Using Nix (Recommended)#
If you have Nix with flakes enabled:
# Enter dev shell with all dependencies
nix develop
# Or use direnv for automatic environment loading
direnv allow
# Build the project
nix build
# Run the binary
nix run
The Nix flake provides:
- Rust toolchain (latest stable)
- Native dependencies (OpenSSL, pkg-config)
- Development tools (cargo-watch, redis)
- Reproducible builds across Linux and macOS
Without Nix#
Run locally without Docker:
# Start Redis
docker run -d -p 6379:6379 redis
# Create .env
cp .env.example .env
# Edit .env with your credentials
# Run service
cargo run
# Run tests
cargo test
# Run specific binary
cargo run --bin phash-cli image.jpg
Metrics#
Logged every 60 seconds:
- Jobs: received, processed, failed, retried
- Blobs: processed, downloaded
- Matches: found
- Cache: hits, misses, hit rate
- Moderation: posts/accounts labeled and reported
Final metrics are logged on graceful shutdown (Ctrl+C).
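Counters like these are often kept as atomics so that workers can update them without taking a lock. The following is a minimal sketch of that approach using `std::sync::atomic`; the struct `Metrics`, its fields, and its method names are hypothetical, and only a few of the counters listed above are modeled.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical subset of the service's counters.
#[derive(Default)]
struct Metrics {
    jobs_processed: AtomicU64,
    cache_hits: AtomicU64,
    cache_misses: AtomicU64,
}

impl Metrics {
    // Relaxed ordering is enough for independent counters that are only
    // read for reporting.
    fn record_job_processed(&self) {
        self.jobs_processed.fetch_add(1, Ordering::Relaxed);
    }

    fn record_cache_hit(&self) {
        self.cache_hits.fetch_add(1, Ordering::Relaxed);
    }

    fn record_cache_miss(&self) {
        self.cache_misses.fetch_add(1, Ordering::Relaxed);
    }

    fn jobs_processed(&self) -> u64 {
        self.jobs_processed.load(Ordering::Relaxed)
    }

    /// Cache hit rate as a fraction in [0, 1]; 0.0 when nothing was looked up.
    fn hit_rate(&self) -> f64 {
        let hits = self.cache_hits.load(Ordering::Relaxed);
        let total = hits + self.cache_misses.load(Ordering::Relaxed);
        if total == 0 {
            0.0
        } else {
            hits as f64 / total as f64
        }
    }
}
```

Because every method takes `&self`, a single `Metrics` value can be shared across worker threads (for example behind an `Arc`) and updated concurrently.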
License#
See LICENSE file.