skywatch-phash-rs#

Rust implementation of a Bluesky image moderation service using perceptual hashing (the aHash / average-hash algorithm).

It monitors Bluesky's Jetstream for posts with images, computes perceptual hashes, matches them against known bad images, and takes automated moderation actions (labeling and/or reporting posts and accounts).

Features#

  • Real-time Jetstream subscription for post monitoring
  • Perceptual hash (aHash) computation for images
  • Configurable hamming distance thresholds per rule
  • Redis-backed job queue and phash caching
  • Concurrent worker pool for parallel processing
  • Automatic retry with dead letter queue
  • Metrics tracking and logging
  • Graceful shutdown handling

Prerequisites#

  • For Docker deployment:
    • Docker and Docker Compose
  • For local development:
    • Nix with flakes enabled (recommended), OR
    • Rust 1.83+
  • Required for all:
    • Bluesky labeler account with app password

Quick Start#

  1. Clone and set up:

    git clone <repo-url>
    cd skywatch-phash-rs
    
  2. Configure environment:

    cp .env.example .env
    # Edit .env and fill in your automod account credentials:
    # - AUTOMOD_HANDLE
    # - AUTOMOD_PASSWORD
    
  3. Start the service:

    docker compose up --build
    
  4. Monitor logs:

    docker compose logs -f app
    
  5. Stop the service:

    docker compose down
    

Phash CLI Tool#

Compute perceptual hash for a single image:

# Using cargo
cargo run --bin phash-cli path/to/image.jpg

# Or build and run
cargo build --release --bin phash-cli
./target/release/phash-cli image.png

Output is a 16-character hex string (64-bit hash):

e0e0e0e0e0fcfefe

Use this to generate hashes for your blob check rules.
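
For reference, here is a minimal Rust sketch of the aHash computation (an illustration using the image crate, not the project's exact code; the real binary may use a different resize filter or bit ordering):

// Average hash: downscale to 8x8 grayscale, then compare each pixel to the mean.
fn average_hash(path: &str) -> Result<u64, image::ImageError> {
    let img = image::open(path)?
        .resize_exact(8, 8, image::imageops::FilterType::Triangle)
        .to_luma8();
    let pixels: Vec<u64> = img.pixels().map(|p| u64::from(p.0[0])).collect();
    let avg = pixels.iter().sum::<u64>() / pixels.len() as u64;
    // One bit per pixel: 1 if the pixel is at or above the average brightness.
    let mut hash = 0u64;
    for (i, &p) in pixels.iter().enumerate() {
        if p >= avg {
            hash |= 1u64 << i;
        }
    }
    Ok(hash)
}

fn main() {
    let path = std::env::args().nth(1).expect("usage: phash-cli <image>");
    match average_hash(&path) {
        Ok(hash) => println!("{:016x}", hash),
        Err(err) => eprintln!("error: {err}"),
    }
}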

Configuration#

All configuration is via environment variables (see .env.example):

Required Variables#

  • AUTOMOD_HANDLE - Your automod account handle (e.g., automod.bsky.social)
  • AUTOMOD_PASSWORD - App password for automod account
  • LABELER_DID - DID of your main labeler account (e.g., skywatch.blue)
  • OZONE_URL - Ozone moderation service URL
  • OZONE_PDS - Ozone PDS endpoint (for authentication)

Optional Variables#

  • PROCESSING_CONCURRENCY (default: 4) - Maximum number of jobs processed in parallel
  • PHASH_HAMMING_THRESHOLD (default: 5) - Global hamming distance threshold
  • CACHE_ENABLED (default: true) - Enable Redis phash caching
  • CACHE_TTL_SECONDS (default: 86400) - Cache TTL (24 hours)
  • RETRY_ATTEMPTS (default: 3) - Max retry attempts for failed jobs
  • JETSTREAM_URL - Jetstream websocket URL
  • REDIS_URL - Redis connection string
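
The optional settings above could be read with a small helper that falls back to the documented defaults; this loader is purely illustrative (only the variable names and defaults come from .env.example):

use std::str::FromStr;

// Parse an environment variable, falling back to a default if unset or invalid.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    std::env::var(key)
        .ok()
        .and_then(|value| value.parse().ok())
        .unwrap_or(default)
}

fn main() {
    let concurrency: usize = env_or("PROCESSING_CONCURRENCY", 4);
    let threshold: u32 = env_or("PHASH_HAMMING_THRESHOLD", 5);
    let cache_ttl: u64 = env_or("CACHE_TTL_SECONDS", 86_400);
    println!("concurrency={concurrency} threshold={threshold} cache_ttl={cache_ttl}s");
}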

Blob Check Rules#

Rules are defined in rules/blobs.json:

[
  {
    "phashes": ["e0e0e0e0e0fcfefe", "9b9e00008f8fffff"],
    "label": "spam",
    "comment": "Known spam image detected",
    "reportAcct": false,
    "labelAcct": true,
    "reportPost": true,
    "toLabel": true,
    "hammingThreshold": 3,
    "description": "Optional description",
    "ignoreDID": ["did:plc:exempted-user"]
  }
]

Rule Fields#

  • phashes - Array of 16-char hex hashes to match against
  • label - Label to apply (e.g., "spam", "csam", "troll")
  • comment - Comment for reports
  • reportAcct - Report the account
  • labelAcct - Label the account
  • reportPost - Report the post
  • toLabel - Label the post
  • hammingThreshold - Max hamming distance for match (overrides global)
  • description - Optional description (not used by the system)
  • ignoreDID - Optional array of DIDs to skip
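
For illustration, the fields above map naturally onto a serde struct, with each rule's hammingThreshold falling back to the global PHASH_HAMMING_THRESHOLD when absent. This is a hedged sketch, not the project's actual types:

use serde::Deserialize;

// Field names mirror the JSON keys in rules/blobs.json.
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct BlobRule {
    phashes: Vec<String>,
    label: String,
    comment: String,
    report_acct: bool,
    label_acct: bool,
    report_post: bool,
    to_label: bool,
    hamming_threshold: Option<u32>,
    description: Option<String>,
    #[serde(rename = "ignoreDID", default)]
    ignore_did: Vec<String>,
}

// Hamming distance between two 64-bit hashes: the number of differing bits.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

// A hash matches a rule if it is within the rule's threshold (or the global
// default) of any listed phash.
fn rule_matches(rule: &BlobRule, hash: u64, global_threshold: u32) -> bool {
    let limit = rule.hamming_threshold.unwrap_or(global_threshold);
    rule.phashes.iter().any(|hex| {
        u64::from_str_radix(hex, 16)
            .map(|known| hamming(known, hash) <= limit)
            .unwrap_or(false)
    })
}

fn main() {
    let raw = std::fs::read_to_string("rules/blobs.json").expect("read rules");
    let rules: Vec<BlobRule> = serde_json::from_str(&raw).expect("parse rules");
    let sample = 0xe0e0e0e0e0fcfefe_u64;
    let hits = rules.iter().filter(|r| rule_matches(r, sample, 5)).count();
    println!("loaded {} rule(s), {} match the sample hash", rules.len(), hits);
}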

Architecture#

Jetstream WebSocket
        ↓
   Job Channel (mpsc)
        ↓
   Redis Queue (FIFO)
        ↓
   Worker Pool (semaphore-controlled concurrency)
        ↓
   ┌─────────────────┐
   │ For each job:   │
   │ 1. Check cache  │
   │ 2. Download blob│
   │ 3. Compute phash│
   │ 4. Match rules  │
   │ 5. Take actions │
   └─────────────────┘
        ↓
   Metrics Tracking
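
A rough tokio sketch of the semaphore-bounded worker stage shown above (Job and process_job are placeholders, not the project's actual types):

use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

// Placeholder job; the real one carries the post URI, blob CID, DID, etc.
struct Job;

async fn process_job(_job: Job) {
    // 1. check cache  2. download blob  3. compute phash
    // 4. match rules  5. take moderation actions
}

// Pull jobs off the channel and run at most `concurrency` of them at once.
async fn run_workers(mut jobs: mpsc::Receiver<Job>, concurrency: usize) {
    let permits = Arc::new(Semaphore::new(concurrency));
    while let Some(job) = jobs.recv().await {
        let permit = permits
            .clone()
            .acquire_owned()
            .await
            .expect("semaphore closed");
        tokio::spawn(async move {
            process_job(job).await;
            drop(permit); // free the slot when the job finishes
        });
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(100);
    tx.send(Job).await.expect("enqueue");
    drop(tx); // closing the channel lets run_workers drain and exit
    run_workers(rx, 4).await;
}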

Components#

  • Jetstream Client - Subscribes to Bluesky firehose, filters posts with images
  • Job Queue - Redis-backed FIFO queue with retry logic
  • Worker Pool - Configurable concurrency with semaphore control
  • Phash Cache - Redis-backed cache for computed hashes (reduces redundant work)
  • Agent Session - Authenticated session with automatic token refresh
  • Metrics - Lock-free atomic counters for monitoring
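
A minimal sketch of such lock-free counters using std atomics (the field names here are assumptions; the metrics actually tracked are listed under Metrics below):

use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct Metrics {
    jobs_processed: AtomicU64,
    cache_hits: AtomicU64,
    cache_misses: AtomicU64,
}

impl Metrics {
    // Relaxed ordering is sufficient for independent monotonic counters.
    fn record_cache_hit(&self) {
        self.cache_hits.fetch_add(1, Ordering::Relaxed);
    }

    fn cache_hit_rate(&self) -> f64 {
        let hits = self.cache_hits.load(Ordering::Relaxed) as f64;
        let misses = self.cache_misses.load(Ordering::Relaxed) as f64;
        if hits + misses == 0.0 { 0.0 } else { hits / (hits + misses) }
    }
}

fn main() {
    let metrics = Metrics::default();
    metrics.record_cache_hit();
    println!(
        "jobs: {}, cache hit rate: {:.2}",
        metrics.jobs_processed.load(Ordering::Relaxed),
        metrics.cache_hit_rate()
    );
}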

Development#

If you have Nix with flakes enabled:

# Enter dev shell with all dependencies
nix develop

# Or use direnv for automatic environment loading
direnv allow

# Build the project
nix build

# Run the binary
nix run

The Nix flake provides:

  • Rust toolchain (latest stable)
  • Native dependencies (OpenSSL, pkg-config)
  • Development tools (cargo-watch, redis)
  • Reproducible builds across Linux and macOS

Without Nix#

Run locally without Docker:

# Start Redis
docker run -d -p 6379:6379 redis

# Create .env
cp .env.example .env
# Edit .env with your credentials

# Run service
cargo run

# Run tests
cargo test

# Run specific binary
cargo run --bin phash-cli image.jpg

Metrics#

Logged every 60 seconds:

  • Jobs: received, processed, failed, retried
  • Blobs: processed, downloaded
  • Matches: found
  • Cache: hits, misses, hit rate
  • Moderation: posts/accounts labeled and reported

Final metrics are logged on graceful shutdown (Ctrl+C).
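
A hedged sketch of how the periodic report and the final report on Ctrl+C could be combined (only the 60-second interval and the shutdown trigger come from this section; the counter and output format are illustrative):

use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

#[tokio::main]
async fn main() {
    let jobs_processed = Arc::new(AtomicU64::new(0));
    let mut tick = tokio::time::interval(Duration::from_secs(60));
    loop {
        tokio::select! {
            // Periodic metrics report.
            _ = tick.tick() => {
                println!("metrics: jobs processed = {}", jobs_processed.load(Ordering::Relaxed));
            }
            // Final report on graceful shutdown.
            _ = tokio::signal::ctrl_c() => {
                println!("final metrics: jobs processed = {}", jobs_processed.load(Ordering::Relaxed));
                break;
            }
        }
    }
}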

License#

See LICENSE file.