A tool that tails the Bluesky firehose, matches images against known perceptual hashes, and labels the matching posts or accounts.

Clone this repository

https://tangled.org/skywatch.blue/skywatch-phash
git@tangled.org:skywatch.blue/skywatch-phash


skywatch-phash#

Perceptual hash-based image moderation service for Bluesky/ATProto. Detects known harassment images using phash fingerprinting, then automatically applies labels and files reports.

How it works#

  1. Subscribes to the Bluesky firehose via Jetstream
  2. Extracts images from posts and computes perceptual hashes
  3. Compares against known harassment image hashes using Hamming distance
  4. On match, executes configured moderation actions (label/report post and/or account)
  5. Caches phashes in Redis to avoid re-fetching viral images
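
Step 2 (extracting images from posts) can be sketched as follows. This is a minimal illustration, not the project's actual code: the event shape is an assumption based on Jetstream's JSON output for app.bsky.feed.post records, and the name extractImageCids is hypothetical.

```typescript
// Assumed shape of a Jetstream commit event; only the fields used here are typed.
interface BlobRef {
  ref?: { $link?: string };
  mimeType?: string;
}

interface ImagesEmbed {
  $type?: string;
  images?: { image?: BlobRef }[];
  media?: ImagesEmbed; // assumed to be set on app.bsky.embed.recordWithMedia
}

interface JetstreamEvent {
  did: string;
  kind: string;
  commit?: {
    operation: string;
    collection: string;
    rkey: string;
    record?: { embed?: ImagesEmbed };
  };
}

// Hypothetical helper: returns the blob CIDs of images attached to a new post.
function extractImageCids(event: JetstreamEvent): string[] {
  const commit = event.commit;
  if (event.kind !== "commit" || !commit) return [];
  if (commit.operation !== "create") return [];
  if (commit.collection !== "app.bsky.feed.post") return [];

  const embed = commit.record?.embed;
  if (!embed) return [];

  // Quote posts with media nest the images one level deeper.
  const images =
    embed.$type === "app.bsky.embed.recordWithMedia"
      ? embed.media?.images
      : embed.images;

  const cids: string[] = [];
  for (const img of images ?? []) {
    const cid = img.image?.ref?.$link;
    if (cid) cids.push(cid);
  }
  return cids;
}
```

Each returned CID, together with the author's DID, identifies a blob that can then be fetched and hashed.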

Features#

  • Fast matching - Hamming distance threshold for fuzzy matching (handles crops, filters, etc.)
  • Caching - Redis-backed phash cache (24hr TTL by default)
  • Deduplication - Prevents duplicate labels/reports via Redis claims (7-day TTL)
  • Allowlisting - Skip checks for trusted accounts via the ignoreDID field
  • Rate limiting - Configurable delay between moderation API calls
  • Metrics - Tracks cache hits, matches, labels applied, etc.
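
The deduplication claim can be illustrated with a small in-memory stand-in. This is a sketch under assumptions: the service itself uses Redis, where the equivalent operation would be a SET with the NX and EX options, and the ClaimStore class and key format below are hypothetical.

```typescript
// In-memory stand-in for a Redis claim ("SET key value NX EX ttl").
// ClaimStore and the key format are illustrative, not the project's API.
class ClaimStore {
  private expiries = new Map<string, number>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns true only for the first caller within the TTL window.
  tryClaim(key: string): boolean {
    const current = this.now();
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > current) {
      return false; // someone already claimed this action
    }
    this.expiries.set(key, current + this.ttlMs);
    return true;
  }
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
const claims = new ClaimStore(SEVEN_DAYS_MS);

// Hypothetical key: one claim per (action, subject, label) combination.
const key = "claim:label:at://did:plc:example/app.bsky.feed.post/abc:harassment-image";
if (claims.tryClaim(key)) {
  // First claim: safe to emit the label here.
}
```

Because the claim is taken before the moderation call, a second match on the same post within the TTL is skipped instead of producing a duplicate label or report.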

Setup#

Prerequisites#

  • Bun runtime
  • Redis server
  • Bluesky labeler account with app password

Installation#

bun install

Configuration#

Copy .env.example to .env and configure:

# Required
LABELER_DID=did:plc:your-labeler-did
LABELER_HANDLE=your-labeler.bsky.social
LABELER_PASSWORD=your-app-password

# Optional (defaults shown)
JETSTREAM_URL=wss://jetstream1.us-east.fire.hose.cam/subscribe
REDIS_URL=redis://localhost:6379
PROCESSING_CONCURRENCY=10
CACHE_ENABLED=true
CACHE_TTL_SECONDS=86400
OZONE_URL=https://ozone.skywatch.blue
OZONE_PDS=https://blewit.us-west.host.bsky.network
MOD_DID=did:plc:e4elbtctnfqocyfcml6h2lf7
RATE_LIMIT_MS=100
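
A minimal sketch of how these variables might be read, with the defaults listed above. The Config shape, the loadConfig name, and the parsing rules (for example, treating any value other than "false" as enabled) are assumptions for illustration, not the project's actual loader. Only a subset of the variables is shown.

```typescript
interface Config {
  labelerDid: string;
  labelerHandle: string;
  labelerPassword: string;
  jetstreamUrl: string;
  redisUrl: string;
  processingConcurrency: number;
  cacheEnabled: boolean;
  cacheTtlSeconds: number;
  rateLimitMs: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical loader: required variables throw, optional ones fall back.
function loadConfig(env: Env): Config {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };
  const str = (name: string, fallback: string): string => {
    const value = env[name];
    return value === undefined || value === "" ? fallback : value;
  };
  const int = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw === "") return fallback;
    const parsed = Number.parseInt(raw, 10);
    if (Number.isNaN(parsed)) throw new Error(`${name} must be an integer, got "${raw}"`);
    return parsed;
  };

  return {
    labelerDid: required("LABELER_DID"),
    labelerHandle: required("LABELER_HANDLE"),
    labelerPassword: required("LABELER_PASSWORD"),
    jetstreamUrl: str("JETSTREAM_URL", "wss://jetstream1.us-east.fire.hose.cam/subscribe"),
    redisUrl: str("REDIS_URL", "redis://localhost:6379"),
    processingConcurrency: int("PROCESSING_CONCURRENCY", 10),
    cacheEnabled: str("CACHE_ENABLED", "true") !== "false",
    cacheTtlSeconds: int("CACHE_TTL_SECONDS", 86400),
    rateLimitMs: int("RATE_LIMIT_MS", 100),
  };
}
```

In a Bun or Node process this would typically be called once at startup as loadConfig(process.env), so that a missing required variable fails fast.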

Adding phash rules#

Edit rules/blobs.ts:

export const BLOB_CHECKS: BlobCheck[] = [
  {
    phashes: ["0f1e2d3c4b5a6978", "1a2b3c4d5e6f7890"],
    label: "harassment-image",
    comment: "Known harassment meme detected",
    reportAcct: false,
    labelAcct: false,
    reportPost: true,
    toLabel: true,
    hammingThreshold: 5,
    ignoreDID: ["did:plc:trusted-account"],
  },
];
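
The rule shape can be read off the example above. The following sketch spells it out as a type and adds a small sanity check. Both are inferred, not copied from the repository: the field comments, the optionality of hammingThreshold and ignoreDID, the 16-hex-character (64-bit) hash length, and the validateBlobCheck helper are all assumptions.

```typescript
// Inferred from the example rule; the real BlobCheck type may differ.
interface BlobCheck {
  phashes: string[]; // known hashes, assumed to be 16 hex chars (64 bits)
  label: string; // label value to apply on a match
  comment: string; // comment attached to the moderation action
  reportAcct: boolean; // report the account
  labelAcct: boolean; // label the account
  reportPost: boolean; // report the post
  toLabel: boolean; // assumed to mean: label the post
  hammingThreshold?: number; // maximum bit distance counted as a match
  ignoreDID?: string[]; // accounts exempt from this rule
}

// Hypothetical helper: returns a list of problems, empty if the rule looks sane.
function validateBlobCheck(check: BlobCheck): string[] {
  const problems: string[] = [];
  for (const hash of check.phashes) {
    if (!/^[0-9a-f]{16}$/i.test(hash)) {
      problems.push(`invalid phash: ${hash}`);
    }
  }
  const threshold = check.hammingThreshold;
  if (
    threshold !== undefined &&
    (!Number.isInteger(threshold) || threshold < 0 || threshold > 64)
  ) {
    problems.push(`hammingThreshold out of range: ${threshold}`);
  }
  for (const did of check.ignoreDID ?? []) {
    if (!did.startsWith("did:")) problems.push(`not a DID: ${did}`);
  }
  return problems;
}
```

Running a check like this in a test catches a mistyped hash before it silently never matches.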

Hamming threshold guide:

  • 0 = Exact match only (very strict)
  • 1-2 = Nearly identical images (minor compression artifacts)
  • 3-4 = Very similar images (slight edits, crops)
  • 5-8 = Similar images (moderate edits)
  • 10+ = Loosely similar images (too permissive)
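
To make the threshold concrete: the Hamming distance between two phashes is the number of bits in which they differ. The sketch below computes it for equal-length hex strings and applies a threshold. The function names are hypothetical, and treating a distance equal to the threshold as a match is an assumption consistent with "0 = exact match only".

```typescript
// Number of differing bits between two equal-length hex strings.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new Error("phashes must have the same length");
  }
  if (!/^[0-9a-f]*$/i.test(a) || !/^[0-9a-f]*$/i.test(b)) {
    throw new Error("phashes must be hexadecimal strings");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    // XOR one hex digit (4 bits) at a time and count the set bits.
    let diff = Number.parseInt(a.charAt(i), 16) ^ Number.parseInt(b.charAt(i), 16);
    while (diff !== 0) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// True if the candidate is within `threshold` bits of any known hash.
function matchesAny(candidate: string, known: string[], threshold: number): boolean {
  return known.some((hash) => hammingDistance(candidate, hash) <= threshold);
}

// The last hex digit differs (8 vs 9), which is a single bit.
const d = hammingDistance("0f1e2d3c4b5a6978", "0f1e2d3c4b5a6979"); // 1
const hit = matchesAny("0f1e2d3c4b5a6979", ["0f1e2d3c4b5a6978"], 5); // true
```

With a 64-bit hash the distance ranges from 0 to 64, so a threshold of 5 tolerates roughly 8% of the bits changing.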

To generate a phash from an image:

bun run phash /path/to/image.png

Running#

Development#

bun run dev

Production#

bun run start

Docker#

docker compose up -d

Testing#

bun test              # run all tests
bun run typecheck     # type checking
bun run lint          # linting

VM Requirements#

Minimal:

  • 2GB RAM
  • 2 vCPUs
  • 10GB disk

Recommended:

  • 4GB RAM
  • 2-4 vCPUs
  • 20GB disk

Scale PROCESSING_CONCURRENCY based on available RAM: each image being processed concurrently uses roughly 50-200MB.
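
As a rough worked example of that sizing rule, the sketch below divides the memory left after a reserve by the worst-case per-image figure. The 512MB reserve for the OS, runtime, and Redis is an assumption, not a measured value, and estimateConcurrency is a hypothetical helper.

```typescript
// Conservative estimate: plan for the upper bound of ~200MB per image.
function estimateConcurrency(
  totalRamMb: number,
  reservedMb: number = 512, // assumed headroom for OS, runtime, and Redis
  perTaskMb: number = 200,
): number {
  const usableMb = totalRamMb - reservedMb;
  return Math.max(1, Math.floor(usableMb / perTaskMb));
}

const minimalVm = estimateConcurrency(2048); // 7
const recommendedVm = estimateConcurrency(4096); // 17
```

Under these assumptions the default of 10 fits the recommended 4GB VM, while a 2GB VM may need a lower value if images tend toward the upper end of the memory range.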

License#

MIT