QuickDID is a high-performance AT Protocol identity resolution service written in Rust. It provides handle-to-DID resolution with Redis-backed caching and queue processing.
# QuickDID - Development Guide for Claude

## Overview
QuickDID is a high-performance AT Protocol identity resolution service written in Rust. It provides bidirectional handle-to-DID and DID-to-handle resolution with multi-layer caching (Redis, SQLite, in-memory), queue processing, metrics support, proactive cache refreshing, and real-time cache updates via Jetstream consumer.

## Configuration

QuickDID follows the 12-factor app methodology and uses environment variables exclusively for configuration. There are no command-line arguments except for `--version` and `--help`.

Configuration is validated at startup, and the service will exit with specific error codes if validation fails:
- `error-quickdid-config-1`: Missing required environment variable
- `error-quickdid-config-2`: Invalid configuration value
- `error-quickdid-config-3`: Invalid TTL value (must be positive)
- `error-quickdid-config-4`: Invalid timeout value (must be positive)

## Common Commands

### Building and Running
```bash
# Build the project
cargo build

# Run in debug mode (requires environment variables)
HTTP_EXTERNAL=localhost:3007 cargo run

# Run tests
cargo test

# Type checking
cargo check

# Linting
cargo clippy

# Show version
cargo run -- --version

# Show help
cargo run -- --help
```

### Development with VS Code
The project includes a `.vscode/launch.json` configuration for debugging with Redis integration. Use the "Debug executable 'quickdid'" launch configuration.

## Architecture

### Core Components

1. **Handle Resolution** (`src/handle_resolver/`)
   - `BaseHandleResolver`: Core resolution using DNS and HTTP
   - `RateLimitedHandleResolver`: Semaphore-based rate limiting with optional timeout
   - `CachingHandleResolver`: In-memory caching layer with bidirectional support
   - `RedisHandleResolver`: Redis-backed persistent caching with bidirectional lookups
   - `SqliteHandleResolver`: SQLite-backed persistent caching with bidirectional support
   - `ProactiveRefreshResolver`: Automatically refreshes cache entries before expiration
   - All resolvers implement `HandleResolver` trait with:
     - `resolve`: Handle-to-DID resolution
     - `purge`: Remove entries by handle or DID
     - `set`: Manually update handle-to-DID mappings
   - Uses binary serialization via `HandleResolutionResult` for space efficiency
   - Resolution stack: Cache → ProactiveRefresh (optional) → RateLimited (optional) → Base → DNS/HTTP
   - Includes resolution timing measurements for metrics

2. **Binary Serialization** (`src/handle_resolution_result.rs`)
   - Compact storage format using bincode
   - Strips DID prefixes for did:web and did:plc methods
   - Stores: timestamp (u64), method type (i16), payload (String)

3. **Queue System** (`src/queue/`)
   - Supports MPSC (in-process), Redis, SQLite, and no-op adapters
   - `HandleResolutionWork` items processed asynchronously
   - Redis uses reliable queue pattern (LPUSH/RPOPLPUSH/LREM)
   - SQLite provides persistent queue with work shedding capabilities

4. **HTTP Server** (`src/http/`)
   - XRPC endpoints for AT Protocol compatibility
   - Health check endpoint
   - Static file serving from configurable directory (default: www)
   - Serves .well-known files as static content
   - CORS headers support for cross-origin requests
   - Cache-Control headers with configurable max-age and stale directives
   - ETag support with configurable seed for cache invalidation

5. **Metrics System** (`src/metrics.rs`)
   - Pluggable metrics publishing with StatsD support
   - Tracks counters, gauges, and timings
   - Configurable tags for environment/service identification
   - No-op adapter for development environments
   - Metrics for Jetstream event processing

6. **Jetstream Consumer** (`src/jetstream_handler.rs`)
   - Consumes AT Protocol firehose events via WebSocket
   - Processes Account events (purges deleted/deactivated accounts)
   - Processes Identity events (updates handle-to-DID mappings)
   - Automatic reconnection with exponential backoff
   - Comprehensive metrics for event processing
   - Spawned as cancellable task using task manager

## Key Technical Details

### DID Method Types
- `did:web`: Web-based DIDs, prefix stripped for storage
- `did:plc`: PLC directory DIDs, prefix stripped for storage
- Other DID methods stored with full identifier

### Redis Integration
- **Bidirectional Caching**:
  - Stores both handle→DID and DID→handle mappings
  - Uses MetroHash64 for key generation
  - Binary data storage for efficiency
  - Automatic synchronization of both directions
- **Queuing**: Reliable queue with processing/dead letter queues
- **Key Prefixes**: Configurable via `QUEUE_REDIS_PREFIX` environment variable

### Handle Resolution Flow
1. Check cache (Redis/SQLite/in-memory based on configuration)
2. If cache miss and rate limiting enabled:
   - Acquire semaphore permit (with optional timeout)
   - If timeout configured and exceeded, return error
3. Perform DNS TXT lookup or HTTP well-known query
4. Cache result with appropriate TTL in both directions (handle→DID and DID→handle)
5. Return DID or error

### Cache Management Operations
- **Purge**: Removes entries by either handle or DID
  - Uses `atproto_identity::resolve::parse_input` for identifier detection
  - Removes both handle→DID and DID→handle mappings
  - Chains through all resolver layers
- **Set**: Manually updates handle-to-DID mappings
  - Updates both directions in cache
  - Normalizes handles to lowercase
  - Chains through all resolver layers

## Environment Variables

### Required
- `HTTP_EXTERNAL`: External hostname for service endpoints (e.g., `localhost:3007`)

### Optional - Core Configuration
- `HTTP_PORT`: Server port (default: 8080)
- `PLC_HOSTNAME`: PLC directory hostname (default: plc.directory)
- `RUST_LOG`: Logging level (e.g., debug, info)
- `STATIC_FILES_DIR`: Directory for serving static files (default: www)

### Optional - Caching
- `REDIS_URL`: Redis connection URL for caching
- `SQLITE_URL`: SQLite database URL for caching (e.g., `sqlite:./quickdid.db`)
- `CACHE_TTL_MEMORY`: TTL for in-memory cache in seconds (default: 600)
- `CACHE_TTL_REDIS`: TTL for Redis cache in seconds (default: 7776000)
- `CACHE_TTL_SQLITE`: TTL for SQLite cache in seconds (default: 7776000)

### Optional - Queue Configuration
- `QUEUE_ADAPTER`: Queue type - 'mpsc', 'redis', 'sqlite', 'noop', or 'none' (default: mpsc)
- `QUEUE_REDIS_PREFIX`: Redis key prefix for queues (default: queue:handleresolver:)
- `QUEUE_WORKER_ID`: Worker ID for queue operations (default: worker1)
- `QUEUE_BUFFER_SIZE`: Buffer size for MPSC queue (default: 1000)
- `QUEUE_SQLITE_MAX_SIZE`: Max queue size for SQLite work shedding (default: 10000)
- `QUEUE_REDIS_TIMEOUT`: Redis blocking timeout in seconds (default: 5)
- `QUEUE_REDIS_DEDUP_ENABLED`: Enable queue deduplication to prevent duplicate handles (default: false)
- `QUEUE_REDIS_DEDUP_TTL`: TTL for deduplication keys in seconds (default: 60)

### Optional - Rate Limiting
- `RESOLVER_MAX_CONCURRENT`: Maximum concurrent handle resolutions (default: 0 = disabled)
- `RESOLVER_MAX_CONCURRENT_TIMEOUT_MS`: Timeout for acquiring rate limit permit in ms (default: 0 = no timeout)

### Optional - HTTP Cache Control
- `CACHE_MAX_AGE`: Max-age for Cache-Control header in seconds (default: 86400)
- `CACHE_STALE_IF_ERROR`: Stale-if-error directive in seconds (default: 172800)
- `CACHE_STALE_WHILE_REVALIDATE`: Stale-while-revalidate directive in seconds (default: 86400)
- `CACHE_MAX_STALE`: Max-stale directive in seconds (default: 86400)
- `ETAG_SEED`: Seed value for ETag generation (default: application version)

### Optional - Metrics
- `METRICS_ADAPTER`: Metrics adapter type - 'noop' or 'statsd' (default: noop)
- `METRICS_STATSD_HOST`: StatsD host and port (required when METRICS_ADAPTER=statsd, e.g., localhost:8125)
- `METRICS_STATSD_BIND`: Bind address for StatsD UDP socket (default: [::]:0 for IPv6, can use 0.0.0.0:0 for IPv4)
- `METRICS_PREFIX`: Prefix for all metrics (default: quickdid)
- `METRICS_TAGS`: Comma-separated tags (e.g., env:prod,service:quickdid)

### Optional - Proactive Refresh
- `PROACTIVE_REFRESH_ENABLED`: Enable proactive cache refreshing (default: false)
- `PROACTIVE_REFRESH_THRESHOLD`: Refresh when TTL remaining is below this threshold (0.0-1.0, default: 0.8)

### Optional - Jetstream Consumer
- `JETSTREAM_ENABLED`: Enable Jetstream consumer for real-time cache updates (default: false)
- `JETSTREAM_HOSTNAME`: Jetstream WebSocket hostname (default: jetstream.atproto.tools)

## Error Handling

All error strings must use this format:

    error-quickdid-<domain>-<number> <message>: <details>

Current error domains and examples:

* `config`: Configuration errors (e.g., error-quickdid-config-1 Missing required environment variable)
* `resolve`: Handle resolution errors (e.g., error-quickdid-resolve-1 Failed to resolve subject)
* `queue`: Queue operation errors (e.g., error-quickdid-queue-1 Failed to push to queue)
* `cache`: Cache-related errors (e.g., error-quickdid-cache-1 Redis pool creation failed)
* `result`: Serialization errors (e.g., error-quickdid-result-1 System time error)
* `task`: Task processing errors (e.g., error-quickdid-task-1 Queue adapter health check failed)

Errors should be represented as enums using the `thiserror` library.

Avoid creating new errors with the `anyhow!(...)` or `bail!(...)` macro.

## Testing

### Running Tests
```bash
# Run all tests
cargo test

# Run with Redis integration tests
TEST_REDIS_URL=redis://localhost:6379 cargo test

# Run specific test module
cargo test handle_resolver::tests
```

### Test Coverage Areas
- Handle resolution with various DID methods
- Binary serialization/deserialization
- Redis caching and expiration with bidirectional lookups
- Queue processing logic
- HTTP endpoint responses
- Jetstream event handler processing
- Purge and set operations across resolver layers

## Development Patterns

### Error Handling
- Uses strongly-typed errors with `thiserror` for all modules
- Each error has a unique identifier following the pattern `error-quickdid-<domain>-<number>`
- Graceful fallbacks when Redis/SQLite is unavailable
- Detailed tracing for debugging
- Avoid using `anyhow!()` or `bail!()` macros - use proper error types instead

### Performance Optimizations
- Binary serialization reduces storage by ~40%
- MetroHash64 for fast key generation
- Connection pooling for Redis
- Configurable TTLs for cache entries
- Rate limiting via semaphore-based concurrency control
- HTTP caching with ETag and Cache-Control headers
- Resolution timing metrics for performance monitoring

### Code Style
- Follow existing Rust idioms and patterns
- Use `tracing` for logging, not `println!`
- Prefer `Arc` for shared state across async tasks
- Handle errors explicitly, avoid `.unwrap()` in production code
- Use `httpdate` crate for HTTP date formatting (not `chrono`)

## Common Tasks

### Adding a New DID Method
1. Update `DidMethodType` enum in `handle_resolution_result.rs`
2. Modify `parse_did()` and `to_did()` methods
3. Add test cases for the new method type

### Modifying Cache TTL
- For in-memory: Set `CACHE_TTL_MEMORY` environment variable
- For Redis: Set `CACHE_TTL_REDIS` environment variable
- For SQLite: Set `CACHE_TTL_SQLITE` environment variable

### Configuring Metrics
1. Set `METRICS_ADAPTER=statsd` and `METRICS_STATSD_HOST=localhost:8125`
2. Configure tags with `METRICS_TAGS=env:prod,service:quickdid`
3. Use Telegraf + TimescaleDB for aggregation (see `docs/telegraf-timescaledb-metrics-guide.md`)
4. Railway deployment resources available in `railway-resources/telegraf/`

### Debugging Resolution Issues
1. Enable debug logging: `RUST_LOG=debug`
2. Check Redis cache:
   - Handle lookup: `redis-cli GET "handle:<hash>"`
   - DID lookup: `redis-cli GET "handle:<hash>"` (same key format)
3. Check SQLite cache: `sqlite3 quickdid.db "SELECT * FROM handle_resolution_cache;"`
4. Monitor queue processing in logs
5. Check rate limiting: Look for "Rate limit permit acquisition timed out" errors
6. Verify DNS/HTTP connectivity to AT Protocol infrastructure
7. Monitor metrics for resolution timing and cache hit rates
8. Check Jetstream consumer status:
   - Look for "Jetstream consumer" log entries
   - Monitor `jetstream.*` metrics
   - Check reconnection attempts in logs

## Dependencies
- `atproto-identity`: Core AT Protocol identity resolution
- `atproto-jetstream`: AT Protocol Jetstream event consumer
- `bincode`: Binary serialization
- `deadpool-redis`: Redis connection pooling
- `metrohash`: Fast non-cryptographic hashing
- `tokio`: Async runtime
- `axum`: Web framework
- `httpdate`: HTTP date formatting (replacing chrono)
- `cadence`: StatsD metrics client
- `thiserror`: Error handling
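
The DID prefix stripping used by the binary serialization format, and the `DidMethodType`, `parse_did()`, and `to_did()` names referenced in the guide's steps for adding a new DID method, can be illustrated with a minimal std-only sketch. The variant names (`Web`, `Plc`, `Other`), the free-function form, and the signatures below are assumptions made for illustration; the actual definitions in `src/handle_resolution_result.rs` may differ.

```rust
/// Hypothetical stand-in for the project's `DidMethodType` enum.
/// Variant names are assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DidMethodType {
    Web,
    Plc,
    Other,
}

const WEB_PREFIX: &str = "did:web:";
const PLC_PREFIX: &str = "did:plc:";

/// Split a DID into its method type and the payload kept for storage.
/// For did:web and did:plc the prefix is stripped; any other method
/// keeps the full identifier as the payload.
fn parse_did(did: &str) -> (DidMethodType, String) {
    if let Some(rest) = did.strip_prefix(WEB_PREFIX) {
        (DidMethodType::Web, rest.to_string())
    } else if let Some(rest) = did.strip_prefix(PLC_PREFIX) {
        (DidMethodType::Plc, rest.to_string())
    } else {
        (DidMethodType::Other, did.to_string())
    }
}

/// Rebuild the full DID string from a stored method type and payload.
fn to_did(method: DidMethodType, payload: &str) -> String {
    match method {
        DidMethodType::Web => format!("{}{}", WEB_PREFIX, payload),
        DidMethodType::Plc => format!("{}{}", PLC_PREFIX, payload),
        DidMethodType::Other => payload.to_string(),
    }
}
```

Under these assumptions, `parse_did("did:plc:abc123")` yields `(DidMethodType::Plc, "abc123")`, and passing that pair to `to_did` restores the original string. Supporting another method in this sketch would mean adding a variant, a prefix constant, and one arm in each function, plus round-trip tests for it.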