hotfix: add configurable timeout to queue listener database connection (#280)
CRITICAL FIX for production outage at 2025-11-18 05:04 UTC.
## root cause
asyncpg.connect() in the queue listener had no timeout. when the Neon
database was slow or unresponsive, the connection attempt would hang
indefinitely, exhausting SQLAlchemy's 5-connection pool and causing all
API requests to time out with 500 errors.
## analysis
queried logfire for healthy connection latencies:
- typical: 2-4ms
- p50: 25ms
- p95: 352ms
- p99: 549ms
## fix
- add QUEUE_CONNECT_TIMEOUT setting (default: 3.0s)
- use asyncio.wait_for() with configurable timeout on asyncpg.connect()
- 3.0s default gives roughly 5x headroom over the 549ms p99 (3.0 / 0.549 ≈ 5.5) while still failing fast
- add specific TimeoutError handling with clear logging
- connection failures now fail fast instead of hanging
## configuration
adjust timeout via environment variable if needed:
```
QUEUE_CONNECT_TIMEOUT=5.0 # increase if seeing timeout errors
```
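
one way the setting could be read, shown only as a sketch: the project may well load it through its own settings class, so the function name `read_connect_timeout` and the validation rule here are assumptions. only the variable name and the 3.0s default come from this commit.

```python
import os
from typing import Mapping

DEFAULT_QUEUE_CONNECT_TIMEOUT = 3.0  # seconds, the default named in this commit


def read_connect_timeout(environ: Mapping[str, str] = os.environ) -> float:
    """Return the connect timeout in seconds, falling back to the default."""
    raw = environ.get("QUEUE_CONNECT_TIMEOUT")
    if raw is None or raw.strip() == "":
        return DEFAULT_QUEUE_CONNECT_TIMEOUT
    value = float(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        # assumption: a zero or negative timeout is treated as a config error
        raise ValueError("QUEUE_CONNECT_TIMEOUT must be positive")
    return value
```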
## impact
prevents queue listener from blocking all database connections when
Neon is slow. the listener will retry after reconnect_delay (5s)
instead of hanging indefinitely and breaking the entire app.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
authored by zzstoatzz.io and Claude, committed by GitHub
d6c22c19 9f632cfc