zzstoatzz.io / plyr.fm

music on atproto plyr.fm


runbooks

operational procedures for production incidents.

available runbooks

connection-pool-exhaustion - 500s everywhere, queue listener down, stuck connections

when to use

runbooks are for known failure modes with established remediation steps. if you encounter a new type of incident:

1. stabilize first (restart machines if needed)
2. investigate using logfire
3. document the incident and create a new runbook

general troubleshooting

# check machine status
fly status -a relay-api

# view recent logs
fly logs -a relay-api

# restart a machine (list machines first to find its id)
fly machines list -a relay-api
fly machines restart <machine-id> -a relay-api
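
when several machines need restarting, the restart step above can be wrapped in a small helper. this is a minimal sketch, not part of the repo: `restart_machines`, `APP`, and `FLY` are hypothetical names, and it assumes you have already copied the machine ids from `fly machines list -a relay-api`. it only uses the commands shown above.

```shell
#!/usr/bin/env bash
# sketch: restart the given fly machines one at a time, then print app status.
# APP and FLY are illustrative variables (assumptions, not project config).
set -euo pipefail

APP="${APP:-relay-api}"   # app name used throughout this runbook
FLY="${FLY:-fly}"         # override to dry-run, e.g. FLY=echo

restart_machines() {
  local id
  for id in "$@"; do
    echo "restarting $id"
    # same command as the manual step: fly machines restart <machine-id> -a <app>
    "$FLY" machines restart "$id" -a "$APP"
  done
  # confirm the machines came back
  "$FLY" status -a "$APP"
}
```

after sourcing the script, call `restart_machines <machine-id> [<machine-id> ...]`. setting `FLY=echo` first prints the commands instead of running them, which is a cheap way to check the ids before touching production.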