database migrations#
current state (automated ✓)#
plyr.fm uses automated database migrations via fly.io's release_command.
how it works#
1. developer creates migration locally
↓
2. commit and push to main
↓
3. github actions triggers deployment
↓
4. fly.io builds docker image
↓
5. fly.io runs release_command BEFORE deploying
- temporary machine spins up
- runs: uv run alembic upgrade head
- if succeeds → proceed to deployment
- if fails → abort, keep old version running
↓
6. new app version deploys with updated schema
configuration#
fly.toml:
[deploy]
release_command = "uv run alembic upgrade head"
benefits:
- zero manual intervention required
- migrations run before new code serves traffic (no inconsistent state)
- automatic rollback if migration fails
- clear deployment logs showing migration output
how dev/prod database separation works#
plyr.fm uses environment-based database configuration to ensure migrations always target the correct database.
the key mechanism: DATABASE_URL environment variable
┌─────────────────────────────────────────────────────────────┐
│ alembic/env.py (migration runtime) │
│ │
│ 1. imports backend.config.settings │
│ 2. reads settings.database.url │
│ 3. sets alembic connection string │
│ │
│ config.set_main_option("sqlalchemy.url", settings.database.url)
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ src/backend/config.py (pydantic-settings) │
│ │
│ class Settings(BaseSettings): │
│ database_url: str = Field( │
│ default="postgresql+asyncpg://localhost/plyr" │
│ ) │
│ │
│ reads from DATABASE_URL environment variable │
└─────────────────────────────────────────────────────────────┘
│
▼
┌───────────────┴───────────────┐
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ local development │ │ production (fly.io) │
│ │ │ │
│ .env file: │ │ fly secrets: │
│ DATABASE_URL= │ │ DATABASE_URL= │
│ postgresql+asyncpg:// │ │ postgresql:// │
│ [neon dev connection] │ │ [neon prod connection] │
│ │ │ │
│ when you run: │ │ when fly.io runs: │
│ uv run alembic upgrade │ │ release_command: │
│ │ │ uv run alembic upgrade │
│ → migrates DEV db │ │ → migrates PROD db │
└──────────────────────────┘ └──────────────────────────┘
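a minimal sketch of this wiring, assuming the flat database_url field shown in the diagram (the real files may differ in detail):

# src/backend/config.py (sketch)
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # pydantic-settings maps the field name to the DATABASE_URL
    # environment variable, falling back to .env for local development
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str = "postgresql+asyncpg://localhost/plyr"

settings = Settings()

# alembic/env.py (sketch -- only the URL wiring; the real file also sets
# target_metadata and the offline/online migration runners)
from alembic import context
from backend.config import settings

config = context.config
config.set_main_option("sqlalchemy.url", settings.database_url)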
why this is safe:
- no shared configuration: local and production environments have completely separate DATABASE_URL values
- environment-specific secrets: the production database URL is stored in fly.io secrets, never in code
- explicit context: you cannot accidentally run a production migration locally, because your local DATABASE_URL points to the neon dev database
- explicit context: fly.io cannot run migrations against your dev database, because it only knows about the production DATABASE_URL
concrete example:
# local development
$ cat .env
DATABASE_URL=postgresql+asyncpg://neon_user:***@ep-muddy-flower-98795112.us-east-2.aws.neon.tech/plyr-dev
$ uv run alembic upgrade head
# connects to neon dev database (plyr-dev)
# migrates your development database
# production (inside fly.io release machine)
$ echo $DATABASE_URL
postgresql://neon_user:***@ep-cold-butterfly-11920742.us-east-1.aws.neon.tech/plyr-prd
$ uv run alembic upgrade head
# connects to neon production database (plyr-prd)
# migrates your production database
migration flow for each environment:
local development:
1. developer edits model in src/backend/models/
2. runs: uv run alembic revision --autogenerate -m "description"
3. alembic reads DATABASE_URL from .env (neon dev)
4. generates a migration by comparing (see the example after this list):
- current model state (code)
- current database state (neon dev database)
5. runs: uv run alembic upgrade head
6. migration applies to dev database
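for illustration, the file autogenerated in steps 2-4 looks roughly like this (the revision ids and column are hypothetical):

# alembic/versions/abc123def456_add_listen_count.py (hypothetical example)
"""add listen_count to tracks"""

import sqlalchemy as sa
from alembic import op

revision = "abc123def456"      # hypothetical ids -- alembic generates these
down_revision = "d4e5f6a7b8c9"

def upgrade() -> None:
    op.add_column("tracks", sa.Column("listen_count", sa.Integer(), nullable=True))

def downgrade() -> None:
    op.drop_column("tracks", "listen_count")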
production deployment:
1. developer commits migration file to git
2. pushes to main branch
3. github actions triggers deployment
4. fly.io builds docker image (includes migration files)
5. fly.io starts temporary release machine
6. release machine has DATABASE_URL from fly secrets (production neon)
7. release machine runs: uv run alembic upgrade head
8. alembic reads DATABASE_URL (production neon)
9. migration applies to production database
10. if successful, deployment proceeds
11. if failed, deployment aborts (old version keeps running)
test database (separate again):
tests use a third database entirely:
# tests/conftest.py
def test_database_url(worker_id: str) -> str:
    # per pytest-xdist worker database; "master" = non-parallel run
    if worker_id == "master":
        return "postgresql+asyncpg://plyr_test:plyr_test@localhost:5433/plyr_test"
    return f"postgresql+asyncpg://plyr_test:plyr_test@localhost:5433/plyr_test_{worker_id}"
this ensures:
- tests never touch dev or prod databases
- tests can run in parallel (separate databases per worker)
- test data is isolated and can be cleared between tests
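a sketch of how that per-worker setup can look, assuming pytest-asyncio, pytest-xdist, and an async SQLAlchemy engine (plyr.fm's actual fixtures may be structured differently):

# tests/conftest.py (sketch)
import pytest
from sqlalchemy.ext.asyncio import create_async_engine

from backend.models import Base  # assumed declarative base holding all tables

@pytest.fixture
async def test_engine(worker_id: str):
    # worker_id comes from pytest-xdist, so each parallel worker
    # gets its own isolated database
    engine = create_async_engine(test_database_url(worker_id))
    async with engine.begin() as conn:
        # schema is created from the models directly -- no migrations
        await conn.run_sync(Base.metadata.create_all)
    yield engine
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
    await engine.dispose()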
the complete picture:
┌──────────────────────────────────────────────────────────────┐
│ four separate databases (three neon instances + local test): │
│ │
│ 1. dev (neon: plyr-dev / muddy-flower-98795112) │
│ - for local development │
│ - set via .env: DATABASE_URL=postgresql+asyncpg://... │
│ - migrations run manually: uv run alembic upgrade head │
│ │
│ 2. staging (neon: plyr-stg / frosty-math-37367092) │
│ - for staging environment │
│ - set via fly secrets on relay-api-staging │
│ - migrations run automatically via release_command │
│ │
│ 3. prod (neon: plyr-prd / cold-butterfly-11920742) │
│ - for production traffic │
│ - set via fly secrets: DATABASE_URL=postgresql://... │
│ - migrations run automatically via release_command │
│ │
│ 4. test (localhost:5433/plyr_test) │
│ - for automated tests only │
│ - set via conftest.py fixture │
│ - schema created from models directly (no migrations) │
│ │
│ these databases never interact or share configuration │
└──────────────────────────────────────────────────────────────┘
recent pain points (2025-11-02)#
when deploying timezone support migration 31e69ba0c570:
- dockerfile didn't include migration files - had to create PR #14 to add COPY alembic.ini and COPY alembic ./alembic
- alembic version tracking out of sync - production database had a user_preferences table but alembic thought the version was older, causing "relation already exists" errors
- manual stamp needed - had to run flyctl ssh console -a relay-api -C "uv run alembic stamp 9e8c7aa5b945" to fix version tracking
- manual migration execution - had to run flyctl ssh console -a relay-api -C "uv run alembic upgrade head" after deployment
- blocked deployment - couldn't deploy until all manual steps completed
this took ~30 minutes of manual intervention for what should be automatic.
how reference project N does it (the right way)#
reference project N has a sophisticated automated migration system:
architecture overview#
merge to main
↓
detect changed migration files (paths-filter)
↓
build docker image with migrations included
↓
run migrations via google cloud run jobs
├─ separate job per database (auth, background, events)
├─ jobs execute before app deployment
├─ jobs block until migration completes
└─ deployment fails if migration fails
↓
update deployment manifests
↓
kubernetes/flux picks up new image
↓
app starts serving traffic with new schema
key components#
1. migration detection
# .github/paths-filter-pg-migrations.yml
service_a:
- 'src/project/service_a/migrations/**'
service_b:
- 'src/project/service_b/migrations/**'
- 'src/project/utilities/database.py'
uses dorny/paths-filter@v3 to detect which migrations changed in a PR.
2. cloud run migration jobs
separate jobs for running migrations:
- service-a-db-migration-dev
- service-a-db-migration-stg
- service-b-events-migration-dev
- etc.
jobs are updated with the new image tag and executed with the --wait flag.
3. python migration utilities
# from reference project N's database.py
def alembic_upgrade(database: str, revision: str = "head"):
"""Run alembic upgrades on database"""
import alembic.command
alembic.command.upgrade(alembic_config(database), revision)
clean python API for running migrations programmatically.
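the alembic_config(database) helper isn't shown in the source; a plausible shape, sketched under the assumption that each database keeps its own alembic.ini and env-specific URL, would be:

# hypothetical companion to alembic_upgrade above
import os

from alembic.config import Config

def database_url_for(database: str) -> str:
    # hypothetical: each database's URL comes from its own env var,
    # e.g. SERVICE_A_DATABASE_URL
    return os.environ[f"{database.upper()}_DATABASE_URL"]

def alembic_config(database: str) -> Config:
    # each database keeps its own alembic.ini / migration scripts
    cfg = Config(f"src/project/{database}/alembic.ini")
    cfg.set_main_option("sqlalchemy.url", database_url_for(database))
    return cfg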
4. cli commands
# reference project N provides CLI for manual operations
project-cli database upgrade
project-cli database downgrade -r <revision>
project-cli database reset
5. safety mechanisms
- migrations run in dedicated jobs (not in app containers)
- ALEMBIC_LOCK prevents concurrent execution (see the sketch after this list)
- global roles updated after migrations complete
- separate environments (dev, stg, prod) test migrations before production
- deployment fails if migration fails
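the source doesn't show how ALEMBIC_LOCK works; a common implementation is a postgres advisory lock, sketched here under that assumption:

# sketch: serializing migration jobs with a postgres advisory lock
from contextlib import contextmanager

from sqlalchemy import create_engine, text

ALEMBIC_LOCK_KEY = 712398  # arbitrary app-wide key (hypothetical value)

@contextmanager
def migration_lock(database_url: str):
    engine = create_engine(database_url)
    try:
        with engine.connect() as conn:
            # blocks until no other session holds the key, so only
            # one migration job can run at a time
            conn.execute(text("SELECT pg_advisory_lock(:k)"), {"k": ALEMBIC_LOCK_KEY})
            try:
                yield
            finally:
                conn.execute(text("SELECT pg_advisory_unlock(:k)"), {"k": ALEMBIC_LOCK_KEY})
    finally:
        engine.dispose()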
why this works#
isolation: migrations run in separate containers, not in the app
- prevents race conditions between multiple app instances
- allows long-running migrations without blocking app startup
- easier to debug when migrations fail
ordering: migrations complete before app deployment
- ensures schema is ready when app starts
- no "app running on old schema" bugs
- rollback is automatic (deployment doesn't proceed if migration fails)
observability: cloud run jobs provide logs and status
- easy to see what migration ran and when
- clear failure messages
- can re-run jobs manually if needed
automation: zero manual intervention
- no SSH required
- no human error
- consistent across environments
implementation history#
what we tried#
initial approach (broken): bash wrapper script
[deploy]
release_command = './scripts/migrate.sh'
problem: timed out during VM startup, likely due to script overhead or environment setup issues.
solution: direct command execution
[deploy]
release_command = "uv run alembic upgrade head"
result: works perfectly. migrations complete in ~3 seconds, no timeouts.
key lesson: fly.io's release_command works reliably when you give it a direct command instead of wrapping it in a shell script. the wrapper script was the problem, not VM resources or timeouts.
current implementation (working ✓)#
fly.toml:
[deploy]
release_command = "uv run alembic upgrade head"
github actions (.github/workflows/deploy-backend.yml):
- name: detect changes
  uses: dorny/paths-filter@v3
  id: changes
  with:
    filters: .github/path-filters.yml

- name: deploy to fly.io
  run: |
    if [ "${{ steps.changes.outputs.migrations }}" == "true" ]; then
      echo "🔄 migrations detected - will run via release_command before deployment"
    fi
    flyctl deploy --remote-only
path filters (.github/path-filters.yml):
migrations:
- "alembic/versions/**"
- "alembic/env.py"
- "alembic.ini"
this setup:
- detects when migrations change
- logs that migrations will run (for visibility)
- deploys normally
- fly.io automatically runs release_command before deployment
- migrations succeed → deployment proceeds
- migrations fail → deployment aborts, old version keeps running
alternative approaches considered#
option: post-deployment github actions job
run migrations via SSH after fly deployment completes:
- name: run database migrations
run: flyctl ssh console -a plyr-api -C "uv run alembic upgrade head"
why we didn't use this:
- migrations run AFTER app starts (brief window where app has wrong schema)
- requires implementing deployment polling logic
- SSH from CI is a security consideration
- fly.io release_command is simpler and runs migrations BEFORE deployment
option: neon branch-based migrations
use neon's branch features for zero-downtime migrations (test on branch, then promote):
why we didn't use this:
- adds complexity for marginal benefit at current scale
- our migrations are simple and fast (~3 seconds)
- can revisit when we have complex, long-running migrations
current capabilities#
migration isolation via release_command (already implemented):
- fly.io's release_command runs migrations in a separate temporary machine before deployment
- migrations complete before app serves traffic (no inconsistent state)
- deployment automatically aborts if migration fails
- this provides similar benefits to kubernetes init containers
multi-environment pipeline (already implemented):
- dev → staging → production progression via three neon databases
- migrations tested locally against neon dev first
- staging deployment validates migrations before production
- automated via GitHub Actions
future considerations#
as plyr.fm scales, we may want to explore:
neon branch-based migrations (for complex changes):
- test migrations on database branch first
- promote branch to production (instant swap)
- zero downtime, instant rollback
- useful for high-risk schema changes
automated smoke tests:
- run basic API health checks after migration completes
- verify critical queries still work
- alert if performance degrades significantly
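a sketch of what such a smoke test could look like (the /health endpoint appears elsewhere in this doc; the rest is illustrative):

# hypothetical post-migration smoke test, run from CI after deploy
import httpx

def smoke_test(base_url: str = "https://relay-api.fly.dev") -> None:
    # verify the app is serving traffic on the new schema
    resp = httpx.get(f"{base_url}/health", timeout=10.0)
    resp.raise_for_status()

if __name__ == "__main__":
    smoke_test()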
migration best practices#
before creating migration#
1. check current state

   # local
   uv run alembic current

   # production
   flyctl ssh console -a relay-api -C "uv run alembic current"

2. ensure schemas are in sync

   # generate test migration
   uv run alembic revision --autogenerate -m "test"

   # if the file is empty, schemas match
   # if the file has changes, schemas are out of sync (fix this first)

   # remove the test file
   rm alembic/versions/*_test.py
creating migration#
1. make model changes

   - edit files in src/backend/models/
   - keep changes focused (one logical change per migration)

2. generate migration

   uv run alembic revision --autogenerate -m "descriptive name"

3. review generated migration

   - alembic autogenerate is not perfect
   - verify the upgrade() function does what you expect
   - check for missing operations (autogenerate doesn't detect everything)
   - ensure downgrade() is correct

4. test locally

   # upgrade
   uv run alembic upgrade head

   # verify schema
   uv run python -c "from backend.models import Track; print(Track.__table__.columns)"

   # test downgrade
   uv run alembic downgrade -1
   uv run alembic upgrade head

5. test with actual app

   # start backend with new schema
   uv run uvicorn backend.main:app --reload

   # verify endpoints work
   curl http://localhost:8001/health
deploying migration#
1. commit migration file only

   git add alembic/versions/<hash>_<description>.py
   git commit -m "migration: <description>"

2. create PR and review

   - include migration details in the PR description
   - note any backward compatibility concerns
   - tag for review if complex

3. merge and deploy

   - merge to main
   - watch the deployment
   - migrations run automatically via release_command

4. verify deployment

   # check revision
   flyctl ssh console -a relay-api -C "uv run alembic current"

   # check app health
   curl https://relay-api.fly.dev/health

   # check logs
   flyctl logs --app relay-api
handling failed migrations#
if a migration fails during upgrade:

1. check what failed

   flyctl logs --app relay-api | grep alembic

2. check database state

   flyctl ssh console -a relay-api -C "uv run alembic current"

3. fix the issue

   - if the migration was partially applied, you may need manual SQL to fix it
   - if the migration didn't apply, fix it and re-run
   - if data is corrupted, you may need to restore from backup

4. options:

   a. downgrade and fix:

      flyctl ssh console -a relay-api -C "uv run alembic downgrade -1"
      # fix migration file locally
      # commit and redeploy

   b. create a fix migration:

      # create a new migration that fixes the issue
      uv run alembic revision -m "fix previous migration"
      # implement the fix in upgrade()
      # commit and deploy

   c. manual SQL fix:

      flyctl ssh console -a relay-api
      # connect to the database
      # run manual SQL to fix state
      # stamp to the correct revision
      # exit
zero-downtime migration patterns#
for production systems, some schema changes require special handling:
adding nullable columns (safe):
def upgrade():
op.add_column('tracks', sa.Column('new_field', sa.String(), nullable=True))
adding non-nullable columns (requires two-step):
# migration 1: add nullable
def upgrade():
op.add_column('tracks', sa.Column('new_field', sa.String(), nullable=True))
# migration 2: populate and make non-null
def upgrade():
op.execute("UPDATE tracks SET new_field = 'default' WHERE new_field IS NULL")
op.alter_column('tracks', 'new_field', nullable=False)
dropping columns (requires two-step):
# migration 1: stop using column in app code
# deploy app
# migration 2: drop column
def upgrade():
op.drop_column('tracks', 'old_field')
renaming columns (requires three-step):
# migration 1: add new column
def upgrade():
op.add_column('tracks', sa.Column('new_name', sa.String()))
op.execute("UPDATE tracks SET new_name = old_name")
# migration 2: switch app to use new column
# deploy app
# migration 3: drop old column
def upgrade():
op.drop_column('tracks', 'old_name')
references#
internal#
- plyr.fm dockerfile: Dockerfile
- plyr.fm fly config: fly.toml
- alembic config: alembic.ini
- alembic env: alembic/env.py
- models: src/backend/models/
external#
- alembic documentation: https://alembic.sqlalchemy.org/
- fly.io release commands: https://fly.io/docs/reference/configuration/#release_command
- fly.io ssh: https://fly.io/docs/flyctl/ssh/
- neon branching: https://neon.tech/docs/guides/branching
github actions#
- paths-filter action: https://github.com/dorny/paths-filter
- flyctl actions: https://github.com/superfly/flyctl-actions
last updated: 2025-12-05
status: fully automated via fly.io release_command ✓
owner: @zzstoatzz