1# database migrations 2 3## current state (automated ✓) 4 5plyr.fm uses **automated database migrations** via fly.io's `release_command`. 6 7### how it works 8 9``` 101. developer creates migration locally 11 122. commit and push to main 13 143. github actions triggers deployment 15 164. fly.io builds docker image 17 185. fly.io runs release_command BEFORE deploying 19 - temporary machine spins up 20 - runs: uv run alembic upgrade head 21 - if succeeds → proceed to deployment 22 - if fails → abort, keep old version running 23 246. new app version deploys with updated schema 25``` 26 27### configuration 28 29**fly.toml:** 30```toml 31[deploy] 32 release_command = "uv run alembic upgrade head" 33``` 34 35**benefits:** 36- zero manual intervention required 37- migrations run before new code serves traffic (no inconsistent state) 38- automatic rollback if migration fails 39- clear deployment logs showing migration output 40 41### how dev/prod database separation works 42 43plyr.fm uses **environment-based database configuration** to ensure migrations always target the correct database. 44 45**the key mechanism: `DATABASE_URL` environment variable** 46 47``` 48┌─────────────────────────────────────────────────────────────┐ 49│ alembic/env.py (migration runtime) │ 50│ │ 51│ 1. imports backend.config.settings │ 52│ 2. reads settings.database.url │ 53│ 3. sets alembic connection string │ 54│ │ 55│ config.set_main_option("sqlalchemy.url", settings.database.url) 56└─────────────────────────────────────────────────────────────┘ 57 58 59┌─────────────────────────────────────────────────────────────┐ 60│ src/backend/config.py (pydantic-settings) │ 61│ │ 62│ class Settings(BaseSettings): │ 63│ database_url: str = Field( │ 64│ default="postgresql+asyncpg://localhost/plyr" │ 65│ ) │ 66│ │ 67│ reads from DATABASE_URL environment variable │ 68└─────────────────────────────────────────────────────────────┘ 69 70 71 ┌───────────────┴───────────────┐ 72 │ │ 73 ▼ ▼ 74┌──────────────────────────┐ ┌──────────────────────────┐ 75│ local development │ │ production (fly.io) │ 76│ │ │ │ 77│ .env file: │ │ fly secrets: │ 78│ DATABASE_URL= │ │ DATABASE_URL= │ 79│ postgresql+asyncpg:// │ │ postgresql:// │ 80│ [neon dev connection] │ │ [neon prod connection] │ 81│ │ │ │ 82│ when you run: │ │ when fly.io runs: │ 83│ uv run alembic upgrade │ │ release_command: │ 84│ │ │ uv run alembic upgrade │ 85│ → migrates DEV db │ │ → migrates PROD db │ 86└──────────────────────────┘ └──────────────────────────┘ 87``` 88 89**why this is safe:** 90 911. **no shared configuration**: local and production environments have completely separate `DATABASE_URL` values 922. **environment-specific secrets**: production database URL is stored in fly.io secrets, never in code 933. **explicit context**: you cannot accidentally run a production migration locally because your local `DATABASE_URL` points to the Neon dev database 944. **explicit context**: fly.io cannot run migrations against your dev database because it only knows about the production `DATABASE_URL` 95 96**concrete example:** 97 98```bash 99# local development 100$ cat .env 101DATABASE_URL=postgresql+asyncpg://neon_user:***@ep-muddy-flower-98795112.us-east-2.aws.neon.tech/plyr-dev 102 103$ uv run alembic upgrade head 104# connects to neon dev database (plyr-dev) 105# migrates your development database 106 107# production (inside fly.io release machine) 108$ echo $DATABASE_URL 109postgresql://neon_user:***@ep-cold-butterfly-11920742.us-east-1.aws.neon.tech/plyr-prd 110 111$ uv run alembic upgrade head 112# connects to neon production database (plyr-prd) 113# migrates your production database 114``` 115 116**migration flow for each environment:** 117 118local development: 119``` 1201. developer edits model in src/backend/models/ 1212. runs: uv run alembic revision --autogenerate -m "description" 1223. alembic reads DATABASE_URL from .env (neon dev) 1234. generates migration by comparing: 124 - current model state (code) 125 - current database state (neon dev database) 1265. runs: uv run alembic upgrade head 1276. migration applies to dev database 128``` 129 130production deployment: 131``` 1321. developer commits migration file to git 1332. pushes to main branch 1343. github actions triggers deployment 1354. fly.io builds docker image (includes migration files) 1365. fly.io starts temporary release machine 1376. release machine has DATABASE_URL from fly secrets (production neon) 1387. release machine runs: uv run alembic upgrade head 1398. alembic reads DATABASE_URL (production neon) 1409. migration applies to production database 14110. if successful, deployment proceeds 14211. if failed, deployment aborts (old version keeps running) 143``` 144 145**test database (separate again):** 146 147tests use a third database entirely: 148 149```python 150# tests/conftest.py 151def test_database_url(worker_id: str) -> str: 152 return "postgresql+asyncpg://plyr_test:plyr_test@localhost:5433/plyr_test" 153``` 154 155this ensures: 156- tests never touch dev or prod databases 157- tests can run in parallel (separate databases per worker) 158- test data is isolated and can be cleared between tests 159 160**the complete picture:** 161 162``` 163┌──────────────────────────────────────────────────────────────┐ 164│ four separate databases (three neon instances + local test): │ 165│ │ 166│ 1. dev (neon: plyr-dev / muddy-flower-98795112) │ 167│ - for local development │ 168│ - set via .env: DATABASE_URL=postgresql+asyncpg://... │ 169│ - migrations run manually: uv run alembic upgrade head │ 170│ │ 171│ 2. staging (neon: plyr-stg / frosty-math-37367092) │ 172│ - for staging environment │ 173│ - set via fly secrets on relay-api-staging │ 174│ - migrations run automatically via release_command │ 175│ │ 176│ 3. prod (neon: plyr-prd / cold-butterfly-11920742) │ 177│ - for production traffic │ 178│ - set via fly secrets: DATABASE_URL=postgresql://... │ 179│ - migrations run automatically via release_command │ 180│ │ 181│ 4. test (localhost:5433/plyr_test) │ 182│ - for automated tests only │ 183│ - set via conftest.py fixture │ 184│ - schema created from models directly (no migrations) │ 185│ │ 186│ these databases never interact or share configuration │ 187└──────────────────────────────────────────────────────────────┘ 188``` 189 190### recent pain points (2025-11-02) 191 192when deploying timezone support migration `31e69ba0c570`: 193 1941. **dockerfile didn't include migration files** - had to create PR #14 to add `COPY alembic.ini` and `COPY alembic ./alembic` 1952. **alembic version tracking out of sync** - production database had `user_preferences` table but alembic thought version was older, causing "relation already exists" errors 1963. **manual stamp needed** - had to run `flyctl ssh console -a relay-api -C "uv run alembic stamp 9e8c7aa5b945"` to fix version tracking 1974. **manual migration execution** - had to run `flyctl ssh console -a relay-api -C "uv run alembic upgrade head"` after deployment 1985. **blocked deployment** - couldn't deploy until all manual steps completed 199 200this took ~30 minutes of manual intervention for what should be automatic. 201 202## how reference project N does it (the right way) 203 204reference project N has a sophisticated automated migration system: 205 206### architecture overview 207 208``` 209merge to main 210211detect changed migration files (paths-filter) 212213build docker image with migrations included 214215run migrations via google cloud run jobs 216 ├─ separate job per database (auth, background, events) 217 ├─ jobs execute before app deployment 218 ├─ jobs block until migration completes 219 └─ deployment fails if migration fails 220221update deployment manifests 222223kubernetes/flux picks up new image 224225app starts serving traffic with new schema 226``` 227 228### key components 229 230**1. migration detection** 231```yaml 232# .github/paths-filter-pg-migrations.yml 233service_a: 234 - 'src/project/service_a/migrations/**' 235service_b: 236 - 'src/project/service_b/migrations/**' 237 - 'src/project/utilities/database.py' 238``` 239 240uses `dorny/paths-filter@v3` to detect which migrations changed in a PR. 241 242**2. cloud run migration jobs** 243 244separate jobs for running migrations: 245- `service-a-db-migration-dev` 246- `service-a-db-migration-stg` 247- `service-b-events-migration-dev` 248- etc. 249 250jobs are updated with new image tag and executed with `--wait` flag. 251 252**3. python migration utilities** 253 254```python 255# from reference project N's database.py 256def alembic_upgrade(database: str, revision: str = "head"): 257 """Run alembic upgrades on database""" 258 import alembic.command 259 alembic.command.upgrade(alembic_config(database), revision) 260``` 261 262clean python API for running migrations programmatically. 263 264**4. cli commands** 265 266```bash 267# reference project N provides CLI for manual operations 268project-cli database upgrade 269project-cli database downgrade -r <revision> 270project-cli database reset 271``` 272 273**5. safety mechanisms** 274 275- migrations run in dedicated jobs (not in app containers) 276- `ALEMBIC_LOCK` prevents concurrent execution 277- global roles updated after migrations complete 278- separate environments (dev, stg, prod) test migrations before production 279- deployment fails if migration fails 280 281### why this works 282 283**isolation**: migrations run in separate containers, not in the app 284- prevents race conditions between multiple app instances 285- allows long-running migrations without blocking app startup 286- easier to debug when migrations fail 287 288**ordering**: migrations complete before app deployment 289- ensures schema is ready when app starts 290- no "app running on old schema" bugs 291- rollback is automatic (deployment doesn't proceed if migration fails) 292 293**observability**: cloud run jobs provide logs and status 294- easy to see what migration ran and when 295- clear failure messages 296- can re-run jobs manually if needed 297 298**automation**: zero manual intervention 299- no SSH required 300- no human error 301- consistent across environments 302 303## implementation history 304 305### what we tried 306 307**initial approach (broken)**: bash wrapper script 308```toml 309[deploy] 310 release_command = './scripts/migrate.sh' 311``` 312 313**problem**: timed out during VM startup, likely due to script overhead or environment setup issues. 314 315**solution**: direct command execution 316```toml 317[deploy] 318 release_command = "uv run alembic upgrade head" 319``` 320 321**result**: works perfectly. migrations complete in ~3 seconds, no timeouts. 322 323**key lesson**: fly.io's `release_command` works reliably when you give it a direct command instead of wrapping it in a shell script. the wrapper script was the problem, not VM resources or timeouts. 324 325### current implementation (working ✓) 326 327**fly.toml:** 328```toml 329[deploy] 330 release_command = "uv run alembic upgrade head" 331``` 332 333**github actions (.github/workflows/deploy-backend.yml):** 334```yaml 335- name: detect changes 336 uses: dorny/paths-filter@v3 337 id: changes 338 with: 339 filters: .github/path-filters.yml 340 341- name: deploy to fly.io 342 run: | 343 if [ "${{ steps.changes.outputs.migrations }}" == "true" ]; then 344 echo "🔄 migrations detected - will run via release_command before deployment" 345 fi 346 flyctl deploy --remote-only 347``` 348 349**path filters (.github/path-filters.yml):** 350```yaml 351migrations: 352 - "alembic/versions/**" 353 - "alembic/env.py" 354 - "alembic.ini" 355``` 356 357this setup: 3581. detects when migrations change 3592. logs that migrations will run (for visibility) 3603. deploys normally 3614. fly.io automatically runs `release_command` before deployment 3625. migrations succeed → deployment proceeds 3636. migrations fail → deployment aborts, old version keeps running 364 365### alternative approaches considered 366 367**option: post-deployment github actions job** 368 369run migrations via SSH after fly deployment completes: 370 371```yaml 372- name: run database migrations 373 run: flyctl ssh console -a plyr-api -C "uv run alembic upgrade head" 374``` 375 376**why we didn't use this**: 377- migrations run AFTER app starts (brief window where app has wrong schema) 378- requires implementing deployment polling logic 379- SSH from CI is a security consideration 380- fly.io `release_command` is simpler and runs migrations BEFORE deployment 381 382**option: neon branch-based migrations** 383 384use neon's branch features for zero-downtime migrations (test on branch, then promote): 385 386**why we didn't use this**: 387- adds complexity for marginal benefit at current scale 388- our migrations are simple and fast (~3 seconds) 389- can revisit when we have complex, long-running migrations 390 391## current capabilities 392 393**migration isolation via release_command** (already implemented): 394- fly.io's `release_command` runs migrations in a separate temporary machine before deployment 395- migrations complete before app serves traffic (no inconsistent state) 396- deployment automatically aborts if migration fails 397- this provides similar benefits to kubernetes init containers 398 399**multi-environment pipeline** (already implemented): 400- dev → staging → production progression via three neon databases 401- migrations tested locally against neon dev first 402- staging deployment validates migrations before production 403- automated via GitHub Actions 404 405## future considerations 406 407as plyr.fm scales, we may want to explore: 408 409**neon branch-based migrations** (for complex changes): 410- test migrations on database branch first 411- promote branch to production (instant swap) 412- zero downtime, instant rollback 413- useful for high-risk schema changes 414 415**automated smoke tests**: 416- run basic API health checks after migration completes 417- verify critical queries still work 418- alert if performance degrades significantly 419 420## migration best practices 421 422### before creating migration 423 4241. **check current state** 425 ```bash 426 # local 427 uv run alembic current 428 429 # production 430 flyctl ssh console -a relay-api -C "uv run alembic current" 431 ``` 432 4332. **ensure schemas are in sync** 434 ```bash 435 # generate test migration 436 uv run alembic revision --autogenerate -m "test" 437 438 # if file is empty, schemas match 439 # if file has changes, schemas are out of sync (fix this first) 440 441 # remove test file 442 rm alembic/versions/*_test.py 443 ``` 444 445### creating migration 446 4471. **make model changes** 448 - edit files in `src/backend/models/` 449 - keep changes focused (one logical change per migration) 450 4512. **generate migration** 452 ```bash 453 uv run alembic revision --autogenerate -m "descriptive name" 454 ``` 455 4563. **review generated migration** 457 - alembic autogenerate is not perfect 458 - verify the upgrade() function does what you expect 459 - check for missing operations (autogenerate doesn't detect everything) 460 - ensure downgrade() is correct 461 4624. **test locally** 463 ```bash 464 # upgrade 465 uv run alembic upgrade head 466 467 # verify schema 468 uv run python -c "from backend.models import Track; print(Track.__table__.columns)" 469 470 # test downgrade 471 uv run alembic downgrade -1 472 uv run alembic upgrade head 473 ``` 474 4755. **test with actual app** 476 ```bash 477 # start backend with new schema 478 uv run uvicorn backend.main:app --reload 479 480 # verify endpoints work 481 curl http://localhost:8001/health 482 ``` 483 484### deploying migration 485 4861. **commit migration file only** 487 ```bash 488 git add alembic/versions/<hash>_<description>.py 489 git commit -m "migration: <description>" 490 ``` 491 4922. **create PR and review** 493 - include migration details in PR description 494 - note any backward compatibility concerns 495 - tag for review if complex 496 4973. **merge and deploy** 498 - merge to main 499 - watch deployment 500 - run migration (manual for now, automated soon) 501 5024. **verify deployment** 503 ```bash 504 # check revision 505 flyctl ssh console -a relay-api -C "uv run alembic current" 506 507 # check app health 508 curl https://relay-api.fly.dev/health 509 510 # check logs 511 flyctl logs --app relay-api 512 ``` 513 514### handling failed migrations 515 516**if migration fails during upgrade**: 517 5181. **check what failed** 519 ```bash 520 flyctl logs --app relay-api | grep alembic 521 ``` 522 5232. **check database state** 524 ```bash 525 flyctl ssh console -a relay-api -C "uv run alembic current" 526 ``` 527 5283. **fix the issue** 529 - if migration was partially applied, may need manual SQL to fix 530 - if migration didn't apply, fix and re-run 531 - if data is corrupted, may need to restore from backup 532 5334. **options**: 534 535 a. **downgrade and fix**: 536 ```bash 537 flyctl ssh console -a relay-api -C "uv run alembic downgrade -1" 538 # fix migration file locally 539 # commit and redeploy 540 ``` 541 542 b. **create fix migration**: 543 ```bash 544 # create new migration that fixes the issue 545 uv run alembic revision -m "fix previous migration" 546 # implement fix in upgrade() 547 # commit and deploy 548 ``` 549 550 c. **manual SQL fix**: 551 ```bash 552 flyctl ssh console -a relay-api 553 # connect to database 554 # run manual SQL to fix state 555 # stamp to correct revision 556 # exit 557 ``` 558 559### zero-downtime migration patterns 560 561for production systems, some schema changes require special handling: 562 563**adding nullable columns** (safe): 564```python 565def upgrade(): 566 op.add_column('tracks', sa.Column('new_field', sa.String(), nullable=True)) 567``` 568 569**adding non-nullable columns** (requires two-step): 570```python 571# migration 1: add nullable 572def upgrade(): 573 op.add_column('tracks', sa.Column('new_field', sa.String(), nullable=True)) 574 575# migration 2: populate and make non-null 576def upgrade(): 577 op.execute("UPDATE tracks SET new_field = 'default' WHERE new_field IS NULL") 578 op.alter_column('tracks', 'new_field', nullable=False) 579``` 580 581**dropping columns** (requires two-step): 582```python 583# migration 1: stop using column in app code 584# deploy app 585 586# migration 2: drop column 587def upgrade(): 588 op.drop_column('tracks', 'old_field') 589``` 590 591**renaming columns** (requires three-step): 592```python 593# migration 1: add new column 594def upgrade(): 595 op.add_column('tracks', sa.Column('new_name', sa.String())) 596 op.execute("UPDATE tracks SET new_name = old_name") 597 598# migration 2: switch app to use new column 599# deploy app 600 601# migration 3: drop old column 602def upgrade(): 603 op.drop_column('tracks', 'old_name') 604``` 605 606## references 607 608### internal 609- plyr.fm dockerfile: `Dockerfile` 610- plyr.fm fly config: `fly.toml` 611- alembic config: `alembic.ini` 612- alembic env: `alembic/env.py` 613- models: `src/backend/models/` 614 615### external 616- alembic documentation: https://alembic.sqlalchemy.org/ 617- fly.io release commands: https://fly.io/docs/reference/configuration/#release_command 618- fly.io ssh: https://fly.io/docs/flyctl/ssh/ 619- neon branching: https://neon.tech/docs/guides/branching 620 621### github actions 622- paths-filter action: https://github.com/dorny/paths-filter 623- flyctl actions: https://github.com/superfly/flyctl-actions 624 625--- 626 627**last updated**: 2025-12-05 628**status**: fully automated via fly.io release_command ✓ 629**owner**: @zzstoatzz