tweak prompts

Changed files
+884 -884
.github
.status_history
+10 -4
.github/workflows/status-maintenance.yml
··· 46 46 with: 47 47 anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} 48 48 claude_args: | 49 - --allowedTools "Read,Write,Edit,Bash" 49 + --allowedTools "Read,Write,Edit,Bash,Fetch" 50 50 prompt: | 51 51 you are maintaining the plyr.fm (pronounced "player FM") project status file. 52 52 ··· 61 61 62 62 before writing any transcript, understand the timeline: 63 63 1. run `git log --oneline --since="1 week ago"` to see recent commits 64 - 2. run `git log --oneline -20` to see the last 20 commits with dates 64 + 2. run `git log --oneline -30` to see the last 30 commits with dates 65 65 3. note the actual dates of changes - don't present old work as "just shipped" 66 66 67 67 ## task 2: archive old sections (if needed) ··· 84 84 if skip_audio is false: 85 85 1. write a 2-3 minute podcast script to podcast_script.txt 86 86 - two hosts having a casual conversation 87 - - focus on shipped features from the top of STATUS.md 87 + - host personalities should be inspired by Gilfoyle and Dinesh from Silicon Valley 88 + - focus on recently shipped features from the git history (except for the first episode) 88 89 - format: "Host: ..." and "Cohost: ..." lines 89 90 - IMPORTANT: "plyr.fm" is pronounced "player FM" (not "plir" or spelling it out) 91 + - do not over-sensationalize or over-compliment the project's significance, achievements, or progress 90 92 91 93 temporal awareness: 92 94 - use the git history to understand WHEN things actually shipped 93 95 - if this is the first episode, acknowledge the project started in november 2025 94 - - reference time correctly: "last week we shipped X" vs "back in november we built Y" 96 + - reference time correctly: "last week they shipped X" vs "back in november they built Y" 95 97 - don't present month-old work as if it just happened 96 98 97 99 tone guidelines: ··· 100 102 - use intuitive analogies to explain technical concepts in terms of everyday experience 101 103 - matter-of-fact delivery, not hype-y or marketing-speak 102 104 - brief, conversational - like two friends catching up on what shipped 105 + 106 + read upstream documentation: 107 + - docs/**.md contains a lot of useful information 108 + - you can Fetch atproto.com to understand primitives that are relevant to the project 103 109 104 110 2. run: uv run scripts/generate_tts.py podcast_script.txt update.wav 105 111
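the temporal-awareness steps above boil down to collecting dated history before writing any transcript. a minimal sketch of that collection step in python, assuming only the git CLI — this helper is illustrative, not a file in the repo (the workflow just runs the git commands directly):

```python
import subprocess

def recent_commits(since: str = "1 week ago") -> list[str]:
    """collect dated one-line commits so the transcript can distinguish
    'just shipped' work from older changes (hypothetical helper)."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--format=%h %ad %s", "--date=short"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

if __name__ == "__main__":
    for line in recent_commits():
        print(line)  # e.g. "abc1234 2025-11-29 fix upload progress"
```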
-878
.status_history/2025-11.md
··· 1 - ### detailed history 2 - 3 - ### Queue hydration + ATProto token hardening (Nov 12, 2025) 4 - 5 - **Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401 6 - when multiple requests refreshed an expired ATProto token simultaneously. 7 - 8 - **What shipped:** 9 - - Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2 10 - for every track. Queue payloads now pull art directly from Postgres, with a one-time 11 - fallback for legacy rows. 12 - - Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead 13 - of per-request GETs. 14 - - Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits 15 - `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This 16 - removes the race that caused the batch restore flow to intermittently 500/401. 17 - 18 - **Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto restore flows are now reliable under concurrent use, and Logfire no longer shows 500s 19 - from the PDS. 20 - 21 - ### Liked tracks feature (PR #157, Nov 11, 2025) 22 - 23 - - ✅ server-side persistent collections 24 - - ✅ ATProto record publication for cross-platform visibility 25 - - ✅ UI for adding/removing tracks from liked collection 26 - - ✅ like counts displayed in track responses and analytics (#170) 27 - - ✅ analytics cards now clickable links to track detail pages (#171) 28 - - ✅ liked state shown on artist page tracks (#163) 29 - 30 - ### Upload streaming + progress UX (PR #182, Nov 11, 2025) 31 - 32 - - Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress 33 - toasts (critical for >50 MB mixes on mobile). 34 - - Upload form now clears only after the request succeeds; failed attempts leave the 35 - form intact so users don't lose metadata. 36 - - Backend writes uploads/images to temp files in 8 MB chunks before handing them to the 37 - storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes. 38 - - Deployment verified locally and by rerunning the exact repro Stella hit (85 minute 39 - mix from mobile). 40 - 41 - ### transcoder API deployment (PR #156, Nov 11, 2025) 42 - 43 - **standalone Rust transcoding service** 🎉 44 - - **deployed**: https://plyr-transcoder.fly.dev/ 45 - - **purpose**: convert AIFF/FLAC/etc. 
to MP3 for browser compatibility 46 - - **technology**: Axum + ffmpeg + Docker 47 - - **security**: `X-Transcoder-Key` header authentication (shared secret) 48 - - **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds) 49 - - **architecture**: 50 - - 2 Fly machines for high availability 51 - - auto-stop/start for cost efficiency 52 - - stateless design (no R2 integration yet) 53 - - 320kbps MP3 output with proper ID3 tags 54 - - **status**: deployed and tested, ready for integration into plyr.fm upload pipeline 55 - - **next steps**: wire into backend with R2 integration and job queue (see issue #153) 56 - 57 - ### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025) 58 - 59 - **format validation improvements** 60 - - **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox 61 - - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED` 62 - - users could upload files but they wouldn't play in most browsers 63 - - **immediate solution**: reject AIFF/AIF uploads at both backend and frontend 64 - - removed AIFF/AIF from AudioFormat enum 65 - - added format hints to upload UI: "supported: mp3, wav, m4a" 66 - - client-side validation with helpful error messages 67 - - **long-term solution**: deployed standalone transcoder service (see above) 68 - - separate Rust/Axum service with ffmpeg 69 - - accepts all formats, converts to browser-compatible MP3 70 - - integration into upload pipeline pending (issue #153) 71 - 72 - **observability improvements**: 73 - - added logfire instrumentation to upload background tasks 74 - - added logfire spans to R2 storage operations 75 - - documented logfire querying patterns in `docs/logfire-querying.md` 76 - 77 - ### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025) 78 - 79 - Eliminated event loop blocking across backend with three critical PRs: 80 - 81 - 1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3 82 - - portal page load time: 2+ seconds → ~200ms 83 - - root cause: `track.image_url` was blocking on serial R2 HEAD requests 84 - 85 - 2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups 86 - - homepage load time: 2-6 seconds → 200-400ms 87 - - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each) 88 - - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads 89 - 90 - 3. 
**PR #151: async storage writes/deletes** - made save/delete operations non-blocking 91 - - R2: switched to `aioboto3` for uploads/deletes (async S3 operations) 92 - - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks) 93 - - impact: multi-MB uploads no longer monopolize worker thread, constant memory usage 94 - 95 - ### cover art support (PRs #123-126, #132-139) 96 - - ✅ track cover image upload and storage (separate R2 bucket) 97 - - ✅ image display on track pages and player 98 - - ✅ Open Graph meta tags for track sharing 99 - - ✅ mobile-optimized layouts with cover art 100 - - ✅ sticky bottom player on mobile with cover 101 - 102 - ### track detail pages (PR #164, Nov 12, 2025) 103 - 104 - - ✅ dedicated track detail pages with large cover art 105 - - ✅ play button updates queue state correctly (#169) 106 - - ✅ liked state loaded efficiently via server-side fetch 107 - - ✅ mobile-optimized layouts with proper scrolling constraints 108 - - ✅ origin validation for image URLs (#168) 109 - 110 - ### mobile UI improvements (PRs #159-185, Nov 11-12, 2025) 111 - 112 - - ✅ compact action menus and better navigation (#161) 113 - - ✅ improved mobile responsiveness (#159) 114 - - ✅ consistent button layouts across mobile/desktop (#176-181, #185) 115 - - ✅ always show play count and like count on mobile (#177) 116 - - ✅ login page UX improvements (#174-175) 117 - - ✅ liked page UX improvements (#173) 118 - - ✅ accent color for liked tracks (#160) 119 - 120 - ### queue management improvements (PRs #110-113, #115) 121 - - ✅ visual feedback on queue add/remove 122 - - ✅ toast notifications for queue actions 123 - - ✅ better error handling for queue operations 124 - - ✅ improved shuffle and auto-advance UX 125 - 126 - ### infrastructure and tooling 127 - - ✅ R2 bucket separation: audio-prod and images-prod (PR #124) 128 - - ✅ admin script for content moderation (`scripts/delete_track.py`) 129 - - ✅ bluesky attribution link in header 130 - - ✅ changelog target added (#183) 131 - - ✅ documentation updates (#158) 132 - - ✅ track metadata edits now persist correctly (#162) 133 - 134 - ## immediate priorities 135 - 136 - ### high priority features 137 - 1. **audio transcoding pipeline integration** (issue #153) 138 - - ✅ standalone transcoder service deployed at https://plyr-transcoder.fly.dev/ 139 - - ✅ Rust/Axum service with ffmpeg, tested with 85-minute files 140 - - ✅ secure auth via X-Transcoder-Key header 141 - - ⏳ next: integrate into plyr.fm upload pipeline 142 - - backend calls transcoder API for unsupported formats 143 - - queue-based job system for async processing 144 - - R2 integration (fetch original, store MP3) 145 - - maintain original file hash for deduplication 146 - - handle transcoding failures gracefully 147 - 148 - ### critical bugs 149 - 1. **upload reliability** (issue #147): upload returns 200 but file missing from R2, no error logged 150 - - priority: high (data loss risk) 151 - - need better error handling and retry logic in background upload task 152 - 153 - 2. **database connection pool SSL errors**: intermittent failures on first request 154 - - symptom: `/tracks/` returns 500 on first request, succeeds after 155 - - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts 156 - - documented in `docs/logfire-querying.md` 157 - 158 - ### performance optimizations 159 - 3. 
**persist concrete file extensions in database**: currently brute-force probing all supported formats on read 160 - - already know `Track.file_type` and image format during upload 161 - - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam 162 - - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially) 163 - 164 - 4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task 165 - - multi-GB uploads risk OOM 166 - - stream from `UploadFile.file` → storage backend for constant memory usage 167 - 168 - ### new features 169 - 5. **content-addressable storage** (issue #146) 170 - - hash-based file storage for automatic deduplication 171 - - reduces storage costs when multiple artists upload same file 172 - - enables content verification 173 - 174 - 6. **liked tracks feature** (issue #144): design schema and ATProto record format 175 - - server-side persistent collections 176 - - ATProto record publication for cross-platform visibility 177 - - UI for adding/removing tracks from liked collection 178 - 179 - ## open issues by timeline 180 - 181 - ### immediate 182 - - issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3) 183 - - issue #147: upload reliability bug (data loss risk) 184 - - issue #144: likes feature for personal collections 185 - 186 - ### short-term 187 - - issue #146: content-addressable storage (hash-based deduplication) 188 - - issue #24: implement play count abuse prevention 189 - - database connection pool tuning (SSL errors) 190 - - file extension persistence in database 191 - 192 - ### medium-term 193 - - issue #39: postmortem - cross-domain auth deployment and remaining security TODOs 194 - - issue #46: consider removing init_db() from lifespan in favor of migration-only approach 195 - - issue #56: design public developer API and versioning 196 - - issue #57: support multiple audio item types (voice memos/snippets) 197 - - issue #122: fullscreen player for immersive playback 198 - 199 - ### long-term 200 - - migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata) 201 - - publish to multiple ATProto AppViews for cross-platform visibility 202 - - explore ATProto-native notifications (replace Bluesky DM bot) 203 - - realtime queue syncing across devices via SSE/WebSocket 204 - - artist analytics dashboard improvements 205 - - issue #44: modern music streaming feature parity 206 - 207 - ## technical state 208 - 209 - ### architecture 210 - 211 - **backend** 212 - - language: Python 3.11+ 213 - - framework: FastAPI with uvicorn 214 - - database: Neon PostgreSQL (serverless, fully managed) 215 - - storage: Cloudflare R2 (S3-compatible object storage) 216 - - hosting: Fly.io (2x shared-cpu VMs, auto-scaling) 217 - - observability: Pydantic Logfire (traces, metrics, logs) 218 - - auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto) 219 - 220 - **frontend** 221 - - framework: SvelteKit (latest v2.43.2) 222 - - runtime: Bun (fast JS runtime) 223 - - hosting: Cloudflare Pages (edge network) 224 - - styling: vanilla CSS with lowercase aesthetic 225 - - state management: Svelte 5 runes ($state, $derived, $effect) 226 - 227 - **deployment** 228 - - ci/cd: GitHub Actions 229 - - backend: automatic on main branch merge (fly.io deploy) 230 - - frontend: automatic on every push to main (cloudflare pages) 231 - - migrations: automated via fly.io release_command 232 - - environments: dev → staging → production (full 
separation) 233 - - versioning: nebula timestamp format (YYYY.MMDD.HHMMSS) 234 - 235 - **key dependencies** 236 - - atproto: forked SDK for OAuth and record management 237 - - sqlalchemy: async ORM for postgres 238 - - alembic: database migrations 239 - - boto3/aioboto3: R2 storage client 240 - - logfire: observability (FastAPI + SQLAlchemy instrumentation) 241 - - httpx: async HTTP client 242 - 243 - **what's working** 244 - 245 - **core functionality** 246 - - ✅ ATProto OAuth 2.1 authentication with encrypted state 247 - - ✅ secure session management via HttpOnly cookies (XSS protection) 248 - - ✅ developer tokens with independent OAuth grants (programmatic API access) 249 - - ✅ platform stats endpoint and homepage display (plays, tracks, artists) 250 - - ✅ Media Session API for CarPlay, lock screens, control center 251 - - ✅ timed comments on tracks with clickable timestamps 252 - - ✅ account deletion with explicit confirmation 253 - - ✅ artist profiles synced with Bluesky (avatar, display name, handle) 254 - - ✅ track upload with streaming to prevent OOM 255 - - ✅ track edit (title, artist, album, features metadata) 256 - - ✅ track deletion with cascade cleanup 257 - - ✅ audio streaming via HTML5 player with 307 redirects to R2 CDN 258 - - ✅ track metadata published as ATProto records (fm.plyr.track namespace) 259 - - ✅ play count tracking with threshold (30% or 30s, whichever comes first) 260 - - ✅ like functionality with counts 261 - - ✅ artist analytics dashboard 262 - - ✅ queue management (shuffle, auto-advance, reorder) 263 - - ✅ mobile-optimized responsive UI 264 - - ✅ cross-tab queue synchronization via BroadcastChannel 265 - - ✅ share tracks via URL with Open Graph previews (including cover art) 266 - - ✅ image URL caching in database (eliminates N+1 R2 calls) 267 - - ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error mes 268 - sages) 269 - - ✅ standalone audio transcoding service deployed and verified (see issue #153) 270 - - ✅ Bluesky embed player UI changes implemented (pending upstream social-app PR) 271 - - ✅ admin content moderation script for removing inappropriate uploads 272 - - ✅ copyright moderation system (AuDD fingerprinting, review workflow, violation tracking) 273 - - ✅ ATProto labeler for copyright violations (queryLabels, subscribeLabels XRPC endpoints) 274 - - ✅ admin UI for reviewing flagged tracks with htmx (plyr-moderation.fly.dev/admin) 275 - 276 - **albums** 277 - - ✅ album database schema with track relationships 278 - - ✅ album browsing pages (`/u/{handle}` shows discography) 279 - - ✅ album detail pages (`/u/{handle}/album/{slug}`) with full track lists 280 - - ✅ album cover art upload and display 281 - - ✅ server-side rendering for SEO 282 - - ✅ rich Open Graph metadata for link previews (music.album type) 283 - - ✅ long album title handling (100-char slugs, CSS truncation) 284 - - ⏸ ATProto records for albums (deferred, see issue #221) 285 - 286 - **frontend architecture** 287 - - ✅ server-side data loading (`+page.server.ts`) for artist and album pages 288 - - ✅ client-side data loading (`+page.ts`) for auth-dependent pages 289 - - ✅ centralized auth manager (`lib/auth.svelte.ts`) 290 - - ✅ layout-level auth state (`+layout.ts`) shared across all pages 291 - - ✅ eliminated "flash of loading" via proper load functions 292 - - ✅ consistent auth patterns (no scattered localStorage calls) 293 - 294 - **deployment (fully automated)** 295 - - **production**: 296 - - frontend: https://plyr.fm (cloudflare pages) 297 - - backend: 
https://relay-api.fly.dev (fly.io: 2 machines, 1GB RAM, 1 shared CPU, min 1 running) 298 - - database: neon postgresql 299 - - storage: cloudflare R2 (audio-prod and images-prod buckets) 300 - - deploy: github release → automatic 301 - 302 - - **staging**: 303 - - backend: https://api-stg.plyr.fm (fly.io: relay-api-staging) 304 - - frontend: https://stg.plyr.fm (cloudflare pages: plyr-fm-stg) 305 - - database: neon postgresql (relay-staging) 306 - - storage: cloudflare R2 (audio-stg bucket) 307 - - deploy: push to main → automatic 308 - 309 - - **development**: 310 - - backend: localhost:8000 311 - - frontend: localhost:5173 312 - - database: neon postgresql (relay-dev) 313 - - storage: cloudflare R2 (audio-dev and images-dev buckets) 314 - 315 - - **developer tooling**: 316 - - `just serve` - run backend locally 317 - - `just dev` - run frontend locally 318 - - `just test` - run test suite 319 - - `just release` - create production release (backend + frontend) 320 - - `just release-frontend-only` - deploy only frontend changes (added Nov 13) 321 - 322 - ### what's in progress 323 - 324 - **immediate work** 325 - - investigating playback auto-start behavior (#225) 326 - - page refresh sometimes starts playing immediately 327 - - may be related to queue state restoration or localStorage caching 328 - - `autoplay_next` preference not being respected in all cases 329 - - liquid glass effects as user-configurable setting (#186) 330 - 331 - **active research** 332 - - transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md) 333 - - content moderation systems (#166, #167, #393 - takedown state representation) 334 - - PWA capabilities and offline support (#165) 335 - 336 - ### known issues 337 - 338 - **player behavior** 339 - - playback auto-start on refresh (#225) 340 - - sometimes plays immediately after page load 341 - - investigating localStorage/queue state persistence 342 - - may not respect `autoplay_next` preference in all scenarios 343 - 344 - **missing features** 345 - - no ATProto records for albums yet (#221 - consciously deferred) 346 - - no track genres/tags/descriptions yet (#155) 347 - - no AIFF/AIF transcoding support (#153) 348 - - no PWA installation prompts (#165) 349 - - no fullscreen player view (#122) 350 - - no public API for third-party integrations (#56) 351 - 352 - **technical debt** 353 - - multi-tab playback synchronization could be more robust 354 - - queue state conflicts can occur with rapid operations 355 - 356 - ### technical decisions 357 - 358 - **why Python/FastAPI instead of Rust?** 359 - - rapid prototyping velocity during MVP phase 360 - - rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic) 361 - - excellent async support with asyncio 362 - - lower barrier to contribution 363 - - trade-off: accepting higher latency for faster development 364 - - future: can migrate hot paths to Rust if needed (transcoding service already planned) 365 - 366 - **why Fly.io instead of AWS/GCP?** 367 - - simple deployment model (dockerfile → production) 368 - - automatic SSL/TLS certificates 369 - - built-in global load balancing 370 - - reasonable pricing for MVP ($5/month) 371 - - easy migration path to larger providers later 372 - - trade-off: vendor-specific features, less control 373 - 374 - **why Cloudflare R2 instead of S3?** 375 - - zero egress fees (critical for audio streaming) 376 - - S3-compatible API (easy migration if needed) 377 - - integrated CDN for fast delivery 378 - - significantly cheaper than S3 for bandwidth-heavy workloads 
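the connection-pool fix prescribed under critical bugs above (`pool_pre_ping=True`, tuned `pool_recycle`) looks roughly like this with SQLAlchemy's async engine — a sketch with a placeholder DSN and an illustrative recycle window, not the project's actual values:

```python
from sqlalchemy.ext.asyncio import create_async_engine

# pool_pre_ping issues a lightweight ping before handing out a pooled
# connection, so a connection Neon has already dropped is transparently
# replaced instead of failing the first request with an SSL error.
# pool_recycle refreshes connections before Neon's idle timeout;
# 300 seconds here is a guess, not the tuned production setting.
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@host/dbname",  # placeholder DSN
    pool_pre_ping=True,
    pool_recycle=300,
)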
379 - 380 - **why forked atproto SDK?** 381 - - upstream SDK lacked OAuth 2.1 support 382 - - needed custom record management patterns 383 - - maintains compatibility with ATProto spec 384 - - contributes improvements back when possible 385 - 386 - **why SvelteKit instead of React/Next.js?** 387 - - Svelte 5 runes provide excellent reactivity model 388 - - smaller bundle sizes (critical for mobile) 389 - - less boilerplate than React 390 - - SSR + static generation flexibility 391 - - modern DX with TypeScript 392 - 393 - **why Neon instead of self-hosted Postgres?** 394 - - serverless autoscaling (no capacity planning) 395 - - branch-per-PR workflow (preview databases) 396 - - automatic backups and point-in-time recovery 397 - - generous free tier for MVP 398 - - trade-off: higher latency than co-located DB, but acceptable 399 - 400 - **why reject AIFF instead of transcoding immediately?** 401 - - MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling 402 - - user communication: better to be upfront about limitations than silent failures 403 - - resource management: transcoding is CPU-intensive, needs proper worker architecture 404 - - future flexibility: can add transcoding as optional feature (high-quality uploads → MP3 delivery) 405 - - trade-off: some users can't upload AIFF now, but those who can upload MP3 have working experience 406 - 407 - **why async everywhere?** 408 - - event loop performance: single-threaded async handles high concurrency 409 - - I/O-bound workload: most time spent waiting on network/disk 410 - - recent work (PRs #149-151) eliminated all blocking operations 411 - - alternative: thread pools for blocking I/O, but increases complexity 412 - - trade-off: debugging async code harder than sync, but worth throughput gains 413 - 414 - **why anyio.Path over thread pools?** 415 - - true async I/O: `anyio` uses OS-level async file operations where available 416 - - constant memory: chunked reads/writes (64KB) prevent OOM on large files 417 - - thread pools: would work but less efficient, more context switching 418 - - trade-off: anyio API slightly different from stdlib `pathlib`, but cleaner async semantics 419 - 420 - ## cost structure 421 - 422 - current monthly costs: ~$5-6 423 - 424 - - cloudflare pages: $0 (free tier) 425 - - cloudflare R2: ~$0.16 (storage + operations, no egress fees) 426 - - fly.io production: $5.00 (2x shared-cpu-1x VMs with auto-stop) 427 - - fly.io staging: $0 (auto-stop, only runs during testing) 428 - - neon: $0 (free tier, 0.5 CPU, 512MB RAM, 3GB storage) 429 - - logfire: $0 (free tier) 430 - - domain: $12/year (~$1/month) 431 - 432 - ## deployment URLs 433 - 434 - - **production frontend**: https://plyr.fm 435 - - **production backend**: https://relay-api.fly.dev (redirects to https://api.plyr.fm) 436 - - **staging backend**: https://api-stg.plyr.fm 437 - - **staging frontend**: https://stg.plyr.fm 438 - - **repository**: https://github.com/zzstoatzz/plyr.fm (private) 439 - - **monitoring**: https://logfire-us.pydantic.dev/zzstoatzz/relay 440 - - **bluesky**: https://bsky.app/profile/plyr.fm 441 - - **latest release**: 2025.1129.214811 442 - 443 - ## health indicators 444 - 445 - **production status**: ✅ healthy 446 - - uptime: consistently available 447 - - response times: <500ms p95 for API endpoints 448 - - error rate: <1% (mostly invalid OAuth states) 449 - - storage: ~12 tracks uploaded, functioning correctly 450 - 451 - **key metrics** 452 - - total tracks: ~12 453 - - total artists: ~3 454 - - play 
counts: tracked per-track 455 - - storage used: <1GB R2 456 - - database size: <10MB postgres 457 - 458 - ## next session prep 459 - 460 - **context for new agent:** 461 - 1. Fixed R2 image upload path mismatch, ensuring images save with the correct prefix. 462 - 2. Implemented UI changes for the embed player: removed the Queue button and matched fonts to the main app. 463 - 3. Opened a draft PR to the upstream social-app repository for native Plyr.fm embed support. 464 - 4. Updated issue #153 (transcoding pipeline) with a clear roadmap for integration into the backend. 465 - 5. Developed a local verification script for the transcoder service for faster local iteration. 466 - 467 - **useful commands:** 468 - - `just backend run` - run backend locally 469 - - `just frontend dev` - run frontend locally 470 - - `just test` - run test suite (from `backend/` directory) 471 - - `gh issue list` - check open issues 472 - ## admin tooling 473 - 474 - ### content moderation 475 - script: `scripts/delete_track.py` 476 - - requires `ADMIN_*` prefixed environment variables 477 - - deletes audio file from R2 478 - - deletes cover image from R2 (if exists) 479 - - deletes database record (cascades to likes and queue entries) 480 - - notes ATProto records for manual cleanup (can't delete from other users' PDS) 481 - 482 - usage: 483 - ```bash 484 - # dry run 485 - uv run scripts/delete_track.py <track_id> --dry-run 486 - 487 - # delete with confirmation 488 - uv run scripts/delete_track.py <track_id> 489 - 490 - # delete without confirmation 491 - uv run scripts/delete_track.py <track_id> --yes 492 - 493 - # by URL 494 - uv run scripts/delete_track.py --url https://plyr.fm/track/34 495 - ``` 496 - 497 - required environment variables: 498 - - `ADMIN_DATABASE_URL` - production database connection 499 - - `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key 500 - - `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret 501 - - `ADMIN_R2_ENDPOINT_URL` - R2 endpoint 502 - - `ADMIN_R2_BUCKET` - R2 bucket name 503 - 504 - ## known issues 505 - 506 - ### non-blocking 507 - - cloudflare pages preview URLs return 404 (production works fine) 508 - - some "relay" references remain in docs and comments 509 - - ATProto like records can't be deleted when removing tracks (orphaned on users' PDS) 510 - 511 - ## for new contributors 512 - 513 - ### getting started 514 - 1. clone: `gh repo clone zzstoatzz/plyr.fm` 515 - 2. install dependencies: `uv sync && cd frontend && bun install` 516 - 3. run backend: `uv run uvicorn backend.main:app --reload` 517 - 4. run frontend: `cd frontend && bun run dev` 518 - 5. visit http://localhost:5173 519 - 520 - ### development workflow 521 - 1. create issue on github 522 - 2. create PR from feature branch 523 - 3. ensure pre-commit hooks pass 524 - 4. test locally 525 - 5. merge to main → deploys to staging automatically 526 - 6. verify on staging 527 - 7. 
create github release → deploys to production automatically 528 - 529 - ### key principles 530 - - type hints everywhere 531 - - lowercase aesthetic 532 - - generic terminology (use "items" not "tracks" where appropriate) 533 - - ATProto first 534 - - mobile matters 535 - - cost conscious 536 - - async everywhere (no blocking I/O) 537 - 538 - ### project structure 539 - ``` 540 - plyr.fm/ 541 - ├── backend/ # FastAPI app & Python tooling 542 - │ ├── src/backend/ # application code 543 - │ │ ├── api/ # public endpoints 544 - │ │ ├── _internal/ # internal services 545 - │ │ ├── models/ # database schemas 546 - │ │ └── storage/ # storage adapters 547 - │ ├── tests/ # pytest suite 548 - │ └── alembic/ # database migrations 549 - ├── frontend/ # SvelteKit app 550 - │ ├── src/lib/ # components & state 551 - │ └── src/routes/ # pages 552 - ├── moderation/ # Rust moderation service (ATProto labeler) 553 - │ ├── src/ # Axum handlers, AuDD client, label signing 554 - │ └── static/ # admin UI (html/css/js) 555 - ├── transcoder/ # Rust audio transcoding service 556 - ├── docs/ # documentation 557 - └── justfile # task runner (mods: backend, frontend, moderation, transcoder) 558 - ``` 559 - 560 - ## documentation 561 - 562 - - [deployment overview](docs/deployment/overview.md) 563 - - [configuration guide](docs/configuration.md) 564 - - [queue design](docs/queue-design.md) 565 - - [logfire querying](docs/logfire-querying.md) 566 - - [pdsx guide](docs/pdsx-guide.md) 567 - - [neon mcp guide](docs/neon-mcp-guide.md) 568 - 569 - ## performance optimization session (Nov 12, 2025) 570 - 571 - ### issue: slow /tracks/liked endpoint 572 - 573 - **symptoms**: 574 - - `/tracks/liked` taking 600-900ms consistently 575 - - only ~25ms spent in database queries 576 - - mysterious 575ms gap with no spans in Logfire traces 577 - - endpoint felt sluggish compared to other pages 578 - 579 - **investigation**: 580 - - examined Logfire traces for `/tracks/liked` requests 581 - - found 5-6 liked tracks being returned per request 582 - - DB queries completing fast (track data, artist info, like counts all under 10ms each) 583 - - noticed R2 storage calls weren't appearing in traces despite taking majority of request time 584 - 585 - **root cause**: 586 - - PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls 587 - - new tracks (uploaded after PR) have `image_url` populated at upload time ✅ 588 - - legacy tracks (15 tracks uploaded before PR) had `image_url = NULL` ❌ 589 - - fallback code called `track.get_image_url()` for NULL values 590 - - `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions 591 - - each track with NULL `image_url` = ~100-120ms of R2 API calls per request 592 - - 5 tracks × 120ms = ~600ms of uninstrumented latency 593 - 594 - **why R2 calls weren't visible**: 595 - - `storage.get_url()` method had no Logfire instrumentation 596 - - R2 API calls happening but not creating spans 597 - - appeared as mysterious gap in trace timeline 598 - 599 - **solution implemented**: 600 - 1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values 601 - 2. ran script against production database with production R2 credentials 602 - 3. backfilled 11 tracks successfully (4 already done in previous partial run) 603 - 4. 3 tracks "failed" but actually have non-existent images (optional, expected) 604 - 5. 
script uses concurrent `asyncio.gather()` for performance 605 - 606 - **key learning: environment configuration matters**: 607 - - initial script runs failed silently because: 608 - - script used local `.env` credentials (dev R2 bucket) 609 - - production images stored in different R2 bucket (`images-prod`) 610 - - `get_url()` returned `None` when images not found in dev bucket 611 - - fix: passed production R2 credentials via environment variables: 612 - - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` 613 - - `R2_IMAGE_BUCKET=images-prod` 614 - - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev` 615 - 616 - **results**: 617 - - before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked` 618 - - after: 13 tracks populated with `image_url`, 3 legitimately have no images 619 - - `/tracks/liked` now loads with 0 R2 API calls instead of 5-11 620 - - endpoint feels "really, really snappy" (user feedback) 621 - - performance improvement visible immediately after backfill 622 - 623 - **database cleanup: queue_state table bloat**: 624 - - discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows) 625 - - ran `VACUUM (FULL, ANALYZE) queue_state` against production 626 - - result: 0 dead rows, table clean 627 - - configured autovacuum for queue_state to prevent future bloat: 628 - - frequent updates to this table make it prone to bloat 629 - - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%) 630 - 631 - **endpoint performance snapshot** (post-fix, last 10 minutes): 632 - - `GET /tracks/`: 410ms (down from 2+ seconds) 633 - - `GET /queue/`: 399ms (down from 2+ seconds) 634 - - `GET /tracks/liked`: now sub-200ms (down from 600-900ms) 635 - - `GET /preferences/`: 200ms median 636 - - `GET /auth/me`: 114ms median 637 - - `POST /tracks/{track_id}/play`: 34ms 638 - 639 - **PR #184 context**: 640 - - PR claimed "opportunistic backfill: legacy records update on first access" 641 - - but actual implementation never saved computed `image_url` back to database 642 - - fallback code only computed URLs on-demand, didn't persist them 643 - - this is why repeated visits kept hitting R2 API for same tracks 644 - - one-time backfill script was correct solution vs adding write logic to read endpoints 645 - 646 - **graceful ATProto recovery (PR #180)**: 647 - - reviewed recent work on handling tracks with missing `atproto_record_uri` 648 - - 4 tracks in production have NULL ATProto records (expected from upload failures) 649 - - system already handles this gracefully: 650 - - like buttons disabled with helpful tooltips 651 - - track owners can self-service restore via portal 652 - - `restore-record` endpoint recreates with correct TID timestamps 653 - - no action needed - existing recovery system working as designed 654 - 655 - **performance metrics pre/post all recent PRs**: 656 - - PR #184 (image_url storage): eliminated hundreds of R2 API calls per request 657 - - today's backfill: eliminated remaining R2 calls for legacy tracks 658 - - combined impact: queue/tracks endpoints now 5-10x faster than before PR #184 659 - - all endpoints now consistently sub-second response times 660 - 661 - **documentation created**: 662 - - `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP 663 - - project/branch management 664 - - database schema inspection 665 - - SQL query patterns for plyr.fm 666 - - connection string generation 667 - - environment mapping (dev/staging/prod) 668 - - debugging workflows 669 - - 
`scripts/backfill_image_urls.py`: reusable for any future image_url gaps 670 - - dry-run mode for safety 671 - - concurrent R2 API calls 672 - - detailed error logging 673 - - production-tested 674 - 675 - **tools and patterns established**: 676 - - Neon MCP for database inspection and queries 677 - - Logfire arbitrary queries for performance analysis 678 - - production secret management via Fly.io 679 - - `flyctl ssh console` for environment inspection 680 - - backfill scripts with dry-run mode 681 - - environment variable overrides for production operations 682 - 683 - **system health indicators**: 684 - - ✅ no 5xx errors in recent spans 685 - - ✅ database queries all under 70ms p95 686 - - ✅ SSL connection pool issues resolved (no errors in recent traces) 687 - - ✅ queue_state table bloat eliminated 688 - - ✅ all track images either in DB or legitimately NULL 689 - - ✅ application feels fast and responsive 690 - 691 - **next steps**: 692 - 1. configure autovacuum for `queue_state` table (prevent future bloat) 693 - 2. add Logfire instrumentation to `storage.get_url()` for visibility 694 - 3. monitor `/tracks/liked` performance over next few days 695 - 4. consider adding similar backfill pattern for any future column additions 696 - 697 - --- 698 - 699 - ### copyright moderation system (PRs #382, #384, Nov 29-30, 2025) 700 - 701 - **motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform. 702 - 703 - **what shipped**: 704 - - **moderation service** (Rust/Axum on Fly.io): 705 - - standalone service at `plyr-moderation.fly.dev` 706 - - integrates with AuDD enterprise API for audio fingerprinting 707 - - scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode) 708 - - auth via `X-Moderation-Key` header 709 - - **backend integration** (PR #382): 710 - - `ModerationSettings` in config (service URL, auth token, timeout) 711 - - moderation client module (`backend/_internal/moderation.py`) 712 - - fire-and-forget background task on track upload 713 - - stores results in `copyright_scans` table 714 - - scan errors stored as "clear" so tracks aren't stuck unscanned 715 - - **flagging fix** (PR #384): 716 - - AuDD enterprise API returns no confidence scores (all 0) 717 - - changed from score threshold to presence-based flagging: `is_flagged = !matches.is_empty()` 718 - - removed unused `score_threshold` config 719 - - **backfill script** (`scripts/scan_tracks_copyright.py`): 720 - - scans existing tracks that haven't been checked 721 - - `--max-duration` flag to skip long DJ sets (estimated from file size) 722 - - `--dry-run` mode to preview what would be scanned 723 - - supports dev/staging/prod environments 724 - - **review workflow**: 725 - - `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, `review_notes` columns 726 - - resolution values: `violation`, `false_positive`, `original_artist` 727 - - SQL queries for dashboard: flagged tracks, unreviewed flags, violations list 728 - 729 - **initial review results** (25 flagged tracks): 730 - - 8 violations (actual copyright issues) 731 - - 11 false positives (fingerprint noise) 732 - - 6 original artists (people uploading their own distributed music) 733 - 734 - **impact**: 735 - - automated copyright detection on upload 736 - - manual review workflow for flagged content 737 - - protection against DMCA takedown requests 738 - - clear audit trail with resolution status 739 - 740 - --- 741 - 742 - ### platform stats and media session 
integration (PRs #359-379, Nov 27-29, 2025) 743 - 744 - **motivation**: show platform activity at a glance, improve playback experience across devices, and give users control over their data. 745 - 746 - **what shipped**: 747 - - **platform stats endpoint and UI** (PRs #376, #378, #379): 748 - - `GET /stats` returns total plays, tracks, and artists 749 - - stats bar displays in homepage header (e.g., "1,691 plays • 55 tracks • 8 artists") 750 - - skeleton loading animation while fetching 751 - - responsive layout: visible in header on wide screens, collapses to menu on narrow 752 - - end-of-list animation on homepage 753 - - **Media Session API** (PR #371): 754 - - provides track metadata to CarPlay, lock screens, Bluetooth devices, macOS control center 755 - - artwork display with fallback to artist avatar 756 - - play/pause, prev/next, seek controls all work from system UI 757 - - position state syncs scrubbers on external interfaces 758 - - **browser tab title** (PR #374): 759 - - shows "track - artist • plyr.fm" while playing 760 - - persists across page navigation 761 - - reverts to page title when playback stops 762 - - **timed comments** (PR #359): 763 - - comments capture timestamp when added during playback 764 - - clickable timestamp buttons seek to that moment 765 - - compact scrollable comments section on track pages 766 - - **constellation integration** (PR #360): 767 - - queries constellation.microcosm.blue backlink index 768 - - enables network-wide like counts (not just plyr.fm internal) 769 - - environment-aware namespace handling 770 - - **account deletion** (PR #363): 771 - - explicit confirmation flow (type handle to confirm) 772 - - deletes all plyr.fm data (tracks, albums, likes, comments, preferences) 773 - - optional ATProto record cleanup with clear warnings about orphaned references 774 - 775 - **impact**: 776 - - platform stats give visitors immediate sense of activity 777 - - media session makes plyr.fm tracks controllable from car/lock screen/control center 778 - - timed comments enable discussion at specific moments in tracks 779 - - account deletion gives users full control over their data 780 - 781 - --- 782 - 783 - ### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025) 784 - 785 - **motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't become stale when browser sessions refresh. 
786 - 787 - **what shipped**: 788 - - **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow 789 - - user clicks "create token" → redirected to PDS for authorization → token created with independent credentials 790 - - tokens have their own DPoP keypair, access/refresh tokens - completely separate from browser session 791 - - **cookie isolation**: dev token exchange doesn't set browser cookie 792 - - added `is_dev_token` flag to ExchangeToken model 793 - - /auth/exchange skips Set-Cookie for dev token flows 794 - - prevents logout from deleting dev tokens (critical bug fixed during implementation) 795 - - **token management UI**: portal → "your data" → "developer tokens" 796 - - create with optional name and expiration (30/90/180/365 days or never) 797 - - list active tokens with creation/expiration dates 798 - - revoke individual tokens 799 - - **API endpoints**: 800 - - `POST /auth/developer-token/start` - initiates OAuth flow, returns auth_url 801 - - `GET /auth/developer-tokens` - list user's tokens 802 - - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix 803 - 804 - **security properties**: 805 - - tokens are full sessions with encrypted OAuth credentials (Fernet) 806 - - each token refreshes independently (no staleness from browser session refresh) 807 - - revokable individually without affecting browser or other tokens 808 - - explicit OAuth consent required at PDS for each token created 809 - 810 - **testing verified**: 811 - - created token → uploaded track → logged out → deleted track with token ✓ 812 - - browser logout doesn't affect dev tokens ✓ 813 - - token works across browser sessions ✓ 814 - - staging deployment tested end-to-end ✓ 815 - 816 - **documentation**: see `docs/authentication.md` "developer tokens" section 817 - 818 - --- 819 - 820 - ### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025) 821 - 822 - **motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player. 
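before the oEmbed details below, a quick illustration of driving the developer-token endpoints listed above from a script. the paths come straight from the section; the bearer-style Authorization header is an assumption — docs/authentication.md has the real scheme:

```python
import httpx

API = "https://api.plyr.fm"
TOKEN = "..."  # a developer token created via the portal UI

# ASSUMPTION: the token is presented as a bearer Authorization header;
# consult docs/authentication.md for the actual scheme.
headers = {"Authorization": f"Bearer {TOKEN}"}

with httpx.Client(base_url=API, headers=headers) as client:
    # list active tokens (endpoint documented above)
    for token_info in client.get("/auth/developer-tokens").json():
        print(token_info)

    # revoke a token by its 8-char prefix (endpoint documented above):
    # client.delete("/auth/developer-tokens/abcd1234")
```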
823 - 824 - **what shipped**: 825 - - **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with iframe 826 - - follows oEmbed spec with `type: "rich"` and iframe in `html` field 827 - - discovery link in track page `<head>` for automatic detection 828 - - **iframely domain registration**: registered plyr.fm on iframely.com (free tier) 829 - - this was the key fix - iframely now returns our embed iframe as `links.player[0]` 830 - - API key: stored in 1password (iframely account) 831 - 832 - **debugging journey** (PRs #356-358): 833 - - initially tried `og:video` meta tags to hint iframe embed - didn't work 834 - - tried removing `og:audio` to force oEmbed fallback - resulted in no player link 835 - - discovered iframely requires domain registration to trust oEmbed providers 836 - - after registration, iframely correctly returns embed iframe URL 837 - 838 - **current state**: 839 - - oEmbed endpoint working: `curl https://api.plyr.fm/oembed?url=https://plyr.fm/track/92` 840 - - iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed) 841 - - Leaflet.pub should show proper embeds (pending their cache expiry) 842 - 843 - **impact**: 844 - - plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services 845 - - proper embed player with cover art instead of raw HTML5 audio 846 - 847 - --- 848 - 849 - ### export & upload reliability (PRs #337-344, Nov 24, 2025) 850 - 851 - **motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts. 852 - 853 - **what shipped**: 854 - - **database-backed jobs** (PR #337): moved upload/export tracking from in-memory to postgres 855 - - jobs table persists state across server restarts 856 - - enables reliable progress tracking via SSE polling 857 - - **streaming exports** (PR #343): fixed OOM on large file exports 858 - - previously loaded entire files into memory via `response["Body"].read()` 859 - - now streams to temp files, adds to zip from disk (constant memory) 860 - - 90-minute WAV files now export successfully on 1GB VM 861 - - **progress tracking fix** (PR #340): upload progress was receiving bytes but treating as percentage 862 - - `UploadProgressTracker` now properly converts bytes to percentage 863 - - upload progress bar works correctly again 864 - - **UX improvements** (PRs #338-339, #341-342, #344): 865 - - export filename now includes date (`plyr-tracks-2025-11-24.zip`) 866 - - toast notification on track deletion 867 - - fixed false "lost connection" error when SSE completes normally 868 - - progress now shows "downloading track X of Y" instead of confusing count 869 - 870 - **impact**: 871 - - exports work for arbitrarily large files (limited by disk, not RAM) 872 - - upload progress displays correctly 873 - - job state survives server restarts 874 - - clearer progress messaging during exports 875 - 876 - --- 877 - 878 - archived from STATUS.md on 2025-12-01
+874 -2
STATUS.md
··· 136 136 - htmx endpoints: `/admin/flags-html`, `/admin/resolve-htmx` 137 137 - server-rendered HTML partials for flag cards 138 138 139 + 140 + ### Queue hydration + ATProto token hardening (Nov 12, 2025) 141 + 142 + **Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401 143 + when multiple requests refreshed an expired ATProto token simultaneously. 144 + 145 + **What shipped:** 146 + - Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2 147 + for every track. Queue payloads now pull art directly from Postgres, with a one-time 148 + fallback for legacy rows. 149 + - Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead 150 + of per-request GETs. 151 + - Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits 152 + `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This 153 + removes the race that caused the batch restore flow to intermittently 500/401. 154 + 155 + **Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto restore flows are now reliable under concurrent use, and Logfire no longer shows 500s 156 + from the PDS. 157 + 158 + ### Liked tracks feature (PR #157, Nov 11, 2025) 159 + 160 + - ✅ server-side persistent collections 161 + - ✅ ATProto record publication for cross-platform visibility 162 + - ✅ UI for adding/removing tracks from liked collection 163 + - ✅ like counts displayed in track responses and analytics (#170) 164 + - ✅ analytics cards now clickable links to track detail pages (#171) 165 + - ✅ liked state shown on artist page tracks (#163) 166 + 167 + ### Upload streaming + progress UX (PR #182, Nov 11, 2025) 168 + 169 + - Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress 170 + toasts (critical for >50 MB mixes on mobile). 171 + - Upload form now clears only after the request succeeds; failed attempts leave the 172 + form intact so users don't lose metadata. 173 + - Backend writes uploads/images to temp files in 8 MB chunks before handing them to the 174 + storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes. 175 + - Deployment verified locally and by rerunning the exact repro Stella hit (85 minute 176 + mix from mobile). 177 + 178 + ### transcoder API deployment (PR #156, Nov 11, 2025) 179 + 180 + **standalone Rust transcoding service** 🎉 181 + - **deployed**: https://plyr-transcoder.fly.dev/ 182 + - **purpose**: convert AIFF/FLAC/etc. 
to MP3 for browser compatibility 183 + - **technology**: Axum + ffmpeg + Docker 184 + - **security**: `X-Transcoder-Key` header authentication (shared secret) 185 + - **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds) 186 + - **architecture**: 187 + - 2 Fly machines for high availability 188 + - auto-stop/start for cost efficiency 189 + - stateless design (no R2 integration yet) 190 + - 320kbps MP3 output with proper ID3 tags 191 + - **status**: deployed and tested, ready for integration into plyr.fm upload pipeline 192 + - **next steps**: wire into backend with R2 integration and job queue (see issue #153) 193 + 194 + ### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025) 195 + 196 + **format validation improvements** 197 + - **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox 198 + - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED` 199 + - users could upload files but they wouldn't play in most browsers 200 + - **immediate solution**: reject AIFF/AIF uploads at both backend and frontend 201 + - removed AIFF/AIF from AudioFormat enum 202 + - added format hints to upload UI: "supported: mp3, wav, m4a" 203 + - client-side validation with helpful error messages 204 + - **long-term solution**: deployed standalone transcoder service (see above) 205 + - separate Rust/Axum service with ffmpeg 206 + - accepts all formats, converts to browser-compatible MP3 207 + - integration into upload pipeline pending (issue #153) 208 + 209 + **observability improvements**: 210 + - added logfire instrumentation to upload background tasks 211 + - added logfire spans to R2 storage operations 212 + - documented logfire querying patterns in `docs/logfire-querying.md` 213 + 214 + ### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025) 215 + 216 + Eliminated event loop blocking across backend with three critical PRs: 217 + 218 + 1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3 219 + - portal page load time: 2+ seconds → ~200ms 220 + - root cause: `track.image_url` was blocking on serial R2 HEAD requests 221 + 222 + 2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups 223 + - homepage load time: 2-6 seconds → 200-400ms 224 + - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each) 225 + - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads 226 + 227 + 3. 
**PR #151: async storage writes/deletes** - made save/delete operations non-blocking 228 + - R2: switched to `aioboto3` for uploads/deletes (async S3 operations) 229 + - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks) 230 + - impact: multi-MB uploads no longer monopolize worker thread, constant memory usage 231 + 232 + ### cover art support (PRs #123-126, #132-139) 233 + - ✅ track cover image upload and storage (separate R2 bucket) 234 + - ✅ image display on track pages and player 235 + - ✅ Open Graph meta tags for track sharing 236 + - ✅ mobile-optimized layouts with cover art 237 + - ✅ sticky bottom player on mobile with cover 238 + 239 + ### track detail pages (PR #164, Nov 12, 2025) 240 + 241 + - ✅ dedicated track detail pages with large cover art 242 + - ✅ play button updates queue state correctly (#169) 243 + - ✅ liked state loaded efficiently via server-side fetch 244 + - ✅ mobile-optimized layouts with proper scrolling constraints 245 + - ✅ origin validation for image URLs (#168) 246 + 247 + ### mobile UI improvements (PRs #159-185, Nov 11-12, 2025) 248 + 249 + - ✅ compact action menus and better navigation (#161) 250 + - ✅ improved mobile responsiveness (#159) 251 + - ✅ consistent button layouts across mobile/desktop (#176-181, #185) 252 + - ✅ always show play count and like count on mobile (#177) 253 + - ✅ login page UX improvements (#174-175) 254 + - ✅ liked page UX improvements (#173) 255 + - ✅ accent color for liked tracks (#160) 256 + 257 + ### queue management improvements (PRs #110-113, #115) 258 + - ✅ visual feedback on queue add/remove 259 + - ✅ toast notifications for queue actions 260 + - ✅ better error handling for queue operations 261 + - ✅ improved shuffle and auto-advance UX 262 + 263 + ### infrastructure and tooling 264 + - ✅ R2 bucket separation: audio-prod and images-prod (PR #124) 265 + - ✅ admin script for content moderation (`scripts/delete_track.py`) 266 + - ✅ bluesky attribution link in header 267 + - ✅ changelog target added (#183) 268 + - ✅ documentation updates (#158) 269 + - ✅ track metadata edits now persist correctly (#162) 270 + 271 + ## immediate priorities 272 + 273 + ### high priority features 274 + 1. **audio transcoding pipeline integration** (issue #153) 275 + - ✅ standalone transcoder service deployed at https://plyr-transcoder.fly.dev/ 276 + - ✅ Rust/Axum service with ffmpeg, tested with 85-minute files 277 + - ✅ secure auth via X-Transcoder-Key header 278 + - ⏳ next: integrate into plyr.fm upload pipeline 279 + - backend calls transcoder API for unsupported formats 280 + - queue-based job system for async processing 281 + - R2 integration (fetch original, store MP3) 282 + - maintain original file hash for deduplication 283 + - handle transcoding failures gracefully 284 + 285 + ### critical bugs 286 + 1. **upload reliability** (issue #147): upload returns 200 but file missing from R2, no error logged 287 + - priority: high (data loss risk) 288 + - need better error handling and retry logic in background upload task 289 + 290 + 2. **database connection pool SSL errors**: intermittent failures on first request 291 + - symptom: `/tracks/` returns 500 on first request, succeeds after 292 + - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts 293 + - documented in `docs/logfire-querying.md` 294 + 295 + ### performance optimizations 296 + 3. 
**persist concrete file extensions in database**: currently brute-force probing all supported formats on read 297 + - already know `Track.file_type` and image format during upload 298 + - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam 299 + - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially) 300 + 301 + 4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task 302 + - multi-GB uploads risk OOM 303 + - stream from `UploadFile.file` → storage backend for constant memory usage 304 + 305 + ### new features 306 + 5. **content-addressable storage** (issue #146) 307 + - hash-based file storage for automatic deduplication 308 + - reduces storage costs when multiple artists upload same file 309 + - enables content verification 310 + 311 + 6. **liked tracks feature** (issue #144): design schema and ATProto record format 312 + - server-side persistent collections 313 + - ATProto record publication for cross-platform visibility 314 + - UI for adding/removing tracks from liked collection 315 + 316 + ## open issues by timeline 317 + 318 + ### immediate 319 + - issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3) 320 + - issue #147: upload reliability bug (data loss risk) 321 + - issue #144: likes feature for personal collections 322 + 323 + ### short-term 324 + - issue #146: content-addressable storage (hash-based deduplication) 325 + - issue #24: implement play count abuse prevention 326 + - database connection pool tuning (SSL errors) 327 + - file extension persistence in database 328 + 329 + ### medium-term 330 + - issue #39: postmortem - cross-domain auth deployment and remaining security TODOs 331 + - issue #46: consider removing init_db() from lifespan in favor of migration-only approach 332 + - issue #56: design public developer API and versioning 333 + - issue #57: support multiple audio item types (voice memos/snippets) 334 + - issue #122: fullscreen player for immersive playback 335 + 336 + ### long-term 337 + - migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata) 338 + - publish to multiple ATProto AppViews for cross-platform visibility 339 + - explore ATProto-native notifications (replace Bluesky DM bot) 340 + - realtime queue syncing across devices via SSE/WebSocket 341 + - artist analytics dashboard improvements 342 + - issue #44: modern music streaming feature parity 343 + 344 + ## technical state 345 + 346 + ### architecture 347 + 348 + **backend** 349 + - language: Python 3.11+ 350 + - framework: FastAPI with uvicorn 351 + - database: Neon PostgreSQL (serverless, fully managed) 352 + - storage: Cloudflare R2 (S3-compatible object storage) 353 + - hosting: Fly.io (2x shared-cpu VMs, auto-scaling) 354 + - observability: Pydantic Logfire (traces, metrics, logs) 355 + - auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto) 356 + 357 + **frontend** 358 + - framework: SvelteKit (latest v2.43.2) 359 + - runtime: Bun (fast JS runtime) 360 + - hosting: Cloudflare Pages (edge network) 361 + - styling: vanilla CSS with lowercase aesthetic 362 + - state management: Svelte 5 runes ($state, $derived, $effect) 363 + 364 + **deployment** 365 + - ci/cd: GitHub Actions 366 + - backend: automatic on main branch merge (fly.io deploy) 367 + - frontend: automatic on every push to main (cloudflare pages) 368 + - migrations: automated via fly.io release_command 369 + - environments: dev → staging → production (full 
**key dependencies**
- atproto: forked SDK for OAuth and record management
- sqlalchemy: async ORM for postgres
- alembic: database migrations
- boto3/aioboto3: R2 storage client
- logfire: observability (FastAPI + SQLAlchemy instrumentation)
- httpx: async HTTP client

**what's working**

**core functionality**
- ✅ ATProto OAuth 2.1 authentication with encrypted state
- ✅ secure session management via HttpOnly cookies (XSS protection)
- ✅ developer tokens with independent OAuth grants (programmatic API access)
- ✅ platform stats endpoint and homepage display (plays, tracks, artists)
- ✅ Media Session API for CarPlay, lock screens, control center
- ✅ timed comments on tracks with clickable timestamps
- ✅ account deletion with explicit confirmation
- ✅ artist profiles synced with Bluesky (avatar, display name, handle)
- ✅ track upload with streaming to prevent OOM
- ✅ track edit (title, artist, album, features metadata)
- ✅ track deletion with cascade cleanup
- ✅ audio streaming via HTML5 player with 307 redirects to R2 CDN
- ✅ track metadata published as ATProto records (fm.plyr.track namespace)
- ✅ play count tracking with threshold (30% or 30s, whichever comes first; see the sketch below)
- ✅ like functionality with counts
- ✅ artist analytics dashboard
- ✅ queue management (shuffle, auto-advance, reorder)
- ✅ mobile-optimized responsive UI
- ✅ cross-tab queue synchronization via BroadcastChannel
- ✅ share tracks via URL with Open Graph previews (including cover art)
- ✅ image URL caching in database (eliminates N+1 R2 calls)
- ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
- ✅ standalone audio transcoding service deployed and verified (see issue #153)
- ✅ Bluesky embed player UI changes implemented (pending upstream social-app PR)
- ✅ admin content moderation script for removing inappropriate uploads
- ✅ copyright moderation system (AuDD fingerprinting, review workflow, violation tracking)
- ✅ ATProto labeler for copyright violations (queryLabels, subscribeLabels XRPC endpoints)
- ✅ admin UI for reviewing flagged tracks with htmx (plyr-moderation.fly.dev/admin)
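a minimal sketch of the play-count threshold listed above; the function name and call site are illustrative, not the actual backend code:

```python
# sketch: the "30% or 30 seconds, whichever comes first" play-count rule.
# the function name is illustrative, not the actual backend code.
def qualifies_as_play(position_seconds: float, duration_seconds: float) -> bool:
    """return True once playback crosses the counting threshold."""
    # for short tracks 30% arrives first; for long tracks the 30s floor does
    return position_seconds >= min(30.0, 0.3 * duration_seconds)
```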
**albums**
- ✅ album database schema with track relationships
- ✅ album browsing pages (`/u/{handle}` shows discography)
- ✅ album detail pages (`/u/{handle}/album/{slug}`) with full track lists
- ✅ album cover art upload and display
- ✅ server-side rendering for SEO
- ✅ rich Open Graph metadata for link previews (music.album type)
- ✅ long album title handling (100-char slugs, CSS truncation)
- ⏸ ATProto records for albums (deferred, see issue #221)

**frontend architecture**
- ✅ server-side data loading (`+page.server.ts`) for artist and album pages
- ✅ client-side data loading (`+page.ts`) for auth-dependent pages
- ✅ centralized auth manager (`lib/auth.svelte.ts`)
- ✅ layout-level auth state (`+layout.ts`) shared across all pages
- ✅ eliminated "flash of loading" via proper load functions
- ✅ consistent auth patterns (no scattered localStorage calls)

**deployment (fully automated)**
- **production**:
  - frontend: https://plyr.fm (cloudflare pages)
  - backend: https://relay-api.fly.dev (fly.io: 2 machines, 1GB RAM, 1 shared CPU, min 1 running)
  - database: neon postgresql
  - storage: cloudflare R2 (audio-prod and images-prod buckets)
  - deploy: github release → automatic

- **staging**:
  - backend: https://api-stg.plyr.fm (fly.io: relay-api-staging)
  - frontend: https://stg.plyr.fm (cloudflare pages: plyr-fm-stg)
  - database: neon postgresql (relay-staging)
  - storage: cloudflare R2 (audio-stg bucket)
  - deploy: push to main → automatic

- **development**:
  - backend: localhost:8000
  - frontend: localhost:5173
  - database: neon postgresql (relay-dev)
  - storage: cloudflare R2 (audio-dev and images-dev buckets)

- **developer tooling**:
  - `just serve` - run backend locally
  - `just dev` - run frontend locally
  - `just test` - run test suite
  - `just release` - create production release (backend + frontend)
  - `just release-frontend-only` - deploy only frontend changes (added Nov 13)

### what's in progress

**immediate work**
- investigating playback auto-start behavior (#225)
  - page refresh sometimes starts playing immediately
  - may be related to queue state restoration or localStorage caching
  - `autoplay_next` preference not being respected in all cases
- liquid glass effects as a user-configurable setting (#186)

**active research**
- transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
- content moderation systems (#166, #167, #393 - takedown state representation)
- PWA capabilities and offline support (#165)

### known issues

**player behavior**
- playback auto-start on refresh (#225)
  - sometimes plays immediately after page load
  - investigating localStorage/queue state persistence
  - may not respect `autoplay_next` preference in all scenarios

**missing features**
- no ATProto records for albums yet (#221 - consciously deferred)
- no track genres/tags/descriptions yet (#155)
- no AIFF/AIF transcoding support (#153)
- no PWA installation prompts (#165)
- no fullscreen player view (#122)
- no public API for third-party integrations (#56)

**technical debt**
- multi-tab playback synchronization could be more robust
- queue state conflicts can occur with rapid operations

### technical decisions

**why Python/FastAPI instead of Rust?**
- rapid prototyping velocity during MVP phase
- rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
- excellent async support with asyncio
- lower barrier to contribution
- trade-off: accepting higher latency for faster development
- future: can migrate hot paths to Rust if needed (the standalone transcoder is already Rust)

**why Fly.io instead of AWS/GCP?**
- simple deployment model (dockerfile → production)
- automatic SSL/TLS certificates
- built-in global load balancing
- reasonable pricing for MVP ($5/month)
- easy migration path to larger providers later
- trade-off: vendor-specific features, less control

**why Cloudflare R2 instead of S3?**
- zero egress fees (critical for audio streaming)
- S3-compatible API (easy migration if needed)
- integrated CDN for fast delivery
- significantly cheaper than S3 for bandwidth-heavy workloads
**why forked atproto SDK?**
- upstream SDK lacked OAuth 2.1 support
- needed custom record management patterns
- maintains compatibility with ATProto spec
- contributes improvements back when possible

**why SvelteKit instead of React/Next.js?**
- Svelte 5 runes provide excellent reactivity model
- smaller bundle sizes (critical for mobile)
- less boilerplate than React
- SSR + static generation flexibility
- modern DX with TypeScript

**why Neon instead of self-hosted Postgres?**
- serverless autoscaling (no capacity planning)
- branch-per-PR workflow (preview databases)
- automatic backups and point-in-time recovery
- generous free tier for MVP
- trade-off: higher latency than co-located DB, but acceptable

**why reject AIFF instead of transcoding immediately?**
- MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
- user communication: better to be upfront about limitations than silent failures
- resource management: transcoding is CPU-intensive, needs proper worker architecture
- future flexibility: can add transcoding as optional feature (high-quality uploads → MP3 delivery)
- trade-off: some users can't upload AIFF now, but those who can upload MP3 have a working experience

**why async everywhere?**
- event loop performance: single-threaded async handles high concurrency
- I/O-bound workload: most time spent waiting on network/disk
- recent work (PRs #149-151) eliminated all blocking operations
- alternative: thread pools for blocking I/O, but increases complexity
- trade-off: debugging async code harder than sync, but worth throughput gains

**why anyio.Path over thread pools?**
- true async I/O: `anyio` uses OS-level async file operations where available
- constant memory: chunked reads/writes (64KB) prevent OOM on large files (see the sketch below)
- thread pools: would work but less efficient, more context switching
- trade-off: anyio API slightly different from stdlib `pathlib`, but cleaner async semantics
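a minimal sketch of the chunked-copy pattern behind PR #151 and this decision; the function and file paths are stand-ins for the actual storage adapter:

```python
# sketch: constant-memory chunked copy with anyio, per the 64KB pattern above.
# the function and paths are stand-ins, not the actual storage adapter code.
import anyio

CHUNK_SIZE = 64 * 1024  # 64KB chunks keep memory flat regardless of file size

async def copy_upload(src_path: str, dst_path: str) -> None:
    async with await anyio.open_file(src_path, "rb") as src:
        async with await anyio.open_file(dst_path, "wb") as dst:
            # read/write one chunk at a time instead of buffering the whole file
            while chunk := await src.read(CHUNK_SIZE):
                await dst.write(chunk)

anyio.run(copy_upload, "upload.tmp", "stored.mp3")
```

the chunk size is a throughput/memory trade-off; 64KB is small enough to stay flat on hour-long mixes without meaningfully slowing the copy.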
## cost structure

current monthly costs: ~$5-6

- cloudflare pages: $0 (free tier)
- cloudflare R2: ~$0.16 (storage + operations, no egress fees)
- fly.io production: $5.00 (2x shared-cpu-1x VMs with auto-stop)
- fly.io staging: $0 (auto-stop, only runs during testing)
- neon: $0 (free tier, 0.5 CPU, 512MB RAM, 3GB storage)
- logfire: $0 (free tier)
- domain: $12/year (~$1/month)

## deployment URLs

- **production frontend**: https://plyr.fm
- **production backend**: https://relay-api.fly.dev (redirects to https://api.plyr.fm)
- **staging backend**: https://api-stg.plyr.fm
- **staging frontend**: https://stg.plyr.fm
- **repository**: https://github.com/zzstoatzz/plyr.fm (private)
- **monitoring**: https://logfire-us.pydantic.dev/zzstoatzz/relay
- **bluesky**: https://bsky.app/profile/plyr.fm
- **latest release**: 2025.1129.214811

## health indicators

**production status**: ✅ healthy
- uptime: consistently available
- response times: <500ms p95 for API endpoints
- error rate: <1% (mostly invalid OAuth states)
- storage: ~12 tracks uploaded, functioning correctly

**key metrics**
- total tracks: ~12
- total artists: ~3
- play counts: tracked per-track
- storage used: <1GB R2
- database size: <10MB postgres

## next session prep

**context for new agent:**
1. fixed the R2 image upload path mismatch, ensuring images save with the correct prefix.
2. implemented UI changes for the embed player: removed the queue button and matched fonts to the main app.
3. opened a draft PR to the upstream social-app repository for native plyr.fm embed support.
4. updated issue #153 (transcoding pipeline) with a clear roadmap for integration into the backend.
5. developed a local verification script for the transcoder service for faster iteration.

**useful commands:**
- `just backend run` - run backend locally
- `just frontend dev` - run frontend locally
- `just test` - run test suite (from `backend/` directory)
- `gh issue list` - check open issues

## admin tooling

### content moderation
script: `scripts/delete_track.py`
- requires `ADMIN_*` prefixed environment variables
- deletes audio file from R2
- deletes cover image from R2 (if exists)
- deletes database record (cascades to likes and queue entries)
- notes ATProto records for manual cleanup (can't delete from other users' PDS)

usage:
```bash
# dry run
uv run scripts/delete_track.py <track_id> --dry-run

# delete with confirmation
uv run scripts/delete_track.py <track_id>

# delete without confirmation
uv run scripts/delete_track.py <track_id> --yes

# by URL
uv run scripts/delete_track.py --url https://plyr.fm/track/34
```

required environment variables (see the loading sketch below):
- `ADMIN_DATABASE_URL` - production database connection
- `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
- `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
- `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
- `ADMIN_R2_BUCKET` - R2 bucket name
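a sketch of how the `ADMIN_*` variables could be loaded, assuming pydantic-settings; the class and field names are illustrative, not the script's actual config:

```python
# sketch: loading ADMIN_*-prefixed env vars with pydantic-settings.
# the class and field names are illustrative, not the script's actual config.
from pydantic_settings import BaseSettings, SettingsConfigDict

class AdminSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="ADMIN_")

    database_url: str           # reads ADMIN_DATABASE_URL
    aws_access_key_id: str      # reads ADMIN_AWS_ACCESS_KEY_ID
    aws_secret_access_key: str  # reads ADMIN_AWS_SECRET_ACCESS_KEY
    r2_endpoint_url: str        # reads ADMIN_R2_ENDPOINT_URL
    r2_bucket: str              # reads ADMIN_R2_BUCKET

settings = AdminSettings()  # raises a validation error if any variable is missing
```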
## known issues

### non-blocking
- cloudflare pages preview URLs return 404 (production works fine)
- some "relay" references remain in docs and comments
- ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)

## for new contributors

### getting started
1. clone: `gh repo clone zzstoatzz/plyr.fm`
2. install dependencies: `uv sync && cd frontend && bun install`
3. run backend: `uv run uvicorn backend.main:app --reload`
4. run frontend: `cd frontend && bun run dev`
5. visit http://localhost:5173

### development workflow
1. create issue on github
2. create PR from feature branch
3. ensure pre-commit hooks pass
4. test locally
5. merge to main → deploys to staging automatically
6. verify on staging
7. create github release → deploys to production automatically

### key principles
- type hints everywhere
- lowercase aesthetic
- generic terminology (use "items" not "tracks" where appropriate)
- ATProto first
- mobile matters
- cost conscious
- async everywhere (no blocking I/O)

### project structure
```
plyr.fm/
├── backend/           # FastAPI app & Python tooling
│   ├── src/backend/   # application code
│   │   ├── api/       # public endpoints
│   │   ├── _internal/ # internal services
│   │   ├── models/    # database schemas
│   │   └── storage/   # storage adapters
│   ├── tests/         # pytest suite
│   └── alembic/       # database migrations
├── frontend/          # SvelteKit app
│   ├── src/lib/       # components & state
│   └── src/routes/    # pages
├── moderation/        # Rust moderation service (ATProto labeler)
│   ├── src/           # Axum handlers, AuDD client, label signing
│   └── static/        # admin UI (html/css/js)
├── transcoder/        # Rust audio transcoding service
├── docs/              # documentation
└── justfile           # task runner (mods: backend, frontend, moderation, transcoder)
```

## documentation

- [deployment overview](docs/deployment/overview.md)
- [configuration guide](docs/configuration.md)
- [queue design](docs/queue-design.md)
- [logfire querying](docs/logfire-querying.md)
- [pdsx guide](docs/pdsx-guide.md)
- [neon mcp guide](docs/neon-mcp-guide.md)

## performance optimization session (Nov 12, 2025)

### issue: slow /tracks/liked endpoint

**symptoms**:
- `/tracks/liked` taking 600-900ms consistently
- only ~25ms spent in database queries
- mysterious 575ms gap with no spans in Logfire traces
- endpoint felt sluggish compared to other pages

**investigation**:
- examined Logfire traces for `/tracks/liked` requests
- found 5-6 liked tracks being returned per request
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
- noticed R2 storage calls weren't appearing in traces despite taking the majority of request time

**root cause**:
- PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls
- new tracks (uploaded after PR) have `image_url` populated at upload time ✅
- legacy tracks (15 tracks uploaded before PR) had `image_url = NULL` ❌
- fallback code called `track.get_image_url()` for NULL values
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
- 5 tracks × 120ms = ~600ms of uninstrumented latency

**why R2 calls weren't visible**:
- `storage.get_url()` method had no Logfire instrumentation
- R2 API calls happening but not creating spans
- appeared as mysterious gap in trace timeline

**solution implemented** (see the sketch below):
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
2. ran script against production database with production R2 credentials
3. backfilled 11 tracks successfully (4 already done in previous partial run)
4. 3 tracks "failed" but actually have non-existent images (optional, expected)
5. script uses concurrent `asyncio.gather()` for performance
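a minimal sketch of the backfill pattern, with stand-ins for the real R2 lookup and database write:

```python
# sketch: concurrent backfill with a dry-run flag, in the spirit of
# scripts/backfill_image_urls.py. the data source and the two helpers
# below are stand-ins, not the real script's internals.
import asyncio
from dataclasses import dataclass

@dataclass
class Track:
    id: int
    image_url: str | None = None

async def resolve_image_url(track: Track) -> str | None:
    # stand-in for the real R2 lookup (head_object probing for extensions)
    return f"https://images.example/{track.id}.jpg"

async def persist(track: Track, url: str) -> None:
    # stand-in for the real UPDATE ... SET image_url
    track.image_url = url

async def backfill(tracks: list[Track], dry_run: bool = True) -> None:
    async def fill(track: Track) -> None:
        url = await resolve_image_url(track)
        if url is None:
            print(f"track {track.id}: no image found (expected for some tracks)")
        elif dry_run:
            print(f"track {track.id}: would set image_url={url}")
        else:
            await persist(track, url)

    # run the per-track lookups concurrently rather than one at a time
    await asyncio.gather(*(fill(t) for t in tracks))

asyncio.run(backfill([Track(1), Track(2)], dry_run=True))
```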
**key learning: environment configuration matters**:
- initial script runs failed silently because:
  - the script used local `.env` credentials (dev R2 bucket)
  - production images are stored in a different R2 bucket (`images-prod`)
  - `get_url()` returned `None` when images weren't found in the dev bucket
- fix: passed production R2 credentials via environment variables:
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `R2_IMAGE_BUCKET=images-prod`
  - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`

**results**:
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
- after: 13 tracks populated with `image_url`, 3 legitimately have no images
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
- endpoint feels "really, really snappy" (user feedback)
- performance improvement visible immediately after backfill

**database cleanup: queue_state table bloat**:
- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
- result: 0 dead rows, table clean
- configure autovacuum for queue_state to prevent future bloat:
  - frequent updates to this table make it prone to bloat
  - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%)

**endpoint performance snapshot** (post-fix, last 10 minutes):
- `GET /tracks/`: 410ms (down from 2+ seconds)
- `GET /queue/`: 399ms (down from 2+ seconds)
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
- `GET /preferences/`: 200ms median
- `GET /auth/me`: 114ms median
- `POST /tracks/{track_id}/play`: 34ms

**PR #184 context**:
- PR claimed "opportunistic backfill: legacy records update on first access"
- but the actual implementation never saved the computed `image_url` back to the database
- fallback code only computed URLs on demand, didn't persist them
- this is why repeated visits kept hitting the R2 API for the same tracks
- a one-time backfill script was the correct solution vs adding write logic to read endpoints

**graceful ATProto recovery (PR #180)**:
- reviewed recent work on handling tracks with missing `atproto_record_uri`
- 4 tracks in production have NULL ATProto records (expected from upload failures)
- system already handles this gracefully:
  - like buttons disabled with helpful tooltips
  - track owners can self-service restore via portal
  - `restore-record` endpoint recreates records with correct TID timestamps
- no action needed - existing recovery system working as designed

**performance metrics pre/post all recent PRs**:
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
- today's backfill: eliminated remaining R2 calls for legacy tracks
- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
- all endpoints now consistently return sub-second response times

**documentation created**:
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
  - project/branch management
  - database schema inspection
  - SQL query patterns for plyr.fm
  - connection string generation
  - environment mapping (dev/staging/prod)
  - debugging workflows
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
  - dry-run mode for safety
  - concurrent R2 API calls
  - detailed error logging
  - production-tested
**tools and patterns established**:
- Neon MCP for database inspection and queries
- Logfire arbitrary queries for performance analysis
- production secret management via Fly.io
- `flyctl ssh console` for environment inspection
- backfill scripts with dry-run mode
- environment variable overrides for production operations

**system health indicators**:
- ✅ no 5xx errors in recent spans
- ✅ database queries all under 70ms p95
- ✅ SSL connection pool issues resolved (no errors in recent traces)
- ✅ queue_state table bloat eliminated
- ✅ all track images either in DB or legitimately NULL
- ✅ application feels fast and responsive

**next steps**:
1. configure autovacuum for `queue_state` table (prevent future bloat)
2. add Logfire instrumentation to `storage.get_url()` for visibility (see the sketch above)
3. monitor `/tracks/liked` performance over next few days
4. consider adding similar backfill pattern for any future column additions

---

### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)

**motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform.

**what shipped**:
- **moderation service** (Rust/Axum on Fly.io):
  - standalone service at `plyr-moderation.fly.dev`
  - integrates with AuDD enterprise API for audio fingerprinting
  - scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode)
  - auth via `X-Moderation-Key` header
- **backend integration** (PR #382):
  - `ModerationSettings` in config (service URL, auth token, timeout)
  - moderation client module (`backend/_internal/moderation.py`)
  - fire-and-forget background task on track upload (see the sketch below)
  - stores results in `copyright_scans` table
  - scan errors stored as "clear" so tracks aren't stuck unscanned
- **flagging fix** (PR #384):
  - AuDD enterprise API returns no confidence scores (all 0)
  - changed from score threshold to presence-based flagging: `is_flagged = !matches.is_empty()`
  - removed unused `score_threshold` config
- **backfill script** (`scripts/scan_tracks_copyright.py`):
  - scans existing tracks that haven't been checked
  - `--max-duration` flag to skip long DJ sets (estimated from file size)
  - `--dry-run` mode to preview what would be scanned
  - supports dev/staging/prod environments
- **review workflow**:
  - `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, `review_notes` columns
  - resolution values: `violation`, `false_positive`, `original_artist`
  - SQL queries for dashboard: flagged tracks, unreviewed flags, violations list

**initial review results** (25 flagged tracks):
- 8 violations (actual copyright issues)
- 11 false positives (fingerprint noise)
- 6 original artists (people uploading their own distributed music)

**impact**:
- automated copyright detection on upload
- manual review workflow for flagged content
- protection against DMCA takedown requests
- clear audit trail with resolution status
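a minimal sketch of the fire-and-forget scan described above; the helpers are stand-ins for the real client in `backend/_internal/moderation.py`:

```python
# sketch: fire-and-forget copyright scan on upload. the helpers here are
# stand-ins for the real moderation client, not the actual implementation.
import asyncio

async def scan_track(audio_url: str) -> dict:
    # stand-in: POST the audio URL to the moderation service, return matches
    return {"matches": []}

async def scan_and_store(track_id: int, audio_url: str) -> None:
    try:
        result = await scan_track(audio_url)
        is_flagged = bool(result["matches"])  # presence-based flagging (PR #384)
    except Exception:
        is_flagged = False  # scan errors recorded as "clear" so tracks aren't stuck
    print(f"track {track_id}: flagged={is_flagged}")  # stand-in for the DB write

async def handle_upload(track_id: int, audio_url: str) -> None:
    # fire and forget: the upload response doesn't wait on the scan
    asyncio.create_task(scan_and_store(track_id, audio_url))
    await asyncio.sleep(0.1)  # keep this demo's loop alive long enough to finish

asyncio.run(handle_upload(1, "https://audio.example/track-1.mp3"))
```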
---

### platform stats and media session integration (PRs #359-379, Nov 27-29, 2025)

**motivation**: show platform activity at a glance, improve playback experience across devices, and give users control over their data.

**what shipped**:
- **platform stats endpoint and UI** (PRs #376, #378, #379):
  - `GET /stats` returns total plays, tracks, and artists (see the sketch below)
  - stats bar displays in homepage header (e.g., "1,691 plays • 55 tracks • 8 artists")
  - skeleton loading animation while fetching
  - responsive layout: visible in header on wide screens, collapses to menu on narrow
  - end-of-list animation on homepage
- **Media Session API** (PR #371):
  - provides track metadata to CarPlay, lock screens, Bluetooth devices, macOS control center
  - artwork display with fallback to artist avatar
  - play/pause, prev/next, seek controls all work from system UI
  - position state syncs scrubbers on external interfaces
- **browser tab title** (PR #374):
  - shows "track - artist • plyr.fm" while playing
  - persists across page navigation
  - reverts to page title when playback stops
- **timed comments** (PR #359):
  - comments capture timestamp when added during playback
  - clickable timestamp buttons seek to that moment
  - compact scrollable comments section on track pages
- **constellation integration** (PR #360):
  - queries the constellation.microcosm.blue backlink index
  - enables network-wide like counts (not just plyr.fm internal)
  - environment-aware namespace handling
- **account deletion** (PR #363):
  - explicit confirmation flow (type handle to confirm)
  - deletes all plyr.fm data (tracks, albums, likes, comments, preferences)
  - optional ATProto record cleanup with clear warnings about orphaned references

**impact**:
- platform stats give visitors an immediate sense of activity
- media session makes plyr.fm tracks controllable from car/lock screen/control center
- timed comments enable discussion at specific moments in tracks
- account deletion gives users full control over their data
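a minimal sketch of a stats endpoint shaped like the one above, assuming FastAPI; the response field names are an assumption, not the actual schema:

```python
# sketch: a /stats endpoint shaped like the one described above.
# the field names are assumptions; the counts are stand-ins for
# the real postgres aggregate queries.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PlatformStats(BaseModel):
    total_plays: int
    total_tracks: int
    total_artists: int

@app.get("/stats")
async def get_stats() -> PlatformStats:
    # stand-in values; the real endpoint aggregates these from the database
    return PlatformStats(total_plays=1691, total_tracks=55, total_artists=8)
```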
---

### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025)

**motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't go stale when browser sessions refresh.

**what shipped**:
- **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow
  - user clicks "create token" → redirected to PDS for authorization → token created with independent credentials
  - tokens have their own DPoP keypair and access/refresh tokens - completely separate from the browser session
- **cookie isolation**: dev token exchange doesn't set a browser cookie
  - added `is_dev_token` flag to the ExchangeToken model
  - /auth/exchange skips Set-Cookie for dev token flows
  - prevents logout from deleting dev tokens (critical bug fixed during implementation)
- **token management UI**: portal → "your data" → "developer tokens"
  - create with optional name and expiration (30/90/180/365 days or never)
  - list active tokens with creation/expiration dates
  - revoke individual tokens
- **API endpoints**:
  - `POST /auth/developer-token/start` - initiates OAuth flow, returns auth_url
  - `GET /auth/developer-tokens` - list user's tokens
  - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix

**security properties**:
- tokens are full sessions with encrypted OAuth credentials (Fernet)
- each token refreshes independently (no staleness from browser session refresh)
- revocable individually without affecting the browser or other tokens
- explicit OAuth consent required at the PDS for each token created

**testing verified**:
- created token → uploaded track → logged out → deleted track with token ✓
- browser logout doesn't affect dev tokens ✓
- token works across browser sessions ✓
- staging deployment tested end-to-end ✓

**documentation**: see `docs/authentication.md` "developer tokens" section

---

### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025)

**motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player.

**what shipped**:
- **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with an iframe
  - follows the oEmbed spec with `type: "rich"` and an iframe in the `html` field (see the sketch below)
  - discovery link in track page `<head>` for automatic detection
- **iframely domain registration**: registered plyr.fm on iframely.com (free tier)
  - this was the key fix - iframely now returns our embed iframe as `links.player[0]`
  - API key: stored in 1password (iframely account)

**debugging journey** (PRs #356-358):
- initially tried `og:video` meta tags to hint the iframe embed - didn't work
- tried removing `og:audio` to force the oEmbed fallback - resulted in no player link at all
- discovered iframely requires domain registration before it trusts oEmbed providers
- after registration, iframely correctly returns the embed iframe URL

**current state**:
- oEmbed endpoint working: `curl https://api.plyr.fm/oembed?url=https://plyr.fm/track/92`
- iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed)
- Leaflet.pub should show proper embeds (pending their cache expiry)

**impact**:
- plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services
- proper embed player with cover art instead of raw HTML5 audio
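a minimal sketch of a rich-type oEmbed response like the endpoint above returns; fields beyond the spec-required ones (and the URL parsing) are assumptions:

```python
# sketch: a rich-type oEmbed response. the spec requires version and type;
# rich responses also need html, width, height. dimensions and URL parsing
# here are assumptions, not the actual endpoint's behavior.
from fastapi import FastAPI

app = FastAPI()

@app.get("/oembed")
async def oembed(url: str) -> dict:
    track_id = url.rstrip("/").rsplit("/", 1)[-1]  # naive parsing for the sketch
    return {
        "version": "1.0",
        "type": "rich",
        "provider_name": "plyr.fm",
        "html": (
            f'<iframe src="https://plyr.fm/embed/track/{track_id}" '
            'width="400" height="200" frameborder="0"></iframe>'
        ),
        "width": 400,
        "height": 200,
    }
```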
---

### export & upload reliability (PRs #337-344, Nov 24, 2025)

**motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts.

**what shipped**:
- **database-backed jobs** (PR #337): moved upload/export tracking from in-memory to postgres
  - jobs table persists state across server restarts
  - enables reliable progress tracking via SSE polling
- **streaming exports** (PR #343): fixed OOM on large file exports
  - previously loaded entire files into memory via `response["Body"].read()`
  - now streams to temp files, adds to zip from disk (constant memory)
  - 90-minute WAV files now export successfully on a 1GB VM
- **progress tracking fix** (PR #340): upload progress was receiving bytes but treating them as a percentage
  - `UploadProgressTracker` now properly converts bytes to a percentage
  - upload progress bar works correctly again
- **UX improvements** (PRs #338-339, #341-342, #344):
  - export filename now includes the date (`plyr-tracks-2025-11-24.zip`)
  - toast notification on track deletion
  - fixed false "lost connection" error when SSE completes normally
  - progress now shows "downloading track X of Y" instead of a confusing count

**impact**:
- exports work for arbitrarily large files (limited by disk, not RAM)
- upload progress displays correctly
- job state survives server restarts
- clearer progress messaging during exports

---

this is a living document. last updated 2025-12-01 after ATProto labeler work.