chore: weekly status maintenance (#420)

authored by claude[bot] and committed by GitHub 88cb471d d3dbe2ee

Changed files
+447 -738
.status_history
+441
.status_history/2025-11.md
# plyr.fm status archive - november 2025

### Queue hydration + ATProto token hardening (Nov 12, 2025)

**Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401 when multiple requests refreshed an expired ATProto token simultaneously.

**What shipped:**
- Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2 for every track. Queue payloads now pull art directly from Postgres, with a one-time fallback for legacy rows.
- Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead of per-request GETs.
- Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This removes the race that caused the batch restore flow to intermittently 500/401.

**Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto restore flows are now reliable under concurrent use, and Logfire no longer shows 500s from the PDS.
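a minimal sketch of the per-session locking pattern, assuming a dict-based lock registry and a `session` object with a `tokens_expired()` method; the real `_refresh_session_tokens` internals differ:

```python
import asyncio
from typing import Any

# hypothetical module-level registry: one lock per session id
_refresh_locks: dict[str, asyncio.Lock] = {}

async def _refresh_session_tokens(session_id: str, session: Any, oauth_client: Any) -> Any:
    lock = _refresh_locks.setdefault(session_id, asyncio.Lock())
    async with lock:
        # coroutines that queued behind an in-flight refresh re-check here,
        # so they reuse the fresh tokens instead of refreshing again
        if not session.tokens_expired():
            return session.tokens
        session.tokens = await oauth_client.refresh_session(session.refresh_token)
        return session.tokens
```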
### Liked tracks feature (PR #157, Nov 11, 2025)

- ✅ server-side persistent collections
- ✅ ATProto record publication for cross-platform visibility
- ✅ UI for adding/removing tracks from liked collection
- ✅ like counts displayed in track responses and analytics (#170)
- ✅ analytics cards now clickable links to track detail pages (#171)
- ✅ liked state shown on artist page tracks (#163)

### Upload streaming + progress UX (PR #182, Nov 11, 2025)

- Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress toasts (critical for >50 MB mixes on mobile).
- Upload form now clears only after the request succeeds; failed attempts leave the form intact so users don't lose metadata.
- Backend writes uploads/images to temp files in 8 MB chunks before handing them to the storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes (see the sketch below).
- Deployment verified locally and by rerunning the exact repro Stella hit (an 85-minute mix from mobile).
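roughly what the chunked temp-file write looks like, a sketch assuming FastAPI's `UploadFile`; the storage handoff and async file I/O details (see PRs #149-151) are elided:

```python
import tempfile

from fastapi import UploadFile

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks, per the PR

async def spool_upload(upload: UploadFile) -> str:
    """Copy an incoming upload to disk chunk by chunk, never buffering the whole file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        while chunk := await upload.read(CHUNK_SIZE):
            tmp.write(chunk)  # sync write shown for brevity
    return tmp.name  # path handed to the storage layer
```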
### transcoder API deployment (PR #156, Nov 11, 2025)

**standalone Rust transcoding service** 🎉
- **deployed**: https://plyr-transcoder.fly.dev/
- **purpose**: convert AIFF/FLAC/etc. to MP3 for browser compatibility
- **technology**: Axum + ffmpeg + Docker
- **security**: `X-Transcoder-Key` header authentication (shared secret)
- **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds)
- **architecture**:
  - 2 Fly machines for high availability
  - auto-stop/start for cost efficiency
  - stateless design (no R2 integration yet)
  - 320kbps MP3 output with proper ID3 tags
- **status**: deployed and tested, ready for integration into plyr.fm upload pipeline
- **next steps**: wire into backend with R2 integration and job queue (see issue #153)

### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025)

**format validation improvements**
- **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox
  - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED`
  - users could upload files but they wouldn't play in most browsers
- **immediate solution**: reject AIFF/AIF uploads at both backend and frontend
  - removed AIFF/AIF from AudioFormat enum
  - added format hints to upload UI: "supported: mp3, wav, m4a"
  - client-side validation with helpful error messages
- **long-term solution**: deployed standalone transcoder service (see above)
  - separate Rust/Axum service with ffmpeg
  - accepts all formats, converts to browser-compatible MP3
  - integration into upload pipeline pending (issue #153)

**observability improvements**:
- added logfire instrumentation to upload background tasks
- added logfire spans to R2 storage operations
- documented logfire querying patterns in `docs/logfire-querying.md`
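the R2 spans might look something like this, assuming a storage adapter with an async `exists()` method; the probed extensions and method names are illustrative, not the actual adapter API:

```python
import logfire

async def get_image_url(storage, file_id: str) -> str | None:
    """Probe R2 for a cover image, with a span around each HEAD-style check."""
    for ext in ("jpg", "png", "webp"):  # illustrative extension list
        key = f"{file_id}.{ext}"
        with logfire.span("r2 head_object {key}", key=key):
            if await storage.exists(key):  # assumed adapter method
                return f"{storage.public_base_url}/{key}"
    return None
```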
### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025)

Eliminated event loop blocking across the backend with three critical PRs:

1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3
   - portal page load time: 2+ seconds → ~200ms
   - root cause: `track.image_url` was blocking on serial R2 HEAD requests

2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups
   - homepage load time: 2-6 seconds → 200-400ms
   - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each)
   - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads

3. **PR #151: async storage writes/deletes** - made save/delete operations non-blocking
   - R2: switched to `aioboto3` for uploads/deletes (async S3 operations)
   - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks)
   - impact: multi-MB uploads no longer monopolize the worker thread, constant memory usage
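the PR #150 fix in miniature; the resolver is passed in here because the real `resolve_atproto_data()` lives in backend internals, and error handling plus the database cache are elided:

```python
import asyncio
from typing import Awaitable, Callable

async def resolve_artists(
    handles: list[str],
    resolve_atproto_data: Callable[[str], Awaitable[dict]],
) -> list[dict]:
    # serial resolution: 8 artists × 200-300ms each = 2-6s
    # gathered: roughly one round trip for the whole batch
    return list(await asyncio.gather(
        *(resolve_atproto_data(handle) for handle in handles)
    ))
```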
### cover art support (PRs #123-126, #132-139)
- ✅ track cover image upload and storage (separate R2 bucket)
- ✅ image display on track pages and player
- ✅ Open Graph meta tags for track sharing
- ✅ mobile-optimized layouts with cover art
- ✅ sticky bottom player on mobile with cover

### track detail pages (PR #164, Nov 12, 2025)

- ✅ dedicated track detail pages with large cover art
- ✅ play button updates queue state correctly (#169)
- ✅ liked state loaded efficiently via server-side fetch
- ✅ mobile-optimized layouts with proper scrolling constraints
- ✅ origin validation for image URLs (#168)

### mobile UI improvements (PRs #159-185, Nov 11-12, 2025)

- ✅ compact action menus and better navigation (#161)
- ✅ improved mobile responsiveness (#159)
- ✅ consistent button layouts across mobile/desktop (#176-181, #185)
- ✅ always show play count and like count on mobile (#177)
- ✅ login page UX improvements (#174-175)
- ✅ liked page UX improvements (#173)
- ✅ accent color for liked tracks (#160)

### queue management improvements (PRs #110-113, #115)
- ✅ visual feedback on queue add/remove
- ✅ toast notifications for queue actions
- ✅ better error handling for queue operations
- ✅ improved shuffle and auto-advance UX

### infrastructure and tooling
- ✅ R2 bucket separation: audio-prod and images-prod (PR #124)
- ✅ admin script for content moderation (`scripts/delete_track.py`)
- ✅ bluesky attribution link in header
- ✅ changelog target added (#183)
- ✅ documentation updates (#158)
- ✅ track metadata edits now persist correctly (#162)

---

## performance optimization session (Nov 12, 2025)

### issue: slow /tracks/liked endpoint

**symptoms**:
- `/tracks/liked` taking 600-900ms consistently
- only ~25ms spent in database queries
- mysterious 575ms gap with no spans in Logfire traces
- endpoint felt sluggish compared to other pages

**investigation**:
- examined Logfire traces for `/tracks/liked` requests
- found 5-6 liked tracks being returned per request
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
- noticed R2 storage calls weren't appearing in traces despite taking the majority of request time

**root cause**:
- PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls
- new tracks (uploaded after the PR) have `image_url` populated at upload time ✅
- legacy tracks (15 tracks uploaded before the PR) had `image_url = NULL` ❌
- fallback code called `track.get_image_url()` for NULL values
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
- 5 tracks × 120ms = ~600ms of uninstrumented latency

**why R2 calls weren't visible**:
- `storage.get_url()` method had no Logfire instrumentation
- R2 API calls were happening but not creating spans
- they appeared as a mysterious gap in the trace timeline

**solution implemented** (see the sketch below):
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
2. ran script against production database with production R2 credentials
3. backfilled 11 tracks successfully (4 already done in a previous partial run)
4. 3 tracks "failed" but actually have non-existent images (optional, expected)
5. script uses concurrent `asyncio.gather()` for performance

**key learning: environment configuration matters**:
- initial script runs failed silently because:
  - script used local `.env` credentials (dev R2 bucket)
  - production images are stored in a different R2 bucket (`images-prod`)
  - `get_url()` returned `None` when images weren't found in the dev bucket
- fix: passed production R2 credentials via environment variables:
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `R2_IMAGE_BUCKET=images-prod`
  - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`

**results**:
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
- after: 13 tracks populated with `image_url`, 3 legitimately have no images
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
- endpoint feels "really, really snappy" (user feedback)
- performance improvement visible immediately after backfill
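the shape of the backfill, sketched; the track/storage/session interfaces are assumptions, but the dry-run-plus-`asyncio.gather()` pattern matches the script described above:

```python
import asyncio
from typing import Any

async def backfill_image_urls(
    tracks: list[Any],  # rows where image_url IS NULL
    storage: Any,
    session: Any,
    dry_run: bool = True,
) -> None:
    """Populate NULL image_url values once, instead of probing R2 on every request."""
    # probe R2 for all tracks concurrently rather than one at a time
    urls = await asyncio.gather(*(storage.get_url(track.file_id) for track in tracks))
    for track, url in zip(tracks, urls):
        if url is None:
            print(f"track {track.id}: no image in bucket (expected for some legacy rows)")
        elif dry_run:
            print(f"would set track {track.id} image_url={url}")
        else:
            track.image_url = url
    if not dry_run:
        await session.commit()
```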
**database cleanup: queue_state table bloat**:
- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
- result: 0 dead rows, table clean
- autovacuum tuning for `queue_state` still needed to prevent future bloat (see next steps and the sketch below):
  - frequent updates to this table make it prone to bloat
  - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%)
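one way the planned per-table override could be applied, plain SQL issued through the backend's SQLAlchemy stack; the 0.05 scale factor is the value proposed above:

```python
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

async def tune_queue_state_autovacuum(database_url: str) -> None:
    engine = create_async_engine(database_url)
    async with engine.begin() as conn:
        # vacuum once dead rows exceed 5% of the table, instead of the 20% default
        await conn.execute(
            text("ALTER TABLE queue_state SET (autovacuum_vacuum_scale_factor = 0.05)")
        )
```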
**endpoint performance snapshot** (post-fix, last 10 minutes):
- `GET /tracks/`: 410ms (down from 2+ seconds)
- `GET /queue/`: 399ms (down from 2+ seconds)
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
- `GET /preferences/`: 200ms median
- `GET /auth/me`: 114ms median
- `POST /tracks/{track_id}/play`: 34ms

**PR #184 context**:
- the PR claimed "opportunistic backfill: legacy records update on first access"
- but the actual implementation never saved computed `image_url` back to the database
- fallback code only computed URLs on demand, didn't persist them
- this is why repeated visits kept hitting the R2 API for the same tracks
- a one-time backfill script was the correct solution vs adding write logic to read endpoints

**graceful ATProto recovery (PR #180)**:
- reviewed recent work on handling tracks with missing `atproto_record_uri`
- 4 tracks in production have NULL ATProto records (expected from upload failures)
- system already handles this gracefully:
  - like buttons disabled with helpful tooltips
  - track owners can self-service restore via portal
  - `restore-record` endpoint recreates records with correct TID timestamps
- no action needed - existing recovery system working as designed

**performance metrics pre/post all recent PRs**:
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
- today's backfill: eliminated remaining R2 calls for legacy tracks
- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
- all endpoints now consistently sub-second response times

**documentation created**:
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
  - project/branch management
  - database schema inspection
  - SQL query patterns for plyr.fm
  - connection string generation
  - environment mapping (dev/staging/prod)
  - debugging workflows
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
  - dry-run mode for safety
  - concurrent R2 API calls
  - detailed error logging
  - production-tested

**tools and patterns established**:
- Neon MCP for database inspection and queries
- Logfire arbitrary queries for performance analysis
- production secret management via Fly.io
- `flyctl ssh console` for environment inspection
- backfill scripts with dry-run mode
- environment variable overrides for production operations

**system health indicators**:
- ✅ no 5xx errors in recent spans
- ✅ database queries all under 70ms p95
- ✅ SSL connection pool issues resolved (no errors in recent traces)
- ✅ queue_state table bloat eliminated
- ✅ all track images either in DB or legitimately NULL
- ✅ application feels fast and responsive

**next steps**:
1. configure autovacuum for `queue_state` table (prevent future bloat)
2. add Logfire instrumentation to `storage.get_url()` for visibility
3. monitor `/tracks/liked` performance over the next few days
4. consider a similar backfill pattern for any future column additions
---

### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)

**motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform.

**what shipped**:
- **moderation service** (Rust/Axum on Fly.io):
  - standalone service at `plyr-moderation.fly.dev`
  - integrates with AuDD enterprise API for audio fingerprinting
  - scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode)
  - auth via `X-Moderation-Key` header
- **backend integration** (PR #382):
  - `ModerationSettings` in config (service URL, auth token, timeout)
  - moderation client module (`backend/_internal/moderation.py`)
  - fire-and-forget background task on track upload (see the sketch below)
  - stores results in `copyright_scans` table
  - scan errors stored as "clear" so tracks aren't stuck unscanned
- **flagging fix** (PR #384):
  - AuDD enterprise API returns no confidence scores (all 0)
  - changed from score threshold to presence-based flagging: `is_flagged = !matches.is_empty()`
  - removed unused `score_threshold` config
- **backfill script** (`scripts/scan_tracks_copyright.py`):
  - scans existing tracks that haven't been checked
  - `--max-duration` flag to skip long DJ sets (estimated from file size)
  - `--dry-run` mode to preview what would be scanned
  - supports dev/staging/prod environments
- **review workflow**:
  - `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, `review_notes` columns
  - resolution values: `violation`, `false_positive`, `original_artist`
  - SQL queries for dashboard: flagged tracks, unreviewed flags, violations list

**initial review results** (25 flagged tracks):
- 8 violations (actual copyright issues)
- 11 false positives (fingerprint noise)
- 6 original artists (people uploading their own distributed music)

**impact**:
- automated copyright detection on upload
- manual review workflow for flagged content
- protection against DMCA takedown requests
- clear audit trail with resolution status
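a sketch of the fire-and-forget flow; the `/scan` route, payload shape, and `save_result` callback are assumptions, not the real moderation service API, and the presence-based flag mirrors PR #384:

```python
from typing import Any, Awaitable, Callable

import httpx
import logfire

SaveResult = Callable[[int, list[dict[str, Any]], bool], Awaitable[None]]

async def scan_track_for_copyright(
    track_id: int, audio_url: str, settings: Any, save_result: SaveResult
) -> None:
    """Copyright scan scheduled as a background task at upload time."""
    try:
        async with httpx.AsyncClient(timeout=settings.timeout) as client:
            resp = await client.post(
                f"{settings.service_url}/scan",  # hypothetical route
                headers={"X-Moderation-Key": settings.auth_token},
                json={"url": audio_url},
            )
            resp.raise_for_status()
            matches = resp.json().get("matches", [])
        # presence-based flagging (PR #384): AuDD returns no usable scores
        await save_result(track_id, matches, bool(matches))
    except Exception:
        logfire.exception("copyright scan failed for track {track_id}", track_id=track_id)
        # errors are recorded as "clear" so the track isn't stuck unscanned
        await save_result(track_id, [], False)
```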
---

### platform stats and media session integration (PRs #359-379, Nov 27-29, 2025)

**motivation**: show platform activity at a glance, improve the playback experience across devices, and give users control over their data.

**what shipped**:
- **platform stats endpoint and UI** (PRs #376, #378, #379):
  - `GET /stats` returns total plays, tracks, and artists
  - stats bar displays in homepage header (e.g., "1,691 plays • 55 tracks • 8 artists")
  - skeleton loading animation while fetching
  - responsive layout: visible in header on wide screens, collapses to menu on narrow
  - end-of-list animation on homepage
- **Media Session API** (PR #371):
  - provides track metadata to CarPlay, lock screens, Bluetooth devices, macOS control center
  - artwork display with fallback to artist avatar
  - play/pause, prev/next, seek controls all work from system UI
  - position state syncs scrubbers on external interfaces
- **browser tab title** (PR #374):
  - shows "track - artist • plyr.fm" while playing
  - persists across page navigation
  - reverts to page title when playback stops
- **timed comments** (PR #359):
  - comments capture a timestamp when added during playback
  - clickable timestamp buttons seek to that moment
  - compact scrollable comments section on track pages
- **constellation integration** (PR #360):
  - queries the constellation.microcosm.blue backlink index
  - enables network-wide like counts (not just plyr.fm internal)
  - environment-aware namespace handling
- **account deletion** (PR #363):
  - explicit confirmation flow (type handle to confirm)
  - deletes all plyr.fm data (tracks, albums, likes, comments, preferences)
  - optional ATProto record cleanup with clear warnings about orphaned references

**impact**:
- platform stats give visitors an immediate sense of activity
- media session makes plyr.fm tracks controllable from car/lock screen/control center
- timed comments enable discussion at specific moments in tracks
- account deletion gives users full control over their data

---

### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025)

**motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't become stale when browser sessions refresh.

**what shipped**:
- **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow
  - user clicks "create token" → redirected to PDS for authorization → token created with independent credentials
  - tokens have their own DPoP keypair and access/refresh tokens, completely separate from the browser session
- **cookie isolation**: dev token exchange doesn't set a browser cookie
  - added `is_dev_token` flag to the ExchangeToken model
  - `/auth/exchange` skips Set-Cookie for dev token flows
  - prevents logout from deleting dev tokens (critical bug fixed during implementation)
- **token management UI**: portal → "your data" → "developer tokens"
  - create with optional name and expiration (30/90/180/365 days or never)
  - list active tokens with creation/expiration dates
  - revoke individual tokens
- **API endpoints**:
  - `POST /auth/developer-token/start` - initiates OAuth flow, returns auth_url
  - `GET /auth/developer-tokens` - list user's tokens
  - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix

**security properties**:
- tokens are full sessions with encrypted OAuth credentials (Fernet)
- each token refreshes independently (no staleness from browser session refresh)
- revocable individually without affecting browser or other tokens
- explicit OAuth consent required at PDS for each token created

**testing verified**:
- created token → uploaded track → logged out → deleted track with token ✓
- browser logout doesn't affect dev tokens ✓
- token works across browser sessions ✓
- staging deployment tested end-to-end ✓

**documentation**: see `docs/authentication.md` "developer tokens" section

---

### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025)

**motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player.

**what shipped**:
- **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with an iframe
  - follows the oEmbed spec with `type: "rich"` and an iframe in the `html` field
  - discovery link in track page `<head>` for automatic detection
- **iframely domain registration**: registered plyr.fm on iframely.com (free tier)
  - this was the key fix - iframely now returns our embed iframe as `links.player[0]`
  - API key: stored in 1password (iframely account)

**debugging journey** (PRs #356-358):
- initially tried `og:video` meta tags to hint the iframe embed - didn't work
- tried removing `og:audio` to force the oEmbed fallback - resulted in no player link at all
- discovered iframely requires domain registration before it trusts oEmbed providers
- after registration, iframely correctly returns the embed iframe URL

**current state**:
- oEmbed endpoint working: `curl "https://api.plyr.fm/oembed?url=https://plyr.fm/track/92"`
- iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed)
- Leaflet.pub should show proper embeds (pending their cache expiry)

**impact**:
- plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services
- proper embed player with cover art instead of raw HTML5 audio

---

### export & upload reliability (PRs #337-344, Nov 24, 2025)

**motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts.

**what shipped**:
- **database-backed jobs** (PR #337): moved upload/export tracking from in-memory to postgres
  - jobs table persists state across server restarts
  - enables reliable progress tracking via SSE polling
- **streaming exports** (PR #343): fixed OOM on large file exports (see the sketch below)
  - previously loaded entire files into memory via `response["Body"].read()`
  - now streams to temp files, adds to the zip from disk (constant memory)
  - 90-minute WAV files now export successfully on a 1GB VM
- **progress tracking fix** (PR #340): upload progress was receiving bytes but treating them as a percentage
  - `UploadProgressTracker` now properly converts bytes to a percentage
  - upload progress bar works correctly again
- **UX improvements** (PRs #338-339, #341-342, #344):
  - export filename now includes the date (`plyr-tracks-2025-11-24.zip`)
  - toast notification on track deletion
  - fixed false "lost connection" error when SSE completes normally
  - progress now shows "downloading track X of Y" instead of a confusing count

**impact**:
- exports work for arbitrarily large files (limited by disk, not RAM)
- upload progress displays correctly
- job state survives server restarts
- clearer progress messaging during exports
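the constant-memory export path from PR #343, sketched with an assumed aioboto3-style client and synchronous zip writes for brevity; the chunk size is an assumption:

```python
import tempfile
import zipfile

CHUNK_SIZE = 8 * 1024 * 1024  # assumed; anything bounded keeps memory flat

async def add_object_to_zip(s3, bucket: str, key: str, archive: zipfile.ZipFile) -> None:
    """Stream one R2 object to a temp file, then add it to the zip from disk."""
    obj = await s3.get_object(Bucket=bucket, Key=key)
    with tempfile.NamedTemporaryFile() as tmp:
        # previously: data = await obj["Body"].read()  <- whole file in memory
        while chunk := await obj["Body"].read(CHUNK_SIZE):
            tmp.write(chunk)
        tmp.flush()
        archive.write(tmp.name, arcname=key.rsplit("/", 1)[-1])
```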
+6 -738
STATUS.md
···
  - htmx endpoints: `/admin/flags-html`, `/admin/resolve-htmx`
  - server-rendered HTML partials for flag cards

-
- ### Queue hydration + ATProto token hardening (Nov 12, 2025)
···
- - ✅ track metadata edits now persist correctly (#162)
+ ---

  ## immediate priorities
···
  - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts
  - documented in `docs/logfire-querying.md`

- ### performance optimizations
- 3. **persist concrete file extensions in database**: currently brute-force probing all supported formats on read
-    - already know `Track.file_type` and image format during upload
-    - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam
-    - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially)
-
- 4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task
-    - multi-GB uploads risk OOM
-    - stream from `UploadFile.file` → storage backend for constant memory usage
-
- ### new features
- 5. **content-addressable storage** (issue #146)
-    - hash-based file storage for automatic deduplication
-    - reduces storage costs when multiple artists upload same file
-    - enables content verification
-
- 6. **liked tracks feature** (issue #144): design schema and ATProto record format
-    - server-side persistent collections
-    - ATProto record publication for cross-platform visibility
-    - UI for adding/removing tracks from liked collection
-
- ## open issues by timeline
-
- ### immediate
- - issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3)
- - issue #147: upload reliability bug (data loss risk)
- - issue #144: likes feature for personal collections
-
- ### short-term
- - issue #146: content-addressable storage (hash-based deduplication)
- - issue #24: implement play count abuse prevention
- - database connection pool tuning (SSL errors)
- - file extension persistence in database
-
- ### medium-term
- - issue #39: postmortem - cross-domain auth deployment and remaining security TODOs
- - issue #46: consider removing init_db() from lifespan in favor of migration-only approach
- - issue #56: design public developer API and versioning
- - issue #57: support multiple audio item types (voice memos/snippets)
- - issue #122: fullscreen player for immersive playback
-
- ### long-term
- - migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata)
- - publish to multiple ATProto AppViews for cross-platform visibility
- - explore ATProto-native notifications (replace Bluesky DM bot)
- - realtime queue syncing across devices via SSE/WebSocket
- - artist analytics dashboard improvements
- - issue #44: modern music streaming feature parity
+ ---

  ## technical state

- ### architecture
-
- **backend**
- - language: Python 3.11+
- - framework: FastAPI with uvicorn
- - database: Neon PostgreSQL (serverless, fully managed)
- - storage: Cloudflare R2 (S3-compatible object storage)
- - hosting: Fly.io (2x shared-cpu VMs, auto-scaling)
- - observability: Pydantic Logfire (traces, metrics, logs)
- - auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto)
-
- **frontend**
- - framework: SvelteKit (latest v2.43.2)
- - runtime: Bun (fast JS runtime)
- - hosting: Cloudflare Pages (edge network)
- - styling: vanilla CSS with lowercase aesthetic
- - state management: Svelte 5 runes ($state, $derived, $effect)
-
- **deployment**
- - ci/cd: GitHub Actions
- - backend: automatic on main branch merge (fly.io deploy)
- - frontend: automatic on every push to main (cloudflare pages)
- - migrations: automated via fly.io release_command
- - environments: dev → staging → production (full separation)
- - versioning: nebula timestamp format (YYYY.MMDD.HHMMSS)
-
- **key dependencies**
- - atproto: forked SDK for OAuth and record management
- - sqlalchemy: async ORM for postgres
- - alembic: database migrations
- - boto3/aioboto3: R2 storage client
- - logfire: observability (FastAPI + SQLAlchemy instrumentation)
- - httpx: async HTTP client
-
- **what's working**
+ ### what's working

  **core functionality**
  - ✅ ATProto OAuth 2.1 authentication with encrypted state
···
  - ✅ cross-tab queue synchronization via BroadcastChannel
  - ✅ share tracks via URL with Open Graph previews (including cover art)
  - ✅ image URL caching in database (eliminates N+1 R2 calls)
- - ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error mes
- sages)
+ - ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
  - ✅ standalone audio transcoding service deployed and verified (see issue #153)
- - ✅ Bluesky embed player UI changes implemented (pending upstream social-app PR)
  - ✅ admin content moderation script for removing inappropriate uploads
  - ✅ copyright moderation system (AuDD fingerprinting, review workflow, violation tracking)
  - ✅ ATProto labeler for copyright violations (queryLabels, subscribeLabels XRPC endpoints)
···
  - ✅ long album title handling (100-char slugs, CSS truncation)
  - ⏸ ATProto records for albums (deferred, see issue #221)

- **frontend architecture**
- - ✅ server-side data loading (`+page.server.ts`) for artist and album pages
- - ✅ client-side data loading (`+page.ts`) for auth-dependent pages
- - ✅ centralized auth manager (`lib/auth.svelte.ts`)
- - ✅ layout-level auth state (`+layout.ts`) shared across all pages
- - ✅ eliminated "flash of loading" via proper load functions
- - ✅ consistent auth patterns (no scattered localStorage calls)
-
  **deployment (fully automated)**
  - **production**:
    - frontend: https://plyr.fm (cloudflare pages)
···
    - storage: cloudflare R2 (audio-stg bucket)
    - deploy: push to main → automatic

- - **development**:
-   - backend: localhost:8000
-   - frontend: localhost:5173
-   - database: neon postgresql (relay-dev)
-   - storage: cloudflare R2 (audio-dev and images-dev buckets)
-
- - **developer tooling**:
-   - `just serve` - run backend locally
-   - `just dev` - run frontend locally
-   - `just test` - run test suite
-   - `just release` - create production release (backend + frontend)
-   - `just release-frontend-only` - deploy only frontend changes (added Nov 13)
-
- ### what's in progress
-
- **immediate work**
- - investigating playback auto-start behavior (#225)
-   - page refresh sometimes starts playing immediately
-   - may be related to queue state restoration or localStorage caching
-   - `autoplay_next` preference not being respected in all cases
- - liquid glass effects as user-configurable setting (#186)
-
- **active research**
- - transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
- - content moderation systems (#166, #167, #393 - takedown state representation)
- - PWA capabilities and offline support (#165)
-
  ### known issues

  **player behavior**
···
  - no fullscreen player view (#122)
  - no public API for third-party integrations (#56)

- **technical debt**
- - multi-tab playback synchronization could be more robust
- - queue state conflicts can occur with rapid operations
-
- ### technical decisions
-
- **why Python/FastAPI instead of Rust?**
- - rapid prototyping velocity during MVP phase
- - rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
- - excellent async support with asyncio
- - lower barrier to contribution
- - trade-off: accepting higher latency for faster development
- - future: can migrate hot paths to Rust if needed (transcoding service already planned)
-
- **why Fly.io instead of AWS/GCP?**
- - simple deployment model (dockerfile → production)
- - automatic SSL/TLS certificates
- - built-in global load balancing
- - reasonable pricing for MVP ($5/month)
- - easy migration path to larger providers later
- - trade-off: vendor-specific features, less control
-
- **why Cloudflare R2 instead of S3?**
- - zero egress fees (critical for audio streaming)
- - S3-compatible API (easy migration if needed)
- - integrated CDN for fast delivery
- - significantly cheaper than S3 for bandwidth-heavy workloads
-
- **why forked atproto SDK?**
- - upstream SDK lacked OAuth 2.1 support
- - needed custom record management patterns
- - maintains compatibility with ATProto spec
- - contributes improvements back when possible
-
- **why SvelteKit instead of React/Next.js?**
- - Svelte 5 runes provide excellent reactivity model
- - smaller bundle sizes (critical for mobile)
- - less boilerplate than React
- - SSR + static generation flexibility
- - modern DX with TypeScript
-
- **why Neon instead of self-hosted Postgres?**
- - serverless autoscaling (no capacity planning)
- - branch-per-PR workflow (preview databases)
- - automatic backups and point-in-time recovery
- - generous free tier for MVP
- - trade-off: higher latency than co-located DB, but acceptable
-
- **why reject AIFF instead of transcoding immediately?**
- - MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
- - user communication: better to be upfront about limitations than silent failures
- - resource management: transcoding is CPU-intensive, needs proper worker architecture
- - future flexibility: can add transcoding as optional feature (high-quality uploads → MP3 delivery)
- - trade-off: some users can't upload AIFF now, but those who can upload MP3 have working experience
-
- **why async everywhere?**
- - event loop performance: single-threaded async handles high concurrency
- - I/O-bound workload: most time spent waiting on network/disk
- - recent work (PRs #149-151) eliminated all blocking operations
- - alternative: thread pools for blocking I/O, but increases complexity
- - trade-off: debugging async code harder than sync, but worth throughput gains
-
- **why anyio.Path over thread pools?**
- - true async I/O: `anyio` uses OS-level async file operations where available
- - constant memory: chunked reads/writes (64KB) prevent OOM on large files
- - thread pools: would work but less efficient, more context switching
- - trade-off: anyio API slightly different from stdlib `pathlib`, but cleaner async semantics
+ ---

  ## cost structure
···
  - storage used: <1GB R2
  - database size: <10MB postgres

- ## next session prep
-
- **context for new agent:**
- 1. Fixed R2 image upload path mismatch, ensuring images save with the correct prefix.
- 2. Implemented UI changes for the embed player: removed the Queue button and matched fonts to the main app.
- 3. Opened a draft PR to the upstream social-app repository for native Plyr.fm embed support.
- 4. Updated issue #153 (transcoding pipeline) with a clear roadmap for integration into the backend.
- 5. Developed a local verification script for the transcoder service for faster local iteration.
-
- **useful commands:**
- - `just backend run` - run backend locally
- - `just frontend dev` - run frontend locally
- - `just test` - run test suite (from `backend/` directory)
- - `gh issue list` - check open issues
-
- ## admin tooling
-
- ### content moderation
- script: `scripts/delete_track.py`
- - requires `ADMIN_*` prefixed environment variables
- - deletes audio file from R2
- - deletes cover image from R2 (if exists)
- - deletes database record (cascades to likes and queue entries)
- - notes ATProto records for manual cleanup (can't delete from other users' PDS)
-
- usage:
- ```bash
- # dry run
- uv run scripts/delete_track.py <track_id> --dry-run
-
- # delete with confirmation
- uv run scripts/delete_track.py <track_id>
-
- # delete without confirmation
- uv run scripts/delete_track.py <track_id> --yes
-
- # by URL
- uv run scripts/delete_track.py --url https://plyr.fm/track/34
- ```
-
- required environment variables:
- - `ADMIN_DATABASE_URL` - production database connection
- - `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
- - `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
- - `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
- - `ADMIN_R2_BUCKET` - R2 bucket name
-
- ## known issues
-
- ### non-blocking
- - cloudflare pages preview URLs return 404 (production works fine)
- - some "relay" references remain in docs and comments
- - ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)
-
- ## for new contributors
-
- ### getting started
- 1. clone: `gh repo clone zzstoatzz/plyr.fm`
- 2. install dependencies: `uv sync && cd frontend && bun install`
- 3. run backend: `uv run uvicorn backend.main:app --reload`
- 4. run frontend: `cd frontend && bun run dev`
- 5. visit http://localhost:5173
-
- ### development workflow
- 1. create issue on github
- 2. create PR from feature branch
- 3. ensure pre-commit hooks pass
- 4. test locally
- 5. merge to main → deploys to staging automatically
- 6. verify on staging
- 7. create github release → deploys to production automatically
-
- ### key principles
- - type hints everywhere
- - lowercase aesthetic
- - generic terminology (use "items" not "tracks" where appropriate)
- - ATProto first
- - mobile matters
- - cost conscious
- - async everywhere (no blocking I/O)
-
- ### project structure
- ```
- plyr.fm/
- ├── backend/          # FastAPI app & Python tooling
- │   ├── src/backend/  # application code
- │   │   ├── api/      # public endpoints
- │   │   ├── _internal/ # internal services
- │   │   ├── models/   # database schemas
- │   │   └── storage/  # storage adapters
- │   ├── tests/        # pytest suite
- │   └── alembic/      # database migrations
- ├── frontend/         # SvelteKit app
- │   ├── src/lib/      # components & state
- │   └── src/routes/   # pages
- ├── moderation/       # Rust moderation service (ATProto labeler)
- │   ├── src/          # Axum handlers, AuDD client, label signing
- │   └── static/       # admin UI (html/css/js)
- ├── transcoder/       # Rust audio transcoding service
- ├── docs/             # documentation
- └── justfile          # task runner (mods: backend, frontend, moderation, transcoder)
- ```
-
- ## documentation
-
- - [deployment overview](docs/deployment/overview.md)
- - [configuration guide](docs/configuration.md)
- - [queue design](docs/queue-design.md)
- - [logfire querying](docs/logfire-querying.md)
- - [pdsx guide](docs/pdsx-guide.md)
- - [neon mcp guide](docs/neon-mcp-guide.md)
-
- ## performance optimization session (Nov 12, 2025)
···
- 4. consider adding similar backfill pattern for any future column additions
-
  ---

- ### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)
···
- - clearer progress messaging during exports
-
- ---
-
- this is a living document. last updated 2025-12-01 after ATProto labeler work.
+ this is a living document. last updated 2025-12-02 after status maintenance.
update.wav

This is a binary file and will not be displayed.