.status_history/2025-11.md

# plyr.fm status archive - november 2025

### Queue hydration + ATProto token hardening (Nov 12, 2025)

**Why:** queue endpoints were occasionally taking 2s+, and restore operations could 401 when multiple requests refreshed an expired ATProto token simultaneously.

**What shipped:**
- Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2 for every track. Queue payloads now pull art directly from Postgres, with a one-time fallback for legacy rows.
- Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead of issuing per-request GETs.
- Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This removes the race that caused the batch restore flow to intermittently return 500s and 401s (see the sketch after this list).
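
a minimal sketch of the per-session lock pattern, assuming a module-level dict of `asyncio.Lock`s keyed by session id; `tokens_expired` and `do_refresh` are placeholders, and the real `_refresh_session_tokens` wiring differs:

```python
import asyncio

# hypothetical registry: one lock per session id
_refresh_locks: dict[str, asyncio.Lock] = {}

async def refresh_session_tokens(session_id: str) -> None:
    lock = _refresh_locks.setdefault(session_id, asyncio.Lock())
    async with lock:
        # a coroutine that waited on the lock finds fresh tokens and returns,
        # so only the first caller actually hits oauth_client.refresh_session
        if not await tokens_expired(session_id):  # placeholder check
            return
        await do_refresh(session_id)  # placeholder for oauth_client.refresh_session
```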

**Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto restore flows are now reliable under concurrent use, and Logfire no longer shows 500s from the PDS.

### Liked tracks feature (PR #157, Nov 11, 2025)

- ✅ server-side persistent collections
- ✅ ATProto record publication for cross-platform visibility
- ✅ UI for adding/removing tracks from liked collection
- ✅ like counts displayed in track responses and analytics (#170)
- ✅ analytics cards now clickable links to track detail pages (#171)
- ✅ liked state shown on artist page tracks (#163)

### Upload streaming + progress UX (PR #182, Nov 11, 2025)

- Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress toasts (critical for >50 MB mixes on mobile).
- Upload form now clears only after the request succeeds; failed attempts leave the form intact so users don't lose metadata.
- Backend writes uploads and images to temp files in 8 MB chunks before handing them to the storage layer, eliminating whole-file buffering and the iOS crashes seen on hour-long mixes (see the sketch below).
- Deployment verified locally and by rerunning the exact repro Stella hit (an 85-minute mix uploaded from mobile).
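
a sketch of the chunked spool-to-disk pattern described above, assuming a FastAPI `UploadFile`; `spool_to_disk` and the constant mirror the description, not the actual code:

```python
import tempfile

from fastapi import UploadFile

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks, per the notes above

async def spool_to_disk(upload: UploadFile) -> str:
    """copy an incoming upload to a temp file without buffering it whole in memory."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        while chunk := await upload.read(CHUNK_SIZE):
            tmp.write(chunk)
    return tmp.name  # the path is then handed to the storage layer
```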

### transcoder API deployment (PR #156, Nov 11, 2025)

**standalone Rust transcoding service** 🎉
- **deployed**: https://plyr-transcoder.fly.dev/
- **purpose**: convert AIFF/FLAC/etc. to MP3 for browser compatibility
- **technology**: Axum + ffmpeg + Docker
- **security**: `X-Transcoder-Key` header authentication (shared secret; client sketch below)
- **capacity**: handles 1 GB uploads; tested with an 85-minute AIFF file (~858 MB → 195 MB MP3 in 32 seconds)
- **architecture**:
  - 2 Fly machines for high availability
  - auto-stop/start for cost efficiency
  - stateless design (no R2 integration yet)
  - 320 kbps MP3 output with proper ID3 tags
- **status**: deployed and tested, ready for integration into the plyr.fm upload pipeline
- **next steps**: wire into the backend with R2 integration and a job queue (see issue #153)
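
a hedged client sketch for calling the service; only the host and the `X-Transcoder-Key` header are documented above - the `/transcode` route, upload field, and response shape are assumptions:

```python
import os

import httpx

async def transcode_to_mp3(path: str) -> bytes:
    async with httpx.AsyncClient(timeout=120.0) as client:
        with open(path, "rb") as f:
            resp = await client.post(
                "https://plyr-transcoder.fly.dev/transcode",  # assumed route
                headers={"X-Transcoder-Key": os.environ["TRANSCODER_KEY"]},
                files={"file": f},
            )
    resp.raise_for_status()
    return resp.content  # assumed: MP3 bytes in the response body
```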

### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025)

**format validation improvements**
- **problem discovered**: AIFF/AIF files only play in Safari, not Chrome/Firefox
  - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED`
  - users could upload files that then wouldn't play in most browsers
- **immediate solution**: reject AIFF/AIF uploads at both backend and frontend
  - removed AIFF/AIF from the AudioFormat enum
  - added format hints to the upload UI: "supported: mp3, wav, m4a"
  - client-side validation with helpful error messages
- **long-term solution**: deployed standalone transcoder service (see above)
  - separate Rust/Axum service with ffmpeg
  - accepts all formats, converts to browser-compatible MP3
  - integration into the upload pipeline pending (issue #153)

**observability improvements**:
- added logfire instrumentation to upload background tasks
- added logfire spans to R2 storage operations
- documented logfire querying patterns in `docs/logfire-querying.md`

### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025)

Eliminated event loop blocking across the backend with three critical PRs:

1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3
   - portal page load time: 2+ seconds → ~200ms
   - root cause: `track.image_url` was blocking on serial R2 HEAD requests

2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups
   - homepage load time: 2-6 seconds → 200-400ms
   - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each)
   - fix: `asyncio.gather()` for batch resolution (see the sketch after this list), plus database caching for subsequent loads

3. **PR #151: async storage writes/deletes** - made save/delete operations non-blocking
   - R2: switched to `aioboto3` for uploads/deletes (async S3 operations)
   - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64 KB chunks)
   - impact: multi-MB uploads no longer monopolize the worker, constant memory usage
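
a sketch of the PR #150 batching pattern, assuming `resolve_atproto_data` is importable as named above (the import path is a guess):

```python
import asyncio

from backend._internal.atproto import resolve_atproto_data  # assumed path

async def resolve_all(handles: list[str]) -> list[dict]:
    # serially, 8 artists × 200-300ms each ≈ 2s+; gathered, the batch takes
    # roughly as long as the slowest single lookup
    return await asyncio.gather(*(resolve_atproto_data(h) for h in handles))
```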

### cover art support (PRs #123-126, #132-139)
- ✅ track cover image upload and storage (separate R2 bucket)
- ✅ image display on track pages and player
- ✅ Open Graph meta tags for track sharing
- ✅ mobile-optimized layouts with cover art
- ✅ sticky bottom player on mobile with cover

### track detail pages (PR #164, Nov 12, 2025)

- ✅ dedicated track detail pages with large cover art
- ✅ play button updates queue state correctly (#169)
- ✅ liked state loaded efficiently via server-side fetch
- ✅ mobile-optimized layouts with proper scrolling constraints
- ✅ origin validation for image URLs (#168)

### mobile UI improvements (PRs #159-185, Nov 11-12, 2025)

- ✅ compact action menus and better navigation (#161)
- ✅ improved mobile responsiveness (#159)
- ✅ consistent button layouts across mobile/desktop (#176-181, #185)
- ✅ always show play count and like count on mobile (#177)
- ✅ login page UX improvements (#174-175)
- ✅ liked page UX improvements (#173)
- ✅ accent color for liked tracks (#160)

### queue management improvements (PRs #110-113, #115)
- ✅ visual feedback on queue add/remove
- ✅ toast notifications for queue actions
- ✅ better error handling for queue operations
- ✅ improved shuffle and auto-advance UX

### infrastructure and tooling
- ✅ R2 bucket separation: audio-prod and images-prod (PR #124)
- ✅ admin script for content moderation (`scripts/delete_track.py`)
- ✅ bluesky attribution link in header
- ✅ changelog target added (#183)
- ✅ documentation updates (#158)
- ✅ track metadata edits now persist correctly (#162)

---

## performance optimization session (Nov 12, 2025)

### issue: slow /tracks/liked endpoint

**symptoms**:
- `/tracks/liked` taking 600-900ms consistently
- only ~25ms spent in database queries
- a mysterious ~575ms gap with no spans in Logfire traces
- endpoint felt sluggish compared to other pages

**investigation**:
- examined Logfire traces for `/tracks/liked` requests
- found 5-6 liked tracks being returned per request
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
- noticed R2 storage calls weren't appearing in traces despite taking the majority of request time

**root cause**:
- PR #184 added an `image_url` column to the tracks table to eliminate N+1 R2 API calls
- new tracks (uploaded after the PR) have `image_url` populated at upload time ✅
- legacy tracks (15 tracks uploaded before the PR) had `image_url = NULL` ❌
- fallback code called `track.get_image_url()` for NULL values
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
- 5 tracks × 120ms = ~600ms of uninstrumented latency

**why R2 calls weren't visible**:
- the `storage.get_url()` method had no Logfire instrumentation
- R2 API calls were happening but not creating spans
- appeared as a mysterious gap in the trace timeline

**solution implemented**:
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
2. ran the script against the production database with production R2 credentials
3. backfilled 11 tracks successfully (4 already done in a previous partial run)
4. 3 tracks "failed" but actually have non-existent images (optional, expected)
5. the script uses concurrent `asyncio.gather()` for performance (see the sketch after this list)
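
a compressed sketch of the backfill pattern (dry-run safety plus concurrent lookups); the ORM wiring is illustrative, not the script verbatim:

```python
import asyncio

async def backfill_image_urls(tracks, *, dry_run: bool = True) -> None:
    async def fill(track) -> None:
        url = await track.get_image_url()  # the R2 head_object probe
        if url is None:
            print(f"track {track.id}: no image found (expected for some rows)")
        elif dry_run:
            print(f"track {track.id}: would set image_url={url}")
        else:
            track.image_url = url  # persisted by the caller's session commit

    await asyncio.gather(*(fill(t) for t in tracks))
```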

**key learning: environment configuration matters**:
- initial script runs failed silently because:
  - the script used local `.env` credentials (dev R2 bucket)
  - production images are stored in a different R2 bucket (`images-prod`)
  - `get_url()` returned `None` when images weren't found in the dev bucket
- fix: passed production R2 credentials via environment variables:
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `R2_IMAGE_BUCKET=images-prod`
  - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`

**results**:
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
- after: 13 tracks populated with `image_url`; 3 legitimately have no images
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
- endpoint feels "really, really snappy" (user feedback)
- performance improvement visible immediately after the backfill

**database cleanup: queue_state table bloat**:
- discovered `queue_state` had 265% bloat (53 dead rows vs 20 live rows)
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
- result: 0 dead rows, table clean
- autovacuum still needs tuning for `queue_state` to prevent future bloat:
  - frequent updates make this table prone to bloat
  - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs the 20% default), as sketched below
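
the one-off tuning could look like this; the `ALTER TABLE` statement is standard postgres, the connection wiring is illustrative:

```python
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

async def tune_queue_state_autovacuum(database_url: str) -> None:
    # expects an async driver URL, e.g. postgresql+asyncpg://...
    engine = create_async_engine(database_url)
    async with engine.begin() as conn:
        # vacuum once 5% of rows are dead instead of the 20% default
        await conn.execute(
            text("ALTER TABLE queue_state SET (autovacuum_vacuum_scale_factor = 0.05)")
        )
    await engine.dispose()
```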

**endpoint performance snapshot** (post-fix, last 10 minutes):
- `GET /tracks/`: 410ms (down from 2+ seconds)
- `GET /queue/`: 399ms (down from 2+ seconds)
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
- `GET /preferences/`: 200ms median
- `GET /auth/me`: 114ms median
- `POST /tracks/{track_id}/play`: 34ms

**PR #184 context**:
- the PR claimed "opportunistic backfill: legacy records update on first access"
- but the actual implementation never saved the computed `image_url` back to the database
- fallback code only computed URLs on demand, it didn't persist them
- this is why repeated visits kept hitting the R2 API for the same tracks
- a one-time backfill script was the correct solution vs adding write logic to read endpoints

**graceful ATProto recovery (PR #180)**:
- reviewed recent work on handling tracks with missing `atproto_record_uri`
- 4 tracks in production have NULL ATProto records (expected from upload failures)
- the system already handles this gracefully:
  - like buttons disabled with helpful tooltips
  - track owners can self-service restore via the portal
  - the `restore-record` endpoint recreates records with correct TID timestamps
- no action needed - the existing recovery system is working as designed

**performance metrics pre/post all recent PRs**:
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
- today's backfill: eliminated the remaining R2 calls for legacy tracks
- combined impact: queue/tracks endpoints are now 5-10x faster than before PR #184
- all endpoints now have consistently sub-second response times

**documentation created**:
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
  - project/branch management
  - database schema inspection
  - SQL query patterns for plyr.fm
  - connection string generation
  - environment mapping (dev/staging/prod)
  - debugging workflows
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
  - dry-run mode for safety
  - concurrent R2 API calls
  - detailed error logging
  - production-tested

**tools and patterns established**:
- Neon MCP for database inspection and queries
- Logfire arbitrary queries for performance analysis
- production secret management via Fly.io
- `flyctl ssh console` for environment inspection
- backfill scripts with dry-run mode
- environment variable overrides for production operations

**system health indicators**:
- ✅ no 5xx errors in recent spans
- ✅ database queries all under 70ms p95
- ✅ SSL connection pool issues resolved (no errors in recent traces)
- ✅ queue_state table bloat eliminated
- ✅ all track images either in the DB or legitimately NULL
- ✅ application feels fast and responsive

**next steps**:
1. configure autovacuum for the `queue_state` table (prevent future bloat)
2. add Logfire instrumentation to `storage.get_url()` for visibility
3. monitor `/tracks/liked` performance over the next few days
4. consider a similar backfill pattern for any future column additions

---

### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)

**motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform.

**what shipped**:
- **moderation service** (Rust/Axum on Fly.io):
  - standalone service at `plyr-moderation.fly.dev`
  - integrates with the AuDD enterprise API for audio fingerprinting
  - scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode)
  - auth via `X-Moderation-Key` header
- **backend integration** (PR #382):
  - `ModerationSettings` in config (service URL, auth token, timeout)
  - moderation client module (`backend/_internal/moderation.py`) - see the sketch after this list
  - fire-and-forget background task on track upload
  - stores results in the `copyright_scans` table
  - scan errors are stored as "clear" so tracks aren't stuck unscanned
- **flagging fix** (PR #384):
  - the AuDD enterprise API returns no confidence scores (all 0)
  - changed from a score threshold to presence-based flagging: `is_flagged = !matches.is_empty()`
  - removed the unused `score_threshold` config
- **backfill script** (`scripts/scan_tracks_copyright.py`):
  - scans existing tracks that haven't been checked
  - `--max-duration` flag to skip long DJ sets (estimated from file size)
  - `--dry-run` mode to preview what would be scanned
  - supports dev/staging/prod environments
- **review workflow**:
  - the `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, and `review_notes` columns
  - resolution values: `violation`, `false_positive`, `original_artist`
  - SQL queries for the dashboard: flagged tracks, unreviewed flags, violations list
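
a hedged sketch of the scan call from the backend's point of view; the `/scan` route and payload shape are assumptions, only the header name and the presence-based flagging rule are documented above:

```python
import httpx

async def scan_track(audio_url: str, *, service_url: str, key: str) -> bool:
    """returns True when AuDD reports any fingerprint match."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            f"{service_url}/scan",  # assumed route on plyr-moderation.fly.dev
            headers={"X-Moderation-Key": key},
            json={"url": audio_url},
        )
    resp.raise_for_status()
    # PR #384: flag on presence of matches, not on a confidence threshold
    return bool(resp.json().get("matches"))
```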

**initial review results** (25 flagged tracks):
- 8 violations (actual copyright issues)
- 11 false positives (fingerprint noise)
- 6 original artists (people uploading their own distributed music)

**impact**:
- automated copyright detection on upload
- manual review workflow for flagged content
- protection against DMCA takedown requests
- clear audit trail with resolution status

---

### platform stats and media session integration (PRs #359-379, Nov 27-29, 2025)

**motivation**: show platform activity at a glance, improve the playback experience across devices, and give users control over their data.

**what shipped**:
- **platform stats endpoint and UI** (PRs #376, #378, #379):
  - `GET /stats` returns total plays, tracks, and artists (see the sketch after this list)
  - stats bar displays in the homepage header (e.g., "1,691 plays • 55 tracks • 8 artists")
  - skeleton loading animation while fetching
  - responsive layout: visible in the header on wide screens, collapses into the menu on narrow ones
  - end-of-list animation on the homepage
- **Media Session API** (PR #371):
  - provides track metadata to CarPlay, lock screens, Bluetooth devices, and the macOS control center
  - artwork display with fallback to the artist avatar
  - play/pause, prev/next, and seek controls all work from system UI
  - position state syncs scrubbers on external interfaces
- **browser tab title** (PR #374):
  - shows "track - artist • plyr.fm" while playing
  - persists across page navigation
  - reverts to the page title when playback stops
- **timed comments** (PR #359):
  - comments capture the playback timestamp when added during playback
  - clickable timestamp buttons seek to that moment
  - compact scrollable comments section on track pages
- **constellation integration** (PR #360):
  - queries the constellation.microcosm.blue backlink index
  - enables network-wide like counts (not just plyr.fm internal)
  - environment-aware namespace handling
- **account deletion** (PR #363):
  - explicit confirmation flow (type your handle to confirm)
  - deletes all plyr.fm data (tracks, albums, likes, comments, preferences)
  - optional ATProto record cleanup with clear warnings about orphaned references
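
a minimal sketch of what the stats endpoint could look like; the response model and fixed counts are illustrative, only the route and fields come from the notes above:

```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class PlatformStats(BaseModel):
    plays: int
    tracks: int
    artists: int

@router.get("/stats")
async def stats() -> PlatformStats:
    # the real endpoint aggregates from postgres; fixed values keep the sketch runnable
    return PlatformStats(plays=1_691, tracks=55, artists=8)
```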
341
+
342
+
**impact**:
343
+
- platform stats give visitors immediate sense of activity
344
+
- media session makes plyr.fm tracks controllable from car/lock screen/control center
345
+
- timed comments enable discussion at specific moments in tracks
346
+
- account deletion gives users full control over their data
347
+
348
+
---
349
+
350
+

### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025)

**motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't become stale when browser sessions refresh.

**what shipped**:
- **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow
  - user clicks "create token" → is redirected to their PDS for authorization → token created with independent credentials
  - tokens have their own DPoP keypair and access/refresh tokens - completely separate from the browser session
- **cookie isolation**: the dev token exchange doesn't set a browser cookie
  - added an `is_dev_token` flag to the ExchangeToken model
  - `/auth/exchange` skips Set-Cookie for dev token flows
  - prevents logout from deleting dev tokens (a critical bug fixed during implementation)
- **token management UI**: portal → "your data" → "developer tokens"
  - create with optional name and expiration (30/90/180/365 days or never)
  - list active tokens with creation/expiration dates
  - revoke individual tokens
- **API endpoints** (see the sketch after this list):
  - `POST /auth/developer-token/start` - initiates the OAuth flow, returns an auth_url
  - `GET /auth/developer-tokens` - list the user's tokens
  - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix
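
a sketch of kicking off the token flow via the endpoints above; the cookie name and response handling are assumptions:

```python
import httpx

API = "https://api.plyr.fm"

async def start_dev_token_flow(session_cookie: str) -> str:
    """returns the PDS authorization URL the user must visit to approve the token."""
    async with httpx.AsyncClient(cookies={"session": session_cookie}) as client:  # cookie name assumed
        resp = await client.post(f"{API}/auth/developer-token/start")
        resp.raise_for_status()
        return resp.json()["auth_url"]
```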

**security properties**:
- tokens are full sessions with encrypted OAuth credentials (Fernet)
- each token refreshes independently (no staleness from browser session refresh)
- revocable individually without affecting the browser or other tokens
- explicit OAuth consent required at the PDS for each token created

**testing verified**:
- created a token → uploaded a track → logged out → deleted the track with the token ✓
- browser logout doesn't affect dev tokens ✓
- tokens work across browser sessions ✓
- staging deployment tested end-to-end ✓

**documentation**: see the "developer tokens" section of `docs/authentication.md`

---

### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025)

**motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player.

**what shipped**:
- **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with an iframe (see the sketch after this list)
  - follows the oEmbed spec with `type: "rich"` and an iframe in the `html` field
  - discovery link in the track page `<head>` for automatic detection
- **iframely domain registration**: registered plyr.fm on iframely.com (free tier)
  - this was the key fix - iframely now returns our embed iframe as `links.player[0]`
  - API key: stored in 1password (iframely account)
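
a minimal sketch of an oEmbed response in the shape described above; the iframe markup and dimensions are illustrative:

```python
from fastapi import APIRouter

router = APIRouter()

@router.get("/oembed")
async def oembed(url: str) -> dict:
    track_id = url.rstrip("/").split("/")[-1]
    embed_url = f"https://plyr.fm/embed/track/{track_id}"
    return {
        "version": "1.0",
        "type": "rich",  # per the oEmbed spec, the iframe goes in the html field
        "html": f'<iframe src="{embed_url}" width="400" height="200"></iframe>',
        "width": 400,
        "height": 200,
    }
```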

**debugging journey** (PRs #356-358):
- initially tried `og:video` meta tags to hint at an iframe embed - didn't work
- tried removing `og:audio` to force the oEmbed fallback - resulted in no player link at all
- discovered iframely requires domain registration before it trusts oEmbed providers
- after registration, iframely correctly returns the embed iframe URL

**current state**:
- oEmbed endpoint working: `curl https://api.plyr.fm/oembed?url=https://plyr.fm/track/92`
- iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed)
- Leaflet.pub should show proper embeds (pending their cache expiry)

**impact**:
- plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services
- proper embed player with cover art instead of raw HTML5 audio

---

### export & upload reliability (PRs #337-344, Nov 24, 2025)

**motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts.

**what shipped**:
- **database-backed jobs** (PR #337): moved upload/export tracking from in-memory state to postgres
  - a jobs table persists state across server restarts
  - enables reliable progress tracking via SSE polling
- **streaming exports** (PR #343): fixed OOM on large file exports (see the sketch after this list)
  - previously loaded entire files into memory via `response["Body"].read()`
  - now streams to temp files and adds them to the zip from disk (constant memory)
  - 90-minute WAV files now export successfully on a 1GB VM
- **progress tracking fix** (PR #340): upload progress was receiving bytes but treating them as a percentage
  - `UploadProgressTracker` now properly converts bytes to a percentage
  - the upload progress bar works correctly again
- **UX improvements** (PRs #338-339, #341-342, #344):
  - export filename now includes the date (`plyr-tracks-2025-11-24.zip`)
  - toast notification on track deletion
  - fixed the false "lost connection" error when SSE completes normally
  - progress now shows "downloading track X of Y" instead of a confusing count
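
a sketch of the constant-memory export pattern from PR #343 (stream each object to a temp file, then add it to the zip from disk); the bucket/key wiring is illustrative:

```python
import tempfile
import zipfile

import aioboto3

CHUNK = 8 * 1024 * 1024

async def export_tracks(bucket: str, keys: list[str], zip_path: str) -> None:
    session = aioboto3.Session()
    async with session.client("s3") as s3:
        with zipfile.ZipFile(zip_path, "w") as zf:
            for key in keys:
                obj = await s3.get_object(Bucket=bucket, Key=key)
                with tempfile.NamedTemporaryFile() as tmp:
                    # chunked reads instead of response["Body"].read()
                    while chunk := await obj["Body"].read(CHUNK):
                        tmp.write(chunk)
                    tmp.flush()
                    zf.write(tmp.name, arcname=key)
```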

**impact**:
- exports work for arbitrarily large files (limited by disk, not RAM)
- upload progress displays correctly
- job state survives server restarts
- clearer progress messaging during exports

---

STATUS.md

···

- htmx endpoints: `/admin/flags-html`, `/admin/resolve-htmx`
- server-rendered HTML partials for flag cards

## immediate priorities

···

- fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts
- documented in `docs/logfire-querying.md`

### performance optimizations
3. **persist concrete file extensions in the database**: currently brute-force probing all supported formats on read
   - we already know `Track.file_type` and the image format during upload
   - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam
   - improves audio streaming latency (the `/audio/{file_id}` endpoint walks extensions sequentially)

4. **stream large uploads directly to storage**: the current implementation reads the entire file into memory before the background task
   - multi-GB uploads risk OOM
   - stream from `UploadFile.file` → storage backend for constant memory usage (see the sketch below)
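
a sketch under the assumption that aioboto3's async `upload_fileobj` handles the multipart streaming; bucket/key names are illustrative:

```python
import aioboto3
from fastapi import UploadFile

async def stream_to_storage(upload: UploadFile, bucket: str, key: str) -> None:
    session = aioboto3.Session()
    async with session.client("s3") as s3:
        # streams the file-like object in parts: constant memory even for multi-GB files
        await s3.upload_fileobj(upload.file, bucket, key)
```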

### new features
5. **content-addressable storage** (issue #146)
   - hash-based file storage for automatic deduplication
   - reduces storage costs when multiple artists upload the same file
   - enables content verification

6. **liked tracks feature** (issue #144): design the schema and ATProto record format
   - server-side persistent collections
   - ATProto record publication for cross-platform visibility
   - UI for adding/removing tracks from the liked collection

## open issues by timeline

### immediate
- issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3)
- issue #147: upload reliability bug (data loss risk)
- issue #144: likes feature for personal collections

### short-term
- issue #146: content-addressable storage (hash-based deduplication)
- issue #24: implement play count abuse prevention
- database connection pool tuning (SSL errors)
- file extension persistence in the database

### medium-term
- issue #39: postmortem - cross-domain auth deployment and remaining security TODOs
- issue #46: consider removing init_db() from lifespan in favor of a migration-only approach
- issue #56: design a public developer API and versioning
- issue #57: support multiple audio item types (voice memos/snippets)
- issue #122: fullscreen player for immersive playback

### long-term
- migrate to a plyr-owned lexicon (custom ATProto namespace with richer metadata)
- publish to multiple ATProto AppViews for cross-platform visibility
- explore ATProto-native notifications (replace the Bluesky DM bot)
- realtime queue syncing across devices via SSE/WebSocket
- artist analytics dashboard improvements
- issue #44: modern music streaming feature parity

## technical state

### architecture

**backend**
- language: Python 3.11+
- framework: FastAPI with uvicorn
- database: Neon PostgreSQL (serverless, fully managed)
- storage: Cloudflare R2 (S3-compatible object storage)
- hosting: Fly.io (2x shared-cpu VMs, auto-scaling)
- observability: Pydantic Logfire (traces, metrics, logs)
- auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto)

**frontend**
- framework: SvelteKit (v2.43.2)
- runtime: Bun (fast JS runtime)
- hosting: Cloudflare Pages (edge network)
- styling: vanilla CSS with a lowercase aesthetic
- state management: Svelte 5 runes ($state, $derived, $effect)

**deployment**
- ci/cd: GitHub Actions
- backend: automatic on main branch merge (fly.io deploy)
- frontend: automatic on every push to main (cloudflare pages)
- migrations: automated via fly.io release_command
- environments: dev → staging → production (full separation)
- versioning: nebula timestamp format (YYYY.MMDD.HHMMSS)

**key dependencies**
- atproto: forked SDK for OAuth and record management
- sqlalchemy: async ORM for postgres
- alembic: database migrations
- boto3/aioboto3: R2 storage client
- logfire: observability (FastAPI + SQLAlchemy instrumentation)
- httpx: async HTTP client

**what's working**

**core functionality**
- ✅ ATProto OAuth 2.1 authentication with encrypted state

···

- ✅ cross-tab queue synchronization via BroadcastChannel
- ✅ share tracks via URL with Open Graph previews (including cover art)
- ✅ image URL caching in the database (eliminates N+1 R2 calls)
- ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
- ✅ standalone audio transcoding service deployed and verified (see issue #153)
- ✅ Bluesky embed player UI changes implemented (pending upstream social-app PR)
- ✅ admin content moderation script for removing inappropriate uploads
- ✅ copyright moderation system (AuDD fingerprinting, review workflow, violation tracking)
- ✅ ATProto labeler for copyright violations (queryLabels, subscribeLabels XRPC endpoints)

···

- ✅ long album title handling (100-char slugs, CSS truncation)
- ⏸ ATProto records for albums (deferred, see issue #221)

**frontend architecture**
- ✅ server-side data loading (`+page.server.ts`) for artist and album pages
- ✅ client-side data loading (`+page.ts`) for auth-dependent pages
- ✅ centralized auth manager (`lib/auth.svelte.ts`)
- ✅ layout-level auth state (`+layout.ts`) shared across all pages
- ✅ eliminated the "flash of loading" via proper load functions
- ✅ consistent auth patterns (no scattered localStorage calls)

**deployment (fully automated)**
- **production**:
  - frontend: https://plyr.fm (cloudflare pages)

···

  - storage: cloudflare R2 (audio-stg bucket)
  - deploy: push to main → automatic

- **development**:
  - backend: localhost:8000
  - frontend: localhost:5173
  - database: neon postgresql (relay-dev)
  - storage: cloudflare R2 (audio-dev and images-dev buckets)

- **developer tooling**:
  - `just serve` - run backend locally
  - `just dev` - run frontend locally
  - `just test` - run test suite
  - `just release` - create production release (backend + frontend)
  - `just release-frontend-only` - deploy only frontend changes (added Nov 13)

### what's in progress

**immediate work**
- investigating playback auto-start behavior (#225)
  - page refresh sometimes starts playing immediately
  - may be related to queue state restoration or localStorage caching
  - `autoplay_next` preference not being respected in all cases
- liquid glass effects as a user-configurable setting (#186)

**active research**
- transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
- content moderation systems (#166, #167, #393 - takedown state representation)
- PWA capabilities and offline support (#165)

### known issues

**player behavior**

···

- no fullscreen player view (#122)
- no public API for third-party integrations (#56)

**technical debt**
- multi-tab playback synchronization could be more robust
- queue state conflicts can occur with rapid operations

### technical decisions

**why Python/FastAPI instead of Rust?**
- rapid prototyping velocity during the MVP phase
- rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
- excellent async support with asyncio
- lower barrier to contribution
- trade-off: accepting higher latency for faster development
- future: can migrate hot paths to Rust if needed (the transcoding service is already planned)

**why Fly.io instead of AWS/GCP?**
- simple deployment model (dockerfile → production)
- automatic SSL/TLS certificates
- built-in global load balancing
- reasonable pricing for an MVP ($5/month)
- easy migration path to larger providers later
- trade-off: vendor-specific features, less control

**why Cloudflare R2 instead of S3?**
- zero egress fees (critical for audio streaming)
- S3-compatible API (easy migration if needed)
- integrated CDN for fast delivery
- significantly cheaper than S3 for bandwidth-heavy workloads

**why a forked atproto SDK?**
- the upstream SDK lacked OAuth 2.1 support
- needed custom record management patterns
- maintains compatibility with the ATProto spec
- contributes improvements back when possible

**why SvelteKit instead of React/Next.js?**
- Svelte 5 runes provide an excellent reactivity model
- smaller bundle sizes (critical for mobile)
- less boilerplate than React
- SSR + static generation flexibility
- modern DX with TypeScript

**why Neon instead of self-hosted Postgres?**
- serverless autoscaling (no capacity planning)
- branch-per-PR workflow (preview databases)
- automatic backups and point-in-time recovery
- generous free tier for the MVP
- trade-off: higher latency than a co-located DB, but acceptable

**why reject AIFF instead of transcoding immediately?**
- MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
- user communication: better to be upfront about limitations than to fail silently
- resource management: transcoding is CPU-intensive and needs a proper worker architecture
- future flexibility: can add transcoding as an optional feature (high-quality uploads → MP3 delivery)
- trade-off: some users can't upload AIFF now, but those who upload MP3 get a working experience

**why async everywhere?**
- event loop performance: single-threaded async handles high concurrency
- I/O-bound workload: most time is spent waiting on network/disk
- recent work (PRs #149-151) eliminated all blocking operations
- alternative: thread pools for blocking I/O, but that increases complexity
- trade-off: async code is harder to debug than sync, but worth the throughput gains

**why anyio.Path over hand-rolled thread pools?**
- clean async API: `anyio` exposes file operations behind an async interface (offloading to worker threads under the hood where the OS lacks async file I/O)
- constant memory: chunked reads/writes (64KB) prevent OOM on large files
- hand-rolled executor calls would work but are less ergonomic, with more boilerplate
- trade-off: the anyio API differs slightly from stdlib `pathlib`, but gives cleaner async semantics (see the sketch below)
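
an illustrative chunked copy in that style; the paths are hypothetical, the anyio calls are real API:

```python
import anyio

CHUNK = 64 * 1024  # 64KB chunks keep memory constant regardless of file size

async def copy_file(src: str, dst: str) -> None:
    async with await anyio.open_file(src, "rb") as fin:
        async with await anyio.open_file(dst, "wb") as fout:
            while chunk := await fin.read(CHUNK):
                await fout.write(chunk)
```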

## cost structure

···

- storage used: <1GB R2
- database size: <10MB postgres

## next session prep

**context for new agent:**
1. Fixed the R2 image upload path mismatch, ensuring images save with the correct prefix.
2. Implemented UI changes for the embed player: removed the Queue button and matched fonts to the main app.
3. Opened a draft PR to the upstream social-app repository for native plyr.fm embed support.
4. Updated issue #153 (transcoding pipeline) with a clear roadmap for integration into the backend.
5. Developed a local verification script for the transcoder service for faster iteration.

**useful commands:**
- `just backend run` - run backend locally
- `just frontend dev` - run frontend locally
- `just test` - run test suite (from the `backend/` directory)
- `gh issue list` - check open issues

## admin tooling

### content moderation
script: `scripts/delete_track.py`
- requires `ADMIN_*` prefixed environment variables
- deletes the audio file from R2
- deletes the cover image from R2 (if it exists)
- deletes the database record (cascades to likes and queue entries)
- notes ATProto records for manual cleanup (we can't delete them from other users' PDS)

usage:
```bash
# dry run
uv run scripts/delete_track.py <track_id> --dry-run

# delete with confirmation
uv run scripts/delete_track.py <track_id>

# delete without confirmation
uv run scripts/delete_track.py <track_id> --yes

# by URL
uv run scripts/delete_track.py --url https://plyr.fm/track/34
```

required environment variables:
- `ADMIN_DATABASE_URL` - production database connection
- `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
- `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
- `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
- `ADMIN_R2_BUCKET` - R2 bucket name

## known issues

### non-blocking
- cloudflare pages preview URLs return 404 (production works fine)
- some "relay" references remain in docs and comments
- ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)

## for new contributors

### getting started
1. clone: `gh repo clone zzstoatzz/plyr.fm`
2. install dependencies: `uv sync && cd frontend && bun install`
3. run backend: `uv run uvicorn backend.main:app --reload`
4. run frontend: `cd frontend && bun run dev`
5. visit http://localhost:5173

### development workflow
1. create an issue on github
2. create a PR from a feature branch
3. ensure pre-commit hooks pass
4. test locally
5. merge to main → deploys to staging automatically
6. verify on staging
7. create a github release → deploys to production automatically

### key principles
- type hints everywhere
- lowercase aesthetic
- generic terminology (use "items" not "tracks" where appropriate)
- ATProto first
- mobile matters
- cost conscious
- async everywhere (no blocking I/O)

### project structure
```
plyr.fm/
├── backend/          # FastAPI app & Python tooling
│   ├── src/backend/  # application code
│   │   ├── api/        # public endpoints
│   │   ├── _internal/  # internal services
│   │   ├── models/     # database schemas
│   │   └── storage/    # storage adapters
│   ├── tests/        # pytest suite
│   └── alembic/      # database migrations
├── frontend/         # SvelteKit app
│   ├── src/lib/      # components & state
│   └── src/routes/   # pages
├── moderation/       # Rust moderation service (ATProto labeler)
│   ├── src/          # Axum handlers, AuDD client, label signing
│   └── static/       # admin UI (html/css/js)
├── transcoder/       # Rust audio transcoding service
├── docs/             # documentation
└── justfile          # task runner (mods: backend, frontend, moderation, transcoder)
```

## documentation

- [deployment overview](docs/deployment/overview.md)
- [configuration guide](docs/configuration.md)
- [queue design](docs/queue-design.md)
- [logfire querying](docs/logfire-querying.md)
- [pdsx guide](docs/pdsx-guide.md)
- [neon mcp guide](docs/neon-mcp-guide.md)
## performance optimization session (Nov 12, 2025)
702
-
703
-
### issue: slow /tracks/liked endpoint
704
-
705
-
**symptoms**:
706
-
- `/tracks/liked` taking 600-900ms consistently
707
-
- only ~25ms spent in database queries
708
-
- mysterious 575ms gap with no spans in Logfire traces
709
-
- endpoint felt sluggish compared to other pages
710
-
711
-
**investigation**:
712
-
- examined Logfire traces for `/tracks/liked` requests
713
-
- found 5-6 liked tracks being returned per request
714
-
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
715
-
- noticed R2 storage calls weren't appearing in traces despite taking majority of request time
716
-
717
-
**root cause**:
718
-
- PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls
719
-
- new tracks (uploaded after PR) have `image_url` populated at upload time ✅
720
-
- legacy tracks (15 tracks uploaded before PR) had `image_url = NULL` ❌
721
-
- fallback code called `track.get_image_url()` for NULL values
722
-
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
723
-
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
724
-
- 5 tracks × 120ms = ~600ms of uninstrumented latency
725
-
726
-
**why R2 calls weren't visible**:
727
-
- `storage.get_url()` method had no Logfire instrumentation
728
-
- R2 API calls happening but not creating spans
729
-
- appeared as mysterious gap in trace timeline
730
-
731
-
**solution implemented**:
732
-
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
733
-
2. ran script against production database with production R2 credentials
734
-
3. backfilled 11 tracks successfully (4 already done in previous partial run)
735
-
4. 3 tracks "failed" but actually have non-existent images (optional, expected)
736
-
5. script uses concurrent `asyncio.gather()` for performance
737
-
738
-
**key learning: environment configuration matters**:
739
-
- initial script runs failed silently because:
740
-
- script used local `.env` credentials (dev R2 bucket)
741
-
- production images stored in different R2 bucket (`images-prod`)
742
-
- `get_url()` returned `None` when images not found in dev bucket
743
-
- fix: passed production R2 credentials via environment variables:
744
-
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
745
-
- `R2_IMAGE_BUCKET=images-prod`
746
-
- `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`
747
-
748
-
**results**:
749
-
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
750
-
- after: 13 tracks populated with `image_url`, 3 legitimately have no images
751
-
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
752
-
- endpoint feels "really, really snappy" (user feedback)
753
-
- performance improvement visible immediately after backfill
754
-
755
-
**database cleanup: queue_state table bloat**:
756
-
- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
757
-
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
758
-
- result: 0 dead rows, table clean
759
-
- configured autovacuum for queue_state to prevent future bloat:
760
-
- frequent updates to this table make it prone to bloat
761
-
- should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%)
762
-
763
-
**endpoint performance snapshot** (post-fix, last 10 minutes):
764
-
- `GET /tracks/`: 410ms (down from 2+ seconds)
765
-
- `GET /queue/`: 399ms (down from 2+ seconds)
766
-
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
767
-
- `GET /preferences/`: 200ms median
768
-
- `GET /auth/me`: 114ms median
769
-
- `POST /tracks/{track_id}/play`: 34ms
770
-
771
-
**PR #184 context**:
772
-
- PR claimed "opportunistic backfill: legacy records update on first access"
773
-
- but actual implementation never saved computed `image_url` back to database
774
-
- fallback code only computed URLs on-demand, didn't persist them
775
-
- this is why repeated visits kept hitting R2 API for same tracks
776
-
- one-time backfill script was correct solution vs adding write logic to read endpoints
777
-
778
-
**graceful ATProto recovery (PR #180)**:
779
-
- reviewed recent work on handling tracks with missing `atproto_record_uri`
780
-
- 4 tracks in production have NULL ATProto records (expected from upload failures)
781
-
- system already handles this gracefully:
782
-
- like buttons disabled with helpful tooltips
783
-
- track owners can self-service restore via portal
784
-
- `restore-record` endpoint recreates with correct TID timestamps
785
-
- no action needed - existing recovery system working as designed
786
-
787
-
**performance metrics pre/post all recent PRs**:
788
-
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
789
-
- today's backfill: eliminated remaining R2 calls for legacy tracks
790
-
- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
791
-
- all endpoints now consistently sub-second response times
792
-
793
-
**documentation created**:
794
-
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
795
-
- project/branch management
796
-
- database schema inspection
797
-
- SQL query patterns for plyr.fm
798
-
- connection string generation
799
-
- environment mapping (dev/staging/prod)
800
-
- debugging workflows
801
-
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
802
-
- dry-run mode for safety
803
-
- concurrent R2 API calls
804
-
- detailed error logging
805
-
- production-tested
806
-
807
-
**tools and patterns established**:
808
-
- Neon MCP for database inspection and queries
809
-
- Logfire arbitrary queries for performance analysis
810
-
- production secret management via Fly.io
811
-
- `flyctl ssh console` for environment inspection
812
-
- backfill scripts with dry-run mode
813
-
- environment variable overrides for production operations
814
-
815
-
**system health indicators**:
816
-
- ✅ no 5xx errors in recent spans
817
-
- ✅ database queries all under 70ms p95
818
-
- ✅ SSL connection pool issues resolved (no errors in recent traces)
819
-
- ✅ queue_state table bloat eliminated
820
-
- ✅ all track images either in DB or legitimately NULL
821
-
- ✅ application feels fast and responsive
822
-
823
-
**next steps**:
824
-
1. configure autovacuum for `queue_state` table (prevent future bloat)
825
-
2. add Logfire instrumentation to `storage.get_url()` for visibility
826
-
3. monitor `/tracks/liked` performance over next few days
827
-
4. consider adding similar backfill pattern for any future column additions
828
-
829
---
830
831
-
### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)
832
-
833
-
**motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform.
834
-
835
-
**what shipped**:
836
-
- **moderation service** (Rust/Axum on Fly.io):
837
-
- standalone service at `plyr-moderation.fly.dev`
838
-
- integrates with AuDD enterprise API for audio fingerprinting
839
-
- scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode)
840
-
- auth via `X-Moderation-Key` header
841
-
- **backend integration** (PR #382):
842
-
- `ModerationSettings` in config (service URL, auth token, timeout)
843
-
- moderation client module (`backend/_internal/moderation.py`)
844
-
- fire-and-forget background task on track upload
845
-
- stores results in `copyright_scans` table
846
-
- scan errors stored as "clear" so tracks aren't stuck unscanned
847
-
- **flagging fix** (PR #384):
848
-
- AuDD enterprise API returns no confidence scores (all 0)
849
-
- changed from score threshold to presence-based flagging: `is_flagged = !matches.is_empty()`
850
-
- removed unused `score_threshold` config
851
-
- **backfill script** (`scripts/scan_tracks_copyright.py`):
852
-
- scans existing tracks that haven't been checked
853
-
- `--max-duration` flag to skip long DJ sets (estimated from file size)
854
-
- `--dry-run` mode to preview what would be scanned
855
-
- supports dev/staging/prod environments
856
-
- **review workflow**:
857
-
- `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, `review_notes` columns
858
-
- resolution values: `violation`, `false_positive`, `original_artist`
859
-
- SQL queries for dashboard: flagged tracks, unreviewed flags, violations list
860
-
861
-
**initial review results** (25 flagged tracks):
862
-
- 8 violations (actual copyright issues)
863
-
- 11 false positives (fingerprint noise)
864
-
- 6 original artists (people uploading their own distributed music)
865
-
866
-
**impact**:
867
-
- automated copyright detection on upload
868
-
- manual review workflow for flagged content
869
-
- protection against DMCA takedown requests
870
-
- clear audit trail with resolution status
871
-
872
-
---
873
-
874
-
### platform stats and media session integration (PRs #359-379, Nov 27-29, 2025)
875
-
876
-
**motivation**: show platform activity at a glance, improve playback experience across devices, and give users control over their data.
877
-
878
-
**what shipped**:
879
-
- **platform stats endpoint and UI** (PRs #376, #378, #379):
880
-
- `GET /stats` returns total plays, tracks, and artists
881
-
- stats bar displays in homepage header (e.g., "1,691 plays • 55 tracks • 8 artists")
882
-
- skeleton loading animation while fetching
883
-
- responsive layout: visible in header on wide screens, collapses to menu on narrow
884
-
- end-of-list animation on homepage
885
-
- **Media Session API** (PR #371):
886
-
- provides track metadata to CarPlay, lock screens, Bluetooth devices, macOS control center
887
-
- artwork display with fallback to artist avatar
888
-
- play/pause, prev/next, seek controls all work from system UI
889
-
- position state syncs scrubbers on external interfaces
890
-
- **browser tab title** (PR #374):
891
-
- shows "track - artist • plyr.fm" while playing
892
-
- persists across page navigation
893
-
- reverts to page title when playback stops
894
-
- **timed comments** (PR #359):
895
-
- comments capture timestamp when added during playback
896
-
- clickable timestamp buttons seek to that moment
897
-
- compact scrollable comments section on track pages
898
-
- **constellation integration** (PR #360):
899
-
- queries constellation.microcosm.blue backlink index
900
-
- enables network-wide like counts (not just plyr.fm internal)
901
-
- environment-aware namespace handling
902
-
- **account deletion** (PR #363):
903
-
- explicit confirmation flow (type handle to confirm)
904
-
- deletes all plyr.fm data (tracks, albums, likes, comments, preferences)
905
-
- optional ATProto record cleanup with clear warnings about orphaned references
906
-
907
-
**impact**:
908
-
- platform stats give visitors immediate sense of activity
909
-
- media session makes plyr.fm tracks controllable from car/lock screen/control center
910
-
- timed comments enable discussion at specific moments in tracks
911
-
- account deletion gives users full control over their data
912
-
913
-
---
914
-
915
-
### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025)

**motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't become stale when browser sessions refresh.

**what shipped**:
- **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow
  - user clicks "create token" → redirected to PDS for authorization → token created with independent credentials
  - tokens have their own DPoP keypair and access/refresh tokens - completely separate from browser session
- **cookie isolation**: dev token exchange doesn't set browser cookie
  - added `is_dev_token` flag to ExchangeToken model
  - `/auth/exchange` skips Set-Cookie for dev token flows
  - prevents logout from deleting dev tokens (critical bug fixed during implementation)
- **token management UI**: portal → "your data" → "developer tokens"
  - create with optional name and expiration (30/90/180/365 days or never)
  - list active tokens with creation/expiration dates
  - revoke individual tokens
- **API endpoints** (usage sketch after this list):
  - `POST /auth/developer-token/start` - initiates OAuth flow, returns auth_url
  - `GET /auth/developer-tokens` - list user's tokens
  - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix
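
a sketch of a script driving the endpoints above; only the paths and the auth_url response field come from this doc - the bearer auth header, the JSON request fields, and the token `prefix` response field are assumptions:

```python
# hypothetical client for the developer-token endpoints; auth scheme and
# payload field names (other than auth_url) are assumed, not documented.
import httpx

API = "https://api.plyr.fm"
SESSION_TOKEN = "..."  # an existing authenticated session

with httpx.Client(
    base_url=API, headers={"Authorization": f"Bearer {SESSION_TOKEN}"}
) as client:
    # start the OAuth flow for a new token; the user completes consent at their PDS
    start = client.post(
        "/auth/developer-token/start",
        json={"name": "backup-script", "expires_in_days": 90},
    )
    print("authorize at:", start.json()["auth_url"])

    # later: list active tokens, then revoke one by its 8-char prefix
    tokens = client.get("/auth/developer-tokens").json()
    client.delete(f"/auth/developer-tokens/{tokens[0]['prefix']}")
```
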
**security properties**:
- tokens are full sessions with encrypted OAuth credentials (Fernet; sketch below)
- each token refreshes independently (no staleness from browser session refresh)
- revocable individually without affecting browser or other tokens
- explicit OAuth consent required at PDS for each token created
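
the Fernet encryption of stored OAuth credentials looks roughly like this; key management and the payload layout are assumptions, only the use of Fernet comes from the doc:

```python
# illustrative use of Fernet from the cryptography package; in production
# the key comes from config/secrets, and the payload shape is assumed here.
from cryptography.fernet import Fernet

key = Fernet.generate_key()
f = Fernet(key)

# encrypt the token's OAuth credentials before persisting them
ciphertext = f.encrypt(b'{"access_token": "...", "refresh_token": "..."}')

# decrypt on use; raises InvalidToken if the key is wrong or data was tampered with
credentials = f.decrypt(ciphertext)
```
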
**testing verified**:
- created token → uploaded track → logged out → deleted track with token ✓
- browser logout doesn't affect dev tokens ✓
- token works across browser sessions ✓
- staging deployment tested end-to-end ✓

**documentation**: see `docs/authentication.md` "developer tokens" section

---

### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025)

**motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player.

**what shipped**:
- **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with iframe (example payload after this list)
  - follows oEmbed spec with `type: "rich"` and iframe in `html` field
  - discovery link in track page `<head>` for automatic detection
- **iframely domain registration**: registered plyr.fm on iframely.com (free tier)
  - this was the key fix - iframely now returns our embed iframe as `links.player[0]`
  - API key: stored in 1password (iframely account)
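
roughly what `/oembed` returns for a track URL, per the oEmbed spec's rich type; the embed URL pattern matches the track 92 example below, but dimensions, title, and exact iframe markup are illustrative:

```python
# illustrative /oembed response; version/type/html/width/height are required
# by the oEmbed spec for type "rich", the concrete values here are made up.
{
    "version": "1.0",
    "type": "rich",
    "provider_name": "plyr.fm",
    "provider_url": "https://plyr.fm",
    "title": "track title - artist",
    "html": (
        '<iframe src="https://plyr.fm/embed/track/92"'
        ' width="400" height="200" frameborder="0"></iframe>'
    ),
    "width": 400,
    "height": 200,
}
```
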
**debugging journey** (PRs #356-358):
- initially tried `og:video` meta tags to hint iframe embed - didn't work
- tried removing `og:audio` to force oEmbed fallback - resulted in no player link
- discovered iframely requires domain registration to trust oEmbed providers
- after registration, iframely correctly returns embed iframe URL

**current state**:
- oEmbed endpoint working: `curl https://api.plyr.fm/oembed?url=https://plyr.fm/track/92`
- iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed)
- Leaflet.pub should show proper embeds (pending their cache expiry)

**impact**:
- plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services
- proper embed player with cover art instead of raw HTML5 audio

---

### export & upload reliability (PRs #337-344, Nov 24, 2025)

**motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts.

**what shipped**:
- **database-backed jobs** (PR #337): moved upload/export tracking from in-memory to postgres
  - jobs table persists state across server restarts
  - enables reliable progress tracking via SSE polling
- **streaming exports** (PR #343): fixed OOM on large file exports (sketch after this list)
  - previously loaded entire files into memory via `response["Body"].read()`
  - now streams to temp files, adds to zip from disk (constant memory)
  - 90-minute WAV files now export successfully on 1GB VM
- **progress tracking fix** (PR #340): upload progress was receiving bytes but treating them as a percentage
  - `UploadProgressTracker` now properly converts bytes to percentage
  - upload progress bar works correctly again
- **UX improvements** (PRs #338-339, #341-342, #344):
  - export filename now includes date (`plyr-tracks-2025-11-24.zip`)
  - toast notification on track deletion
  - fixed false "lost connection" error when SSE completes normally
  - progress now shows "downloading track X of Y" instead of a confusing count
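
a sketch of the streaming export path, assuming boto3's S3-compatible client pointed at R2; the bucket name, endpoint, and helper signature are placeholders. each object streams to a temp file in chunks, then is added to the zip from disk, so memory stays constant regardless of file size:

```python
# hypothetical streaming export: chunked download to disk, then zip from disk.
import tempfile
import zipfile

import boto3

s3 = boto3.client("s3", endpoint_url="https://<account>.r2.cloudflarestorage.com")


def export_tracks(keys: list[str], zip_path: str, bucket: str = "audio") -> None:
    with zipfile.ZipFile(zip_path, "w") as zf:
        for key in keys:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"]
            with tempfile.NamedTemporaryFile() as tmp:
                # iterate the botocore stream instead of .read()-ing the whole file
                for chunk in body.iter_chunks(chunk_size=8 * 1024 * 1024):
                    tmp.write(chunk)
                tmp.flush()
                zf.write(tmp.name, arcname=key)
```
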
**impact**:
- exports work for arbitrarily large files (limited by disk, not RAM)
- upload progress displays correctly
- job state survives server restarts
- clearer progress messaging during exports

---

this is a living document. last updated 2025-12-01 after ATProto labeler work.