.github/workflows/status-maintenance.yml (+10 −4)
```diff
···
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          claude_args: |
-           --allowedTools "Read,Write,Edit,Bash"
+           --allowedTools "Read,Write,Edit,Bash,Fetch"
          prompt: |
            you are maintaining the plyr.fm (pronounce as "player FM") project status file.
···
            before writing any transcript, understand the timeline:
            1. run `git log --oneline --since="1 week ago"` to see recent commits
-           2. run `git log --oneline -20` to see the last 20 commits with dates
+           2. run `git log --oneline -30` to see the last 30 commits with dates
            3. note the actual dates of changes - don't present old work as "just shipped"

            ## task 2: archive old sections (if needed)
···
            if skip_audio is false:
            1. write a 2-3 minute podcast script to podcast_script.txt
               - two hosts having a casual conversation
-              - focus on shipped features from the top of STATUS.md
+              - host personalities should be inspired by Gilfoyle and Dinesh from Silicon Valley
+              - focus on recently shipped features from the git history (except for the first episode)
               - format: "Host: ..." and "Cohost: ..." lines
               - IMPORTANT: "plyr.fm" is pronounced "player FM" (not "plir" or spelling it out)
+              - do not over-sensationalize / over-compliment the project's significance / achievements / progress

            temporal awareness:
            - use the git history to understand WHEN things actually shipped
            - if this is the first episode, acknowledge the project started in november 2025
-           - reference time correctly: "last week we shipped X" vs "back in november we built Y"
+           - reference time correctly: "last week they shipped X" vs "back in november they built Y"
            - don't present month-old work as if it just happened

            tone guidelines:
···
            - use intuitive analogies to explain technical concepts in terms of everyday experience
            - matter-of-fact delivery, not hype-y or marketing-speak
            - brief, conversational - like two friends catching up on what shipped
+
+           read upstream documentation:
+           - docs/**.md contains a lot of useful information
+           - you can Fetch atproto.com to understand primitives that are relevant to the project

            2. run: uv run scripts/generate_tts.py podcast_script.txt update.wav
```
.status_history/2025-11.md (−878)
···

### detailed history

### Queue hydration + ATProto token hardening (Nov 12, 2025)

**Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401 when multiple requests refreshed an expired ATProto token simultaneously.

**What shipped:**
- Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2 for every track. Queue payloads now pull art directly from Postgres, with a one-time fallback for legacy rows.
- Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead of per-request GETs.
- Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This removes the race that caused the batch restore flow to intermittently 500/401.

**Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto restore flows are now reliable under concurrent use, and Logfire no longer shows 500s from the PDS.

### Liked tracks feature (PR #157, Nov 11, 2025)

- ✅ server-side persistent collections
- ✅ ATProto record publication for cross-platform visibility
- ✅ UI for adding/removing tracks from liked collection
- ✅ like counts displayed in track responses and analytics (#170)
- ✅ analytics cards now clickable links to track detail pages (#171)
- ✅ liked state shown on artist page tracks (#163)

### Upload streaming + progress UX (PR #182, Nov 11, 2025)

- Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress toasts (critical for >50 MB mixes on mobile).
- Upload form now clears only after the request succeeds; failed attempts leave the form intact so users don't lose metadata.
- Backend writes uploads/images to temp files in 8 MB chunks before handing them to the storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes.
- Deployment verified locally and by rerunning the exact repro Stella hit (85-minute mix from mobile).
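
The chunked temp-file spooling above can be sketched like this — a fixed-size read loop so the whole upload is never held in memory. The helper name is illustrative; the real handler works on FastAPI's `UploadFile`, not a raw stream.

```python
import io
import tempfile

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, matching the behavior described above


def spool_to_tempfile(source: io.BufferedIOBase, chunk_size: int = CHUNK_SIZE) -> str:
    """copy an upload stream to a temp file in fixed-size chunks,
    returning the temp file's path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        # read() returns b"" at EOF, ending the loop
        while chunk := source.read(chunk_size):
            tmp.write(chunk)
        return tmp.name


# usage: a 1 MiB fake upload, spooled in 64 KiB chunks
data = b"x" * (1024 * 1024)
path = spool_to_tempfile(io.BytesIO(data), chunk_size=64 * 1024)
assert open(path, "rb").read() == data
```

Memory use stays bounded by `chunk_size` regardless of file size, which is what eliminates the whole-file buffering.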

### transcoder API deployment (PR #156, Nov 11, 2025)

**standalone Rust transcoding service** 🎉
- **deployed**: https://plyr-transcoder.fly.dev/
- **purpose**: convert AIFF/FLAC/etc. to MP3 for browser compatibility
- **technology**: Axum + ffmpeg + Docker
- **security**: `X-Transcoder-Key` header authentication (shared secret)
- **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds)
- **architecture**:
  - 2 Fly machines for high availability
  - auto-stop/start for cost efficiency
  - stateless design (no R2 integration yet)
  - 320kbps MP3 output with proper ID3 tags
- **status**: deployed and tested, ready for integration into the plyr.fm upload pipeline
- **next steps**: wire into backend with R2 integration and job queue (see issue #153)

### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025)

**format validation improvements**
- **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox
  - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED`
  - users could upload files but they wouldn't play in most browsers
- **immediate solution**: reject AIFF/AIF uploads at both backend and frontend
  - removed AIFF/AIF from the AudioFormat enum
  - added format hints to upload UI: "supported: mp3, wav, m4a"
  - client-side validation with helpful error messages
- **long-term solution**: deployed standalone transcoder service (see above)
  - separate Rust/Axum service with ffmpeg
  - accepts all formats, converts to browser-compatible MP3
  - integration into upload pipeline pending (issue #153)

**observability improvements**:
- added logfire instrumentation to upload background tasks
- added logfire spans to R2 storage operations
- documented logfire querying patterns in `docs/logfire-querying.md`

### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025)

Eliminated event loop blocking across the backend with three critical PRs:

1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3
   - portal page load time: 2+ seconds → ~200ms
   - root cause: `track.image_url` was blocking on serial R2 HEAD requests

2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups
   - homepage load time: 2-6 seconds → 200-400ms
   - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each)
   - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads

3. **PR #151: async storage writes/deletes** - made save/delete operations non-blocking
   - R2: switched to `aioboto3` for uploads/deletes (async S3 operations)
   - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks)
   - impact: multi-MB uploads no longer monopolize the worker thread, constant memory usage
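
The PR #150 change — a serial loop of awaits replaced by a single `asyncio.gather()` — can be sketched as below. `resolve_pds` is a stand-in that simulates the latency of the real `resolve_atproto_data()` network call, not its actual signature.

```python
import asyncio
import time


async def resolve_pds(handle: str) -> str:
    await asyncio.sleep(0.05)  # simulated network round-trip
    return f"https://pds.example/{handle}"


async def resolve_all(handles: list[str]) -> list[str]:
    # one gather instead of a for-loop of awaits: total time is roughly the
    # slowest single lookup, not the sum of all of them
    return list(await asyncio.gather(*(resolve_pds(h) for h in handles)))


start = time.perf_counter()
urls = asyncio.run(resolve_all([f"artist{i}" for i in range(8)]))
elapsed = time.perf_counter() - start
assert len(urls) == 8
assert elapsed < 0.05 * 8  # far less than 8 serial round-trips
```

With 8 artists at 200-300ms each, that is exactly the 2-6s → 200-400ms improvement reported above.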

### cover art support (PRs #123-126, #132-139)
- ✅ track cover image upload and storage (separate R2 bucket)
- ✅ image display on track pages and player
- ✅ Open Graph meta tags for track sharing
- ✅ mobile-optimized layouts with cover art
- ✅ sticky bottom player on mobile with cover

### track detail pages (PR #164, Nov 12, 2025)

- ✅ dedicated track detail pages with large cover art
- ✅ play button updates queue state correctly (#169)
- ✅ liked state loaded efficiently via server-side fetch
- ✅ mobile-optimized layouts with proper scrolling constraints
- ✅ origin validation for image URLs (#168)

### mobile UI improvements (PRs #159-185, Nov 11-12, 2025)

- ✅ compact action menus and better navigation (#161)
- ✅ improved mobile responsiveness (#159)
- ✅ consistent button layouts across mobile/desktop (#176-181, #185)
- ✅ always show play count and like count on mobile (#177)
- ✅ login page UX improvements (#174-175)
- ✅ liked page UX improvements (#173)
- ✅ accent color for liked tracks (#160)

### queue management improvements (PRs #110-113, #115)
- ✅ visual feedback on queue add/remove
- ✅ toast notifications for queue actions
- ✅ better error handling for queue operations
- ✅ improved shuffle and auto-advance UX

### infrastructure and tooling
- ✅ R2 bucket separation: audio-prod and images-prod (PR #124)
- ✅ admin script for content moderation (`scripts/delete_track.py`)
- ✅ bluesky attribution link in header
- ✅ changelog target added (#183)
- ✅ documentation updates (#158)
- ✅ track metadata edits now persist correctly (#162)

## immediate priorities

### high-priority features
1. **audio transcoding pipeline integration** (issue #153)
   - ✅ standalone transcoder service deployed at https://plyr-transcoder.fly.dev/
   - ✅ Rust/Axum service with ffmpeg, tested with 85-minute files
   - ✅ secure auth via X-Transcoder-Key header
   - ⏳ next: integrate into plyr.fm upload pipeline
     - backend calls transcoder API for unsupported formats
     - queue-based job system for async processing
     - R2 integration (fetch original, store MP3)
     - maintain original file hash for deduplication
     - handle transcoding failures gracefully

### critical bugs
1. **upload reliability** (issue #147): upload returns 200 but file missing from R2, no error logged
   - priority: high (data loss risk)
   - need better error handling and retry logic in background upload task

2. **database connection pool SSL errors**: intermittent failures on first request
   - symptom: `/tracks/` returns 500 on first request, succeeds afterward
   - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts
   - documented in `docs/logfire-querying.md`
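
The pool fix above is a two-line engine configuration. A sketch of the settings — the sqlite URL is purely for illustration, production uses the Neon postgres DSN, and the recycle interval is an assumed value to tune against Neon's actual idle timeout:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "sqlite://",        # illustration only; production uses the Neon DSN
    pool_pre_ping=True,  # validate each connection before use, so a
                         # server-dropped SSL session is replaced instead of 500ing
    pool_recycle=300,    # retire connections before the serverless idle
                         # timeout closes them out from under the pool
)
```

`pool_pre_ping` costs one lightweight round-trip per checkout; against a serverless Postgres that drops idle connections, that trade is usually worth it.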

### performance optimizations
3. **persist concrete file extensions in database**: currently brute-force probing all supported formats on read
   - already know `Track.file_type` and image format during upload
   - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam
   - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially)

4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task
   - multi-GB uploads risk OOM
   - stream from `UploadFile.file` → storage backend for constant memory usage

### new features
5. **content-addressable storage** (issue #146)
   - hash-based file storage for automatic deduplication
   - reduces storage costs when multiple artists upload same file
   - enables content verification

6. **liked tracks feature** (issue #144): design schema and ATProto record format
   - server-side persistent collections
   - ATProto record publication for cross-platform visibility
   - UI for adding/removing tracks from liked collection

## open issues by timeline

### immediate
- issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3)
- issue #147: upload reliability bug (data loss risk)
- issue #144: likes feature for personal collections

### short-term
- issue #146: content-addressable storage (hash-based deduplication)
- issue #24: implement play count abuse prevention
- database connection pool tuning (SSL errors)
- file extension persistence in database

### medium-term
- issue #39: postmortem - cross-domain auth deployment and remaining security TODOs
- issue #46: consider removing init_db() from lifespan in favor of migration-only approach
- issue #56: design public developer API and versioning
- issue #57: support multiple audio item types (voice memos/snippets)
- issue #122: fullscreen player for immersive playback

### long-term
- migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata)
- publish to multiple ATProto AppViews for cross-platform visibility
- explore ATProto-native notifications (replace Bluesky DM bot)
- realtime queue syncing across devices via SSE/WebSocket
- artist analytics dashboard improvements
- issue #44: modern music streaming feature parity

## technical state

### architecture

**backend**
- language: Python 3.11+
- framework: FastAPI with uvicorn
- database: Neon PostgreSQL (serverless, fully managed)
- storage: Cloudflare R2 (S3-compatible object storage)
- hosting: Fly.io (2x shared-cpu VMs, auto-scaling)
- observability: Pydantic Logfire (traces, metrics, logs)
- auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto)

**frontend**
- framework: SvelteKit (latest v2.43.2)
- runtime: Bun (fast JS runtime)
- hosting: Cloudflare Pages (edge network)
- styling: vanilla CSS with lowercase aesthetic
- state management: Svelte 5 runes ($state, $derived, $effect)

**deployment**
- ci/cd: GitHub Actions
- backend: automatic on main branch merge (fly.io deploy)
- frontend: automatic on every push to main (cloudflare pages)
- migrations: automated via fly.io release_command
- environments: dev → staging → production (full separation)
- versioning: nebula timestamp format (YYYY.MMDD.HHMMSS)
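
The nebula timestamp format above is just a UTC timestamp rendered with dots; a one-line sketch (the helper name is illustrative, not the actual release tooling):

```python
from datetime import datetime, timezone


def nebula_version(now: datetime) -> str:
    """render a nebula-style version string (YYYY.MMDD.HHMMSS) from a UTC time."""
    return now.strftime("%Y.%m%d.%H%M%S")


# matches the "latest release" string quoted later in this file
assert nebula_version(datetime(2025, 11, 29, 21, 48, 11, tzinfo=timezone.utc)) == "2025.1129.214811"
```

Because the string sorts lexicographically in time order, plain string comparison is enough to order releases.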

**key dependencies**
- atproto: forked SDK for OAuth and record management
- sqlalchemy: async ORM for postgres
- alembic: database migrations
- boto3/aioboto3: R2 storage client
- logfire: observability (FastAPI + SQLAlchemy instrumentation)
- httpx: async HTTP client

**what's working**

**core functionality**
- ✅ ATProto OAuth 2.1 authentication with encrypted state
- ✅ secure session management via HttpOnly cookies (XSS protection)
- ✅ developer tokens with independent OAuth grants (programmatic API access)
- ✅ platform stats endpoint and homepage display (plays, tracks, artists)
- ✅ Media Session API for CarPlay, lock screens, control center
- ✅ timed comments on tracks with clickable timestamps
- ✅ account deletion with explicit confirmation
- ✅ artist profiles synced with Bluesky (avatar, display name, handle)
- ✅ track upload with streaming to prevent OOM
- ✅ track edit (title, artist, album, features metadata)
- ✅ track deletion with cascade cleanup
- ✅ audio streaming via HTML5 player with 307 redirects to R2 CDN
- ✅ track metadata published as ATProto records (fm.plyr.track namespace)
- ✅ play count tracking with threshold (30% or 30s, whichever comes first)
- ✅ like functionality with counts
- ✅ artist analytics dashboard
- ✅ queue management (shuffle, auto-advance, reorder)
- ✅ mobile-optimized responsive UI
- ✅ cross-tab queue synchronization via BroadcastChannel
- ✅ share tracks via URL with Open Graph previews (including cover art)
- ✅ image URL caching in database (eliminates N+1 R2 calls)
- ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
- ✅ standalone audio transcoding service deployed and verified (see issue #153)
- ✅ Bluesky embed player UI changes implemented (pending upstream social-app PR)
- ✅ admin content moderation script for removing inappropriate uploads
- ✅ copyright moderation system (AuDD fingerprinting, review workflow, violation tracking)
- ✅ ATProto labeler for copyright violations (queryLabels, subscribeLabels XRPC endpoints)
- ✅ admin UI for reviewing flagged tracks with htmx (plyr-moderation.fly.dev/admin)
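
The play-count threshold in the checklist above ("30% or 30s, whichever comes first") reduces to a one-line predicate. A sketch — the function name and signature are illustrative, not the real endpoint's code:

```python
def counts_as_play(position_s: float, duration_s: float) -> bool:
    """a listen counts once playback reaches 30% of the track or 30 seconds,
    whichever threshold is hit first."""
    return position_s >= min(30.0, 0.3 * duration_s)


# a 60s track counts at 18s (30% < 30s); a 10-minute track counts at 30s
assert counts_as_play(18.0, 60.0) is True
assert counts_as_play(17.0, 60.0) is False
assert counts_as_play(30.0, 600.0) is True
assert counts_as_play(29.0, 600.0) is False
```

Taking the `min` of the two thresholds is what makes "whichever comes first" fall out for free.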

**albums**
- ✅ album database schema with track relationships
- ✅ album browsing pages (`/u/{handle}` shows discography)
- ✅ album detail pages (`/u/{handle}/album/{slug}`) with full track lists
- ✅ album cover art upload and display
- ✅ server-side rendering for SEO
- ✅ rich Open Graph metadata for link previews (music.album type)
- ✅ long album title handling (100-char slugs, CSS truncation)
- ⏸ ATProto records for albums (deferred, see issue #221)

**frontend architecture**
- ✅ server-side data loading (`+page.server.ts`) for artist and album pages
- ✅ client-side data loading (`+page.ts`) for auth-dependent pages
- ✅ centralized auth manager (`lib/auth.svelte.ts`)
- ✅ layout-level auth state (`+layout.ts`) shared across all pages
- ✅ eliminated "flash of loading" via proper load functions
- ✅ consistent auth patterns (no scattered localStorage calls)

**deployment (fully automated)**
- **production**:
  - frontend: https://plyr.fm (cloudflare pages)
  - backend: https://relay-api.fly.dev (fly.io: 2 machines, 1GB RAM, 1 shared CPU, min 1 running)
  - database: neon postgresql
  - storage: cloudflare R2 (audio-prod and images-prod buckets)
  - deploy: github release → automatic

- **staging**:
  - backend: https://api-stg.plyr.fm (fly.io: relay-api-staging)
  - frontend: https://stg.plyr.fm (cloudflare pages: plyr-fm-stg)
  - database: neon postgresql (relay-staging)
  - storage: cloudflare R2 (audio-stg bucket)
  - deploy: push to main → automatic

- **development**:
  - backend: localhost:8000
  - frontend: localhost:5173
  - database: neon postgresql (relay-dev)
  - storage: cloudflare R2 (audio-dev and images-dev buckets)

- **developer tooling**:
  - `just serve` - run backend locally
  - `just dev` - run frontend locally
  - `just test` - run test suite
  - `just release` - create production release (backend + frontend)
  - `just release-frontend-only` - deploy only frontend changes (added Nov 13)

### what's in progress

**immediate work**
- investigating playback auto-start behavior (#225)
  - page refresh sometimes starts playing immediately
  - may be related to queue state restoration or localStorage caching
  - `autoplay_next` preference not being respected in all cases
- liquid glass effects as user-configurable setting (#186)

**active research**
- transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
- content moderation systems (#166, #167, #393 - takedown state representation)
- PWA capabilities and offline support (#165)

### known issues

**player behavior**
- playback auto-start on refresh (#225)
  - sometimes plays immediately after page load
  - investigating localStorage/queue state persistence
  - may not respect `autoplay_next` preference in all scenarios

**missing features**
- no ATProto records for albums yet (#221 - consciously deferred)
- no track genres/tags/descriptions yet (#155)
- no AIFF/AIF transcoding support (#153)
- no PWA installation prompts (#165)
- no fullscreen player view (#122)
- no public API for third-party integrations (#56)

**technical debt**
- multi-tab playback synchronization could be more robust
- queue state conflicts can occur with rapid operations

### technical decisions

**why Python/FastAPI instead of Rust?**
- rapid prototyping velocity during MVP phase
- rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
- excellent async support with asyncio
- lower barrier to contribution
- trade-off: accepting higher latency for faster development
- future: can migrate hot paths to Rust if needed (transcoding service already planned)

**why Fly.io instead of AWS/GCP?**
- simple deployment model (dockerfile → production)
- automatic SSL/TLS certificates
- built-in global load balancing
- reasonable pricing for MVP ($5/month)
- easy migration path to larger providers later
- trade-off: vendor-specific features, less control

**why Cloudflare R2 instead of S3?**
- zero egress fees (critical for audio streaming)
- S3-compatible API (easy migration if needed)
- integrated CDN for fast delivery
- significantly cheaper than S3 for bandwidth-heavy workloads

**why forked atproto SDK?**
- upstream SDK lacked OAuth 2.1 support
- needed custom record management patterns
- maintains compatibility with ATProto spec
- contributes improvements back when possible

**why SvelteKit instead of React/Next.js?**
- Svelte 5 runes provide excellent reactivity model
- smaller bundle sizes (critical for mobile)
- less boilerplate than React
- SSR + static generation flexibility
- modern DX with TypeScript

**why Neon instead of self-hosted Postgres?**
- serverless autoscaling (no capacity planning)
- branch-per-PR workflow (preview databases)
- automatic backups and point-in-time recovery
- generous free tier for MVP
- trade-off: higher latency than co-located DB, but acceptable

**why reject AIFF instead of transcoding immediately?**
- MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
- user communication: better to be upfront about limitations than silent failures
- resource management: transcoding is CPU-intensive, needs proper worker architecture
- future flexibility: can add transcoding as optional feature (high-quality uploads → MP3 delivery)
- trade-off: some users can't upload AIFF now, but those who upload MP3 get a working experience

**why async everywhere?**
- event loop performance: single-threaded async handles high concurrency
- I/O-bound workload: most time spent waiting on network/disk
- recent work (PRs #149-151) eliminated all blocking operations
- alternative: thread pools for blocking I/O, but increases complexity
- trade-off: debugging async code harder than sync, but worth throughput gains

**why anyio.Path over thread pools?**
- async interface: `anyio` exposes async file APIs, using OS-level async file operations where available and worker threads otherwise
- constant memory: chunked reads/writes (64KB) prevent OOM on large files
- raw thread pools: would work but less efficient, more context switching
- trade-off: anyio API slightly different from stdlib `pathlib`, but cleaner async semantics

## cost structure

current monthly costs: ~$5-6

- cloudflare pages: $0 (free tier)
- cloudflare R2: ~$0.16 (storage + operations, no egress fees)
- fly.io production: $5.00 (2x shared-cpu-1x VMs with auto-stop)
- fly.io staging: $0 (auto-stop, only runs during testing)
- neon: $0 (free tier, 0.5 CPU, 512MB RAM, 3GB storage)
- logfire: $0 (free tier)
- domain: $12/year (~$1/month)

## deployment URLs

- **production frontend**: https://plyr.fm
- **production backend**: https://relay-api.fly.dev (redirects to https://api.plyr.fm)
- **staging backend**: https://api-stg.plyr.fm
- **staging frontend**: https://stg.plyr.fm
- **repository**: https://github.com/zzstoatzz/plyr.fm (private)
- **monitoring**: https://logfire-us.pydantic.dev/zzstoatzz/relay
- **bluesky**: https://bsky.app/profile/plyr.fm
- **latest release**: 2025.1129.214811

## health indicators

**production status**: ✅ healthy
- uptime: consistently available
- response times: <500ms p95 for API endpoints
- error rate: <1% (mostly invalid OAuth states)
- storage: ~12 tracks uploaded, functioning correctly

**key metrics**
- total tracks: ~12
- total artists: ~3
- play counts: tracked per-track
- storage used: <1GB R2
- database size: <10MB postgres

## next session prep

**context for new agent:**
1. Fixed R2 image upload path mismatch, ensuring images save with the correct prefix.
2. Implemented UI changes for the embed player: removed the Queue button and matched fonts to the main app.
3. Opened a draft PR to the upstream social-app repository for native Plyr.fm embed support.
4. Updated issue #153 (transcoding pipeline) with a clear roadmap for integration into the backend.
5. Developed a local verification script for the transcoder service for faster local iteration.

**useful commands:**
- `just backend run` - run backend locally
- `just frontend dev` - run frontend locally
- `just test` - run test suite (from `backend/` directory)
- `gh issue list` - check open issues

## admin tooling

### content moderation
script: `scripts/delete_track.py`
- requires `ADMIN_*` prefixed environment variables
- deletes audio file from R2
- deletes cover image from R2 (if exists)
- deletes database record (cascades to likes and queue entries)
- notes ATProto records for manual cleanup (can't delete from other users' PDS)

usage:
```bash
# dry run
uv run scripts/delete_track.py <track_id> --dry-run

# delete with confirmation
uv run scripts/delete_track.py <track_id>

# delete without confirmation
uv run scripts/delete_track.py <track_id> --yes

# by URL
uv run scripts/delete_track.py --url https://plyr.fm/track/34
```

required environment variables:
- `ADMIN_DATABASE_URL` - production database connection
- `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
- `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
- `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
- `ADMIN_R2_BUCKET` - R2 bucket name

## known issues

### non-blocking
- cloudflare pages preview URLs return 404 (production works fine)
- some "relay" references remain in docs and comments
- ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)

## for new contributors

### getting started
1. clone: `gh repo clone zzstoatzz/plyr.fm`
2. install dependencies: `uv sync && cd frontend && bun install`
3. run backend: `uv run uvicorn backend.main:app --reload`
4. run frontend: `cd frontend && bun run dev`
5. visit http://localhost:5173

### development workflow
1. create issue on github
2. create PR from feature branch
3. ensure pre-commit hooks pass
4. test locally
5. merge to main → deploys to staging automatically
6. verify on staging
7. create github release → deploys to production automatically

### key principles
- type hints everywhere
- lowercase aesthetic
- generic terminology (use "items" not "tracks" where appropriate)
- ATProto first
- mobile matters
- cost conscious
- async everywhere (no blocking I/O)

### project structure
```
plyr.fm/
├── backend/              # FastAPI app & Python tooling
│   ├── src/backend/      # application code
│   │   ├── api/          # public endpoints
│   │   ├── _internal/    # internal services
│   │   ├── models/       # database schemas
│   │   └── storage/      # storage adapters
│   ├── tests/            # pytest suite
│   └── alembic/          # database migrations
├── frontend/             # SvelteKit app
│   ├── src/lib/          # components & state
│   └── src/routes/       # pages
├── moderation/           # Rust moderation service (ATProto labeler)
│   ├── src/              # Axum handlers, AuDD client, label signing
│   └── static/           # admin UI (html/css/js)
├── transcoder/           # Rust audio transcoding service
├── docs/                 # documentation
└── justfile              # task runner (mods: backend, frontend, moderation, transcoder)
```

## documentation

- [deployment overview](docs/deployment/overview.md)
- [configuration guide](docs/configuration.md)
- [queue design](docs/queue-design.md)
- [logfire querying](docs/logfire-querying.md)
- [pdsx guide](docs/pdsx-guide.md)
- [neon mcp guide](docs/neon-mcp-guide.md)

## performance optimization session (Nov 12, 2025)

### issue: slow /tracks/liked endpoint

**symptoms**:
- `/tracks/liked` taking 600-900ms consistently
- only ~25ms spent in database queries
- mysterious 575ms gap with no spans in Logfire traces
- endpoint felt sluggish compared to other pages

**investigation**:
- examined Logfire traces for `/tracks/liked` requests
- found 5-6 liked tracks being returned per request
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
- noticed R2 storage calls weren't appearing in traces despite taking majority of request time

**root cause**:
- PR #184 added `image_url` column to tracks table to eliminate N+1 R2 API calls
- new tracks (uploaded after PR) have `image_url` populated at upload time ✅
- legacy tracks (15 tracks uploaded before PR) had `image_url = NULL` ❌
- fallback code called `track.get_image_url()` for NULL values
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
- 5 tracks × 120ms = ~600ms of uninstrumented latency

**why R2 calls weren't visible**:
- `storage.get_url()` method had no Logfire instrumentation
- R2 API calls happening but not creating spans
- appeared as mysterious gap in trace timeline

**solution implemented**:
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
2. ran script against production database with production R2 credentials
3. backfilled 11 tracks successfully (4 already done in previous partial run)
4. 3 tracks "failed" but actually have non-existent images (optional, expected)
5. script uses concurrent `asyncio.gather()` for performance
605
-
606
-
**key learning: environment configuration matters**:
- initial script runs failed silently because:
  - script used local `.env` credentials (dev R2 bucket)
  - production images stored in different R2 bucket (`images-prod`)
  - `get_url()` returned `None` when images not found in dev bucket
- fix: passed production R2 credentials via environment variables:
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `R2_IMAGE_BUCKET=images-prod`
  - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`

**results**:
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
- after: 13 tracks populated with `image_url`, 3 legitimately have no images
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
- endpoint feels "really, really snappy" (user feedback)
- performance improvement visible immediately after backfill

**database cleanup: queue_state table bloat**:
- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
- result: 0 dead rows, table clean
- configured autovacuum for queue_state to prevent future bloat:
  - frequent updates to this table make it prone to bloat
  - should tune `autovacuum_vacuum_scale_factor` to 0.05 (5% vs default 20%)

**endpoint performance snapshot** (post-fix, last 10 minutes):
- `GET /tracks/`: 410ms (down from 2+ seconds)
- `GET /queue/`: 399ms (down from 2+ seconds)
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
- `GET /preferences/`: 200ms median
- `GET /auth/me`: 114ms median
- `POST /tracks/{track_id}/play`: 34ms

**PR #184 context**:
- PR claimed "opportunistic backfill: legacy records update on first access"
- but actual implementation never saved computed `image_url` back to database
- fallback code only computed URLs on-demand, didn't persist them
- this is why repeated visits kept hitting R2 API for same tracks
- one-time backfill script was correct solution vs adding write logic to read endpoints

**graceful ATProto recovery (PR #180)**:
- reviewed recent work on handling tracks with missing `atproto_record_uri`
- 4 tracks in production have NULL ATProto records (expected from upload failures)
- system already handles this gracefully:
  - like buttons disabled with helpful tooltips
  - track owners can self-service restore via portal
  - `restore-record` endpoint recreates with correct TID timestamps
- no action needed - existing recovery system working as designed

**performance metrics pre/post all recent PRs**:
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
- today's backfill: eliminated remaining R2 calls for legacy tracks
- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
- all endpoints now consistently sub-second response times

**documentation created**:
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
  - project/branch management
  - database schema inspection
  - SQL query patterns for plyr.fm
  - connection string generation
  - environment mapping (dev/staging/prod)
  - debugging workflows
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
  - dry-run mode for safety
  - concurrent R2 API calls
  - detailed error logging
  - production-tested

**tools and patterns established**:
- Neon MCP for database inspection and queries
- Logfire arbitrary queries for performance analysis
- production secret management via Fly.io
- `flyctl ssh console` for environment inspection
- backfill scripts with dry-run mode
- environment variable overrides for production operations

**system health indicators**:
- ✅ no 5xx errors in recent spans
- ✅ database queries all under 70ms p95
- ✅ SSL connection pool issues resolved (no errors in recent traces)
- ✅ queue_state table bloat eliminated
- ✅ all track images either in DB or legitimately NULL
- ✅ application feels fast and responsive

**next steps**:
1. configure autovacuum for `queue_state` table (prevent future bloat)
2. add Logfire instrumentation to `storage.get_url()` for visibility
3. monitor `/tracks/liked` performance over next few days
4. consider adding similar backfill pattern for any future column additions

---

### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)

**motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform.

**what shipped**:
- **moderation service** (Rust/Axum on Fly.io):
  - standalone service at `plyr-moderation.fly.dev`
  - integrates with AuDD enterprise API for audio fingerprinting
  - scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode)
  - auth via `X-Moderation-Key` header
- **backend integration** (PR #382):
  - `ModerationSettings` in config (service URL, auth token, timeout)
  - moderation client module (`backend/_internal/moderation.py`)
  - fire-and-forget background task on track upload
  - stores results in `copyright_scans` table
  - scan errors stored as "clear" so tracks aren't stuck unscanned
- **flagging fix** (PR #384):
  - AuDD enterprise API returns no confidence scores (all 0)
  - changed from score threshold to presence-based flagging: `is_flagged = !matches.is_empty()`
  - removed unused `score_threshold` config
- **backfill script** (`scripts/scan_tracks_copyright.py`):
  - scans existing tracks that haven't been checked
  - `--max-duration` flag to skip long DJ sets (estimated from file size)
  - `--dry-run` mode to preview what would be scanned
  - supports dev/staging/prod environments
- **review workflow**:
  - `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, `review_notes` columns
  - resolution values: `violation`, `false_positive`, `original_artist`
  - SQL queries for dashboard: flagged tracks, unreviewed flags, violations list

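a minimal sketch of the fire-and-forget scan flow described above, with assumed names (`scan_audio_url`, an in-memory `SCANS` dict standing in for the `copyright_scans` table) — not the real client in `backend/_internal/moderation.py`:

```python
import asyncio

SCANS: dict[str, dict] = {}  # stand-in for the copyright_scans table

async def scan_audio_url(url: str) -> list[dict]:
    """Stand-in for the moderation service call; returns AuDD matches."""
    await asyncio.sleep(0)
    return []  # empty = no fingerprint matches

async def scan_track(track_id: str, audio_url: str) -> None:
    try:
        matches = await scan_audio_url(audio_url)
        # presence-based flagging (PR #384): any match at all flags the track
        SCANS[track_id] = {"is_flagged": bool(matches), "matches": matches}
    except Exception:
        # scan errors stored as "clear" so tracks aren't stuck unscanned
        SCANS[track_id] = {"is_flagged": False, "matches": []}

async def on_upload(track_id: str, audio_url: str) -> None:
    # fire-and-forget: the upload response does not wait for the scan;
    # awaited here only so this demo finishes deterministically
    task = asyncio.create_task(scan_track(track_id, audio_url))
    await task

asyncio.run(on_upload("track-1", "https://cdn.example/track-1.mp3"))
```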
**initial review results** (25 flagged tracks):
- 8 violations (actual copyright issues)
- 11 false positives (fingerprint noise)
- 6 original artists (people uploading their own distributed music)

**impact**:
- automated copyright detection on upload
- manual review workflow for flagged content
- protection against DMCA takedown requests
- clear audit trail with resolution status

---

### platform stats and media session integration (PRs #359-379, Nov 27-29, 2025)

**motivation**: show platform activity at a glance, improve playback experience across devices, and give users control over their data.

**what shipped**:
- **platform stats endpoint and UI** (PRs #376, #378, #379):
  - `GET /stats` returns total plays, tracks, and artists
  - stats bar displays in homepage header (e.g., "1,691 plays • 55 tracks • 8 artists")
  - skeleton loading animation while fetching
  - responsive layout: visible in header on wide screens, collapses to menu on narrow
  - end-of-list animation on homepage
- **Media Session API** (PR #371):
  - provides track metadata to CarPlay, lock screens, Bluetooth devices, macOS control center
  - artwork display with fallback to artist avatar
  - play/pause, prev/next, seek controls all work from system UI
  - position state syncs scrubbers on external interfaces
- **browser tab title** (PR #374):
  - shows "track - artist • plyr.fm" while playing
  - persists across page navigation
  - reverts to page title when playback stops
- **timed comments** (PR #359):
  - comments capture timestamp when added during playback
  - clickable timestamp buttons seek to that moment
  - compact scrollable comments section on track pages
- **constellation integration** (PR #360):
  - queries constellation.microcosm.blue backlink index
  - enables network-wide like counts (not just plyr.fm internal)
  - environment-aware namespace handling
- **account deletion** (PR #363):
  - explicit confirmation flow (type handle to confirm)
  - deletes all plyr.fm data (tracks, albums, likes, comments, preferences)
  - optional ATProto record cleanup with clear warnings about orphaned references

**impact**:
- platform stats give visitors immediate sense of activity
- media session makes plyr.fm tracks controllable from car/lock screen/control center
- timed comments enable discussion at specific moments in tracks
- account deletion gives users full control over their data

---

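the stats bar string above ("1,691 plays • 55 tracks • 8 artists") is just locale-style number formatting over the three counts from `GET /stats`. the real rendering lives in the Svelte frontend; a hypothetical helper showing the shape:

```python
def format_stats_bar(plays: int, tracks: int, artists: int) -> str:
    """Render the homepage stats bar, e.g. '1,691 plays • 55 tracks • 8 artists'."""
    return f"{plays:,} plays • {tracks:,} tracks • {artists:,} artists"

print(format_stats_bar(1691, 55, 8))
```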
### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025)

**motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't become stale when browser sessions refresh.

**what shipped**:
- **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow
  - user clicks "create token" → redirected to PDS for authorization → token created with independent credentials
  - tokens have their own DPoP keypair and access/refresh tokens - completely separate from browser session
- **cookie isolation**: dev token exchange doesn't set browser cookie
  - added `is_dev_token` flag to ExchangeToken model
  - `/auth/exchange` skips Set-Cookie for dev token flows
  - prevents logout from deleting dev tokens (critical bug fixed during implementation)
- **token management UI**: portal → "your data" → "developer tokens"
  - create with optional name and expiration (30/90/180/365 days or never)
  - list active tokens with creation/expiration dates
  - revoke individual tokens
- **API endpoints**:
  - `POST /auth/developer-token/start` - initiates OAuth flow, returns auth_url
  - `GET /auth/developer-tokens` - list user's tokens
  - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix

**security properties**:
- tokens are full sessions with encrypted OAuth credentials (Fernet)
- each token refreshes independently (no staleness from browser session refresh)
- revocable individually without affecting browser or other tokens
- explicit OAuth consent required at PDS for each token created

**testing verified**:
- created token → uploaded track → logged out → deleted track with token ✓
- browser logout doesn't affect dev tokens ✓
- token works across browser sessions ✓
- staging deployment tested end-to-end ✓

**documentation**: see `docs/authentication.md` "developer tokens" section

---

### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025)

**motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player.

**what shipped**:
- **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with iframe
  - follows oEmbed spec with `type: "rich"` and iframe in `html` field
  - discovery link in track page `<head>` for automatic detection
- **iframely domain registration**: registered plyr.fm on iframely.com (free tier)
  - this was the key fix - iframely now returns our embed iframe as `links.player[0]`
  - API key: stored in 1password (iframely account)

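a rough sketch of what a `/oembed` payload looks like per the oEmbed spec (`type: "rich"`, iframe in the `html` field). the exact fields, dimensions, and helper name here are illustrative, not the production response:

```python
def oembed_response(track_url: str, title: str, author: str) -> dict:
    """Build an oEmbed 'rich' payload whose html field carries the embed iframe."""
    track_id = track_url.rstrip("/").rsplit("/", 1)[-1]
    embed_url = f"https://plyr.fm/embed/track/{track_id}"
    return {
        "version": "1.0",
        "type": "rich",  # 'rich' (not 'link'/'photo') so consumers render our html
        "title": title,
        "author_name": author,
        "provider_name": "plyr.fm",
        "provider_url": "https://plyr.fm",
        "html": f'<iframe src="{embed_url}" width="400" height="200" frameborder="0"></iframe>',
        "width": 400,   # required alongside html for rich/video types
        "height": 200,
    }

resp = oembed_response("https://plyr.fm/track/92", "demo track", "artist.example")
```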
**debugging journey** (PRs #356-358):
- initially tried `og:video` meta tags to hint iframe embed - didn't work
- tried removing `og:audio` to force oEmbed fallback - resulted in no player link
- discovered iframely requires domain registration to trust oEmbed providers
- after registration, iframely correctly returns embed iframe URL

**current state**:
- oEmbed endpoint working: `curl https://api.plyr.fm/oembed?url=https://plyr.fm/track/92`
- iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed)
- Leaflet.pub should show proper embeds (pending their cache expiry)

**impact**:
- plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services
- proper embed player with cover art instead of raw HTML5 audio

---

### export & upload reliability (PRs #337-344, Nov 24, 2025)

**motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts.

**what shipped**:
- **database-backed jobs** (PR #337): moved upload/export tracking from in-memory to postgres
  - jobs table persists state across server restarts
  - enables reliable progress tracking via SSE polling
- **streaming exports** (PR #343): fixed OOM on large file exports
  - previously loaded entire files into memory via `response["Body"].read()`
  - now streams to temp files, adds to zip from disk (constant memory)
  - 90-minute WAV files now export successfully on 1GB VM
- **progress tracking fix** (PR #340): upload progress was receiving bytes but treating them as a percentage
  - `UploadProgressTracker` now properly converts bytes to percentage
  - upload progress bar works correctly again
- **UX improvements** (PRs #338-339, #341-342, #344):
  - export filename now includes date (`plyr-tracks-2025-11-24.zip`)
  - toast notification on track deletion
  - fixed false "lost connection" error when SSE completes normally
  - progress now shows "downloading track X of Y" instead of confusing count

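the streaming-export fix in PR #343 boils down to "spool to disk, zip from disk." a stdlib-only sketch under that assumption (the real code reads from the R2 `response["Body"]` stream rather than an in-memory source):

```python
import io
import shutil
import tempfile
import zipfile

def add_stream_to_zip(zf: zipfile.ZipFile, name: str, stream, chunk_size: int = 1 << 20) -> None:
    """Spool a (possibly huge) stream to a temp file, then add it to the zip from disk."""
    with tempfile.NamedTemporaryFile() as tmp:
        shutil.copyfileobj(stream, tmp, chunk_size)  # one chunk in memory at a time
        tmp.flush()
        zf.write(tmp.name, arcname=name)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_stream_to_zip(zf, "track.wav", io.BytesIO(b"x" * 1024))
```

memory usage stays bounded by `chunk_size` regardless of file size, which is what lets 90-minute WAVs export on a 1GB VM.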
**impact**:
- exports work for arbitrarily large files (limited by disk, not RAM)
- upload progress displays correctly
- job state survives server restarts
- clearer progress messaging during exports

---

archived from STATUS.md on 2025-12-01

---

- htmx endpoints: `/admin/flags-html`, `/admin/resolve-htmx`
- server-rendered HTML partials for flag cards

### Queue hydration + ATProto token hardening (Nov 12, 2025)

**Why:** queue endpoints were occasionally taking 2s+ and restore operations could 401
when multiple requests refreshed an expired ATProto token simultaneously.

**What shipped:**
- Added persistent `image_url` on `Track` rows so queue hydration no longer probes R2
  for every track. Queue payloads now pull art directly from Postgres, with a one-time
  fallback for legacy rows.
- Updated `_internal/queue.py` to backfill any missing URLs once (with caching) instead
  of per-request GETs.
- Introduced per-session locks in `_refresh_session_tokens` so only one coroutine hits
  `oauth_client.refresh_session` at a time; others reuse the refreshed tokens. This
  removes the race that caused the batch restore flow to intermittently 500/401.

**Impact:** queue tail latency dropped back under 500 ms in staging tests, ATProto
restore flows are now reliable under concurrent use, and Logfire no longer shows 500s
from the PDS.

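the per-session lock pattern can be sketched with stdlib asyncio. names here (`get_fresh_token`, the token/lock dicts) are illustrative stand-ins for `_refresh_session_tokens`, and the real code checks token expiry rather than simple dict membership:

```python
import asyncio
from collections import defaultdict

refresh_counts = {"did:example:alice": 0}
_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)  # one lock per session
_tokens: dict[str, str] = {}

async def refresh_session(did: str) -> str:
    """Stand-in for oauth_client.refresh_session (one real PDS round-trip)."""
    refresh_counts[did] += 1
    await asyncio.sleep(0.01)
    return f"token-{refresh_counts[did]}"

async def get_fresh_token(did: str) -> str:
    # per-session lock: only one coroutine refreshes; the rest reuse the result
    async with _locks[did]:
        if did not in _tokens:  # real code: check expiry, not mere presence
            _tokens[did] = await refresh_session(did)
        return _tokens[did]

async def main() -> list[str]:
    # five concurrent requests all hit an expired session at once
    return await asyncio.gather(*(get_fresh_token("did:example:alice") for _ in range(5)))

tokens = asyncio.run(main())
```

without the lock, all five coroutines would race into `refresh_session`, and a PDS that rotates refresh tokens would reject the losers — the 401s described above.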
### Liked tracks feature (PR #157, Nov 11, 2025)

- ✅ server-side persistent collections
- ✅ ATProto record publication for cross-platform visibility
- ✅ UI for adding/removing tracks from liked collection
- ✅ like counts displayed in track responses and analytics (#170)
- ✅ analytics cards now clickable links to track detail pages (#171)
- ✅ liked state shown on artist page tracks (#163)

### Upload streaming + progress UX (PR #182, Nov 11, 2025)

- Frontend switched from `fetch` to `XMLHttpRequest` so we can display upload progress
  toasts (critical for >50 MB mixes on mobile).
- Upload form now clears only after the request succeeds; failed attempts leave the
  form intact so users don't lose metadata.
- Backend writes uploads/images to temp files in 8 MB chunks before handing them to the
  storage layer, eliminating whole-file buffering and iOS crashes for hour-long mixes.
- Deployment verified locally and by rerunning the exact repro Stella hit (85-minute
  mix from mobile).

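the 8 MB chunked spooling on the backend is roughly this shape — sketched synchronously with stdlib pieces; the real handler is async and reads from FastAPI's `UploadFile`, and `spool_upload` is an assumed name:

```python
import io
import tempfile

CHUNK = 8 * 1024 * 1024  # 8 MB, matching the upload path described above

def spool_upload(stream, chunk_size: int = CHUNK) -> tuple[str, int]:
    """Copy an incoming upload to a temp file chunk by chunk; never buffer the whole file."""
    total = 0
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        while chunk := stream.read(chunk_size):
            tmp.write(chunk)
            total += len(chunk)
        return tmp.name, total

path, size = spool_upload(io.BytesIO(b"a" * 1000), chunk_size=256)
```

peak memory is one chunk, so an hour-long mix costs the same RAM as a 10-second clip.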
### transcoder API deployment (PR #156, Nov 11, 2025)

**standalone Rust transcoding service** 🎉
- **deployed**: https://plyr-transcoder.fly.dev/
- **purpose**: convert AIFF/FLAC/etc. to MP3 for browser compatibility
- **technology**: Axum + ffmpeg + Docker
- **security**: `X-Transcoder-Key` header authentication (shared secret)
- **capacity**: handles 1GB uploads, tested with 85-minute AIFF files (~858MB → 195MB MP3 in 32 seconds)
- **architecture**:
  - 2 Fly machines for high availability
  - auto-stop/start for cost efficiency
  - stateless design (no R2 integration yet)
  - 320kbps MP3 output with proper ID3 tags
- **status**: deployed and tested, ready for integration into plyr.fm upload pipeline
- **next steps**: wire into backend with R2 integration and job queue (see issue #153)

### AIFF/AIF browser compatibility fix (PR #152, Nov 11, 2025)

**format validation improvements**
- **problem discovered**: AIFF/AIF files only work in Safari, not Chrome/Firefox
  - browsers throw `MediaError code 4: MEDIA_ERR_SRC_NOT_SUPPORTED`
  - users could upload files but they wouldn't play in most browsers
- **immediate solution**: reject AIFF/AIF uploads at both backend and frontend
  - removed AIFF/AIF from AudioFormat enum
  - added format hints to upload UI: "supported: mp3, wav, m4a"
  - client-side validation with helpful error messages
- **long-term solution**: deployed standalone transcoder service (see above)
  - separate Rust/Axum service with ffmpeg
  - accepts all formats, converts to browser-compatible MP3
  - integration into upload pipeline pending (issue #153)

**observability improvements**:
- added logfire instrumentation to upload background tasks
- added logfire spans to R2 storage operations
- documented logfire querying patterns in `docs/logfire-querying.md`

### async I/O performance fixes (PRs #149-151, Nov 10-11, 2025)

Eliminated event loop blocking across backend with three critical PRs:

1. **PR #149: async R2 reads** - converted R2 `head_object` operations from sync boto3 to async aioboto3
   - portal page load time: 2+ seconds → ~200ms
   - root cause: `track.image_url` was blocking on serial R2 HEAD requests

2. **PR #150: concurrent PDS resolution** - parallelized ATProto PDS URL lookups
   - homepage load time: 2-6 seconds → 200-400ms
   - root cause: serial `resolve_atproto_data()` calls (8 artists × 200-300ms each)
   - fix: `asyncio.gather()` for batch resolution, database caching for subsequent loads

3. **PR #151: async storage writes/deletes** - made save/delete operations non-blocking
   - R2: switched to `aioboto3` for uploads/deletes (async S3 operations)
   - filesystem: used `anyio.Path` and `anyio.open_file()` for chunked async I/O (64KB chunks)
   - impact: multi-MB uploads no longer monopolize worker thread, constant memory usage

### cover art support (PRs #123-126, #132-139)
- ✅ track cover image upload and storage (separate R2 bucket)
- ✅ image display on track pages and player
- ✅ Open Graph meta tags for track sharing
- ✅ mobile-optimized layouts with cover art
- ✅ sticky bottom player on mobile with cover

### track detail pages (PR #164, Nov 12, 2025)

- ✅ dedicated track detail pages with large cover art
- ✅ play button updates queue state correctly (#169)
- ✅ liked state loaded efficiently via server-side fetch
- ✅ mobile-optimized layouts with proper scrolling constraints
- ✅ origin validation for image URLs (#168)

### mobile UI improvements (PRs #159-185, Nov 11-12, 2025)

- ✅ compact action menus and better navigation (#161)
- ✅ improved mobile responsiveness (#159)
- ✅ consistent button layouts across mobile/desktop (#176-181, #185)
- ✅ always show play count and like count on mobile (#177)
- ✅ login page UX improvements (#174-175)
- ✅ liked page UX improvements (#173)
- ✅ accent color for liked tracks (#160)

### queue management improvements (PRs #110-113, #115)
- ✅ visual feedback on queue add/remove
- ✅ toast notifications for queue actions
- ✅ better error handling for queue operations
- ✅ improved shuffle and auto-advance UX

### infrastructure and tooling
- ✅ R2 bucket separation: audio-prod and images-prod (PR #124)
- ✅ admin script for content moderation (`scripts/delete_track.py`)
- ✅ bluesky attribution link in header
- ✅ changelog target added (#183)
- ✅ documentation updates (#158)
- ✅ track metadata edits now persist correctly (#162)

## immediate priorities

### high priority features
1. **audio transcoding pipeline integration** (issue #153)
   - ✅ standalone transcoder service deployed at https://plyr-transcoder.fly.dev/
   - ✅ Rust/Axum service with ffmpeg, tested with 85-minute files
   - ✅ secure auth via X-Transcoder-Key header
   - ⏳ next: integrate into plyr.fm upload pipeline
     - backend calls transcoder API for unsupported formats
     - queue-based job system for async processing
     - R2 integration (fetch original, store MP3)
     - maintain original file hash for deduplication
     - handle transcoding failures gracefully

### critical bugs
1. **upload reliability** (issue #147): upload returns 200 but file missing from R2, no error logged
   - priority: high (data loss risk)
   - need better error handling and retry logic in background upload task

2. **database connection pool SSL errors**: intermittent failures on first request
   - symptom: `/tracks/` returns 500 on first request, succeeds after
   - fix: set `pool_pre_ping=True`, adjust `pool_recycle` for Neon timeouts
   - documented in `docs/logfire-querying.md`

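the proposed fix is plain SQLAlchemy engine configuration. a hedged sketch with a placeholder DSN — the exact `pool_recycle` value is an assumption to be tuned against Neon's idle timeout:

```python
from sqlalchemy.ext.asyncio import create_async_engine

# pool_pre_ping issues a lightweight liveness check before reusing a pooled
# connection, so sockets the server has already closed are discarded instead
# of surfacing as SSL errors on the first request; pool_recycle retires
# connections proactively before they can go stale.
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@host/db",  # placeholder DSN
    pool_pre_ping=True,
    pool_recycle=300,  # seconds; keep below the provider's idle timeout
)
```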
### performance optimizations
3. **persist concrete file extensions in database**: currently brute-force probing all supported formats on read
   - already know `Track.file_type` and image format during upload
   - eliminating repeated `exists()` checks reduces filesystem/R2 HEAD spam
   - improves audio streaming latency (`/audio/{file_id}` endpoint walks extensions sequentially)

4. **stream large uploads directly to storage**: current implementation reads entire file into memory before background task
   - multi-GB uploads risk OOM
   - stream from `UploadFile.file` → storage backend for constant memory usage

### new features
5. **content-addressable storage** (issue #146)
   - hash-based file storage for automatic deduplication
   - reduces storage costs when multiple artists upload same file
   - enables content verification

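the core of content-addressable storage is hashing the file in chunks and using the digest as the storage key. a stdlib sketch — the `sha256/<hex>` key format is an assumption for illustration, not the scheme decided in issue #146:

```python
import hashlib
import io

def content_address(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks; the digest becomes the storage key, so identical
    uploads from different artists map to the same object (deduplication)."""
    h = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return f"sha256/{h.hexdigest()}"

a = content_address(io.BytesIO(b"same bytes"))
b = content_address(io.BytesIO(b"same bytes"))
```

because the key is derived from content, a re-upload is detected by a single existence check before any bytes are stored, and re-hashing on download verifies integrity.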
6. **liked tracks feature** (issue #144): design schema and ATProto record format
   - server-side persistent collections
   - ATProto record publication for cross-platform visibility
   - UI for adding/removing tracks from liked collection

## open issues by timeline

### immediate
- issue #153: audio transcoding pipeline (ffmpeg worker for AIFF/FLAC→MP3)
- issue #147: upload reliability bug (data loss risk)
- issue #144: likes feature for personal collections

### short-term
- issue #146: content-addressable storage (hash-based deduplication)
- issue #24: implement play count abuse prevention
- database connection pool tuning (SSL errors)
- file extension persistence in database

### medium-term
- issue #39: postmortem - cross-domain auth deployment and remaining security TODOs
- issue #46: consider removing init_db() from lifespan in favor of migration-only approach
- issue #56: design public developer API and versioning
- issue #57: support multiple audio item types (voice memos/snippets)
- issue #122: fullscreen player for immersive playback

### long-term
- migrate to plyr-owned lexicon (custom ATProto namespace with richer metadata)
- publish to multiple ATProto AppViews for cross-platform visibility
- explore ATProto-native notifications (replace Bluesky DM bot)
- realtime queue syncing across devices via SSE/WebSocket
- artist analytics dashboard improvements
- issue #44: modern music streaming feature parity

## technical state

### architecture

**backend**
- language: Python 3.11+
- framework: FastAPI with uvicorn
- database: Neon PostgreSQL (serverless, fully managed)
- storage: Cloudflare R2 (S3-compatible object storage)
- hosting: Fly.io (2x shared-cpu VMs, auto-scaling)
- observability: Pydantic Logfire (traces, metrics, logs)
- auth: ATProto OAuth 2.1 (forked SDK: github.com/zzstoatzz/atproto)

**frontend**
- framework: SvelteKit (latest v2.43.2)
- runtime: Bun (fast JS runtime)
- hosting: Cloudflare Pages (edge network)
- styling: vanilla CSS with lowercase aesthetic
- state management: Svelte 5 runes ($state, $derived, $effect)

**deployment**
- ci/cd: GitHub Actions
- backend: automatic on main branch merge (fly.io deploy)
- frontend: automatic on every push to main (cloudflare pages)
- migrations: automated via fly.io release_command
- environments: dev → staging → production (full separation)
- versioning: nebula timestamp format (YYYY.MMDD.HHMMSS)

**key dependencies**
- atproto: forked SDK for OAuth and record management
- sqlalchemy: async ORM for postgres
- alembic: database migrations
- boto3/aioboto3: R2 storage client
- logfire: observability (FastAPI + SQLAlchemy instrumentation)
- httpx: async HTTP client

**what's working**

**core functionality**
- ✅ ATProto OAuth 2.1 authentication with encrypted state
- ✅ secure session management via HttpOnly cookies (XSS protection)
- ✅ developer tokens with independent OAuth grants (programmatic API access)
- ✅ platform stats endpoint and homepage display (plays, tracks, artists)
- ✅ Media Session API for CarPlay, lock screens, control center
- ✅ timed comments on tracks with clickable timestamps
- ✅ account deletion with explicit confirmation
- ✅ artist profiles synced with Bluesky (avatar, display name, handle)
- ✅ track upload with streaming to prevent OOM
- ✅ track edit (title, artist, album, features metadata)
- ✅ track deletion with cascade cleanup
- ✅ audio streaming via HTML5 player with 307 redirects to R2 CDN
- ✅ track metadata published as ATProto records (fm.plyr.track namespace)
- ✅ play count tracking with threshold (30% or 30s, whichever comes first)
- ✅ like functionality with counts
- ✅ artist analytics dashboard
- ✅ queue management (shuffle, auto-advance, reorder)
- ✅ mobile-optimized responsive UI
- ✅ cross-tab queue synchronization via BroadcastChannel
- ✅ share tracks via URL with Open Graph previews (including cover art)
- ✅ image URL caching in database (eliminates N+1 R2 calls)
- ✅ format validation (rejects AIFF/AIF, accepts MP3/WAV/M4A with helpful error messages)
- ✅ standalone audio transcoding service deployed and verified (see issue #153)
- ✅ Bluesky embed player UI changes implemented (pending upstream social-app PR)
- ✅ admin content moderation script for removing inappropriate uploads
- ✅ copyright moderation system (AuDD fingerprinting, review workflow, violation tracking)
- ✅ ATProto labeler for copyright violations (queryLabels, subscribeLabels XRPC endpoints)
- ✅ admin UI for reviewing flagged tracks with htmx (plyr-moderation.fly.dev/admin)

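the play-count threshold above ("30% or 30s, whichever comes first") reduces to a one-line predicate. a sketch with assumed parameter names, not the backend's actual function:

```python
def should_count_play(position_s: float, duration_s: float) -> bool:
    """Count a play once 30% of the track or 30 seconds has elapsed,
    whichever comes first."""
    return position_s >= min(0.3 * duration_s, 30.0)

# 10-minute track: the 30s cap hits long before 30% (180s)
print(should_count_play(30, 600))
# 40-second track: 30% is only 12s, so 9s in is not yet a play
print(should_count_play(9, 40))
```

taking the `min` of the two thresholds is what makes "whichever comes first" hold for both short clips and hour-long mixes.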
**albums**
- ✅ album database schema with track relationships
- ✅ album browsing pages (`/u/{handle}` shows discography)
- ✅ album detail pages (`/u/{handle}/album/{slug}`) with full track lists
- ✅ album cover art upload and display
- ✅ server-side rendering for SEO
- ✅ rich Open Graph metadata for link previews (music.album type)
- ✅ long album title handling (100-char slugs, CSS truncation)
- ⏸ ATProto records for albums (deferred, see issue #221)

**frontend architecture**
- ✅ server-side data loading (`+page.server.ts`) for artist and album pages
- ✅ client-side data loading (`+page.ts`) for auth-dependent pages
- ✅ centralized auth manager (`lib/auth.svelte.ts`)
- ✅ layout-level auth state (`+layout.ts`) shared across all pages
- ✅ eliminated "flash of loading" via proper load functions
- ✅ consistent auth patterns (no scattered localStorage calls)

**deployment (fully automated)**
- **production**:
  - frontend: https://plyr.fm (cloudflare pages)
  - backend: https://relay-api.fly.dev (fly.io: 2 machines, 1GB RAM, 1 shared CPU, min 1 running)
  - database: neon postgresql
  - storage: cloudflare R2 (audio-prod and images-prod buckets)
  - deploy: github release → automatic

- **staging**:
  - backend: https://api-stg.plyr.fm (fly.io: relay-api-staging)
  - frontend: https://stg.plyr.fm (cloudflare pages: plyr-fm-stg)
  - database: neon postgresql (relay-staging)
  - storage: cloudflare R2 (audio-stg bucket)
  - deploy: push to main → automatic

- **development**:
  - backend: localhost:8000
  - frontend: localhost:5173
  - database: neon postgresql (relay-dev)
  - storage: cloudflare R2 (audio-dev and images-dev buckets)

- **developer tooling**:
  - `just serve` - run backend locally
  - `just dev` - run frontend locally
  - `just test` - run test suite
  - `just release` - create production release (backend + frontend)
  - `just release-frontend-only` - deploy only frontend changes (added Nov 13)

### what's in progress
460
+
461
+
**immediate work**
462
+
- investigating playback auto-start behavior (#225)
463
+
- page refresh sometimes starts playing immediately
464
+
- may be related to queue state restoration or localStorage caching
465
+
- `autoplay_next` preference not being respected in all cases
466
+
- liquid glass effects as user-configurable setting (#186)
467
+
468
+
**active research**
469
+
- transcoding pipeline architecture (see sandbox/transcoding-pipeline-plan.md)
470
+
- content moderation systems (#166, #167, #393 - takedown state representation)
471
+
- PWA capabilities and offline support (#165)
472
+
473
+
### known issues
474
+
475
+
**player behavior**
476
+
- playback auto-start on refresh (#225)
477
+
- sometimes plays immediately after page load
478
+
- investigating localStorage/queue state persistence
479
+
- may not respect `autoplay_next` preference in all scenarios
480
+
481
+
**missing features**
482
+
- no ATProto records for albums yet (#221 - consciously deferred)
483
+
- no track genres/tags/descriptions yet (#155)
484
+
- no AIFF/AIF transcoding support (#153)
485
+
- no PWA installation prompts (#165)
486
+
- no fullscreen player view (#122)
487
+
- no public API for third-party integrations (#56)
488
+
489
+
**technical debt**
490
+
- multi-tab playback synchronization could be more robust
491
+
- queue state conflicts can occur with rapid operations
492
+
493
+
### technical decisions

**why Python/FastAPI instead of Rust?**
- rapid prototyping velocity during MVP phase
- rich ecosystem for web APIs (fastapi, sqlalchemy, pydantic)
- excellent async support with asyncio
- lower barrier to contribution
- trade-off: accepting higher latency for faster development
- future: can migrate hot paths to Rust if needed (transcoding service already planned)

**why Fly.io instead of AWS/GCP?**
- simple deployment model (dockerfile → production)
- automatic SSL/TLS certificates
- built-in global load balancing
- reasonable pricing for MVP ($5/month)
- easy migration path to larger providers later
- trade-off: vendor-specific features, less control

**why Cloudflare R2 instead of S3?**
- zero egress fees (critical for audio streaming)
- S3-compatible API (easy migration if needed)
- integrated CDN for fast delivery
- significantly cheaper than S3 for bandwidth-heavy workloads

**why forked atproto SDK?**
- upstream SDK lacked OAuth 2.1 support
- needed custom record management patterns
- maintains compatibility with ATProto spec
- contributes improvements back when possible

**why SvelteKit instead of React/Next.js?**
- Svelte 5 runes provide excellent reactivity model
- smaller bundle sizes (critical for mobile)
- less boilerplate than React
- SSR + static generation flexibility
- modern DX with TypeScript
**why Neon instead of self-hosted Postgres?**
- serverless autoscaling (no capacity planning)
- branch-per-PR workflow (preview databases)
- automatic backups and point-in-time recovery
- generous free tier for MVP
- trade-off: higher latency than a co-located DB, but acceptable

**why reject AIFF instead of transcoding immediately?**
- MVP speed: transcoding requires queue infrastructure, ffmpeg setup, error handling
- user communication: better to be upfront about limitations than to fail silently
- resource management: transcoding is CPU-intensive, needs proper worker architecture
- future flexibility: can add transcoding as an optional feature (high-quality uploads → MP3 delivery)
- trade-off: AIFF uploads are blocked for now, but MP3 uploads work reliably today
**why async everywhere?**
- event loop performance: single-threaded async handles high concurrency
- I/O-bound workload: most time is spent waiting on network/disk
- recent work (PRs #149-151) eliminated all blocking operations
- alternative: thread pools for blocking I/O, but that increases complexity
- trade-off: async code is harder to debug than sync, but worth the throughput gains
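
to make the concurrency argument concrete, here is a minimal self-contained sketch (the coroutine below is illustrative, not actual plyr.fm code): five simulated 100ms I/O waits overlap on one event loop instead of taking 500ms in sequence.

```python
import asyncio
import time

async def fetch_track_metadata(track_id: int) -> dict:
    # stand-in for an I/O-bound call (DB query, R2 head_object, etc.)
    await asyncio.sleep(0.1)
    return {"id": track_id, "title": f"track {track_id}"}

async def main() -> list[dict]:
    start = time.perf_counter()
    # five 100ms waits overlap on the event loop instead of summing to 500ms
    tracks = await asyncio.gather(*(fetch_track_metadata(i) for i in range(5)))
    elapsed = time.perf_counter() - start
    assert elapsed < 0.5, "waits overlapped rather than running sequentially"
    return tracks

if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.gather` also preserves submission order, which matters when results map back to an ordered track list.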

**why anyio.Path over thread pools?**
- true async I/O: `anyio` uses OS-level async file operations where available
- constant memory: chunked reads/writes (64KB) prevent OOM on large files
- thread pools: would work, but are less efficient and add context switching
- trade-off: the anyio API differs slightly from stdlib `pathlib`, but has cleaner async semantics

## cost structure

current monthly costs: ~$5-6

- cloudflare pages: $0 (free tier)
- cloudflare R2: ~$0.16 (storage + operations, no egress fees)
- fly.io production: $5.00 (2x shared-cpu-1x VMs with auto-stop)
- fly.io staging: $0 (auto-stop, only runs during testing)
- neon: $0 (free tier, 0.5 CPU, 512MB RAM, 3GB storage)
- logfire: $0 (free tier)
- domain: $12/year (~$1/month)
## deployment URLs

- **production frontend**: https://plyr.fm
- **production backend**: https://relay-api.fly.dev (redirects to https://api.plyr.fm)
- **staging backend**: https://api-stg.plyr.fm
- **staging frontend**: https://stg.plyr.fm
- **repository**: https://github.com/zzstoatzz/plyr.fm (private)
- **monitoring**: https://logfire-us.pydantic.dev/zzstoatzz/relay
- **bluesky**: https://bsky.app/profile/plyr.fm
- **latest release**: 2025.1129.214811
## health indicators

**production status**: ✅ healthy
- uptime: consistently available
- response times: <500ms p95 for API endpoints
- error rate: <1% (mostly invalid OAuth states)
- storage: ~12 tracks uploaded, functioning correctly

**key metrics**
- total tracks: ~12
- total artists: ~3
- play counts: tracked per-track
- storage used: <1GB R2
- database size: <10MB postgres
## next session prep

**context for new agent:**
1. fixed R2 image upload path mismatch, ensuring images save with the correct prefix
2. implemented UI changes for the embed player: removed the queue button and matched fonts to the main app
3. opened a draft PR to the upstream social-app repository for native plyr.fm embed support
4. updated issue #153 (transcoding pipeline) with a clear roadmap for integration into the backend
5. developed a local verification script for the transcoder service for faster local iteration

**useful commands:**
- `just backend run` - run backend locally
- `just frontend dev` - run frontend locally
- `just test` - run test suite (from `backend/` directory)
- `gh issue list` - check open issues
## admin tooling

### content moderation
script: `scripts/delete_track.py`
- requires `ADMIN_*` prefixed environment variables
- deletes audio file from R2
- deletes cover image from R2 (if it exists)
- deletes database record (cascades to likes and queue entries)
- notes ATProto records for manual cleanup (can't delete from other users' PDS)

usage:
```bash
# dry run
uv run scripts/delete_track.py <track_id> --dry-run

# delete with confirmation
uv run scripts/delete_track.py <track_id>

# delete without confirmation
uv run scripts/delete_track.py <track_id> --yes

# by URL
uv run scripts/delete_track.py --url https://plyr.fm/track/34
```

required environment variables:
- `ADMIN_DATABASE_URL` - production database connection
- `ADMIN_AWS_ACCESS_KEY_ID` - R2 access key
- `ADMIN_AWS_SECRET_ACCESS_KEY` - R2 secret
- `ADMIN_R2_ENDPOINT_URL` - R2 endpoint
- `ADMIN_R2_BUCKET` - R2 bucket name
## known issues

### non-blocking
- cloudflare pages preview URLs return 404 (production works fine)
- some "relay" references remain in docs and comments
- ATProto like records can't be deleted when removing tracks (orphaned on users' PDS)
## for new contributors

### getting started
1. clone: `gh repo clone zzstoatzz/plyr.fm`
2. install dependencies: `uv sync && cd frontend && bun install`
3. run backend: `uv run uvicorn backend.main:app --reload`
4. run frontend: `cd frontend && bun run dev`
5. visit http://localhost:5173

### development workflow
1. create issue on github
2. create PR from feature branch
3. ensure pre-commit hooks pass
4. test locally
5. merge to main → deploys to staging automatically
6. verify on staging
7. create github release → deploys to production automatically

### key principles
- type hints everywhere
- lowercase aesthetic
- generic terminology (use "items" not "tracks" where appropriate)
- ATProto first
- mobile matters
- cost conscious
- async everywhere (no blocking I/O)
### project structure
```
plyr.fm/
├── backend/           # FastAPI app & Python tooling
│   ├── src/backend/   # application code
│   │   ├── api/       # public endpoints
│   │   ├── _internal/ # internal services
│   │   ├── models/    # database schemas
│   │   └── storage/   # storage adapters
│   ├── tests/         # pytest suite
│   └── alembic/       # database migrations
├── frontend/          # SvelteKit app
│   ├── src/lib/       # components & state
│   └── src/routes/    # pages
├── moderation/        # Rust moderation service (ATProto labeler)
│   ├── src/           # Axum handlers, AuDD client, label signing
│   └── static/        # admin UI (html/css/js)
├── transcoder/        # Rust audio transcoding service
├── docs/              # documentation
└── justfile           # task runner (mods: backend, frontend, moderation, transcoder)
```
## documentation

- [deployment overview](docs/deployment/overview.md)
- [configuration guide](docs/configuration.md)
- [queue design](docs/queue-design.md)
- [logfire querying](docs/logfire-querying.md)
- [pdsx guide](docs/pdsx-guide.md)
- [neon mcp guide](docs/neon-mcp-guide.md)
## performance optimization session (Nov 12, 2025)

### issue: slow /tracks/liked endpoint

**symptoms**:
- `/tracks/liked` taking 600-900ms consistently
- only ~25ms spent in database queries
- mysterious 575ms gap with no spans in Logfire traces
- endpoint felt sluggish compared to other pages

**investigation**:
- examined Logfire traces for `/tracks/liked` requests
- found 5-6 liked tracks being returned per request
- DB queries completing fast (track data, artist info, like counts all under 10ms each)
- noticed R2 storage calls weren't appearing in traces despite taking the majority of request time

**root cause**:
- PR #184 added an `image_url` column to the tracks table to eliminate N+1 R2 API calls
- new tracks (uploaded after the PR) have `image_url` populated at upload time ✅
- legacy tracks (15 tracks uploaded before the PR) had `image_url = NULL` ❌
- fallback code called `track.get_image_url()` for NULL values
- `get_image_url()` makes uninstrumented R2 `head_object` API calls to find image extensions
- each track with NULL `image_url` = ~100-120ms of R2 API calls per request
- 5 tracks × 120ms = ~600ms of uninstrumented latency

**why R2 calls weren't visible**:
- the `storage.get_url()` method had no Logfire instrumentation
- R2 API calls were happening but not creating spans
- appeared as a mysterious gap in the trace timeline

**solution implemented**:
1. created `scripts/backfill_image_urls.py` to populate missing `image_url` values
2. ran the script against the production database with production R2 credentials
3. backfilled 11 tracks successfully (4 already done in a previous partial run)
4. 3 tracks "failed" because they legitimately have no images (cover art is optional, so this is expected)
5. script uses concurrent `asyncio.gather()` for performance

**key learning: environment configuration matters**:
- initial script runs failed silently because:
  - the script used local `.env` credentials (dev R2 bucket)
  - production images are stored in a different R2 bucket (`images-prod`)
  - `get_url()` returned `None` when images weren't found in the dev bucket
- fix: passed production R2 credentials via environment variables:
  - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
  - `R2_IMAGE_BUCKET=images-prod`
  - `R2_PUBLIC_IMAGE_BUCKET_URL=https://pub-7ea7ea9a6f224f4f8c0321a2bb008c5a.r2.dev`

**results**:
- before: 15 tracks needed backfill, causing ~600-900ms latency on `/tracks/liked`
- after: 13 tracks populated with `image_url`, 3 legitimately have no images
- `/tracks/liked` now loads with 0 R2 API calls instead of 5-11
- endpoint feels "really, really snappy" (user feedback)
- performance improvement visible immediately after backfill

**database cleanup: queue_state table bloat**:
- discovered `queue_state` had 265% bloat (53 dead rows, 20 live rows)
- ran `VACUUM (FULL, ANALYZE) queue_state` against production
- result: 0 dead rows, table clean
- autovacuum tuning for `queue_state` still needed to prevent future bloat:
  - frequent updates make this table prone to bloat
  - should tune `autovacuum_vacuum_scale_factor` down to 0.05 (5% vs the 20% default)

**endpoint performance snapshot** (post-fix, last 10 minutes):
- `GET /tracks/`: 410ms (down from 2+ seconds)
- `GET /queue/`: 399ms (down from 2+ seconds)
- `GET /tracks/liked`: now sub-200ms (down from 600-900ms)
- `GET /preferences/`: 200ms median
- `GET /auth/me`: 114ms median
- `POST /tracks/{track_id}/play`: 34ms

**PR #184 context**:
- the PR claimed "opportunistic backfill: legacy records update on first access"
- but the actual implementation never saved the computed `image_url` back to the database
- fallback code only computed URLs on demand, didn't persist them
- this is why repeated visits kept hitting the R2 API for the same tracks
- a one-time backfill script was the correct solution vs adding write logic to read endpoints

**graceful ATProto recovery (PR #180)**:
- reviewed recent work on handling tracks with missing `atproto_record_uri`
- 4 tracks in production have NULL ATProto records (expected from upload failures)
- system already handles this gracefully:
  - like buttons disabled with helpful tooltips
  - track owners can self-service restore via the portal
  - `restore-record` endpoint recreates records with correct TID timestamps
- no action needed - existing recovery system working as designed

**performance metrics pre/post all recent PRs**:
- PR #184 (image_url storage): eliminated hundreds of R2 API calls per request
- today's backfill: eliminated remaining R2 calls for legacy tracks
- combined impact: queue/tracks endpoints now 5-10x faster than before PR #184
- all endpoints now consistently sub-second response times

**documentation created**:
- `docs/neon-mcp-guide.md`: comprehensive guide for using Neon MCP
  - project/branch management
  - database schema inspection
  - SQL query patterns for plyr.fm
  - connection string generation
  - environment mapping (dev/staging/prod)
  - debugging workflows
- `scripts/backfill_image_urls.py`: reusable for any future image_url gaps
  - dry-run mode for safety
  - concurrent R2 API calls
  - detailed error logging
  - production-tested

**tools and patterns established**:
- Neon MCP for database inspection and queries
- Logfire arbitrary queries for performance analysis
- production secret management via Fly.io
- `flyctl ssh console` for environment inspection
- backfill scripts with dry-run mode
- environment variable overrides for production operations

**system health indicators**:
- ✅ no 5xx errors in recent spans
- ✅ database queries all under 70ms p95
- ✅ SSL connection pool issues resolved (no errors in recent traces)
- ✅ queue_state table bloat eliminated
- ✅ all track images either in DB or legitimately NULL
- ✅ application feels fast and responsive

**next steps**:
1. configure autovacuum for the `queue_state` table (prevent future bloat)
2. add Logfire instrumentation to `storage.get_url()` for visibility
3. monitor `/tracks/liked` performance over the next few days
4. consider a similar backfill pattern for any future column additions

---

### copyright moderation system (PRs #382, #384, Nov 29-30, 2025)

**motivation**: detect potential copyright violations in uploaded tracks to avoid DMCA issues and protect the platform.

**what shipped**:
- **moderation service** (Rust/Axum on Fly.io):
  - standalone service at `plyr-moderation.fly.dev`
  - integrates with the AuDD enterprise API for audio fingerprinting
  - scans audio URLs and returns matches with metadata (artist, title, album, ISRC, timecode)
  - auth via `X-Moderation-Key` header
- **backend integration** (PR #382):
  - `ModerationSettings` in config (service URL, auth token, timeout)
  - moderation client module (`backend/_internal/moderation.py`)
  - fire-and-forget background task on track upload
  - stores results in the `copyright_scans` table
  - scan errors stored as "clear" so tracks aren't stuck unscanned
- **flagging fix** (PR #384):
  - the AuDD enterprise API returns no confidence scores (all 0)
  - changed from a score threshold to presence-based flagging: `is_flagged = !matches.is_empty()`
  - removed the unused `score_threshold` config
- **backfill script** (`scripts/scan_tracks_copyright.py`):
  - scans existing tracks that haven't been checked
  - `--max-duration` flag to skip long DJ sets (estimated from file size)
  - `--dry-run` mode to preview what would be scanned
  - supports dev/staging/prod environments
- **review workflow**:
  - `copyright_scans` table has `resolution`, `reviewed_at`, `reviewed_by`, `review_notes` columns
  - resolution values: `violation`, `false_positive`, `original_artist`
  - SQL queries for dashboard: flagged tracks, unreviewed flags, violations list

**initial review results** (25 flagged tracks):
- 8 violations (actual copyright issues)
- 11 false positives (fingerprint noise)
- 6 original artists (people uploading their own distributed music)

**impact**:
- automated copyright detection on upload
- manual review workflow for flagged content
- protection against DMCA takedown requests
- clear audit trail with resolution status

---

### platform stats and media session integration (PRs #359-379, Nov 27-29, 2025)

**motivation**: show platform activity at a glance, improve the playback experience across devices, and give users control over their data.

**what shipped**:
- **platform stats endpoint and UI** (PRs #376, #378, #379):
  - `GET /stats` returns total plays, tracks, and artists
  - stats bar displays in the homepage header (e.g., "1,691 plays • 55 tracks • 8 artists")
  - skeleton loading animation while fetching
  - responsive layout: visible in the header on wide screens, collapses into the menu on narrow ones
  - end-of-list animation on the homepage
- **Media Session API** (PR #371):
  - provides track metadata to CarPlay, lock screens, Bluetooth devices, and macOS control center
  - artwork display with fallback to the artist avatar
  - play/pause, prev/next, and seek controls all work from system UI
  - position state syncs scrubbers on external interfaces
- **browser tab title** (PR #374):
  - shows "track - artist • plyr.fm" while playing
  - persists across page navigation
  - reverts to the page title when playback stops
- **timed comments** (PR #359):
  - comments capture a timestamp when added during playback
  - clickable timestamp buttons seek to that moment
  - compact scrollable comments section on track pages
- **constellation integration** (PR #360):
  - queries the constellation.microcosm.blue backlink index
  - enables network-wide like counts (not just plyr.fm internal)
  - environment-aware namespace handling
- **account deletion** (PR #363):
  - explicit confirmation flow (type your handle to confirm)
  - deletes all plyr.fm data (tracks, albums, likes, comments, preferences)
  - optional ATProto record cleanup with clear warnings about orphaned references
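
the stats bar string above can be produced with a one-line formatter; this is an illustrative sketch (the real rendering lives in the SvelteKit frontend, and `format_stats` is a hypothetical name):

```python
def format_stats(plays: int, tracks: int, artists: int) -> str:
    """render platform totals in the header style shown above,
    with thousands separators and bullet delimiters."""
    return f"{plays:,} plays • {tracks:,} tracks • {artists:,} artists"

# format_stats(1691, 55, 8) -> "1,691 plays • 55 tracks • 8 artists"
```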
911
+
912
+
**impact**:
913
+
- platform stats give visitors immediate sense of activity
914
+
- media session makes plyr.fm tracks controllable from car/lock screen/control center
915
+
- timed comments enable discussion at specific moments in tracks
916
+
- account deletion gives users full control over their data
917
+
918
+
---
919
+
920
+

### developer tokens with independent OAuth grants (PR #367, Nov 28, 2025)

**motivation**: programmatic API access (scripts, CLIs, automation) needed tokens that survive browser logout and don't go stale when browser sessions refresh.

**what shipped**:
- **OAuth-based dev tokens**: each developer token gets its own OAuth authorization flow
  - user clicks "create token" → redirected to their PDS for authorization → token created with independent credentials
  - tokens have their own DPoP keypair and access/refresh tokens - completely separate from the browser session
- **cookie isolation**: dev token exchange doesn't set a browser cookie
  - added an `is_dev_token` flag to the ExchangeToken model
  - `/auth/exchange` skips Set-Cookie for dev token flows
  - prevents logout from deleting dev tokens (critical bug fixed during implementation)
- **token management UI**: portal → "your data" → "developer tokens"
  - create with optional name and expiration (30/90/180/365 days or never)
  - list active tokens with creation/expiration dates
  - revoke individual tokens
- **API endpoints**:
  - `POST /auth/developer-token/start` - initiates the OAuth flow, returns auth_url
  - `GET /auth/developer-tokens` - list the user's tokens
  - `DELETE /auth/developer-tokens/{prefix}` - revoke by 8-char prefix

**security properties**:
- tokens are full sessions with encrypted OAuth credentials (Fernet)
- each token refreshes independently (no staleness from browser session refresh)
- revocable individually without affecting the browser or other tokens
- explicit OAuth consent required at the PDS for each token created

**testing verified**:
- created token → uploaded track → logged out → deleted track with token ✓
- browser logout doesn't affect dev tokens ✓
- token works across browser sessions ✓
- staging deployment tested end-to-end ✓

**documentation**: see the "developer tokens" section of `docs/authentication.md`

---

### oEmbed endpoint for Leaflet.pub embeds (PRs #355-358, Nov 25, 2025)

**motivation**: plyr.fm tracks embedded in Leaflet.pub (via iframely) showed a black HTML5 audio box instead of our custom embed player.

**what shipped**:
- **oEmbed endpoint** (PR #355): `/oembed` returns proper embed HTML with an iframe
  - follows the oEmbed spec with `type: "rich"` and an iframe in the `html` field
  - discovery link in the track page `<head>` for automatic detection
- **iframely domain registration**: registered plyr.fm on iframely.com (free tier)
  - this was the key fix - iframely now returns our embed iframe as `links.player[0]`
  - API key: stored in 1password (iframely account)
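
for reference, a spec-shaped rich oEmbed payload looks roughly like this (an illustrative sketch, not the actual FastAPI handler; the width/height defaults are hypothetical, though the embed URL pattern matches the track URLs mentioned below):

```python
def oembed_response(track_id: int, title: str, width: int = 600, height: int = 180) -> dict:
    """build a rich-type oEmbed payload for a track page."""
    embed_url = f"https://plyr.fm/embed/track/{track_id}"
    return {
        "version": "1.0",  # required by the oEmbed spec
        "type": "rich",    # rich type: the html field carries the embed markup
        "provider_name": "plyr.fm",
        "provider_url": "https://plyr.fm",
        "title": title,
        "width": width,
        "height": height,
        "html": (
            f'<iframe src="{embed_url}" width="{width}" height="{height}" '
            f'frameborder="0" allow="autoplay"></iframe>'
        ),
    }
```

consumers like iframely read `html` directly, which is why registration (below) was the missing piece rather than the payload shape.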

**debugging journey** (PRs #356-358):
- initially tried `og:video` meta tags to hint at an iframe embed - didn't work
- tried removing `og:audio` to force an oEmbed fallback - resulted in no player link at all
- discovered iframely requires domain registration before it trusts oEmbed providers
- after registration, iframely correctly returns the embed iframe URL

**current state**:
- oEmbed endpoint working: `curl https://api.plyr.fm/oembed?url=https://plyr.fm/track/92`
- iframely returns `links.player[0].href = "https://plyr.fm/embed/track/92"` (our embed)
- Leaflet.pub should show proper embeds (pending their cache expiry)

**impact**:
- plyr.fm tracks can be embedded in Leaflet.pub and other iframely-powered services
- proper embed player with cover art instead of raw HTML5 audio

---

older history has been archived to the .status_history/ directory.

### export & upload reliability (PRs #337-344, Nov 24, 2025)

**motivation**: exports were failing silently on large files (OOM), uploads showed incorrect progress, and SSE connections triggered false error toasts.

**what shipped**:
- **database-backed jobs** (PR #337): moved upload/export tracking from in-memory state to postgres
  - jobs table persists state across server restarts
  - enables reliable progress tracking via SSE polling
- **streaming exports** (PR #343): fixed OOM on large file exports
  - previously loaded entire files into memory via `response["Body"].read()`
  - now streams to temp files and adds them to the zip from disk (constant memory)
  - 90-minute WAV files now export successfully on a 1GB VM
- **progress tracking fix** (PR #340): upload progress was receiving bytes but treating them as a percentage
  - `UploadProgressTracker` now properly converts bytes to a percentage
  - the upload progress bar works correctly again
- **UX improvements** (PRs #338-339, #341-342, #344):
  - export filename now includes the date (`plyr-tracks-2025-11-24.zip`)
  - toast notification on track deletion
  - fixed a false "lost connection" error when SSE completes normally
  - progress now shows "downloading track X of Y" instead of a confusing count

**impact**:
- exports work for arbitrarily large files (limited by disk, not RAM)
- upload progress displays correctly
- job state survives server restarts
- clearer progress messaging during exports

---

this is a living document. last updated 2025-12-01 after ATProto labeler work.