#+TITLE: podcast manager
1#+PROPERTY: COOKIE_DATA recursive
2#+STARTUP: overview
3
4most of this is old, I need to rework it
5
6* design
7
8** frontend (packages/app)
9- http://localhost:7891
10- proxies ~/api~ and ~/sync~ to the backend in development
11- uses Dexie for local storage with sync plugin
12- custom sync replication implementation using PeerJS through the signalling server
13
14** backend (packages/server)
15- http://localhost:7890
16- serves ~/dist~ if the directory is present (see ~dist~ script)
17- serves ~/api~ for RSS caching proxy
18 - file-based routing under the api directory
19- serves ~/sync~ which is a ~peerjs~ signalling server
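
rough sketch of how that wiring could look, assuming an Express-style app with the ~peer~ package's ~ExpressPeerServer~ mounted at ~/sync~ (route shapes follow the bullets above; ~fetchThroughCache~ is a placeholder, not the real proxy code):

#+BEGIN_SRC typescript
import express from "express";
import { createServer } from "node:http";
import { ExpressPeerServer } from "peer";

// placeholder for the RSS caching proxy logic described below
declare function fetchThroughCache(url: string): Promise<string>;

const app = express();
const server = createServer(app);

// serve the built frontend when the directory is present (see the dist script)
app.use(express.static("dist"));

// RSS caching proxy; the real routes are file-based under the api directory
app.get("/api/feed", async (req, res) => {
  const url = String(req.query.url ?? "");
  const body = await fetchThroughCache(url);
  res.type("application/xml").send(body);
});

// peerjs signalling server mounted at /sync
app.use("/sync", ExpressPeerServer(server, { path: "/" }));

server.listen(7890);
#+END_SRC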
20
21** sync
22- each client keeps the full data set
23- dexie sync and observable let us stream change sets
24- we can publish the "latest" to all peers
25- on first pull, if not the first client, we can request a dump out of band
26
27*** rss feed data
- do we want to back up feed data?
29 - conceptually, this should be refetchable
30 - but feeds go away, and some will only show recent stories
31 - so yes, we'll need this
32 - but server side, we can dedupe
33 - content-addressed server-side cache?
34
35- server side does RSS pulling
36 - can feeds be marked private, such that they won't be pulled through the proxy?
37 - but then we require everything to be fetchable via cors
38 - client configured proxy settings?
39
40*** peer connection
41- on startup, check for current realm-id and key pair
42- if not present, ask to login or start new
43 - if login, run through the [[* pairing]] process
 - if start new, run through the [[* server backup]] process
45- use keypair to authenticate to server
46 - response includes list of active peers to connect
47- clients negotiate sync from there
48- an identity is a keypair and a realm
49
50- realm is uuid
51 - realm on the server is the socket connection for peer discovery
52 - keeps a list of verified public keys
53 - and manages the /current/ ~public-key->peer ids~ mapping
54 - realm on the client side is first piece of info required for sync
55 - when connecting to the signalling server, you present a realm, and a signed public key
56 - server accepts/rejects based on signature and current verified keys
57
58- a new keypair can create a realm
59
60- a new keypair can double sign an invitation
61 - invite = ~{ realm:, nonce:, not_before:, not_after:, authorizer: }~, signed with verified key
62 - exchanging an invite = ~{ invite: }~, signed with my key
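
rough sketch of those two message shapes (field names straight from the bullets above; ~signature~ is whatever encoding ends up wrapping the ~crypto.subtle.sign~ output):

#+BEGIN_SRC typescript
interface Invite {
  realm: string;       // realm uuid
  nonce: string;
  not_before: number;  // epoch ms
  not_after: number;   // epoch ms
  authorizer: string;  // fingerprint of the verified key that signs the invite
}

interface SignedInvite {
  invite: Invite;
  signature: string;   // signed with an already-verified realm key
}

interface InviteExchange {
  invite: SignedInvite;
  signature: string;   // signed with the joining device's new key
}
#+END_SRC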
63
64- on startup
65 - start stand-alone (no syncing required, usually the case on first-run)
66 - generate a keypair
67 - want server backup?
68 - sign a "setup" message with new keypair and send to the server
69 - server responds with a new realm, that this keypair is already verified for
70 - move along
71 - exchange invite to sync to other devices
72 - generate a keypair
73 - sign the exchange message with the invite and send to the server
74 - server verifies the invite
75 - adds the new public key to the peer list and publishes downstream
76 - move along
77
78***** standalone
79in this mode, there is no syncing. this is the most likely first-time run option.
80
81- generate a keypair on startup, so we have a stable fingerprint in the future
82- done
83
84***** pairing
in this mode, there is syncing to a named realm, but not necessarily any server resources consumed
86we don't need an email, since the server is just doing signalling and peer management
87
88- generate an invite from an existing verified peer
89 - ~{ realm:, not_before:, not_after:, inviter: peer.public_key }~
90 - sign that invitation from the existing verified peer
91
92- standalone -> paired
93 - get the invitation somehow (QR code?)
94 - sign an invite exchange with the standalone's public key
95 - send to server
96 - server verifies the invite
97 - adds the new public key to the peer list and publishes downstream
98
99***** server backup
100in this mode, there is syncing to a named realm by email.
101
the goal of server backup mode is to go from an email to a fully working client with the latest data, without needing any surviving clients around to participate in the sync.
103
104- generate a keypair on startup
105- sign a registration message sent to the server
106 - send a verification email
107 - if email/realm already exists, this is authorization
108 - if not, it's email validation
109 - server starts a realm and associates the public key
110 - server acts as a peer for the realm, and stores private data
111
112- since dexie is publishing change sets, we should be able to just store deltas
113- but we'll need to store _all_ deltas, unless we're materializing on the server side too
 - should we use an indexeddb shim so we can import/export from the server for a clean start?
115 - how much materialization does the server need?
116
117* ai instructions
118- when writing to the devlog, add tags to your entries specifying ~:ai:~ and what tool did it.
119- false starts and prototypes are in ~./devlog/~
120
121* notes and decision record [1/11]
122** architecture design (may 28-29) :ai:claude:
123
detailed notes are in [[./devlog/may-29.org]]
125key decisions and system design:
126
127*** sync model
128- device-specific records for playback state/queues to avoid conflicts
129- content-addressed server cache with deduplication
130- dual-JWT invitation flow for secure realm joining
131
132*** data structures
133- tag-based filtering system instead of rigid hierarchies
134- regex patterns for episode title parsing and organization
135- service worker caching with background download support
136
137*** core schemas
138**** client (dexie)
139- Channel/ChannelEntry for RSS feeds and episodes
140- PlayRecord/QueueItem scoped by deviceId
141- FilterView for virtual feed organization
142
143**** server (drizzle)
144- ContentStore for deduplicated content by hash
145- Realm/PeerConnection for sync authorization
146- HttpCache with health tracking and TTL
147
148*** push sync strategy
149- revision-based sync (just send revision ranges in push notifications)
150- background fetch API for large downloads where supported
151- graceful degradation to reactive caching
152
153*** research todos :ai:claude:
154
155**** sync and data management
156***** DONE identity and signature management
157***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
158***** TODO webrtc p2p sync implementation patterns and reliability
159***** TODO conflict resolution strategies for device-specific data in distributed sync
160***** TODO content-addressed deduplication algorithms for rss/podcast content
161**** client-side storage and caching
162***** TODO opfs storage limits and cleanup strategies for client-side caching
163***** TODO practical background fetch api limits and edge cases for podcast downloads
164**** automation and intelligence
165***** TODO llm-based regex generation for episode title parsing automation
166***** TODO push notification subscription management and realm authentication
167**** platform and browser capabilities
168***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
169***** TODO progressive web app installation and platform-specific behaviors
170
175
176** <2025-05-28 Wed>
getting everything set up
178
179the biggest open question I have is what sort of privacy/encryption guarantee I need. I want the server to be able to do things like cache and store feed data long-term.
180
181Is "if you want full privacy, self-host" valid?
182
183*** possibilities
184
185- fully PWA
186 - CON: cors, which would require a proxy anyway
187 - CON: audio analysis, llm based stuff for categorization, etc. won't work
188 - PRO: private as all get out
 - can still do WebRTC p2p sync for resiliency
190 - can still do server backups, if sync stream is encrypted, but no compaction would be available
191 - could do _explicit_ server backups as dump files
192
193- self hostable
194 - PRO: can do bunches of private stuff on the server, because if you don't want me to see it, do it elsewhere
195 - CON: hard for folk to use
196
197*** brainstorm :ai:claude:
198**** sync conflict resolution design discussion :ai:claude:
199
200discussed the sync architecture and dexie conflict handling:
201
202*dexie syncable limitations*:
203- logical clocks handle causally-related changes well
204- basic timestamp-based conflict resolution for concurrent updates
205- last-writer-wins for same field conflicts
206- no sophisticated CRDT or vector clock support
207
208*solutions for podcast-specific conflicts*:
209
210- play records: device-specific approach
211 - store separate ~play_records~ per ~device_id~
212 - each record: ~{ episode_id, device_id, position, completed, timestamp }~
213 - UI handles conflict resolution with "continue from X device?" prompts
214 - avoids arbitrary timestamp wins, gives users control
215
216- subscription trees
217 - store ~parent_path~ as single string field ("/Tech/Programming")
218 - simpler than managing folder membership tables
219 - conflicts still possible but contained to single field
220 - could store move operations as events for richer resolution
221
222*other sync considerations*:
223- settings/preferences: distinguish device-local vs global
224- bulk operations: "mark all played" can create duplicate operations
225- metadata updates: server RSS updates vs local renames
226- temporal ordering: recently played lists, queue reordering
227- storage limits: cleanup operations conflicting across devices
228- feed state: refresh timestamps, error states
229
230*approach*: prefer "events not state" pattern and device-specific records where semantic conflicts are likely
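
for the play-record case above, a sketch of what device-scoped resolution could look like (record shape copied from the bullet; everything else is illustrative):

#+BEGIN_SRC typescript
interface PlayRecord {
  episode_id: string;
  device_id: string;
  position: number;   // seconds into the episode
  completed: boolean;
  timestamp: number;  // epoch ms of last update
}

// instead of letting a timestamp silently win, surface the freshest record from
// *another* device so the UI can ask "continue from X device?"
function resumeCandidate(records: PlayRecord[], localDeviceId: string): PlayRecord | undefined {
  return records
    .filter((r) => r.device_id !== localDeviceId && !r.completed)
    .sort((a, b) => b.timestamp - a.timestamp)[0];
}
#+END_SRC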
231
232**** data model brainstorm :ai:claude:
233
234core entities designed with sync in mind:
235
236***** ~Feed~ :: RSS/podcast subscription
237- ~parent_path~ field for folder structure (eg. ~/Tech/Programming~)
238- ~is_private~ flag to skip server proxy
239- ~refresh_interval~ for custom update frequencies
240
241***** ~Episode~ :: individual podcast episodes
242- standard RSS metadata (guid, title, description, media url)
243- duration and file info for playback
244
245***** ~PlayRecord~ :: device-specific playback state
246- separate record per ~device_id~ to avoid timestamp conflicts
247- position, completed status, playback speed
248- UI can prompt "continue from X device?" for resolution
249
250***** ~QueueItem~ :: device-specific episode queue
251- ordered list with position field
252- ~device_id~ scoped to avoid queue conflicts
253
254***** ~Subscription~ :: feed membership settings
255- can be global or device-specific
256- auto-download preferences per device
257
258***** ~Settings~ :: split global vs device-local
259- theme, default speed = global
260- download path, audio device = device-local
261
262***** Event tables for complex operations:
263- ~FeedMoveEvent~ for folder reorganization
264- ~BulkMarkPlayedEvent~ for "mark all read" operations
265- better conflict resolution than direct state updates
266
267***** sync considerations
268- device identity established on first run
269- dexie syncable handles basic timestamp conflicts
270- prefer device-scoped records for semantic conflicts
271- event-driven pattern for bulk operations
272
273**** schema evolution from previous iteration :ai:claude:
274
reviewed existing schema from tmp/feed.ts - a well-designed foundation:
276
277***** keep from original
278- Channel/ChannelEntry naming and structure
279- ~refreshHP~ adaptive refresh system (much better than simple intervals)
280- rich podcast metadata (people, tags, enclosure, podcast object)
281- HTTP caching with etag/status tracking
282- epoch millisecond timestamps
283- ~hashId()~ approach for entry IDs
284
285***** add for multi-device sync
286- ~PlayState~ table (device-scoped position/completion)
287- Subscription table (with ~parentPath~ for folders, device-scoped settings)
288- ~QueueItem~ table (device-scoped episode queues)
289- Device table (identity management)
290
291***** migration considerations
292- existing Channel/ChannelEntry can be preserved
293- new tables are additive
294- ~fetchAndUpsert~ method works well with server proxy architecture
295- dexie sync vs rxdb - need to evaluate change tracking capabilities
296
297**** content-addressed caching for offline resilience :ai:claude:
298
299designed caching system for when upstream feeds fail/disappear, building on existing cache-schema.ts:
300
301***** server-side schema evolution (drizzle sqlite):
302- keep existing ~httpCacheTable~ design (health tracking, http headers, ttl)
303- add ~contentHash~ field pointing to deduplicated content
304- new ~contentStoreTable~: deduplicated blobs by sha256 hash
305- new ~contentHistoryTable~: url -> contentHash timeline with isLatest flag
306- reference counting for garbage collection
307
308***** client-side OPFS storage
309- ~/cache/content/{contentHash}.xml~ for raw feeds
310- ~/cache/media/{contentHash}.mp3~ for podcast episodes
311- ~LocalCacheEntry~ metadata tracks expiration and offline-only flags
312- maintains last N versions per feed for historical access
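
minimal OPFS sketch for that layout (directory names from the bullets above; error handling mostly omitted):

#+BEGIN_SRC typescript
async function writeCachedFeed(contentHash: string, xml: string): Promise<void> {
  const root = await navigator.storage.getDirectory();
  const cache = await root.getDirectoryHandle("cache", { create: true });
  const content = await cache.getDirectoryHandle("content", { create: true });
  const file = await content.getFileHandle(`${contentHash}.xml`, { create: true });
  const writable = await file.createWritable();
  await writable.write(xml);
  await writable.close();
}

async function readCachedFeed(contentHash: string): Promise<string | null> {
  try {
    const root = await navigator.storage.getDirectory();
    const cache = await root.getDirectoryHandle("cache");
    const content = await cache.getDirectoryHandle("content");
    const file = await content.getFileHandle(`${contentHash}.xml`);
    return await (await file.getFile()).text();
  } catch {
    return null; // not cached locally
  }
}
#+END_SRC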
313
314***** fetch strategy & fallback
3151. check local OPFS cache first (fastest)
3162. try server proxy ~/api/feed?url={feedUrl}~ (deduplicated)
3173. server checks ~contentHistory~, serves latest or fetches upstream
3184. server returns ~{contentHash, content, cached: boolean}~
3195. client stores with content hash as filename
3206. emergency mode: serve stale content when upstream fails
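
sketch of the server side of steps 2-4, with in-memory maps standing in for ~contentStoreTable~/~contentHistoryTable~ and the TTL/health checks from ~httpCacheTable~ left out:

#+BEGIN_SRC typescript
import { createHash } from "node:crypto";

const contentStore = new Map<string, string>();    // contentHash -> content
const contentHistory = new Map<string, string>();  // url -> latest contentHash

interface FeedResponse {
  contentHash: string;
  content: string;
  cached: boolean;
}

async function serveFeed(url: string): Promise<FeedResponse> {
  try {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`upstream ${res.status}`);
    const content = await res.text();
    const contentHash = createHash("sha256").update(content).digest("hex");

    contentStore.set(contentHash, content);  // dedupe: identical content hashes to the same key
    contentHistory.set(url, contentHash);    // becomes the new isLatest entry for this url
    return { contentHash, content, cached: false };
  } catch {
    // emergency mode: upstream failed, serve the latest stored version if any
    const latestHash = contentHistory.get(url);
    const stale = latestHash ? contentStore.get(latestHash) : undefined;
    if (latestHash && stale) return { contentHash: latestHash, content: stale, cached: true };
    throw new Error("feed unavailable and not cached");
  }
}
#+END_SRC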
321
322- preserves existing health tracking and HTTP caching logic
323- popular feeds cached once on server, many clients benefit
324- bandwidth savings via content hash comparison
325- historical feed state preservation (feeds disappear!)
326- true offline operation after initial sync
327
328** <2025-05-29 Thu> :ai:claude:
329e2e encryption and invitation flow design
330
331worked through the crypto and invitation architecture. key decisions:
332
333*** keypair strategy
334- use jwk format for interoperability (server stores public keys)
335- ed25519 for signing, separate x25519 for encryption if needed
336- zustand lazy initialization pattern: ~ensureKeypair()~ on first use
337- store private jwk in persisted zustand state
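
sketch of the lazy-init pattern with zustand's ~persist~ middleware (store name and shape are made up here):

#+BEGIN_SRC typescript
import { create } from "zustand";
import { persist } from "zustand/middleware";

interface IdentityState {
  publicJWK?: JsonWebKey;
  privateJWK?: JsonWebKey;
  ensureKeypair: () => Promise<JsonWebKey>; // resolves to the public JWK
}

export const useIdentity = create<IdentityState>()(
  persist(
    (set, get) => ({
      publicJWK: undefined,
      privateJWK: undefined,
      async ensureKeypair() {
        const existing = get().publicJWK;
        if (existing) return existing;

        const keypair = await crypto.subtle.generateKey({ name: "Ed25519" }, true, [
          "sign",
          "verify",
        ]);
        const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
        const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);
        set({ publicJWK, privateJWK });
        return publicJWK;
      },
    }),
    { name: "identity" } // persisted store key
  )
);
#+END_SRC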
338
339*** invitation flow: dual-jwt approach
340solved the chicken-and-egg problem of sharing encryption keys securely.
341
342**** qr code contains two signed jwts:
3431. invitation token: ~{iss: inviter_fingerprint, sub: invitation_id, purpose: "realm_invite"}~
3442. encryption key token: ~{iss: inviter_fingerprint, ephemeral_private: base64_key, purpose: "ephemeral_key"}~
345
346**** exchange process:
3471. invitee posts jwt1 + their public keys to ~/invitations~
3482. server verifies jwt1 signature against realm members
3493. if valid: adds invitee to realm, returns ~{realm_id, realm_members, encrypted_realm_key}~
3504. invitee verifies jwt2 signature against returned realm members
3515. invitee extracts ephemeral private key, decrypts realm encryption key
352
353**** security properties:
354- server never has decryption capability (missing ephemeral private key)
355- both jwts must be signed by verified realm member
356- if first exchange fails, second jwt is cryptographically worthless
357- atomic operation: identity added only if invitation valid
358- built-in expiration and tamper detection via jwt standard
359
360**** considered alternatives:
361- raw ephemeral keys in qr: simpler but no authenticity
362- ecdh key agreement: chicken-and-egg problem with public key exchange
363- server escrow: good but missing authentication layer
364- password-based: requires secure out-of-band sharing
365
366the dual-jwt approach provides proper authenticated invitations while maintaining e2e encryption properties.
367
368**** refined dual-jwt with ephemeral signing
369simplified the approach by using ephemeral key for second jwt signature:
370
371**setup**:
3721. inviter generates ephemeral keypair
3732. encrypts realm key with ephemeral private key
3743. posts to server: ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
375
376**qr code contains**:
377#+BEGIN_SRC json
378// JWT 1: signed with inviter's realm signing key
379{
380 "realm_id": "uuid",
381 "invitation_id": "uuid",
382 "iss": "inviter_fingerprint"
383}
384
385// JWT 2: signed with ephemeral private key
386{
387 "ephemeral_private": "base64_key",
388 "invitation_id": "uuid"
389}
390#+END_SRC
391
392**exchange flow**:
3931. submit jwt1 → server verifies against realm members → returns ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
3942. verify jwt2 signature using ~ephemeral_public~ from server response
3953. extract ~ephemeral_private~ from jwt2, decrypt realm key
396
397**benefits over previous version**:
398- no premature key disclosure (invitee keys shared via normal webrtc peering)
399- self-contained verification (ephemeral public key verifies jwt2)
400- cleaner separation of realm auth vs encryption key distribution
401- simpler flow (no need to return realm member list)
402
403**crypto verification principle**: digital signatures work as sign-with-private/verify-with-public, while encryption works as encrypt-with-public/decrypt-with-private. jwt2 verification uses signature verification, not decryption.
404
405**invitation flow diagram**:
406#+BEGIN_SRC mermaid
407sequenceDiagram
408 participant I as Inviter
409 participant S as Server
410 participant E as Invitee
411
412 Note over I: Generate ephemeral keypair
413 I->>I: ephemeral_private, ephemeral_public
414
415 Note over I: Encrypt realm key
416 I->>I: encrypted_realm_key = encrypt(realm_key, ephemeral_private)
417
418 I->>S: POST /invitations<br/>{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
419 S-->>I: OK
420
421 Note over I: Create JWTs for QR code
422 I->>I: jwt1 = sign({realm_id, invitation_id}, inviter_private)
423 I->>I: jwt2 = sign({ephemeral_private, invitation_id}, ephemeral_private)
424
425 Note over I,E: QR code contains [jwt1, jwt2]
426
427 E->>S: POST /invitations/exchange<br/>{jwt1}
428 Note over S: Verify jwt1 signature<br/>against realm members
429 S-->>E: {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
430
431 Note over E: Verify jwt2 signature<br/>using ephemeral_public
432 E->>E: verify_signature(jwt2, ephemeral_public)
433
434 Note over E: Extract key and decrypt
435 E->>E: ephemeral_private = decode(jwt2)
436 E->>E: realm_key = decrypt(encrypted_realm_key, ephemeral_private)
437
438 Note over E: Now member of realm!
439#+END_SRC
440
441**** jwk keypair generation and validation :ai:claude:
442
443discussed jwk vs raw crypto.subtle for keypair storage. since public keys need server storage for realm authorization, jwk is better for interoperability.
444
445**keypair generation**:
446#+BEGIN_SRC typescript
447const keypair = await crypto.subtle.generateKey(
448 { name: "Ed25519" },
449 true,
450 ["sign", "verify"]
451);
452
453const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
454const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);
455
// Resulting JWK shape (illustrative, not executable):
// {
//   "kty": "OKP",
//   "crv": "Ed25519",
//   "x": "base64url-encoded-public-key",
//   "d": "base64url-encoded-private-key"  // only in the private JWK
// }
463#+END_SRC
464
465**client validation**:
466#+BEGIN_SRC typescript
467function isValidEd25519PublicJWK(jwk: any): boolean {
468 return (
469 typeof jwk === 'object' &&
470 jwk.kty === 'OKP' &&
471 jwk.crv === 'Ed25519' &&
472 typeof jwk.x === 'string' &&
473 jwk.x.length === 43 && // base64url Ed25519 public key length
474 !jwk.d && // public key shouldn't have private component
 (!jwk.use || jwk.use === 'sig') // parenthesized so the && chain isn't broken by ||
476 );
477}
478
479async function validatePublicKey(publicJWK: JsonWebKey): Promise<CryptoKey | null> {
480 try {
481 if (!isValidEd25519PublicJWK(publicJWK)) return null;
482
483 const key = await crypto.subtle.importKey(
484 'jwk',
485 publicJWK,
486 { name: 'Ed25519' },
487 false,
488 ['verify']
489 );
490
491 return key;
492 } catch {
493 return null;
494 }
495}
496#+END_SRC
497
498**server validation (node.js)**:
499#+BEGIN_SRC typescript
500import { webcrypto } from 'node:crypto';
501
502async function validateClientPublicKey(publicJWK: JsonWebKey): Promise<boolean> {
503 try {
504 if (!isValidEd25519PublicJWK(publicJWK)) return false;
505
506 await webcrypto.subtle.importKey(
507 'jwk',
508 publicJWK,
509 { name: 'Ed25519' },
510 false,
511 ['verify']
512 );
513
514 return true;
515 } catch {
516 return false;
517 }
518}
519#+END_SRC
520
521**authentication flow**:
522#+BEGIN_SRC typescript
523// client signs message
524const authMessage = {
525 realm: 'uuid-here',
526 timestamp: Date.now(),
527 action: 'join'
528};
529
530const signature = await crypto.subtle.sign(
531 'Ed25519',
532 privateKey,
533 new TextEncoder().encode(JSON.stringify(authMessage))
534);
535
536// server verifies
537async function verifyAuth(req: AuthRequest): Promise<boolean> {
538 const publicKey = await webcrypto.subtle.importKey(
539 'jwk',
540 req.publicKey,
541 { name: 'Ed25519' },
542 false,
543 ['verify']
544 );
545
546 const messageBytes = new TextEncoder().encode(JSON.stringify(req.message));
547 const signatureBytes = new Uint8Array(req.signature);
548
549 return await webcrypto.subtle.verify(
550 'Ed25519',
551 publicKey,
552 signatureBytes,
553 messageBytes
554 );
555}
556#+END_SRC
557
558**** proposed schemas :ai:claude:
559
560***** client-side schema (dexie)
561
562#+BEGIN_SRC typescript
563// Core RSS/Podcast data (from your existing design)
564interface Channel {
565 id: string;
566 feedUrl: string;
567 htmlUrl?: string;
568 imageUrl?: string;
569 title?: string;
570 description?: string;
571 language?: string;
572 people?: Record<string, string>;
573 tags?: string[];
574
575 // Refresh management
576 refreshHP: number;
577 nextRefreshAt?: number;
578 lastRefreshAt?: number;
579 lastRefreshStatus?: string;
580 lastRefreshHttpStatus?: number;
581 lastRefreshHttpEtag?: string;
582
583 // Cache info
584 contentHash?: string;
585 lastFetchedAt?: number;
586}
587
588interface ChannelEntry {
589 id: string;
590 channelId: string;
591 guid: string;
592 title: string;
593 linkUrl?: string;
594 imageUrl?: string;
595 snippet?: string;
596 content?: string;
597
598 enclosure?: {
599 url: string;
600 type?: string;
601 length?: number;
602 };
603
604 podcast?: {
605 explicit?: boolean;
606 duration?: string;
607 seasonNum?: number;
608 episodeNum?: number;
609 transcriptUrl?: string;
610 };
611
612 publishedAt?: number;
613 fetchedAt?: number;
614}
615
616// Device-specific sync tables
617interface PlayRecord {
618 id: string;
619 entryId: string;
620 deviceId: string;
621 position: number;
622 duration?: number;
623 completed: boolean;
624 speed: number;
625 updatedAt: number;
626}
627
628interface Subscription {
629 id: string;
630 channelId: string;
631 deviceId?: string;
632 parentPath: string; // "/Tech/Programming"
633 autoDownload: boolean;
634 downloadLimit?: number;
635 isActive: boolean;
636 createdAt: number;
637 updatedAt: number;
638}
639
640interface QueueItem {
641 id: string;
642 entryId: string;
643 deviceId: string;
644 position: number;
645 addedAt: number;
646}
647
648interface Device {
649 id: string;
650 name: string;
651 platform: string;
652 lastSeen: number;
653}
654
655// Local cache metadata
656interface LocalCache {
657 id: string;
658 url: string;
659 contentHash: string;
660 filePath: string; // OPFS path
661 cachedAt: number;
662 expiresAt?: number;
663 size: number;
664 isOfflineOnly: boolean;
665}
666
667// Dexie schema
668const db = new Dexie('SkypodDB');
669db.version(1).stores({
670 channels: '&id, feedUrl, contentHash',
671 channelEntries: '&id, channelId, publishedAt',
672 playRecords: '&id, [entryId+deviceId], deviceId, updatedAt',
673 subscriptions: '&id, channelId, deviceId, parentPath',
674 queueItems: '&id, entryId, deviceId, position',
675 devices: '&id, lastSeen',
676 localCache: '&id, url, contentHash, expiresAt'
677});
678#+END_SRC
679
680***** server-side schema
681
682#+BEGIN_SRC typescript
683// Content-addressed cache
684interface ContentStore {
685 contentHash: string; // Primary key
686 content: Buffer; // Raw feed content
687 contentType: string;
688 contentLength: number;
689 firstSeenAt: number;
690 referenceCount: number;
691}
692
693interface ContentHistory {
694 id: string;
695 url: string;
696 contentHash: string;
697 fetchedAt: number;
698 isLatest: boolean;
699}
700
701// HTTP cache with health tracking (from your existing design)
702interface HttpCache {
703 key: string; // URL hash, primary key
704 url: string;
705
706 status: 'alive' | 'dead';
707 lastFetchedAt: number;
708 lastFetchError?: string;
709 lastFetchErrorStreak: number;
710
711 lastHttpStatus: number;
712 lastHttpEtag?: string;
713 lastHttpHeaders: Record<string, string>;
714 expiresAt: number;
715 expirationTtl: number;
716
717 contentHash: string; // Points to ContentStore
718}
719
720// Sync/auth tables
721interface Realm {
722 id: string; // UUID
723 createdAt: number;
724 verifiedKeys: string[]; // Public key list
725}
726
727interface PeerConnection {
728 id: string;
729 realmId: string;
730 publicKey: string;
731 lastSeen: number;
732 isOnline: boolean;
733}
734
735// Media cache for podcast episodes
736interface MediaCache {
737 contentHash: string; // Primary key
738 originalUrl: string;
739 mimeType: string;
740 fileSize: number;
741 content: Buffer;
742 cachedAt: number;
743 accessCount: number;
744}
745#+END_SRC
746
747**** episode title parsing for sub-feed groupings :ai:claude:
748
749*problem*: some podcast feeds contain multiple shows, need hierarchical organization within a feed
750
751*example*: "Apocalypse Players" podcast
752- episode title: "A Term of Art 6 - Winston's Hollow"
753- desired grouping: "Apocalypse Players > A Term of Art > 6 - Winston's Hollow"
754- UI shows sub-shows within the main feed
755
756***** approaches considered
757
7581. *manual regex patterns* (short-term solution)
759 - user provides regex with capture groups = tags
760 - reliable, immediate, user-controlled
761 - requires manual setup per feed
762
7632. *LLM-generated regex* (automation goal)
764 - analyze last 100 episode titles
765 - generate regex pattern automatically
766 - good balance of automation + reliability
767
7683. *NER model training* (experimental)
769 - train spacy model for episode title parsing
770 - current prototype: 150 labelled examples, limited success
771 - needs more training data to be viable
772
773***** data model implications
774
775- add regex pattern field to Channel/Feed
776- store extracted groupings as hierarchical tags on ~ChannelEntry~
777- maybe add grouping/series field to episodes
778
779***** plan
780
781*preference*: start with manual regex, evolve toward LLM automation
782
783*implementation design*:
784- if no title pattern: episodes are direct children of the feed
785- title pattern = regex with named capture groups + path template
786
787*example configuration*:
788- regex: ~^(?<series>[^0-9]+)\s*(?<episode>\d+)\s*-\s*(?<title>.+)$~
789- path template: ~{series} > Episode {episode} - {title}~
790- result: "A Term of Art 6 - Winston's Hollow" → "A Term of Art > Episode 6 - Winston's Hollow"
791
792*schema additions*:
793#+BEGIN_SRC typescript
794interface Channel {
795 // ... existing fields
796 titlePatterns?: Array<{
797 name: string; // "Main Episodes", "Bonus Content", etc.
798 regex: string; // named capture groups
799 pathTemplate: string; // interpolation template
800 priority: number; // order to try patterns (lower = first)
801 isActive: boolean; // can disable without deleting
802 }>;
803 fallbackPath?: string; // template for unmatched episodes
804}
805
806interface ChannelEntry {
807 // ... existing fields
808 parsedPath?: string; // computed from titlePattern
809 parsedGroups?: Record<string, string>; // captured groups
810 matchedPatternName?: string; // which pattern was used
811}
812#+END_SRC
813
814*pattern matching logic*:
8151. try patterns in priority order (lower number = higher priority)
8162. first matching pattern wins
8173. if no patterns match, use fallbackPath template (e.g., "Misc > {title}")
8184. if no fallbackPath, episode stays direct child of feed
819
820*example multi-pattern setup*:
821- Pattern 1: "Main Episodes" - ~^(?<series>[^0-9]+)\s*(?<episode>\d+)~ → ~{series} > Episode {episode}~
822- Pattern 2: "Bonus Content" - ~^Bonus:\s*(?<title>.+)~ → ~Bonus > {title}~
823- Fallback: ~Misc > {title}~
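
sketch of that matching logic, using the ~titlePatterns~ fields from the schema above:

#+BEGIN_SRC typescript
// tiny template interpolation: "{series} > Episode {episode} - {title}"
function interpolate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => values[key] ?? "");
}

function parseEntryTitle(channel: Channel, title: string) {
  const patterns = (channel.titlePatterns ?? [])
    .filter((p) => p.isActive)
    .sort((a, b) => a.priority - b.priority); // lower number = tried first

  for (const pattern of patterns) {
    const match = new RegExp(pattern.regex).exec(title);
    if (!match) continue;
    const groups = match.groups ?? {};
    return {
      matchedPatternName: pattern.name,
      parsedGroups: groups,
      parsedPath: interpolate(pattern.pathTemplate, { title, ...groups }),
    };
  }

  // nothing matched: fall back, or leave the episode as a direct child of the feed
  if (channel.fallbackPath) {
    return { parsedPath: interpolate(channel.fallbackPath, { title }) };
  }
  return {};
}
#+END_SRC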
824
825**** scoped tags and filter-based UI evolution :ai:claude:
826
827*generalization*: move from rigid hierarchies to tag-based filtering system
828
829*tag scoping*:
830- feed-level tags: "Tech", "Gaming", "D&D"
831- episode-level tags: from regex captures like "series:CriticalRole", "campaign:2", "type:main"
832- user tags: manual additions like "favorites", "todo"
833
834*UI as tag filtering*:
835- default view: all episodes grouped by feed
836- filter by ~series:CriticalRole~ → shows only CR episodes across all feeds
837- filter by ~type:bonus~ → shows bonus content from all podcasts
838- combine filters: ~series:CriticalRole AND type:main~ → main CR episodes only
839
840*benefits*:
841- no rigid hierarchy - users create their own views
842- regex patterns become automated episode taggers
843- same filtering system works for search, organization, queues
844- tags are syncable metadata, views are client-side
845
846*schema evolution*:
847#+BEGIN_SRC typescript
848interface Tag {
849 scope: 'feed' | 'episode' | 'user';
850 key: string; // "series", "type", "campaign"
851 value: string; // "CriticalRole", "bonus", "2"
852}
853
854interface ChannelEntry {
855 // ... existing
856 tags: Tag[]; // includes regex-generated + manual
857}
858
859interface FilterView {
860 id: string;
861 name: string;
862 folderPath: string; // "/Channels/Critical Role"
863 filters: Array<{
864 key: string;
865 value: string;
866 operator: 'equals' | 'contains' | 'not';
867 }>;
868 isDefault: boolean;
869 createdAt: number;
870}
871#+END_SRC
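
sketch of evaluating a view against an entry's tags (every filter has to pass):

#+BEGIN_SRC typescript
function matchesFilterView(entry: ChannelEntry, view: FilterView): boolean {
  return view.filters.every((filter) => {
    const values = entry.tags
      .filter((tag) => tag.key === filter.key)
      .map((tag) => tag.value);

    switch (filter.operator) {
      case "equals":
        return values.includes(filter.value);
      case "contains":
        return values.some((v) => v.includes(filter.value));
      case "not":
        return !values.includes(filter.value);
    }
  });
}
#+END_SRC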
872
873**** default UI construction and feed merging :ai:claude:
874
875*auto-generated views on subscribe*:
876- subscribe to "Critical Role" → creates ~/Channels/Critical Role~ folder
877- default filter view: ~feed:CriticalRole~ (shows all episodes from that feed)
878- user can customize, split into sub-views, or delete
879
880*smart view suggestions*:
881- after regex patterns generate tags, suggest splitting views
882- "I noticed episodes with ~series:Campaign2~ and ~series:Campaign3~ - create separate views?"
883- "Create view for ~type:bonus~ episodes?"
884
885*view management UX*:
886- right-click feed → "Split by series", "Split by type"
887- drag episodes between views to create manual filters
888- views can be nested: ~/Channels/Critical Role/Campaign 2/Main Episodes~
889
890*feed merging for multi-source shows*:
891problem: patreon feed + main show feed for same podcast
892
893#+BEGIN_EXAMPLE
894/Channels/
895 Critical Role/
896 All Episodes # merged view: feed:CriticalRole OR feed:CriticalRolePatreon
897 Main Feed # filter: feed:CriticalRole
898 Patreon Feed # filter: feed:CriticalRolePatreon
899#+END_EXAMPLE
900
901*deduplication strategy*:
902- episodes matched by ~guid~ or similar content hash
903- duplicate episodes get ~source:main,patreon~ tags
904- UI shows single episode with source indicators
905- user can choose preferred source for playback
906- play state syncs across all sources of same episode
907
908*feed relationship schema*:
909#+BEGIN_SRC typescript
910interface FeedGroup {
911 id: string;
912 name: string; // "Critical Role"
913 feedIds: string[]; // [mainFeedId, patreonFeedId]
914 mergeStrategy: 'guid' | 'title' | 'contentHash';
915 defaultView: FilterView;
916}
917
918interface ChannelEntry {
919 // ... existing
920 duplicateOf?: string; // points to canonical episode ID
921 sources: string[]; // feed IDs where this episode appears
922}
923#+END_SRC
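
sketch of the ~guid~ merge pass (the other ~mergeStrategy~ values would just swap the key function):

#+BEGIN_SRC typescript
// first entry seen for a guid becomes canonical; later ones are marked as duplicates
function mergeByGuid(entries: ChannelEntry[]): ChannelEntry[] {
  const canonicalByGuid = new Map<string, ChannelEntry>();

  for (const entry of entries) {
    const canonical = canonicalByGuid.get(entry.guid);
    if (!canonical) {
      canonicalByGuid.set(entry.guid, { ...entry, sources: [entry.channelId] });
    } else {
      canonical.sources.push(entry.channelId); // source indicator for the UI
      entry.duplicateOf = canonical.id;        // points at the canonical episode
    }
  }
  return [...canonicalByGuid.values()];
}
#+END_SRC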
924
925**per-view settings and state**:
926each filter view acts like a virtual feed with its own:
927- unread counts (episodes matching filter that haven't been played)
928- notification settings (notify for new episodes in this view)
929- muted state (hide notifications, mark as read automatically)
930- auto-download preferences (download episodes that match this filter)
931- play queue integration (add new episodes to queue)
932
933**use cases**:
934- mute "Bonus Content" view but keep notifications for main episodes
935- auto-download only "Campaign 2" episodes, skip everything else
936- separate unread counts: "5 unread in Main Episodes, 2 in Bonus"
937- queue only certain series automatically
938
939**schema additions**:
940#+BEGIN_SRC typescript
941interface FilterView {
942 // ... existing fields
943 settings: {
944 notificationsEnabled: boolean;
945 isMuted: boolean;
946 autoDownload: boolean;
947 autoQueue: boolean;
948 downloadLimit?: number; // max episodes to keep
949 };
950 state: {
951 unreadCount: number;
952 lastViewedAt?: number;
953 isCollapsed: boolean; // in sidebar
954 };
955}
956#+END_SRC
957
958*inheritance behavior*:
959- new filter views inherit settings from parent feed/group
960- user can override per-view
961- "mute all Critical Role" vs "mute only bonus episodes"
962
963**** client-side episode caching strategy :ai:claude:
964
965*architecture*: service worker-based transparent caching
966
967*flow*:
9681. audio player requests ~/audio?url={episodeUrl}~
9692. service worker intercepts request
9703. if present in cache (with Range header support):
971 - serve from cache
9724. else:
973 - let request continue to server (immediate playback)
974 - simultaneously start background fetch of full audio file
975 - when complete, broadcast "episode-cached" event
 - audio player catches the event and reloads its source → now uses the cached version
977
978**benefits**:
979- no playback interruption (streaming starts immediately)
980- seamless transition to cached version
981- Range header support for seeking/scrubbing
982- transparent to audio player implementation
983
984*implementation considerations*:
985- cache storage limits and cleanup policies
986- partial download resumption if interrupted
987- cache invalidation when episode URLs change
988- offline playback support
989- progress tracking for background downloads
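
rough sketch of the interception flow (cache name and broadcast channel are made up; real Range handling would need to slice the cached response rather than return it whole):

#+BEGIN_SRC typescript
self.addEventListener("fetch", (event) => {
  const url = new URL(event.request.url);
  if (url.pathname !== "/audio") return; // only intercept the audio proxy route

  event.respondWith(
    (async () => {
      const cache = await caches.open("episode-audio");
      const cached = await cache.match(event.request.url);
      if (cached) return cached;

      // stream immediately from the network...
      const streaming = fetch(event.request);

      // ...and separately fetch + cache the whole file in the background
      event.waitUntil(
        (async () => {
          const full = await fetch(event.request.url);
          if (full.ok) {
            await cache.put(event.request.url, full);
            new BroadcastChannel("episode-cache").postMessage({
              type: "episode-cache-complete",
              url: url.searchParams.get("url"),
            });
          }
        })()
      );

      return streaming;
    })()
  );
});
#+END_SRC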
990
991**schema additions**:
992#+BEGIN_SRC typescript
993interface CachedEpisode {
994 episodeId: string;
995 originalUrl: string;
996 cacheKey: string; // for cache API
997 fileSize: number;
998 cachedAt: number;
999 lastAccessedAt: number;
1000 downloadProgress?: number; // 0-100 for in-progress downloads
1001}
1002#+END_SRC
1003
1004**service worker events**:
1005- ~episode-cache-started~ - background download began
1006- ~episode-cache-progress~ - download progress update
1007- ~episode-cache-complete~ - ready to switch to cached version
1008- ~episode-cache-error~ - download failed, stay with streaming
1009
1010**background sync for proactive downloads**:
1011
1012**browser support reality**:
1013- Background Sync API: good support (Chrome/Edge, limited Safari)
1014- Periodic Background Sync: very limited (Chrome only, requires PWA install)
1015- Push notifications: good support, but requires user permission
1016
1017**hybrid approach**:
10181. **foreground sync** (reliable): when app is open, check for new episodes
10192. **background sync** (opportunistic): register sync event when app closes
10203. **push notifications** (fallback): server pushes "new episodes available"
10214. **manual sync** (always works): pull-to-refresh, settings toggle
1022
1023**implementation strategy**:
1024#+BEGIN_SRC typescript
1025// Register background sync when app becomes hidden
1026document.addEventListener('visibilitychange', () => {
1027 if (document.hidden && 'serviceWorker' in navigator) {
1028 navigator.serviceWorker.ready.then(registration => {
1029 return registration.sync.register('download-episodes');
1030 });
1031 }
1032});
1033
1034// Service worker handles sync event
1035self.addEventListener('sync', event => {
1036 if (event.tag === 'download-episodes') {
1037 event.waitUntil(syncEpisodes());
1038 }
1039});
1040#+END_SRC
1041
1042**realistic expectations**:
1043- iOS Safari: very limited background processing
1044- Android Chrome: decent background sync support
1045- Desktop: mostly works
1046- battery/data saver modes: disabled by OS
1047
1048**fallback strategy**: rely primarily on foreground sync + push notifications, treat background sync as nice-to-have enhancement
1049
1050**push notification sync workflow**:
1051
1052**server-side trigger**:
10531. server detects new episodes during RSS refresh
10542. check which users are subscribed to that feed
10553. send push notification with episode metadata payload
10564. notification wakes up service worker on client
1057
1058**service worker notification handler**:
1059#+BEGIN_SRC typescript
1060self.addEventListener('push', event => {
1061 const data = event.data?.json();
1062
1063 if (data.type === 'new-episodes') {
1064 event.waitUntil(
1065 // Start background download of new episodes
1066 downloadNewEpisodes(data.episodes)
1067 .then(() => {
1068 // Show notification to user
1069 return self.registration.showNotification('New episodes available', {
 body: `${data.episodes.length} new episodes downloaded`,
1071 icon: '/icon-192.png',
1072 badge: '/badge-72.png',
1073 tag: 'new-episodes',
1074 data: { episodeIds: data.episodes.map(e => e.id) }
1075 });
1076 })
1077 );
1078 }
1079});
1080
1081// Handle notification click
1082self.addEventListener('notificationclick', event => {
1083 event.notification.close();
1084
1085 // Open app to specific episode or feed
1086 event.waitUntil(
 clients.openWindow(`/episodes/${event.notification.data.episodeIds[0]}`)
1088 );
1089});
1090#+END_SRC
1091
1092**server push logic**:
1093- batch notifications (don't spam for every episode)
1094- respect user notification preferences from FilterView settings
1095- include episode metadata in payload to avoid round-trip
1096- throttle notifications (max 1 per feed per hour?)
1097
1098**user flow**:
10991. new episode published → server pushes notification
11002. service worker downloads episode in background
11013. user sees "New episodes downloaded" notification
11024. tap notification → opens app to new episode, ready to play offline
1103
1104*benefits*:
1105- true background downloading without user interaction
1106- works even when app is closed
1107- respects per-feed notification settings
1108
1109**push payload size constraints**:
1110- **limit**: ~4KB (4,096 bytes) across most services
1111- **practical limit**: ~3KB to account for service overhead
1112- **implications for episode metadata**:
1113
1114#+BEGIN_SRC json
1115{
1116 "type": "new-episodes",
1117 "episodes": [
1118 {
1119 "id": "ep123",
1120 "channelId": "ch456",
1121 "title": "Episode Title",
1122 "url": "https://...",
1123 "duration": 3600,
1124 "size": 89432112
1125 }
1126 ]
1127}
1128#+END_SRC
1129
1130**payload optimization strategies**:
1131- minimal episode metadata in push (id, url, basic info)
1132- batch multiple episodes in single notification
1133- full episode details fetched after service worker wakes up
1134- URL shortening for long episode URLs
1135- compress JSON payload if needed
1136
1137**alternative for large payloads**:
1138- push notification contains only "new episodes available" signal
1139- service worker makes API call to get full episode list
1140- trade-off: requires network round-trip but unlimited data
1141
1142**logical clock sync optimization**:
1143
1144much simpler approach using sync revisions:
1145
1146#+BEGIN_SRC json
1147{
1148 "type": "sync-available",
1149 "fromRevision": 12345,
1150 "toRevision": 12389,
1151 "changeCount": 8
1152}
1153#+END_SRC
1154
1155**service worker sync flow**:
11561. push notification wakes service worker with revision range
11572. service worker fetches ~/sync?from=12345&to=12389~
11583. server returns only changes in that range (episodes, feed updates, etc)
11594. service worker applies changes to local dexie store
11605. service worker queues background downloads for new episodes
11616. updates local revision to 12389
1162
1163**benefits of revision-based approach**:
1164- tiny push payload (just revision numbers)
1165- server can efficiently return only changes in range
1166- automatic deduplication (revision already applied = skip)
1167- works for any sync data (episodes, feed metadata, user settings)
1168- handles offline gaps gracefully (fetch missing revision ranges)
1169
1170**sync API response**:
1171#+BEGIN_SRC typescript
1172interface SyncResponse {
1173 fromRevision: number;
1174 toRevision: number;
1175 changes: Array<{
1176 type: 'episode' | 'channel' | 'subscription';
1177 operation: 'create' | 'update' | 'delete';
1178 data: any;
1179 revision: number;
1180 }>;
1181}
1182#+END_SRC
1183
1184**integration with episode downloads**:
1185- service worker processes sync changes
1186- identifies new episodes that match user's auto-download filters
1187- queues those for background cache fetching
1188- much more efficient than sending episode metadata in push payload
1189
1190**service worker processing time constraints**:
1191
1192**hard limits**:
1193- **30 seconds idle timeout**: service worker terminates after 30s of inactivity
1194- **5 minutes event processing**: single event/request must complete within 5 minutes
1195- **30 seconds fetch timeout**: individual network requests timeout after 30s
1196- **notification requirement**: push events MUST display notification before promise settles
1197
1198**practical implications**:
1199- sync API call (~/sync?from=X&to=Y~) must complete within 30s
1200- large episode downloads must be queued, not started immediately in push handler
1201- use ~event.waitUntil()~ to keep service worker alive during processing
1202- break large operations into smaller chunks
1203
1204**recommended push event flow**:
1205#+BEGIN_SRC typescript
1206self.addEventListener('push', event => {
1207 const data = event.data?.json();
1208
1209 event.waitUntil(
1210 // Must complete within 5 minutes total
1211 handlePushSync(data)
1212 .then(() => {
1213 // Required: show notification before promise settles
1214 return self.registration.showNotification('Episodes synced');
1215 })
1216 );
1217});
1218
1219async function handlePushSync(data) {
1220 // 1. Quick sync API call (< 30s)
 const response = await fetch(`/sync?from=${data.fromRevision}&to=${data.toRevision}`);
 const changes = await response.json();
1222
1223 // 2. Apply changes to dexie store (fast, local)
1224 await applyChangesToStore(changes);
1225
1226 // 3. Queue episode downloads for later (don't start here)
1227 await queueEpisodeDownloads(changes.newEpisodes);
1228
1229 // Total time: < 5 minutes, preferably < 30s
1230}
1231#+END_SRC
1232
1233*download strategy*: use push event for sync + queuing, separate background tasks for actual downloads
1234
1235*background fetch API for large downloads*:
1236
1237*progressive enhancement approach*:
1238#+BEGIN_SRC typescript
1239async function queueEpisodeDownloads(episodes) {
1240 for (const episode of episodes) {
 if ('serviceWorker' in navigator && 'BackgroundFetchManager' in self) {
1242 // Chrome/Edge: use Background Fetch API for true background downloading
1243 await navigator.serviceWorker.ready.then(registration => {
1244 return registration.backgroundFetch.fetch(
 `episode-${episode.id}`,
1246 episode.url,
1247 {
1248 icons: [{ src: '/icon-256.png', sizes: '256x256', type: 'image/png' }],
 title: `Downloading: ${episode.title}`,
1250 downloadTotal: episode.fileSize
1251 }
1252 );
1253 });
1254 } else {
1255 // Fallback: queue for reactive download (download while streaming)
1256 await queueReactiveDownload(episode);
1257 }
1258 }
1259}
1260
1261// Handle background fetch completion
self.addEventListener('backgroundfetchsuccess', event => {
 if (event.registration.id.startsWith('episode-')) {
1264 event.waitUntil(handleEpisodeDownloadComplete(event));
1265 }
1266});
1267#+END_SRC
1268
1269*browser support reality*:
1270- *Chrome/Edge*: Background Fetch API supported
1271- *Firefox/Safari*: not supported, fallback to reactive caching
1272- *mobile*: varies by platform and browser
1273
1274*benefits when available*:
1275- true background downloading (survives app close, browser close)
1276- built-in download progress UI
1277- automatic retry on network failure
1278- no service worker time limits during download
1279
1280*graceful degradation*:
1281- detect support, use when available
1282- fallback to reactive caching (download while streaming)
1283- user gets best experience possible on their platform
1284
1285*** research todos :ai:claude:
1286
1287high-level unanswered questions from architecture brainstorming:
1288
1289**** sync and data management
1290***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
1291***** TODO webrtc p2p sync implementation patterns and reliability
1292***** TODO conflict resolution strategies for device-specific data in distributed sync
1293***** TODO content-addressed deduplication algorithms for rss/podcast content
1294**** client-side storage and caching
1295***** TODO opfs storage limits and cleanup strategies for client-side caching
1296***** TODO practical background fetch api limits and edge cases for podcast downloads
1297**** automation and intelligence
1298***** TODO llm-based regex generation for episode title parsing automation
1299***** TODO push notification subscription management and realm authentication
1300**** platform and browser capabilities
1301***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
1302***** TODO progressive web app installation and platform-specific behaviors
1303
1304# Local Variables:
1305# org-hierarchical-todo-statistics: nil
1306# org-checkbox-hierarchical-statistics: nil
1307# End: