#+PROPERTY: COOKIE_DATA recursive
#+STARTUP: overview

most of this is old, I need to rework it

* design

** frontend (packages/app)
- http://localhost:7891
- proxies ~/api~ and ~/sync~ to the backend in development
- uses Dexie for local storage with sync plugin
- custom sync replication implementation using PeerJS through the signalling server

** backend (packages/server)
- http://localhost:7890
- serves ~/dist~ if the directory is present (see ~dist~ script)
- serves ~/api~ for RSS caching proxy
  - file-based routing under the api directory
- serves ~/sync~ which is a ~peerjs~ signalling server

** sync
- each client keeps the full data set
- dexie sync and observable let us stream change sets
- we can publish the "latest" to all peers
- on first pull, if not the first client, we can request a dump out of band

*** rss feed data
- do we want to backup feed data?
  - conceptually, this should be refetchable
  - but feeds go away, and some will only show recent stories
  - so yes, we'll need this
  - but server side, we can dedupe
    - content-addressed server-side cache?

- server side does RSS pulling
  - can feeds be marked private, such that they won't be pulled through the proxy?
    - but then we require everything to be fetchable via cors
  - client configured proxy settings?

*** peer connection
- on startup, check for current realm-id and key pair
- if not present, ask to login or start new
  - if login, run through the [[* pairing]] process
  - if start new, run through the [[* registration]] process
- use keypair to authenticate to server
  - response includes list of active peers to connect
- clients negotiate sync from there
- an identity is a keypair and a realm

- realm is uuid
  - realm on the server is the socket connection for peer discovery
    - keeps a list of verified public keys
    - and manages the /current/ ~public-key -> peer ids~ mapping
  - realm on the client side is the first piece of info required for sync
    - when connecting to the signalling server, you present a realm and a signed public key
    - server accepts/rejects based on signature and current verified keys

- a new keypair can create a realm

- a new keypair can double sign an invitation (see the sketch below)
  - invite = ~{ realm:, nonce:, not_before:, not_after:, authorizer: }~, signed with a verified key
  - exchanging an invite = ~{ invite: }~, signed with my key

- on startup
  - start stand-alone (no syncing required, usually the case on first-run)
    - generate a keypair
  - want server backup?
    - sign a "setup" message with the new keypair and send it to the server
    - server responds with a new realm that this keypair is already verified for
    - move along
  - exchange invite to sync to other devices
    - generate a keypair
    - sign the exchange message with the invite and send to the server
    - server verifies the invite
    - adds the new public key to the peer list and publishes downstream
    - move along
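
as a concrete shape for the double-signed invitation above, a minimal sketch (assuming WebCrypto Ed25519; ~createInvite~ / ~signExchange~ are illustrative names, not existing code):

#+BEGIN_SRC typescript
// Sketch only: the double-signing flow described above.
// Assumes WebCrypto Ed25519 support; helper names are illustrative.

interface Invite {
  realm: string;
  nonce: string;
  not_before: number;
  not_after: number;
  authorizer: string; // fingerprint of the verified (inviter) key
}

const encode = (value: unknown) =>
  new TextEncoder().encode(JSON.stringify(value));

// Inviter: sign the invite with an already-verified realm key.
async function createInvite(invite: Invite, verifiedPrivateKey: CryptoKey) {
  const signature = await crypto.subtle.sign("Ed25519", verifiedPrivateKey, encode(invite));
  return { invite, signature: new Uint8Array(signature) };
}

// Invitee: counter-sign the exchange with the new device's own key.
async function signExchange(
  signedInvite: Awaited<ReturnType<typeof createInvite>>,
  myPrivateKey: CryptoKey,
  myPublicJWK: JsonWebKey
) {
  const exchange = { invite: signedInvite, public_key: myPublicJWK };
  const signature = await crypto.subtle.sign("Ed25519", myPrivateKey, encode(exchange));
  return { exchange, signature: new Uint8Array(signature) };
}
#+END_SRC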

***** standalone
in this mode, there is no syncing. this is the most likely first-time run option.

- generate a keypair on startup, so we have a stable fingerprint in the future
- done

***** pairing
in this mode, clients sync to a named realm, but don't necessarily consume server resources.
we don't need an email, since the server is just doing signalling and peer management.

- generate an invite from an existing verified peer
  - ~{ realm:, not_before:, not_after:, inviter: peer.public_key }~
  - sign that invitation with the existing verified peer's key

- standalone -> paired
  - get the invitation somehow (QR code?)
  - sign an invite exchange with the standalone's public key
  - send to server
  - server verifies the invite
  - adds the new public key to the peer list and publishes downstream

***** server backup
in this mode, there is syncing to a named realm by email.

the goal of server backup mode is that we can go from email -> fully working client with the latest data, without needing any surviving clients that could participate in the sync.

- generate a keypair on startup
- sign a registration message sent to the server
  - send a verification email
    - if the email/realm already exists, this is authorization
    - if not, it's email validation
  - server starts a realm and associates the public key
  - server acts as a peer for the realm, and stores private data

- since dexie is publishing change sets, we should be able to just store deltas (a storage sketch follows below)
- but we'll need to store _all_ deltas, unless we're materializing on the server side too
  - should we use an indexeddb shim so we can import/export from the server for a clean start?
  - how much materialization does the server need?
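
purely as an assumption sketch for the "just store deltas" idea above (none of these field names are decided):

#+BEGIN_SRC typescript
// Sketch only: an append-only change-set log for the "store all deltas" idea.
// Field names are assumptions; nothing here is a decided schema.

interface ChangeSetRow {
  realmId: string;      // which realm this delta belongs to
  revision: number;     // monotonically increasing per realm
  deviceId: string;     // originating peer
  payload: Uint8Array;  // opaque (possibly encrypted) dexie change set
  createdAt: number;    // epoch millis
}

// Appending preserves ordering; the server never needs to interpret
// the payload unless it also materializes state.
function nextRevision(log: ChangeSetRow[], realmId: string): number {
  const latest = log
    .filter((row) => row.realmId === realmId)
    .reduce((max, row) => Math.max(max, row.revision), 0);
  return latest + 1;
}
#+END_SRC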

* ai instructions
- when writing to the devlog, add tags to your entries specifying ~:ai:~ and what tool did it.
- false starts and prototypes are in ~./devlog/~

* notes and decision record [1/11]
** architecture design (may 28-29) :ai:claude:

detailed notes are in [[./devlog/may-29.org]]
key decisions and system design:

*** sync model
- device-specific records for playback state/queues to avoid conflicts
- content-addressed server cache with deduplication
- dual-JWT invitation flow for secure realm joining

*** data structures
- tag-based filtering system instead of rigid hierarchies
- regex patterns for episode title parsing and organization
- service worker caching with background download support

*** core schemas
**** client (dexie)
- Channel/ChannelEntry for RSS feeds and episodes
- PlayRecord/QueueItem scoped by deviceId
- FilterView for virtual feed organization

**** server (drizzle)
- ContentStore for deduplicated content by hash
- Realm/PeerConnection for sync authorization
- HttpCache with health tracking and TTL

*** push sync strategy
- revision-based sync (just send revision ranges in push notifications)
- background fetch API for large downloads where supported
- graceful degradation to reactive caching

*** research todos :ai:claude:

**** sync and data management
***** DONE identity and signature management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

** <2025-05-28 Wed>
getting everything set up

the biggest open question I have is what sort of privacy/encryption guarantee I need. I want the server to be able to do things like cache and store feed data long-term.

Is "if you want full privacy, self-host" valid?

*** possibilities

- fully PWA
  - CON: cors, which would require a proxy anyway
  - CON: audio analysis, llm based stuff for categorization, etc. won't work
  - PRO: private as all get out
  - can still do WebRTC p2p sync for resiliency
  - can still do server backups, if the sync stream is encrypted, but no compaction would be available
  - could do _explicit_ server backups as dump files

- self hostable
  - PRO: can do bunches of private stuff on the server, because if you don't want me to see it, do it elsewhere
  - CON: hard for folks to use

*** brainstorm :ai:claude:
**** sync conflict resolution design discussion :ai:claude:

discussed the sync architecture and dexie conflict handling:

*dexie syncable limitations*:
- logical clocks handle causally-related changes well
- basic timestamp-based conflict resolution for concurrent updates
- last-writer-wins for same-field conflicts
- no sophisticated CRDT or vector clock support

*solutions for podcast-specific conflicts*:

- play records: device-specific approach
  - store separate ~play_records~ per ~device_id~
  - each record: ~{ episode_id, device_id, position, completed, timestamp }~
  - UI handles conflict resolution with "continue from X device?" prompts
  - avoids arbitrary timestamp wins, gives users control

- subscription trees
  - store ~parent_path~ as a single string field ("/Tech/Programming")
  - simpler than managing folder membership tables
  - conflicts still possible but contained to a single field
  - could store move operations as events for richer resolution

*other sync considerations*:
- settings/preferences: distinguish device-local vs global
- bulk operations: "mark all played" can create duplicate operations
- metadata updates: server RSS updates vs local renames
- temporal ordering: recently played lists, queue reordering
- storage limits: cleanup operations conflicting across devices
- feed state: refresh timestamps, error states

*approach*: prefer the "events not state" pattern and device-specific records where semantic conflicts are likely

**** data model brainstorm :ai:claude:

core entities designed with sync in mind:

***** ~Feed~ :: RSS/podcast subscription
- ~parent_path~ field for folder structure (e.g. ~/Tech/Programming~)
- ~is_private~ flag to skip server proxy
- ~refresh_interval~ for custom update frequencies

***** ~Episode~ :: individual podcast episodes
- standard RSS metadata (guid, title, description, media url)
- duration and file info for playback

***** ~PlayRecord~ :: device-specific playback state
- separate record per ~device_id~ to avoid timestamp conflicts
- position, completed status, playback speed
- UI can prompt "continue from X device?" for resolution
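
a minimal sketch of feeding that prompt from the device-scoped records (assuming the ~play_records~ shape above; ~resolvePlayback~ is an illustrative name):

#+BEGIN_SRC typescript
// Sketch only: choosing a resume position from device-scoped play records.
// Assumes the PlayRecord shape above; resolvePlayback is an illustrative name.

interface PlayRecord {
  episode_id: string;
  device_id: string;
  position: number;   // seconds
  completed: boolean;
  timestamp: number;  // epoch millis
}

// Instead of silently letting the latest timestamp win, surface the
// furthest-ahead *other* device so the UI can ask "continue from X?"
function resolvePlayback(records: PlayRecord[], localDeviceId: string) {
  const local = records.find((r) => r.device_id === localDeviceId);
  const remote = records
    .filter((r) => r.device_id !== localDeviceId && !r.completed)
    .sort((a, b) => b.position - a.position)[0];

  if (remote && (!local || remote.position > local.position)) {
    return { prompt: true, suggestion: remote, fallback: local };
  }
  return { prompt: false, suggestion: local, fallback: undefined };
}
#+END_SRC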

***** ~QueueItem~ :: device-specific episode queue
- ordered list with position field
- ~device_id~ scoped to avoid queue conflicts

***** ~Subscription~ :: feed membership settings
- can be global or device-specific
- auto-download preferences per device

***** ~Settings~ :: split global vs device-local
- theme, default speed = global
- download path, audio device = device-local

***** Event tables for complex operations:
- ~FeedMoveEvent~ for folder reorganization
- ~BulkMarkPlayedEvent~ for "mark all read" operations
- better conflict resolution than direct state updates

***** sync considerations
- device identity established on first run
- dexie syncable handles basic timestamp conflicts
- prefer device-scoped records for semantic conflicts
- event-driven pattern for bulk operations

**** schema evolution from previous iteration :ai:claude:

reviewed the existing schema from tmp/feed.ts - a well designed foundation:

***** keep from original
- Channel/ChannelEntry naming and structure
- ~refreshHP~ adaptive refresh system (much better than simple intervals)
- rich podcast metadata (people, tags, enclosure, podcast object)
- HTTP caching with etag/status tracking
- epoch millisecond timestamps
- ~hashId()~ approach for entry IDs

***** add for multi-device sync
- ~PlayState~ table (device-scoped position/completion)
- Subscription table (with ~parentPath~ for folders, device-scoped settings)
- ~QueueItem~ table (device-scoped episode queues)
- Device table (identity management)

***** migration considerations
- existing Channel/ChannelEntry can be preserved
- new tables are additive
- ~fetchAndUpsert~ method works well with the server proxy architecture
- dexie sync vs rxdb - need to evaluate change tracking capabilities

**** content-addressed caching for offline resilience :ai:claude:

designed a caching system for when upstream feeds fail/disappear, building on the existing cache-schema.ts:

***** server-side schema evolution (drizzle sqlite):
- keep the existing ~httpCacheTable~ design (health tracking, http headers, ttl)
- add a ~contentHash~ field pointing to deduplicated content
- new ~contentStoreTable~: deduplicated blobs by sha256 hash
- new ~contentHistoryTable~: url -> contentHash timeline with isLatest flag
- reference counting for garbage collection

***** client-side OPFS storage
- ~/cache/content/{contentHash}.xml~ for raw feeds
- ~/cache/media/{contentHash}.mp3~ for podcast episodes
- ~LocalCacheEntry~ metadata tracks expiration and offline-only flags
- maintains the last N versions per feed for historical access

***** fetch strategy & fallback
1. check local OPFS cache first (fastest)
2. try server proxy ~/api/feed?url={feedUrl}~ (deduplicated)
3. server checks ~contentHistory~, serves latest or fetches upstream
4. server returns ~{contentHash, content, cached: boolean}~
5. client stores with content hash as filename
6. emergency mode: serve stale content when upstream fails

a sketch of the client side of this chain follows below.

- preserves existing health tracking and HTTP caching logic
- popular feeds cached once on server, many clients benefit
- bandwidth savings via content hash comparison
- historical feed state preservation (feeds disappear!)
- true offline operation after initial sync
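
a minimal sketch of the client side of that chain (assuming OPFS via ~navigator.storage.getDirectory()~ and the ~/api/feed~ response shape above; error handling is elided):

#+BEGIN_SRC typescript
// Sketch only: client fetch chain for feeds: OPFS cache, then server proxy,
// then stale fallback. Assumes the /api/feed response shape described above.

interface FeedResponse {
  contentHash: string;
  content: string;
  cached: boolean;
}

async function contentDir(): Promise<FileSystemDirectoryHandle> {
  const root = await navigator.storage.getDirectory();
  const cache = await root.getDirectoryHandle("cache", { create: true });
  return cache.getDirectoryHandle("content", { create: true });
}

async function readCached(contentHash: string): Promise<string | null> {
  try {
    const dir = await contentDir();
    const handle = await dir.getFileHandle(`${contentHash}.xml`);
    return await (await handle.getFile()).text();
  } catch {
    return null; // not cached yet
  }
}

async function fetchFeed(feedUrl: string, knownHash?: string): Promise<string> {
  // 1. local OPFS cache first
  if (knownHash) {
    const cached = await readCached(knownHash);
    if (cached) return cached;
  }

  // 2-4. server proxy checks contentHistory, fetches upstream if needed
  const res = await fetch(`/api/feed?url=${encodeURIComponent(feedUrl)}`);
  if (!res.ok) {
    // 6. emergency mode: fall back to any stale copy we still have
    const stale = knownHash && (await readCached(knownHash));
    if (stale) return stale;
    throw new Error(`feed fetch failed: ${res.status}`);
  }
  const body: FeedResponse = await res.json();

  // 5. store under the content hash as filename
  const dir = await contentDir();
  const handle = await dir.getFileHandle(`${body.contentHash}.xml`, { create: true });
  const writable = await handle.createWritable();
  await writable.write(body.content);
  await writable.close();

  return body.content;
}
#+END_SRC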

** <2025-05-29 Thu> :ai:claude:
e2e encryption and invitation flow design

worked through the crypto and invitation architecture. key decisions:

*** keypair strategy
- use jwk format for interoperability (server stores public keys)
- ed25519 for signing, separate x25519 for encryption if needed
- zustand lazy initialization pattern: ~ensureKeypair()~ on first use
- store private jwk in persisted zustand state

*** invitation flow: dual-jwt approach
solved the chicken-and-egg problem of sharing encryption keys securely.

**** qr code contains two signed jwts:
1. invitation token: ~{iss: inviter_fingerprint, sub: invitation_id, purpose: "realm_invite"}~
2. encryption key token: ~{iss: inviter_fingerprint, ephemeral_private: base64_key, purpose: "ephemeral_key"}~

**** exchange process:
1. invitee posts jwt1 + their public keys to ~/invitations~
2. server verifies jwt1 signature against realm members
3. if valid: adds invitee to realm, returns ~{realm_id, realm_members, encrypted_realm_key}~
4. invitee verifies jwt2 signature against returned realm members
5. invitee extracts ephemeral private key, decrypts realm encryption key

**** security properties:
- server never has decryption capability (missing ephemeral private key)
- both jwts must be signed by verified realm member
- if first exchange fails, second jwt is cryptographically worthless
- atomic operation: identity added only if invitation valid
- built-in expiration and tamper detection via jwt standard

**** considered alternatives:
- raw ephemeral keys in qr: simpler but no authenticity
- ecdh key agreement: chicken-and-egg problem with public key exchange
- server escrow: good but missing authentication layer
- password-based: requires secure out-of-band sharing

the dual-jwt approach provides proper authenticated invitations while maintaining e2e encryption properties.

**** refined dual-jwt with ephemeral signing
simplified the approach by using the ephemeral key for the second jwt signature:

**setup**:
1. inviter generates ephemeral keypair
2. encrypts realm key with ephemeral private key
3. posts to server: ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~

**qr code contains**:
#+BEGIN_SRC json
// JWT 1: signed with inviter's realm signing key
{
  "realm_id": "uuid",
  "invitation_id": "uuid",
  "iss": "inviter_fingerprint"
}

// JWT 2: signed with ephemeral private key
{
  "ephemeral_private": "base64_key",
  "invitation_id": "uuid"
}
#+END_SRC

**exchange flow**:
1. submit jwt1 → server verifies against realm members → returns ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
2. verify jwt2 signature using ~ephemeral_public~ from the server response
3. extract ~ephemeral_private~ from jwt2, decrypt realm key

**benefits over previous version**:
- no premature key disclosure (invitee keys shared via normal webrtc peering)
- self-contained verification (ephemeral public key verifies jwt2)
- cleaner separation of realm auth vs encryption key distribution
- simpler flow (no need to return realm member list)

**crypto verification principle**: digital signatures work as sign-with-private/verify-with-public, while encryption works as encrypt-with-public/decrypt-with-private. jwt2 verification uses signature verification, not decryption.
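
the invitee side of the exchange flow above, as a minimal sketch (assuming the ~jose~ library for jwt verification; ~decryptRealmKey~ is deliberately left abstract since the realm-key encryption scheme isn't pinned down):

#+BEGIN_SRC typescript
// Sketch only: invitee side of the refined exchange flow (steps 1-3 above).
// Assumes the `jose` library and the response shape described above;
// decryptRealmKey is left abstract.
import { importJWK, jwtVerify, type JWK } from "jose";

interface ExchangeResponse {
  invitation_id: string;
  realm_id: string;
  ephemeral_public: JWK;
  encrypted_realm_key: string; // base64
}

declare function decryptRealmKey(
  encrypted: string,
  ephemeralPrivate: string
): Promise<CryptoKey>;

async function redeemInvitation(jwt1: string, jwt2: string) {
  // 1. submit jwt1; server verifies it against realm members
  const res = await fetch("/invitations/exchange", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jwt1 }),
  });
  if (!res.ok) throw new Error("invitation rejected");
  const exchange: ExchangeResponse = await res.json();

  // 2. verify jwt2 using the ephemeral public key from the response
  const ephemeralPublic = await importJWK(exchange.ephemeral_public, "EdDSA");
  const { payload } = await jwtVerify(jwt2, ephemeralPublic);

  // 3. extract the ephemeral private key and decrypt the realm key
  const realmKey = await decryptRealmKey(
    exchange.encrypted_realm_key,
    payload.ephemeral_private as string
  );
  return { realmId: exchange.realm_id, realmKey };
}
#+END_SRC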

**invitation flow diagram**:
#+BEGIN_SRC mermaid
sequenceDiagram
    participant I as Inviter
    participant S as Server
    participant E as Invitee

    Note over I: Generate ephemeral keypair
    I->>I: ephemeral_private, ephemeral_public

    Note over I: Encrypt realm key
    I->>I: encrypted_realm_key = encrypt(realm_key, ephemeral_private)

    I->>S: POST /invitations<br/>{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
    S-->>I: OK

    Note over I: Create JWTs for QR code
    I->>I: jwt1 = sign({realm_id, invitation_id}, inviter_private)
    I->>I: jwt2 = sign({ephemeral_private, invitation_id}, ephemeral_private)

    Note over I,E: QR code contains [jwt1, jwt2]

    E->>S: POST /invitations/exchange<br/>{jwt1}
    Note over S: Verify jwt1 signature<br/>against realm members
    S-->>E: {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}

    Note over E: Verify jwt2 signature<br/>using ephemeral_public
    E->>E: verify_signature(jwt2, ephemeral_public)

    Note over E: Extract key and decrypt
    E->>E: ephemeral_private = decode(jwt2)
    E->>E: realm_key = decrypt(encrypted_realm_key, ephemeral_private)

    Note over E: Now member of realm!
#+END_SRC

**** jwk keypair generation and validation :ai:claude:

discussed jwk vs raw crypto.subtle for keypair storage. since public keys need server storage for realm authorization, jwk is better for interoperability.

**keypair generation**:
#+BEGIN_SRC typescript
const keypair = await crypto.subtle.generateKey(
  { name: "Ed25519" },
  true,
  ["sign", "verify"]
);

const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);

// JWK format:
// {
//   "kty": "OKP",
//   "crv": "Ed25519",
//   "x": "base64url-encoded-public-key",
//   "d": "base64url-encoded-private-key" // only in private JWK
// }
#+END_SRC

**client validation**:
#+BEGIN_SRC typescript
function isValidEd25519PublicJWK(jwk: any): boolean {
  return (
    typeof jwk === 'object' &&
    jwk !== null &&
    jwk.kty === 'OKP' &&
    jwk.crv === 'Ed25519' &&
    typeof jwk.x === 'string' &&
    jwk.x.length === 43 && // base64url Ed25519 public key length
    !jwk.d && // public key shouldn't have private component
    (!jwk.use || jwk.use === 'sig') // parenthesized: && binds tighter than ||
  );
}

async function validatePublicKey(publicJWK: JsonWebKey): Promise<CryptoKey | null> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return null;

    const key = await crypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return key;
  } catch {
    return null;
  }
}
#+END_SRC

**server validation (node.js)**:
#+BEGIN_SRC typescript
import { webcrypto } from 'node:crypto';

async function validateClientPublicKey(publicJWK: JsonWebKey): Promise<boolean> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return false;

    await webcrypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return true;
  } catch {
    return false;
  }
}
#+END_SRC

**authentication flow**:
#+BEGIN_SRC typescript
// client signs message
const authMessage = {
  realm: 'uuid-here',
  timestamp: Date.now(),
  action: 'join'
};

const signature = await crypto.subtle.sign(
  'Ed25519',
  privateKey,
  new TextEncoder().encode(JSON.stringify(authMessage))
);

// server verifies
async function verifyAuth(req: AuthRequest): Promise<boolean> {
  const publicKey = await webcrypto.subtle.importKey(
    'jwk',
    req.publicKey,
    { name: 'Ed25519' },
    false,
    ['verify']
  );

  const messageBytes = new TextEncoder().encode(JSON.stringify(req.message));
  const signatureBytes = new Uint8Array(req.signature);

  return await webcrypto.subtle.verify(
    'Ed25519',
    publicKey,
    signatureBytes,
    messageBytes
  );
}
#+END_SRC

**** proposed schemas :ai:claude:

***** client-side schema (dexie)

#+BEGIN_SRC typescript
// Core RSS/Podcast data (from your existing design)
interface Channel {
  id: string;
  feedUrl: string;
  htmlUrl?: string;
  imageUrl?: string;
  title?: string;
  description?: string;
  language?: string;
  people?: Record<string, string>;
  tags?: string[];

  // Refresh management
  refreshHP: number;
  nextRefreshAt?: number;
  lastRefreshAt?: number;
  lastRefreshStatus?: string;
  lastRefreshHttpStatus?: number;
  lastRefreshHttpEtag?: string;

  // Cache info
  contentHash?: string;
  lastFetchedAt?: number;
}

interface ChannelEntry {
  id: string;
  channelId: string;
  guid: string;
  title: string;
  linkUrl?: string;
  imageUrl?: string;
  snippet?: string;
  content?: string;

  enclosure?: {
    url: string;
    type?: string;
    length?: number;
  };

  podcast?: {
    explicit?: boolean;
    duration?: string;
    seasonNum?: number;
    episodeNum?: number;
    transcriptUrl?: string;
  };

  publishedAt?: number;
  fetchedAt?: number;
}

// Device-specific sync tables
interface PlayRecord {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  duration?: number;
  completed: boolean;
  speed: number;
  updatedAt: number;
}

interface Subscription {
  id: string;
  channelId: string;
  deviceId?: string;
  parentPath: string; // "/Tech/Programming"
  autoDownload: boolean;
  downloadLimit?: number;
  isActive: boolean;
  createdAt: number;
  updatedAt: number;
}

interface QueueItem {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  addedAt: number;
}

interface Device {
  id: string;
  name: string;
  platform: string;
  lastSeen: number;
}

// Local cache metadata
interface LocalCache {
  id: string;
  url: string;
  contentHash: string;
  filePath: string; // OPFS path
  cachedAt: number;
  expiresAt?: number;
  size: number;
  isOfflineOnly: boolean;
}

// Dexie schema
const db = new Dexie('SkypodDB');
db.version(1).stores({
  channels: '&id, feedUrl, contentHash',
  channelEntries: '&id, channelId, publishedAt',
  playRecords: '&id, [entryId+deviceId], deviceId, updatedAt',
  subscriptions: '&id, channelId, deviceId, parentPath',
  queueItems: '&id, entryId, deviceId, position',
  devices: '&id, lastSeen',
  localCache: '&id, url, contentHash, expiresAt'
});
#+END_SRC

***** server-side schema

#+BEGIN_SRC typescript
// Content-addressed cache
interface ContentStore {
  contentHash: string; // Primary key
  content: Buffer; // Raw feed content
  contentType: string;
  contentLength: number;
  firstSeenAt: number;
  referenceCount: number;
}

interface ContentHistory {
  id: string;
  url: string;
  contentHash: string;
  fetchedAt: number;
  isLatest: boolean;
}

// HTTP cache with health tracking (from your existing design)
interface HttpCache {
  key: string; // URL hash, primary key
  url: string;

  status: 'alive' | 'dead';
  lastFetchedAt: number;
  lastFetchError?: string;
  lastFetchErrorStreak: number;

  lastHttpStatus: number;
  lastHttpEtag?: string;
  lastHttpHeaders: Record<string, string>;
  expiresAt: number;
  expirationTtl: number;

  contentHash: string; // Points to ContentStore
}

// Sync/auth tables
interface Realm {
  id: string; // UUID
  createdAt: number;
  verifiedKeys: string[]; // Public key list
}

interface PeerConnection {
  id: string;
  realmId: string;
  publicKey: string;
  lastSeen: number;
  isOnline: boolean;
}

// Media cache for podcast episodes
interface MediaCache {
  contentHash: string; // Primary key
  originalUrl: string;
  mimeType: string;
  fileSize: number;
  content: Buffer;
  cachedAt: number;
  accessCount: number;
}
#+END_SRC

**** episode title parsing for sub-feed groupings :ai:claude:

*problem*: some podcast feeds contain multiple shows, need hierarchical organization within a feed

*example*: "Apocalypse Players" podcast
- episode title: "A Term of Art 6 - Winston's Hollow"
- desired grouping: "Apocalypse Players > A Term of Art > 6 - Winston's Hollow"
- UI shows sub-shows within the main feed

***** approaches considered

1. *manual regex patterns* (short-term solution)
   - user provides regex with capture groups = tags
   - reliable, immediate, user-controlled
   - requires manual setup per feed

2. *LLM-generated regex* (automation goal)
   - analyze last 100 episode titles
   - generate regex pattern automatically
   - good balance of automation + reliability

3. *NER model training* (experimental)
   - train spacy model for episode title parsing
   - current prototype: 150 labelled examples, limited success
   - needs more training data to be viable

***** data model implications

- add regex pattern field to Channel/Feed
- store extracted groupings as hierarchical tags on ~ChannelEntry~
- maybe add grouping/series field to episodes

***** plan

*preference*: start with manual regex, evolve toward LLM automation

*implementation design*:
- if no title pattern: episodes are direct children of the feed
- title pattern = regex with named capture groups + path template

*example configuration*:
- regex: ~^(?<series>[^0-9]+)\s*(?<episode>\d+)\s*-\s*(?<title>.+)$~
- path template: ~{series} > Episode {episode} - {title}~
- result: "A Term of Art 6 - Winston's Hollow" → "A Term of Art > Episode 6 - Winston's Hollow"

*schema additions*:
#+BEGIN_SRC typescript
interface Channel {
  // ... existing fields
  titlePatterns?: Array<{
    name: string; // "Main Episodes", "Bonus Content", etc.
    regex: string; // named capture groups
    pathTemplate: string; // interpolation template
    priority: number; // order to try patterns (lower = first)
    isActive: boolean; // can disable without deleting
  }>;
  fallbackPath?: string; // template for unmatched episodes
}

interface ChannelEntry {
  // ... existing fields
  parsedPath?: string; // computed from titlePattern
  parsedGroups?: Record<string, string>; // captured groups
  matchedPatternName?: string; // which pattern was used
}
#+END_SRC

*pattern matching logic*:
1. try patterns in priority order (lower number = higher priority)
2. first matching pattern wins
3. if no patterns match, use fallbackPath template (e.g., "Misc > {title}")
4. if no fallbackPath, episode stays direct child of feed

*example multi-pattern setup*:
- Pattern 1: "Main Episodes" - ~^(?<series>[^0-9]+)\s*(?<episode>\d+)~ → ~{series} > Episode {episode}~
- Pattern 2: "Bonus Content" - ~^Bonus:\s*(?<title>.+)~ → ~Bonus > {title}~
- Fallback: ~Misc > {title}~
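
a minimal sketch of that matching logic (assuming the ~titlePatterns~ shape from the schema above; ~parseEpisodeTitle~ is an illustrative name):

#+BEGIN_SRC typescript
// Sketch only: priority-ordered pattern matching with template interpolation,
// per the logic above. Assumes the titlePatterns shape from the schema sketch.

interface TitlePattern {
  name: string;
  regex: string;        // with named capture groups
  pathTemplate: string; // e.g. "{series} > Episode {episode}"
  priority: number;     // lower = tried first
  isActive: boolean;
}

function interpolate(template: string, groups: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => groups[key] ?? "");
}

function parseEpisodeTitle(
  title: string,
  patterns: TitlePattern[],
  fallbackPath?: string
) {
  const active = [...patterns]
    .filter((p) => p.isActive)
    .sort((a, b) => a.priority - b.priority);

  for (const pattern of active) {
    const match = new RegExp(pattern.regex).exec(title);
    if (match?.groups) {
      return {
        parsedPath: interpolate(pattern.pathTemplate, match.groups),
        parsedGroups: { ...match.groups },
        matchedPatternName: pattern.name,
      };
    }
  }

  // no pattern matched: fall back, or stay a direct child of the feed
  if (fallbackPath) {
    return { parsedPath: interpolate(fallbackPath, { title }), parsedGroups: { title } };
  }
  return null;
}
#+END_SRC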

**** scoped tags and filter-based UI evolution :ai:claude:

*generalization*: move from rigid hierarchies to tag-based filtering system

*tag scoping*:
- feed-level tags: "Tech", "Gaming", "D&D"
- episode-level tags: from regex captures like "series:CriticalRole", "campaign:2", "type:main"
- user tags: manual additions like "favorites", "todo"

*UI as tag filtering*:
- default view: all episodes grouped by feed
- filter by ~series:CriticalRole~ → shows only CR episodes across all feeds
- filter by ~type:bonus~ → shows bonus content from all podcasts
- combine filters: ~series:CriticalRole AND type:main~ → main CR episodes only

*benefits*:
- no rigid hierarchy - users create their own views
- regex patterns become automated episode taggers
- same filtering system works for search, organization, queues
- tags are syncable metadata, views are client-side

*schema evolution*:
#+BEGIN_SRC typescript
interface Tag {
  scope: 'feed' | 'episode' | 'user';
  key: string; // "series", "type", "campaign"
  value: string; // "CriticalRole", "bonus", "2"
}

interface ChannelEntry {
  // ... existing
  tags: Tag[]; // includes regex-generated + manual
}

interface FilterView {
  id: string;
  name: string;
  folderPath: string; // "/Channels/Critical Role"
  filters: Array<{
    key: string;
    value: string;
    operator: 'equals' | 'contains' | 'not';
  }>;
  isDefault: boolean;
  createdAt: number;
}
#+END_SRC

**** default UI construction and feed merging :ai:claude:

*auto-generated views on subscribe*:
- subscribe to "Critical Role" → creates ~/Channels/Critical Role~ folder
- default filter view: ~feed:CriticalRole~ (shows all episodes from that feed)
- user can customize, split into sub-views, or delete

*smart view suggestions*:
- after regex patterns generate tags, suggest splitting views
- "I noticed episodes with ~series:Campaign2~ and ~series:Campaign3~ - create separate views?"
- "Create view for ~type:bonus~ episodes?"

*view management UX*:
- right-click feed → "Split by series", "Split by type"
- drag episodes between views to create manual filters
- views can be nested: ~/Channels/Critical Role/Campaign 2/Main Episodes~

*feed merging for multi-source shows*:
problem: patreon feed + main show feed for same podcast

#+BEGIN_EXAMPLE
/Channels/
  Critical Role/
    All Episodes   # merged view: feed:CriticalRole OR feed:CriticalRolePatreon
    Main Feed      # filter: feed:CriticalRole
    Patreon Feed   # filter: feed:CriticalRolePatreon
#+END_EXAMPLE

*deduplication strategy*:
- episodes matched by ~guid~ or similar content hash
- duplicate episodes get ~source:main,patreon~ tags
- UI shows single episode with source indicators
- user can choose preferred source for playback
- play state syncs across all sources of same episode

*feed relationship schema*:
#+BEGIN_SRC typescript
interface FeedGroup {
  id: string;
  name: string; // "Critical Role"
  feedIds: string[]; // [mainFeedId, patreonFeedId]
  mergeStrategy: 'guid' | 'title' | 'contentHash';
  defaultView: FilterView;
}

interface ChannelEntry {
  // ... existing
  duplicateOf?: string; // points to canonical episode ID
  sources: string[]; // feed IDs where this episode appears
}
#+END_SRC
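
a minimal sketch of the dedup pass using those fields (key extraction per ~mergeStrategy~ is simplified; nothing here is settled):

#+BEGIN_SRC typescript
// Sketch only: marking duplicates within a FeedGroup, per the strategy above.
// Assumes the FeedGroup/ChannelEntry fields from the schema sketch; key
// extraction per mergeStrategy is simplified.

interface MergeEntry {
  id: string;
  channelId: string;
  guid: string;
  title: string;
  contentHash?: string;
  duplicateOf?: string;
  sources: string[];
}

function dedupeGroup(
  entries: MergeEntry[],
  mergeStrategy: "guid" | "title" | "contentHash"
): void {
  const keyOf = (e: MergeEntry) =>
    mergeStrategy === "guid" ? e.guid :
    mergeStrategy === "title" ? e.title.trim().toLowerCase() :
    e.contentHash ?? e.guid;

  const canonical = new Map<string, MergeEntry>();
  for (const entry of entries) {
    const key = keyOf(entry);
    const first = canonical.get(key);
    if (!first) {
      canonical.set(key, entry);
      entry.sources = [entry.channelId];
    } else {
      // later copies point at the canonical episode; UI shows one row
      entry.duplicateOf = first.id;
      if (!first.sources.includes(entry.channelId)) {
        first.sources.push(entry.channelId);
      }
    }
  }
}
#+END_SRC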

**per-view settings and state**:
each filter view acts like a virtual feed with its own:
- unread counts (episodes matching the filter that haven't been played)
- notification settings (notify for new episodes in this view)
- muted state (hide notifications, mark as read automatically)
- auto-download preferences (download episodes that match this filter)
- play queue integration (add new episodes to the queue)

**use cases**:
- mute the "Bonus Content" view but keep notifications for main episodes
- auto-download only "Campaign 2" episodes, skip everything else
- separate unread counts: "5 unread in Main Episodes, 2 in Bonus"
- queue only certain series automatically

**schema additions**:
#+BEGIN_SRC typescript
interface FilterView {
  // ... existing fields
  settings: {
    notificationsEnabled: boolean;
    isMuted: boolean;
    autoDownload: boolean;
    autoQueue: boolean;
    downloadLimit?: number; // max episodes to keep
  };
  state: {
    unreadCount: number;
    lastViewedAt?: number;
    isCollapsed: boolean; // in sidebar
  };
}
#+END_SRC

*inheritance behavior*:
- new filter views inherit settings from the parent feed/group
- user can override per-view
- "mute all Critical Role" vs "mute only bonus episodes"

**** client-side episode caching strategy :ai:claude:

*architecture*: service worker-based transparent caching (a fetch-handler sketch follows below)

*flow*:
1. audio player requests ~/audio?url={episodeUrl}~
2. service worker intercepts the request
3. if present in cache (with Range header support):
   - serve from cache
4. else:
   - let the request continue to the server (immediate playback)
   - simultaneously start a background fetch of the full audio file
   - when complete, broadcast an "episode-cached" event
   - audio player catches the event and restarts the feed → now uses the cached version

**benefits**:
- no playback interruption (streaming starts immediately)
- seamless transition to the cached version
- Range header support for seeking/scrubbing
- transparent to the audio player implementation

*implementation considerations*:
- cache storage limits and cleanup policies
- partial download resumption if interrupted
- cache invalidation when episode URLs change
- offline playback support
- progress tracking for background downloads

**schema additions**:
#+BEGIN_SRC typescript
interface CachedEpisode {
  episodeId: string;
  originalUrl: string;
  cacheKey: string; // for cache API
  fileSize: number;
  cachedAt: number;
  lastAccessedAt: number;
  downloadProgress?: number; // 0-100 for in-progress downloads
}
#+END_SRC

**service worker events**:
- ~episode-cache-started~ - background download began
- ~episode-cache-progress~ - download progress update
- ~episode-cache-complete~ - ready to switch to the cached version
- ~episode-cache-error~ - download failed, stay with streaming
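
a minimal sketch of the service worker intercept for the flow above (Range handling is simplified to serving the full cached response; the broadcast channel name is an assumption):

#+BEGIN_SRC typescript
// Sketch only: transparent cache-then-stream for the /audio?url=... route
// above. Range handling is simplified (a cached full 200 response is served
// even for Range requests); the channel name "episode-cache" is an assumption.

declare const self: ServiceWorkerGlobalScope;

self.addEventListener("fetch", (event: FetchEvent) => {
  const url = new URL(event.request.url);
  if (url.pathname !== "/audio") return;

  event.respondWith(
    (async () => {
      const cache = await caches.open("episodes");
      const cached = await cache.match(url.href);
      if (cached) return cached; // step 3: serve from cache

      // step 4: stream from the network now, fill the cache in the background
      event.waitUntil(
        fetch(url.href)
          .then((full) => cache.put(url.href, full))
          .then(() =>
            new BroadcastChannel("episode-cache").postMessage({
              type: "episode-cache-complete",
              url: url.searchParams.get("url"),
            })
          )
      );
      return fetch(event.request);
    })()
  );
});
#+END_SRC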

**background sync for proactive downloads**:

**browser support reality**:
- Background Sync API: good support (Chrome/Edge, limited Safari)
- Periodic Background Sync: very limited (Chrome only, requires PWA install)
- Push notifications: good support, but requires user permission

**hybrid approach**:
1. **foreground sync** (reliable): when the app is open, check for new episodes
2. **background sync** (opportunistic): register a sync event when the app closes
3. **push notifications** (fallback): server pushes "new episodes available"
4. **manual sync** (always works): pull-to-refresh, settings toggle

**implementation strategy**:
#+BEGIN_SRC typescript
// Register background sync when app becomes hidden
document.addEventListener('visibilitychange', () => {
  if (document.hidden && 'serviceWorker' in navigator) {
    navigator.serviceWorker.ready.then(registration => {
      return registration.sync.register('download-episodes');
    });
  }
});

// Service worker handles sync event
self.addEventListener('sync', event => {
  if (event.tag === 'download-episodes') {
    event.waitUntil(syncEpisodes());
  }
});
#+END_SRC

**realistic expectations**:
- iOS Safari: very limited background processing
- Android Chrome: decent background sync support
- Desktop: mostly works
- battery/data saver modes: disabled by the OS

**fallback strategy**: rely primarily on foreground sync + push notifications; treat background sync as a nice-to-have enhancement

**push notification sync workflow**:

**server-side trigger**:
1. server detects new episodes during RSS refresh
2. check which users are subscribed to that feed
3. send a push notification with an episode metadata payload
4. the notification wakes up the service worker on the client

**service worker notification handler**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();

  if (data.type === 'new-episodes') {
    event.waitUntil(
      // Start background download of new episodes
      downloadNewEpisodes(data.episodes)
        .then(() => {
          // Show notification to user
          return self.registration.showNotification('New episodes available', {
            body: `${data.episodes.length} new episodes downloaded`,
            icon: '/icon-192.png',
            badge: '/badge-72.png',
            tag: 'new-episodes',
            data: { episodeIds: data.episodes.map(e => e.id) }
          });
        })
    );
  }
});

// Handle notification click
self.addEventListener('notificationclick', event => {
  event.notification.close();

  // Open app to specific episode or feed
  event.waitUntil(
    clients.openWindow(`/episodes/${event.notification.data.episodeIds[0]}`)
  );
});
#+END_SRC

**server push logic**:
- batch notifications (don't spam for every episode)
- respect user notification preferences from FilterView settings
- include episode metadata in the payload to avoid a round-trip
- throttle notifications (max 1 per feed per hour?)

**user flow**:
1. new episode published → server pushes a notification
2. service worker downloads the episode in the background
3. user sees a "New episodes downloaded" notification
4. tap the notification → opens the app to the new episode, ready to play offline

*benefits*:
- true background downloading without user interaction
- works even when the app is closed
- respects per-feed notification settings

**push payload size constraints**:
- **limit**: ~4KB (4,096 bytes) across most services
- **practical limit**: ~3KB to account for service overhead
- **implications for episode metadata**:

#+BEGIN_SRC json
{
  "type": "new-episodes",
  "episodes": [
    {
      "id": "ep123",
      "channelId": "ch456",
      "title": "Episode Title",
      "url": "https://...",
      "duration": 3600,
      "size": 89432112
    }
  ]
}
#+END_SRC

**payload optimization strategies**:
- minimal episode metadata in the push (id, url, basic info)
- batch multiple episodes in a single notification
- full episode details fetched after the service worker wakes up
- URL shortening for long episode URLs
- compress the JSON payload if needed

**alternative for large payloads**:
- push notification contains only a "new episodes available" signal
- service worker makes an API call to get the full episode list
- trade-off: requires a network round-trip but carries unlimited data

**logical clock sync optimization**:

a much simpler approach using sync revisions:

#+BEGIN_SRC json
{
  "type": "sync-available",
  "fromRevision": 12345,
  "toRevision": 12389,
  "changeCount": 8
}
#+END_SRC

**service worker sync flow**:
1. push notification wakes the service worker with a revision range
2. service worker fetches ~/sync?from=12345&to=12389~
3. server returns only the changes in that range (episodes, feed updates, etc)
4. service worker applies the changes to the local dexie store
5. service worker queues background downloads for new episodes
6. updates the local revision to 12389
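
a minimal sketch of steps 2-4 against dexie (assuming a change envelope like the ~SyncResponse~ interface in the next section; the type → table mapping is illustrative):

#+BEGIN_SRC typescript
// Sketch only: fetch a revision range and apply it to dexie (steps 2-4 above).
// Assumes a change envelope like the SyncResponse interface below; the
// type -> table mapping is illustrative.
import Dexie from "dexie";

interface SyncChange {
  type: "episode" | "channel" | "subscription";
  operation: "create" | "update" | "delete";
  data: any;
  revision: number;
}

const tableFor: Record<SyncChange["type"], string> = {
  episode: "channelEntries",
  channel: "channels",
  subscription: "subscriptions",
};

async function applySyncRange(db: Dexie, from: number, to: number) {
  const res = await fetch(`/sync?from=${from}&to=${to}`);
  const { changes } = (await res.json()) as { changes: SyncChange[] };

  await db.transaction("rw", Object.values(tableFor), async () => {
    for (const change of changes) {
      const table = db.table(tableFor[change.type]);
      if (change.operation === "delete") {
        await table.delete(change.data.id);
      } else {
        await table.put(change.data); // create/update are both upserts
      }
    }
  });
}
#+END_SRC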

**benefits of the revision-based approach**:
- tiny push payload (just revision numbers)
- server can efficiently return only the changes in range
- automatic deduplication (revision already applied = skip)
- works for any sync data (episodes, feed metadata, user settings)
- handles offline gaps gracefully (fetch the missing revision ranges)

**sync API response**:
#+BEGIN_SRC typescript
interface SyncResponse {
  fromRevision: number;
  toRevision: number;
  changes: Array<{
    type: 'episode' | 'channel' | 'subscription';
    operation: 'create' | 'update' | 'delete';
    data: any;
    revision: number;
  }>;
}
#+END_SRC

**integration with episode downloads**:
- service worker processes sync changes
- identifies new episodes that match the user's auto-download filters
- queues those for background cache fetching
- much more efficient than sending episode metadata in the push payload

**service worker processing time constraints**:

**hard limits**:
- **30 seconds idle timeout**: service worker terminates after 30s of inactivity
- **5 minutes event processing**: a single event/request must complete within 5 minutes
- **30 seconds fetch timeout**: individual network requests time out after 30s
- **notification requirement**: push events MUST display a notification before the promise settles

**practical implications**:
- the sync API call (~/sync?from=X&to=Y~) must complete within 30s
- large episode downloads must be queued, not started immediately in the push handler
- use ~event.waitUntil()~ to keep the service worker alive during processing
- break large operations into smaller chunks

**recommended push event flow**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();

  event.waitUntil(
    // Must complete within 5 minutes total
    handlePushSync(data)
      .then(() => {
        // Required: show notification before promise settles
        return self.registration.showNotification('Episodes synced');
      })
  );
});

async function handlePushSync(data) {
  // 1. Quick sync API call (< 30s)
  const res = await fetch(`/sync?from=${data.fromRevision}&to=${data.toRevision}`);
  const changes = await res.json();

  // 2. Apply changes to dexie store (fast, local)
  await applyChangesToStore(changes);

  // 3. Queue episode downloads for later (don't start here)
  await queueEpisodeDownloads(changes.newEpisodes);

  // Total time: < 5 minutes, preferably < 30s
}
#+END_SRC

*download strategy*: use the push event for sync + queuing, with separate background tasks for the actual downloads

*background fetch API for large downloads*:

*progressive enhancement approach*:
#+BEGIN_SRC typescript
async function queueEpisodeDownloads(episodes) {
  const registration = await navigator.serviceWorker.ready;
  for (const episode of episodes) {
    if ('BackgroundFetchManager' in self && 'backgroundFetch' in registration) {
      // Chrome/Edge: use Background Fetch API for true background downloading
      await registration.backgroundFetch.fetch(
        `episode-${episode.id}`,
        episode.url,
        {
          icons: [{ src: '/icon-256.png', sizes: '256x256', type: 'image/png' }],
          title: `Downloading: ${episode.title}`,
          downloadTotal: episode.fileSize
        }
      );
    } else {
      // Fallback: queue for reactive download (download while streaming)
      await queueReactiveDownload(episode);
    }
  }
}

// Handle background fetch completion (in the service worker)
self.addEventListener('backgroundfetchsuccess', event => {
  if (event.registration.id.startsWith('episode-')) {
    event.waitUntil(handleEpisodeDownloadComplete(event));
  }
});
#+END_SRC

*browser support reality*:
- *Chrome/Edge*: Background Fetch API supported
- *Firefox/Safari*: not supported, fall back to reactive caching
- *mobile*: varies by platform and browser

*benefits when available*:
- true background downloading (survives app close, browser close)
- built-in download progress UI
- automatic retry on network failure
- no service worker time limits during the download

*graceful degradation*:
- detect support, use when available
- fall back to reactive caching (download while streaming)
- user gets the best experience possible on their platform

*** research todos :ai:claude:

high-level unanswered questions from the architecture brainstorming:

**** sync and data management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

# Local Variables:
# org-hierarchical-todo-statistics: nil
# org-checkbox-hierarchical-statistics: nil
# End: