#+PROPERTY: COOKIE_DATA recursive
#+STARTUP: overview

most of this is old, I need to rework it

* design

** frontend (packages/app)
- http://localhost:7891
- proxies ~/api~ and ~/sync~ to the backend in development
- uses Dexie for local storage with sync plugin
- custom sync replication implementation using PeerJS through the signalling server

** backend (packages/server)
- http://localhost:7890
- serves ~/dist~ if the directory is present (see ~dist~ script)
- serves ~/api~ for RSS caching proxy
  - file-based routing under the api directory
- serves ~/sync~ which is a ~peerjs~ signalling server

** sync
- each client keeps the full data set
- dexie sync and observable let us stream change sets
- we can publish the "latest" to all peers
- on first pull, if not the first client, we can request a dump out of band

*** rss feed data
- do we want to backup feed data?
  - conceptually, this should be refetchable
  - but feeds go away, and some will only show recent stories
  - so yes, we'll need this
  - but server side, we can dedupe
    - content-addressed server-side cache?

- server side does RSS pulling
  - can feeds be marked private, such that they won't be pulled through the proxy?
    - but then we require everything to be fetchable via cors
  - client configured proxy settings?

*** peer connection
- on startup, check for current realm-id and key pair
- if not present, ask to login or start new
  - if login, run through the [[* pairing]] process
  - if start new, run through the [[* registration]] process
- use keypair to authenticate to server
  - response includes list of active peers to connect
- clients negotiate sync from there
- an identity is a keypair and a realm

- realm is uuid
  - realm on the server is the socket connection for peer discovery
    - keeps a list of verified public keys
    - and manages the /current/ ~public-key -> peer ids~ mapping
  - realm on the client side is the first piece of info required for sync
    - when connecting to the signalling server, you present a realm and a signed public key
    - server accepts/rejects based on signature and current verified keys

- a new keypair can create a realm

- a new keypair can double sign an invitation (see the sketch below)
  - invite = ~{ realm:, nonce:, not_before:, not_after:, authorizer: }~, signed with a verified key
  - exchanging an invite = ~{ invite: }~, signed with my key

- on startup
  - start stand-alone (no syncing required, usually the case on first-run)
    - generate a keypair
  - want server backup?
    - sign a "setup" message with the new keypair and send it to the server
    - server responds with a new realm that this keypair is already verified for
    - move along
  - exchange invite to sync to other devices
    - generate a keypair
    - sign the exchange message with the invite and send to the server
    - server verifies the invite
    - adds the new public key to the peer list and publishes downstream
    - move along
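
as a concrete shape for the double-signed invitation above, a minimal sketch (assuming WebCrypto Ed25519; ~createInvite~ / ~signExchange~ are illustrative names, not existing code):

#+BEGIN_SRC typescript
// Sketch only: the double-signing flow described above.
// Assumes WebCrypto Ed25519 support; helper names are illustrative.

interface Invite {
  realm: string;
  nonce: string;
  not_before: number;
  not_after: number;
  authorizer: string; // fingerprint of the verified (inviter) key
}

const encode = (value: unknown) =>
  new TextEncoder().encode(JSON.stringify(value));

// Inviter: sign the invite with an already-verified realm key.
async function createInvite(invite: Invite, verifiedPrivateKey: CryptoKey) {
  const signature = await crypto.subtle.sign("Ed25519", verifiedPrivateKey, encode(invite));
  return { invite, signature: new Uint8Array(signature) };
}

// Invitee: counter-sign the exchange with the new device's own key.
async function signExchange(
  signedInvite: Awaited<ReturnType<typeof createInvite>>,
  myPrivateKey: CryptoKey,
  myPublicJWK: JsonWebKey
) {
  const exchange = { invite: signedInvite, public_key: myPublicJWK };
  const signature = await crypto.subtle.sign("Ed25519", myPrivateKey, encode(exchange));
  return { exchange, signature: new Uint8Array(signature) };
}
#+END_SRC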

***** standalone
in this mode, there is no syncing. this is the most likely first-time run option.

- generate a keypair on startup, so we have a stable fingerprint in the future
- done

***** pairing
in this mode, clients sync to a named realm, but don't necessarily consume server resources.
we don't need an email, since the server is just doing signalling and peer management.

- generate an invite from an existing verified peer
  - ~{ realm:, not_before:, not_after:, inviter: peer.public_key }~
  - sign that invitation with the existing verified peer's key

- standalone -> paired
  - get the invitation somehow (QR code?)
  - sign an invite exchange with the standalone's public key
  - send to server
  - server verifies the invite
  - adds the new public key to the peer list and publishes downstream

***** server backup
in this mode, there is syncing to a named realm by email.

the goal of server backup mode is that we can go from email -> fully working client with the latest data, without needing any surviving clients that could participate in the sync.

- generate a keypair on startup
- sign a registration message sent to the server
  - send a verification email
    - if the email/realm already exists, this is authorization
    - if not, it's email validation
  - server starts a realm and associates the public key
  - server acts as a peer for the realm, and stores private data

- since dexie is publishing change sets, we should be able to just store deltas (a storage sketch follows below)
- but we'll need to store _all_ deltas, unless we're materializing on the server side too
  - should we use an indexeddb shim so we can import/export from the server for a clean start?
  - how much materialization does the server need?
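
purely as an assumption sketch for the "just store deltas" idea above (none of these field names are decided):

#+BEGIN_SRC typescript
// Sketch only: an append-only change-set log for the "store all deltas" idea.
// Field names are assumptions; nothing here is a decided schema.

interface ChangeSetRow {
  realmId: string;      // which realm this delta belongs to
  revision: number;     // monotonically increasing per realm
  deviceId: string;     // originating peer
  payload: Uint8Array;  // opaque (possibly encrypted) dexie change set
  createdAt: number;    // epoch millis
}

// Appending preserves ordering; the server never needs to interpret
// the payload unless it also materializes state.
function nextRevision(log: ChangeSetRow[], realmId: string): number {
  const latest = log
    .filter((row) => row.realmId === realmId)
    .reduce((max, row) => Math.max(max, row.revision), 0);
  return latest + 1;
}
#+END_SRC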

* ai instructions
- when writing to the devlog, add tags to your entries specifying ~:ai:~ and what tool did it.
- false starts and prototypes are in ~./devlog/~

* notes and decision record [1/11]
** architecture design (may 28-29) :ai:claude:

detailed notes are in [[./devlog/may-29.org]]
key decisions and system design:

*** sync model
- device-specific records for playback state/queues to avoid conflicts
- content-addressed server cache with deduplication
- dual-JWT invitation flow for secure realm joining

*** data structures
- tag-based filtering system instead of rigid hierarchies
- regex patterns for episode title parsing and organization
- service worker caching with background download support

*** core schemas
**** client (dexie)
- Channel/ChannelEntry for RSS feeds and episodes
- PlayRecord/QueueItem scoped by deviceId
- FilterView for virtual feed organization

**** server (drizzle)
- ContentStore for deduplicated content by hash
- Realm/PeerConnection for sync authorization
- HttpCache with health tracking and TTL

*** push sync strategy
- revision-based sync (just send revision ranges in push notifications)
- background fetch API for large downloads where supported
- graceful degradation to reactive caching

*** research todos :ai:claude:

**** sync and data management
***** DONE identity and signature management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

** <2025-05-28 Wed>
getting everything set up

the biggest open question I have is what sort of privacy/encryption guarantee I need. I want the server to be able to do things like cache and store feed data long-term.

Is "if you want full privacy, self-host" valid?

*** possibilities

- fully PWA
  - CON: cors, which would require a proxy anyway
  - CON: audio analysis, llm based stuff for categorization, etc. won't work
  - PRO: private as all get out
  - can still do WebRTC p2p sync for resiliency
  - can still do server backups, if the sync stream is encrypted, but no compaction would be available
  - could do _explicit_ server backups as dump files

- self hostable
  - PRO: can do bunches of private stuff on the server, because if you don't want me to see it, do it elsewhere
  - CON: hard for folks to use

*** brainstorm :ai:claude:
**** sync conflict resolution design discussion :ai:claude:

discussed the sync architecture and dexie conflict handling:

*dexie syncable limitations*:
- logical clocks handle causally-related changes well
- basic timestamp-based conflict resolution for concurrent updates
- last-writer-wins for same-field conflicts
- no sophisticated CRDT or vector clock support

*solutions for podcast-specific conflicts*:

- play records: device-specific approach
  - store separate ~play_records~ per ~device_id~
  - each record: ~{ episode_id, device_id, position, completed, timestamp }~
  - UI handles conflict resolution with "continue from X device?" prompts
  - avoids arbitrary timestamp wins, gives users control

- subscription trees
  - store ~parent_path~ as a single string field ("/Tech/Programming")
  - simpler than managing folder membership tables
  - conflicts still possible but contained to a single field
  - could store move operations as events for richer resolution

*other sync considerations*:
- settings/preferences: distinguish device-local vs global
- bulk operations: "mark all played" can create duplicate operations
- metadata updates: server RSS updates vs local renames
- temporal ordering: recently played lists, queue reordering
- storage limits: cleanup operations conflicting across devices
- feed state: refresh timestamps, error states

*approach*: prefer the "events not state" pattern and device-specific records where semantic conflicts are likely

**** data model brainstorm :ai:claude:

core entities designed with sync in mind:

***** ~Feed~ :: RSS/podcast subscription
- ~parent_path~ field for folder structure (e.g. ~/Tech/Programming~)
- ~is_private~ flag to skip server proxy
- ~refresh_interval~ for custom update frequencies

***** ~Episode~ :: individual podcast episodes
- standard RSS metadata (guid, title, description, media url)
- duration and file info for playback

***** ~PlayRecord~ :: device-specific playback state
- separate record per ~device_id~ to avoid timestamp conflicts
- position, completed status, playback speed
- UI can prompt "continue from X device?" for resolution
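
a minimal sketch of feeding that prompt from the device-scoped records (assuming the ~play_records~ shape above; ~resolvePlayback~ is an illustrative name):

#+BEGIN_SRC typescript
// Sketch only: choosing a resume position from device-scoped play records.
// Assumes the PlayRecord shape above; resolvePlayback is an illustrative name.

interface PlayRecord {
  episode_id: string;
  device_id: string;
  position: number;   // seconds
  completed: boolean;
  timestamp: number;  // epoch millis
}

// Instead of silently letting the latest timestamp win, surface the
// furthest-ahead *other* device so the UI can ask "continue from X?"
function resolvePlayback(records: PlayRecord[], localDeviceId: string) {
  const local = records.find((r) => r.device_id === localDeviceId);
  const remote = records
    .filter((r) => r.device_id !== localDeviceId && !r.completed)
    .sort((a, b) => b.position - a.position)[0];

  if (remote && (!local || remote.position > local.position)) {
    return { prompt: true, suggestion: remote, fallback: local };
  }
  return { prompt: false, suggestion: local, fallback: undefined };
}
#+END_SRC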

***** ~QueueItem~ :: device-specific episode queue
- ordered list with position field
- ~device_id~ scoped to avoid queue conflicts

***** ~Subscription~ :: feed membership settings
- can be global or device-specific
- auto-download preferences per device

***** ~Settings~ :: split global vs device-local
- theme, default speed = global
- download path, audio device = device-local

***** Event tables for complex operations:
- ~FeedMoveEvent~ for folder reorganization
- ~BulkMarkPlayedEvent~ for "mark all read" operations
- better conflict resolution than direct state updates

***** sync considerations
- device identity established on first run
- dexie syncable handles basic timestamp conflicts
- prefer device-scoped records for semantic conflicts
- event-driven pattern for bulk operations

**** schema evolution from previous iteration :ai:claude:

reviewed the existing schema from tmp/feed.ts - a well designed foundation:

***** keep from original
- Channel/ChannelEntry naming and structure
- ~refreshHP~ adaptive refresh system (much better than simple intervals)
- rich podcast metadata (people, tags, enclosure, podcast object)
- HTTP caching with etag/status tracking
- epoch millisecond timestamps
- ~hashId()~ approach for entry IDs

***** add for multi-device sync
- ~PlayState~ table (device-scoped position/completion)
- Subscription table (with ~parentPath~ for folders, device-scoped settings)
- ~QueueItem~ table (device-scoped episode queues)
- Device table (identity management)

***** migration considerations
- existing Channel/ChannelEntry can be preserved
- new tables are additive
- ~fetchAndUpsert~ method works well with the server proxy architecture
- dexie sync vs rxdb - need to evaluate change tracking capabilities

**** content-addressed caching for offline resilience :ai:claude:

designed a caching system for when upstream feeds fail/disappear, building on the existing cache-schema.ts:

***** server-side schema evolution (drizzle sqlite):
- keep the existing ~httpCacheTable~ design (health tracking, http headers, ttl)
- add a ~contentHash~ field pointing to deduplicated content
- new ~contentStoreTable~: deduplicated blobs by sha256 hash
- new ~contentHistoryTable~: url -> contentHash timeline with isLatest flag
- reference counting for garbage collection

***** client-side OPFS storage
- ~/cache/content/{contentHash}.xml~ for raw feeds
- ~/cache/media/{contentHash}.mp3~ for podcast episodes
- ~LocalCacheEntry~ metadata tracks expiration and offline-only flags
- maintains the last N versions per feed for historical access

***** fetch strategy & fallback
1. check local OPFS cache first (fastest)
2. try server proxy ~/api/feed?url={feedUrl}~ (deduplicated)
3. server checks ~contentHistory~, serves latest or fetches upstream
4. server returns ~{contentHash, content, cached: boolean}~
5. client stores with content hash as filename
6. emergency mode: serve stale content when upstream fails

a sketch of the client side of this chain follows below.

- preserves existing health tracking and HTTP caching logic
- popular feeds cached once on server, many clients benefit
- bandwidth savings via content hash comparison
- historical feed state preservation (feeds disappear!)
- true offline operation after initial sync
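
a minimal sketch of the client side of that chain (assuming OPFS via ~navigator.storage.getDirectory()~ and the ~/api/feed~ response shape above; error handling is elided):

#+BEGIN_SRC typescript
// Sketch only: client fetch chain for feeds: OPFS cache, then server proxy,
// then stale fallback. Assumes the /api/feed response shape described above.

interface FeedResponse {
  contentHash: string;
  content: string;
  cached: boolean;
}

async function contentDir(): Promise<FileSystemDirectoryHandle> {
  const root = await navigator.storage.getDirectory();
  const cache = await root.getDirectoryHandle("cache", { create: true });
  return cache.getDirectoryHandle("content", { create: true });
}

async function readCached(contentHash: string): Promise<string | null> {
  try {
    const dir = await contentDir();
    const handle = await dir.getFileHandle(`${contentHash}.xml`);
    return await (await handle.getFile()).text();
  } catch {
    return null; // not cached yet
  }
}

async function fetchFeed(feedUrl: string, knownHash?: string): Promise<string> {
  // 1. local OPFS cache first
  if (knownHash) {
    const cached = await readCached(knownHash);
    if (cached) return cached;
  }

  // 2-4. server proxy checks contentHistory, fetches upstream if needed
  const res = await fetch(`/api/feed?url=${encodeURIComponent(feedUrl)}`);
  if (!res.ok) {
    // 6. emergency mode: fall back to any stale copy we still have
    const stale = knownHash && (await readCached(knownHash));
    if (stale) return stale;
    throw new Error(`feed fetch failed: ${res.status}`);
  }
  const body: FeedResponse = await res.json();

  // 5. store under the content hash as filename
  const dir = await contentDir();
  const handle = await dir.getFileHandle(`${body.contentHash}.xml`, { create: true });
  const writable = await handle.createWritable();
  await writable.write(body.content);
  await writable.close();

  return body.content;
}
#+END_SRC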

** <2025-05-29 Thu> :ai:claude:
e2e encryption and invitation flow design

worked through the crypto and invitation architecture. key decisions:

*** keypair strategy
- use jwk format for interoperability (server stores public keys)
- ed25519 for signing, separate x25519 for encryption if needed
- zustand lazy initialization pattern: ~ensureKeypair()~ on first use
- store private jwk in persisted zustand state

*** invitation flow: dual-jwt approach
solved the chicken-and-egg problem of sharing encryption keys securely.

**** qr code contains two signed jwts:
1. invitation token: ~{iss: inviter_fingerprint, sub: invitation_id, purpose: "realm_invite"}~
2. encryption key token: ~{iss: inviter_fingerprint, ephemeral_private: base64_key, purpose: "ephemeral_key"}~

**** exchange process:
1. invitee posts jwt1 + their public keys to ~/invitations~
2. server verifies jwt1 signature against realm members
3. if valid: adds invitee to realm, returns ~{realm_id, realm_members, encrypted_realm_key}~
4. invitee verifies jwt2 signature against returned realm members
5. invitee extracts ephemeral private key, decrypts realm encryption key

**** security properties:
- server never has decryption capability (missing ephemeral private key)
- both jwts must be signed by verified realm member
- if first exchange fails, second jwt is cryptographically worthless
- atomic operation: identity added only if invitation valid
- built-in expiration and tamper detection via jwt standard

**** considered alternatives:
- raw ephemeral keys in qr: simpler but no authenticity
- ecdh key agreement: chicken-and-egg problem with public key exchange
- server escrow: good but missing authentication layer
- password-based: requires secure out-of-band sharing

the dual-jwt approach provides proper authenticated invitations while maintaining e2e encryption properties.

**** refined dual-jwt with ephemeral signing
simplified the approach by using the ephemeral key for the second jwt signature:

**setup**:
1. inviter generates ephemeral keypair
2. encrypts realm key with ephemeral private key
3. posts to server: ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~

**qr code contains**:
#+BEGIN_SRC json
// JWT 1: signed with inviter's realm signing key
{
  "realm_id": "uuid",
  "invitation_id": "uuid",
  "iss": "inviter_fingerprint"
}

// JWT 2: signed with ephemeral private key
{
  "ephemeral_private": "base64_key",
  "invitation_id": "uuid"
}
#+END_SRC

**exchange flow**:
1. submit jwt1 → server verifies against realm members → returns ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
2. verify jwt2 signature using ~ephemeral_public~ from the server response
3. extract ~ephemeral_private~ from jwt2, decrypt realm key

**benefits over previous version**:
- no premature key disclosure (invitee keys shared via normal webrtc peering)
- self-contained verification (ephemeral public key verifies jwt2)
- cleaner separation of realm auth vs encryption key distribution
- simpler flow (no need to return realm member list)

**crypto verification principle**: digital signatures work as sign-with-private/verify-with-public, while encryption works as encrypt-with-public/decrypt-with-private. jwt2 verification uses signature verification, not decryption.
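
the invitee side of the exchange flow above, as a minimal sketch (assuming the ~jose~ library for jwt verification; ~decryptRealmKey~ is deliberately left abstract since the realm-key encryption scheme isn't pinned down):

#+BEGIN_SRC typescript
// Sketch only: invitee side of the refined exchange flow (steps 1-3 above).
// Assumes the `jose` library and the response shape described above;
// decryptRealmKey is left abstract.
import { importJWK, jwtVerify, type JWK } from "jose";

interface ExchangeResponse {
  invitation_id: string;
  realm_id: string;
  ephemeral_public: JWK;
  encrypted_realm_key: string; // base64
}

declare function decryptRealmKey(
  encrypted: string,
  ephemeralPrivate: string
): Promise<CryptoKey>;

async function redeemInvitation(jwt1: string, jwt2: string) {
  // 1. submit jwt1; server verifies it against realm members
  const res = await fetch("/invitations/exchange", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jwt1 }),
  });
  if (!res.ok) throw new Error("invitation rejected");
  const exchange: ExchangeResponse = await res.json();

  // 2. verify jwt2 using the ephemeral public key from the response
  const ephemeralPublic = await importJWK(exchange.ephemeral_public, "EdDSA");
  const { payload } = await jwtVerify(jwt2, ephemeralPublic);

  // 3. extract the ephemeral private key and decrypt the realm key
  const realmKey = await decryptRealmKey(
    exchange.encrypted_realm_key,
    payload.ephemeral_private as string
  );
  return { realmId: exchange.realm_id, realmKey };
}
#+END_SRC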

**invitation flow diagram**:
#+BEGIN_SRC mermaid
sequenceDiagram
    participant I as Inviter
    participant S as Server
    participant E as Invitee

    Note over I: Generate ephemeral keypair
    I->>I: ephemeral_private, ephemeral_public

    Note over I: Encrypt realm key
    I->>I: encrypted_realm_key = encrypt(realm_key, ephemeral_private)

    I->>S: POST /invitations<br/>{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
    S-->>I: OK

    Note over I: Create JWTs for QR code
    I->>I: jwt1 = sign({realm_id, invitation_id}, inviter_private)
    I->>I: jwt2 = sign({ephemeral_private, invitation_id}, ephemeral_private)

    Note over I,E: QR code contains [jwt1, jwt2]

    E->>S: POST /invitations/exchange<br/>{jwt1}
    Note over S: Verify jwt1 signature<br/>against realm members
    S-->>E: {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}

    Note over E: Verify jwt2 signature<br/>using ephemeral_public
    E->>E: verify_signature(jwt2, ephemeral_public)

    Note over E: Extract key and decrypt
    E->>E: ephemeral_private = decode(jwt2)
    E->>E: realm_key = decrypt(encrypted_realm_key, ephemeral_private)

    Note over E: Now member of realm!
#+END_SRC

**** jwk keypair generation and validation :ai:claude:

discussed jwk vs raw crypto.subtle for keypair storage. since public keys need server storage for realm authorization, jwk is better for interoperability.

**keypair generation**:
#+BEGIN_SRC typescript
const keypair = await crypto.subtle.generateKey(
  { name: "Ed25519" },
  true,
  ["sign", "verify"]
);

const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);

// JWK format:
// {
//   "kty": "OKP",
//   "crv": "Ed25519",
//   "x": "base64url-encoded-public-key",
//   "d": "base64url-encoded-private-key" // only in private JWK
// }
#+END_SRC

**client validation**:
#+BEGIN_SRC typescript
function isValidEd25519PublicJWK(jwk: any): boolean {
  return (
    typeof jwk === 'object' &&
    jwk !== null &&
    jwk.kty === 'OKP' &&
    jwk.crv === 'Ed25519' &&
    typeof jwk.x === 'string' &&
    jwk.x.length === 43 && // base64url Ed25519 public key length
    !jwk.d && // public key shouldn't have private component
    (!jwk.use || jwk.use === 'sig') // parenthesized: && binds tighter than ||
  );
}

async function validatePublicKey(publicJWK: JsonWebKey): Promise<CryptoKey | null> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return null;

    const key = await crypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return key;
  } catch {
    return null;
  }
}
#+END_SRC

**server validation (node.js)**:
#+BEGIN_SRC typescript
import { webcrypto } from 'node:crypto';

async function validateClientPublicKey(publicJWK: JsonWebKey): Promise<boolean> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return false;

    await webcrypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return true;
  } catch {
    return false;
  }
}
#+END_SRC

**authentication flow**:
#+BEGIN_SRC typescript
// client signs message
const authMessage = {
  realm: 'uuid-here',
  timestamp: Date.now(),
  action: 'join'
};

const signature = await crypto.subtle.sign(
  'Ed25519',
  privateKey,
  new TextEncoder().encode(JSON.stringify(authMessage))
);

// server verifies
async function verifyAuth(req: AuthRequest): Promise<boolean> {
  const publicKey = await webcrypto.subtle.importKey(
    'jwk',
    req.publicKey,
    { name: 'Ed25519' },
    false,
    ['verify']
  );

  const messageBytes = new TextEncoder().encode(JSON.stringify(req.message));
  const signatureBytes = new Uint8Array(req.signature);

  return await webcrypto.subtle.verify(
    'Ed25519',
    publicKey,
    signatureBytes,
    messageBytes
  );
}
#+END_SRC

**** proposed schemas :ai:claude:

***** client-side schema (dexie)

#+BEGIN_SRC typescript
// Core RSS/Podcast data (from your existing design)
interface Channel {
  id: string;
  feedUrl: string;
  htmlUrl?: string;
  imageUrl?: string;
  title?: string;
  description?: string;
  language?: string;
  people?: Record<string, string>;
  tags?: string[];

  // Refresh management
  refreshHP: number;
  nextRefreshAt?: number;
  lastRefreshAt?: number;
  lastRefreshStatus?: string;
  lastRefreshHttpStatus?: number;
  lastRefreshHttpEtag?: string;

  // Cache info
  contentHash?: string;
  lastFetchedAt?: number;
}

interface ChannelEntry {
  id: string;
  channelId: string;
  guid: string;
  title: string;
  linkUrl?: string;
  imageUrl?: string;
  snippet?: string;
  content?: string;

  enclosure?: {
    url: string;
    type?: string;
    length?: number;
  };

  podcast?: {
    explicit?: boolean;
    duration?: string;
    seasonNum?: number;
    episodeNum?: number;
    transcriptUrl?: string;
  };

  publishedAt?: number;
  fetchedAt?: number;
}

// Device-specific sync tables
interface PlayRecord {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  duration?: number;
  completed: boolean;
  speed: number;
  updatedAt: number;
}

interface Subscription {
  id: string;
  channelId: string;
  deviceId?: string;
  parentPath: string; // "/Tech/Programming"
  autoDownload: boolean;
  downloadLimit?: number;
  isActive: boolean;
  createdAt: number;
  updatedAt: number;
}

interface QueueItem {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  addedAt: number;
}

interface Device {
  id: string;
  name: string;
  platform: string;
  lastSeen: number;
}

// Local cache metadata
interface LocalCache {
  id: string;
  url: string;
  contentHash: string;
  filePath: string; // OPFS path
  cachedAt: number;
  expiresAt?: number;
  size: number;
  isOfflineOnly: boolean;
}

// Dexie schema
const db = new Dexie('SkypodDB');
db.version(1).stores({
  channels: '&id, feedUrl, contentHash',
  channelEntries: '&id, channelId, publishedAt',
  playRecords: '&id, [entryId+deviceId], deviceId, updatedAt',
  subscriptions: '&id, channelId, deviceId, parentPath',
  queueItems: '&id, entryId, deviceId, position',
  devices: '&id, lastSeen',
  localCache: '&id, url, contentHash, expiresAt'
});
#+END_SRC

***** server-side schema

#+BEGIN_SRC typescript
// Content-addressed cache
interface ContentStore {
  contentHash: string; // Primary key
  content: Buffer; // Raw feed content
  contentType: string;
  contentLength: number;
  firstSeenAt: number;
  referenceCount: number;
}

interface ContentHistory {
  id: string;
  url: string;
  contentHash: string;
  fetchedAt: number;
  isLatest: boolean;
}

// HTTP cache with health tracking (from your existing design)
interface HttpCache {
  key: string; // URL hash, primary key
  url: string;

  status: 'alive' | 'dead';
  lastFetchedAt: number;
  lastFetchError?: string;
  lastFetchErrorStreak: number;

  lastHttpStatus: number;
  lastHttpEtag?: string;
  lastHttpHeaders: Record<string, string>;
  expiresAt: number;
  expirationTtl: number;

  contentHash: string; // Points to ContentStore
}

// Sync/auth tables
interface Realm {
  id: string; // UUID
  createdAt: number;
  verifiedKeys: string[]; // Public key list
}

interface PeerConnection {
  id: string;
  realmId: string;
  publicKey: string;
  lastSeen: number;
  isOnline: boolean;
}

// Media cache for podcast episodes
interface MediaCache {
  contentHash: string; // Primary key
  originalUrl: string;
  mimeType: string;
  fileSize: number;
  content: Buffer;
  cachedAt: number;
  accessCount: number;
}
#+END_SRC

**** episode title parsing for sub-feed groupings :ai:claude:

*problem*: some podcast feeds contain multiple shows, need hierarchical organization within a feed

*example*: "Apocalypse Players" podcast
- episode title: "A Term of Art 6 - Winston's Hollow"
- desired grouping: "Apocalypse Players > A Term of Art > 6 - Winston's Hollow"
- UI shows sub-shows within the main feed

***** approaches considered

1. *manual regex patterns* (short-term solution)
   - user provides regex with capture groups = tags
   - reliable, immediate, user-controlled
   - requires manual setup per feed

2. *LLM-generated regex* (automation goal)
   - analyze last 100 episode titles
   - generate regex pattern automatically
   - good balance of automation + reliability

3. *NER model training* (experimental)
   - train spacy model for episode title parsing
   - current prototype: 150 labelled examples, limited success
   - needs more training data to be viable

***** data model implications

- add regex pattern field to Channel/Feed
- store extracted groupings as hierarchical tags on ~ChannelEntry~
- maybe add grouping/series field to episodes

***** plan

*preference*: start with manual regex, evolve toward LLM automation

*implementation design*:
- if no title pattern: episodes are direct children of the feed
- title pattern = regex with named capture groups + path template

*example configuration*:
- regex: ~^(?<series>[^0-9]+)\s*(?<episode>\d+)\s*-\s*(?<title>.+)$~
- path template: ~{series} > Episode {episode} - {title}~
- result: "A Term of Art 6 - Winston's Hollow" → "A Term of Art > Episode 6 - Winston's Hollow"

*schema additions*:
#+BEGIN_SRC typescript
interface Channel {
  // ... existing fields
  titlePatterns?: Array<{
    name: string; // "Main Episodes", "Bonus Content", etc.
    regex: string; // named capture groups
    pathTemplate: string; // interpolation template
    priority: number; // order to try patterns (lower = first)
    isActive: boolean; // can disable without deleting
  }>;
  fallbackPath?: string; // template for unmatched episodes
}

interface ChannelEntry {
  // ... existing fields
  parsedPath?: string; // computed from titlePattern
  parsedGroups?: Record<string, string>; // captured groups
  matchedPatternName?: string; // which pattern was used
}
#+END_SRC

*pattern matching logic*:
1. try patterns in priority order (lower number = higher priority)
2. first matching pattern wins
3. if no patterns match, use fallbackPath template (e.g., "Misc > {title}")
4. if no fallbackPath, episode stays direct child of feed

*example multi-pattern setup*:
- Pattern 1: "Main Episodes" - ~^(?<series>[^0-9]+)\s*(?<episode>\d+)~ → ~{series} > Episode {episode}~
- Pattern 2: "Bonus Content" - ~^Bonus:\s*(?<title>.+)~ → ~Bonus > {title}~
- Fallback: ~Misc > {title}~
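
a minimal sketch of that matching logic (assuming the ~titlePatterns~ shape from the schema above; ~parseEpisodeTitle~ is an illustrative name):

#+BEGIN_SRC typescript
// Sketch only: priority-ordered pattern matching with template interpolation,
// per the logic above. Assumes the titlePatterns shape from the schema sketch.

interface TitlePattern {
  name: string;
  regex: string;        // with named capture groups
  pathTemplate: string; // e.g. "{series} > Episode {episode}"
  priority: number;     // lower = tried first
  isActive: boolean;
}

function interpolate(template: string, groups: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => groups[key] ?? "");
}

function parseEpisodeTitle(
  title: string,
  patterns: TitlePattern[],
  fallbackPath?: string
) {
  const active = [...patterns]
    .filter((p) => p.isActive)
    .sort((a, b) => a.priority - b.priority);

  for (const pattern of active) {
    const match = new RegExp(pattern.regex).exec(title);
    if (match?.groups) {
      return {
        parsedPath: interpolate(pattern.pathTemplate, match.groups),
        parsedGroups: { ...match.groups },
        matchedPatternName: pattern.name,
      };
    }
  }

  // no pattern matched: fall back, or stay a direct child of the feed
  if (fallbackPath) {
    return { parsedPath: interpolate(fallbackPath, { title }), parsedGroups: { title } };
  }
  return null;
}
#+END_SRC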

**** scoped tags and filter-based UI evolution :ai:claude:

*generalization*: move from rigid hierarchies to tag-based filtering system

*tag scoping*:
- feed-level tags: "Tech", "Gaming", "D&D"
- episode-level tags: from regex captures like "series:CriticalRole", "campaign:2", "type:main"
- user tags: manual additions like "favorites", "todo"

*UI as tag filtering*:
- default view: all episodes grouped by feed
- filter by ~series:CriticalRole~ → shows only CR episodes across all feeds
- filter by ~type:bonus~ → shows bonus content from all podcasts
- combine filters: ~series:CriticalRole AND type:main~ → main CR episodes only

*benefits*:
- no rigid hierarchy - users create their own views
- regex patterns become automated episode taggers
- same filtering system works for search, organization, queues
- tags are syncable metadata, views are client-side

*schema evolution*:
#+BEGIN_SRC typescript
interface Tag {
  scope: 'feed' | 'episode' | 'user';
  key: string; // "series", "type", "campaign"
  value: string; // "CriticalRole", "bonus", "2"
}

interface ChannelEntry {
  // ... existing
  tags: Tag[]; // includes regex-generated + manual
}

interface FilterView {
  id: string;
  name: string;
  folderPath: string; // "/Channels/Critical Role"
  filters: Array<{
    key: string;
    value: string;
    operator: 'equals' | 'contains' | 'not';
  }>;
  isDefault: boolean;
  createdAt: number;
}
#+END_SRC

**** default UI construction and feed merging :ai:claude:

*auto-generated views on subscribe*:
- subscribe to "Critical Role" → creates ~/Channels/Critical Role~ folder
- default filter view: ~feed:CriticalRole~ (shows all episodes from that feed)
- user can customize, split into sub-views, or delete

*smart view suggestions*:
- after regex patterns generate tags, suggest splitting views
- "I noticed episodes with ~series:Campaign2~ and ~series:Campaign3~ - create separate views?"
- "Create view for ~type:bonus~ episodes?"

*view management UX*:
- right-click feed → "Split by series", "Split by type"
- drag episodes between views to create manual filters
- views can be nested: ~/Channels/Critical Role/Campaign 2/Main Episodes~

*feed merging for multi-source shows*:
problem: patreon feed + main show feed for same podcast

#+BEGIN_EXAMPLE
/Channels/
  Critical Role/
    All Episodes   # merged view: feed:CriticalRole OR feed:CriticalRolePatreon
    Main Feed      # filter: feed:CriticalRole
    Patreon Feed   # filter: feed:CriticalRolePatreon
#+END_EXAMPLE

*deduplication strategy*:
- episodes matched by ~guid~ or similar content hash
- duplicate episodes get ~source:main,patreon~ tags
- UI shows single episode with source indicators
- user can choose preferred source for playback
- play state syncs across all sources of same episode

*feed relationship schema*:
#+BEGIN_SRC typescript
interface FeedGroup {
  id: string;
  name: string; // "Critical Role"
  feedIds: string[]; // [mainFeedId, patreonFeedId]
  mergeStrategy: 'guid' | 'title' | 'contentHash';
  defaultView: FilterView;
}

interface ChannelEntry {
  // ... existing
  duplicateOf?: string; // points to canonical episode ID
  sources: string[]; // feed IDs where this episode appears
}
#+END_SRC
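
a minimal sketch of the dedup pass using those fields (key extraction per ~mergeStrategy~ is simplified; nothing here is settled):

#+BEGIN_SRC typescript
// Sketch only: marking duplicates within a FeedGroup, per the strategy above.
// Assumes the FeedGroup/ChannelEntry fields from the schema sketch; key
// extraction per mergeStrategy is simplified.

interface MergeEntry {
  id: string;
  channelId: string;
  guid: string;
  title: string;
  contentHash?: string;
  duplicateOf?: string;
  sources: string[];
}

function dedupeGroup(
  entries: MergeEntry[],
  mergeStrategy: "guid" | "title" | "contentHash"
): void {
  const keyOf = (e: MergeEntry) =>
    mergeStrategy === "guid" ? e.guid :
    mergeStrategy === "title" ? e.title.trim().toLowerCase() :
    e.contentHash ?? e.guid;

  const canonical = new Map<string, MergeEntry>();
  for (const entry of entries) {
    const key = keyOf(entry);
    const first = canonical.get(key);
    if (!first) {
      canonical.set(key, entry);
      entry.sources = [entry.channelId];
    } else {
      // later copies point at the canonical episode; UI shows one row
      entry.duplicateOf = first.id;
      if (!first.sources.includes(entry.channelId)) {
        first.sources.push(entry.channelId);
      }
    }
  }
}
#+END_SRC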

**per-view settings and state**:
each filter view acts like a virtual feed with its own:
- unread counts (episodes matching the filter that haven't been played)
- notification settings (notify for new episodes in this view)
- muted state (hide notifications, mark as read automatically)
- auto-download preferences (download episodes that match this filter)
- play queue integration (add new episodes to the queue)

**use cases**:
- mute the "Bonus Content" view but keep notifications for main episodes
- auto-download only "Campaign 2" episodes, skip everything else
- separate unread counts: "5 unread in Main Episodes, 2 in Bonus"
- queue only certain series automatically

**schema additions**:
#+BEGIN_SRC typescript
interface FilterView {
  // ... existing fields
  settings: {
    notificationsEnabled: boolean;
    isMuted: boolean;
    autoDownload: boolean;
    autoQueue: boolean;
    downloadLimit?: number; // max episodes to keep
  };
  state: {
    unreadCount: number;
    lastViewedAt?: number;
    isCollapsed: boolean; // in sidebar
  };
}
#+END_SRC

*inheritance behavior*:
- new filter views inherit settings from the parent feed/group
- user can override per-view
- "mute all Critical Role" vs "mute only bonus episodes"

**** client-side episode caching strategy :ai:claude:

*architecture*: service worker-based transparent caching (a fetch-handler sketch follows below)

*flow*:
1. audio player requests ~/audio?url={episodeUrl}~
2. service worker intercepts the request
3. if present in cache (with Range header support):
   - serve from cache
4. else:
   - let the request continue to the server (immediate playback)
   - simultaneously start a background fetch of the full audio file
   - when complete, broadcast an "episode-cached" event
   - audio player catches the event and restarts the feed → now uses the cached version

**benefits**:
- no playback interruption (streaming starts immediately)
- seamless transition to the cached version
- Range header support for seeking/scrubbing
- transparent to the audio player implementation

*implementation considerations*:
- cache storage limits and cleanup policies
- partial download resumption if interrupted
- cache invalidation when episode URLs change
- offline playback support
- progress tracking for background downloads

**schema additions**:
#+BEGIN_SRC typescript
interface CachedEpisode {
  episodeId: string;
  originalUrl: string;
  cacheKey: string; // for cache API
  fileSize: number;
  cachedAt: number;
  lastAccessedAt: number;
  downloadProgress?: number; // 0-100 for in-progress downloads
}
#+END_SRC

**service worker events**:
- ~episode-cache-started~ - background download began
- ~episode-cache-progress~ - download progress update
- ~episode-cache-complete~ - ready to switch to the cached version
- ~episode-cache-error~ - download failed, stay with streaming
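
a minimal sketch of the service worker intercept for the flow above (Range handling is simplified to serving the full cached response; the broadcast channel name is an assumption):

#+BEGIN_SRC typescript
// Sketch only: transparent cache-then-stream for the /audio?url=... route
// above. Range handling is simplified (a cached full 200 response is served
// even for Range requests); the channel name "episode-cache" is an assumption.

declare const self: ServiceWorkerGlobalScope;

self.addEventListener("fetch", (event: FetchEvent) => {
  const url = new URL(event.request.url);
  if (url.pathname !== "/audio") return;

  event.respondWith(
    (async () => {
      const cache = await caches.open("episodes");
      const cached = await cache.match(url.href);
      if (cached) return cached; // step 3: serve from cache

      // step 4: stream from the network now, fill the cache in the background
      event.waitUntil(
        fetch(url.href)
          .then((full) => cache.put(url.href, full))
          .then(() =>
            new BroadcastChannel("episode-cache").postMessage({
              type: "episode-cache-complete",
              url: url.searchParams.get("url"),
            })
          )
      );
      return fetch(event.request);
    })()
  );
});
#+END_SRC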

**background sync for proactive downloads**:

**browser support reality**:
- Background Sync API: good support (Chrome/Edge, limited Safari)
- Periodic Background Sync: very limited (Chrome only, requires PWA install)
- Push notifications: good support, but requires user permission

**hybrid approach**:
1. **foreground sync** (reliable): when the app is open, check for new episodes
2. **background sync** (opportunistic): register a sync event when the app closes
3. **push notifications** (fallback): server pushes "new episodes available"
4. **manual sync** (always works): pull-to-refresh, settings toggle

**implementation strategy**:
#+BEGIN_SRC typescript
// Register background sync when app becomes hidden
document.addEventListener('visibilitychange', () => {
  if (document.hidden && 'serviceWorker' in navigator) {
    navigator.serviceWorker.ready.then(registration => {
      return registration.sync.register('download-episodes');
    });
  }
});

// Service worker handles sync event
self.addEventListener('sync', event => {
  if (event.tag === 'download-episodes') {
    event.waitUntil(syncEpisodes());
  }
});
#+END_SRC

**realistic expectations**:
- iOS Safari: very limited background processing
- Android Chrome: decent background sync support
- Desktop: mostly works
- battery/data saver modes: disabled by the OS

**fallback strategy**: rely primarily on foreground sync + push notifications; treat background sync as a nice-to-have enhancement

**push notification sync workflow**:

**server-side trigger**:
1. server detects new episodes during RSS refresh
2. check which users are subscribed to that feed
3. send a push notification with an episode metadata payload
4. the notification wakes up the service worker on the client

**service worker notification handler**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();

  if (data.type === 'new-episodes') {
    event.waitUntil(
      // Start background download of new episodes
      downloadNewEpisodes(data.episodes)
        .then(() => {
          // Show notification to user
          return self.registration.showNotification('New episodes available', {
            body: `${data.episodes.length} new episodes downloaded`,
            icon: '/icon-192.png',
            badge: '/badge-72.png',
            tag: 'new-episodes',
            data: { episodeIds: data.episodes.map(e => e.id) }
          });
        })
    );
  }
});

// Handle notification click
self.addEventListener('notificationclick', event => {
  event.notification.close();

  // Open app to specific episode or feed
  event.waitUntil(
    clients.openWindow(`/episodes/${event.notification.data.episodeIds[0]}`)
  );
});
#+END_SRC

**server push logic**:
- batch notifications (don't spam for every episode)
- respect user notification preferences from FilterView settings
- include episode metadata in the payload to avoid a round-trip
- throttle notifications (max 1 per feed per hour?)

**user flow**:
1. new episode published → server pushes a notification
2. service worker downloads the episode in the background
3. user sees a "New episodes downloaded" notification
4. tap the notification → opens the app to the new episode, ready to play offline

*benefits*:
- true background downloading without user interaction
- works even when the app is closed
- respects per-feed notification settings

**push payload size constraints**:
- **limit**: ~4KB (4,096 bytes) across most services
- **practical limit**: ~3KB to account for service overhead
- **implications for episode metadata**:

#+BEGIN_SRC json
{
  "type": "new-episodes",
  "episodes": [
    {
      "id": "ep123",
      "channelId": "ch456",
      "title": "Episode Title",
      "url": "https://...",
      "duration": 3600,
      "size": 89432112
    }
  ]
}
#+END_SRC

**payload optimization strategies**:
- minimal episode metadata in the push (id, url, basic info)
- batch multiple episodes in a single notification
- full episode details fetched after the service worker wakes up
- URL shortening for long episode URLs
- compress the JSON payload if needed

**alternative for large payloads**:
- push notification contains only a "new episodes available" signal
- service worker makes an API call to get the full episode list
- trade-off: requires a network round-trip but carries unlimited data

**logical clock sync optimization**:

a much simpler approach using sync revisions:

#+BEGIN_SRC json
{
  "type": "sync-available",
  "fromRevision": 12345,
  "toRevision": 12389,
  "changeCount": 8
}
#+END_SRC

**service worker sync flow**:
1. push notification wakes the service worker with a revision range
2. service worker fetches ~/sync?from=12345&to=12389~
3. server returns only the changes in that range (episodes, feed updates, etc)
4. service worker applies the changes to the local dexie store
5. service worker queues background downloads for new episodes
6. updates the local revision to 12389
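
a minimal sketch of steps 2-4 against dexie (assuming a change envelope like the ~SyncResponse~ interface in the next section; the type → table mapping is illustrative):

#+BEGIN_SRC typescript
// Sketch only: fetch a revision range and apply it to dexie (steps 2-4 above).
// Assumes a change envelope like the SyncResponse interface below; the
// type -> table mapping is illustrative.
import Dexie from "dexie";

interface SyncChange {
  type: "episode" | "channel" | "subscription";
  operation: "create" | "update" | "delete";
  data: any;
  revision: number;
}

const tableFor: Record<SyncChange["type"], string> = {
  episode: "channelEntries",
  channel: "channels",
  subscription: "subscriptions",
};

async function applySyncRange(db: Dexie, from: number, to: number) {
  const res = await fetch(`/sync?from=${from}&to=${to}`);
  const { changes } = (await res.json()) as { changes: SyncChange[] };

  await db.transaction("rw", Object.values(tableFor), async () => {
    for (const change of changes) {
      const table = db.table(tableFor[change.type]);
      if (change.operation === "delete") {
        await table.delete(change.data.id);
      } else {
        await table.put(change.data); // create/update are both upserts
      }
    }
  });
}
#+END_SRC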

**benefits of the revision-based approach**:
- tiny push payload (just revision numbers)
- server can efficiently return only the changes in range
- automatic deduplication (revision already applied = skip)
- works for any sync data (episodes, feed metadata, user settings)
- handles offline gaps gracefully (fetch the missing revision ranges)

**sync API response**:
#+BEGIN_SRC typescript
interface SyncResponse {
  fromRevision: number;
  toRevision: number;
  changes: Array<{
    type: 'episode' | 'channel' | 'subscription';
    operation: 'create' | 'update' | 'delete';
    data: any;
    revision: number;
  }>;
}
#+END_SRC

**integration with episode downloads**:
- service worker processes sync changes
- identifies new episodes that match the user's auto-download filters
- queues those for background cache fetching
- much more efficient than sending episode metadata in the push payload

**service worker processing time constraints**:

**hard limits**:
- **30 seconds idle timeout**: service worker terminates after 30s of inactivity
- **5 minutes event processing**: a single event/request must complete within 5 minutes
- **30 seconds fetch timeout**: individual network requests time out after 30s
- **notification requirement**: push events MUST display a notification before the promise settles

**practical implications**:
- the sync API call (~/sync?from=X&to=Y~) must complete within 30s
- large episode downloads must be queued, not started immediately in the push handler
- use ~event.waitUntil()~ to keep the service worker alive during processing
- break large operations into smaller chunks

**recommended push event flow**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();

  event.waitUntil(
    // Must complete within 5 minutes total
    handlePushSync(data)
      .then(() => {
        // Required: show notification before promise settles
        return self.registration.showNotification('Episodes synced');
      })
  );
});

async function handlePushSync(data) {
  // 1. Quick sync API call (< 30s)
  const res = await fetch(`/sync?from=${data.fromRevision}&to=${data.toRevision}`);
  const changes = await res.json();

  // 2. Apply changes to dexie store (fast, local)
  await applyChangesToStore(changes);

  // 3. Queue episode downloads for later (don't start here)
  await queueEpisodeDownloads(changes.newEpisodes);

  // Total time: < 5 minutes, preferably < 30s
}
#+END_SRC

*download strategy*: use the push event for sync + queuing, with separate background tasks for the actual downloads

*background fetch API for large downloads*:

*progressive enhancement approach*:
#+BEGIN_SRC typescript
async function queueEpisodeDownloads(episodes) {
  const registration = await navigator.serviceWorker.ready;
  for (const episode of episodes) {
    if ('BackgroundFetchManager' in self && 'backgroundFetch' in registration) {
      // Chrome/Edge: use Background Fetch API for true background downloading
      await registration.backgroundFetch.fetch(
        `episode-${episode.id}`,
        episode.url,
        {
          icons: [{ src: '/icon-256.png', sizes: '256x256', type: 'image/png' }],
          title: `Downloading: ${episode.title}`,
          downloadTotal: episode.fileSize
        }
      );
    } else {
      // Fallback: queue for reactive download (download while streaming)
      await queueReactiveDownload(episode);
    }
  }
}

// Handle background fetch completion (in the service worker)
self.addEventListener('backgroundfetchsuccess', event => {
  if (event.registration.id.startsWith('episode-')) {
    event.waitUntil(handleEpisodeDownloadComplete(event));
  }
});
#+END_SRC

*browser support reality*:
- *Chrome/Edge*: Background Fetch API supported
- *Firefox/Safari*: not supported, fall back to reactive caching
- *mobile*: varies by platform and browser

*benefits when available*:
- true background downloading (survives app close, browser close)
- built-in download progress UI
- automatic retry on network failure
- no service worker time limits during the download

*graceful degradation*:
- detect support, use when available
- fall back to reactive caching (download while streaming)
- user gets the best experience possible on their platform

*** research todos :ai:claude:

high-level unanswered questions from the architecture brainstorming:

**** sync and data management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

# Local Variables:
# org-hierarchical-todo-statistics: nil
# org-checkbox-hierarchical-statistics: nil
# End: