#+PROPERTY: COOKIE_DATA recursive
#+STARTUP: overview
most of this is old, I need to rework it
* design
** frontend (packages/app)
- http://localhost:7891
- proxies ~/api~ and ~/sync~ to the backend in development
- uses Dexie for local storage with sync plugin
- custom sync replication implementation using PeerJS through the signalling server
** backend (packages/server)
- http://localhost:7890
- serves ~/dist~ if the directory is present (see ~dist~ script)
- serves ~/api~ for RSS caching proxy
- file-based routing under the api directory
- serves ~/sync~ which is a ~peerjs~ signalling server
** sync
- each client keeps the full data set
- dexie sync and observable let us stream change sets
- we can publish the "latest" to all peers
- on first pull, if not the first client, we can request a dump out of band
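a minimal sketch of the change-set streaming idea, assuming the ~dexie-observable~ addon and the peerjs server mounted at ~/sync~; the peer bookkeeping and ~applyRemoteChanges~ helper are placeholders:
#+BEGIN_SRC typescript
import Dexie from 'dexie';
import 'dexie-observable';            // emits change sets for every local write
import Peer from 'peerjs';
import type { DataConnection } from 'peerjs';

const db = new Dexie('SkypodDB');     // schema defined elsewhere

// connect to the signalling server the backend mounts at /sync
const peer = new Peer({ host: location.hostname, port: 7890, path: '/sync' });
const connections = new Set<DataConnection>();

peer.on('connection', (conn) => {
  connections.add(conn);
  conn.on('data', (changes) => {
    // apply remote change sets locally (loop prevention / conflict handling omitted)
    // applyRemoteChanges(changes);
  });
  conn.on('close', () => connections.delete(conn));
});

// publish the "latest" change sets to every connected peer
db.on('changes', (changes) => {
  for (const conn of connections) conn.send(changes);
});
#+END_SRC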
*** rss feed data
- do we want to backup feed data?
- conceptually, this should be refetchable
- but feeds go away, and some will only show recent stories
- so yes, we'll need this
- but server side, we can dedupe
- content-addressed server-side cache?
- server side does RSS pulling
- can feeds be marked private, such that they won't be pulled through the proxy?
- but then we require everything to be fetchable via cors
- client configured proxy settings?
*** peer connection
- on startup, check for current realm-id and key pair
- if not present, ask to login or start new
- if login, run through the [[* pairing]] process
- if start new, run through the [[* registration]] process
- use keypair to authenticate to server
- response includes list of active peers to connect
- clients negotiate sync from there
- an identity is a keypair and a realm
- realm is uuid
- realm on the server is the socket connection for peer discovery
- keeps a list of verified public keys
- and manages the /current/ ~public-key->peer ids~ mapping
- realm on the client side is first piece of info required for sync
- when connecting to the signalling server, you present a realm, and a signed public key
- server accepts/rejects based on signature and current verified keys
- a new keypair can create a realm
- a new keypair can double sign an invitation
- invite = ~{ realm:, nonce:, not_before:, not_after:, authorizer: }~, signed with verified key
- exchanging an invite = ~{ invite: }~, signed with my key (type sketch after this list)
- on startup
- start stand-alone (no syncing required, usually the case on first-run)
- generate a keypair
- want server backup?
- sign a "setup" message with new keypair and send to the server
- server responds with a new realm for which this keypair is already verified
- move along
- exchange invite to sync to other devices
- generate a keypair
- sign the exchange message with the invite and send to the server
- server verifies the invite
- adds the new public key to the peer list and publishes downstream
- move along
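a rough type sketch of the invite and exchange messages from the list above; field names mirror the bullets, the ~sign~ helper is hypothetical:
#+BEGIN_SRC typescript
interface Invite {
  realm: string;        // realm uuid
  nonce: string;
  not_before: number;   // epoch ms
  not_after: number;    // epoch ms
  authorizer: string;   // fingerprint of the already-verified inviter key
}

interface Signed<T> {
  payload: T;
  signature: string;    // base64url signature over JSON.stringify(payload)
  signer: string;       // public key fingerprint of whoever signed
}

interface InviteExchange {
  invite: Signed<Invite>;  // the inviter-signed invite, wrapped and re-signed
}

// inviter: const invite: Signed<Invite> = await sign(inviteBody, verifiedKeypair);
// joiner:  const exchange: Signed<InviteExchange> = await sign({ invite }, myKeypair);
#+END_SRC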
***** standalone
in this mode, there is no syncing. this is the most likely first-time run option.
- generate a keypair on startup, so we have a stable fingerprint in the future
- done
***** pairing
in this mode, there is syncing to a named realm, but no server resources are necessarily consumed
we don't need an email, since the server is just doing signalling and peer management
- generate an invite from an existing verified peer
- ~{ realm:, not_before:, not_after:, inviter: peer.public_key }~
- sign that invitation from the existing verified peer
- standalone -> paired
- get the invitation somehow (QR code?)
- sign an invite exchange with the standalone's public key
- send to server
- server verifies the invite
- adds the new public key to the peer list and publishes downstream
***** server backup
in this mode, there is syncing to a named realm by email.
the goal of server backup mode is to go from an email to a fully working client with the latest data, without needing any clients left around that could participate in the sync.
- generate a keypair on startup
- sign a registration message sent to the server
- send a verification email
- if email/realm already exists, this is authorization
- if not, it's email validation
- server starts a realm and associates the public key
- server acts as a peer for the realm, and stores private data
- since dexie is publishing change sets, we should be able to just store deltas
- but we'll need to store _all_ deltas, unless we're materializing on the server side too
- should we use an IndexedDB shim so we can import/export from the server for a clean start?
- how much materialization does the server need?
* ai instructions
- when writing to the devlog, add tags to your entries specifying ~:ai:~ and what tool did it.
- false starts and prototypes are in ~./devlog/~
* notes and decision record [1/11]
** architecture design (may 28-29) :ai:claude:
detailed notes are in [[./devlog/may-29.org]]
key decisions and system design:
*** sync model
- device-specific records for playback state/queues to avoid conflicts
- content-addressed server cache with deduplication
- dual-JWT invitation flow for secure realm joining
*** data structures
- tag-based filtering system instead of rigid hierarchies
- regex patterns for episode title parsing and organization
- service worker caching with background download support
*** core schemas
**** client (dexie)
- Channel/ChannelEntry for RSS feeds and episodes
- PlayRecord/QueueItem scoped by deviceId
- FilterView for virtual feed organization
**** server (drizzle)
- ContentStore for deduplicated content by hash
- Realm/PeerConnection for sync authorization
- HttpCache with health tracking and TTL
*** push sync strategy
- revision-based sync (just send revision ranges in push notifications)
- background fetch API for large downloads where supported
- graceful degradation to reactive caching
*** research todos :ai:claude:
**** sync and data management
***** DONE identity and signature management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors
** <2025-05-28 Wed>
getting everything set up
the biggest open question I have is what sort of privacy/encryption guarantee I need. I want the server to be able to do things like cache and store feed data long-term.
Is "if you want full privacy, self-host" valid?
*** possibilities
- fully PWA
- CON: cors, which would require a proxy anyway
- CON: audio analysis, llm based stuff for categorization, etc. won't work
- PRO: private as all get out
- can still do WebRTC p2p sync for resiliency
- can still do server backups, if sync stream is encrypted, but no compaction would be available
- could do _explicit_ server backups as dump files
- self hostable
- PRO: can do bunches of private stuff on the server, because if you don't want me to see it, do it elsewhere
- CON: hard for folk to use
*** brainstorm :ai:claude:
**** sync conflict resolution design discussion :ai:claude:
discussed the sync architecture and dexie conflict handling:
*dexie syncable limitations*:
- logical clocks handle causally-related changes well
- basic timestamp-based conflict resolution for concurrent updates
- last-writer-wins for same field conflicts
- no sophisticated CRDT or vector clock support
*solutions for podcast-specific conflicts*:
- play records: device-specific approach
- store separate ~play_records~ per ~device_id~
- each record: ~{ episode_id, device_id, position, completed, timestamp }~
- UI handles conflict resolution with "continue from X device?" prompts
- avoids arbitrary timestamp wins, gives users control
- subscription trees
- store ~parent_path~ as single string field ("/Tech/Programming")
- simpler than managing folder membership tables
- conflicts still possible but contained to single field
- could store move operations as events for richer resolution
*other sync considerations*:
- settings/preferences: distinguish device-local vs global
- bulk operations: "mark all played" can create duplicate operations
- metadata updates: server RSS updates vs local renames
- temporal ordering: recently played lists, queue reordering
- storage limits: cleanup operations conflicting across devices
- feed state: refresh timestamps, error states
*approach*: prefer "events not state" pattern and device-specific records where semantic conflicts are likely
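a minimal sketch of the device-scoped play records described above, assuming a dexie ~playRecords~ table indexed on ~entryId~ and a ~deviceId~ established at first run:
#+BEGIN_SRC typescript
interface PlayRecord {
  id: string;          // `${entryId}:${deviceId}` — one row per (episode, device)
  entryId: string;
  deviceId: string;
  position: number;    // seconds
  completed: boolean;
  updatedAt: number;   // epoch ms
}

// writes only ever touch this device's row, so concurrent devices never conflict
async function savePosition(entryId: string, position: number) {
  await db.playRecords.put({
    id: `${entryId}:${deviceId}`,
    entryId,
    deviceId,
    position,
    completed: false,
    updatedAt: Date.now(),
  });
}

// "continue from X device?": hand every device's record to the UI and let the user pick
async function recordsForEpisode(entryId: string): Promise<PlayRecord[]> {
  return db.playRecords.where('entryId').equals(entryId).toArray();
}
#+END_SRC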
**** data model brainstorm :ai:claude:
core entities designed with sync in mind:
***** ~Feed~ :: RSS/podcast subscription
- ~parent_path~ field for folder structure (e.g. ~/Tech/Programming~)
- ~is_private~ flag to skip server proxy
- ~refresh_interval~ for custom update frequencies
***** ~Episode~ :: individual podcast episodes
- standard RSS metadata (guid, title, description, media url)
- duration and file info for playback
***** ~PlayRecord~ :: device-specific playback state
- separate record per ~device_id~ to avoid timestamp conflicts
- position, completed status, playback speed
- UI can prompt "continue from X device?" for resolution
***** ~QueueItem~ :: device-specific episode queue
- ordered list with position field
- ~device_id~ scoped to avoid queue conflicts
***** ~Subscription~ :: feed membership settings
- can be global or device-specific
- auto-download preferences per device
***** ~Settings~ :: split global vs device-local
- theme, default speed = global
- download path, audio device = device-local
***** Event tables for complex operations:
- ~FeedMoveEvent~ for folder reorganization
- ~BulkMarkPlayedEvent~ for "mark all read" operations
- better conflict resolution than direct state updates
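a sketch of what those event tables could look like; replaying ordered events is what makes the merge story simpler than last-writer-wins on a single row:
#+BEGIN_SRC typescript
interface FeedMoveEvent {
  id: string;
  channelId: string;
  fromPath: string;     // e.g. "/Tech"
  toPath: string;       // e.g. "/Tech/Programming"
  deviceId: string;
  occurredAt: number;   // epoch ms
}

interface BulkMarkPlayedEvent {
  id: string;
  channelId: string;
  beforePublishedAt: number;  // "mark everything older than this as played"
  deviceId: string;
  occurredAt: number;
}

// materialize the current folder path by replaying moves in time order
function currentPath(initial: string, events: FeedMoveEvent[]): string {
  const ordered = [...events].sort((a, b) => a.occurredAt - b.occurredAt);
  return ordered.length ? ordered[ordered.length - 1].toPath : initial;
}
#+END_SRC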
***** sync considerations
- device identity established on first run
- dexie syncable handles basic timestamp conflicts
- prefer device-scoped records for semantic conflicts
- event-driven pattern for bulk operations
**** schema evolution from previous iteration :ai:claude:
reviewed existing schema from tmp/feed.ts - a well-designed foundation:
***** keep from original
- Channel/ChannelEntry naming and structure
- ~refreshHP~ adaptive refresh system (much better than simple intervals)
- rich podcast metadata (people, tags, enclosure, podcast object)
- HTTP caching with etag/status tracking
- epoch millisecond timestamps
- ~hashId()~ approach for entry IDs
***** add for multi-device sync
- ~PlayState~ table (device-scoped position/completion)
- Subscription table (with ~parentPath~ for folders, device-scoped settings)
- ~QueueItem~ table (device-scoped episode queues)
- Device table (identity management)
***** migration considerations
- existing Channel/ChannelEntry can be preserved
- new tables are additive
- ~fetchAndUpsert~ method works well with server proxy architecture
- dexie sync vs rxdb - need to evaluate change tracking capabilities
**** content-addressed caching for offline resilience :ai:claude:
designed caching system for when upstream feeds fail/disappear, building on existing cache-schema.ts:
***** server-side schema evolution (drizzle sqlite):
- keep existing ~httpCacheTable~ design (health tracking, http headers, ttl)
- add ~contentHash~ field pointing to deduplicated content
- new ~contentStoreTable~: deduplicated blobs by sha256 hash
- new ~contentHistoryTable~: url -> contentHash timeline with isLatest flag
- reference counting for garbage collection
***** client-side OPFS storage
- ~/cache/content/{contentHash}.xml~ for raw feeds
- ~/cache/media/{contentHash}.mp3~ for podcast episodes
- ~LocalCacheEntry~ metadata tracks expiration and offline-only flags
- maintains last N versions per feed for historical access
***** fetch strategy & fallback
1. check local OPFS cache first (fastest)
2. try server proxy ~/api/feed?url={feedUrl}~ (deduplicated)
3. server checks ~contentHistory~, serves latest or fetches upstream
4. server returns ~{contentHash, content, cached: boolean}~
5. client stores with content hash as filename
6. emergency mode: serve stale content when upstream fails
- preserves existing health tracking and HTTP caching logic
- popular feeds cached once on server, many clients benefit
- bandwidth savings via content hash comparison
- historical feed state preservation (feeds disappear!)
- true offline operation after initial sync
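a sketch of the server-side proxy step, assuming node's global ~fetch~ and hypothetical ~lookupLatest~ / ~storeContent~ / ~recordHistory~ helpers over the tables above:
#+BEGIN_SRC typescript
import { createHash } from 'node:crypto';

interface FeedResponse { contentHash: string; content: string; cached: boolean; }

async function getFeed(url: string): Promise<FeedResponse> {
  // 1. serve the latest known copy if it's still fresh
  // const latest = await lookupLatest(url);            // contentHistory lookup
  // if (latest && latest.expiresAt > Date.now()) return { ...latest, cached: true };
  try {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`upstream ${res.status}`);
    const content = await res.text();
    const contentHash = createHash('sha256').update(content).digest('hex');
    // await storeContent(contentHash, content);        // dedupe by hash
    // await recordHistory(url, contentHash);           // url -> hash timeline, isLatest
    return { contentHash, content, cached: false };
  } catch (err) {
    // 6. emergency mode: serve stale content when upstream fails
    // if (latest) return { ...latest, cached: true };
    throw err;
  }
}
#+END_SRC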
** <2025-05-29 Thu> :ai:claude:
e2e encryption and invitation flow design
worked through the crypto and invitation architecture. key decisions:
*** keypair strategy
- use jwk format for interoperability (server stores public keys)
- ed25519 for signing, separate x25519 for encryption if needed
- zustand lazy initialization pattern: ~ensureKeypair()~ on first use
- store private jwk in persisted zustand state
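a minimal sketch of the lazy ~ensureKeypair()~ pattern with zustand's ~persist~ middleware; the store shape is illustrative:
#+BEGIN_SRC typescript
import { create } from 'zustand';
import { persist } from 'zustand/middleware';

interface KeyState {
  publicJWK: JsonWebKey | null;
  privateJWK: JsonWebKey | null;
  ensureKeypair: () => Promise<JsonWebKey>;  // resolves to the public jwk
}

export const useKeyStore = create<KeyState>()(
  persist(
    (set, get) => ({
      publicJWK: null,
      privateJWK: null,
      async ensureKeypair() {
        const existing = get().publicJWK;
        if (existing) return existing;
        const pair = (await crypto.subtle.generateKey(
          { name: 'Ed25519' }, true, ['sign', 'verify'],
        )) as CryptoKeyPair;
        const publicJWK = await crypto.subtle.exportKey('jwk', pair.publicKey);
        const privateJWK = await crypto.subtle.exportKey('jwk', pair.privateKey);
        set({ publicJWK, privateJWK });
        return publicJWK;
      },
    }),
    { name: 'identity' },  // persisted (localStorage by default)
  ),
);
#+END_SRC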
*** invitation flow: dual-jwt approach
solved the chicken-and-egg problem of sharing encryption keys securely.
**** qr code contains two signed jwts:
1. invitation token: ~{iss: inviter_fingerprint, sub: invitation_id, purpose: "realm_invite"}~
2. encryption key token: ~{iss: inviter_fingerprint, ephemeral_private: base64_key, purpose: "ephemeral_key"}~
**** exchange process:
1. invitee posts jwt1 + their public keys to ~/invitations~
2. server verifies jwt1 signature against realm members
3. if valid: adds invitee to realm, returns ~{realm_id, realm_members, encrypted_realm_key}~
4. invitee verifies jwt2 signature against returned realm members
5. invitee extracts ephemeral private key, decrypts realm encryption key
**** security properties:
- server never has decryption capability (missing ephemeral private key)
- both jwts must be signed by verified realm member
- if first exchange fails, second jwt is cryptographically worthless
- atomic operation: identity added only if invitation valid
- built-in expiration and tamper detection via jwt standard
**** considered alternatives:
- raw ephemeral keys in qr: simpler but no authenticity
- ecdh key agreement: chicken-and-egg problem with public key exchange
- server escrow: good but missing authentication layer
- password-based: requires secure out-of-band sharing
the dual-jwt approach provides proper authenticated invitations while maintaining e2e encryption properties.
**** refined dual-jwt with ephemeral signing
simplified the approach by using ephemeral key for second jwt signature:
**setup**:
1. inviter generates ephemeral keypair
2. encrypts realm key with ephemeral private key
3. posts to server: ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
**qr code contains**:
#+BEGIN_SRC json
// JWT 1: signed with inviter's realm signing key
{
  "realm_id": "uuid",
  "invitation_id": "uuid",
  "iss": "inviter_fingerprint"
}
// JWT 2: signed with ephemeral private key
{
  "ephemeral_private": "base64_key",
  "invitation_id": "uuid"
}
#+END_SRC
**exchange flow**:
1. submit jwt1 → server verifies against realm members → returns ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
2. verify jwt2 signature using ~ephemeral_public~ from server response
3. extract ~ephemeral_private~ from jwt2, decrypt realm key
**benefits over previous version**:
- no premature key disclosure (invitee keys shared via normal webrtc peering)
- self-contained verification (ephemeral public key verifies jwt2)
- cleaner separation of realm auth vs encryption key distribution
- simpler flow (no need to return realm member list)
**crypto verification principle**: digital signatures work as sign-with-private/verify-with-public, while encryption works as encrypt-with-public/decrypt-with-private. jwt2 verification uses signature verification, not decryption.
**invitation flow diagram**:
#+BEGIN_SRC mermaid
sequenceDiagram
    participant I as Inviter
    participant S as Server
    participant E as Invitee

    Note over I: Generate ephemeral keypair
    I->>I: ephemeral_private, ephemeral_public
    Note over I: Encrypt realm key
    I->>I: encrypted_realm_key = encrypt(realm_key, ephemeral_private)
    I->>S: POST /invitations {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
    S-->>I: OK

    Note over I: Create JWTs for QR code
    I->>I: jwt1 = sign({realm_id, invitation_id}, inviter_private)
    I->>I: jwt2 = sign({ephemeral_private, invitation_id}, ephemeral_private)
    Note over I,E: QR code contains [jwt1, jwt2]

    E->>S: POST /invitations/exchange {jwt1}
    Note over S: Verify jwt1 signature<br/>against realm members
    S-->>E: {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}

    Note over E: Verify jwt2 signature<br/>using ephemeral_public
    E->>E: verify_signature(jwt2, ephemeral_public)
    Note over E: Extract key and decrypt
    E->>E: ephemeral_private = decode(jwt2)
    E->>E: realm_key = decrypt(encrypted_realm_key, ephemeral_private)
    Note over E: Now member of realm!
#+END_SRC
**** jwk keypair generation and validation :ai:claude:
discussed jwk vs raw crypto.subtle for keypair storage. since public keys need server storage for realm authorization, jwk is better for interoperability.
**keypair generation**:
#+BEGIN_SRC typescript
const keypair = await crypto.subtle.generateKey(
  { name: "Ed25519" },
  true,
  ["sign", "verify"]
);
const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);

// JWK format:
// {
//   "kty": "OKP",
//   "crv": "Ed25519",
//   "x": "base64url-encoded-public-key",
//   "d": "base64url-encoded-private-key" // only in private JWK
// }
#+END_SRC
**client validation**:
#+BEGIN_SRC typescript
function isValidEd25519PublicJWK(jwk: any): boolean {
  return (
    typeof jwk === 'object' &&
    jwk !== null &&
    jwk.kty === 'OKP' &&
    jwk.crv === 'Ed25519' &&
    typeof jwk.x === 'string' &&
    jwk.x.length === 43 &&          // base64url Ed25519 public key length
    !jwk.d &&                       // public key shouldn't have private component
    (!jwk.use || jwk.use === 'sig') // parenthesized so a stray `use` can't bypass the other checks
  );
}

async function validatePublicKey(publicJWK: JsonWebKey): Promise<CryptoKey | null> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return null;
    const key = await crypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );
    return key;
  } catch {
    return null;
  }
}
#+END_SRC
**server validation (node.js)**:
#+BEGIN_SRC typescript
import { webcrypto } from 'node:crypto';

async function validateClientPublicKey(publicJWK: JsonWebKey): Promise<boolean> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return false;
    await webcrypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );
    return true;
  } catch {
    return false;
  }
}
#+END_SRC
**authentication flow**:
#+BEGIN_SRC typescript
// client signs message
const authMessage = {
  realm: 'uuid-here',
  timestamp: Date.now(),
  action: 'join'
};
const signature = await crypto.subtle.sign(
  'Ed25519',
  privateKey,
  new TextEncoder().encode(JSON.stringify(authMessage))
);

// server verifies
interface AuthRequest {
  publicKey: JsonWebKey;
  message: unknown;      // the signed authMessage
  signature: number[];   // signature bytes as sent over the wire
}

async function verifyAuth(req: AuthRequest): Promise<boolean> {
  const publicKey = await webcrypto.subtle.importKey(
    'jwk',
    req.publicKey,
    { name: 'Ed25519' },
    false,
    ['verify']
  );
  const messageBytes = new TextEncoder().encode(JSON.stringify(req.message));
  const signatureBytes = new Uint8Array(req.signature);
  return await webcrypto.subtle.verify(
    'Ed25519',
    publicKey,
    signatureBytes,
    messageBytes
  );
}
#+END_SRC
**** proposed schemas :ai:claude:
***** client-side schema (dexie)
#+BEGIN_SRC typescript
// Core RSS/Podcast data (from your existing design)
interface Channel {
id: string;
feedUrl: string;
htmlUrl?: string;
imageUrl?: string;
title?: string;
description?: string;
language?: string;
people?: Record<string, unknown>;
tags?: string[];
// Refresh management
refreshHP: number;
nextRefreshAt?: number;
lastRefreshAt?: number;
lastRefreshStatus?: string;
lastRefreshHttpStatus?: number;
lastRefreshHttpEtag?: string;
// Cache info
contentHash?: string;
lastFetchedAt?: number;
}
interface ChannelEntry {
id: string;
channelId: string;
guid: string;
title: string;
linkUrl?: string;
imageUrl?: string;
snippet?: string;
content?: string;
enclosure?: {
url: string;
type?: string;
length?: number;
};
podcast?: {
explicit?: boolean;
duration?: string;
seasonNum?: number;
episodeNum?: number;
transcriptUrl?: string;
};
publishedAt?: number;
fetchedAt?: number;
}
// Device-specific sync tables
interface PlayRecord {
id: string;
entryId: string;
deviceId: string;
position: number;
duration?: number;
completed: boolean;
speed: number;
updatedAt: number;
}
interface Subscription {
id: string;
channelId: string;
deviceId?: string;
parentPath: string; // "/Tech/Programming"
autoDownload: boolean;
downloadLimit?: number;
isActive: boolean;
createdAt: number;
updatedAt: number;
}
interface QueueItem {
id: string;
entryId: string;
deviceId: string;
position: number;
addedAt: number;
}
interface Device {
id: string;
name: string;
platform: string;
lastSeen: number;
}
// Local cache metadata
interface LocalCache {
id: string;
url: string;
contentHash: string;
filePath: string; // OPFS path
cachedAt: number;
expiresAt?: number;
size: number;
isOfflineOnly: boolean;
}
// Dexie schema
const db = new Dexie('SkypodDB');
db.version(1).stores({
channels: '&id, feedUrl, contentHash',
channelEntries: '&id, channelId, publishedAt',
playRecords: '&id, [entryId+deviceId], deviceId, updatedAt',
subscriptions: '&id, channelId, deviceId, parentPath',
queueItems: '&id, entryId, deviceId, position',
devices: '&id, lastSeen',
localCache: '&id, url, contentHash, expiresAt'
});
#+END_SRC
***** server-side schema
#+BEGIN_SRC typescript
// Content-addressed cache
interface ContentStore {
contentHash: string; // Primary key
content: Buffer; // Raw feed content
contentType: string;
contentLength: number;
firstSeenAt: number;
referenceCount: number;
}
interface ContentHistory {
id: string;
url: string;
contentHash: string;
fetchedAt: number;
isLatest: boolean;
}
// HTTP cache with health tracking (from your existing design)
interface HttpCache {
key: string; // URL hash, primary key
url: string;
status: 'alive' | 'dead';
lastFetchedAt: number;
lastFetchError?: string;
lastFetchErrorStreak: number;
lastHttpStatus: number;
lastHttpEtag?: string;
lastHttpHeaders: Record<string, string>;
expiresAt: number;
expirationTtl: number;
contentHash: string; // Points to ContentStore
}
// Sync/auth tables
interface Realm {
id: string; // UUID
createdAt: number;
verifiedKeys: string[]; // Public key list
}
interface PeerConnection {
id: string;
realmId: string;
publicKey: string;
lastSeen: number;
isOnline: boolean;
}
// Media cache for podcast episodes
interface MediaCache {
contentHash: string; // Primary key
originalUrl: string;
mimeType: string;
fileSize: number;
content: Buffer;
cachedAt: number;
accessCount: number;
}
#+END_SRC
**** episode title parsing for sub-feed groupings :ai:claude:
*problem*: some podcast feeds contain multiple shows, need hierarchical organization within a feed
*example*: "Apocalypse Players" podcast
- episode title: "A Term of Art 6 - Winston's Hollow"
- desired grouping: "Apocalypse Players > A Term of Art > 6 - Winston's Hollow"
- UI shows sub-shows within the main feed
***** approaches considered
1. *manual regex patterns* (short-term solution)
- user provides regex with capture groups = tags
- reliable, immediate, user-controlled
- requires manual setup per feed
2. *LLM-generated regex* (automation goal)
- analyze last 100 episode titles
- generate regex pattern automatically
- good balance of automation + reliability
3. *NER model training* (experimental)
- train spacy model for episode title parsing
- current prototype: 150 labelled examples, limited success
- needs more training data to be viable
***** data model implications
- add regex pattern field to Channel/Feed
- store extracted groupings as hierarchical tags on ~ChannelEntry~
- maybe add grouping/series field to episodes
***** plan
*preference*: start with manual regex, evolve toward LLM automation
*implementation design*:
- if no title pattern: episodes are direct children of the feed
- title pattern = regex with named capture groups + path template
*example configuration*:
- regex: ~^(?<series>[^0-9]+)\s*(?<episode>\d+)\s*-\s*(?<title>.+)$~
- path template: ~{series} > Episode {episode} - {title}~
- result: "A Term of Art 6 - Winston's Hollow" → "A Term of Art > Episode 6 - Winston's Hollow"
*schema additions*:
#+BEGIN_SRC typescript
interface Channel {
// ... existing fields
titlePatterns?: Array<{
name: string; // "Main Episodes", "Bonus Content", etc.
regex: string; // named capture groups
pathTemplate: string; // interpolation template
priority: number; // order to try patterns (lower = first)
isActive: boolean; // can disable without deleting
}>;
fallbackPath?: string; // template for unmatched episodes
}
interface ChannelEntry {
// ... existing fields
parsedPath?: string; // computed from titlePattern
parsedGroups?: Record<string, string>; // captured groups
matchedPatternName?: string; // which pattern was used
}
#+END_SRC
*pattern matching logic*:
1. try patterns in priority order (lower number = higher priority)
2. first matching pattern wins
3. if no patterns match, use fallbackPath template (e.g., "Misc > {title}")
4. if no fallbackPath, episode stays direct child of feed
*example multi-pattern setup*:
- Pattern 1: "Main Episodes" - ~^(?<series>[^0-9]+)\s*(?<episode>\d+)~ → ~{series} > Episode {episode}~
- Pattern 2: "Bonus Content" - ~^Bonus:\s*(?<title>.+)~ → ~Bonus > {title}~
- Fallback: ~Misc > {title}~
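a sketch of the matching logic above; ~TitlePattern~ mirrors the ~titlePatterns~ schema addition and the template interpolation is a naive ~{name}~ replace:
#+BEGIN_SRC typescript
interface TitlePattern {
  name: string;
  regex: string;         // named capture groups
  pathTemplate: string;  // e.g. "{series} > Episode {episode} - {title}"
  priority: number;
  isActive: boolean;
}

function parseTitle(
  title: string,
  patterns: TitlePattern[],
  fallbackPath?: string,
): { path?: string; groups?: Record<string, string>; matchedPatternName?: string } {
  const active = patterns
    .filter(p => p.isActive)
    .sort((a, b) => a.priority - b.priority);   // lower number = higher priority
  for (const p of active) {
    const m = title.match(new RegExp(p.regex));
    if (m?.groups) {
      const groups = { title, ...m.groups };    // captured groups win over the raw title
      const path = p.pathTemplate.replace(/\{(\w+)\}/g, (_, k) => groups[k] ?? '');
      return { path, groups, matchedPatternName: p.name };
    }
  }
  if (fallbackPath) return { path: fallbackPath.replace('{title}', title) };
  return {};  // no match, no fallback: episode stays a direct child of the feed
}
#+END_SRC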
**** scoped tags and filter-based UI evolution :ai:claude:
*generalization*: move from rigid hierarchies to tag-based filtering system
*tag scoping*:
- feed-level tags: "Tech", "Gaming", "D&D"
- episode-level tags: from regex captures like "series:CriticalRole", "campaign:2", "type:main"
- user tags: manual additions like "favorites", "todo"
*UI as tag filtering*:
- default view: all episodes grouped by feed
- filter by ~series:CriticalRole~ → shows only CR episodes across all feeds
- filter by ~type:bonus~ → shows bonus content from all podcasts
- combine filters: ~series:CriticalRole AND type:main~ → main CR episodes only
*benefits*:
- no rigid hierarchy - users create their own views
- regex patterns become automated episode taggers
- same filtering system works for search, organization, queues
- tags are syncable metadata, views are client-side
*schema evolution*:
#+BEGIN_SRC typescript
interface Tag {
scope: 'feed' | 'episode' | 'user';
key: string; // "series", "type", "campaign"
value: string; // "CriticalRole", "bonus", "2"
}
interface ChannelEntry {
// ... existing
tags: Tag[]; // includes regex-generated + manual
}
interface FilterView {
id: string;
name: string;
folderPath: string; // "/Channels/Critical Role"
filters: Array<{
key: string;
value: string;
operator: 'equals' | 'contains' | 'not';
}>;
isDefault: boolean;
createdAt: number;
}
#+END_SRC
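a sketch of evaluating a ~FilterView~ against an entry's tags (AND across filters, as in the combined-filter example above):
#+BEGIN_SRC typescript
function matchesFilter(entry: ChannelEntry, view: FilterView): boolean {
  return view.filters.every(({ key, value, operator }) => {
    const values = entry.tags.filter(t => t.key === key).map(t => t.value);
    if (operator === 'equals') return values.includes(value);
    if (operator === 'contains') return values.some(v => v.includes(value));
    return !values.includes(value);  // 'not'
  });
}

// e.g. series:CriticalRole AND type:main
// const mainCR = entries.filter(e => matchesFilter(e, crMainView));
#+END_SRC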
**** default UI construction and feed merging :ai:claude:
*auto-generated views on subscribe*:
- subscribe to "Critical Role" → creates ~/Channels/Critical Role~ folder
- default filter view: ~feed:CriticalRole~ (shows all episodes from that feed)
- user can customize, split into sub-views, or delete
*smart view suggestions*:
- after regex patterns generate tags, suggest splitting views
- "I noticed episodes with ~series:Campaign2~ and ~series:Campaign3~ - create separate views?"
- "Create view for ~type:bonus~ episodes?"
*view management UX*:
- right-click feed → "Split by series", "Split by type"
- drag episodes between views to create manual filters
- views can be nested: ~/Channels/Critical Role/Campaign 2/Main Episodes~
*feed merging for multi-source shows*:
problem: patreon feed + main show feed for same podcast
#+BEGIN_EXAMPLE
/Channels/
Critical Role/
All Episodes # merged view: feed:CriticalRole OR feed:CriticalRolePatreon
Main Feed # filter: feed:CriticalRole
Patreon Feed # filter: feed:CriticalRolePatreon
#+END_EXAMPLE
*deduplication strategy*:
- episodes matched by ~guid~ or similar content hash
- duplicate episodes get ~source:main,patreon~ tags
- UI shows single episode with source indicators
- user can choose preferred source for playback
- play state syncs across all sources of same episode
*feed relationship schema*:
#+BEGIN_SRC typescript
interface FeedGroup {
id: string;
name: string; // "Critical Role"
feedIds: string[]; // [mainFeedId, patreonFeedId]
mergeStrategy: 'guid' | 'title' | 'contentHash';
defaultView: FilterView;
}
interface ChannelEntry {
// ... existing
duplicateOf?: string; // points to canonical episode ID
sources: string[]; // feed IDs where this episode appears
}
#+END_SRC
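a sketch of the guid-based dedupe pass, assuming the ~guid~/~channelId~ fields from the client schema plus the ~duplicateOf~/~sources~ fields above:
#+BEGIN_SRC typescript
function dedupeByGuid(entries: ChannelEntry[]): ChannelEntry[] {
  const canonicalByGuid = new Map<string, ChannelEntry>();
  for (const entry of entries) {
    const canonical = canonicalByGuid.get(entry.guid);
    if (!canonical) {
      canonicalByGuid.set(entry.guid, entry);
      entry.sources = [entry.channelId];
    } else {
      // same episode from another feed: record the extra source, mark the duplicate
      canonical.sources = [...new Set([...canonical.sources, entry.channelId])];
      entry.duplicateOf = canonical.id;
      entry.sources = [entry.channelId];
    }
  }
  return [...canonicalByGuid.values()];  // one row per episode for the UI
}
#+END_SRC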
**per-view settings and state**:
each filter view acts like a virtual feed with its own:
- unread counts (episodes matching filter that haven't been played)
- notification settings (notify for new episodes in this view)
- muted state (hide notifications, mark as read automatically)
- auto-download preferences (download episodes that match this filter)
- play queue integration (add new episodes to queue)
**use cases**:
- mute "Bonus Content" view but keep notifications for main episodes
- auto-download only "Campaign 2" episodes, skip everything else
- separate unread counts: "5 unread in Main Episodes, 2 in Bonus"
- queue only certain series automatically
**schema additions**:
#+BEGIN_SRC typescript
interface FilterView {
// ... existing fields
settings: {
notificationsEnabled: boolean;
isMuted: boolean;
autoDownload: boolean;
autoQueue: boolean;
downloadLimit?: number; // max episodes to keep
};
state: {
unreadCount: number;
lastViewedAt?: number;
isCollapsed: boolean; // in sidebar
};
}
#+END_SRC
*inheritance behavior*:
- new filter views inherit settings from parent feed/group
- user can override per-view
- "mute all Critical Role" vs "mute only bonus episodes"
**** client-side episode caching strategy :ai:claude:
*architecture*: service worker-based transparent caching
*flow*:
1. audio player requests ~/audio?url={episodeUrl}~
2. service worker intercepts request
3. if present in cache (with Range header support):
- serve from cache
4. else:
- let request continue to server (immediate playback)
- simultaneously start background fetch of full audio file
- when complete, broadcast an ~episode-cache-complete~ event
- audio player catches event and restarts feed → now uses cached version
**benefits**:
- no playback interruption (streaming starts immediately)
- seamless transition to cached version
- Range header support for seeking/scrubbing
- transparent to audio player implementation
*implementation considerations*:
- cache storage limits and cleanup policies
- partial download resumption if interrupted
- cache invalidation when episode URLs change
- offline playback support
- progress tracking for background downloads
**schema additions**:
#+BEGIN_SRC typescript
interface CachedEpisode {
episodeId: string;
originalUrl: string;
cacheKey: string; // for cache API
fileSize: number;
cachedAt: number;
lastAccessedAt: number;
downloadProgress?: number; // 0-100 for in-progress downloads
}
#+END_SRC
**service worker events**:
- ~episode-cache-started~ - background download began
- ~episode-cache-progress~ - download progress update
- ~episode-cache-complete~ - ready to switch to cached version
- ~episode-cache-error~ - download failed, stay with streaming
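a sketch of the intercepting fetch handler described above; the ~/audio~ route and event names follow these notes, Range handling and cache eviction are elided:
#+BEGIN_SRC typescript
self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);
  if (url.pathname !== '/audio') return;
  event.respondWith((async () => {
    const cache = await caches.open('episodes');
    const cached = await cache.match(event.request.url);
    if (cached) return cached;                 // cached: serve locally
    // cache miss: stream from the network immediately...
    const streaming = fetch(event.request);
    // ...and pull the full file in the background, then tell the app
    event.waitUntil(
      fetch(event.request.url).then(async full => {
        if (!full.ok) return;
        await cache.put(event.request.url, full.clone());
        const windows = await self.clients.matchAll();
        windows.forEach(c =>
          c.postMessage({ type: 'episode-cache-complete', url: url.searchParams.get('url') })
        );
      })
    );
    return streaming;
  })());
});
#+END_SRC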
**background sync for proactive downloads**:
**browser support reality**:
- Background Sync API: good support (Chrome/Edge, limited Safari)
- Periodic Background Sync: very limited (Chrome only, requires PWA install)
- Push notifications: good support, but requires user permission
**hybrid approach**:
1. **foreground sync** (reliable): when app is open, check for new episodes
2. **background sync** (opportunistic): register sync event when app closes
3. **push notifications** (fallback): server pushes "new episodes available"
4. **manual sync** (always works): pull-to-refresh, settings toggle
**implementation strategy**:
#+BEGIN_SRC typescript
// Register background sync when app becomes hidden
document.addEventListener('visibilitychange', () => {
  if (document.hidden && 'serviceWorker' in navigator) {
    navigator.serviceWorker.ready.then(registration => {
      return registration.sync.register('download-episodes');
    });
  }
});

// Service worker handles sync event
self.addEventListener('sync', event => {
  if (event.tag === 'download-episodes') {
    event.waitUntil(syncEpisodes());
  }
});
#+END_SRC
**realistic expectations**:
- iOS Safari: very limited background processing
- Android Chrome: decent background sync support
- Desktop: mostly works
- battery/data saver modes: disabled by OS
**fallback strategy**: rely primarily on foreground sync + push notifications, treat background sync as nice-to-have enhancement
**push notification sync workflow**:
**server-side trigger**:
1. server detects new episodes during RSS refresh
2. check which users are subscribed to that feed
3. send push notification with episode metadata payload
4. notification wakes up service worker on client
**service worker notification handler**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();
  if (data.type === 'new-episodes') {
    event.waitUntil(
      // Start background download of new episodes
      downloadNewEpisodes(data.episodes)
        .then(() => {
          // Show notification to user
          return self.registration.showNotification('New episodes available', {
            body: `${data.episodes.length} new episodes downloaded`,
            icon: '/icon-192.png',
            badge: '/badge-72.png',
            tag: 'new-episodes',
            data: { episodeIds: data.episodes.map(e => e.id) }
          });
        })
    );
  }
});

// Handle notification click
self.addEventListener('notificationclick', event => {
  event.notification.close();
  // Open app to specific episode or feed
  event.waitUntil(
    clients.openWindow(`/episodes/${event.notification.data.episodeIds[0]}`)
  );
});
#+END_SRC
**server push logic**:
- batch notifications (don't spam for every episode)
- respect user notification preferences from FilterView settings
- include episode metadata in payload to avoid round-trip
- throttle notifications (max 1 per feed per hour?)
**user flow**:
1. new episode published → server pushes notification
2. service worker downloads episode in background
3. user sees "New episodes downloaded" notification
4. tap notification → opens app to new episode, ready to play offline
*benefits*:
- true background downloading without user interaction
- works even when app is closed
- respects per-feed notification settings
**push payload size constraints**:
- **limit**: ~4KB (4,096 bytes) across most services
- **practical limit**: ~3KB to account for service overhead
- **implications for episode metadata**:
#+BEGIN_SRC json
{
  "type": "new-episodes",
  "episodes": [
    {
      "id": "ep123",
      "channelId": "ch456",
      "title": "Episode Title",
      "url": "https://...",
      "duration": 3600,
      "size": 89432112
    }
  ]
}
#+END_SRC
**payload optimization strategies**:
- minimal episode metadata in push (id, url, basic info)
- batch multiple episodes in single notification
- full episode details fetched after service worker wakes up
- URL shortening for long episode URLs
- compress JSON payload if needed
**alternative for large payloads**:
- push notification contains only "new episodes available" signal
- service worker makes API call to get full episode list
- trade-off: requires network round-trip but unlimited data
**logical clock sync optimization**:
much simpler approach using sync revisions:
#+BEGIN_SRC json
{
  "type": "sync-available",
  "fromRevision": 12345,
  "toRevision": 12389,
  "changeCount": 8
}
#+END_SRC
**service worker sync flow**:
1. push notification wakes service worker with revision range
2. service worker fetches ~/sync?from=12345&to=12389~
3. server returns only changes in that range (episodes, feed updates, etc)
4. service worker applies changes to local dexie store
5. service worker queues background downloads for new episodes
6. updates local revision to 12389
**benefits of revision-based approach**:
- tiny push payload (just revision numbers)
- server can efficiently return only changes in range
- automatic deduplication (revision already applied = skip)
- works for any sync data (episodes, feed metadata, user settings)
- handles offline gaps gracefully (fetch missing revision ranges)
**sync API response**:
#+BEGIN_SRC typescript
interface SyncResponse {
  fromRevision: number;
  toRevision: number;
  changes: Array<{
    type: 'episode' | 'channel' | 'subscription';
    operation: 'create' | 'update' | 'delete';
    data: any;
    revision: number;
  }>;
}
#+END_SRC
**integration with episode downloads**:
- service worker processes sync changes
- identifies new episodes that match user's auto-download filters
- queues those for background cache fetching
- much more efficient than sending episode metadata in push payload
**service worker processing time constraints**:
**hard limits**:
- **30 seconds idle timeout**: service worker terminates after 30s of inactivity
- **5 minutes event processing**: single event/request must complete within 5 minutes
- **30 seconds fetch timeout**: individual network requests timeout after 30s
- **notification requirement**: push events MUST display notification before promise settles
**practical implications**:
- sync API call (~/sync?from=X&to=Y~) must complete within 30s
- large episode downloads must be queued, not started immediately in push handler
- use ~event.waitUntil()~ to keep service worker alive during processing
- break large operations into smaller chunks
**recommended push event flow**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();
  event.waitUntil(
    // Must complete within 5 minutes total
    handlePushSync(data)
      .then(() => {
        // Required: show notification before promise settles
        return self.registration.showNotification('Episodes synced');
      })
  );
});

async function handlePushSync(data) {
  // 1. Quick sync API call (< 30s)
  const res = await fetch(`/sync?from=${data.fromRevision}&to=${data.toRevision}`);
  const changes = await res.json();
  // 2. Apply changes to dexie store (fast, local)
  await applyChangesToStore(changes);
  // 3. Queue episode downloads for later (don't start here)
  await queueEpisodeDownloads(changes.newEpisodes);
  // Total time: < 5 minutes, preferably < 30s
}
#+END_SRC
*download strategy*: use push event for sync + queuing, separate background tasks for actual downloads
*background fetch API for large downloads*:
*progressive enhancement approach*:
#+BEGIN_SRC typescript
async function queueEpisodeDownloads(episodes) {
  for (const episode of episodes) {
    if ('serviceWorker' in navigator && 'BackgroundFetchManager' in window) {
      // Chrome/Edge: use Background Fetch API for true background downloading
      await navigator.serviceWorker.ready.then(registration => {
        return registration.backgroundFetch.fetch(
          `episode-${episode.id}`,
          episode.url,
          {
            icons: [{ src: '/icon-256.png', sizes: '256x256', type: 'image/png' }],
            title: `Downloading: ${episode.title}`,
            downloadTotal: episode.fileSize
          }
        );
      });
    } else {
      // Fallback: queue for reactive download (download while streaming)
      await queueReactiveDownload(episode);
    }
  }
}

// Service worker side: handle background fetch completion
self.addEventListener('backgroundfetchsuccess', event => {
  if (event.registration.id.startsWith('episode-')) {
    event.waitUntil(handleEpisodeDownloadComplete(event));
  }
});
#+END_SRC
*browser support reality*:
- *Chrome/Edge*: Background Fetch API supported
- *Firefox/Safari*: not supported, fallback to reactive caching
- *mobile*: varies by platform and browser
*benefits when available*:
- true background downloading (survives app close, browser close)
- built-in download progress UI
- automatic retry on network failure
- no service worker time limits during download
*graceful degradation*:
- detect support, use when available
- fallback to reactive caching (download while streaming)
- user gets best experience possible on their platform
*** research todos :ai:claude:
high-level unanswered questions from architecture brainstorming:
**** sync and data management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors
# Local Variables:
# org-hierarchical-todo-statistics: nil
# org-checkbox-hierarchical-statistics: nil
# End: