personal activity index (bluesky, leaflet, substack) pai.desertthunder.dev

Personal Activity Index CLI – Roadmap & Tasks

Objective: Build a POSIX-style Rust CLI that ingests content from Substack, Bluesky, and Leaflet into SQLite, with an optional Cloudflare Worker + D1 deployment path.

Targets:

  • Self-host: single binary + SQLite.
  • Cloudflare: Rust Worker + D1 + Cron triggers.

Workspace & Architecture

Goal: Shared core library, CLI frontend, and Worker frontend, with clear separation of concerns.

  • Create Cargo workspace layout:
    • core/ – shared types, fetchers, and storage traits.
    • cli/ – POSIX-style binary (pai).
    • worker/ – Cloudflare Worker using workers-rs.
  • In core/:
    • Define SourceKind enum: substack, bluesky, leaflet.
    • Define Item struct with fields:
      • id, source_kind, source_id, author, title, summary, url, content_html, published_at, created_at.
    • Define Storage trait with at minimum:
      • insert_or_replace_item(&self, item: &Item) -> Result<()>
      • list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>
    • Define SourceFetcher trait:
      • fn sync(&self, storage: &dyn Storage) -> Result<()>
  • In cli/:
    • Add argument parsing that follows POSIX conventions:
      • Options of the form -h, -V, -C dir, -d path, etc.
      • Options come before operands/subcommands where possible.
    • Define subcommands (as operands) with their own POSIX-style options:
      • sync
      • list
      • export
      • serve
  • In core/:
    • Implement sync_all_sources(config, storage) that calls each fetcher.
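
The core types and traits listed above could look roughly like the following sketch. It uses only the standard library. `PaiResult` (standing in for `Result`), the in-memory `MemoryStorage`, the `FixedFetcher`, and the reduced `ListFilter` fields are assumptions made for illustration, and `sync_all_sources` takes a slice of fetchers here instead of a config value.

```rust
use std::cell::RefCell;
use std::error::Error;

/// Hypothetical alias standing in for `Result<T>` in the trait signatures.
pub type PaiResult<T> = std::result::Result<T, Box<dyn Error>>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Substack,
    Bluesky,
    Leaflet,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: String,
    pub source_kind: SourceKind,
    pub source_id: String,
    pub author: Option<String>,
    pub title: Option<String>,
    pub summary: Option<String>,
    pub url: String,
    pub content_html: Option<String>,
    pub published_at: String, // ISO 8601
    pub created_at: String,   // ISO 8601
}

/// Assumed filter shape; the real one would also carry source_id, since, q.
#[derive(Debug, Default)]
pub struct ListFilter {
    pub source_kind: Option<SourceKind>,
    pub limit: Option<usize>,
}

pub trait Storage {
    fn insert_or_replace_item(&self, item: &Item) -> PaiResult<()>;
    fn list_items(&self, filter: &ListFilter) -> PaiResult<Vec<Item>>;
}

pub trait SourceFetcher {
    fn sync(&self, storage: &dyn Storage) -> PaiResult<()>;
}

/// Runs every fetcher in order and stops at the first error.
pub fn sync_all_sources(
    fetchers: &[Box<dyn SourceFetcher>],
    storage: &dyn Storage,
) -> PaiResult<()> {
    for fetcher in fetchers {
        fetcher.sync(storage)?;
    }
    Ok(())
}

/// Illustration-only storage that keeps items in memory.
#[derive(Default)]
pub struct MemoryStorage {
    items: RefCell<Vec<Item>>,
}

impl Storage for MemoryStorage {
    fn insert_or_replace_item(&self, item: &Item) -> PaiResult<()> {
        let mut items = self.items.borrow_mut();
        // Drop any existing row with the same id, then append the new one.
        items.retain(|existing| existing.id != item.id);
        items.push(item.clone());
        Ok(())
    }

    fn list_items(&self, filter: &ListFilter) -> PaiResult<Vec<Item>> {
        let items = self.items.borrow();
        let out: Vec<Item> = items
            .iter()
            .filter(|i| filter.source_kind.map_or(true, |k| i.source_kind == k))
            .take(filter.limit.unwrap_or(usize::MAX))
            .cloned()
            .collect();
        Ok(out)
    }
}

/// Illustration-only fetcher that always "fetches" one fixed item.
pub struct FixedFetcher(pub Item);

impl SourceFetcher for FixedFetcher {
    fn sync(&self, storage: &dyn Storage) -> PaiResult<()> {
        storage.insert_or_replace_item(&self.0)
    }
}
```

Because `Storage` takes `&self`, an implementation needs interior mutability or an internally synchronized connection; the `RefCell` above is the simplest stand-in for that.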

Milestone 1 – Local SQLite Storage (Self-host Base)

Goal: pai can sync data into a local SQLite file.

  • Choose SQLite crate (native mode):

    • e.g. rusqlite
  • Define SQL schema and migrations:

    • items table:
    CREATE TABLE IF NOT EXISTS items (
      id            TEXT PRIMARY KEY,
      source_kind   TEXT NOT NULL,
      source_id     TEXT NOT NULL,
      author        TEXT,
      title         TEXT,
      summary       TEXT,
      url           TEXT NOT NULL,
      content_html  TEXT,
      published_at  TEXT NOT NULL,
      created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE INDEX IF NOT EXISTS idx_items_source_date ON items (source_kind, source_id, published_at DESC);
    
    • Embed migrations or provide schema.sql + pai db-migrate command.
  • Implement SqliteStorage in cli/:

    • Opens or creates the DB at the path given by -d, falling back to $XDG_DATA_HOME/pai/pai.db.
    • Implements Storage trait.
  • Implement pai sync path:

    • pai sync → load config → open SQLite → call sync_all_sources.
    • Exit codes:
      • 0 on success, non-zero on failure.
  • Add pai db-check:

    • Verifies schema and prints basic stats (item count per source).
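
As a sketch of the database path fallback described above, the function below resolves the path from the -d value, then $XDG_DATA_HOME, then $HOME/.local/share. The function name and the choice to pass environment values in as parameters (rather than reading them inside) are assumptions made so the logic stays easy to test.

```rust
use std::path::PathBuf;

/// Resolves the SQLite path: explicit -d value first, then
/// $XDG_DATA_HOME/pai/pai.db, then $HOME/.local/share/pai/pai.db.
/// Returns None when nothing usable is available.
pub fn resolve_db_path(
    flag: Option<&str>,
    xdg_data_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    if let Some(path) = flag {
        return Some(PathBuf::from(path));
    }
    // Treat an empty environment value the same as an unset one.
    if let Some(xdg) = xdg_data_home.filter(|s| !s.is_empty()) {
        return Some(PathBuf::from(xdg).join("pai").join("pai.db"));
    }
    home.filter(|s| !s.is_empty()).map(|h| {
        PathBuf::from(h)
            .join(".local")
            .join("share")
            .join("pai")
            .join("pai.db")
    })
}
```

A caller would typically pass `std::env::var("XDG_DATA_HOME").ok().as_deref()` and the equivalent for `HOME`.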

Milestone 2 – Source Integrations ✅

Goal: All three sources can be ingested via the CLI.

Status: COMPLETE - All three source integrations (Substack RSS, Bluesky AT Protocol, Leaflet RSS) are implemented and tested with real data.

2.1 Substack (Pattern Matched)

  • Add config support:

    [sources.substack]
    enabled   = true
    base_url  = "https://patternmatched.substack.com"
    
  • Implement SubstackFetcher in core/:

    • Fetch {base_url}/feed.

    • Parse RSS using feed-rs.

    • Map <item>:

      • id = GUID if present, otherwise link.
      • source_kind = "substack".
      • source_id = "patternmatched.substack.com".
      • title, summary from RSS title/description.
      • url from link.
      • published_at from pubDate (normalized to ISO 8601).
  • Wire into sync_all_sources when enabled.
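
The field mapping above can be sketched with a few small helpers. These are illustrative only: the function names are hypothetical, the feed is assumed to be parsed elsewhere, and treating an empty GUID as absent is an assumption rather than something the roadmap specifies.

```rust
/// Builds the feed URL from the configured base_url.
pub fn feed_url(base_url: &str) -> String {
    format!("{}/feed", base_url.trim_end_matches('/'))
}

/// Derives source_id (the host) from base_url using plain string handling.
pub fn source_id_from_base_url(base_url: &str) -> &str {
    let without_scheme = base_url
        .split_once("://")
        .map_or(base_url, |(_, rest)| rest);
    without_scheme.split('/').next().unwrap_or(without_scheme)
}

/// id = GUID if present (and non-empty), otherwise the link.
pub fn item_id(guid: Option<&str>, link: &str) -> String {
    match guid.map(str::trim) {
        Some(g) if !g.is_empty() => g.to_string(),
        _ => link.to_string(),
    }
}
```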

2.2 Bluesky (desertthunder.dev)

  • Add config support:

    [sources.bluesky]
    enabled = true
    handle  = "desertthunder.dev"
    
  • Implement BlueskyFetcher in core/:

    • Fetch:

      • https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=desertthunder.dev&limit=N
    • Filter out reposts/quotes (only original posts).

    • Map post record:

      • id = uri (AT URI).
      • source_kind = "bluesky".
      • source_id = "desertthunder.dev".
      • title = truncated text up to N chars.
      • summary = full text (or truncated).
      • url = canonical https://bsky.app/profile/…/post/… derived from URI.
      • published_at = record.createdAt (ISO 8601 already).
    • Optional:

      • Support pagination via cursor until a configured max number of posts.
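
Two of the mapping steps above, deriving the bsky.app URL from the AT URI and truncating text into a title, are plain string work and can be sketched without any HTTP client. The function names and the "..." suffix are assumptions; the URL keeps whatever authority (DID or handle) the AT URI carries.

```rust
/// Turns at://<authority>/app.bsky.feed.post/<rkey> into
/// https://bsky.app/profile/<authority>/post/<rkey>.
/// Returns None for anything that is not a post URI of that shape.
pub fn post_url_from_at_uri(at_uri: &str) -> Option<String> {
    let rest = at_uri.strip_prefix("at://")?;
    let mut parts = rest.split('/');
    let authority = parts.next().filter(|s| !s.is_empty())?;
    let collection = parts.next()?;
    let rkey = parts.next().filter(|s| !s.is_empty())?;
    if collection != "app.bsky.feed.post" || parts.next().is_some() {
        return None;
    }
    Some(format!("https://bsky.app/profile/{}/post/{}", authority, rkey))
}

/// Keeps at most `max_chars` characters (not bytes), appending "..."
/// only when something was actually cut off.
pub fn truncate_title(text: &str, max_chars: usize) -> String {
    let mut chars = text.chars();
    let head: String = chars.by_ref().take(max_chars).collect();
    if chars.next().is_some() {
        format!("{}...", head)
    } else {
        head
    }
}
```

Counting characters instead of slicing bytes avoids panics on multi-byte text such as emoji or accented letters.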

2.3 Leaflet (desertthunder / stormlightlabs)

  • Add config support:

    [[sources.leaflet]]
    enabled    = true
    id         = "desertthunder"
    base_url   = "https://desertthunder.leaflet.pub"
    
    [[sources.leaflet]]
    enabled    = true
    id         = "stormlightlabs"
    base_url   = "https://stormlightlabs.leaflet.pub"
    
  • Use AT Protocol instead of HTML parsing:

    • Use com.atproto.repo.listRecords with collection pub.leaflet.post.
  • Implement LeafletFetcher in core/:

    • For each configured pub:

      • Fetch records using AT Protocol.

      • Parse pub.leaflet.post records.

      • For each post:

        • Extract title from record.
        • Extract publishedAt or createdAt.
        • Derive summary from summary or content field.
        • Generate URL using slug or record ID.
        • Normalize date to ISO 8601 for published_at.
      • Insert or replace items in storage.

  • Wire into sync_all_sources.
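
For the record-listing step, a request URL for com.atproto.repo.listRecords might be assembled as below. This is a sketch under assumptions: the PDS base URL and the repo identifier must be resolved separately, the function name is hypothetical, and no percent-encoding is applied, so a cursor containing reserved characters would need encoding first.

```rust
/// Builds a listRecords URL for the pub.leaflet.post collection.
/// `limit` is clamped to 1..=100, the range the endpoint accepts.
pub fn list_records_url(
    pds_base: &str,
    repo: &str,
    limit: u32,
    cursor: Option<&str>,
) -> String {
    let mut url = format!(
        "{}/xrpc/com.atproto.repo.listRecords?repo={}&collection=pub.leaflet.post&limit={}",
        pds_base.trim_end_matches('/'),
        repo,
        limit.clamp(1, 100)
    );
    // The cursor from the previous page, if any, continues pagination.
    if let Some(c) = cursor {
        url.push_str("&cursor=");
        url.push_str(c);
    }
    url
}
```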

Milestone 3 – Query, Filter, and Export (CLI Only)

Goal: Make local data usable even without HTTP.

  • Implement pai list:
    • Syntax: pai list [options] (options before operands).
    • Options:
      • -k kind filter by source_kind (substack, bluesky, leaflet).
      • -S id filter by source_id (host/handle).
      • -n N limit number of results (default 20).
      • -s time “since time” (e.g. ISO 8601, or “7d” shorthand if desired).
      • -q pattern simple substring filter on title/summary.
    • Render as ASCII table or simple text.
  • Implement pai export:
    • Syntax: pai export [-f format] [-o file].
    • Supported formats:
      • json (default).
      • ndjson (optional).
      • rss (optional aggregate).
    • Options:
      • -f format (json, rss, …).
      • -o path output file (default stdout).
  • Implement exit statuses for typical cases:
    • 0 on success.
    • >0 on error (bad args, DB error, network failure, etc.).

Milestone 4 – Self-hosted HTTP Server Mode

Goal: Provide a small HTTP API backed by SQLite for self-hosted deployments.

  • Add serve subcommand in cli/:
    • Syntax: pai serve [options].
    • Options:
      • -d path database path.
      • -a addr listen address (default 127.0.0.1:8080).
    • Follows POSIX conventions: all options before operands.
  • Implement HTTP server (axum):
    • GET /api/feed – list all items, newest first.
    • Query params:
      • source_kind, source_id, limit, since, q.
    • Optional:
      • GET /api/item/{id} for a single item.
  • Ensure graceful shutdown and clean error handling.
  • Document reverse-proxy examples (Caddy, nginx).

Milestone 5 – Cloudflare Worker + D1 Frontend

Goal: Provide an alternative deployment path using Cloudflare Workers with D1 and Cron triggers.

  • In worker/:
    • Depend on worker crate with d1 feature enabled.
    • Reuse core::Item and parsing code (ensure crates are WASM-friendly).
  • Configure D1:
    • Provide schema.sql compatible with D1 (same items table).
    • Example wrangler.toml with [[d1_databases]] binding.
  • Implement Worker routes:
    • GET /api/feed with similar semantics as CLI server.
  • Implement scheduled handler for Cron:
    • On each scheduled run, call per-source syncers writing to D1.
    • Document cron configuration in wrangler.toml.
  • Add pai cf-init in cli/:
    • Generates a starter wrangler.toml.
    • Prints instructions to create D1 DB and bind it.

Milestone 6 – POSIX Polish, Packaging, and Docs

Goal: Make the CLI feel like a “real UNIX utility” and easy to adopt.

  • Verify POSIX-style argument handling:
    • Short options only in usage syntax; long options are optional extensions.
    • Options before operands/subcommands in docs and examples.
    • Support grouped short options where meaningful (e.g. -hV).
  • Implement:
    • -h – usage synopsis and options (per POSIX convention).
    • -V – version info.
  • Add manpage-style documentation using clap_mangen (https://crates.io/crates/clap_mangen) in build.rs:
    • man/pai.1 with SYNOPSIS, DESCRIPTION, OPTIONS, OPERANDS, EXIT STATUS, ENVIRONMENT, FILES, EXAMPLES.
  • Publish pai crate to crates.io.
  • Write README with:
    • Self-hosted quickstart.
    • Cloudflare Worker quickstart.
    • Config reference (config.toml).

2. CLI & Config Spec (POSIX-style)

2.1 POSIX argument conventions you’re aligning with

Key constraints you want to follow:

  • Options are introduced by a single - followed by a single letter (-h, -V, -d path).
  • Options that require arguments use a separate token: -d path rather than -dpath.
  • Options appear before operands (here, subcommands and file paths) in the recommended syntax: utility_name [-a] [-b arg] operand1 operand2 ….
  • -h for help, -V for version are widely conventional.

You can still offer --long-option aliases as a GNU-style extension; just document the POSIX short forms as canonical.

2.2 CLI synopsis

Utility name: pai (single binary).

Global synopsis

pai [-hV] [-C config_dir] [-d db_path] command [command-options] [command-operands]
  • -h Print usage and exit.

  • -V Print version and exit.

  • -C config_dir Set configuration directory. Default: $XDG_CONFIG_HOME/pai or $HOME/.config/pai.

  • -d db_path Path to SQLite database file. Default: $XDG_DATA_HOME/pai/pai.db or $HOME/.local/share/pai/pai.db.

Subcommands are treated as operands in POSIX terms; each subcommand then has its own POSIX-style options.
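
To make the global synopsis concrete, here is a hand-rolled sketch of parsing the global options up to the first operand. It is not tied to any argument-parsing crate; the struct and function names are hypothetical, and it assumes that -C and -d must be the last letter in a group with their value in the next token.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct GlobalOpts {
    pub help: bool,
    pub version: bool,
    pub config_dir: Option<String>,
    pub db_path: Option<String>,
    /// First operand, i.e. the subcommand name.
    pub command: Option<String>,
    /// Everything after the subcommand, left for the subcommand to parse.
    pub rest: Vec<String>,
}

pub fn parse_global(args: &[&str]) -> Result<GlobalOpts, String> {
    let mut opts = GlobalOpts::default();
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        if arg == "--" {
            i += 1; // explicit end of options
            break;
        }
        if arg == "-" || !arg.starts_with('-') {
            break; // first operand reached
        }
        let letters: Vec<char> = arg[1..].chars().collect();
        for (pos, flag) in letters.iter().enumerate() {
            match *flag {
                'h' => opts.help = true,
                'V' => opts.version = true,
                'C' | 'd' => {
                    if pos + 1 != letters.len() {
                        return Err(format!("option -{} needs a separate argument", flag));
                    }
                    i += 1;
                    let value = args
                        .get(i)
                        .ok_or_else(|| format!("option -{} requires an argument", flag))?
                        .to_string();
                    if *flag == 'C' {
                        opts.config_dir = Some(value);
                    } else {
                        opts.db_path = Some(value);
                    }
                }
                other => return Err(format!("unknown option -{}", other)),
            }
        }
        i += 1;
    }
    let mut remaining = args[i..].iter().map(|s| s.to_string());
    opts.command = remaining.next();
    opts.rest = remaining.collect();
    Ok(opts)
}
```

With this shape, `pai -hV -d /tmp/pai.db sync -k substack` yields `command = "sync"` and leaves `-k substack` untouched for the subcommand's own option parser.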

2.3 Subcommands and their options

1. sync – fetch and store content

pai [-C config_dir] [-d db_path] sync [-a] [-k kind] [-S source_id]

Options:

  • -a Sync all configured sources (the default when neither -k nor -S is given).

  • -k kind Sync only a particular source kind:

    • substack
    • bluesky
    • leaflet
  • -S source_id Sync only a specific source instance (e.g. patternmatched.substack.com, desertthunder.dev, desertthunder.leaflet.pub, stormlightlabs.leaflet.pub).

Examples:

pai sync -a
pai sync -k substack
pai sync -k leaflet -S desertthunder.leaflet.pub

2. list – inspect stored items

pai [-C config_dir] [-d db_path] list [-k kind] [-S source_id] [-n number] [-s since] [-q pattern]

Options:

  • -k kind Filter by source kind (substack, bluesky, leaflet).

  • -S source_id Filter by specific source id (host or handle).

  • -n number Maximum number of items to display (default 20).

  • -s since Only show items published at or after this time. Accepts ISO 8601 timestamps (e.g. 2025-11-23T00:00:00Z) and, as an optional convenience, relative durations such as 7d or 24h.

  • -q pattern Filter items whose title/summary contains the given substring.
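
One way to handle the -s argument is to classify it as either a relative duration or an absolute timestamp before querying. The sketch below is an assumption about how that could work: the `Since` enum and `parse_since` are hypothetical names, only the d and h suffixes are recognized, and the ISO 8601 check is a cheap shape test rather than real validation.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Since {
    /// An ISO 8601 timestamp, passed through unchanged.
    Absolute(String),
    /// A look-back window such as 7d or 24h.
    Relative(Duration),
}

pub fn parse_since(arg: &str) -> Option<Since> {
    let arg = arg.trim();
    // Relative shorthand: a non-negative integer followed by d or h.
    for (suffix, secs_per_unit) in [("d", 86_400u64), ("h", 3_600u64)] {
        if let Some(number) = arg.strip_suffix(suffix) {
            if let Ok(n) = number.parse::<u64>() {
                return n
                    .checked_mul(secs_per_unit)
                    .map(|secs| Since::Relative(Duration::from_secs(secs)));
            }
        }
    }
    // Shape check only: expects the YYYY-MM-DD prefix of an ISO 8601 value.
    let bytes = arg.as_bytes();
    if bytes.len() >= 10 && bytes[4] == b'-' && bytes[7] == b'-' {
        return Some(Since::Absolute(arg.to_string()));
    }
    None
}
```

A relative value would then be subtracted from the current time to get the cutoff that is compared against published_at.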

3. export – produce feeds/files

pai [-C config_dir] [-d db_path] export [-k kind] [-S source_id] [-n number] [-s since] [-q pattern] [-f format] [-o file]

Options (in addition to list filters):

  • -f format Output format:

    • json (default)
    • ndjson
    • rss (optional)
  • -o file Output file. Default is standard output.

Examples:

pai export -f json -o activity.json
pai export -k bluesky -n 50 -f ndjson

4. serve – self-host HTTP API

pai [-C config_dir] [-d db_path] serve [-a address]

Options:

  • -a address Address to bind HTTP server to. Default: 127.0.0.1:8080.

The HTTP API mirrors the query semantics of list and export:

  • GET /api/feed?source_kind=bluesky&limit=50&since=...&q=...
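
Independent of the HTTP framework, the query string can be mapped onto the same filter fields that list and export use. The sketch below is illustrative: `FeedQuery` and `parse_feed_query` are hypothetical names, unknown keys are ignored by assumption, and percent-decoding is left out, which a real server framework would normally handle.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct FeedQuery {
    pub source_kind: Option<String>,
    pub source_id: Option<String>,
    pub limit: Option<usize>,
    pub since: Option<String>,
    pub q: Option<String>,
}

/// Parses the part of the URL after '?', e.g. "source_kind=bluesky&limit=50".
pub fn parse_feed_query(query: &str) -> Result<FeedQuery, String> {
    let mut out = FeedQuery::default();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // A key without '=' is treated as having an empty value.
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "source_kind" => out.source_kind = Some(value.to_string()),
            "source_id" => out.source_id = Some(value.to_string()),
            "since" => out.since = Some(value.to_string()),
            "q" => out.q = Some(value.to_string()),
            "limit" => {
                let n = value
                    .parse::<usize>()
                    .map_err(|_| format!("invalid limit: {}", value))?;
                out.limit = Some(n);
            }
            _ => {} // unknown parameters are ignored
        }
    }
    Ok(out)
}
```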

5. cf-init – scaffold Cloudflare deployment

pai cf-init [-o dir]

Options:

  • -o dir Directory into which to write wrangler.toml, schema.sql, and a sample worker entry point. Default: current directory.

This command doesn’t need DB access; it just writes templates and prints next steps (create D1 DB, bind it, set up Cron).
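
A minimal sketch of the template-writing part might render the starter wrangler.toml as a string from the Cloudflare metadata. The struct and function names are hypothetical, the database_id placeholder must be replaced with the ID of the created D1 database, and a real file would also need entries such as main and compatibility_date, which are omitted here.

```rust
/// Mirrors the worker_name, d1_binding, and database_name settings.
pub struct CloudflareConfig {
    pub worker_name: String,
    pub d1_binding: String,
    pub database_name: String,
}

/// Renders a partial starter wrangler.toml with a D1 binding and one cron trigger.
pub fn render_wrangler_toml(cfg: &CloudflareConfig, cron: &str) -> String {
    let mut out = String::new();
    out.push_str(&format!("name = \"{}\"\n\n", cfg.worker_name));
    out.push_str("[[d1_databases]]\n");
    out.push_str(&format!("binding = \"{}\"\n", cfg.d1_binding));
    out.push_str(&format!("database_name = \"{}\"\n", cfg.database_name));
    // Placeholder: filled in by the user after creating the D1 database.
    out.push_str("database_id = \"REPLACE_WITH_D1_DATABASE_ID\"\n\n");
    out.push_str("[triggers]\n");
    out.push_str(&format!("crons = [\"{}\"]\n", cron));
    out
}
```

Returning a string keeps the rendering testable; the command itself would write it to the directory given by -o and then print the next steps.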

2.4 config.toml spec

Default location:

  • $XDG_CONFIG_HOME/pai/config.toml or
  • $HOME/.config/pai/config.toml if XDG_CONFIG_HOME is unset.

Top-level layout:

[database]
# Path to SQLite database for self-host mode.
# Ignored by the Worker; used only by `pai` binary.
path = "/home/owais/.local/share/pai/pai.db"

[deployment]
# Which deploy targets are configured.
# "sqlite" is always available; "cloudflare" is optional.
mode = "sqlite"        # or "cloudflare"

[deployment.cloudflare]
# Optional metadata for generating wrangler.toml, etc.
worker_name   = "personal-activity-index"
d1_binding    = "DB"
database_name = "personal_activity_db"

[sources.substack]
enabled   = true
base_url  = "https://patternmatched.substack.com"

[sources.bluesky]
enabled = true
handle  = "desertthunder.dev"

[[sources.leaflet]]
enabled   = true
id        = "desertthunder"
base_url  = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled   = true
id        = "stormlightlabs"
base_url  = "https://stormlightlabs.leaflet.pub"

Notes:

  • The CLI should not require the Cloudflare section unless a user explicitly wants to generate Worker scaffolding.
  • The Worker gets its D1 binding and Cron schedule from wrangler.toml and the Cloudflare dashboard, not from this config file; only the schema and the Item type are shared with the CLI.

2.5 POSIX compliance checklist

When you implement the CLI parsing, you can sanity-check against POSIX & GNU guidance:

  • Short options are single letters with a single leading -. (The Open Group)
  • Options precede non-option arguments (your commands and operands) in the usage examples. (The Open Group)
  • Options that take arguments are formatted as -x arg rather than -xarg in documentation. (gnu.org)
  • You provide -h / -V and consistent help text. (Baeldung on Kotlin)
  • Long options (--help, --version, --config-dir, etc.) can be supported as extensions but are not required for conformance. (Software Engineering Stack Exchange)