Personal Activity Index CLI – Roadmap & Tasks#

Objective: Build a POSIX-style Rust CLI that ingests content from Substack, Bluesky, and Leaflet into SQLite, with an optional Cloudflare Worker + D1 deployment path.

Targets:

Self-host: single binary + SQLite.
Cloudflare: Rust Worker + D1 + Cron triggers.

Workspace & Architecture#

Goal: Shared core library, CLI frontend, and Worker frontend, with clear separation of concerns.

Create Cargo workspace layout:
- core/ – shared types, fetchers, and storage traits.
- cli/ – POSIX-style binary (pai).
- worker/ – Cloudflare Worker using workers-rs.
In core/:
- Define SourceKind enum: substack, bluesky, leaflet.
- Define Item struct with fields:
  - id, source_kind, source_id, author, title, summary, url, content_html, published_at, created_at.
- Define Storage trait with at minimum:
  - insert_or_replace_item(&self, item: &Item) -> Result<()>
  - list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>
- Define SourceFetcher trait:
  - fn sync(&self, storage: &dyn Storage) -> Result<()>
In cli/:
- Add argument parsing that follows POSIX conventions:
  - Options of the form -h, -V, -C dir, -d path, etc.
  - Options come before operands/subcommands where possible.
- Define subcommands (as operands) with their own POSIX-style options:
  - sync
  - list
  - export
  - serve
In core/:
- Implement sync_all_sources(config, storage) that calls each fetcher.
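
The core/ pieces listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the String-based Result alias, the ListFilter fields, MemoryStorage, and StaticFetcher are assumptions made for the example, and sync_all_sources here takes a slice of fetchers rather than a config value.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

/// Simplified error type for this sketch.
pub type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Substack,
    Bluesky,
    Leaflet,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: String,
    pub source_kind: SourceKind,
    pub source_id: String,
    pub author: Option<String>,
    pub title: Option<String>,
    pub summary: Option<String>,
    pub url: String,
    pub content_html: Option<String>,
    pub published_at: String, // ISO 8601
    pub created_at: String,
}

/// Hypothetical filter; the roadmap names the type but not its fields.
#[derive(Debug, Default)]
pub struct ListFilter {
    pub source_kind: Option<SourceKind>,
    pub limit: Option<usize>,
}

pub trait Storage {
    fn insert_or_replace_item(&self, item: &Item) -> Result<()>;
    fn list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>;
}

pub trait SourceFetcher {
    fn sync(&self, storage: &dyn Storage) -> Result<()>;
}

/// In-memory stand-in for a SQLite-backed store, keyed by item id.
#[derive(Default)]
pub struct MemoryStorage {
    items: RefCell<BTreeMap<String, Item>>,
}

impl Storage for MemoryStorage {
    fn insert_or_replace_item(&self, item: &Item) -> Result<()> {
        // Same id overwrites the previous row, like INSERT OR REPLACE.
        self.items.borrow_mut().insert(item.id.clone(), item.clone());
        Ok(())
    }

    fn list_items(&self, filter: &ListFilter) -> Result<Vec<Item>> {
        let mut out: Vec<Item> = self
            .items
            .borrow()
            .values()
            .filter(|i| filter.source_kind.map_or(true, |k| i.source_kind == k))
            .cloned()
            .collect();
        // Newest first; ISO 8601 UTC strings sort chronologically.
        out.sort_by(|a, b| b.published_at.cmp(&a.published_at));
        out.truncate(filter.limit.unwrap_or(usize::MAX));
        Ok(out)
    }
}

/// Hypothetical fetcher that "fetches" a fixed list of items.
pub struct StaticFetcher(pub Vec<Item>);

impl SourceFetcher for StaticFetcher {
    fn sync(&self, storage: &dyn Storage) -> Result<()> {
        for item in &self.0 {
            storage.insert_or_replace_item(item)?;
        }
        Ok(())
    }
}

/// Run every fetcher in order, stopping at the first error.
pub fn sync_all_sources(fetchers: &[Box<dyn SourceFetcher>], storage: &dyn Storage) -> Result<()> {
    for fetcher in fetchers {
        fetcher.sync(storage)?;
    }
    Ok(())
}
```

Keeping Storage object-safe (&dyn Storage) is what lets the same fetchers write to SQLite in cli/ and to D1 in worker/.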

Milestone 1 – Local SQLite Storage (Self-host Base)#

Goal: pai can sync data into a local SQLite file.

Choose SQLite crate (native mode):
- e.g. rusqlite

Define SQL schema and migrations:

items table:

CREATE TABLE IF NOT EXISTS items (
  id            TEXT PRIMARY KEY,
  source_kind   TEXT NOT NULL,
  source_id     TEXT NOT NULL,
  author        TEXT,
  title         TEXT,
  summary       TEXT,
  url           TEXT NOT NULL,
  content_html  TEXT,
  published_at  TEXT NOT NULL,
  created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_items_source_date ON items (source_kind, source_id, published_at DESC);

Embed migrations or provide schema.sql + pai db-migrate command.

Implement SqliteStorage in cli/:
- Opens or creates the DB at the -d path, falling back to $XDG_DATA_HOME/pai/pai.db (or $HOME/.local/share/pai/pai.db if XDG_DATA_HOME is unset).
- Implements Storage trait.
Implement pai sync path:
- pai sync → load config → open SQLite → call sync_all_sources.
- Exit codes:
  - 0 on success, non-zero on failure.
Add pai db-check:
- Verifies schema and prints basic stats (item count per source).
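
The database path lookup can be sketched as a pure function. This is an illustration under assumptions: resolve_db_path is a hypothetical name, and the environment values are passed in as parameters so the logic can be tested without touching the real environment.

```rust
use std::path::PathBuf;

/// Resolve the SQLite path: explicit -d value first, then $XDG_DATA_HOME,
/// then $HOME/.local/share. Returns None if nothing usable is available.
pub fn resolve_db_path(
    flag: Option<&str>,
    xdg_data_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    if let Some(path) = flag {
        return Some(PathBuf::from(path));
    }
    // The XDG Base Directory spec treats an empty variable as unset.
    if let Some(xdg) = xdg_data_home.filter(|v| !v.is_empty()) {
        return Some(PathBuf::from(xdg).join("pai").join("pai.db"));
    }
    home.filter(|v| !v.is_empty())
        .map(|h| PathBuf::from(h).join(".local/share/pai/pai.db"))
}
```

In the binary, the last two arguments would come from std::env::var("XDG_DATA_HOME") and std::env::var("HOME").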

Milestone 2 – Source Integrations ✅#

Goal: All three sources can be ingested via the CLI.

Status: COMPLETE - All three source integrations (Substack RSS, Bluesky AT Protocol, Leaflet RSS) are implemented and tested with real data.

2.1 Substack (Pattern Matched)#

Add config support:

[sources.substack]
enabled   = true
base_url  = "https://patternmatched.substack.com"

Implement SubstackFetcher in core/:
- Fetch {base_url}/feed.
- Parse RSS using feed-rs.
- Map <item>:
  - id = GUID if present, otherwise link.
  - source_kind = "substack".
  - source_id = "patternmatched.substack.com".
  - title, summary from RSS title/description.
  - url from link.
  - published_at from pubDate (normalized to ISO 8601).
Wire into sync_all_sources when enabled.
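
The id and source_id rules above can be sketched as small helpers. RssEntry is a hypothetical stand-in for whatever entry type the RSS parser returns, and the function names are assumptions made for this example.

```rust
/// Minimal stand-in for a parsed RSS <item>.
pub struct RssEntry {
    pub guid: Option<String>,
    pub link: String,
}

/// id = GUID if present (and non-blank), otherwise the link.
pub fn item_id(entry: &RssEntry) -> &str {
    match entry.guid.as_deref() {
        Some(guid) if !guid.trim().is_empty() => guid,
        _ => entry.link.as_str(),
    }
}

/// Derive source_id (the host) from the configured base_url.
pub fn source_id_from_base_url(base_url: &str) -> &str {
    let without_scheme = base_url.split_once("://").map_or(base_url, |(_, rest)| rest);
    without_scheme.split('/').next().unwrap_or(without_scheme)
}

/// Build the feed URL, tolerating a trailing slash in base_url.
pub fn feed_url(base_url: &str) -> String {
    format!("{}/feed", base_url.trim_end_matches('/'))
}
```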

2.2 Bluesky (desertthunder.dev)#

Add config support:

[sources.bluesky]
enabled = true
handle  = "desertthunder.dev"

Implement BlueskyFetcher in core/:
- Fetch:
  - https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=desertthunder.dev&limit=N
- Filter out reposts/quotes (only original posts).
- Map post record:
  - id = uri (AT URI).
  - source_kind = "bluesky".
  - source_id = "desertthunder.dev".
  - title = truncated text up to N chars.
  - summary = full text (or truncated).
  - url = canonical https://bsky.app/profile/…/post/… derived from URI.
  - published_at = record.createdAt (ISO 8601 already).
- Optional:
  - Support pagination via cursor until a configured max number of posts.
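
Two of the mapping steps above, deriving the bsky.app URL from the AT URI and truncating text for the title, can be sketched as follows. The function names are hypothetical, and appending an ellipsis to truncated titles is an assumption rather than something the roadmap specifies.

```rust
/// Convert an AT URI such as at://did:plc:abc/app.bsky.feed.post/3kxyz
/// into https://bsky.app/profile/{handle}/post/{rkey}.
pub fn post_url(at_uri: &str, handle: &str) -> Option<String> {
    let rest = at_uri.strip_prefix("at://")?;
    let mut parts = rest.split('/');
    let authority = parts.next()?;
    let collection = parts.next()?;
    let rkey = parts.next()?;
    let well_formed = !authority.is_empty()
        && collection == "app.bsky.feed.post"
        && !rkey.is_empty()
        && parts.next().is_none();
    if !well_formed {
        return None;
    }
    Some(format!("https://bsky.app/profile/{handle}/post/{rkey}"))
}

/// Truncate on character boundaries (not bytes) so multi-byte text is safe.
pub fn truncate_title(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    let mut out: String = text.chars().take(max_chars).collect();
    out.push('…'); // assumption: mark shortened titles with an ellipsis
    out
}
```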

2.3 Leaflet (desertthunder / stormlightlabs)#

Add config support:

[[sources.leaflet]]
enabled    = true
id         = "desertthunder"
base_url   = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled    = true
id         = "stormlightlabs"
base_url   = "https://stormlightlabs.leaflet.pub"

Use AT Protocol instead of HTML parsing:
- Use com.atproto.repo.listRecords with collection pub.leaflet.post.
Implement LeafletFetcher in core/:
- For each configured pub:
  - Fetch records using AT Protocol.
  - Parse pub.leaflet.post records.
  - For each post:
    - Extract title from record.
    - Extract publishedAt or createdAt.
    - Derive summary from summary or content field.
    - Generate URL using slug or record ID.
    - Normalize date to ISO 8601 for published_at.
  - Insert or replace items in storage.
Wire into sync_all_sources.
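
As a rough sketch of the listRecords call, the request URL could be assembled like this. The PDS host and repo identifier are assumptions supplied by the caller (the config above only carries id and base_url), and the clamp reflects the 1 to 100 range of the limit parameter.

```rust
/// Percent-encode a query value, leaving only RFC 3986 unreserved characters.
fn encode_query_value(value: &str) -> String {
    let mut out = String::new();
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Build a com.atproto.repo.listRecords URL for the pub.leaflet.post collection.
/// `pds_host` and `repo` (a DID or handle) are caller-supplied assumptions.
pub fn list_records_url(pds_host: &str, repo: &str, limit: u32, cursor: Option<&str>) -> String {
    let mut url = format!(
        "https://{pds_host}/xrpc/com.atproto.repo.listRecords?repo={}&collection=pub.leaflet.post&limit={}",
        encode_query_value(repo),
        limit.clamp(1, 100)
    );
    // Pass the cursor from the previous page to continue pagination.
    if let Some(c) = cursor {
        url.push_str("&cursor=");
        url.push_str(&encode_query_value(c));
    }
    url
}
```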

Milestone 3 – Query, Filter, and Export (CLI Only)#

Goal: Make local data usable even without HTTP.

Implement pai list:
- Syntax: pai list [options] (options before operands).
- Options:
  - -k kind filter by source_kind (substack, bluesky, leaflet).
  - -S id filter by source_id (host/handle).
  - -n N limit number of results (default 20).
  - -s time “since time” (e.g. ISO 8601, or “7d” shorthand if desired).
  - -q pattern simple substring filter on title/summary.
- Render as ASCII table or simple text.
Implement pai export:
- Syntax: pai export [-f format] [-o file].
- Supported formats:
  - json (default).
  - ndjson (optional).
  - rss (optional aggregate).
- Options:
  - -f format (json, rss, …).
  - -o path output file (default stdout).
Implement exit statuses for typical cases:
- 0 on success.
- >0 on error (bad args, DB error, network failure, etc.).

Milestone 4 – Self-hosted HTTP Server Mode#

Goal: Provide a small HTTP API backed by SQLite for self-hosted deployments.

Add serve subcommand in cli/:
- Syntax: pai serve [options].
- Options:
  - -d path database path.
  - -a addr listen address (default 127.0.0.1:8080).
- Follows POSIX conventions: all options before operands.
Implement HTTP server (axum):
- GET /api/feed – list all items, newest first.
- Query params:
  - source_kind, source_id, limit, since, q.
- Optional:
  - GET /api/item/{id} for a single item.
Ensure graceful shutdown and clean error handling.
Document reverse-proxy examples (Caddy, nginx).

Milestone 5 – Cloudflare Worker + D1 Frontend#

Goal: Provide an alternative deployment path using Cloudflare Workers with D1 and Cron triggers.

In worker/:
- Depend on worker crate with d1 feature enabled.
- Reuse core::Item and parsing code (ensure crates are WASM-friendly).
Configure D1:
- Provide schema.sql compatible with D1 (same items table).
- Example wrangler.toml with [[d1_databases]] binding.
Implement Worker routes:
- GET /api/feed with similar semantics as CLI server.
Implement scheduled handler for Cron:
- On each scheduled run, call per-source syncers writing to D1.
- Document cron configuration in wrangler.toml.
Add pai cf-init in cli/:
- Generates a starter wrangler.toml.
- Prints instructions to create D1 DB and bind it.

Milestone 6 – POSIX Polish, Packaging, and Docs#

Goal: Make the CLI feel like a “real UNIX utility” and easy to adopt.

Verify POSIX-style argument handling:
- Short options only in usage syntax; long options are optional extensions.
- Options before operands/subcommands in docs and examples.
- Support grouped short options where meaningful (e.g. -hV).
Implement:
- -h – usage synopsis and options (per POSIX convention).
- -V – version info.
Add manpage-style documentation using clap_mangen (https://crates.io/crates/clap_mangen) in build.rs:
- man/pai.1 with SYNOPSIS, DESCRIPTION, OPTIONS, OPERANDS, EXIT STATUS, ENVIRONMENT, FILES, EXAMPLES.
Publish pai crate to crates.io.
Write README with:
- Self-hosted quickstart.
- Cloudflare Worker quickstart.
- Config reference (config.toml).

2. CLI & Config Spec (POSIX-style)#

2.1 POSIX argument conventions you’re aligning with#

Key constraints you want to follow:

Options are introduced by a single - followed by a single letter (-h, -V, -d path).
Options that require arguments use a separate token: -d path rather than -dpath.
Options appear before operands (here, subcommands and file paths) in the recommended syntax: utility_name [-a] [-b arg] operand1 operand2 ….
-h for help, -V for version are widely conventional.

You can still offer --long-option aliases as a GNU-style extension; just document the POSIX short forms as canonical.

2.2 CLI synopsis#

Utility name: pai (single binary).

Global synopsis#

pai [-hV] [-C config_dir] [-d db_path] command [command-options] [command-operands]

-h Print usage and exit.
-V Print version and exit.
-C config_dir Set configuration directory. Default: $XDG_CONFIG_HOME/pai or $HOME/.config/pai.
-d db_path Path to SQLite database file. Default: $XDG_DATA_HOME/pai/pai.db or $HOME/.local/share/pai/pai.db.

Subcommands are treated as operands in POSIX terms; each subcommand then has its own POSIX-style options.
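
A hand-rolled parser for the global synopsis might look like the sketch below. It is only an illustration of the conventions: GlobalOpts and parse_global are hypothetical names, the error strings are invented, and a real build could equally delegate this to an argument-parsing crate. Parsing stops at the first operand, so the subcommand and its own options are returned untouched in rest.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct GlobalOpts {
    pub help: bool,
    pub version: bool,
    pub config_dir: Option<String>,
    pub db_path: Option<String>,
    /// Subcommand and everything after it, left for the subcommand's parser.
    pub rest: Vec<String>,
}

pub fn parse_global(args: &[&str]) -> Result<GlobalOpts, String> {
    let mut opts = GlobalOpts::default();
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        if arg == "--" {
            i += 1; // explicit end of options
            break;
        }
        // The first operand (the subcommand) ends option parsing.
        if !arg.starts_with('-') || arg == "-" {
            break;
        }
        let mut chars = arg[1..].chars();
        while let Some(c) = chars.next() {
            match c {
                'h' => opts.help = true,
                'V' => opts.version = true,
                'C' | 'd' => {
                    // Option-argument: rest of this token, else the next token.
                    let attached = chars.as_str();
                    let value = if !attached.is_empty() {
                        attached.to_string()
                    } else {
                        i += 1;
                        args.get(i)
                            .ok_or_else(|| format!("option requires an argument -- {c}"))?
                            .to_string()
                    };
                    if c == 'C' {
                        opts.config_dir = Some(value);
                    } else {
                        opts.db_path = Some(value);
                    }
                    break;
                }
                other => return Err(format!("illegal option -- {other}")),
            }
        }
        i += 1;
    }
    opts.rest = args[i..].iter().map(|s| s.to_string()).collect();
    Ok(opts)
}
```

For example, parsing -hV -C /etc/pai -d x.db sync -a sets both flags and both paths, and leaves sync -a in rest for the sync parser.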

2.3 Subcommands and their options#

1. `sync` – fetch and store content#

pai [-C config_dir] [-d db_path] sync [-a] [-k kind] [-S source_id]

Options:

-a Sync all configured sources (the default when neither -k nor -S is given).
-k kind Sync only a particular source kind:
- substack
- bluesky
- leaflet
-S source_id Sync only a specific source instance (e.g. patternmatched.substack.com, desertthunder.dev, desertthunder.leaflet.pub, stormlightlabs.leaflet.pub).

Examples:

pai sync -a
pai sync -k substack
pai sync -k leaflet -S desertthunder.leaflet.pub
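
One way to read the sync options is as filters over the configured sources, sketched below. SourceRef and select_sources are hypothetical names, and combining -k and -S with AND semantics is an assumption; the spec above does not say how they interact.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SourceRef {
    pub kind: &'static str,
    pub source_id: &'static str,
}

/// No filters selects everything (equivalent to -a); -k and -S narrow the set.
pub fn select_sources<'a>(
    all: &'a [SourceRef],
    kind: Option<&str>,
    source_id: Option<&str>,
) -> Vec<&'a SourceRef> {
    all.iter()
        .filter(|s| kind.map_or(true, |k| s.kind == k))
        .filter(|s| source_id.map_or(true, |id| s.source_id == id))
        .collect()
}
```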

2. `list` – inspect stored items#

pai [-C config_dir] [-d db_path] list [-k kind] [-S source_id] [-n number] [-s since] [-q pattern]

Options:

-k kind Filter by source kind (substack, bluesky, leaflet).
-S source_id Filter by specific source id (host or handle).
-n number Maximum number of items to display (default 20).
-s since Only show items published at or after this time. Accepts ISO 8601 (2025-11-23T00:00:00Z) and, as an optional convenience, relative shorthand such as 7d or 24h.
-q pattern Filter items whose title/summary contains the given substring.
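
The -s value could be classified before it reaches the query layer, as in this sketch. Since and parse_since are hypothetical names; only the h and d units mentioned above are recognized, and anything else is passed through unvalidated as an absolute timestamp, which a real implementation would still need to parse.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Since {
    /// Relative shorthand such as 7d or 24h, measured back from "now".
    Relative(Duration),
    /// Anything else, assumed to be an ISO 8601 timestamp (not validated here).
    Absolute(String),
}

pub fn parse_since(input: &str) -> Result<Since, String> {
    let input = input.trim();
    let last = match input.chars().last() {
        Some(c) => c,
        None => return Err("empty -s value".to_string()),
    };
    let digits = &input[..input.len() - last.len_utf8()];
    let secs_per_unit: u64 = match last {
        'h' => 3_600,
        'd' => 86_400,
        _ => 0, // not a relative unit
    };
    let is_relative =
        secs_per_unit != 0 && !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit());
    if is_relative {
        let n: u64 = digits
            .parse()
            .map_err(|e| format!("bad -s value {input:?}: {e}"))?;
        let secs = n
            .checked_mul(secs_per_unit)
            .ok_or_else(|| format!("-s value {input:?} is too large"))?;
        return Ok(Since::Relative(Duration::from_secs(secs)));
    }
    Ok(Since::Absolute(input.to_string()))
}
```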

3. `export` – produce feeds/files#

pai [-C config_dir] [-d db_path] export [-k kind] [-S source_id] [-n number] [-s since] [-q pattern] [-f format] [-o file]

Options (in addition to list filters):

-f format Output format:
- json (default)
- ndjson
- rss (optional)
-o file Output file. Default is standard output.

Examples:

pai export -f json -o activity.json
pai export -k bluesky -n 50 -f ndjson
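
The difference between the json and ndjson formats can be shown with a small std-only sketch: json emits one array, ndjson emits one object per line. Row, json_string, and export are hypothetical names, only two fields are serialized, and a real implementation would more likely use a JSON library than hand-written escaping.

```rust
pub struct Row {
    pub id: String,
    pub title: String,
}

/// Escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn row_json(row: &Row) -> String {
    format!(
        "{{\"id\":{},\"title\":{}}}",
        json_string(&row.id),
        json_string(&row.title)
    )
}

/// Render rows in the requested export format.
pub fn export(rows: &[Row], format: &str) -> Result<String, String> {
    let objects: Vec<String> = rows.iter().map(row_json).collect();
    match format {
        // One JSON array containing every item.
        "json" => Ok(format!("[{}]\n", objects.join(","))),
        // One JSON object per line, convenient for streaming.
        "ndjson" => Ok(objects.iter().map(|o| format!("{o}\n")).collect()),
        other => Err(format!("unsupported format: {other}")),
    }
}
```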

4. `serve` – self-host HTTP API#

pai [-C config_dir] [-d db_path] serve [-a address]

Options:

-a address Address to bind HTTP server to. Default: 127.0.0.1:8080.

The HTTP API mirrors the query semantics of list and export:

GET /api/feed?source_kind=bluesky&limit=50&since=...&q=...
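
To illustrate how the query string maps onto the list filters, here is a std-only sketch. FeedQuery and parse_feed_query are hypothetical names, values are not percent-decoded, and unknown parameters are silently ignored; in the axum server this job would more plausibly be done by its Query extractor.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct FeedQuery {
    pub source_kind: Option<String>,
    pub source_id: Option<String>,
    pub limit: Option<usize>,
    pub since: Option<String>,
    pub q: Option<String>,
}

/// Parse the part of the URL after '?'. Values are taken verbatim
/// (no percent-decoding in this sketch).
pub fn parse_feed_query(query: &str) -> Result<FeedQuery, String> {
    let mut out = FeedQuery::default();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "source_kind" => out.source_kind = Some(value.to_string()),
            "source_id" => out.source_id = Some(value.to_string()),
            "limit" => {
                let n = value
                    .parse::<usize>()
                    .map_err(|_| format!("invalid limit: {value}"))?;
                out.limit = Some(n);
            }
            "since" => out.since = Some(value.to_string()),
            "q" => out.q = Some(value.to_string()),
            _ => {} // unknown parameters are ignored
        }
    }
    Ok(out)
}
```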

5. `cf-init` – scaffold Cloudflare deployment#

pai cf-init [-o dir]

Options:

-o dir Directory into which to write wrangler.toml, schema.sql, and a sample worker entry point. Default: current directory.

This command doesn’t need DB access; it just writes templates and prints next steps (create D1 DB, bind it, set up Cron).
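
The template step could be as simple as rendering a string from the [deployment.cloudflare] values. This sketch covers only the D1 binding and Cron trigger sections; render_wrangler_toml is a hypothetical name, the database_id placeholder must be filled in by the user, and entries a real Rust Worker also needs (entry point, build command, compatibility date) are deliberately left out.

```rust
/// Render a starter wrangler.toml with a D1 binding and one Cron trigger.
pub fn render_wrangler_toml(
    worker_name: &str,
    d1_binding: &str,
    database_name: &str,
    cron: &str,
) -> String {
    let mut out = String::new();
    out.push_str(&format!("name = \"{worker_name}\"\n\n"));
    out.push_str("[[d1_databases]]\n");
    out.push_str(&format!("binding = \"{d1_binding}\"\n"));
    out.push_str(&format!("database_name = \"{database_name}\"\n"));
    // Placeholder: the real id is printed when the D1 database is created.
    out.push_str("database_id = \"<your-database-id>\"\n\n");
    out.push_str("[triggers]\n");
    out.push_str(&format!("crons = [\"{cron}\"]\n"));
    out
}
```

Called with the sample config values ("personal-activity-index", "DB", "personal_activity_db") and a schedule such as "0 * * * *", it produces a file the user can complete after creating the database.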

2.4 `config.toml` spec#

Default location:

$XDG_CONFIG_HOME/pai/config.toml or
$HOME/.config/pai/config.toml if XDG_CONFIG_HOME is unset.

Top-level layout:

[database]
# Path to SQLite database for self-host mode.
# Ignored by the Worker; used only by `pai` binary.
path = "/home/owais/.local/share/pai/pai.db"

[deployment]
# Which deploy targets are configured.
# "sqlite" is always available; "cloudflare" is optional.
mode = "sqlite"        # or "cloudflare"

[deployment.cloudflare]
# Optional metadata for generating wrangler.toml, etc.
worker_name   = "personal-activity-index"
d1_binding    = "DB"
database_name = "personal_activity_db"

[sources.substack]
enabled   = true
base_url  = "https://patternmatched.substack.com"

[sources.bluesky]
enabled = true
handle  = "desertthunder.dev"

[[sources.leaflet]]
enabled   = true
id        = "desertthunder"
base_url  = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled   = true
id        = "stormlightlabs"
base_url  = "https://stormlightlabs.leaflet.pub"

Notes:

The CLI should not require the Cloudflare section unless a user explicitly wants to generate Worker scaffolding.
The Worker itself will get its D1 binding and Cron schedule from wrangler.toml and the Cloudflare dashboard, not from this config file; you just reuse the same schema and Item type.

2.5 POSIX compliance checklist#

When you implement the CLI parsing, you can sanity-check against POSIX & GNU guidance:

Short options are single letters with a single leading -. (The Open Group)
Options precede non-option arguments (your commands and operands) in the usage examples. (The Open Group)
Options that take arguments are formatted as -x arg rather than -xarg in documentation. (gnu.org)
You provide -h / -V and consistent help text. (Baeldung on Kotlin)
Long options (--help, --version, --config-dir, etc.) can be supported as extensions but are not required for conformance. (Software Engineering Stack Exchange)
