# Personal Activity Index CLI – Roadmap & Tasks
Objective: Build a POSIX-style Rust CLI that ingests content from Substack, Bluesky, and Leaflet into SQLite, with an optional Cloudflare Worker + D1 deployment path.
Targets:
- Self-host: single binary + SQLite.
- Cloudflare: Rust Worker + D1 + Cron triggers.
## Workspace & Architecture
Goal: Shared core library, CLI frontend, and Worker frontend, with clear separation of concerns.
- Create Cargo workspace layout:
  - `core/` – shared types, fetchers, and storage traits.
  - `cli/` – POSIX-style binary (`pai`).
  - `worker/` – Cloudflare Worker using `workers-rs`.
- In `core/` (sketched after this list):
  - Define `SourceKind` enum: `substack`, `bluesky`, `leaflet`.
  - Define `Item` struct with fields:
    - `id`, `source_kind`, `source_id`, `author`, `title`, `summary`, `url`, `content_html`, `published_at`, `created_at`.
  - Define `Storage` trait with at minimum:
    - `insert_or_replace_item(&self, item: &Item) -> Result<()>`
    - `list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>`
  - Define `SourceFetcher` trait:
    - `fn sync(&self, storage: &dyn Storage) -> Result<()>`
- In `cli/`:
  - Add argument parsing that follows POSIX conventions:
    - Options of the form `-h`, `-V`, `-C dir`, `-d path`, etc.
    - Options come before operands/subcommands where possible.
  - Define subcommands (as operands) with their own POSIX-style options:
    - `sync`
    - `list`
    - `export`
    - `serve`
- In `core/`:
  - Implement `sync_all_sources(config, storage)` that calls each fetcher.
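A minimal sketch of how the `core/` definitions above might look; the boxed-error `Result` alias and the `ListFilter` fields are assumptions rather than part of the spec:

```rust
// core/src/lib.rs – sketch only.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Substack,
    Bluesky,
    Leaflet,
}

#[derive(Debug, Clone)]
pub struct Item {
    pub id: String,
    pub source_kind: SourceKind,
    pub source_id: String,
    pub author: Option<String>,
    pub title: Option<String>,
    pub summary: Option<String>,
    pub url: String,
    pub content_html: Option<String>,
    pub published_at: String, // ISO 8601
    pub created_at: String,   // ISO 8601
}

/// Filter handed to `Storage::list_items`; fields here are a guess at what
/// Milestone 3 needs (kind, source, limit, since, substring query).
#[derive(Debug, Default, Clone)]
pub struct ListFilter {
    pub kind: Option<SourceKind>,
    pub source_id: Option<String>,
    pub limit: Option<usize>,
    pub since: Option<String>,
    pub query: Option<String>,
}

pub trait Storage {
    fn insert_or_replace_item(&self, item: &Item) -> Result<()>;
    fn list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>;
}

pub trait SourceFetcher {
    fn sync(&self, storage: &dyn Storage) -> Result<()>;
}
```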
## Milestone 1 – Local SQLite Storage (Self-host Base)
Goal: pai can sync data into a local SQLite file.
- Choose SQLite crate (native mode):
  - e.g. `rusqlite`
- Define SQL schema and migrations:
  - `items` table:

    ```sql
    CREATE TABLE IF NOT EXISTS items (
        id            TEXT PRIMARY KEY,
        source_kind   TEXT NOT NULL,
        source_id     TEXT NOT NULL,
        author        TEXT,
        title         TEXT,
        summary       TEXT,
        url           TEXT NOT NULL,
        content_html  TEXT,
        published_at  TEXT NOT NULL,
        created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE INDEX IF NOT EXISTS idx_items_source_date
        ON items (source_kind, source_id, published_at DESC);
    ```

  - Embed migrations or provide `schema.sql` + a `pai db-migrate` command.
- Implement `SqliteStorage` in `cli/` (sketched after this list):
  - Opens/creates DB at `-d path` or the `$XDG_DATA_HOME/pai/pai.db` fallback.
  - Implements the `Storage` trait.
- Implement the `pai sync` path:
  - `pai sync` → load config → open SQLite → call `sync_all_sources`.
  - Exit codes: `0` on success, non-zero on failure.
- Add `pai db-check`:
  - Verifies the schema and prints basic stats (item count per source).
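A minimal sketch of the rusqlite-backed storage, showing only the open and insert paths; the `pai_core` import path stands in for the shared `core/` crate, and the `schema.sql` location and `SourceKind::as_str` helper are assumptions:

```rust
// cli/src/storage.rs – sketch only.
use pai_core::Item; // placeholder path for the shared core crate
use rusqlite::{params, Connection};

pub struct SqliteStorage {
    conn: Connection,
}

impl SqliteStorage {
    /// Open (or create) the database file and apply the embedded schema.
    pub fn open(path: &std::path::Path) -> rusqlite::Result<Self> {
        let conn = Connection::open(path)?;
        conn.execute_batch(include_str!("../schema.sql"))?; // assumed location
        Ok(Self { conn })
    }

    pub fn insert_or_replace_item(&self, item: &Item) -> rusqlite::Result<()> {
        self.conn.execute(
            "INSERT OR REPLACE INTO items
               (id, source_kind, source_id, author, title, summary, url,
                content_html, published_at, created_at)
             VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10)",
            params![
                item.id,
                item.source_kind.as_str(), // assumed helper mapping the enum to TEXT
                item.source_id,
                item.author,
                item.title,
                item.summary,
                item.url,
                item.content_html,
                item.published_at,
                item.created_at,
            ],
        )?;
        Ok(())
    }
}
```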
## Milestone 2 – Source Integrations ✅
Goal: All three sources can be ingested via the CLI.
Status: COMPLETE - All three source integrations (Substack RSS, Bluesky AT Protocol, Leaflet RSS) are implemented and tested with real data.
### 2.1 Substack (Pattern Matched)
- Add config support:

  ```toml
  [sources.substack]
  enabled = true
  base_url = "https://patternmatched.substack.com"
  ```

- Implement `SubstackFetcher` in `core/` (sketched after this list):
  - Fetch `{base_url}/feed`.
  - Parse RSS using `feed-rs`.
  - Map each `<item>`:
    - `id` = GUID if present, otherwise `link`.
    - `source_kind = "substack"`.
    - `source_id = "patternmatched.substack.com"`.
    - `title`, `summary` from RSS `title`/`description`.
    - `url` from `link`.
    - `published_at` from `pubDate` (normalized to ISO 8601).
- Wire into `sync_all_sources` when enabled.
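A sketch of the fetch-and-map step with `feed-rs`; the blocking `reqwest` client, the `pai_core` paths, and hard-coding the `source_id` are assumptions for illustration:

```rust
use pai_core::{Item, SourceKind}; // placeholder path for the shared core crate

fn fetch_substack(base_url: &str) -> Result<Vec<Item>, Box<dyn std::error::Error>> {
    let body = reqwest::blocking::get(format!("{base_url}/feed"))?.bytes()?;
    let feed = feed_rs::parser::parse(&body[..])?;

    let items = feed
        .entries
        .into_iter()
        .map(|entry| {
            let link = entry.links.first().map(|l| l.href.clone()).unwrap_or_default();
            Item {
                // feed-rs exposes the RSS GUID as `entry.id`; fall back to the link.
                id: if entry.id.is_empty() { link.clone() } else { entry.id },
                source_kind: SourceKind::Substack,
                source_id: "patternmatched.substack.com".into(),
                author: None,
                title: entry.title.map(|t| t.content),
                summary: entry.summary.map(|t| t.content),
                url: link,
                content_html: None,
                published_at: entry.published.map(|d| d.to_rfc3339()).unwrap_or_default(),
                created_at: chrono::Utc::now().to_rfc3339(),
            }
        })
        .collect();
    Ok(items)
}
```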
### 2.2 Bluesky (desertthunder.dev)
- Add config support:

  ```toml
  [sources.bluesky]
  enabled = true
  handle = "desertthunder.dev"
  ```

- Implement `BlueskyFetcher` in `core/` (URL derivation sketched after this list):
  - Fetch:
    - `https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=desertthunder.dev&limit=N`
  - Filter out reposts/quotes (only original posts).
  - Map each `post` record:
    - `id` = `uri` (AT URI).
    - `source_kind = "bluesky"`.
    - `source_id = "desertthunder.dev"`.
    - `title` = truncated text up to N chars.
    - `summary` = full text (or truncated).
    - `url` = canonical `https://bsky.app/profile/…/post/…` derived from the URI.
    - `published_at` = `record.createdAt` (already ISO 8601).
- Optional:
  - Support pagination via `cursor` until a configured max number of posts.
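A sketch of the two mechanical pieces: dropping reposts from `getAuthorFeed` and deriving the `bsky.app` URL from the AT URI. The response structs are trimmed to the fields used here, the blocking `reqwest` client is an assumption, and quote filtering (inspecting `record.embed`) is left out:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct AuthorFeed {
    feed: Vec<FeedItem>,
}

#[derive(Deserialize)]
struct FeedItem {
    post: Post,
    // Present on reposts (app.bsky.feed.defs#reasonRepost); used to filter them out.
    reason: Option<serde_json::Value>,
}

#[derive(Deserialize)]
struct Post {
    uri: String,
    record: serde_json::Value, // createdAt, text, etc. are read from this value
}

/// at://did:plc:…/app.bsky.feed.post/<rkey> → https://bsky.app/profile/<handle>/post/<rkey>
fn bsky_url(handle: &str, at_uri: &str) -> Option<String> {
    let rkey = at_uri.rsplit('/').next()?;
    Some(format!("https://bsky.app/profile/{handle}/post/{rkey}"))
}

fn fetch_author_feed(handle: &str, limit: u32) -> Result<Vec<Post>, Box<dyn std::error::Error>> {
    let url = format!(
        "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor={handle}&limit={limit}"
    );
    let feed: AuthorFeed = reqwest::blocking::get(url)?.json()?;
    // Keep only original posts: drop entries that carry a repost `reason`.
    Ok(feed
        .feed
        .into_iter()
        .filter(|f| f.reason.is_none())
        .map(|f| f.post)
        .collect())
}
```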
### 2.3 Leaflet (desertthunder / stormlightlabs)
- Add config support:

  ```toml
  [[sources.leaflet]]
  enabled = true
  id = "desertthunder"
  base_url = "https://desertthunder.leaflet.pub"

  [[sources.leaflet]]
  enabled = true
  id = "stormlightlabs"
  base_url = "https://stormlightlabs.leaflet.pub"
  ```

- Use AT Protocol instead of HTML parsing:
  - Use `com.atproto.repo.listRecords` with collection `pub.leaflet.post`.
- Implement `LeafletFetcher` in `core/` (the record listing is sketched after this list):
  - For each configured pub:
    - Fetch records using AT Protocol.
    - Parse `pub.leaflet.post` records.
    - For each post:
      - Extract `title` from the record.
      - Extract `publishedAt` or `createdAt`.
      - Derive summary from the `summary` or `content` field.
      - Generate URL using `slug` or the record ID.
      - Normalize the date to ISO 8601 for `published_at`.
    - Insert or replace items in storage.
- Wire into `sync_all_sources`.
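A sketch of the `com.atproto.repo.listRecords` call; resolving the publication's repo (DID or handle) and its PDS host is assumed to happen elsewhere, and the record value is kept as raw JSON since the exact `pub.leaflet.post` shape isn't pinned down in this plan:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct ListRecords {
    records: Vec<Record>,
    cursor: Option<String>,
}

#[derive(Deserialize)]
struct Record {
    uri: String,
    // Raw pub.leaflet.post record; title, publishedAt/createdAt, summary/content,
    // and slug are extracted from this value when mapping to Item.
    value: serde_json::Value,
}

fn list_leaflet_posts(
    pds_host: &str,
    repo: &str,
) -> Result<Vec<Record>, Box<dyn std::error::Error>> {
    let url = format!(
        "{pds_host}/xrpc/com.atproto.repo.listRecords\
         ?repo={repo}&collection=pub.leaflet.post&limit=100"
    );
    let page: ListRecords = reqwest::blocking::get(url)?.json()?;
    // Pagination via `cursor` is omitted here, as in the Bluesky notes above.
    let _ = page.cursor;
    Ok(page.records)
}
```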
## Milestone 3 – Query, Filter, and Export (CLI Only)
Goal: Make local data usable even without HTTP.
- Implement `pai list` (the `-s` parsing is sketched after this list):
  - Syntax: `pai list [options]` (options before operands).
  - Options:
    - `-k kind` filter by `source_kind` (`substack`, `bluesky`, `leaflet`).
    - `-S id` filter by `source_id` (host/handle).
    - `-n N` limit number of results (default 20).
    - `-s time` “since time” (e.g. ISO 8601, or “7d” shorthand if desired).
    - `-q pattern` simple substring filter on title/summary.
  - Render as an ASCII table or simple text.
- Implement `pai export`:
  - Syntax: `pai export -f format [-o file]`.
  - Supported formats:
    - `json` (default).
    - `ndjson` (optional).
    - `rss` (optional aggregate).
  - Options:
    - `-f format` (`json`, `rss`, …).
    - `-o path` output file (default stdout).
- Implement exit statuses for typical cases:
  - `0` on success.
  - `>0` on error (bad args, DB error, network failure, etc.).
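A sketch of parsing the `-s` value, accepting ISO 8601 plus the optional `7d`/`24h` shorthand; using `chrono` here is an assumption:

```rust
use chrono::{DateTime, Duration, Utc};

/// "7d" / "24h" → now minus that span; otherwise expect an RFC 3339 timestamp.
fn parse_since(input: &str) -> Option<DateTime<Utc>> {
    if let Some(days) = input.strip_suffix('d').and_then(|n| n.parse::<i64>().ok()) {
        return Some(Utc::now() - Duration::days(days));
    }
    if let Some(hours) = input.strip_suffix('h').and_then(|n| n.parse::<i64>().ok()) {
        return Some(Utc::now() - Duration::hours(hours));
    }
    DateTime::parse_from_rfc3339(input)
        .ok()
        .map(|dt| dt.with_timezone(&Utc))
}
```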
## Milestone 4 – Self-hosted HTTP Server Mode
Goal: Provide a small HTTP API backed by SQLite for self-hosted deployments.
- Add a `serve` subcommand in `cli/`:
  - Syntax: `pai serve [options]`.
  - Options:
    - `-d path` database path.
    - `-a addr` listen address (default `127.0.0.1:8080`).
  - Follows POSIX conventions: all options before operands.
- Implement the HTTP server (`axum`; sketched after this list):
  - `GET /api/feed` – list all items, newest first.
    - Query params: `source_kind`, `source_id`, `limit`, `since`, `q`.
  - Optional:
    - `GET /api/item/{id}` for a single item.
- Ensure graceful shutdown and clean error handling.
- Document reverse-proxy examples (Caddy, nginx).
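A sketch of the `/api/feed` route with axum (assuming axum 0.7's `axum::serve` and a tokio runtime); the query struct mirrors the params above, and the handler is stubbed where it would call `Storage::list_items`:

```rust
use axum::{extract::Query, routing::get, Json, Router};
use serde::Deserialize;

#[derive(Deserialize)]
struct FeedQuery {
    source_kind: Option<String>,
    source_id: Option<String>,
    limit: Option<usize>,
    since: Option<String>,
    q: Option<String>,
}

async fn feed(Query(params): Query<FeedQuery>) -> Json<Vec<serde_json::Value>> {
    // Translate `params` into a ListFilter and call storage.list_items(...);
    // an empty list keeps the sketch self-contained.
    let _ = params;
    Json(Vec::new())
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/api/feed", get(feed));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```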
## Milestone 5 – Cloudflare Worker + D1 Frontend
Goal: Provide an alternative deployment path using Cloudflare Workers with D1 and Cron triggers.
- In `worker/`:
  - Depend on the `worker` crate with the `d1` feature enabled.
  - Reuse `core::Item` and parsing code (ensure crates are WASM-friendly).
- Configure D1:
  - Provide a `schema.sql` compatible with D1 (same `items` table).
  - Example `wrangler.toml` with a `[[d1_databases]]` binding.
- Implement Worker routes:
  - `GET /api/feed` with similar semantics as the CLI server.
- Implement the `scheduled` handler for Cron (sketched after this list):
  - On each scheduled run, call per-source syncers writing to D1.
  - Document cron configuration in `wrangler.toml`.
- Add `pai cf-init` in `cli/`:
  - Generates a starter `wrangler.toml`.
  - Prints instructions to create the D1 DB and bind it.
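A sketch of the Cron entry point with `workers-rs`; the `"DB"` binding name matches the `wrangler.toml` task above, the bound values are placeholders, and the exact `D1PreparedStatement` signatures may differ between `worker` crate versions:

```rust
use worker::*;

#[event(scheduled)]
pub async fn scheduled(_event: ScheduledEvent, env: Env, _ctx: ScheduleContext) {
    if let Err(e) = run_sync(&env).await {
        console_error!("sync failed: {e}");
    }
}

async fn run_sync(env: &Env) -> Result<()> {
    let db = env.d1("DB")?;
    // For each source: fetch + map to items using the shared core code, then upsert.
    // A single placeholder row stands in for that loop here.
    db.prepare(
        "INSERT OR REPLACE INTO items \
           (id, source_kind, source_id, url, published_at, created_at) \
         VALUES (?1, ?2, ?3, ?4, ?5, datetime('now'))",
    )
    .bind(&[
        "placeholder-id".into(),
        "substack".into(),
        "patternmatched.substack.com".into(),
        "https://example.invalid/post".into(),
        "2025-01-01T00:00:00Z".into(),
    ])?
    .run()
    .await?;
    Ok(())
}
```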
## Milestone 6 – POSIX Polish, Packaging, and Docs
Goal: Make the CLI feel like a “real UNIX utility” and easy to adopt.
- Verify POSIX-style argument handling:
  - Short options only in usage syntax; long options are optional extensions.
  - Options before operands/subcommands in docs and examples.
  - Support grouped short options where meaningful (e.g. `-hv`).
- Implement:
  - `-h` – usage synopsis and options (per POSIX convention).
  - `-V` – version info.
- Add manpage-style documentation using clap_mangen (https://crates.io/crates/clap_mangen) in `build.rs` (sketched after this list):
  - `man/pai.1` with SYNOPSIS, DESCRIPTION, OPTIONS, OPERANDS, EXIT STATUS, ENVIRONMENT, FILES, EXAMPLES.
- Publish the `pai` crate to crates.io.
- Write a README with:
  - Self-hosted quickstart.
  - Cloudflare Worker quickstart.
  - Config reference (`config.toml`).
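A minimal `build.rs` sketch for the clap_mangen task; in a real setup the `clap::Command` would be shared with the CLI definition rather than redeclared, `clap`/`clap_mangen` would sit under `[build-dependencies]`, and copying the page from `OUT_DIR` to `man/pai.1` is left to packaging:

```rust
// build.rs
use std::{env, fs, path::PathBuf};

fn main() -> std::io::Result<()> {
    // Placeholder command; reuse the actual pai Command definition in practice.
    let cmd = clap::Command::new("pai")
        .version(env!("CARGO_PKG_VERSION"))
        .about("Personal Activity Index CLI");

    let mut page = Vec::new();
    clap_mangen::Man::new(cmd).render(&mut page)?;

    let out_dir = PathBuf::from(env::var_os("OUT_DIR").expect("OUT_DIR is set by cargo"));
    fs::write(out_dir.join("pai.1"), page)
}
```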
## 2. CLI & Config Spec (POSIX-style)
### 2.1 POSIX argument conventions you’re aligning with
Key constraints you want to follow:
- Options are introduced by a single `-` followed by a single letter (`-h`, `-V`, `-d path`).
- Options that require arguments use a separate token: `-d path` rather than `-dpath`.
- Options appear before operands (here, subcommands and file paths) in the recommended syntax: `utility_name [-a] [-b arg] operand1 operand2 …`.
- `-h` for help and `-V` for version are widely conventional.

You can still offer `--long-option` aliases as a GNU-style extension; just document the POSIX short forms as canonical.
### 2.2 CLI synopsis
Utility name: pai (single binary).
#### Global synopsis
pai [-hV] [-C config_dir] [-d db_path] command [command-options] [command-operands]
- `-h` Print usage and exit.
- `-V` Print version and exit.
- `-C config_dir` Set the configuration directory. Default: `$XDG_CONFIG_HOME/pai` or `$HOME/.config/pai`.
- `-d db_path` Path to the SQLite database file. Default: `$XDG_DATA_HOME/pai/pai.db` or `$HOME/.local/share/pai/pai.db`.
Subcommands are treated as operands in POSIX terms; each subcommand then has its own POSIX-style options.
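A sketch of how this synopsis could map onto clap's derive API; clap is implied by the clap_mangen task in Milestone 6, but the attributes and the trimmed subcommand set here are assumptions. clap also emits `--help`/`--version` long forms, which matches the GNU-extension note above:

```rust
use clap::{Parser, Subcommand};
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "pai", version)] // clap provides -h; `version` enables -V
struct Cli {
    /// Configuration directory (-C config_dir).
    #[arg(short = 'C', value_name = "config_dir")]
    config_dir: Option<PathBuf>,

    /// SQLite database path (-d db_path).
    #[arg(short = 'd', value_name = "db_path")]
    db_path: Option<PathBuf>,

    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    Sync {
        #[arg(short = 'a')]
        all: bool,
        #[arg(short = 'k', value_name = "kind")]
        kind: Option<String>,
        #[arg(short = 'S', value_name = "source_id")]
        source_id: Option<String>,
    },
    List {
        #[arg(short = 'n', value_name = "number", default_value_t = 20)]
        number: usize,
    },
    // export, serve, and cf-init follow the same pattern.
}

fn main() {
    let cli = Cli::parse();
    let _ = cli.command;
}
```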
### 2.3 Subcommands and their options
#### 1. sync – fetch and store content
pai [-C config_dir] [-d db_path] sync [-a] [-k kind] [-S source_id]
Options:
- `-a` Sync all configured sources (default if `-k` is not specified).
- `-k kind` Sync only a particular source kind: `substack`, `bluesky`, `leaflet`.
- `-S source_id` Sync only a specific source instance (e.g. `patternmatched.substack.com`, `desertthunder.dev`, `desertthunder.leaflet.pub`, `stormlightlabs.leaflet.pub`).
Examples:
pai sync -a
pai sync -k substack
pai sync -k leaflet -S desertthunder.leaflet.pub
#### 2. list – inspect stored items
pai [-C config_dir] [-d db_path] list [-k kind] [-S source_id] [-n number] [-s since] [-q pattern]
Options:
- `-k kind` Filter by source kind (`substack`, `bluesky`, `leaflet`).
- `-S source_id` Filter by a specific source id (host or handle).
- `-n number` Maximum number of items to display (default 20).
- `-s since` Only show items published at or after this time. The CLI can accept ISO 8601 (`2025-11-23T00:00:00Z`) and, as a convenience, relative strings like `7d` and `24h` if you want.
- `-q pattern` Filter items whose title/summary contains the given substring.
#### 3. export – produce feeds/files
pai [-C config_dir] [-d db_path] export [-k kind] [-S source_id] [-n number] [-s since] [-q pattern] [-f format] [-o file]
Options (in addition to list filters):
- `-f format` Output format: `json` (default), `ndjson`, `rss` (optional).
- `-o file` Output file. Default is standard output.
Examples:
pai export -f json -o activity.json
pai export -k bluesky -n 50 -f ndjson
#### 4. serve – self-host HTTP API
pai [-C config_dir] [-d db_path] serve [-a address]
Options:
- `-a address` Address to bind the HTTP server to. Default: `127.0.0.1:8080`.
The HTTP API mirrors the query semantics of list and export:
GET /api/feed?source_kind=bluesky&limit=50&since=...&q=...
#### 5. cf-init – scaffold Cloudflare deployment
pai cf-init [-o dir]
Options:
- `-o dir` Directory into which to write `wrangler.toml`, `schema.sql`, and a sample `worker` entry point. Default: the current directory.
This command doesn’t need DB access; it just writes templates and prints next steps (create D1 DB, bind it, set up Cron).
### 2.4 config.toml spec
Default location: `$XDG_CONFIG_HOME/pai/config.toml`, or `$HOME/.config/pai/config.toml` if `XDG_CONFIG_HOME` is unset.
Top-level layout:
```toml
[database]
# Path to SQLite database for self-host mode.
# Ignored by the Worker; used only by `pai` binary.
path = "/home/owais/.local/share/pai/pai.db"

[deployment]
# Which deploy targets are configured.
# "sqlite" is always available; "cloudflare" is optional.
mode = "sqlite" # or "cloudflare"

[deployment.cloudflare]
# Optional metadata for generating wrangler.toml, etc.
worker_name = "personal-activity-index"
d1_binding = "DB"
database_name = "personal_activity_db"

[sources.substack]
enabled = true
base_url = "https://patternmatched.substack.com"

[sources.bluesky]
enabled = true
handle = "desertthunder.dev"

[[sources.leaflet]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled = true
id = "stormlightlabs"
base_url = "https://stormlightlabs.leaflet.pub"
```
Notes:
- The CLI should not require the Cloudflare section unless a user explicitly wants to generate Worker scaffolding.
- The Worker itself will get its D1 binding and Cron schedule from `wrangler.toml` and the Cloudflare dashboard, not from this config file; you just reuse the same schema and `Item` type.
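A sketch of deserializing this layout with `serde` + `toml`; the struct and field names are assumptions chosen to mirror the tables above, with `[[sources.leaflet]]` mapping onto a `Vec`:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Config {
    database: Option<DatabaseConfig>,
    deployment: Option<DeploymentConfig>,
    sources: SourcesConfig,
}

#[derive(Deserialize)]
struct DatabaseConfig {
    path: Option<String>,
}

#[derive(Deserialize)]
struct DeploymentConfig {
    mode: String,
    cloudflare: Option<CloudflareConfig>, // optional, per the note above
}

#[derive(Deserialize)]
struct CloudflareConfig {
    worker_name: String,
    d1_binding: String,
    database_name: String,
}

#[derive(Deserialize)]
struct SourcesConfig {
    substack: Option<SubstackConfig>,
    bluesky: Option<BlueskyConfig>,
    #[serde(default)]
    leaflet: Vec<LeafletConfig>,
}

#[derive(Deserialize)]
struct SubstackConfig { enabled: bool, base_url: String }

#[derive(Deserialize)]
struct BlueskyConfig { enabled: bool, handle: String }

#[derive(Deserialize)]
struct LeafletConfig { enabled: bool, id: String, base_url: String }

fn load_config(text: &str) -> Result<Config, toml::de::Error> {
    toml::from_str(text)
}
```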
### 2.5 POSIX compliance checklist
When you implement the CLI parsing, you can sanity-check against POSIX & GNU guidance:
- Short options are single letters with a single leading `-`. (The Open Group)
- Options precede non-option arguments (your commands and operands) in the usage examples. (The Open Group)
- Options that take arguments are formatted as `-x arg` rather than `-xarg` in documentation. (gnu.org)
- You provide `-h`/`-V` and consistent help text. (Baeldung on Kotlin)
- Long options (`--help`, `--version`, `--config-dir`, etc.) can be supported as extensions but are not required for conformance. (Software Engineering Stack Exchange)