# Personal Activity Index CLI – Roadmap & Tasks

Objective:
Build a POSIX-style Rust CLI that ingests content from Substack, Bluesky, and Leaflet into SQLite, with an optional Cloudflare Worker + D1 deployment path.

Targets:

- Self-host: single binary + SQLite.
- Cloudflare: Rust Worker + D1 + Cron triggers.

## Workspace & Architecture

**Goal:** Shared core library, CLI frontend, and Worker frontend, with clear separation of concerns.

- [x] Create Cargo workspace layout:
  - [x] `core/` – shared types, fetchers, and storage traits.
  - [x] `cli/` – POSIX-style binary (`pai`).
  - [x] `worker/` – Cloudflare Worker using `workers-rs`.
- [x] In `core/`:
  - [x] Define `SourceKind` enum: `substack`, `bluesky`, `leaflet`.
  - [x] Define `Item` struct with fields:
    - [x] `id`, `source_kind`, `source_id`, `author`, `title`, `summary`,
      `url`, `content_html`, `published_at`, `created_at`.
  - [x] Define `Storage` trait with at minimum:
    - [x] `insert_or_replace_item(&self, item: &Item) -> Result<()>`
    - [x] `list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>`
  - [x] Define `SourceFetcher` trait:
    - [x] `fn sync(&self, storage: &dyn Storage) -> Result<()>`
- [x] In `cli/`:
  - [x] Add argument parsing that follows POSIX conventions:
    - Options of the form `-h`, `-V`, `-C dir`, `-d path`, etc.
    - Options come before operands/subcommands where possible.
  - [x] Define subcommands (as operands) with their own POSIX-style options:
    - [x] `sync`
    - [x] `list`
    - [x] `export`
    - [x] `serve`
- [x] In `core/`:
  - [x] Implement `sync_all_sources(config, storage)` that calls each fetcher.

## Milestone 1 – Local SQLite Storage (Self-host Base)

**Goal:** `pai` can sync data into a local SQLite file.

- [x] Choose SQLite crate (native mode):
  - [x] e.g. `rusqlite`
- [x] Define SQL schema and migrations:
  - [x] `items` table:

    ```sql
    CREATE TABLE IF NOT EXISTS items (
      id TEXT PRIMARY KEY,
      source_kind TEXT NOT NULL,
      source_id TEXT NOT NULL,
      author TEXT,
      title TEXT,
      summary TEXT,
      url TEXT NOT NULL,
      content_html TEXT,
      published_at TEXT NOT NULL,
      created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE INDEX IF NOT EXISTS idx_items_source_date ON items (source_kind, source_id, published_at DESC);
    ```

  - [x] Embed migrations or provide `schema.sql` + `pai db-migrate` command.
- [x] Implement `SqliteStorage` in `cli/`:
  - [x] Opens/creates DB at `-d path` or `$XDG_DATA_HOME/pai/pai.db` fallback.
  - [x] Implements `Storage` trait.
- [x] Implement `pai sync` path:
  - [x] `pai sync` → load config → open SQLite → call `sync_all_sources`.
  - [x] Exit codes:
    - [x] `0` on success, non-zero on failure.
- [x] Add `pai db-check`:
  - [x] Verifies schema and prints basic stats (item count per source).

## Milestone 2 – Source Integrations ✅

**Goal:** All three sources can be ingested via the CLI.

**Status:** COMPLETE - All three source integrations (Substack RSS, Bluesky AT Protocol, Leaflet RSS) are implemented and tested with real data.

### 2.1 Substack (Pattern Matched)

- [x] Add config support:

  ```toml
  [sources.substack]
  enabled = true
  base_url = "https://patternmatched.substack.com"
  ```

- [x] Implement `SubstackFetcher` in `core/`:

  - [x] Fetch `{base_url}/feed`.
  - [x] Parse RSS using `feed-rs`.
  - [x] Map `<item>`:

    - [x] `id` = GUID if present, otherwise `link`.
    - [x] `source_kind = "substack"`.
    - [x] `source_id = "patternmatched.substack.com"`.
    - [x] `title`, `summary` from RSS `title`/`description`.
    - [x] `url` from `link`.
    - [x] `published_at` from `pubDate` (normalized to ISO 8601).
- [x] Wire into `sync_all_sources` when enabled.
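
The Substack field mapping above can be sketched as follows. This is a minimal illustration, not the project's actual code: `RssEntry`, `map_substack_entry`, and `source_id_from_base_url` are hypothetical names, the `Item` here is a trimmed subset of the real struct, and the entry is assumed to have been parsed already (the real fetcher would populate it from `feed-rs`) with `published_at` already normalized to ISO 8601.

```rust
/// Hypothetical, simplified view of one parsed RSS `<item>`.
/// A real fetcher would fill this from the feed parser's output.
struct RssEntry {
    guid: Option<String>,
    link: String,
    title: Option<String>,
    description: Option<String>,
    /// Assumed to be normalized to ISO 8601 before mapping.
    published_at: String,
}

/// Subset of the `Item` fields that this mapping touches.
#[derive(Debug)]
struct Item {
    id: String,
    source_kind: String,
    source_id: String,
    title: Option<String>,
    summary: Option<String>,
    url: String,
    published_at: String,
}

/// Derive `source_id` (the host) from `base_url` by dropping scheme and path.
fn source_id_from_base_url(base_url: &str) -> &str {
    let without_scheme = match base_url.split_once("://") {
        Some((_, rest)) => rest,
        None => base_url,
    };
    without_scheme.split('/').next().unwrap_or(without_scheme)
}

fn map_substack_entry(base_url: &str, entry: RssEntry) -> Item {
    let RssEntry { guid, link, title, description, published_at } = entry;
    // `id` = GUID if present (and non-empty), otherwise the link.
    let id = guid
        .filter(|g| !g.is_empty())
        .unwrap_or_else(|| link.clone());
    Item {
        id,
        source_kind: "substack".to_string(),
        source_id: source_id_from_base_url(base_url).to_string(),
        title,
        summary: description,
        url: link,
        published_at,
    }
}

fn main() {
    let entry = RssEntry {
        guid: None,
        link: "https://patternmatched.substack.com/p/example".to_string(),
        title: Some("Example post".to_string()),
        description: Some("A short description.".to_string()),
        published_at: "2025-11-23T00:00:00Z".to_string(),
    };
    let item = map_substack_entry("https://patternmatched.substack.com", entry);
    println!("{:?}", item);
}
```

Deriving `source_id` from `base_url` rather than hard-coding it is one possible design; it keeps the value consistent with whatever host is configured.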

### 2.2 Bluesky (desertthunder.dev)

- [x] Add config support:

  ```toml
  [sources.bluesky]
  enabled = true
  handle = "desertthunder.dev"
  ```

- [x] Implement `BlueskyFetcher` in `core/`:

  - [x] Fetch:

    - [x] `https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=desertthunder.dev&limit=N`
  - [x] Filter out reposts/quotes (only original posts).
  - [x] Map `post` record:

    - [x] `id` = `uri` (AT URI).
    - [x] `source_kind = "bluesky"`.
    - [x] `source_id = "desertthunder.dev"`.
    - [x] `title` = truncated text up to N chars.
    - [x] `summary` = full text (or truncated).
    - [x] `url` = canonical `https://bsky.app/profile/…/post/…` derived from URI.
    - [x] `published_at` = `record.createdAt` (ISO 8601 already).
  - [ ] Optional:

    - [ ] Support pagination via `cursor` until a configured max number of posts.

### 2.3 Leaflet (desertthunder / stormlightlabs)

- [x] Add config support:

  ```toml
  [[sources.leaflet]]
  enabled = true
  id = "desertthunder"
  base_url = "https://desertthunder.leaflet.pub"

  [[sources.leaflet]]
  enabled = true
  id = "stormlightlabs"
  base_url = "https://stormlightlabs.leaflet.pub"
  ```

- [x] Use AT Protocol instead of HTML parsing:

  - [x] Use `com.atproto.repo.listRecords` with collection `pub.leaflet.post`.

- [x] Implement `LeafletFetcher` in `core/`:

  - [x] For each configured pub:

    - [x] Fetch records using AT Protocol.
    - [x] Parse `pub.leaflet.post` records.
    - [x] For each post:

      - [x] Extract `title` from record.
      - [x] Extract `publishedAt` or `createdAt`.
      - [x] Derive summary from `summary` or `content` field.
      - [x] Generate URL using `slug` or record ID.
      - [x] Normalize date to ISO 8601 for `published_at`.
      - [x] Insert or replace items in storage.

- [x] Wire into `sync_all_sources`.
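
For the Bluesky mapping in 2.2, the two derived fields (`url` from the AT URI and `title` from truncated text) might look like the sketch below. The function names `bsky_post_url` and `truncate_title` are hypothetical, and the sketch assumes the AT URI has the usual three-segment shape `at://<authority>/app.bsky.feed.post/<rkey>`; anything else is rejected rather than guessed at.

```rust
/// Build the canonical web URL for a post from its AT URI.
/// Returns `None` if the URI is not a three-segment `app.bsky.feed.post` URI.
fn bsky_post_url(at_uri: &str, handle: &str) -> Option<String> {
    let rest = at_uri.strip_prefix("at://")?;
    let mut parts = rest.split('/');
    let authority = parts.next()?;
    let collection = parts.next()?;
    let rkey = parts.next()?;
    if authority.is_empty()
        || collection != "app.bsky.feed.post"
        || rkey.is_empty()
        || parts.next().is_some()
    {
        return None;
    }
    Some(format!("https://bsky.app/profile/{}/post/{}", handle, rkey))
}

/// Truncate post text to at most `max_chars` characters for use as a title.
/// Counts `char`s rather than bytes so multi-byte text is never split mid-character.
fn truncate_title(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

fn main() {
    let uri = "at://did:plc:example123/app.bsky.feed.post/3kexamplekey";
    match bsky_post_url(uri, "desertthunder.dev") {
        Some(url) => println!("{}", url),
        None => eprintln!("not a post URI: {}", uri),
    }
    println!("{}", truncate_title("A fairly long post body that needs a title", 20));
}
```

Counting characters instead of slicing bytes matters here because post text routinely contains emoji and other multi-byte characters, and a byte-index slice at an arbitrary offset would panic.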

## Milestone 3 – Query, Filter, and Export (CLI Only)

**Goal:** Make local data usable even without HTTP.

- [x] Implement `pai list`:
  - [x] Syntax: `pai list [options]` (options before operands).
  - [x] Options:
    - [x] `-k kind` filter by `source_kind` (`substack`, `bluesky`, `leaflet`).
    - [x] `-S id` filter by `source_id` (host/handle).
    - [x] `-n N` limit number of results (default 20).
    - [x] `-s time` “since time” (e.g. ISO 8601, or “7d” shorthand if desired).
    - [x] `-q pattern` simple substring filter on title/summary.
  - [x] Render as ASCII table or simple text.
- [x] Implement `pai export`:
  - [x] Syntax: `pai export -f format [-o file]`.
  - [x] Supported formats:
    - [x] `json` (default).
    - [x] `ndjson` (optional).
    - [x] `rss` (optional aggregate).
  - [x] Options:
    - [x] `-f format` (`json`, `rss`, …).
    - [x] `-o path` output file (default stdout).
- [x] Implement exit statuses for typical cases:
  - [x] `0` on success.
  - [x] `>0` on error (bad args, DB error, network failure, etc.).

## Milestone 4 – Self-hosted HTTP Server Mode

**Goal:** Provide a small HTTP API backed by SQLite for self-hosted deployments.

- [x] Add `serve` subcommand in `cli/`:
  - [x] Syntax: `pai serve [options]`.
  - [x] Options:
    - [x] `-d path` database path.
    - [x] `-a addr` listen address (default `127.0.0.1:8080`).
  - [x] Follows POSIX conventions: all options before operands.
- [x] Implement HTTP server (`axum`):
  - [x] `GET /api/feed` – list all items, newest first.
    - [x] Query params:
      - [x] `source_kind`, `source_id`, `limit`, `since`, `q`.
  - [x] Optional:
    - [x] `GET /api/item/{id}` for a single item.
- [x] Ensure graceful shutdown and clean error handling.
- [x] Document reverse-proxy examples (Caddy, nginx).
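
Both `pai list -s` and the `since` query parameter of `GET /api/feed` take either an ISO 8601 timestamp or a relative shorthand such as `7d`. One way to parse that input is sketched below. The `Since` enum and `parse_since` function are hypothetical names, only the `d` and `h` suffixes mentioned in this document are handled, and the ISO 8601 check is a deliberately shallow shape test (`YYYY-MM-DD` prefix), not full validation.

```rust
use std::time::Duration;

/// Parsed form of a `since` argument (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Since {
    /// Relative offset back from "now", e.g. `7d` or `24h`.
    Relative(Duration),
    /// Timestamp string passed through unchanged, e.g. `2025-11-23T00:00:00Z`.
    Absolute(String),
}

/// Shallow shape check: does the string start with `YYYY-MM-DD`?
fn looks_like_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() >= 10
        && b[..4].iter().all(u8::is_ascii_digit)
        && b[4] == b'-'
        && b[5..7].iter().all(u8::is_ascii_digit)
        && b[7] == b'-'
        && b[8..10].iter().all(u8::is_ascii_digit)
}

fn parse_since(input: &str) -> Option<Since> {
    let input = input.trim();
    // Try the relative shorthands first: `<n>d` (days) and `<n>h` (hours).
    for &(suffix, secs_per_unit) in &[("d", 86_400u64), ("h", 3_600u64)] {
        if let Some(digits) = input.strip_suffix(suffix) {
            if let Ok(n) = digits.parse::<u64>() {
                // Reject values that would overflow instead of wrapping.
                let secs = n.checked_mul(secs_per_unit)?;
                return Some(Since::Relative(Duration::from_secs(secs)));
            }
        }
    }
    if looks_like_iso_date(input) {
        Some(Since::Absolute(input.to_string()))
    } else {
        None
    }
}

fn main() {
    for arg in &["7d", "24h", "2025-11-23T00:00:00Z", "soon"] {
        println!("{:>22} -> {:?}", arg, parse_since(arg));
    }
}
```

Returning `None` for unrecognized input lets the CLI map it to a non-zero exit status and the HTTP handler map it to a client error, so both front ends can share the parser.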

## Milestone 5 – Cloudflare Worker + D1 Frontend

**Goal:** Provide an alternative deployment path using Cloudflare Workers with D1 and Cron triggers.

- [ ] In `worker/`:
  - [ ] Depend on `worker` crate with `d1` feature enabled.
  - [ ] Reuse `core::Item` and parsing code (ensure crates are WASM-friendly).
- [ ] Configure D1:
  - [ ] Provide `schema.sql` compatible with D1 (same `items` table).
  - [ ] Example `wrangler.toml` with `[[d1_databases]]` binding.
- [ ] Implement Worker routes:
  - [ ] `GET /api/feed` with similar semantics as CLI server.
- [ ] Implement `scheduled` handler for Cron:
  - [ ] On each scheduled run, call per-source syncers writing to D1.
  - [ ] Document cron configuration in `wrangler.toml`.
- [ ] Add `pai cf-init` in `cli/`:
  - [ ] Generates a starter `wrangler.toml`.
  - [ ] Prints instructions to create D1 DB and bind it.

## Milestone 6 – POSIX Polish, Packaging, and Docs

**Goal:** Make the CLI feel like a “real UNIX utility” and easy to adopt.

- [ ] Verify POSIX-style argument handling:
  - [ ] Short options only in usage syntax; long options are optional extensions.
  - [ ] Options before operands/subcommands in docs and examples.
  - [ ] Support grouped short options where meaningful (e.g. `-hv`).
- [ ] Implement:
  - [ ] `-h` – usage synopsis and options (per POSIX convention).
  - [ ] `-V` – version info.
- [ ] Add manpage-style documentation using clap_mangen (<https://crates.io/crates/clap_mangen>) in build.rs:
  - [ ] `man/pai.1` with SYNOPSIS, DESCRIPTION, OPTIONS, OPERANDS, EXIT STATUS, ENVIRONMENT, FILES, EXAMPLES.
- [ ] Publish `pai` crate to crates.io.
- [ ] Write README with:
  - [ ] Self-hosted quickstart.
  - [ ] Cloudflare Worker quickstart.
  - [ ] Config reference (`config.toml`).

## 2. CLI & Config Spec (POSIX-style)

### 2.1 POSIX argument conventions you’re aligning with

Key constraints you want to follow:

- Options are introduced by a single `-` followed by a single letter (`-h`, `-V`, `-d path`).
- Options that require arguments use a separate token: `-d path` rather than `-dpath`.
- Options appear before operands (here, subcommands and file paths) in the recommended syntax:
  `utility_name [-a] [-b arg] operand1 operand2 …`.
- `-h` for help, `-V` for version are widely conventional.

You *can* still offer `--long-option` aliases as a GNU-style extension; just document the POSIX short forms as canonical.

### 2.2 CLI synopsis

**Utility name:** `pai` (single binary).

#### Global synopsis

```text
pai [-hV] [-C config_dir] [-d db_path] command [command-options] [command-operands]
```

- `-h`
  Print usage and exit.

- `-V`
  Print version and exit.

- `-C config_dir`
  Set configuration directory. Default: `$XDG_CONFIG_HOME/pai` or `$HOME/.config/pai`.

- `-d db_path`
  Path to SQLite database file. Default: `$XDG_DATA_HOME/pai/pai.db` or `$HOME/.local/share/pai/pai.db`.

Subcommands are treated as **operands** in POSIX terms; each subcommand then has its own POSIX-style options.

### 2.3 Subcommands and their options

#### 1. `sync` – fetch and store content

```text
pai [-C config_dir] [-d db_path] sync [-a] [-k kind] [-S source_id]
```

Options:

- `-a`
  Sync all configured sources (default if `-k` not specified).

- `-k kind`
  Sync only a particular source kind:

  - `substack`
  - `bluesky`
  - `leaflet`

- `-S source_id`
  Sync only a specific source instance (e.g. `patternmatched.substack.com`, `desertthunder.dev`, `desertthunder.leaflet.pub`, `stormlightlabs.leaflet.pub`).

Examples:

```sh
pai sync -a
pai sync -k substack
pai sync -k leaflet -S desertthunder.leaflet.pub
```

#### 2. `list` – inspect stored items

```text
pai [-C config_dir] [-d db_path] list [-k kind] [-S source_id] [-n number] [-s since] [-q pattern]
```

Options:

- `-k kind`
  Filter by source kind (`substack`, `bluesky`, `leaflet`).

- `-S source_id`
  Filter by specific source id (host or handle).

- `-n number`
  Maximum number of items to display (default 20).

- `-s since`
  Only show items published at or after this time. The CLI can accept ISO 8601 (`2025-11-23T00:00:00Z`) and, as a convenience, relative strings like `7d`, `24h` if you want.

- `-q pattern`
  Filter items whose title/summary contains the given substring.

#### 3. `export` – produce feeds/files

```text
pai [-C config_dir] [-d db_path] export [-k kind] [-S source_id] [-n number] [-s since] [-q pattern] [-f format] [-o file]
```

Options (in addition to `list` filters):

- `-f format`
  Output format:

  - `json` (default)
  - `ndjson`
  - `rss` (optional)

- `-o file`
  Output file. Default is standard output.

Examples:

```sh
pai export -f json -o activity.json
pai export -k bluesky -n 50 -f ndjson
```

#### 4. `serve` – self-host HTTP API

```text
pai [-C config_dir] [-d db_path] serve [-a address]
```

Options:

- `-a address`
  Address to bind HTTP server to. Default: `127.0.0.1:8080`.

The HTTP API mirrors the query semantics of `list` and `export`:

- `GET /api/feed?source_kind=bluesky&limit=50&since=...&q=...`

#### 5. `cf-init` – scaffold Cloudflare deployment

```text
pai cf-init [-o dir]
```

Options:

- `-o dir`
  Directory into which to write `wrangler.toml`, `schema.sql`, and a sample `worker` entry point. Default: current directory.

This command doesn’t need DB access; it just writes templates and prints next steps (create D1 DB, bind it, set up Cron).

### 2.4 `config.toml` spec

**Default location:**

- `$XDG_CONFIG_HOME/pai/config.toml` or
- `$HOME/.config/pai/config.toml` if `XDG_CONFIG_HOME` is unset.

**Top-level layout:**

```toml
[database]
# Path to SQLite database for self-host mode.
# Ignored by the Worker; used only by `pai` binary.
path = "/home/owais/.local/share/pai/pai.db"

[deployment]
# Which deploy targets are configured.
# "sqlite" is always available; "cloudflare" is optional.
mode = "sqlite" # or "cloudflare"

[deployment.cloudflare]
# Optional metadata for generating wrangler.toml, etc.
worker_name = "personal-activity-index"
d1_binding = "DB"
database_name = "personal_activity_db"

[sources.substack]
enabled = true
base_url = "https://patternmatched.substack.com"

[sources.bluesky]
enabled = true
handle = "desertthunder.dev"

[[sources.leaflet]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled = true
id = "stormlightlabs"
base_url = "https://stormlightlabs.leaflet.pub"
```

**Notes:**

- The CLI should **not** require the Cloudflare section unless a user explicitly wants to generate Worker scaffolding.
- The Worker itself will get its D1 binding and Cron schedule from `wrangler.toml` and the Cloudflare dashboard, not from this config file; you just reuse the same schema and `Item` type.
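
The default-location rule for `config.toml` (explicit `-C config_dir`, then `$XDG_CONFIG_HOME/pai`, then `$HOME/.config/pai`) could be resolved along these lines. This is a sketch under assumptions: `config_path_from` is a hypothetical helper, the environment values are passed in as parameters so the precedence logic stays testable, and an empty variable is treated the same as an unset one.

```rust
use std::env;
use std::path::PathBuf;

/// Resolve the path to `config.toml`.
/// Precedence: `-C config_dir` flag, then `$XDG_CONFIG_HOME/pai`, then `$HOME/.config/pai`.
/// Returns `None` when no flag is given and neither variable is usable.
fn config_path_from(
    config_dir_flag: Option<&str>,
    xdg_config_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    let dir = if let Some(dir) = config_dir_flag {
        // `-C` names the configuration directory itself.
        PathBuf::from(dir)
    } else if let Some(xdg) = xdg_config_home.filter(|v| !v.is_empty()) {
        PathBuf::from(xdg).join("pai")
    } else {
        let home = home.filter(|v| !v.is_empty())?;
        PathBuf::from(home).join(".config").join("pai")
    };
    Some(dir.join("config.toml"))
}

fn main() {
    // Read the real environment once, at the edge of the program.
    let xdg = env::var("XDG_CONFIG_HOME").ok();
    let home = env::var("HOME").ok();
    match config_path_from(None, xdg.as_deref(), home.as_deref()) {
        Some(path) => println!("{}", path.display()),
        None => {
            eprintln!("pai: cannot determine config directory; pass -C config_dir");
            std::process::exit(1);
        }
    }
}
```

The same shape would apply to the `-d db_path` default, substituting `XDG_DATA_HOME` and `.local/share`.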

### 2.5 POSIX compliance checklist

When you implement the CLI parsing, you can sanity-check against POSIX & GNU guidance:

- Short options are single letters with a single leading `-`. ([The Open Group][1])
- Options precede non-option arguments (your commands and operands) in the usage examples. ([The Open Group][1])
- Options that take arguments are formatted as `-x arg` rather than `-xarg` in documentation. ([gnu.org][2])
- You provide `-h` / `-V` and consistent help text. ([Baeldung on Linux][3])
- Long options (`--help`, `--version`, `--config-dir`, etc.) can be supported as extensions but are not required for conformance. ([Software Engineering Stack Exchange][4])

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html "12. Utility Conventions"
[2]: https://www.gnu.org/s/libc/manual/html_node/Argument-Syntax.html "Argument Syntax (The GNU C Library)"
[3]: https://www.baeldung.com/linux/posix "A Guide to POSIX | Baeldung on Linux"
[4]: https://softwareengineering.stackexchange.com/questions/70357/command-line-options-style-posix-or-what "Command line options style - POSIX or what?"