# Personal Activity Index CLI – Roadmap & Tasks

Objective:
Build a POSIX-style Rust CLI that ingests content from Substack, Bluesky, and Leaflet into SQLite, with an optional Cloudflare Worker + D1 deployment path.

Targets:

- Self-host: single binary + SQLite.
- Cloudflare: Rust Worker + D1 + Cron triggers.

## Workspace & Architecture

**Goal:** Shared core library, CLI frontend, and Worker frontend, with clear separation of concerns.

- [x] Create Cargo workspace layout:
  - [x] `core/` – shared types, fetchers, and storage traits.
  - [x] `cli/` – POSIX-style binary (`pai`).
  - [x] `worker/` – Cloudflare Worker using `workers-rs`.
- [x] In `core/`:
  - [x] Define `SourceKind` enum: `substack`, `bluesky`, `leaflet`.
  - [x] Define `Item` struct with fields:
    - [x] `id`, `source_kind`, `source_id`, `author`, `title`, `summary`,
      `url`, `content_html`, `published_at`, `created_at`.
  - [x] Define `Storage` trait with at minimum:
    - [x] `insert_or_replace_item(&self, item: &Item) -> Result<()>`
    - [x] `list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>`
  - [x] Define `SourceFetcher` trait:
    - [x] `fn sync(&self, storage: &dyn Storage) -> Result<()>`
- [x] In `cli/`:
  - [x] Add argument parsing that follows POSIX conventions:
    - Options of the form `-h`, `-V`, `-C dir`, `-d path`, etc.
    - Options come before operands/subcommands where possible.
  - [x] Define subcommands (as operands) with their own POSIX-style options:
    - [x] `sync`
    - [x] `list`
    - [x] `export`
    - [x] `serve`
- [x] In `core/`:
  - [x] Implement `sync_all_sources(config, storage)` that calls each fetcher.

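The `core/` types and traits listed above can be sketched as follows. This is a minimal, std-only illustration rather than the project's actual code: the `Result` alias, the `ListFilter` fields, the `MemoryStorage` stand-in, and the fetcher-slice signature of `sync_all_sources` are assumptions (the roadmap passes a `config` instead), and timestamps are kept as plain strings.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::error::Error;

/// Assumed result alias; the roadmap only writes `Result<()>`.
pub type Result<T> = std::result::Result<T, Box<dyn Error>>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Substack,
    Bluesky,
    Leaflet,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: String,
    pub source_kind: SourceKind,
    pub source_id: String,
    pub author: Option<String>,
    pub title: Option<String>,
    pub summary: Option<String>,
    pub url: String,
    pub content_html: Option<String>,
    pub published_at: String,
    pub created_at: String,
}

/// Hypothetical filter shape; the roadmap names `ListFilter` but not its fields.
#[derive(Debug, Default)]
pub struct ListFilter {
    pub source_kind: Option<SourceKind>,
    pub limit: Option<usize>,
}

pub trait Storage {
    fn insert_or_replace_item(&self, item: &Item) -> Result<()>;
    fn list_items(&self, filter: &ListFilter) -> Result<Vec<Item>>;
}

pub trait SourceFetcher {
    fn sync(&self, storage: &dyn Storage) -> Result<()>;
}

/// In-memory stand-in for a real backend, keyed by item id.
#[derive(Default)]
pub struct MemoryStorage {
    items: RefCell<BTreeMap<String, Item>>,
}

impl Storage for MemoryStorage {
    fn insert_or_replace_item(&self, item: &Item) -> Result<()> {
        // Inserting an existing id overwrites the previous row.
        self.items.borrow_mut().insert(item.id.clone(), item.clone());
        Ok(())
    }

    fn list_items(&self, filter: &ListFilter) -> Result<Vec<Item>> {
        let mut out: Vec<Item> = self
            .items
            .borrow()
            .values()
            .filter(|i| filter.source_kind.map_or(true, |k| i.source_kind == k))
            .cloned()
            .collect();
        // Newest first; same-format ISO 8601 UTC strings sort chronologically.
        out.sort_by(|a, b| b.published_at.cmp(&a.published_at));
        out.truncate(filter.limit.unwrap_or(usize::MAX));
        Ok(out)
    }
}

/// Calls each fetcher in turn, stopping at the first error.
pub fn sync_all_sources(fetchers: &[&dyn SourceFetcher], storage: &dyn Storage) -> Result<()> {
    for fetcher in fetchers {
        fetcher.sync(storage)?;
    }
    Ok(())
}
```

Because fetchers only see `&dyn Storage`, the same fetcher code could write to SQLite in the CLI and to D1 in the Worker, which is the separation the goal above describes.
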
## Milestone 1 – Local SQLite Storage (Self-host Base)

**Goal:** `pai` can sync data into a local SQLite file.

- [x] Choose SQLite crate (native mode):
  - [x] e.g. `rusqlite`
- [x] Define SQL schema and migrations:
  - [x] `items` table:

    ```sql
    CREATE TABLE IF NOT EXISTS items (
        id TEXT PRIMARY KEY,
        source_kind TEXT NOT NULL,
        source_id TEXT NOT NULL,
        author TEXT,
        title TEXT,
        summary TEXT,
        url TEXT NOT NULL,
        content_html TEXT,
        published_at TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE INDEX IF NOT EXISTS idx_items_source_date ON items (source_kind, source_id, published_at DESC);
    ```

  - [x] Embed migrations or provide `schema.sql` + `pai db-migrate` command.
- [x] Implement `SqliteStorage` in `cli/`:
  - [x] Opens/creates DB at `-d path` or `$XDG_DATA_HOME/pai/pai.db` fallback.
  - [x] Implements `Storage` trait.
- [x] Implement `pai sync` path:
  - [x] `pai sync` → load config → open SQLite → call `sync_all_sources`.
  - [x] Exit codes:
    - [x] `0` on success, non-zero on failure.
- [x] Add `pai db-check`:
  - [x] Verifies schema and prints basic stats (item count per source).

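The database path fallback used by `SqliteStorage` could be resolved along these lines. This is a sketch under assumptions: the function name and its pure, environment-free signature are hypothetical (chosen so the logic is easy to test), and the `$HOME/.local/share` step follows the default given in the CLI spec later in this document.

```rust
use std::path::PathBuf;

/// Resolution order: `-d path`, then `$XDG_DATA_HOME/pai/pai.db`,
/// then `$HOME/.local/share/pai/pai.db`. Returns `None` if nothing is usable.
pub fn resolve_db_path(
    flag: Option<&str>,
    xdg_data_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    if let Some(path) = flag {
        return Some(PathBuf::from(path));
    }
    // An empty variable is treated as unset, as the XDG Base Directory spec describes.
    if let Some(xdg) = xdg_data_home.filter(|v| !v.is_empty()) {
        return Some(PathBuf::from(xdg).join("pai").join("pai.db"));
    }
    home.filter(|v| !v.is_empty()).map(|h| {
        PathBuf::from(h)
            .join(".local")
            .join("share")
            .join("pai")
            .join("pai.db")
    })
}
```

In the binary, the two environment arguments would come from something like `std::env::var("XDG_DATA_HOME").ok()` and `std::env::var("HOME").ok()`, passed in via `as_deref()`.
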
## Milestone 2 – Source Integrations ✅

**Goal:** All three sources can be ingested via the CLI.

**Status:** COMPLETE - All three source integrations (Substack RSS, Bluesky AT Protocol, Leaflet AT Protocol) are implemented and tested with real data.

### 2.1 Substack (Pattern Matched)

- [x] Add config support:

  ```toml
  [sources.substack]
  enabled = true
  base_url = "https://patternmatched.substack.com"
  ```

- [x] Implement `SubstackFetcher` in `core/`:

  - [x] Fetch `{base_url}/feed`.
  - [x] Parse RSS using `feed-rs`.
  - [x] Map `<item>`:

    - [x] `id` = GUID if present, otherwise `link`.
    - [x] `source_kind = "substack"`.
    - [x] `source_id = "patternmatched.substack.com"`.
    - [x] `title`, `summary` from RSS `title`/`description`.
    - [x] `url` from `link`.
    - [x] `published_at` from `pubDate` (normalized to ISO 8601).
- [x] Wire into `sync_all_sources` when enabled.

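A few of the Substack mapping rules above can be written as small pure functions. This sketch leaves out the RSS parsing itself (done by `feed-rs` in the roadmap); the function names are hypothetical, and treating a blank GUID as missing is an assumption.

```rust
/// Builds the feed URL from the configured `base_url`, tolerating a trailing slash.
pub fn feed_url(base_url: &str) -> String {
    format!("{}/feed", base_url.trim_end_matches('/'))
}

/// `id` = GUID if present, otherwise `link`.
/// Assumption: a GUID that is empty or whitespace-only counts as absent.
pub fn item_id<'a>(guid: Option<&'a str>, link: &'a str) -> &'a str {
    match guid.map(str::trim) {
        Some(g) if !g.is_empty() => g,
        _ => link,
    }
}

/// Derives `source_id` (the host) from `base_url` by dropping the scheme and any path.
pub fn source_id_from_base_url(base_url: &str) -> &str {
    let without_scheme = base_url
        .split_once("://")
        .map_or(base_url, |(_, rest)| rest);
    without_scheme.split('/').next().unwrap_or(without_scheme)
}
```

With the config shown above, `source_id_from_base_url` yields `patternmatched.substack.com`, matching the `source_id` in the checklist.
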
### 2.2 Bluesky (desertthunder.dev)

- [x] Add config support:

  ```toml
  [sources.bluesky]
  enabled = true
  handle = "desertthunder.dev"
  ```

- [x] Implement `BlueskyFetcher` in `core/`:

  - [x] Fetch:

    - [x] `https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=desertthunder.dev&limit=N`

  - [x] Filter out reposts/quotes (only original posts).
  - [x] Map `post` record:

    - [x] `id` = `uri` (AT URI).
    - [x] `source_kind = "bluesky"`.
    - [x] `source_id = "desertthunder.dev"`.
    - [x] `title` = truncated text up to N chars.
    - [x] `summary` = full text (or truncated).
    - [x] `url` = canonical `https://bsky.app/profile/…/post/…` derived from URI.
    - [x] `published_at` = `record.createdAt` (ISO 8601 already).
  - [ ] Optional:

    - [ ] Support pagination via `cursor` until a configured max number of posts.

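Two of the Bluesky mapping steps, deriving the web URL from the AT URI and truncating text for the title, might look like the sketch below. The function names are hypothetical, using the configured handle in the profile segment is one possible choice, and the trailing ellipsis on truncated titles is an assumption the roadmap does not specify.

```rust
/// Converts `at://<authority>/app.bsky.feed.post/<rkey>` into a `bsky.app` post URL.
/// Returns `None` for anything that is not a post URI.
pub fn post_url_from_at_uri(at_uri: &str, handle: &str) -> Option<String> {
    let rest = at_uri.strip_prefix("at://")?;
    let mut parts = rest.split('/');
    let _authority = parts.next()?; // DID or handle; the configured handle is used instead
    let collection = parts.next()?;
    let rkey = parts.next()?;
    if collection != "app.bsky.feed.post" || rkey.is_empty() || parts.next().is_some() {
        return None;
    }
    Some(format!("https://bsky.app/profile/{handle}/post/{rkey}"))
}

/// Truncates on character (not byte) boundaries so multi-byte text is never split.
/// Assumption: an ellipsis marks text that was cut.
pub fn truncate_title(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    let mut out: String = text.chars().take(max_chars).collect();
    out.push('…');
    out
}
```

Counting `char`s rather than bytes matters here because post text routinely contains emoji and accented characters, and slicing a `String` at an arbitrary byte offset would panic.
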
### 2.3 Leaflet (desertthunder / stormlightlabs)

- [x] Add config support:

  ```toml
  [[sources.leaflet]]
  enabled = true
  id = "desertthunder"
  base_url = "https://desertthunder.leaflet.pub"

  [[sources.leaflet]]
  enabled = true
  id = "stormlightlabs"
  base_url = "https://stormlightlabs.leaflet.pub"
  ```

- [x] Use AT Protocol instead of HTML parsing:

  - [x] Use `com.atproto.repo.listRecords` with collection `pub.leaflet.post`.

- [x] Implement `LeafletFetcher` in `core/`:

  - [x] For each configured pub:

    - [x] Fetch records using AT Protocol.
    - [x] Parse `pub.leaflet.post` records.
    - [x] For each post:

      - [x] Extract `title` from record.
      - [x] Extract `publishedAt` or `createdAt`.
      - [x] Derive summary from `summary` or `content` field.
      - [x] Generate URL using `slug` or record ID.
      - [x] Normalize date to ISO 8601 for `published_at`.
      - [x] Insert or replace items in storage.

- [x] Wire into `sync_all_sources`.

## Milestone 3 – Query, Filter, and Export (CLI Only)

**Goal:** Make local data usable even without HTTP.

- [x] Implement `pai list`:
  - [x] Syntax: `pai list [options]` (options before operands).
  - [x] Options:
    - [x] `-k kind` filter by `source_kind` (`substack`, `bluesky`, `leaflet`).
    - [x] `-S id` filter by `source_id` (host/handle).
    - [x] `-n N` limit number of results (default 20).
    - [x] `-s time` “since time” (e.g. ISO 8601, or “7d” shorthand if desired).
    - [x] `-q pattern` simple substring filter on title/summary.
  - [x] Render as ASCII table or simple text.
- [x] Implement `pai export`:
  - [x] Syntax: `pai export [-f format] [-o file]`.
  - [x] Supported formats:
    - [x] `json` (default).
    - [x] `ndjson` (optional).
    - [x] `rss` (optional aggregate).
  - [x] Options:
    - [x] `-f format` (`json`, `rss`, …).
    - [x] `-o path` output file (default stdout).
- [x] Implement exit statuses for typical cases:
  - [x] `0` on success.
  - [x] `>0` on error (bad args, DB error, network failure, etc.).

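The difference between the `json` and `ndjson` export formats can be illustrated with a std-only sketch. A real implementation would more likely use a serialization crate; the hand-written escaping, the reduced three-field `ExportRow`, and all function names here are assumptions made for illustration.

```rust
/// Escapes a string for use inside a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \uXXXX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical reduced view of an item: only three of its fields.
pub struct ExportRow<'a> {
    pub id: &'a str,
    pub title: &'a str,
    pub url: &'a str,
}

pub fn to_json_object(row: &ExportRow) -> String {
    format!(
        "{{\"id\":\"{}\",\"title\":\"{}\",\"url\":\"{}\"}}",
        json_escape(row.id),
        json_escape(row.title),
        json_escape(row.url)
    )
}

/// `json`: one array holding every item.
pub fn export_json(rows: &[ExportRow]) -> String {
    let objects: Vec<String> = rows.iter().map(to_json_object).collect();
    format!("[{}]", objects.join(","))
}

/// `ndjson`: one object per line, each terminated by a newline.
pub fn export_ndjson(rows: &[ExportRow]) -> String {
    rows.iter().map(|r| to_json_object(r) + "\n").collect()
}
```

The line-per-record shape of `ndjson` is what makes it convenient for streaming and for line-oriented tools, while `json` produces a single document that must be read whole.
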
## Milestone 4 – Self-hosted HTTP Server Mode

**Goal:** Provide a small HTTP API backed by SQLite for self-hosted deployments.

- [x] Add `serve` subcommand in `cli/`:
  - [x] Syntax: `pai serve [options]`.
  - [x] Options:
    - [x] `-d path` database path.
    - [x] `-a addr` listen address (default `127.0.0.1:8080`).
  - [x] Follows POSIX conventions: all options before operands.
- [x] Implement HTTP server (`axum`):
  - [x] `GET /api/feed` – list all items, newest first.
    - [x] Query params:
      - [x] `source_kind`, `source_id`, `limit`, `since`, `q`.
  - [x] Optional:
    - [x] `GET /api/item/{id}` for a single item.
- [x] Ensure graceful shutdown and clean error handling.
- [x] Document reverse-proxy examples (Caddy, nginx).

## Milestone 5 – Cloudflare Worker + D1 Frontend

**Goal:** Provide an alternative deployment path using Cloudflare Workers with D1 and Cron triggers.

- [ ] In `worker/`:
  - [ ] Depend on `worker` crate with `d1` feature enabled.
  - [ ] Reuse `core::Item` and parsing code (ensure crates are WASM-friendly).
- [ ] Configure D1:
  - [ ] Provide `schema.sql` compatible with D1 (same `items` table).
  - [ ] Example `wrangler.toml` with `[[d1_databases]]` binding.
- [ ] Implement Worker routes:
  - [ ] `GET /api/feed` with similar semantics as CLI server.
- [ ] Implement `scheduled` handler for Cron:
  - [ ] On each scheduled run, call per-source syncers writing to D1.
  - [ ] Document cron configuration in `wrangler.toml`.
- [ ] Add `pai cf-init` in `cli/`:
  - [ ] Generates a starter `wrangler.toml`.
  - [ ] Prints instructions to create D1 DB and bind it.

## Milestone 6 – POSIX Polish, Packaging, and Docs

**Goal:** Make the CLI feel like a “real UNIX utility” and easy to adopt.

- [ ] Verify POSIX-style argument handling:
  - [ ] Short options only in usage syntax; long options are optional extensions.
  - [ ] Options before operands/subcommands in docs and examples.
  - [ ] Support grouped short options where meaningful (e.g. `-hV`).
- [ ] Implement:
  - [ ] `-h` – usage synopsis and options (per POSIX convention).
  - [ ] `-V` – version info.
- [ ] Add manpage-style documentation using `clap_mangen` (<https://crates.io/crates/clap_mangen>) in `build.rs`:
  - [ ] `man/pai.1` with SYNOPSIS, DESCRIPTION, OPTIONS, OPERANDS, EXIT STATUS, ENVIRONMENT, FILES, EXAMPLES.
- [ ] Publish `pai` crate to crates.io.
- [ ] Write README with:
  - [ ] Self-hosted quickstart.
  - [ ] Cloudflare Worker quickstart.
  - [ ] Config reference (`config.toml`).

## 2. CLI & Config Spec (POSIX-style)

### 2.1 POSIX argument conventions you’re aligning with

Key constraints you want to follow:

- Options are introduced by a single `-` followed by a single letter (`-h`, `-V`, `-d path`).
- Options that require arguments use a separate token: `-d path` rather than `-dpath`.
- Options appear before operands (here, subcommands and file paths) in the recommended syntax:
  `utility_name [-a] [-b arg] operand1 operand2 …`.
- `-h` for help, `-V` for version are widely conventional.

You *can* still offer `--long-option` aliases as a GNU-style extension; just document the POSIX short forms as canonical.

### 2.2 CLI synopsis

**Utility name:** `pai` (single binary).

#### Global synopsis

```text
pai [-hV] [-C config_dir] [-d db_path] command [command-options] [command-operands]
```

- `-h`
  Print usage and exit.

- `-V`
  Print version and exit.

- `-C config_dir`
  Set configuration directory. Default: `$XDG_CONFIG_HOME/pai` or `$HOME/.config/pai`.

- `-d db_path`
  Path to SQLite database file. Default: `$XDG_DATA_HOME/pai/pai.db` or `$HOME/.local/share/pai/pai.db`.

Subcommands are treated as **operands** in POSIX terms; each subcommand then has its own POSIX-style options.

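Parsing the global synopsis by hand might look like the following sketch. It is std-only and illustrative, not the project's parser: `GlobalOpts`, `parse_global`, and the error strings are hypothetical. It stops at the first operand (the subcommand), honours `--` as the end of options, and accepts grouped flags such as `-hV`. Like `getopt`, it also tolerates an attached option-argument (`-dpath`), even though the documented form is `-d path`.

```rust
/// Global options from `pai [-hV] [-C config_dir] [-d db_path] command ...`.
#[derive(Debug, Default, PartialEq)]
pub struct GlobalOpts {
    pub help: bool,
    pub version: bool,
    pub config_dir: Option<String>,
    pub db_path: Option<String>,
}

/// Returns the parsed global options plus the remaining operands
/// (the subcommand followed by its own options and operands).
pub fn parse_global(args: &[&str]) -> Result<(GlobalOpts, Vec<String>), String> {
    let mut opts = GlobalOpts::default();
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        if arg == "--" {
            i += 1; // explicit end of options
            break;
        }
        if arg == "-" || !arg.starts_with('-') {
            break; // first operand: the subcommand
        }
        let mut flags = arg[1..].chars();
        while let Some(flag) = flags.next() {
            match flag {
                'h' => opts.help = true,
                'V' => opts.version = true,
                'C' | 'd' => {
                    // Option-argument: rest of this token, else the next token.
                    let attached: String = flags.by_ref().collect();
                    let value = if attached.is_empty() {
                        i += 1;
                        args.get(i)
                            .ok_or_else(|| format!("option requires an argument -- {flag}"))?
                            .to_string()
                    } else {
                        attached
                    };
                    if flag == 'C' {
                        opts.config_dir = Some(value);
                    } else {
                        opts.db_path = Some(value);
                    }
                }
                other => return Err(format!("unknown option -- {other}")),
            }
        }
        i += 1;
    }
    Ok((opts, args[i..].iter().map(|s| s.to_string()).collect()))
}
```

The returned operand list starts with the subcommand, so a second pass of the same shape can parse that subcommand's own options. This two-stage split is what lets `-a` mean "all sources" for `sync` and "address" for `serve` without conflict.
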
### 2.3 Subcommands and their options

#### 1. `sync` – fetch and store content

```text
pai [-C config_dir] [-d db_path] sync [-a] [-k kind] [-S source_id]
```

Options:

- `-a`
  Sync all configured sources (default if `-k` not specified).

- `-k kind`
  Sync only a particular source kind:

  - `substack`
  - `bluesky`
  - `leaflet`

- `-S source_id`
  Sync only a specific source instance (e.g. `patternmatched.substack.com`, `desertthunder.dev`, `desertthunder.leaflet.pub`, `stormlightlabs.leaflet.pub`).

Examples:

```sh
pai sync -a
pai sync -k substack
pai sync -k leaflet -S desertthunder.leaflet.pub
```

#### 2. `list` – inspect stored items

```text
pai [-C config_dir] [-d db_path] list [-k kind] [-S source_id] [-n number] [-s since] [-q pattern]
```

Options:

- `-k kind`
  Filter by source kind (`substack`, `bluesky`, `leaflet`).

- `-S source_id`
  Filter by specific source id (host or handle).

- `-n number`
  Maximum number of items to display (default 20).

- `-s since`
  Only show items published at or after this time. Accepts ISO 8601 (`2025-11-23T00:00:00Z`) and, optionally as a convenience, relative shorthand such as `7d` or `24h`.

- `-q pattern`
  Filter items whose title/summary contains the given substring.

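The relative shorthand accepted by `-s since` could be parsed as in the sketch below. Only the `d` and `h` suffixes mentioned above are handled; the function name is hypothetical, and returning `None` is meant to let the caller fall back to ISO 8601 parsing.

```rust
/// Parses `<digits>d` or `<digits>h` into a number of seconds.
/// Returns `None` for any other input, including ISO 8601 timestamps.
pub fn parse_relative_since(input: &str) -> Option<u64> {
    let unit = input.chars().last()?;
    let digits = &input[..input.len() - unit.len_utf8()];
    // Require plain ASCII digits so inputs like "+7d" or "-1h" are rejected.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: u64 = digits.parse().ok()?;
    let unit_secs: u64 = match unit {
        'h' => 3_600,
        'd' => 86_400,
        _ => return None,
    };
    // `checked_mul` turns an overflowing value into `None` instead of wrapping.
    n.checked_mul(unit_secs)
}
```

The caller would subtract the returned number of seconds from the current time to get the cutoff, then compare it against `published_at`.
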
#### 3. `export` – produce feeds/files

```text
pai [-C config_dir] [-d db_path] export [-k kind] [-S source_id] [-n number] [-s since] [-q pattern] [-f format] [-o file]
```

Options (in addition to `list` filters):

- `-f format`
  Output format:

  - `json` (default)
  - `ndjson`
  - `rss` (optional)

- `-o file`
  Output file. Default is standard output.

Examples:

```sh
pai export -f json -o activity.json
pai export -k bluesky -n 50 -f ndjson
```

#### 4. `serve` – self-host HTTP API

```text
pai [-C config_dir] [-d db_path] serve [-a address]
```

Options:

- `-a address`
  Address to bind HTTP server to. Default: `127.0.0.1:8080`.

The HTTP API mirrors the query semantics of `list` and `export`:

- `GET /api/feed?source_kind=bluesky&limit=50&since=...&q=...`

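Mapping the `/api/feed` query parameters onto filter fields can be sketched without any HTTP framework. In the actual server this job would more likely fall to the framework (for example axum's `Query` extractor), which also handles percent-decoding; the sketch below deliberately skips decoding. `FeedQuery` and `parse_feed_query` are hypothetical names.

```rust
/// Filter fields mirroring the `list` options, as received over HTTP.
#[derive(Debug, Default, PartialEq)]
pub struct FeedQuery {
    pub source_kind: Option<String>,
    pub source_id: Option<String>,
    pub limit: Option<usize>,
    pub since: Option<String>,
    pub q: Option<String>,
}

/// Parses a raw query string such as `source_kind=bluesky&limit=50`.
/// Note: values are taken verbatim; percent-decoding is not performed here.
pub fn parse_feed_query(query: &str) -> Result<FeedQuery, String> {
    let mut out = FeedQuery::default();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "source_kind" => out.source_kind = Some(value.to_string()),
            "source_id" => out.source_id = Some(value.to_string()),
            "since" => out.since = Some(value.to_string()),
            "q" => out.q = Some(value.to_string()),
            "limit" => {
                let n = value
                    .parse::<usize>()
                    .map_err(|_| format!("invalid limit: {value}"))?;
                out.limit = Some(n);
            }
            _ => {} // unknown parameters are ignored in this sketch
        }
    }
    Ok(out)
}
```

Keeping the parsed result in the same shape as the CLI filters means the HTTP handler and `pai list` can share one query path into storage.
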
#### 5. `cf-init` – scaffold Cloudflare deployment

```text
pai cf-init [-o dir]
```

Options:

- `-o dir`
  Directory into which to write `wrangler.toml`, `schema.sql`, and a sample `worker` entry point. Default: current directory.

This command doesn’t need DB access; it just writes templates and prints next steps (create D1 DB, bind it, set up Cron).

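The template step of `cf-init` might be sketched as a function that renders a starter `wrangler.toml`. This is a sketch under assumptions: the function name is hypothetical, the hourly cron expression and the `database_id` placeholder text are illustrative, values are not TOML-escaped, and other keys a real project needs (such as `main` and `compatibility_date`) are omitted.

```rust
/// Renders a minimal `wrangler.toml` with a D1 binding and a cron trigger.
pub fn render_wrangler_toml(worker_name: &str, d1_binding: &str, database_name: &str) -> String {
    let lines = [
        format!("name = \"{worker_name}\""),
        String::new(),
        "[[d1_databases]]".to_string(),
        format!("binding = \"{d1_binding}\""),
        format!("database_name = \"{database_name}\""),
        // Placeholder: the real id is printed when the D1 database is created.
        "database_id = \"<your-d1-database-id>\"".to_string(),
        String::new(),
        "[triggers]".to_string(),
        // Assumed schedule: run the scheduled handler once an hour.
        "crons = [\"0 * * * *\"]".to_string(),
    ];
    lines.join("\n") + "\n"
}
```

The three arguments line up with the `worker_name`, `d1_binding`, and `database_name` keys under `[deployment.cloudflare]` in `config.toml`, so `cf-init` could read them from there when that section is present.
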
### 2.4 `config.toml` spec

**Default location:**

- `$XDG_CONFIG_HOME/pai/config.toml` or
- `$HOME/.config/pai/config.toml` if `XDG_CONFIG_HOME` is unset.

**Top-level layout:**

```toml
[database]
# Path to SQLite database for self-host mode.
# Ignored by the Worker; used only by `pai` binary.
path = "/home/owais/.local/share/pai/pai.db"

[deployment]
# Which deploy targets are configured.
# "sqlite" is always available; "cloudflare" is optional.
mode = "sqlite" # or "cloudflare"

[deployment.cloudflare]
# Optional metadata for generating wrangler.toml, etc.
worker_name = "personal-activity-index"
d1_binding = "DB"
database_name = "personal_activity_db"

[sources.substack]
enabled = true
base_url = "https://patternmatched.substack.com"

[sources.bluesky]
enabled = true
handle = "desertthunder.dev"

[[sources.leaflet]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled = true
id = "stormlightlabs"
base_url = "https://stormlightlabs.leaflet.pub"
```

**Notes:**

- The CLI should **not** require the Cloudflare section unless a user explicitly wants to generate Worker scaffolding.
- The Worker itself will get its D1 binding and Cron schedule from `wrangler.toml` and the Cloudflare dashboard, not from this config file; you just reuse the same schema and `Item` type.

### 2.5 POSIX compliance checklist

When you implement the CLI parsing, you can sanity-check against POSIX & GNU guidance:

- Short options are single letters with a single leading `-`. ([The Open Group][1])
- Options precede non-option arguments (your commands and operands) in the usage examples. ([The Open Group][1])
- Options that take arguments are formatted as `-x arg` rather than `-xarg` in documentation. ([gnu.org][2])
- You provide `-h` / `-V` and consistent help text. ([Baeldung on Linux][3])
- Long options (`--help`, `--version`, `--config-dir`, etc.) can be supported as extensions but are not required for conformance. ([Software Engineering Stack Exchange][4])

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html "12. Utility Conventions"
[2]: https://www.gnu.org/s/libc/manual/html_node/Argument-Syntax.html "Argument Syntax (The GNU C Library)"
[3]: https://www.baeldung.com/linux/posix "A Guide to POSIX | Baeldung on Linux"
[4]: https://softwareengineering.stackexchange.com/questions/70357/command-line-options-style-posix-or-what "Command line options style - POSIX or what?"