<!-- markdownlint-disable MD033 -->

# Personal Activity Index

A CLI that ingests content from Substack, Bluesky, Leaflet, and BearBlog into SQLite, with an optional Cloudflare Worker + D1 deployment path.

## Features

- Fetch posts from multiple sources:
  - **Substack** via RSS feeds
  - **Bluesky** via AT Protocol
  - **Leaflet** publications via RSS feeds
  - **BearBlog** publications via RSS feeds
- Local SQLite storage with full-text search
- Flexible filtering and querying via `pai list` / `pai export`
- Self-hostable HTTP API (`pai serve` exposes `/api/feed`, `/api/item/{id}`, and `/status`)
- Cloudflare Worker deployment path (D1) for serverless setups

## Quick Start

```bash
# Install
cargo install --path cli

# Initialize config (creates ~/.config/pai/config.toml)
pai init

# Edit config with your sources
$EDITOR ~/.config/pai/config.toml

# Sync content
pai sync

# List items
pai list -n 10

# Check database
pai db-check

# Install the manpage so `man pai` works
pai man --install

# Generate manpage to a file
pai man -o pai.1
```

<details>
<summary>For server mode, run the built-in HTTP server against your SQLite database:</summary>

<br>

```bash
pai serve -d /var/lib/pai/pai.db -a 127.0.0.1:8080
```

Endpoints:

- `GET /api/feed` – list newest items (supports `source_kind`, `source_id`, `limit`, `since`, `q`)
- `GET /api/item/{id}` – fetch a single item
- `GET /status` – health/status summary (total items, counts per source)

For reverse-proxy examples (nginx, Caddy, Docker), see [DEPLOYMENT.md](./DEPLOYMENT.md).

</details>

## Configuration

Configuration is loaded from `$XDG_CONFIG_HOME/pai/config.toml` or `$HOME/.config/pai/config.toml`.

See [config.example.toml](./config.example.toml) for a complete example with all available options.
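
A minimal std-only Rust sketch of that lookup order, assuming the XDG Base Directory rule that an empty `XDG_CONFIG_HOME` counts as unset. `resolve_config_path` and `config_path_from_env` are hypothetical names, not pai's API; taking the two variables as arguments keeps the logic testable without touching the process environment.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: `$XDG_CONFIG_HOME/pai/config.toml`, else
/// `$HOME/.config/pai/config.toml`. Returns `None` if neither is available.
fn resolve_config_path(xdg_config_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    // Assumption: an empty XDG_CONFIG_HOME is treated as unset, per the XDG spec.
    let base = match xdg_config_home.filter(|s| !s.is_empty()) {
        Some(xdg) => PathBuf::from(xdg),
        None => PathBuf::from(home?).join(".config"),
    };
    Some(base.join("pai").join("config.toml"))
}

/// Hypothetical convenience wrapper that reads the real environment.
fn config_path_from_env() -> Option<PathBuf> {
    let xdg = env::var("XDG_CONFIG_HOME").ok();
    let home = env::var("HOME").ok();
    resolve_config_path(xdg.as_deref(), home.as_deref())
}
```

With `XDG_CONFIG_HOME=/xdg` this yields `/xdg/pai/config.toml`; with only `HOME=/home/u` it yields `/home/u/.config/pai/config.toml`.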

<details>
<summary>
CORS Configuration
</summary>

Both the HTTP server and the Cloudflare Worker support CORS configuration to allow cross-origin requests from your web applications.

### HTTP Server (config.toml)

Add a `[cors]` section to your config file:

```toml
[cors]
allowed_origins = ["https://desertthunder.dev", "http://localhost:4321"]
dev_key = "your-secret-dev-key"
```

Configuration options:

- **allowed_origins**: List of allowed origins. Supports:
  - Exact match: `http://localhost:4321` only allows that exact origin
  - Same-root-domain: `https://desertthunder.dev` also allows `https://pai.desertthunder.dev`, `https://api.desertthunder.dev`, etc.
- **dev_key**: Optional development key for local testing.
  When set, requests with an `X-Local-Dev-Key` header matching this value are allowed regardless of origin.

### Cloudflare Worker (Environment Variables)

Configure CORS via environment variables in `wrangler.toml`:

```toml
[vars]
CORS_ALLOWED_ORIGINS = "https://desertthunder.dev,http://localhost:4321"
CORS_DEV_KEY = "your-secret-dev-key"
```

- **CORS_ALLOWED_ORIGINS**: Comma-separated list of allowed origins
- **CORS_DEV_KEY**: Optional development key (same behavior as the HTTP server)

#### Local Development with X-Local-Dev-Key

For local development from Astro or other frameworks:

1. Add a `dev_key` to your CORS config:

   ```toml
   [cors]
   allowed_origins = ["http://localhost:4321"]
   dev_key = "local-dev-secret-123"
   ```

2. Include the header in your API requests:

   ```javascript
   fetch('http://localhost:8080/api/feed', {
     headers: {
       'X-Local-Dev-Key': 'local-dev-secret-123'
     }
   })
   ```

   A matching dev key header bypasses the origin check, which is useful when testing from different local ports or other development setups.
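
The rules in this section can be sketched as a single decision function. This is a std-only Rust sketch, not pai's implementation: `CorsConfig`, `from_env_values`, `is_allowed`, and `origin_matches` are hypothetical names, and the subdomain rule (same scheme, request host ending in `.` plus a dotted, port-less allowed host) is an assumed reading of the behavior described under Same-Root-Domain Support.

```rust
/// Hypothetical CORS settings mirroring `[cors]` / the Worker's `CORS_*` vars.
struct CorsConfig {
    allowed_origins: Vec<String>,
    dev_key: Option<String>,
}

impl CorsConfig {
    /// Parse Worker-style values: `CORS_ALLOWED_ORIGINS` is comma-separated,
    /// `CORS_DEV_KEY` is optional (an empty value is treated as unset).
    fn from_env_values(origins: &str, dev_key: Option<&str>) -> Self {
        CorsConfig {
            allowed_origins: origins
                .split(',')
                .map(|s| s.trim().trim_end_matches('/').to_string())
                .filter(|s| !s.is_empty())
                .collect(),
            dev_key: dev_key.filter(|k| !k.is_empty()).map(str::to_string),
        }
    }

    /// Returns true if the request should receive CORS headers.
    fn is_allowed(&self, origin: Option<&str>, dev_key_header: Option<&str>) -> bool {
        // A matching X-Local-Dev-Key header bypasses the origin check entirely.
        if let (Some(expected), Some(given)) = (self.dev_key.as_deref(), dev_key_header) {
            if expected == given {
                return true;
            }
        }
        let Some(origin) = origin else { return false; };
        self.allowed_origins
            .iter()
            .any(|allowed| origin_matches(allowed, origin))
    }
}

/// Exact match, or (assumed rule) same scheme with the request host being a
/// subdomain of a dotted, port-less allowed host.
fn origin_matches(allowed: &str, origin: &str) -> bool {
    if allowed == origin {
        return true;
    }
    let (Some((a_scheme, a_host)), Some((o_scheme, o_host))) =
        (allowed.split_once("://"), origin.split_once("://"))
    else {
        return false;
    };
    // `http://localhost:4321` has a port and no dot, so it stays exact-match only.
    let is_root = a_host.contains('.') && !a_host.contains(':');
    // The leading dot keeps `evildesertthunder.dev` from matching `desertthunder.dev`.
    is_root && a_scheme == o_scheme && o_host.ends_with(&format!(".{}", a_host))
}
```

Comparing against `.` plus the allowed host, rather than a bare suffix, is what stops look-alike hosts from slipping through.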

#### Same-Root-Domain Support

If you configure `allowed_origins = ["https://desertthunder.dev"]`, requests from these origins are handled as follows:

- `https://desertthunder.dev` ✓ (exact match)
- `https://pai.desertthunder.dev` ✓ (subdomain of allowed root)
- `https://api.desertthunder.dev` ✓ (subdomain of allowed root)
- `https://evil.dev` ✗ (different root domain)

This lets you deploy the API at `pai.desertthunder.dev` and call it from your main site at `desertthunder.dev` (or any of its subdomains) without listing every subdomain explicitly.

</details>

## Documentation

- CLI synopsis: `pai -h`, `pai <command> -h`, or `pai man` for the generated `pai(1)` page.
- `pai man --install [--install-dir DIR]` copies `pai.1` into a MANPATH directory (default: `~/.local/share/man/man1`).
- Database schema and config reference: [config.example.toml](./config.example.toml).
- Deployment topologies: [DEPLOYMENT.md](./DEPLOYMENT.md).

## Architecture

The project is organized as a Cargo workspace:

```text
.
├── core      # Shared types, fetchers, and the storage trait
├── cli       # CLI binary (POSIX-compliant)
└── worker    # Cloudflare Worker deployment using workers-rs
```

<details>
<summary><strong>Source Implementations</strong></summary>

### Substack (RSS)

The Substack fetcher uses standard RSS 2.0 feeds available at `{base_url}/feed`.
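
Two values derive from the configured `base_url`: the feed URL, and a domain used as `source_id` (see the key mappings). A std-only sketch, assuming a plain `scheme://host[:port][/path]` base URL; both helper names are hypothetical, and a real implementation would more likely use a URL-parsing crate.

```rust
/// Hypothetical helper: Substack serves RSS at `{base_url}/feed`.
/// A trailing slash on `base_url` is trimmed first to avoid `//feed`.
fn substack_feed_url(base_url: &str) -> String {
    format!("{}/feed", base_url.trim_end_matches('/'))
}

/// Hypothetical helper: take the host of `base_url` (scheme, port, and path
/// dropped) for use as `source_id`.
fn domain_from_base_url(base_url: &str) -> Option<&str> {
    // Strip the scheme if present.
    let rest = base_url.split_once("://").map_or(base_url, |(_, r)| r);
    // Keep everything before the first path, query, or fragment delimiter.
    let host_port = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop an explicit port.
    let host = host_port.split(':').next()?;
    if host.is_empty() { None } else { Some(host) }
}
```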

**Implementation:**

- Fetches RSS feed using `feed-rs` parser
- Maps RSS `<item>` elements to standardized `Item` struct
- Uses GUID as item ID, falls back to link if GUID is missing
- Normalizes `pubDate` to ISO 8601 format

**Key mappings:**

- `id` = RSS GUID or link
- `source_kind` = `substack`
- `source_id` = Domain extracted from `base_url`
- `title` = RSS title
- `summary` = RSS description
- `url` = RSS link
- `content_html` = RSS content (if available)
- `published_at` = RSS pubDate (normalized to ISO 8601)

**Example RSS structure:**

```xml
<item>
  <title>Post Title</title>
  <link>https://example.substack.com/p/post-slug</link>
  <guid>https://example.substack.com/p/post-slug</guid>
  <pubDate>Mon, 01 Jan 2024 12:00:00 +0000</pubDate>
  <description>Post summary or excerpt</description>
</item>
```

### AT Protocol Integration (Bluesky)

#### Overview

Bluesky is built on the AT Protocol (Authenticated Transfer Protocol), a decentralized social networking protocol.

**Key Concepts:**

- **DID (Decentralized Identifier)**: Unique identifier for users (e.g., `did:plc:xyz123`)
- **Handle**: Human-readable identifier (e.g., `user.bsky.social`)
- **AT URI**: Resource identifier (e.g., `at://did:plc:xyz/app.bsky.feed.post/abc123`)
- **Lexicon**: Schema definition language for records and API methods
- **XRPC**: HTTP API wrapper for AT Protocol methods
- **PDS (Personal Data Server)**: Server that stores user data

#### Implementation

Bluesky uses standard `app.bsky.feed.post` records and provides a public API for fetching posts.
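
An AT URI of the form shown under Key Concepts splits into an authority (DID or handle), a collection NSID, and a record key. The sketch below parses one and builds the `https://bsky.app/profile/{handle}/post/{post_id}` URL from the key mappings, treating the record key as `post_id`. `AtUri`, `parse_at_uri`, and `canonical_post_url` are hypothetical helpers; the parser ignores query strings and fragments and does not validate DIDs or NSIDs.

```rust
/// Components of an AT URI such as `at://did:plc:xyz/app.bsky.feed.post/abc123`.
#[derive(Debug, PartialEq)]
struct AtUri<'a> {
    authority: &'a str,  // DID or handle
    collection: &'a str, // NSID, e.g. app.bsky.feed.post
    rkey: &'a str,       // record key
}

/// Hypothetical parser: expects exactly `at://{authority}/{collection}/{rkey}`.
fn parse_at_uri(uri: &str) -> Option<AtUri<'_>> {
    let rest = uri.strip_prefix("at://")?;
    let mut parts = rest.splitn(3, '/');
    let authority = parts.next().filter(|s| !s.is_empty())?;
    let collection = parts.next().filter(|s| !s.is_empty())?;
    // splitn(3) leaves any extra segments in the last part; reject those.
    let rkey = parts.next().filter(|s| !s.is_empty() && !s.contains('/'))?;
    Some(AtUri { authority, collection, rkey })
}

/// Map a post's AT URI to its bsky.app URL; non-post records yield `None`.
fn canonical_post_url(uri: &str, handle: &str) -> Option<String> {
    let at = parse_at_uri(uri)?;
    (at.collection == "app.bsky.feed.post")
        .then(|| format!("https://bsky.app/profile/{}/post/{}", handle, at.rkey))
}
```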

**Endpoint:** `GET https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed`

**Parameters:**

- `actor` - User handle or DID
- `limit` - Number of posts to fetch (default: 50)
- `cursor` - Pagination cursor (optional)

**Implementation:**

- Fetches author feed using `app.bsky.feed.getAuthorFeed`
- Filters out reposts and quotes (only includes original posts)
- Converts AT URIs to canonical Bluesky URLs
- Truncates long post text to create titles

**Key mappings:**

- `id` = AT URI (e.g., `at://did:plc:xyz/app.bsky.feed.post/abc123`)
- `source_kind` = `bluesky`
- `source_id` = User handle
- `title` = Truncated post text (first 100 chars)
- `summary` = Full post text
- `url` = Canonical URL (`https://bsky.app/profile/{handle}/post/{post_id}`)
- `author` = Post author handle
- `published_at` = Post `createdAt` timestamp

**Filtering reposts:**
Feed items with a `reason` field are excluded so that only original content is fetched; `getAuthorFeed` sets `reason` on reposts.

### Leaflet (RSS)

#### Overview

Leaflet publications provide RSS feeds at `{base_url}/rss`, making them straightforward to fetch using standard RSS parsing.

**Note:** Although Leaflet is built on AT Protocol and uses custom `pub.leaflet.post` records, we use RSS feeds for simplicity and reliability. Leaflet's RSS feeds provide all the necessary metadata without requiring AT Protocol PDS queries.
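
Since each publication is configured with `enabled`, `id`, and `base_url` (see Configuration below), fetching reduces to building one `{base_url}/rss` URL per enabled entry. A minimal sketch; `LeafletSource` and `leaflet_feed_urls` are hypothetical names that only mirror the TOML keys.

```rust
/// Hypothetical mirror of one `[[sources.leaflet]]` entry.
struct LeafletSource {
    enabled: bool,
    id: String,
    base_url: String,
}

/// Pair each enabled publication's `id` with its `{base_url}/rss` feed URL,
/// trimming a trailing slash so the URL never contains `//rss`.
fn leaflet_feed_urls(sources: &[LeafletSource]) -> Vec<(&str, String)> {
    sources
        .iter()
        .filter(|s| s.enabled)
        .map(|s| (s.id.as_str(), format!("{}/rss", s.base_url.trim_end_matches('/'))))
        .collect()
}
```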

**Implementation:**

- Fetches RSS feed using `feed-rs` parser
- Maps RSS `<item>` elements to standardized `Item` struct
- Supports multiple publications via config array
- Uses entry ID from feed, falls back to link if missing
- Normalizes publication dates to ISO 8601 format

**Key mappings:**

- `id` = RSS entry ID or link
- `source_kind` = `leaflet`
- `source_id` = Publication ID from config (e.g., `desertthunder`, `stormlightlabs`)
- `title` = RSS entry title
- `summary` = RSS entry summary/description
- `url` = RSS entry link
- `content_html` = RSS content body (if available)
- `author` = RSS entry author
- `published_at` = RSS published date or updated date (normalized to ISO 8601)

**Configuration:**

Leaflet supports multiple publications through array configuration:

```toml
[[sources.leaflet]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled = true
id = "stormlightlabs"
base_url = "https://stormlightlabs.leaflet.pub"
```

**Example RSS structure:**

```xml
<item>
  <title>Dev Log: 2025-11-22</title>
  <link>https://desertthunder.leaflet.pub/3m6a7fuk7u22p</link>
  <guid>https://desertthunder.leaflet.pub/3m6a7fuk7u22p</guid>
  <pubDate>Sat, 22 Nov 2025 16:22:54 +0000</pubDate>
  <description>Post summary or excerpt</description>
</item>
```

### BearBlog (RSS)

#### Overview

BearBlog is a minimalist blogging platform that provides RSS feeds at `{slug}.bearblog.dev/feed/`, making them straightforward to fetch using standard RSS parsing.
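
Leaflet and BearBlog document the same entry-to-`Item` rules: the entry ID with a link fallback, and the published date with an updated-date fallback. A std-only sketch of that mapping; `FeedEntry` is a hypothetical stand-in for a parsed `feed-rs` entry, dates are assumed to be already normalized to ISO 8601, `Item` shows only a subset of the documented fields, and skipping entries with neither an ID nor a link is an assumption.

```rust
/// Hypothetical, simplified view of one parsed feed entry (not feed-rs's types).
struct FeedEntry {
    id: Option<String>,
    link: Option<String>,
    title: Option<String>,
    published: Option<String>, // assumed already normalized to ISO 8601
    updated: Option<String>,
}

/// A subset of the normalized `Item` fields from the key mappings.
#[derive(Debug, PartialEq)]
struct Item {
    id: String,
    source_kind: &'static str,
    source_id: String,
    title: Option<String>,
    url: Option<String>,
    published_at: Option<String>,
}

/// Map an entry to an `Item`, or `None` if it has neither an ID nor a link.
fn entry_to_item(entry: FeedEntry, source_kind: &'static str, source_id: &str) -> Option<Item> {
    let FeedEntry { id, link, title, published, updated } = entry;
    // Use the entry ID, falling back to the link when the ID is missing or empty.
    let id = id.filter(|s| !s.is_empty()).or_else(|| link.clone())?;
    Some(Item {
        id,
        source_kind,
        source_id: source_id.to_string(),
        title,
        url: link,
        // Prefer the published date; fall back to the updated date.
        published_at: published.or(updated),
    })
}
```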

**Implementation:**

- Fetches RSS feed using `feed-rs` parser
- Maps RSS `<item>` elements to standardized `Item` struct
- Supports multiple blogs via config array
- Uses entry ID from feed, falls back to link if missing
- Normalizes publication dates to ISO 8601 format

**Key mappings:**

- `id` = RSS entry ID or link
- `source_kind` = `bearblog`
- `source_id` = Blog ID from config (e.g., `desertthunder`)
- `title` = RSS entry title
- `summary` = RSS entry summary/description
- `url` = RSS entry link
- `content_html` = RSS content body (if available)
- `author` = RSS entry author
- `published_at` = RSS published date or updated date (normalized to ISO 8601)

**Configuration:**

BearBlog supports multiple blogs through array configuration:

```toml
[[sources.bearblog]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.bearblog.dev"

[[sources.bearblog]]
enabled = true
id = "another-blog"
base_url = "https://another-blog.bearblog.dev"
```

**Example RSS structure:**

```xml
<item>
  <title>My Blog Post</title>
  <link>https://desertthunder.bearblog.dev/my-blog-post</link>
  <guid>https://desertthunder.bearblog.dev/my-blog-post</guid>
  <pubDate>Sat, 22 Nov 2025 16:22:54 +0000</pubDate>
  <description>Post summary or excerpt</description>
</item>
```

</details>

## References

- [AT Protocol Documentation](https://atproto.com)
- [Lexicon Guide](https://atproto.com/guides/lexicon) - Schema definition language
- [XRPC Specification](https://atproto.com/specs/xrpc) - HTTP API wrapper
- [Bluesky API Documentation](https://docs.bsky.app/)
- [Leaflet](https://tangled.org/leaflet.pub/leaflet) - Leaflet source code
- [Leaflet Manual](https://about.leaflet.pub/) - User-facing documentation

## License

See [LICENSE](./LICENSE).