commit aef8e7f6c89ae621b5b3ecd7db227c4390dfac67 · zzstoatzz.io/leaflet-search

+58 -5

docs/standard-search-planning.md

··· 25 25 26 26 also added loading indicator for "related to" results in frontend. 27 27 28 + ### recent work (2026-01-06) 29 + 30 + - merged PR1: multi-platform schema (platform + source_collection columns) 31 + - added `loading.js` - portable loading state handler for dashboards 32 + - skeleton shimmer while loading 33 + - "waking up" toast after 2s threshold (fly.io cold start handling) 34 + - designed to be copied to other projects 35 + - fixed pluralization ("1 result" vs "2 results") 36 + 28 37 ## what we know 29 38 30 39 ### standard.site lexicons ··· 195 204 - [ ] should we dedupe platform-specific vs standard records? 196 205 - [ ] embeddings: regenerate for all, or use same model? 197 206 207 + ## implementation plan (PRs) 208 + 209 + breaking work into reviewable chunks: 210 + 211 + ### PR1: database schema for multi-platform ✅ MERGED 212 + - add `platform TEXT` column to documents (default 'leaflet') 213 + - add `source_collection TEXT` column (default 'pub.leaflet.document') 214 + - backfill existing ~3500 records 215 + - no behavior change, just schema prep 216 + - https://github.com/zzstoatzz/leaflet-search/pull/1 217 + 218 + ### PR2: generalized content extraction 219 + - new `extractor.zig` module with platform-agnostic interface 220 + - `textContent` extraction for standard.site records 221 + - keep existing block parser for `pub.leaflet.*` 222 + - platform detection from `content.$type` 223 + 224 + ### PR3: TAP subscriber for site.standard.document 225 + - subscribe to `site.standard.document` + `site.standard.publication` 226 + - route to appropriate extractor 227 + - starts ingesting pckt.blog content 228 + 229 + ### PR4: API platform filter 230 + - add `?platform=` query param to `/search` 231 + - include `platform` field in results 232 + - frontend: show platform badge, optional filter 233 + 234 + ### PR5 (optional, separate track): witness cache 235 + - `witness_cache` table for raw records 236 + - replay tooling for backfills 237 + - independent of above work 238 + 239 + ## operational notes 240 + 241 + - **cloudflare pages**: `leaflet-search` does NOT auto-deploy from git. manual deploy required: 242 + ```bash 243 + wrangler pages deploy site --project-name leaflet-search 244 + ``` 245 + - **fly.io backend**: deploy from backend directory: 246 + ```bash 247 + cd backend && fly deploy 248 + ``` 249 + - **git remotes**: push to both `origin` (tangled.sh) and `github` (for MCP + PRs) 250 + 198 251 ## next steps 199 252 200 253 1. ~~verify leaflet's site.standard.document structure~~ (done - they don't have any) 201 254 2. ~~find and examine offprint records~~ (done - no public content yet) 202 - 3. decide on hybrid vs wait approach 203 - 4. consider witness cache architecture (see below) 204 - 5. design database migration 205 - 6. implement generalized tap subscriber 206 - 7. test with multi-platform data 255 + 3. ~~PR1: database schema~~ (merged) 256 + 4. PR2: generalized content extraction 257 + 5. PR3: TAP subscriber 258 + 6. PR4: API platform filter 259 + 7. consider witness cache architecture (see below) 207 260 208 261 --- 209 262