+58
-5
docs/standard-search-planning.md
+58
-5
docs/standard-search-planning.md
···
25
25
26
26
also added loading indicator for "related to" results in frontend.
27
27
28
+
### recent work (2026-01-06)
29
+
30
+
- merged PR1: multi-platform schema (platform + source_collection columns)
31
+
- added `loading.js` - portable loading state handler for dashboards
32
+
- skeleton shimmer while loading
33
+
- "waking up" toast after 2s threshold (fly.io cold start handling)
34
+
- designed to be copied to other projects
35
+
- fixed pluralization ("1 result" vs "2 results")
36
+
28
37
## what we know
29
38
30
39
### standard.site lexicons
···
195
204
- [ ] should we dedupe platform-specific vs standard records?
196
205
- [ ] embeddings: regenerate for all, or use same model?
197
206
207
+
## implementation plan (PRs)
208
+
209
+
breaking work into reviewable chunks:
210
+
211
+
### PR1: database schema for multi-platform ✅ MERGED
212
+
- add `platform TEXT` column to documents (default 'leaflet')
213
+
- add `source_collection TEXT` column (default 'pub.leaflet.document')
214
+
- backfill existing ~3500 records
215
+
- no behavior change, just schema prep
216
+
- https://github.com/zzstoatzz/leaflet-search/pull/1
217
+
218
+
### PR2: generalized content extraction
219
+
- new `extractor.zig` module with platform-agnostic interface
220
+
- `textContent` extraction for standard.site records
221
+
- keep existing block parser for `pub.leaflet.*`
222
+
- platform detection from `content.$type`
223
+
224
+
### PR3: TAP subscriber for site.standard.document
225
+
- subscribe to `site.standard.document` + `site.standard.publication`
226
+
- route to appropriate extractor
227
+
- starts ingesting pckt.blog content
228
+
229
+
### PR4: API platform filter
230
+
- add `?platform=` query param to `/search`
231
+
- include `platform` field in results
232
+
- frontend: show platform badge, optional filter
233
+
234
+
### PR5 (optional, separate track): witness cache
235
+
- `witness_cache` table for raw records
236
+
- replay tooling for backfills
237
+
- independent of above work
238
+
239
+
## operational notes
240
+
241
+
- **cloudflare pages**: `leaflet-search` does NOT auto-deploy from git. manual deploy required:
242
+
```bash
243
+
wrangler pages deploy site --project-name leaflet-search
244
+
```
245
+
- **fly.io backend**: deploy from backend directory:
246
+
```bash
247
+
cd backend && fly deploy
248
+
```
249
+
- **git remotes**: push to both `origin` (tangled.sh) and `github` (for MCP + PRs)
250
+
198
251
## next steps
199
252
200
253
1. ~~verify leaflet's site.standard.document structure~~ (done - they don't have any)
201
254
2. ~~find and examine offprint records~~ (done - no public content yet)
202
-
3. decide on hybrid vs wait approach
203
-
4. consider witness cache architecture (see below)
204
-
5. design database migration
205
-
6. implement generalized tap subscriber
206
-
7. test with multi-platform data
255
+
3. ~~PR1: database schema~~ (merged)
256
+
4. PR2: generalized content extraction
257
+
5. PR3: TAP subscriber
258
+
6. PR4: API platform filter
259
+
7. consider witness cache architecture (see below)
207
260
208
261
---
209
262