leaflet-search#

by @zzstoatzz.io

full-text and similarity search for leaflet content.

live: leaflet-search.pages.dev

how it works#

  1. tap syncs leaflet content from the network
  2. backend indexes content into SQLite FTS5 via Turso and serves the search API
  3. site serves the static frontend from Cloudflare Pages
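the indexing step above can be sketched with SQLite's FTS5 extension. this is a minimal illustration only: the table and column names are hypothetical, and the real backend is Zig talking to Turso rather than Python talking to a local database.

```python
import sqlite3

# in-memory DB for illustration; the real index lives in Turso (cloud SQLite)
db = sqlite3.connect(":memory:")

# hypothetical schema: an FTS5 virtual table over document title + body;
# the uri column is stored but excluded from full-text indexing
db.execute("CREATE VIRTUAL TABLE docs USING fts5(uri UNINDEXED, title, body)")
db.executemany(
    "INSERT INTO docs (uri, title, body) VALUES (?, ?, ?)",
    [
        ("at://did:example/1", "hello leaflet", "a first post about leaflet"),
        ("at://did:example/2", "zig notes", "building an http server in zig"),
    ],
)

# full-text query; ORDER BY rank sorts best bm25 matches first
rows = db.execute(
    "SELECT uri, title FROM docs WHERE docs MATCH ? ORDER BY rank", ("leaflet",)
).fetchall()
print(rows)
```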

MCP server#

search is also exposed as an MCP server for AI agents like Claude Code:

claude mcp add-json leaflet '{"type": "http", "url": "https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp"}'

see mcp/README.md for local setup and usage details.

api#

GET /search?q=<query>&tag=<tag>  # full-text search with query, tag, or both
GET /similar?uri=<at-uri>        # find similar documents via vector embeddings
GET /tags                        # list all tags with counts
GET /popular                     # popular search queries
GET /stats                       # document/publication counts
GET /health                      # health check

search returns three entity types: article (a document in a publication), looseleaf (a standalone document), and publication (the newsletter itself). tag filtering applies to documents only.
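a minimal client sketch for building /search requests with query, tag, or both, using the parameter names from the list above. the base URL here is a placeholder, not the real backend address.

```python
from urllib.parse import urlencode

def search_url(base, q=None, tag=None):
    """Build a /search URL with optional q and/or tag parameters."""
    params = {k: v for k, v in {"q": q, "tag": tag}.items() if v is not None}
    return f"{base}/search?{urlencode(params)}"

# query only, or query + tag, as the API allows
print(search_url("https://example-backend", q="zig"))
print(search_url("https://example-backend", q="zig", tag="atproto"))
```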

/similar uses Voyage AI embeddings with brute-force cosine similarity (~0.15s for 3500 docs).

stack#

  • Fly.io hosts backend + tap
  • Turso cloud SQLite with vector support
  • Voyage AI embeddings (voyage-3-lite)
  • Tap syncs leaflet content from ATProto firehose
  • Zig HTTP server, search API, content indexing
  • Cloudflare Pages static frontend

embeddings#

documents are embedded using Voyage AI's voyage-3-lite model (512 dimensions). new documents from the firehose don't automatically get embeddings; they need to be backfilled periodically.

backfill embeddings#

requires TURSO_URL, TURSO_TOKEN, and VOYAGE_API_KEY in .env:

# check how many docs need embeddings
./scripts/backfill-embeddings --dry-run

# run the backfill (uses batching + concurrency)
./scripts/backfill-embeddings --batch-size 50

the script:

  • fetches docs where embedding IS NULL
  • batches them to Voyage API (50 docs/batch default)
  • writes embeddings to Turso in batched transactions
  • runs 8 concurrent workers
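the steps above can be sketched as follows. this is a simplified Python sketch, not the script itself: `embed_batch` stands in for the Voyage API call, and the write step stands in for a batched Turso transaction.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 50   # docs per embedding request (the script's default)
WORKERS = 8       # concurrent workers, matching the script

def embed_batch(texts):
    # stand-in for a Voyage API call returning one 512-dim vector per text
    return [[0.0] * 512 for _ in texts]

def backfill(docs):
    """Embed docs BATCH_SIZE at a time across WORKERS threads."""
    batches = [docs[i:i + BATCH_SIZE] for i in range(0, len(docs), BATCH_SIZE)]
    written = 0
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for vectors in pool.map(embed_batch, batches):
            # stand-in for writing embeddings to Turso in a batched transaction
            written += len(vectors)
    return written

# e.g. 120 docs where embedding IS NULL -> 3 batches (50 + 50 + 20)
count = backfill([f"doc {i}" for i in range(120)])
print(count)
```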

note: we use brute-force cosine similarity instead of a vector index. Turso's DiskANN index has ~60s write latency per row, making it impractical for incremental updates. brute-force on 3500 vectors runs in ~0.15s which is fine for this scale.
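brute-force cosine similarity over the stored vectors is a few lines of linear algebra. a NumPy sketch with random stand-in data at the same scale (~3500 vectors, 512 dimensions to match voyage-3-lite):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(3500, 512))   # stand-in for the stored embeddings
query = rng.normal(size=512)          # stand-in for the query document's embedding

# cosine similarity = dot product of L2-normalized vectors
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
scores = docs_n @ query_n

# indices of the 10 most similar documents, best first
top = np.argsort(scores)[::-1][:10]
print(top)
```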