# leaflet-search

search for leaflet.

live: leaflet-search.pages.dev
## how it works

- tap syncs leaflet content from the network
- backend indexes content into SQLite FTS5 via Turso and serves the search API
- static frontend on Cloudflare Pages
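the FTS5 indexing step can be sketched in miniature with Python's stdlib sqlite3 module (the table name, columns, and sample rows here are illustrative, not the real schema):

```python
import sqlite3

# in-memory stand-in for the Turso database; schema is illustrative
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, content)")
conn.executemany(
    "INSERT INTO docs (title, content) VALUES (?, ?)",
    [
        ("hello leaflet", "a first post about publishing on atproto"),
        ("search notes", "full-text search with sqlite fts5"),
    ],
)

# MATCH runs a full-text query against the index
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ?", ("fts5",)
).fetchall()
print(rows)  # [('search notes',)]
```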
## MCP server

search is also exposed as an MCP server for AI agents like Claude Code:

```
claude mcp add-json leaflet '{"type": "http", "url": "https://leaflet-search-by-zzstoatzz.fastmcp.app/mcp"}'
```

see mcp/README.md for local setup and usage details.
## api

```
GET /search?q=<query>&tag=<tag>   # full-text search with query, tag, or both
GET /similar?uri=<at-uri>         # find similar documents via vector embeddings
GET /tags                         # list all tags with counts
GET /popular                      # popular search queries
GET /stats                        # document/publication counts
GET /health                       # health check
```
search returns three entity types: article (a document in a publication), looseleaf (a standalone document), and publication (the newsletter itself). tag filtering applies to documents only.
/similar uses Voyage AI embeddings with brute-force cosine similarity (~0.15s for 3500 docs).
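brute-force similarity is just "score every stored vector against the query, then sort" — a pure-Python sketch (vector values and URIs are made up; the real vectors are 512-dimensional voyage-3-lite embeddings):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def most_similar(query: list[float], docs: dict[str, list[float]], k: int = 3):
    # score every stored vector against the query and sort: O(n * d)
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return scored[:k]


vecs = {
    "at://a": [1.0, 0.0, 0.0],
    "at://b": [0.9, 0.1, 0.0],
    "at://c": [0.0, 1.0, 0.0],
}
print([uri for uri, _ in most_similar([1.0, 0.0, 0.0], vecs, k=2)])
# ['at://a', 'at://b']
```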
## stack

- Fly.io hosts the backend + tap
- Turso: cloud SQLite with vector support
- Voyage AI embeddings (voyage-3-lite)
- tap syncs leaflet content from the ATProto firehose
- Zig backend: HTTP server, search API, content indexing
- Cloudflare Pages static frontend
## embeddings

documents are embedded using Voyage AI's voyage-3-lite model (512 dimensions). new documents from the firehose don't automatically get embeddings; they need to be backfilled periodically.
### backfill embeddings

requires TURSO_URL, TURSO_TOKEN, and VOYAGE_API_KEY in .env:

```
# check how many docs need embeddings
./scripts/backfill-embeddings --dry-run

# run the backfill (uses batching + concurrency)
./scripts/backfill-embeddings --batch-size 50
```
the script:

- fetches docs where `embedding IS NULL`
- batches them to the Voyage API (50 docs/batch by default)
- writes embeddings to Turso in batched transactions
- runs 8 concurrent workers
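the batching + worker pattern above can be sketched like this (the `embed_batch` function is a stub; the real script calls the Voyage API and writes results to Turso):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 50  # docs per Voyage API request
WORKERS = 8      # concurrent workers


def chunk(items, size):
    # split the work queue into fixed-size batches
    for i in range(0, len(items), size):
        yield items[i : i + size]


def embed_batch(batch):
    # stub standing in for one Voyage API call + one batched Turso write
    return [(doc_id, [0.0] * 512) for doc_id in batch]


doc_ids = [f"doc-{i}" for i in range(120)]  # docs where embedding IS NULL

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(embed_batch, chunk(doc_ids, BATCH_SIZE)))

embedded = [row for batch in results for row in batch]
print(len(embedded))  # 120
```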
note: we use brute-force cosine similarity instead of a vector index. Turso's DiskANN index has ~60s write latency per row, making it impractical for incremental updates. brute-force on 3500 vectors runs in ~0.15s which is fine for this scale.