find-bufo#
hybrid semantic + keyword search for the bufo zone
live at: find-bufo.com
overview#
a one-page application for searching through all the bufos from bufo.zone using hybrid search that combines:
- semantic search via multimodal embeddings (understands meaning and visual content)
- keyword search via BM25 full-text search (finds exact filename matches)
architecture#
- backend: rust (actix-web)
- frontend: vanilla html/css/js
- embeddings: voyage ai voyage-multimodal-3
- vector store: turbopuffer
- deployment: fly.io
setup#
- install dependencies:
  - rust toolchain
  - python 3.11+ with uv
- copy environment variables:
  cp .env.example .env
- set your api keys in .env:
  - VOYAGE_API_TOKEN - for generating embeddings
  - TURBOPUFFER_API_KEY - for vector storage
ingestion#
to populate the vector store with bufos:
just re-index
this will:
- scrape all bufos from bufo.zone
- download them to data/bufos/
- generate embeddings for each image with input_type="document"
- upload to turbopuffer
development#
run the server locally:
cargo run
the app will be available at http://localhost:8080
deployment#
deploy to fly.io:
fly launch # first time
fly secrets set VOYAGE_API_TOKEN=your_token
fly secrets set TURBOPUFFER_API_KEY=your_key
just deploy
usage#
- open the app
- enter a search query describing the bufo you want
- see the top matching bufos with hybrid similarity scores
- click any bufo to open it in a new tab
api parameters#
the search API supports these parameters:
- query: search text (required)
- top_k: number of results (default: 10)
- alpha: fusion weight (default: 0.7)
  - 1.0 = pure semantic (best for conceptual queries like "happy", "apocalyptic")
  - 0.7 = default (balances semantic understanding with exact matches)
  - 0.5 = balanced (equal weight to both signals)
  - 0.0 = pure keyword (best for exact filename searches)
example: /api/search?query=jumping&top_k=5&alpha=0.5
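a minimal python sketch of building that request url from the parameters above. the base url assumes the local development server on port 8080, and `build_search_url` is a hypothetical helper, not part of the project. the 0.0 to 1.0 range check on alpha is inferred from the values listed above rather than from documented server-side validation.

```python
from urllib.parse import urlencode

# assumption: the app is running locally, as described under "development"
BASE_URL = "http://localhost:8080"


def build_search_url(query: str, top_k: int = 10, alpha: float = 0.7) -> str:
    """Build a /api/search url using the documented parameters and defaults."""
    if not query:
        raise ValueError("query is required")
    # assumption: alpha is meant to stay between pure keyword (0.0)
    # and pure semantic (1.0)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    params = urlencode({"query": query, "top_k": top_k, "alpha": alpha})
    return f"{BASE_URL}/api/search?{params}"


print(build_search_url("jumping", top_k=5, alpha=0.5))
```

multi-word queries are url-encoded by `urlencode`, so "happy bufo" becomes `query=happy+bufo`.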
how it works#
ingestion#
all bufo images are processed through early fusion multimodal embeddings:
- filename text extracted (e.g., "bufo-jumping-on-bed" → "bufo jumping on bed")
- combined with image content in single embedding request
- voyage-multimodal-3 creates 1024-dim vectors capturing both text and visual features
- uploaded to turbopuffer with BM25-enabled name field for keyword search
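the filename step above can be sketched in python as follows. `filename_to_text` is a hypothetical name, and the handling of file extensions and repeated hyphens is an assumption; the actual ingestion script may differ.

```python
from pathlib import Path


def filename_to_text(filename: str) -> str:
    """Turn a bufo filename into plain text for the embedding request."""
    # drop the extension (if any), e.g. "bufo-jumping-on-bed.png" -> "bufo-jumping-on-bed"
    stem = Path(filename).stem
    # split on hyphens and skip empty parts so "a--b" does not produce double spaces
    return " ".join(part for part in stem.split("-") if part)


print(filename_to_text("bufo-jumping-on-bed.png"))
```

the resulting text would be sent together with the image in a single embedding request, which is what makes the fusion "early".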
search#
- semantic branch: query embedded using voyage-multimodal-3 with input_type="query"
- keyword branch: BM25 full-text search against bufo names
- fusion: weighted combination using alpha parameter
  - score = α * semantic + (1-α) * keyword
  - both scores normalized to 0-1 range before fusion
- ranking: results sorted by fused score, top_k returned
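a minimal python sketch of the fusion and ranking steps, under stated assumptions: min-max scaling is used for the 0-1 normalization (the normalization method is not specified here), a bufo missing from one branch scores 0 for that branch, and the names `min_max_normalize` and `fuse` are hypothetical. the real backend is written in rust and may differ in all of these details.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into the 0-1 range (assumed normalization method)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # assumption: identical scores all map to 1.0 (or 0.0 if they are zero)
        return {name: (1.0 if hi > 0 else 0.0) for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def fuse(
    semantic: dict[str, float],
    keyword: dict[str, float],
    alpha: float = 0.7,
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """score = alpha * semantic + (1 - alpha) * keyword, sorted descending."""
    sem = min_max_normalize(semantic)
    kw = min_max_normalize(keyword)
    fused = {
        name: alpha * sem.get(name, 0.0) + (1 - alpha) * kw.get(name, 0.0)
        for name in set(sem) | set(kw)
    }
    # sort by fused score (highest first), then by name for stable ordering
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# made-up scores for illustration only
semantic_scores = {"bufo-is-happy": 0.62, "bufo-excited": 0.70, "bufo-sad": 0.30}
keyword_scores = {"bufo-is-happy": 4.2}

for name, score in fuse(semantic_scores, keyword_scores, alpha=0.7):
    print(f"{name}: {score:.2f}")
```

with these illustrative scores, the default alpha ranks "bufo-is-happy" first because it scores on both branches, while alpha=1.0 would put "bufo-excited" on top.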
why hybrid?#
- semantic alone: misses exact filename matches (e.g., "happy" might not find "bufo-is-happy")
- keyword alone: no semantic understanding (e.g., "happy" won't find "excited" or "smiling")
- hybrid: combines both signals, so exact filename matches and semantically related bufos both surface