motet#
A mote of data, composed.
A personal search indexer that crawls the corners of the web you care about and indexes them into a local full-text search engine. The opposite of Google — small, focused, private, and yours.
                  ┌──────────────┐
                  │  motet CLI   │
                  │  & Web UI    │
                  └──────┬───────┘
                         │
              ┌──────────┴──────────┐
              │     motet_core      │
              ├──────────┬──────────┤
              │ Crawlers │  Query   │
              │ (blog,   │  Engine  │
              │  yelp,   │  (BM25)  │
              │  reddit, │          │
              │  crates) │          │
              ├──────────┴──────────┤
              │ Tantivy  │  SQLite  │
              │ (index)  │  (meta)  │
              └──────────┴──────────┘
                    │         │
         ┌──────────┘         └──────────┐
         ▼                               ▼
~/.local/share/motet/index/   ~/.local/share/motet/motet.db
Features#
- Focused — you choose what gets indexed, no SEO spam or ads
- Fast — sub-millisecond queries over a compact local index
- Private — runs locally, no tracking, no cloud dependency
- Dual interface — CLI for the terminal, React web UI in the browser
- Single binary — the web frontend is embedded in the Rust binary
- Extensible — add new sources by implementing the crawler trait
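As a rough illustration of the extensibility point, a crawler trait could look something like the sketch below. The names `Crawler`, `Document`, `CrawlError`, `StaticSource`, and `crawl_all` are assumptions made for this example and are not taken from `motet_core`; the real trait may have a different shape.

```rust
// Hypothetical shapes: motet_core's real trait and document type may differ.

/// A crawled page, reduced to the fields a search index needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub url: String,
    pub title: String,
    pub body: String,
}

/// Assumed error type, for illustration only.
#[derive(Debug)]
pub struct CrawlError(pub String);

/// One implementation per source kind (blog, yelp, reddit, ...).
pub trait Crawler {
    /// Label stored with each document, e.g. "blog".
    fn kind(&self) -> &str;
    /// Fetch the source and return the documents found.
    fn crawl(&self) -> Result<Vec<Document>, CrawlError>;
}

/// A toy source that returns fixed documents instead of touching the network.
pub struct StaticSource {
    pub docs: Vec<Document>,
}

impl Crawler for StaticSource {
    fn kind(&self) -> &str {
        "static"
    }

    fn crawl(&self) -> Result<Vec<Document>, CrawlError> {
        Ok(self.docs.clone())
    }
}

/// Run every crawler and collect what each one produced, skipping failures.
pub fn crawl_all(crawlers: &[Box<dyn Crawler>]) -> Vec<Document> {
    crawlers
        .iter()
        .filter_map(|c| c.crawl().ok())
        .flatten()
        .collect()
}
```

Under this sketch, adding a source means writing one more `impl Crawler` and putting it in the list passed to `crawl_all`; nothing else needs to know which kind of source produced a document.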
Quick Start#
# Initialize default config (This Week in Rust + Scout Magazine)
motet init
# Crawl all configured sources
motet crawl
# Search your index
motet search "rust async"
# Start the web UI
motet serve
# → http://127.0.0.1:3838
Installation#
From Source#
cargo install --path motet_cli
Nix#
nix build # or: nix develop
Configuration#
Sources are defined in ~/.config/motet/sources.json. Run motet init to
generate a default config, then edit it to add your own sources.
{
"sources": {
"this_week_in_rust": {
"kind": "blog",
"url": "https://this-week-in-rust.org/",
"crawl_interval": "7d",
"selector": "article",
"max_pages": 50,
"source_kind_label": "blog"
},
"scout_magazine": {
"kind": "blog",
"url": "https://scoutmagazine.ca/category/food-drink/",
"crawl_interval": "3d",
"selector": "article",
"max_pages": 50,
"source_kind_label": "restaurant"
}
}
}
Source Kinds#
| Kind | Description | Status |
|---|---|---|
| blog | Generic HTML blog scraper | Implemented |
| yelp | Yelp Fusion API | Planned |
| reddit | Reddit posts via API | Planned |
| crates_io | crates.io package metadata | Planned |
Configuration Fields#
| Field | Required | Description |
|---|---|---|
| kind | yes | Crawler type (see table above) |
| url | blog | Base URL to crawl |
| crawl_interval | no | Re-crawl frequency (30m, 12h, 1d, 7d) |
| selector | no | CSS selector for article elements |
| max_pages | no | Maximum pages to crawl per run |
| source_kind_label | no | Facet label for filtering (blog, restaurant) |
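The `crawl_interval` values shown above (30m, 12h, 1d, 7d) follow a number-plus-unit pattern. A minimal sketch of parsing that pattern into a `Duration` is below; `parse_interval` is a hypothetical helper, and the set of accepted units (m, h, d) is inferred from the examples only, so motet itself may accept more.

```rust
use std::time::Duration;

/// Parse strings such as "30m", "12h" or "7d" into a Duration.
/// Returns None for anything that is not digits followed by m, h or d.
pub fn parse_interval(s: &str) -> Option<Duration> {
    let s = s.trim();
    // Split the trailing unit character from the leading number.
    let unit = s.chars().last()?;
    let digits = &s[..s.len() - unit.len_utf8()];
    let n: u64 = digits.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    // checked_mul guards against overflow on absurdly large inputs.
    n.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

For instance, `parse_interval("7d")` yields a duration of 604800 seconds, while an unrecognised value such as `"soon"` yields `None` so the caller can report a config error.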
CLI Reference#
motet init                        Write default config
motet crawl                       Crawl all sources
motet crawl --source <name>       Crawl one source
motet crawl --dry-run             Show what would be crawled
motet search <query>              Search the index
motet search <query> --limit 20   Limit result count
motet serve                       Start web UI on :3838
motet serve --port 8080           Custom port
motet stats                       Show index statistics
Crates#
| Crate | Description |
|---|---|
| motet_core | Library — crawlers, index, query, config |
| motet_cli | Binary — CLI commands and web server |
How It Works#
Crawling#
Each source has a crawler that fetches pages, extracts text content, and produces structured documents. The generic blog crawler:
- Fetches the index page at the configured URL
- Extracts article links using the CSS selector
- Fetches each article and extracts title + body text
- Produces a snippet (first ~300 chars) for search result display
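The last step, producing a snippet, is easy to get subtly wrong with multi-byte text. The sketch below shows one way it might be done; `make_snippet` is a hypothetical name, and the whitespace collapsing, word-boundary trimming, and trailing ellipsis are assumptions rather than documented motet behaviour.

```rust
/// Build a display snippet: collapse whitespace, then keep at most
/// `max_chars` characters, trimmed back to the last whole word.
pub fn make_snippet(body: &str, max_chars: usize) -> String {
    let collapsed = body.split_whitespace().collect::<Vec<_>>().join(" ");
    if collapsed.chars().count() <= max_chars {
        return collapsed;
    }
    // Count characters, not bytes, so a multi-byte UTF-8 sequence is never split.
    let mut cut: String = collapsed.chars().take(max_chars).collect();
    // Prefer ending on a word boundary when one is available.
    if let Some(idx) = cut.rfind(' ') {
        cut.truncate(idx);
    }
    cut.push('…');
    cut
}
```

Calling `make_snippet(&body, 300)` would match the roughly 300-character limit described above; short bodies are returned whole, without an ellipsis.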
Indexing#
Documents are stored in two places:
| Store | Contents | Purpose |
|---|---|---|
| Tantivy | URL, title, body, facets, tags, date | Full-text search (BM25) |
| SQLite | Crawl timestamps, ETags, structured metadata | Freshness, dedup, domain-specific data |
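To make the freshness and dedup role of the SQLite metadata concrete, here is a small sketch of the two checks it could support. `CrawlState`, `is_due`, and `is_unchanged` are hypothetical names; the actual schema and logic in motet are not shown in this README.

```rust
use std::time::{Duration, SystemTime};

/// Per-source crawl state, as it might be stored in one SQLite row.
pub struct CrawlState {
    pub last_crawled: Option<SystemTime>,
    pub etag: Option<String>,
}

/// A source is due when it has never been crawled or its interval has elapsed.
pub fn is_due(state: &CrawlState, interval: Duration, now: SystemTime) -> bool {
    match state.last_crawled {
        None => true,
        Some(last) => match now.duration_since(last) {
            Ok(elapsed) => elapsed >= interval,
            // The clock moved backwards: treat the source as not yet due.
            Err(_) => false,
        },
    }
}

/// Content counts as unchanged only when both ETags exist and match.
pub fn is_unchanged(state: &CrawlState, response_etag: Option<&str>) -> bool {
    match (state.etag.as_deref(), response_etag) {
        (Some(old), Some(new)) => old == new,
        _ => false,
    }
}
```

A crawl loop could skip sources where `is_due` is false, and skip re-indexing pages where `is_unchanged` is true, which keeps repeat crawls cheap.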
Searching#
Queries run against the Tantivy index using BM25 scoring across the title
and body fields. Results include the source kind and name as facets for
filtering.
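For readers unfamiliar with BM25, the sketch below computes the score one query term contributes to one document, using the widely used Lucene-style formulation. The constants `K1 = 1.2` and `B = 0.75` are the conventional defaults and are assumed here; Tantivy does the real scoring, and its implementation may differ in detail.

```rust
/// Assumed parameters: the conventional BM25 defaults.
pub const K1: f64 = 1.2;
pub const B: f64 = 0.75;

/// Inverse document frequency: terms found in fewer documents weigh more.
pub fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document field.
/// `tf` is the term's frequency in the field; the length ratio penalises
/// fields that are longer than average.
pub fn bm25_term(tf: f64, doc_len: f64, avg_doc_len: f64, idf: f64) -> f64 {
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}
```

A document's score for a multi-term query is the sum of `bm25_term` over the query terms. Term frequency saturates rather than growing linearly, so repeating a word many times yields diminishing returns.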
Storage Layout#
~/.config/motet/
└── sources.json # Source configuration
~/.local/share/motet/
├── index/ # Tantivy full-text index
└── motet.db # SQLite metadata
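The layout above matches the XDG Base Directory defaults. The sketch below shows how those paths could be resolved; whether motet honours `XDG_CONFIG_HOME` and `XDG_DATA_HOME` overrides is an assumption, and `Paths`, `base_dir`, and `motet_paths` are hypothetical names. Environment values are passed in as arguments so the logic stays easy to test.

```rust
use std::path::PathBuf;

/// Resolve a base directory: an absolute XDG override wins,
/// otherwise fall back to `<home>/<fallback>`.
fn base_dir(xdg: Option<&str>, home: &str, fallback: &str) -> PathBuf {
    match xdg {
        // The XDG spec says relative paths must be ignored.
        Some(dir) if dir.starts_with('/') => PathBuf::from(dir),
        _ => PathBuf::from(home).join(fallback),
    }
}

/// The three locations listed in the storage layout.
pub struct Paths {
    pub sources_json: PathBuf,
    pub index_dir: PathBuf,
    pub database: PathBuf,
}

pub fn motet_paths(home: &str, xdg_config: Option<&str>, xdg_data: Option<&str>) -> Paths {
    let config = base_dir(xdg_config, home, ".config").join("motet");
    let data = base_dir(xdg_data, home, ".local/share").join("motet");
    Paths {
        sources_json: config.join("sources.json"),
        index_dir: data.join("index"),
        database: data.join("motet.db"),
    }
}
```

In a real program the arguments would come from `std::env::var("HOME")` and the two XDG variables; with neither override set, the result is exactly the layout shown above.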
Development#
nix develop # Enter dev shell (Rust 1.90 + Node 22)
cargo build # Build
cargo test # Test
cargo clippy # Lint
Building the Web UI#
cd motet_web
npm install
npm run build # Output to dist/, embedded in binary
License#
Apache-2.0 OR MIT