motet

A mote of data, composed.

A personal search indexer that crawls the corners of the web you care about and indexes them into a local full-text search engine. The opposite of Google — small, focused, private, and yours.

                  ┌──────────────┐
                  │  motet CLI   │
                  │  & Web UI    │
                  └──────┬───────┘
                         │
              ┌──────────┴──────────┐
              │     motet_core      │
              ├──────────┬──────────┤
              │ Crawlers │  Query   │
              │ (blog,   │  Engine  │
              │  yelp,   │ (BM25)   │
              │  reddit, │          │
              │  crates) │          │
              ├──────────┴──────────┤
              │  Tantivy │  SQLite  │
              │  (index) │  (meta)  │
              └──────────┴──────────┘
                     │          │
          ┌──────────┘          └──────────┐
          ▼                                ▼
  ~/.local/share/motet/index/   ~/.local/share/motet/motet.db

Features

  • Focused — you choose what gets indexed, no SEO spam or ads
  • Fast — sub-millisecond queries over a compact local index
  • Private — runs locally, no tracking, no cloud dependency
  • Dual interface — CLI for the terminal, React web UI in the browser
  • Single binary — the web frontend is embedded in the Rust binary
  • Extensible — add new sources by implementing the crawler trait
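
To make the extensibility point concrete, here is a minimal sketch of what a crawler trait could look like. The README does not show the real trait in motet_core, so `Crawler`, `Document`, `CrawlError`, `StaticCrawler`, and `crawl_all` are hypothetical names chosen for illustration only.

```rust
/// Hypothetical document shape; the real motet_core type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    url: String,
    title: String,
    body: String,
}

/// Hypothetical error type carrying a human-readable message.
#[derive(Debug)]
struct CrawlError(String);

/// Hypothetical crawler trait: one implementation per source kind.
trait Crawler {
    /// Name of the source, e.g. the key used in sources.json.
    fn name(&self) -> &str;
    /// Fetch the source and return the documents found.
    fn crawl(&self) -> Result<Vec<Document>, CrawlError>;
}

/// Toy source that returns fixed documents without touching the network.
struct StaticCrawler {
    name: String,
    docs: Vec<Document>,
}

impl Crawler for StaticCrawler {
    fn name(&self) -> &str {
        &self.name
    }

    fn crawl(&self) -> Result<Vec<Document>, CrawlError> {
        Ok(self.docs.clone())
    }
}

/// Run every crawler, collecting documents and logging failures to stderr.
fn crawl_all(crawlers: &[Box<dyn Crawler>]) -> Vec<Document> {
    let mut out = Vec::new();
    for crawler in crawlers {
        match crawler.crawl() {
            Ok(docs) => out.extend(docs),
            Err(err) => eprintln!("{}: {}", crawler.name(), err.0),
        }
    }
    out
}
```

Using trait objects (`Box<dyn Crawler>`) lets a single loop drive sources of different kinds, which is one plausible way to support new crawlers without changing the code that calls them.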

Quick Start

# Initialize default config (This Week in Rust + Scout Magazine)
motet init

# Crawl all configured sources
motet crawl

# Search your index
motet search "rust async"

# Start the web UI
motet serve
# → http://127.0.0.1:3838

Installation

From Source

cargo install --path motet_cli

Nix

nix build   # or: nix develop

Configuration

Sources are defined in ~/.config/motet/sources.json. Run motet init to generate a default config, then edit it to add your own sources.

{
  "sources": {
    "this_week_in_rust": {
      "kind": "blog",
      "url": "https://this-week-in-rust.org/",
      "crawl_interval": "7d",
      "selector": "article",
      "max_pages": 50,
      "source_kind_label": "blog"
    },
    "scout_magazine": {
      "kind": "blog",
      "url": "https://scoutmagazine.ca/category/food-drink/",
      "crawl_interval": "3d",
      "selector": "article",
      "max_pages": 50,
      "source_kind_label": "restaurant"
    }
  }
}

Source Kinds

Kind        Description                  Status
blog        Generic HTML blog scraper    Implemented
yelp        Yelp Fusion API              Planned
reddit      Reddit posts via API         Planned
crates_io   crates.io package metadata   Planned

Configuration Fields

Field               Required   Description
kind                yes        Crawler type (see Source Kinds above)
url                 for blog   Base URL to crawl
crawl_interval      no         Re-crawl frequency (30m, 12h, 1d, 7d)
selector            no         CSS selector for article elements
max_pages           no         Maximum pages to crawl per run
source_kind_label   no         Facet label for filtering (blog, restaurant)
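
The crawl_interval values above follow a number-plus-unit pattern. As an illustration only, a parser for that pattern might look like the sketch below. The function name `parse_interval` is hypothetical, and the set of accepted units (m, h, d) is an assumption inferred from the listed examples, not taken from motet's source.

```rust
use std::time::Duration;

/// Parse interval strings such as "30m", "12h", "1d", or "7d".
/// Returns None for an unknown unit, a missing number, or overflow.
fn parse_interval(s: &str) -> Option<Duration> {
    let s = s.trim();
    // The last character is the unit; everything before it is the count.
    let unit = s.chars().last()?;
    let digits = &s[..s.len() - unit.len_utf8()];
    let count: u64 = digits.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    count.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

Returning `Option` rather than panicking keeps a malformed entry in sources.json from aborting a whole crawl run; the caller can decide whether to skip the source or fall back to a default.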

CLI Reference

motet init                          Write default config
motet crawl                         Crawl all sources
motet crawl --source <name>         Crawl one source
motet crawl --dry-run               Show what would be crawled
motet search <query>                Search the index
motet search <query> --limit 20     Limit result count
motet serve                         Start web UI on :3838
motet serve --port 8080             Custom port
motet stats                         Show index statistics

Crates

Crate        Description
motet_core   Library: crawlers, index, query, config
motet_cli    Binary: CLI commands and web server

How It Works

Crawling

Each source has a crawler that fetches pages, extracts text content, and produces structured documents. The generic blog crawler:

  1. Fetches the index page at the configured URL
  2. Extracts article links using the CSS selector
  3. Fetches each article and extracts title + body text
  4. Produces a snippet (first ~300 chars) for search result display
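
Step 4 can be sketched as follows. This is an illustrative guess at snippet generation, not motet's actual code: `make_snippet` is a hypothetical name, and collapsing whitespace, cutting at a word boundary, and appending an ellipsis are all assumptions beyond the "first ~300 chars" stated above.

```rust
/// Build a display snippet: collapse runs of whitespace, then keep at most
/// `max_chars` characters, preferring to cut at the last space.
fn make_snippet(body: &str, max_chars: usize) -> String {
    let collapsed = body.split_whitespace().collect::<Vec<_>>().join(" ");
    if collapsed.chars().count() <= max_chars {
        return collapsed;
    }
    // Take whole characters so multi-byte UTF-8 text is never split mid-char.
    let cut: String = collapsed.chars().take(max_chars).collect();
    // Back up to the last space, if any, so the snippet does not end mid-word.
    let trimmed = match cut.rfind(' ') {
        Some(i) if i > 0 => &cut[..i],
        _ => cut.as_str(),
    };
    format!("{}…", trimmed)
}
```

Counting characters instead of bytes matters here: slicing a `String` at byte 300 would panic if that offset fell inside a multi-byte character.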

Indexing

Documents are stored in two places:

Store     Contents                                       Purpose
Tantivy   URL, title, body, facets, tags, date           Full-text search (BM25)
SQLite    Crawl timestamps, ETags, structured metadata   Freshness, dedup, domain-specific data
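
As a rough illustration of how crawl timestamps and ETags could drive freshness checks, consider the sketch below. The `CrawlRecord` struct and both functions are hypothetical; the README does not describe the SQLite schema or the exact freshness logic.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-source record, as might be kept in the SQLite store.
struct CrawlRecord {
    last_crawled: Option<SystemTime>,
    etag: Option<String>,
}

/// A source is due when it has never been crawled or when at least
/// `interval` has elapsed since the last crawl.
fn is_due(record: &CrawlRecord, interval: Duration, now: SystemTime) -> bool {
    match record.last_crawled {
        None => true,
        Some(last) => match now.duration_since(last) {
            Ok(elapsed) => elapsed >= interval,
            // Last crawl lies in the future (clock skew): treat as fresh.
            Err(_) => false,
        },
    }
}

/// A page is considered unchanged only when both ETags exist and match.
fn is_unchanged(record: &CrawlRecord, response_etag: Option<&str>) -> bool {
    matches!(
        (record.etag.as_deref(), response_etag),
        (Some(stored), Some(fresh)) if stored == fresh
    )
}
```

Passing `now` as a parameter instead of calling `SystemTime::now()` inside the function keeps the check deterministic and easy to test.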

Searching

Queries run against the Tantivy index using BM25 scoring across the title and body fields. Results include the source kind and name as facets for filtering.
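
For readers unfamiliar with BM25, the sketch below computes the score contribution of a single query term in a single field. It is a standalone illustration of the textbook formula, not Tantivy's code; the constants k1 = 1.2 and b = 0.75 are commonly used defaults and are assumed here, and the function names are hypothetical.

```rust
/// Term-frequency saturation parameter (assumed default).
const K1: f64 = 1.2;
/// Document-length normalization parameter (assumed default).
const B: f64 = 0.75;

/// Inverse document frequency: rarer terms get a larger weight.
fn idf(doc_count: u64, docs_with_term: u64) -> f64 {
    let n = doc_count as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term in one field of one document.
/// `tf` is the term frequency, `doc_len` the field length in tokens,
/// and `avg_doc_len` the average field length across the index.
fn bm25_term(tf: f64, doc_len: f64, avg_doc_len: f64, idf: f64) -> f64 {
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}
```

A document's score for a multi-term query is the sum of these contributions over the query terms, and a query spanning title and body would sum over both fields as well. The score grows with term frequency but saturates, and longer-than-average fields are penalized.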

Storage Layout

~/.config/motet/
└── sources.json              # Source configuration

~/.local/share/motet/
├── index/                    # Tantivy full-text index
└── motet.db                  # SQLite metadata
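
The layout above can be expressed as paths derived from a home directory. The sketch below is illustrative: `Paths` and `default_paths` are hypothetical names, and whether motet honors environment overrides such as XDG variables is not stated in this README, so none are assumed.

```rust
use std::path::PathBuf;

/// Hypothetical bundle of the three locations shown in the layout.
struct Paths {
    config_file: PathBuf,
    index_dir: PathBuf,
    db_file: PathBuf,
}

/// Derive the default locations from a home directory such as "/home/alice".
fn default_paths(home: &str) -> Paths {
    let home = PathBuf::from(home);
    let config_dir = home.join(".config").join("motet");
    let data_dir = home.join(".local").join("share").join("motet");
    Paths {
        config_file: config_dir.join("sources.json"),
        index_dir: data_dir.join("index"),
        db_file: data_dir.join("motet.db"),
    }
}
```

Keeping configuration and data in separate directories means the index and database can be deleted and rebuilt with motet crawl without losing the source list.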

Development

nix develop     # Enter dev shell (Rust 1.90 + Node 22)
cargo build     # Build
cargo test      # Test
cargo clippy    # Lint

Building the Web UI

cd motet_web
npm install
npm run build   # Output to dist/, embedded in binary

License

Apache-2.0 OR MIT