Commits
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bearer token now optional (non-bearer gets 128d, bearer gets 768d)
- Add include_embeddings request field + embedding response field
- Update limit docs (400 global, 1200 DID-scoped)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
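The request shape implied by the bullets above might look like the sketch below. Only `include_embeddings`, the `embedding` response field, and the 400/1200 limits come from this changelog; the `query` field name is an assumption for illustration.

```python
import json

# Hypothetical POST /search body -- "include_embeddings" and the limit
# values are from the changelog, "query" is an assumed field name.
payload = {
    "query": "example search text",
    "limit": 400,                # documented global cap (1200 when DID-scoped)
    "include_embeddings": True,  # asks for an "embedding" field per result
}
body = json.dumps(payload)
```

Per the changelog, the returned embeddings would be 768d with a bearer token and 128d without one.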
Support account-scoped search (-did), find-similar-posts (-did -rkey),
and browse-account (-did without query). Update README with all search
modes and API request fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
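The three modes above map naturally onto request bodies. The `did` and `rkey` field names below simply mirror the CLI flags and are assumptions, not confirmed schema:

```python
# Sketches of the three search modes, mirroring the -did/-rkey flags.
account_scoped = {"query": "coffee", "did": "did:plc:example"}      # search within one account
find_similar   = {"did": "did:plc:example", "rkey": "examplerkey"}  # posts similar to one post
browse_account = {"did": "did:plc:example"}                         # no query: browse the account
```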
New search subcommand with -token, -limit, -cluster, and -distinct flags.
Backward compatible: running with no subcommand still defaults to stream.
Documents the POST /search API request/response schema in the README.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MLX on M1 stores weights in bf16 but computes all matmul in float32 —
Metal's simdgroup_matrix uses the FP32 ALU pipeline and MLX kernels
hardcode AccumType=float. Load weights in bf16 for identical rounding,
then upcast entire model to float32 for inference. Cosine similarity
improves from ~0.99 to ~0.995+ at 768d.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
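The bf16-storage/fp32-compute split described above can be reproduced without MLX: bfloat16 is the top 16 bits of an IEEE float32, so rounding the low mantissa bits away gives the identical weight rounding while all arithmetic stays in float32. A numpy sketch (the helper name is mine, not from the repo):

```python
import numpy as np

def round_to_bf16(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16 precision but keep float32 storage:
    bf16 is the top 16 bits of a float32, so round-to-nearest-even on the
    dropped 16 mantissa bits, then zero them out."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)
w_bf16 = round_to_bf16(w)   # same rounding as loading weights in bf16
x = rng.standard_normal((1, 8)).astype(np.float32)
y = x @ w_bf16              # matmul accumulates in float32, as on Metal
```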
mlx_embeddings.mean_pooling casts the attention mask to float32 before
accumulating the pooling sum. verify.py was staying in bfloat16, losing
precision during summation over many tokens. Cast h and mask to float32
for pooling and dense projection to match the server-side computation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
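The pooling fix above amounts to casting before accumulating. A minimal numpy sketch of masked mean pooling with the sum carried in float32 (function name and shapes are illustrative, not the repo's API):

```python
import numpy as np

def mean_pool_fp32(h: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling with the sum accumulated in float32.
    h: (batch, tokens, dim) hidden states; mask: (batch, tokens) of 0/1."""
    h32 = h.astype(np.float32)
    m32 = mask.astype(np.float32)[..., None]  # cast BEFORE accumulating
    return (h32 * m32).sum(axis=1) / m32.sum(axis=1)
```

Summing in a low-precision dtype over many tokens loses mantissa bits on each add; doing the same reduction in float32 matches the server-side result.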
- Fix .python-version (3.11→3.14, 3.11 wasn't installed)
- Fix verify.py: load model in bfloat16 to match server precision,
use correct pool-then-dense order (matching mlx-embeddings ≥0.1.0)
- Make verify input file mandatory (no bundled fixture)
- Remove unused zstandard from requirements.txt
- Default URL to divepool.social instead of localhost
- Update README examples and protocol docs
- Gitignore *.jsonl and .venv/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Example client for the real-time embedding stream. Includes Go stream
reader with auto-reconnect, columnar batch parser, and Python verifier
for EmbeddingGemma embedding correctness.
Open clients: 128d Matryoshka truncation, L2-normalized
Token clients: full 768d
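The open-client path above can be sketched as: keep the first 128 components of the 768d vector (Matryoshka truncation), then L2-normalize so cosine similarity reduces to a dot product. The function name is illustrative:

```python
import numpy as np

def open_client_embedding(v: np.ndarray, dim: int = 128) -> np.ndarray:
    """Matryoshka truncation: take the first `dim` components of the full
    768d embedding, then L2-normalize the truncated vector."""
    t = np.asarray(v, dtype=np.float32)[:dim]
    return t / np.linalg.norm(t)
```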