spacez

spaCy's en_core_web_sm NER pipeline, reimplemented in zig.

hash embeddings → residual CNN → BILUO transition parser, all from the original weights. no python, no runtime dependencies, ~6MB binary.

install

zig fetch --save https://tangled.sh/@zzstoatzz.io/spacez/archive/main

then in build.zig:

const spacez = b.dependency("spacez", .{}).module("spacez");
exe.root_module.addImport("spacez", spacez);

usage

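const std = @import("std");
const spacez = @import("spacez");

pub fn main() !void {
    const text = "Barack Obama visited Paris.";

    var model = try spacez.Model.load(spacez.en_core_web_sm);

    // caller-provided buffer; recognize returns how many entries it filled
    var entities: [64]spacez.SpanEntity = undefined;
    const n = spacez.recognize(&model, text, &entities);

    for (entities[0..n]) |e| {
        // e.label = .PERSON, .GPE, etc.
        // e.start / e.end = byte offsets into source text
        std.debug.print("{s}: {s}\n", .{ @tagName(e.label), text[e.start..e.end] });
    }
}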

weights are embedded at compile time — no files to download or manage.

what's here

  • hash embeddings: murmurhash + bloom → 4 embedding tables, reduced via maxout
  • CNN encoder: 4 residual blocks (seq2col → dense → maxout → layernorm → residual add)
  • transition parser: greedy BILUO with 73 actions (18 entity labels × 4 transitions B/I/L/U, + 1 for O)
  • tokenizer: full port of spaCy's English tokenizer (~1,000 special cases, prefix/suffix/infix rules)
  • SIMD ops: vectorized matvec, vadd using zig's @Vector intrinsics
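
as an illustration of the SIMD ops item, here is a minimal sketch of a vectorized matvec built on @Vector. it is not taken from the spacez source: the lane width of 8, the row-major weight layout, and the names dot and matvec are all assumptions. it assumes Zig 0.11 or later (single-argument @splat).

```zig
const std = @import("std");

// assumption: 8 f32 lanes; spacez may pick a different width
const lanes = 8;
const Vec = @Vector(lanes, f32);

/// Dot product of two equal-length slices, `lanes` elements at a time.
fn dot(a: []const f32, b: []const f32) f32 {
    std.debug.assert(a.len == b.len);
    var acc: Vec = @splat(0.0);
    var i: usize = 0;
    // vectorized body: multiply-accumulate every full chunk
    while (i + lanes <= a.len) : (i += lanes) {
        const va: Vec = a[i..][0..lanes].*;
        const vb: Vec = b[i..][0..lanes].*;
        acc += va * vb;
    }
    // horizontal sum of the lanes
    var sum: f32 = @reduce(.Add, acc);
    // scalar tail for lengths that are not a multiple of `lanes`
    while (i < a.len) : (i += 1) sum += a[i] * b[i];
    return sum;
}

/// out[r] = dot(row r of w, x), with w stored row-major (hypothetical layout).
fn matvec(out: []f32, w: []const f32, x: []const f32) void {
    std.debug.assert(w.len == out.len * x.len);
    var r: usize = 0;
    while (r < out.len) : (r += 1) {
        out[r] = dot(w[r * x.len ..][0..x.len], x);
    }
}
```

the scalar tail keeps the routine correct for any input length, at the cost of a few non-vectorized iterations per row.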

validation

73/73 test sentences produce identical entities and byte offsets to spaCy en_core_web_sm 3.8.0, including multi-byte UTF-8, contractions, hyphens, possessives, informal text, and zero-entity inputs.

# head-to-head comparison (requires spaCy + en_core_web_sm)
uv run --python 3.12 --with spacy \
  --with 'en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl' \
  python scripts/compare.py
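
byte offsets and multi-byte UTF-8 interact in a way worth spelling out: spaCy's start_char / end_char count Python string characters (code points), while spacez reports byte offsets. the sketch below shows one way to map a byte offset to a code point index. the helper codepointIndex is hypothetical, and how scripts/compare.py actually reconciles the two is not shown here.

```zig
const std = @import("std");

/// Number of code points in the first `byte_offset` bytes of `text`.
/// Skips UTF-8 continuation bytes (0b10xxxxxx); assumes valid UTF-8.
fn codepointIndex(text: []const u8, byte_offset: usize) usize {
    var n: usize = 0;
    for (text[0..byte_offset]) |byte| {
        if ((byte & 0xC0) != 0x80) n += 1;
    }
    return n;
}

pub fn main() void {
    const text = "Zoë lives in Zürich.";
    const needle = "Zürich";
    // stand-in for an entity span: locate the substring by bytes
    const start = std.mem.indexOf(u8, text, needle).?;
    const end = start + needle.len;
    std.debug.print("bytes {d}..{d}, code points {d}..{d}: {s}\n", .{
        start,                       end,
        codepointIndex(text, start), codepointIndex(text, end),
        text[start..end],
    });
}
```

here "Zürich" spans bytes 14..21 but code points 13..19, because ë and ü each take two bytes.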

entity types

PERSON, ORG, GPE, PRODUCT, EVENT, WORK_OF_ART, FAC, NORP, LOC

(plus CARDINAL, DATE, MONEY, PERCENT, ORDINAL, QUANTITY, TIME, LAW — inherited from the model)
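
a rough sketch of how these labels turn into the parser's action space, and how a BILUO action sequence collapses into spans. assumptions: 73 = 18 × 4 + 1, so this uses 18 labels (the 17 named above plus spaCy's LANGUAGE); Action, Span and decode are hypothetical names, not spacez's API; the input sequence is well-formed (every L follows a B); Zig 0.11 or later.

```zig
const std = @import("std");

// B(egin), I(n), L(ast), U(nit) carry a label; O(ut) does not
const Move = enum { B, I, L, U, O };

// assumption: 18 labels, see the note above
const n_labels = 18;
const n_actions = n_labels * 4 + 1;

comptime {
    std.debug.assert(n_actions == 73);
}

// hypothetical types for illustration only
const Action = struct { move: Move, label: u8 = 0 };
const Span = struct { start: usize, end: usize, label: u8 }; // token indices, end exclusive

/// Collapses one action per token into entity spans; returns the span count.
fn decode(actions: []const Action, out: []Span) usize {
    var n: usize = 0;
    var open: usize = 0; // first token of the entity currently being built
    for (actions, 0..) |a, i| {
        switch (a.move) {
            .B => open = i,
            .I, .O => {},
            .L, .U => {
                // U is a single-token entity; L closes the one opened by B
                const start = if (a.move == .U) i else open;
                out[n] = .{ .start = start, .end = i + 1, .label = a.label };
                n += 1;
            },
        }
    }
    return n;
}
```

for the tokens "Barack Obama visited Paris ." the sequence B, L, O, U, O yields two spans: tokens 0..2 and 3..4.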