Reference implementation for the Phoenix Architecture. Work in progress.
aicoding.leaflet.pub/
ai
coding
crazy
Phase A — Clause Extraction & Semantic Hashing#
Overview#
Phase A is the foundation layer. It parses spec documents (Markdown) into discrete clauses and computes semantic hashes for change detection.
Components#
1. Spec Parser (src/spec-parser.ts)#
Parses Markdown spec documents into structured clauses.
Input: Markdown file content + document ID
Output: Array of Clause objects
Parsing Rules:
- Split on heading boundaries (any level: #, ##, ###, etc.)
- Each heading + its body content = one clause
- Track section hierarchy (e.g.,
["1. Adoption Scope", "v1 Scope"]) - Record source line ranges
- Preserve raw text, compute normalized text
Normalization:
- Lowercase
- Collapse whitespace (multiple spaces/tabs → single space)
- Strip leading/trailing whitespace per line
- Remove markdown formatting characters (**, *, `, #)
- Remove empty lines
- Sort list items within a list block (for order-invariant hashing)
2. Clause Model (src/models/clause.ts)#
interface Clause {
clause_id: string; // content-addressed hash
source_doc_id: string; // document identifier
source_line_range: [number, number]; // [start, end] 1-indexed
raw_text: string; // original text
normalized_text: string; // after normalization
section_path: string[]; // heading hierarchy
clause_semhash: string; // SHA-256 of normalized_text
context_semhash_cold: string; // SHA-256 of normalized_text + section_path + adjacent clause hashes
}
3. Semantic Hasher (src/semhash.ts)#
clause_semhash: SHA-256(normalized_text)
context_semhash_cold: SHA-256(normalized_text + section_path.join('/') + prev_clause_semhash + next_clause_semhash)
This captures local context without requiring the canonical graph (cold start).
4. Spec Graph Store (src/store/spec-store.ts)#
Persists clauses to the content-addressed store and maintains the spec graph index.
Operations:
ingestDocument(docPath: string): IngestResultgetClauses(docId: string): Clause[]getClause(clauseId: string): Clause | nulldiffDocument(docPath: string): ClauseDiff[]
5. Diff Engine (src/diff.ts)#
Compares previous vs. current clauses for a document.
Diff types:
ADDED— new clauseREMOVED— clause deletedMODIFIED— clause_semhash changedMOVED— section_path changed but content sameUNCHANGED— identical
Data Flow#
spec/*.md → SpecParser.parse() → Clause[] → SemHasher.hash() → Clause[] (with hashes) → SpecStore.save()
File Layout#
src/
models/
clause.ts # Clause interface + types
spec-parser.ts # Markdown → Clause[] parser
semhash.ts # Semantic hashing functions
normalizer.ts # Text normalization
diff.ts # Clause diff engine
store/
spec-store.ts # Spec graph persistence
content-store.ts # Content-addressed object store
index.ts # Public API exports
Success Criteria#
- Parse a Markdown spec into correct clauses with accurate line ranges
- Normalized text is deterministic and order-invariant for lists
- clause_semhash is stable across formatting-only changes
- context_semhash_cold captures local structure
- Diff engine correctly classifies all change types
- Store persists and retrieves clauses by ID and document