Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
# Phase A: Clause Extraction & Semantic Hashing

## Overview

Phase A is the foundation layer. It parses spec documents (Markdown) into discrete clauses and computes semantic hashes for change detection.

## Components

### 1. Spec Parser (`src/spec-parser.ts`)

Parses Markdown spec documents into structured clauses.

**Input:** Markdown file content + document ID
**Output:** Array of `Clause` objects

**Parsing Rules:**
- Split on heading boundaries (any level: #, ##, ###, etc.)
- Each heading + its body content = one clause
- Track section hierarchy (e.g., `["1. Adoption Scope", "v1 Scope"]`)
- Record source line ranges
- Preserve raw text, compute normalized text

**Normalization:**
- Lowercase
- Collapse whitespace (multiple spaces/tabs → single space)
- Strip leading/trailing whitespace per line
- Remove markdown formatting characters (**, *, `, #)
- Remove empty lines
- Sort list items within a list block (for order-invariant hashing)

### 2. Clause Model (`src/models/clause.ts`)

```typescript
interface Clause {
  clause_id: string;                    // content-addressed hash
  source_doc_id: string;                // document identifier
  source_line_range: [number, number];  // [start, end] 1-indexed
  raw_text: string;                     // original text
  normalized_text: string;              // after normalization
  section_path: string[];               // heading hierarchy
  clause_semhash: string;               // SHA-256 of normalized_text
  context_semhash_cold: string;         // SHA-256 of normalized_text + section_path + adjacent clause hashes
}
```

### 3. Semantic Hasher (`src/semhash.ts`)

**clause_semhash:** `SHA-256(normalized_text)`

**context_semhash_cold:** `SHA-256(normalized_text + section_path.join('/') + prev_clause_semhash + next_clause_semhash)`

This captures local context without requiring the canonical graph (cold start).

### 4. Spec Graph Store (`src/store/spec-store.ts`)

Persists clauses to the content-addressed store and maintains the spec graph index.

**Operations:**
- `ingestDocument(docPath: string): IngestResult`
- `getClauses(docId: string): Clause[]`
- `getClause(clauseId: string): Clause | null`
- `diffDocument(docPath: string): ClauseDiff[]`

### 5. Diff Engine (`src/diff.ts`)

Compares previous vs. current clauses for a document.

**Diff types:**
- `ADDED`: new clause
- `REMOVED`: clause deleted
- `MODIFIED`: clause_semhash changed
- `MOVED`: section_path changed but content same
- `UNCHANGED`: identical

## Data Flow

```
spec/*.md → SpecParser.parse() → Clause[] → SemHasher.hash() → Clause[] (with hashes) → SpecStore.save()
```

## File Layout

```
src/
  models/
    clause.ts          # Clause interface + types
  spec-parser.ts       # Markdown → Clause[] parser
  semhash.ts           # Semantic hashing functions
  normalizer.ts        # Text normalization
  diff.ts              # Clause diff engine
  store/
    spec-store.ts      # Spec graph persistence
    content-store.ts   # Content-addressed object store
  index.ts             # Public API exports
```

## Success Criteria

1. Parse a Markdown spec into correct clauses with accurate line ranges
2. Normalized text is deterministic and order-invariant for lists
3. clause_semhash is stable across formatting-only changes
4. context_semhash_cold captures local structure
5. Diff engine correctly classifies all change types
6. Store persists and retrieves clauses by ID and document
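
## Illustrative Sketch: Normalization and Hashing

The normalization rules and the two hash definitions from the Components section can be sketched as follows. This is a minimal illustration, not the contents of `src/normalizer.ts` or `src/semhash.ts`. The function names (`normalize`, `clauseSemhash`, `contextSemhashCold`), the list-marker pattern, the order of normalization steps, and the use of an empty string for a missing neighbor hash are all assumptions made for this example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex-encoded SHA-256 of a UTF-8 string.
function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Assumed list syntax: "-", "*", "+" or "1." followed by whitespace.
const LIST_MARKER = /^\s*(?:[-*+]|\d+\.)\s+/;

// Sketch of the normalization rules; the step order is an assumption.
function normalize(raw: string): string {
  const out: string[] = [];
  let listBlock: string[] = [];

  // Emit the pending list block in sorted order so item order
  // does not influence the resulting hash.
  const flush = (): void => {
    out.push(...listBlock.sort());
    listBlock = [];
  };

  for (const line of raw.split(/\r?\n/)) {
    const isItem = LIST_MARKER.test(line);
    const text = line
      .replace(LIST_MARKER, "")  // drop the list marker itself
      .toLowerCase()             // lowercase
      .replace(/[*`#]/g, "")     // remove markdown formatting characters
      .replace(/\s+/g, " ")      // collapse whitespace
      .trim();                   // strip leading/trailing whitespace

    if (text.length === 0) {     // empty lines are removed and end a list block
      flush();
      continue;
    }
    if (isItem) {
      listBlock.push("- " + text); // uniform marker for every list item
    } else {
      flush();
      out.push(text);
    }
  }
  flush();
  return out.join("\n");
}

// clause_semhash: SHA-256(normalized_text)
function clauseSemhash(normalizedText: string): string {
  return sha256(normalizedText);
}

// context_semhash_cold: SHA-256(normalized_text + section_path.join('/')
//   + prev_clause_semhash + next_clause_semhash).
// Assumption: a missing neighbor (first or last clause) contributes "".
function contextSemhashCold(
  normalizedText: string,
  sectionPath: string[],
  prevClauseSemhash = "",
  nextClauseSemhash = "",
): string {
  return sha256(
    normalizedText + sectionPath.join("/") + prevClauseSemhash + nextClauseSemhash,
  );
}
```

Under these assumptions, `normalize("## Scope\n\n- **Beta**\n- alpha")` and `normalize("##  Scope\n* alpha\n*   Beta")` yield the same string, so their `clauseSemhash` values match, while moving the same text to a different `section_path` changes only `contextSemhashCold`. That is the distinction the diff engine relies on to separate `MOVED` from `MODIFIED`.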