Reference implementation for the Phoenix Architecture. Work in progress.
aicoding.leaflet.pub/
ai
coding
crazy
Phase B — Canonicalization, Warm Context Hashing & Change Classifier#
Overview#
Phase B transforms clauses into a Canonical Graph — structured requirement nodes (Requirements, Constraints, Invariants, Definitions). It also computes warm context hashes that incorporate canonical graph context, and implements the A/B/C/D change classifier.
Components#
1. Canonical Node Model (src/models/canonical.ts)#
enum CanonicalType {
REQUIREMENT = 'REQUIREMENT',
CONSTRAINT = 'CONSTRAINT',
INVARIANT = 'INVARIANT',
DEFINITION = 'DEFINITION',
}
interface CanonicalNode {
canon_id: string; // content-addressed
type: CanonicalType;
statement: string; // normalized canonical statement
source_clause_ids: string[]; // provenance back to clauses
linked_canon_ids: string[]; // edges to related canonical nodes
tags: string[]; // extracted keywords/terms
}
2. Canonicalization Engine (src/canonicalizer.ts)#
Extracts canonical nodes from clauses using rule-based extraction:
Extraction Rules:
- Lines containing "must", "shall", "required" → REQUIREMENT
- Lines containing "must not", "forbidden", "prohibited" → CONSTRAINT
- Lines containing "always", "never", "invariant" → INVARIANT
- Lines containing definitions (": ", "is defined as", "means") → DEFINITION
- Headings containing "constraint", "security", "limit" → CONSTRAINT context
- Headings containing "requirement" → REQUIREMENT context
Linking Rules:
- Nodes sharing terms/keywords get linked
- Nodes from same clause get linked
- Nodes referencing same entities get linked
3. Warm Context Hasher (src/warm-hasher.ts)#
After canonicalization, compute context_semhash_warm:
context_semhash_warm = SHA-256(
normalized_text +
section_path.join('/') +
sorted(linked_canon_ids).join(',') +
sorted(canon_node_types).join(',')
)
4. Change Classifier (src/classifier.ts)#
Classifies each change into A/B/C/D:
| Class | Meaning | Criteria |
|---|---|---|
| A | Trivial | normalized_text identical, only formatting changed |
| B | Local semantic | clause_semhash changed, context_semhash_cold unchanged |
| C | Contextual shift | context_semhash changed, canonical links affected |
| D | Uncertain | classifier confidence below threshold |
Signals:
norm_diff: edit distance of normalized textssemhash_delta: binary (same/different clause_semhash)context_cold_delta: binary (same/different context_semhash_cold)term_ref_delta: Jaccard distance of extracted termssection_structure_delta: section_path changed?canon_impact: number of affected canonical nodes
5. D-Rate Tracker (src/d-rate.ts)#
Tracks D-classification rate over a rolling window.
- Target: ≤5%
- Acceptable: ≤10%
- Alarm: >15%
6. Bootstrap State Machine (src/bootstrap.ts)#
States: BOOTSTRAP_COLD → BOOTSTRAP_WARMING → STEADY_STATE
Transitions:
- COLD → WARMING: after first canonicalization + warm pass complete
- WARMING → STEADY_STATE: after D-rate stabilizes below acceptable threshold
Data Flow#
Clauses (Phase A)
→ Canonicalizer.extract() → CanonicalNode[]
→ WarmHasher.computeWarm() → Clause[] (with warm hashes)
→ Classifier.classify() → ChangeClassification[]
→ DRateTracker.record() → DRateStatus
→ BootstrapState.transition()
File Layout (Phase B additions)#
src/
models/
canonical.ts # CanonicalNode interface + types
classification.ts # ChangeClass enum + ChangeClassification
canonicalizer.ts # Clause → CanonicalNode extraction
warm-hasher.ts # Warm context hash computation
classifier.ts # A/B/C/D change classifier
d-rate.ts # D-rate tracking
bootstrap.ts # Bootstrap state machine
store/
canonical-store.ts # Canonical graph persistence