Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
ai coding crazy

Phase B — Canonicalization, Warm Context Hashing & Change Classifier#

Overview#

Phase B transforms clauses into a Canonical Graph — structured requirement nodes (Requirements, Constraints, Invariants, Definitions). It also computes warm context hashes that incorporate canonical graph context, and implements the A/B/C/D change classifier.

Components#

1. Canonical Node Model (src/models/canonical.ts)#

enum CanonicalType {
  REQUIREMENT = 'REQUIREMENT',
  CONSTRAINT = 'CONSTRAINT',
  INVARIANT = 'INVARIANT',
  DEFINITION = 'DEFINITION',
}

interface CanonicalNode {
  canon_id: string;            // content-addressed
  type: CanonicalType;
  statement: string;           // normalized canonical statement
  source_clause_ids: string[]; // provenance back to clauses
  linked_canon_ids: string[];  // edges to related canonical nodes
  tags: string[];              // extracted keywords/terms
}

2. Canonicalization Engine (src/canonicalizer.ts)#

Extracts canonical nodes from clauses using rule-based extraction:

Extraction Rules:

  • Lines containing "must", "shall", "required" → REQUIREMENT
  • Lines containing "must not", "forbidden", "prohibited" → CONSTRAINT
  • Lines containing "always", "never", "invariant" → INVARIANT
  • Lines containing definitions (": ", "is defined as", "means") → DEFINITION
  • Headings containing "constraint", "security", "limit" → CONSTRAINT context
  • Headings containing "requirement" → REQUIREMENT context

Linking Rules:

  • Nodes sharing terms/keywords get linked
  • Nodes from same clause get linked
  • Nodes referencing same entities get linked

3. Warm Context Hasher (src/warm-hasher.ts)#

After canonicalization, compute context_semhash_warm:

context_semhash_warm = SHA-256(
  normalized_text +
  section_path.join('/') +
  sorted(linked_canon_ids).join(',') +
  sorted(canon_node_types).join(',')
)

4. Change Classifier (src/classifier.ts)#

Classifies each change into A/B/C/D:

Class Meaning Criteria
A Trivial normalized_text identical, only formatting changed
B Local semantic clause_semhash changed, context_semhash_cold unchanged
C Contextual shift context_semhash changed, canonical links affected
D Uncertain classifier confidence below threshold

Signals:

  • norm_diff: edit distance of normalized texts
  • semhash_delta: binary (same/different clause_semhash)
  • context_cold_delta: binary (same/different context_semhash_cold)
  • term_ref_delta: Jaccard distance of extracted terms
  • section_structure_delta: section_path changed?
  • canon_impact: number of affected canonical nodes

5. D-Rate Tracker (src/d-rate.ts)#

Tracks D-classification rate over a rolling window.

  • Target: ≤5%
  • Acceptable: ≤10%
  • Alarm: >15%

6. Bootstrap State Machine (src/bootstrap.ts)#

States: BOOTSTRAP_COLDBOOTSTRAP_WARMINGSTEADY_STATE

Transitions:

  • COLD → WARMING: after first canonicalization + warm pass complete
  • WARMING → STEADY_STATE: after D-rate stabilizes below acceptable threshold

Data Flow#

Clauses (Phase A)
  → Canonicalizer.extract() → CanonicalNode[]
  → WarmHasher.computeWarm() → Clause[] (with warm hashes)
  → Classifier.classify() → ChangeClassification[]
  → DRateTracker.record() → DRateStatus
  → BootstrapState.transition()

File Layout (Phase B additions)#

src/
  models/
    canonical.ts       # CanonicalNode interface + types
    classification.ts  # ChangeClass enum + ChangeClassification
  canonicalizer.ts     # Clause → CanonicalNode extraction
  warm-hasher.ts       # Warm context hash computation
  classifier.ts        # A/B/C/D change classifier
  d-rate.ts            # D-rate tracking
  bootstrap.ts         # Bootstrap state machine
  store/
    canonical-store.ts # Canonical graph persistence