Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
ai coding crazy

Phoenix VCS — Project Audit Report#

Date: 2026-02-17
Scope: Phases A, B, C1, C2
Lines of code: ~2,450 source, ~1,800 test (4,258 total)
Tests: 142 passing across 17 test files (14 unit + 3 functional)


✅ What's Working Well#

  1. Clean architecture — Models, logic, and storage are well-separated. Models are pure types, logic is pure functions (easy to test), stores handle persistence.

  2. Content-addressed design — Every object (clause, canonical node, IU) is identified by a hash of its content. This is sound and will scale well.

  3. Test coverage — Every module has unit tests. Three functional tests validate end-to-end pipelines. All 142 pass in ~110ms.

  4. TypeScript strict modestrict: true enabled, compiles cleanly with no suppressions in source code.

  5. Provenance chain — The traceability from spec lines → clauses → canonical nodes → IUs → generated files → boundary validation is fully connected.


🔧 Issues Fixed During Audit#

# Issue Severity Fix
1 Duplicate boundary diagnostics — When a package was both forbidden and not in the allowlist, two diagnostics were emitted for the same import. Medium Changed to else if so forbidden check takes priority.
2 Dead code in classifier — The D-class branch inside the canon-impact block was unreachable (confidence was always ≥ 0.7, threshold was < 0.6). Low Removed dead branch.
3 as any in tests — Two test lines used {} as any for signal objects. Low Replaced with properly typed empty signal objects.

⚠️ Issues to Address (Not Yet Fixed)#

High Priority#

H1. No provenance graph persistence
The PRD specifies a Provenance Graph (Section 2) that records all transformation edges. Currently, provenance is implicit (canonical nodes have source_clause_ids, IUs have source_canon_ids), but there's no unified provenance store. Every transformation should emit a provenance edge to a dedicated graph.

H2. Normalizer doesn't handle code blocks
Fenced code blocks (```) are currently processed like regular text — headings and list items inside code blocks get mangled. The parser should skip code block contents during normalization.

H3. No pre-heading content handling
If a spec file has content before the first heading (e.g., a preamble), it's silently discarded by the parser. Only heading-bounded sections are captured. The PRD doesn't explicitly address this, but losing content is wrong.

H4. Classifier D-class is hard to trigger
The current classification logic produces D (uncertain) only when norm_diff > 0.7 || term_ref_delta > 0.7 AND no canonical impact AND no context shift. This is a very narrow window. The D-rate mechanism needs real exercise.

Medium Priority#

M1. IU planner grouping is greedy
clusterNodes() uses BFS to group all transitively connected nodes. In a large spec, this could collapse too many unrelated requirements into a single giant IU because of loose term overlap chains (A links to B links to C...). Should add a max-cluster-size or minimum-link-weight threshold.

M2. Regeneration is stub-only
The regen engine only produces function stubs. This is expected for v1, but the stub quality is minimal — no imports, no types, no contract enforcement in the generated code. The stubs should at least generate TypeScript interfaces from the IU contract.

M3. Manifest doesn't track deleted files
If a file is removed from output_files between regenerations, the old manifest entry persists. Need a reconciliation step that detects orphaned manifest entries.

M4. Content store has no garbage collection
Objects are never deleted. After multiple ingestions, stale clause objects accumulate. Need either reference counting or mark-and-sweep relative to the current graph indices.

M5. Side channel detection is shallow
The dep-extractor uses regex patterns. It misses indirect patterns like const { env } = process; env.SECRET, dynamic imports, and aliased require calls. Acceptable for v1 but should move to AST-based extraction.

M6. Spec parser doesn't handle ATX heading edge cases
Lines like # (heading marker with no text), ##text (no space), or setext-style headings (Title\n====) are not handled.

Low Priority#

L1. No .gitignore
The project is missing a .gitignore for node_modules/, dist/, and temp .phoenix/ directories.

L2. Demo creates temp directories without cleanup
mkdtempSync in the demo creates temp dirs that are never cleaned up.

L3. Store uses synchronous fs operations
All file I/O is synchronous (readFileSync, writeFileSync). Fine for a CLI tool, but should be async if this becomes a long-running server.

L4. No input validation on store operations
ContentStore.put() and SpecStore.ingestDocument() don't validate inputs. A non-hex ID or missing file would produce cryptic errors.

L5. Warm hasher performance
computeWarmHashes iterates all canonical nodes for every clause (O(clauses × nodes)). Should build an index of clause→nodes first.


📊 Coverage Gaps#

Component Unit Tests Functional Tests Gap
Normalizer ✅ 12 Missing: code blocks, nested markdown
Spec Parser ✅ 11 ✅ via ingestion Missing: setext headings, pre-heading content
Semhash ✅ 9
Diff ✅ 7 ✅ via ingestion Missing: large-scale diff (100+ clauses)
Canonicalizer ✅ 13 ✅ via canonicalization
Warm Hasher ✅ 5 ✅ via canonicalization
Classifier ✅ 7 ✅ via canonicalization Missing: D-class exercise
D-Rate ✅ 9 ✅ via canonicalization
Bootstrap ✅ 10 ✅ via canonicalization
IU Planner ✅ 7 ✅ via IU pipeline Missing: large spec with many clusters
Regen ✅ 6 ✅ via IU pipeline
Manifest ✅ via IU pipeline Missing: dedicated unit tests for ManifestManager
Drift ✅ 5 ✅ via IU pipeline
Dep Extractor ✅ 10 ✅ via IU pipeline
Boundary Validator ✅ 12 ✅ via IU pipeline
Content Store ✅ via ingestion Missing: dedicated unit tests
Spec Store ✅ via ingestion Missing: dedicated unit tests
Canonical Store ✅ via canonicalization Missing: dedicated unit tests

🏗️ Recommendations for Phase D+#

  1. Build a Provenance Store before Evidence/Policy (Phase D) — the evidence engine needs provenance edges to bind evidence to the right graph nodes.

  2. Add a CLI entry point (phoenix bootstrap, phoenix status, phoenix ingest) — the core logic is all functions/classes but there's no user-facing command.

  3. Add integration tests with the real PRD.md — run the full A→C2 pipeline against the Phoenix PRD itself as a dogfood test.

  4. Consider property-based testing for the normalizer and diff engine — these are the foundation and need to be bulletproof.

  5. Add structured logging — every transformation should emit a structured log event that can reconstruct the provenance graph.