# Phoenix VCS — Project Audit Report

**Date:** 2026-02-17
**Scope:** Phases A, B, C1, C2
**Lines of code:** ~2,450 source, ~1,800 test (4,258 total)
**Tests:** 142 passing across 17 test files (14 unit + 3 functional)

---

## ✅ What's Working Well

1. **Clean architecture** — Models, logic, and storage are well separated. Models are pure types, logic is pure functions (easy to test), stores handle persistence.

2. **Content-addressed design** — Every object (clause, canonical node, IU) is identified by a hash of its content. This is sound and will scale well.

3. **Test coverage** — Every module has unit tests. Three functional tests validate end-to-end pipelines. All 142 pass in ~110ms.

4. **TypeScript strict mode** — `strict: true` enabled, compiles cleanly with no suppressions in source code.

5. **Provenance chain** — The traceability from spec lines → clauses → canonical nodes → IUs → generated files → boundary validation is fully connected.
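
The content-addressed identification described in item 2 can be sketched as follows — a minimal sketch with hypothetical field names, assuming sha256 over canonical JSON (the real stores may hash differently):

```typescript
import { createHash } from "node:crypto";

// Hypothetical clause shape; the real models carry more fields.
interface Clause {
  kind: "clause";
  text: string;
}

// Derive a stable ID from the canonical JSON of the object's content.
// The sorted replacer array fixes key order, so equal content always
// serializes identically and therefore hashes identically.
function contentId(obj: Clause): string {
  const canonical = JSON.stringify(obj, Object.keys(obj).sort());
  return createHash("sha256").update(canonical).digest("hex");
}

const a = contentId({ kind: "clause", text: "MUST validate input" });
const b = contentId({ kind: "clause", text: "MUST validate input" });
const c = contentId({ kind: "clause", text: "MAY validate input" });
```

Equal IDs then imply equal content (up to hash collisions), which is what makes deduplication and caching across ingestions safe.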

---

## 🔧 Issues Fixed During Audit

| # | Issue | Severity | Fix |
|---|-------|----------|-----|
| 1 | **Duplicate boundary diagnostics** — When a package was both forbidden and not in the allowlist, two diagnostics were emitted for the same import. | Medium | Changed to `else if` so forbidden check takes priority. |
| 2 | **Dead code in classifier** — The D-class branch inside the canon-impact block was unreachable (confidence was always ≥ 0.7, threshold was < 0.6). | Low | Removed dead branch. |
| 3 | **`as any` in tests** — Two test lines used `{} as any` for signal objects. | Low | Replaced with properly typed empty signal objects. |
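
The `else if` fix for issue 1 can be sketched like this (hypothetical names; the real validator emits richer diagnostics):

```typescript
interface Diagnostic { code: string; pkg: string; }

// Forbidden-list check takes priority: a package that is both
// forbidden and absent from the allowlist now yields exactly one
// diagnostic instead of two.
function checkImport(
  pkg: string,
  forbidden: Set<string>,
  allowlist: Set<string>,
): Diagnostic[] {
  const diags: Diagnostic[] = [];
  if (forbidden.has(pkg)) {
    diags.push({ code: "forbidden-import", pkg });
  } else if (!allowlist.has(pkg)) {
    diags.push({ code: "not-allowlisted", pkg });
  }
  return diags;
}

const forbidden = new Set(["child_process"]);
const allow = new Set(["node:path"]);
const both = checkImport("child_process", forbidden, allow);  // forbidden AND not allowlisted
const missing = checkImport("left-pad", forbidden, allow);    // only not allowlisted
```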

---

## ⚠️ Issues to Address (Not Yet Fixed)

### High Priority

**H1. No provenance graph persistence**
The PRD specifies a Provenance Graph (Section 2) that records all transformation edges. Currently, provenance is implicit (canonical nodes have `source_clause_ids`, IUs have `source_canon_ids`), but there's no unified provenance store. Every transformation should emit a provenance edge to a dedicated graph.
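
One possible shape for such a store — a sketch with illustrative names, not the PRD's exact schema:

```typescript
// Hypothetical edge kinds, one per pipeline transformation.
type EdgeKind = "normalize" | "canonicalize" | "plan" | "generate";

interface ProvenanceEdge {
  kind: EdgeKind;
  from: string; // content-addressed ID of the input object
  to: string;   // content-addressed ID of the output object
}

// Minimal in-memory store; a real one would persist alongside the
// content store and index both directions for fast traversal.
class ProvenanceStore {
  private edges: ProvenanceEdge[] = [];

  record(edge: ProvenanceEdge): void {
    this.edges.push(edge);
  }

  // Walk backwards: every object that (transitively) produced `id`.
  ancestorsOf(id: string): Set<string> {
    const out = new Set<string>();
    const stack = [id];
    while (stack.length > 0) {
      const cur = stack.pop()!;
      for (const e of this.edges) {
        if (e.to === cur && !out.has(e.from)) {
          out.add(e.from);
          stack.push(e.from);
        }
      }
    }
    return out;
  }
}

const store = new ProvenanceStore();
store.record({ kind: "normalize", from: "clause1", to: "canon1" });
store.record({ kind: "plan", from: "canon1", to: "iu1" });
const anc = store.ancestorsOf("iu1");
```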

**H2. Normalizer doesn't handle code blocks**
Fenced code blocks (` ``` `) are currently processed like regular text — headings and list items inside code blocks get mangled. The parser should skip code block contents during normalization.
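
A fence-aware pass could look like this (a sketch; the real normalizer does more than whitespace collapsing):

````typescript
// Track fence state line by line: lines inside ``` fences pass
// through untouched; normalization only applies to prose outside.
function normalizeLines(lines: string[]): string[] {
  let inFence = false;
  return lines.map((line) => {
    if (line.trimStart().startsWith("```")) {
      inFence = !inFence; // opening or closing fence
      return line;
    }
    if (inFence) return line; // never mangle code block contents
    // Placeholder for the real normalization of prose lines.
    return line.trim().replace(/\s+/g, " ");
  });
}

const out = normalizeLines([
  "# Title",
  "```",
  "  # not a heading  ",
  "```",
  "  body  text ",
]);
````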

**H3. No pre-heading content handling**
If a spec file has content before the first heading (e.g., a preamble), it's silently discarded by the parser. Only heading-bounded sections are captured. The PRD doesn't explicitly address this, but losing content is wrong.

**H4. Classifier D-class is hard to trigger**
The current classification logic produces D (uncertain) only when `norm_diff > 0.7 || term_ref_delta > 0.7` AND no canonical impact AND no context shift. This is a very narrow window; the D-rate mechanism needs tests that actually exercise it.
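
Reconstructed as code, the D-class window looks like this (signal names are illustrative):

```typescript
interface Signals {
  norm_diff: number;
  term_ref_delta: number;
  canon_impact: boolean;
  context_shift: boolean;
}

// The narrow condition described above: high textual churn with no
// structural signal at all is the only route into D.
function isDClass(s: Signals): boolean {
  return (
    (s.norm_diff > 0.7 || s.term_ref_delta > 0.7) &&
    !s.canon_impact &&
    !s.context_shift
  );
}

// High churn, no structural impact: D fires.
const d = isDClass({ norm_diff: 0.9, term_ref_delta: 0, canon_impact: false, context_shift: false });
// Identical churn plus any canonical impact routes away from D.
const notD = isDClass({ norm_diff: 0.9, term_ref_delta: 0, canon_impact: true, context_shift: false });
```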

### Medium Priority

**M1. IU planner grouping is greedy**
`clusterNodes()` uses BFS to group all transitively connected nodes. In a large spec, this could collapse too many unrelated requirements into a single giant IU because of loose term overlap chains (A links to B links to C...). Should add a max-cluster-size or minimum-link-weight threshold.
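
A max-cluster-size cap could be added along these lines (a sketch with hypothetical shapes; the real `clusterNodes()` operates on canonical nodes with weighted links):

```typescript
// BFS clustering with a size cap: once a cluster reaches maxSize,
// the remaining frontier is left for later clusters, so a loose
// term-overlap chain (A–B–C–...) no longer collapses into one IU.
function clusterWithCap(
  nodes: string[],
  links: Map<string, string[]>,
  maxSize: number,
): string[][] {
  const seen = new Set<string>();
  const clusters: string[][] = [];
  for (const start of nodes) {
    if (seen.has(start)) continue;
    const cluster: string[] = [];
    const queue: string[] = [start];
    while (queue.length > 0 && cluster.length < maxSize) {
      const cur = queue.shift()!;
      if (seen.has(cur)) continue;
      seen.add(cur);
      cluster.push(cur);
      for (const next of links.get(cur) ?? []) {
        if (!seen.has(next)) queue.push(next);
      }
    }
    clusters.push(cluster);
  }
  return clusters;
}

const links = new Map([
  ["A", ["B"]], ["B", ["A", "C"]], ["C", ["B", "D"]], ["D", ["C"]],
]);
const capped = clusterWithCap(["A", "B", "C", "D"], links, 2);   // chain is split
const uncapped = clusterWithCap(["A", "B", "C", "D"], links, 99); // one component
```

A minimum-link-weight threshold would work similarly: filter the adjacency list before the BFS rather than cutting clusters by size.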

**M2. Regeneration is stub-only**
The regen engine only produces function stubs. This is expected for v1, but the stub quality is minimal — no imports, no types, no contract enforcement in the generated code. The stubs should at least generate TypeScript interfaces from the IU contract.
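
Emitting an interface from the contract could start as simply as this (the contract shape here is hypothetical):

```typescript
// Hypothetical minimal contract shape; the real IU contract
// presumably carries more than name/type pairs.
interface ContractField { name: string; type: string; }
interface IUContract { name: string; fields: ContractField[]; }

// Render a TypeScript interface declaration from the contract, so
// generated stubs get at least a typed surface to implement.
function emitInterface(contract: IUContract): string {
  const body = contract.fields
    .map((f) => `  ${f.name}: ${f.type};`)
    .join("\n");
  return `export interface ${contract.name} {\n${body}\n}`;
}

const src = emitInterface({
  name: "IngestResult",
  fields: [
    { name: "clauseIds", type: "string[]" },
    { name: "docHash", type: "string" },
  ],
});
```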

**M3. Manifest doesn't track deleted files**
If a file is removed from `output_files` between regenerations, the old manifest entry persists. Need a reconciliation step that detects orphaned manifest entries.
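
A reconciliation step could be sketched as (hypothetical manifest shape):

```typescript
// Manifest entries whose paths are no longer in the current output
// set are orphans: keep the live entries, report the rest.
function reconcile(
  manifest: Map<string, string>, // path -> content hash
  outputFiles: string[],
): { kept: Map<string, string>; orphaned: string[] } {
  const current = new Set(outputFiles);
  const kept = new Map<string, string>();
  const orphaned: string[] = [];
  for (const [path, hash] of manifest) {
    if (current.has(path)) kept.set(path, hash);
    else orphaned.push(path);
  }
  return { kept, orphaned };
}

const manifest = new Map([["src/a.ts", "h1"], ["src/b.ts", "h2"]]);
const { kept, orphaned } = reconcile(manifest, ["src/a.ts"]);
```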

**M4. Content store has no garbage collection**
Objects are never deleted. After multiple ingestions, stale clause objects accumulate. Need either reference counting or mark-and-sweep relative to the current graph indices.
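
Mark-and-sweep relative to the graph indices could look like this (a sketch with illustrative IDs):

```typescript
// Mark everything reachable from the current graph roots, then
// report unmarked objects as garbage (e.g. stale clause objects
// left behind by earlier ingestions).
function findGarbage(
  allIds: Set<string>,
  roots: string[],
  refs: Map<string, string[]>, // object id -> ids it references
): Set<string> {
  const marked = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (marked.has(id)) continue;
    marked.add(id);
    for (const ref of refs.get(id) ?? []) stack.push(ref);
  }
  return new Set([...allIds].filter((id) => !marked.has(id)));
}

const garbage = findGarbage(
  new Set(["iu1", "canon1", "clause1", "staleClause"]),
  ["iu1"],
  new Map([["iu1", ["canon1"]], ["canon1", ["clause1"]]]),
);
```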

**M5. Side channel detection is shallow**
The dep-extractor uses regex patterns. It misses indirect patterns like `const { env } = process; env.SECRET`, dynamic imports, and aliased require calls. Acceptable for v1 but should move to AST-based extraction.

**M6. Spec parser doesn't handle ATX heading edge cases**
Lines like `# ` (heading marker with no text), `##text` (no space), or setext-style headings (`Title\n====`) are not handled.
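
A stricter ATX parser, following CommonMark's rule that the `#` run must be followed by a space or end of line, might look like this (a sketch; setext headings would need a separate two-line check):

```typescript
// 1-6 '#' characters, then either end of line (empty heading) or
// whitespace plus the title. "##text" is NOT a heading; "# " is a
// heading with an empty title.
function parseAtxHeading(line: string): { level: number; text: string } | null {
  const m = /^(#{1,6})(?:\s+(.*?)\s*)?$/.exec(line.trim());
  if (!m) return null;
  return { level: m[1].length, text: m[2] ?? "" };
}

const plain = parseAtxHeading("## Section");
const noSpace = parseAtxHeading("##text"); // null: no space after '#' run
const empty = parseAtxHeading("# ");       // empty-titled heading
```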

### Low Priority

**L1. No .gitignore**
The project is missing a `.gitignore` for `node_modules/`, `dist/`, and temp `.phoenix/` directories.

**L2. Demo creates temp directories without cleanup**
`mkdtempSync` in the demo creates temp dirs that are never cleaned up.

**L3. Store uses synchronous fs operations**
All file I/O is synchronous (`readFileSync`, `writeFileSync`). Fine for a CLI tool, but should be async if this becomes a long-running server.

**L4. No input validation on store operations**
`ContentStore.put()` and `SpecStore.ingestDocument()` don't validate inputs. A non-hex ID or missing file would produce cryptic errors.
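
A minimal guard, assuming 64-char sha256 hex IDs:

```typescript
// Reject malformed IDs up front with a clear message instead of a
// cryptic downstream failure. (64 lowercase hex chars assumed,
// matching sha256 content IDs.)
function assertContentId(id: string): void {
  if (!/^[0-9a-f]{64}$/.test(id)) {
    throw new Error(`invalid content id: expected 64 lowercase hex chars, got "${id}"`);
  }
}

let rejected = false;
try {
  assertContentId("not-a-hash");
} catch {
  rejected = true;
}
assertContentId("a".repeat(64)); // valid: does not throw
```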

**L5. Warm hasher performance**
`computeWarmHashes` iterates all canonical nodes for every clause (O(clauses × nodes)). Should build an index of clause→nodes first.
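
Building the index first reduces this to one pass over the nodes plus O(1) lookups per clause (node shape is hypothetical):

```typescript
// Hypothetical minimal node shape; the real canonical nodes carry
// more fields than id + source_clause_ids.
interface CanonNode { id: string; source_clause_ids: string[]; }

// One pass over the nodes builds clause -> node-ids; each clause
// lookup afterwards is O(1) instead of a scan of every node.
function buildClauseIndex(nodes: CanonNode[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const node of nodes) {
    for (const clauseId of node.source_clause_ids) {
      const bucket = index.get(clauseId) ?? [];
      bucket.push(node.id);
      index.set(clauseId, bucket);
    }
  }
  return index;
}

const index = buildClauseIndex([
  { id: "n1", source_clause_ids: ["c1", "c2"] },
  { id: "n2", source_clause_ids: ["c2"] },
]);
```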

---

## 📊 Coverage Gaps

| Component | Unit Tests | Functional Tests | Gap |
|-----------|-----------|-----------------|-----|
| Normalizer | ✅ 12 | — | Missing: code blocks, nested markdown |
| Spec Parser | ✅ 11 | ✅ via ingestion | Missing: setext headings, pre-heading content |
| Semhash | ✅ 9 | — | — |
| Diff | ✅ 7 | ✅ via ingestion | Missing: large-scale diff (100+ clauses) |
| Canonicalizer | ✅ 13 | ✅ via canonicalization | — |
| Warm Hasher | ✅ 5 | ✅ via canonicalization | — |
| Classifier | ✅ 7 | ✅ via canonicalization | Missing: D-class exercise |
| D-Rate | ✅ 9 | ✅ via canonicalization | — |
| Bootstrap | ✅ 10 | ✅ via canonicalization | — |
| IU Planner | ✅ 7 | ✅ via IU pipeline | Missing: large spec with many clusters |
| Regen | ✅ 6 | ✅ via IU pipeline | — |
| Manifest | — | ✅ via IU pipeline | Missing: dedicated unit tests for ManifestManager |
| Drift | ✅ 5 | ✅ via IU pipeline | — |
| Dep Extractor | ✅ 10 | ✅ via IU pipeline | — |
| Boundary Validator | ✅ 12 | ✅ via IU pipeline | — |
| Content Store | — | ✅ via ingestion | Missing: dedicated unit tests |
| Spec Store | — | ✅ via ingestion | Missing: dedicated unit tests |
| Canonical Store | — | ✅ via canonicalization | Missing: dedicated unit tests |

---

## 🏗️ Recommendations for Phase D+

1. **Build a Provenance Store** before Evidence/Policy (Phase D) — the evidence engine needs provenance edges to bind evidence to the right graph nodes.

2. **Add a CLI entry point** (`phoenix bootstrap`, `phoenix status`, `phoenix ingest`) — the core logic is all functions/classes but there's no user-facing command.

3. **Add integration tests with the real PRD.md** — run the full A→C2 pipeline against the Phoenix PRD itself as a dogfood test.

4. **Consider property-based testing** for the normalizer and diff engine — these are the foundation and need to be bulletproof.

5. **Add structured logging** — every transformation should emit a structured log event that can reconstruct the provenance graph.