Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
# Phoenix VCS — Project Audit Report

**Date:** 2026-02-17
**Scope:** Phases A, B, C1, C2
**Lines of code:** ~2,450 source, ~1,800 test (4,258 total)
**Tests:** 142 passing across 17 test files (14 unit + 3 functional)

---

## ✅ What's Working Well

1. **Clean architecture** — Models, logic, and storage are well separated: models are pure types, logic is pure functions (easy to test), and stores handle persistence.

2. **Content-addressed design** — Every object (clause, canonical node, IU) is identified by a hash of its content. This is sound and will scale well.

3. **Test coverage** — Every module has unit tests. Three functional tests validate end-to-end pipelines. All 142 pass in ~110ms.

4. **TypeScript strict mode** — `strict: true` is enabled; the project compiles cleanly with no suppressions in source code.

5. **Provenance chain** — The traceability from spec lines → clauses → canonical nodes → IUs → generated files → boundary validation is fully connected.

---

## 🔧 Issues Fixed During Audit

| # | Issue | Severity | Fix |
|---|-------|----------|-----|
| 1 | **Duplicate boundary diagnostics** — When a package was both forbidden and not in the allowlist, two diagnostics were emitted for the same import. | Medium | Changed to `else if` so the forbidden check takes priority. |
| 2 | **Dead code in classifier** — The D-class branch inside the canon-impact block was unreachable (confidence was always ≥ 0.7, threshold was < 0.6). | Low | Removed dead branch. |
| 3 | **`as any` in tests** — Two test lines used `{} as any` for signal objects. | Low | Replaced with properly typed empty signal objects. |

---

## ⚠️ Issues to Address (Not Yet Fixed)

### High Priority

**H1. No provenance graph persistence**
The PRD specifies a Provenance Graph (Section 2) that records all transformation edges.
Currently, provenance is implicit (canonical nodes have `source_clause_ids`, IUs have `source_canon_ids`), but there's no unified provenance store. Every transformation should emit a provenance edge to a dedicated graph.

**H2. Normalizer doesn't handle code blocks**
Fenced code blocks (` ``` `) are currently processed like regular text — headings and list items inside code blocks get mangled. The parser should skip code-block contents during normalization.

**H3. No pre-heading content handling**
If a spec file has content before the first heading (e.g., a preamble), it's silently discarded by the parser. Only heading-bounded sections are captured. The PRD doesn't explicitly address this, but losing content is wrong.

**H4. Classifier D-class is hard to trigger**
The current classification logic produces D (uncertain) only when `norm_diff > 0.7 || term_ref_delta > 0.7` AND no canonical impact AND no context shift. This is a very narrow window. The D-rate mechanism needs real exercise.

### Medium Priority

**M1. IU planner grouping is greedy**
`clusterNodes()` uses BFS to group all transitively connected nodes. In a large spec, this could collapse too many unrelated requirements into a single giant IU because of loose term-overlap chains (A links to B links to C...). Should add a max-cluster-size or minimum-link-weight threshold.

**M2. Regeneration is stub-only**
The regen engine only produces function stubs. This is expected for v1, but the stub quality is minimal — no imports, no types, no contract enforcement in the generated code. The stubs should at least generate TypeScript interfaces from the IU contract.

**M3. Manifest doesn't track deleted files**
If a file is removed from `output_files` between regenerations, the old manifest entry persists. Need a reconciliation step that detects orphaned manifest entries.

**M4. Content store has no garbage collection**
Objects are never deleted.
After multiple ingestions, stale clause objects accumulate. Need either reference counting or mark-and-sweep relative to the current graph indices.

**M5. Side-channel detection is shallow**
The dep-extractor uses regex patterns. It misses indirect patterns like `const { env } = process; env.SECRET`, dynamic imports, and aliased require calls. Acceptable for v1, but it should move to AST-based extraction.

**M6. Spec parser doesn't handle ATX heading edge cases**
Lines like `# ` (heading marker with no text), `##text` (no space), or setext-style headings (`Title\n====`) are not handled.

### Low Priority

**L1. No .gitignore**
The project is missing a `.gitignore` for `node_modules/`, `dist/`, and temp `.phoenix/` directories.

**L2. Demo creates temp directories without cleanup**
`mkdtempSync` in the demo creates temp dirs that are never cleaned up.

**L3. Store uses synchronous fs operations**
All file I/O is synchronous (`readFileSync`, `writeFileSync`). Fine for a CLI tool, but it should be async if this becomes a long-running server.

**L4. No input validation on store operations**
`ContentStore.put()` and `SpecStore.ingestDocument()` don't validate inputs. A non-hex ID or a missing file would produce cryptic errors.

**L5. Warm hasher performance**
`computeWarmHashes` iterates all canonical nodes for every clause (O(clauses × nodes)). Should build an index of clause → nodes first.
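The indexing fix for L5 can be sketched briefly. This is a minimal, hypothetical sketch — `CanonNode` borrows the `source_clause_ids` field named in H1, but `indexNodesByClause` and the exact shapes are illustrative, not the project's actual API:

```typescript
// Hypothetical shape — `source_clause_ids` follows the linkage named in H1;
// the rest is illustrative, not the project's actual API.
interface CanonNode {
  id: string;
  source_clause_ids: string[];
}

// Build the clause-id → nodes index once, so each clause's warm hash only
// visits its own nodes: O(nodes + links) to build, O(1) lookup per clause,
// instead of an O(clauses × nodes) scan.
function indexNodesByClause(nodes: CanonNode[]): Map<string, CanonNode[]> {
  const index = new Map<string, CanonNode[]>();
  for (const node of nodes) {
    for (const clauseId of node.source_clause_ids) {
      const bucket = index.get(clauseId) ?? [];
      bucket.push(node);
      index.set(clauseId, bucket);
    }
  }
  return index;
}
```

`computeWarmHashes` would then look up `index.get(clause.id)` per clause instead of rescanning every canonical node.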
---

## 📊 Coverage Gaps

| Component | Unit Tests | Functional Tests | Gap |
|-----------|-----------|-----------------|-----|
| Normalizer | ✅ 12 | — | Missing: code blocks, nested markdown |
| Spec Parser | ✅ 11 | ✅ via ingestion | Missing: setext headings, pre-heading content |
| Semhash | ✅ 9 | — | — |
| Diff | ✅ 7 | ✅ via ingestion | Missing: large-scale diff (100+ clauses) |
| Canonicalizer | ✅ 13 | ✅ via canonicalization | — |
| Warm Hasher | ✅ 5 | ✅ via canonicalization | — |
| Classifier | ✅ 7 | ✅ via canonicalization | Missing: D-class exercise |
| D-Rate | ✅ 9 | ✅ via canonicalization | — |
| Bootstrap | ✅ 10 | ✅ via canonicalization | — |
| IU Planner | ✅ 7 | ✅ via IU pipeline | Missing: large spec with many clusters |
| Regen | ✅ 6 | ✅ via IU pipeline | — |
| Manifest | — | ✅ via IU pipeline | Missing: dedicated unit tests for ManifestManager |
| Drift | ✅ 5 | ✅ via IU pipeline | — |
| Dep Extractor | ✅ 10 | ✅ via IU pipeline | — |
| Boundary Validator | ✅ 12 | ✅ via IU pipeline | — |
| Content Store | — | ✅ via ingestion | Missing: dedicated unit tests |
| Spec Store | — | ✅ via ingestion | Missing: dedicated unit tests |
| Canonical Store | — | ✅ via canonicalization | Missing: dedicated unit tests |

---

## 🏗️ Recommendations for Phase D+

1. **Build a Provenance Store** before Evidence/Policy (Phase D) — the evidence engine needs provenance edges to bind evidence to the right graph nodes.

2. **Add a CLI entry point** (`phoenix bootstrap`, `phoenix status`, `phoenix ingest`) — the core logic is all functions/classes, but there's no user-facing command.

3. **Add integration tests with the real PRD.md** — run the full A→C2 pipeline against the Phoenix PRD itself as a dogfood test.

4. **Consider property-based testing** for the normalizer and diff engine — these are the foundation and need to be bulletproof.

5. **Add structured logging** — every transformation should emit a structured log event that can reconstruct the provenance graph.
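The last recommendation can be sketched concretely. The event shape below is hypothetical (`TransformEvent`, its `kind` values, and `edgesFrom` are illustrative names, not the project's API); the point is that if every transformation emits one event naming the content-addressed IDs it consumed and produced, replaying the log reconstructs the provenance edges:

```typescript
// Hypothetical event shape — names are illustrative, not the project's API.
interface TransformEvent {
  kind: "normalize" | "canonicalize" | "plan_iu" | "regen";
  inputs: string[];  // content-addressed ids consumed
  outputs: string[]; // content-addressed ids produced
  at: string;        // ISO-8601 timestamp
}

// Replaying the event log yields one provenance edge per (input, output) pair,
// which is exactly what a Provenance Store (recommendation 1) would persist.
function edgesFrom(events: TransformEvent[]): Array<[string, string]> {
  const edges: Array<[string, string]> = [];
  for (const event of events) {
    for (const from of event.inputs) {
      for (const to of event.outputs) {
        edges.push([from, to]);
      }
    }
  }
  return edges;
}
```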