Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
# Phoenix VCS — Project Audit Report

**Date:** 2026-02-17
**Scope:** Phases A, B, C1, C2
**Lines of code:** ~2,450 source, ~1,800 test (4,258 total)
**Tests:** 142 passing across 17 test files (14 unit + 3 functional)

---

## ✅ What's Working Well

1. **Clean architecture** — Models, logic, and storage are well separated: models are pure types, logic is pure functions (easy to test), and stores handle persistence.

2. **Content-addressed design** — Every object (clause, canonical node, IU) is identified by a hash of its content. This is sound and will scale well.

3. **Test coverage** — Every module has unit tests. Three functional tests validate end-to-end pipelines. All 142 pass in ~110ms.

4. **TypeScript strict mode** — `strict: true` is enabled; the project compiles cleanly with no suppressions in source code.

5. **Provenance chain** — The traceability from spec lines → clauses → canonical nodes → IUs → generated files → boundary validation is fully connected.

---

## 🔧 Issues Fixed During Audit

| # | Issue | Severity | Fix |
|---|-------|----------|-----|
| 1 | **Duplicate boundary diagnostics** — When a package was both forbidden and not in the allowlist, two diagnostics were emitted for the same import. | Medium | Changed to `else if` so the forbidden check takes priority. |
| 2 | **Dead code in classifier** — The D-class branch inside the canon-impact block was unreachable (confidence was always ≥ 0.7, threshold was < 0.6). | Low | Removed dead branch. |
| 3 | **`as any` in tests** — Two test lines used `{} as any` for signal objects. | Low | Replaced with properly typed empty signal objects. |

---

## ⚠️ Issues to Address (Not Yet Fixed)

### High Priority

**H1. No provenance graph persistence**
The PRD specifies a Provenance Graph (Section 2) that records all transformation edges.
Currently, provenance is implicit (canonical nodes have `source_clause_ids`, IUs have `source_canon_ids`), but there's no unified provenance store. Every transformation should emit a provenance edge to a dedicated graph.

**H2. Normalizer doesn't handle code blocks**
Fenced code blocks (` ``` `) are currently processed like regular text — headings and list items inside code blocks get mangled. The parser should skip code-block contents during normalization.

**H3. No pre-heading content handling**
If a spec file has content before the first heading (e.g., a preamble), it's silently discarded by the parser. Only heading-bounded sections are captured. The PRD doesn't explicitly address this, but losing content is wrong.

**H4. Classifier D-class is hard to trigger**
The current classification logic produces D (uncertain) only when `norm_diff > 0.7 || term_ref_delta > 0.7` AND no canonical impact AND no context shift. This is a very narrow window. The D-rate mechanism needs real exercise.

### Medium Priority

**M1. IU planner grouping is greedy**
`clusterNodes()` uses BFS to group all transitively connected nodes. In a large spec, this could collapse too many unrelated requirements into a single giant IU because of loose term-overlap chains (A links to B links to C...). Should add a max-cluster-size or minimum-link-weight threshold.

**M2. Regeneration is stub-only**
The regen engine only produces function stubs. This is expected for v1, but the stub quality is minimal — no imports, no types, no contract enforcement in the generated code. The stubs should at least generate TypeScript interfaces from the IU contract.

**M3. Manifest doesn't track deleted files**
If a file is removed from `output_files` between regenerations, the old manifest entry persists. Need a reconciliation step that detects orphaned manifest entries.

**M4. Content store has no garbage collection**
Objects are never deleted.
After multiple ingestions, stale clause objects accumulate. Need either reference counting or mark-and-sweep relative to the current graph indices.

**M5. Side-channel detection is shallow**
The dep-extractor uses regex patterns. It misses indirect patterns like `const { env } = process; env.SECRET`, dynamic imports, and aliased require calls. Acceptable for v1, but it should move to AST-based extraction.

**M6. Spec parser doesn't handle ATX heading edge cases**
Lines like `# ` (heading marker with no text), `##text` (no space), or setext-style headings (`Title\n====`) are not handled.

### Low Priority

**L1. No .gitignore**
The project is missing a `.gitignore` for `node_modules/`, `dist/`, and temp `.phoenix/` directories.

**L2. Demo creates temp directories without cleanup**
`mkdtempSync` in the demo creates temp dirs that are never cleaned up.

**L3. Store uses synchronous fs operations**
All file I/O is synchronous (`readFileSync`, `writeFileSync`). Fine for a CLI tool, but it should be async if this becomes a long-running server.

**L4. No input validation on store operations**
`ContentStore.put()` and `SpecStore.ingestDocument()` don't validate inputs. A non-hex ID or a missing file would produce cryptic errors.

**L5. Warm hasher performance**
`computeWarmHashes` iterates all canonical nodes for every clause (O(clauses × nodes)). Should build an index of clause → nodes first.
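The indexing fix for L5 can be sketched briefly. This is a minimal, hypothetical sketch — `CanonNode` borrows the `source_clause_ids` field named in H1, but `indexNodesByClause` and the exact shapes are illustrative, not the project's actual API:

```typescript
// Hypothetical shape — `source_clause_ids` follows the linkage named in H1;
// the rest is illustrative, not the project's actual API.
interface CanonNode {
  id: string;
  source_clause_ids: string[];
}

// Build the clause-id → nodes index once, so each clause's warm hash only
// visits its own nodes: O(nodes + links) to build, O(1) lookup per clause,
// instead of an O(clauses × nodes) scan.
function indexNodesByClause(nodes: CanonNode[]): Map<string, CanonNode[]> {
  const index = new Map<string, CanonNode[]>();
  for (const node of nodes) {
    for (const clauseId of node.source_clause_ids) {
      const bucket = index.get(clauseId) ?? [];
      bucket.push(node);
      index.set(clauseId, bucket);
    }
  }
  return index;
}
```

`computeWarmHashes` would then look up `index.get(clause.id)` per clause instead of rescanning every canonical node.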
---

## 📊 Coverage Gaps

| Component | Unit Tests | Functional Tests | Gap |
|-----------|-----------|-----------------|-----|
| Normalizer | ✅ 12 | — | Missing: code blocks, nested markdown |
| Spec Parser | ✅ 11 | ✅ via ingestion | Missing: setext headings, pre-heading content |
| Semhash | ✅ 9 | — | — |
| Diff | ✅ 7 | ✅ via ingestion | Missing: large-scale diff (100+ clauses) |
| Canonicalizer | ✅ 13 | ✅ via canonicalization | — |
| Warm Hasher | ✅ 5 | ✅ via canonicalization | — |
| Classifier | ✅ 7 | ✅ via canonicalization | Missing: D-class exercise |
| D-Rate | ✅ 9 | ✅ via canonicalization | — |
| Bootstrap | ✅ 10 | ✅ via canonicalization | — |
| IU Planner | ✅ 7 | ✅ via IU pipeline | Missing: large spec with many clusters |
| Regen | ✅ 6 | ✅ via IU pipeline | — |
| Manifest | — | ✅ via IU pipeline | Missing: dedicated unit tests for ManifestManager |
| Drift | ✅ 5 | ✅ via IU pipeline | — |
| Dep Extractor | ✅ 10 | ✅ via IU pipeline | — |
| Boundary Validator | ✅ 12 | ✅ via IU pipeline | — |
| Content Store | — | ✅ via ingestion | Missing: dedicated unit tests |
| Spec Store | — | ✅ via ingestion | Missing: dedicated unit tests |
| Canonical Store | — | ✅ via canonicalization | Missing: dedicated unit tests |

---

## 🏗️ Recommendations for Phase D+

1. **Build a Provenance Store** before Evidence/Policy (Phase D) — the evidence engine needs provenance edges to bind evidence to the right graph nodes.

2. **Add a CLI entry point** (`phoenix bootstrap`, `phoenix status`, `phoenix ingest`) — the core logic is all functions/classes, but there's no user-facing command.

3. **Add integration tests with the real PRD.md** — run the full A→C2 pipeline against the Phoenix PRD itself as a dogfood test.

4. **Consider property-based testing** for the normalizer and diff engine — these are the foundation and need to be bulletproof.

5. **Add structured logging** — every transformation should emit a structured log event that can reconstruct the provenance graph.
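The last recommendation can be sketched concretely. The event shape below is hypothetical (`TransformEvent`, its `kind` values, and `edgesFrom` are illustrative names, not the project's API); the point is that if every transformation emits one event naming the content-addressed IDs it consumed and produced, replaying the log reconstructs the provenance edges:

```typescript
// Hypothetical event shape — names are illustrative, not the project's API.
interface TransformEvent {
  kind: "normalize" | "canonicalize" | "plan_iu" | "regen";
  inputs: string[];  // content-addressed ids consumed
  outputs: string[]; // content-addressed ids produced
  at: string;        // ISO-8601 timestamp
}

// Replaying the event log yields one provenance edge per (input, output) pair,
// which is exactly what a Provenance Store (recommendation 1) would persist.
function edgesFrom(events: TransformEvent[]): Array<[string, string]> {
  const edges: Array<[string, string]> = [];
  for (const event of events) {
    for (const from of event.inputs) {
      for (const to of event.outputs) {
        edges.push([from, to]);
      }
    }
  }
  return edges;
}
```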