Reference implementation for the Phoenix Architecture. Work in progress. aicoding.leaflet.pub/
# Phoenix Architecture Prompt Optimization — Experiment Program

You are an autonomous research agent optimizing the architecture target prompts so that Phoenix generates working multi-resource REST APIs from specs.

## Rules

1. **Edit ONLY `src/architectures/sqlite-web-api.ts`** — the system prompt extension and code examples
2. **Run `npx tsx experiments/eval-runner-arch.ts --skip-bootstrap`** to test changes (uses existing generated code)
3. **When you want to test with full regeneration**, run `npx tsx experiments/eval-runner-arch.ts` (takes ~2-3 min for LLM calls)
4. **Parse the score** from the last line: `val_score=X.XXXX`
5. **If score improved**: `git add src/architectures/sqlite-web-api.ts && git commit -m "arch-experiment: <description> score=X.XXXX"`
6. **If score decreased or unchanged**: `git checkout src/architectures/sqlite-web-api.ts` (revert)
7. **Never stop to ask the human**
8. **Never edit the eval runner or the spec**

## Current Score: 42% (8/19 tests passing)

## What Works (8/19)

- POST /categories creates category ✓
- POST /categories rejects empty name ✓
- GET /categories returns array ✓
- POST /todos creates todo without category ✓
- POST /todos rejects invalid category_id ✓
- POST /todos rejects empty title ✓
- GET /todos/999 returns 404 ✓
- GET /todos?completed=0 filters incomplete ✓

## What Fails (11/19)

- POST /todos creates todo with category — likely category_id not saved
- GET /todos returns todos with category_name — SQL JOIN missing
- GET /todos/:id returns todo with category_name — SQL JOIN missing
- PATCH /todos/:id marks completed — patch not working
- GET /todos?completed=1 filters completed — filter broken
- GET /todos?category_id=N filters by category — filter broken
- GET /stats returns counts — stats endpoint missing or wrong
- GET /stats includes by_category — stats endpoint missing
- DELETE /todos/:id returns 204 — delete broken
- DELETE /categories/:id with todos returns 400 — cascade check missing
- DELETE /categories/:id without todos returns 204 — delete broken

## Key Issues to Fix via Prompt Engineering

### 1. SQL JOINs for related data

The generated todo queries use `SELECT * FROM todos` but need:

```sql
SELECT todos.*, categories.name AS category_name
FROM todos LEFT JOIN categories ON todos.category_id = categories.id
```

Add this pattern to the code examples.

### 2. Query parameter filtering

The spec says `GET /todos?completed=1` should filter. The generated code needs to:

```typescript
const completed = c.req.query('completed');
let query = 'SELECT ... FROM todos LEFT JOIN categories ON ...';
const params: unknown[] = [];
if (completed !== undefined) { query += ' WHERE todos.completed = ?'; params.push(Number(completed)); }
```

### 3. Stats endpoint as a separate module

The stats endpoint is a separate IU. It needs its own Hono router with a `GET /` handler that queries aggregate data.

### 4. Delete with cascade check

DELETE /categories/:id needs to check whether any todos reference the category before deleting.

### 5. Multi-resource relationships

The code example only shows a single resource (notes). Add a second example showing:

- Foreign key relationships
- LEFT JOIN queries
- Cascade protection on delete
- Query parameter filtering

## Strategy

1. First: add a multi-resource code example to the architecture target showing JOINs, filtering, and cascade protection
2. Then: run the full eval to see if the LLM picks up the new patterns
3. Iterate on prompt wording if specific tests still fail

## What You Can Change

In `src/architectures/sqlite-web-api.ts`:

- `SYSTEM_PROMPT_EXTENSION` — the architectural rules
- `CODE_EXAMPLES` — the few-shot examples (most powerful lever)
- Both strings are interpolated into the LLM prompt at generation time

## Cost

Each full eval run costs ~$0.05-0.15 in API calls (3 IU generations + canonicalization).
Keep experiments focused. 10-15 experiments should be enough.
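Issues 1 and 2 share one root pattern: always LEFT JOIN categories so `category_name` is present, and append WHERE clauses only for filters that were actually supplied. A framework-agnostic sketch of that pattern, usable as seed material for the new `CODE_EXAMPLES` entry, is below; the function name `buildTodoListQuery` and the filter shape are illustrative, not taken from the current architecture file.

```typescript
// Illustrative sketch: builds the list-todos SQL with a LEFT JOIN for
// category_name and optional WHERE clauses driven by the supplied filters.
interface TodoFilters {
  completed?: string;    // raw query-string value, e.g. "0" or "1"
  category_id?: string;
}

function buildTodoListQuery(filters: TodoFilters): { sql: string; params: number[] } {
  let sql =
    'SELECT todos.*, categories.name AS category_name ' +
    'FROM todos LEFT JOIN categories ON todos.category_id = categories.id';
  const where: string[] = [];
  const params: number[] = [];
  if (filters.completed !== undefined) {
    where.push('todos.completed = ?');
    params.push(Number(filters.completed));
  }
  if (filters.category_id !== undefined) {
    where.push('todos.category_id = ?');
    params.push(Number(filters.category_id));
  }
  if (where.length > 0) sql += ' WHERE ' + where.join(' AND ');
  return { sql, params };
}
```

In a Hono handler this would be fed from `c.req.query('completed')` and `c.req.query('category_id')`, with `sql` and `params` handed to the SQLite driver's prepared-statement API.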
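The cascade check from issue 4 is a count-before-delete. Here is a minimal sketch with the driver abstracted behind a tiny interface so the control flow is visible on its own; the `Db` interface and `deleteCategoryHandler` name are hypothetical, not part of the generated code.

```typescript
// Illustrative sketch of cascade protection for DELETE /categories/:id:
// refuse with 400 while any todo still references the category,
// otherwise delete and return 204.
interface Db {
  countTodosInCategory(categoryId: number): number; // e.g. SELECT COUNT(*) FROM todos WHERE category_id = ?
  deleteCategory(categoryId: number): void;         // e.g. DELETE FROM categories WHERE id = ?
}

function deleteCategoryHandler(db: Db, categoryId: number): { status: 204 | 400 } {
  if (db.countTodosInCategory(categoryId) > 0) {
    return { status: 400 }; // todos still reference this category
  }
  db.deleteCategory(categoryId);
  return { status: 204 };
}
```

This directly targets the two failing DELETE /categories tests: 400 when todos exist, 204 when none do.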
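For the stats IU from issue 3, one plausible shape is two aggregate queries plus a function that assembles the JSON body. Everything below is an assumption to be checked against the spec: the SQL constants, the field names (`total`, `completed`, `incomplete`, `by_category`), and the `buildStatsBody` helper are all hypothetical.

```typescript
// Illustrative sketch of the stats endpoint's data flow. The response field
// names here are assumptions; match them to what the spec's stats tests expect.
const TOTALS_SQL =
  'SELECT COUNT(*) AS total, COALESCE(SUM(completed), 0) AS completed FROM todos';
const BY_CATEGORY_SQL =
  'SELECT categories.name, COUNT(todos.id) AS count ' +
  'FROM categories LEFT JOIN todos ON todos.category_id = categories.id ' +
  'GROUP BY categories.id';

interface CategoryCount { name: string; count: number }

function buildStatsBody(
  totals: { total: number; completed: number },
  byCategory: CategoryCount[],
) {
  return {
    total: totals.total,
    completed: totals.completed,
    incomplete: totals.total - totals.completed,
    by_category: byCategory,
  };
}
```

In the generated module this would sit behind a Hono router's `GET /` handler, with the two SQL constants run through the driver and their rows passed to `buildStatsBody`.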