Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Full pipeline with web interface generates cleanly and all CRUD tests
pass. Increased bootstrap timeout to 15min for 4-IU specs with UI.
Added Web Interface section to todo spec. Phoenix now generates a
complete single-page HTML app with inline CSS/JS that calls the API
via fetch(). Web/UI modules mount at root (/). Architecture-aware
stub fallback produces valid Hono routers when LLM fails.
Two prompt changes:
1. Enforce snake_case for column names and JSON keys (category_id not
categoryid, by_category not bycategory)
2. Stats endpoint at /todos/stats (natural REST sub-resource)
All 19 tests pass: categories CRUD, todos CRUD with FK validation,
LEFT JOIN with category_name, query filtering (completed, category_id),
stats with by_category aggregation, cascade delete protection.
Full pipeline: spec → canonical graph → IUs → working multi-resource
REST API with SQLite, foreign keys, JOINs, filtering, and validation.
Root cause: the typecheck-retry loop couldn't resolve ../../db.js
because shared files were written by scaffold AFTER code generation.
The LLM would "fix" the import error by creating its own Database.
Now: shared files + package.json + npm install happen BEFORE codegen.
Also added mandatory import block at top of user prompt and multi-
resource code example with JOINs, filtering, cascade protection.
Imports now correct (db from ../../db.js). Score 32% on hard spec —
remaining failures are logic issues (JOINs, stats, filtering), not
import issues. Ready for autoresearch prompt optimization.
Expanded todo spec: categories with FK relationships, query filtering,
stats endpoint. Fixed route mounting to derive paths from IU names.
Strengthened DB import rules in architecture prompt.
Score: 42% (8/19) — categories CRUD works, todos partially work.
Remaining: JOINs, filtering, stats, delete cascade. Ready for
autoresearch prompt optimization.
Automated test harness that bootstraps the todo-app from scratch,
starts the server, runs 10 HTTP tests (create, list, get, update,
delete, validation, 404), and scores pass rate.
Score: 10/10 (100%) — reproducible across clean bootstraps.
This is the autoresearch eval function for architecture prompt tuning.
Fixed three issues:
1. DB sharing: strengthened architecture prompt to forbid new Database(),
require import from ../../db.js
2. Route consolidation: simplified spec to one section per resource,
producing 1 IU instead of 6 fragmented modules
3. Data model context: prompt builder now includes DEFINITION/CONTEXT
nodes from other sections so LLM sees full schema
All CRUD operations verified working:
- POST /todos → 201 with Zod validation
- GET /todos → 200, ordered by created_at
- GET /todos/:id → 200 or 404
- PATCH /todos/:id → updates title/completed
- DELETE /todos/:id → 204
- Validation: empty title → 400, missing todo → 404