An AI agent built to do Ralph loops - plan mode for planning and ralph mode for implementing.

docs: add human test plan for V2 Phase 5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+129
docs/test-plans/2026-02-12-v2-phase5.md
# Human Test Plan: V2 Phase 5 Agent Intelligence & Context Enhancement

## Prerequisites

- Rust toolchain 1.85.0+ installed
- Working directory: `/Users/david.hagerty/code/personal/rustagent/new-directions`
- Branch: `new-directions`
- Verify automated tests pass: `cargo test`
- Verify compilation: `cargo check`
- Verify no clippy warnings: `cargo clippy`

## Phase 1: AGENTS.md Parser Enhancement

| Step | Action | Expected |
|------|--------|----------|
| 1.1 | Run `cargo test agents_md` | All tests pass, including heading summary and relative path tests |
| 1.2 | Create a test project with nested AGENTS.md files at root, `src/`, and `src/auth/`. Run `cargo test agents_md` | `resolve_agents_md` returns relative paths (no absolute prefixes), ordered closest-to-file first, with heading line counts |
| 1.3 | Run `cargo test context` and look for ReadAgentsMdTool tests | Tests pass for both directory path input (`src/auth`) and full file path input (`src/auth/AGENTS.md`) |

## Phase 2: ContextBuilder Enhancement

| Step | Action | Expected |
|------|--------|----------|
| 2.1 | Run `cargo test context` | All tests pass, including dependency status and previous attempt rendering |
| 2.2 | Inspect test output for `[DEP:DONE]` and `[DEP:PENDING]` markers | System prompt contains correct dependency status lines |
| 2.3 | Inspect test output for `[PREV_ATTEMPT]` section | System prompt includes `## Previous Attempt` and `[PREV_ATTEMPT]` when previous attempt exists, omits section entirely when `None` |
| 2.4 | Run `cargo test context` and look for budget tests | With small budget (e.g., 200 tokens), only required sections (Role, Task, Rules) remain; optional sections (observations, AGENTS.md summaries, decisions) are trimmed |

## Phase 3: Autonomy Levels & Approval Gates

| Step | Action | Expected |
|------|--------|----------|
| 3.1 | Run `cargo test autonomy` | All 19 inline tests pass |
| 3.2 | Verify serde round-trips | `AutonomyLevel` serializes as lowercase (`"full"`, `"supervised"`, `"gated"`), `ApprovalResponse` uses `tag = "action"` with lowercase variants |
| 3.3 | Verify `AutonomyLevel::default()` | Returns `Supervised` |
| 3.4 | Verify `GateChecker::new(Full).check_gate(PlanReview, ...)` | Returns `None` (no gates active) |
| 3.5 | Verify `GateChecker::new(Supervised).check_gate(PlanReview, ...)` | Returns `Some(ApprovalRequest)` with correct gate and fields |

## Phase 4: Security Scope Enforcement

| Step | Action | Expected |
|------|--------|----------|
| 4.1 | Run `cargo test scope` | All inline unit tests pass |
| 4.2 | Run `cargo test security_scope` | Integration tests pass, verifying built-in profiles (planner/reviewer/researcher deny writes, coder allows writes) |
| 4.3 | Verify deny-takes-precedence: `check_path` with a path matching both allowed and denied patterns | Returns `ScopeCheck::Denied` |
| 4.4 | Verify `read_only` enforcement | All write/create operations denied when `read_only = true` |
| 4.5 | Verify wildcard `*` matching | `*` in `allowed_paths` matches any path; `*` in `allowed_commands` matches any command |

## Phase 5: Code Search Tool

| Step | Action | Expected |
|------|--------|----------|
| 5.1 | Run `cargo test code_search` | All unit and integration tests pass |
| 5.2 | Verify pattern search | Searching for `"fn main"` in a test project returns matching lines with `file_path:line_number: content` format |
| 5.3 | Verify glob filtering | Searching with `file_glob: "*.rs"` limits results to `.rs` files only |
| 5.4 | Verify result capping | With `max_results: 2`, returns exactly 2 results plus a cap notice line |
| 5.5 | Verify binary file handling | Binary files are skipped without errors |
| 5.6 | Verify directory skipping | Files in `node_modules/`, `target/`, `.git/`, `.jj/` directories are not searched |
| 5.7 | Verify tool registration | `cargo check` succeeds (compiler enforces `CodeSearchTool` is registered in V2 registry with `project_root` parameter) |

## Phase 6: Agent Error Recovery & Task Reassignment

| Step | Action | Expected |
|------|--------|----------|
| 6.1 | Run `cargo test orchestrator` | All tests pass, including lifecycle and individual AC tests |
| 6.2 | Verify previous attempt injection | After a task retry, the re-spawned `AgentContext` has `previous_attempt = Some("the error")` |
| 6.3 | Verify first attempt has no previous attempt | On first attempt (no retries), `previous_attempt` is `None` |
| 6.4 | Verify failure cascading | After task A fails permanently, task B (which `DependsOn` A) becomes Blocked with `blocked_reason` referencing A and `metadata["blocker_task_id"]` set to A's ID |
| 6.5 | Verify unrelated task unaffected | Task C (no dependency on A) remains Ready after A fails |
| 6.6 | Verify blocked task recovery | During scheduling, a Blocked task whose blocker has completed transitions back to Ready |
| 6.7 | Verify blocked task stays blocked | A Blocked task whose blocker is still Failed remains Blocked |

## End-to-End Smoke Test

**Purpose:** Validates that all Phase 5 features work together with a running LLM provider and agent.

| Step | Action | Expected |
|------|--------|----------|
| E2E.1 | Start the daemon with a test project: `cargo run -- project add test-project /tmp/test` | Project registered |
| E2E.2 | Create a goal with gated autonomy: `cargo run -- run "Test goal" --profile coder` | Agent starts executing |
| E2E.3 | Observe agent context (via debug logging: `RUST_LOG=rustagent=debug`) | System prompt includes dependency status `[DEP:DONE]`/`[DEP:PENDING]` lines and token budget trimming is active |
| E2E.4 | Verify agent can use `code_search` tool | Agent invokes `code_search` and receives formatted results |
| E2E.5 | Verify SecurityScope enforcement | Agent operating under coder profile has path/command restrictions enforced per `SecurityScope` |
| E2E.6 | Simulate a task failure (force-kill agent mid-task, or set max_retries=1 for a task that errors) | On retry, the new agent context includes `[PREV_ATTEMPT]` with the previous failure description |

## Traceability Matrix

| Acceptance Criterion | Automated Test Location | Manual Step |
|----------------------|------------------------|-------------|
| v2-phase5.AC1.1 | `src/context/agents_md.rs` (inline) | 1.1, 1.2 |
| v2-phase5.AC1.2 | `src/context/agents_md.rs` (inline) | 1.2 |
| v2-phase5.AC1.3 | `src/context/agents_md.rs` (inline) | 1.2 |
| v2-phase5.AC2.1 | `src/context/mod.rs` (inline) | 1.3 |
| v2-phase5.AC2.2 | `src/context/mod.rs` (inline) | 1.3 |
| v2-phase5.AC3.1 | `src/context/mod.rs` (inline) | 2.2 |
| v2-phase5.AC3.2 | `src/context/mod.rs` (inline) | 2.3 |
| v2-phase5.AC3.3 | `src/context/mod.rs` (inline) | 2.3 |
| v2-phase5.AC4.1 | `src/context/mod.rs` (inline) | 2.4 |
| v2-phase5.AC4.2 | `src/context/mod.rs` (inline) | 2.4 |
| v2-phase5.AC5.1 | `src/autonomy.rs` (inline) | 3.1 |
| v2-phase5.AC5.2 | `src/autonomy.rs` (inline) | 3.1 |
| v2-phase5.AC5.3 | `src/autonomy.rs` (inline) | 3.1 |
| v2-phase5.AC5.4 | `src/autonomy.rs` (inline) | 3.3 |
| v2-phase5.AC6.1 | `src/autonomy.rs` (inline) | 3.1 |
| v2-phase5.AC6.2 | `src/autonomy.rs` (inline) | 3.2 |
| v2-phase5.AC14.1 | `src/autonomy.rs` (inline) | 3.4 |
| v2-phase5.AC14.2 | `src/autonomy.rs` (inline) | 3.5 |
| v2-phase5.AC7.1 | `src/security/scope.rs` (inline) | 4.1 |
| v2-phase5.AC7.2 | `src/security/scope.rs` (inline) | 4.3 |
| v2-phase5.AC7.3 | `src/security/scope.rs` (inline) | 4.4 |
| v2-phase5.AC7.4 | `src/security/scope.rs` (inline) | 4.4 |
| v2-phase5.AC7.5 | `src/security/scope.rs` (inline) | 4.5 |
| v2-phase5.AC8.1 | `src/security/scope.rs` (inline) | 4.1 |
| v2-phase5.AC8.2 | `src/security/scope.rs` (inline) | 4.1 |
| v2-phase5.AC8.3 | `src/security/scope.rs` (inline) | 4.5 |
| v2-phase5.AC9.1 | `src/tools/search.rs` (inline) + `tests/code_search_test.rs` | 5.2 |
| v2-phase5.AC9.2 | `src/tools/search.rs` (inline) + `tests/code_search_test.rs` | 5.3 |
| v2-phase5.AC9.3 | `src/tools/search.rs` (inline) + `tests/code_search_test.rs` | 5.4 |
| v2-phase5.AC9.4 | `src/tools/search.rs` (inline) | 5.5 |
| v2-phase5.AC10.1 | `cargo check` (compiler) | 5.7 |
| v2-phase5.AC10.2 | `src/tools/search.rs` (inline) | 5.1 |
| v2-phase5.AC11.1 | `tests/orchestrator_test.rs` | 6.2 |
| v2-phase5.AC11.2 | `tests/orchestrator_test.rs` | 6.3 |
| v2-phase5.AC12.1 | `tests/orchestrator_test.rs` | 6.4 |
| v2-phase5.AC12.2 | `tests/orchestrator_test.rs` | 6.5 |
| v2-phase5.AC13.1 | `tests/orchestrator_test.rs` | 6.6 |
| v2-phase5.AC13.2 | `tests/orchestrator_test.rs` | 6.7 |

All 38 acceptance criteria accounted for: 38 by automated tests, 7 by end-to-end manual verification steps.
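
As a reference for manual steps 4.3 to 4.5, the expected scope semantics can be sketched in a few lines of Rust. This is a minimal illustration and not the project's implementation: `ScopeCheck::Allowed`, the `denied_paths` field, the `is_write` parameter, and the prefix-style `*` matching rule are assumptions made for the example. Only `check_path`, `ScopeCheck::Denied`, `read_only`, and `allowed_paths` are named in the plan itself.

```rust
// Hypothetical sketch of the behaviour described in steps 4.3-4.5.
// Names marked "assumed" do not appear in the test plan.

#[derive(Debug, PartialEq)]
enum ScopeCheck {
    Allowed, // assumed variant name
    Denied,
}

struct SecurityScope {
    read_only: bool,
    allowed_paths: Vec<String>,
    denied_paths: Vec<String>, // assumed field name
}

/// Assumed matching rule: `*` matches anything, a trailing `*` is a
/// prefix match, and any other pattern must match exactly.
fn matches(pattern: &str, path: &str) -> bool {
    if pattern == "*" {
        return true;
    }
    match pattern.strip_suffix('*') {
        Some(prefix) => path.starts_with(prefix),
        None => pattern == path,
    }
}

impl SecurityScope {
    /// `is_write` is an assumed parameter used to model step 4.4.
    fn check_path(&self, path: &str, is_write: bool) -> ScopeCheck {
        // Step 4.4: a read-only scope denies every write.
        if is_write && self.read_only {
            return ScopeCheck::Denied;
        }
        // Step 4.3: a deny match wins even if an allow pattern also matches.
        if self.denied_paths.iter().any(|p| matches(p, path)) {
            return ScopeCheck::Denied;
        }
        // Step 4.5: `*` in `allowed_paths` matches any path.
        if self.allowed_paths.iter().any(|p| matches(p, path)) {
            return ScopeCheck::Allowed;
        }
        // Anything not explicitly allowed is denied in this sketch.
        ScopeCheck::Denied
    }
}

fn main() {
    let scope = SecurityScope {
        read_only: false,
        allowed_paths: vec!["*".to_string()],
        denied_paths: vec!["secrets/*".to_string()],
    };
    println!("{:?}", scope.check_path("src/main.rs", true)); // prints "Allowed"
    println!("{:?}", scope.check_path("secrets/key.pem", false)); // prints "Denied"
}
```

The ordering (read-only check, then deny list, then allow list) is one way to obtain deny-takes-precedence; the real `src/security/scope.rs` may order or match differently, so treat the inline tests run by `cargo test scope` as the source of truth.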