Validate built features through conversational testing with persistent state. Creates a UAT.md that tracks test progress, survives `/clear`, and feeds gaps into `/gsd:plan-phase --gaps`.

User tests, Claude records. One test at a time. Plain text responses.

**Show expected, ask if reality matches.** Claude presents what SHOULD happen. User confirms or describes what's different.

- "yes" / "y" / "next" / empty → pass
- Anything else → logged as issue, severity inferred

No Pass/Fail buttons. No severity questions. Just: "Here's what should happen. Does it?"

If $ARGUMENTS contains a phase number, load context:

```bash
INIT=$(node "C:/Users/yaoji/.claude/get-shit-done/bin/gsd-tools.cjs" init verify-work "${PHASE_ARG}")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
```

Parse JSON for: `planner_model`, `checker_model`, `commit_docs`, `phase_found`, `phase_dir`, `phase_number`, `phase_name`, `has_verification`.

**First: Check for active UAT sessions**

```bash
find .planning/phases -name "*-UAT.md" -type f 2>/dev/null | head -5
```

**If active sessions exist AND no $ARGUMENTS provided:**

Read each file's frontmatter (status, phase) and Current Test section. Display inline:

```
## Active UAT Sessions

| # | Phase       | Status  | Current Test        | Progress |
|---|-------------|---------|---------------------|----------|
| 1 | 04-comments | testing | 3. Reply to Comment | 2/6      |
| 2 | 05-auth     | testing | 1. Login Form       | 0/4      |

Reply with a number to resume, or provide a phase number to start new.
```

Wait for user response.

- If user replies with a session number (1, 2) → Load that file, go to `resume_from_file`
- If user replies with a phase number → Treat as new session, go to `create_uat_file`

**If active sessions exist AND $ARGUMENTS provided:**

Check if a session exists for that phase. If yes, offer to resume or restart. If no, continue to `create_uat_file`.

**If no active sessions AND no $ARGUMENTS:**

```
No active UAT sessions.
Provide a phase number to start testing (e.g., /gsd:verify-work 4)
```

**If no active sessions AND $ARGUMENTS provided:**

Continue to `create_uat_file`.

**Find what to test:**

Use `phase_dir` from init (or run init if not already done).

```bash
ls "$phase_dir"/*-SUMMARY.md 2>/dev/null
```

Read each SUMMARY.md to extract testable deliverables.

**Extract testable deliverables from SUMMARY.md:**

Parse for:

1. **Accomplishments** - Features/functionality added
2. **User-facing changes** - UI, workflows, interactions

Focus on USER-OBSERVABLE outcomes, not implementation details.

For each deliverable, create a test:

- name: Brief test name
- expected: What the user should see/experience (specific, observable)

Example:

- Accomplishment: "Added comment threading with infinite nesting"
  → Test: "Reply to a Comment"
  → Expected: "Clicking Reply opens an inline composer below the comment. Submitting shows the reply nested under its parent with visual indentation."

Skip internal/non-observable items (refactors, type changes, etc.).

**Cold-start smoke test injection:**

After extracting tests from SUMMARYs, scan the SUMMARY files for modified/created file paths. If ANY path matches these patterns:

`server.ts`, `server.js`, `app.ts`, `app.js`, `index.ts`, `index.js`, `main.ts`, `main.js`, `database/*`, `db/*`, `seed/*`, `seeds/*`, `migrations/*`, `startup*`, `docker-compose*`, `Dockerfile*`

Then **prepend** this test to the test list:

- name: "Cold Start Smoke Test"
- expected: "Kill any running server/service. Clear ephemeral state (temp DBs, caches, lock files). Start the application from scratch. The server boots without errors, any seed/migration completes, and a primary query (health check, homepage load, or basic API call) returns live data."

This catches bugs that only manifest on a fresh start — race conditions in startup sequences, silent seed failures, missing environment setup — which pass against warm state but break in production.
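The trigger check above can be sketched as a two-stage grep. This is a minimal illustration only: `needs_cold_start_test` is a hypothetical helper name, and the path-extraction regex is an assumption about how SUMMARY files mention file paths inline.

```shell
# Hypothetical helper: succeed (exit 0) if any path mentioned in the
# phase's SUMMARY files matches a cold-start trigger pattern.
# The first grep extracts path-like tokens; the second tests them
# against the trigger list. Both regexes are illustrative assumptions.
needs_cold_start_test() {
  phase_dir="$1"
  grep -hoE '[A-Za-z0-9_.-]+(/[A-Za-z0-9_.-]+)*' "$phase_dir"/*-SUMMARY.md 2>/dev/null |
    grep -qE '(^|/)(server|app|index|main)\.(ts|js)$|(^|/)(database|db|seeds?|migrations)/|(^|/)(startup|docker-compose|Dockerfile)'
}
```

If the helper succeeds, the "Cold Start Smoke Test" is prepended before test 1; otherwise the extracted test list is used as-is.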
**Create UAT file with all tests:**

```bash
mkdir -p "$PHASE_DIR"
```

Build the test list from extracted deliverables. Create the file:

```markdown
---
status: testing
phase: XX-name
source: [list of SUMMARY.md files]
started: [ISO timestamp]
updated: [ISO timestamp]
---

## Current Test
number: 1
name: [first test name]
expected: |
  [what user should observe]
awaiting: user response

## Tests

### 1. [Test Name]
expected: [observable behavior]
result: [pending]

### 2. [Test Name]
expected: [observable behavior]
result: [pending]

...

## Summary
total: [N]
passed: 0
issues: 0
pending: [N]
skipped: 0

## Gaps
[none yet]
```

Write to `.planning/phases/XX-name/{phase_num}-UAT.md`

Proceed to `present_test`.

**Present current test to user:**

Read the Current Test section from the UAT file. Display using the checkpoint box format:

```
╔══════════════════════════════════════════════════════════════╗
║ CHECKPOINT: Verification Required                            ║
╚══════════════════════════════════════════════════════════════╝

**Test {number}: {name}**

{expected}

──────────────────────────────────────────────────────────────
→ Type "pass" or describe what's wrong
──────────────────────────────────────────────────────────────
```

Wait for user response (plain text, no AskUserQuestion).

**Process user response and update file:**

**If response indicates pass:**
- Empty response, "yes", "y", "ok", "pass", "next", "approved", "✓"

Update Tests section:

```
### {N}. {name}
expected: {expected}
result: pass
```

**If response indicates skip:**
- "skip", "can't test", "n/a"

Update Tests section:

```
### {N}. {name}
expected: {expected}
result: skipped
reason: [user's reason if provided]
```

**If response is anything else:**
- Treat as an issue description

Infer severity from the description:

- Contains: crash, error, exception, fails, broken, unusable → blocker
- Contains: doesn't work, wrong, missing, can't → major
- Contains: slow, weird, off, minor, small → minor
- Contains: color, font, spacing, alignment, visual → cosmetic
- Default if unclear: major

Update Tests section:

```
### {N}. {name}
expected: {expected}
result: issue
reported: "{verbatim user response}"
severity: {inferred}
```

Append to Gaps section (structured YAML for plan-phase --gaps):

```yaml
- truth: "{expected behavior from test}"
  status: failed
  reason: "User reported: {verbatim user response}"
  severity: {inferred}
  test: {N}
  artifacts: []  # Filled by diagnosis
  missing: []    # Filled by diagnosis
```

**After any response:**

Update Summary counts. Update frontmatter.updated timestamp.

- If more tests remain → Update Current Test, go to `present_test`
- If no more tests → Go to `complete_session`

**Resume testing from UAT file:**

Read the full UAT file. Find the first test with `result: [pending]`.

Announce:

```
Resuming: Phase {phase} UAT
Progress: {passed + issues + skipped}/{total}
Issues found so far: {issues count}

Continuing from Test {N}...
```

Update the Current Test section with the pending test. Proceed to `present_test`.
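The inference rules above amount to a keyword ladder checked in tier order, so the first matching tier wins. A minimal sketch, assuming case-insensitive substring matching; `infer_severity` is an illustrative name, not part of the actual command:

```shell
# Sketch of the severity ladder. Tiers are checked in order, so a
# response containing both "crash" and "color" still infers blocker.
infer_severity() {
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    *crash*|*error*|*exception*|*fails*|*broken*|*unusable*) echo "blocker" ;;
    *"doesn't work"*|*wrong*|*missing*|*"can't"*)            echo "major" ;;
    *slow*|*weird*|*" off"*|*minor*|*small*)                 echo "minor" ;;  # leading space on " off" avoids matching "office"
    *color*|*font*|*spacing*|*alignment*|*visual*)           echo "cosmetic" ;;
    *)                                                       echo "major" ;;  # default when unclear
  esac
}
```

The default-to-major branch is why the user is never asked for a rating: an ambiguous report still gets a workable severity, and the user can correct it in a later response.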
**Complete testing and commit:**

Update frontmatter:
- status: complete
- updated: [now]

Clear the Current Test section:

```
## Current Test
[testing complete]
```

Commit the UAT file:

```bash
node "C:/Users/yaoji/.claude/get-shit-done/bin/gsd-tools.cjs" commit "test({phase_num}): complete UAT - {passed} passed, {issues} issues" --files ".planning/phases/XX-name/{phase_num}-UAT.md"
```

Present summary:

```
## UAT Complete: Phase {phase}

| Result  | Count |
|---------|-------|
| Passed  | {N}   |
| Issues  | {N}   |
| Skipped | {N}   |

[If issues > 0:]
### Issues Found
[List from Issues section]
```

**If issues > 0:** Proceed to `diagnose_issues`

**If issues == 0:**

```
All tests passed. Ready to continue.

- `/gsd:plan-phase {next}` — Plan next phase
- `/gsd:execute-phase {next}` — Execute next phase
- `/gsd:ui-review {phase}` — Visual quality audit (if frontend files were modified)
```

**Diagnose root causes before planning fixes:**

```
---

{N} issues found. Diagnosing root causes...

Spawning parallel debug agents to investigate each issue.
```

- Load the diagnose-issues workflow
- Follow @C:/Users/yaoji/.claude/get-shit-done/workflows/diagnose-issues.md
- Spawn parallel debug agents for each issue
- Collect root causes
- Update UAT.md with root causes
- Proceed to `plan_gap_closure`

Diagnosis runs automatically - no user prompt. Parallel agents investigate simultaneously, so overhead is minimal and fixes are more accurate.

**Auto-plan fixes from diagnosed gaps:**

Display:

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► PLANNING FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning planner for gap closure...
```

Spawn gsd-planner in --gaps mode:

```
Task(
  prompt="""
  **Phase:** {phase_number}
  **Mode:** gap_closure

  - {phase_dir}/{phase_num}-UAT.md (UAT with diagnoses)
  - .planning/STATE.md (Project State)
  - .planning/ROADMAP.md (Roadmap)

  Output consumed by /gsd:execute-phase
  Plans must be executable prompts.
""", subagent_type="gsd-planner", model="{planner_model}", description="Plan gap fixes for Phase {phase}" ) ``` On return: - **PLANNING COMPLETE:** Proceed to `verify_gap_plans` - **PLANNING INCONCLUSIVE:** Report and offer manual intervention **Verify fix plans with checker:** Display: ``` ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GSD ► VERIFYING FIX PLANS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ◆ Spawning plan checker... ``` Initialize: `iteration_count = 1` Spawn gsd-plan-checker: ``` Task( prompt=""" **Phase:** {phase_number} **Phase Goal:** Close diagnosed gaps from UAT - {phase_dir}/*-PLAN.md (Plans to verify) Return one of: - ## VERIFICATION PASSED — all checks pass - ## ISSUES FOUND — structured issue list """, subagent_type="gsd-plan-checker", model="{checker_model}", description="Verify Phase {phase} fix plans" ) ``` On return: - **VERIFICATION PASSED:** Proceed to `present_ready` - **ISSUES FOUND:** Proceed to `revision_loop` **Iterate planner ↔ checker until plans pass (max 3):** **If iteration_count < 3:** Display: `Sending back to planner for revision... (iteration {N}/3)` Spawn gsd-planner with revision context: ``` Task( prompt=""" **Phase:** {phase_number} **Mode:** revision - {phase_dir}/*-PLAN.md (Existing plans) **Checker issues:** {structured_issues_from_checker} Read existing PLAN.md files. Make targeted updates to address checker issues. Do NOT replan from scratch unless issues are fundamental. """, subagent_type="gsd-planner", model="{planner_model}", description="Revise Phase {phase} plans" ) ``` After planner returns → spawn checker again (verify_gap_plans logic) Increment iteration_count **If iteration_count >= 3:** Display: `Max iterations reached. {N} issues remain.` Offer options: 1. Force proceed (execute despite issues) 2. Provide guidance (user gives direction, retry) 3. Abandon (exit, user runs /gsd:plan-phase manually) Wait for user response. 
**Present completion and next steps:**

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► FIXES READY ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Phase {X}: {Name}** — {N} gap(s) diagnosed, {M} fix plan(s) created

| Gap | Root Cause | Fix Plan |
|-----|------------|----------|
| {truth 1} | {root_cause} | {phase}-04 |
| {truth 2} | {root_cause} | {phase}-04 |

Plans verified and ready for execution.

───────────────────────────────────────────────────────────────
## ▶ Next Up

**Execute fixes** — run fix plans
`/clear` then `/gsd:execute-phase {phase} --gaps-only`
───────────────────────────────────────────────────────────────
```

**Batched writes for efficiency:**

Keep results in memory. Write to the file only when:

1. **Issue found** — Preserve the problem immediately
2. **Session complete** — Final write before commit
3. **Checkpoint** — Every 5 passed tests (safety net)

| Section | Rule | When Written |
|---------|------|--------------|
| Frontmatter.status | OVERWRITE | Start, complete |
| Frontmatter.updated | OVERWRITE | On any file write |
| Current Test | OVERWRITE | On any file write |
| Tests.{N}.result | OVERWRITE | On any file write |
| Summary | OVERWRITE | On any file write |
| Gaps | APPEND | When issue found |

On context reset: the file shows the last checkpoint. Resume from there.

**Infer severity from the user's natural language:**

| User says | Infer |
|-----------|-------|
| "crashes", "error", "exception", "fails completely" | blocker |
| "doesn't work", "nothing happens", "wrong behavior" | major |
| "works but...", "slow", "weird", "minor issue" | minor |
| "color", "spacing", "alignment", "looks off" | cosmetic |

Default to **major** if unclear. User can correct if needed.

**Never ask "how severe is this?"** - just infer and move on.
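The three write triggers above collapse into a single predicate evaluated after each test result. A sketch under the assumption that the caller tracks passes since the last write; `should_write_now` is an illustrative name, not part of the actual command:

```shell
# Sketch of the batched-write policy: write immediately on issues,
# at session completion, and as a checkpoint every 5 passed tests.
should_write_now() {
  result="$1"              # pass | issue | skipped
  passes_since_write="$2"  # passes accumulated since the last file write
  tests_remaining="$3"
  if [ "$result" = "issue" ]; then return 0; fi           # preserve the problem now
  if [ "$tests_remaining" -eq 0 ]; then return 0; fi      # final write before commit
  if [ "$passes_since_write" -ge 5 ]; then return 0; fi   # checkpoint safety net
  return 1                                                # keep batching in memory
}
```

Anything that would be lost on a context reset (an issue, the final state) is written eagerly; cheap-to-reconstruct pass results are batched, which is why resume only ever replays at most four passed tests.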
- [ ] UAT file created with all tests from SUMMARY.md
- [ ] Tests presented one at a time with expected behavior
- [ ] User responses processed as pass/issue/skip
- [ ] Severity inferred from description (never asked)
- [ ] Batched writes: on issue, every 5 passes, or completion
- [ ] Committed on completion
- [ ] If issues: parallel debug agents diagnose root causes
- [ ] If issues: gsd-planner creates fix plans (gap_closure mode)
- [ ] If issues: gsd-plan-checker verifies fix plans
- [ ] If issues: revision loop until plans pass (max 3 iterations)
- [ ] Ready for `/gsd:execute-phase --gaps-only` when complete