smart-support/eng-review-plan.md

# Smart Support Framework — Eng Review Plan

## Context

Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all six feature expansions from the CEO review, re-sequenced so the core framework ships first. The timeline is extended to 6-7 weeks in response to outside-voice review feedback.

No code exists yet. Greenfield project.

## Architecture Decisions

```
Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                      ↑                   ↑
                                                Agent Registry        interrupt()
                                                (YAML config)        (HITL safety)
                                                      ↑
                                              PostgresSaver
                                            (checkpoint persistence)
```

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Agent orchestration | `langgraph-supervisor` v1.1 | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
| MCP integration | `langchain-mcp-adapters` + `@tool` | MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1] |
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Docker Compose. |
| LLM provider | LangChain `BaseChatModel` + env config | No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1] |
| Streaming | FastAPI WebSocket + `astream_events()` | Built-in. No custom streaming layer. [Layer 1] |
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
| Replay | Custom paginated API endpoint | Not raw `get_state_history()`. Design for 200+ turn threads. |
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
| Routing fallback | General-purpose fallback agent | Catches misroutes. Routing accuracy eval is still a TODO. |
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |

## Project Structure

```
smart-support/
├── backend/
│   ├── app/
│   │   ├── main.py          # FastAPI app + WebSocket
│   │   ├── graph.py         # LangGraph supervisor setup
│   │   ├── agents/          # Agent definitions + tools
│   │   ├── registry.py      # YAML agent registry loader
│   │   ├── openapi/         # OpenAPI parser + MCP server generator
│   │   ├── replay/          # Replay API endpoint
│   │   ├── analytics/       # Analytics queries + endpoint
│   │   └── callbacks.py     # Token usage logging callback
│   ├── agents.yaml          # Agent registry config
│   ├── templates/           # Vertical templates (e-commerce.yaml, etc.)
│   └── tests/
├── frontend/                # React chat UI + replay + dashboard
├── docker-compose.yml       # Postgres + app
└── pyproject.toml
```

## Phasing (6-7 weeks)

### Phase 1 (Weeks 1-3): Core Framework
- FastAPI backend with WebSocket for chat
- LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
- PostgresSaver checkpointer via Docker Compose
- YAML-based agent registry with validation
- React chat UI with streaming tokens
- Agent personality via YAML config
- Basic interrupt() flow for write operations
- Fallback agent for misrouted queries
- Token usage logging callback
- Try/except for DB errors
- **Integration checkpoint:** End of week 3, full chat loop works end-to-end
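
The "YAML-based agent registry with validation" item could be approached along the lines of the sketch below. It validates a dict that has already been parsed from `agents.yaml`, so it has no YAML dependency of its own. The schema (`agents` list with `name`, `description`, `tools`) and the function name `validate_registry` are assumptions made for illustration, not a settled format.

```python
# Assumed required fields for each agent entry (hypothetical schema).
REQUIRED_AGENT_FIELDS = {"name": str, "description": str, "tools": list}


def validate_registry(config: object) -> list[str]:
    """Return human-readable validation errors; an empty list means the config is valid."""
    if not isinstance(config, dict):
        return ["registry root must be a mapping"]
    agents = config.get("agents")
    if not isinstance(agents, list) or not agents:
        return ["'agents' must be a non-empty list"]

    errors: list[str] = []
    seen: set[str] = set()
    for i, agent in enumerate(agents):
        if not isinstance(agent, dict):
            errors.append(f"agents[{i}] must be a mapping")
            continue
        # Check presence and type of each required field.
        for field, expected in REQUIRED_AGENT_FIELDS.items():
            if not isinstance(agent.get(field), expected):
                errors.append(f"agents[{i}].{field} must be of type {expected.__name__}")
        # Agent names are used for routing, so they must be unique.
        name = agent.get("name")
        if isinstance(name, str):
            if name in seen:
                errors.append(f"duplicate agent name: {name}")
            seen.add(name)
    return errors
```

Collecting all errors instead of raising on the first one lets the loader report every problem in a config file in a single pass.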

### Phase 2 (Weeks 3-4): Multi-Agent + Safety
- Full supervisor routing with intent classification
- Webhook escalation (HTTP POST to configured URL + retry)
- Vertical templates (YAML configs for e-commerce, SaaS)
- Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
- **Integration checkpoint:** End of week 4, multi-agent routing + interrupt flow works
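
The expired-interrupt rule (auto-cancel plus a retry offer after the 30-minute TTL) can be sketched as a small pure function. `resolve_approval` and its three return values are hypothetical names; how the timestamp is stored alongside the pending interrupt is left open here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

INTERRUPT_TTL = timedelta(minutes=30)  # the plan's 30-minute TTL


def resolve_approval(created_at: datetime, approved: bool,
                     now: Optional[datetime] = None) -> str:
    """Decide how to handle a response to a pending interrupt.

    Returns "execute", "cancelled", or "expired_offer_retry".
    """
    now = now or datetime.now(timezone.utc)
    # A stale approval is never executed, even if the user clicked "approve".
    if now - created_at > INTERRUPT_TTL:
        return "expired_offer_retry"
    return "execute" if approved else "cancelled"
```

On `"expired_offer_retry"` the caller would cancel the pending action and, if the user accepts the retry, re-evaluate the current state before asking for approval again.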

### Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery
- Parse OpenAPI 3.0 specs from user-provided URLs
- SSRF protection (block private IPs, defend against DNS rebinding, enforce a URL allowlist)
- Generate full MCP servers wrapping each endpoint
- LLM-assisted endpoint classification (read/write, customer params, agent groupings)
- Operator review/correction UI for classifications
- Auto-generate agent YAML from classified spec
- Async import with WebSocket progress updates
- **Integration checkpoint:** End of week 6, paste a real API spec → tools work in chat
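
As a rough illustration of the SSRF item above, the sketch below resolves a spec URL once and rejects it if any resolved address is not publicly routable. `is_public_ip` and `resolve_and_check` are hypothetical names, and the URL allowlist is omitted. To limit DNS rebinding, the caller would need to connect to one of the returned, already-vetted addresses instead of resolving the hostname a second time.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(addr: str) -> bool:
    """True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(addr)
    return ip.is_global and not ip.is_multicast


def resolve_and_check(url: str, resolver=socket.getaddrinfo) -> list[str]:
    """Resolve the URL's host and return its addresses, or raise ValueError if unsafe."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http(s) URLs with a host are allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = resolver(parts.hostname, port, type=socket.SOCK_STREAM)
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    addrs = sorted({info[4][0] for info in infos})
    # Reject if ANY resolved address is private, loopback, link-local, etc.
    if not addrs or not all(is_public_ip(a) for a in addrs):
        raise ValueError(f"blocked non-public address for host {parts.hostname!r}")
    return addrs
```

The `resolver` parameter exists so tests can inject a fake DNS answer without touching the network.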

### Phase 4 (Weeks 6-7): Analytics + Replay
- Custom paginated replay API endpoint
- Replay UI (step-by-step timeline in React)
- Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
- Analytics dashboard UI with zero-state handling
- Resolution rate = share of conversations with a successful tool call and no escalation
- **Integration checkpoint:** End of week 7, full product demo ready
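
The resolution metric and cost-per-resolution figure from the list above could be computed roughly as follows. `ConversationSummary` and `resolution_metrics` are hypothetical names, and dividing total spend across all conversations by the number of resolved ones is one possible definition of cost-per-resolution, assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConversationSummary:
    """Per-conversation facts the analytics queries would need (hypothetical shape)."""
    tool_call_succeeded: bool
    escalated: bool
    cost_usd: float


def is_resolved(c: ConversationSummary) -> bool:
    # The plan's starting definition: successful tool call and no escalation.
    return c.tool_call_succeeded and not c.escalated


def resolution_metrics(conversations: Sequence[ConversationSummary]) -> dict[str, Optional[float]]:
    """Return resolution rate and cost per resolution; None marks the zero state."""
    total = len(conversations)
    resolved = sum(1 for c in conversations if is_resolved(c))
    total_cost = sum(c.cost_usd for c in conversations)
    return {
        "resolution_rate": resolved / total if total else None,
        "cost_per_resolution": total_cost / resolved if resolved else None,
    }
```

Returning `None` instead of dividing by zero gives the dashboard an explicit signal for its zero-state view.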

### Phase 5 (Buffer): Polish + Demo Prep
- Error handling hardening
- Demo script and sample data
- Docker Compose for full-stack deployment

## Tech Stack

- Python 3.11+, FastAPI, LangGraph v1.1.0
- langgraph-supervisor, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
- React (frontend), PostgreSQL 16 (via Docker Compose)
- Claude Sonnet 4.6 via `ChatAnthropic` (configurable via env)
- pytest + FastAPI TestClient for backend tests
- openapi-spec-validator for spec validation
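
Since the model is meant to be configurable through `LLM_PROVIDER` and `LLM_MODEL`, a small settings loader along these lines may be enough. `LLMSettings` and `load_llm_settings` are hypothetical names, and the sketch deliberately sets no defaults, so a missing variable fails loudly at startup.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read LLM_PROVIDER and LLM_MODEL, failing fast if either is missing."""
    env = os.environ if env is None else env
    # Normalize the provider so "Anthropic" and "anthropic" behave the same.
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    if not provider or not model:
        raise RuntimeError("LLM_PROVIDER and LLM_MODEL must both be set")
    return LLMSettings(provider=provider, model=model)
```

The resulting settings would then be passed to whichever `BaseChatModel` implementation the project uses, for example `ChatAnthropic`.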

## NOT in scope

- Authentication/authorization (deferred to pre-production)
- Multi-tenant architecture (deferred to first paid customer)
- CI/CD pipeline (manual deploy for prototype)
- Rate limiting (deferred to pre-production)
- Zendesk/Intercom marketplace integration (deferred to post-validation)
- Mobile-responsive chat UI (desktop-only for demo)
- Internationalization/i18n
- Billing/pricing infrastructure
- Distribution pipeline (manual Docker Compose deploy)

## What already exists (reuse, don't rebuild)

- `langgraph-supervisor` — agent orchestration
- `langgraph-checkpoint-postgres` — state persistence
- LangGraph `interrupt()` — human-in-the-loop
- `langchain-mcp-adapters` (`MultiServerMCPClient`) — MCP tool integration
- LangChain `BaseChatModel` — LLM provider abstraction
- FastAPI WebSocket + `astream_events()` — streaming
- `openapi-spec-validator` — OpenAPI spec validation

## Testing Strategy

TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.

45 paths identified (33 code paths + 12 user flows), with 6 E2E tests.

Key test categories:
1. **Graph tests** — invoke supervisor with mock tools, assert routing + state
2. **MCP tool tests** — mock external HTTP, test structured responses
3. **WebSocket tests** — FastAPI TestClient, test message → response cycle
4. **Interrupt tests** — test approval, rejection, and TTL expiry flows
5. **OpenAPI tests** — test spec parsing, SSRF blocking, MCP generation
6. **E2E tests** — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)

## Failure Modes

| Codepath | Failure | Mitigation |
|----------|---------|------------|
| LLM API call | Timeout/rate limit | Error message to user |
| MCP tool call | External API down | Escalation + error message |
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
| PostgresSaver | DB connection lost | try/except + user-facing error |
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
| Webhook POST | Target unreachable | Retry with backoff + log |

## Parallelization Strategy

| Lane | Steps | Modules |
|------|-------|---------|
| A | Phase 1 backend + Phase 2 | backend/app/ |
| B | Phase 1 frontend | frontend/ |
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |

Launch lanes A, B, and C in parallel and merge them after Phase 1. Phases 3-4 then run sequentially because they depend on the core.

## Verification

1. `docker compose up` — Postgres + app starts
2. Open `http://localhost:8000` — chat UI loads
3. Send "What's the status of order 1042?" — get streaming response
4. Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
5. `pytest --cov` — 80%+ coverage
6. Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
7. View replay of completed conversation (Phase 4)
8. View analytics dashboard (Phase 4)

## GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | — |

- **OUTSIDE VOICE:** Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
- **UNRESOLVED:** 0
- **VERDICT:** CEO + ENG CLEARED — ready to implement