# Smart Support Framework — Eng Review Plan

## Context

Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions."

This plan incorporates all CEO review expansions (6 features) with re-sequenced phasing (core first). Timeline extended to 6-7 weeks per outside voice feedback.

No code exists yet. Greenfield project.

## Architecture Decisions

```
Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                        ↑                ↑
                                                 Agent Registry     interrupt()
                                                 (YAML config)     (HITL safety)
                                                        ↑
                                                  PostgresSaver
                                            (checkpoint persistence)
```

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Agent orchestration | `langgraph-supervisor` v1.1 | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
| MCP integration | `langchain-mcp-adapters` + `@tool` | MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1] |
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Docker Compose. |
| LLM provider | LangChain `BaseChatModel` + env config | No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1] |
| Streaming | FastAPI WebSocket + `astream_events()` | Built-in. No custom streaming layer. [Layer 1] |
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
| Replay | Custom paginated API endpoint | Not raw `get_state_history()`. Design for 200+ turn threads. |
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
| Routing fallback | General-purpose fallback agent | Catches misroutes. TODO for routing accuracy eval. |
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |

## Project Structure

```
smart-support/
├── backend/
│   ├── app/
│   │   ├── main.py          # FastAPI app + WebSocket
│   │   ├── graph.py         # LangGraph supervisor setup
│   │   ├── agents/          # Agent definitions + tools
│   │   ├── registry.py      # YAML agent registry loader
│   │   ├── openapi/         # OpenAPI parser + MCP server generator
│   │   ├── replay/          # Replay API endpoint
│   │   ├── analytics/       # Analytics queries + endpoint
│   │   └── callbacks.py     # Token usage logging callback
│   ├── agents.yaml          # Agent registry config
│   ├── templates/           # Vertical templates (e-commerce.yaml, etc.)
│   └── tests/
├── frontend/                # React chat UI + replay + dashboard
├── docker-compose.yml       # Postgres + app
└── pyproject.toml
```

## Phasing (6-7 weeks)

### Phase 1 (Weeks 1-3): Core Framework

- FastAPI backend with WebSocket for chat
- LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
- PostgresSaver checkpointer via Docker Compose
- YAML-based agent registry with validation
- React chat UI with streaming tokens
- Agent personality via YAML config
- Basic interrupt() flow for write operations
- Fallback agent for misrouted queries
- Token usage logging callback
- Try/except for DB errors
- **Integration checkpoint:** End of week 3, full chat loop works end-to-end

### Phase 2 (Weeks 3-4): Multi-Agent + Safety

- Full supervisor routing with intent classification
- Webhook escalation (HTTP POST to configured URL + retry)
- Vertical templates (YAML configs for e-commerce, SaaS)
- Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
- **Integration checkpoint:** End of week 4, multi-agent routing + interrupt flow works

### Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery

- Parse OpenAPI 3.0 specs from user-provided URLs
- SSRF protection (block private IPs, DNS rebinding, URL allowlist)
- Generate full MCP servers wrapping each endpoint
- LLM-assisted endpoint classification (read/write, customer params, agent groupings)
- Operator review/correction UI for classifications
- Auto-generate agent YAML from classified spec
- Async import with WebSocket progress updates
- **Integration checkpoint:** End of week 6, paste a real API spec → tools work in chat

### Phase 4 (Weeks 6-7): Analytics + Replay

- Custom paginated replay API endpoint
- Replay UI (step-by-step timeline in React)
- Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
- Analytics dashboard UI with zero-state handling
- Resolution rate = successful tool call + no escalation
- **Integration checkpoint:** End of week 7, full product demo ready

### Phase 5 (Buffer): Polish + Demo Prep

- Error handling hardening
- Demo script and sample data
- Docker Compose for full-stack deployment

## Tech Stack

- Python 3.11+, FastAPI, LangGraph 1.x (currently 1.1.6)
- langgraph-supervisor 0.0.31, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
- React (frontend), PostgreSQL 16 (via Docker Compose)
- Claude Sonnet 4.6 via `ChatAnthropic` (configurable via env)
- pytest + FastAPI TestClient for backend tests
- openapi-spec-validator for spec validation

## NOT in scope

- Authentication/authorization (deferred to pre-production)
- Multi-tenant architecture (deferred to first paid customer)
- CI/CD pipeline (manual deploy for prototype)
- Rate limiting (deferred to pre-production)
- Zendesk/Intercom marketplace integration (deferred to post-validation)
- Mobile-responsive chat UI (desktop-only for demo)
- Internationalization/i18n
- Billing/pricing infrastructure
- Distribution pipeline (manual Docker Compose deploy)

## What already exists (reuse, don't rebuild)

- `langgraph-supervisor` — agent orchestration
- `langgraph-checkpoint-postgres` — state persistence
- LangGraph `interrupt()` — human-in-the-loop
- `langchain-mcp-adapters` (`MultiServerMCPClient`) — MCP tool integration
- LangChain `BaseChatModel` — LLM provider abstraction
- FastAPI WebSocket + `astream_events()` — streaming
- `openapi-spec-validator` — OpenAPI spec validation

## Testing Strategy

TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient. 45 test targets identified (33 code paths + 12 user flows, 6 of them E2E).

Key test categories:

1. **Graph tests** — invoke supervisor with mock tools, assert routing + state
2. **MCP tool tests** — mock external HTTP, test structured responses
3. **WebSocket tests** — FastAPI TestClient, test message → response cycle
4. **Interrupt tests** — test approval, rejection, and TTL expiry flows
5. **OpenAPI tests** — test spec parsing, SSRF blocking, MCP generation
6. **E2E tests** — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)

## Failure Modes

| Codepath | Failure | Mitigation |
|----------|---------|------------|
| LLM API call | Timeout/rate limit | Error message to user |
| MCP tool call | External API down | Escalation + error message |
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
| PostgresSaver | DB connection lost | try/except + user-facing error |
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
| Webhook POST | Target unreachable | Retry with backoff + log |

## Parallelization Strategy

| Lane | Steps | Modules |
|------|-------|---------|
| A | Phase 1 backend + Phase 2 | backend/app/ |
| B | Phase 1 frontend | frontend/ |
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |

Launch A + B + C in parallel. Merge after Phase 1. Phases 3-4 are sequential (depend on core).

## Verification

1. `docker compose up` — Postgres + app starts
2. Open `http://localhost:8000` — chat UI loads
3. Send "What's the status of order 1042?" — get streaming response
4. Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
5. `pytest --cov` — 80%+ coverage
6. Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
7. View replay of completed conversation (Phase 4)
8. View analytics dashboard (Phase 4)

## GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | — |

- **OUTSIDE VOICE:** Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
- **UNRESOLVED:** 0
- **VERDICT:** CEO + ENG CLEARED — ready to implement
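To make the YAML-based agent registry concrete, here is a hypothetical shape for `backend/agents.yaml`. Every field name (`agents`, `personality`, `tools`, `fallback`) is an assumption for illustration, not a settled schema; the registry loader in `registry.py` would validate whatever schema is actually chosen.

```yaml
# Hypothetical agents.yaml sketch; field names are assumptions, not a spec.
agents:
  - name: order_lookup
    description: Look up order status and shipping details
    personality: concise and friendly
    tools: [get_order, get_shipping_status]
  - name: faq
    description: Answer product and policy questions
    tools: [search_kb]
  - name: escalation
    description: Hand off to a human via the configured webhook
    tools: [escalate_to_human]
fallback: faq   # general-purpose agent that catches misroutes
```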
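The standalone SSRF utility (Lane C, `backend/app/openapi/ssrf.py`) can be sketched with the standard library alone. Function names here are assumptions; note that real DNS-rebinding protection also requires the fetcher to connect to the vetted IPs returned below rather than re-resolving the hostname.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_forbidden_ip(ip_str: str) -> bool:
    """Reject private, loopback, link-local, and otherwise non-public addresses."""
    ip = ipaddress.ip_address(ip_str)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def resolve_and_check(url: str) -> list[str]:
    """Resolve the URL's host and return its IPs, or raise ValueError.

    Returning the resolved IPs lets the caller connect to them directly,
    closing the DNS-rebinding window between the check and the fetch.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    port = parsed.port or (80 if parsed.scheme == "http" else 443)
    infos = socket.getaddrinfo(parsed.hostname, port, proto=socket.IPPROTO_TCP)
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        if is_forbidden_ip(ip):
            raise ValueError(f"blocked address {ip} for host {parsed.hostname!r}")
    return ips
```

A URL allowlist check (Phase 3) would sit in front of this, short-circuiting resolution for hosts the operator has already approved.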
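The 30-minute interrupt TTL decision reduces to a small pure function that the chat loop can call before resuming a pending approval. This is a sketch with illustrative names; the real check would run against the interrupt payload stored in the PostgresSaver checkpoint.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

INTERRUPT_TTL = timedelta(minutes=30)


@dataclass
class PendingApproval:
    """A write action waiting on human approval (names are illustrative)."""
    action: str
    created_at: datetime


def resolve_approval(pending: PendingApproval, now: datetime) -> str:
    """Decide what the chat loop should do with a pending approval.

    'expired' means auto-cancel and offer a retry that re-evaluates the
    current state, per the interrupt-TTL decision above.
    """
    if now - pending.created_at > INTERRUPT_TTL:
        return "expired"
    return "awaiting_approval"
```

Keeping the policy separate from the graph code makes the TTL-expiry test category (Interrupt tests) a plain unit test with an injected clock.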
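The webhook escalation retry (Phase 2, and "Retry with backoff + log" in the failure-modes table) can be isolated as a policy function. The `post` callable stands in for the actual HTTP POST to the configured escalation URL; injecting it and `sleep` keeps the backoff schedule unit-testable.

```python
import time
from typing import Callable


def post_with_retry(post: Callable[[], bool], attempts: int = 3,
                    base_delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call `post` until it reports success, backing off 1s, 2s, 4s, ...

    Returns False after the final failed attempt; the caller is
    responsible for logging the failure, per the failure-modes table.
    """
    for attempt in range(attempts):
        if post():
            return True
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False
```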
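The custom paginated replay endpoint can share a small response-shaping helper like the sketch below. The dict shape and field names are assumptions; the real endpoint would push LIMIT/OFFSET into the Postgres query over checkpoint rows instead of slicing an in-memory list, which is what makes 200+ turn threads cheap to page through.

```python
from typing import Any


def paginate_turns(turns: list[dict[str, Any]], page: int,
                   page_size: int = 50) -> dict[str, Any]:
    """Shape one page of a conversation thread for the replay UI.

    `turns` stands in for checkpoints already loaded from PostgresSaver.
    """
    total = len(turns)
    start = page * page_size
    return {
        "page": page,
        "page_size": page_size,
        "total_turns": total,
        "has_more": start + page_size < total,
        "turns": turns[start:start + page_size],
    }
```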
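The starting resolution-metric definition (tool call success + no escalation) is simple enough to pin down in code now, so the analytics queries and the dashboard agree on one formula. The record shape here is illustrative; in practice this would be a SQL aggregate over checkpoint data, refined later with customer satisfaction signals.

```python
def is_resolved(conversation: dict) -> bool:
    """Starting definition: at least one successful tool call and no escalation."""
    return (conversation.get("successful_tool_calls", 0) > 0
            and not conversation.get("escalated", False))


def resolution_rate(conversations: list[dict]) -> float:
    """Share of conversations counted as resolved; 0.0 on the zero-state."""
    if not conversations:
        return 0.0
    return sum(is_resolved(c) for c in conversations) / len(conversations)
```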