Smart Support Framework — Eng Review Plan
Context
Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all CEO review expansions (6 features) with re-sequenced phasing (core first). Timeline extended to 6-7 weeks per outside voice feedback.
No code exists yet. Greenfield project.
Architecture Decisions
```
Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                        ↑               ↑
                                                 Agent Registry    interrupt()
                                                  (YAML config)   (HITL safety)
                                                                        ↑
                                                                  PostgresSaver
                                                            (checkpoint persistence)
```
| Decision | Choice | Rationale |
|---|---|---|
| Agent orchestration | langgraph-supervisor 0.0.31 (on LangGraph 1.x) | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
| MCP integration | langchain-mcp-adapters + `@tool` | `MultiServerMCPClient` for MCP, `@tool` for CLI/API. No custom base class. [Layer 1] |
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Postgres runs via Docker Compose. |
| LLM provider | LangChain `BaseChatModel` + env config | No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1] |
| Streaming | FastAPI WebSocket + `astream_events()` | Built-in. No custom streaming layer. [Layer 1] |
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
| Replay | Custom paginated API endpoint | Not raw get_state_history(). Design for 200+ turn threads. |
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
| Routing fallback | General-purpose fallback agent | Catches misroutes. TODO for routing accuracy eval. |
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |
Project Structure
```
smart-support/
├── backend/
│   ├── app/
│   │   ├── main.py          # FastAPI app + WebSocket
│   │   ├── graph.py         # LangGraph supervisor setup
│   │   ├── agents/          # Agent definitions + tools
│   │   ├── registry.py      # YAML agent registry loader
│   │   ├── openapi/         # OpenAPI parser + MCP server generator
│   │   ├── replay/          # Replay API endpoint
│   │   ├── analytics/       # Analytics queries + endpoint
│   │   └── callbacks.py     # Token usage logging callback
│   ├── agents.yaml          # Agent registry config
│   ├── templates/           # Vertical templates (e-commerce.yaml, etc.)
│   └── tests/
├── frontend/                # React chat UI + replay + dashboard
├── docker-compose.yml       # Postgres + app
└── pyproject.toml
```
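
`registry.py` loads and validates `agents.yaml`. A minimal validation sketch that runs on the already-parsed YAML (for example the dict returned by PyYAML's `yaml.safe_load`). The schema used here (`name`, `description`, `tools`, optional `personality`) is a hypothetical illustration, not a settled format.

```python
from dataclasses import dataclass

REQUIRED_KEYS = {"name", "description", "tools"}


@dataclass
class AgentSpec:
    name: str
    description: str        # what the supervisor reads when choosing a route
    tools: list[str]
    personality: str = ""   # optional tone/persona text from YAML


class RegistryError(ValueError):
    """Raised when agents.yaml does not have the expected shape."""


def load_registry(raw: dict) -> dict[str, AgentSpec]:
    """Validate already-parsed YAML and return agents keyed by name."""
    agents = raw.get("agents") if isinstance(raw, dict) else None
    if not isinstance(agents, list) or not agents:
        raise RegistryError("'agents' must be a non-empty list")
    registry: dict[str, AgentSpec] = {}
    for i, entry in enumerate(agents):
        if not isinstance(entry, dict):
            raise RegistryError(f"agents[{i}] must be a mapping")
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise RegistryError(f"agents[{i}] is missing {sorted(missing)}")
        name = entry["name"]
        if not isinstance(name, str) or not name:
            raise RegistryError(f"agents[{i}].name must be a non-empty string")
        if name in registry:
            raise RegistryError(f"duplicate agent name: {name!r}")
        tools = entry["tools"]
        # An empty list is allowed (e.g. an FAQ agent with no tools).
        if not isinstance(tools, list) or not all(isinstance(t, str) for t in tools):
            raise RegistryError(f"agent {name!r}: 'tools' must be a list of strings")
        registry[name] = AgentSpec(
            name=name,
            description=str(entry["description"]),
            tools=tools,
            personality=str(entry.get("personality", "")),
        )
    return registry
```

Failing at load time keeps a malformed config from surfacing later as a confusing routing error mid-conversation.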
Phasing (6-7 weeks)
Phase 1 (Weeks 1-3): Core Framework
- FastAPI backend with WebSocket for chat
- LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
- PostgresSaver checkpointer via Docker Compose
- YAML-based agent registry with validation
- React chat UI with streaming tokens
- Agent personality via YAML config
- Basic interrupt() flow for write operations
- Fallback agent for misrouted queries
- Token usage logging callback
- Try/except for DB errors
- Integration checkpoint: End of week 3, full chat loop works end-to-end
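
For the try/except item above, a minimal sketch of wrapping graph invocation so a lost database connection becomes a clear user-facing message instead of a silent failure. It assumes a compiled graph exposing the async `ainvoke(input, config)` method, as LangGraph compiled graphs do; `safe_invoke` and the message text are hypothetical. A real implementation would likely narrow the `except` clause to the driver's connection errors (for psycopg 3, `psycopg.OperationalError`).

```python
import asyncio
import logging
from typing import Any, Protocol

logger = logging.getLogger("smart_support.graph")

USER_FACING_DB_ERROR = (
    "Sorry, we couldn't reach our conversation store. "
    "Please try again in a moment."
)


class AsyncGraph(Protocol):
    async def ainvoke(self, input: Any, config: dict) -> Any: ...


async def safe_invoke(graph: AsyncGraph, payload: Any, thread_id: str) -> dict:
    """Invoke the graph; on failure, log the traceback and return an error."""
    config = {"configurable": {"thread_id": thread_id}}
    try:
        result = await graph.ainvoke(payload, config)
        return {"ok": True, "result": result}
    except Exception:  # narrow to the DB driver's errors in real code
        logger.exception("graph invocation failed for thread %s", thread_id)
        return {"ok": False, "error": USER_FACING_DB_ERROR}


if __name__ == "__main__":
    class _BrokenGraph:
        async def ainvoke(self, input: Any, config: dict) -> Any:
            raise ConnectionError("database connection lost")

    print(asyncio.run(safe_invoke(_BrokenGraph(), {"messages": []}, "demo")))
```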
Phase 2 (Weeks 3-4): Multi-Agent + Safety
- Full supervisor routing with intent classification
- Webhook escalation (HTTP POST to configured URL + retry)
- Vertical templates (YAML configs for e-commerce, SaaS)
- Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
- Integration checkpoint: End of week 4, multi-agent routing + interrupt flow works
Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery
- Parse OpenAPI 3.0 specs from user-provided URLs
- SSRF protection (block private IPs, DNS rebinding, URL allowlist)
- Generate full MCP servers wrapping each endpoint
- LLM-assisted endpoint classification (read/write, customer params, agent groupings)
- Operator review/correction UI for classifications
- Auto-generate agent YAML from classified spec
- Async import with WebSocket progress updates
- Integration checkpoint: End of week 6, paste a real API spec → tools work in chat
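
Before LLM classification, the spec has to be flattened into candidate tools. A minimal sketch over an OpenAPI 3.0 document that has already been parsed into a dict (and checked with openapi-spec-validator). It seeds read/write from the HTTP method only as a first guess; a `POST /search` endpoint, for instance, is really a read, which is why the LLM and operator review steps exist. `ToolCandidate` and the fallback naming scheme are hypothetical.

```python
import re
from dataclasses import dataclass

# Operation keys allowed on an OpenAPI 3.0 Path Item object.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")
READ_METHODS = {"get", "head", "options"}


@dataclass
class ToolCandidate:
    name: str
    method: str
    path: str
    summary: str
    params: list[str]
    kind: str  # "read" or "write": a seed for the LLM/operator to refine


def _fallback_name(method: str, path: str) -> str:
    # "/orders/{id}" -> "orders_id"
    slug = re.sub(r"[^A-Za-z0-9]+", "_", path).strip("_")
    return f"{method}_{slug}"


def extract_operations(spec: dict) -> list[ToolCandidate]:
    """Flatten spec["paths"] into one candidate tool per operation."""
    candidates: list[ToolCandidate] = []
    for path, item in spec.get("paths", {}).items():
        shared = item.get("parameters", [])  # path-level params apply to every op
        for method in HTTP_METHODS:
            op = item.get(method)
            if op is None:
                continue
            # $ref parameters have no "name" and are skipped in this sketch;
            # a real importer must resolve them first.
            names = [p["name"] for p in shared + op.get("parameters", []) if "name" in p]
            candidates.append(
                ToolCandidate(
                    name=op.get("operationId") or _fallback_name(method, path),
                    method=method.upper(),
                    path=path,
                    summary=op.get("summary", ""),
                    params=list(dict.fromkeys(names)),  # dedupe, keep order
                    kind="read" if method in READ_METHODS else "write",
                )
            )
    return candidates
```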
Phase 4 (Weeks 6-7): Analytics + Replay
- Custom paginated replay API endpoint
- Replay UI (step-by-step timeline in React)
- Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
- Analytics dashboard UI with zero-state handling
- Resolution rate = successful tool call + no escalation
- Integration checkpoint: End of week 7, full product demo ready
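
The resolution definition above can be combined with the per-conversation cost from the token-usage logging callback. A minimal sketch of the Phase 4 aggregates over hypothetical per-conversation records (in the app these would come from SQL over stored data). It reads "successful tool call" as at least one successful call, and computes cost-per-resolution as total spend divided by resolved conversations; both are interpretations, not settled definitions. Empty input returns `None` rates, which the dashboard's zero-state handling can render.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationRecord:
    thread_id: str
    tool_calls_succeeded: int   # tool calls that returned without error
    escalated: bool
    cost_usd: float             # summed from the token-usage callback


def is_resolved(c: ConversationRecord) -> bool:
    return c.tool_calls_succeeded > 0 and not c.escalated


def summarize(convs: list[ConversationRecord]) -> dict:
    """Zero-state safe: returns None instead of dividing by zero."""
    total = len(convs)
    resolved = sum(1 for c in convs if is_resolved(c))
    escalated = sum(1 for c in convs if c.escalated)
    spend = sum(c.cost_usd for c in convs)
    return {
        "conversations": total,
        "resolution_rate": resolved / total if total else None,
        "escalation_rate": escalated / total if total else None,
        "cost_per_resolution": spend / resolved if resolved else None,
    }
```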
Phase 5 (Buffer): Polish + Demo Prep
- Error handling hardening
- Demo script and sample data
- Docker Compose for full-stack deployment
Tech Stack
- Python 3.11+, FastAPI, LangGraph 1.x (currently 1.1.6)
- langgraph-supervisor 0.0.31, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
- React (frontend), PostgreSQL 16 (via Docker Compose)
- Claude Sonnet 4.6 via `ChatAnthropic` (configurable via env)
- pytest + FastAPI TestClient for backend tests
- openapi-spec-validator for spec validation
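
Making the model configurable via env means reading the `LLM_PROVIDER` and `LLM_MODEL` variables named in the decision table. A minimal sketch that reads and validates them at startup. The supported-provider set (assuming langchain-anthropic and langchain-openai are installed) and the `anthropic` default are assumptions; no model ID is hard-coded.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Assumption: only providers whose LangChain integration packages are installed.
SUPPORTED_PROVIDERS = {"anthropic", "openai"}


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str


def llm_config_from_env(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read LLM_PROVIDER / LLM_MODEL and fail fast on bad values."""
    provider = env.get("LLM_PROVIDER", "anthropic").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"LLM_PROVIDER={provider!r} not supported; "
            f"expected one of {sorted(SUPPORTED_PROVIDERS)}"
        )
    if not model:
        raise ValueError("LLM_MODEL must be set")
    return LLMConfig(provider=provider, model=model)
```

The resulting pair could then be handed to LangChain's `init_chat_model(cfg.model, model_provider=cfg.provider)` (from `langchain.chat_models`), which returns a `BaseChatModel`, so no custom wrapper is needed.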
NOT in scope
- Authentication/authorization (deferred to pre-production)
- Multi-tenant architecture (deferred to first paid customer)
- CI/CD pipeline (manual deploy for prototype)
- Rate limiting (deferred to pre-production)
- Zendesk/Intercom marketplace integration (deferred to post-validation)
- Mobile-responsive chat UI (desktop-only for demo)
- Internationalization/i18n
- Billing/pricing infrastructure
- Distribution pipeline (manual Docker Compose deploy)
What already exists (reuse, don't rebuild)
- `langgraph-supervisor`: agent orchestration
- `langgraph-checkpoint-postgres`: state persistence
- LangGraph `interrupt()`: human-in-the-loop
- `langchain-mcp-adapters` (`MultiServerMCPClient`): MCP tool integration
- LangChain `BaseChatModel`: LLM provider abstraction
- FastAPI WebSocket + `astream_events()`: streaming
- `openapi-spec-validator`: OpenAPI spec validation
Testing Strategy
TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.
45 test targets identified: 33 code paths and 12 user flows, 6 of which are covered by E2E tests.
Key test categories:
- Graph tests — invoke supervisor with mock tools, assert routing + state
- MCP tool tests — mock external HTTP, test structured responses
- WebSocket tests — FastAPI TestClient, test message → response cycle
- Interrupt tests — test approval, rejection, and TTL expiry flows
- OpenAPI tests — test spec parsing, SSRF blocking, MCP generation
- E2E tests — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)
Failure Modes
| Codepath | Failure | Mitigation |
|---|---|---|
| LLM API call | Timeout/rate limit | Error message to user |
| MCP tool call | External API down | Escalation + error message |
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
| PostgresSaver | DB connection lost | try/except + user-facing error |
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
| Webhook POST | Target unreachable | Retry with backoff + log |
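
For the webhook row above (and the Phase 2 escalation webhook), a minimal retry-with-exponential-backoff sketch. The `post` callable is injected so the logic runs without network access; in the app it would wrap the real HTTP client and is assumed to return the status code and raise `OSError` (e.g. `ConnectionError`) on network failure. Attempt count, base delay, and retrying on every non-2xx status are assumptions.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("smart_support.webhook")


def post_with_retry(
    post: Callable[[str, dict], int],   # returns HTTP status; raises OSError on network failure
    url: str,
    payload: dict,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """POST `payload` to `url`, retrying with exponential backoff.

    Returns True on a 2xx response, False once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            status = post(url, payload)
            if 200 <= status < 300:
                return True
            logger.warning("webhook %s returned %s (attempt %d)", url, status, attempt + 1)
        except OSError as exc:  # connection refused, timeout, DNS failure, ...
            logger.warning("webhook %s unreachable: %s (attempt %d)", url, exc, attempt + 1)
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    logger.error("webhook %s failed after %d attempts", url, attempts)
    return False
```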
Parallelization Strategy
| Lane | Steps | Modules |
|---|---|---|
| A | Phase 1 backend + Phase 2 | backend/app/ |
| B | Phase 1 frontend | frontend/ |
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |
Launch A + B + C in parallel. Merge after Phase 1. Phases 3 and 4 are sequential because both depend on the core.
Verification
- `docker compose up` → Postgres + app start
- Open `http://localhost:8000` → chat UI loads
- Send "What's the status of order 1042?" → streaming response
- Send "Cancel order 1042" → interrupt prompt → approve → confirmation
- `pytest --cov` → 80%+ coverage
- Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
- View replay of completed conversation (Phase 4)
- View analytics dashboard (Phase 4)
GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | - | - |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | - | - |
- OUTSIDE VOICE: Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
- UNRESOLVED: 0
- VERDICT: CEO + ENG CLEARED — ready to implement