smart-support/eng-review-plan.md
Yaojia Wang b8654aa31f feat: upgrade LangGraph to 1.x and migrate deprecated APIs
- Bump langgraph from 0.4 to 1.0+, langgraph-supervisor from 0.0.12 to 0.0.30+
- Bump langchain-core, langchain-anthropic, langchain-openai to 1.x
- Add langchain>=1.0 dependency for new create_agent location
- Migrate create_react_agent -> create_agent (prompt -> system_prompt)
- Fix create_supervisor positional arg to named agents= parameter
- Replace AsyncMock checkpointer with InMemorySaver in tests (v1 type validation)
- Update version references in README, ARCHITECTURE, eng-review-plan
2026-04-06 14:51:51 +02:00

Smart Support Framework — Eng Review Plan

Context

Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all CEO review expansions (6 features) with re-sequenced phasing (core first). Timeline extended to 6-7 weeks per outside voice feedback.

No code exists yet. Greenfield project.

Architecture Decisions

Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                      ↑                   ↑
                                                Agent Registry        interrupt()
                                                (YAML config)        (HITL safety)
                                                      ↑
                                              PostgresSaver
                                            (checkpoint persistence)

| Decision | Choice | Rationale |
|---|---|---|
| Agent orchestration | langgraph-supervisor 0.0.31 (on LangGraph 1.x) | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
| MCP integration | langchain-mcp-adapters + @tool | MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1] |
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Docker Compose. |
| LLM provider | LangChain BaseChatModel + env config | No custom wrapper. LLM_PROVIDER + LLM_MODEL env vars. [Layer 1] |
| Streaming | FastAPI WebSocket + astream_events() | Built-in. No custom streaming layer. [Layer 1] |
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
| Replay | Custom paginated API endpoint | Not raw get_state_history(). Design for 200+ turn threads. |
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
| Routing fallback | General-purpose fallback agent | Catches misroutes. TODO for routing accuracy eval. |
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |
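
The Replay decision above asks for a paginated endpoint instead of returning raw get_state_history() output for 200+ turn threads. A minimal sketch of the paging logic follows, assuming the steps are already available as an in-memory sequence; `ReplayPage`, `paginate_steps`, and the default page size of 50 are hypothetical and not part of the plan.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReplayPage:
    """One page of replay steps (hypothetical response shape)."""
    steps: list
    next_offset: Optional[int]  # None when no further steps remain


def paginate_steps(steps: Sequence, offset: int = 0, limit: int = 50) -> ReplayPage:
    """Return at most `limit` steps starting at `offset`."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    page = list(steps[offset:offset + limit])
    end = offset + len(page)
    # Only advertise a next page if something is left after this one.
    next_offset = end if end < len(steps) else None
    return ReplayPage(steps=page, next_offset=next_offset)
```

In a real endpoint the same offset and limit would more likely be pushed down into the checkpoint query, so that long threads are never loaded in full.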

Project Structure

smart-support/
├── backend/
│   ├── app/
│   │   ├── main.py          # FastAPI app + WebSocket
│   │   ├── graph.py         # LangGraph supervisor setup
│   │   ├── agents/          # Agent definitions + tools
│   │   ├── registry.py      # YAML agent registry loader
│   │   ├── openapi/         # OpenAPI parser + MCP server generator
│   │   ├── replay/          # Replay API endpoint
│   │   ├── analytics/       # Analytics queries + endpoint
│   │   └── callbacks.py     # Token usage logging callback
│   ├── agents.yaml          # Agent registry config
│   ├── templates/           # Vertical templates (e-commerce.yaml, etc.)
│   └── tests/
├── frontend/                # React chat UI + replay + dashboard
├── docker-compose.yml       # Postgres + app
└── pyproject.toml

Phasing (6-7 weeks)

Phase 1 (Weeks 1-3): Core Framework

  • FastAPI backend with WebSocket for chat
  • LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
  • PostgresSaver checkpointer via Docker Compose
  • YAML-based agent registry with validation
  • React chat UI with streaming tokens
  • Agent personality via YAML config
  • Basic interrupt() flow for write operations
  • Fallback agent for misrouted queries
  • Token usage logging callback
  • Try/except for DB errors
  • Integration checkpoint: End of week 3, full chat loop works end-to-end
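
The "YAML-based agent registry with validation" item in Phase 1 could be sketched as below. This operates on the dictionary produced by a YAML parser (parsing itself is left out), and the field names `name`, `description`, `tools`, and `personality` are assumptions, since the plan does not define the agents.yaml schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """One validated registry entry (hypothetical schema)."""
    name: str
    description: str
    tools: tuple = ()
    personality: str = ""


def validate_registry(raw) -> list:
    """Validate a parsed registry document and return its AgentSpec entries."""
    if not isinstance(raw, dict):
        raise ValueError("registry root must be a mapping")
    agents = raw.get("agents")
    if not isinstance(agents, list) or not agents:
        raise ValueError("registry must contain a non-empty 'agents' list")

    specs, seen = [], set()
    for i, entry in enumerate(agents):
        if not isinstance(entry, dict):
            raise ValueError(f"agents[{i}] must be a mapping")
        name = entry.get("name")
        description = entry.get("description")
        tools = entry.get("tools", [])
        if not isinstance(name, str) or not name:
            raise ValueError(f"agents[{i}] needs a non-empty 'name'")
        if name in seen:
            raise ValueError(f"duplicate agent name: {name!r}")
        if not isinstance(description, str) or not description:
            raise ValueError(f"agent {name!r} needs a non-empty 'description'")
        if not isinstance(tools, list) or not all(isinstance(t, str) for t in tools):
            raise ValueError(f"agent {name!r}: 'tools' must be a list of strings")
        seen.add(name)
        specs.append(AgentSpec(name, description, tuple(tools),
                               str(entry.get("personality", ""))))
    return specs
```

Failing at load time with a message that names the offending entry keeps a bad config from surfacing later as a routing error.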

Phase 2 (Weeks 3-4): Multi-Agent + Safety

  • Full supervisor routing with intent classification
  • Webhook escalation (HTTP POST to configured URL + retry)
  • Vertical templates (YAML configs for e-commerce, SaaS)
  • Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
  • Integration checkpoint: End of week 4, multi-agent routing + interrupt flow works
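
The expired-interrupt rule in Phase 2 (auto-cancel and offer a retry once the 30-minute TTL has passed) reduces to a timestamp comparison. The sketch below assumes the time the interrupt was raised is stored as a timezone-aware datetime; `is_interrupt_expired`, `next_action`, and the returned action strings are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

INTERRUPT_TTL = timedelta(minutes=30)  # TTL taken from the plan


def is_interrupt_expired(created_at: datetime,
                         now: Optional[datetime] = None,
                         ttl: timedelta = INTERRUPT_TTL) -> bool:
    """True once `ttl` or more has elapsed since the interrupt was raised."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    return now - created_at >= ttl


def next_action(created_at: datetime, now: Optional[datetime] = None) -> str:
    """Decide what to do when an approval arrives for a pending interrupt."""
    if is_interrupt_expired(created_at, now):
        # Stale approval: do not resume; cancel and let the user start over.
        return "cancel_and_offer_retry"
    return "resume"
```

On retry the agent would re-read the current state before raising a fresh interrupt, as the Interrupt TTL decision requires.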

Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery

  • Parse OpenAPI 3.0 specs from user-provided URLs
  • SSRF protection (block private IPs, guard against DNS rebinding, enforce a URL allowlist)
  • Generate full MCP servers wrapping each endpoint
  • LLM-assisted endpoint classification (read/write, customer params, agent groupings)
  • Operator review/correction UI for classifications
  • Auto-generate agent YAML from classified spec
  • Async import with WebSocket progress updates
  • Integration checkpoint: End of week 6, paste a real API spec → tools work in chat
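
The SSRF item in Phase 3 can be illustrated with a standard-library check. This is a sketch under assumptions, not the planned ssrf.py: `is_public_ip` and `check_spec_url` are hypothetical names, the URL allowlist is omitted, and the rebinding defence shown is simply "resolve once, vet every address, then connect to a vetted address rather than resolving the hostname again".

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """True only for globally routable unicast addresses."""
    ip = ipaddress.ip_address(address.split("%", 1)[0])  # drop any IPv6 zone id
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return ip.is_global and not ip.is_multicast


def check_spec_url(url: str, resolver=socket.getaddrinfo) -> list:
    """Resolve `url` once and reject it if any resolved address is non-public.

    Returns the vetted addresses so the caller can connect to one of them
    directly instead of resolving the hostname a second time.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only absolute http(s) URLs are allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = resolver(parts.hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    if not addresses:
        raise ValueError("hostname did not resolve")
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"blocked non-public address: {address}")
    return addresses
```

The `resolver` parameter exists so tests can inject fake DNS answers without network access. Redirects would need the same check applied to every hop.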

Phase 4 (Weeks 6-7): Analytics + Replay

  • Custom paginated replay API endpoint
  • Replay UI (step-by-step timeline in React)
  • Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
  • Analytics dashboard UI with zero-state handling
  • Resolution rate = successful tool call + no escalation
  • Integration checkpoint: End of week 7, full product demo ready
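
The resolution-rate definition above (successful tool call and no escalation) and the cost-per-resolution metric can be computed as sketched below. `ConversationSummary` and its fields are hypothetical, and treating cost-per-resolution as total spend divided by resolved conversations is an assumption; the plan does not fix the formula.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConversationSummary:
    """Per-conversation facts the analytics queries would need (assumed shape)."""
    tool_call_succeeded: bool
    escalated: bool
    cost_usd: float


def is_resolved(c: ConversationSummary) -> bool:
    # Plan's starting definition: tool call success + no escalation.
    return c.tool_call_succeeded and not c.escalated


def resolution_stats(conversations: Iterable[ConversationSummary]) -> dict:
    """Aggregate metrics; values are None when there is no data (zero state)."""
    items = list(conversations)
    total = len(items)
    resolved = sum(1 for c in items if is_resolved(c))
    escalated = sum(1 for c in items if c.escalated)
    total_cost = sum(c.cost_usd for c in items)
    return {
        "resolution_rate": resolved / total if total else None,
        "escalation_rate": escalated / total if total else None,
        "cost_per_resolution": total_cost / resolved if resolved else None,
    }
```

Returning None instead of 0 for empty inputs lets the dashboard distinguish "no conversations yet" from "nothing resolved".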

Phase 5 (Buffer): Polish + Demo Prep

  • Error handling hardening
  • Demo script and sample data
  • Docker Compose for full-stack deployment

Tech Stack

  • Python 3.11+, FastAPI, LangGraph 1.x (currently 1.1.6)
  • langgraph-supervisor 0.0.31, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
  • React (frontend), PostgreSQL 16 (via Docker Compose)
  • Claude Sonnet 4.6 via ChatAnthropic (configurable via env)
  • pytest + FastAPI TestClient for backend tests
  • openapi-spec-validator for spec validation
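
The "configurable via env" note for the model, together with the LLM_PROVIDER and LLM_MODEL variables named in the architecture decisions, might be read at startup as follows. The supported-provider set is an assumption based on the langchain-anthropic and langchain-openai packages in the commit message, and `LLMSettings` and `load_llm_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: only the two providers whose packages appear in the dependencies.
SUPPORTED_PROVIDERS = {"anthropic", "openai"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read and validate LLM_PROVIDER / LLM_MODEL from an environment mapping."""
    provider = env.get("LLM_PROVIDER", "anthropic").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    if not model:
        raise ValueError("LLM_MODEL must be set")
    return LLMSettings(provider=provider, model=model)
```

The resulting settings would then be handed to whichever BaseChatModel constructor the project uses, which keeps provider selection out of agent code.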

NOT in scope

  • Authentication/authorization (deferred to pre-production)
  • Multi-tenant architecture (deferred to first paid customer)
  • CI/CD pipeline (manual deploy for prototype)
  • Rate limiting (deferred to pre-production)
  • Zendesk/Intercom marketplace integration (deferred to post-validation)
  • Mobile-responsive chat UI (desktop-only for demo)
  • Internationalization/i18n
  • Billing/pricing infrastructure
  • Distribution pipeline (manual Docker Compose deploy)

What already exists (reuse, don't rebuild)

  • langgraph-supervisor — agent orchestration
  • langgraph-checkpoint-postgres — state persistence
  • LangGraph interrupt() — human-in-the-loop
  • langchain-mcp-adapters (MultiServerMCPClient) — MCP tool integration
  • LangChain BaseChatModel — LLM provider abstraction
  • FastAPI WebSocket + astream_events() — streaming
  • openapi-spec-validator — OpenAPI spec validation

Testing Strategy

TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.

45 codepaths identified (33 code paths + 12 user flows); 6 critical flows are covered by E2E tests.

Key test categories:

  1. Graph tests — invoke supervisor with mock tools, assert routing + state
  2. MCP tool tests — mock external HTTP, test structured responses
  3. WebSocket tests — FastAPI TestClient, test message → response cycle
  4. Interrupt tests — test approval, rejection, and TTL expiry flows
  5. OpenAPI tests — test spec parsing, SSRF blocking, MCP generation
  6. E2E tests — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)

Failure Modes

| Codepath | Failure | Mitigation |
|---|---|---|
| LLM API call | Timeout/rate limit | Error message to user |
| MCP tool call | External API down | Escalation + error message |
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
| PostgresSaver | DB connection lost | try/except + user-facing error |
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
| Webhook POST | Target unreachable | Retry with backoff + log |
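
The "Retry with backoff + log" mitigation for webhook POSTs could look like the sketch below. The attempt count, base delay, and cap are illustrative values rather than decisions from the plan, and `send` stands in for whatever function performs the actual HTTP POST.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("webhook")


def backoff_delays(count: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0) -> list:
    """Exponential delays in seconds: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(count)]


def post_with_retry(send: Callable[[], bool], attempts: int = 4,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `send` until it returns True or `attempts` is exhausted."""
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        try:
            if send():
                return True
            logger.warning("webhook attempt %d rejected", attempt + 1)
        except OSError as exc:  # connection errors, timeouts, urllib URLError
            logger.warning("webhook attempt %d failed: %s", attempt + 1, exc)
        if attempt < attempts - 1:
            sleep(delays[attempt])
    logger.error("webhook delivery gave up after %d attempts", attempts)
    return False
```

Passing `sleep` as a parameter keeps tests fast, since they can record the requested delays instead of waiting.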

Parallelization Strategy

| Lane | Steps | Modules |
|---|---|---|
| A | Phase 1 backend + Phase 2 | backend/app/ |
| B | Phase 1 frontend | frontend/ |
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |

Launch lanes A, B, and C in parallel and merge after Phase 1. Phases 3-4 are sequential because they depend on the core.

Verification

  1. docker compose up — Postgres + app starts
  2. Open http://localhost:8000 — chat UI loads
  3. Send "What's the status of order 1042?" — get streaming response
  4. Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
  5. pytest --cov — 80%+ coverage
  6. Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
  7. View replay of completed conversation (Phase 4)
  8. View analytics dashboard (Phase 4)

GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| CEO Review | /plan-ceo-review | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
| Codex Review | /codex review | Independent 2nd opinion | 0 | | |
| Eng Review | /plan-eng-review | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
| Design Review | /plan-design-review | UI/UX gaps | 0 | | |

  • OUTSIDE VOICE: Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
  • UNRESOLVED: 0
  • VERDICT: CEO + ENG CLEARED — ready to implement