
Yaojia Wang b8654aa31f feat: upgrade LangGraph to 1.x and migrate deprecated APIs

- Bump langgraph from 0.4 to 1.0+, langgraph-supervisor from 0.0.12 to 0.0.30+
- Bump langchain-core, langchain-anthropic, langchain-openai to 1.x
- Add langchain>=1.0 dependency for new create_agent location
- Migrate create_react_agent -> create_agent (prompt -> system_prompt)
- Fix create_supervisor positional arg to named agents= parameter
- Replace AsyncMock checkpointer with InMemorySaver in tests (v1 type validation)
- Update version references in README, ARCHITECTURE, eng-review-plan

2026-04-06 14:51:51 +02:00

Smart Support Framework — Eng Review Plan

Context

Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all CEO review expansions (6 features) with re-sequenced phasing (core first). Timeline extended to 6-7 weeks per outside voice feedback.

No code exists yet. Greenfield project.

Architecture Decisions

Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                      ↑                   ↑
                                                Agent Registry        interrupt()
                                                (YAML config)        (HITL safety)
                                                      ↑
                                              PostgresSaver
                                            (checkpoint persistence)

Decision	Choice	Rationale
Agent orchestration	`langgraph-supervisor` 0.0.31 (on LangGraph 1.x)	Built-in supervisor with middleware. Don't rebuild. [Layer 1]
MCP integration	`langchain-mcp-adapters` + `@tool`	MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1]
Checkpointer	PostgresSaver from day one (app + tests)	Phase 4 analytics/replay needs queryable data. Docker Compose.
LLM provider	LangChain `BaseChatModel` + env config	No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1]
Streaming	FastAPI WebSocket + `astream_events()`	Built-in. No custom streaming layer. [Layer 1]
OpenAPI import	Full MCP server generation + LLM classification + human review	Parse spec → generate tools → LLM classifies read/write/params → operator reviews
OpenAPI import UX	Async background task with WebSocket progress	Don't block chat during import
Replay	Custom paginated API endpoint	Not raw `get_state_history()`. Design for 200+ turn threads.
Interrupt TTL	Auto-cancel + retry offer after 30 min	Stale approvals are dangerous. Re-evaluate current state on retry.
Routing fallback	General-purpose fallback agent	Catches misroutes. TODO for routing accuracy eval.
Resolution metric	Tool call success + no escalation	Honest starting definition. Refine with customer satisfaction signals.
Cost tracking	LangChain callback logging tokens per conversation	Surface cost-per-resolution in analytics.
SSRF protection	Block private IPs + DNS rebinding protection	Mandatory for OpenAPI URL fetching. Build as standalone utility.
DB error handling	try/except around graph invocation	Return clear error message to user, don't fail silently.
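
The Replay decision above calls for a paginated endpoint rather than returning a full state history at once. A minimal sketch of offset-based paging is shown here, assuming the replay steps are already loaded as a list of dicts. The names `ReplayPage` and `paginate_steps` and the default page size of 50 are illustrative assumptions, not part of the plan; a real endpoint would more likely push the limit and offset into the Postgres query.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReplayPage:
    """One page of replay steps plus the cursor for the next page."""
    steps: list
    next_cursor: Optional[int]  # None when there are no more steps


def paginate_steps(steps: Sequence[dict], cursor: int = 0, limit: int = 50) -> ReplayPage:
    """Return steps[cursor : cursor + limit] and the offset to resume from.

    Hypothetical helper: the plan only states that replay must be paginated.
    """
    if cursor < 0 or limit < 1:
        raise ValueError("cursor must be >= 0 and limit must be >= 1")
    window = list(steps[cursor:cursor + limit])
    end = cursor + len(window)
    # Only hand back a cursor if there is something left to fetch.
    next_cursor = end if end < len(steps) else None
    return ReplayPage(steps=window, next_cursor=next_cursor)
```

A client would call this repeatedly, passing the returned `next_cursor` back in, until it receives `None`. A 200+ turn thread is then never serialized in a single response.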

Project Structure

smart-support/
├── backend/
│   ├── app/
│   │   ├── main.py          # FastAPI app + WebSocket
│   │   ├── graph.py         # LangGraph supervisor setup
│   │   ├── agents/          # Agent definitions + tools
│   │   ├── registry.py      # YAML agent registry loader
│   │   ├── openapi/         # OpenAPI parser + MCP server generator
│   │   ├── replay/          # Replay API endpoint
│   │   ├── analytics/       # Analytics queries + endpoint
│   │   └── callbacks.py     # Token usage logging callback
│   ├── agents.yaml          # Agent registry config
│   ├── templates/           # Vertical templates (e-commerce.yaml, etc.)
│   └── tests/
├── frontend/                # React chat UI + replay + dashboard
├── docker-compose.yml       # Postgres + app
└── pyproject.toml

Phasing (6-7 weeks)

Phase 1 (Weeks 1-3): Core Framework

FastAPI backend with WebSocket for chat
LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
PostgresSaver checkpointer via Docker Compose
YAML-based agent registry with validation
React chat UI with streaming tokens
Agent personality via YAML config
Basic interrupt() flow for write operations
Fallback agent for misrouted queries
Token usage logging callback
Try/except for DB errors
Integration checkpoint: End of week 3, full chat loop works end-to-end
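
The "YAML-based agent registry with validation" item could be sketched roughly as below. The plan does not define the registry schema, so the field names (`agents`, `name`, `description`, `tools`) are assumptions. The function works on the dict that a YAML loader would return, so the sketch itself needs no third-party imports.

```python
# Assumed schema: each agent entry needs these fields with these types.
REQUIRED_FIELDS = {"name": str, "description": str, "tools": list}


def validate_registry(config: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    agents = config.get("agents")
    if not isinstance(agents, list) or not agents:
        return ["'agents' must be a non-empty list"]

    errors: list[str] = []
    seen_names: set[str] = set()
    for index, agent in enumerate(agents):
        if not isinstance(agent, dict):
            errors.append(f"agents[{index}]: must be a mapping")
            continue
        # Check presence and type of every required field.
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(agent.get(field), expected):
                errors.append(
                    f"agents[{index}]: '{field}' must be a {expected.__name__}"
                )
        # Agent names are used for routing, so they must be unique.
        name = agent.get("name")
        if isinstance(name, str):
            if name in seen_names:
                errors.append(f"agents[{index}]: duplicate name '{name}'")
            seen_names.add(name)
    return errors
```

Collecting all errors instead of raising on the first one lets the operator fix a broken `agents.yaml` in a single pass.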

Phase 2 (Weeks 3-4): Multi-Agent + Safety

Full supervisor routing with intent classification
Webhook escalation (HTTP POST to configured URL + retry)
Vertical templates (YAML configs for e-commerce, SaaS)
Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
Integration checkpoint: End of week 4, multi-agent routing + interrupt flow works
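
The expired-interrupt rule (auto-cancel plus a retry offer after the 30-minute TTL) can be illustrated with a small decision function. This is only a sketch: it assumes the time at which the interrupt was raised is stored alongside the pending approval, and the names `resolve_approval` and the returned status strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

INTERRUPT_TTL = timedelta(minutes=30)  # TTL taken from the plan


def is_interrupt_expired(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the pending approval is at least INTERRUPT_TTL old."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= INTERRUPT_TTL


def resolve_approval(created_at: datetime, approved: bool,
                     now: Optional[datetime] = None) -> str:
    """Decide what to do when the operator or customer answers an interrupt."""
    if is_interrupt_expired(created_at, now):
        # Stale approval: never execute it. The caller should cancel the
        # pending action and offer a retry that re-reads current state.
        return "expired"
    return "execute" if approved else "cancelled"
```

Checking the TTL before looking at the answer means a late "approve" can never trigger a write against state that may have changed.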

Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery

Parse OpenAPI 3.0 specs from user-provided URLs
SSRF protection (block private IPs, DNS rebinding, URL allowlist)
Generate full MCP servers wrapping each endpoint
LLM-assisted endpoint classification (read/write, customer params, agent groupings)
Operator review/correction UI for classifications
Auto-generate agent YAML from classified spec
Async import with WebSocket progress updates
Integration checkpoint: End of week 6, paste a real API spec → tools work in chat
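
A rough sketch of the standalone SSRF utility is shown below, using only the standard library. It assumes the approach of resolving the host once, rejecting the URL if any resolved address is private, loopback, link-local or otherwise non-public, and returning the vetted addresses so the caller can connect to exactly those IPs (which is one common way to limit DNS rebinding). The function names and the injectable `resolver` parameter are illustrative, and the URL allowlist mentioned above is not covered.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(address: str) -> bool:
    """True for addresses that must never be fetched on a user's behalf."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def _resolve(host: str) -> list[str]:
    """Resolve a hostname to the sorted set of its IP addresses."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def check_url(url: str, resolver=_resolve) -> list[str]:
    """Validate a spec URL and return the vetted IPs to connect to.

    Raises ValueError if the scheme is not http(s) or any resolved
    address is blocked. The caller should pin the connection to the
    returned addresses instead of resolving the hostname again.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http(s) URLs with a host are allowed")
    addresses = resolver(parts.hostname)
    if not addresses:
        raise ValueError("host did not resolve")
    for address in addresses:
        if is_blocked_ip(address):
            raise ValueError(f"blocked address: {address}")
    return addresses
```

Passing the resolver in as a parameter keeps the check testable without network access, which fits the plan's intent to build this as a standalone utility in its own lane.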

Phase 4 (Weeks 6-7): Analytics + Replay

Custom paginated replay API endpoint
Replay UI (step-by-step timeline in React)
Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
Analytics dashboard UI with zero-state handling
Resolution rate = share of conversations with a successful tool call and no escalation
Integration checkpoint: End of week 7, full product demo ready
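
The resolution and cost metrics above could be computed along these lines. This is a sketch under assumptions: the `ConversationStats` record and its fields are hypothetical, and in practice the values would come from queries over the checkpoint and token-usage data rather than from in-memory objects.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConversationStats:
    """Per-conversation facts needed for the dashboard (assumed shape)."""
    tool_call_succeeded: bool
    escalated: bool
    cost_usd: float


def is_resolved(conv: ConversationStats) -> bool:
    """The plan's starting definition: tool call success and no escalation."""
    return conv.tool_call_succeeded and not conv.escalated


def resolution_rate(conversations: Sequence[ConversationStats]) -> float:
    """Fraction of conversations that count as resolved (0.0 when empty)."""
    if not conversations:
        return 0.0  # zero-state: no data yet
    return sum(is_resolved(c) for c in conversations) / len(conversations)


def cost_per_resolution(conversations: Sequence[ConversationStats]) -> Optional[float]:
    """Total spend divided by resolved conversations, or None if none resolved."""
    resolved = sum(is_resolved(c) for c in conversations)
    if resolved == 0:
        return None  # avoid dividing by zero on an empty dashboard
    return sum(c.cost_usd for c in conversations) / resolved
```

Note that cost-per-resolution here divides total spend, including unresolved conversations, by the number of resolutions. Whether unresolved conversations should count toward the numerator is a product decision the plan leaves open.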

Phase 5 (Buffer): Polish + Demo Prep

Error handling hardening
Demo script and sample data
Docker Compose for full-stack deployment

Tech Stack

Python 3.11+, FastAPI, LangGraph 1.x (currently 1.1.6)
langgraph-supervisor 0.0.31, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
React (frontend), PostgreSQL 16 (via Docker Compose)
Claude Sonnet 4.6 via ChatAnthropic (configurable via env)
pytest + FastAPI TestClient for backend tests
openapi-spec-validator for spec validation
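
Since the model is meant to be configurable via `LLM_PROVIDER` and `LLM_MODEL`, the env handling might look like the sketch below. The `LLMSettings` type, the `load_llm_settings` helper and the set of accepted provider names are assumptions for illustration; the plan names only the two variables. The sketch stops at validated settings and does not construct a chat model.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed provider names, matching the LangChain integrations in use.
SUPPORTED_PROVIDERS = {"anthropic", "openai"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read and validate LLM_PROVIDER and LLM_MODEL from the environment."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(SUPPORTED_PROVIDERS)}, got {provider!r}"
        )
    if not model:
        raise ValueError("LLM_MODEL must be set")
    return LLMSettings(provider=provider, model=model)
```

The application would then map `settings.provider` to the matching `BaseChatModel` subclass (for example `ChatAnthropic`) at startup, so a misconfigured environment fails fast instead of on the first chat message.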

NOT in scope

Authentication/authorization (deferred to pre-production)
Multi-tenant architecture (deferred to first paid customer)
CI/CD pipeline (manual deploy for prototype)
Rate limiting (deferred to pre-production)
Zendesk/Intercom marketplace integration (deferred to post-validation)
Mobile-responsive chat UI (desktop-only for demo)
Internationalization/i18n
Billing/pricing infrastructure
Distribution pipeline (manual Docker Compose deploy)

What already exists (reuse, don't rebuild)

langgraph-supervisor — agent orchestration
langgraph-checkpoint-postgres — state persistence
LangGraph interrupt() — human-in-the-loop
langchain-mcp-adapters (MultiServerMCPClient) — MCP tool integration
LangChain BaseChatModel — LLM provider abstraction
FastAPI WebSocket + astream_events() — streaming
openapi-spec-validator — OpenAPI spec validation

Testing Strategy

TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.

45 paths identified: 33 code paths and 12 user flows, 6 of which are covered by E2E tests.

Key test categories:

Graph tests — invoke supervisor with mock tools, assert routing + state
MCP tool tests — mock external HTTP, test structured responses
WebSocket tests — FastAPI TestClient, test message → response cycle
Interrupt tests — test approval, rejection, and TTL expiry flows
OpenAPI tests — test spec parsing, SSRF blocking, MCP generation
E2E tests — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)

Failure Modes

Codepath	Failure	Mitigation
LLM API call	Timeout/rate limit	Error message to user
MCP tool call	External API down	Escalation + error message
Interrupt resume	30-min TTL expired	Auto-cancel + retry offer
PostgresSaver	DB connection lost	try/except + user-facing error
OpenAPI URL fetch	SSRF attempt	Block private IPs + DNS rebinding
Supervisor routing	Wrong agent	Fallback agent catches misroutes
Webhook POST	Target unreachable	Retry with backoff + log
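
The "Retry with backoff + log" mitigation for webhook escalation can be sketched as follows. The attempt count, base delay and the `post_with_retry` name are assumptions; the plan does not fix a retry policy. The actual HTTP call is passed in as a callable so the sketch stays independent of any HTTP client.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("webhook")


def post_with_retry(
    send: Callable[[], None],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Returns True on success and False once every attempt has failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return True
        except OSError as exc:  # covers connection errors and timeouts
            logger.warning(
                "webhook attempt %d/%d failed: %s", attempt, max_attempts, exc
            )
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return False
```

A `False` return leaves the caller free to surface the failed escalation to the user rather than dropping it silently.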

Parallelization Strategy

Lane	Steps	Modules
A	Phase 1 backend + Phase 2	backend/app/
B	Phase 1 frontend	frontend/
C	SSRF utility (standalone)	backend/app/openapi/ssrf.py

Launch lanes A, B, and C in parallel. Merge after Phase 1. Phases 3-4 are sequential because they depend on the core.

Verification

docker compose up — Postgres + app starts
Open http://localhost:8000 — chat UI loads
Send "What's the status of order 1042?" — get streaming response
Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
pytest --cov — 80%+ coverage
Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
View replay of completed conversation (Phase 4)
View analytics dashboard (Phase 4)

GSTACK REVIEW REPORT

Review	Trigger	Why	Runs	Status	Findings
CEO Review	`/plan-ceo-review`	Scope & strategy	1	CLEAR	6 proposals, 6 accepted, 0 deferred
Codex Review	`/codex review`	Independent 2nd opinion	0	—	—
Eng Review	`/plan-eng-review`	Architecture & tests (required)	2	CLEAR	10 issues, 0 critical gaps
Design Review	`/plan-design-review`	UI/UX gaps	0	—	—

OUTSIDE VOICE: Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
UNRESOLVED: 0
VERDICT: CEO + ENG CLEARED — ready to implement
