Smart Support - AI customer service action layer framework. Includes design doc, CEO plan, eng review, test plan, and README.
195 lines
9.6 KiB
Markdown
195 lines
9.6 KiB
Markdown
# Smart Support Framework — Eng Review Plan
|
|
|
|
## Context
|
|
|
|
Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all CEO review expansions (6 features) with re-sequenced phasing (core first). Timeline extended to 6-7 weeks per outside voice feedback.
|
|
|
|
No code exists yet. Greenfield project.
|
|
|
|
## Architecture Decisions
|
|
|
|
```
|
|
Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
|
|
↑ ↑
|
|
Agent Registry interrupt()
|
|
(YAML config) (HITL safety)
|
|
↑
|
|
PostgresSaver
|
|
(checkpoint persistence)
|
|
```
|
|
|
|
| Decision | Choice | Rationale |
|
|
|----------|--------|-----------|
|
|
| Agent orchestration | `langgraph-supervisor` v1.1 | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
|
|
| MCP integration | `langchain-mcp-adapters` + `@tool` | MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1] |
|
|
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Docker Compose. |
|
|
| LLM provider | LangChain `BaseChatModel` + env config | No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1] |
|
|
| Streaming | FastAPI WebSocket + `astream_events()` | Built-in. No custom streaming layer. [Layer 1] |
|
|
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
|
|
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
|
|
| Replay | Custom paginated API endpoint | Not raw `get_state_history()`. Design for 200+ turn threads. |
|
|
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
|
|
| Routing fallback | General-purpose fallback agent | Catches misroutes. TODO for routing accuracy eval. |
|
|
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
|
|
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
|
|
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
|
|
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |
|
|
|
|
## Project Structure
|
|
|
|
```
|
|
smart-support/
|
|
├── backend/
|
|
│ ├── app/
|
|
│ │ ├── main.py # FastAPI app + WebSocket
|
|
│ │ ├── graph.py # LangGraph supervisor setup
|
|
│ │ ├── agents/ # Agent definitions + tools
|
|
│ │ ├── registry.py # YAML agent registry loader
|
|
│ │ ├── openapi/ # OpenAPI parser + MCP server generator
|
|
│ │ ├── replay/ # Replay API endpoint
|
|
│ │ ├── analytics/ # Analytics queries + endpoint
|
|
│ │ └── callbacks.py # Token usage logging callback
|
|
│ ├── agents.yaml # Agent registry config
|
|
│ ├── templates/ # Vertical templates (e-commerce.yaml, etc.)
|
|
│ └── tests/
|
|
├── frontend/ # React chat UI + replay + dashboard
|
|
├── docker-compose.yml # Postgres + app
|
|
└── pyproject.toml
|
|
```
|
|
|
|
## Phasing (6-7 weeks)
|
|
|
|
### Phase 1 (Weeks 1-3): Core Framework
|
|
- FastAPI backend with WebSocket for chat
|
|
- LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
|
|
- PostgresSaver checkpointer via Docker Compose
|
|
- YAML-based agent registry with validation
|
|
- React chat UI with streaming tokens
|
|
- Agent personality via YAML config
|
|
- Basic interrupt() flow for write operations
|
|
- Fallback agent for misrouted queries
|
|
- Token usage logging callback
|
|
- Try/except for DB errors
|
|
- **Integration checkpoint:** End of week 3, full chat loop works end-to-end
|
|
|
|
### Phase 2 (Weeks 3-4): Multi-Agent + Safety
|
|
- Full supervisor routing with intent classification
|
|
- Webhook escalation (HTTP POST to configured URL + retry)
|
|
- Vertical templates (YAML configs for e-commerce, SaaS)
|
|
- Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
|
|
- **Integration checkpoint:** End of week 4, multi-agent routing + interrupt flow works
|
|
|
|
### Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery
|
|
- Parse OpenAPI 3.0 specs from user-provided URLs
|
|
- SSRF protection (block private IPs, DNS rebinding, URL allowlist)
|
|
- Generate full MCP servers wrapping each endpoint
|
|
- LLM-assisted endpoint classification (read/write, customer params, agent groupings)
|
|
- Operator review/correction UI for classifications
|
|
- Auto-generate agent YAML from classified spec
|
|
- Async import with WebSocket progress updates
|
|
- **Integration checkpoint:** End of week 6, paste a real API spec → tools work in chat
|
|
|
|
### Phase 4 (Weeks 6-7): Analytics + Replay
|
|
- Custom paginated replay API endpoint
|
|
- Replay UI (step-by-step timeline in React)
|
|
- Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
|
|
- Analytics dashboard UI with zero-state handling
|
|
- Resolution rate = successful tool call + no escalation
|
|
- **Integration checkpoint:** End of week 7, full product demo ready
|
|
|
|
### Phase 5 (Buffer): Polish + Demo Prep
|
|
- Error handling hardening
|
|
- Demo script and sample data
|
|
- Docker Compose for full-stack deployment
|
|
|
|
## Tech Stack
|
|
|
|
- Python 3.11+, FastAPI, LangGraph v1.1.0
|
|
- langgraph-supervisor, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
|
|
- React (frontend), PostgreSQL 16 (via Docker Compose)
|
|
- Claude Sonnet 4.6 via `ChatAnthropic` (configurable via env)
|
|
- pytest + FastAPI TestClient for backend tests
|
|
- openapi-spec-validator for spec validation
|
|
|
|
## NOT in scope
|
|
|
|
- Authentication/authorization (deferred to pre-production)
|
|
- Multi-tenant architecture (deferred to first paid customer)
|
|
- CI/CD pipeline (manual deploy for prototype)
|
|
- Rate limiting (deferred to pre-production)
|
|
- Zendesk/Intercom marketplace integration (deferred to post-validation)
|
|
- Mobile-responsive chat UI (desktop-only for demo)
|
|
- Internationalization/i18n
|
|
- Billing/pricing infrastructure
|
|
- Distribution pipeline (manual Docker Compose deploy)
|
|
|
|
## What already exists (reuse, don't rebuild)
|
|
|
|
- `langgraph-supervisor` — agent orchestration
|
|
- `langgraph-checkpoint-postgres` — state persistence
|
|
- LangGraph `interrupt()` — human-in-the-loop
|
|
- `langchain-mcp-adapters` (`MultiServerMCPClient`) — MCP tool integration
|
|
- LangChain `BaseChatModel` — LLM provider abstraction
|
|
- FastAPI WebSocket + `astream_events()` — streaming
|
|
- `openapi-spec-validator` — OpenAPI spec validation
|
|
|
|
## Testing Strategy
|
|
|
|
TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.
|
|
|
|
45 codepaths identified (33 code paths + 12 user flows, 6 E2E).
|
|
|
|
Key test categories:
|
|
1. **Graph tests** — invoke supervisor with mock tools, assert routing + state
|
|
2. **MCP tool tests** — mock external HTTP, test structured responses
|
|
3. **WebSocket tests** — FastAPI TestClient, test message → response cycle
|
|
4. **Interrupt tests** — test approval, rejection, and TTL expiry flows
|
|
5. **OpenAPI tests** — test spec parsing, SSRF blocking, MCP generation
|
|
6. **E2E tests** — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)
|
|
|
|
## Failure Modes
|
|
|
|
| Codepath | Failure | Mitigation |
|
|
|----------|---------|------------|
|
|
| LLM API call | Timeout/rate limit | Error message to user |
|
|
| MCP tool call | External API down | Escalation + error message |
|
|
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
|
|
| PostgresSaver | DB connection lost | try/except + user-facing error |
|
|
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
|
|
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
|
|
| Webhook POST | Target unreachable | Retry with backoff + log |
|
|
|
|
## Parallelization Strategy
|
|
|
|
| Lane | Steps | Modules |
|
|
|------|-------|---------|
|
|
| A | Phase 1 backend + Phase 2 | backend/app/ |
|
|
| B | Phase 1 frontend | frontend/ |
|
|
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |
|
|
|
|
Launch A + B + C in parallel. Merge after Phase 1. Phase 3-4 are sequential (depend on core).
|
|
|
|
## Verification
|
|
|
|
1. `docker compose up` — Postgres + app starts
|
|
2. Open `http://localhost:8000` — chat UI loads
|
|
3. Send "What's the status of order 1042?" — get streaming response
|
|
4. Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
|
|
5. `pytest --cov` — 80%+ coverage
|
|
6. Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
|
|
7. View replay of completed conversation (Phase 4)
|
|
8. View analytics dashboard (Phase 4)
|
|
|
|
## GSTACK REVIEW REPORT
|
|
|
|
| Review | Trigger | Why | Runs | Status | Findings |
|
|
|--------|---------|-----|------|--------|----------|
|
|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
|
|
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
|
|
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
|
|
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | — |
|
|
|
|
- **OUTSIDE VOICE:** Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
|
|
- **UNRESOLVED:** 0
|
|
- **VERDICT:** CEO + ENG CLEARED — ready to implement
|