feat: complete phase 5 -- error hardening, frontend, Docker, demo, docs

Backend:
- ConversationTracker: Protocol + PostgresConversationTracker for lifecycle tracking
- Error handler: ErrorCategory enum, classify_error(), with_retry() exponential backoff
- Wire PostgresAnalyticsRecorder + ConversationTracker into ws_handler
- Rate limiting (10 msg/10s per thread), edge case hardening
- Health endpoint GET /api/health, version 0.5.0
- Demo seed data script + sample OpenAPI spec

Frontend (all new):
- React Router with NavBar (Chat / Replay / Dashboard / Review)
- ReplayListPage + ReplayPage with ReplayTimeline component
- DashboardPage with MetricCard, range selector, zero-state
- ReviewPage for OpenAPI classification review
- ErrorBanner for WebSocket disconnect handling
- API client (api.ts) with typed fetch wrappers

Infrastructure:
- Frontend Dockerfile (multi-stage node -> nginx)
- nginx.conf with SPA routing + API/WS proxy
- docker-compose.yml with frontend service + healthchecks
- .env.example files (root + backend)

Documentation:
- README.md with quick start and architecture
- Agent configuration guide
- OpenAPI import guide
- Deployment guide
- Demo script

48 new tests, 449 total passing, 92.87% coverage
Yaojia Wang
2026-03-31 21:20:06 +02:00
parent 38644594d2
commit 0e78e5b06b
44 changed files with 3397 additions and 169 deletions

README.md

@@ -1,159 +1,165 @@
# Smart Support
An action-layer framework for AI customer support. Paste your API and get an intelligent support agent that performs real operations.
AI customer support action layer. Paste your API spec, get an AI agent that executes real actions.
## Problem
## The Problem
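Existing support tools (Zendesk, Intercom, Ada) are good at answering FAQs, but their automation rate is stuck at 20-30%. The remaining 70% of tickets require a human to log into internal systems and manually look up orders, cancel orders, and issue coupons.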
Existing support tools (Zendesk, Intercom, Ada) answer FAQs well, but automation
rates stall at 20-30%. The remaining 70% of tickets require human agents to log
into internal systems and manually look up orders, cancel orders, and issue coupons.
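Smart Support is the "action layer" that fills this gap. It does not replace your existing support platform; it lets AI call internal systems directly to complete operations.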
Smart Support fills that gap as the "action layer". It does not replace your
existing support platform; it lets AI call your internal systems directly.
## How It Works
## How It Works
```
Customer message → Chat UI → FastAPI WebSocket → LangGraph Supervisor → Specialist Agent → MCP Tools → Your internal systems
                                                    Agent Registry         interrupt()
                                                    (YAML config)     (human confirmation)
                                                    PostgresSaver
                                             (session state persistence)
User message -> Chat UI -> FastAPI WebSocket -> LangGraph Supervisor -> Specialist Agent -> MCP Tools -> Your systems
                                                         |                      |
                                                   Agent Registry          interrupt()
                                                   (YAML config)        (human approval)
                                                         |
                                                   PostgresSaver
                                               (session persistence)
```
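1. The customer sends a message in the chat UI
2. The LangGraph Supervisor analyzes intent and routes to the matching specialist agent
3. The agent calls your internal systems over the MCP protocol (look up orders, cancel orders, issue discounts...)
4. Write operations automatically trigger a human confirmation flow
5. Every operation is recorded end to end, with support for replay and analytics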
1. User sends a message in the chat UI.
2. LangGraph Supervisor classifies intent and routes to the right agent.
3. Agent calls your internal systems via MCP tools.
4. Write operations trigger a human-in-the-loop approval gate.
5. All operations are logged with full replay and analytics.
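The five steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's LangGraph code: `Agent`, `route`, `handle`, the keyword-based intent match, and the in-memory `AUDIT_LOG` are all hypothetical stand-ins for the Supervisor, the specialist agents, the MCP tool calls, and the audit store.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agent:
    name: str
    permission: str            # "read" or "write"
    keywords: tuple[str, ...]  # toy stand-in for LLM intent classification
    run: Callable[[str], str]  # toy stand-in for an MCP tool call


# In-memory stand-in for the audit trail (hypothetical).
AUDIT_LOG: list[dict] = []


def route(message: str, agents: list[Agent]) -> Optional[Agent]:
    """Step 2: pick the first agent whose keywords appear in the message."""
    text = message.lower()
    for agent in agents:
        if any(keyword in text for keyword in agent.keywords):
            return agent
    return None


def handle(message: str, agents: list[Agent],
           approve: Callable[[str, str], bool]) -> str:
    """Steps 2-5: route, gate write operations on approval, run, and log."""
    agent = route(message, agents)
    if agent is None:
        return "escalated"
    # Step 4: write operations need an explicit approval before running.
    if agent.permission == "write" and not approve(agent.name, message):
        AUDIT_LOG.append({"agent": agent.name, "message": message, "result": "rejected"})
        return "rejected"
    result = agent.run(message)  # Step 3: call the tool.
    AUDIT_LOG.append({"agent": agent.name, "message": message, "result": result})
    return result
```

In this sketch, `approve` plays the role of the human-in-the-loop gate: read agents never call it, and a write agent runs only when it returns `True`.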
## Core Features
## Key Features
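- **Multi-agent collaboration** - different operations are handled by different agents, each with its own permission boundary and toolset
- **Plug and play** - paste an OpenAPI spec URL to generate MCP tools and agent configuration automatically
- **Human confirmation** - all write operations (cancel, refund, modify) require human approval; read operations execute directly
- **Session context** - supports multi-turn conversations; the agent understands references such as "cancel that order"
- **Real-time streaming** - bidirectional WebSocket communication, streamed token by token
- **Conversation replay** - step through the agent's decisions, tool calls, and returned results
- **Analytics** - resolution rate, agent usage, escalation rate, cost per conversation
- **YAML-driven configuration** - agent definitions, personas, and vertical templates are all configured in YAML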
- **Multi-agent routing** -- each operation goes to a specialist agent with its own tools and permissions
- **Zero-config import** -- paste an OpenAPI 3.0 URL, agents are generated automatically
- **Human-in-the-loop** -- all write operations (cancel, refund, modify) require approval; reads execute immediately
- **Session context** -- multi-turn conversation with persistent state across reconnects
- **Real-time streaming** -- WebSocket token streaming with live tool call visibility
- **Conversation replay** -- step-by-step audit trail of every agent decision
- **Analytics dashboard** -- resolution rate, agent usage, escalation rate, cost per conversation
- **YAML-driven config** -- agents, personas, and vertical templates in a single file
## Tech Stack
## Tech Stack
| Component | Technology |
|------|---------|
| Backend | Python 3.11+, FastAPI |
| Agent orchestration | LangGraph v1.1, langgraph-supervisor |
| Tool integration | langchain-mcp-adapters, @tool |
| State persistence | PostgreSQL + langgraph-checkpoint-postgres |
| LLM | Claude Sonnet 4.6 (switchable to OpenAI, Google, etc.) |
| Frontend | React |
| Deployment | Docker Compose |
| Component | Technology |
|-----------|-----------|
| Backend | Python 3.11+, FastAPI |
| Agent orchestration | LangGraph v1.1 |
| Session state | PostgreSQL + langgraph-checkpoint-postgres |
| LLM | Claude Sonnet 4.6 (configurable: OpenAI, Google) |
| Frontend | React 19, TypeScript, Vite |
| Deployment | Docker Compose |
## Project Structure
## Quick Start
```bash
git clone <repo-url>
cd smart-support
# Configure your LLM API key
cp .env.example .env
# Edit .env: set ANTHROPIC_API_KEY (or OPENAI_API_KEY)
# Start all services
docker compose up -d
# Open the app
open http://localhost
```
## Project Structure
```
smart-support/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI + WebSocket entry point
│ │ ├── graph.py # LangGraph Supervisor configuration
│ │ ├── agents/ # Agent definitions + tools
│ │ ├── registry.py # YAML agent registry loader
│ │ ├── openapi/ # OpenAPI parsing + MCP server generation
│ │ ├── replay/ # Conversation replay API
│ │ ├── analytics/ # Analytics queries + API
│ │ └── callbacks.py # Token usage tracking
│ ├── agents.yaml # Agent registry configuration
│ ├── templates/ # Vertical industry templates
│ └── tests/
├── frontend/ # React chat UI + replay + dashboard
├── docker-compose.yml # PostgreSQL + app
└── pyproject.toml
│ │ ├── main.py # FastAPI + WebSocket entry point
│ │ ├── graph.py # LangGraph Supervisor
│ │ ├── ws_handler.py # WebSocket message dispatch + rate limiting
│ │ ├── conversation_tracker.py # Conversation lifecycle tracking
│ │ ├── agents/ # Agent definitions and tools
│ │ ├── registry.py # YAML agent registry loader
│ │ ├── openapi/ # OpenAPI parser and review API
│ │ ├── replay/ # Conversation replay API
│ │ ├── analytics/ # Analytics queries and API
│ │ └── tools/ # Error handling and retry utilities
│ ├── agents.yaml # Agent registry configuration
│ ├── fixtures/ # Demo data and sample OpenAPI spec
│ └── tests/ # Unit, integration, and E2E tests
├── frontend/
│ ├── src/
│ │ ├── pages/ # Chat, Replay, Dashboard, Review pages
│ │ ├── components/ # NavBar, Layout, MetricCard, ReplayTimeline
│ │ ├── hooks/ # useWebSocket with reconnect support
│ │ └── api.ts # Typed API client
│ └── Dockerfile # Multi-stage nginx build
├── docs/ # Architecture, deployment, guides
├── docker-compose.yml # Full-stack compose
└── .env.example # Environment variable template
```
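The tree lists `tools/` as the home of error handling and retry utilities, and the commit message names a `with_retry()` helper with exponential backoff. Its real signature is not shown here, so the following is only a minimal sketch of the technique: the parameters, the `TransientError` class, and the default delays are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, 5xx responses)."""


def with_retry(fn: Callable[[], T], max_attempts: int = 3,
               base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on TransientError with delays of base_delay * 2**n."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay doubles after each failed attempt: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Errors other than `TransientError` propagate immediately, which matches the idea of classifying errors before deciding to retry.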
## Quick Start
```bash
# Start PostgreSQL and the app
docker compose up
# Open the chat UI
open http://localhost:8000
```
## Agent Configuration Example
## Agent Configuration
```yaml
# agents.yaml
agents:
  - name: order_lookup
    description: Look up order status and shipping information
    permission: read
    personality:
      tone: professional
      greeting: "Hello, I can help you look up your order."
    tools:
      - get_order_status
      - get_tracking_info
  - name: order_actions
    description: Cancel or modify orders
    permission: write # triggers human approval
    personality:
      tone: careful
      greeting: "I can help with order changes. Every action is confirmed with you first."
    tools:
      - cancel_order
      - modify_order
  - name: discount
    description: Issue coupons and discount codes
  - name: order_agent
    description: "Handles order status, tracking, and cancellations."
    permission: write
    tools:
      - apply_discount
      - generate_coupon
      - get_order_status
      - cancel_order
    personality:
      tone: friendly
      greeting: "I can help with your order. What is the order number?"
      escalation_message: "I'm escalating this to a human agent."
  - name: general_agent
    description: "Answers general questions and FAQs."
    permission: read
    tools:
      - search_faq
```
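A registry shaped like the YAML above could be validated after loading along these lines. The sketch takes the already-parsed document as a plain dict (for example, the result of PyYAML's `yaml.safe_load`). `AgentConfig` and `load_registry` are hypothetical names, and the real `registry.py` may enforce different rules.

```python
from dataclasses import dataclass

VALID_PERMISSIONS = {"read", "write"}


@dataclass(frozen=True)
class AgentConfig:
    name: str
    description: str
    permission: str
    tools: tuple[str, ...]

    @property
    def requires_approval(self) -> bool:
        # Assumption: "write" agents go through the human approval gate.
        return self.permission == "write"


def load_registry(raw: dict) -> dict[str, AgentConfig]:
    """Validate parsed agents.yaml content and index agents by name."""
    registry: dict[str, AgentConfig] = {}
    for entry in raw.get("agents", []):
        name = entry.get("name")
        if not name:
            raise ValueError("every agent needs a name")
        if name in registry:
            raise ValueError(f"duplicate agent name: {name}")
        permission = entry.get("permission")
        if permission not in VALID_PERMISSIONS:
            raise ValueError(f"{name}: permission must be 'read' or 'write'")
        tools = tuple(entry.get("tools") or ())
        if not tools:
            raise ValueError(f"{name}: at least one tool is required")
        registry[name] = AgentConfig(
            name=name,
            description=entry.get("description", ""),
            permission=permission,
            tools=tools,
        )
    return registry
```

Requiring an explicit `permission` rather than defaulting it is a deliberate choice in this sketch: a missing field fails loudly instead of silently granting or denying write access.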
## Automatic OpenAPI Integration
## API Endpoints
No need to write MCP connectors by hand. Paste your API spec URL:
| Method | Path | Description |
|--------|------|-------------|
| WS | `/ws` | Main WebSocket chat endpoint |
| GET | `/api/health` | Health check |
| GET | `/api/conversations` | List conversations |
| GET | `/api/replay/{thread_id}` | Replay conversation |
| GET | `/api/analytics` | Analytics summary |
| POST | `/api/openapi/import` | Import OpenAPI spec |
| GET | `/api/openapi/jobs/{id}` | Check import job status |
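1. The framework parses the OpenAPI 3.0 spec
2. An LLM classifies each endpoint automatically (read/write, customer parameters, agent grouping)
3. An operator reviews the classification results
4. The MCP server and agent YAML configuration are generated automatically
5. The new tools are available immediately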
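The classification step can be illustrated with a much simpler rule than the LLM the steps describe. The sketch below labels each operation in a parsed OpenAPI document as read or write purely from its HTTP method and flags everything for operator review. `classify_operations` and the output fields are hypothetical, and the method-based rule is a stand-in for the LLM classifier, not a description of it.

```python
# Methods treated as side-effect free in this sketch.
READ_METHODS = {"get", "head", "options", "trace"}
HTTP_METHODS = READ_METHODS | {"post", "put", "patch", "delete"}


def classify_operations(spec: dict) -> list[dict]:
    """Walk an OpenAPI `paths` object and label each operation read or write."""
    results = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            key = method.lower()
            if key not in HTTP_METHODS:
                continue  # Skip non-operation keys such as "parameters".
            results.append({
                # Fall back to a generated name when operationId is absent.
                "tool": operation.get("operationId") or f"{key}_{path}",
                "method": key.upper(),
                "path": path,
                "permission": "read" if key in READ_METHODS else "write",
                "needs_review": True,  # An operator confirms every label.
            })
    return results
```

A method-only rule misclassifies APIs that mutate state on `GET` or use `POST` for searches, which is one reason a review step after classification is useful.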
## Security
## Security Design
- **SSRF protection** -- OpenAPI import blocks private IPs and metadata service URLs
- **Input validation** -- messages validated for size (32 KB), content length (10 KB), thread ID format
- **Rate limiting** -- 10 messages per 10 seconds per session
- **Audit trail** -- every tool call logged with agent, params, result, timestamp
- **Permission isolation** -- each agent only accesses its configured tools
- **Interrupt TTL** -- unanswered approval prompts expire after 30 minutes
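The rate limit listed above (10 messages per 10 seconds) is commonly implemented as a sliding window. The following is a minimal sketch of that technique, not the project's `ws_handler.py` code. The class name, the injectable `clock`, and the string key are assumptions; the key can be a session ID or a thread ID, whichever the limit is scoped to.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most max_messages per key within any window_seconds span."""

    def __init__(self, max_messages: int = 10, window_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._clock = clock
        # One queue of accepted-message timestamps per key.
        self._events: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_messages:
            return False  # Over the limit: the caller should reject the message.
        events.append(now)
        return True
```

Rejected messages are not recorded, so a client that keeps sending while limited does not extend its own lockout.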
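- **Human confirmation** - all write operations require approval from the customer or an operator
- **SSRF protection** - OpenAPI URL import blocks internal addresses and DNS rebinding attacks
- **Operation audit** - every operation records the agent, parameters, result, and timestamp
- **Permission isolation** - each agent can only access its configured toolset
- **Interrupt timeout** - operations left unconfirmed for 30 minutes are cancelled automatically to prevent stale approvals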
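The SSRF protection described in the list can be sketched with the standard library: resolve the host once, reject the URL if any resolved address is private, loopback, or link-local (which covers the `169.254.169.254` metadata address), and hand the vetted addresses back to the caller. The function names and the injectable `resolver` are assumptions for illustration; the project's actual checks are not shown here.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip: str) -> bool:
    """True for addresses an import request should never reach."""
    addr = ipaddress.ip_address(ip)
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_reserved or addr.is_multicast or addr.is_unspecified)


def resolve_and_check(url: str, resolver=socket.getaddrinfo) -> list[str]:
    """Resolve the URL's host and return its addresses, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("only http(s) URLs with a host are allowed")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    infos = resolver(parsed.hostname, port, type=socket.SOCK_STREAM)
    addresses = [info[4][0] for info in infos]
    if not addresses:
        raise ValueError("host did not resolve")
    blocked = [a for a in addresses if is_blocked_address(a)]
    if blocked:
        raise ValueError(f"blocked address(es): {blocked}")
    return addresses
```

To limit DNS rebinding, the caller would connect to one of the returned addresses rather than resolving the hostname a second time. That connection step is outside this sketch.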
## Running Tests
## Development Phases
```bash
cd backend
pytest --cov=app --cov-report=term-missing
```
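| Phase | Timeline | Scope |
|------|------|------|
| Phase 1 | Weeks 1-3 | Core framework: Chat UI + Supervisor + agent registry + interrupt flow |
| Phase 2 | Weeks 3-4 | Multi-agent routing + webhook escalation + vertical templates |
| Phase 3 | Weeks 4-6 | OpenAPI auto-discovery + MCP server generation + SSRF protection |
| Phase 4 | Weeks 6-7 | Conversation replay + analytics dashboard |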
Coverage is enforced at 80%+.
## Target Users
## Documentation
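Customer experience leads at mid-sized e-commerce companies (500-5,000 orders per day, 5-20 support agents).
Their pain point: support agents constantly switch between Zendesk and the Shopify admin to run lookups and actions by hand. Smart Support lets AI complete these operations directly, with humans approving only the critical steps.
## Related Documents
- [Design Doc](design-doc.md) - problem definition, constraints, solution choices
- [CEO Plan](ceo-plan.md) - product vision, scope decisions
- [Engineering Review Plan](eng-review-plan.md) - architecture decisions, test strategy, failure modes
- [Test Plan](eng-review-test-plan.md) - test paths, edge cases, E2E flows
- [TODOs](TODOS.md) - work deferred to later phases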
- [Architecture](docs/ARCHITECTURE.md) -- System design and component diagram
- [Development Plan](docs/DEVELOPMENT-PLAN.md) -- Phase breakdown and status
- [Agent Config Guide](docs/agent-config-guide.md) -- How to configure agents
- [OpenAPI Import Guide](docs/openapi-import-guide.md) -- Auto-discovery workflow
- [Deployment Guide](docs/deployment.md) -- Docker and production deployment
- [Demo Script](docs/demo-script.md) -- Step-by-step live demo walkthrough
## License