feat: complete phase 5 -- error hardening, frontend, Docker, demo, docs

Backend:
- ConversationTracker: Protocol + PostgresConversationTracker for lifecycle tracking
- Error handler: ErrorCategory enum, classify_error(), with_retry() exponential backoff
- Wire PostgresAnalyticsRecorder + ConversationTracker into ws_handler
- Rate limiting (10 msg/10s per thread), edge case hardening
- Health endpoint GET /api/health, version 0.5.0
- Demo seed data script + sample OpenAPI spec

Frontend (all new):
- React Router with NavBar (Chat / Replay / Dashboard / Review)
- ReplayListPage + ReplayPage with ReplayTimeline component
- DashboardPage with MetricCard, range selector, zero-state
- ReviewPage for OpenAPI classification review
- ErrorBanner for WebSocket disconnect handling
- API client (api.ts) with typed fetch wrappers

Infrastructure:
- Frontend Dockerfile (multi-stage node -> nginx)
- nginx.conf with SPA routing + API/WS proxy
- docker-compose.yml with frontend service + healthchecks
- .env.example files (root + backend)

Documentation:
- README.md with quick start and architecture
- Agent configuration guide
- OpenAPI import guide
- Deployment guide
- Demo script

48 new tests, 449 total passing, 92.87% coverage
2026-03-31 21:20:06 +02:00
parent 38644594d2
commit 0e78e5b06b
44 changed files with 3397 additions and 169 deletions
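
Note on the error handler named in the commit message: the diff shown here covers only README.md, so the actual implementation is not visible. The following is a minimal sketch of how an `ErrorCategory` enum, `classify_error()`, and `with_retry()` with exponential backoff could fit together. Only those three names come from the commit message. The two categories, the classification rule, and the `max_attempts`, `base_delay`, and `max_delay` parameters are assumptions made for illustration.

```python
import asyncio
import random
from enum import Enum


class ErrorCategory(Enum):
    # Hypothetical categories; the real enum may define more.
    TRANSIENT = "transient"  # worth retrying (timeouts, dropped connections)
    PERMANENT = "permanent"  # retrying will not help


def classify_error(exc: BaseException) -> ErrorCategory:
    """Assumed rule: network-style errors are transient, all others permanent."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorCategory.TRANSIENT
    return ErrorCategory.PERMANENT


async def with_retry(func, *, max_attempts=3, base_delay=0.5, max_delay=8.0):
    """Await func(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await func()
        except Exception as exc:
            out_of_attempts = attempt == max_attempts
            if classify_error(exc) is ErrorCategory.PERMANENT or out_of_attempts:
                raise
            # The delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A small random jitter keeps many clients from retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

In this sketch a caller would wrap a tool invocation as `await with_retry(lambda: call_tool(args))`, where `call_tool` is a placeholder name. Permanent errors propagate immediately, so validation failures are not retried.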
--- a/README.md
+++ b/README.md
@@ -1,159 +1,165 @@
 # Smart Support

-AI 客服行动层框架。粘贴你的 API，获得一个能执行真实操作的智能客服。
+AI customer support action layer. Paste your API spec, get an AI agent that executes real actions.

-## 问题
+## The Problem

-现有客服工具（Zendesk、Intercom、Ada）擅长回答 FAQ，但自动化率卡在 20-30%。剩下 70% 的工单需要人工登录内部系统，手动查订单、取消订单、发优惠券。
+Existing support tools (Zendesk, Intercom, Ada) answer FAQs well, but automation
+rates stall at 20-30%. The remaining 70% or more of tickets require agents to
+log into internal systems manually to look up orders, cancel orders, or issue coupons.

-Smart Support 是补全这个缺口的「行动层」。它不替代现有客服平台，而是让 AI 能直接调用内部系统完成操作。
+Smart Support fills that gap as the "action layer": it does not replace your
+existing support platform but lets AI call your internal systems directly.

-## 工作原理
+## How It Works

 ```
-客户消息 → Chat UI → FastAPI WebSocket → LangGraph Supervisor → 专业 Agent → MCP Tools → 你的内部系统
-                                                ↑                      ↑
-                                          Agent 注册表            interrupt()
-                                          (YAML 配置)           (人工确认)
-                                                ↑
-                                          PostgresSaver
-                                         (会话状态持久化)
+User message -> Chat UI -> FastAPI WebSocket -> LangGraph Supervisor -> Specialist Agent -> MCP Tools -> Your systems
+                                                        |                      |
+                                                  Agent Registry          interrupt()
+                                                  (YAML config)         (human approval)
+                                                        |
+                                                  PostgresSaver
+                                               (session persistence)
 ```

-1. 客户在聊天界面发送消息
-2. LangGraph Supervisor 分析意图，路由到对应的专业 Agent
-3. Agent 通过 MCP 协议调用你的内部系统（查订单、取消订单、发折扣...）
-4. 涉及写操作时，自动触发人工确认流程
-5. 所有操作全程记录，支持回放和分析
+1. User sends a message in the chat UI.
+2. LangGraph Supervisor classifies intent and routes to the right agent.
+3. Agent calls your internal systems via MCP tools.
+4. Write operations trigger a human-in-the-loop approval gate.
+5. All operations are logged, with support for replay and analytics.

-## 核心特性
+## Key Features

- **多 Agent 协作** - 不同操作由不同 Agent 处理，各自拥有独立的权限边界和工具集
- **即插即用** - 粘贴 OpenAPI 规范 URL，自动生成 MCP 工具和 Agent 配置
- **人工确认** - 所有写操作（取消、退款、修改）需要人工审批，读操作直接执行
- **会话上下文** - 支持多轮对话，Agent 能理解「取消那个订单」这样的指代
- **实时流式输出** - WebSocket 双向通信，逐 token 流式返回
- **对话回放** - 逐步查看 Agent 决策过程、工具调用和返回结果
- **数据分析** - 解决率、Agent 使用率、升级率、每次对话成本
- **YAML 驱动配置** - Agent 定义、人设、垂直模板全部通过 YAML 配置
+- **Multi-agent routing** -- each operation goes to a specialist agent with its own tools and permissions
+- **Zero-config import** -- paste an OpenAPI 3.0 URL and agents are generated automatically
+- **Human-in-the-loop** -- all write operations (cancel, refund, modify) require approval; reads execute immediately
+- **Session context** -- multi-turn conversation with persistent state across reconnects
+- **Real-time streaming** -- WebSocket token streaming with live tool call visibility
+- **Conversation replay** -- step-by-step audit trail of every agent decision
+- **Analytics dashboard** -- resolution rate, agent usage, escalation rate, cost per conversation
+- **YAML-driven config** -- agents, personas, and vertical templates in a single file

-## 技术栈
+## Tech Stack

-| 组件 | 技术选型 |
-|------|---------|
-| 后端 | Python 3.11+, FastAPI |
-| Agent 编排 | LangGraph v1.1, langgraph-supervisor |
-| 工具集成 | langchain-mcp-adapters, @tool |
-| 状态持久化 | PostgreSQL + langgraph-checkpoint-postgres |
-| LLM | Claude Sonnet 4.6（可切换 OpenAI、Google 等） |
-| 前端 | React |
-| 部署 | Docker Compose |
+| Component | Technology |
+|-----------|-----------|
+| Backend | Python 3.11+, FastAPI |
+| Agent orchestration | LangGraph v1.1 |
+| Session state | PostgreSQL + langgraph-checkpoint-postgres |
+| LLM | Claude Sonnet 4.6 (configurable: OpenAI, Google) |
+| Frontend | React 19, TypeScript, Vite |
+| Deployment | Docker Compose |

-## 项目结构
+## Quick Start
+
+```bash
+git clone <repo-url>
+cd smart-support
+
+# Configure your LLM API key
+cp .env.example .env
+# Edit .env: set ANTHROPIC_API_KEY (or OPENAI_API_KEY)
+
+# Start all services
+docker compose up -d
+
+# Open the app
+open http://localhost
+```
+
+## Project Structure

 ```
 smart-support/
 ├── backend/
 │   ├── app/
-│   │   ├── main.py          # FastAPI + WebSocket 入口
-│   │   ├── graph.py         # LangGraph Supervisor 配置
-│   │   ├── agents/          # Agent 定义 + 工具
-│   │   ├── registry.py      # YAML Agent 注册表加载器
-│   │   ├── openapi/         # OpenAPI 解析 + MCP 服务器生成
-│   │   ├── replay/          # 对话回放 API
-│   │   ├── analytics/       # 数据分析查询 + API
-│   │   └── callbacks.py     # Token 用量统计
-│   ├── agents.yaml          # Agent 注册表配置
-│   ├── templates/           # 垂直行业模板
-│   └── tests/
-├── frontend/                # React 聊天 UI + 回放 + 仪表盘
-├── docker-compose.yml       # PostgreSQL + 应用
-└── pyproject.toml
+│   │   ├── main.py              # FastAPI + WebSocket entry point
+│   │   ├── graph.py             # LangGraph Supervisor
+│   │   ├── ws_handler.py        # WebSocket message dispatch + rate limiting
+│   │   ├── conversation_tracker.py  # Conversation lifecycle tracking
+│   │   ├── agents/              # Agent definitions and tools
+│   │   ├── registry.py          # YAML agent registry loader
+│   │   ├── openapi/             # OpenAPI parser and review API
+│   │   ├── replay/              # Conversation replay API
+│   │   ├── analytics/           # Analytics queries and API
+│   │   └── tools/               # Error handling and retry utilities
+│   ├── agents.yaml              # Agent registry configuration
+│   ├── fixtures/                # Demo data and sample OpenAPI spec
+│   └── tests/                   # Unit, integration, and E2E tests
+├── frontend/
+│   ├── src/
+│   │   ├── pages/               # Chat, Replay, Dashboard, Review pages
+│   │   ├── components/          # NavBar, Layout, MetricCard, ReplayTimeline
+│   │   ├── hooks/               # useWebSocket with reconnect support
+│   │   └── api.ts               # Typed API client
+│   └── Dockerfile               # Multi-stage nginx build
+├── docs/                        # Architecture, deployment, guides
+├── docker-compose.yml           # Full-stack compose
+└── .env.example                 # Environment variable template
 ```

-## 快速开始
-
-```bash
-# 启动 PostgreSQL 和应用
-docker compose up
-
-# 访问聊天界面
-open http://localhost:8000
-```
-
-## Agent 配置示例
+## Agent Configuration

 ```yaml
 # agents.yaml
 agents:
-  - name: order_lookup
-    description: 查询订单状态、物流信息
-    permission: read
-    personality:
-      tone: professional
-      greeting: "您好，我来帮您查询订单信息。"
-    tools:
-      - get_order_status
-      - get_tracking_info
-
-  - name: order_actions
-    description: 取消订单、修改订单
-    permission: write  # 触发人工确认
-    personality:
-      tone: careful
-      greeting: "我可以帮您处理订单变更，所有操作都会先经过您的确认。"
-    tools:
-      - cancel_order
-      - modify_order
-
-  - name: discount
-    description: 发放优惠券、折扣码
+  - name: order_agent
+    description: "Handles order status, tracking, and cancellations."
    permission: write
    tools:
-      - apply_discount
-      - generate_coupon
+      - get_order_status
+      - cancel_order
+    personality:
+      tone: friendly
+      greeting: "I can help with your order. What is the order number?"
+      escalation_message: "I'm escalating this to a human agent."
+
+  - name: general_agent
+    description: "Answers general questions and FAQs."
+    permission: read
+    tools:
+      - search_faq
 ```

-## OpenAPI 自动接入
+## API Endpoints

-不需要手动写 MCP 连接器。粘贴你的 API 规范 URL：
+| Method | Path | Description |
+|--------|------|-------------|
+| WS | `/ws` | Main WebSocket chat endpoint |
+| GET | `/api/health` | Health check |
+| GET | `/api/conversations` | List conversations |
+| GET | `/api/replay/{thread_id}` | Replay conversation |
+| GET | `/api/analytics` | Analytics summary |
+| POST | `/api/openapi/import` | Import OpenAPI spec |
+| GET | `/api/openapi/jobs/{id}` | Check import job status |

-1. 框架解析 OpenAPI 3.0 规范
-2. LLM 自动分类每个端点（读/写、客户参数、Agent 分组）
-3. 运维人员审核分类结果
-4. 自动生成 MCP 服务器 + Agent YAML 配置
-5. 新工具立即可用
+## Security

-## 安全设计
+- **SSRF protection** -- OpenAPI import blocks private IPs and metadata service URLs
+- **Input validation** -- messages validated for size (32 KB), content length (10 KB), and thread ID format
+- **Rate limiting** -- 10 messages per 10 seconds per thread
+- **Audit trail** -- every tool call logged with agent, params, result, timestamp
+- **Permission isolation** -- each agent only accesses its configured tools
+- **Interrupt TTL** -- unanswered approval prompts expire after 30 minutes

- **人工确认** - 所有写操作需要客户或运维人员批准
- **SSRF 防护** - OpenAPI URL 导入时屏蔽内网地址和 DNS 重绑定攻击
- **操作审计** - 每个操作记录 Agent、参数、结果、时间戳
- **权限隔离** - 每个 Agent 只能访问其配置的工具集
- **中断超时** - 30 分钟未确认的操作自动取消，防止过期审批
+## Running Tests

-## 开发阶段
+```bash
+cd backend
+pytest --cov=app --cov-report=term-missing
+```

-| 阶段 | 周期 | 内容 |
-|------|------|------|
-| Phase 1 | 第 1-3 周 | 核心框架：Chat UI + Supervisor + Agent 注册表 + 中断流程 |
-| Phase 2 | 第 3-4 周 | 多 Agent 路由 + Webhook 升级 + 垂直模板 |
-| Phase 3 | 第 4-6 周 | OpenAPI 自动发现 + MCP 服务器生成 + SSRF 防护 |
-| Phase 4 | 第 6-7 周 | 对话回放 + 数据分析仪表盘 |
+Coverage is enforced at a minimum of 80%.

-## 目标用户
+## Documentation

-中型电商公司（日均 500-5000 订单，5-20 名客服）的客户体验负责人。
-
-他们的痛点：客服需要在 Zendesk 和 Shopify 后台之间反复切换，手动执行查询和操作。Smart Support 让 AI 直接完成这些操作，人工只需审批关键步骤。
-
-## 相关文档
-
- [设计文档](design-doc.md) - 问题定义、约束、方案选择
- [CEO 计划](ceo-plan.md) - 产品愿景、范围决策
- [工程评审计划](eng-review-plan.md) - 架构决策、测试策略、失败模式
- [测试计划](eng-review-test-plan.md) - 测试路径、边界情况、E2E 流程
- [待办事项](TODOS.md) - 延迟到后续阶段的工作
+- [Architecture](docs/ARCHITECTURE.md) -- System design and component diagram
+- [Development Plan](docs/DEVELOPMENT-PLAN.md) -- Phase breakdown and status
+- [Agent Config Guide](docs/agent-config-guide.md) -- How to configure agents
+- [OpenAPI Import Guide](docs/openapi-import-guide.md) -- Auto-discovery workflow
+- [Deployment Guide](docs/deployment.md) -- Docker and production deployment
+- [Demo Script](docs/demo-script.md) -- Step-by-step live demo walkthrough

 ## License