feat: initial project setup with planning docs

Smart Support - AI customer service action layer framework.
Includes design doc, CEO plan, eng review, test plan, and README.
Yaojia Wang
2026-03-29 21:11:36 +02:00
commit f93e8baef1
8 changed files with 762 additions and 0 deletions

eng-review-plan.md (new file, 194 lines)

@@ -0,0 +1,194 @@
# Smart Support Framework — Eng Review Plan
## Context
Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all six feature expansions from the CEO review, with phasing re-sequenced to put the core first. Timeline extended to 6-7 weeks per outside-voice feedback.
No code exists yet. Greenfield project.
## Architecture Decisions
```
Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                        ↑                ↑
                                                 Agent Registry      interrupt()
                                                  (YAML config)     (HITL safety)
                                                  PostgresSaver
                                            (checkpoint persistence)
```
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Agent orchestration | `langgraph-supervisor` v1.1 | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
| MCP integration | `langchain-mcp-adapters` + `@tool` | MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1] |
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Docker Compose. |
| LLM provider | LangChain `BaseChatModel` + env config | No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1] |
| Streaming | FastAPI WebSocket + `astream_events()` | Built-in. No custom streaming layer. [Layer 1] |
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
| Replay | Custom paginated API endpoint | Not raw `get_state_history()`. Design for 200+ turn threads. |
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
| Routing fallback | General-purpose fallback agent | Catches misroutes. TODO for routing accuracy eval. |
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |
## Project Structure
```
smart-support/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI app + WebSocket
│ │ ├── graph.py # LangGraph supervisor setup
│ │ ├── agents/ # Agent definitions + tools
│ │ ├── registry.py # YAML agent registry loader
│ │ ├── openapi/ # OpenAPI parser + MCP server generator
│ │ ├── replay/ # Replay API endpoint
│ │ ├── analytics/ # Analytics queries + endpoint
│ │ └── callbacks.py # Token usage logging callback
│ ├── agents.yaml # Agent registry config
│ ├── templates/ # Vertical templates (e-commerce.yaml, etc.)
│ └── tests/
├── frontend/ # React chat UI + replay + dashboard
├── docker-compose.yml # Postgres + app
└── pyproject.toml
```
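To make the registry concrete, a single `agents.yaml` entry might look like the following. The schema (field names, approval gating, fallback key) is illustrative only — the real format is decided in Phase 1:

```yaml
# Illustrative schema — field names are assumptions, not the final registry format.
agents:
  - name: order_lookup
    description: "Looks up order status and shipping details"
    personality: "Concise and friendly; confirms the order number before acting."
    tools:
      - get_order_status        # read-only, no approval needed
      - cancel_order            # write — gated behind interrupt() approval
    requires_approval:
      - cancel_order
fallback_agent: general_support  # catches misrouted queries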
## Phasing (6-7 weeks)
### Phase 1 (Weeks 1-3): Core Framework
- FastAPI backend with WebSocket for chat
- LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
- PostgresSaver checkpointer via Docker Compose
- YAML-based agent registry with validation
- React chat UI with streaming tokens
- Agent personality via YAML config
- Basic interrupt() flow for write operations
- Fallback agent for misrouted queries
- Token usage logging callback
- Try/except for DB errors
- **Integration checkpoint:** End of week 3, full chat loop works end-to-end
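The "registry with validation" bullet above can start as a plain structural check on the parsed YAML, before any graph wiring. The required fields below are assumptions about the eventual schema:

```python
# Hypothetical validation for a loaded agents.yaml dict (after PyYAML parsing).
# Required fields are assumptions about the eventual registry schema.
REQUIRED_AGENT_FIELDS = {"name", "description", "tools"}

def validate_registry(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    agents = config.get("agents")
    if not isinstance(agents, list) or not agents:
        return ["registry must define a non-empty 'agents' list"]
    seen = set()
    for i, agent in enumerate(agents):
        missing = REQUIRED_AGENT_FIELDS - set(agent)
        if missing:
            problems.append(f"agent #{i}: missing fields {sorted(missing)}")
        name = agent.get("name")
        if name in seen:
            problems.append(f"agent #{i}: duplicate name {name!r}")
        seen.add(name)
    return problems
```

Failing fast here with readable messages keeps misconfigured registries out of the supervisor entirely.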
### Phase 2 (Weeks 3-4): Multi-Agent + Safety
- Full supervisor routing with intent classification
- Webhook escalation (HTTP POST to configured URL + retry)
- Vertical templates (YAML configs for e-commerce, SaaS)
- Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
- **Integration checkpoint:** End of week 4, multi-agent routing + interrupt flow works
### Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery
- Parse OpenAPI 3.0 specs from user-provided URLs
- SSRF protection (block private IPs, DNS rebinding, URL allowlist)
- Generate full MCP servers wrapping each endpoint
- LLM-assisted endpoint classification (read/write, customer params, agent groupings)
- Operator review/correction UI for classifications
- Auto-generate agent YAML from classified spec
- Async import with WebSocket progress updates
- **Integration checkpoint:** End of week 6, paste a real API spec → tools work in chat
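The SSRF utility can start from address classification with the stdlib `ipaddress` module. A minimal sketch — note the full protection also requires the HTTP client to connect to the vetted IP rather than re-resolving the hostname, which is what closes the DNS-rebinding window:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_public_ip(ip: str) -> bool:
    """Reject private, loopback, link-local, reserved, and multicast addresses."""
    addr = ipaddress.ip_address(ip)
    return not (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_multicast or addr.is_unspecified)

def assert_safe_spec_url(url: str) -> str:
    """Resolve the host once and return a vetted IP to connect to.

    Reusing this IP for the actual fetch (instead of resolving again)
    defeats DNS rebinding between the check and the request.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    ip = socket.gethostbyname(host)  # single resolution, reused for the fetch
    if not is_public_ip(ip):
        raise ValueError(f"Blocked non-public address {ip} for host {host!r}")
    return ip
```

An allowlist layer and IPv6 handling would sit on top of this; the sketch shows only the core classification.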
### Phase 4 (Weeks 6-7): Analytics + Replay
- Custom paginated replay API endpoint
- Replay UI (step-by-step timeline in React)
- Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
- Analytics dashboard UI with zero-state handling
- Resolution rate = successful tool call + no escalation
- **Integration checkpoint:** End of week 7, full product demo ready
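The resolution-rate definition above is a pure aggregation over conversation outcomes. The record shape here is invented for illustration — the real query would read these flags from PostgresSaver checkpoints:

```python
from dataclasses import dataclass

@dataclass
class ConversationOutcome:
    # Hypothetical record shape; the analytics endpoint would derive these
    # flags from checkpoint data, not in-memory objects.
    tool_call_succeeded: bool
    escalated: bool

def resolution_rate(outcomes: list[ConversationOutcome]) -> float:
    """Resolved = at least one successful tool call AND no escalation."""
    if not outcomes:
        return 0.0  # zero-state: dashboard shows "no data yet", not a crash
    resolved = sum(1 for o in outcomes if o.tool_call_succeeded and not o.escalated)
    return resolved / len(outcomes)
```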
### Phase 5 (Buffer): Polish + Demo Prep
- Error handling hardening
- Demo script and sample data
- Docker Compose for full-stack deployment
## Tech Stack
- Python 3.11+, FastAPI, LangGraph v1.1.0
- langgraph-supervisor, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
- React (frontend), PostgreSQL 16 (via Docker Compose)
- Claude Sonnet 4.6 via `ChatAnthropic` (configurable via env)
- pytest + FastAPI TestClient for backend tests
- openapi-spec-validator for spec validation
## NOT in scope
- Authentication/authorization (deferred to pre-production)
- Multi-tenant architecture (deferred to first paid customer)
- CI/CD pipeline (manual deploy for prototype)
- Rate limiting (deferred to pre-production)
- Zendesk/Intercom marketplace integration (deferred to post-validation)
- Mobile-responsive chat UI (desktop-only for demo)
- Internationalization/i18n
- Billing/pricing infrastructure
- Distribution pipeline (manual Docker Compose deploy)
## What already exists (reuse, don't rebuild)
- `langgraph-supervisor` — agent orchestration
- `langgraph-checkpoint-postgres` — state persistence
- LangGraph `interrupt()` — human-in-the-loop
- `langchain-mcp-adapters` (`MultiServerMCPClient`) — MCP tool integration
- LangChain `BaseChatModel` — LLM provider abstraction
- FastAPI WebSocket + `astream_events()` — streaming
- `openapi-spec-validator` — OpenAPI spec validation
## Testing Strategy
TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.
45 test targets identified: 33 code paths plus 12 user flows (6 of the flows are E2E).
Key test categories:
1. **Graph tests** — invoke supervisor with mock tools, assert routing + state
2. **MCP tool tests** — mock external HTTP, test structured responses
3. **WebSocket tests** — FastAPI TestClient, test message → response cycle
4. **Interrupt tests** — test approval, rejection, and TTL expiry flows
5. **OpenAPI tests** — test spec parsing, SSRF blocking, MCP generation
6. **E2E tests** — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)
## Failure Modes
| Codepath | Failure | Mitigation |
|----------|---------|------------|
| LLM API call | Timeout/rate limit | Error message to user |
| MCP tool call | External API down | Escalation + error message |
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
| PostgresSaver | DB connection lost | try/except + user-facing error |
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
| Webhook POST | Target unreachable | Retry with backoff + log |
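The webhook retry in the last row can use bounded exponential backoff with full jitter. The base delay, cap, and attempt count below are assumptions, not values from the plan:

```python
import random

def backoff_delays(attempts: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule with full jitter for webhook retries.

    Full jitter (uniform over [0, min(cap, base * 2**n)]) avoids retry
    stampedes when many escalations fire at once.
    """
    return [random.uniform(0, min(cap, base * (2 ** n))) for n in range(attempts)]
```

Each failed POST would sleep for the next delay in the list, then log and give up after the final attempt.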
## Parallelization Strategy
| Lane | Steps | Modules |
|------|-------|---------|
| A | Phase 1 backend + Phase 2 | backend/app/ |
| B | Phase 1 frontend | frontend/ |
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |
Launch lanes A, B, and C in parallel; merge after Phase 1. Phases 3-4 are sequential (they depend on the core).
## Verification
1. `docker compose up` — Postgres + app starts
2. Open `http://localhost:8000` — chat UI loads
3. Send "What's the status of order 1042?" — get streaming response
4. Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
5. `pytest --cov` — 80%+ coverage
6. Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
7. View replay of completed conversation (Phase 4)
8. View analytics dashboard (Phase 4)
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | — |
- **OUTSIDE VOICE:** Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
- **UNRESOLVED:** 0
- **VERDICT:** CEO + ENG CLEARED — ready to implement