feat: initial project setup with planning docs

Smart Support - AI customer service action layer framework.
Includes design doc, CEO plan, eng review, test plan, and README.
Yaojia Wang
2026-03-29 21:11:36 +02:00
commit f93e8baef1
8 changed files with 762 additions and 0 deletions

eng-review-plan.md (new file, 194 lines)

@@ -0,0 +1,194 @@
# Smart Support Framework — Eng Review Plan
## Context
Build a pluggable AI customer support framework. Core value: "Paste your API, get an AI agent that executes actions." This plan incorporates all six feature expansions from the CEO review, with phasing re-sequenced to put the core first. Timeline extended to 6-7 weeks per outside-voice feedback.
No code exists yet. Greenfield project.
## Architecture Decisions
```
Customer → React Chat UI → FastAPI WebSocket → LangGraph Supervisor → Agents → MCP Tools → Client APIs
                                                        ↑                ↑
                                                 Agent Registry      interrupt()
                                                  (YAML config)     (HITL safety)
                                                  PostgresSaver
                                            (checkpoint persistence)
```
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Agent orchestration | `langgraph-supervisor` v1.1 | Built-in supervisor with middleware. Don't rebuild. [Layer 1] |
| MCP integration | `langchain-mcp-adapters` + `@tool` | MultiServerMCPClient for MCP, @tool for CLI/API. No custom base class. [Layer 1] |
| Checkpointer | PostgresSaver from day one (app + tests) | Phase 4 analytics/replay needs queryable data. Docker Compose. |
| LLM provider | LangChain `BaseChatModel` + env config | No custom wrapper. `LLM_PROVIDER` + `LLM_MODEL` env vars. [Layer 1] |
| Streaming | FastAPI WebSocket + `astream_events()` | Built-in. No custom streaming layer. [Layer 1] |
| OpenAPI import | Full MCP server generation + LLM classification + human review | Parse spec → generate tools → LLM classifies read/write/params → operator reviews |
| OpenAPI import UX | Async background task with WebSocket progress | Don't block chat during import |
| Replay | Custom paginated API endpoint | Not raw `get_state_history()`. Design for 200+ turn threads. |
| Interrupt TTL | Auto-cancel + retry offer after 30 min | Stale approvals are dangerous. Re-evaluate current state on retry. |
| Routing fallback | General-purpose fallback agent | Catches misroutes. TODO for routing accuracy eval. |
| Resolution metric | Tool call success + no escalation | Honest starting definition. Refine with customer satisfaction signals. |
| Cost tracking | LangChain callback logging tokens per conversation | Surface cost-per-resolution in analytics. |
| SSRF protection | Block private IPs + DNS rebinding protection | Mandatory for OpenAPI URL fetching. Build as standalone utility. |
| DB error handling | try/except around graph invocation | Return clear error message to user, don't fail silently. |
## Project Structure
```
smart-support/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI app + WebSocket
│ │ ├── graph.py # LangGraph supervisor setup
│ │ ├── agents/ # Agent definitions + tools
│ │ ├── registry.py # YAML agent registry loader
│ │ ├── openapi/ # OpenAPI parser + MCP server generator
│ │ ├── replay/ # Replay API endpoint
│ │ ├── analytics/ # Analytics queries + endpoint
│ │ └── callbacks.py # Token usage logging callback
│ ├── agents.yaml # Agent registry config
│ ├── templates/ # Vertical templates (e-commerce.yaml, etc.)
│ └── tests/
├── frontend/ # React chat UI + replay + dashboard
├── docker-compose.yml # Postgres + app
└── pyproject.toml
```
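To make the registry concrete, a single `agents.yaml` entry might look like the following. The schema (field names, approval gating, fallback key) is illustrative only — the real format is decided in Phase 1:

```yaml
# Illustrative schema — field names are assumptions, not the final registry format.
agents:
  - name: order_lookup
    description: "Looks up order status and shipping details"
    personality: "Concise and friendly; confirms the order number before acting."
    tools:
      - get_order_status        # read-only, no approval needed
      - cancel_order            # write — gated behind interrupt() approval
    requires_approval:
      - cancel_order
fallback_agent: general_support  # catches misrouted queries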
## Phasing (6-7 weeks)
### Phase 1 (Weeks 1-3): Core Framework
- FastAPI backend with WebSocket for chat
- LangGraph supervisor with 2-3 demo agents (order lookup, FAQ, escalation)
- PostgresSaver checkpointer via Docker Compose
- YAML-based agent registry with validation
- React chat UI with streaming tokens
- Agent personality via YAML config
- Basic interrupt() flow for write operations
- Fallback agent for misrouted queries
- Token usage logging callback
- Try/except for DB errors
- **Integration checkpoint:** End of week 3, full chat loop works end-to-end
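The "registry with validation" bullet above can start as a plain structural check on the parsed YAML, before any graph wiring. The required fields below are assumptions about the eventual schema:

```python
# Hypothetical validation for a loaded agents.yaml dict (after PyYAML parsing).
# Required fields are assumptions about the eventual registry schema.
REQUIRED_AGENT_FIELDS = {"name", "description", "tools"}

def validate_registry(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    agents = config.get("agents")
    if not isinstance(agents, list) or not agents:
        return ["registry must define a non-empty 'agents' list"]
    seen = set()
    for i, agent in enumerate(agents):
        missing = REQUIRED_AGENT_FIELDS - set(agent)
        if missing:
            problems.append(f"agent #{i}: missing fields {sorted(missing)}")
        name = agent.get("name")
        if name in seen:
            problems.append(f"agent #{i}: duplicate name {name!r}")
        seen.add(name)
    return problems
```

Failing fast here with readable messages keeps misconfigured registries out of the supervisor entirely.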
### Phase 2 (Weeks 3-4): Multi-Agent + Safety
- Full supervisor routing with intent classification
- Webhook escalation (HTTP POST to configured URL + retry)
- Vertical templates (YAML configs for e-commerce, SaaS)
- Expired interrupt handling (auto-cancel + retry offer after 30-min TTL)
- **Integration checkpoint:** End of week 4, multi-agent routing + interrupt flow works
### Phase 3 (Weeks 4-6): OpenAPI Auto-Discovery
- Parse OpenAPI 3.0 specs from user-provided URLs
- SSRF protection (block private IPs, DNS rebinding, URL allowlist)
- Generate full MCP servers wrapping each endpoint
- LLM-assisted endpoint classification (read/write, customer params, agent groupings)
- Operator review/correction UI for classifications
- Auto-generate agent YAML from classified spec
- Async import with WebSocket progress updates
- **Integration checkpoint:** End of week 6, paste a real API spec → tools work in chat
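The SSRF utility can start from address classification with the stdlib `ipaddress` module. A minimal sketch — note the full protection also requires the HTTP client to connect to the vetted IP rather than re-resolving the hostname, which is what closes the DNS-rebinding window:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_public_ip(ip: str) -> bool:
    """Reject private, loopback, link-local, reserved, and multicast addresses."""
    addr = ipaddress.ip_address(ip)
    return not (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_multicast or addr.is_unspecified)

def assert_safe_spec_url(url: str) -> str:
    """Resolve the host once and return a vetted IP to connect to.

    Reusing this IP for the actual fetch (instead of resolving again)
    defeats DNS rebinding between the check and the request.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    ip = socket.gethostbyname(host)  # single resolution, reused for the fetch
    if not is_public_ip(ip):
        raise ValueError(f"Blocked non-public address {ip} for host {host!r}")
    return ip
```

An allowlist layer and IPv6 handling would sit on top of this; the sketch shows only the core classification.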
### Phase 4 (Weeks 6-7): Analytics + Replay
- Custom paginated replay API endpoint
- Replay UI (step-by-step timeline in React)
- Analytics queries (resolution rate, agent usage, escalation %, cost-per-resolution)
- Analytics dashboard UI with zero-state handling
- Resolution rate = successful tool call + no escalation
- **Integration checkpoint:** End of week 7, full product demo ready
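The resolution-rate definition above is a pure aggregation over conversation outcomes. The record shape here is invented for illustration — the real query would read these flags from PostgresSaver checkpoints:

```python
from dataclasses import dataclass

@dataclass
class ConversationOutcome:
    # Hypothetical record shape; the analytics endpoint would derive these
    # flags from checkpoint data, not in-memory objects.
    tool_call_succeeded: bool
    escalated: bool

def resolution_rate(outcomes: list[ConversationOutcome]) -> float:
    """Resolved = at least one successful tool call AND no escalation."""
    if not outcomes:
        return 0.0  # zero-state: dashboard shows "no data yet", not a crash
    resolved = sum(1 for o in outcomes if o.tool_call_succeeded and not o.escalated)
    return resolved / len(outcomes)
```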
### Phase 5 (Buffer): Polish + Demo Prep
- Error handling hardening
- Demo script and sample data
- Docker Compose for full-stack deployment
## Tech Stack
- Python 3.11+, FastAPI, LangGraph v1.1.0
- langgraph-supervisor, langchain-mcp-adapters, langgraph-checkpoint-postgres v3.0.5
- React (frontend), PostgreSQL 16 (via Docker Compose)
- Claude Sonnet 4.6 via `ChatAnthropic` (configurable via env)
- pytest + FastAPI TestClient for backend tests
- openapi-spec-validator for spec validation
## NOT in scope
- Authentication/authorization (deferred to pre-production)
- Multi-tenant architecture (deferred to first paid customer)
- CI/CD pipeline (manual deploy for prototype)
- Rate limiting (deferred to pre-production)
- Zendesk/Intercom marketplace integration (deferred to post-validation)
- Mobile-responsive chat UI (desktop-only for demo)
- Internationalization/i18n
- Billing/pricing infrastructure
- Distribution pipeline (manual Docker Compose deploy)
## What already exists (reuse, don't rebuild)
- `langgraph-supervisor` — agent orchestration
- `langgraph-checkpoint-postgres` — state persistence
- LangGraph `interrupt()` — human-in-the-loop
- `langchain-mcp-adapters` (`MultiServerMCPClient`) — MCP tool integration
- LangChain `BaseChatModel` — LLM provider abstraction
- FastAPI WebSocket + `astream_events()` — streaming
- `openapi-spec-validator` — OpenAPI spec validation
## Testing Strategy
TDD per phase. 80%+ coverage target. pytest + FastAPI TestClient.
45 test targets identified: 33 code paths plus 12 user flows (6 of the flows are E2E).
Key test categories:
1. **Graph tests** — invoke supervisor with mock tools, assert routing + state
2. **MCP tool tests** — mock external HTTP, test structured responses
3. **WebSocket tests** — FastAPI TestClient, test message → response cycle
4. **Interrupt tests** — test approval, rejection, and TTL expiry flows
5. **OpenAPI tests** — test spec parsing, SSRF blocking, MCP generation
6. **E2E tests** — 6 critical flows (happy path, cancel+approve, cancel+reject, multi-turn, OpenAPI import, replay)
## Failure Modes
| Codepath | Failure | Mitigation |
|----------|---------|------------|
| LLM API call | Timeout/rate limit | Error message to user |
| MCP tool call | External API down | Escalation + error message |
| Interrupt resume | 30-min TTL expired | Auto-cancel + retry offer |
| PostgresSaver | DB connection lost | try/except + user-facing error |
| OpenAPI URL fetch | SSRF attempt | Block private IPs + DNS rebinding |
| Supervisor routing | Wrong agent | Fallback agent catches misroutes |
| Webhook POST | Target unreachable | Retry with backoff + log |
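The webhook retry in the last row can use bounded exponential backoff with full jitter. The base delay, cap, and attempt count below are assumptions, not values from the plan:

```python
import random

def backoff_delays(attempts: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule with full jitter for webhook retries.

    Full jitter (uniform over [0, min(cap, base * 2**n)]) avoids retry
    stampedes when many escalations fire at once.
    """
    return [random.uniform(0, min(cap, base * (2 ** n))) for n in range(attempts)]
```

Each failed POST would sleep for the next delay in the list, then log and give up after the final attempt.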
## Parallelization Strategy
| Lane | Steps | Modules |
|------|-------|---------|
| A | Phase 1 backend + Phase 2 | backend/app/ |
| B | Phase 1 frontend | frontend/ |
| C | SSRF utility (standalone) | backend/app/openapi/ssrf.py |
Launch lanes A, B, and C in parallel; merge after Phase 1. Phases 3-4 are sequential (they depend on the core).
## Verification
1. `docker compose up` — Postgres + app starts
2. Open `http://localhost:8000` — chat UI loads
3. Send "What's the status of order 1042?" — get streaming response
4. Send "Cancel order 1042" — get interrupt prompt → approve → confirmation
5. `pytest --cov` — 80%+ coverage
6. Paste sample OpenAPI spec → tools generated → chat uses them (Phase 3)
7. View replay of completed conversation (Phase 4)
8. View analytics dashboard (Phase 4)
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | 6 proposals, 6 accepted, 0 deferred |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | — |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 2 | CLEAR | 10 issues, 0 critical gaps |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | — | — |
- **OUTSIDE VOICE:** Claude subagent review found 10 issues. 3 cross-model tensions resolved (PostgresSaver timing, OpenAPI feasibility, timeline). 3 new findings adopted (routing fallback, resolution metric definition, LLM cost tracking).
- **UNRESOLVED:** 0
- **VERDICT:** CEO + ENG CLEARED — ready to implement