Yaojia Wang
f0699436c5
refactor: engineering improvements -- API versioning, structured logging, Alembic, error standardization, test coverage
...
- API versioning: all REST endpoints prefixed with /api/v1/
- Structured logging: replaced stdlib logging with structlog (console/JSON modes)
- Alembic migrations: versioned DB schema with initial migration
- Error standardization: global exception handlers for consistent envelope format
- Interrupt cleanup: asyncio background task for expired interrupt removal
- Integration tests: +30 tests (analytics, replay, openapi, error, session APIs)
- Frontend tests: +57 tests (all components, pages, useWebSocket hook)
- Backend: 557 tests, 89.75% coverage | Frontend: 80 tests, 16 test files
2026-04-06 23:19:29 +02:00
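A minimal sketch of the interrupt cleanup idea from f0699436c5, assuming a periodic asyncio task that purges expired entries. `InterruptStore`, `cleanup_loop`, and the interval values are hypothetical names chosen for illustration; the commit does not show the real implementation.

```python
import asyncio
import time


class InterruptStore:
    """Hypothetical in-memory store mapping interrupt id -> expiry time."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def add(self, interrupt_id: str, ttl_seconds: float) -> None:
        self._expires_at[interrupt_id] = time.monotonic() + ttl_seconds

    def remove_expired(self) -> list[str]:
        """Delete every interrupt whose deadline has passed; return their ids."""
        now = time.monotonic()
        expired = [i for i, t in self._expires_at.items() if t <= now]
        for interrupt_id in expired:
            del self._expires_at[interrupt_id]
        return expired

    def __len__(self) -> int:
        return len(self._expires_at)


async def cleanup_loop(store: InterruptStore, interval_seconds: float) -> None:
    """Purge expired interrupts on a fixed interval until cancelled."""
    while True:
        await asyncio.sleep(interval_seconds)
        store.remove_expired()


async def demo() -> int:
    store = InterruptStore()
    store.add("short-lived", ttl_seconds=0.01)
    store.add("long-lived", ttl_seconds=60)
    task = asyncio.create_task(cleanup_loop(store, interval_seconds=0.02))
    await asyncio.sleep(0.1)  # give the loop a few iterations
    task.cancel()  # mirrors cancelling the task at application shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(store)
```

In a FastAPI application the task would typically be started and cancelled in the lifespan handler, though where the project actually starts it is not stated in the commit.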
Yaojia Wang
af53111928
refactor: fix architectural issues across frontend and backend
...
Address all architecture review findings:
P0 fixes:
- Add API key authentication for admin endpoints (analytics, replay, openapi)
  and WebSocket connections via ADMIN_API_KEY env var
- Add PostgreSQL-backed PgSessionManager and PgInterruptManager for
  multi-worker production deployments (in-memory defaults preserved)
P1 fixes:
- Implement actual tool generation in OpenAPI approve_job endpoint
  using generate_tool_code() and generate_agent_yaml()
- Add missing clarification, interrupt_expired, and tool_result message
  handlers in frontend ChatPage
P2 fixes:
- Replace monkey-patching on CompiledStateGraph with typed GraphContext
- Replace 9-param dispatch_message with WebSocketContext dataclass
- Extract duplicate _envelope() into shared app/api_utils.py
- Replace mutable module-level counter with crypto.randomUUID()
- Remove hardcoded mock data from ReviewPage, use api.ts wrappers
- Remove `as any` type escape from ReplayPage
All 516 tests passing, 0 TypeScript errors.
2026-04-06 15:59:14 +02:00
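The API key check added in af53111928 could look roughly like the sketch below. Only the `ADMIN_API_KEY` variable name comes from the commit; the function name, the constant-time comparison, and the choice to reject requests when no key is configured are assumptions.

```python
from __future__ import annotations

import hmac
import os


def is_authorized(provided_key: str | None, expected_key: str | None = None) -> bool:
    """Return True only when a configured admin key matches the provided one."""
    if expected_key is None:
        expected_key = os.environ.get("ADMIN_API_KEY")
    # Assumption: fail closed when the key is unset or the caller sent none.
    if not expected_key or not provided_key:
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(provided_key.encode(), expected_key.encode())
```

The same helper can serve both the REST admin endpoints and the WebSocket handshake, so the two paths cannot drift apart.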
Yaojia Wang
b8654aa31f
feat: upgrade LangGraph to 1.x and migrate deprecated APIs
...
- Bump langgraph from 0.4 to 1.0+, langgraph-supervisor from 0.0.12 to 0.0.30+
- Bump langchain-core, langchain-anthropic, langchain-openai to 1.x
- Add langchain>=1.0 dependency for new create_agent location
- Migrate create_react_agent -> create_agent (prompt -> system_prompt)
- Fix create_supervisor positional arg to named agents= parameter
- Replace AsyncMock checkpointer with InMemorySaver in tests (v1 type validation)
- Update version references in README, ARCHITECTURE, eng-review-plan
2026-04-06 14:51:51 +02:00
Yaojia Wang
19fc9f3289
test: close coverage gaps and add frontend test infrastructure
...
Backend (516 tests, 94% coverage):
- Add azure_openai endpoint/deployment validation tests (config.py -> 100%)
- Add _total_conversations and _avg_turns direct tests (queries.py -> 100%)
- Add transformer edge cases: list content, string checkpoint, invalid JSON,
  malformed message graceful skip (transformer.py -> 93%)
- Add safety combined status_code+error_message interaction tests
- Tighten ambiguous 200-or-422 assertion to expect strictly 422
- Add E2E pagination shape assertions (total, page, per_page, row count)
- Fix ReplayPool mock to respect LIMIT/OFFSET params
Frontend (23 tests, vitest + happy-dom + @testing-library/react):
- Add vitest infrastructure with happy-dom environment
- Add api.ts tests: success, HTTP error, success=false, URL encoding
- Add DashboardPage tests: loading, data, error, empty states
- Add ReplayListPage tests: loading, empty, data, error, status badge classes
- Add ReplayPage tests: loading, steps, empty, error states
2026-04-06 13:32:10 +02:00
Yaojia Wang
036e12349d
refactor: formalize safety rules, extract shared styles, reconcile docs (P2)
...
- Add backend/app/safety.py with explicit confirmation policy, multi-intent
  semantics, and MCP error taxonomy with retry classification
- Add 26 unit tests for safety module (confirmation rules, error taxonomy)
- Extract repeated inline styles into shared CSS classes in index.css
  (section-card, stat-label, status-badge, data-table, empty/error-state,
  pagination-bar)
- Refactor DashboardPage, ReplayListPage, ReplayPage to use shared classes
- Update README: add missing API endpoints, document safety/confirmation rules
- Use proper HTML entities for arrow/dash characters to fix encoding glitches
2026-04-05 23:10:50 +02:00
Yaojia Wang
e0931daece
feat: wire frontend pages to live APIs and standardize response contracts (P1)
...
- Backend: Add COUNT query and paginated response shape to conversations endpoint
  Returns { conversations: [...], total, page, per_page } instead of flat array
- Frontend: Replace mock data in DashboardPage with fetchAnalytics() API calls
- Frontend: Replace mock data in ReplayListPage with fetchConversations() API calls
- Frontend: Replace mock data in ReplayPage with fetchReplay() API calls
- Add proper loading, empty, and error states to all three pages
- Align ConversationSummary type with actual DB columns (created_at, status)
- Update unit and E2E tests for new paginated conversation response shape
- Add fetchone() to FakeCursor for COUNT query support in E2E tests
2026-04-05 23:06:00 +02:00
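The paginated shape introduced in e0931daece can be sketched as follows. The field names come from the commit message; the helper names and the 1-based page numbering are assumptions.

```python
from typing import Any


def page_to_limit_offset(page: int, per_page: int) -> tuple[int, int]:
    """Translate a 1-based page number into SQL LIMIT/OFFSET values."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return per_page, (page - 1) * per_page


def paginated_response(
    rows: list[dict[str, Any]], total: int, page: int, per_page: int
) -> dict[str, Any]:
    """Wrap one page of rows; `total` comes from the separate COUNT query."""
    return {
        "conversations": rows,
        "total": total,
        "page": page,
        "per_page": per_page,
    }
```

Returning `total` alongside the rows lets the frontend render page controls without a second request.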
Yaojia Wang
e55ec42ae5
fix: restore green builds and align frontend-backend contracts (P0)
...
- Isolate Settings tests from .env and process env leakage
- Fix analytics metadata test to unwrap psycopg Json wrapper
- Remove unused state variables causing frontend build failures
- Fix ReviewPage to use /classifications endpoint instead of nonexistent /result
- Normalize ReviewPage status enums ("failed", not "error") and access_type values
- Align api.ts types with backend response shapes (ReplayPage, AnalyticsData, AgentUsage)
2026-04-05 23:00:39 +02:00
Yaojia Wang
189a0fad34
feat(ui): implement premium beige design system and ux refinements
2026-04-05 22:35:48 +02:00
Yaojia Wang
d2b4610df9
fix: address code and security review findings for Phase 5
...
- Add nginx security headers (X-Frame-Options, X-Content-Type-Options, etc.)
- Fix postgres networking: add to app_network, comment out host port exposure
- Fix rate limit memory leak: add bounded eviction for stale thread entries
- Use immutable update pattern in rate limit check (no .append mutation)
- Extract _VERSION constant to avoid duplicate hardcoded version string
2026-03-31 21:35:13 +02:00
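The two rate limit fixes in d2b4610df9 (bounded eviction and no in-place mutation) can be illustrated with a sliding-window sketch. The 10 messages per 10 seconds limit is taken from the Phase 5 commit; `MAX_TRACKED_THREADS`, the function name, and the tuple-based history are hypothetical.

```python
MAX_MESSAGES = 10
WINDOW_SECONDS = 10.0
MAX_TRACKED_THREADS = 1000  # hypothetical cap on tracked threads

History = dict[str, tuple[float, ...]]


def check_rate_limit(history: History, thread_id: str, now: float) -> tuple[bool, History]:
    """Return (allowed, new_history); the input mapping is never mutated."""
    cutoff = now - WINDOW_SECONDS
    recent = tuple(t for t in history.get(thread_id, ()) if t > cutoff)
    allowed = len(recent) < MAX_MESSAGES
    if allowed:
        recent = recent + (now,)  # builds a new tuple instead of appending

    # Carry over only the other threads that still have activity in the window.
    updated = {
        tid: stamps
        for tid, stamps in history.items()
        if tid != thread_id and stamps and stamps[-1] > cutoff
    }
    if recent:
        updated[thread_id] = recent

    # Hard bound: evict the least recently active threads first.
    while len(updated) > MAX_TRACKED_THREADS:
        oldest = min(updated, key=lambda tid: updated[tid][-1])
        del updated[oldest]
    return allowed, updated
```

Because stale entries are dropped on every check, memory use stays proportional to the number of recently active threads rather than to every thread ever seen.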
Yaojia Wang
0e78e5b06b
feat: complete phase 5 -- error hardening, frontend, Docker, demo, docs
...
Backend:
- ConversationTracker: Protocol + PostgresConversationTracker for lifecycle tracking
- Error handler: ErrorCategory enum, classify_error(), with_retry() exponential backoff
- Wire PostgresAnalyticsRecorder + ConversationTracker into ws_handler
- Rate limiting (10 msg/10s per thread), edge case hardening
- Health endpoint GET /api/health, version 0.5.0
- Demo seed data script + sample OpenAPI spec
Frontend (all new):
- React Router with NavBar (Chat / Replay / Dashboard / Review)
- ReplayListPage + ReplayPage with ReplayTimeline component
- DashboardPage with MetricCard, range selector, zero-state
- ReviewPage for OpenAPI classification review
- ErrorBanner for WebSocket disconnect handling
- API client (api.ts) with typed fetch wrappers
Infrastructure:
- Frontend Dockerfile (multi-stage node -> nginx)
- nginx.conf with SPA routing + API/WS proxy
- docker-compose.yml with frontend service + healthchecks
- .env.example files (root + backend)
Documentation:
- README.md with quick start and architecture
- Agent configuration guide
- OpenAPI import guide
- Deployment guide
- Demo script
48 new tests, 449 total passing, 92.87% coverage
2026-03-31 21:20:06 +02:00
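A rough sketch of the error handling pieces named in 0e78e5b06b. The names `ErrorCategory`, `classify_error()`, and `with_retry()` appear in the commit, but the category members, the signatures, and the delay values below are assumptions.

```python
import asyncio
from collections.abc import Awaitable, Callable
from enum import Enum
from typing import TypeVar

T = TypeVar("T")


class ErrorCategory(Enum):
    TRANSIENT = "transient"  # worth retrying
    PERMANENT = "permanent"  # retrying will not help


def classify_error(exc: Exception) -> ErrorCategory:
    """Assumed rule: timeouts and connection failures are transient."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorCategory.TRANSIENT
    return ErrorCategory.PERMANENT


async def with_retry(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry transient failures, doubling the delay after each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception as exc:
            is_last = attempt == max_attempts
            if classify_error(exc) is ErrorCategory.PERMANENT or is_last:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Classifying before retrying keeps permanent failures, such as validation errors, from being retried needlessly.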
Yaojia Wang
38644594d2
test: add thread_id validation tests for replay API
...
- Test invalid thread_id with spaces returns 400
- Test thread_id with special chars returns 400
- Tighten existing 404 test assertion
2026-03-31 13:44:04 +02:00
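The validation these tests exercise (alphanumeric, max 128 chars, per the security fix ef6e5ac2be) might be implemented along these lines. The function name is hypothetical, and whether hyphens or underscores are also accepted is not stated, so the sketch takes "alphanumeric" literally.

```python
import re

# Letters and digits only, 1 to 128 characters.
_THREAD_ID_RE = re.compile(r"[A-Za-z0-9]{1,128}")


def is_valid_thread_id(thread_id: str) -> bool:
    """fullmatch rejects trailing newlines that a `$` anchor would allow."""
    return _THREAD_ID_RE.fullmatch(thread_id) is not None
```

An endpoint would map a `False` result to HTTP 400 with a fixed message, so the invalid input is never echoed back.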
Yaojia Wang
ef6e5ac2be
fix: address security findings in Phase 4 analytics and replay
...
- Fix CRITICAL: use parameterized INTERVAL arithmetic (%(days)s * INTERVAL '1 day')
  instead of string interpolation inside SQL literal
- Use asyncio.gather() for parallel query execution in get_analytics()
- Add range upper bound (max 365 days) to prevent DoS via full-table scans
- Add thread_id validation (alphanumeric, max 128 chars) in replay API
- Sanitize error messages to not reflect user input
2026-03-31 13:38:09 +02:00
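The INTERVAL fix in ef6e5ac2be can be sketched as a query builder that keeps the user-supplied range out of the SQL text. The `%(days)s * INTERVAL '1 day'` expression and the 365-day cap come from the commit; the function name, the `created_at` column, and the lower bound of one day are assumptions.

```python
MAX_RANGE_DAYS = 365


def build_event_count_query(days: int) -> tuple[str, dict[str, int]]:
    """Return (sql, params) with the day count passed as a bound parameter."""
    if not 1 <= days <= MAX_RANGE_DAYS:
        raise ValueError(f"range must be between 1 and {MAX_RANGE_DAYS} days")
    # Unsafe variant this replaces: f"... INTERVAL '{days} days'".
    # A placeholder inside a quoted SQL literal is not bound as a parameter,
    # which is why the value is multiplied by a constant interval instead.
    sql = (
        "SELECT count(*) FROM analytics_events "
        "WHERE created_at >= now() - %(days)s * INTERVAL '1 day'"
    )
    return sql, {"days": days}
```

With psycopg the pair would be passed as `cursor.execute(sql, params)`, leaving value quoting to the driver.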
Yaojia Wang
33db5aeb10
feat: complete phase 4 -- conversation replay API + analytics dashboard
...
- Replay models: StepType enum, ReplayStep, ReplayPage frozen dataclasses
- Checkpoint transformer: PostgresSaver JSONB -> structured timeline steps
- Replay API: GET /api/conversations (paginated), GET /api/replay/{thread_id}
- Analytics models: AgentUsage, InterruptStats, AnalyticsResult
- Analytics event recorder: Protocol + PostgresAnalyticsRecorder + NoOp
- Analytics queries: resolution_rate, agent_usage, escalation_rate, cost, interrupts
- Analytics API: GET /api/analytics?range=Xd with envelope response
- DB migration: analytics_events table + conversations column additions
- 74 new tests, 399 total passing, 92.87% coverage
2026-03-31 13:35:45 +02:00
Yaojia Wang
a2f750269d
fix: address critical security and code review findings in Phase 3
...
- Wire ImportOrchestrator into review_api start_import via BackgroundTasks
- Sanitize docstrings in generated tool code to prevent code injection
- Add Literal["read", "write"] validation for access_type
- Add regex validation for agent_group
- Validate URL scheme (http/https only) in ImportRequest
- Validate LLM output fields (clamp confidence, validate access_type)
- Use dataclasses.replace instead of manual reconstruction in importer
- Expand SSRF blocked networks (Carrier-Grade NAT, IPv4-mapped IPv6, etc.)
- Make _BLOCKED_NETWORKS immutable tuple
- Use yaml.safe_dump instead of yaml.dump
- Fix _to_snake_case for empty strings and Python keywords
2026-03-31 00:28:28 +02:00
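The `_to_snake_case` fix in a2f750269d concerns two edge cases: empty input and names that collide with Python keywords. A minimal sketch of one way to handle both follows; the fallback name `unnamed` and the trailing-underscore convention are assumptions, not the project's confirmed behavior.

```python
import keyword
import re


def to_snake_case(name: str) -> str:
    """Convert an operation id into a usable Python identifier."""
    # Insert "_" at lower-to-upper boundaries: getUser -> get_User.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every other run of non-alphanumerics into one underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", s).strip("_").lower()
    if not s:
        return "unnamed"  # hypothetical fallback for empty input
    if s[0].isdigit():
        s = f"_{s}"  # identifiers cannot start with a digit
    if keyword.iskeyword(s):
        s = f"{s}_"  # e.g. "class" -> "class_"
    return s
```

Without these guards, generated tool code could contain `def class(...)` or `def (...)`, both of which are syntax errors.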
Yaojia Wang
a54eb224e0
feat: complete phase 3 -- OpenAPI auto-discovery, SSRF protection, tool generation
...
- SSRF protection: private IP blocking, DNS rebinding defense, redirect validation
- OpenAPI fetcher with SSRF guard, JSON/YAML auto-detection, 10MB limit
- Structural spec validator (3.0.x/3.1.x)
- Endpoint parser with $ref resolution, auto-generated operation IDs
- Heuristic + LLM endpoint classifier with Protocol interface
- Review API at /api/openapi (import, job status, classification CRUD, approve)
- @tool code generator + Agent YAML generator
- Import orchestrator (fetch -> validate -> parse -> classify pipeline)
- 125 new tests, 322 total passing, 93.23% coverage
2026-03-31 00:10:44 +02:00
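The private IP blocking from a54eb224e0 can be sketched with the standard `ipaddress` module. The network list below is an illustrative subset and not the project's actual `_BLOCKED_NETWORKS`; the function name is hypothetical.

```python
import ipaddress

# Immutable tuple so the blocklist cannot be modified at runtime.
_BLOCKED_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",      # RFC 1918 private
        "172.16.0.0/12",   # RFC 1918 private
        "192.168.0.0/16",  # RFC 1918 private
        "127.0.0.0/8",     # loopback
        "169.254.0.0/16",  # link-local, includes cloud metadata endpoints
        "100.64.0.0/10",   # carrier-grade NAT
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique local
        "fe80::/10",       # IPv6 link-local
    )
)


def is_blocked_address(address: str) -> bool:
    """Return True when the address falls inside a blocked network."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the v4 rules.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in network for network in _BLOCKED_NETWORKS)
```

A DNS rebinding defense additionally needs this check applied to the address actually connected to, not only to a lookup done beforehand; that part is outside this sketch.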
Yaojia Wang
006b4ee5d7
fix: resolve ruff lint errors in Phase 2 code
...
- Move intent imports to TYPE_CHECKING block in graph.py (TC001)
- Rename test classes to CapWords convention (N801)
- Fix line length violations across test files (E501)
- Auto-fix import sorting (I001)
2026-03-30 21:44:47 +02:00
Yaojia Wang
b861ff055f
test: add routing integration tests for Phase 2 test requirements
...
9 tests covering the complete multi-agent routing flow:
- Single-intent routing to each agent (order_lookup, order_actions, discount, fallback)
- Multi-intent routing hint injection for sequential execution
- Ambiguity detection skips graph and returns clarification
- Low confidence threshold triggers ambiguity
- No-classifier fallback to supervisor prompt routing
Fills Phase 2 test requirement for integration-level routing coverage.
Total: 197 tests, 92.60% coverage.
2026-03-30 21:41:01 +02:00
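The routing decisions these tests cover can be summarized in a small sketch. The threshold value, the `Classification` shape, and the returned action names are all hypothetical; only the four behaviors (single intent, multi-intent hint, ambiguity, no-classifier fallback) come from the commit.

```python
from __future__ import annotations

from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.6  # hypothetical value


@dataclass(frozen=True)
class Classification:
    intents: tuple[str, ...]
    confidence: float


def decide_route(result: Classification | None) -> dict[str, str]:
    """Map a classifier result onto one of the four routing outcomes."""
    if result is None:
        # No classifier configured: let the supervisor prompt do the routing.
        return {"action": "supervisor"}
    if not result.intents or result.confidence < CONFIDENCE_THRESHOLD:
        # Ambiguous: skip the graph and ask the user to clarify.
        return {"action": "clarify"}
    if len(result.intents) == 1:
        return {"action": "route", "agent": result.intents[0]}
    # Multi-intent: inject a hint so the agents run sequentially.
    hint = "Handle in order: " + ", ".join(result.intents)
    return {"action": "route_sequential", "hint": hint}
```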
Yaojia Wang
512f988dd0
test: add Phase 2 checkpoint acceptance tests
...
18 integration tests validating all 7 Phase 2 checkpoint criteria:
1. Order query routes to order_lookup agent
2. Multi-intent classification with routing hint injection
3. Ambiguous message triggers clarification prompt
4. 30-min interrupt TTL auto-cancel with retry prompt
5. Webhook POST escalation with retry on failure
6. E-commerce template loads 4 correctly configured agents
7. Coverage at 92.60% (188 tests total)
2026-03-30 21:38:25 +02:00
Yaojia Wang
6e7b824b64
test: add integration tests for WebSocket message flow
...
17 integration tests covering:
- Happy path: token streaming, tool calls, multi-message sessions
- Interrupt flow: approve and reject paths with manager tracking
- Session TTL: expiration, sliding window reset, interrupt extension
- Validation: invalid JSON, missing fields, size limits
- Interrupt TTL: expired interrupt sends retry prompt
Fills Phase 1 test gap for integration-level WebSocket coverage.
Total: 170 tests, 92.15% coverage.
2026-03-30 21:24:31 +02:00
Yaojia Wang
1050df780d
feat: complete phase 2 -- multi-agent routing, interrupt TTL, escalation, templates
...
- Intent classification with LLM structured output (single/multi/ambiguous)
- Discount agent with apply_discount and generate_coupon tools
- Interrupt manager with 30-min TTL auto-expiration and retry prompts
- Webhook escalation module with exponential backoff retry (max 3)
- Three vertical industry templates (e-commerce, SaaS, fintech)
- Template loading in AgentRegistry
- Enhanced supervisor prompt with dynamic agent descriptions
- 153 tests passing, 90.18% coverage
2026-03-30 21:04:39 +02:00
Yaojia Wang
33488fd634
feat: complete phase 1 -- core framework with chat loop, agents, and React UI
...
Backend:
- FastAPI WebSocket /ws endpoint with streaming via LangGraph astream
- LangGraph Supervisor connecting 3 mock agents (order_lookup, order_actions, fallback)
- YAML Agent Registry with Pydantic validation and immutable configs
- PostgresSaver checkpoint persistence via langgraph-checkpoint-postgres
- Session TTL with 30-min sliding window and interrupt extension
- LLM provider abstraction (Anthropic/OpenAI/Google)
- Token usage + cost tracking callback handler
- Input validation: message size cap, thread_id format, content length
- Security: no hardcoded defaults, startup API key validation, no input reflection
Frontend:
- React 19 + TypeScript + Vite chat UI
- WebSocket hook with reconnect + exponential backoff
- Streaming token display with agent attribution
- Interrupt approval/reject UI for write operations
- Collapsible tool call viewer
Testing:
- 87 unit tests, 87% coverage (exceeds 80% requirement)
- Ruff lint + format clean
Infrastructure:
- Docker Compose (PostgreSQL 16 + backend)
- pyproject.toml with full dependency management
2026-03-30 00:54:21 +02:00
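The session TTL behavior described in 33488fd634 (30-minute sliding window with interrupt extension) can be sketched as below. The class and method names are hypothetical, and reading "interrupt extension" as "never expire before a pending interrupt's deadline" is an assumption.

```python
SESSION_TTL_SECONDS = 30 * 60


class SessionClock:
    """Tracks one session's expiry; timestamps are plain seconds."""

    def __init__(self, now: float) -> None:
        self.expires_at = now + SESSION_TTL_SECONDS

    def touch(self, now: float) -> None:
        """Sliding window: each message restarts the 30-minute timer."""
        self.expires_at = now + SESSION_TTL_SECONDS

    def extend_for_interrupt(self, interrupt_deadline: float) -> None:
        """Keep the session alive at least until the interrupt resolves."""
        self.expires_at = max(self.expires_at, interrupt_deadline)

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at
```

Passing `now` explicitly rather than calling the clock inside the class keeps the expiry logic easy to test without sleeping.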