Address all architecture review findings:
P0 fixes:
- Add API key authentication for admin endpoints (analytics, replay, openapi)
  and WebSocket connections via ADMIN_API_KEY env var
- Add PostgreSQL-backed PgSessionManager and PgInterruptManager for
  multi-worker production deployments (in-memory defaults preserved)
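
For illustration only, the API key check from the P0 list could look roughly like the sketch below. The function name `is_authorized`, the injectable `env` mapping, and the fail-closed behavior when ADMIN_API_KEY is unset are assumptions, not taken from the actual change.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(provided_key: Optional[str],
                  env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when provided_key matches ADMIN_API_KEY.

    Hypothetical helper: the real endpoint wiring is not shown here.
    """
    env = os.environ if env is None else env
    expected = env.get("ADMIN_API_KEY", "")
    if not expected or provided_key is None:
        # Assumption: fail closed when no key is configured or supplied.
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(provided_key.encode(), expected.encode())
```

The same helper could guard both HTTP admin routes and the WebSocket handshake, so the comparison logic lives in one place.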
P1 fixes:
- Implement actual tool generation in OpenAPI approve_job endpoint
  using generate_tool_code() and generate_agent_yaml()
- Add missing clarification, interrupt_expired, and tool_result message
  handlers in frontend ChatPage
P2 fixes:
- Replace monkey-patching on CompiledStateGraph with typed GraphContext
- Replace 9-param dispatch_message with WebSocketContext dataclass
- Extract duplicate _envelope() into shared app/api_utils.py
- Replace mutable module-level counter with crypto.randomUUID()
- Remove hardcoded mock data from ReviewPage, use api.ts wrappers
- Remove `as any` type escape from ReplayPage
All 516 tests passing, 0 TypeScript errors.
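
As a rough sketch of the WebSocketContext refactor listed under P2, the snippet below bundles per-connection dependencies into one dataclass. Every field name and the body of `dispatch_message` are hypothetical placeholders; only the names WebSocketContext and dispatch_message come from the notes above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WebSocketContext:
    # All field names are illustrative assumptions, not the real ones.
    thread_id: str
    user_id: str
    session_manager: Any
    interrupt_manager: Any
    graph: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


def dispatch_message(ctx: WebSocketContext, message: Dict[str, Any]) -> str:
    """Route a message by type, reading dependencies from ctx.

    Placeholder logic: returns a routing label instead of calling handlers.
    """
    msg_type = message.get("type", "unknown")
    return f"{ctx.thread_id}:{msg_type}"
```

With a single context argument, adding a new dependency means adding one field rather than changing every call site of the dispatcher.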
- Bump langgraph from 0.4 to 1.0+, langgraph-supervisor from 0.0.12 to 0.0.30+
- Bump langchain-core, langchain-anthropic, langchain-openai to 1.x
- Add langchain>=1.0 dependency for new create_agent location
- Migrate create_react_agent -> create_agent (prompt -> system_prompt)
- Fix create_supervisor positional arg to named agents= parameter
- Replace AsyncMock checkpointer with InMemorySaver in tests (v1 type validation)
- Update version references in README, ARCHITECTURE, eng-review-plan
README:
- Remove duplicated agent config, safety, security sections (covered by docs)
- Add ux_design_system.md and safety.py to project structure and doc links
- Convert doc links to descriptive table
agent-config-guide.md:
- Replace fictional agents/tools with real ones from agents.yaml
- Remove nonexistent 'admin' permission level (only read/write)
- Fix template names (e-commerce, saas, fintech)
- List all available built-in tools
openapi-import-guide.md:
- Fix /result -> /classifications endpoint
- Fix POST /approve to show no request body
- Remove nonexistent 'admin' access type
- Update response examples to match actual API
demo-script.md:
- Fix agent names (order_agent -> order_lookup)
- Replace fictional refund scenario with real lookup+cancel flow
ARCHITECTURE.md:
- Fix langgraph-supervisor version (v1.1 -> 0.0.12+)
docker-compose.yml:
- Expose postgres on port 5433 for local dev
- Backend: Add COUNT query and paginated response shape to conversations endpoint
  Returns { conversations: [...], total, page, per_page } instead of a flat array
- Frontend: Replace mock data in DashboardPage with fetchAnalytics() API calls
- Frontend: Replace mock data in ReplayListPage with fetchConversations() API calls
- Frontend: Replace mock data in ReplayPage with fetchReplay() API calls
- Add proper loading, empty, and error states to all three pages
- Align ConversationSummary type with actual DB columns (created_at, status)
- Update unit and E2E tests for new paginated conversation response shape
- Add fetchone() to FakeCursor for COUNT query support in E2E tests
- Isolate Settings tests from .env and process env leakage
- Fix analytics metadata test to unwrap psycopg Json wrapper
- Remove unused state variables causing frontend build failures
- Fix ReviewPage to use /classifications endpoint instead of nonexistent /result
- Normalize ReviewPage status enums (failed, not error) and access_type values
- Align api.ts types with backend response shapes (ReplayPage, AnalyticsData, AgentUsage)
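
A minimal sketch of the paginated conversations response described above. The helper names and the 1-based page numbering are assumptions; only the response keys (conversations, total, page, per_page) come from the notes.

```python
from typing import Any, Dict, List


def page_offset(page: int, per_page: int) -> int:
    """Translate a 1-based page number into a SQL OFFSET (assumed convention)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return (page - 1) * per_page


def paginated_response(rows: List[Dict[str, Any]], total: int,
                       page: int, per_page: int) -> Dict[str, Any]:
    """Wrap one page of rows together with the COUNT result."""
    return {
        "conversations": rows,
        "total": total,
        "page": page,
        "per_page": per_page,
    }
```

Returning `total` alongside the page lets the frontend compute the page count without a second request.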
- Fix CRITICAL: use parameterized INTERVAL arithmetic (%(days)s * INTERVAL '1 day')
  instead of string interpolation inside a SQL literal
- Use asyncio.gather() for parallel query execution in get_analytics()
- Add range upper bound (max 365 days) to prevent DoS via full-table scans
- Add thread_id validation (alphanumeric, max 128 chars) in replay API
- Sanitize error messages so they do not reflect user input
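
The hardening items above can be sketched as follows. This is an illustration under assumptions: the `conversations` table name, the function names, the lower bound of 1 day, and the strict `[A-Za-z0-9]` pattern are guesses. The `%(days)s * INTERVAL '1 day'` form, the 365-day cap, and the 128-character limit come from the notes.

```python
import re
from typing import Dict, Tuple

MAX_RANGE_DAYS = 365
# Assumption: "alphanumeric" means ASCII letters and digits only.
THREAD_ID_RE = re.compile(r"[A-Za-z0-9]{1,128}")

# The placeholder sits outside the quoted literal, so the driver binds
# `days` as a real parameter instead of text spliced into the SQL string.
ANALYTICS_SQL = (
    "SELECT COUNT(*) FROM conversations "
    "WHERE created_at >= NOW() - %(days)s * INTERVAL '1 day'"
)


def build_analytics_query(days: int) -> Tuple[str, Dict[str, int]]:
    """Return the SQL text and its parameters after bounding the range."""
    if isinstance(days, bool) or not isinstance(days, int):
        raise ValueError("range must be an integer number of days")
    if not 1 <= days <= MAX_RANGE_DAYS:
        raise ValueError("range must be between 1 and 365 days")
    return ANALYTICS_SQL, {"days": days}


def validate_thread_id(thread_id: str) -> str:
    """Reject malformed IDs with a fixed message that does not echo input."""
    if not THREAD_ID_RE.fullmatch(thread_id):
        raise ValueError("invalid thread_id")
    return thread_id
```

The returned pair is meant to be passed to a DB-API style `cursor.execute(sql, params)` call with a driver that uses pyformat placeholders, such as psycopg.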
- Move intent imports to TYPE_CHECKING block in graph.py (TC001)
- Rename test classes to CapWords convention (N801)
- Fix line length violations across test files (E501)
- Auto-fix import sorting (I001)
- Add CLAUDE.md with phase execution workflow and project conventions
- Add ARCHITECTURE.md with system design, data flow, and component breakdown
- Add DEVELOPMENT-PLAN.md with detailed 5-phase task breakdown