- API versioning: all REST endpoints prefixed with /api/v1/
- Structured logging: replaced stdlib logging with structlog (console/JSON modes)
- Alembic migrations: versioned DB schema with initial migration
- Error standardization: global exception handlers for consistent envelope format
- Interrupt cleanup: asyncio background task for expired interrupt removal
- Integration tests: +30 tests (analytics, replay, openapi, error, session APIs)
- Frontend tests: +57 tests (all components, pages, useWebSocket hook)
- Backend: 557 tests, 89.75% coverage | Frontend: 80 tests, 16 test files
Engineering Improvements -- Development Log
Status: COMPLETED
Branch: eng/engineering-improvements
Date started: 2026-04-06
Date completed: 2026-04-06
What Was Built
Phase 1: Quick Wins (no new deps)
- Interrupt Cleanup Background Task -- Added asyncio background task in lifespan that calls interrupt_manager.cleanup_expired() every 60 seconds. Prevents unbounded memory growth from expired interrupts.
- API Versioning -- All REST endpoints prefixed with /api/v1/ (was /api/). Updated 4 router prefixes, Docker healthcheck, all frontend fetch URLs, and all test assertions. WebSocket /ws endpoint unchanged.
- Error Response Standardization -- Added global exception handlers for HTTPException, RequestValidationError, and Exception. All error responses now use the same envelope format as success responses: {"success": false, "data": null, "error": "..."}.
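The interrupt cleanup task from Phase 1 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the InterruptManager class here is a hypothetical stand-in, and the lifespan helper takes the manager directly rather than the application object, purely to keep the example self-contained. Only the cleanup_expired() name and the 60-second default come from the log above.

```python
import asyncio
import contextlib
import time


class InterruptManager:
    """Hypothetical stand-in for the real interrupt manager."""

    def __init__(self) -> None:
        # Maps interrupt id -> monotonic expiry timestamp.
        self._expires_at: dict = {}

    def add(self, interrupt_id: str, ttl_seconds: float) -> None:
        self._expires_at[interrupt_id] = time.monotonic() + ttl_seconds

    def cleanup_expired(self) -> int:
        """Drop expired interrupts and return how many were removed."""
        now = time.monotonic()
        expired = [key for key, deadline in self._expires_at.items() if deadline <= now]
        for key in expired:
            del self._expires_at[key]
        return len(expired)

    def __len__(self) -> int:
        return len(self._expires_at)


async def cleanup_loop(manager: InterruptManager, interval_seconds: float = 60.0) -> None:
    """Call cleanup_expired() repeatedly, sleeping interval_seconds between runs."""
    while True:
        await asyncio.sleep(interval_seconds)
        manager.cleanup_expired()


@contextlib.asynccontextmanager
async def lifespan(manager: InterruptManager, interval_seconds: float = 60.0):
    """Start the cleanup task on startup and cancel it on shutdown."""
    task = asyncio.create_task(cleanup_loop(manager, interval_seconds))
    try:
        yield
    finally:
        # Cancel the background task and wait for it to finish unwinding.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

Cancelling and awaiting the task on shutdown avoids "task was destroyed but it is pending" warnings when the event loop closes.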
Phase 2: Medium Items (new deps)
- Alembic Database Migrations -- Replaced inline DDL in setup_app_tables() with versioned Alembic migrations. Initial migration 001_initial_schema.py captures all 4 tables + ALTER TABLE migration. setup_app_tables() preserved for tests. Production uses run_alembic_migrations().
- Structured Logging -- Replaced stdlib logging.getLogger() with structlog.get_logger() across 10 files. Added logging_config.py with console (dev) and JSON (production) modes. Configurable via LOG_FORMAT env var.
Phase 3: Test Coverage
- Integration Tests (+30) -- Created 5 new test files: analytics API, replay API, OpenAPI API, error responses, session/interrupt lifecycle. Uses httpx.AsyncClient with ASGITransport for full API layer testing.
- Frontend Tests (+57) -- Created 12 new test files covering all components (ChatInput, ChatMessages, InterruptPrompt, ErrorBanner, NavBar, MetricCard, ReplayTimeline, AgentAction, Layout), pages (ChatPage, ReviewPage), and hooks (useWebSocket).
Code Structure
New files created
- backend/app/logging_config.py -- structlog configuration
- backend/alembic.ini -- Alembic config
- backend/alembic/env.py -- Migration environment
- backend/alembic/versions/001_initial_schema.py -- Initial migration
- backend/tests/unit/test_interrupt_cleanup.py (3 tests)
- backend/tests/unit/test_error_responses.py (6 tests)
- backend/tests/unit/test_logging_config.py (2 tests)
- backend/tests/integration/test_analytics_api.py (6 tests)
- backend/tests/integration/test_replay_api.py (6 tests)
- backend/tests/integration/test_openapi_api.py (5 tests)
- backend/tests/integration/test_error_responses.py (5 tests)
- backend/tests/integration/test_session_interrupt_lifecycle.py (8 tests)
- 12 frontend test files (57 tests total)
Modified files
- backend/app/main.py -- cleanup task, exception handlers, alembic, structlog
- backend/app/db.py -- added run_alembic_migrations()
- backend/app/config.py -- added log_format setting
- backend/pyproject.toml -- added alembic, structlog deps
- 4 router files -- /api/v1/ prefix
- 10 files -- structlog migration
- docker-compose.yml -- healthcheck URL
- frontend/src/api.ts -- /api/v1/ URLs
- All existing test files -- API path updates + error envelope assertions
Test Coverage
- Backend: 557 tests (was 516), 89.75% coverage
  - Unit: ~490 tests
  - Integration: ~60 tests
  - E2E: ~7 tests
- Frontend: 80 tests (was 23), 16 test files (was 4)
Deviations from Plan
- Redis rate limiting deferred (single-worker sufficient for now)
- ConversationTracker verified correct by design (pool per-method), skipped
- Coverage dropped slightly from 90.26% to 89.75% due to new alembic/logging modules with partial test coverage (still well above 80% threshold)
Known Issues / Tech Debt
- Rate limiting remains process-global (needs Redis for multi-worker)
- Alembic migrations not tested against real PostgreSQL in CI (would need running DB)
- Frontend test coverage could be deeper (e.g., WebSocket reconnect edge cases)