smart-support/docs/phases/eng-improvements-dev-log.md
Yaojia Wang f0699436c5 refactor: engineering improvements -- API versioning, structured logging, Alembic, error standardization, test coverage
- API versioning: all REST endpoints prefixed with /api/v1/
- Structured logging: replaced stdlib logging with structlog (console/JSON modes)
- Alembic migrations: versioned DB schema with initial migration
- Error standardization: global exception handlers for consistent envelope format
- Interrupt cleanup: asyncio background task for expired interrupt removal
- Integration tests: +30 tests (analytics, replay, openapi, error, session APIs)
- Frontend tests: +57 tests (all components, pages, useWebSocket hook)
- Backend: 557 tests, 89.75% coverage | Frontend: 80 tests, 16 test files
2026-04-06 23:19:29 +02:00

Engineering Improvements -- Development Log

Status: COMPLETED
Branch: eng/engineering-improvements
Date started: 2026-04-06
Date completed: 2026-04-06

What Was Built

Phase 1: Quick Wins (no new deps)

  1. Interrupt Cleanup Background Task -- Added an asyncio background task, started in the app lifespan, that calls interrupt_manager.cleanup_expired() every 60 seconds. This prevents unbounded memory growth from expired interrupts.

  2. API Versioning -- All REST endpoints prefixed with /api/v1/ (was /api/). Updated 4 router prefixes, Docker healthcheck, all frontend fetch URLs, and all test assertions. WebSocket /ws endpoint unchanged.

  3. Error Response Standardization -- Added global exception handlers for HTTPException, RequestValidationError, and Exception. All error responses now use the same envelope format as success responses: {"success": false, "data": null, "error": "..."}.
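
A minimal sketch of the cleanup task from item 1, using only asyncio. The InterruptManager class here is a hypothetical stand-in (the real one lives in the backend and may differ), and the lifespan helper takes the manager directly rather than a FastAPI app, which is a simplification for illustration.

```python
import asyncio
import contextlib
import time


class InterruptManager:
    """Hypothetical stand-in: maps interrupt id -> expiry timestamp."""

    def __init__(self) -> None:
        self._expires_at = {}

    def add(self, interrupt_id: str, ttl_seconds: float) -> None:
        self._expires_at[interrupt_id] = time.monotonic() + ttl_seconds

    def cleanup_expired(self) -> int:
        """Drop expired interrupts and return how many were removed."""
        now = time.monotonic()
        expired = [k for k, t in self._expires_at.items() if t <= now]
        for key in expired:
            del self._expires_at[key]
        return len(expired)

    def __len__(self) -> int:
        return len(self._expires_at)


async def cleanup_loop(manager: InterruptManager, interval_seconds: float = 60.0) -> None:
    """Sleep, then run one cleanup pass; repeat until cancelled."""
    while True:
        await asyncio.sleep(interval_seconds)
        manager.cleanup_expired()


@contextlib.asynccontextmanager
async def lifespan(manager: InterruptManager, interval_seconds: float = 60.0):
    """Start the loop on startup; cancel it and wait for it on shutdown."""
    task = asyncio.create_task(cleanup_loop(manager, interval_seconds))
    try:
        yield
    finally:
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

Cancelling the task and awaiting it on shutdown avoids a "task was destroyed but it is pending" warning when the event loop closes.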

Phase 2: Medium Items (new deps)

  1. Alembic Database Migrations -- Replaced inline DDL in setup_app_tables() with versioned Alembic migrations. The initial migration, 001_initial_schema.py, captures all 4 tables plus the existing ALTER TABLE change. setup_app_tables() is preserved for tests; production uses run_alembic_migrations().

  2. Structured Logging -- Replaced stdlib logging.getLogger() with structlog.get_logger() across 10 files. Added logging_config.py with console (dev) and JSON (production) modes. Configurable via LOG_FORMAT env var.
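
To illustrate the two logging modes from item 2, here is a dependency-free sketch of selecting an output format from LOG_FORMAT. It does not use structlog and is not the contents of logging_config.py; the accepted values ("console", "json"), the "console" default, and both function names are assumptions made for the example.

```python
import json
import os

VALID_FORMATS = ("console", "json")  # assumed values for LOG_FORMAT


def resolve_log_format(env=None) -> str:
    """Read LOG_FORMAT from the environment, defaulting to console (assumed)."""
    env = os.environ if env is None else env
    value = env.get("LOG_FORMAT", "console").strip().lower()
    if value not in VALID_FORMATS:
        raise ValueError(f"LOG_FORMAT must be one of {VALID_FORMATS}, got {value!r}")
    return value


def render_event(log_format: str, event: str, **fields) -> str:
    """Render one log event as a JSON line or a human-readable line."""
    if log_format == "json":
        return json.dumps({"event": event, **fields}, sort_keys=True)
    pairs = " ".join(f"{key}={value!r}" for key, value in sorted(fields.items()))
    return f"{event} {pairs}".rstrip()
```

The point of the split is that call sites pass an event name plus key-value fields either way; only the final rendering step changes between dev and production.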

Phase 3: Test Coverage

  1. Integration Tests (+30) -- Created 5 new test files: analytics API, replay API, OpenAPI API, error responses, session/interrupt lifecycle. Uses httpx.AsyncClient with ASGITransport for full API layer testing.

  2. Frontend Tests (+57) -- Created 12 new test files covering all components (ChatInput, ChatMessages, InterruptPrompt, ErrorBanner, NavBar, MetricCard, ReplayTimeline, AgentAction, Layout), pages (ChatPage, ReviewPage), and hooks (useWebSocket).
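
The integration tests above drive the app in-process through httpx's ASGITransport. As a rough, dependency-free illustration of what in-process ASGI testing means, the sketch below calls a tiny raw ASGI app directly, with no socket involved. The app, the /api/v1/health route, and the call_in_process helper are all hypothetical, and the shape of the success envelope (error set to null) is an assumption; the project's tests use httpx instead.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Tiny raw-ASGI app exposing one versioned route (hypothetical)."""
    if scope["path"] == "/api/v1/health":
        status, body = 200, {"success": True, "data": {"status": "ok"}, "error": None}
    else:
        status, body = 404, {"success": False, "data": None, "error": "Not Found"}
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": json.dumps(body).encode()})


async def call_in_process(asgi_app, path: str, method: str = "GET"):
    """Invoke an ASGI app directly and return (status, parsed JSON body)."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
    }
    messages = []

    async def receive():
        # An empty request body, delivered in a single message.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await asgi_app(scope, receive, send)
    status = next(m["status"] for m in messages if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in messages if m["type"] == "http.response.body")
    return status, json.loads(body)
```

Calling the old unversioned path (for example /api/health) against this app returns 404, which is the kind of regression the updated path assertions would catch.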

Code Structure

New files created

  • backend/app/logging_config.py -- structlog configuration
  • backend/alembic.ini -- Alembic config
  • backend/alembic/env.py -- Migration environment
  • backend/alembic/versions/001_initial_schema.py -- Initial migration
  • backend/tests/unit/test_interrupt_cleanup.py (3 tests)
  • backend/tests/unit/test_error_responses.py (6 tests)
  • backend/tests/unit/test_logging_config.py (2 tests)
  • backend/tests/integration/test_analytics_api.py (6 tests)
  • backend/tests/integration/test_replay_api.py (6 tests)
  • backend/tests/integration/test_openapi_api.py (5 tests)
  • backend/tests/integration/test_error_responses.py (5 tests)
  • backend/tests/integration/test_session_interrupt_lifecycle.py (8 tests)
  • 12 frontend test files (57 tests total)

Modified files

  • backend/app/main.py -- cleanup task, exception handlers, alembic, structlog
  • backend/app/db.py -- added run_alembic_migrations()
  • backend/app/config.py -- added log_format setting
  • backend/pyproject.toml -- added alembic, structlog deps
  • 4 router files -- /api/v1/ prefix
  • 10 files -- structlog migration
  • docker-compose.yml -- healthcheck URL
  • frontend/src/api.ts -- /api/v1/ URLs
  • All existing test files -- API path updates + error envelope assertions
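
The error envelope assertions mentioned in the last item might look roughly like the sketch below. It is framework-free: HTTPError and ValidationError are hypothetical stand-ins for the HTTPException and RequestValidationError cases, the status codes 422 and 500 and the message strings are assumptions, and every function name is illustrative rather than taken from the codebase.

```python
class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP exception with a status and detail."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


class ValidationError(Exception):
    """Hypothetical stand-in for a request validation failure."""


def error_envelope(message: str) -> dict:
    """Build the error envelope described in this log."""
    return {"success": False, "data": None, "error": message}


def to_response(exc: Exception):
    """Map an exception to (status code, envelope), one branch per handler."""
    if isinstance(exc, HTTPError):
        return exc.status_code, error_envelope(exc.detail)
    if isinstance(exc, ValidationError):
        return 422, error_envelope(f"Validation error: {exc}")
    # Catch-all: return a generic message so internals are not leaked.
    return 500, error_envelope("Internal server error")


def assert_error_envelope(payload: dict) -> None:
    """Assertion helper a test could apply to any error response body."""
    assert set(payload) == {"success", "data", "error"}
    assert payload["success"] is False
    assert payload["data"] is None
    assert isinstance(payload["error"], str) and payload["error"]
```

Keeping the shape check in one helper means a future change to the envelope only has to be updated in one place in the tests.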

Test Coverage

  • Backend: 557 tests (was 516), 89.75% coverage
    • Unit: ~490 tests
    • Integration: ~60 tests
    • E2E: ~7 tests
  • Frontend: 80 tests (was 23), 16 test files (was 4)

Deviations from Plan

  • Redis rate limiting deferred (single-worker sufficient for now)
  • ConversationTracker change skipped: its per-method pool handling was verified to be correct by design
  • Coverage dropped slightly from 90.26% to 89.75% due to new alembic/logging modules with partial test coverage (still well above 80% threshold)

Known Issues / Tech Debt

  • Rate limiting remains process-global (needs Redis for multi-worker)
  • Alembic migrations are not tested against a real PostgreSQL instance in CI (this would need a running database)
  • Frontend test coverage could be deeper (e.g., WebSocket reconnect edge cases)