smart-support/docs/phases/eng-improvements-dev-log.md
Yaojia Wang f0699436c5 refactor: engineering improvements -- API versioning, structured logging, Alembic, error standardization, test coverage
- API versioning: all REST endpoints prefixed with /api/v1/
- Structured logging: replaced stdlib logging with structlog (console/JSON modes)
- Alembic migrations: versioned DB schema with initial migration
- Error standardization: global exception handlers for consistent envelope format
- Interrupt cleanup: asyncio background task for expired interrupt removal
- Integration tests: +30 tests (analytics, replay, openapi, error, session APIs)
- Frontend tests: +57 tests (all components, pages, useWebSocket hook)
- Backend: 557 tests, 89.75% coverage | Frontend: 80 tests, 16 test files
2026-04-06 23:19:29 +02:00

# Engineering Improvements -- Development Log
> Status: COMPLETED
> Branch: `eng/engineering-improvements`
> Date started: 2026-04-06
> Date completed: 2026-04-06
## What Was Built
### Phase 1: Quick Wins (no new deps)
1. **Interrupt Cleanup Background Task** -- Added asyncio background task in lifespan that calls `interrupt_manager.cleanup_expired()` every 60 seconds. Prevents unbounded memory growth from expired interrupts.
2. **API Versioning** -- All REST endpoints prefixed with `/api/v1/` (was `/api/`). Updated 4 router prefixes, Docker healthcheck, all frontend fetch URLs, and all test assertions. WebSocket `/ws` endpoint unchanged.
3. **Error Response Standardization** -- Added global exception handlers for `HTTPException`, `RequestValidationError`, and `Exception`. All error responses now use the same envelope format as success responses: `{"success": false, "data": null, "error": "..."}`.
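The periodic cleanup described in item 1 can be sketched with the standard library alone. This is a minimal illustration, not the code in `main.py`: the `InterruptManager` class here is a hypothetical stand-in (only the method name `cleanup_expired()` comes from this log), and `run_with_cleanup` imitates what a lifespan handler would do by starting the task on startup and cancelling it on shutdown.

```python
import asyncio
import time


class InterruptManager:
    """Hypothetical stand-in for the project's interrupt manager."""

    def __init__(self) -> None:
        # Maps interrupt id -> absolute expiry time (time.monotonic() based).
        self._expires_at = {}

    def add(self, interrupt_id: str, ttl_seconds: float) -> None:
        self._expires_at[interrupt_id] = time.monotonic() + ttl_seconds

    def cleanup_expired(self) -> int:
        """Drop expired interrupts and return how many were removed."""
        now = time.monotonic()
        expired = [key for key, deadline in self._expires_at.items() if deadline <= now]
        for key in expired:
            del self._expires_at[key]
        return len(expired)

    def __len__(self) -> int:
        return len(self._expires_at)


async def cleanup_loop(manager: InterruptManager, interval_seconds: float = 60.0) -> None:
    """Call cleanup_expired() repeatedly, sleeping between passes."""
    while True:
        await asyncio.sleep(interval_seconds)
        manager.cleanup_expired()


async def run_with_cleanup(
    manager: InterruptManager, interval_seconds: float, run_for_seconds: float
) -> None:
    """Imitate a lifespan: start the task, 'serve' for a while, then cancel it."""
    task = asyncio.create_task(cleanup_loop(manager, interval_seconds))
    try:
        await asyncio.sleep(run_for_seconds)  # stands in for the app serving requests
    finally:
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass  # expected: the loop only ends by cancellation
```

Cancelling and then awaiting the task on shutdown avoids leaving a pending task behind when the event loop closes.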
### Phase 2: Medium Items (new deps)
4. **Alembic Database Migrations** -- Replaced inline DDL in `setup_app_tables()` with versioned Alembic migrations. Initial migration `001_initial_schema.py` captures all 4 tables plus the ALTER TABLE migration. `setup_app_tables()` is preserved for tests; production uses `run_alembic_migrations()`.
5. **Structured Logging** -- Replaced stdlib `logging.getLogger()` with `structlog.get_logger()` across 10 files. Added `logging_config.py` with console (dev) and JSON (production) modes. Configurable via `LOG_FORMAT` env var.
### Phase 3: Test Coverage
7. **Integration Tests (+30)** -- Created 5 new test files: analytics API, replay API, OpenAPI API, error responses, session/interrupt lifecycle. They use `httpx.AsyncClient` with `ASGITransport` to exercise the full API layer.
8. **Frontend Tests (+57)** -- Created 12 new test files covering all components (ChatInput, ChatMessages, InterruptPrompt, ErrorBanner, NavBar, MetricCard, ReplayTimeline, AgentAction, Layout), pages (ChatPage, ReviewPage), and hooks (useWebSocket).
## Code Structure
### New files created
- `backend/app/logging_config.py` -- structlog configuration
- `backend/alembic.ini` -- Alembic config
- `backend/alembic/env.py` -- Migration environment
- `backend/alembic/versions/001_initial_schema.py` -- Initial migration
- `backend/tests/unit/test_interrupt_cleanup.py` (3 tests)
- `backend/tests/unit/test_error_responses.py` (6 tests)
- `backend/tests/unit/test_logging_config.py` (2 tests)
- `backend/tests/integration/test_analytics_api.py` (6 tests)
- `backend/tests/integration/test_replay_api.py` (6 tests)
- `backend/tests/integration/test_openapi_api.py` (5 tests)
- `backend/tests/integration/test_error_responses.py` (5 tests)
- `backend/tests/integration/test_session_interrupt_lifecycle.py` (8 tests)
- 12 frontend test files (57 tests total)
### Modified files
- `backend/app/main.py` -- cleanup task, exception handlers, alembic, structlog
- `backend/app/db.py` -- added `run_alembic_migrations()`
- `backend/app/config.py` -- added `log_format` setting
- `backend/pyproject.toml` -- added alembic, structlog deps
- 4 router files -- `/api/v1/` prefix
- 10 files -- structlog migration
- `docker-compose.yml` -- healthcheck URL
- `frontend/src/api.ts` -- `/api/v1/` URLs
- All existing test files -- API path updates + error envelope assertions
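The error envelope assertions added across the existing tests could be shared through one helper. The sketch below assumes the envelope shape quoted in Phase 1 (`success`, `data`, `error`) and assumes that no other keys are present; `assert_error_envelope` is a hypothetical name, not a function from this repository.

```python
from typing import Any, Mapping, Optional


def assert_error_envelope(
    body: Mapping[str, Any], expected_error: Optional[str] = None
) -> None:
    """Check that a decoded JSON body matches the error envelope."""
    # Assumption: the envelope has exactly these three keys and no extras.
    assert set(body) == {"success", "data", "error"}, f"unexpected keys: {sorted(body)}"
    assert body["success"] is False
    assert body["data"] is None
    # The error field carries a non-empty message string.
    assert isinstance(body["error"], str) and body["error"]
    if expected_error is not None:
        assert body["error"] == expected_error


# Example: a body as it might come back from response.json() in a test.
assert_error_envelope({"success": False, "data": None, "error": "Not found"}, "Not found")
```

Keeping the check in one place means a future change to the envelope touches a single helper rather than every test file.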
## Test Coverage
- Backend: 557 tests (was 516), 89.75% coverage
  - Unit: ~490 tests
  - Integration: ~60 tests
  - E2E: ~7 tests
- Frontend: 80 tests (was 23), 16 test files (was 4)
## Deviations from Plan
- Redis rate limiting deferred (single-worker sufficient for now)
- `ConversationTracker` item skipped: verified correct by design (pool per-method)
- Coverage dropped slightly from 90.26% to 89.75% due to new alembic/logging modules with partial test coverage (still well above 80% threshold)
## Known Issues / Tech Debt
- Rate limiting remains process-global (needs Redis for multi-worker)
- Alembic migrations are not tested against a real PostgreSQL instance in CI (would need a running DB)
- Frontend test coverage could be deeper (e.g., WebSocket reconnect edge cases)