invoice-master-poc-v2/docs/web-refactoring-plan.md
Yaojia Wang 58bf75db68 WIP
2026-01-27 00:47:10 +01:00

Web Directory Refactoring Plan

Current Structure Issues

  1. Flat structure: All files in one directory (20 Python files)
  2. Naming inconsistency: Mix of admin_*, async_*, batch_* prefixes
  3. Mixed concerns: Routes, schemas, services, and workers in same directory
  4. Poor scalability: Hard to navigate and maintain as the project grows

Proposed Structure (Best Practices)

src/web/
├── __init__.py                  # Main exports
├── app.py                       # FastAPI app factory
├── config.py                    # App configuration
├── dependencies.py              # Global dependencies
│
├── api/                         # API Routes Layer
│   ├── __init__.py
│   └── v1/                      # API version 1
│       ├── __init__.py
│       ├── routes.py            # Public API routes (inference)
│       ├── admin/               # Admin API routes
│       │   ├── __init__.py
│       │   ├── documents.py     # admin_routes.py → documents.py
│       │   ├── annotations.py   # admin_annotation_routes.py → annotations.py
│       │   ├── training.py      # admin_training_routes.py → training.py
│       │   └── auth.py          # admin_auth.py → auth.py (routes only)
│       ├── async_api/           # Async processing API
│       │   ├── __init__.py
│       │   └── routes.py        # async_routes.py → routes.py
│       └── batch/               # Batch upload API
│           ├── __init__.py
│           └── routes.py        # batch_upload_routes.py → routes.py
│
├── schemas/                     # Pydantic Models
│   ├── __init__.py
│   ├── common.py                # Shared schemas (ErrorResponse, etc.)
│   ├── inference.py             # schemas.py → inference.py
│   ├── admin.py                 # admin_schemas.py → admin.py
│   ├── async_api.py             # New: async API schemas
│   └── batch.py                 # New: batch upload schemas
│
├── services/                    # Business Logic Layer
│   ├── __init__.py
│   ├── inference.py             # services.py → inference.py
│   ├── autolabel.py             # admin_autolabel.py → autolabel.py
│   ├── async_processing.py      # async_service.py → async_processing.py
│   └── batch_upload.py          # batch_upload_service.py → batch_upload.py
│
├── core/                        # Core Components
│   ├── __init__.py
│   ├── auth.py                  # admin_auth.py → auth.py (logic only)
│   ├── rate_limiter.py          # rate_limiter.py → rate_limiter.py
│   └── scheduler.py             # admin_scheduler.py → scheduler.py
│
└── workers/                     # Background Task Queues
    ├── __init__.py
    ├── async_queue.py           # async_queue.py → async_queue.py
    └── batch_queue.py           # batch_queue.py → batch_queue.py

File Mapping

Current → New Location

| Current File | New Location | Purpose |
|---|---|---|
| admin_routes.py | api/v1/admin/documents.py | Document management routes |
| admin_annotation_routes.py | api/v1/admin/annotations.py | Annotation routes |
| admin_training_routes.py | api/v1/admin/training.py | Training routes |
| admin_auth.py | Split: api/v1/admin/auth.py + core/auth.py | Auth routes + logic |
| admin_schemas.py | schemas/admin.py | Admin Pydantic models |
| admin_autolabel.py | services/autolabel.py | Auto-label service |
| admin_scheduler.py | core/scheduler.py | Training scheduler |
| routes.py | api/v1/routes.py | Public inference API |
| schemas.py | schemas/inference.py | Inference models |
| services.py | services/inference.py | Inference service |
| async_routes.py | api/v1/async_api/routes.py | Async API routes |
| async_service.py | services/async_processing.py | Async processing service |
| async_queue.py | workers/async_queue.py | Async task queue |
| batch_upload_routes.py | api/v1/batch/routes.py | Batch upload routes |
| batch_upload_service.py | services/batch_upload.py | Batch upload service |
| batch_queue.py | workers/batch_queue.py | Batch task queue |
| rate_limiter.py | core/rate_limiter.py | Rate limiting logic |
| config.py | config.py | Keep as-is |
| dependencies.py | dependencies.py | Keep as-is |
| app.py | app.py | Keep as-is (update imports) |
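
The mapping above can also be captured as data so that the file copies are repeatable. The following is a minimal sketch, not the project's actual migration script: `MIGRATION_MAP` and `copy_files` are hypothetical names, paths are assumed to be relative to `src/web/`, and `admin_auth.py` is left out because it has to be split by hand.

```python
from pathlib import Path
import shutil

# Old file -> new location, relative to src/web/ (rows from the File Mapping table).
# admin_auth.py is omitted on purpose: it is split manually into two files.
MIGRATION_MAP = {
    "admin_routes.py": "api/v1/admin/documents.py",
    "admin_annotation_routes.py": "api/v1/admin/annotations.py",
    "admin_training_routes.py": "api/v1/admin/training.py",
    "admin_schemas.py": "schemas/admin.py",
    "admin_autolabel.py": "services/autolabel.py",
    "admin_scheduler.py": "core/scheduler.py",
    "routes.py": "api/v1/routes.py",
    "schemas.py": "schemas/inference.py",
    "services.py": "services/inference.py",
    "async_routes.py": "api/v1/async_api/routes.py",
    "async_service.py": "services/async_processing.py",
    "async_queue.py": "workers/async_queue.py",
    "batch_upload_routes.py": "api/v1/batch/routes.py",
    "batch_upload_service.py": "services/batch_upload.py",
    "batch_queue.py": "workers/batch_queue.py",
    "rate_limiter.py": "core/rate_limiter.py",
}


def copy_files(web_root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Copy each mapped file to its new location; originals stay in place."""
    planned = []
    for old, new in MIGRATION_MAP.items():
        src, dst = web_root / old, web_root / new
        if not src.exists():
            continue  # tolerate files that are absent from this checkout
        planned.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return planned
```

Calling `copy_files(Path("src/web"))` with the default `dry_run=True` only reports what would be copied, which makes it easy to review the plan before touching the tree.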

Benefits

1. Clear Separation of Concerns

  • Routes: API endpoint definitions
  • Schemas: Data validation models
  • Services: Business logic
  • Core: Reusable components
  • Workers: Background processing

2. Better Scalability

  • Easy to add new API versions (v2/)
  • Clear namespace for each domain
  • Smaller, more focused files (target: none over 800 lines)

3. Improved Maintainability

  • Find files by function, not by prefix
  • Each module has a single responsibility
  • Easier to write focused tests

4. Standard Python Patterns

  • Package-based organization
  • Follows the multi-file layout from FastAPI's "Bigger Applications" guide (one router per module)
  • Similar to the package layouts common in Django and Flask projects

Implementation Steps

Phase 1: Create New Structure (No Breaking Changes)

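  1. Create new directories: api/, schemas/, services/, core/, workers/
  2. Copy files to new locations (don't delete originals yet)
  3. Update imports in new files
  4. Add __init__.py with proper exports. Note that the new schemas/ and services/ packages shadow the old schemas.py and services.py as soon as they exist (Python resolves a package before a same-named module), so their __init__.py files must re-export everything the old modules exposed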
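
Steps 1 and 4 of this phase can be scripted. The sketch below assumes it is pointed at `src/web/`; `PACKAGES` and `create_skeleton` are hypothetical names, and the `__init__.py` files it creates are empty, so the exports still have to be filled in by hand.

```python
from pathlib import Path

# Package directories from the proposed structure, relative to src/web/.
PACKAGES = [
    "api",
    "api/v1",
    "api/v1/admin",
    "api/v1/async_api",
    "api/v1/batch",
    "schemas",
    "services",
    "core",
    "workers",
]


def create_skeleton(web_root: Path) -> list[Path]:
    """Create each package directory and an empty __init__.py where missing."""
    created = []
    for pkg in PACKAGES:
        pkg_dir = web_root / pkg
        pkg_dir.mkdir(parents=True, exist_ok=True)
        init_file = pkg_dir / "__init__.py"
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created
```

The function is idempotent: running it a second time creates nothing and returns an empty list, so it is safe to re-run during the migration.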

Phase 2: Update Tests

  1. Update test imports to use new structure
  2. Run tests to verify nothing breaks
  3. Fix any import issues
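
Updating test imports is mostly mechanical and could be done with a small rewrite helper. This is a sketch under assumptions: the old modules are imported as `src.web.<name>`, `MODULE_RENAMES` lists only a few of the renames, and `rewrite_imports` is a hypothetical name. It does not touch the `from src.web import admin_routes` form or dotted strings such as `mock.patch` targets; those need a manual pass.

```python
import re

# Old dotted module path -> new dotted module path (a subset, for illustration).
MODULE_RENAMES = {
    "src.web.admin_routes": "src.web.api.v1.admin.documents",
    "src.web.admin_schemas": "src.web.schemas.admin",
    "src.web.schemas": "src.web.schemas.inference",
    "src.web.services": "src.web.services.inference",
    "src.web.async_service": "src.web.services.async_processing",
}

# The lookarounds reject matches that are part of a longer dotted path, so
# "src.web.schemas.admin" is never treated as "src.web.schemas" plus a suffix.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(name) for name in sorted(MODULE_RENAMES, key=len, reverse=True))
    + r")(?![\w.])"
)


def rewrite_imports(source: str) -> str:
    """Replace old module paths with their new locations."""
    return _PATTERN.sub(lambda match: MODULE_RENAMES[match.group(1)], source)
```

Because already-rewritten paths no longer match, applying the helper twice gives the same result as applying it once.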

Phase 3: Update Main App

  1. Update app.py to import from new locations
  2. Run full test suite
  3. Verify all endpoints work

Phase 4: Cleanup

  1. Delete old files
  2. Update documentation
  3. Final test run
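
Before deleting the old files, it helps to confirm that nothing imports them any more. The sketch below uses the standard `ast` module to find such imports in a source string; `OLD_MODULES` (a partial list) and `find_old_imports` are hypothetical names, and the `src.web.` prefix is an assumption about how the package is imported.

```python
import ast

# Dotted paths of modules that Phase 4 will delete (partial list, for illustration).
OLD_MODULES = {
    "src.web.admin_routes",
    "src.web.admin_schemas",
    "src.web.async_service",
}


def find_old_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module) for every import of a retired module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also catches "from src.web import admin_routes".
            names = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        hits.extend((node.lineno, name) for name in names if name in OLD_MODULES)
    return sorted(hits)
```

Running this over every `.py` file in the repository and requiring an empty result is a reasonable gate for step 1 of this phase. Relative imports (`from . import admin_routes`) are not covered by this sketch.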

Migration Priority

High Priority (Most used):

  • Routes and schemas (user-facing APIs)
  • Services (core business logic)

Medium Priority:

  • Core components (auth, rate limiter)
  • Workers (background tasks)

Low Priority:

  • Config and dependencies (already well-located)

Backwards Compatibility

During migration, maintain backwards compatibility:

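# src/web/__init__.py
# Re-export from the new locations so `from src.web import ...` keeps working
from src.web.api.v1.admin.documents import router as admin_router
from src.web.schemas.admin import AdminDocument

# Temporary alias for the old name; remove it in Phase 4.
# This covers `from src.web import admin_routes` only. Imports of the old module
# path (`from src.web.admin_routes import router`) work only while the old file exists.
admin_routes = admin_router  # Deprecated alias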
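
A plain alias gives callers no signal that the old name is going away. One option, sketched here as an assumption rather than the project's chosen approach, is a module-level `__getattr__` (PEP 562) that resolves the old name lazily and emits a `DeprecationWarning`. `deprecated_aliases` is a hypothetical helper.

```python
import importlib
import warnings


def deprecated_aliases(module_name: str, aliases: dict[str, tuple[str, str]]):
    """Build a module-level __getattr__ that resolves old names with a warning.

    aliases maps an old attribute name to (new module path, attribute name).
    """

    def __getattr__(name: str):
        if name not in aliases:
            raise AttributeError(f"module {module_name!r} has no attribute {name!r}")
        new_module, attr = aliases[name]
        warnings.warn(
            f"{module_name}.{name} is deprecated; use {new_module}.{attr}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_module), attr)

    return __getattr__


# Hypothetical use in src/web/__init__.py, replacing the plain alias:
# __getattr__ = deprecated_aliases(__name__, {
#     "admin_routes": ("src.web.api.v1.admin.documents", "router"),
# })
```

The target module is imported only when the old name is first accessed, so the shim adds no import cost for callers that already use the new paths.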

Testing Strategy

  1. Unit Tests: Test each module independently
  2. Integration Tests: Verify that API endpoints still work
  3. Import Tests: Verify all old imports still work
  4. Coverage: Keep coverage at or above the current 23%
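
The import tests in item 3 can be kept very small. The following sketch checks that a list of (module, attribute) pairs still resolves; `check_imports` is a hypothetical helper, and the pairs passed to it would come from the old import paths that must keep working.

```python
import importlib


def check_imports(targets: list[tuple[str, str]]) -> list[str]:
    """Return one message per (module, attribute) pair that fails to resolve."""
    failures = []
    for module_name, attr in targets:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            failures.append(f"{module_name}: {exc}")
            continue
        if not hasattr(module, attr):
            failures.append(f"{module_name}: missing attribute {attr}")
    return failures
```

In a test, this could be used as `assert check_imports([("src.web", "admin_routes"), ("src.web.schemas.admin", "AdminDocument")]) == []`, using the names from the compatibility snippet above. Returning all failures at once, instead of stopping at the first, gives a complete picture after each migration phase.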

Rollback Plan

If issues arise:

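  1. Keep the old files in place until the migration is complete, so the old import paths remain available
  2. Revert the migration commits with git revert if a phase breaks something
  3. Run the test suite after each phase so breaking changes surface early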

Next Steps

Options for proceeding:

  1. Start Phase 1: Create the new directory structure and copy the files
  2. Create a migration script: Automate the file moves and import updates
  3. Focus on a specific area: Start with the admin API or the async API