# Web Directory Refactoring Plan

## Current Structure Issues

1. **Flat structure**: All files in one directory (20 Python files)
2. **Naming inconsistency**: Mix of `admin_*`, `async_*`, `batch_*` prefixes
3. **Mixed concerns**: Routes, schemas, services, and workers in the same directory
4. **Poor scalability**: Hard to navigate and maintain as the project grows

## Proposed Structure (Best Practices)

```
src/web/
├── __init__.py              # Main exports
├── app.py                   # FastAPI app factory
├── config.py                # App configuration
├── dependencies.py          # Global dependencies
│
├── api/                     # API Routes Layer
│   ├── __init__.py
│   └── v1/                  # API version 1
│       ├── __init__.py
│       ├── routes.py        # Public API routes (inference)
│       ├── admin/           # Admin API routes
│       │   ├── __init__.py
│       │   ├── documents.py     # admin_routes.py → documents.py
│       │   ├── annotations.py   # admin_annotation_routes.py → annotations.py
│       │   ├── training.py      # admin_training_routes.py → training.py
│       │   └── auth.py          # admin_auth.py → auth.py (routes only)
│       ├── async_api/       # Async processing API
│       │   ├── __init__.py
│       │   └── routes.py        # async_routes.py → routes.py
│       └── batch/           # Batch upload API
│           ├── __init__.py
│           └── routes.py        # batch_upload_routes.py → routes.py
│
├── schemas/                 # Pydantic Models
│   ├── __init__.py
│   ├── common.py            # Shared schemas (ErrorResponse, etc.)
│   ├── inference.py         # schemas.py → inference.py
│   ├── admin.py             # admin_schemas.py → admin.py
│   ├── async_api.py         # New: async API schemas
│   └── batch.py             # New: batch upload schemas
│
├── services/                # Business Logic Layer
│   ├── __init__.py
│   ├── inference.py         # services.py → inference.py
│   ├── autolabel.py         # admin_autolabel.py → autolabel.py
│   ├── async_processing.py  # async_service.py → async_processing.py
│   └── batch_upload.py      # batch_upload_service.py → batch_upload.py
│
├── core/                    # Core Components
│   ├── __init__.py
│   ├── auth.py              # admin_auth.py → auth.py (logic only)
│   ├── rate_limiter.py      # rate_limiter.py → rate_limiter.py
│   └── scheduler.py         # admin_scheduler.py → scheduler.py
│
└── workers/                 # Background Task Queues
    ├── __init__.py
    ├── async_queue.py       # async_queue.py → async_queue.py
    └── batch_queue.py       # batch_queue.py → batch_queue.py
```

## File Mapping

### Current → New Location

| Current File | New Location | Purpose |
|--------------|--------------|---------|
| `admin_routes.py` | `api/v1/admin/documents.py` | Document management routes |
| `admin_annotation_routes.py` | `api/v1/admin/annotations.py` | Annotation routes |
| `admin_training_routes.py` | `api/v1/admin/training.py` | Training routes |
| `admin_auth.py` | Split: `api/v1/admin/auth.py` + `core/auth.py` | Auth routes + logic |
| `admin_schemas.py` | `schemas/admin.py` | Admin Pydantic models |
| `admin_autolabel.py` | `services/autolabel.py` | Auto-label service |
| `admin_scheduler.py` | `core/scheduler.py` | Training scheduler |
| `routes.py` | `api/v1/routes.py` | Public inference API |
| `schemas.py` | `schemas/inference.py` | Inference models |
| `services.py` | `services/inference.py` | Inference service |
| `async_routes.py` | `api/v1/async_api/routes.py` | Async API routes |
| `async_service.py` | `services/async_processing.py` | Async processing service |
| `async_queue.py` | `workers/async_queue.py` | Async task queue |
| `batch_upload_routes.py` | `api/v1/batch/routes.py` | Batch upload routes |
| `batch_upload_service.py` | `services/batch_upload.py` | Batch upload service |
| `batch_queue.py` | `workers/batch_queue.py` | Batch task queue |
| `rate_limiter.py` | `core/rate_limiter.py` | Rate limiting logic |
| `config.py` | `config.py` | Keep as-is |
| `dependencies.py` | `dependencies.py` | Keep as-is |
| `app.py` | `app.py` | Keep as-is (update imports) |

## Benefits

### 1. Clear Separation of Concerns

- **Routes**: API endpoint definitions
- **Schemas**: Data validation models
- **Services**: Business logic
- **Core**: Reusable components
- **Workers**: Background processing

### 2. Better Scalability

- Easy to add new API versions (`v2/`)
- Clear namespace for each domain
- Reduced file size (no 800+ line files)

### 3. Improved Maintainability

- Find files by function, not by prefix
- Each module has a single responsibility
- Easier to write focused tests

### 4. Standard Python Patterns

- Package-based organization
- Follows FastAPI best practices
- Similar to Django/Flask structures

## Implementation Steps

### Phase 1: Create New Structure (No Breaking Changes)

1. Create new directories: `api/`, `schemas/`, `services/`, `core/`, `workers/`
2. Copy files to new locations (don't delete originals yet)
3. Update imports in new files
4. Add `__init__.py` with proper exports

### Phase 2: Update Tests

5. Update test imports to use the new structure
6. Run tests to verify nothing breaks
7. Fix any import issues

### Phase 3: Update Main App

8. Update `app.py` to import from new locations
9. Run the full test suite
10. Verify all endpoints work

### Phase 4: Cleanup

11. Delete old files
12. Update documentation
13. Final test run

## Migration Priority

**High Priority** (Most used):

- Routes and schemas (user-facing APIs)
- Services (core business logic)

**Medium Priority**:

- Core components (auth, rate limiter)
- Workers (background tasks)

**Low Priority**:

- Config and dependencies (already well-located)

## Backwards Compatibility

During migration, maintain backwards compatibility by re-exporting the moved objects from the package root:

```python
# src/web/__init__.py

# Re-export from the new locations so package-level imports keep working
from src.web.api.v1.admin.documents import router as admin_router
from src.web.schemas.admin import AdminDocument

# Keep old names for compatibility (temporary)
admin_routes = admin_router  # Deprecated alias
```

## Testing Strategy

1. **Unit Tests**: Test each module independently
2. **Integration Tests**: Verify that API endpoints still work
3. **Import Tests**: Verify that all old imports still work
4. **Coverage**: Keep coverage at or above the current 23%

## Rollback Plan

If issues arise:

1. Keep old files until the migration is complete
2. Revert with Git if needed
3. Rely on tests to catch breaking changes early

---

## Next Steps

Would you like me to:

1. **Start Phase 1**: Create the new directory structure and move files?
2. **Create migration script**: Automate the file moves and import updates?
3. **Focus on specific area**: Start with the admin API or the async API first?
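
As a starting point for the migration script mentioned above, Phase 1 (copy files, keep the originals, add `__init__.py`) could be sketched roughly as follows. The function name `copy_to_new_layout` and the `FILE_MAPPING` constant are hypothetical, the mapping shows only a few rows of the table, and import rewriting is deliberately left out.

```python
import shutil
from pathlib import Path

# A few rows of the file mapping table: old file name -> path relative to src/web/.
# (Hypothetical constant; extend it with the remaining rows.)
FILE_MAPPING = {
    "admin_routes.py": "api/v1/admin/documents.py",
    "admin_schemas.py": "schemas/admin.py",
    "async_queue.py": "workers/async_queue.py",
    "rate_limiter.py": "core/rate_limiter.py",
}


def copy_to_new_layout(web_dir: Path, mapping: dict[str, str]) -> list[Path]:
    """Copy each old file to its new location, leaving the original in place."""
    created = []
    for old_name, new_rel in mapping.items():
        source = web_dir / old_name
        if not source.is_file():
            continue  # skip files that are absent in this checkout
        target = web_dir / new_rel
        target.parent.mkdir(parents=True, exist_ok=True)
        # Mark every new directory between web_dir and the target as a package.
        package = target.parent
        while package != web_dir:
            (package / "__init__.py").touch(exist_ok=True)
            package = package.parent
        shutil.copy2(source, target)
        created.append(target)
    return created
```

Calling `copy_to_new_layout(Path("src/web"), FILE_MAPPING)` would leave the old files untouched, which matches the rollback plan: nothing is deleted until Phase 4.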
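
For the temporary aliases described under Backwards Compatibility, one option is a module-level `__getattr__` (PEP 562) that resolves old names and emits a `DeprecationWarning`, so remaining callers show up in test output. This is only a sketch: `make_deprecated_getattr` and `_DEPRECATED_ALIASES` are hypothetical names, and a stand-in module replaces `src/web/__init__.py` so the example runs without the real package.

```python
import types
import warnings

# Hypothetical table of old name -> new name; extend it per the file mapping.
_DEPRECATED_ALIASES = {"admin_routes": "admin_router"}


def make_deprecated_getattr(module_name, namespace, aliases):
    """Build a module-level __getattr__ that resolves old names with a warning."""

    def __getattr__(name):
        if name in aliases:
            new_name = aliases[name]
            warnings.warn(
                f"{module_name}.{name} is deprecated; use {module_name}.{new_name}",
                DeprecationWarning,
                stacklevel=2,
            )
            return namespace[new_name]
        raise AttributeError(f"module {module_name!r} has no attribute {name!r}")

    return __getattr__


# Stand-in for src/web/__init__.py (assumption: the real module would instead
# assign `__getattr__ = make_deprecated_getattr(__name__, globals(), ...)`).
web = types.ModuleType("web")
web.admin_router = object()  # placeholder for the real FastAPI router
web.__getattr__ = make_deprecated_getattr("web", vars(web), _DEPRECATED_ALIASES)
```

With this in place, `web.admin_routes` returns the same object as `web.admin_router` while warning on each access, and an import test can assert both the alias and the warning before the old names are removed in Phase 4.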