invoice-master-poc-v2/docs/web-refactoring-plan.md
Yaojia Wang 58bf75db68 WIP
2026-01-27 00:47:10 +01:00

# Web Directory Refactoring Plan
## Current Structure Issues
1. **Flat structure**: All files in one directory (20 Python files)
2. **Naming inconsistency**: Mix of `admin_*`, `async_*`, `batch_*` prefixes
3. **Mixed concerns**: Routes, schemas, services, and workers in the same directory
4. **Poor scalability**: Hard to navigate and maintain as the project grows
## Proposed Structure (Best Practices)
```
src/web/
├── __init__.py                  # Main exports
├── app.py                       # FastAPI app factory
├── config.py                    # App configuration
├── dependencies.py              # Global dependencies
├── api/                         # API Routes Layer
│   ├── __init__.py
│   └── v1/                      # API version 1
│       ├── __init__.py
│       ├── routes.py            # Public API routes (inference)
│       ├── admin/               # Admin API routes
│       │   ├── __init__.py
│       │   ├── documents.py     # admin_routes.py → documents.py
│       │   ├── annotations.py   # admin_annotation_routes.py → annotations.py
│       │   ├── training.py      # admin_training_routes.py → training.py
│       │   └── auth.py          # admin_auth.py → auth.py (routes only)
│       ├── async_api/           # Async processing API
│       │   ├── __init__.py
│       │   └── routes.py        # async_routes.py → routes.py
│       └── batch/               # Batch upload API
│           ├── __init__.py
│           └── routes.py        # batch_upload_routes.py → routes.py
├── schemas/                     # Pydantic Models
│   ├── __init__.py
│   ├── common.py                # Shared schemas (ErrorResponse, etc.)
│   ├── inference.py             # schemas.py → inference.py
│   ├── admin.py                 # admin_schemas.py → admin.py
│   ├── async_api.py             # New: async API schemas
│   └── batch.py                 # New: batch upload schemas
├── services/                    # Business Logic Layer
│   ├── __init__.py
│   ├── inference.py             # services.py → inference.py
│   ├── autolabel.py             # admin_autolabel.py → autolabel.py
│   ├── async_processing.py      # async_service.py → async_processing.py
│   └── batch_upload.py          # batch_upload_service.py → batch_upload.py
├── core/                        # Core Components
│   ├── __init__.py
│   ├── auth.py                  # admin_auth.py → auth.py (logic only)
│   ├── rate_limiter.py          # rate_limiter.py → rate_limiter.py
│   └── scheduler.py             # admin_scheduler.py → scheduler.py
└── workers/                     # Background Task Queues
    ├── __init__.py
    ├── async_queue.py           # async_queue.py → async_queue.py
    └── batch_queue.py           # batch_queue.py → batch_queue.py
```
## File Mapping
### Current → New Location
| Current File | New Location | Purpose |
|--------------|--------------|---------|
| `admin_routes.py` | `api/v1/admin/documents.py` | Document management routes |
| `admin_annotation_routes.py` | `api/v1/admin/annotations.py` | Annotation routes |
| `admin_training_routes.py` | `api/v1/admin/training.py` | Training routes |
| `admin_auth.py` | Split: `api/v1/admin/auth.py` + `core/auth.py` | Auth routes + logic |
| `admin_schemas.py` | `schemas/admin.py` | Admin Pydantic models |
| `admin_autolabel.py` | `services/autolabel.py` | Auto-label service |
| `admin_scheduler.py` | `core/scheduler.py` | Training scheduler |
| `routes.py` | `api/v1/routes.py` | Public inference API |
| `schemas.py` | `schemas/inference.py` | Inference models |
| `services.py` | `services/inference.py` | Inference service |
| `async_routes.py` | `api/v1/async_api/routes.py` | Async API routes |
| `async_service.py` | `services/async_processing.py` | Async processing service |
| `async_queue.py` | `workers/async_queue.py` | Async task queue |
| `batch_upload_routes.py` | `api/v1/batch/routes.py` | Batch upload routes |
| `batch_upload_service.py` | `services/batch_upload.py` | Batch upload service |
| `batch_queue.py` | `workers/batch_queue.py` | Batch task queue |
| `rate_limiter.py` | `core/rate_limiter.py` | Rate limiting logic |
| `config.py` | `config.py` | Keep as-is |
| `dependencies.py` | `dependencies.py` | Keep as-is |
| `app.py` | `app.py` | Keep as-is (update imports) |
## Benefits
### 1. Clear Separation of Concerns
- **Routes**: API endpoint definitions
- **Schemas**: Data validation models
- **Services**: Business logic
- **Core**: Reusable components
- **Workers**: Background processing
### 2. Better Scalability
- Easy to add new API versions (`v2/`)
- Clear namespace for each domain
- Reduced file size (no 800+ line files)
### 3. Improved Maintainability
- Find files by function, not by prefix
- Each module has a single responsibility
- Easier to write focused tests
### 4. Standard Python Patterns
- Package-based organization
- Follows FastAPI best practices
- Similar to Django/Flask structures
## Implementation Steps
### Phase 1: Create New Structure (No Breaking Changes)
1. Create new directories: `api/`, `schemas/`, `services/`, `core/`, `workers/`
2. Copy files to new locations (don't delete originals yet)
3. Update imports in new files
4. Add `__init__.py` with proper exports
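
Steps 1, 2 and 4 of Phase 1 could be scripted roughly as follows. This is a minimal sketch, not the project's tooling: `FILE_MAP` holds only a few rows of the mapping table, `copy_to_new_layout` is a hypothetical helper name, and the `__init__.py` files it creates are empty (the exports still have to be written by hand). Step 3, updating imports, is left out.

```python
import shutil
from pathlib import Path

# Subset of the "Current → New Location" table; extend with the remaining rows.
FILE_MAP = {
    "admin_routes.py": "api/v1/admin/documents.py",
    "admin_schemas.py": "schemas/admin.py",
    "services.py": "services/inference.py",
    "async_queue.py": "workers/async_queue.py",
}


def copy_to_new_layout(web_dir: Path, file_map: dict[str, str]) -> list[Path]:
    """Copy each mapped file to its new location, leaving the original in place."""
    created = []
    for old_name, new_rel in file_map.items():
        src = web_dir / old_name
        dst = web_dir / new_rel
        if not src.exists():
            continue  # nothing to copy for this row; skip rather than fail
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Turn every directory between web_dir and the target into a package.
        pkg = dst.parent
        while pkg != web_dir:
            (pkg / "__init__.py").touch(exist_ok=True)
            pkg = pkg.parent
        shutil.copy2(src, dst)  # copy, not move: originals stay until Phase 4
        created.append(dst)
    return created
```

Copying rather than moving keeps the old modules importable, which is what makes this phase non-breaking.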
### Phase 2: Update Tests
5. Update test imports to use new structure
6. Run tests to verify nothing breaks
7. Fix any import issues
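
Updating test imports is mostly mechanical, so a small rewrite helper may save time. The sketch below assumes absolute imports rooted at `src.web` (as in the compatibility example in this plan); `MODULE_MAP` covers only part of the mapping table and `rewrite_imports` is a hypothetical name. It does not handle relative imports or `from src.web import admin_routes`, which would need manual review.

```python
import re

# Old flat module name -> new dotted path below src.web (subset of the table).
MODULE_MAP = {
    "admin_routes": "api.v1.admin.documents",
    "admin_annotation_routes": "api.v1.admin.annotations",
    "admin_schemas": "schemas.admin",
    "services": "services.inference",
    "async_service": "services.async_processing",
}

# The trailing (?!\.) skips paths that already continue with a dot, so text
# that was migrated earlier (e.g. src.web.services.inference) is left alone.
_PATTERN = re.compile(
    r"\bsrc\.web\.(" + "|".join(re.escape(name) for name in MODULE_MAP) + r")\b(?!\.)"
)


def rewrite_imports(source: str) -> str:
    """Replace old src.web.<module> references with their new dotted paths."""
    return _PATTERN.sub(lambda m: "src.web." + MODULE_MAP[m.group(1)], source)
```

Because already-migrated paths are skipped, running the helper twice over the same file gives the same result as running it once.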
### Phase 3: Update Main App
8. Update `app.py` to import from new locations
9. Run full test suite
10. Verify all endpoints work
### Phase 4: Cleanup
11. Delete old files
12. Update documentation
13. Final test run
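
Before deleting the old files in step 11, it may help to confirm that nothing still imports them. The following is one possible check using the standard `ast` module; `OLD_MODULES` and `stale_imports` are illustrative names, and the `src.web` package root is an assumption taken from the compatibility example in this plan. Old names that collide with new packages (`services`, `schemas`) are left out on purpose, since an import from `src.web.services` stays valid after the migration.

```python
import ast

# Old flat modules that must no longer be imported once Phase 4 starts.
OLD_MODULES = {"admin_routes", "admin_schemas", "async_service", "batch_queue"}


def stale_imports(source: str, package: str = "src.web") -> list[tuple[int, str]]:
    """Return (line, module) pairs for absolute imports of an old flat module."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
            # `from src.web import admin_routes` names the module in the import list.
            if node.module == package:
                names += [f"{package}.{alias.name}" for alias in node.names]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            prefix, _, leaf = name.rpartition(".")
            if prefix == package and leaf in OLD_MODULES:
                found.append((node.lineno, name))
    return sorted(found)
```

Relative imports such as `from .admin_routes import router` are not detected by this sketch and would need a separate pass.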
## Migration Priority
**High Priority** (Most used):

- Routes and schemas (user-facing APIs)
- Services (core business logic)

**Medium Priority**:

- Core components (auth, rate limiter)
- Workers (background tasks)

**Low Priority**:

- Config and dependencies (already well-located)
## Backwards Compatibility
During migration, maintain backwards compatibility:
```python
# src/web/__init__.py
# Re-export from the new locations so existing `from src.web import ...` imports keep working
from src.web.api.v1.admin.documents import router as admin_router
from src.web.schemas.admin import AdminDocument

# Keep old names for compatibility (temporary)
admin_routes = admin_router  # Deprecated alias
```
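
A plain alias gives callers no signal that the old name is going away. One option, sketched here under assumptions, is a module-level `__getattr__` (PEP 562) that resolves the old name and emits a `DeprecationWarning`. `make_deprecated_getattr` is a hypothetical helper, and the `demo_web` module is a stand-in built with `types.ModuleType` so the example runs without the real package.

```python
import types
import warnings


def make_deprecated_getattr(module_name: str, namespace: dict, aliases: dict[str, str]):
    """Build a module-level __getattr__ that maps old names to new ones with a warning."""

    def module_getattr(name: str):
        if name in aliases:
            new_name = aliases[name]
            warnings.warn(
                f"{module_name}.{name} is deprecated; use {new_name}",
                DeprecationWarning,
                stacklevel=2,
            )
            return namespace[new_name]
        raise AttributeError(f"module {module_name!r} has no attribute {name!r}")

    return module_getattr


# Stand-in for src/web/__init__.py: only the new name is defined as a real attribute.
demo = types.ModuleType("demo_web")
demo.admin_router = object()
demo.__getattr__ = make_deprecated_getattr(
    "demo_web", vars(demo), {"admin_routes": "admin_router"}
)
```

In the real `src/web/__init__.py` the equivalent would be a single assignment, `__getattr__ = make_deprecated_getattr(__name__, globals(), {"admin_routes": "admin_router"})`, replacing the plain alias.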
## Testing Strategy
1. **Unit Tests**: Test each module independently
2. **Integration Tests**: Test API endpoints still work
3. **Import Tests**: Verify all old imports still work
4. **Coverage**: Keep coverage at or above the current 23%
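
The import tests in item 3 can be as simple as trying each old path and collecting the ones that fail. A minimal sketch, with `unresolved_modules` as a hypothetical helper name:

```python
import importlib


def unresolved_modules(module_names: list[str]) -> list[str]:
    """Return the module names that can no longer be imported."""
    broken = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            broken.append(name)
    return broken
```

In a test this would be called with the old dotted paths, for example `unresolved_modules(["src.web.admin_routes", "src.web.async_service"])`, asserting that the result is empty while the compatibility layer is in place.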
## Rollback Plan
If issues arise:
1. Fall back to the old files, which stay in place until the migration is complete
2. Revert the migration commits with Git
3. Rely on the test suite to catch breaking changes early, before cleanup
---
## Next Steps
Possible starting points:
1. **Start Phase 1**: Create the new directory structure and copy files
2. **Create migration script**: Automate the file moves and import updates
3. **Focus on specific area**: Start with the admin API or the async API first