Prepare for opencode

This commit is contained in:
Yaojia Wang
2026-02-03 22:03:44 +01:00
parent 729d96f59e
commit 183d3503ef
22 changed files with 4858 additions and 244 deletions

View File

@@ -29,57 +29,38 @@ wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoic
- No print() in production - use logging
- Run tests: `pytest --cov=src`
## File Structure
## Critical Rules
```
src/
├── cli/ # autolabel, train, infer, serve
├── pdf/ # extractor, renderer, detector
├── ocr/ # PaddleOCR wrapper, machine_code_parser
├── inference/ # pipeline, yolo_detector, field_extractor
├── normalize/ # Per-field normalizers
├── matcher/ # Exact, substring, fuzzy strategies
├── processing/ # CPU/GPU pool architecture
├── web/ # FastAPI app, routes, services, schemas
├── utils/ # validators, text_cleaner, fuzzy_matcher
└── data/ # Database operations
tests/ # Mirror of src structure
runs/train/ # Training outputs
```
### Code Organization
## Supported Fields
- Many small files over few large files
- High cohesion, low coupling
- 200-400 lines typical, 800 max per file
- Organize by feature/domain, not by type
| ID | Field | Description |
|----|-------|-------------|
| 0 | invoice_number | Invoice number |
| 1 | invoice_date | Invoice date |
| 2 | invoice_due_date | Due date |
| 3 | ocr_number | OCR reference (Swedish payment) |
| 4 | bankgiro | Bankgiro account |
| 5 | plusgiro | Plusgiro account |
| 6 | amount | Amount |
| 7 | supplier_organisation_number | Supplier org number |
| 8 | payment_line | Payment line (machine-readable) |
| 9 | customer_number | Customer number |
### Code Style
## Key Patterns
- No emojis in code, comments, or documentation
- Immutability always - never mutate objects or arrays
- No console.log in production code
- Proper error handling with try/catch
- Input validation with Zod or similar
### Inference Result
### Testing
```python
@dataclass
class InferenceResult:
document_id: str
document_type: str # "invoice" or "letter"
fields: dict[str, str]
confidence: dict[str, float]
cross_validation: CrossValidationResult | None
processing_time_ms: float
```
- TDD: Write tests first
- 80% minimum coverage
- Unit tests for utilities
- Integration tests for APIs
- E2E tests for critical flows
### API Schemas
### Security
See `src/web/schemas.py` for request/response models.
- No hardcoded secrets
- Environment variables for sensitive data
- Validate all user inputs
- Parameterized queries only
- CSRF protection enabled
## Environment Variables
@@ -97,47 +78,16 @@ CONFIDENCE_THRESHOLD=0.5
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
```
## Available Commands
## CLI Commands
- `/tdd` - Test-driven development workflow
- `/plan` - Create implementation plan
- `/code-review` - Review code quality
- `/build-fix` - Fix build errors
```bash
# Auto-labeling
python -m src.cli.autolabel --dual-pool --cpu-workers 3 --gpu-workers 1
## Git Workflow
# Training
python -m src.cli.train --model yolo11n.pt --epochs 100 --batch 16 --name invoice_fields
# Inference
python -m src.cli.infer --model runs/train/invoice_fields/weights/best.pt --input invoice.pdf --gpu
# Web Server
python run_server.py --port 8000
```
## API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/` | Web UI |
| GET | `/api/v1/health` | Health check |
| POST | `/api/v1/infer` | Process invoice |
| GET | `/api/v1/results/{filename}` | Get visualization |
## Current Status
- **Tests**: 688 passing
- **Coverage**: 37%
- **Model**: 93.5% mAP@0.5
- **Documents Labeled**: 9,738
## Quick Start
```bash
# Start server
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-py311 && cd /mnt/c/Users/yaoji/git/ColaCoder/invoice-master-poc-v2 && python run_server.py"
# Run tests
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-py311 && cd /mnt/c/Users/yaoji/git/ColaCoder/invoice-master-poc-v2 && pytest"
# Access UI: http://localhost:8000
```
- Conventional commits: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`
- Never commit to main directly
- PRs require review
- All tests must pass before merge