Prepare for opencode
This commit is contained in:
@@ -29,57 +29,38 @@ wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoic
|
||||
- No print() in production - use logging
|
||||
- Run tests: `pytest --cov=src`
|
||||
|
||||
## File Structure
|
||||
## Critical Rules
|
||||
|
||||
```
|
||||
src/
|
||||
├── cli/ # autolabel, train, infer, serve
|
||||
├── pdf/ # extractor, renderer, detector
|
||||
├── ocr/ # PaddleOCR wrapper, machine_code_parser
|
||||
├── inference/ # pipeline, yolo_detector, field_extractor
|
||||
├── normalize/ # Per-field normalizers
|
||||
├── matcher/ # Exact, substring, fuzzy strategies
|
||||
├── processing/ # CPU/GPU pool architecture
|
||||
├── web/ # FastAPI app, routes, services, schemas
|
||||
├── utils/ # validators, text_cleaner, fuzzy_matcher
|
||||
└── data/ # Database operations
|
||||
tests/ # Mirror of src structure
|
||||
runs/train/ # Training outputs
|
||||
```
|
||||
### Code Organization
|
||||
|
||||
## Supported Fields
|
||||
- Many small files over few large files
|
||||
- High cohesion, low coupling
|
||||
- 200-400 lines typical, 800 max per file
|
||||
- Organize by feature/domain, not by type
|
||||
|
||||
| ID | Field | Description |
|
||||
|----|-------|-------------|
|
||||
| 0 | invoice_number | Invoice number |
|
||||
| 1 | invoice_date | Invoice date |
|
||||
| 2 | invoice_due_date | Due date |
|
||||
| 3 | ocr_number | OCR reference (Swedish payment) |
|
||||
| 4 | bankgiro | Bankgiro account |
|
||||
| 5 | plusgiro | Plusgiro account |
|
||||
| 6 | amount | Amount |
|
||||
| 7 | supplier_organisation_number | Supplier org number |
|
||||
| 8 | payment_line | Payment line (machine-readable) |
|
||||
| 9 | customer_number | Customer number |
|
||||
### Code Style
|
||||
|
||||
## Key Patterns
|
||||
- No emojis in code, comments, or documentation
|
||||
- Immutability always - never mutate objects or arrays
|
||||
- No console.log in production code
|
||||
- Proper error handling with try/catch
|
||||
- Input validation with Zod or similar
|
||||
|
||||
### Inference Result
|
||||
### Testing
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class InferenceResult:
|
||||
document_id: str
|
||||
document_type: str # "invoice" or "letter"
|
||||
fields: dict[str, str]
|
||||
confidence: dict[str, float]
|
||||
cross_validation: CrossValidationResult | None
|
||||
processing_time_ms: float
|
||||
```
|
||||
- TDD: Write tests first
|
||||
- 80% minimum coverage
|
||||
- Unit tests for utilities
|
||||
- Integration tests for APIs
|
||||
- E2E tests for critical flows
|
||||
|
||||
### API Schemas
|
||||
### Security
|
||||
|
||||
See `src/web/schemas.py` for request/response models.
|
||||
- No hardcoded secrets
|
||||
- Environment variables for sensitive data
|
||||
- Validate all user inputs
|
||||
- Parameterized queries only
|
||||
- CSRF protection enabled
|
||||
|
||||
## Environment Variables
|
||||
|
||||
@@ -97,47 +78,16 @@ CONFIDENCE_THRESHOLD=0.5
|
||||
SERVER_HOST=0.0.0.0
|
||||
SERVER_PORT=8000
|
||||
```
|
||||
## Available Commands
|
||||
|
||||
## CLI Commands
|
||||
- `/tdd` - Test-driven development workflow
|
||||
- `/plan` - Create implementation plan
|
||||
- `/code-review` - Review code quality
|
||||
- `/build-fix` - Fix build errors
|
||||
|
||||
```bash
|
||||
# Auto-labeling
|
||||
python -m src.cli.autolabel --dual-pool --cpu-workers 3 --gpu-workers 1
|
||||
## Git Workflow
|
||||
|
||||
# Training
|
||||
python -m src.cli.train --model yolo11n.pt --epochs 100 --batch 16 --name invoice_fields
|
||||
|
||||
# Inference
|
||||
python -m src.cli.infer --model runs/train/invoice_fields/weights/best.pt --input invoice.pdf --gpu
|
||||
|
||||
# Web Server
|
||||
python run_server.py --port 8000
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| GET | `/` | Web UI |
|
||||
| GET | `/api/v1/health` | Health check |
|
||||
| POST | `/api/v1/infer` | Process invoice |
|
||||
| GET | `/api/v1/results/{filename}` | Get visualization |
|
||||
|
||||
## Current Status
|
||||
|
||||
- **Tests**: 688 passing
|
||||
- **Coverage**: 37%
|
||||
- **Model**: 93.5% mAP@0.5
|
||||
- **Documents Labeled**: 9,738
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Start server
|
||||
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-py311 && cd /mnt/c/Users/yaoji/git/ColaCoder/invoice-master-poc-v2 && python run_server.py"
|
||||
|
||||
# Run tests
|
||||
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-py311 && cd /mnt/c/Users/yaoji/git/ColaCoder/invoice-master-poc-v2 && pytest"
|
||||
|
||||
# Access UI: http://localhost:8000
|
||||
```
|
||||
- Conventional commits: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`
|
||||
- Never commit to main directly
|
||||
- PRs require review
|
||||
- All tests must pass before merge
|
||||
|
||||
Reference in New Issue
Block a user