Invoice Master POC v2
Swedish Invoice Field Extraction System - YOLO26 + PaddleOCR
Extracts structured data from Swedish PDF invoices.
Tech Stack
| Component | Technology |
|---|---|
| Object Detection | YOLO26 (Ultralytics >= 8.4.0) |
| OCR Engine | PaddleOCR v5 (PP-OCRv5) |
| PDF Processing | PyMuPDF (fitz) |
| Database | PostgreSQL + psycopg2 |
| Web Framework | FastAPI + Uvicorn |
| Deep Learning | PyTorch + CUDA 12.x |
WSL Environment (REQUIRED)
Prefix ALL commands with:
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-sm120 && <command>"
NEVER run Python commands directly in Windows PowerShell/CMD.
Project-Specific Rules
- Python 3.11+ with type hints
- No print() in production - use logging
- Run tests:
pytest --cov=src
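The logging rule above might look like the following minimal sketch. The logger name `invoice_master.example` and the helper `parse_amount` are hypothetical and are not taken from the project code.

```python
from __future__ import annotations

import logging

# Hypothetical module-level logger; the name is an assumption for illustration.
logger = logging.getLogger("invoice_master.example")


def parse_amount(raw: str) -> float | None:
    """Parse a Swedish-formatted amount such as '1 234,50' into a float."""
    # Drop regular and non-breaking spaces, then use a dot as decimal separator.
    cleaned = raw.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    try:
        value = float(cleaned)
    except ValueError:
        # Report the problem through logging instead of print().
        logger.warning("Could not parse amount: %r", raw)
        return None
    logger.debug("Parsed amount %r -> %s", raw, value)
    return value


if __name__ == "__main__":
    # Handlers and levels are configured once, at the entry point.
    logging.basicConfig(level=logging.INFO)
    parse_amount("1 234,50")
```

Configuring handlers only at the entry point keeps library modules free of output side effects, which also makes them easier to test under pytest.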
Critical Rules
Code Organization
- Many small files over few large files
- High cohesion, low coupling
- 200-400 lines typical, 800 max per file
- Organize by feature/domain, not by type
Code Style
- No emojis in code, comments, or documentation
- Immutability always - never mutate objects or arrays
- No console.log in production code
- Proper error handling with try/catch
- Input validation with Zod or similar
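In the Python parts of the project, the immutability rule could be expressed with frozen dataclasses, as in this sketch. `FieldResult` and `with_text` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FieldResult:
    """Hypothetical container for one extracted invoice field."""

    name: str
    text: str
    confidence: float


def with_text(field: FieldResult, text: str) -> FieldResult:
    """Return a new FieldResult with cleaned text; the original is untouched."""
    return replace(field, text=text.strip())
```

Assigning to an attribute of a frozen instance raises `dataclasses.FrozenInstanceError`, so accidental mutation fails loudly rather than silently.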
Testing
- TDD: Write tests first
- 80% minimum coverage
- Unit tests for utilities
- Integration tests for APIs
- E2E tests for critical flows
Security
- No hardcoded secrets
- Environment variables for sensitive data
- Validate all user inputs
- Parameterized queries only
- CSRF protection enabled
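As a sketch of the "parameterized queries only" rule: psycopg2 uses `%s` placeholders and takes the values as a separate argument to `execute()`. The table `documents`, its columns, and the function `find_document_id` are assumptions made for this example, not the project's actual schema.

```python
from __future__ import annotations

from typing import Any, Protocol


class Cursor(Protocol):
    """Minimal DB-API cursor shape, as provided by a psycopg2 cursor."""

    def execute(self, query: str, params: tuple[Any, ...]) -> Any: ...
    def fetchone(self) -> Any: ...


def find_document_id(cursor: Cursor, filename: str) -> int | None:
    """Look up a document id by filename (hypothetical table and columns)."""
    # The value is passed separately, never interpolated into the SQL string.
    cursor.execute("SELECT id FROM documents WHERE filename = %s", (filename,))
    row = cursor.fetchone()
    return row[0] if row else None
```

Because the driver binds the value itself, a filename containing quotes or SQL keywords is treated as data, not as part of the statement.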
Environment Variables
# Required
DB_PASSWORD=
# Optional (with defaults)
DB_HOST=192.168.68.31
DB_PORT=5432
DB_NAME=docmaster
DB_USER=docmaster
MODEL_PATH=runs/train/invoice_fields/weights/best.pt
CONFIDENCE_THRESHOLD=0.5
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
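One way to read these variables in Python is sketched below, assuming the values listed above are the defaults. The `Settings` class and `load_settings` function are hypothetical and not part of the project.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the variables listed above."""

    db_password: str
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    model_path: str
    confidence_threshold: float
    server_host: str
    server_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, failing fast if DB_PASSWORD is missing."""
    password = env.get("DB_PASSWORD", "")
    if not password:
        raise RuntimeError("DB_PASSWORD is required but not set")
    return Settings(
        db_password=password,
        db_host=env.get("DB_HOST", "192.168.68.31"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "docmaster"),
        db_user=env.get("DB_USER", "docmaster"),
        model_path=env.get(
            "MODEL_PATH", "runs/train/invoice_fields/weights/best.pt"
        ),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.5")),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "8000")),
    )
```

Accepting the environment as a parameter lets tests pass a plain dict instead of patching `os.environ`.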
Available Commands
- /tdd - Test-driven development workflow
- /plan - Create implementation plan
- /code-review - Review code quality
- /build-fix - Fix build errors
Git Workflow
- Conventional commits: feat:, fix:, refactor:, docs:, test:
- Never commit to main directly
- PRs require review
- All tests must pass before merge
Push the code before the review and fixes are finished.