- Add .NET 8 backend with Clean Architecture - Add React + Vite + TypeScript frontend - Implement authentication with JWT - Implement Azure Blob Storage client - Implement OCR integration - Implement supplier matching service - Implement voucher generation - Implement Fortnox provider - Add unit and integration tests - Add Docker Compose configuration
94 lines
2.1 KiB
Markdown
94 lines
2.1 KiB
Markdown
# Invoice Master POC v2
|
|
|
|
Swedish Invoice Field Extraction System - YOLOv11 + PaddleOCR 从瑞典 PDF 发票中提取结构化数据。
|
|
|
|
## Tech Stack
|
|
|
|
| Component | Technology |
|
|
|-----------|------------|
|
|
| Object Detection | YOLOv11 (Ultralytics) |
|
|
| OCR Engine | PaddleOCR v5 (PP-OCRv5) |
|
|
| PDF Processing | PyMuPDF (fitz) |
|
|
| Database | PostgreSQL + psycopg2 |
|
|
| Web Framework | FastAPI + Uvicorn |
|
|
| Deep Learning | PyTorch + CUDA 12.x |
|
|
|
|
## WSL Environment (REQUIRED)
|
|
|
|
**Prefix ALL commands with:**
|
|
|
|
```bash
|
|
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-py311 && <command>"
|
|
```
|
|
|
|
**NEVER run Python commands directly in Windows PowerShell/CMD.**
|
|
|
|
## Project-Specific Rules
|
|
|
|
- Python 3.11+ with type hints
|
|
- No print() in production - use logging
|
|
- Run tests: `pytest --cov=src`
|
|
|
|
## Critical Rules
|
|
|
|
### Code Organization
|
|
|
|
- Many small files over few large files
|
|
- High cohesion, low coupling
|
|
- 200-400 lines typical, 800 max per file
|
|
- Organize by feature/domain, not by type
|
|
|
|
### Code Style
|
|
|
|
- No emojis in code, comments, or documentation
|
|
- Immutability always - never mutate objects or arrays
|
|
- No console.log in production code
|
|
- Proper error handling with try/catch
|
|
- Input validation with Zod or similar
|
|
|
|
### Testing
|
|
|
|
- TDD: Write tests first
|
|
- 80% minimum coverage
|
|
- Unit tests for utilities
|
|
- Integration tests for APIs
|
|
- E2E tests for critical flows
|
|
|
|
### Security
|
|
|
|
- No hardcoded secrets
|
|
- Environment variables for sensitive data
|
|
- Validate all user inputs
|
|
- Parameterized queries only
|
|
- CSRF protection enabled
|
|
|
|
## Environment Variables
|
|
|
|
```bash
|
|
# Required
|
|
DB_PASSWORD=
|
|
|
|
# Optional (with defaults)
|
|
DB_HOST=192.168.68.31
|
|
DB_PORT=5432
|
|
DB_NAME=docmaster
|
|
DB_USER=docmaster
|
|
MODEL_PATH=runs/train/invoice_fields/weights/best.pt
|
|
CONFIDENCE_THRESHOLD=0.5
|
|
SERVER_HOST=0.0.0.0
|
|
SERVER_PORT=8000
|
|
```
|
|
## Available Commands
|
|
|
|
- `/tdd` - Test-driven development workflow
|
|
- `/plan` - Create implementation plan
|
|
- `/code-review` - Review code quality
|
|
- `/build-fix` - Fix build errors
|
|
|
|
## Git Workflow
|
|
|
|
- Conventional commits: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`
|
|
- Never commit to main directly
|
|
- PRs require review
|
|
- All tests must pass before merge
|