Invoice Master POC v2

Swedish invoice field extraction system: YOLO + PaddleOCR extract structured data from Swedish PDF invoices.

Architecture

PDF → PyMuPDF (DPI=150) → YOLO Detection → PaddleOCR → Field Extraction → Normalization → Output

Project Structure

packages/
├── backend/    # FastAPI web server + inference pipeline
│   └── pipeline/   # YOLO detector → OCR → field extractor → value selector → normalizers
├── shared/     # Common utilities (bbox, OCR, field mappings)
└── training/   # YOLO training data generation (annotation, dataset)
tests/          # Mirrors packages/ structure

Pipeline Flow (process_pdf)

1. YOLO detects field regions on rendered PDF page
2. PaddleOCR extracts text from detected bboxes
3. Field extractor maps detections to invoice fields via CLASS_TO_FIELD
4. Value selector picks best candidate per field (confidence + validation)
5. Normalizers clean values (dates, amounts, invoice numbers)
6. Fallback regex extraction if key fields missing
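
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Candidate` type, the `VALIDATORS` table, the class IDs, and the `select_values` function are all hypothetical, and the real `CLASS_TO_FIELD` mapping lives in `packages/shared/shared/fields/mappings.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical subset of the mapping; class IDs here are made up for illustration.
CLASS_TO_FIELD: Dict[int, str] = {0: "invoice_number", 1: "invoice_date", 2: "amount"}


@dataclass(frozen=True)
class Candidate:
    """One YOLO detection with its OCR text (hypothetical shape)."""
    class_id: int
    text: str
    confidence: float


# Hypothetical per-field validators; fields without an entry are not validated.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "invoice_number": lambda text: text.isdigit(),
}


def select_values(candidates: List[Candidate], threshold: float = 0.5) -> Dict[str, str]:
    """Keep the highest-confidence candidate per field that passes validation."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        field = CLASS_TO_FIELD.get(cand.class_id)
        if field is None or cand.confidence < threshold:
            continue  # unmapped class or below the confidence threshold
        validator = VALIDATORS.get(field)
        if validator is not None and not validator(cand.text):
            continue  # text does not look like a valid value for this field
        current = best.get(field)
        if current is None or cand.confidence > current.confidence:
            best[field] = cand
    return {field: cand.text for field, cand in best.items()}
```

In this sketch validation acts as a hard filter before confidence is compared, so a confident but malformed reading loses to a less confident valid one. The real value selector may weigh the two differently.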

Tech Stack

Component	Technology
Object Detection	YOLO (Ultralytics >= 8.4.0)
OCR	PaddleOCR v5 (PP-OCRv5)
PDF	PyMuPDF (fitz), DPI=150
Database	PostgreSQL + psycopg2
Web	FastAPI + Uvicorn
ML	PyTorch + CUDA 12.x

WSL Environment (REQUIRED)

ALL Python commands MUST use this prefix:

wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-sm120 && <command>"

NEVER run Python directly in Windows PowerShell/CMD.

Project Rules

Python 3.10, type hints on all function signatures
No print() in production code - use logging module
Validation with pydantic or dataclasses
Error handling with try/except (not try/catch)
Run tests: pytest --cov=packages tests/
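
As a rough illustration of these rules applied together, the sketch below shows a small normalizer with type hints, logging instead of print(), dataclass validation, and try/except. The `Amount` type and `normalize_amount` function are hypothetical names chosen for this example, not the project's actual normalizer.

```python
import logging
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Amount:
    """Validated monetary amount (hypothetical type for illustration)."""
    value: Decimal

    def __post_init__(self) -> None:
        if self.value < 0:
            raise ValueError(f"amount must be non-negative, got {self.value}")


def normalize_amount(raw: str) -> Optional[Amount]:
    """Parse a Swedish-formatted amount such as '1 234,50'; return None on failure."""
    # Swedish notation uses a space (often non-breaking) as the thousands
    # separator and a comma as the decimal separator.
    cleaned = raw.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    try:
        return Amount(Decimal(cleaned))
    except (InvalidOperation, ValueError) as exc:
        logger.warning("Could not normalize amount %r: %s", raw, exc)
        return None
```

Returning None on failure is one possible design choice here; it lets a caller fall through to the regex fallback rather than abort the whole document.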

Key Files

File	Purpose
`packages/backend/backend/pipeline/pipeline.py`	Main inference pipeline
`packages/backend/backend/pipeline/field_extractor.py`	YOLO → field mapping
`packages/backend/backend/pipeline/value_selector.py`	Best candidate selection
`packages/shared/shared/fields/mappings.py`	CLASS_TO_FIELD mapping
`packages/shared/shared/ocr/paddle_ocr.py`	OCRToken definition
`packages/shared/shared/bbox/`	Bbox expansion strategies

Environment Variables

# Required
DB_PASSWORD=

# Optional (with defaults)
DB_HOST=192.168.68.31
DB_PORT=5432
DB_NAME=docmaster
DB_USER=docmaster
MODEL_PATH=runs/train/invoice_fields/weights/best.pt
CONFIDENCE_THRESHOLD=0.5
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
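
One way to load these variables is sketched below, assuming a frozen dataclass holds the settings. The `Settings` class and `load_settings` function are hypothetical; the project may read its configuration differently. The defaults are copied from the list above.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    """Runtime configuration (hypothetical container for illustration)."""
    db_password: str
    db_host: str = "192.168.68.31"
    db_port: int = 5432
    db_name: str = "docmaster"
    db_user: str = "docmaster"
    model_path: str = "runs/train/invoice_fields/weights/best.pt"
    confidence_threshold: float = 0.5
    server_host: str = "0.0.0.0"
    server_port: int = 8000


# Environment variable name -> (Settings attribute, type conversion).
_OPTIONAL: Dict[str, Tuple[str, Callable[[str], Any]]] = {
    "DB_HOST": ("db_host", str),
    "DB_PORT": ("db_port", int),
    "DB_NAME": ("db_name", str),
    "DB_USER": ("db_user", str),
    "MODEL_PATH": ("model_path", str),
    "CONFIDENCE_THRESHOLD": ("confidence_threshold", float),
    "SERVER_HOST": ("server_host", str),
    "SERVER_PORT": ("server_port", int),
}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment; DB_PASSWORD is mandatory."""
    password = env.get("DB_PASSWORD", "")
    if not password:
        raise ValueError("DB_PASSWORD is required")
    # Only pass variables that are actually set, so dataclass defaults apply otherwise.
    overrides = {
        attr: cast(env[name]) for name, (attr, cast) in _OPTIONAL.items() if name in env
    }
    return Settings(db_password=password, **overrides)
```

Passing the environment as a parameter keeps the function easy to test: `load_settings({"DB_PASSWORD": "secret", "DB_PORT": "6000"})` overrides only the port and leaves every other default in place.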

Auto-trigger Rules (ALWAYS FOLLOW - even after context compaction)

These rules MUST be followed regardless of conversation history:

New feature or bug fix → MUST use tdd-guide agent (write tests first)
When writing code → MUST follow coding standards skill for the target language:
- Python → python-patterns (PEP 8, type hints, Pythonic idioms)
- C# → dotnet-skills:coding-standards (records, pattern matching, modern C#)
- TS/JS → coding-standards (universal best practices)
After writing/modifying code → MUST use code-reviewer agent
Before git commit → MUST use security-reviewer agent
When build/test fails → MUST use build-error-resolver agent
After context compaction → read MEMORY.md to restore session state

Plan Completion Protocol

After completing any plan or major task:

1. Test - Run pytest to confirm all tests pass
2. Security review - Use security-reviewer agent on changed files
3. Fix loop - If security review reports CRITICAL or HIGH issues:
- Fix the issues
- Re-run tests (back to step 1)
- Re-run security review (back to step 2)
- Repeat until no CRITICAL/HIGH issues remain
4. Commit - Auto-commit with conventional commit message (feat:, fix:, refactor:, etc.). Stage only the files changed in this task, not unrelated files.
5. Save - Write a summary to MEMORY.md including: what was done, files changed, decisions made, remaining work
6. Suggest clear - Tell the user: "Plan complete. Recommend /clear to free context for the next task."
7. Do NOT start a new task in the same context - wait for user to /clear first

This keeps each plan in a fresh context window for maximum quality.

Known Issues

Pre-existing test failures: test_s3.py, test_azure.py (missing boto3/azure) - safe to ignore
Always re-run dedup/validation after fallback adds new fields
PDF DPI must be 150 (not 300) for correct bbox alignment
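
The DPI constraint comes down to simple arithmetic: PDF coordinates are measured in points (72 per inch), so a page rendered at 150 DPI is scaled by 150/72 relative to the PDF. The sketch below shows the conversion and how a wrong DPI shifts a box. The function name and `BBox` alias are hypothetical, and the assumption that the model and annotations expect 150 DPI pixel coordinates is inferred from the note above, not confirmed by the code.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

PDF_POINTS_PER_INCH = 72.0
RENDER_DPI = 150  # value required by this project


def pixels_to_points(bbox: BBox, dpi: int = RENDER_DPI) -> BBox:
    """Convert a bbox from rendered-image pixels to PDF points."""
    x0, y0, x1, y1 = bbox
    return (
        x0 * PDF_POINTS_PER_INCH / dpi,
        y0 * PDF_POINTS_PER_INCH / dpi,
        x1 * PDF_POINTS_PER_INCH / dpi,
        y1 * PDF_POINTS_PER_INCH / dpi,
    )


# A box detected on a 150 DPI render maps to the expected PDF location.
detected = (150.0, 300.0, 450.0, 600.0)
correct = pixels_to_points(detected)             # (72.0, 144.0, 216.0, 288.0)

# Interpreting the same pixel box as if it came from a 300 DPI render
# halves every coordinate, so the box lands in the wrong place.
misaligned = pixels_to_points(detected, dpi=300)  # (36.0, 72.0, 108.0, 144.0)
```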
