Previously substring matching was only enabled for date fields, causing
OCR values embedded in longer tokens like "Fakturanummer: 2465027205"
to not be matched.
Changes:
- Extended Strategy 4 (substring match) to numeric fields
- Updated _find_substring_matches to support OCR, InvoiceNumber, Bankgiro, Plusgiro
This should significantly improve match rates for these fields.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Features:
- Auto-labeling pipeline: CSV values -> PDF search -> YOLO annotations
- Flexible date matching: year-month match, nearby date tolerance
- PDF text extraction with PyMuPDF
- OCR support for scanned documents (PaddleOCR)
- YOLO training and inference pipeline
- 7 field types: InvoiceNumber, InvoiceDate, InvoiceDueDate, OCR, Bankgiro, Plusgiro, Amount
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>