6 Commits

Author SHA1 Message Date
Yaojia Wang
d6550375b0 restructure project 2026-01-27 23:58:17 +01:00
Yaojia Wang
58bf75db68 WIP 2026-01-27 00:47:10 +01:00
Yaojia Wang
e83a0cae36 Add claude config 2026-01-25 16:17:39 +01:00
Yaojia Wang
4ea4bc96d4 Add payment line parser and fix OCR override from payment_line
- Add MachineCodeParser for Swedish invoice payment line parsing
- Fix OCR Reference extraction by normalizing account number spaces
- Add cross-validation tests for pipeline and field_extractor
- Update UI layout for compact upload and full-width results

Key changes:
- machine_code_parser.py: Handle spaces in Bankgiro numbers (e.g. "78 2 1 713")
- pipeline.py: Override OCR and Amount from payment_line; Bankgiro/Plusgiro (BG/PG) values are compared only
- field_extractor.py: Improved invoice number normalization
- app.py: Responsive UI layout changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:47:02 +01:00
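
Note on commit 4ea4bc96d4: the space handling described for machine_code_parser.py might look roughly like the sketch below. This is an illustration, not the repository's code. The function name normalize_bankgiro is hypothetical, and the sketch assumes a Bankgiro number has 7 or 8 digits and is written with a hyphen before the last four.

```python
import re
from typing import Optional


def normalize_bankgiro(raw: str) -> Optional[str]:
    """Collapse stray spaces/hyphens in a Bankgiro number (hypothetical helper).

    Assumption: a valid Bankgiro number has 7 or 8 digits and is
    conventionally written with a hyphen before the last four digits.
    """
    # Drop everything that is not a digit (spaces, hyphens, OCR noise).
    digits = re.sub(r"\D", "", raw)
    if len(digits) not in (7, 8):
        return None  # not a plausible Bankgiro number under this assumption
    return f"{digits[:-4]}-{digits[-4:]}"


if __name__ == "__main__":
    print(normalize_bankgiro("78 2 1 713"))
```

With the example from the commit message, "78 2 1 713" collapses to the digits 7821713 and is returned as "782-1713". Inputs with any other digit count return None, so the caller can decide how to handle them.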
Yaojia Wang
b26fd61852 WIP 2026-01-13 00:10:27 +01:00
Yaojia Wang
8938661850 Initial commit: Invoice field extraction system using YOLO + OCR
Features:
- Auto-labeling pipeline: CSV values -> PDF search -> YOLO annotations
- Flexible date matching: year-month match, nearby date tolerance
- PDF text extraction with PyMuPDF
- OCR support for scanned documents (PaddleOCR)
- YOLO training and inference pipeline
- 7 field types: InvoiceNumber, InvoiceDate, InvoiceDueDate, OCR, Bankgiro, Plusgiro, Amount

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 17:44:14 +01:00
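
Note on commit 8938661850: the "flexible date matching" feature (year-month match, nearby date tolerance) could be sketched as follows. This is an assumption about how such matching might work, not the project's implementation. The function name match_date, the returned labels, and the default tolerance of 3 days are all hypothetical.

```python
from datetime import date
from typing import Optional


def match_date(expected: date, found: date, tolerance_days: int = 3) -> Optional[str]:
    """Classify how closely a date found in a PDF matches the CSV value.

    Hypothetical sketch: labels and the 3-day default are assumptions.
    """
    if expected == found:
        return "exact"
    # Nearby-date tolerance: accept dates within a few days of the expected one.
    if abs((expected - found).days) <= tolerance_days:
        return "nearby"
    # Year-month match: same month of the same year, any day.
    if (expected.year, expected.month) == (found.year, found.month):
        return "year_month"
    return None


if __name__ == "__main__":
    print(match_date(date(2026, 1, 10), date(2026, 1, 12)))
```

Checking the day tolerance before the year-month rule means that dates straddling a month boundary, such as 31 January and 2 February, still count as nearby.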