invoice-master-poc-v2

Author	SHA1	Message	Date
Yaojia Wang	c4e3773df1	feat: upgrade PaddlePaddle and PaddleOCR to 3.x - Update paddlepaddle from >=2.5.0 to >=3.0.0,<3.3.0 - Update paddleocr from >=2.7.0 to >=3.0.0 - Update paddlepaddle-gpu from >=2.5.0 to >=3.0.0,<3.3.0 Note: PaddlePaddle 3.3.0 has an OneDNN bug that breaks CPU inference (ConvertPirAttribute2RuntimeAttribute not implemented). Using <3.3.0 until the bug is fixed upstream. This upgrade enables PP-StructureV3 for table extraction and uses PP-OCRv5 for improved text recognition accuracy. The existing codebase is already compatible with the 3.x API (predict() method and new response format). Verified: - PaddleOCR import works - PPStructureV3 is available - OCREngine initializes correctly - Inference API returns correct field extractions - 2117 unit tests pass Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-02 12:15:02 +01:00
Yaojia Wang	58bf75db68	WIP	2026-01-27 00:47:10 +01:00
Yaojia Wang	e83a0cae36	Add claude config	2026-01-25 16:17:39 +01:00
Yaojia Wang	8938661850	Initial commit: Invoice field extraction system using YOLO + OCR Features: - Auto-labeling pipeline: CSV values -> PDF search -> YOLO annotations - Flexible date matching: year-month match, nearby date tolerance - PDF text extraction with PyMuPDF - OCR support for scanned documents (PaddleOCR) - YOLO training and inference pipeline - 7 field types: InvoiceNumber, InvoiceDate, InvoiceDueDate, OCR, Bankgiro, Plusgiro, Amount Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 17:44:14 +01:00

4 Commits