Initial commit: Invoice field extraction system using YOLO + OCR
Features:
- Auto-labeling pipeline: CSV values -> PDF search -> YOLO annotations
- Flexible date matching: year-month match, nearby date tolerance
- PDF text extraction with PyMuPDF
- OCR support for scanned documents (PaddleOCR)
- YOLO training and inference pipeline
- 7 field types: InvoiceNumber, InvoiceDate, InvoiceDueDate, OCR, Bankgiro, Plusgiro, Amount

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
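The commit message names the auto-labeling steps but this page does not show their code. As a rough illustration only, the sketch below shows two of them: a flexible date comparison (same year and month, or within a few days) and the conversion of a matched bounding box into a YOLO annotation line. The function names, the `tolerance_days` default, and the class id order in `FIELD_CLASSES` are assumptions for illustration, not taken from the repository.

```python
from datetime import date

# Hypothetical class order: the commit lists these seven field types,
# but does not say which class id each one receives.
FIELD_CLASSES = [
    "InvoiceNumber", "InvoiceDate", "InvoiceDueDate",
    "OCR", "Bankgiro", "Plusgiro", "Amount",
]


def dates_match(expected: date, found: date, tolerance_days: int = 3) -> bool:
    """Return True if two dates share a year and month, or lie within
    tolerance_days of each other (the tolerance value is an assumption)."""
    if (expected.year, expected.month) == (found.year, found.month):
        return True
    return abs((expected - found).days) <= tolerance_days


def to_yolo_line(field: str, bbox: tuple, page_width: float, page_height: float) -> str:
    """Convert an (x0, y0, x1, y1) box in page units into a YOLO label line:
    class id, then box center and size normalized to the range 0..1."""
    x0, y0, x1, y1 = bbox
    class_id = FIELD_CLASSES.index(field)
    x_center = (x0 + x1) / 2 / page_width
    y_center = (y0 + y1) / 2 / page_height
    width = (x1 - x0) / page_width
    height = (y1 - y0) / page_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

In a pipeline like the one described, the bounding box would presumably come from a PDF text search or OCR result, and one such line would be written per matched field into the label file for that page image.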
22
requirements.txt
Normal file
@@ -0,0 +1,22 @@
# Invoice Master POC v2 - Dependencies

# PDF Processing
PyMuPDF>=1.23.0 # PDF rendering and text extraction

# OCR
paddlepaddle>=2.5.0 # PaddlePaddle framework
paddleocr>=2.7.0 # PaddleOCR

# YOLO
ultralytics>=8.1.0 # YOLOv8/v11

# Image Processing
Pillow>=10.0.0 # Image handling
numpy>=1.24.0 # Array operations
opencv-python>=4.8.0 # Image processing

# Data Processing
pyyaml>=6.0 # YAML config files

# Utilities
tqdm>=4.65.0 # Progress bars