Initial commit: Invoice field extraction system using YOLO + OCR
Features:
- Auto-labeling pipeline: CSV values -> PDF search -> YOLO annotations
- Flexible date matching: year-month match, nearby date tolerance
- PDF text extraction with PyMuPDF
- OCR support for scanned documents (PaddleOCR)
- YOLO training and inference pipeline
- 7 field types: InvoiceNumber, InvoiceDate, InvoiceDueDate, OCR, Bankgiro, Plusgiro, Amount

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
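Two of the features listed above can be illustrated with a minimal sketch. The commit message does not show the implementation, so everything here is an assumption: the function names `dates_match` and `to_yolo_line`, the 3-day tolerance, and the class order in `FIELD_CLASSES` (taken from the order of the field list) are hypothetical. Only the YOLO label layout (`class x_center y_center width height`, normalized to the image size) is the standard format.

```python
from datetime import date

# Assumption: class ids follow the order of the field list in the commit message.
FIELD_CLASSES = [
    "InvoiceNumber", "InvoiceDate", "InvoiceDueDate",
    "OCR", "Bankgiro", "Plusgiro", "Amount",
]


def dates_match(expected: date, found: date, tolerance_days: int = 3) -> bool:
    """Hypothetical flexible match: same year and month, or within a few days."""
    same_year_month = (expected.year, expected.month) == (found.year, found.month)
    nearby = abs((expected - found).days) <= tolerance_days
    return same_year_month or nearby


def to_yolo_line(field: str, box: tuple, page_width: float, page_height: float) -> str:
    """Convert a pixel box (x0, y0, x1, y1) into one YOLO label line."""
    x0, y0, x1, y1 = box
    class_id = FIELD_CLASSES.index(field)
    # YOLO stores the box center and size, each divided by the image dimensions.
    x_center = (x0 + x1) / 2 / page_width
    y_center = (y0 + y1) / 2 / page_height
    width = (x1 - x0) / page_width
    height = (y1 - y0) / page_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, `to_yolo_line("Amount", (100, 200, 300, 400), 1000, 2000)` yields one line of a label file for a 1000x2000 pixel page image. A real pipeline would first have to locate the CSV value in the PDF text or OCR output to obtain the box.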
.gitignore (vendored, normal file, 71 lines added)
@@ -0,0 +1,71 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/
.venv/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# Data files (large files)
data/raw_pdfs/
data/dataset/train/images/
data/dataset/val/images/
data/dataset/test/images/
data/dataset/train/labels/
data/dataset/val/labels/
data/dataset/test/labels/
*.pdf
*.png
*.jpg
*.jpeg

# Model weights
models/weights/
runs/
*.pt
*.onnx

# Reports and logs
reports/*.jsonl
logs/
*.log

# Jupyter
.ipynb_checkpoints/

# OS
.DS_Store
Thumbs.db

# Credentials
.env
*.key
*.pem
credentials.json
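The slash-free patterns in this file (such as `*.pdf` or `*.py[cod]`) apply to file names at any depth of the repository, which is worth keeping in mind because `*.png` and `*.pdf` will also hide any sample documents committed outside `data/`. The sketch below approximates that behaviour with Python's `fnmatch`. It is not Git's matcher: the helper name `ignored_by_basename` is hypothetical, only a handful of patterns are copied over, and directory patterns, negation, and patterns containing `/` are not handled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few slash-free patterns copied from the .gitignore above.
BASENAME_PATTERNS = ["*.py[cod]", "*.pdf", "*.pt", "*~", ".env"]


def ignored_by_basename(path: str) -> bool:
    """Approximate check: does the file name match any slash-free pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in BASENAME_PATTERNS)
```

Under these assumptions, `ignored_by_basename("docs/sample_invoice.pdf")` is true while `ignored_by_basename("config/.env.example")` is false. To check a real path against the full rule set, `git check-ignore -v <path>` reports which pattern, if any, applies.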