Shared Package
Shared utilities and abstractions for the Invoice Master system.
Storage Abstraction Layer
A unified storage abstraction supporting multiple backends:
- Local filesystem - Development and testing
- Azure Blob Storage - Azure cloud deployments
- AWS S3 - AWS cloud deployments
Installation
# Basic installation (local storage only)
pip install -e packages/shared
# With Azure support
pip install -e "packages/shared[azure]"
# With S3 support
pip install -e "packages/shared[s3]"
# All cloud providers
pip install -e "packages/shared[all]"
Quick Start
from pathlib import Path

from shared.storage import get_storage_backend
# Option 1: From configuration file
storage = get_storage_backend("storage.yaml")
# Option 2: From environment variables
from shared.storage import create_storage_backend_from_env
storage = create_storage_backend_from_env()
# Upload a file
storage.upload(Path("local/file.pdf"), "documents/file.pdf")
# Download a file
storage.download("documents/file.pdf", Path("local/downloaded.pdf"))
# Get pre-signed URL for frontend access
url = storage.get_presigned_url("documents/file.pdf", expires_in_seconds=3600)
Configuration File Format
Create a storage.yaml file with environment variable substitution support:
# Backend selection: local, azure_blob, or s3
backend: ${STORAGE_BACKEND:-local}
# Default pre-signed URL expiry (seconds)
presigned_url_expiry: 3600
# Local storage configuration
local:
  base_path: ${STORAGE_BASE_PATH:-./data/storage}

# Azure Blob Storage configuration
azure:
  connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
  container_name: ${AZURE_STORAGE_CONTAINER:-documents}
  create_container: false

# AWS S3 configuration
s3:
  bucket_name: ${AWS_S3_BUCKET}
  region_name: ${AWS_REGION:-us-east-1}
  access_key_id: ${AWS_ACCESS_KEY_ID}
  secret_access_key: ${AWS_SECRET_ACCESS_KEY}
  endpoint_url: ${AWS_ENDPOINT_URL}  # Optional, for S3-compatible services
  create_bucket: false
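The ${VAR} and ${VAR:-default} placeholders are resolved against the process environment when the file is loaded. The snippet below is only an illustrative sketch of that substitution, not the actual config_loader implementation, which may differ in details such as how unset variables are reported:
# Illustrative sketch of ${VAR:-default} substitution (not the real config_loader code).
import os
import re

_VAR_PATTERN = re.compile(r"\$\{(?P<name>[A-Z0-9_]+)(?::-(?P<default>[^}]*))?\}")

def substitute_env_vars(text: str) -> str:
    """Replace ${VAR} and ${VAR:-default} placeholders with environment values."""
    def _replace(match: re.Match) -> str:
        name = match.group("name")
        default = match.group("default")
        value = os.environ.get(name, default)
        # Leave the placeholder untouched if the variable is unset and has no default.
        return value if value is not None else match.group(0)
    return _VAR_PATTERN.sub(_replace, text)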
Environment Variables
| Variable | Backend | Description |
|---|---|---|
| STORAGE_BACKEND | All | Backend type: local, azure_blob, s3 |
| STORAGE_BASE_PATH | Local | Base directory path |
| AZURE_STORAGE_CONNECTION_STRING | Azure | Connection string |
| AZURE_STORAGE_CONTAINER | Azure | Container name |
| AWS_S3_BUCKET | S3 | Bucket name |
| AWS_REGION | S3 | AWS region (default: us-east-1) |
| AWS_ACCESS_KEY_ID | S3 | Access key (optional, uses credential chain) |
| AWS_SECRET_ACCESS_KEY | S3 | Secret key (optional) |
| AWS_ENDPOINT_URL | S3 | Custom endpoint for S3-compatible services |
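For example, an environment-only local setup needs just two of these variables before calling create_storage_backend_from_env(); the path below is a placeholder:
# Minimal environment-only configuration for the local backend.
import os

from shared.storage import create_storage_backend_from_env

os.environ["STORAGE_BACKEND"] = "local"
os.environ["STORAGE_BASE_PATH"] = "./data/storage"  # placeholder path

storage = create_storage_backend_from_env()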
API Reference
StorageBackend Interface
class StorageBackend(ABC):
    def upload(self, local_path: Path, remote_path: str, overwrite: bool = False) -> str:
        """Upload a file to storage."""

    def download(self, remote_path: str, local_path: Path) -> Path:
        """Download a file from storage."""

    def exists(self, remote_path: str) -> bool:
        """Check if a file exists."""

    def list_files(self, prefix: str) -> list[str]:
        """List files with given prefix."""

    def delete(self, remote_path: str) -> bool:
        """Delete a file."""

    def get_url(self, remote_path: str) -> str:
        """Get URL for a file."""

    def get_presigned_url(self, remote_path: str, expires_in_seconds: int = 3600) -> str:
        """Generate a pre-signed URL for temporary access (1-604800 seconds)."""

    def upload_bytes(self, data: bytes, remote_path: str, overwrite: bool = False) -> str:
        """Upload bytes directly."""

    def download_bytes(self, remote_path: str) -> bytes:
        """Download file as bytes."""
Factory Functions
# Create from configuration file
storage = create_storage_backend_from_file("storage.yaml")
# Create from environment variables
storage = create_storage_backend_from_env()
# Create from StorageConfig object
config = StorageConfig(backend_type="local", base_path=Path("./data"))
storage = create_storage_backend(config)
# Convenience function with fallback chain: config file -> env vars -> local default
storage = get_storage_backend("storage.yaml") # or None for env-only
Pre-signed URLs
Pre-signed URLs provide temporary access to files without exposing credentials:
# Generate URL valid for 1 hour (default)
url = storage.get_presigned_url("documents/invoice.pdf")
# Generate URL valid for 24 hours
url = storage.get_presigned_url("documents/invoice.pdf", expires_in_seconds=86400)
# Maximum expiry: 7 days (604800 seconds)
url = storage.get_presigned_url("documents/invoice.pdf", expires_in_seconds=604800)
Note: Local storage returns file:// URLs that don't actually expire.
Error Handling
from shared.storage import (
StorageError,
FileNotFoundStorageError,
PresignedUrlNotSupportedError,
)
try:
storage.download("nonexistent.pdf", Path("local.pdf"))
except FileNotFoundStorageError as e:
print(f"File not found: {e}")
except StorageError as e:
print(f"Storage error: {e}")
Testing with MinIO (S3-compatible)
# Start MinIO locally
docker run -p 9000:9000 -p 9001:9001 minio/minio server /data --console-address ":9001"
# Configure environment
export STORAGE_BACKEND=s3
export AWS_S3_BUCKET=test-bucket
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
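With MinIO running and the variables above exported, a quick round-trip check might look like the sketch below. boto3 is used directly only to create the test bucket, which is an assumption about the local setup rather than part of the package API:
# Round-trip smoke test against MinIO, assuming the environment variables above are set.
import os

import boto3
from botocore.exceptions import ClientError

from shared.storage import create_storage_backend_from_env

# Create the test bucket once; MinIO rejects the call if it already exists.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_ENDPOINT_URL"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
try:
    s3.create_bucket(Bucket=os.environ["AWS_S3_BUCKET"])
except ClientError:
    pass  # bucket already exists

storage = create_storage_backend_from_env()
storage.upload_bytes(b"hello", "smoke-test/hello.txt")
assert storage.download_bytes("smoke-test/hello.txt") == b"hello"
assert storage.exists("smoke-test/hello.txt")
print(storage.get_presigned_url("smoke-test/hello.txt", expires_in_seconds=600))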
Module Structure
shared/storage/
├── __init__.py # Public exports
├── base.py # Abstract interface and exceptions
├── local.py # Local filesystem backend
├── azure.py # Azure Blob Storage backend
├── s3.py # AWS S3 backend
├── config_loader.py # YAML configuration loader
└── factory.py # Backend factory functions