Yaojia Wang a516de4320 WIP
2026-02-01 00:08:40 +01:00

# Shared Package
Shared utilities and abstractions for the Invoice Master system.
## Storage Abstraction Layer
A unified storage abstraction supporting multiple backends:
- **Local filesystem** - Development and testing
- **Azure Blob Storage** - Azure cloud deployments
- **AWS S3** - AWS cloud deployments
### Installation
```bash
# Basic installation (local storage only)
pip install -e packages/shared
# With Azure support
pip install -e "packages/shared[azure]"
# With S3 support
pip install -e "packages/shared[s3]"
# All cloud providers
pip install -e "packages/shared[all]"
```
### Quick Start
```python
from pathlib import Path

from shared.storage import create_storage_backend_from_env, get_storage_backend

# Option 1: From configuration file
storage = get_storage_backend("storage.yaml")

# Option 2: From environment variables
storage = create_storage_backend_from_env()

# Upload a file
storage.upload(Path("local/file.pdf"), "documents/file.pdf")

# Download a file
storage.download("documents/file.pdf", Path("local/downloaded.pdf"))

# Get pre-signed URL for frontend access
url = storage.get_presigned_url("documents/file.pdf", expires_in_seconds=3600)
```
### Configuration File Format
Create a `storage.yaml` file. Values may reference environment variables as `${VAR}`, or as `${VAR:-default}` to supply a fallback:
```yaml
# Backend selection: local, azure_blob, or s3
backend: ${STORAGE_BACKEND:-local}

# Default pre-signed URL expiry (seconds)
presigned_url_expiry: 3600

# Local storage configuration
local:
  base_path: ${STORAGE_BASE_PATH:-./data/storage}

# Azure Blob Storage configuration
azure:
  connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
  container_name: ${AZURE_STORAGE_CONTAINER:-documents}
  create_container: false

# AWS S3 configuration
s3:
  bucket_name: ${AWS_S3_BUCKET}
  region_name: ${AWS_REGION:-us-east-1}
  access_key_id: ${AWS_ACCESS_KEY_ID}
  secret_access_key: ${AWS_SECRET_ACCESS_KEY}
  endpoint_url: ${AWS_ENDPOINT_URL}  # Optional, for S3-compatible services
  create_bucket: false
```
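The `${VAR}` and `${VAR:-default}` placeholders used in this file can be expanded with a small regular expression. The following is a minimal sketch of that idea, not the code in `config_loader.py`: the function name `substitute_env` is hypothetical, and treating an empty variable as unset and an unset variable with no default as an empty string are assumptions borrowed from shell semantics.

```python
from __future__ import annotations

import os
import re
from typing import Mapping

# Matches ${NAME} or ${NAME:-default}
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(text: str, env: Mapping[str, str] | None = None) -> str:
    """Expand ${VAR} and ${VAR:-default} placeholders in text (hypothetical helper)."""
    source = os.environ if env is None else env

    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = source.get(name)
        if value:  # variable is set and non-empty
            return value
        # Assumption: like the shell's ":-", the default applies when the
        # variable is unset or empty; with no default, substitute "".
        return default if default is not None else ""

    return _ENV_PATTERN.sub(_replace, text)
```

Passing an explicit `env` mapping, as in `substitute_env("backend: ${STORAGE_BACKEND:-local}", {})`, keeps the expansion independent of the real process environment, which is convenient in tests.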
### Environment Variables
| Variable | Backend | Description |
|----------|---------|-------------|
| `STORAGE_BACKEND` | All | Backend type: `local`, `azure_blob`, `s3` |
| `STORAGE_BASE_PATH` | Local | Base directory path |
| `AZURE_STORAGE_CONNECTION_STRING` | Azure | Connection string |
| `AZURE_STORAGE_CONTAINER` | Azure | Container name |
| `AWS_S3_BUCKET` | S3 | Bucket name |
| `AWS_REGION` | S3 | AWS region (default: us-east-1) |
| `AWS_ACCESS_KEY_ID` | S3 | Access key (optional; if unset, the AWS credential chain is used) |
| `AWS_SECRET_ACCESS_KEY` | S3 | Secret key (optional) |
| `AWS_ENDPOINT_URL` | S3 | Custom endpoint for S3-compatible services |
### API Reference
#### StorageBackend Interface
```python
class StorageBackend(ABC):
    def upload(self, local_path: Path, remote_path: str, overwrite: bool = False) -> str:
        """Upload a file to storage."""

    def download(self, remote_path: str, local_path: Path) -> Path:
        """Download a file from storage."""

    def exists(self, remote_path: str) -> bool:
        """Check if a file exists."""

    def list_files(self, prefix: str) -> list[str]:
        """List files with given prefix."""

    def delete(self, remote_path: str) -> bool:
        """Delete a file."""

    def get_url(self, remote_path: str) -> str:
        """Get URL for a file."""

    def get_presigned_url(self, remote_path: str, expires_in_seconds: int = 3600) -> str:
        """Generate a pre-signed URL for temporary access (1-604800 seconds)."""

    def upload_bytes(self, data: bytes, remote_path: str, overwrite: bool = False) -> str:
        """Upload bytes directly."""

    def download_bytes(self, remote_path: str) -> bytes:
        """Download file as bytes."""
```
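To illustrate the shape of this interface, here is a minimal in-memory stand-in covering the byte-oriented methods. It is a hypothetical test double, not part of the package: `InMemoryBackend` does not inherit from the real `StorageBackend`, it raises built-in exceptions rather than the package's own, and returning the remote path from `upload_bytes` is an assumption about what the `str` return value represents.

```python
from __future__ import annotations


class InMemoryBackend:
    """Hypothetical test double mirroring part of the StorageBackend interface."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload_bytes(self, data: bytes, remote_path: str, overwrite: bool = False) -> str:
        # Assumption: refuse to replace an existing object unless overwrite=True.
        if remote_path in self._blobs and not overwrite:
            raise FileExistsError(remote_path)
        self._blobs[remote_path] = data
        return remote_path

    def download_bytes(self, remote_path: str) -> bytes:
        try:
            return self._blobs[remote_path]
        except KeyError:
            raise FileNotFoundError(remote_path) from None

    def exists(self, remote_path: str) -> bool:
        return remote_path in self._blobs

    def list_files(self, prefix: str) -> list[str]:
        # Sorted so that results are deterministic.
        return sorted(path for path in self._blobs if path.startswith(prefix))

    def delete(self, remote_path: str) -> bool:
        # True if something was removed, False if the path did not exist.
        return self._blobs.pop(remote_path, None) is not None
```

A double like this can replace a real backend in unit tests for code that only reads and writes bytes.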
#### Factory Functions
```python
from pathlib import Path

# Create from configuration file
storage = create_storage_backend_from_file("storage.yaml")

# Create from environment variables
storage = create_storage_backend_from_env()

# Create from StorageConfig object
config = StorageConfig(backend_type="local", base_path=Path("./data"))
storage = create_storage_backend(config)

# Convenience function with fallback chain: config file -> env vars -> local default
storage = get_storage_backend("storage.yaml")  # or None for env-only
```
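The fallback chain described for `get_storage_backend` (config file, then environment variables, then a local default) can be pictured roughly as follows. This is only a sketch of the decision order: `resolve_config_source` is a hypothetical name, and the exact conditions (a missing file falling through rather than raising, and `STORAGE_BACKEND` being the variable that signals an environment-based configuration) are assumptions.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def resolve_config_source(
    config_path: str | None,
    env: Mapping[str, str] | None = None,
) -> str:
    """Return which source would be used: 'file', 'env', or 'local-default'."""
    env = os.environ if env is None else env

    # 1. A configuration file wins if a path was given and the file exists.
    if config_path is not None and Path(config_path).is_file():
        return "file"

    # 2. Otherwise fall back to environment variables (assumed trigger:
    #    a non-empty STORAGE_BACKEND).
    if env.get("STORAGE_BACKEND"):
        return "env"

    # 3. Finally, default to the local filesystem backend.
    return "local-default"
```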
### Pre-signed URLs
Pre-signed URLs provide temporary access to files without exposing credentials:
```python
# Generate URL valid for 1 hour (default)
url = storage.get_presigned_url("documents/invoice.pdf")
# Generate URL valid for 24 hours
url = storage.get_presigned_url("documents/invoice.pdf", expires_in_seconds=86400)
# Maximum expiry: 7 days (604800 seconds)
url = storage.get_presigned_url("documents/invoice.pdf", expires_in_seconds=604800)
```
**Note:** Local storage returns `file://` URLs that don't actually expire.
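Since the accepted expiry range is 1 to 604800 seconds, callers may want to check a value before requesting a URL. The sketch below shows one way to do that; `validate_expiry` and the constant names are hypothetical, and raising `ValueError` is an assumption, as the exception the backends raise for an out-of-range value is not documented here.

```python
MIN_EXPIRY_SECONDS = 1
MAX_EXPIRY_SECONDS = 7 * 24 * 60 * 60  # 7 days = 604800 seconds


def validate_expiry(expires_in_seconds: int) -> int:
    """Return the value unchanged if it lies in the documented range."""
    if not MIN_EXPIRY_SECONDS <= expires_in_seconds <= MAX_EXPIRY_SECONDS:
        # Assumption: ValueError; the real backends may raise something else.
        raise ValueError(
            f"expires_in_seconds must be between {MIN_EXPIRY_SECONDS} and "
            f"{MAX_EXPIRY_SECONDS}, got {expires_in_seconds}"
        )
    return expires_in_seconds
```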
### Error Handling
```python
from pathlib import Path

from shared.storage import (
    StorageError,
    FileNotFoundStorageError,
    PresignedUrlNotSupportedError,
)

try:
    storage.download("nonexistent.pdf", Path("local.pdf"))
except FileNotFoundStorageError as e:
    print(f"File not found: {e}")
except StorageError as e:
    print(f"Storage error: {e}")
```
### Testing with MinIO (S3-compatible)
```bash
# Start MinIO locally
docker run -p 9000:9000 -p 9001:9001 minio/minio server /data --console-address ":9001"
# Configure environment
export STORAGE_BACKEND=s3
export AWS_S3_BUCKET=test-bucket
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
```
### Module Structure
```
shared/storage/
├── __init__.py # Public exports
├── base.py # Abstract interface and exceptions
├── local.py # Local filesystem backend
├── azure.py # Azure Blob Storage backend
├── s3.py # AWS S3 backend
├── config_loader.py # YAML configuration loader
└── factory.py # Backend factory functions
```