invoice-master-poc-v2/docs/azure-deployment-guide.md

# Complete Guide to Azure Deployment Options

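## Table of Contents
- [Key Questions](#key-questions)
- [Storage Options](#storage-options)
- [Training Options](#training-options)
- [Inference Options](#inference-options)
- [Price Comparison](#price-comparison)
- [Recommended Architecture](#recommended-architecture)
- [Implementation Steps](#implementation-steps)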

---

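## Key Questions

| Question | Answer |
|------|------|
| Can Azure Blob Storage be used for training? | Yes, by mounting it with BlobFuse2 |
| Can training read from Blob in real time? | Yes, but configuring a local cache is recommended |
| Can Azure Blob be mounted locally? | Yes, with Rclone (Windows) or BlobFuse2 (Linux) |
| Is a VM billed while idle? | Yes. A running VM is billed by the hour whether or not it is doing work |
| How can I pay only for what I use? | Use Serverless GPU or a Compute Cluster with min=0 |
| What should serve inference? | Container Apps (CPU) or Serverless GPU |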

---

## Storage Options

### Azure Blob Storage + BlobFuse2 (Recommended)

```bash
# Install BlobFuse2 (assumes the Microsoft package repository is already configured)
sudo apt-get install blobfuse2

# Configuration file
cat > ~/blobfuse-config.yaml << 'EOF'
logging:
  type: syslog
  level: log_warning

components:
  - libfuse
  - file_cache
  - azstorage

file_cache:
  path: /tmp/blobfuse2
  timeout-sec: 120
  max-size-mb: 4096

azstorage:
  type: block
  account-name: YOUR_ACCOUNT
  account-key: YOUR_KEY
  container: training-images
EOF

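# Mount (use $HOME here: bash does not expand ~ after "=")
sudo mkdir -p /mnt/azure-blob
sudo chown "$USER" /mnt/azure-blob
blobfuse2 mount /mnt/azure-blob --config-file="$HOME/blobfuse-config.yaml"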
```
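
Once the container is mounted, training code can treat it as an ordinary directory. The following is a minimal sketch of a pre-flight check, assuming the `/mnt/azure-blob` mount point from above; the function name `list_training_images` and the set of image extensions are illustrative assumptions, not part of the project.

```python
import os
from pathlib import Path

# Assumed image formats; adjust to whatever the dataset actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_training_images(root: str, require_mount: bool = True) -> list[Path]:
    """Return all image files under root, sorted for a stable order."""
    # Fail early if the directory exists but the Blob container is not mounted.
    if require_mount and not os.path.ismount(root):
        raise RuntimeError(f"{root} is not a mount point; run 'blobfuse2 mount' first")
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `list_training_images("/mnt/azure-blob")` before training starts avoids silently training on an empty local directory when the mount is missing.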

### Local Development (Windows)

```powershell
# Install
winget install WinFsp.WinFsp
winget install Rclone.Rclone

# Configure
rclone config  # choose azureblob and name the remote "azure"

# Mount as drive Z:
rclone mount azure:training-images Z: --vfs-cache-mode full
```

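### Storage Costs

| Tier | Price | Use case |
|------|------|---------|
| Hot | $0.018/GB/month | Frequent access |
| Cool | $0.01/GB/month | Occasional access |
| Archive | $0.002/GB/month | Long-term archival |

**This project**: ~10,000 images × 500 KB = ~5 GB → **~$0.09/month** on the Hot tier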
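
The estimate above is a single multiplication, and it is easy to redo for other dataset sizes or tiers. The sketch below uses the per-GB prices from the table and decimal units (1 GB = 1,000,000 KB); the function name `monthly_storage_cost` is an illustrative assumption, and the result covers capacity only, not transaction or egress charges.

```python
# Per-GB monthly prices in USD, copied from the table in this guide.
PRICE_PER_GB_MONTH = {"hot": 0.018, "cool": 0.01, "archive": 0.002}


def monthly_storage_cost(num_files: int, avg_size_kb: float, tier: str = "hot") -> float:
    """Estimate the monthly capacity cost in USD for a set of equally sized files."""
    size_gb = num_files * avg_size_kb / 1_000_000  # decimal GB
    return size_gb * PRICE_PER_GB_MONTH[tier]


# This project's dataset: about 10,000 images of roughly 500 KB each.
for tier in PRICE_PER_GB_MONTH:
    print(f"{tier}: ${monthly_storage_cost(10_000, 500, tier):.2f}/month")
```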

---

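## Training Options

### Overview

| Option | Best for | Idle cost | Complexity |
|------|---------|---------|--------|
| Azure VM | Simple, direct setup | Billed 24/7 | Low |
| Azure VM Spot | Low cost, interruptible | Billed 24/7 | Low |
| Azure ML Compute | MLOps integration | Can scale to 0 | Medium |
| Container Apps GPU | Serverless | Scales to 0 automatically | Medium |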

### Azure VM vs Azure ML

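| Feature | Azure VM | Azure ML |
|------|----------|----------|
| What it is | Virtual machine | Managed ML platform |
| Compute cost | $3.06/hr (NC6s_v3) | $3.06/hr (same) |
| Additional costs | ~$5/month | ~$20-30/month |
| Experiment tracking | None | Built in |
| Autoscaling | None | Supported (min=0) |
| Target users | DevOps | Data scientists |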

### Azure ML Additional Cost Breakdown

| Service | Purpose | Cost |
|------|------|------|
| Container Registry | Docker images | ~$5-20/month |
| Blob Storage | Logs, models | ~$0.10/month |
| Application Insights | Monitoring | ~$0-10/month |
| Key Vault | Secrets management | <$1/month |

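### Spot Instances

Both platforms support Spot/low-priority instances, which can save up to 90%; the two examples below save about 70%:

| Type | Regular price | Spot price | Savings |
|------|---------|----------|------|
| NC6s_v3 (V100) | $3.06/hr | ~$0.92/hr | 70% |
| NC24ads_A100_v4 | $3.67/hr | ~$1.15/hr | 69% |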
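
The savings column follows from the two hourly prices. As a small sketch, using the prices from the table and a helper name (`spot_savings`) chosen here for illustration:

```python
def spot_savings(regular_per_hour: float, spot_per_hour: float) -> float:
    """Fraction of the regular price saved by running on Spot."""
    return 1 - spot_per_hour / regular_per_hour


# Hourly prices in USD from the table in this guide.
for name, regular, spot in [("NC6s_v3", 3.06, 0.92), ("NC24ads_A100_v4", 3.67, 1.15)]:
    print(f"{name}: {spot_savings(regular, spot):.0%} saved")
```

Spot prices fluctuate, so the percentages are only as current as the prices fed in.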

### GPU Instance Pricing

| Instance | GPU | GPU memory | Price/hour | Spot price/hour |
|------|-----|------|---------|----------|
| NC6s_v3 | 1x V100 | 16GB | $3.06 | $0.92 |
| NC24s_v3 | 4x V100 | 64GB | $12.24 | $3.67 |
| NC24ads_A100_v4 | 1x A100 | 80GB | $3.67 | $1.15 |
| NC48ads_A100_v4 | 2x A100 | 160GB | $7.35 | $2.30 |

---

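## Inference Options

### Comparison

| Option | GPU support | Scaling | Price | Best for |
|------|---------|--------|------|---------|
| Container Apps (CPU) | No | Automatic, 0-N | ~$30/month | YOLO inference (sufficient) |
| Container Apps (GPU) | Yes | Serverless | Billed per second | High-throughput inference |
| Azure App Service | No | Manual/automatic | ~$50/month | Simple deployments |
| Azure ML Endpoint | Yes | Automatic | ~$100+/month | MLOps integration |
| AKS (Kubernetes) | Yes | Automatic | Complex billing | Large-scale production |

### Recommended: Container Apps (CPU)

For YOLO inference, **a CPU is sufficient** and no GPU is needed:
- YOLOv11n takes ~200-500 ms per inference on a CPU
- Much cheaper than a GPU, and a good fit for low to medium traffic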

```yaml
# Container Apps configuration
name: invoice-inference
image: invoiceacr.azurecr.io/invoice-inference:v1
resources:
  cpu: 2.0
  memory: 4Gi
scale:
  minReplicas: 1      # keep at least 1 replica running so the service stays responsive
  maxReplicas: 10     # scale out to at most 10 replicas
  rules:
    - name: http-scaling
      http:
        metadata:
          concurrentRequests: "50"  # scale out at 50 concurrent requests per replica
```
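
To get a feel for what the scale rule above implies, the sketch below models it as "one replica per 50 concurrent requests, clamped to the configured minimum and maximum". This is a deliberately simplified model: the real KEDA-based autoscaler in Container Apps also applies polling intervals and cooldown periods, and the function name `desired_replicas` is an illustrative assumption.

```python
import math


def desired_replicas(concurrent_requests: int, per_replica: int = 50,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified HTTP scale rule: ceil(load / threshold), clamped to [min, max]."""
    wanted = math.ceil(concurrent_requests / per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# A few load levels, using the values from the configuration above.
for load in (0, 50, 51, 120, 1000):
    print(f"{load} concurrent requests -> {desired_replicas(load)} replica(s)")
```

With these settings the service never drops below one replica, and any load above 500 concurrent requests is capped at ten replicas.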

### Inference Service Code Example

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the code and the model
COPY src/ ./src/
COPY models/best.pt ./models/

# Start the service
CMD ["uvicorn", "src.web.app:app", "--host", "0.0.0.0", "--port", "8000"]
```

```python
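# src/web/app.py
import os
import tempfile

from fastapi import FastAPI, File, UploadFile
from ultralytics import YOLO

app = FastAPI()
model = YOLO("models/best.pt")

def extract_fields(results):
    """Illustrative stand-in for the project helper: one entry per detected box."""
    return [
        {"label": r.names[int(b.cls)], "box": b.xyxy[0].tolist()}
        for r in results
        for b in r.boxes
    ]

def get_confidence(results):
    """Illustrative stand-in for the project helper: mean detection confidence."""
    confs = [float(b.conf) for r in results for b in r.boxes]
    return sum(confs) / len(confs) if confs else 0.0

@app.post("/api/v1/infer")
async def infer(file: UploadFile = File(...)):
    # Save the upload, keeping its extension. YOLO expects image input, so a
    # PDF would have to be rasterized to page images first (not shown here).
    suffix = os.path.splitext(file.filename or "")[1] or ".png"
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        # Run inference
        results = model.predict(tmp_path, conf=0.5)
    finally:
        os.remove(tmp_path)  # do not leak temp files

    # Return the result
    return {
        "fields": extract_fields(results),
        "confidence": get_confidence(results)
    }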

@app.get("/health")
async def health():
    return {"status": "healthy"}
```

### Deployment Commands

```bash
# 1. Create the Container Registry
az acr create --name invoiceacr --resource-group myRG --sku Basic

# 2. Build and push the image
az acr build --registry invoiceacr --image invoice-inference:v1 .

# 3. Create the Container Apps environment
az containerapp env create \
  --name invoice-env \
  --resource-group myRG \
  --location eastus

# 4. Deploy the app
az containerapp create \
  --name invoice-inference \
  --resource-group myRG \
  --environment invoice-env \
  --image invoiceacr.azurecr.io/invoice-inference:v1 \
  --registry-server invoiceacr.azurecr.io \
  --cpu 2 --memory 4Gi \
  --min-replicas 1 --max-replicas 10 \
  --ingress external --target-port 8000

# 5. Get the URL
az containerapp show --name invoice-inference --resource-group myRG --query properties.configuration.ingress.fqdn
```

### High-Throughput Scenarios: Serverless GPU

If you need GPU-accelerated inference (high concurrency, low latency):

```bash
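# Add a GPU workload profile (requires approved GPU quota in the region).
# Profile type names vary by region; list the valid ones with:
#   az containerapp env workload-profile list-supported --location eastus
az containerapp env workload-profile add \
  --name invoice-env \
  --resource-group myRG \
  --workload-profile-name gpu \
  --workload-profile-type Consumption-GPU-NC8as-T4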

# Deploy the GPU version
az containerapp create \
  --name invoice-inference-gpu \
  --resource-group myRG \
  --environment invoice-env \
  --image invoiceacr.azurecr.io/invoice-inference-gpu:v1 \
  --workload-profile-name gpu \
  --cpu 4 --memory 8Gi \
  --min-replicas 0 --max-replicas 5 \
  --ingress external --target-port 8000
```

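### Inference Performance Comparison

| Configuration | Time per inference | Concurrency capacity | Estimated monthly cost |
|------|------------|---------|---------|
| CPU, 2 cores, 4 GB | ~300-500 ms | ~50 QPS | ~$30 |
| CPU, 4 cores, 8 GB | ~200-300 ms | ~100 QPS | ~$60 |
| GPU T4 | ~50-100 ms | ~200 QPS | Billed per second |
| GPU A100 | ~20-50 ms | ~500 QPS | Billed per second |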

---

## Price Comparison

### Monthly Cost Comparison (assuming 2 hours of training per day)

| Option | Calculation | Monthly cost |
|------|---------|------|
| VM running 24/7 | 24 h × 30 days × $3.06 | ~$2,200 |
| VM started and stopped on demand | 2 h × 30 days × $3.06 | ~$184 |
| VM Spot, on demand | 2 h × 30 days × $0.92 | ~$55 |
| Serverless GPU | 2 h × 30 days × ~$3.50 | ~$210 |
| Azure ML (min=0) | 2 h × 30 days × $3.06 | ~$184 |
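
The monthly figures are hours per day times days times the hourly rate, so they can be recomputed for a different training schedule. The sketch below uses the hourly rates quoted in this guide; the dictionary keys and the function name `monthly_cost` are illustrative assumptions.

```python
# Hourly rates in USD, taken from the tables in this guide.
HOURLY_RATE = {
    "vm_on_demand": 3.06,
    "vm_spot": 0.92,
    "serverless_gpu": 3.50,
}


def monthly_cost(rate_per_hour: float, hours_per_day: float, days: int = 30) -> float:
    """Compute cost in USD for a month of `days` days at `hours_per_day` of usage."""
    return rate_per_hour * hours_per_day * days


# Always-on VM versus two hours of training per day.
print(f"vm 24/7: ${monthly_cost(HOURLY_RATE['vm_on_demand'], 24):,.0f}")
for name, rate in HOURLY_RATE.items():
    print(f"{name} at 2 h/day: ${monthly_cost(rate, 2):,.0f}")
```

Changing `hours_per_day` shows how quickly the gap between an always-on VM and on-demand compute narrows as utilization rises.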

### Full Cost Estimate for This Project

| Component | Recommended option | Monthly cost |
|------|---------|------|
| Image storage | Blob Storage (Hot) | ~$0.10 |
| Database | PostgreSQL Flexible (Burstable B1ms) | ~$25 |
| Inference service | Container Apps CPU (2 cores, 4 GB) | ~$30 |
| Training service | Azure ML Spot (on demand) | ~$1-5 per run |
| Container Registry | Basic | ~$5 |
| **Total** | | **~$60/month** + training costs |

---

## Recommended Architecture

### Overall Architecture Diagram

```
                            ┌─────────────────────────────────────┐
                            │         Azure Blob Storage          │
                            │  ├── training-images/               │
                            │  ├── datasets/                      │
                            │  └── models/                        │
                            └─────────────────┬───────────────────┘
                                              │
            ┌─────────────────────────────────┼─────────────────────────────────┐
            │                                 │                                 │
            ▼                                 ▼                                 ▼
┌───────────────────────┐       ┌───────────────────────┐       ┌───────────────────────┐
│   Inference (24/7)    │       │   On-demand training  │       │   Web UI (optional)   │
│   Container Apps      │       │   Azure ML Compute    │       │   Static Web Apps     │
│   CPU: 2 cores, 4 GB  │       │   min=0, Spot         │       │   ~$0 (free tier)     │
│   ~$30/month          │       │   ~$1-5 per run       │       │                       │
│                       │       │                       │       │                       │
│ ┌───────────────────┐ │       │ ┌───────────────────┐ │       │ ┌───────────────────┐ │
│ │ FastAPI + YOLO    │ │       │ │ YOLOv11 Training  │ │       │ │ React/Vue client  │ │
│ │ /api/v1/infer     │ │       │ │ 100 epochs        │ │       │ │ Invoice upload UI │ │
│ └───────────────────┘ │       │ └───────────────────┘ │       │ └───────────────────┘ │
└───────────┬───────────┘       └───────────┬───────────┘       └───────────┬───────────┘
            │                               │                               │
            └───────────────────────────────┼───────────────────────────────┘
                                            │
                                            ▼
                                ┌───────────────────────┐
                                │   PostgreSQL          │
                                │   Flexible Server     │
                                │   Burstable B1ms      │
                                │   ~$25/month          │
                                └───────────────────────┘
```

### Inference Service Configuration

```yaml
# Container Apps - CPU (runs 24/7)
name: invoice-inference
resources:
  cpu: 2
  memory: 4Gi
scale:
  minReplicas: 1
  maxReplicas: 10
env:
  - name: MODEL_PATH
    value: /app/models/best.pt
  - name: DB_HOST
    secretRef: db-host
  - name: DB_PASSWORD
    secretRef: db-password
```

### Training Service Configuration

**Option A: Azure ML Compute (Recommended)**

```python
from azure.ai.ml.entities import AmlCompute

gpu_cluster = AmlCompute(
    name="gpu-cluster",
    size="Standard_NC6s_v3",
    min_instances=0,      # scale down to zero nodes when idle
    max_instances=1,
    tier="LowPriority",   # Spot (low-priority) instances
    idle_time_before_scale_down=120  # seconds of idle time before scaling down
)
```

**Option B: Container Apps Serverless GPU**

```yaml
name: invoice-training
resources:
  gpu: 1
  gpuType: A100
scale:
  minReplicas: 0
  maxReplicas: 1
```

---

## Implementation Steps

### Phase 1: Storage Setup

```bash
# Create the Storage Account
az storage account create \
  --name invoicestorage \
  --resource-group myRG \
  --sku Standard_LRS

# Create the containers
az storage container create --name training-images --account-name invoicestorage
az storage container create --name datasets --account-name invoicestorage
az storage container create --name models --account-name invoicestorage

# Upload the training data
az storage blob upload-batch \
  --destination training-images \
  --source ./data/dataset/temp \
  --account-name invoicestorage
```

### Phase 2: Database Setup

```bash
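# Create the PostgreSQL server (B1ms is a Burstable-tier SKU, so the tier is set explicitly)
az postgres flexible-server create \
  --name invoice-db \
  --resource-group myRG \
  --tier Burstable \
  --sku-name Standard_B1ms \
  --storage-size 32 \
  --admin-user docmaster \
  --admin-password YOUR_PASSWORD

# Configure the firewall (the 0.0.0.0 rule allows access from Azure services)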
az postgres flexible-server firewall-rule create \
  --name allow-azure \
  --resource-group myRG \
  --server-name invoice-db \
  --start-ip-address 0.0.0.0 \
  --end-ip-address 0.0.0.0
```

### Phase 3: Inference Service Deployment

```bash
# Create the Container Registry
az acr create --name invoiceacr --resource-group myRG --sku Basic

# Build the image
az acr build --registry invoiceacr --image invoice-inference:v1 .

# Create the environment
az containerapp env create \
  --name invoice-env \
  --resource-group myRG \
  --location eastus

# Deploy the inference service (DB_PASSWORD is read from the db-password secret)
az containerapp create \
  --name invoice-inference \
  --resource-group myRG \
  --environment invoice-env \
  --image invoiceacr.azurecr.io/invoice-inference:v1 \
  --registry-server invoiceacr.azurecr.io \
  --cpu 2 --memory 4Gi \
  --min-replicas 1 --max-replicas 10 \
  --ingress external --target-port 8000 \
  --env-vars \
    DB_HOST=invoice-db.postgres.database.azure.com \
    DB_NAME=docmaster \
    DB_USER=docmaster \
    DB_PASSWORD=secretref:db-password \
  --secrets db-password=YOUR_PASSWORD
```
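
Inside the container, the service has to turn those environment variables into a database connection. A minimal sketch, assuming the app receives `DB_HOST`, `DB_NAME`, `DB_USER` and `DB_PASSWORD` as in the inference service configuration, and assuming it connects with a standard PostgreSQL URL; the function name `build_dsn` is hypothetical.

```python
import os
from urllib.parse import quote


def build_dsn(env=os.environ) -> str:
    """Build a PostgreSQL connection URL from the container's environment."""
    # Percent-encode credentials so characters like "@" or "/" cannot break the URL.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    host = env["DB_HOST"]
    name = env.get("DB_NAME", "docmaster")
    # sslmode=require: Azure Database for PostgreSQL enforces TLS by default.
    return f"postgresql://{user}:{password}@{host}:5432/{name}?sslmode=require"
```

Reading the password from the environment at startup keeps it out of the image and the source tree; a missing variable raises `KeyError` immediately rather than failing on the first query.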

### Phase 4: Training Service Setup

```bash
# Create the Azure ML Workspace
az ml workspace create --name invoice-ml --resource-group myRG

# Create the Compute Cluster
az ml compute create --name gpu-cluster \
  --resource-group myRG \
  --workspace-name invoice-ml \
  --type AmlCompute \
  --size Standard_NC6s_v3 \
  --min-instances 0 \
  --max-instances 1 \
  --tier low_priority
```

### Phase 5: Integrating the Training Trigger API

```python
# src/web/routes/training.py
from fastapi import APIRouter
from pydantic import BaseModel
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

router = APIRouter()

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="myRG",
    workspace_name="invoice-ml"
)

class TrainingRequest(BaseModel):
    """Request body. This schema is an assumption; the guide does not define it."""
    epochs: int = 100

@router.post("/api/v1/train")
async def trigger_training(request: TrainingRequest):
    """Trigger an Azure ML training job."""
    training_job = command(
        code="./training",
        command=f"python train.py --epochs {request.epochs}",
        environment="AzureML-pytorch-2.0-cuda11.8@latest",
        compute="gpu-cluster",
    )
    job = ml_client.jobs.create_or_update(training_job)
    return {
        "job_id": job.name,
        "status": job.status,
        "studio_url": job.studio_url
    }

@router.get("/api/v1/train/{job_id}/status")
async def get_training_status(job_id: str):
    """Query the training status."""
    job = ml_client.jobs.get(job_id)
    return {"status": job.status}
```
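
A caller of the status endpoint usually wants to wait until the job finishes. The sketch below polls a status callback until it reports one of the Azure ML terminal states (`Completed`, `Failed`, `Canceled`) or a timeout expires. The function name `wait_for_job` and its parameters are illustrative assumptions; the `sleep` parameter exists only so the loop can be exercised without real delays.

```python
import time
from typing import Callable

# Job states after which Azure ML will not change the status again.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 30.0,
                 timeout_seconds: float = 4 * 3600,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With the client from the route module, usage would look like `wait_for_job(lambda: ml_client.jobs.get(job_id).status)`. With a Spot cluster scaled to zero, a job can sit in a queued state for several minutes before it starts, so a generous timeout is reasonable.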

---

## Summary

### Recommended Configuration

| Component | Recommended option | Estimated monthly cost |
|------|---------|---------|
| Image storage | Blob Storage (Hot) | ~$0.10 |
| Database | PostgreSQL Flexible | ~$25 |
| Inference service | Container Apps CPU | ~$30 |
| Training service | Azure ML (min=0, Spot) | On demand, ~$1-5 per run |
| Container Registry | Basic | ~$5 |
| **Total** | | **~$60/month** + training costs |

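### Key Decisions

| Scenario | Choice |
|------|------|
| Occasional training, simple needs | Azure VM Spot + manual start/stop |
| MLOps and team collaboration required | Azure ML Compute |
| Lowest possible idle cost | Container Apps Serverless GPU |
| Production inference | Container Apps CPU |
| High-concurrency inference | Container Apps Serverless GPU |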

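### Notes

1. **Cold start**: Serverless GPU takes 3-8 minutes to start
2. **Spot interruption**: instances can be preempted, so training needs a checkpointing mechanism
3. **Network latency**: a mounted Blob Storage container is slower than a local SSD; enabling the cache is recommended
4. **Region selection**: choose a region where GPU quota is available (East US, West Europe, etc.)
5. **Inference optimization**: CPU inference is sufficient for YOLO; no GPU is needed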
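
The checkpointing requirement in item 2 can be sketched independently of any ML framework: persist progress after every epoch, and on restart resume from the last saved epoch. The example below is a minimal illustration with hypothetical names (`save_checkpoint`, `load_checkpoint`, `train`) that stores only an epoch counter; a real job would also save model weights, and Ultralytics provides its own `resume=True` option for training runs. It assumes the checkpoint lives on a local filesystem, where the rename is atomic.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write atomically so a preemption mid-write never leaves a corrupt file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_epochs: int, run_epoch=lambda epoch: None) -> int:
    """Resume after the last finished epoch; return how many epochs ran this time."""
    state = load_checkpoint(path)
    ran = 0
    for epoch in range(state["epoch"], total_epochs):
        run_epoch(epoch)  # the actual training step (placeholder here)
        save_checkpoint(path, {"epoch": epoch + 1})
        ran += 1
    return ran
```

If the instance is preempted during epoch 3 of 5, the next run picks up at epoch 3 and executes only the remaining two epochs.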