Yaojia Wang
2026-02-11 23:40:38 +01:00
parent f1a7bfe6b7
commit ad5ed46b4c
117 changed files with 5741 additions and 7669 deletions


@@ -32,7 +32,7 @@
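### 1.1 Project Background
Invoice Master is an automated invoice field extraction system built on YOLOv11 + PaddleOCR, currently achieving 94.8% accuracy. This design integrates Invoice Master into the Fortnox accounting software as a plugin/extension, enabling seamless invoice data import.
Invoice Master is an automated invoice field extraction system built on YOLO26 + PaddleOCR, currently achieving 94.8% accuracy. This design integrates Invoice Master into the Fortnox accounting software as a plugin/extension, enabling seamless invoice data import.
### 1.2 Goals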


@@ -500,7 +500,7 @@ estimator = PyTorch(
hyperparameters={
"epochs": 100,
"batch-size": 16,
"model": "yolo11n.pt"
"model": "yolo26s.pt"
}
)
```


@@ -152,7 +152,7 @@ rclone mount azure:training-images Z: --vfs-cache-mode full
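### Recommended: Container Apps (CPU)
For YOLO inference, **a CPU is sufficient**; no GPU is required
- YOLOv11n inference on CPU takes ~200-500ms
- YOLO26s inference on CPU takes ~200-500ms
- Much cheaper than a GPU; suitable for low-to-medium traffic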
```yaml
@@ -335,7 +335,7 @@ az containerapp create \
│ ~$30/month            │ │ ~$1-5/training run    │ │                       │
│                       │ │                       │ │                       │
│ ┌───────────────────┐ │ │ ┌───────────────────┐ │ │ ┌───────────────────┐ │
│ │ FastAPI + YOLO    │ │ │ │ YOLOv11 Training  │ │ │ │ React/Vue frontend│ │
│ │ FastAPI + YOLO    │ │ │ │ YOLO26 Training   │ │ │ │ React/Vue frontend│ │
│ │ /api/v1/infer     │ │ │ │ 100 epochs        │ │ │ │ Invoice upload UI │ │
│ └───────────────────┘ │ │ └───────────────────┘ │ │ └───────────────────┘ │
└───────────┬───────────┘ └───────────┬───────────┘ └───────────┬───────────┘


@@ -0,0 +1,185 @@
# YOLO Model Fine-Tuning Best Practices
Production guide for continuous fine-tuning of YOLO object detection models with user feedback.
## Overview
When users report failed detections, those documents are collected, reviewed, and used to incrementally improve the model without degrading performance on existing data.
Key risks:
- **Catastrophic forgetting**: model forgets original training after fine-tuning on small new data
- **Cumulative drift**: repeated fine-tuning sessions cause progressive degradation
- **Overfitting**: few samples + many epochs = memorizing noise
## 1. Data Management
```
Original training set (25K) --> permanently retained as "anchor dataset"
|
User-reported failures --> human review & labeling --> "fine-tune pool"
|
Fine-tune pool accumulates over time, never deleted
```
Every new sample MUST be human-verified before entering the fine-tune pool. Incorrect labels are more harmful than no labels.
### Data Mixing Ratios
| Accumulated New Samples | Old Data Multiplier | Total Training Size |
|------------------------|--------------------|--------------------|
| 10 | 50x (500) | 510 |
| 50 | 20x (1,000) | 1,050 |
| 200 | 10x (2,000) | 2,200 |
| 500+ | 5x (2,500) | 3,000 |
Principle: the fewer new samples there are, the higher the old-data ratio must be, so that a small new set cannot dominate training. Hold the ratio at 5x once the pool reaches 500+ samples.
Old samples are randomly sampled from the original 25K each time, ensuring broad coverage.
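
The mixing table can be sketched in Python as below. This is a minimal illustration, not the project's implementation: the function names are hypothetical, treating each table row as a lower bound for its tier is an assumption (the table only lists four pool sizes), and the 3,000 cap is taken from the fine-tuning parameters in section 3.

```python
import random

# (minimum pool size, old-data multiplier), copied from the mixing table.
# Assumption: each row is the lower bound of a tier.
MIXING_TIERS = [(500, 5), (200, 10), (50, 20), (0, 50)]


def old_sample_count(n_new, anchor_size=25_000, cap=3_000):
    """Return how many anchor (old) samples to mix with n_new new samples."""
    if n_new <= 0:
        return 0
    multiplier = next(m for threshold, m in MIXING_TIERS if n_new >= threshold)
    return min(n_new * multiplier, cap, anchor_size)


def sample_anchor(anchor_paths, n_new, seed=None):
    """Draw a fresh random subset of the anchor dataset for one fine-tune run."""
    rng = random.Random(seed)
    k = old_sample_count(n_new, anchor_size=len(anchor_paths))
    return rng.sample(anchor_paths, k)
```

Leaving `seed` unset gives a different subset on every run, which is what provides broad coverage of the original 25K over time; fixing it makes a single run reproducible.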
## 2. Model Version Management
```
base_v1.pt (original 25K training)
+-- ft_v1.1.pt (base + fine-tune batch 1)
+-- ft_v1.2.pt (base + fine-tune batch 1+2)
+-- ...
When fine-tune pool reaches 2000+ samples:
base_v2.pt (original 25K + all accumulated samples, trained from scratch)
+-- ft_v2.1.pt
+-- ...
```
CRITICAL: Never chain fine-tunes (ft_v1.1 -> ft_v1.2 -> ft_v1.3). Always start from the base model to avoid cumulative drift.
## 3. Fine-Tuning Parameters
```yaml
base_model: best.pt # always start from base model
epochs: 10 # few epochs are sufficient
lr0: 0.001 # 1/10 of base training lr
freeze: 10 # freeze first 10 backbone layers
warmup_epochs: 1
cos_lr: true
# data mixing
new_samples: all # entire fine-tune pool
old_samples: min(5x_new, 3000) # old data sampling at the stable 5x ratio (use the higher multipliers in the mixing table for pools under 500), cap at 3000
```
### Why These Settings
| Parameter | Rationale |
|-----------|-----------|
| `epochs: 10` | Enough for a small mixed dataset; more epochs raise the risk of overfitting |
| `lr0: 0.001` | Low learning rate preserves base model knowledge |
| `freeze: 10` | Backbone features are general; only fine-tune detection head and later layers |
| `cos_lr: true` | Smooth decay prevents sharp weight updates |
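
As a rough sketch, these settings map onto the Ultralytics Python API as follows. The helper names, the dataset YAML path, and the run name are hypothetical; `epochs`, `lr0`, `freeze`, `warmup_epochs`, `cos_lr`, `data`, and `name` are standard Ultralytics training arguments.

```python
def fine_tune_args(data_yaml, run_name):
    """Training arguments matching the settings table (hypothetical helper)."""
    return {
        "data": data_yaml,      # dataset YAML describing the mixed old + new data
        "epochs": 10,
        "lr0": 0.001,
        "freeze": 10,           # freeze the first 10 layers
        "warmup_epochs": 1,
        "cos_lr": True,
        "name": run_name,
    }


def fine_tune(base_weights, data_yaml, run_name):
    """Fine-tune from the base weights, never from a previous fine-tune."""
    # Imported lazily so fine_tune_args() works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(base_weights)
    return model.train(**fine_tune_args(data_yaml, run_name))
```

A call such as `fine_tune("base_v1.pt", "mixed.yaml", "ft_v1.2")` would then start every run from the same base weights.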
## 4. Deployment Gating (Most Important)
Every fine-tuned model MUST pass three gates before deployment:
### Gate 1: Regression Validation
Run evaluation on the original test set (held out from the 25K training data).
| mAP50 Change | Action |
|-------------|--------|
| Drop < 1% | PASS - deploy |
| Drop 1-3% | REVIEW - human inspection required |
| Drop > 3% | REJECT - do not deploy |
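
A minimal sketch of this decision, assuming mAP50 values on a 0-1 scale and reading the thresholds as absolute percentage points (the table does not say whether the drop is absolute or relative). The function name is hypothetical.

```python
def regression_gate(base_map50, new_map50):
    """Classify a fine-tuned model by its mAP50 drop on the original test set."""
    drop = (base_map50 - new_map50) * 100  # drop in percentage points
    if drop < 1.0:
        return "PASS"      # includes the case where mAP50 improved
    if drop <= 3.0:
        return "REVIEW"
    return "REJECT"
```

For example, a model going from 0.95 to 0.93 mAP50 loses about two points and would be sent for human inspection.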
### Gate 2: New Sample Validation
Run inference on the new failure documents.
| Detection Rate | Action |
|---------------|--------|
| > 80% correct | PASS |
| <= 80% correct | REVIEW - check label quality or increase training |
### Gate 3: A/B Comparison (Optional)
Sample 100 production documents, run both old and new models:
- New model must not be worse on any field type
- Compare per-class mAP to detect targeted regressions
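
One way to make the per-class comparison concrete is sketched below. It assumes per-class AP values have already been computed for both models and are passed in as dictionaries; the function name, the `tolerance` parameter, and the class names in the example are hypothetical.

```python
def per_class_regressions(old_ap, new_ap, tolerance=0.0):
    """Return {class_name: drop} for classes where the new model is worse.

    A class missing from new_ap is treated as AP 0.0, i.e. a full regression.
    """
    regressions = {}
    for name, old_value in old_ap.items():
        drop = old_value - new_ap.get(name, 0.0)
        if drop > tolerance:
            regressions[name] = drop
    return regressions
```

With `old_ap = {"invoice_number": 0.95, "amount": 0.90}` and `new_ap = {"invoice_number": 0.96, "amount": 0.85}`, only `amount` is reported, even though the average over both classes barely moves.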
## 5. Fine-Tuning Frequency
| Strategy | Trigger | Recommendation |
|----------|---------|---------------|
| **By volume (recommended)** | Pool reaches 50+ new samples | Best signal-to-noise ratio |
| By schedule | Weekly or monthly | Predictable but may trigger with insufficient data |
| By performance | Monitored accuracy drops below threshold | Reactive, requires monitoring infrastructure |
Do NOT fine-tune daily with fewer than 50 samples. The noise outweighs the signal.
## 6. Complete Workflow
```
User marks failed document
|
v
Human reviews and labels annotations
|
v
Add to fine-tune pool
|
v
Pool >= 50 samples? --NO--> Wait for more samples
|
YES
|
v
Prepare mixed dataset:
- All samples from fine-tune pool
- Random sample 5x from original 25K
|
v
Fine-tune from base.pt:
- 10 epochs
- lr0 = 0.001
- freeze first 10 layers
|
v
Gate 1: Original test set mAP drop < 1%?
|
PASS
|
v
Gate 2: New sample detection rate > 80%?
|
PASS
|
v
Deploy new model, retain old model for rollback
|
v
Pool accumulated 2000+ samples?
|
YES --> Merge all data, train new base from scratch
```
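
The control flow above can be sketched as a single function. This is an illustration under assumptions: `fine_tune` and `evaluate` are hypothetical callables supplied by the caller, `evaluate` is assumed to return the mAP50 drop in percentage points and the new-sample detection rate as a fraction, and the returned strings are placeholder labels.

```python
def run_cycle(pool_size, fine_tune, evaluate):
    """Walk one pass of the workflow and return the resulting action."""
    if pool_size < 50:
        return "wait"                       # not enough new samples yet
    candidate = fine_tune()                 # always trained from the base model
    map50_drop, new_sample_rate = evaluate(candidate)
    if map50_drop >= 1.0:
        return "hold: gate 1"               # human review or rejection
    if new_sample_rate <= 0.80:
        return "hold: gate 2"
    if pool_size >= 2000:
        return "deploy, then retrain base"  # merge all data, train from scratch
    return "deploy"
```

Passing the training and evaluation steps in as callables keeps the gating logic testable without a GPU or a real dataset.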
## 7. Monitoring in Production
Track these metrics continuously:
| Metric | Purpose | Alert Threshold |
|--------|---------|----------------|
| Detection rate per field | Catch field-specific regressions | < 90% for any field |
| Average confidence score | Detect model uncertainty drift | Drop > 5% from baseline |
| User-reported failures / week | Measure improvement trend | Increasing over 3 weeks |
| Inference latency | Ensure model size hasn't bloated | > 2x baseline |
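
The alert thresholds in this table could be checked along these lines. The sketch makes two assumptions the table leaves open: the 5% confidence drop is read as relative to the baseline, and "increasing over 3 weeks" is read as three consecutive week-over-week increases. The function and parameter names are hypothetical.

```python
def monitoring_alerts(field_rates, avg_conf, baseline_conf,
                      weekly_failures, latency_ms, baseline_latency_ms):
    """Return a list of human-readable alerts for the current metrics."""
    alerts = []
    for field, rate in field_rates.items():
        if rate < 0.90:
            alerts.append(f"detection rate below 90% for field '{field}'")
    if avg_conf < baseline_conf * 0.95:
        alerts.append("average confidence dropped more than 5% from baseline")
    recent = weekly_failures[-4:]  # four data points give three week-over-week steps
    if len(recent) == 4 and all(b > a for a, b in zip(recent, recent[1:])):
        alerts.append("user-reported failures increased three weeks in a row")
    if latency_ms > 2 * baseline_latency_ms:
        alerts.append("inference latency above 2x baseline")
    return alerts
```

An empty list means no threshold was crossed; any entry is a signal to inspect the currently deployed model or roll back to the retained previous version.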
## 8. Summary of Rules
| Rule | Practice |
|------|----------|
| Never chain fine-tunes | Always start from base.pt |
| Never use only new data | Must mix with old data |
| Never fine-tune on < 50 samples | Accumulate before triggering |
| Never auto-deploy | Must pass gating validation |
| Never discard old models | Retain versions for rollback |
| Periodically retrain base | Merge all data at 2000+ new samples |
| Always human-review labels | Bad labels are worse than no labels |


@@ -546,7 +546,7 @@ Request:
"description": "First training run with 500 documents",
"document_ids": ["uuid1", "uuid2", "uuid3"],
"config": {
"model_name": "yolo11n.pt",
"model_name": "yolo26s.pt",
"epochs": 100,
"batch_size": 16,
"image_size": 640
@@ -1036,7 +1036,7 @@ Response:
| | Name: [Training Run 2024-01____________] | |
| | Description: [First training with 500 documents_________] | |
| | | |
| | Base Model: [yolo11n.pt v] Epochs: [100] Batch: [16] | |
| | Base Model: [yolo26s.pt v] Epochs: [100] Batch: [16] | |
| | Image Size: [640] Device: [GPU 0 v] | |
| | | |
| | [ ] Schedule for later: [2024-01-20] [22:00] | |
@@ -1088,7 +1088,7 @@ Response:
| | - Recall: 92% | |
| | | |
| | Configuration: | |
| | - Base: yolo11n.pt Epochs: 100 Batch: 16 Size: 640 | |
| | - Base: yolo26s.pt Epochs: 100 Batch: 16 Size: 640 | |
| | | |
| | Documents Used: [View 600 documents] | |
| +--------------------------------------------------------------+ |


@@ -27,7 +27,7 @@ flowchart TD
I --> I1{--resume?}
I1 -- Yes --> I2[Load last.pt checkpoint]
I1 -- No --> I3[Load pretrained model\ne.g. yolo11n.pt]
I1 -- No --> I3[Load pretrained model\ne.g. yolo26s.pt]
I2 --> J[Configure Training]
I3 --> J