WIP
@@ -32,7 +32,7 @@
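
 ### 1.1 Project Background

-Invoice Master is an automatic invoice field extraction system built on YOLOv11 + PaddleOCR, with a current accuracy of 94.8%. This design integrates Invoice Master into the Fortnox accounting software as a plugin/extension, enabling seamless invoice data import.
+Invoice Master is an automatic invoice field extraction system built on YOLO26 + PaddleOCR, with a current accuracy of 94.8%. This design integrates Invoice Master into the Fortnox accounting software as a plugin/extension, enabling seamless invoice data import.

 ### 1.2 Goals
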
@@ -500,7 +500,7 @@ estimator = PyTorch(
     hyperparameters={
         "epochs": 100,
         "batch-size": 16,
-        "model": "yolo11n.pt"
+        "model": "yolo26s.pt"
     }
 )
 ```

@@ -152,7 +152,7 @@ rclone mount azure:training-images Z: --vfs-cache-mode full
 ### Recommended: Container Apps (CPU)

 For YOLO inference, **a CPU is sufficient**; no GPU is needed:
-- YOLOv11n inference time on CPU: ~200-500ms
+- YOLO26s inference time on CPU: ~200-500ms
 - Much cheaper than a GPU; suitable for low-to-medium traffic

 ```yaml
@@ -335,7 +335,7 @@ az containerapp create \
 │  ~$30/month           │  │  ~$1-5/training run   │  │                       │
 │                       │  │                       │  │                       │
 │ ┌───────────────────┐ │  │ ┌───────────────────┐ │  │ ┌───────────────────┐ │
-│ │  FastAPI + YOLO   │ │  │ │ YOLOv11 Training  │ │  │ │ React/Vue frontend│ │
+│ │  FastAPI + YOLO   │ │  │ │ YOLO26 Training   │ │  │ │ React/Vue frontend│ │
 │ │  /api/v1/infer    │ │  │ │ 100 epochs        │ │  │ │ Invoice upload UI │ │
 │ └───────────────────┘ │  │ └───────────────────┘ │  │ └───────────────────┘ │
 └───────────┬───────────┘  └───────────┬───────────┘  └───────────┬───────────┘

185
docs/fine-tuning-best-practices.md
Normal file
@@ -0,0 +1,185 @@
# YOLO Model Fine-Tuning Best Practices

Production guide for continuously fine-tuning YOLO object detection models with user feedback.

## Overview

When users report failed detections, those documents are collected, reviewed, and used to improve the model incrementally without degrading performance on existing data.

Key risks:
- **Catastrophic forgetting**: the model forgets its original training after fine-tuning on a small set of new data
- **Cumulative drift**: repeated fine-tuning sessions cause progressive degradation
- **Overfitting**: few samples combined with many epochs lead the model to memorize noise

## 1. Data Management

```
Original training set (25K) --> permanently retained as "anchor dataset"
                                       |
User-reported failures --> human review & labeling --> "fine-tune pool"
                                       |
Fine-tune pool accumulates over time, never deleted
```

Every new sample MUST be human-verified before entering the fine-tune pool. Incorrect labels are more harmful than no labels.

### Data Mixing Ratios

| Accumulated New Samples | Old Data Multiplier | Total Training Size |
|-------------------------|---------------------|---------------------|
| 10                      | 50x (500)           | 510                 |
| 50                      | 20x (1,000)         | 1,050               |
| 200                     | 10x (2,000)         | 2,200               |
| 500+                    | 5x (2,500)          | 3,000               |

Principle: the fewer new samples there are, the higher the old-data ratio must be. The ratio stabilizes at 5x once the pool reaches 500+ samples.

Old samples are drawn at random from the original 25K on every run, which keeps coverage of the original distribution broad.

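The ratio table can be turned into a small helper. The following is a minimal sketch, not the project's implementation: the names `MIXING_TIERS`, `old_sample_count`, and `build_mixed_dataset` are hypothetical, the handling of pool sizes that fall between table rows is an assumption, and the 3,000 cap is taken from the parameter block in section 3.

```python
import random

# (minimum pool size, old-data multiplier), copied from the ratio table.
# Assumption: a pool size between two rows uses the row below it.
MIXING_TIERS = [(500, 5), (200, 10), (50, 20), (0, 50)]


def old_sample_count(new_count, max_old=3000, anchor_size=25_000):
    """Return how many anchor samples to mix with `new_count` new samples."""
    if new_count <= 0:
        return 0
    for min_pool, multiplier in MIXING_TIERS:
        if new_count >= min_pool:
            return min(new_count * multiplier, max_old, anchor_size)
    return 0


def build_mixed_dataset(pool, anchor, seed=None):
    """Combine the whole fine-tune pool with a fresh random draw from the anchor set."""
    pool, anchor = list(pool), list(anchor)
    rng = random.Random(seed)
    k = old_sample_count(len(pool), anchor_size=len(anchor))
    # random.sample draws without replacement, so no anchor item repeats.
    return pool + rng.sample(anchor, k)
```

Calling `build_mixed_dataset` again with a different seed redraws the old samples, which matches the "randomly sampled each time" rule above.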
## 2. Model Version Management

```
base_v1.pt (original 25K training)
  +-- ft_v1.1.pt (base + fine-tune batch 1)
  +-- ft_v1.2.pt (base + fine-tune batch 1+2)
  +-- ...

When fine-tune pool reaches 2000+ samples:
base_v2.pt (original 25K + all accumulated samples, trained from scratch)
  +-- ft_v2.1.pt
  +-- ...
```

CRITICAL: Never chain fine-tunes (ft_v1.1 -> ft_v1.2 -> ft_v1.3). Always start from the base model to avoid cumulative drift.

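The naming scheme and the "always start from base" rule can be captured in one planning function. This is an illustrative sketch only: `FineTunePlan`, `plan_next_model`, and `REBASE_THRESHOLD` are hypothetical names, and the file names simply follow the tree above.

```python
from dataclasses import dataclass
from typing import Optional

REBASE_THRESHOLD = 2000  # pool size at which a new base is trained from scratch


@dataclass(frozen=True)
class FineTunePlan:
    start_from: Optional[str]  # checkpoint to load, or None when training from scratch
    output_name: str           # file name for the resulting checkpoint


def plan_next_model(base_version, fine_tune_count, pool_size):
    """Decide which weights to start from and what to call the result."""
    if pool_size >= REBASE_THRESHOLD:
        # New base: original data plus the whole pool, no inherited weights.
        return FineTunePlan(None, f"base_v{base_version + 1}.pt")
    # Every fine-tune starts from the base, never from a previous ft_* file.
    return FineTunePlan(
        f"base_v{base_version}.pt",
        f"ft_v{base_version}.{fine_tune_count + 1}.pt",
    )
```

Because `start_from` is never an `ft_*` checkpoint, chaining cannot happen by accident.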
## 3. Fine-Tuning Parameters

```yaml
base_model: best.pt             # always start from base model
epochs: 10                      # few epochs are sufficient
lr0: 0.001                      # 1/10 of base training lr
freeze: 10                      # freeze first 10 backbone layers
warmup_epochs: 1
cos_lr: true

# data mixing
new_samples: all                # entire fine-tune pool
old_samples: min(5x_new, 3000)  # steady-state ratio for pools of 500+, capped at 3000; smaller pools use the higher multipliers in section 1
```

### Why These Settings

| Parameter      | Rationale |
|----------------|-----------|
| `epochs: 10`   | More than enough for small datasets; limits overfitting |
| `lr0: 0.001`   | Low learning rate preserves base model knowledge |
| `freeze: 10`   | Backbone features are general; only the detection head and later layers are fine-tuned |
| `cos_lr: true` | Smooth decay avoids abrupt changes in the learning rate |

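The settings above map onto keyword arguments of the Ultralytics `train` call. The sketch below is an assumption about how the project might wire this up, not its actual training script: `fine_tune_args` and `fine_tune` are hypothetical helpers, and the data YAML path is a placeholder. The explicit `optimizer` value is also an assumption. To my knowledge the default `optimizer="auto"` picks its own learning rate and ignores `lr0`, so this should be checked against the installed Ultralytics version.

```python
def fine_tune_args(data_yaml):
    """Build training arguments that mirror the parameter block above."""
    return {
        "data": data_yaml,     # dataset YAML describing the mixed dataset
        "epochs": 10,
        "lr0": 0.001,
        "freeze": 10,          # freeze the first 10 layers
        "warmup_epochs": 1,
        "cos_lr": True,
        "optimizer": "SGD",    # assumption: pinned so that lr0 is honored
    }


def fine_tune(base_weights, data_yaml):
    """Fine-tune starting from the base checkpoint (requires `ultralytics`)."""
    # Imported lazily so this module can be loaded without the dependency.
    from ultralytics import YOLO

    model = YOLO(base_weights)
    return model.train(**fine_tune_args(data_yaml))
```

A call might look like `fine_tune("base_v1.pt", "mixed_dataset.yaml")`, where both file names are placeholders.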
## 4. Deployment Gating (Most Important)

Every fine-tuned model MUST pass Gates 1 and 2 before deployment; Gate 3 is optional:

### Gate 1: Regression Validation

Run evaluation on the original test set (held out from the 25K training data).

| mAP50 Change | Action |
|--------------|--------|
| Drop < 1%    | PASS - proceed to Gate 2 |
| Drop 1-3%    | REVIEW - human inspection required |
| Drop > 3%    | REJECT - do not deploy |

### Gate 2: New Sample Validation

Run inference on the new failure documents.

| Detection Rate | Action |
|----------------|--------|
| > 80% correct  | PASS |
| <= 80% correct | REVIEW - check label quality or increase training |

### Gate 3: A/B Comparison (Optional)

Sample 100 production documents and run both the old and the new model on them:
- The new model must not be worse on any field type
- Compare per-class mAP to detect targeted regressions

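Gates 1 and 2 reduce to two threshold checks. The sketch below uses hypothetical function names and makes two assumptions the guide does not state: the mAP50 drop is measured in absolute percentage points, and a rate of exactly 80% does not pass Gate 2.

```python
def gate1_regression(base_map50, new_map50):
    """Classify the mAP50 change on the original test set (inputs in 0..1)."""
    drop_points = (base_map50 - new_map50) * 100  # assumption: percentage points
    if drop_points < 1.0:
        return "PASS"
    if drop_points <= 3.0:
        return "REVIEW"
    return "REJECT"


def gate2_new_samples(correct, total):
    """Classify the detection rate on the newly reported failure documents."""
    if total <= 0:
        raise ValueError("Gate 2 needs at least one new sample")
    return "PASS" if correct / total > 0.80 else "REVIEW"


def may_deploy(gate1_result, gate2_result):
    """Deployment is allowed only when both mandatory gates pass."""
    return gate1_result == "PASS" and gate2_result == "PASS"
```

An improved model (negative drop) passes Gate 1, since only regressions are penalized.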
## 5. Fine-Tuning Frequency

| Strategy | Trigger | Recommendation |
|----------|---------|----------------|
| **By volume (recommended)** | Pool reaches 50+ new samples | Best signal-to-noise ratio |
| By schedule | Weekly or monthly | Predictable but may trigger with insufficient data |
| By performance | Monitored accuracy drops below threshold | Reactive, requires monitoring infrastructure |

Do NOT fine-tune daily with fewer than 50 samples. The noise outweighs the signal.

## 6. Complete Workflow

```
User marks failed document
        |
        v
Human reviews and labels annotations
        |
        v
Add to fine-tune pool
        |
        v
Pool >= 50 samples? --NO--> Wait for more samples
        |
       YES
        |
        v
Prepare mixed dataset:
  - All samples from fine-tune pool
  - Random sample 5x from original 25K
        |
        v
Fine-tune from base.pt:
  - 10 epochs
  - lr0 = 0.001
  - freeze first 10 layers
        |
        v
Gate 1: Original test set mAP drop < 1%?
        |
       PASS
        |
        v
Gate 2: New sample detection rate > 80%?
        |
       PASS
        |
        v
Deploy new model, retain old model for rollback
        |
        v
Pool accumulated 2000+ samples?
        |
       YES --> Merge all data, train new base from scratch
```

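The control flow of the diagram can be sketched as a single function that receives each step as a callable. Everything here is hypothetical scaffolding under stated assumptions: `run_cycle` and its parameters are not part of the project, and the training, gating, and deployment steps are injected so the logic can be exercised without a GPU.

```python
from typing import Callable

MIN_POOL = 50       # minimum pool size before fine-tuning is triggered
REBASE_POOL = 2000  # pool size at which a new base should be trained


def run_cycle(
    pool_size: int,
    train: Callable[[str], str],
    gate1: Callable[[str], str],
    gate2: Callable[[str], str],
    deploy: Callable[[str], None],
    base_weights: str = "base.pt",
) -> str:
    """Run one fine-tuning cycle and report where it stopped."""
    if pool_size < MIN_POOL:
        return "wait"
    candidate = train(base_weights)  # always starts from the base model
    if gate1(candidate) != "PASS":
        return "blocked_gate1"
    if gate2(candidate) != "PASS":
        return "blocked_gate2"
    deploy(candidate)  # the caller keeps the previous model for rollback
    return "rebase_due" if pool_size >= REBASE_POOL else "deployed"
```

A `"rebase_due"` result means the candidate was deployed and the pool is now large enough to schedule a from-scratch base retrain.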
## 7. Monitoring in Production

Track these metrics continuously:

| Metric | Purpose | Alert Threshold |
|--------|---------|-----------------|
| Detection rate per field | Catch field-specific regressions | < 90% for any field |
| Average confidence score | Detect model uncertainty drift | Drop > 5% from baseline |
| User-reported failures / week | Measure improvement trend | Increasing over 3 weeks |
| Inference latency | Ensure model size hasn't bloated | > 2x baseline |

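The alert thresholds in the table can be evaluated with a short check. This is a sketch with a hypothetical `check_alerts` function and hypothetical alert names. It assumes the confidence drop is relative to the baseline and that "increasing over 3 weeks" means the last three weekly counts are strictly increasing.

```python
from typing import Dict, List, Sequence


def check_alerts(
    field_rates: Dict[str, float],
    avg_confidence: float,
    baseline_confidence: float,
    weekly_failures: Sequence[int],
    latency_ms: float,
    baseline_latency_ms: float,
) -> List[str]:
    """Return the names of all alert conditions that currently fire."""
    alerts = []
    for field, rate in sorted(field_rates.items()):
        if rate < 0.90:  # detection rate below 90% for this field
            alerts.append(f"detection_rate:{field}")
    if avg_confidence < baseline_confidence * 0.95:  # more than 5% relative drop
        alerts.append("confidence_drop")
    last3 = list(weekly_failures)[-3:]
    if len(last3) == 3 and last3[0] < last3[1] < last3[2]:
        alerts.append("failures_increasing")
    if latency_ms > 2 * baseline_latency_ms:
        alerts.append("latency")
    return alerts
```

An empty list means no threshold from the table is currently exceeded.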
## 8. Summary of Rules

| Rule | Practice |
|------|----------|
| Never chain fine-tunes | Always start from base.pt |
| Never use only new data | Must mix with old data |
| Never fine-tune on < 50 samples | Accumulate before triggering |
| Never auto-deploy | Must pass gating validation |
| Never discard old models | Retain versions for rollback |
| Periodically retrain base | Merge all data at 2000+ new samples |
| Always human-review labels | Bad labels are worse than no labels |

@@ -546,7 +546,7 @@ Request:
   "description": "First training run with 500 documents",
   "document_ids": ["uuid1", "uuid2", "uuid3"],
   "config": {
-    "model_name": "yolo11n.pt",
+    "model_name": "yolo26s.pt",
     "epochs": 100,
     "batch_size": 16,
     "image_size": 640
@@ -1036,7 +1036,7 @@ Response:
 | | Name: [Training Run 2024-01____________] | |
 | | Description: [First training with 500 documents_________] | |
 | | | |
-| | Base Model: [yolo11n.pt v] Epochs: [100] Batch: [16] | |
+| | Base Model: [yolo26s.pt v] Epochs: [100] Batch: [16] | |
 | | Image Size: [640] Device: [GPU 0 v] | |
 | | | |
 | | [ ] Schedule for later: [2024-01-20] [22:00] | |
@@ -1088,7 +1088,7 @@ Response:
 | | - Recall: 92% | |
 | | | |
 | | Configuration: | |
-| | - Base: yolo11n.pt Epochs: 100 Batch: 16 Size: 640 | |
+| | - Base: yolo26s.pt Epochs: 100 Batch: 16 Size: 640 | |
 | | | |
 | | Documents Used: [View 600 documents] | |
 | +--------------------------------------------------------------+ |

@@ -27,7 +27,7 @@ flowchart TD

     I --> I1{--resume?}
     I1 -- Yes --> I2[Load last.pt checkpoint]
-    I1 -- No --> I3[Load pretrained model\ne.g. yolo11n.pt]
+    I1 -- No --> I3[Load pretrained model\ne.g. yolo26s.pt]

     I2 --> J[Configure Training]
     I3 --> J
