WIP

2026-02-11 23:40:38 +01:00
parent f1a7bfe6b7
commit ad5ed46b4c
117 changed files with 5741 additions and 7669 deletions
--- a/docs/FORTNOX_INTEGRATION_SPEC.md
+++ b/docs/FORTNOX_INTEGRATION_SPEC.md
@@ -32,7 +32,7 @@

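 ### 1.1 Project Background

-Invoice Master is an automatic invoice field extraction system built on YOLOv11 + PaddleOCR, currently reaching 94.8% accuracy. This design integrates Invoice Master as a plugin/extension for the Fortnox accounting software, enabling seamless invoice data import.
+Invoice Master is an automatic invoice field extraction system built on YOLO26 + PaddleOCR, currently reaching 94.8% accuracy. This design integrates Invoice Master as a plugin/extension for the Fortnox accounting software, enabling seamless invoice data import.

 ### 1.2 Goals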

--- a/docs/aws-deployment-guide.md
+++ b/docs/aws-deployment-guide.md
@@ -500,7 +500,7 @@ estimator = PyTorch(
    hyperparameters={
        "epochs": 100,
        "batch-size": 16,
-        "model": "yolo11n.pt"
+        "model": "yolo26s.pt"
    }
 )
 ```
--- a/docs/azure-deployment-guide.md
+++ b/docs/azure-deployment-guide.md
@@ -152,7 +152,7 @@ rclone mount azure:training-images Z: --vfs-cache-mode full
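 ### Recommended: Container Apps (CPU)

 For YOLO inference, **a CPU is sufficient**; no GPU is needed:
-- YOLOv11n inference on CPU takes ~200-500ms
+- YOLO26s inference on CPU takes ~200-500ms
 - Much cheaper than a GPU; well suited to low-to-medium traffic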

 ```yaml
@@ -335,7 +335,7 @@ az containerapp create \
 │   ~$30/month          │       │   ~$1-5/training run  │       │                       │
 │                       │       │                       │       │                       │
 │ ┌───────────────────┐ │       │ ┌───────────────────┐ │       │ ┌───────────────────┐ │
-│ │ FastAPI + YOLO    │ │       │ │ YOLOv11 Training  │ │       │ │ React/Vue frontend│ │
+│ │ FastAPI + YOLO    │ │       │ │ YOLO26 Training   │ │       │ │ React/Vue frontend│ │
 │ │ /api/v1/infer     │ │       │ │ 100 epochs        │ │       │ │ Invoice upload UI │ │
 │ └───────────────────┘ │       │ └───────────────────┘ │       │ └───────────────────┘ │
 └───────────┬───────────┘       └───────────┬───────────┘       └───────────┬───────────┘
--- a/docs/fine-tuning-best-practices.md
+++ b/docs/fine-tuning-best-practices.md
@@ -0,0 +1,185 @@
+# YOLO Model Fine-Tuning Best Practices
+
+Production guide for continuous fine-tuning of YOLO object detection models with user feedback.
+
+## Overview
+
+When users report failed detections, those documents are collected, reviewed, and used to incrementally improve the model without degrading performance on existing data.
+
+Key risks:
+- **Catastrophic forgetting**: model forgets original training after fine-tuning on small new data
+- **Cumulative drift**: repeated fine-tuning sessions cause progressive degradation
+- **Overfitting**: few samples + many epochs = memorizing noise
+
+## 1. Data Management
+
+```
+Original training set (25K) --> permanently retained as "anchor dataset"
+         |
+User-reported failures --> human review & labeling --> "fine-tune pool"
+         |
+Fine-tune pool accumulates over time, never deleted
+```
+
+Every new sample MUST be human-verified before entering the fine-tune pool. Incorrect labels are more harmful than no labels.
+
+### Data Mixing Ratios
+
+| Accumulated New Samples | Old Data Multiplier | Total Training Size |
+|------------------------|--------------------|--------------------|
+| 10                     | 50x (500)          | 510                |
+| 50                     | 20x (1,000)        | 1,050              |
+| 200                    | 10x (2,000)        | 2,200              |
+| 500+                   | 5x (2,500)         | 3,000              |
+
+Principle: the fewer the new samples, the higher the old-data multiplier. The multiplier stabilizes at 5x once the pool reaches 500+ samples.
+
+Old samples are randomly sampled from the original 25K each time, ensuring broad coverage.
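The mixing table above can be sketched in Python as follows. This is a minimal illustration, not code from this commit: `MIX_TIERS`, `old_sample_count`, and `build_mixed_dataset` are hypothetical names, the step-function behavior between the listed pool sizes is an assumption, and the 3,000 cap is borrowed from the `old_samples` line in the fine-tuning parameters.

```python
import random

# Tiers copied from the mixing table: (minimum pool size, old-data multiplier).
# Behavior between the listed sizes is an assumption (step function).
MIX_TIERS = [(500, 5), (200, 10), (50, 20), (0, 50)]
OLD_SAMPLE_CAP = 3000  # assumption: cap taken from the fine-tuning config


def old_sample_count(n_new: int) -> int:
    """Return how many anchor-set samples to mix with n_new pool samples."""
    if n_new <= 0:
        raise ValueError("fine-tune pool must contain at least one sample")
    for min_size, multiplier in MIX_TIERS:
        if n_new >= min_size:
            return min(n_new * multiplier, OLD_SAMPLE_CAP)
    raise AssertionError("unreachable: the last tier starts at 0")


def build_mixed_dataset(pool, anchor, seed=None):
    """Combine the whole pool with a fresh random draw from the anchor set."""
    rng = random.Random(seed)
    anchor = list(anchor)
    k = min(old_sample_count(len(pool)), len(anchor))
    return list(pool) + rng.sample(anchor, k)
```

Drawing a new random subset on every run, rather than reusing one fixed subset, is what gives the broad coverage of the original 25K described above.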
+
+## 2. Model Version Management
+
+```
+base_v1.pt (original 25K training)
+  +-- ft_v1.1.pt (base + fine-tune batch 1)
+  +-- ft_v1.2.pt (base + fine-tune batch 1+2)
+  +-- ...
+
+When fine-tune pool reaches 2000+ samples:
+base_v2.pt (original 25K + all accumulated samples, trained from scratch)
+  +-- ft_v2.1.pt
+  +-- ...
+```
+
+CRITICAL: Never chain fine-tunes (ft_v1.1 -> ft_v1.2 -> ft_v1.3). Always start from the base model to avoid cumulative drift.
+
+## 3. Fine-Tuning Parameters
+
+```yaml
+base_model: best.pt           # always start from base model
+epochs: 10                    # few epochs are sufficient
+lr0: 0.001                    # 1/10 of base training lr
+freeze: 10                    # freeze first 10 backbone layers
+warmup_epochs: 1
+cos_lr: true
+
+# data mixing
+new_samples: all              # entire fine-tune pool
+old_samples: min(5x_new, 3000) # old data sampling, cap at 3000
+```
+
+### Why These Settings
+
+| Parameter | Rationale |
+|-----------|-----------|
+| `epochs: 10` | Enough for small datasets; keeping the count low limits overfitting |
+| `lr0: 0.001` | Low learning rate preserves base model knowledge |
+| `freeze: 10` | Backbone features are general; only fine-tune detection head and later layers |
+| `cos_lr: true` | Smooth decay prevents sharp weight updates |
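As a rough sketch, the settings above could be passed to a training call like this. It assumes the Ultralytics Python API is the training framework (the commit does not show the actual training script); `fine_tune_args`, `run_fine_tune`, and the dataset YAML path are hypothetical.

```python
def fine_tune_args(data_yaml: str) -> dict:
    """Training overrides mirroring the YAML settings above."""
    return {
        "data": data_yaml,    # dataset YAML describing the mixed old + new data
        "epochs": 10,         # few epochs are sufficient
        "lr0": 0.001,         # 1/10 of the base training learning rate
        "freeze": 10,         # freeze the first 10 layers
        "warmup_epochs": 1,
        "cos_lr": True,       # cosine learning-rate schedule
    }


def run_fine_tune(base_weights: str, data_yaml: str):
    """Fine-tune from the base weights; requires the `ultralytics` package."""
    # Imported lazily so fine_tune_args() can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(base_weights)  # always the base model, never a previous fine-tune
    return model.train(**fine_tune_args(data_yaml))
```

A call such as `run_fine_tune("best.pt", "mixed.yaml")` would start one fine-tuning session; `mixed.yaml` stands in for whatever dataset file describes the mixed data.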
+
+## 4. Deployment Gating (Most Important)
+
+Every fine-tuned model MUST pass Gates 1 and 2 before deployment; Gate 3 is an optional extra check:
+
+### Gate 1: Regression Validation
+
+Run evaluation on the original test set (held out from the 25K training data).
+
+| mAP50 Change | Action |
+|-------------|--------|
+| Drop < 1%   | PASS - proceed to Gate 2 |
+| Drop 1-3%   | REVIEW - human inspection required |
+| Drop > 3%   | REJECT - do not deploy |
+
+### Gate 2: New Sample Validation
+
+Run inference on the new failure documents.
+
+| Detection Rate | Action |
+|---------------|--------|
+| > 80% correct | PASS |
+| <= 80% correct | REVIEW - check label quality or increase training |
+
+### Gate 3: A/B Comparison (Optional)
+
+Sample 100 production documents, run both old and new models:
+- New model must not be worse on any field type
+- Compare per-class mAP to detect targeted regressions
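The two mandatory gates can be expressed as small decision functions. This sketch makes two assumptions that the tables leave open: the mAP50 drop is measured in absolute points (0.95 to 0.93 is a 2% drop), and a detection rate of exactly 80% is treated as REVIEW. The function names are hypothetical.

```python
def gate1_regression(base_map50: float, new_map50: float) -> str:
    """Gate 1: compare mAP50 on the original held-out test set."""
    drop = base_map50 - new_map50  # assumption: absolute drop, not relative
    if drop < 0.01:
        return "PASS"
    if drop <= 0.03:
        return "REVIEW"
    return "REJECT"


def gate2_new_samples(correct: int, total: int) -> str:
    """Gate 2: share of new failure documents now detected correctly."""
    if total <= 0:
        raise ValueError("need at least one new sample to evaluate")
    # Assumption: exactly 80% is not enough to pass.
    return "PASS" if correct / total > 0.80 else "REVIEW"


def may_deploy(base_map50: float, new_map50: float, correct: int, total: int) -> bool:
    """Allow automatic promotion only when both gates return PASS."""
    return (
        gate1_regression(base_map50, new_map50) == "PASS"
        and gate2_new_samples(correct, total) == "PASS"
    )
```

Any REVIEW result blocks promotion here, leaving the decision to a human, in line with the rule that models are never auto-deployed without passing the gates.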
+
+## 5. Fine-Tuning Frequency
+
+| Strategy | Trigger | Recommendation |
+|----------|---------|---------------|
+| **By volume (recommended)** | Pool reaches 50+ new samples | Best signal-to-noise ratio |
+| By schedule | Weekly or monthly | Predictable but may trigger with insufficient data |
+| By performance | Monitored accuracy drops below threshold | Reactive, requires monitoring infrastructure |
+
+Do NOT fine-tune daily with fewer than 50 samples. The noise outweighs the signal.
+
+## 6. Complete Workflow
+
+```
+User marks failed document
+       |
+       v
+Human reviews and labels annotations
+       |
+       v
+Add to fine-tune pool
+       |
+       v
+Pool >= 50 samples? --NO--> Wait for more samples
+       |
+      YES
+       |
+       v
+Prepare mixed dataset:
+  - All samples from fine-tune pool
+  - Random sample from original 25K (multiplier per the Data Mixing Ratios table)
+       |
+       v
+Fine-tune from base.pt:
+  - 10 epochs
+  - lr0 = 0.001
+  - freeze first 10 layers
+       |
+       v
+Gate 1: Original test set mAP drop < 1%?
+       |
+      PASS
+       |
+       v
+Gate 2: New sample detection rate > 80%?
+       |
+      PASS
+       |
+       v
+Deploy new model, retain old model for rollback
+       |
+       v
+Pool accumulated 2000+ samples?
+       |
+      YES --> Merge all data, train new base from scratch
+```
+
+## 7. Monitoring in Production
+
+Track these metrics continuously:
+
+| Metric | Purpose | Alert Threshold |
+|--------|---------|----------------|
+| Detection rate per field | Catch field-specific regressions | < 90% for any field |
+| Average confidence score | Detect model uncertainty drift | Drop > 5% from baseline |
+| User-reported failures / week | Measure improvement trend | Increasing over 3 weeks |
+| Inference latency | Ensure model size hasn't bloated | > 2x baseline |
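A minimal sketch of how the alert thresholds in this table might be checked. The `Snapshot` structure, the field names, and `check_alerts` are hypothetical. Two readings are assumptions: the confidence drop is taken as relative to baseline, and "increasing over 3 weeks" is taken as three strictly increasing weekly counts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snapshot:
    detection_rate: Dict[str, float]  # per-field detection rate, 0..1
    avg_confidence: float             # mean confidence score, 0..1
    weekly_failures: List[int]        # user-reported failures, oldest to newest
    latency_ms: float                 # inference latency


def check_alerts(current: Snapshot, baseline_confidence: float,
                 baseline_latency_ms: float) -> List[str]:
    """Return the names of all metrics that cross their alert threshold."""
    alerts = []
    for field, rate in sorted(current.detection_rate.items()):
        if rate < 0.90:
            alerts.append(f"detection_rate:{field}")
    # Assumption: a 5% drop is relative to the baseline confidence.
    if current.avg_confidence < baseline_confidence * 0.95:
        alerts.append("avg_confidence")
    # Assumption: the last three weekly counts must be strictly increasing.
    recent = current.weekly_failures[-3:]
    if len(recent) == 3 and recent[0] < recent[1] < recent[2]:
        alerts.append("weekly_failures")
    if current.latency_ms > 2 * baseline_latency_ms:
        alerts.append("latency")
    return alerts
```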
+
+## 8. Summary of Rules
+
+| Rule | Practice |
+|------|----------|
+| Never chain fine-tunes | Always start from base.pt |
+| Never use only new data | Must mix with old data |
+| Never fine-tune on < 50 samples | Accumulate before triggering |
+| Never auto-deploy | Must pass gating validation |
+| Never discard old models | Retain versions for rollback |
+| Periodically retrain base | Merge all data at 2000+ new samples |
+| Always human-review labels | Bad labels are worse than no labels |
--- a/docs/product-plan-v2.md
+++ b/docs/product-plan-v2.md
@@ -546,7 +546,7 @@ Request:
  "description": "First training run with 500 documents",
  "document_ids": ["uuid1", "uuid2", "uuid3"],
  "config": {
-    "model_name": "yolo11n.pt",
+    "model_name": "yolo26s.pt",
    "epochs": 100,
    "batch_size": 16,
    "image_size": 640
@@ -1036,7 +1036,7 @@ Response:
 |  | Name: [Training Run 2024-01____________]                      | |
 |  | Description: [First training with 500 documents_________]     | |
 |  |                                                               | |
-|  | Base Model: [yolo11n.pt v]   Epochs: [100]   Batch: [16]     | |
+|  | Base Model: [yolo26s.pt v]   Epochs: [100]   Batch: [16]     | |
 |  | Image Size: [640]            Device: [GPU 0 v]               | |
 |  |                                                               | |
 |  | [ ] Schedule for later: [2024-01-20] [22:00]                 | |
@@ -1088,7 +1088,7 @@ Response:
 |  | - Recall: 92%                                                | |
 |  |                                                               | |
 |  | Configuration:                                                | |
-|  | - Base: yolo11n.pt   Epochs: 100   Batch: 16   Size: 640    | |
+|  | - Base: yolo26s.pt   Epochs: 100   Batch: 16   Size: 640    | |
 |  |                                                               | |
 |  | Documents Used: [View 600 documents]                          | |
 |  +--------------------------------------------------------------+ |
--- a/docs/training-flow.mmd
+++ b/docs/training-flow.mmd
@@ -27,7 +27,7 @@ flowchart TD

    I --> I1{--resume?}
    I1 -- Yes --> I2[Load last.pt checkpoint]
-    I1 -- No --> I3[Load pretrained model\ne.g. yolo11n.pt]
+    I1 -- No --> I3[Load pretrained model\ne.g. yolo26s.pt]

    I2 --> J[Configure Training]
    I3 --> J