Add claude config

.claude/commands/eval.md (new file, 174 lines)
@@ -0,0 +1,174 @@

# Eval Command

Evaluate model performance and field extraction accuracy.

## Usage

`/eval [model|accuracy|compare|report]`

## Model Evaluation

`/eval model`

Evaluate YOLO model performance on the test dataset:

```bash
# Run model evaluation
python -m src.cli.train --model runs/train/invoice_fields/weights/best.pt --eval-only

# Or use ultralytics directly
yolo val model=runs/train/invoice_fields/weights/best.pt data=data.yaml
```

Output:

```
Model Evaluation: invoice_fields/best.pt
========================================
mAP@0.5: 93.5%
mAP@0.5-0.95: 83.0%

Per-class AP:
- invoice_number: 95.2%
- invoice_date: 94.8%
- invoice_due_date: 93.1%
- ocr_number: 91.5%
- bankgiro: 92.3%
- plusgiro: 90.8%
- amount: 88.7%
- supplier_org_num: 85.2%
- payment_line: 82.4%
- customer_number: 81.1%
```
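
These numbers can also be pulled programmatically. A minimal sketch using the ultralytics Python API, assuming `data.yaml` defines a test split (drop `split="test"` to validate on the val split instead):

```python
from ultralytics import YOLO

# Load the trained detector and validate it on the test split
model = YOLO("runs/train/invoice_fields/weights/best.pt")
metrics = model.val(data="data.yaml", split="test")

print(f"mAP@0.5:      {metrics.box.map50:.1%}")
print(f"mAP@0.5-0.95: {metrics.box.map:.1%}")

# Per-class AP at IoU 0.5, reported in class-index order
for idx, ap in zip(metrics.box.ap_class_index, metrics.box.ap50):
    print(f"- {model.names[idx]}: {ap:.1%}")
```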

## Accuracy Evaluation

`/eval accuracy`

Evaluate field extraction accuracy against ground truth:

```bash
# Run accuracy evaluation on labeled data
python -m src.cli.infer --model runs/train/invoice_fields/weights/best.pt \
    --input ~/invoice-data/test/*.pdf \
    --ground-truth ~/invoice-data/test/labels.csv \
    --output eval_results.json
```

Output:

```
Field Extraction Accuracy
=========================
Documents tested: 500

Per-field accuracy:
- InvoiceNumber: 98.9% (494/500)
- InvoiceDate: 95.5% (478/500)
- InvoiceDueDate: 95.9% (480/500)
- OCR: 99.1% (496/500)
- Bankgiro: 99.0% (495/500)
- Plusgiro: 99.4% (497/500)
- Amount: 91.3% (457/500)
- supplier_org: 78.2% (391/500)

Overall: 94.8%
```
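
Conceptually, each score is an exact-match comparison of extracted values against the labels. A rough sketch of that scoring step; the shapes of `eval_results.json` (a filename-to-fields mapping) and `labels.csv` (one row per document with a `filename` column) are assumptions for illustration, not a documented format:

```python
import csv
import json
from collections import defaultdict

# Predictions: assumed shape {"doc.pdf": {"InvoiceNumber": "12345", ...}, ...}
with open("eval_results.json") as f:
    predictions = json.load(f)

correct: dict[str, int] = defaultdict(int)
total: dict[str, int] = defaultdict(int)

# Ground truth: assumed one row per document, keyed by a "filename" column
with open("labels.csv", newline="") as f:
    for row in csv.DictReader(f):
        doc = row.pop("filename")
        for field, truth in row.items():
            total[field] += 1
            predicted = str(predictions.get(doc, {}).get(field, ""))
            if predicted.strip() == truth.strip():
                correct[field] += 1

for field in total:
    print(f"- {field}: {correct[field] / total[field]:.1%} ({correct[field]}/{total[field]})")
print(f"Overall: {sum(correct.values()) / sum(total.values()):.1%}")
```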

## Compare Models

`/eval compare`

Compare two model versions:

```bash
# Compare old vs new model
python -m src.cli.eval compare \
    --model-a runs/train/invoice_v1/weights/best.pt \
    --model-b runs/train/invoice_v2/weights/best.pt \
    --test-data ~/invoice-data/test/
```

Output:

```
Model Comparison
================
             Model A    Model B    Delta
mAP@0.5:     91.2%      93.5%      +2.3%
Accuracy:    92.1%      94.8%      +2.7%
Speed (ms):  1850       1520       -330

Per-field improvements:
- amount: +4.2%
- payment_line: +3.8%
- customer_num: +2.1%

Recommendation: Deploy Model B
```
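
At the detection level, a comparison like this can be reproduced by validating both checkpoints on identical data with the ultralytics API. A sketch; note the speed figure here is ultralytics' per-image inference time, which will differ from the full PDF-pipeline timing shown above:

```python
from ultralytics import YOLO

def evaluate(weights: str) -> dict:
    """Validate one checkpoint and return the headline numbers."""
    metrics = YOLO(weights).val(data="data.yaml", split="test")
    return {
        "map50": metrics.box.map50,
        "ms": metrics.speed["inference"],  # per-image inference time
    }

a = evaluate("runs/train/invoice_v1/weights/best.pt")
b = evaluate("runs/train/invoice_v2/weights/best.pt")

print(f"mAP@0.5:    {a['map50']:.1%}   {b['map50']:.1%}   {b['map50'] - a['map50']:+.1%}")
print(f"Speed (ms): {a['ms']:.0f}   {b['ms']:.0f}   {b['ms'] - a['ms']:+.0f}")
```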

## Generate Report

`/eval report`

Generate a comprehensive evaluation report:

```bash
python -m src.cli.eval report --output eval_report.md
```

Output:

```markdown
# Evaluation Report
Generated: 2026-01-25

## Model Performance
- Model: runs/train/invoice_fields/weights/best.pt
- mAP@0.5: 93.5%
- Training samples: 9,738

## Field Extraction Accuracy
| Field | Accuracy | Errors |
|-------|----------|--------|
| InvoiceNumber | 98.9% | 6 |
| Amount | 91.3% | 43 |
...

## Error Analysis
### Common Errors
1. Amount: OCR misreads comma as period
2. supplier_org: Missing from some invoices
3. payment_line: Partially obscured by stamps

## Recommendations
1. Add more training data for low-accuracy fields
2. Implement OCR error correction for amounts
3. Consider confidence threshold tuning
```
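
A sketch of how the report body could be assembled from the accuracy results; `field_stats` is a hypothetical structure mirroring the numbers shown earlier:

```python
from datetime import date

# Hypothetical per-field results: field name -> (correct, total)
field_stats = {"InvoiceNumber": (494, 500), "Amount": (457, 500)}

lines = [
    "# Evaluation Report",
    f"Generated: {date.today().isoformat()}",
    "",
    "## Field Extraction Accuracy",
    "| Field | Accuracy | Errors |",
    "|-------|----------|--------|",
]
for field, (ok, total) in field_stats.items():
    lines.append(f"| {field} | {ok / total:.1%} | {total - ok} |")

with open("eval_report.md", "w") as f:
    f.write("\n".join(lines) + "\n")
```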

## Quick Commands

```bash
# Evaluate model metrics
yolo val model=runs/train/invoice_fields/weights/best.pt

# Test inference on a sample
python -m src.cli.infer --input sample.pdf --output result.json --gpu

# Check test coverage
pytest --cov=src --cov-report=html
```

## Evaluation Metrics

| Metric | Target | Current |
|--------|--------|---------|
| mAP@0.5 | >90% | 93.5% |
| Overall Accuracy | >90% | 94.8% |
| Test Coverage | >60% | 37% |
| Tests Passing | 100% | 100% |

## When to Evaluate

- After training a new model
- Before deploying to production
- After adding new training data
- When accuracy complaints arise
- As part of weekly performance monitoring