feat: add field-specific bbox expansion strategies for YOLO training

Implement center-point based bbox scaling with directional compensation
to capture field labels that typically appear above or to the left of
field values. This improves YOLO training data quality by including
contextual information around field values.

Key changes:
- Add shared.bbox module with ScaleStrategy dataclass and expand_bbox function
- Define field-specific strategies (ocr_number, bankgiro, invoice_date, etc.)
- Support manual_mode for minimal padding (no scaling)
- Integrate expand_bbox into AnnotationGenerator
- Add FIELD_TO_CLASS mapping for field_name to class_name lookup
- Comprehensive tests with 100% coverage (45 tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Yaojia Wang
2026-02-04 22:56:52 +01:00
parent 8723ef4653
commit 0990239e9c
13 changed files with 1424 additions and 18 deletions

@@ -0,0 +1,101 @@
"""
BBox Expander Module.
Provides functions to expand bounding boxes using field-specific strategies.
Expansion is center-point based with directional compensation.
Two modes:
- Auto-label (default): Field-specific scale strategies
- Manual-label: Minimal padding only to prevent edge clipping
"""

from .scale_strategy import (
    ScaleStrategy,
    DEFAULT_STRATEGY,
    MANUAL_LABEL_STRATEGY,
    FIELD_SCALE_STRATEGIES,
)

def expand_bbox(
    bbox: tuple[float, float, float, float],
    image_width: float,
    image_height: float,
    field_type: str,
    strategies: dict[str, ScaleStrategy] | None = None,
    manual_mode: bool = False,
) -> tuple[int, int, int, int]:
    """
    Expand bbox using field-specific scale strategy.

    The expansion follows these steps:
    1. Scale bbox around center point (scale_x, scale_y)
    2. Apply directional compensation (extra_*_ratio)
    3. Clamp expansion to max_pad limits
    4. Clamp to image boundaries

    Args:
        bbox: (x0, y0, x1, y1) in pixels
        image_width: Image width for boundary clamping
        image_height: Image height for boundary clamping
        field_type: Field class_name (e.g., "ocr_number")
        strategies: Custom strategies dict, defaults to FIELD_SCALE_STRATEGIES
        manual_mode: If True, use MANUAL_LABEL_STRATEGY (minimal padding only)

    Returns:
        Expanded bbox (x0, y0, x1, y1) as integers, clamped to image bounds
    """
    x0, y0, x1, y1 = bbox
    w = x1 - x0
    h = y1 - y0

    # Get strategy based on mode
    if manual_mode:
        strategy = MANUAL_LABEL_STRATEGY
    elif strategies is None:
        strategy = FIELD_SCALE_STRATEGIES.get(field_type, DEFAULT_STRATEGY)
    else:
        strategy = strategies.get(field_type, DEFAULT_STRATEGY)

    # Step 1: Scale around center point
    cx = (x0 + x1) / 2
    cy = (y0 + y1) / 2
    new_w = w * strategy.scale_x
    new_h = h * strategy.scale_y
    nx0 = cx - new_w / 2
    nx1 = cx + new_w / 2
    ny0 = cy - new_h / 2
    ny1 = cy + new_h / 2

    # Step 2: Apply directional compensation
    nx0 -= w * strategy.extra_left_ratio
    nx1 += w * strategy.extra_right_ratio
    ny0 -= h * strategy.extra_top_ratio
    ny1 += h * strategy.extra_bottom_ratio

    # Step 3: Clamp expansion to max_pad limits (preserve asymmetry)
    left_pad = min(x0 - nx0, strategy.max_pad_x)
    right_pad = min(nx1 - x1, strategy.max_pad_x)
    top_pad = min(y0 - ny0, strategy.max_pad_y)
    bottom_pad = min(ny1 - y1, strategy.max_pad_y)

    # Ensure pads are non-negative (in case of contraction)
    left_pad = max(0, left_pad)
    right_pad = max(0, right_pad)
    top_pad = max(0, top_pad)
    bottom_pad = max(0, bottom_pad)

    nx0 = x0 - left_pad
    nx1 = x1 + right_pad
    ny0 = y0 - top_pad
    ny1 = y1 + bottom_pad

    # Step 4: Clamp to image boundaries
    nx0 = max(0, int(nx0))
    ny0 = max(0, int(ny0))
    nx1 = min(int(image_width), int(nx1))
    ny1 = min(int(image_height), int(ny1))

    return (nx0, ny0, nx1, ny1)