# Document Annotation Tool - Product Plan v2

## Table of Contents

1. [Product Requirements Document (PRD)](#1-product-requirements-document-prd)
   - Epic 1-6: Original features
   - **Epic 7: Dashboard Enhancement** (NEW)
2. [CSV Format Specification](#2-csv-format-specification)
3. [Database Schema Changes](#3-database-schema-changes)
4. [API Specification](#4-api-specification)
   - 4.1-4.2: Original endpoints
   - **4.3: Dashboard API Endpoints** (NEW)
5. [UI Wireframes (Text-Based)](#5-ui-wireframes-text-based)
   - **5.0: Dashboard Overview** (NEW)
   - 5.1-5.5: Original wireframes
6. [Implementation Phases](#6-implementation-phases)
7. [State Machine Diagrams](#7-state-machine-diagrams)

---

## 1. Product Requirements Document (PRD)

### 1.1 Overview

This enhancement adds batch upload capabilities, document lifecycle management, a manual annotation workflow with auto-label dependency, comprehensive training management, and enhanced document detail views to the Invoice Master Document Annotation Tool.

### 1.2 User Stories

#### Epic 1: Batch Upload (ZIP Support)

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-1.1 | As a user, I want to upload a ZIP file containing multiple PDFs so that I can process many documents at once | - ZIP file is extracted<br>- Each PDF is registered as a separate document<br>- Document IDs are returned for all files<br>- Invalid files are skipped with error message | P0 |
| US-1.2 | As a user, I want to include a CSV file in my ZIP for auto-labeling so that annotations are created automatically | - CSV is parsed and validated<br>- DocumentId column maps to PDF filenames<br>- Field values are stored for auto-labeling<br>- Invalid CSV rows are logged | P0 |
| US-1.3 | As a user, I want to upload a single PDF with auto-label values via API so that I can integrate with my workflow | - PDF is uploaded<br>- Auto-label values provided in JSON body<br>- Auto-labeling runs automatically<br>- Document ID returned | P0 |
| US-1.4 | As a user, I want clear feedback on batch upload progress so that I know which files succeeded or failed | - Upload progress indicator<br>- Per-file status (success/failed)<br>- Error messages for failed files<br>- Summary count displayed | P1 |

#### Epic 2: Document List and Status

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-2.1 | As a user, I want to see a list of all uploaded documents so that I can manage my annotations | - Paginated document list<br>- Shows filename, status, date<br>- Sortable columns<br>- Search/filter capability | P0 |
| US-2.2 | As a user, I want to see auto-label status for each document so that I know processing progress | - Status badge: pending, processing, completed, failed<br>- Progress indicator for processing<br>- Error message for failed | P0 |
| US-2.3 | As a user, I want to see the upload source (API vs UI) so that I can track document origin | - Source column in list<br>- Filter by source<br>- Source shown in detail view | P1 |
| US-2.4 | As a user, I want to see annotation preview for completed documents so that I can quickly review | - Thumbnail with overlaid bounding boxes<br>- Annotation count badge<br>- Click to view full detail | P1 |

#### Epic 3: Manual Annotation with Auto-Label Dependency

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-3.1 | As a user, I want to be blocked from manual annotation if auto-label is pending so that I don't lose work | - Clear message: "Auto-labeling in progress, please wait"<br>- Refresh button to check status<br>- Automatic unlock when complete | P0 |
| US-3.2 | As a user, I want to override auto-generated annotations so that I can correct errors | - Can edit any annotation<br>- Source changes from "auto" to "manual"<br>- Original auto value preserved in history<br>- Override timestamp recorded | P0 |
| US-3.3 | As a user, I want to see which annotations are manual vs auto so that I can review confidence | - Color-coded annotation badges<br>- Manual: solid border<br>- Auto: dashed border with confidence %<br>- Filter by source | P0 |
| US-3.4 | As a user, I want to accept or reject individual auto-annotations so that I can curate training data | - Accept button marks as verified<br>- Reject button removes annotation<br>- Bulk accept/reject actions | P1 |

#### Epic 4: Training Page Features

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-4.1 | As a user, I want to see all documents available for training so that I can select training data | - Filtered list (only labeled documents)<br>- Shows annotation count per document<br>- Checkbox selection<br>- Select all/none options | P0 |
| US-4.2 | As a user, I want to select specific documents for training so that I can control data quality | - Multi-select with checkboxes<br>- Selection count displayed<br>- Clear selection button<br>- Persisted selection state | P0 |
| US-4.3 | As a user, I want to see all trained models so that I can track model history | - Model list with name, date, status<br>- Document count used<br>- mAP/accuracy metrics<br>- Download model link | P0 |
| US-4.4 | As a user, I want to see which documents were used in training so that I can track data lineage | - "Used in training" badge on documents<br>- Click to see model list<br>- Filter documents by training status | P1 |
| US-4.5 | As a user, I want to start a training job with selected documents so that I can create new models | - Start training button<br>- Training config options<br>- Progress monitoring<br>- Email notification on completion | P0 |

#### Epic 5: Document Detail View (Enhanced)

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-5.1 | As a user, I want to see all annotations with their source so that I can review data quality | - Annotation list with source column<br>- Confidence score for auto<br>- Edit/delete buttons<br>- Group by field type | P0 |
| US-5.2 | As a user, I want to see training history for a document so that I can understand model lineage | - List of models using this document<br>- Training date and model name<br>- Link to model detail page | P1 |
| US-5.3 | As a user, I want to edit annotations inline so that I can quickly make corrections | - Click to edit bounding box<br>- Drag to resize<br>- Double-click to edit text value<br>- Save/cancel buttons | P0 |
| US-5.4 | As a user, I want to see auto vs manual annotation comparison so that I can evaluate auto-label quality | - Side-by-side comparison view<br>- Highlight differences<br>- Override history timeline | P2 |

#### Epic 6: API Endpoints

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-6.1 | As a developer, I want to upload ZIP/PDF via API so that I can automate document ingestion | - POST endpoint accepts multipart<br>- Returns document IDs array<br>- Async processing option<br>- Webhook callback support | P0 |
| US-6.2 | As a developer, I want to upload PDF with auto-label values so that I can pre-annotate documents | - JSON body with field values<br>- Auto-label runs synchronously or async<br>- Returns annotation IDs | P0 |
| US-6.3 | As a developer, I want to query document status so that I can poll for completion | - GET endpoint with document ID<br>- Returns full status object<br>- Includes annotation summary | P0 |
| US-6.4 | As a developer, I want API-uploaded documents visible in UI so that I can manage all documents centrally | - Same data model for API/UI uploads<br>- Source field distinguishes origin<br>- Full UI functionality available | P0 |

#### Epic 7: Dashboard Enhancement

| ID | User Story | Acceptance Criteria | Priority |
|----|------------|---------------------|----------|
| US-7.1 | As a user, I want to see data quality metrics on the dashboard so that I can monitor annotation completeness | - Annotation completeness rate displayed as percentage ring<br>- Complete/incomplete/pending document counts<br>- Click incomplete count to jump to filtered document list | P0 |
| US-7.2 | As a user, I want to see the active model status on the dashboard so that I can monitor model performance | - Current model version and name displayed<br>- mAP/precision/recall metrics shown<br>- Activation date and training document count displayed<br>- Running training task shown if any | P0 |
| US-7.3 | As a user, I want to see recent activity on the dashboard so that I can track system changes | - Last 10 activities displayed with relative timestamps<br>- Activity types: document upload, annotation change, training complete/failed, model activation<br>- Each activity shows icon, description, and time | P1 |
| US-7.4 | As a user, I want the dashboard stats cards to show meaningful data so that I can quickly assess system state | - Total Documents count<br>- Annotation Complete count (documents with core fields)<br>- Incomplete count (labeled but missing core fields)<br>- Pending count (pending + auto_labeling status) | P0 |

**Annotation Completeness Definition:**

A document is considered "annotation complete" when it has both of the following:

- `invoice_number` OR `ocr_number` (at least one identifier)
- `bankgiro` OR `plusgiro` (at least one payment account)

Documents with status=labeled that are missing either of these core fields are considered "incomplete".

---

## 2. CSV Format Specification

### 2.1 Required Headers

```csv
customer_number,supplier_name,supplier_organisation_number,supplier_accounts,DocumentId,InvoiceNumber,InvoiceDate,InvoiceDueDate,Amount,OCR,Message,Bankgiro,Plusgiro
```

### 2.2 Column Definitions

| Column | Type | Required | Maps to Class | Description | Validation Rules |
|--------|------|----------|---------------|-------------|------------------|
| `DocumentId` | string | YES | N/A | PDF filename (without .pdf extension) | Non-empty, alphanumeric + underscore/hyphen |
| `customer_number` | string | NO | customer_number (9) | Customer reference number | Max 50 chars |
| `supplier_name` | string | NO | N/A (metadata only) | Supplier company name | Max 255 chars |
| `supplier_organisation_number` | string | NO | supplier_organisation_number (7) | Swedish org number (XXXXXX-XXXX) | Format: 6 digits, hyphen, 4 digits |
| `supplier_accounts` | string | NO | N/A (metadata only) | Pipe-separated account numbers | Max 500 chars |
| `InvoiceNumber` | string | NO | invoice_number (0) | Invoice reference | Max 50 chars |
| `InvoiceDate` | date | NO | invoice_date (1) | Invoice issue date | YYYY-MM-DD (ISO 8601); other accepted formats in 2.5 |
| `InvoiceDueDate` | date | NO | invoice_due_date (2) | Payment due date | YYYY-MM-DD (ISO 8601); other accepted formats in 2.5 |
| `Amount` | decimal | NO | amount (6) | Invoice total amount | Numeric, max 2 decimal places |
| `OCR` | string | NO | ocr_number (3) | Swedish OCR payment reference | Numeric string, max 25 chars |
| `Message` | string | NO | N/A (metadata only) | Free-text payment message | Max 140 chars |
| `Bankgiro` | string | NO | bankgiro (4) | Bankgiro account number | Format: XXX-XXXX or 7-8 digits |
| `Plusgiro` | string | NO | plusgiro (5) | Plusgiro account number | Format: XXXXXX-X or 6-8 digits |

### 2.3 Field to Class Mapping

```python
CSV_TO_CLASS_MAPPING = {
    'InvoiceNumber': 0,                 # invoice_number
    'InvoiceDate': 1,                   # invoice_date
    'InvoiceDueDate': 2,                # invoice_due_date
    'OCR': 3,                           # ocr_number
    'Bankgiro': 4,                      # bankgiro
    'Plusgiro': 5,                      # plusgiro
    'Amount': 6,                        # amount
    'supplier_organisation_number': 7,  # supplier_organisation_number
    # 8: payment_line (derived from OCR/Bankgiro/Amount)
    'customer_number': 9,               # customer_number
}
```
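
As an illustration of how the mapping might be applied, the sketch below turns one parsed CSV row into a `{class_id: value}` dictionary, dropping metadata-only columns and empty cells. The helper name `csv_row_to_class_values` is hypothetical and not part of the plan; the mapping is repeated only so the snippet runs on its own.

```python
# Mapping copied from section 2.3 so the example is self-contained.
CSV_TO_CLASS_MAPPING = {
    "InvoiceNumber": 0,
    "InvoiceDate": 1,
    "InvoiceDueDate": 2,
    "OCR": 3,
    "Bankgiro": 4,
    "Plusgiro": 5,
    "Amount": 6,
    "supplier_organisation_number": 7,
    "customer_number": 9,
}


def csv_row_to_class_values(row: dict[str, str]) -> dict[int, str]:
    """Hypothetical helper: keep only mapped, non-empty columns, keyed by class ID."""
    values: dict[int, str] = {}
    for column, class_id in CSV_TO_CLASS_MAPPING.items():
        value = (row.get(column) or "").strip()
        if value:  # empty cells produce no auto-label target
            values[class_id] = value
    return values


example_row = {
    "DocumentId": "INV001",
    "supplier_name": "ACME Corp",  # metadata only, not mapped to a class
    "InvoiceNumber": "F2024-001",
    "OCR": "",
    "Bankgiro": "123-4567",
}
class_values = csv_row_to_class_values(example_row)
```

Class 8 (`payment_line`) is deliberately absent here because the plan describes it as derived rather than read from a column.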

### 2.4 Example CSV

```csv
customer_number,supplier_name,supplier_organisation_number,supplier_accounts,DocumentId,InvoiceNumber,InvoiceDate,InvoiceDueDate,Amount,OCR,Message,Bankgiro,Plusgiro
C12345,ACME Corp,556677-8899,123-4567|987-6543,INV001,F2024-001,2024-01-15,2024-02-15,1250.00,7350012345678,,123-4567,
C12346,Widget AB,112233-4455,,INV002,F2024-002,2024-01-16,2024-02-16,3450.50,,,987-6543,
```

### 2.5 Validation Rules

1. **DocumentId**: Required, must match a PDF filename in the ZIP
2. **At least one matchable field**: One of InvoiceNumber, OCR, Bankgiro, Plusgiro, Amount, supplier_organisation_number must be non-empty
3. **Date formats**: YYYY-MM-DD, DD/MM/YYYY, DD.MM.YYYY
4. **Amount formats**: 1234.56, 1 234,56, 1234,56 SEK
5. **Swedish org number**: XXXXXX-XXXX pattern
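
The five rules above could be checked per row along the following lines. This is a minimal sketch, not the tool's validator: the function names (`parse_date`, `parse_amount`, `validate_row`) and the error wording are assumptions, and `pdf_stems` is assumed to be the set of PDF filenames in the ZIP with the `.pdf` extension removed. Amounts that use a comma as a thousands separator (for example `1,234.56`) are not in the listed formats and are rejected here.

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")  # rule 3
ORG_NUMBER_RE = re.compile(r"^\d{6}-\d{4}$")         # rule 5
MATCHABLE_FIELDS = (                                  # rule 2
    "InvoiceNumber", "OCR", "Bankgiro", "Plusgiro",
    "Amount", "supplier_organisation_number",
)


def parse_date(raw: str) -> date:
    """Try each accepted date format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {raw!r}")


def parse_amount(raw: str) -> Decimal:
    """Accept '1234.56', '1 234,56' and '1234,56 SEK' (rule 4)."""
    cleaned = re.sub(r"\s+", "", raw.upper().replace("SEK", ""))
    cleaned = cleaned.replace(",", ".")  # decimal comma -> decimal point
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"unsupported amount format: {raw!r}") from exc


def validate_row(row: dict[str, str], pdf_stems: set[str]) -> list[str]:
    """Return a list of error strings; an empty list means the row is valid."""
    errors: list[str] = []
    doc_id = (row.get("DocumentId") or "").strip()
    if not doc_id:
        errors.append("DocumentId is required")
    elif doc_id not in pdf_stems:  # rule 1
        errors.append(f"no PDF named {doc_id}.pdf in the ZIP")
    if not any((row.get(name) or "").strip() for name in MATCHABLE_FIELDS):
        errors.append("at least one matchable field must be non-empty")
    for column in ("InvoiceDate", "InvoiceDueDate"):
        raw = (row.get(column) or "").strip()
        if raw:
            try:
                parse_date(raw)
            except ValueError as exc:
                errors.append(f"{column}: {exc}")
    raw_amount = (row.get("Amount") or "").strip()
    if raw_amount:
        try:
            parse_amount(raw_amount)
        except ValueError as exc:
            errors.append(f"Amount: {exc}")
    org_number = (row.get("supplier_organisation_number") or "").strip()
    if org_number and not ORG_NUMBER_RE.match(org_number):
        errors.append("supplier_organisation_number must match XXXXXX-XXXX")
    return errors
```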

---

## 3. Database Schema Changes

### 3.1 New Tables

#### 3.1.1 BatchUpload Table

```sql
CREATE TABLE batch_uploads (
    batch_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    admin_token VARCHAR(255) NOT NULL REFERENCES admin_tokens(token),
    filename VARCHAR(255) NOT NULL,
    file_size INTEGER NOT NULL,
    upload_source VARCHAR(20) NOT NULL DEFAULT 'ui',  -- 'ui' or 'api'
    status VARCHAR(20) NOT NULL DEFAULT 'processing',
    -- Status: processing, completed, partial, failed
    total_files INTEGER DEFAULT 0,
    processed_files INTEGER DEFAULT 0,
    successful_files INTEGER DEFAULT 0,
    failed_files INTEGER DEFAULT 0,
    error_message TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMP
);

CREATE INDEX idx_batch_uploads_admin_token ON batch_uploads(admin_token);
CREATE INDEX idx_batch_uploads_status ON batch_uploads(status);
```

#### 3.1.2 BatchUploadFile Table

```sql
CREATE TABLE batch_upload_files (
    file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    batch_id UUID NOT NULL REFERENCES batch_uploads(batch_id) ON DELETE CASCADE,
    document_id UUID REFERENCES admin_documents(document_id),
    filename VARCHAR(255) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    -- Status: pending, processing, completed, failed, skipped
    error_message TEXT,
    csv_row_data JSONB,  -- Parsed CSV row for this file
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    processed_at TIMESTAMP
);

CREATE INDEX idx_batch_upload_files_batch_id ON batch_upload_files(batch_id);
CREATE INDEX idx_batch_upload_files_document_id ON batch_upload_files(document_id);
```

#### 3.1.3 TrainingDocumentLink Table (Junction Table)

```sql
CREATE TABLE training_document_links (
    link_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    task_id UUID NOT NULL REFERENCES training_tasks(task_id) ON DELETE CASCADE,
    document_id UUID NOT NULL REFERENCES admin_documents(document_id) ON DELETE CASCADE,
    annotation_snapshot JSONB,  -- Snapshot of annotations at training time
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),

    UNIQUE(task_id, document_id)
);

CREATE INDEX idx_training_doc_links_task_id ON training_document_links(task_id);
CREATE INDEX idx_training_doc_links_document_id ON training_document_links(document_id);
```

#### 3.1.4 AnnotationHistory Table

```sql
CREATE TABLE annotation_history (
    history_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    annotation_id UUID NOT NULL REFERENCES admin_annotations(annotation_id) ON DELETE CASCADE,
    action VARCHAR(20) NOT NULL,  -- 'created', 'updated', 'deleted', 'override'
    previous_value JSONB,  -- Full annotation state before change
    new_value JSONB,  -- Full annotation state after change
    changed_by VARCHAR(255),  -- admin_token
    change_reason TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_annotation_history_annotation_id ON annotation_history(annotation_id);
CREATE INDEX idx_annotation_history_created_at ON annotation_history(created_at);
```

### 3.2 Modified Tables

#### 3.2.1 AdminDocument Modifications

```sql
ALTER TABLE admin_documents ADD COLUMN upload_source VARCHAR(20) DEFAULT 'ui';
-- Values: 'ui', 'api'

ALTER TABLE admin_documents ADD COLUMN batch_id UUID REFERENCES batch_uploads(batch_id);

ALTER TABLE admin_documents ADD COLUMN csv_field_values JSONB;
-- Stores original CSV values for reference

ALTER TABLE admin_documents ADD COLUMN auto_label_queued_at TIMESTAMP;
-- When auto-label was queued (for dependency checking)

ALTER TABLE admin_documents ADD COLUMN annotation_lock_until TIMESTAMP;
-- Lock for manual annotation while auto-label runs

CREATE INDEX idx_admin_documents_upload_source ON admin_documents(upload_source);
CREATE INDEX idx_admin_documents_batch_id ON admin_documents(batch_id);
```

#### 3.2.2 AdminAnnotation Modifications

```sql
ALTER TABLE admin_annotations ADD COLUMN is_verified BOOLEAN DEFAULT FALSE;
-- User-verified annotation

ALTER TABLE admin_annotations ADD COLUMN verified_at TIMESTAMP;
ALTER TABLE admin_annotations ADD COLUMN verified_by VARCHAR(255);

ALTER TABLE admin_annotations ADD COLUMN override_source VARCHAR(20);
-- If this annotation overrides another: 'auto' or 'imported'

ALTER TABLE admin_annotations ADD COLUMN original_annotation_id UUID;
-- Reference to the annotation this overrides

CREATE INDEX idx_admin_annotations_source ON admin_annotations(source);
CREATE INDEX idx_admin_annotations_is_verified ON admin_annotations(is_verified);
```

#### 3.2.3 TrainingTask Modifications

```sql
ALTER TABLE training_tasks ADD COLUMN document_count INTEGER DEFAULT 0;
-- Count of documents used in training

ALTER TABLE training_tasks ADD COLUMN document_ids UUID[];
-- Array of document IDs used (for quick reference)

ALTER TABLE training_tasks ADD COLUMN metrics_mAP FLOAT;
ALTER TABLE training_tasks ADD COLUMN metrics_precision FLOAT;
ALTER TABLE training_tasks ADD COLUMN metrics_recall FLOAT;
-- Extracted metrics for easy querying

CREATE INDEX idx_training_tasks_metrics ON training_tasks(metrics_mAP);
```

### 3.3 SQLModel Definitions

```python
# File: src/data/admin_models.py

from datetime import datetime
from typing import Any
from uuid import UUID, uuid4

from sqlmodel import Field, SQLModel, Column, JSON, ARRAY
from sqlalchemy import String


class BatchUpload(SQLModel, table=True):
    """Batch upload record for ZIP uploads."""

    __tablename__ = "batch_uploads"

    batch_id: UUID = Field(default_factory=uuid4, primary_key=True)
    admin_token: str = Field(foreign_key="admin_tokens.token", max_length=255, index=True)
    filename: str = Field(max_length=255)
    file_size: int
    upload_source: str = Field(default="ui", max_length=20)
    status: str = Field(default="processing", max_length=20, index=True)
    total_files: int = Field(default=0)
    processed_files: int = Field(default=0)
    successful_files: int = Field(default=0)
    failed_files: int = Field(default=0)
    error_message: str | None = Field(default=None)
    created_at: datetime = Field(default_factory=datetime.utcnow)
    completed_at: datetime | None = Field(default=None)


class BatchUploadFile(SQLModel, table=True):
    """Individual file within a batch upload."""

    __tablename__ = "batch_upload_files"

    file_id: UUID = Field(default_factory=uuid4, primary_key=True)
    batch_id: UUID = Field(foreign_key="batch_uploads.batch_id", index=True)
    document_id: UUID | None = Field(default=None, foreign_key="admin_documents.document_id")
    filename: str = Field(max_length=255)
    status: str = Field(default="pending", max_length=20)
    error_message: str | None = Field(default=None)
    csv_row_data: dict[str, Any] | None = Field(default=None, sa_column=Column(JSON))
    created_at: datetime = Field(default_factory=datetime.utcnow)
    processed_at: datetime | None = Field(default=None)


class TrainingDocumentLink(SQLModel, table=True):
    """Link between training tasks and documents used."""

    __tablename__ = "training_document_links"

    link_id: UUID = Field(default_factory=uuid4, primary_key=True)
    task_id: UUID = Field(foreign_key="training_tasks.task_id", index=True)
    document_id: UUID = Field(foreign_key="admin_documents.document_id", index=True)
    annotation_snapshot: dict[str, Any] | None = Field(default=None, sa_column=Column(JSON))
    created_at: datetime = Field(default_factory=datetime.utcnow)


class AnnotationHistory(SQLModel, table=True):
    """History of annotation changes."""

    __tablename__ = "annotation_history"

    history_id: UUID = Field(default_factory=uuid4, primary_key=True)
    annotation_id: UUID = Field(foreign_key="admin_annotations.annotation_id", index=True)
    action: str = Field(max_length=20)
    previous_value: dict[str, Any] | None = Field(default=None, sa_column=Column(JSON))
    new_value: dict[str, Any] | None = Field(default=None, sa_column=Column(JSON))
    changed_by: str | None = Field(default=None, max_length=255)
    change_reason: str | None = Field(default=None)
    created_at: datetime = Field(default_factory=datetime.utcnow, index=True)
```

---

## 4. API Specification

### 4.1 New Endpoints

#### 4.1.1 Batch Upload (ZIP)

```yaml
POST /api/v1/admin/batch/upload
Content-Type: multipart/form-data

Request:
  file: binary (ZIP file)
  async: boolean (default: true)
  auto_label: boolean (default: true)

Response (202 Accepted):
{
  "batch_id": "uuid",
  "status": "processing",
  "total_files": 25,
  "message": "Batch upload started. Use batch_id to check progress.",
  "status_url": "/api/v1/admin/batch/{batch_id}"
}

Response (200 OK - sync mode):
{
  "batch_id": "uuid",
  "status": "completed",
  "total_files": 25,
  "successful_files": 23,
  "failed_files": 2,
  "documents": [
    {
      "document_id": "uuid",
      "filename": "INV001.pdf",
      "status": "completed",
      "auto_label_status": "completed",
      "annotations_created": 8
    }
  ],
  "errors": [
    {
      "filename": "invalid.pdf",
      "error": "Corrupted PDF file"
    }
  ]
}
```
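
Before any document is registered, the server has to sort the ZIP entries into PDFs, CSV files, and files to skip. The sketch below shows one possible first pass using only the standard library. The function name `scan_batch_zip` and the returned dictionary shape are assumptions, and the `%PDF-` header check is only a cheap heuristic for catching obviously broken files, not a full PDF validation.

```python
import io
import zipfile
from pathlib import PurePosixPath


def scan_batch_zip(zip_bytes: bytes) -> dict[str, list]:
    """Hypothetical first pass: classify ZIP entries without extracting to disk."""
    pdfs: list[str] = []
    csvs: list[str] = []
    errors: list[dict[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = PurePosixPath(info.filename).name  # drop any folder prefix
            suffix = PurePosixPath(name).suffix.lower()
            if suffix == ".pdf":
                with archive.open(info) as handle:
                    header = handle.read(5)
                if header == b"%PDF-":
                    pdfs.append(name)
                else:
                    errors.append({"filename": name, "error": "Corrupted PDF file"})
            elif suffix == ".csv":
                csvs.append(name)
            else:
                errors.append({"filename": name, "error": "Unsupported file type"})
    return {"pdfs": pdfs, "csvs": csvs, "errors": errors}


# Build a small in-memory ZIP to exercise the function.
_buffer = io.BytesIO()
with zipfile.ZipFile(_buffer, "w") as _zip:
    _zip.writestr("INV001.pdf", b"%PDF-1.4 minimal")
    _zip.writestr("invalid.pdf", b"not a pdf")
    _zip.writestr("labels.csv", "DocumentId,InvoiceNumber\nINV001,F2024-001\n")
scan_result = scan_batch_zip(_buffer.getvalue())
```

The `errors` entries use the same `filename`/`error` keys as the sync-mode response above, so they could be passed through to the client unchanged.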

#### 4.1.2 Batch Status

```yaml
GET /api/v1/admin/batch/{batch_id}

Response:
{
  "batch_id": "uuid",
  "status": "processing",
  "progress": {
    "total": 25,
    "processed": 15,
    "successful": 14,
    "failed": 1,
    "percentage": 60
  },
  "files": [
    {
      "file_id": "uuid",
      "filename": "INV001.pdf",
      "document_id": "uuid",
      "status": "completed"
    }
  ],
  "created_at": "2024-01-15T10:00:00Z",
  "estimated_completion": "2024-01-15T10:05:00Z"
}
```
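
A client that uploads asynchronously would typically poll this endpoint until the batch reaches a terminal status. The sketch below assumes the terminal statuses are `completed`, `partial`, and `failed` (from the `batch_uploads.status` comment in section 3.1.1). The names `batch_percentage` and `wait_for_batch` are hypothetical, and the HTTP call is injected as `fetch_status` so that no particular client library is implied.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "partial", "failed"}


def batch_percentage(total: int, processed: int) -> int:
    """Progress as a whole-number percentage, e.g. 15 of 25 files -> 60."""
    return round(processed / total * 100) if total else 0


def wait_for_batch(
    fetch_status: Callable[[str], dict],
    batch_id: str,
    interval_s: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the batch leaves 'processing' or the poll budget runs out."""
    for _ in range(max_polls):
        payload = fetch_status(batch_id)
        if payload["status"] in TERMINAL_STATUSES:
            return payload
        sleep(interval_s)
    raise TimeoutError(f"batch {batch_id} still processing after {max_polls} polls")


# Usage with a stubbed fetcher standing in for GET /api/v1/admin/batch/{batch_id}.
_responses = iter(
    [{"status": "processing"}, {"status": "processing"}, {"status": "completed"}]
)
final_payload = wait_for_batch(lambda _id: next(_responses), "batch-1", sleep=lambda _s: None)
```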

#### 4.1.3 Upload PDF with Auto-Label Values

```yaml
POST /api/v1/admin/documents/upload-with-labels
Content-Type: multipart/form-data

Request:
  file: binary (PDF file)
  field_values: JSON string
    {
      "InvoiceNumber": "F2024-001",
      "InvoiceDate": "2024-01-15",
      "Amount": "1250.00",
      "OCR": "7350012345678",
      "Bankgiro": "123-4567"
    }
  auto_label: boolean (default: true)
  wait_for_completion: boolean (default: false)

Response (202 Accepted):
{
  "document_id": "uuid",
  "filename": "invoice.pdf",
  "status": "auto_labeling",
  "auto_label_status": "running",
  "message": "Document uploaded. Auto-labeling in progress."
}

Response (200 OK - wait_for_completion=true):
{
  "document_id": "uuid",
  "filename": "invoice.pdf",
  "status": "labeled",
  "auto_label_status": "completed",
  "annotations": [
    {
      "annotation_id": "uuid",
      "class_id": 0,
      "class_name": "invoice_number",
      "text_value": "F2024-001",
      "confidence": 0.95,
      "bbox": { "x": 100, "y": 200, "width": 150, "height": 30 }
    }
  ]
}
```

#### 4.1.4 Query Document Status

```yaml
GET /api/v1/admin/documents/{document_id}/status

Response:
{
  "document_id": "uuid",
  "filename": "invoice.pdf",
  "status": "labeled",
  "auto_label_status": "completed",
  "upload_source": "api",
  "annotation_summary": {
    "total": 8,
    "manual": 2,
    "auto": 6,
    "verified": 3
  },
  "can_annotate": true,
  "annotation_lock_reason": null,
  "training_history": [
    {
      "task_id": "uuid",
      "task_name": "Training Run 2024-01",
      "trained_at": "2024-01-20T15:00:00Z"
    }
  ]
}
```

#### 4.1.5 Training with Document Selection

```yaml
POST /api/v1/admin/training/tasks
Content-Type: application/json

Request:
{
  "name": "Training Run 2024-01",
  "description": "First training run with 500 documents",
  "document_ids": ["uuid1", "uuid2", "uuid3"],
  "config": {
    "model_name": "yolo11n.pt",
    "epochs": 100,
    "batch_size": 16,
    "image_size": 640
  },
  "scheduled_at": "2024-01-20T22:00:00Z"
}

Response:
{
  "task_id": "uuid",
  "name": "Training Run 2024-01",
  "status": "scheduled",
  "document_count": 500,
  "message": "Training task scheduled for 2024-01-20T22:00:00Z"
}
```

#### 4.1.6 Get Documents for Training

```yaml
GET /api/v1/admin/training/documents

Query Parameters:
  - status: labeled (required)
  - has_annotations: true
  - min_annotation_count: 3
  - exclude_used_in_training: boolean
  - limit: 100
  - offset: 0

Response:
{
  "total": 1500,
  "documents": [
    {
      "document_id": "uuid",
      "filename": "INV001.pdf",
      "annotation_count": 8,
      "annotation_sources": { "manual": 3, "auto": 5 },
      "used_in_training": ["task_id_1", "task_id_2"],
      "last_modified": "2024-01-15T10:00:00Z"
    }
  ]
}
```

#### 4.1.7 Get Model List

```yaml
GET /api/v1/admin/training/models

Query Parameters:
  - status: completed
  - limit: 20
  - offset: 0

Response:
{
  "total": 15,
  "models": [
    {
      "task_id": "uuid",
      "name": "Training Run 2024-01",
      "status": "completed",
      "document_count": 500,
      "created_at": "2024-01-20T15:00:00Z",
      "completed_at": "2024-01-20T18:30:00Z",
      "metrics": {
        "mAP": 0.935,
        "precision": 0.92,
        "recall": 0.88
      },
      "model_path": "runs/train/invoice_fields_20240120/weights/best.pt",
      "download_url": "/api/v1/admin/training/models/{task_id}/download"
    }
  ]
}
```

#### 4.1.8 Override Annotation

```yaml
PATCH /api/v1/admin/documents/{document_id}/annotations/{annotation_id}/override
Content-Type: application/json

Request:
{
  "bbox": { "x": 110, "y": 205, "width": 145, "height": 28 },
  "text_value": "F2024-001-A",
  "reason": "Corrected OCR error"
}

Response:
{
  "annotation_id": "uuid",
  "source": "manual",
  "override_source": "auto",
  "original_annotation_id": "uuid",
  "message": "Annotation overridden successfully",
  "history_id": "uuid"
}
```

### 4.2 Modified Endpoints

#### 4.2.1 Document List (Enhanced)

```yaml
GET /api/v1/admin/documents

Query Parameters (additions):
  - upload_source: 'ui' | 'api' | null
  - has_annotations: boolean
  - auto_label_status: 'pending' | 'running' | 'completed' | 'failed'
  - used_in_training: boolean
  - batch_id: uuid

Response (additions to DocumentItem):
{
  "documents": [
    {
      // ... existing fields ...
      "upload_source": "api",
      "batch_id": "uuid",
      "can_annotate": true,
      "training_count": 2
    }
  ]
}
```

#### 4.2.2 Document Detail (Enhanced)

```yaml
GET /api/v1/admin/documents/{document_id}

Response (additions):
{
  // ... existing fields ...
  "upload_source": "api",
  "csv_field_values": {
    "InvoiceNumber": "F2024-001",
    "Amount": "1250.00"
  },
  "can_annotate": true,
  "annotation_lock_reason": null,
  "annotations": [
    {
      // ... existing fields ...
      "is_verified": true,
      "verified_at": "2024-01-16T09:00:00Z",
      "override_source": null
    }
  ],
  "training_history": [
    {
      "task_id": "uuid",
      "name": "Training Run 2024-01",
      "trained_at": "2024-01-20T15:00:00Z",
      "model_metrics": { "mAP": 0.935 }
    }
  ]
}
```

### 4.3 Dashboard API Endpoints

#### 4.3.1 Dashboard Statistics

```yaml
GET /api/v1/admin/dashboard/stats

Response:
{
  "total_documents": 38,
  "annotation_complete": 25,
  "annotation_incomplete": 8,
  "pending": 5,
  "completeness_rate": 75.76
}
```

**Completeness Calculation Logic:**

- `annotation_complete`: Documents where status=labeled AND has (invoice_number OR ocr_number) AND has (bankgiro OR plusgiro)
- `annotation_incomplete`: Documents where status=labeled BUT missing core fields
- `pending`: Documents where status IN (pending, auto_labeling)
- `completeness_rate`: annotation_complete / (annotation_complete + annotation_incomplete) * 100
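
The calculation above can be sketched as a small aggregation. The input shape is an assumption made for illustration: each document is a `(status, field_names)` pair, where `field_names` is the set of annotated class names. The function names are hypothetical. With the counts from the example response (25 complete, 8 incomplete, 5 pending) the rate is 25 / 33 * 100, which rounds to 75.76.

```python
IDENTIFIER_FIELDS = {"invoice_number", "ocr_number"}
PAYMENT_FIELDS = {"bankgiro", "plusgiro"}


def is_annotation_complete(field_names: set[str]) -> bool:
    """At least one identifier AND at least one payment account."""
    return bool(field_names & IDENTIFIER_FIELDS) and bool(field_names & PAYMENT_FIELDS)


def dashboard_stats(documents: list[tuple[str, set[str]]]) -> dict:
    """Aggregate (status, field_names) pairs into the stats payload."""
    complete = incomplete = pending = 0
    for status, field_names in documents:
        if status in ("pending", "auto_labeling"):
            pending += 1
        elif status == "labeled":
            if is_annotation_complete(field_names):
                complete += 1
            else:
                incomplete += 1
    labeled = complete + incomplete
    rate = round(complete / labeled * 100, 2) if labeled else 0.0
    return {
        "total_documents": len(documents),
        "annotation_complete": complete,
        "annotation_incomplete": incomplete,
        "pending": pending,
        "completeness_rate": rate,
    }


sample_documents = (
    [("labeled", {"invoice_number", "bankgiro"})] * 25
    + [("labeled", {"invoice_number", "amount"})] * 8
    + [("pending", set())] * 3
    + [("auto_labeling", set())] * 2
)
stats = dashboard_stats(sample_documents)
```

Pending documents are excluded from the denominator, so a large upload backlog does not drag the completeness rate down.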

#### 4.3.2 Active Model Info

```yaml
GET /api/v1/admin/dashboard/active-model

Response:
{
  "model": {
    "version_id": "uuid",
    "version": "1.2.0",
    "name": "Invoice Model",
    "metrics_mAP": 0.951,
    "metrics_precision": 0.94,
    "metrics_recall": 0.92,
    "document_count": 500,
    "activated_at": "2024-01-20T15:00:00Z"
  },
  "running_training": {
    "task_id": "uuid",
    "name": "Run-2024-02",
    "status": "running",
    "started_at": "2024-01-25T10:00:00Z",
    "progress": 45
  }
}

Response (no active model):
{
  "model": null,
  "running_training": null
}
```

#### 4.3.3 Recent Activity

```yaml
GET /api/v1/admin/dashboard/activity

Query Parameters:
  - limit: 10 (default)

Response:
{
  "activities": [
    {
      "type": "model_activated",
      "description": "Activated model v1.2.0",
      "timestamp": "2024-01-25T12:00:00Z",
      "metadata": {
        "version_id": "uuid",
        "version": "1.2.0"
      }
    },
    {
      "type": "training_completed",
      "description": "Training complete: Run-2024-01, mAP 95.1%",
      "timestamp": "2024-01-24T18:30:00Z",
      "metadata": {
        "task_id": "uuid",
        "task_name": "Run-2024-01",
        "mAP": 0.951
      }
    },
    {
      "type": "annotation_modified",
      "description": "Modified INV-001.pdf invoice_number",
      "timestamp": "2024-01-24T14:20:00Z",
      "metadata": {
        "document_id": "uuid",
        "filename": "INV-001.pdf",
        "field_name": "invoice_number"
      }
    },
    {
      "type": "document_uploaded",
      "description": "Uploaded INV-005.pdf",
      "timestamp": "2024-01-23T09:15:00Z",
      "metadata": {
        "document_id": "uuid",
        "filename": "INV-005.pdf"
      }
    },
    {
      "type": "training_failed",
      "description": "Training failed: Run-2024-00",
      "timestamp": "2024-01-22T16:45:00Z",
      "metadata": {
        "task_id": "uuid",
        "task_name": "Run-2024-00",
        "error": "GPU memory exceeded"
      }
    }
  ]
}
```

**Activity Types:**

| Type | Description Template | Source |
|------|---------------------|--------|
| `document_uploaded` | "Uploaded {filename}" | `admin_documents.created_at` |
| `annotation_modified` | "Modified {filename} {field_name}" | `annotation_history` |
| `training_completed` | "Training complete: {task_name}, mAP {mAP}%" | `training_tasks` (status=completed) |
| `training_failed` | "Training failed: {task_name}" | `training_tasks` (status=failed) |
| `model_activated` | "Activated model v{version}" | `model_versions.activated_at` |

In the `training_completed` template, `{mAP}` is rendered as a percentage of `metadata.mAP` (for example, 0.951 becomes 95.1).
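
The templates above can be filled from each activity's `metadata` object. The following is a minimal sketch, not the service's actual implementation: the `TEMPLATES` dictionary, the `render_description` helper, and the intermediate `mAP_pct` key are hypothetical names introduced for illustration, and the `v` prefix on the version follows the sample response rather than a confirmed rule.

```python
# Hypothetical template table mirroring the "Activity Types" rows.
TEMPLATES = {
    "document_uploaded": "Uploaded {filename}",
    "annotation_modified": "Modified {filename} {field_name}",
    "training_completed": "Training complete: {task_name}, mAP {mAP_pct}%",
    "training_failed": "Training failed: {task_name}",
    "model_activated": "Activated model v{version}",
}


def render_description(activity_type: str, metadata: dict) -> str:
    """Build the human-readable description for one activity entry."""
    values = dict(metadata)
    if "mAP" in values:
        # metadata stores mAP as a fraction (0.951); the text shows a percentage.
        values["mAP_pct"] = f"{values['mAP'] * 100:.1f}"
    # str.format ignores unused keys such as task_id or document_id.
    return TEMPLATES[activity_type].format(**values)


print(render_description(
    "training_completed",
    {"task_id": "uuid", "task_name": "Run-2024-01", "mAP": 0.951},
))
```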

---

## 5. UI Wireframes (Text-Based)

### 5.0 Dashboard Overview

```
+------------------------------------------------------------------+
| DOCUMENT ANNOTATION TOOL                   [User: Admin] [Logout]|
+------------------------------------------------------------------+
| [Dashboard] [Documents] [Training] [Models] [Settings]           |
+------------------------------------------------------------------+
|                                                                  |
| DASHBOARD                                                        |
|                                                                  |
| +-------------+ +-------------+ +-------------+ +-------------+  |
| | Total       | | Complete    | | Incomplete  | | Pending     |  |
| | Documents   | | Annotations | |             | |             |  |
| |     38      | |     25      | |      8      | |      5      |  |
| +-------------+ +-------------+ +-------------+ +-------------+  |
| [View List]                                                      |
|                                                                  |
| +---------------------------+  +-------------------------------+ |
| | DATA QUALITY              |  | ACTIVE MODEL                  | |
| | +-----------+             |  |                               | |
| | |           |             |  | v1.2.0 - Invoice Model        | |
| | |    76%    |  Annotation |  | ----------------------------- | |
| | |           |  Complete   |  |                               | |
| | +-----------+             |  | mAP      Precision    Recall  | |
| |                           |  | 95.1%    94%          92%     | |
| | Complete: 25              |  |                               | |
| | Incomplete: 8             |  | Activated: 2024-01-20         | |
| | Pending: 5                |  | Documents: 500                | |
| |                           |  |                               | |
| | [View Incomplete Docs]    |  | Training: Run-2024-02 [====]  | |
| +---------------------------+  +-------------------------------+ |
|                                                                  |
| +--------------------------------------------------------------+ |
| | RECENT ACTIVITY                                              | |
| +--------------------------------------------------------------+ |
| | [rocket] Activated model v1.2.0                   2 hours ago| |
| | [check]  Training complete: Run-2024-01, mAP 95.1%  yesterday| |
| | [edit]   Modified INV-001.pdf invoice_number        yesterday| |
| | [doc]    Uploaded INV-005.pdf                      2 days ago| |
| | [doc]    Uploaded INV-004.pdf                      2 days ago| |
| | [x]      Training failed: Run-2024-00              3 days ago| |
| +--------------------------------------------------------------+ |
|                                                                  |
| +--------------------------------------------------------------+ |
| | SYSTEM STATUS                                                | |
| | Backend API: Online    Database: Connected    GPU: Available | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
```

**Dashboard Components:**

| Component | Data Source | Update Frequency |
|-----------|-------------|------------------|
| Total Documents | `admin_documents` count | Real-time |
| Complete Annotations | Documents with (invoice_number OR ocr_number) AND (bankgiro OR plusgiro) | Real-time |
| Incomplete | Labeled documents missing core fields | Real-time |
| Pending | Documents with status pending or auto_labeling | Real-time |
| Data Quality Ring | Complete / (Complete + Incomplete) * 100% | Real-time |
| Active Model | `model_versions` where is_active=true | On model activation |
| Recent Activity | Aggregated from multiple tables (see below) | Real-time |
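
The card counts and the Data Quality Ring follow directly from the rules in the table. A minimal sketch of that arithmetic, assuming each document is a plain dictionary with a `status` string and a `fields` set of annotated field names (both shapes, and the function names, are illustrative assumptions rather than the actual schema):

```python
from collections import Counter


def classify(doc: dict) -> str:
    """Return 'pending', 'complete', or 'incomplete' for one document."""
    if doc["status"] in ("pending", "auto_labeling"):
        return "pending"
    fields = doc["fields"]  # hypothetical: set of annotated field names
    has_reference = "invoice_number" in fields or "ocr_number" in fields
    has_account = "bankgiro" in fields or "plusgiro" in fields
    return "complete" if has_reference and has_account else "incomplete"


def dashboard_stats(docs: list) -> dict:
    """Aggregate the dashboard card counts and the data quality percentage."""
    counts = Counter(classify(d) for d in docs)
    complete, incomplete = counts["complete"], counts["incomplete"]
    labeled = complete + incomplete
    # Complete / (Complete + Incomplete) * 100, guarding against an empty set.
    quality = round(100 * complete / labeled) if labeled else 0
    return {
        "total": len(docs),
        "complete": complete,
        "incomplete": incomplete,
        "pending": counts["pending"],
        "quality_pct": quality,
    }
```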

**Recent Activity Sources:**

| Activity Type | Icon | Source Table | Query |
|--------------|------|--------------|-------|
| Document Upload | doc | `admin_documents` | `created_at DESC` |
| Annotation Change | edit | `annotation_history` | `created_at DESC` |
| Training Complete | check | `training_tasks` | `status=completed, completed_at DESC` |
| Training Failed | x | `training_tasks` | `status=failed, completed_at DESC` |
| Model Activated | rocket | `model_versions` | `activated_at DESC` |
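
Because every source query already returns rows newest-first, the feed can be built by merging the sorted streams and keeping the first `limit` entries. This is a sketch under two assumptions: each source yields dictionaries with a `timestamp` key, and all timestamps are ISO 8601 strings in UTC with the same format, so string order matches time order. The `recent_activity` name is hypothetical.

```python
import heapq
from itertools import islice


def recent_activity(sources, limit=10):
    """Merge per-table activity lists (each sorted newest-first) into one feed."""
    merged = heapq.merge(*sources, key=lambda a: a["timestamp"], reverse=True)
    return list(islice(merged, limit))


uploads = [{"type": "document_uploaded", "timestamp": "2024-01-23T09:15:00Z"}]
trainings = [
    {"type": "training_completed", "timestamp": "2024-01-24T18:30:00Z"},
    {"type": "training_failed", "timestamp": "2024-01-22T16:45:00Z"},
]
models = [{"type": "model_activated", "timestamp": "2024-01-25T12:00:00Z"}]

feed = recent_activity([uploads, trainings, models], limit=3)
```

`heapq.merge` is lazy, so only as many rows as needed are pulled from each source; in a real service each source query would typically also carry its own `LIMIT`.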

### 5.1 Document List View

```
+------------------------------------------------------------------+
| DOCUMENT ANNOTATION TOOL                   [User: Admin] [Logout]|
+------------------------------------------------------------------+
| [Documents] [Training] [Models] [Settings]                       |
+------------------------------------------------------------------+
|                                                                  |
| DOCUMENTS                                                        |
| +-----------------+  +-----------------------------------------+ |
| | UPLOAD          |  | FILTERS                                 | |
| | [Single PDF]    |  | Status: [All v]      Source: [All v]    | |
| | [ZIP Batch]     |  | Auto-Label: [All v]  Search: [________] | |
| +-----------------+  +-----------------------------------------+ |
|                                                                  |
| +--------------------------------------------------------------+ |
| | [] Filename        | Status    | Auto-Label | Source | Date  | |
| +--------------------------------------------------------------+ |
| | [] INV001.pdf      | Labeled   | Completed  | API    | 01/15 | |
| |    [8 annotations] | [Preview] | [95%]      |        |       | |
| +--------------------------------------------------------------+ |
| | [] INV002.pdf      | Pending   | Running    | UI     | 01/16 | |
| |    [0 annotations] | [Locked]  | [====    ] |        |       | |
| +--------------------------------------------------------------+ |
| | [] INV003.pdf      | Labeled   | Failed     | API    | 01/16 | |
| |    [5 annotations] | [Preview] | [Retry]    |        |       | |
| +--------------------------------------------------------------+ |
| | [] INV004.pdf      | Labeled   | Completed  | UI     | 01/17 | |
| |    [10 annotations]| [Preview] | [98%]      | [Used] |       | |
| +--------------------------------------------------------------+ |
|                                                                  |
| Showing 1-20 of 1,543 documents     [<] [1] [2] [3] ... [78] [>] |
|                                                                  |
| [Delete Selected]                 [Start Training with Selected] |
+------------------------------------------------------------------+
```

### 5.2 Document Detail View

```
+------------------------------------------------------------------+
| < Back to Documents                                   INV001.pdf |
+------------------------------------------------------------------+
|                                                                  |
| +---------------------------+  +-------------------------------+ |
| |                           |  | DOCUMENT INFO                 | |
| |                           |  | Status: Labeled               | |
| |  [Page 1 Image with       |  | Source: API Upload            | |
| |   Annotation Overlays]    |  | Auto-Label: Completed (95%)   | |
| |                           |  | Pages: 1                      | |
| |  [Manual: Solid border]   |  | Uploaded: 2024-01-15          | |
| |  [Auto: Dashed border]    |  |                               | |
| |                           |  | TRAINING HISTORY              | |
| |                           |  | - Run 2024-01 (mAP: 93.5%)    | |
| |                           |  | - Run 2024-02 (mAP: 95.1%)    | |
| |                           |  |                               | |
| +---------------------------+  +-------------------------------+ |
|                                                                  |
| ANNOTATIONS                           [Add Annotation] [Run OCR] |
| +--------------------------------------------------------------+ |
| | Field             | Value        | Source | Conf | Actions   | |
| +--------------------------------------------------------------+ |
| | invoice_number    | F2024-001    | Manual | -    | [E] [D]   | |
| +--------------------------------------------------------------+ |
| | invoice_date      | 2024-01-15   | Auto   | 95%  | [V] [E][D]| |
| +--------------------------------------------------------------+ |
| | amount            | 1,250.00     | Auto   | 98%  | [V] [E][D]| |
| +--------------------------------------------------------------+ |
| | ocr_number        | 7350012345   | Auto   | 87%  | [V] [E][D]| |
| +--------------------------------------------------------------+ |
| | bankgiro          | 123-4567     | Manual | -    | [E] [D]   | |
| +--------------------------------------------------------------+ |
|                                                                  |
| [V] = Verify   [E] = Edit   [D] = Delete                         |
|                                                                  |
| CSV FIELD VALUES (Reference)                                     |
| +--------------------------------------------------------------+ |
| | InvoiceNumber: F2024-001     | InvoiceDate: 2024-01-15       | |
| | Amount: 1250.00              | OCR: 7350012345678            | |
| | Bankgiro: 123-4567           |                               | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
```

### 5.3 Training Page

```
+------------------------------------------------------------------+
| DOCUMENT ANNOTATION TOOL                   [User: Admin] [Logout]|
+------------------------------------------------------------------+
| [Documents] [Training] [Models] [Settings]                       |
+------------------------------------------------------------------+
|                                                                  |
| TRAINING                                                         |
|                                                                  |
| DOCUMENT SELECTION                            Selected: 500 docs |
| +--------------------------------------------------------------+ |
| | [] Filename         | Annotations   | Source | Last Modified | |
| +--------------------------------------------------------------+ |
| | [x] INV001.pdf      | 8 (M:3 A:5)   | API    | 2024-01-15    | |
| +--------------------------------------------------------------+ |
| | [x] INV002.pdf      | 10 (M:2 A:8)  | UI     | 2024-01-16    | |
| +--------------------------------------------------------------+ |
| | [ ] INV003.pdf      | 5 (M:5 A:0)   | UI     | 2024-01-16    | |
| +--------------------------------------------------------------+ |
| | [x] INV004.pdf      | 12 (M:4 A:8)  | API    | 2024-01-17    | |
| +--------------------------------------------------------------+ |
|                                                                  |
| [Select All] [Select None] [Select Not Used in Training]         |
|                                                                  |
| Showing labeled documents only      [<] [1] [2] [3] ... [50] [>] |
|                                                                  |
| TRAINING CONFIGURATION                                           |
| +--------------------------------------------------------------+ |
| | Name: [Training Run 2024-01____________]                     | |
| | Description: [First training with 500 documents_________]    | |
| |                                                              | |
| | Base Model: [yolo11n.pt v]  Epochs: [100]  Batch: [16]       | |
| | Image Size: [640]  Device: [GPU 0 v]                         | |
| |                                                              | |
| | [ ] Schedule for later: [2024-01-20] [22:00]                 | |
| +--------------------------------------------------------------+ |
|                                                                  |
|                                                 [Start Training] |
+------------------------------------------------------------------+
```

### 5.4 Model History View

```
+------------------------------------------------------------------+
| DOCUMENT ANNOTATION TOOL                   [User: Admin] [Logout]|
+------------------------------------------------------------------+
| [Documents] [Training] [Models] [Settings]                       |
+------------------------------------------------------------------+
|                                                                  |
| TRAINED MODELS                                                   |
|                                                                  |
| +--------------------------------------------------------------+ |
| | Name                   | Status       | Docs | mAP   | Date  | |
| +--------------------------------------------------------------+ |
| | Training Run 2024-03   | Running      | 750  | -     | 01/25 | |
| |                        | [====    ]   |      |       |       | |
| |                        | [View Logs] [Cancel]                | |
| +--------------------------------------------------------------+ |
| | Training Run 2024-02   | Completed    | 600  | 95.1% | 01/20 | |
| |                        | P: 94% R: 92%                       | |
| |                        | [View] [Download] [Use as Base]     | |
| +--------------------------------------------------------------+ |
| | Training Run 2024-01   | Completed    | 500  | 93.5% | 01/15 | |
| |                        | P: 92% R: 88%                       | |
| |                        | [View] [Download] [Use as Base]     | |
| +--------------------------------------------------------------+ |
| | Initial Training       | Completed    | 200  | 85.2% | 01/10 | |
| |                        | P: 84% R: 80%                       | |
| |                        | [View] [Download] [Use as Base]     | |
| +--------------------------------------------------------------+ |
|                                                                  |
| MODEL DETAIL: Training Run 2024-02                               |
| +--------------------------------------------------------------+ |
| | Created: 2024-01-20 15:00    | Completed: 2024-01-20 18:30   | |
| | Duration: 3h 30m             | Documents: 600                | |
| |                                                              | |
| | Metrics:                                                     | |
| | - mAP@0.5: 95.1%                                             | |
| | - Precision: 94%                                             | |
| | - Recall: 92%                                                | |
| |                                                              | |
| | Configuration:                                               | |
| | - Base: yolo11n.pt  Epochs: 100  Batch: 16  Size: 640        | |
| |                                                              | |
| | Documents Used: [View 600 documents]                         | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
```

### 5.5 Batch Upload Modal

```
+------------------------------------------------------------------+
| BATCH UPLOAD                                                 [X] |
+------------------------------------------------------------------+
|                                                                  |
| Upload a ZIP file containing:                                    |
|   - Multiple PDF files                                           |
|   - (Optional) CSV file for auto-labeling                        |
|                                                                  |
| +--------------------------------------------------------------+ |
| |                                                              | |
| |                [Drag and drop ZIP file here]                 | |
| |                              or                              | |
| |                        [Browse Files]                        | |
| |                                                              | |
| +--------------------------------------------------------------+ |
|                                                                  |
| [x] Auto-label documents (requires CSV)                          |
| [ ] Process asynchronously                                       |
|                                                                  |
| CSV FORMAT REQUIREMENTS:                                         |
| Required columns: DocumentId                                     |
| Optional: InvoiceNumber, InvoiceDate, Amount, OCR, Bankgiro...   |
| [View full CSV specification]                                    |
|                                                                  |
|                                                [Cancel] [Upload] |
+------------------------------------------------------------------+

+------------------------------------------------------------------+
| UPLOAD PROGRESS                                              [X] |
+------------------------------------------------------------------+
|                                                                  |
| Processing batch upload...                                       |
|                                                                  |
| [========================================          ] 80%         |
|                                                                  |
| Files: 20 / 25                                                   |
| Successful: 18                                                   |
| Failed: 2                                                        |
|                                                                  |
| +--------------------------------------------------------------+ |
| | [OK] INV001.pdf - Completed (8 annotations)                  | |
| | [OK] INV002.pdf - Completed (10 annotations)                 | |
| | [!!] INV003.pdf - Failed: Corrupted PDF                      | |
| | [OK] INV004.pdf - Completed (6 annotations)                  | |
| | [...] Processing INV005.pdf...                               | |
| +--------------------------------------------------------------+ |
|                                                                  |
|                                                 [Cancel] [Close] |
+------------------------------------------------------------------+
```

---

## 6. Implementation Phases

### Phase 1: Database and Core Models (Week 1)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 1.1 | Create database migration script | `src/data/migrations/` | Low |
| 1.2 | Add new SQLModel classes | `src/data/admin_models.py` | Low |
| 1.3 | Update AdminDB with new methods | `src/data/admin_db.py` | Medium |
| 1.4 | Add unit tests for new models | `tests/data/test_admin_models.py` | Low |

**Dependencies**: None

**Risk Assessment**: Low - mostly additive changes to existing structure

### Phase 2: Batch Upload Backend (Week 2)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 2.1 | Create ZIP extraction service | `src/web/batch_upload_service.py` | Medium |
| 2.2 | Add CSV parsing with new format | `src/data/csv_loader.py` | Low |
| 2.3 | Create batch upload routes | `src/web/admin_batch_routes.py` | Medium |
| 2.4 | Add async processing queue | `src/web/batch_queue.py` | High |
| 2.5 | Integration tests | `tests/web/test_batch_upload.py` | Medium |

**Dependencies**: Phase 1

**Risk Assessment**: Medium - ZIP handling and async processing add complexity

### Phase 3: Enhanced Document Management (Week 3)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 3.1 | Add upload source tracking | `src/data/admin_models.py` | Low |
| 3.2 | Update document list endpoint | `src/web/admin_routes.py` | Low |
| 3.3 | Add annotation lock mechanism | `src/web/admin_annotation_routes.py` | Medium |
| 3.4 | Add document status endpoint | `src/web/admin_routes.py` | Low |
| 3.5 | Update auto-label service | `src/web/admin_autolabel.py` | Medium |

**Dependencies**: Phase 1, Phase 2

**Risk Assessment**: Medium - locking mechanism needs careful implementation

### Phase 4: Manual Annotation Enhancement (Week 4)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 4.1 | Add override mechanism | `src/web/admin_annotation_routes.py` | Medium |
| 4.2 | Add annotation history | `src/data/admin_db.py` | Low |
| 4.3 | Add verification endpoint | `src/web/admin_annotation_routes.py` | Low |
| 4.4 | Update schemas with new fields | `src/web/admin_schemas.py` | Low |

**Dependencies**: Phase 3

**Risk Assessment**: Low - extending existing annotation system

### Phase 5: Training Integration (Week 5)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 5.1 | Add document selection for training | `src/web/admin_training_routes.py` | Medium |
| 5.2 | Add training document link table | `src/data/admin_db.py` | Low |
| 5.3 | Add model list endpoint | `src/web/admin_training_routes.py` | Low |
| 5.4 | Update export with selection | `src/web/admin_training_routes.py` | Medium |
| 5.5 | Add metrics extraction | `src/cli/train.py` | Medium |

**Dependencies**: Phase 1, Phase 4

**Risk Assessment**: Medium - integration with training pipeline

### Phase 6: Frontend Implementation (Weeks 6-7)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 6.1 | Create React component structure | `frontend/` | High |
| 6.2 | Implement document list view | `frontend/src/components/` | Medium |
| 6.3 | Implement document detail view | `frontend/src/components/` | High |
| 6.4 | Implement training page | `frontend/src/components/` | Medium |
| 6.5 | Implement batch upload modal | `frontend/src/components/` | Medium |
| 6.6 | Add annotation editor | `frontend/src/components/` | High |

**Dependencies**: Phases 2-5

**Risk Assessment**: High - the frontend is an entirely new component

### Phase 7: Testing and Documentation (Week 8)

| Step | Task | Files | Risk |
|------|------|-------|------|
| 7.1 | Integration tests | `tests/integration/` | Medium |
| 7.2 | E2E tests | `tests/e2e/` | High |
| 7.3 | API documentation | `docs/api/` | Low |
| 7.4 | User guide | `docs/user-guide/` | Low |
| 7.5 | Performance testing | `tests/performance/` | Medium |

**Dependencies**: All phases

**Risk Assessment**: Medium

### Risk Mitigation Strategies

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| ZIP bomb attack | High | Low | Limit max file count, max total size, scan before extraction |
| Async queue failures | Medium | Medium | Implement retry logic, dead letter queue, manual retry endpoint |
| Annotation lock deadlock | Medium | Low | Timeout-based locks, admin override capability |
| Large batch performance | Medium | High | Chunked processing, progress tracking, background workers |
| Database migration issues | High | Low | Backward compatible changes, rollback scripts |
| Frontend complexity | Medium | Medium | Use established UI framework, incremental delivery |
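
The ZIP bomb mitigation ("scan before extraction") can be illustrated with the standard-library `zipfile` module, which exposes each entry's declared uncompressed size without extracting anything. This is a sketch only: the limits and the `check_zip` name are assumptions, not values from this plan.

```python
import io
import zipfile

MAX_FILES = 1000                   # assumed limit, not specified in this plan
MAX_TOTAL_BYTES = 500 * 1024 ** 2  # assumed limit (500 MiB uncompressed)


def check_zip(data: bytes, max_files=MAX_FILES, max_total=MAX_TOTAL_BYTES):
    """Inspect archive metadata and reject oversized uploads before extraction."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if len(infos) > max_files:
            raise ValueError(f"too many entries: {len(infos)} > {max_files}")
        # file_size is the uncompressed size declared in the archive headers.
        total = sum(info.file_size for info in infos)
        if total > max_total:
            raise ValueError(f"uncompressed size {total} exceeds {max_total}")
        return [info.filename for info in infos if not info.is_dir()]
```

Declared sizes come from the archive itself and can be forged, so a production service would likely also cap the number of bytes actually read while extracting each entry.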

---

## 7. State Machine Diagrams

### 7.1 Document Lifecycle States

```
                                    +-------------+
                                    |   DELETED   |
                                    +------^------+
                                           |
                                           | delete
                                           |
+----------+      upload     +----------+  |
|          | --------------> |          |--+
|  (none)  |                 | PENDING  |
|          |                 |          |
+----------+                 +----+-----+
                                  |
                 +----------------+-----------------+
                 |                                  |
                 | trigger auto-label               | create manual annotation
                 v                                  |
          +-------------+                           |
          |             |                           |
          | AUTO_LABEL- |                           |
          |     ING     |                           |
          |             |                           |
          +------+------+                           |
                 |                                  |
       +---------+---------+                        |
       |                   |                        |
       | complete          | fail                   |
       v                   v                        |
+-------------+     +-------------+                 |
|             |     |             |                 |
|   LABELED   |<----+   PENDING   +<----------------+
|             |retry|  (failed)   |
+------+------+     +-------------+
       |
       | export
       v
+-------------+
|             |
|  EXPORTED   |
|             |
+-------------+
```

### 7.2 Auto-Label Workflow States

```
                                    +-------------+
                                    |   MANUAL    |
                                    |  OVERRIDE   |
                                    +------^------+
                                           |
                                           | user edit
                                           |
+----------+      queue      +----------+  |    +-----------+
|          | --------------> |          |  |    |           |
|  (none)  |                 |  QUEUED  |--+--->| COMPLETED |
|          |                 |          |       |           |
+----------+                 +----+-----+       +-----^-----+
                                  |                   |
                                  | start             |
                                  v                   |
                           +-------------+            |
                           |             |            |
                           |   RUNNING   +------------+
                           |             | success
                           +------+------+
                                  |
                                  | error
                                  v
                           +-------------+
                           |             |
                           |   FAILED    |
                           |             |
                           +------+------+
                                  |
                                  | retry
                                  v
                           +-------------+
                           |             |
                           |   QUEUED    |
                           |             |
                           +-------------+
```

### 7.3 Batch Upload States

```
+----------+      upload     +-------------+
|          | --------------> |             |
|  (none)  |                 | PROCESSING  |
|          |                 |             |
+----------+                 +------+------+
                                    |
                    +---------------+---------------+
                    |               |               |
                    | all success   | some fail     | all fail
                    v               v               v
             +-------------+ +-------------+ +-------------+
             |             | |             | |             |
             |  COMPLETED  | |   PARTIAL   | |   FAILED    |
             |             | |             | |             |
             +-------------+ +-------------+ +-------------+
```
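
The terminal batch state depends only on the per-file outcome counts. A minimal sketch, assuming lowercase status strings and a hypothetical `batch_status` helper:

```python
def batch_status(successful: int, failed: int) -> str:
    """Map per-file outcome counts to the terminal batch state."""
    if successful < 0 or failed < 0 or successful + failed == 0:
        raise ValueError("a finished batch needs at least one processed file")
    if failed == 0:
        return "completed"  # all success
    if successful == 0:
        return "failed"     # all fail
    return "partial"        # some fail


# The upload progress wireframe (18 successful, 2 failed) would end as "partial".
print(batch_status(18, 2))
```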

### 7.4 Training Task States

```
+----------+      create     +-------------+
|          | --------------> |             |
|  (none)  |                 |   PENDING   |
|          |                 |             |
+----------+                 +------+------+
                                    |
                      +-------------+-------------+
                      |                           |
                      | immediate                 | scheduled
                      v                           v
               +-------------+             +-------------+
               |             |             |             |
               |   RUNNING   |<------------+  SCHEDULED  |
               |             |   trigger   |             |
               +------+------+             +------+------+
                      |                           |
            +---------+---------+                 | cancel
            |                   |                 v
            | success           | error    +-------------+
            v                   v          |             |
     +-------------+     +-------------+   |  CANCELLED  |
     |             |     |             |   |             |
     |  COMPLETED  |     |   FAILED    |   +-------------+
     |             |     |             |
     +-------------+     +------+------+
                                |
                                | retry
                                v
                         +-------------+
                         |             |
                         |   PENDING   |
                         |             |
                         +-------------+
```
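
One way to enforce the diagram above is a lookup table of allowed transitions. This is a sketch of that idea, not the project's implementation: the state and event strings are hypothetical lowercase spellings of the diagram labels, and only the transitions drawn in the diagram are included (so `cancel` is accepted from `scheduled` only).

```python
# Allowed (state, event) -> next state pairs, read off the diagram.
TRAINING_TRANSITIONS = {
    ("pending", "start_immediately"): "running",
    ("pending", "schedule"): "scheduled",
    ("scheduled", "trigger"): "running",
    ("scheduled", "cancel"): "cancelled",
    ("running", "success"): "completed",
    ("running", "error"): "failed",
    ("failed", "retry"): "pending",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached by `event`, or raise if the move is not allowed."""
    try:
        return TRAINING_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state!r}") from None
```

Keeping the table as data makes it straightforward to reject an update such as `completed` to `running` before it reaches the database.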

### 7.5 Annotation Lock States

```
                                    +-------------+
                                    |   LOCKED    |
                                    | (auto-label |
                                    |  running)   |
                                    +------^------+
                                           |
                                           | auto-label starts
                                           |
+----------+      upload     +----------+  |
|          | --------------> |          |--+
|  (none)  |                 | UNLOCKED |<---------+
|          |                 |          |          |
+----------+                 +----+-----+          |
                                  |                |
                                  | auto-label     | auto-label
                                  | starts         | completes/fails
                                  |                |
                                  v                |
                           +-------------+         |
                           |             |         |
                           |   LOCKED    +---------+
                           | (timeout:   |
                           | 5 minutes)  |
                           +-------------+
```
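
The 5-minute timeout means a lock should lapse on its own if auto-labeling never reports completion or failure. A minimal in-memory sketch, assuming a hypothetical `AnnotationLock` class; a real deployment would more likely keep the lock timestamp in the database so that it survives restarts and is shared across workers.

```python
import time

LOCK_TIMEOUT_SECONDS = 5 * 60  # matches the 5-minute timeout in the diagram


class AnnotationLock:
    """Per-document lock that expires on its own after a timeout."""

    def __init__(self, timeout=LOCK_TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock      # injectable so tests can control time
        self._locked_at = {}     # document_id -> time the lock was taken

    def lock(self, document_id):
        """Called when auto-labeling starts."""
        self._locked_at[document_id] = self._clock()

    def unlock(self, document_id):
        """Called when auto-labeling completes or fails."""
        self._locked_at.pop(document_id, None)

    def is_locked(self, document_id) -> bool:
        locked_at = self._locked_at.get(document_id)
        if locked_at is None:
            return False
        if self._clock() - locked_at >= self._timeout:
            # Stale lock: release it instead of blocking annotators forever.
            del self._locked_at[document_id]
            return False
        return True
```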

---

## Summary

This plan provides:

1. **PRD**: 24 user stories across 6 epics, plus Epic 7 (Dashboard Enhancement), with clear acceptance criteria and priorities
2. **CSV Specification**: 13 columns with detailed validation rules and field mappings
3. **Database Schema**: 4 new tables + modifications to 3 existing tables with full SQLModel definitions
4. **API Specification**: 8 new endpoints + 2 modified endpoints, plus the dashboard endpoints in 4.3, with complete request/response schemas
5. **UI Wireframes**: 6 detailed text-based wireframes (5.0-5.5) covering all major views
6. **Implementation Phases**: 7 phases over 8 weeks with 30+ tasks, dependencies, and risk assessments
7. **State Machines**: 5 state diagrams covering document, auto-label, batch, training, and locking workflows

The implementation follows an incremental approach, starting with database and backend changes before frontend development, which minimizes risk and enables continuous testing throughout the development cycle.