Prepare for opencode

Yaojia Wang
2026-02-03 22:03:44 +01:00
parent 729d96f59e
commit 183d3503ef
22 changed files with 4858 additions and 244 deletions

AGENTS.md

@@ -1,179 +1,93 @@
# AGENTS.md - Coding Guidelines for AI Agents
# Invoice Master POC v2
## Build / Test / Lint Commands
Swedish Invoice Field Extraction System - YOLOv11 + PaddleOCR extracts structured data from Swedish PDF invoices.
## Tech Stack
| Component | Technology |
|-----------|------------|
| Object Detection | YOLOv11 (Ultralytics) |
| OCR Engine | PaddleOCR v5 (PP-OCRv5) |
| PDF Processing | PyMuPDF (fitz) |
| Database | PostgreSQL + psycopg2 |
| Web Framework | FastAPI + Uvicorn |
| Deep Learning | PyTorch + CUDA 12.x |
## WSL Environment (REQUIRED)
**Prefix ALL commands with:**
### Python Backend
```bash
# Install packages (editable mode)
pip install -e packages/shared
pip install -e packages/training
pip install -e packages/backend
# Run all tests
DB_PASSWORD=xxx pytest tests/ -q
# Run single test file
DB_PASSWORD=xxx pytest tests/path/to/test_file.py -v
# Run with coverage
DB_PASSWORD=xxx pytest tests/ --cov=packages --cov-report=term-missing
# Format code
black packages/ tests/
ruff check packages/ tests/
# Type checking
mypy packages/
wsl bash -c "source ~/miniconda3/etc/profile.d/conda.sh && conda activate invoice-py311 && <command>"
```
### Frontend
```bash
cd frontend
**NEVER run Python commands directly in Windows PowerShell/CMD.**
# Install dependencies
npm install
## Project-Specific Rules
# Development server
npm run dev
- Python 3.11+ with type hints
- No print() in production - use logging
- Run tests: `pytest --cov=src`
# Build
npm run build
## Critical Rules
# Run tests
npm run test
### Code Organization
# Run single test
npx vitest run src/path/to/file.test.ts
- Many small files over few large files
- High cohesion, low coupling
- 200-400 lines typical, 800 max per file
- Organize by feature/domain, not by type
# Watch mode
npm run test:watch
### Code Style
# Coverage
npm run test:coverage
```
- No emojis in code, comments, or documentation
- Immutability always - never mutate objects or arrays
- No console.log in production code
- Proper error handling with try/catch
- Input validation with Zod or similar
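
For the Python side of the project, the immutability rule above could be followed with frozen dataclasses. This is a minimal sketch; `InvoiceField` and `with_value` are hypothetical names for illustration and are not taken from this repository.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class InvoiceField:
    """Hypothetical record, used only to illustrate the rule."""

    name: str
    value: str
    confidence: float


def with_value(field: InvoiceField, value: str) -> InvoiceField:
    """Return a copy with a new value instead of mutating the input."""
    return replace(field, value=value)


original = InvoiceField(name="InvoiceNumber", value="12345", confidence=0.91)
updated = with_value(original, "12346")
```

Assigning to `original.value` raises `FrozenInstanceError`, so accidental mutation fails loudly instead of silently changing shared state.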
## Code Style Guidelines
### Testing
### Python
- TDD: Write tests first
- 80% minimum coverage
- Unit tests for utilities
- Integration tests for APIs
- E2E tests for critical flows
**Imports:**
- Use absolute imports within packages: `from shared.pdf.extractor import PDFDocument`
- Group imports: stdlib → third-party → local (separated by blank lines)
- Use `from __future__ import annotations` for forward references when needed
### Security
**Type Hints:**
- All functions must have type hints (enforced by mypy)
- Use `| None` instead of `Optional[...]` (Python 3.10+)
- Use `list[str]` instead of `List[str]` (Python 3.10+)
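
A short sketch of the type-hint conventions above. The function `first_non_empty` is a hypothetical helper, not part of the codebase.

```python
from __future__ import annotations


def first_non_empty(values: list[str]) -> str | None:
    """Return the first value that is not blank, or None if there is none."""
    for value in values:
        if value.strip():
            return value
    return None
```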
**Naming:**
- Classes: `PascalCase` (e.g., `PDFDocument`, `InferencePipeline`)
- Functions/variables: `snake_case` (e.g., `extract_text`, `get_db_connection`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `DEFAULT_DPI`, `DATABASE`)
- Private: `_leading_underscore` for internal use
**Error Handling:**
- Use custom exceptions from `shared.exceptions`
- Base exception: `InvoiceExtractionError`
- Specific exceptions: `PDFProcessingError`, `OCRError`, `DatabaseError`, etc.
- Always include context in exceptions via `details` dict
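
The error-handling rules above might look like the sketch below. The real `shared.exceptions` module is not shown in this diff, so the constructor signature and the `fetch_document` helper here are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


class InvoiceExtractionError(Exception):
    """Base exception (assumed shape: a message plus a details dict)."""

    def __init__(self, message: str, details: dict[str, Any] | None = None) -> None:
        super().__init__(message)
        self.message = message
        # Copy so later changes to the caller's dict do not leak in.
        self.details: dict[str, Any] = dict(details) if details else {}


class DatabaseError(InvoiceExtractionError):
    """Raised when a database operation fails."""


def fetch_document(store: dict[str, str], doc_id: str) -> str:
    """Hypothetical lookup that wraps a low-level error with context."""
    try:
        return store[doc_id]
    except KeyError as exc:
        raise DatabaseError(
            "Failed to fetch document", details={"doc_id": doc_id}
        ) from exc
```

Chaining with `from exc` keeps the original traceback, and the `details` dict carries the identifiers a caller or log handler needs.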
**Docstrings:**
- Use Google-style docstrings
- All public functions/classes must have docstrings
- Include Args/Returns sections for complex functions
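
A Google-style docstring with Args, Returns, and Raises sections could look like this. `passes_threshold` is a hypothetical function used only to show the layout.

```python
def passes_threshold(score: float, threshold: float = 0.5) -> bool:
    """Decide whether a detection score meets the confidence threshold.

    Args:
        score: Detection confidence in the range [0.0, 1.0].
        threshold: Minimum score that counts as accepted.

    Returns:
        True if ``score`` is greater than or equal to ``threshold``.

    Raises:
        ValueError: If ``score`` is outside the range [0.0, 1.0].
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    return score >= threshold
```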
**Code Organization:**
- Maximum line length: 100 characters (black config)
- Target Python: 3.10+
- Keep files under 800 lines, ideally 200-400 lines
### TypeScript / React Frontend
**Imports:**
- Use path alias `@/` for project imports: `import { Button } from '@/components/Button'`
- Group: React → third-party → local (@/) → relative
**Naming:**
- Components: `PascalCase` (e.g., `Dashboard.tsx`, `InferenceDemo.tsx`)
- Hooks: `camelCase` with `use` prefix (e.g., `useDocuments.ts`)
- Types/Interfaces: `PascalCase` (e.g., `DocumentListResponse`)
- API endpoints: `camelCase` (e.g., `documentsApi`)
**TypeScript:**
- Strict mode enabled
- Use explicit return types on exported functions
- Prefer `type` over `interface` for simple shapes
- Use enums for fixed sets of values
**React Patterns:**
- Functional components with hooks
- Use React Query for server state
- Use Zustand for client state (if needed)
- Props interfaces named `{ComponentName}Props`
**Styling:**
- Use Tailwind CSS exclusively
- Custom colors: `warm-*` theme (e.g., `bg-warm-text-secondary`)
- Component variants defined as objects (see Button.tsx pattern)
**Testing:**
- Use Vitest + React Testing Library
- Test files: `{name}.test.ts` or `{name}.test.tsx`
- Co-locate tests with source files when possible
## Project Structure
```
packages/
  shared/      # Shared utilities (PDF, OCR, storage, config)
  training/    # Training service (GPU, CLI commands)
  backend/     # Web API + inference (FastAPI)
frontend/      # React + TypeScript + Vite
tests/         # Test suite
migrations/    # Database SQL migrations
migrations/ # Database SQL migrations
```
## Key Configuration
- **DPI:** 150 (must match between training and inference)
- **Database:** PostgreSQL (configured via env vars)
- **Storage:** Abstracted (Local/Azure/S3 via storage.yaml)
- **Python:** 3.10+ (3.11 recommended, 3.10 for RTX 50 series)
- No hardcoded secrets
- Environment variables for sensitive data
- Validate all user inputs
- Parameterized queries only
- CSRF protection enabled
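
The "parameterized queries only" rule can be sketched as follows. The table `documents` and column `status` are hypothetical. The `%s` placeholder is the style psycopg2 uses, and the function accepts any DB-API cursor, so it can be exercised without a live database.

```python
from __future__ import annotations

from typing import Any, Protocol


class Cursor(Protocol):
    """Minimal subset of a DB-API cursor used by this sketch."""

    def execute(self, query: str, params: tuple[Any, ...]) -> Any: ...

    def fetchone(self) -> tuple[Any, ...] | None: ...


def fetch_document_status(cursor: Cursor, doc_id: str) -> str | None:
    """Look up a status by id, passing doc_id as a bound parameter."""
    # The value is never formatted into the SQL string; the driver binds it.
    cursor.execute("SELECT status FROM documents WHERE id = %s", (doc_id,))
    row = cursor.fetchone()
    return None if row is None else row[0]
```

Building the same query with an f-string or `%` formatting would put user input into the SQL text itself, which is what the rule is meant to prevent.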
## Environment Variables
Required: `DB_PASSWORD`
Optional: `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `STORAGE_BASE_PATH`
```bash
# Required
DB_PASSWORD=
## Common Patterns
### Python: Adding a New API Endpoint
1. Add route in `backend/web/api/v1/`
2. Define Pydantic schema in `backend/web/schemas/`
3. Implement service logic in `backend/web/services/`
4. Add tests in `tests/web/`
### Frontend: Adding a New Component
1. Create component in `frontend/src/components/`
2. Export from `frontend/src/components/index.ts` if shared
3. Add types to `frontend/src/api/types.ts` if API-related
4. Add tests co-located with component
### Error Handling
```python
from shared.exceptions import DatabaseError
try:
    result = db.query(...)
except Exception as e:
    raise DatabaseError(f"Failed to fetch document: {e}", details={"doc_id": doc_id})
# Optional (with defaults)
DB_HOST=192.168.68.31
DB_PORT=5432
DB_NAME=docmaster
DB_USER=docmaster
MODEL_PATH=runs/train/invoice_fields/weights/best.pt
CONFIDENCE_THRESHOLD=0.5
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
```
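
One way the variables listed above could be read at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical and are not the project's actual config module; the defaults are copied from the listing above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables listed above."""

    db_password: str
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    model_path: str
    confidence_threshold: float
    server_host: str
    server_port: int


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from env, failing fast if DB_PASSWORD is missing."""
    env = os.environ if env is None else env
    password = env.get("DB_PASSWORD", "")
    if not password:
        raise RuntimeError("DB_PASSWORD must be set")
    return Settings(
        db_password=password,
        db_host=env.get("DB_HOST", "192.168.68.31"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "docmaster"),
        db_user=env.get("DB_USER", "docmaster"),
        model_path=env.get("MODEL_PATH", "runs/train/invoice_fields/weights/best.pt"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.5")),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "8000")),
    )
```

Passing a plain dict as `env` keeps the loader testable without touching the real process environment.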
## Available Commands
### Database Access
```python
from shared.data.repositories import DocumentRepository
- `/tdd` - Test-driven development workflow
- `/plan` - Create implementation plan
- `/code-review` - Review code quality
- `/build-fix` - Fix build errors
repo = DocumentRepository()
doc = repo.get_by_id(doc_id)
```
## Git Workflow
- Conventional commits: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`
- Never commit to main directly
- PRs require review
- All tests must pass before merge