569 lines
16 KiB
Markdown
569 lines
16 KiB
Markdown
---
|
|
name: security-review
|
|
description: Use this skill when adding authentication, handling user input, working with secrets, creating API endpoints, or implementing payment/sensitive features. Provides comprehensive security checklist and patterns.
|
|
---
|
|
|
|
# Security Review Skill
|
|
|
|
Security best practices for Python/FastAPI applications handling sensitive invoice data.
|
|
|
|
## When to Activate
|
|
|
|
- Implementing authentication or authorization
|
|
- Handling user input or file uploads
|
|
- Creating new API endpoints
|
|
- Working with secrets or credentials
|
|
- Processing sensitive invoice data
|
|
- Integrating third-party APIs
|
|
- Database operations with user data
|
|
|
|
## Security Checklist
|
|
|
|
### 1. Secrets Management
|
|
|
|
#### NEVER Do This
|
|
```python
|
|
# Hardcoded secrets - CRITICAL VULNERABILITY
|
|
api_key = "sk-proj-xxxxx"
|
|
db_password = "password123"
|
|
```
|
|
|
|
#### ALWAYS Do This
|
|
```python
|
|
import os
|
|
from pydantic_settings import BaseSettings
|
|
|
|
class Settings(BaseSettings):
|
|
db_password: str
|
|
api_key: str
|
|
model_path: str = "runs/train/invoice_fields/weights/best.pt"
|
|
|
|
class Config:
|
|
env_file = ".env"
|
|
|
|
settings = Settings()
|
|
|
|
# Verify secrets exist
|
|
if not settings.db_password:
|
|
raise RuntimeError("DB_PASSWORD not configured")
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] No hardcoded API keys, tokens, or passwords
|
|
- [ ] All secrets in environment variables
|
|
- [ ] `.env` in .gitignore
|
|
- [ ] No secrets in git history
|
|
- [ ] `.env.example` with placeholder values
|
|
|
|
### 2. Input Validation
|
|
|
|
#### Always Validate User Input
|
|
```python
|
|
from pydantic import BaseModel, Field, field_validator
|
|
from fastapi import HTTPException
|
|
import re
|
|
|
|
class InvoiceRequest(BaseModel):
|
|
invoice_number: str = Field(..., min_length=1, max_length=50)
|
|
amount: float = Field(..., gt=0, le=1_000_000)
|
|
bankgiro: str | None = None
|
|
|
|
@field_validator("invoice_number")
|
|
@classmethod
|
|
def validate_invoice_number(cls, v: str) -> str:
|
|
# Whitelist validation - only allow safe characters
|
|
if not re.match(r"^[A-Za-z0-9\-_]+$", v):
|
|
raise ValueError("Invalid invoice number format")
|
|
return v
|
|
|
|
@field_validator("bankgiro")
|
|
@classmethod
|
|
def validate_bankgiro(cls, v: str | None) -> str | None:
|
|
if v is None:
|
|
return None
|
|
cleaned = re.sub(r"[^0-9]", "", v)
|
|
if not (7 <= len(cleaned) <= 8):
|
|
raise ValueError("Bankgiro must be 7-8 digits")
|
|
return cleaned
|
|
```
|
|
|
|
#### File Upload Validation
|
|
```python
|
|
from fastapi import UploadFile, HTTPException
|
|
from pathlib import Path
|
|
|
|
ALLOWED_EXTENSIONS = {".pdf"}
|
|
MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB
|
|
|
|
async def validate_pdf_upload(file: UploadFile) -> bytes:
|
|
"""Validate PDF upload with security checks."""
|
|
# Extension check
|
|
ext = Path(file.filename or "").suffix.lower()
|
|
if ext not in ALLOWED_EXTENSIONS:
|
|
raise HTTPException(400, f"Only PDF files allowed, got {ext}")
|
|
|
|
# Read content
|
|
content = await file.read()
|
|
|
|
# Size check
|
|
if len(content) > MAX_FILE_SIZE:
|
|
raise HTTPException(400, f"File too large (max {MAX_FILE_SIZE // 1024 // 1024}MB)")
|
|
|
|
# Magic bytes check (PDF signature)
|
|
if not content.startswith(b"%PDF"):
|
|
raise HTTPException(400, "Invalid PDF file format")
|
|
|
|
return content
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] All user inputs validated with Pydantic
|
|
- [ ] File uploads restricted (size, type, extension, magic bytes)
|
|
- [ ] No direct use of user input in queries
|
|
- [ ] Whitelist validation (not blacklist)
|
|
- [ ] Error messages don't leak sensitive info
|
|
|
|
### 3. SQL Injection Prevention
|
|
|
|
#### NEVER Concatenate SQL
|
|
```python
|
|
# DANGEROUS - SQL Injection vulnerability
|
|
query = f"SELECT * FROM documents WHERE id = '{user_input}'"
|
|
cur.execute(query)
|
|
```
|
|
|
|
#### ALWAYS Use Parameterized Queries
|
|
```python
|
|
import psycopg2
|
|
|
|
# Safe - parameterized query with %s placeholders
|
|
cur.execute(
|
|
"SELECT * FROM documents WHERE id = %s AND status = %s",
|
|
(document_id, status)
|
|
)
|
|
|
|
# Safe - named parameters
|
|
cur.execute(
|
|
"SELECT * FROM documents WHERE id = %(id)s",
|
|
{"id": document_id}
|
|
)
|
|
|
|
# Safe - psycopg2.sql for dynamic identifiers
|
|
from psycopg2 import sql
|
|
|
|
cur.execute(
|
|
sql.SQL("SELECT {} FROM {} WHERE id = %s").format(
|
|
sql.Identifier("invoice_number"),
|
|
sql.Identifier("documents")
|
|
),
|
|
(document_id,)
|
|
)
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] All database queries use parameterized queries (%s or %(name)s)
|
|
- [ ] No string concatenation or f-strings in SQL
|
|
- [ ] psycopg2.sql module used for dynamic identifiers
|
|
- [ ] No user input in table/column names
|
|
|
|
### 4. Path Traversal Prevention
|
|
|
|
#### NEVER Trust User Paths
|
|
```python
|
|
# DANGEROUS - Path traversal vulnerability
|
|
filename = request.query_params.get("file")
|
|
with open(f"/data/{filename}", "r") as f: # Attacker: ../../../etc/passwd
|
|
return f.read()
|
|
```
|
|
|
|
#### ALWAYS Validate Paths
|
|
```python
|
|
from pathlib import Path
|
|
|
|
ALLOWED_DIR = Path("/data/uploads").resolve()
|
|
|
|
def get_safe_path(filename: str) -> Path:
|
|
"""Get safe file path, preventing path traversal."""
|
|
# Remove any path components
|
|
safe_name = Path(filename).name
|
|
|
|
# Validate filename characters
|
|
if not re.match(r"^[A-Za-z0-9_\-\.]+$", safe_name):
|
|
raise HTTPException(400, "Invalid filename")
|
|
|
|
# Resolve and verify within allowed directory
|
|
full_path = (ALLOWED_DIR / safe_name).resolve()
|
|
|
|
if not full_path.is_relative_to(ALLOWED_DIR):
|
|
raise HTTPException(400, "Invalid file path")
|
|
|
|
return full_path
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] User-provided filenames sanitized
|
|
- [ ] Paths resolved and validated against allowed directory
|
|
- [ ] No direct concatenation of user input into paths
|
|
- [ ] Whitelist characters in filenames
|
|
|
|
### 5. Authentication & Authorization
|
|
|
|
#### API Key Validation
|
|
```python
|
|
from fastapi import Depends, HTTPException, Security
|
|
from fastapi.security import APIKeyHeader
|
|
|
|
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)
|
|
|
|
async def verify_api_key(api_key: str = Security(api_key_header)) -> str:
|
|
if not api_key:
|
|
raise HTTPException(401, "API key required")
|
|
|
|
# Constant-time comparison to prevent timing attacks
|
|
import hmac
|
|
if not hmac.compare_digest(api_key, settings.api_key):
|
|
raise HTTPException(403, "Invalid API key")
|
|
|
|
return api_key
|
|
|
|
@router.post("/infer")
|
|
async def infer(
|
|
file: UploadFile,
|
|
api_key: str = Depends(verify_api_key)
|
|
):
|
|
...
|
|
```
|
|
|
|
#### Role-Based Access Control
|
|
```python
|
|
from enum import Enum
|
|
|
|
class UserRole(str, Enum):
|
|
USER = "user"
|
|
ADMIN = "admin"
|
|
|
|
def require_role(required_role: UserRole):
|
|
async def role_checker(current_user: User = Depends(get_current_user)):
|
|
if current_user.role != required_role:
|
|
raise HTTPException(403, "Insufficient permissions")
|
|
return current_user
|
|
return role_checker
|
|
|
|
@router.delete("/documents/{doc_id}")
|
|
async def delete_document(
|
|
doc_id: str,
|
|
user: User = Depends(require_role(UserRole.ADMIN))
|
|
):
|
|
...
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] API keys validated with constant-time comparison
|
|
- [ ] Authorization checks before sensitive operations
|
|
- [ ] Role-based access control implemented
|
|
- [ ] Session/token validation on protected routes
|
|
|
|
### 6. Rate Limiting
|
|
|
|
#### Rate Limiter Implementation
|
|
```python
|
|
from time import time
|
|
from collections import defaultdict
|
|
from fastapi import Request, HTTPException
|
|
|
|
class RateLimiter:
|
|
def __init__(self):
|
|
self.requests: dict[str, list[float]] = defaultdict(list)
|
|
|
|
def check_limit(
|
|
self,
|
|
identifier: str,
|
|
max_requests: int,
|
|
window_seconds: int
|
|
) -> bool:
|
|
now = time()
|
|
# Clean old requests
|
|
self.requests[identifier] = [
|
|
t for t in self.requests[identifier]
|
|
if now - t < window_seconds
|
|
]
|
|
# Check limit
|
|
if len(self.requests[identifier]) >= max_requests:
|
|
return False
|
|
self.requests[identifier].append(now)
|
|
return True
|
|
|
|
limiter = RateLimiter()
|
|
|
|
@app.middleware("http")
|
|
async def rate_limit_middleware(request: Request, call_next):
|
|
client_ip = request.client.host if request.client else "unknown"
|
|
|
|
# 100 requests per minute for general endpoints
|
|
if not limiter.check_limit(client_ip, max_requests=100, window_seconds=60):
|
|
raise HTTPException(429, "Rate limit exceeded. Try again later.")
|
|
|
|
return await call_next(request)
|
|
```
|
|
|
|
#### Stricter Limits for Expensive Operations
|
|
```python
|
|
# Inference endpoint: 10 requests per minute
|
|
async def check_inference_rate_limit(request: Request):
|
|
client_ip = request.client.host if request.client else "unknown"
|
|
if not limiter.check_limit(f"infer:{client_ip}", max_requests=10, window_seconds=60):
|
|
raise HTTPException(429, "Inference rate limit exceeded")
|
|
|
|
@router.post("/infer")
|
|
async def infer(
|
|
file: UploadFile,
|
|
_: None = Depends(check_inference_rate_limit)
|
|
):
|
|
...
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] Rate limiting on all API endpoints
|
|
- [ ] Stricter limits on expensive operations (inference, OCR)
|
|
- [ ] IP-based rate limiting
|
|
- [ ] Clear error messages for rate-limited requests
|
|
|
|
### 7. Sensitive Data Exposure
|
|
|
|
#### Logging
|
|
```python
|
|
import logging
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
# WRONG: Logging sensitive data
|
|
logger.info(f"Processing invoice: {invoice_data}") # May contain sensitive info
|
|
logger.error(f"DB error with password: {db_password}")
|
|
|
|
# CORRECT: Redact sensitive data
|
|
logger.info(f"Processing invoice: id={doc_id}")
|
|
logger.error(f"DB connection failed to {db_host}:{db_port}")
|
|
|
|
# CORRECT: Structured logging with safe fields only
|
|
logger.info(
|
|
"Invoice processed",
|
|
extra={
|
|
"document_id": doc_id,
|
|
"field_count": len(fields),
|
|
"processing_time_ms": elapsed_ms
|
|
}
|
|
)
|
|
```
|
|
|
|
#### Error Messages
|
|
```python
|
|
# WRONG: Exposing internal details
|
|
@app.exception_handler(Exception)
|
|
async def error_handler(request: Request, exc: Exception):
|
|
return JSONResponse(
|
|
status_code=500,
|
|
content={
|
|
"error": str(exc),
|
|
"traceback": traceback.format_exc() # NEVER expose!
|
|
}
|
|
)
|
|
|
|
# CORRECT: Generic error messages
|
|
@app.exception_handler(Exception)
|
|
async def error_handler(request: Request, exc: Exception):
|
|
logger.error(f"Unhandled error: {exc}", exc_info=True) # Log internally
|
|
return JSONResponse(
|
|
status_code=500,
|
|
content={"success": False, "error": "An error occurred"}
|
|
)
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] No passwords, tokens, or secrets in logs
|
|
- [ ] Error messages generic for users
|
|
- [ ] Detailed errors only in server logs
|
|
- [ ] No stack traces exposed to users
|
|
- [ ] Invoice data (amounts, account numbers) not logged
|
|
|
|
### 8. CORS Configuration
|
|
|
|
```python
|
|
from fastapi.middleware.cors import CORSMiddleware
|
|
|
|
# WRONG: Allow all origins
|
|
app.add_middleware(
|
|
CORSMiddleware,
|
|
allow_origins=["*"], # DANGEROUS in production
|
|
allow_credentials=True,
|
|
)
|
|
|
|
# CORRECT: Specific origins
|
|
ALLOWED_ORIGINS = [
|
|
"http://localhost:8000",
|
|
"https://your-domain.com",
|
|
]
|
|
|
|
app.add_middleware(
|
|
CORSMiddleware,
|
|
allow_origins=ALLOWED_ORIGINS,
|
|
allow_credentials=True,
|
|
allow_methods=["GET", "POST"],
|
|
allow_headers=["*"],
|
|
)
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] CORS origins explicitly listed
|
|
- [ ] No wildcard origins in production
|
|
- [ ] Credentials only with specific origins
|
|
|
|
### 9. Temporary File Security
|
|
|
|
```python
|
|
import tempfile
|
|
from pathlib import Path
|
|
from contextlib import contextmanager
|
|
|
|
@contextmanager
|
|
def secure_temp_file(suffix: str = ".pdf"):
|
|
"""Create secure temporary file that is always cleaned up."""
|
|
tmp_path = None
|
|
try:
|
|
with tempfile.NamedTemporaryFile(
|
|
suffix=suffix,
|
|
delete=False,
|
|
dir="/tmp/invoice-master" # Dedicated temp directory
|
|
) as tmp:
|
|
tmp_path = Path(tmp.name)
|
|
yield tmp_path
|
|
finally:
|
|
if tmp_path and tmp_path.exists():
|
|
tmp_path.unlink()
|
|
|
|
# Usage
|
|
async def process_upload(file: UploadFile):
|
|
with secure_temp_file(".pdf") as tmp_path:
|
|
content = await validate_pdf_upload(file)
|
|
tmp_path.write_bytes(content)
|
|
result = pipeline.process(tmp_path)
|
|
# File automatically cleaned up
|
|
return result
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] Temporary files always cleaned up (use context managers)
|
|
- [ ] Temp directory has restricted permissions
|
|
- [ ] No leftover files after processing errors
|
|
|
|
### 10. Dependency Security
|
|
|
|
#### Regular Updates
|
|
```bash
|
|
# Check for vulnerabilities
|
|
pip-audit
|
|
|
|
# Update dependencies
|
|
pip install --upgrade -r requirements.txt
|
|
|
|
# Check for outdated packages
|
|
pip list --outdated
|
|
```
|
|
|
|
#### Lock Files
|
|
```bash
|
|
# Create requirements lock file
|
|
pip freeze > requirements.lock
|
|
|
|
# Install from lock file for reproducible builds
|
|
pip install -r requirements.lock
|
|
```
|
|
|
|
#### Verification Steps
|
|
- [ ] Dependencies up to date
|
|
- [ ] No known vulnerabilities (pip-audit clean)
|
|
- [ ] requirements.txt pinned versions
|
|
- [ ] Regular security updates scheduled
|
|
|
|
## Security Testing
|
|
|
|
### Automated Security Tests
|
|
```python
|
|
import pytest
|
|
from fastapi.testclient import TestClient
|
|
|
|
def test_requires_api_key(client: TestClient):
|
|
"""Test authentication required."""
|
|
response = client.post("/api/v1/infer")
|
|
assert response.status_code == 401
|
|
|
|
def test_invalid_api_key_rejected(client: TestClient):
|
|
"""Test invalid API key rejected."""
|
|
response = client.post(
|
|
"/api/v1/infer",
|
|
headers={"X-API-Key": "invalid-key"}
|
|
)
|
|
assert response.status_code == 403
|
|
|
|
def test_sql_injection_prevented(client: TestClient):
|
|
"""Test SQL injection attempt rejected."""
|
|
response = client.get(
|
|
"/api/v1/documents",
|
|
params={"id": "'; DROP TABLE documents; --"}
|
|
)
|
|
# Should return validation error, not execute SQL
|
|
assert response.status_code in (400, 422)
|
|
|
|
def test_path_traversal_prevented(client: TestClient):
|
|
"""Test path traversal attempt rejected."""
|
|
response = client.get("/api/v1/results/../../etc/passwd")
|
|
assert response.status_code == 400
|
|
|
|
def test_rate_limit_enforced(client: TestClient):
|
|
"""Test rate limiting works."""
|
|
responses = [
|
|
client.post("/api/v1/infer", files={"file": b"test"})
|
|
for _ in range(15)
|
|
]
|
|
rate_limited = [r for r in responses if r.status_code == 429]
|
|
assert len(rate_limited) > 0
|
|
|
|
def test_large_file_rejected(client: TestClient):
|
|
"""Test file size limit enforced."""
|
|
large_content = b"x" * (11 * 1024 * 1024) # 11MB
|
|
response = client.post(
|
|
"/api/v1/infer",
|
|
files={"file": ("test.pdf", large_content)}
|
|
)
|
|
assert response.status_code == 400
|
|
```
|
|
|
|
## Pre-Deployment Security Checklist
|
|
|
|
Before ANY production deployment:
|
|
|
|
- [ ] **Secrets**: No hardcoded secrets, all in env vars
|
|
- [ ] **Input Validation**: All user inputs validated with Pydantic
|
|
- [ ] **SQL Injection**: All queries use parameterized queries
|
|
- [ ] **Path Traversal**: File paths validated and sanitized
|
|
- [ ] **Authentication**: API key or token validation
|
|
- [ ] **Authorization**: Role checks in place
|
|
- [ ] **Rate Limiting**: Enabled on all endpoints
|
|
- [ ] **HTTPS**: Enforced in production
|
|
- [ ] **CORS**: Properly configured (no wildcards)
|
|
- [ ] **Error Handling**: No sensitive data in errors
|
|
- [ ] **Logging**: No sensitive data logged
|
|
- [ ] **File Uploads**: Validated (size, type, magic bytes)
|
|
- [ ] **Temp Files**: Always cleaned up
|
|
- [ ] **Dependencies**: Up to date, no vulnerabilities
|
|
|
|
## Resources
|
|
|
|
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)
|
|
- [FastAPI Security](https://fastapi.tiangolo.com/tutorial/security/)
|
|
- [Bandit (Python Security Linter)](https://bandit.readthedocs.io/)
|
|
- [pip-audit](https://pypi.org/project/pip-audit/)
|
|
|
|
---
|
|
|
|
**Remember**: Security is not optional. One vulnerability can compromise sensitive invoice data. When in doubt, err on the side of caution.
|