feat: initial commit — Billo Release Agent (LangGraph)

LangGraph-based release automation agent with:
- PR discovery (webhook + polling)
- AI code review via Claude Code CLI (subscription-based)
- Auto-create Jira tickets for PRs without ticket ID
- Jira ticket lifecycle management (code review -> staging -> done)
- CI/CD pipeline trigger, polling, and approval gates
- Slack interactive messages with approval buttons
- Per-repo semantic versioning
- PostgreSQL persistence (threads, staging, releases)
- FastAPI API (webhooks, approvals, status, manual triggers)
- Docker Compose deployment

1069 tests, 95%+ coverage.
Yaojia Wang
2026-03-24 17:38:23 +01:00
commit f5c2733cfb
104 changed files with 19721 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,341 @@
# Billo Release Agent
A LangGraph-based release automation agent for Billo. Automates the full release
pipeline: PR discovery, code review (via Claude Code CLI), Jira ticket management,
staging release tracking, CI/CD pipeline triggering with approval gates, and Slack
interactive notifications.
## Architecture
```
+--- Azure DevOps Webhook ---+ +--- PR Poller (every 5 min) ---+
| POST /webhooks/azdo | | Scans WATCHED_REPOS for |
| (push-based) | | active PRs (pull-based) |
+------------+---------------+ +------------+------------------+
| |
v v
+---------------------------------------------+
| FastAPI Application |
| /webhooks/azdo /slack/interactions |
| /approvals/* /manual/* /status |
+---------------------+------------------------+
|
v
+---------------------------------------------+
| LangGraph Graphs |
| |
| pr_completed: |
| parse -> fetch -> [has ticket?] |
| no -> Claude generates ticket |
| yes -> Claude code review |
| -> merge -> Jira -> staging -> CI build |
| |
| release: |
| create release PR -> merge -> CI build |
| -> CD release -> [Sandbox approve] |
| -> [Production approve] -> Slack notify |
+---------------------+------------------------+
|
+----------------+----------------+
| | |
v v v
PostgreSQL Azure DevOps Slack (buttons)
- threads - PRs/Pipelines - Notifications
- staging - Builds/Releases - Approvals
- releases
```
## Key Features
| Feature | Description |
|---------|-------------|
| **PR Discovery** | Webhook-based (push) or polling-based (pull) — or both |
| **Auto-Create Jira Ticket** | When a PR branch has no ticket ID, Claude generates a summary and description, and the agent creates a Jira Story |
| **AI Code Review** | Claude Code CLI reviews PRs with full repo context (Read/Glob/Grep), using your subscription |
| **CI/CD Integration** | Triggers CI builds after merge, polls for completion, handles CD release approval gates |
| **Slack Interactive** | Approval requests with [Approve]/[Cancel] buttons, CI/CD status notifications |
| **Human-in-the-loop** | 5 interrupt points where operator confirmation is required before destructive actions |
| **Per-repo Versioning** | Independent semantic versioning per repository (patch auto-increment) |
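
To illustrate the per-repo patch auto-increment, here is a minimal sketch. It assumes plain `MAJOR.MINOR.PATCH` strings and a starting version of `1.0.0` for a repo with no prior release. Both are assumptions, and the function names `bump_patch` and `next_versions` are hypothetical rather than taken from `versioning.py`.

```python
from __future__ import annotations


def bump_patch(version: str) -> str:
    """Return the next patch version for a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def next_versions(
    current: dict[str, str], repo: str, initial: str = "1.0.0"
) -> dict[str, str]:
    """Return a new mapping in which only `repo` has advanced.

    `initial` is an assumed starting version for repos without a release.
    """
    updated = dict(current)
    updated[repo] = bump_patch(current[repo]) if repo in current else initial
    return updated
```

Because each repository has its own entry, bumping one repo never changes the version of another.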
## Prerequisites
- Python 3.12+
- PostgreSQL 16+
- [uv](https://github.com/astral-sh/uv) (recommended) or pip
- Claude Code CLI installed and authenticated (`claude` in PATH)
- Slack App (for interactive buttons) or Slack Incoming Webhook (for notifications only)
## Quick Start
### Local Development
```bash
# Install dependencies
uv sync --all-extras
# Copy and configure environment
cp .env.example .env
# Edit .env -- fill in all REQUIRED variables
# Start PostgreSQL
docker compose up -d db
# Run the server
uv run uvicorn release_agent.main:app --reload --port 8000
# Verify
curl http://localhost:8000/status
```
### Docker Compose (Production)
```bash
cp .env.example .env
# Edit .env -- POSTGRES_PASSWORD, WEBHOOK_SECRET, etc. are required
docker compose up -d
```
Tables are created automatically on first startup.
## Configuration
All configuration is via environment variables. See `.env.example` for the full list.
### Required Variables
| Variable | Description |
|----------|-------------|
| `AZDO_ORGANIZATION` | Azure DevOps organization name |
| `AZDO_PROJECT` | Azure DevOps project name |
| `AZDO_PAT` | Azure DevOps personal access token |
| `POSTGRES_DSN` | PostgreSQL connection string |
| `POSTGRES_PASSWORD` | PostgreSQL password (used by docker-compose) |
| `JIRA_EMAIL` | Jira account email |
| `JIRA_API_TOKEN` | Jira API token |
| `SLACK_WEBHOOK_URL` | Slack incoming webhook URL |
| `WEBHOOK_SECRET` | Shared secret for validating AzDo webhooks (must be non-empty) |
### Optional Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `REPOS_BASE_DIR` | `""` | Base dir with Billo repos (e.g., `/c/Users/yaoji/git/Billo`) |
| `WATCHED_REPOS` | `""` | Comma-separated repos to poll (e.g., `Billo.Platform.Payment,Billo.Platform.Document.DocumentAnalyser`) |
| `PR_POLL_ENABLED` | `False` | Enable periodic PR polling |
| `PR_POLL_INTERVAL_SECONDS` | `300` | Polling interval (5 min) |
| `PR_POLL_TARGET_BRANCH` | `refs/heads/develop` | Target branch filter |
| `DEFAULT_JIRA_PROJECT` | `ALLPOST` | Jira project key for auto-created tickets |
| `AUTO_CREATE_TICKET_ENABLED` | `True` | Auto-create Jira ticket when branch has no ticket ID |
| `SLACK_BOT_TOKEN` | `""` | Slack App bot token (for interactive buttons) |
| `SLACK_SIGNING_SECRET` | `""` | Slack App signing secret (required for /slack/interactions) |
| `SLACK_CHANNEL_ID` | `""` | Channel for interactive messages |
| `CI_POLL_INTERVAL_SECONDS` | `30` | CI build status poll interval |
| `CI_POLL_MAX_WAIT_SECONDS` | `1800` | Max wait for CI completion (30 min) |
| `OPERATOR_TOKEN` | `""` | Token for operator endpoints (empty = no auth) |
| `JIRA_BASE_URL` | `https://billolife.atlassian.net` | Jira instance URL |
| `PORT` | `8000` | HTTP server port |
### Security Notes
- `WEBHOOK_SECRET` must be non-empty; with an empty secret, every webhook is rejected
- `POSTGRES_PASSWORD` has no default in docker-compose; startup fails if it is unset
- `SLACK_SIGNING_SECRET` must be set for `/slack/interactions` to accept requests (returns 503 if empty)
- Slack signature verification includes 5-minute replay attack prevention
- All secrets use `SecretStr` and are never logged or included in error responses
- Set `OPERATOR_TOKEN` in production to protect approval and manual trigger endpoints
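
The Slack signature check with a 5-minute replay window can be sketched as follows. This follows Slack's published `v0` signing scheme (HMAC-SHA256 over `v0:{timestamp}:{body}`). The function name and the `now` parameter are hypothetical and not necessarily what `slack_interactions.py` uses.

```python
from __future__ import annotations

import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 5 * 60  # 5-minute replay window


def verify_slack_signature(
    signing_secret: str,
    timestamp: str,   # value of the X-Slack-Request-Timestamp header
    body: bytes,      # raw, unparsed request body
    signature: str,   # value of the X-Slack-Signature header
    now: float | None = None,  # injectable clock, for testing only
) -> bool:
    """Return True only for a fresh request signed with `signing_secret`."""
    if not signing_secret:
        return False  # the caller would map this case to a 503 response
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > MAX_SKEW_SECONDS:
        return False  # too old (or too far in the future): possible replay
    base = b"v0:" + timestamp.encode() + b":" + body
    digest = hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    expected = "v0=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The raw body must be used as received; re-serializing parsed form data would change the bytes and invalidate the signature.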
## API Endpoints
### Webhooks
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/webhooks/azdo` | `X-Webhook-Secret` | Receive Azure DevOps PR webhook events |
| POST | `/slack/interactions` | Slack Signing Secret | Receive Slack button click callbacks |
### Approvals (requires `X-Operator-Token` when configured)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/approvals/pending` | List threads awaiting operator approval |
| POST | `/approvals/{thread_id}` | Submit approval decision (merge/cancel/approve/skip) |
### Status and Manual Triggers
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/status` | None | Health check |
| GET | `/releases/{repo}` | None | List versions for a repo |
| GET | `/staging?repo={repo}` | None | Current staging release |
| POST | `/manual/pr/{pr_id}` | `X-Operator-Token` | Manually trigger PR processing |
| POST | `/manual/release` | `X-Operator-Token` | Manually trigger a release |
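
The operator-token rule (an empty `OPERATOR_TOKEN` disables auth, otherwise `X-Operator-Token` must match) could look roughly like this. The function name `operator_allowed` is hypothetical; the real check lives in `api/dependencies.py` and may differ.

```python
from __future__ import annotations

import hmac


def operator_allowed(configured_token: str, presented_token: str | None) -> bool:
    """Decide whether a request may reach an operator endpoint.

    `configured_token` is the OPERATOR_TOKEN setting; `presented_token` is
    the X-Operator-Token header value, or None when the header is absent.
    """
    if not configured_token:
        return True  # empty token means auth is disabled
    if presented_token is None:
        return False
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(configured_token.encode(), presented_token.encode())
```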
## Graph Workflows
### PR Completed
```
parse_webhook -> fetch_pr_details -> route_after_fetch
|-- merged -----------------> calculate_version -> update_staging -> CI build -> END
|-- active_with_ticket -----> move_jira_code_review -+
|-- active_no_ticket -------> auto_create_ticket ----+
|
run_code_review -> evaluate_review
|-- approve -> [Slack: Merge?] -> merge_pr
|-- request_changes -> notify -> END
-> Jira transitions -> calculate_version
-> update_staging -> CI build -> notify -> END
```
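
The three branches after `fetch_pr_details` can be expressed as a pure routing function. The sketch below is illustrative only: the state keys `pr_status` and `ticket_id` are hypothetical, and it assumes a merged PR is reported with the Azure DevOps status `completed`.

```python
from __future__ import annotations

from typing import Literal, TypedDict

Route = Literal["merged", "active_with_ticket", "active_no_ticket"]


class PRState(TypedDict, total=False):
    pr_status: str          # hypothetical key: "active", "completed", ...
    ticket_id: str | None   # hypothetical key: ticket parsed from the branch


def route_after_fetch(state: PRState) -> Route:
    """Pick the next branch of the pr_completed graph from the fetched state."""
    if state.get("pr_status") == "completed":
        return "merged"
    if state.get("ticket_id"):
        return "active_with_ticket"
    return "active_no_ticket"
```

Keeping routing free of I/O means each branch can be unit-tested with a plain dictionary.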
### Release
```
load_staging -> [Slack: Create release?] -> create_release_pr
-> [Slack: Merge release?] -> merge_release_pr
-> CI build on main -> poll until complete
|-- ci_failed -> notify failure -> END
|-- ci_passed -> wait for CD release -> approval loop:
|-- [Slack: Approve Sandbox?] -> approve -> poll again
|-- [Slack: Approve Production?] -> approve -> poll again
|-- all_deployed -> move tickets to Done
-> Slack release notification -> archive -> END
```
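
The "poll until complete" steps depend on an interval and a maximum wait (`CI_POLL_INTERVAL_SECONDS` and `CI_POLL_MAX_WAIT_SECONDS`). A minimal sketch of such a helper is shown below. The signature is an assumption and may not match the `poll_until` utility in `graph/polling.py`.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    interval_seconds: float = 30.0,
    max_wait_seconds: float = 1800.0,
) -> T:
    """Call `fetch` repeatedly until `is_done(result)` is true.

    Raises TimeoutError when the next sleep would pass the deadline.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait_seconds
    while True:
        result = await fetch()
        if is_done(result):
            return result
        if loop.time() + interval_seconds > deadline:
            raise TimeoutError(f"not done after {max_wait_seconds} seconds")
        await asyncio.sleep(interval_seconds)
```

A caller would pass a coroutine that fetches the build status and a predicate that recognizes a finished build, then branch on the result (`ci_passed` or `ci_failed`).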
### Interrupt Points (Slack buttons)
| # | When | Slack Message | Buttons |
|---|------|--------------|---------|
| 1 | After code review approves | PR title + review summary | [Merge] [Cancel] |
| 2 | Before creating release PR | Version + ticket list | [Create] [Cancel] |
| 3 | Before merging release PR | Release PR link | [Merge] [Cancel] |
| 4 | Before triggering pipelines | Pipeline list | [Trigger] [Skip] |
| 5 | Before approving release stage | Stage name + status | [Approve] [Skip] |
## PR Polling (Alternative to Webhooks)
When `PR_POLL_ENABLED=True`, the agent periodically scans all `WATCHED_REPOS` for
active PRs targeting the configured branch. New PRs not yet tracked in `agent_threads`
are automatically processed through the `pr_completed` graph.
Polling removes the need for Azure DevOps webhook configuration and works behind
firewalls, because no public endpoint has to be exposed.
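
One polling pass boils down to parsing `WATCHED_REPOS`, filtering by target branch, and skipping PRs that are already tracked. The sketch below shows that selection logic only. `ActivePR`, `parse_watched_repos`, and `select_new_prs` are hypothetical names, and the set of tracked `(repo_name, pr_id)` pairs stands in for a lookup against `agent_threads`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ActivePR:
    repo_name: str
    pr_id: int
    target_branch: str


def parse_watched_repos(raw: str) -> list[str]:
    """Split the comma-separated WATCHED_REPOS value, dropping blanks."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def select_new_prs(
    active: list[ActivePR],
    tracked: set[tuple[str, int]],
    target_branch: str = "refs/heads/develop",
) -> list[ActivePR]:
    """Keep PRs that target `target_branch` and are not yet tracked."""
    return [
        pr
        for pr in active
        if pr.target_branch == target_branch
        and (pr.repo_name, pr.pr_id) not in tracked
    ]
```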
## Auto-Create Jira Ticket
When a PR branch has no ticket ID (e.g., `chore/update-dependencies` instead of
`feature/ALLPOST-4028_login-page`), the agent automatically:
1. Sends the PR diff to Claude Code CLI
2. Has Claude generate a concise ticket summary and description
3. Creates a Jira Story in the `DEFAULT_JIRA_PROJECT`
4. Continues the normal workflow with the created ticket
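
Whether a branch "has a ticket ID" can be decided with a small regular expression. The pattern below assumes Jira-style keys (uppercase project key, hyphen, number) and is a sketch, not necessarily the rule implemented in `branch_parser.py`. The name `extract_ticket_id` is hypothetical.

```python
from __future__ import annotations

import re

# Assumed key format: an uppercase project key, a hyphen, then digits,
# not preceded by another letter or digit.
TICKET_ID = re.compile(r"(?<![A-Za-z0-9])([A-Z][A-Z0-9]+-\d+)")


def extract_ticket_id(branch: str) -> str | None:
    """Return the first Jira-style ticket ID in a branch name, or None."""
    match = TICKET_ID.search(branch)
    return match.group(1) if match else None


print(extract_ticket_id("feature/ALLPOST-4028_login-page"))  # ALLPOST-4028
print(extract_ticket_id("chore/update-dependencies"))        # None
```

A `None` result is the case that would trigger the auto-create steps above.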
## Database Schema
Tables are created automatically on startup:
```sql
-- Thread tracking for LangGraph interrupts and PR dedup
agent_threads (thread_id, graph_name, repo_name, pr_id, status, state JSONB,
slack_message_ts, created_at, updated_at)
-- Current in-progress releases (one per repo)
staging_releases (repo, version, started_at, tickets JSONB, updated_at)
-- Completed releases (immutable history)
archived_releases (repo, version, started_at, tickets JSONB, released_at)
```
## Migrating Existing JSON Files
If you have existing release data from the Claude Code skill:
```bash
# Dry run
uv run python scripts/migrate_json_to_db.py \
--source ../release-workflow/releases --dry-run
# Execute
uv run python scripts/migrate_json_to_db.py \
--source ../release-workflow/releases \
--dsn "postgresql://agent:password@localhost/agent"
```
## Development
### Running Tests
```bash
# Run all tests with coverage (1061 tests, 96%+ coverage)
uv run pytest
# Run without coverage (faster)
uv run pytest --no-cov
# Run specific module
uv run pytest tests/graph/test_pr_completed.py -v
```
### Project Structure
```
src/release_agent/
main.py # FastAPI app, lifespan, task management
config.py # pydantic-settings (all env vars)
state.py # LangGraph ReleaseState TypedDict
exceptions.py # Exception hierarchy
branch_parser.py # Extract ticket ID from branch name
versioning.py # Per-repo version calculation
api/
models.py # HTTP request/response Pydantic models
dependencies.py # FastAPI Depends() + operator auth
webhooks.py # POST /webhooks/azdo
approvals.py # Approval endpoints
status.py # Status, releases, manual triggers
slack_interactions.py # POST /slack/interactions (button callbacks)
graph/
dependencies.py # ToolClients, StagingStore Protocol
postgres_staging_store.py # PostgreSQL-backed store
routing.py # Pure routing functions (route_after_fetch, etc.)
pr_completed.py # PR graph nodes + auto_create_ticket
release.py # Release graph nodes + CI/CD approval loop
full_cycle.py # Subgraph composition
ci_nodes.py # CI trigger, poll, notify nodes
polling.py # Reusable async poll_until utility
models/
pr.py, ticket.py, release.py, pipeline.py
webhook.py, review.py, jira.py, build.py
tools/
azdo.py # Azure DevOps REST client
jira.py # Jira REST client (transitions + create_issue)
slack.py # Slack dual-mode (webhook + Web API)
claude_review.py # Claude Code CLI (review + ticket generation)
_http.py, _retry.py # Shared helpers
services/
pr_poller.py # Background PR polling loop
pr_dedup.py # PR deduplication via agent_threads
scripts/
migrate_json_to_db.py # One-time JSON -> PostgreSQL migration
tests/ # 1061 tests, 96%+ coverage
```
## Docker
```bash
# Required: set POSTGRES_PASSWORD and WEBHOOK_SECRET in .env
docker compose up -d
```
The agent service includes a health check at `/status`. The PostgreSQL health check
uses `pg_isready`, and the agent depends on it with the `service_healthy` condition.
## Slack App Setup
To use interactive buttons (optional; REST API approvals still work without them):
1. Create a Slack App at https://api.slack.com/apps
2. Enable **Interactivity** with Request URL: `https://<your-domain>/slack/interactions`
3. Add Bot Token Scopes: `chat:write`, `chat:update`
4. Install to workspace, get Bot Token (`xoxb-...`)
5. Set `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, `SLACK_CHANNEL_ID` in `.env`